Notebook 12 – Math 2121, Fall 2020
This notebook is about the algorithmic problem of computing eigenvalues and eigenvectors for (large) square matrices. First: why is this a problem?
If we know the eigenvalues of a matrix A, then finding the corresponding eigenvectors is routine: for each eigenvalue λ, we just solve the linear system (A - λI)x = 0, for example by row reduction.
A bigger problem is figuring out the eigenvalues. The eigenvalues of an n×n matrix A are the roots of its characteristic polynomial det(A - xI), which has degree n.
When n ≤ 4, these roots can be computed exactly using explicit formulas like the quadratic formula.
However, there is no analogue of the quadratic formula for polynomials of degree 5 or higher. So when n ≥ 5, we cannot in general write down the eigenvalues of A exactly.
Even approximating these solutions is not easy if n is large.
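For a 2×2 matrix, the characteristic polynomial is the quadratic x² - (a + d)x + (ad - bc), so the quadratic formula gives the eigenvalues directly. A minimal sketch (the matrix here is illustrative, and we assume real roots):

```julia
using LinearAlgebra

A = [2.0 1.0; 1.0 3.0]
a, b, c, d = A[1,1], A[1,2], A[2,1], A[2,2]
# characteristic polynomial: x^2 - (a + d) x + (a d - b c)
t, dt = a + d, a * d - b * c
disc = t^2 - 4 * dt                         # discriminant; >= 0 means real eigenvalues
roots = [(t - sqrt(disc)) / 2, (t + sqrt(disc)) / 2]
```

For n ≥ 5 no such closed-form root formula exists, which is why we turn to iterative methods.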
using LinearAlgebra
using Plots
There is a simple iterative algorithm, called the power method, to compute one eigenvalue of an n×n matrix A.
Here is the idea. Assume A is diagonalizable with real eigenvalues λ₁, λ₂, ..., λₙ.
Assume these eigenvalues satisfy |λ₁| > |λ₂| ≥ ... ≥ |λₙ|.
Let v₁, v₂, ..., vₙ be eigenvectors for λ₁, λ₂, ..., λₙ.
These are linearly independent, so form a basis for ℝⁿ.
Now choose any vector w ∈ ℝⁿ.
We can write this vector as a linear combination w = c₁v₁ + c₂v₂ + ... + cₙvₙ
for some scalars c₁, c₂, ..., cₙ.
When we multiply by A repeatedly, we get Aᵏw = c₁λ₁ᵏv₁ + c₂λ₂ᵏv₂ + ... + cₙλₙᵏvₙ.
Thus for large values of k, we have Aᵏw ≈ c₁λ₁ᵏv₁, since |λ₁| is strictly larger than the other |λᵢ|.
Therefore if c₁ ≠ 0, the unit vectors Aᵏw / ‖Aᵏw‖ converge, up to sign, to a unit eigenvector u for λ₁.
If we compute this vector, then λ₁ = u·(Au), since Au = λ₁u and u·u = 1.
In this way we can compute the eigenvalue of A with the largest absolute value, along with a corresponding eigenvector.
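A bare-bones sketch of this iteration on a diagonal 2×2 example (the matrix and helper name here are illustrative, not from the notebook), where the normalized iterates visibly converge to the eigenvector for the larger eigenvalue:

```julia
using LinearAlgebra

# repeatedly multiply by A and normalize, as described above
function power_iteration(A, k)
    w = ones(size(A, 1))     # starting vector with c1 != 0
    for _ = 1:k
        w = A * w            # multiply by A ...
        w = w / norm(w)      # ... and normalize so the iterates stay bounded
    end
    return w
end

A = [2.0 0.0; 0.0 1.0]       # eigenvalues 2 and 1, eigenvectors [1, 0] and [0, 1]
u = power_iteration(A, 50)   # converges to the unit eigenvector [1, 0]
lambda = dot(A * u, u)       # recovers the dominant eigenvalue 2
```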
discover_eigenvalue_iterative_method (generic function with 2 methods)
function discover_eigenvalue_iterative_method(A, tol=1e-12; limit=10000)
    # implements iterative algorithm (the power method) to find largest eigenvalue
    # random starting vector
    w = rand(size(A)[1], 1)
    for i = 1:limit
        x = A * w
        lambda = dot(x, w)  # computes w^T A w, which approximates the eigenvalue once w is a unit vector
        # stop iterating if we have already found an eigenvalue within tolerance
        if norm(x - lambda * w) < tol
            break
        end
        w = x / norm(x)
    end
    lambda = dot(A * w, w)
    return lambda, w
end
random_matrix (generic function with 1 method)
function random_matrix(n)
    # create a random diagonalizable matrix with real (integer) eigenvalues
    eigs = rand(-10:1:10, 1, n)
    A = zeros(n, n)
    for i = 1:n
        A[i, i] = eigs[i]
    end
    while true
        B = rand(-1:0.01:1, n, n)
        if det(B) != 0
            return B * A * B^-1
        end
    end
end
4×4 Array{Float64,2}:
-4.45103 -0.70114 0.741133 -4.21311
-1.18223 -4.02395 0.989379 -3.81804
4.9414 -5.28547 2.7975 3.19111
-1.26461 -2.53311 1.51567 -6.32252
Q = random_matrix(4)
-10.0
4×1 Array{Float64,2}:
 -0.557762
 -0.531202
  0.150507
 -0.619736
lambda, u = discover_eigenvalue_iterative_method(Q)
4×1 Array{Float64,2}:
  5.57762
  5.31202
 -1.50507
  6.19736
4×1 Array{Float64,2}:
  5.57762
  5.31202
 -1.50507
  6.19736
Q * u, lambda * u
This algorithm works well most of the time, as we see in these example computations.
One situation where the iterative algorithm fails, even when A is diagonalizable with real eigenvalues, is when two eigenvalues of largest absolute value are negatives of each other, so that neither one dominates. The matrix below has eigenvalues 100 and -100.
2×2 Array{Int64,2}:
100 1
0 -100
B = [100 1; 0 -100]
-45.4841
2×1 Array{Float64,2}:
 0.51996
 0.85419
lambdaB, uB = discover_eigenvalue_iterative_method(B)
2×1 Array{Float64,2}:
  52.8502
 -85.419
2×1 Array{Float64,2}:
 -23.6499
 -38.8521
B * uB, lambdaB * uB
We can get around this issue with the following idea.
The eigenvalues of A² are the squares λ² of the eigenvalues λ of A.
So if λ and -λ are both eigenvalues of A, they contribute the single eigenvalue λ² to A², and the cancellation problem disappears.
To avoid the issue of eigenvalues adding to zero, we first apply the iterative algorithm to A².
If the positive square root of the resulting eigenvalue is μ, then either μ or -μ is an eigenvalue of A; we can tell which by checking whether A - μI is invertible, and then run the iterative algorithm on A + μI or A - μI, whose eigenvalues are shifted so that no cancellation occurs.
discover_eigenvalue (generic function with 2 methods)
function discover_eigenvalue(A, tol=1e-12; limit=10000)
    mu_squared, _ = discover_eigenvalue_iterative_method(A^2, tol; limit=limit)
    mu = abs(mu_squared)^0.5
    # decide which square root to return
    if rank(A - mu * I) < size(A)[1]
        # mu is an eigenvalue: shift by +mu so it becomes the dominant eigenvalue 2mu
        nu, u = discover_eigenvalue_iterative_method(A + mu * I, tol; limit=limit)
        return nu - mu, u
    else
        # -mu is an eigenvalue: shift by -mu so it becomes the dominant eigenvalue -2mu
        nu, u = discover_eigenvalue_iterative_method(A - mu * I, tol; limit=limit)
        return nu + mu, u
    end
end
100.0
2×1 Array{Float64,2}:
 1.0
 0.0
lambda_B, u_B = discover_eigenvalue(B)
2×1 Array{Float64,2}:
 100.0
   0.0
2×1 Array{Float64,2}:
 100.0
   0.0
B * u_B, lambda_B * u_B
The following code generates some pictures of the convergence process of the iterative eigenvalue algorithm applied to a 2×2 matrix.
plot_vector (generic function with 1 method)
function plot_vector(v, scaling, color; opacity=0.25)
    v = v / norm(v) * (0.5 + 0.5 * (1 - (1 - scaling)^2))
    quiver!(quiver=([v[1]], [v[2]]), [0], [0], color=color, opacity=opacity)
end
animate_eigenvector_search (generic function with 2 methods)
function animate_eigenvector_search(A, steps=25)
    v = rand(2, 1)
    w = rand(2, 1)
    p = scatter([0], [0], xlim=(-1, 1), ylim=(-1, 1), legend=false, aspect_ratio=:equal,
        title="Iterative method of finding eigenvectors and eigenvalues")
    eig = eigen(A)
    if (A[1,1] - A[2,2])^2 + 4 * A[1,2] * A[2,1] >= 0
        # real eigenvalues: draw the true eigenvector directions in grey
        plot_vector(eig.vectors[:,1], 1.0, :grey; opacity=1.0)
        plot_vector(-eig.vectors[:,1], 1.0, :grey; opacity=1.0)
        plot_vector(eig.vectors[:,2], 1.0, :grey; opacity=1.0)
        plot_vector(-eig.vectors[:,2], 1.0, :grey; opacity=1.0)
    end
    anim = @animate for i = 0:2 * steps + 1
        # find first eigenvector v
        if i <= steps
            plot_vector(v, i / steps, :blue)
            v = A * v
            v = v / norm(v)
        # replace A by A - lambda I
        elseif i == steps + 1
            eigenvalue_v = (A * v)[1] / v[1]
            A = A - I * eigenvalue_v
        # find second eigenvector w
        else
            plot_vector(w, (i - steps - 1) / steps, :orange)
            w = A * w
            w = w / norm(w)
        end
    end every 1
    gif(anim, fps=10)
end
2×2 Array{Float64,2}:
-10.6148 8.80783
-3.04604 7.61477
A = random_matrix(2)
animate_eigenvector_search(A, 50)
The blue vectors are the iterations of our eigenvalue algorithm applied to A; they converge to one of the grey eigenvector directions.
Once the algorithm converges to an eigenvector v with eigenvalue λ, we replace A by A - λI and run the algorithm again, shown in orange.
If the second run converges to an eigenvector w of A - λI with eigenvalue μ, then w is also an eigenvector of A, with eigenvalue μ + λ.
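A minimal numerical check of this shifting idea, using a hand-picked 2×2 example (the matrix and vector here are illustrative, not from the notebook):

```julia
using LinearAlgebra

A = [3.0 1.0; 0.0 1.0]      # triangular, so its eigenvalues are 3 and 1
lambda = 3.0                # dominant eigenvalue of A
B = A - lambda * I          # eigenvalues of B are 3 - 3 = 0 and 1 - 3 = -2
w = [1.0, -2.0]             # eigenvector of B for its dominant eigenvalue mu = -2
mu = -2.0
# w is also an eigenvector of A, with eigenvalue mu + lambda = 1
```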
bad_matrix (generic function with 1 method)
function bad_matrix()
    while true
        A = 4 * (rand(2, 2) .- 0.5)
        a, b, c, d = A[1,1], A[1,2], A[2,1], A[2,2]
        # (a - d)^2 + 4 b c is the discriminant of the characteristic polynomial
        if (a - d)^2 + 4 * b * c < 0
            return A
        end
    end
end
For some matrices A, the iterative algorithm fails to converge at all, as in the animation below.
The failure here is related to the matrix C having no real eigenvalues: bad_matrix() chooses C so that the discriminant (a - d)² + 4bc of its characteristic polynomial is negative, which means both eigenvalues are complex.
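The simplest instance of this phenomenon (not from the notebook) is a rotation matrix: it has no real eigenvector directions for the iterates to converge to, and it satisfies the same negative-discriminant condition that bad_matrix checks for:

```julia
using LinearAlgebra

R = [0.0 -1.0; 1.0 0.0]    # rotation by 90 degrees
# negative discriminant of the characteristic polynomial, as in bad_matrix
disc = (R[1,1] - R[2,2])^2 + 4 * R[1,2] * R[2,1]
vals = eigvals(R)          # complex eigenvalues i and -i
```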
2×2 Array{Float64,2}:
1.94497 1.4051
-1.87377 -0.0389516
C = bad_matrix()
animate_eigenvector_search(C, 50)
Let's ignore this situation.
We can use the iterative algorithm as a building block in a larger algorithm to find all the eigenvalues along with bases for all the nonzero eigenspaces.
RREF (generic function with 2 methods)
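The helper functions below call pivot_positions (and RREF), which were defined in an earlier notebook but whose code is not shown here. A minimal sketch of what pivot_positions is assumed to do: row reduce A to reduced row echelon form and return the list of pivot (row, column) positions together with the RREF:

```julia
# assumed behavior of pivot_positions: Gaussian elimination with partial
# pivoting, returning (pivots, R) where R is the RREF of A
function pivot_positions(A, tol=1e-6)
    R = Float64.(A)
    m, n = size(R)
    pivots = Tuple{Int,Int}[]
    r = 1
    for j = 1:n
        r > m && break
        # choose the largest entry in column j at or below row r as the pivot
        i = r - 1 + argmax(abs.(R[r:end, j]))
        if abs(R[i, j]) > tol
            R[[r, i], :] = R[[i, r], :]   # swap the pivot row into position
            R[r, :] /= R[r, j]            # scale so the pivot entry is 1
            for k = 1:m                   # clear the rest of column j
                k != r && (R[k, :] -= R[k, j] * R[r, :])
            end
            push!(pivots, (r, j))
            r += 1
        end
    end
    return pivots, R
end
```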
column_space_basis (generic function with 2 methods)
function column_space_basis(A, tol=1e-6)
    # function we'll need to compute a basis of the column space of a matrix
    pivots, _ = pivot_positions(A, tol)
    ans = zeros(size(A)[1], 0)
    for (i, j) = pivots
        ans = hcat(ans, A[1:end, j])
    end
    ans
end
null_space_basis (generic function with 2 methods)
function null_space_basis(A, tol=1e-6)
    # function we'll need to compute a basis of the null space of a matrix
    n = size(A)[2]
    pivots, rref = pivot_positions(A, tol)
    free = [j for j = 1:n if !any(j == pivot[2] for pivot = pivots)]
    ans = zeros(n, length(free))
    for k = 1:length(free)
        ans[free[k], k] = 1
        for (i, j) = pivots
            ans[j, k] = -rref[i, free[k]]
        end
    end
    ans
end
3×4 Array{Int64,2}:
1 1 2 2
2 2 4 4
1 2 1 3
V = [ 1 1 2 2; 2 2 4 4; 1 2 1 3 ]
3×2 Array{Float64,2}:
1.0 1.0
2.0 2.0
1.0 2.0
column_space_basis(V)
4×2 Array{Float64,2}:
-3.0 -1.0
1.0 -1.0
1.0 0.0
0.0 1.0
null_space_basis(V)
Here is the algorithm to find all eigenvalues of an n×n matrix A:
First find the eigenvalue λ of A with the largest absolute value.
Compute a basis u_1, u_2, ..., u_m of the eigenspace Nul(A - λI).
Find vectors v_1, v_2, ..., v_{n-m} such that u_1, ..., u_m, v_1, ..., v_{n-m} is a basis for ℝⁿ.
Let V = [v_1 v_2 ... v_{n-m}].
Let P = [u_1 ... u_m v_1 ... v_{n-m}].
Compute the (n-m)×(n-m)
matrix B given by the last n-m rows of P⁻¹AV; this is the standard matrix of the linear map that A induces on the remaining coordinates. Now, recursively, compute a basis of ℝ^{n-m}
consisting of eigenvectors for B.
Can show that if x
is an eigenvector for B with eigenvalue μ ≠ λ, then (A - λI)Vx is an eigenvector for A with eigenvalue μ.
discover_all_eigenvalues (generic function with 2 methods)
function discover_all_eigenvalues(A, tol=1e-12)
    # returns list `eigenvalues` and matrix `eigenvectors` if possible to compute
    # the list is the sequence of eigenvalues of A with multiplicity
    # the columns of the matrix are the corresponding eigenvectors
    n = size(A)[1]
    # if dim Nul A == n then A is zero matrix
    if size(null_space_basis(A))[2] == n
        eigenvalues = [0 for i = 1:n]
        eigenvectors = rand(n, n)
        return eigenvalues, eigenvectors
    else
        # find largest eigenvalue, and basis for corresponding eigenspace
        lambda, v = discover_eigenvalue(A, tol)
        eigenvectors = null_space_basis(A - lambda * I, sqrt(tol))
        m = size(eigenvectors)[2]
        eigenvalues = [lambda for i = 1:m]
        # compute basis u_1, u_2, ..., u_m, v_1, v_2, ..., v_{n-m}
        basis = column_space_basis(hcat(eigenvectors, I))
        # compute matrix B
        B = (basis^-1)[m + 1:end, 1:end] * A * basis[1:end, m + 1:end]
        # recursively construct eigenvalues and eigenvectors for B
        values, vectors = discover_all_eigenvalues(B, tol)
        for i = 1:n - m
            mu = values[i]
            w = basis[1:end, m + 1:end] * vectors[1:end, i]
            next_eigenvector = A * w - lambda * w
            # optional: rescale eigenvector. may give more accurate result
            next_eigenvector = next_eigenvector / norm(next_eigenvector)
            # add next eigenvalue and eigenvector to return values
            append!(eigenvalues, values[i])
            eigenvectors = hcat(eigenvectors, next_eigenvector)
        end
        return eigenvalues, eigenvectors
    end
end
This relatively simple algorithm works fairly well for random matrices with all real eigenvalues. Such a matrix will be diagonalizable with high probability.
20×20 Array{Float64,2}:
-4.68913 -0.617843 -1.6796 -2.91886 … -1.94144 11.0339 3.43901
0.195941 8.7241 -0.150681 -1.77377 -4.13233 3.3259 7.81364
-4.68126 9.92404 -7.27107 -4.07498 -2.46368 -3.58807 3.66445
-1.30953 9.94846 -4.01916 -3.29598 -6.00274 10.4173 9.55129
-3.53049 7.13987 -5.55512 -0.119382 -2.77807 0.736316 4.71966
-5.21105 4.60439 -8.11439 0.858668 … -3.46147 -2.70693 2.82454
5.46664 5.04265 4.37109 1.33311 -0.90611 0.441502 0.178801
⋮ ⋱
-2.72044 -4.67965 0.120696 -1.92521 -0.977146 7.9267 3.97427
3.69971 -6.24075 1.11717 7.25487 … 3.35706 -10.8081 -8.07071
-5.2448 -4.44301 -3.69576 -0.811984 -2.02894 2.92825 3.09466
2.90048 -10.205 5.90078 3.02679 0.799246 1.4141 -5.80947
-3.60132 5.28736 -2.90162 0.424574 -2.93991 5.91369 10.4931
-5.3127 9.44401 -8.29506 -0.502237 -3.83783 -4.90386 4.78342
M = random_matrix(20)
-10.0
-9.0
-8.0
-8.0
7.0
-7.0
-6.0
-6.0
-5.0
5.0
4.0
-2.0
-1.0
1.0
1.0
1.0
2.0
3.0
3.0
3.0
20×20 Array{Float64,2}:
 -0.010101   0.228228   -0.150134   …   0.0674063   -0.142466      0.131463
 -0.79798    0.202461   -0.0336989     -0.38511      0.252647      0.147469
  0.585859   0.272401   -0.126279       0.0628273    0.148879      0.0948665
 -0.717172  -0.147244    0.137405       0.201773    -0.45317       0.123764
  0.40404    0.117795   -0.0128432      0.0722329   -0.206481      0.369894
  0.40404    0.327618   -0.149195   …  -0.537339     0.364683      0.209404
  0.414141  -0.294488   -0.120648       0.156754    -0.24368      -0.16277
  ⋮                                 ⋱
 -0.656566   0.283445   -0.010508       0.14895      0.113807     -0.229795
  0.838384   0.103071   -0.315836   …  -0.0744556    0.0133835     0.022895
 -0.888889   0.0809842   0.381928       0.237066    -0.000742201  -0.238057
 -0.767677  -0.209823    0.0150178     -0.158967     0.0414805    -0.22779
 -0.717172   0.298169    0.327534       0.229747    -0.337764     -0.0178747
  1.0        0.283445   -0.0131408      0.0300725    0.262729     -0.0655744
D, P = discover_all_eigenvalues(M)
# P * Diagonal(D) * P^-1, M
6.296838914532222e-10
norm(P * Diagonal(D) - M * P) # norm computes sqrt(sum of squares of all entries)
Of course, more advanced algorithms for the same computations are already implemented in the Julia package LinearAlgebra, and these are more robust, faster, and more accurate.
Eigen{Float64,Float64,Array{Float64,2},Array{Float64,1}}
values:
20-element Array{Float64,1}:
-9.999999999999991
-8.99999999999999
-8.0
-7.999999999999989
-6.999999999999989
-5.999999999999993
-5.9999999999999805
⋮
2.9999999999999827
2.9999999999999987
2.9999999999999987
4.000000000000007
5.000000000000008
7.000000000000016
vectors:
20×20 Array{Float64,2}:
0.00350381 0.228228 0.268391 … -0.374486 -0.0599401 -0.0495405
0.276801 0.202461 -0.0678501 0.30427 -0.167832 -0.324265
-0.203221 0.272401 0.187501 0.234054 0.355645 -0.099081
0.248771 -0.147244 -0.155182 -0.339378 -0.339661 -0.261214
-0.140153 0.117795 -0.0659771 -0.273063 0.023976 -0.292739
-0.140153 0.327618 -0.0538223 … -0.296468 0.195804 -0.211673
-0.143656 -0.294488 -0.086948 0.191144 -0.0679321 -0.117096
⋮ ⋱
0.227748 0.283445 0.0063399 -0.13263 -0.0679321 0.157629
-0.290817 0.103071 0.0546691 … 0.206748 -0.395605 0.193658
0.308336 0.0809842 -0.369211 -0.269162 0.00399601 0.17114
0.26629 -0.209823 -0.17892 0.039009 0.0639361 0.25671
0.248771 0.298169 -0.453486 -0.374486 -0.235764 -0.274725
-0.346878 0.283445 -0.133741 -0.156036 0.023976 -0.364798
a = eigen(M)
2.0779501385380722e-13
norm(a.vectors * Diagonal(a.values) * a.vectors^-1 - M)