Notebook 14 – Math 2121, Fall 2020

In this notebook we will explore an application of eigenvalues and eigenvectors to drawing graphs (in the sense of discrete mathematics, rather than calculus) and to ranking the importance of nodes in a network (specifically, the MTR system in Hong Kong).

8.2 μs
433 μs
10.7 s

A (simple, undirected) graph is a set of vertices V={v1,v2,,vn} along with a set of edges vivj between distinct vertices. We can draw a graph by plotting points that correspond to the vertices and drawing lines between vertices that are connected by an edge.

8.3 μs
plot_graph (generic function with 1 method)
57.1 μs
plot_graph_random (generic function with 1 method)
22.1 μs
graph_index (generic function with 1 method)
58.7 μs

For example, here is a graph with 6 vertices and 7 edges:

2.7 μs
6.7 s

The adjacency matrix of a graph with vertices V={v1,v2,,vn} is the n×n matrix that has 1 in position (i,j) if vivj is an edge and 0 in position (i,j) if vivj is not an edge.

The degree of a vertex vi is the number of edges that involve vi.

This is the sum of the entries in column i (or row i) of the adjacency matrix.

9.0 μs
graph_adjacency_matrix (generic function with 1 method)
71.8 μs
A
6×6 Array{Int64,2}:
 0  1  0  0  1  0
 1  0  1  0  1  0
 0  1  0  1  0  0
 0  0  1  0  1  1
 1  1  0  1  0  0
 0  0  0  1  0  0
30.8 ms
961 ns

The Laplacian matrix of a graph is the matrix L=DA where A is the adjacency matrix and D is the diagonal matrix whose entry in position (i,i) is the degree of vertex vi.

10.9 μs
graph_laplacian_matrix (generic function with 1 method)
28.1 μs
L
6×6 Array{Int64,2}:
  2  -1   0   0  -1   0
 -1   3  -1   0  -1   0
  0  -1   2  -1   0   0
  0   0  -1   3  -1  -1
 -1  -1   0  -1   3   0
  0   0   0  -1   0   1
314 ms

Some interesting facts:

  • The eigenvalues of the adjacency matrix A are always all real.

  • The eigenvalues of the Laplacian matrix L are all real and nonnegative.

  • The matrix L always has 0 as an eigenvalue.

  • The multiplicity of the 0-eigenvalue of L is the number of connected components of the graph.

5.6 μs
877 ms
41.4 μs
random_graph (generic function with 1 method)
51.1 μs
connected_components (generic function with 1 method)
37.5 μs
622 ms
3
59.8 ms

Here is an interesting algorithm for drawing a complicated graph:

  • Compute the adjacency matrix A or Laplacian matrix L.

  • Take two distinct eigenvectors u and v for this matrix.

  • Rescale these eigenvectors to have the same length.

  • Use the coordinates (u1,v1), (u2,v2), , (un,vn) for the vertices.

  • Often what works well is to use the two eigenvectors for L with smallest nonzero eigenvalue.

  • Another choice is the use the two eigenvectors for L with the largest eigenvalues.

5.0 μs
path_graph (generic function with 1 method)
54.0 μs
circle_graph (generic function with 1 method)
49.6 μs
complete_graph (generic function with 1 method)
80.8 μs
connected_random_graph (generic function with 2 methods)
42.6 μs

The following method creates the graph representing the station network for Hong Kong's MTR system, as it existed for a given input year.

2.6 μs
mtr_graph (generic function with 2 methods)
263 μs
plot_graph_adjacency_low (generic function with 1 method)
28.5 μs
plot_graph_adjacency_high (generic function with 1 method)
30.8 μs
plot_graph_laplacian_low (generic function with 1 method)
91.1 μs
plot_graph_laplacian_high (generic function with 1 method)
42.5 μs
get_adjacency_plots (generic function with 1 method)
21.1 μs
get_laplacian_plots (generic function with 1 method)
24.3 μs

Here is our starting graph from the example above, plotted randomly:

2.5 μs
10.9 ms

Here is the same graph, plotted using eigenvector methods based on the adjacency matrix:

2.4 μs
923 ms

Here is the same graph, plotted using eigenvector methods based on the Laplacian matrix:

2.4 μs
50.2 ms

In my opinion the plot using the eigenvectors of the Laplacian for the lowest nonzero eigenvalues as xy-coordinates gives the 'best' output.

Here are a few more examples to underscore this claim:

3.3 μs
157 ms
157 ms
169 ms
129 ms
140 ms
107 ms
159 ms
128 ms
101 ms

Remark: our eigenvector methods do not give such great pictures of the complete graph on n vertices.

6.2 μs
81.1 ms
161 ms
40.6 ms
739 ms
95
2.2 ms
5.3 ms
316 ms
669 ms
505 ms

The bottom left graph vaguely resembles the actual MTR map.

12.4 μs
plot_mtr_graph (generic function with 1 method)
72.0 μs
100 μs
29.1 ms
40.1 ms
80.3 ms
114 ms
110 ms
137 ms
203 ms
189 ms
203 ms
196 ms
219 ms

Another application:

  • Take the eigenvector of the adjacency matrix A with largest eigenvalue.

  • This vector can be rescaled to have all nonpositive entries.

  • These entries determine an order on the vertices of your graph.

  • This ordering (approximately) will be put important vertices first, unimportant ones last.

6.2 μs
rank_vertices (generic function with 1 method)
101 μs

Applying this strategy gives the following ranking of MTR stations, from most important to least important.

Does Lai King make sense as #1?

Certainly Chai Wan, Po Lam, and We Kai Sha are plausible as the least important stations, since each is located at the end of its respective line.

5.6 μs
796 ms

Here is a method to plot the relative 'eigenvector centrality' of various MTR stations as they evolve over time.

3.0 μs
plot_ranking (generic function with 1 method)
80.6 μs
1.4 s
230 ms

Some observations:

  • Central continues to become more central to the network.

  • In the early history of the MTR, Kowloon Tong was the most important station.

  • Lai King jumps ahead in centrality after the Airport Express line opens (on 6 July 1998).

  • Increased centrality of Diamond Hill station in 2020 probably due to new Kai Tak station.

2.9 μs

One more application. An n-step path in a graph is a sequence of n edges

vi0vi1vi2vin

such that consecutive edges share an endpoint.

The path starts at vertex vi0 and edges at vertex vin.

Taking powers of the adjacency matrix lets us count all n-step paths between two vertices.

The number of 1-step paths between vertices vi and vj is 1 is these two vertices are connected by an edge and 0 otherwise. This is just the entry in position (i,j) of the adjacency matrix A.

Similarly, the number of n-step paths between vertices vi and vj is the entry in position (i,j) of An.

9.2 μs
13.2 ms
1.5 μs
1.0 s
3
4.6 ms
6
51.0 μs

Consider the following modified adjacency matrix M: take the usual adjacency matrix A and divide all entries in column i by the degree of vertex i.

2.6 μs
degree_normalized_adjacency_matrix (generic function with 1 method)
30.0 μs

The entries in each column of M are nonnegative and sum to 1.0. The same is true of each column of any power Mn of this matrix. In other words, the entries in column i of Mn give a probability distribution on the vertices of your graph.

These probabilities give your chances of being at a givex vertex if you carry out the following random walk on your graph:

  • Start at vertex vi.

  • Choose an adjacent vertex (one connected by an edge) at random and move to that vertex.

  • Repeat n times.

We can plot this distribution.

4.5 μs
path_distribution (generic function with 1 method)
53.9 μs

As n becomes very large, the distribution of our random walk converges to some limiting set of probabilities that tell you how likely you are to be at a given vertex if you travel on the graph for a large number of steps, at each step choosing a random adjacent vertex to your current position to travel to.

Key fact: this limiting distribution is an eigenvector for M with eigenvalue 1.

If the 1-eigenspace of M is 1-dimensional, then there is a unique limiting distribution, given by the unique eigenvector with eigenvalue 1 whose entries sum to 1.

It can happen that this eigenspace has dimension greater than one. In this case more than one limiting distribution may exist for our random walk, and which one we end up in depends on the vertex the walk starts at.

11.7 μs
limiting_distribution (generic function with 1 method)
55.0 μs
distribution_animation (generic function with 1 method)
70.4 μs
4.9 s
30.4 ms

On the path graph, there are multiple limiting distributions:

15.0 μs
2.6 s
4.1 s
22.4 ms
1.1 s
1.1 s

For circle graphs with an even number of vertices, there are also multiple limiting distributions, as illustrated in the plots above.

However, for circle graphs with an odd number of vertices, there is a unique limiting distribution for our random walk, namely, the uniform distribution:

7.3 μs
1.2 s
165 ms

There is also a unique limiting distribution for the random walk on the MTR graph:

7.5 μs
2.3 s

This distribution is actually very simple: the limiting probabilities are proportional to the degrees of the vertices in the graph! We check this below:

7.9 μs
graph_degree (generic function with 1 method)
47.9 μs
plot_degree_distribution (generic function with 1 method)
59.6 μs
534 ms

This a general phenomenon. Compare the following with the limiting distribution plotted above:

24.9 μs
39.2 ms

One more interesting phenomenon: even if there is a unique limiting distribution, how fast our random walk converges to this distribution can still vary signficantly depending on which vertex the walk starts at.

10.6 μs
plot_convergence (generic function with 1 method)
63.3 μs
2.7 s
56.7 ms