Notebook 18 – Math 2121, Fall 2020

In today's notebook we discuss some applications of singular value decompositions.


Every square matrix maps the unit disc to an ellipse

We proved this fact in class for 2×2 matrices.

Let's confirm this by drawing some pictures.

plot_path (generic function with 2 methods)

Construct the unit disc $\{v \in \mathbb{R}^2 : v \cdot v \leq 1\}$:
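The cell that builds this matrix isn't shown in this export. Here is a minimal sketch of one construction that yields a 2×2601 matrix like the output below, sampling 51 radii and 51 angles; the printed Transpose type suggests the original cell instead transposed an N×2 array, so details may differ:

```julia
# Sample the closed unit disc with the points r*(cos θ, sin θ)
# for 51 radii r in [0, 1] and 51 angles θ in [0, 2π].
rs = range(0, 1, length=51)
θs = range(0, 2π, length=51)
disc = reduce(hcat, [[r * cos(θ), r * sin(θ)] for r in rs for θ in θs])
```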

2×2601 Transpose{Float64,Array{Float64,2}}:
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  …   0.968583   0.992115   1.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     -0.24869   -0.125333  -2.44929e-16

Make a 2×2 matrix:

M
2×2 Array{Float64,2}:
 -1.674  7.531
  5.59   6.774
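The plotting cell that followed isn't included, but the idea is just to push every point of the disc through M. A sketch using the plot_path helper defined earlier, assuming it draws the columns of a 2×n matrix as points in the plane:

```julia
# The image of the unit disc under M: multiply every sample point by M.
image = M * disc   # 2×2601 matrix of transformed points
plot_path(image)   # assumed to draw the columns as points in the plane
```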
(Plot: the unit disc and its image under M, an ellipse.)

Application: subspaces of best fit

Suppose we have a list of vectors $v_1, v_2, \dots, v_n \in \mathbb{R}^m$ that all belong to the same $k$-dimensional subspace. Then the matrix

$$A = \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix}$$

will have rank (at most) $k$, and by finding a basis for the column space of this matrix we can find a subspace of dimension (at most) $k$ that contains our input vectors.

But what if the vectors $v_1, v_2, \dots, v_n$ are noisy measurements, so that they don't all exactly belong to a $k$-dimensional subspace but are merely all close to one? Can we still find the $k$-dimensional subspace that approximately contains our vectors?

Here is one way to do this:

  • Compute an SVD $A = U \Sigma V^T$.

  • Set all entries of $\Sigma$ outside the upper left $k \times k$ block to zero.

  • Call the resulting matrix $\Sigma|_k$.

  • Compute a basis for the column space of $B := U (\Sigma|_k) V^T$.

The matrix $B$ will have rank (at most) $k$, and its column space should be close to the $k$-dimensional subspace containing our input vectors.
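These four steps are easy to carry out with Julia's built-in svd. A minimal sketch (the helper name truncate_svd is ours, not from the notebook):

```julia
using LinearAlgebra

# Zero out all but the k largest singular values of A and
# rebuild B = U * (Σ|_k) * Vᵀ, a matrix of rank at most k.
function truncate_svd(A::AbstractMatrix, k::Integer)
    F = svd(A)                       # A == F.U * Diagonal(F.S) * F.Vt
    S = copy(F.S)
    S[k+1:end] .= 0                  # keep only the top k singular values
    return F.U * Diagonal(S) * F.Vt
end
```

The first $k$ columns of F.U then give an orthonormal basis (up to numerical error) for the column space of $B$.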


We will illustrate this idea below when $k = 1$.

In this case, what we have just described is another way to compute a line of best fit.

noise (generic function with 2 methods)
m
50

Create a random 2×m matrix whose columns all belong to the same line:
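The generating cell isn't shown; one plausible construction (the names and the noise scale here are guesses, standing in for the notebook's noise helper) picks a random direction and scatters scaled copies of it:

```julia
# Points t*u on the line spanned by a random direction u,
# perturbed slightly to simulate measurement noise.
u = randn(2)
data = reduce(hcat, [randn() * 100 * u .+ randn(2) for _ in 1:m])
```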

data
2×50 Array{Float64,2}:
 327.446  60.319  396.382  77.553  336.063  …  379.148  267.127  -258.51  422.233
  37.278   6.867   45.126   8.829   38.259      43.164   30.411   -29.43   48.069
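With $k = 1$, the recovered subspace is a line, and its direction is the first left singular vector of the data matrix. A sketch:

```julia
using LinearAlgebra

# Rank-1 truncation of the data matrix: its columns all lie on
# the line of best fit, which points along the first left singular vector.
F = svd(data)
direction = F.U[:, 1]                                 # spans the line of best fit
B = F.U[:, 1:1] * Diagonal(F.S[1:1]) * F.Vt[1:1, :]   # rank-1 approximation
```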
(Plot: the data points together with the recovered line of best fit.)

Application: compressed sensing

Suppose you're running a service like Netflix where there are a large number of users who can watch some large number of movies. Each user rates each movie with a score between 1.0 and 5.0.

Consider the matrix whose rows are indexed by movies and whose columns are indexed by users, in which each entry is one user's score of one movie.

If every user had watched every movie, then we could fill out every entry in this matrix and we would know every user's preferences. We make the following hopeful assumption about this ideal matrix: it probably has low rank compared to the number of users and movies.

In other words, we assume there is some small number of abstract user 'types' (maybe people who only like horror movies, or only like movies in certain languages, or only like comedies), and that the preferences of a typical user are (approximately) a linear combination of these basic types.

In practice, we only have access to a noisy version of this matrix, because not every user has watched every movie. For movies a user hasn't watched, we assign the default rating of 2.5.

Is it possible to recover (approximately) the ideal matrix from the noisy version? That is, can we predict what a user's rating on an unseen movie will be? This is a compressed sensing problem.

A version of this problem was the basis for the USD 1 million Netflix Prize about ten years ago.

users
4000
movies
400
watch_frequency
0.75

Here is one way we might try to predict user ratings.

If the ideal ratings matrix $M_{\mathrm{ideal}}$ has small rank $r$, then it has only $r$ nonzero singular values.

It follows that all but the largest $r$ singular values of the noisy ratings matrix $M_{\mathrm{noisy}}$ should be close to zero. By inspecting the singular values of the noisy matrix, we can guess what $r$ is.

We then try to recover $M_{\mathrm{ideal}}$ by computing a singular value decomposition

$$M_{\mathrm{noisy}} = U \Sigma V^T.$$

Next we set all but the first $r$ diagonal entries of $\Sigma$ to zero to form $\Sigma|_r$, and calculate

$$M_{\mathrm{ideal}} \approx U (\Sigma|_r) V^T.$$

If the noisy ratings matrix is not too noisy, then this very simple method of recovering the ideal ratings matrix can work fairly well.

In practice, perfect rating predictions are difficult, but even small improvements to your prediction algorithm can be extremely valuable if there is a very large number of users.

generate_noiseless_ratings_matrix (generic function with 1 method)
generate_noisy_ratings_matrix (generic function with 1 method)
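The bodies of these two generators are not included in the export. Here is a minimal sketch consistent with the description above (the signatures and every detail here are assumptions):

```julia
# Ideal ratings: each user's column is a mix of r abstract "types",
# rescaled so scores land in the range [1.0, 5.0].
function generate_noiseless_ratings_matrix(movies, users, r)
    types = rand(movies, r)               # r basic preference profiles
    weights = rand(r, users)              # how each user mixes the types
    M = types * weights                   # rank (at most) r
    return 1.0 .+ 4.0 .* M ./ maximum(M)  # rescale into the score range
end

# Noisy ratings: entries for unwatched movies fall back to the default 2.5.
function generate_noisy_ratings_matrix(M_ideal, watch_frequency)
    watched = rand(size(M_ideal)...) .< watch_frequency
    return ifelse.(watched, M_ideal, 2.5)
end
```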
(Plot: the singular values svals of the noisy ratings matrix, largest first.)
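A sketch of the cell behind this plot, assuming the Plots package and writing M_noisy for the noisy ratings matrix (the notebook's variable name may differ):

```julia
using LinearAlgebra, Plots

svals = svd(M_noisy).S   # singular values, in decreasing order
scatter(svals, label="singular values of the noisy ratings matrix")
```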

The location of the last spike in this graph is a good guess for the rank of our ideal ratings matrix.

recover (generic function with 1 method)
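The body of recover isn't shown in this export; here is a minimal sketch consistent with the truncation procedure described above:

```julia
using LinearAlgebra

# Rank-r truncation U * (Σ|_r) * Vᵀ of the noisy ratings matrix.
function recover(M_noisy::AbstractMatrix, r::Integer)
    F = svd(M_noisy)
    return F.U[:, 1:r] * Diagonal(F.S[1:r]) * F.Vt[1:r, :]
end
```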

Error without any attempt at recovery:

620.3058865020329
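The export doesn't say how this error is measured; one natural guess is the Frobenius norm of the difference between the noisy and ideal matrices:

```julia
using LinearAlgebra

# Hypothetical error measure: entrywise (Frobenius) distance
# between the noisy matrix and the ideal one.
err = norm(M_noisy - M_ideal)
```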

Now let's attempt to recover the ideal ratings matrix using the SVD algorithm from above:


The minimum error should occur when our recovered matrix has the rank we predicted for the ideal ratings matrix.
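A sketch of this experiment, reusing the recover sketch from above (the variable names and the range of candidate ranks are guesses):

```julia
using LinearAlgebra

# Recover at each candidate rank and record the error; the best
# rank should match the spike location in the singular value plot.
errors = [norm(recover(M_noisy, r) - M_ideal) for r in 1:30]
best_rank = argmin(errors)
```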

(Plot: recovery error as a function of the truncation rank.)