Notebook 18 – Math 2121, Fall 2020
In today's notebook we discuss some applications of singular value decompositions.
using LinearAlgebra
using Plots
plotly()

Every square matrix maps the unit disc to an ellipse
We proved this fact in class for 2 × 2 matrices.
Let's confirm this by drawing some pictures.
function plot_path(path, title="")
    plot(path[1,:], path[2,:], aspect_ratio=:equal, title=title, legend=false)
    scatter!([0], [0])
end

Construct the unit disc D as a discrete set of points:
begin
    dr = 0.02
    dt = 0.02
    Disc = transpose(hcat(
        [r * cos(2 * pi * t) for r=0:dr:1 for t=0.0:dt:1],
        [r * sin(2 * pi * t) for r=0:dr:1 for t=0.0:dt:1]
    ))
end

(Output: a 2 × 2601 matrix whose columns are points filling the unit disc.)

Make a random 2 × 2 matrix M:
M = rand(-10:0.001:10, 2, 2)

(Output: a random 2 × 2 matrix, e.g. [-1.674 7.531; 5.59 6.774].)

begin
    # compute SVD of matrix
    U, S, V = svd(M)

    p1 = plot_path(Disc, "Unit disc D")
    # columns of V
    quiver!(quiver = ([V[1, 1]], [V[2, 1]]), [0], [0], color=:yellow)
    quiver!(quiver = ([V[1, 2]], [V[2, 2]]), [0], [0], color=:violet)

    p2 = plot_path(M * Disc, "Ellipse {Mv : v in D}")
    # columns of U * S
    quiver!(quiver = ([U[1, 1] * S[1]], [U[2, 1] * S[1]]), [0], [0], color=:yellow)
    quiver!(quiver = ([U[1, 2] * S[2]], [U[2, 2] * S[2]]), [0], [0], color=:violet)

    plot(p1, p2)
end

Application: subspaces of best fit
Suppose we have a list of vectors v_1, v_2, ..., v_m in R^n. If these vectors all belong to a k-dimensional subspace of R^n, then the n × m matrix A = [ v_1 v_2 ... v_m ] will have rank (at most) k.

But what if the vectors v_1, v_2, ..., v_m only approximately belong to a k-dimensional subspace, for example because our measurements are noisy? Then we would like to find the k-dimensional subspace that comes closest to containing all of them.

Here is one way to do this:

1. Compute an SVD A = U Σ V^T.
2. Set all entries in Σ outside the upper left k × k block to zero. Call the resulting matrix Σ_k.
3. Compute a basis for the column space of A_k = U Σ_k V^T.

The matrix A_k has rank at most k, and its column space is a k-dimensional subspace of best fit for the vectors v_1, v_2, ..., v_m; these steps are sketched in code just below.

We will illustrate this idea below when n = 2 and k = 1.

In this case, what we have just described is another way to compute a line of best fit.
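As a minimal sketch (our own code, not part of the original notebook), the three steps above can be packaged into one Julia function; the name best_fit_subspace and the choice to return both the truncated matrix and a basis are ours. It relies on svd and Diagonal from LinearAlgebra, loaded at the top of the notebook.

function best_fit_subspace(A, k)
    # step 1: compute a (thin) SVD A = U * Diagonal(S) * V^T
    U, S, V = svd(A)
    # step 2: zero out every singular value after the first k
    S_k = Diagonal([i <= k ? S[i] : 0.0 for i = 1:length(S)])
    # step 3: form the rank-(at most k) matrix A_k = U * S_k * V^T;
    # when the first k singular values are nonzero, the first k columns
    # of U give an orthonormal basis for its column space
    A_k = U * S_k * transpose(V)
    return A_k, U[:, 1:k]
end

For the noisy 2 × 50 data matrix constructed below, best_fit_subspace(ndata, 1) would return a rank-one matrix together with a unit vector spanning a line of best fit.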
function noise(m, n, magnitude=0.001)
    # returns an m-by-n matrix of noise, uniform in [-magnitude/2, magnitude/2]
    return magnitude * rand(m, n) .- magnitude/2
end

m = 50

Create m random datapoints that all lie on a line through the origin (a random rank-one 2 × m matrix):
2×50 Array{Float64,2}:
327.446 60.319 396.382 77.553 336.063 … 379.148 267.127 -258.51 422.233
37.278 6.867 45.126 8.829 38.259 43.164 30.411 -29.43 48.069xxxxxxxxxxdata = rand(-10:0.001:10, 2, 1) * rand(-50:50, 1, m)xxxxxxxxxxbegin x1, y1 = minimum(data, dims=2); x2, y2 = maximum(data, dims=2); xlim = (x1 - 10, x2 + 10); ylim = (y1 - 10, y2 + 10); scatter(data[1,:], data[2,:], legend=false, title="Datapoints without noise") plot!([data[1, 1]*x for x=-5:5], [data[2, 1]*x for x=-5:5], xlim=xlim, ylim=ylim)endxxxxxxxxxxbegin ndata = data + noise(2, m, 20) scatter(ndata[1,:], ndata[2,:], legend=false, title="Datapoints with noise") _U, _S, _V = svd(ndata) B = _U * Diagonal([_S[1], 0]) * transpose(_V) plot!([B[1, 1]*x for x=-50:50], [B[2, 1]*x for x=-50:50], xlim=xlim, ylim=ylim)endApplication: compressed sensing
Suppose you're running a service like Netflix where there are a large number of users who can watch some large number of movies. Each user rates each movie with a score between 1.0 and 5.0.
Consider the matrix whose rows are indexed by movies and whose columns are indexed by users, in which each entry is one user's score of one movie.
If every user had watched every movie, then we could fill out every entry in this matrix and we would know every user's preferences. We make the following hopeful assumption about this ideal matrix: it probably has low rank compared to the number of users and movies.
In other words, we assume there are some small number of abstract user 'types', maybe people who only like horror movies, only like movies in certain languages, or only like comedies, and that the preferences of a typical user are (approximately) a linear combination of these basic types.
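To make this assumption concrete, here is a small hypothetical example (the numbers and the three 'types' are invented for illustration): if every user's ratings are a weighted combination of three type vectors, then the full ratings matrix factors through a 3-dimensional space, so its rank is at most 3 no matter how many users there are.

begin
    # three invented "types", each rating the same six movies
    types = [5.0 1.0 2.0 1.0 4.0 5.0;    # horror fan
             1.0 5.0 4.0 5.0 2.0 1.0;    # comedy fan
             3.0 3.0 5.0 2.0 1.0 4.0]    # foreign-language film fan

    # four users, each a weighted combination of the three types
    weights = [0.7 0.2 0.1;
               0.1 0.8 0.1;
               0.3 0.3 0.4;
               0.5 0.0 0.5]

    ratings = transpose(weights * types)  # movies-by-users matrix

    rank(ratings)                         # at most 3, however many users we add
end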
In practice, we only have access to a noisy version of this matrix, because not every user has watched every movie. For movies a user hasn't watched, we assign the default rating of 2.5.
Is it possible to recover (approximately) the ideal matrix from the noisy version? That is, can we predict what a user's rating on an unseen movie will be? This is a compressed sensing problem.
A version of this problem was the basis for the USD 1 million Netflix Prize about ten years ago.
users = 4000
movies = 400
watch_frequency = 0.75

Here is one way we might try to predict user ratings.
If the ideal ratings matrix has small rank r, then it has only r nonzero singular values.

It follows that all but the largest r singular values of the noisy ratings matrix should be relatively small.

Then we try to recover the ideal matrix by computing an SVD of the noisy matrix and guessing r from where its singular values drop off sharply.

Then we set all but the first r singular values of the noisy matrix to zero, just as in the previous application; the resulting matrix is our approximation of the ideal ratings matrix.
If the noisy ratings matrix is not too noisy, then this very simple method of recovering the ideal ratings matrix can work fairly well.
In practice, perfect ratings predictions are difficult, but even small improvements to your prediction algorithm can be extremely valuable if there are a very large number of users.
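As a toy illustration of this method (all numbers below are invented, and the 2.5 'unwatched' entries are placed by hand), we can corrupt a small rank-one matrix and check that truncating the SVD of the corrupted matrix at rank one brings us back closer to the original:

begin
    # a rank-one "ideal" ratings matrix: 3 movies, 4 users
    ideal = [1.0, 2.0, 1.5] * [2.0 1.0 3.0 2.5]

    corrupted = copy(ideal)
    corrupted[1, 3] = 2.5    # pretend user 3 never watched movie 1
    corrupted[3, 2] = 2.5    # pretend user 2 never watched movie 3

    # keep only the largest singular value of the corrupted matrix
    Ut, St, Vt = svd(corrupted)
    truncated = Ut * Diagonal([St[1]; zeros(length(St) - 1)]) * transpose(Vt)

    # the truncated matrix should be closer to the ideal one
    norm(ideal - corrupted), norm(ideal - truncated)
end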
function generate_noiseless_ratings_matrix(users, movies)
    rank = rand(3:12)
    basis = rand(1:5, rank, movies)
    combinations = rand(users, rank)
    combinations = Diagonal(sum(combinations, dims=2)[:])^-1 * combinations
    noiseless_matrix = transpose(combinations * basis)
end

function generate_noisy_ratings_matrix(users, movies, watch_frequency)
    # returns noisy ratings matrix plus a function to compute the error of
    # a recovered matrix compared to the ideal ratings matrix (which is not returned)
    noiseless_matrix = generate_noiseless_ratings_matrix(users, movies)
    noisy_matrix = noiseless_matrix[:,:]
    m, n = size(noiseless_matrix)
    noise_mask = zeros(m, n)
    for i=1:m
        for j=1:n
            if rand() > watch_frequency
                noisy_matrix[i, j] = 2.5
                noise_mask[i, j] = 1
            end
        end
    end
    measure_error = x -> norm((noiseless_matrix - x) .* noise_mask)
    return noisy_matrix, measure_error
end
noisy, measure_error = generate_noisy_ratings_matrix(users, movies, watch_frequency)

(Output: noisy is a 400 × 4000 ratings matrix whose entries for unwatched movies have been set to 2.5.)
svals = svd(noisy).S

(Output: the singular values of noisy begin 3749.64, 233.945, 212.981, 211.019, 64.196, ..., 50.108, 7.414, 7.324, ....)

plot(log.(svals), legend=false, title="Logarithms of singular values of noisy matrix")

plot(-diff(log.(svals))[1:25], legend=false, title="Differences between logarithms of consecutive singular values")

The location of the last spike in this graph is a good guess for the rank of our ideal ratings matrix.
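If we want a numerical guess rather than eyeballing the plot, one possible heuristic (the 0.5 cutoff below is an arbitrary choice, not something from the notebook) is to take the last index at which consecutive log singular values drop by more than the cutoff:

begin
    drops = -diff(log.(svals))[1:25]
    # index of the last "spike": the last drop between consecutive
    # log singular values exceeding our arbitrary cutoff of 0.5
    rank_guess = findlast(d -> d > 0.5, drops)
end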
function recover(A, r)
    # given an SVD A = U * Sigma * V^T, this method replaces
    # all but the first r singular values of A by zero
    # and then returns U * (modified Sigma) * V^T
    U, S, V = svd(A)
    S = Diagonal([i > r ? 0.0 : S[i] for i=1:size(S)[1]])
    return U * S * transpose(V)
end

Error without any attempt at recovery:
measure_error(noisy)

(Output: 620.3058865020329)

Now let's attempt to recover the ideal ratings matrix using the SVD algorithm from above:
error = [measure_error(recover(noisy, x)) for x = 1:25];

The minimum error should occur when our recovered matrix has the rank we predicted for the ideal ratings matrix.
plot(
    error,
    ylim = (0, maximum(error) * 1.1),
    legend = false,
    title = "Error of recovered matrix compared to true matrix",
    xlabel = "x: rank of recovered matrix",
)