Notebook 18 – Math 2121, Fall 2020
In today's notebook we discuss some applications of singular value decompositions.
using LinearAlgebra
using Plots
plotly()

Every square matrix maps the unit disc to an ellipse
We proved this fact in class for 2 × 2 matrices.
Let's confirm this by drawing some pictures.
function plot_path(path, title="")
    plot(path[1,:], path[2,:], aspect_ratio=:equal, title=title, legend=false)
    scatter!([0], [0])
end

Construct the unit disc D as a discrete set of points:
begin
    dr = 0.02
    dt = 0.02
    Disc = transpose(hcat(
        [r * cos(2 * pi * t) for r=0:dr:1 for t=0.0:dt:1],
        [r * sin(2 * pi * t) for r=0:dr:1 for t=0.0:dt:1]
    ))
end

(Output: a 2 × 2601 matrix whose columns are points filling the unit disc.)

Make a random 2 × 2 matrix M:
M = rand(-10:0.001:10, 2, 2)

(Output: a random 2 × 2 matrix, e.g. [-1.674 7.531; 5.59 6.774].)

begin
    # compute SVD of matrix
    U, S, V = svd(M)

    p1 = plot_path(Disc, "Unit disc D")
    # columns of V
    quiver!(quiver = ([V[1, 1]], [V[2, 1]]), [0], [0], color=:yellow)
    quiver!(quiver = ([V[1, 2]], [V[2, 2]]), [0], [0], color=:violet)

    p2 = plot_path(M * Disc, "Ellipse {Mv : v in D}")
    # columns of U * S
    quiver!(quiver = ([U[1, 1] * S[1]], [U[2, 1] * S[1]]), [0], [0], color=:yellow)
    quiver!(quiver = ([U[1, 2] * S[2]], [U[2, 2] * S[2]]), [0], [0], color=:violet)

    plot(p1, p2)
end

Application: subspaces of best fit
Suppose we have a list of vectors v_1, v_2, ..., v_m in R^n. If these vectors all belong to a k-dimensional subspace of R^n, then the n × m matrix A = [ v_1 v_2 ... v_m ] will have rank (at most) k.

But what if the vectors v_1, v_2, ..., v_m only approximately belong to a k-dimensional subspace, for example because our measurements are noisy? Then we would like to find the k-dimensional subspace that comes closest to containing all of them.

Here is one way to do this:

1. Compute an SVD A = U Σ V^T.
2. Set all entries in Σ outside the upper left k × k block to zero. Call the resulting matrix Σ_k.
3. Compute a basis for the column space of A_k = U Σ_k V^T.

The matrix A_k has rank at most k, and its column space is a k-dimensional subspace of best fit for the vectors v_1, v_2, ..., v_m; these steps are sketched in code just below.

We will illustrate this idea below when n = 2 and k = 1.

In this case, what we have just described is another way to compute a line of best fit.
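As a minimal sketch (our own code, not part of the original notebook), the three steps above can be packaged into one Julia function; the name best_fit_subspace and the choice to return both the truncated matrix and a basis are ours. It relies on svd and Diagonal from LinearAlgebra, loaded at the top of the notebook.

function best_fit_subspace(A, k)
    # step 1: compute a (thin) SVD A = U * Diagonal(S) * V^T
    U, S, V = svd(A)
    # step 2: zero out every singular value after the first k
    S_k = Diagonal([i <= k ? S[i] : 0.0 for i = 1:length(S)])
    # step 3: form the rank-(at most k) matrix A_k = U * S_k * V^T;
    # when the first k singular values are nonzero, the first k columns
    # of U give an orthonormal basis for its column space
    A_k = U * S_k * transpose(V)
    return A_k, U[:, 1:k]
end

For the noisy 2 × 50 data matrix constructed below, best_fit_subspace(ndata, 1) would return a rank-one matrix together with a unit vector spanning a line of best fit.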
function noise(m, n, magnitude=0.001)
    # returns an m-by-n matrix of noise, uniform in [-magnitude/2, magnitude/2]
    return magnitude * rand(m, n) .- magnitude/2
end

m = 50

Create m random datapoints that all lie on a line through the origin (a random rank-one 2 × m matrix):
2×50 Array{Float64,2}:
327.446 60.319 396.382 77.553 336.063 … 379.148 267.127 -258.51 422.233
37.278 6.867 45.126 8.829 38.259 43.164 30.411 -29.43 48.069xxxxxxxxxxdata = rand(-10:0.001:10, 2, 1) * rand(-50:50, 1, m)xxxxxxxxxxbegin x1, y1 = minimum(data, dims=2); x2, y2 = maximum(data, dims=2); xlim = (x1 - 10, x2 + 10); ylim = (y1 - 10, y2 + 10); scatter(data[1,:], data[2,:], legend=false, title="Datapoints without noise") plot!([data[1, 1]*x for x=-5:5], [data[2, 1]*x for x=-5:5], xlim=xlim, ylim=ylim)endxxxxxxxxxxbegin ndata = data + noise(2, m, 20) scatter(ndata[1,:], ndata[2,:], legend=false, title="Datapoints with noise") _U, _S, _V = svd(ndata) B = _U * Diagonal([_S[1], 0]) * transpose(_V) plot!([B[1, 1]*x for x=-50:50], [B[2, 1]*x for x=-50:50], xlim=xlim, ylim=ylim)endApplication: compressed sensing
Suppose you're running a service like Netflix where there are a large number of users who can watch some large number of movies. Each user rates each movie with a score between 1.0 and 5.0.
Consider the matrix whose rows are indexed by movies and whose columns are indexed by users, in which each entry is one user's score of one movie.
If every user had watched every movie, then we could fill out every entry in this matrix and we would know every user's preferences. We make the following hopeful assumption about this ideal matrix: it probably has low rank compared to the number of users and movies.
In other words, we assume there are some small number of abstract user 'types', maybe people who only like horror movies, only like movies in certain languages, or only like comedies, and that the preferences of a typical user are (approximately) a linear combination of these basic types.
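To make this assumption concrete, here is a small hypothetical example (the numbers and the three 'types' are invented for illustration): if every user's ratings are a weighted combination of three type vectors, then the full ratings matrix factors through a 3-dimensional space, so its rank is at most 3 no matter how many users there are.

begin
    # three invented "types", each rating the same six movies
    types = [5.0 1.0 2.0 1.0 4.0 5.0;    # horror fan
             1.0 5.0 4.0 5.0 2.0 1.0;    # comedy fan
             3.0 3.0 5.0 2.0 1.0 4.0]    # foreign-language film fan

    # four users, each a weighted combination of the three types
    weights = [0.7 0.2 0.1;
               0.1 0.8 0.1;
               0.3 0.3 0.4;
               0.5 0.0 0.5]

    ratings = transpose(weights * types)  # movies-by-users matrix

    rank(ratings)                         # at most 3, however many users we add
end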
In practice, we only have access to a noisy version of this matrix, because not every user has watched every movie. For movies a user hasn't watched, we assign the default rating of 2.5.
Is it possible to recover (approximately) the ideal matrix from the noisy version? That is, can we predict what a user's rating on an unseen movie will be? This is a compressed sensing problem.
A version of this problem was the basis for the USD 1 million Netflix Prize about ten years ago.
users = 4000
movies = 400
watch_frequency = 0.75

Here is one way we might try to predict user ratings.
If the ideal ratings matrix has small rank r, then it has only r nonzero singular values.

It follows that all but the largest r singular values of the noisy ratings matrix should be relatively small.

Then we try to recover the ideal matrix by computing an SVD of the noisy matrix and guessing r from where its singular values drop off sharply.

Then we set all but the first r singular values of the noisy matrix to zero, just as in the previous application; the resulting matrix is our approximation of the ideal ratings matrix.
If the noisy ratings matrix is not too noisy, then this very simple method of recovering the ideal ratings matrix can work fairly well.
In practice, perfect ratings predictions are difficult, but even small improvements to your prediction algorithm can be extremely valuable if there are a very large number of users.
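As a toy illustration of this method (all numbers below are invented, and the 2.5 'unwatched' entries are placed by hand), we can corrupt a small rank-one matrix and check that truncating the SVD of the corrupted matrix at rank one brings us back closer to the original:

begin
    # a rank-one "ideal" ratings matrix: 3 movies, 4 users
    ideal = [1.0, 2.0, 1.5] * [2.0 1.0 3.0 2.5]

    corrupted = copy(ideal)
    corrupted[1, 3] = 2.5    # pretend user 3 never watched movie 1
    corrupted[3, 2] = 2.5    # pretend user 2 never watched movie 3

    # keep only the largest singular value of the corrupted matrix
    Ut, St, Vt = svd(corrupted)
    truncated = Ut * Diagonal([St[1]; zeros(length(St) - 1)]) * transpose(Vt)

    # the truncated matrix should be closer to the ideal one
    norm(ideal - corrupted), norm(ideal - truncated)
end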
function generate_noiseless_ratings_matrix(users, movies)
    rank = rand(3:12)
    basis = rand(1:5, rank, movies)
    combinations = rand(users, rank)
    combinations = Diagonal(sum(combinations, dims=2)[:])^-1 * combinations
    noiseless_matrix = transpose(combinations * basis)
end

function generate_noisy_ratings_matrix(users, movies, watch_frequency)
    # returns noisy ratings matrix plus a function to compute the error of
    # a recovered matrix compared to the ideal ratings matrix (which is not returned)
    noiseless_matrix = generate_noiseless_ratings_matrix(users, movies)
    noisy_matrix = noiseless_matrix[:,:]
    m, n = size(noiseless_matrix)
    noise_mask = zeros(m, n)
    for i=1:m
        for j=1:n
            if rand() > watch_frequency
                noisy_matrix[i, j] = 2.5
                noise_mask[i, j] = 1
            end
        end
    end
    measure_error = x -> norm((noiseless_matrix - x) .* noise_mask)
    return noisy_matrix, measure_error
end
noisy, measure_error = generate_noisy_ratings_matrix(users, movies, watch_frequency)

(Output: noisy is a 400 × 4000 ratings matrix whose entries for unwatched movies have been set to 2.5.)
svals = svd(noisy).S

(Output: the singular values of noisy begin 3749.64, 233.945, 212.981, 211.019, 64.196, ..., 50.108, 7.414, 7.324, ....)

plot(log.(svals), legend=false, title="Logarithms of singular values of noisy matrix")

plot(-diff(log.(svals))[1:25], legend=false, title="Differences between logarithms of consecutive singular values")

The location of the last spike in this graph is a good guess for the rank of our ideal ratings matrix.
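If we want a numerical guess rather than eyeballing the plot, one possible heuristic (the 0.5 cutoff below is an arbitrary choice, not something from the notebook) is to take the last index at which consecutive log singular values drop by more than the cutoff:

begin
    drops = -diff(log.(svals))[1:25]
    # index of the last "spike": the last drop between consecutive
    # log singular values exceeding our arbitrary cutoff of 0.5
    rank_guess = findlast(d -> d > 0.5, drops)
end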
function recover(A, r)
    # given an SVD A = U * Sigma * V^T, this method replaces
    # all but the first r singular values of A by zero
    # and then returns U * (modified Sigma) * V^T
    U, S, V = svd(A)
    S = Diagonal([i > r ? 0.0 : S[i] for i=1:size(S)[1]])
    return U * S * transpose(V)
end

Error without any attempt at recovery:
measure_error(noisy)

(Output: 620.3058865020329)

Now let's attempt to recover the ideal ratings matrix using the SVD algorithm from above:
error = [measure_error(recover(noisy, x)) for x = 1:25];

The minimum error should occur when our recovered matrix has the rank we predicted for the ideal ratings matrix.
plot(
    error,
    ylim = (0, maximum(error) * 1.1),
    legend = false,
    title = "Error of recovered matrix compared to true matrix",
    xlabel = "x: rank of recovered matrix",
)