# Notebook 7 -- Math 2121, Fall 2020

Today's notebook will investigate (linear) vector valued functions, the operations we can perform on them, and their standard matrices. Then we'll also talk about how to use linear algebra to analyze a random walk.

### Running *this* notebook (optional)

If you have Pluto up and running, you can access the notebook we are currently viewing by entering [this link](http://www.math.ust.hk/~emarberg/teaching/2020/Math2121/julia/07_Math2121_Fall2020.jl) (right click -> Copy Link) in the *Open from file* menu in Pluto.
md"# Notebook 7 -- Math 2121, Fall 2020Today's notebook will investigate (linear) vector valued functions, the operations we can perform on them, and their standard matrices. Then we'll also talk about how to use linear algebra to analyze a random walk.### Running *this* notebook (optional)If you have Pluto up and running, you can access the notebook we are currently viewing by entering [this link](http://www.math.ust.hk/~emarberg/teaching/2020/Math2121/julia/07_Math2121_Fall2020.jl) (right click -> Copy Link) in the *Open from file* menu in Pluto."Vector valued functions
The following code creates a `struct` to represent functions from $\mathbb{R}^n$ to $\mathbb{R}^m$:
xxxxxxxxxxmd"### Vector valued functionsxxxxxxxxxxstruct VectorFunction formula domain::Int64 codomain::Int64endWe can evaluate a VectorFunction using this method:
xxxxxxxxxxmd"We can evaluate a `VectorFunction` using this method:"evaluate (generic function with 1 method)xxxxxxxxxxfunction evaluate(f::VectorFunction, x) return f.formula(x)end#1
```julia
# a function R^1 -> R^1
some_function = VectorFunction(x -> x^2, 1, 1)
```

```julia
evaluate(some_function, 100)  # returns 10000
```
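The `domain` and `codomain` fields need not be equal. Here is a small extra example (not from the original notebook) of a `VectorFunction` from $\mathbb{R}^2$ to $\mathbb{R}^1$:

```julia
# a function R^2 -> R^1 that sums the coordinates of its input
sum_coords = VectorFunction(v -> [v[1] + v[2]], 2, 1)
evaluate(sum_coords, [3; 4])  # returns [7]
```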
### Operations on vector valued functions

As usual, we can add, subtract, rescale, and compose vector valued functions.
xxxxxxxxxxmd"### Operations on vector valued functionsadd_vector_functions (generic function with 1 method)xxxxxxxxxxfunction add_vector_functions(f::VectorFunction, g::VectorFunction) f.domain == g.domain f.codomain == g.codomain return VectorFunction(x -> evaluate(f, x) + evaluate(g, x), f.domain, g.domain)endsubtract_vector_functions (generic function with 1 method)xxxxxxxxxxfunction subtract_vector_functions(f::VectorFunction, g::VectorFunction) f.domain == g.domain f.codomain == g.codomain return VectorFunction(x -> evaluate(f, x) - evaluate(g, x), f.domain, g.domain)endscale_vector_functions (generic function with 1 method)xxxxxxxxxxfunction scale_vector_functions(c, f::VectorFunction) return VectorFunction(x -> c * evaluate(f, x), f.domain, f.domain)endcompose_vector_functions (generic function with 1 method)xxxxxxxxxxfunction compose_vector_functions(f::VectorFunction, g::VectorFunction) f.domain == g.codomain return VectorFunction(x -> evaluate(f, evaluate(g, x)), g.domain, f.codomain)end#9
```julia
begin
    # functions R^2 -> R^2
    fn_1 = VectorFunction(x -> -x, 2, 2)
    fn_2 = VectorFunction(x -> x + [1; 1], 2, 2)
    # fn_1 o fn_2
    fn_3 = compose_vector_functions(fn_1, fn_2)
end
```
```julia
evaluate(fn_1, [4; 4])  # returns [-4; -4]
```

```julia
evaluate(fn_2, [4; 4])  # returns [5; 5]
```

```julia
evaluate(fn_3, [0; 0])  # returns [-1; -1], so fn_3 is not linear since fn_3(0) != 0
```

Below are some methods to test linearity and equality for our `VectorFunction`s.
xxxxxxxxxxmd"Below are some methods to test linearity and equality for our `VectorFunctions`."xxxxxxxxxxusing LinearAlgebrais_linear (generic function with 3 methods)xxxxxxxxxxfunction is_linear(f::VectorFunction, tolerance=10e-12, trials=1000) # this function does not prove that f is linear, but if it returns true # there f is linear with a high probability for i=1:trials u = rand(-10:10, f.domain, 1) v = rand(-10:10, f.domain, 1) c = rand(Float64) - 0.5 # check empirically that f(u + v) == f(u) + f(v) w = evaluate(f, u) + evaluate(f, v) - evaluate(f, u + v) if norm(w) > tolerance return false end # check empirically that f(cu) == c f(u) w = c * evaluate(f, u) - evaluate(f, c * u) if norm(w) > tolerance return false end end return trueendis_zero (generic function with 3 methods)xxxxxxxxxxfunction is_zero(f::VectorFunction, tolerance=10e-12, trials=1000) # this function does not prove that f is zero, but if it returns true # there f is the zero function with a high probability for i=1:trials v = rand(-10:10, f.domain, 1) if norm(evaluate(f, v)) > tolerance return false end end return trueend are_equal (generic function with 3 methods)xxxxxxxxxxfunction are_equal(f::VectorFunction, g::VectorFunction, tolerance=10e-12, trials=1000) return is_zero(subtract_vector_functions(f, g), tolerance, trials)endLet's test these methods. The following creates the vector valued function with the formula
This is nonlinear.
xxxxxxxxxxmd"Let's test these methods. The following creates the vector valued function with the formula nonlinear_function (generic function with 1 method)xxxxxxxxxxfunction nonlinear_function(;domain::Int64, codomain::Int64) function formula(v) w = copy(v) w[1,:] = w[1,:] .* w[2,:] return w end return VectorFunction(formula, domain, codomain)endformula
```julia
nonlinear = nonlinear_function(domain=3, codomain=3)
```

```julia
evaluate(nonlinear, [2; 10; 3])  # returns [20; 10; 3]
```

```julia
is_linear(nonlinear)  # returns false
```
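The `are_equal` method is not demonstrated above, so here is a small added example (not from the original notebook) using it to check that composing `fn_1` and `fn_2` in the two possible orders gives different functions:

```julia
# composition is not commutative: fn_1(fn_2(x)) = -x - [1; 1]
# while fn_2(fn_1(x)) = -x + [1; 1]
are_equal(
    compose_vector_functions(fn_1, fn_2),
    compose_vector_functions(fn_2, fn_1),
)  # returns false
```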
### Row operations are linear transformations

We can implement our three types of elementary row operations as linear transformations.
xxxxxxxxxxmd"### Row operations are linear transformationsswap_rows (generic function with 1 method)xxxxxxxxxxfunction swap_rows(i, j; domain) function func(v) w = copy(v) w[i,:], w[j,:] = w[j,:], w[i,:] return w end return VectorFunction(func, domain, domain)endadd_rows (generic function with 1 method)xxxxxxxxxxfunction add_rows(src, dest; scalar=1, domain) function func(v) w = copy(v) w[dest,:] = scalar * w[src,:] + w[dest,:] return w end return VectorFunction(func, domain, domain)endscale_rows (generic function with 1 method)xxxxxxxxxxfunction scale_rows(src, scalar; domain) scalar != 0 function func(v) w = float(v) w[src,:] = w[src,:] * scalar return w end return VectorFunction(func, domain, domain)endfunc
```julia
begin
    # functions R^4 -> R^4 (that's what domain=4 specifies)
    swap = swap_rows(3, 2, domain=4)            # swaps rows 3 and 2
    addop = add_rows(4, 2, scalar=4, domain=4)  # adds 4 times row 4 to row 2
    scaleop = scale_rows(3, 2.4, domain=4)      # multiplies row 3 by 2.4
end
```
```julia
vec = [1; 10; 100; 1000]
```

```julia
evaluate(swap, vec)     # returns [1; 100; 10; 1000]
```

```julia
evaluate(addop, vec)    # returns [1; 4010; 100; 1000]
```

```julia
evaluate(scaleop, vec)  # returns [1.0; 10.0; 240.0; 1000.0]
```

```julia
is_linear(swap)     # returns true
```

```julia
is_linear(addop)    # returns true
```

```julia
is_linear(scaleop)  # returns true
```

For any linear vector valued function, there is a unique standard matrix, which we can compute with the following methods.
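Recall why this works: if $f : \mathbb{R}^n \to \mathbb{R}^m$ is linear, its standard matrix is the $m \times n$ matrix whose columns are the images of the standard basis vectors $e_1, e_2, \dots, e_n$:

$$A = \begin{bmatrix} f(e_1) & f(e_2) & \cdots & f(e_n) \end{bmatrix}, \qquad f(x) = Ax \text{ for all } x \in \mathbb{R}^n.$$

This is exactly what `standard_matrix` below computes, using `elementary_vector` to build each $e_i$.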
xxxxxxxxxxmd"For any linear vector valued function, there is a unique standard matrix, which we can compute with the following methods."xxxxxxxxxxusing SparseArrayselementary_vector (generic function with 1 method)xxxxxxxxxxfunction elementary_vector(n, i) ans = sparse(zeros(n, 1)) ans[i, 1] = 1 return ansend4×1 Array{Float64,2}:
0.0
0.0
0.0
1.0xxxxxxxxxxMatrix(elementary_vector(4, 4))standard_matrix (generic function with 1 method)xxxxxxxxxxfunction standard_matrix(f::VectorFunction) elem = i -> Matrix(elementary_vector(f.domain, i)) return hcat([evaluate(f, elem(i)) for i=1:f.domain]...)end5×5 Array{Float64,2}:
```julia
# should give the 5-by-5 identity matrix
standard_matrix(VectorFunction(x -> x, 5, 5))
```
```julia
SM1 = standard_matrix(swap_rows(1, 4, domain=4))
# returns [0.0 0.0 0.0 1.0;
#          0.0 1.0 0.0 0.0;
#          0.0 0.0 1.0 0.0;
#          1.0 0.0 0.0 0.0]
```
```julia
SM2 = standard_matrix(add_rows(2, 4, scalar=444, domain=4))
# returns [1.0   0.0 0.0 0.0;
#          0.0   1.0 0.0 0.0;
#          0.0   0.0 1.0 0.0;
#          0.0 444.0 0.0 1.0]
```
```julia
SM3 = standard_matrix(scaleop)
# returns [1.0 0.0 0.0 0.0;
#          0.0 1.0 0.0 0.0;
#          0.0 0.0 2.4 0.0;
#          0.0 0.0 0.0 1.0]
```
```julia
# composition of linear functions corresponds to matrix multiplication
standard_matrix(compose_vector_functions(swap_rows(1, 4, domain=4), add_rows(2, 4, scalar=444, domain=4))) == SM1 * SM2  # true
```

```julia
# addition of linear functions corresponds to matrix addition
standard_matrix(add_vector_functions(swap_rows(1, 4, domain=4), add_rows(2, 4, scalar=444, domain=4))) == SM1 + SM2  # true
```

```julia
# subtraction of linear functions corresponds to matrix subtraction
standard_matrix(subtract_vector_functions(swap_rows(1, 4, domain=4), add_rows(2, 4, scalar=444, domain=4))) == SM1 - SM2  # true
```
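As an extra sanity check (a quick sketch, not in the original notebook; `test_vec` is just an arbitrary choice), the standard matrix should reproduce the function's values under matrix-vector multiplication:

```julia
# the linear function swap_rows(1, 4, domain=4) should act on vectors
# exactly as left-multiplication by its standard matrix SM1
test_vec = [1.0; 2.0; 3.0; 4.0]
evaluate(swap_rows(1, 4, domain=4), test_vec) == SM1 * test_vec  # true
```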
### RREF(A) is a linear function

Because row operations are linear transformations, we can think of the row reduction algorithm as returning, instead of a matrix in reduced echelon form, the linear transformation formed by composing all the row operations we applied to change our starting matrix to echelon form.

The slightly modified code below reimplements our RREF method to return this vector valued function.
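In symbols: if $E_1, E_2, \dots, E_k$ are the row operations applied, in that order, then

$$\mathrm{RREF}(A) = (E_k \circ \cdots \circ E_2 \circ E_1)(A),$$

and the composition $E_k \circ \cdots \circ E_1$ is itself a linear transformation, since each $E_i$ is.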
xxxxxxxxxxmd"### RREF(A) is a linear functionSome helper methods:
xxxxxxxxxxmd"Some helper methods:"rowop_swap (generic function with 1 method)xxxxxxxxxxbegin # Returns true if row has all zeros. function is_zero_row(A, row, tolerance=10e-16) (m, n) = size(A) return all([abs(A[row,col]) < tolerance for col=1:n]) end # Returns first index of nonzero entry in row, or 0 if none exists. function find_leading(A, row=1, tolerance=10e-16) (m, n) = size(A) for col=1:n if abs(A[row, col]) > tolerance return col end end return 0 end function find_nonzeros_in_column(A, row_start, col, tol=10e-12) (m, n) = size(A) return [i for i=row_start:m if abs(A[i,col]) > tol] end function columns(A) (m, n) = size(A) return n end function rowop_replace(A, source_row, target_row, scalar_factor, op) source_row != target_row op = compose_vector_functions( add_rows(source_row, target_row, scalar=scalar_factor, domain=op.domain), op ) for j=1:columns(A) A[target_row,j] += A[source_row,j] * scalar_factor end return A, op end function rowop_scale(A, target_row, scalar_factor, op) scalar_factor != 0 op = compose_vector_functions( scale_rows(target_row, scalar_factor, domain=op.domain), op ) for j=1:columns(A) A[target_row,j] *= scalar_factor end A, op end function rowop_swap(A, s, t) op = compose_vector_functions( swap_rows(s, t, domain=op.domain), op ) for j=1:columns(A) A[t,j], A[s,j] = A[s,j], A[t,j] end A, op endendAdjusted RREF algorithm:
xxxxxxxxxxmd"Adjusted RREF algorithm:"RREF (generic function with 2 methods)xxxxxxxxxxbegin function RREF_with_operator(A, tol=10e-12) A = float(copy(A)) (m, n) = size(A) op = VectorFunction(x -> x, m, m) row = 1 for col=1:n nonzeros = find_nonzeros_in_column(A, row, col, tol) if length(nonzeros) == 0 continue end i = nonzeros[1] if abs(A[i, col] - 1) < tol A[i, col] = 1 else A, op = rowop_scale(A, i, 1 / A[i, col], op) end for j=nonzeros[2:end] A, op = rowop_replace(A, i, j, -A[j, col], op) end if i != row A, op = rowop_swap(A, row, i, op) end row += 1 end for row=m:-1:1 l = find_leading(A, row, tol) if l > 0 for k=1:row - 1 if abs(A[k,l]) > tol A, op = rowop_replace(A, row, k, -A[k,l], op) end end end end return A, op end function RREF_operator(A, tol=10e-12) A, op = RREF_with_operator(A, tol) return op end function RREF(A, tol=10e-12) A, op = RREF_with_operator(A, tol) return A endend5×7 Array{Int64,2}:
```julia
M = rand(-1:2, 5, 7)
# sample output (M is random, so results vary):
# [1 1  2  1  0  2 -1;
#  2 0 -1 -1  0 -1  1;
#  1 0 -1  1 -1  0  2;
#  0 0  0  1  2 -1  0;
#  0 1  2  2  1  2  0]
```
```julia
RREF(M)
# sample output, for the M shown above:
# [1.0 0.0 0.0 0.0 0.0 -0.5 -1.0;
#  0.0 1.0 0.0 0.0 0.0  2.5  6.0;
#  0.0 0.0 1.0 0.0 0.0  0.0 -3.0;
#  0.0 0.0 0.0 1.0 0.0  0.0  0.0;
#  0.0 0.0 0.0 0.0 1.0 -0.5  0.0]
```
```julia
rref_op_for_M = RREF_operator(M)  # a VectorFunction R^5 -> R^5
```

```julia
is_linear(rref_op_for_M)  # returns true
```

```julia
evaluate(rref_op_for_M, M) == RREF(M)  # returns true
```
```julia
standard_matrix(rref_op_for_M)
# sample output (depends on the random M above):
# [ 1.25  -0.25   0.25   0.75  -1.25;
#  -4.75   2.75  -0.75  -3.25   5.75;
#   2.0   -1.0    0.0    1.0   -2.0;
#   0.5   -0.5    0.5    0.5   -0.5;
#  -0.25   0.25  -0.25   0.25   0.25]
```
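Since `rref_op_for_M` is linear, left-multiplying `M` by its standard matrix should also recover `RREF(M)`. A quick check along these lines (not in the original notebook; `≈` is used to allow floating point error):

```julia
# row reducing M is the same as multiplying M by the standard matrix
# of the RREF operator
standard_matrix(rref_op_for_M) * M ≈ RREF(M)  # expected: true
```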
### Random walks with matrix multiplication

The last part of today's notebook tries to motivate the utility of matrix multiplication and some of the things we'll do later in the course.

The idea is to analyze a simple random walk.

Suppose you start with 20 dollars. At each step you gain 1 dollar or lose 1 dollar, and the game stops when you reach 0 dollars or 40 dollars.

Let's suppose the probability of gaining a dollar depends only on your current amount of money $x$; write $p(x)$ for this probability, so you lose a dollar with probability $1 - p(x)$.

You could think of this as a gambling process, or some kind of competition. Maybe you're playing a sport and your chance of winning the next match depends on your current streak of wins or losses.

We can graph your current state (the amount of money you have) as a function of time.
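In symbols, a single step of the walk updates the state $x$ by

$$x \mapsto \begin{cases} x + 1 & \text{with probability } p(x), \\ x - 1 & \text{with probability } 1 - p(x), \end{cases}$$

and the walk halts once $x$ hits one of the absorbing states $0$ or $40$.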
xxxxxxxxxxmd"### Random walks with matrix multiplication20xxxxxxxxxxstarting_amount = 2039xxxxxxxxxxmaximum_amount = 2 * starting_amount - 1We need to assign the upstep probabilities
The simplest choice is just to set $p(x) = 1/2$ for every $x$, which is what the code below does. (The commented-out `delta` and `noise` expressions give alternative, non-constant choices.)
xxxxxxxxxxmd"We need to assign the upstep probabilities $p(x)$. 0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
xxxxxxxxxxbegin mid = starting_amount n = maximum_amount function power(x, exp) return x < 0 ? -(abs(x)^exp) : abs(x)^exp end function compute_prob(i) x = (i - mid) / mid delta = 0.0 # -0.5 * power(x, 2) noise = 0.0 # 0.5 * (rand(Float64) - 0.5) p = 0.5 + delta + noise return maximum([0.00, minimum([1.0, p])]) end start = elementary_vector(n, mid) probs = [compute_prob(i) for i=1:n]endxxxxxxxxxxbeginBelow is an animation of our random walk after some number of steps.
You can examine the code to see how this is implemented.
xxxxxxxxxxmd"Below is an animation of our random walk after some number of steps.100xxxxxxxxxxsteps = 100xxxxxxxxxxbegin function transform(probs) n = length(probs) A = sparse(zeros(n, n)) for i=1:n j = rand(Float64) < probs[i] ? i + 1 : i - 1 if 1 <= j <= n A[j,i] = 1 end end return A end function nz_position(sparse_matrix, j=1) I, V = findnz(sparse_matrix[1:end, j]) return length(I) > 0 ? I[1] : 0 end Base. mutable struct RandomWalk x::Int64 = mid end function step!(w::RandomWalk) if w.x != 0 && w.x != n + 1 T = transform(probs) curr_vector = elementary_vector(n, w.x) next_vector = T * curr_vector x = nz_position(next_vector) w.x = (w.x == n && x == 0) ? n + 1 : x end end plt = plot( 1, xlim = (1, steps), ylim = (0, n + 1), title = "Random walk", marker = 1, legend = false, markerstrokewidth = 0, ) walk = RandomWalk() for j=1:steps push!(plt, 1, walk.x) step!(walk) end every 1 # plot(plt)endAn essential question to ask is: after 100 steps, how likely are we to have a given amount of money?
We can approximate the answer by simulating our random walk for 100 steps over some large number of trials and making a histogram of the possible outcomes.
xxxxxxxxxxmd"An essential question to ask is: after $(steps) steps, how likely are we to have a given amount of money? 10000xxxxxxxxxxtrials = 100000.0417
```julia
begin
    empirical_distribution = [0 for j=1:n + 2]
    for t=1:trials
        test = RandomWalk()
        for j=1:steps
            step!(test)
        end
        empirical_distribution[test.x + 1] += 1
    end
    empirical_distribution = empirical_distribution / trials
end
```

*(A bar chart of `empirical_distribution` appears here in the notebook.)*

Note the gaps in the above plot. Every other bar is zero because after 100 steps starting at 20, your current amount of money must be even.
This is time consuming to calculate!
There is a better way.
It turns out that we can compute the exact distribution of states after 100 steps by constructing a certain transition matrix $T$ and multiplying this matrix with itself 100 times.
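The reason this works is a standard Markov chain fact: if $v_t \in \mathbb{R}^{41}$ lists the probabilities of holding each possible amount of money (0 through 40) after $t$ steps, and column $i$ of $T$ lists the probabilities of moving from state $i$ to every other state, then

$$v_{t+1} = T v_t, \qquad \text{so} \qquad v_{100} = T^{100} v_0.$$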
xxxxxxxxxxmd"This is time consuming to calculate!Here is the transition matrix:
xxxxxxxxxxmd"Here is the transition matrix:"xxxxxxxxxxbegin function transition_matrix(probs) n = length(probs) A = zeros(n + 2, n + 2) for i=1:n A[i, i + 1] = 1 - probs[i] A[i + 2, i + 1] = probs[i] end A[1, 1] = 1 A[n + 2, n + 2] = 1 return A end T = transition_matrix(probs) spy(T)endHere is T^100:
xxxxxxxxxxmd"Here is T^$(steps):"41×41 Array{Float64,2}:
1.0 0.9204107626128211 0.8423820985077439 … 4.6341381160270354e-5 0.0
0.0 0.001560573282101458 0.0 1.5769965692431847e-5 0.0
0.0 0.0 0.006062226980470583 0.0 0.0
0.0 0.004501653698369124 0.0 5.4648993121546875e-5 0.0
0.0 0.0 0.011438164114091454 0.0 0.0
0.0 0.006936510415722331 0.0 … 0.00011722351975012556 0.0
0.0 0.0 0.015568612266375709 0.0 0.0
⋮ ⋱ ⋮
0.0 0.00011722351975012555 0.0 … 0.00693651041572233 0.0
0.0 0.0 0.00017187251287167243 0.0 0.0
0.0 5.4648993121546875e-5 0.0 0.004501653698369124 0.0
0.0 0.0 7.041895881397872e-5 0.0 0.0
0.0 1.576996569243185e-5 0.0 0.0015605732821014585 0.0
0.0 4.634138116027036e-5 0.00010845272801297256 … 0.9204107626128213 1.0xxxxxxxxxxT^stepsThe theoretical distribution of states is computed by multiplying T^100 by this starting vector:
xxxxxxxxxxmd"The theoretical distribution of states is computed by multiplying T^$(steps) by this starting vector:"41×1 Array{Float64,2}:
0.04604406623625023
0.0
0.008758339460287078
0.0
0.017819383810537857
0.0
0.027370695429502278
⋮
0.0
0.017819383810537864
0.0
0.008758339460287074
0.0
0.04604406623625023xxxxxxxxxxbegin start_with_boundary = vcat([0], start, [0]) theoretical_distribution = T^steps * start_with_boundaryendThe graph below should match closely with empirical distribution we calculated, if the number of trials is large enough.
xxxxxxxxxxmd"The graph below should match closely with empirical distribution we calculated, if the number of trials is large enough."xxxxxxxxxxbeginOne benefit of the theoretical approach is we can actually figure out the limiting behavior of our random walk as the number of steps grows to infinity. This is not feasible to compute empirically.
xxxxxxxxxxmd"One benefit of the theoretical approach is we can actually figure out the limiting behavior of our random walk as the number of steps grows to infinity. This is not feasible to compute empirically."xxxxxxxxxxbegin tsteps= 100000000 * steps ttheoretical_distribution = T^tsteps * start_with_boundary bar(ttheoretical_distribution, color=:green, legend=false, title="Theoretical distribution of states after $(tsteps) steps")endIn the constant probability 1/2 case, this last graph shows that if you continue long enough then you have a 1/2 chance of losing all your money and a 1/2 chance of making it to 40. With sufficient time, every other possible outcome becomes vanishingly unlikely.
xxxxxxxxxxmd"In the constant probability 1/2 case, this last graph shows that if you continue long enough then you have a 1/2 chance of losing all your money and a 1/2 chance of making it to $(maximum_amount + 1). With sufficient time, every other possible outcome becomes vanishingly unlikely."Other choices of probabilities may lead to different long term outcomes.
xxxxxxxxxxmd"Other choices of probabilities may lead to different long term outcomes."