Notebook 4 – Math 2121, Fall 2020

Running this notebook (optional)

You can run this notebook on your own by entering this link in the Open from file menu in Pluto.

Multiplying matrices and vectors

Suppose $A$ is an $m \times n$ matrix and $v \in \mathbb{R}^n$ is a vector.

In today's lecture we introduced a way of multiplying $A$ and $v$ to get a vector $Av \in \mathbb{R}^m$.

If $m = n = 2$ then this operation is
$$\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} A_{11}v_1 + A_{12}v_2 \\ A_{21}v_1 + A_{22}v_2 \end{bmatrix}.$$

For example:

matrix
2×2 Array{Int64,2}:
   1    10
 100  1000
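Here is a sketch of this computation in Julia; the vector's entries below are an illustrative choice:

```julia
matrix = [1 10; 100 1000]
vector = [1, 2]          # illustrative entries; any v in R^2 works

# Entry i of Av is row i of A dotted with v:
# [1*1 + 10*2, 100*1 + 1000*2] == [21, 2100]
matrix * vector
```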
Multiplying matrices

There is a simple way to extend this method of multiplying a matrix and a vector to a method of multiplying two matrices together. We will discuss this matrix multiplication operation in more depth in a few lectures, but here is a first look.


Assume A and B are both n×n matrices, that is, square matrices of the same size.

Then every column of $B$ is a vector $v \in \mathbb{R}^n$ that we can multiply with $A$.

Write $v^{(i)}$ for column $i$ of $B$, so that $B = \begin{bmatrix} v^{(1)} & v^{(2)} & \cdots & v^{(n)} \end{bmatrix}$.

Then the matrix product of $A$ and $B$ is the $n \times n$ matrix $AB = \begin{bmatrix} Av^{(1)} & Av^{(2)} & \cdots & Av^{(n)} \end{bmatrix}$.
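This column-by-column definition translates directly into Julia. A sketch, using an illustrative helper name and the built-in matrix-vector product:

```julia
# Form AB by applying A to each column of B and concatenating the results.
columnwise_product(A, B) = hcat((A * B[:, i] for i in 1:size(B, 2))...)

A = [1 2; 3 4]
B = [5 6; 7 8]
columnwise_product(A, B) == A * B   # true
```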

---

If $n = 2$ then
$$\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix} = \begin{bmatrix} A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} \\ A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22} \end{bmatrix}.$$

The first column is $Av^{(1)}$ for $v^{(1)} = \begin{bmatrix} B_{11} \\ B_{21} \end{bmatrix}$ and the second column is $Av^{(2)}$ for $v^{(2)} = \begin{bmatrix} B_{12} \\ B_{22} \end{bmatrix}$.


A concrete example is $\begin{bmatrix} 10 & 1 \\ 1000 & 100 \end{bmatrix} \begin{bmatrix} 2 & 4 \\ 6 & 8 \end{bmatrix} = \begin{bmatrix} 26 & 48 \\ 2600 & 4800 \end{bmatrix}$.

We can double check this computation in Julia.

A
2×2 Array{Int64,2}:
   10    1
 1000  100

B
2×2 Array{Int64,2}:
 2  4
 6  8

2×2 Array{Int64,2}:
   26    48
 2600  4800

This definition of AB makes sense whenever A is an m×p matrix and B is a p×n matrix.

In that case, the result of multiplying A and B is an m×n matrix AB.

For simplicity, we focus today on the case when $m = p = n$.

Efficiency

The question we want to explore: how fast can we multiply two n×n matrices A and B?

In other words, what is the most efficient algorithm for matrix multiplication?

If you don't want to think about the new operation of multiplying two matrices, then it's equivalent to think about how fast we can multiply an n×n matrix A with a sequence of n vectors.

---

To multiply matrices, we just need to be able to perform addition and multiplication of individual scalars (i.e., numbers). Multiplication operations are much more time-consuming than additions.

So our efficiency question can be reduced to: what is the smallest number of scalar multiplications needed to multiply two n×n matrices?

---

The formula above for the n=2 case shows 8 multiplications:

$$A_{11}B_{11},\quad A_{11}B_{12},\quad A_{21}B_{11},\quad A_{21}B_{12},\quad A_{12}B_{21},\quad A_{12}B_{22},\quad A_{22}B_{21},\quad A_{22}B_{22}.$$

In general, entry $(i,j)$ in $AB$ is $A_{i1}B_{1j} + A_{i2}B_{2j} + A_{i3}B_{3j} + \cdots + A_{in}B_{nj}$.

This formula involves $n$ multiplications, and there are $n^2$ entries $(i,j)$.

So you might expect that we need $n^3$ multiplications (giving $8 = 2^3$ for $n = 2$) to compute $AB$.

Let's write some code to verify this.

scalar_multiply (generic function with 1 method)
scalar_type (generic function with 1 method)
matrix_multiply (generic function with 2 methods)
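A minimal sketch of what counting helpers like these could look like (the global counter here is an assumed bookkeeping device, not necessarily the notebook's approach):

```julia
# Global counter for scalar multiplications (one simple bookkeeping choice).
const MULTIPLICATION_COUNT = Ref(0)

# Multiply two scalars and record that one multiplication happened.
function scalar_multiply(x, y)
    MULTIPLICATION_COUNT[] += 1
    return x * y
end

# Naive matrix multiplication: entry (i,j) of AB is the dot product of
# row i of A with column j of B, costing n scalar multiplications each.
function matrix_multiply(A, B)
    m, p = size(A)
    p2, n = size(B)
    @assert p == p2 "inner dimensions must match"
    C = zeros(promote_type(eltype(A), eltype(B)), m, n)
    for i in 1:m, j in 1:n, k in 1:p
        C[i, j] += scalar_multiply(A[i, k], B[k, j])
    end
    return C
end
```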

Now let's actually check how many scalar multiplications are used in the naive algorithm when computing the product of two n×n matrices, say for n = 1, 2, …, 200.

[Plot: multiplications needed to multiply n-by-n matrices with the naive algorithm, for n up to 200; the count reaches 8×10⁶ = 200³ at n = 200.]

true
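With the sketched helpers above, this check could run along the following lines (the notebook's exact loop may differ):

```julia
# Count the multiplications used for each n and compare against n^3.
naive_counts = map(1:200) do n
    MULTIPLICATION_COUNT[] = 0
    matrix_multiply(rand(-9:9, n, n), rand(-9:9, n, n))
    MULTIPLICATION_COUNT[]
end

all(naive_counts[n] == n^3 for n in 1:200)   # true
```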
Faster algorithms

Now let's discuss a different matrix multiplication algorithm.

The following is called the Strassen algorithm, discovered by Volker Strassen in the 1960s.

For us, the significance of this algorithm is that it demonstrates there is a faster way to multiply matrices than the naive method.

The idea of the algorithm is, to compute AB where A and B are n×n matrices, proceed as follows:

  • First, we split both A and B into four n/2×n/2 submatrices.

  • Then we express AB as a sum of terms derived from multiplying these submatrices together.

  • The clever part is to somehow write this using only 7 multiplications instead of the usual 8 (see the sketch after this list).

  • If n=2 then the multiplications involve 1×1 submatrices, that is, scalars.

  • If n>2 then we recursively apply the algorithm to multiply the submatrices together.
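Here is the sketch promised above: Strassen's seven products in the 2×2 scalar case, following the standard Strassen identities (the function name is illustrative):

```julia
# Strassen's seven products for a 2x2 multiply: 7 scalar multiplications
# instead of the naive 8.
function strassen_2x2(A, B)
    m1 = (A[1,1] + A[2,2]) * (B[1,1] + B[2,2])
    m2 = (A[2,1] + A[2,2]) * B[1,1]
    m3 = A[1,1] * (B[1,2] - B[2,2])
    m4 = A[2,2] * (B[2,1] - B[1,1])
    m5 = (A[1,1] + A[1,2]) * B[2,2]
    m6 = (A[2,1] - A[1,1]) * (B[1,1] + B[1,2])
    m7 = (A[1,2] - A[2,2]) * (B[2,1] + B[2,2])
    # Reassemble the four entries of AB from the seven products.
    [m1 + m4 - m5 + m7   m3 + m5;
     m2 + m4             m1 - m2 + m3 + m6]
end

strassen_2x2([10 1; 1000 100], [2 4; 6 8]) == [26 48; 2600 4800]   # true
```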

---

We can improve this algorithm by modifying the final step of the recursion.

When n is small (below some threshold value), it is faster to use the naive multiplication algorithm. We can choose this threshold below.

Setting this threshold to 1 means we would keep subdividing the matrices until they are 1×1. This isn't the most efficient choice.
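Extending the 2×2 sketch to the recursive case with a cutoff, a fast_multiply along these lines is plausible (a sketch that assumes n is a power of 2; the notebook's actual implementation may differ):

```julia
# Recursive Strassen multiply with a cutoff: below the threshold we
# fall back to the naive algorithm. Assumes n is a power of 2.
function fast_multiply(A, B; threshold = 6)
    n = size(A, 1)
    n <= threshold && return matrix_multiply(A, B)   # naive base case
    h = n ÷ 2
    A11, A12, A21, A22 = A[1:h, 1:h], A[1:h, h+1:n], A[h+1:n, 1:h], A[h+1:n, h+1:n]
    B11, B12, B21, B22 = B[1:h, 1:h], B[1:h, h+1:n], B[h+1:n, 1:h], B[h+1:n, h+1:n]
    M(X, Y) = fast_multiply(X, Y; threshold = threshold)
    m1 = M(A11 + A22, B11 + B22)
    m2 = M(A21 + A22, B11)
    m3 = M(A11, B12 - B22)
    m4 = M(A22, B21 - B11)
    m5 = M(A11 + A12, B22)
    m6 = M(A21 - A11, B11 + B12)
    m7 = M(A12 - A22, B21 + B22)
    # Assemble the four blocks of AB from the seven block products.
    [m1 + m4 - m5 + m7   m3 + m5;
     m2 + m4             m1 - m2 + m3 + m6]
end
```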


Fast multiply threshold: 6

fast_multiply (generic function with 2 methods)

Let's compare both algorithms for 2×2 matrices.

X
2×2 Array{Int64,2}:
 73   52
 37  -44

Y
2×2 Array{Int64,2}:
  52   -9
 -23  -73
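With the sketched implementations, the comparison amounts to checking that both algorithms return the same product; setting the threshold to 1 forces the Strassen recursion even on a 2×2 input:

```julia
matrix_multiply(X, Y) == fast_multiply(X, Y; threshold = 1)   # true
```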

Now let's check how many scalar multiplications are used by the Strassen algorithm when computing the product of two n×n matrices for n = 1, 2, …, 200.


Let's compare the multiplication counts.

[Plot: multiplications needed to multiply n-by-n matrices, Strassen algorithm vs. naive algorithm, for n up to 200 (threshold = 6).]

Consider arbitrary n×n matrices A and B.

Let $M_{\text{naive}}(n)$ be the number of multiplications used by the naive algorithm to compute $AB$.

Let $M_{\text{strassen}}(n)$ be the number of multiplications used by the Strassen algorithm.

The plot below graphs the quotient $M_{\text{naive}}(n) / M_{\text{strassen}}(n)$.

If the Strassen algorithm is faster, then this quotient should be greater than 1.

[Plot: quotient of multiplication counts, (naive algorithm count) ÷ (fast algorithm count), threshold = 6, for n up to about 200.]

When n is very large, the number of multiplications needed by the Strassen algorithm will approach a constant times $n^{\log_2(7)} = n^{2.807\ldots}$. The naive algorithm uses $n^{\log_2(8)} = n^3$ multiplications.
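To see where this exponent comes from: running the recursion all the way down (threshold 1) replaces one size-$n$ multiply with 7 size-$n/2$ multiplies, so

$$M_{\text{strassen}}(n) = 7\,M_{\text{strassen}}(n/2), \qquad M_{\text{strassen}}(1) = 1 \quad\Longrightarrow\quad M_{\text{strassen}}(2^k) = 7^k = (2^k)^{\log_2 7} = n^{\log_2 7}.$$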

When properly implemented, the Strassen algorithm is practical and computes AB faster than the naive algorithm. Even so, it is not the best possible algorithm!

(Caution: the code in this notebook is for demo purposes only and is not efficiently implemented for practical use. There is no need for us to write our own matrix multiplication functions, as we can and should just use the built-in methods.)

Remarks

As of now, the fastest known matrix multiplication algorithm uses a constant times $n^{2.3728639}$ multiplications. This is not a practical algorithm: you only get a speedup for very large n.

When multiplying two matrices, you have to process all $2n^2$ entries, so if a matrix multiplication algorithm uses a constant times $n^{\omega}$ multiplications then we must have $\omega \geq 2$.

The best possible value of the exponent $\omega$ is still unknown.

Currently, we just know that the best value is somewhere in the range $2 \leq \omega \leq 2.3728639$.

See Wikipedia for more info.
