Cours ENPC — Pratique du calcul scientifique
Goal of this chapter
Study numerical methods to find the eigenpairs \((\mathbf{\boldsymbol{v}}_i, \lambda_i)\in \mathbf C^n \times \mathbf C\) satisfying \[ \mathsf A \mathbf{\boldsymbol{v}}_i= \lambda_i \mathbf{\boldsymbol{v}}_i, \] where \(\mathsf A \in \mathbf C^{n \times n}\).
Program for today
Simple vector iterations: calculate just one eigenpair
Simultaneous iteration: calculate multiple eigenpairs at the same time
Application (PageRank): find the invariant distribution of a Markov chain
Remark: can we calculate the roots of the characteristic polynomial?
Cost of calculating the characteristic polynomial by expanding the determinant scales as \({\color{red}n!}\)
\({\color{green} \rightarrow}\) not a viable approach…
Power iteration
The power iteration is a method for finding \(\color{blue}(\lambda_1, v_1)\): \[ \fcolorbox{blue}{lightgreen}{$ \mathbf{\boldsymbol{x}}_{k+1} = \frac{\mathsf A \mathbf{\boldsymbol{x}}_{k}}{{\lVert {\mathsf A \mathbf{\boldsymbol{x}}_{k}} \rVert}} $} \] Given \(\mathbf{\boldsymbol{x}}_k \approx \mathbf{\boldsymbol{v}}_1\), the eigenvalue is calculated from the Rayleigh quotient: \[ \fcolorbox{blue}{lightgreen}{$ \lambda_{1} \approx \frac{\mathbf{\boldsymbol{x}}_{k}^* \mathsf A \mathbf{\boldsymbol{x}}_k}{\mathbf{\boldsymbol{x}}_k^* \mathbf{\boldsymbol{x}}_k} $} \]
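As an illustration, here is a minimal Julia sketch of the power iteration combined with the Rayleigh quotient (the matrix, starting vector and iteration count are arbitrary choices, not taken from the course):
using LinearAlgebra

# Minimal sketch of the power iteration (illustrative, arbitrary data)
function power_method(A, x; niter=50)
    for _ in 1:niter
        y = A * x
        x = y / norm(y)              # xₖ₊₁ = A xₖ / ‖A xₖ‖
    end
    λ = (x' * A * x) / (x' * x)      # Rayleigh quotient
    return λ, x
end

λ₁, v₁ = power_method([2.0 1.0; 1.0 2.0], rand(2))   # ≈ (3, [1, 1]/√2)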
Notation: let \(\angle\) denote the acute angle between \(\mathbf{\boldsymbol{x}}\) and \(\mathbf{\boldsymbol{y}}\): \[ \angle(\mathbf{\boldsymbol{x}}, \mathbf{\boldsymbol{y}}) = \arccos\left( \frac{{\lvert {\mathbf{\boldsymbol{x}}^* \mathbf{\boldsymbol{y}}} \rvert}}{\sqrt{\mathbf{\boldsymbol{x}}^* \mathbf{\boldsymbol{x}}} \sqrt{\mathbf{\boldsymbol{y}}^* \mathbf{\boldsymbol{y}}}}\right). \]
Since the eigenvectors of \(\mathsf A\) span \(\mathbf C^n\), any vector \(\mathbf{\boldsymbol{x}}_0\) may be decomposed as \[ \mathbf{\boldsymbol{x}}_0 = \alpha_1 \mathbf{\boldsymbol{v}}_1 + \dotsb + \alpha_n \mathbf{\boldsymbol{v}}_n. \]
Proposition: convergence of the power iteration started from \(\mathbf{\boldsymbol{x}}_0\)
Suppose that \({\lvert {\lambda_1} \rvert} > {\lvert {\lambda_2} \rvert}\) and \(\alpha_1 \neq 0\). Then \((\mathbf{\boldsymbol{x}}_k)_{k \geqslant 0}\) satisfies \[ \lim_{k \to \infty} \angle(\mathbf{\boldsymbol{x}}_k, \mathbf{\boldsymbol{v}}_1) = 0. \] The sequence \((\mathbf{\boldsymbol{x}}_k)\) is said to converge essentially to \(\mathbf{\boldsymbol{v}}_1\). \(~\)
Proof
By construction \[ \mathbf{\boldsymbol{x}}_k = \frac{\lambda_1^k \alpha_1 \mathbf{\boldsymbol{v}}_1 + \dotsb + \lambda_n^k \alpha_n \mathbf{\boldsymbol{v}}_n}{{\lVert {\lambda_1^k \alpha_1 \mathbf{\boldsymbol{v}}_1 + \dotsb + \lambda_n^k \alpha_n \mathbf{\boldsymbol{v}}_n} \rVert}} = \frac{\lambda_1^k \alpha_1}{|\lambda_1^k \alpha_1|} \frac{\mathbf{\boldsymbol{v}}_1 + \frac{\lambda_2^k \alpha_2}{\lambda_1^k \alpha_1} \mathbf{\boldsymbol{v}}_2 + \dotsb + \frac{\lambda_n^k \alpha_n}{\lambda_1^k \alpha_1} \mathbf{\boldsymbol{v}}_n }{\left\lVert\mathbf{\boldsymbol{v}}_1 + \frac{\lambda_2^k \alpha_2}{\lambda_1^k \alpha_1} \mathbf{\boldsymbol{v}}_2 + \dotsb + \frac{\lambda_n^k \alpha_n}{\lambda_1^k \alpha_1} \mathbf{\boldsymbol{v}}_n\right\rVert}. \] Therefore \[ \mathbf{\boldsymbol{x}}_k \left(\frac{|\lambda_1^k \alpha_1|}{\lambda_1^k \alpha_1}\right) \xrightarrow[k \to \infty]{} \mathbf{\boldsymbol{v}}_1. \] (Notice that the bracketed factor on the left-hand side is a scalar of modulus 1.)
The dominant error term scales as \(\color{red} |\lambda_2/\lambda_1|^k\).
Suppose that \(\mu \notin \sigma(\mathsf A)\) and \(|\lambda_J - \mu| < |\lambda_K - \mu| \leqslant|\lambda_j - \mu|\) for all \(j \neq J\).
Inverse iteration: to find \(\lambda_J\), the eigenvalue that is nearest \(\mu\)
Let \(\mathsf B = (\mathsf A - \mu \mathsf I)^{-1}\). The inverse iteration around \(\mu \in \mathbf C\) is given by \[ \fcolorbox{blue}{lightgreen}{$ \mathbf{\boldsymbol{x}}_{k+1} = \frac{\mathsf B \mathbf{\boldsymbol{x}}_{k}}{{\lVert {\mathsf B \mathbf{\boldsymbol{x}}_{k}} \rVert}}. $} \] Given \(\mathbf{\boldsymbol{x}}_k \approx \mathbf{\boldsymbol{v}}_J\), the eigenvalue is calculated from the Rayleigh quotient: \[ \fcolorbox{blue}{lightgreen}{$ \lambda_{J} \approx \frac{\mathbf{\boldsymbol{x}}_{k}^* \mathsf A \mathbf{\boldsymbol{x}}_k}{\mathbf{\boldsymbol{x}}_k^* \mathbf{\boldsymbol{x}}_k} $} \]
The dominant eigenvalue of \(\mathsf B\) is given by \((\lambda_J - \mu)^{-1}\).
The associated eigenvector is \(\mathbf{\boldsymbol{v}}_J\).
Use the \ operator instead of calculating the inverse:
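A possible sketch (the function name and parameters are illustrative, not the course's reference implementation):
using LinearAlgebra

# Sketch of the inverse iteration around a shift μ (illustrative)
function inverse_iteration(A, μ, x; niter=20)
    F = lu(A - μ*I)                  # factorize A - μI once and reuse it
    for _ in 1:niter
        x = F \ x                    # solve (A - μI) y = x instead of forming the inverse
        x /= norm(x)
    end
    λ = (x' * A * x) / (x' * x)      # Rayleigh quotient for the eigenvalue estimate
    return λ, x
end

λ, v = inverse_iteration([2.0 1.0; 1.0 2.0], 0.9, rand(2))   # eigenvalue closest to 0.9, namely 1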
Can we adapt \(\mu\) during the iteration to accelerate convergence?
Error of the inverse iteration scales as \(\color{red} \left|\frac{\lambda_J - \mu} {\lambda_K - \mu}\right|^k\).
\({\color{green} \rightarrow}\) suggests letting \[ \fcolorbox{blue}{lightgreen}{$ \mu_k = \frac{\mathbf{\boldsymbol{x}}_k^* \mathsf A \mathbf{\boldsymbol{x}}_k}{\mathbf{\boldsymbol{x}}_k^* \mathbf{\boldsymbol{x}}_k}. $} \]
This leads to the Rayleigh quotient iteration.
If convergence occurs and \(\mathsf A\) is Hermitian, the associated eigenvalue converges cubically:
\[ |\lambda^{(k+1)} - \lambda_J| \leqslant C |\lambda^{(k)} - \lambda_J|^3. \]
\({\color{green} \rightarrow}\) Number of significant digits is roughly tripled at each iteration.
using LinearAlgebra

A = [2 1; 1 2]
setprecision(500)                 # 500-bit BigFloat arithmetic
x = BigFloat.([1.2; 0.9])
μ = 0
for i in 1:6
    x = (A - μ*I)\x               # inverse iteration step with the current shift μ
    x /= √(x'x)                   # normalize
    μ = x'A*x                     # Rayleigh quotient gives the new shift
    println(μ)
end
2.689655172413793214338566074331922284628319727818488081860106227371262853189736760002654726539617400119969038165484095334893321873413803868251681795864
2.9876835222760986149331492272467355588185696109799282648595173771965397075426195063225678761434065353075142377089820027363840168362403721691362285304077
2.9999995241744513496497052218657690803207923292634754813048708145726507175594474194389978335452546331521437999768268695077710762632721314300139734193988
2.9999999999999999999730670707803381925912042248826047017208189162200256216460941091027939474791840614585910098581290345505474157692691833334644060848503
2.9999999999999999999999999999999999999999999999999999999999951158299301653110614230820165226784762374074665384291286744975203198864866424226623297764997
3.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000012
A good stopping criterion is \[ \begin{equation} \tag{stopping criterion} {\lVert {\mathsf A \widehat {\mathbf{\boldsymbol{v}}} - \widehat \lambda \widehat {\mathbf{\boldsymbol{v}}}} \rVert} \leqslant\varepsilon {\lVert {\widehat {\mathbf{\boldsymbol{v}}}} \rVert} \label{stopping} \end{equation} \]
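In code, this check might look as follows (a sketch; λ̂ and v̂ denote the current approximations):
using LinearAlgebra

# Sketch of the stopping test ‖A v̂ - λ̂ v̂‖ ≤ ε ‖v̂‖ (illustrative)
converged(A, v̂, λ̂; ε=1e-12) = norm(A*v̂ - λ̂*v̂) ≤ ε * norm(v̂)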
Proposition: forward error estimate
Suppose \(\mathsf A\) is Hermitian and \(\eqref{stopping}\) is satisfied. Then there exists \(\lambda \in \sigma(\mathsf A)\) such that \(|\lambda - \widehat \lambda| \leqslant\varepsilon\).
Proof
Let \(\mathbf{\boldsymbol{r}} := \mathsf A \widehat {\mathbf{\boldsymbol{v}}} - \widehat \lambda \widehat {\mathbf{\boldsymbol{v}}}\). If \(\widehat \lambda \in \sigma(\mathsf A)\) the result is trivial, so assume that \(\mathsf A - \widehat \lambda \mathsf I\) is invertible and note that \((\mathsf A - \widehat \lambda \mathsf I)^{-1} \mathbf{\boldsymbol{r}} = \widehat {\mathbf{\boldsymbol{v}}}\). Since \(\mathsf A\) is Hermitian, \[ \frac{1}{\min_{\lambda \in \sigma(\mathsf A)} |\lambda - \widehat \lambda|} = {\lVert {(\mathsf A - \widehat \lambda \mathsf I)^{-1}} \rVert} \geqslant\frac{{\lVert {(\mathsf A - \widehat \lambda \mathsf I)^{-1}\mathbf{\boldsymbol{r}}} \rVert}}{{\lVert {\mathbf{\boldsymbol{r}}} \rVert}} = \frac{{\lVert {\widehat {\mathbf{\boldsymbol{v}}}} \rVert}}{{\lVert {\mathbf{\boldsymbol{r}}} \rVert}} \geqslant\frac{1}{\varepsilon}. \] Taking the reciprocal, we deduce that \[ \min_{\lambda \in \sigma(\mathsf A)} |\lambda - \widehat \lambda| \leqslant\varepsilon, \] which concludes the proof.
Proposition: backward error estimate
Suppose that \(\eqref{stopping}\) is satisfied and let \(\mathcal E = \Bigl\{\mathsf E \in \mathbf C^{n \times n}: (\mathsf A + \mathsf E) \widehat {\mathbf{\boldsymbol{v}}} = \widehat \lambda \widehat {\mathbf{\boldsymbol{v}}} \Bigr\}\). Then \[ \min_{\mathsf E \in \mathcal E} {\lVert {\mathsf E} \rVert} = \frac{{\lVert {\mathsf A \widehat {\mathbf{\boldsymbol{v}}} - \widehat \lambda \widehat {\mathbf{\boldsymbol{v}}}} \rVert}}{{\lVert {\widehat {\mathbf{\boldsymbol{v}}}} \rVert}} \quad(\leqslant\varepsilon). \]
Definition of the backward error by numerical analyst Nick Higham:
Backward error is a measure of error associated with an approximate solution to a problem. Whereas the forward error is the distance between the approximate and true solutions, the backward error is how much the data must be perturbed to produce the approximate solution.
Sketch of proof
Show first that any \(\mathsf E \in \mathcal E\) satisfies \(\mathsf E \widehat {\mathbf{\boldsymbol{v}}} = - \mathbf{\boldsymbol{r}}\), where \(\mathbf{\boldsymbol{r}} := \mathsf A \widehat {\mathbf{\boldsymbol{v}}} - \widehat \lambda \widehat {\mathbf{\boldsymbol{v}}}\).
Deduce from the first item that \[ \inf_{\mathsf E \in \mathcal E} {\lVert {\mathsf E} \rVert} \geqslant\frac{{\lVert {\mathbf{\boldsymbol{r}}} \rVert}}{{\lVert {\widehat {\mathbf{\boldsymbol{v}}}} \rVert}}. \]
Find a rank one matrix \(\mathsf E_*\) such that \({\lVert {\mathsf E_*} \rVert} = \frac{{\lVert {\mathbf{\boldsymbol{r}}} \rVert}}{{\lVert {\widehat {\mathbf{\boldsymbol{v}}}} \rVert}}\), and then conclude.
(Recall that any rank 1 matrix is of the form \(\mathsf E_* = \mathbf{\boldsymbol{u}} \mathbf{\boldsymbol{w}}^*\), with norm \({\lVert {\mathbf{\boldsymbol{u}}} \rVert} {\lVert {\mathbf{\boldsymbol{w}}} \rVert}\).)
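One such matrix is \(\mathsf E_* = -\mathbf{\boldsymbol{r}} \widehat {\mathbf{\boldsymbol{v}}}^* / (\widehat {\mathbf{\boldsymbol{v}}}^* \widehat {\mathbf{\boldsymbol{v}}})\); here is a quick numerical check of this construction in Julia (the data below is arbitrary, for illustration only):
using LinearAlgebra

# Numerical check of the rank-one construction (arbitrary illustrative data)
A = [2.0 1.0; 1.0 2.0]
v̂ = [1.0, 1.1]                       # crude approximation of the eigenvector [1, 1]
λ̂ = (v̂' * A * v̂) / (v̂' * v̂)           # Rayleigh quotient
r = A*v̂ - λ̂*v̂                         # residual
E = -r * v̂' / (v̂' * v̂)                # rank-one perturbation satisfying E v̂ = -r
@show norm((A + E)*v̂ - λ̂*v̂)           # ≈ 0: (λ̂, v̂) is an exact eigenpair of A + E
@show opnorm(E) ≈ norm(r) / norm(v̂)   # the minimal backward error is attained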
Preliminary: QR decomposition
Any matrix \(\mathsf A \in \mathbf C^{n \times p}\) (with \(p ⩽ n\)) of rank \(p\) can be decomposed as \(\mathsf A = \mathsf Q \mathsf R\) where \(\mathsf Q\in \mathbf C^{n \times p}\) is an orthogonal matrix (\({\mathsf Q}^* \mathsf Q = \mathsf I_{p \times p}\)) and \(\mathsf R\in \mathbf C^{p \times p}\) is an upper triangular matrix.
Remarks about rectangular orthogonal matrices:
If \({(\mathbf q_j)}_{1⩽j⩽p} \in \mathbf C^{n}\) denote the \(p\) orthonormal column vectors of \(\mathsf Q\) and \({(\mathbf e_j)}_{1⩽j⩽p} \in \mathbf C^{p}\) the \(p\) canonical vectors of \(\mathbf C^{p}\), then \(\mathsf Q = \sum_{j=1}^p {\mathbf q}_j {\mathbf e}_j^*\)
It follows that \({\mathsf Q}^* \mathsf Q = \sum_{j=1}^p {\mathbf e}_j {\mathbf e}_j^* = \mathsf I_{p \times p}\) but \(\mathsf Q {\mathsf Q}^* = \sum_{j=1}^p {\mathbf q}_j {\mathbf q}_j^* \in \mathbf C^{n \times n}\) is the orthogonal projector onto the columns of \(\mathsf Q\) and is different from \(\mathsf I_{n \times n}\) as soon as \(p<n\).
Proof by induction on \(p\)
\(p=1\): \(\quad \mathsf A = \begin{pmatrix}\mathbf a \end{pmatrix} = \begin{pmatrix}\frac{\mathbf a}{\lVert \mathbf a \rVert} \end{pmatrix}\begin{pmatrix} \lVert \mathbf a \rVert \end{pmatrix}\)
Suppose \(\mathsf A \in \mathbf C^{n \times p}\) is of the form \(\mathsf A = \left( \widetilde {\mathsf A} \; \mathbf a \right)\) and assume that \(\widetilde {\mathsf A} = \widetilde {\mathsf Q} \widetilde {\mathsf R}\), for orthogonal \(\widetilde {\mathsf Q}\) and upper triangular \(\widetilde {\mathsf R}\).
Then \[\mathsf A = \begin{pmatrix} \widetilde {\mathsf Q} & \mathbf a \end{pmatrix} \begin{pmatrix} \widetilde {\mathsf R} & \mathbf 0 \\ \mathbf 0^T & 1 \end{pmatrix} = \begin{pmatrix} \widetilde {\mathsf Q} & \left(\mathbf a - \widetilde {\mathsf Q}\widetilde {\mathsf Q}^* \mathbf a\right) \end{pmatrix} \begin{pmatrix} \widetilde {\mathsf R} & \widetilde {\mathsf Q}^* \mathbf a \\ \mathbf 0^T & 1 \\ \end{pmatrix} \quad\textrm{motivated by}\quad \widetilde {\mathsf Q}^*\left(\mathbf a - \widetilde {\mathsf Q}\widetilde {\mathsf Q}^* \mathbf a\right)=\mathbf 0. \]
Denoting \(\mathbf q = \left(\mathbf a - \widetilde {\mathsf Q}\widetilde {\mathsf Q}^* \mathbf a\right) / \left\lVert \mathbf a - \widetilde {\mathsf Q}\widetilde {\mathsf Q}^* \mathbf a \right\rVert\), we obtain \[ \mathsf A = \underbrace{ \begin{pmatrix} \widetilde {\mathsf Q} & \mathbf q \end{pmatrix}}_{\text{orthogonal}} \underbrace{ \begin{pmatrix} \widetilde {\mathsf R} & \widetilde {\mathsf Q}^* \mathbf a \\ \mathbf 0 & {\left\lVert \mathbf a - \widetilde {\mathsf Q}\widetilde {\mathsf Q}^* \mathbf a \right\rVert} \end{pmatrix}}_{\text{upper triangular}} \quad\textrm{with}\quad \widetilde {\mathsf Q}^* \mathbf q=\mathbf 0 \quad\textrm{and}\quad \mathsf Q {\mathsf Q}^*=\widetilde {\mathsf Q}\widetilde {\mathsf Q}^* + {\mathbf q} {\mathbf q}^*. \]
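The induction step is essentially a Gram–Schmidt orthogonalization; here is a sketch of the resulting column-by-column construction (for illustration only; in practice one would call qr from LinearAlgebra, which relies on Householder reflections):
using LinearAlgebra

# Column-by-column QR construction mirroring the induction argument
# (illustrative sketch; assumes A has full column rank)
function gram_schmidt_qr(A)
    n, p = size(A)
    Q = zeros(eltype(A), n, p)
    R = zeros(eltype(A), p, p)
    for j in 1:p
        a = A[:, j]
        R[1:j-1, j] = Q[:, 1:j-1]' * a          # Q̃* a
        q = a - Q[:, 1:j-1] * R[1:j-1, j]       # a - Q̃ Q̃* a
        R[j, j] = norm(q)                       # ‖a - Q̃ Q̃* a‖
        Q[:, j] = q / R[j, j]
    end
    return Q, R
end

A = rand(5, 3)
Q, R = gram_schmidt_qr(A)
@show norm(Q*R - A)    # ≈ 0
@show norm(Q'Q - I)    # ≈ 0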
Simultaneous iteration
Suppose that \(\mathsf X_0 \in \mathbf C^{n \times p}\) with \(p < n\), and consider the iteration
\[ \fcolorbox{blue}{lightgreen}{$ \mathsf X_{k+1} = {\color{green}\text{normalization}} (\mathsf A \mathsf X_{k}) $} \]
In practice, this can be achieved by a QR decomposition: \[ \mathsf X_{k+1} = \mathsf Q_{k+1}, \qquad \mathsf Q_{k+1} \mathsf R_{k+1} = \mathsf A \mathsf X_k \]
\(\mathsf Q_{k+1} \in \mathbf C^{n\times p}\) has orthonormal columns: \(\mathsf Q_{k+1}^* \mathsf Q_{k+1} = \mathsf I_{p \times p}\);
\(\mathsf R_{k+1} \in \mathbf C^{p\times p}\) is upper triangular.
Remarks
The first column undergoes a simple power iteration;
Mathematically, applying the QR decomposition at every iteration is equivalent to applying it only at the end. Indeed \[ \mathsf X_{k} \underbrace{\mathsf R_{k} \mathsf R_{k-1} \dotsc \mathsf R_1}_{\text{upper tri.}} = \mathsf A \mathsf X_{k-1} \mathsf R_{k-1} \mathsf R_{k-2} \dotsc \mathsf R_{1} = \dots = \mathsf A^k \mathsf X_0. \] \({\color{green} \rightarrow}\) in particular, it holds that \(\mathop{\mathrm{col}}(\mathsf X_k) = \mathop{\mathrm{col}}(\mathsf A^k \mathsf X_0)\).
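In Julia, the normalization step can be performed with the qr function from LinearAlgebra; a possible sketch (function name and test data are illustrative):
using LinearAlgebra

# Sketch of the simultaneous iteration with QR normalization (illustrative)
function simultaneous_iteration(A, X; niter=100)
    for _ in 1:niter
        F = qr(A * X)
        X = Matrix(F.Q)              # thin Q: n × p matrix with orthonormal columns
    end
    Λ = diag(X' * A * X)             # Rayleigh quotient estimates of λ₁, …, λ_p
    return Λ, X
end

A = Symmetric(rand(6, 6))            # example Hermitian matrix
Λ, X = simultaneous_iteration(A, rand(6, 2))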
Theorem: Convergence of the simultaneous iteration
Assume that
\(\mathsf A \in \mathbf C^{n \times n}\) is Hermitian
\(\mathsf X_0 \in \mathbf C^{n \times p}\) has linearly independent columns
The subspace spanned by the columns of \(\mathsf X_0\) satisfies \[ \mathop{\mathrm{col}}(\mathsf X_0) \cap \mathop{\mathrm{Span}}\{ \mathbf{\boldsymbol{v}}_{p+1}, \dotsc, \mathbf{\boldsymbol{v}}_n \} = \{ \mathbf{\boldsymbol{0}} \}. \]
The first \(p+1\) eigenvalues have distinct moduli: \[ |\lambda_1| > |\lambda_2| > \dotsb > |\lambda_p| > |\lambda_{p+1}| \geqslant|\lambda_{p+2}| \geqslant\dotsc \geqslant|\lambda_n|. \] Then the columns of \(\mathsf X_k\) converge essentially to \(\mathbf{\boldsymbol{v}}_1, \dots, \mathbf{\boldsymbol{v}}_p\).
Consider a random walk on a graph.
Assumption: jumps to adjacent nodes are equally probable.
Statement of the problem
\[ \fcolorbox{blue}{lightgreen}{$p_{ij} = \mathbf P{(X_{k+1}=j | X_k=i)}$} \quad\textrm{≠0 if an edge exists between $i$ and $j$} \]
Assuming equal probability of jumping to each adjacent node, we have \(p_{ij} = \frac{a_{ij}}{n_e(i)}\), where \(a_{ij}\) are the entries of the adjacency matrix \(\mathsf A\) and \(n_e(i)\) is the number of edges connected to node \(i\).
\({\color{green} \rightarrow}\) note that \(\mathsf A\) is symmetric but \(\mathsf P := (p_{ij})\) is not.
\[ \fcolorbox{blue}{lightgreen}{$ \mathbf P{(X_{k+1}=i)}=\sum_{j=1}^n \mathbf P{(X_{k+1}=i | X_k=j)} \mathbf P{(X_{k}=j)} =\sum_{j=1}^n p_{ji} \mathbf P{(X_{k}=j)} $} \]
\({\color{green} \rightarrow}\) the question of interest is the steady state of the distribution, i.e. \(\lim_{k\to\infty} \mathbf P{(X_{k}=i)}\)
Steady state probability distribution \(\mathbf{\boldsymbol{v}}_1 \in \mathbf R_{\geqslant 0}^n\) satisfies \[ \mathsf P^T\mathbf{\boldsymbol{v}}_1 = \mathbf{\boldsymbol{v}}_1, \qquad \mathbf{\boldsymbol{1}}^T\mathbf{\boldsymbol{v}}_1 = 1. \] (it is possible to show that \(1\) is the dominant eigenvalue)
\({\color{green} \rightarrow}\) eigenvalue problem for matrix \(\mathsf P^T\).
# Adjacency matrix and node coordinates (x, y) of a triangular lattice graph with m rows of nodes
function adj(m)
    x = vcat([collect(0:i) for i in 0:m-1]...)
    y = vcat([fill(m-i, i) for i in 1:m]...)
    row_lengths = 1:m
    row_indices = [0; cumsum(row_lengths)]
    A = zeros(Int, length(x), length(x))
    for i in 1:m-1
        for j in 1:row_lengths[i]
            r1 = row_indices[i]
            r2 = row_indices[i+1]
            A[r2 + j, r2 + j + 1] = 1   # Horizontal edge
            A[r1 + j, r2 + j] = 1       # Vertical edge
            A[r1 + j, r2 + j + 1] = 1   # Diagonal edge
        end
    end
    x, y, A + A'
end
Here ... is the splat operator (type ?... at the REPL for more information).
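As a usage sketch, the transition matrix \(\mathsf P\) can be assembled from the output of adj (the choice m = 20, which gives 210 nodes, is an assumption consistent with the operation counts shown below):
# Build the transition matrix of the random walk from the adjacency matrix
# (m = 20 is an assumption; it yields 210 nodes)
_, _, A = adj(20)                # the node coordinates are not needed here
n_e = vec(sum(A, dims=2))        # number of edges connected to each node
P = A ./ n_e                     # p_ij = a_ij / n_e(i): divide row i by n_e(i)
Pᵗ = Matrix(transpose(P))        # dense transposed transition matrix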
Steady state probability distribution satisfies \[ \mathsf P^T\mathbf{\boldsymbol{v}}_1 = \mathbf{\boldsymbol{v}}_1 \]
Strategy for finding \(\mathbf{\boldsymbol{v}}_1\): evolve probability vector until equilibrium: \[ \mathbf{\boldsymbol{x}}^{(k+1)} = \mathsf P^T\mathbf{\boldsymbol{x}}^{(k)} \qquad \Leftrightarrow \qquad x^{(k+1)}_i = \sum_{j=1}^{n_n} p_{ji} x^{(k)}_j \]
\(x^{(k)}_i\) is the probability of being at node \(i\) at iteration \(k\);
This is the power iteration;
Below we take \(\mathbf{\boldsymbol{x}}^{(0)} = \mathbf{\boldsymbol{e}}_1\).
Consider the following implementation of the power iteration
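The function itself does not appear in these notes; a minimal definition consistent with the call below could be the following (an assumption, not the course's reference implementation):
# Hypothetical implementation, consistent with the call below
function power_iteration(M, x, niter)
    for _ in 1:niter
        x = M * x   # Pᵗ maps probability vectors to probability vectors, so no renormalization is needed
    end
    return x
end

x = zeros(size(Pᵗ, 1)); x[1] = 1.0   # start from x⁽⁰⁾ = e₁, as stated above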
Naive approach: use dense data structure for \(\mathsf P^T\)
using CountFlops
flops = @count_ops power_iteration(Pᵗ, x, 10)
memory = sizeof(Pᵗ)
println(flops, "\n", "Memory: $memory bytes")
add64: 441000
mul64: 441000
Memory: 352800 bytes
\({\color{green} \rightarrow}\) time is wasted on 0s. Try zeros(10_000, 10_000)^2.
Better solution: use sparse data structure
import Base: *, transpose

struct coo_matrix
    rows::Array{Int}       # Row indices of ≠0
    cols::Array{Int}       # Column indices of ≠0
    vals::Array{Float64}   # Values of ≠0 entries
    m::Int                 # Number of rows
    n::Int                 # Number of columns
end

function transpose(A::coo_matrix)
    coo_matrix(A.cols, A.rows, A.vals, A.n, A.m)
end
\({\color{green} \rightarrow}\) Exercise: define the * and to_coo functions.
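One possible solution sketch for these two functions (not the official solution):
# Matrix-vector product in COO format: accumulate only the nonzero contributions
function Base.:*(A::coo_matrix, x::Vector{Float64})
    y = zeros(A.m)
    for k in eachindex(A.vals)
        y[A.rows[k]] += A.vals[k] * x[A.cols[k]]
    end
    return y
end

# Convert a dense matrix to the COO format
function to_coo(B::Matrix{Float64})
    inds = findall(!iszero, B)       # CartesianIndex of the nonzero entries
    coo_matrix(getindex.(inds, 1), getindex.(inds, 2), B[inds], size(B)...)
end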
We take the following matrix for illustration: \[ M = \begin{pmatrix} 5 & . & . \\ . & 6 & 7 \\ 8 & . & 9 \end{pmatrix} \]
Coordinate list (COO), presented on the previous slide:
rows is [1, 2, 2, 3, 3]
cols is [1, 2, 3, 1, 3]
vals is [5, 6, 7, 8, 9]
Compressed sparse rows (CSR, for information only):
Similar to COO, but often more memory-efficient: rows is compressed;
The vector rows contains the indices, in cols and vals, where each row starts… except for rows[m+1], which is 1 + the number of nonzeros;
rows is [1, 2, 4, 6]
cols is [1, 2, 3, 1, 3]
vals is [5, 6, 7, 8, 9]
Compressed sparse columns (CSC, for information only):
Same as CSR with rows \(\leftrightarrow\) columns;
Implemented in Julia by SparseArrays.
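For example, the matrix M above can be assembled from its COO data as follows (a sketch; the output below presumably comes from displaying an empty sparse matrix, e.g. spzeros(3, 3)):
using SparseArrays

# Build M in CSC format from its COO data (illustrative sketch)
rows = [1, 2, 2, 3, 3]
cols = [1, 2, 3, 1, 3]
vals = [5.0, 6.0, 7.0, 8.0, 9.0]
M = sparse(rows, cols, vals, 3, 3)

spzeros(3, 3)    # empty 3×3 sparse matrix, presumably the source of the output below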
3×3 SparseMatrixCSC{Float64, Int64} with 0 stored entries:
⋅ ⋅ ⋅
⋅ ⋅ ⋅
⋅ ⋅ ⋅
See also sprand and spdiagm.