Solving eigenvalue problems numerically

Cours ENPC — Pratique du calcul scientifique

Objective

Goal of this chapter

Study numerical methods to find the eigenpairs \((\mathbf{\boldsymbol{v}}_i, \lambda_i)\in \mathbf C^n \times \mathbf C\) satisfying \[ \mathsf A \mathbf{\boldsymbol{v}}_i= \lambda_i \mathbf{\boldsymbol{v}}_i, \] where \(\mathsf A \in \mathbf C^{n \times n}\).

\(~\)

  • Simplifying assumption throughout this chapter: the matrix \(\mathsf A\) is diagonalizable.
  • Notation: eigenvalues \(|\lambda_1| \geqslant\dotsc \geqslant|\lambda_n|\), normalized eigenvectors \(\mathbf{\boldsymbol{v}}_1, \dotsc, \mathbf{\boldsymbol{v}}_n\). \[ \mathsf A \mathsf V = \mathsf V \mathsf D, \qquad \mathsf V = \begin{pmatrix} \mathbf{\boldsymbol{v}}_1 & \dots & \mathbf{\boldsymbol{v}}_n \end{pmatrix}, \qquad \mathsf D = {\rm diag}(\lambda_1, \dotsc, \lambda_n). \]
  • Finding all the eigenvalues of a matrix is expensive, even with the best methods:

    using LinearAlgebra
    A = rand(2000, 2000)
    @time eigen(A)
    6.694395 seconds

    \({\color{green} \rightarrow}\) We first focus on methods for finding just one or a few eigenpairs.
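
To connect the notation above with Julia, eigen returns eigenvalues and eigenvectors satisfying the factorization \(\mathsf A \mathsf V = \mathsf V \mathsf D\); a minimal check on a small random matrix (a sketch only):

using LinearAlgebra
# Quick check of the notation A V = V D (a random matrix is diagonalizable with probability one)
A = rand(4, 4)
λ, V = eigen(A)        # eigenvalues and eigenvectors, possibly complex
D = Diagonal(λ)
@assert A * V ≈ V * D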

Today's program

  • Simple vector iterations

    calculate just one eigenpair

  • Simultaneous iteration

    calculate multiple eigenpairs at the same time

  • Application: PageRank

    find the invariant distribution of a Markov chain

Remark: can we calculate the roots of the characteristic polynomial?

Cost of calculating the characteristic polynomial scales as \({\color{red}n!}\)

\({\color{green} \rightarrow}\) not a viable approach…

The power iteration

Power iteration

The power iteration is a method for finding \(\color{blue}(\lambda_1, v_1)\): \[ \fcolorbox{blue}{lightgreen}{$ \mathbf{\boldsymbol{x}}_{k+1} = \frac{\mathsf A \mathbf{\boldsymbol{x}}_{k}}{{\lVert {\mathsf A \mathbf{\boldsymbol{x}}_{k}} \rVert}} $} \] Given \(\mathbf{\boldsymbol{x}}_k \approx \mathbf{\boldsymbol{v}}_1\), the eigenvalue is calculated from the Rayleigh quotient: \[ \fcolorbox{blue}{lightgreen}{$ \lambda_{1} \approx \frac{\mathbf{\boldsymbol{x}}_{k}^* \mathsf A \mathbf{\boldsymbol{x}}_k}{\mathbf{\boldsymbol{x}}_k^* \mathbf{\boldsymbol{x}}_k} $} \]

  • Normalization is necessary to avoid numerical Inf.
A = [2 1; 1 2]
x = rand(2)
for i in 1:20
    x = A*x
    x /= sqrt(x'x)  # normalize to unit norm
end
println("Eigenvector: $x, eigenvalue: $(x'A*x)")
Eigenvector: [0.7071067811246757, 0.7071067812484193], eigenvalue: 2.9999999999999996

Convergence of the power iteration

Notation: let \(\angle\) denote the acute angle between \(\mathbf{\boldsymbol{x}}\) and \(\mathbf{\boldsymbol{y}}\): \[ \angle(\mathbf{\boldsymbol{x}}, \mathbf{\boldsymbol{y}}) = \arccos\left( \frac{{\lvert {\mathbf{\boldsymbol{x}}^* \mathbf{\boldsymbol{y}}} \rvert}}{\sqrt{\mathbf{\boldsymbol{x}}^* \mathbf{\boldsymbol{x}}} \sqrt{\mathbf{\boldsymbol{y}}^* \mathbf{\boldsymbol{y}}}}\right). \]

Since the eigenvectors of \(\mathsf A\) span \(\mathbf C^n\), any vector \(\mathbf{\boldsymbol{x}}_0\) may be decomposed as \[ \mathbf{\boldsymbol{x}}_0 = \alpha_1 \mathbf{\boldsymbol{v}}_1 + \dotsb + \alpha_n \mathbf{\boldsymbol{v}}_n. \]

Proposition: convergence of the power iteration started from \(\mathbf{\boldsymbol{x}}_0\)

Suppose that \({\lvert {\lambda_1} \rvert} > {\lvert {\lambda_2} \rvert}\) and \(\alpha_1 \neq 0\). Then \((\mathbf{\boldsymbol{x}}_k)_{k \geqslant 0}\) satisfies \[ \lim_{k \to \infty} \angle(\mathbf{\boldsymbol{x}}_k, \mathbf{\boldsymbol{v}}_1) = 0. \] The sequence \((\mathbf{\boldsymbol{x}}_k)\) is said to converge essentially to \(\mathbf{\boldsymbol{v}}_1\). \(~\)

Proof

By construction \[ \mathbf{\boldsymbol{x}}_k = \frac{\lambda_1^k \alpha_1 \mathbf{\boldsymbol{v}}_1 + \dotsb + \lambda_n^k \alpha_n \mathbf{\boldsymbol{v}}_n}{{\lVert {\lambda_1^k \alpha_1 \mathbf{\boldsymbol{v}}_1 + \dotsb + \lambda_n^k \alpha_n \mathbf{\boldsymbol{v}}_n} \rVert}} = \frac{\lambda_1^k \alpha_1}{|\lambda_1^k \alpha_1|} \frac{\mathbf{\boldsymbol{v}}_1 + \frac{\lambda_2^k \alpha_2}{\lambda_1^k \alpha_1} \mathbf{\boldsymbol{v}}_2 + \dotsb + \frac{\lambda_n^k \alpha_n}{\lambda_1^k \alpha_1} \mathbf{\boldsymbol{v}}_n }{\left\lVert\mathbf{\boldsymbol{v}}_1 + \frac{\lambda_2^k \alpha_2}{\lambda_1^k \alpha_1} \mathbf{\boldsymbol{v}}_2 + \dotsb + \frac{\lambda_n^k \alpha_n}{\lambda_1^k \alpha_1} \mathbf{\boldsymbol{v}}_n\right\rVert}. \] Since \(|\lambda_j / \lambda_1| \leqslant|\lambda_2/\lambda_1| < 1\) for all \(j \geqslant 2\), the correction terms vanish as \(k \to \infty\), and therefore \[ \mathbf{\boldsymbol{x}}_k \left(\frac{|\lambda_1^k \alpha_1|}{\lambda_1^k \alpha_1}\right) \xrightarrow[k \to \infty]{} \mathbf{\boldsymbol{v}}_1. \] (Notice that the bracketed factor on the left-hand side is a scalar of modulus 1.)

The dominant error term scales as \(\color{red} |\lambda_2/\lambda_1|^k\).
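
This rate is easy to observe numerically; a small sketch for the \(2 \times 2\) matrix used above, for which \(\lambda_1 = 3\) and \(\lambda_2 = 1\):

using LinearAlgebra
# Error decay of the power iteration; here |λ₂/λ₁| = 1/3
A = [2.0 1.0; 1.0 2.0]
v₁ = [1.0, 1.0] / sqrt(2)                 # exact dominant eigenvector
x = normalize(rand(2))
for k in 1:8
    x = normalize(A * x)
    θ = acos(min(abs(dot(x, v₁)), 1.0))   # angle ∠(xₖ, v₁)
    println("k = $k, angle = $θ")         # decreases roughly by a factor of 3 per step
end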

Inverse and Rayleigh quotient iteration

Suppose that \(\mu \notin \sigma(\mathsf A)\) and \(|\lambda_J - \mu| < |\lambda_K - \mu| \leqslant|\lambda_j - \mu|\) for all \(j \neq J\).

Inverse iteration: to find \(\lambda_J\), the eigenvalue that is nearest \(\mu\)

Let \(\mathsf B = (\mathsf A - \mu \mathsf I)^{-1}\). The inverse iteration around \(\mu \in \mathbf C\) is given by \[ \fcolorbox{blue}{lightgreen}{$ \mathbf{\boldsymbol{x}}_{k+1} = \frac{\mathsf B \mathbf{\boldsymbol{x}}_{k}}{{\lVert {\mathsf B \mathbf{\boldsymbol{x}}_{k}} \rVert}}. $} \] Given \(\mathbf{\boldsymbol{x}}_k \approx \mathbf{\boldsymbol{v}}_J\), the eigenvalue is calculated from the Rayleigh quotient: \[ \fcolorbox{blue}{lightgreen}{$ \lambda_{J} \approx \frac{\mathbf{\boldsymbol{x}}_{k}^* \mathsf A \mathbf{\boldsymbol{x}}_k}{\mathbf{\boldsymbol{x}}_k^* \mathbf{\boldsymbol{x}}_k} $} \]

  • The dominant eigenvalue of \(\mathsf B\) is given by \((\lambda_J - \mu)^{-1}\).

  • The associated eigenvector is \(\mathbf{\boldsymbol{v}}_J\).

  • Use \ operator instead of calculating the inverse:

    A = [2 1; 1 2]
    x = rand(2)
    μ = 0
    B = A - μ*I
    for i in 1:10
        x = B\x
        x /= sqrt(x'x)  # normalize to unit norm
    end
    println("Eigenvector: $x, eigenvalue: $(x'A*x)")
    Eigenvector: [-0.7070464369954307, 0.7071671202283563], eigenvalue: 1.0000000145644428

Can we adapt \(\mu\) during the iteration to accelerate convergence?

  • Error of the inverse iteration scales as \(\color{red} \left|\frac{\lambda_J - \mu} {\lambda_K - \mu}\right|^k\).

    \({\color{green} \rightarrow}\) suggests letting \[ \fcolorbox{blue}{lightgreen}{$ \mu_k = \frac{\mathbf{\boldsymbol{x}}_k^* \mathsf A \mathbf{\boldsymbol{x}}_k}{\mathbf{\boldsymbol{x}}_k^* \mathbf{\boldsymbol{x}}_k}. $} \]

    This leads to the Rayleigh quotient iteration.

  • If convergence occurs, the associated eigenvalue converges cubically:

    \[ |\lambda^{(k+1)} - \lambda_J| \leqslant C |\lambda^{(k)} - \lambda_J|^3. \]

    \({\color{green} \rightarrow}\) Number of significant digits is roughly tripled at each iteration.

A = [2 1; 1 2]
setprecision(500)
x = BigFloat.([1.2; 0.9])
μ = 0
for i in 1:6
    x = (A - μ*I)\x
    x /= sqrt(x'x)  # normalize to unit norm
    μ = x'A*x
    println(μ)
end
2.689655172413793214338566074331922284628319727818488081860106227371262853189736760002654726539617400119969038165484095334893321873413803868251681795864
2.9876835222760986149331492272467355588185696109799282648595173771965397075426195063225678761434065353075142377089820027363840168362403721691362285304077
2.9999995241744513496497052218657690803207923292634754813048708145726507175594474194389978335452546331521437999768268695077710762632721314300139734193988
2.9999999999999999999730670707803381925912042248826047017208189162200256216460941091027939474791840614585910098581290345505474157692691833334644060848503
2.9999999999999999999999999999999999999999999999999999999999951158299301653110614230820165226784762374074665384291286744975203198864866424226623297764997
3.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000012

Stopping criterion (for information only)

A good stopping criterion is \[ \begin{equation} \tag{stopping criterion} {\lVert {\mathsf A \widehat {\mathbf{\boldsymbol{v}}} - \widehat \lambda \widehat {\mathbf{\boldsymbol{v}}}} \rVert} \leqslant\varepsilon {\lVert {\widehat {\mathbf{\boldsymbol{v}}}} \rVert} \label{stopping} \end{equation} \]
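
For concreteness, here is a hedged sketch of a power iteration stopped with this residual test (the function name and tolerance are illustrative):

using LinearAlgebra
# Power iteration stopped once ‖A v̂ − λ̂ v̂‖ ≤ ε ‖v̂‖
function power_method(A; ε = 1e-10, maxiter = 10^5)
    x = normalize(rand(size(A, 1)))
    λ = x' * A * x                        # Rayleigh quotient (x has unit norm)
    for _ in 1:maxiter
        x = normalize(A * x)
        λ = x' * A * x
        norm(A * x - λ * x) ≤ ε * norm(x) && break
    end
    return λ, x
end

power_method([2.0 1.0; 1.0 2.0])          # ≈ (3.0, ±[0.707…, 0.707…])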

Proposition: forward error estimate

Suppose \(\mathsf A\) is Hermitian and \(\eqref{stopping}\) is satisfied. Then there exists \(\lambda \in \sigma(\mathsf A)\) such that \(|\lambda - \widehat \lambda| \leqslant\varepsilon\).

Proof

Let \(\mathbf{\boldsymbol{r}} := \mathsf A \widehat {\mathbf{\boldsymbol{v}}} - \widehat \lambda \widehat {\mathbf{\boldsymbol{v}}}\), and note that \((\mathsf A - \widehat \lambda \mathsf I)^{-1} \mathbf{\boldsymbol{r}} = \widehat {\mathbf{\boldsymbol{v}}}\). Since \(\mathsf A\) is Hermitian, \[ \frac{1}{\min_{\lambda \in \sigma(\mathsf A)} |\lambda - \widehat \lambda|} = {\lVert {(\mathsf A - \widehat \lambda \mathsf I)^{-1}} \rVert} \geqslant\frac{{\lVert {(\mathsf A - \widehat \lambda \mathsf I)^{-1}\mathbf{\boldsymbol{r}}} \rVert}}{{\lVert {\mathbf{\boldsymbol{r}}} \rVert}} = \frac{{\lVert {\widehat {\mathbf{\boldsymbol{v}}}} \rVert}}{{\lVert {\mathbf{\boldsymbol{r}}} \rVert}} \geqslant\frac{1}{\varepsilon}, \] where the last inequality follows from \(\eqref{stopping}\). Taking the reciprocal, we deduce that \[ \min_{\lambda \in \sigma(\mathsf A)} |\lambda - \widehat \lambda| \leqslant\varepsilon, \] which concludes the proof.

Proposition: backward error estimate

Suppose that \(\eqref{stopping}\) is satisfied and let \(\mathcal E = \Bigl\{\mathsf E \in \mathbf C^{n \times n}: (\mathsf A + \mathsf E) \widehat {\mathbf{\boldsymbol{v}}} = \widehat \lambda \widehat {\mathbf{\boldsymbol{v}}} \Bigr\}\). Then \[ \min_{\mathsf E \in \mathcal E} {\lVert {\mathsf E} \rVert} = \frac{{\lVert {\mathsf A \widehat {\mathbf{\boldsymbol{v}}} - \widehat \lambda \widehat {\mathbf{\boldsymbol{v}}}} \rVert}}{{\lVert {\widehat {\mathbf{\boldsymbol{v}}}} \rVert}} \quad(\leqslant\varepsilon). \]

Definition of the backward error by numerical analyst Nick Higham:

Backward error is a measure of error associated with an approximate solution to a problem. Whereas the forward error is the distance between the approximate and true solutions, the backward error is how much the data must be perturbed to produce the approximate solution.

Sketch of proof

  • Show first that any \(\mathsf E \in \mathcal E\) satisfies \(\mathsf E \widehat {\mathbf{\boldsymbol{v}}} = - \mathbf{\boldsymbol{r}}\), where \(\mathbf{\boldsymbol{r}} := \mathsf A \widehat {\mathbf{\boldsymbol{v}}} - \widehat \lambda \widehat {\mathbf{\boldsymbol{v}}}\).

  • Deduce from the first item that \[ \inf_{\mathsf E \in \mathcal E} {\lVert {\mathsf E} \rVert} \geqslant\frac{{\lVert {\mathbf{\boldsymbol{r}}} \rVert}}{{\lVert {\widehat {\mathbf{\boldsymbol{v}}}} \rVert}}. \]

  • Find a rank one matrix \(\mathsf E_*\) such that \({\lVert {\mathsf E_*} \rVert} = \frac{{\lVert {\mathbf{\boldsymbol{r}}} \rVert}}{{\lVert {\widehat {\mathbf{\boldsymbol{v}}}} \rVert}}\), and then conclude.

    (Recall that any rank 1 matrix is of the form \(\mathsf E_* = \mathbf{\boldsymbol{u}} \mathbf{\boldsymbol{w}}^*\), with norm \({\lVert {\mathbf{\boldsymbol{u}}} \rVert} {\lVert {\mathbf{\boldsymbol{w}}} \rVert}\).)
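
The rank-one matrix is not spelled out on this slide; one choice that works is \(\mathsf E_* = -\mathbf{\boldsymbol{r}} \widehat{\mathbf{\boldsymbol{v}}}^* / (\widehat{\mathbf{\boldsymbol{v}}}^* \widehat{\mathbf{\boldsymbol{v}}})\), which can be checked numerically:

using LinearAlgebra
# Check that E = -r v̂*/(v̂* v̂) satisfies (A + E) v̂ = λ̂ v̂ and ‖E‖ = ‖r‖/‖v̂‖
A = Symmetric(rand(5, 5))
v̂ = rand(5)                       # some approximate eigenvector
λ̂ = (v̂' * A * v̂) / (v̂' * v̂)       # Rayleigh quotient
r = A * v̂ - λ̂ * v̂                 # residual
E = -r * v̂' / (v̂' * v̂)            # rank-one perturbation with E v̂ = -r
@assert (A + E) * v̂ ≈ λ̂ * v̂
println(opnorm(E), " ≈ ", norm(r) / norm(v̂))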

Simultaneous calculation of several eigenpairs

Preliminary: QR decomposition

Any matrix \(\mathsf A \in \mathbf C^{n \times p}\) (with \(p ⩽ n\)) of rank \(p\) can be decomposed as \(\mathsf A = \mathsf Q \mathsf R\) where \(\mathsf Q\in \mathbf C^{n \times p}\) is an orthogonal matrix (\({\mathsf Q}^* \mathsf Q = \mathsf I_{p \times p}\)) and \(\mathsf R\in \mathbf C^{p \times p}\) is an upper triangular matrix.

Remarks about rectangular orthogonal matrices:

  • If \({(\mathbf q_j)}_{1⩽j⩽p} \in \mathbf C^{n}\) denote the \(p\) orthonormal column vectors of \(\mathsf Q\) and \({(\mathbf e_j)}_{1⩽j⩽p} \in \mathbf C^{p}\) the \(p\) canonical vectors of \(\mathbf C^{p}\), then \(\mathsf Q = \sum_{j=1}^p {\mathbf q}_j {\mathbf e}_j^*\)

  • It follows that \({\mathsf Q}^* \mathsf Q = \sum_{j=1}^p {\mathbf e}_j {\mathbf e}_j^* = \mathsf I_{p \times p}\) but \(\mathsf Q {\mathsf Q}^* = \sum_{j=1}^p {\mathbf q}_j {\mathbf q}_j^* \in \mathbf C^{n \times n}\) is the orthogonal projector onto the columns of \(\mathsf Q\) and is different from \(\mathsf I_{n \times n}\) as soon as \(p<n\).

Proof by induction on \(p\)

  • \(p=1\): \(\quad \mathsf A = \begin{pmatrix}\mathbf a \end{pmatrix} = \begin{pmatrix}\frac{\mathbf a}{\lVert \mathbf a \rVert} \end{pmatrix}\begin{pmatrix} \lVert \mathbf a \rVert \end{pmatrix}\)

  • Suppose \(\mathsf A \in \mathbf C^{n \times p}\) is of the form \(\mathsf A = \left( \widetilde {\mathsf A} \; \mathbf a \right)\) and assume that \(\widetilde {\mathsf A} = \widetilde {\mathsf Q} \widetilde {\mathsf R}\), for orthogonal \(\widetilde {\mathsf Q}\) and upper triangular \(\widetilde {\mathsf R}\).

Then \[\mathsf A = \begin{pmatrix} \widetilde {\mathsf Q} & \mathbf a \end{pmatrix} \begin{pmatrix} \widetilde {\mathsf R} & \mathbf 0 \\ \mathbf 0^T & 1 \end{pmatrix} = \begin{pmatrix} \widetilde {\mathsf Q} & \left(\mathbf a - \widetilde {\mathsf Q}\widetilde {\mathsf Q}^* \mathbf a\right) \end{pmatrix} \begin{pmatrix} \widetilde {\mathsf R} & \widetilde {\mathsf Q}^* \mathbf a \\ \mathbf 0^T & 1 \\ \end{pmatrix} \quad\textrm{motivated by}\quad \widetilde {\mathsf Q}^*\left(\mathbf a - \widetilde {\mathsf Q}\widetilde {\mathsf Q}^* \mathbf a\right)=\mathbf 0. \]

Denoting \(\mathbf q = \left(\mathbf a - \widetilde {\mathsf Q}\widetilde {\mathsf Q}^* \mathbf a\right) / \left\lVert \mathbf a - \widetilde {\mathsf Q}\widetilde {\mathsf Q}^* \mathbf a \right\rVert\), we obtain \[ \mathsf A = \underbrace{ \begin{pmatrix} \widetilde {\mathsf Q} & \mathbf q \end{pmatrix}}_{\text{orthogonal}} \underbrace{ \begin{pmatrix} \widetilde {\mathsf R} & \widetilde {\mathsf Q}^* \mathbf a \\ \mathbf 0 & {\left\lVert \mathbf a - \widetilde {\mathsf Q}\widetilde {\mathsf Q}^* \mathbf a \right\rVert} \end{pmatrix}}_{\text{upper triangular}} \quad\textrm{with}\quad \widetilde {\mathsf Q}^* \mathbf q=\mathbf 0 \quad\textrm{and}\quad \mathsf Q {\mathsf Q}^*=\widetilde {\mathsf Q}\widetilde {\mathsf Q}^* + {\mathbf q} {\mathbf q}^*. \]
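
In Julia, this thin QR decomposition is provided by qr from LinearAlgebra; a quick sanity check on a tall random matrix:

using LinearAlgebra
# Thin QR: Q has orthonormal columns, R is square and upper triangular
A = rand(5, 3)
F = qr(A)
Q = Matrix(F.Q)        # extract the 5×3 factor with orthonormal columns
R = F.R                # 3×3 upper triangular factor
@assert Q' * Q ≈ I(3)
@assert Q * R ≈ A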

Simultaneous iteration

Suppose that \(\mathsf X_0 \in \mathbf C^{n \times p}\) with \(p < n\) and consider the iteration

\[ \fcolorbox{blue}{lightgreen}{$ \mathsf X_{k+1} = {\color{green}\text{normalization}} (\mathsf A \mathsf X_{k}) $} \]

  • For normalization, apply Gram–Schmidt to the columns;
  • In practice, this can be achieved by a QR decomposition (see the sketch below): \[ \mathsf X_{k+1} = \mathsf Q_{k+1}, \qquad \mathsf Q_{k+1} \mathsf R_{k+1} = \mathsf A \mathsf X_k \]

    • \(\mathsf Q_{k+1} \in \mathbf C^{n\times p}\) is a unitary matrix: \(\mathsf Q_{k+1}^* \mathsf Q_{k+1} = \mathsf I_{p \times p}\);

    • \(\mathsf R_{k+1} \in \mathbf C^{p\times p}\) is upper triangular.
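
A minimal sketch of the iteration (the test matrix below, built with a known well-separated spectrum, is only for illustration):

using LinearAlgebra
# Simultaneous iteration: orthonormalize A*Xₖ at each step via the thin QR decomposition
Q₀ = Matrix(qr(randn(6, 6)).Q)
A  = Symmetric(Q₀ * Diagonal([9.0, 5.0, 3.0, 1.0, 0.5, 0.1]) * Q₀')   # known spectrum
p  = 3
X  = Matrix(qr(randn(6, p)).Q)       # orthonormal starting block X₀
for k in 1:100
    X = Matrix(qr(A * X).Q)          # X_{k+1} = Q_{k+1}, where Q_{k+1} R_{k+1} = A Xₖ
end
println(diag(X' * A * X))            # ≈ [9.0, 5.0, 3.0], the p dominant eigenvalues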

Remarks

  • The first column undergoes a simple power iteration;

  • Mathematically, it is equivalent to apply the QR decomposition at the end. Indeed \[ \mathsf X_{k} \underbrace{\mathsf R_{k} \mathsf R_{k-1} \dotsc \mathsf R_1}_{\text{upper tri.}} = \mathsf A \mathsf X_{k-1} \mathsf R_{k-1} \mathsf R_{k-2} \dotsc \mathsf R_{1} = \dots = \mathsf A^k \mathsf X_0. \] \({\color{green} \rightarrow}\) in particular, it holds that \(\mathop{\mathrm{col}}(\mathsf X_k) = \mathop{\mathrm{col}}(\mathsf A^k \mathsf X_0)\).

Theorem: Convergence of the simultaneous iteration

Assume that

  • \(\mathsf A \in \mathbf C^{n \times n}\) is Hermitian

  • \(\mathsf X_0 \in \mathbf C^{n \times p}\) has linearly independent columns

  • The subspace spanned by the columns of \(\mathsf X_0\) satisfies \[ \mathop{\mathrm{col}}(\mathsf X_0) \cap \mathop{\mathrm{Span}}\{ \mathbf{\boldsymbol{v}}_{p+1}, \dotsc, \mathbf{\boldsymbol{v}}_n \} = \{ \mathbf{\boldsymbol{0}} \}. \]

  • The first \(p+1\) eigenvalues have distinct moduli: \[ |\lambda_1| > |\lambda_2| > \dotsb > |\lambda_p| > |\lambda_{p+1}| \geqslant|\lambda_{p+2}| \geqslant\dotsc \geqslant|\lambda_n|. \] Then the columns of \(\mathsf X_k\) converge essentially to \(\mathbf{\boldsymbol{v}}_1, \dots, \mathbf{\boldsymbol{v}}_p\).

Application: Random walk on a graph

Consider a random walk on the graph:

Assumption: jumps to adjacent nodes equi-probable.

State of the problem

  • The associated adjacency matrix \(\mathsf A \in \{0, 1\}^{n \times n}\) is given by \(a_{ij} = \begin{cases} 1 & \text{if there is an edge between $i$ and $j$,} \\ 0 & \text{otherwise.} \end{cases}\)
  • Let \(\left(X_k\right)_{k \in \mathbf N}\) be a Markov chain taking values in the nodes, and denote by \(p_{ij}\) the probability of the transition \(i \rightarrow j\):

\[ \fcolorbox{blue}{lightgreen}{$p_{ij} = \mathbf P{(X_{k+1}=j | X_k=i)}$}, \quad\textrm{which is nonzero only if there is an edge between $i$ and $j$.} \]

  • Assuming equi-probability of jumps to adjacent nodes, we have \(p_{ij} = \frac{a_{ij}}{n_e(i)}\) where \(n_e(i)\) is the number of edges connected to node \(i\).

    \({\color{green} \rightarrow}\) note that \(\mathsf A\) is symmetric but \(\mathsf P := (p_{ij})\) is not.

  • By the law of total probability, the marginal distribution of \(X_{k+1}\) satisfies

\[ \fcolorbox{blue}{lightgreen}{$ \mathbf P{(X_{k+1}=i)}=\sum_{j=1}^n \mathbf P{(X_{k+1}=i | X_k=j)} \mathbf P{(X_{k}=j)} =\sum_{j=1}^n p_{ji} \mathbf P{(X_{k}=j)} $} \]

\({\color{green} \rightarrow}\) the question of interest is the steady state of the distribution, i.e. \(\lim_{k\to\infty} \mathbf P{(X_{k}=i)}\).

  • Steady state probability distribution \(\mathbf{\boldsymbol{v}}_1 \in \mathbf R_{\geqslant 0}^n\) satisfies \[ \mathsf P^T\mathbf{\boldsymbol{v}}_1 = \mathbf{\boldsymbol{v}}_1, \qquad \mathbf{\boldsymbol{1}}^T\mathbf{\boldsymbol{v}}_1 = 1. \] (it is possible to show that \(1\) is the dominant eigenvalue)

    \({\color{green} \rightarrow}\) eigenvalue problem for matrix \(\mathsf P^T\).

Constructing the adjacency matrix

function adj(m)
  x = vcat([collect(0:i) for i in 0:m-1]...)
  y = vcat([fill(m-i, i) for i in 1:m]...)
  row_lengths = 1:m
  row_indices = [0; cumsum(row_lengths)]
  A = zeros(Int, length(x), length(x))
  for i in 1:m-1
      for j in 1:row_lengths[i]
          r1 = row_indices[i]
          r2 = row_indices[i+1]
          A[r2 + j, r2 + j + 1] = 1  # Horizontal
          A[r1 + j, r2 + j]     = 1  # Vertical
          A[r1 + j, r2 + j + 1] = 1  # Diagonal
      end
  end
  x, y, A + A'
end
  • Here ... is the splat operator (?... for more info)

    sum3(a, b, c) = a + b + c
    numbers = [1, 2, 3]
    sum3(numbers...)
    6
  • Also works with keyword arguments (use ; and NamedTuple)

    plotargs = (color=:blue, label="example")
    plot(sin; plotargs...)
  • Exercise: write function transition(m) that builds \(\mathsf P\).
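
One possible sketch of transition, reusing adj from above (row-normalization of the adjacency matrix; intended as a hint rather than the reference solution):

# Divide each row of the adjacency matrix by the degree of the corresponding node
function transition(m)
    _, _, A = adj(m)
    A ./ sum(A, dims=2)        # p_ij = a_ij / n_e(i)
end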

Notice that \(\mathsf A\) is a sparse matrix:

using Plots, LaTeXStrings
# Adjacency matrix
x, y, A = adj(10);
Plots.spy(A, xlabel=L"i", ylabel=L"j")
Plots.plot!(size=(600, 600))

This is the case in many applications:

  • Discretization of partial differential equations;

  • Graph applications.

Calculating the steady state

Steady state probability distribution satisfies \[ \mathsf P^T\mathbf{\boldsymbol{v}}_1 = \mathbf{\boldsymbol{v}}_1 \]

Illustration, with values rounded to integers:

using LinearAlgebra
v₁ = eigvecs(P', sortby=real)[:, end] |> real
v₁ = v₁ / sum(v₁)  # Normalization
plotgraph(v₁)

Strategy for finding \(\mathbf{\boldsymbol{v}}_1\): evolve probability vector until equilibrium: \[ \mathbf{\boldsymbol{x}}^{(k+1)} = \mathsf P^T\mathbf{\boldsymbol{x}}^{(k)} \qquad \Leftrightarrow \qquad x^{(k+1)}_i = \sum_{j=1}^{n_n} p_{ji} x^{(k)}_j \]

  • \(x^{(k)}_i\) is the probability of being at node \(i\) at iteration \(k\);

  • This is the power iteration;

  • Below we take \(\mathbf{\boldsymbol{x}}^{(0)} = \mathbf{\boldsymbol{e}}_1\).

Efficiency considerations

Consider the following implementation of the power iteration

function power_iteration(M, x, niter)
    for i in 1:niter
        x = M*x
        # Here normalization is not required (why?)
    end
    return x
end

# Dense transition matrix transposed
Pᵗ = transition(20) |> transpose |> Matrix;

# Initial iterate
nn = size(Pᵗ, 1) ; x = zeros(nn) ; x[1] = 1;

\(~\)

Naive approach: use dense data structure for \(\mathsf P^T\)

using CountFlops
flops = @count_ops power_iteration(Pᵗ, x, 10)
memory = sizeof(Pᵗ)
println(flops, "\n", "Memory: $memory bytes")
add64: 441000
mul64: 441000

Memory: 352800 bytes

\({\color{green} \rightarrow}\) time is wasted on 0s. Try zeros(10_000, 10_000)^2.

Better solution: use sparse data structure

import Base: *, transpose
struct coo_matrix
    rows::Array{Int}       # Row indices of ≠0
    cols::Array{Int}       # Column indices of ≠0
    vals::Array{Float64}   # Values of ≠0 entries
    m::Int                 # Number of rows
    n::Int                 # Number of columns
end

function transpose(A::coo_matrix)
    coo_matrix(A.cols, A.rows, A.vals, A.n, A.m)
end

\({\color{green} \rightarrow}\) Exercise: define the * and to_coo functions
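
A possible sketch of these two functions (one straightforward implementation among many; * can be extended here thanks to the import Base: * above):

# Matrix-vector product in COO format: accumulate val * x[col] into the row positions
function *(A::coo_matrix, x::AbstractVector)
    y = zeros(A.m)
    for (i, j, v) in zip(A.rows, A.cols, A.vals)
        y[i] += v * x[j]
    end
    return y
end

# Convert a dense matrix to COO by storing the indices and values of the nonzero entries
function to_coo(A::AbstractMatrix)
    rows, cols, vals = Int[], Int[], Float64[]
    for j in axes(A, 2), i in axes(A, 1)
        if A[i, j] != 0
            push!(rows, i); push!(cols, j); push!(vals, A[i, j])
        end
    end
    return coo_matrix(rows, cols, vals, size(A, 1), size(A, 2))
end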


Better solution: use sparse data structure

using CountFlops
sparse_Pᵗ = to_coo(Pᵗ)
flops = @count_ops power_iteration(sparse_Pᵗ, x, 10)
memory = sizeof(sparse_Pᵗ)
println(flops, "\n", "Memory: $memory bytes")
add64: 11400
mul64: 11400

Memory: 40 bytes

Sparse matrix formats

We take the following matrix for illustration: \[ M = \begin{pmatrix} 5 & . & . \\ . & 6 & 7 \\ 8 & . & 9 \end{pmatrix} \]

  • Coordinate list (COO), presented on the previous slide;

    • rows is [1, 2, 2, 3, 3]

    • cols is [1, 2, 3, 1, 3]

    • vals is [5, 6, 7, 8, 9]

  • Compressed sparse rows (CSR, for information only):

    • Similar to COO, but often more memory-efficient: rows is compressed;

    • Vector rows contains the indices, in cols and vals, where each row starts…

    • … except for rows[m+1], which is 1 + the number of nonzeros;

    • rows is [1, 2, 4, 6]

    • cols is [1, 2, 3, 1, 3]

    • vals is [5, 6, 7, 8, 9]

  • Compressed sparse columns (CSC, for information only):

    • Same as CSR with rows \(\leftrightarrow\) columns;

    • Implemented in Julia by SparseArrays.

using SparseArrays
Z = spzeros(3, 3)
3×3 SparseMatrixCSC{Float64, Int64} with 0 stored entries:
  ⋅    ⋅    ⋅ 
  ⋅    ⋅    ⋅ 
  ⋅    ⋅    ⋅ 

\(~\)

M = [5 0 0; 0 6 7; 8 0 9]
M = sparse(M)
3×3 SparseMatrixCSC{Int64, Int64} with 5 stored entries:
 5  ⋅  ⋅
 ⋅  6  7
 8  ⋅  9

\(~\)

# Constructor using COO format
M = sparse([1, 2, 2, 3, 3],
           [1, 2, 3, 1, 3],
           [5, 6, 7, 8, 9])
3×3 SparseMatrixCSC{Int64, Int64} with 5 stored entries:
 5  ⋅  ⋅
 ⋅  6  7
 8  ⋅  9

\(~\)

println(M.rowval)
println(M.colptr)
println(M.nzval)
[1, 3, 2, 2, 3]
[1, 3, 4, 6]
[5, 8, 6, 7, 9]

See also sprand and spdiagm.