Min-max theorem
Theorem in functional analysis
In linear algebra and functional analysis, the min-max theorem, or variational theorem, or Courant–Fischer–Weyl min-max principle, is a result that gives a variational characterization of eigenvalues of compact Hermitian operators on Hilbert spaces. It can be viewed as the starting point of many results of similar nature.
This article first discusses the finite-dimensional case and its applications before considering compact operators on infinite-dimensional Hilbert spaces. For compact operators, the proof of the main theorem uses essentially the same idea as the finite-dimensional argument.
In the case that the operator is non-Hermitian, the theorem provides an equivalent characterization of the associated singular values. The min-max theorem can be extended to self-adjoint operators that are bounded below.
Matrices
Let A be an n × n Hermitian matrix. As with many other variational results on eigenvalues, one considers the Rayleigh–Ritz quotient R_A : C^n \ {0} → R defined by
:R_A(x) = \frac{(Ax, x)}{(x,x)}
where (⋅, ⋅) denotes the Euclidean inner product on C^n. Equivalently, the Rayleigh–Ritz quotient can be replaced by
:f(x) = (Ax, x), \; |x| = 1.
The Rayleigh quotient of an eigenvector v is its associated eigenvalue \lambda because R_A(v) = (\lambda v, v)/(v, v) = \lambda. For a Hermitian matrix A, the range of the continuous functions R_A(x) and f(x) is a compact interval [a, b] of the real line. The maximum b and the minimum a are the largest and smallest eigenvalue of A, respectively. The min-max theorem is a refinement of this fact.
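The range statement can be checked numerically. The following is a minimal sketch, assuming NumPy and a randomly generated Hermitian test matrix; the helper name `rayleigh` is illustrative and not part of any library.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Random Hermitian test matrix (illustrative data only).
G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (G + G.conj().T) / 2


def rayleigh(A, x):
    """Rayleigh quotient (Ax, x) / (x, x); real-valued for Hermitian A."""
    return (np.vdot(x, A @ x) / np.vdot(x, x)).real


eigvals, eigvecs = np.linalg.eigh(A)  # eigenvalues in ascending order
a, b = eigvals[0], eigvals[-1]

# Rayleigh quotients of random nonzero vectors stay inside [a, b].
samples = [
    rayleigh(A, rng.standard_normal(n) + 1j * rng.standard_normal(n))
    for _ in range(1000)
]

# Each eigenvector attains its own eigenvalue.
attained = [rayleigh(A, eigvecs[:, i]) for i in range(n)]
```

Random sampling only explores the interior of [a, b]; the endpoints are reached at the eigenvectors for the smallest and largest eigenvalue.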
Min-max theorem
Let A be Hermitian on an inner product space V with dimension n, with spectrum ordered in descending order \lambda_1 \geq ... \geq \lambda_n.
Let v_1, ..., v_n be the corresponding unit-length orthogonal eigenvectors.
Reverse the spectrum ordering, so that \xi_1 = \lambda_n, ..., \xi_n = \lambda_1.
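Poincaré's inequality. Let M be a subspace of V with dimension k. Then there exist unit vectors x, y \in M such that \langle x, Ax\rangle\leq \lambda_k, and \langle y, Ay\rangle \geq \xi_k.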
Proof. Part 2 (the existence of y) is a corollary of part 1 applied to -A, so it suffices to find x.
M is a k-dimensional subspace, and N := span(v_k, ..., v_n) has dimension n-k+1. Since dim M + dim N = n+1 > n, the intersection M \cap N contains at least a line.
Take a unit vector x \in M\cap N. It has the required property:
: x = \sum_{i=k}^n a_i v_i, since x\in N.
: Since \sum_{i=k}^n |a_i|^2 = 1, we find \langle x,Ax \rangle = \sum_{i=k}^n |a_i|^2\lambda_i \leq \lambda_k.
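The construction in this proof can be carried out explicitly. The sketch below assumes NumPy and a random real symmetric test matrix; it finds a vector in M ∩ N from the null space of the block matrix [M, -N], which is one possible way to compute the intersection and is not taken from the source.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 6, 3

S = rng.standard_normal((n, n))
A = (S + S.T) / 2                    # real symmetric test matrix

w, V = np.linalg.eigh(A)             # ascending eigenvalues
lam = w[::-1]                        # lam[0] >= ... >= lam[n-1]
V = V[:, ::-1]                       # V[:, i] is a unit eigenvector for lam[i]

M = rng.standard_normal((n, k))      # columns span a random k-dimensional subspace
N = V[:, k - 1:]                     # span(v_k, ..., v_n), dimension n-k+1

# M c = N d  <=>  [M, -N] (c; d) = 0. The block matrix is n x (n+1),
# so it always has a nontrivial null vector (the last right singular vector).
_, _, Vh = np.linalg.svd(np.hstack([M, -N]))
c = Vh[-1, :k]

x = M @ c                            # lies in both M and N
x /= np.linalg.norm(x)
quad = x @ A @ x                     # <x, Ax>, bounded above by lambda_k
```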
\begin{aligned} \lambda_k &=\max _{\begin{array}{c} \mathcal{M} \subset V \\ \operatorname{dim}(\mathcal{M})=k \end{array}} \min _{\begin{array}{c} x \in \mathcal{M} \\ |x|=1 \end{array}}\langle x, A x\rangle \\ &=\min _{\begin{array}{c} \mathcal{M} \subset V \\ \operatorname{dim}(\mathcal{M})=n-k+1 \end{array}} \max _{\begin{array}{c} x \in \mathcal{M} \\ |x|=1 \end{array}}\langle x, A x\rangle \text{. } \end{aligned}
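Proof. Part 2 (the second equality) is a corollary of part 1, by using -A.
By Poincaré's inequality, \lambda_k is an upper bound for the right-hand side of the first equality: every k-dimensional \mathcal M contains a unit vector x with \langle x, Ax\rangle \leq \lambda_k.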
By setting \mathcal M = span(v_1, ... v_k), the upper bound is achieved.
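Both halves of this argument can be illustrated numerically. The sketch below assumes NumPy and a random symmetric test matrix; `min_on_subspace` is a hypothetical helper that uses the fact that the minimum of ⟨x, Ax⟩ over unit vectors in a subspace with orthonormal basis Q is the smallest eigenvalue of Q^T A Q.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 6, 3

S = rng.standard_normal((n, n))
A = (S + S.T) / 2

w, V = np.linalg.eigh(A)
lam, V = w[::-1], V[:, ::-1]         # descending order, as in the theorem


def min_on_subspace(A, B):
    """Minimum of <x, Ax> over unit vectors x in the column span of B."""
    Q, _ = np.linalg.qr(B)           # orthonormal basis of the subspace
    return np.linalg.eigvalsh(Q.T @ A @ Q)[0]


# The subspace span(v_1, ..., v_k) attains the maximum lambda_k.
best = min_on_subspace(A, V[:, :k])

# Random k-dimensional subspaces never exceed lambda_k.
trials = [min_on_subspace(A, rng.standard_normal((n, k))) for _ in range(500)]
```

Random subspaces typically give values strictly below \lambda_k, which is why the maximum over subspaces is needed in the statement.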
Define the partial trace tr_V(A) to be the trace of the compression of A to V. It is equal to \sum_i v_i^*Av_i for any orthonormal basis (v_i) of V.
Let 1 \leq i_1 < \cdots < i_k \leq n be integers. Define a partial flag to be a nested collection V_1 \subset \cdots \subset V_k of subspaces of \mathbb{C}^n such that \operatorname{dim}\left(V_j\right)=i_j for all 1 \leq j \leq k.
Define the associated Schubert variety X\left(V_1, \ldots, V_k\right) to be the collection of all k dimensional subspaces W such that \operatorname{dim}\left(W \cap V_j\right) \geq j.
\lambda_{i_1}(A)+\cdots+\lambda_{i_k}(A)=\sup_{V_1, \ldots, V_k} \inf_{W \in X\left(V_1, \ldots, V_k\right)} tr_W(A)
The \leq case.
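Let V_{j} = span(e_1, \dots, e_{i_j}), where e_1, \dots, e_n are orthonormal eigenvectors of A for \lambda_1 \geq \cdots \geq \lambda_n, and take any W \in X\left(V_1, \ldots, V_k\right). It remains to show that \lambda_{i_1}(A)+\cdots+\lambda_{i_k}(A) \leq tr_W(A)
To show this, we construct an orthonormal set of vectors v_1, \dots, v_k such that v_j \in V_j \cap W. Then tr_W(A) = \sum_j \langle v_j, Av_j\rangle \geq \sum_j \lambda_{i_j}(A), because the v_j form an orthonormal basis of W and each v_j lies in V_j.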
Since dim(V_1 \cap W) \geq 1, we pick any unit v_1 \in V_1 \cap W. Next, since dim(V_2 \cap W) \geq 2, we pick any unit v_2 \in (V_2 \cap W) that is perpendicular to v_1, and so on.
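This direction of the proof can be tested on random members of the Schubert variety. The sketch below assumes NumPy, a random symmetric matrix, the index choice (2, 4, 5), and that the e_i are eigenvectors ordered by decreasing eigenvalue; `partial_trace` and `random_member` are hypothetical helpers. A subspace W spanned by vectors w_j \in V_j automatically satisfies dim(W \cap V_j) \geq j because the V_j are nested.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 7
idx = [2, 4, 5]                      # i_1 < i_2 < i_3 (1-based), so k = 3

S = rng.standard_normal((n, n))
A = (S + S.T) / 2

w, E = np.linalg.eigh(A)
lam, E = w[::-1], E[:, ::-1]         # E[:, :i] spans the flag space of dimension i

target = sum(lam[i - 1] for i in idx)  # lambda_{i_1} + ... + lambda_{i_k}


def partial_trace(A, B):
    """Trace of the compression of A to the column span of B."""
    Q, _ = np.linalg.qr(B)
    return np.trace(Q.T @ A @ Q)


def random_member():
    """Basis of a random W with one spanning vector w_j drawn from each V_j."""
    cols = [E[:, :i] @ rng.standard_normal(i) for i in idx]
    return np.stack(cols, axis=1)


traces = [partial_trace(A, random_member()) for _ in range(500)]

# W = span(e_{i_1}, ..., e_{i_k}) attains the lower bound.
extremal = partial_trace(A, E[:, [i - 1 for i in idx]])
```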
The \geq case.
For any such sequence of subspaces V_i, we must find some W \in X\left(V_1, \ldots, V_k\right) such that \lambda_{i_1}(A)+\cdots+\lambda_{i_k}(A) \geq tr_W(A)
Now we prove this by induction.
The n=1 case is the Courant-Fischer theorem. Assume now n \geq 2.
If i_1 \geq 2, then we can apply induction. Let E = span(e_{i_1}, \dots, e_n). We construct a partial flag within E from the intersection of E with V_1, \dots, V_k.
We begin by picking a (i_k-(i_1-1))-dimensional subspace W_k' \subset E \cap V_{i_k}, which exists by counting dimensions. This has codimension (i_1-1) within V_{i_k}.
Then we go down by one space, to pick a (i_{k-1} - (i_1 - 1))-dimensional subspace W_{k-1}' \subset W_k \cap V_{i_{k-1}}. This still exists, and so on.
Now since dim(E) \leq n-1, apply the induction hypothesis: there exists some W \in X(W_1, \dots, W_k) such that
\lambda_{i_1 - (i_1-1)}(A|E)+\cdots+\lambda_{i_k- (i_1-1)}(A|E) \geq tr_W(A)
Here \lambda_{i_j - (i_1-1)}(A|E) is the (i_j-(i_1-1))-th eigenvalue of A orthogonally projected down to E. By the Cauchy interlacing theorem, \lambda_{i_j - (i_1-1)}(A|E) \leq \lambda_{i_j}(A). Since X(W_1, \dots, W_k)\subset X(V_1, \dots, V_k), we're done.
If i_1 = 1, then we perform a similar construction. Let E = span(e_{2}, \dots, e_n). If V_k \subset E, then we can induct. Otherwise, we construct a partial flag sequence W_2, \dots, W_k. By induction, there exists some W' \in X(W_2, \dots, W_k)\subset X(V_2, \dots, V_k), such that
\lambda_{i_2-1}(A|E)+\cdots+\lambda_{i_k-1}(A|E) \geq tr_{W'}(A)
thus
\lambda_{i_2}(A)+\cdots+\lambda_{i_k}(A) \geq tr_{W'}(A)
and it remains to find some v such that W' \oplus v \in X(V_1, \dots, V_k).
If V_1 \not\subset W', then any v \in V_1 \setminus W' would work. Otherwise, if V_2 \not\subset W', then any v \in V_2 \setminus W' would work, and so on. If none of these work, then it means V_k \subset E, contradiction.
This has some corollaries:
\lambda_1(A)+\dots+\lambda_k(A)=\sup_{\operatorname{dim}(V)=k }tr_V(A)
\xi_1(A)+\dots+\xi_k(A)=\inf_{\operatorname{dim}(V)=k }tr_V(A)
The sum \lambda_1(A)+\dots+\lambda_k(A) is a convex function, and \xi_1(A)+\dots+\xi_k(A) is concave.
(Schur-Horn inequality) \xi_1(A)+\dots+\xi_k(A) \leq a_{i_1,i_1} + \dots + a_{i_k,i_k} \leq \lambda_1(A)+\dots+\lambda_k(A) for any subset of indices.
Equivalently, this states that the diagonal vector of A is majorized by its eigenspectrum.
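These corollaries lend themselves to a quick numerical check. The sketch below assumes NumPy and a random symmetric test matrix; `partial_trace` is a hypothetical helper computing tr_V(A) from any basis of V. Majorization of the diagonal is checked by comparing partial sums of the sorted diagonal with partial sums of the sorted eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 6, 3

S = rng.standard_normal((n, n))
A = (S + S.T) / 2

w, V = np.linalg.eigh(A)
lam, V = w[::-1], V[:, ::-1]         # lam[0] >= ... >= lam[n-1]


def partial_trace(A, B):
    """tr_V(A) for V = column span of B."""
    Q, _ = np.linalg.qr(B)
    return np.trace(Q.T @ A @ Q)


top = partial_trace(A, V[:, :k])         # lambda_1 + ... + lambda_k
bottom = partial_trace(A, V[:, n - k:])  # xi_1 + ... + xi_k
trials = [partial_trace(A, rng.standard_normal((n, k))) for _ in range(500)]

# Schur-Horn: partial sums of the sorted diagonal never exceed
# partial sums of the sorted eigenvalues, with equality for the full sum.
partial_diag = np.cumsum(np.sort(np.diag(A))[::-1])
partial_eig = np.cumsum(lam)
```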
Given Hermitian A, B and a Hölder pair 1/p + 1/q = 1, |\operatorname{tr}(A B)| \leq \|A\|_{S^p}\|B\|_{S^q}
WLOG, B is diagonalized; then we need to show |\sum_i B_{ii} A_{ii} | \leq \|A\|_{S^p} \|(B_{ii})\|_{l^q}
By the standard Hölder inequality, it suffices to show \|(A_{ii})\|_{l^p}\leq \|A\|_{S^p}
By the Schur-Horn inequality, the diagonals of A are majorized by the eigenspectrum of A, and since the map f(x_1, \dots, x_n) = \|x\|_p is symmetric and convex, it is Schur-convex. Hence \|(A_{ii})\|_{l^p} \leq \|(\lambda_i(A))\|_{l^p} = \|A\|_{S^p}.
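The inequality and the intermediate diagonal bound can be spot-checked on random data. The sketch below assumes NumPy, takes S^p to mean the Schatten p-norm (the l^p norm of the singular values), and uses the arbitrary choice p = 3; `schatten_norm` and `random_hermitian` are hypothetical helpers.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 5, 3.0
q = p / (p - 1.0)                    # Hölder conjugate: 1/p + 1/q = 1


def schatten_norm(A, p):
    """Schatten p-norm: the l^p norm of the singular values of A."""
    s = np.linalg.svd(A, compute_uv=False)
    return np.sum(s ** p) ** (1.0 / p)


def random_hermitian(n):
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (G + G.conj().T) / 2


pairs = [(random_hermitian(n), random_hermitian(n)) for _ in range(200)]

# ||A||_{S^p} ||B||_{S^q} - |tr(AB)| should be nonnegative.
gaps = [
    schatten_norm(A, p) * schatten_norm(B, q) - abs(np.trace(A @ B))
    for A, B in pairs
]

# Key step of the proof: the l^p norm of the diagonal is at most ||A||_{S^p}.
diag_gaps = [
    schatten_norm(A, p) - np.linalg.norm(np.diag(A), ord=p) for A, _ in pairs
]
```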
Counterexample in the non-Hermitian case
Let N be the nilpotent matrix
:\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}.
Define the Rayleigh quotient R_N(x) exactly as above in the Hermitian case. Then it is easy to see that the only eigenvalue of N is zero, while the maximum value of the Rayleigh quotient is 1/2. That is, the maximum value of the Rayleigh quotient is larger than the maximum eigenvalue.
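A short computation makes the counterexample concrete. The sketch below assumes NumPy and restricts to real unit vectors x = (cos t, sin t), for which the quotient equals cos t · sin t; the comparison with the symmetric part (N + N^T)/2 is an added cross-check, not part of the source argument.

```python
import numpy as np

N = np.array([[0.0, 1.0],
              [0.0, 0.0]])

eigenvalues = np.linalg.eigvals(N)   # N is nilpotent, so both eigenvalues are 0

# Scan real unit vectors x = (cos t, sin t); then x^T N x = cos(t) * sin(t).
t = np.linspace(0.0, np.pi, 1801)
X = np.stack([np.cos(t), np.sin(t)])             # shape (2, 1801)
quotients = np.einsum("it,ij,jt->t", X, N, X)    # x^T N x for every column x
max_quotient = quotients.max()

# Cross-check: x^T N x = x^T ((N + N^T)/2) x, so the maximum equals the
# largest eigenvalue of the symmetric part of N.
sym_max = np.linalg.eigvalsh((N + N.T) / 2)[-1]
```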
Applications
Min-max principle for singular values
The singular values {σ_k} of a square matrix M are the square roots of the eigenvalues of M*M (equivalently MM*). An immediate consequence of the first equality in the min-max theorem is:
:\sigma_k^{\downarrow} = \max_{S:\dim(S)=k} \min_{x \in S, |x| = 1} (M^* Mx, x)^{\frac{1}{2}}=\max_{S:\dim(S)=k} \min_{x \in S, |x| = 1} | Mx |.
Similarly,
:\sigma_k^{\downarrow} = \min_{S:\dim(S)=n-k+1} \max_{x \in S, |x| = 1} | Mx |.
Here \sigma_k^{\downarrow} denotes the kth entry in the decreasing sequence of the singular values, so that \sigma_1^{\downarrow} \geq \sigma_2^{\downarrow} \geq \cdots .
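The max-min formula for singular values can be checked in the same way as for eigenvalues. The sketch below assumes NumPy and a random real square matrix; `min_stretch` is a hypothetical helper that uses the fact that the minimum of |Mx| over unit vectors in a subspace with orthonormal basis Q is the smallest singular value of MQ.

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 5, 2
M = rng.standard_normal((n, n))

U, sigma, Vh = np.linalg.svd(M)      # sigma is returned in decreasing order

# Singular values as square roots of the eigenvalues of M^T M.
from_gram = np.sqrt(np.linalg.eigvalsh(M.T @ M)[::-1])


def min_stretch(M, B):
    """Minimum of |Mx| over unit vectors x in the column span of B."""
    Q, _ = np.linalg.qr(B)
    return np.linalg.svd(M @ Q, compute_uv=False)[-1]


# The span of the top-k right singular vectors attains sigma_k.
best = min_stretch(M, Vh[:k].T)

# Random k-dimensional subspaces never exceed sigma_k.
trials = [min_stretch(M, rng.standard_normal((n, k))) for _ in range(500)]
```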
Cauchy interlacing theorem
Main article: Poincaré separation theorem
Let A be a symmetric n × n matrix. The m × m matrix B, where m ≤ n, is called a compression of A if there exists an orthogonal projection P onto a subspace of dimension m such that PAP* = B. The Cauchy interlacing theorem states:
:Theorem. If the eigenvalues of A are α_1 ≤ ... ≤ α_n, and those of B are β_1 ≤ ... ≤ β_j ≤ ... ≤ β_m, then for all j ≤ m,
::\alpha_j \leq \beta_j \leq \alpha_{n-m+j}.
This can be proven using the min-max principle. Let β_i have corresponding eigenvector b_i and S_j be the j-dimensional subspace S_j = span{b_1, ..., b_j}, then
:\beta_j = \max_{x \in S_j, |x| = 1} (Bx, x) = \max_{x \in S_j, |x| = 1} (PAP^*x, x) \geq \min_{S_j} \max_{x \in S_j, |x| = 1} (A(P^*x), P^*x) = \alpha_j.
According to the first part of min-max, α_j ≤ β_j. On the other hand, if we define S_{m−j+1} = span{b_j, ..., b_m}, then
:\beta_j = \min_{x \in S_{m-j+1}, |x| = 1} (Bx, x) = \min_{x \in S_{m-j+1}, |x| = 1} (PAP^*x, x)= \min_{x \in S_{m-j+1}, |x| = 1} (A(P^*x), P^*x) \leq \alpha_{n-m+j},
where the last inequality is given by the second part of min-max.
When n − m = 1, we have α_j ≤ β_j ≤ α_{j+1}, hence the name interlacing theorem.
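The interlacing bounds are easy to verify numerically. The sketch below assumes NumPy, a random symmetric A, and a compression B = Q^T A Q built from a random matrix Q with orthonormal columns (so that B is the compression of A to the column span of Q); the sizes n = 7 and m = 4 are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(6)
n, m = 7, 4

S = rng.standard_normal((n, n))
A = (S + S.T) / 2

# Orthonormal basis of a random m-dimensional subspace.
Q, _ = np.linalg.qr(rng.standard_normal((n, m)))
B = Q.T @ A @ Q                      # m x m compression of A

alpha = np.linalg.eigvalsh(A)        # alpha_1 <= ... <= alpha_n
beta = np.linalg.eigvalsh(B)         # beta_1 <= ... <= beta_m

# With 0-based index j, the theorem reads alpha[j] <= beta[j] <= alpha[n-m+j].
lower_ok = all(alpha[j] <= beta[j] + 1e-10 for j in range(m))
upper_ok = all(beta[j] <= alpha[n - m + j] + 1e-10 for j in range(m))
```

Choosing Q as m columns of the identity matrix reproduces the familiar special case where B is a principal submatrix of A.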
Lidskii's inequality
Main article: Trace class#Lidskii's theorem
For integers 1 \leq i_1 < \cdots < i_k \leq n and Hermitian A, B:
\begin{aligned} & \lambda_{i_1}(A+B)+\cdots+\lambda_{i_k}(A+B) \\ & \quad \leq \lambda_{i_1}(A)+\cdots+\lambda_{i_k}(A)+\lambda_1(B)+\cdots+\lambda_k(B) \end{aligned}
\begin{aligned} & \lambda_{i_1}(A+B)+\cdots+\lambda_{i_k}(A+B) \\ & \quad \geq \lambda_{i_1}(A)+\cdots+\lambda_{i_k}(A)+\xi_1(B)+\cdots+\xi_k(B) \end{aligned}
The second inequality follows from the first applied to -A and -B. The first follows from the Wielandt minimax formula:
\begin{aligned} & \lambda_{i_1}(A+B)+\cdots+\lambda_{i_k}(A+B) \\ =& \sup_{V_1, \dots, V_k} \inf_{W\in X(V_1, \dots, V_k)}(tr_W(A) + tr_W(B)) \\ =& \sup_{V_1, \dots, V_k} ( \inf_{W\in X(V_1, \dots, V_k)} tr_W(A) + tr_W(B)) \\ \leq& \sup_{V_1, \dots, V_k} ( \inf_{W\in X(V_1, \dots, V_k)} tr_W(A) + (\lambda_1(B)+\cdots+\lambda_k(B))) \\ =& \lambda_{i_1}(A)+\cdots+\lambda_{i_k}(A)+\lambda_1(B)+\cdots+\lambda_k(B) \end{aligned}
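Because the index sets are finite, both inequalities can be checked exhaustively for a small matrix. The sketch below assumes NumPy and two random symmetric 5 × 5 matrices; `random_symmetric` and `eig_desc` are hypothetical helpers, and the loop runs over every nonempty index subset.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(7)
n = 5


def random_symmetric(n):
    S = rng.standard_normal((n, n))
    return (S + S.T) / 2


def eig_desc(A):
    """Eigenvalues in decreasing order: lambda_1 >= ... >= lambda_n."""
    return np.linalg.eigvalsh(A)[::-1]


A, B = random_symmetric(n), random_symmetric(n)
lam_A, lam_B, lam_AB = eig_desc(A), eig_desc(B), eig_desc(A + B)

violations = 0
for k in range(1, n + 1):
    for subset in combinations(range(n), k):
        idx = list(subset)
        diff = lam_AB[idx].sum() - lam_A[idx].sum()
        upper = lam_B[:k].sum()          # lambda_1(B) + ... + lambda_k(B)
        lower = lam_B[n - k:].sum()      # xi_1(B) + ... + xi_k(B)
        if not (lower - 1e-10 <= diff <= upper + 1e-10):
            violations += 1
```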
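Note that \sum_i \lambda_i(A+B) = tr(A+B) = \sum_i (\lambda_i(A) + \lambda_i(B)). In other words, \lambda(A+B) - \lambda(A) \preceq \lambda(B) where \preceq means majorization. By the Schur convexity theorem, we then have the p-Wielandt-Hoffman inequality \|\lambda(A+B) - \lambda(A)\|_{l^p} \leq \|B\|_{S^p}, where \|\cdot\|_{S^p} denotes the p-Schatten norm.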
Compact operators
Let A be a compact, Hermitian operator on a Hilbert space H. Recall that the non-zero spectrum of such an operator consists of real eigenvalues with finite multiplicities whose only possible cluster point is zero. If A has infinitely many positive eigenvalues, they accumulate at zero. In this case, we list the positive eigenvalues of A as
:\cdots \le \lambda_k \le \cdots \le \lambda_1,
where entries are repeated with multiplicity, as in the matrix case. (To emphasize that the sequence is decreasing, we may write \lambda_k = \lambda_k^\downarrow.) We now apply the same reasoning as in the matrix case. Letting S_k ⊂ H be a k-dimensional subspace, we can obtain the following theorem.
:Theorem (Min-Max). Let A be a compact, self-adjoint operator on a Hilbert space H, whose positive eigenvalues are listed in decreasing order ... ≤ λ_k ≤ ... ≤ λ_1. Then:
::\begin{align} \max_{S_k} \min_{x \in S_k, |x| = 1} (Ax,x) &= \lambda_k ^{\downarrow}, \\ \min_{S_{k-1}} \max_{x \in S_{k-1}^{\perp}, |x|=1} (Ax, x) &= \lambda_k^{\downarrow}. \end{align}
A similar pair of equalities holds for negative eigenvalues.
Proof. Let u_1, u_2, \ldots be orthonormal eigenvectors of A corresponding to \lambda_1, \lambda_2, \ldots, and let S' be the closure of the linear span \operatorname{span}\{u_k,u_{k+1},\ldots\}. The subspace S' has codimension k − 1. By the same dimension count argument as in the matrix case, S' ∩ S_k has positive dimension. So there exists x ∈ S' ∩ S_k with |x|=1. Since it is an element of S', such an x necessarily satisfies
:(Ax, x) \le \lambda_k.
Therefore, for all S_k
:\inf_{x \in S_k, |x| = 1}(Ax,x) \le \lambda_k
But A is compact, therefore the function f(x) = (Ax, x) is weakly continuous. Furthermore, any bounded set in H is weakly compact. This lets us replace the infimum by minimum:
:\min_{x \in S_k, |x| = 1}(Ax,x) \le \lambda_k.
So
:\sup_{S_k} \min_{x \in S_k, |x| = 1}(Ax,x) \le \lambda_k.
Because equality is achieved when S_k=\operatorname{span}\{u_1,\ldots,u_k\},
:\max_{S_k} \min_{x \in S_k, |x| = 1}(Ax,x) = \lambda_k.
This is the first part of min-max theorem for compact self-adjoint operators.
Analogously, consider now a (k − 1)-dimensional subspace S_{k−1}, whose orthogonal complement is denoted by S_{k−1}^⊥. If S' = span{u_1, ..., u_k}, then
:S' \cap S_{k-1}^{\perp} \ne \{0\}.
So
:\exists x \in S_{k-1}^{\perp} , |x| = 1, (Ax, x) \ge \lambda_k.
This implies
:\max_{x \in S_{k-1}^{\perp}, |x| = 1} (Ax, x) \ge \lambda_k
where the compactness of A was applied. Taking the infimum over all (k − 1)-dimensional subspaces S_{k−1} gives
:\inf_{S_{k-1}} \max_{x \in S_{k-1}^{\perp}, |x|=1} (Ax, x) \ge \lambda_k.
Pick S_{k−1} = span{u_1, ..., u_{k−1}} and we deduce
:\min_{S_{k-1}} \max_{x \in S_{k-1}^{\perp}, |x|=1} (Ax, x) = \lambda_k.
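The second formula can be illustrated on a finite-dimensional discretization of a compact operator. The sketch below assumes NumPy and uses, purely as an example not taken from the source, the integral operator (Kf)(s) = \int_0^1 \min(s,t) f(t)\,dt, whose eigenvalues are known to be 1/((k - 1/2)^2 \pi^2). The grid size n = 200 and the helper `max_on_complement` are illustrative choices.

```python
import numpy as np

n, k = 200, 3
t = np.arange(1, n + 1) / n

# Discretization of the kernel min(s, t) on a uniform grid (quadrature weight 1/n).
A = np.minimum.outer(t, t) / n

w, U = np.linalg.eigh(A)
lam, U = w[::-1], U[:, ::-1]         # positive eigenvalues, decreasing toward 0


def max_on_complement(A, B):
    """Maximum of (Ax, x) over unit vectors x orthogonal to the columns of B."""
    Q, _ = np.linalg.qr(B, mode="complete")
    C = Q[:, B.shape[1]:]            # orthonormal basis of the orthogonal complement
    return np.linalg.eigvalsh(C.T @ A @ C)[-1]


rng = np.random.default_rng(8)

# S_{k-1} = span(u_1, ..., u_{k-1}) attains the minimum lambda_k.
best = max_on_complement(A, U[:, :k - 1])

# Any other (k-1)-dimensional subspace gives a value of at least lambda_k.
trials = [max_on_complement(A, rng.standard_normal((n, k - 1))) for _ in range(20)]

# Eigenvalues of the continuous operator, for comparison with lam[:5].
exact = 1.0 / ((np.arange(1, 6) - 0.5) ** 2 * np.pi ** 2)
```

The discretized eigenvalues agree with the continuous ones only up to a discretization error that shrinks as n grows.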
Self-adjoint operators
The min-max theorem also applies to (possibly unbounded) self-adjoint operators. Recall the essential spectrum is the spectrum without isolated eigenvalues of finite multiplicity. Sometimes we have some eigenvalues below the essential spectrum, and we would like to approximate the eigenvalues and eigenfunctions.
:Theorem (Min-Max). Let A be self-adjoint, and let E_1\le E_2\le E_3\le\cdots be the eigenvalues of A below the essential spectrum. Then
E_n=\min_{\psi_1,\ldots,\psi_{n}}\max\{\langle\psi,A\psi\rangle:\psi\in\operatorname{span}(\psi_1,\ldots,\psi_{n}), \, | \psi | = 1\}.
If we only have N eigenvalues and hence run out of eigenvalues, then we let E_n:=\inf\sigma_{ess}(A) (the bottom of the essential spectrum) for n > N, and the above statement holds after replacing min-max with inf-sup.
:Theorem (Max-Min). Let A be self-adjoint, and let E_1\le E_2\le E_3\le\cdots be the eigenvalues of A below the essential spectrum. Then
E_n=\max_{\psi_1,\ldots,\psi_{n-1}}\min\{\langle\psi,A\psi\rangle:\psi\perp\psi_1,\ldots,\psi_{n-1}, \, | \psi | = 1\}.
If we only have N eigenvalues and hence run out of eigenvalues, then we let E_n:=\inf\sigma_{ess}(A) (the bottom of the essential spectrum) for n > N, and the above statement holds after replacing max-min with sup-inf.
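One practical consequence of the min-max form is that restricting A to any finite-dimensional trial space yields upper bounds on the E_n. The sketch below assumes NumPy and uses a finite-difference matrix as a finite-dimensional stand-in for -d^2/dx^2 on (0, \pi) with Dirichlet boundary conditions; the operator, the grid, and the polynomial trial functions x^j(\pi - x) are all illustrative assumptions, not taken from the source.

```python
import numpy as np

n, m = 200, 6
h = np.pi / (n + 1)

# Finite-difference stand-in for -d^2/dx^2 on (0, pi) with Dirichlet conditions.
A = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
E = np.linalg.eigvalsh(A)            # E[0] <= E[1] <= ...

# Trial functions x^j (pi - x), j = 1..m, sampled on the interior grid points.
x = h * np.arange(1, n + 1)
trial = np.stack([x**j * (np.pi - x) for j in range(1, m + 1)], axis=1)

# Orthonormalize the trial space and restrict A to it.
Q, _ = np.linalg.qr(trial)
ritz = np.linalg.eigvalsh(Q.T @ A @ Q)   # ascending eigenvalues of the restriction

# Min-max: the j-th eigenvalue of the restriction is an upper bound for E_j.
bounds_hold = all(ritz[j] >= E[j] - 1e-8 for j in range(m))
```

How tight the bounds are depends entirely on how well the trial space approximates the true eigenvectors.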
The proofs use the following results about self-adjoint operators:
:Theorem. Let A be self-adjoint. Then (A-E)\ge0 for E\in\mathbb{R} if and only if \sigma(A)\subseteq[E,\infty).
:Theorem. If A is self-adjoint, then
\inf\sigma(A)=\inf_{\psi\in\mathfrak{D}(A),|\psi|=1}\langle\psi,A\psi\rangle
and
\sup\sigma(A)=\sup_{\psi\in\mathfrak{D}(A),|\psi|=1}\langle\psi,A\psi\rangle.
This article was imported from Wikipedia and is available under the Creative Commons Attribution-ShareAlike 4.0 License. Content has been adapted to SurfDoc format. Original contributors can be found on the article history page.