LINEAR ALGEBRA, WITH OPTIMIZATION; VECTORS & MATRICES, LINEAR SUB-SPACES, LINEAR INDEPENDENCE & DIMENSION; MATRIX SUB-SPACES, MATRIX RANK, MATRIX INVERSE, SYSTEMS OF EQUATIONS, etc.
15. LINEAR INDEPENDENCE AND DIMENSION
• Vectors 𝒙1, … , 𝒙𝑚 are linearly independent if
  ∑ᵢ₌₁ᵐ 𝛼𝑖𝒙𝑖 = 0 ⟺ 𝛼1 = ⋯ = 𝛼𝑚 = 0
• Equivalently, every vector in their span is a unique linear combination of the 𝒙𝑖
• dim 𝒱 = 𝑚 if 𝒙1, … , 𝒙𝑚 span 𝒱 and are linearly independent
• If 𝒚1, … , 𝒚𝑘 span 𝒱 then
  • 𝑘 ≥ 𝑚
  • If 𝑘 > 𝑚, the 𝒚𝑖 are NOT linearly independent (checked numerically below)
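A quick numerical check of linear independence, sketched with NumPy; the specific vectors are illustrative:

```python
import numpy as np

# Stack the candidate vectors x_1, ..., x_m as the columns of a matrix.
X = np.column_stack([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],   # x_3 = x_1 + x_2, so the set is dependent
])

# Columns are linearly independent iff the rank equals the number of vectors.
m = X.shape[1]
print(np.linalg.matrix_rank(X) == m)   # False for this example
```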
17. MATRIX SUBSPACES
• Matrix 𝑀 ∈ ℝᵐˣⁿ defines two subspaces
  • Column space: col 𝑀 = {𝑀𝛼 : 𝛼 ∈ ℝⁿ} ⊂ ℝᵐ
  • Row space: row 𝑀 = {𝑀ᵀ𝛽 : 𝛽 ∈ ℝᵐ} ⊂ ℝⁿ
• Nullspace of 𝑀: null 𝑀 = {𝑥 ∈ ℝⁿ : 𝑀𝑥 = 0}
  • null 𝑀 ⊥ row 𝑀
  • dim null 𝑀 + dim row 𝑀 = 𝑛
  • The analogous statements hold for col 𝑀 and null 𝑀ᵀ, with 𝑛 replaced by 𝑚
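These subspaces can be computed numerically via the SVD; a minimal NumPy sketch with an illustrative rank-1 matrix:

```python
import numpy as np

M = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])   # rank 1, so dim null M = 3 - 1 = 2

# Right singular vectors with (near-)zero singular values span null M.
U, s, Vt = np.linalg.svd(M)
r = int(np.sum(s > 1e-10))        # numerical rank
null_basis = Vt[r:].T             # columns form a basis for null M

print(np.allclose(M @ null_basis, 0))   # M x = 0 for every x in null M
print(null_basis.T @ Vt[:r].T)          # ~0: null M is orthogonal to row M
```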
18. MATRIX RANK
• rank 𝑀 is the common dimension of the row and column spaces (they are equal)
• If 𝑀 ∈ ℝᵐˣⁿ has rank 𝑘, it can be decomposed into the product of an 𝑚 × 𝑘 and a 𝑘 × 𝑛 matrix
[Diagram: 𝑀 (𝑚 × 𝑛, rank 𝑘) factored as an 𝑚 × 𝑘 matrix times a 𝑘 × 𝑛 matrix]
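One way to build such a rank-𝑘 factorization is from the SVD, keeping only the nonzero singular values; a minimal NumPy sketch with an illustrative matrix:

```python
import numpy as np

# A 3x2 matrix with rank 2.
M = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 7.0]])

U, s, Vt = np.linalg.svd(M, full_matrices=False)
k = int(np.sum(s > 1e-10))       # numerical rank

A = U[:, :k]                     # m x k
B = s[:k, None] * Vt[:k]         # k x n: rows of Vt scaled by singular values
print(A.shape, B.shape)          # (3, 2) (2, 2)
print(np.allclose(M, A @ B))     # True: M = A B
```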
19. PROPERTIES OF RANK
• For matrices 𝐴 ∈ ℝᵐˣⁿ and 𝐵 (of compatible dimensions)
  1. rank 𝐴 ≤ min(𝑚, 𝑛)
  2. rank 𝐴 = rank 𝐴ᵀ
  3. rank 𝐴𝐵 ≤ min(rank 𝐴, rank 𝐵)
  4. rank(𝐴 + 𝐵) ≤ rank 𝐴 + rank 𝐵
• 𝐴 has full rank if rank 𝐴 = min(𝑚, 𝑛)
  • If 𝑚 > rank 𝐴, the rows are not linearly independent
  • Likewise, if 𝑛 > rank 𝐴, the columns are not linearly independent
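These properties are easy to spot-check numerically; a quick sketch with random (illustrative) matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 6))
B = rng.standard_normal((6, 3))
C = rng.standard_normal((4, 6))
rank = np.linalg.matrix_rank

print(rank(A) <= min(A.shape))                # 1. rank A <= min(m, n)
print(rank(A) == rank(A.T))                   # 2. rank A = rank A^T
print(rank(A @ B) <= min(rank(A), rank(B)))   # 3. rank AB <= min(rank A, rank B)
print(rank(A + C) <= rank(A) + rank(C))       # 4. rank(A + C) <= rank A + rank C
```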
20. OUTLINE
• Basic definitions
• Subspaces and Dimensionality
• Matrix functions: inverses and eigenvalue decompositions
• Convex optimization
21. MATRIX INVERSE
• 𝑀 ∈ ℝᵐˣᵐ is invertible iff rank 𝑀 = 𝑚
• The inverse is unique and satisfies
  1. 𝑀⁻¹𝑀 = 𝑀𝑀⁻¹ = 𝐼
  2. (𝑀⁻¹)⁻¹ = 𝑀
  3. (𝑀ᵀ)⁻¹ = (𝑀⁻¹)ᵀ
  4. If 𝐴 is invertible then 𝑀𝐴 is invertible and (𝑀𝐴)⁻¹ = 𝐴⁻¹𝑀⁻¹
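A quick numerical check of these identities, sketched with NumPy; random square matrices are invertible with probability 1:

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = rng.standard_normal((4, 4))
Minv = np.linalg.inv(M)

print(np.allclose(Minv @ M, np.eye(4)))           # 1. M^{-1} M = I
print(np.allclose(np.linalg.inv(Minv), M))        # 2. (M^{-1})^{-1} = M
print(np.allclose(np.linalg.inv(M.T), Minv.T))    # 3. (M^T)^{-1} = (M^{-1})^T
print(np.allclose(np.linalg.inv(M @ A),
                  np.linalg.inv(A) @ Minv))       # 4. (MA)^{-1} = A^{-1} M^{-1}
```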
22. SYSTEMS OF EQUATIONS
• Given 𝑀 ∈ ℝᵐˣⁿ and 𝑦 ∈ ℝᵐ, we wish to solve
  𝑀𝑥 = 𝑦
• A solution exists only if 𝑦 ∈ col 𝑀
• There may be infinitely many solutions
• If 𝑀 is invertible, then 𝑥 = 𝑀⁻¹𝑦
  • This is a notational device; do not actually invert matrices
  • Computationally, use a solving routine such as Gaussian elimination, as sketched below
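In code this means calling a solver rather than forming 𝑀⁻¹; a minimal NumPy sketch with an illustrative system:

```python
import numpy as np

M = np.array([[3.0, 1.0],
              [1.0, 2.0]])
y = np.array([9.0, 8.0])

# Solve M x = y via an LU-based routine; cheaper and more numerically
# stable than computing inv(M) and multiplying by it.
x = np.linalg.solve(M, y)
print(x)                        # [2. 3.]
print(np.allclose(M @ x, y))    # True
```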
23. SYSTEMS OF EQUATIONS
• What if 𝑦 ∉ col 𝑀?
• Find the 𝑥̂ whose image 𝑦̂ = 𝑀𝑥̂ is closest to 𝑦
  • 𝑦̂ is the projection of 𝑦 onto col 𝑀
  • Also known as (least-squares) regression
• Assume rank 𝑀 = 𝑛 < 𝑚; then
  𝑥̂ = (𝑀ᵀ𝑀)⁻¹𝑀ᵀ𝑦,  𝑦̂ = 𝑀(𝑀ᵀ𝑀)⁻¹𝑀ᵀ𝑦
  • 𝑀ᵀ𝑀 is invertible under this rank assumption; 𝑀(𝑀ᵀ𝑀)⁻¹𝑀ᵀ is the projection matrix onto col 𝑀
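In practice one calls a least-squares routine instead of forming the normal equations explicitly; a sketch with NumPy and illustrative random data:

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((10, 3))   # rank 3 < 10 with probability 1
y = rng.standard_normal(10)

# Preferred in practice: an SVD-based least-squares solver.
x_hat, *_ = np.linalg.lstsq(M, y, rcond=None)

# The slide's normal-equations formula gives the same answer.
x_ne = np.linalg.solve(M.T @ M, M.T @ y)
print(np.allclose(x_hat, x_ne))            # True

y_hat = M @ x_hat                          # projection of y onto col M
print(np.allclose(M.T @ (y - y_hat), 0))   # residual is orthogonal to col M
```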
25. EIGENVALUE DECOMPOSITION
• Eigenvalue decomposition of symmetric 𝑀 ∈ ℝᵐˣᵐ is
  𝑀 = 𝑄Σ𝑄ᵀ = ∑ᵢ₌₁ᵐ 𝜆𝑖 𝒒𝑖𝒒𝑖ᵀ
• Σ = diag(𝜆1, … , 𝜆𝑚) contains the eigenvalues of 𝑀
• 𝑄 is orthogonal and its columns are the eigenvectors 𝒒𝑖 of 𝑀
• If 𝑀 is not symmetric but diagonalizable,
  𝑀 = 𝑄Σ𝑄⁻¹
  • Σ is diagonal but possibly complex
  • 𝑄 is not necessarily orthogonal
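A minimal NumPy sketch of the symmetric case; the matrix is illustrative:

```python
import numpy as np

A = np.random.default_rng(3).standard_normal((4, 4))
M = (A + A.T) / 2                    # symmetrize to get a symmetric M

lam, Q = np.linalg.eigh(M)           # eigenvalues (ascending) and eigenvectors
print(np.allclose(Q @ np.diag(lam) @ Q.T, M))   # M = Q Σ Q^T
print(np.allclose(Q.T @ Q, np.eye(4)))          # Q is orthogonal

# Rank-one expansion: M = Σ_i λ_i q_i q_i^T
M_sum = sum(l * np.outer(q, q) for l, q in zip(lam, Q.T))
print(np.allclose(M_sum, M))                    # True
```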
26. CHARACTERIZATIONS OF EIGENVALUES
• Traditional formulation
𝑀𝑥 = 𝜆𝑥
• Leads to the characteristic polynomial
  det(𝑀 − 𝜆𝐼) = 0
• Rayleigh quotient (symmetric 𝑀):
  max𝑥≠0 (𝑥ᵀ𝑀𝑥) / ‖𝑥‖₂²
  • The maximum is the largest eigenvalue 𝜆ₘₐₓ, attained at its eigenvector (see the power-iteration sketch below)
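The Rayleigh quotient underlies power iteration for the dominant eigenpair; a minimal sketch assuming symmetric 𝑀, with an illustrative matrix and iteration count:

```python
import numpy as np

def power_iteration(M, iters=500, seed=4):
    """Dominant eigenpair of symmetric M via repeated multiplication."""
    x = np.random.default_rng(seed).standard_normal(M.shape[0])
    for _ in range(iters):
        x = M @ x
        x /= np.linalg.norm(x)       # keep the iterate at unit norm
    # Rayleigh quotient x^T M x / ||x||_2^2 (here ||x|| = 1).
    return x @ M @ x, x

M = np.diag([3.0, 1.0, -2.0])        # eigenvalues 3, 1, -2
lam, q = power_iteration(M)
print(lam)                           # ~3.0, the largest eigenvalue
```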
27. EIGENVALUE PROPERTIES
• For 𝑀 ∈ ℝᵐˣᵐ with eigenvalues 𝜆𝑖
  1. tr 𝑀 = ∑ᵢ₌₁ᵐ 𝜆𝑖
  2. det 𝑀 = 𝜆1𝜆2 ⋯ 𝜆𝑚
  3. rank 𝑀 = number of nonzero eigenvalues (for diagonalizable 𝑀)
• When 𝑀 is symmetric
  • The eigenvalue decomposition is a singular value decomposition (up to the signs of the eigenvalues)
  • Eigenvectors for nonzero eigenvalues give an orthonormal basis for row 𝑀 = col 𝑀
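These properties check out numerically; a quick sketch with an illustrative symmetric matrix, so all three hold exactly:

```python
import numpy as np

A = np.random.default_rng(5).standard_normal((4, 4))
M = A @ A.T                          # symmetric (full rank with probability 1)

lam = np.linalg.eigvalsh(M)
print(np.isclose(np.trace(M), lam.sum()))          # tr M = sum of eigenvalues
print(np.isclose(np.linalg.det(M), lam.prod()))    # det M = product of eigenvalues
print(np.linalg.matrix_rank(M) ==
      int(np.sum(~np.isclose(lam, 0.0))))          # rank = # nonzero eigenvalues
```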
28. SIMPLE EIGENVALUE PROOF
• Why is det(𝑀 − 𝜆𝐼) = 0?
• Assume 𝑀 is symmetric and full rank
  1. 𝑀 = 𝑄Σ𝑄ᵀ
  2. 𝑀 − 𝜆𝐼 = 𝑄Σ𝑄ᵀ − 𝜆𝑄𝑄ᵀ = 𝑄(Σ − 𝜆𝐼)𝑄ᵀ, using 𝑄𝑄ᵀ = 𝐼
  3. If 𝜆 = 𝜆𝑖, the 𝑖th eigenvalue of 𝑀 − 𝜆𝐼 is 𝜆𝑖 − 𝜆 = 0
  4. Since det(𝑀 − 𝜆𝐼) is the product of its eigenvalues, one of the factors is 0, so the product is 0
29. OUTLINE
• Basic definitions
• Subspaces and Dimensionality
• Matrix functions: inverses and eigenvalue decompositions
• Convex optimization
30. CONVEX OPTIMIZATION
• Find the minimum of a function subject to constraints on the solution
• Business, economics, and game theory
• Resource allocation
• Optimal planning and strategies
• Statistics and Machine Learning
• All forms of regression and classification
• Unsupervised learning
• Control theory
• Keeping planes in the air!
31. CONVEX SETS
• A set 𝐶 is convex if ∀𝑥, 𝑦 ∈ 𝐶 and ∀𝛼 ∈ [0, 1],
  𝛼𝑥 + (1 − 𝛼)𝑦 ∈ 𝐶
• The line segment between any two points of 𝐶 also lies in 𝐶
• Examples
• Intersection of halfspaces
• 𝐿ₚ balls
• Intersection of convex sets
32. CONVEX FUNCTIONS
• A real-valued function 𝑓 is convex if dom 𝑓 is convex and ∀𝑥, 𝑦 ∈ dom 𝑓, ∀𝛼 ∈ [0, 1],
  𝑓(𝛼𝑥 + (1 − 𝛼)𝑦) ≤ 𝛼𝑓(𝑥) + (1 − 𝛼)𝑓(𝑦)
• The graph of 𝑓 is upper-bounded by the line segment (chord) between any two points (𝑥, 𝑓(𝑥)) and (𝑦, 𝑓(𝑦)) on the graph, as checked numerically below
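The defining inequality is easy to test numerically for a candidate function; a sketch using the convex 𝑓(𝑥) = 𝑥² on illustrative random points:

```python
import numpy as np

f = lambda x: x ** 2                 # a convex function

rng = np.random.default_rng(6)
ok = True
for _ in range(1000):
    x, y = rng.uniform(-5.0, 5.0, size=2)
    a = rng.uniform(0.0, 1.0)
    # Definition: f(a x + (1 - a) y) <= a f(x) + (1 - a) f(y)
    ok &= f(a * x + (1 - a) * y) <= a * f(x) + (1 - a) * f(y) + 1e-12
print(ok)                            # True
```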
35. GRADIENT DESCENT
• To minimize 𝑓, move down the gradient
• But not too far!
• Optimum when 𝛻𝑓 = 0
• Given 𝑓, learning rate 𝛼, starting point 𝑥0:
  𝑥 = 𝑥0
  do until 𝛻𝑓(𝑥) = 0:
    𝑥 = 𝑥 − 𝛼𝛻𝑓(𝑥)
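A runnable version of this loop, sketched in Python with a stopping tolerance standing in for the exact 𝛻𝑓 = 0 test; the objective and parameters are illustrative:

```python
import numpy as np

def gradient_descent(grad_f, x0, alpha=0.1, tol=1e-8, max_iters=10_000):
    """Minimize f by stepping down its gradient until ||grad f|| is small."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iters):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:   # stand-in for the exact grad f = 0 test
            break
        x = x - alpha * g
    return x

# Example: f(x) = ||x - c||^2 has gradient 2(x - c), minimized at x = c.
c = np.array([1.0, -2.0])
print(gradient_descent(lambda x: 2 * (x - c), x0=[0.0, 0.0]))   # ~[1, -2]
```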
36. STOCHASTIC GRADIENT DESCENT
• Many learning problems have extra structure
  𝑓(𝜃) = ∑ᵢ₌₁ⁿ 𝐿(𝜃; 𝒙𝑖)
• Computing the gradient requires iterating over all 𝑛 points, which can be too costly
• Instead, compute the gradient at a single training example
37. STOCHASTIC GRADIENT DESCENT
• Given 𝑓(𝜃) = ∑ᵢ₌₁ⁿ 𝐿(𝜃; 𝒙𝑖), learning rate 𝛼, starting point 𝜃0:
  𝜃 = 𝜃0
  do until 𝑓(𝜃) is nearly optimal:
    for 𝑖 = 1 to 𝑛 in random order:
      𝜃 = 𝜃 − 𝛼𝛻𝐿(𝜃; 𝒙𝑖)
• Finds a nearly optimal 𝜃; a runnable sketch follows
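A runnable sketch of this loop for the least-squares loss 𝐿(𝜃; 𝒙𝑖, 𝑦𝑖) = (𝒙𝑖ᵀ𝜃 − 𝑦𝑖)²; the data, learning rate, and epoch count are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)
n, d = 200, 3
X = rng.standard_normal((n, d))
theta_true = np.array([2.0, -1.0, 0.5])
y = X @ theta_true + 0.01 * rng.standard_normal(n)

theta = np.zeros(d)
alpha = 0.01
for _ in range(50):                      # epochs: "until f nearly optimal"
    for i in rng.permutation(n):         # i = 1..n in random order
        grad_i = 2 * (X[i] @ theta - y[i]) * X[i]   # gradient of L(theta; x_i)
        theta -= alpha * grad_i
print(theta)                             # ~[2, -1, 0.5]
```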