1
Institute of Manufacturing Information and Systems (製造資訊與系統研究所)
Institute of Engineering Management (工程管理碩士在職專班)
National Cheng Kung University (國立成功大學)
Advisor: Dr. 李家岩
Presenter: 洪紹嚴 (Shao-Yen Hung)
Date: 2016/05/06
Online Optimization Problem(1)
Productivity Optimization Lab Shao-Yen Hung
0. Agenda
1. Problem Definition
2. Background Knowledge
 Convex Function, Subgradient
 Lagrange Multiplier, KKT Conditions
 Loss Function
 Regularization
 BGD (Batch Gradient Descent) vs SGD (Stochastic Gradient Descent)
3. SCR (Simple Coefficient Rounding); TG (Truncated Gradient)
4. FOBOS (Forward-Backward Splitting / Forward Looking Subgradients)
5. RDA (Regularized Dual Averaging)
6. FTRL (Follow-the-Regularized-Leader)
2
Productivity Optimization Lab Shao-Yen Hung
0. Agenda
1. Problem Definition
2. Background Knowledge
 Convex Function, Subgradient
 Lagrange Multiplier, KKT Conditions
 Loss Function
 Regularization
 BGD (Batch Gradient Descent) vs SGD (Stochastic Gradient Descent)
3. SCR (Simple Coefficient Rounding); TG (Truncated Gradient)
4. FOBOS (Forward-Backward Splitting / Forward Looking Subgradients)
5. RDA (Regularized Dual Averaging)
6. FTRL (Follow-the-Regularized-Leader)
3
Productivity Optimization Lab Shao-Yen Hung
1. Problem Definition (Online Optimization)
• Training a model (OLS, NN, SVM, ...) is an optimization process (e.g., find the optimal
w* that minimizes the total error between the predictions and the actual values).
• Traditionally, when new data arrive, all of the previously trained data are trained again
from scratch to obtain a new w*. This approach is called Batch Learning.
 Drawbacks: slow, large memory requirement
• Correcting w* based only on the new data is called Online Learning.
 Advantages: fast, small memory requirement
• Use cases: ad click analysis, portfolio management, product recommendation, ... (big-data settings)
4
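To make the batch-versus-online distinction concrete, here is a minimal Python sketch (my own illustration, not from the slides) contrasting the two workflows for a linear model with squared loss; the function names and the learning rate eta are illustrative assumptions.

```python
import numpy as np

def batch_fit(X, y, eta=0.01, epochs=100):
    """Batch learning: re-train from scratch on the full data set."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)  # gradient over ALL stored samples
        w -= eta * grad
    return w

def online_update(w, x_new, y_new, eta=0.01):
    """Online learning: adjust the current w using only the new sample."""
    grad = (x_new @ w - y_new) * x_new      # gradient of this sample's squared loss
    return w - eta * grad
```

Each online update touches only one sample and keeps no history, which is why the slide lists speed and low memory use as its advantages.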
Productivity Optimization Lab Shao-Yen Hung
0. Agenda
1. Problem Definition
2. Background Knowledge
 Convex Function, Subgradient
 Lagrange Multiplier, KKT Conditions
 Loss Function
 Regularization
 BGD (Batch Gradient Descent) vs SGD (Stochastic Gradient Descent)
3. SCR (Simple Coefficient Rounding); TG (Truncated Gradient)
4. FOBOS (Forward-Backward Splitting / Forward Looking Subgradients)
5. RDA (Regularized Dual Averaging)
6. FTRL (Follow-the-Regularized-Leader)
5
Productivity Optimization Lab Shao-Yen Hung
2. Background Knowledge
 Convex Function
$\dfrac{\partial^2 f}{\partial x^2} \ge 0$
 Gradient and Subgradient
6
(Gradient)
(Figure: plot of $y = f(x) = |x|$.) At $x = 0$, the subgradient set is $\partial f \in [-1, 1]$.
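As an illustrative aside (not from the slides), picking one valid subgradient of f(x) = |x| can be coded as follows; the function name is an assumption.

```python
def subgradient_abs(x: float) -> float:
    """Return one valid subgradient of f(x) = |x|."""
    if x > 0:
        return 1.0
    if x < 0:
        return -1.0
    return 0.0  # at x = 0, any value in [-1, 1] is a valid subgradient; 0 is a common choice
```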
Productivity Optimization Lab Shao-Yen Hung
2. Background Knowledge
7
If f(x) is a convex function:
(1) Solve the unconstrained problem directly (set the gradient to zero).
(2) Solve the equality-constrained problem (Lagrange Multiplier).
(3) Solve the inequality-constrained problem (KKT Conditions).
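Because the problem statements on this slide are equation images, here is a small worked example of the three cases (my own illustration, not taken from the slides):

```latex
\begin{align*}
&\text{(1) Unconstrained: } \min_x\; (x-3)^2
  \;\Rightarrow\; 2(x^*-3)=0 \;\Rightarrow\; x^*=3.\\[4pt]
&\text{(2) Lagrange multiplier: } \min_{x,y}\; x^2+y^2 \ \text{ s.t. } x+y=1,\qquad
  L = x^2+y^2+\nu(x+y-1),\\
&\qquad 2x+\nu=0,\;\; 2y+\nu=0,\;\; x+y=1
  \;\Rightarrow\; x^*=y^*=\tfrac12.\\[4pt]
&\text{(3) KKT: } \min_x\; (x-3)^2 \ \text{ s.t. } x\le 1,\qquad
  2(x-3)+\mu=0,\;\; \mu\ge 0,\;\; \mu(x-1)=0\\
&\qquad \Rightarrow\; \text{the constraint is active: } x^*=1,\ \mu=4\ge 0.
\end{align*}
```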
Productivity Optimization Lab Shao-Yen Hung
2. Background Knowledge
Description of the optimization problem:
8
• l(W, Z) = loss function
• Z = the set of observed samples
• $X_j$ = feature vector of the j-th sample
• $y_j$ = h(W, $X_j$) = predicted value of the j-th sample
• W = feature weights (the parameters to be solved for)
The overall loss function can be viewed as the sum of the per-sample losses:
Linear Regression
Logistic Regression
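A hedged sketch of the two per-sample losses named above; the exact formulas on the slide are images, so these are the standard textbook definitions, with illustrative function names.

```python
import numpy as np

def squared_loss(w, x_j, y_j):
    """Linear regression: per-sample loss (y_j - w.x_j)^2."""
    return (y_j - np.dot(w, x_j)) ** 2

def log_loss(w, x_j, y_j):
    """Logistic regression with labels y_j in {-1, +1}: log(1 + exp(-y_j * w.x_j))."""
    return np.log1p(np.exp(-y_j * np.dot(w, x_j)))

def total_loss(w, X, y, per_sample_loss):
    """Overall loss = sum of per-sample losses, as stated on the slide."""
    return sum(per_sample_loss(w, x_j, y_j) for x_j, y_j in zip(X, y))
```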
Productivity Optimization Lab Shao-Yen Hung
2. Background Knowledge
• Regularization
 avoid overfitting problem
 generate sparsity
9
The penalty term added to the loss is called the regularization term, a function of W. Both the loss term and the regularization term below are convex.
(L1, L2 and sparsity)
Lasso
Ridge
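A minimal sketch of the two regularizers, assuming the standard Lasso/Ridge forms; λ, the helper names, and the squared-loss data term are illustrative choices, not taken from the slides.

```python
import numpy as np

def l1_penalty(w, lam):
    """Lasso / L1 regularization: lam * ||w||_1 (encourages sparsity)."""
    return lam * np.sum(np.abs(w))

def l2_penalty(w, lam):
    """Ridge / L2 regularization: lam * ||w||_2^2 (shrinks weights but rarely to exact zero)."""
    return lam * np.sum(w ** 2)

def regularized_objective(w, X, y, lam, penalty=l1_penalty):
    """loss(W) + regularization term, both convex."""
    residual = X @ w - y
    return 0.5 * np.mean(residual ** 2) + penalty(w, lam)
```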
Productivity Optimization Lab Shao-Yen Hung
2. Background Knowledge
Batch Gradient Descent vs Stochastic Gradient Descent:
10
const
current iteration
all data (BGD sums the gradient over every sample)
new data (SGD uses only the newly observed sample)
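A compact sketch of the two update rules (illustrative names; grad_fn is an assumed per-sample gradient function):

```python
def bgd_step(w, X, y, grad_fn, eta):
    """Batch gradient descent: one step uses the gradient summed over all data."""
    g = sum(grad_fn(w, x_j, y_j) for x_j, y_j in zip(X, y))
    return w - eta * g

def sgd_step(w, x_new, y_new, grad_fn, eta):
    """Stochastic / online gradient descent: one step uses only the new sample."""
    return w - eta * grad_fn(w, x_new, y_new)
```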
Productivity Optimization Lab Shao-Yen Hung
0. Agenda
1. Problem Definition
2. Background Knowledge
 Convex Function, Subgradient
 Lagrange Multiplier, KKT Conditions
 Loss Function
 Regularization
 BGD (Batch Gradient Descent) vs SGD (Stochastic Gradient Descent)
3. SCR (Simple Coefficient Rounding); TG (Truncated Gradient)
4. FOBOS (Forward-Backward Splitting / Forward Looking Subgradients)
5. RDA (Regularized Dual Averaging)
6. FTRL (Follow-the-Regularized-Leader)
11
Productivity Optimization Lab Shao-Yen Hung
3. SCR (Simple Coefficient Rounding)
12
• Solves the problem that L1 regularization in SGD does not generate sparsity.
• 3 parameters:
 θ : threshold for deciding whether a coefficient is 0 or not
 K : truncation is performed every K online steps
 η : learning rate

$$
W^{t+1} =
\begin{cases}
W^{t} - \eta \dfrac{\partial\, l(W^{t}, Z)}{\partial W^{t}}, & \text{if } \operatorname{mod}(t, K) \neq 0 \\[8pt]
T_0\!\left(W^{t} - \eta \dfrac{\partial\, l(W^{t}, Z)}{\partial W^{t}},\ \theta\right), & \text{if } \operatorname{mod}(t, K) = 0
\end{cases}
\qquad
T_0(v_i, \theta) =
\begin{cases}
0, & \text{if } |v_i| < \theta \\
v_i, & \text{otherwise}
\end{cases}
$$

We first apply the standard stochastic gradient descent rule, and then round
small coefficients to zero. (Langford et al., 2009)
(A sketch of this update follows below.)
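A minimal sketch of the SCR update above; grad_fn, z, and the other names are my assumptions, not code from the cited paper.

```python
import numpy as np

def T0(v, theta):
    """Round coefficients whose magnitude is below theta down to zero."""
    return np.where(np.abs(v) < theta, 0.0, v)

def scr_step(w, t, grad_fn, z, eta, K, theta):
    """Plain SGD step; every K-th online step, round small coefficients to zero."""
    v = w - eta * grad_fn(w, z)
    return T0(v, theta) if t % K == 0 else v
```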
Productivity Optimization Lab Shao-Yen Hung
3. TG (Truncated Gradient)
13
• We observe that the direct rounding to zero is too aggressive. A
less aggressive version is to shrink the coefficient to zero by a
smaller amount. We call this idea truncated gradient. (Langford et
al., 2009)
(Lasso)
Productivity Optimization Lab Shao-Yen Hung
3. TG (Truncated Gradient)
14
$$
T_1(v_i, \alpha, \theta) =
\begin{cases}
\max(0,\ v_i - \alpha), & \text{if } v_i \in [0, \theta] \\
\min(0,\ v_i + \alpha), & \text{if } v_i \in [-\theta, 0] \\
v_i, & \text{otherwise}
\end{cases}
$$

$$
W^{t+1} = T_1\!\left(W^{t} - \eta \dfrac{\partial\, l(W^{t}, Z)}{\partial W^{t}},\ \eta g_i,\ \theta\right),
\qquad
g_i =
\begin{cases}
0, & \text{if } \operatorname{mod}(t, K) \neq 0 \\
K g, & \text{if } \operatorname{mod}(t, K) = 0
\end{cases}
$$

• 4 parameters:
 θ : threshold for deciding whether a coefficient is 0 or not
 K : truncation is performed every K online steps
 η : learning rate
 g : gravity parameter
(A sketch of this update is given below.)
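A sketch of the T1 operator and the TG step above (illustrative names; on truncation steps α = η·K·g, following the g_i definition on this slide):

```python
import numpy as np

def T1(v, alpha, theta):
    """Shrink coefficients inside [-theta, theta] toward zero by alpha; leave the rest unchanged."""
    out = np.where((v >= 0) & (v <= theta), np.maximum(0.0, v - alpha), v)
    out = np.where((v < 0) & (v >= -theta), np.minimum(0.0, v + alpha), out)
    return out

def tg_step(w, t, grad_fn, z, eta, K, g, theta):
    """SGD step; every K-th online step, apply T1 with gravity alpha = eta * K * g."""
    v = w - eta * grad_fn(w, z)
    return T1(v, eta * K * g, theta) if t % K == 0 else v
```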
Productivity Optimization Lab Shao-Yen Hung
3. TG (Truncated Gradient)
15
$$
W^{t+1} =
\begin{cases}
T_1\!\left(W^{t} - \eta \dfrac{\partial\, l(W^{t}, Z)}{\partial W^{t}},\ \eta g_i,\ \theta\right), & \text{if } \operatorname{mod}(t, K) = 0 \\[8pt]
W^{t} - \eta \dfrac{\partial\, l(W^{t}, Z)}{\partial W^{t}}, & \text{otherwise}
\end{cases}
$$

Rewrite the TG formulation:

$$
T_1(v_i, \alpha, \theta) =
\begin{cases}
0, & \text{if } |v_i| < \alpha \\
v_i - \alpha \cdot \operatorname{sgn}(v_i), & \text{if } \alpha \le |v_i| < \theta \\
v_i, & \text{otherwise}
\end{cases}
$$

1) If α = θ → SCR (simple truncation)
2) If K = 1, θ = ∞ →
$$
W^{t+1} = T_1\!\left(W^{t} - \eta \dfrac{\partial\, l(W^{t}, Z)}{\partial W^{t}},\ \eta g_i,\ \infty\right)
= W^{t} - \eta \dfrac{\partial\, l(W^{t}, Z)}{\partial W^{t}} - \eta g_i \operatorname{sgn}\!\left(W^{t} - \eta \dfrac{\partial\, l(W^{t}, Z)}{\partial W^{t}}\right)
\quad (\text{if } \alpha \le |v_i|)
$$
which is exactly L1 regularization (Lasso).
Productivity Optimization Lab Shao-Yen Hung
3. SCR vs TG vs LASSO
16
(Figure: the truncation functions of SCR (α = θ), TG, and LASSO soft-thresholding (K = 1, θ = ∞), with thresholds at −α and α.)
Productivity Optimization Lab Shao-Yen Hung
3. TG (Truncated Gradient)
17
Productivity Optimization Lab Shao-Yen Hung
0. Agenda
1. Problem Definition
2. Background Knowledge
 Convex Function, Subgradient
 Lagrange Multiplier, KKT Conditions
 Loss Function
 Regularization
 BGD (Batch Gradient Descent) vs SGD (Stochastic Gradient Descent)
3. SCR (Simple Coefficient Rounding); TG (Truncated Gradient)
4. FOBOS (Forward-Backward Splitting / Forward Looking Subgradients)
5. RDA (Regularized Dual Averaging)
6. FTRL (Follow-the-Regularized-Leader)
18
Productivity Optimization Lab Shao-Yen Hung
4. FOBOS (Forward Backward Splitting)
19
(1) The standard gradient descent formula:
(2) The L1-FOBOS gradient descent formula, which can be split into two parts:
 First part: a fine-tuning step performed around the result of the gradient descent step, $W^{t+\frac{1}{2}}$
 Second part: handles the regularization and produces sparsity
r(w) = regularization function
• In fact, this method arguably should have been called FOBAS, but the original author, John Langford, initially called it FOLOS (Forward Looking Subgradients); to avoid confusion, the A was changed to O, giving FOBOS.
(FOBOS with L1 regularization added)
A sketch of the resulting two-step update is given below.
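A sketch of the two sub-steps for the L1 case, using the closed-form second step derived later in this deck; the names and signature are my assumptions.

```python
import numpy as np

def fobos_l1_step(w, grad, eta_t, eta_half, lam):
    """FOBOS with r(w) = lam * ||w||_1.

    Forward step : ordinary (sub)gradient descent to w_{t+1/2}.
    Backward step: the proximal problem, which for L1 reduces to soft-thresholding.
    """
    w_half = w - eta_t * grad                           # w_{t+1/2}
    shrink = eta_half * lam                             # eta_{t+1/2} * lambda
    return np.sign(w_half) * np.maximum(0.0, np.abs(w_half) - shrink)
```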
Productivity Optimization Lab Shao-Yen Hung
4. FOBOS (Forward Backward Splitting)
20
(3) A sufficient condition for obtaining the optimal solution of (2): 0 belongs to its subgradient set.
(4) Because of this, (3) can be rewritten as:
(5) In other words, after rearranging (4):
 the state before the iteration, $W^{t}$, and its gradient
→ backward
 the regularization information of the current iteration, $\partial r(W^{t+1})$
→ forward
Productivity Optimization Lab Shao-Yen Hung
4. FOBOS (Forward Backward Splitting)
21
• L1-FOBOS's Sparsity
(1) Rewrite the original formula:
(1)
(2)
with r(w) = λ‖w‖₁
Productivity Optimization Lab Shao-Yen Hung
4. FOBOS (Forward Backward Splitting)
22
• L1-FOBOS's Sparsity
(2) can be decomposed into a sum over each dimension's weight:
(2)
(3)
Productivity Optimization Lab Shao-Yen Hung
4. FOBOS (Forward Backward Splitting)
23
• L1-FOBOS's Sparsity
(3) If $w^*$ is the optimal solution for some dimension $w_j$, then $w_j^* \cdot v_j \ge 0$:
(3)
(Proof by contradiction)
Suppose not, i.e. $w_j^* \cdot v_j < 0$. Then
$$
\tfrac{1}{2} v^2 \;<\; \tfrac{1}{2} v^2 - w^* v + \tfrac{1}{2} (w^*)^2 \;\le\; \tfrac{1}{2} (v - w^*)^2 + \lambda |w^*|,
$$
which contradicts $w_j^*$ being the optimal solution (setting $w = 0$ would give a strictly smaller objective). Hence $w_j^* \cdot v_j \ge 0$.
Productivity Optimization Lab Shao-Yen Hung
4. FOBOS (Forward Backward Splitting)
24
• L1-FOBOS's Sparsity
(4) When $w_j^* \cdot v_j \ge 0$:
If $v_j \ge 0$, minimize the per-coordinate objective subject to $-w_j \le 0$; the KKT conditions require $\frac{\partial}{\partial w}[\,\cdot\,] = 0$ and $\beta w = 0$ at $w^*$.
a) $w^* > 0$: since $\beta w^* = 0$, $\beta = 0$ → $w^* = v - \lambda$
b) $w^* = 0$: since $\beta \ge 0$, $v_j - \lambda \le 0$
That is, $w^* = \max(0,\ v_j - \lambda)$.
(5) Similarly, for $v_j < 0$: $w^* = -\max(0,\ -v_j - \lambda)$.
A worked-out version of this KKT argument is given below.
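Written out in full, the per-coordinate problem and its KKT conditions for the case $v_j \ge 0$ are (my reconstruction of the argument sketched above):

```latex
\begin{align*}
&\min_{w}\ \tfrac{1}{2}(w - v)^2 + \lambda w \quad \text{s.t. } -w \le 0,
 \qquad L(w,\beta) = \tfrac{1}{2}(w - v)^2 + \lambda w - \beta w,\\
&\text{KKT: } \frac{\partial L}{\partial w} = (w - v) + \lambda - \beta = 0,
 \qquad \beta \ge 0, \qquad \beta\, w = 0.\\
&\text{a) } w^* > 0:\ \beta = 0 \ \Rightarrow\ w^* = v - \lambda;
 \qquad \text{b) } w^* = 0:\ \beta = \lambda - v \ge 0 \ \Rightarrow\ v - \lambda \le 0.\\
&\Rightarrow\ w^* = \max(0,\ v - \lambda).
\end{align*}
```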
Productivity Optimization Lab Shao-Yen Hung
4. FOBOS (Forward Backward Splitting)
25
• L1-FOBOS's Sparsity
(6) Combining the conclusions of (4) and (5):
$$
W_i^{t+1} = \operatorname{sgn}(v_i)\,\max(0,\ |v_i| - \lambda)
$$
With $v_i = W_i^{t} - \eta^{t} g_i^{t}$, this becomes
$$
W_i^{t+1} = \operatorname{sgn}\!\left(W_i^{t} - \eta^{t} g_i^{t}\right)\,
\max\!\left(0,\ \left|W_i^{t} - \eta^{t} g_i^{t}\right| - \eta^{t+\frac{1}{2}} \lambda\right)
$$
Productivity Optimization Lab Shao-Yen Hung
4. FOBOS (Forward Backward Splitting)
26
• L1-FOBOS's Sparsity
(7) From the formula in (6):
$$
W_i^{t+1} = \operatorname{sgn}\!\left(W_i^{t} - \eta^{t} g_i^{t}\right)\,
\max\!\left(0,\ \left|W_i^{t} - \eta^{t} g_i^{t}\right| - \eta^{t+\frac{1}{2}} \lambda\right)
$$
we can see that when $\left|W_i^{t} - \eta^{t} g_i^{t}\right| \le \eta^{t+\frac{1}{2}} \lambda$, the weight $W_i^{t+1}$ is truncated to zero. In other words:
$$
W_i^{t+1} =
\begin{cases}
0, & \text{if } \left|W_i^{t} - \eta^{t} g_i^{t}\right| \le \eta^{t+\frac{1}{2}} \lambda \\[6pt]
\left(W_i^{t} - \eta^{t} g_i^{t}\right) - \operatorname{sgn}\!\left(W_i^{t} - \eta^{t} g_i^{t}\right) \eta^{t+\frac{1}{2}} \lambda, & \text{otherwise}
\end{cases}
$$
(One way to interpret this)
When the gradient produced by a new sample is not enough to change that dimension's weight by a sufficiently large amount, the dimension is considered unimportant in this update and its weight is set to 0.
Productivity Optimization Lab Shao-Yen Hung
4. TG vs FOBOS
27
• L1-FOBOS
$$
W_i^{t+1} =
\begin{cases}
0, & \text{if } \left|W_i^{t} - \eta^{t} g_i^{t}\right| \le \eta^{t+\frac{1}{2}} \lambda \\[6pt]
\left(W_i^{t} - \eta^{t} g_i^{t}\right) - \operatorname{sgn}\!\left(W_i^{t} - \eta^{t} g_i^{t}\right) \eta^{t+\frac{1}{2}} \lambda, & \text{otherwise}
\end{cases}
$$
• TG
$$
W^{t+1} =
\begin{cases}
T_1\!\left(W^{t} - \eta \dfrac{\partial\, l(W^{t}, Z)}{\partial W^{t}},\ \eta g_i,\ \theta\right), & \text{if } \operatorname{mod}(t, K) = 0 \\[8pt]
W^{t} - \eta \dfrac{\partial\, l(W^{t}, Z)}{\partial W^{t}}, & \text{otherwise}
\end{cases}
\qquad
T_1(v_i, \alpha, \theta) =
\begin{cases}
0, & \text{if } |v_i| \le \alpha \\
v_i - \alpha \cdot \operatorname{sgn}(v_i), & \text{if } \alpha \le |v_i| \le \theta \\
v_i, & \text{otherwise}
\end{cases}
$$
Interestingly, when K = 1, θ = ∞, and α = $\eta^{t+\frac{1}{2}} \lambda$, TG = L1-FOBOS. (A small numeric check follows below.)
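A quick numeric check of the stated equivalence (made-up numbers; the two expressions below are the T1 rule with θ = ∞ and the L1-FOBOS closed form):

```python
import numpy as np

v = np.array([0.3, -0.2, 1.2])   # w_t - eta * gradient (illustrative values)
alpha = 0.5                      # eta_{t+1/2} * lambda

tg    = np.where(np.abs(v) <= alpha, 0.0, v - alpha * np.sign(v))   # TG with K = 1, theta = inf
fobos = np.sign(v) * np.maximum(0.0, np.abs(v) - alpha)             # L1-FOBOS closed form
assert np.allclose(tg, fobos)
```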
Productivity Optimization Lab Shao-Yen Hung
0. Agenda
1. Problem Definition
2. Background Knowledge
 Convex Function, Subgradient
 Lagrange Multiplier, KKT Conditions
 Loss Function
 Regularization
 BGD (Batch Gradient Descent) vs SGD (Stochastic Gradient Descent)
3. SCR (Simple Coefficient Rounding); TG (Truncated Gradient)
4. FOBOS (Forward-Backward Splitting / Forward Looking Subgradients)
5. RDA (Regularized Dual Averaging)
6. FTRL (Follow-the-Regularized-Leader)
28
Productivity Optimization Lab Shao-Yen Hung
Reference
[1] John Langford, Lihong Li & Tong Zhang. Sparse Online Learning via
Truncated Gradient. Journal of Machine Learning Research, 2009.
[2] John Duchi & Yoram Singer. Efficient Online and Batch Learning using
Forward Backward Splitting. Journal of Machine Learning Research, 2009.
[3] Lin Xiao. Dual Averaging Methods for Regularized Stochastic Learning
and Online Optimization. Journal of Machine Learning Research, 2010.
[4] H. B. McMahan. Follow-the-regularized-leader and mirror descent:
Equivalence theorems and L1 regularization. In AISTATS, 2011.
[5] H. Brendan McMahan, Gary Holt, D. Sculley et al. Ad Click Prediction: a
View from the Trenches. In KDD, 2013.
29