Institute of Manufacturing Information and Systems (製造資訊與系統研究所)
Institute of Engineering Management (工程管理碩士在職專班)
National Cheng Kung University (國立成功大學)
Advisor: Dr. Chia-Yen Lee (李家岩)
Presenter: Shao-Yen Hung (洪紹嚴)
Date: 2016/05/06
Online Optimization Problem (1)
Productivity Optimization Lab, Shao-Yen Hung
3. SCR (Simple Coefficient Rounding)
• Addresses the problem that L1 regularization in SGD does not
generate sparsity.
• 3 parameters:
θ: threshold for deciding whether a coefficient is set to 0
K: truncation is performed every K online steps
η: learning rate
$$
W_{t+1} =
\begin{cases}
W_t - \eta \dfrac{\partial l(W_t, Z)}{\partial W_t}, & \text{if } \operatorname{mod}(t, K) \neq 0 \\[6pt]
T_0\!\left(W_t - \eta \dfrac{\partial l(W_t, Z)}{\partial W_t},\ \theta\right), & \text{if } \operatorname{mod}(t, K) = 0
\end{cases}
$$

$$
T_0(v_i, \theta) =
\begin{cases}
0, & \text{if } |v_i| < \theta \\
v_i, & \text{otherwise}
\end{cases}
$$
We first apply the standard stochastic gradient descent rule, and then round
small coefficients to zero. (Langford et al., 2009)
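The SCR update above can be sketched in NumPy as follows. This is a minimal illustration, not the reference implementation; the parameter values (η, K, θ) and the function names `t0` and `scr_step` are illustrative assumptions.

```python
import numpy as np

def t0(v, theta):
    """T0 rounding: set coefficients with magnitude below theta to exactly zero."""
    return np.where(np.abs(v) < theta, 0.0, v)

def scr_step(w, grad, t, eta=0.1, K=10, theta=0.05):
    """One SCR online step (Langford et al., 2009, sketch).

    Apply the plain SGD rule; on every K-th step (mod(t, K) == 0),
    additionally round small coefficients to zero with T0.
    """
    w_new = w - eta * grad
    if t % K == 0:
        w_new = t0(w_new, theta)
    return w_new

# Example: with a zero gradient on a truncation step (t = 10, K = 10),
# only the rounding acts: entries below theta = 0.05 in magnitude vanish.
w = np.array([0.04, 0.5, -0.03])
print(scr_step(w, np.zeros(3), t=10))  # -> [0.  0.5 0. ]
```

Note that on non-truncation steps (`t % K != 0`) the update is ordinary SGD, which is why sparsity only appears every K steps.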
3. TG (Truncated Gradient)
• We observe that the direct rounding to zero is too aggressive. A
less aggressive version is to shrink the coefficient to zero by a
smaller amount. We call this idea truncated gradient. (Langford et
al., 2009)
(With θ = ∞, the shrinkage is applied to every coefficient, which corresponds to standard L1 (Lasso) regularization.)
$$
T_1(v_i, \alpha, \theta) =
\begin{cases}
\max(0,\ v_i - \alpha), & \text{if } v_i \in [0, \theta] \\
\min(0,\ v_i + \alpha), & \text{if } v_i \in [-\theta, 0] \\
v_i, & \text{otherwise}
\end{cases}
$$

$$
W_{t+1} = T_1\!\left(W_t - \eta \dfrac{\partial l(W_t, Z)}{\partial W_t},\ \eta g_i,\ \theta\right)
$$

$$
g_i =
\begin{cases}
0, & \text{if } \operatorname{mod}(t, K) \neq 0 \\
Kg, & \text{if } \operatorname{mod}(t, K) = 0
\end{cases}
$$
• 4 parameters:
θ: threshold for deciding whether a coefficient is set to 0
K: truncation is performed every K online steps
η: learning rate
g: gravity parameter (controls the shrinkage amount)
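The truncated-gradient update can likewise be sketched in NumPy. Again this is a minimal illustration under assumed parameter values; `t1` and `tg_step` are hypothetical names, and only coefficients inside [−θ, θ] are shrunk, by α = η·K·g on every K-th step.

```python
import numpy as np

def t1(v, alpha, theta):
    """T1 truncation: shrink coefficients in [-theta, theta] toward zero by alpha."""
    pos = (v >= 0) & (v <= theta)   # nonnegative coefficients within the threshold
    neg = (v < 0) & (v >= -theta)   # negative coefficients within the threshold
    out = v.copy()
    out[pos] = np.maximum(0.0, v[pos] - alpha)  # shrink down, never past zero
    out[neg] = np.minimum(0.0, v[neg] + alpha)  # shrink up, never past zero
    return out

def tg_step(w, grad, t, eta=0.1, K=10, theta=0.5, g=0.1):
    """One truncated-gradient online step (Langford et al., 2009, sketch)."""
    w_new = w - eta * grad
    gi = K * g if t % K == 0 else 0.0   # gravity applies only every K-th step
    return t1(w_new, eta * gi, theta)

# Example: zero gradient on a truncation step (alpha = 0.1 * 10 * 0.1 = 0.1).
# 0.3 shrinks to 0.2, -0.2 shrinks to -0.1, and 2.0 (> theta) is untouched.
w = np.array([0.3, 2.0, -0.2])
print(tg_step(w, np.zeros(3), t=10))  # -> [ 0.2  2.  -0.1]
```

Unlike SCR's hard rounding, small coefficients here are pulled toward zero gradually, so a coefficient is only zeroed once repeated shrinkage has exhausted it.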
References
[1] John Langford, Lihong Li & Tong Zhang. Sparse Online Learning via
Truncated Gradient. Journal of Machine Learning Research, 2009.
[2] John Duchi & Yoram Singer. Efficient Online and Batch Learning using
Forward Backward Splitting. Journal of Machine Learning Research, 2009.
[3] Lin Xiao. Dual Averaging Methods for Regularized Stochastic Learning
and Online Optimization. Journal of Machine Learning Research, 2010.
[4] H. B. McMahan. Follow-the-regularized-leader and mirror descent:
Equivalence theorems and L1 regularization. In AISTATS, 2011.
[5] H. Brendan McMahan, Gary Holt, D. Sculley et al. Ad Click Prediction: a
View from the Trenches. In KDD, 2013.