Introduction to Online Machine Learning Algorithms


Paper Report for the SDM Course, 2016

Ad Click Prediction: a View from the Trenches (Online Machine Learning)

Presenters: 蔡宗倫, 洪紹嚴, 蔡佳盈. Date: 2016/12/22


Benchmark on a 2 GB data set (four million rows, 200 variables):

READ DATA          Time            Memory
read.csv           264.5 (secs)    8.73 (GB)
fread              33.18 (secs)    2.98 (GB)
read.big.matrix    205.03 (secs)   0.2 (MB)

Fitting lm on the same data:

lm                 Time            Memory
read.csv           X               X
fread              X               X
read.big.matrix    2.72 (mins)     83.6 (MB)
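A rough sketch of how timings like these could be reproduced in R, assuming a hypothetical CSV file data.csv of roughly this size; the data.table and bigmemory packages are required:

# Compare the wall-clock time of three ways of reading the same CSV file.
library(data.table)   # provides fread
library(bigmemory)    # provides read.big.matrix

system.time(d1 <- read.csv("data.csv"))              # base R reader: slow, memory-hungry
system.time(d2 <- fread("data.csv"))                 # data.table reader: fast
system.time(d3 <- read.big.matrix("data.csv",
                                  header = TRUE,
                                  type = "double"))  # stores the data outside R's heap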


Problem: training a model on Big Data (TB, PB, ZB) runs into memory and time/accuracy limits.

Solutions:

• Parallel computation: Hadoop, MapReduce, Spark (TB, PB, ZB)

• R packages: read.table, bigmemory, ff (GB)

• Online learning algorithms


Online learning algorithms (applied to logistic regression):

• AOGD (Adaptive Online Gradient Descent), 2007, IBM

• TG (Truncated Gradient), 2009, Microsoft

• FOBOS (Forward-Backward Splitting), 2009, Google

• RDA (Regularized Dual Averaging), 2010, Microsoft

• FTRL-Proximal (Follow-The-Regularized-Leader Proximal), 2011, Google


The online learning setting: train a model on Big Data (TB, PB, ZB); as new data arrives, renew the weights instead of retraining from scratch. The memory and time/accuracy problems are tackled with sparsity (LASSO-style regularization) and with SGD/OGD updates (as used for NN/GBM).


Combining sparsity (LASSO-style L1 regularization) with SGD/OGD updates gives exactly this family of online learning algorithms for logistic regression: AOGD, TG, FOBOS, RDA, and FTRL-Proximal.


Online Gradient Descent (OGD)

A class of algorithms used for online convex optimization. The setting can be formulated as a repeated game between a player and an adversary: at round t, the player chooses an action w_t from some convex subset \mathcal{K}, and then the adversary chooses a convex loss function f_t. A central question is how the regret grows with the number of rounds T of the game.
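For reference, the regret after T rounds is the usual quantity from [7]: the gap between the player's cumulative loss and that of the best fixed action in hindsight,

\mathrm{Regret}_T = \sum_{t=1}^{T} f_t(w_t) - \min_{w \in \mathcal{K}} \sum_{t=1}^{T} f_t(w)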


Online Gradient Descent (OGD)

Zinkevich [7] considered the following gradient descent algorithm, with step size \eta_t = t^{-1/2}:

w_{t+1} = \Pi_{\mathcal{K}}\big(w_t - \eta_t \nabla f_t(w_t)\big)

Here, \Pi_{\mathcal{K}} denotes the Euclidean projection onto the convex set \mathcal{K}; with this step size the regret grows as O(\sqrt{T}).


Forward-Backward Splitting (FOBOS)

(1) Loss function of logistic regression, with label y_t \in \{0, 1\} and prediction p_t = \sigma(w_t \cdot x_t):

\ell(w_t, x_t) = -y_t \log p_t - (1 - y_t)\log(1 - p_t)

Batch gradient descent formula:

w_{t+1} = w_t - \eta \, \frac{1}{N}\sum_{i=1}^{N} \frac{\partial \ell(w_t, x_i)}{\partial w_t}

Online gradient descent formula:

w_{t+1} = w_t - \eta_t \, \frac{\partial \ell(w_t, x_t)}{\partial w_t}

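The online update above is easy to state as code. A minimal R sketch of one online gradient descent step for logistic regression, assuming a 0/1 label y and a dense numeric feature vector x (both hypothetical):

# One online gradient descent step for logistic regression.
ogd_update <- function(w, x, y, eta = 0.1) {
  p <- 1 / (1 + exp(-sum(w * x)))   # predicted probability sigma(w . x)
  g <- (p - y) * x                  # gradient of the logistic loss at w
  w - eta * g                       # w_{t+1} = w_t - eta * g_t
}

# Hypothetical streaming usage over a feature matrix X and labels y:
# w <- rep(0, ncol(X))
# for (t in seq_len(nrow(X))) w <- ogd_update(w, X[t, ], y[t])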


Forward-Backward Splitting (FOBOS)


(2) The FOBOS update can be divided into two parts: the first part makes a fine adjustment near the result of the gradient step w_{t+1/2}; the second part handles the regularization and produces sparsity:

w_{t+\frac{1}{2}} = w_t - \eta_t g_t

w_{t+1} = \arg\min_{w} \Big\{ \tfrac{1}{2}\|w - w_{t+\frac{1}{2}}\|^2 + \eta_{t+\frac{1}{2}}\, r(w) \Big\}

where g_t is a (sub)gradient of the loss at w_t and r(w) is the regularization function, e.g. r(w) = \lambda\|w\|_1.


(3) A sufficient condition for w_{t+1} to be the optimum of (2) is that 0 belongs to the subgradient set of the objective at w_{t+1}:

0 \in w_{t+1} - w_{t+\frac{1}{2}} + \eta_{t+\frac{1}{2}}\, \partial r(w_{t+1})

(4) Because w_{t+\frac{1}{2}} = w_t - \eta_t g_t, (3) can be rewritten as:

0 \in w_{t+1} - w_t + \eta_t g_t + \eta_{t+\frac{1}{2}}\, \partial r(w_{t+1})

(5) In other words, rearranging (4):

w_{t+1} = w_t - \eta_t g_t - \eta_{t+\frac{1}{2}}\, v, \quad v \in \partial r(w_{t+1})

① w_t - \eta_t g_t: the state and gradient from before the iteration (backward)
② \eta_{t+\frac{1}{2}} v: the regularization information of the current iteration (forward)
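For the L1 case r(w) = \lambda\|w\|_1, the minimization in (2) separates per coordinate and has the familiar soft-thresholding closed form, which is what drives small coordinates exactly to zero:

w_{t+1, i} = \operatorname{sign}\!\big(w_{t+\frac{1}{2}, i}\big)\,\max\!\big(0,\; \big|w_{t+\frac{1}{2}, i}\big| - \eta_{t+\frac{1}{2}}\lambda\big)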


FOBOS, RDA, FTRL-Proximal


The three algorithms share a common structure built from three ingredients:
(A) the accumulated past gradients;
(B) the regularization function (a non-smooth convex function such as the L1 norm, handled through a certain subgradient);
(C) a proximal term scaled by the learning rate, which guarantees that the fine adjustment does not move too far from 0 or from the already-computed iterates.
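As a concrete instance of the (A)+(B)+(C) pattern, the FTRL-Proximal update of [5] can be written as follows, with g_{1:t} = \sum_{s=1}^{t} g_s and \sum_{s=1}^{t}\sigma_s = 1/\eta_t:

w_{t+1} = \arg\min_{w}\Big( g_{1:t}\cdot w + \lambda_1\|w\|_1 + \tfrac{1}{2}\sum_{s=1}^{t}\sigma_s\|w - w_s\|_2^2 \Big)

RDA keeps the same accumulated-gradient and regularization terms but centers its proximal term at 0 rather than at the past iterates w_s.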


FOBOS, RDA, FTRL-Proximal


• OGD: not sparse enough.

• FOBOS: produces noticeably sparser features; as a gradient-descent-style method, its accuracy is comparatively good.

• RDA: strikes a better balance between accuracy and sparsity, and its sparsity is even better. The key difference is how the accumulated L1 penalty term is handled.

• FTRL-Proximal: combines the accuracy of FOBOS with the sparsity of RDA.


Per-Coordinate

Illustration across three consecutive slides of how the fitted coefficients of f(x) change from update to update:

f(x) = 0.5A + 1.1B + 3.8C + 0.1D + 11E + 41F
f(x) = 0.4A + 0.8B + 3.8C + 0.8D + 0E + 41F
f(x) = 0.4A + 1.2B + 3.5C + 0.9D + 0.3E + 41F
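The per-coordinate learning rate used in [5] makes this concrete: each coordinate i gets its own step size driven by the squared gradients that coordinate has actually accumulated, so rarely-seen features keep a larger learning rate than frequently-seen ones:

\eta_{t,i} = \frac{\alpha}{\beta + \sqrt{\sum_{s=1}^{t} g_{s,i}^{2}}}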


Putting it together: train a model on Big Data (TB, PB, ZB); as new data arrives, renew the weights per-coordinate. Sparsity (LASSO-style regularization) plus SGD/OGD updates address the memory and time/accuracy problems, which leads to FOBOS (2009, Google), RDA (2010, Microsoft), and finally FTRL-Proximal (2011, Google) applied to logistic regression.


R package: FTRLProximal
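A minimal from-scratch sketch of the per-coordinate FTRL-Proximal update for logistic regression, following Algorithm 1 of [5]; it is written in base R for illustration only and does not reproduce the FTRLProximal package's actual interface. The inputs X (numeric feature matrix) and y (0/1 labels) are hypothetical.

# FTRL-Proximal for logistic regression, per-coordinate (Algorithm 1 in [5]).
# alpha, beta: learning-rate parameters; lambda1, lambda2: L1/L2 penalties.
ftrl_prox <- function(X, y, alpha = 0.1, beta = 1, lambda1 = 1, lambda2 = 1) {
  d <- ncol(X)
  z <- numeric(d); n <- numeric(d); w <- numeric(d)
  for (t in seq_len(nrow(X))) {
    x <- X[t, ]
    i <- which(x != 0)                       # update only the active coordinates
    # Lazy closed-form weights from the accumulated state (z, n);
    # coordinates with |z| <= lambda1 are set exactly to zero (sparsity).
    w[i] <- ifelse(abs(z[i]) <= lambda1, 0,
                   -(z[i] - sign(z[i]) * lambda1) /
                     ((beta + sqrt(n[i])) / alpha + lambda2))
    p <- 1 / (1 + exp(-sum(w[i] * x[i])))    # predicted click probability
    g <- (p - y[t]) * x[i]                   # gradient of the logistic loss
    sigma <- (sqrt(n[i] + g^2) - sqrt(n[i])) / alpha
    z[i] <- z[i] + g - sigma * w[i]
    n[i] <- n[i] + g^2
  }
  w                                          # final weights, many exact zeros
}

# Hypothetical usage:
# w <- ftrl_prox(X, y)
# p <- 1 / (1 + exp(-X %*% w))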


Prediction result (5.87 GB data set)


References

[1] John Langford, Lihong Li, and Tong Zhang. Sparse Online Learning via Truncated Gradient. Journal of Machine Learning Research, 2009.

[2] John Duchi and Yoram Singer. Efficient Online and Batch Learning using Forward Backward Splitting. Journal of Machine Learning Research, 2009.

[3] Lin Xiao. Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization. Journal of Machine Learning Research, 2010.

[4] H. Brendan McMahan. Follow-the-Regularized-Leader and Mirror Descent: Equivalence Theorems and L1 Regularization. In AISTATS, 2011.

[5] H. Brendan McMahan, Gary Holt, D. Sculley, et al. Ad Click Prediction: a View from the Trenches. In KDD, 2013.

[6] Peter Bartlett, Elad Hazan, and Alexander Rakhlin. Adaptive Online Gradient Descent. Technical Report UCB/EECS-2007-82, EECS Department, University of California, Berkeley, June 2007.

[7] Martin Zinkevich. Online Convex Programming and Generalized Infinitesimal Gradient Ascent. In ICML, pages 928–936, 2003.
