Introduction to Online Machine Learning Algorithms


Paper Report for the SDM Course, 2016

Ad Click Prediction: a View from the Trenches (Online Machine Learning)

Presenters: 蔡宗倫, 洪紹嚴, 蔡佳盈. Date: 2016/12/22


Benchmark on a 2 GB data set (four million rows, 200 variables):

READ DATA          Time            Memory
read.csv           264.5 (secs)    8.73 (GB)
fread              33.18 (secs)    2.98 (GB)
read.big.matrix    205.03 (secs)   0.2 (MB)

Fitting lm on the same data:

lm                 Time            Memory
read.csv           X               X
fread              X               X
read.big.matrix    2.72 (mins)     83.6 (MB)
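A rough sketch of how timings like these could be reproduced in R, assuming a hypothetical CSV file data.csv of roughly this size; the data.table and bigmemory packages are required:

# Compare the wall-clock time of three ways of reading the same CSV file.
library(data.table)   # provides fread
library(bigmemory)    # provides read.big.matrix

system.time(d1 <- read.csv("data.csv"))              # base R reader: slow, memory-hungry
system.time(d2 <- fread("data.csv"))                 # data.table reader: fast
system.time(d3 <- read.big.matrix("data.csv",
                                  header = TRUE,
                                  type = "double"))  # stores the data outside R's heap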


Problem: training a model on Big Data (TB, PB, ZB) runs into memory and time/accuracy limits.

Solutions:

• Parallel computation: Hadoop, MapReduce, Spark (TB, PB, ZB)

• R packages: read.table, bigmemory, ff (GB)

• Online learning algorithms


Online learning algorithms (applied to logistic regression):

• AOGD (Adaptive Online Gradient Descent), 2007, IBM

• TG (Truncated Gradient), 2009, Microsoft

• FOBOS (Forward-Backward Splitting), 2009, Google

• RDA (Regularized Dual Averaging), 2010, Microsoft

• FTRL-Proximal (Follow-The-Regularized-Leader Proximal), 2011, Google


The online learning setting: train a model on Big Data (TB, PB, ZB); as new data arrives, renew the weights instead of retraining from scratch. The memory and time/accuracy problems are tackled with sparsity (LASSO-style regularization) and with SGD/OGD updates (as used for NN/GBM).


Combining sparsity (LASSO-style L1 regularization) with SGD/OGD updates gives exactly this family of online learning algorithms for logistic regression: AOGD, TG, FOBOS, RDA, and FTRL-Proximal.


Online Gradient Descent (OGD)

A class of algorithms used for online convex optimization. The setting can be formulated as a repeated game between a player and an adversary: at round t, the player chooses an action w_t from some convex subset \mathcal{K}, and then the adversary chooses a convex loss function f_t. A central question is how the regret grows with the number of rounds T of the game.
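For reference, the regret after T rounds is the usual quantity from [7]: the gap between the player's cumulative loss and that of the best fixed action in hindsight,

\mathrm{Regret}_T = \sum_{t=1}^{T} f_t(w_t) - \min_{w \in \mathcal{K}} \sum_{t=1}^{T} f_t(w)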


Online Gradient Descent (OGD)

Zinkevich [7] considered the following gradient descent algorithm, with step size \eta_t = t^{-1/2}:

w_{t+1} = \Pi_{\mathcal{K}}\big(w_t - \eta_t \nabla f_t(w_t)\big)

Here, \Pi_{\mathcal{K}} denotes the Euclidean projection onto the convex set \mathcal{K}; with this step size the regret grows as O(\sqrt{T}).


Forward-Backward Splitting (FOBOS)

(1) Loss function of logistic regression, with label y_t \in \{0, 1\} and prediction p_t = \sigma(w_t \cdot x_t):

\ell(w_t, x_t) = -y_t \log p_t - (1 - y_t)\log(1 - p_t)

Batch gradient descent formula:

w_{t+1} = w_t - \eta \, \frac{1}{N}\sum_{i=1}^{N} \frac{\partial \ell(w_t, x_i)}{\partial w_t}

Online gradient descent formula:

w_{t+1} = w_t - \eta_t \, \frac{\partial \ell(w_t, x_t)}{\partial w_t}

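The online update above is easy to state as code. A minimal R sketch of one online gradient descent step for logistic regression, assuming a 0/1 label y and a dense numeric feature vector x (both hypothetical):

# One online gradient descent step for logistic regression.
ogd_update <- function(w, x, y, eta = 0.1) {
  p <- 1 / (1 + exp(-sum(w * x)))   # predicted probability sigma(w . x)
  g <- (p - y) * x                  # gradient of the logistic loss at w
  w - eta * g                       # w_{t+1} = w_t - eta * g_t
}

# Hypothetical streaming usage over a feature matrix X and labels y:
# w <- rep(0, ncol(X))
# for (t in seq_len(nrow(X))) w <- ogd_update(w, X[t, ], y[t])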


Forward-Backward Splitting (FOBOS)


(2) The FOBOS update can be divided into two parts: the first part makes a fine adjustment near the result of the gradient step w_{t+1/2}; the second part handles the regularization and produces sparsity:

w_{t+\frac{1}{2}} = w_t - \eta_t g_t

w_{t+1} = \arg\min_{w} \Big\{ \tfrac{1}{2}\|w - w_{t+\frac{1}{2}}\|^2 + \eta_{t+\frac{1}{2}}\, r(w) \Big\}

where g_t is a (sub)gradient of the loss at w_t and r(w) is the regularization function, e.g. r(w) = \lambda\|w\|_1.


(3) A sufficient condition for w_{t+1} to be the optimum of (2) is that 0 belongs to the subgradient set of the objective at w_{t+1}:

0 \in w_{t+1} - w_{t+\frac{1}{2}} + \eta_{t+\frac{1}{2}}\, \partial r(w_{t+1})

(4) Because w_{t+\frac{1}{2}} = w_t - \eta_t g_t, (3) can be rewritten as:

0 \in w_{t+1} - w_t + \eta_t g_t + \eta_{t+\frac{1}{2}}\, \partial r(w_{t+1})

(5) In other words, rearranging (4):

w_{t+1} = w_t - \eta_t g_t - \eta_{t+\frac{1}{2}}\, v, \quad v \in \partial r(w_{t+1})

① w_t - \eta_t g_t: the state and gradient from before the iteration (backward)
② \eta_{t+\frac{1}{2}} v: the regularization information of the current iteration (forward)
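For the L1 case r(w) = \lambda\|w\|_1, the minimization in (2) separates per coordinate and has the familiar soft-thresholding closed form, which is what drives small coordinates exactly to zero:

w_{t+1, i} = \operatorname{sign}\!\big(w_{t+\frac{1}{2}, i}\big)\,\max\!\big(0,\; \big|w_{t+\frac{1}{2}, i}\big| - \eta_{t+\frac{1}{2}}\lambda\big)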


FOBOS, RDA, FTRL-Proximal


The three algorithms share a common structure built from three ingredients:
(A) the accumulated past gradients;
(B) the regularization function (a non-smooth convex function such as the L1 norm, handled through a certain subgradient);
(C) a proximal term scaled by the learning rate, which guarantees that the fine adjustment does not move too far from 0 or from the already-computed iterates.
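As a concrete instance of the (A)+(B)+(C) pattern, the FTRL-Proximal update of [5] can be written as follows, with g_{1:t} = \sum_{s=1}^{t} g_s and \sum_{s=1}^{t}\sigma_s = 1/\eta_t:

w_{t+1} = \arg\min_{w}\Big( g_{1:t}\cdot w + \lambda_1\|w\|_1 + \tfrac{1}{2}\sum_{s=1}^{t}\sigma_s\|w - w_s\|_2^2 \Big)

RDA keeps the same accumulated-gradient and regularization terms but centers its proximal term at 0 rather than at the past iterates w_s.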


FOBOS, RDA, FTRL-Proximal


• OGD: not sparse enough.

• FOBOS: produces noticeably sparser features; as a gradient-descent-style method, its accuracy is comparatively good.

• RDA: strikes a better balance between accuracy and sparsity, and its sparsity is even better. The key difference is how the accumulated L1 penalty term is handled.

• FTRL-Proximal: combines the accuracy of FOBOS with the sparsity of RDA.


Per-Coordinate

Illustration across three consecutive slides of how the fitted coefficients of f(x) change from update to update:

f(x) = 0.5A + 1.1B + 3.8C + 0.1D + 11E + 41F
f(x) = 0.4A + 0.8B + 3.8C + 0.8D + 0E + 41F
f(x) = 0.4A + 1.2B + 3.5C + 0.9D + 0.3E + 41F
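The per-coordinate learning rate used in [5] makes this concrete: each coordinate i gets its own step size driven by the squared gradients that coordinate has actually accumulated, so rarely-seen features keep a larger learning rate than frequently-seen ones:

\eta_{t,i} = \frac{\alpha}{\beta + \sqrt{\sum_{s=1}^{t} g_{s,i}^{2}}}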


Putting it together: train a model on Big Data (TB, PB, ZB); as new data arrives, renew the weights per-coordinate. Sparsity (LASSO-style regularization) plus SGD/OGD updates address the memory and time/accuracy problems, which leads to FOBOS (2009, Google), RDA (2010, Microsoft), and finally FTRL-Proximal (2011, Google) applied to logistic regression.


R package: FTRLProximal
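A minimal from-scratch sketch of the per-coordinate FTRL-Proximal update for logistic regression, following Algorithm 1 of [5]; it is written in base R for illustration only and does not reproduce the FTRLProximal package's actual interface. The inputs X (numeric feature matrix) and y (0/1 labels) are hypothetical.

# FTRL-Proximal for logistic regression, per-coordinate (Algorithm 1 in [5]).
# alpha, beta: learning-rate parameters; lambda1, lambda2: L1/L2 penalties.
ftrl_prox <- function(X, y, alpha = 0.1, beta = 1, lambda1 = 1, lambda2 = 1) {
  d <- ncol(X)
  z <- numeric(d); n <- numeric(d); w <- numeric(d)
  for (t in seq_len(nrow(X))) {
    x <- X[t, ]
    i <- which(x != 0)                       # update only the active coordinates
    # Lazy closed-form weights from the accumulated state (z, n);
    # coordinates with |z| <= lambda1 are set exactly to zero (sparsity).
    w[i] <- ifelse(abs(z[i]) <= lambda1, 0,
                   -(z[i] - sign(z[i]) * lambda1) /
                     ((beta + sqrt(n[i])) / alpha + lambda2))
    p <- 1 / (1 + exp(-sum(w[i] * x[i])))    # predicted click probability
    g <- (p - y[t]) * x[i]                   # gradient of the logistic loss
    sigma <- (sqrt(n[i] + g^2) - sqrt(n[i])) / alpha
    z[i] <- z[i] + g - sigma * w[i]
    n[i] <- n[i] + g^2
  }
  w                                          # final weights, many exact zeros
}

# Hypothetical usage:
# w <- ftrl_prox(X, y)
# p <- 1 / (1 + exp(-X %*% w))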


Prediction result (5.87 GB data set)


References

[1] John Langford, Lihong Li, and Tong Zhang. Sparse Online Learning via Truncated Gradient. Journal of Machine Learning Research, 2009.

[2] John Duchi and Yoram Singer. Efficient Online and Batch Learning using Forward Backward Splitting. Journal of Machine Learning Research, 2009.

[3] Lin Xiao. Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization. Journal of Machine Learning Research, 2010.

[4] H. Brendan McMahan. Follow-the-Regularized-Leader and Mirror Descent: Equivalence Theorems and L1 Regularization. In AISTATS, 2011.

[5] H. Brendan McMahan, Gary Holt, D. Sculley, et al. Ad Click Prediction: a View from the Trenches. In KDD, 2013.

[6] Peter Bartlett, Elad Hazan, and Alexander Rakhlin. Adaptive Online Gradient Descent. Technical Report UCB/EECS-2007-82, EECS Department, University of California, Berkeley, June 2007.

[7] Martin Zinkevich. Online Convex Programming and Generalized Infinitesimal Gradient Ascent. In ICML, pages 928–936, 2003.
