AI&BigData Lab 2016. Peter Rudenko: Specifics of training,...


Post on 17-Jan-2017


Page 1: AI&BigData Lab 2016. Peter Rudenko: Specifics of training, tuning, and using machine learning models

Training, tuning, selecting & serving of machine learning models at scale

Peter Rudenko, @peter_rud

Page 3

Input data

Balanced vs skewed target distribution

The devil is in the detail:
○ Partitioning
○ Leakage
○ Sample size

http://blog.mrtz.org/2015/03/09/competition.html

In [41]: import numpy

In [42]: ar2d = numpy.array([[1, 2, 3], [11, 12, 13], [10, 20, 40]], dtype='uint8', order='C')  # row-major (C) layout

In [43]: ' '.join(str(x) for x in ar2d.ravel(order='K'))  # order='K' reads elements as they are stored in memory

Out[43]: '1 2 3 11 12 13 10 20 40'

In [44]: ar2df = numpy.array([[1, 2, 3], [11, 12, 13], [10, 20, 40]], dtype='uint8', order='F')  # column-major (Fortran) layout

In [45]: ' '.join(str(x) for x in ar2df.ravel(order='K'))

Out[45]: '1 11 10 2 12 20 3 13 40'

Page 4

Big Data?

Criteo 1 TB data:

Data size:
● ~46 GB/day
● ~180,000,000/day
● ~3.5% event rate

Raw Data:[email protected]%

Data:[email protected]%(189 GB in columnar parquet format)

Balanced classes: 70 GB (12 GB in Parquet)
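One common way to go from a ~3.5% event rate to balanced classes is to downsample the majority class. The slide does not say which procedure was actually used, so the following is only a minimal sketch under that assumption; `downsample_negatives`, `label_of`, and the toy `rows` are hypothetical names introduced for illustration.

```python
import random

def downsample_negatives(rows, label_of, seed=0):
    """Keep every positive row plus an equally sized random sample of negatives."""
    rng = random.Random(seed)
    positives = [r for r in rows if label_of(r) == 1]
    negatives = [r for r in rows if label_of(r) == 0]
    kept = rng.sample(negatives, min(len(positives), len(negatives)))
    balanced = positives + kept
    rng.shuffle(balanced)  # avoid an ordering where all positives come first
    return balanced

# Toy data with a 3.5% positive rate, mirroring the event rate quoted above.
rows = [(i, 1 if i % 1000 < 35 else 0) for i in range(100_000)]
balanced = downsample_negatives(rows, label_of=lambda r: r[1])
positive_share = sum(label for _, label in balanced) / len(balanced)
```

Downsampling shrinks the data considerably, but it also changes the class prior, so predicted probabilities generally need recalibration before they are used on the original distribution.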

Scalability! But at what COST?

“You can have a second computer once you’ve shown you know how to use the first one.” – Paul Barham

Page 5

50 shades of machine learning

Supervised Unsupervised

Semi-supervised

Classification Regression Sequence prediction

Structure prediction

Reinforcement learning

Time series forecasting

Clustering Dimensionality reduction

Topic modeling

Recommendation

Online/Streaming ML

Ranking

Survival Analysis

Anomaly detection

Buzzword maker: REALTIME + BIGDATA + 1 or 2 boxes above = Profit

Page 6

Model state (knowledge) vs hyperparameters

LEARNING = REPRESENTATION + EVALUATION + OPTIMIZATION

* Pedro Domingos, A few useful things to know about machine learning, 2012.

Evaluation = LossFunction(Prediction, True label)
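As an illustration of "Evaluation = LossFunction(Prediction, True label)", here is a minimal sketch that uses binary log loss as the loss function. Log loss is an assumed choice for the example, not something the slide specifies, and `log_loss` and `evaluate` are hypothetical helper names.

```python
import math

def log_loss(predicted_prob, true_label, eps=1e-15):
    """Binary cross-entropy for a single prediction; eps guards against log(0)."""
    p = min(max(predicted_prob, eps), 1 - eps)
    return -(true_label * math.log(p) + (1 - true_label) * math.log(1 - p))

def evaluate(predictions, labels, loss=log_loss):
    """Evaluation = mean of LossFunction(prediction, true label)."""
    return sum(loss(p, y) for p, y in zip(predictions, labels)) / len(labels)

confident_right = evaluate([0.9, 0.1], [1, 0])
confident_wrong = evaluate([0.1, 0.9], [1, 0])
```

The same `evaluate` function works with any other per-example loss passed in through the `loss` argument.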

Page 7

Optimization

Model parameters

Combinatorial optimization:
● Greedy search
● Beam search
● Branch-and-bound

Continuous optimization:
❖ Unconstrained
❏ Gradient descent
❏ Conjugate gradient
❏ Quasi-Newton methods
❖ Constrained
❏ Linear programming
❏ Quadratic programming

Hyperparameters

● Grid search
● Random search
● Bayesian optimization
● Tree of Parzen Estimators (TPE)
● Gradient-based optimization
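The first two hyperparameter strategies in the list, grid search and random search, can be sketched in a few lines. Everything here is illustrative: `validation_loss` is a hypothetical stand-in for "train a model and return its validation loss", and the parameter names and ranges are assumptions.

```python
import itertools
import random

def validation_loss(learning_rate, regularization):
    """Hypothetical stand-in for training a model and measuring validation loss."""
    return (learning_rate - 0.1) ** 2 + (regularization - 0.01) ** 2

def grid_search(grid):
    """Evaluate every combination of the listed values."""
    candidates = [dict(zip(grid, values))
                  for values in itertools.product(*grid.values())]
    return min(candidates, key=lambda params: validation_loss(**params))

def random_search(bounds, n_trials, seed=0):
    """Evaluate n_trials points drawn uniformly from the given ranges."""
    rng = random.Random(seed)
    candidates = [{name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
                  for _ in range(n_trials)]
    return min(candidates, key=lambda params: validation_loss(**params))

best_grid = grid_search({"learning_rate": [0.01, 0.1, 1.0],
                         "regularization": [0.001, 0.01, 0.1]})
best_random = random_search({"learning_rate": (0.0, 1.0),
                             "regularization": (0.0, 0.1)}, n_trials=50)
```

Grid search cost grows multiplicatively with each added hyperparameter, while random search keeps a fixed budget of trials regardless of dimensionality.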

Page 8

Distributed Machine Learning

Model fits in memory vs. data fits in memory:

                  Model fits: Yes                   Model fits: No
Data fits: Yes
Data fits: No     Distributed data (HDFS, Spark)    Distributed data, distributed models

Page 9

Distributed Machine Learning

Data 1, Model 1 ... Data N, Model N

Model Data Parallelism

http://parameterserver.org/
https://github.com/intel-machine-learning/DistML
http://www.dmtk.io/
https://petuum.github.io/bosen.html

Model
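The "Data 1, Model 1 ... Data N, Model N" layout on this slide can be read as data parallelism: every worker holds a copy of the model and a shard of the data, and a shared model is updated from the workers' gradients. The following single-process sketch simulates that with synchronous gradient averaging for a one-parameter linear model. It is an assumption-laden toy, not the design of any of the linked systems; `shard_gradient` and `data_parallel_fit` are hypothetical names.

```python
def shard_gradient(w, shard):
    """Gradient of mean squared error for y ~ w * x on one data shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def data_parallel_fit(shards, learning_rate=0.01, steps=200):
    w = 0.0  # the shared model, as a parameter server would hold it
    for _ in range(steps):
        # In a real deployment each gradient is computed on a different machine.
        gradients = [shard_gradient(w, shard) for shard in shards]
        # Synchronous update: average the workers' gradients, then take one step.
        w -= learning_rate * sum(gradients) / len(gradients)
    return w

data = [(float(x), 3.0 * x) for x in range(1, 9)]  # generated with true w = 3
shards = [data[0:4], data[4:8]]                    # two equally sized shards
w = data_parallel_fit(shards)
```

With equally sized shards, the average of the shard gradients equals the gradient on the full data set, so this synchronous variant follows the same trajectory as single-machine gradient descent.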

Page 10

Speed up distributed machine learning

● Approximate all the things
● Update asynchronously
● Early stopping
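Of the three bullets, early stopping is the easiest to show in isolation. The sketch below assumes a patience-based rule (stop once the validation loss has not improved for `patience` epochs); the slide does not name a specific rule, and the list of losses is a hypothetical stand-in for real training epochs.

```python
def train_with_early_stopping(validation_losses, patience=2):
    """Return (best_epoch, best_loss); stop after `patience` epochs without improvement."""
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs in a row
    return best_epoch, best_loss

# Hypothetical per-epoch validation losses; the last two epochs are never reached.
losses = [0.90, 0.70, 0.60, 0.62, 0.61, 0.59, 0.40]
best = train_with_early_stopping(losses, patience=2)
```

The example also shows the trade-off: training stops at epoch 4 and keeps the epoch-2 model, even though later epochs would have been better. A larger `patience` trades compute time for a lower risk of stopping too soon.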

We draw inspiration from the high-level programming models of dataflow systems, and the low-level efficiency of parameter servers.

TensorFlow: A system for large-scale machine learning

A better model when time is the constraint

Page 11

Cost-based optimization

Automating Model Search for Large Scale Machine Learning

Apache SystemML

Automatic Optimization

Algorithms specified in DML and PyDML are dynamically compiled and optimized based on data and cluster characteristics using rule-based and cost-based optimization techniques. The optimizer automatically generates hybrid runtime execution plans ranging from in-memory single-node execution to distributed computations on Spark or Hadoop. This ensures both efficiency and scalability. Automatic optimization reduces or eliminates the need to hand-tune distributed runtime execution plans and system configurations.

Page 12

Ensembles

● Bagging.

● Boosting.

● Blending.

● Stacking.
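To make the first item concrete, here is a minimal bagging sketch: train the same base learner on bootstrap resamples of the data and combine the predictions by majority vote. The base learner (`fit_stump`, a one-dimensional threshold that assumes class 1 has the larger inputs) and the toy data are hypothetical choices made only for this example.

```python
import random
from collections import Counter

def fit_stump(sample):
    """Base learner: threshold halfway between the two class means (1-D inputs)."""
    xs0 = [x for x, y in sample if y == 0]
    xs1 = [x for x, y in sample if y == 1]
    if not xs0 or not xs1:  # the resample happened to contain a single class
        only = 1 if xs1 else 0
        return lambda x: only
    threshold = (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2
    return lambda x: 1 if x > threshold else 0

def bagging(data, n_models=25, seed=0):
    rng = random.Random(seed)
    # Each model sees a bootstrap resample: len(data) draws with replacement.
    models = [fit_stump([rng.choice(data) for _ in data]) for _ in range(n_models)]

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]  # majority vote

    return predict

data = [(x / 10, 0) for x in range(0, 40)] + [(x / 10, 1) for x in range(60, 100)]
predict = bagging(data)
```

Boosting, blending, and stacking differ in how the base models are trained and combined, so this sketch should not be read as covering them.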

Page 13

Dark knowledge

http://www.ttic.edu/dl/dark14.pdf
https://www.youtube.com/watch?v=EK61htlw8hY
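The linked "dark knowledge" material is about transferring what a large model has learned to a smaller one through softened output probabilities. A minimal sketch of the softening step, a softmax with a temperature parameter, is shown below; the logits and the temperature value are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over logits / temperature; higher temperature gives softer outputs."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

teacher_logits = [10.0, 5.0, 1.0]  # hypothetical outputs of a large "teacher" model
hard = softmax(teacher_logits, temperature=1.0)
soft = softmax(teacher_logits, temperature=5.0)
```

At temperature 1 almost all probability mass sits on the top class. At a higher temperature the relative scores of the other classes become visible, and that extra signal is what a smaller model can be trained to match.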

Page 14

Test time prediction

● Different environment
● Different hardware
● Different requirements

Page 15

Types of model transfer

1. Model serialization:
- Bound to a single language
- Bound to a single version
2. Metadata + data (Spark-2.0) (https://tensorflow.github.io/serving/)
3. PMML (http://dmg.org/pmml/v4-2-1/GeneralStructure.html)
4. PFA (http://dmg.org/pfa/index.html)
5. Code generation (h2o.ai)
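The "metadata + data" option can be illustrated with a language-neutral export: the model is written as plain JSON (what kind of model it is, plus its parameters) rather than as a pickled object, so any runtime that understands the schema can score with it. The schema, `export_model`, and `load_scorer` below are hypothetical and do not reproduce the on-disk format of Spark or any other library.

```python
import json
import math

def export_model(weights, bias, feature_names):
    """Serialize a logistic regression as JSON: metadata plus parameter data."""
    return json.dumps({
        "metadata": {"model_type": "logistic_regression",
                     "format_version": 1,
                     "features": feature_names},
        "data": {"weights": weights, "bias": bias},
    })

def load_scorer(serialized):
    """Rebuild a scoring function from the JSON alone, with no pickled objects."""
    doc = json.loads(serialized)
    if doc["metadata"]["model_type"] != "logistic_regression":
        raise ValueError("unsupported model type")
    names = doc["metadata"]["features"]
    weights, bias = doc["data"]["weights"], doc["data"]["bias"]

    def score(row):
        z = bias + sum(w * row[name] for w, name in zip(weights, names))
        return 1.0 / (1.0 + math.exp(-z))

    return score

blob = export_model([0.5, -0.25], 0.0, ["clicks", "age"])
score = load_scorer(blob)
p = score({"clicks": 2.0, "age": 4.0})
```

Compared with language-level serialization (option 1), the loader here depends only on the schema and its `format_version`, not on the language or library version that trained the model.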

Page 17

Thanks! Q&A