Transcript
Page 1: Top 10 Performance Gotchas for scaling in-memory Algorithms

H2O – The Open Source Math Engine!

Better Predictions!

Page 2: Top 10 Performance Gotchas for scaling in-memory Algorithms

4/23/13

H2O – Open Source in-memory Machine Learning for Big Data

Page 3: Top 10 Performance Gotchas for scaling in-memory Algorithms

Universe is sparse. Life is messy. Data is sparse & messy.

- Lao Tzu

Page 4: Top 10 Performance Gotchas for scaling in-memory Algorithms

Hadoop = opportunity
Not enough Data Scientists
Analysts won't code Java

Page 5: Top 10 Performance Gotchas for scaling in-memory Algorithms

H2O the Prediction Engine

Adhoc Exploration
Math Modeling
Real-time Scoring

Big Data
Messy NAs
Clustering
Classification
Ensembles 100's nanos models
Regression
Group By
Grep

Page 6: Top 10 Performance Gotchas for scaling in-memory Algorithms

H2O the Prediction Engine

Big Data  Exploration  Modeling  Scoring  Real-time

No New API!

Approximate results each step

Page 7: Top 10 Performance Gotchas for scaling in-memory Algorithms

H2O the Prediction Engine

Intellectual Legacy
Math needs to be free
Open Source

Support and Innovation

https://github.com/0xdata/h2o

Page 8: Top 10 Performance Gotchas for scaling in-memory Algorithms

All Top 10's are binary

- Anonymous

Page 9: Top 10 Performance Gotchas for scaling in-memory Algorithms

10 Move Code not Data

Data chunks > code chunks
TCP for Data. UDP for Control.
>> Generated Java Assist

Page 10: Top 10 Performance Gotchas for scaling in-memory Algorithms

JVM 1 Heap  JVM 2 Heap  JVM 3 Heap  JVM 4 Heap

A Frame: Vec[]
age  sex  zip  ID  car

• Vecs aligned in heaps
• Optimized for concurrent access
• Random access any row, any JVM

A Chunk, Unit of Parallel Access

Page 11: Top 10 Performance Gotchas for scaling in-memory Algorithms

9 Chunk-ing Express!

A season for variable-sized chunks and a season for uniform chunks.
Tightly packed! (A chunk is also the unit of batch.)
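The trade-off between uniform and variable-sized chunks shows up when a global row index has to be mapped to a chunk. The sketch below is a minimal illustration, not H2O's code: `ChunkLocator` and the `starts` array (global index of each chunk's first row, plus the total row count as a final sentinel) are hypothetical names, and chunks are assumed to be non-empty.

```java
import java.util.Arrays;

// Hypothetical helper: maps a global row index to the chunk that holds it.
class ChunkLocator {
    // Uniform chunks: a single division finds the chunk.
    static int chunkOfUniform(long row, int chunkSize) {
        return (int) (row / chunkSize);
    }

    // Variable-sized chunks: binary search over the chunk start offsets.
    // starts[i] is the first global row of chunk i; the last entry is the
    // total row count. Assumes strictly increasing offsets (no empty chunks).
    static int chunkOfVariable(long row, long[] starts) {
        if (row < 0 || row >= starts[starts.length - 1]) {
            throw new IndexOutOfBoundsException("row " + row);
        }
        int i = Arrays.binarySearch(starts, row);
        // A miss returns -(insertionPoint) - 1; the owning chunk is the one
        // just before the insertion point.
        return i >= 0 ? i : -i - 2;
    }

    public static void main(String[] args) {
        long[] starts = {0, 1000, 1500, 4000};
        System.out.println(chunkOfVariable(1200, starts));
        System.out.println(chunkOfUniform(2500, 1000));
    }
}
```

Uniform chunks make lookup O(1); variable-sized chunks cost a binary search per lookup but let each chunk hold whatever amount of data arrived or compressed well.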

Page 12: Top 10 Performance Gotchas for scaling in-memory Algorithms

8 Reduce early. Reduce Often!

No expensive intermediate states.
Fine-grain parallelism wins!
>> Fork / Join

Page 13: Top 10 Performance Gotchas for scaling in-memory Algorithms

8 Reduce early. Reduce Often!

All CPUs grab Chunks in parallel
Map/Reduce & F/J handles all sync

JVM 1 Heap  JVM 2 Heap  JVM 3 Heap  JVM 4 Heap

Vec  Vec  Vec  Vec  Vec
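A minimal single-JVM sketch of "reduce early" with the JDK's Fork/Join framework: each leaf task collapses one chunk to a single number, and partial results are combined on the way back up, so no per-row intermediate state is kept. `ChunkSum` and the `double[][]` chunk layout are illustrative assumptions, not H2O's classes.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical sketch: sums a column stored as an array of chunks.
class ChunkSum extends RecursiveTask<Double> {
    private final double[][] chunks;
    private final int lo, hi; // half-open range of chunk indices

    ChunkSum(double[][] chunks, int lo, int hi) {
        this.chunks = chunks;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Double compute() {
        if (hi - lo == 0) return 0.0;
        if (hi - lo == 1) {
            // Leaf: reduce one chunk to one double right away.
            double sum = 0.0;
            for (double d : chunks[lo]) sum += d;
            return sum;
        }
        int mid = (lo + hi) >>> 1;
        ChunkSum left = new ChunkSum(chunks, lo, mid);
        ChunkSum right = new ChunkSum(chunks, mid, hi);
        left.fork();                 // run the left half asynchronously
        double r = right.compute();  // compute the right half in this thread
        return left.join() + r;      // combine the two partial reductions
    }

    static double parallelSum(double[][] chunks) {
        return ForkJoinPool.commonPool().invoke(new ChunkSum(chunks, 0, chunks.length));
    }

    public static void main(String[] args) {
        double[][] chunks = { {1, 2, 3}, {4, 5}, {6} };
        System.out.println(parallelSum(chunks));
    }
}
```

The framework handles the synchronization; the user code only states how to reduce one chunk and how to combine two partial results.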

Page 14: Top 10 Performance Gotchas for scaling in-memory Algorithms

7 Slow is not different from Dead

Debugging slow >> Heartbeats, Messages
Two Generals' Paradox
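One way to read this slide: a heartbeat-based failure detector cannot tell a slow node from a dead one, because both simply miss the deadline. The sketch below illustrates that idea only; `HeartbeatMonitor`, its timeout, and the node names are hypothetical, and times are passed in explicitly to keep the example deterministic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a timeout-based failure detector.
class HeartbeatMonitor {
    private final long timeoutMillis;
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();

    HeartbeatMonitor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Record that a heartbeat from this node arrived at the given time.
    void onHeartbeat(String node, long nowMillis) {
        lastSeen.put(node, nowMillis);
    }

    // A node is suspect if it never reported or if its last heartbeat is
    // older than the timeout. Slow and dead nodes look identical here.
    boolean isSuspect(String node, long nowMillis) {
        Long seen = lastSeen.get(node);
        return seen == null || nowMillis - seen > timeoutMillis;
    }

    public static void main(String[] args) {
        HeartbeatMonitor monitor = new HeartbeatMonitor(1000);
        monitor.onHeartbeat("node-a", 0);
        System.out.println(monitor.isSuspect("node-a", 500));
        System.out.println(monitor.isSuspect("node-a", 1501));
    }
}
```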

Page 15: Top 10 Performance Gotchas for scaling in-memory Algorithms

6 Memory Manager

An in-memory system is only as good as your memory manager!
Lazy eviction. Compress. Align.
Corollary: Track down leaks!

Page 16: Top 10 Performance Gotchas for scaling in-memory Algorithms

5 Memory Overheads

Use primitives

// A Distributed Vector
// much more than 2billion elements
class Vec {
  long length();                 // more than an int's worth
  // fast random access
  double at(long idx);           // Get the idx'th elem
  boolean isNA(long idx);
  void set(long idx, double d);  // writable
  void append(double d);         // variable sized
}
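To make "use primitives" concrete, here is a single-JVM stand-in with the same method names as the `Vec` outline above, backed by a plain `double[]` rather than boxed `Double` objects. It is a sketch under stated assumptions: `PrimitiveVec` is a hypothetical class, it is not distributed, it is limited to an int's worth of elements, and encoding NA as NaN is an assumption made for the example.

```java
import java.util.Arrays;

// Hypothetical local stand-in for the Vec outline: one primitive array,
// no per-element object headers or pointers.
class PrimitiveVec {
    private double[] data = new double[16];
    private long length = 0;

    long length() {
        return length;
    }

    double at(long idx) {
        checkIndex(idx);
        return data[(int) idx];
    }

    // Assumption for this sketch: a missing value is stored as NaN.
    boolean isNA(long idx) {
        return Double.isNaN(at(idx));
    }

    void set(long idx, double d) {
        checkIndex(idx);
        data[(int) idx] = d;
    }

    void append(double d) {
        if (length == data.length) {
            data = Arrays.copyOf(data, data.length * 2); // grow geometrically
        }
        data[(int) length++] = d;
    }

    private void checkIndex(long idx) {
        if (idx < 0 || idx >= length) {
            throw new IndexOutOfBoundsException("index " + idx);
        }
    }

    public static void main(String[] args) {
        PrimitiveVec v = new PrimitiveVec();
        for (int i = 0; i < 20; i++) v.append(i);
        v.set(3, Double.NaN);
        System.out.println(v.length() + " " + v.at(19) + " " + v.isNA(3));
    }
}
```

A `double[]` spends 8 bytes per element, while a list of boxed `Double` values adds an object header and a reference for every element.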

Page 17: Top 10 Performance Gotchas for scaling in-memory Algorithms

4 Cache-Oblivious

Tree size
Bin size
Recursively divide till data → cache
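The "recursively divide until the data fits in cache" idea can be illustrated with a classic cache-oblivious example, matrix transpose. This is a swapped-in textbook technique, not the tree or bin code the slide refers to; `CacheObliviousTranspose` and the `LEAF` cutoff are hypothetical, and the cutoff only bounds recursion overhead rather than encoding a cache size.

```java
// Hypothetical sketch: recursive (cache-oblivious) transpose of a
// row-major rows x cols matrix into a row-major cols x rows matrix.
class CacheObliviousTranspose {
    private static final int LEAF = 16; // small cutoff to limit recursion overhead

    static void transpose(double[] src, double[] dst, int rows, int cols) {
        rec(src, dst, 0, rows, 0, cols, rows, cols);
    }

    private static void rec(double[] src, double[] dst,
                            int r0, int r1, int c0, int c1, int rows, int cols) {
        int dr = r1 - r0, dc = c1 - c0;
        if (dr <= LEAF && dc <= LEAF) {
            // Small tile: copy directly.
            for (int r = r0; r < r1; r++)
                for (int c = c0; c < c1; c++)
                    dst[c * rows + r] = src[r * cols + c];
        } else if (dr >= dc) {
            // Split the longer dimension in half and recurse.
            int rm = r0 + dr / 2;
            rec(src, dst, r0, rm, c0, c1, rows, cols);
            rec(src, dst, rm, r1, c0, c1, rows, cols);
        } else {
            int cm = c0 + dc / 2;
            rec(src, dst, r0, r1, c0, cm, rows, cols);
            rec(src, dst, r0, r1, cm, c1, rows, cols);
        }
    }

    public static void main(String[] args) {
        double[] src = {1, 2, 3, 4, 5, 6}; // 2 x 3
        double[] dst = new double[6];      // 3 x 2
        transpose(src, dst, 2, 3);
        System.out.println(java.util.Arrays.toString(dst));
    }
}
```

Because the recursion keeps halving the working set, at some depth each sub-problem fits in every level of the cache hierarchy without the code knowing any cache size.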

Page 18: Top 10 Performance Gotchas for scaling in-memory Algorithms

3 EC2 – Nothing is bounded

User-mode reliability
S3 Readers will TCP Reset
Mux your connections
Not all toolkits are equal. >> JetS3

Page 19: Top 10 Performance Gotchas for scaling in-memory Algorithms

2 No Locks, No Cry

Non-Blocking Data Structures.

// VOLATILE READ before key compare.
// CAS
private final boolean CAS_kvs( final Object[] oldkvs, final Object[] newkvs ) {
  return _unsafe.compareAndSwapObject(this, _kvs_offset, oldkvs, newkvs );
}
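The snippet above swaps a reference with one compare-and-swap. The same primitive is available in the JDK through `java.util.concurrent.atomic.AtomicReference`. As a minimal sketch of a non-blocking structure built on it, here is a Treiber-style stack; `LockFreeStack` is a hypothetical class for illustration and is not taken from H2O.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: a lock-free stack using a CAS retry loop.
class LockFreeStack<T> {
    private static final class Node<T> {
        final T value;
        final Node<T> next;

        Node(T value, Node<T> next) {
            this.value = value;
            this.next = next;
        }
    }

    private final AtomicReference<Node<T>> head = new AtomicReference<>();

    void push(T value) {
        Node<T> oldHead, newHead;
        do {
            oldHead = head.get();                  // volatile read
            newHead = new Node<>(value, oldHead);
        } while (!head.compareAndSet(oldHead, newHead)); // retry if another thread won
    }

    // Returns null when the stack is empty.
    T pop() {
        Node<T> oldHead;
        do {
            oldHead = head.get();
            if (oldHead == null) return null;
        } while (!head.compareAndSet(oldHead, oldHead.next));
        return oldHead.value;
    }

    public static void main(String[] args) {
        LockFreeStack<Integer> stack = new LockFreeStack<>();
        stack.push(1);
        stack.push(2);
        System.out.println(stack.pop());
    }
}
```

No thread ever holds a lock, so a stalled thread cannot block the others; a failed CAS just means another thread made progress.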

Page 20: Top 10 Performance Gotchas for scaling in-memory Algorithms
Page 21: Top 10 Performance Gotchas for scaling in-memory Algorithms

1 endian wars ended! Keep-It-Simple-Serialization.

byte[ ]. roll-your-own. fast.

public AutoBuffer putA1( byte[] ary, int sofar, int length ) {
  while( sofar < length ) {
    int len = Math.min(length - sofar, _bb.remaining());
    _bb.put(ary, sofar, len);
    sofar += len;
    if( sofar < length ) sendPartial();
  }
  return this;
}
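A minimal sketch of roll-your-own serialization with the JDK's `java.nio.ByteBuffer`: a length prefix followed by raw 8-byte doubles. `SimpleSerde` and its wire layout are assumptions for illustration and say nothing about AutoBuffer's actual format. `ByteBuffer` defaults to big-endian, so writer and reader agree on byte order without any negotiation.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: length-prefixed double[] <-> byte[].
class SimpleSerde {
    static byte[] write(double[] values) {
        ByteBuffer bb = ByteBuffer.allocate(4 + 8 * values.length);
        bb.putInt(values.length);              // 4-byte element count
        for (double v : values) bb.putDouble(v); // 8 bytes per element
        return bb.array();
    }

    static double[] read(byte[] bytes) {
        ByteBuffer bb = ByteBuffer.wrap(bytes);
        double[] values = new double[bb.getInt()];
        for (int i = 0; i < values.length; i++) values[i] = bb.getDouble();
        return values;
    }

    public static void main(String[] args) {
        byte[] wire = write(new double[] {1.5, -2.0, 3.25});
        System.out.println(wire.length);
        System.out.println(java.util.Arrays.toString(read(wire)));
    }
}
```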

Page 22: Top 10 Performance Gotchas for scaling in-memory Algorithms

Got Speed?

Data Movement is a Defect.
Slowing down helps communication.

Page 23: Top 10 Performance Gotchas for scaling in-memory Algorithms

0 Math always produces a number

Accuracy rules over speed.
Predictive Performance

Page 24: Top 10 Performance Gotchas for scaling in-memory Algorithms

1 Shuffle

Data presentation bias.
Sorted data => interesting results
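A minimal sketch of removing presentation-order bias: build a random permutation of row indices with a Fisher-Yates shuffle and visit rows in that order. `RowShuffle` is a hypothetical helper; the explicit seed is an assumption added so that runs are repeatable.

```java
import java.util.Random;

// Hypothetical sketch: seeded Fisher-Yates shuffle of row indices.
class RowShuffle {
    static int[] shuffledOrder(int n, long seed) {
        int[] order = new int[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Random rng = new Random(seed);
        for (int i = n - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1); // uniform in [0, i]
            int tmp = order[i];
            order[i] = order[j];
            order[j] = tmp;
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(shuffledOrder(10, 42L)));
    }
}
```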

Page 25: Top 10 Performance Gotchas for scaling in-memory Algorithms

2 Random acts of Kindness?

Page 26: Top 10 Performance Gotchas for scaling in-memory Algorithms
Page 27: Top 10 Performance Gotchas for scaling in-memory Algorithms

3 Convex Problems: ADMM

Page 28: Top 10 Performance Gotchas for scaling in-memory Algorithms

4 Amdahl strikes: Cholesky / QR Decomposition

Matrix operations
JAMA, jblas.. all single node.
Distributed version needs data transfer!

Page 29: Top 10 Performance Gotchas for scaling in-memory Algorithms

5 Random Forests

embarrassingly parallel
binning
tree-building
splits

Page 30: Top 10 Performance Gotchas for scaling in-memory Algorithms

6 Boosting

iterate & stage weak-learners => strong learners
each tree can be parallel
minimize communication

Page 31: Top 10 Performance Gotchas for scaling in-memory Algorithms

7 Neural Nets & Clustering

embarrassingly parallel
pre-calculate base stats
distance calculation
weight matrices – small footprint

Page 32: Top 10 Performance Gotchas for scaling in-memory Algorithms

8 Ensembles

Daisy chain a bunch of models
Interleave. JIT – Minimize loops over data.

Page 33: Top 10 Performance Gotchas for scaling in-memory Algorithms

9 Tools

Deterministic versions first!
Got Pen & Paper?
Optimize often.
Test Big Data soon.

Page 34: Top 10 Performance Gotchas for scaling in-memory Algorithms

Replacing NAs improves predictive performance by about 10%.

- Newton

Page 35: Top 10 Performance Gotchas for scaling in-memory Algorithms

Munging Missing Features
impute NAs with mean
impute NAs with kNN
impute with recursive PCA

- Boyd
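The simplest of the imputation options above, replacing NAs with the column mean, can be sketched as follows. `MeanImpute` is a hypothetical helper and NaN is assumed to mark a missing value.

```java
// Hypothetical sketch: mean imputation for one numeric column.
class MeanImpute {
    // Returns a copy of the column with every NaN replaced by the mean of
    // the observed values. If nothing was observed, the copy is unchanged.
    static double[] imputeMean(double[] col) {
        double sum = 0.0;
        int n = 0;
        for (double v : col) {
            if (!Double.isNaN(v)) {
                sum += v;
                n++;
            }
        }
        double[] out = col.clone();
        if (n == 0) return out;
        double mean = sum / n;
        for (int i = 0; i < out.length; i++) {
            if (Double.isNaN(out[i])) out[i] = mean;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] filled = imputeMean(new double[] {1.0, Double.NaN, 3.0});
        System.out.println(java.util.Arrays.toString(filled));
    }
}
```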

Page 36: Top 10 Performance Gotchas for scaling in-memory Algorithms

Unbalanced data
single rare class
Fraud / No-Fraud

Stratify

Page 37: Top 10 Performance Gotchas for scaling in-memory Algorithms

Unbalanced data
multiple rare classes
Browse, Click, Purchase

Stratify
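A minimal sketch of stratified sampling for unbalanced labels: rows are grouped by class and the same fraction is drawn from each group, with at least one row kept per class so rare classes never vanish from the sample. `StratifiedSample`, the integer label encoding, and the keep-at-least-one rule are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Hypothetical sketch: per-class sampling of row indices.
class StratifiedSample {
    static List<Integer> sample(int[] labels, double fraction, long seed) {
        if (fraction <= 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in (0, 1]");
        }
        // Group row indices by class label.
        Map<Integer, List<Integer>> byClass = new TreeMap<>();
        for (int i = 0; i < labels.length; i++) {
            byClass.computeIfAbsent(labels[i], k -> new ArrayList<>()).add(i);
        }
        Random rng = new Random(seed);
        List<Integer> picked = new ArrayList<>();
        for (List<Integer> rows : byClass.values()) {
            Collections.shuffle(rows, rng);
            // Same fraction from every class, but never fewer than one row.
            int take = Math.max(1, (int) Math.round(fraction * rows.size()));
            picked.addAll(rows.subList(0, take));
        }
        return picked;
    }

    public static void main(String[] args) {
        int[] labels = new int[100];  // 98 rows of class 0
        labels[98] = 1;               // 2 rows of the rare class 1
        labels[99] = 1;
        System.out.println(sample(labels, 0.1, 7L).size());
    }
}
```

The same code covers the multi-class case (Browse, Click, Purchase), since each label value forms its own stratum.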

Page 38: Top 10 Performance Gotchas for scaling in-memory Algorithms

10 Data is the System

Use Customer Data
Algorithms for Sparse vs. Dense
Unbalanced Data.
Robustness under noise

Page 39: Top 10 Performance Gotchas for scaling in-memory Algorithms

Before H2O

Volume: HDFS
HIVE/SQL
Data Scientist
Munging
slice n dice
Features
Classification
Regression
Clustering
Optimal Model
Engineer
Velocity: Events
Online Scoring
Exploration
Modeling
Offline Scoring
Business Analyst
Ensemble models
Low latency
Applications
Predictions
Rule Engine

Page 40: Top 10 Performance Gotchas for scaling in-memory Algorithms

Big Data  Exploration  Modeling  Scoring  Real-time

Big Data beats Better Algorithms!

Page 41: Top 10 Performance Gotchas for scaling in-memory Algorithms

Big Data  Exploration  Modeling  Scoring  Real-time

Big Data and Better Algorithms!
Scale & Parallelism!

Page 42: Top 10 Performance Gotchas for scaling in-memory Algorithms

H2O the Prediction Engine

Intellectual Legacy
Math needs to be free
Open Source

Support and Innovation

https://github.com/0xdata/h2o

Page 43: Top 10 Performance Gotchas for scaling in-memory Algorithms
Page 44: Top 10 Performance Gotchas for scaling in-memory Algorithms

H2O – The Open Source Math Engine!

Better Predictions!

Page 45: Top 10 Performance Gotchas for scaling in-memory Algorithms

0xdata.com

Distributed Coding Taxonomy

• No Distribution Coding:
  • Whole Algorithms, Whole Vector-Math
  • REST + JSON: e.g. load data, GLM, get results
• Simple Data-Parallel Coding:
  • Per-Row (or neighbor row) Math
  • Map/Reduce-style: e.g. Any dense linear algebra
• Complex Data-Parallel Coding
  • K/V Store, Graph Algo's, e.g. PageRank

Page 46: Top 10 Performance Gotchas for scaling in-memory Algorithms

Distributed Coding Taxonomy

• No Distribution Coding:
  • Whole Algorithms, Whole Vector-Math
  • REST + JSON: e.g. load data, GLM, get results
• Simple Data-Parallel Coding:
  • Per-Row (or neighbor row) Math
  • Map/Reduce-style: e.g. Any dense linear algebra
• Complex Data-Parallel Coding
  • K/V Store, Graph Algo's, e.g. PageRank

Read the docs!
This talk!
Join our GIT!

Page 47: Top 10 Performance Gotchas for scaling in-memory Algorithms

0xdata.com

Distributed Data Taxonomy

Frame – a collection of Vecs
Vec – a collection of Chunks
Chunk – a collection of 1e3 to 1e6 elems
elem – a java double
Row i – i'th elements of all the Vecs in a Frame

Page 48: Top 10 Performance Gotchas for scaling in-memory Algorithms

Use cases

Conversion, Retention & Churn
•  Lead Conversion
•  Engagement
•  Product Placement
•  Recommendations

Pricing Engine
Fraud Detection

