Machine Learning - Black Art
Charles Parker, Allston Trading
Machine Learning is Hard!
• By now, you know kind of a lot
• Different types of models
• Feature engineering
• Ways to evaluate
• But you’ll still fail!
• Out in the real world, there’s a whole bunch of things that will kill your project
• FYI - A lot of these talks are stolen
Join Me!
• On a journey into the Machine Learning House of Horrors!
• Mwa ha ha!
The Machine Learning House of Horrors!
• The Horror of The Huge Hypothesis Space
• The Perils of The Poorly Picked Loss Function
• The Creeping Creature Called Cross Validation
• The Dread of the Drifting Domain
• The Repugnance of Reliance on Research Results
Choosing A Hypothesis Space
• By “hypothesis space” we mean the possible classifiers you could build with an algorithm given the data
• This is the choice you make when you pick a learning algorithm
• You have one job!
• Is there any way to make it easier?
Theory to The Rescue!
• Probably Approximately Correct
• We’d like our model to have error less than epsilon
• We’d like that to happen at least some percentage of the time
• If the error bound is epsilon, the allowed failure probability is sigma (so the guarantee holds at least 1 − σ of the time), the number of training examples is m, and the hypothesis space size is d, then PAC theory relates all four quantities
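A minimal sketch of the bound, assuming the standard finite-hypothesis-space PAC result and reading σ as the allowed failure probability:

```latex
m \;\ge\; \frac{1}{\epsilon}\left(\ln d + \ln\frac{1}{\sigma}\right)
```

So the data you need grows only logarithmically in the size of the hypothesis space, but inversely in the error you will tolerate.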
The Triple Trade-Off
• There is a triple trade-off among the error, the size of the hypothesis space, and the amount of training data you have

[Diagram: a triangle connecting Error, Hypothesis Space, and Training Data]
What About Huge Data?
• I’m clever, so I’ll use non-parametric methods (decision trees, k-NN, kernelized SVMs)
• As data scales, curious things tend to happen
• Simpler models become more desirable as they’re faster to fit.
• You can increase model complexity by adding features (maybe word counts)
• Big data often trumps modeling!
The Perils of The Poorly Picked Loss Function
A Dirty Little Secret About ML Algorithms
• They don’t care what you want
• Decision Trees: greedily maximize information gain at each split
• SVM: minimize hinge loss plus a margin (norm) penalty
• LR: minimize log loss (the negative log-likelihood)
• LDA: maximize between-class relative to within-class scatter
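In one standard form, the objectives these four methods optimize internally, whatever your actual loss is:

```latex
\begin{aligned}
\text{Decision trees: } & \max_{\text{split}} \; H(S) - \sum_{v} \tfrac{|S_v|}{|S|}\, H(S_v) \quad \text{(information gain)}\\
\text{SVM: } & \min_{w,b} \; \tfrac{1}{2}\lVert w\rVert^2 + C \sum_i \max\!\big(0,\; 1 - y_i(w \cdot x_i + b)\big)\\
\text{LR: } & \min_{w} \; -\sum_i \big[\, y_i \ln p_i + (1-y_i)\ln(1-p_i) \,\big]\\
\text{LDA: } & \max_{w} \; \frac{w^{\top} S_B\, w}{w^{\top} S_W\, w}
\end{aligned}
```

None of these is the cost you actually pay when the model makes a mistake in your application.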
Real-world Losses
• Real losses are nothing like this
• False positive in disease diagnosis
• False positive in face detection
• False positive in thumbprint identification
• Some aren’t even instance-based
• Path dependencies
• Game playing
Specializing Your Loss
• One solution is to let developers apply their own loss
• This is the approach of SVMlight, which has been around for a while: http://svmlight.joachims.org/
• For decision trees, split criteria other than information gain (mutual information) can be plugged into the splitting code
• Models trained via gradient descent can obviously be customized (Python’s Theano is interesting for this)
• For multi-example (structured) loss functions, there is SEARN in Vowpal Wabbit:
https://github.com/JohnLangford/vowpal_wabbit
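As a concrete sketch of customizing a loss inside gradient descent: a toy cost-sensitive logistic regression where errors on negatives are weighted more, making false positives expensive. All names, data, and the `fp_cost` weighting are illustrative, not from any library.

```python
# Toy cost-sensitive logistic regression trained by plain SGD.
# The custom loss weights errors on negatives `fp_cost` times more,
# so false positives become expensive. Everything here is illustrative.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(X, y, fp_cost=5.0, lr=0.1, epochs=500):
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            weight = 1.0 if yi == 1 else fp_cost  # upweight negatives
            g = weight * (p - yi)                 # weighted log-loss gradient
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

# Tiny separable 1-D data: label 1 for larger x.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
w, b = train(X, y)
```

Because you own the training loop, the gradient can encode whatever asymmetry your application needs.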
Other Hackery
• Sometimes, the solution is just to hack around the actual prediction
• Use several levels (a cascade) of classifiers, as in medical diagnosis or text recognition
• Apply logic to explicitly avoid high loss cases (e.g., when buying/selling equities)
• Changing the problem setting
• Will you be doing queries? Use ranking or metric learning
• If you find yourself thinking "I want to do crazy thing X with classifiers," chances are it's already been done and you can read about it
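A minimal sketch of the cascade idea, with hypothetical thresholds and stand-in scoring functions:

```python
# Sketch of a two-stage classifier cascade (thresholds are made up):
# a cheap first stage handles confident cases; only uncertain examples
# reach the expensive second stage. One way to "hack around" a loss you
# can't encode directly in the learner.
def cascade_predict(x, cheap_score, expensive_score,
                    reject_below=0.1, accept_above=0.9):
    s = cheap_score(x)
    if s < reject_below:
        return 0            # confidently negative: stop early
    if s > accept_above:
        return 1            # confidently positive: stop early
    return 1 if expensive_score(x) >= 0.5 else 0  # escalate the rest

# Toy usage with stand-in scoring functions.
cheap = lambda x: x / 10.0
expensive = lambda x: 1.0 if x >= 5 else 0.0
```

The thresholds are where the application-specific loss sneaks back in: set `reject_below` low when false negatives are catastrophic.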
The Creeping Creature Called Cross Validation
When Validation Attacks!
• Cross validation
• n-fold: hold out one fold for testing, train on the other n - 1 folds
• Great way to measure performance, right?
• It’s all about information leakage
• via instances
• via features
Case Study #1: Law of Averages
• Estimate sporting event outcomes
• Use previous games to estimate points scored for each team (via windowing transform)
• Choose winner based on predicted score
• What if you’re off by one on the window? Then the window includes the very game you’re predicting, and its score leaks into the features
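A toy sketch of the off-by-one bug, with made-up scores: the leaky version of the windowed mean includes the game being predicted, so the target leaks into its own features.

```python
# Sketch of the off-by-one window bug: a "trailing average of points
# scored" feature that accidentally includes the game being predicted.
scores = [21, 35, 14, 28, 42, 17, 24, 31]   # made-up game scores

def window_mean(seq, i, n):
    return sum(seq[i - n:i]) / n            # games i-n .. i-1: correct

def leaky_window_mean(seq, i, n):
    return sum(seq[i - n + 1:i + 1]) / n    # includes game i: off by one!

i, n = 5, 3
honest = window_mean(scores, i, n)          # mean of games 2, 3, 4
leaky = leaky_window_mean(scores, i, n)     # mean of games 3, 4, 5 -- leaks scores[5]
```

In cross-validation the leaky feature looks wonderfully predictive; in production, game i hasn't happened yet.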
Case Study #2: Photo Dating
• Take scanned photos from 30 different users (on average 200 per user) and create a model to assign a date taken (plus or minus five years)
• Perform 10-fold cross-validation
• Accuracy is 85%. Can you trust it?
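Probably not, if folds are drawn per photo: photos from the same user share cameras, subjects, and scanning artifacts, so random folds leak user identity. A toy sketch of leakage via grouped instances (made-up data, 1-NN as a stand-in model):

```python
# Each example has a near-duplicate "twin" from the same group (user),
# and labels are pure noise. A 1-NN model looks perfect when twins
# straddle the train/test boundary; splitting by group removes the leak.
import random

random.seed(0)
groups = []
for g in range(50):
    x = random.random() * 100
    y = random.randint(0, 1)                      # label is pure noise
    groups.append(((x, y, g), (x + 1e-6, y, g)))  # near-duplicate twins

def knn1(train, x):
    return min(train, key=lambda t: abs(t[0] - x))[1]

def accuracy(train, test):
    return sum(knn1(train, x) == y for x, y, _ in test) / len(test)

# Leaky split: one twin in train, the other in test.
leaky = accuracy([a for a, b in groups], [b for a, b in groups])

# Honest split: whole groups stay on one side.
train = [ex for grp in groups[:25] for ex in grp]
test = [ex for grp in groups[25:] for ex in grp]
honest = accuracy(train, test)
```

The leaky estimate is perfect even though the labels contain no signal at all; the group-wise split reveals chance-level performance.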
Case Study #3: Moments In Time
• You have a buy/sell opportunity every five seconds
• The signals you use to evaluate the opportunity are aggregates of market activity over the last five minutes
• How careful must you be with cross-validation?
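Very: any test example within five minutes of a training example shares raw market data with it. One hedged sketch (function name and cutoffs are illustrative) is a time-ordered split with an embargo gap at least as long as the feature window:

```python
# Time-ordered split with an embargo gap. With opportunities every 5 s
# and features aggregated over the last 300 s, train and test examples
# closer than 300 s share raw data, so the honest split leaves a gap.
def embargoed_split(times, cutoff, embargo=300):
    train = [i for i, t in enumerate(times) if t <= cutoff]
    test = [i for i, t in enumerate(times) if t > cutoff + embargo]
    return train, test

times = list(range(0, 2000, 5))   # an opportunity every 5 seconds
train_idx, test_idx = embargoed_split(times, cutoff=1000)
```

Random n-fold splitting here would put overlapping five-minute windows on both sides of the boundary, with the same leakage as the previous case studies.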
The Dread of the Drifting Domain
Breaking Machine Learning
• You’ve got this great model! Congratulations!
• Suddenly it stops working. Why?
• You might be in a domain that tends to change over time (document classification, sales prediction)
• You might be experiencing adverse selection (market data predictions, spam)
Concept Drift
• This is called non-stationarity in either the prior or the conditional distributions
• Could be a couple of different things
• If the prior p(input) is changing, it’s covariate shift
• If the conditional p(output | input) is changing, it’s concept drift
• No rule that it can’t be both
• http://blog.bigml.com/2013/03/12/machine-learning-from-streaming-data-two-problems-two-solutions-two-concerns-and-two-lessons/
Take Action!
• First: Look for symptoms
• Getting a lot of errors
• The distribution of predicted values changes
• Drift detection algorithms (that I know about) have the same basic flavor:
• Buffer some data in memory
• If recent data is “different” from past data, retrain, update or give up
• Some resources - A nice survey paper and an open source package:
http://www.win.tue.nl/~mpechen/publications/pubs/Gama_ACMCS_AdaptationCD_accepted.pdf
http://moa.cms.waikato.ac.nz/
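A minimal sketch of that buffer-and-compare flavor, with made-up window sizes and threshold (not any specific published detector):

```python
# Buffer recent per-prediction error indicators; flag drift when the
# error rate in the newest window rises well above the reference window.
from collections import deque

class SimpleDriftDetector:
    def __init__(self, window=100, threshold=0.15):
        self.ref = deque(maxlen=window)      # older error indicators
        self.recent = deque(maxlen=window)   # newest error indicators
        self.threshold = threshold

    def add(self, error):
        """error: 1 if the model's prediction was wrong, else 0."""
        if len(self.recent) == self.recent.maxlen:
            self.ref.append(self.recent.popleft())
        self.recent.append(error)

    def drifted(self):
        if len(self.ref) < self.ref.maxlen:
            return False                     # not enough history yet
        rate = lambda d: sum(d) / len(d)
        return rate(self.recent) - rate(self.ref) > self.threshold

d = SimpleDriftDetector(window=50)
for _ in range(100):
    d.add(0)                # model performing well
ok_before = d.drifted()
for _ in range(50):
    d.add(1)                # model suddenly always wrong
drifted_after = d.drifted()
```

When `drifted()` fires, the options are exactly the slide's: retrain on the buffer, update incrementally, or give up and page a human.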
The Benefits of Archeology
• Why might you train on old data, even if it’s not relevant?
• Verification of your research process
• Would the same process have worked last year? Run it on old data and check
• Gives you a good idea of how much drift you should expect
The Repugnance of Reliance on Research Results
Publish or Perish
• Academic papers are a certain type of result
• Show incremental improvement in accuracy or generality
• Prove something about your algorithm
• The latter is hard to come by as the setting gets more realistic
• Machine learning proofs assume data is i.i.d., but real data almost never is
• Real-world data is messy, and dealing with that (cleaning, filtering, imputing) significantly changes the dataset
Usefulness of Results
• Theoretical Results
• Most of the time bounds do not apply (error, sample complexity, convergence)
• Sometimes they don’t even make any sense
• Beware of putting too much faith in a single paper or a single person’s work
• Usefulness generally occurs only in the aggregate
• And sometimes not even then (researchers are people, too)
Machine Learning Isn’t About Machine Learning
• Why doesn’t it work like in the paper?
• Remember, the paper is carefully controlled in a way your application is not.
• Performance is rarely driven by machine learning
• It’s driven by cameras and microphones
• It’s driven by Mario Draghi
So, Don’t Bother With It?
• Of course not!
• What’s the alternative?
• “All our science, measured against reality, is primitive and childlike — and yet it is the most precious thing we have” - Albert Einstein
• Use academia as your starting point, but don’t think it will get you out of the work
Some Themes
• The major points of this talk:
• Machine learning is hard to get right
• The algorithms won’t do what you want
• Good results are probably spurious
• Even if they aren’t, it won’t last
• Reading the research won’t help
• Wait, no!
• Have an attitude of skeptical optimism (or optimal skepticism?)