Oxford Lectures Part 1
TRANSCRIPT
A Brief Introduction to Big Data
Andrea Pasqua
Oxford Lectures Part I
July 22nd, 2016
Outline
● Why big data
● Statistics and data science
● Machine learning models
● Questions
Why Big Data
Our Times
● Data is plentiful and inexpensive for the first time in history

Cost of 1 GB of storage:
1980: $193K
2014: $0.02
Challenges and Opportunities
● Challenges
  ○ Ingesting data
    ■ Distributed computing
    ■ Efficiency is paramount
  ○ Organizing data
    ■ Structured versus unstructured data
    ■ Cleaning and curating
  ○ Interpreting data
    ■ Which signals matter
    ■ Visualizing
  ○ Fighting spurious correlations
    ■ Highly adaptable models can mistake random correlations for systematic ones
    ■ An old problem, magnified
● The abundance of choices can be disorienting
Challenges and Opportunities
● Opportunities
  ○ Automated mining for signals and patterns
    ■ Closer to a multi-purpose selection machine
    ■ Analogous to the sensory cortex of the brain
  ○ Ambitious models for ambitious goals
    ■ IBM Watson: superhuman Jeopardy contestant
    ■ Google Car: self-driving cars
    ■ MS COCO (deep learning): state-of-the-art image recognition
    ■ Word2Vec: glimpses of understanding in NLP
    ■ Radius Inc.: predicting business behavior
Statistics and Data Science
What is Data Science?
● Big data challenges
  ○ Ingesting data
  ○ Organizing data
  ○ Interpreting data
  ○ Fighting spurious correlations
● Technology-centered view
  ○ Data science is about ingesting, organizing, and visualizing large amounts of data
  ○ Basically the same as data analysis
● Statistics-centered view
  ○ So what is new?
  ○ Data-rich situations allow for powerful models (millions of parameters)
  ○ Extreme danger of overfitting
What is Data Science?
● Data science is statistical thinking about
  ○ The construction of highly powerful, highly adaptable models
  ○ To predict complex phenomena
  ○ While keeping overfitting at bay
The Generalization Problem
● A machine model of “Man” is built from training examples, then extrapolated to the wild
● Lack of generalization: the machine overfit, and the error was underestimated
● Adequate generalization: the machine fit well, and the error was estimated correctly
So What is New?
● Photographic camera versus pencil drawings
  ○ Models can wrap around the training data and fit it precisely…
  ○ … too precisely (overfitting)
● Training error keeps falling as complexity grows, but test error turns back up: there is an optimal complexity in between

Hastie et al., The Elements of Statistical Learning, 2011
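The trade-off in the Hastie et al. figure can be reproduced with a short sketch (not from the deck; it assumes NumPy is available, and the helper name `train_val_errors` is ours): polynomials of increasing degree fit the training portion ever more tightly, while the validation error eventually worsens.

```python
import numpy as np

def train_val_errors(x_train, y_train, x_val, y_val, degrees):
    """Fit polynomials of increasing degree (i.e. increasing complexity) and
    report mean squared error on the training and validation portions."""
    errors = {}
    for d in degrees:
        coeffs = np.polyfit(x_train, y_train, d)
        def mse(x, y):
            return float(np.mean((np.polyval(coeffs, x) - y) ** 2))
        errors[d] = (mse(x_train, y_train), mse(x_val, y_val))
    return errors
```

Because each polynomial family contains the simpler ones, the training error can only decrease with degree; the validation error is what reveals overfitting.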
Search for Optimal Complexity
● Will models generalize to new data?
  ○ Set aside some data and use it only to test generalizability
● Separate the data into Train, Validate, and Test portions
  ○ Training: fit several models of varying complexity
  ○ Validating: choose the model performing best, i.e. the right amount of complexity
  ○ Testing: evaluate the winning model on fresh data

Train | Validate | Test
Judging Models
● The special case of binary classification
  ○ Win or lose, pay back or default, buy or decline to buy
  ○ Only one metric of good performance?
● Confusion matrix
  ○ TP: truly positive and classified as such
  ○ FP: truly negative but classified as positive
  ○ TN: truly negative and classified as such
  ○ FN: truly positive but classified as negative

                   Classified positive   Classified negative
  Truly positive          TP                    FN
  Truly negative          FP                    TN

● Other measures are derived
  ○ Accuracy: how often the classifier is right
  ○ Precision: few false positives
  ○ Sensitivity: few false negatives
● Which are more damaging, FPs or FNs?
  ○ Depends on the use case
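The derived measures follow directly from the four confusion-matrix counts; a minimal sketch (the function name is ours, not from the deck), assuming 0/1 labels:

```python
def confusion_metrics(y_true, y_pred):
    """Derive accuracy, precision, and sensitivity from binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),   # how often right
        "precision": tp / (tp + fp),           # penalizes false positives
        "sensitivity": tp / (tp + fn),         # penalizes false negatives
    }
```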
Tunable Models and ROC
● Some models are tunable
  ○ They can be made more precise and less sensitive, or vice versa
● ROC curve
  ○ Sensitivity vs. (1 − specificity), i.e. true positive rate against false positive rate
  ○ Near one end of the curve: few FPs but many FNs; near the other: few FNs but many FPs
  ○ The top-left corner has few FPs and few FNs
  ○ A perfect classifier hugs the top-left corner; a random one follows the diagonal
● The Area Under the Curve (AUC) is a good summary statistic
  ○ Ranges from ~50% (random) to 100% (perfect)
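Tracing the ROC curve amounts to sweeping the decision threshold over the classifier's scores; a minimal sketch (function name ours; it assumes real-valued scores with no ties and both classes present):

```python
def roc_auc(scores, labels):
    """Sweep the decision threshold over the scores to trace the ROC curve
    (sensitivity vs. 1 - specificity), then integrate it with the
    trapezoidal rule to obtain the AUC."""
    pos = sum(labels)
    neg = len(labels) - pos
    # Sort by descending score: each step moves one sample above the threshold.
    pairs = sorted(zip(scores, labels), reverse=True)
    tpr, fpr, tp, fp = [0.0], [0.0], 0, 0
    for _, y in pairs:
        if y == 1:
            tp += 1
        else:
            fp += 1
        tpr.append(tp / pos)
        fpr.append(fp / neg)
    # Trapezoidal integration of the curve.
    return sum((fpr[i + 1] - fpr[i]) * (tpr[i + 1] + tpr[i]) / 2
               for i in range(len(fpr) - 1))
```

A perfect ranking of the scores yields AUC = 1.0; a random ranking hovers around 0.5.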
Machine Learning Models
ML Models
● A vast collection of tools
  ○ Random Forest and Gradient Boosted Trees (Netflix Prize)
  ○ Collaborative filtering and matrix factorization methods: Netflix
  ○ … and more:
    ■ Linear and logistic regression, with different regularizers
    ■ Nonlinear models: e.g. SVMs (Support Vector Machines)
    ■ Bayesian methods: naïve Bayes
    ■ Unsupervised clustering
    ■ … and many more
● A closer look at some tools
  ○ Neural Networks and Deep Learning
  ○ Word2Vec
Neural Networks
Neural Networks
● Origin: AI
  ○ To build an expert system, imitate the only expert system we know: the brain
  ○ So we need analogs of brain components
    ■ Neurons
    ■ Synapses
    ■ Axons
    ■ Sensory inputs

(Illustrations: http://learn.genetics.utah.edu)
Neural Networks
● Signal propagation
  ○ Linear weights (w1, …, wn) for the synapses
  ○ Thresholding function in the cell body
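The two ingredients above fit in a few lines; a minimal sketch of a single artificial neuron (the logistic sigmoid is one common choice of thresholding function, not the only one):

```python
import math

def neuron(inputs, weights, bias=0.0):
    """One artificial neuron: a linear weighted sum of the inputs (the
    synapses) passed through a thresholding function (the cell body)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))   # sigmoid squashes the signal into (0, 1)
```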
Neural Networks
● Architecture
  ○ Arrange the neurons in layers and propagate the signal forward
  ○ Input layer → hidden layers → output layer
  ○ The output can be a prediction, classification, ranking, scoring, …
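The forward pass through such a layered arrangement can be sketched as follows (not from the deck; each layer is given as a weight matrix plus a bias vector, and a sigmoid is assumed as the thresholding function):

```python
import math

def forward(signal, layers):
    """Propagate an input signal forward through a list of layers,
    each given as (weight_matrix, bias_vector)."""
    for weights, biases in layers:
        signal = [
            1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(row, signal)) + b)))
            for row, b in zip(weights, biases)   # one output per neuron in the layer
        ]
    return signal
```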
Neural Networks
● Learning
  ○ Supervised: compare the output with labeled data
  ○ Comparison: penalize deviations from the truth
    ■ The loss is often quadratic, but not necessarily
    ■ e.g. (estimated position − real position)²
  ○ Learning step:
    ■ Adjust the weights to reduce the penalty
    ■ Back-propagate the adjustments towards earlier layers
  ○ Weight adjustment is the analog of synaptic reinforcement in the brain
● Signals propagate forward from input to output; weight adjustments propagate backward
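For a single weight with a quadratic loss, the learning step reduces to one line of calculus; a minimal sketch (function name and learning rate are illustrative), using gradient descent on L = (w·x − y)²:

```python
def gradient_step(w, x, y_true, lr=0.1):
    """One learning step for a single linear neuron with quadratic loss
    L = (w*x - y_true)**2: nudge the weight against the gradient."""
    y_est = w * x
    grad = 2 * (y_est - y_true) * x   # dL/dw
    return w - lr * grad              # small step, sized by the learning rate
```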
Neural Networks
● Too much learning?
  ○ Neural networks are rich and adaptable
    ■ Very many weights, especially if there are several layers
  ○ Minimizing the loss on the training data alone will likely result in overfitting
  ○ For the model to generalize:
    ■ Take small incremental steps (how small? the learning rate)
    ■ Use cross-validation to determine the optimal learning rate
  ○ The result: a higher training error but a lower test error
Towards Deep Learning
● Humble beginnings
  ○ Just a few artificial neurons with simple learning (the Perceptron model, 1958)
● Fell into disrepute (1970s)
  ○ The AI winter
● Came back (late 1980s)
  ○ Larger sizes and more sophisticated learning
● Exploded (2015)
  ○ Deep learning stormed the field
  ○ Versatile and powerful
Deep Learning and Convolutional Neural Networks
Deep Learning
● Neural networks used to be shallow, with one or a few hidden layers
● Then a deep hierarchy of hidden layers was introduced
CNN
● CNNs are Convolutional Neural Networks
  ○ Problem: large input size leads to overfitting
  ○ Reason: a large number of input weights (a high-resolution image ~ 6M input weights)
  ○ Solution: look at the visual cortex (localization and translation invariance)
CNN
● CNNs are Convolutional Neural Networks
● Convolution layers are localized and identical
● Architecture: Input Layer → Conv. Layer → Subs. Layer → Conv. Layer → Subs. Layer → … → Fully Conn. Layer → Output Layer
CNN
● Deep indeed!
  ○ And very neural too
  ○ State of the art for image recognition…
  ○ … and several other applications
Word2Vec and Natural Language Processing
Natural Language Processing
● Natural language: human language as commonly expressed
  ○ Digest natural language…
  ○ … and process it for a variety of purposes
    ■ e.g. determine a course of action (imperative speech)
    ■ … or summarize/translate…
    ■ … or perform sentiment analysis (classification)
  ○ Crucial to smooth computer-human interaction
● But is it real understanding?
  ○ Semantic field of a word: e.g. “king” and “monarch”
  ○ Analogical thinking: e.g. “woman is to man as queen is to king”
  ○ Context resolution: e.g. does “pear” fit in a conversation about fruit?
Word2Vec and NLP
● Word2Vec
  ○ Tomas Mikolov et al. at Google
  ○ Associates a high-dimensional vector with each word or phrase
  ○ Trained from words w1, …, wn and their contexts C1, …, Cn
  ○ The vectors form an internal representation of words
Word2Vec and NLP
● Is it real understanding?
● An internal representation of words, aware of context and analogies
● Potential to revolutionize computer-human interaction

Paris - France + Italy = Rome !!!
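The analogy above is literal vector arithmetic: subtract, add, then look up the nearest embedding. A toy sketch (not from the deck; the 2-D vectors are hand-picked so the analogy works, whereas real Word2Vec embeddings are learned and have hundreds of dimensions):

```python
import math

def nearest(vec, vocab):
    """Return the vocabulary word whose embedding is closest to the
    query vector by cosine similarity."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.hypot(*a) * math.hypot(*b))
    return max(vocab, key=lambda w: cos(vocab[w], vec))

# Hand-made toy "embeddings" (hypothetical values, for illustration only).
vocab = {
    "Paris":  [1.0, 1.0],
    "France": [1.0, 0.0],
    "Italy":  [2.0, 0.0],
    "Rome":   [2.0, 1.0],
}
# Paris - France + Italy ...
query = [p - f + i for p, f, i in zip(vocab["Paris"], vocab["France"], vocab["Italy"])]
```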
Conclusions
Our Times
● Data is plentiful and inexpensive for the first time in history
● Statistical thinking is learning to cope with large data…
● … and with new, more ambitious goals
● First glimpses of usable AI
● Data science has disrupted many industries…
● … and will continue to do so.
Next
Predictive Marketing
● An area of high impact for data science and big data
● Radius Intelligence
Thanks
Appendix I: Cross-Validation
Search for Optimal Complexity
● We split the data to determine the optimal complexity and to test
● Good against overfitting, but the data is not used fully

Train | Validate | Test
Why Cross-Validation?
● Two contrasting problems
  ○ Overfitting, a.k.a. the generalization problem
  ○ Full use of the available data
● Contrasting because…
  ○ To ensure generalization, test on fresh data
  ○ Fresh data cannot be used for training
● Worse if we need to compare multiple models
  ○ Choosing among them on the Test data would itself be overfitting
Cross-Validation
● Nested N-fold Cross-Validation
  ○ Start with one split of the folds (e.g. Validate | Test | Train | Train | Train)…
  ○ … then change it…
  ○ … and so on, until you have gone through all N(N-1) ~ N² combinations
  ○ All the data gets used to train, validate, and test the model
  ○ We are still out-of-sample
  ○ For each validation and test metric we now have a distribution of values
Cross-Validation
● Simple N-fold Cross-Validation
  ○ As an alternative, we can perform a simple N-fold cross-validation
  ○ The Test portion is held out throughout
  ○ The Validate fold rotates through the remaining data (e.g. Train | Validate | Train | Train | Test, then Validate | Train | Train | Train | Test, …)
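The rotation in the simple N-fold scheme can be sketched as a small generator (function name ours; the Test portion is assumed to have been held out beforehand):

```python
def n_fold_splits(n_samples, n_folds):
    """Simple N-fold cross-validation: rotate which fold is held out for
    validation while the remaining folds are used for training.
    (The Test set is assumed to have been held out beforehand.)"""
    fold_size = n_samples // n_folds
    indices = list(range(n_samples))
    for k in range(n_folds):
        validate = indices[k * fold_size:(k + 1) * fold_size]
        train = indices[:k * fold_size] + indices[(k + 1) * fold_size:]
        yield train, validate
```

Every sample appears in the validation fold exactly once, so all of the non-test data contributes to both training and validation.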