CS229 Lecture 2
Machine Learning
-
Outline
• Machine Learning: Overview
• Linear Regression
• Classification
• Logistic Regression
• Regularization
• Perceptron
-
1 Machine Learning: Overview
-
1 Machine Learning: Overview
-
2 Linear Regression: 1 Dimensional Input
-
2 Linear Regression: 2 Dimensional Input
-
2 Linear Regression: An Example
• Suppose we have a dataset giving the living areas, numbers of bedrooms, and prices of 200 houses from a specific region:
• Given data like this, how can we learn to predict the prices of other houses as a function of the size of their living areas and the number of bedrooms?
-
2 Linear Regression: Terms and Concepts
Sample (Example)
Feature
Target
Hypothesis
Training Data (Training Set)
Test Data (Test Set)
Training Error & Test Error
-
2 Linear Regression: Hypothesis
$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2$
$x_1^{(i)}$: the living area of the i-th house in the training set
$x_2^{(i)}$: the number of bedrooms of the i-th house in the training set
$\theta$: parameters (weights)
$y^{(i)}$: the price of the i-th house in the training set
-
2 Linear Regression: Cost Function
Now, given a training set, how do we learn the parameters θ? One reasonable method is to make h(x) close to y.
Cost Function (Loss Function):
$J(\theta) = \frac{1}{2}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$
-
2 Linear Regression: Cost Function
Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2$
Parameters: $\theta_0, \theta_1, \theta_2$
Cost Function: $J(\theta) = \frac{1}{2}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$
Goal: Minimize $J(\theta)$
-
2 Linear Regression: Gradient Descent
Gradient Descent Algorithm
Update Rule: $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$ (for every j)
In our case: $\frac{\partial}{\partial \theta_j} J(\theta) = \sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$
-
2 Linear Regression: Gradient Descent
-
2 Linear Regression: Gradient Descent
Let's assume we have only one training example (x, y):
$\frac{\partial}{\partial \theta_j} J(\theta) = \left(h_\theta(x) - y\right) x_j$
For a single training example, this gives the update rule:
$\theta_j := \theta_j + \alpha\left(y - h_\theta(x)\right) x_j$
-
2 Linear Regression: Gradient Descent
• Batch Gradient Descent: looks at every example in the entire training set on every step.
$\theta_j := \theta_j + \alpha\sum_{i=1}^{m}\left(y^{(i)} - h_\theta(x^{(i)})\right) x_j^{(i)}$ (for every j)
(Figure: "Old Version" vs. "New Version")
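A minimal batch gradient descent sketch under these definitions; `alpha` and `n_steps` are illustrative choices, and the tiny learning rate assumes unscaled features like the raw living areas above:

```python
import numpy as np

def batch_gradient_descent(X, y, alpha=1e-8, n_steps=1000):
    """Batch GD: each step sums the LMS update over all m examples."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_steps):
        errors = y - X @ theta            # y^(i) - h_theta(x^(i)) for all i
        theta += alpha * (X.T @ errors)   # theta_j += alpha * sum_i error_i * x_j^(i)
    return theta
```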
-
2 Linear Regression: Gradient Descent
• Stochastic Gradient Descent: update the parameters according to the gradient of the error with respect to a single training example only.
$\theta_j := \theta_j + \alpha\left(y^{(i)} - h_\theta(x^{(i)})\right) x_j^{(i)}$
Often, stochastic gradient descent gets θ "close" to the minimum much faster than batch gradient descent.
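A matching stochastic gradient descent sketch; `alpha`, `n_epochs`, and the shuffling are illustrative choices:

```python
import numpy as np

def stochastic_gradient_descent(X, y, alpha=1e-8, n_epochs=10, seed=0):
    """SGD: each parameter update uses a single training example."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        for i in rng.permutation(len(y)):     # visit examples in random order
            error = y[i] - X[i] @ theta       # y^(i) - h_theta(x^(i))
            theta += alpha * error * X[i]     # update from this example only
    return theta
```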
-
2 Linear Regression: Gradient Descent
-
2 Linear Regression: Gradient Descent
-
2 Linear Regression: Normal Equations
• Represent J in matrix-vector notation.
-
2 Linear Regression: Normal Equations
• Represent J in matrix-vector notation:
$J(\theta) = \frac{1}{2}\left(X\theta - \vec{y}\right)^T\left(X\theta - \vec{y}\right)$
X: the design matrix, m-by-(n + 1) (m examples; n features plus the intercept column)
$\vec{y}$: the m-dimensional vector of target values
-
2 Linear Regression: Normal Equations
• For example:
$\vec{y} = \begin{bmatrix} 400 \\ 330 \\ 369 \end{bmatrix}$, $X = \begin{bmatrix} 1 & 2104 & 3 \\ 1 & 1600 & 3 \\ 1 & 2400 & 3 \end{bmatrix}$, $\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \end{bmatrix}$
Note: more precisely, the features should be scaled.
-
2 Linear Regression: Feature Scaling
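The slide's figure is not reproduced in the transcript; below is a minimal z-score standardization sketch, one common feature-scaling choice (the helper name is illustrative):

```python
import numpy as np

def standardize(X):
    """Scale each column to zero mean and unit variance (z-score).

    Apply this to the raw feature columns before adding the intercept
    column of 1s (whose standard deviation would be zero).
    """
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma
```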
-
2 Linear Regression: Normal Equations
$\vec{y}$: m × 1
$X$: m × (n + 1)
$\theta$: (n + 1) × 1
$J(\theta)$: a real number
-
2 Linear Regression: Normal Equations
• Normal Equations: setting $\nabla_\theta J(\theta) = 0$ gives $X^T X \theta = X^T \vec{y}$, so $\theta = \left(X^T X\right)^{-1} X^T \vec{y}$
What if $X^T X$ is non-invertible?
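A minimal sketch of solving the normal equations in NumPy; using the pseudo-inverse is one common way to handle the non-invertible case:

```python
import numpy as np

def normal_equation(X, y):
    """Solve theta = (X^T X)^{-1} X^T y in closed form.

    np.linalg.pinv computes the Moore-Penrose pseudo-inverse, so this
    still returns a least-squares solution when X^T X is non-invertible
    (e.g., redundant features or more features than examples).
    """
    return np.linalg.pinv(X.T @ X) @ X.T @ y
```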
-
2 Linear Regression: Normal Equations
-
2 Gradient Descent vs. Normal Equations
-
3 Classification: Overview
• Classification: classifies examples into a given set of categories.
-
3 Classification: Application
• Image/Document Classification
• Spam Email Filtering
• Optical Character Recognition
• Machine Vision (e.g., face detection)
• Natural Language Processing (e.g., spoken language understanding)
• Fraud Detection (e.g., whether a transaction is fraudulent)
• Market Segmentation (e.g., predict whether a customer will respond to a promotion)
• Bioinformatics (e.g., classify proteins according to their function)
-
3 Classification: Algorithms
• Logistic Regression
• Support Vector Machine (SVM)
• Naïve Bayes Classifier
• K-Nearest Neighbor Classifier
• Decision Tree, Random Forest
• (Deep) Neural Network
-
4 Logistic Regression: Overview
Logistic regression was used in the biological sciences in the early twentieth century. It is used when the dependent variable (target) is categorical. For example:
• To predict whether an email is spam (1) or not (0)
• Whether a tumor is malignant (1) or not (0)
-
4 Logistic Regression: Hypothesis
$h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$, where $g(z) = \frac{1}{1 + e^{-z}}$ is called the logistic function or sigmoid function.
-
4 Logistic Regression: Hypothesis
-
4 Logistic Regression: Sigmoid Function
A useful property of the derivative of the sigmoid function:
$g'(z) = g(z)\left(1 - g(z)\right)$
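A minimal sketch of the sigmoid and this derivative identity, with a numerical check (the helper names are illustrative):

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    """Derivative via the identity g'(z) = g(z) * (1 - g(z))."""
    g = sigmoid(z)
    return g * (1.0 - g)

# Check the identity against central finite differences.
z = np.array([-2.0, 0.0, 3.0])
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2.0 * eps)
print(np.allclose(sigmoid_grad(z), numeric))  # True
```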
-
4 Logistic Regression: How to fit θ?
Let's assume that:
$P(y = 1 \mid x; \theta) = h_\theta(x)$ and $P(y = 0 \mid x; \theta) = 1 - h_\theta(x)$
This can be written as:
$p(y \mid x; \theta) = \left(h_\theta(x)\right)^y \left(1 - h_\theta(x)\right)^{1-y}$
-
4 Logistic Regression: Maximize the Likelihood
• Assuming that the m training examples were generated independently, we can then write down the likelihood of the parameters as:
$L(\theta) = \prod_{i=1}^{m} p\left(y^{(i)} \mid x^{(i)}; \theta\right)$
It will be easier to maximize the log likelihood:
$\ell(\theta) = \log L(\theta) = \sum_{i=1}^{m} y^{(i)} \log h_\theta\left(x^{(i)}\right) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta\left(x^{(i)}\right)\right)$
-
4 Logistic Regression: Maximize the Likelihood
• Maximize the likelihood
• The stochastic gradient ascent rule:
$\theta_j := \theta_j + \alpha\left(y^{(i)} - h_\theta\left(x^{(i)}\right)\right) x_j^{(i)}$
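A minimal sketch of logistic regression fit by this stochastic gradient ascent rule; `alpha`, `n_epochs`, and the shuffling are illustrative choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_sgd(X, y, alpha=0.1, n_epochs=100, seed=0):
    """Per-example update: theta_j += alpha * (y^(i) - h_theta(x^(i))) * x_j^(i).

    y holds labels in {0, 1}; X has a leading column of 1s.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        for i in rng.permutation(len(y)):
            error = y[i] - sigmoid(X[i] @ theta)
            theta += alpha * error * X[i]
    return theta
```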
-
4 Logistic Regression: Minimize Cross-Entropy Loss
$\mathrm{Cost}\left(h_\theta(x^{(i)}), y^{(i)}\right) = \begin{cases} -\log h_\theta(x^{(i)}) & \text{if } y^{(i)} = 1 \\ -\log\left(1 - h_\theta(x^{(i)})\right) & \text{if } y^{(i)} = 0 \end{cases}$
-
4 Logistic Regression: Minimize Cross-Entropy Loss
• Cross-Entropy Loss
$\mathrm{Cost}\left(h_\theta(x^{(i)}), y^{(i)}\right) = \begin{cases} -\log h_\theta(x^{(i)}) & \text{if } y^{(i)} = 1 \\ -\log\left(1 - h_\theta(x^{(i)})\right) & \text{if } y^{(i)} = 0 \end{cases}$
$J(\theta) = \frac{1}{m}\sum_{i=1}^{m} \mathrm{Cost}\left(h_\theta(x^{(i)}), y^{(i)}\right) = -\frac{1}{m}\sum_{i=1}^{m}\left[ y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right]$
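A minimal sketch of this loss; the `eps` clipping guards against log(0) and is a practical addition, not part of the slide's formula:

```python
import numpy as np

def cross_entropy_loss(h, y, eps=1e-12):
    """J = -(1/m) * sum[y * log(h) + (1 - y) * log(1 - h)].

    h holds predicted probabilities h_theta(x^(i)); y holds labels in {0, 1}.
    """
    h = np.clip(h, eps, 1.0 - eps)
    return -np.mean(y * np.log(h) + (1.0 - y) * np.log(1.0 - h))
```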
-
4 Logistic Regression: Decision Boundary
-
4 Logistic Regression: Decision Boundary
-
4 Logistic Regression vs. Linear Regression
-
4 Logistic Regression: Multiclass Classification
-
4 Logistic Regression: Multiclass Classification
-
4 Logistic Regression: Multiclass Classification
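The figures on these multiclass slides are not reproduced in the transcript; below is a minimal one-vs-rest sketch, one standard way to extend binary logistic regression to many classes. It assumes the `logistic_sgd` helper sketched earlier:

```python
import numpy as np

def one_vs_rest_fit(X, y, n_classes, **sgd_kwargs):
    """Train one binary classifier per class: class k vs. all the rest."""
    return np.stack([
        logistic_sgd(X, (y == k).astype(float), **sgd_kwargs)
        for k in range(n_classes)
    ])

def one_vs_rest_predict(X, thetas):
    """Predict the class whose classifier gives the largest score."""
    return np.argmax(X @ thetas.T, axis=1)   # (m, n_classes) scores
```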
-
5 Regularization: Overfitting
-
5 Regularization: Overfitting
-
5 Regularization: Overfitting
-
5 Regularization
-
5 Regularization: Linear Regression
-
5 Regularization: Linear Regression
-
5 Regularization: Linear Regression
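The equations on these slides were not captured in the transcript; for reference, the usual L2-regularized linear regression cost (assuming the common convention, with λ the regularization strength and $\theta_0$ left unregularized) is:

$$J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]$$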
-
5 Regularization: Logistic Regression
-
5 Regularization: Logistic Regression
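Likewise, since this slide's equations were not captured, the usual L2-regularized logistic regression cost (again assuming the common convention) is:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[ y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$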
-
6 Perceptron: Artificial Neuron
(Figure: the basic structure of a biological neuron, with synapse, dendrite, and axon labeled, alongside the structural model of an artificial neuron.)
-
6 Perceptron: Model
• Perceptron:
• First compute a weighted sum of the inputs from other neurons.
• Then output a 1 if the weighted sum exceeds the threshold.
-
6 Perceptron: Minimizing Squared Errors
The squared error for a single training example with input x and true output y is:
$E = \frac{1}{2}\left(y^{(i)} - h_\theta(x^{(i)})\right)^2$
Gradient Descent:
$\frac{\partial E}{\partial \theta_j} = \left(y^{(i)} - h_\theta(x^{(i)})\right) \times (-1) \times g'(\mathrm{in}) \times x_j^{(i)} = -\left(y^{(i)} - h_\theta(x^{(i)})\right) x_j^{(i)}$
(the derivative $g'$ of the threshold function is undefined at the step, so it is omitted)
Update Rule:
$\theta_j := \theta_j + \alpha\left(y^{(i)} - h_\theta(x^{(i)})\right) x_j^{(i)}$
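A minimal perceptron training sketch implementing this update rule; `alpha`, `n_epochs`, and the AND-function demo are illustrative:

```python
import numpy as np

def perceptron_train(X, y, alpha=1.0, n_epochs=10):
    """Perceptron rule: theta += alpha * (y^(i) - h_theta(x^(i))) * x^(i).

    X includes a leading column of 1s (bias); y holds labels in {0, 1}.
    """
    theta = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        for xi, yi in zip(X, y):
            h = 1.0 if xi @ theta > 0 else 0.0   # threshold activation
            theta += alpha * (yi - h) * xi       # no change when h == yi
    return theta

# Toy usage: learn the (linearly separable) AND function.
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
y = np.array([0, 0, 0, 1], dtype=float)
theta = perceptron_train(X, y)
print([1.0 if x @ theta > 0 else 0.0 for x in X])  # [0.0, 0.0, 0.0, 1.0]
```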
-
Homework (1)
Task: Regression
Dataset: Housing (https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html#housing)
Algorithm: Linear Regression
-
Homework (2)
Task: Classification
Dataset: MNIST (http://yann.lecun.com/exdb/mnist/). The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples.
Algorithm: Logistic Regression
-
Homework (3)
Task: Classification
Dataset: Breast Cancer Wisconsin (Diagnostic) Data Set (https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29)
Algorithm: Perceptron
-
Tips for Homework
• Evaluation Metrics:
Regression: Root Mean Squared Error
Classification: Accuracy
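Minimal sketches of both metrics (the helper names are illustrative):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Squared Error, for the regression homework."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def accuracy(y_true, y_pred):
    """Fraction of correct predictions, for the classification homework."""
    return np.mean(y_true == y_pred)
```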
-
Tips for Homework
• Plot the loss function during training.
(Figure panels: Normal, Overfitting, Learning Rate)
-
Tips for Homework
• K-Fold Cross-Validation
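A minimal k-fold skeleton using scikit-learn's KFold; the toy arrays and the choice of 5 splits are illustrative:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20.0).reshape(10, 2)   # toy features
y = np.arange(10.0)                  # toy targets
kf = KFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in kf.split(X):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    # fit the model on (X_train, y_train), evaluate on (X_test, y_test)
```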
-
More Materials
• 1. Deep Learning, deeplearning.ai (Andrew Ng et al.) @ Coursera / NetEase Cloud Classroom
• 2. Andrew Ng, Machine Learning, Coursera / NetEase Open Courses
• 3. Machine Learning Foundations, Hsuan-Tien Lin, National Taiwan University @ Coursera
• 4. Machine Learning Techniques, Hsuan-Tien Lin, National Taiwan University @ Coursera
• 5. CS231n: Convolutional Neural Networks for Visual Recognition, Fei-Fei Li et al., Stanford University
• 6. Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, MIT Press, 2016, http://www.deeplearningbook.org/
• 7. Hung-yi Lee, National Taiwan University, http://speech.ee.ntu.edu.tw/~tlkagk/courses_ML16.html
• 8. MIT OpenCourseWare, Gilbert Strang, Linear Algebra
-
Programming Skills
• Python
• Ubuntu
• Keras
• PyTorch
• TensorFlow
-
END