
  • CS229 Lecture 2: Machine Learning

    顾晓玲 [email protected]

  • Outlines

    • Machine Learning: Overview

    • Linear Regression

    • Classification

    • Logistic Regression

    • Regularization

    • Perceptron

  • 1 Machine Learning: Overview

  • 1 Machine Learning: Overview

  • 2 Linear Regression: 1 Dimensional Input

  • 2 Linear Regression: 2 Dimensional Input

  • 2 Linear Regression: An Example

    • Suppose we have a dataset giving the living areas, number of bedrooms, and prices of 200 houses from a specific region:

    • Given data like this, how can we learn to predict the prices of other houses, as a function of the size of their living areas and the number of bedrooms?

  • 2 Linear Regression: Terms and Concepts

    Sample, Example

    Feature

    Target

    Hypothesis

    Training Data (Training Set)

    Test Data (Test Set)

    Training Error & Test Error

  • 2 Linear Regression: Hypothesis

    $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2$

    $x_1^{(i)}$: the living area of the i-th house in the training set

    $x_2^{(i)}$: the number of bedrooms of the i-th house in the training set

    $\theta_j$: parameters (weights)

    $y^{(i)}$: the price of the i-th house in the training set
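    As a quick illustration (mine, not the slides'), the hypothesis is just a dot product once a constant feature $x_0 = 1$ is prepended; the parameter values below are made up:

      import numpy as np

      # h_theta(x) = theta^T x, with x = [1, living area, #bedrooms].
      def h(theta, x):
          return theta @ x

      theta = np.array([50.0, 0.1, 20.0])  # illustrative, not fitted values
      x = np.array([1.0, 2104.0, 3.0])     # a 2104 sq-ft house with 3 bedrooms
      print(h(theta, x))                   # predicted price: 320.4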

  • 2 Linear Regression: Cost Function

    Now, given a training set, how do we learn the parameters θ? One reasonable method is to make h(x) close to y, at least for the training examples we have.

    Cost Function (Loss Function):

    $J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

  • 2 Linear Regression: Cost Function

    Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2$

    Parameters: $\theta_0, \theta_1, \theta_2$

    Cost Function: $J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

    Goal: Minimize $J(\theta)$
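    A minimal NumPy sketch of this cost, assuming the 1/2-sum form above and an m-by-(n+1) design matrix X (names are mine):

      import numpy as np

      def cost(theta, X, y):
          """J(theta) = 1/2 * sum_i (h_theta(x^(i)) - y^(i))^2."""
          r = X @ theta - y     # residuals over the whole training set
          return 0.5 * (r @ r)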

  • 2 Linear Regression: Gradient Descent

    Gradient Descent Algorithm

    Update Rule: $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$ (simultaneously for all j)

    In Our Case: $\theta_j := \theta_j + \alpha \sum_{i=1}^{m} \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)}$

  • 2 Linear Regression: Gradient Descent

  • 2 Linear Regression: Gradient Descent

    Let's assume we have only one training example (x, y). Then

    $\frac{\partial}{\partial \theta_j} J(\theta) = \left( h_\theta(x) - y \right) x_j$

    For a single training example, this gives the update rule (the LMS rule):

    $\theta_j := \theta_j + \alpha \left( y - h_\theta(x) \right) x_j$
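    A one-line sketch of this single-example update in NumPy; alpha is a free learning-rate choice:

      import numpy as np

      def lms_step(theta, x, y, alpha=0.01):
          # theta_j := theta_j + alpha * (y - h_theta(x)) * x_j, for all j at once
          return theta + alpha * (y - theta @ x) * x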

  • 2 Linear Regression: Gradient Descent

    • Batch Gradient Descent: looks at every example in the entire training set on every step.

    Repeat until convergence: $\theta_j := \theta_j + \alpha \sum_{i=1}^{m} \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)}$ (for every j)

    Each step computes the new value of every $\theta_j$ from the old values, i.e. the parameters are updated simultaneously.

  • 2 Linear Regression: Gradient Descent

    • Stochastic Gradient Descent: updates the parameters according to the gradient of the error with respect to a single training example only:

    Loop over i = 1, ..., m: $\theta_j := \theta_j + \alpha \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)}$ (for every j)

    Often, stochastic gradient descent gets θ "close" to the minimum much faster than batch gradient descent.
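    To make the contrast concrete, a hedged sketch of both loops (function names are mine): batch GD uses all m examples per step, while SGD updates after each example:

      import numpy as np

      def batch_gd(theta, X, y, alpha, n_iters):
          for _ in range(n_iters):
              # gradient over the ENTIRE training set on every step
              theta = theta + alpha * (X.T @ (y - X @ theta))
          return theta

      def stochastic_gd(theta, X, y, alpha, n_epochs):
          for _ in range(n_epochs):
              for x_i, y_i in zip(X, y):
                  # update from a single example only
                  theta = theta + alpha * (y_i - theta @ x_i) * x_i
          return theta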

  • 2 Linear Regression: Gradient Descent

  • 2 Linear Regression: Gradient Descent

  • 2 Linear Regression: Normal Equations

    • Represent J in matrix-vectorial notation

  • 2 Linear Regression: Normal Equations

    • Represent J in matrix-vectorial notation:

    X: the design matrix, an m-by-n matrix (actually m-by-(n+1), once the intercept column is included)

    $\vec{y}$: an m-dimensional vector of all the target values

  • 2 Linear Regression: Normal Equations

    • For example:

    $X = \begin{bmatrix} 1 & 2104 & 3 \\ 1 & 1600 & 3 \\ 1 & 2400 & 3 \end{bmatrix}$, $\vec{y} = \begin{bmatrix} 400 \\ 330 \\ 369 \end{bmatrix}$, $\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \end{bmatrix}$

    Note: more precisely, the features should be scaled.

  • 2 Linear Regression: Feature Scaling
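    The slide itself is a figure; as a sketch of one common choice (standardization to zero mean and unit variance, applied before the intercept column of ones is added):

      import numpy as np

      def standardize(X):
          """Scale each feature column to zero mean and unit variance.
          Keep mu and sigma so test data can be scaled identically."""
          mu, sigma = X.mean(axis=0), X.std(axis=0)
          return (X - mu) / sigma, mu, sigma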

  • 2 Linear Regression: Normal Equations

    $\vec{y}$: m × 1

    X: m × (n+1)

    $\theta$: (n+1) × 1

    $J(\theta)$: a real number

  • 2 Linear Regression: Normal Equations

    • Normal Equations: setting the gradient $\nabla_\theta J(\theta)$ to zero gives $X^T X \theta = X^T \vec{y}$, hence $\theta = (X^T X)^{-1} X^T \vec{y}$.

    What if $X^T X$ is non-invertible?
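    A NumPy sketch on the three-house example above. Note that in this tiny example the bedrooms column is constant (a multiple of the intercept column), so $X^T X$ is in fact singular; np.linalg.pinv, the Moore-Penrose pseudoinverse, still returns the minimum-norm solution:

      import numpy as np

      X = np.array([[1, 2104, 3],
                    [1, 1600, 3],
                    [1, 2400, 3]], dtype=float)
      y = np.array([400.0, 330.0, 369.0])

      # theta = (X^T X)^+ X^T y; pinv also handles the non-invertible case
      theta = np.linalg.pinv(X.T @ X) @ X.T @ y
      print(theta)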

  • 2 Linear Regression: Normal Equations

  • 2 Gradient Descent vs. Normal Equations

  • 3 Classification: Overview

    • Classification: assigns examples to a given set of categories

  • 3 Classification: Application

    • Image/Document Classification

    • Spam Email Filtering

    • Optical Character Recognition

    • Machine Vision (e.g., face detection)

    • Natural Language Processing (e.g., spoken language understanding)

    • Fraud Detection (e.g., detecting whether a transaction is fraudulent)

    • Market Segmentation (e.g., predicting whether a customer will respond to a promotion)

    • Bioinformatics (e.g., classify proteins according to their function)

  • 3 Classification: Algorithms

    • Logistic Regression

    • Support Vector Machine (SVM)

    • Naïve Bayes Classifier

    • K-Nearest Neighbor Classifier

    • Decision Tree, Random Forest

    • (Deep) Neural Network

  • 4 Logistic Regression: Overview

    Logistic regression was used in the biological sciences in the early twentieth century. Logistic regression is used when the dependent variable (target) is categorical. For example:

    To predict whether an email is spam (1) or not (0)

    Whether a tumor is malignant (1) or not (0)

  • 4 Logistic Regression: Hypothesis

    $h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$

    $g(z) = \frac{1}{1 + e^{-z}}$ is called the logistic function or sigmoid function

  • 4 Logistic regression: Hypothesis

  • 4 Logistic Regression: Sigmoid Function

    A useful property of the derivative of the sigmoid function:

    $g'(z) = g(z)\left(1 - g(z)\right)$
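    A small sketch that checks this identity numerically (the test point z = 0.5 is arbitrary):

      import numpy as np

      def sigmoid(z):
          return 1.0 / (1.0 + np.exp(-z))

      z, eps = 0.5, 1e-6
      numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)  # finite difference
      analytic = sigmoid(z) * (1 - sigmoid(z))                     # g'(z) = g(z)(1 - g(z))
      print(numeric, analytic)  # the two values agree closely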

  • 4 Logistic Regression: How to fit θ?

    Let's assume that:

    $P(y = 1 \mid x; \theta) = h_\theta(x)$

    $P(y = 0 \mid x; \theta) = 1 - h_\theta(x)$

    This can be written as:

    $p(y \mid x; \theta) = \left( h_\theta(x) \right)^y \left( 1 - h_\theta(x) \right)^{1-y}$

  • 4 Logistic Regression: Maximize the Likelihood

    • Assuming that the m training examples were generated independently, we can then write down the likelihood of the parameters as:

    $L(\theta) = \prod_{i=1}^{m} p\left(y^{(i)} \mid x^{(i)}; \theta\right)$

    It will be easier to maximize the log likelihood:

    $\ell(\theta) = \log L(\theta) = \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right]$

  • 4 Logistic Regression: Maximize the Likelihood

    • Maximize the likelihood

    • The stochastic gradient ascent rule:

    $\theta_j := \theta_j + \alpha \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)}$
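    A sketch of one such ascent step (names are mine); it has the same form as the LMS rule, but with the sigmoid hypothesis:

      import numpy as np

      def ascent_step(theta, x, y, alpha=0.1):
          # theta_j := theta_j + alpha * (y - h_theta(x)) * x_j
          h = 1.0 / (1.0 + np.exp(-(theta @ x)))
          return theta + alpha * (y - h) * x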

  • 4 Logistic Regression: Minimize Cross-Entropy Loss

    $\mathrm{Cost}\left(h_\theta(x^{(i)}), y^{(i)}\right) = \begin{cases} -\log h_\theta(x^{(i)}) & \text{if } y = 1 \\ -\log\left(1 - h_\theta(x^{(i)})\right) & \text{if } y = 0 \end{cases}$

  • 4 Logistic Regression: Minimize Cross-Entropy Loss

    • Cross-Entropy Loss:

    $J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}\left(h_\theta(x^{(i)}), y^{(i)}\right) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right]$
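    A vectorized sketch of J(θ) as written above; the small eps that guards log(0) is my addition, not the slides':

      import numpy as np

      def cross_entropy(theta, X, y, eps=1e-12):
          h = 1.0 / (1.0 + np.exp(-(X @ theta)))
          return -np.mean(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))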

  • 4 Logistic regression: Decision Boundary

  • 4 Logistic regression: Decision Boundary

  • 4 Logistic regression vs Linear regression

  • 4 Logistic regression: Multiclass Classification

  • 4 Logistic regression: Multiclass Classification

  • 4 Logistic regression: Multiclass Classification
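    These slides are figures; a standard recipe they typically illustrate is one-vs-rest: train K binary logistic classifiers, one per class, and predict the class whose classifier is most confident. A hedged sketch of the prediction side:

      import numpy as np

      def predict_one_vs_rest(Theta, x):
          """Theta: K x (n+1), one row of logistic parameters per class."""
          scores = 1.0 / (1.0 + np.exp(-(Theta @ x)))  # K sigmoid confidences
          return int(np.argmax(scores))                # most confident class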

  • 5 Regularization: Overfitting

  • 5 Regularization: Overfitting

  • 5 Regularization: Overfitting

  • 5 Regularization

  • 5 Regularization: Linear Regression

  • 5 Regularization: Linear Regression

  • 5 Regularization: Linear Regression
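    The slide content here is a figure not reproduced in this transcript; as a sketch, the standard closed form for L2-regularized (ridge) linear regression, with the intercept left unpenalized, is:

      import numpy as np

      def ridge_fit(X, y, lam):
          """theta = (X^T X + lam * I')^{-1} X^T y, where I' is the identity
          with its top-left entry zeroed so theta_0 is not penalized."""
          I = np.eye(X.shape[1])
          I[0, 0] = 0.0
          return np.linalg.solve(X.T @ X + lam * I, X.T @ y)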

  • 5 Regularization: Logistic Regression

  • 5 Regularization: Logistic Regression
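    Likewise, a sketch of one L2-regularized gradient descent step for logistic regression, in the standard 1/m formulation (not necessarily the slides' exact one):

      import numpy as np

      def reg_logistic_step(theta, X, y, alpha, lam):
          m = len(y)
          h = 1.0 / (1.0 + np.exp(-(X @ theta)))
          grad = X.T @ (h - y) / m           # gradient of the cross-entropy loss
          grad[1:] += (lam / m) * theta[1:]  # penalize every theta_j except theta_0
          return theta - alpha * grad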

  • 6 Perceptron: Artificial Neuron

    (Figure: the basic structure of a biological neuron, with its synapses (突触), dendrites (树突), and axon (轴突), alongside the structural model of an artificial neuron.)

  • 6 Perceptron: Model

    • Perceptron:

    • First compute a weighted sum of the inputs from other neurons.

    • Then output 1 if the weighted sum exceeds the threshold, and 0 otherwise.

  • 6 Perceptron: Minimizing Squared Errors

    The squared error for a single training example with input x and true output y is:

    $E = \frac{1}{2} \left( y^{(i)} - h_\theta(x^{(i)}) \right)^2$

    Gradient Descent:

    $\frac{\partial E}{\partial \theta_j} = \left( y^{(i)} - h_\theta(x^{(i)}) \right) \times (-1) \times g'(\mathrm{in}) \times x_j^{(i)} = -\left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)}$

    (For the threshold unit, $g'$ is undefined and is omitted.)

    Update Rule: $\theta_j := \theta_j + \alpha \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)}$
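    A sketch of full perceptron training with this update rule; the threshold unit and all names are mine:

      import numpy as np

      def step(z):
          return 1.0 if z > 0 else 0.0  # threshold activation g

      def perceptron_train(X, y, alpha=0.1, n_epochs=10):
          theta = np.zeros(X.shape[1])  # X rows include the intercept feature 1
          for _ in range(n_epochs):
              for x_i, y_i in zip(X, y):
                  # theta_j := theta_j + alpha * (y - h_theta(x)) * x_j
                  theta = theta + alpha * (y_i - step(theta @ x_i)) * x_i
          return theta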

  • Homework (1)

    Task: Regression

    Dataset: Housing (https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html#housing)

    Algorithms: Linear Regression

  • Homework (2)

    Task: Classification

    Dataset: MNIST (http://yann.lecun.com/exdb/mnist/). The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples.

    Algorithms: Logistic Regression

  • Homework (3)

    Task: Classification

    Dataset: Breast Cancer Wisconsin (Diagnostic) Data Set (https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29)

    Algorithms: Perceptron

  • Tips for Homework

    • Evaluation Metrics:

    Regression: Root Mean Squared Error

    Classification: Accuracy
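    Minimal sketches of the two metrics in NumPy:

      import numpy as np

      def rmse(y_true, y_pred):
          return np.sqrt(np.mean((y_true - y_pred) ** 2))

      def accuracy(y_true, y_pred):
          return np.mean(np.asarray(y_true) == np.asarray(y_pred))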

  • Tips for Homework

    • Plot the loss-function figure

    (Figure panels: normal; overfitting; learning rate.)

  • Tips for Homework

    • K-Fold Cross Validation
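    A sketch of plain k-fold index bookkeeping (shuffling the data first is advisable; names are mine):

      import numpy as np

      def k_fold_indices(m, k):
          """Yield (train_idx, val_idx) pairs covering all m examples in k folds."""
          idx = np.arange(m)
          for fold in np.array_split(idx, k):
              yield np.setdiff1d(idx, fold), fold

      # Usage: fit on train_idx, evaluate on val_idx, average over the k folds.
      for train_idx, val_idx in k_fold_indices(10, 5):
          pass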

  • More Materials

    • 1. Deep Learning, deeplearning.ai (Andrew Ng et al.) @ Coursera / NetEase Cloud Classroom (网易云课堂)

    • 2. Andrew Ng. Machine Learning. Coursera / NetEase Open Courses (网易公开课)

    • 3. Machine Learning Foundations (机器学习基石), Hsuan-Tien Lin (林軒田), National Taiwan University @ Coursera

    • 4. Machine Learning Techniques (机器学习技法), Hsuan-Tien Lin (林軒田), National Taiwan University @ Coursera

    • 5. CS231n: Convolutional Neural Networks for Visual Recognition, Fei-Fei Li et al., Stanford University

    • 6. Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, MIT Press, 2016, http://www.deeplearningbook.org/

    • 7. Hung-yi Lee (李宏毅), National Taiwan University, http://speech.ee.ntu.edu.tw/~tlkagk/courses_ML16.html

    • 8. Gilbert Strang, Linear Algebra (《线性代数》), MIT OpenCourseWare

  • Programming Skills

    • Python

    • Ubuntu

    • Keras

    • PyTorch

    • TensorFlow

  • END
