Deep Learning Overview (2015-05-09, KISTEP)

Deep Learning: A brief explanation. Keunwoo Choi (Keunwoo.Choi@qmul.ac.uk), Centre for Digital Music, Queen Mary University of London, UK

Uploaded by: keunwoo-choi

Posted on 13-Jan-2017

Page 1: Deep Learning Overview (2015-05-09 KISTEP)

Deep Learning: A brief explanation

Keunwoo.Choi@qmul.ac.uk

Centre for Digital Music, Queen Mary University of London, UK

Page 2

1 Introduction

2 Machine-Learning

3 Deep learning: Overview, Nonlinearity, Weights, SGD, Training

4 Issues: Overfitting, Batch processing, Back-propagation, Other architectures, ImageNet

5 Summary

Page 3

Keunwoo Choi

PhD, QMUL, EECS, c4dm, 2014-present
Supervised by Mark Sandler and George Fazekas
Music recommendation, (deep) machine learning
Internship, Naver Labs, July-Oct 2015
Visiting PhD, New York University, July-Dec 2016

ETRI, 2011-2014
3D audio (WFS)

Master's, SNU EECS, 2009-2011
Applied Acoustics Laboratory, 3D audio, music signal processing

Bachelor's, SNU EECS, 2005-2009

Page 4

Research Topics

Music feature extraction
Analysis of deep CNNs (ISMIR LDB 2015, MLSP 2016)
Auto-tagging using deep CNNs (ISMIR 2016)

Playlist generation
RNN-based playlist generation (ICML workshop 2016)

Music captioning

Automatic composition
Text-based chords and drums (CSMC 2016)

Page 5

Machine Learning: more precisely, supervised learning

Given a goal

Given data (x, y)

Train an algorithm that best maps x → y

and validate it on unseen x (good generalisation): "Do not memorise the examples!"

Conventional approaches: feature extraction + classifier
Researchers and experts hand-craft the features
A classifier (e.g. an SVM) is trained to achieve the goal
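A minimal sketch of the conventional two-stage pipeline, assuming toy data: two features designed by a human (zero-crossing rate and the dominant FFT frequency) feed a classifier. For brevity, a nearest-centroid classifier stands in for the SVM. The tone signals, the sampling rate `SR`, and the helper names (`make_tone`, `hand_crafted_features`, `predict`) are hypothetical.

```python
import numpy as np

SR = 1000  # sampling rate in Hz (assumed for this toy example)

def make_tone(freq, rng):
    """1-second sine tone with random amplitude and phase (toy data)."""
    t = np.arange(SR) / SR
    return rng.uniform(0.5, 1.0) * np.sin(2 * np.pi * freq * t + rng.uniform(0, 2 * np.pi))

def hand_crafted_features(x):
    """Features chosen by a human, not learned."""
    zcr = np.mean(np.diff(np.sign(x)) != 0)                      # zero-crossing rate
    peak_hz = np.argmax(np.abs(np.fft.rfft(x))) * SR / len(x)    # dominant frequency in Hz
    return np.array([zcr, peak_hz])

rng = np.random.default_rng(0)
X = np.array([hand_crafted_features(make_tone(f, rng)) for f in [5] * 20 + [50] * 20])
y = np.array([0] * 20 + [1] * 20)   # class 0: 5 Hz tones, class 1: 50 Hz tones

# Standardise with training statistics so that neither feature dominates the distance.
mu, sd = X.mean(axis=0), X.std(axis=0)
centroids = np.array([((X - mu) / sd)[y == c].mean(axis=0) for c in (0, 1)])

def predict(x):
    """Nearest-centroid classifier (a stand-in for the SVM on the slide)."""
    z = (hand_crafted_features(x) - mu) / sd
    return int(np.argmin(np.linalg.norm(centroids - z, axis=1)))

print(predict(make_tone(5, rng)), predict(make_tone(50, rng)))
```

Only the classifier is fitted to data here; if the two features were poorly chosen, no amount of classifier training would fix that, which is the limitation the next slide raises.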

Page 6

Machine Learning: problems with the conventional approaches

Hand-crafting takes resources

E.g. MFCCs (speech recognition), Histogram of Oriented Gradients (HOG), SIFT (computer vision)

Hand-crafting cannot be optimised automatically; it is a matter of jang-in-jeong-sin (the master craftsman's spirit).

Is a jang-in (master craftsman) better than machines?

Page 7

NN vs. DNN [1]

Logistic regression: no hidden layer

Neural network (NN): 1 hidden layer

Deep NN (DNN): N hidden layers (N > 1)

[1] Figure: extremetech.com
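The three cases differ only in how many weight matrices sit between the input and the output. A minimal NumPy sketch, assuming ReLU hidden units, a sigmoid output, no bias terms, and hypothetical layer sizes; `init_layers` and `forward` are illustrative names, not taken from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

def init_layers(sizes, rng):
    """One weight matrix per layer; sizes = [n_in, hidden sizes..., n_out]."""
    return [rng.standard_normal((n_out, n_in)) * 0.5 for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(weights, x):
    h = x
    for W in weights[:-1]:            # hidden layers (none for logistic regression)
        h = relu(W @ h)
    return sigmoid(weights[-1] @ h)   # output layer: probability of class 1

rng = np.random.default_rng(0)
x = rng.standard_normal(2)
logistic = init_layers([2, 1], rng)         # no hidden layer
nn = init_layers([2, 8, 1], rng)            # 1 hidden layer
dnn = init_layers([2, 8, 8, 8, 1], rng)     # 3 hidden layers (N > 1)

for name, w in [("logistic", logistic), ("NN", nn), ("DNN", dnn)]:
    print(name, len(w), "weight matrices, output", forward(w, x))
```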

Page 8

Demo: TensorFlow Playground

Logistic regression: No hidden layer

Neural Networks: 1 hidden layer

Deep NN: N hidden layers (N>1)

[Figure: demo screenshots with panel captions "Logistic regression", "Logistic regression fails", "NN works well!", "NN fails", "Shallow NN is okay", "The bigger, the better"]

Page 9

DL Overview: a motivation for deep learning

The brain and the human sensory system

Neurons are identical

Many (~100 billion) identical neurons arranged in suitable structures

Humans learn by example

Human sensory systems are deep

Parallel and serial neuron structures

Page 10

DL Overview: a motivation for deep learning

No need to hand-craft features

The black box includes [feature extraction → classifier]

The whole procedure is optimised computationally to achieve the goal

by iterative, computationally heavy methods, which have outperformed many jang-in (master craftsmen)

Page 11

Comparison. Example task: speech recognition

Method | Conventional ML | Deep Learning
Feature | MFCCs (FFT → mel-scale aggregation → DCT → time derivative → ignore first coeff. → ...) | FFT → NN
Classifier | SVM, GMM | NN

In the deep-learning pipeline, every computation after the FFT (all its parameters and weights) is decided automatically during training
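To make the hand-crafted column concrete, here is a compact NumPy sketch of the MFCC chain in the table. It follows the listed steps and adds the log compression that standard MFCC implementations apply before the DCT. The frame size, hop, number of mel bands and coefficients, and the HTK-style mel formula are assumptions, and all function names are hypothetical. Every one of these choices is fixed by hand, which is what the deep-learning column replaces with learned weights.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters evenly spaced on the mel scale, shape (n_mels, n_fft//2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):
            fb[m - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):
            fb[m - 1, k] = (right - k) / max(right - centre, 1)
    return fb

def dct_matrix(n_out, n_in):
    """Orthonormal DCT-II as a matrix, shape (n_out, n_in)."""
    n = np.arange(n_in)
    k = np.arange(n_out)[:, None]
    D = np.sqrt(2.0 / n_in) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))
    D[0] /= np.sqrt(2.0)
    return D

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=40, n_mfcc=13):
    # 1) FFT of windowed frames -> power spectrum
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # 2) mel-scale aggregation, then log compression
    log_mel = np.log(power @ mel_filterbank(n_mels, n_fft, sr).T + 1e-10)
    # 3) DCT, then ignore the first coefficient
    ceps = (log_mel @ dct_matrix(n_mfcc + 1, n_mels).T)[:, 1:]
    # 4) time derivative (delta features), appended to the static coefficients
    delta = np.diff(ceps, axis=0, prepend=ceps[:1])
    return np.concatenate([ceps, delta], axis=1)

feats = mfcc(np.random.default_rng(0).standard_normal(16000))   # 1 s of noise at 16 kHz
print(feats.shape)
```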

Page 12

DL: Nonlinearity

A single layer performs a nonlinear mapping using σ.
Let x = input vector, y = output vector.

NN: $y = \sigma_2(W_2\,\sigma_1(W_1 x))$

DNN: stacked (= deep) layers perform a more nonlinear and complex mapping

$y = \sigma_6(W_6\,\sigma_5(W_5\,\sigma_4(W_4\,\sigma_3(W_3\,\sigma_2(W_2\,\sigma_1(W_1 x))))))$

Stacked layers = stacked nonlinearity! [2]

Without the nonlinearity, multiple linear layers could be collapsed into one layer, since $W_2 (W_1 x) = (W_2 W_1) x$

[2] Best explained in Colah's blog
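A quick numerical check of the last point, as a NumPy sketch with random and hand-picked weights (all hypothetical): two linear layers give exactly the same output as one layer with weight $W_2 W_1$, while putting ReLU between them breaks the property $L(u) + L(-u) = 0$ that every linear map $L$ has.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))   # layer 1: 3 -> 4
W2 = rng.standard_normal((2, 4))   # layer 2: 4 -> 2
x = rng.standard_normal(3)

# Two stacked linear layers ...
two_linear_layers = W2 @ (W1 @ x)
# ... are just one linear layer whose weight is the product W2 @ W1.
one_linear_layer = (W2 @ W1) @ x

# With ReLU in between, the network is no longer linear. Hand-picked example:
A = np.eye(2)                  # hypothetical layer-1 weights
B = np.array([[1.0, 1.0]])     # hypothetical layer-2 weights
f = lambda v: relu(B @ relu(A @ v))
u = np.array([1.0, -1.0])
print(f(u) + f(-u))            # nonzero, so f cannot be a linear map
print(B @ A @ u + B @ A @ (-u))  # the linear version always gives zero
```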

Page 13

DL: Weights (= parameters)

NN = nonlinearity σ(·) and weights W

For σ(·), we use ReLU, $\mathrm{ReLU}(z) = \max(0, z)$, and its variants

DNN = a combination of ReLUs and many $W_i$'s

We want...
the network to be trained to do all the dirty work, i.e. feature extraction and classification (= $W_i$'s that do what we tell them to do)
the network to learn by example (= find the optimal W using training data)

How do we train? → SGD
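A minimal sketch of the σ(·) choices and of what "many $W_i$'s" means in practice: ReLU, one common variant (leaky ReLU, where the slope `alpha=0.01` is an assumed default), and a count of the trainable numbers in a hypothetical stack of weight matrices. Biases are omitted, as on the slide; the function names are illustrative.

```python
import numpy as np

def relu(z):
    """ReLU(z) = max(0, z), element-wise."""
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    """A ReLU variant: keeps a small slope alpha for negative inputs instead of zero."""
    return np.where(z > 0, z, alpha * z)

def count_weights(weights):
    """The W_i's are exactly the numbers that training (SGD) has to find."""
    return sum(W.size for W in weights)

# Hypothetical DNN: 2 inputs -> 8 -> 8 -> 1 output.
sizes = [2, 8, 8, 1]
rng = np.random.default_rng(0)
weights = [rng.standard_normal((n_out, n_in)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]

z = np.array([-2.0, 0.0, 3.0])
print(relu(z))
print(leaky_relu(z))
print(count_weights(weights))   # 2*8 + 8*8 + 8*1 = 88
```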

Page 14

Deep Learning: how it learns, by SGD!

SGD: Stochastic Gradient Descent

SGD computationally finds w so that J(w) is minimised
SGD iteratively finds w so that J(w) is minimised
SGD gradually finds w so that J(w) is minimised

w is updated to minimise J(w):

$w \leftarrow w - \eta \frac{\partial J(w)}{\partial w}$, where $\eta$ is the learning rate (step size)

...if J(w) is differentiable
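A minimal sketch of the update rule on the simplest differentiable loss, assuming a linear model $y_{\text{estimation}} = w \cdot x$ and the per-example squared error. The toy data, the learning rate `lr` (η), and the epoch count are hypothetical. Each step looks at a single example in random order, which is what makes the descent stochastic.

```python
import numpy as np

def sgd_linear_regression(X, y, lr=0.02, epochs=50, seed=0):
    """Minimise J(w) = (w . x - y)^2 one example at a time (plain SGD)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(X)):   # stochastic: random example order
            err = X[i] @ w - y[i]           # y_estimation - y_true
            grad = 2.0 * err * X[i]         # dJ/dw for the squared error
            w = w - lr * grad               # w <- w - eta * dJ/dw
    return w

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 2))
true_w = np.array([3.0, -2.0])
y = X @ true_w                              # noiseless toy targets
w = sgd_linear_regression(X, y)
print(w)
```

Because the toy targets are noiseless, SGD can drive the loss to essentially zero; with noisy data it would instead hover around the least-squares solution unless the learning rate is decreased over time.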

Page 15

Deep Learning: how it learns, by SGD!

Loss function J(w)

A function that we want to minimise to achieve the goal

$y_{\text{estimation}} = \sigma_4(W_4\,\sigma_3(W_3\,\sigma_2(W_2\,\sigma_1(W_1 x))))$

$y_{\text{true}}$ is given in the dataset

E.g. $\ell_2$ loss: $J(w) = (y_{\text{estimation}} - y_{\text{true}})^2$

The loss function measures how well the current algorithm is performing
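The $\ell_2$ loss is a one-liner. A small sketch (the function name is hypothetical) that sums over output dimensions, with a worked example showing that a worse estimate gives a larger loss.

```python
import numpy as np

def l2_loss(y_estimation, y_true):
    """J(w) = (y_estimation - y_true)^2, summed over the output dimensions."""
    diff = np.asarray(y_estimation, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.sum(diff ** 2))

good = l2_loss([0.9, 0.2], [1.0, 0.0])   # (-0.1)^2 + 0.2^2 = 0.05
bad = l2_loss([0.1, 0.9], [1.0, 0.0])    # (-0.9)^2 + 0.9^2 = 1.62
print(good, bad)
```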

Page 16

Deep Learning: how it learns

We have (a set of) x and $y_{\text{true}}$ (a.k.a. the dataset)

We choose a loss function

$y_{\text{estimation}} = \sigma_4(W_4\,\sigma_3(W_3\,\sigma_2(W_2\,\sigma_1(W_1 x))))$

$J(w)$ = a function of $(y_{\text{estimation}}, y_{\text{true}})$

w is updated into better weights
= training is performed by SGD
= the DNN is optimised

Page 17

Deep Learning: the whole learning procedure

Prepare a training dataset (x, y)

Configure a DNN (number of layers, nodes, loss function)

for many times (epochs):
    for every (x, y): (do SGD)
        compute y_estimation = f(x, W) (go through the DNN)
        update W according to the current loss, loss(y_true, y_estimation)

Done!
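The procedure above, written out as a runnable NumPy sketch. Assumptions, none of them from the slides: a toy regression target $y = \sin(x)$, one hidden layer of 16 ReLU units, bias terms, learning rate 0.01, and 200 passes over the data. The gradients are computed by hand with the chain rule (back-propagation).

```python
import numpy as np

rng = np.random.default_rng(0)

# Prepare a training dataset (x, y): hypothetical toy target y = sin(x).
xs = np.linspace(-np.pi, np.pi, 64).reshape(-1, 1)
ys = np.sin(xs)

# Configure a small network: 1 input -> 16 ReLU units -> 1 linear output, l2 loss.
W1, b1 = rng.standard_normal((16, 1)) * 0.5, np.zeros(16)
W2, b2 = rng.standard_normal((1, 16)) * 0.5, np.zeros(1)

def forward(x):
    """Go through the network; also return the intermediates needed for the gradients."""
    z1 = W1 @ x + b1
    h = np.maximum(0.0, z1)                 # ReLU
    return W2 @ h + b2, z1, h

def mean_loss():
    return float(np.mean([np.sum((forward(x)[0] - y) ** 2) for x, y in zip(xs, ys)]))

initial_loss = mean_loss()
lr = 0.01
for _ in range(200):                        # "for many times"
    for i in rng.permutation(len(xs)):      # "for every x, y": one example per SGD step
        x, y_true = xs[i], ys[i]
        y_est, z1, h = forward(x)           # compute y_estimation = f(x, W)
        d_y = 2.0 * (y_est - y_true)        # dJ/dy_estimation for J = (y_est - y_true)^2
        d_z1 = (W2.T @ d_y) * (z1 > 0)      # chain rule back through W2 and the ReLU
        # Update W according to the current loss: w <- w - lr * dJ/dw (in place)
        W2 -= lr * np.outer(d_y, h)
        b2 -= lr * d_y
        W1 -= lr * np.outer(d_z1, x)
        b1 -= lr * d_z1
final_loss = mean_loss()
print(f"mean l2 loss: {initial_loss:.3f} -> {final_loss:.3f}")
```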

Page 18

Break!

Q&A

playground.tensorflow.org

Page 19

Overfitting

When the network memorises the training data and fails to generalise

A general problem in ML

Example: figure from cs231n [3]

[3] http://cs231n.github.io/neural-networks-3/
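Overfitting is easy to reproduce outside neural networks. A hedged sketch that uses polynomial regression as a stand-in for a network (the data, noise level, and degrees are hypothetical): a degree-9 polynomial through 10 noisy points memorises them, reaching essentially zero training error, while its error on unseen points stays well above that; the degree-1 model is too simple to memorise the noise.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def true_fn(x):
    return 2.0 * x                                   # the underlying relation is linear

x_train = np.linspace(-1.0, 1.0, 10)
y_train = true_fn(x_train) + rng.normal(0.0, 0.3, x_train.size)   # noisy observations
x_test = rng.uniform(-1.0, 1.0, 200)
y_test = true_fn(x_test) + rng.normal(0.0, 0.3, x_test.size)

def mse(model, x, y):
    return float(np.mean((model(x) - y) ** 2))

results = {}
for degree in (1, 9):
    model = Polynomial.fit(x_train, y_train, degree)   # least-squares polynomial fit
    results[degree] = {"train": mse(model, x_train, y_train), "test": mse(model, x_test, y_test)}

for degree, errs in results.items():
    print(f"degree {degree}: train MSE {errs['train']:.4f}, test MSE {errs['test']:.4f}")
```

Comparing the two rows shows the typical pattern: the gap between training and test error is what signals memorisation rather than generalisation.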

Page 20

Batch Gradient Descent

Compute gradient descent on more than one example at a time (a mini-batch)

Every computation of $y_{\text{estimation}} = \sigma_4(W_4\,\sigma_3(W_3\,\sigma_2(W_2\,\sigma_1(W_1 x))))$ is done as matrix computations
Quicker on a GPU (because GPUs are specialised in large matrix computations)
Less zig-zag [4]

[4] Figure: www.holehouse.org
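A small NumPy check of the first point, with hypothetical sizes and random weights (biases omitted): looping over the examples one at a time and pushing the whole batch through as matrix-matrix products give the same $y_{\text{estimation}}$. The batched form is the kind of large matrix computation a GPU accelerates.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

rng = np.random.default_rng(0)
W1 = rng.standard_normal((8, 3))
W2 = rng.standard_normal((1, 8))
X = rng.standard_normal((32, 3))     # a batch of 32 examples, one per row

# One example at a time: a Python loop of matrix-vector products.
y_loop = np.array([W2 @ relu(W1 @ x) for x in X])    # shape (32, 1)
# The whole batch at once: two matrix-matrix products.
y_batch = relu(X @ W1.T) @ W2.T                      # shape (32, 1)
print(np.allclose(y_loop, y_batch))
```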

Page 21

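Back-propagation (a.k.a. backprop) [5]

The essence of gradient descent in NNs

The way to compute the derivatives of J(w) with respect to all weights, $\frac{\partial J(w)}{\partial w}$,

so that w can be updated as $w \leftarrow w - \eta \frac{\partial J(w)}{\partial w}$ ($\eta$: learning rate)

Popularised by Rumelhart, Hinton, and Williams (1986)

[5] Figure: extremetech.com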
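A minimal NumPy sketch of back-propagation for a one-hidden-layer ReLU network with the $\ell_2$ loss (no biases; sizes and data hypothetical), checked against central finite differences. The chain rule runs from the loss back to $W_1$ and reuses the intermediate values of the forward pass, so one backward sweep yields every derivative, whereas finite differences need two extra loss evaluations per weight.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def loss(W1, W2, x, y_true):
    """J(w) = (y_estimation - y_true)^2 for y_estimation = W2 relu(W1 x)."""
    return float(np.sum((W2 @ relu(W1 @ x) - y_true) ** 2))

def backprop(W1, W2, x, y_true):
    """Chain rule, from the output back to the first layer."""
    z1 = W1 @ x
    h = relu(z1)
    d_y = 2.0 * (W2 @ h - y_true)   # dJ/dy_estimation
    dW2 = np.outer(d_y, h)          # dJ/dW2
    d_z1 = (W2.T @ d_y) * (z1 > 0)  # dJ/dz1, through the ReLU
    dW1 = np.outer(d_z1, x)         # dJ/dW1
    return dW1, dW2

def numerical_grad(f, W, eps=1e-6):
    """Central finite differences, one weight at a time (slow, for checking only)."""
    g = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        old = W[idx]
        W[idx] = old + eps
        f_plus = f()
        W[idx] = old - eps
        f_minus = f()
        W[idx] = old
        g[idx] = (f_plus - f_minus) / (2 * eps)
    return g

rng = np.random.default_rng(0)
W1, W2 = rng.standard_normal((4, 3)), rng.standard_normal((2, 4))
x, y_true = rng.standard_normal(3), rng.standard_normal(2)
dW1, dW2 = backprop(W1, W2, x, y_true)
num_dW1 = numerical_grad(lambda: loss(W1, W2, x, y_true), W1)
num_dW2 = numerical_grad(lambda: loss(W1, W2, x, y_true), W2)
print(np.abs(dW1 - num_dW1).max(), np.abs(dW2 - num_dW2).max())
```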

Page 22

Other architectures

Convolutional networks
by LeCun (Facebook AI Research and NYU)
Inspired by biological visual systems
Very widely used in almost every DL problem

Recurrent networks
For sequences (text) and time-series data (speech, weather, stock prices, ...)
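Both architectures reuse the same weights across positions. A minimal NumPy sketch with hypothetical names and sizes: a 1-D "valid" convolution (computed, as in most DL libraries, as a cross-correlation) slides one small kernel along the input, and a vanilla Elman-style RNN, written here without bias terms, applies the same $W_x$ and $W_h$ at every time step while carrying a hidden state.

```python
import numpy as np

def conv1d_valid(x, kernel):
    """1-D 'valid' cross-correlation: one shared kernel slides along the input."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])

def rnn(xs, W_x, W_h, h0):
    """Vanilla RNN: h_t = tanh(W_x x_t + W_h h_{t-1}), same weights at every step."""
    h = h0
    states = []
    for x_t in xs:
        h = np.tanh(W_x @ x_t + W_h @ h)
        states.append(h)
    return np.array(states)

rng = np.random.default_rng(0)
signal = np.array([1.0, 2.0, 3.0, 4.0])
edges = conv1d_valid(signal, np.array([1.0, -1.0]))   # x[i] - x[i+1] at each position
# A sequence of 5 time steps with 2 features each; hidden state of size 3.
states = rnn(rng.standard_normal((5, 2)), rng.standard_normal((3, 2)),
             rng.standard_normal((3, 3)), np.zeros(3))
print(edges, states.shape)
```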

Page 23

ImageNet competition [6]

ImageNet: ~14M images; the competition uses 1K categories
Has enabled the testing of new DL algorithms

[6] Slide from NVIDIA

Page 24

Resources

Deeplearning4j tutorials (Korean)

Machine Learning course on Coursera (Stanford)

cs231n from Stanford
