Transfer Learning: An Overview
TRANSCRIPT
Transfer Learning: Theory and Applications
Qiang Yang, HKUST
Thanks:
Sinno Jialin Pan (NTU, Singapore), Ying Wei (HKUST, Hong Kong), Ben Tan (HKUST, Hong Kong)
A Psychological Point of View
• Transfer of Learning in Education and Psychology: the study of the dependency of human conduct, learning, or performance on prior experience.
– [Thorndike and Woodworth, 1901] explored how individuals transfer learning from one context to another context that shares similar characteristics.
• E.g., C++ → Java; Math/Physics → Computer Science/Economics
Transfer Learning in the Machine Learning Community
• The ability of a system to recognize and apply knowledge and skills learned in previous domains/tasks to novel tasks/domains, which share some commonality.
• Given a target domain/task, how can knowledge be transferred from previous domains/tasks to the new (target) one?
• Key:
– Representation learning; change of representation
Why Transfer?
! Build every model from scratch?
# Time-consuming and expensive:
• Data collection/labeling
• Privacy
• Time to train
! Reuse common knowledge extracted from existing systems?
# More practical
Why Transfer Learning?
[Figure: labeled training data from a source domain (e.g., Electronics, Time Period A, Device A) plus unlabeled or sparsely labeled data from a target domain (e.g., DVD, Time Period B, Device B) feed transfer learning algorithms, which adapt predictive models that are then tested on target domain data.]
Transfer Learning in Different Fields
• Transfer learning for reinforcement learning. [Taylor and Stone, Transfer Learning for Reinforcement Learning Domains: A Survey, JMLR 2009]
• Transfer learning for classification and regression problems (the focus here). [Pan and Yang, A Survey on Transfer Learning, IEEE TKDE 2010]
Indoor WiFi Localization (cont.)
[Figure: average error distance for WiFi localization. Training and testing on Device A: ~1.5 meters. Training on Device A but testing on Device B: ~10 meters. A clear drop! Training data are labeled signal vectors such as S = (-37 dBm, ..., -77 dBm), L = (1, 3); the Device B data are unlabeled signal vectors.]
Sentiment Classification (cont.)
[Figure: classification accuracy for sentiment classification. A sentiment classifier trained and tested on Electronics reviews: ~84.6%. The same classifier trained on Electronics but tested on DVD reviews: ~72.65%. A clear drop!]
Difference in Representation
Electronics:
(1) Compact; easy to operate; very good picture quality; looks sharp!
(3) I purchased this unit from Circuit City and I was very excited about the quality of the picture. It is really nice and sharp.
(5) It is also quite blurry in very dark settings. I will never buy HP again.
Video Games:
(2) A very good game! It is action packed and full of excitement. I am very much hooked on this game.
(4) Very realistic shooting action and good plots. We played this and were hooked.
(6) The game is so boring. I am extremely unhappy and will probably never buy UbiSoft again.
A Major Assumption in Traditional Machine Learning
! Training and future (test) data come from the same domain, which implies they are
# represented in the same feature space, and
# follow the same data distribution.
Machine Learning: Yesterday, Today and Tomorrow
• Deep Learning: Features (yesterday)
• Reinforcement Learning: Rewards (today)
• Transfer Learning: Adaptation (tomorrow)
Machine Learning: Yesterday, Today and Tomorrow
• Deep Learning: lots of data; only the rich (yesterday)
• Reinforcement Learning: lots of data; only the rich (today)
• Transfer Learning: few data; everyone (tomorrow)
Different Scenarios
• Training and testing data may come from different domains:
# Different feature spaces or different marginal distributions: P_s(X) ≠ P_t(X)
# Different conditional distributions or different label spaces: P_s(Y|X) ≠ P_t(Y|X)
Transfer Learning Approaches
• Instance-based approaches
• Feature-based approaches
• Parameter/model-based approaches
• Relational approaches
Instance-based Transfer Learning Approaches
General assumption: source and target domains have a lot of overlapping features.
Instance-based Transfer Learning Approaches
• Case I: unlabeled target domain (with its own problem setting and assumption)
• Case II: some labels in the target domain (with its own problem setting and assumption)
Instance-based Approaches: Case I (cont.)
Correcting Sample Selection Bias / Covariate Shift [Quiñonero-Candela et al., Dataset Shift in Machine Learning, MIT Press 2009]
Instance-based Approaches: Correcting sample selection bias (cont.)
• The distribution of the selector variable maps the target onto the source distribution.
! Label instances from the source domain with label 1
! Label instances from the target domain with label 0
! Train a binary classifier to distinguish the two [Zadrozny, ICML-04]
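A minimal sketch of this idea (the array names and the helper are ours, not from [Zadrozny, ICML-04]): train a logistic-regression domain classifier and turn its probabilities into density-ratio weights for the source instances.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def importance_weights(X_source, X_target):
    """Estimate w(x) = P_target(x) / P_source(x) for each source instance
    via a classifier that separates the two domains."""
    X = np.vstack([X_source, X_target])
    y = np.concatenate([np.ones(len(X_source)),    # 1 = source, as on the slide
                        np.zeros(len(X_target))])  # 0 = target
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    p_src = clf.predict_proba(X_source)[:, 1]      # P(domain = source | x)
    # Bayes' rule turns domain probabilities into a density ratio; the
    # n_source / n_target factor corrects for the sample-size imbalance.
    return (1.0 - p_src) / np.clip(p_src, 1e-6, None) \
           * (len(X_source) / len(X_target))
```

These weights can then be passed as sample weights to any downstream learner trained on the source data.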
Instance-based Approaches: Kernel Mean Matching (KMM)
Maximum Mean Discrepancy (MMD) [Alex Smola, Arthur Gretton and Kenji Fukumizu, ICML-08 tutorial]
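For reference, the empirical squared MMD can be computed directly; a sketch with an RBF kernel (KMM's additional quadratic program for the instance weights is omitted):

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # k(a, b) = exp(-gamma * ||a - b||^2), computed pairwise
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

def mmd2(X_s, X_t, gamma=1.0):
    """Squared MMD: E[k(s, s')] - 2 E[k(s, t)] + E[k(t, t')]."""
    return (rbf_kernel(X_s, X_s, gamma).mean()
            - 2 * rbf_kernel(X_s, X_t, gamma).mean()
            + rbf_kernel(X_t, X_t, gamma).mean())
```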
Instance-based Approaches: Direct density ratio estimation [Sugiyama et al., NIPS-07; Kanamori et al., JMLR-09]
– KL-divergence loss [Sugiyama et al., NIPS-07]
– Least-squared loss [Kanamori et al., JMLR-09]
Instance-based Approaches: Case II
• Intuition: part of the labeled data in the source domain can be reused in the target domain after re-weighting.
Instance-based Approaches: Case II (cont.)
! TrAdaBoost [Dai et al., ICML-07]
– For each boosting iteration:
# Use the same strategy as AdaBoost to update the weights of target domain data.
# Use a new mechanism to decrease the weights of misclassified source domain data.
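A minimal sketch of these two update rules (the constants follow [Dai et al., ICML-07]; the function and argument names are ours):

```python
import numpy as np

def tradaboost_update(w_src, w_tgt, err_src, err_tgt, n_iters):
    """One reweighting step. err_src / err_tgt are 0/1 arrays,
    1 where the current weak learner misclassified the instance."""
    # Target weights: AdaBoost-style update, increase on mistakes
    eps = np.clip(np.average(err_tgt, weights=w_tgt), 1e-10, 0.499)
    beta_t = eps / (1.0 - eps)
    w_tgt = w_tgt * np.power(beta_t, -err_tgt)
    # Source weights: decrease on mistakes (they look off-distribution)
    beta = 1.0 / (1.0 + np.sqrt(2.0 * np.log(len(w_src)) / n_iters))
    w_src = w_src * np.power(beta, err_src)
    return w_src, w_tgt
```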
Feature-based Transfer Learning Approaches
When source and target domains have only some overlapping features (many features have support in only the source or only the target domain).
Feature-based Transfer Learning Approaches (cont.)
How to learn?
! Solution 1: Encode application-specific knowledge to learn the transformation.
! Solution 2: General approaches to learning the transformation.
Feature-based Approaches: Encode application-specific knowledge
Electronics:
(1) Compact; easy to operate; very good picture quality; looks sharp!
(3) I purchased this unit from Circuit City and I was very excited about the quality of the picture. It is really nice and sharp.
(5) It is also quite blurry in very dark settings. I will never_buy HP again.
Video Games:
(2) A very good game! It is action packed and full of excitement. I am very much hooked on this game.
(4) Very realistic shooting action and good plots. We played this and were hooked.
(6) The game is so boring. I am extremely unhappy and will probably never_buy UbiSoft again.
Feature-based Approaches: Encode application-specific knowledge (cont.)
Training (Electronics):
      compact  sharp  blurry  hooked  realistic  boring
(1)      1       1      0       0        0         0
(3)      0       1      0       0        0         0
(5)      0       0      1       0        0         0

y = f(x) = sgn(w^T x), with learned w = [1, 1, -1, 0, 0, 0]

Prediction (Video Game):
      compact  sharp  blurry  hooked  realistic  boring
(2)      0       0      0       1        0         0
(4)      0       0      0       1        1         0
(6)      0       0      0       0        0         1
Feature-based Approaches: Encode application-specific knowledge (cont.)
! Three different types of features:
! Source-domain (Electronics) specific features, e.g., compact, sharp, blurry
! Target-domain (Video Game) specific features, e.g., hooked, realistic, boring
! Domain-independent features (pivot features), e.g., good, excited, nice, never_buy
Feature-based Approaches: Encode application-specific knowledge (cont.)
! How to identify pivot features?
! Term frequency on both domains
! Mutual information between features and labels (source domain)
! Mutual information between features and domains
! How to utilize pivots to align features across domains?
! Structural Correspondence Learning (SCL) [Blitzer et al., EMNLP-06]
! Spectral Feature Alignment (SFA) [Pan et al., WWW-10]
Feature-based Approaches: Spectral Feature Alignment (SFA)
! Intuition:
# Use a bipartite graph to model the correlations between pivot features and other features.
# Discover new shared features by applying spectral clustering techniques on the graph.
! If two domain-specific words have connections to more common pivot words in the graph, they tend to be aligned or clustered together with a higher probability.
! If two pivot words have connections to more common domain-specific words in the graph, they tend to be aligned together with a higher probability.
Spectral Feature Alignment (SFA): High-level idea
[Figure: a bipartite graph links pivot features (exciting, good, never_buy) to domain-specific features (sharp, compact, blurry from Electronics; hooked, realistic, boring from Video Game), with edge weights given by co-occurrence counts. Spectral clustering on this graph aligns the domain-specific words across domains, yielding clusters such as sharp/hooked, compact/realistic, and blurry/boring, from which new features are derived.]
Spectral Feature Alignment (SFA): Derive new features (cont.)
Training (Electronics):
      sharp/hooked  compact/realistic  blurry/boring
(1)        1                1                0
(3)        1                0                0
(5)        0                0                1

y = f(x) = sgn(w^T x), with learned w = [1, 1, -1]

Prediction (Video Game):
      sharp/hooked  compact/realistic  blurry/boring
(2)        1                0                0
(4)        1                1                0
(6)        0                0                1
Spectral Feature Alignment (SFA)
1. Identify P pivot features.
2. Construct a bipartite graph between the pivot and remaining features.
3. Apply spectral clustering on the graph to derive new features.
4. Train classifiers on the source using augmented features (original features + new features).
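A rough sketch of steps 2-3, assuming a co-occurrence matrix M between domain-specific and pivot features (SFA's exact graph construction and normalization may differ in detail):

```python
import numpy as np
from scipy.sparse.linalg import svds

def sfa_projection(M, k=3):
    """M: (n_domain_specific x n_pivot) co-occurrence counts of the
    bipartite graph. Returns a projection of domain-specific features
    into k shared clusters."""
    # Degree-normalize both sides, as in bipartite spectral clustering
    d1 = np.sqrt(M.sum(axis=1, keepdims=True)) + 1e-12
    d2 = np.sqrt(M.sum(axis=0, keepdims=True)) + 1e-12
    Mn = M / d1 / d2
    # Top-k singular vectors give the new shared feature mapping
    U, s, Vt = svds(Mn.astype(float), k=k)
    return U

# Step 4 then trains on [original features, original features @ U].
```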
Feature-based Approaches: Develop general approaches
[Figure: example domain pairs where a general transformation is needed: Time Period A → Time Period B, Device A → Device B.]
Feature-based Approaches: Transfer Component Analysis [Pan et al., IJCAI-09; TNN-11]
Motivation: [Figure: source and target domains are generated from shared latent factors, e.g., temperature, signal properties, building structure, power of APs.]
Transfer Component Analysis (cont.)
[Figure: some of these latent factors cause the data distributions of the two domains to be different.]
Transfer Component Analysis (cont.)
[Figure: the latent factors split into principal components to preserve (e.g., signal properties, building structure) and a noisy component.]
Transfer Component Analysis (cont.)
Main idea: the learned transformation should map the source and target domain data to the latent space spanned by the factors that reduce the domain difference and preserve the original data structure.
High-level optimization problem: minimize the distance between the mapped domain distributions (MMD) plus a regularization term, while preserving the data variance.
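A compact sketch of the closed-form solution as a kernel eigen-problem (following the formulation in [Pan et al., IJCAI-09]; the variable names are ours):

```python
import numpy as np

def tca(K, n_s, n_t, mu=1.0, dim=5):
    """K: (n x n) kernel matrix over stacked source + target data,
    n = n_s + n_t. Returns dim-dimensional embeddings of all points."""
    n = n_s + n_t
    # MMD matrix L: tr(W^T K L K W) is the squared distance between
    # the mapped source and target means
    e = np.concatenate([np.full(n_s, 1.0 / n_s), np.full(n_t, -1.0 / n_t)])
    L = np.outer(e, e)
    H = np.eye(n) - np.full((n, n), 1.0 / n)   # centering matrix
    # Leading eigenvectors of (K L K + mu I)^{-1} K H K trade off
    # small domain distance against preserved variance
    M = np.linalg.solve(K @ L @ K + mu * np.eye(n), K @ H @ K)
    vals, vecs = np.linalg.eig(M)
    W = np.real(vecs[:, np.argsort(-np.real(vals))[:dim]])
    return K @ W
```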
Transfer Component Analysis (cont.)
An illustrative example: [Figure: latent features learned by PCA and by TCA, compared with the original feature space.]
Feature-based Approaches: Self-taught Feature Learning (Andrew Ng et al.)
! Intuition: useful higher-level features can be learned from unlabeled data.
! Steps:
1) Learn higher-level features from a lot of unlabeled data.
2) Use the learned higher-level features to represent the data of the target task.
3) Train models on the new representations of the target task (supervised).
! How to learn higher-level features:
# Sparse Coding [Raina et al., 2007]
# Deep learning [Glorot et al., 2011]
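A minimal sketch of the three steps with a sparse-coding dictionary learner (the toy data below stand in for real unlabeled and target sets):

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_unlabeled = rng.normal(size=(500, 20))       # plentiful unlabeled data
X_target = rng.normal(size=(40, 20))           # small labeled target set
y_target = (X_target[:, 0] > 0).astype(int)

# Step 1: learn a dictionary of higher-level features from unlabeled data
dico = DictionaryLearning(n_components=16).fit(X_unlabeled)
# Step 2: re-represent the target data as sparse codes over the dictionary
Z_target = dico.transform(X_target)
# Step 3: train a supervised model on the new representation
clf = LogisticRegression(max_iter=1000).fit(Z_target, y_target)
```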
Feature-based Approaches: Multi-task Feature Learning
! Assumption: if tasks are related, they should share some good common features.
! Goal: learn a low-dimensional representation shared across related tasks.
General Multi-task Learning Setting
Multi-task learning assumption: if tasks are related, they may share similar parameter vectors. For example, [Evgeniou and Pontil, KDD-04] split each task's parameters into a common part and a specific part for the individual task.
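Written out (the notation here is ours), that decomposition and the regularized objective take the form:

w_t = w_0 + v_t,
min over w_0 and {v_t} of:  Σ_t Σ_i ℓ((w_0 + v_t)^T x_ti, y_ti) + λ1 Σ_t ||v_t||^2 + λ2 ||w_0||^2

where w_0 is the common part shared by all T tasks and v_t the specific part for task t; λ1 and λ2 control how tightly the tasks are tied together.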
Transfer Learning with Deep Learning
Transfer Learning perspective: why is Deep Learning needed?
• Deep neural networks learn nonlinear representations:
– that are hierarchical;
– that disentangle different explanatory factors of variation behind data samples;
– that manifest invariant factors underlying different populations.
Deep Learning perspective: why is Transfer Learning needed?
• Transfer learning alleviates the incapability of learning on a dataset which may not be large enough to train an entire deep neural network from scratch.
Benchmark Dataset: Office
• Description: leverage source images to improve classification of target images.
• 3 domains, 31 categories (e.g., backpack, bike):
– amazon: 2,817 object images from Amazon
– webcam: 957 low-resolution images taken from a web camera
– dslr: 795 high-resolution images taken from a digital SLR camera
Results: Unsupervised domain adaptation, Amazon → Webcam, over time
[Figure: multi-class accuracy (roughly 10-70%) of methods from 2011 to 2015: TCA and GFK (TL without DL), CNN (DL without TL), and DDC, DLID, DAN, BA (TL with DL).]
With Deep Learning, Transfer Learning improves. Applying Transfer Learning techniques outperforms directly applying Deep Learning models trained on the source.
Overview
• Are there labelled target data? Are the dimensions of the source and target domains equal?
– Single modality, supervised: DASH-N [1], Finetuning [2,3], SCNN [4]
– Single modality, unsupervised: [5], DLID [6], DDC [7], DAN [8], BA [9]
– Multiple modalities, supervised: SHL-MDNN [10], ST [11]
– Multiple modalities, unsupervised: DBN [12], DBM [13], MDRNN [14]
(Full citations are listed in the References slide below.)
Single Modality
• Directly apply the model parameters (deep neural network weights) from the source to the target.
[Figure: source-domain and target-domain networks, input to output, with shared weights.]
Are the features transferrable?
Single Modality
• Transferability of layer-wise features
– ImageNet (1000 classes) is randomly split into A (500 classes, source domain) and B (500 classes, target domain).
– baseA: train all layers with A; baseB: train all layers with B.
– BnB: initialize the first n layers with baseB and fix them; randomly initialize the other layers and train with B. BnB+: initialize the first n layers with baseB, randomly initialize the other layers, and train all layers with B.
– AnB: initialize the first n layers with baseA and fix them; randomly initialize the other layers and train with B. AnB+: initialize the first n layers with baseA, randomly initialize the other layers, and train all layers with B. (A sketch of this protocol follows.)
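A PyTorch-style sketch of the protocol (the network choice and block granularity are ours; [3] used an AlexNet-like architecture):

```python
from torchvision.models import alexnet

def make_AnB(base_a, n_copied, plus=False, n_classes=500):
    """Copy the first n feature blocks from base_a (a model trained on
    split A); everything else keeps its fresh random initialization and
    is trained on split B. plus=False -> AnB (copied blocks frozen);
    plus=True -> AnB+ (copied blocks fine-tuned as well)."""
    model = alexnet(num_classes=n_classes)           # random init
    src = list(base_a.features.children())
    dst = list(model.features.children())
    for i in range(n_copied):
        dst[i].load_state_dict(src[i].state_dict())  # copy baseA weights
        for p in dst[i].parameters():
            p.requires_grad = plus                   # frozen unless AnB+
    return model

# BnB / BnB+ use the same call with base_a = a model trained on split B.
```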
Single Modality
• Transferability of layer-wise features [3]
Conclusion 1: lower-layer features are more general and transferrable; higher-layer features are more specific and less transferrable.
Conclusion 2: transferring features + fine-tuning always improves generalization.
What if we do not have any labelled data to fine-tune on in the target domain? What happens if the source and target domains are very dissimilar, e.g., ImageNet split not randomly but into A = {man-made classes} and B = {natural classes}?
Single Modality
• General framework of unsupervised transfer
[Figure: source-domain and target-domain networks, input to output; lower layers share weights, while a domain distance loss ties the higher layers together.]
For lower-level features (more general and transferrable), the source transfers to the target directly.
For higher-level features (more domain-specific and less transferrable), the source transfers to the target by minimizing domain distances.
If some labelled target data are available, so much the better.
Single Modality
• Overall training objective: source-domain classification loss + domain distance loss, the latter computed on a particular representation, e.g., the representation after the 5th layer.
• Domain distance losses:
– Maximum Mean Discrepancy (MMD) [7]
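A sketch of one training step under this objective (we assume a model that returns both logits and the chosen intermediate representation; a linear-kernel MMD stands in for the domain distance):

```python
import torch
import torch.nn.functional as F

def training_step(model, x_src, y_src, x_tgt, lam=0.25):
    """Source classification loss + lam * domain distance at one layer."""
    logits_src, feat_src = model(x_src)   # model -> (logits, representation)
    _, feat_tgt = model(x_tgt)
    cls_loss = F.cross_entropy(logits_src, y_src)
    # Linear-kernel MMD: squared distance between the feature means
    dist = (feat_src.mean(dim=0) - feat_tgt.mean(dim=0)).pow(2).sum()
    return cls_loss + lam * dist
```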
Single Modality
• Domain distance losses:
– MK-MMD (multi-kernel variant of MMD) [8]: learns a more flexible distance metric than single-kernel MMD by adjusting the weights of a family of kernels on an embedding.
– Domain classifier [4,9]: a distribution-free metric; the features are trained to maximize the domain classification error.
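The domain-classifier loss of [9] is usually implemented with a gradient reversal layer; a minimal PyTorch sketch:

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; negated (scaled) gradient on the
    backward pass, so the features learn to confuse the domain
    classifier while the classifier itself is trained normally."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# Usage: domain_logits = domain_classifier(grad_reverse(features))
```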
Single Modality
• Other factors to improve transfer:
– On which layers should the domain distance loss be applied?
• By learning: pinpoint the layer that minimizes the domain distance among all specific layers, say the fourth [7].
• All the specific layers, say the last two layers [8].
[Figure: source and target networks with the domain distance loss attached to the higher layers.]
Single Modality
• Other factors to improve transfer:
– What if we have some training data in the target domain?
• Soft label supervision [4]: categories without any labeled target data are still updated to output non-zero probabilities.
Multiple Modalities
• The source domain and target domain could have different feature spaces, i.e., dimensionality:
– Multimedia on the web: images, text documents, audio, video
– Recommender systems: Douban, Taobao, Xiami Music
– Robotics: vision, audio, sensors
How to deal with multi-modal transfer with Deep Learning?
Multiple Modalities
• Key: the modalities are linked through a shared concept, e.g., the sentence "The cat is sitting on a sofa with ears cocking" and a cat image share the concepts cat, ears, kitten, eyes.
Multiple Modalities
• General framework of unsupervised transfer
[Figure: source-domain and target-domain inputs are mapped into a common space; each side also has a reconstruction layer.]
Reconstruction errors: each modality must be reconstructable from the common space.
Paired loss: the similarity of a pair of source and target instances is preserved in the common space.
Multiple Modalities
• General framework of supervised transfer
[Figure: as in the unsupervised framework, with output layers on both sides; a classification loss is added alongside the paired loss.]
MIR-Flickr Dataset
• 1 million images with user-generated tags
• 25,000 images are labelled with 24 categories
• 10,000 for training, 5,000 for validation, 10,000 for testing
• Example category sets: baby, female, portrait, people; plant life, river, water; clouds, sea, sky, transport, water; animals, dog, food
• Domain 1: images; Domain 2: text
Results: Mean Average Precision (MAP) by applying logistic regression to different layers [13]
Transferring either one of the two domains to the other (joint hidden) outperforms the domain itself (image_input or text_input). Models compared: DBN [12], DBM [13].
References
[1] Nguyen, Hien V., et al. "Joint hierarchical domain adaptation and feature learning." PAMI, 2013.
[2] Oquab, Maxime, et al. "Learning and transferring mid-level image representations using convolutional neural networks." CVPR, 2014.
[3] Yosinski, Jason, et al. "How transferable are features in deep neural networks?" NIPS, 2014.
[4] Tzeng, Eric, et al. "Simultaneous deep transfer across domains and tasks." ICCV, 2015.
[5] Glorot, Xavier, Antoine Bordes, and Yoshua Bengio. "Domain adaptation for large-scale sentiment classification: A deep learning approach." ICML, 2011.
[6] Chopra, Sumit, Suhrid Balakrishnan, and Raghuraman Gopalan. "DLID: Deep learning for domain adaptation by interpolating between domains." ICML, 2013.
[7] Tzeng, Eric, et al. "Deep domain confusion: Maximizing for domain invariance." arXiv preprint arXiv:1412.3474, 2014.
[8] Long, Mingsheng, and Jianmin Wang. "Learning transferable features with deep adaptation networks." arXiv preprint arXiv:1502.02791, 2015.
[9] Ganin, Yaroslav, and Victor Lempitsky. "Unsupervised domain adaptation by backpropagation." ICML, 2015.
[10] Huang, Jui-Ting, et al. "Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers." ICASSP, 2013.
[11] Gupta, Saurabh, Judy Hoffman, and Jitendra Malik. "Cross modal distillation for supervision transfer." arXiv preprint arXiv:1507.00448, 2015.
[12] Ngiam, Jiquan, et al. "Multimodal deep learning." ICML, 2011.
[13] Srivastava, Nitish, and Ruslan Salakhutdinov. "Multimodal learning with deep Boltzmann machines." JMLR, 2014.
[14] Sohn, Kihyuk, Wenling Shang, and Honglak Lee. "Improved multimodal deep learning with variation of information." NIPS, 2014.
Simultaneous Deep Transfer Across Domains and Tasks. Eric Tzeng, Judy Hoffman, Trevor Darrell, Kate Saenko. ICCV 2015.
Oquab, Bottou, Laptev, Sivic: Learning and Transferring Mid-Level Image Representations using Convolutional Neural Networks. CVPR 2014.
Transfer Learning in Convolutional Neural Networks
• Source domain: ImageNet (1000 classes, 1.2 million images)
• Target domain: Pascal VOC 2007 object classification (20 classes, about 5,000 images)
• PRE-1000C: the proposed method
DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition
• Jeff Donahue, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, Trevor Darrell. ICML 2014.
• Questions:
– How to transfer features to tasks with different labels?
– Do features extracted from the CNN generalize to other datasets?
– How does performance vary with network depth?
• Algorithm:
– A deep convolutional model is first trained in a fully supervised setting using a state-of-the-art method (Krizhevsky et al., 2012).
– Various features are extracted from this network, and their efficacy is evaluated on generic vision tasks.
Relational Transfer Learning Approaches
! Motivation:
! If two logically described domains (relational; the data are non-i.i.d.) are related, they must share similar relations among objects.
! These relations can be used for transfer learning.
Relational Transfer Learning Approaches (cont.)
Academic domain (source): Student (B), Professor (A), Paper (T):
AdvisedBy (B, A) ∧ Publication (B, T) => Publication (A, T)
Movie domain (target): Actor (A), Director (B), Movie (M):
WorkedFor (A, B) ∧ MovieMember (A, M) => MovieMember (B, M)
Shared second-order clause: P1(x, y) ∧ P2(x, z) => P2(y, z)
[Mihalkova et al., AAAI-07; Davis and Domingos, ICML-09]
Query Classification and Online Advertisement
• ACM KDD Cup 05 winner
• SIGIR 06
• ACM Transactions on Information Systems journal, 2006
– Joint work with Dou Shen, Jiantao Sun and Zheng Chen
QC as Machine Learning
Inspired by the KDD Cup '05 competition:
– Classify a query into a ranked list of categories
– Queries are collected from real search engines
– Target categories are organized in a tree, with each node being a category
Target-transfer Learning in QC
• A classifier, once trained, stays constant:
– Target classes before: Sports, Politics (European, US, China)
– Target classes now: Sports (Olympics, Football, NBA), Stock Market (Asian, Dow, Nasdaq), History (Chinese, World)
How to allow the target to change?
• Application:
– advertisements come and go,
– but our query→target mapping need not be retrained!
• We call this the target-transfer learning problem.
Solutions: Query Enrichment + Staged Classification
The Architecture of Our Approach:
[Figure: Phase I (training): construct synonym-based classifiers and a statistical classifier over the target categories. Phase II (testing): the query is sent to a search engine; the labels and the text of the returned pages are classified by the two classifiers, and their classified results are combined (via the bridging classifier) into the final results.]
Step 1: Query Enrichment
• Textual information: full text, title, and snippet of the returned pages
• Category information
Step 2: Bridging Classifier
• Wish to avoid:
– When the target is changed, training needs to be repeated!
• Solution:
– Connect the target taxonomy and queries by taking an intermediate taxonomy as a bridge.
Bridging Classifier (cont.)
$ How to connect? Relate a target category C_i^T to the query q through each intermediate category C_j^I:

p(C_i^T | q) ∝ Σ_j p(C_i^T | C_j^I) · p(q | C_j^I) · p(C_j^I)

where p(C_j^I) is the prior probability of C_j^I, p(C_i^T | C_j^I) captures the relation between C_j^I and C_i^T, and p(q | C_j^I) captures the relation between q and C_j^I.
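In matrix form this combination is a single product; a sketch (the array names are ours):

```python
import numpy as np

def bridge_scores(p_t_given_i, p_q_given_i, prior_i):
    """Score each target category by summing over intermediate categories.
    p_t_given_i: (n_target x n_intermediate) array of p(C_i^T | C_j^I)
    p_q_given_i: (n_intermediate,) array of p(q | C_j^I)
    prior_i:     (n_intermediate,) array of p(C_j^I)"""
    return p_t_given_i @ (p_q_given_i * prior_i)
```

Because the query only ever touches the intermediate taxonomy, the target taxonomy can change freely without retraining anything that depends on queries.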
Category Selection for the Intermediate Taxonomy
– Category selection for reducing complexity:
• Total Probability (TP)
• Mutual Information (MI)
Results of the Bridging Classifier
– Using the bridging classifier allows the target classes to change freely: there is no need to retrain the classifier!
$ Performance of the bridging classifier with different granularities of the intermediate taxonomy.
Cross-Domain Activity Recognition [Zheng, Hu, Yang, Ubicomp 2009]
• Challenge: a new domain of activities without labeled data.
• Cross-domain activity recognition: transfer some available labeled data from source activities to help train the recognizer for the target activities.
Example activity domains: Indoor Cleaning, Laundry, Dishwashing.
How to use the similarities?
[Pipeline: source-domain labeled data <SensorReading, ActivityName> (e.g., <SS, "Make Coffee">) are combined with a similarity measure mined from the Web (e.g., sim("Make Coffee", "Make Tea") = 0.6) to produce target-domain pseudo-labeled data such as <SS, "Make Tea", 0.6>, which then train a weighted SVM classifier.]
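A sketch of the final training step, assuming pseudo-labeled target features X_pseudo, pseudo-labels y_pseudo, and per-instance similarity weights sims (the names are ours):

```python
from sklearn.svm import SVC

def train_weighted_svm(X_pseudo, y_pseudo, sims):
    """Each pseudo-labeled instance is weighted by the mined similarity
    between its source activity and the assigned target activity."""
    clf = SVC(kernel="rbf")
    clf.fit(X_pseudo, y_pseudo, sample_weight=sims)
    return clf
```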
Calculating Activity Similarities
! How similar are two activities?
◦ Use Web search results
◦ TF-IDF: traditional IR similarity metrics (cosine similarity)
◦ Example: mined similarity between the activity "sweeping" and "vacuuming", "making the bed", "gardening"
[Figure: calculated similarity with the activity "Sweeping".]
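A sketch of this similarity, with each activity represented by the concatenated text of its Web search results (the argument names are ours):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def activity_similarity(snippets_a, snippets_b):
    """snippets_a / snippets_b: lists of search-result texts for the
    two activity names. Returns their TF-IDF cosine similarity."""
    docs = [" ".join(snippets_a), " ".join(snippets_b)]
    tfidf = TfidfVectorizer().fit_transform(docs)
    return cosine_similarity(tfidf[0], tfidf[1])[0, 0]
```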
Cross-DomainAR:PerformanceMean Accuracy with Cross Domain Transfer
# Activities (Source Domain)
# Activities (Target Domain)
Baseline (Random Guess)
MIT Dataset (Cleaning to Laundry)
58.9% 13 8 12.5%
MIT Dataset (Cleaning to Dishwashing)
53.2% 13 7 14.3%
Intel Research Lab Dataset
63.2% 5 6 16.7%
95
! Ac7vi7esinthesourcedomainandthetargetdomainaregeneratedfromtenrandomtrials,meanaccuraciesarereported.
Transferring Knowledge from Social to Physical
! Ubiquitous physical sensors motivate extensive research on ubiquitous computing. Which activity is this person performing?
Transferring from Social to Physical
Example social posts: "I am on a business trip in New York. The Metropolitan Museum of Art is fantastic!"; "Brilliant night at Chilli. Food, wine, hospitality all excellent. Bristol's top restaurant."; "Back in the #gym after 3.5 weeks :) feeling good #exercise"
Transfer from Social to Physical
Cellphone sensor dataset:
! 232 sensor records
! 10 volunteers
! time, GPS, tri-axial accelerometer, location POI info
Sina Weibo:
! 10,791 tweets
! [Figure: distribution of labels; distribution of the top 9 labels.]
Transfer from Social to Physical
! Results:
A naive combination of sensor and social features performs better than sensor features only (Combined vs. Sensor), which validates the necessity of instilling social knowledge into physical sensor data.
Heterogeneous transfer learning methods show improvement over Combined: employing social messages to enrich the feature representation of sensor readings in a latent space is more effective than naive combination.
! Our method can use only 50% of the labelled data of other methods to achieve the same performance.
Transfer Learning in Collaborative Filtering
• Source (dense): encode cluster-level rating patterns
• Target (sparse): map users/items to the encoded prototypes
[Figure: the dense MOVIES rating matrix (auxiliary) is permuted (rows and columns) and reduced to groups to obtain a cluster-level rating pattern; the sparse BOOKS rating matrix (target) is then matched against this pattern so that its missing ratings can be filled in.]
Source-Free Transfer Learning
Evan Wei Xiang, Sinno Jialin Pan, Weike Pan, Jian Su and Qiang Yang. Source-Selection-Free Transfer Learning. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI-11), Barcelona, Spain, July 2011.
Transfer Learning
[Figure: a lack of labeled training data always happens; supervised learning needs enough labels, while transfer learning applies when we have some related source domains.]
SFTL: Building Base Models
[Figure: many pairwise "A vs. B" base classifiers.]
From the taxonomy of the online information source, we can "compile" a lot of base classification models.
Source-Free Transfer Learning
For each target instance, we can obtain a combined result on the label space by aggregating the predictions from all the base classifiers.
However, do we need to call the base classifiers during the prediction phase? The answer is No!
We can then use the projection matrix V (q × m) to transform such combined results from the label space to a latent space.
[Figure: a target instance's probability distribution over the label space is projected by V into the latent space.]
Compilation: learning a projection matrix W (d × m) to map a target instance to the latent space.
Our regression model has a loss on labeled data and a loss on unlabeled data: for each target instance, we first aggregate its predictions in the base label space, and then project it onto the latent space.
[Figure: target-domain labeled and unlabeled data; the projection matrix V (q × m) and the learned projection matrix W (d × m).]
SFTL: Predictions for the Incoming Test Data
With the parameter matrix W, we can make predictions on any incoming test data based on the distance to the label prototypes, without calling the base classification models. No need to use the base models explicitly!
[Figure: incoming target-domain test data are mapped directly by the learned projection matrix W (d × m).]
Transitive Transfer Learning with Intermediate Domains
Qiang Yang
Hong Kong University of Science and Technology
http://www.cse.ust.hk/~qyang
Problem Definition
! Given distant source and target domains, and a set of intermediate domains, can we find one or more intermediate domains to enable transfer learning between source and target?
[Figure: source and target are not directly transferrable; intermediate domain 1 shares common factor 1 with both.]
Previous Work and TTL
% Traditional machine learning (ML): training and test data should be from the same problem domain.
% Transfer learning (TL): training and test data should be from similar problem domains.
% Transitive transfer learning (TTL): training and test data could be from distant problem domains.
Text-to-Image Classification
Source and target domains have few overlaps. Examples:
• Text-to-image classification with co-occurrence data as the intermediate domain
• Accelerometer-to-gyroscope activity recognition with data from intelligent devices as intermediate domains
TTL: Single Intermediate Domain
Intermediate domain selection, then knowledge propagation:
! Crawl a lot of images with annotations from the Internet
! Use a domain distance, such as the A-distance, to identify the intermediate domain
! Transitive transfer through shared hidden factors via matrix tri-factorization
Matrix tri-factorization is used for clustering/classification.
Experiments: NUS-WIDE Dataset
! The NUS-WIDE dataset is used:
! 45 text-to-image tasks
! Each task is composed of 1,200 text documents, 600 images, and 1,600 co-occurred text-image pairs.
Supervised Learning with Auto-encoder
[Figure: labeled source domain → (shared) feature engineering → predictive model learning → text classification.]
Designing the Objective Function of TTL
Transitive transfer learning with intermediate data: intermediate domain weighting/selection. The weights for the intermediate domains are learned from data; the intermediate data help find a better hidden layer.
[Figure: feature engineering feeding predictive model learning.]
TTL with Supervised Auto-encoder
[Figure: source, intermediates, and target share a feature-engineering layer that feeds predictive model learning.]
! The NUS-WIDE data: 45 text-to-image tasks
! Each task is composed of 1,200 text documents, 600 images, and 1,600 co-occurred text-image pairs. In each task, 1,600 × 45 co-occurred text-image pairs are used for knowledge transfer.
TTL with Supervised Auto-encoder
[Figure: results on text-to-image classification with intermediate data.]
Reinforcement Transfer Learning via Sparse Coding
• Slow learning speed remains a fundamental problem for reinforcement learning in complex environments.
• Main problem: the numbers of states and actions in the source and target domains are different.
– Existing works: hand-coded inter-task mapping between state-action pairs.
• Tool: new transfer learning based on sparse coding.
Ammar, Tuyls, Taylor, Driessens, Weiss: Reinforcement Learning Transfer via Sparse Coding. AAMAS, 2012.
Reinforcement Learning Transfer via Sparse Coding
The authors measured performance as the number of steps per episode for which the pole is kept in an upright position, given a fixed amount of samples.
• Given state-action-state triplets in the source task, learn a dictionary (with sparse coefficients) over them.
• Using the coefficient matrix from the first step, learn the dictionary in the target task.
• Then, for each triplet in the target task, sparse projection is used to find its coefficients.
• As a result, the inter-task mapping can be learned! (A rough sketch follows.)
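A rough sketch of these steps with generic sparse-coding tools; it assumes index-aligned source and target triplet matrices and is only an illustration, not the exact procedure of Ammar et al.:

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

def learn_intertask_mapping(T_src, T_tgt, n_atoms=32):
    """T_src: (n x d_src) source state-action-state triplets;
    T_tgt: (n x d_tgt) corresponding target triplets (assumed aligned)."""
    # Step 1: dictionary + sparse codes over the source triplets
    dl = DictionaryLearning(n_components=n_atoms).fit(T_src)
    codes = dl.transform(T_src)                      # coefficient matrix
    # Step 2: target dictionary that reproduces the target triplets
    # from the same coefficients (least squares)
    D_tgt, *_ = np.linalg.lstsq(codes, T_tgt, rcond=None)
    # Encoding a source triplet with dl and decoding with D_tgt now
    # maps it into the target task: the learned inter-task mapping.
    return dl.components_, D_tgt
```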
Reinforcement Transfer Learning via Sparse Coding
Reference
! [Thorndike and Woodworth, The Influence of Improvement in One Mental Function upon the Efficiency of Other Functions, 1901]
! [Taylor and Stone, Transfer Learning for Reinforcement Learning Domains: A Survey, JMLR 2009]
! [Pan and Yang, A Survey on Transfer Learning, IEEE TKDE 2010]
! [Quiñonero-Candela et al., Dataset Shift in Machine Learning, MIT Press 2009]
! [Blitzer et al., Domain Adaptation with Structural Correspondence Learning, EMNLP 2006]
! [Pan et al., Cross-Domain Sentiment Classification via Spectral Feature Alignment, WWW 2010]
! [Pan et al., Transfer Learning via Dimensionality Reduction, AAAI 2008]
Reference (cont.)
! [Pan et al., Domain Adaptation via Transfer Component Analysis, IJCAI 2009]
! [Evgeniou and Pontil, Regularized Multi-Task Learning, KDD 2004]
! [Zhang and Yeung, A Convex Formulation for Learning Task Relationships in Multi-Task Learning, UAI 2010]
! [Agarwal et al., Learning Multiple Tasks using Manifold Regularization, NIPS 2010]
! [Argyriou et al., Multi-Task Feature Learning, NIPS 2007]
! [Ando and Zhang, A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data, JMLR 2005]
! [Ji et al., Extracting Shared Subspace for Multi-label Classification, KDD 2008]
Reference (cont.)
! [Raina et al., Self-taught Learning: Transfer Learning from Unlabeled Data, ICML 2007]
! [Dai et al., Boosting for Transfer Learning, ICML 2007]
! [Glorot et al., Domain Adaptation for Large-Scale Sentiment Classification: A Deep Learning Approach, ICML 2011]
! [Davis and Domingos, Deep Transfer via Second-order Markov Logic, ICML 2009]
! [Mihalkova et al., Mapping and Revising Markov Logic Networks for Transfer Learning, AAAI 2007]
! [Li et al., Cross-Domain Co-Extraction of Sentiment and Topic Lexicons, ACL 2012]
Reference (cont.)
! [Sugiyama et al., Direct Importance Estimation with Model Selection and Its Application to Covariate Shift Adaptation, NIPS 2007]
! [Kanamori et al., A Least-squares Approach to Direct Importance Estimation, JMLR 2009]
! [Cristianini et al., On Kernel Target Alignment, NIPS 2002]
! [Huang et al., Correcting Sample Selection Bias by Unlabeled Data, NIPS 2006]
! [Zadrozny, Learning and Evaluating Classifiers under Sample Selection Bias, ICML 2004]
Transfer Learning in Convolutional Neural Networks
• Convolutional neural networks (CNNs): outstanding image classification performance.
• Learning CNNs requires a very large number of annotated image samples:
– millions of parameters, so many that this prevents the application of CNNs to problems with limited training data.
• Key idea:
– the internal layers of the CNN can act as a generic extractor of mid-level image representations
– model-based transfer learning