Dual Transfer Learning
Mingsheng Long^{1,2}, Jianmin Wang^2, Guiguang Ding^2, Wei Cheng, Xiang Zhang, Wei Wang
^1 Department of Computer Science and Technology
^2 School of Software, Tsinghua University, Beijing 100084, China
Outline
- Motivation
- The Framework: Dual Transfer Learning
- An Implementation: Joint Nonnegative Matrix Tri-Factorization
- Experiments
- Conclusion
Notations
- Domain: a feature space $\mathcal{X}$ with a marginal distribution $P(\mathbf{x})$, i.e. $\mathcal{D} = \{\mathcal{X}, P(\mathbf{x})\}$, where $\mathbf{x} \in \mathcal{X}$.
  Two domains are different if $\mathcal{X}_s \ne \mathcal{X}_t$ or $P_s(\mathbf{x}) \ne P_t(\mathbf{x})$.
- Task: given a feature space $\mathcal{X}$ and a label space $\mathcal{Y}$, learn $f: \mathbf{x} \mapsto y$ or estimate $P(y|\mathbf{x})$, where $\mathbf{x} \in \mathcal{X}$, $y \in \mathcal{Y}$.
  Two tasks are different if $\mathcal{Y}_s \ne \mathcal{Y}_t$ or $f_s \ne f_t$, i.e. $P_s(y|\mathbf{x}) \ne P_t(y|\mathbf{x})$.
Motivation
Exploring the marginal distributions $P_s(\mathbf{x})$ and $P_t(\mathbf{x})$
[Figure: documents from the source domain (comp.os) and the target domain (comp.hardware) expressed over latent factors such as task scheduling, performance, architecture, and power consumption. Domain-specific latent factors cause the discrepancy between domains; shared latent factors represent the commonality between domains.]
Motivation
Exploring the conditional distributions $P_s(y|\mathbf{x})$ and $P_t(y|\mathbf{x})$
[Figure: in both the source (comp.os) and target (comp.hardware) domains, the latent factors task scheduling, performance, architecture, and power consumption all associate with the class "comp". These shared model parameters represent the commonality between tasks.]
The Framework: Dual Transfer Learning (DTL)
- Simultaneously learn the marginal distribution $P(\mathbf{x})$ and the conditional distribution $P(y|\mathbf{x})$:
  - Marginal mapping: learning the marginal distribution
  - Conditional mapping: learning the conditional distribution
- Exploit the duality between the two for mutual reinforcement: learning one distribution helps to learn the other.
[Figure: source data $\mathbf{X}_s$ and target data $\mathbf{X}_t$ each pass through a domain-specific marginal mapping (distinct view) and a shared marginal mapping (common view), followed by a shared conditional mapping.]
Nonnegative Matrix Tri-Factorization (NMTF)

$$\min_{\mathbf{U},\mathbf{H},\mathbf{V} \ge \mathbf{0}} L = \big\|\mathbf{X} - \mathbf{U}\mathbf{H}\mathbf{V}^{\mathrm{T}}\big\|^2$$

- $\mathbf{U} \in \mathbb{R}^{m \times k}$: $k$ feature clusters; these latent factors induce the marginal mapping
- $\mathbf{H} \in \mathbb{R}^{k \times c}$: association between the $k$ feature clusters and the $c$ example classes; these model parameters induce the conditional mapping
- $\mathbf{V} \in \mathbb{R}^{n \times c}$: $c$ example classes, representing the categorical information

The tri-factorization decouples into two mappings:
- Marginal mapping $\varphi: \mathbb{R}^m \to \mathbb{R}^k$, from $\min_{\mathbf{U} \ge \mathbf{0}} L' = \|\mathbf{X} - \mathbf{U}\mathbf{X}'\|^2$, i.e. $\mathbf{X} \approx \mathbf{U}\mathbf{X}'$, taking $P(\mathbf{x})$ to $P(\mathbf{x}')$
- Conditional mapping $\psi: \mathbb{R}^k \to \mathbb{R}^c$, from $\min_{\mathbf{H},\mathbf{V} \ge \mathbf{0}} L' = \|\mathbf{X}' - \mathbf{H}\mathbf{V}^{\mathrm{T}}\|^2$, taking $\mathbf{x}' \mapsto \mathbf{v}$, i.e. $P(y|\mathbf{x})$ to $P(y|\mathbf{x}')$
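As an illustration, the plain NMTF objective above can be minimized with standard multiplicative updates. The following is a minimal numpy sketch (the function name and hyperparameters are mine, not from the paper); it is the unconstrained building block, not the full Joint NMTF algorithm:

```python
import numpy as np

def nmtf(X, k, c, n_iter=200, eps=1e-9, seed=0):
    """Plain NMTF: min ||X - U H V^T||^2 with U, H, V >= 0.

    X: (m, n) nonnegative data matrix
    U: (m, k) feature clusters, H: (k, c) cluster-class association,
    V: (n, c) example-class indicators.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, k))
    H = rng.random((k, c))
    V = rng.random((n, c))
    for _ in range(n_iter):
        # Multiplicative updates: scale each factor by the ratio of the
        # negative to the positive part of its gradient.
        U *= (X @ V @ H.T) / (U @ (H @ V.T @ V @ H.T) + eps)
        H *= (U.T @ X @ V) / (U.T @ U @ H @ (V.T @ V) + eps)
        V *= (X.T @ U @ H) / (V @ (H.T @ U.T @ U @ H) + eps)
    return U, H, V
```

Because each update multiplies by a nonnegative ratio, the factors stay nonnegative by construction.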
An Implementation: Joint NMTF
Marginal mapping: learning the marginal distribution

$$\min_{\mathbf{U}_\pi,\mathbf{U},\mathbf{H},\mathbf{V}_\pi \ge \mathbf{0}} L = \sum_{\pi \in \{s,t\}} \big\|\mathbf{X}_\pi - [\mathbf{U}_\pi, \mathbf{U}]\,\mathbf{H}\mathbf{V}_\pi^{\mathrm{T}}\big\|^2$$

The domain-specific feature clusters $\mathbf{U}_\pi$ capture the latent factors that cause the discrepancy between domains; the shared feature clusters $\mathbf{U}$ represent the commonality between domains.
An Implementation: Joint NMTF
Conditional mapping: learning the conditional distribution
The model parameters $\mathbf{H}$, shared by both domains, associate the latent factors (task scheduling, performance, architecture, power consumption) with the class "comp"; they represent the commonality between tasks.
An Implementation: Joint NMTF
Solution to the Joint NMTF optimization problem

$$\min_{\mathbf{U}_\pi,\mathbf{U},\mathbf{H},\mathbf{V}_\pi \ge \mathbf{0}} L = \sum_{\pi \in \{s,t\}} \big\|\mathbf{X}_\pi - [\mathbf{U}_\pi, \mathbf{U}]\,\mathbf{H}\mathbf{V}_\pi^{\mathrm{T}}\big\|^2 \quad \text{s.t. } [\mathbf{U}_\pi, \mathbf{U}]^{\mathrm{T}}\mathbf{1}_m = \mathbf{1},\; \mathbf{V}_\pi^{\mathrm{T}}\mathbf{1}_{n_\pi} = \mathbf{1},\; \pi \in \{s,t\}$$

Multiplicative update rules, writing $\tilde{\mathbf{U}}_\pi = [\mathbf{U}_\pi, \mathbf{U}]$ with $\circ$ and the fraction bar element-wise (the $\mathbf{U}_\pi$ update takes the domain-specific column block of the ratio; the $\mathbf{U}$ update takes the common column block, summed over both domains):

$$\mathbf{U}_\pi \leftarrow \mathbf{U}_\pi \circ \frac{\mathbf{X}_\pi \mathbf{V}_\pi \mathbf{H}^{\mathrm{T}}}{\tilde{\mathbf{U}}_\pi \mathbf{H}\mathbf{V}_\pi^{\mathrm{T}}\mathbf{V}_\pi \mathbf{H}^{\mathrm{T}}}, \qquad \mathbf{U} \leftarrow \mathbf{U} \circ \frac{\sum_\pi \mathbf{X}_\pi \mathbf{V}_\pi \mathbf{H}^{\mathrm{T}}}{\sum_\pi \tilde{\mathbf{U}}_\pi \mathbf{H}\mathbf{V}_\pi^{\mathrm{T}}\mathbf{V}_\pi \mathbf{H}^{\mathrm{T}}}$$

$$\mathbf{H} \leftarrow \mathbf{H} \circ \frac{\sum_\pi \tilde{\mathbf{U}}_\pi^{\mathrm{T}} \mathbf{X}_\pi \mathbf{V}_\pi}{\sum_\pi \tilde{\mathbf{U}}_\pi^{\mathrm{T}} \tilde{\mathbf{U}}_\pi \mathbf{H}\mathbf{V}_\pi^{\mathrm{T}}\mathbf{V}_\pi}, \qquad \mathbf{V}_\pi \leftarrow \mathbf{V}_\pi \circ \frac{\mathbf{X}_\pi^{\mathrm{T}} \tilde{\mathbf{U}}_\pi \mathbf{H}}{\mathbf{V}_\pi \mathbf{H}^{\mathrm{T}} \tilde{\mathbf{U}}_\pi^{\mathrm{T}} \tilde{\mathbf{U}}_\pi \mathbf{H}}$$
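The update rules above can be sketched in numpy. This is a hypothetical minimal implementation of the unsupervised core only: it drops the normalization constraints and any label supervision on $\mathbf{V}_s$, and assumes equal numbers of domain-specific and common feature clusters, so treat it as a sketch of the technique rather than the authors' exact algorithm:

```python
import numpy as np

def joint_nmtf(Xs, Xt, k, c, n_iter=300, eps=1e-9, seed=0):
    """Joint NMTF: min sum_pi ||X_pi - [U_pi, U] H V_pi^T||^2.

    U (common feature clusters) and H (model parameters) are shared
    across domains; U_pi and V_pi are domain-specific.
    """
    rng = np.random.default_rng(seed)
    m = Xs.shape[0]
    assert Xt.shape[0] == m, "domains must share the feature space"
    Us, Ut, U = (rng.random((m, k)) for _ in range(3))
    H = rng.random((2 * k, c))
    Vs, Vt = rng.random((Xs.shape[1], c)), rng.random((Xt.shape[1], c))
    for _ in range(n_iter):
        Ws, Wt = np.hstack([Us, U]), np.hstack([Ut, U])
        # Ratios of negative to positive gradient parts w.r.t. [U_pi, U].
        num_s, den_s = Xs @ Vs @ H.T, Ws @ H @ (Vs.T @ Vs) @ H.T + eps
        num_t, den_t = Xt @ Vt @ H.T, Wt @ H @ (Vt.T @ Vt) @ H.T + eps
        Us *= num_s[:, :k] / den_s[:, :k]   # source-specific column block
        Ut *= num_t[:, :k] / den_t[:, :k]   # target-specific column block
        U *= (num_s[:, k:] + num_t[:, k:]) / (den_s[:, k:] + den_t[:, k:])
        Ws, Wt = np.hstack([Us, U]), np.hstack([Ut, U])
        H *= (Ws.T @ Xs @ Vs + Wt.T @ Xt @ Vt) / (
            Ws.T @ Ws @ H @ (Vs.T @ Vs) + Wt.T @ Wt @ H @ (Vt.T @ Vt) + eps)
        Vs *= (Xs.T @ Ws @ H) / (Vs @ H.T @ (Ws.T @ Ws) @ H + eps)
        Vt *= (Xt.T @ Wt @ H) / (Vt @ H.T @ (Wt.T @ Wt) @ H + eps)
    return Us, Ut, U, H, Vs, Vt
```

In the full algorithm the source factor $\mathbf{V}_s$ would be tied to the known source labels and the normalization constraints enforced; a row-wise argmax over $\mathbf{V}_t$ then yields target predictions.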
Joint NMTF: Theoretical Analysis
Derivation
- Formulate a Lagrange function for the optimization problem: the objective $\sum_{\pi} \|\mathbf{X}_\pi - [\mathbf{U}_\pi,\mathbf{U}]\mathbf{H}\mathbf{V}_\pi^{\mathrm{T}}\|^2$ plus trace terms with multipliers $\mathbf{\Gamma}$ for the normalization constraints on $[\mathbf{U}_\pi, \mathbf{U}]$ and $\mathbf{\Lambda}$ for those on $\mathbf{V}_\pi$
- Use the KKT conditions ($\partial\mathcal{L}/\partial\mathbf{U} = \mathbf{0}$, $\partial\mathcal{L}/\partial\mathbf{V} = \mathbf{0}$, etc.) to obtain the multiplicative update rules
Convergence
- Proved by the auxiliary function approach [Ding et al., KDD'06]
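For intuition, the KKT step can be traced in the simpler setting without normalization constraints (shown for the factor $\mathbf{U}$ of plain NMTF; a sketch of the standard argument, not the paper's full derivation):

```latex
% min_{U >= 0} L = ||X - U H V^T||^2, multiplier Gamma for U >= 0:
%   Lagrangian  L_g = ||X - U H V^T||^2 - tr(Gamma^T U)
\begin{align*}
\frac{\partial \mathcal{L}_g}{\partial \mathbf{U}}
  &= -2\,\mathbf{X}\mathbf{V}\mathbf{H}^{\mathrm{T}}
     + 2\,\mathbf{U}\mathbf{H}\mathbf{V}^{\mathrm{T}}\mathbf{V}\mathbf{H}^{\mathrm{T}}
     - \mathbf{\Gamma} = \mathbf{0},
  \qquad \Gamma_{ij}\,U_{ij} = 0 \;\;\text{(complementary slackness)} \\
\Rightarrow\;\;
  &\big[-\mathbf{X}\mathbf{V}\mathbf{H}^{\mathrm{T}}
   + \mathbf{U}\mathbf{H}\mathbf{V}^{\mathrm{T}}\mathbf{V}\mathbf{H}^{\mathrm{T}}\big]_{ij}\,U_{ij} = 0 \\
\Rightarrow\;\;
  &U_{ij} \leftarrow U_{ij}\,
   \frac{\big[\mathbf{X}\mathbf{V}\mathbf{H}^{\mathrm{T}}\big]_{ij}}
        {\big[\mathbf{U}\mathbf{H}\mathbf{V}^{\mathrm{T}}\mathbf{V}\mathbf{H}^{\mathrm{T}}\big]_{ij}}
\end{align*}
```

Any fixed point of this multiplicative update satisfies the KKT stationarity and complementary-slackness conditions above.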
Experiments
Data sets
- Open data sets: 20-Newsgroups and Reuters-21578
- Each cross-domain data set: approximately 8,000 documents and 15,000 features
Evaluation criterion

$$\text{Accuracy} = \frac{\big|\{\mathbf{x} : \mathbf{x} \in \mathcal{D}_t \wedge f(\mathbf{x}) = y(\mathbf{x})\}\big|}{|\mathcal{D}_t|}$$
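The criterion is straightforward to compute; a minimal helper (the function name is mine):

```python
import numpy as np

def accuracy(y_pred, y_true):
    """Fraction of target examples whose predicted label matches the
    ground truth: |{x in D_t : f(x) = y(x)}| / |D_t|."""
    y_pred, y_true = np.asarray(y_pred), np.asarray(y_true)
    return float((y_pred == y_true).mean())

# e.g. accuracy([1, 0, 1, 1], [1, 1, 1, 0]) -> 0.5
```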
Experiments
- Non-transfer methods: NMF, SVM, LR, TSVM
- Transfer learning methods:
  - Co-Clustering based Classification (CoCC) [Dai et al., KDD'07]
  - Matrix Tri-Factorization based Classification (MTrick) [Zhuang et al., SDM'10]
  - Dual Knowledge Transfer (DKT) [Wang et al., SIGIR'11]
Conclusion
- We proposed a novel Dual Transfer Learning (DTL) framework, exploring the duality between the marginal distribution and the conditional distribution for mutual reinforcement
- We implemented a novel Joint NMTF algorithm based on the DTL framework
- Experimental results validated that DTL is superior to state-of-the-art single transfer learning methods