Heterogeneous Domain Adaptation using Manifold Alignment
Chang Wang, Sridhar Mahadevan
Layout
Problem Introduction
Problem Definition
Who cares
Previous work & challenges
Contribution
A glance at methods
Problem Introduction
Example problem
Input: three collections of documents in
  English (sufficient labels)
  Italian (sufficient labels)
  Arabic (few labels)
Target: assign labels to the Arabic documents.
One approach: find a common feature space for the 3 domains.
  Shared labels (sports, military)
  No shared documents (no instance correspondence)
  No word translations are available
Problem Introduction
Shared label set: {sports, military}. No corresponding instances or words.
Question: Can we construct a common feature space so that we can use English docs and Italian docs to help classify Arabic docs?

English docs
doc  word1  word2  …  label
1    0      2         sports
2    2      0         military
3    1      0         military
4    0      1         sports

Italian docs
doc  parola1  parola2  …  etichetta
1    2        0           sports
2    0        2           military
3    0        1           military
4    1        0           sports

Arabic docs
doc  كلمة1  كلمة2  …  ملصق
1    0      2         sports
2    2      0         military
3    1      0         ?
4    0      1         ?
Problem Introduction
Common feature space
doc  feature1  feature2  …  label
1    0         2            sports
2    2         0            military
3    1         0            military
4    0         1            sports
5    0         2            sports
6    2         0            military
7    1         0            military
8    0         1            sports
9    0         2            sports
10   2         0            military
11   1         0            ?
12   0         1            ?
Problem Introduction
Given K input datasets from different domains, with different features but the same label set:
Source domains have sufficient labeled instances.
The target domain has few labeled instances.
Question: Can we construct a common feature space, so that all instances in the different domains can be mapped to the same feature space and the learning task can be performed there?
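Problem Definition
[Figure: Source 1 ($m_1 \times p_1$), …, Source k ($m_k \times p_k$) and Target ($m_t \times p_t$) are mapped into a common feature space of size $\left(\sum_{i=1}^{k} m_i\right) \times d$, which is then used for learning.]
$m_i$: # instances (domain i)
$p_i$: # features (domain i)
$d$: dimension of common feature space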
Problem Definition
Input: K datasets from different domains
  $X_k$: dataset k
  $x_k^i$: instance i in dataset k, defined by $p_k$ features
Goal: construct a $d$-dimensional common feature space for learning
Output: K mapping functions $f_1, \dots, f_K$, where $f_k$ is a $p_k \times d$ matrix
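As a minimal sketch of the shapes involved, assuming linear mappings and one instance per column in each dataset, every domain gets its own matrix that projects its instances into the same low-dimensional space. The feature counts, the random data, and the names `datasets`, `mappings` and `to_common_space` are illustrative assumptions; the matrices here are random placeholders, not learned mappings.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2  # dimension of the common feature space (illustrative choice)

# Toy datasets, one instance per column: shape (num_features, num_instances).
# Each domain has a different number of features.
datasets = {
    "english": rng.random((5, 4)),
    "italian": rng.random((7, 4)),
    "arabic": rng.random((3, 4)),
}

# One mapping matrix per domain, shape (num_features, d).
# Random placeholders standing in for the learned mappings.
mappings = {
    name: rng.standard_normal((X.shape[0], d)) for name, X in datasets.items()
}

def to_common_space(X, f):
    """Map the columns of X into the common space; returns (num_instances, d)."""
    return X.T @ f

# Stack all mapped instances into one table with d columns.
common = np.vstack(
    [to_common_space(datasets[name], mappings[name]) for name in datasets]
)
print(common.shape)
```

After the mapping, instances from all domains live in one table with `d` columns, so a single classifier can be trained on the labeled rows and applied to the unlabeled target rows.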
Who may benefit?
Search engines: classify docs, rank docs, find doc topics
Businesspeople: customer clustering
Biologists: match proteins
Challenges
The target domain has few labels
No instance correspondence
Source and target domains have different feature spaces
Previous work
Most work assumes that the source domain and the target domain have the same features.
Manifold regularization: does not leverage source domain information.
Transfer learning based on manifold alignment: uses both labeled and unlabeled instances to learn the mapping, but requires a small amount of instance correspondence.
Contribution
Transfer learning perspective
  Can work on different feature spaces
  Copes with multiple input domains
  Can be combined with existing domain adaptation methods
Manifold alignment perspective
  Needs no instance correspondence
  Uses labels to learn the alignment
A glance at methods
Find a set of mapping functions (one matrix per domain)
3 criteria:
  Instances from the same class (across domains) are mapped to similar locations
  Instances from different classes (across domains) are mapped to separate locations
  The topology of each original domain is preserved
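The first two criteria only need the labels, so they can be encoded as two pair-indicator matrices over all instances stacked across domains. The sketch below assumes the toy labels from the English, Italian and Arabic tables, uses `None` for unlabeled instances, and counts every labeled pair regardless of domain; the names `W_same` and `W_diff` are illustrative.

```python
import numpy as np

# Labels of all instances stacked across domains; None marks an unlabeled one.
labels = [
    "sports", "military", "military", "sports",  # English docs
    "sports", "military", "military", "sports",  # Italian docs
    "sports", "military", None, None,            # Arabic docs
]

n = len(labels)
W_same = np.zeros((n, n))  # 1 where two labeled instances share a label
W_diff = np.zeros((n, n))  # 1 where two labeled instances have different labels

for i in range(n):
    for j in range(n):
        # Skip self-pairs and any pair involving an unlabeled instance.
        if i == j or labels[i] is None or labels[j] is None:
            continue
        if labels[i] == labels[j]:
            W_same[i, j] = 1.0
        else:
            W_diff[i, j] = 1.0
```

Unlabeled instances get all-zero rows in both matrices, so they can only be placed through the topology-preservation criterion.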
A glance at methods
Common feature space
doc  feature1  feature2  …  label
1    0         2            sports
2    2         0            military
3    1         0            military
4    0         1            sports
5    0         2            sports
6    2         0            military
7    1         0            military
8    0         1            sports
9    0         2            sports
10   2         0            military
11   1         0            ?
12   0         1            ?
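Minimise distance(1,5) = 0 (same label, different domains)
Maximise distance(1,6) = $2\sqrt{2}$ (different labels, different domains)
Minimise distance(10,11) = 1 (neighbours within the Arabic domain)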
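The three distances can be checked directly on the toy common feature space. This sketch assumes Euclidean distance and uses only the two feature columns shown in the table (the elided columns are ignored); the helper name `distance` mirrors the slide's notation and uses the same 1-based row numbers.

```python
import numpy as np

# Rows 1-12 of the toy common feature space (feature1, feature2 only).
common = np.array(
    [
        [0, 2], [2, 0], [1, 0], [0, 1],  # rows 1-4
        [0, 2], [2, 0], [1, 0], [0, 1],  # rows 5-8
        [0, 2], [2, 0], [1, 0], [0, 1],  # rows 9-12
    ],
    dtype=float,
)

def distance(i, j):
    """Euclidean distance between rows i and j (1-based, as in the table)."""
    return float(np.linalg.norm(common[i - 1] - common[j - 1]))

d_same = distance(1, 5)     # same label, different domains
d_diff = distance(1, 6)     # different labels, different domains
d_neigh = distance(10, 11)  # two rows from the same original domain

print(d_same, d_diff, d_neigh)
```

Under these assumptions the same-label pair coincides, the different-label pair is about 2.83 apart, and the within-domain pair stays at distance 1.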
A glance at methods
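Encode the 3 criteria in a single cost function
Minimize:
  the distance between the mapped instances, for any pair with the same label
  a penalty that shrinks as the distance between the mapped instances grows, for any pair with different labels
  the distance between the mapped instances * similarity(·, ·), for any pair in one original domain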
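One way to turn the three kinds of pairs above into a single number is sketched below. It assumes squared Euclidean distances between mapped instances and a ratio form, (same-label term + mu * topology term) / different-label term, so that minimizing it pulls same-label pairs together, pushes different-label pairs apart, and keeps similar within-domain pairs close. The ratio form, the weight `mu`, and all names are assumptions made for illustration and are not claimed to be the exact expression on the slide.

```python
import numpy as np

def pairwise_sq_dists(Y):
    """Squared Euclidean distances between all rows of Y, shape (n, n)."""
    delta = Y[:, None, :] - Y[None, :, :]
    return (delta ** 2).sum(axis=-1)

def alignment_cost(Y, W_same, W_diff, W_topo, mu=1.0):
    """Illustrative cost: small when all three criteria are satisfied."""
    D = pairwise_sq_dists(Y)
    A = 0.5 * (D * W_same).sum()  # same-label pairs should be close
    B = 0.5 * (D * W_diff).sum()  # different-label pairs should be far apart
    C = 0.5 * (D * W_topo).sum()  # similar within-domain pairs should stay close
    return (A + mu * C) / B

# Toy check with 4 mapped instances and labels [0, 0, 1, 1].
labels = np.array([0, 0, 1, 1])
same = (labels[:, None] == labels[None, :]).astype(float)
np.fill_diagonal(same, 0.0)
different = (labels[:, None] != labels[None, :]).astype(float)

# Hypothetical within-domain similarity: instances 0 and 1 are similar.
topo = np.zeros((4, 4))
topo[0, 1] = topo[1, 0] = 0.5

# Good mapping: same-label instances coincide, classes are separated.
Y_good = np.array([[0.0, 2.0], [0.0, 2.0], [2.0, 0.0], [2.0, 0.0]])
# Bad mapping: same-label instances are far apart, classes overlap.
Y_bad = np.array([[0.0, 2.0], [2.0, 0.0], [0.0, 2.0], [2.0, 0.0]])

cost_good = alignment_cost(Y_good, same, different, topo)
cost_bad = alignment_cost(Y_bad, same, different, topo)
print(cost_good, cost_bad)
```

The good mapping scores lower than the bad one, which is the behaviour the three criteria ask for; learning the mappings then amounts to searching for the matrices that minimize such a cost.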