Network Learning (ネットワーク ラーニング)
AI-driven Connectivist Framework for E-Learning 3.0
Special Coordination Funds for Promoting Science and Technology (振興調整費)
University of Electro-Communications (National University Corporation)
Unique and Exciting Campus
From Collective Intelligence to Connected Intelligence
Neil Rubens, Okamoto/Ueno Laboratory, Graduate School of Information Systems, Center for Frontier Science and Engineering, University of Electro-Communications
Neil Rubens, Active Intelligence Group, Knowledge Systems Laboratory, University of Electro-Communications, Tokyo, Japan
Evolution of eLearning: eLearning 1.0
eLearning uses technology to enhance learning
‣ eLearning 1.0:
‣ reading: content became easily accessible
‣ logging: user’s activities could be logged and analyzed
‣ Learning Theories:
‣ Behaviorism: learning is manifested by a change in behavior, environment shapes behavior, contiguity
‣ Cognitivism: how human memory works to promote learning
---------------------------------------------
Evolution of eLearning: eLearning 2.0
‣ eLearning 2.0:
‣ writing: anybody can easily create content (e.g. blogs, wikis, etc.)
‣ socializing: interaction is easy (e.g. Facebook, Twitter, etc.)
‣ Learning Theories:
‣ Constructivism: constructing one's own knowledge from one's own experiences (enabled through writing)
‣ Social Learning: people learn from one another (enabled through socializing)
---------------------------------------------
Broken Knowledge Cycle
‣ Problem: the current cycle of knowledge creation/utilization is inefficient!
‣ a large portion of created content is never utilized by others*: only 0.05% of Twitter messages attract attention (Wu et al., 2011); only 3% of users look beyond the top 3 search results (Infolosopher, 2011)
‣ large parts of created content are redundant (Drost, 2011)
‣ Peak Social – the point at which we can gain no new advantage from social activity (Siemens 2011)
*there are some personal benefits, e.g. externalization, crystallization, etc.
[Figure: knowledge cycle — existing knowledge is utilized to create new knowledge, which may be redundant or novel; only part of it is utilized]
“There is no data like more data” (Mercer at Arden House, 1985)
[Figure: learning-curve illustration with 500, 2,000, and 8,000 training points (Tan, Steinbach, Kumar; 2004)]
Information Overload: for Computers, not a Problem but an Opportunity
http://www.kieranhealy.org/files/misc/SocCoreCites.jpg
http://wiki.ubc.ca/images/f/ff/SocialWeb.jpg
Social Network
http://datamining.typepad.com/photos/uncategorized/2007/04/08/twitter20070405.png
Messaging Networks
Citation Network
How can we use computers to learn in these settings?
Nova Spivack©
[Figure: the Metaweb graph, plotting social connectivity against information connectivity]
http://novaspivack.typepad.com/nova_spivacks_weblog/metaweb_graph.GIF
Our Focus
Connectivism (Learning Theory)
Connectivism: knowledge is distributed across a network of connections, and therefore learning consists of the ability to construct and traverse these networks (Siemens & Downes, 2008)
Property comparison: Behaviourism / Cognitivism / Constructivism / Humanism / Connectivism

Learning theorists —
  Behaviourism: Thorndike, Pavlov, Watson, Guthrie, Hull, Tolman, Skinner
  Cognitivism: Koffka, Kohler, Lewin, Piaget, Ausubel, Bruner, Gagne
  Constructivism: Piaget, Vygotsky
  Humanism: Maslow, Rogers
  Connectivism: Siemens, Downes

How learning occurs —
  Behaviourism: black box; observable behaviour main focus
  Cognitivism: structured, computational
  Constructivism: social; meaning created by each learner (personal)
  Humanism: reflection on personal experience
  Connectivism: distributed within a network, social, technologically enhanced; recognizing and interpreting patterns

Influencing factors —
  Behaviourism: nature of reward, punishment, stimuli
  Cognitivism: existing schema, previous experiences
  Constructivism: engagement, participation, social, cultural
  Humanism: motivation, experiences, relationships
  Connectivism: diversity of network, strength of ties, context of occurrence

Role of memory —
  Behaviourism: memory is the hardwiring of repeated experiences, where reward and punishment are most influential
  Cognitivism: encoding, storage, retrieval
  Constructivism: prior knowledge remixed to current context
  Humanism: holds changing concept of self
  Connectivism: adaptive patterns, representative of current state, existing in networks

How transfer occurs —
  Behaviourism: stimulus, response
  Cognitivism: duplicating knowledge constructs of “knower”
  Constructivism: socialization
  Humanism: facilitation, openness
  Connectivism: connecting to (adding) nodes and growing the network (social/conceptual/biological)

Types of learning best explained —
  Behaviourism: task-based learning
  Cognitivism: reasoning, clear objectives, problem solving
  Constructivism: social, vague (“ill defined”)
  Humanism: self-directed
  Connectivism: complex learning, rapid changing core, diverse knowledge sources
http://imgs.sfgate.com/c/pictures/2011/12/19/ba-BRIDGE20_SFC0105724887.jpg
Connectivism: Nice Theory
Need: Tools & Frameworks to Make It Practical
Methods

Conceptual Framework
‣ Extraction Layer: documents → nodes
‣ Linking Layer: nodes → links
‣ Aggregation Layer: links → connections
‣ Analysis Layer: connections → network

http://www.progress.com/images/solutions/rbi/rbi-stack2-705w.jpg
http://mafra-toolkit.sourceforge.net/

Use AI to:
§ connect contents
§ connect people
§ connect people & contents
§ connect models
---------------------------------------------
AI
---------------------------------------------
Modules
‣ Concept Extraction (documents → concepts)
‣ Semantic Mapping (context (documents) + concepts → linked concepts)
‣ Knowledge Level Estimation (over the concept maps)
‣ Group Formation (tasks)
‣ Influence Estimation (interaction log)
Module Pipeline (Example)
want to learn about: concept → Search Engine → docs → Concept Extractor → concepts → Semantic Mapping → linked concepts → Search Engine → discussions (Link Layer → Analysis Layer)
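The pipeline above can be sketched with hypothetical stand-ins for each module (the toy corpus, vocabulary, and co-occurrence linking are illustrative only, not the actual implementation):

```python
from collections import defaultdict
from itertools import combinations

def search(concept, corpus):
    """Search Engine stand-in: documents mentioning the seed concept."""
    return [doc for doc in corpus if concept in doc]

def extract_concepts(doc, vocabulary):
    """Concept Extractor stand-in: known concepts appearing in the document."""
    return {c for c in vocabulary if c in doc}

def semantic_map(docs, vocabulary):
    """Semantic Mapping stand-in: link concepts that co-occur in a document."""
    links = defaultdict(int)
    for doc in docs:
        for a, b in combinations(sorted(extract_concepts(doc, vocabulary)), 2):
            links[(a, b)] += 1
    return dict(links)

# Hypothetical tokenized corpus and concept vocabulary.
corpus = [["ai", "learning", "network"], ["ai", "network"], ["cooking"]]
vocab = {"ai", "learning", "network"}
graph = semantic_map(search("ai", corpus), vocab)
```

Each stage only consumes the previous stage's output, mirroring the layer stack: documents in, a linked concept network out.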
Sequence Diagram (Example)
1. user → system: “I want to know about term t_i”; the system responds with semantics, contents, and the social discussion of t_i (u_i: “I think t_i is same as t_j …”; u_j: “no, t_i is more like t_p …”; u_k: “you are both wrong, t_i is …”).
2. user → system: “I want to know about terms t_i and t_k”; the system again responds with semantics, contents, and the social discussion.
3. user → system: “I think t_i and t_k are similar …”; the comment is added to the social discussion (u_m: “I think t_i and t_k are similar …”; u_i: “you are right”).
Document vectors $\vec{V}(d_1), \vec{V}(d_2), \vec{V}(d_3)$ are built from term weights:

$$\mathrm{tf}_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}, \qquad \mathrm{idf}_i = \log\frac{|D|}{|\{d : t_i \in d\}|}, \qquad w_{i,j} = \text{tf-idf}_{i,j} = \mathrm{tf}_{i,j} \times \mathrm{idf}_i$$

$$\vec{V}(d_j) = \begin{bmatrix} w_{1,j} \\ w_{2,j} \\ \vdots \\ w_{t,j} \end{bmatrix}, \qquad \mathrm{sim}(d_i, d_j) = \frac{\vec{V}(d_i) \cdot \vec{V}(d_j)}{\left\|\vec{V}(d_i)\right\| \left\|\vec{V}(d_j)\right\|}$$
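The tf-idf weighting and cosine similarity above can be sketched in a few lines of Python (a minimal illustration; the toy corpus and tokenization are hypothetical):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build sparse tf-idf vectors w_{i,j} = tf_{i,j} * idf_i for token lists."""
    N = len(docs)
    # document frequency: |{d : t_i in d}|
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log(N / df[t]) for t in df}
    vecs = []
    for d in docs:
        counts = Counter(d)
        total = sum(counts.values())  # sum_k n_{k,j}
        vecs.append({t: (n / total) * idf[t] for t, n in counts.items()})
    return vecs

def cosine(u, v):
    """sim(d_i, d_j): dot product over the product of vector norms."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    norm = lambda w: math.sqrt(sum(x * x for x in w.values()))
    return dot / (norm(u) * norm(v)) if u and v else 0.0

docs = [["network", "learning", "ai"],
        ["network", "analysis"],
        ["cooking", "recipes"]]
v = tfidf_vectors(docs)
```

Documents sharing weighted terms (the first two) score higher than disjoint ones, which score zero.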
[Figure: three example research-paper documents (title, authors, abstract, introduction)]
In the combined representation, each document vector stacks a term-space block and a network-space block:

$$\vec{V}(d_j) = \begin{bmatrix} w_{1,j} \\ w_{2,j} \\ \vdots \\ w_{|T|,j} \\ w_{1,j} \\ w_{2,j} \\ \vdots \\ w_{|N|,j} \end{bmatrix} \quad \text{(term-space block followed by network-space block: the term-network-space)}$$
Content-based weights use tf-idf (as above); network-based weights use distance in the network:

Content-based: $w_{i,j} = \text{tf-idf}_{i,j}$, with $\vec{V}(d_j) = [w_{1,j}, w_{2,j}, \ldots, w_{|T|,j}]^\top$

Network-based: $w_{i,j} = \mathrm{dist}(i, j)$, with $\vec{V}(d_j) = [w_{1,j}, w_{2,j}, \ldots, w_{|N|,j}]^\top$

More generally, $w_{i,j} = \phi(i, j)$. Similarity is computed as before:

$$\mathrm{sim}(d_i, d_j) = \frac{\vec{V}(d_i) \cdot \vec{V}(d_j)}{\left\|\vec{V}(d_i)\right\| \left\|\vec{V}(d_j)\right\|}$$
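A minimal sketch of the network-based weighting, assuming w_{i,j} = dist(i, j) is shortest-path distance in a hypothetical citation/social graph; the combined term-network vector is then just the concatenation of the two blocks:

```python
import math

def network_weights(graph, j, nodes):
    """w_{i,j} = dist(i, j): BFS shortest-path distance from node j
    to every node i (math.inf for unreachable nodes)."""
    dist = {j: 0}
    frontier = [j]
    while frontier:
        nxt = []
        for u in frontier:
            for v in graph.get(u, []):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    nxt.append(v)
        frontier = nxt
    return [dist.get(i, math.inf) for i in nodes]

def term_node_vector(term_weights, graph, j, nodes):
    """Stack the term-space block and the network-space block into one vector."""
    return term_weights + network_weights(graph, j, nodes)

# Hypothetical three-document citation graph: d1 - d2 - d3.
graph = {"d1": ["d2"], "d2": ["d1", "d3"], "d3": ["d2"]}
nodes = ["d1", "d2", "d3"]
vec = term_node_vector([0.42], graph, "d1", nodes)  # term block + node block
```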
Generalized Assignment Problem

Objective: given a paper $p$, choose a group of experts $\hat{M} \subset M$ (of a fixed size $s$) that collectively possesses the most expertise about $p$:

$$\text{maximize } R(\hat{M}, p) = \sum_{m \in \hat{M}} r(m, p) \quad (1)$$
$$\text{subject to } |\hat{M}| = s \quad (2)$$

Challenge: expertise is not additive:

$$R(\hat{M}, p) \neq \sum_{m \in \hat{M}} r(m, p). \quad (3)$$

Group Expertise Estimation. Assumptions: $M^*_p$ are the authors of paper $p$, so $R(M^*_p, p) = 1$, and $R(\hat{M}, p) = 0$ where $\hat{M} \cap M^*_p = \emptyset$. In between:

$$R(\hat{M}, p) = \frac{\left|\hat{M} \cap M^*_p\right|}{\left|M^*_p\right|}. \quad (4)$$

Learn $\hat{R}$ and use it to estimate group expertise; use both semantic and structural features; use an ensemble predictive model.

Training set: $T = (X_T, Y_T) = \{(x_i, f(x_i))\}_{x_i \in X_T}$; function learned from $X_T$ (and corresponding $Y_T$): $\hat{R}_T$.

Generalization error: $G(\hat{R}_T) = L(\hat{R}_T, f)$; choose training points to minimize it: $\min_{X_T} G(\hat{R}_T)$.

With $g$ the optimal function (in the solution space), $\hat{R}$ the learned function, and the $\hat{R}_i$'s the functions learned from slightly different training sets, the error decomposes as $E_G = B + V + C$, where

$$B = \left(E\hat{R}(x) - g(x)\right)^2, \qquad V = \left(\hat{R}(x) - E\hat{R}(x)\right)^2, \qquad C = (g(x) - f(x))^2.$$
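Because group expertise is not additive (Eq. 3), a fixed-size group has to be scored as a whole rather than by summing individual expert scores. A toy sketch, using the overlap measure of Eq. 4 as a stand-in for the learned group score (the names and the exhaustive search are illustrative only):

```python
from itertools import combinations

def R_overlap(group, true_authors):
    """Eq. 4 training signal: |M ∩ M*_p| / |M*_p|."""
    return len(group & true_authors) / len(true_authors)

def best_group(candidates, s, score):
    """Pick the size-s group maximizing a (possibly non-additive) group score.
    Exhaustive search; fine for toy sizes, not for real candidate pools."""
    return max((frozenset(g) for g in combinations(candidates, s)), key=score)

authors = {"ann", "bob"}                 # hypothetical author set M*_p
candidates = ["ann", "bob", "carol", "dan"]
g = best_group(candidates, 2, lambda M: R_overlap(M, authors))
```

The point of scoring whole groups is that complementary experts can be preferred even when each is individually weaker.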
The reverse could also be done, i.e., converting a textual representation into a network one.
[Table: characteristics — Analysis, User Interface, Deployment, Execution Speed, Type — compared for Network (N), Content (C), and the Proposed Hybrid; for Analysis, the Hybrid is the same as the C part of N; remaining cell text not recovered]
Connecting Representations: Semantic + Network
Figure 7: Visualization (zoomable) of 416 hashtags in energy-related Tweets, September 2010 through January 2011; created with Network Explorer (Rubens et al., 2011).
proaches that help understand the large-scale conversations taking place on Twitter and elsewhere. This project has demonstrated the feasibility of using data mining techniques to gather and analyze vast amounts of data from ongoing social media conversations and of analyzing the data for meaningful metrics that describe conversations about energy consumption behavior. The methods for this preliminary investigation included: development of a list of energy-related metaphors, terms and general descriptors as well as a list of energy conservation and reduction behaviors; monitoring term usage (location, frequency, context, clustering) on the Internet at frequent intervals; analyzing data for frequency and location of communication; clustering of terms over time – such as changes in their proximity and occurrence, introduction of new terms and fading of others. Our initial exploration confirmed that conversations about energy-related issues are, indeed, taking place in social media, specifically Twitter, and that these communications can be studied to better understand how to use technologically-enhanced word-of-mouth to stimulate user-generated persuasion. Using content analysis of full Tweets, network analysis of co-occurring hashtags, and semantic analysis of the co-occurring hashtags and their authors, this preliminary investigation identified descriptors, concerns, actions, and issues. We confirm that studying Twitter communications can provide actionable means for assessing engagement, identifying influencers, and identifying word-of-mouth communities that can accelerate change in energy efficiency behaviors.
An ecolinguistic taxonomy of over one hundred terms was established and included terms for: energy technologies/hardware & software; communication behaviors; energy & climate change frames, metaphors, & visualizations; energy efficiency and climate change innovative programs; issues such as renewable energy, global warming, energy insecurity; utilities, venture firms and companies; and behaviors (high and low cost and impact).
By example, we have demonstrated that it is possible to capture an issue-based sample of the Tweetstream and cu-
Connecting Concepts
Modeling of:
• topics
• lexicon
• semantics
• dynamics
Domain-specific Semantics
[Figure: concept map of domain-specific (biomedical/public-health) research keywords, e.g. epidemiology, gene expression, climate change, risk assessment]
[Figure: citation network of ACL Anthology papers (node labels are Anthology IDs, e.g. J93-1006, P04-1014)]
Connection Formation
Curriculum / Survey Design

Original Design: lacks information; limited diversity; limited context
Revised Design: information criterion-based
[Figure: output and feature matrices (output vs. input index; feature index vs. input index) under Traditional and Collaborative designs]
Learning in Collaborative Settings
Learning in Black-box Settings
[Figure: example research-paper documents $d_1, d_2, d_3$ mapped to vectors $\vec{V}(d_1), \vec{V}(d_2), \vec{V}(d_3)$]

$$\vec{V}(d_j) = \begin{bmatrix} w_{1,j} \\ w_{2,j} \\ \vdots \\ w_{|T|,j} \\ w_{1,j} \\ w_{2,j} \\ \vdots \\ w_{|N|,j} \end{bmatrix} \quad \text{(term-space block stacked with node-space block: the term-node-space)}$$
Figure 2: Vector-space interpretation of the proposed conversion and integration methods (figure 1) [4], [6].
Figure 3: Utilizing training points selected by an active learning method (3c) allows one to more accurately predict the true values (3a), in comparison with selecting training points randomly (3b) [5].
Our motivation is to use the only data that is accessible in black-box settings – output estimates. We note that accuracy will improve only if the learner's output estimates change. Therefore we propose an active learning criterion that utilizes the information contained within the changes of output estimates.
Many active learning methods are inapplicable in black-box settings, since they rely on knowledge of at least some aspect of the model's workings, as indicated by recent surveys [5]. Variance-based active learning approaches are applicable, but are not effective for a number of reasons. Since no information about the model is available, we propose to define an active learning criterion based on the indirect information available about the model – its output estimates. We note that a model's accuracy may improve only if its output estimates change (as a result of adding a new training point). In an attempt to speed up the improvements in accuracy of the model's estimates, we propose to estimate the usefulness of labeling a point based on the magnitude of its impact on the estimates. We show that defining an active learning criterion by taking into account changes in the output estimates is a promising practical approach [11].
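The criterion just described can be sketched model-agnostically: score each unlabeled candidate by how much adding it changes the output estimates over the pool. The sketch below assumes nothing about the model beyond its predictions; the 1-nearest-neighbour regressor, the use of the model's own estimate as the provisional label, and the toy data are all stand-ins:

```python
def knn1_predict(train, xs):
    """Black-box stand-in model: 1-nearest-neighbour regression on (x, y) pairs."""
    return [min(train, key=lambda p: abs(p[0] - x))[1] for x in xs]

def change_score(train, candidate, pool_xs):
    """Magnitude of change in output estimates (y_t -> y_{t+1}) when the
    candidate point, labeled with the model's own estimate, is added."""
    y_t = knn1_predict(train, pool_xs)
    y_hat = knn1_predict(train, [candidate])[0]   # provisional label
    y_t1 = knn1_predict(train + [(candidate, y_hat)], pool_xs)
    return sum(abs(a - b) for a, b in zip(y_t, y_t1))

# Toy data: two labeled points, two pool points; which candidate to label next?
train = [(0.0, 0.0), (10.0, 10.0)]
pool = [3.0, 6.0]
best = max([1.0, 4.0], key=lambda c: change_score(train, c, pool))
```

A candidate deep inside a region the model already agrees on barely moves the estimates; one near a decision region shifts many of them, so it is queried first.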
C. Network Analysis
Network data structures are becoming an increasingly common data type, in part due to the social nature and inherent interconnectedness of many domains. Traditional machine learning has
[Figure panels: (a) White Box Model – the model's internals are accessible, e.g. $\hat{y}_x = \beta \cdot x$ with $\beta$ known; (b) Black Box Model – only the inputs $x$ and the output estimates $\hat{y}_x$ of $\hat{f}(x)$ are accessible]
Figure 4: For Black Box models only the inputs and outputs are accessible; internal workings are not accessible, unlike for White Box models [11].
[Figure panels: output estimates over the input index, $\hat{y}_t$ before and $\hat{y}_{t+1}$ after adding a training point – (a) adding a training point influences many output estimates; (b) adding a training point influences only a few output estimates]
Figure 5: Intuition for the proposed method. Before a training point is added, output estimates are denoted as $\hat{y}_t$; after, as $\hat{y}_{t+1}$ [11].
been focused on traditional (non-relational) data consisting of multi-dimensional samples, which makes it incompatible with network data structures. Motivated by this, we focus on developing machine learning methods that are applicable to complex data that includes networks, text, semantics, etc. In particular, we concentrate on finding patterns within complex data and on modeling network dynamics (especially with regard to semantics) [13], [12].
III. APPLICATIONS
In this section we describe applications of the developed methods in diverse practical domains.
A. Expertise Finding

In today's knowledge-based economy, having the proper expertise is crucial to resolving many tasks. Expertise Finding (EF) is the area of research concerned with matching available experts to given tasks. A standard approach is to input a task description/proposal/paper into an EF system and receive recommended experts as output. Traditionally, group formation (GF) models are constructed from the data to represent each of the underlying entities, e.g. the task description and candidate
Figure 6: Decomposition of generalization error G into model error C, bias B, and variance V, where g denotes the optimal function, f̂ is a learned function, and the f̂_i's are the learned functions from slightly different training sets. Traditional active learning methods concentrate on minimizing only the variance (V) part of the error; the proposed method takes into consideration all of the error components [11].
�⇥V (d1)
�⇥V (d2)
�⇥V (d3)
1
�⇥V (d1)
�⇥V (d2)
�⇥V (d3)
1
�⇥V (d1)
�⇥V (d2)
�⇥V (d3)
1
This is a Title of a Research Paper
Joe Fakeman Jane NomanNowhere University
{fakeman, noman}@nowhereuni.edu
Abstract~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Introduction~~~~~~~~~~~~~~~~- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~- ~~~~~~~~~~~~~~~~~~
This is a Title of a Research Paper
Joe Fakeman Jane NomanNowhere University
{fakeman, noman}@nowhereuni.edu
Abstract~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Introduction~~~~~~~~~~~~~~~~- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~- ~~~~~~~~~~~~~~~~~~
This is a Title of a Research Paper
Joe Fakeman Jane NomanNowhere University
{fakeman, noman}@nowhereuni.edu
Abstract~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Introduction~~~~~~~~~~~~~~~~- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~- ~~~~~~~~~~~~~~~~~~
�⇥V (d1)
�⇥V (d2)
�⇥V (d3)
1
�⇥V (d1)
�⇥V (d2)
�⇥V (d3)
1
�⇥V (d1)
�⇥V (d2)
�⇥V (d3)
1
A document d_j is represented in the combined term-node-space by concatenating its term weights and node weights:

V(d_j) = [ w_{1,j}, w_{2,j}, …, w_{|T|,j}, w_{1,j}, w_{2,j}, …, w_{|N|,j} ]^T

Figure 2: Vector-space interpretation of the proposed conversion and integration methods (Figure 1): documents represented in term-space and node-space are combined into the joint term-node-space [4], [6].
Figure 3: Utilizing training points selected by an active learning method (c) allows the true values (a) to be predicted more accurately than when training points are selected randomly (b) [5].
Our motivation is to use the only data that is accessible in black box settings – output estimates. Many active learning methods are inapplicable in black box settings, since they rely on knowledge of at least some aspect of the model's workings, as indicated by recent surveys [5]. Variance-based active learning approaches are applicable, but are not effective, for a number of reasons. Since no information about the model is available, we propose to define an active learning criterion based on the indirect information that is available about the model – its output estimates. We note that the model's accuracy may improve only if its output estimates change (as a result of adding a new training point). In an attempt to speed up the improvements in the accuracy of the model's estimates, we propose to estimate the usefulness of labeling a point based on the magnitude of its impact on the estimates. We show that defining an active learning criterion that takes into account changes in the output estimates is a promising practical approach [11].
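This output-change criterion can be sketched for a ridge-regression learner as follows. The toy data, the hypothetical label set, and all function names here are illustrative assumptions, not from the paper; the point is only that the criterion touches nothing but the learner's predictions:

```python
import numpy as np

def fit_ridge(X, y, eps=1e-3):
    """Ridge estimate theta = (X^T X + eps*I)^(-1) X^T y."""
    A = X.T @ X + eps * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

def output_change_score(X_train, y_train, x_cand, hyp_labels, X_eval, eps=1e-3):
    """Black-box criterion: expected ||y_hat_t - y_hat_{t+1}||^2 over a
    set of hypothetical labels for the candidate point. Only the
    learner's output estimates are used, never its internals."""
    y_hat_t = X_eval @ fit_ridge(X_train, y_train, eps)
    score = 0.0
    for y_star in hyp_labels:
        X_new = np.vstack([X_train, x_cand])
        y_new = np.append(y_train, y_star)
        y_hat_t1 = X_eval @ fit_ridge(X_new, y_new, eps)
        score += np.sum((y_hat_t1 - y_hat_t) ** 2) / len(hyp_labels)
    return score

# toy usage: two labeled points clustered near x=0.1; a far-away
# candidate shifts the output estimates more than a nearby one
X_train = np.array([[1.0, 0.1], [1.0, 0.15]])   # [bias, x] features
y_train = np.array([0.2, 0.3])
X_eval = np.column_stack([np.ones(50), np.linspace(0, 1, 50)])
near = np.array([1.0, 0.12])
far = np.array([1.0, 0.9])
labels = [0.0, 1.0]   # hypothetical label set (an assumption of this sketch)
assert output_change_score(X_train, y_train, far, labels, X_eval) > \
       output_change_score(X_train, y_train, near, labels, X_eval)
```

Here the retrained estimates stand in for re-querying the black box after a label is acquired; any learner that can be refit and queried would do.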
C. Network Analysis
Network data structures are becoming an increasingly common data type, in part due to the social nature and inherent interconnectedness of many domains. Traditional machine learning has
(a) White Box Model (the internal form, e.g. ŷ = β·x, is visible). (b) Black Box Model.
Figure 4: For Black Box models only the inputs and outputs are accessible; the internal workings are not, unlike for White Box models [11].
a) Adding a training point influences many output estimates. b) Adding a training point influences only a few output estimates.
Figure 5: Intuition for the proposed method. Before a training point is added, output estimates are denoted as ŷ_t; after, as ŷ_{t+1} [11].
been focused on traditional (non-relational) data consisting of multi-dimensional samples, which makes it incompatible with network data structures. Motivated by this, we focus on developing machine learning methods that are applicable to complex data, including networks, text, semantics, etc. In particular, we concentrate on finding patterns within complex data and on modeling network dynamics (especially with regard to semantics) [13], [12].
III. APPLICATIONS
In this section we describe applications of the developed methods in diverse practical domains.
A. Expertise Finding
In today's knowledge-based economy, having the proper expertise is crucial to resolving many tasks. Expertise Finding (EF) is the area of research concerned with matching available experts to given tasks. A standard approach is to input a task description/proposal/paper into an EF system, and receive recommended experts as output. Traditionally, group formation (GF) models are constructed from the data to represent each of the underlying entities, e.g. the task description and candidate
Figure 6: Decomposition of generalization error G into model error C, bias B, and variance V, where g denotes the optimal function, f̂ is a learned function, and the f̂_i's are the functions learned from slightly different training sets. Traditional active learning methods concentrate on minimizing only the variance (V) part of the error; the proposed method takes into consideration all of the error components [11].
Figure 3: Distribution of the estimates ŷ_{t+1} in relation to the estimate ŷ_t and the true value y.
Figure 4: ŷ after the training point is added to the training set (making the number of training points equal to t+1).
Figure 5: T_1 = ‖ŷ_t − ŷ_{t+1}‖ and the value that it tries to approximate, ΔG (Section 3.1). Most importantly, high values of ‖ŷ_t − ŷ_{t+1}‖² should correspond to high values of ΔG, since those are the points that are likely to be chosen.
Figure 6: Evaluation of active learning criteria (Proposed, A-optimal, D-optimal, E-optimal, Transductive, Random, Optimal) as Mean Squared Error vs. training set size (lower values are better; values differ at the 95% statistical significance level).
Proposed Approach: use the change in the estimates ‖ŷ_t − ŷ_{t+1}‖² to estimate the improvement in the generalization error ΔG(x_d), where ŷ_{t+1} are the estimates after (x_d, y_d) was added to the training set. (a) Adding a training point causes many output estimates to change. (b) Adding a training point causes few output estimates to change.
Network Structure Learning: Connecting/Pruning Nodes
Figure 7: Decomposition of generalization error G into model error C, bias B, and variance V, where g denotes the optimal function, f̂ is a learned function, and the f̂_i's are the functions learned from slightly different training sets.
6.2.1 Parameter Change-based

Parameter Change-based AL (Settles et al., 2008b) favors items that are likely to influence the model the most. Assuming that changes in the model's parameters are for the better, i.e. approach the optimal parameters, it is then beneficial to select an item that has the greatest impact on the model's parameters:

G_change(x_a) = E_{y∈Y} L(θ_T, θ_{T∪(x_a,y)}),   (22)

where θ_T are the model's parameters estimated from the current training set T, θ_{T∪(x_a,y)} are the model's parameter estimates after a hypothetical rating y of an item x_a is added to the training set T, and L is the loss function that measures the differences between the parameters.
6.2.2 Variance-based

In this approach the error is decomposed into three components: model error C (the difference between the optimal function approximation g, given the current model, and the true function f), bias B (the difference between the current approximation f̂ and the optimal one g), and variance V (how much the function approximation f̂ varies). In other words, we have:

G = C + B + V. (23)

One solution (Cohn et al., 1996) is to minimize the variance component V of the error, by assuming that the bias component becomes negligible (if this assumption is not satisfied, then this method may not be effective). A number of methods have been proposed that aim to select training inputs so as to reduce a certain measure of the variance of the model's parameters. The A-optimal design (Chan, 1981) seeks to select training input points so as to minimize the average variance of the parameter estimates, the D-optimal design (John & Draper, 1975) seeks to maximize the differential Shannon information content of the parameter estimates, and the Transductive Experimental design (Yu et al., 2006) seeks to find representative training points that may allow retaining most of the information of the test points. The AL method in (Sugiyama, 2006), in addition to the variance component, also takes into account the existence of the model error component.
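As an illustration of the A-optimal idea, a greedy step can be sketched for a linear model, where the average variance of the parameter estimates is proportional to trace((X^T X + εI)^{-1}); the toy data and function name are illustrative assumptions:

```python
import numpy as np

def a_optimal_pick(X_train, X_pool, eps=1e-3):
    """Greedy A-optimal step: pick the pool point whose addition
    minimizes trace((X^T X + eps*I)^(-1)), i.e. the average
    variance of the linear model's parameter estimates."""
    d = X_train.shape[1]
    best, best_trace = None, np.inf
    for i, x in enumerate(X_pool):
        Xa = np.vstack([X_train, x])
        tr = np.trace(np.linalg.inv(Xa.T @ Xa + eps * np.eye(d)))
        if tr < best_trace:
            best, best_trace = i, tr
    return best

# toy usage: with labeled points clustered near x=0.1, the A-optimal
# choice is the far point, which pins down the slope (reduces variance most)
X_train = np.array([[1.0, 0.1], [1.0, 0.15]])
X_pool = np.array([[1.0, 0.12], [1.0, 0.9]])
assert a_optimal_pick(X_train, X_pool) == 1
```

D-optimal and transductive designs differ only in the scalar objective evaluated on the updated design matrix (log-determinant and test-point coverage, respectively).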
6.2.3 Image Restoration-based

It is also possible to treat the problem of predicting the user's preferences as one of image restoration (Nakamura & Abe, 1998): based on our limited knowledge of a user's preferences (a partial picture), we try to restore the complete picture of the user's likes and dislikes. The AL task is then to select the training points that would best allow us to restore the "image" of the user's preferences. It is interesting to note that this approach satisfies the desired properties of the AL methods outlined in Section 2. For example, if a point already exists in a region, then the image in that region could likely be restored without sampling neighboring points. This approach may also favor sampling close to the edges of image components (decision boundaries).
7 Ensemble-based Active Learning

Sometimes, instead of using a single model to predict a user's preferences, an ensemble of models may be beneficial. In other cases only a single model is used, but it is selected from a number of candidate models. The main advantage of this is the premise that different models are better suited to different users or different
The generalization error of a learned function f̂(x) decomposes, in expectation over the noise terms {ε_i}_{i=1}^n, into noise, bias, and variance:

G = σ² + B + V,

where, with g(x) denoting the optimal function and q(x) the test input density,

B = ∫_D ( E_ε f̂(x) − g(x) )² q(x) dx,
V = E_ε ∫_D ( f̂(x) − E_ε f̂(x) )² q(x) dx.

When the training input density p(x) differs from the test input density q(x) (covariate shift, p(x) ≠ q(x)), the model can be fit by importance-weighted least squares:

min_θ Σ_{i=1}^n ( q(x_i)/p(x_i) )^λ ( f̂(x_i) − y_i )²,

where the flattening parameter λ (0 ≤ λ ≤ 1) trades off bias and variance: λ = 0 gives ordinary least squares, λ = 1 gives fully importance-weighted least squares.
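The importance-weighted objective can be sketched for a linear model as follows; the synthetic data and the stand-in density-ratio weights are assumptions of this sketch:

```python
import numpy as np

rng = np.random.default_rng(1)

def iw_least_squares(X, y, w, lam):
    """Importance-weighted least squares for a linear model:
    minimize sum_i w_i^lam * (x_i^T theta - y_i)^2, where w_i
    approximates the density ratio q(x_i)/p(x_i).
    lam=0 is ordinary least squares; lam=1 is fully weighted."""
    W = np.diag(w ** lam)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# toy covariate shift: training inputs concentrated on [0, 0.5]
x = rng.uniform(0.0, 0.5, 200)
y = np.sin(2 * x) + 0.01 * rng.standard_normal(200)
X = np.column_stack([np.ones_like(x), x])
w = np.exp(2 * x)   # stand-in density ratio favoring larger (test-like) x
theta_ols = iw_least_squares(X, y, w, lam=0.0)
theta_iw = iw_least_squares(X, y, w, lam=1.0)

# lam=0 must coincide with the unweighted solution
assert np.allclose(theta_ols, np.linalg.solve(X.T @ X, X.T @ y))
```

Intermediate λ values interpolate between the two estimators, which is the bias/variance trade-off described above; in practice the density ratio itself must be estimated.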
J_O = (x_*^T A^{-1} x_*)² / (1 + x_*^T A^{-1} x_*)²,

J_T = [ Σ_{x_t ∈ X_* \ x_*} (x_*^T A^{-1} x_t)² ] / (1 + x_*^T A^{-1} x_*)².

To interpret J_O and J_T, consider the eigendecomposition of X^T X:

X^T X = Σ_{i=1}^p λ_i φ_i φ_i^T,

with eigenvalues λ_1 ≥ λ_2 ≥ … ≥ λ_m > λ_{m+1} = … = λ_d = 0 and eigenvectors φ_i, where m is the rank of X^T X. Then

A = X^T X + εI = Σ_{i=1}^p λ_i φ_i φ_i^T + εI.

Writing x_*^T A^{-1} x_* = a, J_O becomes

J_O = a² / (a + 1)²,

which is monotonically increasing in a, and

a = Σ_{i=1}^p (x_*^T φ_i)² / (λ_i + ε)
  = Σ_{i=1}^m (x_*^T φ_i)² / (λ_i + ε) + (1/ε) Σ_{i=m+1}^p (x_*^T φ_i)².
Since the true rating y_* of the candidate is unknown, J_O and J_T are evaluated in expectation over its possible values. With J(·| y_* = r) denoting the criterion J(·) computed under the assumption y_* = r,

J(·) = Σ_r P(y_* = r) J(·| y_* = r),

where P(y_* = r) is the estimated probability that the rating equals r.
Since the 1/ε factor dominates,

a ≈ (1/ε) Σ_{i=m+1}^p (x_*^T φ_i)².

The quantity (1/ε) Σ_{i=m+1}^p (x_*^T φ_i)² measures the components of x_* lying outside the span {φ_i}_{i=1}^m of X^T X, i.e. in its null-space directions {φ_i}_{i=m+1}^p. J_O therefore favors candidates x_* that are not well covered by the already-observed inputs.

For J_T, since (1 + x_*^T A^{-1} x_*)² ≥ 1, J_T is bounded by

J_T ≤ Σ_{x_t ∈ X_* \ x_*} (x_*^T A^{-1} x_t)².

Expanding x_*^T A^{-1} x_t in the eigenbasis:

x_*^T A^{-1} x_t = Σ_{i=1}^p (x_*^T φ_i)(x_t^T φ_i) / (λ_i + ε)
                 = Σ_{i=1}^m (x_*^T φ_i)(x_t^T φ_i) / (λ_i + ε) + (1/ε) Σ_{i=m+1}^p (x_*^T φ_i)(x_t^T φ_i).

With the 1/ε term dominating,

x_*^T A^{-1} x_t ≈ (1/ε) Σ_{i=m+1}^p (x_*^T φ_i)(x_t^T φ_i),

so

J_T ≤ Σ_{x_t ∈ X_* \ x_*} (x_*^T A^{-1} x_t)² ≈ Σ_{x_t ∈ X_* \ x_*} [ (1/ε) Σ_{i=m+1}^d (x_*^T φ_i)(x_t^T φ_i) ]².

J_T therefore favors candidates x_* whose null-space components (with respect to X^T X) align with those of the test points X_* \ x_*.
For regularized linear regression, the current parameter estimate is

θ̂_t = A^{-1} X^T y.

After a point (x_*, y_*) is added to the training set, the estimate θ̂_{t+1} becomes

θ̂_{t+1} = (A + x_* x_*^T)^{-1} (X^T y + x_* y_*)
        = (A + x_* x_*^T)^{-1} X^T y + (A + x_* x_*^T)^{-1} x_* y_*.

By the Sherman–Morrison formula,

(A + x_* x_*^T)^{-1} = A^{-1} − A^{-1} x_* x_*^T A^{-1} / (1 + x_*^T A^{-1} x_*),

so

(A + x_* x_*^T)^{-1} X^T y = A^{-1} X^T y − A^{-1} x_* x_*^T A^{-1} X^T y / (1 + x_*^T A^{-1} x_*),
(A + x_* x_*^T)^{-1} x_* y_* = A^{-1} x_* y_* − A^{-1} x_* x_*^T A^{-1} x_* y_* / (1 + x_*^T A^{-1} x_*).

Subtracting θ̂_t:

θ̂_{t+1} − θ̂_t = A^{-1} x_* y_* − A^{-1} x_* x_*^T A^{-1} x_* y_* / (1 + x_*^T A^{-1} x_*) − A^{-1} x_* x_*^T A^{-1} X^T y / (1 + x_*^T A^{-1} x_*)
             = A^{-1} x_* y_* (1 + x_*^T A^{-1} x_* − x_*^T A^{-1} x_*) / (1 + x_*^T A^{-1} x_*) − A^{-1} x_* x_*^T A^{-1} X^T y / (1 + x_*^T A^{-1} x_*)
             = A^{-1} x_* (y_* − x_*^T θ̂_t) / (1 + x_*^T A^{-1} x_*).

The change in the output estimates at the test inputs X_* is therefore

ŷ_{t+1} − ŷ_t = X_* A^{-1} x_* (y_* − x_*^T θ̂_t) / (1 + x_*^T A^{-1} x_*),

and the criterion becomes

J(·) = ‖ŷ_{t+1} − ŷ_t‖² = [ (y_* − x_*^T θ̂_t) / (1 + x_*^T A^{-1} x_*) ]² x_*^T A^{-1} X_*^T X_* A^{-1} x_*.
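The closed-form change in the output estimates can be checked numerically against a brute-force refit; the random data here is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

# verify the closed form for the change in ridge-regression estimates
# when one labeled point (x_*, y_*) is added to the training set
n, d, eps = 30, 4, 1e-3
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)
X_star = rng.standard_normal((10, d))   # test inputs X_*
x_s = rng.standard_normal(d)            # new training input x_*
y_s = rng.standard_normal()             # its label y_*

A = X.T @ X + eps * np.eye(d)
theta_t = np.linalg.solve(A, X.T @ y)

# brute force: refit with the point added
X2 = np.vstack([X, x_s])
y2 = np.append(y, y_s)
theta_t1 = np.linalg.solve(X2.T @ X2 + eps * np.eye(d), X2.T @ y2)
diff_brute = X_star @ (theta_t1 - theta_t)

# closed form: X_* A^{-1} x_* (y_* - x_*^T theta_t) / (1 + x_*^T A^{-1} x_*)
Ainv_x = np.linalg.solve(A, x_s)
diff_closed = X_star @ Ainv_x * (y_s - x_s @ theta_t) / (1 + x_s @ Ainv_x)

assert np.allclose(diff_brute, diff_closed)
```

This rank-one identity is what makes the criterion cheap to evaluate for every candidate point: no refit is required once A^{-1} is available.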
The criterion T_1 approximates ΔG through the change in the output estimates ŷ_{t+1} relative to ŷ_t that would result from adding a candidate point with hypothetical label y_*:

J(·) = ‖ŷ_t − ŷ_{t+1}‖²,

and the point maximizing it is selected: argmax J(·). For a linear model this can be written as

J(·) = ‖ŷ_t − ŷ_{t+1}‖² = ‖X_* (θ̂_t − θ̂_{t+1})‖²,

where θ̂_t and θ̂_{t+1} are the parameter estimates before and after the addition, computed with

A = X^T X + εI,

the regularizer εI (0 < ε ≪ 1) keeping A invertible in the estimate θ̂_t.
The criterion factorizes as

J(·) = (y_* − x_*^T θ̂_t)² x_*^T A^{-1} X_*^T X_* A^{-1} x_* / (1 + x_*^T A^{-1} x_*)² = J_R J_S / J_P,

where

J_R = (y_* − x_*^T θ̂_t)²,
J_S = x_*^T A^{-1} X_*^T X_* A^{-1} x_*,
J_P = (1 + x_*^T A^{-1} x_*)².

J_R = (y_* − x_*^T θ̂_t)² measures how far the hypothetical label y_* is from the current prediction x_*^T θ̂_t at the candidate x_*. For J_S,

J_S = x_*^T A^{-1} X_*^T X_* A^{-1} x_* = Σ_{x_t ∈ X_*} (x_*^T A^{-1} x_t)²,

where X_* are the test inputs. If the candidate is itself a test point, x_* ∈ X_*, then

J_S = (x_*^T A^{-1} x_*)² + Σ_{x_t ∈ X_* \ x_*} (x_*^T A^{-1} x_t)².

Dividing J_S by J_P:

J_S / J_P = (x_*^T A^{-1} x_*)² / (1 + x_*^T A^{-1} x_*)² + [ Σ_{x_t ∈ X_* \ x_*} (x_*^T A^{-1} x_t)² ] / (1 + x_*^T A^{-1} x_*)² = J_O + J_T.
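The identity J_S/J_P = J_O + J_T can be verified numerically on random data (the dimensions and data below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)

n, d, eps = 20, 3, 1e-3
X = rng.standard_normal((n, d))
X_star = rng.standard_normal((8, d))   # test inputs; take the first as candidate
x_s = X_star[0]                        # candidate x_* is itself a test point

A = X.T @ X + eps * np.eye(d)
Ainv = np.linalg.inv(A)

J_S = x_s @ Ainv @ X_star.T @ X_star @ Ainv @ x_s
a = x_s @ Ainv @ x_s
J_P = (1 + a) ** 2
J_O = a ** 2 / (1 + a) ** 2
J_T = sum((x_s @ Ainv @ xt) ** 2 for xt in X_star[1:]) / (1 + a) ** 2

assert np.isclose(J_S / J_P, J_O + J_T)
```

The split isolates the self-term (J_O, novelty of the candidate) from the cross-terms (J_T, its coupling to the remaining test points).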
Conceptual Justifications
Connecting People: Collaboration Networks
Legend: University, Alumnus, Company; node size = degree (log-scaled).
(a) Stanford University. (b) Harvard University. (c) MIT. (d) UC Berkeley.
Figure 1: Intra-University Networks. Networks of the above universities are expanded in a breadth-first manner up to a depth of 2, showing the university, its alumni, and the companies they are associated with through employment, investment, or other activities (Section III-A1). Node size reflects node degree (scaled logarithmically).
relation to the number of alumni nodes differs. Stanford University has a significantly higher ratio of companies per alumnus in leadership roles, followed by Harvard, trailed by Berkeley and MIT. A high ratio of company nodes indicates that alumni have been involved with multiple companies, through employment, advisory, or investment activities. In addition, the number of highly connected alumni (large nodes with many connections located on the perimeter) differs significantly between universities (we further explore this in Section III-B). One particular characteristic of highly connected alumni stands out, namely their collaboration patterns. Stanford's densely connected alumni are highly likely to collaborate with fellow alumni (indicated by the company nodes being pulled away from a highly connected alumnus towards other less-connected alumni in the center). In the networks of the other universities, collaboration between highly connected individuals and their fellow alumni is evident, but to a lesser degree.
Harvard alumni appear active in leadership positions in technology-based startups, even more so than MIT alumni (Figure 1b vs. 1c). A possible explanation for the relatively lower presence of MIT alumni may be the focus of this dataset on leadership positions in organizations. While engineers play a key role, they often do so in a technology development capacity rather than in the leadership positions that are visible in public relations communications. Some support for this explanation may be seen in Figure 2 in the relatively large distance between Microsoft and the University of Washington, even though a large number of engineers at Microsoft are indeed from the University of Washington.
2) Inter-University Network: A graphic representation of the alumni-based inter-relations between universities, shown in Figure 2, was produced as follows. Four universities were selected for analysis: Stanford University, University of California (Berkeley), Harvard University, and Massachusetts Institute of Technology (MIT). From these nodes we performed breadth-first expansion up to a depth of three: the 1st level being alumni of the corresponding universities, the 2nd level the companies with which the alumni have relations, and the 3rd
Legend: University & Alumni (Stanford, MIT, Harvard, Berkeley); other nodes: Financial Org., Company, Person; node size = centrality (log-scaled).
Figure 2: Inter-University Network (between Stanford, Harvard, MIT, Berkeley) (Section III-A2). The network is obtained by starting with the nodes of the above-mentioned universities and performing a breadth-first expansion up to a depth of 3.
level being entities/nodes that are linked to the previous levels, including financial organizations, company employees, etc. Since we are primarily interested in relationships among alumni, all other entities are faded out, except for the above-mentioned universities and their alumni. In addition, we glimpse at the relations between alumni and investment firms (a very important factor for entrepreneurship); therefore, nodes of financial organizations are not faded out.
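The breadth-first expansion used to build these networks can be sketched as follows; the mini-network, node names, and function name are hypothetical, for illustration only:

```python
from collections import deque

def expand_breadth_first(adj, seeds, depth):
    """Breadth-first expansion up to `depth` hops from the seed nodes,
    mirroring the construction above (depth 1 = alumni, depth 2 =
    their companies, depth 3 = linked entities such as financial
    organizations). Returns the set of reached nodes."""
    dist = {s: 0 for s in seeds}
    q = deque(seeds)
    while q:
        u = q.popleft()
        if dist[u] == depth:
            continue  # do not expand past the requested depth
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return set(dist)

# toy usage on a hypothetical mini-network (undirected adjacency lists)
adj = {
    "Stanford":  ["alum_a"],
    "alum_a":    ["Stanford", "company_x"],
    "company_x": ["alum_a", "vc_fund"],
    "vc_fund":   ["company_x", "company_y"],
    "company_y": ["vc_fund"],
}
nodes = expand_breadth_first(adj, ["Stanford"], depth=3)
assert nodes == {"Stanford", "alum_a", "company_x", "vc_fund"}
```

A graph library (e.g. NetworkX) provides equivalent ego-network utilities; the plain-dict version above just makes the depth-limited frontier explicit.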
Two distinct groups – universities (in the lower left corner)and financials (in the upper right corner) – are visible in the
subdued edges in Figure 2. The distance from the universities to the cloud of 'financial' clusters also varies. In particular, Stanford and Berkeley are rather close to the financial cloud. This may be explained by the geographical proximity of these universities to one of the largest sources of venture funding, Silicon Valley. While the universities themselves are not embedded within the financial clusters, a noticeable proportion of alumni are deeply connected within them, having direct or indirect relations with multiple financial organizations. Stanford has the largest number of alumni connected to the financial cluster, followed by Harvard (even though the university itself is relatively distant from the financial cluster), then Berkeley, with only a few alumni from MIT. The proximity between alumni and their alma maters also appears to differ significantly. Berkeley alumni tend to be clustered together, MIT's to a somewhat lesser degree, while Stanford and Harvard alumni are rather dispersed. Proximity between universities differs as well. Stanford and Berkeley are close together (many alumni hold leadership positions in the same companies). One likely explanation for this network proximity is the geographical proximity of both universities to Silicon Valley, where many of the investment firms and startup companies are located. Harvard and MIT do not appear to have as strong relations with other universities in these settings.
Legend: University vs. other entities (Company, People, Financial Org.); node size = degree centrality (log-scaled).
Figure 3: Universities within the Business Network (partial snapshot) (Section III-A3). Note that node locations differ significantly from Figure 2 due to the additional forces exerted by the very large number of nodes and links of the complete network (144,685 nodes and 129,423 links). For better visibility, all entity types except universities are faded out.
3) Universities Within the Technology-Based Business Network: Through alumni, universities become indirectly linked to a variety of business entities – technology-based companies, the service organizations that support them, and investment firms. The positions of universities within the technology-based business network (Figure 3) are determined by their direct links only to the alumni. However, the proximity and location of universities within the business network (Figure 3) differ from those in the inter-university network (Figure 2). It should be noted that a large number of nodes and links that were not included in the inter-university network are, in fact, included in the layout of the full network. The cluster- and force-based layout algorithms used in this analysis place nodes with many interconnections close together, and both direct and indirect links influence the position of nodes within the network. Hence, the patterns of nodes differ significantly between the Business Network and the Inter-University Network.

Let us look at the proximity between universities and companies. While Microsoft and Yahoo are close to many major universities, Google appears to be distant from them. Discovering the precise explanation for this warrants further investigation, but let us suggest two hypotheses. As we briefly discussed in Section III-A2 and Section II-C, our dataset is focused on 'key' people within a company (e.g. those mentioned in press releases). Unlike many other companies, Google tends to give credit to its engineers, e.g. names of engineers are mentioned in press releases. In addition, Google has experienced very rapid employee growth, which has required establishing relationships with many universities to meet hiring goals.
B. Data Analysis
In addition to examining networks visually, we use several network measures to reveal the characteristics and patterns of the underlying network. One of the biggest advantages of numerical analysis of network data is the ability to analyze very large networks; in visual analysis, patterns in large networks quickly become difficult to discern (Figures 2, 3). For the numerical analysis we used the full dataset as described in Section II-C; the constructed network contains 144,685 nodes and 129,423 links, including over 2,100 educational institutions. Due to space limitations, we report the network properties of the 20 universities with the largest number of alumni in our dataset.
Social Network Metrics: Network metrics numerically express characteristics and patterns of the underlying network. For this analysis, we have chosen the following network measures: centrality (betweenness centrality and closeness centrality) and eccentricity. Centrality reflects the relative importance of a node within the graph. Betweenness and closeness centrality are typical measures of centrality [5], [3]. Betweenness centrality can be thought of as a kind of bridge/broker score: a measure of how much the connections between other nodes in the network would be disrupted by removing that node. If an alum has very novel and highly desired expertise, s/he may provide crucial and rather exclusive links for doing business in that domain. More precisely, betweenness centrality measures how frequently a node appears on shortest paths between nodes in the network. On the other hand, closeness
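These measures can be sketched on a toy graph as follows; the simplified betweenness below counts pairs whose shortest path runs through the node (exact only when shortest paths are unique, as in trees), and the "broker" network is a made-up illustration:

```python
from collections import deque

def bfs_dists(adj, src):
    """Hop distances from src via breadth-first search."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def closeness(adj, v):
    """Closeness centrality: reachable nodes divided by the sum of
    shortest-path distances from v (higher = more central)."""
    d = bfs_dists(adj, v)
    others = [x for x in d.values() if x > 0]
    return len(others) / sum(others)

def eccentricity(adj, v):
    """Eccentricity: distance to the farthest reachable node."""
    return max(bfs_dists(adj, v).values())

def betweenness(adj, v):
    """Simplified betweenness: number of node pairs (s, t) whose
    shortest path passes through v, detected via the distance
    identity d(s, t) = d(s, v) + d(v, t). Real betweenness also
    splits credit over tied shortest paths."""
    dv = bfs_dists(adj, v)
    nodes = [u for u in adj if u != v]
    count = 0
    for i, s in enumerate(nodes):
        ds = bfs_dists(adj, s)
        for t in nodes[i + 1:]:
            if ds[t] == dv[s] + dv[t]:
                count += 1
    return count

# toy "broker" network: b bridges a, c, and the d-e tail
adj = {"a": ["b"], "b": ["a", "c", "d"], "c": ["b"],
       "d": ["b", "e"], "e": ["d"]}
assert betweenness(adj, "b") == 5   # b lies on 5 of the 6 pair paths
assert eccentricity(adj, "b") == 2
assert closeness(adj, "b") > closeness(adj, "e")
```

For the 144,685-node network analyzed here, library implementations (e.g. NetworkX's `betweenness_centrality`) would be used instead; the definitions are the same.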
University-Company Network
(a) Betweenness Centrality of Alumni Nodes in Networks (university and alumni nodes are disconnected). Axes: Betweenness Centrality vs. Alumni Index.
(b) Betweenness Centrality of Alumni Nodes in Networks (university and alumni nodes are connected). Axes: Betweenness Centrality vs. Alumni Index.
Figure 4: Betweenness centrality (y-axis) of alumni nodes in the network, connected to the university node (b), and disconnected from the university node (a). The x-axis corresponds to alumni ordered in descending order of centrality for each of the universities. Note that the scale of the y-axis differs between (a) and (b).
Table II: Network metrics of alumni nodes (university and alumni nodes connected). Columns: # of Alumni, Median Betweenness Centrality, Median Eccentricity, Median Closeness Centrality. Note that the heading “# of Alumni” refers to the number of alumni in the IEN dataset (Section II-C).
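Table II-style summaries reduce per-node metrics to per-university medians. A sketch of that aggregation step, with made-up metric values and university labels (none of these numbers are from the paper):

```python
# Sketch: deriving per-university medians (as in Table II) from per-node
# metrics. All values below are illustrative, not the paper's data.
from statistics import median

# node -> (university, betweenness, eccentricity, closeness)
alumni = {
    "a1": ("Stanford", 0.0021, 9, 0.18),
    "a2": ("Stanford", 0.0007, 10, 0.16),
    "a3": ("MIT",      0.0035, 8, 0.21),
    "a4": ("MIT",      0.0011, 9, 0.19),
    "a5": ("MIT",      0.0002, 11, 0.15),
}

by_univ = {}
for univ, *metrics in alumni.values():
    by_univ.setdefault(univ, []).append(metrics)

# One table row per university: # of alumni plus the three medians.
for univ, rows in sorted(by_univ.items()):
    bet, ecc, clo = zip(*rows)
    print(univ, len(rows), median(bet), median(ecc), median(clo))
```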
of these data-gathering and network analysis approaches. Recent applications of network analysis have demonstrated their power in understanding social norms, inter-firm relationships, and influence. Continuing developments combining network analysis and machine learning are opening opportunities for predictive methods as well. Alumni and their connections – to several educational institutions and to several business entities – provide a data domain with deep dimension. Alumni networks can be both simple and complex. The complexity of the relationships among alumni active in technology-based business affords extensive inquiry into many-mode, small- and large-scale, directed and time-scaled networks. The authors invite collaboration on these frontiers. Legends, novels, and films have been made about the academic cohort – a graduating class, a student course project team, a laboratory group. Experiences at educational institutions are often profound and memorable. The connections formed through those experiences – through processes of self-discovery, learning, creativity, invention, collaboration – enable the personal and professional contributions of graduates. Educational institutions stand to benefit substantially from better understanding the connections of their alumni. Insights from this understanding could be leveraged to guide the curricula and enrichment programs that comprise the educational experience. The power of visualization can be harnessed to develop a shared mental model, among faculty, administrators, and donors, toward which resources will be applied. This might include curricular and extracurricular attention to students' personal and professional network development in a global environment, in which life-long and life-wide learning yields competitive advantage.
REFERENCES
[1] N. Rubens, K. Still, J. Huhtamaki, and M. G. Russell, “Leveraging social media for analysis of innovation players and their moves,” tech. rep., MediaX, Stanford University, Feb. 2010.
[2] L. C. Freeman, Encyclopedia of Complexity and Systems Science, ch. Methods of Social Network Visualization. Berlin: Springer, 2009.
[3] M. Bastian, S. Heymann, and M. Jacomy, “Gephi: An open source software for exploring and manipulating networks,” 2009.
[4] S. Martin, W. M. Brown, R. Klavans, and K. Boyack, “OpenOrd: An open-source toolbox for large graph layout,” in SPIE Conference on Visualization and Data Analysis (VDA), 2011.
[5] D. Hansen, B. Shneiderman, and M. Smith, Analyzing Social Networks with NodeXL: Insights from a Connected World. Morgan Kaufmann, 2010.
[Diagram: expert finding and group formation – a task description is matched against researcher profiles (what each researcher “knows”, mined from contributed knowledge such as academic papers) to select experts and form a group.]
---------------------------------------------
Generalized Assignment Problem

Objective: given a paper $p$, choose a group of experts $M \subseteq \mathcal{M}$ (of a fixed size $s$) that collectively possesses the most expertise about $p$:

$$\text{maximize } R(M, p) = \sum_{m \in M} r(m, p) \quad (1)$$
$$\text{subject to } |M| = s \quad (2)$$

Challenge: expertise is not additive:

$$R(M, p) \neq \sum_{m \in M} r(m, p). \quad (3)$$

Group Expertise Estimation

Assumptions: $M^*_p$ are the authors of paper $p$, so $R(M^*_p, p) = 1$; and $R(M, p) = 0$ where $M \cap M^*_p = \emptyset$. Accordingly,

$$R(M, p) = \frac{|M \cap M^*_p|}{|M^*_p|}. \quad (4)$$

Learn $\widehat{R}$ and use it to estimate group expertise, using both semantic and structural features and an ensemble predictive model.

Training set: $T = (X_T, Y_T) = \{(x_i, f(x_i)) : x_i \in X_T\}$.
Function learned from $X_T$ (and corresponding $Y_T$): $\widehat{R}_T$.
Generalization error: $G(\widehat{R}_T) = L(\widehat{R}_T, f)$; the goal is $\min_{X_T} G(\widehat{R}_T)$.

Let $g$ be the optimal function (in the solution space), $\widehat{R}$ the learned function, and $\widehat{R}_i$'s the functions learned from slightly different training sets. The expected generalization error decomposes as

$$EG = B + V + C,$$
$$B = \big(E\widehat{R}(x) - g(x)\big)^2, \quad V = \big(\widehat{R} - E\widehat{R}(x)\big)^2, \quad C = \big(g(x) - f(x)\big)^2.$$

Use changes in the estimates $\|\hat{y}_t - \hat{y}_{t+1}\|^2$ to estimate the improvement in the generalization error $G(x^*)$, where $\hat{y}_{t+1}$ are the estimates after $(x^*, y^*)$ was added to the training set. Let $G_t$ be the generalization error when the number of training points equals $t$, and $G_{t+1}$ the generalization error after $(x^*, y^*)$ is added; then $\Delta G = G_t - G_{t+1}$ is the improvement in generalization error, and

$$\min_{x^*} G_{t+1} = \max_{x^*} \Delta G, \qquad \Delta G = J + K,$$
$$J = \|\hat{y}_t - \hat{y}_{t+1}\|^2, \qquad K = 2\,\langle \hat{y}_{t+1} - \hat{y}_t,\; y^* - \hat{y}_{t+1} \rangle.$$

$\Delta G$ can only be estimated, since the true output values $y^*$ are not accessible; in particular, $K$ requires estimating $y^*$ (all values).
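The group-expertise score of Eq. (4) and the fixed-size selection objective of Eqs. (1)-(2) can be sketched as follows. The expert names and toy data are hypothetical, and the brute-force solver over the true overlap score merely illustrates the objective; the paper's approach learns an estimator and does not assume the author set is known at selection time.

```python
# Sketch of the overlap score of Eq. (4) and a naive exhaustive solver for
# the fixed-size selection of Eqs. (1)-(2). Toy data; not the paper's dataset.
from itertools import combinations

def group_expertise(M, authors_p):
    """R(M, p) = |M intersect M*_p| / |M*_p| -- overlap with the author set."""
    return len(set(M) & set(authors_p)) / len(set(authors_p))

def best_group(candidates, authors_p, s):
    """Choose the size-s group maximizing R (brute force; fine for small pools)."""
    return max(combinations(candidates, s),
               key=lambda M: group_expertise(M, authors_p))

experts = ["ann", "bob", "carol", "dave"]   # hypothetical candidate pool
authors = ["bob", "carol"]                  # M*_p: authors of paper p
M = best_group(experts, authors, s=2)
print(sorted(M), group_expertise(M, authors))
```

Note that by Eq. (3) a learned group score need not be a sum of individual scores; the overlap label above is exactly what gives the learner non-additive training targets.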
Connecting People & Contents: Expertise Finding & Group Formation
http://cmap.ccdmd.qc.ca/rid=1225215801935_1377957481_826/
Operationalization of complex conceptual models
Motivation Internalization
Connecting Models
Table 10.1. Motivation Terms: Author-by-Factor Matrix. Authors: Wlodkowski; Paulsen [a]; Donald; Keller; MacKinnon; Panitz; Feldman [b]; Nuhfer; Farmer; Theall [c]; Pintrich; Forsyth [d]; Chickering [e]. Factors, grouped:
‣ Inclusion, Community, Climate, Ownership
‣ Attitude, Affect, Interest, Awareness, Attention, Enthusiasm
‣ Meaning, Relevance, Value
‣ Competence, Empowerment, Confidence, Expectancy
‣ Leadership, High expectations, Structure, Feedback, Support
‣ Satisfaction, Rewards
[a] Paulsen and Feldman (Chapter Two). [b] Feldman and Paulsen (Chapter Seven). [c] Theall, Birdsall, and Franklin, 1997. [d] Forsyth and McMillan, 1991. [e] Chickering and Gamson, 1987.
(Theall & Franklin, 1999)
Model Diversity

Complex Modeling

[Pipeline diagram:
‣ Dataset Construction: text (tweets) paired with a motivation label.
‣ Feature Extraction: extractors produce conceptual features (relatedness, sentiment, ...) and textual features (part of speech, dependency, ...).
‣ Sub-Modeling: conceptual sub-models (Expectancy-Value Theory, Cognitive Dissonance Theory, ...) and computational sub-models (M1, M2, ...).
‣ Feature Selection over the sub-models.
‣ Aggregation (A1-A4) into a classifier.
‣ Ensemble Construction.]

Typical Approach
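The aggregation stage above can be sketched as a weighted vote over sub-model outputs. The labels, votes, and weights below are stand-ins for illustration, not the actual sub-models or learned aggregation:

```python
# Minimal sketch of the aggregation step: several sub-models (conceptual,
# e.g. an expectancy-value-based one, and computational) each output a
# motivation label for a tweet; an aggregator combines them by weighted vote.
from collections import Counter

def aggregate(predictions, weights=None):
    """Weighted majority vote over sub-model labels."""
    weights = weights or [1.0] * len(predictions)
    tally = Counter()
    for label, w in zip(predictions, weights):
        tally[label] += w
    return tally.most_common(1)[0][0]

# Three sub-model votes on one tweet's motivation label:
votes = ["motivated", "not_motivated", "motivated"]
print(aggregate(votes))                            # simple majority
print(aggregate(votes, weights=[0.2, 1.0, 0.3]))   # weights can flip the vote
```

In an ensemble, the weights would themselves be learned (e.g. from validation accuracy of each sub-model) rather than fixed by hand as here.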
Networks are crucial for learning
• use AI methods to construct / optimize networks
– connecting contents
– connecting people
– connecting people & contents
– connecting models
---------------------------------------------
---------------------------------------------
AI
---------------------------------------------
Summary
‣ Concept Extraction: documents → concepts
‣ Semantic Mapping: concepts placed in their context (documents)
‣ Knowledge Level Estimation: over the mapped concepts
‣ Group Formation: matching people to tasks
‣ Influence Estimation: from the interaction log