the unified model of influence in social networks
Influence in Social Networks: A Unified Model?
Analytical and Computational Challenges
Charalampos Chelmis, [email protected]
Joint work with Ajitesh Srivastava, Viktor K. Prasanna
Influence in Social Networks
• Rule 1: we shape our network
  – We affect our friends
• Rule 2: our network shapes us
  – Our friends affect us
  – Our friends’ friends affect us
• External influence
  – News
  – Media
  – …
• Emerging behavior
  – Influence spread
Why?
Information Dissemination
• Emotions are contagious!
• Obesity is contagious!
Going Viral
• Technology adoption
• Epidemics
  – Fractional Immunization
• Other examples …
How to Study the Behavior of Infection Spread?
• Input: graph and a model of influence
• Problem: find the probability of infection of a node u at time t
• Challenges
  – Abundance of models to choose from
    • Properties estimated from observation data
  – Peer pressure vs. buzz vs. external factors
  – Agent-based computational models
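[Figure: example directed graph on nodes a through l; edges are labeled with influence probabilities p_ab, p_ag, p_ac, p_ad, p_ae, p_af, p_di, p_dj, p_gk, p_gl, p_gb]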
Agent-based Modeling
• ABM: stochastic computer simulations of simulated agents, in simulated space, over simulated time
• Useful to study emerging population dynamics
  – Macro-level behavioral patterns emerge
• Need explicitly programmed, micro-level rules
  – Micro-level behaviors, interactions, and movements of agents
• Challenges
  – Require an underlying social network model
    • Also a model for social network growth
    • Community argues that “none of the standard network models fit well with sociological theory”
  – Feature space is unbounded
    • Individual behavior is a complex function of attributes and characteristics
    • Nonlinear and/or complex interactions
Require extensive simulation runs!
ABM Computational Challenges
• Difficult to scale
  – Large-scale networks
  – Evolutionary nature of the workload
  – Irregular graph processing
    • Irregular communication
    • Load imbalance
• Back to influence/diffusion challenges
  – Most approaches are probabilistic
  – Require averaging over 10,000s of simulations
• Goal: analytical solution for progressive diffusion with minimal computational complexity
Storage and computation patterns adversely affect performance!
Exact solution is #P-hard!
Problem Setting
• Input: directed graph $G = (V, E)$, seed set $S \subset V$, diffusion process
• Output: probability of infection $B_{u,t}$ for every node $u$ at time $t$
• $E_{u,t} = 1$ if $u$ is infected by time $t$
• Proceeds in discrete time steps
• Two influence processes
  – Pairwise
  – Collective
[Figure: nodes A, B, C, D, illustrating the incoming neighborhood $N_i(B)$ and the outgoing neighborhood $N_o(B)$ of node B]
• Individual influence is the influence on a node u by an infected node v, irrespective of the state of infection of any other node in the network.
• Collective influence is the influence on a node u by the infected state of a set of nodes. (Popularity + external factors)
The Unified Model of Influence
• Individual influence: each infected node attempts to infect its neighbors. Each attempt by an infected node $v$ to infect a node $u \in N_o(v)$ succeeds with probability $p_{(v,u)}(t)$, which may change over time.
• Collective influence: each susceptible node $u$ can be infected with probability $r_u(t)$, independent of individual influence. This may include external sources of exposure, or the status of the incoming neighborhood of $u$.
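The two influence processes can be sketched as a stochastic simulation, which is the baseline that an analytical solution is meant to replace. This is a minimal sketch under stated assumptions, not the authors' simulator: the function names, the callables `p(v, u, t)` and `r(u, t)`, and the adjacency-dict input are hypothetical, and an infected node is assumed to retry each susceptible out-neighbor at every time step.

```python
import random


def simulate_once(out_neighbors, seeds, p, r, horizon, rng):
    """Run one diffusion; return the set of infected nodes at t = 0..horizon.

    out_neighbors: dict mapping every node to a list of its out-neighbors.
    p(v, u, t):    individual influence probability (assumed callable).
    r(u, t):       collective influence probability (assumed callable).
    """
    infected = set(seeds)
    history = [set(infected)]
    for t in range(1, horizon + 1):
        newly = set()
        # Individual influence: each infected node tries each healthy out-neighbor.
        for v in infected:
            for u in out_neighbors[v]:
                if u not in infected and rng.random() < p(v, u, t):
                    newly.add(u)
        # Collective influence: independent chance for every still-healthy node.
        for u in out_neighbors:
            if u not in infected and u not in newly and rng.random() < r(u, t):
                newly.add(u)
        infected |= newly  # progressive diffusion: nodes never recover
        history.append(set(infected))
    return history


def estimate_infection_probabilities(out_neighbors, seeds, p, r, horizon,
                                     runs=1000, seed=0):
    """Average many runs to estimate the infection probability per node and time."""
    rng = random.Random(seed)
    counts = {u: [0] * (horizon + 1) for u in out_neighbors}
    for _ in range(runs):
        history = simulate_once(out_neighbors, seeds, p, r, horizon, rng)
        for t, infected in enumerate(history):
            for u in infected:
                counts[u][t] += 1
    return {u: [c / runs for c in cs] for u, cs in counts.items()}
```

The cost grows linearly with `runs`, which is why averaging thousands of simulations becomes expensive on large graphs.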
Exact Solution When G is Tree (Special Case)
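• Only one incoming neighbor (parent) $v$
$$B_{u,t} = 1 - \bigl(1 - r_u(t)\bigr)\Bigl[\bigl(1 - p_{(v,u)}(t)\bigr)\bigl(1 - B_{u,t-1}\bigr) + p_{(v,u)}(t)\,\bigl(1 - B_{v,t-1}\bigr)\prod_{\tau=1}^{t-1}\bigl(1 - r_u(\tau)\bigr)\Bigr]$$
• Solution is exact!
[Equation annotations: individual influence, $p_{(v,u)}(t)$; collective influence, $r_u(t)$; infection probability of parent, $B_{v,t-1}$]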
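The tree case lends itself to a direct iterative computation. The sketch below assumes one reading of the recursion, in which u stays healthy at time t only if collective influence fails at t and either the parent v already failed to infect u, or v was still healthy at t − 1 and collective influence had failed at every earlier step: 1 − B[u][t] = (1 − r(u,t)) · [(1 − p(v,u,t)) · (1 − B[u][t−1]) + p(v,u,t) · (1 − B[v][t−1]) · Π(1 − r(u,τ))]. The function name, the `parent` dict, and the callables `p` and `r` are hypothetical.

```python
def tree_infection_probabilities(parent, seeds, p, r, horizon):
    """Iterate the assumed tree recursion for t = 1..horizon.

    parent: dict mapping each node to its single parent (None for the root).
    Returns a dict mapping each node to [B_0, B_1, ..., B_horizon].
    """
    nodes = list(parent)
    B = {u: [1.0 if u in seeds else 0.0] for u in nodes}
    # Running product of (1 - r_u(tau)) over tau <= t - 1, per node.
    no_collective = {u: 1.0 for u in nodes}
    for t in range(1, horizon + 1):
        for u in nodes:
            if u in seeds:
                B[u].append(1.0)
                continue
            v = parent[u]
            if v is None:
                # Root: only collective influence can infect it.
                inside = 1.0 - B[u][t - 1]
            else:
                puv = p(v, u, t)
                inside = ((1.0 - puv) * (1.0 - B[u][t - 1])
                          + puv * (1.0 - B[v][t - 1]) * no_collective[u])
            B[u].append(1.0 - (1.0 - r(u, t)) * inside)
        for u in nodes:
            no_collective[u] *= 1.0 - r(u, t)
    return B
```

Each step touches every node once, so the cost is proportional to the number of nodes times the horizon, with no averaging over runs.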
Exact Solution When G is Tree (Special Case)Proof Sketch
• The probability of $u$ not being infected at time $t$: $P(E_{u,t} = 0) = 1 - B_{u,t}$
• Only two possibilities for this to happen:
  – Collective influence failed with probability $1 - r_u(t)$, and
    • Parent $v$ was **not** infected at time $t - 1$
    • Or
    • Parent $v$ **was** infected at time $t - 1$
      – But failed to infect $u$ with probability $1 - p_{(v,u)}(t)$
[Equation annotations: collective influence failed at each time step ≤ t − 1; the factors $1 - B_{v,t-1}$ and $\prod_{\tau \le t-1} \bigl(1 - r_u(\tau)\bigr)$ are independent]
General Solution
• Approximate analytical solution,= 1− 1 − 1 − , 1 − , ,∈+ , ( )(1 − , )∈ 1 − − 1 + ,
• Complexity:– Compared to ( | | ), where number of simulations is typically 10,000
Reduction to Other Models
• We show reductions of the Unified Model to the following models
  – Independent Cascade Model (Kempe, KDD 2003)
    • Additive influence
    • n independent chances to become infected
    • $1 - (1 - p_{ICM})^n$
  – Generalized Threshold Model (Kempe, KDD 2003)
    • LT, LFM
  – Complex Contagion Model (Chelmis, ASONAM 2013)
    • ICM + Exponential Growth
      – Infected nodes infect each of their outgoing neighbors with probability $p$
      – Healthy nodes randomly get infected with probability $r(t)$
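The "n independent chances" bullet for the Independent Cascade Model can be checked with a few lines of arithmetic. This is an illustrative sketch: the function name is hypothetical, and it assumes every one of the n attempts succeeds with the same probability `p_icm`.

```python
def icm_infection_probability(p_icm, n):
    """Probability that at least one of n independent attempts succeeds."""
    if not 0.0 <= p_icm <= 1.0:
        raise ValueError("p_icm must be a probability")
    if n < 0:
        raise ValueError("n must be non-negative")
    # All n attempts fail with probability (1 - p_icm)^n.
    return 1.0 - (1.0 - p_icm) ** n


# Two infected neighbors, each succeeding with probability 0.5.
print(icm_infection_probability(0.5, 2))
```

With no infected neighbors (n = 0) the result is 0, and it approaches 1 as n grows.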
Experimental evaluation
• Subset of Digg follower graph (Digg1k)
• Erdős–Rényi random networks
Agreement of Simulation and Theory on Digg1k
[Figure panels: Independent Cascade Model; Complex Contagion Model; Generalized Threshold Model]
Experimental Setup
• Varying network size: [20, 40, 80, … , 10240]
  – Density set to 5/|V| (sparse graph; number of edges 5|V|)
  – Varying parameters
  – Start with 20% infected nodes
  – Run 5 times with Erdős–Rényi random graphs
  – Each run has 1,000 simulations
  – Each simulation runs until t = 50
• Varying density
  – |V| = 1000
  – Number of edges: [1000, 2000, 4000, … , 512000]
• Influence models
  – ICM, GLT, CCM
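The setup above can be reproduced in outline with the standard library. The sketch below is an assumption about how such inputs might be generated, not the authors' code: the function names are hypothetical, and the graph is a directed Erdős–Rényi graph in which each ordered pair becomes an edge with probability equal to the density.

```python
import random


def erdos_renyi_directed(n, density, rng):
    """Directed G(n, p): each ordered pair (u, v), u != v, is an edge w.p. density."""
    out_neighbors = {u: [] for u in range(n)}
    for u in range(n):
        for v in range(n):
            if u != v and rng.random() < density:
                out_neighbors[u].append(v)
    return out_neighbors


def pick_seeds(n, fraction, rng):
    """Choose round(fraction * n) distinct initially infected nodes."""
    return set(rng.sample(range(n), round(fraction * n)))


rng = random.Random(42)
n = 80
graph = erdos_renyi_directed(n, 5 / n, rng)  # about 5|V| edges in expectation
seeds = pick_seeds(n, 0.20, rng)             # start with 20% infected nodes
print(sum(len(vs) for vs in graph.values()), len(seeds))
```

The pairwise loop is quadratic in the number of nodes, which is fine for the smaller sizes in the sweep; for the largest graphs a sparse sampler would be preferable.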
Performance Metrics
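• RMS error in probability prediction (microscopic error): $\sqrt{\frac{1}{|V|} \sum_{u \in V} \bigl(B_{u,t} - B^{*}_{u,t}\bigr)^2}$
• Number of infections at time $t$: $N_t = \sum_{u \in V} B_{u,t}$
• Fractional error in expected infection prediction (macroscopic error): $\lvert N_t - N^{*}_t \rvert / N^{*}_t$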
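Both metrics are straightforward to compute from two vectors of per-node infection probabilities at a fixed time. The sketch below is one plausible reading, with hypothetical function names: it assumes the RMS error averages squared differences over nodes, and that the fractional error normalizes by the reference (starred) count.

```python
import math


def rms_error(predicted, reference):
    """Microscopic error: root-mean-square difference of per-node probabilities."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((b - b_ref) ** 2 for b, b_ref in zip(predicted, reference))
    return math.sqrt(squared / len(predicted))


def fractional_error(predicted, reference):
    """Macroscopic error: relative difference in the expected number of infections."""
    n_pred = sum(predicted)   # expected infections under the prediction
    n_ref = sum(reference)    # expected infections under the reference
    if n_ref == 0:
        raise ValueError("reference infection count must be positive")
    return abs(n_pred - n_ref) / n_ref


theory = [1.0, 0.6, 0.3, 0.1]
simulation = [1.0, 0.5, 0.4, 0.1]
print(rms_error(theory, simulation), fractional_error(theory, simulation))
```

In the example the per-node errors cancel in the sum, so the macroscopic error is zero while the microscopic error is not, which is why the two are reported separately.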
Error Over Time
• ICM with different parameters
  – Error vs. time, averaged over graph sizes
[Figure panels: microscopic error vs. time; macroscopic error vs. time]
Error as a Function of Network Size
• ICM with different parameters
  – Error vs. size of the graph, averaged over time
[Figure panels: microscopic error vs. network size; macroscopic error vs. network size]
Error as a Function of Network Density
• ICM with different parameters
  – Microscopic error vs. density (averaged over time and different graphs)
Error Over Time (Varying Density)
• ICM with p = 0.5 on a network of 1,000 nodes
  – Microscopic error vs. time (for a variety of density values)
Computational Advantages of Unified Model
• Vertex-centric approach
  – Great modeling flexibility
    • Personalized influence functions
    • Time-dependent functions allowed
  – Depends only on local structure
    • $B_{u,t-1}$
    • $B_{v,t-1}$ for $v \in N_i(u)$
    • $r_u(t)$
Possible to achieve great parallelism!
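Because each node's update reads only values from the previous time step, all nodes can be updated concurrently within a step. The sketch below illustrates that synchronous, vertex-centric pattern only. The local rule `mean_field_update` is a simplified stand-in chosen for illustration (it treats in-neighbors as independent) and is not the approximate formula from the talk; all names are hypothetical. Python threads are used to show the structure; they do not provide the CPU parallelism of a shared-memory implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def mean_field_update(u, t, B_prev, in_neighbors, p, r):
    """Stand-in local rule (an assumption): uses only B_prev of u and its in-neighbors."""
    stay_healthy = (1.0 - B_prev[u]) * (1.0 - r(u, t))
    for v in in_neighbors[u]:
        stay_healthy *= 1.0 - p(v, u, t) * B_prev[v]
    return 1.0 - stay_healthy


def iterate(in_neighbors, seeds, p, r, horizon,
            update=mean_field_update, workers=4):
    """Synchronous vertex-centric iteration; returns B at time `horizon`."""
    nodes = list(in_neighbors)
    B = {u: (1.0 if u in seeds else 0.0) for u in nodes}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for t in range(1, horizon + 1):
            B_prev = B  # read-only during this step, so updates do not interfere
            values = pool.map(
                lambda u: update(u, t, B_prev, in_neighbors, p, r), nodes)
            B = dict(zip(nodes, values))  # new buffer replaces the old one
    return B
```

Swapping in a different `update` function changes the influence model without touching the iteration loop, which is the modeling flexibility the vertex-centric view offers.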
Parallel Implementation Evaluation
• Maximum parallelism when #PE equals number of models
• Evaluation
  – OpenMP implementation
    • Shared memory environment
    • 8-core 2.66 GHz machine
• Datasets
  – HEPT
    • 15K nodes, 59K edges
  – Twitter
    • 3.8M nodes, 5.4M edges
Conclusions
• A general analytical framework for calculating influence spread
  – No requirement of expensive stochastic simulations
• Quality of analytical solution
  – Exact for trees
  – Empirically shown that error is small for generic graphs
• Generality of analytical solution
  – Applicable to a variety of models
    • Enables “what-if” scenarios
  – Applicable to a variety of networks
    • Numerous applications
• Significant speedups
  – Iterative formula vs. stochastic simulations
  – Parallel implementation
• My email: [email protected]
• My Webpage: http://www-scf.usc.edu/~chelmis/
• Research Group: http://pgroup.usc.edu/
Questions?
Thank you