Influence in Social Networks: A Unified Model? Analytical and Computational Challenges Charalampos Chelmis [email protected] Joint work with Ajitesh Srivastava, Viktor K. Prasanna


Post on 14-Jul-2015


Influence in Social Networks: A Unified Model?

Analytical and Computational Challenges

Charalampos Chelmis [email protected]

Joint work with Ajitesh Srivastava, Viktor K. Prasanna

Influence in Social Networks

• Rule 1: we shape our network
  – We affect our friends

• Rule 2: our network shapes us
  – Our friends affect us
  – Our friends’ friends affect us

• External Influence
  – News
  – Media
  – …

• Emerging Behavior
  – Influence spread

Why?

Information Dissemination

• Emotions are contagious!

• Obesity is contagious!

Going Viral

• Technology adoption

• Epidemics
  – Fractional Immunization

• Other examples …

How to Study the Behavior of Infection Spread?

• Input: graph and a model of influence

• Problem: find the probability of infection of a node u at time t

• Challenges
  – Abundance of models to choose from
    • Properties estimated from observation data
  – Peer pressure vs. buzz vs. external factors
  – Agent-based computational models

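[Figure: example directed network on nodes a through l, with edges labeled by influence probabilities p_ab, p_ac, p_ad, p_ae, p_af, p_ag, p_di, p_dj, p_gk, p_gl, p_gb]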

Agent-based Modeling

• ABM: stochastic computer simulations of simulated agents, in simulated space, over simulated time

• Useful to study emerging population dynamics
  – Macro-level behavioral patterns emerge

• Need explicitly programmed, micro-level rules
  – Micro-level behaviors, interactions, and movements of agents

• Challenges
  – Require an underlying social network model
    • Also a model for social network growth
    • Community argues that “none of the standard network models fit well with sociological theory”
  – Feature space is unbounded
    • Individual behavior is a complex function of attributes and characteristics
    • Nonlinear and/or complex interactions

Require extensive simulation runs!

ABM Computational Challenges

• Difficult to scale
  – Large-scale networks
  – Evolutionary nature of the workload
  – Irregular graph processing
    • Irregular communication
    • Load imbalance

• Back to influence/diffusion challenges
  – Most approaches are probabilistic
  – Require averaging over tens of thousands of simulations

• Goal: Analytical solution for progressive diffusion with minimal computational complexity

Storage and computation patterns adversely affect performance!

Exact solution is #P-hard!

Problem Setting

• Input: Directed graph $G=(V,E)$, a seed set that is a subset of $V$, and a diffusion process

• Output: Probability of infection for every node $u$ at time $t$, $B_{u,t}$

• $E_{u,t} = 1$ if $u$ is infected by time $t$

• Proceeds in discrete time steps

• Two influence processes
  – Pairwise
  – Collective

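[Figure: example nodes A, B, C, D illustrating the incoming neighborhood $N_i(B)$ and the outgoing neighborhood $N_o(B)$ of node B]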

• Individual influence is the influence on a node u by an infected node v, irrespective of the state of infection of any other node in the network.

• Collective influence is the influence on a node u by the infected state of a set of nodes. (Popularity + External factors)
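The simulation baseline that the analytical approach is meant to replace can be sketched as a Monte Carlo estimate of the infection probability of each node at each time step. This is a minimal sketch, not the authors' code. It assumes a progressive process in which every infected node retries each healthy out-neighbor at every step, and it assumes time-independent probabilities. The names `simulate_once`, `estimate_infection_probabilities`, `out_neighbors`, `p`, and `r` are hypothetical.

```python
import random


def simulate_once(out_neighbors, p, r, seeds, horizon, rng):
    """Run one progressive diffusion and return the infected set after each step."""
    infected = set(seeds)
    history = [set(infected)]
    for _ in range(horizon):
        newly_infected = set()
        # Iterate in dict order so a fixed RNG seed gives reproducible runs.
        for u in out_neighbors:
            if u in infected:
                # Individual (pairwise) influence: u tries each healthy out-neighbor.
                for w in out_neighbors[u]:
                    if w not in infected and rng.random() < p[(u, w)]:
                        newly_infected.add(w)
            elif rng.random() < r.get(u, 0.0):
                # Collective influence: u is infected regardless of its neighbors.
                newly_infected.add(u)
        infected |= newly_infected  # progressive: nodes never recover
        history.append(set(infected))
    return history


def estimate_infection_probabilities(out_neighbors, p, r, seeds, horizon,
                                     runs=10000, seed=0):
    """Average many runs to estimate P(node u is infected by time t)."""
    rng = random.Random(seed)
    counts = [{u: 0 for u in out_neighbors} for _ in range(horizon + 1)]
    for _ in range(runs):
        history = simulate_once(out_neighbors, p, r, seeds, horizon, rng)
        for t, infected in enumerate(history):
            for u in infected:
                counts[t][u] += 1
    return [{u: c / runs for u, c in row.items()} for row in counts]
```

The cost grows with the number of runs times the number of edges times the horizon, which is why averaging many runs becomes expensive on large graphs.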

The Unified Model of Influence


• Individual influence: Each infected node attempts to infect its neighbors. Each attempt by an infected node $v$ to infect a node $u \in N_o(v)$ succeeds with probability $p_{(v,u)}(t)$, which may change over time.

• Collective influence: Each susceptible node $u$ can be infected with probability $r_u(t)$, independent of individual influence. This may include external sources of exposure, or the status of the incoming neighborhood of $u$.

Exact Solution When G is Tree (Special Case)

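• Only one incoming neighbor (parent), $v$

$$B_{u,t} = 1 - \bigl(1 - r_u(t)\bigr)\Bigl[\bigl(1 - p_{(v,u)}(t)\bigr)\bigl(1 - B_{u,t-1}\bigr) + p_{(v,u)}(t)\,\bigl(1 - B_{v,t-1}\bigr)\prod_{\tau \le t-1}\bigl(1 - r_u(\tau)\bigr)\Bigr]$$

  – Individual influence: $p_{(v,u)}(t)$
  – Collective influence: $r_u(t)$
  – Infection probability of parent: $B_{v,t-1}$

• Solution is exact!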

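Exact Solution When G is Tree (Special Case): Proof Sketch

• The probability of u not being infected at time t: $P(E_{u,t} = 0) = 1 - B_{u,t}$

• Only two possibilities for this to happen:
  – Collective influence failed with probability $1 - r_u(t)$, and
    • Parent v was **not** infected at time $t - 1$
    • Or
    • Parent v **was** infected at time $t - 1$
      – But failed to infect u with probability $1 - p_{(v,u)}(t)$

• If the parent was not infected at time $t - 1$, u is still uninfected only if collective influence failed at each time step ≤ t-1. The two events are independent, so their joint probability is $(1 - B_{v,t-1}) \prod_{\tau \le t-1} (1 - r_u(\tau))$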
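The tree case can be turned into a short iterative computation. The sketch below assumes the recurrence reads $B_{u,t} = 1 - (1 - r_u)\,[(1 - p_{(v,u)})(1 - B_{u,t-1}) + p_{(v,u)}(1 - B_{v,t-1})\prod_{\tau \le t-1}(1 - r_u)]$ for a node $u$ with parent $v$. It further assumes time-independent probabilities, seeds that stay infected, and a root that is subject to collective influence only. The function name and the dictionary layout are hypothetical.

```python
def tree_infection_probabilities(parent, p, r, seeds, horizon):
    """Return B, where B[t][u] is the probability that u is infected by time t.

    parent: dict mapping each node to its parent (None for the root)
    p:      dict mapping each non-root node to the probability that its
            infected parent infects it in one time step
    r:      dict mapping each node to its per-step collective-influence probability
    """
    nodes = list(parent)
    B = [{u: 1.0 if u in seeds else 0.0 for u in nodes}]
    # survive[u]: probability that collective influence on u failed at every
    # step so far (an empty product, 1.0, before the first step).
    survive = {u: 1.0 for u in nodes}
    for _ in range(horizon):
        prev = B[-1]
        cur = {}
        for u in nodes:
            if u in seeds:
                cur[u] = 1.0  # progressive process: seeds stay infected
                continue
            v = parent[u]
            p_vu = p[u] if v is not None else 0.0
            b_parent = prev[v] if v is not None else 0.0
            healthy = (1.0 - r[u]) * (
                (1.0 - p_vu) * (1.0 - prev[u])
                + p_vu * (1.0 - b_parent) * survive[u]
            )
            cur[u] = 1.0 - healthy
        for u in nodes:
            survive[u] *= 1.0 - r[u]
        B.append(cur)
    return B
```

On the chain a → b → c with a as the only seed, every edge probability 0.5, and no collective influence, this gives B[2]["c"] = 0.25: b must be infected at step 1 and then infect c at step 2.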

General Solution

• Approximate analytical solution

[Equation: approximate recurrence for $B_{u,t}$ over the incoming neighbors $v \in N_i(u)$; not recoverable from this transcript]

• Complexity: …
  – Compared to stochastic simulation, where the number of simulations is typically 10,000

Reduction to Other Models

• We show reductions of the Unified Model to the following models
  – Independent Cascade Model (Kempe, KDD 2003)
    • Additive influence
    • n independent chances to become infected
    • $p = 1 - (1 - p_{\mathrm{ICM}})^{n}$
  – Generalized Threshold Model (Kempe, KDD 2003)
    • LT, LFM
  – Complex Contagion Model (Chelmis, ASONAM 2013)
    • ICM + Exponential Growth
      – Infected nodes infect each of their outgoing neighbors with a given probability
      – Healthy nodes randomly get infected with a given probability

Experimental evaluation

• Subset of Digg follower graph (Digg1k)
• Erdős–Rényi random networks

Agreement of Simulation and Theory on Digg1k

[Figure: three panels comparing simulation and theory on Digg1k, one each for the Independent Cascade Model, the Complex Contagion Model, and the Generalized Threshold Model]

Experimental Setup

• Varying network size: [20, 40, 80, … , 10240]
  – Density set to 5/|V| (sparse graph; number of edges 5|V|)
  – Varying parameters
  – Start with 20% infected nodes
  – Run 5 times with Erdős–Rényi random graphs
  – Each run has 1,000 simulations
  – Each simulation goes till t = 50

• Varying density
  – |V| = 1000
  – Number of edges: [1000, 2000, 4000, … , 512000]

• Influence Models
  – ICM, GLT, CCM
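The setup above could be generated along the following lines. This is a sketch under assumptions: the graphs are taken to be directed, the "density" is read as the independent probability of each ordered pair being an edge, and the helper names `erdos_renyi_directed` and `pick_seeds` are hypothetical. The quadratic generator is only meant for small illustrative sizes.

```python
import random

# Network sizes 20, 40, ..., 10240 and edge counts 1000, 2000, ..., 512000,
# both doubling at each step as listed on the slide.
NETWORK_SIZES = [20 * 2 ** k for k in range(10)]
EDGE_COUNTS = [1000 * 2 ** k for k in range(10)]
HORIZON = 50          # each simulation runs until t = 50
SEED_FRACTION = 0.2   # 20% of the nodes start infected


def erdos_renyi_directed(n, edge_prob, rng):
    """Directed G(n, p): each ordered pair (u, v), u != v, is an edge with prob. p."""
    return {
        u: [v for v in range(n) if v != u and rng.random() < edge_prob]
        for u in range(n)
    }


def pick_seeds(n, fraction, rng):
    """Choose the initially infected nodes uniformly at random."""
    return set(rng.sample(range(n), int(fraction * n)))


rng = random.Random(0)
n = 200
graph = erdos_renyi_directed(n, 5 / n, rng)   # density 5/|V|, about 5|V| edges
seeds = pick_seeds(n, SEED_FRACTION, rng)
```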

Performance Metrics

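• RMS error in probability prediction (microscopic error):

$$\sqrt{\frac{1}{|V|}\sum_{u \in V}\bigl(B_{u,t} - B^{*}_{u,t}\bigr)^{2}}$$

• Number of infections at time $t$: $N_t = \sum_{u} B_{u,t}$

• Fractional error in expected infection prediction (macroscopic error):

$$\frac{\lvert N_t - N^{*}_t \rvert}{N^{*}_t}$$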
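Both metrics are simple to compute once the predicted and reference infection probabilities at a given time are available. The sketch below assumes the RMS error is averaged over all nodes and that the fractional error is taken relative to the reference value. The slide does not preserve either normalization, and the function names are hypothetical.

```python
import math


def microscopic_error(predicted, reference):
    """Root-mean-square difference between per-node infection probabilities."""
    nodes = list(reference)
    squared = sum((predicted[u] - reference[u]) ** 2 for u in nodes)
    return math.sqrt(squared / len(nodes))


def expected_infections(probabilities):
    """Expected number of infected nodes: the sum of infection probabilities."""
    return sum(probabilities.values())


def macroscopic_error(predicted, reference):
    """Fractional error in the expected number of infections."""
    n_ref = expected_infections(reference)
    return abs(expected_infections(predicted) - n_ref) / n_ref


# Hypothetical values at one time step: analytical prediction vs. simulation.
predicted = {"a": 1.0, "b": 0.5}
reference = {"a": 1.0, "b": 0.25}
micro = microscopic_error(predicted, reference)
macro = macroscopic_error(predicted, reference)
```

The microscopic error penalizes per-node disagreements even when they cancel in the total, while the macroscopic error looks only at the aggregate number of infections.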

Error Over Time

• ICM with different parameters
  – Error vs time, averaged over graph sizes

[Figure: two panels, microscopic error vs time and macroscopic error vs time]

Error as a Function of Network Size

• ICM with different parameters
  – Error vs size of the graph, averaged over time

[Figure: two panels, microscopic error vs graph size and macroscopic error vs graph size]

Error as a Function of Network Density

• ICM with different parameters
  – Microscopic error vs density (averaged over time and different graphs)

Error Over Time (Varying Density)

• ICM with p = 0.5 on a network of 1,000 nodes
  – Microscopic error vs time (for a variety of density values)

Computational Advantages of Unified Model

• Vertex-centric approach
  – Great modeling flexibility
    • Personalized influence functions
    • Time-dependent functions allowed
  – Depends only on local structure
    • $B_{u,t-1}$
    • $B_{v,t-1}$ for $v \in N_i(u)$
    • $r_u(t)$

Possible to achieve great parallelism!
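Because each node's new value depends only on previous-step values of itself and its incoming neighbors, one time step can be written as a synchronous sweep in which vertices are updated independently. The sketch below illustrates that structure only. `synchronous_step` and `standin_rule` are hypothetical names, and the update rule is a simple stand-in chosen for illustration, not the formula from the slides. Python threads are used only to show that the per-vertex updates are independent.

```python
from concurrent.futures import ThreadPoolExecutor


def synchronous_step(in_neighbors, prev, local_update, workers=4):
    """Compute all new values from the previous step's values only."""
    def update(u):
        incoming = {v: prev[v] for v in in_neighbors[u]}
        return u, local_update(u, prev[u], incoming)

    # No vertex reads a value written in this step, so order does not matter.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(update, in_neighbors))


def standin_rule(p, r):
    """Illustrative local rule (an assumption, not the slides' recurrence)."""
    def rule(u, b_u, b_in):
        miss = 1.0 - r[u]
        for v, b_v in b_in.items():
            miss *= 1.0 - p[(v, u)] * b_v
        return 1.0 - (1.0 - b_u) * miss
    return rule


in_neighbors = {"a": [], "b": ["a"]}
rule = standin_rule({("a", "b"): 0.5}, {"a": 0.0, "b": 0.0})
step1 = synchronous_step(in_neighbors, {"a": 1.0, "b": 0.0}, rule)
```

Keeping the previous and current values in separate dictionaries is what makes the sweep safe to parallelize: no update observes a partially updated state.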

Parallel Implementation Evaluation

• Maximum parallelism when the number of processing elements (#PE) equals number of models

• Evaluation
  – OpenMP implementation
    • Shared memory environment
    • 8-core 2.66GHz machine

• Datasets
  – HEPT
    • 15K nodes - 59K edges
  – Twitter
    • 3.8M nodes - 5.4M edges

Conclusions

• A general analytical framework for calculating influence spread
  – No requirement of expensive stochastic simulations

• Quality of analytical solution
  – Exact for trees
  – Empirically shown that error is small for generic graphs

• Generality of analytical solution
  – Applicable to a variety of models
    • Enables “what-if” scenarios
  – Applicable to a variety of networks
    • Numerous applications

• Significant speedups
  – Iterative formula vs. stochastic simulations
  – Parallel implementation


• My email: [email protected]

• My Webpage: http://www-scf.usc.edu/~chelmis/

• Research Group: http://pgroup.usc.edu/

Questions?

Thank you