DAVA: Distributing Vaccines over Networks under Prior Information Yao Zhang, B. Aditya Prakash Department of Computer Science Virginia Tech SDM, Philadelphia, April 24, 2014

Page 1: DAVA: Distributing Vaccines over Networks under Prior Information

DAVA: Distributing Vaccines over Networks under Prior Information

Yao Zhang, B. Aditya Prakash Department of Computer Science

Virginia Tech

SDM, Philadelphia, April 24, 2014

Page 2: DAVA: Distributing Vaccines over Networks under Prior Information

Motivation: Epidemiology

• Virus spreads over contact networks
• SIR model [Anderson+ 1991]
  • Susceptible-Infectious-Recovered
  • Weights pij: propagation probability from i to j
  • Recovery probability δ for each node
  • (models mumps-like infections)
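The SIR dynamics described above can be sketched as a small simulation. This is only an illustration under assumptions that the slides do not state: time is discrete, an infectious node attempts every edge once per step, and recovery is checked at the end of each step. The function name `simulate_sir` and the dictionary layout are hypothetical.

```python
import random

def simulate_sir(neighbors, p, delta, seeds, rng):
    """Run one discrete-time SIR epidemic and return the set of nodes ever infected.

    neighbors: dict node -> list of adjacent nodes
    p:         dict (i, j) -> propagation probability from i to j
    delta:     recovery probability per step, must be in (0, 1]
    """
    assert 0.0 < delta <= 1.0, "delta = 0 would never let the epidemic end"
    infectious = set(seeds)
    ever_infected = set(seeds)
    while infectious:
        newly_infected = set()
        for i in infectious:
            for j in neighbors[i]:
                # i tries to infect each still-susceptible neighbor j with probability pij.
                if j not in ever_infected and rng.random() < p[(i, j)]:
                    newly_infected.add(j)
        # Each infectious node recovers with probability delta (assumed: end of step).
        still_infectious = {i for i in infectious if rng.random() >= delta}
        ever_infected |= newly_infected
        infectious = still_infectious | newly_infected
    return ever_infected

# Toy path graph a - b - c, used only for illustration.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
p_half = {(u, v): 0.5 for u in path for v in path[u]}
print(sorted(simulate_sir(path, p_half, 0.4, ["a"], random.Random(0))))
```

Averaging the size of the returned set over many runs gives a Monte Carlo estimate of the expected number of infected nodes.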

Zhang and Prakash, SDM2014

Page 3: DAVA: Distributing Vaccines over Networks under Prior Information

Motivation: Social Media

• Meme/Rumor spreads over friendship networks
  • E.g.: Twitter following network
• Independent cascade model (IC) [Kempe+ KDD2003]
  • Each node has only one chance to infect its neighbors
  • Special case of SIR model
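A minimal sketch of the independent cascade model follows, assuming the same dictionary layout as a generic adjacency list with per-edge probabilities. The names `simulate_ic` and `expected_infected` are hypothetical and not taken from the authors' code. Each newly activated node gets exactly one attempt per neighbor, which is the "one chance" rule from the slide.

```python
import random

def simulate_ic(neighbors, p, seeds, rng):
    """Run one independent cascade and return the set of activated (infected) nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for i in frontier:
            for j in neighbors[i]:
                # i gets a single attempt on j; i never re-enters the frontier afterwards.
                if j not in active and rng.random() < p[(i, j)]:
                    active.add(j)
                    next_frontier.append(j)
        frontier = next_frontier
    return active

def expected_infected(neighbors, p, seeds, runs=1000, seed=0):
    """Monte Carlo estimate of the expected number of infected nodes."""
    rng = random.Random(seed)
    total = sum(len(simulate_ic(neighbors, p, seeds, rng)) for _ in range(runs))
    return total / runs

# Toy star graph: hub h with three leaves (illustrative data only).
star = {"h": ["x", "y", "z"], "x": ["h"], "y": ["h"], "z": ["h"]}
p_star = {(u, v): 0.5 for u in star for v in star[u]}
print(expected_infected(star, p_star, ["h"]))
```

In this formulation IC behaves like SIR with recovery probability 1: a node is infectious for exactly one step.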

Page 4: DAVA: Distributing Vaccines over Networks under Prior Information

Immunization

• Centers for Disease Control (CDC) cares about containing epidemic diseases
  • E.g.: ~400 million dollars used for vaccines for children in 2013
• Twitter tries to stop rumor spread
  • E.g.: rumors of victims after the Boston Marathon bombing in 2013

How to choose the best nodes to vaccinate (remove)?

Page 5: DAVA: Distributing Vaccines over Networks under Prior Information

Immunization

Pre-emptive immunization (choose nodes before the epidemic starts)
• Acquaintance strategy [Cohen+ 2003]
  • Pick a random person, immunize one of its neighbors at random
• Netshield [Tong+ 2010]
  • Minimize the epidemic threshold (point when the virus takes off)

Good for baseline strategies
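The acquaintance strategy is simple enough to sketch directly from its one-line description. The version below is an assumption-laden illustration: the function name is hypothetical, isolated nodes are skipped, and sampling repeats until k distinct nodes are found, a detail the slide does not specify.

```python
import random

def acquaintance_immunization(neighbors, k, rng):
    """Pick up to k distinct nodes by sampling a random neighbor of a random node."""
    nodes = [v for v in neighbors if neighbors[v]]  # isolated nodes have no neighbor to pick
    # Only nodes that appear as somebody's neighbor can ever be chosen.
    pickable = {u for v in nodes for u in neighbors[v]}
    chosen = set()
    while len(chosen) < min(k, len(pickable)):
        person = rng.choice(nodes)                 # pick a random person
        chosen.add(rng.choice(neighbors[person]))  # immunize one of its neighbors at random
    return chosen

# Toy star graph: the hub is a neighbor of every leaf, so it is picked often.
star = {"h": ["x", "y", "z"], "x": ["h"], "y": ["h"], "z": ["h"]}
print(acquaintance_immunization(star, 1, random.Random(0)))
```

The strategy needs no global knowledge of the graph, which is why it works as a pre-emptive baseline: high-degree nodes are neighbors of many nodes and so are sampled more often.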

Page 6: DAVA: Distributing Vaccines over Networks under Prior Information

In reality

Pre-emptive immunization (choose nodes before the epidemic starts)
• Acquaintance strategy [Cohen+ 2003]
• Netshield [Tong+ 2010]

Typically the epidemic has already started!
• More realistic intervention
• Which nodes to vaccinate now?
• We call it Data-Aware Immunization (this paper)

Page 7: DAVA: Distributing Vaccines over Networks under Prior Information

Outline

• Motivation
• Problem Definition
• Complexity
• Our Proposed Methods
• Experiments
• Conclusion

Page 8: DAVA: Distributing Vaccines over Networks under Prior Information

Data-Aware Vaccination Problem

Problem: Given a set of infected nodes and a contact graph, how to distribute k vaccines (node removal) to minimize the expected number of infected nodes at the end of the epidemic?

Example: 1 vaccine, pij = 1 for all edges
• Remove A, save {A, D}
• Remove B, save {B}
• Remove C, save {C}
Best solution: A

[Figure: example contact graph with nodes A, B, C, D, E, F]
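For the special case pij = 1 the problem can be checked by brute force, because every healthy node reachable from an infected node is infected with certainty. The sketch below uses a hypothetical six-node graph: the edge list is an assumption chosen only so that the outcomes match the one-vaccine example (the slide's actual figure did not survive extraction), with E and F assumed to be the infected nodes.

```python
from collections import deque

def infected_set(neighbors, infected, removed):
    """With pij = 1, the final infected set is everything reachable from the infected nodes."""
    seen = set(infected)
    queue = deque(infected)
    while queue:
        u = queue.popleft()
        for v in neighbors[u]:
            if v not in seen and v not in removed:
                seen.add(v)
                queue.append(v)
    return seen

def saved_by(neighbors, infected, removed):
    """Nodes that would have been infected without vaccination but are not with it."""
    return infected_set(neighbors, infected, set()) - infected_set(neighbors, infected, removed)

def best_single_vaccine(neighbors, infected):
    candidates = sorted(set(neighbors) - set(infected))
    return max(candidates, key=lambda v: len(saved_by(neighbors, infected, {v})))

# Hypothetical graph: E and F infected; D hangs off A; B and C each touch both E and F.
graph = {"A": ["E", "D"], "B": ["E", "F"], "C": ["E", "F"],
         "D": ["A"], "E": ["A", "B", "C"], "F": ["B", "C"]}
for node in "ABC":
    print(node, sorted(saved_by(graph, {"E", "F"}, {node})))
print("best:", best_single_vaccine(graph, {"E", "F"}))
```

Exhaustive search like this only works for tiny instances, since the number of candidate sets grows as |V| choose k.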

Page 9: DAVA: Distributing Vaccines over Networks under Prior Information

Outline

• Motivation
• Problem Definition
• Complexity
• Our Proposed Methods
• Experiments
• Conclusion

Page 10: DAVA: Distributing Vaccines over Networks under Prior Information

Complexity of DAV

• NP-hard
  • Reduction from the Maximum K-Intersection Problem (MaxKI: maximizing the intersection of k subsets)
  • MaxKI is NP-Complete [Vinterbo 2004]
• Approximation algorithm?
  • Not submodular
  • Actually, DAV is hard to approximate within an absolute error!

See paper for details

Page 11: DAVA: Distributing Vaccines over Networks under Prior Information

Outline

• Motivation
• Problem Definition
• Complexity
• Our Proposed Methods
  • assume IC model and undirected graph
• Experiments
• Conclusion

Page 12: DAVA: Distributing Vaccines over Networks under Prior Information

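1: Simplify - Merging infected nodes

• Idea: merge all the infected nodes into a single ‘super infected’ node I
• Logical-OR: pB = 1 - (1 - pX)(1 - pY)
• The merged graph is equivalent to the original graph

[Figure: original graph (edge probabilities pA, pX, pY, pC into nodes A, B, C) next to the merged graph (super node I with edge probabilities pA, pB, pC into nodes A, B, C)]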
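The merging step can be sketched as follows. The function name, the label "I" for the super node, and the dictionary-of-edges layout are all assumptions made for illustration. Edges from several infected nodes into the same healthy node are combined with the logical-OR rule, and edges that point into infected nodes are dropped because those nodes cannot be infected again.

```python
def merge_infected(p, infected, super_node="I"):
    """Collapse all infected nodes into one super node.

    p maps directed pairs (i, j) to propagation probabilities; for an
    undirected graph both (i, j) and (j, i) are expected to be present.
    Assumes no existing node is already named like super_node.
    """
    infected = set(infected)
    merged = {}
    miss = {}  # j -> probability that none of j's infected neighbors infects j
    for (i, j), prob in p.items():
        if j in infected:
            continue  # edges into infected nodes play no role after merging
        if i in infected:
            miss[j] = miss.get(j, 1.0) * (1.0 - prob)
        else:
            merged[(i, j)] = prob  # edges between healthy nodes are unchanged
    for j, q in miss.items():
        merged[(super_node, j)] = 1.0 - q  # logical OR: 1 - prod(1 - p_ij)
    return merged

# Two infected nodes X and Y both point at B with probability 0.5.
edges = {("X", "B"): 0.5, ("Y", "B"): 0.5, ("X", "A"): 0.3, ("Y", "C"): 0.2}
print(merge_infected(edges, {"X", "Y"}))
```

In the example, B ends up with probability 1 - 0.5 * 0.5 = 0.75 of being infected by the super node, while A and C keep their single-edge probabilities.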

Page 13: DAVA: Distributing Vaccines over Networks under Prior Information

2: DAVA-Tree Algorithm: Idea

• Select nodes with the largest “benefit”
• Saved nodes: the expected number of saved nodes after removing set S on graph G
• Benefit of adding an additional node j into S: the additional number of saved nodes when adding node j into S, i.e. the number of saved nodes after adding j into S minus the number saved by S alone

[Figure: tree rooted at the merged infected node, pij = 1 for all edges; candidate nodes are labeled Benefit: 4, Benefit: 2 and Benefit: 5]
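The symbols on this slide were lost in extraction, so the following is a reconstruction of the verbal definition in assumed notation, not necessarily the authors' own. Writing \sigma_G(S) for the expected number of saved nodes after removing set S on graph G, and \gamma_S(j) for the benefit of adding node j to S (both symbols are placeholders), the definition reads:

```latex
\gamma_S(j) \;=\; \sigma_G\bigl(S \cup \{j\}\bigr) \;-\; \sigma_G(S)
```

With pij = 1 on a tree rooted at the merged infected node, \gamma_\emptyset(j) for a neighbor j of the root is simply the number of nodes in the subtree under j, which is consistent with the integer benefits shown in the figure.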

Page 14: DAVA: Distributing Vaccines over Networks under Prior Information

DAVA-Tree Alg.: Optimal on Trees

• Fact 1: the chosen nodes in the optimal set must be neighbors of infected node I
• Fact 2: the benefit of each such node is independent of the rest of the set S (for any set S)

DAVA-tree algorithm: select the top k nodes from I’s neighbors with the max. benefit
• Linear time

[Figure: tree rooted at the merged infected node, pij = 1 for all edges; I’s neighbors are labeled Benefit: 4, Benefit: 2 and Benefit: 5]
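A sketch of DAVA-tree under the IC model: on a tree, the probability that a node gets infected is the product of the edge probabilities on its unique path from I, so the benefit of a neighbor j of I is the sum of those products over the subtree under j. The function name and the data layout (a `children` dictionary plus per-edge probabilities) are assumptions for illustration; the example tree is made up to reproduce the benefits 4, 2 and 5.

```python
import heapq

def dava_tree(children, p, root, k):
    """Return the k neighbors of root with the largest benefit, plus all benefits."""
    benefit = {}
    for j in children.get(root, []):
        total = 0.0
        stack = [(j, p[(root, j)])]  # (node, probability that the infection reaches it)
        while stack:
            v, reach = stack.pop()
            total += reach  # removing j saves v, which was infected with probability `reach`
            for c in children.get(v, []):
                stack.append((c, reach * p[(v, c)]))
        benefit[j] = total
    # Every node is visited once; picking the top k adds O(n log k).
    chosen = heapq.nlargest(k, benefit, key=benefit.get)
    return chosen, benefit

# Hypothetical tree: subtrees of sizes 4, 2 and 5 below the merged infected node I.
tree = {"I": ["a", "b", "c"], "a": ["a1", "a2", "a3"], "b": ["b1"],
        "c": ["c1", "c2"], "c1": ["c3", "c4"]}
p_one = {(u, v): 1.0 for u, kids in tree.items() for v in kids}
print(dava_tree(tree, p_one, "I", 2))
```

Because the subtrees under different neighbors of I are disjoint, each benefit can be computed independently, which is what makes the top-k selection optimal on trees.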

Page 15: DAVA: Distributing Vaccines over Networks under Prior Information

3: General Case – Arbitrary Graphs

• Idea
  • We have the optimal algorithm for a tree
  • Extract a spanning tree, then run DAVA-tree
• What kind of tree?
  • Minimum spanning tree

[Figure: example graph with pij = 1 for all edges; labels: "Optimal solution", "MST", "Optimal on MST by DAVA-tree"]

Page 16: DAVA: Distributing Vaccines over Networks under Prior Information

3: General Case – Arbitrary Graphs

• Idea
  • We have the optimal algorithm for a tree
  • Build a spanning tree first
• What kind of tree?
  • Minimum spanning tree
  • We propose to use the dominator tree (a concept from software engineering)
• u dominates v: every path from I to v contains u

[Figure: example graph with pij = 1 for all edges, in which node 4 dominates nodes 8, 9, 10, 11]

Page 17: DAVA: Distributing Vaccines over Networks under Prior Information

Dominator Tree

• u is the immediate dominator of v: u dominates v AND every other dominator of v dominates u
• Dominator tree: add an edge between every such u and v
• Linear time [Buchsbaum, Tarjan 1998]
• Fact 1: the optimal solution should be among the children of root I in the dominator tree for any arbitrary graph
• Fact 2: (for special case, k = 1, p = 1) running DAVA-tree on the dominator tree gives the optimal solution

[Figure: merged graph and its dominator tree, pij = 1 for all edges; labels: "Optimal solution", "Optimal from DAVA-tree"]
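The definitions above can be turned into a short, deliberately simple construction. This sketch uses the classic iterative set-intersection method, which is not the linear-time algorithm cited on the slide and is much slower on large graphs. The function name and the example edge list are assumptions; the example is built so that node 4 dominates nodes 8, 9, 10 and 11.

```python
def immediate_dominators(succ, root):
    """Return {v: immediate dominator of v} for every node reachable from root."""
    # Collect the nodes reachable from the root.
    order, seen, stack = [], {root}, [root]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in succ.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    pred = {v: [] for v in order}
    for u in order:
        for v in succ.get(u, []):
            pred[v].append(u)
    # dom[v] = set of nodes that dominate v; start from "all nodes" and shrink to a fixpoint.
    dom = {v: set(order) for v in order}
    dom[root] = {root}
    changed = True
    while changed:
        changed = False
        for v in order[1:]:
            new = set.intersection(*(dom[u] for u in pred[v])) | {v}
            if new != dom[v]:
                dom[v] = new
                changed = True
    # Strict dominators of v form a chain; the one with the most dominators is closest to v.
    return {v: max(dom[v] - {v}, key=lambda u: len(dom[u])) for v in order[1:]}

# Hypothetical undirected graph, stored with both edge directions.
edges = [("I", 1), ("I", 2), (1, 3), (2, 3), ("I", 4),
         (4, 8), (4, 9), (8, 10), (9, 10), (10, 11)]
succ = {}
for a, b in edges:
    succ.setdefault(a, []).append(b)
    succ.setdefault(b, []).append(a)
print(immediate_dominators(succ, "I"))
```

The dominator tree is obtained by adding an edge from each immediate dominator to its node; here nodes 1, 2, 3 and 4 become children of I, nodes 8, 9 and 10 become children of 4, and 11 hangs below 10.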

Page 18: DAVA: Distributing Vaccines over Networks under Prior Information

Weighting the dominator tree

• Weighting the dominator tree
  • #P-complete
• Our solution: maximum propagation path probability between nodes I and v (using Dijkstra’s algorithm)

[Figure: merged graph with edge probabilities p1, p3, p6 and its dominator tree with edge weights w1, w3, w6]
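The maximum propagation path probability can be computed with Dijkstra's algorithm by using -log(pij) as the edge length, since maximizing a product of probabilities is the same as minimizing a sum of negative logarithms. The sketch below covers only that step; how the resulting values are turned into the tree weights w is left to the paper. The function name and data layout are assumptions.

```python
import heapq
import itertools
import math

def max_path_probability(neighbors, p, source):
    """Probability of the most likely propagation path from source to each reachable node."""
    dist = {source: 0.0}        # dist[v] = smallest sum of -log(p) found so far
    counter = itertools.count()  # tie-breaker so the heap never compares node objects
    heap = [(0.0, next(counter), source)]
    done = set()
    while heap:
        d, _, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v in neighbors[u]:
            prob = p[(u, v)]
            if prob <= 0.0:
                continue  # an edge with probability 0 can never carry the infection
            nd = d - math.log(prob)
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, next(counter), v))
    return {v: math.exp(-d) for v, d in dist.items()}

# Toy graph: the two-hop path I -> a -> b (0.5 * 0.5) beats the direct edge I -> b (0.2).
g = {"I": ["a", "b"], "a": ["I", "b"], "b": ["I", "a"]}
probs = {("I", "a"): 0.5, ("a", "I"): 0.5, ("a", "b"): 0.5,
         ("b", "a"): 0.5, ("I", "b"): 0.2, ("b", "I"): 0.2}
print(max_path_probability(g, probs, "I"))
```

Using the single best path is a tractable stand-in for the exact infection probability, which would require summing over all paths.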

Page 19: DAVA: Distributing Vaccines over Networks under Prior Information

DAVA algorithm

Step:
1. T = Build a dominator tree
2. v = Run DAVA-tree on T with budget=1
3. Remove v from G
4. Goto Step 1 until |S|=k

[Figure: example with |S|=2, iteration 1: merged graph (pij = 1 for all edges) and its dominator tree]

Page 20: DAVA: Distributing Vaccines over Networks under Prior Information

DAVA algorithm

Step:
1. T = Build a dominator tree
2. v = Run DAVA-tree on T with budget=1
3. Remove v from G
4. Goto Step 1 until |S|=k

Running time: O(k(|E| + |V|log|V|))
Too slow for large networks!

[Figure: example with |S|=2: the node selected in iteration 1 is removed from the merged graph, and the dominator tree is rebuilt in iteration 2]

Page 21: DAVA: Distributing Vaccines over Networks under Prior Information

DAVA-fast: a faster algorithm

Step:
1. T = Build a dominator tree
2. S = Run DAVA-tree on T with budget=k

• Time complexity: subquadratic!
  – DAVA-fast: O(|V|log|V|+|E|)
• In practice, the performance of DAVA-fast is very close to DAVA

[Figure: example with |S|=2: merged graph and its dominator tree]
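The difference between the iterative DAVA loop and the one-shot DAVA-fast variant can be illustrated for the special case pij = 1. The sketch makes two simplifications that are assumptions of this example, not the authors' implementation: the children of I in the dominator tree are found directly from the definition of dominance (a node u dominates v when removing u disconnects v from I), which is far slower than a real dominator-tree algorithm, and the benefit of a child is the size of its dominated set, which holds only when every edge has probability 1. All names and the example graph are hypothetical.

```python
def reachable(adj, root, removed=frozenset()):
    seen, stack = {root}, [root]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen and v not in removed:
                seen.add(v)
                stack.append(v)
    return seen

def root_children_benefits(adj, root, removed):
    """Benefit (subtree size) of each child of root in the dominator tree, for pij = 1."""
    nodes = reachable(adj, root, removed)
    # dominated[u] = nodes cut off from the root once u is removed (includes u itself).
    dominated = {u: nodes - reachable(adj, root, removed | {u}) for u in nodes - {root}}
    # u is a child of the root iff no other non-root node dominates u.
    return {u: len(d) for u, d in dominated.items()
            if not any(u in dominated[w] for w in dominated if w != u)}

def dava(adj, root, k):
    """Rebuild the tree after every pick (budget 1 per iteration)."""
    chosen = []
    for _ in range(k):
        benefit = root_children_benefits(adj, root, frozenset(chosen))
        if not benefit:
            break
        chosen.append(max(sorted(benefit), key=benefit.get))
    return chosen

def dava_fast(adj, root, k):
    """Build the tree once and take the top k children of the root."""
    benefit = root_children_benefits(adj, root, frozenset())
    return sorted(sorted(benefit), key=benefit.get, reverse=True)[:k]

# Hypothetical graph: m hangs below both x and y, so only the root dominates it at first.
edges = [("I", "x"), ("I", "y"), ("x", "m"), ("y", "m"),
         ("m", "m1"), ("m", "m2"), ("m", "m3"),
         ("x", "x1"), ("x", "x2"), ("x", "x3"), ("x", "x4")]
adj = {}
for a, b in edges:
    adj.setdefault(a, []).append(b)
    adj.setdefault(b, []).append(a)
print("DAVA:     ", dava(adj, "I", 2))
print("DAVA-fast:", dava_fast(adj, "I", 2))
```

On this toy graph DAVA picks x and then y, because after x is removed y dominates m and its subtree, saving 10 nodes in total. DAVA-fast picks x and m from the initial tree, saving 9. This single example only shows how the two procedures can diverge; it says nothing about how often they do in practice.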

Page 22: DAVA: Distributing Vaccines over Networks under Prior Information

Extending to SIR model

• See the paper

Page 23: DAVA: Distributing Vaccines over Networks under Prior Information

Outline

• Motivation
• Problem Definition
• Complexity
• Our Proposed Methods
• Experiments
• Conclusion

Page 24: DAVA: Distributing Vaccines over Networks under Prior Information

Experiments

• Virus Propagation Model
  • IC and SIR
• Settings (See more settings in the paper)
  • Initial infected nodes chosen uniformly at random
• Baseline Algorithms
  • RANDOM: healthy nodes chosen uniformly at random
  • DEGREE: choose nodes with top weighted degrees
  • PAGERANK: choose nodes with top PageRank scores
  • NETSHIELD
    • State-of-the-art pre-emptive immunization algorithm to minimize the epidemic threshold of the graph [Tong+ ICDM 2010]
    • Assumes no data is given before the epidemic starts
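The two simplest baselines can be sketched in a few lines. The slides do not define "weighted degree", so the sketch assumes it is the sum of the propagation probabilities on a node's incident edges; the function names are hypothetical as well.

```python
import random

def degree_baseline(neighbors, p, infected, k):
    """Top-k healthy nodes by weighted degree (assumed: sum of incident edge probabilities)."""
    healthy = sorted(v for v in neighbors if v not in infected)
    weighted = {v: sum(p[(v, u)] for u in neighbors[v]) for v in healthy}
    return sorted(healthy, key=weighted.get, reverse=True)[:k]

def random_baseline(neighbors, infected, k, rng):
    """k healthy nodes chosen uniformly at random without replacement."""
    healthy = sorted(v for v in neighbors if v not in infected)
    return rng.sample(healthy, min(k, len(healthy)))

# Toy graph with one infected node b (illustrative data only).
g = {"a": ["b", "c"], "b": ["a"], "c": ["a"], "d": []}
probs = {("a", "b"): 0.9, ("b", "a"): 0.9, ("a", "c"): 0.5, ("c", "a"): 0.5}
print(degree_baseline(g, probs, {"b"}, 2))
print(random_baseline(g, {"b"}, 2, random.Random(0)))
```

Neither baseline looks at where the infection currently is beyond excluding infected nodes, which is the gap the data-aware methods are designed to close.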

Page 25: DAVA: Distributing Vaccines over Networks under Prior Information

Experiments: datasets

Datasets are chosen from different domains
• Social media (IC model)
  • OREGON: AS router graph
  • STANFORD: hyperlink network
  • GNUTELLA: peer-to-peer network
  • BRIGHTKITE: friendship network
• Epidemiology (SIR model)
  • PORTLAND and MIAMI: large urban social-contact graphs used in national smallpox modeling studies [Eubank+, 2004]

       OREGON   STANFORD   GNUTELLA   BRIGHTKITE   PORTLAND      MIAMI
|V|    633      8,929      10,876     58,228       0.5 million   0.6 million
|E|    2,172    53,829     39,994     214,078      1.6 million   2.1 million

Page 26: DAVA: Distributing Vaccines over Networks under Prior Information

Experiments: Quality

[Figure: two panels, GNUTELLA (IC model) and PORTLAND (SIR model); higher is better]

DAVA consistently outperforms the baseline algorithms. Further, DAVA-fast performs almost as well as DAVA.

(See more results in the paper)

Page 27: DAVA: Distributing Vaccines over Networks under Prior Information

Experiments: Scalability

[Figure: running time (sec.); lower is better. Annotation on the plot: "did not finish within 10 hours"]

Page 28: DAVA: Distributing Vaccines over Networks under Prior Information

Outline

• Motivation
• Problem Definition
• Complexity
• Our Proposed Methods
• Experiments
• Conclusion

Page 29: DAVA: Distributing Vaccines over Networks under Prior Information

Conclusion

Data-Aware Vaccination problem
Given: Graph and infected nodes
Find: ‘best’ nodes for immunization

• Complexity
  • NP-hard
  • Hard to approximate within an absolute error
• DAVA-tree
  • Optimal solution on the tree
• DAVA and DAVA-fast
  • Merging infected nodes
  • Build a dominator tree, and run DAVA-tree
• Running time: subquadratic
  • DAVA: O(k(|E|+ |V|log|V|))
  • DAVA-fast: O(|E|+|V|log|V|)

[Figure: graph with infected nodes, merged graph, dominator tree]

Page 30: DAVA: Distributing Vaccines over Networks under Prior Information

Any Questions?

Code at: http://people.cs.vt.edu/~yaozhang

Thanks for the support of NSF (Grant No. IIS-1353346).

Yao Zhang, B. Aditya Prakash