DAVA: Distributing Vaccines over Networks under Prior Information
Yao Zhang, B. Aditya Prakash Department of Computer Science
Virginia Tech
SDM, Philadelphia, April 24, 2014
2
Motivation: Epidemiology
• Virus spreads over contact networks
• SIR model [Anderson+ 1991]
  – Susceptible-Infectious-Recovered
  – Weights p_ij: propagation probability from i to j
  – Recovery probability δ for each node
  – (models mumps-like infections)
Zhang and Prakash, SDM2014
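A minimal discrete-time SIR simulation makes the model concrete. This is a sketch under our own assumptions, not the authors' code: time advances in synchronous rounds, each infectious node i tries to infect each susceptible neighbor j with probability p_ij once per round, and then recovers with probability δ. The name `simulate_sir` and the dict-of-dicts graph format are hypothetical.

```python
import random

def simulate_sir(graph, seeds, delta, rng=None):
    """One run of a discrete-time SIR process; returns every node ever infected.

    graph: dict node -> dict neighbor -> propagation probability p_ij
    seeds: initially infectious nodes
    delta: per-round recovery probability, 0 < delta <= 1
    """
    if not 0 < delta <= 1:
        raise ValueError("delta must be in (0, 1]")
    rng = rng or random.Random()
    infectious = set(seeds)
    ever_infected = set(seeds)
    while infectious:
        newly_infected = set()
        for i in infectious:
            for j, p_ij in graph.get(i, {}).items():
                # Only susceptible nodes (never infected so far) can catch the virus.
                if j not in ever_infected and j not in newly_infected and rng.random() < p_ij:
                    newly_infected.add(j)
        # Each infectious node recovers independently with probability delta.
        infectious = {i for i in infectious if rng.random() >= delta}
        infectious |= newly_infected
        ever_infected |= newly_infected
    return ever_infected
```

With `delta=1.0` every node stays infectious for exactly one round, so each node gets a single attempt per neighbor.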
3
Motivation: Social Media
• Meme/rumor spreads over friendship networks
  – E.g.: Twitter following network
• Independent cascade model (IC) [Kempe+ KDD2003]
  – Each node has only one chance to infect its neighbors
  – Special case of the SIR model (δ = 1)
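Because every edge gets exactly one chance under IC, one cascade can be sampled by flipping a coin for each edge up front and taking everything reachable from the seeds over the edges that fired (the "live-edge" view of IC). A minimal sketch, assuming an undirected graph given as (u, v, p_uv) triples; `sample_ic_spread` and `expected_ic_spread` are hypothetical names.

```python
import random
from collections import defaultdict, deque

def sample_ic_spread(edges, seeds, rng=None):
    """One IC cascade on an undirected graph via live-edge sampling."""
    rng = rng or random.Random()
    live = defaultdict(list)
    for u, v, p in edges:
        # Each edge is "live" (transmits) independently with probability p.
        if rng.random() < p:
            live[u].append(v)
            live[v].append(u)
    infected = set(seeds)
    queue = deque(infected)
    while queue:  # BFS over live edges = final infected set
        u = queue.popleft()
        for v in live[u]:
            if v not in infected:
                infected.add(v)
                queue.append(v)
    return infected

def expected_ic_spread(edges, seeds, runs=1000, seed=0):
    """Monte Carlo estimate of the expected number of infected nodes."""
    rng = random.Random(seed)
    edges = list(edges)
    return sum(len(sample_ic_spread(edges, seeds, rng)) for _ in range(runs)) / runs
```

With all p_uv = 1 the cascade is deterministic and simply covers the seeds' connected components.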
4
Immunization
• The Centers for Disease Control (CDC) cares about containing epidemic diseases
  – E.g.: ~400 million dollars spent on vaccines for children in 2013
• Twitter tries to stop rumor spread
  – E.g.: rumors about victims after the Boston Marathon bombing in 2013
How to choose the best nodes to vaccinate (remove)?
5
Immunization
Pre-emptive immunization (choose nodes before the epidemic starts)
• Acquaintance strategy [Cohen+ 2003]
  – Pick a random person, immunize one of its neighbors at random
• NetShield [Tong+ 2010]
  – Minimize the epidemic threshold (the point at which the virus takes off)
These are good baseline strategies
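The acquaintance strategy needs only local information. A random neighbor is more likely to be a high-degree node than a random node is, so it tends to hit hubs without knowing the degree distribution. A minimal sketch, assuming an adjacency-dict graph with comparable node labels; `acquaintance_immunization` is a hypothetical helper that skips isolated nodes and repeats picks until k distinct nodes are chosen.

```python
import random

def acquaintance_immunization(adj, k, rng=None):
    """Choose k distinct nodes with the acquaintance strategy.

    adj: dict node -> collection of neighbors (undirected graph)
    """
    rng = rng or random.Random()
    people = [u for u in adj if adj[u]]             # skip isolated nodes
    targets = {v for u in people for v in adj[u]}   # nodes that can ever be picked
    if k > len(targets):
        raise ValueError("k is larger than the number of nodes that can be picked")
    chosen = set()
    while len(chosen) < k:
        person = rng.choice(people)                  # a random person ...
        chosen.add(rng.choice(sorted(adj[person])))  # ... and one random acquaintance
    return chosen
```

On a star with three leaves, three out of four random people are leaves whose only acquaintance is the hub, so the hub is immunized first with probability 3/4.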
6
In reality
Typically the epidemic has already started!
• More realistic intervention
• Which nodes to vaccinate now?
• We call it Data-Aware Immunization (this paper)
Do pre-emptive methods (choose nodes before the epidemic starts), such as the Acquaintance strategy [Cohen+ 2003] and NetShield [Tong+ 2010], still apply?
7
Outline
• Motivation
• Problem Definition
• Complexity
• Our Proposed Methods
• Experiments
• Conclusion
8
Data-Aware Vaccination Problem
Problem: Given a set of infected nodes and a contact graph, how should we distribute k vaccines (node removals) to minimize the expected number of infected nodes at the end of the epidemic?
Example (figure; p_ij = 1 for all edges), with only 1 vaccine: removing A saves {A, D}; removing B saves {B}; removing C saves {C}. Best solution: A.
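With p_ij = 1 the epidemic deterministically reaches every node connected to an infected node, so a toy instance can be solved by brute force: try every set of k healthy nodes, remove it, and count how many nodes are no longer reached. This is illustrative only (exponential in k). The graph is a hypothetical reconstruction chosen to reproduce the numbers in the example, with E and F assumed to be the infected nodes; the helper names are ours.

```python
from collections import deque
from itertools import combinations

def infected_after(adj, infected, removed):
    """Nodes reached from the infected set when p_ij = 1 and `removed` is vaccinated."""
    reached = set(infected) - set(removed)
    queue = deque(reached)
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in removed and v not in reached:
                reached.add(v)
                queue.append(v)
    return reached

def best_vaccination(adj, infected, k):
    """Exhaustively find the k healthy nodes whose removal saves the most nodes."""
    baseline = len(infected_after(adj, infected, set()))
    healthy = sorted(set(adj) - set(infected))
    best, best_saved = None, -1
    for S in combinations(healthy, k):
        saved = baseline - len(infected_after(adj, infected, set(S)))
        if saved > best_saved:
            best, best_saved = S, saved
    return best, best_saved

# Hypothetical graph: E and F are infected, D is reachable only through A,
# and B and C are adjacent only to infected nodes.
edges = [("E", "A"), ("A", "D"), ("E", "B"), ("F", "C")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print(best_vaccination(adj, {"E", "F"}, k=1))  # (('A',), 2): saves {A, D}
```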
9
Outline
• Motivation
• Problem Definition
• Complexity
• Our Proposed Methods
• Experiments
• Conclusion
10
Complexity of DAV
• NP-hard
  – Reduction from the Maximum k-Intersection problem (MaxKI: maximize the intersection of k subsets)
  – MaxKI is NP-complete [Vinterbo 2004]
• Approximation algorithm?
  – Not submodular
  – Actually, DAV is hard to approximate within an absolute error!
See the paper for details
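A tiny example shows why the saved-nodes objective is not submodular. Suppose, hypothetically, that the infected node I reaches a target T only through two intermediaries X and Y, with p_ij = 1. Vaccinating Y alone saves just Y, but once X is vaccinated, adding Y also cuts T off, so Y's marginal gain grows from 1 to 2; submodularity would require it to shrink or stay the same. The sketch below checks this with a BFS; node names are ours.

```python
from collections import deque

def saved(adj, infected, removed):
    """Number of nodes saved by removing `removed`, assuming p_ij = 1."""
    def reach(blocked):
        seen = {u for u in infected if u not in blocked}
        queue = deque(seen)
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in blocked and v not in seen:
                    seen.add(v)
                    queue.append(v)
        return seen
    return len(reach(set())) - len(reach(set(removed)))

# I - X - T and I - Y - T (undirected edges)
adj = {"I": {"X", "Y"}, "X": {"I", "T"}, "Y": {"I", "T"}, "T": {"X", "Y"}}

gain_y_alone = saved(adj, {"I"}, {"Y"}) - saved(adj, {"I"}, set())
gain_y_after_x = saved(adj, {"I"}, {"X", "Y"}) - saved(adj, {"I"}, {"X"})
print(gain_y_alone, gain_y_after_x)  # 1 2  (submodularity would need 1 >= 2)
```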
11
Outline
• Motivation
• Problem Definition
• Complexity
• Our Proposed Methods
  – Assume the IC model and an undirected graph
• Experiments
• Conclusion
12
1: Simplify - Merging infected nodes
• Idea: merge all the infected nodes into a single ‘super infected’ node I
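Figure (original graph → equivalent merged graph with super node I): the edges into B from two infected nodes, with probabilities p_X and p_Y, are combined by a logical OR into a single edge from I with p_B = 1 - (1 - p_X)(1 - p_Y); the edges with probabilities p_A and p_C are kept.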
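The merge step follows directly from the logical-OR rule: every healthy node v adjacent to one or more infected nodes gets a single edge from the super node I whose probability is 1 minus the product of (1 - p) over those edges, i.e., the probability that at least one infected neighbor succeeds. A minimal sketch, assuming an undirected dict-of-dicts graph; the label "I", the name `merge_infected`, and dropping edges among infected nodes are our choices.

```python
def merge_infected(graph, infected, super_node="I"):
    """Collapse all infected nodes into one super node (logical-OR rule).

    graph: dict node -> dict neighbor -> p, undirected (each edge stored both ways)
    Returns a new graph of the same shape; edges among infected nodes are dropped.
    """
    infected = set(infected)
    if super_node in graph:
        raise ValueError("super-node label already used in the graph")
    merged = {super_node: {}}
    miss = {}  # v -> probability that no infected neighbor infects v
    for u, nbrs in graph.items():
        if u in infected:
            for v, p in nbrs.items():
                if v not in infected:
                    miss[v] = miss.get(v, 1.0) * (1.0 - p)
        else:
            # Edges between healthy nodes are copied unchanged.
            merged.setdefault(u, {}).update(
                {v: p for v, p in nbrs.items() if v not in infected})
    for v, q in miss.items():
        p_v = 1.0 - q  # P(at least one infected neighbor succeeds)
        merged[super_node][v] = p_v
        merged.setdefault(v, {})[super_node] = p_v
    return merged
```

For example, two infected nodes that each reach B with probability 0.5 become one edge I-B with p_B = 1 - (0.5)(0.5) = 0.75.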
13
2: DAVA-Tree Algorithm: Idea
• Select nodes with the largest “benefit”
  – Saved nodes for a set S: the expected number of saved nodes after removing S from graph G
  – Benefit of adding an additional node j into S: (# of saved nodes after adding j into S) - (# of saved nodes with S), i.e., the additional number of saved nodes when adding j into S
Figure (p_ij = 1 for all edges): candidate nodes next to the merged infected node, with benefits 4, 2 and 5.
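For general probabilities the benefit can be estimated by simulation: sample IC cascades from the super node by flipping each edge once (live-edge view), and count, in the same sample, how many nodes are reached with and without the removed set. A minimal sketch, assuming an undirected edge list of the merged graph with super node "I" as the only seed; `expected_saved` and `benefit` are hypothetical names. Reusing the same random seed for both terms of the benefit keeps their difference from being swamped by sampling noise.

```python
import random
from collections import defaultdict, deque

def _reach(live, source, blocked):
    """Number of nodes reachable from source over live edges, avoiding `blocked`."""
    seen, queue = {source}, deque([source])
    while queue:
        u = queue.popleft()
        for v in live[u]:
            if v not in blocked and v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen)

def expected_saved(edges, removed, runs=2000, seed=0, source="I"):
    """Monte Carlo estimate of E[# saved nodes] when `removed` is vaccinated."""
    rng = random.Random(seed)
    removed = set(removed)
    total = 0
    for _ in range(runs):
        live = defaultdict(list)
        for u, v, p in edges:  # flip every edge once
            if rng.random() < p:
                live[u].append(v)
                live[v].append(u)
        total += _reach(live, source, set()) - _reach(live, source, removed)
    return total / runs

def benefit(edges, S, j, **kw):
    """Additional expected number of saved nodes when adding j into S."""
    return expected_saved(edges, set(S) | {j}, **kw) - expected_saved(edges, S, **kw)
```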
14
DAVA-Tree Alg.: Optimal on Trees
• Fact 1: the chosen nodes in the optimal set must be neighbors of the infected node I
• Fact 2: for any set S, the benefit of each such node is independent of the rest of the set S
DAVA-tree algorithm: select the top-k nodes among I’s neighbors with the maximum benefit (linear time)
Figure (p_ij = 1 for all edges): neighbors of the merged infected node with benefits 4, 2 and 5.
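On a tree rooted at I, removing a child c of I cuts off c's whole subtree, and under IC a node v in that subtree would otherwise be infected with probability equal to the product of the edge probabilities on its unique path from I. So each child's benefit is a sum of path products over its own subtree, the subtrees are disjoint (Fact 2), and one traversal plus a top-k selection suffices. A minimal sketch, assuming a child-list dict with edge probabilities; `dava_tree` and the example tree (built to reproduce the benefits 4, 2 and 5) are hypothetical.

```python
import heapq

def dava_tree(children, k, root="I"):
    """Pick the k children of `root` with the largest expected benefit.

    children: dict node -> list of (child, p_edge), a tree rooted at `root`
    Returns a list of (benefit, child) pairs, best first.
    """
    benefits = []
    for c, p_c in children.get(root, []):
        # Iterative DFS over c's subtree, carrying the path probability from root.
        total, stack = 0.0, [(c, p_c)]
        while stack:
            v, prob_v = stack.pop()
            total += prob_v  # P(v would be infected) = product of probabilities on the path
            for w, p_vw in children.get(v, []):
                stack.append((w, prob_v * p_vw))
        benefits.append((total, c))
    return heapq.nlargest(k, benefits)

# Hypothetical tree with p = 1 and subtree sizes 4, 2, 5 under children a, b, c.
children = {
    "I": [("a", 1.0), ("b", 1.0), ("c", 1.0)],
    "a": [("a1", 1.0), ("a2", 1.0), ("a3", 1.0)],
    "b": [("b1", 1.0)],
    "c": [("c1", 1.0), ("c2", 1.0)],
    "c1": [("c3", 1.0), ("c4", 1.0)],
}
print(dava_tree(children, 2))  # [(5.0, 'c'), (4.0, 'a')]
```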
15
3: General Case – Arbitrary Graphs
• Idea
  – We have the optimal algorithm for a tree
  – Extract a spanning tree, then run DAVA-tree
  – What kind of tree? Minimum spanning tree (MST)?
Figure (p_ij = 1 for all edges): the optimal solution on the graph compared with the solution that is optimal on the MST (found by DAVA-tree).
16
3: General Case – Arbitrary Graphs
• Idea
  – We have the optimal algorithm for a tree
  – Build a spanning tree first
  – What kind of tree? Instead of a minimum spanning tree, we propose to use the dominator tree (a concept from software engineering)
• u dominates v: every path from I to v contains u
  – Example (figure; p_ij = 1 for all edges): node 4 dominates 8, 9, 10, 11
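The definition translates directly into a (quadratic, illustration-only) test: u dominates v exactly when v becomes unreachable from I once u is deleted. A minimal sketch, assuming an adjacency-dict graph rooted at I; the function names and the example graph are hypothetical, not the graph in the figure.

```python
from collections import deque

def reachable_from(adj, root, deleted=None):
    """Nodes reachable from root without passing through `deleted`."""
    seen, queue = {root}, deque([root])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v != deleted and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def dominated_by(adj, root, u):
    """All nodes v != u such that every path from root to v contains u."""
    before = reachable_from(adj, root)
    if u == root:
        return before - {root}
    after = reachable_from(adj, root, deleted=u)
    return before - after - {u}

# Hypothetical graph: node 4 is the only gateway from I to nodes 8-11.
adj = {
    "I": ["1", "2"], "1": ["I", "4"], "2": ["I", "4"],
    "4": ["1", "2", "8", "9"], "8": ["4", "10"], "9": ["4", "11"],
    "10": ["8", "11"], "11": ["9", "10"],
}
print(sorted(dominated_by(adj, "I", "4"), key=int))  # ['8', '9', '10', '11']
```

Node 1 dominates nothing here, because node 4 can also be reached through node 2.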
17
Dominator Tree
• If u dominates v (u ≠ v) AND every other strict dominator of v dominates u, then u is the immediate dominator of v
• Dominator tree: add an edge between every such u and v
• Linear time [Buchsbaum, Tarjan 1998]
• Fact 1: for any arbitrary graph, the optimal solution should be among the children of root I in the dominator tree
• Fact 2: (for the special case k = 1, p = 1) running DAVA-tree on the dominator tree gives the optimal solution
Figure (p_ij = 1 for all edges): merged graph and its dominator tree; DAVA-tree on the dominator tree finds the optimal solution.
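One short way to build a dominator tree is the iterative algorithm of Cooper, Harvey and Kennedy. It is not the linear-time method cited above, but it is compact and correct: number nodes in postorder from I, then repeatedly set idom(v) to the common ancestor, in the current tree, of v's already-processed predecessors until nothing changes. By Fact 1, the candidates are then just the children of I. A minimal sketch, assuming an undirected adjacency dict (so predecessors are simply neighbors); `dominator_tree` is a hypothetical name.

```python
def dominator_tree(adj, root):
    """Immediate dominators (Cooper-Harvey-Kennedy); returns {v: idom(v)}."""
    # Iterative DFS to get a postorder of the nodes reachable from root.
    order, seen, stack = [], {root}, [(root, iter(adj.get(root, ())))]
    while stack:
        u, it = stack[-1]
        for v in it:
            if v not in seen:
                seen.add(v)
                stack.append((v, iter(adj.get(v, ()))))
                break
        else:  # all neighbors of u explored
            stack.pop()
            order.append(u)
    post = {u: i for i, u in enumerate(order)}  # root gets the largest number
    rpo = order[::-1]                            # reverse postorder, root first
    idom = {root: root}

    def intersect(a, b):
        # Walk both fingers up the current tree until they meet.
        while a != b:
            while post[a] < post[b]:
                a = idom[a]
            while post[b] < post[a]:
                b = idom[b]
        return a

    changed = True
    while changed:
        changed = False
        for v in rpo[1:]:
            preds = [u for u in adj.get(v, ()) if u in idom]  # processed predecessors
            new = preds[0]
            for u in preds[1:]:
                new = intersect(u, new)
            if idom.get(v) != new:
                idom[v] = new
                changed = True
    del idom[root]
    return idom

adj = {"I": ["a", "b"], "a": ["I", "c"], "b": ["I", "c"], "c": ["a", "b", "d"], "d": ["c"]}
idom = dominator_tree(adj, "I")
print(sorted(v for v, d in idom.items() if d == "I"))  # children of I: ['a', 'b', 'c']
```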
18
Weighting the dominator tree
• Computing the exact weights is #P-complete
• Our solution: maximum propagation path probability between nodes I and v (using Dijkstra’s algorithm)
Figure: merged graph with edge probabilities (e.g., p1, p3, p6) and the dominator tree with the corresponding weights (e.g., w1, w3, w6).
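Since every edge probability is at most 1, a path's probability can only shrink as the path grows, so Dijkstra's algorithm works unchanged when it maximizes the product of probabilities instead of minimizing a sum of lengths (equivalently, shortest paths under lengths -log p_ij). A minimal sketch, assuming the merged graph is a dict of dicts of probabilities with super node "I"; `max_path_probability` is a hypothetical name and the result w[v] is the weight for node v.

```python
import heapq

def max_path_probability(graph, source="I"):
    """w[v] = max over paths source -> v of the product of edge probabilities."""
    best = {source: 1.0}
    heap = [(-1.0, source)]  # max-heap via negated probabilities
    done = set()
    while heap:
        neg_p, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        for v, p_uv in graph.get(u, {}).items():
            cand = -neg_p * p_uv
            if cand > best.get(v, 0.0):
                best[v] = cand
                heapq.heappush(heap, (-cand, v))
    return best
```

For example, with I-a at 0.5, I-b at 0.9 and b-a at 0.8, node a gets weight 0.9 × 0.8 = 0.72 through b rather than 0.5 directly.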
19
DAVA algorithm
Steps:
1. T = Build a dominator tree
2. v = Run DAVA-tree on T with budget = 1
3. Remove v from G
4. Go to Step 1 until |S| = k
Figure (merged graph with p_ij = 1 for all edges, and its dominator tree): |S| = 2, iteration 1.
20
DAVA algorithm
Steps 1 to 4 as above; time complexity O(k(|E| + |V| log |V|))
Too slow for large networks!
Figure (|S| = 2, iteration 2): the node selected in iteration 1 is removed from the merged graph, and a new dominator tree is built.
21
DAVA-fast: a faster algorithm
Steps:
1. T = Build a dominator tree
2. S = Run DAVA-tree on T with budget = k
• Time complexity: subquadratic!
  – DAVA-fast: O(|V| log |V| + |E|)
• In practice, the performance of DAVA-fast is very close to that of DAVA
Figure (|S| = 2): merged graph and its dominator tree.
22
Extending to the SIR model
• See the paper
23
Outline
• Motivation
• Problem Definition
• Complexity
• Our Proposed Methods
• Experiments
• Conclusion
24
Experiments
• Virus propagation models: IC and SIR
• Settings (see more settings in the paper)
  – Initial infected nodes chosen uniformly at random
• Baseline algorithms
  – RANDOM: healthy nodes chosen uniformly at random
  – DEGREE: choose nodes with the top weighted degrees
  – PAGERANK: choose nodes with the top PageRank scores
  – NETSHIELD: state-of-the-art pre-emptive immunization algorithm that minimizes the epidemic threshold of the graph [Tong+ ICDM 2010]; assumes no data is given before the epidemic starts
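The three simple baselines are easy to state in code. A minimal sketch, assuming an undirected dict-of-dicts weighted graph: we restrict DEGREE and PAGERANK to healthy nodes like RANDOM (our assumption), use plain unweighted PageRank with damping 0.85 computed by power iteration (also our choice), and omit NETSHIELD, whose eigenvector-based scoring does not fit a short example. Function names are hypothetical.

```python
import random

def random_baseline(graph, infected, k, rng=None):
    """RANDOM: k healthy nodes chosen uniformly at random."""
    rng = rng or random.Random()
    return rng.sample([u for u in graph if u not in infected], k)

def degree_baseline(graph, infected, k):
    """DEGREE: the k healthy nodes with the largest weighted degree."""
    healthy = [u for u in graph if u not in infected]
    return sorted(healthy, key=lambda u: (-sum(graph[u].values()), str(u)))[:k]

def pagerank_baseline(graph, infected, k, alpha=0.85, iters=100):
    """PAGERANK: the k healthy nodes with the largest (unweighted) PageRank."""
    nodes = list(graph)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        new = {u: (1.0 - alpha) / n for u in nodes}
        for u in nodes:
            if graph[u]:
                share = alpha * rank[u] / len(graph[u])
                for v in graph[u]:
                    new[v] += share
            else:  # dangling node: spread its rank uniformly
                for v in nodes:
                    new[v] += alpha * rank[u] / n
        rank = new
    healthy = [u for u in nodes if u not in infected]
    return sorted(healthy, key=lambda u: (-rank[u], str(u)))[:k]
```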
25
Experiments: datasets
Datasets are chosen from different domains
• Social media (IC model)
  – OREGON: AS router graph
  – STANFORD: hyperlink network
  – GNUTELLA: peer-to-peer network
  – BRIGHTKITE: friendship network
• Epidemiology (SIR model)
  – PORTLAND and MIAMI: large urban social-contact graphs used in national smallpox modeling studies [Eubank+, 2004]

Dataset      |V|           |E|
OREGON       633           2,172
STANFORD     8,929         53,829
GNUTELLA     10,876        39,994
BRIGHTKITE   58,228        214,078
PORTLAND     0.5 million   1.6 million
MIAMI        0.6 million   2.1 million
26
Experiments: Quality
Figure: results on GNUTELLA (IC model) and PORTLAND (SIR model); higher is better.
DAVA consistently outperforms the baseline algorithms. Furthermore, DAVA-fast performs almost as well as DAVA.
(See more results in the paper)
27
Experiments: Scalability
Figure: running time (sec.); lower is better. Plot annotation: did not finish within 10 hours.
28
Outline
• Motivation
• Problem Definition
• Complexity
• Our Proposed Methods
• Experiments
• Conclusion
29
Conclusion
Figure: graph with infected nodes → merged graph → dominator tree
Data-Aware Vaccination problem
• Given: graph and infected nodes
• Find: ‘best’ nodes for immunization
• Complexity
  – NP-hard
  – Hard to approximate within an absolute error
• DAVA-tree
  – Optimal solution on the tree
• DAVA and DAVA-fast
  – Merging infected nodes
  – Build a dominator tree, and run DAVA-tree
  – Running time: subquadratic
  – DAVA: O(k(|E| + |V| log |V|))
  – DAVA-fast: O(|E| + |V| log |V|)
30
Any Questions?
Code at: http://people.cs.vt.edu/~yaozhang
Thanks for the support of NSF (Grant No. IIS-1353346).
Yao Zhang B. Aditya Prakash