DAVA: Distributing Vaccines over Networks under Prior Information
Yao Zhang, B. Aditya Prakash
Department of Computer Science
Virginia Tech
SDM, Philadelphia, April 24, 2014
2
Motivation: Epidemiology
• Virus spreads over contact networks
• SIR model [Anderson+ 1991]
  • Susceptible-Infectious-Recovered
  • Weights pij: propagation prob. from i to j
  • Recovery prob. δ for each node
  • (models mumps-like infections)
Zhang and Prakash, SDM2014
3
Motivation: Social Media
• Meme/Rumor spreads over friendship networks
  • E.g.: Twitter following network
• Independent cascade model (IC) [Kempe+ KDD2003]
  • Each node has only one chance to infect its neighbors
  • Special case of SIR model
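The IC model described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the names `simulate_ic` and `expected_infected`, the dict-of-dicts graph layout, and the Monte Carlo averaging are all assumptions made for the example.

```python
import random

def simulate_ic(graph, seeds, removed=frozenset(), rng=None):
    """Run one independent-cascade simulation; return the set of infected nodes.

    graph:   dict node -> {neighbor: propagation probability p_ij} (assumed layout)
    seeds:   initially infected nodes
    removed: vaccinated nodes, treated as deleted from the graph
    """
    rng = rng or random.Random()
    infected = set(seeds) - set(removed)
    frontier = list(infected)
    while frontier:
        next_frontier = []
        for i in frontier:
            for j, p in graph.get(i, {}).items():
                if j in infected or j in removed:
                    continue
                # Node i gets exactly one chance to infect neighbor j.
                if rng.random() < p:
                    infected.add(j)
                    next_frontier.append(j)
        frontier = next_frontier
    return infected

def expected_infected(graph, seeds, removed=frozenset(), runs=1000, seed=0):
    """Monte Carlo estimate of the expected number of infected nodes."""
    rng = random.Random(seed)
    total = sum(len(simulate_ic(graph, seeds, removed, rng)) for _ in range(runs))
    return total / runs
```

Passing a non-empty `removed` set models vaccination as node removal, so comparing `expected_infected` with and without it gives an estimate of how many nodes a vaccine set saves.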
4
Immunization
• Centers for Disease Control (CDC) cares about containing epidemic diseases
  • E.g.: ~400 million dollars used for vaccines for children in 2013
• Twitter tries to stop rumor spread
  • E.g.: rumors of victims after the Boston Marathon bombings in 2013
How to choose the best nodes to vaccinate (remove)?
5
Immunization
Pre-emptive immunization (choose nodes before the epidemic starts)
• Acquaintance strategy [Cohen+ 2003]
  • Pick a random person, immunize one of its neighbors at random
• Netshield [Tong+ 2010]
  • Minimize the epidemic threshold (point when the virus takes off)
Good for baseline strategies
6
In reality
Typically the epidemic has already started!
• More realistic intervention
• Which nodes to vaccinate now?
• We call it Data-Aware Immunization (this paper)
Pre-emptive immunization (choose nodes before the epidemic starts)
• Acquaintance strategy [Cohen+ 2003]
• Netshield [Tong+ 2010]
?
7
Outline
• Motivation
• Problem Definition
• Complexity
• Our Proposed Methods
• Experiments
• Conclusion
8
Data-Aware Vaccination Problem
Problem: Given a set of infected nodes and a contact graph, how to distribute k vaccines (node removal) to minimize the expected number of infected nodes at the end of the epidemic?
Example: 1 vaccine? (pij = 1 for all edges)
• Remove A, save {A, D}
• Remove B, save {B}
• Remove C, save {C}
Best solution: A
[Figure: example graph on nodes A to F]
9
Outline
• Motivation
• Problem Definition
• Complexity
• Our Proposed Methods
• Experiments
• Conclusion
10
Complexity of DAV
• NP-hard
  • Reduction from the Maximum K-Intersection Problem (MaxKI: maximizing the intersection of k subsets)
  • MaxKI is NP-Complete [Vinterbo 2004]
• Approximation algorithm?
  • Not submodular
  • Actually, DAV is hard to approximate within an absolute error!
See paper for details
11
Outline
• Motivation
• Problem Definition
• Complexity
• Our Proposed Methods
  • assume IC model and undirected graph
• Experiments
• Conclusion
12
1: Simplify - Merging infected nodes
• Idea: merge all the infected nodes into a single ‘super infected’ node I
• Logical-OR: pB = 1 - (1 - pX)(1 - pY)
• The merged graph is equivalent to the original graph
[Figure: Original Graph (edge probabilities pA, pX, pY, pC into nodes A, B, C) and Merged Graph (super node I with edge probabilities pA, pB, pC into nodes A, B, C)]
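The merging step can be sketched as follows. This is an illustrative sketch under assumptions: the function name `merge_infected`, the label "I" as a dictionary key, and the dict-of-dicts graph layout (with both directions listed for an undirected graph) are not from the talk. Edges leading into infected nodes are dropped because they cannot change what the super node infects.

```python
def merge_infected(graph, infected, super_node="I"):
    """Collapse all infected nodes into one super node.

    graph:    dict node -> {neighbor: p_ij} (assumed layout)
    infected: iterable of currently infected nodes
    Returns a new graph in which `super_node` replaces every infected node.
    """
    infected = set(infected)
    merged = {super_node: {}}
    for u, nbrs in graph.items():
        if u in infected:
            for v, p in nbrs.items():
                if v in infected:
                    continue  # edges between infected nodes disappear
                q = merged[super_node].get(v, 0.0)
                # Logical-OR of independent infection attempts:
                # p_new = 1 - (1 - q)(1 - p)
                merged[super_node][v] = 1.0 - (1.0 - q) * (1.0 - p)
        else:
            merged.setdefault(u, {})
            for v, p in nbrs.items():
                if v not in infected:
                    merged[u][v] = p
    return merged
```

With two infected nodes X and Y that both point to B with probability 0.5, the merged edge from I to B gets 1 - 0.5 * 0.5 = 0.75, matching the Logical-OR rule on the slide.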
13
2: DAVA-Tree Algorithm: Idea
• Select nodes with the largest “benefit”
  • Saved nodes: the expected number of saved nodes after removing set S on graph G
  • Benefit of adding additional node j into S: the additional number of saved nodes when adding node j into S, i.e. (# of saved nodes after adding j into S) minus (# of saved nodes with S alone)
[Figure: tree rooted at the merged infected node, pij = 1 for all edges; the three neighbors of the root have benefit 4, 2 and 5]
14
DAVA-Tree Alg.: Optimal on Trees
• Fact 1: the chosen nodes in the optimal set must be neighbors of infected node I
• Fact 2: for any set S, the benefit of each such node is independent of the rest of the set S
DAVA-tree algorithm: select the top k nodes from I’s neighbors with the max. benefit
• Linear time
[Figure: tree rooted at the merged infected node, pij = 1 for all edges; the three neighbors of the root have benefit 4, 2 and 5]
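A minimal sketch of the selection rule on a tree, under stated assumptions: the benefit of a node is taken to be the expected number of infections in its subtree, computed bottom-up as p(parent, v) * (1 + sum of the children's benefits), which reduces to the subtree size when pij = 1 as in the slide's figure. The names `dava_tree`, `children` and `prob` are hypothetical, and `heapq.nlargest` is used for brevity, so this sketch runs in O(n log k) rather than strictly linear time.

```python
import heapq

def dava_tree(children, prob, root, k):
    """Pick the k neighbors of `root` with the largest benefit.

    children: dict node -> list of child nodes in the tree rooted at `root`
    prob:     dict (parent, child) -> propagation probability
    Returns (chosen nodes in decreasing benefit, benefit of every non-root node).
    """
    # Iterative pre-order traversal; every child appears after its parent.
    parent, order, stack = {}, [], [root]
    while stack:
        u = stack.pop()
        order.append(u)
        for c in children.get(u, []):
            parent[c] = u
            stack.append(c)

    benefit = {}
    for u in reversed(order):  # children are processed before their parent
        if u == root:
            continue
        saved_below = 1.0 + sum(benefit[c] for c in children.get(u, []))
        benefit[u] = prob[(parent[u], u)] * saved_below

    candidates = children.get(root, [])
    chosen = heapq.nlargest(k, candidates, key=lambda v: benefit[v])
    return chosen, benefit
```

On a tree whose root has three neighbors with subtree sizes 4, 2 and 5 and all probabilities equal to 1, a budget of k = 2 selects the neighbors with benefit 5 and 4.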
15
3: General Case – Arbitrary Graphs
• Idea
  • We have the optimal algorithm for a tree
  • Extract a spanning tree, then run DAVA-tree
  • What kind of tree?
    • Minimum spanning tree
[Figure: example graph (pij = 1 for all edges) showing the optimal solution, the MST, and the node that is optimal on the MST by DAVA-tree]
16
3: General Case – Arbitrary Graphs
• Idea
  • We have the optimal algorithm for a tree
  • Build a spanning tree first
  • What kind of tree?
    • Minimum spanning tree
We propose to use dominator tree (software engineering)
• u dominates v: every path from I to v contains u
[Figure: example graph, pij = 1 for all edges; node 4 dominates nodes 8, 9, 10, 11]
17
Dominator Tree
• u is immediate dominator of v: u dominates v AND every other dominator of v dominates u
• Dominator tree: add an edge between every such u and v
• Linear time [Buchsbaum, Tarjan 1998]
• Fact 1: the optimal solution should be among the children of root I in the dominator tree for any arbitrary graph
• Fact 2: (for special case, k = 1, p = 1) running DAVA-tree on the dominator tree gives the optimal solution
[Figure: Merged Graph and its Dominator Tree, pij = 1 for all edges; labels mark the optimal solution and the optimal from DAVA-tree]
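The dominator definitions above can be illustrated with a short Python sketch. Note the swap: this uses the simple iterative set-intersection method, which is much slower than the linear-time algorithm cited on the slide and is chosen here only for readability. The name `immediate_dominators` and the adjacency layout are assumptions for the example.

```python
def immediate_dominators(graph, root):
    """Return {v: immediate dominator of v} for every node reachable from root.

    graph: dict node -> iterable of successors (list both directions for an
           undirected graph). Simple iterative method, not linear time.
    """
    # Collect reachable nodes and their predecessors.
    reachable, stack = {root}, [root]
    preds = {root: []}
    while stack:
        u = stack.pop()
        for v in graph.get(u, ()):
            preds.setdefault(v, []).append(u)
            if v not in reachable:
                reachable.add(v)
                stack.append(v)

    # dom[v] = set of nodes that lie on every path from root to v.
    dom = {v: set(reachable) for v in reachable}
    dom[root] = {root}
    changed = True
    while changed:
        changed = False
        for v in reachable:
            if v == root:
                continue
            new = set.intersection(*(dom[p] for p in preds[v])) | {v}
            if new != dom[v]:
                dom[v] = new
                changed = True

    # Strict dominators of v form a chain; the closest one has the largest dom set.
    idom = {}
    for v in reachable:
        if v != root:
            strict = dom[v] - {v}
            idom[v] = max(strict, key=lambda u: len(dom[u]))
    return idom
```

Adding an edge from `idom[v]` to `v` for every non-root node yields the dominator tree; the children of the root are the candidates named in Fact 1.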
18
Weighting the dominator tree
• Weighting the dominator tree: #P-complete
• Our solution: maximum propagation path probability between nodes I and v (using Dijkstra’s algorithm)
[Figure: Merged Graph with edge probabilities p1, p3, p6 and Dominator Tree with edge weights w1, w3, w6]
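The maximum propagation path probability can be found with Dijkstra's algorithm by using -log(p) as the edge length, since maximizing a product of probabilities is the same as minimizing the sum of their negative logs. The sketch below only computes that probability from I to every node; how the talk turns it into dominator-tree edge weights is not shown here. The name `max_path_probability` and the graph layout are assumptions, and node labels are assumed to be mutually comparable (e.g. all strings) so heap ties can be broken.

```python
import heapq
import math

def max_path_probability(graph, source):
    """Return {v: max over paths source -> v of the product of edge probabilities}.

    graph: dict node -> {neighbor: p}, with 0 < p <= 1 (zero-probability edges
           are skipped).
    """
    dist = {source: 0.0}          # shortest sum of -log(p) found so far
    heap = [(0.0, source)]
    done = set()
    while heap:
        cost, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, p in graph.get(u, {}).items():
            if p <= 0.0 or v in done:
                continue
            new_cost = cost - math.log(p)
            if new_cost < dist.get(v, math.inf):
                dist[v] = new_cost
                heapq.heappush(heap, (new_cost, v))
    # Convert path lengths back to probabilities.
    return {v: math.exp(-d) for v, d in dist.items()}
```

For example, with edges I to a (0.5), I to b (0.9) and b to a (0.8), the best path to a goes through b and has probability 0.9 * 0.8 = 0.72 rather than 0.5.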
19
DAVA algorithm
Steps:
1. T = Build a dominator tree
2. v = Run DAVA-tree on T with budget=1
3. Remove v from G
4. Goto Step 1 until |S|=k
[Figure: Merged Graph (pij = 1 for all edges) and its Dominator Tree; |S|=2, Iteration=1]
20
DAVA algorithm
Steps:
1. T = Build a dominator tree
2. v = Run DAVA-tree on T with budget=1
3. Remove v from G
4. Goto Step 1 until |S|=k
Running time: O(k(|E| + |V|log|V|))
• Too slow for large networks!
[Figure: Merged Graph at Iteration=1; after removing the selected node, Merged Graph and Dominator tree at Iteration=2; |S|=2]
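The greedy loop in the steps above can be sketched for the special case pij = 1. To stay short, this sketch swaps the dominator tree for a brute-force reachability check: with pij = 1, the number of nodes saved by removing v equals the size of v's subtree in the dominator tree, so the choice is the same, but each iteration costs O(|V|(|V| + |E|)) instead of the running time quoted on the slide. The names `reachable_from` and `dava_unit_prob` are hypothetical, and node labels are assumed sortable so ties are broken deterministically.

```python
def reachable_from(graph, root, removed):
    """Nodes reachable from root when the nodes in `removed` are deleted."""
    seen, stack = {root}, [root]
    while stack:
        u = stack.pop()
        for v in graph.get(u, ()):
            if v not in seen and v not in removed:
                seen.add(v)
                stack.append(v)
    return seen

def dava_unit_prob(graph, root, k):
    """Greedy one-node-at-a-time selection for p_ij = 1 (brute-force benefit).

    graph: dict node -> iterable of neighbors; root: merged infected node I.
    Returns the chosen nodes in selection order.
    """
    chosen = []
    for _ in range(k):
        removed = set(chosen)
        current = reachable_from(graph, root, removed)
        best_node, best_saved = None, 0
        for v in sorted(current - {root}):
            after = reachable_from(graph, root, removed | {v})
            saved = len(current) - len(after)  # v plus everything it cuts off
            if saved > best_saved:
                best_node, best_saved = v, saved
        if best_node is None:
            break  # nothing left to save
        chosen.append(best_node)
    return chosen
```

Recomputing the benefits after every removal is what distinguishes this loop from a single pass with budget k.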
21
DAVA-fast: a faster algorithm
Steps:
1. T = Build a dominator tree
2. S = Run DAVA-tree on T with budget=k
• Time complexity: subquadratic!
  • DAVA-fast: O(|V|log|V| + |E|)
• In practice, the performance of DAVA-fast is very close to DAVA
[Figure: Merged Graph and Dominator tree; |S|=2]
23
Outline
• Motivation
• Problem Definition
• Complexity
• Our Proposed Methods
• Experiments
• Conclusion
24
Experiments
• Virus Propagation Model
  • IC and SIR
• Settings (See more settings in the paper)
  • Initial infected nodes chosen uniformly at random
• Baseline Algorithms
  • RANDOM: healthy nodes chosen uniformly at random
  • DEGREE: choose nodes with top weighted degrees
  • PAGERANK: choose nodes with top PageRank scores
  • NETSHIELD
    • State-of-the-art pre-emptive immunization algorithm to minimize the epidemic threshold of the graph [Tong+ ICDM 2010]
    • Assumes no data is given before the epidemic starts
25
Experiments: datasets
Datasets are chosen from different domains
• Social media (IC model)
  • OREGON: AS router graph
  • STANFORD: hyperlink network
  • GNUTELLA: peer-to-peer network
  • BRIGHTKITE: friendship network
• Epidemiology (SIR model)
  • PORTLAND and MIAMI: large urban social-contact graphs used in national smallpox modeling studies [Eubank+, 2004]

       OREGON   STANFORD   GNUTELLA   BRIGHTKITE   PORTLAND      MIAMI
|V|    633      8,929      10,876     58,228       0.5 million   0.6 million
|E|    2,172    53,829     39,994     214,078      1.6 million   2.1 million
26
Experiments: Quality
DAVA consistently outperforms the baseline algorithms. Further, DAVA-fast performs almost as well as DAVA.
(See more results in the paper)
[Figure: quality on GNUTELLA (IC model) and PORTLAND (SIR model); higher is better]
27
Experiments: Scalability
[Figure: running time (sec.); lower is better; one annotation reads "did not finish within 10 hours"]
28
Outline
• Motivation
• Problem Definition
• Complexity
• Our Proposed Methods
• Experiments
• Conclusion
29
Conclusion
• Data-Aware Vaccination problem
  • Given: Graph and Infected nodes
  • Find: ‘best’ nodes for immunization
• Complexity
  • NP-hard
  • Hard to approximate within an absolute error
• DAVA-tree
  • Optimal solution on the tree
• DAVA and DAVA-fast
  • Merging infected nodes
  • Build a dominator tree, and run DAVA-tree
• Running time: subquadratic
  • DAVA: O(k(|E| + |V|log|V|))
  • DAVA-fast: O(|E| + |V|log|V|)
[Figure: Graph with infected nodes, Merged graph, Dominator tree]