Transcript
Page 1: Fast and Accurate Influence Maximization on Large Networks with Pruned Monte-Carlo Simulations (AAAI'14)

Fast and Accurate Influence Maximization

on Large Networks

with Pruned Monte-Carlo Simulations

Naoto Ohsaka (UTokyo)

Takuya Akiba (UTokyo)

Yuichi Yoshida (NII & PFI)

Ken-ichi Kawarabayashi (NII)

JST, ERATO, Kawarabayashi Large Graph Project


2014/7/30 AAAI-14 @ Québec, Canada

Page 2

Influence Maximization [Kempe, Kleinberg, Tardos. KDD’03]

Input

Directed graph G = (V, E)
Edge probability p_e (e ∈ E)
Size of seed set k

Problem

maximize σ(S) subject to |S| ≤ k
σ(⋅): the spread of influence

[Figure: example directed graph on vertices A to F with edge probabilities 0.6, 0.1, 0.3, 0.4, 0.8, 0.2, 0.5]

Motivation

Viral (word-of-mouth) marketing [Domingos, Richardson. KDD’01], [Richardson, Domingos. KDD’02]

Q. How to find a small group of influential individuals?

Influence maximization formalizes this question mathematically.

Page 3

Independent Cascade Model [Goldenberg, Libai, Muller. Marketing Letters’01]

Each vertex has 2 states (inactive / active)

Diffusion Process

0. Activate the vertices in S ⊆ V, called the seed set

1. Active vertex u activates inactive vertex v with probability p_uv (single trial)

2. Repeat 1 while new activations occur

[Figure: active vertex u makes a single attempt to activate inactive vertex v with p_uv = 0.1; the attempt ends in success or failure]

Page 4

Influence spread σ(S)

Expected number of active vertices given a seed set S

Example of Independent Cascade Model

Legend: Seed, Inactive, Active, Success, Failure

[Figure: three snapshots of a cascade on the example graph with vertices A to F and edge probabilities 0.6, 0.1, 0.3, 0.4, 0.8, 0.2, 0.5]
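
The diffusion process and the definition of σ(S) can be illustrated with a minimal Python sketch. The adjacency layout (`graph` maps each vertex to a list of `(neighbor, probability)` pairs), the function names, and the toy graph are assumptions made for illustration; none of them come from the slides.

```python
import random
from collections import deque


def simulate_ic(graph, seeds, rng):
    """Run one Independent Cascade diffusion and return the set of active vertices.

    graph: dict mapping u to a list of (v, p_uv) pairs (hypothetical layout).
    """
    active = set(seeds)
    frontier = deque(seeds)
    while frontier:
        u = frontier.popleft()
        for v, p_uv in graph.get(u, []):
            # A newly active u gets a single trial on each inactive out-neighbor v.
            if v not in active and rng.random() < p_uv:
                active.add(v)
                frontier.append(v)
    return active


def estimate_spread(graph, seeds, runs=10000, seed=0):
    """Estimate sigma(S) as the average number of active vertices over `runs` cascades."""
    rng = random.Random(seed)
    return sum(len(simulate_ic(graph, seeds, rng)) for _ in range(runs)) / runs


# Toy graph with deterministic edges so that the outcome is easy to check by hand.
toy = {"A": [("B", 1.0), ("C", 0.0)], "B": [("D", 1.0)]}
print(estimate_spread(toy, ["A"], runs=100))  # 3.0 (A, B and D are always active)
```

With probabilities strictly between 0 and 1 the estimate fluctuates from run to run, which is why many repetitions are averaged.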

Page 5

Previous Results

Hardness

Influence Maximization is NP-hard [Kempe, Kleinberg, Tardos. KDD’03]

Exact computation of σ(⋅) is #P-hard [Chen, Wang, Wang. KDD’10]

Original Greedy Approach

Greedy Algorithm [Kempe, Kleinberg, Tardos. KDD’03]: approx. ratio ≈ 63%

Monte-Carlo Simulations: good approximation

Page 6

Original Greedy Approach

Greedy Algorithm [Kempe, Kleinberg, Tardos. KDD’03]

    S ← ∅
    while |S| < k do
        t ← argmax_{v ∈ V} σ(S ∪ {v}) − σ(S)
        S ← S ∪ {t}

Due to submodularity of σ(⋅):

    σ(S) ≥ (1 − 1/e) OPT ≥ 0.63 OPT

[Nemhauser, Wolsey, Fisher. Mathematical Programming’78]

Monte-Carlo Simulations ((1 ± ε)-approximation) [Kempe, Kleinberg, Tardos. KDD’03]

Simulating the diffusion process repeatedly

Averaging # of active vertices

Produces near-optimal (1 − 1/e − ε′) solutions
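
The greedy loop on this slide can be sketched in Python as follows. The function takes any set function `sigma` as an oracle; the coverage-style `sigma` used in the example is a stand-in chosen because it is submodular and easy to verify by hand, and is not the Monte-Carlo estimator from the talk. All names are illustrative.

```python
def greedy_seed_selection(vertices, sigma, k):
    """Pick up to k seeds, each time adding the vertex with the largest marginal gain."""
    seeds = []
    while len(seeds) < k:
        candidates = [v for v in vertices if v not in seeds]
        if not candidates:
            break
        base = sigma(seeds)
        # argmax over v of sigma(S + {v}) - sigma(S); ties go to the earliest vertex.
        best = max(candidates, key=lambda v: sigma(seeds + [v]) - base)
        seeds.append(best)
    return seeds


# Hypothetical oracle: each vertex "covers" a set of items, sigma counts the union.
covers = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5, 6, 7}}


def coverage_sigma(seeds):
    return len(set().union(*(covers[v] for v in seeds)))


print(greedy_seed_selection(list(covers), coverage_sigma, k=2))  # ['C', 'A']
```

Each round evaluates `sigma` once per remaining vertex, so the number of evaluations grows with both the number of vertices and k.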

Page 7

Issue: Original Greedy Approach Suffers from Scalability

Greedy Algorithm

# of evaluations of σ(⋅): nk

Monte-Carlo Simulations

Computation time of σ(⋅): O(mR)

Total time: O(knmR) (R ≈ 10,000)

n = |V| > 10^6
m = |E| > 10^7
k: # of seeds
R = poly(ε^{−1}): # of simulations

TOO SLOW
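
To get a feel for the O(knmR) bound, the snippet below plugs in the orders of magnitude quoted on this slide. The value k = 50 is borrowed from the experiments later in the talk, n and m are the slide's lower bounds, and constant factors are ignored, so the result is only a rough indication.

```python
def greedy_mc_cost(k, n, m, R):
    """Order-of-magnitude count of edge visits, k * n * m * R, ignoring constants."""
    return k * n * m * R


# Assumed inputs: k = 50 seeds, n = 10^6 vertices, m = 10^7 edges, R = 10^4 simulations.
cost = greedy_mc_cost(k=50, n=10**6, m=10**7, R=10**4)
print(f"{cost:.1e}")  # 5.0e+18
```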

Page 8

Previous Methods for Influence Maximization

Slow, high quality (simulation-based):

Greedy Approach [Kempe, Kleinberg, Tardos. KDD’03]
CELF [Leskovec, Krause, Guestrin, Faloutsos, VanBriesen, Glance. KDD’07]
StaticGreedyDU [Cheng, Shen, Huang, Zhang, Cheng. CIKM’13]

Fast, low quality (heuristic-based):

DegreeDiscount [Chen, Wang, Yang. KDD’09]
PMIA [Chen, Wang, Wang. KDD’10]
SAEDV [Jiang, Song, Cong, Wang, Si, Xie. AAAI’11]
IRIE [Jung, Heo, Chen. ICDM’12]

Fast, high quality: CHALLENGE

Page 9

Our Contribution

Propose a simulation-based fast algorithm

Fast

Comparable to heuristics
Can handle graphs with 60M edges in 20 min.

Accurate

Has a theoretical guarantee
Better than heuristics

Page 10

Outline of Proposed Method

Preprocessing: Generating random graphs (Coin Flip Technique)

Greedy Strategy (with our speed-up techniques)

    S ← ∅
    while |S| < k do
        t ← argmax_{v ∈ V} σ(S ∪ {v}) − σ(S)
        S ← S ∪ {t}

Page 11

Preprocessing: Generating Random Graphs

Coin Flip Technique [Kempe, Kleinberg, Tardos. KDD’03]

Edge e lives w.p. p_e

live edge: success
blocked edge: failure

[Figure: input graph G on vertices A to F and the R random graphs G_1, …, G_R sampled from it]

Computing influence spread σ(S) = counting # of vertices reachable from S on a random graph
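
The coin flip technique above can be sketched as a short sampling routine: every edge is kept independently with probability p_e, and this is repeated R times. The edge-list layout `(u, v, p_e)` and the function name are assumptions for illustration.

```python
import random


def sample_random_graphs(edges, R, seed=0):
    """Return R adjacency dicts, each containing only the live edges of one sample.

    edges: list of (u, v, p_e) triples (hypothetical layout).
    """
    rng = random.Random(seed)
    graphs = []
    for _ in range(R):
        live = {}
        for u, v, p_e in edges:
            # One coin flip per edge: the edge lives with probability p_e.
            if rng.random() < p_e:
                live.setdefault(u, []).append(v)
        graphs.append(live)
    return graphs


# Deterministic toy input: the first edge always lives, the second is always blocked.
samples = sample_random_graphs([("A", "B", 1.0), ("B", "C", 0.0)], R=3)
print(samples)  # [{'A': ['B']}, {'A': ['B']}, {'A': ['B']}]
```

Because the samples are drawn once during preprocessing, every later evaluation of σ reuses the same R graphs.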

Page 12

How to Approximate σ(S)

σ(S) ≈ (1/R) Σ_{i=1}^{R} σ_{G_i}(S)

σ_{G_i}(S) = # of vertices reachable from S on G_i

[Table: one row per vertex v = A, …, F; columns σ_{G_1}(v), …, σ_{G_R}(v) and their average σ(v)]

CHALLENGE: Computing this table as fast as possible (R = 200 columns, 10^6 rows)
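
The averaging formula on this slide can be sketched with a plain BFS per random graph. The two hand-written `live_graphs` below stand in for sampled graphs and are purely illustrative, as are the function names.

```python
from collections import deque


def reach_count(live_graph, seeds):
    """Number of vertices reachable from the seed set in one random graph."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        u = queue.popleft()
        for v in live_graph.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen)


def estimate_sigma(live_graphs, seeds):
    """Average of the per-graph reach counts, i.e. (1/R) * sum_i sigma_{G_i}(S)."""
    return sum(reach_count(g, seeds) for g in live_graphs) / len(live_graphs)


# Two hypothetical random graphs on vertices A, B, C.
live_graphs = [{"A": ["B"], "B": ["C"]}, {"A": ["B"]}]
table = {v: estimate_sigma(live_graphs, [v]) for v in "ABC"}
print(table)  # {'A': 2.5, 'B': 1.5, 'C': 1.0}
```

Filling the table this way costs one BFS per vertex per graph, which is the expense the speed-up techniques aim to reduce.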

Page 13

Proposed Speed-up Techniques (applied to each random graph)

1. Pruned BFS for reachability tests (on random graphs) (we will focus on this)
   [Akiba, Iwata, Yoshida. SIGMOD’13]
   [Yano, Akiba, Iwata, Yoshida. CIKM’13]
   [Akiba, Iwata, Kawarabayashi, Kawata. ALENEX’14]

2. Reducing unnecessary influence recomputations

3. Reducing # of random graphs by the Sample Average Approximation approach
   [Kimura, Saito, Nakano. AAAI’07], [Cheng, Shen, Huang, Zhang, Cheng. CIKM’13], [Sheldon et al., UAI’10]
   We provide a theoretical bound

These techniques do NOT affect the estimation of σ(⋅)

CORE IDEA of our paradigm

Page 14

Pruned BFS

Idea: Most BFSs are redundant

Preprocessing: Compute ancestors and descendants of the vertex H with max. degree

Pruning (BFS from v): If v is an ancestor of H, we ignore descendants of H

[Figure: example graph with hub H and vertices A to F]

Reachable vertices in the example: 2 + 4 = (# of vertices visited during BFS) + (# of descendants of H, precomputed)
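
A minimal Python sketch of the pruning rule follows. It assumes the hub is passed in explicitly, counts H itself among its descendants and ancestors, and works on a small hand-made graph; these choices, the names, and the graph are illustrative assumptions and not the authors' implementation.

```python
from collections import deque


def bfs(adj, source, blocked=frozenset()):
    """Vertices reachable from source without ever entering a blocked vertex."""
    if source in blocked:
        return set()
    seen = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in seen and v not in blocked:
                seen.add(v)
                queue.append(v)
    return seen


def reverse(adj):
    """Adjacency dict of the graph with every edge reversed."""
    rev = {}
    for u, vs in adj.items():
        for v in vs:
            rev.setdefault(v, []).append(u)
    return rev


def pruned_reach_counts(adj, vertices, hub):
    """Reach count of every vertex, reusing the hub's descendant set when possible."""
    descendants = bfs(adj, hub)         # computed once; includes the hub itself
    ancestors = bfs(reverse(adj), hub)  # computed once; includes the hub itself
    counts = {}
    for v in vertices:
        if v in ancestors:
            # v reaches the hub, hence all of its descendants: skip them in the BFS.
            visited = bfs(adj, v, blocked=descendants)
            counts[v] = len(visited) + len(descendants)
        else:
            counts[v] = len(bfs(adj, v))
    return counts


# Hypothetical graph: A -> B -> H, H -> D, H -> E -> F, C -> F.
adj = {"A": ["B"], "B": ["H"], "H": ["D", "E"], "E": ["F"], "C": ["F"]}
counts = pruned_reach_counts(adj, "ABCDEFH", hub="H")
print(counts["A"])  # 6: two vertices visited (A, B) plus four descendants of H
```

The pruned search is exact because any vertex reached through a descendant of H is itself a descendant of H, so skipping those vertices loses nothing.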

Page 15

Is Pruned BFS Really Effective?

For path graphs, Pruned BFS is NOT effective: Θ(|V|^2)

But for social networks, Pruned BFS works effectively, since there is a hub (or giant component)

[Figure: a path graph with vertex H, and a social network whose giant component contains hub H]

Page 16

Effect of Pruned BFS on Social Networks (LiveJournal dataset, |V| = 4.8M, |E| = 69M, p_e = 0.1 ∀e)

Average # of visited vertices (from each vertex):

400,000 (Naive BFS) ⇨ 6 (Pruned BFS)

[Figure: # of vertices visited during Naive & Pruned BFSs, shown on a network with hub H and a giant component]

Page 17

Experiments: Influence Spread

We set p_e = P for every edge. Size of seed set = 50

Dataset                Ours (this work)  StaticGreedyDU [Cheng+'13]  IRIE [Jung+'12]  PMIA [Chen+'10]  SAEDV [Jiang+'11]
DBLP, P = 0.01         332               330                         323              317              76
DBLP, P = 0.1          100076            --                          99533            99505            99579
LiveJournal, P = 0.01  47527             --                          41906            40544            26066
LiveJournal, P = 0.1   1686629           --                          1682436          --               1682242

Ours & StaticGreedyDU give the best results
(callout on the table: significantly better)

Dataset      |V|    |E|
DBLP         655K   2.0M
LiveJournal  4.8M   69M

Page 18

Experiments: Running Time [s]

We set p_e = P for every edge. Size of seed set = 50

Environment: Intel Xeon X5670 (2.93GHz), 48GB, Language: C++

Dataset                Ours (this work)  StaticGreedyDU [Cheng+'13]  IRIE [Jung+'12]  PMIA [Chen+'10]  SAEDV [Jiang+'11]
DBLP, P = 0.01         27                117                         77               4                388
DBLP, P = 0.1          52                OOM                         77               289              388
LiveJournal, P = 0.01  327               OOM                         1622             500              1275
LiveJournal, P = 0.1   663               OOM                         1635             OOM              1294

As fast as heuristics

Robust against value of P

Dataset      |V|    |E|
DBLP         655K   2.0M
LiveJournal  4.8M   69M

Page 19

Future Work

Applying to other models

Parallelization

Analysis of Pruned BFS on social networks

Page 21

Supplement


Page 22

Running Time [s] for Each Variant of Our Method

Dataset                Pruned BFS + Technique 2  Naive BFS + Technique 2  Pruned BFS  Naive BFS
DBLP, P = 0.01         27                        26                       149         158
DBLP, P = 0.1          54                        3036                     306         3275
LiveJournal, P = 0.01  327                       1934                     2176        3820
LiveJournal, P = 0.1   634                       272518                   2426        272973

Page 23

Construct a Vertex-weighted DAG from a Random Graph

Strongly Connected Component Decomposition

[Figure: a random graph and the vertex-weighted DAG obtained by contracting its strongly connected components (vertices A, B, C)]
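
One way to build such a DAG is Kosaraju's two-pass algorithm, sketched below in Python. The slide does not say which SCC algorithm is used, and taking each DAG vertex's weight to be the size of its component is an assumption here; function and variable names are illustrative.

```python
def condense(adj, vertices):
    """Return (comp_of, weight, dag) for the SCC condensation of a directed graph."""
    # Pass 1: iterative DFS on the graph, recording vertices in order of completion.
    order, seen = [], set()
    for s in vertices:
        if s in seen:
            continue
        seen.add(s)
        stack = [(s, iter(adj.get(s, [])))]
        while stack:
            u, it = stack[-1]
            for v in it:
                if v not in seen:
                    seen.add(v)
                    stack.append((v, iter(adj.get(v, []))))
                    break
            else:
                order.append(u)
                stack.pop()
    # Pass 2: sweep the reversed graph in decreasing completion order.
    rev = {}
    for u in vertices:
        for v in adj.get(u, []):
            rev.setdefault(v, []).append(u)
    comp_of, weight = {}, []
    for s in reversed(order):
        if s in comp_of:
            continue
        c, size, stack = len(weight), 0, [s]
        comp_of[s] = c
        while stack:
            u = stack.pop()
            size += 1
            for v in rev.get(u, []):
                if v not in comp_of:
                    comp_of[v] = c
                    stack.append(v)
        weight.append(size)  # assumed weight: number of vertices in the component
    # Keep only edges that connect two different components.
    dag = {c: set() for c in range(len(weight))}
    for u in vertices:
        for v in adj.get(u, []):
            if comp_of[u] != comp_of[v]:
                dag[comp_of[u]].add(comp_of[v])
    return comp_of, weight, dag


# Hypothetical graph: cycle a-b-c, cycle d-e, and a single vertex f.
g = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": ["e"], "e": ["d", "f"]}
comp_of, weight, dag = condense(g, "abcdef")
print(sorted(weight))  # [1, 2, 3]
```

On the condensed DAG, the reach of a vertex is the sum of the weights of the components reachable from its own component.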

Page 24

Other Models for Information Diffusion

Linear Threshold Model [Kempe, Kleinberg, Tardos. KDD’03]

Inactive vertex v becomes active if

    Σ_{u: active neighbor of v} q_uv ≥ θ_v

θ_v: threshold chosen from [0, 1] uniformly at random

Equivalent to reachability tests on random graphs

Independent Cascade with Meeting Events [Chen, Lu, Zhang. AAAI’12]

Maximizing the influence spread within a given deadline

We have to consider shortest paths (not only reachability)
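
The Linear Threshold rule can be sketched as a direct simulation. The layout `in_weights[v] = {u: q_uv}`, the function name, and the toy weights are assumptions for illustration; thresholds are drawn from (0, 1] instead of [0, 1] so that a vertex with no active neighbor can never activate.

```python
import random


def simulate_lt(in_weights, seeds, rng):
    """Run one Linear Threshold diffusion and return the set of active vertices.

    in_weights: dict mapping v to {u: q_uv} over in-neighbors u (hypothetical layout).
    """
    # theta_v is uniform on (0, 1]; 1 - random() avoids a threshold of exactly 0.
    theta = {v: 1.0 - rng.random() for v in in_weights}
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v, weights in in_weights.items():
            if v in active:
                continue
            total = sum(q for u, q in weights.items() if u in active)
            if total >= theta[v]:
                active.add(v)
                changed = True
    return active


# Toy weights of 1.0 make the outcome independent of the random thresholds.
weights = {"B": {"A": 1.0}, "C": {"B": 1.0}, "D": {}}
print(sorted(simulate_lt(weights, ["A"], random.Random(0))))  # ['A', 'B', 'C']
```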

Page 25

Running Time for Each Value of P

[Figure: running time plotted against the value of P]

Page 26

A Social Network

http://www.cise.ufl.edu/research/sparse/matrices/SNAP/soc-LiveJournal1.html