fast mining and forecasting of complex time-stamped eventsyasuko/publications/... · money.com...
TRANSCRIPT
Fast Mining and Forecasting of Complex Time-Stamped Events
Paper ID: 183
Yasuko Matsubara
Kyoto University [email protected]
Yasushi Sakurai
NTT Communication Science Labs [email protected]
Christos Faloutsos
Carnegie Mellon University [email protected]
Tomoharu Iwata
NTT Communication Science Labs [email protected]
Masatoshi Yoshikawa
Kyoto University [email protected]
Motivation Complex time-stamped events Web click events: {time, url, user, access device, http referrer,…}
Challenges
L We cannot see any trends!!
Problem definition Given: A sequence of complex time-stamped events Goal: (1) Find major topics (2) Forecast future events
Timestamp URL User Device 2012-08-01-12:00 CNN.com Smith iphone 2012-08-02-15:00 YouTube.com Brown iphone 2012-08-02-19:00 CNET.com Smith mac
… … … …
Proposed method: TriMine Main idea (1): M-way analysis (a) Complex time-stamped tensor (b) Decompose to 3 topic vectors (c) Inference (Gibbs sampling) Infer k topics for each element according to probability p: Main idea (2): Multi-scale analysis TriMine is linear on the input size N, i.e., (N: counts of events in X, n: duration of X)
u
vn
Object
Actor
Time
Topic1 (business)
Topic2 (news)
Topic3 (media)
…
Object
Actor
Time
TriMine-F – forecasts Use topic matrices O, A, C [step1] Forecast time-topic matrix C’ Forecast C’ using multiple levels of matrices [step2] Generate events using three matrices O, A, C’
(a) Count estimation – X’ = O * A * C’ (b) Complex event generation – sampling approach
Experiments web click data / On demand TV data TriMine-plot (red points: each URL/user/TV program) Benefit of multi-scale forecasting Forecasting accuracy Temporal perplexity RMSE between original & forecasted events Scalability
Conclusions We addressed the problem of complex event mining TriMine has following properties:
• Effective: It finds meaningful patterns in real datasets • Accurate: It enables forecasting • Scalable: It is linear on the database size
Code: http://www.kecl.ntt.co.jp/csl/sirg/people/yasuko/software.html
Original access counts (100 users, 1 week)
Mth order tensor (M=3) cube {object x actor x time} : event count of object i, actor j at time t xijt
e.g., (‘Smith’, ‘CNN.com’, ‘Aug1 12pm’; 21 times)
Object/URL
Actor/user
Time
u
v
n
xijt
e.g., Topic 1 :business topic (Higher value: related topic)
Object/URL
Money.com
CNN.com
Smith Johnson
Actor/user
Time
Mon-Fri Sat-Sun
Object matrix
Actor matrix
Time matrix
u
v n
k
k
k€
O
€
A
C
time actors
€
O
€
A
€
⇒
€
n€
u
€
k
€
u
€
k€
k
€
χ = χ(0)
€
u
€
χ(1)
€
C(1)
€
k
€
u
€
n / l1
€
C(2)
€
k
€
n / l2
€
χ(2)
€
⇒
€
⇒
€
l0
€
l1
€
l2
v
v
v
v
€
n / l0
€
C(0)
Share O & A for all levels
Hourly pattern
Daily pattern
Weekly pattern
Compute C for each level
Future events
Tensor X €
O
€
AC C
€
O
€
AC
time 1−t2−t
€
c r(0)
€
c r(1)
€
c r(2)
r,t(0)c
Forecasted value
[Step 1] [Step 2]
102 103 104102
104
106
Duration
Wal
l clo
ck ti
me
(sec
.)
ARTriMine−F (naive)TriMine−F
7x
74x
10 20 3020
40
60
80
Time (per 2 hours)
RM
SE
Marginal
TriMine−FPLiFT2
1 2 3
103
104
Time (per 1 day)
RM
SE
Marginal
TriMine−FPLiFT2AR
O(N logn)→O(N )
0 100 200 3000
0.005
0.01
0.015
0.02
0.025
Time (per 1 hour)
Valu
e
"Business""Media""Drive"
Weekend
Weekend
"Business"
"Drive"
"Media"
car&biketravelarea,
word−of mouth
newsportal
business news
pet
restaurant
money,finance
URL matrix User matrix Time matrix
"Blog"
"Communication"
"Food" blog
route map, diet
community
pet
rss
web mailmobile mail
restaurant0 100 200 300
0
0.002
0.004
0.006
0.008
0.01
0.012
0.014
Time (per 1 hour)
Valu
e
"Blog""Food""Communication"
11pm4pm
0 20 40 60 800
0.005
0.01
0.015
Time (per 4 hours)
Valu
e
"Sports""Romance""Action"
Saturday
SundaySaturday
"Romance"
"Sports"
"Action"
LOST
Taxi 1 & 2Oceans 11 & 12
Australian Open tennis
French Open tennis final (women)
Desperate Housewives
French Open tennis final (men)
TV matrix Time matrix
0
5
10 x 10−3
Time (per 2hours)
Valu
e
2 topics by TriMine
0 50 100 150 200 250 3000
5
10 x 10−3
Time (per 2hours)
Valu
e
Forecasting − TriMine−F (single)
0 50 100 150 200 250 3000
5
10 x 10−3
Time (per 2hours)
Valu
e
Forecasting − TriMine−F
0
5
10 x 10−3
Time (per 2hours)
Valu
e
2 topics by TriMine
0 50 100 150 200 250 3000
5
10 x 10−3
Time (per 2hours)
Valu
e
Forecasting − TriMine−F (single)
0 50 100 150 200 250 3000
5
10 x 10−3
Time (per 2hours)
Valu
e
Forecasting − TriMine−F
0
5
10 x 10−3
Time (per 2hours)
Valu
e
2 topics by TriMine
0 50 100 150 200 250 3000
5
10 x 10−3
Time (per 2hours)
Valu
e
Forecasting − TriMine−F (single)
0 50 100 150 200 250 3000
5
10 x 10−3
Time (per 2hours)
Valu
e
Forecasting − TriMine−F
Original C matrix Single scale: failed Multi-scale: capture patterns
Any trends?
Can we forecast future events?
0 50 100 150 200 250 3000
100
200
300
Time
Cou
nt o
f clic
ks URL: blog site
Bursty L
Noisy L
URL matrix Time matrix
Similar trends very clear groups
daily + weekly periodicities
180 200 220 240 260 280 300
105
Time (per 2 hours)
Perp
lexi
ty
TriMine−FT2
170 180 190 200 210 220 230 240
106
Time (per 4 hours)
Perp
lexi
ty
TriMine−FT2
T2: [Hong et al.’11], PLiF [Li et al.’10]