![Page 1: Chris Re, Julie Letchner, Magdalena Balazinska and Dan Suciu University of Washington](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649c7f5503460f94934e2b/html5/thumbnails/1.jpg)
EXTRACTING EVENTS FROM PROBABILISTIC
STREAMS
Chris Re, Julie Letchner,
Magdalena Balazinska and Dan Suciu
University of Washington
![Page 2: Chris Re, Julie Letchner, Magdalena Balazinska and Dan Suciu University of Washington](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649c7f5503460f94934e2b/html5/thumbnails/2.jpg)
One Slide Overview
Motivating App: RFID Ecosystem
  ○ Tagged people, cups, books, keys, laptops, etc.
Event queries [Cayuga, SASE, Snoop]
  ○ Alert when anyone enters the coffee room
Two problems
  ○ Missed readings: read rates in practice are low
  ○ Granularity mismatch, e.g. Office v. Antenna 41
Instead, infer location from sensors
Proposal: keep the probabilities and query with PEEX+
PEEX+ (Probabilistic Event EXtraction) keeps data probabilistic to get higher precision and recall (P/R) and is still efficient.
![Page 3: Chris Re, Julie Letchner, Magdalena Balazinska and Dan Suciu University of Washington](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649c7f5503460f94934e2b/html5/thumbnails/3.jpg)
Motivating Apps
RFID apps
  Diary and Active Calendar application
    ○ Alert if I go to a database meeting.
  Supply chain
    ○ Alert if Mach 3 razors are being stolen
Many independent HMMs
  Elder care [Intel, Patterson]
    ○ Alert if elder takes their medicine with water
  Financial applications on predictive HMM
    ○ Alert if head-and-shoulders market
![Page 4: Chris Re, Julie Letchner, Magdalena Balazinska and Dan Suciu University of Washington](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649c7f5503460f94934e2b/html5/thumbnails/4.jpg)
Outline
RFID to Probabilities via Particle Filters
PEEX+ query language
Extended Regular Query Algorithm
Experiments
![Page 5: Chris Re, Julie Letchner, Magdalena Balazinska and Dan Suciu University of Washington](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649c7f5503460f94934e2b/html5/thumbnails/5.jpg)
The source of probabilities
[Figure: 6th floor in PAC, showing the connectivity diagram and the antennas. Each orange particle is a guess of the true location; the blue ring is ground truth.]
![Page 6: Chris Re, Julie Letchner, Magdalena Balazinska and Dan Suciu University of Washington](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649c7f5503460f94934e2b/html5/thumbnails/6.jpg)
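PFs to a (prob) DB

At(tag, loc)

| Tag (person) | t | Loc | P |
|---|---|---|---|
| Joe | 7 | O2 | 0.4 |
| | | H2 | 0.2 |
| | | H3 | 0.4 |
| Joe | 8 | O2 | 0.6 |
| | | H2 | 0.2 |
| | | H3 | 0.2 |
| Sue | 7 | … | … |

To query Particle Filter output, query At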
![Page 7: Chris Re, Julie Letchner, Magdalena Balazinska and Dan Suciu University of Washington](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649c7f5503460f94934e2b/html5/thumbnails/7.jpg)
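Semantics of the Model

At(tag, loc)

| Tag | t | Loc | P |
|---|---|---|---|
| Joe | 7 | O2 | 0.4 |
| | | H2 | 0.2 |
| | | H3 | 0.4 |
| Joe | 8 | O2 | 0.6 |
| | | H2 | 0.2 |
| | | H3 | 0.2 |
| Sue | 7 | … | … |

A possible stream (world) picks one location per tag and timestep:

| Tag | t | Loc |
|---|---|---|
| Joe | 7 | O2 |
| Joe | 8 | O2 |
| Sue | 7 | … |

Prob = 0.4 * 0.6 * …
NB: Markovian correlations OK

Query semantics: sum the weight of all worlds where Q is true at time t.
“Joe enters O2 at t=8”: $(0.2 + 0.4) \times 0.6 = 0.36$, where $0.2 + 0.4$ is the probability that Joe is outside O2 (in H2 or H3) at t=7.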
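
The possible-worlds semantics can be checked by brute force. The sketch below is a minimal illustration, assuming the two timesteps are independent (no Markovian correlations); the names `JOE` and `prob_enters` are hypothetical and not part of PEEX+.

```python
from itertools import product

# At(tag, loc) distribution for Joe, copied from the slide's table.
JOE = {
    7: {"O2": 0.4, "H2": 0.2, "H3": 0.4},
    8: {"O2": 0.6, "H2": 0.2, "H3": 0.2},
}

def prob_enters(dist, room, t):
    """Sum the weights of all worlds where the tag is outside `room` at t-1 and inside it at t."""
    times = sorted(dist)
    total = 0.0
    # Each world picks exactly one (location, probability) pair per timestep.
    for world in product(*(dist[s].items() for s in times)):
        weight = 1.0
        locs = {}
        for s, (loc, p) in zip(times, world):
            weight *= p  # independence assumption: world weight is a product
            locs[s] = loc
        if locs[t - 1] != room and locs[t] == room:
            total += weight
    return total

print(round(prob_enters(JOE, "O2", 8), 2))  # 0.36
```

Enumerating worlds is exponential in the number of timesteps, so this only serves to make the semantics concrete.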
![Page 8: Chris Re, Julie Letchner, Magdalena Balazinska and Dan Suciu University of Washington](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649c7f5503460f94934e2b/html5/thumbnails/8.jpg)
Outline
RFID to Probabilities via Particle Filters
PEEX+ query language
Extended Regular Query Algorithm
Experiments
![Page 9: Chris Re, Julie Letchner, Magdalena Balazinska and Dan Suciu University of Washington](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649c7f5503460f94934e2b/html5/thumbnails/9.jpg)
A hierarchy of PEEX+ queries
Regular Queries
  ○ Alert me when Joe goes to the coffee room
Extended Regular
  ○ Alert when anyone goes to the coffee room
Safe
  ○ Alert when anyone goes to the coffee room and a DB member follows them.
Hard
  ○ Others (Simulation)
This line is sharp for some queries
![Page 10: Chris Re, Julie Letchner, Magdalena Balazinska and Dan Suciu University of Washington](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649c7f5503460f94934e2b/html5/thumbnails/10.jpg)
Peex+ Queries
Fragment of Cayuga; queries define events.

| Operator | Description | Example |
|---|---|---|
| $At(p, l_1)$ | Base stream | $At(p, l_1)$: $p$ in some location |
| $;$ | Sequence | $At(p, l_1);\ At(p, l_2)$: same $p$ in both |
| $\sigma$ | Select | $\sigma_{Room(l_1)}(At(p, l_1))$ |
| $(\cdot)^{+}$ with parameters $\{P, V\}$ | Kleene+ | Kleene+ over $\sigma_{Hall(l)}\,At(p, l)$, with $p$ in the parameter set |

Technical Point: Left-to-right eval, $E_1; E_2; E_3 = (E_1; E_2); E_3$
![Page 11: Chris Re, Julie Letchner, Magdalena Balazinska and Dan Suciu University of Washington](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649c7f5503460f94934e2b/html5/thumbnails/11.jpg)
Regulars and Extended Regular
Query is regular if no variable is shared between subgoals
$\sigma_{l=\text{'502'}}(At(\text{'Joe'}, \text{'501'});\ At(\text{'Joe'}, l))$
Query is extended regular if any variable shared by two subgoals is shared by all subgoals, i.e. it is a templated regular query
$\sigma_{l=\text{'502'}}(At(p, \text{'501'});\ At(p, l))$ ($p$ is shared between subgoals)
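
The two definitions amount to a syntactic test on which variables each subgoal mentions. A minimal sketch, assuming each subgoal is represented only by its set of variable names (the function `classify` and this representation are illustrative assumptions, not the PEEX+ implementation):

```python
def classify(subgoal_vars):
    """Classify a query from the set of variables used by each of its subgoals."""
    all_vars = set().union(*subgoal_vars)
    # A variable is "shared" if it occurs in at least two subgoals.
    shared = {v for v in all_vars if sum(v in g for g in subgoal_vars) >= 2}
    if not shared:
        return "regular"
    # Extended regular: every shared variable occurs in every subgoal.
    if all(shared <= g for g in subgoal_vars):
        return "extended regular"
    return "other"

# At('Joe','501'); At('Joe', l): only l is a variable, and it is not shared.
print(classify([set(), {"l"}]))        # regular
# At(p,'501'); At(p, l): p occurs in both subgoals.
print(classify([{"p"}, {"p", "l"}]))   # extended regular
```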
![Page 12: Chris Re, Julie Letchner, Magdalena Balazinska and Dan Suciu University of Washington](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649c7f5503460f94934e2b/html5/thumbnails/12.jpg)
Wrinkle in the language: Filter v. Selection
“Alert next time Joe is in 502 after he is in 501”
$q_{filter} = At(\text{'Joe'}, \text{'501'});\ At(\text{'Joe'}, \text{'502'})$
“Alert if the next place Joe is in after 501 is 502”
$q_{select} = \sigma_{l=\text{'502'}}(At(\text{'Joe'}, \text{'501'});\ At(\text{'Joe'}, l))$
[Figure: timeline over the At events (Joe, 501), (Joe, 502), (Joe, 503), with Yes/No marks for the two queries]
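
The difference between the two queries shows up on a deterministic stream. The sketch below assumes the stream holds only Joe's At events, one location per timestep; `filter_alert` and `select_alert` are hypothetical helper names.

```python
def filter_alert(locs):
    """q_filter: some '501' is later followed by a '502'; other locations are skipped."""
    seen_501 = False
    for loc in locs:
        if loc == "502" and seen_501:
            return True
        if loc == "501":
            seen_501 = True
    return False

def select_alert(locs):
    """q_select: the very next event after some '501' has location '502'."""
    return any(a == "501" and b == "502" for a, b in zip(locs, locs[1:]))

stream = ["501", "503", "502"]
print(filter_alert(stream))  # True: 502 eventually follows 501
print(select_alert(stream))  # False: the next place after 501 is 503
```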
![Page 13: Chris Re, Julie Letchner, Magdalena Balazinska and Dan Suciu University of Washington](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649c7f5503460f94934e2b/html5/thumbnails/13.jpg)
Outline
RFID to Probabilities via Particle Filters
PEEX+ query language
Extended Regular Query Algorithm
Experiments
![Page 14: Chris Re, Julie Letchner, Magdalena Balazinska and Dan Suciu University of Washington](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649c7f5503460f94934e2b/html5/thumbnails/14.jpg)
Why are ER queries hard?
Regular Queries ~ Regular Expressions
Mapping is non-trivial
  ○ similar to Cayuga [Demers et al. 06]
Queries have #P combined complexity
  ○ Can encode mDNF as regular expression
Intuition: an $n$-sized automaton leads to $\text{time}(2^n)$
Extended regular ~ 1 NFA per person
$k$ persons implies $O(k)$-size automaton
Exponential cost
When ER, can avoid blowup
![Page 15: Chris Re, Julie Letchner, Magdalena Balazinska and Dan Suciu University of Washington](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649c7f5503460f94934e2b/html5/thumbnails/15.jpg)
Algorithm for Regular Queries Overview
Deterministic Algorithm
1. Compile a query $q$
   1. NFA-like thing in a language $L_q$
   2. Mapping $M_q$: events to subsets of $L_q$
2. At runtime, at time $t$ have events $E$
   1. Create set of symbols at time $t$: $M_q(E) = \bigcup_{e \in E} M_q(e)$
   2. Process NFA on $M_q(E)$

Focus on the compilation
![Page 16: Chris Re, Julie Letchner, Magdalena Balazinska and Dan Suciu University of Washington](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649c7f5503460f94934e2b/html5/thumbnails/16.jpg)
Compile Select and Filter
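$q_{filter} = At(\text{'Joe'}, \text{'501'});\ At(\text{'Joe'}, \text{'502'})$
$q_{select} = \sigma_{l=\text{'502'}}(At(\text{'Joe'}, \text{'501'});\ At(\text{'Joe'}, l))$
Intuition: goal maps to two letters:
  ○ match (m): matches filter
  ○ accept (a): accepted by select
$L = \{m_1, a_1, m_2, a_2\}$
[Figure: automaton with edges labeled $a_1$, $a_2$ and $\{m_2\}$, edge conditions “Does contain” / “Does not contain”, and a Final state]
Language and automaton are the same for both queries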
![Page 17: Chris Re, Julie Letchner, Magdalena Balazinska and Dan Suciu University of Washington](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649c7f5503460f94934e2b/html5/thumbnails/17.jpg)
The difference is the mapping
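$q_{filter} = At(\text{'Joe'}, \text{'501'});\ At(\text{'Joe'}, \text{'502'})$
$q_{select} = \sigma_{l=\text{'502'}}(At(\text{'Joe'}, \text{'501'});\ At(\text{'Joe'}, l))$
$L = \{m_1, a_1, m_2, a_2\}$
[Figure: the same automaton, with edges labeled $a_1$, $a_2$ and $\{m_2\}$, edge conditions “Does contain” / “Does not contain”, and a Final state]

| Event | Filter | Select |
|---|---|---|
| $(\text{Joe}, 501)$ | $\{m_1, a_1\}$ | $\{m_1, a_1, m_2\}$ |
| $(\text{Joe}, 502)$ | $\{m_2, a_2\}$ | $\{m_2, a_2\}$ |
| $(\text{Joe}, l_0)$ | $\emptyset$ | $\{m_2\}$ |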
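
One way to read the table is to run a single automaton under both mappings. The sketch below is an illustration under assumptions, not the authors' compilation: the transition rule (advance on $a_1$, accept on $a_2$, drop the run when the symbol set contains $m_2$ without $a_2$, otherwise wait) is inferred from the "Does contain" / "Does not contain" labels, and all identifiers are hypothetical.

```python
# Event-to-symbol mappings from the table (Joe's events, keyed by location).
FILTER_MAP = {"501": {"m1", "a1"}, "502": {"m2", "a2"}}
SELECT_MAP = {"501": {"m1", "a1", "m2"}, "502": {"m2", "a2"}}

def make_mapping(table, other):
    """Return M_q: location -> set of symbols; `other` is used for unlisted locations."""
    return lambda loc: table.get(loc, other)

M_FILTER = make_mapping(FILTER_MAP, set())    # (Joe, l0) maps to the empty set
M_SELECT = make_mapping(SELECT_MAP, {"m2"})   # (Joe, l0) maps to {m2}

START, WAIT, FINAL = 0, 1, 2

def step(states, syms):
    """Assumed transition rule, shared by both queries."""
    nxt = {START}                      # a new match may start at any time
    if START in states and "a1" in syms:
        nxt.add(WAIT)                  # first goal accepted
    if WAIT in states:
        if "a2" in syms:
            nxt.add(FINAL)             # second goal accepted
        elif "m2" not in syms:
            nxt.add(WAIT)              # event does not match goal 2: keep waiting
        # m2 without a2: matched but rejected, so this run is dropped
    return nxt

def alerts(mapping, locations):
    """Return the timesteps at which the automaton reaches FINAL."""
    states, out = {START}, []
    for t, loc in enumerate(locations):
        states = step(states, mapping(loc))
        if FINAL in states:
            out.append(t)
    return out

print(alerts(M_FILTER, ["501", "503", "502"]))  # [2]
print(alerts(M_SELECT, ["501", "503", "502"]))  # []
```

Only the mapping changes between the two calls; `step` is identical for both.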
![Page 18: Chris Re, Julie Letchner, Magdalena Balazinska and Dan Suciu University of Washington](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649c7f5503460f94934e2b/html5/thumbnails/18.jpg)
Regular Queries w. Probabilities
Probabilistic Algorithm
1. Compile a query $q$ (stays the same)
   1. NFA with transitions in a language $L_q$
   2. Mapping $M_q$: events to subsets of $L_q$
2. At time $t$ have events $E$ with probabilities
   1. Create set of symbols at time $t$: $M_q(E) = \bigcup_{e \in E} M_q(e)$ (a distribution on inputs)
   2. Process NFA on $M_q(E)$ (a distribution on states)

State at $t+1$ only depends on state at $t$ and input at $t+1$
Algorithm is constant in data, exponential in $|Q|$
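
The probabilistic step can be sketched by tracking a distribution over sets of NFA states. This is a minimal illustration assuming independent timesteps and the same inferred transition rule as for the deterministic case (advance on $a_1$, accept on $a_2$, drop a run on $m_2$ without $a_2$); the stream values and all identifiers are made up for the example.

```python
from collections import defaultdict

FILTER_MAP = {"501": {"m1", "a1"}, "502": {"m2", "a2"}}
SELECT_MAP = {"501": {"m1", "a1", "m2"}, "502": {"m2", "a2"}}
START, WAIT, FINAL = 0, 1, 2

def step(states, syms):
    """Assumed NFA transition rule on a set of symbols."""
    nxt = {START}
    if START in states and "a1" in syms:
        nxt.add(WAIT)
    if WAIT in states:
        if "a2" in syms:
            nxt.add(FINAL)
        elif "m2" not in syms:
            nxt.add(WAIT)
    return nxt

def alert_probs(table, other, stream):
    """stream: one {location: probability} dict per timestep, assumed independent."""
    dist = {frozenset({START}): 1.0}   # distribution on sets of NFA states
    out = []
    for locs in stream:
        nxt = defaultdict(float)
        for states, p_state in dist.items():
            for loc, p_loc in locs.items():
                syms = table.get(loc, other)
                nxt[frozenset(step(states, syms))] += p_state * p_loc
        dist = dict(nxt)
        # Probability that the query is true at this timestep.
        out.append(sum((p for s, p in dist.items() if FINAL in s), 0.0))
    return out

STREAM = [{"501": 1.0}, {"503": 0.5, "502": 0.5}, {"502": 1.0}]
print(alert_probs(FILTER_MAP, set(), STREAM))   # [0.0, 0.5, 0.5]
print(alert_probs(SELECT_MAP, {"m2"}, STREAM))  # [0.0, 0.5, 0.0]
```

The number of distinct state sets is bounded by the automaton, not by the stream length, which matches the "constant in data, exponential in $|Q|$" remark.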
![Page 19: Chris Re, Julie Letchner, Magdalena Balazinska and Dan Suciu University of Washington](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649c7f5503460f94934e2b/html5/thumbnails/19.jpg)
Extension to Extended regular
“Alert when anyone is in 501 and at the next step is in 502”
$q_{select} = \sigma_{l=\text{'502'}}(At(p, \text{'501'});\ At(p, l))$
If we substitute for $p$, the result is regular:
$q[p \to \text{Joe}] = \sigma_{l=\text{'502'}}(At(\text{'Joe'}, \text{'501'});\ At(\text{'Joe'}, l))$
$q[p \to \text{Tom}] = \sigma_{l=\text{'502'}}(At(\text{'Tom'}, \text{'501'});\ At(\text{'Tom'}, l))$
Bindings use disjoint sets of tuples.
Algorithm: independent copies, multiply
Depends on # distinct values (shared vars), not # of timesteps, so it can stream
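
"Independent copies, multiply" can be read as follows: evaluate the regular query once per binding of $p$, then combine the per-person probabilities. The sketch assumes the per-person results are independent (the bindings use disjoint tuples); `prob_anyone` and the input numbers are hypothetical.

```python
from math import prod

def prob_anyone(per_person_probs):
    """P(the query fires for at least one binding of p), given independent per-person copies."""
    # Multiply the probabilities that each copy does NOT fire, then complement.
    return 1.0 - prod(1.0 - q for q in per_person_probs.values())

print(prob_anyone({"Joe": 0.5, "Tom": 0.5}))  # 0.75
```

The cost of this step grows with the number of distinct people, not with the number of timesteps.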
![Page 20: Chris Re, Julie Letchner, Magdalena Balazinska and Dan Suciu University of Washington](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649c7f5503460f94934e2b/html5/thumbnails/20.jpg)
Recap of Algorithms
Regular Queries
  ○ Compiled them to an NFA, then used image
  ○ Data complexity O(1)
Extended regular
  ○ Several regulars multiplied together
  ○ Depends on number of distinct people in the data, not number of time steps.
Markov Correlations: more arithmetic & state
![Page 21: Chris Re, Julie Letchner, Magdalena Balazinska and Dan Suciu University of Washington](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649c7f5503460f94934e2b/html5/thumbnails/21.jpg)
PEEX+ Algorithms and Analysis
Compilation procedures
Safe plans
  ○ More complicated, based on algebra
  ○ Cost grows with data (useful for archives)
Aggregates
Complexity: Can we do better?
  ○ For a restricted class, draw a crisp line
  ○ Minor variants of safe result in hardness
![Page 22: Chris Re, Julie Letchner, Magdalena Balazinska and Dan Suciu University of Washington](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649c7f5503460f94934e2b/html5/thumbnails/22.jpg)
Outline
RFID to Probabilities via Particle Filters
PEEX+ query language
Extended Regular Query Algorithm
Experiments
![Page 23: Chris Re, Julie Letchner, Magdalena Balazinska and Dan Suciu University of Washington](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649c7f5503460f94934e2b/html5/thumbnails/23.jpg)
Experimental Setup
Quality Experiment
52 objects, 352 locations, 10k sq. ft.
  ○ 2 x 30 min trace with a 10 min break in between
Participants marked down true locations
“Alert when anyone enters the Coffee Room”
$\sigma_{Coffee(l_2)}(\sigma_{Hallway(l_1)}(At(p, l_1));\ At(p, l_2))$
Consider two Scenarios
  ○ Realtime (No correlations) v. MLE
  ○ Archived (Smoothing) v. Viterbi
In practice, can smooth in a short time
![Page 24: Chris Re, Julie Letchner, Magdalena Balazinska and Dan Suciu University of Washington](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649c7f5503460f94934e2b/html5/thumbnails/24.jpg)
Quality: Realtime
Declare an event “true” if its Pr > threshold
Vary threshold
[Figure: Precision, Recall, and F1, each on a 0 to 1 axis, as the threshold varies]
10% improvement in F1
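
The thresholding step and the three metrics can be sketched as below. The event probabilities and ground truth are made-up illustration values, not the experiment's data, and `precision_recall_f1` is a hypothetical helper.

```python
def precision_recall_f1(event_probs, truth, threshold):
    """Declare an event true if its probability exceeds `threshold`, then score against `truth`."""
    predicted = {e for e, p in event_probs.items() if p > threshold}
    tp = len(predicted & truth)
    # Convention used here: a metric with an empty denominator is reported as 0.0.
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

probs = {"e1": 0.9, "e2": 0.6, "e3": 0.3, "e4": 0.1}   # hypothetical query output
truth = {"e1", "e3"}                                   # hypothetical ground truth
for threshold in (0.5, 0.2):
    print(threshold, precision_recall_f1(probs, truth, threshold))
```

Lowering the threshold trades precision for recall, which is why the slide varies it.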
![Page 25: Chris Re, Julie Letchner, Magdalena Balazinska and Dan Suciu University of Washington](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649c7f5503460f94934e2b/html5/thumbnails/25.jpg)
Quality: Archived
Smoothing v. Viterbi
PEEX keeps track of Markovian Correlations
[Figure: Precision, Recall, and F1, each on a 0 to 1 axis]
Approx. 30% gain in F1
![Page 26: Chris Re, Julie Letchner, Magdalena Balazinska and Dan Suciu University of Washington](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649c7f5503460f94934e2b/html5/thumbnails/26.jpg)
Performance
![Page 27: Chris Re, Julie Letchner, Magdalena Balazinska and Dan Suciu University of Washington](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649c7f5503460f94934e2b/html5/thumbnails/27.jpg)
Conclusion
Showed PEEX+
  Processed output of several inference tasks
    ○ Applies more generally than just RFID
  Quality (F1) gains by keeping probability
    ○ 50% from probs, 50% from correlations
  Performance was usable in real time
    ○ No indexing!
Preprint available on request
![Page 28: Chris Re, Julie Letchner, Magdalena Balazinska and Dan Suciu University of Washington](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649c7f5503460f94934e2b/html5/thumbnails/28.jpg)
![Page 29: Chris Re, Julie Letchner, Magdalena Balazinska and Dan Suciu University of Washington](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649c7f5503460f94934e2b/html5/thumbnails/29.jpg)
Future Work
Implementing archived stream indexing.
Aggregations in time
  ○ Aggressive indexing
  ○ Ranking? Top-K?
Sharper lines for complexity
  ○ Are there more streamable queries?
Richer language
  ○ Similar to linear style plans
  ○ What do people need?
Temporal Models!
  ○ Consistency
![Page 30: Chris Re, Julie Letchner, Magdalena Balazinska and Dan Suciu University of Washington](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649c7f5503460f94934e2b/html5/thumbnails/30.jpg)
Correlations
![Page 31: Chris Re, Julie Letchner, Magdalena Balazinska and Dan Suciu University of Washington](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649c7f5503460f94934e2b/html5/thumbnails/31.jpg)
Sequencing by example
Sequencing is parameterized [Cayuga]
$\sigma_{l=\text{'502'}}(At(p, \text{'501'});\ At(p, l))$
[Figure: timeline of events (Joe, 501), (Bob, 502), (Joe, 502), (Joe, 503)]
Semicolon means “the next event among those that match next goal”
Semicolon is not “after”
![Page 32: Chris Re, Julie Letchner, Magdalena Balazinska and Dan Suciu University of Washington](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649c7f5503460f94934e2b/html5/thumbnails/32.jpg)
Compilation by example
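Each goal “corresponds” to two letters:
  ○ move (m): the query should advance
  ○ accept (a): the next subgoal accepts
$q_1 = \sigma_{l=\text{'502'}}(At(\text{'Joe'}, \text{'501'});\ At(\text{'Joe'}, l))$
$L_1 = \{m_1, a_1, m_2, a_2\}$
Mapping $M_q$:
  ○ $(\text{Joe}, 501) \mapsto \{m_1, a_1, m_2\}$
  ○ $(\text{Joe}, 502) \mapsto \{m_2, a_2\}$
  ○ $(\text{Joe}, l_0) \mapsto \{m_2\}$
  ○ Any other maps to empty set
[Figure: automaton with edges labeled $a_1$, $a_2$ and $\{m_2\}$, edge conditions “Does contain” / “Does not contain”, and a Final state]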
![Page 33: Chris Re, Julie Letchner, Magdalena Balazinska and Dan Suciu University of Washington](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649c7f5503460f94934e2b/html5/thumbnails/33.jpg)
Subtle example..
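$q_1 = \sigma_{l=\text{'502'}}(At(\text{'Joe'}, \text{'501'});\ At(\text{'Joe'}, l))$
$L_1 = \{m_1, a_1, m_2, a_2\}$
Mapping $M_1$:
  ○ $(\text{Joe}, 501) \mapsto \{m_1, a_1, m_2\}$
  ○ $(\text{Joe}, 502) \mapsto \{m_2, a_2\}$
  ○ $(\text{Joe}, l_0) \mapsto \{m_2\}$
  ○ Any other maps to empty set
[Figure: automaton with edges labeled $a_1$, $a_2$ and $\{m_2\}$, edge conditions “Does contain” / “Does not contain”, and a Final state]
What about:
$q_2 = At(\text{'Joe'}, \text{'501'});\ At(\text{'Joe'}, \text{'502'})$
Mapping $M_2$ on $(\text{Joe}, l_0)$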