Stream Processing of XPath Queries with Predicates

2004/4/23. Ashish Kumar Gupta, Dan Suciu, University of Washington. SIGMOD 2003. Presenter: 蔡明瑾



Page 1: Stream Processing of XPath Queries with Predicates

2004/4/23

Stream Processing of XPath Queries with Predicates

Ashish Kumar Gupta, Dan Suciu, University of Washington

SIGMOD 2003

Presenter: 蔡明瑾

Page 2

Introduction

XML messages are used to exchange information.

XML stream processing problem: processing XPath queries (filters) on an incoming stream of XML packets.

The workload is very high: many XPath queries, each with multiple predicates.

Page 3

Definition - XPath fragment

E denotes an atomic predicate.

Page 4

Definition – XML and SAX Parsers

SAX events: startDocument(), startElement(a), text(s), endElement(a), endDocument()

where a is an element or attribute label and s is a data value.

Page 5

<a c="3"> <b>4</b></a>

startDocument() startElement(a) startElement(@c) text("3") endElement(@c) startElement(b) text("4") endElement(b) endElement(a) endDocument()
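The event sequence above can be reproduced with Python's standard `xml.sax` module. This is a minimal sketch, assuming (as the slide does) that each attribute is reported as a child element labelled `@name`, and that whitespace-only text between tags is ignored; `EventRecorder` and `sax_events` are hypothetical names.

```python
import xml.sax


class EventRecorder(xml.sax.ContentHandler):
    """Records SAX callbacks in the slide's notation (attributes become @-labelled children)."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._text = []  # pending character data for the current text node

    def _flush_text(self):
        # characters() may deliver one text node in several pieces; join them first
        s = "".join(self._text).strip()
        self._text = []
        if s:  # whitespace-only text between tags is not reported
            self.events.append(f'text("{s}")')

    def startDocument(self):
        self.events.append("startDocument()")

    def startElement(self, name, attrs):
        self._flush_text()
        self.events.append(f"startElement({name})")
        # Assumption: each attribute is reported as a child element labelled @name
        for attr in attrs.getNames():
            self.events.append(f"startElement(@{attr})")
            self.events.append(f'text("{attrs.getValue(attr)}")')
            self.events.append(f"endElement(@{attr})")

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        self._flush_text()
        self.events.append(f"endElement({name})")

    def endDocument(self):
        self.events.append("endDocument()")


def sax_events(xml_text):
    handler = EventRecorder()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events
```

Calling `sax_events('<a c="3"> <b>4</b></a>')` returns the ten events listed above, in the same order.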

Page 6

XML stream processing problem

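An XPath expression P is used as a boolean filter: an XML document matches P if and only if P selects at least one node when evaluated on the document's root.

Given a set of filters P = {P1, …, Pn} with object identifiers (oids) I = {o1, …, on}, the task is to report, for each document in the stream, the oids of the filters it matches.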
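As a point of reference, the problem statement can be made concrete with a naive evaluator that parses each document and checks every filter against it separately, so the work grows with the number of filters. This is a sketch using `xml.etree.ElementTree`; the filter is hand-translated into Python because ElementTree's XPath subset does not support comparisons such as `@c>2`. The names `p2`, `matching_oids`, and the oid `"o2"` are hypothetical.

```python
import xml.etree.ElementTree as ET


def _num(s):
    """Numeric value of a text or attribute value, or None if it is missing or not numeric."""
    try:
        return float(s)
    except (TypeError, ValueError):
        return None


def p2(root):
    """Hand-coded stand-in for the filter //a[@c>2 and b/text()=1] (illustrative only)."""
    for a in root.iter("a"):  # //a: the root element and all its descendants labelled a
        c = _num(a.get("c"))
        has_b1 = any(_num(b.text) == 1 for b in a.findall("b"))  # b/text()=1
        if c is not None and c > 2 and has_b1:
            return True
    return False


def matching_oids(stream, filters):
    """For each document in the stream, the set of oids whose filter selects some node."""
    results = []
    for doc in stream:
        root = ET.fromstring(doc)
        results.append({oid for oid, f in filters.items() if f(root)})
    return results
```

For example, `matching_oids(['<a><b>1</b><a c="3"><b>1</b></a></a>', '<a c="1"><b>1</b></a>'], {"o2": p2})` reports `o2` for the first document only.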

Page 7

XPush Machine

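A modified deterministic pushdown automaton that simulates the execution of the XPath filters.

Input: a stream of XML documents. Output: oids.

Changes compared with a standard pushdown automaton: states have a top-down and a bottom-up component, and the machine accepts SAX events as input.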

Page 8

XPush Machine (cont.)

Page 9

SAX callback functions; the current state is a pair (qt, qb).
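A minimal sketch of how SAX callbacks could drive such a machine, assuming the usual pushdown discipline: startElement saves (qt, qb) on a stack, text computes a bottom-up state from the data value, and endElement pops and merges the finished child's result into the parent's. The transition-function names follow the later slides (tpush, tvalue, tpop, tbadd); the event encoding and `run_xpush` are hypothetical, and the toy transitions only collect labels and values, they do not implement an XPath filter.

```python
from typing import Callable, FrozenSet, List, Tuple

State = FrozenSet[str]
Event = Tuple[str, str]  # ("start", label), ("text", value) or ("end", label)


def run_xpush(events: List[Event], q_t0: State, q_b0: State,
              t_push: Callable[[State, str], State],
              t_value: Callable[[State, str], State],
              t_pop: Callable[[State, str], State],
              t_badd: Callable[[State, State], State]) -> State:
    """Drive a machine whose current state is the pair (q_t, q_b); return the final q_b."""
    q_t, q_b = q_t0, q_b0
    stack: List[Tuple[State, State]] = []
    for kind, arg in events:
        if kind == "start":              # startElement(a)
            stack.append((q_t, q_b))     # save the parent's pair
            q_t = t_push(q_t, arg)       # top-down: step into the child
            q_b = q_b0                   # bottom-up: the child starts from q_b0
        elif kind == "text":             # text(s)
            q_b = t_value(q_t, arg)      # bottom-up state of a data-value node
        elif kind == "end":              # endElement(a)
            child = t_pop(q_b, arg)      # what the finished child contributes
            q_t, q_b = stack.pop()       # restore the parent's pair
            q_b = t_badd(q_b, child)     # merge with earlier siblings
    return q_b


# Toy transitions (not an XPath filter): q_b collects the labels and values seen below.
toy = {
    "t_push": lambda q_t, a: q_t | {a},
    "t_value": lambda q_t, v: frozenset({"=" + v}),
    "t_pop": lambda q_b, a: q_b | {a},
    "t_badd": lambda q_b, child: q_b | child,
}
events = [("start", "a"), ("start", "b"), ("text", "1"), ("end", "b"), ("end", "a")]
final = run_xpush(events, frozenset(), frozenset(), **toy)
```

The stack only ever holds one saved pair per open element, so memory grows with document depth, not document size.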

Page 10

P1 = //a[b/text()=1 and .//a[@c>2]]
P2 = //a[@c>2 and b/text()=1]

<a> <b> 1 </b> <a c="3"> <b> 1 </b></a></a>

[Figure: the document annotated with XPush machine states (q0, q1, q2, q3, q4, q5, q9, q15).]

Page 11

Page 12

Page 13

Compiling a set of XPath filters to an XPush Machine

Step 1: Convert the XPath filters P1, …, Pn into alternating finite automata (AFAs) A1, …, An.

Step 2: Translate all the AFAs into a single XPush machine.

Page 14

Step 1: Construct the AFA

A1, …, An: nondeterministic finite automata, one per filter

S: the union of all states in A1, …, An

One initial state per automaton: s1, …, sn

Terminal states are OR states labeled with an atomic predicate on data values.

$\pi_s(v)$: true if the atomic predicate of terminal state s holds on the data value $v \in V$, false otherwise

Page 15

Step 1: Construct the AFA (cont.)

Each state is labeled AND, OR, or NOT.

Transitions, including ε-transitions: $\delta : S \times (\Sigma \cup \{\varepsilon\}) \to \mathcal{P}(S)$

AND and OR states: ε-transitions; NOT states: exactly one outgoing transition

Page 16

Step 1: Construct the AFA (cont.)

Given an XML document tree, the AFA accepts the document if its initial state matches the root node. A state s matches a node x as follows:

OR state: x is a data value node with value v and $\pi_s(v) = \text{true}$, or some $s' \in \delta(s, a)$ matches some child y of x labeled a.

AND state: every $s' \in \delta(s, \varepsilon)$ matches x.

NOT state: with $\delta(s, \varepsilon) = \{s'\}$, s matches x if s' does not match x.
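The acceptance rules above translate almost directly into a recursive matcher. This is a minimal sketch: the tree encoding (`(label, value)` for data value nodes, `(label, [children])` otherwise, attributes as `@`-labelled leaves), the table names `KIND`, `DELTA`, `EPS`, `PRED`, and the automaton itself are hypothetical. The automaton is a reconstruction for P2 = //a[@c>2 and b/text()=1], numbered 8 to 13 to stay consistent with the example slides (s2 = 8, AND state 9); it is not the original figure.

```python
# Hypothetical AFA for P2 = //a[@c>2 and b/text()=1] (a reconstruction, not the slide's figure)
KIND = {8: "OR", 9: "AND", 10: "OR", 11: "OR", 12: "OR", 13: "OR"}
DELTA = {(8, "*"): {8}, (8, "a"): {9}, (10, "@c"): {11}, (12, "b"): {13}}
EPS = {9: [10, 12]}  # epsilon transitions of AND/NOT states


def _num(v):
    try:
        return float(v)
    except ValueError:
        return float("nan")  # non-numeric text: every comparison below is False


PRED = {11: lambda v: _num(v) > 2, 13: lambda v: _num(v) == 1}  # terminal states


def successors(s, label):
    """delta(s, label); the wildcard * matches element labels but not attributes."""
    out = set(DELTA.get((s, label), set()))
    if not label.startswith("@"):
        out |= DELTA.get((s, "*"), set())
    return out


def matches(s, node):
    """Does AFA state s match node? Nodes are (label, value) or (label, [children])."""
    _, body = node
    if KIND[s] == "AND":
        return all(matches(t, node) for t in EPS[s])
    if KIND[s] == "NOT":
        return not matches(EPS[s][0], node)
    if isinstance(body, str):  # OR state at a data value node: test the predicate
        return s in PRED and PRED[s](body)
    # OR state at an element: some child labelled a matches some s' in delta(s, a)
    return any(matches(t, child) for child in body for t in successors(s, child[0]))


# Document from the example slide: <a><b>1</b><a c="3"><b>1</b></a></a>
doc = ("#document", [("a", [("b", "1"), ("a", [("@c", "3"), ("b", "1")])])])
```

Here `matches(8, doc)` is true: the inner `a` satisfies both conjuncts of AND state 9. The recursion re-explores subtrees, which is fine for illustration but is exactly the repeated work a streaming machine avoids.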

Page 17

AFAs for P1,P2

Page 18

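Example 1

$S = \{1, \ldots, 13\}$, $s_1 = 1$, $s_2 = 8$

Wildcard: $\delta(5, @c) = \emptyset$, $\delta(5, b) = \{5\}$, $\delta(5, a) = \{6\}$

AND states: 2 and 9

$\pi_7(55) = \text{true}$; $\pi_2(v) = \text{false}$ for every v, since state 2 is an AND state rather than a terminal state

Each state corresponds to a subquery of the XPath expression; e.g., state 2 corresponds to [b/text()=1 and .//a[@c>2]].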

Page 19

Step 2: Construct the XPush machine

$(Q_t, Q_b, q_t^0, q_b^0, t_{push}, t_{value}, t_{pop}, \ldots)$

Page 20

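$t_{pop}(q_b, a) = \delta^{-1}(q_b, a)$, where $\delta^{-1}(q, a) = \{ s' \mid \delta(s', a) \cap q \neq \emptyset \}$

$\mathrm{eval}(q)$: given a set of states q, adds to q all states that are implied by the states already in q: AND states (all ε-successors in q), OR states (some ε-successor in q), and NOT states (ε-successor not in q).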
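A sketch of δ⁻¹, eval, and tpop over sets of AFA states. The 13-state automaton for P1 (states 1 to 7) and P2 (states 8 to 13) is a hypothetical reconstruction, since the AFA figure is not recoverable; it agrees with the Example 1 facts except that δ(5, a) here also contains 5 through the wildcard self-loop. Applying eval to the child's set before taking the preimage is likewise an assumption, chosen because under this reconstruction it reproduces the slide's tpop(q8, a) = {1, 5}.

```python
# Hypothetical 13-state AFA for P1 (states 1-7) and P2 (states 8-13); a reconstruction
KIND = {s: "OR" for s in range(1, 14)}
KIND.update({2: "AND", 9: "AND"})
DELTA = {
    (1, "*"): {1}, (1, "a"): {2},       # P1: //a
    (3, "b"): {4},                      # b/text()=1 (terminal 4)
    (5, "*"): {5}, (5, "a"): {6},       # .//a
    (6, "@c"): {7},                     # @c>2 (terminal 7)
    (8, "*"): {8}, (8, "a"): {9},       # P2: //a
    (10, "@c"): {11}, (12, "b"): {13},  # @c>2 (terminal 11), b/text()=1 (terminal 13)
}
EPS = {2: [3, 5], 9: [10, 12]}


def successors(s, label):
    out = set(DELTA.get((s, label), set()))
    if not label.startswith("@"):  # wildcard * matches elements only
        out |= DELTA.get((s, "*"), set())
    return out


def preimage(q, label):
    """delta^-1(q, a) = {s' | delta(s', a) ∩ q ≠ ∅}."""
    return frozenset(s for s in KIND if successors(s, label) & q)


def eval_states(q):
    """Add every AND/OR/NOT state implied by the states already in q."""
    memo = {}

    def holds(s):
        if s not in memo:
            eps = EPS.get(s, [])
            if KIND[s] == "AND":
                memo[s] = all(holds(t) for t in eps)
            elif KIND[s] == "NOT":
                memo[s] = not holds(eps[0])
            else:  # OR: already in q, or some epsilon-successor holds
                memo[s] = s in q or any(holds(t) for t in eps)
        return memo[s]

    return frozenset(s for s in KIND if holds(s))


def t_pop(q_b, label):
    # Assumption: eval is applied to the child's set before taking the preimage
    return preimage(eval_states(q_b), label)
```

With q8 = {3, 5, 12}, eval adds AND state 2 (both 3 and 5 hold), and the preimage under `a` is {1, 5}.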

Page 21

XPush Machine

Page 22

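Example 2

$t_{value}(q_t^0, 1) = \{4, 13\} = q_1$

$t_{value}(q_t^0, x) = \{7, 11\} = q_2$ for $x > 2$

$t_{value}(q_t^0, x) = \emptyset = q_0$ for all other values of x

$t_{pop}(q_8, a) = \{1, 5\} = q_{14}$

$t_{badd}(q_3, q_6) = \{3, 12\} \cup \{5\} = q_8$

Leaf states cannot match together with any other states, since mixed content is not allowed; a document such as <a>1<b>2</b></a> is not supported.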
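Putting the pieces together gives a small bottom-up machine that reproduces the Example 2 values and reports the oids for the document on page 10. This is a sketch under stated assumptions: the AFA is the same hypothetical reconstruction for P1 and P2, there is no top-down pruning (so the top-down state stays at its initial value), eval is applied inside tpop, and tbadd is taken to be plain union as in the Example 2 line. `functools.lru_cache` memoizes tvalue and tpop, so each transition is computed only when the data first needs it.

```python
import xml.etree.ElementTree as ET
from functools import lru_cache

# Hypothetical AFA for P1 (states 1-7) and P2 (states 8-13), reconstructed for illustration
KIND = {s: "OR" for s in range(1, 14)}
KIND.update({2: "AND", 9: "AND"})
DELTA = {(1, "*"): {1}, (1, "a"): {2}, (3, "b"): {4}, (5, "*"): {5}, (5, "a"): {6},
         (6, "@c"): {7}, (8, "*"): {8}, (8, "a"): {9}, (10, "@c"): {11}, (12, "b"): {13}}
EPS = {2: [3, 5], 9: [10, 12]}
INITIAL = {"o1": 1, "o2": 8}  # oid -> initial AFA state


def _num(v):
    try:
        return float(v)
    except ValueError:
        return float("nan")


PRED = {4: lambda v: _num(v) == 1, 13: lambda v: _num(v) == 1,
        7: lambda v: _num(v) > 2, 11: lambda v: _num(v) > 2}


def successors(s, label):
    out = set(DELTA.get((s, label), set()))
    if not label.startswith("@"):  # wildcard matches elements, not attributes
        out |= DELTA.get((s, "*"), set())
    return out


def eval_states(q):
    memo = {}

    def holds(s):
        if s not in memo:
            eps = EPS.get(s, [])
            if KIND[s] == "AND":
                memo[s] = all(holds(t) for t in eps)
            elif KIND[s] == "NOT":
                memo[s] = not holds(eps[0])
            else:
                memo[s] = s in q or any(holds(t) for t in eps)
        return memo[s]

    return frozenset(s for s in KIND if holds(s))


@lru_cache(maxsize=None)
def t_value(v):  # t_value(q_t^0, v): without pruning, q_t is always q_t^0
    return frozenset(s for s, p in PRED.items() if p(v))


@lru_cache(maxsize=None)
def t_pop(q_b, label):
    ev = eval_states(q_b)
    return frozenset(s for s in KIND if successors(s, label) & ev)


def t_badd(q1, q2):
    return q1 | q2


def events_of(elem):
    """SAX-style events, with attributes reported as @-labelled children."""
    yield ("start", elem.tag)
    for name, value in elem.attrib.items():
        yield from [("start", "@" + name), ("text", value), ("end", "@" + name)]
    if elem.text and elem.text.strip():
        yield ("text", elem.text.strip())
    for child in elem:
        yield from events_of(child)
    yield ("end", elem.tag)


def run(doc):
    """Return the oids of the filters that match one XML document."""
    stack, q_b = [], frozenset()
    for kind, arg in events_of(ET.fromstring(doc)):
        if kind == "start":
            stack.append(q_b)
            q_b = frozenset()
        elif kind == "text":
            q_b = t_value(arg)
        else:  # end: merge the finished child into its parent's bottom-up state
            q_b = t_badd(stack.pop(), t_pop(q_b, arg))
    final = eval_states(q_b)
    return {oid for oid, s in INITIAL.items() if s in final}
```

On the page 10 document both filters match, so `run` returns `{"o1", "o2"}`; `t_pop.cache_info()` shows how many pop transitions were actually built.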

Page 23

Lazy XPush Machine

Do not construct states that are inconsistent with the DTD.

Lazy evaluation exploits regularities in the data that are not captured by the DTD.

Avoid constructing states that do not occur in a given data set.
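Lazy construction amounts to computing each transition the first time the data needs it and caching it afterwards, so states that never occur are never built. A minimal sketch: `LazyTable` is a hypothetical wrapper around any transition function, and it also counts hits and misses so the fraction of transitions served from the table can be measured.

```python
class LazyTable:
    """Memoizes a transition function: an entry is computed only the first time it is needed."""

    def __init__(self, compute):
        self.compute = compute  # (state, symbol) -> next state
        self.table = {}
        self.hits = 0
        self.misses = 0

    def __call__(self, state, symbol):
        key = (state, symbol)
        if key in self.table:
            self.hits += 1
        else:
            self.misses += 1  # first use: build this transition now
            self.table[key] = self.compute(state, symbol)
        return self.table[key]

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


# Usage with a toy transition function over frozensets
table = LazyTable(lambda q, a: q | {a})
table(frozenset(), "a")
table(frozenset(), "a")
table(frozenset({"a"}), "b")
table(frozenset(), "a")
```

After these four lookups, two entries have been built and two lookups were served from the table, so `table.hit_ratio()` is 0.5.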

Page 24

Top-down Pruning

<e1>…<c>ci1</c>…<c>cij</c>…</e1>

Keep track of the enabled branches in the top-down state, and perform the bottom-up computations only at the enabled branches.
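One way to read "enabled branches" is to let the top-down state be the set of AFA states that can still match the current node given its path from the root: tpush follows labelled transitions and ε-edges forward, and tvalue evaluates only the predicates of enabled terminal states. This is a sketch under that assumption, reusing a hypothetical reconstruction of the P2 automaton (states 8 to 13); `closure`, `Q_T0`, and the table names are illustrative.

```python
# Hypothetical AFA for P2 = //a[@c>2 and b/text()=1] (states 8-13), as a reconstruction
DELTA = {(8, "*"): {8}, (8, "a"): {9}, (10, "@c"): {11}, (12, "b"): {13}}
EPS = {9: [10, 12]}


def _num(v):
    try:
        return float(v)
    except ValueError:
        return float("nan")


PRED = {11: lambda v: _num(v) > 2, 13: lambda v: _num(v) == 1}


def successors(s, label):
    out = set(DELTA.get((s, label), set()))
    if not label.startswith("@"):
        out |= DELTA.get((s, "*"), set())
    return out


def closure(states):
    """Add the states reachable through epsilon transitions (children of AND/NOT states)."""
    out, todo = set(states), list(states)
    while todo:
        for t in EPS.get(todo.pop(), []):
            if t not in out:
                out.add(t)
                todo.append(t)
    return frozenset(out)


Q_T0 = closure({8})  # initial top-down state: the filter's initial state


def t_push(q_t, label):
    """States still enabled at a child labelled `label`."""
    return closure({t for s in q_t for t in successors(s, label)})


def t_value(q_t, v):
    """Evaluate only the predicates of terminal states that are enabled here."""
    return frozenset(s for s in q_t if s in PRED and PRED[s](v))
```

Below `a/b`, only state 13 (b/text()=1) is enabled, so a value such as 5 yields the empty set without the @c>2 predicate ever being tested there.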

Page 25

Order Optimization

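/person[name/text()="smith" and age/text()="33" and phone/text()="5551234"]

$\mathrm{prec}(s) = \{ s' \mid s' \prec s \}$, where $s' \prec s$ means that s' must be matched before s

$t_{add}(q_b^s, q_b) = q_b^s \cup \{ s \mid s \in q_b,\ \mathrm{prec}(s) \subseteq q_b^s \}$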
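A sketch of tadd for the /person example, assuming the children always appear in the order name, age, phone (for instance because the DTD says so). The strings stand in for the AFA states of the three predicates, and `ORDER`, `PREC`, and `reachable` are hypothetical. Dropping a state whose predecessors have not been seen is safe under this assumption: a missing earlier predicate can no longer appear, so the conjunction would fail anyway. The effect is that only prefixes of the order can arise as accumulated states.

```python
ORDER = ["name", "age", "phone"]  # assumed sibling order, e.g. guaranteed by a DTD
# prec(s): the states that must precede s; here, the earlier predicates in ORDER
PREC = {s: frozenset(ORDER[:i]) for i, s in enumerate(ORDER)}


def t_add(q_sb, q_b):
    """q_sb ∪ {s in q_b | prec(s) ⊆ q_sb}: keep s only if everything before it was seen."""
    return q_sb | frozenset(s for s in q_b if PREC[s] <= q_sb)


def reachable(add):
    """All accumulated sibling states reachable from ∅ when each child matches one predicate."""
    seen, todo = {frozenset()}, [frozenset()]
    while todo:
        q = todo.pop()
        for s in ORDER:
            nxt = add(q, frozenset({s}))
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen
```

With three predicates, `reachable(t_add)` contains 4 sets (the prefixes), whereas plain union, `reachable(lambda q, c: q | c)`, reaches all 8 subsets.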

Page 26

Training the XPush Machine

Generate one XML document tree for every XPath query
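Generating a training document is straightforward once a filter is viewed as a tree pattern: emit one element per step and pick a value that satisfies each predicate. This sketch assumes the filter has already been converted into a hypothetical `Step` tree with hand-chosen witness values (choosing witnesses automatically is not shown); the generated documents would then presumably be run through the lazy machine so that their states are built before real data arrives.

```python
from dataclasses import dataclass, field
from typing import List, Optional
from xml.sax.saxutils import escape, quoteattr


@dataclass
class Step:
    """Hypothetical tree-pattern node: a label, an optional witness value, child steps."""
    label: str
    witness: Optional[str] = None  # a value chosen to satisfy the step's predicate
    children: List["Step"] = field(default_factory=list)


def training_document(p: Step) -> str:
    """One small XML document matched by the pattern; @-labelled steps become attributes."""
    attrs = "".join(f" {c.label[1:]}={quoteattr(c.witness or '')}"
                    for c in p.children if c.label.startswith("@"))
    inner = "".join(training_document(c) for c in p.children if not c.label.startswith("@"))
    text = escape(p.witness) if p.witness is not None else ""
    return f"<{p.label}{attrs}>{text}{inner}</{p.label}>"


# P2 = //a[@c>2 and b/text()=1], with witnesses 3 and 1 chosen by hand
p2 = Step("a", children=[Step("@c", "3"), Step("b", "1")])
```

`training_document(p2)` yields `<a c="3"><b>1</b></a>`, a document that P2 matches.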

Page 27

Experiment

Real data sets: Protein, a 9.12 MB XML fragment with a non-recursive DTD; the maximum document depth is 7.

Page 28

Effectiveness

Page 29

Runtime memory

Page 30

Hit Ratio