stream processing of xpath queries with predicates

Post on 31-Jan-2016


DESCRIPTION

Stream Processing of XPath Queries with Predicates. Ashish Kumar Gupta, Dan Suciu, University of Washington. SIGMOD 2003. Presenter: 蔡明瑾. Introduction. XML messages: used to exchange information. XML stream processing problem: processing XPath queries (filters) on an incoming stream of XML packets. - PowerPoint PPT Presentation

TRANSCRIPT

2004/4/23 1

Stream Processing of XPath Queries with Predicates

Ashish Kumar Gupta, Dan Suciu (University of Washington)

SIGMOD 2003

Presenter: 蔡明瑾

2

Introduction

XML messages: used to exchange information

XML stream processing problem: processing XPath queries (filters) on an incoming stream of XML packets

The workload is very high, and the XPath queries contain multiple predicates

3

Definition - XPath fragment

E is an atomic predicate

4

Definition – XML and SAX Parsers

SAX events: startDocument(), startElement(a), text(s), endElement(a), endDocument()

a: element or attribute label

s: data value

5

<a c="3"> <b>4</b></a>

startDocument() startElement(a) startElement(@c) text("3") endElement(@c) startElement(b) text("4") endElement(b) endElement(a) endDocument()
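
The event sequence above treats an attribute like a small element whose label starts with @. A minimal sketch of how such a stream could be produced with Python's standard xml.sax module is shown here; the class name EventLogger and the helper sax_events are hypothetical names chosen for this illustration, not part of the original slides.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Records SAX events, rewriting each attribute as @-element events."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("startDocument()")

    def startElement(self, name, attrs):
        self.events.append(f"startElement({name})")
        # Each attribute becomes startElement(@c), text(value), endElement(@c).
        for attr_name in attrs.getNames():
            self.events.append(f"startElement(@{attr_name})")
            self.events.append(f'text("{attrs.getValue(attr_name)}")')
            self.events.append(f"endElement(@{attr_name})")

    def characters(self, content):
        # Whitespace-only text is ignored. A real parser may deliver long
        # text in several chunks; this sketch does not merge them.
        if content.strip():
            self.events.append(f'text("{content.strip()}")')

    def endElement(self, name):
        self.events.append(f"endElement({name})")

    def endDocument(self):
        self.events.append("endDocument()")


def sax_events(xml_text):
    """Parse xml_text and return the list of recorded events."""
    handler = EventLogger()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events


events = sax_events('<a c="3"> <b>4</b></a>')
print("\n".join(events))
```

Running it on the document from the slide prints the ten events listed above, one per line.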

6

XML stream processing problem

An XPath expression P is used as a boolean filter

An XML document matches P if and only if P selects at least one node when evaluated on the document's root

Set of filters P = {P1,…,Pn}

Set of their oids I = {o1,…,on}

7

XPush Machine

Modified deterministic pushdown automaton

Simulates the execution of the XPath filters

Input: stream of XML documents

Output: oids

Changes:

States: top-down and bottom-up

Accepts SAX events as input

8

XPush Machine(cont.)

9

SAX callback functions; current state: (qt, qb)

10

P1 = //a[b/text()=1 and .//a[@c>2]]

P2 = //a[@c>2 and b/text()=1]

<a> <b> 1 </b> <a c="3"> <b> 1 </b></a></a>

[Figure: run of the XPush machine on the document above; the state labels shown are qo, q1, q2, q3, q4, q5, q9 and q15]

11

12

13

Compiling a set of XPath filters to an XPush Machine

Convert the XPath filters P1,…,Pn into alternating finite automata (AFAs) A1,…,An

Translate all AFAs into a single XPush machine

14

Step1:Construct the AFA

Nondeterministic finite automata A1,…,An

S: union of all states in A1,…,An

One initial state per automaton: s1,…,sn

Terminal states are OR states labeled with an atomic predicate on data values

πs(v): true if the predicate of state s holds on the data value v ∈ V, else false

15

Step1:Construct the AFA (cont.)

State labels: AND, OR, or NOT

Transition function with ε-transitions: δ: S × (Σ ∪ {ε}) → P(S)

AND and OR states: ε-transitions

NOT states: one outgoing transition

16

Step1:Construct the AFA (cont.)

Given an XML document tree, the AFA accepts the document if an initial state matches the root node

An OR state s matches node x if either x is a data value node v and πs(v) = true, or some state s' ∈ δ(s,a) matches y (a child of x labeled a)

An AND state s matches node x if all states s' ∈ δ(s,ε) match x

A NOT state s matches node x if s' does not match x, where δ(s,ε) = {s'}
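
The matching rules above can be read as a recursive procedure over the document tree. The following is a minimal sketch under stated assumptions: the classes Node and State, the function matches, and the six-state automaton (a hand-built AFA for the hypothetical filter /a[b/text()=1 and @c>2], plus one NOT state) are all invented for this illustration and are not the automata from the slides. It also assumes that a leaf element or attribute carries its data value directly, so a terminal state is tested against the node that holds the value.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

EPS = ""  # stands for the ε label


@dataclass
class Node:
    label: str
    value: Optional[int] = None          # data value carried by a leaf node
    children: list = field(default_factory=list)


@dataclass
class State:
    kind: str                             # "AND", "OR", or "NOT"
    delta: dict = field(default_factory=dict)   # label or EPS -> list of states
    pred: Optional[Callable[[int], bool]] = None  # atomic predicate (terminal)


def matches(afa, s, x):
    """Return True if state s of the AFA matches node x."""
    st = afa[s]
    if st.kind == "AND":
        # All ε-successors must match the same node.
        return all(matches(afa, t, x) for t in st.delta.get(EPS, []))
    if st.kind == "NOT":
        (t,) = st.delta[EPS]              # exactly one outgoing transition
        return not matches(afa, t, x)
    # OR state: terminal predicate on the data value ...
    if st.pred is not None and x.value is not None and st.pred(x.value):
        return True
    # ... or some successor matches a child with the right label.
    return any(matches(afa, t, y)
               for y in x.children
               for t in st.delta.get(y.label, []))


# Hypothetical AFA for /a[b/text()=1 and @c>2]; state 0 is the initial state.
afa = {
    0: State("OR", {"a": [1]}),
    1: State("AND", {EPS: [2, 4]}),
    2: State("OR", {"b": [3]}),
    3: State("OR", pred=lambda v: v == 1),
    4: State("OR", {"@c": [5]}),
    5: State("OR", pred=lambda v: v > 2),
    6: State("NOT", {EPS: [4]}),          # matches a node without @c > 2
}

doc = Node("/", children=[Node("a", children=[Node("@c", 3), Node("b", 1)])])
other = Node("/", children=[Node("a", children=[Node("b", 1)])])
print(matches(afa, 0, doc), matches(afa, 0, other))
```

This top-down recursion is only meant to make the definition concrete; the XPush machine computes the same information bottom-up over SAX events.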

17

AFAs for P1,P2

18

example1

S = {1,..,13}

s1 = 1, s2 = 8

Wildcard: δ(5,@c) = Ø, δ(5,b) = 5, δ(5,a) = 6

AND states: states 2 and 9

π7(55) = true, π2(v) = false

Each state corresponds to a subquery in XPath: state 2 corresponds to [b/text()=1 and .//a[@c>2]]

19

Step2: construct XPush Machine

(Qt, Qb, qot, qob, tpush, tvalue, tpop, …)

20

tpop(qb,a) = δ⁻¹(qb,a)

δ⁻¹(q,a) = {s' | δ(s',a) ∩ q ≠ Ø}

eval(q), for a set of states q: adds to q all states that are implied by states already in q (AND states, OR states, NOT states)
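
A small sketch of these two operations may help. It assumes the transition function is stored as a dictionary from (state, label) to a set of states, and it uses a hypothetical AFA for the filter /a[b/text()=1 and @c>2] with made-up state numbers 0 to 5. The function names delta_inv and eval_closure are chosen for this illustration. NOT states are left out of the closure here, because they need their target to be fully evaluated first.

```python
EPS = ""  # stands for the ε label


def delta_inv(delta, q, a):
    """δ⁻¹(q, a): all states s' with δ(s', a) ∩ q ≠ Ø."""
    return frozenset(s for (s, label), targets in delta.items()
                     if label == a and targets & set(q))


def eval_closure(delta, kind, q):
    """Add to q the AND/OR states implied, via ε-transitions, by states in q."""
    q = set(q)
    changed = True
    while changed:
        changed = False
        for (s, label), targets in delta.items():
            if label != EPS or s in q:
                continue
            if kind[s] == "AND":
                implied = targets <= q        # every ε-successor is present
            elif kind[s] == "OR":
                implied = bool(targets & q)   # some ε-successor is present
            else:
                implied = False               # NOT: omitted in this sketch
            if implied:
                q.add(s)
                changed = True
    return frozenset(q)


# Hypothetical AFA: 0 -a-> 1 (AND) -ε-> {2, 4}; 2 -b-> 3 (=1); 4 -@c-> 5 (>2)
delta = {(0, "a"): {1}, (1, EPS): {2, 4}, (2, "b"): {3}, (4, "@c"): {5}}
kind = {0: "OR", 1: "AND", 2: "OR", 3: "OR", 4: "OR", 5: "OR"}

# Bottom-up run on <a c="3"><b>1</b></a>:
q_b = delta_inv(delta, {3}, "b")              # value 1 matched state 3
q_c = delta_inv(delta, {5}, "@c")             # value 3 matched state 5
q_a = eval_closure(delta, kind, q_b | q_c)    # the AND state 1 is implied
q_root = delta_inv(delta, q_a, "a")
print(sorted(q_b), sorted(q_c), sorted(q_a), sorted(q_root))
```

The filter is reported as matched when the initial state (0 in this sketch) appears in the set computed for the root.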

21

XPush Machine

22

example2

tvalue(qot,1) = {4,13} = q1

tvalue(qot,x) = {7,11} = q2, for x > 2

tvalue(qot,x) = Ø = qo, for all other values of x

tpop(q8,a) = {1,5} = q14

tbadd(q3,q6) = {3,12} ∪ {5} = q8

Leaf states cannot be combined with any other states, because mixed content is not allowed

For example, <a>1<b>2</b></a> is not allowed

23

Lazy XPush Machine

Does not construct states that are inconsistent with the DTD

Lazy evaluation exploits regularities in the data that are not captured by the DTD

Avoids constructing states that do not occur in a given data set

24

Top-down Pruning

<e1>….<c>ci1</c>…..<c>cij</c>…</e1>

Keep track of the enabled branches in the top-down state

Perform bottom-up computations only at the enabled branches

25

Order Optimization

/person[name/text()="smith" and age/text()="33" and phone/text()="5551234"]

prec(s) = {s' | s' < s}, the states that precede s in the chosen order

tadd(qsb,qb) = qsb ∪ {s | s ∈ qb, prec(s) ⊆ qsb}
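
The ordered tadd can be sketched as follows, assuming a single filter whose three predicate states are ordered name, age, phone. The string state names, the list called order, and the function names prec and tadd_ordered are hypothetical and used only for this illustration.

```python
def prec(order, s):
    """prec(s): the states that precede s in the chosen order."""
    return set(order[:order.index(s)])


def tadd_ordered(order, q_acc, q_new):
    """Add a state from q_new only if all of its predecessors are in q_acc."""
    q_acc = set(q_acc)
    return q_acc | {s for s in q_new if prec(order, s) <= q_acc}


order = ["name", "age", "phone"]   # assumed order of the predicates

# age arrives before name has matched: it is dropped.
q1 = tadd_ordered(order, set(), {"age"})
# name matches first, then age, then phone: each one is added in turn.
q2 = tadd_ordered(order, set(), {"name"})
q3 = tadd_ordered(order, q2, {"age"})
q4 = tadd_ordered(order, q3, {"phone"})
print(sorted(q1), sorted(q2), sorted(q3), sorted(q4))
```

With this rule, a set such as {age} alone is never produced, so fewer distinct bottom-up states can arise.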

26

Training the XPush Machine

Generate one XML document tree for every XPath query

27

Experiment

Real data set: Protein

9.12MB XML fragment

A non-recursive DTD

Max depth of document is 7

28

Effectiveness

29

Runtime memory

30

Hit Ratio
