Stream Processing of XPath Queries with Predicates
DESCRIPTION
Stream Processing of XPath Queries with Predicates. Ashish Kumar Gupta, Dan Suciu (University of Washington). SIGMOD 2003. Presenter: 蔡明瑾. Introduction: XML messages are used to exchange information. XML stream processing problem: processing XPath queries (filters) on an incoming stream of XML packets.
TRANSCRIPT
2004/4/23 1
Stream Processing of XPath Queries with Predicates
Ashish Kumar Gupta, Dan Suciu (University of Washington)
SIGMOD 2003
Presenter: 蔡明瑾
2
Introduction
XML messages are used to exchange information
XML stream processing problem: processing XPath queries (filters) on an incoming stream of XML packets
The workload is very high
XPath queries have multiple predicates
3
Definition - XPath fragment
E denotes an atomic predicate
4
Definition – XML and SAX Parsers
SAX events: startDocument(), startElement(a), text(s), endElement(a), endDocument()
a: element or attribute label
s: data value
5
<a c="3"> <b>4</b></a>
startDocument()
startElement(a)
startElement(@c)
text("3")
endElement(@c)
startElement(b)
text("4")
endElement(b)
endElement(a)
endDocument()
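The event sequence above can be reproduced with a standard SAX parser. The following is a minimal sketch using Python's `xml.sax`; the class name `EventLogger` and the helper `sax_events` are hypothetical names introduced here, and the treatment of attributes as `@`-labeled child elements follows the convention of the slide rather than anything the parser does by itself.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Records SAX events in the notation used on the slide."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("startDocument()")

    def startElement(self, name, attrs):
        self.events.append(f"startElement({name})")
        # Slide convention (an assumption of this sketch): every attribute
        # is reported as a child element labeled @name with a text value.
        for attr in attrs.getNames():
            self.events.append(f"startElement(@{attr})")
            self.events.append(f'text("{attrs.getValue(attr)}")')
            self.events.append(f"endElement(@{attr})")

    def characters(self, content):
        # Whitespace-only chunks are skipped. A real parser may also split
        # long text into several chunks, which this sketch does not merge.
        if content.strip():
            self.events.append(f'text("{content.strip()}")')

    def endElement(self, name):
        self.events.append(f"endElement({name})")

    def endDocument(self):
        self.events.append("endDocument()")


def sax_events(xml_text):
    """Parse xml_text and return the list of recorded events."""
    handler = EventLogger()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events


if __name__ == "__main__":
    for event in sax_events('<a c="3"> <b>4</b></a>'):
        print(event)
```

Running the script on the document of this slide prints the same ten events, in the same order.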
6
XML stream processing problem
An XPath expression P is used as a boolean filter
An XML document matches P if and only if P selects at least one node when evaluated on the document's root
Set of filters: P = {P1,…,Pn}
Set of oids: I = {o1,…,on}
7
XPush Machine
A modified deterministic pushdown automaton
Simulates the execution of the XPath filters
Input: stream of XML documents
Output: oids
Changes:
States: top-down and bottom-up
Accepts SAX events as input
8
XPush Machine(cont.)
9
SAX callback functions; the current state is the pair (qt, qb)
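The callback bodies themselves were on a figure that did not survive in this transcript. The sketch below shows one plausible way such callbacks could maintain the pair (qt, qb) with a stack; it is an assumption of this example, not a copy of the slide. The class `XPushRuntime`, the toy transition functions, and the toy state numbers 1 to 3 (for the single filter /a[b/text()=1]) are all hypothetical.

```python
class XPushRuntime:
    """Drives four caller-supplied transition functions from SAX-style events."""

    def __init__(self, q0_top, q0_bottom, t_push, t_value, t_pop, t_add):
        self.q0_bottom = q0_bottom
        self.t_push, self.t_value = t_push, t_value
        self.t_pop, self.t_add = t_pop, t_add
        self.qt, self.qb = q0_top, q0_bottom  # current state (qt, qb)
        self.stack = []

    def start_element(self, a):
        # Save the parent's state, move top-down, restart bottom-up.
        self.stack.append((self.qt, self.qb))
        self.qt = self.t_push(self.qt, a)
        self.qb = self.q0_bottom

    def text(self, s):
        self.qb = self.t_value(self.qt, s)

    def end_element(self, a):
        # Lift the child's bottom-up state over label a, then merge it
        # into the bottom-up state saved for the parent.
        lifted = self.t_pop(self.qb, a)
        self.qt, parent_qb = self.stack.pop()
        self.qb = self.t_add(parent_qb, lifted)


EMPTY = frozenset()


# Toy functions: 3 = "text is 1", 2 = "b whose text is 1", 1 = "a with such a b".
def toy_t_push(qt, a):
    return qt  # the top-down state is not used in this toy


def toy_t_value(qt, s):
    return frozenset({3}) if s.strip() == "1" else EMPTY


def toy_t_pop(qb, a):
    if a == "b" and 3 in qb:
        return frozenset({2})
    if a == "a" and 2 in qb:
        return frozenset({1})
    return EMPTY


def toy_t_add(q1, q2):
    return q1 | q2


def run(events):
    """Feed (callback_name, argument) pairs and return the final bottom-up state."""
    machine = XPushRuntime(EMPTY, EMPTY, toy_t_push, toy_t_value, toy_t_pop, toy_t_add)
    for kind, arg in events:
        getattr(machine, kind)(arg)
    return machine.qb
```

In this toy, the filter is considered matched when the final bottom-up state contains state 1, for example after the events of `<a><b>1</b></a>`.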
10
P1 = //a[b/text()=1 and .//a[@c>2]]
P2 = //a[@c>2 and b/text()=1]
<a> <b> 1 </b> <a c="3"> <b> 1 </b> </a> </a>
[Figure: the document above annotated with the XPush machine states reached while processing it; state names appearing in the figure: q0, q1, q2, q3, q4, q5, q9, q15]
13
Compiling a set of XPath filters to an XPush Machine
Convert the XPath filters P1,…,Pn into alternating finite automata (AFAs) A1,…,An
Translate all AFAs into a single XPush machine
14
Step 1: Construct the AFA
Nondeterministic finite automata A1,…,An
S: union of all states in A1,…,An
One initial state per automaton: s1,…,sn
Terminal states are OR states labeled with an atomic predicate on data values
πs(v): true if the predicate of terminal state s holds on the data value v ∈ V, else false
15
Step 1: Construct the AFA (cont.)
State labels: AND, OR, or NOT
Transition function with ε-transitions: δ: S × (Σ ∪ {ε}) → P(S)
AND and OR states: ε-transitions
NOT states: one outgoing transition
16
Step 1: Construct the AFA (cont.)
Given an XML document tree, the AFA accepts the document if the initial state matches the root node
An OR state s matches node x if:
node x is a data value node and πs(v) = true, or
some transition s' ∈ δ(s,a) matches y, a child of x labeled a
An AND state s matches node x if all transitions s' ∈ δ(s,ε) match x
A NOT state s matches node x if s' does not match x, where δ(s,ε) = {s'}
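The matching rules above translate directly into a recursive check over a document tree. The following is a minimal sketch under several assumptions that are not in the slides: the `Node` and `State` classes, the `"*"` wildcard key, the virtual `#root` node, and the state numbers 0 to 5 (a hand-built automaton for a filter shaped like P2 = //a[@c>2 and b/text()=1]) are all hypothetical, and a data value is modeled as a `value` field on the element or attribute node that carries it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)
    value: object = None  # data value carried by this node, if any


@dataclass
class State:
    kind: str                                  # "AND", "OR", or "NOT"
    eps: list = field(default_factory=list)    # ε-successors
    delta: dict = field(default_factory=dict)  # label -> successors; "*" = wildcard
    pred: object = None                        # atomic predicate of a terminal state


def matches(afa, s, x):
    """Return True if state s of the automaton matches node x."""
    st = afa[s]
    if st.kind == "AND":
        return all(matches(afa, t, x) for t in st.eps)
    if st.kind == "NOT":
        (t,) = st.eps  # exactly one outgoing transition
        return not matches(afa, t, x)
    # OR state: terminal predicate, ε-successor, or a labeled child transition.
    if st.pred is not None and x.value is not None and st.pred(x.value):
        return True
    if any(matches(afa, t, x) for t in st.eps):
        return True
    for y in x.children:
        # In this sketch the wildcard does not apply to attribute labels.
        labels = [y.label] if y.label.startswith("@") else [y.label, "*"]
        for label in labels:
            if any(matches(afa, t, y) for t in st.delta.get(label, [])):
                return True
    return False


# Hypothetical automaton for //a[@c>2 and b/text()=1]; state 0 is initial.
AFA = {
    0: State("OR", delta={"*": [0], "a": [1]}),
    1: State("AND", eps=[2, 4]),
    2: State("OR", delta={"@c": [3]}),
    3: State("OR", pred=lambda v: v > 2),
    4: State("OR", delta={"b": [5]}),
    5: State("OR", pred=lambda v: v == 1),
}

# <a> <b>1</b> <a c="3"> <b>1</b> </a> </a> under a virtual root node.
INNER = Node("a", [Node("@c", value=3), Node("b", value=1)])
DOC = Node("#root", [Node("a", [Node("b", value=1), INNER])])
```

With these definitions, `matches(AFA, 0, DOC)` checks whether the filter accepts the document; only the inner `a` element satisfies both predicates.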
17
AFAs for P1,P2
18
example1
S = {1,…,13}
Initial states: s1 = 1, s2 = 8
Wildcard: δ(5,@c) = Ø, δ(5,b) = 5, δ(5,a) = 6
AND states: states 2 and 9
π7(55) = true, π2(v) = false
Each state corresponds to a subquery of the XPath filter: state 2 corresponds to [b/text()=1 and .//a[@c>2]]
19
Step 2: Construct the XPush Machine
(Qt, Qb, q0t, q0b, tpush, tvalue, tpop, …)
20
tpop(qb,a) = δ⁻¹(qb,a)
δ⁻¹(q,a) = {s' | δ(s',a) ∩ q ≠ Ø}
eval(q): given a set of states q, adds to q all states that are implied by states already in q
Cases: AND states, OR states, NOT states
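Both operations are small set computations. The sketch below is one way to write them, assuming the ε-transitions form an acyclic graph; the dictionaries `DELTA`, `KINDS`, and `EPS` and the state numbers 0 to 5 describe a hypothetical automaton for a filter shaped like //a[@c>2 and b/text()=1], not the numbering of the slides.

```python
def delta_inverse(delta, q, a):
    """δ⁻¹(q, a): states with an a-labeled transition into some state of q."""
    return frozenset(
        s for (s, label), targets in delta.items() if label == a and targets & q
    )


def eval_closure(kinds, eps, q):
    """Add to q every state implied by states already in q via ε-transitions.

    Assumes the ε-graph is acyclic, so the recursion terminates.
    """
    memo = {}

    def holds(s):
        if s not in memo:
            succ = eps.get(s, ())
            if s in q:
                memo[s] = True
            elif not succ:
                memo[s] = False
            elif kinds[s] == "AND":
                memo[s] = all(holds(t) for t in succ)
            elif kinds[s] == "NOT":
                memo[s] = not holds(succ[0])
            else:  # OR state
                memo[s] = any(holds(t) for t in succ)
        return memo[s]

    return frozenset(s for s in kinds if holds(s))


# Hypothetical automaton: 0 -a-> 1 (AND of 2 and 4), 2 -@c-> 3, 4 -b-> 5.
DELTA = {(0, "a"): {1}, (2, "@c"): {3}, (4, "b"): {5}}
KINDS = {0: "OR", 1: "AND", 2: "OR", 3: "OR", 4: "OR", 5: "OR"}
EPS = {1: (2, 4)}

if __name__ == "__main__":
    after_b = delta_inverse(DELTA, frozenset({5}), "b")    # pop over label b
    after_c = delta_inverse(DELTA, frozenset({3}), "@c")   # pop over label @c
    print(sorted(eval_closure(KINDS, EPS, after_b | after_c)))
```

In this toy automaton the AND state 1 is added only once both of its ε-successors, 2 and 4, are present.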
21
XPush Machine
22
example2
tvalue(q0t,1) = {4,13} = q1
tvalue(q0t,x) = {7,11} = q2, for x > 2
tvalue(q0t,x) = Ø = q0, for all other values of x
tpop(q8,a) = {1,5} = q14
tbadd(q3,q6) = {3,12} ∪ {5} = q8
Leaf states cannot be combined with any other states: no mixed content
<a>1<b>2</b></a> is not allowed
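The three tvalue cases of this example can be mimicked by collecting the terminal states whose atomic predicate holds on the value. In the sketch below, the assignment of predicates to states (4 and 13 test equality with 1, 7 and 11 test greater than 2) is inferred from the results listed above, and the top-down state argument is ignored; both are assumptions of this example.

```python
import operator

# Terminal state -> (comparison, constant), inferred from the tvalue results.
TERMINAL_PREDICATES = {
    4: (operator.eq, 1),
    13: (operator.eq, 1),
    7: (operator.gt, 2),
    11: (operator.gt, 2),
}


def t_value(value):
    """Return the set of terminal states whose predicate holds on value."""
    return frozenset(
        s for s, (op, const) in TERMINAL_PREDICATES.items() if op(value, const)
    )


if __name__ == "__main__":
    for v in (1, 55, 2):
        print(v, sorted(t_value(v)))
```

The value 55 lands in the same state as any other value greater than 2, which is why the machine needs only three distinct results for tvalue here.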
23
Lazy XPush Machine
Do not construct states that are inconsistent with the DTD
Lazy evaluation exploits regularities in the data that are not captured by the DTD
Avoid constructing states that do not occur in a given data set
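Lazy construction amounts to filling the transition table on demand and reusing entries afterwards. The following is a minimal sketch of that idea; the class `LazyTable`, its `misses` counter, and the toy `compute` function are hypothetical and stand in for whatever the real machine computes for a (state, symbol) pair.

```python
class LazyTable:
    """Transition table whose entries are computed on first use and then cached."""

    def __init__(self, compute):
        self.compute = compute  # function (state, symbol) -> next state
        self.table = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, state, symbol):
        key = (state, symbol)
        if key in self.table:
            self.hits += 1
        else:
            # Only transitions that actually occur in the data are built.
            self.misses += 1
            self.table[key] = self.compute(state, symbol)
        return self.table[key]


def toy_compute(state, symbol):
    """Stand-in for an expensive state computation (assumption of this sketch)."""
    return frozenset(state | {symbol})


if __name__ == "__main__":
    table = LazyTable(toy_compute)
    for symbol in ["a", "b", "a", "a", "b"]:
        table.lookup(frozenset(), symbol)
    print(table.hits, table.misses, len(table.table))
```

Entries for symbols that never appear in the stream are never created, so the table size tracks the data rather than the full state space.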
24
Top-down Pruning
<e1>….<c>ci1</c>…..<c>cij</c>…</e1>
Keep track of the enabled branches in the top-down state
Perform bottom-up computations only at the enabled branches
25
Order Optimization
/person[name/text()="smith" and age/text()="33" and phone/text()="5551234"]
prec(s) = {s' | s' ≺ s}
tadd(qsb,qb) = qsb ∪ {s | s ∈ qb, prec(s) ⊆ qsb}
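The effect of the ordering can be seen on the person filter of this slide. The sketch below assumes that the relation symbols lost from the formulas are "precedes", ∈, and ⊆, so that a newly matched state is kept only if every state ordered before it has already been matched; the function `make_t_add` and the use of the strings "name", "age", and "phone" as state names are hypothetical.

```python
def make_t_add(order):
    """Build t_add for sibling states listed in their chosen evaluation order."""
    # prec(s): the states that come before s in the order.
    prec = {s: frozenset(order[:i]) for i, s in enumerate(order)}

    def t_add(qs_b, q_b):
        qs_b = frozenset(qs_b)
        # Keep a new state only if all of its predecessors are already matched.
        kept = {s for s in q_b if prec.get(s, frozenset()) <= qs_b}
        return qs_b | kept

    return t_add


if __name__ == "__main__":
    t_add = make_t_add(["name", "age", "phone"])
    q = frozenset()
    for matched in ("name", "age", "phone"):
        q = t_add(q, {matched})
    print(sorted(q))
    # A phone match arriving before name and age is not recorded.
    print(sorted(t_add(frozenset(), {"phone"})))
```

Under this reading, a document whose name does not match never causes states for age or phone to be accumulated, which keeps the number of distinct bottom-up states small.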
26
Training the XPush Machine
Generate one XML document tree for every XPath query
27
Experiment
Real data set: Protein
9.12 MB XML fragment
A non-recursive DTD
Maximum document depth is 7
28
Effectiveness
29
Runtime memory
30
Hit Ratio