![Page 1: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/1.jpg)
Organizing and Searching Information with XML
Selectivity Estimation for XML Queries
Thomas Beer, Christian Linz, Mostafa Khabouze
![Page 2: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/2.jpg)
Outline
• Definition Selectivity Estimation
• Motivation
• Algorithms for Selectivity Estimation
  o Path Trees
  o Markov Tables
  o XPathLearner
  o XSketches
• Summary
![Page 3: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/3.jpg)
Selectivity Definition
The selectivity σ(p) of a path expression p is defined as the number of paths in the XML data tree that match the tag sequence in p.
A
B C
E DD
Example: σ(A/B/D) = 2
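The definition can be sketched as a small recursive count. The tree below is a hypothetical one chosen so that σ(A/B/D) = 2; the exact shape of the slide's tree and the names `selectivity` and `tree` are assumptions made for illustration.

```python
# A node is (tag, children). This tree is an assumed example, not the slide's exact figure.
tree = ("A", [
    ("B", [("D", []), ("D", [])]),
    ("C", [("E", [])]),
])

def selectivity(root, path):
    """Count the rooted paths in the tree whose tag sequence equals `path`."""
    tags = path.strip("/").split("/")

    def count(node, i):
        tag, children = node
        if tag != tags[i]:
            return 0
        if i == len(tags) - 1:
            return 1  # the whole tag sequence has been matched
        return sum(count(child, i + 1) for child in children)

    return count(root, 0)

print(selectivity(tree, "A/B/D"))
```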
![Page 4: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/4.jpg)
Motivation
• Estimating the size of query results and intermediate results is necessary for effective query optimization
• Knowing the selectivities of sub-queries helps to identify cheap query evaluation plans
• Internet context: quick feedback about the expected result size before evaluating the full query result
![Page 5: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/5.jpg)
Example
XQuery-Expression:
FOR $f IN document("personnel.xml")//department/faculty
WHERE count($f/TA) > 0 AND count($f/RA) > 0
RETURN $f
This expression matches all faculty members that have at least one TA and one RA
• One join is computed for every edge
Assumptions
• Number of nodes is known
• Join algorithm: nested loop
Department
Faculty
RA TA
![Page 6: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/6.jpg)
Node        Count
Department  1
Faculty     3
RA          7
TA          2
[Figure: example data tree for personnel.xml with Department, Name, Faculty, Secretary, Scientist, RA and TA nodes]
Evaluating the join
Method 1
Join 1: (Faculty) – TA
Join 2: (Result Join 1) – RA
Join 3: (Result Join 2) – Dep.
Number of operations:
Join 1: 3 * 2 = 6
Join 2: 1 * 7 = 7
Join 3: 1 * 1 = 1
Total = 14
Method 2
Join 1: (Faculty) – Dep.
Join 2: (Result Join 1) – RA
Join 3: (Result Join 2) – TA
Number of operations:
Join 1: 3 * 1 = 3
Join 2: 3 * 7 = 21
Join 3: 3 * 2 = 6
Total = 30
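The two operation counts can be reproduced with a short sketch. It assumes that a nested-loop join costs (size of current result) × (size of the other input), and it takes the intermediate result sizes (1 for Method 1, 3 for Method 2) from the arithmetic on the slide; the function name `nested_loop_cost` is made up for illustration.

```python
def nested_loop_cost(start_size, joins):
    """joins: list of (other_input_size, result_size) pairs, applied in order."""
    total, current = 0, start_size
    for other_size, result_size in joins:
        total += current * other_size  # a nested loop compares every pair once
        current = result_size          # the result feeds the next join
    return total

# Node counts from the slide: Faculty 3, TA 2, RA 7, Department 1.
method_1 = nested_loop_cost(3, [(2, 1), (7, 1), (1, 1)])  # TA, then RA, then Dep.
method_2 = nested_loop_cost(3, [(1, 3), (7, 3), (2, 3)])  # Dep., then RA, then TA
print(method_1, method_2)
```

Joining with the most selective input (TA) first keeps the intermediate results small, which is why knowing selectivities in advance pays off.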
![Page 7: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/7.jpg)
Outline
• Motivation
• Definition Selectivity Estimation
• Algorithms for Selectivity Estimation
  o Path Trees
  o Markov Tables
  o XPathLearner
  o XSketches
• Summary
![Page 8: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/8.jpg)
Representing XML data structure
Path Trees Markov Tables
![Page 9: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/9.jpg)
Path Trees
XML document:
<A> <B></B> <B> <D></D> </B> <C> <D></D> <E></E> <E></E> <E></E> </C> </A>
Path tree (tag: frequency):
A: 1
  B: 2
    D: 1
  C: 1
    D: 1
    E: 3
Problem: The Path Tree may become larger than the available memory
The tree has to be summarized
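A minimal sketch of how a path tree could be built from the document above, using Python's standard `xml.etree.ElementTree`. Storing the tree as a mapping from rooted tag paths to frequencies is a representation chosen here for brevity, and `build_path_tree` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def build_path_tree(xml_text):
    """Map every rooted tag path (as a tuple) to the number of elements reached by it."""
    counts = Counter()

    def visit(element, prefix):
        path = prefix + (element.tag,)
        counts[path] += 1
        for child in element:
            visit(child, path)

    visit(ET.fromstring(xml_text), ())
    return counts

doc = "<A><B></B><B><D></D></B><C><D></D><E></E><E></E><E></E></C></A>"
path_tree = build_path_tree(doc)
for path, frequency in sorted(path_tree.items()):
    print("/".join(path), frequency)
```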
![Page 10: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/10.jpg)
Summarizing a Path Tree
Delete the nodes with the lowest frequencies and replace them with a "*" (star node) to preserve some structural information
4 different algorithms:
• Sibling-*
• Level-*
• Global-*
• No-*
Operation breakdown:
![Page 11: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/11.jpg)
Sibling-*
Operation breakdown:
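• Mark the nodes with the lowest frequencies for deletion
• Check the siblings of each marked node; siblings that are also marked are coalesced into one *-node
• Traverse the tree and compute the average frequency of each *-node
[Figure: example path tree before and after Sibling-* summarization. Each *-node records the number n of coalesced nodes and their total frequency f; for example n=2, f=6 gives an average frequency of 3.]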
![Page 12: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/12.jpg)
Level-*
• As before, delete the nodes with the lowest frequency
• One *-node for every level
[Figure: example path tree before and after Level-* summarization]
![Page 13: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/13.jpg)
Global-*
• Delete the nodes with the lowest frequency
• One *-node for the complete tree
[Figure: example path tree before and after Global-* summarization]
![Page 14: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/14.jpg)
No-*
• Low-frequency nodes are deleted and not replaced
• The tree may become a forest with many roots
No-* conservatively assumes that nodes that do not exist in the summarized path tree did not exist in the original path tree
![Page 15: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/15.jpg)
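Selectivity Estimation
[Figure: Sibling-* summarized path tree with root A, children B and C, and *-nodes]
• Find all nodes that match the tags of the path expression; a *-node matches any tag
• Estimated selectivity = total frequency of these nodes
Example: σ(A/B/F) = 15 + 6 = 21
σ(A/B/Z) = 6
σ(A/C/Z/K) = 11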
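The matching rule can be sketched as follows. The summarized tree below is an assumed reading of the figure: only the frequencies 15, 6 and 11 are confirmed by the three examples, while the frequencies given to B, C and the *-node under C are placeholders that do not affect the results.

```python
# A node is (tag, frequency, children). Frequencies of B, C and the *-node under C are assumed.
summary = ("A", 1, [
    ("B", 9, [("F", 15, []), ("*", 6, [])]),
    ("C", 13, [("*", 8, [("K", 11, [])])]),
])

def estimate(node, tags):
    """Sum the frequencies of all nodes reached by `tags`; a *-node matches any tag."""
    tag, frequency, children = node
    if tag != "*" and tag != tags[0]:
        return 0
    if len(tags) == 1:
        return frequency
    return sum(estimate(child, tags[1:]) for child in children)

print(estimate(summary, ["A", "B", "F"]))       # F node plus the *-node under B
print(estimate(summary, ["A", "B", "Z"]))       # only the *-node can match Z
print(estimate(summary, ["A", "C", "Z", "K"]))  # Z is matched by the *-node under C
```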
![Page 16: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/16.jpg)
Outline
• Motivation
• Definition Selectivity Estimation
• Algorithms for Selectivity Estimation
  o Path Trees
  o Markov Tables
  o XPathLearner
  o XSketches
• Summary
![Page 17: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/17.jpg)
What are Markov Tables ?
• Table containing all distinct paths in the data of length up to m and their selectivity
• m ≥ 2
• Order: m − 1
• Markov Table = Markov Histogram
[Figure: example path tree over the tags A, B, C and D from which the table below is built]

Path  Sel.    Path  Sel.
A     1       AC    6
B     11      AD    4
C     15      BC    9
D     19      BD    7
AB    11      CD    8
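A sketch of how such a table could be derived for m = 2. The path tree below is an assumption: it is one tree that reproduces the selectivities in the table, since the figure itself is not fully recoverable. `markov_table` is a hypothetical helper name.

```python
from collections import Counter

# Rooted tag path -> frequency. Assumed tree, chosen to be consistent with the table.
path_counts = {
    ("A",): 1,
    ("A", "B"): 11,
    ("A", "B", "C"): 9,
    ("A", "B", "C", "D"): 8,
    ("A", "B", "D"): 7,
    ("A", "C"): 6,
    ("A", "D"): 4,
}

def markov_table(path_counts, m=2):
    """Add each node's frequency to every suffix of its rooted path with length 1..m."""
    table = Counter()
    for path, frequency in path_counts.items():
        for k in range(1, m + 1):
            if len(path) >= k:
                table[path[-k:]] += frequency
    return table

table = markov_table(path_counts)
for path, selectivity in sorted(table.items(), key=lambda item: (len(item[0]), item[0])):
    print("".join(path), selectivity)
```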
![Page 18: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/18.jpg)
Selectivity Estimation
• The table provides selectivity estimates for all paths of length up to m
• Assumption: the occurrence of a particular tag in a path depends only on the m−1 tags occurring before it
• Selectivity estimation for longer path expressions is done with the following formula
![Page 19: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/19.jpg)
Selectivity Estimation

$$\sigma(t_1, t_2, \ldots, t_n) = \left( \prod_{i=1}^{n-1} P[t_i \mid t_{i+1}] \right) \cdot P[t_n] \cdot N$$

• P[tn]: probability of tag tn occurring in the XML data tree
• N: total number of nodes in the XML data tree
• P[ti|ti+1]: probability of tag ti occurring before tag ti+1
• E: expectation for the occurrence of tag tn
• E1: expectation for the occurrence of tag ti before tag ti+1
Markov chain: t1 → t2 → t3 → …
![Page 20: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/20.jpg)
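Selectivity Estimation

$$P[t_i] = \frac{f(t_i)}{N}, \qquad P[t_i \mid t_{i+1}] = \frac{f(t_i, t_{i+1})}{f(t_{i+1})}$$

f(p) = selectivity of path p

$$f(t_1, t_2, \ldots, t_n) = \left( \prod_{i=1}^{n-2} \frac{f(t_i, t_{i+1})}{f(t_{i+1})} \right) \cdot f(t_{n-1}, t_n)$$

Example:

$$f(B, C, D) = \frac{f(B, C)}{f(C)} \cdot f(C, D) = \frac{9}{15} \cdot 8 = 4.8$$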
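The estimate for paths longer than m = 2 can be sketched directly from a Markov table. The table values are those of the example table with tags A to D; the function name `markov_estimate` is an assumption.

```python
# Markov table of order 1 (m = 2), keyed by tag tuples.
markov = {
    ("A",): 1, ("B",): 11, ("C",): 15, ("D",): 19,
    ("A", "B"): 11, ("A", "C"): 6, ("A", "D"): 4,
    ("B", "C"): 9, ("B", "D"): 7, ("C", "D"): 8,
}

def markov_estimate(table, path):
    """Paths of length <= 2 are looked up; longer paths chain the length-2 entries."""
    if len(path) <= 2:
        return table.get(path, 0)
    estimate = table[path[-2:]]                     # f(t_{n-1}, t_n)
    for a, b in zip(path[:-2], path[1:-1]):         # i = 1 .. n-2
        estimate *= table[(a, b)] / table[(b,)]     # f(t_i, t_{i+1}) / f(t_{i+1})
    return estimate

print(markov_estimate(markov, ("B", "C", "D")))
```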
![Page 21: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/21.jpg)
Summarizing Markov Tables
The paths with the lowest selectivity are deleted and replaced
3 Algorithms:
• Suffix-*
• Global-*
• No-*
![Page 22: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/22.jpg)
Suffix-*
• * path: represents all deleted paths of length 1
• */* path: represents all deleted paths of length 2
• Deleting a path of length 1 → add it to the path *
• SD: set of deleted paths of length 2
• Deleting a path of length 2 → add it to SD and look for paths in SD with the same start tag
Example: SD = {(A/C), (G/H)}; deleting (A/B) → (A/*), a suffix-* path
• Before checking SD, check the Markov table for a matching suffix-* path
![Page 23: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/23.jpg)
Global-*
• * path: represents all deleted paths of length 1
• */* path: represents all deleted paths of length 2
• Deleting a path of length 1 → add it to the path *
• Deleting a path of length 2 → immediately add it to the path */*
![Page 24: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/24.jpg)
No-*
• Does not use *-paths
• Low-frequency paths are simply discarded
If any of the required paths is not found in the Markov table, its selectivity is conservatively assumed to be zero
![Page 25: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/25.jpg)
Which method should be used ?
"*" vs. "No-*"
• Path exists in the XML data → *-algorithm
• Path does not exist → No-* algorithm
Path Trees vs. Markov Tables
• Data has a common structure → Markov Table
• Data has NO common structure → Path Trees
![Page 26: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/26.jpg)
Outline
• Motivation
• Definition Selectivity Estimation
• Algorithms for Selectivity Estimation
  o Path Trees
  o Markov Tables
  o XPathLearner
  o XSketches
• Summary
![Page 27: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/27.jpg)
Weaknesses of previous methods
• Off-line, scan of the entire data set
• Limited to simple path expressions
• Oblivious to workload distribution
• Updates too expensive
![Page 28: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/28.jpg)
XPathLearner is...
• An on-line self-tuning Markov histogram for XML path selectivity estimation
• on-line: collects statistics from query feedback
• self-tuning: learns Markov model from feedback, adapts to changing XML data
• workload-aware
• supports simple, single-value and multi-value path expressions
![Page 29: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/29.jpg)
Workflow
[Figure: workflow diagram with the components Histogram Learner, Histogram and Selectivity Estimator, connected by the flows training data (initial training), updates, estimated selectivity, feedback (real selectivity) and observed estimation error]
The system uses feedback to update the statistics for the queried path. Updates are based on the observed estimation error.
![Page 30: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/30.jpg)
Basics
• Relies on path trees as intermediate representation
• Uses Markov histogram of order (m-1) to store the path tree and the statistics
• Henceforth m=2: the table stores tag-tag and tag-value pairs and single tags
![Page 31: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/31.jpg)
Data values
• Problem: the number of distinct data values is very large, so the table may become larger than the available memory
• Solution
  • Only the k most frequent tag-value pairs are stored exactly
  • All other pairs are aggregated into buckets according to some feature
  • The feature should distribute the pairs as uniformly as possible
![Page 32: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/32.jpg)
Example, k=1
[Figure: path tree with root A (1) and children B (6) and C (3); data values v3 (3) and v2 (1) below B, v1 (1) below C]

Tag  Count
A    1
B    6
C    3

Tag  Tag  Count
A    B    6
A    C    3

Tag  Value  Count
B    v3     3

Tag  Feat.  Sum  #pairs
B    b      1    1
C    a      1    1

Data value v1 begins with the letter 'a', v2 with the letter 'b'
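A sketch of the top-k plus buckets idea, using the first letter of the value as the bucketing feature as in the example. The concrete value strings ("apple", "banana", "cherry") are invented stand-ins for v1, v2 and v3, and `summarize_values` is a hypothetical helper name.

```python
from collections import defaultdict

def summarize_values(pair_counts, k, feature=lambda value: value[0]):
    """Keep the k most frequent (tag, value) pairs exactly; bucket the rest by a feature."""
    ranked = sorted(pair_counts.items(), key=lambda item: -item[1])
    exact = dict(ranked[:k])
    buckets = defaultdict(lambda: [0, 0])  # (tag, feature) -> [sum, number of pairs]
    for (tag, value), count in ranked[k:]:
        bucket = buckets[(tag, feature(value))]
        bucket[0] += count
        bucket[1] += 1
    return exact, dict(buckets)

# Stand-ins: v1 = "apple" (begins with 'a'), v2 = "banana" (begins with 'b'), v3 = "cherry".
pairs = {("B", "cherry"): 3, ("B", "banana"): 1, ("C", "apple"): 1}
exact, buckets = summarize_values(pairs, k=1)
print(exact)
print(buckets)
```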
![Page 33: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/33.jpg)
Selectivity Estimation

$$\hat{\sigma}(t_1, t_2, \ldots, t_n) = \left( \prod_{i=1}^{n-1} P[t_i \mid t_{i+1}] \right) \cdot P[t_n] \cdot N$$

• P[tn]: probability of tag tn occurring in the XML data tree
• N: total number of nodes in the XML data tree
• P[ti|ti+1]: probability of tag ti occurring before tag ti+1
• E: expectation for the occurrence of tag tn
• E1: expectation for the occurrence of tag ti before tag ti+1 (if n=2, ti+1 = tn)
![Page 34: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/34.jpg)
Selectivity Estimation
• Simple path p = //t1/t2/.../tn

$$\hat{\sigma}(t_1 t_2 \ldots t_n) = \left( \prod_{i=1}^{n-2} \frac{f(t_i, t_{i+1})}{f(t_{i+1})} \right) \cdot f(t_{n-1}, t_n)$$

• Analogous for single-value path p = //t1/t2/.../tn-1=vn-1
• Slightly more complicated for multi-value path
![Page 35: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/35.jpg)
Example

$$\hat{\sigma}(AB{=}v_3) = \frac{f(AB)}{f(B)} \cdot f(B{=}v_3) = \frac{6}{6} \cdot 3 = 3$$

Tag  Count
A    1
B    6
C    3

Tag  Tag  Count
A    B    6
A    C    3

Tag  Value  Count
B    v3     3

Tag  Feat.  Sum  #pairs
B    b      1    1
C    a      1    1

Real selectivity = 3
![Page 36: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/36.jpg)
Updates
• Changes in the data require the statistics to be updated
• Done via a query feedback tuple (p, σ)
  • p denotes the path
  • σ denotes the accurate selectivity of p
• The feedback is attributed to the sub-paths of p according to an update strategy
![Page 37: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/37.jpg)
Learning process
• Given
  • Initially empty Markov histogram f
  • Query feedback (p, σ)
  • Estimated selectivity σ̂
• Learn any unknown length-2 path
• Update selectivities for known paths
• Two strategies
  o Heavy-Tail Rule
  o Delta Rule
![Page 38: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/38.jpg)
Algorithm-Part 1
• Learn new paths of length up to 2
UPDATE(Histogram f, Feedback (p, σ), Estimate σ̂)
  if |p| ≤ 2 then
    if not exists f(p)
      then add entry f(p) = σ
      else f(p) ← σ
• Example: σ̂(AD) = 1 (not in f), σ(AD) = 2

Tag  Count
A    1
B    6
C    3

Tag  Tag  Count
A    B    6
A    C    3
A    D    2

Tag  Value  Count
B    v3     3

Tag  Feat.  Sum  #pairs
B    b      1    1
C    a      1    1
![Page 39: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/39.jpg)
Algorithm-Part 2
• Learn longer paths (decompose into paths of length 2)
  else
    for each (ti, ti+1) ∈ p
      if not exists f(ti, ti+1)
        then add entry f(ti, ti+1) = 1
      f(ti, ti+1) ← update
    endfor
• The update of f(ti, ti+1) depends on the update strategy
![Page 40: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/40.jpg)
Example
σ̂(ACD) = 1, σ(ACD) = 5
• Decompose into AC and CD
• AC is present → update the frequency: f(AC) = 5
• CD is not present → add f(CD) = 1
• Update f(CD): f(CD) = 4

Tag  Count
A    1
B    6
C    3

Tag  Tag  Count
A    B    6
A    C    5
C    D    4

Tag  Value  Count
B    v3     3

Tag  Feat.  Sum  #pairs
B    b      1    1
C    a      1    1
![Page 41: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/41.jpg)
Algorithm-Part 3
• Learn frequency of single tags
for each ti ∈ p, i ≠ 1
  if not exists f(ti)
    then add entry f(ti)
  f(ti) ← max{f(ti), f(·, ti)}
endfor
where f(·, ti) refers to the stored length-2 paths that end in ti
• Example: σ̂(AD) = 1 (not in f), σ(AD) = 2

Tag  Count
A    1
B    6
C    3
D    2

Tag  Tag  Count
A    B    6
A    C    3
A    D    2

Tag  Value  Count
B    v3     3

Tag  Feat.  Sum  #pairs
B    b      1    1
C    a      1    1
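The three parts of the update can be put together in one rough sketch. Several choices here are assumptions rather than the slides' exact definitions: the histogram is a plain dict keyed by tag tuples, the pair update strategy is passed in as a function `update_pair` (hypothetical signature), the estimate argument is left to that function, and f(·, ti) is read as the total over all stored pairs ending in ti.

```python
def update(hist, path, true_selectivity, update_pair):
    """Apply one query feedback (path, true_selectivity) to the histogram `hist`."""
    if len(path) <= 2:
        # Part 1: short paths are stored (or overwritten) with the observed selectivity.
        hist[path] = true_selectivity
    else:
        # Part 2: decompose longer paths into length-2 paths.
        for pair in zip(path, path[1:]):
            if pair not in hist:
                hist[pair] = 1
            hist[pair] = update_pair(hist, path, pair, true_selectivity)
    # Part 3: a single tag must be at least as frequent as the pairs ending in it
    # (assumption: f(., t_i) is the sum over stored pairs ending in t_i).
    for tag in path[1:]:
        incoming = sum(f for key, f in hist.items() if len(key) == 2 and key[1] == tag)
        hist[(tag,)] = max(hist.get((tag,), 0), incoming)
    return hist

hist = {("A",): 1, ("B",): 6, ("C",): 3, ("A", "B"): 6, ("A", "C"): 3}
keep = lambda h, path, pair, sel: h[pair]  # placeholder strategy that changes nothing
update(hist, ("A", "D"), 2, keep)
print(hist[("A", "D")], hist[("D",)])
```

With the feedback σ(AD) = 2 this reproduces the new entries A D 2 and D 2 from the example tables; a real strategy such as the Heavy-Tail or Delta rule would replace `keep`.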
![Page 42: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/42.jpg)
Update strategies: Heavy-Tail-Rule
• Attribute more of the estimation error to the end of the path
[Equation: update of f(ti, ti+1) in terms of the sign of the estimation error between σ(p) and σ̂(p), the weights wi, the normalized weight W and the learning rate]
• where
  • wi: weighting factors (increasing with i, e.g. 2i)
  • learning rate
  • W: normalized weight
![Page 43: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/43.jpg)
Update strategies: Delta-Rule
• Error reduction learning technique
• Minimizes an error function

$$E = (\sigma(p) - \hat{\sigma}(p))^2$$

• Update to term f(ti,ti+1) proportional to the negative gradient of E with respect to f(ti,ti+1)

$$f_{l+1}(t_i, t_{i+1}) \leftarrow f_l(t_i, t_{i+1}) - \gamma \, \frac{\partial E}{\partial f_l(t_i, t_{i+1})}$$

• γ determines the length of a step
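One gradient step of the Delta rule might look like the sketch below. It assumes the squared error between the true and the estimated selectivity, the product-form Markov estimate for m = 2, single-tag counts held fixed, and that every length-2 path occurs only once in p, so that the derivative of the estimate with respect to f(ti, ti+1) is the estimate divided by f(ti, ti+1). The symbol `gamma` for the step length and the function names are choices made here.

```python
def estimate(hist, path):
    """Markov estimate for a path of length >= 2 (m = 2)."""
    result = hist[path[-2:]]
    for a, b in zip(path[:-2], path[1:-1]):
        result *= hist[(a, b)] / hist[(b,)]
    return result

def delta_rule_step(hist, path, true_selectivity, gamma):
    """Move every length-2 entry of `path` against the gradient of the squared error."""
    current = estimate(hist, path)
    error = true_selectivity - current
    updated = {}
    for pair in set(zip(path, path[1:])):
        gradient = -2.0 * error * current / hist[pair]  # dE/df for E = error ** 2
        updated[pair] = hist[pair] - gamma * gradient
    hist.update(updated)
    return current

hist = {("B",): 11, ("C",): 15, ("D",): 19, ("B", "C"): 9, ("C", "D"): 8}
before = delta_rule_step(hist, ("B", "C", "D"), true_selectivity=6, gamma=0.01)
after = estimate(hist, ("B", "C", "D"))
print(before, after)
```

With a small step length the estimate moves from 4.8 towards the observed selectivity of 6; a large `gamma` could overshoot.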
![Page 44: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/44.jpg)
Evaluation
• Good
  • on-line, adapts to changing data
  • workload-aware
  • after learning phase comparable to off-line methods
  • update overhead nearly constant
• Bad
  • still restricted to XML trees, no support for idrefs
![Page 45: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/45.jpg)
Outline
• Motivation
• Definition Selectivity Estimation
• Algorithms for Selectivity Estimation
  o Path Trees and Markov Tables
  o XPathLearner
  o XSketches
• Summary
![Page 46: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/46.jpg)
Preliminaries
XML Data Graph
• A: Author
• P: Paper
• B: Book
• PB: Publisher
• T: Title
• N: Name
[Figure: XML data graph with element nodes P0, A1, A2, PB3, N4, B5, P6, P7, N8, B9, T10, T11, T12, T13, E14 and value nodes V4, V8, V10 to V14]
![Page 47: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/47.jpg)
Preliminaries
Path Expressions
• XPath Expressions:
  • Simple: A/P/T
  • Complex: A[B]/P/T
• Result is a set
[Figure: XML data graph of the running example; nodes T11 and T12 are highlighted]
![Page 48: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/48.jpg)
Preliminaries
Path Expressions
• XPath Expressions:
  • Simple: A/P/T
  • Complex: A[B]/P/T
• Result is a set
[Figure: XML data graph of the running example; node T11 is highlighted]
![Page 49: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/49.jpg)
Preliminaries
Path Expressions
• XPath Expressions:
  • Simple: A/P/T
  • Complex: A[B]/P/T
• Result is a set: {T1,T2}
[Figure: XML data graph of the running example; nodes T11 and T12 are highlighted]
![Page 50: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/50.jpg)
Preliminaries
• Motivation: Selectivity Estimation over XML Data Graphs
• Outline
  o XSketch Synopsis
  o Estimation Framework
  o XSketch Refinement Operations
  o Experiment
![Page 51: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/51.jpg)
XSketch Synopsis
• XML Data Graph
[Figure: XML data graph of the running example]
• General Synopsis Graph
P(1)
A(2) PB(1)
N(2) P(2) B(2)
T(2) T(2) E(1)
Count(A) = |Extent(A)| = |{A1, A2}| = 2
![Page 52: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/52.jpg)
Backward-edge Stability
• XML Data Graph
[Figure: XML data graph of the running example]
• Synopsis Graph
b P(1) b
A(2) PB(1) b b
N(2) P(2) B(2)
b b b
T(2) T(2) E(1)
Label(u,v) = b if all elements in v have a parent in u
![Page 53: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/53.jpg)
Backward-edge Stability
• XML Data Graph
[Figure: XML data graph of the running example]
• Synopsis Graph
b P(1) b
A(2) PB(1) b b
N(2) P(2) B(2)
b b b
T(2) T(2) E(1)
Label(A(2), B(2)) and Label(PB(1), B(2)) are empty: neither A nor PB contains a parent for every element of B
![Page 54: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/54.jpg)
Forward-edge Stability
• XML Data Graph
[Figure: XML data graph of the running example]
• Synopsis Graph
f P(1) f
A(2) PB(1) f f f
N(2) P(2) B(2)
f f
T(2) T(2) E(1)
Label(u,v) = f if all elements in u have a child in v
![Page 55: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/55.jpg)
Forward-edge Stability
• XML Data Graph
[Figure: XML data graph of the running example]
• Synopsis Graph
f P(1) f
A(2) PB(1) f f f
N(2) P(2) B(2)
f f
T(2) T(2) E(1)
B9 is in B(2) but has no child in E(1), so the edge from B(2) to E(1) is not labeled f
![Page 56: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/56.jpg)
XSketch Synopsis
• XML Data Graph
[Figure: XML data graph of the running example]
• XSketch Synopsis Graph
f/b P(1) f/b
A(2) PB(1) f/b f/b Ø f
N(2) P(2) B(2)
f/b f/b b
T(2) T(2) E(1)
An XSketch is a synopsis graph with Label(u,v) ∈ {b, f, b/f, Ø}
![Page 57: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/57.jpg)
Estimation Framework
• Calculate the selectivity for the path expression V = V1/…/Vn
Count(V) = Count(Vn) * f(V)
Case 1: if Label(Vi, Vi+1) contains b for all i, then f(V) = 1, so
Count(V) = Count(Vn)
• Example :
f/b P(1) f/b
A(2) PB(1) f/b f/b f
N(2) P(2) B(2)
f/b f/b b
T(2) T(2) E(1)
Count (A/P/T) = Count (T) * f (A/P/T) = 2
![Page 58: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/58.jpg)
Estimation Framework
Case 2: there exists an i such that Label(Vi, Vi+1) does not contain b
A1. Path Independence Assumption: f(u/v | v/w) ≈ f(u/v)
A2. B-Edge Uniformity Assumption: the incoming edges to V from all parents U of V such that Label(U,V) ≠ b are uniformly distributed over all such parents
• Example :
f/b P(1) f/b
A(2) PB(1) f/b f/b Ø f
N(2) P(2) B(2)
f/b f/b b
T(2) T(2) E(1)
f (P/PB/B/T) = ???
![Page 59: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/59.jpg)
Estimation Framework
• Example: f(P/PB/B/T) = ?
f(P/PB/B/T) = f(B/T) * f(P/PB/B | B/T)
            = f(B/T) * f(PB/B | B/T) * f(P/PB | PB/B/T)
B-stability: f(B/T) = 1 and f(P/PB | PB/B/T) = 1, so f(P/PB/B/T) = f(PB/B | B/T)
A1: ≈ f(PB/B)
A2: = Count(PB) / [Count(PB) + Count(A)]
f(P/PB/B/T) = 1 / (1 + 2) = 1/3
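The two cases can be sketched on the example synopsis. The assignment of labels to edges below is an assumed reading of the slide, node identifiers such as `root`, `T_P` and `T_B` are invented to tell apart nodes that share a tag, and only assumptions A1 and A2 are modelled.

```python
counts = {"root": 1, "A": 2, "PB": 1, "N": 2, "P": 2, "B": 2, "T_P": 2, "T_B": 2, "E": 1}

# Edge (parent, child) -> set of stability labels; the assignment is an assumption.
labels = {
    ("root", "A"): {"f", "b"}, ("root", "PB"): {"f", "b"},
    ("A", "N"): {"f", "b"}, ("A", "P"): {"f", "b"},
    ("A", "B"): set(), ("PB", "B"): {"f"},
    ("P", "T_P"): {"f", "b"}, ("B", "T_B"): {"f", "b"}, ("B", "E"): {"b"},
}

def estimate_count(counts, labels, path):
    """Count(V) = Count(Vn) * f(V), with f(V) built edge by edge under A1 and A2."""
    fraction = 1.0
    for u, v in zip(path, path[1:]):
        if "b" in labels[(u, v)]:
            continue  # b-stable edge: every element of v has a parent in u
        # A2: spread v over all parents whose edge is not b-stable, weighted by count.
        unstable = [p for (p, c), lab in labels.items() if c == v and "b" not in lab]
        fraction *= counts[u] / sum(counts[p] for p in unstable)
    return counts[path[-1]] * fraction

print(estimate_count(counts, labels, ["A", "P", "T_P"]))           # Case 1
print(estimate_count(counts, labels, ["root", "PB", "B", "T_B"]))  # Case 2
```

The second call yields Count(T) * 1/3 under these assumptions, which matches the factor derived in the example.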
![Page 60: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/60.jpg)
Estimation Framework
• A3. Branch-Independence Assumption: outgoing paths from v are conditionally independent of the existence of other outgoing paths
• A4. Forward-Edge Uniformity Assumption: the outgoing edges from v to all children u of v such that Label(u,v) ≠ F are uniformly distributed across all such children
![Page 61: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/61.jpg)
XSketch Refinement Operations
• Goal: construct an efficient XSketch for a given space budget
• Refinement operations:
b-Stabilize(XS(G), u, v): Label(v,u) ≠ B. Refine node u into two element partitions u1, u2 with the same label such that Label(v,u1) = B or Label(v,u2) = B
[Figure: b-Stabilize example: node U, connected to V1, V2, …, Vn, is split into U1 and U2 with a b-labeled edge]
![Page 62: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/62.jpg)
XSketch Refinement Operations
• f-Stabilize(XS(G), u, w):
  • Label(u,w) ≠ F
  • Refine u into two nodes u1, u2 with the same label such that Label(u1,w) = Label(u,w) ∪ {F}
[Figure: f-Stabilize example: node U, connected to W1, W2, …, Wn, is split into U1 and U2 with an f-labeled edge]
![Page 63: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/63.jpg)
XSketch Refinement Operations
Backward Split
[Figure: Backward Split example: node A with count c(A), connected to P1 … Pi and Pi+1 … Pn, is split into A1 and A2]
![Page 64: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/64.jpg)
Markov Tables vs. XSketch
[Figure: average absolute relative error (%, 0 to 100) versus summary size (KB, 15 to 50) for XSketches and Markov Tables (MT)]
Error metric:

$$\frac{1}{|W|} \sum_{p \in W} \frac{|count(p) - estim(p)|}{count(p)}$$
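The error metric of the comparison, the average absolute relative error over a set W of path expressions, can be sketched as below. The two dictionaries hold invented numbers for illustration only and are not results from the experiment.

```python
def avg_abs_rel_error(true_counts, estimates):
    """Mean of |count(p) - estim(p)| / count(p) over all paths p in the workload."""
    errors = [abs(true_counts[p] - estimates[p]) / true_counts[p] for p in true_counts]
    return sum(errors) / len(errors)

# Hypothetical workload of two path expressions.
true_counts = {"A/P/T": 10, "A/B/T": 4}
estimates = {"A/P/T": 8, "A/B/T": 5}
error = avg_abs_rel_error(true_counts, estimates)
print(f"{100 * error:.1f} %")
```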
![Page 65: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/65.jpg)
Outline
• Motivation
• Definition Selectivity Estimation
• Algorithms for Selectivity Estimation
  o Path Trees and Markov Tables
  o XPathLearner
  o XSketches
• Summary
![Page 66: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/66.jpg)
Summary
• Definition Selectivity
• Summarizing XML Documents (Path Trees / Markov Tables)
• Application using Markov Tables: XPathLearner
• Extension of Selectivity Estimation on Graphs: XSketch
![Page 67: Organizing and Searching Information with XML Selectivity Estimation for XML Queries Thomas Beer, Christian Linz, Mostafa Khabouze](https://reader035.vdocuments.pub/reader035/viewer/2022062408/56649e9a5503460f94b9cea4/html5/thumbnails/67.jpg)
Questions?