Sept. 27, 2002  ISDB'02
Transforming XPath Queries for Bottom-Up Query Processing
Yoshiharu Ishikawa, Takaaki Nagai, Hiroyuki Kitagawa
University of Tsukuba
{ishikawa,kitagawa}@is.tsukuba.ac.jp

Presentation Overview
- Background
- Motivation and Our Approach
- The Proximal Nodes Model
- Query Translation
- Translation Example
- Related Work
- Conclusions and Future Work

Background
- XML: content-description language on the Web
- XPath
  - pattern-based query language for XML
  - extracts XML nodes based on the specified pattern
  - has navigational semantics
  - XSLT uses XPath for the node specification
  - XQuery also uses XPath

XML Example
    <itemlist>
      <item category="audio equipment">
        <catalog-info>
          <type>CD player</type>
          <manufacturer>Star Electronics</manufacturer>
          <catalog-no>CDP-R55N</catalog-no>
        </catalog-info>
        <sales-info>
          <prod-year>2001</prod-year>
          <price>125.00</price>
        </sales-info>
      </item>
      ......
    </itemlist>

XPath Query
- Sample query Q: retrieve prices of CD players
    /itemlist/item[@category = "audio equipment"]
      [catalog-info/type = "CD player"]/sales-info/price
- XPath sentence
  - contains location steps separated by "/"
  - a location step has the format axis::node_test[predicate]...[predicate]
  - location steps can be abbreviated
    - e.g., /descendant::foo → //foo, /attribute::bar → @bar
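
To make the navigational reading of query Q concrete, here is a minimal Python sketch that evaluates it step by step with the standard library's xml.etree.ElementTree. The document string holds only the one item shown on the XML Example slide (the elided items are not reproduced), and the function name cd_player_prices is made up for illustration. ElementTree supports only a subset of XPath, so the second predicate is checked by hand.

```python
import xml.etree.ElementTree as ET

# Only the single <item> shown on the slide; the elided items are left out.
DOC = """
<itemlist>
  <item category="audio equipment">
    <catalog-info>
      <type>CD player</type>
      <manufacturer>Star Electronics</manufacturer>
      <catalog-no>CDP-R55N</catalog-no>
    </catalog-info>
    <sales-info>
      <prod-year>2001</prod-year>
      <price>125.00</price>
    </sales-info>
  </item>
</itemlist>
"""


def cd_player_prices(xml_text):
    """Evaluate query Q one location step at a time, starting from the root."""
    root = ET.fromstring(xml_text)  # the <itemlist> element
    if root.tag != "itemlist":
        return []
    prices = []
    # Step: item[@category = "audio equipment"]
    for item in root.findall("item[@category='audio equipment']"):
        # Predicate: [catalog-info/type = "CD player"], checked by hand
        types = [t.text for t in item.findall("catalog-info/type")]
        if "CD player" in types:
            # Steps: sales-info/price
            prices.extend(p.text for p in item.findall("sales-info/price"))
    return prices


print(cd_player_prices(DOC))  # ['125.00']
```

Every item element is visited before its predicates are tested, which is the top-down behaviour the following slides contrast with bottom-up processing.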

XPath Semantics
- XPath assumes top-down query processing
  - Not efficient for large XML databases
- Bottom-up processing is better in some cases
- query: /article/authors[author = "Miller"]
[Figure: the example document tree (article, authors, and author elements with the values "Smith", "White", "Chen", and "Miller"), shown once for top-down and once for bottom-up evaluation of the query]

Bottom-Up Query Processing
- We can process the example query when
  - we can determine the specified leaf elements (i.e., "Miller") with the help of an index, and
  - we can select the parent for a specific author node.
- We do not need to access all the authors/author elements
[Figure: the example document tree next to the nodes that bottom-up processing touches: "Miller", author, authors, article]
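
The two conditions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the document layout (two authors groups under one article) is a guess at the figure, the text index and parent map are plain dictionaries built in memory, and the function name authors_with is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumed layout of the example tree; the slide figure is not fully recoverable.
DOC = """
<article>
  <authors><author>Smith</author><author>White</author><author>Chen</author></authors>
  <authors><author>Miller</author></authors>
</article>
"""

root = ET.fromstring(DOC)

# Parent pointers: lets us walk upward from a leaf.
parent_of = {child: parent for parent in root.iter() for child in parent}

# Value index: text content -> elements carrying that text.
text_index = defaultdict(list)
for elem in root.iter():
    if elem.text and elem.text.strip():
        text_index[elem.text.strip()].append(elem)


def authors_with(name):
    """Bottom-up evaluation of /article/authors[author = name]."""
    result = []
    for leaf in text_index.get(name, []):      # start from the index hit
        if leaf.tag != "author":
            continue
        authors = parent_of.get(leaf)          # climb one level
        if authors is None or authors.tag != "authors":
            continue
        article = parent_of.get(authors)       # climb to the document root
        if article is root and article.tag == "article" and authors not in result:
            result.append(authors)
    return result


hits = authors_with("Miller")
print([[a.text for a in group] for group in hits])  # [['Miller']]
```

Only the index entry for "Miller" and its two ancestors are touched; the other author elements are never visited.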

Our Objective and Approach
- Our Objective
  - Efficient bottom-up processing of XPath queries with the help of index structures
- Our Approach
  - Use of the proximal nodes model as the underlying retrieval model
    - The model enables bottom-up query evaluation
  - Development of transformation rules from XPath queries to proximal nodes expressions

The Proximal Nodes Model (1)
- Proposed by Navarro and Baeza-Yates [7] as a structured document retrieval model
- Uses a bottom-up query processing approach
- XML data can be treated as nested nodes:
  - a node corresponds to an element or attribute in XML
  - each node has an associated text region (called the segment); segments can be nested
- Expressive power and efficiency are well-balanced
  - evaluation cost is almost O(n), where n is the number of nodes

The Proximal Nodes Model (2)
The model consists of three components
- Text pattern matching language
  - specifies pattern matching conditions
  - implementation dependent
  - returns a set of the matched nodes
  - example: "ABC Corporation"
- Retrieval operators based on document structures
  - returns a set of nodes for a given element or attribute name
  - example: chapter, price
- Operators to integrate partial retrieval results
  - calculates the result node set from the given node sets
  - efficient computation based on segment relationships

Proximal Nodes Operators
- P in Q: the set of P nodes contained in one or more Q nodes
- P with Q: the set of P nodes that contain one or more Q nodes
- P child Q: the set of P nodes each of which is a child of a Q node
- P parent Q: the set of P nodes each of which is a parent of a Q node
- P + Q: the union of P and Q
- P - Q: the difference of P and Q
- P is Q: the intersection of P and Q
- P same Q: the set of P nodes each of which is equal to a Q node
P and Q are sets of nodes with associated segments
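
As a rough illustration of the operator table, the sketch below implements each operator over Python sets. It is a naive quadratic version, not the efficient segment-based algorithms of the model. The Node layout (id, name, segment offsets, parent id) and the op_* names are assumptions made for this example; containment is taken to mean proper nesting of segments, and "same" is read as having an equal segment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    nid: int                      # hypothetical node identifier
    name: str                     # element or attribute name
    start: int                    # segment start offset
    end: int                      # segment end offset
    parent: Optional[int] = None  # nid of the parent node, if any


def contains(outer, inner):
    """True if inner's segment is properly nested inside outer's segment."""
    return (outer.start <= inner.start and inner.end <= outer.end
            and (outer.start, outer.end) != (inner.start, inner.end))


def op_in(P, Q):      # P in Q
    return {p for p in P if any(contains(q, p) for q in Q)}

def op_with(P, Q):    # P with Q
    return {p for p in P if any(contains(p, q) for q in Q)}

def op_child(P, Q):   # P child Q
    qids = {q.nid for q in Q}
    return {p for p in P if p.parent in qids}

def op_parent(P, Q):  # P parent Q
    parent_ids = {q.parent for q in Q}
    return {p for p in P if p.nid in parent_ids}

def op_union(P, Q):   # P + Q
    return P | Q

def op_diff(P, Q):    # P - Q
    return P - Q

def op_is(P, Q):      # P is Q
    return P & Q

def op_same(P, Q):    # P same Q, read here as "has the same segment as a Q node"
    segments = {(q.start, q.end) for q in Q}
    return {p for p in P if (p.start, p.end) in segments}


# Tiny hand-made node sets (offsets are arbitrary).
item1 = Node(1, "item", 0, 50, parent=0)
item2 = Node(2, "item", 51, 100, parent=0)
sales1 = Node(3, "sales-info", 30, 50, parent=1)
price1 = Node(4, "price", 40, 46, parent=3)
items, prices = {item1, item2}, {price1}

print(sorted(n.nid for n in op_with(items, prices)))     # [1]
print(sorted(n.nid for n in op_child(prices, {sales1})))  # [4]
```

Each operator takes and returns node sets, so results can be nested freely, which is what the expressions on the following slides rely on.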

Example of Proximal Nodes Expression
- Example expression of proximal nodes model
    item with (type same "CD player")
- Query processing steps
  1. determine the node sets that correspond to the elements "item" and "type" using indexes
  2. determine the node set that corresponds to the pattern "CD player" using an index
  3. compute the result of the "same" operator
  4. compute the result of the "with" operator
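
The four processing steps can be traced in a short Python sketch. Everything concrete here is assumed for illustration: the two indexes are hand-written dictionaries, nodes are represented only by their (start, end) segments, and the offsets are invented so that exactly one type element matches the text pattern.

```python
# Hypothetical index contents: name or text pattern -> set of (start, end) segments.
element_index = {
    "item": {(0, 49), (50, 99)},
    "type": {(5, 14), (55, 62)},
}
text_index = {
    "CD player": {(5, 14)},
}


def same(P, Q):
    """P same Q: P segments that are equal to some Q segment."""
    return P & Q


def with_(P, Q):
    """P with Q: P segments that contain at least one Q segment."""
    return {p for p in P if any(p[0] <= q[0] and q[1] <= p[1] for q in Q)}


def evaluate():
    # Step 1: node sets for the elements "item" and "type" (element index)
    items = element_index["item"]
    types = element_index["type"]
    # Step 2: node set for the pattern "CD player" (text index)
    matches = text_index["CD player"]
    # Step 3: type same "CD player"
    cd_types = same(types, matches)
    # Step 4: item with (...)
    return with_(items, cd_types)


print(evaluate())  # {(0, 49)}
```

The evaluation starts from index lookups and only combines the returned sets; the document itself is never traversed from the root.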

Translation Rules (1)
- Supports major XPath patterns
- Based on the XPath semantic description by Wadler [10]
  - Use of denotational semantics

Translation Rules (2)
[Equations: definition of the translation function S, whose signature involves Axis, Pattern, NodeName, and SegmentSet, with one rule for each of the patterns p1 | p2, /p, text(), p1/p2, an explicit axis step (a::p), and a name test n. The rule for /p evaluates p against Root; the rule for p1/p2 is stated "where x1 = ..." in terms of the result for p1; the rule for n uses the auxiliary function A and yields error "otherwise".]

Translation Rules (3)
[Equations: further rules for S covering the wildcard *, predicates p[q] (separate cases for q a numeric expression and q a non-numeric expression, the latter using the "with" operator and a predicate translation Q), and the attribute patterns @n (rewritten to attribute::n) and @*. The rules for * and @* expand over all matching names n1, ..., nm of x.]
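
Auxiliary Functions
\[
\begin{aligned}
\mathcal{A} &: \mathit{Axis} \to \mathit{Operator} \\
\mathcal{A}[\![\mathrm{child}]\!] &= \mathbf{child} \\
\mathcal{A}[\![\mathrm{parent}]\!] &= \mathbf{parent} \\
\mathcal{A}[\![\mathrm{descendant}]\!] &= \mathbf{in} \\
\mathcal{A}[\![\mathrm{ancestor}]\!] &= \mathbf{with} \\
\mathcal{A}[\![\mathrm{attribute}]\!] &= \mathbf{child}
\end{aligned}
\]
\[
\begin{aligned}
\mathcal{P} &: \mathit{Axis} \to \mathit{Nodetype} \\
\mathcal{P}[\![\mathrm{child}]\!] &= \mathit{Element} \\
\mathcal{P}[\![\mathrm{parent}]\!] &= \mathit{Element} \\
\mathcal{P}[\![\mathrm{descendant}]\!] &= \mathit{Element} \\
\mathcal{P}[\![\mathrm{ancestor}]\!] &= \mathit{Element} \\
\mathcal{P}[\![\mathrm{attribute}]\!] &= \mathit{Attribute}
\end{aligned}
\]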
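
The two auxiliary functions are plain lookup tables, which the Python sketch below writes as dictionaries. The helper translate_name_test is hypothetical: it assumes that a name test n on axis a over a context x translates to "n A[[a]] x", which is how the step t2 = catalog-info child t1 in the translation example appears to be formed. It is not a transcription of the full translation rules.

```python
# A : Axis -> Operator
AXIS_OPERATOR = {
    "child": "child",
    "parent": "parent",
    "descendant": "in",
    "ancestor": "with",
    "attribute": "child",
}

# P : Axis -> Nodetype
AXIS_NODETYPE = {
    "child": "Element",
    "parent": "Element",
    "descendant": "Element",
    "ancestor": "Element",
    "attribute": "Attribute",
}


def translate_name_test(axis, name, context):
    """Assumed form of the name-test rule: n A[[a]] x, as an expression string."""
    if axis not in AXIS_OPERATOR:
        raise ValueError(f"unsupported axis: {axis}")
    return f"{name} {AXIS_OPERATOR[axis]} {context}"


print(translate_name_test("child", "catalog-info", "t1"))  # catalog-info child t1
print(translate_name_test("descendant", "price", "t1"))    # price in t1
```

Note that the attribute axis maps to the same operator as the child axis; only the node type returned by P distinguishes the two.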

Simplification Using the Knowledge of Document Structure
- If we know the DTD of the target XML, we can derive more simplified translation results
[Equations: two pairs of original and simplified rules. The rule for p1/p2 has a simplified form that applies if we know p2 only appears as the child of p1. The rule for the name test n has a simplified form that applies if we know the node set corresponding to x satisfies a stated relationship, which removes the "error ... otherwise" case.]

Translation Example
- Original query Q
    /itemlist/item[@category = "audio equipment"]
      [catalog-info/type = "CD player"]/sales-info/price
- Translation result:
    t1 = item with (item with (category same "audio equipment"))
    t2 = catalog-info child t1
    t3 = t1 with (t1 with (((type child t2) child t2) same "CD player"))
    t4 = sales-info child t3
    ans = (((price child t4) child t4) child t3) child itemlist

Simplification of Query Plan (1)
- The translated result contains repeated applications of the same operator
- We can delete redundant operators by considering the operator semantics
- Example:
    t1 = item with (item with (category same "audio equipment"))
       → item with (category same "audio equipment")
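
The redundancy rule in this example, P with (P with X) → P with X, can be written as a small rewrite over expression trees. The sketch below is only an illustration: expressions are encoded as nested tuples (operator, left, right) with strings for leaves, and the names simplify and show are made up here. It covers just this one rule.

```python
def simplify(expr):
    """Rewrite P with (P with X) to P with X, bottom-up over the expression tree."""
    if isinstance(expr, str):
        return expr
    op, left, right = expr
    left, right = simplify(left), simplify(right)
    if (op == "with" and isinstance(right, tuple)
            and right[0] == "with" and right[1] == left):
        return right  # the outer "with" adds nothing
    return (op, left, right)


def show(expr):
    """Render an expression tree in the slides' infix notation."""
    if isinstance(expr, str):
        return expr
    op, left, right = expr
    parts = [show(e) if isinstance(e, str) else f"({show(e)})" for e in (left, right)]
    return f"{parts[0]} {op} {parts[1]}"


t1 = ("with", "item", ("with", "item", ("same", "category", '"audio equipment"')))
print(show(t1))            # item with (item with (category same "audio equipment"))
print(show(simplify(t1)))  # item with (category same "audio equipment")
```

The rewrite is safe because an item that contains an item containing a matching category is itself an item containing that category, so the outer application selects nothing new.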

Simplification of Query Plan (2)
- If we can use the DTD information, we can further simplify the expressions
- Example:
    t3 = t1 with ((type child (catalog-info child t1)) same "CD player")
       → t1 with ((type in t1) same "CD player")
- Simplified query plan for query Q
    t1 = item with (category same "audio equipment")
    ans = price in (t1 with ((type in t1) same "CD player"))

Related Work
- Translation of XQL queries into proximal nodes expressions (Baeza-Yates & Navarro [2])
- Rewriting techniques for XQL queries (Wood [13])
- Use of document structure for query optimization [3,11,12,13]
- Optimization of regular path expressions in the context of semistructured DBs [4,8]

Conclusions and Future Work
- Conclusions
  - Bottom-up processing approach for XPath queries
    - Support of major XPath query patterns
    - Translation to proximal nodes expressions
    - Simplification and optimization techniques
- Future work
  - Support of more complete XPath semantics
  - Application of hybrid approach (top-down and bottom-up)