1 query optimization for inter document relationships in xml structured document radha senthilkumar,...

17
1 Query Optimization for Inter Document Relationships in XML Structured Document Radha Senthilkumar, A. Kannan, D.Vimala, M. Bhuvaneswari Department of Information Technology, MIT, Anna University, Chennai, India International Conference on Comp utational Intelligence and Multi media pplications 2007 報報報 報報報

Upload: allyson-oconnor

Post on 01-Jan-2016

215 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: 1 Query Optimization for Inter Document Relationships in XML Structured Document Radha Senthilkumar, A. Kannan, D.Vimala, M. Bhuvaneswari Department of

1

Query Optimization for Inter Document Relationships in XML

Structured Document

Radha Senthilkumar, A. Kannan, D.Vimala, M. Bhuvaneswari

Department of Information Technology, MIT, Anna University, Chennai, India

International Conference on Computational Intelligence and Multimedia pplications 2007

報告者:劉芸如

Page 2: 1 Query Optimization for Inter Document Relationships in XML Structured Document Radha Senthilkumar, A. Kannan, D.Vimala, M. Bhuvaneswari Department of

2

Outline

• Introduction• The Proposed Optimization Techniques

– Tree Construction– XPath to PAT– Optimization of Input PAT Expression

• Normalization• Semantic Transformation• Simplification• Optimized PAT to XPath• Mapping XML to Relational

• SIMULATION AND RESULTS• Conclusion and Future work

Page 3: 1 Query Optimization for Inter Document Relationships in XML Structured Document Radha Senthilkumar, A. Kannan, D.Vimala, M. Bhuvaneswari Department of

3

Introduction

• 相對於其他資料模型,最佳化的 XML資料庫是極具挑戰性,由於有高度的複雜性。

• 在早期工作,查詢最佳化是為遏制在一個 XML結構化文件的關係。

• 然而,這樣一個最佳化的問題引起多個相互關聯的表時使用。

• 為了做到查詢優化有必要確定屬性的每個表,以及在不同表的關係屬性。

Page 4: 1 Query Optimization for Inter Document Relationships in XML Structured Document Radha Senthilkumar, A. Kannan, D.Vimala, M. Bhuvaneswari Department of

4

The Proposed Optimization Techniques

Figure 1. Overall Flow of Optimization

Page 5: 1 Query Optimization for Inter Document Relationships in XML Structured Document Radha Senthilkumar, A. Kannan, D.Vimala, M. Bhuvaneswari Department of

5

• The optimization starts with the conversion of XPath to a PAT expression.

• The conversion is followed by three level of optimization, namely Normalization, Semantic Transformation and Simplification.

• The Semantic Transformation phase uses the three structural properties, Exclusivity, Obligation, Entrance Location.

• The output of the three levels is the optimized PAT expression.

• This PAT expression is converted into an SQL statement for executing on a RDBMS(relational database management system).

Page 6: 1 Query Optimization for Inter Document Relationships in XML Structured Document Radha Senthilkumar, A. Kannan, D.Vimala, M. Bhuvaneswari Department of

6

Tree Construction

• Each node in the tree represents the elements in the table and the edges give the structural relationship that exists among the nodes.

Figure.2. Node Representation for an element in XML document

Page 7: 1 Query Optimization for Inter Document Relationships in XML Structured Document Radha Senthilkumar, A. Kannan, D.Vimala, M. Bhuvaneswari Department of

7

XPath to PAT

• XPath expression specifies a pattern that selects a set of XML nodes.

• XPath query is represented in PAT so that algebraic transformation can be performed.

• XPath queries are transformed into an equivalent PAT algebraic expression to perform optimization by applying certain rules.

Page 8: 1 Query Optimization for Inter Document Relationships in XML Structured Document Radha Senthilkumar, A. Kannan, D.Vimala, M. Bhuvaneswari Department of

8

Optimization of Input PAT Expression(1)

• Normalization– During this phase, the algebraic properties of the PAT

algebra are exploited to isolate different types of operators at different levels, eliminate the ∩ operator from the query expression and push down the more selective operations.

– Pushing the $ operators to the bottom level can simplify the identification of potential indices.

– The normalization phase also performs obvious simplifications, like transforming “E U E” to a single E, to make the subsequent semantic optimization more efficient.

Page 9: 1 Query Optimization for Inter Document Relationships in XML Structured Document Radha Senthilkumar, A. Kannan, D.Vimala, M. Bhuvaneswari Department of

9

Optimization of Input PAT Expression(2)

• Semantic Transformation– The primary goal of this phase is to identify

the opportunities of applying potential structure indices which is a short cut path between any two nodes.

– This semantic transformation phase focuses on exploiting the opportunities of using an available structure index or enabling such an opportunity to applying a structure index.

Page 10: 1 Query Optimization for Inter Document Relationships in XML Structured Document Radha Senthilkumar, A. Kannan, D.Vimala, M. Bhuvaneswari Department of

10

Optimization of Input PAT Expression(3)

• Simplification– Hence, a further run of the simplification rules

of Phase 1 is desired, and additional simplification rules are needed to handle the combination of the index operator with the ∩ operator.

– The third phase is intended to carry out a bottom-up cleaning on the output of semantic optimization.

Page 11: 1 Query Optimization for Inter Document Relationships in XML Structured Document Radha Senthilkumar, A. Kannan, D.Vimala, M. Bhuvaneswari Department of

11

Optimization of Input PAT Expression(4)

• Optimized PAT to XPath– The optimized PAT expression is converted to

its equivalent XPath expression for mapping to SQL.

– As the given input is XPath query, the PAT expression after optimization is converted back to XPath so that the query can be directly executed to obtain the results using stylesheets.

Page 12: 1 Query Optimization for Inter Document Relationships in XML Structured Document Radha Senthilkumar, A. Kannan, D.Vimala, M. Bhuvaneswari Department of

12

Optimization of Input PAT Expression(5)

• Mapping XML to Relational– Relational database is used to show the effect of quer

y optimization.– The original XPath query and the optimized XPath qu

ery are mapped to their respective SQL queries for testing the performance in a relational database Oracle [10].

– To clearly show the time differences in the execution times of the non-optimized and optimized XPath queries they are mapped to equivalent SQL queries based on the algorithm.

Page 13: 1 Query Optimization for Inter Document Relationships in XML Structured Document Radha Senthilkumar, A. Kannan, D.Vimala, M. Bhuvaneswari Department of

13

SIMULATION AND RESULTS• The first database [Database 1] uses two XML document

s having less number of levels.– Thus the complexity involved in the optimization is less and not

much of difference is noticed in the execution time for the optimized and non-optimized queries.

• The second database [Database 2] is complex than the first in that it contains two XML documents, which have large number of levels.– As the levels are larger, XPath queries written for these docume

nts involve larger path length.– Time taken for execution of both the optimized and non-optimize

d queries has a clear difference, which is depicted in Fig.4.

• The third database [Database 3] is used for the purpose of showing the applicability of this approach for more than two documents.

Page 14: 1 Query Optimization for Inter Document Relationships in XML Structured Document Radha Senthilkumar, A. Kannan, D.Vimala, M. Bhuvaneswari Department of

14

Figure 3 Performance result for Database 2

Page 15: 1 Query Optimization for Inter Document Relationships in XML Structured Document Radha Senthilkumar, A. Kannan, D.Vimala, M. Bhuvaneswari Department of

15

Figure 4 scaling effect of optimized queries with Database 2

Page 16: 1 Query Optimization for Inter Document Relationships in XML Structured Document Radha Senthilkumar, A. Kannan, D.Vimala, M. Bhuvaneswari Department of

16

Figure.5 Effect of selectivity on optimized queries with Database 3

Page 17: 1 Query Optimization for Inter Document Relationships in XML Structured Document Radha Senthilkumar, A. Kannan, D.Vimala, M. Bhuvaneswari Department of

17

Conclusion and Future work• 這樣技術證明有效實施和測試執行時間、可擴展性和選擇性。

• 這種方法運行在邏輯層,因此可獨立於任何儲存模型。

• 最佳化開發基於這種方法可以很容易地適用於廣泛的 XML數據 /信息伺服器來實現更快的查詢優化。