
Source Description-based Approach for the Modeling of Spatial Information Integration

Yoshiharu Ishikawa and Hiroyuki Kitagawa, University of Tsukuba

{ishikawa,kitagawa}@is.tsukuba.ac.jp

Outline

- Background
- Our Objective and Approach
- Motivating Example
- Data Model
- Query Specification and Source Description
- Query Processing
- Conclusions and Future Work

Background: Spatial Information Sources (1)

Spatial information sources:
- emerging new information sources on the Internet
- information sources that provide region- or location-oriented information
- some of them support mobile users with GPS receivers and hand-held devices

Background: Spatial Information Sources (2)

Need for technology to integrate spatial information sources:
- description of spatial information sources that takes their contents into consideration
- efficient and effective query planning and processing

Spatial Information Integration

Background: Spatial Information Sources (3)

Standardization efforts for spatial technologies:
- OpenGIS [5]: standardization of GIS systems
- POIX [6]: language for location-oriented information exchange
- G-XML [7]: XML vocabulary for geographic information description
- RWML [8]: road information description language

Spatial information services:
- Digital City [10], citysearch.com [11]: location-oriented information services
- Ekimae Tanken Club [12]: provides local information near a specified rail station
- MONET system [13]: provides information for car drivers

Background: Heterogeneous Information Integration (1)

Popular approach for information integration: the well-known wrapper-mediator approach
- Wrapper
  - encapsulates the details of each information source
  - provides an abstract, uniform view of the source
- Mediator
  - selects appropriate information sources for a given query
  - performs query planning and processing

Background: Heterogeneous Information Integration (2)

[Figure: Heterogeneous Information Integration System. A mediator provides unified access to the integrated information; wrappers sit between the mediator and Information Sources A, B, C, and D.]
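The wrapper-mediator split can be pictured with a small Python sketch. The class and method names (`Wrapper`, `fetch`, `InMemoryWrapper`, `Mediator`, `register`) are hypothetical and not part of the original framework; the toy wrapper understands only equality conditions, and the mediator simply unions the answers of all wrappers instead of selecting sources and planning the query.

```python
from abc import ABC, abstractmethod
from typing import Dict, List

Row = Dict[str, object]


class Wrapper(ABC):
    """Hypothetical wrapper interface: hides how a source is accessed and
    exposes its data as rows of a uniform, relation-like view."""

    @abstractmethod
    def fetch(self, conditions: Dict[str, object]) -> List[Row]:
        """Return the rows of the source that satisfy all equality conditions."""


class InMemoryWrapper(Wrapper):
    """Toy wrapper over a list of rows; it understands equality conditions only."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = rows

    def fetch(self, conditions: Dict[str, object]) -> List[Row]:
        return [row for row in self._rows
                if all(row.get(k) == v for k, v in conditions.items())]


class Mediator:
    """Gives unified access to several wrapped sources. This sketch sends the
    query to every registered wrapper; source selection and query planning
    are not modeled."""

    def __init__(self) -> None:
        self._wrappers: Dict[str, Wrapper] = {}

    def register(self, name: str, wrapper: Wrapper) -> None:
        self._wrappers[name] = wrapper

    def query(self, conditions: Dict[str, object]) -> List[Row]:
        result: List[Row] = []
        for wrapper in self._wrappers.values():
            result.extend(wrapper.fetch(conditions))
        return result
```

For example, after registering two `InMemoryWrapper` objects, `mediator.query({"category": "Chinese"})` returns the matching rows from both sources as one list.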


Our Objective

Development of a spatial information integration framework for location-aware information services:
- integration of heterogeneous spatial information sources
  - heterogeneity of the contents of the sources
  - heterogeneity of the capabilities of the sources
- provision of useful location-oriented information services to mobile users
  - selection of neighborhood geometric features

Our Approach

- Development of a description method to represent spatial information sources
  - based on the source description framework: describes the contents and the services of the source
  - introduction of spatial data types and spatial operators based on the OpenGIS standard
- Development of query planning and processing methods that effectively utilize source descriptions
  - selection of appropriate information sources for a given query
  - effective use of the query processing power of each information source


Motivating Example (1)

[Figure: the heterogeneous information integration system of the motivating example: a mediator over wrappers for Information Sources A, B, C, and D.]

Motivating Example (2)

Global schema:
- based on the relational model
- represents a virtual database schema
- each information source is (partially) mapped to the global schema

relation Restaurant {
    name     string;
    category string;
    address  string;
    location point;
};

relation Evaluation {
    name  string;
    score real;
};

Motivating Example (3)

Query issued by the user: show the top-20 nearest restaurants such that
- the restaurant is within 1000 meters of the current position p
- its score is greater than or equal to 2.5 stars

[Figure: restaurants 1 to 7 around the current position p; a 1000 m circle around p marks the query range.]

SQL representation:

SELECT r.name, r.address
FROM Restaurant AS r, Evaluation AS e
WHERE r.name = e.name
  AND e.score >= 2.5
  AND Distance(r.location, p) <= 1000
ORDER BY Distance(r.location, p)
STOP AFTER 20
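To make the intended semantics of this query concrete, here is a small Python sketch that evaluates it over in-memory data. It assumes planar coordinates in meters (so Euclidean distance stands in for `Distance`), a dictionary from restaurant name to score standing in for the `Evaluation` relation, and a hypothetical function name `nearest_rated_restaurants`; `STOP AFTER 20` becomes keeping the 20 smallest distances.

```python
import heapq
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def distance(a: Point, b: Point) -> float:
    """Euclidean distance; assumes planar coordinates in meters."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def nearest_rated_restaurants(restaurants: List[dict], scores: Dict[str, float],
                              p: Point, radius: float = 1000.0,
                              min_score: float = 2.5, k: int = 20) -> List[Tuple[str, str]]:
    """Return (name, address) of the k nearest restaurants within `radius`
    of p whose score is at least `min_score`."""
    candidates = []
    for r in restaurants:
        score = scores.get(r["name"])           # join: r.name = e.name
        if score is None or score < min_score:   # e.score >= 2.5
            continue
        d = distance(r["location"], p)
        if d <= radius:                           # Distance(r.location, p) <= 1000
            candidates.append((d, r["name"], r["address"]))
    # ORDER BY Distance(r.location, p) STOP AFTER k
    return [(name, address) for _, name, address in heapq.nsmallest(k, candidates)]
```

`heapq.nsmallest` avoids sorting every candidate when only the first k are needed, which mirrors the top-k intent of `STOP AFTER`.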

Motivating Example (4)

Information Source A: provides restaurant information for a specific area
- Contents: information on restaurants within the rectangular area r
- Capability: given a name or an address, it returns the matched restaurants

[Figure: the rectangular area r covered by Source A]

Motivating Example (5)

Information Source B: supports spatial conditions to query restaurant information
- Contents: information about restaurants
- Capability: returns restaurants within a specified circular area; accepts an additional condition on the restaurant category (e.g., category = “Chinese”)

Motivating Example (6)

Information Source C: supports spatial conditions to query restaurant information
- Contents: information about restaurants
- Capability: returns restaurants that match the specified name (e.g., name like “%Sushi”); if an optional polygon is given, it returns only restaurants within that polygonal region

Motivating Example (7)

Information Source D: provides restaurant evaluation scores; given a restaurant name, it returns the evaluation score

Example:

select * from Source-D where name like “%Sushi”

name          score
Tokyo Sushi   3.0
Edo Sushi     2.7


Data Model for Integration

- The relational model enhanced with spatial data types and spatial operations
- Spatial data types and spatial operations are based on the OpenGIS proposal [5]
- A wrapper for each spatial information source wraps the operations of the source and provides OpenGIS-conformant operations
- A wrapper provides a subset of the OpenGIS operations, depending on the capability of the source

Spatial Data Types

Based on the OpenGIS proposal. To simplify the problem, we consider only the Point, LineString, and Polygon types.

[Figure: OpenGIS geometry type hierarchy: Geometry with subtypes Point, Curve, Surface, and GeometryCollection; LineString under Curve; Polygon under Surface; MultiPoint, MultiCurve, and MultiSurface under GeometryCollection. Point, LineString, and Polygon are marked as our target.]

Spatial Operations (1): Spatial Predicates of OpenGIS

name               semantics
intersects(g1,g2)  g1 and g2 share at least one point
disjoint(g1,g2)    g1 and g2 have no point in common
equals(g1,g2)      g1 and g2 are equal
overlaps(g1,g2)    g1 and g2 have one or more overlaps
contains(g1,g2)    g1 contains g2
within(g1,g2)      g1 is contained in g2
crosses(g1,g2)     g1 and g2 cross (their interiors share some, but not all, points)
touches(g1,g2)     g1 and g2 touch at one or more points

Spatial Operations (2): Spatial Functions of OpenGIS

name                 return type  semantics
isempty(g)           Integer      whether g is empty
distance(g1,g2)      Double       minimum distance between g1 and g2
envelope(g)          Geometry     minimum bounding box (MBB) of g
union(g1,g2)         Geometry     unified region of g1 and g2
intersection(g1,g2)  Geometry     intersection of g1 and g2
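The sketch below shows how a wrapper might implement a few of these operations for a very restricted case: points and axis-aligned rectangles in a planar coordinate system. It is an illustration under those assumptions, not the OpenGIS specification; the names `Box` and `point_box` are hypothetical, and `contains` here counts boundary points as contained, which differs from the strict OpenGIS definition.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle; a point is represented as a degenerate box."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def point_box(p: Point) -> Box:
    return Box(p[0], p[1], p[0], p[1])


def envelope(vertices: Iterable[Point]) -> Box:
    """envelope(g): minimum bounding box (MBB) of a geometry given by its vertices."""
    pts = list(vertices)
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return Box(min(xs), min(ys), max(xs), max(ys))


def distance(g1: Box, g2: Box) -> float:
    """distance(g1, g2): minimum distance between two boxes (0.0 if they meet)."""
    dx = max(g2.xmin - g1.xmax, g1.xmin - g2.xmax, 0.0)  # horizontal gap
    dy = max(g2.ymin - g1.ymax, g1.ymin - g2.ymax, 0.0)  # vertical gap
    return math.hypot(dx, dy)


def intersects(g1: Box, g2: Box) -> bool:
    """intersects(g1, g2): the two boxes share at least one point."""
    return distance(g1, g2) == 0.0


def contains(g1: Box, g2: Box) -> bool:
    """contains(g1, g2): g2 lies inside g1 (boundary included, unlike strict OpenGIS)."""
    return (g1.xmin <= g2.xmin and g1.ymin <= g2.ymin
            and g2.xmax <= g1.xmax and g2.ymax <= g1.ymax)
```

A real wrapper would more likely delegate to a geometry library or a spatial DBMS; the point here is only that each wrapper exposes the subset of operations its source can support.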


Source Description Framework

Source description framework: a formal framework for specifying meta-information about an information source, proposed in Information Manifold [3].

A source description consists of:
- Contents description: describes the contents of the source in terms of the global schema
- Capability description: describes the types of queries that the source can support

We extend the source description approach by considering OpenGIS data types and operations.

Query Description (1)

An extension of conjunctive queries; a query can contain:
- spatial predicates (e.g., intersects, contains)
- spatial functions (e.g., envelope, distance)
- additional comparison operators (e.g., ≤)

General form of a conjunctive query:

$$\mathit{ans}(\bar{u}) \leftarrow R_1(\bar{u}_1), \ldots, R_n(\bar{u}_n), c_1, \ldots, c_m$$

where
- $R_1, \ldots, R_n$: global relations
- $\bar{u}, \bar{u}_1, \ldots, \bar{u}_n$: sequences of variables
- $c_1, \ldots, c_m$ ($m \ge 0$): conditions

Query Description (2)

$$\mathit{ans}(n, a) \leftarrow \mathit{Restaurant}(n, c, a, l), \mathit{Evaluation}(e, s), n = e, s \ge 2.5, \mathit{distance}(l, p) \le 1000$$

Meaning: show the restaurants that are within 1000 meters of the current position p and whose scores are greater than or equal to 2.5 stars.

SQL representation:

SELECT r.name, r.address
FROM Restaurant AS r, Evaluation AS e
WHERE r.name = e.name
  AND e.score >= 2.5
  AND Distance(r.location, p) <= 1000
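A conjunctive query of this form is easy to hold as a data structure. The sketch below is one possible Python encoding (the class names `Atom`, `Condition`, and `ConjunctiveQuery` are hypothetical). Conditions, including spatial function calls such as `distance(l, p)`, are kept as opaque strings, and `is_safe` checks the usual requirement that every head variable occurs in some relational atom.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Atom:
    relation: str
    args: Tuple[str, ...]


@dataclass(frozen=True)
class Condition:
    """A comparison `lhs op rhs`; operands are variables, constants, or
    function-call strings such as 'distance(l, p)', kept opaque here."""
    lhs: str
    op: str
    rhs: str


@dataclass
class ConjunctiveQuery:
    head: Tuple[str, ...]
    body: List[Atom]
    conditions: List[Condition] = field(default_factory=list)

    def variables(self) -> Set[str]:
        """Variables that occur in the relational atoms of the body."""
        return {v for atom in self.body for v in atom.args}

    def is_safe(self) -> bool:
        """Every head variable must occur in some relational atom."""
        return set(self.head) <= self.variables()

    def __str__(self) -> str:
        parts = [f"{a.relation}({', '.join(a.args)})" for a in self.body]
        parts += [f"{c.lhs} {c.op} {c.rhs}" for c in self.conditions]
        return f"ans({', '.join(self.head)}) <- " + ", ".join(parts)


# The example query: restaurants within 1000 m of p with score >= 2.5.
q = ConjunctiveQuery(
    head=("n", "a"),
    body=[Atom("Restaurant", ("n", "c", "a", "l")), Atom("Evaluation", ("e", "s"))],
    conditions=[Condition("n", "=", "e"), Condition("s", ">=", "2.5"),
                Condition("distance(l, p)", "<=", "1000")],
)
```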

Spatial Query Conditions

For spatial query conditions, we allow the following spatial range restriction predicates (g is a geometric constant): equals(g, g) and equals(g, g), within(g, g), contains(g, g)

We also allow distance-based range restriction conditions (g is a Geometry object, d is a real constant, θ is < or ≤):

$$\mathit{distance}(g, g) \; \theta \; d$$

Source Descriptions (1)

A source description consists of a contents description and a capability description.

contents: $S(\bar{u}) \leftarrow R(\bar{u}), c_1, \ldots, c_n$
  example: $S(n, c, a, l) \leftarrow \mathit{Restaurant}(n, c, a, l), c = \text{“Italian”}$

filters: $\mathit{pat} \rightarrow \mathit{out}$
- pat: mandatory input arguments (input pattern)
- out: the condition issued to the underlying source when the input arguments (pat) are given
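The pair of a contents description and a list of `pat → out` filters can be encoded as below. This is a hedged sketch: the class names, the use of Python format strings as `out` templates, and the `_in` suffix on input names (standing in for distinct input variables) are assumptions for illustration. `instantiate` produces the condition sent to the source once all mandatory inputs are bound, and `usable_filters` lists the filters whose patterns are fully bound.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Filter:
    """pat -> out: once every input in `pattern` is bound, the source can
    evaluate the condition `out` (a template over those inputs)."""
    pattern: Tuple[Tuple[str, str], ...]  # (input name, type) pairs
    out: str                               # e.g. "n = {n_in!r}"

    def inputs(self) -> Set[str]:
        return {name for name, _ in self.pattern}

    def instantiate(self, bindings: Dict[str, object]) -> str:
        missing = self.inputs() - set(bindings)
        if missing:
            raise ValueError(f"mandatory inputs not bound: {sorted(missing)}")
        return self.out.format(**bindings)


@dataclass
class SourceDescription:
    name: str
    head: Tuple[str, ...]   # variables of S(u)
    body: List[str]         # R(u), c1, ..., cn, kept as text
    filters: List[Filter] = field(default_factory=list)

    def usable_filters(self, bound: Set[str]) -> List[Filter]:
        """Filters whose mandatory inputs are all bound."""
        return [f for f in self.filters if f.inputs() <= bound]


# The contents example from the slide, plus a hypothetical name filter.
italian = SourceDescription(
    name="S",
    head=("n", "c", "a", "l"),
    body=["Restaurant(n, c, a, l)", 'c = "Italian"'],
    filters=[Filter((("n_in", "string"),), "n = {n_in!r}")],
)
```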


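Source Description for A

Source A:
- provides restaurant information
- provides only information within the area r
- also allows retrieval by restaurant name and by address

contents: $S_A(n, c, a, l) \leftarrow \mathit{Restaurant}(n, c, a, l), \mathit{contains}(r, l)$
filters: $\langle n' : \mathit{string} \rangle \rightarrow n = n'$, $\langle a' : \mathit{string} \rangle \rightarrow a = a'$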


Source Description for B

Source B:
- provides restaurant information
- its inputs are a query point (p) and a distance threshold (d)
- allows an additional filtering condition on the restaurant category (c)

contents: $S_B(n, c, a, l) \leftarrow \mathit{Restaurant}(n, c, a, l)$
filters: $\langle p : \mathit{Point}, d : \mathit{real} \rangle \rightarrow \mathit{distance}(l, p) \le d$, $\langle c' : \mathit{string} \rangle \rightarrow c = c'$


Source Description for C

Source C:
- provides restaurant information
- returns restaurants that match the specified name (n)
- allows an additional filtering condition based on a polygonal region (g)

contents: $S_C(n, c, a, l) \leftarrow \mathit{Restaurant}(n, c, a, l)$
filters: $\langle n' : \mathit{string} \rangle \rightarrow n = n'$, $\langle g : \mathit{Polygon} \rangle \rightarrow \mathit{contains}(g, l)$


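Source Description for D

Source D:
- provides restaurant evaluation scores
- allows retrieval by restaurant name and/or evaluation score

contents: $S_D(n, s) \leftarrow \mathit{Evaluation}(n, s)$
filters: $\langle n' : \mathit{string} \rangle \rightarrow n = n'$, $\langle s' : \mathit{real} \rangle \rightarrow s \;\theta\; s'$ ($\theta \in \{=, \ne, <, >, \le, \ge\}$)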
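Read as binding patterns, these descriptions tell the mediator which sources it can call at all with the inputs it has. The Python sketch below encodes only the input patterns of Sources A to D. It assumes (our reading of the capability text, not something the descriptions state formally) that B's category and C's polygon are optional add-ons to an otherwise possible call, so only the minimal required input sets are listed. The names `PATTERNS` and `callable_sources` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set

# Minimal sets of inputs with which each source can be queried.
# Assumption: optional extras (B's category, C's polygon) are not listed,
# because they can only be added to a call that is already possible.
PATTERNS: Dict[str, List[FrozenSet[str]]] = {
    "A": [frozenset({"n"}), frozenset({"a"})],  # by name or by address
    "B": [frozenset({"p", "d"})],                # query point and distance threshold
    "C": [frozenset({"n"})],                     # name is mandatory
    "D": [frozenset({"n"}), frozenset({"s"})],   # by name or by a score condition
}


def callable_sources(bound: Set[str]) -> List[str]:
    """Sources for which at least one input pattern is fully bound."""
    return sorted(src for src, pats in PATTERNS.items()
                  if any(pat <= bound for pat in pats))
```

In this sketch, the motivating query binds the point p, the 1000 m threshold d, and the score bound s, so only B and D can be called directly; A and C become callable once restaurant names are known.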


Overview of Query Processing (1)

Query plan construction:
1. Preprocessing
   - validation of the correctness of the given query according to the global schema
   - deletion of redundant variables
   - simplification of expressions
2. Selection of useful information sources based on contents descriptions
3. Pushing query conditions into the underlying information sources as far as possible
4. Generation of the integrated query plan

Overview of Query Processing (2)

[Figure: query processing in the mediator: query validity check and query simplification; source selection based on contents descriptions; pushing subqueries through the wrappers to Sources A, B, C, and D; receiving partial results; integration of the subquery results into the query result.]

Usage of Source Descriptions

Contents description:
- used to select useful information sources for processing the given query
- also used to eliminate redundant join conditions

Capability description:
- used to decide whether the wrapper of a source can process a given query condition with its query processing capability
- also used to generate a subquery for an information source

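Selection of Information Source (1)

Unify the given query condition with the contents description of an information source:

Query: $\mathit{ans}(\bar{u}) \leftarrow R_1(\bar{u}_1), \ldots, R_n(\bar{u}_n), c_1, \ldots, c_m$
Contents description: $S_R(\bar{v}) \leftarrow R_i(\bar{v}), e_1, \ldots, e_n$

Possibility condition for an information source to fulfill the given query condition:

$$\exists x_1 \cdots \exists x_n \, (c_1 \wedge \cdots \wedge c_m \wedge e_1 \wedge \cdots \wedge e_n) = \mathit{true}$$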

Selection of Information Source (2)

Example: a query over the global schema:

$$\mathit{ans}(n) \leftarrow \mathit{Restaurant}(n, c, a, l), \mathit{distance}(l, p) \le 1000$$

Source description for E:

$$S_E(n, c, a, l) \leftarrow \mathit{Restaurant}(n, c, a, l), c = \text{“Italian”}, \mathit{contains}(r, l)$$

Source E can possibly satisfy the subquery if:

$$\exists c \, \exists l \, (c = \text{“Italian”} \wedge \mathit{contains}(r, l) \wedge \mathit{distance}(l, p) \le 1000) = \mathit{true}$$

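Selection of Information Source (3)

Simplification of the possibility condition: the query places no condition on c, so the conjunct c = “Italian” is always satisfiable and can be dropped:

$$\exists l \, (\mathit{contains}(r, l) \wedge \mathit{distance}(l, p) \le 1000) = \mathit{true}$$

which reduces to a test on the two regions:

$$\mathit{intersects}(r, \mathit{circle}(p, 1000)) = \mathit{true}$$

[Figure: the query region, a 1000 m circle around p, overlapping the rectangle r supported by source E.]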
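When r is an axis-aligned rectangle, the test intersects(r, circle(p, 1000)) amounts to checking that the point of r closest to p lies within 1000 m of p. A hedged Python sketch, assuming planar coordinates in meters and rectangles given as (xmin, ymin, xmax, ymax); `rect_point_distance` and `may_contribute` are hypothetical names:

```python
import math
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Point = Tuple[float, float]


def rect_point_distance(r: Rect, p: Point) -> float:
    """Minimum distance from p to the closed rectangle r (0.0 if p lies inside)."""
    xmin, ymin, xmax, ymax = r
    cx = min(max(p[0], xmin), xmax)  # clamp p onto r: closest point of r to p
    cy = min(max(p[1], ymin), ymax)
    return math.hypot(p[0] - cx, p[1] - cy)


def may_contribute(source_region: Rect, p: Point, radius: float) -> bool:
    """intersects(r, circle(p, radius)): some location inside r can satisfy
    distance(l, p) <= radius, so the source is worth querying."""
    return rect_point_distance(source_region, p) <= radius
```

A source whose region fails this test can be skipped entirely, without sending it any subquery.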

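Elimination of Redundant Joins

Example: a query over the global schema:

$$\mathit{ans}(n, m) \leftarrow \mathit{Restaurant}(n, c, a, l), \mathit{BusStop}(m, p), \mathit{distance}(l, p) \le 200$$

Contents descriptions for sources F and G:

$$S_F(n, c, a, l) \leftarrow \mathit{Restaurant}(n, c, a, l), \mathit{contains}(r, l)$$

$$S_G(m, p) \leftarrow \mathit{BusStop}(m, p), \mathit{contains}(s, p)$$

F and G can contribute to the query only if $\mathit{distance}(r, s) \le 200$; otherwise the join of F and G is redundant and can be eliminated.

[Figure: two source regions and the 200 m join distance.]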
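Assuming r and s are axis-aligned rectangles, the test distance(r, s) ≤ 200 can be computed from the gaps between the rectangles along each axis. The Python sketch below (hypothetical names, planar coordinates in meters) returns False exactly when the two regions are more than the given distance apart, that is, when the join of F and G can be dropped.

```python
import math
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def rect_distance(r: Rect, s: Rect) -> float:
    """Minimum distance between two axis-aligned rectangles (0.0 if they overlap)."""
    dx = max(s[0] - r[2], r[0] - s[2], 0.0)  # horizontal gap, if any
    dy = max(s[1] - r[3], r[1] - s[3], 0.0)  # vertical gap, if any
    return math.hypot(dx, dy)


def join_may_be_nonempty(r: Rect, s: Rect, max_dist: float = 200.0) -> bool:
    """distance(r, s) <= max_dist; if False, no restaurant in r can be within
    max_dist of a bus stop in s, so the join is redundant."""
    return rect_distance(r, s) <= max_dist
```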

Pushing Query Conditions (1)

Check whether the given query condition can be processed by the source:
- If the query condition and a filtering condition supported by the source are equivalent: push the condition directly.
- If there is no equivalent condition but the source supports a more general condition: transform the query condition into the more general condition, then push it to the source. An additional step is then needed to check that the retrieved results exactly satisfy the given query condition.
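The two cases can be expressed as a small decision procedure, with a third fallback in which the condition is evaluated by the mediator itself. In this Python sketch, conditions and filters are identified by hypothetical kind labels (`contains_polygon`, `distance_le`, and so on), and `GENERALIZATIONS` is an assumed table listing, for each query condition kind, filter kinds that always return a superset of the matching data.

```python
from enum import Enum
from typing import Dict, List, Optional, Set, Tuple


class Push(Enum):
    DIRECT = "push the condition directly"
    GENERALIZED = "push a more general condition, then re-check at the mediator"
    NONE = "evaluate the condition at the mediator"


# Hypothetical table: distance(l, p) <= d can be replaced by the more general
# intersects(l, envelope(circle(p, d))), since that square covers the circle.
GENERALIZATIONS: Dict[str, List[str]] = {
    "distance_le": ["intersects_polygon"],
}


def choose_push(condition: str, source_filters: Set[str]) -> Tuple[Push, Optional[str]]:
    """Decide how a query condition of kind `condition` is handed to a source."""
    if condition in source_filters:
        return Push.DIRECT, condition
    for general in GENERALIZATIONS.get(condition, []):
        if general in source_filters:
            return Push.GENERALIZED, general
    return Push.NONE, None
```

Only the GENERALIZED outcome needs the extra filtering step at the mediator; a DIRECT push already returns exactly the requested data.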

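Pushing Query Conditions (2)

Capability description of Source C:

contents: $S_C(n, c, a, l) \leftarrow \mathit{Restaurant}(n, c, a, l)$
filters: $\langle n' : \mathit{string} \rangle \rightarrow n = n'$, $\langle g : \mathit{Polygon} \rangle \rightarrow \mathit{contains}(g, l)$

Query:

$$\mathit{ans}(n) \leftarrow \mathit{Restaurant}(n, c, a, l), \mathit{contains}(r, l)$$

The condition contains(r, l) matches the filter contains(g, l) with g = r, so contains(r, l) is pushed directly to Source C.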

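Pushing Query Conditions (3)

Source description of Source H:

contents: $S_H(n, c, a, l) \leftarrow \mathit{Restaurant}(n, c, a, l)$
filters: $\langle n' : \mathit{string} \rangle \rightarrow n = n'$, $\langle g : \mathit{Polygon} \rangle \rightarrow \mathit{intersects}(l, g)$

Query:

$$\mathit{ans}(n) \leftarrow \mathit{Restaurant}(n, c, a, l), \mathit{distance}(l, p) \le 1000$$

Source H has no distance filter, so push the more general condition intersects(l, envelope(circle(p, 1000))), then check distance(l, p) ≤ 1000 on the retrieved data.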
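The push-then-check strategy for Source H might look as follows. Source H is simulated by `source_h_intersects`, a hypothetical stand-in that applies intersects(l, g) to point locations and an axis-aligned rectangle g. The mediator computes g = envelope(circle(p, 1000)), sends it to the source, and then discards retrieved restaurants that lie in the square but outside the circle. Planar coordinates in meters are assumed.

```python
import math
from typing import Iterable, List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def circle_envelope(p: Point, radius: float) -> Rect:
    """envelope(circle(p, radius)): the square MBB of the circle."""
    return (p[0] - radius, p[1] - radius, p[0] + radius, p[1] + radius)


def source_h_intersects(locations: Iterable[Tuple[str, Point]], g: Rect) -> List[Tuple[str, Point]]:
    """Stand-in for Source H's filter intersects(l, g) on point locations
    (boundary included, as for intersects)."""
    xmin, ymin, xmax, ymax = g
    return [(n, l) for n, l in locations
            if xmin <= l[0] <= xmax and ymin <= l[1] <= ymax]


def within_distance(locations: Iterable[Tuple[str, Point]], p: Point, radius: float) -> List[str]:
    g = circle_envelope(p, radius)
    candidates = source_h_intersects(locations, g)  # pushed, more general condition
    # Additional step at the mediator: keep only exact matches of distance(l, p) <= radius.
    return [n for n, l in candidates
            if math.hypot(l[0] - p[0], l[1] - p[1]) <= radius]
```

Points in the corners of the square (such as one at (900, 900) for p = (0, 0)) are returned by the source but removed by the final check.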


Conclusions and Future Work

Conclusions:
- Proposal of a framework for integrating heterogeneous spatial information sources based on the source description framework
  - contents description and capability description
  - use of the data types and operations of the OpenGIS proposal
- Query processing strategies
  - source selection
  - pushing query conditions

Future work:
- investigation of source selection and query planning strategies
- a more formal framework (e.g., a constraint-based approach)
