Efficient RDF Storage and Retrieval in Jena2

Written by: Kevin Wilkinson, Craig Sayers, Harumi Kuno, Dave Reynolds

Presented by: Umer Fareed


Post on 04-Jan-2016




Outline

• Introduction
• Overview of Jena
• Overview of RDF
• Storage Schema for Jena1 and Jena2
• Jena2 Architecture
• Jena2 Query Processing
• Miscellaneous Topics
• Related and Future Work
• Conclusion


Introduction

• Jena is a Semantic Web programmer's toolkit
• Open-source project grown out of the HP Labs Semantic Web Programme
• Offers a simple abstraction of the RDF graph as its central internal interface
• Supports a number of database engines (e.g., PostgreSQL, MySQL, Oracle)
• A flexible architecture that facilitates porting to new SQL database engines


Introduction

• Facilitates experimentation with different database layouts

Jena2: the second generation of Jena
• New internal architecture and capabilities
• Minimizes changes in the API
• Maintains persistent storage
• Addresses performance and scaling issues in Jena1


Overview of Jena

Jena1 provided a rich API for manipulating RDF graphs

Users can choose to store RDF graphs in memory or in databases

In Jena2, the architecture was modified to achieve two goals:
• Provide a simple, minimalist view of the RDF graph
• Allow easy access to, and manipulation of, data in graphs, enabling the data to be exposed as triples


Overview of Jena

Jena2 Architectural Overview


Overview of Jena

At an abstract level, Jena2 storage implements three operations:
• add statement, to store an RDF statement in a database
• delete statement, to remove an RDF statement from the database
• find operation, to retrieve all statements that match a pattern of the form <S,P,O>, where each of S, P, O is either a constant or a don't-care
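The three storage operations can be illustrated with a minimal in-memory sketch. This is not Jena's API: the class name `TripleStore`, the use of `None` as the don't-care marker, and the example URIs are all assumptions made for illustration.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    """Hypothetical in-memory store exposing add, delete and find."""

    def __init__(self) -> None:
        self._statements: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        # Store one <S,P,O> statement.
        self._statements.add((s, p, o))

    def delete(self, s: str, p: str, o: str) -> None:
        # Remove one statement; deleting a missing statement is a no-op.
        self._statements.discard((s, p, o))

    def find(self, s: Optional[str] = None, p: Optional[str] = None,
             o: Optional[str] = None) -> List[Triple]:
        # None in a position means "don't care"; anything else must match.
        pattern = (s, p, o)
        return sorted(
            t for t in self._statements
            if all(want is None or want == got for want, got in zip(pattern, t))
        )


store = TripleStore()
store.add("ex:doc1", "dc:title", "Jena2")
store.add("ex:doc1", "dc:creator", "ex:kevin")
store.add("ex:doc2", "dc:creator", "ex:kevin")

by_kevin = store.find(p="dc:creator", o="ex:kevin")
store.delete("ex:doc2", "dc:creator", "ex:kevin")
after_delete = store.find(p="dc:creator", o="ex:kevin")
```

A pattern with all three positions left as don't-care returns every statement in the store.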


Overview of RDF

• RDF is a W3C standard
• A means of expressing and exchanging semantic metadata
• RDF was originally designed for the representation and processing of metadata about remote information sources
• Provides a simple tuple model, <Subject,Property,Object>, to express all knowledge


Overview of RDF

• Provides some predefined basic properties such as type, class, subclass, etc.
• RDF permits resources to be associated with arbitrary properties
• Statements associating a resource with new properties and values may be added to an RDF fact base at any time
• Requires an efficient and flexible mapping to provide persistent storage


Storage Schema for Jena1 and Jena2

Storing Arbitrary RDF Statements in Jena1

Jena1 used two different database schemas:
1. Relational databases
2. Berkeley DB

For relational databases, the schema consisted of a statement table, a literals table and a resources table

For Berkeley DB, all parts of a statement were stored in a single row


Storage Schema for Jena1 and Jena2

Each statement was stored three times: once indexed by subject, once by predicate and once by object

Berkeley DB schema used a single access method to store statements

Jena graphs stored using Berkeley DB were observed to be faster than graphs stored in relational databases


Storage Schema for Jena1 and Jena2

Jena1 Schema (Normalized)
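A normalized layout of this kind can be sketched with SQLite. The table and column names below (`resources`, `literals`, `statements`, `obj_res`, `obj_lit`) and the helper `lookup_id` are assumptions for illustration, not the actual Jena1 schema. The point is that each URI or literal is stored once, and a find must join back to the lookup tables to recover the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE resources (id INTEGER PRIMARY KEY, uri TEXT UNIQUE);
CREATE TABLE literals  (id INTEGER PRIMARY KEY, value TEXT UNIQUE);
CREATE TABLE statements (
    subject   INTEGER REFERENCES resources(id),
    predicate INTEGER REFERENCES resources(id),
    obj_res   INTEGER REFERENCES resources(id),  -- set when the object is a URI
    obj_lit   INTEGER REFERENCES literals(id)    -- set when the object is a literal
);
""")


def lookup_id(table: str, column: str, text: str) -> int:
    """Return the id for text in a lookup table, inserting it once if new."""
    # table and column come from this script only, never from user input.
    conn.execute(f"INSERT OR IGNORE INTO {table} ({column}) VALUES (?)", (text,))
    row = conn.execute(f"SELECT id FROM {table} WHERE {column} = ?", (text,)).fetchone()
    return row[0]


def add_literal_statement(s: str, p: str, value: str) -> None:
    conn.execute(
        "INSERT INTO statements (subject, predicate, obj_lit) VALUES (?, ?, ?)",
        (lookup_id("resources", "uri", s),
         lookup_id("resources", "uri", p),
         lookup_id("literals", "value", value)),
    )


add_literal_statement("ex:doc1", "dc:title", "Jena2")
add_literal_statement("ex:doc2", "dc:title", "Jena2")

# A find over the predicate needs joins to turn ids back into text.
rows = conn.execute("""
    SELECT s.uri, p.uri, l.value
    FROM statements st
    JOIN resources s ON s.id = st.subject
    JOIN resources p ON p.id = st.predicate
    JOIN literals  l ON l.id = st.obj_lit
    WHERE p.uri = ?
    ORDER BY s.uri
""", ("dc:title",)).fetchall()

literal_count = conn.execute("SELECT COUNT(*) FROM literals").fetchone()[0]
```

The literal "Jena2" appears in two statements but occupies a single row in `literals`, which is the space saving that normalization buys.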


Storage Schema for Jena1 and Jena2

Storing Arbitrary RDF Statements in Jena2
• The Jena2 schema trades off space for time
• Uses a denormalized schema in which resource URIs and simple literal values are stored directly in the statement table
• A separate literals table is used only to store long literal values
• A separate resources table is used to store long URIs
• Because values are stored directly in the statement table, many find operations can be performed without a join


Storage Schema for Jena1 and Jena2

Jena2 Schema (Denormalized)
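A denormalized layout can be sketched in the same way. The length threshold `MAX_INLINE`, the `Uv:`/`Lv:`/`Lr:` prefixes, and the table names are assumptions chosen for this example; the slides do not specify the actual encoding. Short values live in the statement table itself, and only long literals are moved to a side table.

```python
import sqlite3

MAX_INLINE = 32  # assumed threshold; longer literals go to the side table

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE long_literals (id INTEGER PRIMARY KEY, value TEXT);
CREATE TABLE statements (subject TEXT, predicate TEXT, object TEXT);
""")


def encode_literal(value: str) -> str:
    """Store short literals inline; replace long ones with a reference."""
    if len(value) <= MAX_INLINE:
        return "Lv:" + value                      # literal value, inline
    cur = conn.execute("INSERT INTO long_literals (value) VALUES (?)", (value,))
    return "Lr:" + str(cur.lastrowid)             # literal reference


def add(s: str, p: str, value: str) -> None:
    conn.execute(
        "INSERT INTO statements VALUES (?, ?, ?)",
        ("Uv:" + s, "Uv:" + p, encode_literal(value)),
    )


add("ex:doc1", "dc:title", "Jena2")
add("ex:doc1", "dc:description", "x" * 100)

# No join is needed: the short title is read straight from the statement table.
titles = conn.execute(
    "SELECT subject, object FROM statements WHERE predicate = ?",
    ("Uv:dc:title",),
).fetchall()

long_count = conn.execute("SELECT COUNT(*) FROM long_literals").fetchone()[0]
```

The cost is that a short value repeated in many statements is stored once per statement rather than once overall.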


Storage Schema for Jena1 and Jena2

A denormalized schema uses more database space because the same value (literal or URI) is stored repeatedly

Jena1 and Jena2 permit multiple graphs to be stored in a single database instance

Jena2 supports the use of multiple statement tables in a single database so that applications can flexibly map graphs to different tables

Use of multiple statement tables may improve performance through better locality and caching


Jena2 Architecture

The Jena2 persistence architecture is implemented using a specialized graph interface, a database driver, and configuration meta-graphs

Specialized Graph Interface
• The persistence layer presents a Graph interface to the higher levels of Jena, supporting the usual Graph operations of add, delete and find
• Each logical graph is implemented using an ordered list of specialized graphs
• An operation on the entire logical graph, such as add, delete or find, is processed by invoking add, delete or find on each specialized graph


Jena2 Architecture

Results of the individual operations are combined and returned as the result for the entire graph

As an optimization, a single specialized graph may completely process an operation for the entire graph, in which case the remaining specialized graphs need not be invoked

Each specialized graph maps the graph operations onto appropriate tables in the database

Many-to-one mapping between specialized graphs and database tables
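The dispatch over an ordered list of specialized graphs can be sketched as follows. The class names, the `owns` callback, and the boolean "completely processed" signal are assumptions made for this illustration rather than Jena2's actual interfaces.

```python
from typing import Callable, List, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


class SpecializedGraph:
    """Hypothetical specialized graph that stores statements for the predicates it owns."""

    def __init__(self, owns: Callable[[str], bool]) -> None:
        self.owns = owns
        self.triples: List[Triple] = []

    def add(self, t: Triple) -> bool:
        # Returns True when this graph has completely handled the add.
        if self.owns(t[1]):
            self.triples.append(t)
            return True
        return False

    def find(self, pattern: Pattern) -> Tuple[List[Triple], bool]:
        matches = [
            t for t in self.triples
            if all(want is None or want == got for want, got in zip(pattern, t))
        ]
        # If the pattern fixes a predicate this graph owns, no other graph can hold matches.
        done = pattern[1] is not None and self.owns(pattern[1])
        return matches, done


class LogicalGraph:
    """A logical graph as an ordered list of specialized graphs."""

    def __init__(self, graphs: List[SpecializedGraph]) -> None:
        self.graphs = list(graphs)

    def add(self, t: Triple) -> None:
        for g in self.graphs:
            if g.add(t):
                return
        raise ValueError("no specialized graph accepted the statement")

    def find(self, pattern: Pattern) -> List[Triple]:
        results: List[Triple] = []
        for g in self.graphs:
            matches, done = g.find(pattern)
            results.extend(matches)   # combine the individual results
            if done:                  # skip the remaining graphs
                break
        return results


titles = SpecializedGraph(lambda p: p == "dc:title")
generic = SpecializedGraph(lambda p: True)      # catch-all, placed last
graph = LogicalGraph([titles, generic])

graph.add(("ex:doc1", "dc:title", "Jena2"))
graph.add(("ex:doc1", "dc:creator", "ex:kevin"))

title_hits = graph.find((None, "dc:title", None))
doc1_hits = graph.find(("ex:doc1", None, None))
```

Placing the catch-all graph last means specialized storage is tried first and the generic store only receives what no other graph claims.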


Jena2 Architecture

Graphs Comprise Specialized Graphs Over Tables


Jena2 Architecture

Database Driver
• The driver is responsible for data definition operations such as database initialization, table creation and deletion, and allocating database identifiers
• Responsible for mapping graph objects between their Java representation and their database encoding
• Uses a combination of static and dynamically generated SQL for data manipulation
• Maintains a cache of prepared SQL statements to reduce the overhead of query compilation
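The idea of generating SQL dynamically and caching it can be sketched in Python. The function names and the single `statements` table are assumptions for illustration. Note that `lru_cache` here caches only the generated SQL text per pattern shape; Python's `sqlite3` module separately keeps its own cache of compiled statements keyed by that text.

```python
import sqlite3
from functools import lru_cache

COLUMNS = ("subject", "predicate", "object")


@lru_cache(maxsize=None)
def find_sql(bound: tuple) -> str:
    """Build the SELECT text for one pattern shape; cached per shape."""
    conditions = [f"{col} = ?" for col, is_bound in zip(COLUMNS, bound) if is_bound]
    where = " WHERE " + " AND ".join(conditions) if conditions else ""
    return "SELECT subject, predicate, object FROM statements" + where


def find(conn, s=None, p=None, o=None):
    pattern = (s, p, o)
    # The shape records which positions are constants, not their values.
    sql = find_sql(tuple(v is not None for v in pattern))
    params = [v for v in pattern if v is not None]
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE statements (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO statements VALUES (?, ?, ?)",
    [("ex:doc1", "dc:title", "Jena2"), ("ex:doc1", "dc:creator", "ex:kevin")],
)

titles = find(conn, p="dc:title")
creators = find(conn, p="dc:creator")   # same shape, so the SQL text is reused
info = find_sql.cache_info()
```

Keying the cache on the pattern shape rather than on the values keeps the number of distinct statements small: a three-position pattern has at most eight shapes.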


Jena2 Architecture

Configuration and Meta-Graphs
• Configuration parameters are specified as RDF statements
• A meta-graph, a separate, auxiliary RDF graph containing metadata about each logical graph, is associated with each Jena2 persistent store
• The meta-graph may be queried just like any other Jena graph but, unlike other graphs, it may not be modified and it does not support reification
• The meta-graph may also specify additional property tables, property-class tables and indexes


Jena2 Query Processing

Two forms of Jena querying:
• Find processing
• RDQL processing

In find querying, the find operation returns all statements satisfying a pattern

In Jena1, a find pattern is evaluated with a single SQL select query over the statement table

For pattern evaluation in Jena2, the pattern is passed to each specialized graph handler; the results are concatenated and returned to the application


Jena2 Query Processing

An RDQL query in Jena1 is converted into a pipeline of find patterns connected by join variables

The query is evaluated in a nested-loops fashion, using the results of a find operation over one pattern to generate the patterns for new find operations

The goal of Jena2 query processing is to convert multiple triple patterns into a single query for evaluation by the database engine
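The difference between the two strategies can be sketched on a two-pattern query: which documents have a creator whose name is a given string. The table layout, the predicates, and the function names are assumptions for this example, and the query is written directly in SQL rather than parsed from RDQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE statements (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany("INSERT INTO statements VALUES (?, ?, ?)", [
    ("ex:doc1", "dc:creator", "ex:kevin"),
    ("ex:doc2", "dc:creator", "ex:dave"),
    ("ex:kevin", "foaf:name", "Kevin"),
    ("ex:dave", "foaf:name", "Dave"),
])


def find(s=None, p=None, o=None):
    """One find over the statement table; None is a don't-care."""
    sql = ("SELECT subject, predicate, object FROM statements "
           "WHERE (? IS NULL OR subject = ?) "
           "AND (? IS NULL OR predicate = ?) "
           "AND (? IS NULL OR object = ?)")
    return conn.execute(sql, (s, s, p, p, o, o)).fetchall()


def nested_loops(name):
    # Nested-loops style: each result of the first find generates a new find.
    docs = []
    for doc, _, creator in find(p="dc:creator"):
        if find(s=creator, p="foaf:name", o=name):
            docs.append(doc)
    return sorted(docs)


def single_query(name):
    # Single-query style: both patterns become one self-join run by the engine.
    rows = conn.execute("""
        SELECT t1.subject
        FROM statements t1
        JOIN statements t2 ON t2.subject = t1.object
        WHERE t1.predicate = 'dc:creator'
          AND t2.predicate = 'foaf:name'
          AND t2.object = ?
        ORDER BY t1.subject
    """, (name,)).fetchall()
    return [r[0] for r in rows]


loop_result = nested_loops("Kevin")
join_result = single_query("Kevin")
```

The nested-loops version issues one query per intermediate result, while the single-query version makes one round trip and lets the database engine choose the join order.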


Miscellaneous Topics

Jena2 Performance Toolkit
• Explore various layout options and understand performance trade-offs

Jena Transaction Management
• The underlying database needs to support transactions

Bulk Load
• Significant reduction in the time to load persistent graphs


Related Work

Jena2 schema design
• Supports a denormalized schema used for storing generic triple statements, as well as property tables to store subject-value pairs related by arbitrarily specified properties
• Provides an efficient implementation for reification
• Most systems support only a fixed set of underlying tables that implement a (non-schema-specific) generic store
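A property table alongside a generic statement table can be sketched as follows. The table name `title_property`, the routing dictionary `PROPERTY_TABLES`, and the choice of `dc:title` as the specially stored property are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE statements (subject TEXT, predicate TEXT, object TEXT);
CREATE TABLE title_property (subject TEXT, value TEXT);  -- subject-value pairs for one property
""")

# Hypothetical configuration: which properties get their own table.
PROPERTY_TABLES = {"dc:title": "title_property"}


def add(s: str, p: str, o: str) -> None:
    table = PROPERTY_TABLES.get(p)
    if table is not None:
        # The predicate is implied by the table, so only subject and value are stored.
        # The table name comes from the fixed dictionary above, not from user input.
        conn.execute(f"INSERT INTO {table} (subject, value) VALUES (?, ?)", (s, o))
    else:
        conn.execute("INSERT INTO statements VALUES (?, ?, ?)", (s, p, o))


add("ex:doc1", "dc:title", "Jena2")
add("ex:doc1", "dc:creator", "ex:kevin")

title_rows = conn.execute("SELECT subject, value FROM title_property").fetchall()
generic_rows = conn.execute("SELECT subject, predicate, object FROM statements").fetchall()
```

Statements for properties without a dedicated table still land in the generic statement table, so arbitrary graphs remain storable.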


Future Work

• Performance measurements indicate that the denormalized schema of Jena2 is twice as fast as the normalized schema of Jena1 for many operations
• The Jena2 algorithm for RDQL query processing is a modest improvement over the Jena1 nested-loops approach
• An important enhancement in Jena2 for typed literals will be to store them as native SQL types rather than as strings
• Support for OWL and reasoning in Jena2
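One reason native SQL types matter for typed literals is that comparisons on strings are lexicographic. The small SQLite sketch below stores the same numbers both ways; the table and column names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ages (as_text TEXT, as_int INTEGER)")
conn.executemany(
    "INSERT INTO ages VALUES (?, ?)",
    [(str(n), n) for n in (9, 10, 25)],
)

# String comparison: '9' sorts after '10' because '9' > '1' character by character.
text_hits = [r[0] for r in conn.execute(
    "SELECT as_int FROM ages WHERE as_text > '10' ORDER BY as_int")]

# Native integer comparison gives the numerically correct answer.
int_hits = [r[0] for r in conn.execute(
    "SELECT as_int FROM ages WHERE as_int > 10 ORDER BY as_int")]
```

With string storage, the value 9 is wrongly reported as greater than 10, so range conditions on numeric literals cannot be pushed down to the database as written.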


Conclusion

Jena2 supports application-specific schema

Retains the flexibility to store arbitrary graphs

Use of property-class tables is beneficial for query languages that expose higher-level abstractions to applications

More work is needed on efficient algorithms for query processing and optimization