a survey of data provenance in scientific workflow wu yuan email: wuyuan@cnlab.net kunming...

Post on 03-Jan-2016


A Survey of Data Provenance in Scientific Workflow

Wu Yuan
Email: wuyuan@cnlab.net
Kunming University of Science and Technology

Outline

1. What is provenance and how does it work?

2. Taxonomy of provenance techniques.

3. Provenance model.

4. Provenance in scientific workflow.

5. What’s next.

What is Provenance?

Provenance.

→ Chinese renderings: 数据起源 (data origin) / 数据出处 (data source) / 数据溯源 (data tracing)…

Provenance Service in Scientific Experiments

Provenance Definition

● The provenance of a result is the process that led to that result.

● Where-provenance & why-provenance (Buneman, P., S. Khanna, and W.C. Tan, 2001)

Understanding the Process provenance Change

Where-provenance? & Why-provenance?

Provenance is an increasingly important topic.

→ Biology;

… Financial auditing;

… Aerospace;

The importance of provenance

● Scientific analysis is complex.

● Issues of trust, quality, and copyright of data.

● Reproducing and interpreting results depends on the provenance of the data.

● Workflow systems:

→ Support scientists in their analysis

→ Trace the data used / generated at each step

→ Replication recipe
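As a rough illustration of how a workflow system might trace the data used and generated at each step, the sketch below wraps each step in a recorder. The names `traced`, `TRACE`, `normalize`, and `average` are hypothetical and are not taken from any particular workflow system.

```python
import functools

TRACE = []  # hypothetical in-memory provenance log, one entry per step run


def traced(step):
    """Record the inputs and the output of every call to a workflow step."""
    @functools.wraps(step)
    def wrapper(*args):
        result = step(*args)
        TRACE.append({"step": step.__name__, "used": args, "generated": result})
        return result
    return wrapper


@traced
def normalize(values):
    # Scale every value by the largest one.
    top = max(values)
    return [v / top for v in values]


@traced
def average(values):
    return sum(values) / len(values)


result = average(normalize([2, 4, 8]))

# Replaying the log shows which data each step used and generated.
for entry in TRACE:
    print(entry["step"], entry["used"], "->", entry["generated"])
```

The log is ordered by execution, so it can serve as a simple recipe for replicating the run.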

Provenance “Lifecycle”

Taxonomy of Provenance Techniques

Provenance computing

● Query inversion … or the “lazy” style.

● Annotations … or the “eager” style.
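The difference between the two styles can be sketched on a toy query. This is only an illustration under assumptions: the table `SOURCE`, the query (distinct genes with a score above 0.5), and all function names are hypothetical.

```python
# Hypothetical source table: row id -> (gene, score)
SOURCE = {
    1: ("brca1", 0.9),
    2: ("tp53", 0.4),
    3: ("brca1", 0.7),
}


def query(rows):
    """Plain query: distinct genes with score > 0.5, no provenance kept."""
    return {gene for gene, score in rows.values() if score > 0.5}


def eager_query(rows):
    """Annotation ("eager") style: carry source row ids along with each result."""
    annotated = {}
    for row_id, (gene, score) in rows.items():
        if score > 0.5:
            annotated.setdefault(gene, set()).add(row_id)
    return annotated


def lazy_provenance(rows, output_gene):
    """Query-inversion ("lazy") style: derive the source rows only on request."""
    return {
        row_id
        for row_id, (gene, score) in rows.items()
        if gene == output_gene and score > 0.5
    }


print(query(SOURCE))
print(eager_query(SOURCE))
print(lazy_provenance(SOURCE, "brca1"))
```

The eager version pays the bookkeeping cost on every run; the lazy version pays nothing up front but needs the source data and an invertible query at the time the question is asked.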

Model of describing provenance

● W7 (when, where, how, who, why, what, which)

● An information model and architecture for stream provenance (LEAD, Calder, meteorology)

● The four-level provenance model (LEAD, Kama, earth science)

● A time-and-value centric provenance model (Century, medicine)

W7 model
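One way to picture a W7 record is as a structure with one field per question. The sketch below is an assumption about how such a record could be laid out; the class name `W7Record`, the field types, and every example value are hypothetical.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class W7Record:
    """One provenance entry with the seven W7 dimensions as plain strings."""
    when: str
    where: str
    how: str
    who: str
    why: str
    what: str
    which: str


# All values below are made up for illustration.
record = W7Record(
    when="2016-01-03T10:00:00",
    where="lab-server-1",
    how="applied noise filter",
    who="analyst-a",
    why="remove sensor noise before analysis",
    what="dataset updated",
    which="filter-tool 1.0",
)

for question, answer in asdict(record).items():
    print(f"{question}: {answer}")
```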

Data model of provenance

● Relational model

● XML

The Model of Provenance in workflow

● RDF (resource description framework)

● PCS (provenance collection service)

● PQS (provenance query server)

Provenance collection service

● Workflow logs

● Inverse SQL query statements

● Annotation mode

Provenance query server

Query:

● Relational model / SQL

● XML / XQuery

● or ……
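For the relational case, a provenance query can be sketched as a recursive SQL query over a table of derivation links. The example uses Python's built-in `sqlite3` module; the table `derived_from`, its columns, and the file names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: each row says `product` was derived from `source` by `step`.
conn.execute("CREATE TABLE derived_from (product TEXT, source TEXT, step TEXT)")
conn.executemany(
    "INSERT INTO derived_from VALUES (?, ?, ?)",
    [
        ("clean.csv", "raw.csv", "clean"),
        ("stats.csv", "clean.csv", "summarize"),
        ("plot.png", "stats.csv", "plot"),
    ],
)

# Walk the derivation links backwards from one item to all of its ancestors.
LINEAGE_SQL = """
WITH RECURSIVE lineage(item) AS (
    SELECT source FROM derived_from WHERE product = ?
    UNION
    SELECT d.source
    FROM derived_from AS d
    JOIN lineage AS l ON d.product = l.item
)
SELECT item FROM lineage
"""


def ancestors(item):
    """Return every item that `item` was directly or indirectly derived from."""
    return sorted(row[0] for row in conn.execute(LINEAGE_SQL, (item,)))


print(ancestors("plot.png"))
```

An XML store would answer the same question with a recursive XQuery function instead; the relational form is shown here only because it runs with the standard library.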

Current study

● Query language: annotation-aware

● Provenance computing

● Provenance model

● Annotation management needs a data model.

● Annotation management system

● Data security and data reliability

→ Current study is limited

… The query itself is complex

Conclusion

Provenance recording should be part of the infrastructure.

Currently, Web Services and the Open Grid Services Architecture do not provide any support for recording provenance.

Thanks for your attendance!

… The End.
