a survey of data provenance in scientific workflow wu yuan email: [email protected] kunming...
A Survey of Data Provenance in Scientific Workflows
Wu Yuan
Email: [email protected]
Kunming University of Science and Technology
Outline
1. What is provenance and how does it work?
2. Taxonomy of provenance techniques.
3. Provenance model.
4. Provenance in scientific workflow.
5. What’s next.
What is Provenance?
Provenance.
→ data origin / data source / data tracing…
Provenance Service in Scientific Experiments
Provenance Definition
● The provenance of a result is the process that led to that result.
● Where-provenance and why-provenance (Buneman, P., S. Khanna, and W.C. Tan, 2001)
Understanding the Process provenance Change
Where-provenance? Why-provenance?
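The distinction between the two notions can be sketched on a toy join. This is a minimal illustration, not the formal definition from Buneman et al.: the tables, row ids and the function `join_with_provenance` are all hypothetical. Here "why" is taken to be the set of source rows that together justify an output row, and "where" is the source cell each output field was copied from.

```python
# Hypothetical toy tables; the row ids (e1, d1, ...) are invented for illustration.
employees = {"e1": {"name": "Ann", "dept": "D1"},
             "e2": {"name": "Bob", "dept": "D2"}}
departments = {"d1": {"dept": "D1", "city": "Kunming"},
               "d2": {"dept": "D2", "city": "Beijing"}}


def join_with_provenance(emps, depts):
    """Join on 'dept' and return (row, why, where) for each output row."""
    results = []
    for eid, emp in emps.items():
        for did, dep in depts.items():
            if emp["dept"] == dep["dept"]:
                row = {"name": emp["name"], "city": dep["city"]}
                # Why: the source rows whose presence justifies this output row.
                why = {eid, did}
                # Where: the exact source cell each output field was copied from.
                where = {"name": (eid, "name"), "city": (did, "city")}
                results.append((row, why, where))
    return results


results = join_with_provenance(employees, departments)
for row, why, where in results:
    print(row, "why:", sorted(why), "where:", where)
```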
Provenance is an increasingly important topic.
→ Biology;
… Financial auditing;
… Aerospace;
The importance of provenance
Scientific analysis is complex
● Issues of trust, quality, and copyright of data
● Reproducing and interpreting results depends on the provenance of the data.
Workflow systems
● Support scientists in their analysis
● Trace the data used / generated at each step
● Replication recipe
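As a rough sketch of what "trace the data used / generated at each step" could look like, the runner below records the input and output of every step. The function `run_workflow`, the step names and the sample data are assumptions made for illustration; no particular workflow system is implied.

```python
def run_workflow(steps, data):
    """Run steps in order and record what each step consumed and produced."""
    trace = []
    for name, func in steps:
        result = func(data)
        trace.append({"step": name, "input": data, "output": result})
        data = result  # the output of one step is the input of the next
    return data, trace


# Hypothetical three-step analysis.
steps = [
    ("clean", lambda xs: [x for x in xs if x is not None]),
    ("square", lambda xs: [x * x for x in xs]),
    ("total", sum),
]

final, trace = run_workflow(steps, [1, None, 2, 3])
for entry in trace:
    print(entry["step"], entry["input"], "->", entry["output"])
```

The recorded trace is what makes the run a replication recipe: the same steps can be replayed on the same inputs and the intermediate results compared.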
Provenance “Lifecycle”
Taxonomy of Provenance Techniques
Provenance computing
● Query inversion, or the “lazy” style.
● Annotations, or the “eager” style.
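The two styles can be contrasted on a single filter query. This is a minimal sketch under stated assumptions: the source rows, the threshold and the function names (`hot_eager`, `hot_lazy`, `invert_hot`) are hypothetical. The eager version carries an annotation with every output value; the lazy version stores nothing and recovers provenance on demand by running an inverse query against the source.

```python
# Hypothetical source table of temperature readings.
source = [{"id": 1, "temp": 31}, {"id": 2, "temp": 18}, {"id": 3, "temp": 35}]


def hot_eager(rows, threshold=30):
    """Eager style: each output value carries the ids of its source rows."""
    return [(r["temp"], {r["id"]}) for r in rows if r["temp"] > threshold]


def hot_lazy(rows, threshold=30):
    """Lazy style: the query returns plain values and records no provenance."""
    return [r["temp"] for r in rows if r["temp"] > threshold]


def invert_hot(rows, output_value, threshold=30):
    """Inverse query: find the source rows that produce output_value."""
    return {r["id"] for r in rows
            if r["temp"] > threshold and r["temp"] == output_value}


eager_result = hot_eager(source)
lazy_result = hot_lazy(source)
origin_of_35 = invert_hot(source, 35)
print(eager_result, lazy_result, origin_of_35)
```

The trade-off in this sketch is the usual one: the eager version pays storage and propagation cost on every query, while the lazy version pays only when provenance is requested but needs the source data to still be available.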
Model of describing provenance
W7 (when, where, how, who, why, what, which)
An information model and architecture for stream provenance (LEAD, Calder, meteorology)
The four-level provenance model (LEAD, Kama, earth science)
A time-and-value centric provenance model (Century, medicine)
W7 model
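One way to picture the seven W7 questions is as the fields of a single provenance record. The sketch below is an assumption-laden illustration: the class name `W7Record`, the comments giving one common reading of each question, and all sample values are hypothetical and not taken from the slides.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class W7Record:
    what: str   # the event that happened to the data
    when: str   # time of the event
    where: str  # location of the event
    how: str    # action that led to the event
    who: str    # person or organisation involved
    which: str  # instrument or software used
    why: str    # reason for the event


# Hypothetical sample values.
record = W7Record(
    what="temperature series recalibrated",
    when="2010-05-01T09:30:00",
    where="station A",
    how="applied offset correction",
    who="analyst 1",
    which="calibration script v2",
    why="sensor drift detected",
)

print(asdict(record))
```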
Data model of provenance
Relational model
XML
…
The Model of Provenance in workflow
● RDF (Resource Description Framework)
● PCS (provenance collection service)
● PQS (provenance query server)
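How the three pieces might fit together can be sketched with plain subject-predicate-object triples: a collection method plays the role of the PCS and a pattern-matching method plays the role of the PQS. Everything here is hypothetical, including the class `ProvenanceStore`, the predicate strings and the file names; a real deployment would use an RDF library and a proper vocabulary.

```python
class ProvenanceStore:
    """Toy triple store standing in for an RDF-based provenance repository."""

    def __init__(self):
        self.triples = set()

    def collect(self, subject, predicate, obj):
        """Collection side (PCS role): record one provenance statement."""
        self.triples.add((subject, predicate, obj))

    def query(self, subject=None, predicate=None, obj=None):
        """Query side (PQS role): match triples, treating None as a wildcard."""
        return {
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        }


store = ProvenanceStore()
# Illustrative predicate names, not a fixed vocabulary.
store.collect("stats.csv", "derivedFrom", "clean.csv")
store.collect("stats.csv", "generatedBy", "analyse-step")
store.collect("clean.csv", "derivedFrom", "raw.csv")

derived = store.query(predicate="derivedFrom")
about_stats = store.query(subject="stats.csv")
print(sorted(derived))
print(sorted(about_stats))
```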
Provenance collection service
Workflow logs / inverse SQL query statements / annotation mode
Provenance query server
Query:
Relational model / SQL
XML / XQuery
or ……
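For the relational / SQL option, a provenance query might look like the sketch below, which uses Python's built-in `sqlite3` module and a recursive common table expression to walk back through derivations. The table `derivation`, its columns, the file names and the function `lineage` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: each row says "output was produced by step from input".
conn.execute("CREATE TABLE derivation (output TEXT, step TEXT, input TEXT)")
conn.executemany(
    "INSERT INTO derivation VALUES (?, ?, ?)",
    [
        ("clean.csv", "clean", "raw.csv"),
        ("stats.csv", "analyse", "clean.csv"),
        ("plot.png", "plot", "stats.csv"),
    ],
)


def lineage(connection, item):
    """Return every item that the given item was directly or indirectly derived from."""
    rows = connection.execute(
        """
        WITH RECURSIVE ancestors(name) AS (
            SELECT input FROM derivation WHERE output = ?
            UNION
            SELECT d.input
            FROM derivation AS d
            JOIN ancestors AS a ON d.output = a.name
        )
        SELECT name FROM ancestors
        """,
        (item,),
    ).fetchall()
    return {name for (name,) in rows}


plot_lineage = lineage(conn, "plot.png")
print(sorted(plot_lineage))
```

The recursion is what makes even a simple lineage question non-trivial to express in SQL, which fits the later remark that the query itself is complex.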
Current study
Query language: annotation-aware
Provenance computing
Provenance model: annotation management needs a data model.
Annotation management system
Data security and data reliability
→ Current study is limited… the query itself is complex
Conclusion
Provenance recording should be part of the infrastructure.
Currently, Web Services and the Open Grid Services Architecture do not provide any support for recording provenance.
Thanks for your attendance!
…The End.