The ZBW is a member of the Leibniz Association
A Data Restore Model
for Reproducibility in Computational Statistics
Daniel Bahls, ZBW, I-Know 2013, Graz, Austria
Outline
1. Motivation – Repeatability in Empirical Research
2. Our Approach – The Data Restore Model
3. Outlook – Status of this Work / Next Steps
Repeatability in Science
• Fundamental criterion – verification is the job of the community
• Experiments must lead to the same findings
  • when carried out by different researchers
  • under certain constant parameters
• Further aspects
  • Robustness (w.r.t. measurement errors, etc.)
  • Repeatability vs. reproducibility vs. verifiability
Repeatability in Economics and the infamous case of Rogoff and Reinhart
Improving Review Processes
“If we try it all on our own and cannot reproduce the results, what does it mean?”
- Justin Wolfers, Betsey Stevenson, economists at the University of Michigan
... so we need access to the data
McCullough – Experiences & Recommendations
McCullough – Requirements & Experiences
Sweave – Literate Programming for Statistics
Data Publishing in Economics / Social Sciences
Different disciplines have different challenges
Characteristics of empirical research:
• sensitive / protected data
• distributed external data sources
Data Sharing
submit data bundles to 3rd-party repositories?
Data Management – The Black Box Approach
• a data set copy (some resource bundle)
• unclear (?): data review, curation, legal situation, re-use, transparency, repeatability
Statistical Data on the Semantic Web
Outline
1. Motivation – Repeatability in Empirical Research
2. Our Approach – The Data Restore Model
3. Outlook – Status of this Work / Next Steps
Data Restore Model
Spreadsheet
obs data set
[Diagram labels]
• UserDataSet – type → DataSet
• Data Items from own survey – type → Data Items
• includesData → external dataset
• buildScript
• No gaps – trust – incentive
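The diagram labels above can be read as a small data structure. The following is only a minimal sketch under that reading: the class names `ExternalDataSet` and `UserDataSet`, the `gaps` check and the file name `build.R` are assumptions made for illustration, not part of the presented model. Only `includesData` and `buildScript` are taken from the slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class ExternalDataSet:
    """Reference to a data set kept at its original source (not a copy)."""
    source: str
    name: str
    version: str


@dataclass
class UserDataSet:
    """A researcher's data set, described by how it can be restored."""
    name: str
    # Data items from the researcher's own survey.
    own_items: List[str] = field(default_factory=list)
    # Corresponds to the slide's includesData relation.
    includes_data: List[ExternalDataSet] = field(default_factory=list)
    # Corresponds to the slide's buildScript; the script itself is hypothetical.
    build_script: Optional[str] = None

    def gaps(self) -> List[str]:
        """Return what is missing for a complete restore description."""
        missing = []
        if not self.own_items and not self.includes_data:
            missing.append("no data items or external data sets declared")
        if self.includes_data and self.build_script is None:
            missing.append("external data included but no build script given")
        return missing


household = ExternalDataSet(source="EuroStat", name="Household XZ", version="0.2")
study = UserDataSet(
    name="my-study",
    own_items=["survey_wave_1"],
    includes_data=[household],
    build_script="build.R",
)
print(study.gaps())  # an empty list means no gaps were detected
```

The design choice here is that external data is referenced rather than copied, so the description stays valid even when the data itself is protected.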
Source: EuroStat
Dataset: Household XZ
Version: 0.2
Published: Jan 2009
[read more]
Integration with Research Environments
Review and Re-use
[Diagram labels]
• Client
• Source Code Repository
• Archive A, Archive B, Archive C, Archive D
• DOI
• Code and Data Templates
• Authenticate & Request Data
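One way to picture the interaction in this diagram is a client that takes a data template and asks each archive for the referenced data after an access check. This is a rough sketch only: the in-memory `ARCHIVES` and `ACCESS` tables, the user names, the data set identifiers and both function names are hypothetical stand-ins, and no real archive API is implied.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical in-memory stand-ins for the archives in the diagram.
ARCHIVES: Dict[str, Dict[str, List[int]]] = {
    "Archive A": {"household-xz": [1, 2, 3]},
    "Archive B": {"unemployment-2012": [4, 5]},
}

# Hypothetical access rights: which user may read from which archive.
ACCESS: Dict[str, Set[str]] = {
    "reviewer": {"Archive A", "Archive B"},
    "guest": {"Archive A"},
}


def request_data(user: str, archive: str, dataset: str) -> List[int]:
    """Check the user's rights for one archive, then return the data."""
    if archive not in ACCESS.get(user, set()):
        raise PermissionError(f"{user} is not authorised for {archive}")
    return ARCHIVES[archive][dataset]


def restore(user: str, template: List[Tuple[str, str]]) -> Dict[str, List[int]]:
    """Fill a data template (archive/data set references) with actual data."""
    return {
        dataset: request_data(user, archive, dataset)
        for archive, dataset in template
    }


template = [("Archive A", "household-xz"), ("Archive B", "unemployment-2012")]
print(restore("reviewer", template))
```

Because the template only holds references, the same published code can be shared openly while each archive keeps control over who receives protected data.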
Data Infrastructure Concept
• One source per data set
  transparency, curation by those with the highest expertise
• Data protection
  makes data publishing possible in all scenarios
• Data and code integration
  one-click solution – no manual effort for replication attempts
• Precise citation
  traceable data provenance
Incentives for the Research Community
• Transparency increases trust:
no gaps – trust – incentive
• Easy re-use:
the applied research models live longer
• More impact:
more citations
• Material for tutorials:
Students learn computational research in practice
• Research is more efficient:
Easier to understand and pick up the research of others
• Secured Knowledge:
Replication attempts in different research environments and contexts
discussion, inspiration, innovation
“Non-Findings” may get more recognition
Outline
1. Motivation – Repeatability in Empirical Research
2. Our Approach – The Data Restore Model
3. Outlook – Status of this Work / Next Steps
What we are currently working on
The Rogoff and Reinhart / Herndon case
• apply the Data Restore Model
• add semantic data documentation (partly available as RDF already)
• model it with the Data and Code ontology
Data and Code Ontology
[Diagram labels]
Components: Data and Code, System Environment, Resources, HW, SW, Replication Attempts, Experiment Setup, Data References
• Maven
• Make
• Build
• Virtualisation
• Emulation
• Linked Science
• Social Media
• Semantic Coding?
What we are currently working on
The Koenker Zeileis case
• Model relations between Data and Code instances
[Diagram labels: protected data set, public use file, figures, transformation by code]
Data Access and Retrieval
Next Steps
1. Challenge, Goals, Requirements
2. The Data Restore Model
3. Semantic Linkup / Data Annotation
4. Data Retrieval and Reuse
5. System Architecture
6. Validation / Evaluation
So there are still gaps
Examples:
• A data set is titled “EU Unemployment statistics 2012, EuroStat”
  • age class? seasonal adjustments?
• Executing the code does not produce the results
  • wrong data? system environment? error?
  • cf. Herndon’s replication of the Rogoff/Reinhart research
• The DOI does not specify the file format
Data and Code Ontology
[Diagram labels]
• observation, string value
• s p o (subject, predicate, object)
• data ref
• default value, for_stata, for_spss
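The labels `default value`, `for_stata` and `for_spss` suggest that a data reference may carry one value per statistics package plus a fallback. The sketch below illustrates that reading only; it is an assumption about the diagram, and the dictionary layout, the `encode` function and the example values are all invented for illustration.

```python
from typing import Dict, Optional

# Hypothetical encodings of one coded value. The keys follow the slide labels;
# the values are illustrative and not taken from the presentation.
MISSING_VALUE: Dict[str, str] = {
    "default": "NA",
    "for_stata": ".",
    "for_spss": "SYSMIS",
}


def encode(value_ref: Dict[str, str], tool: Optional[str] = None) -> str:
    """Pick the tool-specific encoding, falling back to the default value."""
    key = f"for_{tool}" if tool else "default"
    return value_ref.get(key, value_ref["default"])


print(encode(MISSING_VALUE, "stata"))
print(encode(MISSING_VALUE, "spss"))
print(encode(MISSING_VALUE))
```

A fallback of this kind would let the same data reference be used from a tool that has no dedicated entry.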
Proxy Relations
Such a relationship can be stated within the semantic model:
• Dataset for economic growth (GDP or the like)
• Dataset for Aluminium Price Index
• hasProxyRel links the data sets to a node that describes the proxy relation:
  - details on correlation
  - best practices
  - frequency of use
  - ...
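A proxy relation of this kind could be written down as subject-predicate-object triples. The sketch below uses plain Python tuples rather than an RDF library. Only `hasProxyRel` comes from the slide; the identifiers `ds:...` and `rel:1`, the predicates `proxyFor` and `correlationNote`, and the assumed direction (the aluminium price index acting as the proxy for economic growth) are hypothetical.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# All identifiers except hasProxyRel are hypothetical placeholders.
TRIPLES: List[Triple] = [
    ("ds:AluminiumPriceIndex", "hasProxyRel", "rel:1"),
    ("rel:1", "proxyFor", "ds:EconomicGrowth"),
    ("rel:1", "correlationNote", "details on correlation go here"),
]


def proxies_for(target: str, triples: List[Triple]) -> List[str]:
    """Return the data sets that declare a proxy relation for the target."""
    # First collect the relation nodes that point at the target data set.
    relations = {s for s, p, o in triples if p == "proxyFor" and o == target}
    # Then find the data sets attached to those relation nodes.
    return [s for s, p, o in triples if p == "hasProxyRel" and o in relations]


print(proxies_for("ds:EconomicGrowth", TRIPLES))
```

Modelling the relation as its own node leaves room for the descriptive details listed on the slide, such as correlation, best practices and frequency of use.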