integrating research data into the publication workflow: ebank uk experience rachel heery, ukoln,...

39
Integrating research data into the publication workflow: eBank UK experience Rachel Heery, UKOLN, University of Bath http://www.ukoln.ac.uk/projects/ebank-uk/ PV-2004, ESRIN Centre, Frascati, 5-7 October 2004

Upload: eric-bentley

Post on 28-Mar-2015

212 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Integrating research data into the publication workflow: eBank UK experience Rachel Heery, UKOLN, University of Bath

                                                             

Integrating research data into the publication workflow: eBank UK experience

Rachel Heery, UKOLN, University of Bath

http://www.ukoln.ac.uk/projects/ebank-uk/

PV-2004, ESRIN Centre, Frascati, 5-7 October 2004

Page 2: Integrating research data into the publication workflow: eBank UK experience Rachel Heery, UKOLN, University of Bath

                                                             

Overview

• eScience agenda– Imperative to re-use data – Publication at source

• Innovations in scholarly communications– Open Access – Institutional repositories

• eBank UK – Integrating research data and journal articles– Information architecture and data flow– Data model and schemas

• Challenges for the future

More effective curation by integrating research data and publications

Page 3: Integrating research data into the publication workflow: eBank UK experience Rachel Heery, UKOLN, University of Bath

                                                             

eBank project team

• UKOLN, University of Bath

• Michael Day• Monica Duke• Rachel Heery• Liz Lyon

• University of Southampton

• Les Carr• Simon Coles• Jeremy Frey• Chris Gutteridge• Mike Hursthouse

• University of Manchester

• John Blunden-Ellis

Page 4: Integrating research data into the publication workflow: eBank UK experience Rachel Heery, UKOLN, University of Bath

                                                             

Imperative to re-use research data

Page 5: Integrating research data into the publication workflow: eBank UK experience Rachel Heery, UKOLN, University of Bath

                                                             

“The next generation of research breakthroughs will rely upon new ways of handling the immense amounts of data that are being produced by modern research methods and equipment, such as telescopes, particle accelerators, genome sequencers and biological imagers….Similar developments are having an impact in the arts and humanities, and in the social sciences.”

A Vision for Research,

Research Councils UK, December 2003

Page 6: Integrating research data into the publication workflow: eBank UK experience Rachel Heery, UKOLN, University of Bath

                                                             

“It is envisaged that the sharing of primary data would prevent unnecessary repetition of experiments and enable scientists to build directly on each others’ work, creating greater efficiencies and productivity in the research process.”

UK Parliamentary Committee report

Page 7: Integrating research data into the publication workflow: eBank UK experience Rachel Heery, UKOLN, University of Bath

                                                             

Current chemistry publishing protocolsIdeas and interpretations

Results & derived data

Hooks into the literature

Raw data!

Page 8: Integrating research data into the publication workflow: eBank UK experience Rachel Heery, UKOLN, University of Bath

                                                             

Calls for new modes of curation for digital data

• Publication

• Discovery

• Re-use

• Preservation

Page 9: Integrating research data into the publication workflow: eBank UK experience Rachel Heery, UKOLN, University of Bath

                                                             

eBank motivation

• Publication bottleneck in many scientific communities

• Small percentage of data referenced in literature

• Limited amount of results data

• Publication at source

• Open repositories• Link data to

research literature• More timely access

Page 10: Integrating research data into the publication workflow: eBank UK experience Rachel Heery, UKOLN, University of Bath

                                                             

eBank focus on crystallography

• Computer controlled instruments• Generates large quantities of digital data and

metadata automatically• Requirement for curaton of data• Strict workflow• Data formatted to international standard

– Crystallographical Information File (CIF) maintained by the International Union of Crystallography

• CombeChem: funded by UK eScience programme

Page 11: Integrating research data into the publication workflow: eBank UK experience Rachel Heery, UKOLN, University of Bath

                                                             

CombeChem: an eScience project

X-Raye-Lab

Analysis

Properties

Propertiese-Lab

SimulationVideo

Diff

ract

omet

er

Grid Middleware

StructuresDatabase

Page 12: Integrating research data into the publication workflow: eBank UK experience Rachel Heery, UKOLN, University of Bath

                                                             

Emerging infrastructure to support curation of digital data

Page 13: Integrating research data into the publication workflow: eBank UK experience Rachel Heery, UKOLN, University of Bath

                                                             

Improving access to research publications

• Repositories– Subject based (arXiv, CogPrints)– Institutional (CDL, MIT)– Supporting technology (DSpace, eprints.org)

• Open Access– Self archiving peer reviewed journal articles– ‘Toll free’ journals (free at point of use)– Supporting technology (OAI-PMH)

Potential for integrating access to data and publications

Page 14: Integrating research data into the publication workflow: eBank UK experience Rachel Heery, UKOLN, University of Bath

                                                             

Supporting technology: Open Archives Initiative

• Protocol for Metadata Harvesting (OAI-PMH)

• Architecture of the OAI-PMH• Harvest available metadata from Data Providers• Place aggregated metadata in a repository• Expose aggregated metadata via a Web interface

• Potential for added value services…

• www.openarchives.org

Page 15: Integrating research data into the publication workflow: eBank UK experience Rachel Heery, UKOLN, University of Bath

                                                             

Architecture of the OAI PMH

• Consistent interfaces for data provider and service provider

• Low barrier protocol / effortless implementation • Based on existing standards (e.g. HTTP, XML, DC)

Harvester Repository

Requests (based on HTTP)

Metadata (encoded in XML)

Metadata and DataMetadata

Data Provider

Service

Service Provider

Page 16: Integrating research data into the publication workflow: eBank UK experience Rachel Heery, UKOLN, University of Bath

                                                             

Page 17: Integrating research data into the publication workflow: eBank UK experience Rachel Heery, UKOLN, University of Bath

                                                             

eBank in a nutshell

● Create institutional repository of Crystallography

Data (at Southampton) ● Modify repository software to handle datasets

(eprints.org at Southampton)● Demonstrate eBank search service linked to

ePrints UK, indexing harvested descriptions of datasets and journal articles (at UKOLN)

● Embed eBank service into PSIgate subject gateway (at Manchester)

To develop pilot service linking journal articles and scientific datasets (September 2003 - October 2005)

Page 18: Integrating research data into the publication workflow: eBank UK experience Rachel Heery, UKOLN, University of Bath

                                                             

Institutional repository

(Southampton repository)

eBank UK aggregator service

(metadata describing datasets)

ePrint UK aggregator service (metadata describing journal articles)

Harvesting OAI-PMH

ebank_dc

Harvesting OAI-PMH oai_dc

Searching, linking and embedding

Searching, linking and embedding

PSIgate portal

eBank architecture

Page 19: Integrating research data into the publication workflow: eBank UK experience Rachel Heery, UKOLN, University of Bath

                                                             

Institutional repositories at various sites – providing links to data and journal articles, providing metadata for harvesting

Various aggregators of metadata describing datasets – international subject based services, publishers’ services etc

Various aggregators of metadata describing journal articles – international subject based services, publishers’ services etc

Harvesting OAI-PMH

ebank_dc

Harvesting OAI-PMH oai_dc

Searching, linking and embedding

Searching, linking and embedding

Embedded services in various specialist portals

Potential extended architecture

Page 20: Integrating research data into the publication workflow: eBank UK experience Rachel Heery, UKOLN, University of Bath

                                                             

First steps: establishing common ground…

• Understand the data creation process • Terminology and definitions

– Data– Metadata– Datafile– Dataset– Data holding

• Different views– Digital library researchers, computer scientists, chemists– Generic vs specific– Modeller vs practitioner

• Data modelling• Defining metadata schema

Page 21: Integrating research data into the publication workflow: eBank UK experience Rachel Heery, UKOLN, University of Bath

                                                             

Crystallographic data workflow

Set up data collection

Collect data

Process + correct images

Solve structure

Refine structure

CIF

RAW DATA

DERIVED DATA

RESULTS DATA

1

2

3

4

5

6

Page 22: Integrating research data into the publication workflow: eBank UK experience Rachel Heery, UKOLN, University of Bath

                                                             

Crystallographic data workflow

Set up data collection

Collect data

Process + correct images

Solve structure

Refine structure

CIF

RAW DATA

DERIVED DATA

RESULTS DATA

1

2

3

4

5

6

Page 23: Integrating research data into the publication workflow: eBank UK experience Rachel Heery, UKOLN, University of Bath

                                                             

Linking Crystallograpy data and journal ePrints

EPrint (Local)

eBank World

JOURNAL PUBLICATION

DATA HOLDING

INVESTIGATIONRAW

DERIVED

RESULTS

CIF

STRUCTURE REPORT

DATASET (Contains

DATAFILES)

REPORT (EPrint)

EBank REPORT

Page 24: Integrating research data into the publication workflow: eBank UK experience Rachel Heery, UKOLN, University of Bath

                                                             

Crystallography data model

Page 25: Integrating research data into the publication workflow: eBank UK experience Rachel Heery, UKOLN, University of Bath

                                                             

Metadata approach

• Extended Dublin Core for structure reports within institutional repository

• Both simple Dublin Core and extended Dublin Core are offered as alternative schemas for harvesting using OAI-PMH

• Exploring use of extended DC schema within DCMI – impact on aggregator service

• Engaging the broader scientific community to ensure different schemas are compliant and standards can emerge

Page 26: Integrating research data into the publication workflow: eBank UK experience Rachel Heery, UKOLN, University of Bath

                                                             

Extended Dublin Core schema

• Additional chemical information in schema for harvesting e.g. empirical formula

• Schema contains International Chemical Identifier (InChI)

• Links to all datasets associated with an experiment• Links to individual datasets within an experiment• Links to eprints (and other published literature)

derived from the data• Using vocabularies specific to crystallography

Page 27: Integrating research data into the publication workflow: eBank UK experience Rachel Heery, UKOLN, University of Bath

                                                             

Page 28: Integrating research data into the publication workflow: eBank UK experience Rachel Heery, UKOLN, University of Bath

                                                             

Structure reports link back to the underlying data…

Page 29: Integrating research data into the publication workflow: eBank UK experience Rachel Heery, UKOLN, University of Bath

                                                             

eBank aggregator : search

Page 30: Integrating research data into the publication workflow: eBank UK experience Rachel Heery, UKOLN, University of Bath

                                                             

Ebank aggregator: browse

Page 31: Integrating research data into the publication workflow: eBank UK experience Rachel Heery, UKOLN, University of Bath

                                                             

And finally…eBank search embedded in a science portal

Page 32: Integrating research data into the publication workflow: eBank UK experience Rachel Heery, UKOLN, University of Bath

                                                             

ebank_dc record (XML)

Crystal structure (data holding)

Crystal structure report (HTML)

Dataset

Dataset

Institutional repository

eBank UK aggregator service

ePrint UK aggregator service

Subject service

Harvesting OAI-PMH

ebank_dc

Harvesting OAI-PMH oai_dc

Harvesting OAI-PMH oai_dc

Searching, linking and embedding

Searching, linking and embedding

Searching, linking and embedding

dc:identifier

dcterms:references

Linking

dc:type=“CrystalStructure”and/or“Collection”

Model input Andy Powell, UKOLN.

PSIgate portal

Eprint oai_dc record (XML)

dcterms:isReferencedBy

dc:type=“Eprint” and/or ”Text”

Eprint‘jump-off’

page (HTML)

Eprint manifestation

(e.g. PDF)Linking

dc:identifier

Page 33: Integrating research data into the publication workflow: eBank UK experience Rachel Heery, UKOLN, University of Bath

                                                             

Challenges for the future

Page 34: Integrating research data into the publication workflow: eBank UK experience Rachel Heery, UKOLN, University of Bath

                                                             

Progress update

• Version 2.0 eBank metadata schema• Enhanced ePrints.org software• Pilot institutional e-data repository for

harvesting (raw, derived, results data)• Exports records as ebank_dc and oai_dc• Pilot eBank UK aggregator service• Developing search interface Version 1.0 • Testing with PSIgate physical sciences portal

– embedding eBank UK

Page 35: Integrating research data into the publication workflow: eBank UK experience Rachel Heery, UKOLN, University of Bath

                                                             

Plans for eBank Phase 2

• Progress towards generic data model for description of research datasets– Validate eBank schema against other schema– CLRC Scientific Metadata Model

• Modify eprints.org software to allow for more varied scientific data and schemas

• Investigate identifiers e.g. International Chemical Identifier (InChI code)

Page 36: Integrating research data into the publication workflow: eBank UK experience Rachel Heery, UKOLN, University of Bath

                                                             

Plans for eBank Phase 2…….(contd.)

• Explore embedding in chemistry workflow

Potential to expand remit to

• wider range of crystallography data

• other chemistry sub-domains

• broader physical sciences

Page 37: Integrating research data into the publication workflow: eBank UK experience Rachel Heery, UKOLN, University of Bath

                                                             

eBank (potential) links with eLearning

• Provide access to primary research data within learning materials – in the taught postgraduate curriculum in chemistry,

undergraduate project work, chemical informatics courses

• Inclusion of e-research data in e-learning courses. – through links in reading lists, through essay assignments,

through analytical problems, through practical work, through RDN PsiGATE links

Page 38: Integrating research data into the publication workflow: eBank UK experience Rachel Heery, UKOLN, University of Bath

                                                             

In conclusion

• eBank demonstrates benefits to research community

• Potential for integration into digital library services– Moving from demonstrator to service, need to

involve publishers and specialist services

Page 39: Integrating research data into the publication workflow: eBank UK experience Rachel Heery, UKOLN, University of Bath

                                                             

The end…

Questions?

http://www.ukoln.ac.uk/projects/ebank-uk/