SC13 BoF: RDA and HPC
Post on 25-May-2015
Research Data Alliance (RDA) for HPC
SC13 Birds of a Feather session
November 20, 2013, 17:30-19:00 MST
Colorado Convention Center, Denver, Colorado
Contribution of John W. Cobb, Oak Ridge National Lab., DataONE Project
2 Presentation name
Why am I here? From what perspectives do I speak?
• Discipline scientist
• HPC application evangelist
• Cyberinfrastructure leverage for experimental facilities
• Cyberinfrastructure/HPC center operations
• Cyberinfrastructure efforts for data-intensive science
Without data there is no science
HPC centers and archives have different service objectives
Cycles not used are lost
Data management involves a long-term commitment of resources
Comparing HPC centers and data archives
Simulations
• Generate data at will
• Can programmatically control data quality
• Can be reproduced more easily
• ==> Can be copious
• Weaker tradition of metadata and data quality
Experiment/Observation
• Collect data from physical events
• Data quality may be limited by collection methods
• May be difficult, expensive, or impossible to reproduce
• ==> May be more limited
• Long-term focus on metadata and data quality
Consequently different challenges
• HPC centers excel at:
  – Volume and velocity
  – Analysis at scale
• Archives excel at:
  – Variety
  – Metadata capture
  – Data quality
Convergence of data and HPC
Some DataONE experience
eBird pilot project exploration and visualization
Spatio-Temporal Exploratory Model identifies factors affecting patterns of migration
Diverse bird observations and environmental data from 300,00 locations in the US integrated and analyzed using High Performance Computing Resources
Land Cover
Meteorology
MODIS – Remote sensing data
• Examine patterns of migration
• Infer how climate change may affect bird migration
Model results
Occurrence of Indigo Bunting (2008)
Panels: Jan, Apr, Jun, Sep, Dec
Exploration, Visualization, and Analysis
Benchmark Observations
Terrestrial Biosphere Model Output
Model Structure Information
Provenance Framework
Workflows for hypothesis development, testing, and exploration
Interactive maps and plots for multi-dimensional data exploration and analysis
DataONE experience
• CI created: interoperable data service functional interfaces
• 4 reference interface implementations completed
• 8 client-side “investigator toolkit” tools released, 4 more in development
• 16 collaborating Member Node repositories (internationally)
• > 100,000 data objects published
• Conducted 81 workshops on data management
• Published 65 data management “best practices”
• Completed several baseline and follow-up surveys on state of data management with scientists, libraries, librarians, …
DataONE experience (cont.)
About half the effort has been on education, training and outreach about data management practices
“Data = Human” - Genevieve Bell, SC13 Keynote