realizing the potential of research data by carole l. palmer
TRANSCRIPT
Realizing the Potential of Research Data
Carole L. Palmer
Information School, University of Washington
Coalition for Networked Information, 14 April 2015
• Are we the experts we need to be?
• What are the exemplars for data resources and services?
• Can we learn and lead at the scale and pace needed?
Well positioned: institutional and human infrastructure, expertise, commitment
Preparing e-Science Information Specialists: New Programs and Professionals
Biological Information Specialists, 2006-09
Summer Institutes in Data Curation, 2008-11
Data Curation in the Sciences, 2006-11
Data Curation in the Humanities, 2008-12
2008
Going forward, must not underestimate the challenge.
2003 at least 11 reports from NSB, NRC, NSF
Deluge of discourse and directives
Early leadership from Information Schools
Unsworth Atkins
Building the Infrastructure for Cyberscholarship
Larsen
Deluge of repositories and standards
665 “databases” 584 standards
215+ standards
RDA Community Members: 2783 (+138 since Feb. 15)
Membership by period: May-July 13: 393; Aug-Oct 13: 993; Nov-Jan 14: 1276; Feb-Apr 14: 1658; May-July 14: 2051; Aug-Oct 14: 2407; Nov-Jan 15: 2645; Feb-Mar 15: 2783
95 countries - 50% Europe, 37% US
Reduce barriers to data sharing
Accelerate coordinated global data infrastructure
RDA/WDS Publishing Data Bibliometrics
Data providers
Data consumers
Social
Technical
Solutions dimension
Beneficiary dimension
Data Citation
Data Foundation and Terminology
Repository Audit and Certification
Brokering Governance
PID Information Types
Data Type Registries
RDA/WDS Publishing Data Workflows
RDA/CODATA Summer Schools in Data Science
Metadata Standards Directory
Practical Policy
The BioSharing Registry
Wheat Data Interoperability
RDA/WDS Publishing Data Services
Data Description Registry Interoperability
Q1
Q2 Q3
Q4
56 Working and Interest Groups
Libraries for Research Data
Data Repositories
National data services
Abundance of data science initiatives
5/19/15 Bill Howe, UW 10
2014 - UW eScience celebration of data-intensive research
Data Science Kickoff Session: 137 posters from 30+ departments and units
http://escience.washington.edu/
from homogeneous, centralized, local to heterogeneous, distributed, coordinated
- consolidation
- gateways for interoperation
“make-or-break” phase (Parsons & Berman, 2013)
Early choices constrain options
(Edwards, et al., 2007)
Problem: dynamics of systems and networks
Research programs of researchers
• extend and legitimate products of work
• dominate cycles of credit and resources
Institutions support new routines long enough for distinctive types of work to emerge
• establish service roles
• facilitate links with other disciplines
• enable transmission of techniques and information
Institutions as intellectual habitat
Lenoir, Timothy. 1993. “The Discipline of Nature and the Nature of Disciplines.” In Knowledges: Historical and Critical Studies in Disciplinarity, edited by Ellen Messer-Davidow, David R. Shumway, and David J. Sylvan. Charlottesville: University Press of Virginia.
PIs on major proposals
+ eScience Institute Steering Committee
+ Participants in February 7 Campus-Wide Data Science poster session
5-year, $37.8 million cross-institutional collaboration to create a data science environment
Data Science Studio
6th floor Physics Astronomy Building
Partnership among:
• Provost
• UW Libraries
• Physics, Astronomy, Arts & Sciences
• eScience Institute
Revisioned the library focus on working spaces and culture
Quiet work
Seminar & Group work
Casual & Open work
Project leads must physically co-locate with the incubator staff.
Resident data science team
– Permanent staff of ~5 data scientists
– Applied research and development
– Drop-in open workspace
– Studio “Office Hours”
– Incubation Program
…plus seminars, sponsored lunches, workshops, bootcamps...
Library in the mix:
Office hours
Reproducibility and Open Science Group
Site visits
“Don’t see how you do it without the library.”
Data Science: the rising tide that lifts all boats
Problems - eScience vs. open, curated data
How much time do you spend “handling data” as opposed to “doing science”?
Mode answer: 90% (Bill Howe, 2015)
What qualifies as releasable data?
Open data constrained by evidential cultures:
Individualism vs. Collectivism (Collins, 1998)
Who takes responsibility for validity and meaning?
Why do we invest in data?
Open data requirements and expectations
Reproducibility, replication, and other “Rs”
Stewards of the common good / scholarly record
Competitive, innovative research
– exemplars of “open” research
– centers of excellence, research prominence
Optimizing data for reuse
Different objectives and expertise than:
preserving a record of research
providing access and transparency
and much more resource intensive.
data resources
rich, deep
functional
Promoting our deep, rich, functional data
Data Curation Profiles Project
Data Conservancy
Site-Based Data Curation at YNP
Producer sets / consumer subsets
Empirically derived reuse principles
Releasable ≠ reusable
Indicators of reuse value
Primacy of method
Geobiology data from Yellowstone National Park
Used with permission from B. Fouke
True reusability for site-based data
Retain value and promote reuse of data from scientifically significant sites.
Reuse dependent on Sampling procedures
Field campaign context:
geological feature *** new measurements
vent location, etc.
within
Crisis in resource collections
NSB 2005: …ever increasing investment in creating and maintaining collections, and the rapid multiplication of collections, with a potential for decades of curation.
Atkins and Unsworth: Value-added … widely shared … collections … enabling … interdisciplinary research …
Greatest challenges not ability to move across disciplinary boundaries but in maintaining the increasingly long and mutable intellectual paths to our disciplinary past. (Palmer, 2010)
Center for Information and Communication Studies
Chart (y-axis 0% to 100%; 2011 vs. 2014 for each service): ID datasets; Create metadata; Prepare for deposit; Deselection; Tech support
Legend: Yes, currently offered; No, but < 12 months; No, but 13-24 months; No, but > 24 months; No, and no plans
Research Data Services Offered or Planned in ACRL Libraries
(Tenopir, ASIST, 2014)
Where are we with the workforce?
Trending up:
• Information Steward
• Data Steward
• Digital repository
• Digital preservation
• Curation Science
• Digital Curator
• Data Curator
Trending down:
• Librarian
(R. Larsen, IDCC, 2014)
Forthcoming - Preparing the Workforce for Digital Curation
Illinois data curation placements
• Research Data Management Service Design Analyst
• Data Management Consultant
• Data Science & Informatics Librarian
• Data Curator
• Assistant Dean, Digital Humanities Research
• Data Steward Consultant
• Solutions Analyst
• Senior General Engineer
• GIS Specialist
• Director of Archive Technology
• Digital Asset Manager
• Information Architect
• Information Systems Associate
• Digital Project Coordinator
• Media Content Specialist
Academic
Positions that (probably) didn’t exist 5 years ago
Non-academic positions
• 40% of placements, ¼ of those outside library
• Many focused on metadata and technology
More expertise
Field experiences with multiple mentors
Data / Science / Peer mentors
(See DCERC video at: https://www.youtube.com/watch?v=mbX5bvgT1ME)
Classroom experiences with multiple experts
• Earth science data center services
• Cyberinfrastructure R & D
• International data sharing coordination
• Funding & policy perspectives
NCAR internships
Climate model metadata
Sensor data archiving
Time-series temporal spatial
Social science data organization
Analog data for digital access
metadata harvesting, standards compliance, quality
processing & file migration
cross-disciplinary data curation; subsetting
high resolution, provenance, NetCDF
50 international collections, OAIS, DOIs
Data
Technology Domain
Programming Databases
Website & interface Design
Storage & movement
New technologies
Measurements
Research process
Search & retrieval
Metadata Transformation
Interest Terminology
Communication with scientists
Processing
Policy
Standards
Systems analysis
Data structures & formats
Quality control
Archiving & preservation
User requirements
Trends in national data facilities
• Some growth in positions
• Fewer specialized roles
• Value of Information Science
• Intern programs restricted
Too much to lose, if we don’t get it right.
data resources
rich, deep
functional
• marshal our strengths in LIS
• leverage progress across disciplines
• build a new LIS foundation in the science of data
“Your analytics are only as good as your curation.”
Thank you for your attention.