5/16/2015

Society of Clinical Trials Pre-Conference Workshop, May 17, 2015 Arlington, Virginia

Public Use Datasets (PUDS) J. Michael Dean, M.D., M.B.A. H.A. and Edna Benning Presidential Professor of Pediatrics Vice Chairman for Research, Department of Pediatrics University of Utah School of Medicine

Outline of Presentation

✤ Why is there pressure for public use datasets?

✤ DCC development of datasets

✤ Description of two public use data set websites

✤ Required agreements for investigators

✤ Authorship requirements and restrictions

✤ How much do you include?

Pressure for public use datasets

Why did this question arise?

✤ EPA and Airborne Asbestos

✤ 1978 through 1990 EPA required REMOVAL of asbestos-containing materials (ACM)

✤ 1990 - EPA reversed course and acknowledged “managing asbestos in place”, which remains the current course of management unless the building is being demolished or renovated

✤ Measurement debate

✤ Scientific discourse and repeat measurements led to the change in EPA policy

Asbestos measurements

✤ In the EPA example, there was debate about the measurements

✤ The measurement methods could be replicated (actually not)

✤ New measurements led to new conclusions, and policy changed

✤ This does NOT work with epidemiologic studies or large studies of specific populations (such as TBI, for example)

Endocrine “disrupters”

✤ Great Lakes pollutants leading to cancer

✤ Pollutants affecting estrogen and testosterone actions, which affect all meaningful aspects of human life, and hence, pollutants were causing a panoply of adverse effects

✤ Tulane study in 1996 (McLachlan) showed synergy of toxicity of pesticides -> led to Food Quality Protection Act (1996) and Safe Drinking Water Act (1996)

✤ Tulane results were refuted and even the Tulane investigators withdrew their report (1997)

✤ Results were refuted; the laws remain on the books

Theo Colborn (endocrinologist)

✤ As the evidence for her theory collapsed, Colborn declared that evidence isn’t important:

✤ “Just because we don’t have the evidence doesn’t mean there are no effects.”

More Examples

✤ National Cancer Institute and herbicide 2,4-D

✤ NIH Women’s Health Study and the Dalkon Shield

✤ FDA and Fen-Phen (heart valve damage)

✤ In all three instances, there were serious issues with data management and interpretation of results. All had serious economic consequences.

Air Quality and Pope Study

✤ Congress requested EPA data for 1996 air quality standards and EPA refused to allow access

✤ Pope study - telephone survey of 1 million people by 77,000 volunteers, combined with air quality / pollution data

✤ EPA: “We do not believe ... there is a useful purpose for EPA to obtain the underlying data [since the studies were published in peer-reviewed journals]. ... Securing more detail about this information is not necessary as part of EPA’s public health standard-setting process.”

✤ 1997 - new EPA air quality regulations

Congress irked ...

✤ In 1997, Congress evaluated legislation requiring the federal government to make data from federally funded research available to the public. Measure was defeated.

✤ Early 1998, passed law to require OMB to study implications of public access to data from federally funded research. Vetoed by Clinton.

✤ October 1998 Shelby Amendment to OMB funding:

✤ “...to require Federal awarding agencies to ensure that all data produced under an award will be made available to the public through procedures established under the Freedom of Information Act.”

OMB Circular A-110 (1999)

✤ Any private citizen can file a FOIA request for data produced with federal funding that has resulted in a published report.

✤ A report is considered published when it appears in a scientific or technical journal, OR when the findings are cited in support of a Federal agency action that has the force and effect of law.

Reproducibility of Results

✤ Public use datasets should enable other investigators to at least reproduce the statistical analyses of resulting publications.

✤ Some statistical methods are trivial to describe, and reproducing those results may not even be interesting

✤ Other methods are moderately complex (TBI and CART - there are numerous software settings and judgments)

✤ Some methods are extremely complex (genomic pipelines)

Example of Reproducibility

Nature Medicine 12(11):1294, 2006, 2008

Genomic Signatures Predict Breast Cancer Response

Cancer Trials Launched

✤ Breast cancer trials were launched based on these data

✤ Women were stratified according to the sensitivity prediction of the signatures from the Potti paper

✤ Keith Baggerly (MD Anderson) was asked to reproduce these results so that the method could be used at MD Anderson for their patients

✤ Baggerly could not reproduce the results (the data were, in fact, available)

149-page supplement to Nature letter to editor included all statistical code, scripts, and instructions to create the report that repudiated the original Potti paper.

Paper retracted in 2011

Clinical trials had actually assigned women to the incorrect sensitivity arms, even if the original report had been accurate.

Publicity on 60 Minutes

How does this relate?

✤ Duke is a “pretty good institution” - nobody would dispute top tier

✤ The only way that the Potti results were refuted was reanalyses of the data, which had been provided in full to other investigators

✤ Potti et al actually assisted Baggerly as they tried to reproduce the results

✤ The PUBLIC (and Congress) believe this is not rare.

Maximize Scientific Return on $$

Tell me what datasets you have released!

Development of Public Use Datasets by the DCC

Initial evaluation and development

✤ Identify data elements that are sensible; goal is to make the PUDS actually of use

✤ Data elements that are not usually missing

✤ Create documentation of how data elements are defined

✤ Create Excel spreadsheet describing all the data elements that were identified by DCC as candidates

✤ Create SAS cross tabulations and frequency tables
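
The screening steps listed above can be sketched in a few lines of code. The slides name SAS for this work; the Python below is only an illustrative stand-in, not the DCC's actual procedure. The field names (`age_group`, `gcs`, `outcome`), the sample records, and the 20% missingness cut-off are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical study records; None marks a missing value.
records = [
    {"age_group": "<2y", "gcs": 15, "outcome": "discharged"},
    {"age_group": "<2y", "gcs": None, "outcome": "admitted"},
    {"age_group": ">=2y", "gcs": 14, "outcome": "discharged"},
    {"age_group": ">=2y", "gcs": None, "outcome": "discharged"},
]

def missing_fraction(rows, field):
    """Fraction of rows in which `field` is missing (None)."""
    return sum(1 for r in rows if r.get(field) is None) / len(rows)

def frequency_table(rows, field):
    """Count of each observed value of `field`, missing values included."""
    return Counter(r.get(field) for r in rows)

def cross_tab(rows, row_field, col_field):
    """Counts for every (row_field, col_field) pair of values."""
    return Counter((r.get(row_field), r.get(col_field)) for r in rows)

MAX_MISSING = 0.20  # assumed cut-off for "not usually missing"
fields = ["age_group", "gcs", "outcome"]

# Keep only the data elements that are missing no more often than the cut-off.
candidates = [f for f in fields if missing_fraction(records, f) <= MAX_MISSING]

for field in candidates:
    print(field, dict(frequency_table(records, field)))
print(dict(cross_tab(records, "age_group", "outcome")))
```

In this toy data `gcs` is missing in half of the records, so it would be flagged for discussion with the investigator rather than carried forward automatically.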

Collaboration with investigator

✤ Materials developed by DCC should be provided to investigator for refinement and discussion

✤ DCC goal is not to limit collaboration with investigator, but it IS to limit the amount of time that investigators need to spend on this

✤ Estimate that 3 to 6 months of effort goes into getting the data set to the stage where investigator input is valuable, and then hopefully the investigator only needs to spend hours on the process.

Goals for datasets

✤ In current studies, we try to keep ALL manipulations of data within the database, so that releasing the final database will reflect what was done in the study

✤ This is not always possible going backwards, as many derived variable constructions are done in SAS or other statistical environments

✤ Theoretical Goal: all study data, adverse events, medications, etc. would be released. Not restricted to data used in publications.

Example PUDS Website: CPCCRN

Overview for Context

Identify Exclusions

Documentation: Key Issues

✤ If you release a public use dataset, documentation needs to be sufficient so that you do not get asked to support it.

✤ In fact, support is expressly denied.

✤ Provide annotated CRF or eCRF diagrams, complete definitions, etc. so that the work of creating the PUDS is not wasted.
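
One way to make "sufficient documentation" checkable before release is to confirm that every column in the dataset has a definition. The sketch below is a minimal illustration under assumed names: the column list, the dictionary entries, and the helper `undocumented_columns` are all hypothetical and not part of any DCC tooling.

```python
def undocumented_columns(dataset_columns, data_dictionary):
    """Return columns that have no definition, or only a blank one."""
    return [
        column
        for column in dataset_columns
        if not data_dictionary.get(column, "").strip()
    ]

# Hypothetical data dictionary: column name -> definition text.
data_dictionary = {
    "age_months": "Age at enrollment, in whole months",
    "gcs_total": "Glasgow Coma Scale total score (3-15) at presentation",
    "intubated": "",  # entry exists but the definition was never written
}

# Hypothetical column list taken from the dataset to be released.
dataset_columns = ["age_months", "gcs_total", "intubated", "los_days"]

gaps = undocumented_columns(dataset_columns, data_dictionary)
print("Columns needing documentation:", gaps)
```

A check like this catches both blank definitions and columns that never made it into the dictionary, which are the gaps most likely to generate support requests later.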

Example PUDS Website: PECARN

Major study, huge ramifications

Bronchiolitis Dataset

Required investigator agreement

Authorship requirements

How much do you include?

Actual data ...

Seems easy - but slippery slope.

✤ What about statistical code? At least for getting the data into their computer?

✤ What about sample scripts so the user can verify that the data were not corrupted?

✤ What about the SAS scripts that created the exact tables and figures in the published manuscripts?

✤ What about all the derivations of derived variables?

✤ What about the dual coding for critical analyses? Do you release both programs?
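
As an illustration of the "verify that the data were not corrupted" question above, one common approach is to publish a checksum next to each file and give users a short script to compare against it. The sketch below assumes SHA-256 and hypothetical helper names (`sha256_of_file`, `verify_download`); the slides do not say which method, if any, the DCC uses.

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_download(path, published_checksum):
    """True if the file's digest matches the checksum published with it."""
    return sha256_of_file(path) == published_checksum.strip().lower()
```

A user would call `verify_download("dataset.csv", checksum_from_website)`, where both the file name and the checksum source are placeholders. A mismatch means the file changed in transit or on disk; it says nothing about whether the analysis itself can be reproduced.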

Releasing Code

✤ Seems like good idea and we are actually considering how to accomplish release of some of our code

✤ Will you then have to explain how the code works?

✤ If a DCC has a large library of SAS macros that have been developed over years, are these macros really proprietary information?

✤ If you accept the “goodness” of releasing code, are you obliged to provide it in multiple languages (SAS, STATA, SPSS, R, etc.)?

Special Requests

✤ Contacted by esteemed, international expert on a relevant area, who knows that there are some additional data elements that were not included in the PUDS.

✤ Example: identification of institutions in multicenter studies

✤ How do you handle this, since support of the PUDS is expressly not provided by the DCC?

Summary

✤ Public use data sets have political support at all levels

✤ We will lose if we try to block data release (personal opinion)

✤ Data use agreements to protect from superfluous requests

✤ Data sets represent work product, which supports future network funding

✤ We (DCC) work hard to minimize the work load required of investigators, but this is not intended to eliminate the collaboration of the investigator during the process. This is key.

Thank you for your attention.