Data management: Where do we currently stand at ICARDA?


Upload: cgiar-research-program-on-dryland-systems

Post on 22-Nov-2014



Page 1: Where do we currently stand at ICARDA?

Data management for CRP DS research: where do we currently stand at ICARDA?

• CO: CGIAR Open Access and Data Management Plans & Implementation (Article 4.1.9) states “Open Access and Data Management Plans should be prepared in order to ensure implementation of this Policy. Such Plans shall, in particular, outline a strategy for maximizing opportunities to make information products Open Access”.

• Output: Research quality and data quality issues in CRP DS research and mechanism/workflow

Page 2: Where do we currently stand at ICARDA?

Data management for CRP DS research: where do we currently stand at ICARDA?

Plan

•Sources of data under CRP DS

•Status of DM at ICARDA

•CRP DS research areas for data generation

•Issues and solutions related to Research quality and data quality

•Workflow for DM sharing

Page 3: Where do we currently stand at ICARDA?

Scope: Sources of data

Scope of work is determined by observing a complex interplay of:

• Base components: crops, livestock, rangelands, trees etc. & production systems

• Biophysical environment constraints: water scarcity, land degradation

• Technological access: access to the product and regulatory environment

Page 4: Where do we currently stand at ICARDA?

Partners in the DM (OA)

• Who generates the data? Who owns them? Who regulates their sharing?

Outcome: What comes after archiving with an Open Access (OA) system?

• Data mining

• Exploration of large or even BIG data leading to a wider picture viewed from the bridge

• No dearth of random factors/sources in data

• Availability of prior information

• Bayesian analysis to span the statistical inference domain to reality

Page 5: Where do we currently stand at ICARDA?

CRP DS DM Current Status

DM status at ICARDA / its Flagship target regions

• ICARDA projects: data (D) and DM remain with scientists, archived on their laptops at various locations/countries

• GU data on central servers, Amman, Jordan

• Data Manager to be recruited

• NARS data with NARS

Page 6: Where do we currently stand at ICARDA?

ICARDA Projects dealing with CRP DS

ICARDA Programs: DSIPS, IWLM, SEPR, BIGM (also generating data for other CRPs)

1. Cropping systems and agronomy, on-station and on-farm

• On-station trials

– Single factor and multi-factor, including:

– systems of rotations, intercrop, monocrops

– crop components

– fertilizer input

– IPDM controls and other management factors

Page 7: Where do we currently stand at ICARDA?

CRP DS DM - Research quality

On-farm Trials:

• Less frequent: research for technology generation

– Experimental design with small number of treatments, small blocks, variable treatment designated as control or farmer-technology, relatively large number of replications

•Most frequent: technology verification and demonstration

• Sampling design: large plots; the small number of samples is a concern

Page 8: Where do we currently stand at ICARDA?

ICARDA Projects dealing with CRP DS

• A list of data sources for CRP DS and other ICARDA projects:

• Design of crop rotation trials [general]

• DM and analyses of data from the 2-course long-term wheat rotations (productivity, sustainability aspects including time-trend estimation). [NAWA: long-term crop rotation trials on wheat & barley at Tel Hadya, Syria; long-term wheat rotation trial at Kamishly, Syria; long-term sustainability trials in Egypt, etc.]

• Evaluation of conservation tillage data [CA trials in Jordan and Iraq]

• Analyses of data from livestock evaluation experiments [Long-term trials, wheat & Barley at Tel Hadya]

Page 9: Where do we currently stand at ICARDA?

ICARDA Projects dealing with CRP DS

ICARDA Outlook:

• Decentralization of ICARDA has changed the way we do our business.

• Archiving data and sharing has [essentially] become the way of our business.

• We need to extend [quality] data sharing from within ICARDA to the public.

NARS / five Flagship target regions:

• 1) The West African Sahel and dry savannas, 2) East and Southern Africa, 3) North Africa and West Asia, 4) Central Asia, 5) South Asia

Page 10: Where do we currently stand at ICARDA?

ICARDA Projects dealing with CRP DS: Key to DQ

Research quality (RQ)

• Experimental design could be an issue (in terms of blocking and replications)

•Approach/Solution: thorough discussion with subject matter specialists and biometrician/statistician

• Resources for enhancing RQ and DQ:

• JNR Jeffers (1978). Statistical Checklist: Design of Experiments No. 1 (Statistical checklists). Institute of Terrestrial Ecology, Natural Environment Research Council, Cambridge, UK. http://www.sawleystudios.co.uk/jnrj/StatisticalCheck/Design.htm

• JNR Jeffers (1979). Sampling (Statistical Checklist 2). Institute of Terrestrial Ecology, Natural Environment Research Council, Cambridge, UK. http://www.sawleystudios.co.uk/jnrj/StatisticalCheck/Sampling.htm

• David J. Finney (1990). Statistical data-their care and maintenance. Indian Society of Agricultural Statistics.

“This bulletin is extremely useful for students and research workers … topics dealt with are: acquisition of data, design of data gathering, care for data, types and units of data analysis and databases, copying, statistical ethics, data-entry to the computer, data scrutiny, integrity and some illustrations.”

Page 11: Where do we currently stand at ICARDA?

ICARDA Projects dealing with CRP DS

Examples of data quality issues:

• Experimental design accepted; crop management properly followed.

• Experimental plots: plot size, harvested areas [2-row, 3-row, 4-row plots], calculation on a per-hectare basis

• Days to 50% flowering: how many plants were actually observed?

• Plant height (cm): number of plants

• Seed yield, bio yield: area used; drying methods

• Data entry?

• Lack of electronic data-recording devices and of direct transfer to a file on the laptop

– Early days: field-books

– Recent: Android Apps etc.

– Data in Excel worksheet

• What checks should we perform?

• What should be the level of experimental data quality for public sharing?
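The per-hectare calculation mentioned above is a common place for silent errors, because the harvested area changes with the number of rows cut. A minimal sketch, assuming the plot weight is recorded in grams and the harvested area is rows × row spacing × row length; the function name and its parameters are illustrative, not part of any ICARDA tool:

```python
def yield_kg_per_ha(plot_yield_g, n_rows, row_spacing_m, row_length_m):
    """Convert a harvested plot weight (g) to kg/ha from the harvested area."""
    # Harvested area in square metres (assumed: rows x spacing x length).
    harvested_area_m2 = n_rows * row_spacing_m * row_length_m
    if harvested_area_m2 <= 0:
        raise ValueError("harvested area must be positive")
    # 1 kg = 1,000 g and 1 ha = 10,000 m^2.
    return (plot_yield_g / 1000.0) * 10_000.0 / harvested_area_m2


# The same 600 g means very different yields for a 2-row and a 4-row harvest.
two_row = yield_kg_per_ha(600, n_rows=2, row_spacing_m=0.3, row_length_m=5)
four_row = yield_kg_per_ha(600, n_rows=4, row_spacing_m=0.3, row_length_m=5)
print(round(two_row), round(four_row))
```

Recording the number of harvested rows next to the weight lets a data manager recompute and verify the per-hectare value later.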

Page 12: Where do we currently stand at ICARDA?

ICARDA Projects dealing with CRP DS

2. Crop Improvement CRPs (CRP Wheat, CRP DC, CRP GL)

• Single factor: crop varieties

• Unreplicated designs for test materials + replicated or repeated checks

• Replicated variety trials in RCB, IBD (alpha-designs), p-rep designs

• METs (multi-environment/multi-location and multi-year trials)

• Two-factor experiments

– Crops + crop varieties

– Sometimes agronomic trials – planting dates, IPDMs etc.

•Result outputs from Commodity CRPs, where breeding is the key component (CRP Wheat, CRP DC, CRP GL) flow to CRP DS.

Where are the data?

• Data with scientists on their laptops

• Status in relation to DM (OA)/sharing is unknown to me

Page 13: Where do we currently stand at ICARDA?

ICARDA Projects dealing with CRP DS

3. Issues of poor data quality: indicators and resolutions

A frequent issue of data quality

• Is something really wrong with my data? Some statistical procedures work and others do not, BUT the data are the same: regression and GLM work but ANOVA does not. What is wrong with the stats?

• ANOVA may turn out to be a great tool for data checking; missing values in data variables may be the reality.

• What about levels of a factor, or factorial combinations, that are missing or repeated by mistake?
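The question of missing or repeated factor combinations can be checked before any ANOVA is attempted. A minimal sketch, assuming each record is a dictionary and the planned levels of each design factor are known; `check_design` and its arguments are hypothetical names used only for illustration:

```python
from collections import Counter
from itertools import product


def check_design(records, factors, levels, expected_per_cell=1):
    """Compare factor combinations found in the data with the planned design."""
    counts = Counter(tuple(r[f] for f in factors) for r in records)
    expected = set(product(*(levels[f] for f in factors)))
    return {
        # Planned combinations with too few records.
        "missing": sorted((c for c in expected if counts[c] < expected_per_cell), key=str),
        # Planned combinations entered more often than planned.
        "repeated": sorted((c for c in expected if counts[c] > expected_per_cell), key=str),
        # Combinations that are not in the design at all (e.g. typing errors).
        "unexpected": sorted((c for c in counts if c not in expected), key=str),
    }


# A 2-replicate x 3-variety RCB in which variety A was typed twice in replicate 2.
data = [
    {"rep": 1, "variety": "A"}, {"rep": 1, "variety": "B"}, {"rep": 1, "variety": "C"},
    {"rep": 2, "variety": "A"}, {"rep": 2, "variety": "A"}, {"rep": 2, "variety": "C"},
]
report = check_design(data, ["rep", "variety"], {"rep": [1, 2], "variety": ["A", "B", "C"]})
print(report)
```

A non-empty report of this kind would explain why a regression runs on the file while a balanced ANOVA refuses it.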

Page 14: Where do we currently stand at ICARDA?

ICARDA Projects dealing with CRP DS

Some cases of data quality issues:

• 1. Research quality

– Experimental design OK but data on the design not OK: design factors incorrectly entered. This is frequently encountered and must be corrected before analysis; otherwise we have carried out a study different from the one we planned and still think we ran.

– Factor combinations not aligning with the design (not missing observations)

• 2. Observed data values (trait values): errors of recording or of data transfer to files

– Values out of range (a variable that should lie within 0-100 or 0-1 goes outside; recording error)

– Outliers: recorded values appear too extreme. These require validation with the assistant/scientists, and any errors found must be corrected; generally viewed in the context of univariate analysis.

– Outliers may raise issues of interpretation and detection: a value may look like an outlier in BY but not in log(BY) or sqrt(BY). There might be multivariate outliers. A column of remarks, possibly in the field book, may support the recorded data.
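The range check and the scale-dependence of outliers can be illustrated together. A minimal sketch, assuming the box-plot (Tukey 1.5 × IQR) rule as the univariate outlier criterion, which is one possible choice and not one prescribed by the slides; the BY values are made up for illustration:

```python
import math
import statistics


def out_of_range(values, low, high):
    """Return positions of values outside the closed interval [low, high]."""
    return [i for i, v in enumerate(values) if not (low <= v <= high)]


def tukey_outliers(values, k=1.5):
    """Return positions of values beyond k * IQR from the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# A percentage trait must stay within 0-100.
print(out_of_range([12, 55, 101, -3], 0, 100))

# Hypothetical BY values (t/ha): the last plot is flagged on the raw scale
# but not after a log transformation.
by = [2.0, 2.5, 3.0, 3.5, 4.0, 5.0, 6.0, 8.0, 14.0]
print(tukey_outliers(by))
print(tukey_outliers([math.log(v) for v in by]))
```

Flagged positions are candidates for validation against the field book, not values to delete automatically.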

Page 15: Where do we currently stand at ICARDA?

ICARDA Projects dealing with CRP DS

Some cases of data quality issues… continued

•3. Relationships between the traits appearing along the crop development cycle may also be identified and used to build in data quality

• DAF << DMAT

• GY << BY

•4. Helpful: Electronic data loggers (balance, Android Apps, with GIS/Date)

•5. The role of the scientist/data supervisor must be made effective: random checks on data recording in the field book as well as in the file. Observations should be validated by another researcher experienced in the same discipline, particularly for visual scores. Random checks could be more effective. Data errors could be linked to the observer.
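The trait relationships in item 3 can be turned into an automatic check on each record. A minimal sketch, assuming DAF and DMAT stand for days to flowering and days to maturity, GY and BY for grain yield and biological yield (the slides do not expand these abbreviations), and reading "<<" simply as strict "less than"; `relation_violations` is a hypothetical helper:

```python
def relation_violations(rows, pairs=(("DAF", "DMAT"), ("GY", "BY"))):
    """List records in which the 'smaller' trait is not below the 'larger' one."""
    problems = []
    for i, row in enumerate(rows):
        for smaller, larger in pairs:
            a, b = row.get(smaller), row.get(larger)
            if a is None or b is None:
                continue  # missing values are a separate check
            if not a < b:
                problems.append((i, smaller, larger, a, b))
    return problems


plots = [
    {"DAF": 95, "DMAT": 140, "GY": 2.1, "BY": 6.0},
    {"DAF": 150, "DMAT": 140, "GY": 2.0, "BY": 5.5},  # flowering after maturity
    {"DAF": 90, "DMAT": 135, "GY": 7.0, "BY": 6.5},   # grain heavier than biomass
]
for problem in relation_violations(plots):
    print(problem)
```

Each reported tuple names the record, the pair of traits and the two values, which gives the data supervisor something concrete to verify.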

Page 16: Where do we currently stand at ICARDA?

ICARDA Projects dealing with CRP DS

Some checks and balances:

• Data care bulletin (see References)

Tools:

• Design experiment/survey-specific tools (Biometrician/Statistician to Data Manager). Clearly define the roles.

• Examine factor combinations appearing in the data

• Examine tables/cross-tables for qualitative data

• Descriptive statistics

– min, max, range, ratio = max/min (min > 0)

– Histograms

• Box-plots and other diagnostics
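The descriptive statistics listed above are quick to compute for every variable in a data file. A minimal sketch, assuming missing observations are stored as `None`; the `describe` helper and its output keys are illustrative only:

```python
def describe(values):
    """Summarise one variable: count, missing, min, max, range and max/min ratio."""
    clean = [v for v in values if v is not None]
    if not clean:
        return {"n": 0, "n_missing": len(values)}
    lo, hi = min(clean), max(clean)
    return {
        "n": len(clean),
        "n_missing": len(values) - len(clean),
        "min": lo,
        "max": hi,
        "range": hi - lo,
        # The ratio is only meaningful when the minimum is positive.
        "ratio": hi / lo if lo > 0 else None,
    }


print(describe([2.0, 4.0, None, 8.0]))
```

A very large max/min ratio for a trait such as plant height often points to a unit or decimal-point error rather than real variation.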

Page 17: Where do we currently stand at ICARDA?

ICARDA Projects dealing with CRP DS

Some checks and balances … continued.

• A failure of ANOVA to run may turn out to be a good check for bad data. However, as above,

– Missing values in response/covariate variables are a reality

– But a missing factor level or missing factor combinations appear due to data entry error, the combinations being different from those in the design.

– Cases of repeated units: data entry errors

• Outliers, if detected via model fitting, should stay in the data. Of course data validation, where possible, is encouraged.

Page 18: Where do we currently stand at ICARDA?

ICARDA Projects dealing with CRP DS

Some checks and balances … continued:

• Benefiting from ICRISAT tools and techniques

– on data checking tools

– archiving the data on public platforms (an enforcer of data quality)

– e.g. data systems from ICRISAT, Dataverse (http://dvn.iq.harvard.edu/dvn/)

• Computing tools/procedures: training and development

• Excel macros, Genstat/SAS/SPSS/R/other software

• Database development/datasheet preparation/archiving

Page 19: Where do we currently stand at ICARDA?

ICARDA Projects dealing with CRP DS

An attractive specialization:

• Data Science

• The Data Scientist’s Toolbox: https://www.coursera.org/course/datascitoolbox

Page 20: Where do we currently stand at ICARDA?

ICARDA Projects dealing with CRP DS

Crystallizing an approach:

4. CRP: Dryland Systems Management: Workflow components

Home Center: Project/Meta data: Project ID, objectives, location, year, personnel (Planner, M&E team, data collector etc.), trial level information, factors (design and treatment), variables etc., A report of data validation in Step 2; links to data;

• Data <<<< validation (via agreed tools)

• Mechanism for data quality check

• 1. Scientists >>> 2. Statistician/DM team: apply the agreed tools

– a) If it fails -----> back to (1), the scientist, for update

– b) If it passes ----> get metadata and links to data

• 2. Archiving (what? who will do this? DM team?)

• Sharing permissions etc. This could be a Workflow of permissions: Requester ---> Approval 1--->Approval 2 ---…---> Director CRP DS/nominee.
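The validation loop and the chain of sharing permissions described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the dictionary keys (`metadata`, `link`) and the status labels are hypothetical, and the "agreed tools" are represented by simple named check functions.

```python
def route_dataset(dataset, checks):
    """Apply the agreed checks; return the dataset to the scientist if any fails."""
    failures = [name for name, check in checks if not check(dataset)]
    if failures:
        return {"status": "returned_to_scientist", "failed_checks": failures}
    # All checks passed: collect metadata and the link to the data for archiving.
    return {"status": "ready_for_archiving",
            "metadata": dataset["metadata"], "link": dataset["link"]}


def sharing_approved(approvers, decisions):
    """Walk the approval chain in order; stop at the first approver who has not approved."""
    for approver in approvers:
        if not decisions.get(approver, False):
            return False, approver
    return True, None


checks = [
    ("has_rows", lambda d: len(d["rows"]) > 0),
    ("percent_in_range", lambda d: all(0 <= r["cover_pct"] <= 100 for r in d["rows"])),
]
dataset = {"metadata": {"project_id": "P-001"}, "link": "path/to/data.csv",
           "rows": [{"cover_pct": 40}, {"cover_pct": 120}]}
print(route_dataset(dataset, checks))

chain = ["Approval 1", "Approval 2", "Director CRP DS/nominee"]
print(sharing_approved(chain, {"Approval 1": True, "Approval 2": True}))
```

Keeping the list of failed checks in the returned report gives the scientist a concrete list to act on, and could serve as the validation report stored with the project metadata.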

Page 21: Where do we currently stand at ICARDA?

ICARDA Projects dealing with CRP DS

… continued:

• Information Management. This refers to the [statistically analysed] results files/publications generated.

• Knowledge Management: key findings, implications, lessons learned

NARS

• Identify the active NARS partners

• Training on the above tools and workflow; share policy and procedure on CRP DS DM (OA)

• Identify the risk factors and their indicators and develop an action plan with resources required

• Measure and monitor the impact

Page 22: Where do we currently stand at ICARDA?

Thank you