
Page 1

Impact Evaluation in Agriculture January 23–26, 2012

Center of Evaluation for Global Action International Institute of Tropical Agriculture Agricultural Technology Adoption Initiative

Page 2

Table of Contents

Agenda ......................................................................... 3
Biographies of Lecturers ....................................................... 5
List of Participants ........................................................... 8
Groups ......................................................................... 10
Group Presentation Guide ....................................................... 12
Course Material
  Group Work: Drafting Theory of Change on Adoption and Impact ................ 13
  Case Study 1: How to Randomize .............................................. 15
  Case Study 2: Threats to Experimental Integrity ............................. 19
  Exercise 1: Mechanics of Randomization ...................................... 25
  Exercise 2: Sample Size and Power ........................................... 33
Group Presentation Template .................................................... 40
Checklist for Reviewing Randomized Evaluations of Social Programs ............. 42
Impact Evaluation Glossary ..................................................... 50

Page 3

IMPACT EVALUATION IN AGRICULTURE

January 23 – January 26, 2012

Day 1

9:00-9:10   Welcome, Introduction and Official Opening: IITA – Emily Ouma, IITA

9:10-9:30   Introductions to ATAI and CEGA – Temina Madon, CEGA

9:30-10:45  Lecture 1: Using Randomized Evaluations to Test Adoption Constraints – Chris Udry, Yale University

TEA BREAK

11:00-12:00 Lecture 2: Quasi-experimental Methods – Chris Udry, Yale University

12:00-1:00  Lecture 3: Alternative Strategies for Randomizing Programs – Karen Macours, Paris School of Economics

LUNCH

2:00-3:00pm Group Work: Drafting Theory of Change on Adoption and Impact

TEA BREAK

3:15-4:45pm Panel: Overview of Studies
  - Land Tenure (Karen Macours)
  - Insurance (Chris Udry)
  - Collectives and Cooperatives (Ruth Vargas Hill, IFPRI)

4:45-6:00   Group Work: Randomization Design – Case Study: Fertilizer and BlueSpoon

Day 2

9:00-9:10   Recap of Day 1 – Temina Madon

9:10-10:00  Lecture 4: Power and Sample Size for Clustered RCTs, using examples – Karen Macours

TEA BREAK

10:10-11:00 Lecture 5: Managing and Minimizing Threats to Analysis – Ruth Vargas Hill

11:00-11:45 Group Work: Managing Threats – Case Study: TNS Agronomy Training

11:45-1:00  Lecture 6: Randomized Evaluation: Start-to-Finish – Chris Udry

LUNCH

2:00-3:00   Lecture 7: Gender in Impact Evaluation – Chris Udry

Page 4

TEA BREAK

3:15-6:00   Group Work: Project Work – Work on evaluation design

Day 3

9:00-12:00  STATA Session 1 (breakout):
  Beginner – Introduction to STATA
  Advanced – Getting to know the dataset; Exercise: Sampling and power calculations
           – Changing the dataset; Exercise: Problem set and review

LUNCH

1:00-3:00   STATA Session 2 (breakout):
  Beginner – Getting to know the dataset; Exercise: Sampling and power calculations
  Advanced – Analysis; Exercise: Problem set and review

TEA BREAK

3:15-5:30pm STATA Session 3 (breakout):
  Beginner – Changing the dataset; Exercise: Problem set and review
  Advanced – Analysis; Exercise: Problem set and review

Day 4

9:00-11:00  Group Work: Project Work

TEA BREAK

11:10-12:00 Lecture 8: Scaling Up Successful Interventions – Karen Levy, IPA

LUNCH

1:00-3:00   Group Presentations – each presentation 15 minutes + 15 minutes Q&A

TEA BREAK

3:15-5:00   Group Presentations

5:00-5:15   Wrap-up

Page 5

IMPACT EVALUATION IN AGRICULTURE

Workshop Lecturers

Samuel Bazzi, PhD Student, University of California, San Diego

Samuel Bazzi is a 4th-year Ph.D. candidate at the University of California, San Diego. His research focuses on international migration, economic growth, and other topics in development economics, including the welfare effects of cash transfers, the evolution of the firm size distribution, and the effect of economic shocks on conflict. His primary thesis chapter explores the role of financial barriers to international migration from low-income settings. He has regional expertise in Southeast Asia and Brazil.

Marshall Burke, PhD Student, University of California, Berkeley

Marshall Burke is a PhD student in the Department of Agricultural and Resource Economics, University of California, Berkeley. His research focuses on agricultural technology adoption in Africa, and on the impacts of climate change on agricultural productivity and social outcomes in both the developed and developing world. His current fieldwork is in western Kenya.

Willa Friedman, PhD Student, University of California, Berkeley

Willa Friedman is a PhD candidate in Economics at UC Berkeley. Her research spans the field of development economics, focusing on Sub-Saharan Africa. Her current projects include: investigating behavioral responses to the availability of antiretroviral drugs – including risk-taking, HIV testing, and investments in the future and children – in East Africa; testing the impacts of provider training on diarrhea treatment in Ghana; studying the impact of education on political beliefs among girls in Western Kenya; and estimating the relationship between local economic conditions and participation in violence during the genocide in Rwanda. Before coming to Berkeley, Willa worked with the Abdul Latif Jameel Poverty Action Lab in Western Kenya. She has worked and studied in Uganda, Rwanda, Mali, Kenya, Burkina Faso, Cambodia, and Ghana.

Page 6

Ruth Vargas Hill, Research Fellow, IFPRI

Ruth Hill is a Research Fellow in the Markets, Trade and Institutions Division of IFPRI. She joined IFPRI in 2007 as a post-doctoral fellow in the Director General's Office. Ruth has nine years of experience conducting research on rural markets in East Africa and South Asia, more recently focusing on formal and informal markets for insurance. She has worked on the design and implementation of index-insurance projects in Bangladesh, Ethiopia, and India. Her work in Ethiopia and Bangladesh focuses on designing group-based index insurance schemes which combine group saving and lending with the purchase of formal insurance products. She also conducts research on market institutions and has been working with firms in Tanzania and farmers' groups in Uganda to identify and implement interventions that improve the functioning of markets. Prior to joining IFPRI, Ruth worked at the World Bank. She received a PhD in economics from the University of Oxford in 2005.

Karen Levy, Senior Director, Innovations for Poverty Action

Dr. Karen Levy is a Senior Director at Innovations for Poverty Action. Her current work focuses on the development and implementation of school-based health policies and programs, particularly related to deworming. From 2006 through 2010, she served as Kenya Country Director for IPA. Since January 2009, Karen has also served as Regional Director for Africa of Deworm the World. In this role she works with senior government health and education officials, providing technical and logistical support for the design, implementation, and monitoring of national school-based deworming programs. The first program she worked on in Kenya successfully reached over 3.6 million children. Karen received a BA with Honors from Brown University in 1994, an MSc with Distinction in Social Policy and Planning from the London School of Economics in 2000, and a PhD in Development Planning from the University of London in 2008. Karen was awarded the sole university-wide Bonnart-Braunthal Scholarship for her doctoral research, and a Public Service Fellowship from the Echoing Green Foundation for her leadership as a social entrepreneur. She has lived in Kenya for 14 of the last 18 years and speaks fluent Swahili.

Page 7

Karen Macours, Associate Professor, Paris School of Economics

Karen Macours is an associate professor at the Paris School of Economics and a researcher at INRA. Her research focuses on constraints to agricultural technology adoption and adaptation to climate change in developing countries; the impact of social programs on rural households' human capital (early childhood development, education, and health), productive investments, and economic activities; and the role of aspirations in poor households' decision-making. She is currently working on field experiments in Cambodia, the DRC, Haiti, and Nicaragua. Karen is an affiliate of CEPR and of J-PAL Europe, and was previously an associate professor at SAIS – Johns Hopkins University. She was also a core team member of the World Bank's 2008 World Development Report, Agriculture for Development. She received her PhD in agricultural and resource economics from the University of California at Berkeley.

Temina Madon, Executive Director, Center of Evaluation for Global Action

Temina Madon is Executive Director of CEGA and provides leadership in the Center's scientific development, partnerships, and outreach. She has worked as a science policy advisor for the National Institutes of Health Fogarty International Center, where she focused on enhancing research capacity in developing countries. She has also served as a Science and Technology Policy Fellow for the U.S. Senate Committee on Health, Education, Labor and Pensions, managing an extensive portfolio of global health policy issues. She holds a Ph.D. from U.C. Berkeley and an S.B. from MIT.

Christopher Udry, Professor of Economics, Yale University

Christopher Udry is the Henry J. Heinz II Professor of Economics at Yale University. He is a development economist whose research focuses on rural economic activity in sub-Saharan Africa. He has conducted extensive field research in West Africa on technological change in agriculture; the use of financial markets, asset accumulation, and gift exchange to cope with risk; gender relations and the structure of household economies; property rights; and a variety of other aspects of rural economic organization. He spent two years as a secondary school teacher in northern Ghana, and has been a visiting scholar at Ahmadu Bello University in Nigeria and at the University of Ghana at Legon. At Yale, Udry has directed the Economic Growth Center and served as Chair of the Department of Economics. He teaches graduate courses on development economics and undergraduate courses on economic development in Africa.

Page 8

IMPACT EVALUATION IN AGRICULTURE

January 23 – January 26, 2012

PARTICIPANT LIST

NAME | ORGANIZATION | EMAIL

Ignatius Abaijuka | IFPRI (Harvest Plus) | [email protected]
Tahirou Abdoulaye | IITA | [email protected]
Olu Ajayi | ICRAF | [email protected]
Arega Alene | IITA | [email protected]
Isabelle Baltenweck | ILRI | [email protected]
Samuel Bazzi | UC San Diego | [email protected]
Marshall Burke | UC Berkeley | [email protected]
Richard Caldwell | Bill and Melinda Gates Foundation | [email protected]
Ousmane Coulibaly | IITA | [email protected]
Anne Degrande | ICRAF | [email protected]
Mohamadou Fadiga | ILRI | [email protected]
Steven Franzel | ICRAF | [email protected]
Willa Friedman | UC Berkeley | [email protected]
Robert Fungo | Bioversity International | [email protected]
Zachary Gitonga | CIMMYT | [email protected]
Kalie Gold | One Acre Fund | [email protected]
Delia Grace | ILRI | [email protected]
Paul Guthiga | ILRI/ReSAKSS | [email protected]
Amos Gyau | ICRAF | [email protected]
Anna Hiltunen | University of Helsinki | [email protected]
Maryam Janani | CEGA/UC Berkeley | [email protected]
Henry Kamkwamba | ICRISAT | [email protected]
Joseph Karugia | ILRI/ReSAKSS | [email protected]
Jacqueline Kaytsie | East Africa Dairy Development Project | [email protected]
David Kimani | East Africa Dairy Development Project | [email protected]
Evelyne Kiptot | ICRAF | [email protected]
Holger Kirscht | IITA | [email protected]
Oliver Kirui | CIMMYT | [email protected]
Margaret Kroma | AGRA | [email protected]
Karen Levy | Innovations for Poverty Action | [email protected]
Rodney Lunduka | CABI | [email protected]
Karen Macours | Paris School of Economics | [email protected]
Temina Madon | CEGA/UC Berkeley | [email protected]

Page 9

Rosina Manjala | ICRISAT | [email protected]
Victor Manyong | IITA | [email protected]
Jessica Marter Kenyon | One Acre Fund | [email protected]
Stella Massawe | ReSAKSS | [email protected]
Kai Mausch | ICRISAT | [email protected]
Kizito Mazvimavi | ICRISAT | [email protected]
Samuel Mburu | ILRI
Sander Muilerman | IITA | [email protected]
Bernard Munyua | ICRISAT | [email protected]
Geoffrey Muricho | CIMMYT | [email protected]
Richard Musebe | CABI | [email protected]
Essa Mussa | ICRISAT | [email protected]
Marja Mutanen | University of Helsinki | [email protected]
Elijah Mwangi | ICRISAT | [email protected]
Beatrice Nabwire | East Africa Dairy Development Project | [email protected]
Nicholas Ndiwa | ILRI-ICRAF | [email protected]
Manson Nwafor | IITA | [email protected]
Judith Oduol | ICRAF | [email protected]
Emily Ouma | IITA | [email protected]
Juliette Page | CABI | [email protected]
Jane Poole | ILRI-ICRAF | [email protected]
Dannie Romney | CABI | [email protected]
Joseph Rusike | IITA | [email protected]
Petra Saghir | ILRI | [email protected]
Christin Schipmann | ICRISAT | [email protected]
Mila Sell | MTT Agrifood Research Finland | [email protected]
Franklin Simtowe | ICRISAT | [email protected]
Francisca Smith | Bioversity International | [email protected]
Amare Tegbaru | IITA | [email protected]
Nils Teufel | ILRI | [email protected]
Chris Udry | Yale University | [email protected]
Ruth Vargas Hill | IFPRI | [email protected]

Page 10

IMPACT EVALUATION IN AGRICULTURE

Groups

Group 1: Temina Madon
Isabelle Baltenweck, ILRI
Mohamadou Fadiga, ILRI
Delia Grace, ILRI
Samuel Mburu, ILRI
Petra Saghir, ILRI
Nils Teufel, ILRI

Group 2: Willa Friedman
Paul Guthiga, ILRI-ReSAKSS
Jane Poole, ILRI-ICRAF
Nicholas Ndiwa, ILRI-ICRAF
Stella Massawe, ILRI-ReSAKSS
Jacqueline Kaytsie, East Africa Dairy Development Project
Beatrice Nabwire, East Africa Dairy Development Project
David Kimani, East Africa Dairy Development Project
Joseph Karugia, ILRI-ReSAKSS

Group 3: Sam Bazzi
Geoffrey Muricho, CIMMYT
Zachary Gitonga, CIMMYT
Oliver Kirui, CIMMYT
Richard Musebe, CABI
Dannie Romney, CABI
Rodney Lunduka, CABI
Juliette Page, CABI
Ignatius Abaijuka, IFPRI (Harvest Plus)

Group 4: Maryam Janani
Elijah Mwangi, ICRISAT
Bernard Munyua, ICRISAT
Kizito Mazvimavi, ICRISAT
Henry Kamkwamba, ICRISAT
Rosina Manjala, ICRISAT
Essa Mussa, ICRISAT

Page 11

Group 5: Chris Udry (Monday and early Tuesday) / Steve Franzel (Tuesday 3:15-6pm and Thursday session)
Francisca Smith, Bioversity International
Marja Mutanen, University of Helsinki
Anna Hiltunen, University of Helsinki
Robert Fungo, Bioversity International
Steven Franzel, ICRAF
Amos Gyau, ICRAF
Evelyne Kiptot, ICRAF
Judith Oduol, ICRAF
Mila Sell, MTT Agrifood Research Finland

Group 6: Marshall Burke
Victor Manyong, IITA
Arega Alene, IITA
Tahirou Abdoulaye, IITA
Joseph Rusike, IITA
Emily Ouma, IITA
Margaret Kroma, AGRA / ATAI Board Member

Group 7: Karen Macours
Holger Kirscht, IITA
Amare Tegbaru, IITA
Sander Muilerman, IITA
Manson Nwafor, IITA
Ousmane Coulibaly, IITA
Richard Caldwell, The Bill and Melinda Gates Foundation / ATAI Board Member

Group 8: Ruth Vargas Hill
Olu Ajayi, ICRAF
Anne Degrande, ICRAF
Kalie Gold, One Acre Fund
Jessica Marter Kenyon, One Acre Fund
Kai Mausch, ICRISAT
Christin Schipmann, ICRISAT
Franklin Simtowe, ICRISAT

Page 12

IMPACT EVALUATION IN AGRICULTURE

Group Presentation Guide

Participants will be placed into 6-8 person groups, each of which will work through the design process for an evaluation of an intervention that considers technology adoption or final impact. Groups will be aided in this project by both the faculty and the teaching assistants, with the work culminating in presentations at the end of the week. The goal of the group presentations is to consolidate and apply the knowledge from the lectures. We encourage groups to work on projects that are relevant to participants' organizations.

All groups will present on Thursday. Each 15-minute presentation is followed by a 15-minute question-and-answer session. We provide groups with template slides for their presentation (the template can be found in this course packet). While groups do not need to follow this exactly, the presentation should have no more than 9 slides (including the title slide, excluding the appendix) and should include the following topics:

• Brief project background
• Theory of change
• Evaluation question
• Outcomes
• Evaluation design
• Data and sample size
• Potential validity threats and how to manage them
• Dissemination strategy of results

Please time yourself and do not exceed the allotted time. We have only a limited amount of time for these presentations, so we will follow a strict timeline to be fair to all groups.

Page 13

Drafting Theory of Change on Adoption and Impact

Assignment: On Thursday, your group will present the design of a randomized evaluation to the rest of the participants. During this hour, your team must agree on which project you will collaborate on, and draft the project's theory of change. In later sessions, you will think more specifically about the research design and randomization strategy.

This hour is divided into 5 sections:

1. Project Introductions (20 minutes)

Each member of the group should discuss a research project on which s/he is working, or one that s/he thinks would be interesting for the group to use. Justify why this project should be chosen, ideally offering an informal needs assessment to justify the program. After hearing options from each of the group members, choose one project. For this exercise, please think about issues of adoption and impact in particular.

2. Theory of Change (15 minutes)

Construct a logical framework or theory of change for the team's chosen intervention. What are the inputs, outputs, intermediate outcomes, and impacts? Where does adoption fit into this framework?

3. Indicators (15 minutes)

What are some indicators used to measure the stages or outcomes identified in the log-frame?

4. What might go wrong? (8 minutes)

What can go wrong with the implementation of, or the concept behind, the intervention? How can this be measured? Think about the process evaluation of this program.

5. Unintended Consequences (2 minutes)

What unintended consequences might come of this intervention? How can these be measured? You can also think of this as being part of the impact evaluation of the program.

Background: According to Rossi, Freeman, and Lipsey,1 a comprehensive program evaluation is made up of five elements: (1) Needs assessment, which is conducted to identify the key policy issues—where social indicators are lagging, as well as the potential sources of the problem. Ideally, an intervention is conceived after need is established. (2) Program theory assessment is an umbrella term used to describe the process of drawing up the blueprints of an intervention. More familiar than the term "program theory assessment" are the specific examples it is meant to encapsulate: logical framework, results

1 Evaluation: A Systematic Approach

Page 14

framework, theory of change, etc. Upon implementation, a (3) process evaluation can be conducted to ensure that the services are being delivered and that the program is being run efficiently. Distinguishing it from "monitoring," process evaluation is usually seen as an external activity—meant to report on implementation, rather than used for day-to-day management. (4) Impact evaluation tests the causal impact of the program on the most important outcomes—i.e. it establishes how the program changed the lives of those in its catchment area. Once all these different evaluations are completed, (5) a cost-effectiveness or cost-benefit analysis can be conducted to test whether the costs associated with the program are sufficiently outweighed by its benefits. This informs whether it is worth scaling the program up to a wider population.

Page 15

Case 1: How to Randomize

Examining Barriers to Fertilizer Use in Kenya

TRANSLATING RESEARCH INTO ACTION

This case study is based on the paper "Nudging Farmers to Use Fertilizer: Theory and Experimental Evidence from Kenya" by Esther Duflo (MIT), Michael Kremer (Harvard), and Jonathan Robinson (UCSC). J-PAL thanks the authors for allowing us to use their paper.

Page 16

Key Vocabulary

1. Level of Randomization: the level of observation (e.g. individual, household, school, village) at which treatment and comparison groups are randomly assigned.

Overview

Adoption of fertilizer in much of sub-Saharan Africa is quite low even though there are clear benefits to its use.1 In 2009, a group of researchers with Innovations for Poverty Action in Kenya began a project to investigate three possible ways to increase fertilizer adoption among subsistence farmers in western Kenya. At the time of harvest, farmers were provided with coupons for discounted fertilizer to be purchased within the following few weeks. Researchers encouraged farmers to form cooperatives to share information about their farming. Lastly, measuring spoons were provided to farmers to allow them to apply the appropriate amount of fertilizer to their plants.

How would these interventions affect farmers' uptake of the right amount of fertilizer? What experimental designs could test the impact of this intervention on fertilizer adoption? Which of these interventions is primarily responsible for improved fertilizer use?

Problem

Agricultural productivity in Sub-Saharan Africa has stagnated over the past decades: although total output has risen, food production has not kept up with the increase in Africa's population. The number of chronically undernourished people in Africa increased to 200 million in 1997-99.2 When used correctly, chemical fertilizer can substantially raise agricultural yields, yet usage of fertilizer remains low.

Many reasons exist for this underinvestment in fertilizer by smallholder farmers. Some past studies have suggested that usage is low because of:

1. "Time-inconsistent preferences": Farmers have difficulty saving harvest income to purchase fertilizer for the next growing season. At harvest time, farmers may have the cash on hand that could be used to purchase fertilizer for the following growing season, but they have a hard time holding on to that money until it is time to buy the fertilizer. Research has shown that if farmers are able to purchase fertilizer for the next season at the time of harvest, they are much more likely to use fertilizer.3

2. Lack of information: Farmers have limited information on the benefits of using fertilizer properly. Though there is some awareness about fertilizer, farmers are not aware of the appropriate application techniques and the amounts of fertilizer necessary to maximize their profitability.

3. Lack of knowledge sharing: Farmers do not pass knowledge about fertilizer use to each other. Even if a farmer in a community is a user of fertilizer, he or she does not tend to share this information with fellow farmers.

Randomized Evaluation

1 (The World Bank 2008)

2 (Harsch n.d.)

3 (Duflo, Kremer and Robinson n.d.)


Page 17

Attempting to understand these barriers to fertilizer adoption, researchers in western Kenya devised a research project to investigate these questions. The project, Examining Barriers to Fertilizer Use in Kenya, focused on small-scale subsistence farmers in rural Western Kenya, many of whom grow maize as their staple crop. All farmers in this population are extremely poor, earning on the order of $1 per day. Previous research in this area has shown that, when used correctly, top-dressing fertilizer can increase yields by about 48%, amounting to a 36% rate of return on this investment over just a few months. However, only 40% of sampled farmers in the Busia district of Western Kenya report ever having used fertilizer.4

In 2009, Innovations for Poverty Action in Kenya (IPAK) began the implementation and evaluation of this two-year project. IPAK oversaw the implementation of the three interventions described below for 20,000 subsistence farmers in rural western Kenya.

Interventions

The researchers used three interventions to examine the barriers above.

In the first intervention, designed to address farmers' difficulties with time preferences, IPAK distributed small, time-limited discounts, which were valid within a three-week window right after harvest and redeemable at a local shop. Farmers received coupons for a discount of about 15% of the price of fertilizer, for up to 25 kilograms. With this coupon, farmers would have greater incentive to use their earnings from the harvest to purchase fertilizer for the next season's crop.

The second intervention included efforts to catalyze the establishment of farmers' networks. Groups of farmers were encouraged to form networks with their friends and neighbors to talk about fertilizer and agricultural practices. The researchers organized the groups and coordinated the first few meetings, but did not provide any direct information to the groups. The logic behind these efforts is that if farmers have an established network to communicate about farming practices, perhaps information about fertilizer can spread more swiftly.

In the third intervention, IPAK supplied measuring spoons to farmers so that they could apply the appropriate amount of fertilizer to their plants.5 The research team visited farmers and provided them with ½-teaspoon measuring spoons, as well as information about the returns to using ½ teaspoon of fertilizer per plant. To enable diffusion of this technology to others in the community, the spoons were made available in nearby fertilizer shops to other farmers for a nominal fee. In addition, when distributing the measuring spoons, the farmers were given vouchers for spoons which they could give to their friends.

Addressing the research questions through experimental design

Different randomization strategies may be used to answer different questions. What strategies could be used to evaluate the following questions? How would you design the study? Who would be in the treatment and control groups, and how would they be randomly assigned to these groups?

4 (Duflo, Kremer and Robinson n.d.)

5 (Duflo, Kremer and Robinson n.d.)

Page 18

Discussion Topic 1: Testing the effectiveness of coupons

1. What is the relative effectiveness of coupons?

Discussion Topic 2: Testing the effectiveness of social networks

1. What is the effect of forming cooperatives?
2. What is the effect of supplying measuring spoons?

Discussion Topic 3: Addressing all questions with a single evaluation

1. Could a single evaluation explore all of these issues at once?
2. What randomization strategy could do so?
3. What do you think about the time-specific aspect of the coupon? How would you design a project to disentangle the different attributes of the coupon?

Bibliography

Duflo, Esther, Michael Kremer, and Jonathan Robinson. "Nudging Farmers to Use Fertilizer: Theory and Experimental Evidence from Kenya." n.d.

Harsch, Ernest. "Agriculture: Africa's Engine for Growth." n.d. http://www.un.org/ecosocdev/geninfo/afrec/vol17no4/174ag.htm.

The World Bank. World Development Report 2008: Agriculture for Development. Washington, DC: The World Bank, 2008.

Page 19

Case 2: Threats to Experimental Integrity

TRANSLATING RESEARCH INTO ACTION

This case study is based on a current study by Esther Duflo and Tavneet Suri. J-PAL thanks the authors for allowing us to use their project.

Page 20

Key Vocabulary

1. Equivalence: groups are identical on all baseline characteristics, both observable and unobservable. Ensured by randomization.

2. Attrition: the process of individuals joining or dropping out of either the treatment or comparison group over the course of the study.

3. Attrition Bias: statistical bias which occurs when individuals systematically join or drop out of either the treatment or the comparison group for reasons related to the treatment.

4. Partial Compliance: individuals do not comply with their assignment (to treatment or comparison). Also termed "diffusion" or "contamination."

5. Intention to Treat: the measured impact of a program that includes all data from participants in the groups to which they were randomized, regardless of whether they actually received the treatment. Intention-to-treat analysis prevents bias caused by the loss of participants, which may disrupt the baseline equivalence established by randomization and which may reflect non-adherence to the protocol.

6. Treatment on the Treated: the measured impact of a program that includes only the data for participants who actually received the treatment.

7. Externality: an indirect cost or benefit incurred by individuals who did not directly receive the treatment. Also termed "spillover."

In 2010, the Technoserve (TNS) Coffee Initiative partnered with J-PAL researchers to conduct a randomized evaluation of their coffee agronomy training program in Nyarubaka sector in southern Rwanda. Technoserve carried out its regular recruitment and sign-up process across all 27 villages in the sector and registered 1600 coffee farmers who were interested in attending the monthly training modules. The study design for the evaluation then required that this pool of farmers be split into treatment and control groups: those who would participate in the training, and those who would not (for now—they would be trained in later phases). The trainings in Nyarubaka included 800 coffee farmers, randomly selected from the pool of 1600.

Randomization ensures that the treatment and comparison groups are equivalent at the beginning, mitigating concern about selection bias. But it cannot ensure that they remain comparable until the end of the program. Nor can it ensure that people comply with the treatment, or even the non-treatment, to which they were assigned. Life also goes on after the randomization: other events besides the program happen between the initial randomization and the end-line data collection. These events can reintroduce selection bias; they diminish the validity of the impact estimates and are threats to the integrity of the experiment. How can common threats to experimental integrity be managed?

Page 21

Evaluation design — the experiment as planned

As previously mentioned, the agronomy training evaluation consisted of 1600 farmers, half of whom attended monthly training sessions while the other half did not. In addition, a census of the entire sector was done to show which households were coffee farmers and which were not. The census showed that there were 5400 households in Nyarubaka: 2400 non-coffee-farming households and 3000 coffee-farming households (1600 of which were already in our sample). Each month a Technoserve farmer trainer would gather the farmers assigned to his/her group and conduct a training module on farming practices (e.g. weeding, pruning, bookkeeping). The farmers were taught the best practices using a practice plot, so they could see and do exactly what the instructor was explaining.

To think about: How can we be certain that the control group farmers did not attend the training too? What can be done to reduce this risk? Since we have a census for Nyarubaka, how might this be helpful in at least controlling for or documenting any spillovers? Think about what can be done at the trainings themselves. What type of data might you need or want to try to control for any spillovers in this case? What were other forms or opportunities for agronomy training in the area?

Threats to integrity of the planned experiment

Discussion Topic 1: Threats to experimental integrity

Randomization ensures that the groups are equivalent, and therefore comparable, at the beginning of the program. The impact is then estimated as the difference between the average outcome of the treatment group and the average outcome of the comparison group, both at the end of the program. To be able to say that the program caused the impact, you need to be able to say that the program was the only difference between the treatment and comparison groups over the course of the evaluation.

1. What does it mean to say that the groups are equivalent at the start of the program?

2. Can you check if the groups are equivalent at the beginning of the program? How?

3. Other than the program's direct and indirect impacts, what can happen over the course of the evaluation (after conducting the random assignment) to make the groups non-equivalent?

4. How does non-equivalence at the end threaten the integrity of the experiment?

5. In the Technoserve agronomy training example, why is it useful to randomly select from the farmers who signed up for the Technoserve training program, rather than from all the coffee farmers in the sector?

Page 22

Managing attrition—when the groups do not remain equivalent

Attrition is when people join or drop out of the sample—both treatment and comparison groups—over the course of the experiment. One common example in clinical trials is when people die; so common, indeed, that attrition is sometimes called experimental mortality.

Discussion Topic 2: Managing Attrition

You are looking at how much farmers adopt the recommendations and techniques from the agronomy trainings. Using a stylized example, let's divide adoption of the techniques as follows:

Full adoption = score of 2
Partial adoption = score of 1
No adoption = score of 0

Let's assume that there are 1800 farmers: 900 treatment farmers who receive the training and 900 comparison farmers who do not. After you randomize and collect some baseline data, you determine that the treatment and comparison groups are equivalent, meaning farmers from each of the three categories are equally represented in both groups.

Suppose protocol compliance is 100 percent: all farmers in the treatment group go to the training and none of the farmers in the comparison group attend. Let's assume that farmers who attend all agronomy trainings end up with full adoption, scoring a 2. Let's also assume that there was a drought during this period, and those who adopted best practices managed to protect their crops against damage. However, the farmers at adoption level 0 see most of their crops perish, and members of their households enter the migrant labor market to generate additional income. The number of farmers in each treatment group and each adoption category is shown below for both pre-adoption and post-adoption (a scripted check of this arithmetic follows the discussion questions below).

                     Pre-adoption                Post-adoption
Adoption Level       Treatment    Comparison     Treatment    Comparison
0                    300          300            0            Dropped out
1                    300          300            0            300
2                    300          300            900          300
Total farmers        900          900            900          600

1. a. At program end, what is the average adoption for the treatment group?
   b. At program end, what is the average adoption for the comparison group?
   c. What is the difference?
   d. Is this outcome difference an accurate estimate of the impact of the program? Why or why not?
   e. If it is not accurate, does it overestimate or underestimate the impact?
   f. How can we get a better estimate of the program's impact?

2. Besides level of adoption, the Technoserve agronomy training evaluation also looked at outcome measures such as yields and farm labor.
   a. Would differential attrition (i.e. differences in drop-outs between the treatment and comparison groups) bias either of these outcomes? How?
   b. Would the impacts on these final outcome measures be underestimated or overestimated?

Page 23

3. In the Technoserve agronomy evaluation, what are some other possible causes of attrition in the treatment and control groups? What can be done to mitigate these?

4. You may know of other research designs used to measure impact, such as non-experimental or quasi-experimental methodologies (e.g. pre-post, difference-in-differences, regression discontinuity, instrumental variables (IV), etc.).
   a. Is the threat of attrition unique to randomized evaluations?
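As a numerical check on the stylized example in Discussion Topic 2, the group means can be computed directly from the post-adoption counts in the table above; a minimal sketch in Python, with illustrative variable names:

```python
# Post-adoption counts by adoption level, from the attrition table above.
treatment = {0: 0, 1: 0, 2: 900}
comparison = {1: 300, 2: 300}   # the 300 level-0 comparison farmers dropped out

def mean_adoption(counts):
    """Average adoption score over the farmers still in the sample."""
    total = sum(counts.values())
    return sum(level * n for level, n in counts.items()) / total

t_mean = mean_adoption(treatment)    # 2.0
c_mean = mean_adoption(comparison)   # 1.5
print(t_mean, c_mean, t_mean - c_mean)
```

Because attrition removed only level-0 farmers, and only from the comparison group, the two groups being compared at program end are no longer equivalent.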

Managing partial compliance—when the treatment group does not actually get treated, or the comparison group gets treated

Some people assigned to the treatment may in the end not actually get treated. In an after-school tutoring program, for example, some children assigned to receive tutoring may simply not show up for tutoring. And others assigned to the comparison group may obtain access to the treatment, either from the program or from another provider. Or comparison group children may get extra help from their teachers or acquire program materials and methods from their classmates. In any of these scenarios, people are not complying with their assignment in the planned experiment. This is called "partial compliance" or "diffusion" or, less benignly, "contamination." In contrast to carefully controlled lab experiments, diffusion is ubiquitous in social programs. After all, life goes on, people will be people, and you have no control over what they decide to do over the course of the experiment. All you can do is plan your experiment and offer them treatments. How, then, can you deal with the complications that arise from partial compliance?

Discussion Topic 3: Managing partial compliance

Suppose that farmers who have adoption level 0 are too risk averse to adopt the techniques they learn at the training. These farmers believe that there is no way for them to adopt the techniques described in the early trainings, and they stop attending. Consequently, none of the treatment farmers with adoption level 0 increased their adoption; they remained at level 0 at the end of the program. No one assigned to the comparison group attended the trainings. All the farmers in the sample at the beginning of the program were followed up.

                     Pre-adoption                Post-adoption
Adoption Level       Treatment    Comparison     Treatment    Comparison
0                    300          300            300          300
1                    300          300            0            300
2                    300          300            600          300
Total farmers        900          900            900          900

1. Calculate the impact estimate based on the original group assignments.
   a. Is this an unbiased measure of the effect of the program?
   b. In what ways is it useful, and in what ways is it not as useful?

You are interested in learning the effect of the treatment on those actually treated (the "treatment on the treated" (TOT) estimate).

2. Five of your colleagues are passing by your desk; they all agree that you should calculate the effect of the treatment using only the 600 farmers who attended the training.
   a. Is this advice sound? Why or why not?

Page 24

3. Another colleague says that it's not a good idea to drop the farmers who stopped attending the trainings entirely; you should use them, but consider them as part of the control group.
   a. Is this advice sound? Why or why not?

4. Another colleague suggests that you use the compliance rates, i.e. the proportion of people in each group that did or did not comply with their treatment assignment: divide the "intention to treat" estimate by the difference in the treatment ratios (the proportions of each experimental group that received the treatment).
   a. Is this advice sound? Why or why not? (A sketch of this calculation follows below.)
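The adjustment described in question 4 is the standard Wald (instrumental variables) estimator: divide the intention-to-treat difference by the difference in take-up between the groups. A minimal sketch in Python, reading the counts off the table above:

```python
# Post-adoption counts by adoption level, from the partial-compliance table.
treatment = {0: 300, 1: 0, 2: 600}
comparison = {0: 300, 1: 300, 2: 300}

def mean_adoption(counts):
    total = sum(counts.values())
    return sum(level * n for level, n in counts.items()) / total

# Intention to treat: difference in means by original random assignment.
itt = mean_adoption(treatment) - mean_adoption(comparison)

# Take-up: 600 of 900 treatment farmers attended; no comparison farmer did.
takeup_diff = 600 / 900 - 0 / 900

tot = itt / takeup_diff   # treatment on the treated (Wald estimator)
print(itt, tot)
```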

Managing spillovers—when the comparison group, itself untreated, benefits from others being treated

People assigned to the control group may benefit indirectly from those receiving treatment. For example, a program that distributes insecticide-treated nets may reduce malaria transmission in the community, indirectly benefiting those who themselves do not sleep under a net. Such effects are called externalities or spillovers.

Discussion Topic 4: Managing spillovers

In the Technoserve agronomy training evaluation, randomization was at the farmer level, meaning that while one farmer might have been selected for the training, his neighbor might not have had the same fortune in the randomization process. Depending on the evaluation and the nature of the program, it may be more challenging to prevent spillovers of agronomic knowledge between friends than spillovers of hard, tangible objects delivered into farmers' hands, like a weighing scale or a calendar for maintaining harvest records.

1. How do you imagine spillovers might occur in agronomy training?

2. What types of mechanisms can you think of that could be used to reduce or manage spillovers?

Measuring Spillovers

Discussion Topic 5: Measuring spillovers

1. Can you think of ways to design the experiment explicitly to measure the spillovers of the agronomy training?

Page 25

Exercise 1: The mechanics of random assignment using MS Excel®

Part 1: Simple randomization

Like most spreadsheet programs, MS Excel has a random number generator function. Say we had a list of schools and wanted to assign half to treatment and half to control.

(1) Start with the full list of schools.

Page 26


(2) Assign a random number to each school: The function RAND() is Excel's random number generator. To use it, in column C, type =RAND() in each cell adjacent to every school name. Or you can type the function in the top row (row 2) and simply copy and paste it to the entire column, or click and drag. Typing =RAND() puts a random number between 0 and 1 (displayed to 15 digits) in the cell.

Page 27


(3) Copy the cells in column C, then paste the values over the same cells. The function =RAND() will re-randomize each time you make any change to any other part of the spreadsheet, because Excel recalculates all values with any change to any cell. (You can also induce recalculation, and hence re-randomization, by pressing F9.) This can be confusing. Once we've generated our column of random numbers, we do not need to re-randomize; we already have a clean column of random values. To stop Excel from recalculating, replace the "functions" in this column with their "values": highlight all values in column C, right-click anywhere in the highlighted column and choose Copy, then right-click again and choose Paste Special. When the "Paste Special" window appears, click "Values".

Page 28


(4) Sort the columns in either descending or ascending order of column C: Highlight columns A, B, and C. In the Data tab, press the Sort button. A Sort box will pop up. In the "Sort by" column, select "random #" and click OK. Doing this sorts the list by the random number in ascending or descending order, whichever you chose.

Page 29


There! You have a randomly sorted list.

(5) Assign treatment and control status: Because your list is randomly sorted, it is completely random whether schools are in the top half of the list or the bottom half. Therefore, if you assign the top half to the treatment group and the bottom half to the control group, your schools have been "randomly assigned." In column D, type "T" for the first half of the rows (rows 2-61). For the second half of the rows (rows 62-123), type "C".

Page 30


Re-sort your list back in order of school id. You'll see that your schools have been randomly assigned to treatment and control groups.
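The same procedure can be scripted outside of Excel. Below is a minimal sketch in Python (pandas and numpy are assumed; the school list and column names are hypothetical) that mirrors the steps above: draw one random number per school, sort by it, assign the top half to treatment, and re-sort by school id.

```python
import numpy as np
import pandas as pd

# Hypothetical list of school ids; in practice, load your own data,
# e.g. with pd.read_csv("schools.csv").
schools = pd.DataFrame({"school_id": range(1, 123)})

# Fixing the seed plays the role of Excel's Paste Special > Values step:
# the draws do not change every time the script is re-run.
rng = np.random.default_rng(seed=12345)
schools["random"] = rng.random(len(schools))   # analogue of =RAND()

# Sort by the random number and assign the top half to treatment.
schools = schools.sort_values("random").reset_index(drop=True)
schools["assignment"] = np.where(schools.index < len(schools) // 2, "T", "C")

# Re-sort by school id, as in the final step above.
schools = schools.sort_values("school_id")
print(schools["assignment"].value_counts())
```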

Page 31


Part 2: Stratified randomization

Stratification is the process of dividing a sample into groups and then randomly assigning individuals within each group to the treatment and control. The reasons for doing this are rather technical. One reason for stratifying is that it ensures subgroups are balanced, making it easier to perform certain subgroup analyses. For example, if you want to test the effectiveness of a new education program separately for schools where children are taught in Hindi versus schools where children are taught in Gujarati, you can stratify by "language of instruction" and ensure that there are an equal number of schools of each language type in the treatment and control groups.

(1) Start with the list of schools and the potential "strata". Mechanically, the only difference from simple random sorting is that instead of sorting by the random number alone, you first sort by language and then by the random number. Obviously, the first step is to ensure you have the variables by which you hope to stratify.

(2) Sort by strata and then by random number. Assuming you have all the variables you need: in the Data tab, click "Sort". When the Sort window pops up, sort by "Language", press the "Add Level" button, and then select "Random #".

Page 32


(3) Assign treatment and control status for each group: Within each language group, type "T" for the first half of the rows and "C" for the second half.
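The stratified version differs from the earlier sketch only in that the ranking and the split happen within each stratum. A minimal sketch along the same lines, with a hypothetical language-of-instruction variable:

```python
import numpy as np
import pandas as pd

# Hypothetical schools with a stratifying variable (language of instruction).
schools = pd.DataFrame({
    "school_id": range(1, 121),
    "language": ["Hindi"] * 60 + ["Gujarati"] * 60,
})

rng = np.random.default_rng(seed=12345)
schools["random"] = rng.random(len(schools))

# Rank schools by the random number within each language stratum and
# assign the first half of each stratum to treatment: the scripted
# analogue of sorting by Language, then Random #, and typing T/C.
schools["rank"] = schools.groupby("language")["random"].rank()
stratum_size = schools.groupby("language")["random"].transform("size")
schools["assignment"] = np.where(schools["rank"] <= stratum_size / 2, "T", "C")

# Each language group ends up with an equal number of T and C schools.
print(schools.groupby(["language", "assignment"]).size())
```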

Page 33

Exercise 2: Power

Exercise Overview

Key Vocabulary

1. Power: the likelihood that, when the program has an effect, one will be able to distinguish the effect from zero given the sample size.

2. Significance: the likelihood that the measured effect did not occur by chance. Statistical tests are performed to determine whether one group (e.g. the experimental group) is different from another group (e.g. the comparison group) on the measurable outcome variables used in the evaluation.

3. Standard Deviation: a standardized measure of the variation of a sample population from its mean on a given characteristic/outcome. Mathematically, the square root of the variance.

4. Standardized Effect Size: a standardized measure of the [expected] magnitude of the effect of a program.

5. Cluster: the level of observation at which a sample size is measured. Generally, observations which are highly correlated with each other should be clustered, and the sample size should be measured at this clustered level.

6. Intra-cluster Correlation Coefficient: a measure of the correlation between observations within a cluster; e.g. the level of correlation in drinking water source for individuals in a household.

Sample size calculations

In this exercise, we'll use a currently ongoing research project, Promoting Agricultural Technology Adoption in Rwanda, to explore issues of sample size and power. In this example, we're interested in measuring the effect of a treatment (agronomy training) on outcomes measured at the household level: specifically, productivity. However, the randomization of trainings was done at the village level. It could be that our outcome of interest is correlated for farmers in the same village, for reasons that have nothing to do with the training itself. For example, all the farmers in a village may be affected by their shared prior knowledge, by whether their land is especially fertile, or by whether their weather patterns are helpful. These factors mean that when one farmer in the village does particularly well for such a reason, all the farmers in that village probably also do better, which might have nothing to do with the training.

Therefore, if we sample 100 households from 10 randomly selected villages, that sample is less representative of the population of villages in a district than if we selected 100 random households from the whole population of villages, and it therefore captures less of the underlying variation. In effect, we have a smaller sample size than we think. This will lead to more noise in our sample, and hence a larger standard error than in the usual case of independent sampling. However, sampling households in fewer villages may be cheaper, since travel costs between villages are likely greater than travel costs within villages. When planning both the sample size and the best way to sample villages, we need to take both statistical and budgetary issues into account. This exercise will help you understand how to do that. Should you sample every household in just a few villages? Should you sample a few households from many villages? How do you decide? We will work through these questions by determining the sample size that allows us to detect

Page 34

a specific effect with at least 80% power. Remember: power is the likelihood that, when the treatment has an effect, you will be able to distinguish it from zero in your sample. In this example, a "cluster" is a cluster of households—in other words, a village.

This exercise shows you how the power of your sample changes with the number of clusters, the size of the clusters, the size of the treatment effect, and the intra-cluster correlation coefficient. We will use a software program called Optimal Design, developed by Stephen Raudenbush with funding from the William T. Grant Foundation. (You can find additional resources on clustered designs on their website.) We will also repeat the exercise in Stata and Excel, exploring the capabilities of each.

Section 1: Using the OD Software (Windows PCs only) – 30 minutes

First, download the OD software from the website (a software manual is also available): http://sitemaker.umich.edu/group-based/optimal_design_software

When you open it, you will see a screen like the one below. Select the menu option "Design" to see the primary menu. Select "Cluster Randomized Trials with person-level outcomes," "Cluster Randomized Trials," and then "Treatment at level 2." You'll see several options to generate graphs; choose "Power vs. Total number of clusters (J)."

A new window will appear:

Select α (alpha). You'll see it is already set to 0.050, i.e. a 5% significance level. First, let's assume we want to survey only 50 households per village. How many villages do you need to visit in order to detect a statistically significant effect? Click on n, which represents the number of households per village. Since we are surveying only 50 households per village, fill in n(1) with 50 and click OK.


Now we have to determine δ (delta), the standardized effect size: the effect size divided by the standard deviation of the variable of interest. Assume we are interested in detecting an increase of 10% in coffee cherry production (or, more precisely, that we are not interested in an effect smaller than 10%). Our baseline survey indicated that average production is 104 KG of coffee cherries, with a standard deviation of 109 KG. We want to detect an effect size of 10% of 104, which is 10.4 KG. We divide 10.4 by the standard deviation to get δ = 10.4/109, or approximately 0.095. Select δ from the menu. In the dialogue box that appears there is a prefilled value of 0.200 for delta(1). Change this value to 0.095, and clear the value of delta(2). Select OK.

Finally, we need to choose ρ (rho), the intra-cluster correlation coefficient. ρ tells us how strongly outcomes are correlated for units within the same cluster. If households from the same village were clones (no variation) and all produced the exact same amount of coffee cherries, ρ would equal 1. If, on the other hand, households from the same village were in fact independent, and there were no differences between villages, ρ would equal 0. You have determined in your pilot study that ρ is 0.034. Fill in rho(1) with 0.034, and leave rho(2) empty. You should see a graph similar to the one below.


You'll notice that the x axis isn't long enough to show the number of clusters that would give you 80% power. Click on the axis-settings button to set the x-axis maximum to 500. Then you can click on the graph with your mouse to see the exact power and number of clusters at a particular point, as seen below.

Exercise 3.1: How many villages are needed to achieve 80% power? 90% power?

Now you have seen how many clusters you need for 80% power when sampling 50 households per village. Suppose instead that you can only visit 150 villages, due to budget constraints.

Exercise 3.2: Given a constraint of 150 villages, how many households per village are needed to achieve 80% power? 90% power? Choose different values for n to see how your graph changes.


Finally, let's see how the intra-cluster correlation coefficient (ρ) changes the power of a given sample. Leave rho(1) at 0.034, but for comparison set rho(2) to 0.00. You should see a graph like the one below. The solid blue curve is the one with the parameters you've set, based on your estimates of the effect of agronomy training on farmer productivity. The dashed blue curve is there for comparison: it shows how much power you would get from your sample if ρ were zero. Look carefully at the graph.

Exercise 3.3: How does the power of the sample change with the intra-cluster correlation coefficient (ρ)?

To take a look at some of the other menu options, close the graph by clicking on the X in the top right-hand corner of the inner window. Then select the Cluster Randomized Trial menu again.


Exercise 3.4: Try generating graphs of how power changes with cluster size (n), intra-cluster correlation (ρ), and effect size (δ). You will have to re-enter your parameters each time you open a new graph.

Section 2: Using Stata – 30 minutes

For this section, we'll be using Stata, a powerful data analysis package. Often, complicated tasks – things which would take several steps in other software, such as Optimal Design or Excel – take only a single line of code in Stata. The problem, then, is knowing which Stata code to use. For computing sample size and power, the relevant commands are sampsi and sampclus. The first command, sampsi, is built into Stata; the second, sampclus, is an additional .ado file, so you'll need to download it before beginning this exercise: type findit sampclus into Stata, then download the file from the link Stata provides. To get familiar with the commands, try typing help sampsi and help sampclus.

The remaining instructions for this exercise can be found in the pre-prepared .do file, 2. CIMMYT_MX_Stata.do.
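To make this concrete, here is a minimal sketch of the kind of calculation the .do file automates, using the same parameters as the Optimal Design exercise (baseline mean 104 KG, SD 109 KG, a 10% effect, ρ = 0.034, 50 households per village). The exact contents of 2. CIMMYT_MX_Stata.do may differ:

    * Per-arm sample size to detect a 10% increase (104 -> 114.4 KG, SD 109)
    * with 80% power at the 5% significance level, under individual random
    * assignment:
    sampsi 104 114.4, sd1(109) sd2(109) power(0.8) alpha(0.05)

    * Inflate that sample size for village-level randomization; sampclus
    * reads the results of the sampsi command run immediately before it:
    sampclus, obsclus(50) rho(0.034)

sampclus reports the adjusted number of observations and the implied number of clusters, which you can compare against your Optimal Design graph.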


Section 3: Using Excel – Optional

The same data used in Stata is available for the Excel exercise. We'll conduct the same power calculations for the same sample, so ideally we should get the same results. The benefit of the Excel exercise is that it builds the power analysis from the ground up: you manually calculate each part of the power analysis function (provided in the lecture). The remaining instructions for this exercise can be found in the Excel file, 3. CIMMYT_MX_Excel.xls.

Section 4: Working within a budget constraint – 45 minutes

A typical constraint on sample size is the budget. This may sound discouraging, given that the other determinants of power (the intra-cluster correlation coefficient, the effect size, and the baseline summary statistics) are also beyond the researcher's control; indeed, it may sometimes appear that power is a direct consequence of the funds available. But the relationship is not one-to-one, because several factors affect the funds needed. For example, some villages may be further away and harder to reach, and the villages that are easier to reach may also be similar in other characteristics; spending a little extra to visit the more remote villages as well buys us additional variance. The choice of variables matters too: some measures are cheap to collect (such as a household survey) and some are expensive (such as checking water quality). Researchers must identify (and prioritize) the data collection instruments that will minimize cost. The remaining instructions for this exercise can be found in the Excel file, 4. CIMMYT_MX_budget.xls.
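One way to formalize the trade-off between visiting more villages and interviewing more households per village: for a fixed budget, with a cost per village visited (travel, enumerator time) and a cost per completed household interview, the variance-minimizing cluster size is sqrt((cost per village / cost per household) × (1 - ρ)/ρ), a classic result from the cluster-randomization literature (e.g., Raudenbush 1997). The cost figures below are invented purely for illustration:

    * Hypothetical costs: 100 per village visited, 5 per household interview.
    * The variance-minimizing number of households per village is
    * n* = sqrt((cv/ch) * (1 - rho)/rho):
    scalar cv    = 100
    scalar ch    = 5
    scalar rho   = 0.034
    scalar nstar = sqrt((cv/ch) * (1 - rho)/rho)
    display nstar   // about 24 households per village

With these made-up costs, power is maximized for the budget by surveying about 24 households in each village and spending the remainder on visiting more villages.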

General tips

Although many of these parameters are outside the researcher's control, researchers can still positively affect their power through several design choices (a worked example combining them follows this list):

- The looser the level of significance we impose, the more likely we are to reject the null, i.e. the higher the power; but also the more likely we are to make false positive (type I) errors.
- The higher the MDE (minimum detectable effect), the higher the power.
- The lower the variance of the underlying population, the lower the variance of the estimated effect size, and the higher the power.
- The larger the sample size, the lower the variance of our estimated effect, and the higher the power.
- The more evenly the sample is distributed between treatment and comparison groups, the higher the power.
- Individual-level randomization is more powerful than group-level randomization, given the same sample size.
- The more outcomes are correlated within groups in a group-level randomization, the lower the power.
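To see how these ingredients combine, here is a back-of-the-envelope version of the power calculation for cluster-randomized designs, using the parameters from Section 1. It assumes the standard minimum-detectable-effect formula (e.g., Duflo, Glennerster, and Kremer 2007); the function provided in the lecture may be written differently:

    * Total number of villages J needed for 80% power (z = 0.84) at the 5%
    * significance level (z = 1.96), with standardized effect size delta,
    * n households per village, intra-cluster correlation rho, and a share
    * P of villages assigned to treatment:
    scalar delta = 10.4/109    // ~0.095, from the baseline survey
    scalar rho   = 0.034
    scalar n     = 50
    scalar P     = 0.5
    scalar J     = (1.96 + 0.84)^2 * (rho + (1 - rho)/n) / (P*(1 - P)*delta^2)
    display J                  // about 184 villages in total

Note how each general tip shows up in the formula: a larger δ or a looser significance level shrinks J, while a larger ρ or a more uneven split (P away from 0.5) inflates it.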


GROUP PRESENTATION TEMPLATE

You don’t have to follow this exactly. This is just a guideline.

Background

Talk briefly about general context, needs assessment, problem you want to solve.

Theory of Change

Describe the specific intervention you are evaluating

Talk about how it will solve part of the problem you described in the background

You may want to mention other causes of the problem that your intervention will not solve

(you can use the TOC template in the appendix)

Evaluation Questions and Outcomes

These should be directly linked to the TOC described above.

What outcomes do you need to measure to test your research hypothesis?

Evaluation Design

Unit of randomization, type of randomization (why did you choose these).

The actual randomization design, i.e. specific treatment group(s).

Data and Sample Size

Outcomes

Tell us where you will get the data – a survey? Administrative records?

Power calcs: justify where you got the effect size and rho from; don't make them up.

You may need to do separate power calcs for separate outcomes.


Potential Challenges

Talk about threats (attrition, spillover, etc.) and how you want to manage them.

You may need to revise your power calcs

Results

Why and for whom they would be useful.

How would you disseminate them?

Appendix: Theory of Change

[Theory of Change template diagram: a flow from Needs Assessment through Intervention (with underlying Assumptions) to Intermediary Outcomes and a Final Outcome.]


Checklist For Reviewing a Randomized Controlled Trial of a Social Program or Project, To Assess Whether It Produced Valid Evidence


Updated February 2010

This publication was produced by the Coalition for Evidence-Based Policy, with funding support from the William T. Grant Foundation, Edna McConnell Clark Foundation, and Jerry Lee Foundation. This publication is in the public domain. Authorization to reproduce it in whole or in part for educational purposes is granted. We welcome comments and suggestions on this document ([email protected]).



This is a checklist of key items to look for in reading the results of a randomized controlled trial of a social program, project, or strategy ("intervention"), to assess whether it produced valid evidence on the intervention's effectiveness. This checklist closely tracks guidance from both the U.S. Office of Management and Budget (OMB) and the U.S. Education Department's Institute of Education Sciences (IES)[1]; however, the views expressed herein do not necessarily reflect the views of OMB or IES. This checklist limits itself to key items, and does not try to address all contingencies that may affect the validity of a study's results. It is meant to aid – not substitute for – good judgment, which may be needed, for example, to gauge whether a deviation from one or more checklist items is serious enough to undermine the study's findings. A brief appendix addresses how many well-conducted randomized controlled trials are needed to produce strong evidence that an intervention is effective.

Checklist for overall study design

Random assignment was conducted at the appropriate level – either groups (e.g., classrooms, housing projects), or individuals (e.g., students, housing tenants), or both.

Random assignment of individuals is usually the most efficient and least expensive approach. However, it may be necessary to randomly assign groups – instead of, or in addition to, individuals – in order to evaluate (i) interventions that may have sizeable "spillover" effects on nonparticipants, and (ii) interventions that are delivered to whole groups such as classrooms, housing projects, or communities. (See reference [2] for additional detail.)

The study had an adequate sample size – one large enough to detect meaningful effects of the intervention.

Whether the sample is sufficiently large depends on specific features of the intervention, the sample population, and the study design, as discussed elsewhere.[3] Here are two items that can help you judge whether the study you're reading had an adequate sample size:

- If the study found that the intervention produced statistically significant effects (as discussed later in this checklist), then you can probably assume that the sample was large enough.
- If the study found that the intervention did not produce statistically significant effects, the study report should include an analysis showing that the sample was large enough to detect meaningful effects of the intervention. (Such an analysis is known as a "power" analysis.[4])

Reference [5] contains illustrative examples of sample sizes from well-conducted randomized controlled trials in various areas of social policy.


Checklist to ensure that the intervention and control groups remained equivalent during the study

The study report shows that the intervention and control groups were highly similar in key characteristics prior to the intervention (e.g., demographics, behavior).

If the study asked sample members to consent to study participation, they provided such consent before learning whether they were assigned to the intervention versus control group.

If they provided consent afterward, their knowledge of which group they are in could have affected their decision on whether to consent, thus undermining the equivalence of the two groups.

Few or no control group members participated in the intervention, or otherwise benefited from it (i.e., there was minimal "cross-over" or "contamination" of controls).

The study collected outcome data in the same way, and at the same time, from intervention and control group members.

The study obtained outcome data for a high proportion of the sample members originally randomized (i.e., the study had low sample "attrition").

As a general guideline, studies should obtain outcome data for at least 80 percent of the sample members originally randomized, including members assigned to the intervention group who did not participate in or complete the intervention. Furthermore, the follow-up rate should be approximately the same for the intervention and control groups. The study report should include an analysis showing that sample attrition (if any) did not undermine the equivalence of the intervention and control groups.

The study, in estimating the effects of the intervention, kept sample members in the original group to which they were randomly assigned. This applies even to:

- Intervention group members who failed to participate in or complete the intervention (retaining them in the intervention group is consistent with an "intention-to-treat" approach); and
- Control group members who may have participated in or benefited from the intervention (i.e., "cross-overs," or "contaminated" members of the control group).[6]

Checklist for the study’s outcome measures

The study used "valid" outcome measures – i.e., outcome measures that are highly correlated with the true outcomes that the intervention seeks to affect. For example:

Tests that the study used to measure outcomes (e.g., tests of academic achievement or psychological well-being) are ones whose ability to measure true outcomes is well-established.


If sample members were asked to self-report outcomes (e.g., criminal behavior), their reports were corroborated with independent and/or objective measures if possible (e.g., police records).

The outcome measures did not favor the intervention group over the control group, or vice-versa.

For instance, a study of a computerized program to teach mathematics to young students should not measure outcomes using a computerized test, since the intervention group will likely have greater facility with the computer than the control group.[7]

The study measured outcomes that are of policy or practical importance – not just intermediate outcomes that may or may not predict important outcomes.

As illustrative examples: (i) the study of a pregnancy prevention program should measure outcomes such as actual pregnancies, and not just participants’ attitudes toward sex; and (ii) the study of a remedial reading program should measure outcomes such as reading comprehension, and not just the ability to sound out words.

Where appropriate, the members of the study team who collected outcome data were "blinded" – i.e., kept unaware of who was in the intervention and control groups.

Blinding is important when the study measures outcomes using interviews, tests, or other instruments that are not fully structured, possibly allowing the person doing the measuring some room for subjective judgment. Blinding protects against the possibility that the measurer’s bias (e.g., as a proponent of the intervention) might influence his or her outcome measurements. Blinding would be important, for example, in a study that measures the incidence of hitting on the playground through playground observations, or a study that measures the word identification skills of first graders through individually-administered tests.

Preferably, the study measured whether the intervention’s effects lasted long enough to constitute meaningful improvement in participants’ lives (e.g., a year, hopefully longer).

This is important because initial intervention effects often diminish over time – for example, as changes in intervention group behavior wane, or as the control group “catches up” on their own.

Checklist for the study’s reporting of the intervention’s effects

If the study claims that the intervention has an effect on outcomes, it reports (i) the size of the effect, and whether the size is of policy or practical importance; and (ii) tests showing the effect is statistically significant (i.e., unlikely to be due to chance).

These tests for statistical significance should take into account key features of the study design, including:

- Whether individuals (e.g., students) or groups (e.g., classrooms) were randomly assigned;
- Whether the sample was sorted into groups prior to randomization (i.e., "stratified," "blocked," or "paired"); and
- Whether the study intends its estimates of the intervention's effect to apply only to the sites (e.g., housing projects) in the study, or to be generalizable to a larger population.


The study reports the intervention’s effects on all the outcomes that the study measured, not just those for which there is a positive effect.

This is so you can gauge whether any positive effects are the exception or the pattern. In addition, if the study found only a limited number of statistically-significant effects among many outcomes measured, it should report tests showing that such effects were unlikely to have occurred by chance.

Appendix: How many randomized controlled trials are needed to produce strong evidence of effectiveness?

To have strong confidence that an intervention would be effective if faithfully replicated, one generally would look for evidence including the following:

The intervention has been demonstrated effective, through well-conducted randomized controlled trials, in more than one site of implementation.

Such a demonstration might consist of two or more trials conducted in different implementation sites, or alternatively one large multi-site trial.

The trial(s) evaluated the intervention in the real-world community settings and conditions where it would normally be implemented (e.g., community drug abuse clinics, public schools, job training program sites).

This is as opposed to tightly-controlled conditions, such as specialized sites that researchers set up at a university for purposes of the study, or settings where the researchers themselves administer the intervention.

There is no strong countervailing evidence, such as well-conducted randomized controlled trials of the intervention showing an absence of effects.


References

[1] U.S. Office of Management and Budget (OMB), What Constitutes Strong Evidence of Program Effectiveness, http://www.whitehouse.gov/omb/part/2004_program_eval.pdf, 2004; U.S. Department of Education's Institute of Education Sciences, Identifying and Implementing Educational Practices Supported By Rigorous Evidence, http://www.ed.gov/rschstat/research/pubs/rigorousevid/index.html, December 2003; What Works Clearinghouse of the U.S. Education Department's Institute of Education Sciences, Key Items To Get Right When Conducting A Randomized Controlled Trial in Education, prepared by the Coalition for Evidence-Based Policy, http://ies.ed.gov/ncee/wwc/pdf/guide_RCT.pdf.

[2] Random assignment of groups rather than, or in addition to, individuals may be necessary in situations such as the following:

(a) The intervention may have sizeable “spillover” effects on individuals other than those who receive it.

For example, if there is good reason to believe that a drug-abuse prevention program for youth in a public housing project may produce sizeable reductions in drug use not only among program participants, but also among their peers in the same housing project (through peer influence), it is probably necessary to randomly assign whole housing projects to intervention and control groups to determine the program's effect. A study that only randomizes individual youth within a housing project to intervention versus control groups will underestimate the program's effect to the extent the program reduces drug use among both intervention and control-group youth in the project.

(b) The intervention is delivered to groups such as classrooms or schools (e.g., a classroom curriculum or schoolwide reform program), and the study seeks to distinguish the effect of the intervention from the effect of other group characteristics (e.g., quality of the classroom teacher).

For example, in a study of a new classroom curriculum, classrooms in the sample will usually differ in two ways: (i) whether they use the new curriculum or not, and (ii) who is teaching the class. Therefore, if the study (for example) randomly assigns individual students to two classrooms that use the curriculum versus two classrooms that don’t, the study will not be able to distinguish the effect of the curriculum from the effect of other classroom characteristics, such as the quality of the teacher. Such a study therefore probably needs to randomly assign whole classrooms and teachers (a sufficient sample of each) to intervention and control groups, to ensure that the two groups are equivalent not only in student characteristics but also in classroom and teacher characteristics. For similar reasons, a study of a schoolwide reform program will probably need to randomly assign whole schools to intervention and control groups, to ensure that the two groups are equivalent not only in student characteristics but also school characteristics (e.g., teacher quality, average class size).

[3] What Works Clearinghouse of the U.S. Education Department's Institute of Education Sciences, Key Items To Get Right When Conducting A Randomized Controlled Trial in Education, op. cit., no. 1.

[4] Resources that may be helpful in reviewing or conducting power analyses include: the William T. Grant Foundation's free consulting service in the design of group-randomized trials, at http://sitemaker.umich.edu/group-based/consultation_service; Steve Raudenbush et al., Optimal Design Software for Group Randomized Trials, at http://sitemaker.umich.edu/group-based/optimal_design_software; Peter Z. Schochet, Statistical Power for Random Assignment Evaluations of Education Programs (http://www.mathematica-mpr.com/publications/PDFs/statisticalpower.pdf), prepared for the U.S. Education Department's Institute of Education Sciences, June 22, 2005; and Howard S. Bloom, "Randomizing Groups to Evaluate Place-Based Programs," in Learning More from Social Experiments: Evolving Analytical Approaches, edited by Howard S. Bloom, New York: Russell Sage Foundation Publications, 2005, pp. 115-172.

[5] Here are illustrative examples of sample sizes from well-conducted randomized controlled trials in various areas of social policy: (i) 4,028 welfare applicants and recipients were randomized in a trial of Portland, Oregon's Job Opportunities and Basic Skills Training Program (a welfare-to-work program), to evaluate the program's effects on employment and earnings – see http://evidencebasedprograms.org/wordpress/?page_id=140; (ii) between 400 and 800 women were randomized in each of three trials of the Nurse-Family Partnership (a nurse home-visitation program for low-income, pregnant women), to evaluate the program's effects on a range of maternal and child outcomes, such as child abuse and neglect, criminal arrests, and welfare dependency – see http://evidencebasedprograms.org/wordpress/?page_id=57;


(iii) 206 9th graders were randomized in a trial of Check and Connect (a school dropout prevention program for at-risk students), to evaluate the program's effects on dropping out of school – see http://evidencebasedprograms.org/wordpress/?page_id=92; and (iv) 56 schools containing nearly 6,000 students were randomized in a trial of LifeSkills Training (a substance-abuse prevention program), to evaluate the program's effects on students' use of drugs, alcohol, and tobacco – see http://evidencebasedprograms.org/wordpress/?page_id=128.

[6] The study, after obtaining estimates of the intervention's effect with sample members kept in their original groups, can sometimes use a "no-show" adjustment to estimate the effect on intervention group members who actually participated in the intervention (as opposed to no-shows). A variation on this technique can sometimes be used to adjust for "cross-overs." See Larry L. Orr, Social Experimentation: Evaluating Public Programs With Experimental Methods, Sage Publications, Inc., 1999, pp. 62 and 210; and Howard S. Bloom, "Accounting for No-Shows in Experimental Evaluation Designs," Evaluation Review, vol. 8, April 1984, pp. 225-246.

[7] Similarly, a study of a crime prevention program that involves close police supervision of program participants should not use arrest rates as a measure of criminal outcomes, because the supervision itself may lead to more arrests for the intervention group.


Evaluating Social Programs Course: Evaluation Glossary

(Sources: 3ie and The World Bank)

Attribution: The extent to which the observed change in the outcome is the result of the intervention, having allowed for all other factors which may also affect the outcome(s) of interest.

Attrition: Either the dropout of subjects from the sample during the intervention, or failure to collect data from a subject in subsequent rounds of data collection. Either form of attrition can result in biased impact estimates.

Baseline: Pre-intervention, ex ante. The situation prior to an intervention, against which progress can be assessed or comparisons made. Baseline data are collected before a program or policy is implemented to assess the "before" state.

Bias: The extent to which the estimate of impact differs from the true value as a result of problems in the evaluation or sample design.

Cluster: A group of subjects that are similar in one way or another. For example, in a sample of school children, children who attend the same school would belong to the same cluster, because they share the same school facilities and teachers and live in the same neighborhood.

Cluster sample: A sample obtained by drawing a random sample of clusters, after which either all subjects in the selected clusters constitute the sample or a number of subjects is randomly drawn within each selected cluster.

Comparison group: A group of individuals whose characteristics are similar to those of the treatment group (or participants) but who do not receive the intervention. Comparison groups are used to approximate the counterfactual. In a randomized evaluation, where the evaluator can ensure that no confounding factors affect the comparison group, it is called a control group.

Confidence level: The level of certainty that the true value of impact (or any other statistical estimate) will fall within a specified range.


Confounding factors: Other variables or determinants that affect the outcome of interest.

Contamination: When members of the control group are affected by either the intervention (see "spillover effects") or another intervention that also affects the outcome of interest. Contamination is a common problem, as there are multiple development interventions in most communities.

Cost-effectiveness: An analysis of the cost of achieving a one-unit change in the outcome. The advantage over cost-benefit analysis is that the (often controversial) valuation of the outcome is avoided. It can be used to compare the relative efficiency of programs in achieving the outcome of interest.

Counterfactual: An estimate of what the outcome would have been for a program participant in the absence of the program. By definition, the counterfactual cannot be observed; therefore it must be estimated using comparison groups.

Dependent variable: A variable believed to be predicted or caused by one or more other variables (independent variables). The term is commonly used in regression analysis.

Difference-in-differences (also known as double difference or D-in-D): The difference between the change in the outcome in the treatment group and the equivalent change in the comparison group. This method allows us to take into account any differences between the treatment and comparison groups that are constant over time. The two differences are thus before versus after, and treatment versus comparison.

Evaluation: Evaluations are periodic, objective assessments of a planned, ongoing, or completed project, program, or policy. Evaluations are used to answer specific questions, often related to design, implementation, and/or results.

Ex ante evaluation design: An impact evaluation design prepared before the intervention takes place. Ex ante designs are stronger than ex post designs because of the possibility of considering random assignment and collecting baseline data from both treatment and control groups. Also called prospective evaluation.

Ex post evaluation design: An impact evaluation design prepared once the intervention has started, and possibly been completed. Unless the program was randomly assigned, a quasi-experimental design has to be used.


External validity: The extent to which the causal impact discovered in the impact evaluation can be generalized to another time, place, or group of people. External validity increases when the evaluation sample is representative of the universe of eligible subjects.

Follow-up survey: Also known as a "post-intervention" or "ex post" survey. A survey administered after the program has started, once the beneficiaries have benefited from it for some time. An evaluation can include several follow-up surveys.

Hawthorne effect: The "Hawthorne effect" occurs when the mere fact that subjects are being observed makes them behave differently.

Hypothesis: A specific statement regarding the relationship between two variables. In an impact evaluation, the hypothesis typically relates to the expected impact of the intervention on the outcome.

Impact: The effect of the intervention on the outcome for the beneficiary population.

Impact evaluation: An impact evaluation tries to make a causal link between a program or intervention and a set of outcomes: it tries to answer the question of whether the program is responsible for changes in the outcomes of interest. Contrast with "process evaluation".

Independent variable: A variable believed to cause changes in the dependent variable, usually applied in regression analysis.

Indicator: A variable that measures a phenomenon of interest to the evaluator. The phenomenon can be an input, an output, an outcome, or a characteristic.

Inputs: The financial, human, and material resources used for the development intervention.

Intention to treat (ITT) estimate: The average treatment effect calculated across the whole treatment group, regardless of whether members actually participated in the intervention. Compare to "treatment on the treated (TOT) estimate".

Intra-cluster correlation: Correlation (or similarity) in outcomes or characteristics between subjects that belong to the same cluster. For example, children who attend the same school would typically be similar, or correlated, in terms of their area of residence or socio-economic background.


Logical model: Describes how a program should work, presenting the causal chain from inputs, through activities and outputs, to outcomes. While logical models present a theory about the expected program outcome, they do not demonstrate whether the program caused the observed outcome. A theory-based approach examines the assumptions underlying the links in the logical model.

John Henry effect: The "John Henry effect" occurs when comparison subjects work harder to compensate for not being offered the treatment. When treated units are compared to these "harder-working" comparison units, the estimate of the program's impact will be biased: we will estimate a smaller impact than the true impact we would find if the comparison units did not make the additional effort.

Minimum desired effect: The minimum change in outcomes that would justify the investment made in an intervention, accounting not only for the cost of the program and the type of benefits it provides, but also for the opportunity cost of not having invested the funds in an alternative intervention. The minimum desired effect is an input for power calculations: evaluation samples need to be large enough to detect at least the minimum desired effect with sufficient power.

Null hypothesis: A hypothesis that might be falsified on the basis of observed data. The null hypothesis typically proposes a general or default position. In evaluation, the default position is usually that there is no difference between the treatment and control groups, or in other words, that the intervention has no impact on outcomes.

Outcome: A variable that measures the impact of the intervention. Can be intermediate or final, depending on what it measures and when.

Output: The products and services produced (supplied) directly by an intervention. Outputs may also include changes that result from the intervention and are relevant to the achievement of outcomes.

Power calculation: A calculation of the sample size required for the impact evaluation, which depends on the minimum effect size we want to be able to detect (see "minimum desired effect") and the required level of confidence.

Pre-post comparison: Also known as a before-and-after comparison. A pre-post comparison attempts to establish the impact of a program by tracking changes in outcomes for program beneficiaries over time, using measures from both before and after the program or policy is implemented.


Process evaluation: An evaluation that tries to establish the level of quality or success of the processes of a program: for example, the adequacy of administrative processes, the acceptability of program benefits, the clarity of the information campaign, the internal dynamics of implementing organizations, their policy instruments, their service delivery mechanisms, their management practices, and the linkages among these. Contrast with "impact evaluation".

Quasi-experimental design: Impact evaluation designs that create a control group using statistical procedures. The intention is to ensure that the characteristics of the treatment and control groups are identical in all respects other than the intervention, as would be the case in an experimental design.

Random assignment: An intervention design in which members of the eligible population are assigned at random to either the treatment group (receives the intervention) or the control group (does not receive the intervention). That is, whether someone is in the treatment or control group is solely a matter of chance, and not a function of any of their characteristics (either observed or unobserved).

Random sample: The best way to avoid a biased or unrepresentative sample is to select a random sample. A random sample is a probability sample in which each individual in the population being sampled has an equal chance (probability) of being selected.

Randomized evaluation (RE) (also known as randomized controlled trial, or RCT): An impact evaluation design in which random assignment is used to allocate the intervention among members of the eligible population. Since there should be no correlation between participant characteristics and treatment assignment, differences in outcomes between the treatment and control groups can be fully attributed to the intervention; i.e., there is no selection bias. However, REs may be subject to several other types of bias and so need to follow strict protocols. Also called "experimental design".

Regression analysis: A statistical method which determines the association between the dependent variable and one or more independent variables.

Selection bias: A possible bias introduced into a study by the selection of different types of people into treatment and comparison groups. As a result, differences in outcomes may potentially be explained by pre-existing differences between the groups rather than by the treatment itself.


Significance level: The significance level is usually denoted by the Greek symbol α (alpha). Popular levels of significance are 5% (0.05), 1% (0.01), and 0.1% (0.001). If a test of significance gives a p-value lower than the α level, the null hypothesis is rejected; such results are informally referred to as "statistically significant". The lower the significance level, the stronger the evidence required. Choosing a level of significance is an arbitrary task, but for many applications a level of 5% is chosen, for no better reason than that it is conventional.

Spillover effects: When the intervention has an impact (either positive or negative) on units not in the treatment group. Ignoring spillover effects results in a biased impact estimate. If there are spillover effects, then the group of beneficiaries is larger than the group of participants.

Stratified sample: Obtained by dividing the population of interest (the sampling frame) into groups (for example, male and female), and then drawing a random sample within each group. A stratified sample is a probabilistic sample: every unit in each group (or stratum) has the same probability of being drawn.

Treatment group: The group of people, firms, facilities, or other subjects who receive the intervention. Also called participants.

Treatment on the treated (TOT) estimate: The impact (average treatment effect) only on those who actually received the intervention. Compare to "intention to treat (ITT) estimate".

Unobservables: Characteristics which cannot be observed or measured. The presence of unobservables can cause selection bias in quasi-experimental designs.