
Using Standardized Patients to Measure Quality

Early Learning

Jishnu Das (World Bank)

Based on joint work with Aakash Mohpal, Karthik Muralidharan, Alaka Holla, Abhijit Banerjee, Reshma Hussam, Madhukar Pai, Ada Kwan, Sofi Bergkvist, Ben Daniels, Amy Dolinger, Guadalupe Bedoya, Jorge Coarasa and Ana Goicoechea.

Measuring quality appropriately is hard

• Four problems

  • Accounting for case and patient mix

  • Accounting for Hawthorne effects

  • Allowing for case-specific inference (did the doctor do the right thing given what the patient has?)

  • Allowing for distinguishing under- and over-treatment

• Options

  • Medical vignettes: Measure clinical knowledge, but knowledge need not reflect practice (generically it can't if behavior matters)

  • Observations of doctor-patient interactions: Measure practice but suffer from all four problems (1 and 2 may be smaller than thought; 3 and 4 are very serious problems)

  • Standardized patients: People recruited from local communities and extensively trained to depict the same case to multiple providers. Interaction details are obtained through a structured questionnaire within 1 hour of the interaction.

Standardized patients

• Extensively used in the U.S. in medical schools (and part of the examination system)

• Large number of studies looking at various aspects of validation (more so for medical education)

• Limited studies on viability in the field with large sample sizes (in high-income countries as well)

• Here: Document early learning from SP studies, focusing on methods and emerging substantive issues

Studies

• Cross-section study from a population-based sample of providers in rural India (Madhya Pradesh) and urban India (Delhi) for three tracer conditions: unstable angina in a 45 y.o. male, asthma in a 25 y.o. female/male, dysentery for a 2 y.o. child who is sleeping at home

• Cross-section study among sample of public providers with dual practices in rural India (MP) for angina, asthma and dysentery: SPs sent to both public and private practices of the same provider

• Randomized controlled trial of extended training (4 hours a week for 9 months) for informal sector providers in Birbhum, West Bengal: SPs used to evaluate impact on angina, asthma and dysentery. NOTE: Evaluation completely firewalled from implementation, so that the training foundation did not know the cases that would be tested

• Cross-section pilot of 4 variants of Tuberculosis cases among 100 providers in Delhi

• Cross-section pilot of angina, asthma, dysentery and one TB case among 42 clinics in Nairobi, Kenya

Process and Timeline

• Case Development: 2-3 months. Uses experts, anthropologists and a panel that understands both the local context and the medical details of the case

• SP recruitment and further case development: Several exercises and interviews to arrive at the SPs who will be trained (typically 50% of people recruited make the cut): 1 week

• SP training and script development: 3-4 weeks, eventually decreasing the SP pool by another 50%

• Survey and data: 2 weeks to 3 months; data virtually immediate

• IRB: First pilot survey done with informed consent from providers. If required, larger survey done with waiver of consent after first proving public benefit and no harm to either participating providers or SPs
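
The attrition figures in the recruitment and training steps above imply that only about a quarter of initial recruits end up as fielded SPs. A minimal sketch of that arithmetic follows; the function name `recruits_needed` and the target of 15 SPs are hypothetical illustrations, and the pass rates are the "typical" 50% figures quoted above, not guarantees.

```python
import math

def recruits_needed(target_sps, pass_rates=(0.5, 0.5)):
    """Initial recruits required to end with `target_sps` trained SPs.

    pass_rates holds the share surviving each stage: roughly 50% make the
    cut at recruitment and roughly 50% remain after training (assumed
    typical values taken from the timeline above).
    """
    needed = float(target_sps)
    for rate in pass_rates:
        needed /= rate  # undo the attrition of each stage
    return math.ceil(needed)

# Hypothetical target: 15 fielded SPs
print(recruits_needed(15))
```

With both stages at 50%, the overall survival rate is 0.5 × 0.5 = 0.25, so the recruitment pool has to be about four times the number of SPs the survey needs.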

Validation

• SP detection: In cases with consent, go back and ask providers whether they have seen an SP (and if yes, what they presented with and their age).

  • Detection rates ranging from 0% (Kenya) to 3% (TB, with highly compressed schedule)

• Harm to SPs: In pilot, 3 cases arose across all 5 studies where an SP was exposed to an injection or a finger prick (with sterile needles in all cases): Protocols revised accordingly (For instance: don’t leave hands on table)

• Harm to providers: No self-reported harm (TB), time taken is 3-5 minutes

• Inter-rater agreement: This is an inappropriate validation technique, since it assumes that there is little variation in performance for the same provider. Better to use SP fixed effects and test for the joint significance of the SP fixed effects (typically small)
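
One way to check whether individual SPs matter for measured quality is to test the SP effects jointly. The sketch below is a simplified illustration, not the method used in the studies: it computes a one-way F statistic for SP group means on a checklist score and gets a p-value by permuting SP labels, using only the standard library. It omits the case and provider controls a real specification would include. The function names and the simulated data are hypothetical.

```python
import random
from collections import defaultdict

def f_statistic(scores, sp_ids):
    """One-way F statistic: between-SP variation relative to within-SP variation."""
    groups = defaultdict(list)
    for score, sp in zip(scores, sp_ids):
        groups[sp].append(score)
    n, k = len(scores), len(groups)
    grand_mean = sum(scores) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups.values())
    ss_within = sum((x - sum(g) / len(g)) ** 2
                    for g in groups.values() for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def permutation_p_value(scores, sp_ids, n_perm=2000, seed=0):
    """Share of label shuffles whose F statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = f_statistic(scores, sp_ids)
    shuffled = list(sp_ids)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if f_statistic(scores, shuffled) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)

# Simulated (made-up) data: 4 SPs, 80 interactions, no true SP effect
rng = random.Random(1)
sp_ids = [f"SP{i % 4}" for i in range(80)]
scores = [rng.gauss(0.4, 0.15) for _ in sp_ids]
f_obs, p = permutation_p_value(scores, sp_ids)
print(f"F = {f_obs:.2f}, permutation p-value = {p:.3f}")
```

A large p-value here would be consistent with SP effects being jointly small. In a regression with controls, the analogous step is an F-test on the full set of SP dummies.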

Validation

• Do providers treat patients "as if" they had the real case they were presenting with, or does the patient lead them to suspect that nothing is wrong?

• If the latter, more history taking and examinations would increasingly lead the provider to not do anything

• Typically, we find the opposite: The less the provider did, the more likely they were to get it wrong. Those who did the most were also more likely to think that the SP had the case that they were presenting with

• Extreme examples include providers trying to immediately take the angina case to hospital

Some results

• Quality of care is very low in India across all cases

Process measures in MI

An average interaction: 3.89 minutes, 2.89 questions, 1.46 exams, 2.34 medicines, Rs. 31

Das and others, 2012

Diagnosis of MI

[Figure: Diagnosis correct, if given? (% of interactions). Correct, partial, wrong: 0.09, 0.43, 0.48]

[Figure: Partially correct and wrong diagnoses, if given (% of interactions). Series: partially correct, wrong. Categories: allergy, breathing problem, diarrhea, heart problem, other, BP, gas, heart problem, muscle problem, weather, other]

[Figure: MI - Diagnosis given at all? (% of interactions). Diagnosis given, diagnosis not given: 0.44, 0.56]

Das and others, 2012

Representative sample

                                  Public    Private    Difference
                                  (1)       (2)        (2)-(1)

Panel A: Unstable Angina
Correct treatment                 0.04      0.08       0.05
Correct treatment (alternate)     0.55      0.48       -0.07
Aspirin                           0.03      0.04       0.02
Anti-platelet agents              0.03      0.01       -0.02
Referred                          0.30      0.24       -0.05
ECG                               0.24      0.23       -0.02
ECG & Referred                    0.11      0.12       0.01
Antibiotic                        0.14      0.17       0.03
Unnecessary treatment             0.66      0.74       0.09
Number of observations            37        180

Panel B: Asthma
Correct treatment                 0.47      0.61       0.14*
Bronchodilators                   0.33      0.36       0.03
Theophylline                      0.13      0.22       0.09*
Oral Corticosteroids              0.15      0.31       0.16**
Antibiotic                        0.38      0.40       0.02

Panel C: Dysentery
ORS                               0.08      0.13       0.05
Asked to see child                0.33      0.14       -0.20***
Antibiotic                        0.44      0.61       0.18**
Unnecessary treatment             0.11      0.41       0.30***
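
The Difference column in the representative-sample table is the private share minus the public share. The source does not say which test produced the significance stars, so the sketch below is only an assumption: a pooled two-proportion z-test, applied to the Panel A antibiotic row (0.14 public, 0.17 private) with the reported observation counts (37 and 180). It treats interactions as independent and ignores any clustering or controls the original analysis may have used. The function name is hypothetical.

```python
import math

def two_proportion_z_test(p1, n1, p2, n2):
    """Difference p2 - p1 with a pooled two-sided z-test (normal approximation)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # Two-sided p-value from the standard normal CDF, written via erf
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p2 - p1, z, p_value

# Panel A, antibiotic row: public = 0.14 (n = 37), private = 0.17 (n = 180)
diff, z, p_value = two_proportion_z_test(0.14, 37, 0.17, 180)
print(f"difference = {diff:.2f}, z = {z:.2f}, p = {p_value:.2f}")
```

Because the table reports rounded shares, differences recomputed this way can deviate from the printed Difference column in the last digit for some rows.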


Example from TB

• Patient with 3 week history of cough and fever. Took medicines from chemist but is not feeling better

Using SPs with dual samples

• Adherence to checklist and treatment patterns respond to incentives

Dual practice sample

                                  Public    Private    Difference
                                  (3)       (4)        (4)-(3)

Panel A: Unstable Angina
Correct treatment                 0.03      0.30       0.27***
Correct treatment (alternate)     0.42      0.61       0.20*
Aspirin                           0.03      0.23       0.20***
Referred                          0.22      0.32       0.10
ECG                               0.28      0.35       0.08
ECG & Referred                    0.08      0.16       0.08
Antibiotic                        0.28      0.23       -0.05

Panel B: Asthma
Bronchodilators                   0.52      0.59       0.07
Theophylline                      0.31      0.31       0.00
Oral Corticosteroids              0.16      0.24       0.09
Antibiotic                        0.59      0.46       -0.14*
Unnecessary treatment             0.91      0.83       -0.08*

Panel C: Dysentery
Correct treatment                 0.33      0.22       -0.11*
ORS                               0.33      0.22       -0.11*
Asked to see child                0.27      0.42       0.15**
Antibiotic                        0.75      0.61       -0.13*
Unnecessary treatment             0.43      0.33       -0.10

Extensive training improves care

• Birbhum Evaluation, West Bengal

• Training improved checklist adherence for all cases

  • (And the untrained were better than the PHCs to begin with)

• Large improvements in correct treatments from a very low base

• No significant change in incorrect treatments at initially very high levels

  • Incorrect antibiotic use

But

• These differences are dwarfed by differences across countries in terms of correct treatment, with differences in unnecessary treatment all over the place

Early Learnings

• SPs are a viable tool for understanding a broad system of care in population-based samples

  • They also provide case-specific information (correct diagnosis rates, for instance) that cannot be obtained easily by any other means, particularly for rare cases

• Metrics behave the way we expect them to with respect to incentives and training

• Key issue is the difference between checklist and treatment, which we don't fully understand

  • Rules of thumb versus discretionary treatments

• While marginal costs are low, fixed costs are high in terms of money and time (economies of scale are immense)
