How to evaluate and improve the quality of mHealth behaviour change tools

TRANSCRIPT

  1. Yorkshire Centre for Health Informatics. How to evaluate & improve the quality of mHealth behaviour change tools? Jeremy Wyatt DM FRCP, ACMI Fellow; Leadership Chair in eHealth Research, University of Leeds; Clinical Adviser on New Technologies, Royal College of Physicians, London. [email protected]
  2. Agenda
     - Why mHealth behaviour change (BC) tools?
     - Is there a quality problem?
     - What does quality mean, in general and for mHealth BC tools?
     - What methods might improve quality?
     - How to evaluate the quality of these tools?
     - Some example studies
     - Conclusions
  3. Why mHealth for behaviour change?
     1. Face-to-face contacts do not scale.
     2. Smartphone hardware is used by more than 75% of adults: cheap, convenient and fashionable; inbuilt sensors and wearables allow easy measurements; multiple channels (SMS, MMS, voice, video, apps).
     3. mHealth software enables unobtrusive alerts to record data or take action, incorporation of behaviour change techniques (BCTs are present in 96% of adherence apps, median 2 BCTs) and tailoring, which makes BC more effective (corrected d = 0.16 in a systematic review of 21 studies of BC websites: Lustria, J Health Comm 2013).
  4. Why digital channels? Mean public sector cost per completed encounter across 120 councils (source: Cabinet Office Digital Efficiency Report, 2013):
     - Face to face: £8.60
     - Letter: £5.00
     - Telephone: £2.83
     - Digital: £0.15
  5. A broken market? [Scatter plot: price (US$) of 47 smoking cessation apps versus evidence score, where a high score means the app adheres to US Preventive Services Task Force guidelines; data from Abroms et al 2013. Fitted line y = 3.7 - 0.1x, R = 0.016, i.e. price is essentially unrelated to evidence.]
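To show how such a price-versus-evidence check could be reproduced, here is a minimal sketch in Python; the data points and variable names are hypothetical illustrations, not the Abroms et al dataset.

    # Sketch: is app price related to an evidence score? (illustrative data,
    # not the Abroms et al 2013 smoking cessation dataset)
    import numpy as np

    evidence_score = np.array([2, 5, 8, 11, 15, 20, 28])       # hypothetical scores
    price_usd = np.array([0.0, 4.0, 1.0, 3.0, 0.0, 5.0, 2.0])  # hypothetical prices

    slope, intercept = np.polyfit(evidence_score, price_usd, 1)  # least-squares line
    r = np.corrcoef(evidence_score, price_usd)[0, 1]             # correlation

    print(f"price = {intercept:.1f} + {slope:.2f} * score, R^2 = {r**2:.3f}")
    # A slope near zero and a tiny R^2 would echo the slide's point: price
    # tells you almost nothing about whether an app follows the evidence.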
  6. Privacy and mHealth apps
     - Permissions requested: use accounts, modify USB storage, read phone ID, find files, full net access, view connections.
     - Our study of 80 apps: an average of 4 clear privacy breaches for health apps, but only 1 for medical apps.
     - "We know that - we read the Terms & Conditions!" (this one was only 1,200 words, but many are much longer).
     - With Hannah Panayiotou & Anam Noel, Leeds medical students.
     [Image: First Folio of As You Like It; public domain photo by CowardlyLion; Folio Society edition of 1996]
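The permission audit behind this slide could in principle be automated. Below is a minimal sketch, assuming each app's AndroidManifest.xml has first been decoded to plain XML (for example with apktool); the watch-list of "sensitive" permissions is an illustrative reading of the slide, not the study's actual protocol.

    # Sketch: list privacy-sensitive permissions requested in a decoded
    # AndroidManifest.xml (assumes apktool or similar produced plain XML).
    import xml.etree.ElementTree as ET

    ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

    # Illustrative watch-list echoing the permissions named on the slide.
    SENSITIVE = {
        "android.permission.GET_ACCOUNTS",            # use accounts
        "android.permission.READ_PHONE_STATE",        # read phone ID
        "android.permission.INTERNET",                # full net access
        "android.permission.ACCESS_NETWORK_STATE",    # view connections
        "android.permission.WRITE_EXTERNAL_STORAGE",  # modify USB storage / find files
    }

    def sensitive_permissions(manifest_path):
        """Return the sensitive permissions an app's manifest requests."""
        root = ET.parse(manifest_path).getroot()
        requested = {el.get(ANDROID_NS + "name") for el in root.iter("uses-permission")}
        return sorted(requested & SENSITIVE)

    # e.g. print(sensitive_permissions("decoded_app/AndroidManifest.xml"))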
  7. What is quality? "The totality of features and characteristics of a product or service that bear on its ability to satisfy stated or implied needs" (ISO 9000:2005, Quality management systems: fundamentals and vocabulary). I.e. quality is about fitness for purpose: a good quality product complies with client requirements. [That 30-page ISO document costs £96.50!]
  8. Possible processes to improve the quality of mHealth tools (method, with its advantages, disadvantages and examples):
     - Wisdom of the crowd. Advantages: simple user ranking. Disadvantages: hard for users to assess quality; click-factory bias. Examples: current app stores, MyHealthApps.
     - Users apply quality criteria. Advantages: explicit. Disadvantages: requires widespread dissemination; can everyone apply them? Example: RCP checklist.
     - Classic peer-reviewed article. Advantages: rigorous (?). Disadvantages: slow, resource intensive, doesn't fit the app model. Example: 47 PubMed articles.
     - Physician peer review. Advantages: timely, dynamic. Disadvantages: not as rigorous; scalable? Examples: iMedicalApps, MedicalAppJournal.
     - Developer self-certification. Advantages: dynamic. Disadvantages: requires developers to understand & comply; the checklist must fit apps. Examples: HON Code?, RCP checklist.
     - Developer support. Advantages: resource light. Disadvantages: technical knowledge needed; multitude of developers. Example: BSI PAS 277.
     - CE marking / external regulation. Advantages: credible. Disadvantages: slow, expensive; apps don't fit the national model. Examples: NHS App Store, FDA, MHRA.
  9. BSI publicly accessible standard (PAS) 277 on health & medical apps
     - A PAS is not a full BSI standard, but a supportive framework for developers.
     - The steering group includes DHACA, BUPA, NHS AHSNs, app developers, notified bodies, HSCIC, the RCP, the NIHR mental health technology centre, a patient representative and a medical device manufacturer.
     - Now published on the BSI web site with free access.
  10. Regulation of medical apps by the FDA and FTC
     - If classified as a medical device by the FDA, a product must demonstrate efficacy, but only around 100 apps have so far been classified as medical devices, and the FDA has decided to exercise "enforcement discretion" over most medical apps. So the FDA has not actually banned any apps yet.
     - However, the Federal Trade Commission has acted against some apps with misleading claims, e.g. Acne Cure (no evidence for the claimed benefit of the iPhone screen backlight).
     - "Many health apps are based on flimsy science at best, and often do not work." (Sharpe, New England Center for Investigative Reporting, in the Washington Post, November 12th 2012)
  11. We need to think differently (old think → new think):
     - Paternalism ("we know & determine what is best for users") → self-determination (users decide what is best for them).
     - Regulation will eliminate harmful apps after release → prevent bad apps by helping app developers understand safety & quality.
     - The NHS must control apps, applying rules and safety checks → self-regulation by the developer community, with consumer choice informed by truth in labelling.
     - App developers are in control → Aristotle's civil society* is in control.
     - Quality is best achieved by laws and regulations → quality is best achieved by consensus and culture change.
     - The aim of apps is innovation (sometimes above other considerations) → app innovation must balance benefits and risks.
     - An apps market driven by viral campaigns and unfounded claims of benefit → an apps market driven by fitness for purpose (ISO) & evidence of benefit.
     * The elements that make up a democratic society, such as freedom of speech, an independent judiciary, and collaboration for common wellbeing.
  12. [Chart: user ratings (app display rank) versus adherence to evidence, from a study of 47 smoking cessation apps (Abroms et al, 2013)]
  13. What went wrong with user rankings and reviews? From the Wall Street Journal, Nov. 24, 2013: "Inside a Twitter Robot Factory: Fake Activity, Often Bought for Publicity Purposes, Influences Trending Topics", by Jeff Elder: "One day earlier this month, Jim Vidmar bought 1,000 fake Twitter accounts for $58 from an online vendor in Pakistan. Mr. Vidmar programs accounts like these to 'follow' other Twitter accounts, and to rebroadcast tweets..." http://on.wsj.com/18A2hr9
  14. Promoting quality in the marketplace
     - Quality is defined by ISO as fitness for purpose. However, app users and purposes vary, so a simple good / bad quality mark is insufficient.
     - We need a checklist of optional criteria to empower users, clinicians, commissioners and app developers: select the criteria you need according to the complexity of the app and the contextual risk (Lewis & Wyatt, JMIR 2014); see the sketch after this slide.
     - Labelling apps will make quality explicit, adding a new Darwinian selection pressure to the apps marketplace.
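As a toy illustration of "select the criteria you need", the sketch below tags each checklist item with a risk tier and filters the list by the app's contextual risk; the criteria and tiers are hypothetical, not the published Lewis & Wyatt checklist.

    # Sketch: filter a quality checklist by the app's contextual risk tier.
    # Criteria and tiers are hypothetical, not the published checklist.
    CRITERIA = [
        ("States sponsor and developer credentials", "low"),
        ("Cites evidence sources for its content", "low"),
        ("Usability tested with target users", "medium"),
        ("Calculation accuracy verified against a gold standard", "high"),
        ("Clinical safety case documented", "high"),
    ]
    TIER_ORDER = {"low": 0, "medium": 1, "high": 2}

    def checklist_for(app_risk):
        """Return every criterion at or below the app's contextual risk tier."""
        return [text for text, tier in CRITERIA
                if TIER_ORDER[tier] <= TIER_ORDER[app_risk]]

    print(checklist_for("medium"))  # low- and medium-tier criteria apply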
  15. [Figure: risk framework for mHealth apps, showing how risk depends on the context in which the app is used. Lewis T, Wyatt JC, JMIR 2014]
  16. Our draft quality criteria for apps, based on Donabedian 1966:
     - Structure = the app development team, the evidence base, use of an appropriate behaviour change model, etc.
     - Processes = app functions: usability, accuracy, etc.
     - Outcomes = app impacts on user knowledge & self-efficacy, user behaviours and resource usage.
     (Wyatt JC, Curtis K, Brown K, Michie S. Submitted to the Lancet)
  17. Structure: what should a high quality app include?
     - Appropriate sponsor & developer skills.
     - Appropriate components: an appropriate underlying health promotion theory; a sensible choice of behaviour change techniques; a good user interface; accurately programmed algorithms; adequate security; accurate knowledge from a reliable source (NICE?).
     - Study method: independent expert review.
  18. Process: does the app work well? Suggested measures:
     - App download rates [a surrogate for user recognition]
     - App usage rates immediately after download & later
     - Time taken to enter data and receive a report
     - Acceptability and usability
     - Accuracy of calculations and data
     - Appropriateness of any advice given
     Study method: lab tests and surveys.
  19. Case study of CVD risk apps: methods
     - Assembled search terms: heart attack, cardiovascular disease, etc.
     - Searched for & downloaded all iPhone / iPad apps, free & paid, that the public might use to assess personal risk.
     - Assembled 15 scenarios varying in risk from 1% to 98%.
     - Assessed the risk figure, format and advice given by each app.
     - Definition of error: app above 20% & gold standard below 20%, or vice versa (old NICE guidance; 10% under the new guidance); a sketch of this definition follows.
     - With Hannah Cullumbine & Sophie Moriarty, Leeds medical students.
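A minimal sketch of that error definition, using hypothetical scenario data: an app misclassifies a scenario when it and the gold standard (QRisk2 in the study) fall on opposite sides of the treatment threshold.

    # Sketch: misclassification against a gold standard at the 20% threshold.
    # Scenario values are hypothetical; the study used QRisk2 as gold standard.
    THRESHOLD = 0.20  # old NICE guidance; 0.10 under the newer guidance

    def misclassified(app_risk, gold_risk, threshold=THRESHOLD):
        """True when app and gold standard disagree about crossing the threshold."""
        return (app_risk >= threshold) != (gold_risk >= threshold)

    scenarios = [  # (app estimate, gold standard risk), hypothetical values
        (0.25, 0.18), (0.05, 0.04), (0.15, 0.22), (0.90, 0.85), (0.19, 0.21),
    ]
    errors = sum(misclassified(app, gold) for app, gold in scenarios)
    print(f"misclassification rate: {errors / len(scenarios):.0%}")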
  20. Overall results
     - Located 21 apps; only 19 (7 paid) gave figures.
     - All 19 communicated risk using percentages (cf. the advice of Gigerenzer, BMJ 2004).
     - One app said "see your GP" every time; none of the rest gave advice.
     - Some apps refused to accept key data, e.g. age > 74 or diabetes.
     [Screenshot: Heart Health App]
  21. Misclassification rates at the 20% threshold
     - Rates varied from 7% to 33%.
     - 5 (26%) of 19 apps misclassified 25% or more of the scenarios and 3 (16%) misclassified 20-24%, so 8 (42%) misclassified at least a fifth of the scenarios.
     - Median error rate: free apps 13%, paid apps 27%. The error rate for free apps was significantly lower than for paid apps (Mann-Whitney U = 13.5, p = 0.026); a test sketch follows.
     [Chart: misclassification rate per app at the 20% threshold; paid apps in dark blue]
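The free-versus-paid comparison uses a rank-based test, which suits small samples of error rates; this sketch shows the shape of such a comparison with made-up rates, not the study data.

    # Sketch: compare error rates of free vs paid apps with a Mann-Whitney U
    # test (made-up rates for illustration, not the study data).
    from scipy.stats import mannwhitneyu

    free_rates = [0.07, 0.10, 0.13, 0.13, 0.15, 0.18, 0.20]
    paid_rates = [0.20, 0.24, 0.27, 0.29, 0.33]

    u_stat, p_value = mannwhitneyu(free_rates, paid_rates, alternative="two-sided")
    print(f"U = {u_stat}, p = {p_value:.3f}")
    # The slide's data gave U = 13.5, p = 0.026, with free apps
    # misclassifying fewer scenarios than paid ones.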
  22. Assessing app accuracy
     - Only applies to apps that give advice or calculate a risk, drug dose, etc.
     - Need a gold standard for the correct advice or risk [QRisk2 in our case].
     - Need a representative case series or plausible simulated cases; ideally, users should enter the case data or their own data.
     - How accurate is accurate enough? Accurate enough to get used? Accurate enough to encourage the user to take action?
  23. Intervention modelling experiments (IMEs)
     - Aim: to check an intervention before an expensive large scale study (MRC framework: Campbell, BMJ 2007).
     - What to measure: acceptability and usability; accuracy of data input by users; accuracy of output; whether users correctly interpret the output; the stated impact of the output on decisions, self-efficacy and actions; users' emotional responses to the output; user impressions & suggested improvements.
  24. Example IME: how to make prescribing alerts more acceptable to doctors?
     - Background: interruptive alerts annoy doctors.
     - Randomised IME in 24 junior doctors, each viewing 30 prescribing scenarios with alerts presented in two different ways: the same alert text shown either as a modal dialogue box (interruptive) or on the ePrescribing interface (non-interruptive).
     - Funded by Connecting for Health; carried out by an academic F2 doctor. Published as Scott G et al, JAMIA 2011.
  25. [Screenshot: interruptive alert shown in a modal dialogue box]
  26. [Screenshot: non-interruptive alert showing the same text on the ePrescribing interface]
  27. Outcomes: the bottom line. What is the impact of a health promotion app on:
     - User knowledge and attitudes
     - Self-efficacy
     - Short term behaviour change
     - Long term behaviour change
     - Individual health-related outcomes
     - Population health
     Study method: randomised controlled trials (e.g. of the My Meal Mate app: Carter et al, JMIR 2013), sometimes measuring surrogate outcomes.
  28. Online RCT of Fogg's persuasive technology theory and NHS organ donation register sign-up (Nind, Sniehotta et al 2010). Persuasive features tested:
     1. URL includes https and dundee.ac.uk
     2. University logo
     3. No advertising
     4. References
     5. Address & contact details
     6. Privacy statement
     7. Articles all dated
     8. Site certified (W3C / Health On the Net)
  29. Why bother with impact evaluation: can't we just predict the results? No: the real world of people and organisations is too messy and complex to predict whether a technology will work. Examples:
     - Diagnostic decision support (Wyatt, MedInfo 89)
     - Integrated medicines management for a children's hospital (Koppel, JAMA 2005)
     - MSN Messenger for nurse triage (Eminovic, JTT 2006)
  30. A proposed evaluation cascade for mHealth apps (area: topics; methods):
     - Source: purpose, sponsor, user, cost; method: inspection.
     - Safety: data protection, usability; methods: inspection, HCI lab / user tests.
     - Content: based on sound evidence, proven behaviour change methods; method: inspection.
     - Accuracy: calculations, advice; method: scenarios with a gold standard.
     - Potential impact: ease of use in the field, understanding of output; method: intervention modelling experiments.
     - Impact: knowledge, attitudes, self-efficacy; health behaviours and outcomes; methods: within-subject experiments, field trials.
  31. Conclusions
     1. The quality of mHealth tools varies too much.
     2. User & professional reviews, developer self-certification and regulation are not enough.
     3. To help reduce "apptimism" and strengthen other strategies, we need to agree quality criteria, evaluate apps against them, and label apps with the results.
     4. We have the evaluation methods (e.g. rating quality of evidence, usability / accuracy studies, RCTs).
     5. This will support patients, health professionals and app developers in maximising the benefits of mHealth.