Chief Science Officer, Epidemico, Inc.
Presented at ISOP 2015Prague
October 29, 2015
01
Slides available @epidemico
02
Use of Social Mediafor Data Miningin PharmacovigilanceNabarun Dasgupta, MPH, PhD
DisclosuresEpidemico is commercializing aspects of this research.
Epidemico is wholly-owned subsidiary of Booz Allen Hamilton.
FDA has been funding research technology for 3 years.
Office of the Chief Scientist (OCS)Center for Tobacco Products (CTP)
Office of International Programs (OIP)
US FDA 01
A public-private partnership entering Year 2. The WEB-RADR project is supported by theInnovative Medicines Initiative Joint Undertaking (IMI JU) under grant agreement n° 115632,resources of which are composed of financial contributions from the European Union's SeventhFramework Programme (FP7/2007-2013) and EFPIA companies’ in kind contribution.
EUROPE - IMI
02
GSK has been a development partner to extendtechnology and obtain GxP software validation,
over the last 2 years.
INDUSTRY
03
BackgroundWhy bother looking at social media?
01
Infographic elementsSlide sub title
The Drug I’m TakingThe Drug I’m Responsible for Monitoring
MethodsHow to tame social media
02
Data Processing Overview
Automated processes to identifyrelevant posts and clean data
02. Filter
Automated statistics and visual display serve hypothesis generation. HOWEVER, determining causation requiresinformation from other sources.
04. Statistics… and Determining Causation
OPTIONAL: remove false positives,disentangle attributions,
expand dictionary, correct geography
03. Curate
Collect social media data from APIs, third-partyauthorized resellers, and automated scraping
01. Acquire
Automated ProcessesFor making sense of social media data
Use post geodata orprofile location
(~50% of Proto-AEs)
GeotagInternet vernacular to Med-
DRA preferred terms
Translate Vernacular
Reduce multiplicate copiesRemove identifiers
Consolidate
Automated NLPBayesian classifierMachine learning
Identify Events
Proto-AEPosts with Resemblance to Adverse Events
DizzinessMedDRA:
10013573 ICD-10: R53
AstheniaMedDRA:
10003549 ICD-10: R42
For more examples see demo.medwatcher.org
System Statistics
Drugs, medical devices,vaccines, biologics, tobaccoproducts, supply chain
Regulated products2,500
Updated daily by humancurators. Mandarin, French,Spanish & Dutch next.
Vernacular English phrases10,000
Public posts from Facebook, Twitter
& patient forums, back to 2012
Posts collected63,000,000
Certified coders train machinelearning classifiers
& build dictionary
Manually classified380,000
As of October 23, 2015
System PerformanceHow well does automated processing work?
Automated tools identify 9 out of 10 adverse events.
Sensitivity or Recall
7 out of 10 posts contain AEinformation. Can increase to100% with manual curation.May vary by product subset.
Positive Predictive Valueor Precision
68%
88%
Across all products, all time, all data sources
SC
RA
NTO
N B
, DE
VR
IES
J, B
ER
INAT
O S
, BO
RTZ
C. H
AR
VAR
D B
US
INES
S R
EV
IEW
, JU
NE
201
3
Indicator score calculationHow well does automated processing work?
Albuterol has me feeling all sorts of dizzy and weak this morning
albute
rol
albute
rol h
as
albute
rol h
as m
e
has m
e0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
Proportion out of all keeps
Keep
Proportion of posts to-ken
never shows up in
None
Proportion of all trash
Trash
albuterolalbuterol hasalbuterol has mehas mehas me feelingfeelingfeeling allfeeling all sortssortssorts of dizzydizzydizzy and weakweakweak thisweak this morningmorningmorning
1. Tokenization 2. Score calculationFor each token “w”
b(w) = (number of positives containing w)
(total number of positives)
g(w) = (number of negatives containing w)
(total number of negatives)
One Applied Examplealbuterol or salbutamolEstablished oral and inhaled drug for asthma & COPDHighly genericized marketA priori selected demonstration drug at demo.medwatcher.org
03
MethodsAlbuterol/salbutamol has been a demo drug from project inception due to client interest
English language onlyUse 0.7 indicator score thresholdMultiplicate copies consolidated
02. Proto-AEs
Tree plots, top events, PRRComparison to deduplicated UK spontaneous reportsDrug Analysis Prints form MHRA
04. Statistics
Need to check for false positives03. No curation
1 year ending October 23, 2015albuterol, salbutamol, brand names
01. Public Facebook and Twitter
Spam HappensExclude n=297 Twitter posts mentioning Ventolin, having a link, 19-Mar-2015 to 02-Apr-2015
Proto-AEs Mentions n= 13,055
Albuterol/Salbutamol: Proto-AE vs. Mentions
Mentions represent the patient voice.They exclude most spam, news, coupons,
and corporate announcements.
3% of Mentions were Proto-AEs
2.5%
Proto-AEs Mentions n=50,229
Proto-AEs Mentions n = 63,284
3.1%3.2%
Unique posts from Twitter & Facebookn = 1,946 Proto-AEs
21 out of 26 System Organ Classes81% MedDRA SOCs
Posts can have multiple symptoms andproducts mentioned within same posts
2,987 Drug-Event Pairs
1,613 from Twitter & 333 from Facebook83% From Twitter
Higher level group terms82 HLGTs
Higher level terms125 HLTs
MedDRA v18199 Preferred Terms
24-Oct 2014 to 23-Oct 2015, Facebook & Twitter n=63,284 Mentions
Albuterol/Salbutamol by System Organ Class24-Oct 2014 to 23-Oct 2015, Facebook & Twitter n=1,946 Proto-AEs
Rectangle size = Proto-AE count | Color intensity = PRR
Albuterol/Salbutamol by Preferred Term24-Oct 2014 to 23-Oct 2015, Facebook & Twitter n=1,946 Proto-AEsRectangle size = Proto-AE count | Color intensity = PRR | Black = small sample PRR restriction
Palpitations
Drug ineffective
PainMalaise
Incorrect doseadministered
Hypersensitivity
Insomnia
Altered state ofconsciousnessFeeling jittery
Tremor
Restlessness
Cough
Lung disorder
Anxiety Crying Somnolence
Wheezing
Dizziness
Nonspecificreaction
FatigueConditionaggravated
Feelingabnormal
Heart rateincreased
Drugdoseomission
Bronchitis
Headache
Nausea VomitingPsychomotorhyperactivity
Burningsensation
Affect lability Dependence
Pyrexia
Pneumonia
IrritabilityMuscletwitching
Albuterol/Salbutamol – Top 5 by Count24-Oct 2014 to 23-Oct 2015, Facebook & Twitter n=1,946 Proto-AEs
Public Facebook and Twitter combined.
0 50 100 150 200 250 300 350 400 450 500
123
136
175
204
433Tremor
Drug ineffective
Altered state of consciousness
Malaise
Cough
Represents the things that most concern patients
Albuterol/Salbutamol – Top 5 by PRR24-Oct 2014 to 23-Oct 2015, Facebook & Twitter n=1,946 Proto-AEs
Unadjusted PRRs calculated against all other products in social listening database (n=2,551 medical products), 5 or more events
0 5 10 15 20 25 30 35 40
23.6
26.6
31.1
32.5
35.7Tremor
Palpitations
Feeling jittery
Tachycardia
Wheezing
Corresponds well with product label
Correlation: Spontaneous Report vs. Social MediaNon-parametric density (isometric plot) at HLGT level
Respiratory disorders NEC(cough, lung disorder, oropharyngeal pain, rhinorrhoea, chest pain, dyspnoea)
Movement disorders(tremor)
Therapeutic/non-therapeutic effects(drug ineffective)
General system disorders(malaise, feeling jittery, pain, condition aggravated,
fatigue, feeling abnormal, asthenia)
Sleep disorders(insomnia, somnolence)
Epidermal & dermal conditions(pruritus, skin discomfort, rash, angioedema, urticaria)
Spontaneous07-Apr-1969 to 26-Aug-2015
46.5 years2,199 unique non-fatal re-
portsn=3,851 reactions at HLGT
156 unique HLGT
Social24-Oct-2014 to 23-Oct-20151 year1,946 unique non-fatal re-portsN=2,419 reactions at HLGT57 unique HLGT
ConcordanceMore common in spontaneous dataMore common in social media data
Social Media in PracticeProjects already underway for safety surveillance
04
Social Data in Concert with SpontaneousRectifying a false negative in an sponsor’s safety database
“Did this product cause x, y, z? I’ve had 3 administrations of this product andever since then have felt a, b, c.”
Twitter Post Identified
Assessment of causality and impor-tance led the sponsor to initiate asubmission with possible label change
Submit to Regulators
Social media monitoring usingEpidemico platform
Routine Portfolio Surveillance
A couple of cases had been previouslyreported to the sponsor but
had not crossed the signal threshold.
Review Company’s Safety Database
Monitoring Discussions about Clinical TrialsProtecting validity and blinding
DiscontinuationI'm in a [Sponsor] double blind #### trial andcould receive the placebo up until week 16 at
which point I'm guaranteed to receive the drug. What I don't understand is that I can't be told if I
was on the placebo for the first 16 weeks.I'm not seeing any positive benefit from whatever
I've been taking for 8 weeks so far, and I wouldreally like to know at week 16 if it's the drug
because then I can bail from the study at thattime and move on to something else…
Unblinding[My brother] started on Thursday. He told me yesterday that he tried to chew a capsule(I'm not sure why, I think he wanted to see if it had a flavor) & it was hard as a rock. I’massuming maybe it was eneric [sic] coated orsomething. He took the no flavor to meanhe's on the placebo, but I actually wonder ifthey would bother enteric coating a placebo.Probably doesn't matter & I know we're notsupposed to know anyway, just wondered ifanyone else's was enteric coated.
Beyond PharmacovigilanceHospital sentiment, counterfeits
Collaboration with WHO-UMC & MedDRA MSSO
Identify Products, Translate Vernacular to MedDRA
01
Reduce noise, provide indicator scores
Remove Spam, Identify AE , Benefits, Sentiment02
Demographics, time to onset, dose, route,medical conditions, lab results, challenge
Entity Extraction for Causal Determination03
Names, hospital name, medical record #, phone, email
Anonymize by Removing PII
04
Coming 1Q2016: Free Public APIIn appreciation for the public investment in this tool (FDA, IMI & Web-RADR)
For social media, clinical trial patient diaries, case safety reports, scientific literature
LimitationsWhat does “bias” mean in this setting?
05
How well can patients assess causality? How much should we care?Is a solely quantitative approach justified?
Causality & Quantitative Emphasis
01
Are we willing to disenfranchise the patient voice on privacy grounds?
Patient Privacy vs. Muting the Patient Voice02
How do you build tools for an ever-shifting data environment?Stability of Data Availability & Emerging Arenas
03
Can we all agree that nobody wants to see each Tweet as a separatecase report in a safety database? So, why is everyone so afraid?
Regulation Unclear
04
ChallengesA few among many
Are social media data biased?YES! All data are biased. But, can we understand the bias and make use of the information?
What we know, how we know itEach source of product safety information provides different perspectives on patient experience
Clinical Trials Sponsor AE Practice Data Social Media0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%PRO Seriousness Completeness
Social media can elucidate whatpatients experience most frequently.
1. Patient-Reported Outcomes
Traditional PV focus on most serious of eventsmay be less central in social media.
2. Seriousness
Social media posts may by less complete.Conversely, contain less sensitive identifying
information than electronic health records.
3. Completeness
Three Dimensions of Bias
Hypothetical data for illustrative purposes only.