research using behavioral big data (bbd)

Post on 22-Jan-2018


Research Using Behavioral Big Data: Methodological, Practical, Ethical & Moral Issues

IEEE BigData Congress, Taipei Satellite Session, May 2016

Galit Shmueli 徐茉莉, Institute of Service Science

What is Behavioral Big Data (BBD)?
• Special type of Big Data
• Behavioral: people’s actions, interactions, self-reported opinions, thoughts, feelings
• Human and social aspects: intentions, deception, emotion, reciprocation, herding, …
• When aware of data collection -> modified behavior (legal risks, embarrassment, unwanted solicitation)

BBD vs. Medical Big Data

Medical Big Data:
• Physical measurements
• Data collection timing often set by medical system
• Clinical trials: awareness & vested interest

BBD:
• People’s daily actions, interactions, self-reported feelings, opinions, thoughts (UGC)
• Data generation timing often chosen by user
• Experiments: users often unaware; goal not always in user’s interest

BBD on Citizens and Customers

Governments: security, law enforcement, traffic (cameras, sensors)

Financial institutions: fraud, loans (IT systems, cameras)

Telecoms: fraud, infrastructure, marketing (IT systems, mobile)

Retail chains: marketing, operations, merchandising (POS systems, video, social, mobile)

Insurance: set Usage-Based Insurance premiums (telematics info)

Data Collection Technologies:
• Cameras
• Sensors
• IT systems (POS, calls, …)
• GPS
• Things
• Internet
• Mobile
• Social

BBD on Employees

Service providers: quality control, employee performance

Electronic Performance Monitoring (EPM) systems, web surfing, e-mails sent and received, telephone use, video, location (taxis)

BBD on Citizens, Customers, Employees: Internet!
• BBD now also available to small companies & organizations
• Online platforms have BBD (e-commerce, gaming, search, social networks…)
• Voluntarily entered by users: personal details, photos, comments, messages, search terms, bids in auctions, likes, payment information, connections with “friends”
• Passive footprints: duration on the website, pages browsed, sequence, referring website, Internet browser, operating system, location, IP address
• BBD now available to individuals: Quantified Self (and apps)

Two important points
• More and more human and social activities are moving online
• Most companies that have BBD were not created for the purpose of generating BBD

Why should data science researchers care about BBD?

Technology is advancing in two directions:
• Fully automated (algorithmic) solutions
• Micro-level recording of human and social behavior

Because they are (and should be) involved in designing both!

Research using BBD

Duncan Watts, Microsoft Research:
1. Social science problems are almost always more difficult than they seem
2. The data required to address many problems of interest to social scientists remain difficult to assemble
3. Thorough exploration of complex social problems often requires the complementary application of multiple research traditions

Academic Research Qs using BBD

Research about human and social behavior:
• examine new phenomena
• re-examine old phenomena with better data

Research Communities

Researchers with social science + technical backgrounds:
• Information Systems
• Marketing
• Computational Social Science

Examples of BBD Studies in Top Journals

Consumption in Virtual Worlds (Hinz et al., Info Sys Research, 2015)
“The idea that conspicuous consumption can increase social status, as a form of social capital, has been broadly accepted, yet researchers have not been able to test this effect empirically.”
• age-old sociology question with new BBD data
• BBD from two virtual world websites (gaming with social network)

Social Influence in Social News Websites (Muchnik et al., Science, 2014)
“The recent availability of population-scale data sets on rating behavior and social communication enable novel investigations of social influence...”
• Existing question in new context: study social influence bias in rating behavior
• BBD from a social news aggregation website where users contribute news articles, discuss them, and rate comments

Online Consumer Ratings of Physicians (Gao et al., Information Systems Research, 2014)
“examine how closely the online ratings reflect patients’ opinion about physician quality at large.”
• new phenomenon of online ratings of service providers
• BBD on direct measures of both the offline population’s perception of physician quality, and consumer-generated online reviews

Impact of Teachers on Student Outcomes using Education and Tax BBD (Chetty et al., Amer Econ Review, 2014)
• long-term impact of teachers on student outcomes has been of interest in economic policy: old question with new BBD data
• combined BBD from administrative school district records and federal income tax records

Emotional Contagion in Social Networks (Kramer et al., Proc of the National Academy of Sciences, 2014)
• Can emotional states be transferred to others via emotional contagion?
• BBD from large-scale experiment run by FB, manipulating users’ exposure level to emotional expressions in their Facebook News Feed

Anonymous Browsing in Online Dating Websites (Bapna et al., Management Science, 2016)
“Online dating platforms offer new capabilities, such as extensive search, big data–based recommendations, and varying levels of anonymity, whose parallels do not exist in the physical world...”
• new questions about human behavior due to new technologies
• BBD from large-scale experiment, partnered with large dating website in N America, testing the effect of anonymous browsing on matching

ONE WAY MIRRORS IN ONLINE DATING
A Randomized Field Experiment

Ravi Bapna, University of Minnesota
Jui Ramaprasad, McGill University
Galit Shmueli, National Tsing Hua University
Akhmed Umyarov, University of Minnesota

Online Dating

46% of the single population in the US uses online dating to find a partner (Gelles 2011)

[Figure: Online dating website. Non-anonymous browsing (default): a profile visit appears under the visited user's "Recent visitor". Anonymous browsing: a profile visit leaves "Recent visitor: NONE".]

Research Question (in simple words)

How does anonymous browsing affect user behavior?

… and matching?

Formal Research Question

What is the relative causal effect of social inhibitions on search preferences vs. social inhibitions of contact initiation in dating markets?

Given known gender asymmetries, how does this effect differ for men vs. women?

Randomized Field Experiment on Large Online Dating Website

50,000 users receive gift of anonymous browsing
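A minimal sketch of how a random allocation like the one above could be drawn. The slide gives only the number of treated users (50,000); the pool size of 100,000, the integer user ids, and the function name `assign_treatment` are assumptions made for illustration, not details of the actual experiment.

```python
import random

def assign_treatment(user_ids, n_treated, seed=0):
    """Randomly pick n_treated users to receive the anonymity gift.

    Returns a dict mapping user id -> True (treated) / False (control).
    A fixed seed makes the allocation reproducible.
    """
    rng = random.Random(seed)
    treated = set(rng.sample(user_ids, n_treated))
    return {uid: (uid in treated) for uid in user_ids}

# Hypothetical pool of users; the pool size is an assumption.
pool = list(range(100_000))
assignment = assign_treatment(pool, 50_000, seed=1)
n_treated = sum(assignment.values())
```

Because allocation is random rather than chosen by the users, differences in later behavior between the two groups can be attributed to anonymity instead of self-selection.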

Results

Users treated with anonymity:
• become disinhibited: view more profiles, view more same-sex and interracial mates
• get fewer matches: lose ability to leave a weak signal; especially harmful for women!

Role of anonymity and importance of WEAK SIGNAL in online platforms

BBD-based Research Questions

In Academia
Causal Qs are most popular
• Methodological challenges:
  • scalability of stat models
  • small-sample stat inference
  • self-selection
Predictive Qs (quite rare)
• How to use results beyond application-specific? 6 uses of predictive analytics for theory building [Shmueli & Koppius, 2011]

In Industry
Purpose: evaluate or improve products, service, operations, etc.
• Netflix Prize: movie recommender system
• Yahoo!, LinkedIn: personalized news content to increase user engagement/clicks [Agarwal & Chen 2016]
• Target: pregnancy prediction
• Amazon: pricing, etc.
• Government: campaign targeting

Getting BBD for Research

1. Open Data, Publicly Available Data
• Data.gov
• Twitter
• Kaggle (UCIMR)
• API and web scraping

2. Partnering with a Company
• Both parties interested in research question
• Data purchase
• Personal connections
• Partnership between school and organization (CMU Living Analytics Research Lab)

3. Crowdsourcing
AMT: replacing student subjects
• Experiment subjects
• Survey respondents
• Cleaning and tagging data

“easy access to a large, stable, and diverse subject pool, the low cost of doing experiments, and faster iteration between developing theory and executing experiments” [Mason and Suri, 2012]

Using BBD for Research: Human Subjects

Institutional Review Board (IRB), “ethics committee”
University-level committee designated to approve, monitor, and review biomedical and behavioral research involving humans.
• performs benefit-risk analysis for proposed study
• guidelines: Beneficence, Justice, and Respect for persons
• HHS proposes new IRB exemption criteria for publicly available data (or even buying it)
• Council for Big Data, Ethics & Society’s letter: “these criteria for exclusion focus on the status of the dataset… not the content of the dataset nor what will be done with the dataset, which are more accurate criteria for determining the risk profile of the proposed research”

Ethics: Beyond IRB

Facebook experiment [Kramer et al. 2014]:
• No IRB
“[The work] was consistent with Facebook’s Data Use Policy, to which all users agree prior to creating an account on Facebook, constituting informed consent for this research.”
• PNAS editorial Expression of Concern
• Varied response from public, academia, press, ethicists, corporates [Adar 2015]

Big Behavioral Experiments

Big Behavioral Experiments: Issues
Compare to industrial environment

1. Fast-Changing Environment
• Multiple A/B tests run every day (overlaps)
• Users keep evolving

2. Multiplicity and Scaling
Computational advertising and content recommendation
3M’s [Agarwal & Chen 2016]:
• Multi-response (clicks, shares, likes, …)
• Multi-context (mobile, email, ...)
• Multiple objective (engagement, revenue, ...)

3. Spill-Over Effects
• Treatment can affect control group (social networks)
• Challenge of randomization on a social network (Fienberg, 2015): even if treatment and control members are sufficiently far away to avoid spill-over effects, analysis still must account for dependence among units.
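One commonly discussed way to limit spill-over is to randomize whole groups of connected users rather than individuals, so that friends land in the same arm. The slide does not name a specific design; the sketch below is an illustration of that idea under simplifying assumptions (clusters are plain connected components, and the user ids, edges, and function names are hypothetical). As the Fienberg point above notes, the analysis would still have to account for dependence among units within a cluster.

```python
import random
from collections import defaultdict, deque

def connected_components(nodes, edges):
    """Group users into clusters of mutually reachable friends (BFS)."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue, comp = deque([start]), []
        while queue:
            u = queue.popleft()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        components.append(comp)
    return components

def assign_by_cluster(nodes, edges, seed=0):
    """Flip one coin per cluster so connected users share an arm."""
    rng = random.Random(seed)
    assignment = {}
    for comp in connected_components(nodes, edges):
        arm = rng.choice(["treatment", "control"])
        for user in comp:
            assignment[user] = arm
    return assignment

# Hypothetical toy network: a-b-c are connected, d-e are connected, f is isolated.
users = ["a", "b", "c", "d", "e", "f"]
friendships = [("a", "b"), ("b", "c"), ("d", "e")]
assignment = assign_by_cluster(users, friendships, seed=42)
```

On a real platform the network is usually one giant component, so a practical design would need a graph-partitioning step instead of plain connected components; that step is outside this sketch.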

Big Behavioral Experiments: Issues
Compare to industrial environment

4. Knowledge of Allocation and Gift Effect
• Like clinical trials: allocation knowledge can affect outcome
• Online users discover their allocation via online forums
• Blinding and placebo?
• “Gift” or preferential treatment can affect outcome
• Bapna et al. (2016) compared effect at end of manipulation time and right after, to determine gift effect

5. Ethical and Moral Issues
Ease of running a large scale experiment quickly and at low cost
• danger of harming many people quickly
• small scale pilot study?
AMT: Fair treatment & payment to workers

Observational BBD: Issues

Ethical and Moral Issues
• Privacy (Netflix)
• Data protection and reproducible research
• Conflict of interest company-vs-users (study conclusions lead to operational actions that trade off the company’s interest with user well-being)
• AMT: payment to workers

Methodological Issues

1. Self-selection Bias
Users choose treatment
• Scaling of PSM to big data?

2. Simpson’s Paradox
Causal direction reverses when data are disaggregated
• Does a dataset have a paradox?

3. Contamination by Experiments

4. Data Size & Dimension
Need very large + rich data to answer predictive Qs [Junque de Fortuny et al. 2014]
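To make the Simpson's paradox item concrete, here is a small sketch that checks whether the direction of an effect flips between the pooled data and every stratum. The counts are invented toy numbers chosen only to produce a reversal, and the function name `simpson_reversal` is hypothetical; this is one simple way to ask "does a dataset have a paradox?" for a single stratifying variable, not a general detection method.

```python
def rate(successes, total):
    """Success proportion for one group."""
    return successes / total

def simpson_reversal(strata):
    """Return True if all strata agree on the sign of (treated - control)
    but the pooled data show the opposite sign.

    strata: {name: {"treated": (successes, n), "control": (successes, n)}}
    """
    stratum_signs = set()
    pooled = {"treated": [0, 0], "control": [0, 0]}
    for groups in strata.values():
        diff = rate(*groups["treated"]) - rate(*groups["control"])
        stratum_signs.add(diff > 0)
        for arm in pooled:
            pooled[arm][0] += groups[arm][0]
            pooled[arm][1] += groups[arm][1]
    pooled_diff = rate(*pooled["treated"]) - rate(*pooled["control"])
    strata_agree = len(stratum_signs) == 1
    return strata_agree and ((pooled_diff > 0) not in stratum_signs)

# Toy counts (made up): treated users do better within each stratum,
# yet worse overall, because most treated users sit in the low-rate stratum.
toy = {
    "stratum A": {"treated": (90, 100), "control": (800, 1000)},
    "stratum B": {"treated": (300, 1000), "control": (20, 100)},
}
reversal = simpson_reversal(toy)  # pooled: 390/1100 treated vs 820/1100 control
```

The reversal arises because group sizes are unbalanced across strata, which is exactly what self-selection into treatment tends to produce in observational BBD.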

A Tree-Based Approach for Addressing Self-selection in Impact Studies with Big Data

Inbal Yahav, Bar Ilan University, Israel
Galit Shmueli, National Tsing Hua U, Taiwan
Deepa Mani, Indian School of Business, India

Self Selection: The Challenge
• Large impact studies of an intervention
• Individuals/firms choose which group to join

How to identify and adjust for self-selection?

Current Methods: Challenges with Big Data
1. Matching leads to severe data loss
2. Suffer from “data dredging”
3. Do not identify variables that drive the selection
4. Assume constant intervention effect
5. Sequential nature is computationally costly
6. Requires user to specify form of selection model

Our Tree-Based Approach: Use a data mining algorithm in a novel way
• Flexible non-parametric selection model
• Automated detection of unbalanced variables
• Easy to interpret, transparent, visual
• Applicable to binary, polytomous, continuous intervention
• Useful in Big Data context
• Identify heterogeneous effects

Example: Impact of training on financial gains

Experiment: USA govt program randomly assigned eligible candidates to training program
• Goal: increase future earnings
• Results (LaLonde, 1986):
  ✓ Groups statistically equal in terms of demographic & pre-train earnings
  ✓ Average Training Effect = $1794 (p<0.004)

Tree reveals… High-School Matters!

                       LaLonde’s naïve approach   Tree approach        Tree approach
                       (experiment)               HS dropout (n=348)   HS degree (n=97)
Not trained (n=260)    $4,554                     $4,495               $4,855
Trained (n=185)        $6,349                     $5,649               $8,047
Training effect        $1,794 (p=0.004)           $1,154 (p=0.063)     $3,192 (p=0.015)

Tree approach overall: $1,598 (p=0.017)

Tree split: High school degree (no / yes)
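The arithmetic behind the training-effect figures can be retraced from the group means on the slide. The sketch below recomputes each effect as trained minus not trained. The overall tree figure appears consistent with a sample-size-weighted average of the two subgroup effects; that weighting is an inference from the numbers, not a procedure stated on the slide, and the variable names are illustrative.

```python
# Mean earnings (USD) as reported on the slide.
naive = {"not_trained": 4554, "trained": 6349}
subgroups = {
    "HS dropout": {"n": 348, "not_trained": 4495, "trained": 5649},
    "HS degree": {"n": 97, "not_trained": 4855, "trained": 8047},
}

# Naive effect: simple difference in means over the whole sample.
# This gives 1795, one dollar off the slide's $1794, presumably because
# the slide's means are rounded.
naive_effect = naive["trained"] - naive["not_trained"]

# Effect within each tree leaf (high-school degree: no / yes).
effects = {
    name: g["trained"] - g["not_trained"] for name, g in subgroups.items()
}

# Assumed combination rule: weight each leaf's effect by its sample size.
total_n = sum(g["n"] for g in subgroups.values())
overall = sum(effects[name] * g["n"] for name, g in subgroups.items()) / total_n
```

Under that assumption the weighted average rounds to the slide's overall figure of $1,598, while the leaf-level effects show how much larger the gain is for participants with a high-school degree.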

Large Scale Surveys

Online surveys: cheap, easy, fast
• Large pool of available “workers”
• Supplement experimental/observational studies

Data Quality
• duplicate responses
• insincere responses
require different approaches at large scale

Paradata: data on how the survey was accessed/answered
• timestamps of opening invitation email, when survey was accessed
• Duration for answering each question
• [Survey of Adult Skills by the OECD]
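As an illustration of how paradata might support data-quality checks, the sketch below derives per-question durations from timestamps and flags a respondent who answers implausibly fast. The event format, the 2-second threshold, and the function names are assumptions for this example; they are not taken from the OECD survey or from any particular survey platform.

```python
from datetime import datetime
from statistics import median

def question_durations(events):
    """Seconds spent on each question.

    events: ordered list of (label, ISO-8601 timestamp); the first entry
    marks when the survey was opened, later entries mark when each
    question was answered.
    """
    times = [datetime.fromisoformat(ts) for _, ts in events]
    return {
        events[i][0]: (times[i] - times[i - 1]).total_seconds()
        for i in range(1, len(events))
    }

def is_speeder(durations, min_median_seconds=2.0):
    """Flag a possibly insincere response: median answer time below a threshold."""
    return median(durations.values()) < min_median_seconds

# Hypothetical paradata for one respondent.
events = [
    ("start", "2016-05-01T10:00:00"),
    ("q1", "2016-05-01T10:00:01"),
    ("q2", "2016-05-01T10:00:02"),
    ("q3", "2016-05-01T10:00:12"),
]
durations = question_durations(events)
flagged = is_speeder(durations)
```

Using the median rather than the mean keeps one slow question from masking a respondent who rushed through the rest; any threshold would need calibration against the actual questionnaire.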

Large Scale Surveys

Methodological Issue: Generalization
Sampling and non-sampling errors

“The central issue is whether conditional effects in the sample (the study population) may be transported to desired target populations. Success depends on compatibility of causal structures in study and target populations, and will require subject matter considerations in each concrete case.” [Keiding and Louis, 2016]

• Statistical generalization & scientific generalization [Kenett & Shmueli, 2014]

Methodical Analysis Cycle of BBD
Inspired by Lifecycle view [Kenett, 2014], and stat thinking building blocks [Hoerl et al. 2014]

1. understand company context and BBD
2. set up the research question
3. determine experimental design
4. obtain IRB approval (if needed)
5. possibly: pilot experiment
6. communicate design with company; assure feasibility
7. company deploys experiment and collects the data
8. company shares the data with the researchers
9. researchers analyze the data and arrive at conclusions
10. researchers share the insights and conclusions with company and research community
11. company operationalizes the insights to improve their business
12. company deploys impact study

Summary

Technical Challenges
• Data access
• Analysis scalability
• Quick-changing environment

BBD = lots of behavioral data
• Who has it?
• How is it analyzed?
• For what purpose?

Methodological Challenges
• Selection bias
• Generalization
• “Control” group contaminated by other experiments
• Spill-over effects
• Lack of methodical lifecycle

Legal, Ethical, Moral Challenges
• Privacy violation (Netflix; networks)
• Risks to human subjects
• Company vs. Researcher Objectives
• Gains of company at expense of individuals, communities, societies, & science

Why should data science researchers care about BBD?

Technology is advancing in two directions:
• Fully automated (algorithmic) solutions
• Micro-level recording of human and social behavior

Contemplation
• Threats to privacy, society, governance, human thought, and human interaction
• Generalization for company ≠ scientific generalization
• Personalization efforts -> de-personalization
• “Law of unintended consequences”: labeling “student at risk”, “potential criminal”
• Speed of research, excitement of new abilities, no time for contemplation

The Circle, run out of a sprawling California campus, links users’ personal emails, social media, banking, and purchasing with their universal operating system, resulting in one online identity and a new age of civility and transparency.

The Way Forward

Convergence of Social Sciences and Engineering

Things eventually collect BBD (intentionally or not)

Analytics, Humanity

Responsibility
