
Copyright 2016 e-Discovery Team LLC

ALL RIGHTS RESERVED

e-Discovery Team

TREC 2016 Total Recall Track

NOTEBOOK

October 25, 2016; Revised December 19, 2016

A collaborative effort of Ralph Losey, e-DiscoveryTeam.com, e-Discovery Team LLC, and Kroll Ontrack, Inc., eDiscovery.com

TABLE OF CONTENTS

e-Discovery Team Members
Research Questions Considered at TREC 2016 Recall Track
    Background to Questions
    Four Research Questions
Overview of Team Participation in 2016 TREC Recall Track
Summary of Team's Work
Short Answers to Research Questions
    Research Question 1 (Primary Question)
    Research Question 2
    Research Question 3
    Research Question 4
Further Discussion of Research Question 1

e-Discovery Team Members

The Team is made up of five legal search experts (Ralph Losey, Jim Sullivan, Tony Reichenberger, Levi Kuehn and Jani Grantz) and one "robot," Mr. EDR (the software they used). The team members are not scientists or in academia. Most are lawyers who spend their working hours looking for evidence in large, chaotic data sets, such as email. They typically assist other attorneys in lawsuits and legal investigations. Their work includes the identification, review, analysis, classification, production and admission of Electronically Stored Information (ESI) as evidence in courts in the United States and elsewhere.

The Team leader is Ralph C. Losey, J.D., a full-time practicing attorney, principal and National e-Discovery Counsel of Jackson Lewis P.C., a U.S. law firm with over 800 attorneys and fifty-five offices. He has over 36 years of experience doing legal document reviews. Losey is also a blogger at e-DiscoveryTeam.com, where he has written over two million words on e-discovery. He has also written six books published by the American Bar Association and West Thomson. Over the past five years Losey has participated in multiple public and private experiments, some competitive, to test and prove various predictive coding methods. Losey has also written over sixty articles on the subject of legal search and predictive coding.

Jim Sullivan, J.D., Tony Reichenberger, J.D., and Jani Grantz, J.D., are attorney search and review specialists who work for Kroll Ontrack, Inc. (KO). Levi Kuehn is a non-attorney search and review specialist who works for KO. Kroll Ontrack is the primary e-discovery vendor used by Losey and his law firm. It is a global e-Discovery software, processing and project management company (eDiscovery.com).

The Team robot, Mr. EDR, is the Team's personalization of Kroll Ontrack's software, eDiscovery.com Review (EDR). Losey, Sullivan and Reichenberger participated in the 2015 TREC Total Recall Track. So too did a prior version of Mr. EDR, which is in a process of constant enhancement. The software version used in 2016 contained the latest beta-test version of the software, which has not yet been released to the public.

Research Questions Considered at TREC 2016 Recall Track

Background to questions considered

It is generally accepted in the legal search community that the use of predictive coding type search algorithms can improve the search and review of documents in legal proceedings.[1] The use of predictive coding has also been approved, and even encouraged, by various courts around the world, including numerous courts in the U.S.[2]

Although there is agreement on the use of predictive coding, there is controversy and disagreement as to the most effective methods of use.[3] There are proponents for a variety of different methods to find training documents for predictive coding. Some advocate for the use of chance selection alone, others for the use of top-ranked documents alone, and others for a combination of top-ranked and mid-level ranked documents where classification is unsure.[4] The e-Discovery Team uses a method that includes a combination of all three of these selection processes, and more.

Some attorneys and predictive coding software vendors advocate for the use of predictive coding search methods alone and, when they do so, forego other search methods such as keyword search, concept searches, similarity searches and linear review. The e-Discovery Team members reject that approach and instead advocate for a hybrid multimodal approach they call Predictive Coding 4.0.[5] This method uses an approach to active machine learning that the Team calls IST, standing for "Intelligently Spaced Training." Under IST, the attorney in charge decides exactly when to train. This is different from other systems where the machine retrains after each document is coded, or after a certain predetermined number, and the human trainer has no discretion as to timing.[6]

The e-Discovery Team approach includes all types of search methods (thus the term multimodal) to find relevant documents, with primary reliance placed on predictive coding. The Team also uses a variety of methods to find suitable training documents for predictive coding, including high-ranking documents and all other search methods. This is a fundamental difference from other methods that rely entirely on predictive coding to find relevant documents and rely entirely upon high-ranking documents for training. Grossman and Cormack have scientifically tested these high-ranking training methods and measured their effectiveness, but this does not mean that they endorse them as an exclusive tool, nor claim this to be their own preferred method.[7]

[1] Predictive Coding is defined by The Grossman-Cormack Glossary of Technology-Assisted Review, 2013 Fed. Cts. L. Rev. 7 (January 2013) (Grossman-Cormack Glossary) as: "An industry-specific term generally used to describe a Technology Assisted Review process involving the use of a Machine Learning Algorithm to distinguish Relevant from Non-Relevant Documents, based on Subject Matter Expert(s) Coding of a Training Set of Documents." A Technology Assisted Review process is defined as: "A process for Prioritizing or Coding a Collection of electronic Documents using a computerized system that harnesses human judgments of one or more Subject Matter Expert(s) on a smaller set of Documents and then extrapolates those judgments to the remaining Document Collection. … TAR processes generally incorporate Statistical Models and/or Sampling techniques to guide the process and to measure overall system effectiveness." Also see Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review, Richmond Journal of Law and Technology, Vol. XVII, Issue 3, Article 11 (2011).

[2] Da Silva Moore v. Publicis Groupe, 868 F. Supp. 2d 137 (S.D.N.Y. 2012), and numerous cases later citing to and following this landmark decision by Judge Andrew Peck, including another more recent opinion by Judge Peck, Rio Tinto PLC v. Vale S.A., 306 F.R.D. 125 (S.D.N.Y. 2015).

[3] Grossman & Cormack, Evaluation of Machine-Learning Protocols for Technology-Assisted Review in Electronic Discovery, SIGIR '14, July 6–11, 2014; Grossman & Cormack, Comments on "The Implications of Rule 26(g) on the Use of Technology-Assisted Review," 7 Federal Courts Law Review 286 (2014); Herbert Roitblat, series of five OrcaTec blog posts (1, 2, 3, 4, 5), May-August 2014; Herbert Roitblat, Daubert, Rule 26(g) and the eDiscovery Turkey, OrcaTec blog, August 11th, 2014; Hickman & Schieneman, The Implications of Rule 26(g) on the Use of Technology-Assisted Review, 7 FED. CTS. L. REV. 239 (2013); Losey, R., Predictive Coding 3.0, part one (e-Discovery Team, 10/11/15).

[4] Id.; Webber, Random vs active selection of training examples in e-discovery (Evaluating e-Discovery blog, 7/14/14).

[5] Losey, R., Predictive Coding 4.0 – Nine Key Points of Legal Document Review and an Updated Statement of Our Workflow (e-Discovery Team, 9/12/16) (Part One of an Eight Part Series explaining the recent advancements from our Predictive Coding method from version 3.0 to version 4.0).

Four Research Questions

1. Primary Question (repeat from 2015): What Recall, Precision and Effort levels will the e-Discovery Team attain in TREC test conditions over all thirty-four topics using the Team's Predictive Coding 4.0 hybrid multimodal search methods and Kroll Ontrack's software, eDiscovery.com Review (EDR)?

2. What is the impact of incorrect Subject Matter Expert ("SME") judgments by the TREC assessors on Recall and Precision? (Unplanned question that unfortunately arose out of the circumstances encountered.)

3. What is the most effective search method from the Team's multimodal tool-set for retrieval of relevant documents in the relatively simplistic search challenges presented by most, but not all, of the thirty-four topics? (Unplanned question that arose out of the circumstances encountered.)

4. What is the role of active machine learning in retrieval of relevant documents in the simplistic search challenges presented by most of the thirty-four topics? (Unplanned question, related to the third issue above, that also arose out of the circumstances encountered.)
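The background discussion above names three ways of choosing training documents for predictive coding: top-ranked documents, mid-ranked documents where classification is unsure, and chance selection. The following is only a minimal sketch of how those three selections could be combined, assuming each document already carries a relevance probability from some classifier. The function name, parameters and sample scores are hypothetical; this is not the Team's workflow and not how EDR works internally.

```python
import random


def select_training_docs(scores, k_top=2, k_uncertain=2, k_random=2, seed=0):
    """Pick training candidates three ways from a {doc_id: probability} map.

    All names and defaults here are illustrative assumptions.
    """
    # Top-ranked: the documents the classifier considers most likely relevant.
    ranked = sorted(scores, key=scores.get, reverse=True)
    top = ranked[:k_top]

    # Uncertain: remaining documents whose probability is closest to 0.5.
    remaining = [d for d in ranked if d not in top]
    uncertain = sorted(remaining, key=lambda d: abs(scores[d] - 0.5))[:k_uncertain]

    # Random: a chance sample from whatever is left.
    pool = [d for d in remaining if d not in uncertain]
    rng = random.Random(seed)
    chance = rng.sample(pool, min(k_random, len(pool)))

    return {"top_ranked": top, "uncertain": uncertain, "random": chance}


# Hypothetical scores for seven documents.
example_scores = {"a": 0.95, "b": 0.90, "c": 0.55, "d": 0.48,
                  "e": 0.20, "f": 0.10, "g": 0.05}
selection = select_training_docs(example_scores)
print(selection)
```

Under IST as described above, a human reviewer would decide when a batch like this is actually used for retraining; the sketch covers only the selection step.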

[6] The merits of the Team's approach to the timing of machine learning are detailed in Predictive Coding 4.0, Part Two.

[7] Grossman & Cormack, Evaluation of Machine-Learning Protocols for Technology-Assisted Review in Electronic Discovery, SIGIR '14, July 6–11, 2014.

Overview of Team Participation in 2016 TREC Recall Track

The e-Discovery Team participated in all thirty-four of the Total Recall Track Athome topics. It did not participate in the fully automated TREC Total Recall sandbox. All thirty-four topics searched a collection of public emails of former Florida Governor Jeb Bush. There were 290,099 emails in the Jeb Bush Email collection. In the version of the Jeb Bush emails used by TREC, almost all metadata of these emails has been removed. Moreover, the associated attachments and images were not present. Other collections of the Jeb Bush email exist from PST files that include more information, but the Team did not utilize this information and limited its efforts and attention to the official TREC collection.

This same Jeb Bush email collection was used by the Total Recall Track in 2015 for ten topics. In 2015 Losey searched all ten of these topics. None of these search topics was repeated in 2016. The thirty-four topics searched in 2016 and their names are shown below. The far right column gives the first name of the e-Discovery Team member who did the review for that topic. The thirteen topics shown in red in the original chart were considered mandatory by TREC and the remaining twenty-one were optional. The e-Discovery Team did all topics.

Topic  Name                              Reviewer
401    Summer Olympics                   Ralph
402    Space                             Tony
403    Bottled Water                     Ralph
404    Eminent Domain                    Tony
405    Newt Gingrich                     Ralph
406    Felon Disenfranchisement          Ralph
407    Faith Based Initiatives           Ralph
408    Invasive Species                  Tony
409    Climate Change                    Levi
410    Condominiums                      Tony
411    Stand Your Ground                 Ralph
412    2000 Recount                      Tony
413    James V Crosby                    Jim
414    Medicaid Reform                   Tony
415    George W Bush                     Jim
416    Marketing                         Jim
417    Movie Gallery                     Ralph
418    War Preparations                  Tony
419    Lost Foster Child Rilya Wilson    Levi
420    Billboards                        Jim
421    Traffic Cameras                   Jim
422    Non Resident Aliens               Tony
423    National Rifle Association        Tony
424    Gulf Drilling                     Levi
425    Civil Rights Act of 2003          Ralph
426    Jeffrey Goldhagen                 Ralph
427    Slot Machines                     Jim
428    New Stadiums and Arenas           Levi
429    Elian Gonzalez                    Jim
430    Restraints and Helmets            Jani
431    Agency Credit Ratings             Tony
432    Gay Adoption                      Jani
433    Abstinence                        Jim
434    Bacardi Trademark                 Ralph

Ralph Losey did ten topics, Tony Reichenberger did ten, Jim Sullivan did eight, Levi Kuehn did four and Jani Grantz did two. Unlike the Team's 2015 effort, no contract review attorneys were utilized on any topic. They were all solo efforts, although there was some coordination and communication between team members on the SME-type issues encountered. This pertained to questions of true relevance and errors found in the gold standard for most of these topics.

In each topic the assigned Team attorney personally read, and evaluated for true relevance, every email that TREC returned as a relevant document and every email that TREC unexpectedly returned as irrelevant. Some of these were read and studied multiple times before we made our final calls on true relevance. Those determinations took into consideration, and gave some deference to, the TREC assessor adjudications, but were not bound by them. Many other emails that the Team members considered irrelevant, and TREC agreed, were also personally reviewed as part of their search efforts. As mentioned, there were sometimes consultations and discussions between team members as to the unexpected TREC opinions on relevance.

All of the thirty-four topics presented search challenges to the Team that were easier, some far easier, than those the Team typically faces as attorneys leading legal document review projects. They were roughly equivalent to the most simplistic challenges that they might face in projects involving very simple legal disputes. A few of the search topics included legal issues, much more than were found in the 2015 Total Recall Track. This is a revision that the Team requested and appreciated because it allowed testing of legal judgment and analysis in the determination of true relevance in these topics. In legal search such skills are obviously very important. In most of the 2016 Total Recall topics, however, no special legal training or analysis was required for a determination of true relevance.

The Team's final report will specifically identify each topic and, as the Team did in its 2015 TREC report, provide full details on the types of searches performed for each topic and the difficulties encountered.

Summary of the Team's Work

The e-Discovery Team's 2016 Total Recall Track Athome project started June 3, 2016, and concluded on August 31, 2016. Using a single expert reviewer in each topic, the e-Discovery Team classified 9,863,366 documents in 34 different review projects.
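The total document count appears to follow from the collection size reported earlier (290,099 emails) multiplied by the 34 topics, since every topic searched the same collection. That reading is an inference from the two figures, not something the report states outright. A short check:

```python
# Figures taken from the report; the relationship between them is an assumption.
EMAILS_IN_COLLECTION = 290_099   # Jeb Bush email collection size
TOPICS = 34                      # one review project per Athome topic

total_classified = EMAILS_IN_COLLECTION * TOPICS
print(f"{total_classified:,}")
```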

All attorneys used the e-Discovery Team's Predictive Coding 4.0 hybrid multimodal IST search techniques and were assisted by the KO software, EDR. They relied on active machine learning and other search techniques to find relevant documents and effective training documents. The various types of searches included in the Team's multimodal approach are shown in the search pyramid below.

[Figure not reproduced: search pyramid]

Linear review refers to an SME's examination of all documents by certain key witnesses in a lawsuit during certain time frames critical to the disputed facts in a lawsuit. Keyword search in our methodology refers to the use of terms originating from legal and document analysis and from witness interviews. Judgmental sampling and verification by SMEs are also used to test the terms before they are used throughout a document collection. Our keyword search also includes a variety of Boolean functions and parametric targeting, wherein searches are limited to certain metadata fields of an electronic document. Similarity and concept searches refer to a variety of passive machine learning analytic search techniques. The AI search at the top of the pyramid refers to the use of active machine learning. The EDR KO software uses a proprietary type of logistic regression algorithm.

The standard eight-step workflow used by the Team in legal search projects is shown in the diagram below.[8] To meet the Team's self-imposed requirement of completing every review project with minimal time and effort, the standard steps Three and Seven were omitted, as will be further explained. Further, due to the set-up of the TREC experiments, the first step of our workflow, ESI Communications, was severely constrained to the point of being practically meaningless, as will also be further explained. The Team's standard workflow was thus reduced to five steps, as shown below.

[Figures not reproduced: eight-step workflow diagram and reduced five-step workflow diagram]

[8] Losey, R., Predictive Coding 4.0 – Nine Key Points of Legal Document Review and an Updated Statement of Our Workflow (e-Discovery Team, October 2016) contains a complete description of all eight steps in parts Six and Seven.
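The keyword search described above combines Boolean functions with parametric targeting, meaning that a search is limited to a chosen field of the document. The sketch below illustrates that general idea only. The `Email` fields, the function name and the sample emails are hypothetical, and it is not a description of EDR's search syntax. The TREC version of the collection also had almost all metadata removed, so the fields available in practice were limited.

```python
import re
from dataclasses import dataclass


@dataclass
class Email:
    doc_id: str
    subject: str
    body: str


def parametric_search(emails, field, all_of=(), any_of=(), none_of=()):
    """Return ids of emails whose chosen field satisfies a Boolean term query.

    field   -- which attribute to search ("subject" or "body")
    all_of  -- every term must appear (AND)
    any_of  -- at least one term must appear, if any are given (OR)
    none_of -- no term may appear (NOT)
    """
    hits = []
    for email in emails:
        words = set(re.findall(r"[a-z0-9]+", getattr(email, field).lower()))
        if (all(t in words for t in all_of)
                and (not any_of or any(t in words for t in any_of))
                and not any(t in words for t in none_of)):
            hits.append(email.doc_id)
    return hits


# Hypothetical sample documents.
emails = [
    Email("1", "Bottled water contract", "Vendor pricing for bottled water at state offices."),
    Email("2", "Lunch", "Bring bottled water to the picnic."),
    Email("3", "Water management district", "Budget for the water district; no bottled products."),
]

subject_hits = parametric_search(emails, "subject", all_of=("bottled", "water"))
body_hits = parametric_search(emails, "body", all_of=("water",), none_of=("picnic",))
print(subject_hits, body_hits)
```

Restricting the first query to the subject field excludes emails that merely mention the terms in passing in the body, which is the kind of targeting the text calls parametric.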

In the first step of ESI Communications, team members on a legal review project typically spend hours in discussion and analysis of the scope of relevance and the target documents. The communications often include hundreds of written exchanges, both informal, such as emails and chats, and formal, such as: (1) detailed requests for information contained in court documents, such as a subpoena or Request For Production; (2) input from a qualified SME, who is typically a legal expert with deep knowledge of the factual issues in the case, and thus deep knowledge of what the presiding judge in the legal proceeding will hold to be relevant and discoverable; and (3) dialogues with the party requesting the production of documents, and with other parties, to clarify the search target. The ESI communications may lead to formal motions with the governing court, legal memorandums, hearings before the presiding judge, and opinions rendered by one or more judges on the scope of relevance.[9]

The only ESI communication in the TREC experimental set-up was a very short, one-sentence description of relevance for each topic. Two topics had a two-sentence description (410-Condominiums and 423-National Rifle Association). The only other type of ESI communications in this TREC Track were the automated instant returns for all documents submitted, stating whether TREC considered them to be relevant or not. There were no appeals or other procedures set up for Athome participants, who actually examined the documents for true relevance, to challenge obvious errors in judgment.

[9] Id. at Part Six, wherein the first step of ESI Communications is explained in detail.

Short Answers to Research Questions

Research Question 1 (Primary Question): What Recall, Precision and Effort levels will the e-Discovery Team attain in TREC test conditions over all thirty-four Topics using the Team's Predictive Coding 4.0 hybrid multimodal IST search methods and Kroll Ontrack's software, eDiscovery.com Review (EDR)?

Short Answer to Primary Question: Again, like last year, the Team attained excellent results with high levels of Recall and Precision in all topics, including perfect or near perfect results in several topics using the corrected gold standard. The Team was able to do so even though it only used five of the eight steps in its usual methodology, and even though it intentionally severely constrained the amount of human effort expended on each topic. The Team's enthusiasm for the results, which were significantly better than its 2015 effort, is tempered by the fact that the search challenges presented in most of the topics in 2016 were not difficult. As mentioned, they were equivalent to an easy legal search project, such as a simple single-plaintiff employment law dispute. The Final Report will include a detailed analysis of these results.

Research Question 2: What is the impact of multiple errors in SME judgments by the TREC assessors on Recall and Precision?

Short Answer: The impact on Recall and Precision using the Team's method is significant and, as you would expect, varies according to the number of errors made by TREC assessors in a particular topic.

After the Team encountered numerous errors on the first topics undertaken, it was forced to create its own gold standard of true relevant documents for each topic. The Team's new gold standard corrected for the obvious errors seen in TREC's assessments of relevance. In all close questions on relevance, the judgment of TREC's assessors was accepted as accurate. The obvious errors and inconsistencies seen in the Team's close study of the documents were not accepted. In most, but not all, topics the Team did not use the documents with obvious errors for its machine training. This will be further detailed in the Final Report.

In all topics the Team created its own standard and made comparative recall, precision and F1 calculations based thereon. The observation and correction of TREC errors in the gold standard became a collaborative effort among the Team to peer review and verify our corrected standard. Most of these efforts, many of which occurred after the conclusion of the Track in August, were not included in the time reports of efforts expended by attorneys in the search.

The Team was very reluctant to take this step and would certainly have let pass a few errors or mere differences of opinion. We recognize that no standard is ever perfect. As lawyers, the Team understands all too well that some, perhaps many, judgments on relevance are subjective. Again, in all close questions on relevance the judgments of TREC's assessors were accepted, even though we personally disagreed. The Team means no disrespect by the creation of an alternate gold standard. We appreciate and respect the efforts made by the TREC assessors and organizers. Still, the volume of obvious errors encountered forced us to take this action. The integrity of our primary research question, to test the effectiveness of our hands-on type of ad hoc hybrid methods, demanded that we do so.

We understand that the impact on other Total Recall participants, ones that never actually examine documents, would be far less, perhaps even negligible. Still, there could be an impact even for them in some topics where more than an insignificant number of the same or similar documents were inconsistently judged.

The decision to not accept the errors seen, and to instead create our own gold standard, resulted in substantial additional work for the Team. In some topics we even took the step of making two "reasonable calls." One was for TREC, and the second call, which always took place on the next submission, was for our own internal tracking. In the second call we would include emails that we knew, from prior submissions of the same or similar documents, would again be incorrectly considered irrelevant by TREC. We knew they were true relevant, and so waited until after our public reasonable call to TREC to submit them, and then we made our own internal reasonable call. We were attempting, in effect, to play two games at once and maximize our score in each game. Keeping track of two standards added an unexpected layer of difficulty to our work, and we did not bother to do so in all topics. The dual-call topics will be specifically identified in our Final Report.

In some topics the difference between the two standards was substantial. In a few topics it was minor. Some differences were found in all topics. This is not unexpected in any standard involving at least somewhat subjective mass relevance adjudications. We do not intend to engage in a criticism of the specific gold standard creation methods used in the 2016 Total Recall Track, except to note that the appeals procedure included in the 2008 and 2009 TREC Legal Tracks could have improved the accuracy of the results for the Total Recall Track Athome participants.[10] Further, the Team understands from informal reports that the TREC assessors' work was much more time constrained than was the work of the Team. Moreover, unlike the Team, the TREC assessors did not have the benefit of SME input from a native Floridian lawyer (Losey) who was familiar with Florida politics and Governor Bush and who, since 2015, had put substantial time into reviewing this email collection.

The Final Report will include a detailed comparison of recall, precision and F1 based on both the TREC and Team assessments. A few examples of the more egregious errors encountered will be provided. The Final Report may also contain a complete listing of the revised gold standards that the Team created for each topic, or at least a conditional offer of disclosure of the corrected standards. The Team invites input from other participants and organizers of the Total Recall Track on this issue. Again, the Team recognizes that no gold standard is ever perfect, including its own revised standards. This will be set forth in further detail in the Team's final report.

Research Question 3: What is the most effective search method from the Team's multimodal tool-set for retrieval of relevant documents for the relatively simplistic search challenges presented by most of the thirty-four topics?

Short Answer: For the easy topics, the Team found that what it calls "tested, parametric Boolean keyword search" was the most effective search method to find relevant documents. The Team was surprised by how well a sophisticated use of keywords was able to identify nearly all of the target relevant documents in many of the topics in this year's Total Recall Track. This shows the continued importance of a multimodal approach to legal search, including especially keyword search when done properly, and especially in simple lawsuits involving relatively easy search issues. This will be set forth in further detail in the Team's final report.

Research Question 4: What is the role of active machine learning in retrieval of relevant documents in the simplistic search challenges presented by most of the thirty-four topics?

Short Answer: The Team found that for the relatively easy topics in this year's Total Recall Track, the role of active machine learning was reduced to a quality control function. It would find a few relevant documents not located by keyword search, or by concept and similarity search, and thus improve recall somewhat. In the simplest topics, active machine learning did not find any new relevant documents, but instead only confirmed that all relevant documents had already been found by the other methods. This will be set forth in further detail in the Team's final report.

[10] Participant appeal rights could have mitigated the errors seen in 2016, but this can be burdensome and, as seen in those Tracks in 2008 and 2009, can create its own issues. See Oard, Hedlin, Tomlinson, Baron, Overview of the TREC 2008 Legal Track, found at http://trec.nist.gov/pubs/trec17/papers/LEGAL.OVERVIEW08.pdf, and Oard, Hedlin, Tomlinson, Baron, Oard, Overview of the TREC 2009 Legal Track, found at http://trec.nist.gov/pubs/trec18/papers/LEGAL09.OVERVIEW.pdf.

Further Discussion of Research Question 1

Even using the given, uncorrected TREC standard for scoring, and even though in most topics we did not train on the TREC returned-relevant documents that the Team considered irrelevant, the Team overall still attained excellent results. Under the corrected standard, which will be shared in the Final Report, the results were much better. The following chart compares the Team's Recall, Precision and F-Measure for each Athome topic with the results obtained by TREC's BMI and BMI-Desc runs (the only other scores now available).

REASONABLE COMPARISON (all values in percent; Team = e-Discovery Team)

                                                Recall                        Precision                     F-Measure
Topic                                           Team       BMI  BMI-Desc      Team       BMI  BMI-Desc      Team       BMI  BMI-Desc
athome401 Summer Olympics                      41.05     91.70     92.58     73.44     15.31     15.45     52.66     26.23     26.48
athome402 Space                                72.57     91.07     90.28     22.04     30.86     30.59     33.81     46.09     45.70
athome403 Bottled Water                         7.16     97.71     97.71     80.41     37.49     37.49     13.14     54.18     54.18
athome404 Eminent Domain                       22.94     91.74     91.93     64.43     26.55     26.61     33.83     41.19     41.27
athome405 Newt Gingrich                        95.08     99.18     98.36     28.09      9.82      9.74     43.36     17.87     17.73
athome406 Felon Disenfran                      73.23     92.91     92.91     66.91      9.58      9.58     69.92     17.37     17.37
athome407 Faith Based Initiatives              31.02     91.80     91.99     68.72     41.86     41.95     42.75     57.50     57.62
athome408 Invasive Species                     55.17     83.62     83.62     64.65      7.87      7.87     59.53     14.39     14.39
athome409 Climate Change                       84.65     95.05     94.06     40.71     13.99     13.85     54.98     24.40     24.14
athome410 Condominiums                         95.10     99.48     99.03     46.13     42.59     42.40     62.12     59.64     59.38
athome411 Stand Your Ground                    66.29     70.79     84.27     67.05      5.70      6.09     66.67     10.55     11.36
athome412 2000 Recount                         57.38     91.35     92.48     49.18     40.97     41.48     52.96     56.57     57.27
athome413 James V Crosby                       96.34     99.08     99.27     89.00     28.73     28.78     92.52     44.55     44.63
athome414 Medicaid Reform                      91.66     96.90     97.26     35.32     35.10     35.23     51.01     51.54     51.73
athome415 George W Bush                        94.08     63.39     67.08     91.04     61.09     58.66     92.53     62.22     62.59
athome416 Marketing                            60.30     94.19     95.57     42.08     43.32     43.96     49.57     59.35     60.22
athome417 Movie Gallery                        99.61     99.81     99.66     99.38     57.28     57.19     99.49     72.79     72.67
athome418 War Preparations                     39.57     93.05     93.58     50.34     12.68     12.76     44.31     22.32     22.45
athome419 Lost Foster Child Rilya Wilson       98.84     93.06     93.61     15.04     48.13     48.41     26.10     63.44     63.82
athome420 Billboards                           92.54     99.46     99.32     92.16     31.65     31.61     92.35     48.02     47.95
athome421 Traffic Cameras                      90.48    100.00    100.00     12.50      1.90      1.90     21.97      3.73      3.73
athome422 Non Resident Aliens                  93.55    100.00    100.00      0.90      2.81      2.81      1.79      5.46      5.46
athome423 National Rifle Association           51.05     99.65     99.65     33.18     18.68     18.68     40.22     31.46     31.46
athome424 Gulf Drilling                        99.60    100.00    100.00     22.76     26.39     26.39     37.05     41.76     41.76
athome425 Civil Rights Act 2003                91.32     98.60     98.60     96.59     33.70     33.70     93.88     50.23     50.23
athome426 Jeffrey Goldhagen                    70.00     94.17     94.17     87.50      9.17      9.17     77.78     16.72     16.72
athome427 Slot Machines                        89.21     96.68     96.68     35.77     16.98     16.98     51.07     28.89     28.89
athome428 New Stadiums                         93.10     98.49     98.49     17.81     26.95     26.95     29.91     42.31     42.31
athome429 Elian Gonzalez                       94.20     99.27     99.27     92.41     35.45     35.45     93.29     52.24     52.24
athome430 Restraints and Helmets               71.95     94.25     94.65     65.00     36.40     36.55     68.30     52.52     52.74
athome431 Agency Credit Rate                   75.69     99.31     99.31     47.60     11.61     11.61     58.45     20.78     20.78
athome432 Gay Adoption                         85.00     98.57     98.57     86.23     11.20     11.20     85.61     20.12     20.12
athome433 Abstinence                           99.11    100.00    100.00     66.07      9.09      9.09     79.29     16.67     16.67
athome434 Bacardi Trademark                    86.84    100.00    100.00     91.67      3.44      3.44     89.19      6.65      6.65
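The F-Measure column in the comparison chart is consistent with the usual F1 definition, the harmonic mean of recall and precision. The sketch below recomputes F1 for three of the Team's rows from the recall and precision values as printed. It assumes the standard formula, and small differences are expected because the published inputs are already rounded to two decimals.

```python
def f1_score(recall, precision):
    """Harmonic mean of recall and precision (both in percent)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# (topic, recall, precision, reported F1) for three e-Discovery Team rows.
rows = [
    (412, 57.38, 49.18, 52.96),
    (417, 99.61, 99.38, 99.49),
    (425, 91.32, 96.59, 93.88),
]

recomputed = {topic: f1_score(r, p) for topic, r, p, _ in rows}
for topic, r, p, reported in rows:
    print(topic, round(recomputed[topic], 2), reported)
```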

These comparative statistics show the scores at the time of reasonable call. In the precision category, which in legal search is the money shot that has the greatest impact on the cost of a document review project, the e-Discovery Team dominated. It had the highest precision level on 28 of the 34 topics (82%). They are highlighted in blue in the original chart. The e-Discovery Team's average precision score was 57.1%. The average precision of both BMI and BMI-Desc was 24.8%. Thus the Team's precision score was, on average, more than two and a quarter times higher than that of the BMI standards.
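The precision figures quoted above can be recomputed from the comparison chart. The sketch below transcribes the precision columns, reading each printed value as a percentage with two decimal places (an assumption about the extracted chart). It counts the topics where the Team's precision exceeds both BMI runs and averages the Team and BMI columns.

```python
# (topic, e-Discovery Team, BMI, BMI-Desc) precision at reasonable call, in percent.
PRECISION = [
    (401, 73.44, 15.31, 15.45), (402, 22.04, 30.86, 30.59),
    (403, 80.41, 37.49, 37.49), (404, 64.43, 26.55, 26.61),
    (405, 28.09, 9.82, 9.74),   (406, 66.91, 9.58, 9.58),
    (407, 68.72, 41.86, 41.95), (408, 64.65, 7.87, 7.87),
    (409, 40.71, 13.99, 13.85), (410, 46.13, 42.59, 42.40),
    (411, 67.05, 5.70, 6.09),   (412, 49.18, 40.97, 41.48),
    (413, 89.00, 28.73, 28.78), (414, 35.32, 35.10, 35.23),
    (415, 91.04, 61.09, 58.66), (416, 42.08, 43.32, 43.96),
    (417, 99.38, 57.28, 57.19), (418, 50.34, 12.68, 12.76),
    (419, 15.04, 48.13, 48.41), (420, 92.16, 31.65, 31.61),
    (421, 12.50, 1.90, 1.90),   (422, 0.90, 2.81, 2.81),
    (423, 33.18, 18.68, 18.68), (424, 22.76, 26.39, 26.39),
    (425, 96.59, 33.70, 33.70), (426, 87.50, 9.17, 9.17),
    (427, 35.77, 16.98, 16.98), (428, 17.81, 26.95, 26.95),
    (429, 92.41, 35.45, 35.45), (430, 65.00, 36.40, 36.55),
    (431, 47.60, 11.61, 11.61), (432, 86.23, 11.20, 11.20),
    (433, 66.07, 9.09, 9.09),   (434, 91.67, 3.44, 3.44),
]

# Topics where the Team's precision beats both BMI runs.
team_wins = sum(1 for _, team, bmi, desc in PRECISION if team > max(bmi, desc))
team_mean = sum(row[1] for row in PRECISION) / len(PRECISION)
bmi_mean = sum(row[2] for row in PRECISION) / len(PRECISION)

print(f"Team highest on {team_wins} of {len(PRECISION)} topics")
print(f"Average precision: Team {team_mean:.2f}, BMI {bmi_mean:.2f}")
```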

In the F1 measure, which is the standard value used in legal search to evaluate the overall precision and recall of a project, the e-Discovery Team again dominated. This is somewhat surprising in view of the fact that these measurements were based on the error-filled TREC standard. The Team had the highest F1 scores on 23 of the 34 topics (68%). They are highlighted in blue in the original chart. The e-Discovery Team's average F1 score was 57.69%.

[Figure: Average Precision Across Topics. e-Discovery Team 57.12, BMI 24.83, BMI-Desc 24.81]

The average F1 of BMI and BMI-Desc was 36.5%. Thus the Team's F1 score was, on average, more than 58% higher than that of the BMI standards.
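The two comparisons against the BMI runs are simple ratios of the reported averages. A short check, using only the averages quoted in the text:

```python
# Averages quoted in the report, in percent.
team_precision, bmi_precision = 57.12, 24.83
team_f1, bmi_f1 = 57.69, 36.5

# "More than two and a quarter times higher" precision.
precision_ratio = team_precision / bmi_precision

# "More than 58% higher" F1, expressed as a relative gain in percent.
f1_gain_pct = (team_f1 / bmi_f1 - 1) * 100

print(f"precision ratio: {precision_ratio:.2f}x, F1 gain: {f1_gain_pct:.1f}%")
```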

Even using TREC's challenged standard, the Team still attained higher recall than both the BMI and BMI-Desc standards on two topics: topic 415, George Bush, with a score of 94.08%, and topic 419, Lost Foster Child Rilya Wilson, with a score of 98.84%. Moreover, the Team attained recall levels in excess of 90% at the time of reasonable call in the following additional topics:

• 95.08% on topic 405, Newt Gingrich
• 95.10% on topic 410, Condominiums
• 96.34% on topic 413, James V Crosby
• 99.61% on topic 417, Movie Gallery
• 92.54% on topic 420, Billboards
• 90.48% on topic 421, Traffic Cameras
• 93.55% on topic 422, Non Resident Aliens
• 99.60% on topic 424, Gulf Drilling
• 91.32% on topic 425, Civil Rights Act of 2003
• 93.10% on topic 428, New Stadiums and Arenas
• 94.20% on topic 429, Elian Gonzalez
• 99.11% on topic 433, Abstinence

[Figure: Average F-Measure Across Topics. e-Discovery Team 57.69, BMI 36.46, BMI-Desc 36.55]

In summary, even with the TREC standard, where in most topics the Team did not use all documents returned as relevant for all of its training documents, it attained Recall scores greater than 90% in fourteen of the thirty-four topics. The Team attained Recall scores of 80% or higher in four additional topics. The average results obtained across all thirty-four topics at the time of reasonable call were as follows:

• 75.46% Recall
• 57.12% Precision
• 57.69% F1
• 121 Docs Reviewed Effort

The Team will disclose all of its scores under the corrected gold standard in the Final Report. In the meantime, here are the average results obtained under the corrected standard across all thirty-four topics at the time of reasonable call:

• 87.15% Recall
• 64.94% Precision
• 68.74% F1
• 124 Docs Reviewed Effort

At the time of reasonable call the Team had recall scores greater than 90% in twenty-one of the thirty-four topics and greater than 80% in five more topics. Recall of greater than 99% was attained in seven topics.

At the time of reasonable call the Team had precision scores greater than 90% in thirteen of the thirty-four topics and greater than 80% in two more topics. Precision of greater than 98% was attained in six topics.

At the time of reasonable call the Team had F1 scores greater than 90% in twelve of the thirty-four topics and greater than 80% in one more topic. F1 of greater than 97% was attained in five topics.

We were lucky to attain one perfect score, as we did in 2015, in topic 417, with an F1 score of 100%. The perfect score was obtained by locating all 5,945 documents relevant under the corrected standard after reviewing only 45 documents. This topic was filled with form letters and was a fairly simple search. Still, the BMI and BMI-Desc F1 scores for this topic were both under 73%. The Team was pleased to prove once again that perfect recall and perfect precision are possible, albeit rare, using the Team's methods.

For questions, comments or suggestions concerning this preliminary Notebook report of the e-Discovery Team, please contact Ralph.Losey@gmail.com.

e-Discovery Team Members

The Team is made up of five legal search experts (Ralph Losey, Jim Sullivan, Tony Reichenberger, Levi Kuehn and Jani Grantz) and one "robot," Mr. EDR (the software they used). The team members are not scientists or in academia. Most are lawyers who spend their working hours looking for evidence in large, chaotic data sets, such as email. They typically assist other attorneys in lawsuits and legal investigations. Their work includes the identification, review, analysis, classification, production and admission of Electronically Stored Information (ESI) as evidence in courts in the United States and elsewhere.

The Team leader is Ralph C. Losey, J.D., a full-time practicing attorney, principal and National e-Discovery Counsel of Jackson Lewis P.C., a U.S. law firm with over 800 attorneys and fifty-five offices. He has over 36 years of experience doing legal document reviews. Losey is also a blogger at e-DiscoveryTeam.com, where he has written over two million words on e-discovery. He has also written six books published by the American Bar Association and West Thomson. Over the past five years Losey has participated in multiple public and private experiments, some competitive, to test and prove various predictive coding methods. Losey has also written over sixty articles on the subject of legal search and predictive coding.

Jim Sullivan, J.D., Tony Reichenberger, J.D., and Jani Grantz, J.D., are attorney search and review specialists who work for Kroll Ontrack, Inc. (KO). Levi Kuehn is a non-attorney search and review specialist who works for KO. Kroll Ontrack is the primary e-discovery vendor used by Losey and his law firm. It is a global e-Discovery software, processing and project management company (eDiscovery.com).

The Team robot, Mr. EDR, is the Team's personalization of Kroll Ontrack's software, eDiscovery.com Review (EDR). Losey, Sullivan and Reichenberger participated in the 2015 TREC Total Recall Track. So too did a prior version of Mr. EDR, which is in a process of constant enhancement. The software version used in 2016 contained the latest beta-test version of the software that has not yet been released to the public.

Research Questions Considered at TREC 2016 Recall Track

Background to Questions Considered

It is generally accepted in the legal search community that the use of predictive coding type search algorithms can improve the search and review of documents in legal proceedings.[1] The use of predictive coding has also been approved, and even encouraged, by various courts around the world, including numerous courts in the U.S.[2]

Although there is agreement on use of predictive coding, there is controversy and disagreement as to the most effective methods of use.[3] There are proponents for a variety of different methods to find training documents for predictive coding. Some advocate for the use of chance selection alone, others for the use of top ranked documents alone, others for a combination of top ranked and mid-level ranked documents where classification is unsure.[4] The e-Discovery Team uses a method that includes a combination of all three of these selection processes and more.

Some attorneys and predictive coding software vendors advocate for the use of predictive coding search methods alone, and forego other search methods when they do so, such as keyword search, concept searches, similarity searches and linear review. The e-Discovery Team members reject that approach and instead advocate for a hybrid multimodal approach they call Predictive Coding 4.0.[5] This method uses an approach to active machine learning that the Team calls IST, standing for "Intelligently Spaced Training." Under IST, the attorney in charge decides exactly when to train. This is different from other systems where the machine retrains after each document is coded, or after a certain predetermined number, and the human trainer has no discretion as to timing.[6]

The e-Discovery Team approach includes all types of search methods (thus the term multimodal) to find relevant documents, with primary reliance placed on predictive coding. The Team also uses a variety of methods to find suitable training documents for predictive coding, including high ranking documents and all other search methods. This is a fundamental difference with other methods that rely entirely on predictive coding to find relevant documents, and rely entirely upon high-ranking documents for training. Grossman and Cormack have scientifically tested these high-ranking training methods and measured their effectiveness, but this does not mean that they endorse them as an exclusive tool, nor claim this to be their own preferred method.[7]

[1] Predictive Coding is defined by The Grossman-Cormack Glossary of Technology-Assisted Review, 2013 Fed. Cts. L. Rev. 7 (January 2013) (Grossman-Cormack Glossary) as: "An industry-specific term generally used to describe a Technology Assisted Review process involving the use of a Machine Learning Algorithm to distinguish Relevant from Non-Relevant Documents, based on Subject Matter Expert(s) Coding of a Training Set of Documents." A Technology Assisted Review process is defined as: "A process for Prioritizing or Coding a Collection of electronic Documents using a computerized system that harnesses human judgments of one or more Subject Matter Expert(s) on a smaller set of Documents and then extrapolates those judgments to the remaining Document Collection. … TAR processes generally incorporate Statistical Models and/or Sampling techniques to guide the process and to measure overall system effectiveness." Also see Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review, Richmond Journal of Law and Technology, Vol. XVII, Issue 3, Article 11 (2011).

[2] Da Silva Moore v. Publicis Groupe, 868 F. Supp. 2d 137 (SDNY 2012), and numerous cases later citing to and following this landmark decision by Judge Andrew Peck, including another more recent opinion by Judge Peck, Rio Tinto PLC v. Vale S.A., 306 F.R.D. 125 (SDNY 2015).

[3] Grossman & Cormack, Evaluation of Machine-Learning Protocols for Technology-Assisted Review in Electronic Discovery, SIGIR '14, July 6–11, 2014; Grossman & Cormack, Comments on "The Implications of Rule 26(g) on the Use of Technology-Assisted Review," 7 Federal Courts Law Review 286 (2014); Herbert Roitblat, series of five OrcaTec blog posts (1, 2, 3, 4, 5), May-August 2014; Herbert Roitblat, Daubert, Rule 26(g) and the eDiscovery Turkey, OrcaTec blog, August 11th, 2014; Hickman & Schieneman, The Implications of Rule 26(g) on the Use of Technology-Assisted Review, 7 FED. CTS. L. REV. 239 (2013); Losey, R., Predictive Coding 3.0, part one (e-Discovery Team, 10/11/15).

[4] Id.; Webber, Random vs active selection of training examples in e-discovery (Evaluating e-Discovery blog, 7/14/14).

[5] Losey, R., Predictive Coding 4.0 – Nine Key Points of Legal Document Review and an Updated Statement of Our Workflow (e-Discovery Team, 9/12/16) (Part One of an Eight Part Series explaining the recent advancements from our Predictive Coding method from version 3.0 to version 4.0).

Four Research Questions

1. Primary Question (repeat from 2015): What Recall, Precision and Effort levels will the e-Discovery Team attain in TREC test conditions over all thirty-four topics using the Team's Predictive Coding 4.0 hybrid multimodal search methods and Kroll Ontrack's software, eDiscovery.com Review (EDR)?

2. What is the impact of incorrect Subject Matter Expert ("SME") judgments by the TREC assessors on Recall and Precision? (Unplanned question that unfortunately arose out of the circumstances encountered.)

3. What is the most effective search method from the Team's multimodal tool-set for retrieval of relevant documents in the relatively simplistic search challenges presented by most, but not all, of the thirty-four topics? (Unplanned question that arose out of the circumstances encountered.)

4. What is the role of active machine learning in retrieval of relevant documents in the simplistic search challenges presented by most of the thirty-four topics? (Unplanned question related to the third issue above that also arose out of the circumstances encountered.)

Overview of Team Participation in 2016 TREC Recall Track

The e-Discovery Team participated in all thirty-four of the Total Recall Track Athome topics. It did not participate in the fully automated TREC Total Recall sandbox. All thirty-four topics searched a collection of public emails of former Florida Governor Jeb Bush. There were 290,099 emails in the Jeb Bush Email collection. In the version of the Jeb Bush emails used by TREC, almost all metadata of these emails has been removed. Moreover, the associated attachments and images were not present. Other collections of the Jeb Bush email exist from PST files that include more information, but the Team did not utilize this information, and limited its efforts and attention to the official TREC collection.

[6] The merits of the Team's approach to the timing of machine learning are detailed in Predictive Coding 4.0, Part Two.

[7] Grossman & Cormack, Evaluation of Machine-Learning Protocols for Technology-Assisted Review in Electronic Discovery, SIGIR '14, July 6–11, 2014.

This same Jeb Bush email collection was used by the Total Recall Track in 2015 for ten topics. In 2015 Losey searched all ten of these topics. None of these search topics was repeated in 2016. The thirty-four topics searched in 2016 and their names are shown below. The far right column gives the first name of the e-Discovery Team member who did the review for that topic. The thirteen topics in red were considered mandatory by TREC, and the remaining twenty-one were optional. The e-Discovery Team did all topics.

Topic  Name                            Reviewer
401    Summer Olympics                 Ralph
402    Space                           Tony
403    Bottled Water                   Ralph
404    Eminent Domain                  Tony
405    Newt Gingrich                   Ralph
406    Felon Disenfranchisement        Ralph
407    Faith Based Initiatives         Ralph
408    Invasive Species                Tony
409    Climate Change                  Levi
410    Condominiums                    Tony
411    Stand Your Ground               Ralph
412    2000 Recount                    Tony
413    James V. Crosby                 Jim
414    Medicaid Reform                 Tony
415    George W. Bush                  Jim
416    Marketing                       Jim
417    Movie Gallery                   Ralph
418    War Preparations                Tony
419    Lost Foster Child Rilya Wilson  Levi
420    Billboards                      Jim
421    Traffic Cameras                 Jim
422    Non Resident Aliens             Tony
423    National Rifle Association      Tony
424    Gulf Drilling                   Levi
425    Civil Rights Act of 2003        Ralph
426    Jeffrey Goldhagen               Ralph
427    Slot Machines                   Jim
428    New Stadiums and Arenas         Levi
429    Elian Gonzalez                  Jim
430    Restraints and Helmets          Jani
431    Agency Credit Ratings           Tony
432    Gay Adoption                    Jani
433    Abstinence                      Jim
434    Bacardi Trademark               Ralph

Ralph Losey did ten topics, Tony Reichenberger did ten, Jim Sullivan did eight, Levi Kuehn did four and Jani Grantz did two. Unlike the Team's 2015 effort, no contract review attorneys were utilized on any topic. They were all solo efforts, although there was some coordination and communication between team members on the SME type issues encountered. This pertained to questions of true relevance and errors found in the gold standard for most of these topics.

In each Topic the assigned Team attorney personally read and evaluated for true relevance every email that TREC returned as a relevant document, and every email that TREC unexpectedly returned as Irrelevant. Some of these were read and studied multiple times before we made our final calls on true relevance, determinations that took into consideration and gave some deference to the TREC assessor adjudications, but were not bound by them. Many other emails that the Team members considered irrelevant, and TREC agreed, were also personally reviewed as part of their search efforts. As mentioned, there were sometimes consultations and discussions between team members as to the unexpected TREC opinions on relevance.

All of the thirty-four topics presented search challenges to the Team that were easier, some far easier, than the Team typically faces as attorneys leading legal document review projects. They were roughly equivalent to the most simplistic challenges that they might face in projects involving very simple legal disputes. A few of the search topics included legal issues, much more than were found in the 2015 Total Recall Track. This is a revision that the Team requested and appreciated because it allowed testing of legal judgment and analysis in determination of true relevance in these topics. In legal search such skills are obviously very important. In most of the 2016 Total Recall topics, however, no special legal training or analysis was required for a determination of true relevance.

The Team's final report will specifically identify each topic and, as the Team did in its 2015 TREC report, provide full details on the types of searches performed for each topic and difficulties encountered.

Summary of the Team's Work

The e-Discovery Team's 2016 Total Recall Track Athome project started June 3, 2016, and concluded on August 31, 2016. Using a single expert reviewer in each topic, the e-Discovery Team classified 9,863,366 documents in 34 different review projects.
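The document count above appears to follow directly from the size of the collection: each of the 34 topics was run against the same 290,099-email Jeb Bush collection described in the overview, so every email is classified once per topic. A quick arithmetic check, written as a small Python sketch under that assumption (the variable names are illustrative only):

```python
# Assumption: every topic is a separate review project over the full
# collection, so total classifications = emails x topics.
EMAILS_IN_COLLECTION = 290_099   # size of the Jeb Bush email collection
TOPICS_REVIEWED = 34             # one review project per topic

total_classified = EMAILS_IN_COLLECTION * TOPICS_REVIEWED
print(f"{total_classified:,}")
```

The product matches the total reported for the 34 review projects.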

All attorneys used the e-Discovery Team's Predictive Coding 4.0 hybrid multimodal IST search techniques and were assisted by the KO software, EDR. They relied on active machine learning and other search techniques to find relevant documents and effective training documents. The various types of searches included in the Team's multimodal approach are shown in the search pyramid below.

Linear review refers to an SME's examination of all documents by certain key witnesses in a lawsuit during certain time frames critical to the disputed facts in a lawsuit. Keyword search in our methodology refers to the use of terms originating from legal and document analysis and from witness interviews. Judgmental sampling and verification by SMEs are also used to test the terms before they are used throughout a document collection. Our keyword search also includes a variety of Boolean functions and parametric targeting, wherein searches are limited to certain metadata fields of an electronic document. Similarity and concept searches refer to a variety of passive machine learning analytic search techniques. The AI search at the top of the pyramid refers to the use of active machine learning. The EDR KO software uses a proprietary type of logistic regression algorithm.

The standard eight-step workflow used by the Team in legal search projects is shown in the diagram below.[8] To meet the Team's self-imposed time requirements of completing every review project with minimal time efforts, the standard steps Three and Seven were omitted, as will be further explained. Further, due to the set-up of the TREC experiments, the first step of our workflow, ESI Communications, was severely constrained, to the point of being practically meaningless, as will also be further explained. The Team's standard workflow was thus reduced to five steps, as shown below.

[8] Losey, R., Predictive Coding 4.0 – Nine Key Points of Legal Document Review and an Updated Statement of Our Workflow (e-Discovery Team, October 2016), contains a complete description of all eight steps in parts Six and Seven.

In the first step of ESI Communications, team members on a legal review project typically spend hours in discussion and analysis of scope of relevance and the target documents. The communications often include hundreds of written exchanges, both informal, such as emails and chats, and formal, such as: (1) detailed requests for information contained in court documents, such as subpoenas or Requests For Production; (2) input from a qualified SME, who is typically a legal expert with deep knowledge of the factual issues in the case, and thus deep knowledge of what the presiding judge in the legal proceeding will hold to be relevant and discoverable; and (3) dialogues with the party requesting the production of documents to clarify the search target, and other parties. The ESI communications may lead to formal motions with the governing court, legal memorandums, hearings before the presiding judge, and opinions rendered by one or more judges on the scope of relevance.[9]

The only ESI communication in the TREC experimental set-up was a very short, one-sentence description of relevance for each topic. Two topics had a two-sentence description (410 - Condominiums and 423 - National Rifle Association). The only other type of ESI communications in this TREC Track were the automated, instant returns of all documents submitted, as to whether TREC considered them to be relevant or not. There were no appeals or other procedures set up for Athome participants, who actually examined the documents for true relevance, to challenge obvious errors in judgment.

[9] Id. at Part Six, wherein the first step of ESI Communications is explained in detail.

Short Answers to Research Questions

Research Question 1 (Primary Question)

What Recall, Precision and Effort levels will the e-Discovery Team attain in TREC test conditions over all thirty-four Topics using the Team's Predictive Coding 4.0 hybrid multimodal IST search methods and Kroll Ontrack's software, eDiscovery.com Review (EDR)?

Short Answer to Primary Question: Again, like last year, the Team attained excellent results with high levels of Recall and Precision in all topics, including perfect or near perfect results in several topics using the corrected gold standard. The Team was able to do so even though it only used five of the eight steps in its usual methodology, and even though it intentionally severely constrained the amount of human effort expended on each topic. The Team's enthusiasm for the results, which were significantly better than its 2015 effort, is tempered by the fact that the search challenges presented in most of the topics in 2016 were not difficult. As mentioned, they were equivalent to an easy legal search project, such as a simple single plaintiff employment law dispute. The Final Report will include a detailed analysis of these results.

Research Question 2

What is the impact of multiple errors in SME judgments by the TREC assessors on Recall and Precision?

Short Answer: The impact on Recall and Precision using the Team's method is significant and, as you would expect, varied according to the number of errors made by TREC assessors in a particular topic. After the Team encountered numerous errors on the first topics undertaken, it was forced to create its own gold standard of true relevant documents for each topic. The Team's new gold standard corrected for the obvious errors seen in TREC's assessments of relevance. In all close questions on relevance the judgment of TREC's assessors was accepted as accurate. The obvious errors and inconsistencies seen by the Team's close study of the documents were not accepted. In most, but not all topics, the Team did not use the documents with obvious errors for its machine training. This will be further detailed in the Final Report. In all topics the Team created its own standard and made comparative recall, precision and F1 calculations based thereon.

The observation and correction of TREC errors in the gold standard became a collaborative effort among the Team to peer review and verify our corrected standard. Most of these efforts, many of which occurred after the conclusion of the Track in August, were not included in the time reports of efforts expended by attorneys in the search.

The Team was very reluctant to take this step and would certainly have let pass a few errors, or mere differences of opinion. We recognize that no standard is ever perfect. As lawyers the Team understands all too well that some, perhaps many, judgments on relevance are subjective. Again, in all close questions on relevance the judgments of TREC's assessors were accepted, even though we personally disagreed. The Team means no disrespect by the creation of an alternate gold standard. We appreciate and respect the efforts made by the TREC assessors and organizers. Still, the volume of obvious errors encountered forced us to take this action. The integrity of our primary research question, to test the effectiveness of our hands-on type of ad hoc hybrid methods, demanded that we do so.

We understand that the impact on other Total Recall Participants, ones that never actually examine documents, would be far less, perhaps even negligible. Still, there could be an impact even for them in some topics where more than an insignificant number of the same or similar documents were inconsistently judged.

The decision to not accept the errors seen, and to instead create our own gold standard, resulted in substantial additional work for the Team. In some topics we even took the step of making two "reasonable calls." One was for TREC, and the second call, which always took place on the next submission, was for our own internal tracking. In the second call we would include emails that we knew, from prior submissions of the same or similar document, would again be incorrectly considered irrelevant by TREC. We knew they were true relevant, and so waited until after our public reasonable call to TREC to submit them, and then we made our own internal reasonable call. We were attempting to, in effect, play two games at once, and maximize our score in each game. Keeping track of two standards added an unexpected layer of difficulty to our work and we did not bother to do so in all topics. The dual-call topics will be specifically identified in our Final Report.

In some topics the difference between the two standards was substantial. In a few topics it was minor. Some differences were found in all topics. This is not unexpected in any standard involving at least somewhat subjective mass relevance adjudications. We do not intend to engage in a criticism of the specific gold standard creation methods used in the 2016 Total Recall Track, except to note that the appeals procedure included in the 2008 and 2009 TREC Legal Tracks could have improved the accuracy of the results for the Total Recall Track Athome participants.[10] Further, the Team understands from informal reports that the TREC assessors' work was much more time constrained than was the work of the Team. Moreover, unlike the Team, the TREC assessors did not have the benefit of SME input from a native Floridian lawyer (Losey) who was familiar with Florida politics and Governor Bush, and since 2015 had put substantial time into reviewing this email collection.

The Final Report will include a detailed comparison of recall, precision and F1 based on the comparison of both the TREC and Team assessments. A few examples of the more egregious errors encountered will be provided. The Final Report may also contain a complete listing of the revised gold standards that the Team created for each topic, or at least a conditional offer of disclosure of the corrected standards. The Team invites input from other participants and organizers of the Total Recall Track on this issue. Again, the Team recognizes that no gold standard is ever perfect, including its own revised standards. This will be set forth in further detail in the Team's final report.

Research Question 3

What is the most effective search method from the Team's multimodal tool-set for retrieval of relevant documents for the relatively simplistic search challenges presented by most of the thirty-four topics?

Short Answer: For the easy topics the Team found that what it calls "tested, parametric Boolean keyword search" was the most effective search method to find relevant documents. The Team was surprised by how well a sophisticated use of keywords was able to identify nearly all of the target relevant documents in many of the topics in this year's Total Recall Track. This shows the continued importance of a multimodal approach to legal search, including especially keyword search when done properly, especially in simple lawsuits involving relatively easy search issues. This will be set forth in further detail in the Team's final report.

Research Question 4

What is the role of active machine learning in retrieval of relevant documents in the simplistic search challenges presented by most of the thirty-four topics?

Short Answer: The Team found that for the relatively easy topics in this year's Total Recall Track the role of active machine learning was reduced to a quality control function. It would find a few relevant documents not located by keyword search, or concept and similarity search, and thus improve recall somewhat. In the simplest topics active machine learning did not find any new relevant documents, but instead only confirmed that all relevant documents had already been found by the other methods. This will be set forth in further detail in the Team's final report.

[10] Participant appeal rights could have mitigated the errors seen in 2016, but this can be burdensome and, as seen in those Tracks in 2008 and 2009, can create their own issues. See Oard, Hedlin, Tomlinson, Baron, Overview of the TREC 2008 Legal Track, found at http://trec.nist.gov/pubs/trec17/papers/LEGAL.OVERVIEW08.pdf; and Oard, Hedlin, Tomlinson, Baron, Oard, Overview of the TREC 2009 Legal Track, found at http://trec.nist.gov/pubs/trec18/papers/LEGAL09.OVERVIEW.pdf.

Further Discussion of Research Question 1

Even using the given, uncorrected TREC standard for scoring, and even though in most topics we did not train on the TREC returned-relevant documents that the Team considered irrelevant, the Team overall still attained excellent results. Under the corrected standard, which will be shared in the Final Report, the results were much better. The following chart compares the Team's Recall, Precision and F-Measure for each Athome topic with the results obtained by TREC's BMI and BMI-Desc runs (only other scores now available).

REASONABLE COMPARISON (scores in percent; Team = e-Discovery Team)

                                                      Recall                    Precision                   F-Measure
Topic      Name                                 Team      BMI BMI-Desc     Team      BMI BMI-Desc     Team      BMI BMI-Desc
athome401  Summer Olympics                     41.05    91.70    92.58    73.44    15.31    15.45    52.66    26.23    26.48
athome402  Space                               72.57    91.07    90.28    22.04    30.86    30.59    33.81    46.09    45.70
athome403  Bottled Water                        7.16    97.71    97.71    80.41    37.49    37.49    13.14    54.18    54.18
athome404  Eminent Domain                      22.94    91.74    91.93    64.43    26.55    26.61    33.83    41.19    41.27
athome405  Newt Gingrich                       95.08    99.18    98.36    28.09     9.82     9.74    43.36    17.87    17.73
athome406  Felon Disenfran                     73.23    92.91    92.91    66.91     9.58     9.58    69.92    17.37    17.37
athome407  Faith Based Initiatives             31.02    91.80    91.99    68.72    41.86    41.95    42.75    57.50    57.62
athome408  Invasive Species                    55.17    83.62    83.62    64.65     7.87     7.87    59.53    14.39    14.39
athome409  Climate Change                      84.65    95.05    94.06    40.71    13.99    13.85    54.98    24.40    24.14
athome410  Condominiums                        95.10    99.48    99.03    46.13    42.59    42.40    62.12    59.64    59.38
athome411  Stand Your Ground                   66.29    70.79    84.27    67.05     5.70     6.09    66.67    10.55    11.36
athome412  2000 Recount                        57.38    91.35    92.48    49.18    40.97    41.48    52.96    56.57    57.27
athome413  James V. Crosby                     96.34    99.08    99.27    89.00    28.73    28.78    92.52    44.55    44.63
athome414  Medicaid Reform                     91.66    96.90    97.26    35.32    35.10    35.23    51.01    51.54    51.73
athome415  George W. Bush                      94.08    63.39    67.08    91.04    61.09    58.66    92.53    62.22    62.59
athome416  Marketing                           60.30    94.19    95.57    42.08    43.32    43.96    49.57    59.35    60.22
athome417  Movie Gallery                       99.61    99.81    99.66    99.38    57.28    57.19    99.49    72.79    72.67
athome418  War Preparations                    39.57    93.05    93.58    50.34    12.68    12.76    44.31    22.32    22.45
athome419  Lost Foster Child Rilya Wilson      98.84    93.06    93.61    15.04    48.13    48.41    26.10    63.44    63.82
athome420  Billboards                          92.54    99.46    99.32    92.16    31.65    31.61    92.35    48.02    47.95
athome421  Traffic Cameras                     90.48   100.00   100.00    12.50     1.90     1.90    21.97     3.73     3.73
athome422  Non Resident Aliens                 93.55   100.00   100.00     0.90     2.81     2.81     1.79     5.46     5.46
athome423  National Rifle Association          51.05    99.65    99.65    33.18    18.68    18.68    40.22    31.46    31.46
athome424  Gulf Drilling                       99.60   100.00   100.00    22.76    26.39    26.39    37.05    41.76    41.76
athome425  Civil Rights Act 2003               91.32    98.60    98.60    96.59    33.70    33.70    93.88    50.23    50.23
athome426  Jeffrey Goldhagen                   70.00    94.17    94.17    87.50     9.17     9.17    77.78    16.72    16.72
athome427  Slot Machines                       89.21    96.68    96.68    35.77    16.98    16.98    51.07    28.89    28.89
athome428  New Stadiums                        93.10    98.49    98.49    17.81    26.95    26.95    29.91    42.31    42.31
athome429  Elian Gonzalez                      94.20    99.27    99.27    92.41    35.45    35.45    93.29    52.24    52.24
athome430  Restraints and Helmets              71.95    94.25    94.65    65.00    36.40    36.55    68.30    52.52    52.74
athome431  Agency Credit Rate                  75.69    99.31    99.31    47.60    11.61    11.61    58.45    20.78    20.78
athome432  Gay Adoption                        85.00    98.57    98.57    86.23    11.20    11.20    85.61    20.12    20.12
athome433  Abstinence                          99.11   100.00   100.00    66.07     9.09     9.09    79.29    16.67    16.67
athome434  Bacardi Trademark                   86.84   100.00   100.00    91.67     3.44     3.44    89.19     6.65     6.65
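The F-Measure columns in the chart above are consistent with the usual F1 definition, the harmonic mean of recall and precision. The following is a minimal sketch, assuming that definition and assuming the chart values are percentages with two decimals; the function name f1_score is ours and is not part of any TREC tooling. Because the published figures are rounded, a recomputed value can differ from the chart in the last digit.

```python
def f1_score(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision (both given in percent)."""
    if recall + precision == 0:
        return 0.0  # avoid dividing by zero when nothing relevant was found
    return 2 * recall * precision / (recall + precision)

# e-Discovery Team recall and precision, taken from the chart above.
movie_gallery = f1_score(recall=99.61, precision=99.38)    # athome417
summer_olympics = f1_score(recall=41.05, precision=73.44)  # athome401

print(round(movie_gallery, 2), round(summer_olympics, 2))
```

Both recomputed values agree with the Team's F-Measure entries for those two topics in the chart.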

These comparative statistics show the scores at the time of reasonable call. In the precision category, which in Legal Search is the money shot that has the greatest impact on the cost of a document review project, the e-Discovery Team dominated. It had the highest precision level on 28 of the 34 topics (82%). They are highlighted in blue in the above chart. The e-Discovery Team's average precision score was 57.1. The average precision of both BMI and BMI-Desc was 24.8. Thus the Team's precision score was on average more than two and a quarter times higher than that of the BMI standards.
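The precision claims above can be rechecked directly from the chart. The sketch below is an illustration only: it copies the three precision columns (e-Discovery Team, BMI, BMI-Desc) for topics 401 through 434 into a list, reading each chart value as a percentage with two decimals. The variable names are ours.

```python
from statistics import mean

# (Team, BMI, BMI-Desc) precision per topic, athome401..athome434,
# transcribed from the chart above.
PRECISION = [
    (73.44, 15.31, 15.45), (22.04, 30.86, 30.59), (80.41, 37.49, 37.49),
    (64.43, 26.55, 26.61), (28.09, 9.82, 9.74), (66.91, 9.58, 9.58),
    (68.72, 41.86, 41.95), (64.65, 7.87, 7.87), (40.71, 13.99, 13.85),
    (46.13, 42.59, 42.40), (67.05, 5.70, 6.09), (49.18, 40.97, 41.48),
    (89.00, 28.73, 28.78), (35.32, 35.10, 35.23), (91.04, 61.09, 58.66),
    (42.08, 43.32, 43.96), (99.38, 57.28, 57.19), (50.34, 12.68, 12.76),
    (15.04, 48.13, 48.41), (92.16, 31.65, 31.61), (12.50, 1.90, 1.90),
    (0.90, 2.81, 2.81), (33.18, 18.68, 18.68), (22.76, 26.39, 26.39),
    (96.59, 33.70, 33.70), (87.50, 9.17, 9.17), (35.77, 16.98, 16.98),
    (17.81, 26.95, 26.95), (92.41, 35.45, 35.45), (65.00, 36.40, 36.55),
    (47.60, 11.61, 11.61), (86.23, 11.20, 11.20), (66.07, 9.09, 9.09),
    (91.67, 3.44, 3.44),
]

team_avg = mean(row[0] for row in PRECISION)
bmi_avg = mean(row[1] for row in PRECISION)
bmi_desc_avg = mean(row[2] for row in PRECISION)

# Topics where the Team's precision beats both baseline runs.
team_best = sum(1 for team, bmi, desc in PRECISION if team > max(bmi, desc))

# How many times higher the Team average is than the two baselines' average.
ratio = team_avg / mean([bmi_avg, bmi_desc_avg])

print(len(PRECISION), team_best, round(team_avg, 2), round(ratio, 2))
```

Counting this way, the Team leads on 28 of the 34 topics, and the ratio of the averages comes out a little above the "two and a quarter times" stated in the text.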

[Figure: Average Precision Across Topics (bar chart): e-Discovery Team 57.12, BMI 24.83, BMI-Desc 24.81.]

In the F1-measure, which is the standard value used in legal search to evaluate overall precision and recall of a project, the e-Discovery Team again dominated. This is somewhat surprising in view of the fact that these measurements were based on the error-filled TREC standard. The Team had the highest F1 scores on 23 of the 34 topics (68%). They are highlighted in blue in the above chart. The e-Discovery Team's average F1 score was 57.69. The average F1 of BMI and BMI-Desc was 36.5. Thus the Team's F1 score was on average more than 58% higher than that of the BMI standards.

[Figure: Average F-Measure Across Topics (bar chart): e-Discovery Team 57.69, BMI 36.46, BMI-Desc 36.55.]

Even using TREC's challenged standard, the Team still attained higher recall than both the BMI and BMI-Desc standards on two topics: topic 415, George Bush, with a score of 94.08, and topic 419, Lost Foster Child Rilya Wilson, with a score of 98.84. Moreover, the Team attained recall levels in excess of 90% at the time of reasonable call in the following additional topics:

• 95.08 on topic 405 Newt Gingrich
• 95.10 on topic 410 Condominiums
• 96.34 on topic 413 James V. Crosby
• 99.61 on topic 417 Movie Gallery
• 92.54 on topic 420 Billboards
• 90.48 on topic 421 Traffic Cameras
• 93.55 on topic 422 Non Resident Aliens
• 99.60 on topic 424 Gulf Drilling
• 91.32 on topic 425 Civil Rights Act of 2003
• 93.10 on topic 428 New Stadiums and Arenas
• 94.20 on topic 429 Elian Gonzalez
• 99.11 on topic 433 Abstinence

In summary, even with the TREC standard, where in most topics the Team did not use all documents returned as relevant for all of its training documents, it attained Recall scores greater than 90% in fourteen of the thirty-four topics. The Team attained Recall scores of 80% or higher in four additional topics. The average results obtained across all thirty-four topics at the time of reasonable call were as follows:

• 75.46% Recall
• 57.12% Precision
• 57.69% F1
• 121 Docs Reviewed Effort
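Note that an average F1 across topics is not the same thing as the F1 of the average recall and precision. The sketch below assumes the Team's per-topic F-Measure column from the chart (read as percentages with two decimals). It reproduces the 57.69 average as a simple mean of the per-topic scores, and shows that combining the averaged recall and precision would give a different, higher number. It also recomputes the "more than 58% higher" comparison against the BMI average of roughly 36.5 quoted earlier. All names are illustrative.

```python
from statistics import mean

# e-Discovery Team F-Measure per topic, athome401..athome434 (from the chart).
TEAM_F1 = [
    52.66, 33.81, 13.14, 33.83, 43.36, 69.92, 42.75, 59.53, 54.98,
    62.12, 66.67, 52.96, 92.52, 51.01, 92.53, 49.57, 99.49, 44.31,
    26.10, 92.35, 21.97, 1.79, 40.22, 37.05, 93.88, 77.78, 51.07,
    29.91, 93.29, 68.30, 58.45, 85.61, 79.29, 89.19,
]

def f1(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision (both in percent)."""
    return 2 * recall * precision / (recall + precision)

macro_f1 = mean(TEAM_F1)               # mean of the per-topic F1 scores
f1_of_averages = f1(75.46, 57.12)      # F1 built from averaged recall/precision
pct_higher = (57.69 / 36.5 - 1) * 100  # Team average F1 vs. BMI average F1

print(round(macro_f1, 2), round(f1_of_averages, 2), round(pct_higher, 1))
```

The reported 57.69 therefore reads as a per-topic (macro) average, which weights a weak topic such as 422 as heavily as a strong one such as 417.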

The Team will disclose all of its scores under the corrected gold standard in the Final Report. In the meantime, here are the average results obtained under that corrected standard across all thirty-four topics at the time of reasonable call:

• 87.15% Recall
• 64.94% Precision
• 68.74% F1
• 124 Docs Reviewed Effort
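For a side-by-side view, the two sets of averages above can be subtracted to show how much of the gap is attributable to the choice of gold standard. A minimal sketch follows; the dictionary layout is ours, and the figures are the averages listed above, read as percentages.

```python
# Average scores across all thirty-four topics at the time of reasonable call.
trec_standard = {"recall": 75.46, "precision": 57.12, "f1": 57.69}
corrected_standard = {"recall": 87.15, "precision": 64.94, "f1": 68.74}

# Gain, in percentage points, when scoring against the corrected standard.
gain = {
    metric: round(corrected_standard[metric] - trec_standard[metric], 2)
    for metric in trec_standard
}

for metric, points in gain.items():
    print(f"{metric}: +{points:.2f} points")
```

On these averages, recall and F1 each move by more than eleven points, while precision moves by a little under eight.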

AtthetimeofreasonablecalltheTeamhadrecallscoresgreaterthan90intwenty-oneofthethirty-fourtopicsandgreaterthan80infivemoretopicsRecallofgreaterthan99wasattainedinseventopicsAtthetimeofreasonablecalltheTeamhadprecisionscoresgreaterthan90inthirteenofthethirty-fourtopicsandgreaterthan80intwomoretopicsPrecisionofgreaterthan98wasattainedinsixtopicsAtthetimeofreasonablecalltheTeamhadF1scoresgreaterthan90intwelveofthethirty-fourtopicsandgreaterthan80inonemoretopicF1ofgreaterthan97wasattainedinfivetopicsWewereluckytoattainoneperfectscoreaswedidin2015intopic(417)withanF1scoreof100Theperfectscorewasobtainedbylocatingall5945documentsrelevantunderthecorrectedstandardafterreviewingonly45documentsThistopicwasfilledwithformlettersandwasafairlysimplesearchStilltheBMIandBMI-DescF1scoresforthistopicwerebothunder73TheTeamwaspleasedtoproveonceagainthatperfectrecallandperfectprecisionispossiblealbeitrareusingtheTeamrsquosmethods

17

ForquestionscommentsorsuggestionsconcerningthispreliminaryNotebookreportofthee-DiscoveryTeampleasecontactRalphLoseygmailcom

3

e-DiscoveryTeamMembersTheTeamismadeupoffivelegalsearchexpertsRalphLoseyJimSullivanTonyReichenbergerLeviKuehnJaniGrantz--andoneldquorobotrdquoMrEDR(thesoftwaretheyused)TheteammembersarenotscientistsorinacademiaMostarelawyerswhospendtheirworkinghourslookingforevidenceinlargechaoticdatasetssuchasemailTheytypicallyassistotherattorneysinlawsuitsandlegalinvestigationsTheirworkincludestheidentificationreviewanalysisclassificationproductionandadmissionofElectronicallyStoredInformation(ESI)asevidenceincourtsintheUnitesStatesandelsewhereTheTeamleaderisRalphCLoseyJDafull-timepracticingattorneyprincipalandNationale-DiscoveryCounselofJacksonLewisPCaUSlawfirmwithover800attorneysandfifty-fiveofficesHehasover36yearsofexperiencedoinglegaldocumentreviewsLoseyisalsoabloggerate-DiscoveryTeamcomwherehehaswrittenovertwomillionwordsone-discoveryHehasalsowrittensixbookspublishedbytheAmericanBarAssociationandWestThompsonThepastfiveyearsLoseyhasparticipatedinmultiplepublicandprivateexperimentssomecompetitivetotestandprovevariouspredictivecodingmethodsLoseyhasalsowrittenoversixtyarticlesonthesubjectoflegalsearchandpredictivecodingJimSullivanJDTonyReichenbergerJDandJaniGrantzJDareattorneysearchandreviewspecialistswhoworkforKrollOntrackInc(KO)LeviKuehnisanon-attorneysearchandreviewspecialistswhoworksforKOKrollOntrackistheprimarye-discoveryvendorusedbyLoseyandhislawfirmItisaglobale-Discoverysoftwareprocessingandprojectmanagementcompany(eDiscoverycom)TheTeamrobotMrEDRistheTeamrsquospersonalizationofKrollOntrackrsquossoftwareeDiscoverycomReview(EDR)LoseySullivanandReichenbergerparticipatedinthe2015TRECTotalRecallTrackSotoodidapriorversionofMrEDRwhichisinaprocessofconstantenhancementThesoftwareversionusedin2016containedthelatestbeta-testversionofthesoftwarethathasnotyetbeenreleasedtothepublic

ResearchQuestionsConsideredatTREC2015RecallTrackBackgroundtoquestionsconsideredItisgenerallyacceptedinthelegalsearchcommunitythattheuseofpredictivecodingtypesearchalgorithmscanimprovethesearchandreviewofdocumentsinlegalproceedings1Theuseofpredictivecodinghasalsobeenapproved

1PredictiveCodingisdefinedbyTheGrossman-CormackGlossaryofTechnology-AssistedReview2013FedCtsLRev7(January2013)(Grossman-CormackGlossary)asldquoAnindustry-specifictermgenerallyusedtodescribeaTechnologyAssistedReviewprocessinvolvingtheuseofaMachineLearningAlgorithmtodistinguishRelevantfromNon-RelevantDocumentsbasedon

4

andevenencouragedbyvariouscourtsaroundtheworldincludingnumerouscourtsintheUS2Althoughthereisagreementonuseofpredictivecodingthereiscontroversyanddisagreementastothemosteffectivemethodsofuse3ThereareproponentsforavarietyofdifferentmethodstofindtrainingdocumentsforpredictivecodingSomeadvocatefortheuseofchanceselectionaloneothersfortheuseoftoprankeddocumentsaloneothersforacombinationoftoprankedandmid-levelrankeddocumentswhereclassificationisunsure4The-DiscoveryTeamusesamethodthatincludesacombinationofallthreeoftheseselectionprocessesandmoreSomeattorneysandpredictivecodingsoftwarevendorsadvocatefortheuseofpredictivecodingsearchmethodsaloneandforegoothersearchmethodswhentheydososuchaskeywordsearchconceptsearchessimilaritysearchesandlinearreviewThee-DiscoveryTeammembersrejectthatapproachandinsteadadvocateforahybridmultimodalapproachtheycallPredictiveCoding405ThismethodusesanapproachtoactivemachinelearningthattheTeamcallsISTstandingforldquoIntelligentlySpacedTrainingrdquoUnderISTtheattorneyinchargedecidesexactlywhentotrainThisisdifferentfromothersystemswhere

SubjectMatterExpert(s)CodingofaTrainingSetofDocumentsrdquoATechnologyAssistedReviewprocessisdefinedasldquoAprocessforPrioritizingorCodingaCollectionofelectronicDocumentsusingacomputerizedsystemthatharnesseshumanjudgmentsofoneormoreSubjectMatterExpert(s)onasmallersetofDocumentsandthenextrapolatesthosejudgmentstotheremainingDocumentCollectionhellipTARprocessesgenerallyincorporateStatisticalModelsandorSamplingtechniquestoguidetheprocessandtomeasureoverallsystemeffectivenessrdquoAlsoseeTechnology-AssistedReviewinE-DiscoveryCanBeMoreEffectiveandMoreEfficientThanExhaustiveManualReviewRichmondJournalofLawandTechnologyVolXVIIIssue3Article11(2011)2DaSilvaMoorevPublicisGroupe868FSupp2d137(SDNY2012)andnumerouscaseslatercitingtoandfollowingthislandmarkdecisionbyJudgeAndrewPeckincludinganothermorerecentopinionbyJudgePeckRioTintoPLCvValeSA306FRD125(SDNY2015)3GrossmanampCormackEvaluationofMachine-LearningProtocolsforTechnology-AssistedReviewinElectronicDiscoverySIGIRrsquo14July6ndash112014GrossmanampCormackCommentsonldquoTheImplicationsofRule26(g)ontheUseofTechnology-AssistedReviewrdquo7FederalCourtsLawReview286(2014)HerbertRoitblatseriesoffiveOrcaTecblogposts(12345)May-August2014HerbertRoitblatDaubertRule26(g)andtheeDiscoveryTurkeyOrcaTecblogAugust11th2014HickmanampSchienemanTheImplicationsofRule26(g)ontheUseofTechnology-AssistedReview7FEDCTSLREV239(2013)LoseyRPredictiveCoding30partone(e-DiscoveryTeam101115)4IdWebberRandomvsactiveselectionoftrainingexamplesine-discovery(Evaluatinge-Discoveryblog71414)5LoseyRPredictiveCoding40ndashNineKeyPointsofLegalDocumentReviewandanUpdatedStatementofOurWorkflow(e-DiscoveryTeam91216)(PartOneofanEightPartSeriesexplainingtherecentadvancementsfromourPredictiveCodingmethodfromversion30toversion40)


the machine retrains after each document is coded, or after a certain predetermined number, and the human trainer has no discretion as to timing.[6] The e-Discovery Team approach includes all types of search methods (thus the term multimodal) to find relevant documents, with primary reliance placed on predictive coding. The Team also uses a variety of methods to find suitable training documents for predictive coding, including high ranking documents and all other search methods. This is a fundamental difference from other methods that rely entirely on predictive coding to find relevant documents, and rely entirely upon high-ranking documents for training. Grossman and Cormack have scientifically tested these high-ranking training methods and measured their effectiveness, but this does not mean that they endorse them as an exclusive tool, nor claim this to be their own preferred method.[7]

Four Research Questions

1. Primary Question (repeat from 2015): What Recall, Precision and Effort levels will the e-Discovery Team attain in TREC test conditions over all thirty-four topics using the Team’s Predictive Coding 4.0 hybrid multimodal search methods and Kroll Ontrack’s software, eDiscovery.com Review (EDR)?

2. What is the impact of incorrect Subject Matter Expert (“SME”) judgments by the TREC assessors on Recall and Precision? (Unplanned question that unfortunately arose out of the circumstances encountered.)

3. What is the most effective search method from the Team’s multimodal tool-set for retrieval of relevant documents in the relatively simplistic search challenges presented by most, but not all, of the thirty-four topics? (Unplanned question that arose out of the circumstances encountered.)

4. What is the role of active machine learning in retrieval of relevant documents in the simplistic search challenges presented by most of the thirty-four topics? (Unplanned question related to the third issue above that also arose out of the circumstances encountered.)

Overview of Team Participation in 2016 TREC Recall Track

The e-Discovery Team participated in all thirty-four of the Total Recall Track Athome topics. It did not participate in the fully automated TREC Total Recall sandbox. All thirty-four topics searched a collection of public emails of former Florida Governor Jeb Bush. There were 290,099 emails in the Jeb Bush Email collection. In the version of the Jeb Bush emails used by TREC, almost all metadata of these emails had been removed. Moreover, the associated

[6] The merits of the Team’s approach to the timing of machine learning are detailed in Predictive Coding 4.0, Part Two.

[7] Grossman & Cormack, Evaluation of Machine-Learning Protocols for Technology-Assisted Review in Electronic Discovery, SIGIR ’14, July 6–11, 2014.


attachments and images were not present. Other collections of the Jeb Bush email exist from PST files that include more information, but the Team did not utilize this information and limited its efforts and attention to the official TREC collection. This same Jeb Bush email collection was used by the Total Recall Track in 2015 for ten topics. In 2015 Losey searched all ten of these topics. None of these search topics was repeated in 2016. The thirty-four topics searched in 2016, and their names, are shown below. In the far right column are the first names of the e-Discovery Team member who did the review for that topic. The thirteen topics in red were considered mandatory by TREC and the remaining twenty-one were optional. The e-Discovery Team did all topics.

Topic  Name                               Reviewer
401    Summer Olympics                    Ralph
402    Space                              Tony
403    Bottled Water                      Ralph
404    Eminent Domain                     Tony
405    Newt Gingrich                      Ralph
406    Felon Disenfranchisement           Ralph
407    Faith Based Initiatives            Ralph
408    Invasive Species                   Tony
409    Climate Change                     Levi
410    Condominiums                       Tony
411    Stand Your Ground                  Ralph
412    2000 Recount                       Tony
413    James V. Crosby                    Jim
414    Medicaid Reform                    Tony
415    George W. Bush                     Jim
416    Marketing                          Jim
417    Movie Gallery                      Ralph
418    War Preparations                   Tony
419    Lost Foster Child Rilya Wilson     Levi
420    Billboards                         Jim
421    Traffic Cameras                    Jim
422    Non Resident Aliens                Tony
423    National Rifle Association         Tony
424    Gulf Drilling                      Levi
425    Civil Rights Act of 2003           Ralph
426    Jeffrey Goldhagen                  Ralph
427    Slot Machines                      Jim
428    New Stadiums and Arenas            Levi
429    Elian Gonzalez                     Jim
430    Restraints and Helmets             Jani
431    Agency Credit Ratings              Tony
432    Gay Adoption                       Jani
433    Abstinence                         Jim
434    Bacardi Trademark                  Ralph

Ralph Losey did ten topics, Tony Reichenberger did ten, Jim Sullivan did eight, Levi Kuehn did four and Jani Grantz did two. Unlike the Team’s 2015 effort, no contract review attorneys were utilized on any topic. They were all solo efforts, although there was some coordination and communication between team members on the SME-type issues encountered. This pertained to questions of true relevance and errors found in the gold standard for most of these topics.

In each topic the assigned Team attorney personally read and evaluated for true relevance every email that TREC returned as a relevant document, and every email that TREC unexpectedly returned as irrelevant. Some of these were read and studied multiple times before we made our final calls on true relevance, determinations that took into consideration and gave some deference to the TREC assessor adjudications, but were not bound by them. Many other emails that the Team members considered irrelevant, and TREC agreed, were also personally reviewed as part of their search efforts. As mentioned, there were sometimes consultations and discussions between team members as to the unexpected TREC opinions on relevance.

All of the thirty-four topics presented search challenges to the Team that were easier, some far easier, than the Team typically faces as attorneys leading legal document review projects. They were roughly equivalent to the most simplistic challenges that they might face in projects involving very simple legal disputes. A few of the search topics included legal issues, much more than were found in the 2015 Total Recall Track. This is a revision that the Team requested and appreciated because it allowed testing of legal judgment and analysis in determination of true relevance in these topics. In legal search such skills are obviously very important. In most of the 2016 Total Recall topics, however, no special legal training or analysis was required for a determination of true relevance.

The Team’s final report will specifically identify each topic and, as the Team did in its 2015 TREC report, provide full details on the types of searches performed for each topic and difficulties encountered.

Summary of the Team’s Work

The e-Discovery Team’s 2016 Total Recall Track Athome project started June 3, 2016, and concluded on August 31, 2016. Using a single expert reviewer in each topic, the e-Discovery Team classified 9,863,366 documents in 34 different review projects.
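As a quick arithmetic check, the total is consistent with each of the 34 topics being run against the full collection of 290,099 emails. The short sketch below only reproduces that multiplication; the variable names are illustrative and do not come from the Team's tooling.

```python
# Figures taken from the report text; the variable names are illustrative only.
EMAILS_IN_COLLECTION = 290_099   # emails in the TREC Jeb Bush collection
TOPICS = 34                      # Athome topics, each a separate review project

# Every topic classifies the whole collection, so the totals multiply.
total_classified = EMAILS_IN_COLLECTION * TOPICS
print(f"{total_classified:,}")
```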


All attorneys used the e-Discovery Team’s Predictive Coding 4.0 hybrid multimodal IST search techniques and were assisted by the KO software, EDR. They relied on active machine learning and other search techniques to find relevant documents and effective training documents. The various types of searches included in the Team’s multimodal approach are shown in the search pyramid below.

Linear review refers to an SME’s examination of all documents by certain key witnesses in a lawsuit during certain time frames critical to the disputed facts in a lawsuit. Keyword search in our methodology refers to the use of terms originating from legal and document analysis and from witness interviews. Judgmental sampling and verification by SMEs are also used to test the terms before they are used throughout a document collection. Our keyword search also includes a variety of Boolean functions and parametric targeting, wherein searches are limited to certain metadata fields of an electronic document. Similarity and concept searches refer to a variety of passive machine learning analytic search techniques. The AI search at the top of the pyramid refers to the use of active machine learning. The EDR KO software uses a proprietary type of logistic regression algorithm.

The standard eight-step workflow used by the Team in legal search projects is shown in the diagram below.[8] To meet the Team’s self imposed time requirements of completing every review project with minimal time efforts, the standard steps Three and Seven were omitted,

[8] Losey, R., Predictive Coding 4.0 – Nine Key Points of Legal Document Review and an Updated Statement of Our Workflow (e-Discovery Team, October 2016) contains a complete description of all eight steps in parts Six and Seven.


as will be further explained. Further, due to the set-up of the TREC experiments, the first step of our workflow, ESI Communications, was severely constrained to the point of being practically meaningless, as will also be further explained. The Team’s standard workflow was thus reduced to five steps, as shown below.

In the first step of ESI Communications, team members on a legal review project typically spend hours in discussion and analysis of scope of relevance and the target documents. The communications often include hundreds of written exchanges, both informal, such as emails and chats, and formal, such as: (1) detailed requests for information contained in court documents such as subpoenas or Requests For Production; (2) input from a qualified SME, who is typically a legal expert with deep knowledge of the factual issues in the case, and thus deep knowledge of what the presiding judge in the legal proceeding will hold to be relevant and discoverable; and (3) dialogues with the party requesting the production of documents, to clarify the search target, and other parties. The ESI communications may lead to formal motions with the governing court, legal memorandums, hearings before the presiding judge, and opinions rendered by one or more judges on the scope of relevance.[9]

[9] Id. at Part Six, wherein the first step of ESI Communications is explained in detail.


The only ESI communication in the TREC experimental set-up was a very short, one-sentence description of relevance for each topic. Two topics had a two-sentence description (410-Condominiums and 423-National Rifle Association). The only other type of ESI communications in this TREC Track were the automated instant returns of all documents submitted as to whether TREC considered them to be relevant or not. There were no appeals or other procedures set up for Athome participants who actually examined the documents for true relevance to challenge obvious errors in judgment.

Short Answers to Research Questions

Research Question 1 (Primary Question): What Recall, Precision and Effort levels will the e-Discovery Team attain in TREC test conditions over all thirty-four Topics using the Team’s Predictive Coding 4.0 hybrid multimodal IST search methods and Kroll Ontrack’s software, eDiscovery.com Review (EDR)?

Short Answer to Primary Question: Again, like last year, the Team attained excellent results with high levels of Recall and Precision in all topics, including perfect or near perfect results in several topics using the corrected gold standard. The Team was able to do so even though it only used five of the eight steps in its usual methodology, and even though it intentionally severely constrained the amount of human effort expended on each topic. The Team’s enthusiasm for the results, which were significantly better than its 2015 effort, is tempered by the fact that the search challenges presented in most of the topics in 2016 were not difficult. As mentioned, they were equivalent to an easy legal search project, such as a simple single plaintiff employment law dispute. The Final Report will include a detailed analysis of these results.

Research Question 2: What is the impact of multiple errors in SME judgments by the TREC assessors on Recall and Precision?

Short Answer: The impact on Recall and Precision using the Team’s method is significant and, as you would expect, varies with the number of errors made by TREC assessors in a particular topic. After the Team encountered numerous errors on the first topics undertaken, it was forced to create its own gold standard of true relevant documents for each topic. The Team’s new gold standard corrected for the obvious errors seen in TREC’s assessments of relevance. In all close questions on relevance the judgment of TREC’s assessors was accepted as accurate. The obvious errors and inconsistencies seen by the Team’s close study of the documents were not accepted. In most, but not all, topics the Team did not use the documents with obvious errors for its machine training. This will be further detailed in the Final Report. In all topics the Team created its own standard and made comparative recall, precision and F1 calculations based thereon. The observation and correction of TREC errors in the gold standard became a collaborative effort among the Team to peer review and verify our corrected


standard. Most of these efforts, many of which occurred after the conclusion of the Track in August, were not included in the time reports of efforts expended by attorneys in the search.

The Team was very reluctant to take this step and would certainly have let pass a few errors or mere differences of opinion. We recognize that no standard is ever perfect. As lawyers, the Team understands all too well that some, perhaps many, judgments on relevance are subjective. Again, in all close questions on relevance the judgments of TREC’s assessors were accepted, even though we personally disagreed. The Team means no disrespect by the creation of an alternate gold standard. We appreciate and respect the efforts made by the TREC assessors and organizers. Still, the volume of obvious errors encountered forced us to take this action. The integrity of our primary research question, to test the effectiveness of our hands-on type of ad hoc hybrid methods, demanded that we do so. We understand that the impact on other Total Recall participants, ones that never actually examine documents, would be far less, perhaps even negligible. Still, there could be an impact even for them in some topics where more than an insignificant number of the same or similar documents were inconsistently judged.

The decision to not accept the errors seen, and to instead create our own gold standard, resulted in substantial additional work for the Team. In some topics we even took the step of making two “reasonable calls.” One was for TREC, and the second call, which always took place on the next submission, was for our own internal tracking. In the second call we would include emails that we knew, from prior submissions of the same or similar document, would again be incorrectly considered irrelevant by TREC. We knew they were true relevant and so waited until after our public reasonable call to TREC to submit them, and then we made our own internal reasonable call. We were attempting to, in effect, play two games at once and maximize our score in each game. Keeping track of two standards added an unexpected layer of difficulty to our work, and we did not bother to do so in all topics. The dual-call topics will be specifically identified in our Final Report.

In some topics the difference between the two standards was substantial. In a few topics it was minor. Some differences were found in all topics. This is not unexpected in any standard involving at least somewhat subjective mass relevance adjudications. We do not intend to engage in a criticism of the specific gold standard creation methods used in the 2016 Total Recall Track, except to note that the appeals procedure included in the 2008 and 2009 TREC Legal Tracks could have improved the accuracy of the results for the Total Recall Track Athome participants.[10] Further, the Team understands from informal reports that the TREC

[10] Participant appeal rights could have mitigated the errors seen in 2016, but this can be burdensome and, as seen in those Tracks in 2008 and 2009, can create their own issues. See Oard, Hedlin, Tomlinson, Baron, Overview of the TREC 2008 Legal Track, found at http://trec.nist.gov/pubs/trec17/papers/LEGAL.OVERVIEW08.pdf; and Oard, Hedlin, Tomlinson,


assessors’ work was much more time constrained than was the work of the Team. Moreover, unlike the Team, the TREC assessors did not have the benefit of SME input from a native Floridian lawyer (Losey) who was familiar with Florida politics and Governor Bush and, since 2015, had put substantial time into reviewing this email collection.

The Final Report will include a detailed comparison of recall, precision and F1 based on the comparison of both the TREC and Team assessments. A few examples of the more egregious errors encountered will be provided. The Final Report may also contain a complete listing of the revised gold standards that the Team created for each topic, or at least a conditional offer of disclosure of the corrected standards. The Team invites input from other participants and organizers of the Total Recall Track on this issue. Again, the Team recognizes that no gold standard is ever perfect, including its own revised standards. This will be set forth in further detail in the Team’s final report.

Research Question 3: What is the most effective search method from the Team’s multimodal tool-set for retrieval of relevant documents for the relatively simplistic search challenges presented by most of the thirty-four topics?

Short Answer: For the easy topics the Team found that what it calls “tested, parametric Boolean keyword search” was the most effective search method to find relevant documents. The Team was surprised by how well a sophisticated use of keywords was able to identify nearly all of the target relevant documents in many of the topics in this year’s Total Recall Track. This shows the continued importance of a multimodal approach to legal search, including especially keyword search, when done properly, especially in simple lawsuits involving relatively easy search issues. This will be set forth in further detail in the Team’s final report.

Research Question 4: What is the role of active machine learning in retrieval of relevant documents in the simplistic search challenges presented by most of the thirty-four topics?

Short Answer: The Team found that for the relatively easy topics in this year’s Total Recall Track the role of active machine learning was reduced to a quality control function. It would find a few relevant documents not located by keyword search, or concept and similarity search, and thus improve recall somewhat. In the simplest topics active machine learning did not find any new relevant documents, but instead only confirmed that all relevant documents had already been found by the other methods. This will be set forth in further detail in the Team’s final report.
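The comparison described under Research Question 2, in which the same submission is scored once against the TREC assessments and once against the Team's corrected standard, can be sketched as follows. This is only a minimal illustration under stated assumptions: the document ids, the two gold sets and the `score` helper are hypothetical and are not taken from the Team's data or software.

```python
def score(submitted, gold):
    """Return (recall, precision, f1) for submitted doc ids against one gold set."""
    true_positives = len(submitted & gold)
    recall = true_positives / len(gold) if gold else 0.0
    precision = true_positives / len(submitted) if submitted else 0.0
    if recall + precision == 0:
        return recall, precision, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, f1

# Hypothetical toy data: four documents submitted at the time of reasonable call.
submitted = {"d1", "d2", "d3", "d4"}
trec_gold = {"d1", "d2", "d5"}        # assessor judgments as returned
team_gold = {"d1", "d2", "d3", "d5"}  # corrected standard: d3 re-judged relevant

trec_scores = score(submitted, trec_gold)
team_scores = score(submitted, team_gold)
print("TREC standard:     ", trec_scores)
print("Corrected standard:", team_scores)
```

In this toy case a single re-judged document raises both recall and precision, which is the kind of sensitivity to assessor errors that the short answer describes.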

Further Discussion of Research Question 1

Baron, Oard, Overview of the TREC 2009 Legal Track, found at http://trec.nist.gov/pubs/trec18/papers/LEGAL09.OVERVIEW.pdf


Even using the given, uncorrected TREC standard for scoring, and even though in most topics we did not train on the TREC returned-relevant documents that the Team considered irrelevant, the Team overall still attained excellent results. Under the corrected standard, which will be shared in the Final Report, the results were much better. The following chart compares the Team’s Recall, Precision and F-Measure for each Athome topic with the results obtained by TREC’s BMI and BMI-Desc runs (only other scores now available).

REASONABLE COMPARISON

                                                     Recall                     Precision                   F-Measure
Topic      Name                                 Team      BMI BMI-Desc     Team      BMI BMI-Desc     Team      BMI BMI-Desc
athome401  Summer Olympics                     41.05    91.70    92.58    73.44    15.31    15.45    52.66    26.23    26.48
athome402  Space                               72.57    91.07    90.28    22.04    30.86    30.59    33.81    46.09    45.70
athome403  Bottled Water                        7.16    97.71    97.71    80.41    37.49    37.49    13.14    54.18    54.18
athome404  Eminent Domain                      22.94    91.74    91.93    64.43    26.55    26.61    33.83    41.19    41.27
athome405  Newt Gingrich                       95.08    99.18    98.36    28.09     9.82     9.74    43.36    17.87    17.73
athome406  Felon Disenfran                     73.23    92.91    92.91    66.91     9.58     9.58    69.92    17.37    17.37
athome407  Faith Based Initiatives             31.02    91.80    91.99    68.72    41.86    41.95    42.75    57.50    57.62
athome408  Invasive Species                    55.17    83.62    83.62    64.65     7.87     7.87    59.53    14.39    14.39
athome409  Climate Change                      84.65    95.05    94.06    40.71    13.99    13.85    54.98    24.40    24.14
athome410  Condominiums                        95.10    99.48    99.03    46.13    42.59    42.40    62.12    59.64    59.38
athome411  Stand Your Ground                   66.29    70.79    84.27    67.05     5.70     6.09    66.67    10.55    11.36
athome412  2000 Recount                        57.38    91.35    92.48    49.18    40.97    41.48    52.96    56.57    57.27
athome413  James V. Crosby                     96.34    99.08    99.27    89.00    28.73    28.78    92.52    44.55    44.63
athome414  Medicaid Reform                     91.66    96.90    97.26    35.32    35.10    35.23    51.01    51.54    51.73
athome415  George W. Bush                      94.08    63.39    67.08    91.04    61.09    58.66    92.53    62.22    62.59
athome416  Marketing                           60.30    94.19    95.57    42.08    43.32    43.96    49.57    59.35    60.22
athome417  Movie Gallery                       99.61    99.81    99.66    99.38    57.28    57.19    99.49    72.79    72.67
athome418  War Preparations                    39.57    93.05    93.58    50.34    12.68    12.76    44.31    22.32    22.45
athome419  Lost Foster Child Rilya Wilson      98.84    93.06    93.61    15.04    48.13    48.41    26.10    63.44    63.82
athome420  Billboards                          92.54    99.46    99.32    92.16    31.65    31.61    92.35    48.02    47.95
athome421  Traffic Cameras                     90.48   100.00   100.00    12.50     1.90     1.90    21.97     3.73     3.73
athome422  Non Resident Aliens                 93.55   100.00   100.00     0.90     2.81     2.81     1.79     5.46     5.46
athome423  National Rifle Association          51.05    99.65    99.65    33.18    18.68    18.68    40.22    31.46    31.46
athome424  Gulf Drilling                       99.60   100.00   100.00    22.76    26.39    26.39    37.05    41.76    41.76
athome425  Civil Rights Act 2003               91.32    98.60    98.60    96.59    33.70    33.70    93.88    50.23    50.23
athome426  Jeffrey Goldhagen                   70.00    94.17    94.17    87.50     9.17     9.17    77.78    16.72    16.72
athome427  Slot Machines                       89.21    96.68    96.68    35.77    16.98    16.98    51.07    28.89    28.89
athome428  New Stadiums                        93.10    98.49    98.49    17.81    26.95    26.95    29.91    42.31    42.31
athome429  Elian Gonzalez                      94.20    99.27    99.27    92.41    35.45    35.45    93.29    52.24    52.24
athome430  Restraints and Helmets              71.95    94.25    94.65    65.00    36.40    36.55    68.30    52.52    52.74
athome431  Agency Credit Rate                  75.69    99.31    99.31    47.60    11.61    11.61    58.45    20.78    20.78
athome432  Gay Adoption                        85.00    98.57    98.57    86.23    11.20    11.20    85.61    20.12    20.12
athome433  Abstinence                          99.11   100.00   100.00    66.07     9.09     9.09    79.29    16.67    16.67
athome434  Bacardi Trademark                   86.84   100.00   100.00    91.67     3.44     3.44    89.19     6.65     6.65

These comparative statistics show the scores at the time of reasonable call. In the precision category, which in Legal Search is the money shot that has the greatest impact on the cost of a document review project, the e-Discovery Team dominated. It had the highest precision level on 28 of the 34 topics (82%). They are highlighted in blue in the above chart. The e-Discovery Team’s average precision score was 57.1%. The average precision of both BMI and BMI-Desc was 24.8%. Thus the Team’s precision score was on average more than two and a quarter times higher than that of the BMI standards.

In the F1-measure, which is the standard value used in legal search to evaluate overall precision and recall of a project, the e-Discovery Team again dominated. This is somewhat surprising in view of the fact that these measurements were based on the error-filled TREC standard. The Team had the highest F1 scores on 23 of the 34 topics (68%). They are highlighted in blue in the above chart. The e-Discovery Team’s average F1 score was 57.69%.
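The F-Measure values reported in the chart are consistent with the usual definition of F1 as the harmonic mean of recall and precision. As a minimal sketch, the function below (its name is illustrative) recomputes the Team's F1 for two topics from the recall and precision values shown in the chart.

```python
def f1(recall, precision):
    """Harmonic mean of recall and precision, on the same 0-100 scale as the inputs."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)

# Team recall and precision at reasonable call, as reported in the chart.
topic_401 = f1(41.05, 73.44)   # Summer Olympics, reported F-Measure 52.66
topic_417 = f1(99.61, 99.38)   # Movie Gallery, reported F-Measure 99.49

print(round(topic_401, 2), round(topic_417, 2))
```

Because the harmonic mean is pulled toward the lower of its two inputs, a topic with high precision but low recall, such as topic 401, still ends up with a middling F1.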

[Chart: Average Precision Across Topics. e-Discovery Team 57.12; BMI 24.83; BMI-Desc 24.81]

The average F1 of BMI and BMI-Desc was 36.5%. Thus the Team’s F1 score was on average more than 58% higher than that of the BMI standards.
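The two relative comparisons in this discussion ("more than two and a quarter times higher" for precision and "more than 58% higher" for F1) follow directly from the averages. The sketch below reproduces that arithmetic using the Team averages from the text and the BMI averages shown in the two charts; the variable names are illustrative.

```python
# Averages across all thirty-four topics, scored against the TREC standard.
team_precision, bmi_precision = 57.12, 24.83
team_f1, bmi_f1 = 57.69, 36.46

# How many times higher the Team's average precision is than BMI's.
precision_ratio = team_precision / bmi_precision

# Percentage by which the Team's average F1 exceeds BMI's.
f1_gain_pct = (team_f1 / bmi_f1 - 1) * 100

print(f"precision ratio: {precision_ratio:.2f}x")
print(f"F1 gain: {f1_gain_pct:.1f}%")
```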

Even using TREC’s challenged standard, the Team still attained higher recall than both the BMI and BMI-Desc standards on two topics: topic 415 George Bush with a score of 94.08%, and topic 419 Lost Foster Child Rilya Wilson with a score of 98.84%. Moreover, the Team attained recall levels in excess of 90% at the time of reasonable call in the following additional topics:

• 95.08% on topic 405 Newt Gingrich
• 95.10% on topic 410 Condominiums
• 96.34% on topic 413 James V. Crosby
• 99.61% on topic 417 Movie Gallery
• 92.54% on topic 420 Billboards
• 90.48% on topic 421 Traffic Cameras
• 93.55% on topic 422 Non Resident Aliens
• 99.60% on topic 424 Gulf Drilling
• 91.32% on topic 425 Civil Rights Act of 2003
• 93.10% on topic 428 New Stadiums and Arenas
• 94.20% on topic 429 Elian Gonzalez
• 99.11% on topic 433 Abstinence

[Chart: Average F-Measure Across Topics. e-Discovery Team 57.69; BMI 36.46; BMI-Desc 36.55]

In summary, even with the TREC standard, where in most topics the Team did not use all documents returned as relevant for all of its training documents, it attained Recall scores greater than 90% in fourteen of the thirty-four topics. The Team attained Recall scores of 80% or higher in four additional topics. The average results obtained across all thirty-four topics at the time of reasonable call were as follows:

• 75.46% Recall
• 57.12% Precision
• 57.69% F1
• 121 Docs Reviewed Effort

The Team will disclose all of its scores under the corrected gold standard in the Final Report. In the meantime, here are the average results obtained under the corrected standard across all thirty-four topics at the time of reasonable call:

• 87.15% Recall
• 64.94% Precision
• 68.74% F1
• 124 Docs Reviewed Effort

At the time of reasonable call the Team had recall scores greater than 90% in twenty-one of the thirty-four topics and greater than 80% in five more topics. Recall of greater than 99% was attained in seven topics.

At the time of reasonable call the Team had precision scores greater than 90% in thirteen of the thirty-four topics and greater than 80% in two more topics. Precision of greater than 98% was attained in six topics.

At the time of reasonable call the Team had F1 scores greater than 90% in twelve of the thirty-four topics and greater than 80% in one more topic. F1 of greater than 97% was attained in five topics.

We were lucky to attain one perfect score, as we did in 2015, in topic 417, with an F1 score of 100%. The perfect score was obtained by locating all 5,945 documents relevant under the corrected standard after reviewing only 45 documents. This topic was filled with form letters and was a fairly simple search. Still, the BMI and BMI-Desc F1 scores for this topic were both under 73%. The Team was pleased to prove once again that perfect recall and perfect precision are possible, albeit rare, using the Team’s methods.


For questions, comments or suggestions concerning this preliminary Notebook report of the e-Discovery Team, please contact Ralph.Losey@gmail.com.

4

andevenencouragedbyvariouscourtsaroundtheworldincludingnumerouscourtsintheUS2Althoughthereisagreementonuseofpredictivecodingthereiscontroversyanddisagreementastothemosteffectivemethodsofuse3ThereareproponentsforavarietyofdifferentmethodstofindtrainingdocumentsforpredictivecodingSomeadvocatefortheuseofchanceselectionaloneothersfortheuseoftoprankeddocumentsaloneothersforacombinationoftoprankedandmid-levelrankeddocumentswhereclassificationisunsure4The-DiscoveryTeamusesamethodthatincludesacombinationofallthreeoftheseselectionprocessesandmoreSomeattorneysandpredictivecodingsoftwarevendorsadvocatefortheuseofpredictivecodingsearchmethodsaloneandforegoothersearchmethodswhentheydososuchaskeywordsearchconceptsearchessimilaritysearchesandlinearreviewThee-DiscoveryTeammembersrejectthatapproachandinsteadadvocateforahybridmultimodalapproachtheycallPredictiveCoding405ThismethodusesanapproachtoactivemachinelearningthattheTeamcallsISTstandingforldquoIntelligentlySpacedTrainingrdquoUnderISTtheattorneyinchargedecidesexactlywhentotrainThisisdifferentfromothersystemswhere

SubjectMatterExpert(s)CodingofaTrainingSetofDocumentsrdquoATechnologyAssistedReviewprocessisdefinedasldquoAprocessforPrioritizingorCodingaCollectionofelectronicDocumentsusingacomputerizedsystemthatharnesseshumanjudgmentsofoneormoreSubjectMatterExpert(s)onasmallersetofDocumentsandthenextrapolatesthosejudgmentstotheremainingDocumentCollectionhellipTARprocessesgenerallyincorporateStatisticalModelsandorSamplingtechniquestoguidetheprocessandtomeasureoverallsystemeffectivenessrdquoAlsoseeTechnology-AssistedReviewinE-DiscoveryCanBeMoreEffectiveandMoreEfficientThanExhaustiveManualReviewRichmondJournalofLawandTechnologyVolXVIIIssue3Article11(2011)2DaSilvaMoorevPublicisGroupe868FSupp2d137(SDNY2012)andnumerouscaseslatercitingtoandfollowingthislandmarkdecisionbyJudgeAndrewPeckincludinganothermorerecentopinionbyJudgePeckRioTintoPLCvValeSA306FRD125(SDNY2015)3GrossmanampCormackEvaluationofMachine-LearningProtocolsforTechnology-AssistedReviewinElectronicDiscoverySIGIRrsquo14July6ndash112014GrossmanampCormackCommentsonldquoTheImplicationsofRule26(g)ontheUseofTechnology-AssistedReviewrdquo7FederalCourtsLawReview286(2014)HerbertRoitblatseriesoffiveOrcaTecblogposts(12345)May-August2014HerbertRoitblatDaubertRule26(g)andtheeDiscoveryTurkeyOrcaTecblogAugust11th2014HickmanampSchienemanTheImplicationsofRule26(g)ontheUseofTechnology-AssistedReview7FEDCTSLREV239(2013)LoseyRPredictiveCoding30partone(e-DiscoveryTeam101115)4IdWebberRandomvsactiveselectionoftrainingexamplesine-discovery(Evaluatinge-Discoveryblog71414)5LoseyRPredictiveCoding40ndashNineKeyPointsofLegalDocumentReviewandanUpdatedStatementofOurWorkflow(e-DiscoveryTeam91216)(PartOneofanEightPartSeriesexplainingtherecentadvancementsfromourPredictiveCodingmethodfromversion30toversion40)

5

themachineretrainsaftereachdocumentiscodedorcertainpredeterminednumberandthehumantrainerhasnodiscretionastotiming6Thee-DiscoveryTeamapproachincludesalltypesofsearchmethods(thusthetermmultimodal)tofindrelevantdocumentswithprimaryrelianceplacedonpredictivecodingTheTeamalsousesavarietyofmethodstofindsuitabletrainingdocumentsforpredictivecodingincludinghighrankingdocumentsandallothersearchmethodsThisisafundamentaldifferencewithothermethodsthatrelyentirelyonpredictivecodingtofindrelevantdocumentsandrelyentirelyuponhigh-rankingdocumentsfortrainingGrossmanandCormackhavescientificallytestedthesehigh-rankingtrainingmethodsandmeasuredtheireffectivenessbutthisdoesnotmeanthattheyendorsethemasanexclusivetoolnorclaimthistobetheirownpreferredmethod7FourResearchQuestions

1 PrimaryQuestion(repeatfrom2015)WhatRecallPrecisionandEffortlevelswillthee-DiscoveryTeamattaininTRECtestconditionsoverallthirty-fourtopicsusingtheTeamrsquosPredictiveCoding40hybridmultimodalsearchmethodsandKrollOntrackrsquossoftwareeDiscoverycomReview(EDR)

2 WhatistheimpactofincorrectSubjectMatterExpert(ldquoSMErdquo)judgmentsbytheTRECassessorsonRecallandPrecision(Unplannedquestionthatunfortunatelyaroseoutofthecircumstancesencountered)

3 WhatisthemosteffectivesearchmethodfromtheTeamrsquosmultimodaltool-setforretrievalofrelevantdocumentsintherelativelysimplisticsearchchallengespresentedbymostbutnotallofthethirty-fourtopics(Unplannedquestionthataroseoutofthecircumstancesencountered)

4 Whatistheroleofactivemachinelearninginretrievalofrelevantdocumentsinthesimplisticsearchchallengespresentedbymostofthethirty-fourtopics(Unplannedquestionrelatedtothethirdissueabovethatalsoaroseoutofthecircumstancesencountered)

OverviewOfTeamParticipationin2016TRECRecallTrack

Thee-DiscoveryTeamparticipatedinallthirty-fouroftheTotalRecallTrackAthometopicsItdidnotparticipateinthefullyautomatedTRECTotalRecallsandboxAllthirty-fourtopicssearchedacollectionofpublicemailsofformerFloridaGovernorJebBushTherewere290099emailsintheJebBushEmailcollectionIntheversionoftheJebBushemailsusedbyTRECalmostallmetadataoftheseemailshasbeenremovedMoreovertheassociated

6ThemeritsoftheTeamrsquosapproachtothetimingofmachinelearningaredetailedinPredictiveCoding40PartTwo7GrossmanampCormackEvaluationofMachine-LearningProtocolsforTechnology-AssistedReviewinElectronicDiscoverySIGIRrsquo14July6ndash112014

6

attachmentsandimageswerenotpresentOthercollectionsoftheJebBushemailexistfromPSTfilesthatincludemoreinformationbuttheTeamdidnotutilizethisinformationandlimiteditseffortsandattentiontotheofficialTRECcollectionThissameJebBushemailcollectionwasusedbytheTotalRecallTrackin2015fortentopicsIn2015LoseysearchedalltenofthesetentopicsNoneofthesesearchtopicswasrepeatedin2016Thethirty-fourtopicssearchedin2016andtheirnamesareshownbelowOnthefarrightcolumnarethefirstnamesofthee-DiscoveryTeammemberwhodidthereviewforthattopicThethirteentopicsinredwereconsideredmandatorybyTRECandtheremainingtwenty-onewereoptionalThee-DiscoveryTeamdidalltopics

Topic Name Reviewer

401 SummerOlympics Ralph402 Space Tony403 BottledWater Ralph404 EminentDomain Tony405 NewtGingrich Ralph406 FelonDisenfranchisement Ralph407 FaithBasedInitiatives Ralph408 InvasiveSpecies Tony409 ClimateChange Levi410 Condominiums Tony411 StandYourGround Ralph412 2000Recount Tony413 JamesVCrosby Jim414 MedicaidReform Tony415 GeorgeWBush Jim416 Marketing Jim417 MovieGallery Ralph418 WarPreparations Tony419 LostFosterChildRilyaWilson Levi420 Billboards Jim421 TrafficCameras Jim422 NonResidentAliens Tony423 NationalRifleAssociation Tony424 GulfDrilling Levi425 CivilRightsActof2003 Ralph426 JeffreyGoldhagen Ralph

7

427 SlotMachines Jim428 NewStadiumsandArenas Levi429 ElianGonzalez Jim430 RestraintsandHelmets Jani431 AgencyCreditRatings Tony432 GayAdoption Jani433 Abstinence Jim434 BacardiTrademark Ralph

RalphLoseydidtentopicsTonyReichenbergerdidtenJimSullivandideightLeviKuehndidfourandJaniGrantzdidtwoUnliketheTeamrsquos2015effortnocontractreviewattorneyswereutilizedonanytopicTheywereallsoloeffortsalthoughtherewassomecoordinationandcommutationsbetweenteammembersontheSMEtypeissuesencounteredThispertainedtoquestionsoftruerelevanceanderrorsfoundinthegoldstandardformostofthesetopicsIneachTopictheassignedTeamattorneypersonallyreadandevaluatedfortruerelevanceeveryemailthatTRECreturnedasarelevantdocumentandeveryemailthatTRECunexpectedlyreturnedasIrrelevantSomeofthesewerereadandstudiedmultipletimesbeforewemadeourfinalcallsontruerelevancedeterminationsthattookintoconsiderationandgavesomedeferencetotheTRECassessoradjudicationsbutwerenotboundbythemManyotheremailsthattheTeammembersconsideredirrelevantandTRECagreedwerealsopersonallyreviewedaspartoftheirsearcheffortsAsmentionedtherewassometimesconsultationsanddiscussionbetweenteammembersastotheunexpectedTRECopinionsonrelevanceAllofthethirty-fourtopicspresentedsearchchallengestotheTeamthatwereeasiersomefareasierthantheTeamtypicallyfaceasattorneysleadinglegaldocumentreviewprojectsTheywereroughlyequivalenttothemostsimplisticchallengesthattheymightfaceinprojectsinvolvingverysimplelegaldisputesAfewofthesearchtopicsincludedlegalissuesmuchmorethanwerefoundinthe2015TotalRecallTrackThisisarevisionthattheTeamrequestedandappreciatedbecauseitallowedtestingoflegaljudgmentandanalysisindeterminationoftruerelevanceinthesetopicsInlegalsearchsuchskillsareobviouslyveryimportantInmostofthe2016TotalRecalltopicshowevernospeciallegaltrainingoranalysiswasrequiredforadeterminationoftruerelevanceTheTeamrsquosfinalreportwillspecificallyidentifyeachtopicandastheTeamdidinits2015TRECreportprovidefulldetailsonthetypesofsearchesperformedforeachtopicanddifficultiesencountered

SummaryoftheTeamrsquosWork

Thee-DiscoveryTeamrsquos2016TotalRecallTrackAthomeprojectstartedJune32016andconcludedonAugust312016Usingasingleexpertreviewerineachtopicthee-DiscoveryTeamclassified9863366documentsin34differentreviewprojects

8

Allattorneysusedthee-DiscoveryTeamrsquosPredictiveCoding40hybridmultimodalISTsearchtechniquesandwereassistedbytheKOsoftwareEDRTheyreliedonactivemachinelearningandothersearchtechniquestofindrelevantdocumentsandeffectivetrainingdocumentsThevarioustypesofsearchesincludedintheTeamrsquosmultimodalapproachareshowninthesearchpyramidbelow

Linear review refers to an SME’s examination of all documents by certain key witnesses in a lawsuit during certain time frames critical to the disputed facts in a lawsuit. Keyword search in our methodology refers to the use of terms originating from legal and document analysis and from witness interviews. Judgmental sampling and verification by SMEs are also used to test the terms before they are used throughout a document collection. Our keyword search also includes a variety of Boolean functions and parametric targeting, wherein searches are limited to certain metadata fields of an electronic document. Similarity and concept searches refer to a variety of passive machine learning analytic search techniques. The AI search at the top of the pyramid refers to the use of active machine learning. The EDR KO software uses a proprietary type of logistic regression algorithm.

The standard eight-step workflow used by the Team in legal search projects is shown in the diagram below.8 To meet the Team’s self-imposed time requirements of completing every review project with minimal time efforts, the standard steps Three and Seven were omitted, as will be further explained. Further, due to the set-up of the TREC experiments, the first step of our workflow, ESI Communications, was severely constrained to the point of being practically meaningless, as will also be further explained. The Team’s standard workflow was thus reduced to five steps, as shown below.

8 Losey, R., Predictive Coding 4.0 – Nine Key Points of Legal Document Review and an Updated Statement of Our Workflow (e-Discovery Team, October 2016), contains a complete description of all eight steps in Parts Six and Seven.
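The parametric Boolean keyword search described above can be illustrated with a minimal Python sketch. It is not the EDR query syntax, which this notebook does not describe. The function name `parametric_search`, the field names, and the sample emails are all hypothetical and chosen only for illustration. The sketch applies a Boolean test (all-of, any-of, none-of terms) to a single named field of each document, which is the essence of limiting a search to certain metadata fields.

```python
def parametric_search(docs, field, all_of=(), any_of=(), none_of=()):
    """Return ids of docs whose `field` text satisfies the Boolean constraints.

    Hypothetical helper for illustration only; not part of EDR or TREC tooling.
    """
    all_of = [t.lower() for t in all_of]
    any_of = [t.lower() for t in any_of]
    none_of = [t.lower() for t in none_of]
    hits = []
    for doc in docs:
        # Tokenize only the targeted field (simple whitespace split).
        words = set(doc.get(field, "").lower().split())
        if not all(term in words for term in all_of):
            continue  # AND: every required term must be present
        if any_of and not any(term in words for term in any_of):
            continue  # OR: at least one optional term must be present
        if any(term in words for term in none_of):
            continue  # NOT: no excluded term may be present
        hits.append(doc["id"])
    return hits


# Invented sample emails (not taken from the Jeb Bush collection).
emails = [
    {"id": "e1", "subject": "Condominium insurance rates", "body": "please review the new rates"},
    {"id": "e2", "subject": "Lunch", "body": "condominium meeting"},
    {"id": "e3", "subject": "Condominium newsletter spam", "body": ""},
]

hits = parametric_search(emails, "subject", all_of=["condominium"], none_of=["spam"])
print(hits)  # ['e1']
```

In practice the terms themselves would first be tested by judgmental sampling, as the text describes; the sketch covers only the field-limited Boolean matching step.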

In the first step of ESI Communications, team members on a legal review project typically spend hours in discussion and analysis of scope of relevance and the target documents. The communications often include hundreds of written exchanges, both informal, such as emails and chats, and formal, such as: (1) detailed requests for information contained in court documents such as subpoenas or Request For Production; (2) input from a qualified SME, who is typically a legal expert with deep knowledge of the factual issues in the case, and thus deep knowledge of what the presiding judge in the legal proceeding will hold to be relevant and discoverable; and (3) dialogues with the party requesting the production of documents to clarify the search target, and other parties. The ESI communications may lead to formal motions with the governing court, legal memorandums, hearings before the presiding judge, and opinions rendered by one or more judges on the scope of relevance.9

9 Id. at Part Six, wherein the first step of ESI Communications is explained in detail.

The only ESI communications in the TREC experimental set-up were very short, one-sentence descriptions of relevance for each topic. Two topics had a two-sentence description (410 - Condominiums and 423 - National Rifle Association). The only other type of ESI communications in this TREC Track were the automated instant returns on all documents submitted as to whether TREC considered them to be relevant or not. There were no appeals or other procedures set up for Athome participants, who actually examined the documents for true relevance, to challenge obvious errors in judgment.

Short Answers to Research Questions

Research Question 1 (Primary Question)

What Recall, Precision and Effort levels will the e-Discovery Team attain in TREC test conditions over all thirty-four Topics using the Team’s Predictive Coding 4.0 hybrid multimodal IST search methods and Kroll Ontrack’s software, eDiscovery Review (EDR)?

Short Answer to Primary Question: Again, like last year, the Team attained excellent results, with high levels of Recall and Precision in all topics, including perfect or near perfect results in several topics using the corrected gold standard. The Team was able to do so even though it only used five of the eight steps in its usual methodology, and even though it intentionally severely constrained the amount of human effort expended on each topic. The Team’s enthusiasm for the results, which were significantly better than its 2015 effort, is tempered by the fact that the search challenges presented in most of the topics in 2016 were not difficult. As mentioned, they were equivalent to an easy legal search project, such as a simple single-plaintiff employment law dispute. The Final Report will include a detailed analysis of these results.

Research Question 2

What is the impact of multiple errors in SME judgments by the TREC assessors on Recall and Precision?

Short Answer: The impact on Recall and Precision using the Team’s method is significant and, as you would expect, varied according to the number of errors made by TREC assessors in a particular topic. After the Team encountered numerous errors on the first topics undertaken, it was forced to create its own gold standard of true relevant documents for each topic. The Team’s new gold standard corrected for the obvious errors seen in TREC’s assessments of relevance. In all close questions on relevance, the judgment of TREC’s assessors was accepted as accurate. The obvious errors and inconsistencies seen by the Team’s close study of the documents were not accepted. In most, but not all, topics the Team did not use the documents with obvious errors for its machine training. This will be further detailed in the Final Report. In all topics the Team created its own standard and made comparative recall, precision and F1 calculations based thereon.

The observation and correction of TREC errors in the gold standard became a collaborative effort among the Team to peer review and verify our corrected standard. Most of these efforts, many of which occurred after the conclusion of the Track in August, were not included in the time reports of efforts expended by attorneys in the search.

The Team was very reluctant to take this step and would certainly have let pass a few errors or mere differences of opinion. We recognize that no standard is ever perfect. As lawyers, the Team understands all too well that some, perhaps many, judgments on relevance are subjective. Again, in all close questions on relevance the judgments of TREC’s assessors were accepted, even though we personally disagreed. The Team means no disrespect by the creation of an alternate gold standard. We appreciate and respect the efforts made by the TREC assessors and organizers. Still, the volume of obvious errors encountered forced us to take this action. The integrity of our primary research question, to test the effectiveness of our hands-on type of ad hoc hybrid methods, demanded that we do so.

We understand that the impact on other Total Recall participants, ones that never actually examine documents, would be far less, perhaps even negligible. Still, there could be an impact even for them in some topics where more than an insignificant number of the same or similar documents were inconsistently judged.

The decision to not accept the errors seen, and to instead create our own gold standard, resulted in substantial additional work for the Team. In some topics we even took the step of making two “reasonable calls.” One was for TREC, and the second call, which always took place on the next submission, was for our own internal tracking. In the second call we would include emails that we knew, from prior submissions of the same or similar document, would again be incorrectly considered irrelevant by TREC. We knew they were true relevant and so waited until after our public reasonable call to TREC to submit them, and then we made our own internal reasonable call. We were attempting to, in effect, play two games at once and maximize our score in each game. Keeping track of two standards added an unexpected layer of difficulty to our work, and we did not bother to do so in all topics. The dual-call topics will be specifically identified in our Final Report.

In some topics the difference between the two standards was substantial. In a few topics it was minor. Some differences were found in all topics. This is not unexpected in any standard involving at least somewhat subjective mass relevance adjudications. We do not intend to engage in a criticism of the specific gold standard creation methods used in the 2016 Total Recall Track, except to note that the appeals procedure included in the 2008 and 2009 TREC Legal Tracks could have improved the accuracy of the results for the Total Recall Track Athome participants.10 Further, the Team understands from informal reports that the TREC assessors’ work was much more time constrained than was the work of the Team. Moreover, unlike the Team, the TREC assessors did not have the benefit of SME input from a native Floridian lawyer (Losey) who was familiar with Florida politics and Governor Bush, and since 2015 had put substantial time into reviewing this email collection.

The Final Report will include a detailed comparison of recall, precision and F1 based on the comparison of both the TREC and Team assessments. A few examples of the more egregious errors encountered will be provided. The Final Report may also contain a complete listing of the revised gold standards that the Team created for each topic, or at least a conditional offer of disclosure of the corrected standards. The Team invites input from other participants and organizers of the Total Recall Track on this issue. Again, the Team recognizes that no gold standard is ever perfect, including its own revised standards. This will be set forth in further detail in the Team’s final report.

Research Question 3

What is the most effective search method from the Team’s multimodal tool-set for retrieval of relevant documents for the relatively simplistic search challenges presented by most of the thirty-four topics?

Short Answer: For the easy topics the Team found that what it calls “tested, parametric Boolean keyword search” was the most effective search method to find relevant documents. The Team was surprised by how well a sophisticated use of keywords was able to identify nearly all of the target relevant documents in many of the topics in this year’s Total Recall Track. This shows the continued importance of a multimodal approach to legal search, including especially keyword search when done properly, especially in simple lawsuits involving relatively easy search issues. This will be set forth in further detail in the Team’s final report.

Research Question 4

What is the role of active machine learning in retrieval of relevant documents in the simplistic search challenges presented by most of the thirty-four topics?

Short Answer: The Team found that for the relatively easy topics in this year’s Total Recall Track, the role of active machine learning was reduced to a quality control function. It would find a few relevant documents not located by keyword search, or concept and similarity search, and thus improve recall somewhat. In the simplest topics active machine learning did not find any new relevant documents, but instead only confirmed that all relevant documents had already been found by the other methods. This will be set forth in further detail in the Team’s final report.

10 Participant appeal rights could have mitigated the errors seen in 2016, but this can be burdensome and, as seen in those Tracks in 2008 and 2009, can create their own issues. See Oard, Hedlin, Tomlinson, Baron, Overview of the TREC 2008 Legal Track, found at http://trec.nist.gov/pubs/trec17/papers/LEGAL.OVERVIEW08.pdf, and Oard, Hedlin, Tomlinson, Baron, Oard, Overview of the TREC 2009 Legal Track, found at http://trec.nist.gov/pubs/trec18/papers/LEGAL09.OVERVIEW.pdf.

Further Discussion of Research Question 1

Even using the given, uncorrected TREC standard for scoring, and even though in most topics we did not train on the TREC returned-relevant documents that the Team considered irrelevant, the Team overall still attained excellent results. Under the corrected standard, which will be shared in the Final Report, the results were much better. The following chart compares the Team’s Recall, Precision and F-Measure for each Athome topic with the results obtained by TREC’s BMI and BMI-Desc runs (only other scores now available).

REASONABLE COMPARISON

                                           Recall (%)                    Precision (%)                 F-Measure (%)
Topic      Name                                  Team       BMI  BMI-Desc      Team       BMI  BMI-Desc      Team       BMI  BMI-Desc
athome401  Summer Olympics                      41.05     91.70     92.58     73.44     15.31     15.45     52.66     26.23     26.48
athome402  Space                                72.57     91.07     90.28     22.04     30.86     30.59     33.81     46.09     45.70
athome403  Bottled Water                         7.16     97.71     97.71     80.41     37.49     37.49     13.14     54.18     54.18
athome404  Eminent Domain                       22.94     91.74     91.93     64.43     26.55     26.61     33.83     41.19     41.27
athome405  Newt Gingrich                        95.08     99.18     98.36     28.09      9.82      9.74     43.36     17.87     17.73
athome406  Felon Disenfran                      73.23     92.91     92.91     66.91      9.58      9.58     69.92     17.37     17.37
athome407  Faith Based Initiatives              31.02     91.80     91.99     68.72     41.86     41.95     42.75     57.50     57.62
athome408  Invasive Species                     55.17     83.62     83.62     64.65      7.87      7.87     59.53     14.39     14.39
athome409  Climate Change                       84.65     95.05     94.06     40.71     13.99     13.85     54.98     24.40     24.14
athome410  Condominiums                         95.10     99.48     99.03     46.13     42.59     42.40     62.12     59.64     59.38
athome411  Stand Your Ground                    66.29     70.79     84.27     67.05      5.70      6.09     66.67     10.55     11.36
athome412  2000 Recount                         57.38     91.35     92.48     49.18     40.97     41.48     52.96     56.57     57.27
athome413  James V Crosby                       96.34     99.08     99.27     89.00     28.73     28.78     92.52     44.55     44.63
athome414  Medicaid Reform                      91.66     96.90     97.26     35.32     35.10     35.23     51.01     51.54     51.73
athome415  George W Bush                        94.08     63.39     67.08     91.04     61.09     58.66     92.53     62.22     62.59
athome416  Marketing                            60.30     94.19     95.57     42.08     43.32     43.96     49.57     59.35     60.22
athome417  Movie Gallery                        99.61     99.81     99.66     99.38     57.28     57.19     99.49     72.79     72.67
athome418  War Preparations                     39.57     93.05     93.58     50.34     12.68     12.76     44.31     22.32     22.45
athome419  Lost Foster Child Rilya Wilson       98.84     93.06     93.61     15.04     48.13     48.41     26.10     63.44     63.82
athome420  Billboards                           92.54     99.46     99.32     92.16     31.65     31.61     92.35     48.02     47.95
athome421  Traffic Cameras                      90.48    100.00    100.00     12.50      1.90      1.90     21.97      3.73      3.73
athome422  Non Resident Aliens                  93.55    100.00    100.00      0.90      2.81      2.81      1.79      5.46      5.46
athome423  National Rifle Association           51.05     99.65     99.65     33.18     18.68     18.68     40.22     31.46     31.46
athome424  Gulf Drilling                        99.60    100.00    100.00     22.76     26.39     26.39     37.05     41.76     41.76
athome425  Civil Rights Act 2003                91.32     98.60     98.60     96.59     33.70     33.70     93.88     50.23     50.23
athome426  Jeffrey Goldhagen                    70.00     94.17     94.17     87.50      9.17      9.17     77.78     16.72     16.72
athome427  Slot Machines                        89.21     96.68     96.68     35.77     16.98     16.98     51.07     28.89     28.89
athome428  New Stadiums                         93.10     98.49     98.49     17.81     26.95     26.95     29.91     42.31     42.31
athome429  Elian Gonzalez                       94.20     99.27     99.27     92.41     35.45     35.45     93.29     52.24     52.24
athome430  Restraints and Helmets               71.95     94.25     94.65     65.00     36.40     36.55     68.30     52.52     52.74
athome431  Agency Credit Rate                   75.69     99.31     99.31     47.60     11.61     11.61     58.45     20.78     20.78
athome432  Gay Adoption                         85.00     98.57     98.57     86.23     11.20     11.20     85.61     20.12     20.12
athome433  Abstinence                           99.11    100.00    100.00     66.07      9.09      9.09     79.29     16.67     16.67
athome434  Bacardi Trademark                    86.84    100.00    100.00     91.67      3.44      3.44     89.19      6.65      6.65

These comparative statistics show the scores at the time of reasonable call. In the precision category, which in Legal Search is the money shot that has the greatest impact on the cost of a document review project, the e-Discovery Team dominated. It had the highest precision level on 28 of the 34 topics (82%). They are highlighted in blue in the above chart. The e-Discovery Team’s average precision score was 57.1%. The average precision of both BMI and BMI-Desc was 24.8%. Thus the Team’s precision score was on average more than two and a quarter times higher than that of the BMI standards.

[Figure: Average Precision Across Topics. e-Discovery Team 57.12; BMI 24.83; BMI-Desc 24.81]

In the F1-measure, which is the standard value used in legal search to evaluate overall precision and recall of a project, the e-Discovery Team again dominated. This is somewhat surprising in view of the fact that these measurements were based on the error-filled TREC standard. The Team had the highest F1 scores on 23 of the 34 topics (68%). They are highlighted in blue in the above chart. The e-Discovery Team’s average F1 score was 57.69%. The average F1 of BMI and BMI-Desc was 36.5%. Thus the Team’s F1 score was on average more than 58% higher than that of the BMI standards.
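The F1 and ratio figures quoted above can be checked with a few lines of arithmetic. The following is a minimal Python sketch, assuming the chart values are percentages with two decimal places (for example, 41.05% recall and 73.44% precision for the Team on topic 401). The function names are illustrative and are not part of any TREC tooling.

```python
def f1_score(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision (any consistent unit, e.g. percent)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


def relative_gain(team: float, baseline: float) -> float:
    """Fractional improvement of `team` over `baseline`."""
    return (team - baseline) / baseline


# Topic 401 (Summer Olympics), Team run: recall 41.05, precision 73.44.
print(round(f1_score(41.05, 73.44), 2))  # 52.66

# Averages across topics, as stated in the text and charts.
precision_ratio = 57.12 / 24.83           # Team vs. BMI average precision
f1_gain = relative_gain(57.69, 36.5)      # Team vs. BMI average F1

print(round(precision_ratio, 2))  # 2.3
print(round(f1_gain, 2))          # 0.58
```

Because the chart figures were presumably computed from raw document counts, recomputing F1 from the rounded recall and precision values may differ from the chart in the last decimal place for some topics.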

Even using TREC’s challenged standard, the Team still attained higher recall than both the BMI and BMI-Desc standards on two topics: topic 415 George Bush, with a score of 94.08%, and topic 419 Lost Foster Child Rilya Wilson, with a score of 98.84%. Moreover, the Team attained recall levels in excess of 90% at the time of reasonable call in the following additional topics:

• 95.08% on topic 405 Newt Gingrich
• 95.10% on topic 410 Condominiums
• 96.34% on topic 413 James V Crosby
• 99.61% on topic 417 Movie Gallery
• 92.54% on topic 420 Billboards
• 90.48% on topic 421 Traffic Cameras
• 93.55% on topic 422 Non Resident Aliens
• 99.60% on topic 424 Gulf Drilling
• 91.32% on topic 425 Civil Rights Act of 2003
• 93.10% on topic 428 New Stadiums and Arenas
• 94.20% on topic 429 Elian Gonzalez
• 99.11% on topic 433 Abstinence

[Figure: Average F-Measure Across Topics. e-Discovery Team 57.69; BMI 36.46; BMI-Desc 36.55]

In summary, even with the TREC standard, where in most topics the Team did not use all documents returned as relevant for all of its training documents, it attained Recall scores greater than 90% in fourteen of the thirty-four topics. The Team attained Recall scores of 80% or higher in four additional topics. The average results obtained across all thirty-four topics at the time of reasonable call were as follows:

• 75.46% Recall
• 57.12% Precision
• 57.69% F1
• 121 Docs Reviewed Effort

The Team will disclose all of its scores under the corrected gold standard in the Final Report. In the meantime, here are the average results obtained under the corrected standard across all thirty-four topics at the time of reasonable call:

• 87.15% Recall
• 64.94% Precision
• 68.74% F1
• 124 Docs Reviewed Effort

At the time of reasonable call the Team had recall scores greater than 90% in twenty-one of the thirty-four topics and greater than 80% in five more topics. Recall of greater than 99% was attained in seven topics. At the time of reasonable call the Team had precision scores greater than 90% in thirteen of the thirty-four topics and greater than 80% in two more topics. Precision of greater than 98% was attained in six topics. At the time of reasonable call the Team had F1 scores greater than 90% in twelve of the thirty-four topics and greater than 80% in one more topic. F1 of greater than 97% was attained in five topics.

We were lucky to attain one perfect score, as we did in 2015, in topic 417, with an F1 score of 100%. The perfect score was obtained by locating all 5,945 documents relevant under the corrected standard after reviewing only 45 documents. This topic was filled with form letters and was a fairly simple search. Still, the BMI and BMI-Desc F1 scores for this topic were both under 73%. The Team was pleased to prove once again that perfect recall and perfect precision are possible, albeit rare, using the Team’s methods.
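The effect of scoring the same submission against two different gold standards, as in the TREC and corrected figures above, can be sketched in Python. This is a toy illustration under stated assumptions: the document IDs and both gold sets are invented, and the helper name `score` is hypothetical. It does not reproduce the Team’s actual standards or results.

```python
def score(submitted: set, gold: set) -> dict:
    """Recall, precision and F1 of a submitted set against one gold standard."""
    true_positives = len(submitted & gold)
    recall = true_positives / len(gold) if gold else 0.0
    precision = true_positives / len(submitted) if submitted else 0.0
    total = recall + precision
    f1 = 2 * recall * precision / total if total else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


# Invented document IDs, for illustration only.
submitted = {"d1", "d2", "d3", "d4"}        # documents submitted at reasonable call
trec_gold = {"d1", "d2", "d5", "d6"}        # assessor standard (d3 judged irrelevant)
corrected_gold = {"d1", "d2", "d3", "d5"}   # revised standard (d3 relevant, d6 not)

trec_scores = score(submitted, trec_gold)
corrected_scores = score(submitted, corrected_gold)

print(trec_scores["recall"], trec_scores["precision"])            # 0.5 0.5
print(corrected_scores["recall"], corrected_scores["precision"])  # 0.75 0.75
```

Changing the judgment on only two documents moves every metric here, which is why the number of disputed judgments in a topic determines how far the two sets of scores diverge.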


For questions, comments or suggestions concerning this preliminary Notebook report of the e-Discovery Team, please contact Ralph.Losey@gmail.com.


the machine retrains after each document is coded, or certain predetermined number, and the human trainer has no discretion as to timing.6 The e-Discovery Team approach includes all types of search methods (thus the term multimodal) to find relevant documents, with primary reliance placed on predictive coding. The Team also uses a variety of methods to find suitable training documents for predictive coding, including high ranking documents and all other search methods. This is a fundamental difference with other methods that rely entirely on predictive coding to find relevant documents, and rely entirely upon high-ranking documents for training. Grossman and Cormack have scientifically tested these high-ranking training methods and measured their effectiveness, but this does not mean that they endorse them as an exclusive tool, nor claim this to be their own preferred method.7

Four Research Questions

1. Primary Question (repeat from 2015): What Recall, Precision and Effort levels will the e-Discovery Team attain in TREC test conditions over all thirty-four topics using the Team’s Predictive Coding 4.0 hybrid multimodal search methods and Kroll Ontrack’s software, eDiscovery.com Review (EDR)?

2. What is the impact of incorrect Subject Matter Expert (“SME”) judgments by the TREC assessors on Recall and Precision? (Unplanned question that unfortunately arose out of the circumstances encountered.)

3. What is the most effective search method from the Team’s multimodal tool-set for retrieval of relevant documents in the relatively simplistic search challenges presented by most, but not all, of the thirty-four topics? (Unplanned question that arose out of the circumstances encountered.)

4. What is the role of active machine learning in retrieval of relevant documents in the simplistic search challenges presented by most of the thirty-four topics? (Unplanned question related to the third issue above that also arose out of the circumstances encountered.)

Overview Of Team Participation in 2016 TREC Recall Track

The e-Discovery Team participated in all thirty-four of the Total Recall Track Athome topics. It did not participate in the fully automated TREC Total Recall sandbox. All thirty-four topics searched a collection of public emails of former Florida Governor Jeb Bush. There were 290,099 emails in the Jeb Bush Email collection. In the version of the Jeb Bush emails used by TREC, almost all metadata of these emails has been removed. Moreover, the associated attachments and images were not present. Other collections of the Jeb Bush email exist from PST files that include more information, but the Team did not utilize this information and limited its efforts and attention to the official TREC collection.

6 The merits of the Team’s approach to the timing of machine learning are detailed in Predictive Coding 4.0, Part Two.
7 Grossman & Cormack, Evaluation of Machine-Learning Protocols for Technology-Assisted Review in Electronic Discovery, SIGIR ’14, July 6–11, 2014.

This same Jeb Bush email collection was used by the Total Recall Track in 2015 for ten topics. In 2015 Losey searched all ten of these topics. None of these search topics was repeated in 2016. The thirty-four topics searched in 2016 and their names are shown below. In the far right column are the first names of the e-Discovery Team member who did the review for that topic. The thirteen topics in red were considered mandatory by TREC, and the remaining twenty-one were optional. The e-Discovery Team did all topics.

Topic  Name                             Reviewer
401    Summer Olympics                  Ralph
402    Space                            Tony
403    Bottled Water                    Ralph
404    Eminent Domain                   Tony
405    Newt Gingrich                    Ralph
406    Felon Disenfranchisement         Ralph
407    Faith Based Initiatives          Ralph
408    Invasive Species                 Tony
409    Climate Change                   Levi
410    Condominiums                     Tony
411    Stand Your Ground                Ralph
412    2000 Recount                     Tony
413    James V Crosby                   Jim
414    Medicaid Reform                  Tony
415    George W Bush                    Jim
416    Marketing                        Jim
417    Movie Gallery                    Ralph
418    War Preparations                 Tony
419    Lost Foster Child Rilya Wilson   Levi
420    Billboards                       Jim
421    Traffic Cameras                  Jim
422    Non Resident Aliens              Tony
423    National Rifle Association       Tony
424    Gulf Drilling                    Levi
425    Civil Rights Act of 2003         Ralph
426    Jeffrey Goldhagen                Ralph
427    Slot Machines                    Jim
428    New Stadiums and Arenas          Levi
429    Elian Gonzalez                   Jim
430    Restraints and Helmets           Jani
431    Agency Credit Ratings            Tony
432    Gay Adoption                     Jani
433    Abstinence                       Jim
434    Bacardi Trademark                Ralph

RalphLoseydidtentopicsTonyReichenbergerdidtenJimSullivandideightLeviKuehndidfourandJaniGrantzdidtwoUnliketheTeamrsquos2015effortnocontractreviewattorneyswereutilizedonanytopicTheywereallsoloeffortsalthoughtherewassomecoordinationandcommutationsbetweenteammembersontheSMEtypeissuesencounteredThispertainedtoquestionsoftruerelevanceanderrorsfoundinthegoldstandardformostofthesetopicsIneachTopictheassignedTeamattorneypersonallyreadandevaluatedfortruerelevanceeveryemailthatTRECreturnedasarelevantdocumentandeveryemailthatTRECunexpectedlyreturnedasIrrelevantSomeofthesewerereadandstudiedmultipletimesbeforewemadeourfinalcallsontruerelevancedeterminationsthattookintoconsiderationandgavesomedeferencetotheTRECassessoradjudicationsbutwerenotboundbythemManyotheremailsthattheTeammembersconsideredirrelevantandTRECagreedwerealsopersonallyreviewedaspartoftheirsearcheffortsAsmentionedtherewassometimesconsultationsanddiscussionbetweenteammembersastotheunexpectedTRECopinionsonrelevanceAllofthethirty-fourtopicspresentedsearchchallengestotheTeamthatwereeasiersomefareasierthantheTeamtypicallyfaceasattorneysleadinglegaldocumentreviewprojectsTheywereroughlyequivalenttothemostsimplisticchallengesthattheymightfaceinprojectsinvolvingverysimplelegaldisputesAfewofthesearchtopicsincludedlegalissuesmuchmorethanwerefoundinthe2015TotalRecallTrackThisisarevisionthattheTeamrequestedandappreciatedbecauseitallowedtestingoflegaljudgmentandanalysisindeterminationoftruerelevanceinthesetopicsInlegalsearchsuchskillsareobviouslyveryimportantInmostofthe2016TotalRecalltopicshowevernospeciallegaltrainingoranalysiswasrequiredforadeterminationoftruerelevanceTheTeamrsquosfinalreportwillspecificallyidentifyeachtopicandastheTeamdidinits2015TRECreportprovidefulldetailsonthetypesofsearchesperformedforeachtopicanddifficultiesencountered

SummaryoftheTeamrsquosWork

Thee-DiscoveryTeamrsquos2016TotalRecallTrackAthomeprojectstartedJune32016andconcludedonAugust312016Usingasingleexpertreviewerineachtopicthee-DiscoveryTeamclassified9863366documentsin34differentreviewprojects

8

Allattorneysusedthee-DiscoveryTeamrsquosPredictiveCoding40hybridmultimodalISTsearchtechniquesandwereassistedbytheKOsoftwareEDRTheyreliedonactivemachinelearningandothersearchtechniquestofindrelevantdocumentsandeffectivetrainingdocumentsThevarioustypesofsearchesincludedintheTeamrsquosmultimodalapproachareshowninthesearchpyramidbelow

LinearreviewreferstoanSMErsquosexaminationofalldocumentsbycertainkeywitnessesinalawsuitduringcertaintimeframescriticaltothedisputedfactsinalawsuitKeywordsearchinourmethodologyreferstotheuseoftermsoriginatingfromlegalanddocumentanalysisandfromwitnessinterviewsJudgmentalsamplingandverificationbySMEsarealsousedtotestthetermsbeforetheyareusedthroughoutadocumentcollectionOurkeywordsearchalsoincludesavarietyofBooleanfunctionsandparametrictargetingwhereinsearchesarelimitedtocertainmetadatafieldsofanelectronicdocumentSimilarityandconceptsearchesrefertoavarietyofpassivemachinelearninganalyticsearchtechniquesTheAIsearchatthetopofthepyramidreferstotheuseofactivemachinelearningTheEDRKOsoftwareusesaproprietarytypeoflogisticregressionalgorithmThestandardeight-stepworkflowusedbytheTeaminlegalsearchprojectsisshowninthediagrambelow8TomeettheTeamrsquosselfimposedtimerequirementsofcompletingeveryreviewprojectwithminimaltimeeffortsthestandardstepsThreeandSevenwereomitted

8LoseyRPredictiveCoding40ndashNineKeyPointsofLegalDocumentReviewandanUpdatedStatementofOurWorkflow(e-DiscoveryTeamOctober2016)containsacompletedescriptionofalleightstepsinpartsSixandSeven

9

aswillbefurtherexplainedFurtherduetotheset-upoftheTRECexperimentsthefirststepofourworkflowESICommunicationswasseverelyconstrainedtothepointofbeingpracticallymeaninglessaswillalsobefurtherexplainedTheTeamrsquosstandardworkflowwasthusreducedtofivestepsasshownbelow

InthefirststepofESICommunicationsteammembersonalegalreviewprojecttypicallyspendhoursindiscussionandanalysisofscopeofrelevanceandthetargetdocumentsThecommunicationsoftenincludehundredsofwrittenexchangesbothinformalsuchasemailsandchatsandformalsuchas(1)detailedrequestsforinformationcontainedincourtdocumentssuchasubpoenasorRequestForProduction(2)inputfromaqualifiedSMEwhoistypicallyalegalexpertwithdeepknowledgeofthefactualissuesinthecaseandthusdeepknowledgeofwhatthepresidingjudgeinthelegalproceedingwillholdtoberelevantanddiscoverableand(3)dialogueswiththepartyrequestingtheproductionofdocumentstoclarifythesearchtargetandotherpartiesTheESIcommunicationsmayleadtoformalmotionswiththegoverningcourtlegalmemorandumshearingsbeforethepresidingjudgeandopinionsrenderedbyoneormorejudgesonthescopeofrelevance9

9IdatPartSixwhereinthefirststepofESICommunicationsisexplainedindetail

10

TheonlyESIcommunicationsintheTRECexperimentalset-upwasaveryshortonesentencedescriptionofrelevanceforeachtopicTwotopicshadatwo-sentencedescription(410-Condominiumsand423-NationalRifleAssociation)TheonlyothertypeofESIcommunicationsinthisTRECTrackweretheautomatedinstantreturnsofalldocumentssubmittedastowhetherTRECconsideredthemtoberelevantornotTherewerenoappealsorotherproceduresset-upforAthomeparticipantswhoactuallyexaminedthedocumentsfortruerelevancetochallengeobviouserrorsinjudgment

ShortAnswerstoResearchQuestionsResearchQuestion1(PrimaryQuestion)WhatRecallPrecisionandEffortlevelswillthee-DiscoveryTeamattaininTRECtestconditionsoverallthirty-fourTopicsusingtheTeamrsquosPredictiveCoding40hybridmultimodalISTsearchmethodsandKrollOntrackrsquossoftwareeDiscoveryReview(EDR)ShortAnswertoPrimaryQuestionAgainlikelastyeartheTeamattainedexcellentresultswithhighlevelsofRecallandPrecisioninalltopicsincludingperfectornearperfectresultsinseveraltopicsusingthecorrectedgoldstandardTheTeamwasabletodosoeventhoughitonlyusedfiveoftheeightstepsinitsusualmethodologyandeventhoughitintentionallyseverelyconstrainedtheamountofhumaneffortexpendedoneachtopicTheTeamrsquosenthusiasmfortheresultswhichweresignificantlybetterthanits2015effortistemperedbythefactthatthesearchchallengespresentedinmostofthetopicsin2016werenotdifficultAsmentionedtheywereequivalenttoaneasylegalsearchprojectsuchasasimplesingleplaintiffemploymentlawdisputeTheFinalReportwillincludeadetailedanalysisoftheseresultsResearchQuestion2WhatistheimpactofmultipleerrorsinSMEjudgmentsbytheTRECassessorsonRecallandPrecisionShortAnswerTheimpactonRecallandPrecisionusingtheTeamrsquosmethodissignificantandasyouwouldexpectvarieddeterminedtothenumberoferrorsmadebyTRECassessorsinaparticulartopicAftertheTeamencounterednumerouserrorsonthefirsttopicsundertakenitwasforcedtocreateitsowngoldstandardoftruerelevantdocumentsforeachtopicTheTeamrsquosnewgoldstandardcorrectedfortheobviouserrorsseeninTRECrsquosassessmentsofrelevanceInallclosequestionsonrelevancethejudgmentofTRECrsquosassessorswasacceptedasaccurateTheobviouserrorsandinconsistenciesseenbytheTeamrsquosclosestudyofthedocumentswerenotacceptedInmostbutnotalltopicstheTeamdidnotusethedocumentswithobviouserrorsforitsmachinetrainingThiswillbefurtherdetailedintheFinalReportInalltopicstheTeamcreateditsownstandardandmadecomparativerecallprecisionandF1calculationsbasedthereonTheobservationandcorrectionofTRECerrorsingoldstandardbecameacollaborativeeffortamongtheTeamtopeerreviewandverif
yourcorrected

11

standardMostoftheseeffortsmanyofwhichoccurredaftertheconclusionoftheTrackinAugustwerenotincludedinthetimereportsofeffortsexpendedbyattorneysinthesearchTheTeamwasveryreluctanttotakethisstepandwouldcertainlyhaveletpassafewerrorsormeredifferencesofopinionWerecognizethatnostandardiseverperfectAslawyerstheTeamunderstandsalltoowellthatsomeperhapsmanyjudgmentsonrelevancearesubjectiveAgaininallclosequestionsonrelevancethejudgmentsofTRECrsquosassessorswereacceptedeventhoughwepersonallydisagreedTheTeammeansnodisrespectbythecreationofanalternategoldstandardWeappreciateandrespecttheeffortsmadebytheTRECassessorsandorganizersStillthevolumeofobviouserrorsencounteredforcedustotakethisactionTheintegrityofourprimaryresearchquestiontotesttheeffectivenessofourhands-ontypeofadhochybridmethodsdemandedthatwedosoWeunderstandthattheimpactonotherTotalRecallParticipantsonesthatneveractuallyexaminedocumentswouldbefarlessperhapsevennegligibleStilltherecouldbeanimpactevenfortheminsometopicswheremorethananinsignificantnumberofthesameorsimilardocumentswereinconsistentlyjudgedThedecisiontonotaccepttheerrorsseenandtoinsteadcreateourowngoldstandardresultedinsubstantialadditionalworkfortheTeamInsometopicsweeventookthestepofmakingtwoldquoreasonablecallsrdquoOnewasforTRECandthesecondcallwhichalwaystookplaceonthenextsubmissionwasforourowninternaltrackingInthesecondcallwewouldincludeemailsthatweknewfrompriorsubmissionsofthesameorsimilardocumentwouldagainbeincorrectlyconsideredirrelevantbyTRECWeknewtheyweretruerelevantandsowaiteduntilafterourpublicreasonablecalltoTRECtosubmitthemandthenwemakeourowninternalreasonablecallWewereattemptingtoineffectplaytwogamesatonceandmaximizeourscoreineachgameKeepingtrackoftwostandardsaddedanunexpectedlayerofdifficultytoourworkandwedidnotbothertodosoinalltopicsThedual-calltopicswillbespecificallyidentifiedinourFinalReportInsometopicsthedifferencebetweenthetwostandardswassubstantialInafewtopicsitwasminorSomedifferenceswerefoundinalltopicsThisisnotunexpectedinanystandardinvolv
ingatleastsomewhatsubjectivemassrelevanceadjudicationsWedonotintendtoengageinacriticismofthespecificgoldstandardcreationmethodsusedin2016TotalRecallTrackexcepttonotethattheappealsprocedureincludedinthe2008and2009TRECLegalTrackscouldhaveimprovedtheaccuracyoftheresultsfortheTotalRecallTrackAthomeparticipants10FurthertheTeamunderstandsfrominformalreportsthattheTREC

10Participantappealrightscouldhavemitigatedtheerrorsseenin2016butthiscanbeburdensomeandasseeninthoseTracksin2008and2009cancreatetheirownissuesSeeOardHedlinTomlinsonBaronOverviewoftheTREC2008LegalTrackfoundathttptrecnistgovpubstrec17papersLEGALOVERVIEW08pdfandOardHedlinTomlinson

12

assessorsworkwasmuchmoretimeconstrainedthanwastheworkoftheTeamMoreoverunliketheTeamtheTRECassessorsdidnothavethebenefitofSMEinputfromanativeFloridianlawyer(Losey)whowasfamiliarwithFloridapoliticsandGovernorBushandsince2015hadputsubstantialtimereviewingthisemailcollectionTheFinalReportwillincludeadetailedcomparisonofrecallprecisionandF1basedonthecomparisonofboththeTRECandTeamassessmentsAfewexamplesofthemoreegregiouserrorsencounteredwillbeprovidedTheFinalReportmayalsocontainacompletelistingoftherevisedgoldstandardsthattheTeamcreatedforeachtopicoratleastaconditionalofferofdisclosureofthecorrectedstandardsTheTeaminvitesinputfromotherparticipantsandorganizersoftheTotalRecallTrackonthisissueAgaintheTeamrecognizesthatnogoldstandardiseverperfectincludingitsownrevisedstandardsThiswillbesetforthinfurtherdetailintheTeamrsquosfinalreportResearchQuestion3WhatisthemosteffectivesearchmethodfromtheTeamrsquosmultimodaltool-setforretrievalofrelevantdocumentsfortherelativelysimplisticsearchchallengespresentedbymostofthethirty-fourtopicsShortAnswerFortheeasytopicstheTeamfoundthatwhatitcallsldquotestedparametricBooleankeywordsearchrdquowasthemosteffectivesearchmethodtofindrelevantdocumentsTheTeamwassurprisedbyhowwellasophisticateduseofkeywordswasabletoidentifynearlyallofthetargetrelevantdocumentsinmanyofthetopicsinthisyearrsquosTotalRecallTrackThisshowsthecontinuedimportanceofamultimodalapproachtolegalsearchincludingespeciallykeywordsearchwhendoneproperlyespeciallyinsimplelawsuitsinvolvingrelativelyeasysearchissuesThiswillbesetforthinfurtherdetailintheTeamrsquosfinalreportResearchQuestion4Whatistheroleofactivemachinelearninginretrievalofrelevantdocumentsinthesimplisticsearchchallengespresentedbymostofthethirty-fourtopicsShortAnswerTheTeamfoundthatfortherelativelyeasytopicsinthisyearrsquosTotalRecallTracktheroleofactivemachinelearningwasreducedtoaqualitycontrolfunctionItwouldfindafewrelevantdocumentsnotlocatedbykeywordsearchorconceptandsimilaritysearchandthusimproverecallsomewhatInthesim
plesttopicsactivemachinelearningdidnotfindanynewrelevantdocumentsbutinsteadonlyconfirmedthatallrelevantdocumentshadalreadybeenfoundbytheothermethodsThiswillbesetforthinfurtherdetailintheTeamrsquosfinalreport

FurtherDiscussionofResearchQuestion1

BaronOardOverviewoftheTREC2009LegalTrackfoundathttptrecnistgovpubstrec18papersLEGAL09OVERVIEWpdf

13

EvenusingthegivenuncorrectedTRECstandardforscoringandeventhoughinmosttopicswedidnottrainontheTRECreturned-relevantdocumentsthattheTeamconsideredirrelevanttheTeamoverallstillattainedexcellentresultsUnderthecorrectedstandardwhichwillbesharedintheFinalReporttheresultsweremuchbetterThefollowingchartcomparestheTeamrsquosRecallPrecisionandF-MeasureforeachAthometopicwiththeresultsobtainedbyTRECrsquosBMIandBMI-Descruns(onlyotherscoresnowavailable)

REASONABLE COMPARISON

                                             Recall (%)                Precision (%)             F-Measure (%)
Topic      Name                              Team     BMI  BMI-Desc    Team     BMI  BMI-Desc    Team     BMI  BMI-Desc
athome401  Summer Olympics                  41.05   91.70     92.58   73.44   15.31     15.45   52.66   26.23     26.48
athome402  Space                            72.57   91.07     90.28   22.04   30.86     30.59   33.81   46.09     45.70
athome403  Bottled Water                     7.16   97.71     97.71   80.41   37.49     37.49   13.14   54.18     54.18
athome404  Eminent Domain                   22.94   91.74     91.93   64.43   26.55     26.61   33.83   41.19     41.27
athome405  Newt Gingrich                    95.08   99.18     98.36   28.09    9.82      9.74   43.36   17.87     17.73
athome406  Felon Disenfranchisement         73.23   92.91     92.91   66.91    9.58      9.58   69.92   17.37     17.37
athome407  Faith Based Initiatives          31.02   91.80     91.99   68.72   41.86     41.95   42.75   57.50     57.62
athome408  Invasive Species                 55.17   83.62     83.62   64.65    7.87      7.87   59.53   14.39     14.39
athome409  Climate Change                   84.65   95.05     94.06   40.71   13.99     13.85   54.98   24.40     24.14
athome410  Condominiums                     95.10   99.48     99.03   46.13   42.59     42.40   62.12   59.64     59.38
athome411  Stand Your Ground                66.29   70.79     84.27   67.05    5.70      6.09   66.67   10.55     11.36
athome412  2000 Recount                     57.38   91.35     92.48   49.18   40.97     41.48   52.96   56.57     57.27
athome413  James V. Crosby                  96.34   99.08     99.27   89.00   28.73     28.78   92.52   44.55     44.63
athome414  Medicaid Reform                  91.66   96.90     97.26   35.32   35.10     35.23   51.01   51.54     51.73
athome415  George W. Bush                   94.08   63.39     67.08   91.04   61.09     58.66   92.53   62.22     62.59
athome416  Marketing                        60.30   94.19     95.57   42.08   43.32     43.96   49.57   59.35     60.22
athome417  Movie Gallery                    99.61   99.81     99.66   99.38   57.28     57.19   99.49   72.79     72.67
athome418  War Preparations                 39.57   93.05     93.58   50.34   12.68     12.76   44.31   22.32     22.45
athome419  Lost Foster Child Rilya Wilson   98.84   93.06     93.61   15.04   48.13     48.41   26.10   63.44     63.82
athome420  Billboards                       92.54   99.46     99.32   92.16   31.65     31.61   92.35   48.02     47.95
athome421  Traffic Cameras                  90.48  100.00    100.00   12.50    1.90      1.90   21.97    3.73      3.73
athome422  Non Resident Aliens              93.55  100.00    100.00    0.90    2.81      2.81    1.79    5.46      5.46
athome423  National Rifle Association       51.05   99.65     99.65   33.18   18.68     18.68   40.22   31.46     31.46
athome424  Gulf Drilling                    99.60  100.00    100.00   22.76   26.39     26.39   37.05   41.76     41.76
athome425  Civil Rights Act of 2003         91.32   98.60     98.60   96.59   33.70     33.70   93.88   50.23     50.23
athome426  Jeffrey Goldhagen                70.00   94.17     94.17   87.50    9.17      9.17   77.78   16.72     16.72
athome427  Slot Machines                    89.21   96.68     96.68   35.77   16.98     16.98   51.07   28.89     28.89
athome428  New Stadiums and Arenas          93.10   98.49     98.49   17.81   26.95     26.95   29.91   42.31     42.31
athome429  Elian Gonzalez                   94.20   99.27     99.27   92.41   35.45     35.45   93.29   52.24     52.24
athome430  Restraints and Helmets           71.95   94.25     94.65   65.00   36.40     36.55   68.30   52.52     52.74
athome431  Agency Credit Ratings            75.69   99.31     99.31   47.60   11.61     11.61   58.45   20.78     20.78
athome432  Gay Adoption                     85.00   98.57     98.57   86.23   11.20     11.20   85.61   20.12     20.12
athome433  Abstinence                       99.11  100.00    100.00   66.07    9.09      9.09   79.29   16.67     16.67
athome434  Bacardi Trademark                86.84  100.00    100.00   91.67    3.44      3.44   89.19    6.65      6.65

These comparative statistics show the scores at the time of reasonable call. In the precision category, which in Legal Search is the money shot that has the greatest impact on the cost of a document review project, the e-Discovery Team dominated. It had the highest precision level on 28 of the 34 topics (82%). They are highlighted in blue in the above chart. The e-Discovery Team's average precision score was 57.1%. The average precision of both BMI and BMI-Desc was 24.8%. Thus the Team's precision score was on average more than two and a quarter times higher than that of the BMI standards.

[Figure: Average Precision Across Topics. e-Discovery Team 57.12; BMI 24.83; BMI-Desc 24.81.]

In the F1-measure, which is the standard value used in legal search to evaluate overall precision and recall of a project, the e-Discovery Team again dominated. This is somewhat surprising in view of the fact that these measurements were based on the error-filled TREC standard. The Team had the highest F1 scores on 23 of the 34 topics (68%). They are highlighted in blue in the above chart. The e-Discovery Team's average F1 score was 57.69%. The average F1 of BMI and BMI-Desc was 36.5%. Thus the Team's F1 score was on average more than 58% higher than that of the BMI standards.
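The F1 figures discussed above can be checked with a few lines of arithmetic. The Python sketch below assumes the usual definition of F1 as the harmonic mean of recall and precision, recomputes it for four rows copied from the comparison chart, and reproduces the "more than 58% higher" comparison from the reported averages. The function names `f1`, `max_f1_gap` and `relative_gain` are illustrative only.

```python
def f1(recall, precision):
    """Harmonic mean of recall and precision (inputs and output in percent)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# (recall %, precision %, reported F1 %) copied from the comparison chart.
ROWS = {
    "athome401 Team": (41.05, 73.44, 52.66),
    "athome401 BMI": (91.70, 15.31, 26.23),
    "athome417 Team": (99.61, 99.38, 99.49),
    "athome421 BMI": (100.00, 1.90, 3.73),
}


def max_f1_gap(rows):
    """Largest difference between a recomputed and a reported F1 value."""
    return max(abs(f1(r, p) - reported) for r, p, reported in rows.values())


def relative_gain(team_avg, baseline_avg):
    """Percent by which team_avg exceeds baseline_avg."""
    return (team_avg / baseline_avg - 1) * 100


f1_gain = relative_gain(57.69, 36.5)   # average F1, Team vs. BMI runs
precision_ratio = 57.1 / 24.8          # average precision, Team vs. BMI runs
```

The recomputed values agree with the chart to within rounding. Note that applying `f1` to the average recall and average precision (75.46 and 57.12) gives roughly 65, not 57.69, so the reported average F1 appears to be the mean of the per-topic F1 values rather than the F1 of the averages.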

Even using TREC's challenged standard, the Team still attained higher recall than both the BMI and BMI-Desc standards on two topics: topic 415 George Bush, with a score of 94.08%, and topic 419 Lost Foster Child Rilya Wilson, with a score of 98.84%. Moreover, the Team attained recall levels in excess of 90% at the time of reasonable call in the following additional topics:

• 95.08% on topic 405 Newt Gingrich
• 95.10% on topic 410 Condominiums
• 96.34% on topic 413 James V. Crosby
• 99.61% on topic 417 Movie Gallery
• 92.54% on topic 420 Billboards
• 90.48% on topic 421 Traffic Cameras
• 93.55% on topic 422 Non Resident Aliens
• 99.60% on topic 424 Gulf Drilling
• 91.32% on topic 425 Civil Rights Act of 2003
• 93.10% on topic 428 New Stadiums and Arenas
• 94.20% on topic 429 Elian Gonzalez
• 99.11% on topic 433 Abstinence

[Figure: Average F-Measure Across Topics. e-Discovery Team 57.69; BMI 36.46; BMI-Desc 36.55.]

In summary, even with the TREC standard, where in most topics the Team did not use all documents returned as relevant for all of its training documents, it attained Recall scores greater than 90% in fourteen of the thirty-four topics. The Team attained Recall scores of 80% or higher in four additional topics. The average results obtained across all thirty-four topics at the time of reasonable call were as follows:

• 75.46% Recall
• 57.12% Precision
• 57.69% F1
• 121 Docs Reviewed Effort

The Team will disclose all of its scores under the corrected gold standard in the Final Report. In the meantime, here are the average results obtained under the corrected standard across all thirty-four topics at the time of reasonable call:

• 87.15% Recall
• 64.94% Precision
• 68.74% F1
• 124 Docs Reviewed Effort
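To make concrete why the same submission can score differently under the TREC standard and under a corrected standard, here is a minimal Python sketch. It is not the Team's scoring procedure. The `score` function, the document IDs and both gold sets are hypothetical and chosen only to show the mechanics.

```python
def score(submitted, gold):
    """Return (recall %, precision %, F1 %) of a submission against one gold standard."""
    submitted, gold = set(submitted), set(gold)
    hits = len(submitted & gold)
    recall = 100 * hits / len(gold) if gold else 0.0
    precision = 100 * hits / len(submitted) if submitted else 0.0
    f1 = 2 * recall * precision / (recall + precision) if hits else 0.0
    return recall, precision, f1


# Hypothetical document IDs, for illustration only.
SUBMITTED = {"a", "b", "c", "d"}

# Original assessments: "c" was judged irrelevant and "y" relevant.
TREC_GOLD = {"a", "b", "x", "y"}

# Assumed corrections: "c" is treated as relevant and "y" as irrelevant.
CORRECTED_GOLD = {"a", "b", "c", "x"}

trec_scores = score(SUBMITTED, TREC_GOLD)
corrected_scores = score(SUBMITTED, CORRECTED_GOLD)
```

In this toy case the submission scores 50% on all three measures against the first gold set and 75% against the second, even though the submitted documents are identical.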

At the time of reasonable call the Team had recall scores greater than 90% in twenty-one of the thirty-four topics, and greater than 80% in five more topics. Recall of greater than 99% was attained in seven topics.

At the time of reasonable call the Team had precision scores greater than 90% in thirteen of the thirty-four topics, and greater than 80% in two more topics. Precision of greater than 98% was attained in six topics.

At the time of reasonable call the Team had F1 scores greater than 90% in twelve of the thirty-four topics, and greater than 80% in one more topic. F1 of greater than 97% was attained in five topics.

We were lucky to attain one perfect score, as we did in 2015, in topic 417, with an F1 score of 100%. The perfect score was obtained by locating all 5,945 documents relevant under the corrected standard after reviewing only 45 documents. This topic was filled with form letters and was a fairly simple search. Still, the BMI and BMI-Desc F1 scores for this topic were both under 73%. The Team was pleased to prove once again that perfect recall and perfect precision are possible, albeit rare, using the Team's methods.

For questions, comments or suggestions concerning this preliminary Notebook report of the e-Discovery Team, please contact Ralph.Losey@gmail.com.

attachments and images were not present. Other collections of the Jeb Bush email exist from PST files that include more information, but the Team did not utilize this information and limited its efforts and attention to the official TREC collection. This same Jeb Bush email collection was used by the Total Recall Track in 2015 for ten topics. In 2015 Losey searched all ten of these topics. None of these search topics was repeated in 2016.

The thirty-four topics searched in 2016 and their names are shown below. In the far right column is the first name of the e-Discovery Team member who did the review for that topic. The thirteen topics in red were considered mandatory by TREC, and the remaining twenty-one were optional. The e-Discovery Team did all topics.

Topic  Name                            Reviewer
401    Summer Olympics                 Ralph
402    Space                           Tony
403    Bottled Water                   Ralph
404    Eminent Domain                  Tony
405    Newt Gingrich                   Ralph
406    Felon Disenfranchisement        Ralph
407    Faith Based Initiatives         Ralph
408    Invasive Species                Tony
409    Climate Change                  Levi
410    Condominiums                    Tony
411    Stand Your Ground               Ralph
412    2000 Recount                    Tony
413    James V. Crosby                 Jim
414    Medicaid Reform                 Tony
415    George W. Bush                  Jim
416    Marketing                       Jim
417    Movie Gallery                   Ralph
418    War Preparations                Tony
419    Lost Foster Child Rilya Wilson  Levi
420    Billboards                      Jim
421    Traffic Cameras                 Jim
422    Non Resident Aliens             Tony
423    National Rifle Association      Tony
424    Gulf Drilling                   Levi
425    Civil Rights Act of 2003        Ralph
426    Jeffrey Goldhagen               Ralph
427    Slot Machines                   Jim
428    New Stadiums and Arenas         Levi
429    Elian Gonzalez                  Jim
430    Restraints and Helmets          Jani
431    Agency Credit Ratings           Tony
432    Gay Adoption                    Jani
433    Abstinence                      Jim
434    Bacardi Trademark               Ralph

Ralph Losey did ten topics, Tony Reichenberger did ten, Jim Sullivan did eight, Levi Kuehn did four, and Jani Grantz did two. Unlike the Team's 2015 effort, no contract review attorneys were utilized on any topic. They were all solo efforts, although there was some coordination and communication between team members on the SME-type issues encountered. This pertained to questions of true relevance and errors found in the gold standard for most of these topics.

In each Topic the assigned Team attorney personally read and evaluated for true relevance every email that TREC returned as a relevant document, and every email that TREC unexpectedly returned as irrelevant. Some of these were read and studied multiple times before we made our final calls on true relevance. These determinations took into consideration, and gave some deference to, the TREC assessor adjudications, but were not bound by them. Many other emails that the Team members considered irrelevant, and TREC agreed, were also personally reviewed as part of their search efforts. As mentioned, there were sometimes consultations and discussions between team members as to the unexpected TREC opinions on relevance.

All of the thirty-four topics presented search challenges to the Team that were easier, some far easier, than the Team typically faces as attorneys leading legal document review projects. They were roughly equivalent to the most simplistic challenges that they might face in projects involving very simple legal disputes. A few of the search topics included legal issues, far more than were found in the 2015 Total Recall Track. This is a revision that the Team requested and appreciated because it allowed testing of legal judgment and analysis in determination of true relevance in these topics. In legal search such skills are obviously very important. In most of the 2016 Total Recall topics, however, no special legal training or analysis was required for a determination of true relevance.

The Team's final report will specifically identify each topic and, as the Team did in its 2015 TREC report, provide full details on the types of searches performed for each topic and difficulties encountered.

Summary of the Team's Work

The e-Discovery Team's 2016 Total Recall Track Athome project started June 3, 2016 and concluded on August 31, 2016. Using a single expert reviewer in each topic, the e-Discovery Team classified 9,863,366 documents in 34 different review projects.

All attorneys used the e-Discovery Team's Predictive Coding 4.0 hybrid multimodal IST search techniques and were assisted by the KO software, EDR. They relied on active machine learning and other search techniques to find relevant documents and effective training documents. The various types of searches included in the Team's multimodal approach are shown in the search pyramid below.

Linear review refers to an SME's examination of all documents by certain key witnesses in a lawsuit during certain time frames critical to the disputed facts in a lawsuit. Keyword search in our methodology refers to the use of terms originating from legal and document analysis and from witness interviews. Judgmental sampling and verification by SMEs are also used to test the terms before they are used throughout a document collection. Our keyword search also includes a variety of Boolean functions and parametric targeting, wherein searches are limited to certain metadata fields of an electronic document. Similarity and concept searches refer to a variety of passive machine learning analytic search techniques. The AI search at the top of the pyramid refers to the use of active machine learning. The EDR KO software uses a proprietary type of logistic regression algorithm.

The standard eight-step workflow used by the Team in legal search projects is shown in the diagram below.[8] To meet the Team's self-imposed time requirements of completing every review project with minimal time efforts, the standard steps Three and Seven were omitted, as will be further explained. Further, due to the set-up of the TREC experiments, the first step of our workflow, ESI Communications, was severely constrained to the point of being practically meaningless, as will also be further explained. The Team's standard workflow was thus reduced to five steps, as shown below.

[8] Losey, R., Predictive Coding 4.0 – Nine Key Points of Legal Document Review and an Updated Statement of Our Workflow (e-Discovery Team, October 2016) contains a complete description of all eight steps in parts Six and Seven.

In the first step of ESI Communications, team members on a legal review project typically spend hours in discussion and analysis of scope of relevance and the target documents. The communications often include hundreds of written exchanges, both informal, such as emails and chats, and formal, such as: (1) detailed requests for information contained in court documents, such as subpoenas or a Request For Production; (2) input from a qualified SME, who is typically a legal expert with deep knowledge of the factual issues in the case, and thus deep knowledge of what the presiding judge in the legal proceeding will hold to be relevant and discoverable; and (3) dialogues with the party requesting the production of documents to clarify the search target, and with other parties. The ESI communications may lead to formal motions with the governing court, legal memorandums, hearings before the presiding judge, and opinions rendered by one or more judges on the scope of relevance.[9]

[9] Id. at Part Six, wherein the first step of ESI Communications is explained in detail.

The only ESI communications in the TREC experimental set-up were very short, one-sentence descriptions of relevance for each topic. Two topics had a two-sentence description (410-Condominiums and 423-National Rifle Association). The only other type of ESI communications in this TREC Track were the automated instant returns of all documents submitted, as to whether TREC considered them to be relevant or not. There were no appeals or other procedures set up for Athome participants, who actually examined the documents for true relevance, to challenge obvious errors in judgment.

Short Answers to Research Questions

Research Question 1 (Primary Question): What Recall, Precision and Effort levels will the e-Discovery Team attain in TREC test conditions over all thirty-four Topics using the Team's Predictive Coding 4.0 hybrid multimodal IST search methods and Kroll Ontrack's software, eDiscovery Review (EDR)?

Short Answer to Primary Question: Again, like last year, the Team attained excellent results with high levels of Recall and Precision in all topics, including perfect or near perfect results in several topics using the corrected gold standard. The Team was able to do so even though it only used five of the eight steps in its usual methodology, and even though it intentionally severely constrained the amount of human effort expended on each topic. The Team's enthusiasm for the results, which were significantly better than its 2015 effort, is tempered by the fact that the search challenges presented in most of the topics in 2016 were not difficult. As mentioned, they were equivalent to an easy legal search project, such as a simple single-plaintiff employment law dispute. The Final Report will include a detailed analysis of these results.

Research Question 2: What is the impact of multiple errors in SME judgments by the TREC assessors on Recall and Precision?

Short Answer: The impact on Recall and Precision using the Team's method is significant and, as you would expect, varied according to the number of errors made by TREC assessors in a particular topic. After the Team encountered numerous errors on the first topics undertaken, it was forced to create its own gold standard of true relevant documents for each topic. The Team's new gold standard corrected for the obvious errors seen in TREC's assessments of relevance. In all close questions on relevance the judgment of TREC's assessors was accepted as accurate. The obvious errors and inconsistencies seen by the Team's close study of the documents were not accepted. In most, but not all, topics the Team did not use the documents with obvious errors for its machine training. This will be further detailed in the Final Report. In all topics the Team created its own standard and made comparative recall, precision and F1 calculations based thereon.

The observation and correction of TREC errors in the gold standard became a collaborative effort among the Team to peer review and verify our corrected standard. Most of these efforts, many of which occurred after the conclusion of the Track in August, were not included in the time reports of efforts expended by attorneys in the search.

The Team was very reluctant to take this step and would certainly have let pass a few errors or mere differences of opinion. We recognize that no standard is ever perfect. As lawyers, the Team understands all too well that some, perhaps many, judgments on relevance are subjective. Again, in all close questions on relevance the judgments of TREC's assessors were accepted, even though we personally disagreed. The Team means no disrespect by the creation of an alternate gold standard. We appreciate and respect the efforts made by the TREC assessors and organizers. Still, the volume of obvious errors encountered forced us to take this action. The integrity of our primary research question, to test the effectiveness of our hands-on type of ad hoc hybrid methods, demanded that we do so.

We understand that the impact on other Total Recall Participants, ones that never actually examine documents, would be far less, perhaps even negligible. Still, there could be an impact even for them in some topics where more than an insignificant number of the same or similar documents were inconsistently judged.

The decision to not accept the errors seen, and to instead create our own gold standard, resulted in substantial additional work for the Team. In some topics we even took the step of making two "reasonable calls." One was for TREC, and the second call, which always took place on the next submission, was for our own internal tracking. In the second call we would include emails that we knew, from prior submissions of the same or similar document, would again be incorrectly considered irrelevant by TREC. We knew they were truly relevant, and so waited until after our public reasonable call to TREC to submit them, and then we made our own internal reasonable call. We were attempting to, in effect, play two games at once and maximize our score in each game. Keeping track of two standards added an unexpected layer of difficulty to our work, and we did not bother to do so in all topics. The dual-call topics will be specifically identified in our Final Report.

In some topics the difference between the two standards was substantial. In a few topics it was minor. Some differences were found in all topics. This is not unexpected in any standard involving at least somewhat subjective mass relevance adjudications. We do not intend to engage in a criticism of the specific gold standard creation methods used in the 2016 Total Recall Track, except to note that the appeals procedure included in the 2008 and 2009 TREC Legal Tracks could have improved the accuracy of the results for the Total Recall Track Athome participants.[10] Further, the Team understands from informal reports that the TREC assessors' work was much more time constrained than was the work of the Team.

[10] Participant appeal rights could have mitigated the errors seen in 2016, but this can be burdensome and, as seen in those Tracks in 2008 and 2009, can create their own issues. See Oard, Hedlin, Tomlinson, Baron, Overview of the TREC 2008 Legal Track, found at http://trec.nist.gov/pubs/trec17/papers/LEGAL.OVERVIEW08.pdf; and Oard, Hedlin, Tomlinson, Baron, Oard, Overview of the TREC 2009 Legal Track, found at http://trec.nist.gov/pubs/trec18/papers/LEGAL09.OVERVIEW.pdf.

ingatleastsomewhatsubjectivemassrelevanceadjudicationsWedonotintendtoengageinacriticismofthespecificgoldstandardcreationmethodsusedin2016TotalRecallTrackexcepttonotethattheappealsprocedureincludedinthe2008and2009TRECLegalTrackscouldhaveimprovedtheaccuracyoftheresultsfortheTotalRecallTrackAthomeparticipants10FurthertheTeamunderstandsfrominformalreportsthattheTREC

10Participantappealrightscouldhavemitigatedtheerrorsseenin2016butthiscanbeburdensomeandasseeninthoseTracksin2008and2009cancreatetheirownissuesSeeOardHedlinTomlinsonBaronOverviewoftheTREC2008LegalTrackfoundathttptrecnistgovpubstrec17papersLEGALOVERVIEW08pdfandOardHedlinTomlinson

12

assessorsworkwasmuchmoretimeconstrainedthanwastheworkoftheTeamMoreoverunliketheTeamtheTRECassessorsdidnothavethebenefitofSMEinputfromanativeFloridianlawyer(Losey)whowasfamiliarwithFloridapoliticsandGovernorBushandsince2015hadputsubstantialtimereviewingthisemailcollectionTheFinalReportwillincludeadetailedcomparisonofrecallprecisionandF1basedonthecomparisonofboththeTRECandTeamassessmentsAfewexamplesofthemoreegregiouserrorsencounteredwillbeprovidedTheFinalReportmayalsocontainacompletelistingoftherevisedgoldstandardsthattheTeamcreatedforeachtopicoratleastaconditionalofferofdisclosureofthecorrectedstandardsTheTeaminvitesinputfromotherparticipantsandorganizersoftheTotalRecallTrackonthisissueAgaintheTeamrecognizesthatnogoldstandardiseverperfectincludingitsownrevisedstandardsThiswillbesetforthinfurtherdetailintheTeamrsquosfinalreportResearchQuestion3WhatisthemosteffectivesearchmethodfromtheTeamrsquosmultimodaltool-setforretrievalofrelevantdocumentsfortherelativelysimplisticsearchchallengespresentedbymostofthethirty-fourtopicsShortAnswerFortheeasytopicstheTeamfoundthatwhatitcallsldquotestedparametricBooleankeywordsearchrdquowasthemosteffectivesearchmethodtofindrelevantdocumentsTheTeamwassurprisedbyhowwellasophisticateduseofkeywordswasabletoidentifynearlyallofthetargetrelevantdocumentsinmanyofthetopicsinthisyearrsquosTotalRecallTrackThisshowsthecontinuedimportanceofamultimodalapproachtolegalsearchincludingespeciallykeywordsearchwhendoneproperlyespeciallyinsimplelawsuitsinvolvingrelativelyeasysearchissuesThiswillbesetforthinfurtherdetailintheTeamrsquosfinalreportResearchQuestion4Whatistheroleofactivemachinelearninginretrievalofrelevantdocumentsinthesimplisticsearchchallengespresentedbymostofthethirty-fourtopicsShortAnswerTheTeamfoundthatfortherelativelyeasytopicsinthisyearrsquosTotalRecallTracktheroleofactivemachinelearningwasreducedtoaqualitycontrolfunctionItwouldfindafewrelevantdocumentsnotlocatedbykeywordsearchorconceptandsimilaritysearchandthusimproverecallsomewhatInthesim
plesttopicsactivemachinelearningdidnotfindanynewrelevantdocumentsbutinsteadonlyconfirmedthatallrelevantdocumentshadalreadybeenfoundbytheothermethodsThiswillbesetforthinfurtherdetailintheTeamrsquosfinalreport

FurtherDiscussionofResearchQuestion1

BaronOardOverviewoftheTREC2009LegalTrackfoundathttptrecnistgovpubstrec18papersLEGAL09OVERVIEWpdf


Even using the given, uncorrected TREC standard for scoring, and even though in most topics we did not train on the TREC returned-relevant documents that the Team considered irrelevant, the Team overall still attained excellent results. Under the corrected standard, which will be shared in the Final Report, the results were much better. The following chart compares the Team's Recall, Precision and F-Measure for each Athome topic with the results obtained by TREC's BMI and BMI-Desc runs (only other scores now available).

REASONABLE COMPARISON (all values are percentages)

                                          |--------- Recall --------||------- Precision -------||------- F-Measure -------|
Topic      Name                                Team      BMI BMI-Desc     Team      BMI BMI-Desc     Team      BMI BMI-Desc
athome401  Summer Olympics                    41.05    91.70    92.58    73.44    15.31    15.45    52.66    26.23    26.48
athome402  Space                              72.57    91.07    90.28    22.04    30.86    30.59    33.81    46.09    45.70
athome403  Bottled Water                       7.16    97.71    97.71    80.41    37.49    37.49    13.14    54.18    54.18
athome404  Eminent Domain                     22.94    91.74    91.93    64.43    26.55    26.61    33.83    41.19    41.27
athome405  Newt Gingrich                      95.08    99.18    98.36    28.09     9.82     9.74    43.36    17.87    17.73
athome406  Felon Disenfran                    73.23    92.91    92.91    66.91     9.58     9.58    69.92    17.37    17.37
athome407  Faith Based Initiatives            31.02    91.80    91.99    68.72    41.86    41.95    42.75    57.50    57.62
athome408  Invasive Species                   55.17    83.62    83.62    64.65     7.87     7.87    59.53    14.39    14.39
athome409  Climate Change                     84.65    95.05    94.06    40.71    13.99    13.85    54.98    24.40    24.14
athome410  Condominiums                       95.10    99.48    99.03    46.13    42.59    42.40    62.12    59.64    59.38
athome411  Stand Your Ground                  66.29    70.79    84.27    67.05     5.70     6.09    66.67    10.55    11.36
athome412  2000 Recount                       57.38    91.35    92.48    49.18    40.97    41.48    52.96    56.57    57.27
athome413  James V Crosby                     96.34    99.08    99.27    89.00    28.73    28.78    92.52    44.55    44.63
athome414  Medicaid Reform                    91.66    96.90    97.26    35.32    35.10    35.23    51.01    51.54    51.73
athome415  George W Bush                      94.08    63.39    67.08    91.04    61.09    58.66    92.53    62.22    62.59
athome416  Marketing                          60.30    94.19    95.57    42.08    43.32    43.96    49.57    59.35    60.22
athome417  Movie Gallery                      99.61    99.81    99.66    99.38    57.28    57.19    99.49    72.79    72.67
athome418  War Preparations                   39.57    93.05    93.58    50.34    12.68    12.76    44.31    22.32    22.45
athome419  Lost Foster Child Rilya Wilson     98.84    93.06    93.61    15.04    48.13    48.41    26.10    63.44    63.82
athome420  Billboards                         92.54    99.46    99.32    92.16    31.65    31.61    92.35    48.02    47.95
athome421  Traffic Cameras                    90.48   100.00   100.00    12.50     1.90     1.90    21.97     3.73     3.73
athome422  Non Resident Aliens                93.55   100.00   100.00     0.90     2.81     2.81     1.79     5.46     5.46
athome423  National Rifle Association         51.05    99.65    99.65    33.18    18.68    18.68    40.22    31.46    31.46
athome424  Gulf Drilling                      99.60   100.00   100.00    22.76    26.39    26.39    37.05    41.76    41.76
athome425  Civil Rights Act 2003              91.32    98.60    98.60    96.59    33.70    33.70    93.88    50.23    50.23
athome426  Jeffrey Goldhagen                  70.00    94.17    94.17    87.50     9.17     9.17    77.78    16.72    16.72
athome427  Slot Machines                      89.21    96.68    96.68    35.77    16.98    16.98    51.07    28.89    28.89
athome428  New Stadiums                       93.10    98.49    98.49    17.81    26.95    26.95    29.91    42.31    42.31
athome429  Elian Gonzalez                     94.20    99.27    99.27    92.41    35.45    35.45    93.29    52.24    52.24
athome430  Restraints and Helmets             71.95    94.25    94.65    65.00    36.40    36.55    68.30    52.52    52.74
athome431  Agency Credit Rate                 75.69    99.31    99.31    47.60    11.61    11.61    58.45    20.78    20.78
athome432  Gay Adoption                       85.00    98.57    98.57    86.23    11.20    11.20    85.61    20.12    20.12
athome433  Abstinence                         99.11   100.00   100.00    66.07     9.09     9.09    79.29    16.67    16.67
athome434  Bacardi Trademark                  86.84   100.00   100.00    91.67     3.44     3.44    89.19     6.65     6.65
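As a check on how the F-Measure column relates to the other two columns, the F1 score is the harmonic mean of recall and precision. The short Python sketch below recomputes a few F-Measure cells from the recall and precision percentages in the chart. The function name `f1_score` is ours, chosen for illustration, and differences in the last digit are to be expected because the published recall and precision values are already rounded.

```python
def f1_score(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision (both given in percent)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# (label, recall %, precision %, published F-Measure %) read from the chart.
rows = [
    ("athome403 e-Discovery Team", 7.16, 80.41, 13.14),
    ("athome417 e-Discovery Team", 99.61, 99.38, 99.49),
    ("athome421 BMI", 100.00, 1.90, 3.73),
]

for label, recall, precision, published in rows:
    computed = f1_score(recall, precision)
    print(f"{label}: computed {computed:.2f}, published {published:.2f}")
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two inputs, which is why a run with 100% recall but very low precision still ends up with a low F-Measure.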

These comparative statistics show the scores at the time of reasonable call. In the precision category, which in Legal Search is the money shot that has the greatest impact on the cost of a document review project, the e-Discovery Team dominated. It had the highest precision level on 28 of the 34 topics (82%). They are highlighted in blue in the above chart. The e-Discovery Team's average precision score was 57.1%. The average precision of both BMI and BMI-Desc was 24.8%. Thus the Team's precision score was on average more than two and a quarter times higher than that of the BMI standards.

In the F1-measure, which is the standard value used in legal search to evaluate overall precision and recall of a project, the e-Discovery Team again dominated. This is somewhat surprising in view of the fact that these measurements were based on the error-filled TREC standard. The Team had the highest F1 scores on 23 of the 34 topics (68%). They are highlighted in blue in the above chart. The e-Discovery Team's average F1 score was 57.69%.

[Chart: Average Precision Across Topics. e-Discovery Team 57.12; BMI 24.83; BMI-Desc 24.81]

The average F1 of BMI and BMI-Desc was 36.5%. Thus the Team's F1 score was on average more than 58% higher than that of the BMI standards.
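The two comparisons of averages are simple ratios. As a worked check, the sketch below uses the rounded averages quoted in the text (57.1 and 24.8 for precision, 57.69 and 36.5 for F1) to compute how many times larger one average is than the other, and the corresponding percentage increase. The helper names are illustrative only.

```python
def ratio(team: float, baseline: float) -> float:
    """How many times larger the team average is than the baseline."""
    return team / baseline


def percent_higher(team: float, baseline: float) -> float:
    """Relative increase of the team average over the baseline, in percent."""
    return (team - baseline) / baseline * 100


precision_ratio = ratio(57.1, 24.8)      # average precision, Team vs. BMI
f1_gain = percent_higher(57.69, 36.5)    # average F1, Team vs. BMI

print(f"Precision: {precision_ratio:.2f} times the BMI average")
print(f"F1: {f1_gain:.1f}% higher than the BMI average")
```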

Even using TREC's challenged standard, the Team still attained higher recall than both the BMI and BMI-Desc standards on two topics: topic 415, George Bush, with a score of 94.08%, and topic 419, Lost Foster Child Rilya Wilson, with a score of 98.84%. Moreover, the Team attained recall levels in excess of 90% at the time of reasonable call in the following additional topics:

• 95.08% on topic 405, Newt Gingrich
• 95.10% on topic 410, Condominiums
• 96.34% on topic 413, James V Crosby
• 99.61% on topic 417, Movie Gallery
• 92.54% on topic 420, Billboards
• 90.48% on topic 421, Traffic Cameras
• 93.55% on topic 422, Non Resident Aliens
• 99.60% on topic 424, Gulf Drilling
• 91.32% on topic 425, Civil Rights Act of 2003
• 93.10% on topic 428, New Stadiums and Arenas
• 94.20% on topic 429, Elian Gonzalez
• 99.11% on topic 433, Abstinence

[Chart: Average F-Measure Across Topics. e-Discovery Team 57.69; BMI 36.46; BMI-Desc 36.55]

In summary, even with the TREC standard, where in most topics the Team did not use all documents returned as relevant for all of its training documents, it attained Recall scores greater than 90% in fourteen of the thirty-four topics. The Team attained Recall scores of 80% or higher in four additional topics. The average results obtained across all thirty-four topics at the time of reasonable call were as follows:

• 75.46% Recall
• 57.12% Precision
• 57.69% F1
• 121 Docs Reviewed Effort

The Team will disclose all of its scores under the corrected gold standard in the Final Report. In the meantime, here are the average results obtained under that corrected standard across all thirty-four topics at the time of reasonable call:

• 87.15% Recall
• 64.94% Precision
• 68.74% F1
• 124 Docs Reviewed Effort

At the time of reasonable call the Team had recall scores greater than 90% in twenty-one of the thirty-four topics and greater than 80% in five more topics. Recall of greater than 99% was attained in seven topics. At the time of reasonable call the Team had precision scores greater than 90% in thirteen of the thirty-four topics and greater than 80% in two more topics. Precision of greater than 98% was attained in six topics. At the time of reasonable call the Team had F1 scores greater than 90% in twelve of the thirty-four topics and greater than 80% in one more topic. F1 of greater than 97% was attained in five topics.

We were lucky to attain one perfect score, as we did in 2015, in topic (417), with an F1 score of 100%. The perfect score was obtained by locating all 5,945 documents relevant under the corrected standard after reviewing only 45 documents. This topic was filled with form letters and was a fairly simple search. Still, the BMI and BMI-Desc F1 scores for this topic were both under 73%. The Team was pleased to prove once again that perfect recall and perfect precision are possible, albeit rare, using the Team's methods.
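To illustrate why the same submission can score differently under the TREC standard and the Team's corrected standard, here is a minimal Python sketch that scores one set of submitted documents against two gold standards. The document identifiers and both gold sets are invented for illustration and are not taken from the Track data; the function name `score` is likewise our own.

```python
def score(submitted: set, gold: set) -> tuple:
    """Return (recall, precision, F1) in percent against one gold standard."""
    true_positives = len(submitted & gold)
    recall = 100 * true_positives / len(gold) if gold else 0.0
    precision = 100 * true_positives / len(submitted) if submitted else 0.0
    if recall + precision == 0:
        return recall, precision, 0.0
    f1 = 2 * recall * precision / (recall + precision)
    return recall, precision, f1


# Hypothetical data: d4 is relevant but was judged irrelevant by the
# assessor, and d9 is irrelevant but was judged relevant.
submitted = {"d1", "d2", "d3", "d4"}
assessor_gold = {"d1", "d2", "d3", "d9"}
corrected_gold = {"d1", "d2", "d3", "d4"}

for name, gold in (("assessor", assessor_gold), ("corrected", corrected_gold)):
    recall, precision, f1 = score(submitted, gold)
    print(f"{name}: recall {recall:.1f}, precision {precision:.1f}, F1 {f1:.1f}")
```

In this toy case a single pair of disputed judgments moves recall and precision from 75% to 100%, which is the kind of effect a change of gold standard can have on a small topic.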


For questions, comments or suggestions concerning this preliminary Notebook report of the e-Discovery Team, please contact Ralph.Losey@gmail.com.


All attorneys used the e-Discovery Team's Predictive Coding 4.0 hybrid multimodal IST search techniques and were assisted by the KO software, EDR. They relied on active machine learning and other search techniques to find relevant documents and effective training documents. The various types of searches included in the Team's multimodal approach are shown in the search pyramid below.

Linear review refers to an SME's examination of all documents by certain key witnesses in a lawsuit during certain time frames critical to the disputed facts in a lawsuit. Keyword search in our methodology refers to the use of terms originating from legal and document analysis, and from witness interviews. Judgmental sampling and verification by SMEs are also used to test the terms before they are used throughout a document collection. Our keyword search also includes a variety of Boolean functions and parametric targeting, wherein searches are limited to certain metadata fields of an electronic document. Similarity and concept searches refer to a variety of passive machine learning analytic search techniques. The AI search at the top of the pyramid refers to the use of active machine learning. The EDR KO software uses a proprietary type of logistic regression algorithm.

The standard eight-step workflow used by the Team in legal search projects is shown in the diagram below.8 To meet the Team's self-imposed time requirements of completing every review project with minimal time efforts, the standard steps Three and Seven were omitted, as will be further explained. Further, due to the set-up of the TREC experiments, the first step of our workflow, ESI Communications, was severely constrained to the point of being practically meaningless, as will also be further explained. The Team's standard workflow was thus reduced to five steps, as shown below.

8 Losey, R., Predictive Coding 4.0 – Nine Key Points of Legal Document Review and an Updated Statement of Our Workflow (e-Discovery Team, October 2016), contains a complete description of all eight steps in parts Six and Seven.
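To make the idea of parametric Boolean keyword search more concrete, the Python sketch below filters a tiny, made-up email collection by combining terms with AND, OR and NOT, with each term restricted to one named metadata field. It illustrates the general technique only. It is not the query syntax of the EDR software, and the field names, sample documents and helper functions are all assumptions made for this example.

```python
from typing import Callable, Dict, List

Doc = Dict[str, str]
Query = Callable[[Doc], bool]


def term(field: str, word: str) -> Query:
    """Match when `word` appears (case-insensitively) in one metadata field."""
    return lambda doc: word.lower() in doc.get(field, "").lower()


def all_of(*queries: Query) -> Query:
    """Boolean AND over the given queries."""
    return lambda doc: all(q(doc) for q in queries)


def any_of(*queries: Query) -> Query:
    """Boolean OR over the given queries."""
    return lambda doc: any(q(doc) for q in queries)


def not_(query: Query) -> Query:
    """Boolean NOT of a query."""
    return lambda doc: not query(doc)


# Hypothetical documents with "subject" and "body" fields.
docs: List[Doc] = [
    {"id": "a", "subject": "Stadium funding", "body": "New arena bond proposal"},
    {"id": "b", "subject": "Lunch", "body": "Meet near the stadium at noon"},
    {"id": "c", "subject": "Arena vote", "body": "Press release draft"},
]

# (subject has "stadium" OR subject has "arena") AND NOT body has "press release"
query = all_of(
    any_of(term("subject", "stadium"), term("subject", "arena")),
    not_(term("body", "press release")),
)

hits = [doc["id"] for doc in docs if query(doc)]
print(hits)
```

Restricting a term to the subject field is what keeps document "b" out of the results even though its body mentions a stadium; that field-level targeting is the "parametric" part of the search.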

In the first step of ESI Communications, team members on a legal review project typically spend hours in discussion and analysis of scope of relevance and the target documents. The communications often include hundreds of written exchanges, both informal, such as emails and chats, and formal, such as: (1) detailed requests for information contained in court documents, such as subpoenas or Requests For Production; (2) input from a qualified SME, who is typically a legal expert with deep knowledge of the factual issues in the case, and thus deep knowledge of what the presiding judge in the legal proceeding will hold to be relevant and discoverable; and (3) dialogues with the party requesting the production of documents, and with other parties, to clarify the search target. The ESI communications may lead to formal motions with the governing court, legal memorandums, hearings before the presiding judge, and opinions rendered by one or more judges on the scope of relevance.9

9 Id. at Part Six, wherein the first step of ESI Communications is explained in detail.


The only ESI communication in the TREC experimental set-up was a very short, one-sentence description of relevance for each topic. Two topics had a two-sentence description (410 - Condominiums and 423 - National Rifle Association). The only other type of ESI communications in this TREC Track were the automated instant returns of all documents submitted as to whether TREC considered them to be relevant or not. There were no appeals or other procedures set up for Athome participants who actually examined the documents for true relevance to challenge obvious errors in judgment.



plesttopicsactivemachinelearningdidnotfindanynewrelevantdocumentsbutinsteadonlyconfirmedthatallrelevantdocumentshadalreadybeenfoundbytheothermethodsThiswillbesetforthinfurtherdetailintheTeamrsquosfinalreport

FurtherDiscussionofResearchQuestion1

BaronOardOverviewoftheTREC2009LegalTrackfoundathttptrecnistgovpubstrec18papersLEGAL09OVERVIEWpdf

13

Even using the given, uncorrected TREC standard for scoring, and even though in most topics we did not train on the TREC returned-relevant documents that the Team considered irrelevant, the Team overall still attained excellent results. Under the corrected standard, which will be shared in the Final Report, the results were much better. The following chart compares the Team's Recall, Precision and F-Measure for each Athome topic with the results obtained by TREC's BMI and BMI-Desc runs (only other scores now available).

REASONABLE COMPARISON
(values in percent; Team = EdiscoveryTeam)

                                                            Recall (%)              Precision (%)              F-Measure (%)
Topic      Name                                 Team      BMI BMI-Desc     Team      BMI BMI-Desc     Team      BMI BMI-Desc
athome401  Summer Olympics                     41.05    91.70    92.58    73.44    15.31    15.45    52.66    26.23    26.48
athome402  Space                               72.57    91.07    90.28    22.04    30.86    30.59    33.81    46.09    45.70
athome403  Bottled Water                        7.16    97.71    97.71    80.41    37.49    37.49    13.14    54.18    54.18
athome404  Eminent Domain                      22.94    91.74    91.93    64.43    26.55    26.61    33.83    41.19    41.27
athome405  Newt Gingrich                       95.08    99.18    98.36    28.09     9.82     9.74    43.36    17.87    17.73
athome406  Felon Disenfranchisement            73.23    92.91    92.91    66.91     9.58     9.58    69.92    17.37    17.37
athome407  Faith Based Initiatives             31.02    91.80    91.99    68.72    41.86    41.95    42.75    57.50    57.62
athome408  Invasive Species                    55.17    83.62    83.62    64.65     7.87     7.87    59.53    14.39    14.39
athome409  Climate Change                      84.65    95.05    94.06    40.71    13.99    13.85    54.98    24.40    24.14
athome410  Condominiums                        95.10    99.48    99.03    46.13    42.59    42.40    62.12    59.64    59.38
athome411  Stand Your Ground                   66.29    70.79    84.27    67.05     5.70     6.09    66.67    10.55    11.36
athome412  2000 Recount                        57.38    91.35    92.48    49.18    40.97    41.48    52.96    56.57    57.27
athome413  James V. Crosby                     96.34    99.08    99.27    89.00    28.73    28.78    92.52    44.55    44.63
athome414  Medicaid Reform                     91.66    96.90    97.26    35.32    35.10    35.23    51.01    51.54    51.73
athome415  George W. Bush                      94.08    63.39    67.08    91.04    61.09    58.66    92.53    62.22    62.59
athome416  Marketing                           60.30    94.19    95.57    42.08    43.32    43.96    49.57    59.35    60.22
athome417  Movie Gallery                       99.61    99.81    99.66    99.38    57.28    57.19    99.49    72.79    72.67
athome418  War Preparations                    39.57    93.05    93.58    50.34    12.68    12.76    44.31    22.32    22.45
athome419  Lost Foster Child Rilya Wilson      98.84    93.06    93.61    15.04    48.13    48.41    26.10    63.44    63.82
athome420  Billboards                          92.54    99.46    99.32    92.16    31.65    31.61    92.35    48.02    47.95
athome421  Traffic Cameras                     90.48   100.00   100.00    12.50     1.90     1.90    21.97     3.73     3.73
athome422  Non Resident Aliens                 93.55   100.00   100.00     0.90     2.81     2.81     1.79     5.46     5.46
athome423  National Rifle Association          51.05    99.65    99.65    33.18    18.68    18.68    40.22    31.46    31.46
athome424  Gulf Drilling                       99.60   100.00   100.00    22.76    26.39    26.39    37.05    41.76    41.76
athome425  Civil Rights Act 2003               91.32    98.60    98.60    96.59    33.70    33.70    93.88    50.23    50.23
athome426  Jeffrey Goldhagen                   70.00    94.17    94.17    87.50     9.17     9.17    77.78    16.72    16.72
athome427  Slot Machines                       89.21    96.68    96.68    35.77    16.98    16.98    51.07    28.89    28.89
athome428  New Stadiums                        93.10    98.49    98.49    17.81    26.95    26.95    29.91    42.31    42.31
athome429  Elian Gonzalez                      94.20    99.27    99.27    92.41    35.45    35.45    93.29    52.24    52.24
athome430  Restraints and Helmets              71.95    94.25    94.65    65.00    36.40    36.55    68.30    52.52    52.74
athome431  Agency Credit Rate                  75.69    99.31    99.31    47.60    11.61    11.61    58.45    20.78    20.78
athome432  Gay Adoption                        85.00    98.57    98.57    86.23    11.20    11.20    85.61    20.12    20.12
athome433  Abstinence                          99.11   100.00   100.00    66.07     9.09     9.09    79.29    16.67    16.67
athome434  Bacardi Trademark                   86.84   100.00   100.00    91.67     3.44     3.44    89.19     6.65     6.65
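The F-Measure columns in the table above appear to follow the usual definition of F1 as the harmonic mean of recall and precision. As a rough check, the minimal Python sketch below recomputes F1 for two of the Team's rows. It assumes the table figures are percentages with two decimal places (for example, 4105 read as 41.05), and the function name `f1_score` is ours for illustration, not part of the Team's or TREC's tooling.

```python
def f1_score(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision (both given in percent)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# (recall %, precision %) for the e-Discovery Team, read from the table
# under the assumption that each value carries two decimal places.
rows = {
    "athome401": (41.05, 73.44),
    "athome417": (99.61, 99.38),
}

f1_by_topic = {topic: f1_score(r, p) for topic, (r, p) in rows.items()}
for topic, f1 in f1_by_topic.items():
    print(f"{topic}: F1 = {f1:.2f}")
```

Under that reading, the recomputed values agree with the table's F-Measure entries for these two topics (52.66 and 99.49) to two decimal places.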

These comparative statistics show the scores at the time of reasonable call. In the precision category, which in Legal Search is the money shot that has the greatest impact on the cost of a document review project, the e-Discovery Team dominated. It had the highest precision level on 28 of the 34 topics (82%). They are highlighted in blue in the above chart. The e-Discovery Team's average precision score was 57.1%. The average precision of both BMI and BMI-Desc was 24.8%. Thus the Team's precision score was on average more than two and a quarter times higher than that of the BMI standards.

In the F1-measure, which is the standard value used in legal search to evaluate overall precision and recall of a project, the e-Discovery Team again dominated. This is somewhat surprising in view of the fact that these measurements were based on the error-filled TREC standard. The Team had the highest F1 scores on 23 of the 34 topics (68%). They are highlighted in blue in the above chart. The e-Discovery Team's average F1 score was 57.69%.

[Chart: Average Precision Across Topics. EdiscoveryTeam 57.12%; BMI 24.83%; BMI-Desc 24.81%.]

The average F1 of BMI and BMI-Desc was 36.5%. Thus the Team's F1 score was on average more than 58% higher than that of the BMI standards.
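Both comparisons with BMI are simple ratios of the averages quoted above. The short Python sketch below works through that arithmetic using the rounded averages given in the text (57.1 and 24.8 for precision, 57.69 and 36.5 for F1). The helper name `times_higher` is illustrative only.

```python
def times_higher(team: float, baseline: float) -> float:
    """Ratio of the Team's average score to a baseline average."""
    return team / baseline


# Rounded averages as quoted in the text (percent).
precision_ratio = times_higher(57.1, 24.8)
f1_ratio = times_higher(57.69, 36.5)
f1_pct_higher = (f1_ratio - 1) * 100

print(f"precision: {precision_ratio:.2f}x the BMI average")
print(f"F1: {f1_pct_higher:.1f}% higher than the BMI average")
```

The precision ratio comes out at roughly 2.3, and the F1 gain at just over 58 percent, in line with the figures stated above.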

[Chart: Average F-Measure Across Topics. EdiscoveryTeam 57.69%; BMI 36.46%; BMI-Desc 36.55%.]

Even using TREC's challenged standard, the Team still attained higher recall than both the BMI and BMI-Desc standards on two topics: topic 415 George Bush, with a score of 94.08%, and topic 419 Lost Foster Child Rilya Wilson, with a score of 98.84%. Moreover, the Team attained recall levels in excess of 90% at the time of reasonable call in the following additional topics:

• 95.08% on topic 405 Newt Gingrich
• 95.10% on topic 410 Condominiums
• 96.34% on topic 413 James V. Crosby
• 99.61% on topic 417 Movie Gallery
• 92.54% on topic 420 Billboards
• 90.48% on topic 421 Traffic Cameras
• 93.55% on topic 422 Non Resident Aliens
• 99.60% on topic 424 Gulf Drilling
• 91.32% on topic 425 Civil Rights Act of 2003
• 93.10% on topic 428 New Stadiums and Arenas
• 94.20% on topic 429 Elian Gonzalez
• 99.11% on topic 433 Abstinence

In summary, even with the TREC standard, where in most topics the Team did not use all documents returned as relevant for all of its training documents, it attained Recall scores greater than 90% in fourteen of the thirty-four topics. The Team attained Recall scores of 80% or higher in four additional topics. The average results obtained across all thirty-four topics at the time of reasonable call were as follows:

• 75.46% Recall
• 57.12% Precision
• 57.69% F1
• 121 Docs Reviewed Effort
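The averages above appear to be plain per-topic means, with each of the thirty-four topics weighted equally. As a check under that assumption, the Python sketch below averages the Team's per-topic values from the comparison table earlier in this section, again reading them as percentages with two decimal places. The variable names are ours.

```python
from statistics import mean

# e-Discovery Team scores (percent) for athome401 .. athome434, in topic order,
# transcribed from the comparison table (two-decimal reading is an assumption).
team_recall = [
    41.05, 72.57, 7.16, 22.94, 95.08, 73.23, 31.02, 55.17, 84.65, 95.10,
    66.29, 57.38, 96.34, 91.66, 94.08, 60.30, 99.61, 39.57, 98.84, 92.54,
    90.48, 93.55, 51.05, 99.60, 91.32, 70.00, 89.21, 93.10, 94.20, 71.95,
    75.69, 85.00, 99.11, 86.84,
]
team_precision = [
    73.44, 22.04, 80.41, 64.43, 28.09, 66.91, 68.72, 64.65, 40.71, 46.13,
    67.05, 49.18, 89.00, 35.32, 91.04, 42.08, 99.38, 50.34, 15.04, 92.16,
    12.50, 0.90, 33.18, 22.76, 96.59, 87.50, 35.77, 17.81, 92.41, 65.00,
    47.60, 86.23, 66.07, 91.67,
]
team_f1 = [
    52.66, 33.81, 13.14, 33.83, 43.36, 69.92, 42.75, 59.53, 54.98, 62.12,
    66.67, 52.96, 92.52, 51.01, 92.53, 49.57, 99.49, 44.31, 26.10, 92.35,
    21.97, 1.79, 40.22, 37.05, 93.88, 77.78, 51.07, 29.91, 93.29, 68.30,
    58.45, 85.61, 79.29, 89.19,
]

# Unweighted mean over topics for each measure.
averages = {
    "recall": mean(team_recall),
    "precision": mean(team_precision),
    "f1": mean(team_f1),
}
for name, value in averages.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the three means reproduce the reported 75.46, 57.12 and 57.69. Note that the average F1 here is the mean of the per-topic F1 values, not the F1 of the average recall and average precision.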

The Team will disclose all of its scores under the corrected gold standard in the Final Report. In the meantime, here are the average results obtained under the corrected standard across all thirty-four topics at the time of reasonable call:

• 87.15% Recall
• 64.94% Precision
• 68.74% F1
• 124 Docs Reviewed Effort

At the time of reasonable call the Team had recall scores greater than 90% in twenty-one of the thirty-four topics and greater than 80% in five more topics. Recall of greater than 99% was attained in seven topics. At the time of reasonable call the Team had precision scores greater than 90% in thirteen of the thirty-four topics and greater than 80% in two more topics. Precision of greater than 98% was attained in six topics. At the time of reasonable call the Team had F1 scores greater than 90% in twelve of the thirty-four topics and greater than 80% in one more topic. F1 of greater than 97% was attained in five topics.

We were lucky to attain one perfect score, as we did in 2015, in topic 417, with an F1 score of 100%. The perfect score was obtained by locating all 5,945 documents relevant under the corrected standard after reviewing only 45 documents. This topic was filled with form letters and was a fairly simple search. Still, the BMI and BMI-Desc F1 scores for this topic were both under 73%. The Team was pleased to prove once again that perfect recall and perfect precision are possible, albeit rare, using the Team's methods.


For questions, comments or suggestions concerning this preliminary Notebook report of the e-Discovery Team, please contact Ralph.Losey@gmail.com.


The only ESI communication in the TREC experimental set-up was a very short, one-sentence description of relevance for each topic. Two topics had a two-sentence description (410-Condominiums and 423-National Rifle Association). The only other type of ESI communications in this TREC Track were the automated, instant returns of all documents submitted as to whether TREC considered them to be relevant or not. There were no appeals or other procedures set up for Athome participants, who actually examined the documents for true relevance, to challenge obvious errors in judgment.

Short Answers to Research Questions

Research Question 1 (Primary Question)

What Recall, Precision and Effort levels will the e-Discovery Team attain in TREC test conditions over all thirty-four Topics using the Team's Predictive Coding 4.0 hybrid multimodal IST search methods and Kroll Ontrack's software, eDiscovery Review (EDR)?

Short Answer to Primary Question: Again, like last year, the Team attained excellent results, with high levels of Recall and Precision in all topics, including perfect or near perfect results in several topics using the corrected gold standard. The Team was able to do so even though it only used five of the eight steps in its usual methodology, and even though it intentionally severely constrained the amount of human effort expended on each topic. The Team's enthusiasm for the results, which were significantly better than its 2015 effort, is tempered by the fact that the search challenges presented in most of the topics in 2016 were not difficult. As mentioned, they were equivalent to an easy legal search project, such as a simple single-plaintiff employment law dispute. The Final Report will include a detailed analysis of these results.

Research Question 2

What is the impact of multiple errors in SME judgments by the TREC assessors on Recall and Precision?

Short Answer: The impact on Recall and Precision using the Team's method is significant and, as you would expect, varies with the number of errors made by TREC assessors in a particular topic. After the Team encountered numerous errors on the first topics undertaken, it was forced to create its own gold standard of true relevant documents for each topic. The Team's new gold standard corrected for the obvious errors seen in TREC's assessments of relevance. In all close questions on relevance, the judgment of TREC's assessors was accepted as accurate. The obvious errors and inconsistencies seen by the Team's close study of the documents were not accepted. In most, but not all, topics the Team did not use the documents with obvious errors for its machine training. This will be further detailed in the Final Report. In all topics the Team created its own standard and made comparative recall, precision and F1 calculations based thereon. The observation and correction of TREC errors in the gold standard became a collaborative effort among the Team to peer review and verify our corrected standard.



bull 8715Recallbull 6494Precisionbull 6874F1bull 124DocsReviewedEffort

AtthetimeofreasonablecalltheTeamhadrecallscoresgreaterthan90intwenty-oneofthethirty-fourtopicsandgreaterthan80infivemoretopicsRecallofgreaterthan99wasattainedinseventopicsAtthetimeofreasonablecalltheTeamhadprecisionscoresgreaterthan90inthirteenofthethirty-fourtopicsandgreaterthan80intwomoretopicsPrecisionofgreaterthan98wasattainedinsixtopicsAtthetimeofreasonablecalltheTeamhadF1scoresgreaterthan90intwelveofthethirty-fourtopicsandgreaterthan80inonemoretopicF1ofgreaterthan97wasattainedinfivetopicsWewereluckytoattainoneperfectscoreaswedidin2015intopic(417)withanF1scoreof100Theperfectscorewasobtainedbylocatingall5945documentsrelevantunderthecorrectedstandardafterreviewingonly45documentsThistopicwasfilledwithformlettersandwasafairlysimplesearchStilltheBMIandBMI-DescF1scoresforthistopicwerebothunder73TheTeamwaspleasedtoproveonceagainthatperfectrecallandperfectprecisionispossiblealbeitrareusingtheTeamrsquosmethods

17

ForquestionscommentsorsuggestionsconcerningthispreliminaryNotebookreportofthee-DiscoveryTeampleasecontactRalphLoseygmailcom

12

assessors' work was much more time constrained than was the work of the Team. Moreover, unlike the Team, the TREC assessors did not have the benefit of SME input from a native Floridian lawyer (Losey) who was familiar with Florida politics and Governor Bush, and since 2015 had put substantial time reviewing this email collection.

The Final Report will include a detailed comparison of recall, precision and F1 based on the comparison of both the TREC and Team assessments. A few examples of the more egregious errors encountered will be provided. The Final Report may also contain a complete listing of the revised gold standards that the Team created for each topic, or at least a conditional offer of disclosure of the corrected standards. The Team invites input from other participants and organizers of the Total Recall Track on this issue. Again, the Team recognizes that no gold standard is ever perfect, including its own revised standards. This will be set forth in further detail in the Team's final report.

Research Question 3: What is the most effective search method from the Team's multimodal tool-set for retrieval of relevant documents for the relatively simplistic search challenges presented by most of the thirty-four topics?

Short Answer: For the easy topics, the Team found that what it calls "tested, parametric Boolean keyword search" was the most effective search method to find relevant documents. The Team was surprised by how well a sophisticated use of keywords was able to identify nearly all of the target relevant documents in many of the topics in this year's Total Recall Track. This shows the continued importance of a multimodal approach to legal search, including especially keyword search when done properly, especially in simple lawsuits involving relatively easy search issues. This will be set forth in further detail in the Team's final report.

Research Question 4: What is the role of active machine learning in retrieval of relevant documents in the simplistic search challenges presented by most of the thirty-four topics?

Short Answer: The Team found that for the relatively easy topics in this year's Total Recall Track, the role of active machine learning was reduced to a quality control function. It would find a few relevant documents not located by keyword search or concept and similarity search, and thus improve recall somewhat. In the simplest topics, active machine learning did not find any new relevant documents, but instead only confirmed that all relevant documents had already been found by the other methods. This will be set forth in further detail in the Team's final report.

Baron, Oard, Overview of the TREC 2009 Legal Track, found at http://trec.nist.gov/pubs/trec18/papers/LEGAL09.OVERVIEW.pdf

Further Discussion of Research Question 1

Even using the given, uncorrected TREC standard for scoring, and even though in most topics we did not train on the TREC returned-relevant documents that the Team considered irrelevant, the Team overall still attained excellent results. Under the corrected standard, which will be shared in the Final Report, the results were much better. The following chart compares the Team's Recall, Precision and F-Measure for each Athome topic with the results obtained by TREC's BMI and BMI-Desc runs (only other scores now available).

REASONABLE COMPARISON
(all values in percent; Team = e-Discovery Team)

                                                    Recall                   Precision                  F-Measure
Topic      Name                               Team      BMI BMI-Desc     Team      BMI BMI-Desc     Team      BMI BMI-Desc
athome401  Summer Olympics                   41.05    91.70    92.58    73.44    15.31    15.45    52.66    26.23    26.48
athome402  Space                             72.57    91.07    90.28    22.04    30.86    30.59    33.81    46.09    45.70
athome403  Bottled Water                      7.16    97.71    97.71    80.41    37.49    37.49    13.14    54.18    54.18
athome404  Eminent Domain                    22.94    91.74    91.93    64.43    26.55    26.61    33.83    41.19    41.27
athome405  Newt Gingrich                     95.08    99.18    98.36    28.09     9.82     9.74    43.36    17.87    17.73
athome406  Felon Disenfran                   73.23    92.91    92.91    66.91     9.58     9.58    69.92    17.37    17.37
athome407  Faith Based Initiatives           31.02    91.80    91.99    68.72    41.86    41.95    42.75    57.50    57.62
athome408  Invasive Species                  55.17    83.62    83.62    64.65     7.87     7.87    59.53    14.39    14.39
athome409  Climate Change                    84.65    95.05    94.06    40.71    13.99    13.85    54.98    24.40    24.14
athome410  Condominiums                      95.10    99.48    99.03    46.13    42.59    42.40    62.12    59.64    59.38
athome411  Stand Your Ground                 66.29    70.79    84.27    67.05     5.70     6.09    66.67    10.55    11.36
athome412  2000 Recount                      57.38    91.35    92.48    49.18    40.97    41.48    52.96    56.57    57.27
athome413  James V. Crosby                   96.34    99.08    99.27    89.00    28.73    28.78    92.52    44.55    44.63
athome414  Medicaid Reform                   91.66    96.90    97.26    35.32    35.10    35.23    51.01    51.54    51.73
athome415  George W. Bush                    94.08    63.39    67.08    91.04    61.09    58.66    92.53    62.22    62.59
athome416  Marketing                         60.30    94.19    95.57    42.08    43.32    43.96    49.57    59.35    60.22
athome417  Movie Gallery                     99.61    99.81    99.66    99.38    57.28    57.19    99.49    72.79    72.67
athome418  War Preparations                  39.57    93.05    93.58    50.34    12.68    12.76    44.31    22.32    22.45
athome419  Lost Foster Child Rilya Wilson    98.84    93.06    93.61    15.04    48.13    48.41    26.10    63.44    63.82
athome420  Billboards                        92.54    99.46    99.32    92.16    31.65    31.61    92.35    48.02    47.95
athome421  Traffic Cameras                   90.48   100.00   100.00    12.50     1.90     1.90    21.97     3.73     3.73
athome422  Non Resident Aliens               93.55   100.00   100.00     0.90     2.81     2.81     1.79     5.46     5.46
athome423  National Rifle Association        51.05    99.65    99.65    33.18    18.68    18.68    40.22    31.46    31.46
athome424  Gulf Drilling                     99.60   100.00   100.00    22.76    26.39    26.39    37.05    41.76    41.76
athome425  Civil Rights Act 2003             91.32    98.60    98.60    96.59    33.70    33.70    93.88    50.23    50.23
athome426  Jeffrey Goldhagen                 70.00    94.17    94.17    87.50     9.17     9.17    77.78    16.72    16.72
athome427  Slot Machines                     89.21    96.68    96.68    35.77    16.98    16.98    51.07    28.89    28.89
athome428  New Stadiums                      93.10    98.49    98.49    17.81    26.95    26.95    29.91    42.31    42.31
athome429  Elian Gonzalez                    94.20    99.27    99.27    92.41    35.45    35.45    93.29    52.24    52.24
athome430  Restraints and Helmets            71.95    94.25    94.65    65.00    36.40    36.55    68.30    52.52    52.74
athome431  Agency Credit Rate                75.69    99.31    99.31    47.60    11.61    11.61    58.45    20.78    20.78
athome432  Gay Adoption                      85.00    98.57    98.57    86.23    11.20    11.20    85.61    20.12    20.12
athome433  Abstinence                        99.11   100.00   100.00    66.07     9.09     9.09    79.29    16.67    16.67
athome434  Bacardi Trademark                 86.84   100.00   100.00    91.67     3.44     3.44    89.19     6.65     6.65

These comparative statistics show the scores at the time of reasonable call. In the precision category, which in Legal Search is the money shot that has the greatest impact on the cost of a document review project, the e-Discovery Team dominated. It had the highest precision level on 28 of the 34 topics (82%). They are highlighted in blue in the above chart. The e-Discovery Team's average precision score was 57.1%. The average precision of both BMI and BMI-Desc was 24.8%. Thus the Team's precision score was, on average, more than two and a quarter times higher than that of the BMI standards.

In the F1-measure, which is the standard value used in legal search to evaluate overall precision and recall of a project, the e-Discovery Team again dominated. This is somewhat surprising in view of the fact that these measurements were based on the error-filled TREC standard. The Team had the highest F1 scores on 23 of the 34 topics (68%). They are highlighted in blue in the above chart. The e-Discovery Team's average F1 score was 57.69%.
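The F1 figures in the comparison table can be cross-checked against the recall and precision columns, since F1 is the harmonic mean of the two. The snippet below is a minimal sketch of that check, not the scoring code used by TREC or the Team; the function name `f1_score` is an illustrative assumption, and the inputs are the e-Discovery Team's recall and precision for topics 401 and 417, read from the table as percentages.

```python
def f1_score(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision (both given in percent)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# e-Discovery Team values from the comparison table (percent).
print(f"topic 401: {f1_score(41.05, 73.44):.2f}")  # table lists 52.66
print(f"topic 417: {f1_score(99.61, 99.38):.2f}")  # table lists 99.49
```

Because the table entries are themselves rounded, a recomputed value can differ from the listed one in the last digit for some topics.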

[Chart: Average Precision Across Topics. e-Discovery Team 57.12%, BMI 24.83%, BMI-Desc 24.81%.]

The average F1 of BMI and BMI-Desc was 36.5%. Thus the Team's F1 score was, on average, more than 58% higher than that of the BMI standards.
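Both comparisons with the BMI baselines are simple ratios of the averages quoted in the text. As a rough sketch of that arithmetic (the helper names are illustrative assumptions, and the inputs are the rounded averages from the text, so the results are approximate):

```python
def times_higher(team: float, baseline: float) -> float:
    """Ratio of the Team's average score to the baseline average."""
    return team / baseline


def percent_higher(team: float, baseline: float) -> float:
    """Relative improvement over the baseline, expressed in percent."""
    return (team / baseline - 1.0) * 100.0


# Averages quoted in the text (percent).
precision_ratio = times_higher(57.1, 24.8)  # Team vs. BMI average precision
f1_gain = percent_higher(57.69, 36.5)       # Team vs. BMI average F1

print(f"precision ratio: {precision_ratio:.2f}")
print(f"F1 improvement: {f1_gain:.1f}%")
```

The precision ratio comes out a little above 2.25, and the F1 improvement a little above 58%, consistent with the "more than" wording in both statements.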

[Chart: Average F-Measure Across Topics. e-Discovery Team 57.69%, BMI 36.46%, BMI-Desc 36.55%.]

Even using TREC's challenged standard, the Team still attained higher recall than both the BMI and BMI-Desc standards on two topics: topic 415 George Bush, with a score of 94.08%, and topic 419 Lost Foster Child Rilya Wilson, with a score of 98.84%. Moreover, the Team attained recall levels in excess of 90% at the time of reasonable call in the following additional topics:

• 95.08% on topic 406 Felon Disenfranchisement
• 95.10% on topic 410 Condominiums
• 96.34% on topic 413 James V. Crosby
• 99.61% on topic 417 Movie Gallery
• 92.54% on topic 420 Billboards
• 90.48% on topic 421 Traffic Cameras
• 93.55% on topic 422 Non Resident Aliens
• 99.60% on topic 424 Gulf Drilling
• 91.32% on topic 425 Civil Rights Act of 2003
• 93.10% on topic 428 New Stadiums and Arenas
• 94.20% on topic 429 Elian Gonzalez
• 99.11% on topic 433 Abstinence

In summary, even with the TREC standard, where in most topics the Team did not use all documents returned as relevant for all of its training documents, it attained Recall scores greater than 90% in fourteen of the thirty-four topics. The Team attained Recall scores of 80% or higher in four additional topics. The average results obtained across all thirty-four topics at the time of reasonable call were as follows:

• 75.46% Recall
• 57.12% Precision
• 57.69% F1
• 121 Docs Reviewed Effort
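The average F1 of 57.69% is the mean of the thirty-four per-topic F1 scores, not the harmonic mean of the average recall and average precision. The sketch below illustrates the difference using the Team's F-Measure column from the comparison table. It assumes those table values are percentages, and it is only a cross-check of the reported averages, not the scoring procedure used by TREC.

```python
from statistics import mean

# e-Discovery Team F-Measure per topic (athome401..athome434), in percent,
# copied from the comparison table.
team_f1 = [
    52.66, 33.81, 13.14, 33.83, 43.36, 69.92, 42.75, 59.53, 54.98,
    62.12, 66.67, 52.96, 92.52, 51.01, 92.53, 49.57, 99.49, 44.31,
    26.10, 92.35, 21.97, 1.79, 40.22, 37.05, 93.88, 77.78, 51.07,
    29.91, 93.29, 68.30, 58.45, 85.61, 79.29, 89.19,
]

# Mean of the per-topic F1 scores (the figure the averages list reports).
macro_f1 = mean(team_f1)

# Harmonic mean of the reported average recall and average precision.
avg_recall, avg_precision = 75.46, 57.12
f1_of_averages = 2 * avg_recall * avg_precision / (avg_recall + avg_precision)

print(f"mean of per-topic F1: {macro_f1:.2f}")
print(f"F1 of the averages:   {f1_of_averages:.2f}")
```

The two numbers differ because topics with very unbalanced recall and precision pull the per-topic mean down, which is why the averaged F1 cannot be recomputed from the averaged recall and precision alone.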

The Team will disclose all of its scores under the corrected gold standard in the Final Report. In the meantime, here are the average results obtained under the corrected standard across all thirty-four topics at the time of reasonable call:

• 87.15% Recall
• 64.94% Precision
• 68.74% F1
• 124 Docs Reviewed Effort

At the time of reasonable call, the Team had recall scores greater than 90% in twenty-one of the thirty-four topics, and greater than 80% in five more topics. Recall of greater than 99% was attained in seven topics.

At the time of reasonable call, the Team had precision scores greater than 90% in thirteen of the thirty-four topics, and greater than 80% in two more topics. Precision of greater than 98% was attained in six topics.

At the time of reasonable call, the Team had F1 scores greater than 90% in twelve of the thirty-four topics, and greater than 80% in one more topic. F1 of greater than 97% was attained in five topics.

We were lucky to attain one perfect score, as we did in 2015, in topic (417) with an F1 score of 100%. The perfect score was obtained by locating all 5,945 documents relevant under the corrected standard after reviewing only 45 documents. This topic was filled with form letters and was a fairly simple search. Still, the BMI and BMI-Desc F1 scores for this topic were both under 73%. The Team was pleased to prove once again that perfect recall and perfect precision are possible, albeit rare, using the Team's methods.

For questions, comments or suggestions concerning this preliminary Notebook report of the e-Discovery Team, please contact Ralph.Losey@gmail.com.
