multiplexing droplet-based single cell rna-sequencing ... · 59 barcodes of two sets of 466 cells...

20
Multiplexing droplet-based single cell RNA-sequencing using natural genetic barcodes Authors: Hyun Min Kang*$, Meena Subramaniam$, Sasha Targ$, Michelle Nguyen, Lenka Maliskova, Eunice Wan, Simon Wong, Lauren Byrnes, Cristina Lanata, Rachel Gate, Sara Mostafavi, Alexander Marson, Noah Zaitlen, Lindsey A Criswell, Jimmie Ye* certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not this version posted March 20, 2017. . https://doi.org/10.1101/118778 doi: bioRxiv preprint

Upload: others

Post on 19-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Multiplexing droplet-based single cell RNA-sequencing ... · 59 barcodes of two sets of 466 cells from individuals S1 and S2 to be the same. Applied to the 60 simulated data, demuxlet

Multiplexingdroplet-basedsinglecellRNA-sequencingusingnaturalgeneticbarcodes

Authors: Hyun Min Kang*$, Meena Subramaniam$, Sasha Targ$, Michelle Nguyen, Lenka

Maliskova,EuniceWan,SimonWong,LaurenByrnes,CristinaLanata,RachelGate,SaraMostafavi,

AlexanderMarson,NoahZaitlen,LindseyACriswell,JimmieYe*

certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted March 20, 2017. . https://doi.org/10.1101/118778doi: bioRxiv preprint

Page 2: Multiplexing droplet-based single cell RNA-sequencing ... · 59 barcodes of two sets of 466 cells from individuals S1 and S2 to be the same. Applied to the 60 simulated data, demuxlet

Theconfluenceofmicrofluidicandsequencingtechnologieshasenabledprofilingofthe1

transcriptome1,2,epigenome3,andchromatinconformationofsinglecells4atan2

unprecedentedscale.InitialapplicationsofsinglecellRNA-sequencinghavecharacterized3

cellularheterogeneityintumors5,6,tissues7,8,andresponsetostimulation9.Morerecently,4

droplet-basedtechnologieshavesignificantlyincreasedthethroughputofsinglecellcapture5

andlibrarypreparation1,10,enablingtranscriptomesequencingofthousandsofcellsfromone6

microfluidicreaction.7

Whileimprovementsinbiochemistry11,12andmicrofluidics13,14continuetoincreasethe8

numberofcellsthatcanbesequencedpersample,formanyapplications(e.g.differential9

expressionandgeneticstudies),sequencingthousandsofcellseachfrommanyindividuals10

wouldbettercaptureinterindividualvariabilitythansequencingmorecellsfromafew11

individuals.However,instandardworkflows,runningaseparatemicrofluidicreactionforeach12

sampleremainscostprohibitive15.Multiplexingcouldsignificantlyreducethepersamplecost13

byallowingcellsfromseveralindividualstobeprocessedsimultaneously,andreducetheper14

cellcostbyallowinghigherflowratesduetotheabilitytodetectandexcludedoubletsthat15

containcellsfromtwodifferentindividuals.Further,samplemultiplexinglimitsthetechnical16

variabilityassociatedwithsampleandlibrarypreparation,improvingstatisticalpowerto17

accuratelyestimatetruebiologicaleffects16.18

Wepresentasimpleexperimentaldesignandcomputationalalgorithm,demuxlet,to19

multiplexsamplesindscRNA-seqwithoutadditionalexperimentalmodification(Fig.1A).While20

strategiestodemultiplexcellsfromdifferentspecies1,10,17orhostandgraftsamples17have21

beenreported,nomethodisavailableforsimultaneousdemultiplexinganddoubletdetection22

certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted March 20, 2017. . https://doi.org/10.1101/118778doi: bioRxiv preprint

Page 3: Multiplexing droplet-based single cell RNA-sequencing ... · 59 barcodes of two sets of 466 cells from individuals S1 and S2 to be the same. Applied to the 60 simulated data, demuxlet

ofcellsfrom>2individuals.Inspiredbymodelsandalgorithmsdevelopedforcontamination23

detectioninDNAsequencingdata18,demuxletisfast,accurate,scalableandworkswith24

standardinputformats17,19,20.25

Attheheartofourstrategyisastatisticalmodelforpredictingtheprobabilityof26

observingaconsistent‘geneticbarcode’,asetofsinglenucleotidepolymorphisms(SNPs),in27

theRNA-seqreadsofasinglecellandthegenotypes(fromSNPgenotyping,imputationorDNA28

sequencing)ofdonorsamples.ThemodelaccountsforthebasequalityscoreoftheRNA-29

sequencingreadsaspreviouslydescribed18andgenotypeuncertaintiesatunobservedSNPs30

fromimputationtolargereferencepanels21.Itthenusesmaximumlikelihoodtodeterminethe31

mostlikelysampleidentityforeachcellusingamixturemodel.Asmallnumberofreads32

overlappingcommonSNPsissufficienttoaccuratelyidentifythesampleoforigin.Forapoolof33

8samples,4SNPscanuniquelyassignacelltothedonoroforigin(Fig.1B),and20SNPseach34

withminorallelefrequency(MAF)of50%candistinguisheverysamplewith98%probability.35

Themixturemodelindemuxletalsousesgeneticinformationtoidentifydoublets36

containingtwocellsfromdifferentindividuals,whichcomprisemostdropletscontaining37

multiplecells.Bymultiplexingevenasmallnumberofsamples,adoubletwillhaveahigh38

probability(1–1/N,e.g.87.5%forN=8samples)ofcontainingcellsfromtwoindividualswhich39

isdetectablebythedemuxletmodel(Fig.1C).Theabilitytorecoverthesampleidentityofeach40

cell(“demuxing”)andidentifymostdoubletsenablesexperimentaldesignsthatsignificantly41

increasethepersamplethroughputofcurrentdscRNA-seqworkflows.42

Wefirstassessthefeasibilityofourstrategyandtheperformanceofdemuxletby43

analyzingmultiplexedperipheralbloodmononuclearcells(PBMCs)from8patientswith44

certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted March 20, 2017. . https://doi.org/10.1101/118778doi: bioRxiv preprint

Page 4: Multiplexing droplet-based single cell RNA-sequencing ... · 59 barcodes of two sets of 466 cells from individuals S1 and S2 to be the same. Applied to the 60 simulated data, demuxlet

systemiclupuserythematosus(SLE).Usingasequentialpoolingstrategy,threepoolsof45

equimolarconcentrationsofcellsweregenerated(W1:patientsS1-S4,W2:patientsS5-S8and46

W3:patientsS1-S8)andeachloadedinawellona10XChromiumSingleCellinstrument(Fig.47

2A).3,645,4,254and6,205singlecellswereobtainedfromeachwellandsequencedtoan48

averagedepthof23k,17kand13kreadspercell.49

Demuxletidentified91%(3332/3645),91%(3864/4254),and86%(5348/6205)of50

dropletsassingletsfromwellsW1,W2andW3,respectively.25%(+/-2.6%),25%(+/-4.6%)51

and12.5%(+/-1.4%)ofsingletsfromwellsW1,W2andW3mappedtoeachdonor,consistent52

withequalmixingof8individuals.Weestimateanerrorrate(numberofcellsassignedto53

individualsnotinthemixture)of2/3332(W1)and0/3864(W2)singletsbyanalyzingwellsW154

andW2,eachcontainingcellsfromtwodisjointsetsof4individuals(Fig.2B),suggesting>99%55

ofsingletswereassignedtoindividualscorrectly.56

Wenextassesstheabilityofdemuxlettodetectdoubletsinbothsimulatedandreal57

data.466/3645(13%)cellsweresimulatedassyntheticdoubletsbysettingthecellular58

barcodesoftwosetsof466cellsfromindividualsS1andS2tobethesame.Appliedtothe59

simulateddata,demuxletidentified91%(426/466)ofsyntheticdoubletsasdoubletsor60

ambiguous,correctlyrecoveringthesampleidentityofbothcellsin403/426(95%)doublets61

(fig.S1).AppliedtorealdatafromW1,W2andW3,demuxletidentified138/3645,165/4254,62

and384/6205doublets,correspondingto5.0%,5.2%and7.1%,consistentwiththelinear63

relationshipbetweenthenumberofcellssequencedanddoubletratesestimatedusingamixed64

speciesexperiment(Fig.2C).65

certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted March 20, 2017. . https://doi.org/10.1101/118778doi: bioRxiv preprint

Page 5: Multiplexing droplet-based single cell RNA-sequencing ... · 59 barcodes of two sets of 466 cells from individuals S1 and S2 to be the same. Applied to the 60 simulated data, demuxlet

Sampledemultiplexingenablesindividual-specificvisualizationofsinglecelldatawecall66

‘dropprints’.Whilebothvariabilityincelltypeproportionandgeneexpressionhavebeen67

previouslyobservedinPBMCs,ithasnotbeenpossibletofullycontrolforbatcheffectsdueto68

separateprocessingofsamples22,23.Singletsidentifiedbydemuxletinallthreewellscluster69

intoknownPBMCsubpopulations(Fig.2D)andarenotconfoundedbywelltowelleffects(fig.70

S2A).Whilewefound6differentiallyexpressedgenes(FDR<0.05)betweenwellsW1andW2,71

only2genesweredifferentiallyexpressedinwellW3betweenW1andW2individuals(FDR<72

0.05)(fig.S2B)suggestingsamplemultiplexingcouldreduceconfoundingsuchaslibrary73

preparationbatcheffects.Furthermore,forthesameindividuals,dropprintsfromtwo74

differentwellsarequalitativelyconsistent,theestimatesofcelltypeproportionsforthesame75

individualsinW1orW2andW3arehighlycorrelated(R=0.99)(Fig.2Eandfig.S3),andthe76

inferredcelltype-specificexpressionprofilesarecorrelatedwithbulksequencingofsortedcell77

populations(R=0.76-0.92)(fig.S4).Theseresultsdemonstratethatdemuxletrecoversthe78

sampleidentityofsinglecellswithhighaccuracy,identifiesdoubletsattheexpectedrate,and79

canallowforcomparisonofindividualswithinandacrosswells.80

Demuxletenablesmultiplexedexperimentaldesignsthatincreasethesample81

throughputforprofilingofinterindividualresponsesacrossavarietyofconditions.Weapplied82

suchamultiplexingstrategytocharacterizecelltype-specificresponsestoIFN-β,apotent83

cytokinethatinducesgenome-scalechangesinthetranscriptionalprofilesofimmunecells24,25.84

From8lupuspatients,1MPBMCseachwereisolated,sequentiallypooled,anddividedintwo85

aliquots.OnesamplewasactivatedwithrecombinantIFN-βfor6hours,atimepointwe86

previouslyfoundtomaximizetheexpressionofinterferon-sensitivegenes(ISGs)indendritic87

certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted March 20, 2017. . https://doi.org/10.1101/118778doi: bioRxiv preprint

Page 6: Multiplexing droplet-based single cell RNA-sequencing ... · 59 barcodes of two sets of 466 cells from individuals S1 and S2 to be the same. Applied to the 60 simulated data, demuxlet

cells(DCs)andTcells26,27.Amatchedcontrolsamplewasalsoculturedfor6hours.Fromthis88

experiment,wecapturedandsequenced14,619controland14,446stimulatedcells.89

Incontrolandstimulatedexperiments,demuxletidentified83%(12138/14619)and84%90

(12167/14446)ofdropletsassinglets,andrecoveredthesampleidentityof99%(12127/1213891

and12155/12167)ofsinglets.Detecteddoubletsformdistinctclustersint-SNEspaceatthe92

peripheryofothercelltypes,indicativeoftheexpectedenrichmentofdoubletsformixedcell93

typesinaheterogeneouspopulation(fig.S5).Theestimateddoubletrateof10.9%isconsistent94

withpredictedratesbasedonthenumberofcellsrecovered,andtheobservedproportionof95

doubletsfromeachpairofindividualsishighlycorrelatedwiththeexpectedproportions96

(R=0.98)(Fig.2Candfig.S6).97

Demultiplexingindividualsenablestheuseofthe8sampleswithinapoolasbiological98

replicatestoquantitativelyassesscelltype-specificresponsestoIFN-βstimulation.Consistent99

withpreviousreportsfrombulkRNA-sequencingdata,IFN-βstimulationinduceswidespread100

transcriptomicchangesobservedasashiftinthet-SNEprojectionsofsinglets(Fig.3A)24.After101

assigningeachsinglettoareferencecellpopulation17,weidentified2,686differentially102

expressedgenes(logFC>2,FDR<0.05)inatleastonecelltypeinresponsetoIFN-βstimulation103

(tableS1).Thesegenesclusterintomodulesofcelltype-specificresponsesenrichedfordistinct104

generegulatoryprocesses(Fig.3B,tableS2).Forexample,thetwoclustersofupregulated105

genes,pan-leukocyte(ClusterIII:401genes,logFC>2,FDR<0.05)andCD14+specific(ClusterI:106

767genes,logFC>2,FDR<0.05),wereenrichedforgeneralantiviralresponse(e.g.KEGG107

InfluenzaA:ClusterIIIP<1.6x10-5),chemokinesignaling(ClusterIP<7.6x10-3)andgenes108

implicatedinSLE(ClusterIP<4.4x10-3).Thefiveclustersofdownregulatedgeneswere109

certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted March 20, 2017. . https://doi.org/10.1101/118778doi: bioRxiv preprint

Page 7: Multiplexing droplet-based single cell RNA-sequencing ... · 59 barcodes of two sets of 466 cells from individuals S1 and S2 to be the same. Applied to the 60 simulated data, demuxlet

enrichedforantibacterialresponse(KEGGLegionellosis:ClusterIImonocytedownP<5.5x10-3)110

andnaturalkillercellmediatedtoxicity(ClusterIVNK/Thcelldown:P<3.6x10-2).The111

differentialexpressionusingcelltype-specificestimatesfromsinglecelldatarecoversknown112

generegulatoryprogramsaffectedbyinterferonstimulation.113

WenextcharacterizeinterindividualvariabilityinPBMCexpressionatbaselineandin114

responsetoIFN-βstimulation.Inbothcontrolandstimulatedcells,thevarianceofmean115

expressionamongindividualsissubstantiallyhigherthanexpectedfromsyntheticreplicates116

(Fig.3C).Aspreviouslyreported22,28,celltypeproportionvariedsignificantlyamongindividuals117

andcontributestovariabilityingeneexpression(fig.S7).Thevarianceestimatedfromsynthetic118

replicateswithmatchedcelltypeproportionsismoreconcordantwiththeobservedvariance119

(Lin’sconcordance=0.54versus0.022,Pearsoncorrelation=0.78versus0.69,Fig.3C-D).120

However,comparingmeanexpressionfromsyntheticreplicateswithincelltypes(Lin’s121

concordance=0.007-0.20,Pearsoncorrelation=0.27–0.68)showsthatthereis122

interindividualvariabilitynotexplainedsimplybycelltypeproportion(fig.S8).123

Wethenexploredinterindividualvariabilityinexpressionwithinonecelltype,124

CD14+CD16-monocytes.Thecorrelationofmeanexpressionbetweenpairsofsynthetic125

replicatesfromthesameindividual(>99%)wasgreaterthanbetweendifferentindividuals126

(~97%),indicatingvariationbeyondsampling(Fig.3E).Wefound585genesthathave127

significantinterindividualvariabilityinstimulatedCD14+CD16-monocytesand827incontrolby128

correlatingthesyntheticreplicatesacrossindividuals(Pearsoncorrelation,FDR<0.05).The129

variablegenesinstimulatedCD14+CD16-monocytesandtoalesserextentinCD4+Tcells(P<130

9.3x10-4and4.5x10-2,hypergeometrictest,Fig.3F)areenrichedfordifferentiallyexpressed131

certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted March 20, 2017. . https://doi.org/10.1101/118778doi: bioRxiv preprint

Page 8: Multiplexing droplet-based single cell RNA-sequencing ... · 59 barcodes of two sets of 466 cells from individuals S1 and S2 to be the same. Applied to the 60 simulated data, demuxlet

genes,consistentwithourpreviousdiscoveryofmoreIFN-βresponse-eQTLsinmonocyte-132

deriveddendriticcellsthanCD4+Tcells26,27.Wehypothesizethatnaturalgeneticvariation133

couldexplaininterindividualvariabilityingeneexpressioninourmultiplexeddata.Forexample,134

schlafenfamilymember5(SLFN5)andguanylatebindingprotein3(GBP3)expressionarehighly135

correlatedbetweenreplicatesafterIFN-βstimulation(R=0.92,P<0.0011and0.80,P<0.017).136

TheaverageexpressionofthetwosyntheticreplicatesareassociatedwithknowneQTLsin137

CD14+monocytesandlymphoblastoidcelllines,respectively(SLFN5:rs11080327P<3.1x10-4,138

GBP3:rs10493821P<2.1x10-2,Fig.3G)26,29.Theseresultssuggestthatsinglecellsequencing139

recoversrepeatableinterindividualvariationingeneexpressionandintwogenes,isassociated140

withknowngeneticdeterminants.141

Weintroducedemuxlet,anewcomputationalmethodthatenablessimpleandefficient142

samplemultiplexingfordscRNA-seq,validateitsperformanceinsimulatedandrealdata,and143

characterizesinglecellexpressionofPBMCsfromSLEpatientsinseveraldifferentconditions.144

Ourresultsdemonstratedemuxletprovidesreliableestimationofcelltypeproportionacross145

individuals,recoverscelltype-specifictranscriptionalprogramsfrommixedpopulations146

consistentwithpreviousreports,andidentifiesgeneswithinterindividualvariability24.The147

capabilitytodemultiplexandidentifydoubletsusingnaturalgeneticvariationsignificantly148

reducestheper-sampleandper-cellcostofsingle-cellRNA-sequencing,doesnotrequire149

syntheticbarcodesorsplit-poolstrategies30-34,andcapturesbiologicalvariabilityamong150

individualsampleswhilelimitingtheeffectsofunwantedtechnicalvariability.151

TheapplicationofsinglecellsequencingmethodssuchasdscRNA-seqtolargernumbers152

ofindividualsisapromisingapproachtocharacterizingcellularheterogeneityamong153

certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted March 20, 2017. . https://doi.org/10.1101/118778doi: bioRxiv preprint

Page 9: Multiplexing droplet-based single cell RNA-sequencing ... · 59 barcodes of two sets of 466 cells from individuals S1 and S2 to be the same. Applied to the 60 simulated data, demuxlet

individualsatbaselineandindifferentenvironmentalconditions,acrucialareaforfurther154

understandingofhealthanddisease35-37.Experimentalandcomputationalmethodsforreliable155

andefficientsamplemultiplexingcouldenablebroadadoptionofdroplet-basedRNA-seqfor156

population-scalestudies,facilitatinggeneticandlongitudinalanalysesinrelevantcelltypesand157

conditionsacrossarangeofsampledindividuals38. 158

certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted March 20, 2017. . https://doi.org/10.1101/118778doi: bioRxiv preprint

Page 10: Multiplexing droplet-based single cell RNA-sequencing ... · 59 barcodes of two sets of 466 cells from individuals S1 and S2 to be the same. Applied to the 60 simulated data, demuxlet

159

Figure1–Demuxlet:demultiplexinganddoubletidentificationfromsinglecelldata.160

A)Pipelineforexperimentalmultiplexingofunrelatedindividuals,loadingontodroplet-based161

singlecellRNA-sequencinginstrument,andcomputationaldemultiplexing(demux)anddoublet162

removalusingdemuxlet.Assumingequalmixingof8individuals,B)4geneticvariantscan163

recoverthesampleidentityofacell,andC)87.5%ofdoubletswillcontaincellsfromtwo164

differentsamples. 165

certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted March 20, 2017. . https://doi.org/10.1101/118778doi: bioRxiv preprint

Page 11: Multiplexing droplet-based single cell RNA-sequencing ... · 59 barcodes of two sets of 466 cells from individuals S1 and S2 to be the same. Applied to the 60 simulated data, demuxlet

166

Figure2–Performanceofdemuxlet.A)Experimentaldesignforequimolarpoolingofcells167

from8unrelatedsamples(S1-S8)intothreewells(W1-W3).W1andW2containcellsfromtwo168

disjointsetsof4individuals.W3containscellsfromall8individuals.B)Demultiplexingsingle169

cellsineachwellrecoverstheexpectedindividuals.C)Estimatesofdoubletratesversus170

previousestimatesfrommixedspeciesexperiments.D)Celltypeidentitydeterminedby171

predictiontopreviouslyannotatedPBMCdata.E)t-SNEplotoftwoindividuals(S1andS5)from172

differentwellsarequalitativelyconcordant.173

certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted March 20, 2017. . https://doi.org/10.1101/118778doi: bioRxiv preprint

Page 12: Multiplexing droplet-based single cell RNA-sequencing ... · 59 barcodes of two sets of 466 cells from individuals S1 and S2 to be the same. Applied to the 60 simulated data, demuxlet

174

Figure3–InterindividualvariabilityinIFN-βresponse.A)t-SNEplotofunstimulated(blue)and175

IFN-β-stimulated(red)PBMCsandtheestimatedcelltypes.B)Celltype-specificexpressionin176

stimulated(left)andunstimulated(right)cells.Differentiallyexpressedgenesshown(FDR<177

0.05,|log(FC)|>1).Eachcolumnrepresentscelltype-specificexpressionforeachindividual178

fromdemuxlet.C)Celltypeproportionsforeachindividualinunstimulatedandstimulated179

cells.D)Observedvariance(y-axis)inmeanexpressionoverallPBMCsfromeachindividual180

versusexpectedvariance(x-axis)oversyntheticreplicatessampledacrossallcells(lightblue,181

pink)orreplicatesmatchedforcelltypeproportion(blue,red).E)Correlationbetweensample182

replicatesincontrolandstimulatedcells.F)Numberofsignificantlyvariablegenesineachcell183

typeandcondition.G)MeanexpressionofSLFN5andGPB3intwosamplereplicateslabeledby184

genotype. 185

certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted March 20, 2017. . https://doi.org/10.1101/118778doi: bioRxiv preprint

Page 13: Multiplexing droplet-based single cell RNA-sequencing ... · 59 barcodes of two sets of 466 cells from individuals S1 and S2 to be the same. Applied to the 60 simulated data, demuxlet

Methods186

Identifyingthesampleidentityofeachsinglecell.187

Wefirstdescribethemethodtoinferthesampleidentityofeachcellintheabsenceof188

doublets.ConsiderRNA-sequencereadsfromCbarcodeddropletsmultiplexedacrossS189

differentsamples,wheretheirgenotypesareavailableacrossVexonicvariants.Let𝑑"#bethe190

numberofuniquereadsoverlappingwiththev-thvariantfromthec-thdroplet.Let𝑏"#% ∈191

𝑅, 𝐴, 𝑂 , 𝑖 ∈ 1,⋯ , 𝑑"# bethevariant-overlappingbasecallfromthei-thread,representing192

reference(R),alternate(A),andother(O)allelesrespectively.Let𝑒"#% ∈ 0,1 bealatent193

variableindicatingwhetherthebasecalliscorrect(0)ornot(1),thengiven𝑒"#% = 0, 𝑏"#% ∈194

𝑅, 𝐴 and~Binomial 2, ;<when𝑔 ∈ {0,1,2}isthetruegenotypeofsamplecorresponding195

toc-thdropletatv-thvariant.When𝑒"#% = 1,weassumethatPr(𝑏"#%|𝑔, 𝑒"#%)followstableS3.196

𝑒"#% isassumedtofollowBernoulli 10GHIJKLM where𝑞"#% isaphred-scalequalityscoreofthe197

observedbasecall.198

Weallowuncertaintyofobservedgenotypesatthev-thvariantforthes-thsample199

using𝑃P#(;) = Pr(𝑔|DataP#),theposteriorprobabilityofapossiblegenotype𝑔givenexternal200

DNAdataDataP#(e.g.sequencereads,imputedgenotypes,orarray-basedgenotypes).If201

genotypelikelihoodPr(DataP#|𝑔)isprovided(e.g.unphasedsequencereads)instead,itcanbe202

convertedtoaposteriorprobabilityscaleusing𝑃P#(;) = Pr(DataP#|𝑔)Pr(𝑔)where203

Pr 𝑔 ~Binomial 2, 𝑝# and𝑝#isthepopulationallelefrequencyofthealternateallele.To204

allowerrors𝜀intheposteriorprobability,wereplaceitto(1 − 𝜀)𝑃P#(;) + 𝜀Pr(𝑔).Theoverall205

likelihoodthatthec-thdropletoriginatedfromthes-thsampleis206

certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted March 20, 2017. . https://doi.org/10.1101/118778doi: bioRxiv preprint

Page 14: Multiplexing droplet-based single cell RNA-sequencing ... · 59 barcodes of two sets of 466 cells from individuals S1 and S2 to be the same. Applied to the 60 simulated data, demuxlet

𝐿" 𝑠 = Pr(𝑏"#%|𝑔, 𝑒)YZ[\ 𝑃P#

(;)]IJ%[Y

<;[\

^#[Y (1)

Intheabsenceofdoublets,weusethemaximumlikelihoodtodeterminethebest-matching207

sampleasargmaxP 𝐿" 𝑠 .208

209

Screeningfordropletscontainingmultiplesamples.210

Toidentifydoublets,weimplementamixturemodeltocalculatethelikelihoodthatthe211

sequencereadsoriginatedfromtwoindividuals,andthelikelihoodsarecomparedtodetermine212

whetheradropletcontainscellsfromoneortwosamples.Ifsequencereadsfromthec-th213

dropletoriginatefromtwodifferentsamples,𝑠Y, 𝑠<withmixingproportions 1 − 𝛼 ∶ 𝛼,then214

thelikelihoodin(1)canberepresentedasthefollowingmixturedistribution18,215

𝐿" 𝑠Y, 𝑠<, α = 1 − α Pr 𝑏"#% 𝑔Y, 𝑒 + 𝛼Pr 𝑏"#% 𝑔<, 𝑒YZ[\ 𝑃P#

(;L)𝑃P#(;d)]IJ

%[Y;L,;d^#[Y 216

Toreducethecomputationalcost,weconsiderdiscretevaluesofα ∈ {αY,⋯ , αe},(e.g.217

5-50%by5%).Wedeterminethatitisadoubletbetweensamples𝑠Y, 𝑠<ifandonlyif218

fghiL,id,j kI PL,Pd,lfghikI P

≥ 𝑡andthemostlikelymixingproportionisestimatedtobe219

argmaxo𝐿" 𝑠Y, 𝑠<, 𝛼 .Wedeterminethatthecellcontainsonlyasingleindividualsif220

fghiL,id,j kI PL,Pd,lfghikI P

≤ Yq.Thelessconfidentdroplets,weclassifycellsasambiguous.Whilewe221

consideronlydoubletsforestimatingdoubletrates,weremovealldoubletsandambiguous222

dropletstoconservativelyestimatesinglets.FigureS1illustratesthedistributionofsinglet,223

doubletlikelihoodsandthedecisionboundarieswhent=2wasused.224

225

IsolationandpreparationofPBMCsamples.226

certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted March 20, 2017. . https://doi.org/10.1101/118778doi: bioRxiv preprint

Page 15: Multiplexing droplet-based single cell RNA-sequencing ... · 59 barcodes of two sets of 466 cells from individuals S1 and S2 to be the same. Applied to the 60 simulated data, demuxlet

Peripheralbloodmononuclearcellswereisolatedfrompatientdonors,Ficollseparated,and227

cryopreservedbytheUCSFCoreImmunologicLaboratory(CIL).PBMCswerethawedina37°C228

waterbath,andsubsequentlywashedandresuspendedinEasySepbuffer.Cellsweretreated229

withDNAseIandincubatedfor15minatRTbeforefilteringthrougha40umcolumn.Finally,230

thecellswerewashedinEasySepandresuspendedin1xPBMSand0.04%bovineserum231

albumin.Cellsfrom8donorswerethenre-concentratedto1McellspermLandthenserially232

pooled.Ateachpoolingstage,1McellspermLwerecombinedtoresultinafinalsamplepool233

withcellsfromalldonors.234

235

IFN-βstimulationandculture.236

Priortopooling,samplesfrom8individualswereseparatedintotwoaliquotseach.Onealiquot237

ofPBMCswasactivatedby100U/mLofrecombinantIFN-β(PBLAssayScience)for6hours238

accordingtothepublishedprotocol26.Thesecondaliquotwasleftuntreated.After6hours,the239

8samplesforeachconditionwerepooledtogetherintwofinalpools(stimulatedcellsand240

controlcells)asdescribedabove.241

242

Droplet-basedcaptureandsequencing.243

Cellularsuspensionswereloadedontothe10xChromiuminstrument(10xGenomics)and244

sequencedasdescribedinZhengetal17.ThecDNAlibrariesweresequencedusingacustom245

programon10lanesofIlluminaHiSeq2500RapidMode,yielding1.8Btotalreadsand25K246

readspercell.Atthesedepths,werecovered>90%ofcapturedtranscriptsineachsequencing247

experiment.248

certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted March 20, 2017. . https://doi.org/10.1101/118778doi: bioRxiv preprint

Page 16: Multiplexing droplet-based single cell RNA-sequencing ... · 59 barcodes of two sets of 466 cells from individuals S1 and S2 to be the same. Applied to the 60 simulated data, demuxlet

249

Bulkisolationandsequencing.250

PBMCsfromlupuspatientswereisolatedandpreparedasdescribedabove.Onceresuspended251

inEasySepbuffer,theEasyEightsMagnetwasusedtosequentiallyisolateCD14+(usingthe252

EasySepHumanCD14positiveselectionkitII,cat#17858),CD19+(usingtheEasySepHuman253

CD19positiveselectionkitII,cat#17854),CD8+(EasySepHumanCD8positiveselectionkitII,254

cat#17853),andCD4+cells(EasySepHumanCD4Tcellnegativeisolationkit(cat#17952)255

accordingtothekitprotocol.RNAwasextractedusingtheRNeasyMinikit(#74104),and256

reversetranscriptionandtagmentationwereconductedaccordingtoPicellietal.usingthe257

SmartSeq2protocol39,40.AftercDNAsynthesisandtagmentation,thelibrarywasamplifiedwith258

theNexteraXTDNASamplePreparationKit(#FC-131-1096)accordingtoprotocol,startingwith259

0.2ngofcDNA.SampleswerethensequencedononelaneoftheIlluminaHiSeq4000with260

pairedend100bpreadlength,yielding350Mtotalreads.261

262

Alignmentandinitialprocessingofsinglecellsequencingdata.263

WeusedtheCellRangerv1.1andv1.2softwarewiththedefaultsettingstoprocesstheraw264

FASTQfiles,alignthesequencingreadstothehg19transcriptome,andgenerateafilteredUMI265

expressionprofileforeachcell17.TherawUMIcountsfromallcellsandgeneswithnonzero266

countsacrossthepopulationofcellswereusedtogeneratet-SNEprofiles.267

268

Celltypeclassificationandclustering.269

certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted March 20, 2017. . https://doi.org/10.1101/118778doi: bioRxiv preprint

Page 17: Multiplexing droplet-based single cell RNA-sequencing ... · 59 barcodes of two sets of 466 cells from individuals S1 and S2 to be the same. Applied to the 60 simulated data, demuxlet

ToidentifyknownimmunecellpopulationsinPBMCs,weusedtheSeuratpackagetoperform270

unbiasedclusteringonthe2.7kPBMCsfromZhengetal.,followingthepubliclyavailable271

GuidedClusteringTutorial17,41.TheFindAllMarkersfunctionwasthenusedtofindthetop20272

markersforeachofthe8identifiedcelltypes.Clusteraverageswerecalculatedbytakingthe273

averagerawcountacrossallcellsofeachcelltype.Foreachcell,wecalculatedtheSpearman274

correlationoftherawcountsofthemarkergenesandtheclusteraverages,andassignedeach275

celltothecelltypetowhichithadmaximumcorrelation.276

277

Differentialexpressionanalysis.278

Demultiplexedindividualswereusedasreplicatesfordifferentialexpressionanalysis.Foreach279

gene,rawcountsweresummedforeachindividual.WeusedtheDESeq2packagetodetect280

differentiallyexpressedgenesbetweencontrolandstimulatedconditions42.Geneswith281

baseMean>1werefilteredoutfromtheDESeq2output,andtheqvaluepackagewasusedto282

calculateFDR<0.0543.283

284

EstimationofinterindividualvariabilityinPBMCs.285

Foreachindividual,wefoundthemeanexpressionofeachgenewithnonzerocounts.The286

meanwascalculatedfromthelog2singlecellUMIcountsnormalizedtothemediancountfor287

eachcell.Tomeasureinterindividualvariability,wethencalculatedthevarianceofthemean288

expressionacrossallindividuals.Lin’sconcordancecorrelationcoefficientwasusedtocompare289

theagreementofobserveddataandsyntheticreplicates.Syntheticreplicatesweregenerated290

bysamplingwithoutreplacementeitherfromallcellsorcellsmatchedforcelltypeproportion.291

certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted March 20, 2017. . https://doi.org/10.1101/118778doi: bioRxiv preprint

Page 18: Multiplexing droplet-based single cell RNA-sequencing ... · 59 barcodes of two sets of 466 cells from individuals S1 and S2 to be the same. Applied to the 60 simulated data, demuxlet

Estimationofinterindividualvariabilitywithincelltypes.292

Foreachcelltype,wegeneratedtwobulkequivalentreplicatesforeachindividualbysumming293

rawcountsofcellssampledwithoutreplacement.WeusedDESeq2togeneratevariance-294

stabilizedcountsacrossallreplicates.Tofilterforexpressedgenes,weperformedall295

subsequentanalysesongeneswith5%ofsampleswith>0counts.Thecorrelationofreplicates296

andQTLdetectionwasperformedonthelog2normalizedcounts.Pearsoncorrelationofthe297

tworeplicatesfromeachofthe8individualswasusedtofindgeneswithsignificant298

interindividualvariability.299

300

SinglecellandbulkRNA-sequencingdatahasbeendepositedintheGeneExpressionOmnibus301

undertheaccessionnumberGSE96583.Demuxletsoftwareisfreelyavailableat302

https://github.com/hyunminkang/apigenome.303

304

1. Macosko,E.Z.etal.Highlyparallelgenome-wideexpressionprofilingofindividualcells305usingnanoliterdroplets.Cell,1611202-1214(2015).306

2. Pollen,A.A.etal.Low-coveragesingle-cellmRNAsequencingrevealscellular307heterogeneityandactivatedsignalingpathwaysindevelopingcerebralcortex.Nat308Biotech32,1053-1058(2014).309

3. Buenrostro,J.D.etal.Single-cellchromatinaccessibilityrevealsprinciplesofregulatory310variation.Nature,523486-490(2015).311

4. Nagano,T.etal.Single-cellHi-Crevealscell-to-cellvariabilityinchromosomestructure.312Nature,50259-64(2013).313

5. Patel,A.P.etal.Single-cellRNA-seqhighlightsintratumoralheterogeneityinprimary314glioblastoma.Science,3441396-1401(2014).315

6. Tirosh,I.etal.Dissectingthemulticellularecosystemofmetastaticmelanomabysingle-316cellRNA-seq.Science,352189-196(2016).317

7. Muraro,M.J.etal.Asingle-celltranscriptomeatlasofthehumanpancreas.CellSyst3,318385-394e383(2016).319

8. Baron,M.etal.Asingle-celltranscriptomicmapofthehumanandmousepancreas320revealsinter-andintra-cellpopulationstructure.CellSyst3,346-360e344(2016).321

certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted March 20, 2017. . https://doi.org/10.1101/118778doi: bioRxiv preprint

Page 19: Multiplexing droplet-based single cell RNA-sequencing ... · 59 barcodes of two sets of 466 cells from individuals S1 and S2 to be the same. Applied to the 60 simulated data, demuxlet

9. Shalek,A.K.etal.Single-cellRNA-seqrevealsdynamicparacrinecontrolofcellular322variation.Nature510,363–369(2014).323

10. Klein,A.M.etal.Dropletbarcodingforsingle-celltranscriptomicsappliedtoembryonic324stemcells.Cell161,1187-1201(2015).325

11. Stegle,O.,Teichmann,S.A.&Marioni,J.C.Computationalandanalyticalchallengesin326single-celltranscriptomics.NatRevGenet16,133-145(2015).327

12. Gawad,C.,Koh,W.&Quake,S.R.Single-cellgenomesequencing:currentstateofthe328science.NatRevGenet17,175-188(2016).329

13. Streets,A.M.etal.Microfluidicsingle-cellwhole-transcriptomesequencing.ProcNatl330AcadSci111,7048-7053(2014).331

14. Zilionis,R.etal.Single-cellbarcodingandsequencingusingdropletmicrofluidics.Nat332Protoc12,44-73(2017).333

15. Ziegenhain,C.etal.Comparativeanalysisofsingle-cellRNAsequencingmethods.Mol334Cell65,631-643.e634(2017).335

16. Hicks,S.C.,Teng,M.&Irizarry,R.A.Onthewidespreadandcriticalimpactofsystematic336biasandbatcheffectsinsingle-cellRNA-Seqdata.bioRxiv025528(2015).337

17. Zheng,G.X.Y.etal.Massivelyparalleldigitaltranscriptionalprofilingofsinglecells.Nat338Commun8,14049(2017).339

18. Jun,G.etal.DetectingandestimatingcontaminationofhumanDNAsamplesin340sequencingandarray-basedgenotypedata.AmJHumGenet91,839-848(2012).341

19. Danecek,P.etal.ThevariantcallformatandVCFtools.Bioinformatics27,2156-2158342(2011).343

20. Li,H.etal.Thesequencealignment/mapformatandSAMtools.Bioinformatics25,2078-3442079(2009).345

21. Loh,P.R.,Palamara,P.F.&Price,A.L.Fastandaccuratelong-rangephasinginaUK346Biobankcohort.NatGenet48,811-816(2016).347

22. Aguirre-Gamboa,R.etal.DifferentialeffectsofenvironmentalandgeneticfactorsonT348andBcellimmunetraits.CellRep17,2474-2487.349

23. Li,Y.etal.Afunctionalgenomicsapproachtounderstandvariationincytokine350productioninhumans.Cell167,1099-1110.e1014(2016).351

24. Mostafavi,S.etal.Parsingtheinterferontranscriptionalnetworkanditsdisease352associations.Cell164,564-578.353

25. Stark,G.R.,Kerr,I.M.,Williams,B.R.G.,Silverman,R.H.&Schreiber,R.D.Howcells354respondtointerferon.AnnuRevBiochem67,227-264(2003).355

26. Lee,M.N.etal.Commongeneticvariantsmodulatepathogen-sensingresponsesin356humandendriticcells.Science343,1246980-1246980(2014).357

27. Ye,C.J.etal.IntersectionofpopulationvariationandautoimmunitygeneticsinhumanT358cellactivation.Science345,1254665-1254665(2014).359

28. Palmer,C.,Diehn,M.,Alizadeh,A.A.&Brown,P.O.Cell-typespecificgeneexpression360profilesofleukocytesinhumanperipheralblood.BMCGenomics7,115(2006).361

29. Lappalainen,T.etal.Transcriptomeandgenomesequencinguncoversfunctional362variationinhumans.Nature501,506-511(2013).363

30. Cao,J.etal.Comprehensivesinglecelltranscriptionalprofilingofamulticellular364organismbycombinatorialindexing.bioRxiv104844(2017).365

certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted March 20, 2017. . https://doi.org/10.1101/118778doi: bioRxiv preprint

Page 20: Multiplexing droplet-based single cell RNA-sequencing ... · 59 barcodes of two sets of 466 cells from individuals S1 and S2 to be the same. Applied to the 60 simulated data, demuxlet

31. Dixit,A.etal.Perturb-Seq:Dissectingmolecularcircuitswithscalablesingle-cellRNA366profilingofpooledgeneticscreens.Cell167,1853-1866.e1817(2016).367

32. Adamson,B.etal.AMultiplexedsingle-cellCRISPRscreeningplatformenables368systematicdissectionoftheunfoldedproteinresponse.Cell167,1867-1882.e1821369(2016).370

33. Jaitin,D.A.etal.DissectingimmunecircuitsbylinkingCRISPR-pooledscreenswith371single-cellRNA-seq.Cell167,1883-1896.e1815(2016).372

34. Datlinger,P.etal.PooledCRISPRscreeningwithsingle-celltranscriptomereadout.Nat373Meth14,297-301(2017).374

35. Buettner,F.etal.Computationalanalysisofcell-to-cellheterogeneityinsingle-cellRNA-375sequencingdatarevealshiddensubpopulationsofcells.NatBiotech33,155-160(2015).376

36. Tung,P.-Y.etal.Batcheffectsandtheeffectivedesignofsingle-cellgeneexpression377studies.SciRep7,39921(2017).378

37. Tanay,A.&Regev,A.Scalingsingle-cellgenomicsfromphenomenologytomechanism.379Nature541,331-338(2017).380

38. Wills,Q.F.etal.Single-cellgeneexpressionanalysisrevealsgeneticassociationsmasked381inwhole-tissueexperiments.NatBiotech31,748-752(2013).382

39. Picelli,S.etal.Smart-seq2forsensitivefull-lengthtranscriptomeprofilinginsinglecells.383NatMeth10,1096-1098(2013).384

40. Picelli,S.etal.Full-lengthRNA-seqfromsinglecellsusingSmart-seq2.NatProtoc9,171-385181(2014).386

41. Satija,R.,Farrell,J.A.,Gennert,D.,Schier,A.F.&Regev,A.Spatialreconstructionof387single-cellgeneexpressiondata.NatBiotech33,495-502(2015).388

42. Anders,S.&Huber,W.Differentialexpressionanalysisforsequencecountdata.389GenomeBiol11,R106(2010).390

43. Dabney,A.,Storey,J.D.&Warnes,G.R.qvalue:Q-valueestimationforfalsediscovery391ratecontrol.Rpackageversion1(2010).392

certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was notthis version posted March 20, 2017. . https://doi.org/10.1101/118778doi: bioRxiv preprint