Multiplexing droplet-based single cell RNA-sequencing using natural genetic barcodes
Authors: Hyun Min Kang*$, Meena Subramaniam$, Sasha Targ$, Michelle Nguyen, Lenka Maliskova, Eunice Wan, Simon Wong, Lauren Byrnes, Cristina Lanata, Rachel Gate, Sara Mostafavi, Alexander Marson, Noah Zaitlen, Lindsey A Criswell, Jimmie Ye*
bioRxiv preprint doi: https://doi.org/10.1101/118778; this version posted March 20, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The confluence of microfluidic and sequencing technologies has enabled profiling of the transcriptome1,2, epigenome3, and chromatin conformation of single cells4 at an unprecedented scale. Initial applications of single cell RNA-sequencing have characterized cellular heterogeneity in tumors5,6, tissues7,8, and response to stimulation9. More recently, droplet-based technologies have significantly increased the throughput of single cell capture and library preparation1,10, enabling transcriptome sequencing of thousands of cells from one microfluidic reaction.
While improvements in biochemistry11,12 and microfluidics13,14 continue to increase the number of cells that can be sequenced per sample, for many applications (e.g. differential expression and genetic studies), sequencing thousands of cells each from many individuals would better capture interindividual variability than sequencing more cells from a few individuals. However, in standard workflows, running a separate microfluidic reaction for each sample remains cost prohibitive15. Multiplexing could significantly reduce the per sample cost by allowing cells from several individuals to be processed simultaneously, and reduce the per cell cost by allowing higher flow rates due to the ability to detect and exclude doublets that contain cells from two different individuals. Further, sample multiplexing limits the technical variability associated with sample and library preparation, improving statistical power to accurately estimate true biological effects16.
We present a simple experimental design and computational algorithm, demuxlet, to multiplex samples in droplet single cell RNA-sequencing (dscRNA-seq) without additional experimental modification (Fig. 1A). While strategies to demultiplex cells from different species1,10,17 or host and graft samples17 have been reported, no method is available for simultaneous demultiplexing and doublet detection
of cells from >2 individuals. Inspired by models and algorithms developed for contamination detection in DNA sequencing data18, demuxlet is fast, accurate, scalable and works with standard input formats17,19,20.
At the heart of our strategy is a statistical model for predicting the probability of observing a consistent ‘genetic barcode’, a set of single nucleotide polymorphisms (SNPs), in the RNA-seq reads of a single cell and the genotypes (from SNP genotyping, imputation or DNA sequencing) of donor samples. The model accounts for the base quality score of the RNA-sequencing reads as previously described18 and genotype uncertainties at unobserved SNPs from imputation to large reference panels21. It then uses maximum likelihood to determine the most likely sample identity for each cell using a mixture model. A small number of reads overlapping common SNPs is sufficient to accurately identify the sample of origin. For a pool of 8 samples, 4 SNPs can uniquely assign a cell to the donor of origin (Fig. 1B), and 20 SNPs each with minor allele frequency (MAF) of 50% can distinguish every sample with 98% probability.
The mixture model in demuxlet also uses genetic information to identify doublets containing two cells from different individuals, which comprise most droplets containing multiple cells. By multiplexing even a small number of samples, a doublet will have a high probability (1 – 1/N, e.g. 87.5% for N=8 samples) of containing cells from two individuals which is detectable by the demuxlet model (Fig. 1C). The ability to recover the sample identity of each cell (“demuxing”) and identify most doublets enables experimental designs that significantly increase the per sample throughput of current dscRNA-seq workflows.
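The 1 – 1/N argument can be checked with a few lines of Python. This is a minimal sketch, not part of demuxlet: the function names are hypothetical, and the unequal-mixing variant (1 minus the sum of squared pool proportions) is an assumption that extends the equal-mixing case stated in the text.

```python
def different_donor_probability(n_samples: int) -> float:
    """Probability that the two cells of a doublet come from different
    donors, assuming an equal mixture of n_samples donors."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    return 1.0 - 1.0 / n_samples


def different_donor_probability_unequal(proportions) -> float:
    """Same quantity for an arbitrary mixture (hypothetical extension):
    1 minus the chance that both cells are drawn from the same donor."""
    total = float(sum(proportions))
    return 1.0 - sum((p / total) ** 2 for p in proportions)


for n in (2, 4, 8, 16):
    print(n, different_donor_probability(n))
```

With N = 8 the first function returns 0.875, the 87.5% quoted above. The second function shows that uneven pooling lowers the fraction of genetically detectable doublets.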
We first assess the feasibility of our strategy and the performance of demuxlet by analyzing multiplexed peripheral blood mononuclear cells (PBMCs) from 8 patients with
systemic lupus erythematosus (SLE). Using a sequential pooling strategy, three pools of equimolar concentrations of cells were generated (W1: patients S1-S4, W2: patients S5-S8 and W3: patients S1-S8) and each loaded in a well on a 10X Chromium Single Cell instrument (Fig. 2A). 3,645, 4,254 and 6,205 single cells were obtained from each well and sequenced to an average depth of 23k, 17k and 13k reads per cell.
Demuxlet identified 91% (3332/3645), 91% (3864/4254), and 86% (5348/6205) of droplets as singlets from wells W1, W2 and W3, respectively. 25% (+/- 2.6%), 25% (+/- 4.6%) and 12.5% (+/- 1.4%) of singlets from wells W1, W2 and W3 mapped to each donor, consistent with equal mixing of the 4 (W1, W2) or 8 (W3) individuals in each well. We estimate an error rate (number of cells assigned to individuals not in the mixture) of 2/3332 (W1) and 0/3864 (W2) singlets by analyzing wells W1 and W2, each containing cells from two disjoint sets of 4 individuals (Fig. 2B), suggesting >99% of singlets were assigned to individuals correctly.
We next assess the ability of demuxlet to detect doublets in both simulated and real data. 466/3645 (13%) cells were simulated as synthetic doublets by setting the cellular barcodes of two sets of 466 cells from individuals S1 and S2 to be the same. Applied to the simulated data, demuxlet identified 91% (426/466) of synthetic doublets as doublets or ambiguous, correctly recovering the sample identity of both cells in 403/426 (95%) doublets (fig. S1). Applied to real data from W1, W2 and W3, demuxlet identified 138/3645, 165/4254, and 384/6205 doublets, corresponding to 5.0%, 5.2% and 7.1%, consistent with the linear relationship between the number of cells sequenced and doublet rates estimated using a mixed species experiment (Fig. 2C).
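The raw fractions of detected doublets (138/3645, 165/4254 and 384/6205) are about 3.8%, 3.9% and 6.2%, lower than the 5.0%, 5.2% and 7.1% quoted above. One plausible reading, which is an inference and not a formula stated in the text, is that the quoted rates divide the detected fraction by 1 – 1/N to account for doublets formed by two cells from the same individual, which carry no genetic signal. The sketch below applies that assumed correction; the function name is hypothetical.

```python
def estimated_doublet_rate(detected: int, total: int, n_samples: int) -> float:
    """Scale the detected (between-donor) doublet fraction by 1 / (1 - 1/N)
    to estimate the total doublet rate. ASSUMPTION: equal mixing of donors."""
    detectable_fraction = 1.0 - 1.0 / n_samples
    return (detected / total) / detectable_fraction


# (detected doublets, droplets, donors in the well), numbers from the text
wells = {"W1": (138, 3645, 4), "W2": (165, 4254, 4), "W3": (384, 6205, 8)}
for name, (detected, total, n) in wells.items():
    print(name, round(100 * estimated_doublet_rate(detected, total, n), 1))
```

Under this assumption the three wells give 5.0%, 5.2% and 7.1%, which agrees with the reported values.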
Sample demultiplexing enables individual-specific visualization of single cell data we call ‘dropprints’. While both variability in cell type proportion and gene expression have been previously observed in PBMCs, it has not been possible to fully control for batch effects due to separate processing of samples22,23. Singlets identified by demuxlet in all three wells cluster into known PBMC subpopulations (Fig. 2D) and are not confounded by well to well effects (fig. S2A). While we found 6 differentially expressed genes (FDR < 0.05) between wells W1 and W2, only 2 genes were differentially expressed in well W3 between W1 and W2 individuals (FDR < 0.05) (fig. S2B), suggesting sample multiplexing could reduce confounding such as library preparation batch effects. Furthermore, for the same individuals, dropprints from two different wells are qualitatively consistent, the estimates of cell type proportions for the same individuals in W1 or W2 and W3 are highly correlated (R = 0.99) (Fig. 2E and fig. S3), and the inferred cell type-specific expression profiles are correlated with bulk sequencing of sorted cell populations (R = 0.76-0.92) (fig. S4). These results demonstrate that demuxlet recovers the sample identity of single cells with high accuracy, identifies doublets at the expected rate, and can allow for comparison of individuals within and across wells.
Demuxlet enables multiplexed experimental designs that increase the sample throughput for profiling of interindividual responses across a variety of conditions. We applied such a multiplexing strategy to characterize cell type-specific responses to IFN-β, a potent cytokine that induces genome-scale changes in the transcriptional profiles of immune cells24,25. From 8 lupus patients, 1M PBMCs each were isolated, sequentially pooled, and divided in two aliquots. One sample was activated with recombinant IFN-β for 6 hours, a time point we previously found to maximize the expression of interferon-sensitive genes (ISGs) in dendritic
cells (DCs) and T cells26,27. A matched control sample was also cultured for 6 hours. From this experiment, we captured and sequenced 14,619 control and 14,446 stimulated cells.
In control and stimulated experiments, demuxlet identified 83% (12138/14619) and 84% (12167/14446) of droplets as singlets, and recovered the sample identity of 99% (12127/12138 and 12155/12167) of singlets. Detected doublets form distinct clusters in t-SNE space at the periphery of other cell types, indicative of the expected enrichment of doublets for mixed cell types in a heterogeneous population (fig. S5). The estimated doublet rate of 10.9% is consistent with predicted rates based on the number of cells recovered, and the observed proportion of doublets from each pair of individuals is highly correlated with the expected proportions (R = 0.98) (Fig. 2C and fig. S6).
Demultiplexing individuals enables the use of the 8 samples within a pool as biological replicates to quantitatively assess cell type-specific responses to IFN-β stimulation. Consistent with previous reports from bulk RNA-sequencing data, IFN-β stimulation induces widespread transcriptomic changes observed as a shift in the t-SNE projections of singlets (Fig. 3A)24. After assigning each singlet to a reference cell population17, we identified 2,686 differentially expressed genes (logFC > 2, FDR < 0.05) in at least one cell type in response to IFN-β stimulation (table S1). These genes cluster into modules of cell type-specific responses enriched for distinct gene regulatory processes (Fig. 3B, table S2). For example, the two clusters of upregulated genes, pan-leukocyte (Cluster III: 401 genes, logFC > 2, FDR < 0.05) and CD14+ specific (Cluster I: 767 genes, logFC > 2, FDR < 0.05), were enriched for general antiviral response (e.g. KEGG Influenza A: Cluster III P < 1.6 × 10^-5), chemokine signaling (Cluster I P < 7.6 × 10^-3) and genes implicated in SLE (Cluster I P < 4.4 × 10^-3). The five clusters of downregulated genes were
enriched for antibacterial response (KEGG Legionellosis: Cluster II monocyte down P < 5.5 × 10^-3) and natural killer cell mediated toxicity (Cluster IV NK/Th cell down: P < 3.6 × 10^-2). The differential expression using cell type-specific estimates from single cell data recovers known gene regulatory programs affected by interferon stimulation.
We next characterize interindividual variability in PBMC expression at baseline and in response to IFN-β stimulation. In both control and stimulated cells, the variance of mean expression among individuals is substantially higher than expected from synthetic replicates (Fig. 3C). As previously reported22,28, cell type proportion varied significantly among individuals and contributes to variability in gene expression (fig. S7). The variance estimated from synthetic replicates with matched cell type proportions is more concordant with the observed variance (Lin’s concordance = 0.54 versus 0.022, Pearson correlation = 0.78 versus 0.69, Fig. 3C-D). However, comparing mean expression from synthetic replicates within cell types (Lin’s concordance = 0.007-0.20, Pearson correlation = 0.27–0.68) shows that there is interindividual variability not explained simply by cell type proportion (fig. S8).
We then explored interindividual variability in expression within one cell type, CD14+CD16- monocytes. The correlation of mean expression between pairs of synthetic replicates from the same individual (>99%) was greater than between different individuals (~97%), indicating variation beyond sampling (Fig. 3E). We found 585 genes that have significant interindividual variability in stimulated CD14+CD16- monocytes and 827 in control by correlating the synthetic replicates across individuals (Pearson correlation, FDR < 0.05). The variable genes in stimulated CD14+CD16- monocytes and to a lesser extent in CD4+ T cells (P < 9.3 × 10^-4 and 4.5 × 10^-2, hypergeometric test, Fig. 3F) are enriched for differentially expressed
genes, consistent with our previous discovery of more IFN-β response-eQTLs in monocyte-derived dendritic cells than CD4+ T cells26,27. We hypothesize that natural genetic variation could explain interindividual variability in gene expression in our multiplexed data. For example, schlafen family member 5 (SLFN5) and guanylate binding protein 3 (GBP3) expression are highly correlated between replicates after IFN-β stimulation (R = 0.92, P < 0.0011 and 0.80, P < 0.017). The average expression of the two synthetic replicates is associated with known eQTLs in CD14+ monocytes and lymphoblastoid cell lines, respectively (SLFN5: rs11080327 P < 3.1 × 10^-4, GBP3: rs10493821 P < 2.1 × 10^-2, Fig. 3G)26,29. These results suggest that single cell sequencing recovers repeatable interindividual variation in gene expression that, for two genes, is associated with known genetic determinants.
We introduce demuxlet, a new computational method that enables simple and efficient sample multiplexing for dscRNA-seq, validate its performance in simulated and real data, and characterize single cell expression of PBMCs from SLE patients in several different conditions. Our results demonstrate demuxlet provides reliable estimation of cell type proportion across individuals, recovers cell type-specific transcriptional programs from mixed populations consistent with previous reports, and identifies genes with interindividual variability24. The capability to demultiplex and identify doublets using natural genetic variation significantly reduces the per-sample and per-cell cost of single-cell RNA-sequencing, does not require synthetic barcodes or split-pool strategies30-34, and captures biological variability among individual samples while limiting the effects of unwanted technical variability.
The application of single cell sequencing methods such as dscRNA-seq to larger numbers of individuals is a promising approach to characterizing cellular heterogeneity among
individuals at baseline and in different environmental conditions, a crucial area for further understanding of health and disease35-37. Experimental and computational methods for reliable and efficient sample multiplexing could enable broad adoption of droplet-based RNA-seq for population-scale studies, facilitating genetic and longitudinal analyses in relevant cell types and conditions across a range of sampled individuals38.
Figure 1 – Demuxlet: demultiplexing and doublet identification from single cell data. A) Pipeline for experimental multiplexing of unrelated individuals, loading onto droplet-based single cell RNA-sequencing instrument, and computational demultiplexing (demux) and doublet removal using demuxlet. Assuming equal mixing of 8 individuals, B) 4 genetic variants can recover the sample identity of a cell, and C) 87.5% of doublets will contain cells from two different samples.
Figure 2 – Performance of demuxlet. A) Experimental design for equimolar pooling of cells from 8 unrelated samples (S1-S8) into three wells (W1-W3). W1 and W2 contain cells from two disjoint sets of 4 individuals. W3 contains cells from all 8 individuals. B) Demultiplexing single cells in each well recovers the expected individuals. C) Estimates of doublet rates versus previous estimates from mixed species experiments. D) Cell type identity determined by prediction to previously annotated PBMC data. E) t-SNE plots of two individuals (S1 and S5) from different wells are qualitatively concordant.
Figure 3 – Interindividual variability in IFN-β response. A) t-SNE plot of unstimulated (blue) and IFN-β-stimulated (red) PBMCs and the estimated cell types. B) Cell type-specific expression in stimulated (left) and unstimulated (right) cells. Differentially expressed genes shown (FDR < 0.05, |log(FC)| > 1). Each column represents cell type-specific expression for each individual from demuxlet. C) Cell type proportions for each individual in unstimulated and stimulated cells. D) Observed variance (y-axis) in mean expression over all PBMCs from each individual versus expected variance (x-axis) over synthetic replicates sampled across all cells (light blue, pink) or replicates matched for cell type proportion (blue, red). E) Correlation between sample replicates in control and stimulated cells. F) Number of significantly variable genes in each cell type and condition. G) Mean expression of SLFN5 and GBP3 in two sample replicates labeled by genotype.
Methods
Identifying the sample identity of each single cell.
We first describe the method to infer the sample identity of each cell in the absence of doublets. Consider RNA-sequence reads from $C$ barcoded droplets multiplexed across $S$ different samples, where their genotypes are available across $V$ exonic variants. Let $d_{cv}$ be the number of unique reads overlapping with the $v$-th variant from the $c$-th droplet. Let $b_{cvi} \in \{R, A, O\}$, $i \in \{1, \cdots, d_{cv}\}$ be the variant-overlapping base call from the $i$-th read, representing reference (R), alternate (A), and other (O) alleles respectively. Let $e_{cvi} \in \{0, 1\}$ be a latent variable indicating whether the base call is correct (0) or not (1), then given $e_{cvi} = 0$, $b_{cvi} \in \{R, A\}$ and $\sim \text{Binomial}(2, \frac{g}{2})$ when $g \in \{0, 1, 2\}$ is the true genotype of sample corresponding to $c$-th droplet at $v$-th variant. When $e_{cvi} = 1$, we assume that $\Pr(b_{cvi} \mid g, e_{cvi})$ follows table S3. $e_{cvi}$ is assumed to follow $\text{Bernoulli}(10^{-q_{cvi}/10})$ where $q_{cvi}$ is a phred-scale quality score of the observed base call.
We allow uncertainty of observed genotypes at the $v$-th variant for the $s$-th sample using $P_{sv}^{(g)} = \Pr(g \mid \text{Data}_{sv})$, the posterior probability of a possible genotype $g$ given external DNA data $\text{Data}_{sv}$ (e.g. sequence reads, imputed genotypes, or array-based genotypes). If genotype likelihood $\Pr(\text{Data}_{sv} \mid g)$ is provided (e.g. unphased sequence reads) instead, it can be converted to a posterior probability scale using $P_{sv}^{(g)} \propto \Pr(\text{Data}_{sv} \mid g)\Pr(g)$ where $\Pr(g) \sim \text{Binomial}(2, p_v)$ and $p_v$ is the population allele frequency of the alternate allele. To allow errors $\varepsilon$ in the posterior probability, we replace it with $(1 - \varepsilon)P_{sv}^{(g)} + \varepsilon \Pr(g)$. The overall likelihood that the $c$-th droplet originated from the $s$-th sample is
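$$L_c(s) = \prod_{v=1}^{V} \left[ \sum_{g=0}^{2} \left\{ \prod_{i=1}^{d_{cv}} \left( \sum_{e=0}^{1} \Pr(b_{cvi} \mid g, e) \right) \right\} P_{sv}^{(g)} \right] \qquad (1)$$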
In the absence of doublets, we use the maximum likelihood to determine the best-matching sample as $\arg\max_s L_c(s)$.
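The singlet model above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the demuxlet implementation: all function names are hypothetical, the error-state emission probabilities (table S3, not reproduced here) are replaced by a uniform 1/3 over R, A and O, a correct base call is taken to be the alternate allele with probability g/2, and each error state is weighted by its Bernoulli probability derived from the phred score.

```python
import math


def hwe_prior(p_alt):
    """Binomial(2, p_alt) prior over genotypes g = 0, 1, 2."""
    q = 1.0 - p_alt
    return [q * q, 2.0 * p_alt * q, p_alt * p_alt]


def genotype_posterior(geno_lik, p_alt, eps=0.01):
    """Normalize Pr(Data|g) Pr(g), then mix with the prior using error rate eps."""
    prior = hwe_prior(p_alt)
    unnorm = [lik * pr for lik, pr in zip(geno_lik, prior)]
    total = sum(unnorm)
    return [(1.0 - eps) * u / total + eps * pr for u, pr in zip(unnorm, prior)]


def read_prob(base, qual, g):
    """Probability of one base call given genotype g, marginalized over e."""
    err = 10.0 ** (-qual / 10.0)          # Pr(e = 1) from the phred score
    p_alt = g / 2.0
    correct = {"R": 1.0 - p_alt, "A": p_alt, "O": 0.0}[base]
    wrong = 1.0 / 3.0                     # ASSUMPTION: stand-in for table S3
    return (1.0 - err) * correct + err * wrong


def log_likelihood(reads_by_variant, post_by_variant):
    """log L_c(s): sum over variants of log sum_g (prod_i Pr(b_i|g)) P^(g)."""
    total = 0.0
    for reads, post in zip(reads_by_variant, post_by_variant):
        per_variant = 0.0
        for g in (0, 1, 2):
            prod = 1.0
            for base, qual in reads:
                prod *= read_prob(base, qual, g)
            per_variant += prod * post[g]
        total += math.log(per_variant)
    return total


def best_sample(reads_by_variant, posteriors):
    """Return the sample with the highest likelihood for this droplet."""
    return max(posteriors,
               key=lambda s: log_likelihood(reads_by_variant, posteriors[s]))


def hard_call(g):
    """Toy helper: a confident genotype call turned into a smoothed posterior."""
    lik = [0.0, 0.0, 0.0]
    lik[g] = 1.0
    return genotype_posterior(lik, p_alt=0.5)


# Toy droplet: three variants, reads given as (base call, phred quality).
reads = [[("R", 30)], [("A", 30)], [("A", 30), ("R", 30)]]
posteriors = {
    "donor_A": [hard_call(0), hard_call(2), hard_call(1)],
    "donor_B": [hard_call(2), hard_call(0), hard_call(1)],
}
print(best_sample(reads, posteriors))
```

Working in log space avoids numerical underflow when the product in (1) runs over many variants.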

Screening for droplets containing multiple samples.
To identify doublets, we implement a mixture model to calculate the likelihood that the sequence reads originated from two individuals, and the likelihoods are compared to determine whether a droplet contains cells from one or two samples. If sequence reads from the $c$-th droplet originate from two different samples, $s_1, s_2$ with mixing proportions $(1 - \alpha) : \alpha$, then the likelihood in (1) can be represented as the following mixture distribution18,
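$$L_c(s_1, s_2, \alpha) = \prod_{v=1}^{V} \left[ \sum_{g_1, g_2} \left\{ \prod_{i=1}^{d_{cv}} \left( \sum_{e=0}^{1} \left[ (1 - \alpha) \Pr(b_{cvi} \mid g_1, e) + \alpha \Pr(b_{cvi} \mid g_2, e) \right] \right) \right\} P_{s_1 v}^{(g_1)} P_{s_2 v}^{(g_2)} \right]$$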
To reduce the computational cost, we consider discrete values of $\alpha \in \{\alpha_1, \cdots, \alpha_M\}$ (e.g. 5-50% by 5%). We determine that it is a doublet between samples $s_1, s_2$ if and only if $\frac{\max_{s_1, s_2, \alpha} L_c(s_1, s_2, \alpha)}{\max_s L_c(s)} \geq t$ and the most likely mixing proportion is estimated to be $\arg\max_\alpha L_c(s_1, s_2, \alpha)$. We determine that the cell contains only a single individual $s$ if $\frac{\max_{s_1, s_2, \alpha} L_c(s_1, s_2, \alpha)}{\max_s L_c(s)} \leq \frac{1}{t}$. We classify the remaining, less confident droplets as ambiguous. While we consider only doublets for estimating doublet rates, we remove all doublets and ambiguous droplets to conservatively estimate singlets. Figure S1 illustrates the distribution of singlet, doublet likelihoods and the decision boundaries when $t = 2$ was used.
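The decision rule above can be illustrated with the following Python sketch. It is a toy illustration under assumptions, not the demuxlet code: function names are hypothetical, the error-state emission probabilities of table S3 are replaced by a uniform 1/3, genotype posteriors are hand-written toy values, and ordered pairs of samples are searched because the grid of mixing proportions stops at 50%.

```python
import math
from itertools import permutations


def read_prob(base, qual, g):
    """Pr(base | genotype g), marginalized over the base-call error state."""
    err = 10.0 ** (-qual / 10.0)
    p_alt = g / 2.0
    correct = {"R": 1.0 - p_alt, "A": p_alt, "O": 0.0}[base]
    return (1.0 - err) * correct + err / 3.0   # ASSUMPTION: uniform error model


def singlet_ll(reads_by_variant, post):
    total = 0.0
    for reads, p in zip(reads_by_variant, post):
        total += math.log(sum(
            p[g] * math.prod(read_prob(b, q, g) for b, q in reads)
            for g in (0, 1, 2)))
    return total


def doublet_ll(reads_by_variant, post1, post2, alpha):
    total = 0.0
    for reads, p1, p2 in zip(reads_by_variant, post1, post2):
        per_variant = 0.0
        for g1 in (0, 1, 2):
            for g2 in (0, 1, 2):
                prod = math.prod(
                    (1.0 - alpha) * read_prob(b, q, g1) + alpha * read_prob(b, q, g2)
                    for b, q in reads)
                per_variant += prod * p1[g1] * p2[g2]
        total += math.log(per_variant)
    return total


def classify(reads_by_variant, posteriors, t=2.0):
    """Return 'singlet', 'doublet' or 'ambiguous' from the likelihood ratio."""
    alphas = [0.05 * k for k in range(1, 11)]          # 5% to 50% by 5%
    best_single = max(singlet_ll(reads_by_variant, p) for p in posteriors.values())
    best_double = max(
        doublet_ll(reads_by_variant, posteriors[s1], posteriors[s2], a)
        for s1, s2 in permutations(posteriors, 2) for a in alphas)
    log_ratio = best_double - best_single
    if log_ratio >= math.log(t):
        return "doublet"
    if log_ratio <= -math.log(t):
        return "singlet"
    return "ambiguous"


def confident(g):
    """Toy posterior: 0.98 on genotype g, 0.01 on each alternative."""
    return [0.98 if k == g else 0.01 for k in (0, 1, 2)]


n_variants = 10
posteriors = {"donor_A": [confident(0)] * n_variants,
              "donor_B": [confident(2)] * n_variants}
pure_reads = [[("R", 30), ("R", 30)] for _ in range(n_variants)]
mixed_reads = [[("R", 30), ("A", 30)] for _ in range(n_variants)]
print(classify(pure_reads, posteriors), classify(mixed_reads, posteriors))
```

In this toy setting a droplet carrying only reference reads at sites where donor_A is homozygous reference is called a singlet, while a droplet showing both alleles at sites where the two donors are opposite homozygotes is called a doublet.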

Isolation and preparation of PBMC samples.
Peripheral blood mononuclear cells were isolated from patient donors, Ficoll separated, and cryopreserved by the UCSF Core Immunologic Laboratory (CIL). PBMCs were thawed in a 37°C water bath, and subsequently washed and resuspended in EasySep buffer. Cells were treated with DNAse I and incubated for 15 min at RT before filtering through a 40 µm column. Finally, the cells were washed in EasySep and resuspended in 1x PBMS and 0.04% bovine serum albumin. Cells from 8 donors were then re-concentrated to 1M cells per mL and then serially pooled. At each pooling stage, 1M cells per mL were combined to result in a final sample pool with cells from all donors.

IFN-β stimulation and culture.
Prior to pooling, samples from 8 individuals were separated into two aliquots each. One aliquot of PBMCs was activated by 100 U/mL of recombinant IFN-β (PBL Assay Science) for 6 hours according to the published protocol26. The second aliquot was left untreated. After 6 hours, the 8 samples for each condition were pooled together in two final pools (stimulated cells and control cells) as described above.

Droplet-based capture and sequencing.
Cellular suspensions were loaded onto the 10x Chromium instrument (10x Genomics) and sequenced as described in Zheng et al17. The cDNA libraries were sequenced using a custom program on 10 lanes of Illumina HiSeq2500 Rapid Mode, yielding 1.8B total reads and 25K reads per cell. At these depths, we recovered >90% of captured transcripts in each sequencing experiment.
Bulk isolation and sequencing.
PBMCs from lupus patients were isolated and prepared as described above. Once resuspended in EasySep buffer, the EasyEights Magnet was used to sequentially isolate CD14+ (using the EasySep Human CD14 positive selection kit II, cat#17858), CD19+ (using the EasySep Human CD19 positive selection kit II, cat#17854), CD8+ (EasySep Human CD8 positive selection kit II, cat#17853), and CD4+ cells (EasySep Human CD4 T cell negative isolation kit, cat#17952) according to the kit protocol. RNA was extracted using the RNeasy Mini kit (#74104), and reverse transcription and tagmentation were conducted according to Picelli et al. using the SmartSeq2 protocol39,40. After cDNA synthesis and tagmentation, the library was amplified with the Nextera XT DNA Sample Preparation Kit (#FC-131-1096) according to protocol, starting with 0.2 ng of cDNA. Samples were then sequenced on one lane of the Illumina HiSeq4000 with paired end 100 bp read length, yielding 350M total reads.

Alignment and initial processing of single cell sequencing data.
We used the CellRanger v1.1 and v1.2 software with the default settings to process the raw FASTQ files, align the sequencing reads to the hg19 transcriptome, and generate a filtered UMI expression profile for each cell17. The raw UMI counts from all cells and genes with nonzero counts across the population of cells were used to generate t-SNE profiles.

Cell type classification and clustering.
To identify known immune cell populations in PBMCs, we used the Seurat package to perform unbiased clustering on the 2.7k PBMCs from Zheng et al., following the publicly available Guided Clustering Tutorial17,41. The FindAllMarkers function was then used to find the top 20 markers for each of the 8 identified cell types. Cluster averages were calculated by taking the average raw count across all cells of each cell type. For each cell, we calculated the Spearman correlation of the raw counts of the marker genes and the cluster averages, and assigned each cell to the cell type to which it had maximum correlation.
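The assignment step described above (Spearman correlation against cluster averages, then taking the maximum) can be sketched as follows. This is a minimal standard-library illustration, not the code used in the study: function names are hypothetical, and the four-gene marker panel and the cluster averages are invented toy values.

```python
import math


def ranks(values):
    """Average ranks (1-based), with ties sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0                       # undefined correlation treated as 0
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


def assign_cell_type(cell_counts, cluster_averages):
    """Pick the cell type whose marker-gene averages correlate best."""
    return max(cluster_averages,
               key=lambda ct: spearman(cell_counts, cluster_averages[ct]))


# Toy marker panel (order: CD3D, CD14, MS4A1, NKG7); values are invented.
cluster_averages = {
    "T cell":   [5.0, 0.1, 0.2, 0.5],
    "Monocyte": [0.1, 8.0, 0.1, 0.2],
    "B cell":   [0.2, 0.1, 6.0, 0.1],
}
cell = [0, 12, 1, 0]
print(assign_cell_type(cell, cluster_averages))
```

Rank-based correlation makes the assignment insensitive to the large differences in sequencing depth between single cells and cluster averages.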

Differential expression analysis.
Demultiplexed individuals were used as replicates for differential expression analysis. For each gene, raw counts were summed for each individual. We used the DESeq2 package to detect differentially expressed genes between control and stimulated conditions42. Genes with baseMean > 1 were filtered out from the DESeq2 output, and the qvalue package was used to calculate FDR < 0.0543.

Estimation of interindividual variability in PBMCs.
For each individual, we found the mean expression of each gene with nonzero counts. The mean was calculated from the log2 single cell UMI counts normalized to the median count for each cell. To measure interindividual variability, we then calculated the variance of the mean expression across all individuals. Lin’s concordance correlation coefficient was used to compare the agreement of observed data and synthetic replicates. Synthetic replicates were generated by sampling without replacement either from all cells or cells matched for cell type proportion.
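Lin’s concordance correlation coefficient differs from Pearson correlation in that it also penalizes shifts in location and scale, which is why the two statistics are reported side by side. The sketch below implements the standard definition with population (1/n) moments; the function name and the toy vectors are illustrative only.

```python
def lins_concordance(x, y):
    """Lin's concordance correlation coefficient:
    2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2),
    using population (1/n) moments."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    var_x = sum((a - mx) ** 2 for a in x) / n
    var_y = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2.0 * cov / (var_x + var_y + (mx - my) ** 2)


observed = [1.0, 2.0, 3.0, 4.0]
shifted = [v + 1.0 for v in observed]   # perfectly correlated, but offset
print(lins_concordance(observed, observed))
print(lins_concordance(observed, shifted))
```

Identical vectors give a concordance of 1, while the shifted copy, whose Pearson correlation is still 1, drops to about 0.71.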
Estimation of interindividual variability within cell types.
For each cell type, we generated two bulk equivalent replicates for each individual by summing raw counts of cells sampled without replacement. We used DESeq2 to generate variance-stabilized counts across all replicates. To filter for expressed genes, we performed all subsequent analyses on genes with 5% of samples with >0 counts. The correlation of replicates and QTL detection was performed on the log2 normalized counts. Pearson correlation of the two replicates from each of the 8 individuals was used to find genes with significant interindividual variability.
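The replicate construction can be sketched as follows. This is a simplified illustration with hypothetical function names and toy data: the cells of one individual are split into two disjoint halves (sampling without replacement), raw counts are summed per gene, and a gene is scored by the Pearson correlation of the two replicates across individuals. The variance stabilization with DESeq2 and the FDR step are omitted.

```python
import math
import random


def split_replicates(cells, rng):
    """Split a list of per-cell count vectors into two disjoint halves and
    return the two pseudo-bulk (summed) count vectors."""
    idx = list(range(len(cells)))
    rng.shuffle(idx)                       # sampling without replacement
    half = len(idx) // 2
    n_genes = len(cells[0])
    return tuple(
        [sum(cells[i][g] for i in group) for g in range(n_genes)]
        for group in (idx[:half], idx[half:]))


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0                         # no variation across individuals
    return sxy / math.sqrt(sxx * syy)


def replicate_correlation(cells_by_individual, gene, seed=0):
    """Correlate replicate 1 with replicate 2 across individuals for one gene."""
    rng = random.Random(seed)
    rep1, rep2 = [], []
    for cells in cells_by_individual:
        a, b = split_replicates(cells, rng)
        rep1.append(a[gene])
        rep2.append(b[gene])
    return pearson(rep1, rep2)


# Toy data: 8 individuals, 10 cells each, 2 genes. Gene 0 differs between
# individuals; gene 1 is identical everywhere.
cells_by_individual = [[[k + 1, 3] for _ in range(10)] for k in range(8)]
print(replicate_correlation(cells_by_individual, gene=0))
print(replicate_correlation(cells_by_individual, gene=1))
```

A gene whose expression differs reproducibly between individuals yields a high replicate correlation, whereas a gene that is constant across individuals yields none.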

Single cell and bulk RNA-sequencing data has been deposited in the Gene Expression Omnibus under the accession number GSE96583. Demuxlet software is freely available at https://github.com/hyunminkang/apigenome.
1. Macosko, E.Z. et al. Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell 161, 1202-1214 (2015).
2. Pollen, A.A. et al. Low-coverage single-cell mRNA sequencing reveals cellular heterogeneity and activated signaling pathways in developing cerebral cortex. Nat Biotech 32, 1053-1058 (2014).
3. Buenrostro, J.D. et al. Single-cell chromatin accessibility reveals principles of regulatory variation. Nature 523, 486-490 (2015).
4. Nagano, T. et al. Single-cell Hi-C reveals cell-to-cell variability in chromosome structure. Nature 502, 59-64 (2013).
5. Patel, A.P. et al. Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma. Science 344, 1396-1401 (2014).
6. Tirosh, I. et al. Dissecting the multicellular ecosystem of metastatic melanoma by single-cell RNA-seq. Science 352, 189-196 (2016).
7. Muraro, M.J. et al. A single-cell transcriptome atlas of the human pancreas. Cell Syst 3, 385-394.e383 (2016).
8. Baron, M. et al. A single-cell transcriptomic map of the human and mouse pancreas reveals inter- and intra-cell population structure. Cell Syst 3, 346-360.e344 (2016).
9. Shalek, A.K. et al. Single-cell RNA-seq reveals dynamic paracrine control of cellular variation. Nature 510, 363-369 (2014).
10. Klein, A.M. et al. Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell 161, 1187-1201 (2015).
11. Stegle, O., Teichmann, S.A. & Marioni, J.C. Computational and analytical challenges in single-cell transcriptomics. Nat Rev Genet 16, 133-145 (2015).
12. Gawad, C., Koh, W. & Quake, S.R. Single-cell genome sequencing: current state of the science. Nat Rev Genet 17, 175-188 (2016).
13. Streets, A.M. et al. Microfluidic single-cell whole-transcriptome sequencing. Proc Natl Acad Sci 111, 7048-7053 (2014).
14. Zilionis, R. et al. Single-cell barcoding and sequencing using droplet microfluidics. Nat Protoc 12, 44-73 (2017).
15. Ziegenhain, C. et al. Comparative analysis of single-cell RNA sequencing methods. Mol Cell 65, 631-643.e634 (2017).
16. Hicks, S.C., Teng, M. & Irizarry, R.A. On the widespread and critical impact of systematic bias and batch effects in single-cell RNA-Seq data. bioRxiv 025528 (2015).
17. Zheng, G.X.Y. et al. Massively parallel digital transcriptional profiling of single cells. Nat Commun 8, 14049 (2017).
18. Jun, G. et al. Detecting and estimating contamination of human DNA samples in sequencing and array-based genotype data. Am J Hum Genet 91, 839-848 (2012).
19. Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156-2158 (2011).
20. Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078-2079 (2009).
21. Loh, P.R., Palamara, P.F. & Price, A.L. Fast and accurate long-range phasing in a UK Biobank cohort. Nat Genet 48, 811-816 (2016).
22. Aguirre-Gamboa, R. et al. Differential effects of environmental and genetic factors on T and B cell immune traits. Cell Rep 17, 2474-2487.
23. Li, Y. et al. A functional genomics approach to understand variation in cytokine production in humans. Cell 167, 1099-1110.e1014 (2016).
24. Mostafavi, S. et al. Parsing the interferon transcriptional network and its disease associations. Cell 164, 564-578.
25. Stark, G.R., Kerr, I.M., Williams, B.R.G., Silverman, R.H. & Schreiber, R.D. How cells respond to interferon. Annu Rev Biochem 67, 227-264 (2003).
26. Lee, M.N. et al. Common genetic variants modulate pathogen-sensing responses in human dendritic cells. Science 343, 1246980-1246980 (2014).
27. Ye, C.J. et al. Intersection of population variation and autoimmunity genetics in human T cell activation. Science 345, 1254665-1254665 (2014).
28. Palmer, C., Diehn, M., Alizadeh, A.A. & Brown, P.O. Cell-type specific gene expression profiles of leukocytes in human peripheral blood. BMC Genomics 7, 115 (2006).
29. Lappalainen, T. et al. Transcriptome and genome sequencing uncovers functional variation in humans. Nature 501, 506-511 (2013).
30. Cao, J. et al. Comprehensive single cell transcriptional profiling of a multicellular organism by combinatorial indexing. bioRxiv 104844 (2017).
31. Dixit, A. et al. Perturb-Seq: Dissecting molecular circuits with scalable single-cell RNA profiling of pooled genetic screens. Cell 167, 1853-1866.e1817 (2016).
32. Adamson, B. et al. A Multiplexed single-cell CRISPR screening platform enables systematic dissection of the unfolded protein response. Cell 167, 1867-1882.e1821 (2016).
33. Jaitin, D.A. et al. Dissecting immune circuits by linking CRISPR-pooled screens with single-cell RNA-seq. Cell 167, 1883-1896.e1815 (2016).
34. Datlinger, P. et al. Pooled CRISPR screening with single-cell transcriptome readout. Nat Meth 14, 297-301 (2017).
35. Buettner, F. et al. Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells. Nat Biotech 33, 155-160 (2015).
36. Tung, P.-Y. et al. Batch effects and the effective design of single-cell gene expression studies. Sci Rep 7, 39921 (2017).
37. Tanay, A. & Regev, A. Scaling single-cell genomics from phenomenology to mechanism. Nature 541, 331-338 (2017).
38. Wills, Q.F. et al. Single-cell gene expression analysis reveals genetic associations masked in whole-tissue experiments. Nat Biotech 31, 748-752 (2013).
39. Picelli, S. et al. Smart-seq2 for sensitive full-length transcriptome profiling in single cells. Nat Meth 10, 1096-1098 (2013).
40. Picelli, S. et al. Full-length RNA-seq from single cells using Smart-seq2. Nat Protoc 9, 171-181 (2014).
41. Satija, R., Farrell, J.A., Gennert, D., Schier, A.F. & Regev, A. Spatial reconstruction of single-cell gene expression data. Nat Biotech 33, 495-502 (2015).
42. Anders, S. & Huber, W. Differential expression analysis for sequence count data. Genome Biol 11, R106 (2010).
43. Dabney, A., Storey, J.D. & Warnes, G.R. qvalue: Q-value estimation for false discovery rate control. R package version 1 (2010).