why you should care about statistics - jeff leek

@leekgroup @simplystats why you should care about statistics Jeff Leek Johns Hopkins Bloomberg Biostatistics [email protected]

Upload: australian-bioinformatics-network

Post on 04-Jul-2015





TRANSCRIPT

Page 1: Why You Should Care About Statistics - Jeff Leek

@leekgroup  

@simplystats  

why you should care about statistics

Jeff Leek Johns Hopkins Bloomberg Biostatistics

 

[email protected]  

Page 2: Why You Should Care About Statistics - Jeff Leek

credits  

• slides shamelessly borrowed from:
  – Ingo Ruczinski (Google: “ingo’s pond”)
  – Josh Akey (UW Genomics)
  – Karl Broman (Google: “the stupidest thing broman”)

Page 3: Why You Should Care About Statistics - Jeff Leek

why this stuff matters

Page 4: Why You Should Care About Statistics - Jeff Leek

seems like an exciting result!

http://www.nature.com/nm/journal/v12/n11/full/nm1491.html

Page 5: Why You Should Care About Statistics - Jeff Leek

stunning problems

Page 6: Why You Should Care About Statistics - Jeff Leek

how it went down

http://www.nature.com/news/2011/110111/full/469139a/box/1.html

Page 7: Why You Should Care About Statistics - Jeff Leek

still going on

Page 8: Why You Should Care About Statistics - Jeff Leek

worth a watch

http://www.birs.ca/events/2013/5-day-workshops/13w5083/videos/watch/201308141121-Baggerly.mp4

Page 9: Why You Should Care About Statistics - Jeff Leek

worth a read

http://www.iom.edu/Reports/2012/Evolution-of-Translational-Omics.aspx

Page 10: Why You Should Care About Statistics - Jeff Leek

what were the problems?

• irreproducibility
• lack of cooperation

• silly prediction rules
• study design/batch effects
• procedures not locked down

Expertise

Transparency

Page 11: Why You Should Care About Statistics - Jeff Leek

tip #1: know the analysis

http://bit.ly/OgW3xv

Page 12: Why You Should Care About Statistics - Jeff Leek

tip #2: care about the analysis

Drinkel et al. Organometallics 2013

Page 13: Why You Should Care About Statistics - Jeff Leek

tip #3: have a data/analysis sharing plan

http://www.nature.com/nature/journal/v467/n7314/full/467401b.html

Page 14: Why You Should Care About Statistics - Jeff Leek

tip #4: know where to get help

http://www.biostat.jhsph.edu/consult/

Page 15: Why You Should Care About Statistics - Jeff Leek

tip #5: no substitute for the real thing

Page 16: Why You Should Care About Statistics - Jeff Leek

“central dogma” of statistics

Adapted from Josh Akey

Page 17: Why You Should Care About Statistics - Jeff Leek

sample size

Page 18: Why You Should Care About Statistics - Jeff Leek

some experiment

Page 19: Why You Should Care About Statistics - Jeff Leek

example calculations

Page 20: Why You Should Care About Statistics - Jeff Leek

better technology ≠ no variability

http://www.nature.com/nbt/journal/v29/n7/full/nbt.1910.html

Page 21: Why You Should Care About Statistics - Jeff Leek

power  
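
Power is the chance that a study detects an effect that is really there, and it grows with sample size. The slide itself is a figure, so the following is only a rough simulation sketch: the effect size, the standard deviation, the function name `estimated_power`, and the use of a 1.96 cutoff (a normal approximation that is somewhat optimistic for small groups) are all assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import mean, variance

def estimated_power(n_per_group, effect=1.0, sd=1.0, n_sim=500, seed=0):
    """Fraction of simulated two-group experiments in which |t| exceeds 1.96."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        # Control group centred at 0, treated group shifted by `effect`.
        x = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        y = [rng.gauss(effect, sd) for _ in range(n_per_group)]
        t = (mean(y) - mean(x)) / sqrt(
            variance(y) / n_per_group + variance(x) / n_per_group
        )
        if abs(t) > 1.96:  # normal approximation to the 5% two-sided cutoff
            hits += 1
    return hits / n_sim

for n in (5, 10, 30):
    print(n, estimated_power(n))
```

Running it for a few group sizes shows the detection rate climbing as more samples are added per group.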

Page 22: Why You Should Care About Statistics - Jeff Leek

bad study design

78% of genes differentially expressed

Page 23: Why You Should Care About Statistics - Jeff Leek

group and date “confounded”

Page 24: Why You Should Care About Statistics - Jeff Leek

uh-oh!

Page 25: Why You Should Care About Statistics - Jeff Leek

confounding:

association between shoe size and literacy in kids
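
The shoe size example can be mimicked with a small simulation. In this sketch age drives both shoe size and reading score, and neither affects the other. All numbers (age range, slopes, noise levels) are made up for illustration, and `pearson` is a helper written here, not something from the talk.

```python
import random
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))

rng = random.Random(42)
ages = [rng.uniform(5, 12) for _ in range(2000)]        # the confounder
shoe = [0.8 * age + rng.gauss(0, 1) for age in ages]    # depends on age only
reading = [10 * age + rng.gauss(0, 10) for age in ages] # depends on age only

overall = pearson(shoe, reading)

# Hold age roughly fixed: look only at kids aged 8 to 8.5.
same_age = [(s, r) for age, s, r in zip(ages, shoe, reading) if 8 <= age < 8.5]
within = pearson([s for s, _ in same_age], [r for _, r in same_age])

print(overall, within)
```

Across all kids the two variables look strongly related, while among kids of nearly the same age the association mostly disappears.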

Page 26: Why You Should Care About Statistics - Jeff Leek

proteomics  

Page 27: Why You Should Care About Statistics - Jeff Leek

proteomics  

Page 28: Why You Should Care About Statistics - Jeff Leek

gene expression

Page 29: Why You Should Care About Statistics - Jeff Leek

gene expression

Page 30: Why You Should Care About Statistics - Jeff Leek

gwas  

Page 31: Why You Should Care About Statistics - Jeff Leek

gwas  

Page 32: Why You Should Care About Statistics - Jeff Leek

confounding is a big deal

http://www.nature.com/nrg/journal/v11/n10/full/nrg2825.html

Page 33: Why You Should Care About Statistics - Jeff Leek

confounding and study design

Page 34: Why You Should Care About Statistics - Jeff Leek

tip #6: randomization

Page 35: Why You Should Care About Statistics - Jeff Leek

an example study

Page 36: Why You Should Care About Statistics - Jeff Leek

a bad design

Page 37: Why You Should Care About Statistics - Jeff Leek

stratified design
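
One way to get a stratified design is to randomize treatment separately inside each stratum (for example each processing date), so that treatment and date cannot end up confounded. The slide is a figure, so this is a hypothetical sketch: the function name `stratified_assignment`, the sample labels, and the two processing days are assumptions for illustration.

```python
import random
from collections import defaultdict

def stratified_assignment(samples, stratum_of, treatments=("treated", "control"), seed=0):
    """Shuffle samples within each stratum, then alternate treatments."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for sample in samples:
        by_stratum[stratum_of[sample]].append(sample)

    assignment = {}
    for members in by_stratum.values():
        shuffled = members[:]
        rng.shuffle(shuffled)  # random order within the stratum
        for i, sample in enumerate(shuffled):
            assignment[sample] = treatments[i % len(treatments)]
    return assignment

# Hypothetical study: 8 samples, 4 processed on each of two days.
samples = [f"s{i}" for i in range(8)]
batch = {s: ("monday" if i < 4 else "tuesday") for i, s in enumerate(samples)}
assignment = stratified_assignment(samples, batch)
print(assignment)
```

Each day ends up with the same number of treated and control samples, which is the balance a design that runs all treated samples on one day lacks.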

Page 38: Why You Should Care About Statistics - Jeff Leek

more good study characteristics

• Balanced
• Replicated
• Has Controls

Page 39: Why You Should Care About Statistics - Jeff Leek

tip #7: look at the data

http://en.wikipedia.org/wiki/Anscombe's_quartet

Page 40: Why You Should Care About Statistics - Jeff Leek

summarizing data

http://www.biostat.wisc.edu/~kbroman/topten_worstgraphs/

Page 41: Why You Should Care About Statistics - Jeff Leek

replicates  

Page 42: Why You Should Care About Statistics - Jeff Leek

watch the scale!

Page 43: Why You Should Care About Statistics - Jeff Leek

log transform is common/useful
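
A small sketch of one reason the log scale is convenient: ratios such as fold changes become symmetric around zero. The fold-change values below are hypothetical.

```python
from math import log2

# Hypothetical fold changes relative to a control (1.0 means no change).
fold_changes = [0.25, 0.5, 1.0, 2.0, 4.0]

# On the raw scale, "down" is squeezed into (0, 1) while "up" runs from 1 to infinity.
# On the log2 scale, halving and doubling sit at equal distances from 0.
log_fold_changes = [log2(fc) for fc in fold_changes]
print(log_fold_changes)
```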

Page 44: Why You Should Care About Statistics - Jeff Leek

bland-altman plots

http://en.wikipedia.org/wiki/Bland%E2%80%93Altman_plot
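
A Bland-Altman plot puts the mean of two paired measurements on the x-axis and their difference on the y-axis, usually with lines for the average difference and the limits of agreement (average difference ± 1.96 standard deviations). The sketch below only computes those quantities; plotting is left out, and the paired measurements and the function name `bland_altman` are made up for illustration.

```python
from statistics import mean, stdev

def bland_altman(a, b):
    """Return per-pair means and differences, the bias, and the limits of agreement."""
    means = [(u + v) / 2 for u, v in zip(a, b)]
    diffs = [u - v for u, v in zip(a, b)]
    bias = mean(diffs)                 # average difference between the two methods
    spread = 1.96 * stdev(diffs)       # half-width of the limits of agreement
    return means, diffs, bias, (bias - spread, bias + spread)

# Hypothetical paired measurements of the same samples by two methods.
method_a = [10.1, 12.3, 9.8, 15.2, 11.0]
method_b = [10.4, 12.0, 10.1, 15.9, 11.2]
means, diffs, bias, limits = bland_altman(method_a, method_b)
print(bias, limits)
```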

Page 45: Why You Should Care About Statistics - Jeff Leek

beware ridiculograms!

Page 46: Why You Should Care About Statistics - Jeff Leek

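ack! math!

Observations:

$X_1, \ldots, X_M$

$Y_1, \ldots, Y_N$

Averages:

$\bar{Y} = \frac{1}{N} \sum_{i=1}^{N} Y_i$

$\bar{X} = \frac{1}{M} \sum_{i=1}^{M} X_i$

SD² or variances:

$s_X^2 = \frac{1}{M-1} \sum_{i=1}^{M} (X_i - \bar{X})^2$

$s_Y^2 = \frac{1}{N-1} \sum_{i=1}^{N} (Y_i - \bar{Y})^2$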
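
The averages and variances on this slide can be computed directly. This is a minimal sketch with made-up measurements; `sample_variance` is written out by hand to mirror the formula with the n − 1 divisor, and Python's `statistics.variance` uses the same divisor.

```python
from statistics import mean, variance

# Hypothetical measurements for two groups (numbers are for illustration only).
x = [4.1, 5.0, 3.8, 4.6, 5.2]        # X_1, ..., X_M
y = [6.3, 5.9, 7.1, 6.6, 6.0, 6.8]   # Y_1, ..., Y_N

def sample_variance(values):
    """Sum of squared deviations from the average, divided by (n - 1)."""
    avg = sum(values) / len(values)
    return sum((v - avg) ** 2 for v in values) / (len(values) - 1)

x_bar, y_bar = mean(x), mean(y)
s2_x, s2_y = sample_variance(x), sample_variance(y)
print(x_bar, y_bar, s2_x, s2_y)
```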

Page 47: Why You Should Care About Statistics - Jeff Leek

an important issue

Page 48: Why You Should Care About Statistics - Jeff Leek

t-statistic: you’ll see this a lot*

$\frac{\bar{Y} - \bar{X}}{\sqrt{\frac{s_Y^2}{N} + \frac{s_X^2}{M}}}$

*Invented to improve beer: http://en.wikipedia.org/wiki/Student's_t-test
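
As a sketch, the t-statistic above (difference in averages divided by the square root of the summed per-group variance over sample size) takes a few lines. The data are made up and the function name `t_statistic` is chosen here for illustration.

```python
from math import sqrt
from statistics import mean, variance

def t_statistic(x, y):
    """(Ybar - Xbar) / sqrt(sY^2 / N + sX^2 / M), using separate group variances."""
    m, n = len(x), len(y)
    return (mean(y) - mean(x)) / sqrt(variance(y) / n + variance(x) / m)

# Hypothetical measurements for two groups.
x = [4.1, 5.0, 3.8, 4.6, 5.2]
y = [6.3, 5.9, 7.1, 6.6, 6.0, 6.8]
print(t_statistic(x, y))
```

A large absolute value means the difference between the group averages is big relative to the variability within the groups.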

Page 49: Why You Should Care About Statistics - Jeff Leek

p-values

Original Statistic

Page 50: Why You Should Care About Statistics - Jeff Leek

how to calculate

$\text{P-value} = \frac{\#\{|S_{perm}| \geq |S_{obs}|\}}{\#\text{ of permutations}}$

Observed Statistic = 2
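
The permutation recipe can be sketched as follows: shuffle the group labels many times, recompute the statistic each time, and count how often the permuted value is at least as extreme as the observed one. The statistic used here (difference in group averages), the number of permutations, and the name `permutation_p_value` are assumptions; the slide does not say which statistic was permuted.

```python
import random
from statistics import mean

def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Share of label shuffles with |S_perm| >= |S_obs|."""
    rng = random.Random(seed)
    s_obs = mean(y) - mean(x)          # assumed statistic: difference in averages
    pooled = list(x) + list(y)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)            # random relabelling of the samples
        s_perm = mean(pooled[len(x):]) - mean(pooled[:len(x)])
        if abs(s_perm) >= abs(s_obs):
            count += 1
    return count / n_perm

# Hypothetical measurements for two groups.
x = [4.1, 5.0, 3.8, 4.6, 5.2]
y = [6.3, 5.9, 7.1, 6.6, 6.0, 6.8]
print(permutation_p_value(x, y))
```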

Page 51: Why You Should Care About Statistics - Jeff Leek

tip #8: know what a p-value is(n’t)

The probability of observing a statistic at least that extreme if the null hypothesis is true.

The p-value is not:
• Probability the null is true
• Probability the alternative is true
• A measure of statistical evidence

Page 52: Why You Should Care About Statistics - Jeff Leek

an easy mistake to make

Page 53: Why You Should Care About Statistics - Jeff Leek

a problem

Page 54: Why You Should Care About Statistics - Jeff Leek

a problem

Page 55: Why You Should Care About Statistics - Jeff Leek

a problem

Page 56: Why You Should Care About Statistics - Jeff Leek

multiple comparison error rates

• Family wise error rate: $\Pr(\#\text{ False Positives} \geq 1)$
• False discovery rate: $E\left[\frac{\#\text{ False Positives}}{\#\text{ of Discoveries}}\right]$
• EFP (e-values): $E[\#\text{ False Positives}]$

Page 57: Why You Should Care About Statistics - Jeff Leek

difference in interpretation

Suppose 550 out of 10,000 genes are significant at 0.05 level

P-value < 0.05
Expect 0.05*10,000 = 500 false positives

False Discovery Rate < 0.05
Expect 0.05*550 = 27.5 false positives

Family Wise Error Rate < 0.05
The probability of at least 1 false positive ≤ 0.05
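
The arithmetic on this slide can be checked with a short sketch. The simulation part assumes the extreme case where none of the 10,000 genes has a real effect, so every p-value is uniform on [0, 1]; the seed and variable names are arbitrary.

```python
import random

n_tests = 10_000
n_significant = 550
alpha = 0.05

# P-value < 0.05 applied to every test: expected false positives among all tests.
expected_fp_pvalue = alpha * n_tests      # 0.05 * 10,000 = 500

# False discovery rate < 0.05: expected false positives among the discoveries.
expected_fp_fdr = alpha * n_significant   # 0.05 * 550 = 27.5

# Simulation under the assumption that no gene has a real effect:
# every "significant" result is then a false positive.
rng = random.Random(1)
null_p_values = [rng.random() for _ in range(n_tests)]
false_positives = sum(p < alpha for p in null_p_values)

print(expected_fp_pvalue, expected_fp_fdr, false_positives)
```

The simulated count lands near 500, which is why 550 hits at an uncorrected 0.05 cutoff are not very convincing on their own.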

Page 58: Why You Should Care About Statistics - Jeff Leek

read this

http://www.pnas.org/content/100/16/9440.long

Page 59: Why You Should Care About Statistics - Jeff Leek

the inevitable

http://simplystatistics.org/2013/08/26/statistics-meme-sad-p-value-bear/

Page 60: Why You Should Care About Statistics - Jeff Leek

why I’m sympathetic

Page 61: Why You Should Care About Statistics - Jeff Leek

beware of “hacking” statistics

Page 62: Why You Should Care About Statistics - Jeff Leek

be nice to the poor statistician

Page 63: Why You Should Care About Statistics - Jeff Leek

tip #9: correlation and causation

http://xkcd.com/552/

Page 64: Why You Should Care About Statistics - Jeff Leek

most common mistake

Fit regression models (correlations) followed by:

“In summary, our results support a causal relationship of breastfeeding in infancy with receptive language at age 3 and with verbal and nonverbal IQ at school age. These findings support national and international recommendations to promote exclusive breastfeeding through age 6 months and continuation of breastfeeding through at least age 1 year.”

 

Page 65: Why You Should Care About Statistics - Jeff Leek

prediction and association

Page 66: Why You Should Care About Statistics - Jeff Leek

diagnostics

Page 67: Why You Should Care About Statistics - Jeff Leek

tip #10: know these quantities

Page 68: Why You Should Care About Statistics - Jeff Leek

key quantities as fractions
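
The slide shows these quantities as a figure. Assuming it refers to the usual diagnostic-test measures (sensitivity, specificity, positive and negative predictive value), each one is a fraction of counts from a 2×2 table of test result versus true status. The counts below are made up.

```python
def diagnostic_quantities(tp, fp, fn, tn):
    """Standard diagnostic fractions from true/false positive/negative counts."""
    return {
        "sensitivity": tp / (tp + fn),  # of those with disease, share testing positive
        "specificity": tn / (tn + fp),  # of those without disease, share testing negative
        "ppv": tp / (tp + fp),          # of positive tests, share with disease
        "npv": tn / (tn + fn),          # of negative tests, share without disease
    }

# Hypothetical counts: 100 people with the disease, 100 without.
quantities = diagnostic_quantities(tp=90, fp=20, fn=10, tn=80)
print(quantities)
```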

Page 69: Why You Should Care About Statistics - Jeff Leek

important to keep in mind

Page 70: Why You Should Care About Statistics - Jeff Leek

general population

Page 71: Why You Should Care About Statistics - Jeff Leek

general population

Page 72: Why You Should Care About Statistics - Jeff Leek

at risk subpopulation

Page 73: Why You Should Care About Statistics - Jeff Leek

at risk subpopulation
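
A sketch of why the population matters: with the same test, the chance that a positive result is a true positive depends heavily on how common the condition is. The sensitivity, specificity, and both prevalence values below are hypothetical, and the function name `positive_predictive_value` is chosen for illustration.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(disease | positive test) via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Same hypothetical test (99% sensitive, 99% specific) in two populations.
ppv_general = positive_predictive_value(0.99, 0.99, prevalence=0.001)
ppv_at_risk = positive_predictive_value(0.99, 0.99, prevalence=0.10)
print(ppv_general, ppv_at_risk)
```

With these made-up numbers, roughly 9% of positives are real in the general population versus roughly 92% in the at-risk group.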

Page 74: Why You Should Care About Statistics - Jeff Leek

summary of tips

1. know the analysis
2. care about the analysis
3. have a data sharing plan
4. know where/when to get help
5. this isn’t a substitute for learning statistics
6. randomize in your study design
7. look at your data
8. know what p-values are(n’t)
9. beware causality creep
10. know the key diagnostic quantities