mlb final project


Upload: lingwen-he

Post on 28-Jan-2018


Page 1: MLB Final Project


Total Payroll vs. Winning Percentage
In Major League Baseball

Bayesian Statistics
Fall 2014

Lingwen He, Zijian Su, Xiangyu Li, Padraic O’Shea


Introduction    

Major League Baseball (MLB) is the last major professional sport in America that has not adopted a salary cap. The lack of a salary cap has led to large differences in total payroll between big-market and small-market teams. This glaring difference has fueled the ongoing discussion of whether teams can “buy” wins by spending more money. To investigate whether teams that spend more money have higher winning percentages, we explore whether there is a linear relationship between the average total payroll and the average winning percentage of MLB teams from 2004 to 2012.

Methods

Data

From Baseball-Reference.com we acquired data on regular-season winning percentage by team. These data can be accessed at http://www.baseball-reference.com/leagues/MLB/. Data on total payroll by team were obtained from USA Today and are available at http://content.usatoday.com/sportsdata/baseball/mlb/salaries/team/2004.

To explore the linear relationship between total payroll and winning percentage over time, we needed one data point for each team. To obtain these data points, winning percentage and total payroll were collected for the 2004 to 2012 seasons and averaged by team. The predictor variable was rescaled by dividing by one million, so that payroll is measured in millions of dollars and its coefficient is not vanishingly small. Initial inference on the ‘averaged’ dataset did not indicate a severe violation of the normality assumption, but the normal QQ plot was not perfectly linear. Three potential outliers (data points 3, 12, and 24) were also identified during this inference.
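A minimal R sketch of this averaging and rescaling step follows. It assumes a hypothetical season-level data frame `seasons` with one row per team-season and the hypothetical columns `team`, `season`, `payroll` (in dollars), and `wpct`. The four rows are invented toy values rather than the study data, and the authors' actual preprocessing may have differed.

```r
# Hypothetical season-level data: two teams, two seasons each (invented values)
seasons <- data.frame(
  team    = c("A", "A", "B", "B"),
  season  = c(2004, 2005, 2004, 2005),
  payroll = c(60e6, 80e6, 150e6, 170e6),  # total payroll in dollars
  wpct    = c(0.45, 0.49, 0.55, 0.57)     # regular-season winning percentage
)

# Keep only the study window (all toy rows already fall inside it)
seasons <- subset(seasons, season >= 2004 & season <= 2012)

# Average payroll and winning percentage within each team: one row per team
averaged <- aggregate(cbind(payroll, wpct) ~ team, data = seasons, FUN = mean)

# Rescale the predictor to millions of dollars, so the slope is per $1 million
averaged$payroll_m <- averaged$payroll / 1e6
averaged
```

Rescaling multiplies both the slope and its standard error by the same factor (10^6). It therefore changes only the units of the coefficient, not its t statistic or the fitted line.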

The possible outliers were removed one at a time, and each reduced model was examined using residual plots and QQ plots. Overall, removing these points did not improve the model or the fit of the distribution, so the complete dataset was used for the analysis. The plots used for inference can be found in Appendix C.
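The one-at-a-time check can be sketched in R using the Appendix D data. The loop refits the least-squares line after dropping each flagged point (3, 12, and 24, the points removed in Appendix C) and compares the coefficients with the full fit. This is an illustrative loop, not the authors' script, and the variable names are our own.

```r
# Appendix D 'averaged' data: payroll in $ millions, winning percentage
payroll <- c(63.69154422, 89.76840667, 74.92461167, 140.7180136, 109.4106621,
             99.92325911, 68.43453544, 60.32229233, 67.06203433, 100.8169706,
             83.123759, 55.12364089, 114.6389837, 99.61515889, 48.75051056,
             69.04419133, 74.57996711, 56.90573011, 116.4527793, 199.368707,
             60.03360822, 118.5733706, 44.04681044, 55.889759, 94.06393778,
             92.80824244, 94.66015589, 44.78459711, 72.37762889, 69.80194444)
pct <- c(0.47, 0.54, 0.44, 0.56, 0.48, 0.53, 0.49, 0.49, 0.48, 0.52,
         0.46, 0.40, 0.55, 0.52, 0.50, 0.50, 0.50, 0.47, 0.50, 0.58,
         0.52, 0.53, 0.42, 0.50, 0.50, 0.46, 0.57, 0.49, 0.54, 0.50)
bb <- data.frame(payroll, pct)

# Full-data least-squares fit for comparison
full <- lm(pct ~ payroll, data = bb)

# Refit with one flagged point removed at a time
flagged <- c(3, 12, 24)
refits <- t(sapply(flagged, function(i) {
  fit <- lm(pct ~ payroll, data = bb[-i, ])
  c(dropped   = i,
    intercept = unname(coef(fit)[1]),
    slope     = unname(coef(fit)[2]),
    r_squared = summary(fit)$r.squared)
}))

print(coef(full))
print(refits)
```

Under these assumptions, the slopes should reproduce Appendix C: 0.0007798, 0.0007190, and 0.0008193, compared with 0.0007969 for the complete data. No single point drives the positive association.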

To begin understanding the data, we consider descriptive statistics. The mean, standard deviation, minimum, median, and maximum of each variable are given in Table 1 below. As expected, average total salary appears skewed to the right, since its mean (84.657) is greater than its median (74.752). Average winning percentage appears to be roughly normally distributed, with its mean and median both near 0.50. The spread in total salary is large: its standard deviation (32.693 million) is more than a third of its mean.


Data (Avg. 2004 to 2012)   Mean     Standard Deviation   Minimum   Median   Maximum
Total Salary               84.657   32.693               44.046    74.752   199.368
Winning %                  0.5003   0.0414               0.40      0.50     0.58

Table 1. Descriptive Statistics for Study Variables (Total Salary in Millions of Dollars)
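The summaries in Table 1 can be recomputed from the Appendix D data with base R. This is a minimal sketch, and the helper name `describe` is our own.

```r
# Appendix D data: payroll in $ millions, winning percentage
payroll <- c(63.69154422, 89.76840667, 74.92461167, 140.7180136, 109.4106621,
             99.92325911, 68.43453544, 60.32229233, 67.06203433, 100.8169706,
             83.123759, 55.12364089, 114.6389837, 99.61515889, 48.75051056,
             69.04419133, 74.57996711, 56.90573011, 116.4527793, 199.368707,
             60.03360822, 118.5733706, 44.04681044, 55.889759, 94.06393778,
             92.80824244, 94.66015589, 44.78459711, 72.37762889, 69.80194444)
pct <- c(0.47, 0.54, 0.44, 0.56, 0.48, 0.53, 0.49, 0.49, 0.48, 0.52,
         0.46, 0.40, 0.55, 0.52, 0.50, 0.50, 0.50, 0.47, 0.50, 0.58,
         0.52, 0.53, 0.42, 0.50, 0.50, 0.46, 0.57, 0.49, 0.54, 0.50)

# Summary statistics in the column order of Table 1
describe <- function(v) {
  c(mean = mean(v), sd = sd(v), min = min(v), median = median(v), max = max(v))
}

table1 <- rbind(total_salary = describe(payroll), winning_pct = describe(pct))
round(table1, 4)
```

At four decimals the minimum and maximum payroll are 44.0468 and 199.3687. Table 1 appears to truncate, rather than round, these values to three decimals.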

 Statistical  Method  

We hypothesize that average total payroll and average winning percentage are linearly associated. To assess this relationship, we fit Bayesian simple linear regression models with average winning percentage as the response and average total payroll as the predictor. Two priors are used. First, a non-informative prior reflects our lack of prior knowledge about the effect of salary on winning percentage. Second, an informative prior encodes our prior beliefs. The outputs of the two models are then compared.
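For reference, the model behind both fits can be written out as follows. This is our transcription of the Appendix B code, not a formula from the original report. It uses N(mean, variance) notation, so OpenBUGS's precision arguments appear as inverse variances, and Gamma(shape, rate). Here x_i is average payroll in millions of dollars, y_i is average winning percentage, and tau is the precision called tausq in the code.

```latex
\begin{aligned}
y_i \mid \beta_0, \beta_1, \tau &\sim \mathcal{N}\!\left(\mu_i,\ \tau^{-1}\right),
  \qquad \mu_i = \beta_0 + \beta_1\,(x_i - \bar{x}), \qquad i = 1, \dots, 30, \\
\text{non-informative:}\quad p(\beta_0) &\propto 1, \qquad p(\beta_1) \propto 1, \\
\text{informative:}\quad \beta_0 &\sim \mathcal{N}\!\left(0.5,\ 0.05^{-1}\right), \qquad
  \beta_1 \sim \mathcal{N}\!\left(0.1,\ 100^{-1}\right), \\
\text{both:}\quad \tau &\sim \mathrm{Gamma}(0.001,\ 0.001), \qquad \sigma = \tau^{-1/2}, \\
\texttt{postprob} &= \Pr(\beta_1 \ge 0 \mid \mathbf{y}).
\end{aligned}
```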

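For the informative prior, beta0 is given a N(0.5, 0.05) prior because we expect a team with average payroll to win about 50% of its games. Payroll is centered at its mean in the model, so beta0 is the expected winning percentage at average payroll. beta1 is given a N(0.1, 100) prior, reflecting our limited knowledge and an expectation that the effect of payroll will be positive but not overly large; we expected the uncertainty about beta1 to be large. These priors are written in OpenBUGS notation, in which the second argument of dnorm is a precision (1/variance) rather than a variance. As coded in Appendix B, the prior on beta0 therefore has variance 20, much more diffuse than the small variance we intended. The prior on beta1 has variance 0.01 (standard deviation 0.1).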
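Converting the OpenBUGS arguments to variances and standard deviations takes only a few lines of R. This is a minimal sketch using the prior values from Appendix B.

```r
# Prior arguments exactly as written in the Appendix B OpenBUGS code:
# dnorm(mean, precision), where precision = 1 / variance
bugs_priors <- data.frame(
  parameter = c("beta0", "beta1"),
  mean      = c(0.5, 0.1),
  precision = c(0.05, 100)
)
bugs_priors$variance <- 1 / bugs_priors$precision
bugs_priors$sd       <- sqrt(bugs_priors$variance)
bugs_priors
```

As coded, the prior standard deviation for beta0 is about 4.47, which is very wide for a proportion. The prior standard deviation for beta1 (0.1) is about 500 times the posterior standard deviation of 2.01×10^-4 reported in the Results.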

Convergence was assessed from the OpenBUGS output using history plots, autocorrelation plots, and MC_error values. Because convergence was rapid, only one chain was used for the MCMC integration. However, this meant that Brooks-Gelman-Rubin (BGR) plots, which compare multiple chains, could not be used to assess burn-in. The initial values were based on intuition and were likely not representative of the posterior distribution. We therefore discarded the first 3000 iterations as burn-in and retained the next 12000 iterations (3001 onward) for inference.
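The checks described above can be sketched in R on a simulated stand-in chain, here a hypothetical AR(1) series rather than the project's draws. The sketch drops a 3000-iteration burn-in, reports the lag-1 autocorrelation with `acf()`, and estimates the Monte Carlo standard error of the mean by batch means. Batch means is one standard way to estimate a quantity like OpenBUGS's MC_error, but this sketch is not guaranteed to match OpenBUGS's exact computation. The function name `mc_error` is our own.

```r
# Hypothetical chain: an AR(1) series standing in for OpenBUGS output,
# started far from its stationary mean to mimic a poor initial value
set.seed(2014)
n_iter <- 15000
chain <- numeric(n_iter)
chain[1] <- 5
for (i in 2:n_iter) chain[i] <- 0.3 * chain[i - 1] + rnorm(1)

# Discard a 3000-iteration burn-in, keeping iterations 3001 to 15000
burn_in <- 3000
kept <- chain[-seq_len(burn_in)]

# Lag-1 autocorrelation (the first bar after lag 0 in an autocorrelation plot)
lag1 <- acf(kept, lag.max = 1, plot = FALSE)$acf[2]

# Batch-means Monte Carlo standard error of the posterior mean:
# split the draws into equal batches and use the spread of the batch means
mc_error <- function(draws, n_batches = 50) {
  batch_size <- floor(length(draws) / n_batches)
  batch <- rep(seq_len(n_batches), each = batch_size)
  batch_means <- tapply(draws[seq_along(batch)], batch, mean)
  sd(batch_means) / sqrt(n_batches)
}

c(retained = length(kept), lag1 = lag1, mean = mean(kept), mc_error = mc_error(kept))
```

For an independent chain, the MC error is close to sd/sqrt(n), and positive autocorrelation inflates it. A small MC_error relative to the posterior standard deviation therefore indicates that the retained draws estimate the posterior mean precisely.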

     


Results

The results of the Bayesian simple linear regression models, fitted in OpenBUGS with the code in Appendix B, are given below. Figure 1 contains the node statistics for the non-informative prior. The history plots and autocorrelation plots used to assess convergence in the non-informative prior model can be found in Appendix A.

node      mean       sd         MC_error   val2.5pc   median     val97.5pc  start  sample
beta0     0.5002     0.006425   6.46E-5    0.4876     0.5002     0.5131     3001   12000
beta1     7.962E-4   2.01E-4    1.935E-6   4.009E-4   7.949E-4   0.001184   3001   12000
mu[1]     0.4835     0.007636   7.081E-5   0.4684     0.4835     0.4989     3001   12000
mu[2]     0.5043     0.006521   6.684E-5   0.4914     0.5043     0.5171     3001   12000
mu[3]     0.4925     0.006691   6.445E-5   0.4792     0.4924     0.5059     3001   12000
mu[4]     0.5449     0.01305    1.345E-4   0.5189     0.5449     0.5704     3001   12000
mu[5]     0.5199     0.008179   8.614E-5   0.5033     0.52       0.536      3001   12000
mu[6]     0.5124     0.007157   7.505E-5   0.498      0.5125     0.5266     3001   12000
mu[7]     0.4873     0.007166   6.734E-5   0.4732     0.4872     0.5018     3001   12000
mu[8]     0.4809     0.008022   7.386E-5   0.465      0.4808     0.497      3001   12000
mu[9]     0.4862     0.007292   6.824E-5   0.4718     0.4862     0.501      3001   12000
mu[10]    0.5131     0.007238   7.598E-5   0.4986     0.5132     0.5275     3001   12000
mu[11]    0.499      0.006429   6.422E-5   0.4864     0.499      0.512      3001   12000
mu[12]    0.4767     0.008688   7.94E-5    0.4594     0.4767     0.4943     3001   12000
mu[13]    0.5241     0.008868   9.323E-5   0.5062     0.5242     0.5415     3001   12000
mu[14]    0.5121     0.00713    7.474E-5   0.4978     0.5122     0.5262     3001   12000
mu[15]    0.4716     0.009598   8.729E-5   0.4525     0.4716     0.491      3001   12000
mu[16]    0.4878     0.007113   6.698E-5   0.4738     0.4877     0.5022     3001   12000
mu[17]    0.4922     0.006711   6.455E-5   0.4789     0.4922     0.5057     3001   12000
mu[18]    0.4781     0.008451   7.74E-5    0.4614     0.4781     0.4951     3001   12000
mu[19]    0.5256     0.009123   9.582E-5   0.5072     0.5257     0.5434     3001   12000
mu[20]    0.5916     0.02402    2.405E-4   0.5445     0.5915     0.6389     3001   12000
mu[21]    0.4806     0.008057   7.414E-5   0.4647     0.4806     0.4968     3001   12000
mu[22]    0.5272     0.00943    9.892E-5   0.5083     0.5274     0.5457     3001   12000
mu[23]    0.4679     0.01032    9.374E-5   0.4475     0.4678     0.4886     3001   12000
mu[24]    0.4773     0.008586   7.852E-5   0.4602     0.4773     0.4947     3001   12000
mu[25]    0.5077     0.006722   6.976E-5   0.4943     0.5078     0.5211     3001   12000
mu[26]    0.5067     0.006652   6.882E-5   0.4935     0.5068     0.5199     3001   12000
mu[27]    0.5082     0.006758   7.023E-5   0.4947     0.5082     0.5216     3001   12000
mu[28]    0.4685     0.0102     9.27E-5    0.4483     0.4684     0.4889     3001   12000
mu[29]    0.4905     0.006852   6.532E-5   0.4769     0.4904     0.5042     3001   12000
mu[30]    0.4884     0.007049   6.655E-5   0.4745     0.4883     0.5027     3001   12000
postprob  0.9998     0.01581    1.431E-4   1.0        1.0        1.0        3001   12000
sigma     0.03472    0.004829   4.703E-5   0.02672    0.03418    0.04542    3001   12000
tausq     876.2      235.3      2.254      484.9      856.2      1401.0     3001   12000

Figure 1. Node Statistics for Non-Informative Prior
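As an independent check on Figure 1, the non-informative model can be fitted without OpenBUGS. The sketch below swaps in a different technique: a hand-written Gibbs sampler in R rather than the authors' OpenBUGS run. With flat priors on beta0 and beta1 and a Gamma(0.001, 0.001) prior on the precision tau, each full conditional distribution is a normal or gamma distribution. The sketch uses the Appendix B data and initial values, the same 3000-iteration burn-in, and 12000 retained draws. The function name `gibbs_slr` is our own.

```r
# Appendix B data: x = average payroll ($ millions), y = average winning percentage
x <- c(63.69154422, 89.76840667, 74.92461167, 140.7180136, 109.4106621,
       99.92325911, 68.43453544, 60.32229233, 67.06203433, 100.8169706,
       83.123759, 55.12364089, 114.6389837, 99.61515889, 48.75051056,
       69.04419133, 74.57996711, 56.90573011, 116.4527793, 199.368707,
       60.03360822, 118.5733706, 44.04681044, 55.889759, 94.06393778,
       92.80824244, 94.66015589, 44.78459711, 72.37762889, 69.80194444)
y <- c(0.47, 0.54, 0.44, 0.56, 0.48, 0.53, 0.49, 0.49, 0.48, 0.52,
       0.46, 0.40, 0.55, 0.52, 0.50, 0.50, 0.50, 0.47, 0.50, 0.58,
       0.52, 0.53, 0.42, 0.50, 0.50, 0.46, 0.57, 0.49, 0.54, 0.50)

# Gibbs sampler for y[i] ~ N(beta0 + beta1 * xc[i], 1 / tau) with flat priors on
# beta0 and beta1 and a Gamma(a, b) (shape, rate) prior on the precision tau
gibbs_slr <- function(x, y, n_iter = 15000, burn_in = 3000, a = 0.001, b = 0.001) {
  n <- length(y)
  xc <- x - mean(x)                    # centered payroll, as in the BUGS model
  sxx <- sum(xc^2)
  beta0 <- 0; beta1 <- 0; tau <- 1     # same initial values as the BUGS inits
  out <- matrix(NA_real_, n_iter, 3,
                dimnames = list(NULL, c("beta0", "beta1", "tau")))
  for (i in seq_len(n_iter)) {
    # beta0 | beta1, tau, y: normal around the mean of y - beta1 * xc
    beta0 <- rnorm(1, mean(y - beta1 * xc), sqrt(1 / (n * tau)))
    # beta1 | beta0, tau, y: normal around the slope of (y - beta0) on xc
    beta1 <- rnorm(1, sum(xc * (y - beta0)) / sxx, sqrt(1 / (tau * sxx)))
    # tau | beta0, beta1, y: conjugate gamma update using the residual sum of squares
    sse <- sum((y - beta0 - beta1 * xc)^2)
    tau <- rgamma(1, shape = a + n / 2, rate = b + sse / 2)
    out[i, ] <- c(beta0, beta1, tau)
  }
  out[-seq_len(burn_in), ]             # drop the burn-in iterations
}

set.seed(1)
draws <- gibbs_slr(x, y)
sigma <- 1 / sqrt(draws[, "tau"])
postprob <- mean(draws[, "beta1"] >= 0)  # mirrors step(beta1) in the BUGS code
round(c(beta0 = mean(draws[, "beta0"]), beta1 = mean(draws[, "beta1"]),
        sigma = mean(sigma), postprob = postprob), 6)
```

Posterior means from this sampler should match Figure 1 up to Monte Carlo error: beta0 about 0.500, beta1 about 8.0×10^-4, sigma about 0.035, and postprob above 0.999. The rate value 0.001 in the gamma prior is not negligible here. It is added to half the residual sum of squares, which is only about 0.015, so it raises the posterior rate by about 7%.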


Figure 2 contains the node statistics for the informative prior. The history plots and autocorrelation plots used to assess convergence in the informative prior model can be found in Appendix A.

node      mean       sd         MC_error   val2.5pc   median     val97.5pc  start  sample
beta0     0.5002     0.006425   6.46E-5    0.4876     0.5002     0.5131     3001   12000
beta1     7.966E-4   2.01E-4    1.935E-6   4.013E-4   7.952E-4   0.001184   3001   12000
mu[1]     0.4835     0.007636   7.081E-5   0.4684     0.4835     0.4989     3001   12000
mu[2]     0.5043     0.006521   6.684E-5   0.4914     0.5043     0.5171     3001   12000
mu[3]     0.4925     0.006691   6.445E-5   0.4792     0.4924     0.5059     3001   12000
mu[4]     0.5449     0.01305    1.345E-4   0.5189     0.5449     0.5705     3001   12000
mu[5]     0.52       0.008179   8.614E-5   0.5033     0.52       0.536      3001   12000
mu[6]     0.5124     0.007157   7.506E-5   0.498      0.5125     0.5266     3001   12000
mu[7]     0.4873     0.007166   6.734E-5   0.4732     0.4872     0.5018     3001   12000
mu[8]     0.4808     0.008022   7.386E-5   0.465      0.4808     0.497      3001   12000
mu[9]     0.4862     0.007292   6.824E-5   0.4718     0.4862     0.501      3001   12000
mu[10]    0.5131     0.007238   7.598E-5   0.4986     0.5132     0.5275     3001   12000
mu[11]    0.499      0.006429   6.421E-5   0.4864     0.499      0.512      3001   12000
mu[12]    0.4767     0.008688   7.94E-5    0.4593     0.4767     0.4943     3001   12000
mu[13]    0.5241     0.008868   9.323E-5   0.5062     0.5242     0.5415     3001   12000
mu[14]    0.5122     0.00713    7.474E-5   0.4978     0.5122     0.5262     3001   12000
mu[15]    0.4716     0.009598   8.73E-5    0.4525     0.4716     0.4909     3001   12000
mu[16]    0.4878     0.007113   6.698E-5   0.4738     0.4877     0.5022     3001   12000
mu[17]    0.4922     0.006711   6.455E-5   0.4789     0.4922     0.5057     3001   12000
mu[18]    0.4781     0.008451   7.74E-5    0.4614     0.4781     0.4951     3001   12000
mu[19]    0.5256     0.009123   9.582E-5   0.5072     0.5257     0.5434     3001   12000
mu[20]    0.5916     0.02402    2.405E-4   0.5445     0.5915     0.639      3001   12000
mu[21]    0.4806     0.008057   7.414E-5   0.4646     0.4806     0.4968     3001   12000
mu[22]    0.5273     0.00943    9.892E-5   0.5083     0.5274     0.5457     3001   12000
mu[23]    0.4679     0.01032    9.374E-5   0.4475     0.4678     0.4885     3001   12000
mu[24]    0.4773     0.008585   7.853E-5   0.4602     0.4773     0.4947     3001   12000
mu[25]    0.5077     0.006722   6.976E-5   0.4943     0.5078     0.5211     3001   12000
mu[26]    0.5067     0.006652   6.882E-5   0.4935     0.5068     0.5199     3001   12000
mu[27]    0.5082     0.006758   7.024E-5   0.4947     0.5082     0.5216     3001   12000
mu[28]    0.4685     0.0102     9.27E-5    0.4483     0.4684     0.4889     3001   12000
mu[29]    0.4905     0.006852   6.532E-5   0.4769     0.4904     0.5042     3001   12000
mu[30]    0.4884     0.007049   6.655E-5   0.4745     0.4883     0.5027     3001   12000
postprob  0.9998     0.01581    1.431E-4   1.0        1.0        1.0        3001   12000
sigma     0.03472    0.004829   4.704E-5   0.02672    0.03418    0.04542    3001   12000
tausq     876.2      235.3      2.254      484.9      856.2      1401.0     3001   12000

Figure 2. Node Statistics for Informative Prior

Discussion    

Based on our analyses, we found a positive relationship between average total payroll and average winning percentage in Major League Baseball for the years 2004 to 2012. Under both the non-informative and informative priors, the posterior mean of postprob indicates that Pr(β1 ≥ 0 | {y}) is about 0.9998. In other words, there is a greater than 99% posterior probability that beta1 > 0. This conclusion is also supported by the posterior means of beta1 and by its 95% credible intervals, which lie entirely above zero (about 4.0×10^-4 to 1.18×10^-3 under both priors). In practical terms, a slope of about 0.0008 per $1 million corresponds to roughly 1.3 additional wins over a 162-game season for each additional $10 million of average payroll. Therefore, the data support a positive linear association between average total payroll and average winning percentage for MLB teams.
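Under the non-informative prior, postprob can also be checked analytically. The following derivation sketch is our addition, not part of the original analysis. Assume centered payroll, flat priors on beta0 and beta1, and a Gamma(a, b) prior on the precision. Integrating out beta0 and the precision then leaves a Student-t posterior for beta1 with nu = n - 2 + 2a degrees of freedom. This posterior is centered at the least-squares slope, with squared scale (SSE + 2b) / (nu * Sxx), where SSE is the residual sum of squares and Sxx is the sum of squared centered payrolls. The R lines below recover SSE and Sxx from the Appendix C output rather than from the raw data.

```r
# Least-squares output for the complete dataset (Appendix C)
b1_hat <- 0.0007969   # slope estimate
se_b1  <- 0.0001860   # standard error of the slope
rse    <- 0.03274     # residual standard error
n      <- 30          # number of teams

# Gamma(a, b) prior on the precision tau, as in Appendix B
a <- 0.001
b <- 0.001

sse <- rse^2 * (n - 2)     # residual sum of squares
sxx <- rse^2 / se_b1^2     # sum of squared centered payrolls
nu  <- n - 2 + 2 * a       # degrees of freedom of the marginal t posterior
scale_b1 <- sqrt((sse + 2 * b) / (nu * sxx))

# Pr(beta1 > 0 | y) = Pr(T_nu < b1_hat / scale_b1) for a standard t variable T_nu
post_prob <- pt(b1_hat / scale_b1, df = nu)
post_sd   <- scale_b1 * sqrt(nu / (nu - 2))  # posterior sd of beta1
c(post_prob = post_prob, post_sd = post_sd)
```

This gives Pr(beta1 > 0 | y) of about 0.9999 and a posterior standard deviation of about 2.0×10^-4. Both agree with the simulated postprob (0.9998) and beta1 sd (2.01×10^-4) in Figure 1, up to Monte Carlo and rounding error.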


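There was very little difference between the results under the non-informative and informative priors. This is likely because the informative priors, as coded, are weak relative to the data. Since the second argument of OpenBUGS's dnorm is a precision, their standard deviations (about 4.5 for beta0 and 0.1 for beta1) are far larger than the posterior standard deviations in Figure 1 (0.0064 and 0.0002). The posterior mean and median of beta1 are slightly larger under the informative prior; for example, the means are 7.966×10^-4 versus 7.962×10^-4. However, this difference is smaller than the reported MC_error of about 1.9×10^-6. It therefore cannot be distinguished from simulation noise and does not indicate that either prior fits the data better.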
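A rough calculation shows why the informative prior barely moves beta1. This is a sketch based on our own approximation, not the authors' analysis. It treats the non-informative posterior for beta1 in Figure 1 as a normal summary of the data. It then combines that summary with the informative prior using the standard normal-normal update, in which the posterior mean is a precision-weighted average. The approximation ignores uncertainty in the residual precision.

```r
# Informative prior for beta1 from Appendix B: dnorm(0.1, 100), i.e. precision 100
prior_mean <- 0.1
prior_prec <- 100

# Non-informative posterior for beta1 (Figure 1), treated as approximately normal
data_mean <- 7.962e-4
data_prec <- 1 / 2.01e-4^2   # precision = 1 / sd^2

# Normal-normal combination: precision-weighted average of the two means
w_prior   <- prior_prec / (prior_prec + data_prec)
post_mean <- w_prior * prior_mean + (1 - w_prior) * data_mean
shift     <- post_mean - data_mean
c(w_prior = w_prior, post_mean = post_mean, shift = shift)
```

The prior receives a weight of about 4×10^-6 and shifts the mean by about 4×10^-7. That is the same size as the gap between the beta1 means in Figures 1 and 2 (7.966×10^-4 versus 7.962×10^-4).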

If the linear relationship between average total payroll and average winning percentage for MLB teams is explored further, more prior information about the parameters would help improve the analysis. Additionally, if an ‘averaged’ dataset is used in follow-up work, including more seasons would be advisable. Finally, although this analysis suggests that a positive relationship exists between total payroll and winning percentage, it would be important to account for ongoing changes in the league. Most notable is the debate over using statistics such as on-base percentage, rather than traditional baseball measures of success, to estimate players' contributions to winning. This ongoing development is changing the perceived value of many players and may substantially affect teams' payrolls and winning percentages.

References

Our dataset was constructed by combining historical Major League Baseball team salaries and winning percentages, drawn from the same period (2004 to 2012) for both variables. Links to these MLB data sources are given below:

Baseball-Reference.com. (2014). Team Wins. Retrieved from http://www.baseball-reference.com/leagues/MLB/

USA Today. (2014). USATODAY Salaries Database, MLB salaries by team for various years (2004 to 2014). Retrieved from http://content.usatoday.com/sportsdata/baseball/mlb/salaries/team/2004

Appendix A: MCMC Integration Convergence

Figures 3 and 4 below are the history plots and autocorrelation plots, respectively, for the non-informative prior. From these plots, we concluded that convergence occurred quickly for every variable. Convergence of postprob was assessed using the MC_error reported in the Results section of this paper.


Figure 3. History Plots for Non-Informative Prior

Figure 4. Auto-Correlation Plots for Non-Informative Prior

Figures 5 and 6 below are the history plots and autocorrelation plots, respectively, for the informative prior. From these plots, we concluded that convergence occurred quickly for every variable. Convergence of postprob was assessed using the MC_error reported in the Results section of this paper.


Figure 5. History Plots for Informative Prior

Figure 6. Auto-Correlation Plots for Informative Prior


Appendix B: OpenBUGS Code

Non-informative Prior

model {
  # Center payroll so beta0 is the expected winning percentage at average payroll
  for (i in 1:N) {
    xcent[i] <- x[i] - mean(x[])
  }
  for (i in 1:N) {
    mu[i] <- beta0 + beta1 * xcent[i]
    y[i] ~ dnorm(mu[i], tausq)   # dnorm(mean, precision): tausq is a precision
  }
  postprob <- step(beta1)        # 1 if beta1 >= 0; its mean estimates Pr(beta1 >= 0 | y)
  beta0 ~ dflat()
  beta1 ~ dflat()
  tausq ~ dgamma(0.001, 0.001)
  sigma <- 1 / sqrt(tausq)       # residual standard deviation
}

#data
list(x = c(63.69154422, 89.76840667, 74.92461167, 140.7180136, 109.4106621,
           99.92325911, 68.43453544, 60.32229233, 67.06203433, 100.8169706,
           83.123759, 55.12364089, 114.6389837, 99.61515889, 48.75051056,
           69.04419133, 74.57996711, 56.90573011, 116.4527793, 199.368707,
           60.03360822, 118.5733706, 44.04681044, 55.889759, 94.06393778,
           92.80824244, 94.66015589, 44.78459711, 72.37762889, 69.80194444),
     y = c(0.47, 0.54, 0.44, 0.56, 0.48, 0.53, 0.49, 0.49, 0.48, 0.52,
           0.46, 0.40, 0.55, 0.52, 0.50, 0.50, 0.50, 0.47, 0.50, 0.58,
           0.52, 0.53, 0.42, 0.50, 0.50, 0.46, 0.57, 0.49, 0.54, 0.50),
     N = 30)

#inits
list(beta0 = 0, beta1 = 0, tausq = 1)

Informative Prior

model {
  # Center payroll so beta0 is the expected winning percentage at average payroll
  for (i in 1:N) {
    xcent[i] <- x[i] - mean(x[])
  }
  for (i in 1:N) {
    mu[i] <- beta0 + beta1 * xcent[i]
    y[i] ~ dnorm(mu[i], tausq)   # dnorm(mean, precision): tausq is a precision
  }
  postprob <- step(beta1)
  beta0 ~ dnorm(0.5, 0.05)       # precision 0.05, i.e. variance 20
  beta1 ~ dnorm(0.1, 100)        # precision 100, i.e. variance 0.01
  tausq ~ dgamma(0.001, 0.001)
  sigma <- 1 / sqrt(tausq)
}

#data
list(x = c(63.69154422, 89.76840667, 74.92461167, 140.7180136, 109.4106621,
           99.92325911, 68.43453544, 60.32229233, 67.06203433, 100.8169706,
           83.123759, 55.12364089, 114.6389837, 99.61515889, 48.75051056,
           69.04419133, 74.57996711, 56.90573011, 116.4527793, 199.368707,
           60.03360822, 118.5733706, 44.04681044, 55.889759, 94.06393778,
           92.80824244, 94.66015589, 44.78459711, 72.37762889, 69.80194444),
     y = c(0.47, 0.54, 0.44, 0.56, 0.48, 0.53, 0.49, 0.49, 0.48, 0.52,
           0.46, 0.40, 0.55, 0.52, 0.50, 0.50, 0.50, 0.47, 0.50, 0.58,
           0.52, 0.53, 0.42, 0.50, 0.50, 0.46, 0.57, 0.49, 0.54, 0.50),
     N = 30)

#inits
list(beta0 = 0, beta1 = 0, tausq = 1)

Appendix C: Inference (R code)

Complete Dataset

> data.bb <- read.table('C://Users/xli63/Desktop/Baseball.txt', header = TRUE)
> attach(data.bb)
> head(data.bb)
> x <- data.bb$AverageTotalPayroll
> pct <- data.bb$AveragePCT
> lm.out <- lm(pct ~ x)
> summary(lm.out)

Call:
lm(formula = pct ~ x)

Residuals:
      Min        1Q    Median        3Q       Max
-0.076797 -0.013157  0.007243  0.020457  0.061695

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.4328659  0.0168386  25.707  < 2e-16 ***
x           0.0007969  0.0001860   4.286 0.000194 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.03274 on 28 degrees of freedom
Multiple R-squared:  0.3961,  Adjusted R-squared:  0.3746
F-statistic: 18.37 on 1 and 28 DF,  p-value: 0.0001944

Reduced Model (#3 Removed)

> data_up2 <- data.bb[-c(3), ]
> xnew2 <- data_up2$AverageTotalPayroll
> pct2 <- data_up2$AveragePCT
> red_residual_line2 <- lm(pct2 ~ xnew2)
> summary(red_residual_line2)

Call:
lm(formula = pct2 ~ xnew2)

Residuals:
      Min        1Q    Median        3Q       Max
-0.079121 -0.011606  0.005706  0.018941  0.060047

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.4361350  0.0164219  26.558  < 2e-16 ***
xnew2       0.0007798  0.0001804   4.323 0.000187 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.03171 on 27 degrees of freedom
Multiple R-squared:  0.4091,  Adjusted R-squared:  0.3872
F-statistic: 18.69 on 1 and 27 DF,  p-value: 0.0001872

Reduced Model (#12 Removed)

> data_new <- data.bb[-c(12), ]
> xnew <- data_new$AverageTotalPayroll
> pct_new <- data_new$AveragePCT
> remove_residual_line <- lm(pct_new ~ xnew)
> summary(remove_residual_line)

Call:
lm(formula = pct_new ~ xnew)

Residuals:
      Min        1Q    Median        3Q       Max
-0.056063 -0.013108  0.004436  0.016632  0.059747

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.4421938  0.0156408  28.272  < 2e-16 ***
xnew        0.0007190  0.0001709   4.208 0.000255 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.02964 on 27 degrees of freedom
Multiple R-squared:  0.396,  Adjusted R-squared:  0.3737
F-statistic:  17.7 on 1 and 27 DF,  p-value: 0.0002551

Reduced Model (#24 Removed)

> data_up1 <- data.bb[-c(24), ]
> xnew1 <- data_up1$AverageTotalPayroll
> pct1 <- data_up1$AveragePCT
> red_residual_line <- lm(pct1 ~ xnew1)
> plot(red_residual_line)
> summary(red_residual_line)

Call:
lm(formula = pct1 ~ xnew1)

Residuals:
      Min        1Q    Median        3Q       Max
-0.075337 -0.013510  0.007229  0.017961  0.062273

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.4301762  0.0174143  24.702  < 2e-16 ***
xnew1       0.0008193  0.0001903   4.305 0.000196 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.03304 on 27 degrees of freedom
Multiple R-squared:  0.4071,  Adjusted R-squared:  0.3851
F-statistic: 18.54 on 1 and 27 DF,  p-value: 0.0001965


Residual & QQ plots Based on Complete Dataset

Residual & QQ plots Based on Reduced Model (Data point 3 Removed)

[Plot: y against x]


Residual & QQ plots Based on Reduced Model (Data point 12 Removed)

Residual & QQ plots Based on Reduced Model (Data point 24 Removed)


Appendix D: ‘Averaged’ Dataset

Obs.  AverageTotalPayroll  AveragePCT
1     63.69154422          0.47
2     89.76840667          0.54
3     74.92461167          0.44
4     140.7180136          0.56
5     109.4106621          0.48
6     99.92325911          0.53
7     68.43453544          0.49
8     60.32229233          0.49
9     67.06203433          0.48
10    100.8169706          0.52
11    83.123759            0.46
12    55.12364089          0.40
13    114.6389837          0.55
14    99.61515889          0.52
15    48.75051056          0.50
16    69.04419133          0.50
17    74.57996711          0.50
18    56.90573011          0.47
19    116.4527793          0.50
20    199.368707           0.58
21    60.03360822          0.52
22    118.5733706          0.53
23    44.04681044          0.42
24    55.889759            0.50
25    94.06393778          0.50
26    92.80824244          0.46
27    94.66015589          0.57
28    44.78459711          0.49
29    72.37762889          0.54
30    69.80194444          0.50

*Average Total Payroll is in millions of dollars (e.g., 10.1 = $10,100,000).


Contributions

Project proposal: All Members

OpenBUGS/R Computing:
- Non-informative prior: Lingwen He
- Informative prior: Zijian Su
- Inference: Xiangyu Li
- Additional Computing: Lingwen He, Zijian Su, Xiangyu Li

Interim report: All Members

Final report writing and formatting: Padraic O’Shea