real statistics using excel


Upload: paloma

Post on 24-Feb-2016



TRANSCRIPT

Page 1: Real Statistics Using Excel

Real Statistics Using Excel

102521088 吳柏葦

Page 2: Real Statistics Using Excel

Outline

• Confidence and prediction intervals for regression

• Exponential Regression Model

• Power Regression Model

• Linear regression models for comparing means

Page 3: Real Statistics Using Excel

Confidence and prediction intervals for regression

Page 4: Real Statistics Using Excel

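The 95% confidence interval for the forecasted value ŷ at x is

$$\hat{y} \pm t_{crit} \cdot s.e.$$

where

$$s.e. = s_{y \cdot x} \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{SS_x}}$$

Here $t_{crit}$ is the two-tailed critical value of the t distribution with n − 2 degrees of freedom, $s_{y \cdot x}$ is the standard error of the estimate, and $SS_x = \sum_i (x_i - \bar{x})^2$.

This means that there is a 95% probability that the true linear regression line of the population will lie within the confidence interval of the regression line calculated from the sample data.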

Page 5: Real Statistics Using Excel
Page 6: Real Statistics Using Excel

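The 95% prediction interval of the forecasted value ŷ0 for x0 is

$$\hat{y}_0 \pm t_{crit} \cdot s.e.$$

where the standard error of the prediction is

$$s.e. = s_{y \cdot x} \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{SS_x}}$$

with $s_{y \cdot x}$ the standard error of the estimate and $SS_x = \sum_i (x_i - \bar{x})^2$.

For any specific value x0 the prediction interval is more meaningful than the confidence interval, because it describes an individual observation at x0 rather than the mean response.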

Page 7: Real Statistics Using Excel

Figure 2 – Confidence and prediction intervals for data in Example 1

Example 1: Find the 95% confidence and prediction intervals for the forecasted life expectancy of men who smoke 20 cigarettes, using the data from Example 1 of Method of Least Squares.

Page 8: Real Statistics Using Excel

Referring to Figure 2, we see that the forecasted value for 20 cigarettes is given by FORECAST(20,B4:B18,A4:A18) = 73.16. The confidence interval, calculated using the standard error 2.06 (found in cell E12), is (68.70, 77.61).

The prediction interval is calculated in a similar way using the prediction standard error of 8.24 (found in cell J12). Thus life expectancy of men who smoke 20 cigarettes is in the interval (55.36, 90.95) with 95% probability.
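The interval formulas above can be sketched in Python as follows. This is a minimal illustration, not the Real Statistics implementation: the function name `regression_intervals`, the small data set and the choice to pass the critical value in as `t_crit` (for instance from Excel's T.INV.2T) are assumptions made here. The last lines redo the slide's arithmetic with the quoted standard errors, assuming 13 degrees of freedom because the ranges A4:A18 and B4:B18 hold 15 observations.

```python
import math

def regression_intervals(xs, ys, x0, t_crit):
    """Return (y_hat, ci, pi) for a simple linear regression evaluated at x0.

    t_crit is the two-tailed t critical value for n - 2 degrees of freedom;
    it is supplied by the caller so only the standard library is needed.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ss_x
    intercept = y_bar - slope * x_bar
    # Standard error of the estimate, s_{y.x}
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s_yx = math.sqrt(sse / (n - 2))
    y_hat = intercept + slope * x0
    se_conf = s_yx * math.sqrt(1 / n + (x0 - x_bar) ** 2 / ss_x)
    se_pred = s_yx * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / ss_x)
    ci = (y_hat - t_crit * se_conf, y_hat + t_crit * se_conf)
    pi = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)
    return y_hat, ci, pi

# Made-up data (not the slide's data); 3.182 is the 5% two-tailed value for 3 df
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
y_hat, ci, pi = regression_intervals(xs, ys, x0=3, t_crit=3.182)

# Slide arithmetic: forecast 73.16, standard errors 2.06 and 8.24, assumed 13 df
t13 = 2.160
ci_slide = (73.16 - t13 * 2.06, 73.16 + t13 * 2.06)
pi_slide = (73.16 - t13 * 8.24, 73.16 + t13 * 8.24)
```

The prediction interval is always wider than the confidence interval at the same x0 because of the extra 1 under the square root. The recomputed slide intervals agree with the quoted ones to within rounding.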

Page 9: Real Statistics Using Excel

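Example 2: Test whether the y-intercept is 0.

We use the same approach as that used in Example 1 to find the confidence interval of ŷ when x = 0 (this is the y-intercept). The result is given in column M of Figure 2. Here the standard error is

$$s.e. = s_{y \cdot x} \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{SS_x}}$$

where $s_{y \cdot x}$ is the standard error of the estimate and $SS_x$ is the sum of squared deviations of x from its mean. And so the confidence interval is

$$\hat{y} \pm t_{crit} \cdot s.e.$$

with ŷ evaluated at x = 0 (the numerical values are those in column M of Figure 2).

Since 0 is not in this interval, the null hypothesis that the y-intercept is zero is rejected.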

Page 10: Real Statistics Using Excel

Exponential Regression Model

Page 11: Real Statistics Using Excel

Sometimes linear regression can be used with relationships which are not inherently linear, but can be made to be linear after a transformation. In particular, we consider the following exponential model:

$$y = \alpha e^{\beta x}$$

Page 12: Real Statistics Using Excel

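$$y = \alpha e^{\beta x}$$

Taking the natural log of both sides gives the equivalent equation

$$\ln y = \ln \alpha + \beta x$$

which has the form of a linear regression model once an error term ε is added:

$$y' = \alpha' + \beta x + \varepsilon$$

where y′ = ln y and α′ = ln α.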

Page 13: Real Statistics Using Excel

Observation: Since $\alpha e^{\beta(x+1)} = \alpha e^{\beta x} \cdot e^{\beta}$, we note that an increase in x of 1 unit results in y being multiplied by $e^{\beta}$.

Observation: A model of the form ln y = βx + δ is referred to as a log-level regression model. Clearly any such model can be expressed as an exponential regression model of form $y = \alpha e^{\beta x}$ by setting $\alpha = e^{\delta}$.

Page 14: Real Statistics Using Excel

Example 1: Determine whether the data on the left side of Figure 1 fits with an exponential model.

Figure 1 – Data for Example 1 and log transform

Page 15: Real Statistics Using Excel

The table on the right side of Figure 1 shows ln y (the natural log of y) instead of y. We now use the Regression data analysis tool to model the relationship between ln y and x.

Figure 2 – Regression data analysis for x vs. ln y from Example 1

Page 16: Real Statistics Using Excel

The table in Figure 2 shows that the model is a good fit and the relationship between ln y and x is given by

$$\ln y = 0.016x + 2.64$$

Applying e to both sides of the equation yields

$$y = 14.05 \cdot (1.016)^x$$

Page 17: Real Statistics Using Excel

We can also see the relationship between x and y by creating a scatter chart for the original data and choosing Layout > Analysis|Trendline in Excel and then selecting the Exponential Trendline option. We can also create a chart showing the relationship between x and ln y and use Linear Trendline to show the linear regression line ln y = 0.016x + 2.64.

Page 18: Real Statistics Using Excel

As usual we can use the formula $y = 14.05 \cdot (1.016)^x$ described above for prediction. Thus if we want the y value corresponding to x = 26, using the above model we get $\hat{y} = 14.05 \cdot (1.016)^{26} = 21.35$ (the rounded coefficients shown here give about 21.23; the figure 21.35 presumably reflects the unrounded coefficients). We can get the same result using Excel's GROWTH function, as described below.

Excel Functions: Excel supplies two functions for exponential regression, namely GROWTH and LOGEST.

LOGEST is the exponential counterpart to the linear regression function LINEST described in Testing the Slope of the Regression Line. Once again you need to highlight a 5 × 2 area and enter the array function =LOGEST(R1, R2, TRUE, TRUE), where R1 is the array of observed values for y (not ln y) and R2 is the array of observed values for x, and then press Ctrl-Shift-Enter. LOGEST doesn't supply any labels, so you will need to enter these manually.

Page 19: Real Statistics Using Excel

Essentially LOGEST is simply LINEST using the mapping described above for transforming an exponential model into a linear model. For Example 1 the output for LOGEST(B6:B16, A6:A16, TRUE, TRUE) is as in Figure 4.

Page 20: Real Statistics Using Excel

GROWTH is the exponential counterpart to the linear regression function TREND described in Method of Least Squares. For R1 = the array containing the y values of the observed data and R2 = the array containing the x values of the observed data, GROWTH(R1, R2, x) = EXP(a) * EXP(b)^x where EXP(a) and EXP(b) are as defined from the LOGEST output described above (or alternatively from the Regression data analysis).

E.g., based on the data from Example 1, we have:

GROWTH(B6:B16, A6:A16, 26) = 21.35

which is the same result we obtained earlier using the Regression data analysis tool.

GROWTH can also be used to predict more than one value. In this case, GROWTH(R1, R2, R3) is an array function where R1 and R2 are as described above and R3 is an array of x values. The function returns an array of predicted y values for the x values in R3 based on the model determined by the values in R1 and R2.
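As a rough Python analogue of what LOGEST and GROWTH compute, the sketch below fits ln y on x by least squares and then back-transforms. The function names `logest_like` and `growth_like` and the data are assumptions made for illustration. The argument order follows Excel's (y values first), and only the two coefficients are returned, not the full 5 × 2 LOGEST table.

```python
import math

def logest_like(known_ys, known_xs):
    """Fit y = b * m**x by least squares on (x, ln y); return (m, b)."""
    n = len(known_xs)
    log_ys = [math.log(y) for y in known_ys]  # requires every y > 0
    x_bar = sum(known_xs) / n
    ly_bar = sum(log_ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in known_xs)
    beta = sum((x - x_bar) * (ly - ly_bar)
               for x, ly in zip(known_xs, log_ys)) / ss_x
    ln_alpha = ly_bar - beta * x_bar
    # Back-transform: m = e^beta is the growth factor, b = e^(ln alpha)
    return math.exp(beta), math.exp(ln_alpha)

def growth_like(known_ys, known_xs, new_xs):
    """Predict y for each x in new_xs from the fitted exponential model."""
    m, b = logest_like(known_ys, known_xs)
    return [b * m ** x for x in new_xs]

# Made-up data lying exactly on y = 3 * 1.5**x, so the fit should recover it
xs = [0, 1, 2, 3, 4]
ys = [3 * 1.5 ** x for x in xs]
m, b = logest_like(ys, xs)
predictions = growth_like(ys, xs, [5, 6])
```

Because the data here are exactly exponential, the fit returns m = 1.5 and b = 3 up to floating-point error. With noisy data the fit minimizes squared error in ln y, not in y.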

Page 21: Real Statistics Using Excel

Power Regression Model

Page 22: Real Statistics Using Excel

Another non-linear regression model is the power regression model, which is based on the following equation:

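$$y = \alpha x^{\beta}$$

Taking the natural log of both sides gives

$$\ln y = \ln \alpha + \beta \ln x$$

which has the form of a linear regression model once an error term ε is added:

$$y' = \alpha' + \beta x' + \varepsilon$$

where y′ = ln y, x′ = ln x and α′ = ln α.

Observation: A model of the form ln y = β ln x + δ is referred to as a log-log regression model. Since, if this equation holds, we have

$$y = e^{\beta \ln x + \delta} = e^{\delta} x^{\beta}$$

it follows that any such model can be expressed as a power regression model of form $y = \alpha x^{\beta}$ by setting $\alpha = e^{\delta}$.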

Page 23: Real Statistics Using Excel

Example 1: Determine whether the data on the left side of Figure 1 is a good fit for a power model.

Page 24: Real Statistics Using Excel

The table on the right side of Figure 1 shows y transformed into ln y and x transformed into ln x. We now use the Regression data analysis tool to model the relationship between ln y and ln x.

Page 25: Real Statistics Using Excel

Figure 2 shows that the model is a good fit and the relationship between ln x and ln y is given by

$$\ln y = 0.234 + 2.81 \ln x$$

Applying e to both sides of the equation yields

$$y = e^{0.234} \cdot x^{2.81} \approx 1.26 \cdot x^{2.81}$$

We can also see the relationship between x and y by creating a scatter chart for the original data and choosing Layout > Analysis|Trendline in Excel and then selecting the Power Trendline option (after choosing More Trendline Options). We can also create a chart showing the relationship between ln x and ln y and use Linear Trendline to show the linear regression line ln y = 0.234 + 2.81 ln x.

Page 26: Real Statistics Using Excel
Page 27: Real Statistics Using Excel

As usual we can use the formula $y = e^{0.234} \cdot x^{2.81}$ described above for prediction. For example, if we want the y value corresponding to x = 26, using the above model we get $\hat{y} = e^{0.234} \cdot 26^{2.81}$.

Excel doesn't provide functions like TREND/GROWTH (nor LINEST/LOGEST) for power/log-log regression, but we can use the TREND formula as follows:

=EXP(TREND(LN(B6:B16),LN(A6:A16),LN(26)))

to get the same result.

Observation: Thus the equivalent of the array formula GROWTH(R1, R2, R3) for log-log regression is =EXP(TREND(LN(R1), LN(R2), LN(R3))).

Observation: In the case where there is one independent variable x, there are four ways of making log transformations, namely

• level-level regression: y = βx + α

• log-level regression: ln y = βx + α

• level-log regression: y = β ln x + α

• log-log regression: ln y = β ln x + α

We dealt with the first of these in ordinary linear regression (no log transformation). The second is described in Exponential Regression and the fourth is power regression as described on this webpage. We haven't studied the level-log regression, but it too can be analyzed using techniques similar to those described here.
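The formula =EXP(TREND(LN(R1), LN(R2), LN(R3))) can be mirrored in Python roughly as below. This is only a sketch: `power_predict` is a hypothetical name, the data are made up to lie exactly on y = 2x³, and the argument order copies the Excel formula (y values first).

```python
import math

def power_predict(known_ys, known_xs, new_xs):
    """Fit ln y = ln(alpha) + beta * ln x, then predict y for each new x."""
    log_xs = [math.log(x) for x in known_xs]  # requires every x > 0
    log_ys = [math.log(y) for y in known_ys]  # requires every y > 0
    n = len(log_xs)
    lx_bar = sum(log_xs) / n
    ly_bar = sum(log_ys) / n
    ss_lx = sum((lx - lx_bar) ** 2 for lx in log_xs)
    beta = sum((lx - lx_bar) * (ly - ly_bar)
               for lx, ly in zip(log_xs, log_ys)) / ss_lx
    ln_alpha = ly_bar - beta * lx_bar
    # EXP(TREND(...)): evaluate the linear fit at ln x, then exponentiate
    return [math.exp(ln_alpha + beta * math.log(x)) for x in new_xs]

# Made-up data on y = 2 * x**3, so the prediction at x = 26 is 2 * 26**3
xs = [1, 2, 3, 4, 5]
ys = [2 * x ** 3 for x in xs]
predictions = power_predict(ys, xs, [26])
```

Both x and y must be strictly positive for the log-log transformation to be defined, which is the same restriction the Excel formula carries.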

Page 28: Real Statistics Using Excel

Linear regression models for comparing means

Page 29: Real Statistics Using Excel

In this section we show how to use dummy variables to model categorical variables using linear regression in a way that is similar to that employed in Dichotomous Variables and the t-test. In particular we show that hypothesis testing of the difference between means using the t-test (see Two Sample t Test with Equal Variances and Two Sample t Test with Unequal Variances) can be done by using linear regression.

Page 30: Real Statistics Using Excel

Example 1: Repeat the analysis of Example 1 of Two Sample t Test with Equal Variances (comparing means from populations with equal variance) using linear regression.

The leftmost table in Figure 1 contains the original data from Example 1 of Two Sample t Test with Equal Variances. We define the dummy variable x so that x = 0 when the data element is from the New group and x = 1 when the data element is from the Old group. The data can now be expressed with an independent variable x and a dependent variable y as described in the middle table in Figure 1.

Page 31: Real Statistics Using Excel

Running the Regression data analysis tool on x and y, we get the results on the right in Figure 1. We can now compare this with the results we obtained using the t-test data analysis tool, which we repeat here in Figure 2.

Page 32: Real Statistics Using Excel

We now make some observations regarding this comparison:

• F = 4.738 in the regression analysis is equal to the square of the t-stat (2.177) from the t-test, which is consistent with Property 1 of F Distribution.

• R Square = .208 in the regression analysis is equal to $t^2/(t^2 + df)$, where t is the t-stat from the t-test, which is consistent with the observation following Theorem 1 of One Sample Hypothesis Testing for Correlation.

• The p-value = .043 from the regression analysis (called Significance F) is the same as the p-value from the t-test (called P(T<=t) two-tail).
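The first two observations can be checked numerically with a short Python sketch. The function names and the two samples are assumptions made here (the samples are illustrative values, not read from Figure 1). The code computes the equal-variance t-stat directly and, separately, the F statistic and R Square from a regression on a 0/1 dummy variable.

```python
import math

def pooled_t_stat(group0, group1):
    """Two-sample t statistic assuming equal variances; returns (t, df)."""
    n0, n1 = len(group0), len(group1)
    m0, m1 = sum(group0) / n0, sum(group1) / n1
    ss0 = sum((v - m0) ** 2 for v in group0)
    ss1 = sum((v - m1) ** 2 for v in group1)
    df = n0 + n1 - 2
    pooled_var = (ss0 + ss1) / df
    t = (m0 - m1) / math.sqrt(pooled_var * (1 / n0 + 1 / n1))
    return t, df

def dummy_regression(group0, group1):
    """Regress y on a dummy x (0 for group0, 1 for group1); return (F, R^2)."""
    xs = [0] * len(group0) + [1] * len(group1)
    ys = list(group0) + list(group1)
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ss_x
    a = y_bar - b * x_bar
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    ss_reg = ss_tot - ss_res
    f_stat = ss_reg / (ss_res / (n - 2))  # one regressor, so MS_reg = SS_reg
    return f_stat, ss_reg / ss_tot

# Illustrative samples (assumed values)
new = [13, 17, 19, 11, 20, 15, 18, 9, 12, 16]
old = [12, 8, 6, 16, 12, 14, 10, 18, 4, 11]
t, df = pooled_t_stat(new, old)
f_stat, r_squared = dummy_regression(new, old)
```

For any pair of samples, `f_stat` should equal `t ** 2` and `r_squared` should equal `t ** 2 / (t ** 2 + df)` up to floating-point error. That identity is what the observations above state for the slide's data.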

Page 33: Real Statistics Using Excel

Effect Size

We can also see from the above discussion that the correlation coefficient r can be expressed as a function of the t-stat using the following formula:

$$r = \sqrt{\frac{t^2}{t^2 + df}}$$

The impact of this is that the effect size for the t-test can be expressed in terms of the correlation coefficient. The general guidelines are that r = .1 is viewed as a small effect, r = .3 as a medium effect and r = .5 as a large effect. For Example 1, r = 0.456, which is close to .5, and so is viewed as a large effect.

Note that this formula can also be used to measure the effect size for t-tests even when the population variances are unequal (see next example) and for the case of paired samples.
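The effect size quoted for Example 1 can be recomputed from the t-stat with a one-line function. The sketch below assumes df = 18, which is not stated on the slide but is consistent with R Square = .208 and t = 2.177. The name `effect_size_r` is hypothetical.

```python
import math

def effect_size_r(t_stat, df):
    """Effect size r = sqrt(t^2 / (t^2 + df)) for a t-test."""
    return math.sqrt(t_stat ** 2 / (t_stat ** 2 + df))

# t-stat from the slide; df = 18 is an assumption (see the note above)
r = effect_size_r(2.177, 18)
```

Squaring `r` gives back the R Square value of the regression, to within the rounding of the quoted t-stat.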

Page 34: Real Statistics Using Excel

Model coefficients

Also note that the coefficients in the regression model y = bx + a can be calculated directly from the original data as follows. First calculate the means of the data for each flavoring (new and old). The mean of the data in the new flavoring sample is 15 and the mean of the data in the old flavoring sample is 11.1. Since x = 0 for the new flavoring sample and x = 1 for the old flavoring sample, we have

$$\bar{y}_{new} = b \cdot 0 + a = a, \qquad \bar{y}_{old} = b \cdot 1 + a = a + b$$

This means that a = 15 and b = 11.1 – a = 11.1 – 15 = -3.9, and so the regression line is y = 15 – 3.9x, which agrees with the coefficients in Figure 1.
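A small Python check of this relationship is sketched below. The helper name `dummy_fit` and the two samples are assumptions: the values were chosen so that the group means are 15 and 11.1 as on the slide, but they are not read from Figure 1.

```python
from statistics import fmean

def dummy_fit(group0, group1):
    """Least-squares fit of y = b*x + a with x = 0 for group0, 1 for group1."""
    xs = [0] * len(group0) + [1] * len(group1)
    ys = list(group0) + list(group1)
    x_bar, y_bar = fmean(xs), fmean(ys)
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ss_x
    a = y_bar - b * x_bar
    return a, b

# Illustrative samples with means 15 (new) and 11.1 (old)
new = [13, 17, 19, 11, 20, 15, 18, 9, 12, 16]
old = [12, 8, 6, 16, 12, 14, 10, 18, 4, 11]
a, b = dummy_fit(new, old)
```

The fitted intercept equals the mean of the x = 0 group and the slope equals the difference between the two group means, so the line is y = 15 – 3.9x for these values.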

Page 35: Real Statistics Using Excel

Unequal variance

As was mentioned in the discussion following Figure 4 of Testing the Regression Line Slope, the Regression data analysis tool provides an optional Residuals Plot. The output for Example 1 is displayed in Figure 3.

From the chart we see how the residual values corresponding to x = 0 and x = 1 are distributed about the mean of zero. The spread about x = 1 is a bit larger than for x = 0, but the difference is quite small. This suggests that the variances for the New and Old samples are roughly equal.

Page 36: Real Statistics Using Excel

Example 2: Repeat the analysis of Example 2 of Two Sample t Test with Unequal Variances (comparing means from populations with unequal variance) using linear regression.

Page 37: Real Statistics Using Excel

We note that the regression analysis displayed in Figure 4 agrees with the t-test analysis assuming equal variances (the table on the left of Figure 5).

Page 38: Real Statistics Using Excel

Unfortunately, since the variances are quite unequal, the correct results are given by the table on the right in Figure 5. This highlights the importance of the requirement that the variances of the y values for each x be equal for the results of the regression analysis to be useful.

Also note that the plot of the Residuals for the regression analysis clearly shows that the variances are unequal (see Figure 6).
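To illustrate why the two tables in Figure 5 can disagree, the sketch below computes the equal-variance t-stat (the one that regression on a dummy variable reproduces) and the unequal-variance t-stat, assuming the latter is the usual Welch test with Welch–Satterthwaite degrees of freedom. The function names and data are made up for illustration. The sample sizes are deliberately unequal, because with equal sizes the two t-stats coincide and only the degrees of freedom differ.

```python
import math
from statistics import mean, variance

def pooled_t_test(a, b):
    """Equal-variance two-sample t-test; returns (t, df)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df

def welch_t_test(a, b):
    """Unequal-variance (Welch) t-test; returns (t, df)."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Made-up samples: small spread in the first, large spread in the second
tight = [10, 11, 9, 10, 12, 10]
wide = [2, 25, 14, 30, 5, 20, 8, 28, 12]
t_pooled, df_pooled = pooled_t_test(tight, wide)
t_welch, df_welch = welch_t_test(tight, wide)
```

The Welch degrees of freedom are never larger than the pooled n1 + n2 – 2, so the unequal-variance test is the more conservative of the two when the spreads differ.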

Page 39: Real Statistics Using Excel

Thanks for your attention