8/14/2019 Lec 3 Sampling
Copyright 2004 by The McGraw-Hill Companies, Inc. All rights reserved.
Sampling Methods & Central Limit Theorem
We use sample information to make decisions or inferences about the population.
Two KEY steps:
1. Choice of a proper method for selecting sample data
2. Proper analysis of the sample data (more later)
KEY 1.
If a proper method for selecting the sample is NOT used, the SAMPLE will not be truly representative of the TOTAL population, and wrong conclusions can be drawn!
Why Sample the Population?
Because
* of the physical impossibility of checking all items in the population, and, also, it would be too time-consuming
* studying all the items in a population would NOT be cost effective
* the sample results are usually sufficient
* of the destructive nature of certain tests
Sampling Techniques
Probability Sampling: Each data unit in the population has a known likelihood of being included in the sample.
* with Replacement: Each data unit in the population is allowed to appear in the sample more than once.
* without Replacement: Each data unit in the population is allowed to appear in the sample no more than once.
Non-Probability Sampling: Does not involve random selection; inclusion of an item is based on convenience.
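The difference between sampling with and without replacement can be sketched in a few lines of Python. The five-item population and the fixed seed below are assumptions made only for illustration; they are not part of the lecture.

```python
import random

# Hypothetical population of five data units (labels are illustrative only).
population = ["A", "B", "C", "D", "E"]

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# With replacement: a unit may appear in the sample more than once.
with_replacement = rng.choices(population, k=3)

# Without replacement: a unit may appear in the sample no more than once.
without_replacement = rng.sample(population, k=3)

print(with_replacement)
print(without_replacement)
```

`random.choices` draws each item independently, so repeats are possible, while `random.sample` never returns the same unit twice.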
Methods
Simple Random: each item (person) in the population has an equal chance of being included.
Systematic Random: items (people) of the population are arranged in some order. A random starting point is selected, and then every kth member of the population is selected for the sample.
Stratified Random: a population is first divided into subgroups, called strata, and a sample is selected from each stratum.
Cluster: a population is first divided into primary units, and samples are selected from each unit.
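The four methods above can be sketched roughly as follows. The population, the `dept` and `office` fields, and all function names are hypothetical. The cluster function also assumes a common two-stage variant in which only some primary units are chosen before sampling within them; setting `n_clusters` to the number of units samples from every unit, as the slide describes.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 20 people, each tagged with a department (stratum)
# and an office (primary unit). All fields are made up for illustration.
population = [
    {"id": i, "dept": ["sales", "legal"][i % 2], "office": i // 5}
    for i in range(20)
]

def simple_random(pop, n):
    # Every item has an equal chance of being included.
    return rng.sample(pop, n)

def systematic_random(pop, k):
    # Random starting point among the first k items, then every kth item.
    start = rng.randrange(k)
    return pop[start::k]

def stratified_random(pop, key, n_per_stratum):
    # Divide the population into strata, then sample from each stratum.
    strata = {}
    for item in pop:
        strata.setdefault(item[key], []).append(item)
    return [x for group in strata.values() for x in rng.sample(group, n_per_stratum)]

def cluster(pop, key, n_clusters, n_per_cluster):
    # Divide the population into primary units, choose some units at random,
    # then sample within each chosen unit (assumed two-stage variant).
    units = {}
    for item in pop:
        units.setdefault(item[key], []).append(item)
    chosen = rng.sample(sorted(units), n_clusters)
    return [x for u in chosen for x in rng.sample(units[u], n_per_cluster)]

print(len(simple_random(population, 5)))
print([p["id"] for p in systematic_random(population, 4)])
```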
Example
The law firm of Hoya and Associates has five partners. At their weekly partners meeting each reported the number of hours they billed their clients last week:
Partner      Hours
Dunn         22
Hardy        26
Kiers        30
Malinowski   26
Tillman      22
If two partners are selected randomly, how many different samples are possible?
5 objects taken 2 at a time: using 5C2, for a total of 10 samples!
With 5 objects taken 2 at a time:
$_5C_2 = \frac{5!}{2!\,(5-2)!} = 10$ samples
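As a quick check of the count, here is a minimal sketch using Python's standard library. The variable names are illustrative only.

```python
import math
from itertools import combinations

partners = ["Dunn", "Hardy", "Kiers", "Malinowski", "Tillman"]

# Number of ways to choose 2 of 5 objects: 5! / (2! * (5 - 2)!)
n_samples = math.comb(5, 2)

# Enumerating the pairs directly should give the same count.
all_pairs = list(combinations(partners, 2))

print(n_samples, len(all_pairs))
```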
Partners   Samples of 2   Mean
1&2        (22+26)/2 =    24
1&3        (22+30)/2 =    26
1&4        (22+26)/2 =    24
1&5        (22+22)/2 =    22
2&3        (26+30)/2 =    28
2&4        (26+26)/2 =    26
2&5        (26+22)/2 =    24
3&4        (30+26)/2 =    28
3&5        (30+22)/2 =    26
4&5        (26+22)/2 =    24
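The table of sample means can be reproduced with a short sketch. It assumes the partners are numbered 1 to 5 in the order listed on the earlier slide.

```python
from itertools import combinations

# Hours billed last week, in the order the partners are listed.
hours = {"Dunn": 22, "Hardy": 26, "Kiers": 30, "Malinowski": 26, "Tillman": 22}

# Every possible sample of 2 partners, with the mean hours for that sample.
sample_means = [
    (a, b, (hours[a] + hours[b]) / 2)
    for a, b in combinations(hours, 2)
]

for a, b, m in sample_means:
    print(f"{a} & {b}: {m}")
```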
Terminology
Sampling error is the difference between a sample statistic and its corresponding population parameter.
Sampling distribution of the sample mean is a probability distribution consisting of all possible sample means of a given sample size selected from a population.
Example continued
Organize the sample means into a Sampling Distribution.
Sample means: 24, 26, 24, 22, 28, 26, 24, 28, 26, 24
Sample Mean   Frequency   Relative frequency (Probability)
22            1           1/10
24            4           4/10
26            3           3/10
28            2           2/10
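One way to tabulate the sampling distribution is to count how often each sample mean occurs. This is a sketch; the variable names are illustrative, and `Fraction` reduces 4/10 to 2/5 when printed.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

hours = [22, 26, 30, 26, 22]

# Mean of every possible sample of size 2.
means = [(a + b) / 2 for a, b in combinations(hours, 2)]

# Frequency of each distinct sample mean.
freq = Counter(means)

# Relative frequency (probability) of each sample mean, as an exact fraction.
dist = {m: Fraction(c, len(means)) for m, c in sorted(freq.items())}

for m, p in dist.items():
    print(m, freq[m], p)
```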
Example continued
Compute the mean of the sample means. Compare it with the population mean.
$\mu_{\bar{X}} = \frac{22(1) + 24(4) + 26(3) + 28(2)}{10} = 25.2$
Example continued
$\mu = \frac{22 + 26 + 30 + 26 + 22}{5} = 25.2$
Note: The population mean is the same as the mean of the sample means, 25.2 hours!
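The equality of the two means can be checked numerically. This is a minimal sketch with illustrative variable names.

```python
from itertools import combinations
from statistics import mean

hours = [22, 26, 30, 26, 22]

# Population mean: (22 + 26 + 30 + 26 + 22) / 5
population_mean = mean(hours)

# Mean of the means of all 10 possible samples of size 2.
sample_means = [mean(pair) for pair in combinations(hours, 2)]
mean_of_sample_means = mean(sample_means)

print(population_mean, mean_of_sample_means)
```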
Central Limit Theorem
The sampling distribution of the means of all possible samples of size n generated from the population will be approximately normally distributed, provided n is sufficiently large!
Sampling Distributions:
Mean: $\mu_{\bar{X}} = \mu$
Variance: $\sigma^2/n$
Standard Deviation (standard error of the mean): $\sigma/\sqrt{n}$
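A small simulation can illustrate these formulas. Everything in this sketch is an assumption chosen for illustration and does not come from the lecture: the exponential population (mean 1, standard deviation 1, right-skewed), the sample size of 30, the 5000 repetitions, and the seed.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable
n = 30        # sample size (illustrative)
reps = 5000   # number of simulated samples (illustrative)

# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1.
mu, sigma = 1.0, 1.0

# Draw many samples of size n (with replacement) and record each sample mean.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

observed_mean = statistics.fmean(sample_means)   # should be close to mu
observed_se = statistics.stdev(sample_means)     # should be close to sigma / sqrt(n)
expected_se = sigma / n ** 0.5

print(observed_mean, observed_se, expected_se)
```

The simulated samples are independent draws, which is the setting in which $\sigma/\sqrt{n}$ applies directly.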
Point Estimates
A point estimate is one value (a single point) that is used to estimate a population parameter:
* sample mean
* sample standard deviation
* sample variance
* sample proportion
Point Estimates
Population follows the normal distribution:
The sampling distribution of the sample means also follows the normal distribution.
For the probability of a sample mean falling within a particular region, use: $z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
Population does NOT follow the normal distribution:
If the sample is of at least 30 observations, the sampling distribution of the sample means WILL approximately follow the normal distribution.
For the probability of a sample mean falling within a particular region, use: $z = \frac{\bar{X} - \mu}{s/\sqrt{n}}$
Central Limit Theorem
[Chart 8-6: Results for Several Populations]
Using the Sampling Distribution of the Sample Mean
Suppose it takes an average of 330 minutes for taxpayers to prepare, copy, and mail an income tax return form. A consumer watchdog agency selects a random sample of 40 taxpayers and finds the standard deviation of the time needed is 80 minutes.
What is the standard error of the mean?
Formula: $s/\sqrt{n} = 80/\sqrt{40} = 12.6$
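The standard error calculation can be reproduced in Python. This is a sketch; the variable names are illustrative.

```python
import math

s = 80   # sample standard deviation, in minutes
n = 40   # sample size

# Standard error of the mean: s / sqrt(n)
standard_error = s / math.sqrt(n)

print(round(standard_error, 1))
```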
What is the likelihood the sample mean is greater than 320 minutes?
Data:
* average of 330 minutes
* random sample of 40
* standard deviation is 80 minutes
Formula: $z = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{320 - 330}{80/\sqrt{40}} = -0.79$
Look up 0.79 in the table: a1 = 0.2852
Required area = 0.2852 + 0.5 = 0.7852
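Instead of a table lookup, the same probability can be computed with the standard normal distribution in Python's `statistics` module. This is a sketch; the result differs slightly from the table value because z is not rounded to two decimals.

```python
import math
from statistics import NormalDist

mu, s, n = 330, 80, 40   # population mean, sample standard deviation, sample size
x_bar = 320              # threshold for the sample mean

se = s / math.sqrt(n)    # standard error of the mean
z = (x_bar - mu) / se    # z-score of the threshold

# P(sample mean > 320) = 1 - Phi(z)
p_greater = 1 - NormalDist().cdf(z)

print(round(z, 2), round(p_greater, 4))
```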
Sampling Distribution of Proportion
The normal distribution (a continuous distribution) yields a good approximation of the binomial distribution (a discrete distribution) for large values of n.
Use when np and n(1-p) are both greater than 5!
Mean and Variance of a Binomial Probability Distribution
Formula: $\mu = np$
Formula: $\sigma^2 = np(1-p)$
Sampling Distribution of Proportion
A multinational company claims that 55% of its employees are bilingual. To verify this claim, a statistician selected a sample of 60 employees of the company using simple random sampling and found 48% to be bilingual.
np = 60(.55) = 33
n(1-p) = 60(.45) = 27
The sample size is big enough to use the normal approximation with a mean of .55 and a standard deviation of $\sqrt{(.55)(.45)/60} = 0.064$.
Based on this information, what can we say about the company's claim?
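The two checks and the standard deviation of the sample proportion can be sketched as follows. The variable names are illustrative.

```python
import math

p = 0.55   # claimed proportion of bilingual employees
n = 60     # sample size

# Rule of thumb from the earlier slide: np and n(1-p) both greater than 5.
expected_successes = n * p
expected_failures = n * (1 - p)
normal_ok = expected_successes > 5 and expected_failures > 5

# Standard deviation of the sample proportion: sqrt(p(1-p)/n)
se = math.sqrt(p * (1 - p) / n)

print(normal_ok, round(se, 3))
```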
Sampling Distribution of Proportion (continued)
Formula: $z = \frac{\bar{X} - \mu}{s} = \frac{0.48 - 0.55}{0.064} = -1.09$
Look up 1.09 in the table: a1 = 0.3621
Required area = 0.5 - 0.3621 = 0.1379, or 14%
Conclusion
If the company's claim is true, there is approximately a 14% chance of observing a sample proportion of 48% or lower in a sample of this size.
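The 14% figure can be reproduced without a table. This sketch assumes the normal approximation described earlier and uses illustrative variable names.

```python
import math
from statistics import NormalDist

p_claim = 0.55   # company's claimed proportion
p_hat = 0.48     # observed sample proportion
n = 60           # sample size

se = math.sqrt(p_claim * (1 - p_claim) / n)   # standard deviation of the sample proportion
z = (p_hat - p_claim) / se

# Probability of a sample proportion of 0.48 or lower if the claim is true.
p_lower = NormalDist().cdf(z)

print(round(z, 2), round(p_lower, 4))
```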
Sampling Distribution of Mean
Suppose the mean selling price of a litre of gasoline in Canada is $.659. Further, assume the distribution is positively skewed, with a standard deviation of $0.08. What is the probability of selecting a sample of 35 gasoline stations and finding the sample mean within $.03 of the population mean?
Data: mean selling price is $.659, SD of $0.08, sample of 35 gasoline stations. Probability of sample mean within $.03?
Find the z-scores for .659 +/- .03, i.e. .629 and .689:
$z_1 = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{0.629 - 0.659}{0.08/\sqrt{35}} = -2.22$
$z_2 = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{0.689 - 0.659}{0.08/\sqrt{35}} = 2.22$
Find areas from table:
z = -2.22: a1 = .4868
z = 2.22: a2 = .4868
Required A = a1 + a2 = .9736
We would expect about 97% of the sample means to be within $0.03 of the population mean.
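The gasoline example can be checked the same way. The sketch relies on the sample size of 35 being large enough for the normal approximation to hold despite the skewed population, as the lecture assumes. The variable names are illustrative.

```python
import math
from statistics import NormalDist

mu = 0.659      # population mean price per litre, in dollars
sigma = 0.08    # population standard deviation, in dollars
n = 35          # number of gasoline stations sampled
margin = 0.03   # "within $.03 of the population mean"

se = sigma / math.sqrt(n)   # standard error of the mean
z = margin / se             # z-score of the upper limit; the lower limit is -z

# Approximate sampling distribution of the sample mean.
sampling_dist = NormalDist(mu, se)
prob_within = sampling_dist.cdf(mu + margin) - sampling_dist.cdf(mu - margin)

print(round(z, 2), round(prob_within, 4))
```

The result is slightly below the table-based .9736 because z is not rounded to 2.22 before computing the area.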
This completes Chapter 8