2007 Department of Accounting Information, Statistics (I) Lecture Slides: Data Collection and Sampling, Chapter 5


Posted on 22-Dec-2015


Page 1

Data Collection and Sampling

Chapter 5

Page 2

Recall
• Statistics is a tool for converting data into information:

Data → Statistics → Information

• But…
– Where, then, does the data come from?
– How is it gathered?
– How do we ensure it is accurate (正確)? Is the data reliable (可靠)?
– Is it representative (代表性) of the population from which it was drawn?

• This chapter explores some of these issues.

Page 3

5.1 Methods of Collecting Data
• The reliability and accuracy of the data affect the validity of the results of a statistical analysis.
• The reliability and accuracy of the data depend on the method of collection.
• Four of the most popular sources of statistical data are:
– Published data (公開資料)
– Observational studies (觀察)
– Experimental studies (實驗)
– Surveys (調查)

Page 4

Published Data
– This is often a preferred source of data due to its low cost and convenience.
– Published data is found as printed material, tapes, disks, and on the Internet.
– Data published by the organization that collected it is called PRIMARY DATA (初級資料).
For example: data published by the US Bureau of the Census.
– Data published by an organization different from the one that collected it is called SECONDARY DATA (次級資料).
For example:
• The Statistical Abstract of the United States compiles data from primary sources.
• Compustat sells a variety of financial data tapes compiled from primary sources.

Page 5

Observational and Experimental Studies
• When published data is unavailable, one needs to conduct a study to generate the data.
– An observational study is one in which measurements representing a variable of interest are observed and recorded without controlling any factor that might influence their values.
– An experimental study is one in which measurements representing a variable of interest are observed and recorded while controlling factors (控制變數) that might influence their values.

Page 6

Surveys
• Surveys solicit information from people, e.g. pre-election polls and marketing surveys.
• The response rate (回收率), i.e. the proportion of all people selected who complete the survey, is a key survey parameter.
• Surveys can be conducted by means of
– personal interview (個人訪談)
– telephone interview (電話訪談)
– self-administered questionnaire (自我管理調查)
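The following is a minimal sketch of the response-rate calculation in Python. The function name `response_rate` and the counts (1,000 people selected, 620 completed surveys) are hypothetical and serve only to illustrate the proportion.

```python
def response_rate(completed: int, selected: int) -> float:
    """Proportion of all people selected for the survey who complete it."""
    if selected <= 0:
        raise ValueError("selected must be positive")
    if not 0 <= completed <= selected:
        raise ValueError("completed must be between 0 and selected")
    return completed / selected


# Hypothetical survey: 1,000 people selected, 620 completed the questionnaire.
rate = response_rate(completed=620, selected=1000)
print(f"Response rate: {rate:.1%}")  # Response rate: 62.0%
```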

Page 7

Questionnaire Design (問卷設計)
Key design principles of a good questionnaire:
1. Keep the questionnaire as short as possible.
2. Ask short, simple, and clearly worded questions.
3. Start with demographic questions to help respondents get started comfortably.
4. Use dichotomous (yes/no) and multiple-choice questions.
5. Use open-ended questions cautiously.
6. Avoid using leading questions.
7. Pretest a questionnaire on a small number of people.
8. Think about the way you intend to use the collected data when preparing the questionnaire.

Page 8

5.2 Sampling (抽樣)
• Recall that statistical inference permits us to draw conclusions about a population based on a sample.
• Motivations for conducting a sampling procedure:
– Costs (e.g. it's less expensive to sample 1,000 television viewers than 100 million TV viewers).
– Population size.
– The possibly destructive nature (破壞性) of the sampling process (e.g. performing a crash test on every automobile produced is impractical).
• The sampled population (抽樣母體) and the target population (目標母體) should be similar to one another.

Page 9

5.3 Sampling Plans
• A sampling plan is just a method or procedure for specifying how a sample will be taken from a population.
• We will focus our attention on these three methods:
– Simple random sampling (簡單隨機抽樣)
– Stratified random sampling (分層隨機抽樣)
– Cluster sampling (集群抽樣)

Page 10

Simple Random Sampling
• In simple random sampling, every possible sample of a given size is equally likely to be chosen.
• To conduct simple random sampling:
– assign a number to each element of the chosen population (or use already given numbers),
– randomly select the sample numbers (members), using a random numbers table or a software package.
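Here the two steps are sketched in Python, assuming the population members are already numbered 1 to 1000. Python's `random.sample` draws distinct members so that every subset of the requested size is equally likely. It stands in for the random numbers table or software package. The helper name `simple_random_sample` and the seed value are illustrative choices, not part of the original material.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct members so that every subset of size n is equally likely."""
    rng = random.Random(seed)  # seeded generator makes the draw reproducible
    return rng.sample(population, n)


# Step 1: number each member of the population (here, members 1..1000).
population = list(range(1, 1001))

# Step 2: let the software select the sample numbers.
sample = simple_random_sample(population, 40, seed=2007)
print(sorted(sample))
```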

Page 11

Simple Random Sampling
• Example 5.1
– A government income-tax auditor is responsible for 1,000 tax returns.
– The auditor will randomly select 40 returns to audit.
– Use Excel's random number generator to select the returns.
• Solution
– We generate 50 numbers between 1 and 1000. We need only 40 numbers, but the extras can be used if duplicate numbers are generated.
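The following is a sketch of the Example 5.1 procedure in Python. It uses Python's `random` module in place of Excel's random number generator, which is an assumption. The code generates 50 uniform numbers, scales them by 1000, rounds up to whole file numbers, and skips duplicates until 40 distinct returns are chosen. The function name `excel_style_sample` is hypothetical.

```python
import math
import random


def excel_style_sample(n_population=1000, n_needed=40, n_generated=50, seed=None):
    """Pick n_needed distinct file numbers in 1..n_population from n_generated uniform draws."""
    rng = random.Random(seed)
    # random() returns values in [0, 1); 1 - random() lies in (0, 1],
    # so rounding up can never produce file number 0.
    uniforms = [1.0 - rng.random() for _ in range(n_generated)]
    # Scale to (0, n_population] and round up to whole file numbers.
    file_numbers = [math.ceil(u * n_population) for u in uniforms]

    selected = []
    for number in file_numbers:
        if number not in selected:  # skip duplicates: this is why extras are generated
            selected.append(number)
        if len(selected) == n_needed:
            return selected
    raise RuntimeError("not enough distinct numbers; generate more")


print(excel_style_sample(seed=2007))
```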

Page 12

Simple Random Sampling
• Example 5.1: A government income-tax auditor must choose a sample of 40 of 1,000 returns to audit.
• Extra numbers may be used if duplicate random numbers are generated.

Page 13

Simple Random Sampling

Excel output for Example 5.1 (first rows of the 50 generated):

Uniform (0 to 1)   ×1000       Rounded up
0.3820002          382.00018   383
0.1006806          100.68056   101
0.5964843          596.48427   597
0.8991058          899.10581   900
0.8846095          884.60952   885
0.9584643          958.46431   959
0.0144963          14.496292   15
0.4074221          407.4221    408
0.8632466          863.24656   864
0.1385846          138.58455   139
0.2450331          245.03311   246
...                ...         ...

• First column: 50 numbers uniformly distributed between 0 and 1.
• Second column: the same numbers multiplied by 1000, i.e. 50 random numbers between 0 and 1000.
• Third column: rounded up, giving 50 random, uniformly distributed whole numbers between 1 and 1000; each whole number has a probability of 1/1000 of being selected.

The auditor should select the 40 files numbered 383, 101, ...
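To check the conversion in the Excel output, the 11 uniform values printed on this slide can be multiplied by 1000 and rounded up in Python. The result reproduces the file numbers the auditor selects. The example uses only the values shown on the slide. They are rounded to seven decimals, but none of them lands near a whole number after scaling, so rounding up is unaffected.

```python
import math

# The first 11 uniform (0 to 1) values from the Excel output.
uniforms = [0.3820002, 0.1006806, 0.5964843, 0.8991058, 0.8846095, 0.9584643,
            0.0144963, 0.4074221, 0.8632466, 0.1385846, 0.2450331]

# Multiply by 1000, then round up to a whole file number between 1 and 1000.
file_numbers = [math.ceil(u * 1000) for u in uniforms]
print(file_numbers)  # [383, 101, 597, 900, 885, 959, 15, 408, 864, 139, 246]
```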

Page 14

Stratified Random Sampling
• This sampling procedure separates the population into mutually exclusive sets (strata) (互斥的層別) and then draws simple random samples from each stratum.
• Example criteria for forming strata:
– Sex: male, female
– Age: under 20, 20-30, 31-40, 41-50
– Occupation: professional, clerical, blue-collar

Page 15

Stratified Random Sampling
• With this procedure we can acquire information about
– the whole population,
– each stratum, and
– the relationships among strata.

Page 16

Stratified Random Sampling
• After the population has been stratified, we can use simple random sampling to generate the complete sample. For example, we can make each stratum's share of the sample equal to its proportion in the population.
– If we only have sufficient resources to sample 400 people in total, we would draw 100 of them from the low-income group…
– If we are sampling 1,000 people, we would draw 50 of them from the high-income group.
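Proportional allocation followed by a simple random sample within each stratum can be sketched in Python. The slide's figures imply that the low-income group is 25% of the population (100 of 400) and the high-income group is 5% (50 of 1,000). The following are hypothetical placeholders, not data from the slides:

- the two middle strata and their shares (40% and 30%),
- the stratum names,
- the 10,000-person population.

```python
import random

# Hypothetical population shares. "low" (25%) and "high" (5%) follow from the
# slide's figures; the two middle strata are made-up placeholders.
SHARES = {"low": 0.25, "lower-middle": 0.40, "upper-middle": 0.30, "high": 0.05}


def proportional_allocation(shares, total_n):
    """Per-stratum sample sizes that keep each stratum's population proportion."""
    return {name: round(share * total_n) for name, share in shares.items()}


def stratified_sample(strata, sizes, seed=None):
    """Draw a simple random sample of the given size from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, sizes[name]) for name, members in strata.items()}


# Hypothetical population of 10,000 people, IDs split into strata by SHARES.
strata, start = {}, 0
for name, share in SHARES.items():
    size = round(share * 10_000)
    strata[name] = list(range(start, start + size))
    start += size

sizes = proportional_allocation(SHARES, 400)
print(sizes)  # {'low': 100, 'lower-middle': 160, 'upper-middle': 120, 'high': 20}
sample = stratified_sample(strata, sizes, seed=2007)
```

This sketch rounds each stratum's size separately. With other shares, the rounded sizes may not add up exactly to the target total, and a real allocation would have to reconcile that.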

Page 17

Cluster Sampling
• Cluster sampling is a simple random sample of groups, or clusters, of elements.
• This procedure is useful when
– it is difficult and costly to develop a complete list of the population members (making it difficult to develop a simple random sampling procedure), or
– the population members are widely dispersed geographically.
• Cluster sampling may increase sampling error (抽樣誤差) because of probable similarities among cluster members.
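Here is a sketch of cluster sampling in Python, assuming a hypothetical population of 20 city blocks with 5 households each. The code draws a simple random sample of blocks, and every household in a chosen block enters the sample. In practice, only the chosen blocks would need their households listed. The function name `cluster_sample` is illustrative.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Select n_clusters clusters by simple random sampling and keep every member of each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}


# Hypothetical population: 20 city blocks with 5 households each.
clusters = {
    f"block-{b:02d}": [f"block-{b:02d}/house-{h}" for h in range(1, 6)]
    for b in range(1, 21)
}

sample = cluster_sample(clusters, n_clusters=4, seed=5)
households = [h for members in sample.values() for h in members]
print(sorted(sample))   # the four randomly chosen blocks
print(len(households))  # 20
```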

Page 18

Sample Size (樣本數)
• Numerical techniques for determining sample sizes will be described later, but suffice it to say that the larger the sample size is, the more accurate we can expect the sample estimates to be.

Page 19

5.4 Sampling and Non-Sampling Errors

• Two major types of error can arise when a sample of observations is taken from a population:

• Sampling error (抽樣誤差) refers to differences between the sample and the population that exist only because of the observations that happened to be selected for the sample.

• Nonsampling errors (非抽樣誤差) are more serious and are due to mistakes made in the acquisition of data or due to the sample observations being selected improperly.

Page 20

Sampling Error
• Sampling error refers to differences between the sample and the population that exist only because of the observations that happened to be selected for the sample.
• Another way to look at this: the differences in results for different samples (of the same size) are due to sampling error.
– E.g., take two samples of size 10 from 1,000 households. If we happened to get the highest income levels in our first sample and all the lowest income levels in the second, the difference between the two sample means would be due to sampling error.
• Increasing the sample size will reduce this type of error.
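The household example can be simulated in Python. The 1,000 incomes are made-up lognormal values, which is an assumption for illustration only. Two samples of size 10 miss the population mean by different amounts. Averaging the absolute error over many samples shows it shrinking as the sample size grows.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: annual incomes (in dollars) of 1,000 households,
# made-up values drawn from a lognormal distribution.
population = [rng.lognormvariate(10.5, 0.5) for _ in range(1000)]
mu = statistics.fmean(population)  # population mean


def average_abs_error(n, trials=2000):
    """Average |sample mean - population mean| over many simple random samples of size n."""
    errors = [abs(statistics.fmean(rng.sample(population, n)) - mu) for _ in range(trials)]
    return statistics.fmean(errors)


# Two samples of size 10 give different means only because different households were drawn.
first = rng.sample(population, 10)
second = rng.sample(population, 10)
print("sample 1 error:", round(statistics.fmean(first) - mu))
print("sample 2 error:", round(statistics.fmean(second) - mu))

# The typical size of the sampling error shrinks as the sample size grows.
for n in (10, 40, 160):
    print(f"n={n:>3}: average |error| = {average_abs_error(n):,.0f}")
```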

Page 21

Sampling Errors
[Figure: population income distribution with the population mean μ and the sample mean x̄ marked. The gap between them is the sampling error: the sample mean falls where it does only because certain randomly selected observations were included in the sample.]

Page 22

Nonsampling Error
• Nonsampling errors are more serious and are due to mistakes made in the acquisition of data or to the sample observations being selected improperly.
• Three types of nonsampling errors:
– Errors in data acquisition
– Nonresponse errors (無反應偏差)
– Selection bias (取樣誤差)
• Note: increasing the sample size will not reduce this type of error.

Page 23

Errors in Data Acquisition
• …arise from the recording of incorrect responses, due to:
– incorrect measurements being taken because of faulty equipment,
– mistakes made during transcription from primary sources,
– inaccurate recording of data due to misinterpretation of terms, or
– inaccurate responses to questions concerning sensitive issues.

Page 24

Data Acquisition Error
[Figure: an observation from the population is wrongly recorded in the sample, so the sample mean reflects sampling error + data acquisition error.]
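Here is a small worked example in Python with hypothetical numbers. A single transcription mistake (52 entered as 520) shifts the sample mean by (520 - 52) / 5 = 93.6. The shift comes from the recording error, not from which households happened to be sampled.

```python
# Hypothetical sample of five household incomes (in thousands of dollars).
true_values = [52, 61, 47, 58, 55]

# Transcription mistake during data entry: the first value, 52, is typed as 520.
recorded = [520, 61, 47, 58, 55]

true_mean = sum(true_values) / len(true_values)
recorded_mean = sum(recorded) / len(recorded)

print(f"mean of the correct values:  {true_mean:.1f}")      # 54.6
print(f"mean of the recorded values: {recorded_mean:.1f}")  # 148.2
```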

Page 25

Nonresponse Error
• …refers to error (or bias) introduced when responses are not obtained from some members of the sample, i.e. the sample observations that are collected may not be representative of the target population.
• As mentioned earlier, the response rate (i.e. the proportion of all people selected who complete the survey) is a key survey parameter and helps in understanding the validity of the survey and the sources of nonresponse error.

Page 26

Nonresponse Error
[Figure: population and sample. No response from part of the sample may lead to biased results.]
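The following is a simulation sketch of nonresponse bias in Python. The population of incomes and the response probabilities are hypothetical assumptions: households above the median income respond with probability 0.3, and all others with probability 0.8. Because higher-income households respond less often, the respondents' mean stays well below the population mean. Larger samples do not close the gap.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical population: 100,000 household incomes (in thousands), made-up lognormal values.
population = [rng.lognormvariate(4.0, 0.5) for _ in range(100_000)]
mu = statistics.fmean(population)
cutoff = statistics.median(population)


def respondents_mean(n):
    """Mean income among the members of a simple random sample of size n who respond.

    Hypothetical response behaviour: households above the median income respond
    with probability 0.3, all others with probability 0.8.
    """
    sample = rng.sample(population, n)
    respondents = [x for x in sample if rng.random() < (0.3 if x > cutoff else 0.8)]
    return statistics.fmean(respondents)


for n in (100, 1_000, 10_000):
    print(f"n={n:>6}: respondents' mean - population mean = {respondents_mean(n) - mu:.1f}")
```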

Page 27

Selection Bias

• …occurs when the sampling plan is such that some members of the target population cannot possibly be selected for inclusion in the sample.

Page 28

Selection Bias
[Figure: population and sample. When parts of the population cannot be selected, the sample cannot represent the whole population.]
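Here is a sketch of selection bias in Python, using a hypothetical population of 10,000 households in which 2,000 have no listed phone number. The sampling plan draws only from the phone directory. The incomes (means of 60 and 35, in thousands) are made-up. Because the unlisted households can never be selected, the sample mean estimates the directory's mean (around 60) rather than the population mean (around 55), however large the sample.

```python
import random
import statistics

rng = random.Random(3)

# Hypothetical population of 10,000 household incomes (in thousands of dollars).
listed = [rng.gauss(60, 10) for _ in range(8_000)]    # listed in the phone directory
unlisted = [rng.gauss(35, 10) for _ in range(2_000)]  # not listed anywhere
population = listed + unlisted

# The sampling plan draws only from the directory, so unlisted households
# have zero chance of being selected.
frame = listed
sample = rng.sample(frame, 2_000)

print(f"population mean: {statistics.fmean(population):.1f}")
print(f"sample mean:     {statistics.fmean(sample):.1f}")
```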