© 2008 McGraw-Hill Higher Education The Statistical Imagination Chapter 2. Organizing Data to Minimize Statistical Error

© 2008 McGraw-Hill Higher Education

The Statistical Imagination

• Chapter 2. Organizing Data to Minimize Statistical Error

Statistical Error

• Known degrees of imprecision in the procedures used to gather and process information

• Two main sources of statistical error: (1) sampling error and (2) measurement error

Sampling Error

• Sampling error – inaccuracy in predictions about a population that results from the fact that we do not observe every subject in the population

Sampling, and Controlling Sampling Error

• Observe Figure 2-1 in the text

A Population and Its Parameters

• A Population: A large group of people of particular interest that we desire to study and understand

• A Parameter: A summary calculation of measurements made on all subjects in a population (usually not calculated and, therefore, unknown)

A Sample and Its Statistics

• A Sample: A small subgroup of the population; the sample is observed and measured and then used to draw conclusions about the population

• A Statistic: A summary calculation of measurements made on a sample to estimate a parameter of the population

Managing Sampling Error

• Managing sampling error hinges on probability theory, which is the analysis and understanding of chance occurrences

• Probability theory provides a set of rules for determining the accuracy of sample statistics and for computing the degree of confidence we have in conclusions about a population

Sample Size is One Source of Sampling Error

• Sample Size: The number of cases or observations in a sample

• The larger the sample, the smaller the range of error

• Probability theory allows us to say exactly how often a sample statistic will correctly predict a parameter
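
The link between sample size and the range of error can be illustrated with a small simulation. This is only a sketch: the population of 10,000 made-up ages, the two sample sizes, and the helper name `spread_of_sample_means` are assumptions for illustration and do not come from the text.

```python
import random
import statistics

def spread_of_sample_means(population, n, draws=500, seed=0):
    """Standard deviation of the means of `draws` simple random samples of size n."""
    rng = random.Random(seed)
    means = [statistics.mean(rng.sample(population, n)) for _ in range(draws)]
    return statistics.stdev(means)

# Hypothetical population: 10,000 ages between 18 and 90.
rng = random.Random(42)
population = [rng.randint(18, 90) for _ in range(10_000)]

small = spread_of_sample_means(population, n=25)
large = spread_of_sample_means(population, n=400)
print(f"n=25:  sample means vary by about {small:.2f} years")
print(f"n=400: sample means vary by about {large:.2f} years")
```

The sample means from the larger samples cluster more tightly around the population mean, which is the sense in which a larger sample has a smaller range of error.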

Sample Representativeness as a Source of Sampling Error

• Sample representativeness: The extent to which all segments of a population actually land in a sample

Representative Sample

• A representative sample is one in which all segments of the population are included in the sample in their correct proportions in the population

• A nonrepresentative sample is one in which some segments of the population are overrepresented or underrepresented in the sample

A Simple Random Sample

• A simple random sample is one in which every person (or object) in the population has the same chance of being selected for the sample
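
One way to draw a simple random sample in practice is to number every member of the population and let a random number generator pick the cases. The sketch below assumes a hypothetical sampling frame of 1,000 ID numbers and a sample size of 50; neither figure comes from the text.

```python
import random

# Hypothetical sampling frame: ID numbers for a population of 1,000 people.
population_ids = list(range(1, 1001))

rng = random.Random(2008)   # fixed seed only so the example is repeatable
# Sampling without replacement: every ID has the same chance of selection.
sample_ids = rng.sample(population_ids, k=50)

print(sorted(sample_ids)[:10])   # first ten selected IDs, in order
```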

Measurement Error

• Measurement error – inaccuracy in research that derives from imprecise measurement instruments, difficulties in the classification of observations, and the need to round numbers

Controlling Measurement Error

• Measurement: the assignment of symbols (either names or numbers) to the differences we observe in a variable’s qualities or amounts

• Score: the measurement of a particular sample subject on a single variable; also called a code

• Unit of measure: a set interval or distance between quantities of the variable (e.g., inches, miles, years, pounds)

Operational Definition

• An Operational Definition is the set of procedures or operations for measuring a variable

• It answers the question: How is this variable to be measured?

Levels of Measurement

• The level of measurement of a variable identifies its measurement properties, which determine the kind of mathematical operations that can be appropriately used with it and the statistical formulas that can be used with it in testing theoretical hypotheses

• An important guide for selecting statistical formulas and procedures

Four Levels of Measurement

• Nominal: Names categories

• Ordinal: Names categories/scores and ranks them

• Interval: Ranked numerical scores with a set unit of measure

• Ratio: Ranked numerical scores with a set unit of measure and a true zero point

Nominal Variables

• Nominal comes from the Latin word for name. A nominal variable is one that is measured simply by naming categories

• The codes of a nominal variable (even if they are numerical codes) merely indicate a difference in category, class, quality, or kind

• Nominal variables do not provide meaningfully ordered numerical scores

• Dichotomous variable: A nominal variable with only two categories

Examples of Nominal Variables and Their Categories

• Place of birth: Chicago, New York, Atlanta, Salt Lake City, etc.

• Hair color: brown, blonde, red, black, auburn, etc.

• Academic major: chemistry, sociology, biology, psychology, etc.

• Presence of fever: yes, no (dichotomous)

Ordinal Variables

• An ordinal variable is one with named categories or numerical scores with the additional property of allowing categories or scores to be ranked from highest to lowest, best to worst, or first to last

• Because of the similarities of statistical procedures applied to nominal and ordinal variables, we often lump these two groups together and refer to nominal/ordinal variables

Examples of Ordinal Variables and Their Ranked Scores

• Social class ranking: upper, middle, working, poverty

• College class level: first year, sophomore, junior, senior

• Quality of housing: standard, substandard, dilapidated

• Item with Likert scoring: strongly agree, agree, disagree, strongly disagree

• Rank of finish: 1, 2, 3, 4, etc.

Interval Variables

• Have the characteristics of nominal and ordinal variables plus a defined numerical unit or “interval” of measure

• Identify differences in amount, quantity, degree, or distance

• Are assigned highly useful numerical scores

• The intervals or distances between scores are the same between any two points on the measurement scale

Examples of Interval Variables and Their Scores

• Hostility trait scale: between 5 and 55 hostility scale points

• Seasonal temperature: between -80 and 140 Fahrenheit degrees

• Psychological depression: between 0 and 60 scale points

Ratio Variables

• Have the characteristics of interval variables plus a true zero point, where a score of zero means none

• With a ratio variable, we can compute ratios, the amount of one observation in relation to another

• Because of the similarities of statistical procedures applied to interval and ratio variables, we often lump these two groups together and refer to interval/ratio variables
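
A short calculation may help show why a true zero point matters for computing ratios. The numbers below are made up, and the helper `fahrenheit_to_kelvin` is introduced only for this illustration.

```python
def fahrenheit_to_kelvin(f):
    """Convert degrees Fahrenheit to kelvins (a scale with a true zero)."""
    return (f - 32) * 5 / 9 + 273.15

# Ratio variable: weight has a true zero, so the ratio is meaningful.
print(200 / 100)   # 2.0: a 200-pound person weighs twice as much as a 100-pound person

# Interval variable: 0 degrees Fahrenheit does not mean "no temperature",
# so the ratio of two Fahrenheit scores depends on the arbitrary zero.
print(80 / 40)                                              # 2.0 on the Fahrenheit scale
print(fahrenheit_to_kelvin(80) / fahrenheit_to_kelvin(40))  # about 1.08 in kelvins
```

The weight ratio stays the same in any unit of measure, while the temperature "ratio" changes with the scale, which is why ratios are reserved for ratio variables.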

Examples of Ratio Variables and Their Scores

• Body weight: between 0 and 700 pounds

• Body height: between 10 and 100 inches

• Age: between 0 and 125 years

• Duration of time: between 0 seconds and infinity

• Grade point average (GPA): between 0 and 4.0

Modifications of the Four Levels of Measurement

• The higher the level of measurement, the more that can be done with a variable in terms of mathematical calculations

• Thus, we oftentimes find ways to “increase” the level of measurement

Increasing the Level of Measurement Via Indexing

• Indexing: Researchers often create an index (a summing up of objective events, behaviors, knowledge, and circumstances) or a survey scale (a summing up of subjective responses on attitudes, feelings, and opinions) to transform nominal/ordinal data into an interval/ratio variable. (See Table 2-3 of the text.)
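
A survey scale of this kind can be sketched in a few lines. The numeric codes assigned to the Likert responses, the five items, and the function name `scale_score` are assumptions for illustration; Table 2-3 of the text may use a different scheme.

```python
# Hypothetical ordinal codes for Likert responses (assumed, not from the text).
LIKERT_CODES = {
    "strongly disagree": 1,
    "disagree": 2,
    "agree": 3,
    "strongly agree": 4,
}

def scale_score(responses):
    """Sum the coded responses to several Likert items into one scale score."""
    return sum(LIKERT_CODES[r] for r in responses)

# One respondent's answers to five hypothetical items on the same attitude.
answers = ["agree", "strongly agree", "disagree", "agree", "strongly agree"]
print(scale_score(answers))  # 3 + 4 + 2 + 3 + 4 = 16
```

With five items coded 1 to 4, scale scores run from 5 to 20, and the summed score is commonly treated as an interval/ratio variable.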

Keep Things Straight

• Take care to distinguish:

(a) Level of measurement: applies to the entire variable and describes its measurement properties

(b) Unit of measure: applies only to an interval/ratio variable and stipulates the “ruler” being used for its numerical scores

Coding and Counting Observations

• Codebook: A concise description of the symbols that signify each score of each variable

Basic Principles of Coding

• Inclusiveness: There must be a score or code for every observation made for a given variable

• Exclusiveness: Every observation can be assigned one and only one score for a given variable

• Missing Values (codes for missing data) must be assigned so that they may be excluded from calculations
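
The three principles above can be sketched for a single dichotomous variable. The codebook entry, the missing-value code 9, and the function name `code_fever` are hypothetical choices made for this example.

```python
MISSING = 9  # hypothetical missing-value code

# Hypothetical codebook entry for one variable.
CODEBOOK = {
    "fever": {0: "no", 1: "yes", MISSING: "missing"},
}

def code_fever(answer):
    """Return exactly one code for any recorded answer."""
    normalized = (answer or "").strip().lower()
    if normalized == "yes":
        return 1
    if normalized == "no":
        return 0
    return MISSING  # blank or unusable answers still receive a code

raw = ["yes", "No", "", None, "yes", "no"]
codes = [code_fever(a) for a in raw]
labels = [CODEBOOK["fever"][c] for c in codes]   # every code appears in the codebook

# Missing values are excluded before any calculation.
valid = [c for c in codes if c != MISSING]
print(codes)                     # [1, 0, 9, 9, 1, 0]
print(sum(valid) / len(valid))   # 0.5, the proportion with fever among valid cases
```

Every observation receives a code (inclusiveness), no observation can receive two (exclusiveness), and the missing-value code is filtered out so that it does not distort the proportion.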

Quality Control Guidelines for Data Entry

• Make sure entered code values are consistent with codebook and measurement instruments (such as questionnaires)

• Have an assistant double check data entries

• If entering data into a computer spreadsheet or data file, print the data file and double check codes

• Produce frequency and percentage frequency distributions and search for stray codes
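
Searching a frequency distribution for stray codes can be sketched as follows. The set of valid codes and the entered data are invented for this example.

```python
from collections import Counter

VALID_CODES = {1, 2, 3, 4, 9}   # hypothetical: four response codes plus 9 for missing

entered = [1, 2, 2, 3, 4, 4, 4, 9, 7, 2, 1, 44]   # made-up data entries

frequencies = Counter(entered)
n = len(entered)
for code in sorted(frequencies):
    f = frequencies[code]
    flag = "" if code in VALID_CODES else "  <-- stray code, check the source form"
    print(f"{code:>3}  f={f}  {100 * f / n:5.1f}%{flag}")

# Codes that appear in the data but not in the codebook.
stray = sorted(code for code in frequencies if code not in VALID_CODES)
print(stray)   # [7, 44]
```

Here 7 and 44 are not defined in the codebook, so they are most likely keying errors that should be traced back to the original questionnaire.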

Frequency Distribution

• A listing of all observed scores (or categories) of a variable and the frequency, f, of each score or category

• The frequency of a score or category is not very informative by itself, so we compute proportions and percentages

Proportional Frequency Distribution

• A listing of the proportion of responses for each category or score of a variable

• Divide the frequency of the category by n (the total sample size)

Percentage Frequency Distribution

• A listing of the percent of responses for each category or score of a variable

• Multiply the proportional frequency by 100
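
The frequency, proportional frequency, and percentage frequency distributions can be produced together. The sketch below uses ten made-up responses for a nominal variable (academic major); the data are an assumption for illustration.

```python
from collections import Counter

# Made-up responses for a nominal variable (academic major).
majors = ["sociology", "biology", "sociology", "chemistry", "psychology",
          "sociology", "biology", "psychology", "sociology", "biology"]

n = len(majors)          # total sample size
freq = Counter(majors)   # frequency f of each category

print(f"{'category':<12}{'f':>3}{'p':>8}{'%':>8}")
for category, f in freq.most_common():
    p = f / n            # proportional frequency: f divided by n
    pct = p * 100        # percentage frequency: proportion times 100
    print(f"{category:<12}{f:>3}{p:>8.2f}{pct:>8.1f}")
```

As a check, the frequencies should sum to n, the proportions to 1, and the percentages to 100.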

Coding and Counting Interval/Ratio Data

• Variables with interval/ratio levels of measurement are quantitative and, therefore, allow for very precise measurements

Precision of Measurement

• A precise measurement is one in which the degree of measurement error is sufficiently small for the task at hand

• For interval/ratio variables, the degree of precision is specified by how far we round scores

Rounding Error

• Rounding error is the difference between the true or perfect score (which we may never know) and our rounded, observed score

• Rounding error depends on what decimal place we choose as our level of precision – our rounding unit

Rounding Procedures

• 1. Specify the rounding unit according to its decimal place (see Appendix A)

• 2. Observe the number to the right of the rounding unit:

A. If it is 0, 1, 2, 3, or 4, round down

B. If it is 6, 7, 8, or 9, round up

C. If it is 5, look at the next decimal place to the right, and, if the number in it is 5 or greater, round up. If there is no number in this next decimal place, round to an even number
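
The rounding procedure can be approximated with Python's `decimal` module, whose `ROUND_HALF_EVEN` mode rounds an exact 5 to the even digit. This is a sketch rather than the text's exact rule: `ROUND_HALF_EVEN` rounds up whenever any nonzero digit follows the 5, and the helper name `round_to_unit` and the sample scores are assumptions.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_to_unit(value, unit):
    """Round `value` (given as a string) to the rounding unit, e.g. '0.1' for tenths."""
    return Decimal(value).quantize(Decimal(unit), rounding=ROUND_HALF_EVEN)

print(round_to_unit("4.23", "0.1"))    # 4.2  (next digit is 3: round down)
print(round_to_unit("4.27", "0.1"))    # 4.3  (next digit is 7: round up)
print(round_to_unit("4.256", "0.1"))   # 4.3  (5 followed by 6: round up)
print(round_to_unit("4.25", "0.1"))    # 4.2  (exactly 5: round to the even digit)
print(round_to_unit("4.35", "0.1"))    # 4.4  (exactly 5: round to the even digit)
```

Scores are passed as strings so that the digits are taken exactly as written; binary floating-point values such as `4.35` are not stored exactly and can round in unexpected directions.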

Real Limits of Rounded Numbers

• The real limits (or true limits) of a score are the range of possible true values of an (already) rounded score

• Real limits apply to variables with an interval/ratio level of measurement

Calculating Real Limits

• 1. Focus on the “rounding unit,” the decimal place to which the score was rounded. Divide this rounding unit by 2

• 2. Subtract the result of step 1 from the observed rounded score to get the lower real limit

• 3. Add the result of step 1 to the observed rounded score to get the upper real limit
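
The three steps above translate directly into a short function. The function name `real_limits` and the two example scores (a weight rounded to the nearest pound and a GPA rounded to the nearest tenth) are assumptions for illustration.

```python
from decimal import Decimal

def real_limits(rounded_score, rounding_unit):
    """Return (lower, upper) real limits of an already rounded score."""
    score = Decimal(rounded_score)
    half_unit = Decimal(rounding_unit) / 2        # step 1: divide the rounding unit by 2
    return score - half_unit, score + half_unit   # steps 2 and 3

lower, upper = real_limits("150", "1")     # weight rounded to the nearest pound
print(f"{lower} to {upper}")               # 149.5 to 150.5

lower, upper = real_limits("3.2", "0.1")   # GPA rounded to the nearest tenth
print(f"{lower} to {upper}")               # 3.15 to 3.25
```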

Percentiles and Quartiles

• The Cumulative Percentage Frequency Distribution is the percentage frequency of a score plus that of all the scores preceding it in the distribution

• A cumulative frequency distribution provides a tool for identifying fractiles – scores that separate a fraction of a distribution’s cases

Fractiles

• Percentile rank: Among the cases in a score distribution, a percentile rank is the percentage of cases that fall at or below a specified value of X

• Quartiles are fractiles that identify the score values that break a distribution into four equally sized groups
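
The cumulative percentage frequency distribution, percentile ranks, and quartiles can be sketched together. The twelve scores below are made up, and the helper name `percentile_rank` is an assumption. The quartile cut points come from Python's `statistics.quantiles`, which interpolates between scores and may differ slightly from hand methods in the text.

```python
import statistics
from collections import Counter

scores = [2, 3, 3, 4, 4, 4, 5, 5, 6, 7, 7, 8]   # made-up exam scores
n = len(scores)
freq = Counter(scores)

# Cumulative percentage frequency: the percentage for each score
# plus that of all the scores preceding it.
cumulative = 0.0
cum_pct = {}
for score in sorted(freq):
    cumulative += 100 * freq[score] / n
    cum_pct[score] = cumulative
    print(f"X={score}  f={freq[score]}  cumulative %={cumulative:5.1f}")

def percentile_rank(x):
    """Percentage of cases that fall at or below x."""
    return 100 * sum(1 for s in scores if s <= x) / n

print(percentile_rank(5))   # 8 of the 12 cases are at or below a score of 5

# Quartiles: three cut points that split the cases into four equal groups.
q1, q2, q3 = statistics.quantiles(scores, n=4)
print(q1, q2, q3)
```

The percentile rank of a score matches its entry in the cumulative percentage column, and the second quartile is the median.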

Statistical Follies

• A nonrepresentative sample (one that over- or underrepresents a category of sample subjects) can lead to faulty conclusions

• Having a large sample will not make up for failing to obtain a representative sample
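
A small simulation can illustrate why size does not compensate for a lack of representativeness. Everything here is invented for illustration: a population of two equally sized segments with different incomes, a large sample drawn from only one segment, and a small simple random sample.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical population: two equally sized segments with different incomes
# (in thousands of dollars). All figures are invented for illustration.
urban = [60] * 5000
rural = [40] * 5000
population = urban + rural
true_mean = statistics.mean(population)      # 50

big_biased = rng.sample(urban, 4000)         # large, but leaves out rural residents
small_random = rng.sample(population, 100)   # small simple random sample

biased_error = abs(statistics.mean(big_biased) - true_mean)
random_error = abs(statistics.mean(small_random) - true_mean)
print(f"n=4000 nonrepresentative sample misses by {biased_error}")
print(f"n=100 simple random sample misses by {random_error}")
```

The nonrepresentative sample of 4,000 misses the population mean by the full 10 points no matter how large it grows, while the random sample of 100 lands much closer.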