
© 2008 McGraw-Hill Higher Education

The Statistical Imagination

• Chapter 2. Organizing Data to Minimize Statistical Error


Statistical Error

• Known degrees of imprecision in the procedures used to gather and process information

• Two main sources of statistical error: (1) sampling error and (2) measurement error


Sampling Error

• Sampling error – inaccuracy in predictions about a population that results from the fact that we do not observe every subject in the population


Sampling, and Controlling Sampling Error

• Observe Figure 2-1 in the text


A Population and Its Parameters

• A Population: A large group of people of particular interest that we desire to study and understand

• A Parameter: A summary calculation of measurements made on all subjects in a population (usually not calculated and, therefore, unknown)


A Sample and Its Statistics

• A Sample: A small subgroup of the population; the sample is observed and measured and then used to draw conclusions about the population

• A Statistic: A summary calculation of measurements made on a sample to estimate a parameter of the population


Managing Sampling Error

• Managing sampling error hinges on understanding probability theory, which is the analysis and understanding of chance occurrences

• Probability theory provides a set of rules for determining the accuracy of sample statistics and for computing the degree of confidence we have in conclusions about a population


Sample Size is One Source of Sampling Error

• Sample Size: The number of cases or observations in a sample

• The larger the sample, the smaller the range of error

• Probability theory allows us to state how often a sample statistic will fall within a given range of error of the parameter it estimates
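
The link between sample size and range of error can be illustrated with a small simulation. This is a minimal sketch, not a method from the text: the population of ages is made up, and the helper name `spread_of_sample_means` is hypothetical. It draws many simple random samples of a given size and reports how much their means vary.

```python
import random
import statistics

def spread_of_sample_means(population, sample_size, trials=500, seed=0):
    """Draw repeated simple random samples and return the standard
    deviation of their means (a rough gauge of the range of error)."""
    rng = random.Random(seed)
    means = [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(trials)
    ]
    return statistics.pstdev(means)

# Hypothetical population: 10,000 made-up ages between 18 and 90.
rng = random.Random(42)
population = [rng.randint(18, 90) for _ in range(10_000)]

small = spread_of_sample_means(population, sample_size=25)
large = spread_of_sample_means(population, sample_size=400)
print(f"n=25:  spread of sample means = {small:.2f}")
print(f"n=400: spread of sample means = {large:.2f}")
```

The sample means from the larger samples cluster more tightly around the population mean, which is the sense in which a larger sample has a smaller range of error.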


Sample Representativeness as a Source of Sampling Error

• Sample representativeness: The extent to which all segments of a population actually land in a sample


Representative Sample

• A representative sample is one in which all segments of the population are included in the sample in their correct proportions in the population

• A nonrepresentative sample is one in which some segments of the population are overrepresented or underrepresented in the sample
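
One way to check representativeness, when the population proportions happen to be known, is to compare them with the sample proportions segment by segment. The sketch below uses invented college class-level counts, and the function names are hypothetical.

```python
from collections import Counter

def segment_proportions(labels):
    """Return the proportion of cases falling in each segment."""
    counts = Counter(labels)
    total = len(labels)
    return {segment: count / total for segment, count in counts.items()}

def representation_gaps(population_labels, sample_labels):
    """Sample proportion minus population proportion for each segment.
    Positive values mean overrepresented, negative underrepresented."""
    pop = segment_proportions(population_labels)
    samp = segment_proportions(sample_labels)
    return {seg: samp.get(seg, 0.0) - p for seg, p in pop.items()}

# Hypothetical population of 1,000 students and a sample of 100.
population = (["first year"] * 300 + ["sophomore"] * 250
              + ["junior"] * 250 + ["senior"] * 200)
sample = (["first year"] * 10 + ["sophomore"] * 25
          + ["junior"] * 25 + ["senior"] * 40)

gaps = representation_gaps(population, sample)
for segment, gap in gaps.items():
    print(f"{segment}: {gap:+.2f}")
```

In this made-up sample, first-year students are underrepresented by 20 percentage points and seniors are overrepresented by the same amount, so the sample is nonrepresentative.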


A Simple Random Sample

• A simple random sample is one in which every person (or object) in the population has the same chance of being selected for the sample
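
A simple random sample can be sketched with Python's standard `random` module, whose `sample` method selects without replacement and gives every member the same chance of selection. The ID list and the function name `simple_random_sample` are assumptions made for illustration.

```python
import random

def simple_random_sample(population, sample_size, seed=None):
    """Select sample_size members without replacement; every member
    of the population has the same chance of being chosen."""
    rng = random.Random(seed)
    return rng.sample(population, sample_size)

# Hypothetical sampling frame: subject IDs 1 through 1000.
population_ids = list(range(1, 1001))
sample_ids = simple_random_sample(population_ids, 50, seed=1)
print(sorted(sample_ids)[:10])
```

Fixing the seed makes the draw repeatable, which is useful for documenting how a sample was selected.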


Measurement Error

• Measurement error – inaccuracy in research that derives from imprecise measurement instruments, difficulties in the classification of observations, and the need to round numbers


Controlling Measurement Error

• Measurement: the assignment of symbols (either names or numbers) to the differences we observe in a variable’s qualities or amounts

• Score: the measurement of a particular sample subject on a single variable; also called a code

• Unit of measure: a set interval or distance between quantities of a variable (e.g., inches, miles, years, pounds)


Operational Definition

• An Operational Definition is the set of procedures or operations for measuring a variable

• It answers the question: How is this variable to be measured?


Levels of Measurement

• The level of measurement of a variable identifies its measurement properties, which determine the kind of mathematical operations that can be appropriately used with it and the statistical formulas that can be used with it in testing theoretical hypotheses

• An important guide for selecting statistical formulas and procedures


Four Levels of Measurement

• Nominal: Names categories

• Ordinal: Names categories/scores and ranks them

• Interval: Ranked numerical scores with a set unit of measure

• Ratio: Ranked numerical scores with a set unit of measure and a true zero point


Nominal Variables

• Nominal comes from the Latin word for name. A nominal variable is one that is measured simply by naming categories

• The codes of a nominal variable (even if they are numerical codes) merely indicate a difference in category, class, quality, or kind

• Nominal variables do not provide meaningfully ordered numerical scores

• Dichotomous variable: A nominal variable with only two categories


Examples of Nominal Variables and Their Categories

• Place of birth: Chicago, New York, Atlanta, Salt Lake City, etc.

• Hair color: brown, blonde, red, black, auburn, etc.

• Academic major: chemistry, sociology, biology, psychology, etc.

• Presence of fever: yes, no (dichotomous)


Ordinal Variables

• An ordinal variable is one with named categories or numerical scores with the additional property of allowing categories or scores to be ranked from highest to lowest, best to worst, or first to last

• Because of the similarities of statistical procedures applied to nominal and ordinal variables, we often lump these two groups together and refer to nominal/ordinal variables


Examples of Ordinal Variables and Their Ranked Scores

• Social class ranking: upper, middle, working, poverty

• College class level: first year, sophomore, junior, senior

• Quality of housing: standard, substandard, dilapidated

• Item with Likert scoring: strongly agree, agree, disagree, strongly disagree

• Rank of finish: 1, 2, 3, 4, etc.


Interval Variables

• Have the characteristics of nominal and ordinal variables plus a defined numerical unit or “interval” of measure

• Identify differences in amount, quantity, degree, or distance

• Are assigned highly useful numerical scores

• The intervals or distances between scores are the same between any two points on the measurement scale


Examples of Interval Variables and Their Scores

• Hostility trait scale: between 5 and 55 hostility scale points

• Seasonal temperature: between -80 and 140 Fahrenheit degrees

• Psychological depression: between 0 and 60 scale points


Ratio Variables

• Have the characteristics of interval variables plus a true zero point, where a score of zero means none

• With a ratio variable, we can compute ratios, the amount of one observation in relation to another

• Because of the similarities of statistical procedures applied to interval and ratio variables, we often lump these two groups together and refer to interval/ratio variables


Examples of Ratio Variables and Their Scores

• Body weight: between 0 and 700 pounds

• Body height: between 10 and 100 inches

• Age: between 0 and 125 years

• Duration of time: between 0 seconds and infinity

• Grade point average (GPA): between 0 and 4.0


Modifications of the Four Levels of Measurement

• The higher the level of measurement, the more that can be done with a variable in terms of mathematical calculations

• Thus, we oftentimes find ways to “increase” the level of measurement


Increasing the Level of Measurement Via Indexing

• Indexing: Researchers often create an index (a summing up of objective events, behaviors, knowledge, and circumstances) or a survey scale (a summing up of subjective responses on attitudes, feelings, and opinions) to transform nominal/ordinal data into an interval/ratio variable. (See Table 2-3 of the text.)
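
A survey scale of this kind can be sketched as a simple sum of item codes. The numeric codes assigned to the Likert categories, the five-item response list, and the function name are assumptions for illustration; they are not taken from Table 2-3.

```python
# Assumed numeric codes for a four-category Likert item.
LIKERT_CODES = {
    "strongly disagree": 1,
    "disagree": 2,
    "agree": 3,
    "strongly agree": 4,
}

def scale_score(responses):
    """Sum the coded responses to several ordinal items into one
    scale score that is treated as interval/ratio."""
    return sum(LIKERT_CODES[response] for response in responses)

# Hypothetical respondent answering five items.
respondent = ["agree", "strongly agree", "disagree", "agree", "strongly agree"]
score = scale_score(respondent)
print(score)  # 3 + 4 + 2 + 3 + 4 = 16
```

With five items coded 1 to 4, scores on this hypothetical scale range from 5 to 20.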


Keep Things Straight

• Take care to distinguish:

(a) Level of measurement, which applies to the entire variable and describes its measurement properties, and

(b) Unit of measure, which applies only to an interval/ratio variable and stipulates the “ruler” being used for its numerical scores


Coding and Counting Observations

• Codebook: A concise description of the symbols that signify each score of each variable


Basic Principles of Coding

• Inclusiveness: There must be a score or code for every observation made for a given variable

• Exclusiveness: Every observation can be assigned one and only one score for a given variable

• Missing Values (codes for missing data) must be assigned so that they may be excluded from calculations
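
These coding principles can be sketched with a small codebook held in a dictionary. The variable name `fever`, the code 9 for missing data, and the helper `valid_scores` are hypothetical choices made for this example.

```python
# Hypothetical codebook: each code maps to exactly one category
# (exclusiveness), and every possible observation, including a
# missing one, has a code (inclusiveness).
CODEBOOK = {
    "fever": {0: "no", 1: "yes", 9: "missing"},
}
MISSING_CODES = {"fever": {9}}

def valid_scores(variable, codes):
    """Return the codes usable in calculations, dropping missing-value
    codes and rejecting any code absent from the codebook."""
    allowed = CODEBOOK[variable]
    missing = MISSING_CODES[variable]
    kept = []
    for code in codes:
        if code not in allowed:
            raise ValueError(f"code {code!r} is not in the codebook for {variable}")
        if code not in missing:
            kept.append(code)
    return kept

entered = [1, 0, 0, 9, 1, 0, 9, 0]
usable = valid_scores("fever", entered)
proportion_with_fever = sum(usable) / len(usable)
print(usable, round(proportion_with_fever, 2))
```

Excluding the two missing values leaves six usable cases, so the proportion with fever is computed as 2 of 6 rather than 2 of 8.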


Quality Control Guidelines for Data Entry

• Make sure entered code values are consistent with codebook and measurement instruments (such as questionnaires)

• Have an assistant double check data entries

• If entering data into a computer spreadsheet or data file, print the data file and double check codes

• Produce frequency and percentage frequency distributions and search for stray codes
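
The search for stray codes in the last guideline can be sketched as a frequency count filtered against the set of valid codes. The codes and entries below are invented, and `find_stray_codes` is a hypothetical helper.

```python
from collections import Counter

def find_stray_codes(entered_codes, valid_codes):
    """Count every entered code and return only those that do not
    appear in the codebook, with their frequencies."""
    counts = Counter(entered_codes)
    return {code: f for code, f in counts.items() if code not in valid_codes}

# Hypothetical item coded 1-4, with 9 reserved for missing data.
valid = {1, 2, 3, 4, 9}
entered = [1, 2, 2, 3, 4, 4, 44, 2, 9, 1, 5]
strays = find_stray_codes(entered, valid)
print(strays)  # {44: 1, 5: 1}
```

A stray such as 44 often signals a keying slip (a doubled digit), so each flagged entry should be checked against the original questionnaire.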


Frequency Distribution

• A listing of all observed scores (or categories) of a variable and the frequency, f, of each score or category

• The frequency of a score or category is not very informative by itself so we compute proportions and percentages


Proportional Frequency Distribution

• A listing of the proportion of responses for each category or score of a variable

• Divide the frequency of the category by n (the total sample size)


Percentage Frequency Distribution

• A listing of the percent of responses for each category or score of a variable

• Multiply the proportional frequency by 100
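
The three distributions can be built together in one pass, as in this minimal sketch. The scores are invented codes for an ordinal variable, and `frequency_table` is a hypothetical name.

```python
from collections import Counter

def frequency_table(scores):
    """Return rows of (score, f, proportion, percentage), where
    proportion = f / n and percentage = proportion * 100."""
    n = len(scores)
    counts = Counter(scores)
    table = []
    for score in sorted(counts):
        f = counts[score]
        proportion = f / n
        table.append((score, f, proportion, proportion * 100))
    return table

# Hypothetical codes for ten sample subjects.
scores = [1, 1, 1, 1, 2, 2, 2, 3, 3, 4]
table = frequency_table(scores)
for score, f, proportion, percentage in table:
    print(f"score={score}  f={f}  p={proportion:.2f}  %={percentage:.1f}")
```

The proportions sum to 1 and the percentages to 100, which is a quick check that no case was dropped or counted twice.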


Coding and Counting Interval/Ratio Data

• Variables with interval/ratio levels of measurement are quantitative and, therefore, allow for very precise measurements


Precision of Measurement

• A precise measurement is one in which the degree of measurement error is sufficiently small for the task at hand

• For interval/ratio variables, the degree of precision is specified by how far we round scores


Rounding Error

• Rounding error is the difference between the true or perfect score (which we may never know) and our rounded, observed score

• Rounding error depends on what decimal place we choose as our level of precision – our rounding unit


Rounding Procedures

• 1. Specify the rounding unit according to its decimal place (see Appendix A)

• 2. Observe the number to the right of the rounding unit:

A. If it is 0, 1, 2, 3, or 4, round down

B. If it is 6, 7, 8, or 9, round up

C. If it is 5, look at the next decimal place to the right, and, if the number in it is 5 or greater, round up. If there is no number in this next decimal place, round to an even number
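
The procedure can be sketched with Python's `decimal` module, whose `ROUND_HALF_EVEN` mode rounds down on 0 to 4, up on 6 to 9, and to an even digit on an exact 5. This is an approximation of the steps above, not a transcription: when the 5 is followed by further nonzero digits, `ROUND_HALF_EVEN` always rounds up, which is an assumption for cases the steps do not spell out. The function name is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_to_unit(value, rounding_unit):
    """Round a score, given as a string, to the rounding unit, given
    as a string such as '0.1' or '1'. Passing strings avoids binary
    floating-point surprises. Only units of 1 or smaller are handled."""
    return Decimal(value).quantize(Decimal(rounding_unit),
                                   rounding=ROUND_HALF_EVEN)

print(round_to_unit("2.44", "0.1"))   # 2.4 (rule A: round down)
print(round_to_unit("2.47", "0.1"))   # 2.5 (rule B: round up)
print(round_to_unit("2.45", "0.1"))   # 2.4 (rule C: exact 5, round to even)
print(round_to_unit("2.35", "0.1"))   # 2.4 (rule C: exact 5, round to even)
print(round_to_unit("2.456", "0.1"))  # 2.5 (rule C: 5 followed by 5 or more)
```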


Real Limits of Rounded Numbers

• The real limits (or true limits) of a score are the range of possible true values of an (already) rounded score

• Real limits apply to variables with an interval/ratio level of measurement


Calculating Real Limits

• 1. Focus on the “rounding unit,” the decimal place to which the score was rounded. Divide this rounding unit by 2

• 2. Subtract the result of step 1 from the observed rounded score to get the lower real limit

• 3. Add the result of step 1 to the observed rounded score to get the upper real limit
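
The three steps translate directly into a short function. This is a sketch under the assumption that scores and rounding units are passed as strings so that `Decimal` arithmetic stays exact; the example values are invented.

```python
from decimal import Decimal

def real_limits(rounded_score, rounding_unit):
    """Return (lower, upper) real limits of an already rounded score."""
    score = Decimal(rounded_score)
    half_unit = Decimal(rounding_unit) / 2   # step 1
    lower = score - half_unit                # step 2
    upper = score + half_unit                # step 3
    return lower, upper

# A weight rounded to the nearest pound.
print(real_limits("150", "1"))    # lower 149.5, upper 150.5
# A GPA rounded to the nearest tenth.
print(real_limits("3.2", "0.1"))  # lower 3.15, upper 3.25
```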


Percentiles and Quartiles

• The Cumulative Percentage Frequency Distribution is the percentage frequency of a score plus that of all the scores preceding it in the distribution

• A cumulative frequency distribution provides a tool for identifying fractiles – scores that separate a fraction of a distribution’s cases


Fractiles

• Percentile rank: Among the cases in a score distribution, a percentile rank is the percentage of cases that fall at or below a specified value of X

• Quartiles are fractiles that identify the score values that break a distribution into four equally sized groups
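
The definitions above can be sketched as follows. The exam scores are invented and the function names are hypothetical. Quartile cut points are computed with the standard library's `statistics.quantiles`, which uses one of several common interpolation conventions, so its results may differ slightly from values read off a cumulative percentage table.

```python
import statistics
from collections import Counter

def cumulative_percentages(scores):
    """Map each observed score to the percentage of cases at or
    below it (its own percentage plus that of all preceding scores)."""
    n = len(scores)
    counts = Counter(scores)
    running = 0
    result = {}
    for score in sorted(counts):
        running += counts[score]
        result[score] = 100 * running / n
    return result

def percentile_rank(scores, x):
    """Percentage of cases that fall at or below the value x."""
    at_or_below = sum(1 for s in scores if s <= x)
    return 100 * at_or_below / len(scores)

# Hypothetical exam scores for ten students.
exam_scores = [55, 60, 60, 70, 70, 70, 80, 80, 90, 95]
cum = cumulative_percentages(exam_scores)
print(cum)
print(percentile_rank(exam_scores, 70))  # 60.0

# Three cut points that split the cases into four groups.
quartiles = statistics.quantiles(exam_scores, n=4)
print(quartiles)
```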


Statistical Follies

• A nonrepresentative sample (one that over- or underrepresents a category of sample subjects) can lead to faulty conclusions

• Having a large sample will not make up for failing to obtain a representative sample
