© 2008 McGraw-Hill Higher Education
The Statistical Imagination
• Chapter 2. Organizing Data to Minimize Statistical Error
Statistical Error
• Known degrees of imprecision in the procedures used to gather and process information
• Two main sources of statistical error: (1) sampling error and (2) measurement error
Sampling Error
• Sampling error – inaccuracy in predictions about a population that results from the fact that we do not observe every subject in the population
Sampling, and Controlling Sampling Error
• Observe Figure 2-1 in the text
A Population and Its Parameters
• A Population: A large group of people of particular interest that we desire to study and understand
• A Parameter: A summary calculation of measurements made on all subjects in a population (usually not calculated and, therefore, unknown)
A Sample and Its Statistics
• A Sample: A small subgroup of the population; the sample is observed and measured and then used to draw conclusions about the population
• A Statistic: A summary calculation of measurements made on a sample to estimate a parameter of the population
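To make the parameter/statistic distinction concrete, here is a minimal Python sketch on an invented population (the ages and sizes are assumptions, not data from the text). In real research the population mean would be unknown; it can be computed here only because the population is simulated.

```python
import random
import statistics

# Hypothetical population: ages of 10,000 people (invented for illustration).
random.seed(1)
population = [random.randint(18, 90) for _ in range(10_000)]

# Parameter: a summary calculation on ALL subjects (usually unknown in practice).
parameter_mean = statistics.mean(population)

# Statistic: the same calculation on a sample, used to estimate the parameter.
sample = random.sample(population, 100)
statistic_mean = statistics.mean(sample)

# The gap between the statistic and the parameter is sampling error.
sampling_error = statistic_mean - parameter_mean
print(f"parameter={parameter_mean:.2f}  statistic={statistic_mean:.2f}  error={sampling_error:.2f}")
```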
Managing Sampling Error
• Managing sampling error hinges on understanding probability theory, the study of chance occurrences
• Probability theory provides a set of rules for determining the accuracy of sample statistics and for computing the degree of confidence we have in conclusions about a population
Sample Size is One Source of Sampling Error
• Sample Size: The number of cases or observations in a sample
• The larger the sample, the smaller the range of error
• Probability theory allows us to calculate how often a sample statistic will fall within a given range of error around the parameter
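A small simulation illustrates the sample-size point. Assuming a hypothetical population of evenly spread scores, it draws many random samples at each size and averages how far the sample mean lands from the population mean; the helper name `average_absolute_error` is invented for this sketch.

```python
import random
import statistics

def average_absolute_error(population, n, trials=500, seed=0):
    """Mean of |sample mean - population mean| over repeated random samples of size n."""
    rng = random.Random(seed)
    mu = statistics.fmean(population)
    errors = [abs(statistics.fmean(rng.sample(population, n)) - mu) for _ in range(trials)]
    return statistics.fmean(errors)

# Hypothetical population: 20,000 scores spread evenly between 0 and 100.
population_rng = random.Random(42)
population = [population_rng.uniform(0, 100) for _ in range(20_000)]

for n in (10, 100, 1000):
    print(f"n={n:>4}: average sampling error = {average_absolute_error(population, n):.2f}")
```

The printed error shrinks as n grows, which is the "larger sample, smaller range of error" claim in numerical form.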
Sample Representativeness as a Source of Sampling Error
• Sample representativeness: The extent to which all segments of a population actually land in a sample
Representative Sample
• A representative sample is one in which all segments of the population are included in the sample in their correct proportions in the population
• A nonrepresentative sample is one in which some segments of the population are overrepresented or underrepresented in the sample
A Simple Random Sample
• A simple random sample is one in which every person (or object) in the population has the same chance of being selected for the sample
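In Python, `random.sample` draws without replacement so that every member of a list has the same chance of being selected, which is the defining property of a simple random sample. The sampling frame of 500 student IDs below is hypothetical.

```python
import random

# Hypothetical sampling frame: one ID for every member of a population of 500 students.
population_ids = list(range(1, 501))

rng = random.Random(2008)  # seeded only so the draw is reproducible
# Draw without replacement; each ID has the same chance (50/500) of selection.
srs = rng.sample(population_ids, k=50)
print(sorted(srs)[:10])
```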
Measurement Error
• Measurement error – inaccuracy in research that derives from imprecise measurement instruments, difficulties in the classification of observations, and the need to round numbers
Controlling Measurement Error
• Measurement: The assignment of symbols (either names or numbers) to the differences we observe in a variable’s qualities or amounts
• Score: The measurement of a particular sample subject on a single variable; also called a code
• Unit of measure: A set interval or distance between quantities of a variable (e.g., inches, miles, years, pounds)
Operational Definition
• An Operational Definition is the set of procedures or operations for measuring a variable
• It answers the question: How is this variable to be measured?
Levels of Measurement
• The level of measurement of a variable identifies its measurement properties, which determine the kinds of mathematical operations that can appropriately be performed on it and the statistical formulas that can be used with it in testing theoretical hypotheses
• An important guide for selecting statistical formulas and procedures
Four Levels of Measurement
• Nominal: Names categories
• Ordinal: Names categories/scores and ranks them
• Interval: Ranked numerical scores with a set unit of measure
• Ratio: Ranked numerical scores with a set unit of measure and a true zero point
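Because each level keeps the properties of the levels below it, the four levels can be sketched as an ordered enumeration. The operation labels here are a simplified summary assumed for illustration, not a list taken from the text.

```python
from enum import IntEnum

class Level(IntEnum):
    # Ordered so that a higher value means a higher level of measurement.
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

def meaningful_operations(level):
    """Operations that make sense at a level; each level inherits those below it."""
    ops = ["classify (=, !=)"]
    if level >= Level.ORDINAL:
        ops.append("rank (<, >)")
    if level >= Level.INTERVAL:
        ops.append("add and subtract differences")
    if level >= Level.RATIO:
        ops.append("form ratios (x / y)")
    return ops

print(meaningful_operations(Level.INTERVAL))
```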
Nominal Variables
• Nominal comes from the Latin word for name. A nominal variable is one that is measured simply by naming categories
• The codes of a nominal variable (even if they are numerical codes) merely indicate a difference in category, class, quality, or kind
• Nominal variables do not provide meaningfully ordered numerical scores
• Dichotomous variable: A nominal variable with only two categories
Examples of Nominal Variables and Their Categories
• Place of birth: Chicago, New York, Atlanta, Salt Lake City, etc.
• Hair color: brown, blonde, red, black, auburn, etc.
• Academic major: chemistry, sociology, biology, psychology, etc.
• Presence of fever: yes, no (dichotomous)
Ordinal Variables
• An ordinal variable is one with named categories or numerical scores with the additional property of allowing categories or scores to be ranked from highest to lowest, best to worst, or first to last
• Because the statistical procedures applied to nominal and ordinal variables are similar, we often lump these two levels together and refer to nominal/ordinal variables
Examples of Ordinal Variables and Their Ranked Scores
• Social class ranking: upper, middle, working, poverty
• College class level: first year, sophomore, junior, senior
• Quality of housing: standard, substandard, dilapidated
• Item with Likert scoring: strongly agree, agree, disagree, strongly disagree
• Rank of finish: 1, 2, 3, 4, etc.
Interval Variables
• Have the characteristics of nominal and ordinal variables plus a defined numerical unit or “interval” of measure
• Identify differences in amount, quantity, degree, or distance
• Are assigned highly useful numerical scores
• Have equal intervals: the distance between any two adjacent scores is the same at every point on the measurement scale
Examples of Interval Variables and Their Scores
• Hostility trait scale: between 5 and 55 hostility scale points
• Seasonal temperature: between -80 and 140 degrees Fahrenheit
• Psychological depression: between 0 and 60 scale points
Ratio Variables
• Have the characteristics of interval variables plus a true zero point, where a score of zero means none
• With a ratio variable, we can compute ratios, the amount of one observation in relation to another
• Because the statistical procedures applied to interval and ratio variables are similar, we often lump these two levels together and refer to interval/ratio variables
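The temperature (interval) and body weight (ratio) examples from these slides show why ratios need a true zero. This sketch converts each pair of scores to another unit and checks whether the ratio survives; the specific values 80/40 °F and 200/100 lb, and the conversion constant, are chosen for illustration.

```python
# Fahrenheit temperature: interval level (0 °F does not mean "no heat").
def fahrenheit_to_celsius(f):
    return (f - 32) * 5 / 9

ratio_f = 80 / 40
ratio_c = fahrenheit_to_celsius(80) / fahrenheit_to_celsius(40)
print(ratio_f, round(ratio_c, 2))    # 2.0 6.0  -> the "ratio" depends on the unit

# Body weight: ratio level (0 pounds means no weight at all).
POUNDS_PER_KILOGRAM = 2.20462
ratio_lb = 200 / 100
ratio_kg = (200 / POUNDS_PER_KILOGRAM) / (100 / POUNDS_PER_KILOGRAM)
print(ratio_lb, round(ratio_kg, 2))  # 2.0 2.0  -> the ratio is unit-free
```

So 80 °F is not "twice as hot" as 40 °F, while a 200-pound person really does weigh twice as much as a 100-pound one.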
Examples of Ratio Variables and Their Scores
• Body weight: between 0 and 700 pounds
• Body height: between 10 and 100 inches
• Age: between 0 and 125 years
• Duration of time: between 0 seconds and infinity
• Grade point average (GPA): between 0 and 4.0
Modifications of the Four Levels of Measurement
• The higher the level of measurement, the more that can be done with a variable in terms of mathematical calculations
• Thus, researchers often find ways to “increase” the level of measurement
Increasing the Level of Measurement Via Indexing
• Indexing: Researchers often create an index (a summing up of objective events, behaviors, knowledge, and circumstances) or a survey scale (a summing up of subjective responses on attitudes, feelings, and opinions) to transform nominal/ordinal data into an interval/ratio variable. (See Table 2-3 of the text.)
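A minimal sketch of indexing, assuming five invented yes/no items coded 1 = yes and 0 = no: summing the item codes yields a single score from 0 to 5. Whether such a sum can be treated as interval-level is a measurement judgment; the code only performs the summation. The item names and the function `health_index` are hypothetical.

```python
# Hypothetical index items: five dichotomous health behaviors, coded 1 = yes, 0 = no.
ITEMS = ["exercises_weekly", "eats_breakfast", "no_smoking", "sleeps_7_hours", "annual_checkup"]

def health_index(responses):
    """Sum the 0/1 item codes into a single index score from 0 to len(ITEMS)."""
    return sum(responses[item] for item in ITEMS)

respondent = {"exercises_weekly": 1, "eats_breakfast": 0, "no_smoking": 1,
              "sleeps_7_hours": 1, "annual_checkup": 0}
print(health_index(respondent))  # 3
```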
Keep Things Straight
• Take care to distinguish:
(a) Level of measurement: applies to the entire variable and describes its measurement properties
(b) Unit of measure: applies only to an interval/ratio variable and stipulates the “ruler” used for its numerical scores
Coding and Counting Observations
• Codebook: A concise description of the symbols that signify each score of each variable
Basic Principles of Coding
• Inclusiveness: There must be a score or code for every observation made for a given variable
• Exclusiveness: Every observation can be assigned one and only one score for a given variable
• Missing Values (codes for missing data) must be assigned so that they may be excluded from calculations
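The coding principles can be checked mechanically. This sketch assumes a hypothetical codebook for college class level, with 9 as the missing-value code. Every observation gets exactly one code (9 included), undefined codes are rejected, and missing-value codes are dropped before any calculation. The names `CODEBOOK`, `MISSING`, and `valid_scores` are illustrative.

```python
# Hypothetical codebook entry for a "college class level" variable.
CODEBOOK = {
    "class_level": {1: "first year", 2: "sophomore", 3: "junior", 4: "senior", 9: "missing"},
}
MISSING = {"class_level": {9}}

def valid_scores(variable, scores):
    """Drop missing-value codes; raise on any code the codebook does not define."""
    defined = CODEBOOK[variable]
    kept = []
    for s in scores:
        if s not in defined:
            raise ValueError(f"{variable}: code {s!r} is not in the codebook")
        if s not in MISSING[variable]:
            kept.append(s)
    return kept

print(valid_scores("class_level", [1, 3, 9, 4, 2, 9]))  # [1, 3, 4, 2]
```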
Quality Control Guidelines for Data Entry
• Make sure entered code values are consistent with the codebook and the measurement instruments (such as questionnaires)
• Have an assistant double-check data entries
• If entering data into a computer spreadsheet or data file, print the data file and double-check codes
• Produce frequency and percentage frequency distributions and search for stray codes
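Searching for stray codes amounts to tabulating every entered value and flagging any value the codebook does not define. The entered data and the 1/2/9 codebook below are hypothetical.

```python
from collections import Counter

# Hypothetical codebook for a yes/no item: 1 = yes, 2 = no, 9 = missing.
ALLOWED = {1, 2, 9}
# Hypothetical entered data containing two typing errors (22 and 3).
entered = [1, 2, 2, 1, 9, 1, 22, 2, 1, 3]

freq = Counter(entered)
for code in sorted(freq):
    note = "" if code in ALLOWED else "  <- stray code: check against the questionnaire"
    print(f"code {code:>2}: f = {freq[code]}{note}")

stray_codes = sorted(code for code in freq if code not in ALLOWED)
print(stray_codes)  # [3, 22]
```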
Frequency Distribution
• A listing of all observed scores (or categories) of a variable and the frequency, f, of each score or category
• The frequency of a score or category is not very informative by itself, so we also compute proportions and percentages
Proportional Frequency Distribution
• A listing of the proportion of responses for each category or score of a variable
• Divide the frequency of the category by n (the total sample size)
Percentage Frequency Distribution
• A listing of the percent of responses for each category or score of a variable
• Multiply the proportional frequency by 100
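The two steps (divide f by n, then multiply by 100) can be sketched in Python with `collections.Counter`. The majors and the function name `frequency_table` are illustrative.

```python
from collections import Counter

def frequency_table(observations):
    """Return (category, f, proportion, percentage) rows, most frequent first."""
    n = len(observations)
    rows = []
    for category, f in Counter(observations).most_common():
        p = f / n                                # proportion = f / n
        rows.append((category, f, p, p * 100))   # percentage = proportion x 100
    return rows

# Hypothetical academic majors for a sample of n = 8 students.
majors = ["sociology", "biology", "sociology", "psychology",
          "sociology", "biology", "chemistry", "psychology"]
for cat, f, p, pct in frequency_table(majors):
    print(f"{cat:<11} f={f}  p={p:.3f}  %={pct:.1f}")
```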
Coding and Counting Interval/Ratio Data
• Variables with interval/ratio levels of measurement are quantitative and, therefore, allow for very precise measurements
Precision of Measurement
• A precise measurement is one in which the degree of measurement error is sufficiently small for the task at hand
• For interval/ratio variables, the degree of precision is specified by the decimal place to which we round scores
Rounding Error
• Rounding error is the difference between the true or perfect score (which we may never know) and our rounded, observed score
• Rounding error depends on what decimal place we choose as our level of precision – our rounding unit
Rounding Procedures
• 1. Specify the rounding unit according to its decimal place (see Appendix A)
• 2. Observe the number to the right of the rounding unit:
A. If it is 0, 1, 2, 3, or 4, round down
B. If it is 6, 7, 8, or 9, round up
C. If it is 5, look at the digits further to the right. If any of them is greater than 0, round up. If there are no further digits (or only zeros), round so that the digit in the rounding unit is even (e.g., to the nearest tenth, 2.35 becomes 2.4 and 2.45 becomes 2.4)
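Python's `decimal` module offers `ROUND_HALF_EVEN`, which rounds down below the halfway point, up above it, and to the even digit when a score sits exactly halfway (a trailing 5 with nothing after it). Scores are passed as strings, as an assumption of this sketch, so binary floating-point representation does not distort the digits; `round_score` is a hypothetical helper name.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_score(score: str, unit: str) -> Decimal:
    """Round a score (given as a string) to the rounding unit, e.g. unit="0.1" for tenths."""
    return Decimal(score).quantize(Decimal(unit), rounding=ROUND_HALF_EVEN)

for s in ["3.14", "3.16", "3.151", "3.15", "3.25"]:
    print(s, "->", round_score(s, "0.1"))
# 3.14 -> 3.1, 3.16 -> 3.2, 3.151 -> 3.2, 3.15 -> 3.2, 3.25 -> 3.2
```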
Real Limits of Rounded Numbers
• The real limits (or true limits) of a score are the range of possible true values of an (already) rounded score
• Real limits apply to variables with an interval/ratio level of measurement
Calculating Real Limits
• 1. Focus on the “rounding unit,” the decimal place to which the score was rounded. Divide this rounding unit by 2
• 2. Subtract the result of step 1 from the observed rounded score to get the lower real limit
• 3. Add the result of step 1 to the observed rounded score to get the upper real limit
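The three steps translate directly into a small function. Decimal strings are used, as an assumption of this sketch, to avoid binary floating-point artifacts in the printed limits; `real_limits` is a hypothetical helper name.

```python
from decimal import Decimal

def real_limits(rounded_score: str, rounding_unit: str):
    """Return (lower, upper) real limits: the rounded score minus/plus half the rounding unit."""
    score = Decimal(rounded_score)
    half_unit = Decimal(rounding_unit) / 2          # step 1
    return score - half_unit, score + half_unit     # steps 2 and 3

print(real_limits("5.2", "0.1"))  # (Decimal('5.15'), Decimal('5.25'))
print(real_limits("130", "1"))    # (Decimal('129.5'), Decimal('130.5'))
```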
Percentiles and Quartiles
• The Cumulative Percentage Frequency Distribution gives, for each score, the percentage frequency of that score plus that of all the scores preceding it in the distribution
• A cumulative frequency distribution provides a tool for identifying fractiles – scores that separate a fraction of a distribution’s cases
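A cumulative percentage frequency distribution can be built by sorting the distinct scores from lowest to highest and keeping a running total of their frequencies. The quiz scores and the name `cumulative_percentages` are invented for illustration.

```python
from collections import Counter

def cumulative_percentages(scores):
    """Map each distinct score (lowest to highest) to its cumulative percentage frequency."""
    n = len(scores)
    running = 0
    table = {}
    for score, f in sorted(Counter(scores).items()):
        running += f                    # this score's frequency plus all scores below it
        table[score] = 100 * running / n
    return table

quiz = [6, 7, 7, 8, 8, 8, 9, 9, 10, 10]   # hypothetical quiz scores, n = 10
print(cumulative_percentages(quiz))
# {6: 10.0, 7: 30.0, 8: 60.0, 9: 80.0, 10: 100.0}
```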
Fractiles
• Percentile Rank: Among the cases in a score distribution, a percentile rank is the percentage of cases that fall at or below a specified value of X
• Quartiles are fractiles that identify the score values that break a distribution into four equally sized groups
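Percentile rank follows directly from its definition (cases at or below X, divided by n, times 100). For quartiles, this sketch swaps in Python's `statistics.quantiles` (Python 3.8+); its default interpolation is one of several conventions and may not match a hand procedure from the text. The quiz scores are hypothetical.

```python
import statistics

def percentile_rank(scores, x):
    """Percentage of cases whose score is at or below x."""
    at_or_below = sum(1 for s in scores if s <= x)
    return 100 * at_or_below / len(scores)

quiz = [6, 7, 7, 8, 8, 8, 9, 9, 10, 10]   # hypothetical quiz scores, n = 10

print(percentile_rank(quiz, 8))           # 60.0
# Three cut points (Q1, Q2, Q3) splitting the cases into four groups.
print(statistics.quantiles(quiz, n=4))    # [7.0, 8.0, 9.25]
```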
Statistical Follies
• A nonrepresentative sample (one that over- or underrepresents a category of sample subjects) can lead to faulty conclusions
• Having a large sample will not make up for failing to obtain a representative sample
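A simulation sketch of this folly, using an invented population of two groups that differ in their support for a policy (the group sizes and support rates are assumptions): a sample of 3,000 drawn only from one group misses the true rate badly, while a simple random sample of only 200 lands close to it.

```python
import random

rng = random.Random(7)
# Hypothetical population, coded 1 = supports the policy, 0 = does not.
group_a = [1 if rng.random() < 0.30 else 0 for _ in range(6_000)]  # about 30% support
group_b = [1 if rng.random() < 0.70 else 0 for _ in range(4_000)]  # about 70% support
population = group_a + group_b
true_rate = sum(population) / len(population)

# Large but nonrepresentative sample: drawn only from group B.
biased = rng.sample(group_b, 3_000)
# Much smaller simple random sample from the whole population.
srs = rng.sample(population, 200)

biased_rate = sum(biased) / len(biased)
srs_rate = sum(srs) / len(srs)
print(f"true={true_rate:.3f}  biased(n=3000)={biased_rate:.3f}  srs(n=200)={srs_rate:.3f}")
```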