Chapter 2
Frequency Distributions, Stem-and-leaf displays, and Histograms
Where have we been?
To calculate SS, the variance, and the standard deviation: find the deviations from μ, square and sum them (SS), divide by N (σ²), and take the square root (σ).
Example: Scores on a Psychology quiz
Student     X    X - μ    (X - μ)²
John        7    +1.00    1.00
Jennifer    8    +2.00    4.00
Arthur      3    -3.00    9.00
Patrick     5    -1.00    1.00
Marie       7    +1.00    1.00
ΣX = 30, N = 5, μ = 6.00
Σ(X - μ) = 0.00
Σ(X - μ)² = SS = 16.00
σ² = SS/N = 3.20
σ = √3.20 = 1.79
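The steps for SS, the variance, and the standard deviation can be sketched in Python. This is a minimal illustration using the five quiz scores; the variable names are chosen here for readability and do not come from the slides.

```python
import math

# Quiz scores for John, Jennifer, Arthur, Patrick, and Marie.
scores = [7, 8, 3, 5, 7]

n = len(scores)
mean = sum(scores) / n                     # mu = 30 / 5
deviations = [x - mean for x in scores]    # X - mu for each student
ss = sum(d ** 2 for d in deviations)       # sum of squared deviations (SS)
variance = ss / n                          # population variance: SS / N
std_dev = math.sqrt(variance)              # population standard deviation

print(mean, sum(deviations), ss, variance, round(std_dev, 2))
```

Note that the division is by N, not N - 1, because the slide treats the five scores as the whole population.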
Ways of showing how scores are distributed around the mean
Frequency Distributions
Stem-and-leaf displays
Histograms
Some definitions
Frequency Distribution - a tabular display of the way scores are distributed across all the possible values of a variable.
Absolute Frequency Distribution - displays the count of each score.
Cumulative Frequency Distribution - displays the total number of scores at and below each score.
Relative Frequency Distribution - displays the proportion of each score.
Relative Cumulative Frequency Distribution - displays the proportion of scores at and below each score.
Example Data
Traffic accidents by bus drivers
• Studied 708 bus drivers.
• Recorded all accidents for a period of 4 years.
• Data looks like:
3, 0, 6, 0, 0, 2, 1, 4, 1, … 6, 0, 2
Frequency Distributions
# of accidents    Absolute Freq.    Relative Frequency
0                 117               .165
1                 157               .222
2                 158               .223
3                 115               .162
4                  78               .110
5                  44               .062
6                  21               .030
7                   7               .010
8                   6               .008
9                   1               .001
10                  3               .004
11                  1               .001
Total             708               .998
Calculate relative frequency.
Divide each absolute frequency by N.
For example, 117/708 = .165
Notice the rounding error: the rounded relative frequencies sum to .998 rather than 1.000.
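The relative frequency calculation can be sketched in a few lines of Python. The absolute frequencies are the ones given for the bus driver data; the variable names are illustrative only.

```python
# Absolute frequencies for 0 through 11 accidents.
absolute = [117, 157, 158, 115, 78, 44, 21, 7, 6, 1, 3, 1]

n = sum(absolute)                                # total number of drivers
relative = [round(f / n, 3) for f in absolute]   # proportion at each score

for accidents, (f, p) in enumerate(zip(absolute, relative)):
    print(f"{accidents:>2}  {f:>3}  {p:.3f}")

# The rounded proportions need not sum to exactly 1.
print("N =", n, "sum of rounded proportions =", round(sum(relative), 3))
```

Rounding each proportion to three places before summing is what produces the small rounding error in the total.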
What can you answer?
# of accidents    0     1     2     3     4     5     6     7     8     9     10    11    Total
Relative Freq.    .165  .222  .223  .162  .110  .062  .030  .010  .008  .001  .004  .001  .998
Proportion with at most 1 accident?
= .165 + .222 = .387; .387 * 100 = 38.7%
Proportion with 8 or more accidents?
= .008 + .001 + .004 + .001 = .014 = 1.4%
Proportion with between 4 and 7 accidents?
= .110 + .062 + .030 + .010 = .212 = 21.2%
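Questions like these amount to summing relative frequencies over a range of scores. A small sketch, assuming the rounded relative frequencies from the bus driver table; `proportion_between` is a hypothetical helper name, not something from the slides.

```python
# Rounded relative frequencies for 0 through 11 accidents.
relative = [.165, .222, .223, .162, .110, .062, .030, .010, .008, .001, .004, .001]

def proportion_between(low, high):
    """Proportion of drivers with between low and high accidents, inclusive."""
    return round(sum(relative[low:high + 1]), 3)

at_most_1 = proportion_between(0, 1)       # 0 or 1 accident
eight_or_more = proportion_between(8, 11)  # 8 through 11 accidents
four_to_seven = proportion_between(4, 7)   # 4 through 7 accidents

for label, p in [("at most 1", at_most_1),
                 ("8 or more", eight_or_more),
                 ("4 to 7", four_to_seven)]:
    print(f"{label}: {p:.3f} = {p * 100:.1f}%")
```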
Cumulative Frequencies
# of accidents    Absolute Frequency    Cumulative Frequency    Cumulative Relative Frequency
0                 117                   117                     .165
1                 157                   274                     .387
2                 158                   432                     .610
3                 115                   547                     .773
4                  78                   625                     .883
5                  44                   669                     .945
6                  21                   690                     .975
7                   7                   697                     .984
8                   6                   703                     .993
9                   1                   704                     .994
10                  3                   707                     .999
11                  1                   708                     1.000
Total             708
Cumulative frequencies show the number of scores at or below each point.
Calculate by adding the absolute frequencies of all scores at or below each point.
Cumulative relative frequencies show the proportion of scores at or below each point.
Calculate by dividing the cumulative frequency at each point by N.
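Both cumulative columns can be derived from the absolute frequencies with a running sum. The sketch below uses `itertools.accumulate` from the Python standard library. It divides the cumulative counts by N directly instead of adding already rounded proportions, so a value can differ in the last digit from a table that was rounded by hand.

```python
from itertools import accumulate

# Absolute frequencies for 0 through 11 accidents.
absolute = [117, 157, 158, 115, 78, 44, 21, 7, 6, 1, 3, 1]
n = sum(absolute)

# Running total: number of scores at or below each point.
cumulative = list(accumulate(absolute))

# Proportion of scores at or below each point.
cumulative_relative = [round(c / n, 3) for c in cumulative]

for accidents, (c, p) in enumerate(zip(cumulative, cumulative_relative)):
    print(f"{accidents:>2}  {c:>3}  {p:.3f}")
```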
Grouped Frequency Example
2.72 2.84 2.63 2.51 2.54 2.98 2.61 2.93 2.87 2.76 2.58 2.66 2.86 2.86
2.58 2.60 2.63 2.62 2.73 2.80 2.79 2.96 2.58 2.50 2.82 2.83 2.90 2.91
2.87 2.87 2.74 2.70 2.52 2.75 2.99 2.66 2.58 2.71 2.51 2.87 2.87 2.75
2.85 2.61 2.54 2.73 2.96 2.90 2.75 2.76 2.93 2.64 2.85 2.70 2.56 2.51
2.83 2.79 2.76 2.75 2.86 2.58 2.87 2.89 2.89 2.52 2.59 2.54 2.54 2.85
2.83 2.96 2.93 2.89 2.92 2.98 2.59 2.81 2.78 2.95 2.96 2.95 2.56 2.59
2.87 2.84 2.84 2.80 2.65 2.70 2.61 2.89 2.83 2.85 2.52 2.66 2.74 2.73
2.88 2.85
100 High school students’ average time in seconds to read ambiguous sentences.
Values range between 2.50 seconds and 2.99 seconds.
Grouped Frequencies
Needed when the number of values is large OR the values are continuous.
To calculate group intervals:
First find the range.
Determine a “good” interval based on the number of resulting intervals, the meaning of the data, and common, regular numbers.
List intervals from largest to smallest.
Grouped Frequencies
Range = 2.99 - 2.50 = .49 ~ .50

i = .1, number of intervals (#i) = 5
Reading Time    Frequency
2.90-2.99       16
2.80-2.89       31
2.70-2.79       20
2.60-2.69       12
2.50-2.59       21

i = .05, number of intervals (#i) = 10
Reading Time    Frequency
2.95-2.99        9
2.90-2.94        7
2.85-2.89       20
2.80-2.84       11
2.75-2.79       10
2.70-2.74       10
2.65-2.69        4
2.60-2.64        8
2.55-2.59       10
2.50-2.54       11
Either is acceptable.
Use whichever display seems most informative.
In this case, the smaller intervals and 10-category table seem more informative.
Sometimes it goes the other way, and a less detailed presentation is necessary to prevent the reader from missing the forest for the trees.
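Grouping the 100 reading times at either interval size can be sketched as follows. This is one possible approach, not the method used to build the slides: `grouped_frequencies` and its parameters `low` and `width` are hypothetical names, both expressed in hundredths of a second so that the binning uses integers rather than floating-point values. Intervals with no scores would simply be omitted by this sketch.

```python
from collections import Counter

# The 100 reading times (seconds) from the grouped frequency example.
raw = """
2.72 2.84 2.63 2.51 2.54 2.98 2.61 2.93 2.87 2.76 2.58 2.66 2.86 2.86
2.58 2.60 2.63 2.62 2.73 2.80 2.79 2.96 2.58 2.50 2.82 2.83 2.90 2.91
2.87 2.87 2.74 2.70 2.52 2.75 2.99 2.66 2.58 2.71 2.51 2.87 2.87 2.75
2.85 2.61 2.54 2.73 2.96 2.90 2.75 2.76 2.93 2.64 2.85 2.70 2.56 2.51
2.83 2.79 2.76 2.75 2.86 2.58 2.87 2.89 2.89 2.52 2.59 2.54 2.54 2.85
2.83 2.96 2.93 2.89 2.92 2.98 2.59 2.81 2.78 2.95 2.96 2.95 2.56 2.59
2.87 2.84 2.84 2.80 2.65 2.70 2.61 2.89 2.83 2.85 2.52 2.66 2.74 2.73
2.88 2.85
"""
times = [float(t) for t in raw.split()]

def grouped_frequencies(values, low, width):
    """Count values per interval; low and width are in hundredths of a second."""
    hundredths = [round(v * 100) for v in values]    # 2.84 -> 284
    counts = Counter((h - low) // width for h in hundredths)
    table = []
    for k in sorted(counts, reverse=True):           # largest interval first
        start = low + k * width
        end = start + width - 1
        table.append((f"{start / 100:.2f}-{end / 100:.2f}", counts[k]))
    return table

five = grouped_frequencies(times, low=250, width=10)   # i = .1
ten = grouped_frequencies(times, low=250, width=5)     # i = .05

for label, freq in ten:
    print(label, freq)
```

Printing `five` and `ten` side by side makes it easier to judge which level of detail is more informative for a given data set.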
Stem and Leaf Displays
Used when seeing all of the values is important.
Shows the data grouped, all of the values, and a visual summary.
Stem and Leaf Display
Reading time data (i = .05, #i = 10)
Reading Time    Leaves
2.9             5,5,6,6,6,6,8,8,9
2.9             0,0,1,2,3,3,3
2.8             5,5,5,5,5,6,6,6,7,7,7,7,7,7,7,8,9,9,9,9
2.8             0,0,1,2,3,3,3,3,4,4,4
2.7             5,5,5,5,6,6,6,8,9,9
2.7             0,0,0,1,2,3,3,3,4,4
2.6             5,6,6,6
2.6             0,1,1,1,2,3,3,4
2.5             6,6,8,8,8,8,8,9,9,9
2.5             0,1,1,1,2,2,2,4,4,4,4
Stem and Leaf Display
Reading time data (i = .1, #i = 5)
Reading Time    Leaves
2.9             0,0,1,2,3,3,3,5,5,6,6,6,6,8,8,9
2.8             0,0,1,2,3,3,3,3,4,4,4,5,5,5,5,5,6,6,6,7,7,7,7,7,7,7,8,9,9,9,9
2.7             0,0,0,1,2,3,3,3,4,4,5,5,5,5,6,6,6,8,9,9
2.6             0,1,1,1,2,3,3,4,5,6,6,6
2.5             0,1,1,1,2,2,2,4,4,4,4,6,6,8,8,8,8,8,9,9,9
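A stem-and-leaf display with one row per stem can be built by splitting each value into a stem (here, the first two digits) and a leaf (the hundredths digit). The sketch below uses only the first 14 reading times to stay short; `stem_and_leaf` is a hypothetical helper name.

```python
from collections import defaultdict

# First 14 reading times from the example data (a subset, to keep this short).
times = [2.72, 2.84, 2.63, 2.51, 2.54, 2.98, 2.61,
         2.93, 2.87, 2.76, 2.58, 2.66, 2.86, 2.86]

def stem_and_leaf(values):
    """Map each stem (e.g. '2.8') to its sorted leaves (the hundredths digit)."""
    stems = defaultdict(list)
    for v in values:
        hundredths = round(v * 100)            # integers avoid float error
        stem, leaf = divmod(hundredths, 10)    # 284 -> stem 28, leaf 4
        stems[f"{stem / 10:.1f}"].append(leaf)
    # Largest stem first, leaves in ascending order.
    return {s: sorted(leaves) for s, leaves in sorted(stems.items(), reverse=True)}

display = stem_and_leaf(times)
for stem, leaves in display.items():
    print(stem, "|", ",".join(str(leaf) for leaf in leaves))
```

Every original value can still be read back from the display, which is the advantage over a grouped frequency table.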
Transition to Histograms
[Figure: the leaves of the stem-and-leaf display stacked in columns above the intervals 2.50-2.54 through 2.95-2.99]
Histogram of reading times
[Figure: histogram; x-axis: Reading Time (seconds), classes 2.50-2.54 through 2.95-2.99; y-axis: Frequency, 0 to 20]
Histogram concepts - 1
Used to display continuous data.
Discrete data are shown on a bar graph.
But most psychology data are continuous, even if they are measured with integers.
Histogram concepts - 2
Use bar graphs, not histograms, for discrete data.
You rarely see data that is really discrete.
Discrete data are categories or rankings.
If you have continuous data, you can use histograms, but remember real class limits.
Histograms can be used for relative frequencies as well.
What are the real limits of each class?
[Figure: histogram of reading times; x-axis: classes 2.50-2.54 through 2.95-2.99; y-axis: Frequency, 0 to 20]
Real limits of the fifth class are ???? - ????
Real limits of the highest class are ???? - ????
What are the real limits of each class?
Real limits of the fifth class are 2.695 - 2.745.
Real limits of the highest class are 2.945 - 2.995.
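Real limits extend half a measurement unit below the stated lower limit and half a unit above the stated upper limit. A minimal sketch, assuming the reading times are measured to the nearest hundredth of a second; `real_limits` is a hypothetical helper, and `Decimal` is used so the half-unit arithmetic is exact.

```python
from decimal import Decimal

def real_limits(lower, upper, unit="0.01"):
    """Return (real lower limit, real upper limit) for a stated class interval.

    lower, upper, and unit are decimal strings; unit is the measurement unit.
    """
    half = Decimal(unit) / 2
    return Decimal(lower) - half, Decimal(upper) + half

fifth = real_limits("2.70", "2.74")      # fifth class from the bottom
highest = real_limits("2.95", "2.99")    # highest class

print("fifth class:", fifth[0], "-", fifth[1])
print("highest class:", highest[0], "-", highest[1])
```

Because the upper real limit of one class equals the lower real limit of the next, the bars of a histogram drawn at real limits touch.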
Predicting from Theoretical Distributions
Theoretical distributions show how scores can be expected to be distributed around the mean. (Mean = 2.755 for reading data.)
Distributions are named after the shapes of their histograms:
Rectangular
J-shaped
Bell (Normal)
many others
Rectangular Distribution of scores
Flipping a coin
100 flips - how many heads and tails do you expect?
[Figure: bar graph; x-axis: Heads, Tails; y-axis: 0 to 100]
Rolling a die
120 rolls - how many of each number do you expect?
[Figure: bar graph; x-axis: 1 through 6; y-axis: 0 to 100]
Rolling 2 dice
How many combinations are possible?
Dice Total    Absolute Freq.    Relative Frequency
1             0                 .000
2             1                 .028
3             2                 .056
4             3                 .083
5             4                 .111
6             5                 .139
7             6                 .167
8             5                 .139
9             4                 .111
10            3                 .083
11            2                 .056
12            1                 .028
Total         36                1.001
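The two-dice table can be checked by enumerating every ordered pair of faces. This sketch uses `itertools.product` and `collections.Counter` from the standard library; the variable names are illustrative.

```python
from collections import Counter
from itertools import product

# Every ordered pair of faces for two six-sided dice.
rolls = list(product(range(1, 7), repeat=2))
totals = Counter(a + b for a, b in rolls)

n_combinations = len(rolls)
absolute = [totals.get(t, 0) for t in range(1, 13)]   # totals 1 through 12
relative = [round(f / n_combinations, 3) for f in absolute]

for total, f, p in zip(range(1, 13), absolute, relative):
    print(f"{total:>2}  {f}  {p:.3f}")

# Rounded proportions sum to slightly more than 1 here.
print(n_combinations, round(sum(relative), 3))
```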
Rolling 2 dice
360 rolls - how many of each number do you expect?
[Figure: bar graph; x-axis: dice totals 1 through 12; y-axis: 0 to 100]
Normal Curve
J Curve
Occurs when socially normative behaviors are measured.
Most people follow the norm, but there are always a few outliers.
Principles of Theoretical Curves
Expected frequency = Theoretical relative frequency * N
Expected frequencies are your best estimates because they are closer, on the average, than any other estimate when we square the error.
Law of Large Numbers - The more observations that we have, the closer the relative frequencies should come to the theoretical distribution.
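The expected frequency rule can be applied to the coin, die, and two-dice examples. In this sketch `expected_frequencies` is a hypothetical helper, `Fraction` keeps the arithmetic exact, and the expression `6 - abs(t - 7)` is a shortcut assumed here for the number of ways to roll a total t with two dice.

```python
from fractions import Fraction

def expected_frequencies(theoretical, n):
    """Expected frequency = theoretical relative frequency * N."""
    return {outcome: p * n for outcome, p in theoretical.items()}

# Theoretical relative frequencies for each example.
coin = {"Heads": Fraction(1, 2), "Tails": Fraction(1, 2)}
die = {face: Fraction(1, 6) for face in range(1, 7)}
two_dice = {t: Fraction(6 - abs(t - 7), 36) for t in range(2, 13)}

coin_100 = expected_frequencies(coin, 100)     # 100 flips
die_120 = expected_frequencies(die, 120)       # 120 rolls of one die
dice_360 = expected_frequencies(two_dice, 360) # 360 rolls of two dice

for name, table in [("coin", coin_100), ("die", die_120), ("two dice", dice_360)]:
    print(name, {k: int(v) for k, v in table.items()})
```

Observed counts from real flips or rolls will usually differ from these values; by the Law of Large Numbers, the observed relative frequencies should come closer to the theoretical ones as N grows.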
Q & A: Continuous data
HOW IS THE FACT THAT WE ARE DISPLAYING CONTINUOUS DATA SHOWN ON A HISTOGRAM AS OPPOSED TO A BAR GRAPH?
The bars on a histogram meet at the real limits of each interval.
IF DATA CAN ONLY BE INTEGERS (SUCH AS NUMBER OF TRUE/FALSE QUESTIONS ANSWERED CORRECTLY ON A PSYCH QUIZ), HOW COME IT IS CALLED CONTINUOUS DATA?
Whether data is continuous or discrete depends on what you are measuring, not the accuracy of your measuring instrument. For example, distance is continuous whether you measure it with a yardstick or a micrometer. Knowledge, like self-confidence and other psychological variables, is probably best thought of as a continuous variable.
Determining “i” (the size of the interval)
WHAT IS THE RULE FOR DETERMINING THE SIZE OF THE INTERVALS IN WHICH TO GROUP DATA?
Use whatever intervals seem appropriate to present the data most informatively. It is a matter of judgment. Usually we use 6 - 12 same-size intervals, each of which uses intuitively obvious endpoints (e.g., 5s and 0s).