CHAPTER 13 – UNIVARIATE DATA
WHAT IS THIS TOPIC ABOUT?
• In this topic, we look at data sets (i.e. groups of numbers) and apply mathematical tests to the data to learn about it
• A data set is a group of numbers that we find from research e.g. survey results, observations of the world
• ‘Univariate’ means that there is one (‘uni’) variable (‘variate’)
• A variable is something that varies or changes
• We measure this variable in order to learn about whatever we are researching
13A: MEASURES OF CENTRAL TENDENCY
• Measures of central tendency are methods that we use to look at the middle or centre point of the data we have collected through research
• There are three different ways of doing this:
• Mean: the average of all observations in a set of data ($\bar{x}$)
• Median: the middle observation in a set of data that is put in order
• Mode: the most frequent/common observation in a data set
• Grouped vs ungrouped data sets
• Ungrouped means each individual data observation is looked at within the data set
• Grouped means that the data has been put into different groups or intervals, rather than looking at each data observation separately
SYMBOLS AND TABLES
• Frequency table: used to count the number of times something is observed
• ‘Frequency’ just means the number of observations
• $\Sigma$ means ‘sum’ or ‘total’
• $\bar{x}$ is called ‘x bar’ and is the symbol for ‘mean’ or ‘average’
Observation Frequency
Red cars 2
Blue cars 5
Yellow cars 3
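A frequency table like this can be built by counting raw observations. The sketch below is only an illustration in Python: the list `observations` is made up for the example (only its counts match the table above), and `collections.Counter` does the tallying.

```python
from collections import Counter

# Hypothetical raw observations; only the counts are taken from the table above.
observations = ["blue", "red", "blue", "yellow", "blue",
                "yellow", "red", "blue", "yellow", "blue"]

# Counter maps each observation to the number of times it appears (its frequency).
frequency = Counter(observations)

for colour in ["red", "blue", "yellow"]:
    print(f"{colour.capitalize()} cars", frequency[colour])
```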
THE FOLLOWING APPLIES TO UNGROUPED DATA SETS
MEAN ($\bar{x}$)
To find the mean (average) of the data set:
1. Add all the observations/scores in the data set together (they do not have to be in order)
2. Divide by the number of observations/scores
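We can write this as:
$\text{mean} = \frac{\text{sum of all the scores}}{\text{number of scores}}$
Or, as:
$\bar{x} = \frac{\sum x}{n}$
where x is the scores
and n is the number of scores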
FINDING THE MEAN: WORKED EXAMPLE
Find the mean of the data set: 6, 2, 4, 3, 4, 5, 4, 5
1. Add the observations/scores together (in other words, find $\sum x$, which is the total/sum of the scores)
$\sum x$ = 6 + 2 + 4 + 3 + 4 + 5 + 4 + 5
= 33
2. Divide by the number of scores (n)
There are 8 scores in this data set (n = 8)
$\bar{x} = \frac{\sum x}{n} = \frac{33}{8}$
= 4.125
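The two steps above can also be checked with a few lines of Python. This is a minimal sketch, not part of the textbook method; the variable names are chosen here for illustration.

```python
# Scores from the worked example above.
scores = [6, 2, 4, 3, 4, 5, 4, 5]

total = sum(scores)   # step 1: add the scores together (sum of x)
n = len(scores)       # the number of scores
mean = total / n      # step 2: divide the total by n

print(total, n, mean)  # 33 8 4.125
```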
MEDIAN
To find the median (middle/centre score) of the data set:
1. Arrange the scores in numerical order (smallest to biggest is the easiest way)
2. Put one finger on the smallest score, and a finger on the biggest score, and move your fingers inward one number at a time until they meet at the middle score
3. If there are an odd number of scores, the median is the middle score
4. If there are an even number of scores, find the mean/average ($\bar{x}$) of the two middle scores
FINDING THE MEDIAN: WORKED EXAMPLE
Find the median of the data set: 6, 2, 4, 3, 4, 5, 4, 5, 3
1. Put the scores in numerical order:
2, 3, 3, 4, 4, 4, 5, 5, 6
2. Working inwards from the smallest and biggest numbers, we find that the middle score is 4
Therefore, the median of this data set is 4.
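As a rough check, the same steps can be sketched in Python. The function name `median` is chosen here for illustration; it sorts the scores, then takes the middle score (odd number of scores) or the mean of the two middle scores (even number of scores).

```python
def median(scores):
    """Return the middle score of a data set."""
    ordered = sorted(scores)          # step 1: put the scores in numerical order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                    # odd number of scores: one middle score
        return ordered[middle]
    # even number of scores: mean of the two middle scores
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([6, 2, 4, 3, 4, 5, 4, 5, 3]))  # 4
print(median([6, 2, 4, 3, 4, 5, 4, 5]))     # 4.0 (the two middle scores are 4 and 4)
```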
MODE
To find the mode (most frequent/common score) of the data set:
1. Work through the data set and record how many times each score appears (it might be easier to put them in order first to ensure you don’t miss any)
2. Whichever score appears most frequently/commonly is the mode
Note:
• Sometimes there is no mode – each score appears once only
• Sometimes there is one clear mode – one number that appears most frequently/commonly
• Sometimes there is more than one mode
FINDING THE MODE: WORKED EXAMPLE
Find the mode of the data set: 6, 2, 4, 3, 4, 5, 4, 5, 3
1. (optional, but useful) Put the scores in numerical order:
2, 3, 3, 4, 4, 4, 5, 5, 6
2. Determine which number (or numbers) appear most commonly
In this case, the mode is 4 (it appears three times in this data set)
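Counting how often each score appears can be sketched in Python as follows. The function name `modes` is an assumption made for this example; it returns a list so that it can cover all three cases in the note above (no mode, one mode, more than one mode), and it assumes the data set is not empty.

```python
from collections import Counter

def modes(scores):
    """Return a list of the most frequent score(s); empty if every score appears once."""
    counts = Counter(scores)              # step 1: how many times each score appears
    highest = max(counts.values())
    if highest == 1:
        return []                         # no mode: each score appears once only
    # step 2: keep every score that appears the highest number of times
    return sorted(score for score, count in counts.items() if count == highest)

print(modes([6, 2, 4, 3, 4, 5, 4, 5, 3]))  # [4]
print(modes([1, 1, 2, 2, 3]))              # [1, 2]
print(modes([1, 2, 3]))                    # []
```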
CALCULATING THE MEAN, MEDIAN AND MODE FROM A FREQUENCY TABLE
• First, we draw up a table with four columns: Score (x), Frequency (f), Frequency x score (fx), Cumulative frequency (cf)
• We find the MEAN using this formula:
$\bar{x} = \frac{\sum fx}{n}$
f = frequency, x = the scores, n = the total number of scores (the sum of the frequencies)
• We find the MEDIAN by finding the position of each score in the cumulative frequency column
We then use the formula $\frac{n + 1}{2}$ to find where (at what position) the median will appear, and read this score off the cf column
• We find the MODE by looking for the score with the highest frequency
WORKED EXAMPLE: FREQUENCY TABLE
• This is what the question might look like: Find the mean, median and mode of the data set below.
• If you were to write this data out as a list, it would be:
4, 5, 5, 6, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8 (i.e. one 4, two 5s, five 6s, four 7s, three 8s)
Score (x) Frequency (f)
4 1
5 2
6 5
7 4
8 3
Total n
• Draw up this table, but add these two extra columns:
Score (x)   Frequency (f)   Frequency x score (fx)   Cumulative frequency (cf)
4           1
5           2
6           5
7           4
8           3
Total       n               $\sum fx$
• Fill in all the data
Score (x)   Frequency (f)               Frequency x score (fx)                  Cumulative frequency (cf)
4           1                           4 x 1 = 4                               1
5           2                           5 x 2 = 10                              1 + 2 = 3
6           5                           6 x 5 = 30                              3 + 5 = 8
7           4                           7 x 4 = 28                              8 + 4 = 12
8           3                           8 x 3 = 24                              12 + 3 = 15
Total       1 + 2 + 5 + 4 + 3, n = 15   4 + 10 + 30 + 28 + 24, $\sum fx$ = 96   (not needed)
• In the fx column, multiply the score by the frequency
• In the cf column, add the frequencies together from one row to the next (the first number will always be the first frequency)
• Use the data to find the mean, median and mode
Score (x)   Frequency (f)   Frequency x score (fx)   Cumulative frequency (cf)
4           1               4                        1
5           2               10                       3
6           5               30                       8
7           4               28                       12
8           3               24                       15
Total       n = 15          $\sum fx$ = 96
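• MEAN
Use the formula: $\bar{x} = \frac{\sum fx}{n} = \frac{96}{15} = 6.4$
• MEDIAN
Locate the position of the median using $\frac{n + 1}{2}$
Median position = $\frac{15 + 1}{2}$
= 8, which means that the median is the 8th score
Use the cf column to find the 8th score: the cf first reaches 8 in the row for score 6, so the median is 6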
• MODE
The score with the highest frequency is 6, therefore 6 is the mode
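The whole frequency table calculation can be sketched in Python as a cross-check. The names `table` and `score_at` are assumptions made for this example; `score_at` reads a position off the running cumulative frequency, which mirrors the use of the cf column above, and the median averages two positions so that an even n is also handled.

```python
import math

# (score x, frequency f) pairs from the worked example above.
table = [(4, 1), (5, 2), (6, 5), (7, 4), (8, 3)]

n = sum(f for _, f in table)            # total number of scores
sum_fx = sum(x * f for x, f in table)   # total of the fx column
mean = sum_fx / n

def score_at(position):
    """Read the score at a given position off the cumulative frequency column."""
    cf = 0
    for x, f in table:
        cf += f                          # running cumulative frequency
        if position <= cf:
            return x
    raise ValueError("position is beyond the last score")

median_position = (n + 1) / 2
# If n is even the position ends in .5, so average the scores on either side.
median = (score_at(math.floor(median_position))
          + score_at(math.ceil(median_position))) / 2

highest = max(f for _, f in table)
modes = [x for x, f in table if f == highest]   # score(s) with the highest frequency

print(n, sum_fx, mean, median_position, median, modes)  # 15 96 6.4 8.0 6.0 [6]
```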
QUESTIONS (UNGROUPED DATA)
Exercise 13A page 435 questions:
• 1acd
• 2 (stem and leaf plot – see the note below)
• 3 (frequency tables)
• 13abcd
Note on stem and leaf plots: the stem is the first digit, and the leaves are the second digit, so for Science, 8 7 3 ǀ 3 becomes 38, 37 and 33.
For Maths, 4 ǀ 0 6 8 becomes 40, 46 and 48.
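Reading a stem and leaf plot can be sketched in Python like this. The helper name `expand` is made up for the example, and it assumes two-digit scores where the stem is the tens digit and each leaf is a units digit.

```python
def expand(stem, leaves):
    """Turn one stem and its leaves into full scores (stem = tens, leaf = units)."""
    return [10 * stem + leaf for leaf in leaves]

science = expand(3, [8, 7, 3])   # Science row: stem 3 with leaves 8, 7, 3
maths = expand(4, [0, 6, 8])     # Maths row: stem 4 with leaves 0, 6, 8

print(science)  # [38, 37, 33]
print(maths)    # [40, 46, 48]
```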
GROUPED DATA
• When data is grouped, we lose the original values, because instead of having individual numbers, we are given an interval or group (e.g. 0-10)
• Therefore, we need to estimate the mean, median and mode using different methods
THE FOLLOWING APPLIES TO GROUPED DATA SETS
Mean
• With class intervals, the individual values are lost.
• Use midpoints of the intervals into which these values fall.
• For example, when measuring heights of students in a class, if we found that 4 students had a height between 180 and 185 cm, we have to assume that each of those 4 students is 182.5 cm tall.
• The formula used for calculating the mean is the same as for data presented in a frequency table: $\bar{x} = \frac{\sum fx}{n}$
• Here x represents the midpoint (or class centre) of each class interval, f is the corresponding frequency and n is the total number of observations in a set.
Median
• The median is found by drawing a cumulative frequency polygon (ogive) of the data and estimating the median from the 50th percentile.
Modal class
• We do not find a mode because exact scores are lost. We can, however, find a modal class. This is the class interval that has the highest frequency.
WORKED EXAMPLE: GROUPED DATA
STEP 1
• Draw up this table but add in three columns: ‘midpoint’, ‘midpoint x frequency’ and ‘cumulative frequency’ (the blue is the stuff I have added)
Class interval   Midpoint (x)   Frequency (f)   Midpoint x frequency (fx)   Cumulative frequency (cf)
60-<70                          5
70-<80                          7
80-<90                          10
90-<100                         12
100-<110                        8
110-<120                        3
Total            (not needed)   45 (n)          $\sum fx$                   (not needed)
STEP 2
• Fill in the data
Class interval   Midpoint (x)   Frequency (f)   Midpoint x frequency (fx)   Cumulative frequency (cf)
60-<70           65             5               65 x 5 = 325                5
70-<80           75             7               75 x 7 = 525                5 + 7 = 12
80-<90           85             10              85 x 10 = 850               12 + 10 = 22
90-<100          95             12              95 x 12 = 1140              22 + 12 = 34
100-<110         105            8               105 x 8 = 840               34 + 8 = 42
110-<120         115            3               115 x 3 = 345               42 + 3 = 45
Total            (not needed)   45 (n)          $\sum fx$ = 4025            (not needed)
• Midpoint (x) means the mid point of the class interval (i.e. the middle number between 60 and 70 is 65, etc.)
STEP 3
• Use the data to find the mean, modal class and median
Class interval   Midpoint (x)   Frequency (f)   Midpoint x frequency (fx)   Cumulative frequency (cf)
60-<70           65             5               325                         5
70-<80           75             7               525                         12
80-<90           85             10              850                         22
90-<100          95             12              1140                        34
100-<110         105            8               840                         42
110-<120         115            3               345                         45
Total                           45 (n)          $\sum fx$ = 4025
• MEAN
Use the formula: $\bar{x} = \frac{\sum fx}{n} = \frac{4025}{45} \approx 89.4$
Therefore, we can say that the mean is ≈ 89.4
(use a wavy equals sign to show that it is approximate, as we had to use intervals rather than individual data)
• MODAL CLASS
The interval with the highest frequency is
90-<100, which is the modal class.
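The estimated mean and the modal class can be cross-checked with a short Python sketch. The list `classes` and the (lower, upper, frequency) layout are assumptions made for this example; the midpoint of each interval stands in for the lost individual values, as described above.

```python
# (lower bound, upper bound, frequency f) for each class interval in the example.
classes = [(60, 70, 5), (70, 80, 7), (80, 90, 10),
           (90, 100, 12), (100, 110, 8), (110, 120, 3)]

n = sum(f for _, _, f in classes)                            # total frequency
sum_fx = sum((lo + hi) / 2 * f for lo, hi, f in classes)     # midpoint x frequency, summed
mean = sum_fx / n                                            # estimated mean

# Modal class: the interval with the highest frequency.
lo, hi, _ = max(classes, key=lambda c: c[2])
modal_class = f"{lo}-<{hi}"

print(n, sum_fx, round(mean, 1), modal_class)  # 45 4025.0 89.4 90-<100
```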
MEDIAN
1. Draw a combined cumulative frequency histogram (bar graph)
The mid points for each interval go along the bottom (x) axis, and the cumulative frequency (cf) up along the y axis
2. Draw a dot on each corner where the bars meet, and connect the dots with a line (this is called the ogive)
3. Find the middle of the cf axis (which is the last cf value divided by 2: 45 ÷ 2 = 22.5)
4. Draw a horizontal line at this point and see where it meets the ogive
5. Draw a vertical line down to meet the data (x) axis
6. This is the approximate median, so we say that the median ≈ 90
(again, use the wavy equals sign)
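Reading the median off the ogive can be imitated numerically. The sketch below is one possible approach under stated assumptions, not the graphical method itself: it assumes the dots sit at the upper end of each class interval, that the ogive starts at (60, 0), and that the dots are joined by straight lines, so the crossing point is found by linear interpolation.

```python
# Upper end of each class interval and the matching cumulative frequency.
upper_bounds = [70, 80, 90, 100, 110, 120]
cf = [5, 12, 22, 34, 42, 45]

# Assumed ogive points: start at (60, 0), then one dot per class interval.
points = [(60, 0)] + list(zip(upper_bounds, cf))

target = cf[-1] / 2   # middle of the cf axis: 45 / 2 = 22.5

median = None
for (x0, y0), (x1, y1) in zip(points, points[1:]):
    if y0 <= target <= y1:
        # Straight-line (linear) interpolation between two neighbouring dots.
        median = x0 + (target - y0) / (y1 - y0) * (x1 - x0)
        break

print(round(median, 1))  # 90.4
```

The result agrees with the graphical estimate of median ≈ 90.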
QUESTIONS (GROUPED DATA)
Exercise 13A page 435 questions:
• 5
• 8 (multiple choice abcd)