CHAPTER 13 – UNIVARIATE DATA

Upload: dwayne-terry

Post on 24-Dec-2015

WHAT IS THIS TOPIC ABOUT?

• In this topic, we look at data sets (i.e. groups of numbers) and we apply mathematical tests to the data to learn about it

• A data set is a group of numbers that we find from research e.g. survey results, observations of the world

• ‘Univariate’ means that there is one (‘uni’) variable (‘variate’)

• A variable is something that varies or changes

• We measure this variable in order to learn about whatever we are researching


13A: MEASURES OF CENTRAL TENDENCY

• Measures of central tendency are methods that we use to look at the middle or centre point of the data we have collected through research

• There are three different ways of doing this:

• Mean: the average of all observations in a set of data ($\bar{x}$)

• Median: the middle observation in a set of data that is put in order

• Mode: the most frequent/common observation in a data set

• Grouped vs ungrouped data sets

• Ungrouped means each individual data observation is looked at within the data set

• Grouped means that the data has been put into different groups or intervals, rather than looking at each data observation separately


SYMBOLS AND TABLES

• Frequency table: used to count the number of times something is observed

• ‘Frequency’ just means the number of observations

• $\Sigma$ means ‘sum’ or ‘total’

• $\bar{x}$ is called ‘x bar’ and is the symbol for ‘mean’ or ‘average’

| Observation | Frequency |
|---|---|
| Red cars | 2 |
| Blue cars | 5 |
| Yellow cars | 3 |
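As a minimal sketch (in Python, which the slides themselves do not use), a frequency table like this one can be built by counting raw observations. The list `observations` is hypothetical data, chosen only so that the counts match the car table.

```python
from collections import Counter

# Hypothetical raw observations, chosen to match the car frequency table.
observations = ["red"] * 2 + ["blue"] * 5 + ["yellow"] * 3

# Counter tallies how many times each observation appears.
frequency = Counter(observations)

for colour, count in frequency.items():
    print(f"{colour.capitalize()} cars: {count}")
```

Each count is the frequency of that observation, and the counts add up to the total number of observations.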


THE FOLLOWING APPLIES TO UNGROUPED DATA SETS


MEAN ($\bar{x}$)

To find the mean (average) of the data set:

1. Add all the observations/scores in the data set together (they do not have to be in order)

2. Divide by the number of observations/scores

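We can write this as:

$$\bar{x} = \frac{\text{sum of all scores}}{\text{number of scores}}$$

Or, as:

$$\bar{x} = \frac{\sum x}{n}$$

where x represents the scores

and n is the number of scores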


FINDING THE MEAN: WORKED EXAMPLE

Find the mean of the data set: 6, 2, 4, 3, 4, 5, 4, 5

1. Add the observations/scores together (in other words, find $\sum x$, which is the total/sum of the scores)

$\sum x$ = 6 + 2 + 4 + 3 + 4 + 5 + 4 + 5

= 33

2. Divide by the number of scores (n)

There are 8 scores in this data set (n = 8)

$\bar{x} = \frac{\sum x}{n} = \frac{33}{8}$

= 4.125
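The same two steps can be checked with a short Python sketch. The variable names are illustrative only.

```python
scores = [6, 2, 4, 3, 4, 5, 4, 5]

total = sum(scores)  # step 1: add all the scores together
n = len(scores)      # the number of scores
mean = total / n     # step 2: divide the total by the number of scores

print(total, n, mean)  # 33 8 4.125
```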


MEDIAN

To find the median (middle/centre score) of the data set:

1. Arrange the scores in numerical order (smallest to biggest is the easiest way)

2. Put one finger on the smallest score, and a finger on the biggest score, and move your fingers inward one number at a time until they meet at the middle score

3. If there is an odd number of scores, the median is the middle score

4. If there is an even number of scores, find the mean/average ($\bar{x}$) of the two middle scores


FINDING THE MEDIAN: WORKED EXAMPLE

Find the median of the data set: 6, 2, 4, 3, 4, 5, 4, 5, 3

1. Put the scores in numerical order:

2, 3, 3, 4, 4, 4, 5, 5, 6

2. Working inwards from the smallest and biggest numbers, we find that the middle score is 4

Therefore, the median of this data set is 4.
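The median steps, including the odd and even cases, can be sketched in Python as follows. The function name `median` is illustrative, and the sketch assumes the data set is not empty.

```python
def median(scores):
    """Return the middle score of a non-empty list of numbers."""
    ordered = sorted(scores)  # step 1: put the scores in numerical order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of scores: the median is the single middle score.
        return ordered[middle]
    # Even number of scores: average the two middle scores.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([6, 2, 4, 3, 4, 5, 4, 5, 3]))  # 4
print(median([1, 2, 3, 4]))                 # 2.5
```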


MODE

To find the mode (most frequent/common score) of the data set:

1. Work through the data set and record how many times each score appears (it might be easier to put them in order first to ensure you don’t miss any)

2. Whichever score appears most frequently/commonly is the mode

Note:

• Sometimes there is no mode – each score appears once only

• Sometimes there is one clear mode – one number that appears most frequently/commonly

• Sometimes there is more than one mode


FINDING THE MODE: WORKED EXAMPLE

Find the mode of the data set: 6, 2, 4, 3, 4, 5, 4, 5, 3

1. (optional, but useful) Put the scores in numerical order:

2, 3, 3, 4, 4, 4, 5, 5, 6

2. Determine which number (or numbers) appear most commonly

In this case, the mode is 4 (it appears three times in this data set)
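A small Python sketch can cover the three cases noted earlier (no mode, one mode, more than one mode). The function name `modes` is illustrative. Returning an empty list when every score appears only once is a choice made for this sketch, and the data set is assumed not to be empty.

```python
from collections import Counter


def modes(scores):
    """Return the most frequent score(s) of a non-empty list, or [] if there is no mode."""
    counts = Counter(scores)  # how many times each score appears
    highest = max(counts.values())
    if highest == 1:
        return []  # each score appears once only, so there is no mode
    return sorted(score for score, count in counts.items() if count == highest)


print(modes([6, 2, 4, 3, 4, 5, 4, 5, 3]))  # [4]
print(modes([1, 2, 3]))                    # []
print(modes([1, 1, 2, 2, 3]))              # [1, 2]
```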


CALCULATING THE MEAN, MEDIAN AND MODE FROM A FREQUENCY TABLE

• First, we draw up a table with four columns: Score (x), Frequency (f), Frequency x score (fx), Cumulative frequency (cf)

• We find the MEAN using this formula:

$$\bar{x} = \frac{\sum fx}{n}$$

f = frequency, x = the scores, n = the total number of scores

• We find the MEDIAN by finding the position of each score in the cumulative frequency column

We then use the formula $\frac{n + 1}{2}$ to find where (at what position) the median will appear, and read this score off the cf column

• We find the MODE by looking for the score with the highest frequency


WORKED EXAMPLE: FREQUENCY TABLE

• This is what the question might look like: Find the mean, median and mode of the data set below.

• If you were to write this data out as a list, it would be:

4, 5, 5, 6, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8 (i.e. one 4, two 5s, five 6s, four 7s, three 8s)

| Score (x) | Frequency (f) |
|---|---|
| 4 | 1 |
| 5 | 2 |
| 6 | 5 |
| 7 | 4 |
| 8 | 3 |
| Total | n |


WORKED EXAMPLE: FREQUENCY TABLE

• Draw up this table, but add these two extra columns:

| Score (x) | Frequency (f) | Frequency x score (fx) | Cumulative frequency (cf) |
|---|---|---|---|
| 4 | 1 | | |
| 5 | 2 | | |
| 6 | 5 | | |
| 7 | 4 | | |
| 8 | 3 | | |
| Total | n | $\sum fx$ | |


WORKED EXAMPLE: FREQUENCY TABLE

• Fill in all the data

| Score (x) | Frequency (f) | Frequency x score (fx) | Cumulative frequency (cf) |
|---|---|---|---|
| 4 | 1 | 4 x 1 = 4 | 1 |
| 5 | 2 | 5 x 2 = 10 | 1 + 2 = 3 |
| 6 | 5 | 6 x 5 = 30 | 3 + 5 = 8 |
| 7 | 4 | 7 x 4 = 28 | 8 + 4 = 12 |
| 8 | 3 | 8 x 3 = 24 | 12 + 3 = 15 |
| Total | 1 + 2 + 5 + 4 + 3, so n = 15 | 4 + 10 + 30 + 28 + 24, so $\sum fx$ = 96 | (not needed) |

In the fx column, multiply the score by the frequency

In the cf column, add the frequencies together from one row to the next (the first number will always be the first frequency)
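As a rough sketch, the fx and cf columns of this table can be computed in Python. The list names are illustrative, and `itertools.accumulate` is used here to produce the running total for the cf column.

```python
from itertools import accumulate

scores = [4, 5, 6, 7, 8]       # Score (x)
frequencies = [1, 2, 5, 4, 3]  # Frequency (f)

# fx column: multiply each score by its frequency.
fx = [x * f for x, f in zip(scores, frequencies)]

# cf column: running total of the frequencies, row by row.
cf = list(accumulate(frequencies))

n = sum(frequencies)  # total of the f column
sum_fx = sum(fx)      # total of the fx column

print(fx)         # [4, 10, 30, 28, 24]
print(cf)         # [1, 3, 8, 12, 15]
print(n, sum_fx)  # 15 96
```

Note that the last cf value always equals n, which is a quick way to check the column.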


WORKED EXAMPLE: FREQUENCY TABLE

• Use the data to find the mean, median and mode

| Score (x) | Frequency (f) | Frequency x score (fx) | Cumulative frequency (cf) |
|---|---|---|---|
| 4 | 1 | 4 | 1 |
| 5 | 2 | 10 | 3 |
| 6 | 5 | 30 | 8 |
| 7 | 4 | 28 | 12 |
| 8 | 3 | 24 | 15 |
| Total | n = 15 | $\sum fx$ = 96 | |

• MEAN

Use the formula:

$$\bar{x} = \frac{\sum fx}{n} = \frac{96}{15} = 6.4$$

• MEDIAN

Locate the position of the median using $\frac{n + 1}{2}$

Median position = $\frac{15 + 1}{2}$

= 8, which means that the median is the 8th score

Use the cf column to find the 8th score, which is 6

• MODE

The score with the highest frequency is 6, therefore 6 is the mode
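The three results for this frequency table can be reproduced with a Python sketch. It assumes the median position is (n + 1) / 2, which gives the 8th score when n = 15. The helper `score_at` is a hypothetical name, and averaging the two neighbouring scores when the position is not a whole number is an addition made for this sketch.

```python
import math
from itertools import accumulate

scores = [4, 5, 6, 7, 8]
frequencies = [1, 2, 5, 4, 3]
cumulative = list(accumulate(frequencies))  # the cf column
n = cumulative[-1]


def score_at(position):
    """Return the score at a 1-based position in the ordered data, read off the cf column."""
    for score, cf in zip(scores, cumulative):
        if position <= cf:
            return score
    raise ValueError("position is outside the data set")


mean = sum(x * f for x, f in zip(scores, frequencies)) / n

median_position = (n + 1) / 2
# For an even n the position ends in .5, so average the scores either side of it.
median = (score_at(math.floor(median_position)) + score_at(math.ceil(median_position))) / 2

# If two scores tie for the highest frequency, this returns only the first one.
mode = max(zip(scores, frequencies), key=lambda pair: pair[1])[0]

print(mean, median_position, median, mode)
```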


QUESTIONS (UNGROUPED DATA)

Exercise 13A page 435 questions:

• 1acd

• 2 (stem-and-leaf plot; see the note below)

• 3 (frequency tables)

• 13abcd

Note on stem-and-leaf plots: the stem is the first digit, and the leaves are the second digit, so for Science, 8 7 3 ǀ 3 becomes 38, 37 and 33.

For Maths, 4 ǀ 0 6 8 becomes 40, 46 and 48.
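Reading a stem-and-leaf plot back into scores can be sketched in Python. The function name `expand_stem` is hypothetical, and the sketch assumes two-digit scores, with the stem as the tens digit and each leaf as the units digit.

```python
def expand_stem(stem, leaves):
    """Combine one stem (tens digit) with each of its leaves (units digits)."""
    return [stem * 10 + leaf for leaf in leaves]


science = expand_stem(3, [8, 7, 3])  # the row 8 7 3 | 3, read outward from the stem
maths = expand_stem(4, [0, 6, 8])    # the row 4 | 0 6 8

print(science)  # [38, 37, 33]
print(maths)    # [40, 46, 48]
```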


GROUPED DATA

• When data is grouped, we lose the original values, because instead of having individual numbers, we are given an interval or group (e.g. 0-10)

• Therefore, we need to estimate the mean, median and mode using different methods

THE FOLLOWING APPLIES TO GROUPED DATA SETS


Mean

• With class intervals, the individual values are lost.

• Use midpoints of the intervals into which these values fall.

• For example, when measuring heights of students in a class, if we found that 4 students had a height between 180 and 185 cm, we have to assume that each of those 4 students is 182.5 cm tall.

• The formula used for calculating the mean is the same as for data presented in a frequency table: $\bar{x} = \frac{\sum fx}{n}$

• Here x represents the midpoint (or class centre) of each class interval, f is the corresponding frequency and n is the total number of observations in a set.

Median

• The median is found by drawing a cumulative frequency polygon (ogive) of the data and estimating the median from the 50th percentile.

Modal class

• We do not find a mode because exact scores are lost. We can, however, find a modal class. This is the class interval that has the highest frequency.


WORKED EXAMPLE: GROUPED DATA


STEP 1

• Draw up this table but add in three columns: ‘midpoint’, ‘midpoint x frequency’ and ‘cumulative frequency’ (the blue is the stuff I have added)

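| Class interval | Midpoint (x) | Frequency (f) | Midpoint x frequency (fx) | Cumulative frequency (cf) |
|---|---|---|---|---|
| 60-<70 | | 5 | | |
| 70-<80 | | 7 | | |
| 80-<90 | | 10 | | |
| 90-<100 | | 12 | | |
| 100-<110 | | 8 | | |
| 110-<120 | | 3 | | |
| Total | (not needed) | 45 (n) | $\sum fx$ | (not needed) |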


STEP 2

• Fill in the data

| Class interval | Midpoint (x) | Frequency (f) | Midpoint x frequency (fx) | Cumulative frequency (cf) |
|---|---|---|---|---|
| 60-<70 | 65 | 5 | 65 x 5 = 325 | 5 |
| 70-<80 | 75 | 7 | 75 x 7 = 525 | 5 + 7 = 12 |
| 80-<90 | 85 | 10 | 85 x 10 = 850 | 12 + 10 = 22 |
| 90-<100 | 95 | 12 | 95 x 12 = 1140 | 22 + 12 = 34 |
| 100-<110 | 105 | 8 | 105 x 8 = 840 | 34 + 8 = 42 |
| 110-<120 | 115 | 3 | 115 x 3 = 345 | 42 + 3 = 45 |
| Total | (not needed) | 45 (n) | $\sum fx$ = 4025 | (not needed) |

Midpoint (x) means the midpoint of the class interval (i.e. the middle number between 60 and 70 is 65, etc.)
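As a sketch, the three added columns can be computed in Python. Each class interval is written as a hypothetical (lower, upper) pair, so 60-<70 becomes `(60, 70)`.

```python
from itertools import accumulate

# Each class interval as a (lower, upper) pair, e.g. 60-<70 is (60, 70).
intervals = [(60, 70), (70, 80), (80, 90), (90, 100), (100, 110), (110, 120)]
frequencies = [5, 7, 10, 12, 8, 3]

# Midpoint (x): halfway between the lower and upper ends of the interval.
midpoints = [(low + high) / 2 for low, high in intervals]

# fx column: midpoint multiplied by frequency.
fx = [x * f for x, f in zip(midpoints, frequencies)]

# cf column: running total of the frequencies.
cf = list(accumulate(frequencies))

print(midpoints)  # [65.0, 75.0, 85.0, 95.0, 105.0, 115.0]
print(fx)         # [325.0, 525.0, 850.0, 1140.0, 840.0, 345.0]
print(cf)         # [5, 12, 22, 34, 42, 45]
print(sum(fx))    # 4025.0
```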


STEP 3

• Use the data to find the mean, modal class and median

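| Class interval | Midpoint (x) | Frequency (f) | Midpoint x frequency (fx) | Cumulative frequency (cf) |
|---|---|---|---|---|
| 60-<70 | 65 | 5 | 325 | 5 |
| 70-<80 | 75 | 7 | 525 | 12 |
| 80-<90 | 85 | 10 | 850 | 22 |
| 90-<100 | 95 | 12 | 1140 | 34 |
| 100-<110 | 105 | 8 | 840 | 42 |
| 110-<120 | 115 | 3 | 345 | 45 |
| Total | | 45 (n) | $\sum fx$ = 4025 | |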

• MEAN

Use the formula:

$$\bar{x} = \frac{\sum fx}{n} = \frac{4025}{45} \approx 89.4$$

Therefore, we can say that the mean is ≈ 89.4

(use a wavy equals sign to show that it is approximate, because we had to use intervals rather than individual data)

• MODAL CLASS

The interval with the highest frequency is 90-<100, which is the modal class.
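The estimated mean and the modal class for this table can be checked with a short Python sketch. The (lower, upper) pairs for the class intervals are an illustrative representation, not part of the slides.

```python
intervals = [(60, 70), (70, 80), (80, 90), (90, 100), (100, 110), (110, 120)]
frequencies = [5, 7, 10, 12, 8, 3]

n = sum(frequencies)

# Estimated mean: treat every score in a class as if it sat at the class midpoint.
sum_fx = sum((low + high) / 2 * f for (low, high), f in zip(intervals, frequencies))
mean = sum_fx / n

# Modal class: the interval with the highest frequency (the first one, if there is a tie).
modal_class = max(zip(intervals, frequencies), key=lambda pair: pair[1])[0]

print(round(mean, 1))  # 89.4
print(modal_class)     # (90, 100)
```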


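| Class interval | Midpoint (x) | Frequency (f) | Midpoint x frequency (fx) | Cumulative frequency (cf) |
|---|---|---|---|---|
| 60-<70 | 65 | 5 | 325 | 5 |
| 70-<80 | 75 | 7 | 525 | 12 |
| 80-<90 | 85 | 10 | 850 | 22 |
| 90-<100 | 95 | 12 | 1140 | 34 |
| 100-<110 | 105 | 8 | 840 | 42 |
| 110-<120 | 115 | 3 | 345 | 45 |
| Total | | 45 (n) | $\sum fx$ = 4025 | |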

MEDIAN

1. Draw a combined cumulative frequency histogram (bar graph)

The mid points for each interval go along the bottom (x) axis, and the cumulative frequency (cf) up along the y axis

2. Draw a dot on each corner where the bars meet, and connect the dots with a line (this is called the ogive)

3. Find the middle of the cf axis (the last cf value divided by 2: 45 ÷ 2 = 22.5)

4. Draw a horizontal line at this point and see where it meets the ogive

5. Draw a vertical line down to meet the data (x) axis

6. This is the approximate median, so we say that the median ≈ 90

(again, use the wavy equals sign)
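Reading the median off the ogive amounts to a straight-line interpolation between two points on the graph. The Python sketch below does this numerically rather than graphically. It assumes the ogive starts at (60, 0) and passes through the cumulative frequency at the upper end of each class (the top-right corner of each bar). The slides do not state this explicitly, so treat it as an assumption.

```python
from itertools import accumulate

boundaries = [60, 70, 80, 90, 100, 110, 120]  # class edges along the x axis
frequencies = [5, 7, 10, 12, 8, 3]

# Ogive heights: 0 at the first edge, then the cf at each upper edge (assumed).
cf = [0] + list(accumulate(frequencies))
target = cf[-1] / 2  # middle of the cf axis: 45 / 2 = 22.5

median = None
for i in range(1, len(cf)):
    if cf[i] >= target:
        # The horizontal line at `target` crosses the ogive inside this class,
        # so interpolate along the straight segment between its two end points.
        x0, x1 = boundaries[i - 1], boundaries[i]
        y0, y1 = cf[i - 1], cf[i]
        median = x0 + (target - y0) / (y1 - y0) * (x1 - x0)
        break

print(round(median, 1))  # 90.4
```

Under these assumptions the estimate rounds to 90, which agrees with the value read from the graph.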


QUESTIONS (GROUPED DATA)

Exercise 13A page 435 questions:

• 5

• 8 (multiple choice abcd)