1 complete business statistics by amir d. aczel & jayavel sounderpandian 7 th edition. prepared...
TRANSCRIPT
1
COMPLETE BUSINESS STATISTICS
by
AMIR D. ACZEL & JAYAVEL SOUNDERPANDIAN
7th edition.
Prepared by Lloyd Jaisingh, Morehead State University
2
• Using Statistics
• Percentiles and Quartiles
• Measures of Central Tendency
• Measures of Variability
• Grouped Data and the Histogram
• Skewness and Kurtosis
• Relations between the Mean and Standard Deviation
• Methods of Displaying Data
• Exploratory Data Analysis
• Using the Computer
1 Introduction and Descriptive Statistics
3
• Distinguish between qualitative data and quantitative data.
• Describe nominal, ordinal, interval, and ratio scales of measurement.
• Describe the difference between population and sample.
• Calculate and interpret percentiles and quartiles.
• Explain measures of central tendency and how to compute them.
• Create different types of charts that describe data sets.
• Use Excel templates to compute various measures and create charts.
LEARNING OBJECTIVES
After studying this chapter, you should be able to:
4
Statistics is a science that helps us make better decisions in business and economics as well as in other fields.
Statistics teaches us how to summarize, analyze, and draw meaningful inferences from data that then lead to improved decisions.
These decisions help us improve the running of, for example, a department, a company, or the entire economy.
WHAT IS STATISTICS?
5
1-1. Using Statistics (Two Categories)
Inferential Statistics
• Predict and forecast values of population parameters
• Test hypotheses about values of population parameters
• Make decisions
Descriptive Statistics
• Collect
• Organize
• Summarize
• Display
• Analyze
6
Qualitative - Categorical or Nominal. Examples are:
• Color
• Gender
• Nationality
Quantitative - Measurable or Countable. Examples are:
• Temperatures
• Salaries
• Number of points scored on a 100-point exam
Types of Data - Two Types (p.4)
7
Scales of Measurement (p.4-5)
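Each scale of measurement is listed with its definition, its characteristics, and the variables it can measure.

Nominal scale
Definition: values used to identify or classify cases.
Characteristics: (1) Only the label itself is meaningful; the magnitude of the number is not. (2) It is the lowest-level scale and cannot be converted to any other scale. (3) Values can only be named, not ranked.
Measurable variables: gender, occupation, area of residence, ID number, student number, etc.

Ordinal scale
Definition: values used to indicate rank or order.
Characteristics: (1) Indicates rank or order only; it cannot measure the size of the difference between ranks. (2) Can be reduced to a nominal scale. (3) Values can be ranked, but not added or subtracted.
Measurable variables: class rank, ordering, percentile rank, etc.

Interval scale
Definition: values used to express differences in degree.
Characteristics: (1) Has an arbitrary origin; 0 does not mean "none". (2) Can be reduced to an ordinal or nominal scale. (3) Values can be added and subtracted, but not multiplied or divided.
Measurable variables: temperature, attitude, satisfaction, agreement, importance, etc.

Ratio scale
Definition: values used to measure differences in actual quantity.
Characteristics: (1) Has an absolute origin; 0 means "none". (2) Can be reduced to an interval, ordinal, or nominal scale. (3) Values can be multiplied and divided.
Measurable variables: income, sales revenue, exam scores, corn yield, etc.

Categorical or nonmetric type
• Nominal scale
• Ordinal scale
Analytical or metric type
• Interval scale
• Ratio scale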
8
A population consists of the set of all measurements in which the investigator is interested.
A sample is a subset of the measurements selected from the population.
A census is a complete enumeration of every item in a population.
Samples and Populations (p.5)
9
Sampling from the population is often done randomly, such that every possible sample of equal size (n) will have an equal chance of being selected.
A sample selected in this way is called a simple random sample or just a random sample.
A random sample allows chance to determine its elements.
Simple Random Sample
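The definition above can be illustrated with a minimal sketch. It assumes a hypothetical population of 100 labelled items (not from the slides) and uses Python's standard `random.sample`, which draws without replacement so that every subset of size n is equally likely. The function name `simple_random_sample` and the seed are illustrative choices.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements; every size-n subset is equally likely."""
    rng = random.Random(seed)          # a seed only makes the draw repeatable
    return rng.sample(population, n)   # sampling without replacement


# Hypothetical population of N = 100 labelled items (an assumption for illustration)
population = list(range(1, 101))
sample = simple_random_sample(population, n=10, seed=1)
print(sample)
```

Leaving the seed as `None` lets chance alone determine the elements of the sample.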
10
Population (N) Sample (n)
Samples and Populations
11
A census of a population may be:
• Impossible
• Impractical
• Too costly
Why Sample?
12
Exercise (p.8, 5min)
• 1-1
• 1-4
• 1-5
13
Given any set of numerical observations, order them according to magnitude.
The Pth percentile in the ordered set is that value below which lie P% (P percent) of the observations in the set.
The position of the Pth percentile is given by (n + 1)P/100, where n is the number of observations in the set.
1-2 Percentiles and Quartiles
14
A large department store collects data on sales made by each of its salespeople. The number of sales made on a given day by each of 20 salespeople is shown on the next slide. The data have also been sorted by magnitude.
Example 1-2 (p.9)
15
Example 1-2 (Continued) - Sales and Sorted Sales
Sales   Sorted Sales
  9        6
  6        9
 12       10
 10       12
 13       13
 15       14
 16       14
 14       15
 14       16
 16       16
 17       16
 16       17
 24       17
 21       18
 22       18
 18       19
 19       20
 18       21
 20       22
 17       24
16
Find the 50th, 80th, and the 90th percentiles of this data set. To find the 50th percentile, determine the data point in position (n + 1)P/100 = (20 + 1)(50/100) = 10.5. Thus, the percentile is located at the 10.5th position. The 10th observation is 16, and the 11th observation is also 16. The 50th percentile will lie halfway between the 10th and 11th values and is thus 16.
Example 1-2 (Continued) Percentiles
17
To find the 80th percentile, determine the data point in position (n + 1)P/100 = (20 + 1)(80/100) = 16.8. Thus, the percentile is located at the 16.8th position. The 16th observation is 19, and the 17th observation is 20. The 80th percentile is a point lying 0.8 of the way from 19 to 20 and is thus 19.8.
Example 1-2 (Continued) Percentiles
18
To find the 90th percentile, determine the data point in position (n + 1)P/100 = (20 + 1)(90/100) = 18.9. Thus, the percentile is located at the 18.9th position. The 18th observation is 21, and the 19th observation is 22. The 90th percentile is a point lying 0.9 of the way from 21 to 22 and is thus 21.9.
Example 1-2 (Continued) Percentiles
Example 1-2
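The percentile calculations in Example 1-2 can be sketched in a few lines of Python. The function name `percentile` is illustrative, and clamping positions that fall outside 1..n to the smallest or largest value is an assumption, since the slides do not cover that case.

```python
def percentile(data, p):
    """Pth percentile using the position rule (n + 1)P/100 with linear interpolation."""
    values = sorted(data)
    n = len(values)
    position = (n + 1) * p / 100      # 1-based position in the ordered set
    lower = int(position)             # whole part of the position
    fraction = position - lower       # how far to move toward the next value
    if lower < 1:                     # assumption: clamp below the first observation
        return values[0]
    if lower >= n:                    # assumption: clamp above the last observation
        return values[-1]
    below, above = values[lower - 1], values[lower]
    return below + fraction * (above - below)


sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

for p in (50, 80, 90):
    print(p, round(percentile(sales, p), 2))
```

Rounding in the print statement only hides floating-point noise in the fractional position.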
19
Quartiles are the percentage points that break down the ordered data set into quarters. The first quartile is the 25th percentile. It is the point below which lie 1/4 of the data. The second quartile is the 50th percentile. It is the point below which lie 1/2 of the data. This is also called the median. The third quartile is the 75th percentile. It is the point below which lie 3/4 of the data.
Quartiles – Special Percentiles (p.10)
20
The first quartile, Q1 (25th percentile), is often called the lower quartile. The second quartile, Q2 (50th percentile), is often called the median or the middle quartile. The third quartile, Q3 (75th percentile), is often called the upper quartile. The interquartile range is the difference between the third and the first quartiles (Q3 - Q1).
Quartiles and Interquartile Range
21
Sorted Sales: 6 9 10 12 13 14 14 15 16 16 16 17 17 18 18 19 20 21 22 24

Quartile          Position (n+1)P/100        Value
First Quartile    (20+1)25/100 = 5.25        13 + (.25)(1) = 13.25
Median            (20+1)50/100 = 10.5        16 + (.5)(16 - 16) = 16
Third Quartile    (20+1)75/100 = 15.75       18 + (.75)(1) = 18.75

Example 1-3: Finding Quartiles
Basic Stat.xls
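As a cross-check outside the Excel template, the quartiles of Example 1-3 can be reproduced with Python's standard library. This is a sketch, not the template's method: `statistics.quantiles` with `method="exclusive"` happens to place cut points at position (n + 1)P/100, which matches the rule used in these slides.

```python
import statistics

sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

# n=4 asks for the three cut points that split the data into quarters
q1, q2, q3 = statistics.quantiles(sales, n=4, method="exclusive")
iqr = q3 - q1   # interquartile range

print(q1, q2, q3, iqr)
```

Other software may use a different interpolation rule (for example `method="inclusive"`), so quartiles can differ slightly between tools.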
22
Example 1-3: Using the Template
23
Example 1-3 (Continued): Using the Template
This is the lower part of the same template from the previous slide.
24
Exercise, p.11, 10 min
• 1-9 (Ans: Q1 = 9, Q2 = 11.6, Q3 = 15.5, 55th percentile = 12.32, 85th percentile = 16.5)
• 1-12 (Ans: median = 51, Q1 = 30.5, Q3 = 194.25, IQR = 163.75, 45th percentile = 42.2)
Basic Stat.xls
Position of the Pth percentile = (n + 1)P / 100
25
Measures of Central Tendency
• Median
• Mode
• Mean
Measures of Variability
• Range
• Interquartile range
• Variance
• Standard deviation
Other summary measures
• Skewness
• Kurtosis
Summary Measures: Population Parameters and Sample Statistics
26
Median: the middle value when sorted in order of magnitude; the 50th percentile
Mode: the most frequently occurring value
Mean: the average
1-3 Measures of Central Tendency or Location(p.11)
27
Sorted Sales: 6 9 10 12 13 14 14 15 16 16 16 17 17 18 18 19 20 21 22 24
Median = 50th percentile
Position: (20+1)50/100 = 10.5
Value: 16 + (.5)(0) = 16
The median is the middle value of data sorted in order of magnitude. It is the 50th percentile.
Example – Median (Data is used from Example 1-2)
See slide # 19 for the template output
28
. . . . . . : . : : : . . . . .
---------------------------------------------------------------
6 9 10 12 13 14 15 16 17 18 19 20 21 22 24
Mode = 16
The mode is the most frequently occurring value. It is the value with the highest frequency.
Example - Mode (Data is used from Example 1-2)
See slide # 19 for the template output
29
The mean of a set of observations is their average: the sum of the observed values divided by the number of observations.
Population Mean
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$
Sample Mean
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Arithmetic Mean or Average
30
Sales: 9 6 12 10 13 15 16 14 14 16 17 16 24 21 22 18 19 18 20 17 (sum = 317)
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{317}{20} = 15.85$$
Example – Mean (Data is used from Example 1-2)
See slide # 19 for the template output
31
. . . . . . : . : : : . . . . .
---------------------------------------------------------------
6 9 10 12 13 14 15 16 17 18 19 20 21 22 24
Median and Mode = 16
Mean = 15.85
Example - Mode (Data is used from Example 1-2)
See slide # 19 for the template output
Each dot represents one value.
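The three measures of central tendency for the Example 1-2 sales data can be checked with Python's standard `statistics` module. This is a small illustrative sketch rather than the Excel template used in the slides.

```python
import statistics

sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

mean = statistics.mean(sales)       # sum of the values divided by their count
median = statistics.median(sales)   # middle of the sorted values (average of the 10th and 11th here)
mode = statistics.mode(sales)       # most frequently occurring value

print(mean, median, mode)
```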
32
Exercise, p.15, 5 min
• Example 1-4
• 1-13 ~ 1-16 (See Textbook p.656)
• 1-17 (Ans : mean=18.34, median=19.1)
33
Range: difference between the maximum and minimum values
Interquartile Range: difference between the third and first quartiles (Q3 - Q1)
Variance: average* of the squared deviations from the mean
Standard Deviation: square root of the variance
* Definitions of population variance and sample variance differ slightly.
1-4 Measures of Variability or Dispersion (p.15)
34
Rank:         1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
Sorted Sales: 6  9 10 12 13 14 14 15 16 16 16 17 17 18 18 19 20 21 22 24
Minimum = 6, Maximum = 24
First Quartile: Q1 = 13 + (.25)(1) = 13.25
Third Quartile: Q3 = 18 + (.75)(1) = 18.75
Range: Maximum - Minimum = 24 - 6 = 18
Interquartile Range: Q3 - Q1 = 18.75 - 13.25 = 5.5
Example - Range and Interquartile Range (Data is used from Example 1-2)
35
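Population Variance
$$\sigma^2 = \frac{\sum_{i=1}^{N}(x_i-\mu)^2}{N} = \frac{\sum_{i=1}^{N} x_i^2 - \frac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}}{N}, \qquad \sigma = \sqrt{\sigma^2}$$
Sample Variance
$$s^2 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1} = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1}, \qquad s = \sqrt{s^2}$$
Variance and Standard Deviation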
36
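Proof of the formula
$$
\begin{aligned}
s^2 &= \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2 \\
    &= \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i^2 - 2x_i\bar{x} + \bar{x}^2\right) \\
    &= \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + n\bar{x}^2\right) \\
    &= \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - 2n\bar{x}^2 + n\bar{x}^2\right)^{*} \\
    &= \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right) \\
    &= \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right)
\end{aligned}
$$
* Since $\bar{x} = \frac{\sum x_i}{n}$, we have $\sum x_i = n\bar{x}$.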
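37
  x     x - x̄    (x - x̄)²     x²
 18     -8.9      79.21      324
 18     -8.9      79.21      324
 18     -8.9      79.21      324
 18     -8.9      79.21      324
 19     -7.9      62.41      361
 20     -6.9      47.61      400
 20     -6.9      47.61      400
 20     -6.9      47.61      400
 21     -5.9      34.81      441
 22     -4.9      24.01      484
 22     -4.9      24.01      484
 23     -3.9      15.21      529
 24     -2.9       8.41      576
 26     -0.9       0.81      676
 27      0.1       0.01      729
 32      5.1      26.01     1024
 33      6.1      37.21     1089
 49     22.1     488.41     2401
 52     25.1     630.01     2704
 56     29.1     846.81     3136
Total:
538      0      2657.8     17130

Using the definition:
$$s^2 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1} = \frac{2657.8}{20-1} = \frac{2657.8}{19} = 139.88421$$
Using the shortcut formula:
$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1} = \frac{17130 - \frac{538^2}{20}}{20-1} = \frac{17130 - \frac{289444}{20}}{19} = \frac{17130 - 14472.2}{19} = \frac{2657.8}{19} = 139.88421$$
$$s = \sqrt{s^2} = \sqrt{139.88421} \approx 11.83$$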
Calculation of Sample Variance
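Both routes to the sample variance can be verified numerically. The sketch below uses the 20 observations from the calculation table; the variable names are illustrative only.

```python
import math

data = [18, 18, 18, 18, 19, 20, 20, 20, 21, 22,
        22, 23, 24, 26, 27, 32, 33, 49, 52, 56]

n = len(data)
mean = sum(data) / n

# Definition: sum of squared deviations from the mean, divided by n - 1
ss_dev = sum((x - mean) ** 2 for x in data)
var_definition = ss_dev / (n - 1)

# Shortcut: uses only the sum of x and the sum of x squared
sum_x = sum(data)
sum_x2 = sum(x * x for x in data)
var_shortcut = (sum_x2 - sum_x ** 2 / n) / (n - 1)

std_dev = math.sqrt(var_definition)

print(sum_x, sum_x2, round(var_definition, 5), round(var_shortcut, 5))
print(round(std_dev, 4))
```

The two variances agree up to floating-point rounding, which is what the proof of the shortcut formula predicts.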
38
Example: Sample Variance Using the Template
Note: This is just a replication of slide #19.
39
Exercise, p.22, 10 min
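• Calculating the standard deviation: Examples 1-5 and 1-6 (p.20), or Example 1-2
• 1-18 (p.22)
• 1-19 (Ans. Range = 27, variance = 57.7386, standard deviation = 7.5986)
• 1-20 (Ans. Range = 60, variance = 321.3788, standard deviation = 17.9270)
• 1-21 (Ans. Range = 1186, variance = 110287.45, standard deviation = 332.09555)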
Basic Stat.xls
40
Dividing data into groups or classes or intervals
Groups should be:
• Mutually exclusive: not overlapping, so every observation is assigned to only one group
• Exhaustive: every observation is assigned to a group
• Equal-width (if possible): the first or last group may be open-ended
1-5 Grouped Data and the Histogram
41
Table with two columns listing:
• Each and every group or class or interval of values
• Associated frequency of each group
  – Number of observations assigned to each group
  – Sum of frequencies is the number of observations (N for a population, n for a sample)
Class midpoint is the middle value of a group or class or interval
Relative frequency is the proportion of total observations in each class
• Sum of relative frequencies = 1
Frequency Distribution
42
x                       f(x)                                f(x)/n
Spending Class ($)      Frequency (number of customers)     Relative Frequency
0 to less than 100       30                                 0.163
100 to less than 200     38                                 0.207
200 to less than 300     50                                 0.272
300 to less than 400     31                                 0.168
400 to less than 500     22                                 0.120
500 to less than 600     13                                 0.070
Total                   184                                 1.000

• Example of relative frequency: 30/184 = 0.163
• Sum of relative frequencies = 1
Example 1-7: Frequency Distribution p.23
43
x                       F(x)                     F(x)/n
Spending Class ($)      Cumulative Frequency     Cumulative Relative Frequency
0 to less than 100       30                      0.163
100 to less than 200     68                      0.370
200 to less than 300    118                      0.641
300 to less than 400    149                      0.810
400 to less than 500    171                      0.929
500 to less than 600    184                      1.000

The cumulative frequency of each group is the sum of the frequencies of that and all preceding groups.
Cumulative Frequency Distribution
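The relative and cumulative columns of Example 1-7 follow mechanically from the class frequencies. The sketch below recomputes them in Python; the class labels and frequencies come from the tables, and the variable names are illustrative.

```python
from itertools import accumulate

classes = ["0 to less than 100", "100 to less than 200", "200 to less than 300",
           "300 to less than 400", "400 to less than 500", "500 to less than 600"]
frequencies = [30, 38, 50, 31, 22, 13]

n = sum(frequencies)                                   # total number of customers
relative = [f / n for f in frequencies]                # f(x)/n
cumulative = list(accumulate(frequencies))             # F(x): running total of f(x)
cumulative_relative = [c / n for c in cumulative]      # F(x)/n

for label, f, r, c, cr in zip(classes, frequencies, relative,
                              cumulative, cumulative_relative):
    print(f"{label:22s} {f:4d} {r:6.3f} {c:4d} {cr:6.3f}")
```

Printed values are rounded to three decimals, so an individual entry may differ from the slide in the last digit while the totals still agree.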
44
Frequency distribution chart exercise, 10 min
• Example 1- (p.25), using a class width of 5
Basic Stat.xls
45
A histogram is a chart made of bars of different heights.
• Widths and locations of bars correspond to widths and locations of data groupings
• Heights of bars correspond to frequencies or relative frequencies of data groupings
Histogram
46
Frequency Histogram
Histogram Example : 1-7
47
Relative Frequency Histogram
Histogram Example
48
Skewness
– Measure of asymmetry of a frequency distribution
• Skewed to left (skewness < 0)
• Symmetric or unskewed
• Skewed to right (skewness > 0)
Kurtosis
– Measure of flatness or peakedness of a frequency distribution
• Platykurtic (relatively flat)
• Mesokurtic (normal)
• Leptokurtic (relatively peaked)
* The formulas are given on p.27.
1-6 Skewness and Kurtosis p.25
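The slides defer the formulas to the textbook, so the sketch below is an assumption: it uses the common moment-based definitions, the average of the cubed and of the fourth-power standardized deviations, with the population standard deviation. The textbook's formulas on p.27 may differ in detail. Under this convention a normal distribution has kurtosis 3.

```python
import math


def _standardized(data):
    """Deviations from the mean in units of the population standard deviation."""
    n = len(data)
    mu = sum(data) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
    if sigma == 0:
        raise ValueError("all values are identical; shape measures are undefined")
    return [(x - mu) / sigma for x in data]


def skewness(data):
    """Assumed definition: mean of cubed standardized deviations."""
    z = _standardized(data)
    return sum(v ** 3 for v in z) / len(z)


def kurtosis(data):
    """Assumed definition: mean of fourth-power standardized deviations."""
    z = _standardized(data)
    return sum(v ** 4 for v in z) / len(z)


print(skewness([1, 2, 3, 4, 5]))     # symmetric data
print(skewness([1, 1, 1, 2, 10]))    # long right tail
print(kurtosis([1, 2, 3, 4, 5]))
```

The sign of the skewness matches the slide: negative for a long left tail, positive for a long right tail.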
49
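Skewed to left
Skewness
Negative skewness value: the more negative, the more left-skewed
50
Skewness
Symmetric
51
Skewness
Skewed to right
Positive skewness value: the more positive, the more right-skewed
52
Kurtosis
Platykurtic - flat distribution
The smaller the kurtosis value, the flatter the distribution
53
Kurtosis
Mesokurtic - not too flat and not too peaked
54
Kurtosis
Leptokurtic - peaked distribution
The larger the kurtosis value, the more peaked the distribution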
55
Chebyshev's Theorem
• Applies to any distribution, regardless of shape
• Places lower limits on the percentages of observations within a given number of standard deviations from the mean:
$$P(|x - \mu| < k\sigma) \ge 1 - \frac{1}{k^2}$$
Empirical Rule
• Applies only to roughly mound-shaped and symmetric distributions
• Specifies approximate percentages of observations within a given number of standard deviations from the mean
1-7 Relations between the Mean and Standard Deviation p.27 (important)
56
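At least $1 - \frac{1}{k^2}$ of the elements of any distribution lie within k standard deviations of the mean.

k    At least                                   Lie within
2    1 - 1/2² = 1 - 1/4 = 3/4 = 75%             2 standard deviations of the mean
3    1 - 1/3² = 1 - 1/9 = 8/9 ≈ 89%             3 standard deviations of the mean
4    1 - 1/4² = 1 - 1/16 = 15/16 ≈ 94%          4 standard deviations of the mean

Chebyshev's Theorem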
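The bound 1 - 1/k² is simple enough to tabulate directly. The helper name `chebyshev_lower_bound` below is illustrative, and the requirement k > 1 is added because the bound says nothing useful for k ≤ 1.

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2


for k in (2, 3, 4):
    print(k, f"{chebyshev_lower_bound(k):.0%}")
```

These are guaranteed minimums for any distribution; for mound-shaped data the actual percentages are typically much higher.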
57
For roughly mound-shaped and symmetric distributions, approximately:
• 68% of the observations lie within 1 standard deviation of the mean
• 95% lie within 2 standard deviations of the mean
• Almost all (about 99.7%) lie within 3 standard deviations of the mean
Empirical Rule
58
Exercise, p.28, 10 min
• Exercise 1- 22
Basic Stat.xls
59
Pie Charts: categories represented as percentages of the total
Bar Graphs: heights of rectangles represent group frequencies
Frequency Polygons: height of line represents frequency
Ogives: height of line represents cumulative frequency
Time Plots: represent values over time
1-8 Methods of Displaying Data
60
Pie Chart
61
Bar Chart
Fig. 1-11 Airline Operating Expenses and Revenues
[Bar chart: Average Revenues and Average Expenses by airline (American, Continental, Delta, Northwest, Southwest, United, USAir); vertical axis from 0 to 12]
62
[Left panel: Relative Frequency Polygon; horizontal axis: Sales (0 to 50); vertical axis: Relative Frequency (0.0 to 0.3)]
[Right panel: Ogive; horizontal axis: Sales (0 to 50); vertical axis: Cumulative Relative Frequency (0.0 to 1.0)]
Frequency Polygon and Ogive
63
[Time plot: Monthly Steel Production (Problem 1-46); horizontal axis: Month; vertical axis: Millions of Tons (5.5 to 8.5)]
Time Plot
64
Chart exercise, 10 min
• 1- 24
• 1- 25
1-24.xls 1-25.xls
65
Stem-and-Leaf Displays
• Quick-and-dirty listing of all observations
• Conveys some of the same information as a histogram
Box Plots
• Median
• Lower and upper quartiles
• Maximum and minimum
Techniques to determine relationships and trends, identify outliers and influential observations, and quickly describe or summarize data sets.
1-9 Exploratory Data Analysis – EDA
66
Stem   Leaves
1      122355567           (10 ~)
2      0111222346777899    (20 ~)
3      012457              (30 ~)
4      11257               (40 ~)
5      0236                (50 ~)
6      02                  (60 ~)
Example 1-8: Stem-and-Leaf Display
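A stem-and-leaf display can be built by splitting each value into a tens digit (the stem) and a units digit (the leaf). The sketch below assumes non-negative two-digit integers, and the data list is read back from the display in Example 1-8 rather than taken from the textbook's raw data.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Map each stem (tens digit) to a string of its sorted leaves (units digits)."""
    stems = defaultdict(list)
    for x in sorted(data):
        stems[x // 10].append(x % 10)
    return {stem: "".join(str(leaf) for leaf in leaves)
            for stem, leaves in sorted(stems.items())}


# Values reconstructed from the display in Example 1-8 (an assumption about the raw data)
values = [11, 12, 12, 13, 15, 15, 15, 16, 17,
          20, 21, 21, 21, 22, 22, 22, 23, 24, 26, 27, 27, 27, 28, 29, 29,
          30, 31, 32, 34, 35, 37,
          41, 41, 42, 45, 47,
          50, 52, 53, 56,
          60, 62]

for stem, leaves in stem_and_leaf(values).items():
    print(stem, leaves)
```

Turned on its side, the rows of leaves give the same shape as a histogram with class width 10.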
67
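Elements of a Box Plot
• The box runs from Q1 to Q3 with the median marked inside; its length is the interquartile range (IQR), and half of the data lies inside the box
• Inner fences: Q1 - 1.5(IQR) and Q3 + 1.5(IQR)
• Outer fences: Q1 - 3(IQR) and Q3 + 3(IQR)
• The lines from the box extend to the smallest data point not below the inner fence and the largest data point not exceeding the inner fence
• Suspected outlier: a point between the inner and outer fences
• Outlier: a point beyond the outer fences
Box Plot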
Example: Box Plot
69
Exercise, p.39, 15 min
• 1- 27
BoxPlot.xls
70
1-10 Using the Computer – The Template Output
71
Using the Computer – Template Output for the Histogram
72
Using the Computer – Template Output for Histograms for Grouped Data
73
Using the Computer – Template Output for Frequency Polygons & the Ogive for Grouped Data
74
Using the Computer – Template Output for Two Frequency Polygons for Grouped Data
75
Using the Computer – Pie Chart Template Output
76
Using the Computer – Bar Chart Template Output
77
Using the Computer – Box Plot Template Output
78
Using the Computer – Box Plot Template to Compare Two Data Sets
79
Using the Computer – Time Plot Template
80
Using the Computer – Time Plot Comparison Template