8/13/2019 Ch4 FundaSQCT
Chapter - 4
Fundamentals of Statistical
Concepts & Techniques in Quality Control and Improvement
Basic Terminologies
Population: Set of all items that possess a certain characteristic of interest
  Eg. The 10,000 plastic cups produced in week no. 23, with thickness as the characteristic of interest
Parameter: A characteristic of a population, which describes it
  Eg. Average thickness of the 10,000 cups
Sample: A subset of the population
  Eg. 200 plastic cups selected from the week 23 output
Statistic: A characteristic of a sample, which is used to make inferences about the unknown population parameters
  Eg. Average thickness of the 200 plastic cups is 1 mm
Assigning Probabilities
Classical Method: Assigning probabilities based on the assumption of equally likely outcomes.
Relative Frequency Method: Assigning probabilities based on experimentation or historical data.
Subjective Method: Assigning probabilities based on the assignor's judgment.
Basics of Probability
Probability of an event describes the chance of occurrence of that event
A probability function is bounded by 0 and 1: 0 for an event that cannot occur, 1 for an event that is certain to occur
Set of all outcomes of an experiment is called the sample space (S)
If each outcome in the sample space is equally likely to happen, then the prob. of event A is given by $P(A) = n_A / N$, where $n_A$ is the no. of outcomes in which A occurs and $N$ is the total no. of outcomes in S
Probability associated with the sample space is $P(S) = 1$
Basics of Probability Contd..
Events
  Simple events cannot be broken into other events
  Compound events are made up of two or more simple events
  Complement of an event, say A, implies the occurrence of everything except A, i.e. $P(A^c) = 1 - P(A)$
Laws
  Additive law defines the probability of the union of 2 or more events (say A & B), i.e. A may happen, B may happen or both
  $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
Basics of Probability Contd..
Laws contd..
  Multiplicative law defines the probability of the intersection of 2 or more events (say A & B), i.e. all the events in the group occur
  $P(A \cap B) = P(A) \cdot P(B \mid A) = P(B) \cdot P(A \mid B)$
  $P(B \mid A)$ represents conditional probability (i.e., the probability that B occurs given that A has occurred)
Independence
  Two events A & B are said to be independent if the outcome of one has no influence on the outcome of the other
  $P(B \mid A) = P(B)$ and hence $P(A \cap B) = P(A) \cdot P(B)$
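The additive law, the multiplicative law and the independence condition can be checked by brute-force enumeration. The sketch below is only an illustration: it assumes a two-dice experiment, and the events A and B (and the helper `prob`) are hypothetical choices, not part of the slides.

```python
from fractions import Fraction
from itertools import product

# Sample space S: the 36 equally likely (red, green) outcomes of two dice.
S = list(product(range(1, 7), repeat=2))

def prob(event):
    """Classical probability n_A / N for an event given as a predicate."""
    return Fraction(sum(1 for outcome in S if event(outcome)), len(S))

def A(outcome):
    return outcome[0] == 6        # hypothetical event: red die shows 6

def B(outcome):
    return outcome[1] % 2 == 0    # hypothetical event: green die is even

p_a = prob(A)
p_b = prob(B)
p_a_and_b = prob(lambda o: A(o) and B(o))
p_a_or_b = prob(lambda o: A(o) or B(o))
p_b_given_a = p_a_and_b / p_a     # rearranged multiplicative law

print("P(A u B) =", p_a_or_b)     # compare with P(A) + P(B) - P(A n B)
print("P(B | A) =", p_b_given_a)  # compare with P(B) to judge independence
```

Exact fractions are used so that the two sides of each law can be compared without rounding error.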
Statistics
  Statistics is the science that deals with the collection, classification, analysis and making of inferences from data
Descriptive Statistics
  Describes the characteristics of a product or process using information collected on it
Inferential Statistics
  Draws conclusions on unknown process parameters based on sample information
Data Collection
  Direct observation
  Indirect observation (questionnaires): no control over the data, and the chance of error is higher
  Data can be described by a random variable, continuous or discrete
Statistics Contd..
Continuous variable
  Variable that can assume any value on a continuous scale within a range
  Eg. Viscosity of a resin
Discrete variable
  Variable that can assume only a finite or countable number of values
  Eg. No. of defects in a shirt
  A continuous characteristic can also be viewed as discrete when items are simply classified as acceptable or not. Eg. Diameter of a hub in a tire
Accuracy
  Refers to the degree of uniformity of the observations around a desired value, such that, on average, the target is realized
Precision
  Refers to the degree of variability of the observations
Measures of Scale
Nominal Scale
  Data variables are simply labels to identify an attribute
  Eg. Critical / Major / Minor
Ordinal Scale
  Data has the properties of nominal data
  Data ranks or orders the observations
  Eg. Grades, 1 - outstanding, 5 - poor
Interval Scale
  Data has the properties of ordinal data, and a fixed unit of measure describes the interval between observations
  Eg. Temp. of water in diff. stages of cooling during a 2 hr interval
Ratio Scale
  Data has the properties of interval data, and a natural zero exists for the measurement scale
  Eg. Wt. of cement bag: 100 kg, 100.2 kg
Measures of Scale
A nominal scale is really a list of categories to which objects can be classified. For example, people who receive a mail order offer might be classified as "no response," "purchase and pay," "purchase but return the product," and "purchase and neither pay nor return." The data so classified are termed categorical data.
An ordinal scale is a measurement scale that assigns values to objects based on their ranking with respect to one another. For example, a doctor might use a scale of 0-10 to indicate degree of improvement in some condition, from 0 (no improvement) to 10 (disappearance of the condition). While you know that a 4 is better than a 2, there is no implication that a 4 is twice as good as a 2. Nor is the improvement from 2 to 4 necessarily the same "amount" of improvement as the improvement from 6 to 8. All we know is that there are 11 categories, with 1 being better than 0, 2 being better than 1, etc.
Measures of Scale
An interval scale is a measurement scale in which a certain distance along the scale means the same thing no matter where on the scale you are, but where "0" on the scale does not represent the absence of the thing being measured. Fahrenheit and Celsius temperature scales are examples.
A ratio scale is a measurement scale in which a certain distance along the scale means the same thing no matter where on the scale you are, and where "0" on the scale represents the absence of the thing being measured. Thus a "4" on such a scale implies twice as much of the thing being measured as a "2."
Interval Data: Temperature, Dates (data that has an arbitrary zero)
Ratio Data: Height, Weight, Age, Length (data that has an absolute zero)
Nominal Data: Male, Female, Race, Political Party (categorical data that cannot be ranked)
Ordinal Data: Degree of Satisfaction at Restaurant (data that can be ranked)
http://www.statsoft.com/textbook/elementary-concepts-in-statistics/
Interval variables allow us not only to rank order the items that are measured, but also to quantify and compare the sizes of differences between them. For example, temperature, as measured in degrees Fahrenheit or Celsius, constitutes an interval scale. We can say that a temperature of 40 degrees is higher than a temperature of 30 degrees, and that an increase from 20 to 40 degrees is twice as much as an increase from 30 to 40 degrees.
Ratio variables are very similar to interval variables; in addition to all the properties of interval variables, they feature an identifiable absolute zero point, thus they allow for statements such as x is two times more than y. Typical examples of ratio scales are measures of time or space. For example, as the Kelvin temperature scale is a ratio scale, not only can we say that a temperature of 200 degrees is higher than one of 100 degrees, we can correctly state that it is twice as high. Interval scales do not have the ratio property. Most statistical data analysis procedures do not distinguish between the interval and ratio properties of the measurement scales.
Measures of Central Tendency
Mean
  Simple average of the observations in a dataset
  Sample Mean $\bar{X} = \sum_{i=1}^{n} X_i / n$, Population Mean $\mu = \sum_{i=1}^{N} X_i / N$
Median
  Is the value in the middle, when the observations are ranked
  It is more robust than the mean, as it is not influenced by extreme values in the dataset
Mode
  Is the value that occurs most frequently in a dataset
  A dataset having more than one mode is called multi-modal
Trimmed Mean
  Obtained by calculating the mean of the data that remain after a proportion of the highest and lowest values has been deleted (a% trimmed)
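A short sketch of the four measures follows. The data values are made up for illustration, and `trimmed_mean` is a hypothetical helper that drops the same proportion of observations from each end, which is one common reading of "a% trimmed".

```python
import statistics

def trimmed_mean(data, proportion):
    """Mean after deleting `proportion` of the values from each end."""
    xs = sorted(data)
    k = int(len(xs) * proportion)      # no. of values dropped per end
    return statistics.mean(xs[k:len(xs) - k])

# Hypothetical dataset with one extreme value (40).
data = [2, 3, 3, 4, 5, 5, 5, 6, 7, 40]

print("mean        :", statistics.mean(data))
print("median      :", statistics.median(data))
print("mode        :", statistics.mode(data))
print("10% trimmed :", trimmed_mean(data, 0.10))
```

The extreme value pulls the mean well above the median, while the trimmed mean stays close to the bulk of the data.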
Measures of Dispersion
Provides information on the variability or scatter of the observations around a given value
Range
  Difference between the largest and smallest value in the dataset
  $R = X_L - X_S$
Variance
  Measures the fluctuation of the observations around the mean
  Population variance $\sigma^2 = \sum_{i=1}^{N} (X_i - \mu)^2 / N$, Sample variance $s^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 / (n-1)$
  Why n-1 in sample variance? To satisfy the property of unbiasedness, i.e. the average of the sample variance (which keeps varying between samples) should be equal to the population variance, which is constant
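The unbiasedness argument can be checked exactly on a tiny example. The sketch below assumes a hypothetical population of six values and samples of size 3 drawn with replacement; it averages the sample variance over every possible sample, once with divisor n-1 and once with divisor n.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 5, 6]          # hypothetical population
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N   # population variance

def sample_var(sample, divisor):
    """Sum of squared deviations from the sample mean, over `divisor`."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample) / divisor

n = 3
samples = list(product(population, repeat=n))   # all samples, with replacement
avg_with_n_minus_1 = sum(sample_var(s, n - 1) for s in samples) / len(samples)
avg_with_n = sum(sample_var(s, n) for s in samples) / len(samples)

print("population variance :", sigma2)
print("average s^2 (n-1)   :", avg_with_n_minus_1)
print("average s^2 (n)     :", avg_with_n)
```

With divisor n-1 the average matches the population variance; with divisor n it falls short by the factor (n-1)/n.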
Measures of Dispersion
Standard Deviation
  Most widely used measure of dispersion; has the same unit as the observations
  Measures the variability of the observations around the mean
  Population Standard Deviation $\sigma$, Sample Standard Deviation $s$ (the positive square roots of the corresponding variances)
Inter Quartile Range
  Lower / First quartile (Q1) is the value such that 1/4th of the observations fall below it and 3/4th fall above it (rank of Q1 = 0.25 (n+1))
  Vice versa for the Third Quartile (Q3) (rank of Q3 = 0.75 (n+1))
  IQR is the difference between the 3rd quartile and the 1st quartile (IQR = Q3 - Q1)
  Larger the value of IQR, greater the spread of the data
  To find IQR, the data are ranked in ascending order and then Q1 and Q3 are calculated
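The rank rule for quartiles can be sketched as below. The helper `quartile_by_rank` and the dataset are hypothetical; linear interpolation between neighbouring observations is an assumption for ranks that are not whole numbers, since the slides do not say how to handle them.

```python
def quartile_by_rank(sorted_data, fraction):
    """Value at 1-based rank fraction * (n + 1), interpolating if needed."""
    n = len(sorted_data)
    rank = fraction * (n + 1)
    lower = int(rank)                 # whole part of the rank
    remainder = rank - lower          # fractional part of the rank
    if lower < 1:
        return sorted_data[0]
    if lower >= n:
        return sorted_data[-1]
    below, above = sorted_data[lower - 1], sorted_data[lower]
    return below + remainder * (above - below)

# Hypothetical data, ranked in ascending order first.
data = sorted([21, 1, 13, 5, 9, 17, 3, 19, 7, 11, 15])

q1 = quartile_by_rank(data, 0.25)
q3 = quartile_by_rank(data, 0.75)
iqr = q3 - q1
print("Q1 =", q1, "Q3 =", q3, "IQR =", iqr)
```

With 11 observations the ranks are 3 and 9, so Q1 and Q3 are simply the 3rd and 9th ranked values.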
Measures of Skewness & Kurtosis
Skewness coefficient (V1)
  Describes the asymmetry of the dataset about the mean, or indicates the degree to which the distribution deviates from symmetry (formulae)
  Negatively skewed: V1 = -ve, Mean < Median
  Positively skewed: V1 = +ve, Mean > Median
  Not skewed: V1 = 0, Mean = Median
Kurtosis coefficient (V2)
  Is a measure of peakedness of the dataset (formulae)
  Is also a measure of heaviness of the tails of the distribution
  For normal distribution (Mesokurtic), V2 = 3
  Leptokurtic, more peaked, V2 > 3
  Platykurtic, less peaked, V2 < 3
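The transcript does not reproduce the formulae, so the sketch below assumes the usual moment-based definitions: skewness as the third central moment divided by the 1.5 power of the second, and kurtosis as the fourth central moment divided by the square of the second (which gives 3 for a normal distribution). The function names and data are hypothetical.

```python
import statistics

def central_moment(data, k):
    """k-th central moment, dividing by the number of observations."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)

def skewness(data):
    # Assumed definition: m3 / m2^(3/2)
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def kurtosis(data):
    # Assumed definition: m4 / m2^2 (equals 3 for a normal distribution)
    return central_moment(data, 4) / central_moment(data, 2) ** 2

symmetric = [1, 2, 3, 4, 5]        # hypothetical, no skew
right_skewed = [1, 1, 1, 2, 10]    # hypothetical, long right tail

print("symmetric    skewness:", skewness(symmetric))
print("right-skewed skewness:", skewness(right_skewed))
print("right-skewed mean, median:",
      statistics.mean(right_skewed), statistics.median(right_skewed))
print("symmetric    kurtosis:", kurtosis(symmetric))
```

For the right-skewed data the coefficient is positive and the mean exceeds the median, matching the rule stated above.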
Measures of Association
Indicates how two or more variables are related to each other
Small values indicate a weak relation and large values a strong one
Correlation coefficient (r)
  Is a measure of the strength of the linear relationship between 2 variables
  Sample correlation coefficient is always between -1 and 1 (formulae)
  1 denotes perfect +ve linear relationship, -1 denotes perfect -ve linear relationship and 0 denotes uncorrelated
Sample Problem
Researchers at the European Centre for Road Safety Testing are trying to find out how the age of cars affects their braking capability. They test a group of ten cars of differing ages and find out the minimum stopping distances that the cars can achieve. The results are set out in the table below:
Car ages and stopping distances
Car   Age (months)   Minimum Stopping at 40 kph (metres)
A     9              28.4
B     15             29.3
C     24             37.6
D     30             36.2
E     38             36.5
F     46             35.3
G     53             36.2
H     60             44.1
I     64             44.8
J     76             47.2
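x-bar = 415/10 = 41.5
y-bar = 375.6/10 = 37.56
$r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - (\sum x)^2\right)\left(n\sum y^2 - (\sum y)^2\right)}}$
$r = \dfrac{10 \times 16713.3 - 415 \times 375.6}{\sqrt{(10 \times 21623 - 415^2)(10 \times 14457.72 - 375.6^2)}}$
$r = \dfrac{11259}{\sqrt{44005 \times 3501.84}}$
$r = 11259 / 12413.64$
$r = 0.91$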
r always lies in the range -1 to +1. If it lies close to either of these two values, then the dispersion of the scattergram points is small and therefore a strong correlation exists between the two variables. If r equals exactly -1 or +1, the correlation is perfect and all the points on the scattergram lie on the line of best fit (otherwise known as the regression line). If r is close to 0, the dispersion is large and the variables are uncorrelated. The positive or negative sign on the value of r indicates positive or negative correlation.
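The worked calculation above can be reproduced from the table of car ages and stopping distances. The sketch uses the same sums-based formula; the variable names are illustrative only.

```python
from math import sqrt

# Data from the car ages and stopping distances table (cars A to J).
age = [9, 15, 24, 30, 38, 46, 53, 60, 64, 76]                            # months
distance = [28.4, 29.3, 37.6, 36.2, 36.5, 35.3, 36.2, 44.1, 44.8, 47.2]  # metres

n = len(age)
sum_x = sum(age)
sum_y = sum(distance)
sum_xy = sum(x * y for x, y in zip(age, distance))
sum_x2 = sum(x * x for x in age)
sum_y2 = sum(y * y for y in distance)

numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print("sum x =", sum_x, " sum y =", round(sum_y, 1))
print("r =", round(r, 2))
```

A value this close to +1 indicates a strong positive linear relationship between age and stopping distance in this sample of ten cars.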
Example: Rolling 2 Dice (Red/Green)
Red\Green   1   2   3   4   5   6
1           2   3   4   5   6   7
2           3   4   5   6   7   8
3           4   5   6   7   8   9
4           5   6   7   8   9   10
5           6   7   8   9   10  11
6           7   8   9   10  11  12
Y = Sum of the up faces of the two dice. Table gives the value of y for all elements in S
Rolling 2 Dice: Probability Mass Function & CDF
y p(y) F(y)
2 1/36 1/36
3 2/36 3/36
4 3/36 6/36
5 4/36 10/36
6 5/36 15/36
7 6/36 21/36
8 5/36 26/36
9 4/36 30/36
10 3/36 33/36
11 2/36 35/36
12 1/36 36/36
$p(y) = \dfrac{\text{\# of ways 2 dice can sum to } y}{\text{\# of possible results for 2 dice (36)}}$
$F(y) = \sum_{t=2}^{y} p(t)$
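The p(y) and F(y) columns can be regenerated by counting outcomes. This is a minimal sketch; note that `Fraction` reduces its values, so 6/36 prints as 1/6.

```python
from collections import Counter
from fractions import Fraction
from itertools import accumulate, product

# Count how many of the 36 (red, green) outcomes give each sum y.
counts = Counter(red + green for red, green in product(range(1, 7), repeat=2))

# p(y) = (# of ways to get sum y) / 36, for y = 2..12 in ascending order
pmf = {y: Fraction(counts[y], 36) for y in range(2, 13)}

# F(y) = running total of p(t) for t = 2..y
cdf = dict(zip(pmf, accumulate(pmf.values())))

for y in pmf:
    print(y, pmf[y], cdf[y])
```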
Cumulative Distribution Function
  For a discrete random variable, $F(x) = \sum_{i:\, x_i \le x} p(x_i)$
  For a continuous random variable, $F(x) = \int_{-\infty}^{x} f(t)\,dt$
  F(x) is a non-decreasing function of x such that it tends to 1 as x tends to infinity and to 0 as x tends to minus infinity
Expected value
  $\mu = E(X) = \sum_{\text{all } i} x_i\, p(x_i)$, if X is discrete
  $\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$, if X is continuous
Variance of a random variable is given by
  $\mathrm{Var}(X) = E[(X - \mu)^2] = E(X^2) - [E(X)]^2$
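As an illustration of the discrete formulas, the sketch below computes the expected value and the variance of Y, the sum of two dice, and checks that the definition and the shortcut form of the variance agree. Exact fractions are used; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Probability mass function of Y = sum of the up faces of two dice.
outcomes = [red + green for red, green in product(range(1, 7), repeat=2)]
pmf = {y: Fraction(outcomes.count(y), len(outcomes)) for y in set(outcomes)}

mean = sum(y * p for y, p in pmf.items())                        # E(Y)
var_definition = sum((y - mean) ** 2 * p for y, p in pmf.items())  # E[(Y - mu)^2]
var_shortcut = sum(y * y * p for y, p in pmf.items()) - mean ** 2  # E(Y^2) - [E(Y)]^2

print("E(Y)   =", mean)
print("Var(Y) =", var_definition, "=", var_shortcut)
```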
Discrete Distributions
Hypergeometric distribution
  Useful in sampling from a finite population without replacement, where the outcomes are success or failure
  If we consider getting a nonconforming item as success, the probability distribution of the no. of nonconforming items (x) is given by
  $P(x) = \dfrac{\binom{D}{x}\binom{N-D}{n-x}}{\binom{N}{n}}$
  D: no. of nonconforming items in the population, x: no. of nonconforming items in the sample
  N: size of population, n: size of sample
  Mean $\mu = E(x) = nD/N$
  Variance $\sigma^2 = \mathrm{Var}(x) = n \dfrac{D}{N}\left(1 - \dfrac{D}{N}\right)\left(\dfrac{N-n}{N-1}\right)$
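A minimal sketch of the hypergeometric formulas, assuming a hypothetical lot of N = 50 items with D = 5 nonconforming and a sample of n = 10. It builds the distribution from the combinations formula and compares its mean and variance with the closed-form expressions.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(x, N, D, n):
    """P(x nonconforming in a sample of n, drawn without replacement)."""
    return Fraction(comb(D, x) * comb(N - D, n - x), comb(N, n))

N, D, n = 50, 5, 10          # hypothetical lot, defectives and sample size
support = range(max(0, n - (N - D)), min(n, D) + 1)
pmf = {x: hypergeom_pmf(x, N, D, n) for x in support}

mean = sum(x * p for x, p in pmf.items())
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

p = Fraction(D, N)
print("mean     :", mean, "vs nD/N =", n * p)
print("variance :", variance, "vs formula =", n * p * (1 - p) * Fraction(N - n, N - 1))
```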
Discrete Distributions
Binomial distribution
  Series of independent trials
  Useful in sampling from a large population without replacement, or to sample with replacement from a finite population
  Probability of success (p) on any trial is assumed to be a constant
  Let x denote the no. of successes; if n trials are conducted, the probability of x successes is given by
  $P(x) = \binom{n}{x} p^x (1-p)^{n-x}$, x = 0, 1, 2, ..., n
  Mean $\mu = E(x) = np$
  Variance $\sigma^2 = \mathrm{Var}(x) = np(1-p)$
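The binomial formulas can be checked in the same way. The values n = 10 and p = 1/10 below are hypothetical; exact fractions keep the comparison free of rounding error.

```python
from fractions import Fraction
from math import comb

def binom_pmf(x, n, p):
    """P(x successes in n independent trials with success probability p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, Fraction(1, 10)   # hypothetical no. of trials and success probability
pmf = {x: binom_pmf(x, n, p) for x in range(n + 1)}

mean = sum(x * prob for x, prob in pmf.items())
variance = sum((x - mean) ** 2 * prob for x, prob in pmf.items())

print("mean     :", mean, "vs np =", n * p)
print("variance :", variance, "vs np(1-p) =", n * p * (1 - p))
```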
Trials are independent in binomial, but not in hypergeometric
Probability of success on any trial remains constant in binomial, but not in hypergeometric
Hypergeometric approaches binomial as $N \to \infty$ while D/N remains constant
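The limiting behaviour can be seen numerically. The sketch below is an illustration under assumed values: it keeps D/N fixed at 0.1 and the sample size at n = 10, and reports the largest gap between the hypergeometric and binomial probabilities as the population size N grows.

```python
from math import comb

def hypergeom_pmf(x, N, D, n):
    return comb(D, x) * comb(N - D, n - x) / comb(N, n)

def binom_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.1               # hypothetical sample size and fraction D/N
gaps = {}
for N in (50, 500, 5000):
    D = N // 10              # keeps D/N equal to 0.1
    gaps[N] = max(abs(hypergeom_pmf(x, N, D, n) - binom_pmf(x, n, p))
                  for x in range(n + 1))
    print(f"N = {N:5d}: largest gap = {gaps[N]:.5f}")
```

The gap shrinks as N increases, which is why the binomial is commonly used when the sample is a small fraction of a large lot.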
Discrete Distributions
Poisson distribution
  Used to model the no. of events that happen within a product unit, space or volume, or time period. Eg. No. of machine breakdowns per month
  Probability distribution function of the no. of events (x) is given by
  $P(x) = \dfrac{e^{-\lambda} \lambda^x}{x!}$, x = 0, 1, 2, ..
  Mean or average no. of events is given by $\lambda$
  Mean $\mu$ = Variance $\sigma^2$ = $\lambda$
  It is used as an approximation to the binomial when n is large ($n \to \infty$) and p is small ($p \to 0$), such that $np = \lambda$ is a constant, or the average no. of defects per unit is constant
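The quality of the Poisson approximation can be sketched for one assumed case: n = 1000 trials with p = 0.002, so that np = 2. The numbers are hypothetical and only meant to show how close the two sets of probabilities are.

```python
from math import comb, exp, factorial

def binom_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    return exp(-lam) * lam ** x / factorial(x)

n, p = 1000, 0.002           # hypothetical: large n, small p
lam = n * p                  # Poisson parameter, np

for x in range(6):
    print(x, round(binom_pmf(x, n, p), 5), round(poisson_pmf(x, lam), 5))

max_gap = max(abs(binom_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(21))
print("largest gap for x = 0..20:", max_gap)
```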