ch4 fundasqct

8/13/2019 Ch4 FundaSQCT

Chapter 4

Fundamentals of Statistical Concepts & Techniques in Quality Control and Improvement


Basic Terminologies

Population: The set of all items that possess a certain characteristic of interest
Eg. The 10,000 plastic cups produced in week no. 23, with thickness as the characteristic of interest

Parameter: A characteristic of a population, which describes it
Eg. Average thickness of all 10,000 cups

Sample: A subset of the population
Eg. 200 plastic cups selected from the week 23 output

Statistic: A characteristic of a sample, which is used to make inferences about population parameters that are unknown
Eg. Average thickness of the 200 sampled plastic cups is 1 mm
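To make the four terms concrete, the sketch below simulates the week 23 example. The thickness values are invented for illustration (drawn from an assumed normal distribution with mean 1.0 mm and standard deviation 0.05 mm); only the sizes 10,000 and 200 come from the slide.

```python
import random
import statistics

rng = random.Random(23)  # fixed seed so the sketch is repeatable

# Population: thickness (mm) of all 10,000 cups. Values are simulated, not real data.
population = [rng.gauss(1.0, 0.05) for _ in range(10_000)]
parameter = statistics.fmean(population)   # population mean (unknown in practice)

# Sample: 200 cups drawn without replacement from the population
sample = rng.sample(population, 200)
statistic = statistics.fmean(sample)       # sample mean, used to estimate the parameter

print(f"parameter = {parameter:.4f} mm, statistic = {statistic:.4f} mm")
```

In practice only the statistic is observed; the parameter is computed here only because the population was simulated.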


Assigning Probabilities

Classical Method: Assigning probabilities based on the assumption of equally likely outcomes.

Relative Frequency Method: Assigning probabilities based on experimentation or historical data.

Subjective Method: Assigning probabilities based on the assignor's judgment.


Basics of Probability

The probability of an event describes the chance of occurrence of that event

A probability function is bounded by 0 and 1: 0 for an event that cannot occur, 1 for an event that is certain to occur

The set of all outcomes of an experiment is called the sample space (S)

If each outcome in the sample space is equally likely to happen, then the probability of event A is given by $P(A) = n_A / N$, where $n_A$ is the number of outcomes in which A occurs and $N$ is the total number of outcomes in S. The probability associated with the sample space is $P(S) = 1$
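As a small illustration of the equally likely rule, the sketch below counts outcomes for one fair die. The die and the event "even face" are assumptions chosen for the example, not part of the slide.

```python
from fractions import Fraction

sample_space = [1, 2, 3, 4, 5, 6]                  # S for one fair die
event_a = [o for o in sample_space if o % 2 == 0]  # A = "an even face shows"

# P(A) = n_A / N, valid because every outcome is assumed equally likely
p_a = Fraction(len(event_a), len(sample_space))
p_s = Fraction(len(sample_space), len(sample_space))

print(p_a, p_s)
```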


Basics of Probability (Contd.)

Events
Simple events cannot be broken down into other events
Compound events are made up of two or more simple events
The complement of an event A, written $A^c$, implies the occurrence of everything except A, i.e. $P(A^c) = 1 - P(A)$

Laws
The additive law defines the probability of the union of 2 or more events (say A & B), i.e. A may happen, B may happen, or both

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
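The additive law and the complement rule can be checked by direct counting. The sketch below uses one fair die with two illustrative events (A = even face, B = face greater than 3); both events are assumptions made for the example.

```python
from fractions import Fraction

S = set(range(1, 7))               # sample space for one fair die
A = {o for o in S if o % 2 == 0}   # A = {2, 4, 6}
B = {o for o in S if o > 3}        # B = {4, 5, 6}

def prob(event):
    """Classical probability: favourable outcomes over total outcomes."""
    return Fraction(len(event), len(S))

p_union = prob(A | B)                            # counted directly
p_additive = prob(A) + prob(B) - prob(A & B)     # additive law
p_complement = prob(S - A)                       # P(A^c)

print(p_union, p_additive, p_complement)
```

Subtracting $P(A \cap B)$ avoids counting the outcomes 4 and 6 twice, since they belong to both events.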


Basics of Probability (Contd.)

Laws (Contd.)
The multiplicative law defines the probability of the intersection of 2 or more events (say A & B), i.e. all the events in the group occur

$P(A \cap B) = P(A) \cdot P(B \mid A) = P(B) \cdot P(A \mid B)$

$P(B \mid A)$ represents conditional probability, i.e. the probability that B occurs given that A has occurred

Independence
Two events A & B are said to be independent if the outcome of one has no influence on the outcome of the other

$P(B \mid A) = P(B)$ and hence $P(A \cap B) = P(A) \cdot P(B)$
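A minimal sketch of the multiplicative law and of independence, using two fair dice (red, green). The three events are chosen only for illustration: A = red shows 6, B = the sum is at least 10 (depends on A), C = green is even (independent of A).

```python
from fractions import Fraction
from itertools import product

S = set(product(range(1, 7), repeat=2))   # all 36 (red, green) outcomes

A = {(r, g) for r, g in S if r == 6}        # red die shows 6
B = {(r, g) for r, g in S if r + g >= 10}   # sum is at least 10
C = {(r, g) for r, g in S if g % 2 == 0}    # green die is even

def prob(event):
    return Fraction(len(event), len(S))

def cond(event, given):
    """P(event | given), counted inside the reduced sample space `given`."""
    return Fraction(len(event & given), len(given))

# Multiplicative law: P(A n B) = P(A) * P(B | A)
law_holds = prob(A & B) == prob(A) * cond(B, A)

# Independence: P(C | A) = P(C), hence P(A n C) = P(A) * P(C)
a_c_independent = cond(C, A) == prob(C) and prob(A & C) == prob(A) * prob(C)
a_b_independent = cond(B, A) == prob(B)

print(law_holds, a_c_independent, a_b_independent)
```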


Statistics
Statistics is the science that deals with the collection, classification, analysis and making of inferences from data

Descriptive Statistics
Describes the characteristics of a product or process using information collected on it

Inferential Statistics
Draws conclusions on unknown process parameters based on sample information

Data Collection
Direct observation
Indirect observation (questionnaires): no control over the data, and the chance of error is greater
Data can be described by a random variable, either continuous or discrete


Statistics (Contd.)

Continuous variable
A variable that can assume any value on a continuous scale within a range
Eg. Viscosity of a resin

Discrete variable
A variable that can assume only a finite or countable number of values
Eg. No. of defects in a shirt
A continuous characteristic can also be viewed as discrete when items are classified as acceptable or not. Eg. Diameter of a hub in a tire

Accuracy
Refers to the degree of uniformity of the observations around a desired value, such that, on average, the target is realized

Precision
Refers to the degree of variability of the observations
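One common way to put numbers on these two ideas is to read accuracy as the bias (mean minus target) and precision as the standard deviation. The sketch below takes that reading; the two gauges, their readings and the 10.0 mm target are all hypothetical.

```python
import statistics

target = 10.0  # desired value in mm (hypothetical)

# Hypothetical repeated measurements of the same part by two gauges
gauge_1 = [10.1, 9.9, 10.0, 10.2, 9.8]        # centred on target, more scatter
gauge_2 = [10.50, 10.51, 10.49, 10.50, 10.50]  # off target, very little scatter

def bias(readings):
    """Accuracy measure: how far the average reading is from the target."""
    return statistics.fmean(readings) - target

def spread(readings):
    """Precision measure: sample standard deviation of the readings."""
    return statistics.stdev(readings)

for name, readings in (("gauge 1", gauge_1), ("gauge 2", gauge_2)):
    print(f"{name}: bias = {bias(readings):+.3f} mm, spread = {spread(readings):.3f} mm")
```

Under this reading gauge 1 is accurate but less precise, while gauge 2 is precise but inaccurate.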


Measures of Scale

Nominal Scale
Data variables are simply labels to identify an attribute
Eg. Critical / Major / Minor

Ordinal Scale
Data has the properties of nominal data
Data ranks or orders the observations
Eg. Grades, 1 = outstanding, 5 = poor

Interval Scale
Data has the properties of ordinal data, and a fixed unit of measure describes the interval between observations
Eg. Temp. of water at different stages of cooling during a 2 hr interval

Ratio Scale
Data has the properties of interval data, and a natural zero exists for the measurement scale
Eg. Wt. of cement bag: 100 kg, 100.2 kg


Measures of Scale

A nominal scale is really a list of categories into which objects can be classified. For example, people who receive a mail order offer might be classified as "no response," "purchase and pay," "purchase but return the product," and "purchase and neither pay nor return." The data so classified are termed categorical data.

An ordinal scale is a measurement scale that assigns values to objects based on their ranking with respect to one another. For example, a doctor might use a scale of 0-10 to indicate degree of improvement in some condition, from 0 (no improvement) to 10 (disappearance of the condition). While you know that a 4 is better than a 2, there is no implication that a 4 is twice as good as a 2. Nor is the improvement from 2 to 4 necessarily the same "amount" of improvement as the improvement from 6 to 8. All we know is that there are 11 categories, with 1 being better than 0, 2 being better than 1, etc.


Measures of Scale

An interval scale is a measurement scale in which a certain distance along the scale means the same thing no matter where on the scale you are, but where "0" on the scale does not represent the absence of the thing being measured. Fahrenheit and Celsius temperature scales are examples.

A ratio scale is a measurement scale in which a certain distance along the scale means the same thing no matter where on the scale you are, and where "0" on the scale represents the absence of the thing being measured. Thus a "4" on such a scale implies twice as much of the thing being measured as a "2."


Interval Data: Temperature, Dates (data that has an arbitrary zero)

Ratio Data: Height, Weight, Age, Length (data that has an absolute zero)

Nominal Data: Male, Female, Race, Political Party (categorical data that cannot be ranked)

Ordinal Data: Degree of Satisfaction at Restaurant (data that can be ranked)


http://www.statsoft.com/textbook/elementary-concepts-in-statistics/

Interval variables allow us not only to rank order the items that are measured, but also to quantify and compare the sizes of differences between them. For example, temperature, as measured in degrees Fahrenheit or Celsius, constitutes an interval scale. We can say that a temperature of 40 degrees is higher than a temperature of 30 degrees, and that an increase from 20 to 40 degrees is twice as much as an increase from 30 to 40 degrees.

Ratio variables are very similar to interval variables; in addition to all the properties of interval variables, they feature an identifiable absolute zero point, thus, they allow for statements such as x is two times more than y. Typical examples of ratio scales are measures of time or space. For example, as the Kelvin temperature scale is a ratio scale, not only can we say that a temperature of 200 degrees is higher than one of 100 degrees, we can correctly state that it is twice as high. Interval scales do not have the ratio property. Most statistical data analysis procedures do not distinguish between the interval and ratio properties of the measurement scales.


Measures of Central Tendency

Mean
Simple average of the observations in a dataset
Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$; population mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$

Median
The value in the middle when the observations are ranked
It is more robust than the mean, as it is not influenced by extreme values in the dataset

Mode
The value that occurs most frequently in a dataset
A dataset having more than one mode is called multi-modal

Trimmed Mean
Obtained by calculating the mean of the data that remain after a proportion of the high and low values has been deleted (a% trimmed)
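The four measures can be compared on one small dataset. The data below are hypothetical and include one extreme value (60) to show the robustness of the median. The helper `trimmed_mean` is written for this sketch and assumes that a% is removed from each end of the ranked data; the slide does not say whether the proportion applies per end or in total.

```python
import statistics

data = [2, 3, 3, 4, 5, 5, 5, 6, 7, 60]   # hypothetical observations; 60 is extreme

def trimmed_mean(values, percent):
    """Mean after deleting `percent`% of the ranked values from each end."""
    ranked = sorted(values)
    k = len(ranked) * percent // 100      # number of values dropped per end
    kept = ranked[k:len(ranked) - k]
    return statistics.fmean(kept)

mean = statistics.fmean(data)        # pulled upward by the extreme value
median = statistics.median(data)     # middle of the ranked data
modes = statistics.multimode(data)   # list, so multi-modal data are handled
trimmed = trimmed_mean(data, 10)     # 10% trimmed: drops 2 and 60

print(mean, median, modes, trimmed)
```

The mean (10.0) sits above nine of the ten observations, while the median (5.0) and the 10% trimmed mean (4.75) stay near the bulk of the data.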


Measures of Dispersion
Provide information on the variability or scatter of the observations around a given value

Range
Difference between the largest and smallest values in the dataset

$R = X_L - X_S$

Variance
Measures the fluctuation of the observations around the mean
Population variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$; sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$

Why n-1 in sample variance?
To satisfy the property of unbiasedness, i.e. the average of the sample variance (which keeps varying between samples) should be equal to the population variance, which is constant
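The unbiasedness argument can be verified exactly on a tiny case. The sketch below assumes a hypothetical population {1, 2, 3} and enumerates every possible sample of size 2 drawn with replacement (so the draws are independent), then averages the sample variance computed with divisor n-1 and with divisor n.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]   # hypothetical population
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N   # population variance

def variance(sample, divisor):
    """Sum of squared deviations from the sample mean, over `divisor`."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample) / divisor

n = 2
samples = list(product(population, repeat=n))   # all 9 equally likely samples

mean_s2_unbiased = sum(variance(s, n - 1) for s in samples) / len(samples)
mean_s2_biased = sum(variance(s, n) for s in samples) / len(samples)

print(sigma2, mean_s2_unbiased, mean_s2_biased)
```

With divisor n-1 the average equals the population variance (2/3); with divisor n it is too small by the factor (n-1)/n.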


Measures of Dispersion

Standard Deviation
The most commonly used measure of dispersion; it has the same unit as the observations
Measures the variability of the observations around the mean
Population standard deviation, sample standard deviation

Inter Quartile Range
The lower / first quartile (Q1) is the value such that 1/4th of the observations fall below it and 3/4th fall above it (position of Q1 in the ranked data = 0.25(n+1))
Vice versa for the third quartile (Q3) (position of Q3 in the ranked data = 0.75(n+1))
Difference between the 3rd quartile and the 1st quartile ($IQR = Q_3 - Q_1$)
The larger the value of IQR, the greater the spread of the data
To find the IQR, the data are ranked in ascending order and then Q1 and Q3 are calculated
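A minimal sketch of the rank-position rule, on hypothetical data with n = 11 so that 0.25(n+1) and 0.75(n+1) are whole numbers. The helper `quartile` is written for this example; linear interpolation between neighbouring ranks for fractional positions is an assumption, since the slide does not say how they are handled.

```python
import statistics

data = [13, 2, 21, 7, 5, 18, 10, 4, 15, 8, 12]   # hypothetical observations
ranked = sorted(data)                             # rank in ascending order first

def quartile(ranked_values, p):
    """Value at rank position p*(n+1), interpolating between neighbours."""
    n = len(ranked_values)
    pos = p * (n + 1)                 # 1-based rank position
    lower = int(pos)
    frac = pos - lower
    if lower < 1:
        return ranked_values[0]
    if lower >= n:
        return ranked_values[-1]
    below, above = ranked_values[lower - 1], ranked_values[lower]
    return below + frac * (above - below)

q1 = quartile(ranked, 0.25)   # position 3
q3 = quartile(ranked, 0.75)   # position 9
iqr = q3 - q1

print(q1, q3, iqr)
```

Python's `statistics.quantiles(data, n=4)` uses the same (n+1) positions by default, so it can serve as a cross-check.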


Measures of Skewness & Kurtosis

Skewness coefficient (V1)
Describes the asymmetry of the dataset about the mean, or indicates the degree to which the distribution deviates from symmetry (formulae)
Negatively skewed: V1 negative, Mean < Median
Positively skewed: V1 positive, Mean > Median
Not skewed: V1 = 0, Mean = Median

Kurtosis coefficient (V2)
Is a measure of the peakedness of the dataset (formulae)
Is also a measure of the heaviness of the tails of the distribution
For the normal distribution (mesokurtic), V2 = 3
Leptokurtic: more peaked, V2 > 3
Platykurtic: less peaked, V2 < 3
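The slide's own formulae are not reproduced in the text, so the sketch below assumes the common moment-based definitions: skewness = m3 / m2^1.5 and kurtosis = m4 / m2^2, where mk is the k-th central moment with divisor n. Under these definitions the normal distribution has kurtosis 3, matching the reference value above. Both datasets are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment with divisor n (an assumed convention)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** k for x in values) / len(values)

def skewness(values):
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def kurtosis(values):
    return central_moment(values, 4) / central_moment(values, 2) ** 2

symmetric = [1, 2, 3, 4, 5]       # mean = median = 3
right_skewed = [1, 1, 1, 2, 10]   # mean 3 > median 1

print(skewness(symmetric), kurtosis(symmetric))
print(skewness(right_skewed))
```

The symmetric set gives skewness 0 and kurtosis 1.7 (platykurtic, below 3); the second set has a long right tail and a positive skewness coefficient.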


Measures of Association

Indicate how two or more variables are related to each other
Small values (in magnitude) indicate a weak relation and large values a strong one

Correlation coefficient (r)
Is a measure of the strength of the linear relationship between 2 variables
The sample correlation coefficient is always between -1 and 1
Formulae
1 denotes a perfect +ve linear relationship, -1 denotes a perfect -ve linear relationship and 0 denotes uncorrelated
Sample Problem


Researchers at the European Centre for Road Safety Testing are trying to find out how the age of cars affects their braking capability. They test a group of ten cars of differing ages and find out the minimum stopping distances that the cars can achieve. The results are set out in the table below:


Car ages and stopping distances

Car   Age (months)   Minimum stopping distance at 40 kph (metres)
A      9             28.4
B     15             29.3
C     24             37.6
D     30             36.2
E     38             36.5
F     46             35.3
G     53             36.2
H     60             44.1
I     64             44.8
J     76             47.2


$\bar{x} = 415/10 = 41.5$, $\bar{y} = 375.6/10 = 37.56$

$r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - (\sum x)^2\right)\left(n\sum y^2 - (\sum y)^2\right)}}$

$r = \dfrac{10 \times 16713.3 - 415 \times 375.6}{\sqrt{(10 \times 21623 - 415^2)(10 \times 14457.72 - 375.6^2)}}$

$r = \dfrac{11259}{\sqrt{44005 \times 3501.84}}$

$r = 11259 / 12413.6$

$r = 0.91$

r always lies in the range -1 to +1. If it lies close to either of these two values, then the dispersion of the scattergram points is small and therefore a strong correlation exists between the two variables. For r to equal exactly -1 or +1, the correlation must be perfect and all the points on the scattergram must lie on the line of best fit (otherwise known as the regression line). If r is close to 0, the dispersion is large and the variables are uncorrelated. The positive or negative sign on the value of r indicates positive or negative correlation.
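The arithmetic for the car data can be reproduced with a short script. This is a sketch of the computational formula for r applied to the ten (age, stopping distance) pairs from the table; the variable names are chosen for this example.

```python
from math import sqrt

ages = [9, 15, 24, 30, 38, 46, 53, 60, 64, 76]                            # x, months
distances = [28.4, 29.3, 37.6, 36.2, 36.5, 35.3, 36.2, 44.1, 44.8, 47.2]  # y, metres

n = len(ages)
sum_x = sum(ages)
sum_y = sum(distances)
sum_xy = sum(x * y for x, y in zip(ages, distances))
sum_xx = sum(x * x for x in ages)
sum_yy = sum(y * y for y in distances)

numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
r = numerator / denominator

print(f"sum x = {sum_x}, sum y = {sum_y:.1f}, r = {r:.2f}")
```

The result, r of about 0.91, indicates a strong positive linear relationship between car age and minimum stopping distance in this sample.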


Example: Rolling 2 Dice (Red/Green)

Red\Green   1   2   3   4   5   6
1           2   3   4   5   6   7
2           3   4   5   6   7   8
3           4   5   6   7   8   9
4           5   6   7   8   9  10
5           6   7   8   9  10  11
6           7   8   9  10  11  12

Y = Sum of the up faces of the two dice. The table gives the value of y for all elements in S


Rolling 2 Dice: Probability Mass Function & CDF

y     p(y)    F(y)
2     1/36    1/36
3     2/36    3/36
4     3/36    6/36
5     4/36    10/36
6     5/36    15/36
7     6/36    21/36
8     5/36    26/36
9     4/36    30/36
10    3/36    33/36
11    2/36    35/36
12    1/36    36/36

$p(y) = \dfrac{\#\text{ of ways 2 dice can sum to } y}{\#\text{ of ways 2 dice can result in}}$

$F(y) = \sum_{t=2}^{y} p(t)$
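The table can be regenerated by enumerating the 36 equally likely (red, green) outcomes. The sketch below builds p(y) by counting and F(y) as a running sum; exact fractions are used so the values can be compared with the table (they print in lowest terms, e.g. 6/36 appears as 1/6).

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # sample space S, 36 outcomes

# p(y): number of ways to obtain sum y, divided by the number of outcomes
pmf = {}
for red, green in outcomes:
    y = red + green
    pmf[y] = pmf.get(y, 0) + Fraction(1, len(outcomes))

# F(y): cumulative sum of p(t) for t = 2, ..., y
cdf = {}
running = Fraction(0)
for y in sorted(pmf):
    running += pmf[y]
    cdf[y] = running

for y in sorted(pmf):
    print(y, pmf[y], cdf[y])
```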


Cumulative Distribution Function

For a discrete random variable, $F(x) = \sum_{\text{all } i} p(x_i)$ for $x_i \le x$
For a continuous random variable defined on $a \le x \le b$, $F(x) = \int_{a}^{x} f(t)\,dt$

F(x) is a non-decreasing function of x such that it tends to 1 as x tends to infinity and to 0 as x tends to minus infinity

Expected value
$\mu = E(x) = \sum_{\text{all } i} x_i\, p(x_i)$, if x is discrete
$\mu = E(x) = \int_{a}^{b} x f(x)\,dx$, if x is continuous

Variance of a random variable is given by

$Var(X) = E[(X - \mu)^2] = E(X^2) - [E(X)]^2$
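As a worked instance of the discrete formulas, the sketch below computes the expected value and variance of Y, the sum of two fair dice, and checks that the two expressions for the variance agree. The closed form used to build the probabilities, (6 - |y - 7|)/36, is a compact restatement of the two-dice table.

```python
from fractions import Fraction

# p(y) for the sum of two fair dice, y = 2, ..., 12
pmf = {y: Fraction(6 - abs(y - 7), 36) for y in range(2, 13)}

mean = sum(y * p for y, p in pmf.items())                           # E(Y)
var_definition = sum((y - mean) ** 2 * p for y, p in pmf.items())   # E[(Y - mu)^2]
var_shortcut = sum(y ** 2 * p for y, p in pmf.items()) - mean ** 2  # E(Y^2) - [E(Y)]^2

print(mean, var_definition, var_shortcut)
```

Both routes give a variance of 35/6, with an expected value of 7.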


Discrete Distributions

Hypergeometric distribution
Useful in sampling from a finite population without replacement, where the outcomes are success or failure
If we consider getting a nonconforming item as a success, the probability distribution of the number of nonconforming items (x) is given by

$P(x) = \dfrac{\binom{D}{x}\binom{N-D}{n-x}}{\binom{N}{n}}$

D: no. of defects in population, x: no. of defects in sample
N: size of population, n: size of sample

Mean $\mu = E(x) = nD/N$

Variance $\sigma^2 = Var(x) = \dfrac{nD}{N}\left(1 - \dfrac{D}{N}\right)\left(\dfrac{N-n}{N-1}\right)$
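The mean and variance formulas can be checked against the probability function directly. The sketch below assumes a hypothetical lot of N = 20 items with D = 5 nonconforming and a sample of n = 4, and uses exact fractions.

```python
from fractions import Fraction
from math import comb

N, D, n = 20, 5, 4   # hypothetical lot size, defects in lot, sample size

def hypergeom_pmf(x):
    """P(x nonconforming items in the sample), sampling without replacement."""
    return Fraction(comb(D, x) * comb(N - D, n - x), comb(N, n))

support = range(0, min(n, D) + 1)
total = sum(hypergeom_pmf(x) for x in support)
mean = sum(x * hypergeom_pmf(x) for x in support)
variance = sum(x ** 2 * hypergeom_pmf(x) for x in support) - mean ** 2

# Closed forms: nD/N and (nD/N)(1 - D/N)((N - n)/(N - 1))
mean_formula = Fraction(n * D, N)
variance_formula = mean_formula * (1 - Fraction(D, N)) * Fraction(N - n, N - 1)

print(total, mean, variance)
```

The factor (N - n)/(N - 1) is the finite population correction; it shrinks the variance because sampling without replacement uses up part of the lot.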


Binomial distribution
Series of independent trials
Useful in sampling from a large population without replacement, or in sampling with replacement from a finite population
The probability of success (p) on any trial is assumed to be constant
Let x denote the no. of successes; if n trials are conducted, the probability of x successes is given by

$P(x) = \binom{n}{x} p^x (1-p)^{n-x}, \quad x = 0, 1, 2, \ldots, n$

Mean $\mu = E(x) = np$

Variance $\sigma^2 = Var(x) = np(1-p)$
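A short check of the binomial formulas, assuming n = 5 trials and p = 1/10 purely for illustration. Exact fractions make it possible to confirm that the probabilities sum to 1 and that the mean and variance match np and np(1-p).

```python
from fractions import Fraction
from math import comb

n = 5                 # number of trials (hypothetical)
p = Fraction(1, 10)   # probability of success on each trial (hypothetical)

def binomial_pmf(x):
    """P(x successes in n independent trials with constant p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

total = sum(binomial_pmf(x) for x in range(n + 1))
mean = sum(x * binomial_pmf(x) for x in range(n + 1))
variance = sum(x ** 2 * binomial_pmf(x) for x in range(n + 1)) - mean ** 2

print(total, mean, variance)
```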


Trials are independent in the binomial, but not in the hypergeometric

The probability of success on any trial remains constant in the binomial, but not in the hypergeometric

The hypergeometric approaches the binomial as N tends to infinity while D/N remains constant
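The limiting behaviour can be seen numerically. The sketch below fixes n = 5 and D/N = 0.1 (both hypothetical), lets the lot size N grow, and records the largest gap between the hypergeometric and binomial probabilities.

```python
from math import comb

n, p = 5, 0.1   # sample size and fraction nonconforming (hypothetical)

def binomial_pmf(x):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def hypergeom_pmf(x, N, D):
    return comb(D, x) * comb(N - D, n - x) / comb(N, n)

max_diffs = []
for N in (50, 500, 5000):
    D = N // 10   # keeps D/N fixed at 0.1
    gap = max(abs(hypergeom_pmf(x, N, D) - binomial_pmf(x)) for x in range(n + 1))
    max_diffs.append(gap)
    print(f"N = {N:5d}: largest difference = {gap:.6f}")
```

The gap shrinks as N grows, which is why the binomial is a reasonable model when a small sample is taken from a large lot.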


Poisson distribution
Used to model the no. of events that happen within a product unit, space, volume or time period. Eg. No. of machine breakdowns per month
The probability distribution function of the no. of events (x) is given by

$P(x) = \dfrac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, 2, \ldots$

The mean or average no. of events is given by $\lambda$

Mean $\mu$ = Variance $\sigma^2 = \lambda$

It is used as an approximation to the binomial when n is large ($n \to \infty$) and p is small ($p \to 0$), such that $np = \lambda$ is a constant, i.e. the average no. of defects per unit is constant
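The quality of the approximation can be gauged with a quick comparison. The sketch below assumes n = 1000 and p = 0.002, so that np = 2; these values are hypothetical and chosen only to satisfy "n large, p small".

```python
from math import comb, exp, factorial

n, p = 1000, 0.002   # hypothetical: large n, small p
lam = n * p          # lambda = np

def binomial_pmf(x):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x):
    return exp(-lam) * lam ** x / factorial(x)

for x in range(6):
    print(f"x = {x}: binomial = {binomial_pmf(x):.5f}, Poisson = {poisson_pmf(x):.5f}")

max_diff = max(abs(binomial_pmf(x) - poisson_pmf(x)) for x in range(11))
```

For these values the two sets of probabilities agree to about three decimal places over x = 0 to 10.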

