ch4 fundasqct

Upload: mohan-rao

Post on 04-Jun-2018


  • 8/13/2019 Ch4 FundaSQCT

    Chapter - 4

    Fundamentals of Statistical Concepts & Techniques in Quality Control and Improvement

    Basic Terminologies

    Population: Set of all items that possess a certain characteristic of interest

    Eg. All 10,000 plastic cups produced in week no. 23, with thickness as the characteristic of interest

    Parameter: A characteristic of a population, which describes it

    Eg. Average thickness of the 10,000 cups

    Sample: A subset of the population

    Eg. 200 plastic cups selected from the week 23 output

    Statistic: A characteristic of a sample, used to make inferences about the unknown population parameters

    Eg. Average thickness of the 200 plastic cups is 1 mm

    Assigning Probabilities

    Classical Method: Assigning probabilities based on the assumption of equally likely outcomes.

    Relative Frequency Method: Assigning probabilities based on experimentation or historical data.

    Subjective Method: Assigning probabilities based on the assignor's judgment.

    Basics of Probability

    The probability of an event describes the chance of occurrence of that event

    A probability function is bounded by 0 and 1: 0 for an event that cannot occur, 1 for an event that is certain to occur

    The set of all outcomes of an experiment is called the sample space (S)

    If each outcome in the sample space is equally likely to happen, then the probability of event A is given by $P(A) = n_A / N$, where $n_A$ is the number of outcomes in A and N is the total number of outcomes; the probability associated with the sample space is $P(S) = 1$

    Basics of Probability (contd.)

    Events

    Simple events cannot be broken into other events

    Compound events are made up of two or more simple events

    The complement of an event, say A, implies the occurrence of everything except A, i.e. $P(A^c) = 1 - P(A)$

    Laws

    The additive law defines the probability of the union of 2 or more events (say A and B), i.e. A may happen, B may happen, or both

    $P(A \cup B) = P(A) + P(B) - P(A \cap B)$

    Basics of Probability (contd.)

    Laws (contd.)

    The multiplicative law defines the probability of the intersection of 2 or more events (say A and B), i.e. all the events in the group occur

    $P(A \cap B) = P(A)\,P(B \mid A) = P(B)\,P(A \mid B)$

    $P(B \mid A)$ represents the conditional probability, i.e. the probability that B occurs given that A has occurred

    Independence

    Two events A and B are said to be independent if the outcome of one has no influence on the outcome of the other

    $P(B \mid A) = P(B)$ and hence $P(A \cap B) = P(A)\,P(B)$
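
    The additive law, the multiplicative law and the independence condition can be checked by direct counting on a small sample space. The sketch below assumes two fair dice and two hypothetical events A and B chosen only for illustration; it is not part of the original slides.

```python
from fractions import Fraction
from itertools import product

# Sample space S: all 36 equally likely (red, green) outcomes of two fair dice.
S = list(product(range(1, 7), repeat=2))

def prob(event):
    """Classical probability: favourable outcomes / total outcomes."""
    return Fraction(sum(1 for o in S if event(o)), len(S))

def A(o):
    # Hypothetical event A: the red die shows an even number.
    return o[0] % 2 == 0

def B(o):
    # Hypothetical event B: the green die shows 5 or 6.
    return o[1] >= 5

p_a, p_b = prob(A), prob(B)
p_a_and_b = prob(lambda o: A(o) and B(o))
p_a_or_b = prob(lambda o: A(o) or B(o))
p_b_given_a = p_a_and_b / p_a  # conditional probability P(B | A)

print(p_a_or_b == p_a + p_b - p_a_and_b)  # additive law holds
print(p_a_and_b == p_a * p_b_given_a)     # multiplicative law holds
print(p_b_given_a == p_b)                 # A and B are independent here
```

    Exact fractions are used so that the equalities are tested without floating-point rounding.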

    Statistics

    Statistics is the science that deals with the collection, classification, analysis and making of inferences from data

    Descriptive Statistics: Describes the characteristics of a product or process using information collected on it

    Inferential Statistics: Draws conclusions about unknown process parameters based on sample information

    Data Collection

    Direct observation

    Indirect observation (questionnaires): there is no control over the data and the chance of error is higher

    Data can be described by a random variable, either continuous or discrete

    Statistics (contd.)

    Continuous variable: A variable that can assume any value on a continuous scale within a range. Eg. Viscosity of a resin

    Discrete variable: A variable that can assume only a finite or countable number of values. Eg. No. of defects in a shirt

    A continuous characteristic can also be viewed as discrete when items are simply classified as acceptable or not. Eg. Diameter of a hub in a tire

    Accuracy: Refers to the degree of uniformity of the observations around a desired value, such that, on average, the target is realized

    Precision: Refers to the degree of variability of the observations

    Measures of Scale

    Nominal Scale: Data variables are simply labels used to identify an attribute. Eg. Critical / Major / Minor

    Ordinal Scale: Data has the properties of nominal data and also ranks or orders the observations. Eg. Grades from 1 (outstanding) to 5 (poor)

    Interval Scale: Data has the properties of ordinal data, and a fixed unit of measure describes the interval between observations. Eg. Temperature of water at different stages of cooling during a 2 hr interval

    Ratio Scale: Data has the properties of interval data, and a natural zero exists for the measurement scale. Eg. Weight of a cement bag: 100 kg, 100.2 kg

    Measures of Scale

    A nominal scale is really a list of categories to which objects can be classified. For example, people who receive a mail order offer might be classified as "no response," "purchase and pay," "purchase but return the product," and "purchase and neither pay nor return." The data so classified are termed categorical data.

    An ordinal scale is a measurement scale that assigns values to objects based on their ranking with respect to one another. For example, a doctor might use a scale of 0-10 to indicate degree of improvement in some condition, from 0 (no improvement) to 10 (disappearance of the condition). While you know that a 4 is better than a 2, there is no implication that a 4 is twice as good as a 2. Nor is the improvement from 2 to 4 necessarily the same "amount" of improvement as the improvement from 6 to 8. All we know is that there are 11 categories, with 1 being better than 0, 2 being better than 1, etc.

    Measures of Scale

    An interval scale is a measurement scale in which a certain distance along the scale means the same thing no matter where on the scale you are, but where "0" on the scale does not represent the absence of the thing being measured. Fahrenheit and Celsius temperature scales are examples.

    A ratio scale is a measurement scale in which a certain distance along the scale means the same thing no matter where on the scale you are, and where "0" on the scale represents the absence of the thing being measured. Thus a "4" on such a scale implies twice as much of the thing being measured as a "2."

    Interval Data: Temperature, Dates (data that has an arbitrary zero)

    Ratio Data: Height, Weight, Age, Length (data that has an absolute zero)

    Nominal Data: Male, Female, Race, Political Party (categorical data that cannot be ranked)

    Ordinal Data: Degree of Satisfaction at Restaurant (data that can be ranked)

    http://www.statsoft.com/textbook/elementary-concepts-in-statistics/

    Interval variables allow us not only to rank order the items that are measured, but also to quantify and compare the sizes of differences between them. For example, temperature, as measured in degrees Fahrenheit or Celsius, constitutes an interval scale. We can say that a temperature of 40 degrees is higher than a temperature of 30 degrees, and that an increase from 20 to 40 degrees is twice as much as an increase from 30 to 40 degrees.

    Ratio variables are very similar to interval variables; in addition to all the properties of interval variables, they feature an identifiable absolute zero point, thus, they allow for statements such as "x is two times more than y". Typical examples of ratio scales are measures of time or space. For example, as the Kelvin temperature scale is a ratio scale, not only can we say that a temperature of 200 degrees is higher than one of 100 degrees, we can correctly state that it is twice as high. Interval scales do not have the ratio property. Most statistical data analysis procedures do not distinguish between the interval and ratio properties of the measurement scales.

    Measures of Central Tendency

    Mean: Simple average of the observations in a dataset

    Sample mean $\bar{x} = \sum x_i / n$, Population mean $\mu = \sum x_i / N$

    Median: The value in the middle when the observations are ranked

    It is more robust than the mean, as it is not influenced by extreme values in the dataset

    Mode: The value that occurs most frequently in a dataset

    A dataset having more than one mode is called multi-modal

    Trimmed Mean: Obtained by calculating the mean of the data that remain after a proportion of the high and low values has been deleted (a% trimmed)
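
    The four measures can be compared on one small dataset. The sketch below uses made-up numbers with a single extreme value, and it assumes that an "a% trimmed" mean drops a% of the observations from each end of the ranked data; the helper name `trimmed_mean` is hypothetical.

```python
import statistics

def trimmed_mean(data, proportion):
    """Mean after dropping `proportion` of the observations from each end.

    Assumption: the trimming proportion applies to each tail separately.
    """
    ordered = sorted(data)
    k = int(len(ordered) * proportion)  # number of values cut from each tail
    kept = ordered[k:len(ordered) - k]
    return statistics.mean(kept)

# Made-up data; the value 60 is a deliberate outlier.
data = [2, 3, 3, 4, 5, 5, 5, 6, 7, 60]

mean_value = statistics.mean(data)      # pulled upward by the outlier
median_value = statistics.median(data)  # unaffected by the outlier
mode_value = statistics.mode(data)      # most frequent value
trimmed_value = trimmed_mean(data, 0.10)

print(mean_value, median_value, mode_value, trimmed_value)
```

    The mean is roughly double the median here, which illustrates why the median and the trimmed mean are described as more robust to extreme values.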

    Measures of Dispersion

    Provide information on the variability or scatter of the observations around a given value

    Range: Difference between the largest and smallest values in the dataset

    $R = X_L - X_S$

    Variance: Measures the fluctuation of the observations around the mean

    Population variance $\sigma^2 = \sum (x_i - \mu)^2 / N$, Sample variance $s^2 = \sum (x_i - \bar{x})^2 / (n - 1)$

    Why n-1 in sample variance? To satisfy the property of unbiasedness, i.e. the average of the sample variance (which varies from sample to sample) should equal the population variance, which is constant
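
    The unbiasedness argument can be verified exactly on a tiny case. The sketch below assumes a made-up population of six values and enumerates every ordered sample of size 3 drawn with replacement (independent draws); averaging the sample variance over all samples recovers the population variance only when the divisor is n-1.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 5, 6]  # made-up finite population
N = len(population)
mu = Fraction(sum(population), N)
pop_var = sum((x - mu) ** 2 for x in population) / N  # divisor N

def sample_var(sample, divisor):
    """Sum of squared deviations from the sample mean, over `divisor`."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample) / divisor

n = 3
# Every possible ordered sample of size n, drawn with replacement.
samples = list(product(population, repeat=n))

avg_n_minus_1 = sum(sample_var(s, n - 1) for s in samples) / len(samples)
avg_n = sum(sample_var(s, n) for s in samples) / len(samples)

print(pop_var, avg_n_minus_1, avg_n)
```

    With divisor n the average falls short of the population variance by the factor (n-1)/n, which is the bias that the n-1 divisor removes.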

    Measures of Dispersion

    Standard Deviation: The most widely used measure of dispersion; it has the same unit as the observations

    Measures the variability of the observations around the mean

    Population Standard Deviation, Sample Standard Deviation (each is the square root of the corresponding variance)

    Inter Quartile Range

    The lower / first quartile (Q1) is the value such that 1/4th of the observations fall below it and 3/4th fall above it; its position in the ranked data is 0.25(n+1)

    Likewise, the third quartile (Q3) has 3/4th of the observations below it; its position in the ranked data is 0.75(n+1)

    IQR is the difference between the 3rd quartile and the 1st quartile: $IQR = Q_3 - Q_1$

    The larger the value of IQR, the greater the spread of the data

    To find IQR, the data are ranked in ascending order and then Q1 and Q3 are calculated
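
    The quartile positions 0.25(n+1) and 0.75(n+1) are usually fractional, so some rule is needed to turn a position into a value. The sketch below assumes linear interpolation between the two neighbouring ranked observations, which is one common convention and not necessarily the one intended in the slides; the data and function names are made up for illustration.

```python
def value_at_position(ordered, pos):
    """Value at 1-based rank `pos`, interpolating linearly between neighbours."""
    lower = int(pos)          # integer part of the rank
    frac = pos - lower        # fractional part of the rank
    if lower < 1:
        return ordered[0]
    if lower >= len(ordered):
        return ordered[-1]
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])

def quartiles(data):
    """Return (Q1, Q3, IQR) using the (n+1)-based positions."""
    ordered = sorted(data)    # rank the data in ascending order first
    n = len(ordered)
    first = value_at_position(ordered, 0.25 * (n + 1))
    third = value_at_position(ordered, 0.75 * (n + 1))
    return first, third, third - first

data = [12, 15, 11, 18, 20, 14, 17, 16]  # made-up observations
q1, q3, iqr = quartiles(data)
print(q1, q3, iqr)
```

    For these eight values the positions are 2.25 and 6.75, so each quartile lies between two ranked observations.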

    Measures of Skewness & Kurtosis

    Skewness coefficient ($V_1$)

    Describes the asymmetry of the dataset about the mean, i.e. the degree to which the distribution deviates from symmetry (formulae)

    Negatively skewed: $V_1 < 0$, Mean < Median

    Positively skewed: $V_1 > 0$, Mean > Median

    Not skewed: $V_1 = 0$, Mean = Median

    Kurtosis coefficient ($V_2$)

    Is a measure of the peakedness of the dataset (formulae)

    Is also a measure of the heaviness of the tails of the distribution

    For the normal distribution (mesokurtic), $V_2 = 3$

    Leptokurtic: more peaked, $V_2 > 3$

    Platykurtic: less peaked, $V_2 < 3$
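
    The slides do not reproduce the formulae, so the sketch below assumes the usual moment-based definitions: skewness as the third central moment over the 1.5th power of the second, and kurtosis as the fourth central moment over the square of the second. This kurtosis convention is the one under which the normal distribution gives 3. The datasets are made up.

```python
def central_moment(data, k):
    """k-th central moment with divisor n (an assumed convention)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** k for x in data) / n

def skewness(data):
    # Assumed definition: m3 / m2^(3/2)
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def kurtosis(data):
    # Assumed definition: m4 / m2^2 (equals 3 for a normal distribution)
    return central_moment(data, 4) / central_moment(data, 2) ** 2

symmetric = [1, 2, 3, 4, 5]        # made-up, symmetric about 3
right_tail = [1, 1, 2, 2, 3, 9]    # made-up, long right tail

print(skewness(symmetric), kurtosis(symmetric))
print(skewness(right_tail))
```

    The symmetric dataset gives a skewness of zero, while the dataset with the long right tail gives a positive value and has its mean above its median.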

    Measures of Association

    Indicate how two or more variables are related to each other

    Small values indicate a weak relationship and large values a strong one

    Correlation coefficient (r)

    Is a measure of the strength of the linear relationship between 2 variables

    The sample correlation coefficient is always between -1 and 1

    Formulae

    1 denotes a perfect positive linear relationship, -1 denotes a perfect negative linear relationship and 0 denotes uncorrelated variables

    Sample Problem

    Researchers at the European Centre for Road Safety Testing are trying to find out how the age of cars affects their braking capability. They test a group of ten cars of differing ages and find out the minimum stopping distances that the cars can achieve. The results are set out in the table below:

    Car ages and stopping distances

    Car   Age (months)   Minimum stopping distance at 40 kph (metres)
    A     9              28.4
    B     15             29.3
    C     24             37.6
    D     30             36.2
    E     38             36.5
    F     46             35.3
    G     53             36.2
    H     60             44.1
    I     64             44.8
    J     76             47.2

    x-bar = 415/10 = 41.5

    y-bar = 375.6/10 = 37.56

    $r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2]\,[n\sum y^2 - (\sum y)^2]}}$

    $r = \dfrac{10 \times 16713.3 - 415 \times 375.6}{\sqrt{(10 \times 21623 - 415^2)(10 \times 14457.72 - 375.6^2)}}$

    $r = 11259 / \sqrt{44005 \times 3501.84}$

    $r = 11259 / 12413.64$

    $r = 0.91$

    r always lies in the range -1 to +1. If it lies close to either of these two values, then the dispersion of the scattergram points is small and therefore a strong correlation exists between the two variables. For r to equal exactly -1 or +1 must mean that correlation is perfect and all the points on the scattergram lie on the line of best fit (otherwise known as the regression line). If r is close to 0, the dispersion is large and the variables are linearly uncorrelated. The positive or negative sign on the value of r indicates positive or negative correlation.
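
    The hand calculation can be reproduced from the ten (age, stopping distance) pairs in the table. The sketch below uses the same computational formula for r; the function and variable names are chosen for illustration only.

```python
from math import sqrt

# Data from the car ages and stopping distances table.
ages = [9, 15, 24, 30, 38, 46, 53, 60, 64, 76]                        # months
stops = [28.4, 29.3, 37.6, 36.2, 36.5, 35.3, 36.2, 44.1, 44.8, 47.2]  # metres

def pearson_r(x, y):
    """Sample correlation coefficient via the sums-of-products formula."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    numerator = n * sxy - sx * sy
    denominator = sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return numerator / denominator

r = pearson_r(ages, stops)
print(round(r, 2))
```

    The result rounds to 0.91, matching the worked value and indicating a strong positive linear relationship between car age and stopping distance.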

    Example: Rolling 2 Dice (Red/Green)

    Red\Green   1   2   3   4   5   6
    1           2   3   4   5   6   7
    2           3   4   5   6   7   8
    3           4   5   6   7   8   9
    4           5   6   7   8   9   10
    5           6   7   8   9   10  11
    6           7   8   9   10  11  12

    Y = Sum of the up faces of the two dice. The table gives the value of y for all elements in S

    Rolling 2 Dice: Probability Mass Function & CDF

    y    p(y)   F(y)
    2    1/36   1/36
    3    2/36   3/36
    4    3/36   6/36
    5    4/36   10/36
    6    5/36   15/36
    7    6/36   21/36
    8    5/36   26/36
    9    4/36   30/36
    10   3/36   33/36
    11   2/36   35/36
    12   1/36   36/36

    $p(y) = \dfrac{\text{\# of ways 2 dice can sum to } y}{\text{\# of ways 2 dice can result in}}$

    $F(y) = \sum_{t=2}^{y} p(t)$
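
    The probability mass function and CDF of the dice sum can be generated by counting outcomes rather than filling in the table by hand. This is a minimal sketch assuming two fair dice; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import accumulate, product

# Count how many of the 36 (red, green) outcomes give each sum y.
counts = Counter(red + green for red, green in product(range(1, 7), repeat=2))

# p(y) = (# of ways to get sum y) / 36
pmf = {y: Fraction(counts[y], 36) for y in range(2, 13)}

# F(y) = p(2) + p(3) + ... + p(y), built as a running total of the pmf
cdf = dict(zip(pmf, accumulate(pmf.values())))

for y in pmf:
    print(y, pmf[y], cdf[y])
```

    Note that `Fraction` prints values in lowest terms, so 6/36 appears as 1/6.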

    Cumulative Distribution Function

    For a discrete random variable, $F(x) = \sum_{i:\, x_i \le x} p(x_i)$

    For a continuous random variable with density f(x) on the range [a, b], $F(x) = \int_a^x f(t)\,dt$

    F(x) is a non-decreasing function of x that tends to 1 as x tends to infinity and to 0 as x tends to minus infinity

    Expected value

    $\mu = E(X) = \sum_i x_i\, p(x_i)$, if X is discrete

    $\mu = E(X) = \int_a^b x f(x)\,dx$, if X is continuous

    Variance of a random variable is given by

    $Var(X) = E[(X - \mu)^2] = E(X^2) - [E(X)]^2$
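
    The expected value and both forms of the variance can be evaluated for the sum of two fair dice. The sketch below writes the probability mass function in closed form, p(y) = (6 - |y - 7|)/36, which reproduces the 1/36, 2/36, ..., 6/36, ..., 1/36 pattern; the variable names are illustrative.

```python
from fractions import Fraction

# PMF of the sum of two fair dice for y = 2..12.
pmf = {y: Fraction(6 - abs(y - 7), 36) for y in range(2, 13)}

# E(X) = sum of x * p(x)
mean = sum(y * p for y, p in pmf.items())

# Var(X) by definition: E[(X - mu)^2]
var_definition = sum((y - mean) ** 2 * p for y, p in pmf.items())

# Var(X) by the shortcut: E(X^2) - [E(X)]^2
var_shortcut = sum(y * y * p for y, p in pmf.items()) - mean ** 2

print(mean, var_definition, var_shortcut)
```

    Both variance expressions agree, as the identity requires.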

    Discrete Distributions

    Hypergeometric distribution

    Useful in sampling from a finite population without replacement, where the outcomes are success or failure

    If we consider getting a nonconforming item as a success, the probability distribution of the number of nonconforming items (x) in the sample is given by

    $P(x) = \dfrac{\binom{D}{x}\binom{N-D}{n-x}}{\binom{N}{n}}$

    D: no. of defects (nonconforming items) in the population, x: no. of defects in the sample

    N: size of the population, n: size of the sample

    Mean $\mu = E(x) = nD/N$

    Variance $\sigma^2 = Var(x) = \dfrac{nD}{N}\left(1 - \dfrac{D}{N}\right)\dfrac{N-n}{N-1}$
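
    The mean and variance formulas can be confirmed numerically from the probability function itself. The sketch below assumes a hypothetical lot of N = 50 items with D = 5 nonconforming and a sample of n = 10; these numbers are made up for illustration.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(x, N, D, n):
    """P(x nonconforming in a sample of n drawn without replacement)."""
    return Fraction(comb(D, x) * comb(N - D, n - x), comb(N, n))

N, D, n = 50, 5, 10  # hypothetical lot size, defects in lot, sample size

# x can range only over values that are physically possible.
support = range(max(0, n - (N - D)), min(n, D) + 1)
pmf = {x: hypergeom_pmf(x, N, D, n) for x in support}

mean = sum(x * p for x, p in pmf.items())
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

print(mean, variance)
```

    The computed variance includes the factor (N-n)/(N-1), often called the finite population correction.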

    Discrete Distributions

    Binomial distribution

    Series of independent trials

    Useful in sampling from a large population without replacement, or in sampling with replacement from a finite population

    The probability of success (p) on any trial is assumed to be constant

    Let x denote the no. of successes; if n trials are conducted, the probability of x successes is given by

    $P(x) = \binom{n}{x} p^x (1-p)^{n-x}$, x = 0, 1, 2, ..., n

    Mean $\mu = E(x) = np$

    Variance $\sigma^2 = Var(x) = np(1-p)$
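
    The binomial mean and variance can likewise be recovered from the probability function. The sketch below assumes hypothetical values n = 10 and p = 1/10 and uses exact fractions.

```python
from fractions import Fraction
from math import comb

def binom_pmf(x, n, p):
    """P(x successes in n independent trials with success probability p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, Fraction(1, 10)  # hypothetical number of trials and success probability

pmf = {x: binom_pmf(x, n, p) for x in range(n + 1)}
mean = sum(x * q for x, q in pmf.items())
variance = sum((x - mean) ** 2 * q for x, q in pmf.items())

print(mean, variance)
```

    The printed values equal np and np(1-p) respectively.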

    Trials are independent in the binomial, but not in the hypergeometric

    The probability of success on any trial remains constant in the binomial, but not in the hypergeometric

    The hypergeometric approaches the binomial as $N \to \infty$ while D/N remains constant
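
    The limiting behaviour can be observed by holding D/N fixed and letting the population grow. The sketch below assumes a hypothetical defect fraction of 0.1 and a sample of n = 10, and reports the largest pointwise gap between the two probability functions for several population sizes.

```python
from math import comb

def hypergeom_pmf(x, N, D, n):
    # math.comb returns 0 when x > D, so impossible values get probability 0.
    return comb(D, x) * comb(N - D, n - x) / comb(N, n)

def binom_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def max_gap(N, n=10):
    """Largest |hypergeometric - binomial| over x = 0..n with D/N = 0.1."""
    D = N // 10            # keeps the defect fraction at 0.1 (hypothetical)
    p = D / N
    return max(abs(hypergeom_pmf(x, N, D, n) - binom_pmf(x, n, p))
               for x in range(n + 1))

gaps = [max_gap(N) for N in (50, 500, 5000)]
print(gaps)
```

    The gap shrinks as N grows, which is why the binomial is commonly used for sampling without replacement when the sample is a small fraction of the population.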

    Discrete Distributions

    Poisson distribution

    Used to model the no. of events that happen within a product unit, space, volume or time period. Eg. No. of machine breakdowns per month

    The probability distribution function of the no. of events (x) is given by

    $P(x) = \dfrac{e^{-\lambda} \lambda^x}{x!}$, x = 0, 1, 2, ...

    where $\lambda$ is the mean or average no. of events

    Mean $\mu$ = Variance $\sigma^2$ = $\lambda$

    It is used as an approximation to the binomial when n is large ($n \to \infty$) and p is small ($p \to 0$), such that $np = \lambda$ is a constant, i.e. the average no. of defects per unit is constant
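
    The approximation can be seen by fixing np and increasing n. The sketch below assumes a hypothetical rate of λ = 2 and compares the binomial and Poisson probabilities for x = 0..10; it also checks numerically that the Poisson mean and variance both equal λ.

```python
from math import comb, exp, factorial

def poisson_pmf(x, lam):
    return exp(-lam) * lam ** x / factorial(x)

def binom_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

lam = 2.0  # hypothetical average number of events

def max_gap(n):
    """Largest |binomial - Poisson| over x = 0..10 with np held at lam."""
    p = lam / n
    return max(abs(binom_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(11))

gaps = [max_gap(n) for n in (10, 100, 1000)]

# Mean and variance of the Poisson, summed far enough into the tail.
xs = range(60)
mean = sum(x * poisson_pmf(x, lam) for x in xs)
variance = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in xs)

print(gaps, mean, variance)
```

    As n increases with np fixed, the gap between the two distributions falls towards zero.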
