
Chapter 2

Probability and Statistics Review

2.1 Probability Review

2.1.1 Introduction

The intent of this chapter is to provide a review of the basic probability and statistical topics needed for the study of reliability. This chapter is not a substitute for a calculus-based probability and statistics course. This chapter covers primarily those topics from probability and statistics which are necessary to understand the statistical treatments used in this book.

It has become increasingly clear that a basic uncertainty exists in the outcomes of real-world processes. Often it is useful to be able to predict the likelihood of the occurrence of certain of these outcomes. Probability theory often employs mathematical models which have the necessary quality of consistency and also have sufficient flexibility to describe realistic situations. In addition, probability theory has been used successfully and practically to extend the uncertainty of basic outcomes to a determination of the likelihood of complex events. What follows is a sketch of the models of probability theory.

2.1.2 Experiments, Sample Spaces and Events

Definition: An experiment is any process whose possible outcomes can be identified and whose actual outcome can be observed but not determined in advance. Although the actual outcome of a particular experimental trial cannot be determined in advance of the trial, the set of possible outcomes can be known; that set is called the sample space and denoted by S.

SAMPLE SPACE:
Definition: A sample space of an experiment is the set of possible outcomes of the experiment. Sample spaces are often classified into discrete sample spaces, in which there are either a finite number of outcomes or a countably infinite number of outcomes, and continuous sample spaces, in which there are a non-countable number of outcomes.

EXAMPLES OF EXPERIMENTS AND SAMPLE SPACES:
Experiment 1: A coin is tossed 3 times.
S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}, 8 possible outcomes.

Experiment 2: A black die and a red die are rolled.
S = {(1,1),(1,2),...,(1,6),(2,1),...,(6,6)}, 36 possible outcomes; see Figure 2.1.

Figure 2.1: Sample space of the dice experiment

Experiment 3: A spinner is spun and the point on the chord of a circle is noted.
S = {x : x ∈ (0, 2π)}, a continuous sample space.

Experiment 4: A spinner, as in Figure 2.2, is spun twice and the number pair noted.
S = {(1,1),(1,2),(1,3),(2,1),(2,2),(2,3),(3,1),(3,2),(3,3)}.

Figure 2.2: Simple Spinner

Experiment 5: The number of phone calls coming into a major telephone exchange in 1 hour is observed.
S = {0, 1, 2, 3, ...}, a countably infinite sample space.

Experiment 6: The time to failure of an electrical component is observed (hours).
S = {t : t ∈ R, t > 0}, a non-countably infinite sample space.

Experiment 7: A sample of 3 components is observed; each component can be non-defective (N) or defective (D).
S = {NNN, NND, NDN, DNN, NDD, DND, DDN, DDD}.

EVENTS: Collections of outcomes from S are called events.
Definition: An event is a subset of the outcomes of S. Events are most often denoted by capital letters, A, B, . . . or A1, A2, . . . . Note that both the empty set φ and S itself are events, that is, subsets of S. φ might be called an impossible event since it contains no possible outcomes; likewise, S might be called a certain event.

EXAMPLES OF EVENTS:
Experiment 1: Event A: the outcomes which result in at least 2 heads.
A = {HHH, HHT, HTH, THH}

Experiment 2: Event B: the outcomes where the sum of the spots is 8.
B = {(2,6),(3,5),(4,4),(5,3),(6,2)}; in Figure 2.1, B is a diagonal of points.


Experiment 5: Event C: the number of calls is greater than 5, C={6,7,8,...}

Experiment 6: Event D: the time to failure is between 100 and 200 hours, D = (100, 200).

Since events are point sets, the language and operations of set theory are useful in the discussion of probability. Some basic definitions of set operations are reviewed below and their relationships to events are indicated in the following table.

NOTATION      SET LANGUAGE                                 EVENT LANGUAGE
S             Universal Set                                Sample Space (Certain Event)
φ             Empty Set                                    Impossible Event
Ac            Complement of A (points in S that            Event A does not occur
              are not in set A)
A ∪ B         A union B (points that are in Set A          Event A or Event B or both occur
              or Set B or both)
A ∩ B         A intersect B (points that are in            Event A and Event B both occur
              both Set A and Set B)
A ∩ B = φ     Set A and Set B are disjoint                 Event A and Event B are mutually exclusive

Figure 2.3 shows a set or Venn diagram indicating a sample space with two sets, A and B, their intersection and their union.

2.1.3 Definition of probability

Usually the ways of describing probabilities are more the ways of examining the relevance of a probability model to a realistic situation at hand. Although the theory of probability often aids in the modeling and understanding of the occurrence of random physical phenomena, probability theory is a mathematical model constructed by means of the axiomatic method. Viewed in this way, then, the ways of describing probabilities are methods for identifying the undefined terms of a mathematical theory, hopefully with some relationship to real phenomena.


Figure 2.3: Venn Diagram with 2 Sets

In the axiomatic approach, probability is defined as a function on the events.

Definition: If an experiment has sample space S and event A is defined on S, then P(A) is a real number called the probability of A. The probability function must follow three axioms:
1) 0 ≤ P(A) ≤ 1, for every event A;
2) P(S) = 1;
3) For any sequence of events A1, A2, . . . which are mutually exclusive (that is, Ai ∩ Aj = φ for i ≠ j), P(∪Ai) = Σ P(Ai).

Whenever values P(A) satisfy the above axioms, it has been shown that a complete theory of probability can be developed as a consistent mathematical system. That is true regardless of how the values P(A) are assigned, other than satisfying the axioms. It is entirely another question to have the values P(A) model reality. The modeling of reality by a probability system will be discussed in the next section.

2.1.4 The assignment of probabilities to finite sample spaces

The assignment of probabilities is most easily examined and most easily accomplished in a physically meaningful way when the sample space S is finite. Three usual ways of assigning probabilities will be examined in this section.


In the case of finite S, S = {O1, O2, . . . , On}, where Oi represents the ith possible outcome. With each Oi is associated a value pi, which is assigned so that:
1. pi ≥ 0, for all i;
2. Σ pi = 1.
In addition, the probability of an event is the sum of the pi values associated with the Oi contained in the event subset. That is, P(A) = Σ pi, where the sum is over NA, the number of outcomes in the event A. Probabilities assigned this way satisfy the three axioms and thus have the properties of probability theory which follow from the axioms. If probability theory is to be a model for which the probability of an event has physical meaning, it remains to assign the values pi in a physically meaningful way. Three ways of doing so are now outlined.

a) EQUALLY LIKELY OUTCOMES: If there are N possible outcomes in S and it is determined that the outcomes are "equally likely", then pi = 1/N for all i. This is easily implemented as long as the number of possible outcomes in S is known and results in an association of probability with likelihood. It is also possible that some experiments that do not fit the above criteria can be described in such a way as to use the equally likely method of assignment. For example, some infinite sample spaces can be viewed in such a way, although care must be used in doing so and it is possible that such fitted assignments are not unique. In the case of equally likely outcomes, P(A) = NA/N, where NA is, as before, the number of outcomes in the event A.

EXAMPLES OF EQUALLY LIKELY PROBABILITY ASSIGNMENTS:
Experiment 1: Probability of 1/8 is assigned to each possible outcome.
Event A: the outcomes that result in at least 2 heads.
P(A) = NA/N = 4/8 = 1/2

Experiment 2: Probability of 1/36 is assigned to each possible outcome.
Event B: the outcomes where the sum of spots is 8.
P(B) = NB/N = 5/36
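The equally likely assignment lends itself to direct computation by counting. The following minimal Python sketch (an illustration, not part of the text) enumerates the sample space of Experiment 2 and counts the outcomes in event B to reproduce P(B) = NB/N = 5/36.

```python
from itertools import product
from fractions import Fraction

# Sample space for Experiment 2: a black die and a red die are rolled.
S = list(product(range(1, 7), repeat=2))   # 36 equally likely outcomes

# Event B: outcomes where the sum of the spots is 8.
B = [outcome for outcome in S if sum(outcome) == 8]

# Equally likely assignment: P(B) = N_B / N.
P_B = Fraction(len(B), len(S))
print(P_B)   # 5/36
```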


Experiment 4: In this experiment, the 9 possible outcomes are not intuitively equally likely. However, the equally likely concept can be invoked to induce an intuitively appealing assignment of probabilities. Also, to stay within the finite sample space case, assume that the spinner stops only on the degree marks of the circle. Then, within the region labeled 1, there are 180 degrees and in each of the other two regions, there are 90 degrees. In this case, each of the 360 degree marks can intuitively be considered as an equally likely stopping place. In the case of one spin: P(1) = 180/360 = 1/2, P(2) = 90/360 = 1/4 and P(3) = 90/360 = 1/4. For the sample space of two spins: S = {(1,1),(1,2),(1,3),(2,1),(2,2),(2,3),(3,1),(3,2),(3,3)} and the possible outcomes are not intuitively equally likely. However, by again assuming that the degree marks are all equally likely and using a simple argument on lengths of arc, it can be seen that the equally likely assumption induces the following assignment of probabilities for the points in S:

P(1,1) = 1/4,   P(1,2) = P(2,1) = P(1,3) = P(3,1) = 1/8,
P(2,3) = P(3,2) = P(2,2) = P(3,3) = 1/16

For this experiment, an easier method of assignment for the sample space of two spins will be available as more theory is developed. In any case, the question of assigned probabilities versus induced probabilities is not always obvious. As a general rule, it seems more straightforward to assign probabilities at the most basic or primary level of the experiment, at least for finite sample spaces. In other cases, as will be shown, it may be easier to assign probabilities at a more developed stage of the problem.

b) FREQUENCY OF OCCURRENCE OF OUTCOMES:
If the experiment can be thought of as a random experiment that is repeatable, then the probability of an event may be thought of as the relative frequency of occurrence of the outcomes in the event. The rolling of dice experiment can obviously be thought of as repeatable, and the probability of event B above, the outcomes where the sum is 8, is the relative frequency of the sum of 8 in a large number of rolls of the dice. In addition, the relative frequency assignment satisfies the three axioms and results in a consistent model for probability.

Note that in the equally likely assignment of probability, a drawback to the method of assignment is the necessity that there be a finite number of possible outcomes and that they all be equally likely. In the case of the relative frequency assignment, a drawback is the necessity that the experiment be assumed to be repeatable and repeatable under essentially the same conditions. The relative frequency method is more often used to verify a particular assignment of probability or to check its reasonableness rather than be used as an assignment method itself.

The following example indicates some of the ambiguity that might arise using the relative frequency method of probability assignment. Note, however, that differences that result between using the relative frequency method of assignment and using the equally likely method, if both are applicable, are most often quite small. Some sources recommend that NA/N, where NA is the number of occurrences of event A and N is the total number of occurrences, be used as the assignment as N goes to infinity. The difficulties induced by questions of an optimal number of occurrences to observe before using the assignment raise questions as to the practicality of this recommendation.

EXAMPLES OF RELATIVE FREQUENCY PROBABILITY ASSIGNMENTS:

Experiment 2: In 1000 rolls of the two dice, the event B, sum = 8, was observed 137 times. On this basis P(B) could be chosen to be .137, which is close to the equally likely assignment of 5/36 ≈ .139. It is possible that for these dice P(B) is .137, or that if more rolls were observed, the relative frequency of event B would be closer to 5/36. If P(B) = .137 is used as the assignment, P(Bc) = .863 and, of course, it is true that B-complement was observed on 863 of the 1000 rolls. In any case, assignments on the basis of relative frequency are consistent with the theory of probability.
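The relative frequency idea can also be illustrated by simulation. The sketch below is not part of the text; it assumes a pseudo-random number generator stands in for repeated physical rolls and compares the observed relative frequency of a sum of 8 with the equally likely value 5/36.

```python
import random

random.seed(1)          # fixed seed so the run is reproducible
N = 100_000             # number of simulated rolls of the two dice

count_sum8 = 0
for _ in range(N):
    black, red = random.randint(1, 6), random.randint(1, 6)
    if black + red == 8:
        count_sum8 += 1

# Relative frequency estimate of P(B) versus the equally likely value 5/36.
print(count_sum8 / N, 5 / 36)
```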

Experiment 7: In this experiment it is obvious that the 8 outcomes are not equally likely. If there is a past history of samples of size 3 for the testing of this component, then the relative frequency method of assignment could be used in a straightforward manner. If only data for individual components is available, the relative frequency assignment can still be used, but more theory of probability is needed to do so. This example will be treated again in a later section of this chapter.

c) SUBJECTIVE PROBABILITY ASSIGNMENTS:
Not all experiments are repeatable and not all result in an equally likely set of possible outcomes. In fact, there are many instances in practice where neither of these conditions is true and it is still important to attach a value of likelihood to an event. For example, a space vehicle with a very specific mission is to be launched and a value for the likelihood that this specific mission will succeed is required.

In this situation, probabilities can be assigned subjectively and can be taken to denote the degree of belief in the occurrence of the event. If probabilities are assigned subjectively, it allows one to incorporate one's experience with similar experiments or one's engineering judgment into the assignment. Other types of information can be included as well, including information that would result in a relative frequency assignment or an equally likely assignment. The requirements of a subjective probability assignment are that the probabilities be assigned in a coherent and consistent manner, so that the assignment is consistent in preferences and does not contradict the three axioms of probability.

There is some concern that this method of assignment will result in probabilities with which reasonable people will fail to agree. This implies that using the other methods reasonable people will agree with the assignment and that the other methods do not include any subjectivity. However, it is the experience of many that subjective elements are present in all methods of assignment. Thus all methods should include criteria for reasonableness of judgment. Such criteria for consensus and consistency of agreement have been suggested for the subjective method of assignment of probability. An interested reader is referred to Savage (1954) or Lindley (1969).

It seems important that the theory of probability include as applications the many cases of interest where the outcomes of an experiment are neither equally likely nor repeatable. It is still useful that the guidance provided by probability theory be available. In many cases, incorporation of engineering judgment, for example, into an experiment is extremely valuable, and the subjective definition of probability allows the incorporation of this kind of information. In fact, many feel that it is neglectful to ignore it.

The subjective assignment of probability allows one to examine and manipulate probabilistically one's degree of belief in an outcome and to examine its effect on the degrees of belief in more complex events. In addition, the use of the subjective assignment widens the range of applicability of probability theory.

EXAMPLE OF SUBJECTIVE PROBABILITY ASSIGNMENT:
Suppose it is required to state the likelihood of success of the next space shuttle flight. In this case, success is defined as the lack of a catastrophic failure.


This event will only occur once and yet it is important to indicate the chance that it will succeed. There have been only 26 shuttle flights so far and there would be consensus that this number is too small to use the relative frequency assignment as a subjective probability. There would also be consensus that the probability of success is greater than the 1/26 = .038 that would be used if the assignment were done this way.

Prior to the Challenger accident in January, 1986, NASA estimated that the chance of such a catastrophic solid rocket booster failure was roughly 1 in 100,000 or .00001. This estimate was produced for the Department of Energy for use in a risk analysis. After the Challenger disaster, Richard Feynman did a subjective analysis by adjusting (for improved technology) the estimate obtained by using data from 2900 solid rocket booster flights across all military and NASA programs. He proposed an adjusted estimate of 1 failure per 50-100 launchings, or a .02 to .01 chance of a solid rocket booster failure.

It is not easy to attach an exact value to the probability of success of such an event, but to attempt to do so is thought by many analysts to be worth the effort. The analysis in the short description here of the assessment attempts contains information, not only of the probability values but also of the difficulty and range of the assessments. The discrepancy among informed experts in the assignment of probabilities gives additional information about the precision of an assignment. This information is useful in subsequent analyses. This topic will be taken up again in a later section on Bayes analysis.

2.1.5 Some theorems of probability

Using only the three axioms, a number of useful properties of probabilities can be proved. Note that the events A and Ac are mutually exclusive and that A ∪ Ac = S. The law of complement follows:

LAW OF COMPLEMENT:

P (Ac) = 1− P (A) (2.1)

A second property which follows from the axioms is called the general addition law. It states that:


GENERAL ADDITION LAW:

P (A ∪ B) = P (A) + P (B)− P (A ∩ B) (2.2)

Note that, if A and B are mutually exclusive, A ∩ B = φ, and (2.2) becomes:

P (A ∪ B) = P (A) + P (B) (2.3)

Also, notice that the probabilities of more complicated events are often more easily computed using some of these developed properties of probability theory. For example, to illustrate the use of (2.1) and (2.2), consider the following examples.

EXAMPLES OF USE OF THEOREMS:

Experiment 2: Event C: the outcomes where both dice exhibit even numbers, P(C) = NC/N = 9/36 = 1/4. Recall Event B: outcomes where the sum is 8.

P(B ∪ C) = P(B) + P(C) − P(B ∩ C) = 5/36 + 9/36 − 3/36 = 11/36

In this case, P(B ∪ C) could have been determined as easily by counting.

Experiment 4: Event D: both spins result in the same value.
P(D) = P(1,1) + P(2,2) + P(3,3) = 1/4 + 1/16 + 1/16 = 6/16 = 3/8.

Event Dc: both spins result in different values, P(Dc) = 1 − P(D) = 5/8.

2.1.6 Conditional probability and independent events

Most probabilities are functions of two events. The first is the event being considered and the second, the conditioning event, describes the conditions under which the first is being considered. In many applications the conditioning event is ignored or averaged over, resulting in what are denoted as marginal probabilities. However, it is important to consider conditioning events and to present probability rules relating to them. Conditional probabilities (probabilities of conditional events) follow the same three axioms. The resulting theory is based on a subspace of the sample space which is induced by the restrictions imposed by the condition.


Suppose that events A1 and A2 are among the subsets of S and suppose that interest lies in the probability of A1. Now suppose also that there is information that A2 has occurred. Then the interest is in the probability of event A1, given that event A2 has occurred. This probability is written P(A1|A2). The information that A2 has occurred is a restriction of S, namely a restriction to a consideration of only the possible outcomes contained in the event A2. See Figure 2.4. The shaded area consists of the possible outcomes of which the set (A1|A2) is comprised. In a sense, the event A2 is considered a new sample space on the basis of the information that A2 has occurred. This also indicates that the probabilities assigned to the possible outcomes within event A2 (the new restricted sample space) must be adjusted to sum to one. Then the conditional probability P(A1|A2) is the sum of these adjusted probabilities assigned to the possible outcomes in the set (A1|A2), that is, the shaded area of Figure 2.4. This can also be achieved by multiplying by 1/P(A2), so that:

P(A1|A2) = P(A1 ∩ A2) / P(A2)     (2.4)

Figure 2.4: Venn diagram

EXAMPLES OF CONDITIONAL PROBABILITY:
Experiment 2: S = {(1,1),...,(6,6)}. See Figure 2.1.

a) Suppose an observer notes that the black die is a 4. What is the probability that the sum of spots is 6? The information restricts the new sample space to the 6 possible outcomes {(4,1),(4,2),(4,3),(4,4),(4,5),(4,6)}. The probability 1/6 is assigned to each outcome in the restricted space, whereas the other 30 outcomes in S have probability zero. Also, P(Sum = 6|Black = 4) = 1/6.

b) Using (2.4) to adjust the probabilities assigned to the outcomes of the original S,

P(Sum = 6|Black = 4) = P(Sum = 6 ∩ Black = 4) / P(Black = 4) = (1/36) / (6/36) = 1/6.

Note that knowledge of the outcome of the black die changes the probability that the sum is 6. Without knowledge of the outcome of the black die, P(Sum = 6) = 5/36, and with the knowledge that the black die is 4, P(Sum = 6|Black = 4) = 6/36. Thus the information that the black die is a 4 slightly increases the probability that the sum is a 6. The event (Sum = 6) is said to be dependent on the event (Black = 4).

An extreme example of dependency of two events is the relationship between the event (Sum = 6) and the event (Black = 6). In this case, knowledge that the black die has the outcome 6 specifies that the sum cannot be 6; that is, if the black die is a 6, the event (Sum = 6) cannot occur. Thus mutually exclusive, non-empty events are dependent because the occurrence of one prohibits the occurrence of the other. Next, consider the relationship between the events (Black = 4) and (Sum = 7).

P(Sum = 7|Black = 4) = P(Sum = 7 ∩ Black = 4) / P(Black = 4) = 1/6 = P(Sum = 7)

The occurrence of the event (Black = 4) does not change the probability that the sum is 7. That is, the probability of the event (Sum = 7) in the restricted sample space is the same as the probability of that event in S. When this occurs, it is said that the two events are independent.

Definition: Two events, A1 and A2, are independent, if, and only if,

P (A1|A2) = P (A1) (2.5)

Note that (2.5) is equivalent to the definition of independence:

P(A1 ∩ A2) = P(A1) · P(A2)     (2.6)


Note also, from (2.4), that in general, there is a multiplication rule

P(A1 ∩ A2) = P(A1|A2) · P(A2)     (2.7)
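To make the computations in (2.4)-(2.7) concrete, the following illustrative Python sketch (not from the text) enumerates the dice sample space and checks numerically that the events (Black = 4) and (Sum = 7) are independent.

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))     # (black, red) outcomes, equally likely

def prob(event):
    """P(event) under the equally likely assignment: count outcomes / 36."""
    return Fraction(sum(1 for s in S if event(s)), len(S))

sum7 = lambda s: s[0] + s[1] == 7
black4 = lambda s: s[0] == 4

# Conditional probability via (2.4): P(A1|A2) = P(A1 ∩ A2) / P(A2).
cond = prob(lambda s: sum7(s) and black4(s)) / prob(black4)
print(cond, prob(sum7))                      # 1/6 and 1/6, so the events are independent
```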

Probability Trees: The additional theory of conditional and independent events allows a choice for the determination of a sample space for experiments. It also allows the use of an appealing graphical procedure for outlining a sample space called a probability tree. Consider experiment 4, where a spinner is spun twice. On the first spin, the outcome could be 1, 2 or 3, with probability 1/2, 1/4 and 1/4, respectively. See Figure 2.2 and Figure 2.4. On the second spin, the same outcomes are possible, with the same probabilities. If it is assumed that the two spins result in independent outcomes, then the probabilities of the 2-spin outcomes are the same as those determined earlier. The diagram in Figure 2.5 is called a probability tree.

Figure 2.5: Tree diagram
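As a rough illustration of reading probabilities off the tree under the independence assumption, the sketch below (not part of the text; the one-spin probabilities are those of the spinner example) multiplies branch probabilities to recover the two-spin assignment found earlier.

```python
from fractions import Fraction

# One-spin probabilities for the spinner of Experiment 4.
spin = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 4)}

# Independence: multiply the probabilities along each branch of the tree.
two_spins = {(a, b): pa * pb for a, pa in spin.items() for b, pb in spin.items()}

print(two_spins[(1, 1)], two_spins[(1, 2)], two_spins[(2, 2)])   # 1/4 1/8 1/16
```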


2.1.7 Rule of total probability and Bayes rule

Let A1, A2, . . . , Ak form a partition of S, that is, the A's are mutually exclusive and their union is S. For example, A and Ac form a partition of S. Then if B is any event, that is, B is a set in S, then:

B = (A1 ∩ B) ∪ (A2 ∩B) ∪ · · · ∪ (Ak ∩B) (2.8)

Since the A's are mutually exclusive, the sets (Ai ∩ B) are all mutually exclusive and (2.3) gives the rule of total probability:

P (B) = P (A1 ∩ B) + P (A2 ∩ B) + · · ·+ P (Ak ∩ B) (2.9)

For example, for any events A and B in S, B = (A ∩ B) ∪ (Ac ∩ B). See Figure 2.6. Then, P(B) = P(A ∩ B) + P(Ac ∩ B).

Figure 2.6: Venn diagram

This sets the stage for a probabilistic relationship that has potential for being extremely useful in applications of probability and thus reliability. This relationship, called Bayes Rule, is now seen to be a simple extension of the rule of conditional probability. To be first to discover an obvious relationship, however, is still quite an achievement. Bayes Rule was first published by Rev. Thomas Bayes in 1764. It follows from the multiplication rule:

P (A ∩ B) = P (A|B)P (B) = P (B|A)P (A).

Using the right-hand sides of the equation and the rule of total probability, Bayes Rule is:

P(A|B) = P(B|A) · P(A) / P(B)
       = P(B|A) · P(A) / [P(A1) P(B|A1) + P(A2) P(B|A2) + · · · + P(Ak) P(B|Ak)],     (2.10)

where A1, A2, . . . , Ak is a partition of S.

Now, examine the potential of Bayes Rule. Suppose P(B|A) and P(A) are known, where A is one of the Ai in the partition. Also, suppose that the event B is observed to occur. In this case, the question is asked: "Does the occurrence of B influence the probability of occurrence of A? And, if it does, how?" Bayes Rule tells how. One view of statistics is as a means of using observation to adjust one's beliefs or "prior probability". Bayes Rule indicates how to make that adjustment under certain restrictions.

Consider the case where an experimenter begins an analysis with a prior probabilistic belief about the value of a parameter. The prior belief usually comes from information, in the case of engineering studies, that is based on engineering judgment of past experience. Bayes Rule is a way of probabilistically adjusting that prior belief using some observed test results. The difficulty in applying Bayes Rule often comes from the difficulty in getting the prior belief into a proper form.

In the form of Bayes Rule of equation (2.10), the P(A) is called the prior probability and it represents the prior belief about the event A. The P(A|B) is called the posterior probability and it represents the probability of event A after observing that the event B has occurred.

EXAMPLE OF THE USE OF BAYES RULE:
A new component is being developed which is essentially the same as a previously produced component. However, improved materials and processes have indicated that the new component may exhibit improved performance over the prototype. Tests with the prototype led engineers to believe that its production results in components of 3 types. The first type T1 rarely fails (assume the probability of failure to be .00001); the second type T2 fails 1% of the time; and the third type T3 fails 10% of the time. In past experience, the production results in approximately equal numbers of the 3 types. A random group of 10 of the new components is tested with no failures. Bayes Rule can be used to determine the new frequency of the 3 types. Let event N represent non-defectiveness (not a failure):

P(T1) = 0.333        P(T2) = 0.333        P(T3) = 0.333
P(N|T1) = 0.99999    P(N|T2) = 0.99       P(N|T3) = 0.9
P(10N|T1) = 0.9999   P(10N|T2) = 0.904    P(10N|T3) = 0.349

Using the rule of total probability, P(10N) = 0.75. Then
P(T1|10N) = 0.44, P(T2|10N) = 0.40, P(T3|10N) = 0.16

Thus, the probability of T1 has increased from 0.333 to 0.44, the probability of T2 has increased from 0.333 to 0.40 and the probability of T3 has decreased from 0.333 to 0.16.
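The arithmetic of this example is easy to mechanize. The following sketch is an illustration, not the author's code; it applies the rule of total probability and Bayes Rule (2.10) to the stated prior and the 10-test likelihoods, and the small difference in the last posterior comes from rounding in the text.

```python
# Prior beliefs about the three component types, from the example above.
prior = {"T1": 1/3, "T2": 1/3, "T3": 1/3}
p_pass = {"T1": 0.99999, "T2": 0.99, "T3": 0.90}       # P(N | type)

# Likelihood of 10 non-defectives in a row, assuming independent tests.
likelihood = {t: p_pass[t] ** 10 for t in prior}

# Rule of total probability for P(10N), then Bayes Rule (2.10) for each posterior.
p_10N = sum(prior[t] * likelihood[t] for t in prior)
posterior = {t: prior[t] * likelihood[t] / p_10N for t in prior}

print(round(p_10N, 2))                                  # 0.75
print({t: round(p, 2) for t, p in posterior.items()})   # {'T1': 0.44, 'T2': 0.4, 'T3': 0.15}
```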

2.1.8 Random variables and probability distributions

In many applications of probability theory, statistics and reliability especially, it is important that a numerical value be allocated to an outcome from an experiment. Notice that in the experiments described earlier, this was not necessarily the case. In fact, experiment 1 is such an experiment with sample space:

S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}

In such a case, it can be of interest to assign a real number to each outcome of S. A natural association of outcomes from experiment 1 to real numbers is the number of heads in each outcome. Then the outcome HHH is associated with the number 3; HHT, HTH and THH are associated with 2; and so on, where each outcome is associated with one unique real number. If X = number of heads, then X is a function of the outcomes that associates each outcome with a unique real number. In this case, X is called a random variable.

Definition: A random variable is a rule or function that associates each outcome of the sample space with a real number.


Notice that for a discrete sample space, values of a random variable can be considered events because each value of a random variable is induced by a subset of outcomes from S. Thus it is natural, in this case, to assign a probability to that value of the random variable in the same way that probabilities were assigned to events. That is, the probability that a random variable has a particular value is the sum of the probabilities that were assigned to the outcomes which induce that particular value of the random variable.

EXAMPLE OF A RANDOM VARIABLE: As in Experiment 1, a coin is tossed 3 times and each of the 8 possible outcomes in S is assigned probability 1/8. Let X = number of H's:

P(X = 0) = P(TTT) = 1/8
P(X = 1) = P(HTT) + P(THT) + P(TTH) = 3/8
P(X = 2) = P(HHT) + P(HTH) + P(THH) = 3/8
P(X = 3) = P(HHH) = 1/8

The association of each value of a random variable with the probability of that value occurring is called the probability distribution of that random variable.

Figure 2.7: Outcomes: Sum of Spots on a Toss of a Single Pair of Dice

EXAMPLES OF DISCRETE PROBABILITY DISTRIBUTIONS:
a) In the above example of random variable X, the distribution of X is given.


b) Experiment 2: Let Y = sum of the spots showing on the dice. The probability distribution of Y is illustrated in Figure 2.7.

Usually, random variables are denoted by capital letters taken from near the end of the alphabet. Since a random variable is a function or rule, it takes on specific values at times and these particular values are usually denoted by a lower case of the same letter as the random variable. Often of interest is the interval of values when X ≤ x. The probability that X ≤ x, P(X ≤ x), is usually expressed as a function of x:

FX(x) = P (X ≤ x).

The function FX(x) is called the cumulative distribution function (cdf) of the random variable X. If it is entirely clear that the distribution function F is the cdf of a particular random variable X, the subscript of F will be deleted. Cumulative distribution functions have the properties:

a) lim x→∞ FX(x) = 1, lim x→−∞ FX(x) = 0;

b) FX(x) is non-decreasing;

c) P(a < X ≤ b) = FX(b) − FX(a).

The cdf may be used to classify random variables into discrete or continuous types. A random variable X is said to be discrete (continuous) if its cdf FX(x) is a discrete (continuous) function. If X is a discrete random variable, a number pX(xi) = P(X = xi) is associated with the value xi of the random variable. The numbers pX(xi) satisfy:

a) pX(xi) ≥ 0, for all i;
b) Σ_{all i} pX(xi) = 1.

Also, FX(x) = Σ_{xi ≤ x} pX(xi).

EXAMPLE OF A DISCRETE CUMULATIVE DISTRIBUTION FUNCTION: The cdf of the random variable X = number of heads in experiment 1: 3 tosses of a coin.

FX(x) = 0      for x < 0
        1/8    for 0 ≤ x < 1
        4/8    for 1 ≤ x < 2
        7/8    for 2 ≤ x < 3
        1      for 3 ≤ x
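For a discrete random variable, the pmf and cdf can be built directly from the outcome probabilities. The sketch below is illustrative only and does this for X = number of heads in Experiment 1.

```python
from fractions import Fraction
from itertools import product

# Sample space of Experiment 1: three tosses of a fair coin, each outcome has probability 1/8.
S = list(product("HT", repeat=3))

# pmf of X = number of heads, obtained by summing outcome probabilities.
pmf = {k: Fraction(sum(1 for s in S if s.count("H") == k), len(S)) for k in range(4)}

# cdf: F(x) = P(X <= x).
def F(x):
    return sum((p for k, p in pmf.items() if k <= x), Fraction(0))

print({k: str(p) for k, p in pmf.items()})   # {0: '1/8', 1: '3/8', 2: '3/8', 3: '1/8'}
print(F(0), F(1), F(2), F(3))                # 1/8 1/2 7/8 1
```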


Since a continuous random variable has a non-countable infinity of possible values, a positive probability cannot be assigned to each possible value. Thus, the probability that a continuous random variable will assume any particular one of its values is zero. Instead, a continuous random variable has a density function, a non-negative function defined for all real values x, which represents a continuous distribution of mass over the values of the random variable; the mass over any interval of the random variable is numerically equal to the probability that the random variable assumes a value in that interval. The density function f(x) is a function with the property that this mass is given by the integral of f(x) over the interval. The integral represents the area under the graph of f(x) between the end points of the interval. Thus,

P(a < X < b) = ∫_a^b f(x) dx.

Since probabilities are non-negative, the density function must be non-negative and

∫_{−∞}^{∞} f(x) dx = P(−∞ < X < ∞) = 1

Also, since F(x) is continuous, F′(x) exists and F′(x) = f(x).

EXAMPLE OF A CONTINUOUS PROBABILITY DISTRIBUTION: The density of a continuous random variable X is given by:

f(x) = 0        for x < 0
       4 − 8x   for 0 < x < 0.5
       0        for x > 0.5

The probability that X is between 0 and 0.1 is given by:

P(X < 0.1) = ∫_0^0.1 (4 − 8x) dx = (4x − 8x²/2) |_0^0.1 = 0.36

Also, since

F(x) = ∫_0^x (4 − 8y) dy = 4x − 4x²,   0 < x < 0.5

P(0.2 < X < 0.4) = F(0.4) − F(0.2) = 0.96 − 0.64 = 0.32


Figure 2.8: Triangular Density Function

P(0.2 < X < 0.4 | X > 0.2) = 0.32 / (1 − F(0.2)) = 0.32 / 0.36 = 0.89
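Because the cdf F(x) = 4x − 4x² is available in closed form, these probabilities are easy to check numerically. The following sketch (not from the text) evaluates the same three quantities.

```python
# Closed-form cdf of the triangular density f(x) = 4 - 8x on (0, 0.5), from the example.
def F(x):
    if x <= 0:
        return 0.0
    if x >= 0.5:
        return 1.0
    return 4 * x - 4 * x ** 2

p_interval = F(0.4) - F(0.2)                 # P(0.2 < X < 0.4)
p_conditional = p_interval / (1 - F(0.2))    # P(0.2 < X < 0.4 | X > 0.2)

print(round(F(0.1), 2), round(p_interval, 2), round(p_conditional, 2))  # 0.36 0.32 0.89
```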

Thus far in the review of probability, the structure underlying a probability distribution has been developed from the definitions of experiment, sample space and random variable. Although this structure exists and is sometimes important to consider, it is more often used as a background support that is not detailed. Most of the time, the probability distribution of a random variable is assumed to be of a certain known type, based on past experience or judgment.

Often, a random variable is a mechanism that arises as a result of a process with certain characteristics, and these characteristics of the process give rise to the probability distribution of the random variable. In the case of experiments which result in continuous sample spaces, it can often be assumed that the outcomes themselves are values of a continuous random variable and a distribution on that random variable can be specified without the intervening complex structure.


EXAMPLES OF PROBABILITY DISTRIBUTION SPECIFICATIONS:

a) Consider the case in which an extremely large lot (so large that we can consider it to be infinite) is tested until the first defective is found. If the lot has 2% defectives and, once again, letting X represent the number of trials until the first defective is found, we have P(X = x) = 0.98^(x−1) (0.02). This means that the first (x − 1) trials have non-defectives (each with probability 0.98) and then we follow these trials with the first defective (with probability 0.02). The distribution of the random variable X for this case is called the geometric distribution.

In general, the probability that the kth success occurs on the trial numbered x is given by

(x−1 choose k−1) p^(k−1) q^(x−k) · p = (x−1 choose k−1) p^k q^(x−k).

To obtain this result, we have applied the binomial distribution to obtain (k − 1) successes anywhere among the first (x − 1) trials, followed by a success. This distribution is known as the negative binomial or Pascal distribution. If p = 2% = 0.02, the probability that the 4th defective is found on trial 9 is (8 choose 3) 0.02^4 0.98^5 = 8.1 × 10^(−6). The geometric distribution is a special case of the negative binomial distribution with k = 1.
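A short sketch (illustrative only, using Python's math.comb) computes negative binomial probabilities from the formula above; the geometric case is recovered with k = 1.

```python
from math import comb

def neg_binomial_pmf(x, k, p):
    """P(kth success occurs on trial x): C(x-1, k-1) * p**k * q**(x-k)."""
    q = 1 - p
    return comb(x - 1, k - 1) * p ** k * q ** (x - k)

# Geometric distribution is the special case k = 1.
print(neg_binomial_pmf(3, 1, 0.02))   # P(first defective on trial 3) = 0.98**2 * 0.02
print(neg_binomial_pmf(9, 4, 0.02))   # 4th defective on trial 9, about 8.1e-06
```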

b) A random number generator generates a 10-digit number X between 0 and 1. Every value is equally likely to occur. X is assumed to be a continuous random variable. The density of X is then

f(x) = 1   for 0 < x < 1
       0   otherwise.

Hence we may say that random numbers are uniform on the unit interval. The continuous uniform distribution is given by

f(x) = 1/(b − a)   for a < x < b
       0           otherwise.

For the special case of the random variable X representing a random number as described above, a = 0 and b = 1 and

f(x) = 1   for 0 < x < 1
       0   otherwise.


Figure 2.9 illustrates the distribution of random numbers or, equivalently, the distribution of a random variable which is uniform on the unit interval.

Figure 2.9: Distribution of a Uniform Random Variable on the Unit Interval

2.2 Some useful discrete distributions

2.2.1 Binomial distribution

The binomial arises from the Bernoulli trials situation in which there are n independent trials with two outcomes on each trial (arbitrarily called success and failure) and the probability of either outcome is constant from trial to trial. Suppose that p is the probability of success and that q = 1 − p is the probability of failure. Let X = the number of successes in n Bernoulli trials. Then

P(X = x) = (n choose x) p^x q^(n−x),   x = 0, 1, 2, . . . , n,

where (n choose x) = n! / (x!(n − x)!) and n! = n(n − 1)(n − 2) · · · (2)(1).

For example, consider 10 tosses of a pair of fair dice. We wish to know the probability of exactly 3 "7's" in 10 tosses. Although we have more than two outcomes here, we may arbitrarily say that the other outcome for use with the binomial distribution is "a non-7". Hence p = 6/36 = 1/6 and q = 5/6, then

P(X = 3) = (10 choose 3) (1/6)^3 (5/6)^7 = 120 (1/216)(78125/279936) = 0.155


In reliability, p would usually represent the probability of a single unit, subsystem or system surviving t time units.

Example: 8 devices having the exponential distribution with parameter θ = 1000 cycles are placed on test for 500 cycles. What is the probability a) that exactly 4 survive? and b) at least 5 survive?

a) p = P(survive 500 cycles) = R(500) = e^(−500/1000) = e^(−1/2) = 0.6065
q = P(fail within 500 cycles) = F(500) = 1 − e^(−500/1000) = 0.3935; thus,
P(exactly 4 of 8 survive 500 cycles) = P(X = 4) = (8 choose 4) (0.6065)^4 (0.3935)^4 = 70(0.1353)(0.0240) = 0.2271

b) P(at least 5 of 8 survive 500 cycles) =
P(X ≥ 5) = Σ_{k=5}^{8} (8 choose k) (0.6065)^k (0.3935)^(8−k)
         = 56(0.0821)(0.0609) + 28(0.0498)(0.1548) + 8(0.0302)(0.3935) + 1(0.0183)(1)
         = 0.2801 + 0.2159 + 0.0959 + 0.0183 = 0.6102

Cumulative values of the binomial distribution are found in most statistics texts and in several computer software packages. That is, values of B(x, n, p) = Σ_{k=0}^{x} (n choose k) p^k q^(n−k), x = 0, 1, 2, . . . , n, are found. To obtain individual or marginal probabilities, simply subtract two consecutive cumulative values. That is, P(X = x) = B(x, n, p) − B(x − 1, n, p). The mean of the binomial variate is np and the variance is np(1 − p).
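The reliability example above can be reproduced with a few lines of code. The sketch below is not the author's; it computes the binomial terms exactly rather than from rounded tables, giving 0.2271 for part a) and about 0.6092 for part b) versus the text's 0.6102 obtained from rounded intermediate values.

```python
from math import comb, exp

# Survival probability of one device over 500 cycles, exponential with theta = 1000.
p = exp(-500 / 1000)          # about 0.6065
q = 1 - p

def binom_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

p_exactly_4 = binom_pmf(4, 8, p)
p_at_least_5 = sum(binom_pmf(k, 8, p) for k in range(5, 9))

print(round(p_exactly_4, 4), round(p_at_least_5, 4))   # about 0.2271 and 0.6092
```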

2.2.2 Poisson Distribution

A random variable X with a Poisson distribution takes the values x = 0, 1, 2, . . . with probability mass function

P(X = x) = e^(−µ) µ^x / x!

where µ is the parameter of the distribution.

We note that, compared to the binomial, there is no n (number of trials) and that the random variable, x, can assume all possible integer values. Again, the random variable, x, counts successes. However, the number of trials is infinite and hence the number of successes is unlimited. The parameter, µ, is the mean number of successes. Thus, it is necessary to know, estimate or hypothesize the mean in order to obtain probabilities of x successes. In practice, the number of occurrences of any event must be finite, but the Poisson works well even though only the first few values of x have probabilities significantly different from zero. For example, there are many situations where random arrivals are of interest. The number of automobiles arriving at a particular point on a highway or at a particular intersection, "buzz-bombs" falling on London during World War II and semi-finished product arriving on a conveyor belt to the next stage of manufacturing have all been tracked successfully using the Poisson distribution. The distribution of flaws in materials or goods of a fixed size or area tends to be Poisson.

Example: Suppose that the number of violent storms arriving on the gulf coast of the United States is a Poisson random variable with a mean of 4.2 per year. What is the probability that in a given year there will be
a) no violent storms?
b) exactly 3?
c) four or less?

SOLUTION:
a) P(X = 0) = e^(−4.2) 4.2^0 / 0! = e^(−4.2) = 0.015
b) P(X = 3) = e^(−4.2) 4.2^3 / 3! = 0.015(74.088)/6 = 0.185
c) P(X ≤ 4) = Σ_{k=0}^{4} e^(−4.2) 4.2^k / k! = 0.015 + 0.063 + 0.1323 + 0.185 + 0.194 = 0.590

In computing we have used the recursive relationship P(X = x) = P(x, µ) = P(x − 1, µ) µ/x. Cumulative tables and computer software for the Poisson are also widely available.

In reliability, the Poisson has many uses. One is illustrated here: Suppose that we have a single unit operating with 4 identical units in standby. Suppose further that each unit has the exponential distribution with parameter λ = 0.001 failures per cycle (θ = MTBF = 1/λ = 1000 cycles). We seek the reliability of this system for a mission of 2000 cycles. Assuming that the switch used to turn on the next standby unit when the previous unit fails is completely reliable, we solve this problem using the Poisson. The system is reliable as long as we have 4 or fewer failures. This is equivalent, with identical units, to allowing one unit to fail 4 or fewer times in 2000 cycles. Furthermore, we take advantage of the relationship between the Poisson and the exponential distributions. The Poisson counts random occurrences and the exponential measures time between occurrences. The rate parameter is the same for both, and the Poisson mean µ represents, in this case, the expected number of failures in 2000 cycles, which is 2. Thus,

P(X failures | failure mean µ = 2 per 2000 cycles) = e^(−µ) µ^x / x!

P(4 or fewer failures | µ = 2) = e^(−2) (1 + 2^1/1! + 2^2/2! + 2^3/3! + 2^4/4!) = 0.9473.
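Both Poisson examples are straightforward to verify. The following sketch is illustrative only; it evaluates the storm probabilities and the standby-system reliability P(4 or fewer failures | µ = 2).

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    return exp(-mu) * mu ** x / factorial(x)

# Storm example: mean of 4.2 violent storms per year.
print(round(poisson_pmf(0, 4.2), 3))                         # 0.015
print(round(poisson_pmf(3, 4.2), 3))                         # 0.185
print(round(sum(poisson_pmf(k, 4.2) for k in range(5)), 3))  # 0.590

# Standby system: mission reliability = P(4 or fewer failures) with mu = 2.
print(round(sum(poisson_pmf(k, 2.0) for k in range(5)), 4))  # 0.9473
```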

2.3 More about distributions

2.3.1 Multivariate, marginal and conditional distributions

There is sometimes interest in probability computations involving two or more random variables. The term multivariate probability distribution refers to a joint probability distribution of r random variables. Since the details of distributions of r random variables are exactly the same as for distributions of 2 random variables, only probability distributions of 2 random variables, called joint distributions, will be defined here.

Definition: The joint cumulative probability distribution of random variables X and Y is

FX,Y (x, y) = P (X ≤ x, Y ≤ y), for all x, y.

The distribution of X, called the marginal distribution in this case, can be obtained from the joint distribution of X and Y by:

FX(x) = P (X ≤ x) = P (X ≤ x, Y <∞)

If X and Y are both discrete random variables, there is a joint probability mass function of X and Y denoted by p(x, y) = P(X = x, Y = y). If X and Y are both continuous random variables, there exists a joint probability density function f(x, y) defined for all real x and y such that

P(x1 < X < x2, y1 < Y < y2) = ∫_{y1}^{y2} ∫_{x1}^{x2} f(x, y) dx dy

Definition: Random variables X and Y are independent if, for all x and y, P(X < x, Y < y) = P(X < x) P(Y < y). In terms of the joint distribution function F of X and Y, X and Y are independent if F(x, y) = FX(x) FY(y). The marginal distribution can be regarded as a distribution obtained by summing across the joint distribution. If the joint distribution is viewed as a 3-dimensional plot with the up-dimension being the value of the probability or the density, then the marginal can be seen as the reflection of the joint distribution on one of the upright sides of the 3-dimensional cube that encloses the joint distribution. Also, the darkness of the shadow of the reflection indicates an increase of the height of the reflected shadow. This is meant as an intuitive representation only; the actual determination of the marginal distribution from the joint distribution is as was outlined earlier. It is often also of interest to find the distribution of one of two jointly distributed random variables, given a particular value of the other. That is, the distribution of X, given that Y = y, may be of interest and this distribution is called the conditional distribution of X given Y = y. If both X and Y are discrete random variables, this conditional distribution is denoted by

PX|Y(x | Y = y) = P(X = x | Y = y) = p(x, y) / p(y)

If both X and Y are continuous random variables, the density of the conditional distribution can be written in terms of the joint and marginal densities:

fX|Y(x|y) = fX,Y(x, y) / fY(y),   if fY(y) > 0

In terms of the geometrical description, the conditional distribution is the adjusted distribution on a slice through the joint distribution, where the adjusting is to make the probabilities sum or integrate to 1.

EXAMPLES OF JOINT, MARGINAL AND CONDITIONAL DISTRIBUTIONS:
a) Consider the situation where 2 samples of 5 units are taken from a large group of units of which 1% are defective and 99% are non-defective. Let X = number of samples of the 2 that contain all non-defective units and let Y = number of samples that contain exactly 1 defective unit and 4 non-defective units. It can be noticed that, although the samples are independent with respect to the number of defectives in each sample, the random variables X and Y are dependent. The values of the random variables are determined by the number of samples that contain a specified number of defective units, and so, since there are only 2 samples, X and Y are dependent. For example, if it is known that X = 2, it is clear that Y must be 0.


Further, P(Y = 0|X = 2) = 1. The joint probabilities are given in Table 2.1 and the marginal distributions are given by the marginals of the table. The computations in the table are based on P(5 non-def) = (0.99)^5 = 0.95 and P(4 non-def) = 5(0.99)^4(0.01) = 0.048.

                        Y
             0        1        2        Total
      0      0        0.0002   0.0023   0.0025
X     1      0.0038   0.0912   0        0.0950
      2      0.9025   0        0        0.9025
      Total  0.9063   0.0914   0.0023   1

Table 2.1: Joint Distribution of X and Y
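Table 2.1 can be reconstructed by treating the two samples as independent trials, each falling into one of three categories (no defectives, exactly one defective, anything else). The sketch below is not from the text; the trinomial formulation is an assumption consistent with the example, and it prints the rows of the joint distribution.

```python
from math import comb, factorial

# Category probabilities for one sample of 5 units when 1% of units are defective.
p_all_good = 0.99 ** 5                        # about 0.95
p_one_def = comb(5, 4) * 0.99 ** 4 * 0.01     # about 0.048
p_other = 1 - p_all_good - p_one_def

# Joint pmf of X (samples with no defectives) and Y (samples with exactly one
# defective) over 2 independent samples: a trinomial with 2 trials.
def joint(x, y):
    z = 2 - x - y                             # samples in the remaining category
    if z < 0:
        return 0.0
    ways = factorial(2) // (factorial(x) * factorial(y) * factorial(z))
    return ways * p_all_good ** x * p_one_def ** y * p_other ** z

for x in range(3):
    print([round(joint(x, y), 4) for y in range(3)])
```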

b) Consider the joint density

f(x, y) = 4(1 + xy)/5   for 0 < x, y < 1
          0             otherwise.

It follows that

f(x) = (4 + 2x)/5   for 0 < x < 1
       0            otherwise.

f(y) = (4 + 2y)/5   for 0 < y < 1
       0            otherwise.

X and Y are not independent. The dependence of X and Y holds even though f(y|x = 0.5) = (4 + 2y)/5 = f(y). Thus, it can occur that a conditionally adjusted slice through the joint density can result in the marginal projection, but for independence, it is necessary for the condition f(y|X = x) = f(y) to hold for all values of x. The contours of the joint density of X and Y are given in Figure 2.10.

2.3.2 Empirical distributions

a) Histogram: Let the range of values of a particular random variable X be partitioned into intervals of equal length. Also, let the probability that the values of X lie in any interval be interpreted as a relative frequency.


Figure 2.10: Contours of Joint Density Function

If a group of X values is now observed, then the observed relative frequencies in the groups can be used to represent the probabilities and an approximate distribution of X can be displayed visually. Such a display is called an empirical distribution or histogram and is often used to examine if the probability distribution of X is of a certain type.

To draw a histogram, let the horizontal axis represent the values of the random variable and draw the boundaries of the intervals, called classes. Let the vertical axis represent the relative frequencies. It is easiest to choose the classes of equal length so that the heights of the rectangle or bar over the classes are proportional to the relative frequencies.

The empirical distribution can show interesting features of the values that might otherwise be unnoticed. Also, from the display one can more easily notice the range of the values, the shape of the concentration of the values and whether that shape is symmetric or skewed, whether there are gaps in the values and whether there are outliers, that is, values that are markedly different from the others.

b) Stem and Leaf Plot: Another method of illustrating the empirical distribution, and which has the additional benefit of preserving the actual individual values (which are lost in the use of the histogram method), is the stem and leaf plot. The observed values of the random variable are considered to be of two parts, a stem, consisting of one or two of the leading digits, and a leaf, consisting of the remaining digits. The digits then are used to sort the values into groups of numerical order and at the same time are used to display the values and their frequency.

EXAMPLE OF A STEM AND LEAF PLOT: Table 2.2 presents 25 observed values of the time to failure of a particular component in hours:

41.3  32.7  65.4  53.4  27.3
29.8  21.3  52.6  35.6  26.5
75.1  31.2  57.7  20.2  45.8
46.9  55.2  39.8  28.9  24.3
22.8  36.5  44.8  33.4  21.7

Table 2.2: Time to Failure for 25 Components

In this case, the first digit can be chosen for the stem and the remaining two digits for the leaf. This choice results in 5 stems and the plot in Figure 2.11. The number of stems is chosen for viewing ease and there is some personal choice. In general, too few stems result in a lack of discrimination in the view and too many stems result in too much noise in the view. Most often the advice is given that an effective view is obtained when between 5 and 20 stems are used. If there are too few stems with a natural choice for stems and leaves, it is possible to increase the number of stems while keeping the same stem and leaf choice. For this, each stem is used twice, where leaves 0,1,2,3,4 are recorded on the first line of the stem and leaves 5,6,7,8,9 are on the second. Then the plot is said to have 2 lines per stem. This procedure can be extended and the next natural choice for extension is to 5 lines per stem. In the example of Figure 2.11, there may be some who believe that there are too few stems to get a good view of the empirical distribution. In this case, one would use 2 lines per stem and redo the figure.
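A stem and leaf plot of Table 2.2, with the tens digit as the stem, can be produced with a few lines of code. This sketch is illustrative only and prints one line per stem.

```python
from collections import defaultdict

# Time-to-failure data from Table 2.2 (hours).
times = [41.3, 32.7, 65.4, 53.4, 27.3, 29.8, 21.3, 52.6, 35.6, 26.5,
         75.1, 31.2, 57.7, 20.2, 45.8, 46.9, 55.2, 39.8, 28.9, 24.3,
         22.8, 36.5, 44.8, 33.4, 21.7]

# Stem = tens digit, leaf = remaining digits (units and tenths).
stems = defaultdict(list)
for t in sorted(times):
    stems[int(t // 10)].append(f"{t % 10:.1f}")

for stem in sorted(stems):
    print(f"{stem} | {' '.join(stems[stem])}")
```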

c) Box Plot: Another display of the values of a random variable that provides visual shape information about the relative frequency distribution is called a box plot. The box plot also provides clearer information about the location or center of the distribution, about the dispersion and skewness of the distribution and about the tails of the distribution. Box plots, because of the information about the tails, can be useful in determining whether an extreme value or values should be considered outliers from the distribution. An outlier would be a value that should not be considered as an observation from the distribution in question. The construction of box plots will be discussed in a later section when more of the necessary statistical tools have been defined.

d) Other Empirical Distribution Methods: There are a number of other important methods for examining empirical distributions, especially for the distributions which typically arise in reliability studies. Because several of these are so related to reliability distributions, their presentation will be postponed until more statistical and reliability tools have been defined.

Figure 2.11: Stem and leaf plot


2.3.3 Transformation of variables

In reliability engineering, it is often necessary to comprehend the transformation of a variable in order to understand the relationship between densities. If we are given a density function, say f(x), and we wish to know the density of some function of X, say Y = u(X), then we obtain this density, g(y), using a variable transformation. This is a straightforward procedure which is usually taught in any calculus sequence. For example, a transformation of two variables is made to develop the relationship between rectangular and polar coordinates. In our treatment of reliability, we will always transform only one variable. Shown below are two different means of getting the same result. Method 1 uses the Jacobian determinant and Method 2 uses the definition of the cumulative distribution function to obtain the density of the new variable. In the example shown, we are given the exponential density as f(t) and we wish to know the density of the new random variable, Y = 1/T. In the Jacobian method, we first solve for T in terms of Y, then substitute this expression for t into the density of T (the exponential) and multiply the result by the absolute value of the determinant of the single-element matrix, dt/dy. If the cumulative distribution function of the original variable can be easily obtained, Method 2 becomes quite simple.

Method 1. Given f(x), suppose that Y = u(X). Then X = w(Y). Let w be a monotone function. Then

g(y) = f[w(y)] |dx/dy|

Example:

f(t) = (1/θ) e^(−t/θ)   for t > 0
       0                otherwise.

Let Y = 1/T. Find g(y).

T = 1/y,   g(y) = (1/θ) e^(−1/(yθ)) | −1/y² | = (1/(y²θ)) e^(−1/(yθ)).

Method 2. Obtain the cdf of Y, G(y), and then differentiate to obtain g(y).

EXAMPLE

f(t) = (1/θ) e^(−t/θ)   for t > 0
       0                otherwise.


Let Y = 1/T. Find g(y).

G(y) = P(Y ≤ y) = P(T ≥ 1/y) = ∫_{1/y}^{∞} (1/θ) e^(−t/θ) dt = e^(−1/(yθ));

therefore, g(y) = d/dy [e^(−1/(yθ))] = (1/(y²θ)) e^(−1/(yθ)), as before.
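The derived density g(y) can be checked by simulation. The sketch below is illustrative only; θ = 2 is an arbitrary assumed value, and the code transforms exponential draws to Y = 1/T and compares an empirical density estimate near y = 1 with g(y).

```python
import random
from math import exp

random.seed(0)
theta = 2.0          # an assumed value for the exponential parameter
n = 200_000

# Simulate T ~ exponential(theta) and transform to Y = 1/T.
ys = [1.0 / random.expovariate(1.0 / theta) for _ in range(n)]

# Compare the empirical density of Y near y0 with g(y) = e^(-1/(y*theta)) / (y^2 * theta).
y0, h = 1.0, 0.05
empirical = sum(1 for y in ys if y0 - h < y < y0 + h) / (n * 2 * h)
derived = exp(-1.0 / (y0 * theta)) / (y0 ** 2 * theta)
print(round(empirical, 3), round(derived, 3))   # the two values should be close
```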

2.4 Elements of statistics

2.4.1 Introduction

Usually the result of an experiment or test is a set of observations, measurements or data. One view of statistics is that it is the science of making inferences about a population based on an analysis of sample data from that population. The process of taking a sample of data is important and basically one wishes that the sample be representative of the population. An important method of selecting a sample is random sampling and most statistical techniques are based on the assumption of a random sample. In this section, only a brief outline of the techniques that are used in statistics to examine the data once it is obtained will be presented. These techniques will include only computational techniques, as the graphical techniques will be presented elsewhere. The techniques will include the summarization of the data by computation of estimates that represent the characteristics of the population from which the data are a sample and the making of statistical inferences from the summarized data. Distributions that are useful in reliability are treated in Chapter 3, but some discussion of sampling distributions, as needed, will be provided in this chapter.

2.4.2 Moments and parametric estimation

The outline begins with the definition of the characteristics of the population (or its representative distribution) called moments or expected values and designated by the Greek letter µi or E(X^i). Characteristics of the population are called parameters and the moments represent certain of these characteristics, such as the center of gravity (first moment or mean), the dispersion (the second moment about the mean), the skewness (a function of the third moment) or the kurtosis (a function of the fourth moment). Also, any function of the data that does not depend on unknown parameters is called a statistic. The statistics that are usually used to estimate the moments are presented here with a short description of some of the properties of these estimates. First, the definitions of moments:

Definition: The ith moment of a distribution represented by a density f(x) is

µi = ∫ x^i f(x) dx = E(X^i).     (2.11)

The first moment, the center of gravity of the distribution or the measureof its central tendency, is usually denoted by µ, the population mean,and estimated by the sample arithmetic mean if the data are somewhatsymmetrically mound-shaped. The sample mean is denoted by:

X =1

n

n∑

i

xi, (2.12)

where n is the number of observations in the sample. The measure ofdispersion of the distribution is usually characterized by the second momentabout the mean or E(x2 − {E(x)}2 = σ2, called the population variance.The sample variance is

s2 =

∑ni (xi − x)2

n− 1(2.13)

and the square root of the variance is the standard deviation. Note that the denominator of the sample variance is n − 1. Both of these sample measures, X̄ and s², have the property of being unbiased, which is the property that the mean of the sample measure over all possible samples is the characteristic itself. It is the divisor n − 1 that gives this property to s². The sample skewness is measured by:
\[
b_1 = \frac{\sum_i (x_i - \bar{x})^3 / n}{\left[ \sum_i (x_i - \bar{x})^2 / n \right]^{3/2}}
\quad \text{or by} \quad
b_1 = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^3 \qquad (2.14)
\]

The distribution skewness, usually the value \(\sqrt{\beta_1} = \mu_3/\sigma^3\), indicates the direction and length of the tail of the distribution. A negative value of skewness indicates that the data tail off to the left, a value near zero indicates that the data tend to look symmetric, and a positive value indicates that the data tail off to the right. The sample kurtosis is measured by:
\[
b_2 = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)} + 3 \qquad (2.15)
\]

The distribution kurtosis, denoted by \(\beta_2 = \mu_4/\mu_2^2\), indicates the heaviness of the tails of the distribution. The normal distribution has a kurtosis value of 3.
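The sample quantities above are straightforward to compute; the sketch below (illustrative only, with a made-up data vector) evaluates x̄, s², the adjusted sample skewness of equation (2.14) and the sample kurtosis of equation (2.15):

```python
import numpy as np

x = np.array([23.2, 17.3, 26.5, 32.7, 19.1, 21.6, 28.9, 24.5])   # hypothetical sample
n = len(x)
xbar = x.mean()
s2 = ((x - xbar) ** 2).sum() / (n - 1)          # sample variance with divisor n-1
s = np.sqrt(s2)

# Adjusted sample skewness, equation (2.14)
b1 = n / ((n - 1) * (n - 2)) * (((x - xbar) / s) ** 3).sum()

# Sample kurtosis, equation (2.15); values near 3 suggest normal-like tails
b2 = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * (((x - xbar) / s) ** 4).sum()
      - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)) + 3)

print(round(xbar, 3), round(s2, 3), round(b1, 3), round(b2, 3))
```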

Examples: Means and Variances. Recall that the mean of a discrete-valued random variable is given by
\[
\mu = E(X) = \sum_{\text{all } x} x\, P_X(x).
\]
Thus, we see that the mean is a weighted average: the sum of the values of X weighted by the probability that each value occurs. The mean is a measure of central tendency. It is the first moment about the origin. The variance of a discrete-valued random variable X is given by
\[
\sigma^2 = VAR(X) = E[(X-\mu)^2] = \sum_{\text{all } x} (x-\mu)^2 P_X(x)
\]
or alternatively,
\[
\sigma^2 = VAR(X) = E(X^2) - [E(X)]^2 = E(X^2) - \mu^2 = \sum_{\text{all } x} x^2 P_X(x) - \left[ \sum_{\text{all } x} x\, P_X(x) \right]^2.
\]
The variance is a measure of dispersion or spread. It, in essence, represents the average squared distance of all values from the mean. The standard deviation, σ, is the square root of the variance.

Example: For the experiment in which two dice were tossed and X represented the sum of the upturned faces,
\[
E(X) = 2(1/36) + 3(2/36) + 4(3/36) + 5(4/36) + 6(5/36) + 7(6/36) + 8(5/36) + 9(4/36) + 10(3/36) + 11(2/36) + 12(1/36) = 7
\]
\[
VAR(X) = 2^2(1/36) + 3^2(2/36) + 4^2(3/36) + 5^2(4/36) + 6^2(5/36) + 7^2(6/36) + 8^2(5/36) + 9^2(4/36) + 10^2(3/36) + 11^2(2/36) + 12^2(1/36) - 7^2 = 1974/36 - 49 = 54.83 - 49.00 = 5.83
\]
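The same arithmetic can be checked by enumerating the 36 equally likely outcomes; the short sketch below is only a verification of the hand computation above.

```python
from itertools import product
from fractions import Fraction

sums = [a + b for a, b in product(range(1, 7), repeat=2)]   # the 36 possible sums
p = Fraction(1, 36)

mean = sum(x * p for x in sums)                # E(X) = 7
var = sum(x**2 * p for x in sums) - mean**2    # VAR(X) = 1974/36 - 49
print(mean, var, float(var))                   # 7, 35/6, 5.833...
```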

The mean of a continuous-valued random variable is given by
\[
\mu = E(X) = \int_{\text{all } x} x f(x)\, dx.
\]
As in the discrete case, we see that the mean is a weighted average; it is the first moment about the origin. The variance of a continuous-valued random variable X is given by
\[
\sigma^2 = VAR(X) = E[(X-\mu)^2] = \int_{\text{all } x} (x-\mu)^2 f(x)\, dx
\]
or alternatively,
\[
\sigma^2 = E(X^2) - [E(X)]^2 = E(X^2) - \mu^2 = \int_{\text{all } x} x^2 f(x)\, dx - \left[ \int_{\text{all } x} x f(x)\, dx \right]^2.
\]

As before, the standard deviation, σ, is the square root of the variance.

Example: For the density function given earlier,
\[
f(x) =
\begin{cases}
0, & x < 0 \\
4 - 8x, & 0 < x < 0.5 \\
0, & x > 0.5
\end{cases}
\]
\[
E(X) = \int_0^{0.5} x(4 - 8x)\, dx = \left[ 2x^2 - \frac{8x^3}{3} \right]_0^{0.5} = \frac{1}{2} - \frac{1}{3} = \frac{1}{6} = 0.1667
\]
\[
V(X) = \int_0^{0.5} x^2(4 - 8x)\, dx - \left( \frac{1}{6} \right)^2 = \left[ \frac{4x^3}{3} - 2x^4 \right]_0^{0.5} - \left( \frac{1}{6} \right)^2 = \frac{1}{6} - \frac{1}{8} - \frac{1}{36} = \frac{1}{72} = 0.0139
\]
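The closed-form values above are easy to confirm numerically; the sketch below (a check only) approximates E(X) and V(X) for f(x) = 4 − 8x on (0, 0.5) with a fine mid-point sum in place of the antiderivatives.

```python
import numpy as np

m = 200_000
dx = 0.5 / m
x = (np.arange(m) + 0.5) * dx          # mid-points of a fine grid on (0, 0.5)
f = 4.0 - 8.0 * x                      # density values

EX = np.sum(x * f) * dx                # approximately 1/6
VX = np.sum(x**2 * f) * dx - EX**2     # approximately 1/72
print(round(EX, 4), round(VX, 4))      # 0.1667  0.0139
```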

2.4.3 Samples, statistics and sampling distributions

Random Sample: A sample x1, x2, . . . , xn is said to be a random sample from a population if (1) each item in the population has an equal likelihood of being a part of the sample and (2) the selection of any item in the sample does not influence the selection of any other item and is not influenced by the selection of any other item in the sample.

Characteristics of the sample are called statistics. Examples of statistics are the mean, the median, the range and the standard deviation (of the sample values). Characteristics of the population are called parameters. The word parameter also has other meanings in mathematics and statistics. Examples of parameters are the population mean, the population variance and the population skewness. Statistics are usually, by convention, assigned Latin letters and parameters are assigned Greek letters; e.g., x̄ and s are statistics, µ and σ are parameters.

Parameter Estimation. Statistics are used to estimate parameters. For example, the sample average is used to estimate the population mean and the sample standard deviation is used to estimate the population standard deviation. That is,
\[
\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}
\]
is used to estimate µ and
\[
s = \sqrt{ \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} } = \sqrt{ \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left( \sum_{i=1}^{n} x_i \right)^2}{n}}{n-1} }
\]
is used to estimate σ. In general, the estimator of a parameter θ is given the symbol \(\hat{\theta}\), read "theta estimate" or "theta hat".

Desirable Properties of Estimators

1) Unbiasedness: An estimator \(\hat{\theta}\) is said to be an unbiased estimate of θ if \(E(\hat{\theta}) = \theta\). X̄ is an unbiased estimate of µ (for symmetric distributions, the sample median is also an unbiased estimate of µ). s² is an unbiased estimate of σ², although s itself is slightly biased as an estimate of σ. The bias of an estimator \(\hat{\theta}\) is given by \(E(\hat{\theta}) - \theta\).

2) Small Variance: The variance of an estimator \(\hat{\theta}\) is given by
\[
\sigma^2(\hat{\theta}) = VAR(\hat{\theta}) = E[\hat{\theta} - E(\hat{\theta})]^2 = \sum_{\text{all } \hat{\theta}} [\hat{\theta} - E(\hat{\theta})]^2 P(\hat{\theta})
\]
or
\[
\int_{\text{all } \hat{\theta}} [\hat{\theta} - E(\hat{\theta})]^2 f(\hat{\theta})\, d\hat{\theta}.
\]
Obviously, if \(\hat{\theta}\) has a small variance then the spread around its mean is small and any selected value of \(\hat{\theta}\) will be close to the mean of \(\hat{\theta}\); and if the mean of \(\hat{\theta}\) is close to θ, then \(\hat{\theta}\) is a good estimator. The mean square error (MSE) combines the variance and bias of an estimator in a single measure:
\[
MSE(\hat{\theta}) = E(\hat{\theta} - \theta)^2 = E(\hat{\theta}^2 - 2\theta\hat{\theta} + \theta^2) = E(\hat{\theta}^2) - 2\theta E(\hat{\theta}) + \theta^2.
\]
We now add and subtract the same quantity,
\[
E(\hat{\theta} - \theta)^2 = E(\hat{\theta}^2) - 2\theta E(\hat{\theta}) + \theta^2 - [E(\hat{\theta})]^2 + [E(\hat{\theta})]^2.
\]
Rearranging,
\[
E(\hat{\theta} - \theta)^2 = E(\hat{\theta}^2) - [E(\hat{\theta})]^2 + [E(\hat{\theta})]^2 - 2\theta E(\hat{\theta}) + \theta^2
\]
\[
MSE(\hat{\theta}) = E(\hat{\theta} - \theta)^2 = VAR(\hat{\theta}) + [E(\hat{\theta}) - \theta]^2
\]
or
\[
MSE(\hat{\theta}) = E(\hat{\theta} - \theta)^2 = VAR(\hat{\theta}) + [\text{bias}]^2.
\]
Thus the MSE of an estimator \(\hat{\theta}\) combines information about its variance and bias. The better the estimator, the smaller the MSE. If the estimator is required to be unbiased, that is, \(E(\hat{\theta}) = \theta\), then \(E(\hat{\theta} - \theta)^2 = VAR(\hat{\theta})\). In this case, minimizing the MSE yields the \(\hat{\theta}\) that is the minimum variance unbiased estimator (MVUE), a desirable property for an estimator.

When we have two or more estimators of the same parameter, one way of comparing them pairwise is to calculate their relative efficiency. Given two estimators of θ, \(\hat{\theta}_1\) and \(\hat{\theta}_2\), the efficiency of \(\hat{\theta}_2\) relative to \(\hat{\theta}_1\) is given by
\[
\text{Relative efficiency} = \frac{VAR(\hat{\theta}_1)}{VAR(\hat{\theta}_2)}.
\]

Example: It can be shown that the variance of the sample median for the normal distribution is, for large n, \(VAR(\text{med}) = (1.2533)^2\, \sigma^2/n\). We know that the variance of the sample mean X̄ is σ²/n. Thus, the efficiency of the sample median relative to the sample mean is
\[
\frac{VAR(\bar{X})}{VAR(\text{med})} = \frac{\sigma^2/n}{(1.2533)^2\, \sigma^2/n} = \frac{1}{1.2533^2} = 0.6366.
\]

3) Consistency: Let \(\hat{\theta}_n\) be the estimate of θ after n observations have been taken. The estimator \(\hat{\theta}_n\) is said to be a consistent estimator of θ if, for any positive number ε,
\[
\lim_{n \to \infty} P(|\hat{\theta}_n - \theta| \le \varepsilon) = 1.
\]
This means that as the sample size increases, \(\hat{\theta}_n\) gets closer and closer to θ. The minimum variance unbiased estimator (MVUE) possesses the smallest variance possible among unbiased estimators. If, in addition, \(VAR(\hat{\theta}_n) \to 0\), then this estimator (or, rather, sequence of estimators) will be consistent.

It can be shown that for the normal distribution, the sample mean X̄ is the MVUE of the population mean µ.

4) Sufficiency: Another important property of an estimator is the property of sufficiency. An estimator is said to be sufficient if it contains all of the information in the sample regarding the parameter. Furthermore, if an unbiased estimator and a sufficient statistic exist for θ ∈ Θ, the best unbiased estimator of θ is an explicit function of the sufficient statistic. If there exists a unique function of the sufficient statistic for θ, then this is necessarily the best estimator for θ.

E.g., for the normal distribution, the sample mean X̄ is a sufficient statistic for estimating µ.

2.4.4 Normal Distribution:

The Central Limit Theorem states that, under fairly general conditions, the distribution of a sum of n independent random variables, for sufficiently large n, is approximately a normal distribution. Furthermore, the normal distribution can be effectively used to approximate other important sampling distributions, such as the binomial. It follows that the distribution of the random variable X̄, from a random sample, is approximately normal with mean µ and standard deviation σ/√n, where µ and σ are the mean and standard deviation of the individual random variables in the sample and n is the sample size. That is,
\[
\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = Z, \quad \text{where} \quad f_Z(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}. \qquad (2.16)
\]

Z is said to have a standard normal distribution, with mean 0 and standard deviation 1, and one writes Z ∼ N(0, 1). Values of the cdf (the area below) Φ(z) = P(Z ≤ z) are found in tables like that in the Appendix, or from software. The value of z which has an area of α above it will be denoted by z_α. Similarly, the value of z which leaves an area of α below it is given by −z_α, due to the symmetry of the standard normal distribution. E.g., z_{0.025} = 1.96 and z_{0.975} = −1.96. A (1 − α) confidence interval for µ is given by:
\[
1 - \alpha = P\!\left( \bar{X} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right)
\]

Example: Suppose that the lifetime of wooden telephone poles is normally distributed with a mean of 20.2 years and a standard deviation of 2.1 years. The probability that a pole will survive beyond 23 years is
\[
1 - \Phi\!\left( \frac{23 - 20.2}{2.1} \right) = 1 - \Phi(1.3333) = 1 - 0.9088 = 0.0912.
\]
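The same probability is easily obtained without tables; the sketch below uses the normal distribution in Python's standard library with the pole parameters of the example (purely a check of the hand calculation).

```python
from statistics import NormalDist

life = NormalDist(mu=20.2, sigma=2.1)     # lifetime distribution of the example
p_survive = 1.0 - life.cdf(23.0)          # P(lifetime > 23 years)
print(round(p_survive, 4))                # approximately 0.091
```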

The probability distribution of a statistic is called a sampling distribution. There are several important sampling distributions to consider for various statistical techniques, such as the normal, chi-square, Student's t and F distributions. These distributions will be outlined briefly here.

2.4.5 The Chi-Square Distribution

The sum of squared standard normal random variables is said to have a chi-square distribution with parameter r, where r is the number of variables in the sum. That is,
\[
\chi^2 = Z_1^2 + Z_2^2 + \cdots + Z_r^2 \quad \text{and} \quad f_{\chi^2}(u) = \frac{1}{2^{r/2}\, \Gamma\!\left(\frac{r}{2}\right)}\, u^{\frac{r}{2}-1} e^{-u/2}, \quad u > 0, \qquad (2.17)
\]
where
\[
\Gamma(n) = \int_0^{\infty} x^{n-1} e^{-x}\, dx \quad \text{for } n > 0. \qquad (2.18)
\]
The chi-square distribution is a special case of the gamma distribution, which is important in reliability theory as it is the distribution of a sum of exponential random variables. It is also important in connection with the distribution of the sample variance. In addition, the sum of independent chi-square random variables is chi-square distributed with parameter equal to the sum of the individual parameters.

The gamma function of equation (2.18) is used quite regularly in reliability engineering. If n is an integer then Γ(n) = (n − 1)!. Also, the relationship Γ(n) = (n − 1)Γ(n − 1) is always true whether or not n is an integer. Much more will be said about the gamma function in Chapter 4.


In reliability, owing to its relationship with the exponential density, the chi-square might be used for those situations where the exponential is appropriate.

It can be shown that for a sample of n independent normals, X_i ∼ N(µ, σ), the quantity
\[
\frac{S^2 (n-1)}{\sigma^2} = \frac{\sum (X_i - \bar{X})^2}{\sigma^2}
\]
has a chi-square distribution with n − 1 degrees of freedom. Using the above relationship, we may state
\[
1 - \alpha = P\!\left[ \frac{(n-1)s^2}{\chi^2_{\alpha/2,\, n-1}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\, n-1}} \right],
\]
where \(\chi^2_{\beta,\, n-1}\) is the value of the chi-square statistic with an area β beyond it.

Example: Consider a random sample of 8 items drawn from a population known to be normal: 45.2, 67.8, 34.6, 21.7, 89.3, 55.5, 78.3 and 49.0. We will use this sample data first to obtain a 90% confidence interval on σ² and then to get a lower 80% confidence bound for σ². From the data, s² = 505.759.
\[
0.90 = P\!\left[ \frac{7(505.759)}{\chi^2_{0.05,\,7}} \le \sigma^2 \le \frac{7(505.759)}{\chi^2_{0.95,\,7}} \right]
\]
\[
0.90 = P\!\left[ \frac{3540.31}{14.07} \le \sigma^2 \le \frac{3540.31}{2.17} \right]
\]
\[
0.90 = P(251.62 \le \sigma^2 \le 1631.48)
\]
Now suppose that an 80% lower bound for σ² is desired.
\[
0.80 = P\!\left[ \frac{7(505.759)}{\chi^2_{0.20,\,7}} \le \sigma^2 \right]
\]
\[
0.80 = P\!\left[ \frac{3540.31}{9.80} \le \sigma^2 \right]
\]
\[
0.80 = P(361.26 \le \sigma^2)
\]
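The interval and bound above can be reproduced directly; the sketch below (a check only, assuming SciPy is available for the chi-square quantiles) follows the formula with χ²_{α/2,n−1} and χ²_{1−α/2,n−1}.

```python
import numpy as np
from scipy.stats import chi2

x = np.array([45.2, 67.8, 34.6, 21.7, 89.3, 55.5, 78.3, 49.0])
n = len(x)
s2 = x.var(ddof=1)                                      # sample variance, about 505.76

alpha = 0.10                                            # 90% two-sided interval
lower = (n - 1) * s2 / chi2.ppf(1 - alpha / 2, n - 1)   # chi-square_{0.05,7} = 14.07
upper = (n - 1) * s2 / chi2.ppf(alpha / 2, n - 1)       # chi-square_{0.95,7} = 2.17
print(round(lower, 2), round(upper, 2))                 # about (251.6, 1631.5)

lower80 = (n - 1) * s2 / chi2.ppf(0.80, n - 1)          # 80% lower confidence bound
print(round(lower80, 2))                                # about 361.3
```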


2.4.6 The Student’s t Distribution:

If Z ∼ N(0, 1) and V is chi-square with parameter r, then T has the Student's t distribution with r degrees of freedom, where
\[
T = \frac{Z}{\sqrt{V/r}}
\]
and
\[
f(t) = \frac{\Gamma\!\left(\frac{r+1}{2}\right)}{\sqrt{\pi r}\, \Gamma\!\left(\frac{r}{2}\right)} \left[ \frac{t^2}{r} + 1 \right]^{-(r+1)/2}, \qquad -\infty < t < \infty. \qquad (2.19)
\]

Thus,
\[
T = \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \Big/ \sqrt{S^2/\sigma^2}
\]
has a t distribution with parameter n − 1, since \(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = Z \sim N(0, 1)\) and
\[
\frac{S^2}{\sigma^2} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{(n-1)\sigma^2} = \frac{V}{n-1}.
\]
Critical values of t are given in the Appendix. To obtain values of t from the Appendix, one needs the degrees of freedom ν = n − 1; for example, t_{0.05,6} = 1.943. This means that a t-value of 1.943 with 6 degrees of freedom (sample size 7) leaves an area of 0.05 beyond it. Note from the t tables that \(\lim_{\nu \to \infty} t_{\alpha,\nu} = z_\alpha\); e.g., t_{0.05,100} = 1.660 is fairly close to z_{0.05} = 1.645. A (1 − α)100% confidence interval for µ is given by:
\[
1 - \alpha = P\!\left( \bar{X} - t_{\alpha/2,\, n-1} \frac{s}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,\, n-1} \frac{s}{\sqrt{n}} \right)
\]

Example: Suppose that a sample of five telephone failure times (believed to be normally distributed) is: 16.5, 21.4, 11.8, 19.7 and 22.9. Find a 95% confidence interval on the true mean time to failure. Here X̄ = 18.46 and s = 4.418, thus
\[
0.95 = P\!\left( 18.46 - t_{0.025,\,4} \frac{4.418}{\sqrt{5}} \le \mu \le 18.46 + t_{0.025,\,4} \frac{4.418}{\sqrt{5}} \right)
\]
\[
0.95 = P[18.46 - 2.776(1.976) \le \mu \le 18.46 + 2.776(1.976)]
\]
\[
= P(12.97 \le \mu \le 23.95)
\]
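A short computational check of this interval (assuming SciPy is available for the t quantile):

```python
import numpy as np
from scipy.stats import t

x = np.array([16.5, 21.4, 11.8, 19.7, 22.9])        # failure times from the example
n = len(x)
xbar, s = x.mean(), x.std(ddof=1)

half = t.ppf(0.975, n - 1) * s / np.sqrt(n)         # t_{0.025,4} * s/sqrt(n)
print(round(xbar - half, 2), round(xbar + half, 2)) # about (12.97, 23.95)
```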


2.4.7 The F Distribution

The ratio of independent chi-square random variables, each divided by its degrees of freedom, has an F distribution with parameters r1 and r2. In particular, a ratio of sample variances has an F distribution with parameters n1 − 1 and n2 − 1, where n1 and n2 are the sizes of the samples from which the variances were computed. Thus,
\[
\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}
\]
has an F distribution with the above parameters. This model can be used for inferences about the ratio of population variances. The F distribution density is given by
\[
f_F(t) = \frac{\Gamma\!\left(\frac{\nu_1+\nu_2}{2}\right)}{\Gamma\!\left(\frac{\nu_1}{2}\right)\Gamma\!\left(\frac{\nu_2}{2}\right)} \left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2} \frac{t^{(\nu_1/2)-1}}{\left(1 + \frac{\nu_1 t}{\nu_2}\right)^{(\nu_1+\nu_2)/2}}, \qquad t \ge 0.
\]
The F distribution comes about as the ratio of two chi-square variates divided by their respective degrees of freedom, i.e.,
\[
F_{n_1-1,\, n_2-1} = \frac{\chi^2_{(n_1-1)}/(n_1-1)}{\chi^2_{(n_2-1)}/(n_2-1)}.
\]
From the above, we can see that the ratio of two sample variances drawn from normal populations having the same variance tends to follow an F distribution, i.e., \(\frac{S_1^2}{S_2^2} \sim F_{(n_1-1,\, n_2-1)}\). We can use this fact to produce a confidence interval for the ratio of two variances.

Example: Consider the following two samples, drawn from normal populations, and test whether they have the same variance (α = 0.10).
Sample 1: 23.4, 31.6, 29.6, 19.9, 26.4, 28.5, 26.7, 20.3, 32.4
Sample 2: 67.4, 69.1, 72.5, 66.8, 80.8, 79.9, 77.4, 66.8, 73.6, 71.1, 74.4, 76.1

Here s₁² = 20.76, s₂² = 24.28, n1 = 9, n2 = 12, and the hypotheses are
\[
H_0: \sigma_1^2 = \sigma_2^2 \qquad H_1: \sigma_1^2 \ne \sigma_2^2 \qquad \alpha = 0.10.
\]
We have
\[
\frac{s_2^2}{s_1^2} = 1.170, \qquad F_{0.05,\,11,\,8} = 3.31, \qquad F_{0.95,\,11,\,8} = 0.34.
\]
Since 0.34 < 1.170 < 3.31, we cannot reject H0. In order to obtain F_{0.95,11,8}, we used the relationship
\[
F_{0.95,\,11,\,8} = \frac{1}{F_{0.05,\,8,\,11}} = \frac{1}{2.95} = 0.34.
\]


In general,
\[
F_{1-\alpha,\, n_1-1,\, n_2-1} = \frac{1}{F_{\alpha,\, n_2-1,\, n_1-1}}.
\]
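The variance-ratio test of the example is easy to reproduce; the sketch below (assuming SciPy is available for F quantiles) places the variance of the 12-observation sample in the numerator, so that the ratio has 11 and 8 degrees of freedom under H0.

```python
import numpy as np
from scipy.stats import f

x1 = np.array([23.4, 31.6, 29.6, 19.9, 26.4, 28.5, 26.7, 20.3, 32.4])
x2 = np.array([67.4, 69.1, 72.5, 66.8, 80.8, 79.9, 77.4, 66.8, 73.6, 71.1, 74.4, 76.1])

s1, s2 = x1.var(ddof=1), x2.var(ddof=1)      # about 20.76 and 24.28
ratio = s2 / s1                              # about 1.17, F with (11, 8) d.f. under H0

lo = f.ppf(0.05, len(x2) - 1, len(x1) - 1)   # F_{0.95,11,8}, about 0.34
hi = f.ppf(0.95, len(x2) - 1, len(x1) - 1)   # F_{0.05,11,8}, about 3.31
print(round(ratio, 3), round(lo, 2), round(hi, 2), lo < ratio < hi)   # cannot reject H0
```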

2.4.8 Tables of Sampling Distributions:

There are tables of the sampling distributions which include background information as well as the necessary values for use of the distributions. One comprehensive book of tables is by Owen (1962), which is directed toward students, practitioners and researchers in statistics.

2.5 Parameter Estimation

The general problem of estimating a characteristic of a population or distribution, called a parameter, is that of deriving a function of the sample observations, the data, such that the value computed from the sample is usually close to the actual value of the parameter. There may be several different potential estimators for a parameter. For example, if the mean of a distribution is to be estimated, one might consider the sample mean, the sample median or some other function of the data as an estimator. Candidate estimators are often found by the method of maximum likelihood, the method of least squares or the method of moments.

2.5.1 Maximum Likelihood:

One method for choosing a point estimator, and in fact one of the best methods, is the method of maximum likelihood. Let X be a random variable with probability density function f(x; θ), where θ is an unknown parameter. Let X1 = x1, X2 = x2, . . . , Xn = xn be a random sample of n observed values x1, x2, . . . , xn with likelihood function:
\[
L(\theta) = f(x_1; \theta) \cdot f(x_2; \theta) \cdots f(x_n; \theta).
\]
The likelihood function, a function of only the unknown parameter θ since the x's are observed values, is essentially then the "likelihood" of θ given these observed values of x. The maximum likelihood estimator of θ is the value of θ that maximizes the function L(θ).


Maximum likelihood estimators are not necessarily unbiased, but they can usually be easily adjusted to be unbiased. Also, maximum likelihood estimators have excellent large-sample properties, since they are asymptotically normally distributed, asymptotically unbiased and, under mild regularity conditions, asymptotically efficient.

Using the likelihood function, approximate (1 − α)100% confidence intervals on parameters of interest may be obtained by inverting Fisher's information matrix. Fisher's information matrix, for the two-parameter case θ_i, θ_j, is given by
\[
I_{ij} = E\!\left[ -\frac{\partial^2 \ln L}{\partial \theta_i\, \partial \theta_j} \right], \qquad i, j = 1, 2.
\]
This leads to
\[
\begin{bmatrix}
Var(\hat{\theta}_1) & Cov(\hat{\theta}_1, \hat{\theta}_2) \\
Cov(\hat{\theta}_1, \hat{\theta}_2) & Var(\hat{\theta}_2)
\end{bmatrix}
=
\begin{bmatrix}
-\dfrac{\partial^2 \ln L}{\partial \theta_1^2}\Big|_{\hat{\theta}_1,\hat{\theta}_2} & -\dfrac{\partial^2 \ln L}{\partial \theta_1 \partial \theta_2}\Big|_{\hat{\theta}_1,\hat{\theta}_2} \\
-\dfrac{\partial^2 \ln L}{\partial \theta_1 \partial \theta_2}\Big|_{\hat{\theta}_1,\hat{\theta}_2} & -\dfrac{\partial^2 \ln L}{\partial \theta_2^2}\Big|_{\hat{\theta}_1,\hat{\theta}_2}
\end{bmatrix}^{-1}.
\]
Using the asymptotic normality property of the MLEs, the (1 − α)100% confidence intervals for θ_i are calculated using
\[
\hat{\theta}_i \pm z_{\alpha/2} \sqrt{Var(\hat{\theta}_i)}.
\]

Example: Consider the exponential distribution for the case in which all units tested fail. The times to failure are t1, t2, . . . , tn.
\[
L = \left( \frac{1}{\theta} e^{-t_1/\theta} \right)\left( \frac{1}{\theta} e^{-t_2/\theta} \right) \cdots \left( \frac{1}{\theta} e^{-t_n/\theta} \right) = \left( \frac{1}{\theta} \right)^{n} e^{-\sum_{i=1}^{n} t_i/\theta}
\]
\[
\ln L = n \ln\!\left( \frac{1}{\theta} \right) - \frac{\sum_{i=1}^{n} t_i}{\theta} = -n \ln(\theta) - \frac{\sum_{i=1}^{n} t_i}{\theta}
\]
\[
\frac{\partial \ln L}{\partial \theta} = -\frac{n}{\theta} + \frac{\sum_{i=1}^{n} t_i}{\theta^2}
\]
Setting \(\frac{\partial \ln L}{\partial \theta} = 0\) gives \(-n + \frac{\sum_{i=1}^{n} t_i}{\theta} = 0\), so that
\[
\hat{\theta} = \frac{\sum_{i=1}^{n} t_i}{n} = \bar{t}.
\]

Example: Suppose that 6 units are tested until each fails. The time-to-failure density is believed to be exponential. The failure times are 450, 540, 670, 710, 990 and 1210. Then
\[
\hat{\theta} = \frac{\sum_{i=1}^{6} t_i}{6} = \frac{4570}{6} = 761.67.
\]

Often the simultaneous equations represented by the partial derivatives of ln L with respect to each unknown parameter are difficult to solve, and approximate and/or iterative methods such as the Newton-Raphson method must be used. More will be said about this with respect to MLEs for the parameters of the Weibull density in Chapter 4.

Example: Consider a censored sample from the exponential distribution with parameter θ. Censored, for our purposes here, means that a unit ran for a certain time and did not fail, and we have recorded its non-failure running time. Items in our sample represent either failure times, t_{i,f}, or censored times, t_{i,s}. We will first develop a general expression for the maximum likelihood estimate of θ and then look at a specific case. Suppose that our sample, ordered by time to failure and running time, looks like this:
\[
\{t_{1,f},\ t_{2,f},\ t_{3,s},\ t_{4,f},\ t_{5,s},\ t_{6,s},\ t_{7,f},\ t_{8,f},\ t_{9,f},\ t_{10,f},\ t_{11,s},\ t_{12,s},\ t_{13,f}\}.
\]
This indicates that units 1, 2, 4, 7, 8, 9, 10 and 13 failed at times t1, t2, t4, t7, t8, t9, t10 and t13, and units 3, 5, 6, 11 and 12 survived through times t3, t5, t6, t11 and t12. Then, using the exponential density,

\[
L = \prod_{k=f} \left( \frac{1}{\theta} e^{-t_{i,k}/\theta} \right) \prod_{k=s} \left( e^{-t_{j,k}/\theta} \right)
\]
\[
\Lambda = \ln L = \ln\!\left( \frac{1}{\theta^8} \right) + \sum_{k=f} \left( -\frac{t_{i,k}}{\theta} \right) + \sum_{k=s} \left( -\frac{t_{j,k}}{\theta} \right)
\]
\[
\Lambda = -\ln \theta^8 - \frac{1}{\theta} \left( \sum_{k=f} t_{i,k} + \sum_{k=s} t_{j,k} \right)
\]
\[
\Lambda = -8 \ln \theta - \frac{1}{\theta} \left( \sum_{k=f} t_{i,k} + \sum_{k=s} t_{j,k} \right)
\]
\[
\frac{\partial \Lambda}{\partial \theta} = -\frac{8}{\theta} + \frac{1}{\theta^2} \sum_{k=f} t_{i,k} + \frac{1}{\theta^2} \sum_{k=s} t_{j,k} = 0
\]
\[
-8 + \frac{1}{\theta} \left( \sum_{k=f} t_{i,k} + \sum_{k=s} t_{j,k} \right) = 0
\]
\[
\hat{\theta} = \frac{\sum_i t_{i,f} + \sum_j t_{j,s}}{8}
\]

Thus, the maximum likelihood estimator for the multiply censored case is simply the sum of all failure times and running times divided by the number of failures. For the example above, suppose that the data are as follows:

t_{1,f} = 114     t_{6,s} = 520      t_{11,s} = 1692
t_{2,f} = 237     t_{7,f} = 774      t_{12,s} = 1748
t_{3,s} = 251     t_{8,f} = 892      t_{13,f} = 2107
t_{4,f} = 495     t_{9,f} = 1055
t_{5,s} = 520     t_{10,f} = 1278

\[
\sum_{k=f} t_{i,k} = 6952, \qquad \sum_{k=s} t_{j,k} = 4731
\]
\[
\hat{\theta} = \frac{6952 + 4731}{8} = \frac{11683}{8} = 1460.38
\]

In general, for a sample taken from an exponential distribution with r failures and c censored items (or non-failures),
\[
\hat{\theta} = \frac{\sum_{i=1}^{r} t_{i,f} + \sum_{j=1}^{c} t_{j,s}}{r},
\]
where "f" indicates a failed unit and "s" a censored (surviving) unit.
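This estimator needs nothing more than sums; the sketch below simply recomputes θ̂ for the censored data of the example.

```python
failures = [114, 237, 495, 774, 892, 1055, 1278, 2107]    # failure times t_{i,f}
censored = [251, 520, 520, 1692, 1748]                    # running times t_{j,s}

# MLE of the exponential mean with multiply censored data:
# total time on test divided by the number of failures
theta_hat = (sum(failures) + sum(censored)) / len(failures)
print(round(theta_hat, 2))     # 1460.38
```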

2.5.2 Moment Estimators

The method of moments equates sample moments to population moments and solves for the parameters to be estimated. Population moments have been used earlier and they are formally defined now. The kth population moment about the origin (defined earlier in Section 2.3.2) is given by
\[
\mu'_k = \sum_{\text{all } x} x^k P_X(x)
\]
for the discrete case and
\[
\mu'_k = \int_{\text{all } x} x^k f(x)\, dx
\]
for the continuous case. The kth sample moment about the origin is given by
\[
m'_k = \frac{\sum_{i=1}^{n} x_i^k}{n}.
\]

Example: Consider the gamma density, which is given by
\[
f(x) = \frac{(x/\beta)^{\alpha-1} e^{-x/\beta}}{\beta\, \Gamma(\alpha)}, \qquad x > 0,
\]
where
\[
\Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1} e^{-x}\, dx.
\]
The first two moments about the origin for this distribution are \(\mu'_1 = \alpha\beta\) and \(\mu'_2 = \alpha\beta^2 + \alpha^2\beta^2\).

Now consider a sample of size n taken on a process variable believed to be described best by the gamma density. The first two sample moments are given by
\[
m'_1 = \frac{\sum_{i=1}^{n} x_i}{n} = \bar{X} \qquad \text{and} \qquad m'_2 = \frac{\sum_{i=1}^{n} x_i^2}{n}.
\]
Hence, we equate
\[
m'_1 = \frac{\sum_{i=1}^{n} x_i}{n} = \bar{X} = \alpha\beta \qquad \text{and} \qquad m'_2 = \frac{\sum_{i=1}^{n} x_i^2}{n} = \alpha\beta^2 + \alpha^2\beta^2.
\]
From the first equation, \(\beta = \frac{m'_1}{\alpha}\). Substituting this in the second equation and solving for α, we have
\[
\hat{\alpha} = \frac{(m'_1)^2}{m'_2 - (m'_1)^2} = \frac{(\bar{X})^2}{\left( \frac{\sum_{i=1}^{n} X_i^2}{n} \right) - \bar{X}^2} = \frac{n (\bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}.
\]
Substituting \(\hat{\alpha}\) in the first equation, we obtain
\[
\hat{\beta} = \frac{m'_1}{\hat{\alpha}} = \frac{\bar{X}}{\hat{\alpha}} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n \bar{X}}.
\]
Thus the moment estimators of the parameters α and β for the gamma distribution are
\[
\hat{\alpha} = \frac{n (\bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \qquad \text{and} \qquad \hat{\beta} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n \bar{X}}.
\]
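As a small illustration (the helper name and the simulated shape and scale values below are ours, chosen arbitrarily), these moment estimators can be computed directly from a sample:

```python
import numpy as np

def gamma_moment_estimates(x):
    """Method-of-moments estimates (alpha_hat, beta_hat) for the gamma density."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xbar = x.mean()
    ss = ((x - xbar) ** 2).sum()
    return n * xbar**2 / ss, ss / (n * xbar)

rng = np.random.default_rng(7)
sample = rng.gamma(shape=2.0, scale=3.0, size=5000)   # simulated data with alpha=2, beta=3
alpha_hat, beta_hat = gamma_moment_estimates(sample)
print(round(alpha_hat, 2), round(beta_hat, 2))        # values near 2 and 3
```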


2.5.3 Least Squares Procedure:

Suppose that there is a single dependent variable or response y which is uncontrolled and depends on one or more independent regressor variables, say X1, X2, . . . , Xn, which are measured with negligible error and are controlled. The relationship fit to such a set of experimental data is characterized by a prediction equation called a regression equation. Simple linear regression treats only the case of a single regressor variable. Let us denote a random sample of size n by the set {(x_i, y_i); i = 1, 2, . . . , n}. Each observation in this sample satisfies the equation
\[
y_i = \alpha + \beta x_i + \varepsilon_i.
\]
The α and β in the above model are called regression coefficients: β is the slope of the regression line and α is the y-intercept of the regression line. The ε_i are random error terms with mean 0 and variance σ², and the random errors are assumed to be uncorrelated. The estimated regression line is
\[
\hat{y}_i = a + b x_i.
\]
Each pair of observations satisfies the relation
\[
y_i = a + b x_i + e_i,
\]
where \(e_i = y_i - \hat{y}_i\), and a and b are estimators of the parameters α and β. These parameters are estimated by minimizing the sum of squares of the residuals e_i.

Least Squares Estimation. The process is to find a and b, the estimates of α and β, such that the sum of the squares of the residuals is a minimum. The residual sum of squares is called the sum of squares for error (SSE):
\[
SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2.
\]
Differentiating SSE with respect to a and b and setting the derivatives to 0, we get
\[
\frac{\partial (SSE)}{\partial a} = -2 \sum_{i=1}^{n} (y_i - a - b x_i) = 0
\]
or
\[
\sum_{i=1}^{n} y_i = n a + b \sum_{i=1}^{n} x_i \qquad (2.20)
\]


Obs.   X     Y     X²      XY
1      10    143   100     1430
2      17    137   289     2329
3      22    129   484     2838
4      29    114   841     3306
5      35    98    1225    3430
6      41    87    1681    3567
7      48    79    2304    3792
8      57    59    3249    3363
9      66    48    4356    3168
10     78    41    6084    3198
11     91    35    8281    3185
SUM:   494   970   28894   33606

Table 2.3: Data and calculation for Least Squares Example

\[
\frac{\partial (SSE)}{\partial b} = -2 \sum_{i=1}^{n} x_i (y_i - a - b x_i) = 0
\]
or
\[
\sum_{i=1}^{n} x_i y_i = a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} x_i^2 \qquad (2.21)
\]
The least squares estimates a and b of the regression coefficients are computed by solving equations (2.20) and (2.21) simultaneously, resulting in
\[
b = \frac{n \sum_{i=1}^{n} x_i y_i - \left( \sum_{i=1}^{n} x_i \right)\left( \sum_{i=1}^{n} y_i \right)}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2} \qquad (2.22)
\]
and
\[
a = \frac{\sum_{i=1}^{n} y_i - b \sum_{i=1}^{n} x_i}{n} = \bar{y} - b\bar{x}. \qquad (2.23)
\]

Example: The data in Table 2.3 will now be analyzed using the least squares intercept and slope estimates shown above.
\[
b = \frac{n \sum_{i=1}^{n} x_i y_i - \left( \sum_{i=1}^{n} x_i \right)\left( \sum_{i=1}^{n} y_i \right)}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2} = \frac{11(33606) - (494)(970)}{11(28894) - (494)^2} = -1.484
\]


\[
a = \frac{\sum_{i=1}^{n} y_i - b \sum_{i=1}^{n} x_i}{n} = \bar{y} - b\bar{x} = \frac{970}{11} - (-1.484)\left( \frac{494}{11} \right) = 154.827
\]

Figure 2.12 presents a plot of the data and the fitted line.

Figure 2.12: Least Squares Regression line Fitted to the data of Table 2.3
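The fit of Figure 2.12 is easy to reproduce; the sketch below simply applies equations (2.22) and (2.23) to the data of Table 2.3 as a check on the hand computation.

```python
import numpy as np

x = np.array([10, 17, 22, 29, 35, 41, 48, 57, 66, 78, 91], dtype=float)
y = np.array([143, 137, 129, 114, 98, 87, 79, 59, 48, 41, 35], dtype=float)
n = len(x)

b = (n * np.sum(x * y) - x.sum() * y.sum()) / (n * np.sum(x**2) - x.sum()**2)  # eq. (2.22)
a = y.mean() - b * x.mean()                                                    # eq. (2.23)
print(round(b, 3), round(a, 3))     # about -1.484 and 154.827
```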

2.6 Statistical inference

Statistical inference is comprised of methods of making inferences about the population from sample observations. Of course, the use of estimates as in the previous section is a method of making an inference about the population from sample data and, as such, is considered a statistical inference. There are two other commonly used methods of statistical inference that will be outlined here: interval estimation and tests of hypotheses.

2.6.1 Interval Estimation:

In many cases, a point estimate of a parameter does not provide enough information about the parameter of interest. For example, the point estimate does not reveal any information about the variability of the estimate. This situation can be rectified by the use of an interval estimate or confidence interval, whose length is a function of the variability of the estimate.

There are several formal methods of choosing the appropriate values of the parameter for the confidence set.


Often, it is possible to compute a confidence interval in a fairly simple manner, but this simple way of constructing confidence intervals requires that functions of the sample and parameter be found which are distributed independently of the parameter. Thus, in order to compute a confidence interval on a parameter θ in this simple way, two functions θ_L and θ_U of the data are determined such that Pr{θ_L < θ < θ_U} = 1 − α, where α is usually small and where it is possible to compute this probability independently of the value of the parameter. Then suppose this probability is computed for a large number of similarly taken samples, that is, the limits θ_L and θ_U are calculated for each sample. One could then interpret the probability as meaning that approximately 100(1 − α)% of the intervals will cover the true value of θ. Using this procedure with one sample results in a method for obtaining a random interval which covers the true value of the parameter with a specified probability. This method can be used to construct confidence intervals for the mean µ of a normal distribution because it is known that the function \(\frac{\bar{X} - \mu}{s/\sqrt{n}}\) has a t distribution independent of µ.

Now, suppose that the distribution of an estimator of θ depends on θ and one is interested in computing a 100(1 − α)% confidence interval. In the case where one can determine the density of an estimator of the parameter, say \(g(\hat{\theta}; \theta)\), where \(\hat{\theta}\) is the estimator and θ is the parameter, one proceeds as follows. For a particular specified value of θ, say θ₀, it is possible to find two numbers, g₁(θ₀) and g₂(θ₀), such that:
\[
P\{\hat{\theta} < g_1(\theta_0)\} = \int_{-\infty}^{g_1(\theta_0)} g(\hat{\theta}; \theta_0)\, d\hat{\theta} = \frac{\alpha}{2}
\]
and
\[
P\{\hat{\theta} > g_2(\theta_0)\} = \int_{g_2(\theta_0)}^{\infty} g(\hat{\theta}; \theta_0)\, d\hat{\theta} = \frac{\alpha}{2}.
\]
The values g₁(θ) and g₂(θ) are functions of the θ values and it follows that:
\[
P\{g_1(\theta) < \hat{\theta} < g_2(\theta)\} = \int_{g_1(\theta)}^{g_2(\theta)} g(\hat{\theta}; \theta)\, d\hat{\theta} = 1 - \alpha.
\]

When the functions g₁(θ) and g₂(θ) are plotted in the (θ, \(\hat{\theta}\)) space, as in Figure 2.13, a confidence interval for θ can be constructed as follows: from a sample of n, compute \(\hat{\theta}_n\) and draw a horizontal line through \(\hat{\theta}_n\) on the \(\hat{\theta}\)-axis. This line will intersect the two curves at points labeled U_n and L_n as in the figure. These two numbers, U_n and L_n, when projected on the θ-axis as θ_L and θ_U, define a confidence interval for θ. To examine why this is true, consider that the sample comes from a population with parameter value θ₀. The probability that the estimate \(\hat{\theta}\) is between g₁(θ₀) and g₂(θ₀) is 1 − α. When the estimate does fall between these values, a horizontal line through \(\hat{\theta}\) will cut a vertical line through θ₀ at some point between the curves, and the corresponding interval (θ_L, θ_U) will cover θ₀. It follows that the probability is 1 − α that such an interval will cover θ. This statement is true for any population parameter value θ₀.

Figure 2.13: Plot of g₁(θ) and g₂(θ) in the (θ, \(\hat{\theta}\)) space

Sometimes it is possible to determine the limits (θ_L, θ_U) for a given estimate without actually finding the functions g₁(θ) and g₂(θ). Refer to Figure 2.13 and note that the limits for θ are at the points where g₁(θ_U) = \(\hat{\theta}_n\) and g₂(θ_L) = \(\hat{\theta}_n\). In terms of g₁ and g₂, it follows that θ_U is the value of θ for which:
\[
\int_{-\infty}^{\hat{\theta}_n} g(\hat{\theta}; \theta)\, d\hat{\theta} = \frac{\alpha}{2}
\]

and θ_L is the value of θ for which:
\[
\int_{\hat{\theta}_n}^{\infty} g(\hat{\theta}; \theta)\, d\hat{\theta} = \frac{\alpha}{2}
\]
If these equations can be solved for θ, then the solutions are the 100(1 − α)% confidence limits for θ.

EXAMPLE: Consider the exponential distribution with mean time to failure θ. For a sample of one observation x, \(\hat{\theta} = x\) and the above two integrals result in the equations:
\[
1 - e^{-x/\theta} = \frac{\alpha}{2} \qquad \text{and} \qquad e^{-x/\theta} = \frac{\alpha}{2},
\]
which can be solved for θ_U and θ_L. That is,
\[
\theta_U = \frac{x}{-\ln(1 - \alpha/2)} \qquad \text{and} \qquad \theta_L = \frac{x}{-\ln(\alpha/2)}.
\]

In the case of a discrete random variable, the above integrals become sums, but confidence intervals having confidence coefficients exactly equal to 1 − α are not available. However, under certain conditions, one can find confidence intervals having confidence coefficients not less than 1 − α. For example, consider the case of a sample (x1, x2, . . . , xn) of observations on a binomial random variable with:
\[
f(x; p) = p^x (1-p)^{1-x}, \qquad x = 0, 1.
\]
Now suppose that k of the x's are 1's. The estimator of p is \(\hat{p} = k/n\), where \(k = \sum_{i=1}^{n} x_i\) can have the values 0, 1, 2, . . . , n. Then
\[
g(n\hat{p}; p) = \binom{n}{n\hat{p}} p^{n\hat{p}} (1-p)^{n - n\hat{p}}, \qquad n\hat{p} = 0, 1, 2, \ldots, n.
\]

The upper 100(1 − α)% confidence limit p_U can be determined by finding the value of p for which:
\[
\sum_{y=0}^{k} \binom{n}{y} p^y (1-p)^{n-y} = \frac{\alpha}{2}
\]
and the lower limit p_L is the value of p for which:
\[
\sum_{y=k}^{n} \binom{n}{y} p^y (1-p)^{n-y} = \frac{\alpha}{2}
\]


If k = 0, the lower limit is taken to be 0, and if k = n, the upper limit is taken to be 1.

Another procedure for finding binomial parameter confidence intervals, and a procedure that is more easily extendible to multidimensional parameters, is as follows. First construct a set which is called a 1 − α acceptance set for each possible value of p. For each value of p, include in the acceptance set values of k such that the sum of the probabilities of the values in the acceptance set is greater than or equal to 1 − α. The usual way to do this is to select the value of k which has the largest probability of occurrence for the given p, and then continue adding values of k into the acceptance set in descending order of probability until the total probability of the entries in the acceptance set is greater than or equal to 1 − α. Confidence sets are constructed from acceptance sets. For the outcome k that is observed, check the acceptance set for a value of p and, if k is in the acceptance set for the value of p checked, then that value of p is in the confidence set for p for the k value that was observed. In regular cases, this procedure results in the same limits as the procedure discussed earlier.
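For the tail-sum equations above, a numerical root-finder gives the limits directly; the sketch below is a minimal illustration (the values of k, n and α, and the helper names, are arbitrary choices, and SciPy's root solver is assumed to be available).

```python
from math import comb
from scipy.optimize import brentq

def binom_cdf(k, n, p):
    """P(Y <= k) for Y ~ Binomial(n, p)."""
    return sum(comb(n, y) * p**y * (1 - p) ** (n - y) for y in range(k + 1))

def exact_limits(k, n, alpha=0.05):
    """Solve the two tail equations of the text for (p_L, p_U)."""
    eps = 1e-9
    p_upper = 1.0 if k == n else brentq(lambda p: binom_cdf(k, n, p) - alpha / 2, eps, 1 - eps)
    p_lower = 0.0 if k == 0 else brentq(lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2, eps, 1 - eps)
    return p_lower, p_upper

print(tuple(round(v, 4) for v in exact_limits(k=3, n=20)))   # hypothetical sample outcome
```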

2.6.2 Hypothesis Testing:

Sometimes, the situation requires that a statement about the parameter in question be statistically "verified" as acceptable. The statement is usually called a hypothesis and the decision-making procedure is called hypothesis testing. In a statistical test of a hypothesis, the procedure depends on the statement of two opposing hypotheses, the null hypothesis H0 and the alternative hypothesis H1. The hypotheses are statements about the population or distribution of interest and not statements about the sample. The hypothesis testing procedures use the sample information to make a statistical decision about the population hypothesis. If the information from the sample is consistent with the statement of the hypothesis, then it is concluded that the hypothesis is true. If the information from the sample is inconsistent with the hypothesis, then the hypothesis is judged false. One method of testing hypotheses can easily be based on confidence interval estimation.

The decision to accept or reject the hypothesis is based on a test statistic computed from the data. Two regions are determined such that when the test statistic is calculated to be in one region, the hypothesis H0 is rejected, and when the test statistic is in the other region, H0 is not rejected. The first region is called the rejection region and is determined so that the probability of the test statistic being in the rejection region when H0 is true is α, a value denoted as the significance level of the test.

When a decision is made using the sample data, it is possible to make an error in this decision. There are two kinds of error that can be made when testing hypotheses: (1) if the null hypothesis is rejected when it is true, an error, called a Type I error, is made, with probability α; (2) if the alternative hypothesis is rejected when it is true, an error, called a Type II error, is made, with probability β.

The methods for constructing confidence intervals and tests of hypotheses in reliability situations will be discussed in later chapters, and more general methods can be found in many statistical texts; see, for example, Hines and Montgomery (1990).

2.6.3 Tolerance Limits:

In many applications of statistics it is useful to compare the data on an item to a set of specifications to determine how many of the items satisfy the specifications. Often these specifications are called tolerance limits and are determined by the requirements on the item. Sometimes, tolerance limits are limits computed from the sample data which have a certain proportion of the population between them with a certain probability.

If the distribution is known and the values of its parameters are specified, then the computation of tolerance limits is straightforward from the distribution function, and it is unnecessary to add a probability statement because no sample data are involved.

If the distribution and/or the parameters of the distribution are unknown, then the tolerance limits must be computed using some sample data, and it is possible to compute tolerance limits such that, with a certain probability or confidence, they contain a certain proportion of the population.

Tolerance limits for known distributions are based on estimates of the parameters of the distribution and are tabled for several important distributions. Non-parametric tolerance limits can also be constructed, based on sample order statistics. The non-parametric limits are efficient and useful in many situations.


EXAMPLE: Consider a normal distribution with known standard deviation σ, that is, X ∼ N(µ, σ). In this case, the sample mean based on a sample of n items is such that X̄ ∼ N(µ, σ/√n). The tolerance limits are the bounds X̄ ± kσ, where k is such that:
\[
P\{F_X(\bar{X} + k\sigma) - F_X(\bar{X} - k\sigma) \ge 1 - \alpha\} \ge 1 - \beta
\]
or, in words, the probability is greater than or equal to 1 − β that the area between X̄ ± kσ is greater than or equal to 1 − α. Since P{X̄ − kσ ≤ X ≤ X̄ + kσ} ≥ 1 − α will be true whenever X̄ − kσ ≤ µ − z_{α/2}σ and X̄ + kσ ≥ µ + z_{α/2}σ (where z_{α/2} is the value of the standard normal (see 2.2.6) for which there is area α/2 above), it follows that this will occur whenever
\[
\left( z_{\alpha/2} - k \right)\sqrt{n} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le \left( k - z_{\alpha/2} \right)\sqrt{n}
\]
or whenever \(k = z_{\alpha/2} + \frac{1}{\sqrt{n}}\, z_{\beta/2}\). This value of k then provides tolerance limits for the distribution of X.

2.6.4 Prediction Intervals:

Another type of statistical limit that reliability practitioners find useful, in addition to confidence intervals and tolerance limits, is called a prediction interval. Confidence intervals give bounds on characteristics (parameters) of the population and tolerance limits give bounds on areas or proportions of the population enclosed in a region. Prediction intervals give bounds on the values of the next k observations from the same population; that is, a prediction interval is an interval formed from two statistics, L(X1, X2, . . . , Xn) and U(X1, X2, . . . , Xn), from a random sample from the population under consideration, such that the interval (L, U) contains the next k observations from that population with probability γ. Note that the prediction interval does not mean that the interval (L, U) determined from a first sample of size n will contain 100γ% of a large number of second samples of size k. It does mean: if pairs of samples of sizes n and k are drawn repeatedly, and an interval (L, U) is computed from each of the samples of size n of a given pair, then 100γ% of the samples of size k will fall in the interval corresponding to that pair. Prediction intervals will be presented for specific situations later in this text. At this point, only an example of their possible use in reliability will be presented. Suppose one has n observations on the time to failure of a particular device and there is interest in buying k more of these devices. An important question is: how large should k be so that there is a probability γ that at least m (m ≤ k) of these devices will operate t₀ units of time or longer? Prediction intervals can effectively answer that question. See Englehardt and Bain (1978), Hall and Prairie (1971) and Hall and Prairie (1973).

EXAMPLE: Suppose one has a sample of the failure times of 10 devices which can be assumed to follow a normal distribution, and k more of the devices are to be put into use. How large should k be so that, with probability 0.95, at least m (m ≤ k) of the devices will survive past x̄_10 + r_m s_10, where x̄_10 and s_10 are the sample mean and standard deviation from the previous sample of n = 10? From Hall and Prairie (1973), r_m = r_10 = 1.0 when k = 16 and m = 10.

2.7 Goodness of Fit tests

We review two general tests and then two specific tests for goodness of fit (for the exponential and for the Weibull). The general nature of a goodness-of-fit test is to set up hypotheses such that the null hypothesis reflects that the data is a representative sample from the distribution under consideration. E.g.,
H0: the data represent a sample from the normal distribution (with or without µ and σ given)
H1: the data do not represent a sample from the normal distribution (with or without µ and σ given).
The test is conducted at a level of risk α, representing the probability of falsely rejecting H0.

2.7.1 Chi-square Goodness-of-Fit Test

The most popular of all goodness-of-fit tests, the chi-square test requires a large number of data points, say at least 50 and preferably at least 100. We categorize the data and note the observed and expected frequencies in each cell or category. For the continuous case, the data might be categorized as:


CELL              OBSERVED   EXPECTED
a1 ≤ x ≤ a2       O1         E1
a2 ≤ x ≤ a3       O2         E2
...               ...        ...
an ≤ x ≤ an+1     On         En

Calculate
\[
\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}
\]
and compare it with \(\chi^2_{\alpha,\nu}\), where the degrees of freedom ν depend on the number of parameters estimated from the data. For example, if the null hypothesis is about the normal distribution, µ and σ are estimated from the data and 12 cells are used, then the degrees of freedom are 12 − 1 − 2 = 9. A rule of thumb indicates that if any cells have expected values less than 5, they should be combined with adjoining cells until the expectation is greater than 5. In this case, the number of cells, n, is reduced and the new n should be used in determining the degrees of freedom. Consider the following two sets of hypotheses:

SET 1: H0: data from a normal distribution; H1: data not from a normal distribution.
SET 2: H0: data from a normal distribution with (µ, σ); H1: data not from a normal distribution with (µ, σ).

Note that if H0 in Set 1 is rejected, we may conclude that the data is not from a normal distribution, but if H0 in Set 2 is rejected, we conclude only that the data is not from a normal distribution with parameters µ and σ, which implies that the data could still be from a distribution with different parameters, µ′ and/or σ′.

Example: Table 2.4 presents the following data, taken from a population considered to be exponential. Using the cell mid-points (with 3000 arbitrarily assigned to represent values beyond 2400), the mean is estimated as
\[
\hat{\theta} = \frac{100(17) + 300(14) + \cdots + 3000(8)}{100} = 960.
\]
Expected values are now calculated using the cumulative exponential, F(t) = 1 − e^{−t/960}.


CELL         OBSERVED
0-200          17
200-400        14
400-600        16
600-800        12
800-1000        4
1000-1200       6
1200-1400       6
1400-1600       4
1600-1800       4
1800-2000       2
2000-2200       5
2200-2400       2
> 2400          5

Table 2.4: Data for Chi-Square Goodness-of-Fit Example

CELL         OBSERVED   EXPECTED
0-200          17        18.81
200-400        14        15.27
400-600        16        12.39
600-800        12        10.07
800-1000        4         8.17
1000-1200       6         6.64
1200-1400       6         5.39
1400-1600       4         4.37
1600-1800       4         3.55
1800-2000       2         2.90
2000-2200       5         2.34
2200-2400       2         1.90
> 2400          5         8.20

After combining cells whose expected values are less than 5 (the combined table is shown below), the calculated chi-square value is 5.504. This is compared with the chi-square value for α = 0.05 with 10 − 1 − 1 = 8 degrees of freedom, \(\chi^2_{0.05,8} = 15.51\). Since 5.504 < 15.51 we cannot reject the null hypothesis, and we may conclude that the exponential distribution is appropriate for this data.


Combining cells with expectations less than 5 we have

CELL         OBSERVED   EXPECTED
0-200          17        18.81
200-400        14        15.27
400-600        16        12.39
600-800        12        10.07
800-1000        4         8.17
1000-1200       6         6.64
1200-1400       6         5.39
1400-1800       8         7.92
1800-2200       7         5.24
> 2200          7        10.10
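The statistic for the combined table can be recomputed directly; the sketch below simply reproduces the tabulated observed and expected counts and the comparison with χ²_{0.05,8} (SciPy is assumed to be available for the quantile).

```python
from scipy.stats import chi2

# Observed and expected counts from the combined table above
observed = [17, 14, 16, 12, 4, 6, 6, 8, 7, 7]
expected = [18.81, 15.27, 12.39, 10.07, 8.17, 6.64, 5.39, 7.92, 5.24, 10.10]

stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1 - 1                 # one parameter (the mean) was estimated
critical = chi2.ppf(0.95, df)              # chi-square_{0.05, 8} = 15.51
print(round(stat, 3), round(critical, 2), stat > critical)   # 5.504, 15.51, False
```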

2.7.2 Kolmogorov-Smirnov Test

This non-parametric goodness-of-fit test estimates the empirical distribution function at each ordered data point x_(i) by
\[
F_n(x_{(i)}) = \frac{i}{n}, \qquad i = 1, 2, \ldots, n.
\]
We let F₀(x_(i)) represent the value of the distribution function evaluated under the hypothesized density. For each i, we calculate |F_n(x_(i)) − F₀(x_(i))|. We then obtain the maximum of these absolute deviations and compare it with the critical K-S value, D_{n,α} (with the risk level α). Specifically, if D_n = max_i |F_n(x_(i)) − F₀(x_(i))| > D_{n,α}, then we may reject the null hypothesis that the data is a representative sample from the proposed distribution. The critical values D_{n,α} are given in the Appendix. Also, for n > 35, D_{n,0.05} can be approximated by 1.36/√n.

Example: Shown in Table 2.5 below are 10 computer-generated random numbers. Test the hypothesis that they are from a population uniform on the unit interval. Note that, for the uniform distribution, F₀(x_(i)) = x_(i). The last column of Table 2.5 is for use with the Lohrding test, described in the next section.

From Table 2.5, we compute D_n = max_i |F_n(x_(i)) − F₀(x_(i))| = 0.161. From Table 2.8, D_{10,0.05} = 0.41. Thus, we cannot reject the hypothesis that the data is a sample from a uniform (0, 1) distribution.
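The D_n of the example is a one-line computation; the sketch below repeats it for the ten random numbers of Table 2.5, using F₀(x) = x for the uniform (0, 1) hypothesis.

```python
data = [0.394, 0.639, 0.748, 0.330, 0.539, 0.984, 0.620, 0.240, 0.487, 0.363]
n = len(data)
x = sorted(data)                                   # order statistics x_(i)

# D_n = max_i |F_n(x_(i)) - F_0(x_(i))| with F_n(x_(i)) = i/n and F_0(x) = x
D = max(abs((i + 1) / n - xi) for i, xi in enumerate(x))
print(round(D, 3))    # 0.161, to be compared with D_{10,0.05} = 0.41
```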


Data    Order No.   x(i)    Fn(x(i))   F0(x(i))   |Fn − F0|   dk
0.394   1           0.240   0.1        0.240      0.140       1.79
0.639   2           0.330   0.2        0.330      0.130       1.33
0.748   3           0.363   0.3        0.363      0.063       0.698
0.330   4           0.394   0.4        0.394      0.006       0.218
0.539   5           0.487   0.5        0.487      0.013       0.223
0.984   6           0.539   0.6        0.539      0.061       0.042
0.620   7           0.620   0.7        0.620      0.080       0.118
0.240   8           0.639   0.8        0.639      0.161       0.684
0.487   9           0.748   0.9        0.748      0.152       0.631
0.363   10          0.984   1.0        0.984      0.016       1.078

Table 2.5: Random numbers and their use in K-S test and Lohrding's Test

2.7.3 The Lohrding Test

A generally more powerful extension of the K-S goodness of fit test hasbeen developed by Lohrding (1973). The Lohrding test, whose power wasexamined using an extensive simulation study, is based on a statistic impliedby Pyke (1959) and used by Maag and Dicaire (1971). Three versions of thestatistic were considered and the most powerful generally is T =

∑nk=1

dkn,

where

dk =|Fn(x(k))− k

n+1|

√(n−k+1)k

(n+1)2(n+2)

Critical values for T are given by T (α) = P1 − P2(n + P3)P4 , where for

α = 0.10, 0.05, 0.01, the P values are presented in Table 2.6.

Example: Consider the data in Table 2.5. Recall that the data represent 10 random numbers. Test the hypothesis that they represent the uniform distribution on (0, 1). Note that for the uniform distribution, F₀(x_(i)) = x_(i).

Using the Lohrding test with the data in Table 2.5, the average of the d_k's (last column of the table) is T = 0.6813 and the critical value is T(0.05) = 1.47. This results in the same conclusion of "not reject" that was reached using the K-S test with this example.
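The d_k values and T are also straightforward to compute; the sketch below follows the definition of d_k given above (with F₀ evaluated at the order statistics) for the same ten random numbers and leads to the same "not reject" conclusion.

```python
from math import sqrt

data = [0.394, 0.639, 0.748, 0.330, 0.539, 0.984, 0.620, 0.240, 0.487, 0.363]
n = len(data)
x = sorted(data)                       # order statistics x_(k)
F0 = x                                 # uniform (0,1) hypothesis: F_0(x) = x

d = [abs(F0[k - 1] - k / (n + 1)) / sqrt((n - k + 1) * k / ((n + 1) ** 2 * (n + 2)))
     for k in range(1, n + 1)]
T = sum(d) / n
print([round(v, 2) for v in d])
print(round(T, 2))                     # well below the tabled critical value T(0.05) = 1.47
```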

Also given in Lohrding (1973) are 100(1 − α)% confidence bounds on the cdf F that are comparable to the corresponding K-S bounds but which, for large sample sizes, are generally better, that is, narrower in the tails of the distribution than the K-S bounds. The Lohrding bounds are:
\[
\text{Lower } 100(1-\alpha)\% \text{ bound: } \quad \frac{k}{n+1} - B(\alpha)\sqrt{\frac{(n-k+1)\,k}{(n+1)^2(n+2)}}
\]
\[
\text{Upper } 100(1-\alpha)\% \text{ bound: } \quad \frac{k}{n+1} + B(\alpha)\sqrt{\frac{(n-k+1)\,k}{(n+1)^2(n+2)}}
\]
where B(α) = P1 + P2(n + P3)^{P4} and the P values are given in Table 2.6.

α      P1     P2     P3      P4
0.10   1.23   6.48   3.18    -2.05
0.05   1.42   2.13   2.26    -1.54
0.01   1.84   0.17   -4.02   -0.753

Table 2.6: Values of P used to compute the critical values of T(α)

For the above example, the 95% confidence bounds on F for both the K-S and the Lohrding techniques are given in Table 2.7. Note that the confidence width for the Lohrding technique is smaller than that of the K-S.

2.8 References

Advisory Group on Reliability of Electronic Equipment (AGREE) (1957), "Reliability of Military Electronic Equipment", Task Group 9 Report, Washington, DC, US Government Printing Office, June.

Englehardt, M. and Bain, L. J. (1978), "Prediction Intervals for the Weibull Process," Technometrics, 20, pp. 167-169.

Hahn, G. J. and Nelson, W. B. (1973), "A Survey of Prediction Intervals and Their Applications," Journal of Quality Technology, 5, pp. 178-188.


Data interval   LL      CL      UL      LK-S   CK-S   UK-S
0-0.240         0       0       0.245   0      0      0.41
0.240-0.33      0       0.0909  0.418   0      0.1    0.51
0.33-0.363      0       0.1818  0.562   0      0.2    0.61
0.363-0.394     0       0.2727  0.683   0      0.3    0.71
0.394-0.487     0       0.3636  0.785   0      0.4    0.81
0.487-0.539     0.033   0.4545  0.876   0.09   0.5    0.91
0.539-0.620     0.124   0.5454  0.955   0.19   0.6    1
0.620-0.639     0.226   0.6364  1       0.29   0.7    1
0.639-0.748     0.347   0.7273  1       0.39   0.8    1
0.748-0.984     0.491   0.8182  1       0.49   0.9    1
0.984-1         0.664   0.9091  1       0.59   1      1

Table 2.7: Examples of K-S (LK-S and UK-S) and Lohrding (LL and UL) 95% Confidence Bounds

Hall, I. J. and Prairie, R. R. (1971), "Prediction Intervals in Reliability Work", Sandia Laboratories Report, SC-DR-70-833, Albuquerque, NM.

Hall, I. J. and Prairie, R. R. (1973), "One-Sided Prediction Intervals to Contain at Least m Out of k Future Observations", Technometrics, 15, No. 4, pp. 897-914.

Hines, William W. and Montgomery, Douglas C. (1990), Probability and Statistics in Engineering and Management Science, John Wiley & Sons, New York.

Lindley, D. V. (1969), Introduction to Probability and Statistics From a Bayesian Viewpoint, Part I: Probability, Cambridge University Press, Cambridge, England.

Lohrding, Ronald K. (1973), "Three Kolmogorov-Smirnov Type One-Sample Tests with Improved Power Properties", Journal of Stat. Computing and Simulation, Vol. 2, pp. 139-18.

Maag, U. R. and Dicaire, G. (1971), "On Kolmogorov-Smirnov Type One-Sample Statistics", Biometrika, Vol. 54, pp. 653-656.


Owen, D. B. (1962), Handbook of Statistical Tables, Addison-Wesley Publishing Company, Inc., Reading, MA.

Pyke, R. (1959), "The Supremum and Infimum of the Poisson Process", Annals of Math. Stat., Vol. 30, pp. 568-576.

Savage, L. J. (1954), The Foundations of Statistics, John Wiley & Sons, New York.

2.9 Problems for Chapter 2

Problem 2.1 Let a sample space be made up of 7 simple events: e1, e2, e3, e4, e5, e6, e7. Let P{e1} = P{e3} = P{e5} = P{e7} = 0.1, P{e2} = 0.2, P{e4} = 0.05, P{e6} = 0.35. Event A is made up of e1, e3; event B is made up of e1, e2, e5, e7; and event C is made up of e3, e4, e5, e6, e7. Determine:

a) P(A)  b) P(B)  c) P(A ∩ B)  d) P(A ∪ B)  e) P(C|A)  f) P(A^c)

g) whether A and C are independent  h) whether A and B are independent

i) whether there can be any other event D in the sample space that is mutually exclusive of B ∪ C.

Problem 2.2 A small lot of ten items contains 2 defective items. The experiment is: a random sample of 3 items is selected from the lot and the type of item (defective [D] or non-defective [N]) is determined. Case A: replacement sampling is used, that is, an item is selected, its defectiveness determined and the item is replaced in the lot before the next item is selected. Case B: the sampling is performed without replacement.
a) Write out the simple events that make up the sample space S.
b) Let A be the event that 2 non-defective items are selected in a row. Find P{A}.
c) Let B be the event that the 2 defective items are selected in a row. Find P{B}.


Problem 2.3 Why aren’t there many 5-engine airplanes? Well, probably for otherreasons, but consider the following problems. If at least half of a plane’sengines must be functioning for the plane to fly,a) are there any values of q for which a 3-engine plane is safer thana 4 engine plane, where q is the probability that an engine will befunctioning?b) for which q a 3-engine plane is safer than a 5-engine plane? Assumethat the engines function independently.

Problem 2.4 6 dice are tossed,a) what is the probability that they are all 2’s?b) what is the probability that every possible number appears?Answer a) and b) if 7 dice are tossed.

Problem 2.5 What is the probability of winning at craps? (One wins on first tosswith a 7 or 11. If one gets a 4, 5, 6, 8, 9, 10 on the first toss, one tossesagain until the first toss is repeated for a win or a 7 is tossed for a loss.)

Problem 2.6 What is the probability that in a group of 23 people, no two peoplewill have the same birthday?

Problem 2.7 The proportion of women who vote Republican is .45, while the pro-portion of men who vote Republican is .55. What is the probabilitythat a person chosen at random from the population is a Republican,if women make up 55% of the population?

Problem 2.8 A red die and a black die are tossed. What is the probability thatthe black die is a 2, if the sum of the numbers is known to be 5? Arethese events independent? What is the probability that the black dieis a 2, if the sum of the numbers is known to be 7? Are these eventsindependent?

Problem 2.9 An urn contains 10 identical balls, of which 2 are white and 8 are black.If 3 balls are drawn without replacement,a) what is the probability that 2 are black,b) what is the probability that at least 2 are black?c) Show that the answers to a) and b) can be obtained using the hy-pergeometric distribution.


Problem 2.10 The Arizona lottery has the game Lotto, for which one wins by matching 3, 4, 5, or all 6 numbers drawn in any order, where the numbers range from 1 to 42. The odds of winning the jackpot, that is, matching all 6 out of the 6 numbers drawn, are listed as 1:5,245,786. The second prize, matching any 5 out of the 6, has odds of 1:24,286.05. For matching any 3 out of 6, one wins $2, at odds of 1:36.74.
a) Compute the odds of winning the jackpot.
b) Verify the odds of winning the second prize.
c) Verify the odds of winning $2.
d) Which distribution can be used to compute the above odds? Redo the above using that distribution.

Problem 2.11 A consumer is deciding whether to buy a lot of N=200 items which the producer guarantees to contain D=2 or fewer defective items. If the lot contains as many as D=6 defective items, the consumer will not buy the lot.
a) Which distribution represents the distribution of the number of defective items in a sample of n items, where n > 20?
b) How large a sample must be taken, and which value of c must be used, so that:
P{accept lot | D = 2} ≥ 0.90
P{reject lot | D = 6} ≥ 0.90
where the lot is accepted if the observed number of defectives in the sample is less than or equal to c and rejected otherwise.
c) Could one do better, that is, save money with the same statistical requirements, if one sampled with replacement?

Problem 2.12 A production line has a constant probability p that an item from the line is defective. Assume that items are sampled from the line in an independent manner.
a) What distribution usually is chosen to represent the number of failures in a sample of n items?
b) What distribution usually is chosen to represent the number of items sampled until the first failure is found?
c) What distribution usually is chosen to represent the number of items sampled until the rth failure is found?
d) What distribution usually is chosen to represent the number of failures in a sample of n items, if the size of the sample is very large and p is small?
e) What distribution usually is chosen to represent the number of failures in a sample of n items, if the size of the sample is very large and p is not small?
f) What are the continuous analogues of the above distributions?

Problem 2.13 One is trying to assess if a large production line has an acceptable percent defective pa = 0.02, or an unacceptable percent defective pu = 0.06. Devise a sampling plan, based on the observed number of defectives c and the sample size n, so that:
P{X ≤ c | p = 0.02} ≥ 0.95
P{X ≤ c | p = 0.06} ≤ 0.05
where X represents the random variable of the number of defectives in the sample.

Problem 2.14 The Poisson distribution is an adequate approximation to the binomial if the number of trials n is large and the proportion p is small. Repeat Problem 2.13 using the Poisson distribution and show that the results hold whenever the ratio pu/pa = 3 with the appropriate sample size.

Problem 2.15 A function f is defined by:
\[
f(x) =
\begin{cases}
kx, & 0 < x < 1 \\
\frac{k}{2}(3 - x), & 1 \le x \le 3 \\
0, & \text{elsewhere}
\end{cases}
\]
a) What is the value of k so that f is a density?
b) What is E(X), where X is the random variable whose density is f(x)?
c) What is V(X), the variance of X?
d) What is the median of X?
e) What is the coefficient of skewness of X?
f) What is the coefficient of kurtosis of X?

Problem 2.16 For the following data, draw a stem plot and a box plot. Compute the sample 5-number summary and the sample coefficients of skewness and kurtosis.
23.20, 17.33, 26.47, 32.66, 32.19, 35.40, 11.90, 23.59, 31.57, 18.48, 20.86, 14.86, 13.92, 19.13, 20.36, 12.29, 21.59, 22.58, 25.81, 22.81, 23.04, 22.78, 33.06, 24.49

Problem 2.17 Apply the K-S and Lohrding tests for the goodness of fit of the data

4.8 8.0 13.0 46.6 10.1 6.6 22.6 4.5

3.9 17.4 6.3 5.4 5.2 9.2 9.1

to the exponential distribution with the mean θ = 10. Compare the results to the chi-square test.

Page 70: Chap2

90 CHAPTER 2. PROBABILITY AND STATISTICS REVIEW

n      α = .20   .15       .10       .05       .01
1      0.900     0.925     0.950     0.975     0.995
2      0.684     0.726     0.776     0.842     0.929
3      0.565     0.597     0.642     0.708     0.828
4      0.494     0.525     0.564     0.624     0.733
5      0.446     0.474     0.510     0.565     0.669
6      0.410     0.436     0.470     0.521     0.618
7      0.381     0.405     0.438     0.486     0.577
8      0.358     0.381     0.411     0.457     0.543
9      0.339     0.360     0.388     0.432     0.514
10     0.322     0.342     0.368     0.410     0.490
11     0.307     0.326     0.352     0.391     0.468
12     0.295     0.313     0.338     0.375     0.450
13     0.284     0.302     0.325     0.361     0.433
14     0.274     0.292     0.314     0.349     0.418
15     0.266     0.283     0.304     0.338     0.404
16     0.258     0.274     0.295     0.328     0.392
17     0.250     0.266     0.286     0.318     0.381
18     0.244     0.259     0.278     0.309     0.371
19     0.237     0.252     0.272     0.301     0.363
20     0.231     0.246     0.264     0.294     0.356
25     0.210     0.220     0.240     0.270     0.320
30     0.190     0.200     0.220     0.240     0.290
35     0.180     0.190     0.210     0.230     0.270
> 35   1.07/√n   1.14/√n   1.22/√n   1.36/√n   1.63/√n

Table 2.8: Critical values of K-S test