lecture4 mech su

8/12/2019 Lecture4 Mech SU

1/17

Lecture 4: Probability and Statistics

Goals of the lecture

Finding a representative value that best characterizes the average of the data set.

Finding a representative value that provides a measure of the variation in the measured data set.

Establishing an interval about the representative average value in which the true value is expected to lie.


2/17

Introduction to Uncertainty and Estimation of Precision

Uncertainty

Two classes of experiments exist:

Single-sample experiment: the measurement is taken exactly once.

Repeat-sample experiment: the same measurement is taken several times, under identical conditions.

Repeat-sampling allows an estimate of the measurement to be made via statistical methods.

Total uncertainty $U_x$ in a measurement of x is calculated from bias and precision uncertainties:

Given $B_x$ = bias uncertainty; $P_x$ = precision uncertainty

Assume sources of bias and precision error are independent.

Total uncertainty $U_x = \sqrt{B_x^2 + P_x^2}$, where $U_x$, $B_x$ and $P_x$ are all at the same odds (coverage, confidence).
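As a small numerical illustration, the sketch below combines a bias and a precision uncertainty, assuming the usual root-sum-square rule for independent error sources. The function name `total_uncertainty` and the example values are hypothetical.

```python
import math

def total_uncertainty(bias, precision):
    """Root-sum-square combination of bias and precision uncertainty.

    Assumes the two sources are independent and that both inputs are
    quoted at the same confidence level.
    """
    return math.hypot(bias, precision)

# Hypothetical values, in the units of the measured quantity x.
u_x = total_uncertainty(0.3, 0.4)
print(f"U_x = {u_x:.3f}")
```

Note that the larger of the two contributions dominates the result, so reducing the smaller one has little effect on the total.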


3/17

Error Distribution: Characterizes the probability that an error of a

given size will occur during repeat-sample experiments.

Probability: an expression of the likelihood of a particular event

taking place, measured with reference to all possible events.

The probability density function (PDF) for the entire population of possible precision error values is generally assumed to be Gaussian (normal, bell-shaped).

Since total precision error is random, each individual

measurement in the sample will have a distinct error whose

likelihood of occurrence (roughly) decreases with size


4/17

Normal Distribution Function (Gaussian Function)

For a normal distribution, if we know the mean and standard deviation, we can estimate

the probability that a single measurement will lie within a band around the mean:
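The band probabilities can be evaluated directly from the error function. This is a minimal sketch, assuming the band is symmetric and expressed as k standard deviations on either side of the mean; `prob_within` is a hypothetical helper name.

```python
import math

def prob_within(k):
    """P(|x - mu| <= k * sigma) for a normally distributed variable."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"+/- {k} sigma: {100.0 * prob_within(k):.2f} %")
```

The results are roughly 68%, 95% and 99.7% for one, two and three standard deviations respectively.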


5/17

Probability density function: In simple terms, a probability density function (PDF) is constructed by drawing a smooth curve fit through the vertically normalized histogram.

Histograms:

A histogram is constructed by dividing the n measurements of a sample into J bins or intervals (also called classes) such that for the first bin (j = 1), x1


6/17

The histogram can be modified by dividing the vertical axis by the total number of measurements, n. The resulting probability histogram has the same shape, but the vertical axis represents a relative frequency or probability, i.e.,

$n_j / n$

where $n_j$ is the number of measurements in bin j.

We can also define a vertically normalized histogram by further dividing the vertical axis by the bin width or interval width, $\Delta x$. The vertical axis of the vertically normalized histogram is defined as:

$n_j / (n \, \Delta x)$
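The two normalizations can be sketched in a few lines. The code below assumes equal-width bins starting at `x_min`; the function name, the bin layout and the data values are all hypothetical and only illustrate the arithmetic.

```python
def normalized_histogram(data, x_min, bin_width, n_bins):
    """Return bin counts, relative frequencies and vertically normalized values."""
    n = len(data)
    counts = [0] * n_bins
    for x in data:
        j = int((x - x_min) / bin_width)
        # Values outside the covered range are clamped into the end bins.
        j = min(max(j, 0), n_bins - 1)
        counts[j] += 1
    rel_freq = [c / n for c in counts]          # n_j / n
    density = [f / bin_width for f in rel_freq]  # n_j / (n * bin width)
    return counts, rel_freq, density

# Hypothetical sample of 8 measurements.
data = [0.1, 0.2, 0.6, 0.7, 0.8, 1.4, 1.6, 1.9]
counts, rel_freq, density = normalized_histogram(data, 0.0, 0.5, 4)
print(counts, rel_freq, density)
```

The relative frequencies sum to one, while the vertically normalized values enclose unit area, which is what allows the histogram to be compared with a PDF.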


7/17

Using a histogram to display this data, we need to choose K small intervals (bins) for the histogram.

For small N, the number of measurement results in at least one bin should be >= 5.

For intermediate values of N:

$K = 1.87 (N - 1)^{0.40} + 1$

Example with N = 20:

$K = 1.87 (20 - 1)^{0.40} + 1 = 7.1$

Minimum Value = 0.68, Maximum Value = 1.34

Therefore a bin width $\Delta x$ of 0.10 is chosen.

This histogram is an estimate of the data set probability density function.

$n_j$ is the number of samples in each bin. p(x) defines the probability that a measured variable might assume any particular value upon any individual measurement.
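The bin-count arithmetic above can be checked with a short script. The helper name `number_of_bins` is hypothetical; the formula and the numbers (N = 20, minimum 0.68, maximum 1.34) are the ones quoted on the slide.

```python
def number_of_bins(n):
    """Suggested number of histogram bins for an intermediate sample size n."""
    return 1.87 * (n - 1) ** 0.40 + 1

k = number_of_bins(20)
span = 1.34 - 0.68
raw_width = span / round(k)  # width if the data range is split into round(K) bins

print(f"K = {k:.1f}")
print(f"range / K = {raw_width:.3f}")
```

The raw width comes out slightly below 0.10, so rounding up to a convenient bin width of 0.10 covers the full data range with about seven bins.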


8/17

The Mean Value (Central Tendency) and Standard Deviation

Continuous Random Variable

Mean Value Variance

Discrete Random Variable

Mean Value Variance
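For reference, the textbook definitions in terms of a probability density $p(x)$ (continuous case) or probabilities $P(x_j)$ (discrete case) are sketched below. The symbols $\mu$ and $\sigma^2$ are the conventional notation and are an assumption here; an equivalent time-average or limit form may be preferred in some treatments.

```latex
% Continuous random variable with probability density p(x)
\mu = \int_{-\infty}^{\infty} x \, p(x) \, dx, \qquad
\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 \, p(x) \, dx

% Discrete random variable taking values x_j with probability P(x_j)
\mu = \sum_{j} x_j \, P(x_j), \qquad
\sigma^2 = \sum_{j} (x_j - \mu)^2 \, P(x_j)
```

In both cases the standard deviation is $\sigma = \sqrt{\sigma^2}$.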


9/17

Finite Statistics

Unless we have made a very large number of measurements, we don't have an accurate estimate of the mean or standard deviation of a data set. If we assume the values are normally distributed, we can estimate the mean and standard deviation from the data.

The sample mean and sample variance are given by:

$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$

and

$S_x^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2$

How close are these values to the true mean

and standard deviation? That depends on

how many samples we have.
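A minimal sketch of the two finite-sample estimates follows, assuming the usual N - 1 denominator for the sample variance. The function names and the data values are hypothetical.

```python
import math

def sample_mean(xs):
    """Arithmetic mean of the N measurements."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Sample variance with N - 1 in the denominator."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Hypothetical data set of N = 8 measurements.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean = sample_mean(data)
variance = sample_variance(data)
std_dev = math.sqrt(variance)
print(mean, variance, std_dev)
```

The N - 1 denominator reflects that one degree of freedom is used up by estimating the mean from the same data.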

For a normally distributed data set, we can say that the probability of a sample, $x_i$, differing from the data set mean value, $\bar{x}$, is

given by x


10/17

Standard Deviation of the Means

If we take a set of N measurements of the same variable, then repeat this process M times, the mean of each data set will differ somewhat from the others. It can be shown that, for sufficiently large N, the mean values themselves will approximately follow a normal distribution even if the original distribution is not normal.

The standard deviation of the means is given by:

$S_{\bar{x}} = S_x / \sqrt{N}$

The mean of a sample of N values differs from the true mean of the distribution by an amount that scales with $S_{\bar{x}}$.
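The behaviour of the sample means can be checked with a small simulation. This sketch assumes a uniform (deliberately non-normal) parent distribution on [0, 1), whose standard deviation is sqrt(1/12); the values of N, M and the random seed are arbitrary choices.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable
N, M = 25, 2000         # N measurements per set, repeated M times

means = []
for _ in range(M):
    sample = [rng.random() for _ in range(N)]
    means.append(statistics.fmean(sample))

observed = statistics.stdev(means)  # scatter of the M sample means
sigma = math.sqrt(1.0 / 12.0)       # standard deviation of the parent distribution
predicted = sigma / math.sqrt(N)    # standard deviation of the means

print(f"observed  = {observed:.4f}")
print(f"predicted = {predicted:.4f}")
```

The two numbers should agree to within a few percent, and a histogram of `means` would look close to bell-shaped even though the parent distribution is flat.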


11/17

Normalized probability density function

A normalized probability density function is constructed by transforming both the abscissa (horizontal axis) and ordinate (vertical axis) of the PDF plot as follows:

$z = \frac{x - \mu}{\sigma}$ and $f(z) = \sigma f(x)$

The above transformations accomplish two things:

The first transformation normalizes the abscissa such that the PDF is centered around z = 0.

The second transformation normalizes the ordinate such that the PDF is spread out in similar fashion regardless of the value of standard deviation.

The Gaussian or Normal probability density function

It is symmetric about the mean.

The mean, median, and mode are all equal to $\mu$, the expected value (at the peak of the distribution).

Its plot is commonly called a bell curve because of its shape.

The actual shape depends on the magnitude of the standard deviation. Namely, if $\sigma$ is small, the bell will be tall and skinny, while if $\sigma$ is large, the bell will be short and fat, as sketched.
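The effect of the standard deviation on the bell shape, and the normalization to the variable z, can be illustrated numerically. This sketch assumes the standard Gaussian form with mean `mu` and standard deviation `sigma`; the function names and sample values are hypothetical.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Gaussian probability density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def standard_pdf(z):
    """Normalized Gaussian density (mean 0, standard deviation 1)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

mu, sigma, x = 10.0, 2.0, 13.0
z = (x - mu) / sigma                   # normalized abscissa
scaled = sigma * gaussian_pdf(x, mu, sigma)  # normalized ordinate

print(z, scaled, standard_pdf(z))

# Peak height for a narrow and a wide bell centered on the same mean.
tall = gaussian_pdf(mu, mu, 0.5)
short = gaussian_pdf(mu, mu, 2.0)
print(tall, short)
```

After the transformation every Gaussian collapses onto the same curve, which is why a single table of z values serves for any mean and standard deviation.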


12/17

Confidence level is defined as the probability that a random variable lies within a specified range of values. The range of values itself is called the confidence interval. For example, as discussed above we are 95.44% confident that a purely random variable lies within two standard deviations from the mean.

Level of significance, $\alpha$, is defined as the probability that a random variable lies outside of a specified range of values. In the above example, we are 100 - 95.44 = 4.56% confident that a purely random variable lies either below or above two standard deviations from the mean. (We usually round this off to 5% for practical engineering statistical analysis.)
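Going the other way, from a chosen confidence level to the half-width of the interval in standard deviations, can be sketched with the standard library. This assumes a normally distributed variable and a symmetric two-sided interval; `two_sided_z` is a hypothetical helper name.

```python
from statistics import NormalDist

def two_sided_z(confidence):
    """Number of standard deviations enclosing the given two-sided confidence level."""
    alpha = 1.0 - confidence  # level of significance
    # Half of alpha lies in each tail of the distribution.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

z_95 = two_sided_z(0.95)
z_two_sigma = two_sided_z(0.9545)
print(f"95% confidence    -> +/- {z_95:.3f} sigma")
print(f"95.45% confidence -> +/- {z_two_sigma:.3f} sigma")
```

Rounding the level of significance to 5% therefore corresponds to an interval slightly narrower than two standard deviations.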


13/17

Regression Analysis

Regression analysis is used to find an equation for y as a function of x that provides the best fit to the data.

Typically, y is some measured output as a function of some known input, x.

Recall that the linear correlation coefficient is used to determine if there is a trend.

Linear regression analysis

Linear regression analysis is also called linear least-squares fit analysis.

The goal of linear regression analysis is to find the best-fit straight line through a set of y vs. x data.

The technique for deriving equations for this best-fit or least-squares fit line is as follows:

1. An equation for a straight line that attempts to fit the data pairs is chosen as Y = ax + b.

2. In the above equation, a is the slope (a = dy/dx; most of us are more familiar with the symbol m rather than a for the slope of a line), and b is the y-intercept, the y location where the line crosses the y axis (in other words, the value of Y at x = 0).


14/17

Linear regression analysis (Cont.)

3. An upper case Y is used for the fitted line to distinguish the fitted data from the actual data values, y.

4. In linear regression analysis, coefficients a and b are optimized for the best possible fit to the data.

5. The optimization process itself is actually very straightforward:

6. For each data pair ($x_i$, $y_i$), error $e_i$ is defined as the difference between the predicted or fitted value and the actual value: $e_i$ = error at data pair i, or:

$e_i = Y_i - y_i = a x_i + b - y_i$

$e_i$ is also called the residual. Note: Here, what we call the actual value does not necessarily mean the correct value, but rather the value of the actual measured data point.

7. We define E as the sum of the squared errors of the fit, a global measure of the error associated with all n data points. The equation for E is:

$E = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (a x_i + b - y_i)^2$


15/17

8. It is now assumed that the best fit is the one for which E is the smallest.

9. In other words, coefficients a and b that minimize E need to be found. These coefficients will be the ones that create the best-fit straight line Y = ax + b.

10. How can a and b be found such that E is minimized? Well, as any good engineer or mathematician knows, to find a minimum (or maximum) of a quantity, that quantity is differentiated, and the derivative is set to zero.

11. Here, two partial derivatives are required, since E is a function of two variables, a and b. Therefore, we obtain:

$\frac{\partial E}{\partial a} = \sum_{i=1}^{n} 2 (a x_i + b - y_i) x_i = 0$ and $\frac{\partial E}{\partial b} = \sum_{i=1}^{n} 2 (a x_i + b - y_i) = 0$
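Setting both partial derivatives to zero gives two linear equations in a and b, which can be solved in closed form. The sketch below is one way to do that, assuming the residual is defined as fitted value minus measured value; the function names and the data pairs are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope a and intercept b for the line Y = a*x + b."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("all x values are identical; slope is undefined")
    a = (n * sxy - sx * sy) / denom
    b = (sy - a * sx) / n
    return a, b

def sum_squared_error(xs, ys, a, b):
    """E: the sum of squared residuals for a candidate line."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys))

# Hypothetical data pairs.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]
a, b = fit_line(xs, ys)
E = sum_squared_error(xs, ys, a, b)
print(a, b, E)
```

Any other choice of slope or intercept gives a larger E, which can be checked by perturbing `a` or `b` slightly and re-evaluating `sum_squared_error`.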



16/17


Correlation coefficient:

In engineering analysis, we often want to fit a trend line or curve to a set of x-y data.

Consider a set of n measurements of some variable y as a function of another variable x.

Typically, y is some measured output as a function of some known input, x.

In general, in such a set of measurements, there may be:

Some scatter (precision error or random error).

A trend: in spite of the scatter, y may show an overall increase with x, or perhaps an overall decrease with x.

The linear correlation coefficient is used to determine if there is a trend.

If there is a trend, regression analysis is used to find an equation for y as a function of x which provides the best fit to the data.

The linear correlation coefficient $r_{xy}$ is defined as:

$r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}$
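A minimal sketch of the calculation follows, assuming the standard (Pearson) form of the linear correlation coefficient. The function name and the data are hypothetical.

```python
import math

def linear_correlation(xs, ys):
    """Linear correlation coefficient of paired data (between -1 and 1)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1.0, 2.0, 3.0, 4.0]
r_up = linear_correlation(xs, [2.0, 4.0, 6.0, 8.0])    # perfect increasing trend
r_down = linear_correlation(xs, [8.0, 6.0, 4.0, 2.0])  # perfect decreasing trend
r_flat = linear_correlation(xs, [1.0, -1.0, -1.0, 1.0])  # no linear trend
print(r_up, r_down, r_flat)
```

Values near +1 or -1 indicate a strong increasing or decreasing linear trend, while values near 0 indicate no linear trend (a nonlinear relationship may still exist).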


17/17

Data Outlier Detection

How do you handle spurious data points?

The most common and simplest approach is to label points that lie outside the range of 99.8% probability of occurrence as outliers. This three-sigma test works well with data sets of 10 or more points.
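One way to sketch the three-sigma test is shown below. It assumes the acceptance range is the sample mean plus or minus three sample standard deviations, which is an interpretation of the test name rather than a formula taken from the slide; the function name and data are hypothetical.

```python
import statistics

def three_sigma_outliers(data):
    """Return points farther than 3 sample standard deviations from the sample mean."""
    mean = statistics.fmean(data)
    s = statistics.stdev(data)  # sample standard deviation (N - 1 denominator)
    return [x for x in data if abs(x - mean) > 3.0 * s]

# Hypothetical data: 20 readings near 1.0 and one spurious reading.
data = [0.9, 1.1] * 10 + [5.0]
outliers = three_sigma_outliers(data)
print(outliers)
```

Because the suspect point is included when the mean and standard deviation are computed, it inflates both, so flagged points should be reviewed rather than discarded automatically.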

Number of Measurements Required

Some sample statistics must be known to estimate the variation in the data set and therefore estimate a confidence interval in the data yet to be acquired.
