lecture4 mech su
Post on 03-Jun-2018
-
8/12/2019 Lecture4 Mech SU
Lecture 4: Probability and Statistics
Goals of the lecture
Finding a representative value that best characterizes the average of the data set.
Finding a representative value that provides a measure of the variation in the measured data set.
Establishing an interval about the representative average value in which the true value is expected to lie.
-
Introduction to Uncertainty and Estimation of Precision
Uncertainty
Two classes of experiments exist:
Single-sample experiment: the measurement is taken exactly once.
Repeat-sample experiment: the same measurement is taken several times, under identical conditions.
Repeat sampling allows an estimate of the measurement, and of its precision (random) uncertainty, to be made via statistical methods.
The total uncertainty $U_x$ in a measurement of x is calculated from the bias and precision uncertainties,
given $B_x$ = bias uncertainty and $P_x$ = precision uncertainty.
Assuming the sources of bias and precision error are independent, the total uncertainty is
$$U_x = \sqrt{B_x^2 + P_x^2}$$
where $U_x$, $B_x$ and $P_x$ are all at the same odds (coverage, confidence).
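The root-sum-square combination above can be sketched in a few lines of Python. This is a minimal sketch, assuming $B_x$ and $P_x$ have already been stated at the same confidence level; the function name `total_uncertainty` and the example values are hypothetical.

```python
import math

def total_uncertainty(bias: float, precision: float) -> float:
    """Combine independent bias (B_x) and precision (P_x) uncertainties
    by root-sum-square: U_x = sqrt(B_x**2 + P_x**2).
    Both inputs must be stated at the same odds (e.g. 95%)."""
    return math.hypot(bias, precision)

# Illustrative values: B_x = 0.3 units, P_x = 0.4 units at the same confidence level
u_x = total_uncertainty(0.3, 0.4)
print(f"U_x = {u_x:.2f}")
```

`math.hypot` is used instead of an explicit square root because it avoids overflow and underflow for very large or very small inputs.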
-
Error Distribution: Characterizes the probability that an error of a
given size will occur during repeat-sample experiments.
Probability: an expression of the likelihood of a particular event
taking place, measured with reference to all possible events.
The probability density function (PDF) for the entire population of possible precision error values is generally assumed to be Gaussian (normal, bell-shaped).
Since total precision error is random, each individual measurement in the sample will have a distinct error whose likelihood of occurrence (roughly) decreases with its size.
-
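Normal Distribution Function (Gaussian Function)
$$p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$$
For a normal distribution, if we know the mean $\mu$ and standard deviation $\sigma$, we can estimate the probability that a single measurement will lie within a band of $\pm z\sigma$ around the mean:
$$P(\mu - z\sigma \le x \le \mu + z\sigma) = \int_{\mu - z\sigma}^{\mu + z\sigma} p(x)\,dx = \operatorname{erf}\left(\frac{z}{\sqrt{2}}\right)$$
For z = 1, 2 and 3 this is approximately 68.3%, 95.4% and 99.7%.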
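The band probability can be evaluated directly with the error function. A minimal sketch; the helper name `prob_within_band` is hypothetical, and it relies only on `math.erf` from the Python standard library.

```python
import math

def prob_within_band(z: float) -> float:
    """Probability that a normally distributed value lies within
    mu - z*sigma <= x <= mu + z*sigma, i.e. erf(z / sqrt(2))."""
    return math.erf(z / math.sqrt(2.0))

for z in (1, 2, 3):
    print(f"+/-{z} sigma: {100 * prob_within_band(z):.2f}%")
```

The result does not depend on $\mu$ or $\sigma$, only on the band half-width z measured in standard deviations. Small differences from table-based figures such as 95.44% come from rounding in the tables.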
-
Probability density function
In simple terms, a probability density function (PDF) is constructed by drawing a smooth curve fit through the vertically normalized histogram.
Histograms:
A histogram is constructed by divvying up the n measurements of a sample into J bins or intervals (also called classes) such that for the first bin (j = 1), x1
-
The histogram can be modified by dividing the vertical axis by the total number of measurements, n. The resulting probability histogram has the same shape, but the vertical axis represents a relative frequency or probability, i.e.,
$$f_j = \frac{n_j}{n}$$
where $n_j$ is the number of measurements in bin j.
We can also define a vertically normalized histogram by further dividing the vertical axis by the bin width or interval width, $\delta x$. The vertical axis of the vertically normalized histogram is defined as:
$$f_j^{*} = \frac{n_j}{n\,\delta x}$$
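The three vertical-axis choices (raw counts, relative frequency, and vertically normalized) can be computed side by side. A minimal sketch, assuming equal-width bins that start at a chosen `x_min` and cover all the data; the function name `histogram_tables` and the sample values are hypothetical.

```python
import math

def histogram_tables(data, x_min, width, k):
    """Return (n_j, f_j, f_j*) for k equal-width bins starting at x_min.

    n_j  : number of measurements in bin j
    f_j  : n_j / n              (probability histogram)
    f_j* : n_j / (n * width)    (vertically normalized histogram)
    """
    n = len(data)
    counts = [0] * k
    for x in data:
        j = math.floor((x - x_min) / width)
        if j == k:                  # a value exactly on the top edge goes in the last bin
            j = k - 1
        if not 0 <= j < k:
            raise ValueError(f"{x} lies outside the {k} bins")
        counts[j] += 1
    probs = [c / n for c in counts]
    densities = [c / (n * width) for c in counts]
    return counts, probs, densities

counts, probs, dens = histogram_tables([0.1, 0.2, 0.25, 0.9], x_min=0.0, width=0.5, k=2)
print(counts, probs, dens)
```

Because each $f_j^{*}$ is multiplied back by $\delta x$ when summed, the area under the vertically normalized histogram is always 1, which is what lets a smooth curve through it serve as a PDF estimate.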
-
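To display this data as a histogram, we need to choose the number K of small intervals, one for each bin of the histogram.
For small N, K should be chosen so that at least one bin contains five or more measurement results ($n_j \ge 5$).
For intermediate values of N (here N = 20):
$$K = 1.87(N-1)^{0.40} + 1 = 1.87(20-1)^{0.40} + 1 \approx 7.1$$
Minimum value = 0.68, maximum value = 1.34
The range 1.34 - 0.68 = 0.66 divided into about 7 bins gives roughly 0.09 per bin, so a convenient bin width $\delta x$ of 0.10 is chosen.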
This histogram is an estimate of the data set probability density function.
$n_j$ is the number of samples in each bin. $p(x)$ describes the probability density of the measured variable, i.e., the likelihood that it will assume a value near any particular x upon any individual measurement.
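The bin-count rule and the bin-width choice in the example above can be sketched as follows. The rounding step (up to the next multiple of 0.1) is an assumption made to reproduce the 0.10 width chosen in the notes; the helper names are hypothetical.

```python
import math

def suggested_bin_count(n: int) -> float:
    """Empirical rule for intermediate N: K = 1.87 (N - 1)^0.40 + 1."""
    return 1.87 * (n - 1) ** 0.40 + 1

def suggested_bin_width(x_min: float, x_max: float, k: float, step: float = 0.1) -> float:
    """Divide the data range into about k bins, then round the width up to a
    convenient multiple of `step` (this rounding rule is an assumption)."""
    raw = (x_max - x_min) / k
    return math.ceil(raw / step) * step

k = suggested_bin_count(20)
width = suggested_bin_width(0.68, 1.34, k)
print(f"K = {k:.1f}, bin width = {width:.2f}")
```

Rounding the width up rather than down guarantees that K bins of that width still cover the full data range.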
-
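The Mean Value (Central Tendency) and Standard Deviation
Continuous random variable with probability density p(x):
Mean value: $$\mu = \int_{-\infty}^{\infty} x\,p(x)\,dx$$
Variance: $$\sigma^2 = \int_{-\infty}^{\infty} (x-\mu)^2\,p(x)\,dx$$
Discrete random variable (N measurements $x_i$):
Mean value: $$\mu = \lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N} x_i$$
Variance: $$\sigma^2 = \lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N} (x_i-\mu)^2$$
The standard deviation $\sigma$ is the positive square root of the variance.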
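Both pairs of definitions can be checked numerically. A minimal sketch, assuming a PDF that is zero outside a finite interval [lo, hi] (so the integrals can be truncated) and using the midpoint rule; the function names are hypothetical. The discrete version uses the 1/N (population) form, which is the idealized large-N definition; finite samples use the N - 1 form instead.

```python
def continuous_moments(pdf, lo, hi, steps=10_000):
    """Mean and variance of a continuous PDF on [lo, hi] via the midpoint rule."""
    dx = (hi - lo) / steps
    xs = [lo + (i + 0.5) * dx for i in range(steps)]
    mean = sum(x * pdf(x) for x in xs) * dx
    var = sum((x - mean) ** 2 * pdf(x) for x in xs) * dx
    return mean, var

def discrete_moments(values):
    """Population mean and variance (1/N form) of a list of values."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var

# Uniform PDF on [0, 2]: exact mean 1, exact variance 4/12 = 1/3
print(continuous_moments(lambda x: 0.5, 0.0, 2.0))
print(discrete_moments([1.0, 2.0, 3.0, 4.0]))
```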
-
Finite Statistics
Unless we have made a very large number of measurements, we don't have
an accurate estimate of the mean or standard deviation of a data set. If we assume the values are normally distributed, we can estimate the mean and
standard deviation from the data.
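The sample mean and sample variance are given by:
$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$$
and
$$S_x^2 = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2$$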
How close are these values to the true mean
and standard deviation? That depends on
how many samples we have.
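For a normally distributed data set, we can say that a sample $x_i$ differs from the data set mean value $\bar{x}$ by no more than $\pm t_{\nu,P}\,S_x$ with probability P:
$$x_i = \bar{x} \pm t_{\nu,P}\,S_x \quad (P\%)$$
where $t_{\nu,P}$ is the Student's t value for $\nu = N - 1$ degrees of freedom.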
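The sample statistics and the single-measurement interval can be sketched with the standard `statistics` module, whose `stdev` already uses the N - 1 denominator. A minimal sketch: the t value must be looked up separately for $\nu = N - 1$ (2.776 is the two-sided 95% value for $\nu = 4$), and the helper names and data are hypothetical.

```python
import statistics

def sample_stats(data):
    """Sample mean x_bar and sample standard deviation S_x (N - 1 denominator)."""
    return statistics.mean(data), statistics.stdev(data)

def single_value_interval(data, t_value):
    """Interval x_bar +/- t_{nu,P} * S_x expected to contain a single
    measurement with probability P. t_value must match nu = N - 1."""
    mean, s_x = sample_stats(data)
    return mean - t_value * s_x, mean + t_value * s_x

data = [1.0, 2.0, 3.0, 4.0, 5.0]                         # illustrative values, N = 5
low, high = single_value_interval(data, t_value=2.776)   # t for nu = 4, P = 95%
print(f"{low:.2f} to {high:.2f}")
```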
-
Standard Deviation of the Means
If we take a set of N measurements of the same variable, then repeat this
process M times, the mean of each data set will differ somewhat from the
others. It can be shown (central limit theorem) that the mean values themselves will follow a normal distribution, at least approximately for sufficiently large N, even if the original distribution is not normal.
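The standard deviation of the means is given by:
$$S_{\bar{x}} = \frac{S_x}{\sqrt{N}}$$
The mean $\bar{x}$ of a sample of N values differs from the true mean $\mu$ of the distribution by an amount $\pm t_{\nu,P}\,S_{\bar{x}}$ at probability P, i.e.,
$$\mu = \bar{x} \pm t_{\nu,P}\,S_{\bar{x}} \quad (P\%)$$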
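The interval for the true mean differs from the single-measurement interval only by the factor $1/\sqrt{N}$. A minimal sketch, again assuming the caller supplies the t value for $\nu = N - 1$; the helper name `mean_interval` and the data are hypothetical.

```python
import math
import statistics

def mean_interval(data, t_value):
    """Estimate of the true mean: x_bar +/- t_{nu,P} * S_x / sqrt(N).
    Returns (x_bar, S_xbar, half_width). t_value must match nu = N - 1."""
    n = len(data)
    mean = statistics.mean(data)
    s_xbar = statistics.stdev(data) / math.sqrt(n)
    return mean, s_xbar, t_value * s_xbar

mean, s_xbar, half = mean_interval([1.0, 2.0, 3.0, 4.0, 5.0], t_value=2.776)
print(f"true mean is about {mean:.2f} +/- {half:.2f} (95%)")
```

Quadrupling N halves $S_{\bar{x}}$, which is why the estimate of the mean tightens much faster than the spread of individual readings.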
-
Normalized probability density function
A normalized probability density function is constructed by transforming both the abscissa (horizontal axis) and ordinate (vertical axis) of the PDF plot as follows:
$$z = \frac{x - \mu}{\sigma}, \qquad p(z) = \sigma\,p(x)$$
The above transformations accomplish two things:
The first transformation normalizes the abscissa such that the PDF is centered around z = 0.
The second transformation normalizes the ordinate such that the PDF is spread out in similar fashion regardless of the value of standard deviation.
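For the Gaussian, applying both transformations turns any $p(x)$ into the same standard normal curve $p(z) = e^{-z^2/2}/\sqrt{2\pi}$. A minimal sketch that checks this numerically; the function names and the choice $\mu = 5$, $\sigma = 2$ are hypothetical.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Normal PDF p(x) with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def standard_normal_pdf(z):
    """Normalized Gaussian p(z) = exp(-z^2 / 2) / sqrt(2*pi)."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

mu, sigma = 5.0, 2.0
for x in (3.0, 5.0, 7.0):
    z = (x - mu) / sigma                        # abscissa transformation
    scaled = sigma * gaussian_pdf(x, mu, sigma) # ordinate transformation
    print(f"x={x}: z={z:+.1f}, sigma*p(x)={scaled:.5f}, p(z)={standard_normal_pdf(z):.5f}")
```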
The Gaussian or normal probability density function:
It is symmetric about the mean.
The mean, median, and mode are all equal to $\mu$, the expected value (at the peak of the distribution).
Its plot is commonly called a bell curve because of its shape.
The actual shape depends on the magnitude of the standard deviation. Namely, if $\sigma$ is small, the bell will be tall and skinny, while if $\sigma$ is large, the bell will be short and fat, as sketched.
-
Confidence level is defined as the probability that a random variable lies within a specified range of values. The range of values itself is called the confidence interval. For example, as discussed above, we are 95.44% confident that a purely random variable lies within two standard deviations of the mean.
Level of significance, $\alpha$, is defined as the probability that a random variable lies outside of a specified range of values. In the above example, there is a 100 - 95.44 = 4.56% probability that a purely random variable lies more than two standard deviations below or above the mean, so $\alpha$ = 4.56%. (We usually round this off to 5% for practical engineering statistical analysis.)
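Confidence level and significance are complementary, and the relation can be run in either direction with the standard library's `statistics.NormalDist`. A minimal sketch; the helper names are hypothetical, and a normal (large-sample) distribution is assumed rather than Student's t.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def confidence_level(z):
    """Probability of lying within +/- z standard deviations of the mean."""
    return std_normal.cdf(z) - std_normal.cdf(-z)

def z_for_significance(alpha):
    """Two-sided z such that P(|Z| > z) = alpha."""
    return std_normal.inv_cdf(1 - alpha / 2)

p = confidence_level(2)
print(f"confidence {100 * p:.2f}%, significance {100 * (1 - p):.2f}%")
print(f"alpha = 5% -> z = {z_for_significance(0.05):.3f}")
```

The second line shows why the rounded 5% significance level corresponds to about 1.96 rather than exactly 2 standard deviations.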
-
Regression Analysis
Regression analysis is used to find an equation for y as a function of x that provides the best fit to the data.
Typically, y is some measured output as a function of some known input, x. Recall that the linear correlation coefficient is used to determine if there is a trend.
Linear regression analysis
Linear regression analysis is also called linear least-squares fit analysis.
The goal of linear regression analysis is to find the best-fit straight line through a set of y vs. x data.
The technique for deriving equations for this best-fit or least-squares fit line is as follows:
1. An equation for a straight line that attempts to fit the data pairs is chosen as Y = ax + b.
2. In the above equation, a is the slope (a = dY/dx; most of us are more familiar with the symbol m rather than a for the slope of a line), and b is the y-intercept, the y location where the line crosses the y axis (in other words, the value of Y at x = 0).
-
Linear regression analysis (Cont.)
3. An upper case Y is used for the fitted line to distinguish the fitted data
from the actual data values, y.
4. In linear regression analysis, coefficients a and b are optimized for the best possible fit to the data.
5. The optimization process itself is actually very straightforward:
6. For each data pair $(x_i, y_i)$, the error $e_i$ is defined as the difference between the predicted (fitted) value and the actual value:
$$e_i = Y_i - y_i = a x_i + b - y_i$$
$e_i$ is also called the residual. Note: here, what we call the actual value does not necessarily mean the correct value, but rather the value of the actual measured data point.
7. We define E as the sum of the squared errors of the fit, a global measure of the error associated with all n data points. The equation for E is:
$$E = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(a x_i + b - y_i\right)^2$$
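The residuals and the global error measure E can be computed directly for any trial line. A minimal sketch; the function names and the three data pairs are hypothetical.

```python
def residuals(a, b, xs, ys):
    """e_i = Y_i - y_i, with Y_i = a*x_i + b the fitted value."""
    return [a * x + b - y for x, y in zip(xs, ys)]

def sum_squared_errors(a, b, xs, ys):
    """E = sum of e_i**2 over all n data pairs."""
    return sum(e * e for e in residuals(a, b, xs, ys))

xs = [0.0, 1.0, 2.0]
ys = [1.0, 3.0, 5.0]
print(sum_squared_errors(2.0, 1.0, xs, ys))  # -> 0.0 (line passes through every point)
print(sum_squared_errors(2.0, 0.0, xs, ys))  # -> 3.0 (every residual is -1)
```

Squaring keeps positive and negative residuals from cancelling, so E is zero only when the line passes through every point.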
-
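8. It is now assumed that the best fit is the one for which E is the smallest.
9. In other words, coefficients a and b that minimize E need to be found. These coefficients will be the ones that create the best-fit straight line Y = ax + b.
10. How can a and b be found such that E is minimized? Well, as any good engineer or mathematician knows, to find a minimum (or maximum) of a quantity, that quantity is differentiated, and the derivative is set to zero.
11. Here, two partial derivatives are required, since E is a function of two variables, a and b. Therefore, we obtain:
$$\frac{\partial E}{\partial a} = 2\sum_{i=1}^{n}\left(a x_i + b - y_i\right)x_i = 0, \qquad \frac{\partial E}{\partial b} = 2\sum_{i=1}^{n}\left(a x_i + b - y_i\right) = 0$$
which rearrange to the two simultaneous linear equations
$$a\sum x_i^2 + b\sum x_i = \sum x_i y_i, \qquad a\sum x_i + n b = \sum y_i$$
Solving these gives
$$a = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}, \qquad b = \frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}$$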
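The closed-form least-squares coefficients can be sketched with plain sums. A minimal sketch; the function name `least_squares_line` and the data are hypothetical.

```python
def least_squares_line(xs, ys):
    """Closed-form least-squares slope a and intercept b for Y = a*x + b."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    a = (n * sxy - sx * sy) / denom
    b = (sxx * sy - sx * sxy) / denom
    return a, b

a, b = least_squares_line([0.0, 1.0, 2.0, 3.0], [1.1, 2.9, 5.2, 6.8])
print(f"Y = {a:.3f} x + {b:.3f}")
```

The denominator vanishes only when every $x_i$ is the same, in which case no unique straight line exists.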
-
Linear regression analysis (Cont.)
Correlation coefficient
In engineering analysis, we often want to fit a trend line or curve to a set of x-y data.
Consider a set of n measurements of some variable y as a function of another
variable x.
Typically, y is some measured output as a function of some known input, x.
In general, in such a set of measurements, there may be:
Some scatter (precision error or random error).
A trend: in spite of the scatter, y may show an overall increase with x, or perhaps an overall decrease with x.
The linear correlation coefficient is used to determine if there is a trend.
If there is a trend, regression analysis is used to find an equation for y as a function of x which provides the best fit to the data.
The linear correlation coefficient $r_{xy}$ is defined as:
$$r_{xy} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$
with $-1 \le r_{xy} \le 1$.
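The definition of $r_{xy}$ translates directly into code. A minimal sketch; the function name is hypothetical (Python 3.10 and later also provide `statistics.correlation`, which computes the same Pearson coefficient).

```python
import math

def correlation_coefficient(xs, ys):
    """Linear correlation coefficient r_xy (Pearson)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

print(correlation_coefficient([1, 2, 3, 4], [2, 4, 6, 8]))  # perfect increasing trend
print(correlation_coefficient([1, 2, 3, 4], [8, 6, 4, 2]))  # perfect decreasing trend
```

Values near +1 or -1 indicate a strong linear trend; values near 0 indicate little or no linear trend, though a nonlinear relationship may still exist.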
-
Data Outlier Detection
How do you handle spurious data points?
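The most common and simplest approach is to label points that lie outside the range of 99.8% probability of occurrence, $\bar{x} \pm t_{\nu,99.8}\,S_x$, as outliers. For large N this band is about three standard deviations on either side of the mean, hence the name three-sigma test. It works well with data sets of 10 or more points.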
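A simple version of the three-sigma test can be sketched as below. This sketch assumes a fixed multiplier of 3 instead of the t value $t_{\nu,99.8}$, and includes the suspect point when computing $\bar{x}$ and $S_x$; the function name and readings are hypothetical.

```python
import statistics

def three_sigma_outliers(data, k=3.0):
    """Flag points farther than k * S_x from the sample mean x_bar.
    k = 3 approximates the 99.8% band for large N; a t-based
    multiplier for nu = N - 1 could be substituted."""
    mean = statistics.mean(data)
    s_x = statistics.stdev(data)
    return [x for x in data if abs(x - mean) > k * s_x]

readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 9.9,
            10.0, 10.2, 9.8, 10.1, 10.0, 9.9, 10.0, 10.1, 9.9, 15.0]
print(three_sigma_outliers(readings))
```

Because the suspect point inflates $S_x$ when it is included, some practitioners recompute the statistics without it before deciding; either way, flagged points should be investigated rather than deleted automatically.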
Number of Measurements Required
Some sample statistics must be known to estimate the variation in the data set and
therefore estimate a confidence interval in the data yet to be acquired.