02 statistics review

Bayesian Networks, Unit 2: Statistics Review. Wang, Yuan-Kai (王元凱), [email protected], http://www.ykwang.tw, Department of Electrical Engineering, Fu Jen University, 2006~2011. Copyright: Wang, Yuan-Kai, Department of Electrical Engineering, Fu Jen University. Reference this document as: Wang, Yuan-Kai, "Statistics Review," Lecture Notes of Wang, Yuan-Kai, Fu Jen University, Taiwan, 2011.

Upload: yuan-kai-wang

Post on 05-Dec-2014

TRANSCRIPT

  • 1. Bayesian Networks, Unit 2: Statistics Review. Wang, Yuan-Kai, [email protected], http://www.ykwang.tw, Department of Electrical Engineering, Fu Jen Univ., 2006~2011. Reference this document as: Wang, Yuan-Kai, "Statistics Review," Lecture Notes of Wang, Yuan-Kai, Fu Jen University, Taiwan, 2011.
  • 2. Goal of this Unit. Review basic concepts of statistics in terms of image processing and pattern recognition.
  • 3. Related Units. Previous unit: Probability Review. Next units: Uncertainty Inference (Discrete), Uncertainty Inference (Continuous).
  • 4. Self-Study. Artificial Intelligence: A Modern Approach, Russell & Norvig, 2nd ed., Prentice Hall, 2003, pp. 462~474, Chapter 13, Sec. 13.1~13.3. , 2002. D. Griffiths, 2009, O'Reilly.
  • 5. Contents. 1. Introduction (slide 6); 2. Histogram (slide 12); 3. Central Tendency (slide 18); 4. Variance (slide 26); 5. Frequency Distribution (slide 34); 6. Covariance (slide 52); 7. Covariance Matrix (slide 57); 8. Correlation (slide 64); 9. Chart and Graph (slide 68); 10. References (slide 79).
  • 6. 1. Introduction. Probability and statistics are about uncertainty. The world is full of uncertainty, so our hardware/software implementations need to take uncertainty into account.
  • 7. Uncertainty by Probability. Probability summarizes the uncertainty that arises from laziness and ignorance. Example: P(your toothache is caused by a cavity) = 0.8. The remaining 20% represents our laziness and ignorance about all other possible causes.
  • 8. Uncertainty by Statistics. Statistics derives probabilistic facts from a set of data. It derives actual probability numbers, e.g. P(your toothache is caused by a cavity) = 0.8; describes characteristics of the data (mean, variance, moments, ...); builds a statistical model of the data (Gaussian, Gaussian mixture); and reasons about new facts from the data.
  • 9. What Is Statistics. Given a set of data from a random variable, a statistic is a number that provides information about the data. Descriptive statistics describe data in two ways: measures of central tendency and measures of dispersion.
  • 10. Measures of Central Tendency. Mean: the average, i.e. the expected value of the random variable. Median: the middle value of the R.V. Mode: the value of the variable at the peak of the pmf/pdf.
  • 11. Measures of Dispersion. Dispersion: variance, covariance, correlation, moment. Others: range, percentiles (e.g. the 95th percentile), ...
  • 12. 2. Histogram. This course has 15 students. Every student has a score with one of the values 0, 10, 20, ..., 100. Random variable X = student's score. Scores of the 15 students: {20, 90, 90, 100, 50, 60, 70, 60, 80, 70, 80, 90, 80, 70, 70}. Counts: 20×1; 50×1; 60×2; 70×4; 80×3; 90×3; 100×1.
  • 13. The Histogram. Counts N(X): 20×1; 50×1; 60×2; 70×4; 80×3; 90×3; 100×1. Dividing each count by N = 15 gives P(X): 1/15 for 20; 1/15 for 50; 2/15 for 60; 4/15 for 70; 3/15 for 80; 3/15 for 90; 1/15 for 100. (Figures: N(X) versus X and P(X) versus X, with X from 10 to 100.) The normalized histogram P(X) is an estimate of the p.d.f.; for a discrete variable such as X it is the p.m.f.
  • 14. Definitions. For a random variable X: X has n possible values $\{x_1, x_2, \ldots, x_n\}$. Now there are N random data of X: $x_1, x_2, \ldots, x_N$. Histogram & distribution: the number of occurrences of $x_i$ is $N(x_i)$; the probability of $x_i$ is $p(x_i) = N(x_i)/N$.
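The definitions above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `histogram` and `distribution` are hypothetical and not part of the lecture notes; the data are the 15 scores from the histogram example.

```python
from collections import Counter

# Scores of the 15 students from the histogram example.
scores = [20, 90, 90, 100, 50, 60, 70, 60, 80, 70, 80, 90, 80, 70, 70]


def histogram(data):
    """Return N(x_i): how many times each value occurs, sorted by value."""
    return dict(sorted(Counter(data).items()))


def distribution(data):
    """Return p(x_i) = N(x_i) / N for each observed value."""
    n_total = len(data)
    return {x: count / n_total for x, count in histogram(data).items()}


counts = histogram(scores)
pmf = distribution(scores)

for x in counts:
    print(f"x = {x:3d}  N(x) = {counts[x]}  p(x) = {pmf[x]:.3f}")
```

Values that never occur (0, 10, 30, 40) are simply absent from the dictionaries, which is equivalent to N(x) = 0 and p(x) = 0.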
  • 15. Histogram vs. P.D.F. (figure)
  • 16. 2D Gaussian (figure: histogram and pdf)
  • 17. Histogram of an Image. The random variable X (gray level) has n possible values $\{x_1, x_2, \ldots, x_n\}$, n = 256. There are N random data $x_1, x_2, \ldots, x_N$ of X, with N = Width × Height. Histogram: $N(x_i)$. Distribution: $P(x_i) = N(x_i)/N$.
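As a rough sketch of the image case, the following pure-Python code counts gray levels in a small image stored as a list of rows. The 3×4 `toy_image` and the function names are assumptions made for illustration only; a real image would come from an image library.

```python
def gray_histogram(image, levels=256):
    """Count N(x_i) for every gray level x_i = 0 .. levels - 1."""
    counts = [0] * levels
    for row in image:
        for pixel in row:
            counts[pixel] += 1
    return counts


def gray_distribution(image, levels=256):
    """Return P(x_i) = N(x_i) / N, where N = Width * Height."""
    counts = gray_histogram(image, levels)
    n_pixels = sum(counts)
    return [c / n_pixels for c in counts]


# Hypothetical 3x4 gray-level image, used only as example data.
toy_image = [
    [0, 0, 128, 255],
    [0, 128, 128, 255],
    [64, 64, 128, 255],
]

hist = gray_histogram(toy_image)
dist = gray_distribution(toy_image)
print(hist[0], hist[64], hist[128], hist[255])
```

The histogram always has 256 entries, one per possible gray level, even when most of them are zero.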
  • 18. 3. Central Tendency. Random variable X = student's score. Scores of the 15 students: {20, 90, 90, 100, 50, 60, 70, 60, 80, 70, 80, 90, 80, 70, 70}. Counts: 20×1; 50×1; 60×2; 70×4; 80×3; 90×3; 100×1. (Figure: histogram P(X) versus X, with X from 10 to 100.) Mean = ?
  • 19. Mean. Mean from the set of data: $E[X] = \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$. Mean from the p.d.f.: $E[X] = \bar{x} = \sum_{i=1}^{n} x_i\, p(x_i)$.
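A minimal sketch of the two ways of computing the mean, using the 15 scores from the running example. The helper names are hypothetical; both routes give the same number because the distribution is built from the same data.

```python
from collections import Counter

scores = [20, 90, 90, 100, 50, 60, 70, 60, 80, 70, 80, 90, 80, 70, 70]


def mean_from_data(data):
    """Mean as (1/N) * sum of the N data values."""
    return sum(data) / len(data)


def mean_from_pmf(pmf):
    """Mean as the sum of x_i * p(x_i) over the possible values."""
    return sum(x * p for x, p in pmf.items())


# Empirical distribution p(x_i) = N(x_i) / N.
pmf = {x: count / len(scores) for x, count in Counter(scores).items()}

mean_data = mean_from_data(scores)
mean_pmf = mean_from_pmf(pmf)
print(mean_data, round(mean_pmf, 6))
```

The second form sums over at most n distinct values instead of N data points, which is why it is convenient for images, where N is large and n = 256.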
  • 20. Mean of an Image. From the N pixels: $E[X] = \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$. From the distribution over the n gray levels: $E[X] = \bar{x} = \sum_{i=1}^{n} x_i\, p(x_i)$.
  • 21. Disadvantage of Mean. The mean is easily influenced by outliers (extreme values). The mean may not be a real value, i.e. a value that actually occurs in the data: for the 15 scores, $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i = 72$, which is not one of the possible scores. (Figure: histogram P(X) versus X, with X from 10 to 100.)
  • 22. Median & Mode. Median and mode are two other measures of central tendency. Median: (1) sort the scores, (2) select the middle one. Sorted scores: {20, 50, 60, 60, 70, 70, 70, 70, 80, 80, 80, 90, 90, 90, 100}, so the median (the 8th of 15 values) is 70. Mode: select the score with the maximum N(X) or P(X). Counts: 20×1; 50×1; 60×2; 70×4; 80×3; 90×3; 100×1, so the mode is 70. (Figure: histogram P(X) versus X, with X from 10 to 100.)
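The median and mode of the example scores can be checked with Python's standard `statistics` module. The second data set, in which the lowest score 20 is replaced by 0, is a made-up variation added here only to show how an outlier moves the mean but not the median or mode.

```python
import statistics

scores = [20, 90, 90, 100, 50, 60, 70, 60, 80, 70, 80, 90, 80, 70, 70]

median_score = statistics.median(scores)  # middle of the sorted scores
mode_score = statistics.mode(scores)      # most frequent score
mean_score = statistics.mean(scores)

# Hypothetical variation: make the lowest score even more extreme.
scores_outlier = [0 if s == 20 else s for s in scores]

median_outlier = statistics.median(scores_outlier)
mode_outlier = statistics.mode(scores_outlier)
mean_outlier = statistics.mean(scores_outlier)

print("original:", mean_score, median_score, mode_score)
print("outlier: ", round(mean_outlier, 2), median_outlier, mode_outlier)
```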
  • 23. Advantage of Median & Mode. The median and mode are far less influenced by outliers than the mean. The mode is always a real value that occurs in the data, and so is the median when the number of data is odd.
  • 24. Expected Value.
      E[X], the mean: $E[X] = \mu_x = \sum_{i=1}^{n} x_i\, p(x_i)$ (discrete), $E[X] = \mu_x = \int_{-\infty}^{\infty} x\, p(x)\, dx$ (continuous).
      $E[X^r]$, the r-th moment of X: $E[X^r] = \sum_{i=1}^{n} x_i^r\, p(x_i)$ (discrete), $E[X^r] = \int_{-\infty}^{\infty} x^r\, p(x)\, dx$ (continuous).
      $E[(X-\mu_x)^r]$, the r-th central moment of X: $E[(X-\mu_x)^r] = \sum_{i=1}^{n} (x_i-\mu_x)^r\, p(x_i)$ (discrete), $E[(X-\mu_x)^r] = \int_{-\infty}^{\infty} (x-\mu_x)^r\, p(x)\, dx$ (continuous).
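The discrete moment formulas can be sketched directly from a p.m.f. stored as a dictionary. The fair six-sided die used here is an assumed example that does not appear in the lecture notes, and the function names are hypothetical.

```python
def raw_moment(pmf, r):
    """r-th moment E[X^r] = sum of x_i^r * p(x_i)."""
    return sum((x ** r) * p for x, p in pmf.items())


def central_moment(pmf, r):
    """r-th central moment E[(X - mu)^r], with mu = E[X]."""
    mu = raw_moment(pmf, 1)
    return sum(((x - mu) ** r) * p for x, p in pmf.items())


# Assumed example distribution: a fair six-sided die.
die = {face: 1 / 6 for face in range(1, 7)}

mean = raw_moment(die, 1)            # first moment
second_moment = raw_moment(die, 2)   # E[X^2]
variance = central_moment(die, 2)    # second central moment

print(round(mean, 4), round(second_moment, 4), round(variance, 4))
```

The first central moment is always zero, and the second central moment equals $E[X^2] - \mu_x^2$, which gives a quick consistency check on the two functions.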
  • 25. Deviation about the Mean: $x_i - \bar{x}$. It indicates how far a value is from the center. It is a very important quantity for measuring dispersion, i.e. how far a distribution spreads out.
  • 26. 4. Variance. Variance and standard deviation come from the deviation. Average deviation (AD): calculate all of the deviations and find their average. It is intended as a measure of the typical amount by which any given data point might vary: $AD = \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})$.
  • 27. We Need Absolute Deviation. Example data: $x_i$ = 1, 2, 3, 4, 5, with N = 5, $\sum x_i = 15$ and $\bar{x} = 15/5 = 3.0$.
      x_i    x_i − x̄       |x_i − x̄|
      1      1−3 = −2      |1−3| = 2
      2      2−3 = −1      |2−3| = 1
      3      3−3 = 0       |3−3| = 0
      4      4−3 = 1       |4−3| = 1
      5      5−3 = 2       |5−3| = 2
      Sum    0             6
    The signed deviations cancel, so $AD = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x}) = 0$ whatever the data. The average absolute deviation does not cancel: $AAD = \frac{1}{N}\sum_{i=1}^{N}|x_i - \bar{x}| = 6/5 = 1.2$.
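The small example above is easy to reproduce. The sketch below uses the same five data values; the function names are hypothetical.

```python
data = [1, 2, 3, 4, 5]


def average_deviation(values):
    """AD: mean of the signed deviations (x_i - mean)."""
    mean = sum(values) / len(values)
    return sum(x - mean for x in values) / len(values)


def average_absolute_deviation(values):
    """AAD: mean of the absolute deviations |x_i - mean|."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


ad = average_deviation(data)
aad = average_absolute_deviation(data)
print(ad, aad)
```

Because the positive and negative deviations always cancel, AD carries no information about spread, which is why the absolute value (or the square) is needed.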
  • 28. Or Square of the Deviation. Square the deviations to remove the minus signs; take the square root to return to the original scale.
      Variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^2$.
      Its square root (the standard deviation): $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^2}$.
      For comparison: $AAD = \frac{1}{N}\sum_{i=1}^{N} |x_i - \bar{x}|$ and $AD = \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})$.
  • 29. Sample Mean and Sample Variance. We can approximate the expected value by the sample mean $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$. Sample variance: $s_N^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^2$. But, strangely enough, if you want a good approximation of the true variance, you should use $s_{N-1}^2 = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2$. Dividing by N − 1 compensates for the fact that $\bar{x}$ is itself estimated from the same data, which makes the estimator unbiased.
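The two estimators differ only in the divisor. The sketch below computes both for the 15 example scores and compares them with `statistics.pvariance` (divide by N) and `statistics.variance` (divide by N − 1) from the Python standard library. The `ddof` parameter name is a hypothetical choice for this sketch.

```python
import statistics

scores = [20, 90, 90, 100, 50, 60, 70, 60, 80, 70, 80, 90, 80, 70, 70]


def variance(data, ddof=0):
    """Sum of squared deviations divided by (N - ddof).

    ddof=0 gives the 1/N form, ddof=1 gives the 1/(N-1) form.
    """
    mean = sum(data) / len(data)
    squared = sum((x - mean) ** 2 for x in data)
    return squared / (len(data) - ddof)


var_n = variance(scores)                  # divide by N
var_n_minus_1 = variance(scores, ddof=1)  # divide by N - 1

print(round(var_n, 1), round(var_n_minus_1, 1))
print(statistics.pvariance(scores), statistics.variance(scores))
```

For N = 15 the two values differ by a factor of 15/14; the gap shrinks as N grows.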
  • 30. Variance of an Image. From the N pixels: $\sigma^2 = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2$. From the distribution over the gray levels (moments): $\sigma^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2\, p(x_i)$, n = 256, with $\bar{x} = \sum_{i=1}^{n} x_i\, p(x_i)$.
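A sketch of the histogram route for an image, checked against a direct computation over the pixels. The 3×4 `toy_image` and all function names are assumptions for illustration. Note that the histogram form weights by p(x_i) = N(x_i)/N, so it matches the divide-by-N pixel variance rather than the divide-by-(N − 1) one.

```python
def gray_pmf(image, levels=256):
    """Return p(x) = N(x) / N for gray levels x = 0 .. levels - 1."""
    counts = [0] * levels
    n_pixels = 0
    for row in image:
        for pixel in row:
            counts[pixel] += 1
            n_pixels += 1
    return [c / n_pixels for c in counts]


def mean_and_variance_from_pmf(pmf):
    """Mean and variance computed from the distribution only."""
    mean = sum(x * p for x, p in enumerate(pmf))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(pmf))
    return mean, var


# Hypothetical 3x4 gray-level image, used only as example data.
toy_image = [
    [0, 0, 128, 255],
    [0, 128, 128, 255],
    [64, 64, 128, 255],
]

mean_hist, var_hist = mean_and_variance_from_pmf(gray_pmf(toy_image))

# Direct computation over the pixels, dividing by N.
pixels = [p for row in toy_image for p in row]
mean_pix = sum(pixels) / len(pixels)
var_pix = sum((p - mean_pix) ** 2 for p in pixels) / len(pixels)

print(round(mean_hist, 3), round(var_hist, 3))
```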
  • 31. An Example of Variance. Variance of the scores of the 15 students in this course = ? Scores: {20, 90, 90, 100, 50, 60, 70, 60, 80, 70, 80, 90, 80, 70, 70}. Mean: $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i = 72$. Variance: $\sigma^2 = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2 = 388.6$. (Figure: histogram P(X) versus X, with X from 10 to 100.)
  • 32. Ex. of Standard Deviation. Standard deviation (SD): $\sigma = (\mathrm{Var})^{1/2} = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2} = 19.7$. Around the mean 72, the interval $\bar{x} \pm \sigma$ runs from 52.3 to 91.7. For comparison, the average absolute deviation is $\frac{1}{N}\sum_{i=1}^{N} |x_i - \bar{x}|$. (Figure: histogram P(X) versus X, with 52.3, 72 and 91.7 marked on the X axis.)
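The numbers in this example can be reproduced with a short sketch. The count of scores falling inside the one-standard-deviation interval is an extra illustration added here, not something stated in the notes.

```python
import math

scores = [20, 90, 90, 100, 50, 60, 70, 60, 80, 70, 80, 90, 80, 70, 70]

n = len(scores)
mean = sum(scores) / n
# Variance with the 1/(N-1) divisor, as in the example.
var = sum((x - mean) ** 2 for x in scores) / (n - 1)
sd = math.sqrt(var)

lower, upper = mean - sd, mean + sd
within = sum(1 for x in scores if lower <= x <= upper)

print(f"mean = {mean:.1f}, variance = {var:.1f}, SD = {sd:.1f}")
print(f"interval = [{lower:.1f}, {upper:.1f}], scores inside = {within} of {n}")
```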
  • 33. Formal Definition of Variance. Discrete: $\mathrm{Var}(X) = \sigma_x^2 = E[(X - E[X])^2] = \sum_{i} (x_i - \mu_x)^2\, p(x_i)$. Continuous: $\mathrm{Var}(X) = \sigma_x^2 = E[(X - E[X])^2] = \int_{-\infty}^{\infty} (x - \mu_x)^2\, p(x)\, dx$.
  • 34. 5. Frequency Distributions. A frequency distribution is a graph or chart that shows the number of observations of a given value or class interval.
  • 35. Shape of Distribution. Modality: the number of peaks in the curve. Skewness: an asymmetry in a distribution, where values are shifted to one extreme or the other. Kurtosis: the degree of peakedness/flatness of the curve.
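Skewness and kurtosis are commonly computed as standardized third and fourth central moments. The sketch below uses that convention with divide-by-N moments; this is an assumption, since other conventions exist (bias-corrected versions, or "excess" kurtosis, which subtracts 3 so that a normal distribution scores 0). Function names are hypothetical.

```python
def central_moment(data, r):
    """r-th central moment of the data, dividing by N."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** r for x in data) / len(data)


def skewness(data):
    """Third central moment divided by sigma cubed."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def kurtosis(data):
    """Fourth central moment divided by sigma to the fourth power."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2


scores = [20, 90, 90, 100, 50, 60, 70, 60, 80, 70, 80, 90, 80, 70, 70]
symmetric = [1, 2, 3, 4, 5]

print("scores:   ", round(skewness(scores), 3), round(kurtosis(scores), 3))
print("symmetric:", skewness(symmetric), kurtosis(symmetric))
```

The symmetric data give a skewness of zero, while the single low score of 20 pulls the skewness of the student scores below zero (a longer tail on the left).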
  • 36. Modality: unimodal, bimodal, multimodal.
  • 37. Skewness (1/3). The third moment about the mean. = 0: symmetric distribution (normal distribution); > 0: right skew (positive skew) 3: sharp peak K0: sharp peak K0 If X tends to decrease when Y increases, then XY