Prediction / Co(variance – relation)
Ömer Nezih Gerek
Data → Information
Data is useless in raw form.
Since you measure it, it must carry some information!
Signatures (statistical)…
◦ 1. Model based → System, Prediction, etc.
◦ 2. (non)Parametric → Histogram, mean, etc.
Non-parametric statistics
Histogram (a fair pdf estimate)
Typical shapes: symmetric unimodal, skewed right, bi-modal, multi-modal, skewed left, symmetric.
$N(i) = \operatorname{count}\{X(k) = i\}, \quad k = 1,\dots,\text{data\_size}, \ i = 0,\dots,\text{max\_val}$
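A sketch of the histogram-as-pdf idea in Python (the data and bin count are our own illustrative choices): normalizing the counts by total count times bin width makes the histogram integrate to 1, like a pdf.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=10_000)   # sample data

counts, edges = np.histogram(x, bins=50)
widths = np.diff(edges)
pdf_est = counts / (counts.sum() * widths)        # normalized: area = 1

area = np.sum(pdf_est * widths)                   # ~ 1.0
```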
Things that we extract from pdf
Technically "everything"
All moments:
◦ from which, we can derive mean, variance, etc.
$M_x(t) = E\{e^{tx}\}, \qquad \operatorname{mom}_n = \left.\frac{\partial^n}{\partial t^n} M_x(t)\right|_{t=0}$
But "not" the correlation characteristics in a time series!
Pdf is stationary, and doesn't care about inter-symbol relations…
Sample statistics:
$\mu$: Mean (average), $\mu = E\{x[n]\}$
$\sigma^2$: Variance, $\sigma^2 = E\{(x[n]-\mu)^2\}$
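A minimal sketch of these sample statistics (the Gaussian test data is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=3.0, scale=1.5, size=100_000)   # true mean 3, variance 2.25

mu_hat = np.mean(x)                     # estimate of mu = E{x[n]}
var_hat = np.mean((x - mu_hat) ** 2)    # estimate of sigma^2 = E{(x[n]-mu)^2}
```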
What is this "correlation" thing
It is: $R_x(\tau) = E\{x(t)\,x(t-\tau)\}$
But this is an abstract definition. Let's estimate it from data (with an example for $\tau = 2$):
$R_x(2) = \frac{1}{N}\left\{x(2)x(0) + x(3)x(1) + x(4)x(2) + \dots + x(N+1)x(N-1) + x(N+2)x(N)\right\}$
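The estimator above can be sketched as follows (the white-noise test signal is our own choice; for it, R(0) equals the variance and R(τ ≠ 0) is near zero):

```python
import numpy as np

def autocorr_est(x, tau):
    """Sample estimate of R_x(tau) = E{x(t) x(t - tau)}."""
    x = np.asarray(x, dtype=float)
    if tau == 0:
        return np.dot(x, x) / len(x)
    return np.dot(x[tau:], x[:-tau]) / (len(x) - tau)

rng = np.random.default_rng(2)
x = rng.normal(size=50_000)    # unit-variance white noise
r0 = autocorr_est(x, 0)        # ~ 1.0
r2 = autocorr_est(x, 2)        # ~ 0.0
```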
Correlation somewhat depends on the mean. Let's normalize it and call it "COVARIANCE":
$C_x(\tau) = E\{[x(t)-\mu][x(t-\tau)-\mu]\} = E\{x(t)x(t-\tau)\} - E\{x(t)\mu\} - E\{x(t-\tau)\mu\} + \mu^2 = R_x(\tau) - \mu^2$
$R_x(n)$: Autocorrelation, $R_x(n) = E\{x[m]\,x[m+n]\}$
$C_x(n)$: Autocovariance, $C_x(n) = E\{(x[m]-\mu)(x[m+n]-\mu)\} = E\{x[m]\,x[m+n]\} - \mu^2$
Properties
Correlation and Covariance carry:
◦ time relation (single time difference), and
◦ info regarding up to 2nd order moments: mean and variance.
Naturally, covariance decreases with distance…
Example calculation (correlate x = {2, 3, 1} with itself, "long multiplication" style):

        2   3   1
    ×   2   3   1
    ------------------
        2   3   1
    6   9   3
4   6   2
    ------------------
2   9  14   9   2

R(0) = 14, R(1) = 9, R(2) = 2
With m = 2: C(0) = 10, C(1) = 5, C(2) = −2
(Here the sums are left unnormalized, and the rule C(τ) = R(τ) − m² is applied to them directly.)
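The slide's arithmetic can be reproduced with `np.correlate` (note that, as on the slide, the sums are unnormalized and C(τ) = R(τ) − m² is applied to them directly):

```python
import numpy as np

x = np.array([2.0, 3.0, 1.0])

full = np.correlate(x, x, mode="full")   # [2, 9, 14, 9, 2]
R = full[len(x) - 1:]                    # R(0)=14, R(1)=9, R(2)=2
m = 2.0                                  # mean of {2, 3, 1}
C = R - m**2                             # C(0)=10, C(1)=5, C(2)=-2
```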
Covariance elements
The Fourier transform of the covariance is the "spectral density"!
◦ shows how much "power" the random signal has at each "frequency".
◦ remember "equalizers", "VU meters", "radio stations"…
$C(0) = \sigma^2 = E\{(x(t)-\mu)^2\}$
$S_{xx}(f) = \operatorname{FT}\{C(\tau)\}$
Remember the Fourier Transform
From time (x(t)) to frequency (X(f)):
$X(f) = \int_{-\infty}^{\infty} x(t)\,e^{-j2\pi f t}\,dt$
If we transform R(τ) instead of x(t), the result is Sx(f).
… and the integral of the power spectral density over any band is power (which is always positive):
$\int_{f_1}^{f_2} S_x(f)\,df \ge 0 \quad\Rightarrow\quad S_x(f) \ge 0$
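One standard estimator of Sx(f) (our addition, not from the slides) is the periodogram |X(f)|²/N, which is non-negative by construction; Parseval's relation ties its average to the signal power:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=4096)

X = np.fft.fft(x)
S_est = np.abs(X) ** 2 / len(x)      # periodogram: always >= 0

avg_power_time = np.mean(x ** 2)     # average power in the time domain
avg_power_freq = np.mean(S_est)      # equals it, by Parseval's theorem
```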
Our source of info: Sx(f) & Rx(τ)
We will extract significant information from autocorrelation and power spectral density:
◦ Best linear prediction (what will come next in the series?)
◦ The best linear model of the "process" that produces a certain outcome
◦ The power of the process at different frequencies.
What happens to a random signal when it passes through a linear system?
Do you remember what happens to a (deterministic) signal that passes through a linear system?
$Y(f) = X(f) \cdot H(f)$
where H(f) is the frequency response.
So, what happens to a random signal when it passes through a linear system?
Now, we don't have the signal, x(t). We only have statistical parameters, R(τ) or S(f):
$S_y(f) = S_x(f) \cdot |H(f)|^2$
where H(f) is still the frequency response.
Notice that Sy(f) is still positive.
$S_y(f) = S_x(f) \cdot |H(f)|^2$ implies:
Power at output: $\int S_y(f)\,df = \int S_x(f)\,|H(f)|^2\,df = R_y(0)$
Mean at output: $\mu_y = \mu_x \cdot H(0)$
Variance at output: $\sigma_y^2 = R_y(0) - \mu_y^2$
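A numerical sketch of these output statistics (the filter and input parameters are illustrative choices):

```python
import numpy as np

# White noise, mean 2 and variance 1, through a 2-tap averager h = [0.5, 0.5].
# H(0) = sum(h) = 1, so mu_y = mu_x * H(0) = 2; for white input the output
# variance is sigma_x^2 * sum(h_k^2) = 0.5.
rng = np.random.default_rng(4)
x = rng.normal(loc=2.0, scale=1.0, size=200_000)
h = np.array([0.5, 0.5])

y = np.convolve(x, h, mode="valid")
mu_y = y.mean()      # ~ 2.0
var_y = y.var()      # ~ 0.5
```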
So much math to do: Prediction
…x[1], x[2], …, x[n-2], x[n-1], what next?
A linear predictor "filters" incoming samples to predict x[n]:
1, 2, 1, 2, 1, 2, 1, 2, x[n] = ?
x[n] = 0·x[n-1] + 1·x[n-2]
◦ So our filter is: $\{0z^{-1} + 1z^{-2}\}$
Series examples:
1, 2, 3, 4, 5, x[n] = ?
x[n] = 2·x[n-1] − x[n-2]
◦ So, our filter is: $\{2z^{-1} - 1z^{-2}\}$
1, 1, 2, 4, 7, 13, 24, x[n] = ?
x[n] = x[n-1] + x[n-2] + x[n-3]
◦ So, our filter is: $\{1z^{-1} + 1z^{-2} + 1z^{-3}\}$
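A quick check of the three example filters (a minimal sketch; the `predict` helper is our own):

```python
def predict(series, h):
    """One-step linear prediction: h[0]*x[n-1] + h[1]*x[n-2] + ..."""
    past = series[::-1]                     # most recent sample first
    return sum(c * v for c, v in zip(h, past))

nxt1 = predict([1, 2, 1, 2, 1, 2, 1, 2], [0, 1])    # -> 1
nxt2 = predict([1, 2, 3, 4, 5], [2, -1])            # -> 6
nxt3 = predict([1, 1, 2, 4, 7, 13, 24], [1, 1, 1])  # -> 44
```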
But these series are too deterministic. Besides, how long should our filter be?
Stochastic series: First order prediction
Linear predictor: $\hat{x}[n] = h_1 \cdot x[n-1]$
Prediction error: $d[n] = x[n] - \hat{x}[n]$
If $\hat{x}[n]$ is close to x[n], then d[n] will be small!
Question: What is the optimum h₁?
Answer: Such a value that minimizes the error power.
Minimization of $d[n] = x[n] - \hat{x}[n]$
Equivalently: $d[n] = x[n] - h_1 \cdot x[n-1]$
Which has a power magnitude:
$\sigma_d^2 = E\{d^2[n]\} = E\{(x[n] - h_1\,x[n-1])^2\}$
Minimization (cont.)
Expanding:
$\sigma_d^2 = E\{d^2[n]\} = E\{(x[n] - h_1\,x[n-1])^2\}$
$= E\{x^2[n] - 2h_1\,x[n]\,x[n-1] + h_1^2\,x^2[n-1]\}$
$= E\{x^2[n]\} - 2h_1\,E\{x[n]\,x[n-1]\} + h_1^2\,E\{x^2[n-1]\}$
Minimization (cont.)
$\sigma_d^2 = E\{x^2[n]\} - 2h_1\,E\{x[n]\,x[n-1]\} + h_1^2\,E\{x^2[n-1]\} = \sigma_x^2 - 2h_1\,R_x(1) + h_1^2\,\sigma_x^2 = \left[1 + h_1^2 - 2h_1\rho_1\right]\sigma_x^2$
(assuming zero mean, so $E\{x^2\} = \sigma_x^2$ and $\rho_1 = R_x(1)/\sigma_x^2$)
Minimization (cont.)
We have:
$\sigma_d^2 = \left[1 + h_1^2 - 2h_1\rho_1\right]\sigma_x^2$
which can be minimized (with respect to h₁) by taking its derivative w.r.t. h₁ and equating to zero!
$\frac{\partial}{\partial h_1}\sigma_d^2 = 0$
Minimization (cont.)
$\frac{\partial}{\partial h_1}\left[1 + h_1^2 - 2h_1\rho_1\right]\sigma_x^2 = 0$
$2h_1 - 2\rho_1 = 0 \;\Rightarrow\; h_1 = \rho_1$
The filter coefficient is the same as the correlation coefficient!
$\hat{x}[n] = \rho_1 \cdot x[n-1]$
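A sketch of this result on data: for a zero-mean AR(1) process x[n] = a·x[n-1] + w[n] (the process and its parameter a = 0.8 are our own illustrative choices), ρ₁ = a, so the coefficient estimated from data should come out near a, and the error power near (1 − ρ₁²)σ_x².

```python
import numpy as np

rng = np.random.default_rng(5)
a, N = 0.8, 200_000
w = rng.normal(size=N)
x = np.zeros(N)
for n in range(1, N):
    x[n] = a * x[n - 1] + w[n]          # AR(1) test signal

rho1 = np.dot(x[1:], x[:-1]) / np.dot(x, x)   # estimate of R_x(1)/R_x(0)
d = x[1:] - rho1 * x[:-1]                     # prediction error d[n]
err_ratio = d.var() / x.var()                 # should approach 1 - rho1^2
```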
Optimum 1st order prediction
See that the best prediction coefficient depends on R(τ):
$\rho_1 = \frac{R_x(1)}{\sigma_x^2}$: first correlation coefficient
Is this true for "longer" prediction filters? Let's take a look at the 2nd order prediction filter…
2nd order prediction
$\hat{x}[n] = h_1\,x[n-1] + h_2\,x[n-2]$
$d[n] = x[n] - \hat{x}[n]$
Question: What are the optimum h₁ and h₂?
Answer: Values that minimize the error power.
Minimization of $d[n] = x[n] - \hat{x}[n]$
Equivalently: $d[n] = x[n] - h_1\,x[n-1] - h_2\,x[n-2]$
Which has a power magnitude:
$\sigma_d^2 = E\{d^2[n]\} = E\{(x[n] - h_1\,x[n-1] - h_2\,x[n-2])^2\}$
Minimization (cont.)
Expanding:
$\sigma_d^2 = E\{d^2[n]\} = E\{(x[n] - h_1\,x[n-1] - h_2\,x[n-2])^2\}$
$= E\{x^2[n] + h_1^2\,x^2[n-1] + h_2^2\,x^2[n-2] - 2h_1\,x[n]x[n-1] - 2h_2\,x[n]x[n-2] + 2h_1 h_2\,x[n-1]x[n-2]\}$
$= E\{x^2[n]\} + h_1^2\,E\{x^2[n-1]\} + h_2^2\,E\{x^2[n-2]\} - 2h_1\,E\{x[n]x[n-1]\} - 2h_2\,E\{x[n]x[n-2]\} + 2h_1 h_2\,E\{x[n-1]x[n-2]\}$
Minimization (cont.)
$\sigma_d^2 = \sigma_x^2 + h_1^2\sigma_x^2 + h_2^2\sigma_x^2 - 2h_1 R_x(1) - 2h_2 R_x(2) + 2h_1 h_2 R_x(1)$
$= \sigma_x^2\left[1 + h_1^2 + h_2^2 - 2h_1\rho_1 - 2h_2\rho_2 + 2h_1 h_2\rho_1\right]$
Minimization (cont.)
We now have:
$\sigma_d^2 = \sigma_x^2\left[1 + h_1^2 + h_2^2 - 2h_1\rho_1 - 2h_2\rho_2 + 2h_1 h_2\rho_1\right]$
which can be minimized (with respect to h₁ and h₂) by taking its derivatives w.r.t. h₁, h₂ and equating to zero!
$\frac{\partial}{\partial h_1}\sigma_d^2 = \frac{\partial}{\partial h_2}\sigma_d^2 = 0$
Minimization (cont.)
$\frac{\partial}{\partial h_1}\sigma_d^2 = 0 \;\Rightarrow\; h_1 = \frac{\rho_1(1-\rho_2)}{1-\rho_1^2}$
$\frac{\partial}{\partial h_2}\sigma_d^2 = 0 \;\Rightarrow\; h_2 = \frac{\rho_2-\rho_1^2}{1-\rho_1^2}$
The filter coefficients have somewhat changed…
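The closed-form coefficients above can be checked against a direct solve of the 2×2 normal equations R·h = r (the ρ values are illustrative choices):

```python
import numpy as np

rho1, rho2 = 0.9, 0.7

# Closed-form 2nd-order solution from the derivation
h1 = rho1 * (1 - rho2) / (1 - rho1**2)
h2 = (rho2 - rho1**2) / (1 - rho1**2)

# Direct solve of the normal equations
R = np.array([[1.0, rho1], [rho1, 1.0]])
r = np.array([rho1, rho2])
h_opt = np.linalg.solve(R, r)     # should match [h1, h2]
```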
Comparison of 1st and 2nd orders
$\sigma_{d,\min,1}^2 = \left[1-\rho_1^2\right]\sigma_x^2$
$\sigma_{d,\min,2}^2 = \left[1-\rho_1^2-\frac{(\rho_2-\rho_1^2)^2}{1-\rho_1^2}\right]\sigma_x^2$
Since $(\rho_2-\rho_1^2)^2/(1-\rho_1^2) \ge 0$, we always have $\sigma_{d,\min,1}^2 \ge \sigma_{d,\min,2}^2$.
By increasing the prediction window size, the prediction error can only decrease, never increase!
Note that the 2nd and 1st orders are the same if:
$\sigma_{d,\min,2}^2 = \left[1-\rho_1^2-\frac{(\rho_2-\rho_1^2)^2}{1-\rho_1^2}\right]\sigma_x^2 = \left[1-\rho_1^2\right]\sigma_x^2 \;\Rightarrow\; \rho_2 = \rho_1^2$
$\rho_2 = \rho_1^2$ is a first order system property.
Nth order prediction:
$\sigma_d^2 = E\{d^2[n]\} = E\{(x[n]-\hat{x}[n])^2\} = E\left\{\left(x[n]-\sum_{j=1}^{N}h_j\,x[n-j]\right)^2\right\}$
Differentiating w.r.t. each coefficient:
$\frac{\partial \sigma_d^2}{\partial h_j} = E\left\{2\left(x[n]-\hat{x}[n]\right)\frac{\partial}{\partial h_j}\left(-\hat{x}[n]\right)\right\}, \qquad \frac{\partial}{\partial h_j}\left(-\hat{x}[n]\right) = -x[n-j]$
Setting each derivative to zero:
$E\left\{\left(x[n]-\hat{x}[n]\right)x[n-j]\right\} = E\left\{d[n]\,x[n-j]\right\} = 0$
Important observation: $d[n] \perp x[n-j]$ (the error is orthogonal to the data).
Nth order prediction
$E\left\{\left(x[n]-\sum_{i=1}^{N}h_i\,x[n-i]\right)x[n-j]\right\} = 0, \quad \forall j$
$R_x(j) - \sum_{i=1}^{N}h_i\,R_x(j-i) = 0, \quad \forall j$
Nth order prediction
In short: $\mathbf{r}_x = \mathbf{R}_x\,\mathbf{h}_{\mathrm{opt}}$
or: $\mathbf{h}_{\mathrm{opt}} = \mathbf{R}_x^{-1}\,\mathbf{r}_x$
with: $\sigma_{d,\min}^2 = \sigma_x^2 - \mathbf{r}_x^T\,\mathbf{R}_x^{-1}\,\mathbf{r}_x$
Good correlations reduce the error faster!
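The normal equations above can be sketched numerically (the AR(1) test signal and order 3 are our own illustrative choices):

```python
import numpy as np

def lp_coeffs(x, order):
    """Estimate R_x(0..order) from data, then solve h_opt = R^-1 r."""
    x = np.asarray(x, dtype=float)
    R = np.array([np.dot(x[k:], x[:len(x) - k]) / (len(x) - k)
                  for k in range(order + 1)])
    Rmat = np.array([[R[abs(i - j)] for j in range(order)]
                     for i in range(order)])          # Toeplitz R_x
    return np.linalg.solve(Rmat, R[1:])               # h_opt = R^-1 r

# AR(1) test signal: x[n] = 0.9 x[n-1] + w[n]
rng = np.random.default_rng(6)
w = rng.normal(size=100_000)
x = np.zeros_like(w)
for n in range(1, len(x)):
    x[n] = 0.9 * x[n - 1] + w[n]

h = lp_coeffs(x, 3)   # close to [0.9, 0, 0]: higher taps add nothing
```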
Remarks
$\mathbf{h}_{\mathrm{opt}} = \mathbf{R}_x^{-1}\,\mathbf{r}_x$ is not a cheap operation.
There must be a good estimate of N.
The result is only the best "linear" predictor.
Multidimensional extensions exist.
Graphically…
Example
R(0) = 1, R(1) = 0.9, R(2) = 0.81
What is the optimum 2-tap prediction filter?
$\mathbf{h}_{\mathrm{opt}} = \mathbf{R}^{-1}\cdot\mathbf{r} = \begin{bmatrix}1 & 0.9\\ 0.9 & 1\end{bmatrix}^{-1}\begin{bmatrix}0.9\\ 0.81\end{bmatrix} = \begin{bmatrix}5.263 & -4.737\\ -4.737 & 5.263\end{bmatrix}\begin{bmatrix}0.9\\ 0.81\end{bmatrix} = \begin{bmatrix}0.9\\ 0\end{bmatrix}$
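The hand calculation can be verified numerically (a minimal sketch):

```python
import numpy as np

R = np.array([[1.0, 0.9], [0.9, 1.0]])
r = np.array([0.9, 0.81])

R_inv = np.linalg.inv(R)   # ~ [[5.263, -4.737], [-4.737, 5.263]]
h_opt = R_inv @ r          # -> [0.9, 0.0]
```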
Example (cont.)
R(0) = 1, R(1) = 0.9, R(2) = 0.81
Why is h = [0.9, 0]?
Because R(1) = 0.9 and R(2) = 0.9 × 0.9 = 0.81, i.e., ρ₂ = ρ₁²: the process is 1st order.
Reasons for multidimensional prediction: Image processing?
In an image, the "past" samples of x(n, m) are its spatial neighbors: x(n, m−1), x(n−1, m−1), x(n−1, m)…
The correlations are estimated the same way: R(0) from the mean square value, R(1) from averaging products of horizontal neighbors (offset 1), R(2) from vertical neighbors, R(3) from diagonal neighbors.
$\mathbf{h}_{\mathrm{opt}} = \mathbf{R}^{-1}\cdot\mathbf{r}$ — same formula!…
"Template" and filter size may vary.
Consider solar radiation: We have relation to "last hour", but;
◦ Don't we have relation to "yesterday, same hour"?
◦ What about "last year, same hour"?
◦ What about "yesterday's wind speed"?
◦ What about "last week's electricity consumption"?
Do we need an extremely long prediction size, N?
The trick is to put "related" terms near to each other, instead of one long 1-D history. Then we can use 2-D prediction with similar correlation definitions… and achieve low prediction error.
Putting related items near each other is good for other methods, too…
Other methods may include:
◦ Nonlinear prediction
◦ Neural networks
◦ Transformation (Fourier, wavelet, etc.)
◦ Adaptive methods
A case for solar radiation prediction: Correlation may include an auxiliary signal, yielding a "hidden" process:
◦ Hidden Markov Model
We observe pressure; we predict the wind speed!
The result is "wind measurement" using a barometer.
The results are accurate enough for RES sizing!
Some examples with nonlinear prediction (thanks to graduate students)
[Figure: measured ("Ölçülen") vs. predicted ("Tahmin") wind speed (m/s) over hours 7860–8000; top panel: İzmir, bottom panel: Antalya]
… and their distributions (İzmir)
[Figure: distribution percentages vs. wind-speed state bins 1–15 (m/s), real vs. predicted]
Motto:
Know your math…
or keep a signal processing person around!