TIME-FREQUENCY ANALYSIS USING NEURAL NETWORKS
[De-blurred Time-Frequency Distributions Using Neural Networks]
A Dissertation presented
by
Imran Shafi
to
The Department of Computer Engineering
in partial fulfillment of the requirements
for the degree of
DOCTOR OF PHILOSOPHY
in the subject of
Digital Signal Processing
Centre for Advance Studies in Engineering (C@SE), Islamabad
University of Engineering and Technology (UET), Taxila
Pakistan
September 2009
CERTIFICATE
It is certified that the work contained in this thesis is carried out by Mr. Imran Shafi under my supervision at C@SE Islamabad, affiliated with UET Taxila, Pakistan.
Prof. Dr. Syed Ismail Shah
Department of Computer Engineering
C@SE, Islamabad
Prof. Dr. Abdul Khaliq
Chairman
Department of Computer Engineering
C@SE, Islamabad
President
C@SE, Islamabad
ABSTRACT
The thesis is divided into three parts. The first part explores and discusses
the diversity of concepts and motivations for obtaining high-resolution, highly
concentrated time-frequency distributions (TFDs) sought by the research community.
It then describes the methods used for the objective assessment of TFDs.
In the second part, a novel multi-process ANN-based framework for obtaining
highly concentrated TFDs is proposed. The proposed method utilizes a localised Bayesian
regularised neural network model (BRNNM) to obtain the energy concentration along
the instantaneous frequencies (IFs) of individual components in multicomponent
signals without assuming any prior knowledge. The spectrogram and pre-processed
Wigner-Ville distribution (WD) of signals with known IF laws are used as the training
set for the BRNNM. These distributions, taken as two-dimensional (2-D) image
matrices, are vectorized and clustered according to the elbow criterion. Each cluster
contains pairs of input and target vectors from the spectrograms and the highly
concentrated pre-processed WD, respectively. For each cluster, the pairs of vectors are
used to train multiple ANNs under the Bayesian framework of David MacKay. The
best trained network for each cluster is selected based on a network error criterion. In
the test phase, the test TFDs of unknown signals, after vectorization and clustering,
are processed through these specialized ANNs. After post-processing, the resulting
TFDs are found to exhibit improved resolution and concentration along the individual
components compared with the initial blurred estimates.
The third part discusses the experimental results obtained by the proposed
technique. Moreover, the framework is extended to include various objective
methods of assessment to evaluate the performance of the de-blurred TFDs obtained
through the proposed technique. The selected methods not only allow the quality of
TFDs to be quantified instead of relying solely on visual inspection of their plots, but
also help in comparing the proposed technique with other existing techniques
found in the literature. In particular, the computation regularities
show the effectiveness of the objective criteria in quantifying the TFDs' concentration
and resolution information.
ACKNOWLEDGMENTS
In the name of Allah, the most beneficent, the most merciful. Praise be to Allah
subhanhu wa ta'ala, and peace and blessings be on all his prophets and messengers,
especially on the seal of the prophets, Prophet Muhammad salalahu alaihi wassalam. Without
Allah subhanhu wa ta'ala's help and blessing, I would have been unable to complete this thesis.
I am indebted to Professor Dr. Syed Ismail Shah and Professor Dr. Jamil Ahmad,
my advisors, for their guidance and patience throughout my research. More
importantly, I am grateful to both for the support and encouragement they generously
gave me when I needed it most. Special thanks to Professor Dr. Shoab Ahmad Khan,
not only for being on my dissertation committee but also for his comments on my proposal
report, fruitful discussions and valuable help in the second part of the thesis. I would
also like to thank my other dissertation committee members, Professor Dr. Abdul Khaliq
and Professor Dr. Amir Iqbal Bhatti.
I would also like to thank my research colleague Faisal Mehmood Kashif for
encouraging me to embark on my research. I offer thanks to my friends and colleagues
Adnan Khan, Sajid Bashir, Imran Zaka, Habib ur Rehman and Seema Khalid.
I am grateful to my parents for their infinite support and for teaching me the
importance of knowledge. I would also like to thank my wife and daughter for their
patience and continued support.
I would also like to thank the Higher Education Commission (HEC) of Pakistan
for the four-year scholarship for graduate studies. Last but not least, I would like
to thank the anonymous reviewers, whose critique prompted stimulating and
illuminating discussions. Their valuable comments have helped me revise my work
and eventually publish it in prestigious international journals. The list of publications
is given in Appendix B.
CONTENTS
1 Introduction
1.1 Concentration and Resolution Tradeoff
1.2 Application specific Environment
1.3 Objective Assessment of TFDs
1.4 Our Approach
1.5 Contribution
1.5.1 Review of High Resolution TF methods
1.5.2 A Novel ANN based Framework
1.5.3 The Objective Assessment
1.6 Overview of the Thesis
2 Background
2.1 TF Analysis
2.1.1 The Methods based on Evolutionary Spectrum
2.1.2 The Methods based on Cohen's Bilinear Class
2.2 Objective Assessment Methods
2.2.1 Entropy Measures
2.2.2 Normalized Entropy Measures
2.2.3 Ratio of Norms based Measure
2.2.4 LJubisa Measure
2.2.5 Boashash Performance Criteria
2.3 Summary
3 Neural Network based Framework for Computing De-blurred TFDs - Part I
3.1 TFDs using ANN - Binary Case
3.1.1 Selected ANN Architecture
3.1.2 Test Result
3.2 Analysis & Comparison of the ANN Training Algorithms
3.2.1 The Method
3.2.2 Selected ANN Architecture
3.2.3 Performance Evaluation
3.3 Impact of varying number of Neurons and the Hidden Layers
3.3.1 ANN topology
3.3.2 Effect of varying the number of Hidden Neurons
3.3.3 Effect of varying the number of Hidden Layers
3.4 Effect of Data Clustering and using Multiple ANNs for each Cluster
3.4.1 Advantages of Clustering and Training Multiple ANNs
3.4.2 The Network Architecture and Procedure
3.4.3 Performance Evaluation
3.5 Summary
4 Neural Network based Framework for Computing De-blurred TFDs - Part II
4.1 The ANN based Framework's Description
4.1.1 Pre-processing of Training Data
4.1.2 Processing through Bayesian Regularized Neural Network Model
4.1.3 Post-processing of the Output Data
4.2 Performance Evaluation
4.3 Summary
5 Discussion on Experimental Results
5.1 Visual Interpretation and Entropy Analysis
5.1.1 Resultant NTFDs - Experimental Results
5.2 Objective Assessment
5.2.1 Real Life Test Case
5.2.2 Synthetic Test Cases
5.2.3 Performance Evaluation
5.3 Summary
6 Conclusion and Future Directions
6.1 Future Directions
References
A ANN Fundamentals
A.1 Brain Vs ANN
A.2 Human Vs Artificial Neuron
A.3 ANN Layers
A.4 Weights and Error Adjustment
A.4.1 Back propagation Algorithm
A.5 Learning Algorithms
A.5.1 The Levenberg-Marquardt back propagation training algorithm
A.5.2 The Powell-Beale conjugate gradient back propagation training algorithm
A.5.3 The Gradient descent with adaptive learning rate back propagation training algorithm
A.5.4 The Resilient propagation back propagation training algorithm
A.6 Bayesian Regularisation
B List of Publications
B.1 Journal Publications
B.1.1 Published
B.2 Conference Publications
LIST OF FIGURES
Figure 2.1 TFDs of a multicomponent bat echolocation chirp signal. (a) Spectrogram (Test Input to the BRNNM) [Hamming window of length L = 100], (b) WVD, (c) ZAMD, (d) MHD, (e) CWD [kernel width = 1], (f) BJD.
Figure 3.1 Graphical explanation of the method
Figure 3.2 Input training TFD image of sinusoidal FM signal.
Figure 3.3 Input training TFD image of parallel chirps.
Figure 3.4 Target TFD image for the sinusoidal FM signal.
Figure 3.5 Target TFD image of parallel chirp signal.
Figure 3.6 Binary TFD obtained by the OKM [132].
Figure 3.7 Spectrogram of the bat echolocation chirp signal.
Figure 3.8 The deblurred TFD obtained by the proposed ANN model.
Figure 3.9 Input training TFD image of the sinusoidal FM signal.
Figure 3.10 Input training TFD image of parallel chirps signal.
Figure 3.11 Test TFD image of combined sinusoidal FM & parallel chirps signal.
Figure 3.12 The resultant TFD obtained after passing the spectrogram of the test signal through the trained ANN with RPROP backpropagation algorithm.
Figure 3.13 The resultant TFD obtained after passing the spectrogram of the test signal through the trained ANN with Powell-Beale conjugate gradient backpropagation algorithm.
Figure 3.14 The resultant TFD obtained after passing the spectrogram of the test signal through the trained ANN with Gradient descent with adaptive lr backpropagation algorithm.
Figure 3.15 The resultant TFD obtained after passing the spectrogram of the test signal through the trained neural network with Levenberg-Marquardt training algorithm.
Figure 3.16 The comparative graph which shows error convergence with respect to number of iterations.
Figure 3.17 Test spectrogram image of single chirp signal.
Figure 3.18 The comparative graph of error vs number of neurons in single hidden layer.
Figure 3.19 The comparative graph of error vs epochs for various number of hidden layers
Figure 3.20 Resultant TFD (2 hidden layers with 50 neurons in each).
Figure 3.21 Resultant TFD (2 hidden layers with 5 neurons in each).
Figure 3.22 Resultant TFD (3 hidden layers with 5 neurons in each).
Figure 3.23 Resultant TFD (3 hidden layers with 20 neurons in each).
Figure 3.24 Resultant TFD (2 hidden layers with 15 neurons in each).
Figure 3.25 Resultant TFD (single layer with 40 neurons).
Figure 3.26 Resultant TFD (Single hidden layer with 30 neurons).
Figure 3.27 Resultant TFD obtained after processing test TFD with single ANN without data clustering.
Figure 3.28 Resultant TFD obtained after processing test TFD with multiple ANNs after data clustering.
Figure 3.29 The MSE in last epoch for (a) ANNs trained for cluster 1 (b) ANNs trained for cluster 2 (c) ANNs trained for cluster 3.
Figure 3.30 Rate of MSE convergence against epochs for (a) ANNs trained for cluster 1 (b) ANNs trained for cluster 2 (c) ANNs trained for cluster 3
Figure 3.31 The convergence time taken by various ANNs for each cluster of data, (a) ANNs trained for cluster 1, (b) ANNs trained for cluster 2, (c) ANNs trained for cluster 3.
Figure 4.1 Flow diagram of the method
Figure 4.2 Major modules of the method
Figure 4.3 Pre-processing of training data
Figure 4.4 The spectrograms used as input training images of the (a) sinusoidal FM, and (b) parallel chirp signals.
Figure 4.5 Target TFDs with CTs unsuitable for training ANN taking WD of the, (a) parallel chirps' signal, and (b) sinusoidal FM signal.
Figure 4.6 The non-processed WD target images of the sinusoidal FM signal, (a) grayscale version, (b) binary version.
Figure 4.7 The pre-processed WD target image of sinusoidal FM signal, (a) grayscale version, (b) binary version.
Figure 4.8 The non-processed WD target images of the parallel chirps' signal, (a) grayscale version, (b) binary version.
Figure 4.9 The pre-processed WD target image of the parallel chirps' signal, (a) grayscale version, (b) binary version.
Figure 4.10 Elbow criterion
Figure 4.11 Vectorization, correlation and taxonomy of TFD image.
Figure 4.12 Bayesian regularised neural network model
Figure 4.13 Post-processing of the output data
Figure 4.14 Test TFDs for bat chirps signal, (a) the spectrogram TFD, and (b) The resultant TFD after processing through proposed framework.
Figure 4.15 Resultant TFD obtained by the method of [132].
Figure 5.1 Test TFDs (a) Crossing chirps (TI 1), (b) mono-component linear chirp (TI 2), (c) combined quadratic swept-frequency signals whose spectrograms are concave and convex parabolic chirps respectively (TI 3), (d) combined sinusoidal FM and crossing chirps (TI 4), and (e) quadratic chirp (TI 5)
Figure 5.2 Resultant TFDs after processing through correlation vectored taxonomy algorithm with LNNs for (a) Crossing chirps (TI 1), (b) mono-component linear chirp (TI 2), (c) combined quadratic swept-frequency signals whose spectrograms are concave and convex parabolic chirps respectively (TI 3), (d) combined sinusoidal FM and crossing chirps (TI 4), and (e) quadratic chirp (TI 5)
Figure 5.3 (a) The test spectrogram (TI 2) [Hamm; L = 90]. (b) The NTFD of a synthetic signal consisting of two sinusoidal FM components intersecting each other.
Figure 5.4 (a) The test spectrogram (TI 3) [Hamm; L = 90]. (b) The NTFD of a synthetic signal consisting of two-sets of non-parallel, non-intersecting chirps.
Figure 5.5 (a) The test spectrogram (TI 4) [Hamm; L = 90]. (b) The NTFD of a synthetic signal consisting of crossing chirps and a sinusoidal FM component.
Figure 5.6 (a) The test spectrogram (TI 5), and (b) the NTFD of test case 4.
Figure 5.7 The time slices for the spectrogram (blue) and the NTFD (red) for the bat echolocation chirps' signal, at n=150 (left) and n=310 (right)
Figure 5.8 Comparison plots, criteria values vs TFDs, for the test images 1-4, (a) The Shannon entropy measure, (b) Rényi entropy measure, (c) Volume normalized Rényi entropy measure, (d) Ratio of norm based measure, and (e) LJubisa measure.
Figure 5.9 The normalized slices at t = 64 of TFDs. (a) The spectrogram. (b) WD. (c) ZAMD. (d) CWD. (e) BJD. (f) NTFD. First five TFDs (dashed) are compared against the modified B distribution (solid), adopted from Boashash [33].
Figure 5.10 Comparison plots for Boashash TFDs' performance measures vs TFDs, (a) The modified concentration measure (Cn(64)), (b) normalized instantaneous resolution measure (Ri)
Figure 6.1 The flow diagram of the neural network based method.
Figure 6.2 (a) Human's neuron (b) Artificial neuron
LIST OF TABLES
Table 1.1 Synthesis of Main Problems related to QTFDs
Table 3.1 Comparison of Entropy
Table 3.2 Comparison of Training Algorithms
Table 3.3 Impact of varying neurons and hidden layers over entropy of resultant image
Table 3.4 Comparison of Entropies
Table 4.1 Entropy values vs clusters
Table 4.2 Cluster parameters
Table 4.3 Entropy values for various techniques
Table 5.1 Entropy values for various techniques
Table 5.2 Performance Measures Comparison for Various TFDs
Table 5.3 Parameters and the Normalized Instantaneous Resolution Performance Measure of TFDs for the Time Instant t=64
Table 5.4 Parameters and the Modified Instantaneous Concentration Performance Measure of TFDs for the Time Instant t=64
ACRONYMS
TFD Time-Frequency Distribution
TF Time-Frequency
LNN Localized Neural Network
ANN Artificial Neural Network
STSC Signals with time-dependent spectral content
BD Bilinear Distribution
ES Evolutionary Spectrum
EP Evolutionary Periodogram
GTF Generalized Transfer Function
MSE Mean Square Error
WD Wigner-Ville Distribution
CT Cross-Term
2-D Two-Dimensional
IF Instantaneous Frequency
STFT Short Time Fourier Transform
QTFD Quadratic TFD
TVARMA Time-Varying Auto-Regressive Moving Average
NTFD Neural Network TFD
LFM Linear Frequency Modulation
BRNN Bayesian Regularised Neural Network
NENN Network of Expert Neural Network
EMD Empirical Mode Decomposition
CAD Complex Argument Distribution
LMB Levenberg-Marquardt back propagation
EN Expert Neural Network
CDMN Clustering the Data and training Multiple ANNs
FM Frequency Modulation
PBCGB Powell-Beale conjugate gradient backpropagation
RPB Resilient Propagation back propagation
GDALB Gradient descent with adaptive lr backpropagation
BRNNM Bayesian Regularised Neural Network Model
AF Ambiguity Function
LTV Linear Time Varying
DASE Data-Adaptive evolutionary Spectral Estimator
LAF Local Autocorrelation Function
MCE Minimum Cross Entropy
AOK Adaptive Optimal Kernel
OMP Optimized Matching Pursuit
STAF Short Time Ambiguity Function
OKM Optimal Kernel Method
Chapter 1 Introduction
During the last twenty years there has been spectacular growth in the volume
of research on studying and processing signals with time-dependent spectral
content (STSC). For such signals we need techniques that can show the variation in the
frequency of the signal over time. Although some of the methods may not result
in a proper distribution, these techniques are generally known as time-frequency
distributions (TFDs) [34, 35]. TFDs are two-dimensional (2-D) functions which
simultaneously provide temporal and spectral information and thus are used to
analyze STSC. By distributing the signal energy over the time-frequency (TF)
plane, TFDs provide the analyst with information unavailable from the signal's
time or frequency domain representation alone. This includes the number of
components present in the signal, the time durations and frequency bands over which
these components are defined, the components' relative amplitudes, phase
information, and the instantaneous frequency (IF) laws that the components follow in the TF
plane.
There has been a great surge of activity in the past few years in the TF signal
processing domain. The pioneering work in this area was performed by Claasen
and Mecklenbrauker [69]–[71], Janse and Kaizer [73], and Boashash [74]. They
provided the initial impetus, demonstrated useful methods for implementation and
developed ideas uniquely suited to the situation. They also made innovative and efficient
use of the similarities and differences between signal processing fundamentals and
quantum mechanics. Claasen and Mecklenbrauker devised many new ideas and
procedures and developed a comprehensive approach for the study of joint TFDs [69]–[71].
However, Boashash [74] is believed to be the first researcher to have used various TFDs
for real-world problems. He developed a number of new methods and, in particular,
realized that a distribution may not behave properly in all respects or interpretations,
but it can still be used if a particular property such as the IF [27, 31] is well defined.
Escudie [77] and coworkers transcribed directly some of the early quantum mechanical
results, particularly the work on the general class of distributions [68, 78], into
signal analysis. Janse and Kaizer [73] developed innovative theoretical
and practical techniques for the use of TFDs and introduced new methodologies
remarkable in their scope.
Historically, the spectrogram [23, 24, 25] has been the most widely used tool for
the analysis of time-varying spectra. The spectrogram is expressed mathematically
as the squared magnitude of the short-time Fourier transform (STFT) of the signal,
given by¹

$$S(t,\omega) = \left| \int x(\tau)\, h(t-\tau)\, e^{-i\omega\tau}\, d\tau \right|^{2} \tag{1.1}$$

where x(t) is the signal and h(t) is a window function. The basic idea is to
Fourier-analyze a small part of the signal centered around a particular time by means of a
sliding window, and to obtain an energy spectrum as a continuous function of time by
doing so for each instant of time. As long as the signal's chunks themselves do not
contain rapid changes, the results obtained can be used to get a fairly good idea of
the spectral composition of the signal. The selected chunk may be shortened, up to a
limit, if significant changes occur considerably faster. However, finding a suitable
chunk size may not be possible for some signals, like human speech, whose spectral
content changes so rapidly that there may not be any time interval over
which the signal is stationary. Also, the frequency resolution degrades once the chunk
size is reduced in the time domain. Hence there is an inherent tradeoff between time
and frequency resolution [1]. The spectrogram thus has severe drawbacks,
both theoretically, since it provides biased estimators of the signal IF and group delay
(GD), and practically, since the Gabor-Heisenberg inequality [120] makes a tradeoff
between temporal and spectral resolutions unavoidable. However, the STFT and its
variations, being simple and easy to manipulate, are still the primary methods for the analysis
of STSC and the most commonly used today.

¹ Throughout the thesis, both j and i are used for $\sqrt{-1}$ depending upon the mathematical
requirements, and the limits of $\int$ are from $-\infty$ to $+\infty$ unless otherwise specified.
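The window-length tradeoff described above can be illustrated with a minimal numerical sketch. It is not the implementation used in this thesis: the sampling rate, the 100 Hz test tone, the two window lengths, the zero-padding length `NFFT` and the helper names `spectrogram` and `half_power_width_hz` are all assumptions chosen for illustration. The code evaluates a discrete counterpart of (1.1) with a Hamming window and reports, for a short and a long window, the time span covered by each spectrum and the half-power width of the spectral peak.

```python
import numpy as np

NFFT = 4096  # assumed zero-padded FFT length (only interpolates the frequency axis)

def spectrogram(x, win_len, hop=1, nfft=NFFT):
    """Squared magnitude of a discrete STFT of x (Hamming window of length win_len).

    Rows index the window position (time); columns index frequency bins
    spaced fs / nfft apart.
    """
    h = np.hamming(win_len)
    starts = range(0, len(x) - win_len + 1, hop)
    frames = np.array([x[s:s + win_len] * h for s in starts])
    return np.abs(np.fft.rfft(frames, n=nfft, axis=1)) ** 2

def half_power_width_hz(spectrum, fs, nfft=NFFT):
    """Width in Hz of the band lying within half power (3 dB) of the peak."""
    return np.count_nonzero(spectrum >= 0.5 * spectrum.max()) * fs / nfft

fs = 1000.0                              # assumed sampling rate in Hz
t = np.arange(1024) / fs
x = np.cos(2 * np.pi * 100.0 * t)        # stationary 100 Hz test tone

results = {}
for win_len in (32, 256):
    S = spectrogram(x, win_len, hop=win_len // 2)
    results[win_len] = {
        "time_span_s": win_len / fs,                     # duration each spectrum covers
        "freq_width_hz": half_power_width_hz(S[0], fs),  # spectral smearing of the tone
        "peak_hz": np.argmax(S[0]) * fs / NFFT,          # location of the spectral peak
    }
    print(win_len, results[win_len])
```

Under these assumptions the short window localizes events to a much shorter time span but smears the tone over a much wider band, while the long window does the opposite; neither choice improves both resolutions at once.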
There are other approaches [34, 35, 80] motivated by the desire to improve upon the
spectrogram and to clarify the physical and mathematical ideas needed
to understand the time-varying spectrum. These techniques generally aim at devising a
joint function of time and frequency, a distribution that is highly concentrated
along the IFs present in a signal and free of cross terms (CTs), thus exhibiting good
resolution. One form of TFD can be formulated by the multiplicative comparison of
a signal with itself, expanded in different directions about each point in time. Such
formulations are known as quadratic TFDs (QTFDs) because the representation is
quadratic in the signal. This formulation was first described by Eugene Wigner in
quantum mechanics [2] and introduced into signal analysis by Ville [3] to form what
is now known as the Wigner-Ville distribution (WD). The WD is the prototype of
distributions that are qualitatively different from the spectrogram, and it produces the ideal
energy concentration along the IF for linear frequency modulated (LFM) signals.
It is given by
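$$W(t, \omega) \triangleq \frac{1}{2\pi} \int s^{*}\left(t - \tfrac{1}{2}\tau\right) s\left(t + \tfrac{1}{2}\tau\right) e^{-i\tau\omega} \, d\tau \qquad (1.2)$$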
where s(t) is the signal. The distribution is said to be bilinear in the signal because
the signal enters twice in its calculation. The WD preserves the time and frequency
energy marginals of a signal with high TF concentration. It can be argued that more
concentration than in the WD would be undesirable in the sense that it would not
preserve the TF marginals.
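As a minimal sketch of Eq. (1.2) in discrete time, the following assumes NumPy and an
analytic (complex) test signal. The lag product x[n+m] x*[n-m] uses an integer lag m,
which corresponds to a total lag of 2m samples, so frequency bin k maps to the normalised
frequency k/(2N) cycles per sample. The function name wigner_ville and the chirp
parameters are assumptions chosen for illustration, not values taken from this work.

```python
import numpy as np

def wigner_ville(x):
    """Discrete Wigner-Ville distribution of a complex signal.

    Returns an (N, N) real array indexed as [time sample, frequency bin],
    where bin k corresponds to k / (2 N) cycles per sample.
    """
    x = np.asarray(x, dtype=complex)
    n_samples = x.size
    wvd = np.zeros((n_samples, n_samples))
    for n in range(n_samples):
        # Largest symmetric lag that stays inside the signal at time n.
        max_lag = min(n, n_samples - 1 - n)
        lags = np.arange(-max_lag, max_lag + 1)
        kernel = np.zeros(n_samples, dtype=complex)
        # Negative lags wrap to the end of the buffer, as the FFT expects.
        kernel[lags % n_samples] = x[n + lags] * np.conj(x[n - lags])
        # The lag product is conjugate-symmetric, so its FFT is real.
        wvd[n] = np.fft.fft(kernel).real
    return wvd

# LFM test signal: IF(n) = start_freq + chirp_rate * n (cycles per sample).
n_samples = 128
n = np.arange(n_samples)
start_freq, chirp_rate = 0.125, 1.0 / 512
chirp = np.exp(2j * np.pi * (start_freq * n + 0.5 * chirp_rate * n**2))

wvd = wigner_ville(chirp)
ridge = np.argmax(wvd, axis=1) / (2.0 * n_samples)   # estimated IF per time sample
true_if = start_freq + chirp_rate * n
```

Away from the signal edges, where few lags are available, the ridge of the distribution
follows the linear IF law to within half a frequency bin, which is the LFM concentration
property stated above.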
It is found that the spectrogram results in a blurred version [1, 31]. The blurring can
be reduced to some degree by use of an adaptive window or by a combination of
spectrograms. On the other hand, the use of the WD in practical applications is limited
by the presence of nonnegligible CTs, resulting from interactions between signal
components. These CTs may lead to an erroneous visual interpretation of the signal's TF
structure, and are also a hindrance to pattern recognition, since they may overlap with
the searched TF pattern. Moreover, if the IF variations are non-linear, the WD
cannot produce the ideal concentration. Such impediments pose difficulties in the
correct analysis of the STSC. They have been dealt with in various ways, and
historically many techniques have been developed to remove them partially or
completely. They were partly addressed by the development of the Choi-Williams
distribution [144] in 1989, followed by numerous ideas proposed in the literature with
the aim of improving the TFDs' concentration and resolution for practical analysis
[17, 26, 31, 33, 90]. Other important non-stationary representations among Cohen's
class [1, 68, 135] of bilinear TF energy distributions include the Margenau-Hill
distribution [8], their smoothed versions [69]-[71] and [75, 76], and many others with
reduced CTs [143]-[146]. At nearly the same time, some authors also proposed other
time-varying signal analysis tools based on a concept of scale rather than frequency,
such as the scalogram [9, 10] (the squared modulus of the wavelet transform), the
affine smoothed pseudo WD [12] or the Bertrand distribution [13]. The theoretical
properties and the application fields of this large variety of existing methods
are now well determined and widespread [1], [69]-[71], [86]. Although many
other QTFDs have been proposed in the literature (an alphabetical list can be found
in [209]), no single QTFD can be effectively used in all possible applications. This
is because different QTFDs suffer from one or more problems.
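The origin of the CTs can be made explicit with a short worked example, given here as an
illustration. Assume a two-component signal s(t) = e^{i\omega_1 t} + e^{i\omega_2 t} (a
choice made only for simplicity). Substituting into Eq. (1.2) gives two auto-components
and one interference term:

$$W(t, \omega) = \delta(\omega - \omega_1) + \delta(\omega - \omega_2) + 2\,\delta\!\left(\omega - \tfrac{\omega_1 + \omega_2}{2}\right) \cos\big((\omega_2 - \omega_1)\, t\big)$$

The third term is the CT. It lies midway between the two components in frequency,
oscillates in time at the difference frequency, and has a peak amplitude twice that of
each auto-component, which is why it can be mistaken for a genuine signal component.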
Nevertheless, a critical point of these methods is their readability, which means
both a good concentration of the signal components and no misleading interference
terms. This characteristic is necessary for an easy visual interpretation of their
outcomes and a good discrimination between known patterns for nonstationary signal
classification tasks. An ideal TFD function roughly requires the following four
properties:
1. High clarity, which makes the TFD easier to analyze. This requires high
concentration and good resolution along the individual components of
multicomponent signals. Consequently the resultant TFDs are de-blurred.
2. Elimination of CTs, which avoids confusion between noise and real components
in a TFD for nonlinear TF structures and multicomponent signals.
3. Good mathematical properties, which benefit its application. This requires
the TFD to satisfy the total energy constraint, the marginal characteristics,
positivity, etc. Positive distributions are everywhere nonnegative, and yield
the correct univariate marginal distributions in time and frequency.
4. Low computational complexity, that is, a short time needed to represent a signal
on the TF plane. Signature discontinuity and weak signal mitigation may
increase the computational complexity in some cases.
A comparison of some popular TFD functions is presented in Table 1.1.

Table 1.1: Synthesis of Main Problems related to QTFDs

Synthesis of Major Concerns | Gabor transform | WD                                                             | Gabor-Wigner transform
----------------------------+-----------------+----------------------------------------------------------------+-----------------------
Clarity                     | Worst           | Best                                                           | Reasonably Good
CTs                         | Nil             | Present for multicomponent signals and nonlinear TF structures | Almost eliminated
Mathematical properties     | Unsatisfactory  | Satisfactory                                                   | Good
Computational complexity    | Quite Low       | High                                                           | Higher

To analyze the signals well, choosing an appropriate TFD function is important. Which
TFD function should be used depends on the application. On the other hand, the
shortcomings make specific TFDs suited only for analyzing STSC with specific types of
properties and TF structures. An obvious question then arises: which distribution is
the "best" for a particular situation? Generally there is an attempt to set up a set of
desirable conditions and to try to prove that only one distribution fits them.
Typically, however, the list is not complete with the obvious requirements, because the
author knows that the added desirable properties would not be satisfied by the
distribution he or she is advocating. Also, these lists very often contain requirements
that are questionable and are obviously put in to force an issue. As an illustration,
by focusing on the WD and its variants, Jones and Parks [100] have made an interesting
comparative study of the resolution properties and have shown that the relative
performance of the various distributions depends on the signal. The results show that
the pseudo WD (PWD) is best for signals with only one frequency component at any one
time, the Choi-Williams distribution is most attractive for multicomponent signals in
which all components have constant frequency content, and the matched filter STFT is
best for signal components with significant frequency modulation. Jones and Parks have
concluded that no TFD can be considered the best approach for all TF analysis and that
concentration and resolution cannot both be improved at the same time.
Midway through this decade, there has been an enormous amount of work towards
achieving high concentration along the individual components and enhancing the
ease of identifying closely spaced components in the TFDs. The aim has been
to correctly interpret the fundamental nature of the STSC under analysis in the TF
domain. There have been three open trends that make this task inherently more
complex:
1. Concentration and resolution tradeoff
2. Application specific environment
3. Objective assessment of TFDs
1.1 Concentration and Resolution Tradeoff
The tradeoff between concentration and CT removal is a classical problem. The
concepts of concentration and resolution have generally been used synonymously and
are considered equivalent in the literature. This may be true for monocomponent
signals, but for multicomponent signals this is not necessarily the case and we need to
establish a clear distinction between the two terms. As an illustration, the CTs in the
WD do not reduce the auto-component concentration of the WD, which is considered
optimal, but they do reduce the resolution. Although high signal concentration
is always desired and is often of primary importance, in many applications signal
resolution may be more important, especially in the analysis of multicomponent signals.
Consequently these two aspects can be efficiently utilized for adaptive and automatic
parameter selection and optimization in TF analysis, without intervention by a user
[100].
1.2 Application Specific Environment
Different applications have different preferences and requirements for the TFDs. In
general the choice of a TFD in a particular situation depends on many factors such as
the relevance of the properties satisfied by the TFD, its computational cost and speed,
and the tradeoffs in using it. For example, results that are claimed optimal for a
situation, being highly concentrated but with weak signal mitigation and a
discontinuous signature, may not be feasible for certain applications.
1.3 Objective Assessment of TFDs
It is a fact that choosing the right TFD to analyze the given signal is not
straightforward, even for monocomponent signals, and becomes more complex when dealing
with multicomponent signals. The common practice to determine the best TFD for
a given signal has been the visual comparison of all plots and the choice of the most
appealing one. However, this selection is generally difficult and subjective. The need
to objectively compare the various TFDs requires the introduction of some quantitative
performance measure specifically tailored for TFDs.
The estimation of signal information and complexity in the TF plane is quite
challenging. The themes that inspire new measures for estimating signal information
and complexity in the TF plane include CT suppression, concentration
and resolution of auto-components, and the ability to correctly distinguish closely
spaced components. Efficient concentration and resolution measurement can provide
a quantitative criterion to evaluate the performance of different distributions. Such
measures conform closely to the notion of complexity that is used when visually
inspecting TF images [37].
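To make the idea of a quantitative concentration measure concrete, the sketch below
computes the Renyi entropy of a normalised TFD, a measure often used in the TF
literature. It is given as an assumed example, without implying that it is the specific
criterion adopted later in this thesis. The order alpha = 3, the function name and the
two toy 8 x 8 distributions are illustrative choices. A lower value indicates energy
concentrated in fewer TF cells.

```python
import numpy as np

def renyi_entropy(tfd, alpha=3.0):
    """Renyi entropy (in bits) of a TFD normalised to unit sum."""
    tfd = np.asarray(tfd, dtype=float)
    p = tfd / tfd.sum()
    return float(np.log2(np.sum(p ** alpha)) / (1.0 - alpha))

# Two toy 8x8 "TFDs" carrying the same total energy.
blurred = np.full((8, 8), 1.0 / 64)                  # energy spread over all 64 cells
concentrated = np.zeros((8, 8))
concentrated[np.arange(8), np.arange(8)] = 1.0 / 8   # energy along a single ridge

print(renyi_entropy(blurred))        # log2(64) = 6 bits
print(renyi_entropy(concentrated))   # log2(8)  = 3 bits
```

The ridge-like distribution scores three bits lower than the uniformly spread one, which
matches the visual impression that it is the sharper of the two.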
1.4 Our Approach
This thesis proposes a novel artificial neural network (ANN) based framework that
estimates de-blurred TFDs of different signals by taking advantage of the
ANN learning capabilities. A number of research papers based on the proposed idea have
been published, which is evidence of its novelty. A list is provided
in Appendix B. The removal of TFDs' distortions is treated as a case of image
de-blurring. This is particularly suited to learning [218] by an ANN for the following
reasons [109]:
• There is little information available on the source of blurring.
• Usually blurring is the result of a combination of events, which makes it too
complex to be described mathematically.
• Sufficient data are available, and it is conceivable that the data capture the
fundamental principle at work.
The method fundamentally involves training a set of suitably chosen ANNs
with the spectrograms of known signals as the input, and processed WDs as the
target. Judiciously selected signals having time-varying frequency components are
employed for the training purposes and the trained ANN model then provides the
de-blurred TFDs from spectrograms of unknown signals.
1.5 Contribution
The work presented in this thesis contributes in three ways to the research field of TF
signal processing, as discussed below.
1.5.1 Review of High Resolution TF methods
In the first part of the thesis, a review is presented of the TF methods proposed in
the last decades for obtaining high concentration and good resolution in a TFD.
Though the task is ambitious, this part is important for the signal processing research
community for the following reasons:
1. It provides the basic concepts and well-tested algorithms to obtain highly
concentrated and good resolution TFDs for the research community. The emphasis
is given to the ideas and methods that have been developed steadily, so that they
are readily understood by the uninitiated,
2. It highlights unresolved issues with stress on the fundamentals to make it
interesting for an expert as well. The approaches are presented in a logical
rather than historical sequence, developing the ideas and techniques step by step,
and
3. It attempts to clearly describe what a time-varying spectrum is, and discusses
important aspects of representing properties of signals simultaneously in time and
frequency without any ambiguity.
1.5.2 A Novel ANN based Framework
The second contribution of the thesis is the implementation of a novel ANN based
method for computing highly informative TFDs. The proposed method provides
a way to obtain a non-blurred and high resolution version of the TFDs of signals
whose frequency components vary with time. The resulting TFDs do not have the
CTs that appear in the case of multicomponent signals in some distributions such as the
WD, thus providing a visual way to determine the IF of nonstationary signals. It is
demonstrated that
1. ANN learning capabilities can be successfully used in the TF field, where they
have not been applied before,
2. The BRNNM is effective in estimating good resolution and highly
concentrated TFDs. The degree of regularisation is automatically controlled
in the Bayesian inference framework, which produces networks with better
generalised performance and lower susceptibility to over-fitting,
3. Clustering the data based on the underlying TF images' characteristics is
useful,
4. Training multiple networks for each cluster and selecting the best is
advantageous, and
5. A mixture of expert networks (ENs), each focused on a specific task, delivers
a TFD that is highly concentrated along the IF with no CTs, as compared
to training ANNs that do not receive the selected input.
1.5.3 The Objective Assessment
The third contribution is the exploration of TFDs' performance measures and
dimensions. A modification of an existing concentration measure is suggested and used
to get a true picture of the performance of the TFDs presented in [33]. It is brought
out that
1. An efficient TFD concentration and resolution measure can provide a quantitative
criterion to evaluate the performance of different distributions,
2. Such measures can be used for adaptive and automatic parameter selection, and
3. The selected objective criteria are valid in effectively describing
the TFDs' concentration and resolution performance.
1.6 Overview of the Thesis
The thesis is organised as follows.
Chapter 2 gives the background on high resolution TF analysis and is organised
in three parts. The first part covers two separate areas of interest. The former
details those methods that are based on the evolutionary spectrum to
improve the TF picture, whereas the latter presents the work to obtain high
concentration and good resolution TFDs based on Cohen's bilinear class. The ideas of
concentration and resolution are presented in greater detail, and a brief overview of
the best-known open issues considered by the academic and industrial communities
is also given. The second part discusses the necessary ANN fundamentals, on
which the proposed technique to obtain de-blurred TFDs is built. Part three
describes the most popular objective criteria, which are used to evaluate
the TFDs' performance.
In Chapter 3, we progressively recount, in various sections, the work towards
the goal of improving the TFDs' resolution and concentration. The Chapter formulates
the problem, states various constraints and presents a novel ANN based framework
to de-blur the TFDs with few limitations. It presents the comparison of ANN
training algorithms and proceeds with an experimental study on optimizing the
design, architecture and various parameters of the ANN setup, such as the number of
neurons, the number of layers, and the type of activation functions. More importantly,
the Chapter discusses the advantage of training multiple ANNs and clustering the data.
In Chapter 4, the optimized ANN based framework is presented for the realization
of highly concentrated and good resolution TFDs, catering for the limitations
assumed in the previous Chapter. The proposed technique makes use of the principles
of vectorization, correlation and taxonomy. This ANN based technique is evaluated
by comparing it with other related work, and the pre-eminence of its results over other
techniques is shown. The role of appropriately vectorized and clustered data and the
effect of regularization under MacKay's evidence framework on the training process
are described. In the final part of the chapter, a real life signal is considered to
check the effectiveness of the proposed algorithm via analysis based on entropy and
visual interpretation.
Chapter 5 provides the discussion on the experimental results by considering
various synthetic and real life signals. A number of theoretical results are presented
for the validation, soundness and completeness of the proposed framework. The
Chapter uses objective methods of assessment to evaluate the performance of the TFDs
de-blurred by the BRNNM. A performance comparison with various other quadratic
TFDs is provided too.
Finally, Chapter 6 concludes the thesis. The limitations and future work are
discussed. Possible extensions of the framework are proposed, the most interesting
being the derivation of a mathematical expression for the IF and the satisfaction of
the marginals and other TFD constraints.
Chapter 2
Background
The chapter discusses the background of this research work for computing
de-blurred TFDs, and is divided in two parts. The first part gives the basic concepts
and well-tested algorithms to obtain highly concentrated and good resolution TFDs (see
footnote 2). The emphasis remains on the ideas and methods that have been developed
steadily, so that they are readily understood by the uninitiated. An effort is made to
highlight unresolved issues with stress on the fundamentals to make it interesting for
an expert as well. The approaches are presented in a logical rather than historical
sequence. The individual sections may be read independently.
The ANN fundamentals have been placed in Appendix A. The parameters of the ANN
based setup are optimized in the succeeding Chapters. The second part gives the
necessary theoretical and mathematical description of the TFDs' objective assessment
methods that have been proposed in the past few years.
Footnote 2: Although new ideas are coming up rapidly, we cannot discuss all of them due to space limitations.
2.1 TF Analysis
The concepts of concentration and resolution have generally been used synonymously
and are considered equivalent in the literature. This may be true for monocomponent
signals, but for multicomponent signals this is not necessarily the case and a clear
distinction between the two terms is needed. As an illustration, the CTs in the WD
do not reduce the auto-component concentration of the WD, which is considered
optimal, but they do reduce the resolution. Although high signal concentration is
always desired and is often of primary importance, in many applications signal
resolution may be more important, especially in the analysis of multicomponent signals.
There have generally been two approaches to estimate the time-dependent spectrum
of nonstationary processes:
1. The evolutionary spectrum (ES) approaches [38]-[41], which model the
spectrum as a slowly varying envelope of a complex sinusoid.
2. Cohen's bilinear distributions (BDs) [31], including the spectrogram, which
provide a general formulation for joint TFDs. Computationally, the ES methods
fall within Cohen's class.
There are known limitations and inherent drawbacks associated with these classical
approaches. These phenomena make their interpretation difficult. Consequently,
estimation of spectra in the TF domain displaying good resolution and high
concentration has become a research topic of great interest (see footnote 3). In
Section 2.1.1 the high TF resolution approaches based on the ES theory are presented,
and Section 2.1.2 discusses the approaches based on Cohen's BDs.
Footnote 3: Due to the limitation of space, only a brief account of the various high resolution TF methods is presented here. For greater details and the best simulation results, refer to [255].
2.1.1 The Methods based on Evolutionary Spectrum
The ES was first proposed by Priestley in 1965. The basic idea is to extend the
classic Fourier spectral analysis to a more generalized basis: from sine or cosine
to a family of orthogonal functions. In his evolutionary spectral theory, Priestley
represents nonstationary signals using a general class of oscillatory functions and
then defines the spectrum based on this representation [229]. A special case of the
ES uses the Wold-Cramer representation of nonstationary processes [230]-[233] to
obtain a unique definition of the time-dependent spectral density function. The main
objective in deriving and presenting these relations in [234] was to show that the BDs
and the spectrogram can be estimators of the ES. These relations also make it
possible to represent the different TFDs in terms of the generalized transfer function
(GTF), allowing the GTF or the ES to be recovered from them.
A great amount of work in this area is due to Pitton and Loughlin [242]-[250].
They have investigated the positive TFDs and their potential applications,
utilizing the ES and Thomson's multitaper approach [240, 241], but they do not
discuss the issue of their concentration and resolution as such. Positive distributions
are everywhere nonnegative, and yield the correct univariate marginal distributions
in time and frequency.
The literature indicates that the pioneering work was performed by Chaparro,
Jaroudi, Kayhan, Akan, and Suleesathira. These researchers have not only focused on
computing improved evolutionary spectra of non-stationary signals but also
innovatively applied the concepts in various practical situations [38]-[64].
Their major work includes signal-adaptive evolutionary spectral analysis and a
parametric approach for data-adaptive evolutionary spectral estimation. An interesting
work is performed by Jachan, Matz and Hlawatsch [49] on parametric estimation
for underspread nonstationary random processes.
2.1.1.1 Signal-Adaptive Evolutionary Spectral Analysis
Although it is well recognized that the spectra of most signals found in practical
applications depend on time, estimation of these spectra displaying good TF resolution
is difficult [31]. The problem lies in the adaptation of the analysis methods to the
change of frequency in the signal components. Constant-bandwidth methods, such as
the spectrogram and the traditional Gabor expansion [67], provide estimates with poor
TF resolution.
The earlier approaches by Akan and Chaparro to obtain high resolution evolutionary
spectral estimates include averaging estimates obtained using multiple windows
[48] and maximizing an energy concentration measure [41]. A modified Gabor
expansion is proposed in [41] that uses multiple windows, dependent on different
scales and modulated by linear chirps. Computation of the ES with this expansion
provides estimates with good TF resolution. The difficulties encountered, however,
were in the choice of scales and in the implementation of the chirping.
2.1.1.2 Data-Adaptive Evolutionary Spectral Estimation
Although the ES theory is mathematically well grounded, it has suffered from
a shortage of estimators. The initial work of Kayhan concentrates on the evolutionary
periodogram (EP) as an estimator along the lines of the BDs. The latest work, however,
follows a parametric approach in deriving a high quality estimator for the ES [38,
39]. Parametric approaches that model the non-stationary signal using rational models
with time-varying coefficients represented as expansions of orthogonal polynomials
have been proposed by various investigators, e.g., [235, 236]. However, the validity
of their view of a nonstationary spectrum as a concatenation of "frozen-time" spectra
has been questioned [232, 237].
In their earlier effort, Kayhan, Jaroudi and Chaparro [38, 65] proposed the EP
as an estimator of the Wold-Cramer ES. The EP is found to possess many desirable
properties and reduces to the conventional periodogram in the stationary case.
However, it relied on some unrealistic assumptions, such as treating the signal
components as uncorrelated. This led to the development of a data-adaptive evolutionary
spectral estimator to improve the performance [39].
The proposed estimator uses information about the signal components at fre-
quencies other than the frequency of interest. It computes the spectrum at each
frequency while minimizing the interference from components at other frequencies
without making any assumptions regarding these components. This estimator re-
duces to Capon's maximum likelihood method [239] in the stationary case. This new
estimator has better TF resolution than the EP and it possesses many desirable prop-
erties analogous to those of Capon's method. In particular, it performs more robustly
than existing methods when the data is noisy.
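For reference, the stationary limiting case named above, Capon's method, can be sketched
as follows. This is a minimal illustration assuming NumPy. It is the classical stationary
Capon estimator, P(f) = 1 / (e(f)^H R^{-1} e(f)), and not the data-adaptive evolutionary
estimator of [39]. The function name, the model order, the diagonal loading and the test
signal are all assumptions made for the example.

```python
import numpy as np

def capon_spectrum(x, order, freqs, loading=1e-6):
    """Capon (minimum variance) spectrum of a stationary complex signal.

    freqs are normalised frequencies in cycles per sample.
    """
    x = np.asarray(x, dtype=complex)
    # Rows are length-`order` snapshots [x[i], ..., x[i + order - 1]].
    snapshots = np.array([x[i:i + order] for i in range(x.size - order + 1)])
    # Sample covariance R = mean of (snapshot)(snapshot)^H.
    cov = snapshots.T @ snapshots.conj() / snapshots.shape[0]
    # Small diagonal loading keeps the matrix well conditioned.
    cov += loading * np.trace(cov).real / order * np.eye(order)
    cov_inv = np.linalg.inv(cov)
    lags = np.arange(order)
    power = np.empty(len(freqs))
    for i, f in enumerate(freqs):
        steer = np.exp(2j * np.pi * f * lags)          # steering vector e(f)
        power[i] = 1.0 / np.real(steer.conj() @ cov_inv @ steer)
    return power

rng = np.random.default_rng(0)
n = np.arange(256)
noise = 0.1 * (rng.standard_normal(n.size) + 1j * rng.standard_normal(n.size))
x = np.exp(2j * np.pi * 0.2 * n) + noise      # tone at 0.2 cycles/sample plus noise

freqs = np.linspace(0.0, 0.5, 51)             # grid with 0.01 cycles/sample spacing
power = capon_spectrum(x, order=8, freqs=freqs)
peak_freq = freqs[np.argmax(power)]           # location of the spectral peak
```

The filter at each frequency passes that frequency with unit gain while minimising the
output power contributed by all other frequencies, which is the interference-minimising
behaviour described in the paragraph above.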
2.1.1.3 Miscellaneous Techniques and Applications
A considerable amount of work has been performed by a number of researchers in
achieving good resolution ES and applying the results and related theory to many
fields, especially where nonstationary signals arise. The purpose of their work has
ranged from the simple graphic presentation of the results to sophisticated
manipulations of spectra. Suleesathira, Chaparro and Akan [44] propose a transformation
for discrete signals with time-varying spectra. The kernel of this transformation
provides the energy density of the signal in TF with good resolution qualities. With
this discrete evolutionary transform, a clear representation of the signal as well as
its TF energy density is obtained. It is suggested to use either the Gabor or the Malvar
discrete signal representations to obtain the kernel of the transformation. Signal
adaptive analysis is then possible using modulated or chirped bases, and can be
implemented with either masking or image segmentation on the TF plane.
The discrete evolutionary and Hough transforms are innovatively used in jammer
excision techniques for spread spectrum communication systems, for finding an
unambiguous IF estimate for a jammer composed of chirps. This interesting approach
is a piecewise linear approximation of the IF, concentrated along the individual
components of the signal, using the Hough transform (used in image processing to infer
the presence of lines or curves in an image) and the ES [45]. The efficiency and
practicality of this approach lie in localized processing, linearization of the IF
estimate, recursive correction, and minimal problems due to CTs in the TFDs or in the
matching of parametric models.
Barbarossa in [103] proposed a combination of the WD and the Hough transform
for detection and parameter estimation of chirp signals, posed as a problem of
detecting lines in an image, which is the WD of the signal under analysis. This method
provides a bridge between signal and image processing techniques, is asymptotically
efficient, and offers a good rejection capability of the CTs, but it has an increased
computational complexity. Barbarossa et al. further proposed an adaptive method
for suppressing wideband interference in spread spectrum communications based
on a high resolution TFD of the received signal [46]. The approach is based on the
generalized Wigner-Hough transform as an effective way to estimate a clear picture
of the IF of parametric signals embedded in noise. The proposed method provides
the following advantages: (1) it is able to reliably estimate the interference
parameters at lower SNR, exploiting the signal model, and (2) the despreading filter is
optimal and takes into account the presence of the excision filter. The disadvantage of
the proposed method, besides the higher computational cost, is that it is not robust
against mismatch between the observed data and the assumed model.
Chaparro and Alshehri [47] innovatively obtain better spectral estimates and
use them for jammer excision in direct sequence spread spectrum communications
when the jammers cannot be parametrically characterized. The non-stationary signals
are represented using the TF and the frequency-frequency evolutionary transformations.
One of the methods, based on the frequency-frequency representation
of the received signal, uses a deterministic masking approach, while the other, based
on non-stationary Wiener filtering, reduces interference in a mean-square fashion.
Both of these approaches use the fact that the spreading sequence is known at the
transmitter and the receiver, and that as such its evolutionary representation can be
used to estimate the sent bit. The difference in performance between these two
approaches depends on the support rather than on the type of jammer being excised.
The frequency-frequency masking approach is found to work well when the jammer
is narrowly concentrated in parts of the frequency-frequency plane, while the
Wiener masking approach works well in situations when the jammer is spread over
all frequencies.
Shah, Loughlin and Chaparro [245] have developed a method for generating
an informative prior when constructing a positive TFD by the method of minimum
cross-entropy (MCE). This prior results in a more informative MCE-TFD, as quantified
via entropy and mutual information measures. The procedure allows any of the BDs to be
used in the prior, and the TFDs obtained by this procedure are close to the ones
obtained by the deconvolution procedure at reduced computational cost. Further, Shah
and Chaparro [64, 65] make use of the TFDs for the estimation of the GTF of an LTV
filter, with the goal of finding a GTF that, once blurred, produces the TFD estimate.
They used the fact that many of these distributions can be written as blurred versions
of the GTF and made use of a deconvolution technique to obtain the de-blurred GTF. The
technique is found to be general and can be based on any TFD, with many advantages:
(i) it estimates the GTF without the need for the orthonormal expansion used in other
estimators of the ES, (ii) it does not require the semi-stationarity assumption used in
the existing deconvolution techniques, (iii) it can be used on many TFDs, (iv) the GTF
obtained can be used to reconstruct the signal and to model LTV systems, and (v) the
resulting ES estimate outperforms the ES obtained by using the existing estimation
techniques and can be made to satisfy the TF marginals while maintaining positivity.
The power spectral density of a signal, calculated from the second order
statistics, can provide valuable information for the characterization of stationary
signals. This information is sufficient only for Gaussian and linear processes,
whereas most real-life signals, such as biomedical, speech, and seismic signals, may
have non-Gaussian, non-linear and non-stationary properties. Addressing this issue,
Unsal, Akan and Chaparro [66] combine the higher order statistics and the TF
approaches, and present a method for the calculation of a time-dependent bispectrum
based on the positive distributed ES. This idea is particularly useful for analyzing
the time-varying properties of such non-stationary signals.
2.1.2 The Methods based on Cohen's Bilinear Class
Cohen's BDs can be obtained from the general expression
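$$C(t, \omega) = \frac{1}{4\pi^{2}} \iiint s^{*}\left(\mu - \tfrac{1}{2}\tau\right) s\left(\mu + \tfrac{1}{2}\tau\right) \phi(\theta, \tau) \, e^{-i\theta t - i\tau\omega + i\theta\mu} \, d\mu \, d\tau \, d\theta \qquad (2.1)$$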
where C(t, ω) is the joint distribution of the signal s(t) and φ(θ, τ) is called the
kernel. The term kernel was coined by Claasen and Mecklenbräuker [69]-[71]. Together
with Janssen [72], they made extensive contributions to the general understanding of
these distributions in the signal analysis context.
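As a consistency check between Eq. (2.1) and Eq. (1.2), given here as an illustrative
derivation, take the kernel to be unity. With \phi(\theta, \tau) denoting the kernel and
\mu the time integration variable, the integral over \theta gives
\int e^{i\theta(\mu - t)} \, d\theta = 2\pi \, \delta(\mu - t), so that

$$C(t, \omega)\Big|_{\phi = 1} = \frac{1}{4\pi^{2}} \iint s^{*}\left(\mu - \tfrac{1}{2}\tau\right) s\left(\mu + \tfrac{1}{2}\tau\right) 2\pi \, \delta(\mu - t) \, e^{-i\tau\omega} \, d\mu \, d\tau = \frac{1}{2\pi} \int s^{*}\left(t - \tfrac{1}{2}\tau\right) s\left(t + \tfrac{1}{2}\tau\right) e^{-i\tau\omega} \, d\tau = W(t, \omega)$$

The WD is therefore the member of Cohen's class with unit kernel. Other members follow
from other kernels. For example, the Choi-Williams distribution corresponds to
\phi(\theta, \tau) = e^{-\theta^{2}\tau^{2}/\sigma}, where \sigma > 0 controls the degree
of CT suppression.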
Many divergent attitudes toward the use, meaning, interpretation, and, most
importantly, improvement of the spectral quality of Cohen's BDs have arisen over the
years. The divergent viewpoints and interests have led to a better understanding and
implementation. The subject is evolving rapidly and most of the issues are open.
However, it is important to understand the ideas and arguments that have been given, as
variations of and insights into them have led the way to further developments.
2.1.2.1 The Scaled-Variant Distribution - A TFD Concentrated along the IF
In an important set of papers, Stankovic and co-workers [79, 88, 89] innovatively
used the similarities and differences with quantum mechanics and originated
many new ideas and procedures to achieve good resolution and high concentration
in the joint distributions. Their initial work suggests the use of the polynomial WD
[17, 18] to improve the concentration of monocomponent signals, taking the IF as
a polynomial function of time. A similar idea for improving the distribution
concentration of a signal whose phase is polynomial up to the fourth order is presented
in [80]. In order to improve the distribution concentration for a signal with an
arbitrary non-linear IF, the L-Wigner distribution (LWD) is proposed and studied in
[15, 82, 83]. The polynomial WD, as well as the LWD, is closely related to the
time-varying higher order spectra [18, 83, 85]. They were found to satisfy only the
generalized forms of the marginals and to be unable to preserve the usual marginal
properties [1, 86].
The recent work by Stankovic is a variant of the LWD obtained mainly by scaling the
phase while keeping the signal amplitudes unchanged [79, 90]. This new distribution is
termed the scaled variant of the LWD (SD) of a signal. The word "pseudo" is used to
indicate the presence of the window. The distribution achieves high concentration at the
IF, as high as that of the LWD, while at the same time satisfying the time marginal and
the unbiased energy condition for any L. The frequency marginal is satisfied for
asymptotic signals as well.
A method for the direct realization of the SD, based on the straightforward application
of the distribution definition, is presented in [89]. In the case of multicomponent
signals, this method produces signal power concentrated at the resulting IF, according to
the theorem presented in [79]. The theory is illustrated with numerical examples of
multicomponent real signals. The proposed distributions may achieve arbitrarily high
concentration at the IF while satisfying the marginal properties. Until the publication
of [89], this was possible only in the very special case of LFM signals using the WD.
2.1.2.2 Reassigned TFDs
Some TFDs were proposed to adapt to the TF changes of the signal. In particular, an
adaptive TFD can be obtained by estimating some pertinent parameters of a
signal-dependent function at different time intervals [209]. Such TFDs provide highly
localized representations without suffering from the CTs of QTFDs. The trade-off is that
these TFDs may not satisfy some desirable properties such as energy preservation.
Examples of adaptive TFDs include the high resolution TFD [112], the signal-adaptive
optimal-kernel TFDs [131, 133], the optimal radially Gaussian TFD [132] and Cohen's
nonnegative distribution [135]. Reassigned TFDs also adapt to the signal by employing
other QTFDs of the signal such as the spectrogram, the WD or the scalogram [93]-[99]. The
former types of adaptive TFDs are discussed under the name Optimal-Kernel TFDs in the
following section.
The method of reassignment improves TF concentration and resolution by mapping the data
to TF coordinates that are closer to the true region of support of the signal under
consideration. The method has been presented by several researchers under different
names [93]-[99], including the method of reassignment, remapping, TF reassignment, and
the modified moving-window method.
The reassignment method. The classical work on the method of reassignment was first
done by Kodera, Gendrin, and de Villedary, who gave it the name modified moving window
method [96]. The technique enhances the time and frequency resolution of the spectrogram
by assigning each data point to a new TF coordinate that better reflects the distribution
of energy in the analyzed signal. This modification of the spectrogram remained unused
because of implementation and efficiency issues. Later, Auger and Flandrin [93] showed
that the method can be applied advantageously to all bilinear TF and time-scale
representations, and called it the reassignment method. Nelson also arrived at a
method, similar to that of Kodera et al., for improving the TF precision of short-time
spectral data from partial derivatives of the short-time phase spectrum [97].
2.1.2.3 Optimal-Kernel TFDs.
A QTFD can be obtained by first weighting the symmetric ambiguity function (AF) by the
kernel function and then taking a 2-D FT of the result. This is equivalent to 2-D
filtering in the ambiguity domain. The properties of a distribution are reflected by
simple constraints on the kernel, and this has been used advantageously to develop
practical methods for analysis and filtering, as was done by Eichmann and Dong [16].
Excellent reviews relating the properties of the kernel to the properties of the
distribution have been given by Janse and Kaizer [73], Janssen [72], Claasen and
Mecklenbräuker [71], and Boashash [34]. By examining the kernel one can readily
ascertain the properties of the distribution. This allows one to pick and choose those
kernels that produce distributions with prescribed, desirable properties. Thus, by a
proper choice of kernel function, one can reduce or remove the CTs in the analysis of
multicomponent signals. This unified approach is simple, with the advantage that all
distributions can be studied together in a consistent way.
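
In symbols, the ambiguity-domain route described above can be sketched as follows, using the sign conventions of (2.1) with $\mu$ as the integration variable (a notational assumption). The Choi-Williams kernel with width parameter $\sigma$ is included only as a familiar example of a CT-reducing kernel; it is not singled out by the discussion above.

```latex
A(\theta,\tau) = \int s^{*}\!\left(\mu - \tfrac{1}{2}\tau\right)
                      s\!\left(\mu + \tfrac{1}{2}\tau\right) e^{i\theta\mu}\, d\mu ,
\qquad
C(t,\omega) = \frac{1}{4\pi^{2}} \iint \phi(\theta,\tau)\, A(\theta,\tau)\,
              e^{-i\theta t - i\tau\omega}\, d\theta\, d\tau ,
\qquad
\phi_{\mathrm{CW}}(\theta,\tau) = e^{-\theta^{2}\tau^{2}/\sigma} .
```

Because the auto terms of the AF are concentrated around the origin of the $(\theta,\tau)$ plane while the CTs lie away from it, a kernel that is close to one near the origin and decays elsewhere attenuates the CTs, at the price of some smoothing of the auto terms.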
In general, optimum-kernel TFDs can be obtained through three different approaches to
optimizing the kernel with the aim of improving the resolution of the resulting TFDs:
1. High resolution TFDs based on high spectral resolution kernels.
2. High resolution TFDs based on signal independent kernels.
3. High resolution TFDs based on signal dependent kernels.
2.1.2.3.1 High-Resolution TFDs: High Spectral Resolution Kernels
TFDs, along with their temporal and spectral resolutions, are uniquely defined by the
employed TF kernels. Potential kernels seek to map, at every time sample, the
time-varying signals in the data into approximately fixed-frequency sinusoids in the
local autocorrelation function (LAF). Applying the Fourier transform to the LAF
therefore provides a peaky spectrum in which the locations of the peaks indicate the
instantaneous power concentrations of the signals. The sinusoidal components in the LAF,
however, generally appear with some type of amplitude modulation, which is highly
dependent on the kernel composition [199]. Such modulation limits the spectral
resolution in the TF plane, as it may spread the localizations of both the auto terms
and the CTs.
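
To make the role of the LAF concrete, the following minimal Python sketch builds the LAF of a complex sinusoid at one time sample and takes its DFT over the lag variable. It corresponds to a pseudo-WD slice with a rectangular lag window, not to the kernels of [199]; the function names `local_autocorrelation` and `laf_spectrum`, the test signal and all parameter values are assumptions made for the example.

```python
import cmath
import math


def local_autocorrelation(s, n, max_lag):
    """LAF R[m] = s[n + m] * conj(s[n - m]) for m = -max_lag .. max_lag."""
    return [s[n + m] * s[n - m].conjugate() for m in range(-max_lag, max_lag + 1)]


def laf_spectrum(s, n, max_lag, n_bins):
    """Magnitude of the DFT of the LAF taken over the lag variable m."""
    laf = local_autocorrelation(s, n, max_lag)
    lags = range(-max_lag, max_lag + 1)
    spectrum = []
    for k in range(n_bins):
        acc = sum(r * cmath.exp(-2j * math.pi * k * m / n_bins)
                  for r, m in zip(laf, lags))
        spectrum.append(abs(acc))
    return spectrum


# Complex sinusoid with normalized frequency f0 (cycles per sample).
f0 = 0.125
s = [cmath.exp(2j * math.pi * f0 * n) for n in range(128)]

# The LAF of this signal is exp(j*4*pi*f0*m): a fixed-frequency sinusoid in the
# lag m. The symmetric lag doubles the frequency, so the peak sits at bin
# k = 2 * f0 * n_bins and the frequency axis is k / (2 * n_bins).
n_bins = 64
spec = laf_spectrum(s, n=64, max_lag=31, n_bins=n_bins)
peak_bin = max(range(n_bins), key=lambda k: spec[k])
estimated_f0 = peak_bin / (2 * n_bins)
```

For a signal whose frequency changes with time, repeating the computation for each `n` gives one spectral slice per time sample; lengthening `max_lag` sharpens the peak only as long as the signal stays approximately sinusoidal over the lag window.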
Because of the kernel modulation effects on the various terms, closely spaced
frequencies may not be resolved. Further, since TFDs are Fourier-based, in addition to
the amplitude modulations imposed by the kernels, the spectral resolution is limited by
and highly dependent on the extent of the LAF, i.e., the lag window employed [199].
However, increasing the length of the LAF will not always yield improved resolution.
Events occurring over short periods of time do not require large kernels, which may only
lead to increased CT contributions from distant events and obscure the local auto terms.
Limited availability of data samples may provide another reason for using small-extent
kernels. In these cases, the spectral resolution of a TFD can be improved by
parameterizing its LAF via autoregressive modeling techniques [200]-[204]. Such
parameterization seeks to fit a least-squares random model to the second-order
statistics of the LAF at different time instants. The autoregressive modeling
techniques, however, view the LAF as a stationary process along the lag dimension. Since
TF distribution kernels translate deterministic signals into others of deterministic
nature, it is more appropriate to fit a deterministic, rather than a stochastic, model
to the LAF. Further, the modeling techniques applied in the TFD context have mostly
dealt only with the pseudo-WD or the smoothed pseudo-WD kernels.
Amin and Williams [199] have maintained that, in addition to the pseudo-WD and the
smoothed pseudo-WD with separable time and lag windows, there exists a large class of TF
kernels for which the LAFs are amenable to high spectral resolution techniques. The
members of this class satisfy the desirable TF properties for power localization in a
nonstationary environment, yet they produce LAFs that are amenable to exponential
deterministic modeling during periods of stationarity. The proposed high spectral
resolution kernels are, however, bound to meet two basic conditions [199]: (i) the
frequency marginal, and (ii) an exponential behavior in the ambiguity domain for
constant values of a few parameters.
In dealing with sinusoidal data, the first property guarantees that the auto-term
sinusoids in the LAF are undamped. The second property enforces an exponential damping
on all CTs. As a result, the sinusoidal components translate into damped or undamped
sinusoids. High-resolution techniques such as reduced rank approximation of the backward
linear prediction data matrix can then be applied for frequency estimation. Amin and
Williams use Prony's method and its approximations [205], [206] in the TF context. This
method is shown to be applicable to high spectral resolution TFD problems, specifically
when the underlying LAF is made up of a sum of exponentially damped or undamped
sinusoids or chirp-like signals.
2.1.2.3.2 A High Resolution QTFD: Signal Independent Kernels
A signal independent kernel for the design of a high resolution, CT-free quadratic TFD
is proposed in [208]. Filtering the CTs in the ambiguity domain, which reduces (or
removes) the CTs in the TF domain, unfortunately results in a lower TF resolution. That
is, there is a trade-off between CT suppression and TF resolution in the design of a
given quadratic TFD. Barkat and Boashash propose a kernel that retains as much auto-term
energy as possible while filtering out as much CT energy as possible. The kernel is
defined in the time-lag domain, keeping in view the implementation of the resultant TFD.
This results in an alias-free distribution that can solve problems that the WD or the
spectrogram cannot. In particular, the proposed distribution is shown to resolve two
close signals in the TF domain that the two other distributions cannot.
2.1.2.3.3 Adaptive TFDs: Signal Dependent Kernel
It is shown that an adaptive TFD can be obtained by estimating some pertinent parameters
of a signal dependent function at different time intervals. Such TFDs are expected to
provide highly localized representations without suffering from CTs. The trade-off is
that these TFDs may not satisfy some desirable properties such as energy preservation.
Baraniuk and Jones have made use of the fact that the symmetric AF is the characteristic
function of the WD. The mathematical and possible physical analogy between the two
enhances the interpretation of the properties of the AF.
Several different approaches have been developed for optimizing the signal dependent
kernel for TF analysis [130]-[133], including:
1. The 1/0 optimal kernel TFD [131] approach, in which the optimal kernel is given a
special binary structure.
2. The optimal radially Gaussian kernel TFD [132] approach, which tempers the
`1/0 kernel' with an additional smoothness constraint that forces the optimal kernel to
be Gaussian along radial profiles.
3. The signal adaptive optimal kernel (AOK) TFD [133] approach, in which the radially
Gaussian kernel varies with time in order to maximize performance.
2.1.2.4 Dispersive Class TFDs
These TFDs, also termed warping-based TFDs, provide very good concentration for STSC
having nonlinear TF characteristics, such as dolphin and whale whistles, radar and sonar
waveforms, and shock waves in fault structures. To improve the processing of such
signals, QTFDs that satisfy the dispersive GD shift covariance property are designed by
Papandreou, Hlawatsch and Boudreaux-Bartels in [147]-[151].
Papandreou and Boudreaux-Bartels show that, for successful TF analysis, it is
advantageous to match the specific time shift of a QTFD to changes in the GD of the
signal. In some applications, signals with known GD need to be processed, so a matched
QTFD can be designed with a corresponding characteristic function. When the signal GD is
not known a priori, some pre-processing is necessary before designing a well matched
QTFD. A rough GD estimate can be obtained by fitting a curve through the spectrogram of
the signal or by using one of the many proposed algorithms for estimating GD or IF
characteristics [26]-[30]. Because the phase function of the signal needs to be
one-to-one for designing its matched QTFD by appropriately warping the WD or its
smoothed versions, approximations of the GD function can also be used.
Warping-based TFDs: theoretical examples and advantages. Different dispersive QTFDs can
be obtained, including the linear chirp class (warped affine class) with linear GD; the
hyperbolic class (warped Cohen's class) with hyperbolic GD; the k-th power class (warped
affine class) with k-th order power GD; and the exponential class (warped affine class)
with exponential GD.
Papandreou, Boudreaux-Bartels and co-workers [154], [166]-[168] demonstrate the
effectiveness of dispersive class QTFDs and the importance of matching STSC with QTFDs
using various simulations, including constant and linear, constant and hyperbolic,
constant and exponential, and constant and power TF structures, as well as power TF
structures with real data. The QTFDs in all these cases show better resolution and CT
suppression. For example, it is demonstrated that the dispersive WD is highly localized
for the time modulation signal. Specifically, the dispersive WD is found to be a Dirac
delta function at the GD of the signal. This means that the dispersive WD is ideally
matched to time modulation signals when the GD in the dispersive WD formulation matches
the GD of the signal. It is important to note that a dual dispersive class can be
obtained similarly to match dispersive FM signals by preserving the dispersive IF shift
[157].
Another example is the affine class, which is in fact a power class; the corresponding
power class is the linear chirp class, which is well matched to signals with linear TF
characteristics. Two QTFDs from the linear chirp class are the linearly warped WD and
the chirpogram. These are obtained when the WD and the spectrogram, respectively, are
warped with a quadratic characteristic function. The linearly warped WD is found to
provide highly localized representations when analyzing linear time modulation signals.
The chirpogram, on the other hand, has a definite TF resolution advantage over the
spectrogram when analyzing multicomponent signals with linear characteristics. This is
because the smoothing operation of the chirpogram is performed along lines of any slope
in the TF plane, whereas the smoothing of the spectrogram is only along horizontal or
vertical lines [158].
Through various examples, Papandreou et al. show that the power QTFDs are ideal for
signals that propagate through linear systems with specific power GD characteristics,
such as a wave propagating through a dispersive medium [154]. Other signals that are
matched to k-th power QTFDs include the dispersive propagation of a shock wave in a
steel beam (k = 0.5) [159, 161]; transionospheric signals measured by satellites
(k = -1) [160]; acoustical waves reflected from a spherical shell immersed in water
[162]; some cetacean mammal whistles [165]; and diffusion equation based waveforms
(k = 0.5) [163] (e.g., waves from uniformly distributed radio communication transmission
lines [164]).
Limitations. Warping-based or dispersive QTFDs can be computationally intensive when
implemented directly using numerical integration, as in the case of warping the WD to
obtain the power WD. Papandreou et al. suggest an alternative implementation scheme that
allows the use of existing efficient algorithms for computing Cohen's class or affine
class QTFDs, as done in [154] for power QTFDs. Nevertheless, the increased computational
complexity of the dispersive QTFDs is the trade-off for the improved performance in
analyzing signals with matched dispersive GD characteristics.
2.1.2.5 TFDs with Complex Arguments
One of the most important concepts for improving concentration in the case of nonlinear
structures is that of the complex argument distributions (CADs), introduced by Srdjan
Stankovic and LJ. Stankovic [251, 252] and later generalized by Cornu et al. [253]. The
purpose is to obtain a distribution that is highly concentrated along the GD, and in turn
along the IF, for mono- and multicomponent signals. The CADs use the concepts of
complex-frequency and complex-lag arguments in the Laplace and time domains [252]. The
two forms successfully produce concentrated representations along the GD or the IF. As
the signal is available along the real time axis only, a complex-valued argument form of
the signal is computed using tools proposed by Stankovic in [252]. These tools rely on
the relation between the FT and the Laplace transform and on the analytic extension of
the signal [7].
Generalized representations of phase derivative for regular signals. A recent work in
the same category is the generalized complex-lag distributions proposed by Cornu et al.
in [253]. These distributions are based on the generalized complex-lag moment and give an
arbitrary instantaneous phase derivative (IPD) representation with high concentration.
They yield accurate IF estimates even in the case of significant IF variation. Moreover,
a slight modification of the generalized CAD can result in accurate IF rate estimation,
like some of the existing methods (e.g. [254]). Higher order TF rate distributions of
this type can result in better IF concentration. These distributions are parameterized
by two integers K and N.
2.1.2.6 TFDs Based On Signal Expansions
The wide scope of patterns embedded in complex signals and the precision required for
their characterization motivate decompositions over large and redundant dictionaries of
waveforms. Linear expansions in a single basis, whether it is a Fourier, wavelet, or any
other basis, are not flexible enough. In Fourier and wavelet bases, it is difficult to
detect and identify signal patterns from their expansion coefficients, because the
information is diluted across the whole basis. For this reason, alternatives to
traditional signal representations have been sought in the form of alternate
dictionaries, instead of representing signals as superpositions of sinusoids. Such
dictionaries include wavelets, steerable wavelets, segmented wavelets, Gabor
dictionaries, multiscale Gabor dictionaries, wavelet packets, cosine packets, chirplets,
warplets, and a wide range of others.
There has been an explosion of interest in obtaining signal representations in
overcomplete dictionaries (see footnote 4), with methods ranging from general
approaches, like the method of frames [11] and the method of matching pursuit (MP)
[175], to schemes for specialized dictionaries, like the method of best orthogonal basis
[174]. These classical approaches have both advantages and shortcomings. Expanding the
STSC with these methods into an infinite number of TF-shifted versions of a weighted
elementary atom, and then applying a suitable TF transform such as the WD, results in
highly concentrated TFDs with good resolution. Some important signal expansion concepts
and the resulting TFDs, from which the TF research community has especially benefitted,
are presented in the following paragraphs.

Footnote 4: Dictionaries are overcomplete either because they start out that way or
because complete dictionaries are merged, giving a new megadictionary consisting of
several types of waveforms (e.g., Fourier and wavelet dictionaries).
2.1.2.6.1 Matching Pursuits TFDs with TF Dictionaries.
Mallat and Zhang [175] introduce an algorithm called MP, which decomposes any signal
into waveforms selected from a dictionary of TF atoms. These atoms are the dilations,
translations, and modulations of a single window function. The decomposition is obtained
by successive approximations of the signal with orthogonal projections onto dictionary
elements. Similar algorithms have been proposed by Qian and Chen [213] for Gabor
dictionaries and by Villemoes [214] for Walsh dictionaries. MPs provide extremely
flexible signal representations since the choice of dictionaries is not limited.
Moreover, the properties of the signal components are explicitly given by the scale,
frequency, time and phase indexes of the selected atoms. This representation is
therefore well adapted to information processing. Although an MP is nonlinear, it
maintains energy conservation like an orthogonal expansion, which guarantees its
convergence. Mallat and Zhang then derive a TF energy distribution obtained by adding
the WDs of the chosen TF atoms. The distribution thus obtained is free of interference
terms and provides a clearer picture, in contrast to the WD or Cohen's class
distributions.
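
The greedy selection loop described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the Gabor-dictionary implementation of [175]: the dictionary (unit spikes plus unit-norm cosines), the test signal and the names `inner`, `normalize` and `matching_pursuit` are all hypothetical.

```python
import math


def inner(x, y):
    """Real inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))


def normalize(v):
    """Scale a vector to unit norm."""
    norm = math.sqrt(inner(v, v))
    return [a / norm for a in v]


def matching_pursuit(signal, dictionary, n_iter):
    """Greedy MP over unit-norm atoms.

    Returns the structure book [(atom_index, coefficient), ...] and the residual.
    """
    residual = list(signal)
    book = []
    for _ in range(n_iter):
        # Select the atom with the largest |<residual, atom>|.
        products = [inner(residual, atom) for atom in dictionary]
        best = max(range(len(dictionary)), key=lambda i: abs(products[i]))
        coeff = products[best]
        book.append((best, coeff))
        # Remove the projection onto the selected atom from the residual.
        residual = [r - coeff * a for r, a in zip(residual, dictionary[best])]
    return book, residual


# Toy redundant dictionary: 32 unit spikes followed by 15 unit-norm cosines.
N = 32
spikes = [[1.0 if n == p else 0.0 for n in range(N)] for p in range(N)]
cosines = [normalize([math.cos(2 * math.pi * k * n / N) for n in range(N)])
           for k in range(1, N // 2)]
dictionary = spikes + cosines

# Test signal: one cosine atom (k = 5) plus one spike (position 10).
signal = [3.0 * c + 2.0 * d for c, d in zip(cosines[4], spikes[10])]

book, residual = matching_pursuit(signal, dictionary, n_iter=5)
signal_energy = inner(signal, signal)
explained = sum(c * c for _, c in book)
residual_energy = inner(residual, residual)
```

With unit-norm atoms, each step splits the residual energy into the squared coefficient and the energy of the new residual, which is the energy conservation referred to above; `signal_energy` therefore equals `explained + residual_energy` up to rounding. A TF energy distribution in the spirit of [175] would then add the WDs of the atoms listed in `book`, weighted by the squared coefficients.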
Compact signal coding is another important domain of application of MPs. For a given
class of signals, if the dictionary can be adapted to minimize the storage for a given
approximation precision, better results are guaranteed than with decompositions on
orthonormal bases. Indeed, an orthonormal decomposition is a particular case of MP in
which the dictionary is the orthonormal basis. For dictionaries that are not
orthonormal bases, the inner products of the structure book and the indexes of the
selected vectors need coding. This requires quantizing the inner product values and
using a dictionary of finite size. The MP decomposition is then equivalent to a
multistage shape-gain vector quantization in a very high dimensional space. For
information processing or compact signal coding, it is important to have strategies to
adapt the dictionary to the class of signals being decomposed. If enough prior
information is available, the dictionary can be adapted to the probability distribution
of the signal class within the signal space. Finding strategies to optimize dictionaries
in high dimensions is an open problem that shares features with learning problems in
NNs.
2.1.2.6.2 Basis Pursuit TFDs.
The basis pursuit proposed by Chen, Donoho, and Saunders [173] uses convex optimization
to find signal representations in overcomplete dictionaries. It selects the
decomposition that minimizes the $\ell_1$ norm of the coefficients occurring in the
representation. This optimization principle leads to decompositions that are much
sparser. Because it is based on global optimization, it can also superresolve. The
technique can be used with noisy data by solving an optimization problem that trades
off a quadratic misfit measure against the $\ell_1$ norm of the coefficients. There are
important connections between basis pursuit and other methods such as the MP of Mallat
and Zhang [175], the multiscale edge representation of Mallat and Zhong, and the total
variation-based denoising methods of Rudin, Osher, and Fatemi [202].
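
Stated as optimization problems, the two variants just described take the following standard form. The notation is an assumption introduced here for illustration: $s$ is the signal vector, $D$ the matrix whose columns are the dictionary waveforms, $a$ the coefficient vector, and $\lambda > 0$ a weight chosen by the user.

```latex
\text{(basis pursuit)} \qquad
\min_{a} \; \|a\|_{1} \quad \text{subject to} \quad D a = s ,
\\[4pt]
\text{(noisy data)} \qquad
\min_{a} \; \tfrac{1}{2}\,\|s - D a\|_{2}^{2} + \lambda\, \|a\|_{1} .
```

The first problem can be recast as a linear program, which is what makes the global optimization tractable for large dictionaries.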
2.1.2.6.3 TFDs based on Empirical Mode Decomposition.
A data-driven technique termed empirical mode decomposition (EMD) is introduced by Huang
et al. [176]. In their original paper, Huang et al. introduce a general two-step method
for analysing the data. The data are first preprocessed by the EMD, resulting in a
number of intrinsic mode function (IMF) components. In this way the data are expanded in
a basis taken from the data itself. In the second step, the Hilbert transform is applied
to the IMFs. The energy-frequency-time distribution, designated the Hilbert spectrum, is
then constructed. The Hilbert spectrum can preserve the time localities of events. This
construction of a TFD is of course not limited to any one technique, and better methods
may be used to obtain TFDs that are highly localized in the TF domain.
The EMD has received considerable attention in terms of applications [177]-[189] and
interpretations [190, 191]. The main benefit of the EMD is that the basis functions are
derived from the signal itself, so the analysis is adaptive.
The idea is to decompose a time series into a superposition of components with well
defined IFs, i.e. the IMFs. The components should (approximately) obey the earlier
requirements of completeness, orthogonality, locality and adaptiveness. The Hilbert
spectrum of each IMF is then constructed, representing it in the TF plane. However, an
appropriate TF representation (e.g. the reassignment method) of the decomposed IMFs
results in highly concentrated TFDs [193].
2.1.2.6.4 Matching Pursuit adaptive TFDs.
A novel approach for extracting the IF from an adaptive TFD is proposed by Krishnan
[170]. The adaptive TFD of a signal is obtained by decomposing the signal into components
with reasonable TF localization and combining the WDs of the components. The adaptive
TFD thus obtained is free of CTs and is a positive TFD, but it does not satisfy the
marginal properties. The marginal properties are achieved by applying MCE optimization
to the TFD. The IF may then be obtained as the first central moment of this adaptive TFD.
Krishnan has shown successful extraction of the IF of a set of real-world and synthetic
signals of known IF dynamics with the proposed method. In [171], a solution to the
multicomponent problem was given by proposing an algorithm that selects an optimal TFD
from a set of TFDs for a given signal. Krishnan addresses the same problem by
constructing TFDs according to the application at hand, that is, by tailoring the TFD to
the properties of the signal being analyzed. In his method, the TFDs are modified by
means of constraints to satisfy certain specified criteria. It is assumed that the given
signal is somehow decomposed into components of a specified mathematical representation.
Once the components of a signal are known, the interaction between them can be
established and used to remove or prevent CTs. This avoids the main drawback associated
with Cohen's class TFDs.
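
The moment-based IF estimate mentioned above can be illustrated with a short Python sketch. It computes the mean frequency of each time slice of a positive TFD, which is one common reading of the first moment; it is not Krishnan's implementation, and the synthetic Gaussian-ridge TFD, the IF law, the ridge width and the name `first_moment_if` are assumptions made for the example.

```python
import math


def first_moment_if(tfd, freqs):
    """Mean frequency of each time slice of a positive TFD.

    tfd is a list of rows (one row per time sample, one column per frequency).
    """
    estimates = []
    for time_slice in tfd:
        mass = sum(time_slice)
        estimates.append(sum(f * q for f, q in zip(freqs, time_slice)) / mass)
    return estimates


# Synthetic positive TFD: a Gaussian ridge centred on an assumed linear IF law.
freqs = [k * 0.5 / 128 for k in range(129)]       # normalized frequency grid, 0 .. 0.5
true_if = [0.1 + 0.002 * n for n in range(50)]    # assumed IF law
sigma = 0.01                                      # assumed ridge width
tfd = [[math.exp(-0.5 * ((f - f0) / sigma) ** 2) for f in freqs] for f0 in true_if]

estimated_if = first_moment_if(tfd, freqs)
max_error = max(abs(e - t) for e, t in zip(estimated_if, true_if))
```

The estimate is only meaningful when the TFD is non-negative and free of CTs, which is why the positivity and CT-free properties of the adaptive TFD matter here; for a multicomponent signal the same moment would return a weighted average of the component IFs.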
2.2 Objective Assessment Methods
Choosing the right TFD to analyze a given signal is not straightforward, even for a
monocomponent signal, and becomes more complex when dealing with multicomponent signals.
The common practice for determining the best TFD for a given signal has been the visual
comparison of all plots and the choice of the most appealing one. As an example, various
BDs of a real-life multicomponent bat echolocation chirp signal [134] are shown in
Fig. 2.1, including the spectrogram, WD, Zhao-Atlas-Marks distribution (ZAMD),
Margenau-Hill distribution (MHD), CWD, and Born-Jordan distribution (BJD). The
spectrogram and the CWD show visibly less interference and better component separation
than the other TFDs considered. However, this selection is generally difficult and
subjective. Comparing the plots in Fig. 2.1 objectively requires a quantitative
performance measure specifically tailored to TFDs. The estimation of signal information
and complexity in the TF plane is quite challenging. The themes that inspire new
measures for this estimation include the suppression of the cross components of TFDs,
the concentration and resolution of auto-components, and the ability to correctly
distinguish closely spaced components. Efficient concentration and resolution
measurement can provide a quantitative criterion for evaluating the performance of
different distributions. Such measures conform closely to the notion of complexity that
is used when visually inspecting TF images [37].
Concentration is one of the most important and extensively studied properties of a TFD
[31, 90]. For a monocomponent signal, the performance of its TFD is usually defined in
terms of its energy concentration about the signal IF [14]. Gabor [120], Vakman [121],
Janssen [72], and Cohen [14] made important initial contributions to measuring
distribution concentration for monocomponent signals. For more complex signals,
quantities from statistics inspired the definition of measures for TFDs in the form of
the ratio of distribution norms by Jones and Parks [112], the Rényi entropy by Williams
et al. [114, 116] and Baraniuk et al. [37], and the distribution energy for the design
of optimal kernel distributions by Baraniuk and Jones [132]. A simple measure of
distribution concentration was presented by Stankovic [172], based on the definition of
the duration of time-limited signals. Boashash and Sucic [33], on the other hand,
combined characteristics of TFDs such as the mainlobe and sidelobe magnitudes, the
instantaneous bandwidth and the signal IF to define an instantaneous concentration
measure.
For multicomponent signals, resolution is as important for evaluating the performance of
a TFD as the energy concentration it attains along the IF of each component present in
the signal. Good TF resolution of the signal components requires good energy
concentration for each of the components and good suppression of any undesirable
artifacts. The resolution may be measured by the minimum frequency separation between
the mainlobes of the components for which their magnitudes and bandwidths are still
preserved [33].

Figure 2.1: TFDs of a multicomponent bat echolocation chirp signal. (a) Spectrogram
(test input to the BRNNM) [Hamming window of length L = 100], (b) WVD, (c) ZAMD, (d) MHD,
(e) CWD [kernel width = 1], (f) BJD.

With the above in view, some important theoretical measures discussed in the literature
are selected to evaluate the proposed ANN based framework. They include the ratio of
norms based measure [100], the Shannon and Rényi entropy measures [117, 118], the
normalized Rényi entropy measure [114], the LJubisa measure [172] and the Boashash
performance measures [33]. A modification of the concentration measure in [33] is
suggested and implemented to obtain a true picture of the concentration of
multicomponent TFDs. A brief overview of these measures is presented next.
2.2.1 Entropy Measures
The terms entropy, uncertainty, and information are used more or less interchangeably;
entropy is the measure of information for a given probability density function. It can
similarly be applied to TFDs to quantify information by measuring the complexity of the
signal [37, 116, 119]. By the probabilistic analogy, minimizing the complexity or
information in a particular TFD is equivalent to maximizing its concentration,
peakiness, and, therefore, resolution [100].
2.2.1.1 Shannon entropy
The well known Shannon entropy [117] for the TFD of unit energy signals can be expressed
as

$$E_{\mathrm{Shannon}} = -\sum_{n}\sum_{\omega} Q(n,\omega)\,\log_{2} Q(n,\omega) \tag{2.2}$$

The classical Shannon entropy is a natural candidate for estimating the concentration of
a TFD and can be viewed as the inverse of a measure of concentration of the distribution
in the TF plane. Peaky TFDs of highly concentrated signals yield small entropy values,
and vice versa. The negative values taken on by most TFDs prohibit the application of
the Shannon entropy because of the logarithm in Eqn. (2.2). Taking the absolute value of
the distribution ensures that the logarithm exists.
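
A minimal Python sketch of the Shannon measure, evaluated on the absolute value of the distribution as suggested above, is given below. The function name `shannon_entropy` and the two 4 x 4 toy distributions are assumptions chosen so that the values can be checked by hand; they are not TFDs of real signals.

```python
import math


def shannon_entropy(tfd):
    """Shannon entropy of Eqn. (2.2), evaluated on |Q| so that the logarithm exists."""
    entropy = 0.0
    for row in tfd:
        for q in row:
            magnitude = abs(q)
            if magnitude > 0.0:  # empty cells contribute nothing
                entropy -= magnitude * math.log2(magnitude)
    return entropy


# Two toy unit-energy distributions on a 4 x 4 grid (rows: time, columns: frequency).
peaky = [[0.25 if k == n else 0.0 for k in range(4)] for n in range(4)]
flat = [[1.0 / 16.0] * 4 for _ in range(4)]

entropy_peaky = shannon_entropy(peaky)  # energy confined to 4 cells: 2 bits
entropy_flat = shannon_entropy(flat)    # energy spread over 16 cells: 4 bits
```

The concentrated distribution gives the smaller value, in line with the reading of entropy as an inverse measure of concentration.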
2.2.1.2 Rényi entropy
The Rényi entropy is introduced as a more appropriate way of measuring TF uncertainty
that sidesteps the negativity issue. It is derived from the same set of axioms as the
Shannon entropy [36, 114], the only difference being the use of a more general
exponential mean instead of the arithmetic mean in the derivation [116]. It is given as

$$E_{\mathrm{RENYI}_{\alpha}} = \frac{1}{1-\alpha}\,\log_{2}\!\left(\sum_{n}\sum_{\omega} Q^{\alpha}(n,\omega)\right) \tag{2.3}$$

where $\alpha$ is the order of the Rényi entropy, taken here as 3, the smallest integer
value that yields a well-defined, useful information measure for a large class of
signals. The generalized entropies of Rényi inspire new measures for estimating signal
information and complexity in the TF plane. When applied to a TFD from Cohen's class,
they conform closely to the notion of complexity that is used when visually inspecting
TF plots.
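
The third-order Rényi measure can be sketched in Python as follows. The function name `renyi_entropy` and the toy 4 x 4 distributions are assumptions for illustration; the third example contains a negative cell to show that, unlike the logarithm of the distribution itself, the odd-order sum remains well defined.

```python
import math


def renyi_entropy(tfd, alpha=3):
    """Renyi entropy of Eqn. (2.3); alpha = 3 follows the text."""
    total = sum(q ** alpha for row in tfd for q in row)
    return math.log2(total) / (1 - alpha)


# Toy unit-energy distributions on a 4 x 4 grid (rows: time, columns: frequency).
peaky = [[0.25 if k == n else 0.0 for k in range(4)] for n in range(4)]
flat = [[1.0 / 16.0] * 4 for _ in range(4)]

# A distribution with one negative cell; the cells still sum to one.
with_negative = [row[:] for row in peaky]
with_negative[3][3] = 0.35
with_negative[0][3] = -0.10

renyi_peaky = renyi_entropy(peaky)              # 2 bits
renyi_flat = renyi_entropy(flat)                # 4 bits
renyi_negative = renyi_entropy(with_negative)   # finite despite the negative cell
```

For distributions that are uniform over their support, the value equals the base-2 logarithm of the number of occupied cells, which gives a direct link to the visual notion of how much of the TF plane is covered.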
2.2.2 Normalized Entropy Measures
The Rényi entropy measure with $\alpha = 3$ does not detect zero mean CTs, so
normalization with either the signal energy or the distribution volume is necessary
[114]. Normalization also reduces a distribution to the unit signal energy case.
2.2.2.1 Normalization with the signal energy
This normalization is important when comparing various distributions, or the same
distribution when it is not energy unbiased. The behavior of this measure is quite
similar to that of the non-normalized form, except in its magnitude. By definition, the
Rényi entropy normalized by the signal energy is given by

$$E_{\mathrm{NRE}_{\alpha}} = \frac{1}{1-\alpha}\,\log_{2}\!\left(\frac{\sum_{n}\sum_{\omega} Q^{\alpha}(n,\omega)}{\sum_{n}\sum_{\omega} Q(n,\omega)}\right), \quad \alpha \geq 2 \tag{2.4}$$

2.2.2.2 Normalization with the distribution volume
The expression for this type of entropy measure can be written as:
\[ E_{NRE,\alpha} = \frac{1}{1-\alpha}\,\log_2\left(\frac{\sum_{n}\sum_{\omega} Q^{\alpha}(n,\omega)}{\sum_{n}\sum_{\omega} |Q(n,\omega)|}\right), \quad \alpha \geq 2 \tag{2.5} \]
This form of the measure has been used for adaptive kernel design in [114]. If the distribution contains oscillatory values, then summing them in absolute value means that large CTs will decrease this measure, indicating smaller concentration owing to the appearance of CTs.
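The two normalized forms in Eqns. (2.4) and (2.5) differ only in the denominator, which the following Python sketch makes explicit. It is an illustrative sketch under assumed names (`normalized_renyi_entropy`, the `mode` argument) and an assumed list-of-rows TFD representation; it is not taken from the original work.

```python
import math

def normalized_renyi_entropy(tfd, alpha=3, mode="energy"):
    """Normalized Rényi entropy in bits of a TFD given as a list of rows.

    mode="energy": the denominator is the sum of Q    (Eqn. 2.4)
    mode="volume": the denominator is the sum of |Q|  (Eqn. 2.5)
    """
    if alpha < 2:
        raise ValueError("both equations are stated for alpha >= 2")
    values = [q for row in tfd for q in row]
    numerator = sum(q ** alpha for q in values)
    if mode == "energy":
        denominator = sum(values)
    elif mode == "volume":
        denominator = sum(abs(q) for q in values)
    else:
        raise ValueError("mode must be 'energy' or 'volume'")
    if denominator == 0.0 or numerator / denominator <= 0.0:
        raise ValueError("the ratio must be positive for the logarithm to exist")
    return math.log2(numerator / denominator) / (1 - alpha)

tfd = [[0.75, -0.25, 0.5]]  # made-up values with one negative (oscillatory) sample
print(normalized_renyi_entropy(tfd, mode="energy"))
print(normalized_renyi_entropy(tfd, mode="volume"))
```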
2.2.3 Ratio of Norms based Measure
Jones and Parks proposed a concentration measure obtained by dividing the fourth-power norm of the TFD $Q(n,\omega)$ by its second-power norm [112]:
\[ E_{JP} = \frac{\sum_{n}\sum_{\omega} |Q(n,\omega)|^{4}}{\left(\sum_{n}\sum_{\omega} |Q(n,\omega)|^{2}\right)^{2}} \tag{2.6} \]
The fourth power in the numerator favors a peaky distribution. To obtain the optimal distribution for a given signal, the value of this measure should be maximized:
\[ Q(n,\omega)_{\text{optimum}} \Rightarrow \arg\max_{Q}\,[E_{JP}] \tag{2.7} \]
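The ratio in Eqn. (2.6) can be sketched in Python as follows. The function name `jones_parks_measure` and the list-of-rows TFD representation are assumptions made for this illustration only.

```python
def jones_parks_measure(tfd):
    """Ratio-of-norms concentration measure of Eqn. (2.6) for a TFD given as a list of rows."""
    values = [abs(q) for row in tfd for q in row]
    energy = sum(a ** 2 for a in values)
    if energy == 0.0:
        raise ValueError("the TFD must contain at least one non-zero sample")
    return sum(a ** 4 for a in values) / energy ** 2

print(jones_parks_measure([[1.0, 0.0, 0.0, 0.0]]))      # single peak
print(jones_parks_measure([[0.25, 0.25, 0.25, 0.25]]))  # flat distribution
```

The single-peak example evaluates to 1 and the flat example to 0.25, so the more concentrated distribution receives the larger value, as Eqn. (2.7) requires.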
2.2.4 LJubisa Measure
This simple criterion, presented by Stankovic [172] for the objective assessment of TFD concentration, makes use of the duration of time-limited signals. If a signal $x(\tau)$ is time limited to the interval $\tau \in [\tau_1, \tau_2]$, i.e. $x(\tau) \neq 0$ only within that interval, then the duration of the signal is $\Delta = \tau_2 - \tau_1$. Consequently, we can write $\Delta = \lim_{\beta\to\infty} \int_{-\infty}^{\infty} |x(\tau)|^{1/\beta}\, d\tau$.
It is assumed in the derivation that the distribution $Q(\tau,\omega) \neq 0$ only for $(\tau,\omega) \in D_x(\tau,\omega)$. For a large $\beta$, we may write
\[ J_{\beta} \equiv \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} |Q(\tau,\omega)|^{1/\beta}\, d\tau\, d\omega \;\to\; \iint_{D_x(\tau,\omega)} 1\; d\tau\, d\omega = S_x \tag{2.8} \]
where $S_x$ is the area of $D_x(\tau,\omega)$. As a criterion for the concentration of distributions, it is assumed that among several given unbiased energy distributions, the best concentrated is the one with the smallest $S_x$. The value of $J_{\beta}$ raised to the $\beta$th power is referred to as the LJubisa concentration measure. Its discrete form is
\[ J[Q(n,\omega)] \equiv J_{\beta}^{\beta} = \left(\sum_{n}\sum_{\omega} |Q(n,\omega)|^{1/\beta}\right)^{\beta} \tag{2.9} \]
with $\sum_{n}\sum_{\omega} Q(n,\omega) = 1$ being the normalized unbiased energy constraint, and $\beta > 1$. The best choice according to this criterion (the optimal distribution with respect to this measure) is the distribution that produces the minimal value of $J[Q(n,\omega)]$.
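A short Python sketch of the discrete measure in Eqn. (2.9) follows. It is an illustration under stated assumptions: the function name `stankovic_measure`, the default exponent parameter of 2 and the list-of-rows TFD representation are choices of this example, and the division by the total sum is added here only to enforce the unit-sum constraint stated with the equation.

```python
def stankovic_measure(tfd, beta=2.0):
    """Discrete concentration measure of Eqn. (2.9); smaller values mean better concentration."""
    if beta <= 1:
        raise ValueError("the measure is defined for an exponent parameter greater than 1")
    values = [q for row in tfd for q in row]
    total = sum(values)
    if total == 0.0:
        raise ValueError("the TFD cannot be normalized to unit sum")
    # Normalize to unit sum (assumed pre-processing), then apply Eqn. (2.9).
    s = sum(abs(q / total) ** (1.0 / beta) for q in values)
    return s ** beta

print(stankovic_measure([[1.0, 0.0, 0.0, 0.0]]))      # energy in one sample
print(stankovic_measure([[0.25, 0.25, 0.25, 0.25]]))  # energy spread over four samples
```

With these inputs the measure behaves like a count of occupied samples: it evaluates to 1 for the single-sample case and to 4 for the four-sample case, which matches the interpretation of $S_x$ as the area of the region of support.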
2.2.5 Boashash Performance Criteria
These objective measures, proposed by Boashash, take into account not only the concentration but also the resolution aspects of TFDs, for a practical analysis in the case of signals with closely spaced components. The characteristics of TFDs that influence their resolution, such as component concentration and separation and the minimization of interference terms, are combined to define these measures [33].
2.2.5.1 Concentration measure
A time slice ($t = t_0$) of a typical quadratic TFD of an $n$-component signal has, for the $n$th component at time $t = t_0$, an instantaneous bandwidth, an IF, a sidelobe magnitude and a mainlobe magnitude denoted by $V_{i_n}(t_0)$, $f_{i_n}(t_0)$, $A_{S_n}(t_0)$ and $A_{M_n}(t_0)$, respectively. $A_X(t_0)$ may be used to represent the CT magnitude.
At any instant of time, the concentration performance of a TFD improves if it minimizes the sidelobe magnitudes $A_{S_n}(t)$ relative to the mainlobe magnitudes $A_{M_n}(t)$, and the mainlobe bandwidth $V_{i_n}(t)$ about the signal IF $f_{i_n}(t)$, for each signal component [33]. Consequently, for a given time slice $t = t_0$ of the TFD $\rho_z(t,f)$ of an $n$-component signal $z(t) = \sum z_n(t)$, the signal's TFD concentration performance is quantified by [33]:
\[ c_n(t) = \frac{A_{S_n}(t)}{A_{M_n}(t)}\,\frac{V_{i_n}(t)}{f_{i_n}(t)} \tag{2.10} \]
where $c_n(t)$ is the concentration measure for each signal component.
2.2.5.2 Modified concentration measure
An alternative to the measure $c_n$ defined in Eqn. (2.10) is suggested. It combines $A_{S_n}(t)/A_{M_n}(t)$ and $V_{i_n}(t)/f_{i_n}(t)$ into a sum rather than a product and therefore accounts for their effects more independently. The new measure gives a better picture of TFDs' concentration performance, even for those having no sidelobes. This results in the following definition of the instantaneous concentration measure for each signal component in $z(t) = \sum z_n(t)$:
\[ C_n(t) = \frac{A_{S_n}(t)}{A_{M_n}(t)} + \frac{V_{i_n}(t)}{f_{i_n}(t)} \tag{2.11} \]
Good performance of a TFD is characterized by a value of this measure close to zero.
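The difference between the product form of Eqn. (2.10) and the sum form of Eqn. (2.11) is easiest to see on a slice without sidelobes. The Python sketch below is illustrative only; the function names and the numerical values are hypothetical and are not measurements from this work.

```python
def concentration_product(a_s, a_m, v_i, f_i):
    """Product form of Eqn. (2.10) for one component at one time instant."""
    return (a_s / a_m) * (v_i / f_i)

def concentration_sum(a_s, a_m, v_i, f_i):
    """Sum form of Eqn. (2.11) for one component at one time instant."""
    return a_s / a_m + v_i / f_i

# Hypothetical slice with no sidelobes (a_s = 0) but a non-zero mainlobe bandwidth.
print(concentration_product(0.0, 1.0, 0.1, 0.5))
print(concentration_sum(0.0, 1.0, 0.1, 0.5))
```

With no sidelobes the product form collapses to zero whatever the bandwidth, whereas the sum form still reports the relative bandwidth, which is why the modified measure remains informative in that case.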
2.2.5.3 Resolution measure
The frequency resolution in a power spectral density estimate of a signal composed of two single tones $f_1$ and $f_2$ is defined as the minimum difference $f_2 - f_1$ for which the following inequality holds:
\[ f_1 + \frac{V_1}{2} < f_2 - \frac{V_2}{2}, \quad f_1 < f_2 \tag{2.12} \]
where $V_1$ and $V_2$ are the bandwidths of the first and the second sinusoid, respectively.
From Eqn. (2.12) and the earlier discussion, the frequency resolution of a TFD for a pair of components in a multicomponent signal may be quantified by the minimum difference $f_{i_2}(t) - f_{i_1}(t)$ (with $f_{i_1}(t) < f_{i_2}(t)$) for which a separation measure $D$ between the components' mainlobes, centred about their respective IFs $f_{i_1}(t)$ and $f_{i_2}(t)$, is positive. $D(t)$ is a measure of the separation of the components' mainlobes in frequency, defined as
\[ D(t) = \frac{\left(f_{i_2}(t) - \frac{V_{i_2}(t)}{2}\right) - \left(f_{i_1}(t) + \frac{V_{i_1}(t)}{2}\right)}{f_{i_2}(t) - f_{i_1}(t)} = 1 - \frac{V_i(t)}{\Delta f_i(t)} \tag{2.13} \]
where $V_i(t) = \sum V_{i_n}/2$ is the average instantaneous bandwidth of the components' mainlobes, and $\Delta f_i(t) = f_{i_{n+1}}(t) - f_{i_n}(t)$ is the difference between the components' IFs. The measure $D(t)$ requires computation for each adjacent pair of components present in the signal, indicated by the subscript $n$.
To obtain better resolution performance of quadratic TFDs, the separation measure $D$ should be maximized and, concurrently, the interference terms (CTs and components' sidelobes) should be minimized. These constraints result in an overall measure $R$ of the resolution performance of a TFD for a pair of components in a multicomponent signal, expressed as [33]
\[ R(t) = \frac{A_S(t)}{A_M(t)}\,\frac{A_X(t)}{A_M(t)}\,\frac{1}{D(t)} \tag{2.14} \]
where $A_M(t) = \sum A_{M_n}(t)/2$, $A_S(t) = \sum A_{S_n}(t)/2$ and $A_X(t)$ are, respectively, the average magnitude of the components' mainlobes, the average magnitude of the components' sidelobes and the CT magnitude of any two adjacent signal components. Good resolution performance of a TFD for a given pair of components in a multicomponent signal is characterized by a small (close to zero) positive value of the measure $R$.
To make the measure close to 1 for well-performing TFDs and close to 0 for poorly performing ones (TFDs with large interference terms and poorly resolved components), the normalized instantaneous resolution performance measure $R_i$ is defined as [33]:
\[ R_i(t) = 1 - \frac{1}{3}\left[\frac{A_S(t)}{A_M(t)} + \frac{1}{2}\,\frac{A_X(t)}{A_M(t)} + \bigl(1 - D(t)\bigr)\right], \quad 0 < R_i(t) < 1 \tag{2.15} \]
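The separation measure of Eqn. (2.13) and the normalized resolution measure of Eqn. (2.15) can be evaluated for one pair of components as in the following Python sketch. The function names and all numerical values are hypothetical and serve only to illustrate the arithmetic.

```python
def separation(f1, v1, f2, v2):
    """Mainlobe separation D of Eqn. (2.13) for IFs f1 < f2 with bandwidths v1 and v2."""
    if not f1 < f2:
        raise ValueError("the components must be ordered so that f1 < f2")
    return ((f2 - v2 / 2) - (f1 + v1 / 2)) / (f2 - f1)

def resolution_measure(a_s, a_m, a_x, d):
    """Normalized instantaneous resolution measure R_i of Eqn. (2.15)."""
    return 1 - (a_s / a_m + 0.5 * a_x / a_m + (1 - d)) / 3

# Hypothetical time slice: IFs at 0.1 and 0.3, each mainlobe 0.02 wide.
d = separation(0.1, 0.02, 0.3, 0.02)
print(d)
print(resolution_measure(a_s=0.1, a_m=1.0, a_x=0.2, d=d))
```

For this slice the average bandwidth is 0.02 and the IF difference is 0.2, so both sides of Eqn. (2.13) give a separation of 0.9, and the three penalty terms in Eqn. (2.15) each equal 0.1, giving a resolution measure of 0.9.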
2.3 Summary
The most fundamental and challenging aspects of the analysis are a clear understanding of the time-varying spectrum and an unambiguous representation of the properties of a signal simultaneously in time and frequency. Historically, various TF techniques have been defined to achieve these tasks; however, it is important to search for the one that fits the application. Consequently, the first part of this Chapter describes the importance of high concentration and good resolution for TF signal processing. It specifically mentions the motivations and ingenuity of various researchers in implementing newer techniques to improve the spectrum in the TF domain. Basic concepts of various methods and well-tested algorithms are discussed, emphasizing the significance of each technique to the analysis of signals. Indeed, different applications have different preferences and requirements for TFDs. In general, the choice of a TFD in a particular situation depends on many factors, such as the relevance of the properties satisfied by the TFD, its computational cost and speed, and the tradeoffs involved in using it.
The second part of this Chapter discusses the objective methods of assessment used to evaluate the TFDs' concentration and resolution performance. These methods are used to quantify and compare the de-blurred TFDs obtained by the proposed ANN based framework in Chapter 5.
Chapter 3
Neural Network based Framework for Computing De-blurred TFDs - Part I
This Chapter presents the preliminary ANN based framework and proceeds with its optimization for computing de-blurred TFDs. Section 3.1 formulates the problem, states various constraints and presents an initial ANN based framework to solve the problem. Section 3.2 compares ANN training algorithms and selects the LMB as the most suitable one. Section 3.3 presents an experimental study on optimizing the design and architecture of the ANN setup, including the number of neurons, the number of layers and the type of activation functions. Section 3.4 ascertains the advantage of clustering the data and training multiple ANNs for each cluster.
3.1 TFDs using ANN - Binary Case
The goal is to obtain a TFD that is free of any blurring effect. Furthermore, no knowledge of the components is assumed ahead of time. For this purpose, the binary spectrograms of several known signals are used as input to train an ANN (Figs. 3.2 and 3.3). The target is a TFD constructed by adding the expected TFDs of the individual components present in the signal. The expected individual TFDs are constructed from the IF of each component present in the signal (Figs. 3.4 and 3.5). Fig. 3.1 gives a graphical explanation of the method. The TFD is treated as a 2-D image matrix. As initial work, both the training and target TFDs are converted to binary versions. The corresponding region vectors in the input and target TFDs are paired to form the training and validation sets. The entropy [107] of the TFD $Q(n,\omega)$ is considered here as the measure of concentration, given by:
\[ E_Q = -\sum_{n=0}^{N-1}\int Q(n,\omega)\,\log_2 Q(n,\omega)\, d\omega \geq 0 \tag{3.1} \]
The lower the entropy of a distribution, the more concentrated it is.
3.1.1 Selected ANN Architecture
The method uses a feed-forward ANN trained with the LMB algorithm, with five neurons in a single hidden layer. The input and output layers have three neurons and one neuron, respectively.
3.1.1.1 Training Set
The spectrograms of two known signals are used as an initial blurred estimate. The first is a sinusoidal FM signal, while the second is composed of two parallel chirps. The first training signal is given by:
\[ \mathrm{trg}_1 = e^{i\pi\left[\frac{1}{2} - \omega(n)\,n\right]} \quad \text{with} \quad \omega(n) = 0.1\,\sin\left(\frac{2\pi n}{N}\right) \tag{3.2} \]
while the second signal is given by:
\[ \mathrm{trg}_2 = e^{i\omega_1(n)n} + e^{i\omega_2(n)n} \quad \text{with} \quad \omega_1(n) = \frac{\pi n}{4N} \ \text{and} \ \omega_2(n) = \frac{\pi}{3} + \frac{\pi n}{4N} \tag{3.3} \]
where $N$ is the total number of sampling points in the signal.
Figure 3.1: Graphical explanation of the method (blocks: pre-processing of inputs X1, X2, ..., Xj; area of concern; target; normalized images).
Figure 3.2: Input training TFD image of sinusoidal FM signal.
The binary spectrograms of these signals are shown as Figs. 3.2 and 3.3. The
respective target TFDs computed from known IF laws are shown in Figs. 3.4 and 3.5.
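For illustration, the two training signals of Eqns. (3.2) and (3.3) can be generated with the Python sketch below. It is not the code used in this work. The function names are hypothetical, the signal length of 64 samples is an arbitrary choice, and the exponent of the sinusoidal FM signal is taken with the grouping $i\pi[\frac{1}{2} - \omega(n)n]$, which is an assumption about how Eqn. (3.2) is to be read.

```python
import cmath
import math

def sinusoidal_fm(N):
    """Sinusoidal FM training signal, with the exponent of Eqn. (3.2) read as i*pi*[1/2 - w(n)*n]."""
    signal = []
    for n in range(N):
        w = 0.1 * math.sin(2 * math.pi * n / N)
        signal.append(cmath.exp(1j * math.pi * (0.5 - w * n)))  # assumed grouping of terms
    return signal

def parallel_chirps(N):
    """Sum of two parallel chirps following Eqn. (3.3)."""
    signal = []
    for n in range(N):
        w1 = math.pi * n / (4 * N)
        w2 = math.pi / 3 + math.pi * n / (4 * N)
        signal.append(cmath.exp(1j * w1 * n) + cmath.exp(1j * w2 * n))
    return signal

trg1 = sinusoidal_fm(64)   # 64 samples chosen only for the example
trg2 = parallel_chirps(64)
print(len(trg1), len(trg2))
```

The spectrograms used as ANN inputs would then be computed from these sequences; that step is omitted here because the window and transform settings are not stated in this Section.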
3.1.2 Test Result
A bat echolocation chirp provides an excellent motivation for TF based signal processing. Fig. 3.6 shows the TFD for this signal obtained by an existing optimum kernel method (OKM) [132]. The spectrogram of this chirp signal (depicted in Fig. 3.7) is used to test the trained ANN. The resulting TFD is shown in Fig. 3.8. This resultant TFD is highly concentrated and has the lowest entropy, as shown in Table 3.1.
Figure 3.3: Input training TFD image of parallel chirps.
Figure 3.4: Target TFD image for the sinusoidal FM signal.
Figure 3.5: Target TFD image of parallel chirp signal.
Table 3.1: Comparison of Entropy
Description    Proposed approach using ANN    OKM used by [132]    Spectrogram
E_Q (bits)     7.301                          11.826               12.798
3.2 Analysis & Comparison of the ANN Training Algorithms
When working with an ANN, some fundamental questions arise: how are the weights initialized, how is the learning rate chosen, how many hidden layers and neurons should be used, how many examples should be included in the training set, and which training algorithm should be used? In this section, the effect of using different training algorithms on the ANN's performance is observed and the best training algorithm is found for the task of de-blurring TFDs.
Figure 3.6: Binary TFD obtained by the OKM [132].
Figure 3.7: Spectrogram of the bat echolocation chirp signal.
Figure 3.8: The deblurred TFD obtained by the proposed ANN model.
The algorithms analyzed are the PBCGB, RPB, GDALB and LMB, which are the most frequently used in the ANN literature. The theoretical description of these algorithms is provided in Appendix A, Section A.5. The progress made here is the use of grayscale, instead of binary, spectrograms of two known signals as input to train the ANN (Figs. 3.9 and 3.10). The target TFDs are still binary and are constructed by adding the expected TFDs of the individual components present in the signal (Figs. 3.4 and 3.5). Training is carried out with the above-mentioned four training algorithms. The spectrogram (Fig. 3.11) of a combined chirp and sinusoidal FM signal is used as the test TFD. To measure the information in the resultant TFD $Q(n,\omega)$, the entropy given by Eqn. (3.1) is used.
Figure 3.9: Input training TFD image of the sinusoidal FM signal.
Figure 3.10: Input training TFD image of parallel chirps signal.
3.2.1 The Method
It is important to note that the input TFDs for the ANN are now grayscale; consequently, the method used to process them differs from the one described in the previous Section. The idea is based on divide and conquer: a complex computational task is divided into a set of less complex tasks, and their solutions are combined at a later stage to produce the solution to the original problem. It is of paramount importance to pre-process the available data to make it suitable for training an ANN. Pre-processing here means converting attributes into variables. This is achieved in four steps: (i) taking the TFD as a matrix, (ii) vectorizing the TFD image matrix, (iii) clustering these vectors, and (iv) forming pairs of vectors from the input and target TFDs.
Vectors of appropriate length are obtained from the TFD image matrix by working along each row. Based on the visual quality of the output, the length is chosen to be $1 \times 3$ (i.e. three pixels along a row). Two vector spaces are formed by accumulating the vectors of the specified length for both the training (Figs. 3.9 and 3.10) and target TFD images (Figs. 3.4 and 3.5). Next, the vectors are clustered by finding the correlation of each vector with three directional vectors, each representing one type of edge. The objective is to divide the input space into a number of subspaces, $S_n$, described by directional unit vectors, $v_n$, that correspond to some useful information. This creates a clustering effect on the input vectors, since a vector will lie in the subspace $S_n$ represented by the $v_n$ that is most similar to it with respect to its information content. After experimenting with different numbers of subspaces and observing the final results, the number of subspaces is chosen to be three. This value is chosen because it not only gives the advantage of clustering but also has the lowest effect on the computational complexity of the algorithm. Three directional vectors are used to characterize three types of edges in the image. This choice is dictated by the problem of de-blurring. The following issues are considered:
• Edges are important image characteristics.
• Blurring causes loss of edge information from images.
• The process of de-blurring may produce a more useful image if enhancement is also achieved along with de-blurring.
For each such cluster, an ANN is trained with each of the four selected training algorithms. This eliminates the problem of forcing one network to learn input vectors that are distant from each other. Of course, the choice of the number of directional vectors remains dependent on the problem. The procedure is outlined as follows:
1. The TFDs' matrices are converted to vectors of specific length.
2. The number of subspaces $N_s$ is decided based on a suitable criterion.
3. The subspace direction vectors $v_n$ ($n = 1, 2, \ldots, N_s$) are selected so that each best represents its subspace $S_n$.
4. The direction vectors $v_n$ are normalized.
5. The correlation between each input vector and each $v_n$ is found, and the input vector is assigned to the corresponding subspace. Three directional vectors $v_h, v_c, v_l$ are computed in the following manner:
(a) $v_h$ is obtained by rearranging (any) 3 integers in descending order, representing a downhill edge.
(b) $v_c$ is obtained by rearranging (any) 3 integers in a triangular fashion where the highest value occurs in the middle and values on either side are in descending order. This represents a triangular type of edge.
(c) $v_l$ is obtained by rearranging (any) 3 integers in ascending order, representing an uphill edge.
6. For each cluster, an ANN is trained using the training vectors that include the input vectors obtained from the spectrograms and the mean of the respective vectors from the binary target TFDs.
7. The trained ANNs are then tested using unknown blurred TFDs after vectorization and clustering according to steps 1 and 5. The test vectors are fed to the network that is specially trained for that type of vector. The resulting values are zero-padded and placed at the original grid locations to construct the resultant TFDs.
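Steps 1 to 5 of the procedure above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the original implementation: the direction vectors are built from the integers 1, 2 and 3, the image rows are cut into consecutive non-overlapping vectors, and the correlation is taken as the inner product with the normalized direction vector. All function and variable names are hypothetical.

```python
import math

def unit(v):
    """Normalize a vector to unit length (step 4)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

# Assumed direction vectors built from the integers 1, 2, 3 (step 5 a-c).
DIRECTIONS = {
    "downhill": unit([3, 2, 1]),    # v_h: descending order
    "triangular": unit([2, 3, 1]),  # v_c: highest value in the middle
    "uphill": unit([1, 2, 3]),      # v_l: ascending order
}

def vectorize(image, length=3):
    """Cut every row into consecutive, non-overlapping vectors of `length` pixels (step 1)."""
    vectors = []
    for r, row in enumerate(image):
        for c in range(0, len(row) - length + 1, length):
            vectors.append(((r, c), row[c:c + length]))
    return vectors

def assign_cluster(vec):
    """Name of the direction vector that has the largest correlation with `vec` (step 5)."""
    def correlation(name):
        return sum(a * b for a, b in zip(vec, DIRECTIONS[name]))
    return max(DIRECTIONS, key=correlation)

def cluster(image):
    """Group the vectors of an image by edge type, keeping their grid positions."""
    clusters = {name: [] for name in DIRECTIONS}
    for position, vec in vectorize(image):
        clusters[assign_cluster(vec)].append((position, vec))
    return clusters

image = [[9, 5, 1, 1, 5, 9],
         [1, 9, 1, 3, 8, 2]]  # made-up grayscale values
for name, members in cluster(image).items():
    print(name, members)
```

The grid position is stored with every vector so that, as in step 7, the network outputs can later be placed back at their original locations. Flat vectors correlate almost equally with all three directions, so their assignment in this sketch is arbitrary.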
3.2.2 Selected ANN Architecture
The performance of the four shortlisted algorithms is compared on the basis of the entropy values of the resultant TFDs. For this, a single hidden layer with 10 neurons is used. The neurons in the input and output layers are fixed at 3 and 1, respectively. A single hidden layer is used because most non-linear problems can be solved with a single hidden layer and a suitable number of neurons [218]. The 'tansig' and 'purelin' transfer functions are used between the input-hidden and hidden-output layers, respectively, representing a hidden layer of sigmoid neurons followed by an output layer of linear neurons. Layers of neurons with nonlinear transfer functions allow the network to learn nonlinear as well as linear relationships between input and output vectors. The linear output layer lets the network produce values outside the range -1 to +1.
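The forward pass of the 3-10-1 architecture described above can be sketched in plain Python. The sketch covers only the evaluation of an already parameterized network and does not implement any of the training algorithms; the weights are random placeholders, the seed is arbitrary and all names are hypothetical.

```python
import math
import random

def tansig(x):
    # The tan-sigmoid transfer function is numerically equal to tanh(x).
    return math.tanh(x)

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Output of a feed-forward network with one tansig hidden layer and a linear output."""
    hidden = [tansig(sum(w * xi for w, xi in zip(weights, x)) + b)
              for weights, b in zip(w_hidden, b_hidden)]
    # The linear ('purelin') output is the weighted sum of hidden outputs plus a bias.
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

random.seed(0)  # arbitrary seed so that the illustration is repeatable
n_in, n_hidden = 3, 10
w_hidden = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b_hidden = [random.uniform(-1, 1) for _ in range(n_hidden)]
w_out = [random.uniform(-1, 1) for _ in range(n_hidden)]
b_out = random.uniform(-1, 1)

y = forward([0.2, 0.9, 0.4], w_hidden, b_hidden, w_out, b_out)
print(y)
```

Each hidden output is confined to the interval from -1 to +1 by the tan-sigmoid, while the linear output layer is bounded only by the size of its weights and bias, which is what allows outputs outside that range.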
3.2.2.1 Training Set
The spectrograms of two known signals, given in Eqns. (3.2) and (3.3), are used as input for training the ANNs. The grayscale spectrograms of these signals are shown as Figs. 3.9 and 3.10. The respective target TF plane images are shown in Figs. 3.4 and 3.5.
3.2.3 Performance Evaluation
The performance of the four selected algorithms is compared on the basis of parameters such as time to converge, MSE in the last epoch and entropy values of the resultant TFDs. The spectrogram (depicted in Fig. 3.11) of a combined chirp and sinusoidal FM signal is used as the test TFD. The results are shown in Figs. 3.12 to 3.15, which are the output TFDs obtained by the RPB, PBCGB, GDALB and LMB training algorithms, respectively. The visual results indicate that the TFD obtained by the ANN trained with the LMB algorithm is more concentrated along the individual components. The comparative graph in Fig. 3.16 indicates that the LMB algorithm converges sharply within a few epochs and results in a smaller error than the other training algorithms. Furthermore, it has the lowest entropy of all, as shown in Table 3.2. However, if the various other parameters left at their defaults in these algorithms are tuned appropriately by an exhaustive search, the results can be improved for the other algorithms as well.
Figure 3.11: Test TFD image of combined sinusoidal FM & parallel chirps signal.
Figure 3.12: The resultant TFD obtained after passing the spectrogram of the test signal through the trained ANN with RPROP backpropagation algorithm.
Figure 3.13: The resultant TFD obtained after passing the spectrogram of the test signal through the trained ANN with Powell-Beale conjugate gradient back propagation algorithm.
Figure 3.14: The resultant TFD obtained after passing the spectrogram of the test signal through the trained ANN with Gradient descent with adaptive lr backpropagation algorithm.
Figure 3.15: The resultant TFD obtained after passing the spectrogram of the test signal through the trained neural network with Levenberg-Marquardt training algorithm.
Figure 3.16: The comparative graph which shows error convergence with respect to number of iterations.
Table 3.2: Comparison of Training Algorithms
Parameters                          LMB            PBCGB          GDALB          RPB
MSE performance                     Available      Not available  Not available  Not available
Learning rate                       Available      Not available  Available      Available
Memory/speed trade off              Available      Not available  Not available  Not available
Tolerance for linear search         Not available  Available      Not available  Not available
Entropy values (E_Q bits)           7.30           8.03           12.79          11.83
Error converged                     4.49 × 10^-5   3.69 × 10^-4   4.32 × 10^-2   8.25 × 10^-4
Time taken for convergence (sec)    30.18          26.53          14.43          25.46
3.3 Impact of varying number of Neurons and the Hidden Layers
This section presents an experimental investigation of the effect of varying the number of neurons and hidden layers in a feed-forward back-propagation ANN architecture for obtaining de-blurred TFDs. A discussion highlighting the significance of neurons and hidden layers in an ANN is presented in Appendix A, Sections A.2 and A.3. Varying the number of neurons and hidden layers in an ANN architecture for a specific problem is expected to greatly affect the performance of the ANN. For this reason, ANNs are trained with the TFD of the parallel chirps signal while the number of hidden layers and their neurons is varied. The spectrogram of a single chirp signal is the test image (Fig. 3.17). The test TFD is processed through these ANNs, and the visual effect is recorded along with the computed entropy and MSE values. The method remains the same as discussed in Section 3.2.1 and is not repeated here.
Figure 3.17: Test spectrogram image of single chirp signal.
3.3.1 ANN topology
Based on the results of the previous Section, the LMB training algorithm is used for learning. The activation functions are fixed as 'tansig' and 'poslin' between the input-hidden and hidden-output layers, respectively. The numbers of hidden layers and neurons are varied to find the optimum solution for the stated problem. The neurons in the input and output layers are fixed at 3 and 1, respectively.
3.3.1.1 Training Set
The spectrogram of the parallel chirps signal given in Eqn. (3.3) is used as input for training the ANNs. The grayscale spectrogram of this signal is shown in Fig. 3.10. The respective target TF plane image is shown in Fig. 3.5.
[Plot "Error vs. No. of Neurons": horizontal axis, number of neurons (5 to 50); vertical axis, error converged (0 to 0.06).]
Figure 3.18: The comparative graph of error vs number of neurons in single hidden layer.
3.3.2 Effect of varying the number of Hidden Neurons
To study the effect of varying the number of hidden neurons on computing de-blurred TFDs, networks trained with 2, 3, ..., 5, 10, ..., 40 and 50 neurons in 1, 2 and 3 hidden layer(s) are tested. The network never converged to a stable point with up to 30 neurons. The reason is that too few neurons, taking the data from the input grid, are unable to interpret the complete information and hence fail to convey the correct information to the next layers. The results were satisfactory with 35 neurons in the hidden layer (one or more). Increasing the number of neurons further gave no significant reduction of the error in the last epoch, as shown in Fig. 3.18. Entropy values are found to be the minimum for 40 or more neurons irrespective of the number of hidden layers, as given in Table 3.3.
Table 3.3: Impact of varying neurons and hidden layers over entropy of resultant image
                             Number of neurons
Description                  10       20       30       40       50
E_Q bits for single layer    10.20    9.31     9.01     8.20     8.20
E_Q bits for two layers      20.41    16.31    15.60    11.21    11.21
E_Q bits for three layers    22.56    14.30    12.10    10.24    10.24
3.3.3 Effect of varying the number of Hidden Layers
Deciding on the optimum number of hidden layers in an ANN architecture is one of the most important factors in solving any problem. It is observed from Fig. 3.19 that there is no significant difference in the converged MSE values when the number of hidden layers is varied from 1 to 3: the networks converge to approximately the same MSE values in the last epochs. A similar time is also consumed for convergence, as indicated by the number of epochs for 1, 2 and 3 layers in Fig. 3.19. Next, the impact on the quality of the resultant TFD is observed by trying all reasonable combinations of the number of layers and the number of neurons in those layers. In this Section, only selected resultant TFD images for a few combinations are shown in Figs. 3.20 to 3.26. The important point to note is that the resultant TFDs even deteriorate if a higher number of hidden layers is used. The empirical study supports the observation that many complex non-linear problems can be solved with only a single hidden layer. The entropy values recorded in Table 3.3 highlight the same point: the ANN architecture with a single hidden layer and an appropriate number of neurons results in the most informative output for the stated problem.
[Plot "Error vs. Epochs": horizontal axis, epochs (0 to 40); vertical axis, MSE on a logarithmic scale; curves for 1, 2 and 3 layers.]
Figure 3.19: The comparative graph of error vs epochs for various number of hidden layers
Figure 3.20: Resultant TFD (2 hidden layers with 50 neurons in each).
Figure 3.21: Resultant TFD (2 hidden layers with 5 neurons in each).
Figure 3.22: Resultant TFD (3 hidden layers with 5 neurons in each).
Figure 3.23: Resultant TFD (3 hidden layers with 20 neurons in each).
Figure 3.24: Resultant TFD (2 hidden layers with 15 neurons in each).
Figure 3.25: Resultant TFD (single layer with 40 neurons).
Figure 3.26: Resultant TFD (Single hidden layer with 30 neurons).
3.4 Effect of Data Clustering and using Multiple ANNs for each Cluster
In this Section, the effect of clustering the data and training multiple ANNs for each cluster (CDMN) on obtaining high resolution TFDs is examined. It has been found that training does not give the same results every time; this is because the weights are initialized to random values, and a high validation error may end the training early. Moreover, a network trained with selected input performs significantly better than one that does not receive selected input data for training. The training TFDs remain the same, with the mathematical description given in Section 3.1 and the input and target TFDs shown graphically in Figs. 3.9-3.10 and 3.4-3.5, respectively. The spectrogram (depicted in Fig. 3.11) of the combined chirp and sinusoidal FM signal is used as the test TFD. The performance of CDMN is checked by computing the entropy, the MSE and the time consumed for convergence.
3.4.1 Advantages of Clustering and Training Multiple ANNs
The clustering process groups entities together based on the similarity of their features. The purpose is to form groups such that entities within a group are similar to each other and different from entities in other groups. Different networks are then trained for each cluster, which eliminates the problem of forcing one network to learn input vectors that are distant from each other.
Using multiple ANNs for each cluster, in turn, is expected to be advantageous because the weights are initialized to random values. When the network begins to overfit the data, the error on the validation set typically begins to rise; when the validation error increases for a specified number of iterations, training is stopped, and the weights and biases at the minimum of the validation error are returned. By keeping track of the network error, or performance, accessible via the training record, the best network can be selected in terms of training performance for each cluster.
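The early stopping rule described above can be sketched in Python as follows. The sketch is illustrative and makes one assumption explicit: an epoch counts as a failure whenever its validation error does not improve on the best value seen so far, and training stops after `patience` consecutive failures. The function name, the `patience` parameter and the error values are hypothetical.

```python
def early_stopping(validation_errors, patience=5):
    """Return (best_epoch, stop_epoch) for a sequence of per-epoch validation errors.

    best_epoch is where the validation error was lowest, i.e. the epoch whose
    weights and biases would be returned; stop_epoch is where training ends.
    """
    best_epoch, best_error, failures = 0, float("inf"), 0
    for epoch, error in enumerate(validation_errors):
        if error < best_error:
            best_epoch, best_error, failures = epoch, error, 0
        else:
            failures += 1
            if failures >= patience:
                return best_epoch, epoch
    return best_epoch, len(validation_errors) - 1

errors = [0.9, 0.5, 0.3, 0.35, 0.4, 0.45, 0.2]  # made-up validation record
print(early_stopping(errors, patience=3))
```

In this made-up record the error rises for three epochs after epoch 2, so training stops at epoch 5 and the parameters from epoch 2 are kept; the later drop to 0.2 is never reached, which illustrates how early stopping can end a run before its best possible result.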
3.4.2 The Network Architecture and Procedure
Based on the results of Sections 3.2 and 3.3, the LMB training algorithm is used for learning with a single hidden layer of 40 neurons [224]. The activation functions are fixed as 'tansig' and 'poslin' between the input-hidden and hidden-output layers, respectively. The neurons in the input and output layers are fixed at 3 and 1, respectively.
The procedural steps for training and testing are the same as described in Section 3.2.1. The exceptions here are the training of multiple ANNs for each cluster and the selection of the best ANN for each cluster based on the MSE in the last epoch. These selected networks are termed expert networks (ENs).
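The selection of one expert network per cluster amounts to taking, within each cluster, the candidate with the smallest last-epoch MSE. A minimal Python sketch is shown below; the function name, the data layout and the MSE values are made up for illustration and are not results from this work.

```python
def select_expert_networks(final_mse):
    """Map each cluster to the candidate network with the lowest last-epoch MSE.

    `final_mse` maps a cluster name to a dict of {network name: MSE in last epoch}.
    """
    return {cluster: min(candidates, key=candidates.get)
            for cluster, candidates in final_mse.items()}

# Made-up last-epoch MSE values for three candidate ANNs per cluster.
final_mse = {
    "cluster 1": {"ANN-1": 0.030, "ANN-2": 0.010, "ANN-3": 0.020},
    "cluster 2": {"ANN-1": 0.008, "ANN-2": 0.015, "ANN-3": 0.012},
    "cluster 3": {"ANN-1": 0.050, "ANN-2": 0.009, "ANN-3": 0.011},
}
print(select_expert_networks(final_mse))
```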
3.4.3 Performance Evaluation
The available data is clustered into three different regions based on the three intuitive edges present in the data, as discussed in Section 3.2.1. Three ANNs are trained for each cluster. The spectrogram (depicted in Fig. 3.11) of the combined chirp and sinusoidal FM signals is used as the test TFD. The results are shown in Figs. 3.27 and 3.28. It is evident that the result obtained with CDMN is highly concentrated compared to the result obtained otherwise.
Figure 3.27: Resultant TFD obtained after processing test TFD with single ANN without data clustering.
3.4.3.1 MSE performance
For MSE performance comparison, the MSEs reached by the various ANNs trained for each cluster are plotted in Fig. 3.29. It indicates that ANN-1 displays the minimum MSE for only one of the three clusters (about 33%), and that ANN-2 is the best, as it fulfills this criterion twice. The MSE convergence of the various ANNs is plotted in Fig. 3.30, which shows that the MSE for ANN-2 converges sharply and stops, owing to early stopping to avoid overfitting, within 22 and 12 epochs for clusters 1 and 3, compared with the rather flat convergence of ANN-1 within 50 and 15 epochs. This results in a sharper rate of MSE convergence compared to the other ANNs trained for these clusters. Notably, for the third cluster, ANN-1 is even the worst in comparison to the other two ANNs, as is obvious from Fig. 3.30(c).
Figure 3.28: Resultant TFD obtained after processing test TFD with multiple ANNs after data clustering.
3.4.3.2 Information analysis
The advantage of CDMN is further ascertained once the information content of each resultant TFD is computed by measuring the entropy of the resultant images according to Eqn. (3.1). It is found that the CDMN produces the resultant TFD with maximum information content, as it has the lowest entropy of all, as shown in Table 3.4.
Figure 3.29: The MSE in last epoch for (a) ANNs trained for cluster 1 (b) ANNs trained for cluster 2 (c) ANNs trained for cluster 3.
Figure 3.30: Rate of MSE convergence against epochs for (a) ANNs trained for cluster 1 (b) ANNs trained for cluster 2 (c) ANNs trained for cluster 3
Table 3.4: Comparison of Entropies
Description of approach             E_Q bits
Single ANN without clustering       15.987
Single ANN with clustering          13.768
Multiple ANNs without clustering    11.667
CDMN                                8.223
Spectrogram                         29.743
WD                                  18.915
3.4.3.3 Convergence time taken by various ANNs
The MSE converged by various ANNs is plotted against epoches in Fig. 3.31,
which indicates the time consumed by the different ANNs trained for each cluster
to approach the minimum MSE. This factor is however not exactly controllable due
to factors like early stopping and random weight initialization at the start for same
number of epoches. It is not guaranteed that the EN will consume minimum time
for convergence. However it can be observed from these comparative graphs that
ANN � 3 consumes the minimum time for two of clusters and ANN � 1 has taken
more time to converge than other networks trained for same cluster. This aspect
validates that training only one network for any task may not always give the best
results.
Figure 3.31: The convergence time taken by various ANNs for each cluster of data, (a) ANNs trained for cluster 1, (b) ANNs trained for cluster 2, (c) ANNs trained for cluster 3.
3.5 Summary
A novel ANN based approach is presented for computing de-blurred TFDs. The
approach is progressively refined by optimizing various ANN parameters. Section 3.1
presented a method of computing informative and highly concentrated binary TFDs
of signals whose frequency components vary with time. In Section 3.2, the performance
of the various ANN training algorithms is evaluated on the basis of time, error and
entropy analysis. The LMB training algorithm is found to be the best-performing
training algorithm, resulting in high resolution TFDs. The simulation results presented in
Section 3.3 indicate that the ANN architecture composed of a single hidden layer with
40 neurons effectively removes the blur from the unknown spectrograms. The MSE
in the last epoch is minimum for this ANN topology, and it yields the lowest entropy
value in Table 3.3. Increasing the number of neurons and hidden layers is found
to increase the complexity of the network. Moreover, it is found unsuitable, as manifested
by both visual (Figs. 3.20 to 3.26) and numerical findings (Table 3.3). Studying the
effect of these parameters has a major impact on future research work in this direction.
Section 3.4 evaluates the performance of training the CDMN on the basis of time,
error and entropy analysis. It is found that a mixture of ENs focused on a specific
task delivers a TFD that is highly concentrated along the IF. Experimental results
demonstrate the effectiveness of the approach.
Chapter 4
Neural Network based Framework for Computing De-blurred TFDs - Part II
This Chapter presents the optimized BRNNM-based correlation vectored taxonomy
algorithm to compute TFDs that are highly concentrated in the TF plane,
extending the scope of the approach presented in the previous Chapter. The ideas of
improved clustering, a generalized training set and Bayesian regularization in the
ANNs' training phase are incorporated, which greatly enhances the final results. The
degree of regularization is automatically controlled in the Bayesian inference framework
and produces networks with better generalized performance and lower susceptibility
to over-fitting. The elbow criterion is used to find the optimum number of
clusters for the stated problem, which is found to have a positive impact on the results.
The input and target TFDs are also made more general: the grayscale
spectrograms and pre-processed WD of known signals are vectorized and clustered
as per the elbow criterion to constitute the training data for multiple ANNs. The
best trained networks are selected and made part of the localized neural networks
(LNNs). Test TFDs of unknown signals are then processed through the algorithm
and presented to the LNNs. Experimental results demonstrate that appropriately vectored
and clustered data, together with regularization and training under MacKay's
evidence framework, produce high resolution TFDs once processed through the LNNs.
A real life signal is tested to show the effectiveness of the proposed algorithm via
analysis based on entropy and visual interpretation.
4.1 The ANN based Framework's Description
Fig. 4.1 shows the overall block diagram of the model. The method employs Bayesian
regularization during the ANNs' training phase to obtain energy concentration along the
IF of individual components for unknown blurred TFDs. The TFDs are treated as
2-D images, and vectors of different nature are recognized based on the various edges
present in these images. These vectors are separated and clustered according to the
elbow criterion. Multiple Bayesian regularized neural networks (BRNNs) are
trained for each group of vectors in a cluster and, by keeping track of the network error
or performance, accessible via the training record, the best network is selected
in terms of training performance. These selected networks are the LNNs, each specialized
for one type of vector, with better generalization abilities. Together, these LNNs
are termed the network of expert neural networks (NENNs). In this way, the
need to force one network to learn input patterns that are distant from each other
is eliminated [110].
The block diagram in Fig. 4.1 highlights three major modules of the method, which are
drawn in Fig. 4.2 for more clarity. The modules include (i) pre-processing of training
data, (ii) processing through the BRNNM and (iii) post-processing of output
data. These major modules are further elaborated in Figs. 4.3 to 4.13. The modules
and the rationale of the proposed method are described below.
Figure 4.1: Flow diagram of the method
Figure 4.2: Major modules of the method
4.1.1 Pre-processing of Training Data
Fig. 4.3 depicts the block diagram for this module. It consists of five major steps,
namely (i) two-step pre-processing of target TFDs, (ii) vectorization, (iii) subspaces
selection and direction vectors, (iv) correlation, and (v) taxonomy. They are
described as follows.
4.1.1.1 Two-step pre-processing of target TFDs
The highly concentrated WD of various known signals are used as the target
TFDs. As shown in Fig. 4.5, the WD suffers from CTs, which make it unsuitable
to be presented as a target to the ANNs [218]. This fact is further elaborated
separately for these target TFDs in Figs. 4.6 and 4.8, where the CTs are clearly visible
in the binary versions. The CTs are therefore eliminated before the WD is fed to the
ANN. This is achieved in two steps:
1. The WD is multiplied point by point with the spectrogram of the same signal
obtained with a Hamming window of reasonable size.
2. All values below a certain threshold are set to zero.
The resultant target TFDs are shown in Figs. 4.7(a) and 4.9(a); they are fed
to the ANN after the vectorization described below.
Figure 4.3: Pre-processing of training data
Figure 4.4: The spectrograms used as input training images of the (a) sinusoidal FM, and (b) parallel chirp signals.
Figure 4.5: Target TFDs with CTs unsuitable for training ANN taking WD of the, (a) parallel chirps' signal, and (b) sinusoidal FM signal.
Figure 4.6: The non-processed WD target images of the sinusoidal FM signal, (a) grayscale version, (b) binary version.
Figure 4.7: The pre-processed WD target image of sinusoidal FM signal, (a) grayscale version, (b) binary version.
Figure 4.8: The non-processed WD target images of the parallel chirps' signal, (a) grayscale version, (b) binary version.
Figure 4.9: The pre-processed WD target image of the parallel chirps' signal, (a) grayscale version, (b) binary version.
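The two-step removal of CTs described in Section 4.1.1.1 can be sketched as follows. This is a minimal illustration only: the function name `suppress_cross_terms`, the toy matrices and the threshold value are assumptions made for the example and are not taken from the experiments.

```python
def suppress_cross_terms(wd, spec, threshold):
    """Two-step pre-processing of a target TFD (illustrative sketch).

    Step 1: multiply the WD and the spectrogram point by point.
    Step 2: set every product below `threshold` to zero.
    Both inputs are 2-D lists of the same shape.
    """
    if len(wd) != len(spec) or any(len(a) != len(b) for a, b in zip(wd, spec)):
        raise ValueError("WD and spectrogram must have the same shape")
    target = []
    for wd_row, spec_row in zip(wd, spec):
        row = []
        for w, s in zip(wd_row, spec_row):
            product = w * s
            row.append(product if product >= threshold else 0.0)
        target.append(row)
    return target


# Toy data (hypothetical): 0.8 and -0.7 play the role of cross terms that the
# spectrogram does not support, so their products fall below the threshold.
wd = [[0.9, 0.8, 0.1],
      [0.1, -0.7, 0.9]]
spec = [[0.8, 0.05, 0.1],
        [0.1, 0.05, 0.9]]
target = suppress_cross_terms(wd, spec, threshold=0.1)
```

In this toy case only the two auto-term positions survive; the threshold would in practice have to be tuned to the dynamic range of the product.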
4.1.1.2 Vectorization
(1) Input TFDs. Fig. 4.4 depicts the input spectrograms. They are considered
as 2-D images consisting of pixels having appropriate grayscale values, e.g.,
$$\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}$$
These pixel values can be used to generate vectors; for example, a vector of length
three contains three consecutive pixel values of a row of the TFD image. The
size of the TFD image matrix is pre-adjusted to avoid any leftover row or column. The
vector length is decided after experimenting with various vector lengths (3, 5, 7 and
9), and the decision is made based on visual results. Each input TFD image is thus
converted to vectors of a particular length. These vectors are paired with the vectors
obtained from the target TFDs, to be subsequently used for training.
(2) Target TFDs. The target WD are made free of CTs using the two-step procedure
described above. Mean values of three pixels are computed from the region of the
target TFD corresponding to each input vector. For example, if $\langle a_{11}, a_{12}, a_{13} \rangle$
is a vector of pixels from the input TFD and $\langle b_{11}, b_{12}, b_{13} \rangle$ is the vector representing
the corresponding region of the target TFD, then $\frac{b_{11}+b_{12}+b_{13}}{3}$
becomes the target
numerical value for the input vector of length three. Mean values are taken as targets
with a view that the IF can be computed by averaging frequencies at each time
instant, a definition suggested by many researchers [31].
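The vectorization of an input TFD and the pairing of each vector with the mean of the corresponding target region can be sketched as below. The sketch assumes non-overlapping windows taken along rows, which is one reading of the requirement that no row or column be left over; the function names and the tiny images are hypothetical.

```python
def vectorize(image, length=3):
    """Split every row into consecutive, non-overlapping vectors of `length` pixels.

    Returns (vectors, positions), where each position is (row, start_column)
    so that results can later be placed back on the original grid.
    """
    vectors, positions = [], []
    for r, row in enumerate(image):
        if len(row) % length != 0:
            raise ValueError("row width must be a multiple of the vector length")
        for c in range(0, len(row), length):
            vectors.append(row[c:c + length])
            positions.append((r, c))
    return vectors, positions


def make_training_pairs(input_tfd, target_tfd, length=3):
    """Pair each input vector with the mean of the matching target pixels."""
    in_vecs, positions = vectorize(input_tfd, length)
    tg_vecs, _ = vectorize(target_tfd, length)
    pairs = [(x, sum(b) / length) for x, b in zip(in_vecs, tg_vecs)]
    return pairs, positions


# Hypothetical 1 x 6 images, giving two vectors of length three.
pairs, positions = make_training_pairs([[1, 2, 3, 4, 5, 6]],
                                       [[0, 3, 0, 6, 6, 6]])
```

Storing the grid positions at this stage is what later allows the network outputs to be returned to their original places.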
4.1.1.3 Subspaces selection and direction vectors
1. Elbow criterion. The elbow criterion is a common rule of thumb to
determine what number of clusters should be chosen. It states that the number
of clusters should be chosen so that adding another cluster does not add sufficient
information [223]. More precisely, if the percentage of variance explained by
the clusters is plotted against the number of clusters, the first clusters will add
much information (explain a lot of variance), but at some point the marginal
gain will drop, giving an angle in the graph (the elbow). This elbow cannot
always be unambiguously identified. In Fig. 4.10, which
is drawn for the problem at hand, the elbow is indicated by the "goose egg". The
number of clusters chosen is therefore three.
2. The number of subspaces $N_s$ into which the vectors will be distributed is selected
based on the elbow criterion in relation to underlying image features such as the edges
present in the data. As mentioned in the previous Chapter, the edge is considered
because it is one of the important underlying features and characteristics of an image.
Moreover, it is a well established fact that blurring mostly causes loss of edge
information [111]. An edge could be ascending (1, 2, 3), descending (3, 2, 1),
wedge (1, 3, 2), flat (1, 1, 1), triangular (1, 3, 1), etc. Empirically it is found that
going from three to four clusters does not add sufficient information, as the end
result shows no significant change in entropy values, as indicated in Table 4.1 and
evident from Fig. 4.10. The impact of clustering is noted for six different test
images (TIs), shown in Chapter 5. As a result of this study, $N_s = 3$ is chosen,
considering the three most general types of edges.
3. The subspace direction vectors $v_n$ ($n = 1, 2, \ldots, N_s$) that best
represent the subspaces are selected. As these subspaces are defined on the basis of edges,
three direction vectors $v_h$, $v_c$, $v_l$ are computed in the following manner:
(a) $v_h$ is obtained by rearranging (any) 3 integers in descending order.
(b) $v_c$ is obtained by rearranging (any) 3 integers in a wedge shape where
the highest value occurs in the middle and values on either side are in
descending order.
(c) $v_l$ is obtained by rearranging (any) 3 integers in ascending order.
4. All the direction vectors $v_h$, $v_c$, $v_l$ are normalized.
Figure 4.10: Elbow criterion
4.1.1.4 Correlation
An input vector $x_i$ is chosen from the input spectrogram. The correlation between
each input vector $x_i$ from the input TFD and each direction vector $v_h$, $v_c$, $v_l$ is calculated,
i.e. $t_{ij} = x_i^T v_j$ is computed, where $j = h, c, l$.
Table 4.1: Entropy values vs clusters
Description    EQ (bits) for test TFDs
               TI 1     TI 2     TI 3     TI 4     TI 5     TI 6
No cluster     20.539   18.113   18.323   19.975   21.548   17.910
2 clusters     13.523   12.294   12.421   11.131   14.049   11.940
3 clusters     8.623    6.629    7.228    5.672    8.175    6.948
4 clusters     8.101    6.300    7.202    5.193    8.025    6.733
5 clusters     7.998    6.187    7.111    5.012    7.939    6.678
6 clusters     7.877    6.015    7.019    5.995    7.883    6.661
4.1.1.5 Taxonomy
1. There will be $N_s$ product values obtained as a result of the last step for each input
vector $x_i$. To find the best match, the largest product is sought: if $t_{ic}$ has the largest
value, this indicates that the input $x_i$ is most similar to the direction vector $v_c$,
which implies that the vector is of wedge type.
2. Step (1) is repeated for all input vectors. Consequently all the vectors are
classified based on the type of edge they represent, and $N_s$ clusters are obtained.
Input spectrograms are depicted in Fig. 4.4. The statistical data revealing
numerical values for each type of vector for these two TFD images is shown in
Table 4.2.
3. Pairs of input vectors and targets (mean values of the pixels of the corresponding
window from the target TFD) are formed.
4. These pairs are divided into a training set and a validation set for the training phase;
by observing the error on these two sets, over-fitting is avoided.
These steps of vectorization, correlation and taxonomy are further elaborated
in graphical form by Fig. 4.11.
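The correlation and taxonomy steps can be sketched as follows. The integer triples used to build the direction vectors follow the edge examples quoted in the text ((3, 2, 1), (1, 3, 2), (1, 2, 3)); the function names and the handling of ties are assumptions made for this illustration.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


# Normalized direction vectors v_h, v_c, v_l built from example integer triples.
DIRECTIONS = {
    "descending": normalize([3, 2, 1]),  # v_h
    "wedge": normalize([1, 3, 2]),       # v_c
    "ascending": normalize([1, 2, 3]),   # v_l
}


def classify(x):
    """Return the edge type whose direction vector has the largest product x^T v."""
    scores = {name: sum(a * b for a, b in zip(x, v))
              for name, v in DIRECTIONS.items()}
    # Ties (e.g. a flat vector) are not resolved in any principled way here.
    return max(scores, key=scores.get)


def cluster(vectors):
    """Group the indices of the vectors by edge type (the taxonomy step)."""
    clusters = {name: [] for name in DIRECTIONS}
    for index, x in enumerate(vectors):
        clusters[classify(x)].append(index)
    return clusters


clusters = cluster([[10, 20, 30], [30, 20, 10], [10, 30, 20]])
```

Keeping indices rather than copies of the vectors makes it straightforward to return each result to its grid position after processing.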
Figure 4.11: Vectorization, correlation and taxonomy of TFD image.
4.1.2 Processing through Bayesian Regularized Neural Network Model
Fig. 4.12 represents this module. There are three steps in this module, namely (i)
training of the BRNNM, (ii) selecting the LNNs, and (iii) testing the LNNs. They are
discussed in the following subsections.
4.1.2.1 Training of BRNNM
1. Since the ANN is being used in a data-rich environment to provide high
resolution TFDs, it is important that it does well on data it has not seen before,
i.e. that it can generalize. To make sure that the network does not become
over-trained, the error is monitored on a subset of the data that does not actually
take part in the training. This subset, distinct from the training set, is called the
validation set. If the error on the validation set increases, the training stops. For
this purpose, alternate pairs of vectors from the input and target TFDs are included
in the training and validation sets.
2. The input vectors $x_i$ and the mean values $y_i$ of the pixel values
of the corresponding window from the target TFD image are used to train
multiple ANNs under the Bayesian framework. Three ANNs are trained
for each cluster, three being the smallest number chosen to check the advantage
of training multiple ANNs. This choice has no relation to the number of
subspaces or direction vectors.
3. Step (2) is repeated until all pairs of input and corresponding target vectors are
used for training.
Figure 4.12: Bayesian regularised neural network model
Table 4.2: Cluster parameters
Various parameters                     Cluster 1          Cluster 2          Cluster 3
(input training TFDs)                  (ascending edge    (descending edge   (wedge edge
                                       type vectors)      type vectors)      type vectors)
Vectors from spectrogram of            19157              18531              112
sinusoidal FM signal
Vectors from spectrogram of            4817               4959               52
parallel chirps' signal
The best ANN taken as LNN              ANN-3              ANN-2              ANN-1
Time taken by the best ANN to          308 seconds        114 seconds        55 seconds
complete the training
MSE converged by the best              2.54 x 10^-4       3.56 x 10^-4       1.38 x 10^-2
ANN (LNN)
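The alternate-pair split and the validation-based stopping rule can be sketched as below. Which of the alternating pairs goes to which set, the `patience` parameter and the error values are assumptions made for the illustration; the sketch does not reproduce the Bayesian regularized training itself.

```python
def alternate_split(pairs):
    """Send alternate (input, target) pairs to the training and validation sets."""
    return pairs[0::2], pairs[1::2]


def early_stopping_epoch(val_errors, patience):
    """Return the epoch with the lowest validation error seen before stopping.

    Training is considered stopped once the validation error has failed to
    improve for `patience` consecutive epochs.
    """
    best_epoch, best_error, bad_epochs = 0, val_errors[0], 0
    for epoch, error in enumerate(val_errors[1:], start=1):
        if error < best_error:
            best_epoch, best_error, bad_epochs = epoch, error, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    return best_epoch


# Hypothetical pairs and validation errors, for illustration only.
train_set, val_set = alternate_split([("x0", 0.1), ("x1", 0.2),
                                      ("x2", 0.3), ("x3", 0.4)])
stop_at = early_stopping_epoch([0.5, 0.4, 0.3, 0.35, 0.36, 0.2], patience=2)
```

With a patience of two, the late improvement at the last epoch is never reached, which illustrates why differently initialized networks can end with different errors.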
4.1.2.2 LNNs' selection
1. As mentioned above, three ANNs are trained for each cluster, and the
best for each cluster has to be selected. The "training record" is a
programmed structure in which the training algorithm saves the performance on
the training set, test set and validation set, as well as the epoch number
and the learning rate. By keeping track of the network error or performance,
accessible via the training record, the best network is selected for each cluster.
These best networks for the respective clusters are called the LNNs.
2. Using multiple networks for each cluster is found to be advantageous because
the weights are initialized to random values, and when a network begins to
over-fit the data, the error on the validation set typically begins to rise. If this
happens for a specified number of iterations, the training is stopped, and the
weights and biases at the minimum of the validation error are retained. As a
result, the various networks have different MSEs in the last training epoch. The
ANN with the minimum MSE is the winner and is included in the LNNs. Three
ANNs are trained for each of the three clusters, and it is found that ANN-3
and ANN-2 are the best for the first and second clusters respectively, while
ANN-1 is the best for the third cluster only. This fact is evident
from Table 4.2 as well. It is assumed that these selected ANNs are optimally
trained and possess better generalization abilities.
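Selecting the LNN for a cluster from the training records can be sketched as follows. The record layout (a list of per-epoch MSE values per network) and the numbers in it are hypothetical; only the rule of picking the network with the minimum MSE in the last epoch comes from the text.

```python
def select_lnn(training_records):
    """Return the name of the network with the lowest MSE in its last epoch.

    `training_records` maps a network name to a list of per-epoch MSE values
    (a hypothetical stand-in for the training record described above).
    """
    return min(training_records, key=lambda name: training_records[name][-1])


# Hypothetical per-epoch MSE curves for three networks trained on one cluster.
records_cluster = {
    "ANN-1": [0.30, 0.12, 0.05],
    "ANN-2": [0.25, 0.10, 0.02],
    "ANN-3": [0.40, 0.09, 0.03],
}
best = select_lnn(records_cluster)
```

Repeating this selection once per cluster yields the set of LNNs.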
4.1.2.3 Testing of BRNNM
1. Test TFDs are converted to vectors ($z_i$) and clustered after correlating with the
direction vectors, as done for the input TFDs.
2. Each test vector $z_i$ is fed to the LNN trained for its type.
3. The steps are repeated until all test vectors are tested.
4.1.3 Post-processing of the Output Data
This module is illustrated in Fig. 4.13. After the testing phase, the resultant data is
post-processed to get the resultant TFD. One value is obtained for each vector of length
three from the test TFD after processing through the LNNs, so there are two possibilities
for filling the remaining two pixels: either (i) replicate the same value in the other two places,
or (ii) use zero padding around this single value to complete the number of pixels.
Zero padding is preferred because it is found to reduce the blur in the TF plane. Next,
the resultant vectors of correct length are placed at their original places from where
they were correlated and clustered. These vectors are placed according to the initially
stored grid positions.
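The two ways of filling the pixels around each network output, followed by placement at the stored grid positions, can be sketched as below. Placing the output at the centre of its window when zero padding is an assumption about what "around this single value" means; the function name and the sample values are hypothetical.

```python
def rebuild_tfd(outputs, positions, shape, length=3, mode="zero"):
    """Place one output value per vector back on the image grid.

    mode="zero":      value at the centre of the window, zeros on either side.
    mode="replicate": the same value copied to all pixels of the window.
    `positions` holds the (row, start_column) stored during vectorization.
    """
    rows, cols = shape
    image = [[0.0] * cols for _ in range(rows)]
    for value, (r, c) in zip(outputs, positions):
        if mode == "zero":
            image[r][c + length // 2] = value
        elif mode == "replicate":
            for k in range(length):
                image[r][c + k] = value
        else:
            raise ValueError("mode must be 'zero' or 'replicate'")
    return image


# Hypothetical outputs for two vectors taken from a 1 x 6 test image.
padded = rebuild_tfd([0.5, 0.9], [(0, 0), (0, 3)], shape=(1, 6), mode="zero")
copied = rebuild_tfd([0.5, 0.9], [(0, 0), (0, 3)], shape=(1, 6), mode="replicate")
```

The zero-padded result keeps the energy on a single pixel per window, which is consistent with the reduced blur reported for this option.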
Figure 4.13: Post-processing of the output data
4.2 Performance Evaluation
To address the stated problem, the Bayesian regularized LMB training algorithm is used
with a feed-forward back-propagation ANN architecture having 40 neurons in the single
hidden layer. This architecture is chosen after an empirical study [224, 226], in which we
experimented with various training algorithms and with parameters such as the
activation functions between layers, the number of hidden layers and the number of neurons.
The positive impact of localised processing, by selecting the best trained ANN
out of many, is also ascertained [227]. The `tansig' and `poslin' transfer functions are used,
representing a hidden layer of sigmoid neurons followed by an output
layer of positive linear neurons. Multiple layers of neurons with nonlinear transfer
functions allow the network to learn linear and nonlinear relationships between input
and output vectors. The linear output layer lets the network produce values outside
the range [-1, +1].
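The forward pass of such a network, with a hyperbolic tangent sigmoid hidden layer and a positive linear output neuron, can be sketched as below. The Python functions mirror the definitions of `tansig` (equivalent to tanh) and `poslin` (zero for negative inputs, identity otherwise); the random weights are placeholders and do not represent a trained LNN.

```python
import math
import random


def tansig(n):
    """Hyperbolic tangent sigmoid, output in (-1, +1)."""
    return math.tanh(n)


def poslin(n):
    """Positive linear transfer function: 0 for n < 0, n otherwise."""
    return max(0.0, n)


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One hidden layer of tansig neurons followed by a single poslin output."""
    hidden = [tansig(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return poslin(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


# Placeholder weights for 3 inputs and 40 hidden neurons (untrained).
rng = random.Random(0)
w_hidden = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(40)]
b_hidden = [rng.uniform(-1, 1) for _ in range(40)]
w_out = [rng.uniform(-1, 1) for _ in range(40)]
y = forward([0.2, 0.8, 0.4], w_hidden, b_hidden, w_out, b_out=0.0)
```

Because the output neuron is positive linear, the prediction is never negative, which suits non-negative pixel values.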
To train the BRNNM, the spectrograms and WD of two signals are used as
input and target TFDs respectively. The first signal is a sinusoidal FM signal, given
by:
$$x(n) = e^{-i\pi\left\{\frac{5}{2} + \omega(n)\right\}n} \qquad (4.1)$$
where $\omega(n) = 0.1\sin\left(\frac{2\pi n}{N}\right)$, and $N$ refers to the number of sampling points. The
spectrogram of this signal is depicted in Fig. 4.4(a). The respective target TFD,
obtained through the WD, is depicted in Fig. 4.7(a).
The second signal consists of two parallel chirps, given by:
$$Y(n) = x_1(n) + x_2(n) \qquad (4.2)$$
where
$$x_1(n) = e^{i\omega_1(n)n} \quad \text{with} \quad \omega_1(n) = \frac{\pi n}{4N}$$
and
$$x_2(n) = e^{i\omega_2(n)n} \quad \text{with} \quad \omega_2(n) = \frac{\pi}{3} + \frac{\pi n}{4N}$$
where $N$ refers to the total number of sampling points in the signal. The spectrogram
of this signal is depicted in Fig. 4.4(b). The respective target TFD, obtained through
the WD, is depicted in Fig. 4.9(a). The model's performance is evaluated using the test
TFD of a bat echolocation chirps signal, whose spectrogram is shown in Fig. 4.14(a).
As discussed, the entropy can be considered as a measure of concentration [107]: the
lower the entropy of a distribution, the more concentrated it is. The expression given
by Eqn. (3.1) is used to quantify the TFDs' performance in terms of concentration.
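The relation between entropy and concentration can be illustrated with a small sketch. It uses the ordinary Shannon entropy of the normalized magnitudes purely as a stand-in; it is not Eqn. (3.1), whose exact form is not reproduced here, and the two toy distributions are hypothetical.

```python
import math


def shannon_entropy_bits(tfd):
    """Shannon entropy (in bits) of a TFD treated as a normalized distribution.

    This is an illustrative stand-in, not the measure of Eqn. (3.1).
    """
    flat = [abs(v) for row in tfd for v in row]
    total = sum(flat)
    return -sum((v / total) * math.log2(v / total) for v in flat if v > 0)


# Energy spread over all cells (blurred) versus confined to one row (concentrated).
blurred = [[1, 1, 1, 1],
           [1, 1, 1, 1]]
concentrated = [[0, 0, 0, 0],
                [1, 1, 1, 1]]
h_blurred = shannon_entropy_bits(blurred)
h_concentrated = shannon_entropy_bits(concentrated)
```

The concentrated distribution yields the lower value, in line with the statement that lower entropy indicates higher concentration.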
The optimum number of clusters into which
the input TFDs are divided for processing by the LNNs has to be determined. The elbow criterion states
that the number of clusters must be chosen so that adding another cluster does not add
sufficient information [223]. Entropy has an inverse relation to information [107], so
TFDs with lower entropy values carry more information. This
concept contributes significantly to finding the optimum number of clusters as per the
elbow criterion. Table 4.1 shows that the TFDs processed through LNNs without
clustering (where we do not make use of the correlation vectored taxonomy)
carry the least information, as their entropy values are the largest. This fact is further
elaborated in Fig. 4.10, where the percentage of variance explained by the clusters
is plotted against the number of clusters. The first cluster adds much
information (a lot of variance), i.e. the entropy is reduced (see Table 4.1). Adding
another cluster, based on the various edges present in the signal, further reduces the
entropy value for the resultant TFD image of the test signals and improves the visual
result (shown in Fig. 5.2). For any additional cluster the visual results are found to be
indistinguishable while the computational complexity increases, and the
marginal gain drops, giving an angle in the graph (the elbow). This elbow is
unambiguously identified in Fig. 4.10, indicated by the "goose egg". The number of clusters is
therefore chosen to be 3.
Figure 4.14: Test TFDs for bat chirps signal, (a) the spectrogram TFD, and (b) the resultant TFD after processing through proposed framework.
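The marginal-gain reasoning behind the elbow can be sketched numerically with the TI 1 column of Table 4.1. The sketch applies the rule to entropy reductions rather than to explained variance, treats "No cluster" as one cluster, and uses a 10% cut-off; all three are assumptions made for the illustration.

```python
def elbow_index(scores, min_gain_fraction=0.1):
    """Return the index after which adding a cluster no longer pays off.

    `scores` are entropy values for increasing numbers of clusters. The elbow
    is the first index whose next step improves the score by less than
    `min_gain_fraction` of the total improvement.
    """
    total_drop = scores[0] - min(scores)
    for k in range(len(scores) - 1):
        gain = scores[k] - scores[k + 1]
        if gain < min_gain_fraction * total_drop:
            return k
    return len(scores) - 1


cluster_counts = [1, 2, 3, 4, 5, 6]                        # "No cluster" taken as 1
entropy_ti1 = [20.539, 13.523, 8.623, 8.101, 7.998, 7.877]  # TI 1, Table 4.1
chosen = cluster_counts[elbow_index(entropy_ti1)]
```

For this column the rule selects three clusters, matching the choice made in the text; other cut-off values could move the elbow.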
Table 4.3: Entropy values for various techniques
The method                                          Resultant EQ (bits) for test TFD
Correlation vectored taxonomy algorithm with        7.228
three clusters as per elbow criterion
WD                                                  18.623
Spectrogram                                         24.986
Approach used by [132]                              12.125
Figure 4.15: Resultant TFD obtained by the method of [132].
The entropy values of the test result obtained by various techniques are recorded in Table
4.3. It is found that the proposed framework produces an output TFD with a lower
entropy value than any of the other techniques, namely the WD, the spectrogram and the
OKM [132].
The de-blurred test TFD obtained by the proposed algorithm is shown in Fig.
4.14(b). It can be compared with existing methods like [132], which proposes a
signal-dependent kernel that changes shape for each signal to offer improved TF
representation for a large class of signals based on quantitative optimization criteria. The
resultant TFD of this technique is depicted in Fig. 4.15; it hides some important
signal information by losing the uppermost chirp, which is obvious in the spectrogram of the
same signal (Fig. 4.14(a)).
4.3 Summary
The method presented in this Chapter provides an effective way to obtain high
resolution TFDs of signals whose frequency components vary with time by using
LNNs specially trained for various clusters of training data. As discussed earlier, although the
WD and the spectrogram QTFDs are often the easiest to use, they do not always
provide an accurate characterization of the real data. The idea is to use the spectrogram to
obtain an overall characterization of the STSC' structure, and then to use this information
to invest in the WD, which is well matched to the data, for further processing that
requires information not provided by the spectrogram. As a result, the IFs of
the individual components present in the non-stationary signals can be visually
determined and mathematically computed by calculating the average frequency at each
time [31, 34].
Chapter 5
Discussion on Experimental Results
This Chapter presents the discussion on the experimental results of the proposed approach and the
performance evaluation of various BDs. It uses objective
methods of assessment to evaluate the performance of de-blurred TFDs estimated through
the BRNNM (henceforth the NTFDs). As discussed in Section 2.2, the objective
methods allow the quality of TFDs to be quantified instead of relying solely on visual
inspection of their plots. In particular, the computational regularities show the criteria's
effectiveness in quantifying the TFDs' concentration and resolution information.
A performance comparison with various other quadratic TFDs is provided too. This Chapter
is organized in three sections. Section 5.1 discusses the NTFDs' performance based
on the visual results and quantifies their information by measuring
the entropy values only. In Section 5.2, the concept and importance of the TFDs'
objective assessment is described. These objective methods are used to evaluate the
performance of de-blurred TFDs obtained by the proposed BRNNM for both real
life and synthetic signals. Section 5.3 finally summarizes the Chapter.
5.1 Visual Interpretation and Entropy Analysis
In the first phase, five synthetic signals are tested to evaluate the effectiveness of
the proposed algorithm based on visual results and their entropy analysis. They
include (i) a signal with two sets of parallel chirps intersecting at four places, (ii) a
mono-component linear chirp signal, (iii) combined quadratic swept-frequency signals
whose spectrograms are concave and convex parabolic chirps respectively, (iv)
a combined crossing chirps and sinusoidal FM signal and (v) a quadratic chirp signal.
Spectrograms of these signals are shown in Figs. 5.1(a) to 5.1(e) respectively.
Keeping in mind that estimation of the IF is rather difficult at the intersections of
chirps, the first and fifth test cases are considered to check the performance of the
proposed algorithm at the intersection of the IFs of individual components present in the
signals.
The spectrogram of the two sets of parallel chirps signals crossing each other
at four points, depicted in Fig. 5.1(a), is obtained by point-by-point addition of the
following two parallel chirps signals with different phases as indicated:
$$TS_1(n) = X_1(n) + X_2(n), \qquad (5.1)$$
with the first set of parallel chirps computed as
$$X_1(n) = x_{11}(n) + x_{12}(n),$$
where
$$x_{11}(n) = e^{i\left[\pi - \frac{\pi n}{6N}\right]n} \quad \text{and} \quad x_{12}(n) = e^{i\left[\frac{\pi}{3} - \frac{\pi n}{6N}\right]n}$$
and the second set of parallel chirps computed as
$$X_2(n) = x_{21}(n) + x_{22}(n),$$
where
$$x_{21}(n) = e^{i\left[\frac{\pi n}{N}\right]n} \quad \text{and} \quad x_{22}(n) = e^{i\left[\pi + \frac{\pi n}{N}\right]n}$$
The spectrogram of the resultant signal, where individual components intersect each
other at multiple points, is fed as the first test signal and is depicted in Fig. 5.1(a).
The second test signal is a mono-component chirp signal given by:
$$TS_2(n) = e^{i\left[\pi + \frac{\pi n}{N}\right]n} \qquad (5.2)$$
The spectrogram of the resultant signal is depicted in Fig. 5.1(b).
The third test signal is obtained by point-by-point addition of two quadratic
swept-frequency signals whose spectrograms are concave and convex parabolic chirps
respectively. Mathematically, both signals can be obtained by manipulating the
parameters of the following equation:
$$TS_3(n) = \cos\left(2\pi\left(\frac{\partial}{1+\mu}\right)\left(n^{(1+\mu)}\right) + f_0 + \frac{\phi}{360}\right), \qquad (5.3)$$
where
$$\partial = (f_1 - f_0)\,\tau^{(-\mu)}$$
here $\mu$, $f_0$, $f_1$, $\phi$ and $\tau$ are defined as the matching string constant, start frequency,
frequency after one second, initial phase of the signal and sample rate respectively. The
spectrogram of the first quadratic swept-frequency signal is a concave parabolic chirp
which starts at 250 Hz and goes down to 0 Hz at a 1 kHz sample rate, whereas the
spectrogram of the second quadratic swept-frequency signal is a convex parabolic
chirp starting at 250 Hz and going up to 500 Hz at a 1 kHz sample rate. These
aspects are evident in the combined spectrogram depicted in Fig. 5.1(c).
Figure 5.1: Test TFDs (a) Crossing chirps (TI 1), (b) mono-component linear chirp (TI 2), (c) combined quadratic swept-frequency signals whose spectrograms are concave and convex parabolic chirps respectively (TI 3), (d) combined sinusoidal FM and crossing chirps (TI 4), and (e) quadratic chirp (TI 5)
Another test signal is obtained by combining the crossing chirps given in Eqn.
(5.4) below and the sinusoidal FM signal in Eqn. (3.2),
$$TS_4(n) = e^{i\left[\frac{\pi n}{N}\right]n} + e^{i\left[\pi + \frac{\pi n}{N}\right]n} \qquad (5.4)$$
The spectrogram of the signal is depicted in Fig. 5.1(d).
Yet another test signal is a quadratic chirp which starts at 100 Hz and crosses
200 Hz at 1 second with a 1 kHz sample rate. It is obtained from Eqn. (5.3)
after necessary adjustment of different parameters. The spectrogram of this signal is
depicted in Fig. 5.1(e).
5.1.1 Resultant NTFDs - Experimental Results
The five synthetic test signals are: a combined parallel chirps signal crossing at four
points, a mono-component linear chirp signal, combined quadratic swept-frequency
signals whose spectrograms are concave and convex parabolic chirps respectively,
combined crossing chirps and sinusoidal FM signals without any intersection, and a
quadratic chirp signal. The spectrograms of these signals constitute test image 1 (TI
1), test image 2 (TI 2), test image 3 (TI 3), test image 4 (TI 4), and test image 5
(TI 5). They are depicted in Figs. 5.1(a) to 5.1(e) respectively. In the initial attempt, the
expression given by Eqn. (3.1) is used to quantify the TFDs' information in the form of
entropy values, which have an inverse relation with the information [107].
Table 5.1: Entropy values for various techniques
The method     Resultant EQ (bits) for test TFDs
               TI 1     TI 2     TI 3     TI 4     TI 5
NTFDs          8.623    6.629    5.672    8.175    6.948
WD             21.562   10.334   18.511   20.637   18.134
Spectrogram    28.231   18.987   27.743   28.785   23.774
Table 5.1 records the entropy values for the various types of TFDs. It
is found that the NTFDs produced by the proposed ANN based framework have lower entropy
values than those of the other techniques, namely the WD and the spectrogram. TI 1 and TI
5 are taken into account to check the performance of the proposed algorithm with
LNNs for estimation of the IFs at the intersections along the individual components
in the signals. Even though estimation of the IF is considered rather difficult at
intersections, the algorithm performs well, as depicted in Figs. 5.2(a) and (d). The test
images TI 2, TI 3 and TI 5 present the ideal cases to check the performance
of the proposed algorithm with LNNs trained with signals of different natures. The
resultant TFD images are highly concentrated along the IF of the individual components
present in the signal, as shown in Figs. 5.2(b), (c) and (e).
Figure 5.2: Resultant TFDs after processing through correlation vectored taxonomy algorithm with LNNs for (a) Crossing chirps (TI 1), (b) mono-component linear chirp (TI 2), (c) combined quadratic swept-frequency signals whose spectrograms are concave and convex parabolic chirps respectively (TI 3), (d) combined sinusoidal FM and crossing chirps (TI 4), and (e) quadratic chirp (TI 5)
5.2 Objective Assessment
In this section, the objective measures described in Section 2.2 are used to analyze the
NTFDs' performance in comparison to other TFDs. The aim is to find, based on these
measures, the most informative TFDs, i.e. those with the best concentration and the
highest resolution. Five examples, different from those of the previous section and
including both real life and synthetic multicomponent signals, are considered. The
first four signals are (i) a multicomponent bat echolocation chirp signal, (ii) a
two-component intersecting sinusoidal FM signal, (iii) a signal with two sets of
nonparallel, nonintersecting chirps, and (iv) a closely spaced three-component signal
containing a sinusoidal FM component intersecting two crossing chirps. The respective
spectrograms, termed test image A (TI A), test image B (TI B), test image C (TI C) and
test image D (TI D), are shown in Figs. 4.14(a) and 5.3(a)-5.5(a) respectively. To
illustrate the evaluation of the NTFDs' performance through the measures in Eqns.
(2.10) and (2.14), we further consider a closely spaced multicomponent signal
containing two significantly close parallel chirps. The spectrogram of this signal,
termed test image E (TI E), is depicted in Fig. 5.6(a). The resultant NTFDs for the
test signals are shown in Fig. 4.14(b) and Figs. 5.3(b)-5.6(b) respectively. The
visual results indicate the NTFDs' high resolution and concentration along the IF of
the individual components present in the signals.
Figure 5.3: (a) The test spectrogram (TI 2) [Hamm, L = 90]. (b) The NTFD of a synthetic signal consisting of two sinusoidal FM components intersecting each other.
Figure 5.4: (a) The test spectrogram (TI 3) [Hamm, L = 90]. (b) The NTFD of a synthetic signal consisting of two sets of non-parallel, non-intersecting chirps.
Figure 5.5: (a) The test spectrogram (TI 4) [Hamm, L = 90]. (b) The NTFD of a synthetic signal consisting of crossing chirps and a sinusoidal FM component.
Figure 5.6: (a) The test spectrogram (TI 5), and (b) the NTFD of test case 4.
5.2.1 Real Life Test Case
Real life data for bat echolocation chirp sound (adopted from [134]) provides an
excellent multicomponent test case. The nonstationary nature of the signal is only
obvious from its TFD and neither the time nor the frequency domain representations
present a clear picture of its true nature. The spectrogram of this signal is shown
in Fig. 4.14(a), and the resultant NTFD is depicted in Fig. 4.14(b). For comparison,
the TFD of the same test case is computed using an existing OKM [132] and plotted in
Fig. 4.15. The OKM uses a signal-dependent kernel that changes shape for each signal
to offer an improved TF representation for a large class of signals, based on
quantitative optimization criteria. Close inspection of the OKM's output in Fig. 4.15
reveals that this TFD does not fully recover all the components, thus losing some
useful information about the signal. The NTFD, in contrast, is not only highly
concentrated along the IF of the individual components present in the signal but also
more informative, showing all the components.
For further analysis, slices of the test spectrogram and the resultant NTFD are taken
at the time instants n = 150 and n = 310 (recall that n = 1, 2, ..., 400), and the
normalized amplitudes of these slices are plotted in Fig. 5.7. These instants are
chosen because three chirps are visible at them (see Fig. 4.14(b)). Fig. 5.7 confirms
the peaky appearance of three different frequencies at these time instants. There are
no CTs, and the results of the proposed method offer better frequency resolution. It is
worth mentioning that the NTFD not only successfully recovers the fourth (weakest)
component but also has the best resolution, i.e. a narrower main lobe and no side
lobes, compared to all the other considered distributions, e.g. those in Fig. 2.1
(drawn with the optimum parameters). The largest frequency seen in Fig. 5.7(b) is not
recovered by any other TFD drawn in Fig. 2.1.

Figure 5.7: The time slices for the spectrogram (blue) and the NTFD (red) for the bat echolocation chirps' signal, at n = 150 (left) and n = 310 (right)
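
The slice comparison described above can be sketched in a few lines. This is a minimal illustration, not the procedure used to produce Fig. 5.7: it assumes the TFD is stored as a list of rows with `tfd[k][n]` holding frequency bin k at time index n, and the helper names `normalized_time_slice` and `count_peaks`, the 0.5 threshold and the toy numbers are all hypothetical.

```python
def normalized_time_slice(tfd, n):
    """Return the frequency slice of `tfd` at time index `n`, scaled to a peak of 1.

    Assumed layout: tfd[k][n] holds the value at frequency bin k and time index n.
    """
    column = [abs(row[n]) for row in tfd]
    peak = max(column)
    if peak == 0:
        return column  # an all-zero slice is returned unchanged
    return [value / peak for value in column]


def count_peaks(slice_values, threshold=0.5):
    """Count strict local maxima whose normalized amplitude is at least `threshold`."""
    count = 0
    last = len(slice_values) - 1
    for k, value in enumerate(slice_values):
        left = slice_values[k - 1] if k > 0 else float("-inf")
        right = slice_values[k + 1] if k < last else float("-inf")
        if value >= threshold and value > left and value > right:
            count += 1
    return count


# Toy TFD with 7 frequency bins and 2 time instants (illustrative numbers only).
toy_tfd = [
    [0.0, 0.0],
    [1.0, 2.0],
    [0.1, 0.1],
    [3.0, 4.0],
    [0.2, 0.2],
    [2.0, 3.0],
    [0.0, 0.0],
]

slice_at_1 = normalized_time_slice(toy_tfd, 1)
print(slice_at_1)
print(count_peaks(slice_at_1))
```

Normalizing each slice to its own peak puts the spectrogram and NTFD slices on a common scale, so main-lobe widths and the number of resolved components can be compared directly.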
5.2.2 Synthetic Test Cases
Four further specially synthesized signals of different natures are fed to the model
to check its performance at intersections of the IFs and for closely spaced components,
keeping in mind that estimating the IF is rather difficult in these situations. The
test cases are described below:
5.2.2.1 Test case 1.
The first is a synthetic signal consisting of two intersecting sinusoidal
FM components, given as:
\[ \mathrm{SynTS1}(n) = e^{-i\pi\left(\frac{5}{2} - 0.1\sin(2\pi n/N)\right)n} + e^{i\pi\left(\frac{5}{2} - 0.1\sin(2\pi n/N)\right)n} \tag{5.5} \]
The spectrogram of the signal is shown in Fig. 5.3(a).
5.2.2.2 Test case 2.
The second synthetic signal contains two sets of nonparallel, nonintersecting
chirps once plotted on the TF plane. Mathematically it can be written as:
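\[ \mathrm{SynTS2}(n) = e^{i\pi\left(\frac{n}{6N}\right)n} + e^{i\pi\left(1 + \frac{n}{6N}\right)n} + e^{-i\pi\left(\frac{n}{6N}\right)n} + e^{-i\pi\left(1 + \frac{n}{6N}\right)n} \tag{5.6} \]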
The spectrogram of the signal is shown in Fig. 5.4(a).
5.2.2.3 Test case 3.
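It is a three-component signal containing a sinusoidal FM component
intersecting two crossing chirps. It is expressed as:
\[ \mathrm{SynTS3}(n) = e^{i\pi\left(\frac{5}{2} - 0.1\sin(2\pi n/N)\right)n} + e^{i\pi\left(\frac{n}{6N}\right)n} + e^{i\pi\left(\frac{1}{3} - \frac{n}{6N}\right)n} \tag{5.7} \]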
The spectrogram of the signal is shown in Fig. 5.5(a). The frequency separation
between two of the components (the sinusoidal FM and a chirp component), between
150 and 200 Hz near 0.5 s, is small enough that they only just avoid intersecting.
This is to confirm the model's effectiveness in de-blurring closely spaced components.
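
As a rough sketch, the first three synthetic test signals can be generated as follows. The code assumes the reading of Eqns. (5.5) to (5.7) in which the sinusoidal FM term has the coefficient 5/2 - 0.1 sin(2πn/N) and the chirp terms use n/(6N); the record length N = 256 and the function names are hypothetical, since the text does not state N for these signals.

```python
import cmath
import math


def syn_ts1(n, N):
    """Two intersecting sinusoidal FM components, as read from Eqn. (5.5)."""
    a = 2.5 - 0.1 * math.sin(2 * math.pi * n / N)
    return cmath.exp(-1j * math.pi * a * n) + cmath.exp(1j * math.pi * a * n)


def syn_ts2(n, N):
    """Two sets of nonparallel, nonintersecting chirps, as read from Eqn. (5.6)."""
    r = n / (6 * N)
    return (cmath.exp(1j * math.pi * r * n) + cmath.exp(1j * math.pi * (1 + r) * n)
            + cmath.exp(-1j * math.pi * r * n) + cmath.exp(-1j * math.pi * (1 + r) * n))


def syn_ts3(n, N):
    """A sinusoidal FM component plus two crossing chirps, as read from Eqn. (5.7)."""
    a = 2.5 - 0.1 * math.sin(2 * math.pi * n / N)
    r = n / (6 * N)
    return (cmath.exp(1j * math.pi * a * n) + cmath.exp(1j * math.pi * r * n)
            + cmath.exp(1j * math.pi * (1 / 3 - r) * n))


N = 256  # hypothetical record length; not taken from the text
ts1 = [syn_ts1(n, N) for n in range(N)]
ts2 = [syn_ts2(n, N) for n in range(N)]
ts3 = [syn_ts3(n, N) for n in range(N)]
```

Because SynTS1 and SynTS2 are sums of complex-conjugate exponential pairs, their samples are real-valued up to rounding error, whereas SynTS3 is genuinely complex.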
5.2.2.4 Test case 4.
This particular test case is adopted from Boashash [33] to compare the TFDs'
concentration and resolution performance at the middle of the signal duration interval
using the Boashash performance measures in Eqns. (2.10) and (2.14). The signal consists
of two LFMs whose frequencies increase from 0.15 to 0.25 Hz and from 0.2 to
0.3 Hz, respectively, over the time interval t ∈ [1, 128]. The sampling frequency is
f_s = 1 Hz. The authors in [33] found the modified B distribution (β = 0.01) to be the
best performing TFD for this particular signal at the middle of the interval, after
measuring the signal components' parameters needed in Eqn. (2.14) (see Table 5.3).
The signal is defined as:
\[ \mathrm{SynTS4}(t) = \cos\left(2\pi\left(0.15t + 0.0004t^2\right)\right) + \cos\left(2\pi\left(0.2t + 0.0004t^2\right)\right) \tag{5.8} \]
The spectrogram of the signal is shown in Fig. 5.6(a).
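
A short sketch of this test signal, together with the IF laws implied by Eqn. (5.8), is given below. The IF of each component is the derivative of its phase divided by 2π, i.e. f_0 + 0.0008t; the function names are illustrative only.

```python
import math


def syn_ts4(t):
    """Two parallel LFM components of Eqn. (5.8), sampled at fs = 1 Hz."""
    return (math.cos(2 * math.pi * (0.15 * t + 0.0004 * t * t))
            + math.cos(2 * math.pi * (0.2 * t + 0.0004 * t * t)))


def if_laws(t):
    """IF of each component: derivative of the phase divided by 2*pi."""
    return 0.15 + 0.0008 * t, 0.2 + 0.0008 * t


samples = [syn_ts4(t) for t in range(1, 129)]  # t = 1, ..., 128
f1_mid, f2_mid = if_laws(64)
print(f1_mid, f2_mid, f2_mid - f1_mid)
```

At t = 64 the two IFs are about 0.2012 Hz and 0.2512 Hz, a constant separation of 0.05 Hz, which agrees with the values of roughly 0.20 and 0.25 measured for f_i1(64) and f_i2(64) in Table 5.4.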
The above-mentioned test cases are processed through the BRNNM and the estimated
NTFDs are shown in Figs. 5.3(b)-5.6(b). High resolution and concentration along the
IF of the individual components are evident on visual inspection of these plots.
5.2.3 Performance Evaluation
To evaluate the performance, the ratio of norms based measure, the Shannon and Rényi
entropy measures, the normalized Rényi entropy measures and the LJubisa measure are
computed and recorded in Table 5.2. The entropy measures, including the Shannon and
Rényi entropies with or without normalization, make excellent measures of the
information extraction performance of TFDs. By the probabilistic analogy, minimizing
the complexity or information in a particular TFD is equivalent to maximizing its
concentration, peakiness and, therefore, resolution [100]. To obtain the optimum
distribution for a given signal, the values of the ratio of norms based and Boashash
resolution measures should be maximum [112], whereas the TFDs yielding the smallest
values of the LJubisa and Boashash concentration measures are considered the best
performing in terms of concentration and resolution [33, 172].
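
The behaviour of these global measures can be illustrated with a small sketch. It assumes the commonly used forms of the measures (Shannon and order-3 Rényi entropies of the normalized TFD, the ratio of the fourth-power sum to the squared second-power sum, and the LJubisa measure taken to be the Stankovic concentration measure with p = 2); the definitions in Section 2.2 may differ in normalization. Magnitudes are used throughout, a simplification that ignores the sign of CTs, and the two toy distributions are invented for illustration.

```python
import math


def _normalized_magnitudes(tfd):
    """Flatten the TFD and scale the magnitudes so that they sum to 1."""
    values = [abs(v) for row in tfd for v in row]
    total = sum(values)
    return [v / total for v in values]


def shannon_entropy(tfd):
    """Shannon entropy in bits; smaller means more concentrated."""
    return -sum(p * math.log2(p) for p in _normalized_magnitudes(tfd) if p > 0)


def renyi_entropy(tfd, alpha=3):
    """Renyi entropy of order alpha in bits; smaller means more concentrated."""
    return math.log2(sum(p ** alpha for p in _normalized_magnitudes(tfd))) / (1 - alpha)


def ratio_of_norms(tfd):
    """Sum of fourth powers over squared sum of squares; larger is better."""
    values = [v for row in tfd for v in row]
    return sum(v ** 4 for v in values) / sum(v ** 2 for v in values) ** 2


def ljubisa_measure(tfd, p=2):
    """Assumed form: (sum of |P|^(1/p))^p on the unit-sum TFD; smaller is better."""
    return sum(q ** (1 / p) for q in _normalized_magnitudes(tfd)) ** p


# Same ridge, once concentrated in one frequency bin and once smeared over three.
concentrated = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]
blurred = [[0.25, 0.25, 0.25], [0.5, 0.5, 0.5], [0.25, 0.25, 0.25]]

for name, tfd in (("concentrated", concentrated), ("blurred", blurred)):
    print(name, shannon_entropy(tfd), renyi_entropy(tfd),
          ratio_of_norms(tfd), ljubisa_measure(tfd))
```

On this toy pair the concentrated distribution has the lower entropies and LJubisa value and the higher ratio of norms, which is the ordering the criteria above rely on.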
The values in Table 5.2 indicate that the NTFDs are the best TFDs according to most
of the criteria. A few anomalies are present in the data, mainly attributable to the
inherent shortcomings of the measures and the assumptions made in their derivation;
for example, the simple Rényi entropies, being unable to detect zero mean CTs,
indicate the ZAMD as the best concentrated TFD. However, the more often used volume
normalized Rényi entropies are minimum for the NTFDs.
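It seems appropriate to plot these measures separately for the various TIs (i.e.
TI A-TI D); the plots are shown in Fig. 5.8. These plots conform to the visual results
and highlight that the NTFDs are better in comparison to the other considered
distributions. The congruence and regular nature of the curves is also obvious in
these plots, which is indicative of the objective criteria's validity.

Table 5.2: Performance Measures Comparison for Various TFDs
Description | Test TFD | Spec | WVD | ZAMD | MHD | CWD | BJD | NTFD | SNN | OKM [132]
Shannon entropy measure | TI A | 13.46 | 36.81 | 102.23 | 42.98 | 17.27 | 17.73 | 7.27 | 10.18 | 14.68
Shannon entropy measure | TI B | 13.45 | 64.33 | 76.81 | 37.74 | 20.82 | 20.43 | 8.75 | 10.88 | 18.08
Shannon entropy measure | TI C | 18.66 | 185.49 | 274.73 | 126.02 | 28.08 | 28.05 | 7.87 | 13.45 | 21.42
Shannon entropy measure | TI D | 18.94 | 74.82 | 87.30 | 49.24 | 35.31 | 29.92 | 17.25 | 24.23 | 23.57
Ratio of norm based measure (×10^-4) | TI A | 3.81 | 3.84 | 2.94 | 1.05 | 2.89 | 2.73 | 66 | 13.88 | 8.32
Ratio of norm based measure (×10^-4) | TI B | 1.94 | 1.91 | 2.18 | 1.10 | 3.10 | 4.67 | 24 | 18.12 | 1.59
Ratio of norm based measure (×10^-4) | TI C | 51.23 | 58.0 | 1.02 | 48.71 | 38.53 | 26.37 | 44 | 33.90 | 10.26
Ratio of norm based measure (×10^-4) | TI D | 0.95 | 0.92 | 1.19 | 0.12 | 1.11 | 2.68 | 14 | 8 | 4.60
Rényi entropy measure | TI A | 12.45 | 10.90 | 7 | 11.47 | 12.67 | 12.54 | 7.26 | 9.25 | 11.65
Rényi entropy measure | TI B | 12.98 | 9.95 | 7.56 | 11.03 | 12.06 | 11.85 | 8.74 | 10.89 | 13.82
Rényi entropy measure | TI C | 17.07 | 14.01 | 8.62 | 14.74 | 16.24 | 15.84 | 7.85 | 12.82 | 17.22
Rényi entropy measure | TI D | 12.47 | 9.48 | 7.06 | 10.50 | 11.54 | 11.34 | 8.23 | 10.03 | 13.31
Energy normalized Rényi entropy measure | TI A | 12.45 | 10.90 | 7 | 11.47 | 12.67 | 12.54 | 7.26 | 9.25 | 11.65
Energy normalized Rényi entropy measure | TI B | 12.98 | 9.95 | 7.56 | 11.03 | 12.06 | 11.85 | 8.74 | 10.89 | 13.82
Energy normalized Rényi entropy measure | TI C | 17.07 | 14.01 | 8.62 | 14.74 | 16.24 | 15.84 | 7.85 | 12.82 | 17.22
Energy normalized Rényi entropy measure | TI D | 12.47 | 9.48 | 7.06 | 10.50 | 11.54 | 11.34 | 8.23 | 10.03 | 13.31
Volume normalized Rényi entropy measure | TI A | 12.45 | 12.02 | 9.18 | 12.75 | 12.93 | 12.85 | 7.26 | 12.97 | 11.77
Volume normalized Rényi entropy measure | TI B | 12.98 | 11.62 | 9.54 | 12.26 | 12.60 | 12.38 | 8.74 | 11.68 | 10.98
Volume normalized Rényi entropy measure | TI C | 17.07 | 16.28 | 11.35 | 16.70 | 16.77 | 16.41 | 7.85 | 14.49 | 15.43
Volume normalized Rényi entropy measure | TI D | 12.47 | 9.48 | 7.06 | 10.50 | 11.54 | 11.34 | 8.23 | 10.03 | 10.31
LJubisa measure (×10^5) | TI A | 0.2219 | 3.30 | 13.14 | 2.9200 | 1.06 | 1.01 | 0.0015 | 0.0912 | 0.6300
LJubisa measure (×10^5) | TI B | 0.1600 | 4.68 | 5.6266 | 1.1861 | 1.0123 | 0.8946 | 0.0024 | 0.0145 | 8.6564
LJubisa measure (×10^5) | TI C | 6.03 | 47.05 | 39.64 | 36.47 | 33.08 | 29.39 | 0.0043 | 0.9973 | 14.73
LJubisa measure (×10^5) | TI D | 0.1553 | 8.67 | 9.6253 | 5.1848 | 6.0110 | 5.8933 | 1.0030 | 3.0223 | 8.5551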
Figure 5.8: Comparison plots of the criteria values vs TFDs for the test images 1-4: (a) the Shannon entropy measure, (b) the Rényi entropy measure, (c) the volume normalized Rényi entropy measure, (d) the ratio of norm based measure, and (e) the LJubisa measure

Boashash performance measures for concentration and resolution, as defined in Eqns.
(2.10) and (2.14), are computationally expensive because they require calculations at
various time instants. To limit the scope, these measures are computed at the middle
of the synthetic signal defined in Eqn. (5.8) and the results are compared with those
reported in [33]. A slice is taken at t = 64 and the signal components' parameters
A_M1(64), A_M2(64), A_M(64), A_S1(64), A_S2(64), A_S(64), V_i1(64), V_i2(64), V_i(64),
f_i1(64), f_i2(64) and Δf_i(64), as well as the CTs' magnitude A_X(64), are measured.
These are then used to calculate the TFDs' normalized instantaneous resolution and
modified concentration performance measures R_i(t) and C_n(t), defined by Eqns. (2.15)
and (2.11). The measurement results are recorded in Table 5.3 and Table 5.4 separately
for R_i(64) and C_n(64). The slice of the signal's NTFD at t = 64 is shown in
Fig. 5.9(f).

Table 5.3: Parameters and the Normalized Instantaneous Resolution Performance Measure of TFDs for the Time Instant t = 64
TFD (optimal parameter) | A_M(64) | A_S(64) | A_X(64) | V_i(64) | Δf_i(64) | D(64) | R(64)
Spectrogram (Hann, L = 35) | 0.9119 | 0.0087 | 0.5527 | 0.0266 | 0.0501 | 0.4691 | 0.7188
WVD | 0.9153 | 0.3365 | 1 | 0.0130 | 0.0574 | 0.7735 | 0.6199
ZAMD (a = 2) | 0.9146 | 0.4847 | 0.4796 | 0.0214 | 0.0420 | 0.4905 | 0.5661
CWD (σ = 2) | 0.9355 | 0.0178 | 0.4415 | 0.0238 | 0.0493 | 0.5172 | 0.7541
BJD | 0.9320 | 0.1222 | 0.3798 | 0.0219 | 0.0488 | 0.5512 | 0.7388
Modified B (β = 0.01) | 0.9676 | 0.0099 | 0.0983 | 0.0185 | 0.0526 | 0.5957 | 0.8449
NTFD | 0.9013 | 0 | 0 | 0.0110 | 0.0550 | 0.800 | 0.9333

Table 5.4: Parameters and the Modified Instantaneous Concentration Performance Measure of TFDs for the Time Instant t = 64
TFD (optimal parameters) | A_S1(64) | A_S2(64) | A_M1(64) | A_M2(64) | V_i1(64) | V_i2(64) | f_i1(64) | f_i2(64) | C_1(64) | C_2(64)
Spectrogram (Hann, L = 35) | 0.0087 | 0.0087 | 1 | 0.8238 | 0.03200 | 0.0200 | 0.1990 | 0.2500 | 0.1695 | 0.0905
WVD | 0.3365 | 0.3365 | 0.9153 | 0.9153 | 0.0130 | 0.013 | 0.1980 | 0.2554 | 0.4333 | 0.4185
ZAMD (a = 2) | 0.4848 | 0.4900 | 1 | 0.8292 | 0.0224 | 0.0204 | 0.2075 | 0.2495 | 0.5927 | 0.6727
CWD (σ = 2) | 0.0176 | 0.0179 | 1 | 0.8710 | 0.0300 | 0.0176 | 0.205 | 0.2543 | 0.1639 | 0.0898
BJD | 0.1240 | 0.1204 | 1 | 0.8640 | 0.0270 | 0.0168 | 0.2042 | 0.2530 | 0.2562 | 0.2058
Modified B (β = 0.01) | 0.0100 | 0.0098 | 1 | 0.9352 | 0.0190 | 0.0180 | 0.200 | 0.2526 | 0.1050 | 0.0817
NTFD | 0 | 0 | 0.8846 | 0.9180 | 0.0110 | 0.0110 | 0.2035 | 0.2585 | 0.0541 | 0.0425
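
The way the resolution figures follow from the measured slice parameters can be sketched as below. The sketch assumes the forms given by Boashash and Sucic [33], namely D = 1 - V_i/Δf_i and R = 1 - (1/3)[A_S/A_M + (1/2)(A_X/A_M) + (1 - D)]; Eqns. (2.14) and (2.15) are not restated in this section, so these forms are an assumption, although they do reproduce the tabulated values for the rows used here. The function names are illustrative.

```python
def separation_measure(v_i, delta_f_i):
    """Components' separation D: 1 minus bandwidth over IF spacing (assumed form)."""
    return 1 - v_i / delta_f_i


def resolution_measure(a_m, a_s, a_x, d):
    """Normalized instantaneous resolution measure R (assumed form from [33])."""
    return 1 - (a_s / a_m + 0.5 * a_x / a_m + (1 - d)) / 3


# (A_M, A_S, A_X, V_i, delta f_i) at t = 64, copied from Table 5.3.
rows = {
    "Spectrogram": (0.9119, 0.0087, 0.5527, 0.0266, 0.0501),
    "WVD": (0.9153, 0.3365, 1.0, 0.0130, 0.0574),
    "NTFD": (0.9013, 0.0, 0.0, 0.0110, 0.0550),
}

results = {}
for name, (a_m, a_s, a_x, v_i, delta_f_i) in rows.items():
    d = separation_measure(v_i, delta_f_i)
    results[name] = (d, resolution_measure(a_m, a_s, a_x, d))
    print(f"{name}: D = {results[name][0]:.4f}, R = {results[name][1]:.4f}")
```

With no sidelobes and no CTs, the NTFD's score is determined by the separation term alone, R = 1 - (1 - 0.8)/3 ≈ 0.9333.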
Figure 5.9: The normalized slices at t = 64 of TFDs. (a) The spectrogram. (b) WD. (c) ZAMD. (d) CWD. (e) BJD. (f) NTFD. The first five TFDs (dashed) are compared against the modified B distribution (solid), adopted from Boashash [33].

As mentioned earlier, a TFD that, at a given time instant, has the largest positive
value (close to 1) of the measure R_i is the TFD with the best resolution performance
at that time instant for the signal under consideration. From Table 5.3, the NTFD of
the synthetic signal given by Eqn. (5.8) gives the largest value of R_i at time t = 64
and hence is selected as the best performing TFD of this signal at t = 64. Along
similar lines, the TFDs' concentration performance is compared at the middle of the
signal duration interval. A TFD is considered to have the best energy concentration
for a given multicomponent signal if, for each signal component, it yields the smallest:
1. Instantaneous bandwidth relative to component IF (V_i(t)/f_i(t)), and
2. Sidelobe magnitude relative to mainlobe magnitude (A_S(t)/A_M(t)).
The measured results are recorded in Table 5.4. They indicate that the NTFD of the
signal given by Eqn. (5.8) yields the smallest values of C_1(t) and C_2(t) at t = 64,
and hence it is selected as the best concentrated TFD at t = 64. To draw a better
comparison, the values of R_i and C_1, C_2 computed for the different TFDs are plotted
in Fig. 5.10. The plot supports the tabulated results and confirms the NTFD's
superiority in comparison to the other considered TFDs.
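
The two criteria combine into a single per-component figure. The sketch below assumes the modified concentration measure is their sum, C_n(t) = A_Sn(t)/A_Mn(t) + V_in(t)/f_in(t), as in Boashash and Sucic [33]; Eqn. (2.11) is not restated here, so this form is an assumption, but it reproduces the C_1(64) and C_2(64) columns of Table 5.4 for the rows used. The function and variable names are illustrative.

```python
def concentration_measure(a_s, a_m, v_i, f_i):
    """Modified concentration measure of one component (assumed form from [33])."""
    return a_s / a_m + v_i / f_i


# (A_S, A_M, V_i, f_i) for components 1 and 2 at t = 64, copied from Table 5.4.
table_5_4 = {
    "Spectrogram": [(0.0087, 1.0, 0.0320, 0.1990), (0.0087, 0.8238, 0.0200, 0.2500)],
    "Modified B": [(0.0100, 1.0, 0.0190, 0.200), (0.0098, 0.9352, 0.0180, 0.2526)],
    "NTFD": [(0.0, 0.8846, 0.0110, 0.2035), (0.0, 0.9180, 0.0110, 0.2585)],
}

concentration = {
    name: [concentration_measure(*component) for component in components]
    for name, components in table_5_4.items()
}
for name, (c1, c2) in concentration.items():
    print(f"{name}: C1 = {c1:.4f}, C2 = {c2:.4f}")
```

Since the NTFD slice has no sidelobes, its C_n values reduce to the relative bandwidth V_i/f_i of each component.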
5.3 Summary
In this chapter, objective assessment methods are used to compare the concentration
and resolution performance of TFDs for multicomponent signal analysis, thus using a
quantitative measure of goodness for TFDs instead of relying solely on visual
inspection of their plots. Given a TFD, the Boashash normalized instantaneous
resolution and modified concentration performance measures can be considered the most
appropriate for quantifying its TF concentration and resolution. This is because these
objective measures take into account both the concentration and resolution aspects,
thus providing a better picture in the case of signals with closely spaced components.
What makes them the better choice is the inclusion of the characteristics of TFDs that
influence their resolution, such as component concentration and separation and
interference term minimization. The quantitative framework is found to be effective
for the analysis and evaluation of the proposed model's performance, using both
synthetic and real life examples. Experimental results demonstrate the effectiveness
of the approach.

Figure 5.10: Comparison plots of the Boashash performance measures vs TFDs: (a) the modified concentration measure (C_n(64)), (b) the normalized instantaneous resolution measure (R_i)
Chapter 6 Conclusion and Future Directions
The attempt to clearly understand what a time-varying spectrum is, and to represent
the properties of a signal simultaneously in time and frequency without any ambiguity,
is one of the most fundamental and challenging aspects of analysis. A large body of
published scientific literature highlights the significance of TF processing with
regard to improved concentration and resolution. However, as this task is addressed by
many different types of TF techniques, it is important to search for the one that is
most pertinent to the application. Although the WD and the spectrogram QTFDs are
often the easiest to use, they do not always provide an accurate characterization of
the real data. The spectrogram results in a blurred version, and the use of the WD in
practical applications has been limited by the presence of CTs and its inability to
produce ideal concentration for non-linear IF variations. The spectrogram, for example,
could be used to obtain an overall characterization of the STSC' structure, and this
information could then be used to invest in another QTFD that is well matched to the
data, for further processing that requires information not provided by the
spectrogram. This is the idea conceived and implemented in this thesis [217].
In the first part of the thesis, an attempt is made to answer the following questions:
Why are high concentration and good resolution important? What motivated various
researchers to devise and implement newer methods for this purpose? How have they used
new ideas and implemented techniques to achieve the desired objectives? Concentrating
on various methods and well-tested algorithms, the discussion is focused on the basic
concepts, important properties, implementation methods and simulation results that
emphasize the importance and significance of each technique to signal analysis.
However, there are a large number of proposed methods, and only a few have been
explored, in sequence, with the aim of presenting the ideas and techniques in a
logical way.
In the rest of the thesis, a new ANN based approach incorporating Bayesian
regularization is implemented and evaluated for computing informative, non-blurred
and high resolution TFDs. The resulting TFDs do not have the CTs that appear for
multicomponent signals in some distributions such as the WD, thus providing a visual
way to determine the IF of non-stationary signals. The technique shows that a mixture
of ENs, each focused on a specific task, delivers a TFD that is highly concentrated
along the IF with no CTs, in contrast to training an ANN that does not receive the
selected input. Experimental results presented in Chapter 5 demonstrate the
effectiveness of the approach.
For completeness of the proposed framework, the NTFDs' performance is further
assessed by information theoretic criteria. These quantitative measures of goodness
are used instead of relying solely on visual inspection of the TFDs' plots. The
mathematical framework to quantify the TFDs' information is found effective in
ascertaining the superiority of the results obtained by the ANN based multiprocesses
technique, using both synthetic and real life examples. The NTFDs are compared with
some popular distributions known for their CT suppression and high energy
concentration in the TF domain. It is shown that the NTFDs exhibit high resolution,
have no interference terms between the signal components and are highly concentrated.
They are also found to be better at detecting the number of components in a given
signal than the conventional distributions.
6.1 Future Directions
In this thesis, a framework is proposed for computing informative, non-blurred and
high resolution TFDs by identifying a novel utilization of the ANN field. To assess
and further improve the efficiency of this framework, as well as to identify other
useful extensions, the following may be investigated:
1. If a TFD is positive and satisfies the marginals, it may be considered a proper
TFD for extraction of time-varying frequency parameters such as the IF. This is
because positivity coupled with correct marginals ensures that the TFD is a true
probability density function, and the parameters extracted are meaningful [135]. The
NTFD may be modified to satisfy the marginal requirements while still preserving its
other important characteristics, like positivity. One way to optimize the NTFD is to
use the minimum cross entropy (MCE) method [197, 198]. MCE optimization was first
applied to TFDs by Loughlin et al. [246].
2. Different applications have different preferences and requirements for TFDs. In
general, the choice of a TFD in a particular situation depends on many factors, such
as the relevance of the properties satisfied by the TFD, its computational cost and
speed, and the tradeoffs in using it. Although the NTFDs are de-blurred and highly
concentrated, they are discontinuous, showing energy gaps and thus missing some signal
information. Moreover, it is found that the resulting TFDs are not valid energy
distributions because they do not observe signature continuity, the marginal
characteristics or weak signal mitigation. For this reason, the results may not be
suitable for certain applications. This can be attributed to limitations of the
pre-processing, as the processed target WD images, shown in Fig. 4.9(a), are
discontinuous at various places when seen at high resolution. This aspect is expected
to improve if the target TFDs are made continuous along the IFs of the individual
components present in the signal. Nevertheless, the BRNNM produces results that are
better than, or closer to the actual TFD images than, the initial blurred estimates
(spectrograms). Furthermore, several TFDs, especially the ones satisfying the
marginals, have discontinuities [31]. Other possibilities, such as the use of region
growing algorithms and interpolation along the individual components, may also be
considered.
3. The approach can be extended to the analysis of signals with more complicated IF
laws, possibly by incorporating other techniques, for example, piecewise linear
approximation of the IF using the Hough transform and evolutionary spectrum [45], and
the combined Wigner-Hough transform [46, 103] for CT suppression, optimal detection
and parameter estimation. Separate work is needed to determine how the performance of
the algorithm changes for signals that are not linear or sinusoidal chirps, with or
without the addition of noise.
4. The method yields an image rather than a mathematical expression for the IF, which
is important for certain applications such as jammer excision. The IF can be computed
from the resultant NTFDs by calculating the average frequency at each time [31].
5. Essentially, the objective assessment part of the thesis is an incremental study
that combines existing results on concentration measure evaluation and TFD kernels,
and thus merely scratches the surface of the potential application of these criteria
in TF analysis. Worthy of pursuit seem the axiomatic derivation and application of an
ideal TF complexity measure, along the lines of Jones and Parks's work in devising the
ratio of distribution norms [112], Baraniuk and Jones's effort in defining the design
of optimal kernel distributions [132] and Rényi's work in probability theory [118],
and the investigation of other possible measures.
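
The average frequency computation mentioned in item 4 can be sketched as the first conditional moment of the TFD in frequency at each time instant. This is a minimal illustration under stated assumptions: the TFD is non-negative and stored as `tfd[k][n]` (frequency bin k, time index n), and the function name, frequency grid and toy values are hypothetical. For a multicomponent signal the average blends all components at that instant, so the components would need to be separated first.

```python
def average_frequency(tfd, freqs):
    """First conditional moment in frequency at each time index.

    Assumed layout: tfd[k][n] is the non-negative value at frequency freqs[k]
    and time index n.
    """
    estimates = []
    for n in range(len(tfd[0])):
        column = [row[n] for row in tfd]
        total = sum(column)
        if total > 0:
            estimates.append(sum(f * p for f, p in zip(freqs, column)) / total)
        else:
            estimates.append(None)  # no energy at this instant (an energy gap)
    return estimates


# Toy mono-component TFD: 4 frequency bins, 4 time instants (illustrative only).
freqs = [0.1, 0.2, 0.3, 0.4]
toy_tfd = [
    [1.0, 0.25, 0.0, 0.0],
    [0.0, 0.50, 0.0, 0.0],
    [0.0, 0.25, 1.0, 0.0],
    [0.0, 0.00, 0.0, 0.0],
]

if_estimate = average_frequency(toy_tfd, freqs)
print(if_estimate)
```

Returning `None` where a column carries no energy makes the energy gaps noted in item 2 explicit, so they can be interpolated or handled separately rather than silently biasing the estimate.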
References
[1] L. Cohen, "Time-frequency distributions - A review," Proc. IEEE, vol. 77, pp. 941-981, July 1989.
[2] E. P. Wigner, "On the quantum correction for thermodynamic equilibrium," Phys. Rev., vol. 40, pp. 749-759, 1932.
[3] J. Ville, "Theorie et applications de la notion de signal analytique," Cables et Transmission, vol. 2, no. 1, pp. 61-74, 1946.
[4] S. Erkucuk, S. Krishnan, and M. Zeytinoglu, "Robust audio watermarking using a chirp based technique," Proc. IEEE Intl. Conf. Multimedia and Expo (ICME '03), vol. 2, pp. 513-516, Baltimore, Md, USA, July 2003.
[5] A. Ramalingam and S. Krishnan, "A novel robust image watermarking using a chirp based technique," Proc. IEEE Canadian Conf. Electrical and Computer Engineering (CCECE '04), vol. 4, pp. 1889-1892, Ontario, Canada, May 2004.
[6] P. E. Gill, W. Murray, and M. H. Wright, Numerical Linear Algebra and Optimization, Addison-Wesley, Redwood City, CA, 1991.
[7] W. Rudin, Real and Complex Analysis. New York: McGraw-Hill, 1987.
[8] H. Margenau and R. N. Hill, "Correlation between measurements in quantum theory," Prog. Theor. Phys., vol. 26, pp. 772-738, 1961.
[9] P. Goupillaud, A. Grossmann, and J. Morlet, "Cycle-octave and related transforms in seismic signal analysis," Geoexploration, vol. 23, pp. 85-102, 1984.
[10] I. Daubechies, "The wavelet transform, time-frequency localization, and signal analysis," IEEE Trans. Inform. Theory, vol. 36, pp. 961-1005, 1990.
[11] I. Daubechies, "Time-frequency localization operators: A geometric phase space approach," IEEE Trans. Inform. Theory, vol. 34, pp. 605-612, 1988.
[12] O. Rioul and P. Flandrin, "Time-scale energy distributions: A general class extending wavelet transforms," IEEE Trans. Signal Process., vol. 40, pp. 1746-1757, 1992.
[13] J. Bertrand and P. Bertrand, "Time-frequency representations of broadband signals," Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (IEEE ICASSP), pp. 2196-2199, 1988.
[14] L. Cohen, "Distribution concentrated along the instantaneous frequency," SPIE - Advanced Signal Processing Alg., Architect., Implement., vol. 1348, pp. 149-157, 1990.
[15] LJ. Stankovic, "An analysis of some time-frequency and time-scale distributions," Annales des Telecommun., vol. 49, pp. 505-517, 1994.
[16] C. Eichmann and B. Z. Dong, "Two-dimensional optical filtering of 1-D signals," Appl. Opt., vol. 21, pp. 3152-3156, 1982.
[17] B. Boashash and B. Ristic, "Polynomial WVD's and time-varying polyspectra," in Higher Order Statistical Proc., B. Boashash et al., Eds. London, U.K.: Longman Cheshire, 1993.
[18] B. Boashash and P. O'Shea, "Polynomial Wigner-Ville distributions and their relationship to time-varying higher order spectra," IEEE Trans. Signal Process., vol. 42, no. 1, pp. 216-220, Jan. 1994.
[19] J. E. Allen and L. R. Rabiner, "A unified approach to short-time Fourier analysis and synthesis," Proc. IEEE, vol. 65, pp. 1558-1564, 1977.
[20] R. Altes, "Detection, estimation and classification with spectrograms," Journal Acoust. Soc. Am., vol. 67, pp. 1232-1246, 1980.
[21] A. Dziewonski, S. Bloch, and M. Landisman, "A technique for the analysis of transient signals," Bull. Seism. Soc. Am., vol. 59, pp. 427-444, 1969.
[22] J. Flanagan, Speech Analysis Synthesis and Perception. New York, NY: Springer, 1972.
[23] A. L. Levshin, V. F. Pisarenko, and G. A. Pogrebinsky, "On a frequency-time analysis of oscillations," Ann. Geophys., vol. 28, pp. 211-218, 1972.
[24] A. V. Oppenheim, "Speech spectrograms using the fast Fourier transform," IEEE Spectrum, vol. 7, pp. 57-62, 1970.
[25] M. R. Portnoff, "Time-frequency representation of digital signals and systems based on short-time Fourier analysis," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-28, pp. 55-69, 1980.
[26] B. Boashash, "Estimating and interpreting the instantaneous frequency of a signal - Part 1: Fundamentals," Proc. IEEE, vol. 80, pp. 519-538, Apr. 1992.
[27] B. Boashash, "Estimating and interpreting the instantaneous frequency of a signal. II. Algorithms and applications," Proc. IEEE, vol. 80, no. 4, pp. 540-568, Apr. 1992.
[28] V. Katkovnic and L. Stankovic, "Instantaneous frequency estimation using the Wigner distribution with varying and data-driven window length," IEEE Trans. Signal Process., vol. 46, pp. 2315-2325, Sep. 1998.
[29] Z. M. Hussain and B. Boashash, "Adaptive Instantaneous Frequency Estimation of Multi-component FM Signals," Proc. IEEE ICASSP, vol. 5, pp. 657-660, Jun. 2000.
[30] V. Katkovnic, "Nonparametric estimation of instantaneous frequency," IEEE Trans. Info. Theory, vol. 43, pp. 183-189, Jan. 1997.
[31] L. Cohen, Time Frequency Analysis, Prentice-Hall, NJ, 1995.
[32] S. Qian and D. Chen, "Joint time-frequency analysis," IEEE Signal Processing Magazine, vol. 16, no. 2, pp. 52-67, Mar. 1999.
[33] B. Boashash and V. Sucic, "Resolution Measure Criteria for the Objective Assessment of the Performance of Quadratic Time-Frequency Distributions," IEEE Trans. Signal Process., vol. 51, no. 5, pp. 1253-1263, May 2003.
[34] B. Boashash, Time-Frequency Signal Analysis and Processing, B. Boashash, Ed. Englewood Cliffs, NJ: Prentice-Hall, 2003.
[35] B. Boashash, Time-Frequency Signal Analysis. Methods and Applications, B. Boashash, Ed. Melbourne, Australia/New York: Longman-Cheshire/Wiley, 1992.
[36] S. Aviyente and W. J. Williams, "Minimum Entropy Time-Frequency Distributions," IEEE Signal Process. Lett., vol. 12, no. 1, pp. 37-40, Jan. 2005.
[37] R. G. Baraniuk, P. Flandrin, A. J. E. M. Janssen, and O. Michel, "Measuring time-frequency information content using the Rényi entropies," IEEE Trans. Info. Theory, vol. 47, no. 4, pp. 1391-1409, May 2001.
[38] A. Kayhan, A. El-Jaroudi, and L. Chaparro, "The evolutionary periodogram for non-stationary signals," IEEE Trans. Signal Process., vol. 42, no. 6, 1994.
[39] A. S. Kayhan, A. El-Jaroudi, and L. F. Chaparro, "Data-Adaptive Evolutionary Spectral Estimation," IEEE Trans. Signal Process., vol. 43, no. 1, pp. 204-213, Jan. 1995.
[40] A. Akan and L. F. Chaparro, "Evolutionary Chirp Representation of Non-stationary Signals via Gabor Transform," Signal Processing, vol. 81, no. 11, pp. 2429-2436, Nov. 2001.
[41] A. Akan and L. F. Chaparro, "Evolutionary spectral analysis using a warped Gabor expansion," Proc. IEEE ICASSP, vol. 3, pp. 1403-1406, May 1996.
[42] A. Akan, "Signal-Adaptive Evolutionary Spectral Analysis Using Instantaneous Frequency Estimation," FREQUENZ Journal of RF-Engineering and Telecommunications, vol. 59, no. 7-8/2005, pp. 201-205, July-Aug. 2005.
[43] L. F. Chaparro, R. Suleesathira, A. Akan, and B. Unsal, "Instantaneous Frequency Estimation using Discrete Evolutionary Transform for Jammer Excision," Proc. IEEE ICASSP, vol. 6, pp. 3525-3528, 7-11 May 2001.
[44] R. Suleesathira, L. F. Chaparro, and A. Akan, "Discrete Evolutionary Transform for Time-Frequency Analysis," Conf. Record of the 32nd Asilomar Conference on Signals, Systems & Computers, vol. 1, pp. 812-816, 1-4 Nov. 1998.
[45] R. Suleesathira and L. F. Chaparro, "Interference Mitigation in Spread Spectrum Using Discrete Evolutionary and Hough Transforms," Proc. IEEE ICASSP, vol. 5, pp. 2821-2824, 5-9 June 2000.
[46] S. Barbarossa, A. Scaglione, S. Spalletta, and S. Votini, "Adaptive suppression of wideband interferences in spread-spectrum communications using the Wigner-Hough transform," Proc. IEEE ICASSP, vol. 5, pp. 3861-3864, 21-24 April 1997.
[47] L. F. Chaparro and A. Alshehri, "Jammer excision in spread spectrum communication via Wiener masking and frequency-frequency evolutionary transform," Proc. IEEE ICASSP, vol. 4, pp. 473-476, 6-10 April 2003.
[48] A. Akan and L. F. Chaparro, "Multi-window Gabor Expansion for Evolutionary Spectral Analysis," Signal Processing, vol. 63, pp. 249-262, Dec. 1997.
[49] M. Jachan, G. Matz, and F. Hlawatsch, "Time-Frequency ARMA Models and Parameter Estimators for Underspread Nonstationary Random Processes," IEEE Trans. Signal Process., vol. 55, no. 9, pp. 4366-4381, Sept. 2007.
[50] M. Niedźwiecki, Identification of Time-Varying Processes. New York: Wiley, 2000.
[51] G. Matz and F. Hlawatsch, "Nonstationary spectral analysis based on time-frequency operator symbols and underspread approximations," IEEE Trans. Info. Theory, vol. 52, pp. 1067-1086, Mar. 2006.
[52] G. Matz and F. Hlawatsch, "Time-varying power spectra of nonstationary random processes," in Time-Frequency Signal Analysis and Processing: A Comprehensive Reference, B. Boashash, Ed. Oxford, U.K.: Elsevier, ch. 9.4, pp. 400-409, 2003.
[53] M. Wax and T. Kailath, "Efficient inversion of Toeplitz-block Toeplitz matrix," IEEE Trans. Acoust., Speech, Signal Process., vol. 31, pp. 1218-1221, Oct. 1983.
[54] Y. Grenier, "Parametric time-frequency representations," in Traitement du Signal/Signal Processing, Les Houches, Session XLV, J. L. Lacoume, T. S. Durrani, and R. Stora, Eds. Amsterdam, The Netherlands: Elsevier, pp. 338-397, 1987.
[55] M. Jachan, G. Matz, and F. Hlawatsch, "Time-frequency-autoregressive random processes: Modeling and fast parameter estimation," Proc. IEEE ICASSP, Hong Kong, vol. VI, pp. 125-128, Apr. 2003.
[56] Y. Grenier, "Time-dependent ARMA modeling of nonstationary signals," IEEE Trans. Acoust., Speech, Signal Process., vol. 31, pp. 899-911, Aug. 1983.
[57] N. A. Abdrabbo and M. B. Priestley, "On the prediction of nonstationary processes," J. Roy. Stat. Soc. Ser. B, vol. 29, no. 3, pp. 570-585, 1967.
[58] M. Jachan, F. Hlawatsch, and G. Matz, "Linear methods for TFARMA parameter estimation and system approximation," Proc. 13th IEEE Workshop Statistical Signal Processing, Bordeaux, France, pp. 909–914, Jul. 2005.
[59] P. Flandrin, Time-Frequency/Time-Scale Analysis. San Diego, CA: Academic, 1999.
[60] F. Hlawatsch and P. Flandrin, "The interference structure of the Wigner distribution and related time-frequency signal representations," in The Wigner Distribution - Theory and Applications in Signal Processing, W. Mecklenbräuker and F. Hlawatsch, Eds. Amsterdam, The Netherlands: Elsevier, pp. 59–133, 1997.
[61] M. Jachan, G. Matz, and F. Hlawatsch, "TFARMA models: Order estimation and stabilization," Proc. IEEE ICASSP, Philadelphia, PA, vol. IV, pp. 301–304, Mar. 2005.
[62] H. Akaike, "A new look at the statistical model identification," IEEE Trans. Autom. Control, vol. 19, pp. 716–723, Dec. 1974.
[63] S. M. Kay, Modern Spectral Estimation. Englewood Cliffs, NJ: Prentice-Hall, 1988.
[64] Shah, S.I., Chaparro, L.F., & El-Jaroudi, A., "Generalized Transfer Function Estimation using Evolutionary Spectral Deblurring", IEEE Trans. Signal Process., Vol. 47, Number 8, pp. 2335–2339, August 1999.
[65] Shah, S.I., "Generalized Transfer Function estimation and Informative Priors for Positive Time-Frequency Distributions", PhD Dissertation, University of Pittsburgh, Pittsburgh, PA, 1997.
[66] Unsal Artan, R.B., Akan, A., Chaparro, L.F., "Higher order evolutionary spectral analysis," Proc. IEEE ICASSP, Vol. 4, pp. 633–636, 6–10 April 2003.
[67] Wexler, J., and Raz, S., "Discrete Gabor Expansions," Signal Processing, vol. 21, no. 3, pp. 207–220, Nov. 1990.
[68] L. Cohen, "Generalized phase-space distribution functions," J. Math. Phys., vol. 7, pp. 781–786, 1966.
[69] T. A. C. M. Claasen and W. F. G. Mecklenbräuker, "The Wigner distribution - a tool for time-frequency signal analysis; part I: continuous-time signals," Philips Journal of Research, vol. 35, pp. 217–250, 1980.
[70] T. A. C. M. Claasen and W. F. G. Mecklenbräuker, "The Wigner distribution - a tool for time-frequency signal analysis; part II: discrete time signals," Philips Journal of Research, vol. 35, pp. 276–300, 1980.
[71] T. A. C. M. Claasen and W. F. G. Mecklenbräuker, "The Wigner distribution - a tool for time-frequency signal analysis; part III: relations with other time-frequency signal transformations," Philips Journal of Research, vol. 35, pp. 372–389, 1980.
[72] A. J. E. M. Janssen, "On the locus and spread of pseudo-density functions in the time-frequency plane," Philips Journal of Research, vol. 37, pp. 79–110, 1982.
[73] C. P. Janse and J. M. Kaizer, "Time-frequency distributions of loudspeakers: the application of the Wigner distribution," Journal of Audio Engg. Soc., vol. 31, pp. 198–223, 1983.
[74] B. Boashash, "Representation temps-frequence," Soc. Nat. ELF Aquitaine, Pau, France, Publ. Recherches, no. 373–378, 1978.
[75] P. Flandrin and W. Martin, "A general class of estimators for the Wigner-Ville spectrum of nonstationary processes," in Systems Analysis and Optimization of Systems, Lecture Notes in Control and Information Sciences. Berlin, Vienna, New York: Springer-Verlag, pp. 15–23, 1984.
[76] R. D. Hippenstiel and P. M. de Oliveira, "Time varying spectral estimation using the instantaneous power spectrum (IPS)," IEEE Trans. Acoust., Speech, Signal Process., vol. 38, pp. 1752–1759, 1990.
[77] P. Flandrin and B. Escudie, "Time and frequency representation of finite energy signals: a physical property as a result of a Hilbertian condition," Signal Processing, vol. 2, pp. 93–100, 1980.
[78] H. Margenau and L. Cohen, "Probabilities in quantum mechanics," in Quantum Theory and Reality, M. Bunge, Ed. New York, NY: Springer, 1967.
[79] L. Stankovic, "A Time-Frequency Distribution Concentrated Along the Instantaneous Frequency", IEEE Signal Process. Lett., Vol. 3, No. 3, pp. 89–91, March 1996.
[80] LJ. Stankovic and S. Stankovic, "An analysis of the instantaneous frequency representation using time-frequency distributions - Generalized Wigner distribution," IEEE Trans. Signal Process., vol. 43, no. 2, Feb. 1995.
[81] LJ. Stankovic, "A method for improved distribution concentration in the time-frequency signal analysis using the L-Wigner distribution," IEEE Trans. Signal Process., vol. 43, no. 5, May 1995.
[82] __________, "A multitime definition of the Wigner higher order distribution: L-Wigner distribution," IEEE Signal Process. Lett., vol. 1, no. 7, pp. 106–109, July 1994.
[83] __________, "An analysis of the Wigner higher order spectra of multicomponent signals," Ann. Telecomm., vol. 49, no. 3–4, pp. 132–136, Mar/Apr. 1994.
[84] I. Djurović and LJ. Stanković, "Influence of high noise on the instantaneous frequency estimation using time-frequency distributions," IEEE Signal Process. Lett., vol. 7, pp. 317–319, Nov. 2000.
[85] J. R. Fonollosa and C. L. Nikias, "Wigner higher order moment spectra: Definitions, properties, computation and application to the transient signal analysis," IEEE Trans. Signal Process., vol. 41, no. 1, pp. 245–266, Jan. 1993.
[86] F. Hlawatsch and G. F. Boudreaux-Bartels, "Linear and quadratic time-frequency signal representations," IEEE Signal Processing Mag., pp. 21–67, Apr. 1992.
[87] Suleesathira, R., Chaparro, L.F., Akan, A., "Discrete Evolutionary Transform for Positive Time Frequency Signal Analysis", Journal of Franklin Institute, Vol. 337, No. 4, pp. 347–364, 2000.
[88] LJ. Stankovic, "L-class of time-frequency distributions," IEEE Signal Process. Lett., vol. 3, pp. 22–25, Jan. 1996.
[89] __________, "On the realization of the highly concentrated time-frequency distributions," Proc. IEEE Symp. TFTSA, Paris, pp. 461–464, Jun 1996.
[90] __________, "Highly Concentrated Time-Frequency Distributions: Pseudo Quantum Signal Representation", IEEE Trans. Signal Process., Vol. 45, No. 3, pp. 543–551, March 1997.
[91] http://en.wikipedia.org/wiki/Reassignment_method.
[92] F. Hlawatsch and P. Flandrin, "The interference structure of the Wigner distribution and related time-frequency signal representations," in The Wigner Distribution - Theory and Applications in Signal Processing, W. Mecklenbräuker, Ed., Amsterdam, Netherlands, Elsevier, 1994.
[93] F. Auger and P. Flandrin, "Improving the readability of time-frequency and time-scale representations by the reassignment method," IEEE Trans. Signal Process., vol. 43, pp. 1068–1089, May 1995.
[94] P. Flandrin, F. Auger, and E. Chassande-Mottin, "Time-frequency reassignment: From principles to algorithms," in Applications in Time-Frequency Signal Processing (A. Papandreou-Suppappola, ed.), ch. 5, pp. 179–203, CRC Press, 2003.
[95] K. Kodera, C. de Villedary, and R. Gendrin, "A new method for the numerical analysis of non-stationary signals," Phys. Earth and Planetary Interiors, vol. 12, pp. 142–150, 1976.
[96] K. Kodera, R. Gendrin, and C. de Villedary, "Analysis of time-varying signals with small BT values," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-26, pp. 64–76, 1978.
[97] D. J. Nelson, "Cross-spectral methods for processing speech," Journal of the Acoustical Society of America, vol. 110, pp. 2575–2592, Nov. 2001.
[98] S. A. Fulop and K. Fitz, "A spectrogram for the twenty-first century," Acoustics Today, vol. 2, no. 3, pp. 26–33, 2006.
[99] __________, "Algorithms for computing the time-corrected instantaneous frequency (reassigned) spectrogram, with applications," Journal of the Acoustical Society of America, vol. 119, pp. 360–371, Jan 2006.
[100] D. L. Jones, T. W. Parks, "A Resolution Comparison of Several Time-Frequency Representations," IEEE Trans. Signal Process., vol. 40, No. 2, Feb 1992.
[101] Stankovic, LJ., Katkovnik, V., "Algorithm for the Instantaneous Frequency Estimation Using Time-Frequency Distributions with Adaptive Window Width," IEEE Signal Process. Lett., Vol. 5, No. 9, pp. 224–227, 1998.
[102] Barkat, B., Abed-Meraim, K., "Algorithms for Blind Components Separation and Extraction from the Time-Frequency Distribution of Their Mixture," EURASIP Journal on Applied Signal Processing, Vol. 2004, No. 13, pp. 2025–2033, 2004.
[103] Barbarossa, S., "Analysis of Multicomponent LFM Signals by a Combined Wigner-Hough Transform", IEEE Trans. Signal Process., Vol. 46, No. 6, pp. 1511–1515, 1995.
[104] Yagle, A.E., Torres-Fernandez, J.E., "Construction of Signal-Dependent Cohen's-class time-frequency distributions using iterative blind deconvolution", Proc. SPIE, Advanced Signal Processing Algorithms, Architectures, and Implementations XIII. Edited by Luk, Franklin T., Vol. 5205, pp. 47–58, 2003.
[105] C. Stergiou, "What is a Neural Network", http://www.doc.ic.ac.uk.
[106] C. Stergiou, "Neural Networks, the Human Brain and Learning", http://www.doc.ic.ac.uk
[107] R.M. Gray, "Entropy and Information Theory". New York: Springer-Verlag, 1990.
[108] A. K. Jain, J. Mao and K. M. Mohiuddin, "Artificial Neural Network: A tutorial", IEEE Trans. Computers, pp. 31–44, 1996.
[109] Basu, M., and Su, M., "Deblurring images using projection pursuit learning network," Proc. Int. Joint Conf. on Neural Networks, IJCNN'99, Washington, DC, 1999.
[110] A.E. Ruano, "Intelligent Control Systems Using Computational Intelligence Techniques", July 2005.
[111] R.C. Gonzalez & P. Wintz, "Digital Image Processing", 2nd Ed., Addison-Wesley, 1987.
[112] D. Jones and T. Parks, "A high resolution data-adaptive time-frequency representation," IEEE Trans. Acoust., Speech, Signal Process., vol. 38, pp. 2127–2135, Dec. 1990.
[113] W. J. Williams and T. Sang, "Adaptive RID kernels which minimize time-frequency uncertainty," Proc. IEEE-SP Int. Symp. Time-Freq. Time-Scale Anal., Philadelphia, PA, pp. 96–99, Oct. 1994.
[114] T. H. Sang and W. J. Williams, "Renyi information and signal-dependent optimal kernel design," Proc. IEEE ICASSP, vol. 2, Detroit, MI, pp. 997–1000, May 1995.
[115] P. M. Oliveira and V. Barosso, "Uncertainty in the time-frequency plane," Proc. 10th IEEE Workshop Statist. Signal Array Process., Pocono Manor, PA, pp. 607–611, Aug. 2000.
[116] W. J. Williams, M. Brown, and A. Hero, "Uncertainty, information and time-frequency distributions," in SPIE - Advanced Signal Processing Algorithms, vol. 1556, pp. 144–156, 1991.
[117] C. E. Shannon, "A mathematical theory of communication, Part I," Bell Sys. Tech J., vol. 27, pp. 379–423, July 1948.
[118] A. Rényi, "On measures of entropy and information," Proc. 4th Berkeley Symp. Math. Stat. and Prob., vol. 1, pp. 547–561, 1961.
[119] C. Arndt, "Information Measures: Information and its Description in Science and Engineering", Springer, Berlin, 2001.
[120] D. Gabor, "Theory of communication," J. Inst. Electron. Eng., vol. 93, no. 11, pp. 429–457, Nov. 1946.
[121] D. Vakman, "Optimum signals which minimizes partial volume under an ambiguity surface," Radio Eng., Electron. Phys., vol. 27, pp. 1260–1268, Aug. 1968.
[122] A. Dziewonski, S. Bloch, and M. Landisman, "A technique for the analysis of transient seismic signals," Bull. Seismological Soc. Amer., pp. 427–449, Feb. 1969.
[123] G. L. Duckworth, "Processing and inversion of Arctic Ocean refraction data," Sc.D. dissertation, Massachusetts Inst. Technol., Cambridge, MA, 1983.
[124] G. W. Deley, "Waveform design," in Radar Handbook, M. I. Skolnik, Ed. New York, NY: McGraw-Hill, 1970.
[125] W. Rihaczek, Principles of High-Resolution Radar. New York, NY: McGraw-Hill, 1969.
[126] M. I. Skolnik, Introduction to Radar Systems. New York, NY: McGraw-Hill, 1980.
[127] P. M. Woodward, Probability and Information Theory with Application to Radar. London, England: Pergamon, 1953.
[128] H. H. Szu and J. Blodgett, "Wigner distribution and ambiguity functions," in Optics in Four Dimensions, L. M. Narducci, Ed. New York, NY: Am. Inst. of Physics, pp. 355–381, 1981.
[129] C. Eichmann and N. M. Marinovic, "Scale-invariant Wigner distribution and ambiguity functions," Proc. Int. Soc. Opt. Eng., Proc. SPIE, vol. 519, pp. 18–24, 1985.
[130] R. G. Baraniuk, "Shear Madness: Signal-Dependent and Metaplectic Time-Frequency Representations," Ph.D. Thesis, Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, August 1992.
[131] R. G. Baraniuk, D. L. Jones, "A Signal-Dependent Time-Frequency Representation: Optimal Kernel Design," IEEE Trans. Signal Process., vol. 41, no. 4, pp. 1589–1602, April 1993.
[132] R. G. Baraniuk and D. L. Jones, "Signal-Dependent Time-Frequency Analysis Using a Radially Gaussian Kernel," Signal Processing, vol. 32, no. 3, pp. 263–284, June 1993.
[133] D. L. Jones and R. G. Baraniuk, "An Adaptive Optimal-Kernel Time-Frequency Representation," IEEE Trans. Signal Process., vol. 43, no. 11, pp. 2361–2371, October 1995.
[134] http://www-dsp.rice.edu.
[135] L. Cohen and T.E. Posch, "Positive Time-Frequency Distribution Functions," IEEE Trans. Acoust. Speech Signal Process., 33, pp. 31–37, Feb. 1985.
[136] W. D. Mark, "Spectral analysis of the convolution and filtering of non-stationary stochastic processes," J. Sound Vib., vol. 11, pp. 19–63, 1970.
[137] R. M. Fano, "Short-time autocorrelation functions and power spectra," J. Acoust. Soc. Am., vol. 22, pp. 546–550, 1950.
[138] M. R. Schroeder and B. S. Atal, "Generalized short-time power spectra and autocorrelation functions," J. Acoust. Soc. Am., vol. 34, pp. 1679–1683, 1962.
[139] M. H. Ackroyd, "Instantaneous and time-varying spectra - an introduction," Radio Electron. Eng., vol. 239, pp. 45–152, 1970.
[140] __________, "Short-time spectra and time-frequency energy distributions," J. Acoust. Soc. Am., vol. 50, pp. 1229–1231, 1970.
[141] D. G. Lampard, "Generalization of the Wiener-Khintchine theorem to nonstationary processes," J. Appl. Phys., vol. 25, p. 802, 1954.
[142] S. Grassin and R. Garello, "Spectral analysis of the swell using the reassigned Wigner-Ville Representation," Proc. IEEE Conf. Oceans'96, pp. 1539–1544, Fort Lauderdale, FL, Sep. 1996.
[143] M. Born and P. Jordan, "Zur Quantenmechanik," Z. Phys., vol. 34, pp. 858–888, 1925.
[144] H. Choi and W. J. Williams, "Improved time-frequency representation of multicomponent signals using exponential kernels," IEEE Trans. Acoust., Speech, Signal Process., vol. 37, no. 6, pp. 862–871, June 1989.
[145] J. Jeong and W. J. Williams, "Alias-free generalized discrete-time time-frequency distributions," IEEE Trans. Signal Process., vol. 40, pp. 2757–2765, Nov. 1992.
[146] A. Papandreou and G. F. Boudreaux-Bartels, "Distributions for time-frequency analysis: A generalization of Choi-Williams and the Butterworth distributions," Proc. IEEE ICASSP, vol. 5, pp. 181–184, 1992.
[147] A. Papandreou-Suppappola, "Generalized time-shift covariant quadratic time-frequency representations with arbitrary group delays," Proc. 29th Asilomar Conf. Signals, Systems and Computers, Pacific Grove, CA, pp. 553–557, Oct. 1995.
[148] A. Papandreou, F. Hlawatsch, and G. F. Boudreaux-Bartels, "Quadratic time-frequency representations with scale covariance and generalized time-shift covariance: a unified framework for the affine, hyperbolic, and power classes," Digital Signal Process. a Rev. J., 8, 3–48, Jan. 1998.
[149] A. Papandreou and G. F. Boudreaux-Bartels, "The Exponential class and Generalized time-shift covariant quadratic time-frequency representations," Proc. IEEE-SP Intl. Symposium on Time-Frequency and Time-Scale analysis, Paris, France, pp. 429–432, Jun. 1996.
[150] A. Papandreou, F. Hlawatsch, and G. F. Boudreaux-Bartels, "A unified framework for the Scale covariant Affine, Hyperbolic, and Power class Time-Frequency Representations Using Generalized Time-Shifts," Proc. IEEE ICASSP, Detroit, MI, May 1995.
[151] A. Papandreou-Suppappola, "New Classes of Quadratic Time-Frequency Representations with Scale Covariance and Generalized Time-Shift Covariance: Analysis, detection, and estimation", Ph.D. thesis, University of Rhode Island, Kingston, RI, 1995.
[152] A. Papandreou, F. Hlawatsch, and G. F. Boudreaux-Bartels, "The hyperbolic class of Quadratic time-frequency representations, Part I. Constant Q warping, the hyperbolic paradigm, properties and members," IEEE Trans. Signal Process., Special issue on wavelets and signal processing, 41, 3425–3444, Dec. 1993.
[153] F. Hlawatsch, A. Papandreou, and G. F. Boudreaux-Bartels, "The Power classes of Quadratic time-frequency representations: A Generalization of the Affine and Hyperbolic Classes," Proc. 27th Asilomar Conf. on Signals, Systems and Computers, Pacific Grove, CA, pp. 1265–1270, Nov. 1993.
[154] F. Hlawatsch, A. Papandreou, and G. F. Boudreaux-Bartels, "The Power classes - quadratic time-frequency representations with scale covariance and dispersive time-shift covariance," IEEE Trans. Signal Process., 47, pp. 3067–3083, Nov. 1999.
[155] A. Papandreou-Suppappola, R.L. Murray, B.G. Iem, and G. F. Boudreaux-Bartels, "Group delay shift covariant quadratic time-frequency representations," IEEE Trans. Signal Process., 49, pp. 2549–2564, Nov. 2001.
[156] A. Papandreou-Suppappola, "Time-Frequency Representations covariant to frequency-dependent time shifts", in Time-Frequency Signal Analysis and Processing, B. Boashash, Ed., Prentice Hall, New York, 2002.
[157] A. Papandreou-Suppappola, B.G. Iem, and G. F. Boudreaux-Bartels, "Time-Frequency symbols for statistical signal processing," in Time-Frequency Signal Analysis and Processing, B. Boashash, Ed., Prentice Hall, New York, 2002.
[158] A. Papandreou and G. F. Boudreaux-Bartels, "The Effect of mismatching analysis signals and time-frequency representations," Proc. IEEE-SP Intl. Symposium on Time-Frequency and Time-Scale analysis, Paris, France, pp. 149–152, Jun. 1996.
[159] P. Guillemain and P. White, "Wavelet transform for the analysis of dispersive systems," Proc. IEEE UK Symposium on Applications of Time-Frequency and Time-Scale Methods, University of Warwick, Coventry, UK, pp. 32–39, Aug. 1995.
[160] M.J. Freeman, M.E. Dunham, and S. Qian, "Trans-ionospheric Signal Detection by Time-Scale Representation," Proc. IEEE UK Symposium on Applications of Time-Frequency and Time-Scale Methods, University of Warwick, Coventry, UK, pp. 152–158, Aug. 1995.
[161] D.E. Newland, "Time-Frequency and Time-Scale analysis by harmonic wavelets," in Signal analysis and Prediction, A. Prochazka, Ed., Birkhäuser, Boston, Chap. 1, 1998.
[162] J.P. Sessarego, J. Sageloli, P. Flandrin, and M. Zakharia, "Time-Frequency Wigner-Ville analysis of echoes scattered by a spherical shell," in Wavelets, Time-Frequency Methods and Phase Space, J.M. Combes, A. Grossman, and P. Tchamitchian, Eds., Springer-Verlag, Heidelberg, pp. 147–153, 1989.
[163] P.M. Morse and H. Feshbach, Methods of Theoretical Physics, McGraw-Hill, New York, 1953.
[164] V. Szekely, "Distributed RC networks," in The Circuits and Filters Handbook, W.K. Chen, Ed., CRC Press/IEEE Press, Boca Raton, FL, pp. 1203–1221, 1995.
[165] A. Papandreou and L.T. Antonelli, "Use of quadratic time-frequency representations to analyze Cetacean mammal sounds," Technical rep. 11, 284, Naval Undersea Warfare Centre, Newport, RI, Dec. 2001.
[166] A.H. Costa and G. F. Boudreaux-Bartels, "Design of time-frequency representations using a multiform, tiltable exponential kernel," IEEE Trans. Signal Process., 43, pp. 2283–2301, Oct. 1995.
[167] A. Papandreou-Suppappola, F. Hlawatsch, and G. F. Boudreaux-Bartels, "Power class Time-Frequency Representations: Interference geometry, Smoothing, and Implementation," Proc. IEEE-SP Intl. Symposium on Time-Frequency and Time-Scale analysis, Paris, France, pp. 193–196, Jun. 1996.
[168] A. Papandreou and G. F. Boudreaux-Bartels, "Distortion that occurs when the signal group delay does not match the Time-Shift Covariance of a Time-Frequency Representation," Proc. 30th Annual Conf. on Information Sciences and Systems, Princeton, NJ, pp. 520–525, Mar. 1996.
[169] Y. Zhao, L. E. Atlas, and R. J. Marks, "The use of cone-shaped kernels for generalized time-frequency representations of nonstationary signals," IEEE Trans. Acoust., Speech, Signal Process., vol. 38, pp. 1084–1091, July 1990.
[170] Sridhar Krishnan, "A New Approach for Estimation of Instantaneous Mean Frequency of a Time-Varying Signal," EURASIP Journal on Applied Signal Processing 2005:17, pp. 2848–2855.
[171] V. Sucic and B. Boashash, "Optimisation algorithm for selecting quadratic time-frequency distributions: performance results and calibration," Proc. 6th International Symposium on Signal Processing and Its Applications (ISSPA '01), vol. 1, pp. 331–334, Kuala Lumpur, Malaysia, August 2001.
[172] LJubisa Stankovic, "A Measure of Some Time-Frequency Distributions Concentration," Signal Processing, vol. 81, No. 3, pp. 212–223, Mar. 2001.
[173] S. S. Chen, D.L. Donoho, M.A. Saunders, "Atomic Decomposition by Basis Pursuit," SIAM Journal on Scientific Computing, Volume 20, Number 1, pp. 33–61, 1998.
[174] R. R. Coifman and M. V. Wickerhauser, "Entropy-based algorithms for best-basis selection," IEEE Trans. Info. Theory, 38, pp. 713–718, 1992.
[175] S. G. Mallat and Z. Zhang, "Matching pursuits with time-frequency dictionaries," IEEE Trans. Signal Process., vol. 41, no. 12, pp. 3397–3415, 1993.
[176] N. E. Huang, Z. Shen, S. R. Long, M. C. Wu, H. H. Shih, Q. Zheng, N. C. Yen, C. C. Tung, and H. H. Liu, "The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis," Proc. R. Soc. Lond. A, Math. Phys. Sci., vol. 454, no. 1971, pp. 903–995, Mar. 1998.
[177] A. O. Boudraa, J. C. Cexus, F. Salzenstein, and L. Guillon, "IF estimation using empirical mode decomposition and nonlinear Teager energy operator," Proc. IEEE ISCCSP, Hammamet, Tunisia, pp. 45–48, 2004.
[178] J. C. Cexus and A. O. Boudraa, "Nonstationary signals analysis by Teager-Huang transform (THT)," Proc. EUSIPCO, Florence, Italy, 5 p, 2006.
[179] A. O. Boudraa, and J. C. Cexus, "EMD-Based Signal Filtering," IEEE Trans. Instrumentation and Measurement, Vol. 56, No. 6, pp. 2196–2202, Dec. 2007.
[180] J. C. Cexus and A. O. Boudraa, "Teager-Huang analysis applied to sonar target recognition," Int. J. Signal Process., vol. 1, no. 1, pp. 23–27, 2004.
[181] A. O. Boudraa, J. C. Cexus, and Z. Saidi, "EMD-based signal noise reduction," Int. J. Signal Process., vol. 1, no. 1, pp. 33–37, 2004.
[182] Z. Liu and S. Peng, "Boundary processing of bidimensional EMD using texture synthesis," IEEE Signal Process. Lett., vol. 12, no. 1, pp. 33–36, Jan. 2005.
[183] A. O. Boudraa, J. C. Cexus, F. Salzenstein, and A. Beghdadi, "EMD-based multibeam echosounder images segmentation," Proc. IEEE ISCCSP, Marrakech, Morocco, 2006.
[184] K. Zeng and M. X. He, "A simple boundary process technique for empirical mode decomposition," Proc. IEEE IGARSS, vol. 6, pp. 4258–4261, 2004.
[185] P. Flandrin, P. Gonçalves, and G. Rilling, "Detrending and denoising with empirical mode decompositions," Proc. EUSIPCO, Vienna, Austria, pp. 1581–1584, 2004.
[186] P. Flandrin and P. Gonçalves, "Empirical mode decompositions as data-driven wavelet-like expansions," Int. J. Wavelets, Multires., Inf. Process., vol. 2, no. 4, pp. 477–496, 2004.
[187] G. Rilling, P. Flandrin, and P. Gonçalves, "Empirical mode decomposition, fractional Gaussian noise, and Hurst exponent estimation," Proc. IEEE ICASSP, Philadelphia, PA, vol. 4, pp. 489–492, 2005.
[188] R. Deering and J. F. Kaiser, "The use of a masking signal to improve empirical mode decomposition," Proc. IEEE ICASSP, Philadelphia, USA, vol. 4, pp. 485–488, 2005.
[189] S. Benramdane, J. C. Cexus, A. O. Boudraa, and J. A. Astolfi, "Transient turbulent pressure signal processing using empirical mode decomposition," Proc. Phys. Signal Image Process., Mulhouse, France, 2007.
[190] P. Flandrin, G. Rilling, and P. Gonçalves, "Empirical mode decomposition as a filter bank," IEEE Signal Process. Lett., vol. 11, no. 2, pp. 112–114, Feb. 2004.
[191] Z. Wu and N. E. Huang, "A study of the characteristics of white noise using the empirical mode decomposition method," Proc. R. Soc. Lond. A, Math. Phys. Sci., vol. 460, no. 2046, pp. 1597–1611, Jun. 2004.
[192] Flandrin, P., P. Gonçalves and G. Rilling, "EMD equivalent filter banks, from interpretation to applications," Introduction to Hilbert-Huang Transform and its Applications, Ed. N. E. Huang and S. S. P. Shen, pp. 57–74. World Scientific, New Jersey, 2005.
[193] G. Rilling and P. Flandrin, "One or Two Frequencies? The Empirical Mode Decomposition Answers," IEEE Trans. Signal Process., Vol. 56, No. 1, pp. 85–95, Jan. 2008.
[194] K. T. Coughlin and K. K. Tung, "11-year solar cycle in the stratosphere extracted by the Empirical Mode Decomposition method," Adv. Space Res., vol. 34, pp. 323–329, 2004.
[195] M. Chavez, C. Adam, V. Navarro, S. Boccaletti, and J. Martinerie, "On the intrinsic time scales involved in synchronization: A data-driven approach," Chaos: An Interdisciplinary J. Nonlin. Sci., vol. 15, no. 2, pp. 023904–023904, 2005.
[196] A. Aïssa-El-Bey, K. Abed-Meraim, and Y. Grenier, "Underdetermined blind audio source separation using modal decomposition," EURASIP J. Audio, Speech, Music Process., vol. 2007, pp. 15–15, 2007.
[197] J. Shore and R. Johnson, "Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy," IEEE Trans. Info. Theory, vol. 26, no. 1, pp. 26–37, 1980.
[198] J. Shore and R. Johnson, "Properties of cross-entropy minimization," IEEE Trans. Info. Theory, vol. 27, no. 4, pp. 472–482, 1981.
[199] M. G. Amin and W. J. Williams, "High spectral Resolution Time-Frequency Kernels," IEEE Trans. Signal Process., vol. 46, no. 10, pp. 2796–2804, 1998.
[200] B. Boashash, B. Lovell, and H. Whitehouse, "High resolution time frequency signal analysis by parametric modeling of the Wigner-Ville distribution," Proc. ISSPA, Brisbane, Australia, Aug. 1987.
[201] P. Ramamoorthy, V. Iyer, and Y. Ploysongsang, "Autoregressive modeling of the Wigner spectrum," Proc. IEEE ICASSP, Dallas, TX, Apr. 1987.
[202] L. J. Rudin, S. Osher, and E. Fatemi, "Nonlinear total-variation-based noise removal algorithms," Phys. D, 60, pp. 259–268, 1992.
[203] E. F. Velez and R. G. Absher, "Smoothed Wigner-Ville parametric modeling for the analysis of nonstationary signals," Proc. IEEE Int. Symp. Circuits Syst., May 1989, pp. 507–510.
[204] __________, "Wigner half kernel modeling," Signal Process., vol. 26, no. 2, pp. 162–175, Feb. 1992.
[205] R. Kumaresan, "On the zeros of the linear prediction error filter for deterministic signals," IEEE Trans. Signal Process., vol. ASSP-31, pp. 217–220, Feb. 1983.
[206] S. L. Marple, Jr., Digital Spectral Analysis with Applications. Englewood Cliffs, NJ: Prentice-Hall, 1987, ch. 11.
[207] Y. Zhang, M. G. Amin and G. J. Frazer, "High-resolution time-frequency distributions for maneuvering target detection in over-the-horizon radars," IEE Proc. Radar Sonar Navig., vol. 150, no. 4, pp. 299–304, Aug. 2003.
[208] B. Barkat and B. Boashash, "A High-resolution Quadratic time-frequency distribution for Multicomponent signals analysis," IEEE Trans. Signal Process., vol. 49, no. 10, pp. 2232–2239, Oct. 2001.
[209] A. Papandreou-Suppappola, "Applications in time-frequency signal processing," CRC Press LLC, 2003.
[210] H. G. Feichtinger and T. Strohmer, Eds., Gabor Analysis and Algorithms: Theory and Applications, Springer, 1998.
[211] H. G. Feichtinger and T. Strohmer, Eds., Advances in Gabor Analysis, Birkhäuser, 2001.
[212] S. Qian and D. Chen, "Decomposition of the Wigner distribution and time-frequency distribution series," IEEE Trans. Signal Process., 42, pp. 2836–2842, Oct. 1994.
[213] S. Qian and D. Chen, "Signal representation using adaptive normalized Gaussian functions," Signal Processing, 36, pp. 1–11, 1994.
[214] L. F. Villemoes, "Best approximation with Walsh atoms," Constr. Approx., 13, pp. 329–355, 1997.
[215] A. Bultan, "A four-parameter atomic decomposition of chirplets," IEEE Trans. Signal Process., 47, pp. 731–745, March 1999.
[216] M. R. McClure and L. Carin, "Matching pursuits with a wave-based dictionary," IEEE Trans. Signal Process., 45, pp. 2912–2927, Dec. 1997.
[217] I. Shafi, J. Ahmad, S. I. Shah, and F. M. Kashif, "Evolutionary time-frequency distributions using Bayesian regularised neural network model", IET Signal Process., vol. 1, no. 2, pp. 97–106, June 2007.
[218] M.T. Hagan, H.B. Demuth & M. Beale, "Neural Network Design", Thomson Learning USA, 1996.
[219] Chauvin, Y., & Rumelhart, D.E., "Back propagation: Theory, Architecture, and Applications", Lawrence Erlbaum Associates, Publisher UK, 1995.
[220] MacKay, D.J.C., "A Practical Bayesian Framework for Back propagation Network", Neural Computation, vol. 4, no. 3, pp. 448–472, 1992.
[221] M. Riedmiller and H. Braun, "A direct adaptive method for faster back propagation learning: The RPROP algorithm," Proc. IEEE Int. Conf. on ANN (ICNN), San Francisco, pp. 586–591, 1993.
[222] S. Pei and J. Ding, "Relations between Gabor Transforms and Fractional Fourier Transforms and their applications for signal processing", IEEE Trans. Signal Process., vol. 55, no. 10, pp. 4839–4850, Oct. 2007.
[223] http://en.wikipedia.org/wiki/Data_clustering.
[224] Shafi, I., Ahmad, J., Shah, S.I., & Kashif, F.M., "Impact of varying Neurons and Hidden layers in Neural Network Architecture for a Time Frequency Application", Proc. 10th IEEE Intl. Multi topic Conf., INMIC 2006, Islamabad, Pakistan, 23–24 Dec. 2006.
[225] I. Shafi, J. Ahmad, S.I. Shah, F.M. Kashif, "Time Frequency Distribution using Neural Networks", Proc. IEEE Intl. Conf. on Emerging Technologies, pp. 32–35, Pakistan, 2005.
[226] Ahmad, J., Shafi, I., Shah, S.I., & Kashif, F.M., "Analysis and Comparison of Neural Network Training Algorithms for the Joint Time-Frequency Analysis", Proc. IASTED Intl. Conf. on Artificial Intelligence and Application, pp. 193–198, Austria, Feb 2006.
[227] S.I. Shah, I. Shafi, J. Ahmad, and F. M. Kashif, "Multiple Neural Networks over Clustered Data (MNCD) to Obtain Instantaneous Frequencies (IFs)", Proc. IEEE Intl. Conf. on Information and Emerging Technologies (ICIET), pp. 1–6, 6–7 July 2007, Karachi, Pakistan.
[228] I. Shafi, J. Ahmad, S. I. Shah, and F. M. Kashif, "Computing De-blurred Time Frequency Distributions using Artificial Neural Networks", Circuits, Systems, and Signal Processing, Birkhäuser Boston, Springer Verlag, vol. 27, no. 3, pp. 277–294, Jun 2008.
[229] M. B. Priestley, "Evolutionary spectra and nonstationary processes," J. Royal Stat. Soc. B, vol. 27, no. 2, pp. 204–237, 1965.
[230] __________, Spectral Analysis and Time Series. London: Academic, 1981.
[231] __________, Nonlinear and Non-stationary Time Series Analysis. London: Academic, 1988.
[232] G. Melard and A. Herteleer-de Schutter, "Contributions to evolutionary spectral theory," J. Time Series Anal., vol. 10, no. 1, pp. 41–63, 1989.
[233] A. M. Yaglom, An Introduction to the Theory of Stationary Random Functions. Englewood Cliffs, NJ: Prentice-Hall, 1962.
[234] C. S. Detka, A. El-Jaroudi, and L. F. Chaparro, "Relating the bilinear distribution and the evolutionary spectrum," Proc. IEEE ICASSP, pp. 496–499, vol. 4, 1993.
[235] Y. Grenier, "Time-dependent ARMA modeling of nonstationary signals," IEEE Trans. Acoust., Speech Signal Process., vol. ASSP-31, no. 4, pp. 899–911, Aug. 1983.
[236] T. Subba Rao, "The fitting of non-stationary time-series models with time-dependent parameters," J. Royal Stat. Soc., B, vol. 32, no. 2, pp. 312–322, 1970.
[237] M. Kahn, L. F. Chaparro, and E. W. Kamen, "Frequency analysis of nonstationary signal models," Proc. Conf. Inform. Sci. Syst., pp. 617–622, (Baltimore), Mar. 1989.
[238] S. M. Kay, Modern Spectral Estimation: Theory and Application. Englewood Cliffs, NJ: Prentice-Hall, 1988.
[239] J. Capon, "High-resolution frequency-wavenumber spectrum analysis," Proc. IEEE, vol. 57, no. 8, pp. 1408–1419, Aug. 1969.
[240] D. J. Thomson, "Spectrum estimation and harmonic analysis," Proc. IEEE, 70:1055–1096, 1982.
[241] D. Thomson and A. Chave, "Jackknifed error estimates for spectra, coherences, and transfer functions," in S. Haykin (ed.), Advances in Spectrum Analysis and Array Processing, Vol. 1, 58–113. Prentice-Hall, 1991.
[242] Pitton, James W., "Positive Time-Frequency Distributions via Quadratic Programming", Journal of Multidimensional Systems and Signal Processing, SpringerLink, Vol. 9, No. 4, pp. 439–445, October 1998.
[243] Emresoy, M.K., Loughlin, P.J., "Weighted Least Square Cohen-Posch Time-Frequency Distribution Functions", IEEE Trans. Signal Process., Vol. 46, No. 3, pp. 753–757, 1998.
[244] Groutage, D., "A Fast Algorithm for Computing Minimum Cross-Entropy Positive Time-Frequency Distributions," IEEE Trans. Signal Process., Vol. 45, No. 8, pp. 1954–1970, 1997.
[245] Shah, S.I., Loughlin, P.J., Chaparro, L.F., El-Jaroudi, A., "Informative Priors for Minimum Cross-Entropy Positive Time Frequency Distributions," IEEE Signal Process. Lett., Vol. 4, No. 6, pp. 176–177, 1997.
[246] P. Loughlin, J. Pitton, and L. Atlas, "Construction of positive time-frequency distributions," IEEE Trans. Signal Process., 42(10):2697–2705, 1994.
[247] P. Loughlin, J. Pitton, and B. Hannaford, "Approximating time-frequency density functions via optimal combinations of spectrograms," IEEE Signal Process. Lett., 1(12):199–202, 1994.
[248] J. Pitton, "An algorithm for weighted least squares positive time-frequency distributions," in SPIE Advanced Sig. Proc. Algs. Archs., Impl. VII, volume 3162, 1997.
[249] J. Pitton, "Linear and quadratic methods for positive time-frequency distributions," in IEEE ICASSP, vol. V, pp. 3649–3652, 1997.
[250] J. Pitton, L. Atlas, and P. Loughlin, "Applications of positive time-frequency distributions to speech processing," IEEE Trans. Sp. Audio Proc., 2(4):554–566, 1994.
[251] S. Stankovic, LJ. Stankovic, "Introducing time-frequency distribution with a 'complex-time' argument", Electronics Letters, Vol. 32, No. 14, pp. 1265–1267, July 1996.
[252] L. Stankovic, "Time-frequency distributions with complex argument," IEEE Trans. Signal Process., vol. 50, no. 3, pp. 475–486, Mar. 2002.
[253] C. Cornu, S. Stankovic, C. Ioana, A. Quinquis, LJ. Stankovic, "Generalized Representations of Phase Derivatives for Regular Signals", IEEE Trans. Signal Process., Vol. 55, No. 10, pp. 4831–4838, Oct. 2007.
[254] P. O'Shea, "A new technique for instantaneous frequency rate estimation," in IEEE Signal Process. Lett., vol. 9, no. 8, pp. 251–252, 2002.
[255] I. Shafi, J. Ahmad, S. I. Shah, and F. M. Kashif, "Techniques to obtain good resolution and concentrated time-frequency distributions - a review", EURASIP Journal on Advances in Signal Processing, Volume 2009 (2009), Article ID 673539, 43 pages.
Appendix A
ANN Fundamentals
A novel ANN based method is presented in Chapters 3 and 4 to compute de-blurred
TFDs [217, 228]. Fig. A.1 is the general block form representation of the
method. The resultant TFDs are highly concentrated, better in resolution, and free of
CTs, and can thus be used for STSC analysis. The method employs Bayesian
regularization during the training phase of the ANN to obtain energy concentration
along the IF of individual components for unknown blurred TFDs. De-blurring TFDs is
particularly suited for learning [218] by an ANN for the following reasons [109, 111]:
1. There is little information available on the source of blurring.
2. Blurring is usually the result of a combination of events, which makes it too
complex to be described mathematically.
3. Sufficient data is available, and it is conceivable that the data captures the
fundamental principle at work.
The important theoretical aspects of an ANN setup are discussed next. They
are necessary for a clear understanding of the proposed ANN based multi-processes
framework covered in the Chapters to follow.
[Figure A.1 block labels: Multiple BRNNs Training & NENNs Selection; Test TFDs Pre Processing; Training TFDs Correlation & Clusters Formation; Vectorization; Pre Processing; Resultant TFDs Output Data; Post Processing]
Figure A.1: The flow diagram of the neural network based method.
A.1 Brain Vs ANN
The brain is a very efficient tool. Although its response time is much slower
than that of computer chips, it beats the computer in complex tasks such as image and
sound recognition and many others. It is also far more efficient than the computer chip
in energy consumption per operation. An ANN is an information processing paradigm
that is inspired by the way the brain processes information [108]. The key element of this
paradigm is the novel structure of the information processing system. It is composed
of a large number of highly interconnected processing elements (neurons) working in
unison to solve specific problems. ANNs, like people, learn by example. An ANN is
configured for a specific application, such as pattern recognition or data classification,
through a learning process. Learning in biological systems involves adjustments to
the synaptic connections that exist between the neurons. This is true of ANNs as well
[218].
A.2 Human Vs Artificial Neuron
A typical human neuron collects signals from others through a host of fine structures
called dendrites. The neuron sends out spikes of electrical activity through a long, thin
strand known as an axon, which splits into thousands of branches. At the end of each
branch, a structure called a synapse converts the activity from the axon into electrical
effects that inhibit or excite activity in the connected neurons. When a neuron receives
excitatory input that is sufficiently large compared with its inhibitory input, it sends
a spike of electrical activity down its axon. Learning occurs by changing the
effectiveness of the synapses so that the influence of one neuron on another changes
[105, 106].
(a) (b)
Figure A.2: (a) Human's neuron (b) Artificial neuron
The essential features of human neurons and their interconnections are first
estimated. A computer is then typically programmed to simulate these features.
However, because the knowledge about neurons is still incomplete and computing
power is limited, these models are necessarily gross idealizations of real networks of
neurons. A model of a human neuron versus an artificial neuron is presented in Fig. A.2.
A.3 ANN Layers
The commonest type of ANN consists of three groups, or layers, of units: a layer
of "input" units is connected to a layer of "hidden" units, which is connected to a
layer of "output" units [105, 106]. The activity of the input units represents the raw
information that is fed into the network. The activity of each hidden unit is determined
by the activities of the input units and the weights on the connections between the input
and the hidden units. The behavior of the output units depends on the activity of the
hidden units and the weights between the hidden and output units.
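The way activity flows through the three layers can be sketched in a few lines of Python. This is only an illustrative sketch: the logistic (sigmoid) activation, the bias terms, the layer sizes and all weight values are assumptions made for the example and are not taken from the thesis.

```python
import math

def sigmoid(x):
    """Logistic activation, mapping any real input to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_activity(inputs, weights, biases):
    """Activity of each unit: sigmoid of the weighted sum of its inputs.

    weights[i][j] is the weight on the connection from input i to unit j.
    """
    activities = []
    for j in range(len(biases)):
        total_input = biases[j] + sum(
            inputs[i] * weights[i][j] for i in range(len(inputs)))
        activities.append(sigmoid(total_input))
    return activities

def forward(x, w_hidden, b_hidden, w_output, b_output):
    """Propagate the raw input x through the hidden layer to the output layer."""
    hidden = layer_activity(x, w_hidden, b_hidden)
    output = layer_activity(hidden, w_output, b_output)
    return hidden, output

# Hypothetical 2-input, 3-hidden, 1-output network with arbitrary weights.
w_hidden = [[0.5, -0.3, 0.8], [0.1, 0.7, -0.6]]
b_hidden = [0.0, 0.1, -0.1]
w_output = [[1.0], [-1.0], [0.5]]
b_output = [0.2]
hidden, output = forward([1.0, 0.0], w_hidden, b_hidden, w_output, b_output)
print(hidden, output)
```

Changing an entry of `w_hidden` changes when the corresponding hidden unit becomes active, which is the sense in which a hidden unit "chooses" what it represents.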
This simple type of network is interesting because the hidden units are free to
construct their own representations of the input. The weights between the input and
hidden units determine when each hidden unit is active, and so by modifying these
weights, a hidden unit can choose what it represents. We also distinguish single-layer
and multi-layer architectures. The single-layer organization, in which all units
are connected to one another, constitutes the most general case and is of more
potential computational power than hierarchically structured multi-layer organizations.
In multi-layer networks, units are often numbered by layer, instead of following a
global numbering.
The most widely used ANN architecture has been the multilayer perceptron,
trained with the error back propagation learning algorithm. However, it suffers
from fundamental problems such as long convergence time, local minima and the
absence of a simple rule to obtain the right number of neurons and hidden layers.
A.4 Weights and Error Adjustment
In order to train an ANN to perform some task, the weights of each unit are adjusted in
such a way that the error between the desired output and the actual output is reduced
[105, 106, 218]. This process requires that the ANN computes the error derivative
with respect to the weights, denoted by EW. In other words, it must calculate how the
error changes as each weight is increased or decreased slightly. The back propagation
algorithm is the most widely used method for determining the EW.
The back propagation algorithm is easiest to understand if all the units in the
network are linear. The algorithm computes each EW by first computing the EA, the
rate at which the error changes as the activity level of a unit is changed. For output
units, the EA is simply the difference between the actual and the desired output. To
compute the EA for a hidden unit in the layer just before the output layer, we first
identify all the weights between that hidden unit and the output units to which it is
connected. We then multiply those weights by the EAs of those output units and add
the products. This sum equals the EA for the chosen hidden unit. After calculating
all the EAs in the hidden layer just before the output layer, we can compute in like
fashion the EAs for other layers, moving from layer to layer in a direction opposite to
the way activities propagate through the network. This is what gives back propagation
its name. Once the EA has been computed for a unit, it is straightforward to compute
the EW for each incoming connection of the unit. The EW is the product of the EA
and the activity through the incoming connection.
A.4.1 Back propagation Algorithm
The back propagation algorithm consists of four steps [105, 106, 218]:
1. Compute how fast the error changes as the activity of an output unit is changed.
This error derivative, defined by the symbol EA, is the difference between the actual
and the desired activity.
\[ EA_j = \frac{\partial E}{\partial y_j} = y_j - d_j \tag{A.1} \]
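2. Compute how fast the error changes as the total input received by an output
unit is changed. This quantity, defined by the symbol EI, is the answer from step
1 multiplied by the rate at which the output of a unit changes as its total input is
changed. For a unit with a logistic (sigmoid) activation this rate is $y_j (1 - y_j)$.
\[ EI_j = \frac{\partial E}{\partial x_j} = \frac{\partial E}{\partial y_j} \times \frac{\partial y_j}{\partial x_j} = EA_j \, y_j (1 - y_j) \tag{A.2} \]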
3. Compute how fast the error changes as a weight on the connection into an output
unit is changed. This quantity, defined by the symbol EW, is the answer from step
2 multiplied by the activity level of the unit from which the connection emanates.
\[ EW_{ij} = \frac{\partial E}{\partial W_{ij}} = \frac{\partial E}{\partial x_j} \times \frac{\partial x_j}{\partial W_{ij}} = EI_j \, y_i \tag{A.3} \]
4. Compute how fast the error changes as the activity of a unit in the previous layer
is changed. This crucial step allows back propagation to be applied to multilayer
networks. When the activity of a unit in the previous layer changes, it affects the
activities of all the output units to which it is connected. So to compute the overall
effect on the error, we add together all these separate effects on output units. But
each effect is simple to calculate. It is the answer in step 2 multiplied by the weight
on the connection to that output unit.
\[ EA_i = \frac{\partial E}{\partial y_i} = \sum_j \frac{\partial E}{\partial x_j} \times \frac{\partial x_j}{\partial y_i} = \sum_j EI_j \, W_{ij} \tag{A.4} \]
By using steps 2 and 4, the EAs of one layer of units are converted into EAs
for the previous layer. This procedure can be repeated to get the EAs for as many
previous layers as desired. Once the EA of a unit is known, steps 2 and 3 can be used
to compute the EWs on its incoming connections.
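The four steps can be checked numerically on a very small network. The sketch below is illustrative only: it assumes sigmoid units without biases, the error $E = \frac{1}{2}\sum_j (y_j - d_j)^2$ (so that its derivative matches (A.1)), and arbitrary example weights. None of these values come from the thesis.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w1, w2):
    """w1[i][h]: input i -> hidden h; w2[h][j]: hidden h -> output j (no biases)."""
    hid = [sigmoid(sum(x[i] * w1[i][h] for i in range(len(x))))
           for h in range(len(w1[0]))]
    out = [sigmoid(sum(hid[h] * w2[h][j] for h in range(len(hid))))
           for j in range(len(w2[0]))]
    return hid, out

def error(x, d, w1, w2):
    """E = 1/2 * sum_j (y_j - d_j)^2, whose derivative gives (A.1)."""
    _, out = forward(x, w1, w2)
    return 0.5 * sum((y - t) ** 2 for y, t in zip(out, d))

def backprop(x, d, w1, w2):
    hid, out = forward(x, w1, w2)
    # Step 1 (A.1): EA_j = y_j - d_j for each output unit.
    ea_out = [y - t for y, t in zip(out, d)]
    # Step 2 (A.2): EI_j = EA_j * y_j * (1 - y_j).
    ei_out = [ea * y * (1.0 - y) for ea, y in zip(ea_out, out)]
    # Step 3 (A.3): EW_ij = EI_j * y_i for the hidden-to-output weights.
    ew2 = [[ei_out[j] * hid[h] for j in range(len(out))]
           for h in range(len(hid))]
    # Step 4 (A.4): EA_i = sum_j EI_j * W_ij for each hidden unit.
    ea_hid = [sum(ei_out[j] * w2[h][j] for j in range(len(out)))
              for h in range(len(hid))]
    # Steps 2 and 3 again, one layer back.
    ei_hid = [ea * y * (1.0 - y) for ea, y in zip(ea_hid, hid)]
    ew1 = [[ei_hid[h] * x[i] for h in range(len(hid))] for i in range(len(x))]
    return ew1, ew2

# Hypothetical 2-2-1 network and training pair.
x, d = [0.5, -1.0], [1.0]
w1 = [[0.1, -0.2], [0.4, 0.3]]
w2 = [[0.7], [-0.5]]
ew1, ew2 = backprop(x, d, w1, w2)

# Finite-difference check of one input-to-hidden weight derivative.
eps = 1e-6
w1_plus = [row[:] for row in w1]
w1_plus[0][1] += eps
numeric = (error(x, d, w1_plus, w2) - error(x, d, w1, w2)) / eps
print(ew1[0][1], numeric)
```

The two printed numbers should agree to several decimal places, which is a simple way to confirm that the EWs obtained from the four steps are the derivatives of the error with respect to the weights.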
A.5 Learning Algorithms
The brain learns from experience. ANNs are sometimes called machine learning
algorithms, because changing their connection weights (training) causes the network
to learn the solution to a problem. The strength of the connection between the neurons
is stored as a weight value for the specific connection. The system learns new knowledge
by adjusting these connection weights. The learning ability of an ANN is determined
by its architecture and by the algorithmic method chosen for training. In the following
subsections a brief description of the most popular ANN training algorithms based
on back propagation is presented. To compute the de-blurred TFDs, the comparison
and selection of the best training algorithm is made in Section 3.2 of Chapter 3.
A.5.1 The Levenberg-Marquardt back propagation training algorithm
The LMB algorithm is a variation of Newton's method [218, 219] that was designed
for minimizing functions that are sums of squares of other nonlinear functions. This is
very well suited to ANN training where the performance index is the MSE. For such
functions Newton's method is approximated by the Gauss-Newton method, which after
a number of substitutions transforms to the LMB algorithm:
\[ x_{k+1} = x_k - \left[ J^T(x_k) J(x_k) + \mu_k I \right]^{-1} J^T(x_k) v(x_k) \tag{A.5} \]
where $x$, $J(x)$, $v(x)$, $I$, and $\mu_k$ are the learning vector, Jacobian matrix,
vector of nonlinear functions, identity matrix and step size, respectively. This
algorithm has the very useful feature that as $\mu_k$ is increased it approaches the
steepest descent algorithm with small learning rate:
\[ x_{k+1} \simeq x_k - \frac{1}{\mu_k} J^T(x_k) v(x_k) = x_k - \frac{1}{2\mu_k} \nabla F(x), \quad \text{for large } \mu_k \tag{A.6} \]
whereas as $\mu_k$ is decreased to zero the algorithm becomes Gauss-Newton. Here we
assume that $F(x)$ is a sum of squares function:
\[ F(x) = v^T(x) v(x) \tag{A.7} \]
The algorithm begins with $\mu_k$ set to some small value (e.g., $\mu_k = 0.01$). If a step
does not yield a smaller value for $F(x)$, then the step is repeated with $\mu_k$ multiplied by
some factor $\vartheta > 1$ (e.g., $\vartheta = 10$). Eventually $F(x)$ should decrease, since we would
be taking a small step in the direction of steepest descent. If a step does produce a
smaller value for $F(x)$, then $\mu_k$ is divided by $\vartheta$ for the next step, so that the algorithm will
approach Gauss-Newton, which should provide faster convergence. The algorithm
provides a nice compromise between the speed of Newton's method and the guaranteed
convergence of steepest descent.
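The update (A.5) and the rule for adapting the step size can be sketched on a small curve-fitting problem instead of a full network. Everything specific here is an assumption made for illustration: the two-parameter model $a e^{bt}$, the synthetic data, the starting point, and the names `mu` and `theta` for the step size and its scaling factor.

```python
import math

def residuals(p, ts, ys):
    """Vector of nonlinear functions v(x): model output minus target."""
    a, b = p
    return [a * math.exp(b * t) - y for t, y in zip(ts, ys)]

def jacobian(p, ts):
    """Rows hold the derivatives of each residual with respect to a and b."""
    a, b = p
    return [[math.exp(b * t), a * t * math.exp(b * t)] for t in ts]

def sum_squares(v):
    """F(x) = v^T v as in (A.7)."""
    return sum(e * e for e in v)

def lm_step(p, mu, ts, ys):
    """One trial step of (A.5) for a two-parameter problem."""
    v = residuals(p, ts, ys)
    J = jacobian(p, ts)
    # Entries of the 2x2 matrix J^T J + mu I and of the vector J^T v.
    a11 = sum(r[0] * r[0] for r in J) + mu
    a12 = sum(r[0] * r[1] for r in J)
    a22 = sum(r[1] * r[1] for r in J) + mu
    g1 = sum(r[0] * e for r, e in zip(J, v))
    g2 = sum(r[1] * e for r, e in zip(J, v))
    # Solve the 2x2 linear system by Cramer's rule.
    det = a11 * a22 - a12 * a12
    d1 = (a22 * g1 - a12 * g2) / det
    d2 = (a11 * g2 - a12 * g1) / det
    return [p[0] - d1, p[1] - d2]

def levenberg_marquardt(p, ts, ys, mu=0.01, theta=10.0, iters=100):
    F = sum_squares(residuals(p, ts, ys))
    for _ in range(iters):
        trial = lm_step(p, mu, ts, ys)
        F_trial = sum_squares(residuals(trial, ts, ys))
        if F_trial < F:
            p, F = trial, F_trial
            mu /= theta   # accepted: move towards Gauss-Newton
        else:
            mu *= theta   # rejected: move towards small steepest-descent steps
    return p, F

# Synthetic noise-free data from a = 2, b = -1 (illustrative values).
ts = [0.25 * i for i in range(9)]
ys = [2.0 * math.exp(-1.0 * t) for t in ts]
fit, F_final = levenberg_marquardt([1.0, -0.5], ts, ys)
print(fit, F_final)
```

For a network, `residuals` would return the network errors over the training set and `jacobian` their derivatives with respect to the weights and biases; the matrix solve would then be done with a linear algebra routine rather than by hand.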
A.5.2 The Powell-Beale conjugate gradient back propagation training algorithm
The Powell-Beale conjugate gradient back propagation (PBCGB) training algorithm
can train any network as long as its weight, net input, and transfer functions have
derivative functions [218]. Back propagation is used to calculate derivatives of performance
with respect to the weight and bias variables X. Each variable is adjusted according to
the following:
\[ X = X + \alpha \times dX \tag{A.8} \]
where dX is the search direction. The parameter $\alpha$ is selected to minimize the
performance along the search direction. The line search function is used to locate the
minimum point. The first search direction is the negative of the gradient of performance.
In succeeding iterations the search direction is computed from the new gradient and
the previous search direction according to the formula:
\[ dX = -\nabla X + dX_{old} \times Z \tag{A.9} \]
where $\nabla X$ is the gradient. The parameter Z can be computed in several different
ways. The Powell-Beale variation of conjugate gradient is distinguished by two
features. First, the algorithm uses a test to determine when to reset the search direction to
the negative of the gradient. Second, the search direction is computed from the
negative gradient, the previous search direction, and the last search direction before the
previous reset.
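A simplified sketch of (A.8) and (A.9) is given below. It is not the full Powell-Beale method: the three-term direction that also uses the last search direction before the previous reset is omitted, Z is computed with the Polak-Ribiere formula (one of the "several different ways"), and the reset test is the commonly quoted orthogonality check $|g_{old}^T g_{new}| \ge 0.2 \|g_{new}\|^2$. The quadratic performance function, which allows an exact line search, and all numerical values are assumptions chosen for illustration.

```python
def grad(A, b, x):
    """Gradient A x - b of the quadratic performance 0.5 x^T A x - b^T x."""
    n = len(x)
    return [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def conjugate_gradient(A, b, x, iters=50, tol=1e-10):
    g = grad(A, b, x)
    d = [-gi for gi in g]                 # first direction: negative gradient
    for _ in range(iters):
        if dot(g, g) ** 0.5 < tol:
            break
        n = len(d)
        Ad = [sum(A[i][j] * d[j] for j in range(n)) for i in range(n)]
        alpha = -dot(g, d) / dot(d, Ad)   # exact line search for a quadratic
        x = [xi + alpha * di for xi, di in zip(x, d)]            # (A.8)
        g_new = grad(A, b, x)
        if abs(dot(g, g_new)) >= 0.2 * dot(g_new, g_new):
            # Reset test: little orthogonality left between successive gradients.
            d = [-gi for gi in g_new]
        else:
            # Polak-Ribiere choice of Z, then the direction update (A.9).
            z = dot(g_new, [gn - go for gn, go in zip(g_new, g)]) / dot(g, g)
            d = [-gn + z * di for gn, di in zip(g_new, d)]
        g = g_new
    return x

# Hypothetical symmetric positive definite problem.
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x_min = conjugate_gradient(A, b, [0.0, 0.0, 0.0])
print(x_min, grad(A, b, x_min))
```

For a network the performance is not quadratic, so `alpha` would come from a numerical line search along `d` rather than from the closed-form expression used here.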
A.5.3 The Gradient descent with adaptive learning rate back propagation training algorithm
In the Gradient descent with adaptive learning rate back propagation (GDALB)
training algorithm, back propagation is used to calculate derivatives of performance dperf
with respect to the weight and bias variables X [218]. Each variable is adjusted
according to gradient descent:
\[ dX = lr \times \frac{dperf}{dX} \tag{A.10} \]
where lr is the learning rate. At each epoch, if performance decreases toward the
earlier defined goal, then the learning rate is increased by the factor $lr_{inc}$. If performance
increases by more than the factor $max_{perf\,inc}$, the learning rate is adjusted by the factor
$lr_{dec}$ and the change, which increased the performance, is not made.
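The learning rate adaptation can be sketched as follows. The sketch treats performance as an error to be minimised and therefore writes the descent step with an explicit minus sign. The factor values (1.05, 0.7 and 1.04), the test function and the function names are assumptions made for illustration, not settings reported in the thesis.

```python
def train_gda(perf, grad, x, lr=0.01, lr_inc=1.05, lr_dec=0.7,
              max_perf_inc=1.04, epochs=200):
    """Gradient descent whose learning rate adapts to the change in performance."""
    p = perf(x)
    for _ in range(epochs):
        g = grad(x)
        trial = [xi - lr * gi for xi, gi in zip(x, g)]
        p_trial = perf(trial)
        if p_trial > p * max_perf_inc:
            # Performance got worse by more than the allowed factor:
            # shrink the learning rate and discard the change.
            lr *= lr_dec
        else:
            if p_trial < p:
                lr *= lr_inc      # performance improved: speed up
            x, p = trial, p_trial
    return x, p, lr

# Hypothetical performance surface with its minimum at (3, -1).
def perf(x):
    return (x[0] - 3.0) ** 2 + 10.0 * (x[1] + 1.0) ** 2

def grad(x):
    return [2.0 * (x[0] - 3.0), 20.0 * (x[1] + 1.0)]

x_final, p_final, lr_final = train_gda(perf, grad, [0.0, 0.0], epochs=40)
print(x_final, p_final, lr_final)
```

Over these 40 epochs every step improves the performance, so the learning rate only grows; with a longer run it would eventually overshoot, at which point the rejection branch reduces it again.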
A.5.4 The Resilient propagation back propagation training algorithm
The Resilient propagation back propagation (RPB) training algorithm is an efficient
learning scheme that performs a direct adaptation of the weight step based on local
gradient information [218, 221]. In crucial difference to previously developed
adaptation techniques, the effort of adaptation is not blurred by gradient behavior whatsoever.
Back propagation is used to calculate derivatives of performance with respect to the
weight and bias variables X. Each variable is adjusted according to the following:
\[ dX = \Delta X \times \mathrm{sign}(\nabla X) \tag{A.11} \]
where the elements of $\Delta X$ are all initialized to $\Delta_0$ and $\nabla X$ is the gradient. At each
iteration the elements of $\Delta X$ are modified. If an element of $\nabla X$ changes sign from
one iteration to the next, then the corresponding element of $\Delta X$ is decreased by $\Delta_{dec}$.
If an element of $\nabla X$ maintains the same sign from one iteration to the next, then the
corresponding element of $\Delta X$ is increased by $\Delta_{inc}$. During training it was found that
the number of neurons in the hidden layer had to be increased from 10 to 20 for the
algorithm to converge.
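A minimal sketch of the sign-based update (A.11) is shown below. It is the basic variant without weight backtracking, it interprets "increased by" and "decreased by" as multiplication by the respective factor, and it writes the step with an explicit minus sign because the gradient is taken as the derivative of the error. The initial step 0.07, the factors 1.2 and 0.5, the upper bound `delta_max` and the test function are illustrative assumptions.

```python
def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def train_rprop(grad, x, delta0=0.07, delta_inc=1.2, delta_dec=0.5,
                delta_max=50.0, epochs=100):
    """Adapt one step size per variable using only the sign of the gradient."""
    x = list(x)
    delta = [delta0] * len(x)
    g_old = [0.0] * len(x)
    for _ in range(epochs):
        g = grad(x)
        for i in range(len(x)):
            if g[i] * g_old[i] > 0:
                # Same sign as in the previous iteration: enlarge the step.
                delta[i] = min(delta[i] * delta_inc, delta_max)
            elif g[i] * g_old[i] < 0:
                # Sign change: the last step jumped over a minimum, so shrink.
                delta[i] *= delta_dec
            x[i] -= delta[i] * sign(g[i])
        g_old = g
    return x

# Hypothetical error surface with its minimum at (3, -1).
def grad(x):
    return [2.0 * (x[0] - 3.0), 20.0 * (x[1] + 1.0)]

x_final = train_rprop(grad, [0.0, 0.0])
print(x_final)
```

Although the two gradient components differ in magnitude by a factor of ten, both variables are updated with steps of comparable size, because only the sign of each component is used.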
A.6 Bayesian Regularisation
The approach involves modifying the commonly used objective function, such as the
mean sum of squared network errors [220]:
\[ mse = \frac{1}{N} \sum_{k=1}^{N} (e_k)^2 \tag{A.12} \]
where $mse$, $e_k$, and $N$ represent the MSE, the network error and the number of network
errors used for averaging, respectively. It is possible to improve generalization if the
performance function is modified by adding a term that consists of the mean of the sum
of squares of the network weights and biases:
\[ mse_{reg} = \gamma \, mse + (1 - \gamma) \, msw \tag{A.13} \]
where $\gamma$, $mse_{reg}$, and $msw$ are the performance ratio, performance function and mean
of the sum of squares of network weights and biases, respectively. $msw$ is
mathematically described as under:
\[ msw = \frac{1}{n} \sum_{j=1}^{n} (w_j)^2 \tag{A.14} \]
Using this performance function causes the network to have smaller weights and biases,
and this forces the network response to be smoother and less likely to overfit.
Moreover, it is desirable to determine the optimal regularization parameters in an automated
fashion. One approach to this process is the Bayesian framework of David MacKay
[220]. In this framework, the weights and biases of the network are assumed to be
random variables with specified distributions. The regularization parameters are related to
the unknown variances associated with these distributions. Statistical techniques can
then be used to estimate these parameters.
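The regularised performance function of (A.12) to (A.14) is simple to compute directly. The sketch below uses hypothetical error and weight values and a fixed performance ratio `gamma`; the automated Bayesian estimation of the regularization parameters is not shown.

```python
def mse(errors):
    """Mean of the squared network errors, as in (A.12)."""
    return sum(e * e for e in errors) / len(errors)

def msw(weights):
    """Mean of the squared network weights and biases, as in (A.14)."""
    return sum(w * w for w in weights) / len(weights)

def msereg(errors, weights, gamma):
    """Regularised performance, as in (A.13), with performance ratio gamma."""
    return gamma * mse(errors) + (1.0 - gamma) * msw(weights)

# Illustrative values only.
errors = [1.0, -1.0, 2.0]
weights = [0.5, -0.5]
print(msereg(errors, weights, gamma=0.5))   # equal weighting of both terms
print(msereg(errors, weights, gamma=1.0))   # reduces to the plain MSE
```

Lowering `gamma` puts more emphasis on keeping the weights small, which is what smooths the network response.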
Appendix B
List of Publications
B.1 Journal Publications
B.1.1 Published
1. "Techniques to obtain good resolution and concentrated time-frequency
distributions: a review", EURASIP Journal on Advances in Signal Processing,
Volume 2009 (2009), Article ID 673539, 43 pages.
2. "Computing De-blurred Time Frequency Distributions using Artificial Neural
Networks", Circuits, Systems, and Signal Processing, Birkhäuser Boston,
Springer Verlag, Volume 27, no. 3, pp. 277-294, Jun 2008.
3. "Evolutionary time-frequency distributions using Bayesian regularised neural
network model", IET Signal Process., Volume 1, no. 2, pp. 97-106, June 2007.
B.2 Conference Publications
1. "Quantitative evaluation of concentrated Time Frequency Distributions", accepted
for publication in Proc. IEEE EUSIPCO, 24-26 Aug 2009, Glasgow, UK.
2. "Neural Network Solution for Compensating Distortions of Time Frequency
Representations", Proc. IEEE Intl. Conf. on Signal Process. & Comm., pp.
1575-1578, 24-27 Nov 2007, Dubai, UAE.
3. "Multiple Neural Networks over Clustered Data (MNCD) to Obtain Instantaneous
Frequencies (IFs)", Proc. IEEE Intl. Conf. on Information and Emerging
Technologies (ICIET), pp. 1-6, 6-7 July 2007, Karachi, Pakistan.
4. "Impact of Varying Neurons and Hidden Layers in Neural Network Architecture
for a Time Frequency Application", Proc. 10th IEEE Intl. Multi Topic Conf
(INMIC 2006), pp. 188-193, 23-24 Dec 2006, Islamabad, Pakistan.
5. "Time Frequency Image analysis using Neural Networks", Proc. IMACS
Multi-conference on Computational Engineering in Systems Applications
(CESA), vol. 1, pp. 315-320, 4-6 Oct. 2006, Beijing, China.
6. "Analysis and Comparison of Neural Network Training Algorithms for the Joint
Time-Frequency Analysis", Proc. IASTED Intl. Conf. on Artificial Intelligence
and application, pp. 193-198, Austria, Feb 2006.
7. "Time Frequency Distribution using Neural Networks", Proc. IEEE Intl. Conf.
on Emerging Technologies (ICET), pp. 32-35, Pakistan, 2005.