
TIME-FREQUENCY ANALYSIS USING NEURAL NETWORKS

[De-blurred Time-Frequency Distributions Using Neural Networks]

A Dissertation presented

by

Imran Shafi

to

The Department of Computer Engineering

in partial fulfillment of the requirements

for the degree of

DOCTOR OF PHILOSOPHY

in the subject of

Digital Signal Processing

Centre for Advanced Studies in Engineering (C@SE), Islamabad

University of Engineering and Technology (UET), Taxila

Pakistan

September 2009

CERTIFICATE

It is certified that the work contained in this thesis was carried out by Mr. Imran Shafi under my supervision at C@SE Islamabad, affiliated with UET Taxila, Pakistan.

Prof. Dr. Syed Ismail Shah

Department of Computer Engineering

C@SE, Islamabad

Prof. Dr. Abdul Khaliq

Chairman

Department of Computer Engineering

C@SE, Islamabad

President

C@SE, Islamabad

ABSTRACT

The thesis is divided into three parts. The first part explores and discusses the diverse concepts and motivations that have driven the research community to seek well-resolved, highly concentrated time-frequency distributions (TFDs). It then describes the methods used for the objective assessment of TFDs.

In the second part, a novel multi-process, ANN-based framework for obtaining highly concentrated TFDs is proposed. The proposed method uses a localised Bayesian regularised neural network model (BRNNM) to concentrate energy along the instantaneous frequencies (IFs) of the individual components of multicomponent signals, without assuming any prior knowledge.

The BRNNM is trained on the spectrogram and the pre-processed Wigner-Ville distribution (WD) of signals with known IF laws. These distributions are treated as two-dimensional (2-D) image matrices, vectorized, and clustered according to the elbow criterion. Each cluster contains pairs of input and target vectors, taken from the spectrograms and from the highly concentrated pre-processed WD respectively. For each cluster, these vector pairs are used to train multiple ANNs under David MacKay's Bayesian framework. The best-trained network for each cluster is then selected using a network error criterion.

In the test phase, the TFDs of unknown signals are vectorized and clustered, then processed through these specialized ANNs. After post-processing, the resulting TFDs exhibit better resolution and concentration along the individual components than the initial blurred estimates.
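The sketch below illustrates the vectorization and elbow-criterion clustering step summarized above. It is not the procedure of Chapter 4. For illustration only, it assumes the following:

- A TFD image is cut into non-overlapping b × b blocks, and each block is flattened into one vector.
- The clustering is plain k-means with k-means++ seeding.
- The elbow is the k at which the within-cluster sum of squares (WCSS) curve bends most sharply, measured by its largest second difference.

The names `vectorize_blocks`, `kmeans` and `elbow_k` are hypothetical.

```python
import numpy as np

def vectorize_blocks(img, b):
    """Cut a 2-D TFD image into non-overlapping b x b blocks and flatten each
    block into one row vector (a hypothetical vectorization scheme)."""
    rows, cols = img.shape[0] // b, img.shape[1] // b
    img = img[: rows * b, : cols * b]                  # drop ragged edges
    blocks = img.reshape(rows, b, cols, b).swapaxes(1, 2)
    return blocks.reshape(rows * cols, b * b)          # row-major block order

def _kmeanspp_init(X, k, rng):
    """k-means++ seeding: sample each new centre with probability ~ D^2."""
    centres = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(centres)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centres.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centres)

def kmeans(X, k, n_init=5, n_iter=100, seed=0):
    """Lloyd's k-means; returns (labels, WCSS) of the best of n_init runs."""
    rng = np.random.default_rng(seed)
    best = (None, np.inf)
    for _run in range(n_init):
        centres = _kmeanspp_init(X, k, rng)
        for _it in range(n_iter):
            d = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
            labels = d.argmin(axis=1)
            # Recompute each centre as its cluster mean (keep it if the cluster is empty).
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centres[j] for j in range(k)])
            if np.allclose(new, centres):
                break
            centres = new
        wcss = float(((X - new[labels]) ** 2).sum())   # within-cluster sum of squares
        if wcss < best[1]:
            best = (labels, wcss)
    return best

def elbow_k(X, k_max=8):
    """Return the k (2..k_max-1) where the WCSS curve bends most, plus the curve."""
    ks = np.arange(1, k_max + 1)
    wcss = np.array([kmeans(X, int(k))[1] for k in ks])
    bend = wcss[:-2] - 2 * wcss[1:-1] + wcss[2:]       # discrete second difference
    return int(ks[1:-1][bend.argmax()]), wcss
```

For instance, on three well-separated groups of synthetic 2-D points, `elbow_k` picks k = 3. In the framework described above, each resulting cluster of input/target vector pairs would then go to its own set of networks.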

The third part discusses the experimental results obtained with the proposed technique. The framework is also extended with several objective assessment methods to evaluate the performance of the de-blurred TFDs it produces. These methods let the quality of TFDs be quantified rather than judged solely by visual inspection of their plots. They also allow the proposed technique to be compared with other existing techniques in the literature. In particular, the regularities observed in the computed values show that the objective criteria are effective at quantifying the concentration and resolution of TFDs.
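Objective criteria of this kind reduce a TFD to a single figure of merit. The exact definitions used in this thesis are given in Section 2.2; the sketch below is only an illustration. It computes two such measures:

- the Shannon entropy of the normalized TFD magnitude;
- the order-α Rényi entropy, $R_\alpha = \frac{1}{1-\alpha}\log_2 \sum_{n,k} \big(P[n,k] / \sum_{n,k} P[n,k]\big)^{\alpha}$, with α = 3 as the default.

Lower values indicate a more concentrated TFD. The function names, and the use of magnitudes in the Shannon case, are assumptions of this sketch.

```python
import numpy as np

def shannon_entropy(tfd):
    """Shannon entropy (bits) of a TFD treated as a 2-D probability mass.
    Magnitudes are used so that signed TFDs do not break the logarithm."""
    p = np.abs(np.asarray(tfd, dtype=float))
    p = p / p.sum()
    p = p[p > 0]                       # convention: 0 * log 0 = 0
    return float(-(p * np.log2(p)).sum())

def renyi_entropy(tfd, alpha=3):
    """Order-alpha Renyi entropy (bits) of a TFD normalized to unit sum.
    With an odd integer alpha (e.g. 3) signed TFD values are tolerated;
    non-integer alpha would require a non-negative TFD."""
    p = np.asarray(tfd, dtype=float)
    p = p / p.sum()
    return float(np.log2((p ** alpha).sum()) / (1 - alpha))
```

Both measures agree on simple cases:

- A TFD spread evenly over four cells (e.g. `np.eye(4)`) gives 2 bits.
- A TFD with all its energy in a single cell gives 0 bits.

This matches the intuition that a more concentrated TFD has lower entropy.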

ACKNOWLEDGMENTS

In the name of Allah, the most beneficent, the most merciful. Praise be to Allah subhanhu wa ta'ala, and peace and blessings be on all His prophets and messengers, especially on the seal of the prophets, Prophet Muhammad salalahu alaihi wassalam. Without Allah subhanhu wa ta'ala's help and blessing, I would not have been able to complete this thesis.

I am indebted to my advisors, Professor Dr. Syed Ismail Shah and Professor Dr. Jamil Ahmad, for their guidance and patience throughout my research. More importantly, I am grateful to both for the support and encouragement they generously gave me when I needed it most. Special thanks go to Professor Dr. Shoab Ahmad Khan. He served on my dissertation committee, commented on my proposal report, and provided fruitful discussions and valuable help with the second part of the thesis. I would also like to thank the other members of my dissertation committee, Professor Dr. Abdul Khaliq and Professor Dr. Amir Iqbal Bhatti.

I would also like to thank my research colleague Faisal Mehmood Kashif for encouraging me to embark on my research. I offer thanks to my friends and colleagues Adnan Khan, Sajid Bashir, Imran Zaka, Habib ur Rehman and Seema Khalid.

I am grateful to my parents for their infinite support and for teaching me the importance of knowledge. I would also like to thank my wife and daughter for their patience and continued support.

I would also like to thank the Higher Education Commission (HEC) of Pakistan for the four-year scholarship for graduate studies. Last but not least, I would like to thank the unknown and anonymous reviewers, whose critiques prompted stimulating and illuminating discussions. Their valuable comments helped me revise my work and eventually publish it in prestigious international journals. The list of publications is given in Appendix B.

CONTENTS

1 Introduction

1.1 Concentration and Resolution Tradeoff

1.2 Application-specific Environment

1.3 Objective Assessment of TFDs

1.4 Our Approach

1.5 Contribution

1.5.1 Review of High Resolution TF Methods

1.5.2 A Novel ANN-based Framework

1.5.3 The Objective Assessment

1.6 Overview of the Thesis

2 Background

2.1 TF Analysis

2.1.1 The Methods based on Evolutionary Spectrum

2.1.2 The Methods based on Cohen's Bilinear Class

2.2 Objective Assessment Methods

2.2.1 Entropy Measures

2.2.2 Normalized Entropy Measures

2.2.3 Ratio of Norms based Measure

2.2.4 LJubisa Measure

2.2.5 Boashash Performance Criteria

2.3 Summary

3 Neural Network based Framework for Computing De-blurred TFDs: Part I

3.1 TFDs using ANN: Binary Case

3.1.1 Selected ANN Architecture

3.1.2 Test Result

3.2 Analysis & Comparison of the ANN Training Algorithms

3.2.1 The Method

3.2.2 Selected ANN Architecture

3.2.3 Performance Evaluation

3.3 Impact of Varying the Number of Neurons and Hidden Layers

3.3.1 ANN Topology

3.3.2 Effect of Varying the Number of Hidden Neurons

3.3.3 Effect of Varying the Number of Hidden Layers

3.4 Effect of Data Clustering and Using Multiple ANNs for Each Cluster

3.4.1 Advantages of Clustering and Training Multiple ANNs

3.4.2 The Network Architecture and Procedure

3.4.3 Performance Evaluation

3.5 Summary

4 Neural Network based Framework for Computing De-blurred TFDs: Part II

4.1 The ANN-based Framework's Description

4.1.1 Pre-processing of Training Data

4.1.2 Processing through Bayesian Regularized Neural Network Model

4.1.3 Post-processing of the Output Data

4.2 Performance Evaluation

4.3 Summary

5 Discussion on Experimental Results

5.1 Visual Interpretation and Entropy Analysis

5.1.1 Resultant NTFDs: Experimental Results

5.2 Objective Assessment

5.2.1 Real Life Test Case

5.2.2 Synthetic Test Cases

5.2.3 Performance Evaluation

5.3 Summary

6 Conclusion and Future Directions

6.1 Future Directions

References

A ANN Fundamentals

A.1 Brain vs ANN

A.2 Human vs Artificial Neuron

A.3 ANN Layers

A.4 Weights and Error Adjustment

A.4.1 Back-propagation Algorithm

A.5 Learning Algorithms

A.5.1 The Levenberg-Marquardt back-propagation training algorithm

A.5.2 The Powell-Beale conjugate gradient back-propagation training algorithm

A.5.3 The gradient descent with adaptive learning rate back-propagation training algorithm

A.5.4 The resilient propagation back-propagation training algorithm

A.6 Bayesian Regularisation

B List of Publications

B.1 Journal Publications

B.1.1 Published

B.2 Conference Publications

LIST OF FIGURES

Figure 2.1 TFDs of a multicomponent bat echolocation chirp signal. (a) Spectrogram (test input to the BRNNM) [Hamming window of length L = 100], (b) WVD, (c) ZAMD, (d) MHD, (e) CWD [kernel width = 1], (f) BJD.

Figure 3.1 Graphical explanation of the method.

Figure 3.2 Input training TFD image of the sinusoidal FM signal.

Figure 3.3 Input training TFD image of parallel chirps.

Figure 3.4 Target TFD image for the sinusoidal FM signal.

Figure 3.5 Target TFD image of the parallel chirp signal.

Figure 3.6 Binary TFD obtained by the OKM [132].

Figure 3.7 Spectrogram of the bat echolocation chirp signal.

Figure 3.8 The deblurred TFD obtained by the proposed ANN model.

Figure 3.9 Input training TFD image of the sinusoidal FM signal.

Figure 3.10 Input training TFD image of the parallel chirps signal.

Figure 3.11 Test TFD image of the combined sinusoidal FM and parallel chirps signal.

Figure 3.12 The resultant TFD obtained after passing the spectrogram of the test signal through the ANN trained with the RPROP backpropagation algorithm.

Figure 3.13 The resultant TFD obtained after passing the spectrogram of the test signal through the ANN trained with the Powell-Beale conjugate gradient backpropagation algorithm.

Figure 3.14 The resultant TFD obtained after passing the spectrogram of the test signal through the ANN trained with the gradient descent with adaptive learning rate backpropagation algorithm.

Figure 3.15 The resultant TFD obtained after passing the spectrogram of the test signal through the neural network trained with the Levenberg-Marquardt training algorithm.

Figure 3.16 Comparative graph of error convergence with respect to the number of iterations.

Figure 3.17 Test spectrogram image of a single chirp signal.

Figure 3.18 Comparative graph of error vs the number of neurons in a single hidden layer.

Figure 3.19 Comparative graph of error vs epochs for various numbers of hidden layers.

Figure 3.20 Resultant TFD (2 hidden layers with 50 neurons in each).

Figure 3.21 Resultant TFD (2 hidden layers with 5 neurons in each).

Figure 3.22 Resultant TFD (3 hidden layers with 5 neurons in each).

Figure 3.25 Resultant TFD (single layer with 40 neurons).

Figure 3.26 Resultant TFD (single hidden layer with 30 neurons).

Figure 3.27 Resultant TFD obtained after processing the test TFD with a single ANN without data clustering.

Figure 3.28 Resultant TFD obtained after processing the test TFD with multiple ANNs after data clustering.

Figure 3.29 The MSE in the last epoch for (a) ANNs trained for cluster 1, (b) ANNs trained for cluster 2, (c) ANNs trained for cluster 3.

Figure 3.30 Rate of MSE convergence against epochs for (a) ANNs trained for cluster 1, (b) ANNs trained for cluster 2, (c) ANNs trained for cluster 3.

Figure 3.31 The convergence time taken by various ANNs for each cluster of data, (a) ANNs trained for cluster 1, (b) ANNs trained for cluster 2, (c) ANNs trained for cluster 3.

Figure 4.1 Flow diagram of the method.

Figure 4.2 Major modules of the method.

Figure 4.3 Pre-processing of training data.

Figure 4.4 The spectrograms used as input training images of the (a) sinusoidal FM and (b) parallel chirp signals.

Figure 4.5 Target TFDs with CTs, unsuitable for training the ANN, taking the WD of the (a) parallel chirps' signal and (b) sinusoidal FM signal.

Figure 4.6 The non-processed WD target images of the sinusoidal FM signal, (a) grayscale version, (b) binary version.

Figure 4.7 The pre-processed WD target image of the sinusoidal FM signal, (a) grayscale version, (b) binary version.

Figure 4.8 The non-processed WD target images of the parallel chirps' signal, (a) grayscale version, (b) binary version.

Figure 4.9 The pre-processed WD target image of the parallel chirps' signal, (a) grayscale version, (b) binary version.

Figure 4.10 Elbow criterion.

Figure 4.11 Vectorization, correlation and taxonomy of the TFD image.

Figure 4.12 Bayesian regularised neural network model.

Figure 4.13 Post-processing of the output data.

Figure 4.14 Test TFDs for the bat chirps signal, (a) the spectrogram TFD and (b) the resultant TFD after processing through the proposed framework.

Figure 4.15 Resultant TFD obtained by the method of [132].

Figure 5.1 Test TFDs (a) crossing chirps (TI 1), (b) mono-component linear chirp (TI 2), (c) combined quadratic swept-frequency signals whose spectrograms are concave and convex parabolic chirps respectively (TI 3), (d) combined sinusoidal FM and crossing chirps (TI 4), and (e) quadratic chirp (TI 5).

Figure 5.2 Resultant TFDs after processing through the correlation vectored taxonomy algorithm with LNNs for (a) crossing chirps (TI 1), (b) mono-component linear chirp (TI 2), (c) combined quadratic swept-frequency signals whose spectrograms are concave and convex parabolic chirps respectively (TI 3), (d) combined sinusoidal FM and crossing chirps (TI 4), and (e) quadratic chirp (TI 5).

Figure 5.3 (a) The test spectrogram (TI 2) [Hamm; L = 90]. (b) The NTFD of a synthetic signal consisting of two sinusoidal FM components intersecting each other.

Figure 5.4 (a) The test spectrogram (TI 3) [Hamm; L = 90]. (b) The NTFD of a synthetic signal consisting of two sets of non-parallel, non-intersecting chirps.

Figure 5.5 (a) The test spectrogram (TI 4) [Hamm; L = 90]. (b) The NTFD of a synthetic signal consisting of crossing chirps and a sinusoidal FM component.

Figure 5.6 (a) The test spectrogram (TI 5) and (b) the NTFD of test case 4.

Figure 5.7 The time slices of the spectrogram (blue) and the NTFD (red) for the bat echolocation chirps' signal, at n = 150 (left) and n = 310 (right).

Figure 5.8 Comparison plots of criteria values vs TFDs for test images 1–4, (a) Shannon entropy measure, (b) Rényi entropy measure, (c) volume-normalized Rényi entropy measure, (d) ratio of norms based measure, and (e) LJubisa measure.

Figure 5.9 The normalized slices at t = 64 of the TFDs. (a) The spectrogram. (b) WD. (c) ZAMD. (d) CWD. (e) BJD. (f) NTFD. The first five TFDs (dashed) are compared against the modified B distribution (solid), adopted from Boashash [33].

Figure 5.10 Comparison plots of Boashash TFD performance measures vs TFDs, (a) the modified concentration measure (Cn(64)), (b) the normalized instantaneous resolution measure (Ri).

Figure 6.1 The flow diagram of the neural network based method.

Figure 6.2 (a) Human neuron, (b) Artificial neuron.

LIST OF TABLES

Table 1.1 Synthesis of Main Problems related to QTFDs

Table 3.1 Comparison of Entropy

Table 3.2 Comparison of Training Algorithms

Table 3.3 Impact of varying neurons and hidden layers on the entropy of the resultant image

Table 3.4 Comparison of Entropies

Table 4.1 Entropy values vs clusters

Table 4.2 Cluster parameters

Table 4.3 Entropy values for various techniques

Table 5.1 Entropy values for various techniques

Table 5.2 Performance Measures Comparison for Various TFDs

Table 5.3 Parameters and the Normalized Instantaneous Resolution Performance Measure of TFDs for the Time Instant t = 64

Table 5.4 Parameters and the Modified Instantaneous Concentration Performance Measure of TFDs for the Time Instant t = 64

ACRONYMS

TFD Time-Frequency Distribution

TF Time-Frequency

LNN Localized Neural Network

ANN Artificial Neural Network

STSC Signals with time-dependent spectral content

BD Bilinear Distribution

ES Evolutionary Spectrum

EP Evolutionary Periodogram

GTF Generalized Transfer Function

MSE Mean Square Error

WD Wigner-Ville Distribution

CT Cross-Term

2-D Two-Dimensional

IF Instantaneous Frequency

STFT Short Time Fourier Transform

QTFD Quadratic TFD

TVARMA Time-Varying Auto-Regressive Moving Average

NTFD Neural Network TFD

LFM Linear Frequency Modulation

BRNN Bayesian Regularised Neural Network

NENN Network of Expert Neural Network

EMD Empirical Mode Decomposition

CAD Complex Argument Distribution

LMB Levenberg-Marquardt back propagation

EN Expert Neural Network

CDMN Clustering the Data and training Multiple ANNs

FM Frequency Modulation

PBCGB Powell-Beale conjugate gradient backpropagation

RPB Resilient Propagation back propagation

GDALB Gradient descent with adaptive learning rate backpropagation

BRNNM Bayesian Regularised Neural Network Model

AF Ambiguity Function

LTV Linear Time Varying

DASE Data-Adaptive evolutionary Spectral Estimator

LAF Local Autocorrelation Function

MCE Minimum Cross Entropy

AOK Adaptive Optimal Kernel

OMP Optimized Matching Pursuit

STAF Short Time Ambiguity Function

OKM Optimal Kernel Method

OKM Optimal Kernel Method


Chapter 1 Introduction

During the last twenty years, research on studying and processing signals with time-dependent spectral content (STSC) has grown spectacularly in volume. Such signals require techniques that can show how the signal's frequency varies over time. These techniques are generally known as time-frequency distributions (TFDs) [34, 35], although some of them do not yield a proper distribution. TFDs are two-dimensional (2-D) functions that provide temporal and spectral information simultaneously, and are therefore used to analyze STSC. By distributing the signal energy over the time-frequency (TF) plane, TFDs give the analyst information that neither the time-domain nor the frequency-domain representation provides alone. This information includes:

- the number of components present in the signal;
- the time durations and frequency bands over which these components are defined;
- the components' relative amplitudes and phase information;
- the instantaneous frequency (IF) laws that the components follow in the TF plane.

There has been a great surge of activity in TF signal processing in the past few years. The pioneering work in this area was performed by Claasen and Mecklenbrauker [69]–[71], Janse and Kaizer [73], and Boashash [74]. These researchers provided the initial impetus, demonstrated useful implementation methods, and developed ideas uniquely suited to the problem. They also made innovative and efficient use of the similarities and differences between signal processing fundamentals and quantum mechanics.

Claasen and Mecklenbrauker devised many new ideas and procedures and developed a comprehensive approach to the study of joint TFDs [69]–[71]. Boashash [74], however, is believed to be the first researcher to apply various TFDs to real-world problems. He developed a number of new methods. In particular, he realized that a distribution may not behave properly in every respect or interpretation, yet can still be useful if a particular property, such as the IF [27, 31], is well defined. Escudie [77] and coworkers transcribed some of the early quantum mechanical results directly into signal analysis, particularly the work on the general class of distributions [68, 78]. Janse and Kaizer [73] developed innovative theoretical and practical techniques for using TFDs and introduced methodologies remarkable in their scope.

Historically, the spectrogram [23, 24, 25] has been the most widely used tool for analyzing time-varying spectra. It is expressed mathematically as the squared magnitude of the short-time Fourier transform (STFT) of the signal, given by¹

$$S(t,\omega) = \left| \int x(\tau)\, h(t-\tau)\, e^{-i\omega\tau}\, d\tau \right|^{2} \qquad (1.1)$$

where x(t) is the signal and h(t) is a window function. The basic idea is to Fourier analyze a small chunk of the signal, centered on a particular time, by means of a sliding window. Repeating this for every instant yields an energy spectrum as a continuous function of time.

As long as the individual chunks do not contain rapid changes, the result gives a fairly good idea of the signal's spectral composition. If significant changes occur faster, the chunk can be shortened accordingly, up to a limit. For some signals whose spectral content changes rapidly, such as human speech, a suitable chunk size may not exist, because there may be no time interval over which the signal is stationary. Moreover, shortening the chunk in the time domain reduces the frequency resolution. Hence there is an inherent tradeoff between time and frequency resolution [1].

The spectrogram therefore has severe drawbacks, both theoretical and practical. Theoretically, it provides biased estimators of the signal's IF and group delay (GD). Practically, the Gabor-Heisenberg inequality [120] makes a tradeoff between temporal and spectral resolution unavoidable. Nevertheless, the STFT and its variants are simple and easy to manipulate, so they remain the primary and most commonly used methods for analyzing STSC today.

¹ Throughout this thesis, both j and i denote $\sqrt{-1}$, depending on the mathematical requirements. Integration limits run from $-\infty$ to $+\infty$ unless otherwise specified.
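A discrete counterpart of Eq. (1.1) makes the window-length tradeoff concrete. The sketch below assumes a Hamming window (the window used for the test spectrograms in this thesis), a hop of one sample, and a 256-point FFT. These defaults, and the helper names `spectrogram` and `half_max_width`, are illustrative choices, not the thesis's implementation.

```python
import numpy as np

def spectrogram(x, win_len, n_fft=256, hop=1):
    """Discrete spectrogram |STFT|^2 using a Hamming window (cf. Eq. 1.1).

    Row n is the squared-magnitude spectrum of the windowed chunk centred
    (approximately) on sample n; the columns are the n_fft DFT bins.
    """
    if win_len > n_fft:
        raise ValueError("win_len must not exceed n_fft")
    h = np.hamming(win_len)
    half = win_len // 2
    # Zero-pad both ends so that every time instant has a full-length chunk.
    xp = np.pad(np.asarray(x, dtype=complex), (half, win_len - half))
    rows = []
    for n in range(0, len(x), hop):
        chunk = xp[n:n + win_len] * h          # covers x[n - half : n - half + win_len]
        rows.append(np.abs(np.fft.fft(chunk, n_fft)) ** 2)
    return np.array(rows)

def half_max_width(v):
    """Number of samples at or above half the peak: a crude resolution proxy."""
    return int(np.count_nonzero(v >= 0.5 * v.max()))
```

The two window lengths behave in opposite ways:

- For a tone such as `np.exp(2j * np.pi * 0.2 * np.arange(512))`, a frequency slice of `spectrogram(x, 128)` is far narrower than one of `spectrogram(x, 16)`.
- For a unit impulse, the time profile `S[:, 0]` is far narrower with the 16-sample window.

No single window length gives good resolution in both time and frequency.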

Other approaches [34, 35, 80] aim to improve on the spectrogram and to clarify the physical and mathematical ideas needed to understand time-varying spectra. These techniques generally seek a joint function of time and frequency: a distribution that is highly concentrated along the IFs present in a signal and free of cross-terms (CTs), and that therefore exhibits good resolution.

One form of TFD can be formulated by multiplicatively comparing a signal with itself, expanded in different directions about each point in time. Such formulations are known as quadratic TFDs (QTFDs) because the representation is quadratic in the signal. This formulation was first described by Eugene Wigner in quantum mechanics [2] and was introduced into signal analysis by Ville [3], giving what is now known as the Wigner-Ville distribution (WD). The WD is the prototype of distributions that are qualitatively different from the spectrogram. It produces ideal energy concentration along the IF for linear frequency modulated (LFM) signals, and is given by

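$$W(t,\omega) \triangleq \frac{1}{2\pi}\int s^{*}\!\left(t-\tfrac{1}{2}\tau\right) s\!\left(t+\tfrac{1}{2}\tau\right) e^{-i\tau\omega}\, d\tau \qquad (1.2)$$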

where s(t) is the signal. The distribution is said to be bilinear in the signal because the signal enters twice in its calculation. The WD preserves the time and frequency energy marginals of a signal while achieving high TF concentration. It can be argued that more concentration than in the WD would be undesirable, in the sense that it would not preserve the TF marginals.
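A direct discretisation of (1.2) makes the marginal and concentration claims easy to check. The sketch below is a minimal NumPy version; the lag range, the FFT length and the function name wigner_ville are assumptions of this illustration, not the thesis's implementation. Because the discrete lag spans 2m samples, FFT bin k corresponds to k/(2N) cycles/sample. For an LFM signal each row peaks exactly on the IF, and averaging a row over frequency returns |s[n]|^2, the discrete time marginal.

```python
import numpy as np

def wigner_ville(s, nfft=None):
    """Discrete (pseudo) WD: W[n, k] = sum_m s[n+m] s*[n-m] exp(-2j*pi*m*k/nfft).

    Bin k corresponds to k / (2 * nfft) cycles/sample because the lag is 2m samples.
    """
    s = np.asarray(s, dtype=complex)
    N = len(s)
    nfft = nfft or N
    W = np.zeros((N, nfft))
    for n in range(N):
        mmax = min(n, N - 1 - n, nfft // 2 - 1)       # lags that stay inside the signal
        m = np.arange(-mmax, mmax + 1)
        r = np.zeros(nfft, dtype=complex)
        r[m % nfft] = s[n + m] * np.conj(s[n - m])    # instantaneous autocorrelation
        W[n] = np.fft.fft(r).real                     # r[-m] = conj(r[m]), so the FFT is real
    return W

N = 128
n = np.arange(N)
beta = 0.25 / N
s = np.exp(1j * np.pi * beta * n ** 2)   # LFM signal with IF = beta * n cycles/sample
W = wigner_ville(s)

time_marginal = W.sum(axis=1) / N         # equals |s[n]|^2 = 1 for this signal
if_bins = [int(np.argmax(W[k])) for k in (32, 64, 96)]  # expected bins 2 * N * beta * k
```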

It is found that the spectrogram yields a blurred version of the TFD [1, 31]; the blurring can be reduced to some degree by using an adaptive window or by combining spectrograms. On the other hand, the use of the WD in practical applications is limited by the presence of non-negligible CTs resulting from interactions between signal components. These CTs may lead to an erroneous visual interpretation of the signal's TF structure, and they are also a hindrance to pattern recognition, since they may overlap with the searched TF pattern. Moreover, if the IF variations are nonlinear, the WD cannot produce the ideal concentration. Such impediments, which hinder the correct analysis of STSC, have been dealt with in various ways, and many techniques have historically been developed to remove them partially or completely. They were partly addressed by the development of the Choi-Williams distribution [144] in 1989, followed by numerous ideas proposed in the literature with the aim of improving the TFDs' concentration and resolution for practical analysis [17, 26, 31, 33, 90]. Other important nonstationary representations in Cohen's class [1, 68, 135] of bilinear TF energy distributions include the Margenau-Hill distribution [8], their smoothed versions [69]-[71] and [75, 76], and many others with reduced CTs [143]-[146]. Nearly at the same time, some authors also proposed other time-varying signal analysis tools based on the concept of scale rather than frequency, such as the scalogram [9, 10] (the squared modulus of the wavelet transform), the affine smoothed pseudo WD [12] or the Bertrand distribution [13]. The theoretical properties and application fields of this large variety of methods are now well established and widespread [1], [69]-[71], [86]. Although many other QTFDs have been proposed in the literature (an alphabetical list can be found in [209]), no single QTFD can be used effectively in all possible applications, because different QTFDs suffer from one or more problems.

Nevertheless, a critical point of these methods is their readability, which requires both good concentration of the signal components and the absence of misleading interference terms. This characteristic is necessary for easy visual interpretation of their outcomes and for good discrimination between known patterns in nonstationary signal classification tasks. An ideal TFD roughly requires the following four properties:

1. High clarity, which makes the TFD easier to analyze. This requires high concentration and good resolution along the individual components of multicomponent signals; consequently, the resulting TFDs are de-blurred.

2. Elimination of CTs, which avoids confusion between noise and real components in a TFD for nonlinear TF structures and multicomponent signals.

3. Good mathematical properties, which benefit its application. This requires TFDs to satisfy the total energy constraint, the marginal characteristics, positivity, etc. Positive distributions are everywhere nonnegative and yield the correct univariate marginal distributions in time and frequency.

4. Low computational complexity, i.e., a short time needed to represent a signal on the TF plane. Signature discontinuities and weak-signal mitigation may increase the computational complexity in some cases.

A comparison of some popular TFDs is presented in Table 1.1. To analyze signals well, choosing an appropriate TFD is important, and which TFD should be used depends on the application. On the other hand, their shortcomings make specific TFDs suitable only for analyzing STSC with specific types of properties and TF structures.

Table 1.1: Synthesis of Main Problems related to QTFDs

Synthesis of Major Concerns | Gabor transform | WD                                                              | Gabor-Wigner transform
Clarity                     | Worst           | Best                                                            | Reasonably good
CTs                         | Nil             | Present for multicomponent signals and nonlinear TF structures | Almost eliminated
Mathematical properties     | Unsatisfactory  | Satisfactory                                                    | Good
Computational complexity    | Quite low       | High                                                            | Higher

An obvious question then arises: which distribution is the "best" for a particular situation? Generally, there is an attempt to set up a list of desirable conditions and to try to prove that only one distribution fits them. Typically, however, such a list omits some obvious requirements, because the author knows that the added desirable properties would not be satisfied by the distribution he or she is advocating. These lists also very often contain questionable requirements that are put in to force an issue. As an illustration, focusing on the WD and its variants, Jones and Parks [100] made an interesting comparative study of resolution properties and showed that the relative performance of the various distributions depends on the signal. Their results show that the pseudo WD (PWD) is best for signals with only one frequency component at any one time, the Choi-Williams distribution is most attractive for multicomponent signals in which all components have constant frequency content, and the matched-filter STFT is best for signal components with significant frequency modulation. Jones and Parks concluded that no TFD can be considered the best approach for all TF analysis, and that concentration and resolution cannot both be improved at the same time.

Halfway through this decade, there has been an enormous amount of work towards achieving high concentration along the individual components and making closely spaced components easier to identify in TFDs. The aim has been to interpret correctly the fundamental nature of the STSC under analysis in the TF domain. Three open issues make this task inherently more complex:

1. Concentration and resolution tradeoff

2. Application-specific environment

3. Objective Assessment of TFDs

1.1 Concentration and Resolution Tradeoff

The tradeoff between concentration and CT removal is a classical problem. The concepts of concentration and resolution have generally been used synonymously and considered equivalent in the literature. This may be true for monocomponent signals, but for multicomponent signals it is not necessarily the case, and a clear distinction between the two terms must be established. As an illustration, the CTs in the WD do not reduce the auto-component concentration of the WD, which is considered optimal, but they do reduce the resolution. Although high signal concentration is always desired and is often of primary importance, in many applications signal resolution may be more important, especially in the analysis of multicomponent signals. Consequently, these two aspects can be utilized efficiently for adaptive and automatic parameter selection and optimization in TF analysis, without user intervention [100].

1.2 Application-Specific Environment

Different applications have different preferences and requirements for TFDs. In general, the choice of a TFD in a particular situation depends on many factors, such as the relevance of the properties satisfied by the TFD, its computational cost and speed, and the tradeoffs involved in using it. For example, results that are claimed optimal for a situation because they are highly concentrated, but that suffer from weak-signal mitigation and discontinuous signatures, may not be feasible for certain applications.

1.3 Objective Assessment of TFDs

Choosing the right TFD to analyze a given signal is not straightforward, even for monocomponent signals, and becomes more complex when dealing with multicomponent signals. The common practice for determining the best TFD for a given signal has been to compare all the plots visually and choose the most appealing one. However, this selection is generally difficult and subjective. Objectively comparing the various TFDs requires the introduction of quantitative performance measures specifically tailored to TFDs.

Estimating signal information and complexity in the TF plane is quite challenging. The themes that inspire new measures for this purpose include CT suppression, the concentration and resolution of auto-components, and the ability to correctly distinguish closely spaced components. Efficient concentration and resolution measures can provide a quantitative criterion for evaluating the performance of different distributions. They conform closely to the notion of complexity used when visually inspecting TF images [37].
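One widely used measure of this kind is the Rényi entropy of the energy-normalised TFD, R_alpha = log2(sum P^alpha) / (1 - alpha), commonly with alpha = 3. The sketch below is an illustration only; it is not assumed to be the exact measure adopted later in this thesis, and the function name is hypothetical. Lower entropy means higher concentration: a TFD with all its energy in one cell has zero entropy, while energy spread evenly over 16 cells gives log2(16) = 4 bits.

```python
import numpy as np

def renyi_entropy(tfd, alpha=3):
    """Renyi entropy (in bits) of a TFD normalised to unit total energy.

    Lower values mean a more concentrated distribution. An odd integer alpha
    keeps the measure real when the TFD has small negative values, as
    quadratic TFDs such as the WD can.
    """
    p = np.asarray(tfd, dtype=float)
    p = p / p.sum()
    return np.log2(np.sum(p ** alpha)) / (1 - alpha)

point = np.zeros((4, 4))
point[1, 2] = 1.0                 # all energy in one TF cell
flat = np.ones((4, 4))            # energy spread evenly over 16 cells
r_point = renyi_entropy(point)    # zero entropy
r_flat = renyi_entropy(flat)      # log2(16) = 4 bits
```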

1.4 Our Approach

This thesis proposes a novel artificial neural network (ANN) based framework that estimates de-blurred TFDs of different signals by taking advantage of ANN learning capabilities. A number of research papers based on the proposed idea have been published, which is evidence of its novelty; a list is provided in Appendix B. The removal of TFD distortions is treated as a case of image de-blurring, which is particularly suited to learning [218] by ANNs for the following reasons [109]:

• Little information is available on the source of blurring.

• Blurring is usually the result of a combination of events, which makes it too complex to be described mathematically.

• Sufficient data are available, and it is conceivable that the data capture the fundamental principle at work.

The method fundamentally involves training a set of suitably chosen ANNs with the spectrograms of known signals as the input and processed WDs as the target. Judiciously selected signals with time-varying frequency components are employed for training, and the trained ANN model then provides de-blurred TFDs from the spectrograms of unknown signals.
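One way to turn the two TF images into supervised training data is sketched below. It assumes, purely for illustration, that each input vector is the (2r+1) x (2r+1) spectrogram neighbourhood of a TF cell and that the target is the value of the processed WD at that cell. The patch size and the helper name make_training_pairs are hypothetical and are not the thesis's exact vectorisation scheme, which is developed in later chapters.

```python
import numpy as np

def make_training_pairs(spec, target, r=1):
    """Pair each interior TF cell's (2r+1)x(2r+1) spectrogram patch (flattened)
    with the target-TFD value at the same cell (hypothetical scheme)."""
    assert spec.shape == target.shape
    rows, cols = spec.shape
    X, y = [], []
    for i in range(r, rows - r):
        for j in range(r, cols - r):
            X.append(spec[i - r:i + r + 1, j - r:j + r + 1].ravel())
            y.append(target[i, j])
    return np.array(X), np.array(y)

spec = np.arange(25.0).reshape(5, 5)   # stand-in for a spectrogram image
target = np.eye(5)                     # stand-in for a processed (CT-free) WD image
X, y = make_training_pairs(spec, target)
# X has one 9-element input vector per interior cell; y holds the matching targets.
```

A network trained on such pairs maps a local, blurred neighbourhood to a sharpened value, so it can be applied cell by cell to the spectrogram of an unseen signal.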

1.5 Contribution

The work presented in this thesis contributes to the research field of TF signal processing in three ways, which are discussed below.

1.5.1 Review of High Resolution TF methods

In the first part of the thesis, a review of the TF methods proposed in the last decades for obtaining high concentration and good resolution in a TFD is presented. Though the task is ambitious, this part is important for the signal processing research community for the following reasons:

1. It provides the research community with the basic concepts and well-tested algorithms for obtaining highly concentrated, good-resolution TFDs. Emphasis is given to ideas and methods that have been developed steadily, so that they can be readily understood by the uninitiated,

2. It highlights unresolved issues, with stress on the fundamentals, to make it interesting for an expert as well. The approaches are presented in a logical rather than historical sequence of developing ideas and techniques, and

3. It attempts to describe clearly what a time-varying spectrum is, and discusses important aspects of representing the properties of signals simultaneously in time and frequency without ambiguity.

1.5.2 A Novel ANN based Framework

The second contribution of the thesis is the implementation of a novel ANN-based method for computing highly informative TFDs. The proposed method provides a way to obtain a non-blurred, high-resolution version of the TFDs of signals whose frequency components vary with time. The resulting TFDs do not have the CTs that appear for multicomponent signals in some distributions such as the WD, thus providing a visual way to determine the IF of nonstationary signals. It is shown that

1. ANN learning capabilities can be successfully used in the TF field, where they have not been applied before,

2. The BRNNM is effective in estimating good-resolution and highly concentrated TFDs. The degree of regularisation is automatically controlled in the Bayesian inference framework, which produces networks with better generalisation performance and lower susceptibility to overfitting,

3. Clustering the data based on the characteristics of the underlying TF images is useful,

4. Training multiple networks for each cluster and selecting the best one is advantageous, and

5. A mixture of expert networks (ENs), each focused on a specific task, delivers a TFD that is highly concentrated along the IF with no CTs, as compared to training an ANN that does not receive the selected input.
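The clustering step in item 3 can be illustrated with k-means and a simple elbow rule. The abstract states that the vectorised TF data are clustered according to the elbow criterion; the use of k-means, the deterministic farthest-point initialisation and the "largest second difference of the inertia curve" rule below are assumptions made for this sketch. On synthetic data with three well-separated groups the rule picks k = 3.

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])           # farthest point from current centres
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):               # keep the old centre if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    inertia = d2[np.arange(len(X)), labels].sum()  # within-cluster sum of squares
    return labels, inertia

def elbow(X, k_max=6):
    """Pick k at the sharpest bend (largest second difference) of the inertia curve."""
    inertia = np.array([kmeans(X, k)[1] for k in range(1, k_max + 1)])
    second_diff = inertia[:-2] - 2 * inertia[1:-1] + inertia[2:]  # for k = 2..k_max-1
    return int(np.argmax(second_diff)) + 2, inertia

rng = np.random.default_rng(0)
means = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([m + 0.5 * rng.standard_normal((30, 2)) for m in means])
k, inertia = elbow(X)
```

In the thesis setting the rows of X would be vectorised TF data rather than 2-D points; the elbow rule itself is unchanged.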

1.5.3 The Objective Assessment

The third contribution is the exploration of TFD performance measures and their dimensions. An alteration of an existing concentration measure [33] is suggested and used to obtain a true picture of TFD performance. It is brought out that

1. Efficient TFD concentration and resolution measures can provide a quantitative criterion to evaluate the performance of different distributions,

2. Such measures can be used for adaptive and automatic parameter selection, and

3. The selected objective criteria are verified to describe the TFDs' concentration and resolution performance effectively.

1.6 Overview of the Thesis

The thesis is organised as follows.

Chapter 2 gives the background on high-resolution TF analysis and is organised in three parts. The first part covers two separate areas of interest. The former gives the details of methods based on the evolutionary spectrum for improving the TF picture, whereas the latter presents work on obtaining highly concentrated, good-resolution TFDs based on Cohen's bilinear class. The ideas of concentration and resolution are presented in greater detail, and a brief overview of the best-known open issues considered by the academic and industrial communities is also given. The second part discusses the necessary ANN fundamentals upon which the proposed technique for obtaining de-blurred TFDs is built. Part three describes the most popular objective criteria used to evaluate TFD performance.

In Chapter 3, we progressively recount, in various sections, the work towards the goal of improving the TFDs' resolution and concentration. The chapter formulates the problem, states various constraints and presents a novel ANN-based framework to de-blur the TFDs, subject to a few limitations. It presents a comparison of ANN training algorithms and proceeds with an experimental study on optimizing the design, architecture and various parameters of the ANN setup, such as the number of neurons and layers and the type of activation functions. More importantly, the chapter discusses the advantage of training multiple ANNs and clustering the data.

In Chapter 4, the optimized ANN-based framework is presented for realizing highly concentrated, good-resolution TFDs, catering for the limitations assumed in the previous chapter. The proposed technique makes use of the principles of vectorization, correlation and taxonomy. This ANN-based technique is evaluated by comparing it with other related work, and the pre-eminence of its results over other techniques is shown. The role of appropriately vectored and clustered data and the effect of regularization under MacKay's evidence framework on the training process are described. In the final part of the chapter, a real-life signal is considered to check the effectiveness of the proposed algorithm via entropy-based analysis and visual interpretation.
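The way MacKay's evidence framework sets the regularisation automatically can be shown on a model that is linear in its parameters, where the re-estimation formulas are exact. In the sketch below a polynomial model stands in for the network; the function name, starting values and test data are illustrative assumptions. The cost is beta*E_D + alpha*E_W with E_D = (1/2) sum of squared residuals and E_W = (1/2) sum w^2, and the hyperparameters are updated as alpha = gamma / (2 E_W) and beta = (N - gamma) / (2 E_D), where gamma is the effective number of well-determined parameters.

```python
import numpy as np

def evidence_regression(Phi, y, iters=100, alpha=1.0, beta=1.0):
    """MacKay-style re-estimation of the weight-decay (alpha) and noise (beta)
    hyperparameters for a linear-in-parameters model y ~ Phi @ w."""
    N, P = Phi.shape
    eig0 = np.linalg.eigvalsh(Phi.T @ Phi)            # eigenvalues of Phi^T Phi
    for _ in range(iters):
        A = beta * Phi.T @ Phi + alpha * np.eye(P)    # Hessian of beta*E_D + alpha*E_W
        w = beta * np.linalg.solve(A, Phi.T @ y)      # most probable weights
        lam = beta * eig0
        gamma = np.sum(lam / (lam + alpha))           # effective number of parameters
        E_W = 0.5 * w @ w
        E_D = 0.5 * np.sum((y - Phi @ w) ** 2)
        alpha = gamma / (2 * E_W)
        beta = (N - gamma) / (2 * E_D)
    return w, alpha, beta, gamma

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 50)
Phi = np.vander(x, 10, increasing=True)               # degree-9 polynomial features
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(50)
w, alpha, beta, gamma = evidence_regression(Phi, y)
noise_std = 1 / np.sqrt(beta)                         # should come out near 0.1
```

Although ten weights are available, gamma ends up well below ten: the evidence framework switches off poorly determined directions instead of letting them fit the noise, which is the overfitting protection referred to above.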

Chapter 5 discusses the experimental results for various synthetic and real-life signals. A number of theoretical results are presented for the validation, soundness and completeness of the proposed framework. The chapter uses objective methods of assessment to evaluate the performance of the TFDs de-blurred by the BRNNM. A performance comparison with various other quadratic TFDs is also provided.

Finally, Chapter 6 concludes the thesis. The limitations and future work are discussed. Possible extensions of the framework are proposed, the most interesting being the derivation of a mathematical expression for the IF and the satisfaction of the marginals and other TFD constraints.


Chapter 2

Background

This chapter discusses the background of this research work on computing de-blurred TFDs and is divided into two parts. The first part gives the basic concepts and well-tested algorithms for obtaining highly concentrated, good-resolution TFDs². The emphasis remains on the ideas and methods that have been developed steadily, so that they can be readily understood by the uninitiated. An endeavour is made to highlight unresolved issues, with stress on the fundamentals, to make the chapter interesting for an expert as well. The approaches are presented in a logical rather than historical sequence of developing ideas and techniques. The individual sections may be read independently. The ANN fundamentals have been placed in Appendix A. The parameters of the ANN-based setup are optimized in succeeding chapters. The second part discusses the necessary theoretical and mathematical description of the TFD objective assessment methods that have been proposed in the past few years.

² Although new ideas are coming up rapidly, we cannot discuss all of them due to space limitations.

2.1 TF Analysis

The concepts of concentration and resolution have generally been used synonymously and considered equivalent in the literature. This may be true for monocomponent signals, but for multicomponent signals it is not necessarily the case, and a clear distinction between the two terms is needed. As an illustration, the CTs in the WD do not reduce the auto-component concentration of the WD, which is considered optimal, but they do reduce the resolution. Although high signal concentration is always desired and is often of primary importance, in many applications signal resolution may be more important, especially in the analysis of multicomponent signals. There have generally been two approaches to estimating the time-dependent spectrum of nonstationary processes:

1. The evolutionary spectrum (ES) approaches [38]-[41], which model the spectrum as a slowly varying envelope of a complex sinusoid.

2. Cohen's bilinear distributions (BDs) [31], including the spectrogram, which provide a general formulation for joint TFDs. Computationally, the ES methods fall within Cohen's class.

There are known limitations and inherent drawbacks associated with these classical approaches. These phenomena make their interpretation difficult; consequently, estimating spectra in the TF domain that display good resolution and high concentration has become a research topic of great interest³. Section 2.1.1 presents the high TF resolution approaches based on ES theory, and Section 2.1.2 discusses the approaches based on Cohen's BDs.

³ Due to space limitations, only a brief account of various high-resolution TF methods is presented here. For greater detail and the best simulation results, refer to [255].

2.1.1 The Methods based on Evolutionary Spectrum

The ES was first proposed by Priestley in 1965. The basic idea is to extend classic Fourier spectral analysis to a more generalized basis: from sines or cosines to a family of orthogonal functions. In his evolutionary spectral theory, Priestley represents nonstationary signals using a general class of oscillatory functions and then defines the spectrum based on this representation [229]. A special case of the ES uses the Wold-Cramer representation of nonstationary processes [230]-[233] to obtain a unique definition of the time-dependent spectral density function. The main objective in deriving and presenting these relations in [234] was to show that the BDs and the spectrogram can be estimators of the ES. These relations also enable the different TFDs to be represented in terms of the generalized transfer function (GTF), allowing the GTF or the ES to be recovered from them.

A great amount of work in this area was done by Pitton and Loughlin [242]-[250]. They investigated positive TFDs and their potential applications, utilizing the ES and Thomson's multitaper approach [240, 241], but they do not discuss the issue of concentration and resolution as such. Positive distributions are everywhere nonnegative and yield the correct univariate marginal distributions in time and frequency.

The literature indicates that pioneering work has been performed by Chaparro, Jaroudi, Kayhan, Akan and Suleesathira. These researchers have not only focused on computing improved evolutionary spectra of nonstationary signals but have also innovatively applied the concepts in various practical situations [38]-[64]. Their major work includes signal-adaptive evolutionary spectral analysis and a parametric approach to data-adaptive evolutionary spectral estimation. Interesting work on parametric estimation for underspread nonstationary random processes was performed by Jachan, Matz and Hlawatsch [49].

2.1.1.1 Signal-Adaptive Evolutionary Spectral Analysis

Although it is well recognized that the spectra of most signals found in practical applications depend on time, estimating these spectra with good TF resolution is difficult [31]. The problem lies in adapting the analysis methods to the changes of frequency in the signal components. Constant-bandwidth methods, such as the spectrogram and the traditional Gabor expansion [67], provide estimates with poor TF resolution.

The earlier approaches by Akan and Chaparro to obtaining high-resolution evolutionary spectral estimates include averaging estimates obtained using multiple windows [48] and maximizing an energy concentration measure [41]. A modified Gabor expansion is proposed in [41] that uses multiple windows, dependent on different scales and modulated by linear chirps. Computing the ES with this expansion provides estimates with good TF resolution. The difficulties encountered, however, were the choice of scales and the implementation of the chirping.

2.1.1.2 Data-Adaptive Evolutionary Spectral Estimation

Though mathematically well grounded, ES theory has suffered from a shortage of estimators. The initial work by Kayhan concentrated on the evolutionary periodogram (EP) as an estimator, along the lines of the BDs. The later work, however, follows a parametric approach to deriving a high-quality estimator of the ES [38, 39]. Parametric approaches that model the nonstationary signal using rational models with time-varying coefficients, represented as expansions in orthogonal polynomials, have been proposed by various investigators, e.g., [235, 236]. However, the validity of their view of a nonstationary spectrum as a concatenation of "frozen-time" spectra has been questioned [232, 237].

In their earlier effort, Kayhan, Jaroudi and Chaparro [38, 65] proposed the EP as an estimator of the Wold-Cramer ES. The EP was found to possess many desirable properties and reduces to the conventional periodogram in the stationary case. However, it relied on some unrealistic assumptions, such as considering the signal components to be uncorrelated. This led to the development of a data-adaptive evolutionary spectral estimator with improved performance [39].

The proposed estimator uses information about the signal components at frequencies other than the frequency of interest. It computes the spectrum at each frequency while minimizing the interference from components at other frequencies, without making any assumptions regarding these components. This estimator reduces to Capon's maximum likelihood method [239] in the stationary case. The new estimator has better TF resolution than the EP and possesses many desirable properties analogous to those of Capon's method. In particular, it performs more robustly than existing methods when the data are noisy.
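In the stationary case the estimator reduces to Capon's method, whose spectrum is P(f) = 1 / (e(f)^H R^-1 e(f)), where R is the correlation matrix of length-M data snapshots and e(f) is a complex-exponential steering vector. The sketch below implements only that stationary special case, not the evolutionary estimator of [39]; the snapshot length, test frequencies and noise level are illustrative assumptions. Each frequency is estimated with the filter that passes e(f) undistorted while minimising the output power contributed by everything else, which is the "minimize interference from other frequencies" idea described above.

```python
import numpy as np

def capon_spectrum(x, M, freqs):
    """Capon (minimum-variance) spectrum P(f) = 1 / (e^H R^{-1} e) of a stationary
    complex series x, using length-M sliding snapshots."""
    X = np.array([x[n:n + M] for n in range(len(x) - M + 1)])  # snapshots as rows
    R = X.T @ X.conj() / len(X)                                   # sample correlation matrix
    Rinv = np.linalg.inv(R)
    E = np.exp(2j * np.pi * np.outer(np.arange(M), freqs))       # steering vectors as columns
    return 1.0 / np.real(np.sum(E.conj() * (Rinv @ E), axis=0))

rng = np.random.default_rng(0)
n = np.arange(256)
noise = 0.1 / np.sqrt(2) * (rng.standard_normal(256) + 1j * rng.standard_normal(256))
x = np.exp(2j * np.pi * 0.10 * n) + np.exp(2j * np.pi * 0.15 * n + 1.0j) + noise

freqs = np.arange(0, 0.5, 0.001)
P = capon_spectrum(x, 32, freqs)
peaks = [i for i in range(1, len(P) - 1) if P[i] > P[i - 1] and P[i] > P[i + 1]]
top2 = sorted(freqs[sorted(peaks, key=lambda i: P[i])[-2:]])    # two strongest peaks
```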

2.1.1.3 Miscellaneous Techniques and Applications

A considerable amount of work has been performed by a number of researchers on achieving good-resolution ES and applying the results and related theory in many fields, especially where nonstationary signals arise. The purpose of their work has ranged from the simple graphic presentation of results to sophisticated manipulations of spectra. Suleesathira, Chaparro and Akan [44] propose a transformation for discrete signals with time-varying spectra. The kernel of this transformation provides the energy density of the signal in TF with good resolution qualities. With this discrete evolutionary transform, a clear representation of the signal as well as its TF energy density is obtained. It is suggested to use either the Gabor or the Malvar discrete signal representation to obtain the kernel of the transformation. Signal-adaptive analysis is then possible using modulated or chirped bases, and can be implemented with either masking or image segmentation on the TF plane.

The discrete evolutionary and Hough transforms are innovatively used in jammer excision techniques for spread spectrum communication systems, to find an unambiguous IF estimate for a jammer composed of chirps. This interesting approach is a piecewise linear approximation of the IF, concentrated along the individual components of the signal, using the Hough transform (used in image processing to infer the presence of lines or curves in an image) and the ES [45]. The efficiency and practicality of this approach lie in localized processing, linearization of the IF estimate, recursive correction, and minimal problems due to CTs in the TFDs or in the matching of parametric models.
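The Hough step can be sketched as follows. The sketch assumes the TF image has already been thresholded into a binary map img[t, f]; the rho-theta line parametrisation, the 1 degree angle grid and the integer rounding of rho are illustrative choices, and the function name is hypothetical. Every nonzero TF cell votes for all lines through it. The accumulator maximum gives the dominant line, whose slope df/dt is a chirp-rate estimate.

```python
import numpy as np

def hough_lines(img, thetas_deg=np.arange(-90, 90)):
    """Vote for lines rho = t*cos(theta) + f*sin(theta) through the nonzero
    cells of a binary TF image indexed as img[t, f]."""
    t_idx, f_idx = np.nonzero(img)
    thetas = np.deg2rad(thetas_deg)
    diag = int(np.ceil(np.hypot(*img.shape)))
    acc = np.zeros((len(thetas), 2 * diag + 1), dtype=int)   # rho stored with offset diag
    for j, th in enumerate(thetas):
        rho = np.round(t_idx * np.cos(th) + f_idx * np.sin(th)).astype(int)
        np.add.at(acc[j], rho + diag, 1)                      # one vote per TF cell
    j, r = np.unravel_index(np.argmax(acc), acc.shape)
    return thetas[j], r - diag, acc

img = np.zeros((32, 32))
img[np.arange(32), np.arange(32)] = 1        # IF rising by one frequency bin per frame
theta, rho, _ = hough_lines(img)
slope = -np.cos(theta) / np.sin(theta)       # df/dt of the detected line
intercept = rho / np.sin(theta)              # f at t = 0
```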

Barbarossa [103] proposed a combination of the WD and the Hough transform for the detection and parameter estimation of chirp signals, casting it as the problem of detecting lines in an image, namely the WD of the signal under analysis. This method provides a bridge between signal and image processing techniques, is asymptotically efficient, and offers good rejection of the CTs, but at the price of increased computational complexity. Barbarossa et al. further proposed an adaptive method for suppressing wideband interference in spread spectrum communications based on a high-resolution TFD of the received signal [46]. The approach is based on the generalized Wigner-Hough transform as an effective way to obtain a clear picture of the IF of parametric signals embedded in noise. The proposed method offers advantages such as: (1) it can reliably estimate the interference parameters at lower SNR by exploiting the signal model, and (2) the despreading filter is optimal and takes into account the presence of the excision filter. The disadvantage of the method, besides its higher computational cost, is that it is not robust against mismatch between the observed data and the assumed model.

Chaparro and Alshehri [47] innovatively obtain better spectral estimates and use them for jammer excision in direct sequence spread spectrum communications when the jammers cannot be parametrically characterized. The nonstationary signals are represented using the TF and the frequency-frequency evolutionary transformations. One of the methods, based on the frequency-frequency representation of the received signal, uses a deterministic masking approach, while the other, based on nonstationary Wiener filtering, reduces interference in the mean-square sense. Both approaches use the fact that the spreading sequence is known at the transmitter and the receiver, so that its evolutionary representation can be used to estimate the sent bit. The difference in performance between the two approaches depends on the support rather than on the type of jammer being excised. The frequency-frequency masking approach is found to work well when the jammer is narrowly concentrated in parts of the frequency-frequency plane, while the Wiener masking approach works well when the jammer is spread over all frequencies.

Shah, Loughlin and Chaparro [245] developed a method for generating an informative prior when constructing a positive TFD by the method of minimum cross-entropy (MCE). This prior results in a more informative MCE-TFD, as quantified via entropy and mutual information measures. The procedure allows any of the BDs to be used in the prior, and the TFDs obtained by this procedure are close to those obtained by the deconvolution procedure, at reduced computational cost. Further, Shah and Chaparro [64, 65] use TFDs to estimate the GTF of an LTV filter such that, once blurred, it produces the TFD estimate. They used the fact that many of these distributions can be written as blurred versions of the GTF, and employed a deconvolution technique to obtain the de-blurred GTF. The technique is general and can be based on any TFD, with advantages such as: (i) it estimates the GTF without the need for the orthonormal expansion used in other ES estimators, (ii) it does not require the semi-stationarity assumption used in existing deconvolution techniques, (iii) it can be used with many TFDs, (iv) the GTF obtained can be used to reconstruct the signal and to model LTV systems, and (v) the resulting ES estimate outperforms the ES obtained with existing estimation techniques and can be made to satisfy the TF marginals while maintaining positivity.

The power spectral density of a signal, calculated from second-order statistics, can provide valuable information for the characterization of stationary signals. This information is sufficient only for Gaussian and linear processes, whereas most real-life signals, such as biomedical, speech and seismic signals, may have non-Gaussian, nonlinear and nonstationary properties. Addressing this issue, Unsal, Akan and Chaparro [66] combine higher-order statistics and TF approaches, and present a method for calculating a time-dependent bispectrum based on the positive distributed ES. This idea is particularly useful for analyzing the time-varying properties of such nonstationary signals.
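The property that makes the bispectrum informative beyond the power spectrum is its sensitivity to phase coupling, which a second-order statistic cannot see. The sketch below shows only the stationary building block, a direct FFT-based estimate B(k1, k2) = E[X(k1) X(k2) X*(k1 + k2)] averaged over segments. It is not the ES-based time-dependent bispectrum of [66], and the segment length, bins and random-phase model are illustrative assumptions. The two test signals have identical power spectra. Only the one whose component at bin 20 is phase-locked to the components at bins 8 and 12 produces a large bispectral value at (8, 12).

```python
import numpy as np

def bispectrum(segments, k1, k2):
    """Direct bispectrum estimate B(k1, k2) = E[X(k1) X(k2) X*(k1+k2)],
    averaged over independent segments (rows of `segments`)."""
    X = np.fft.fft(segments, axis=1)
    return np.mean(X[:, k1] * X[:, k2] * np.conj(X[:, k1 + k2]))

rng = np.random.default_rng(0)
N, K = 64, 64                                           # segment length, number of segments
n = np.arange(N)
ph1, ph2, ph3 = rng.uniform(0, 2 * np.pi, (3, K, 1))    # random phases per segment
base = np.cos(2 * np.pi * 8 * n / N + ph1) + np.cos(2 * np.pi * 12 * n / N + ph2)
coupled = base + np.cos(2 * np.pi * 20 * n / N + ph1 + ph2)   # phase-coupled at bin 20
uncoupled = base + np.cos(2 * np.pi * 20 * n / N + ph3)       # independent phase at bin 20

b_c = abs(bispectrum(coupled, 8, 12))     # (N/2)**3 = 32768 for perfect coupling
b_u = abs(bispectrum(uncoupled, 8, 12))   # averages towards zero
```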

2.1.2 The Methods based on Cohen's Bilinear Class

Cohen's BDs can be obtained from the general expression

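$$C(t,\omega) = \frac{1}{4\pi^{2}} \iiint s^{*}\!\left(u-\tfrac{1}{2}\tau\right) s\!\left(u+\tfrac{1}{2}\tau\right) \phi(\theta,\tau)\, e^{-i\theta t - i\tau\omega + i\theta u}\, du\, d\tau\, d\theta \qquad (2.1)$$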

where C(t, ω) is the joint distribution of the signal s(t) and φ(θ, τ) is called the kernel. The term kernel was coined by Claasen and Mecklenbräuker [69]-[71], who, along with Janssen [72], made extensive contributions to the general understanding of these distributions in the signal analysis context.
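Equation (2.1) can be evaluated numerically in the ambiguity (Doppler-lag) domain. The route is to form the instantaneous autocorrelation s(u + τ/2) s*(u - τ/2), Fourier transform it over time, multiply by φ(θ, τ), transform back, and finally take the Fourier transform over lag. The sketch below follows that route under stated assumptions: the lag index m is used as τ, θ is the FFT Doppler grid in rad/sample, σ = 0.5 is used in the Choi-Williams kernel φ(θ, τ) = exp(-θ²τ²/σ), and the function names are illustrative. A kernel of ones reproduces the WD. For two tones the WD places a cross-term midway between them, at bin 32, that is even larger than the auto-terms at bins 16 and 48. The Choi-Williams kernel strongly attenuates it, because the cross-term oscillates in time and therefore lies away from θ = 0.

```python
import numpy as np

def instantaneous_autocorr(s):
    """K[n, m mod N] = s[n+m] * conj(s[n-m]) for all lags that stay inside the signal."""
    N = len(s)
    K = np.zeros((N, N), dtype=complex)
    for n in range(N):
        mmax = min(n, N - 1 - n, N // 2 - 1)
        m = np.arange(-mmax, mmax + 1)
        K[n, m % N] = s[n + m] * np.conj(s[n - m])
    return K

def cohen_class(s, kernel):
    """Cohen's-class TFD obtained by weighting the ambiguity function with
    kernel(theta, tau). Bin k of each row corresponds to k/(2N) cycles/sample."""
    s = np.asarray(s, dtype=complex)
    N = len(s)
    K = instantaneous_autocorr(s)
    A = np.fft.fft(K, axis=0)                     # time u -> Doppler theta
    theta = 2 * np.pi * np.fft.fftfreq(N)         # Doppler in rad/sample
    tau = np.fft.fftfreq(N) * N                   # lag index m, matching K's column order
    A = A * kernel(theta[:, None], tau[None, :])
    K_smoothed = np.fft.ifft(A, axis=0)           # Doppler -> time
    return np.fft.fft(K_smoothed, axis=1).real    # lag -> frequency

n = np.arange(128)
s = np.exp(2j * np.pi * n / 16) + np.exp(2j * np.pi * 3 * n / 16)   # tones at 1/16, 3/16
wd = cohen_class(s, lambda th, ta: np.ones_like(th * ta))           # phi = 1 gives the WD
cw = cohen_class(s, lambda th, ta: np.exp(-(th * ta) ** 2 / 0.5))   # Choi-Williams, sigma = 0.5

# Ratio of the midway term (bin 32) to an auto-term (bin 16) at the centre time n = 64.
wd_ratio = wd[64, 32] / wd[64, 16]
cw_ratio = cw[64, 32] / cw[64, 16]
```

The price of the smoothing is some loss of auto-term concentration, which is the concentration versus CT-removal tradeoff discussed in Section 1.1.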

Many divergent attitudes towards the use, meaning and interpretation of Cohen's BDs, and, most importantly, towards improving their spectral quality, have arisen over the years. These divergent viewpoints and interests have led to a better understanding and implementation. The subject is evolving rapidly and most of the issues are open. However, it is important to understand the ideas and arguments that have been given, as variations of and insights into them have led the way to further developments.

2.1.2.1 The Scaled-Variant Distribution: A TFD Concentrated along the IF

In an important set of papers, Stankovic and co-workers [79, 88, 89] made innovative use of the similarities and differences with quantum mechanics. They originated many new ideas and procedures for achieving good resolution and high concentration in joint distributions. Their initial work suggests using the polynomial WD [17, 18] to improve the concentration of monocomponent signals, taking the IF as a polynomial function of time. A similar idea for improving the concentration of a signal whose phase is a polynomial of up to fourth order is presented in [80]. To improve the concentration of a signal with an arbitrary non-linear IF, the L-Wigner distribution (LWD) is proposed and studied in [15, 82, 83].

Both the polynomial WD and the LWD are closely related to the time-varying higher order spectra [18, 83, 85]. They were found to satisfy only generalized forms of the marginals and to be unable to preserve the usual marginal properties [1, 86].

Stankovic's more recent work is a variant of the LWD, obtained mainly by scaling the phase while keeping the signal's amplitude unchanged [79, 90]. This new distribution is termed the scaled variant of the LWD (SD) of a signal. The word "pseudo" is used to indicate the presence of the window. The distribution achieves high concentration at the IF, as high as the LWD. At the same time, it satisfies the time marginal and the unbiased energy condition for any L. The frequency marginal is also satisfied for asymptotic signals.

A method for the direct realization of the SD, based on straightforward application of the distribution definition, is presented in [89]. For multicomponent signals, this method produces signal power concentrated at the corresponding IFs, according to the theorem presented in [79]. The theory is illustrated with numerical examples of multicomponent real signals. The proposed distributions may achieve arbitrarily high concentration at the IF while satisfying the marginal properties. Until the publication of [89], this was possible only in the very special case of LFM signals using the WD.


2.1.2.2 Reassigned TFDs

Some TFDs have been proposed that adapt to the signal's TF changes. In particular, an adaptive TFD can be obtained by estimating some pertinent parameters of a signal-dependent function at different time intervals [209]. Such TFDs provide highly localized representations without suffering from the CTs of QTFDs. The trade-off is that these TFDs may not satisfy some desirable properties, such as energy preservation. Examples of adaptive TFDs include:

- the high resolution TFD [112];
- the signal-adaptive optimal-kernel TFDs [131, 133];
- the optimal radially Gaussian TFD [132];
- Cohen's nonnegative distribution [135].

Reassigned TFDs also adapt to the signal by employing other QTFDs of the signal, such as the spectrogram, the WD or the scalogram [93]–[99]. The former types of adaptive TFDs are discussed under the name optimal-kernel TFDs in the following section.

The method of reassignment improves TF concentration and resolution by mapping the data to TF coordinates that are closer to the true region of support of the signal under consideration. Several researchers have presented the method under different names [93]–[99], including the method of reassignment, remapping, TF reassignment and the modified moving-window method.

The reassignment method. The classical work on the method of reassignment was first done by Kodera, Gendrin and de Villedary, who named it the modified moving window method [96]. The technique enhances the time and frequency resolution of the spectrogram by projecting each data point to a new TF coordinate that better reflects the distribution of energy in the analyzed signal. This modification of the spectrogram remained largely unused because of implementation and efficiency issues. Later, Auger and Flandrin [93] showed that the method applies advantageously to all bilinear TF and time-scale representations, and called it the reassignment method. Nelson also arrived at a method, similar to that of Kodera et al., for improving the TF precision of short-time spectral data from partial derivatives of the short-time phase spectrum [97].
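The reassignment idea can be illustrated with a minimal sketch of a reassigned spectrogram. The sketch assumes a Gaussian window and the STFT convention stated in the docstring; other conventions flip the signs of the correction terms. The correction terms are ratios of STFTs computed with the window h, the time-weighted window th and the window derivative dh, in the spirit of Auger and Flandrin [93]. The function name and the default parameters (`s`, `nfft`, `eps`) are illustrative assumptions.

```python
import numpy as np

def reassigned_spectrogram(x, s=8.0, nfft=256, eps=1e-12):
    """Return the spectrogram S and its reassigned version R, both of shape (N, nfft).

    Gaussian window h[l] = exp(-l^2 / (2 s^2)), l = -M..M, and the STFT convention
    F_h(n, w) = sum_k x[k] h[k - n] exp(-j w k). With this convention
        t_hat = n + Re{F_th / F_h},   w_hat = w - Im{F_dh / F_h},
    where th[l] = l h[l] and dh[l] = h'(l).
    """
    x = np.asarray(x, dtype=complex)
    N = len(x)
    M = int(4 * s)                                   # truncate the Gaussian at 4 std devs
    l = np.arange(-M, M + 1)
    h = np.exp(-l ** 2 / (2 * s ** 2))
    dh = -l / s ** 2 * h                             # analytic derivative of the window
    th = l * h                                       # time-weighted window
    xp = np.concatenate([np.zeros(M), x, np.zeros(M)])  # zero-pad so every frame is full
    k = np.arange(nfft)
    S = np.zeros((N, nfft))
    R = np.zeros((N, nfft))
    for n in range(N):
        seg = xp[n:n + 2 * M + 1]                    # samples x[n-M], ..., x[n+M]
        Fh = np.fft.fft(seg * h, nfft)
        Fdh = np.fft.fft(seg * dh, nfft)
        Fth = np.fft.fft(seg * th, nfft)
        P = np.abs(Fh) ** 2
        S[n] = P
        ok = P > eps                                 # skip bins with (numerically) no energy
        t_hat = n + np.real(Fth[ok] / Fh[ok])
        f_hat = k[ok] - np.imag(Fdh[ok] / Fh[ok]) * nfft / (2 * np.pi)  # in FFT bins
        ti = np.clip(np.round(t_hat).astype(int), 0, N - 1)
        fi = np.round(f_hat).astype(int) % nfft
        np.add.at(R, (ti, fi), P[ok])                # move each value to its energy centroid
    return S, R

# Usage: a complex tone at 0.25 cycles/sample (bin 64 of a 256-point FFT).
N = 128
tone = np.exp(2j * np.pi * 0.25 * np.arange(N))
S, R = reassigned_spectrogram(tone)
```

In the usage example, the spectrogram row spreads the tone over its window mainlobe. The reassigned row, by contrast, piles almost all of that energy into the single bin at the tone frequency.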

2.1.2.3 Optimal-Kernel TFDs.

A QTFD can be obtained by first smoothing the symmetric ambiguity function (AF) with the kernel function and then taking a 2-D FT of the result. This is equivalent to 2-D filtering in the ambiguity domain. The properties of a distribution are reflected by simple constraints on the kernel. These constraints have been used advantageously to develop practical methods for analysis and filtering, as was done by Eichmann and Dong [16]. Excellent reviews relating the properties of the kernel to the properties of the distribution have been given by Janse and Kaizer [73], Janssen [72], Claasen and Mecklenbrauker [71], and Boashash [34]. By examining the kernel, one can readily ascertain the properties of the distribution. This allows one to pick those kernels that produce distributions with prescribed, desirable properties. Thus, by a proper choice of kernel function, one can reduce or remove the CTs in the analysis of multicomponent signals. This unified approach is simple and has the advantage that all distributions can be studied together in a consistent way.
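The ambiguity-domain route described above can be sketched as follows. The sketch rests on three assumptions: a discrete analytic signal, the lag-limited instantaneous autocorrelation x[n+m]x*[n−m], and the Choi-Williams kernel φ(θ, τ) = exp(−θ²τ²/σ) as an example of a CT-suppressing kernel. All function names are hypothetical. With σ = ∞ the kernel is identically one and the WD is recovered.

```python
import numpy as np

def instantaneous_autocorrelation(x):
    """R[n, m mod N] = x[n+m] x*[n-m], with lags limited to stay inside the record."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    R = np.zeros((N, N), dtype=complex)
    for n in range(N):
        mmax = min(n, N - 1 - n, N // 2 - 1)
        m = np.arange(-mmax, mmax + 1)
        R[n, m % N] = x[n + m] * np.conj(x[n - m])
    return R

def choi_williams(nu, lag, sigma):
    # phi(theta, tau) = exp(-theta^2 tau^2 / sigma) with theta = 2*pi*nu and full lag tau = 2*lag.
    return np.exp(-(4 * np.pi * nu * lag) ** 2 / sigma)

def cohen_class_tfd(x, sigma):
    R = instantaneous_autocorrelation(x)
    N = R.shape[0]
    A = np.fft.fft(R, axis=0)                        # time -> Doppler: ambiguity function A[nu, lag]
    nu = np.fft.fftfreq(N)[:, None]                  # Doppler in cycles/sample
    lag = np.round(np.fft.fftfreq(N) * N)[None, :]   # signed lag matching the m % N layout
    A = A * choi_williams(nu, lag, sigma)            # 2-D filtering in the ambiguity domain
    R_smoothed = np.fft.ifft(A, axis=0)              # Doppler -> time
    return np.fft.fft(R_smoothed, axis=1).real       # lag -> frequency (bin k <-> k/(2N))

# Usage: two complex tones; their WD cross-term lies midway, at bin 64.
N = 128
n = np.arange(N)
x = np.exp(2j * np.pi * 0.125 * n) + np.exp(2j * np.pi * 0.375 * n)
wd = cohen_class_tfd(x, sigma=np.inf)  # kernel == 1: the WD
cw = cohen_class_tfd(x, sigma=1.0)     # Choi-Williams smoothing
```

The cross-term of the two tones oscillates along time, so it sits away from the Doppler origin of the ambiguity plane. The kernel attenuates it there, while the auto-terms, which lie near zero Doppler, pass largely unchanged.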


Generally, optimum-kernel TFDs can be achieved by three different approaches to optimizing the kernel, each aimed at improving the resolution of the resulting TFDs:

1. High resolution TFDs based on high spectral resolution kernels.
2. High resolution TFDs based on signal independent kernels.
3. High resolution TFDs based on signal dependent kernels.

2.1.2.3.1 High-Resolution TFDs: High Spectral Resolution Kernels

The temporal and spectral resolutions of TFDs are uniquely defined by the employed TF kernels. Potential kernels seek to map, at every time sample, the time-varying signals in the data into approximately fixed-frequency sinusoids in the local autocorrelation function (LAF). Applying the Fourier transform to the LAF therefore provides a peaky spectrum, where the locations of the peaks indicate the signals' instantaneous power concentrations. The sinusoidal components in the LAF, however, generally appear with some type of amplitude modulation, which is highly dependent on the kernel composition [199]. Such modulation limits the spectral resolution in the TF plane, as it may spread both the auto-terms and the CTs beyond their true localizations.

Because of the kernel modulation effects on the various terms, closely spaced frequencies may not be resolved. Further, since TFDs are Fourier-based, the spectral resolution is limited not only by the amplitude modulations imposed by the kernels but also by the extent of the LAF, i.e., the lag window employed, on which it depends strongly [199]. However, increasing the length of the LAF does not always improve resolution. Events occurring over short periods of time do not require large kernels. Large kernels may only increase CT contributions from distant events and obscure the local auto-terms. Limited availability of data samples is another reason for using small-extent kernels. In these cases, the spectral resolution of a TFD can be improved by parameterizing its LAF via autoregressive modeling techniques [200]–[204]. Such parameterization seeks to fit a least-squares random model to the second-order statistics of the LAF at different time instants. The autoregressive modeling techniques, however, view the LAF as a stationary process along the lag dimension. Since TF distribution kernels translate deterministic signals into other deterministic signals, it is more appropriate to fit a deterministic, rather than a stochastic, model to the LAF. Further, the modeling techniques applied in the TFD context have mostly dealt only with pseudo-WD or smoothed pseudo-WD kernels.

Amin and Williams [199] maintain that a large class of TF kernels exists whose LAFs are amenable to high spectral resolution techniques. This class goes beyond the pseudo-WD and the smoothed pseudo-WD with separable time and lag windows. Its members satisfy the desirable TF properties for power localization in nonstationary environments. Yet they produce LAFs that are amenable to exponential deterministic modeling during periods of stationarity. The proposed high spectral resolution kernels are, however, bound to meet two basic conditions [199]: (i) the frequency marginal, and (ii) exponential behavior in the ambiguity domain for constant values of a few parameters.

For sinusoidal data, the first property guarantees that the auto-term sinusoids in the LAF are undamped. The second property enforces exponential damping on all CTs. As a result, the sinusoidal components translate into damped or undamped sinusoids. High-resolution techniques, such as reduced-rank approximation of the backward linear prediction data matrix, can then be applied for frequency estimation. Amin and Williams use Prony's method and its approximations [205], [206] in the TF context. This method is shown to be applicable to high spectral resolution TFD problems, specifically when the underlying LAF is made up of a sum of exponentially damped/undamped sinusoids or chirp-like signals.
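As a rough illustration of the Prony-type step, the sketch below fits a sum of p damped complex exponentials by least-squares forward linear prediction. It returns the pole frequencies and damping factors. It is a plain least-squares Prony estimator, not the reduced-rank backward-prediction variant or the exact procedure of Amin and Williams. In a TF setting it would be applied along the lag dimension of the LAF at a fixed time instant, which is an assumption of this example.

```python
import numpy as np

def prony_poles(y, p):
    """Least-squares Prony estimate of the p poles z_i in y[n] = sum_i a_i * z_i**n.

    The forward linear prediction y[n] = -sum_{k=1..p} c_k y[n-k] is solved in the
    least-squares sense; the poles are the roots of z^p + c_1 z^(p-1) + ... + c_p.
    Returns normalized frequencies (cycles/sample) and damping factors |z_i|.
    """
    y = np.asarray(y, dtype=complex)
    N = len(y)
    A = np.column_stack([y[p - k:N - k] for k in range(1, p + 1)])  # column k-1 holds y[n-k]
    b = -y[p:N]
    c, *_ = np.linalg.lstsq(A, b, rcond=None)
    z = np.roots(np.concatenate(([1.0], c)))
    return np.angle(z) / (2 * np.pi), np.abs(z)

# Usage: one damped (|z| = 0.9) and one undamped complex sinusoid.
n = np.arange(64)
y = (0.9 ** n) * np.exp(2j * np.pi * 0.1 * n) + np.exp(2j * np.pi * 0.3 * n)
freqs, damping = prony_poles(y, 2)
```

On noise-free data of exactly this form, the prediction equations are consistent. The least-squares fit is then exact, and the poles give the frequencies and damping factors directly.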

2.1.2.3.2 A High Resolution QTFD: Signal Independent Kernels

A signal independent kernel for the design of a high resolution, CT-free quadratic TFD is proposed in [208]. Filtering the CTs in the ambiguity domain reduces (or removes) them in the TF domain, but it unfortunately also lowers the TF resolution. That is, there is a trade-off between CT suppression and TF resolution in the design of a given quadratic TFD. Barkat and Boashash propose a kernel that retains as much auto-term energy as possible while filtering out as much CT energy as possible. The kernel is defined in the time-lag domain, with the implementation of the resultant TFD in view. The result is an alias-free distribution that can solve problems that the WD or the spectrogram cannot. In particular, the proposed distribution is shown to resolve two close signals in the TF domain that the other two distributions cannot.

2.1.2.3.3 Adaptive TFDs: Signal Dependent Kernels

An adaptive TFD can be obtained by estimating some pertinent parameters of a signal-dependent function at different time intervals. Such TFDs are expected to provide highly localized representations without suffering from CTs. The trade-off is that these TFDs may not satisfy some desirable properties, such as energy preservation. Baraniuk and Jones have made use of the fact that the symmetric AF is the characteristic function of the WD. The mathematical, and possibly physical, analogy between the two enhances the interpretation of the properties of the AF.

Several different approaches have been developed for optimizing the signal-dependent kernel for TF analysis [130]–[133], including:

1. the 1/0 optimal kernel TFD [131], in which the optimal kernel is given a special binary structure;
2. the optimal radially Gaussian kernel TFD [132], which tempers the '1/0 kernel' with an additional smoothness constraint that makes the optimal kernel Gaussian along radial profiles;
3. the signal adaptive optimal kernel (AOK) TFD [133], in which the radially Gaussian kernel is adapted over time, thus maximizing the performance.


2.1.2.4 Dispersive Class TFDs

These TFDs, also termed warping-based TFDs, provide very good concentration for STSC with non-linear TF characteristics, such as dolphin and whale whistles, radar and sonar waveforms, and shock waves in fault structures. To improve the processing of such signals, Papandreou, Hlawatsch and Boudreaux-Bartels designed QTFDs that satisfy the dispersive GD shift covariance property [147]–[151].

Papandreou and Boudreaux-Bartels show that, for successful TF analysis, it is advantageous to match the specific time shift of a QTFD with changes in the GD of the signal. In some applications, signals with known GD need to be processed, and a matched QTFD can then be designed with a suitable characteristic function. When the signal GD is not known a priori, some pre-processing is necessary before designing a well-matched QTFD. A rough GD estimate can be obtained by fitting a curve through the spectrogram of the signal. Alternatively, one of the many proposed algorithms for estimating GD or IF characteristics can be used [26]–[30]. The matched QTFD is designed by appropriately warping the WD or its smoothed versions, which only requires the phase function of the signal to be one-to-one. Approximations of the GD function can therefore also be used.

Warping-based TFDs: theoretical examples and advantages. Different dispersive QTFDs can be obtained, including:

- the linear chirp class (warped affine class) with linear GD;
- the hyperbolic class (warped Cohen's class) with hyperbolic GD;
- the k-th power class (warped affine class) with k-th order power GD;
- the exponential class (warped affine class) with exponential GD.

Papandreou, Boudreaux-Bartels and co-workers [154], [166]–[168] demonstrate the effectiveness of dispersive class QTFDs and the importance of matching STSC with QTFDs through various simulations. These include constant and linear, constant and hyperbolic, constant and exponential, and constant and power TF structures, as well as power TF structures with real data. In all these cases, the QTFDs show better resolution and CT suppression. For example, the dispersive WD is demonstrated to be highly localized for the time modulation signal. Specifically, the dispersive WD is found to be a Dirac delta function at the GD of the signal. This means that the dispersive WD is ideally matched to time modulation signals when the GD in the dispersive WD formulation matches the GD of the signal. A dual dispersive class can similarly be obtained to match dispersive FM signals by preserving dispersive IF shifts [157].

Another example is the affine class, which is actually a power class. The corresponding power class for linear TF structures is the linear chirp class, which is well matched to signals with linear TF characteristics. Two QTFDs from the linear chirp class are the linearly warped WD and the chirpogram. They are obtained when the WD and the spectrogram, respectively, are warped with a quadratic characteristic function. The linearly warped WD is found to provide highly localized representations when analyzing linear time modulation signals. The chirpogram, on the other hand, has a definite TF resolution advantage over the spectrogram when analyzing multicomponent signals with linear characteristics. This is because the chirpogram smooths along lines of any slope in the TF plane, whereas the spectrogram smooths only along horizontal or vertical lines [158].

Through various examples, Papandreou et al. show that power QTFDs are ideal for signals that propagate through linear systems with specific power GD characteristics, such as a wave propagating through a dispersive medium [154]. Other signals matched to k-th power QTFDs include:

- the dispersive propagation of a shock wave in a steel beam (k = 0.5) [159, 161];
- transionospheric signals measured by satellites (k = -1) [160];
- acoustical waves reflected from a spherical shell immersed in water [162];
- some cetacean mammal whistles [165];
- diffusion-equation-based waveforms (k = 0.5) [163], e.g., waves from uniformly distributed radio communication transmission lines [164].

Limitations. Warping-based or dispersive QTFDs can be computationally intensive when implemented directly using numerical integration, as in the case of warping the WD to obtain the power WD. Papandreou et al. suggest an alternative implementation scheme that allows existing efficient algorithms for computing Cohen's class or affine class QTFDs to be used, as they did in [154] for power QTFDs. However, the increased computational complexity of dispersive QTFDs is the trade-off for improved performance in analyzing signals with matched dispersive GD characteristics.


2.1.2.5 TFDs with Complex Arguments

One of the most important concepts for improving concentration for non-linear structures is the complex argument distribution (CAD). CADs were introduced by Srdjan Stankovic and LJ. Stankovic [251, 252] and later generalized by Cornu et al. [253]. The purpose is to obtain a distribution that is highly concentrated along the GD, and in turn along the IF, for mono- and multicomponent signals. The CADs use the concepts of complex-frequency and complex-lag arguments in two domains: the Laplace domain and the time domain [252]. The two forms successfully produce representations concentrated along the GD or the IF. As the signal is available along the real time axis only, a complex-valued argument form of the signal is computed using tools proposed by Stankovic in [252]. These tools use the relation between the FT and the Laplace transform, together with the analytic extension of the signal [7].

Generalized representations of phase derivative for regular signals. A recent work in the same category is the generalized complex-lag distributions proposed by Cornu et al. [253]. These distributions are based on the generalized complex-lag moment and give an arbitrary instantaneous phase derivative (IPD) representation with high concentration. They provide accurate IF estimates even when the IF varies significantly. Moreover, a slight modification of the generalized CAD can yield accurate IF rate estimation, like some existing methods (e.g., [254]). Higher order TF rate distributions of this type can result in better IF concentration. These distributions are parameterized by two integers, K and N.

2.1.2.6 TFDs Based On Signal Expansions

The wide variety of patterns embedded in complex signals, and the precision needed to characterize them, motivate decompositions over large and redundant dictionaries of waveforms. Linear expansions in a single basis, whether Fourier, wavelet or any other, are not flexible enough. In Fourier and wavelet bases, it is difficult to detect and identify signal patterns from their expansion coefficients, because the information is diluted across the whole basis. For this reason, alternatives to the traditional representation of signals as superpositions of sinusoids have been sought in the form of other dictionaries. Such dictionaries include wavelets, steerable wavelets, segmented wavelets, Gabor dictionaries, multiscale Gabor dictionaries, wavelet packets, cosine packets, chirplets, warplets and a wide range of others.

There has been an explosion of interest in obtaining signal representations in overcomplete dictionaries⁴. The approaches range from general methods, like the method of frames [11] and the method of matching pursuit (MP) [175], to specialized dictionaries, like the method of best orthogonal basis [174]. These classical approaches have both advantages and shortcomings. With these methods, the STSC can be expanded into an infinite number of TF-shifted versions of a weighted elementary atom. Applying a suitable TF transform, such as the WD, to this expansion can then yield highly concentrated TFDs with good resolution. The following paragraphs present some important signal expansion concepts and the resulting TFDs from which the TF research community has especially benefited.

⁴ Overcomplete either because they start out that way, or because complete dictionaries are merged to obtain a new megadictionary consisting of several types of waveforms (e.g., Fourier and wavelet dictionaries).

2.1.2.6.1 Matching Pursuits TFDs with TF Dictionaries.

Mallat and Zhang [175] introduce an algorithm called MP that decomposes any signal into waveforms selected from a dictionary of TF atoms. These atoms are obtained by dilations, translations and modulations of a single window function. The decomposition is achieved by successive approximations of the signal with orthogonal projections on dictionary elements. In the literature, similar algorithms have been proposed by Qian and Chen [213] for Gabor dictionaries and by Villemoes [214] for Walsh dictionaries. MPs provide extremely flexible signal representations, since the choice of dictionary is not limited. Moreover, the properties of the signal components are explicitly given by the scale, frequency, time and phase indexes of the selected atoms. This representation is therefore well adapted to information processing. Although MP is nonlinear, it maintains energy conservation like an orthogonal expansion, which guarantees its convergence. Mallat and Zhang then derive a TF energy distribution by adding the WDs of the chosen TF atoms. Unlike the WD or Cohen's class distributions, the resulting distribution is free of interference terms and provides a clearer picture.
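A minimal sketch of MP over a small Gabor dictionary is given below. The dictionary grid (scales, time step, number of frequencies) and the function names are hypothetical choices made for illustration. The atom search is exhaustive, and the WD-sum energy distribution is not computed here.

```python
import numpy as np

def gabor_dictionary(N, scales=(8, 16, 32), n_freq=32, t_step=4):
    """Unit-norm complex Gabor atoms g(n) = exp(-pi ((n-u)/s)^2) exp(j 2 pi xi n) on a coarse grid."""
    n = np.arange(N)
    atoms, params = [], []
    for s in scales:
        for u in range(0, N, t_step):
            for k in range(n_freq):
                xi = k / n_freq                          # frequency in cycles/sample
                g = np.exp(-np.pi * ((n - u) / s) ** 2) * np.exp(2j * np.pi * xi * n)
                atoms.append(g / np.linalg.norm(g))
                params.append((s, u, xi))
    return np.array(atoms), params

def matching_pursuit(x, D, n_iter):
    """Greedy MP: pick the atom with the largest |<r, g>| and subtract its projection."""
    r = np.asarray(x, dtype=complex).copy()
    coeffs, chosen = [], []
    for _ in range(n_iter):
        c = D.conj() @ r                     # inner products <r, g_i> for every atom
        i = int(np.argmax(np.abs(c)))
        r = r - c[i] * D[i]                  # residual is now orthogonal to atom i
        coeffs.append(c[i])
        chosen.append(i)
    return np.array(coeffs), chosen, r

# Usage: a signal built from two dictionary atoms.
D, params = gabor_dictionary(128)
x = 2.0 * D[100] + 1.0 * D[2600]
coeffs, chosen, residual = matching_pursuit(x, D, 10)
```

Each step removes an orthogonal projection onto a unit-norm atom. Consequently, ||x||² equals the sum of the squared coefficient magnitudes plus the residual energy, which is the energy conservation property mentioned above.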

Compact signal coding is another important application domain of MPs. For a given class of signals, if the dictionary can be adapted to minimize the storage needed for a given approximation precision, better results are guaranteed than with decompositions on orthonormal bases. Indeed, an orthonormal decomposition is a particular case of MP in which the dictionary is the orthonormal basis. For dictionaries that are not orthonormal bases, both the inner products in the structure book and the indexes of the selected vectors must be coded. This requires quantizing the inner product values and using a dictionary of finite size. The MP decomposition is then equivalent to a multistage shape-gain vector quantization in a very high dimensional space. For information processing or compact signal coding, it is important to have strategies for adapting the dictionary to the class of signals being decomposed. If enough prior information is available, the dictionary can be adapted to the probability distribution of the signal class within the signal space. Finding strategies to optimize dictionaries in high dimensions is an open problem that shares similar features with learning problems in NNs.

2.1.2.6.2 Basis Pursuit TFDs.

The basis pursuit method proposed by Chen, Donoho and Saunders [173] uses convex optimization to find signal representations in overcomplete dictionaries. It obtains the decomposition that minimizes the ℓ1 norm of the coefficients occurring in the representation. This optimization principle leads to much sparser decompositions. Because it is based on global optimization, it can also super-resolve. The technique can be used with noisy data by solving an optimization problem that includes a quadratic misfit measure. Basis pursuit has important connections with other methods, such as Mallat and Zhang's MP [175], the multiscale edge representation, and the total variation-based denoising method of Rudin, Osher and Fatemi [202].
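The noisy-data form of basis pursuit (basis pursuit denoising) can be sketched with iterative soft thresholding (ISTA). ISTA is a swapped-in solver chosen here for brevity; Chen, Donoho and Saunders solve the problem with linear/interior-point programming. The spikes-plus-DCT dictionary and all function names below are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def basis_pursuit_denoising(Phi, s, lam, n_iter=2000):
    """Minimize 0.5 * ||s - Phi a||_2^2 + lam * ||a||_1 by iterative soft thresholding (ISTA)."""
    L = np.linalg.norm(Phi, 2) ** 2        # Lipschitz constant of the quadratic term's gradient
    a = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ a - s)
        a = soft_threshold(a - grad / L, lam / L)
    return a

def spikes_and_cosines(N):
    """Overcomplete dictionary [I | C], where C is an orthonormal DCT-II basis (columns are atoms)."""
    n = np.arange(N)[:, None]
    k = np.arange(N)[None, :]
    C = np.sqrt(2.0 / N) * np.cos(np.pi * (n + 0.5) * k / N)
    C[:, 0] /= np.sqrt(2.0)
    return np.hstack([np.eye(N), C])

# Usage: a signal made of one spike and one cosine atom.
N = 64
Phi = spikes_and_cosines(N)
s = Phi[:, 10] + Phi[:, N + 5]
a = basis_pursuit_denoising(Phi, s, lam=0.01)
```

Neither basis alone represents the spike-plus-cosine signal sparsely. The ℓ1 penalty over the merged dictionary concentrates the representation on the two generating atoms.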

2.1.2.6.3 TFDs based on Empirical Mode Decomposition.

A data-driven technique termed empirical mode decomposition (EMD) was introduced by Huang et al. [176]. In their original paper, Huang et al. introduce a general two-step method for analyzing data. In the first step, the data are preprocessed by the EMD, resulting in a number of intrinsic mode function (IMF) components. In this way, the data are expanded in a basis derived from the data themselves. In the second step, the Hilbert transform is applied to the IMFs, and the resulting energy-frequency-time distribution is designated the Hilbert spectrum. This Hilbert spectrum preserves the time locality of events. This construction of a TFD is, of course, not limited to any one technique, and better methods may be used to obtain TFDs that are highly localized in the TF domain.
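The second (Hilbert) step can be sketched as follows, assuming the IMFs have already been produced by an EMD; the sifting procedure is not shown. The analytic signal is built with the standard FFT construction, and the IF is taken from the unwrapped phase. The frequency-bin layout and the function names are hypothetical.

```python
import numpy as np

def analytic_signal(x):
    """FFT-based analytic signal: keep DC (and Nyquist), double positive, zero negative bins."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    gain = np.zeros(N)
    gain[0] = 1.0
    if N % 2 == 0:
        gain[N // 2] = 1.0
        gain[1:N // 2] = 2.0
    else:
        gain[1:(N + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * gain)

def hilbert_spectrum(imfs, n_bins=64):
    """Accumulate squared IMF amplitudes at their instantaneous frequencies.

    imfs: array of shape (n_imfs, N), assumed to come from an EMD (not shown here).
    Returns H of shape (N - 1, n_bins); bin b covers [b, b + 1) * 0.5 / n_bins cycles/sample.
    """
    imfs = np.atleast_2d(np.asarray(imfs, dtype=float))
    N = imfs.shape[1]
    H = np.zeros((N - 1, n_bins))
    rows = np.arange(N - 1)
    for imf in imfs:
        z = analytic_signal(imf)
        amp = np.abs(z[:-1])
        inst_freq = np.diff(np.unwrap(np.angle(z))) / (2 * np.pi)   # cycles/sample
        bins = np.clip(np.floor(inst_freq * 2 * n_bins).astype(int), 0, n_bins - 1)
        np.add.at(H, (rows, bins), amp ** 2)
    return H

# Usage: a single "IMF" with 33 whole cycles in 256 samples (f = 0.12890625).
n = np.arange(256)
imf = np.cos(2 * np.pi * 33 / 256 * n)
H = hilbert_spectrum(imf)
```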

The EMD has received increasing attention in terms of applications [177]–[189] and interpretations [190, 191]. Its main benefit is that the basis functions are derived from the signal itself, so the analysis is adaptive.


The idea is to decompose a time series into a superposition of components with well-defined IFs, i.e., the IMFs. The components should (approximately) obey the requirements of completeness, orthogonality, locality and adaptiveness. The Hilbert spectrum of each IMF is then constructed, representing it in the TF plane. Alternatively, an appropriate TF representation of the decomposed IMFs (e.g., the reassignment method) results in highly concentrated TFDs [193].

2.1.2.6.4 Matching Pursuit adaptive TFDs.

Krishnan [170] recently proposed a novel approach to extracting the IF from an adaptive TFD. The adaptive TFD of a signal is obtained by decomposing the signal into components with reasonable TF localization and combining the WDs of the components. The adaptive TFD thus obtained is positive and free of CTs, but it does not satisfy the marginal properties. The marginal properties are achieved by applying MCE optimization to the TFD. The IF may then be obtained as the first-order moment (the conditional mean frequency) of this adaptive TFD at each time instant. Krishnan has shown that the proposed method successfully extracts the IF of a set of real-world and synthetic signals with known IF dynamics. In [171], a solution to the multicomponent problem was given in the form of an algorithm that selects an optimal TFD from a set of TFDs for a given signal. Krishnan addresses the same problem by constructing TFDs according to the application at hand; that is, he tailors the TFD to the properties of the signal being analyzed. In his method, the TFDs are modified through constraints to satisfy certain specified criteria. It is assumed that the given signal can somehow be decomposed into components of a specified mathematical representation. Once the components of a signal are known, the interactions between them can be established and used to remove or prevent CTs. This avoids the main drawback associated with Cohen's class TFDs.
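For a non-negative TFD such as Krishnan's adaptive TFD, the first-moment IF estimate can be sketched as below. `if_first_moment` and its arguments are hypothetical names. The input is assumed to be non-negative already; no MCE step is performed here.

```python
import numpy as np

def if_first_moment(Q, freqs):
    """IF estimate as the conditional mean frequency of a non-negative TFD.

    Q[n, k] is the TFD value at time n and frequency bin k; freqs[k] is the bin frequency.
    """
    Q = np.asarray(Q, dtype=float)
    if np.any(Q < 0):
        raise ValueError("the first-moment IF estimate assumes a non-negative TFD")
    return (Q @ freqs) / Q.sum(axis=1)

# Usage: three time slices of a toy TFD on 6 frequency bins (0, 0.1, ..., 0.5).
freqs = np.linspace(0.0, 0.5, 6)
Q = np.zeros((3, 6))
Q[0, 1] = 1.0            # all energy at 0.1
Q[1, 2] = Q[1, 4] = 2.0  # equal energy at 0.2 and 0.4
Q[2, 5] = 0.3            # all energy at 0.5
inst_freq = if_first_moment(Q, freqs)
```

For a multicomponent slice (the second row), the first moment returns a single average frequency between the components. This is one reason why component-wise decomposition matters before IF extraction.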

2.2 Objective Assessment Methods

Choosing the right TFD to analyze a given signal is not straightforward, even for a monocomponent signal, and becomes more complex for multicomponent signals. The common practice for determining the best TFD for a given signal has been to compare all plots visually and choose the most appealing one. As an example, Fig. 2.1 shows various BDs of a real-life multicomponent bat echolocation chirp signal [134]: the spectrogram, WD, Zhao-Atlas-Marks distribution (ZAMD), Margenau-Hill distribution (MHD), CWD and Born-Jordan distribution (BJD). The spectrogram and the CWD clearly show less interference and better component separation than the other TFDs considered. However, this selection is generally difficult and subjective. Comparing the plots in Fig. 2.1 objectively requires a quantitative performance measure specifically tailored to TFDs. Estimating signal information and complexity in the TF plane is quite challenging. Three themes inspire new measures for this purpose: the suppression of TFDs' cross components, the concentration and resolution of auto-components, and the ability to correctly distinguish closely spaced components. Efficient concentration and resolution measurements can provide a quantitative criterion for evaluating the performance of different distributions. Such measures conform closely to the notion of complexity used when visually inspecting TF images [37].

Concentration is one of the most important and extensively studied properties of a TFD [31, 90]. For a monocomponent signal, the performance of its TFD is usually defined in terms of its energy concentration about the signal IF [14]. Gabor [120], Vakman [121], Janssen [72] and Cohen [14] made important initial contributions to measuring distribution concentration for monocomponent signals. For more complex signals, quantities from statistics inspired several TFD measures:

- the ratio of distribution norms, by Jones and Parks [112];
- the Rényi entropy, by Williams et al. [114, 116] and Baraniuk et al. [37];
- the distribution energy for optimal kernel design, by Baraniuk and Jones [132].

Stankovic [172] presented a simple measure of distribution concentration based on the definition of the duration of time-limited signals. Boashash and Sucic [33], on the other hand, combined TFD characteristics such as mainlobe and sidelobe magnitudes, instantaneous bandwidth and the signal IF to define an instantaneous concentration measure.

For multicomponent signals, resolution is as important as the energy concentration a TFD attains along the IF of each component when evaluating its performance. Good TF resolution of the signal components requires good energy concentration for each component and good suppression of any undesirable artifacts. The resolution may be measured by the minimum frequency separation between the components' mainlobes for which their magnitudes and bandwidths are still preserved [33].

Figure 2.1: TFDs of a multicomponent bat echolocation chirp signal. (a) Spectrogram (test input to the BRNNM) [Hamming window of length L = 100], (b) WVD, (c) ZAMD, (d) MHD, (e) CWD [kernel width = 1], (f) BJD.

With the above in view, several important theoretical measures discussed in the literature have been selected to evaluate the proposed ANN-based framework:

- ratio-of-norms-based measures [100];
- the Shannon and Rényi entropy measures [117, 118];
- the normalized Rényi entropy measure [114];
- the LJubisa measure [172];
- the Boashash performance measures [33].

A modification of the concentration measure in [33] is suggested and implemented to obtain a true picture of the concentration of multicomponent TFDs. A brief overview of these measures is presented next.

2.2.1 Entropy Measures

The terms entropy, uncertainty and information are used more or less interchangeably. Entropy measures the information in a given probability density function. It can likewise be applied to TFDs to quantify information by measuring the signal's complexity [37, 116, 119]. By the probabilistic analogy, minimizing the complexity or information in a particular TFD is equivalent to maximizing its concentration, its peakiness and, therefore, its resolution [100].


2.2.1.1 Shannon entropy

The well-known Shannon entropy [117] for the TFD of a unit-energy signal can be expressed as

$$E_{\text{Shannon}} = -\sum_{n}\sum_{\omega} Q(n,\omega)\,\log_2 Q(n,\omega) \tag{2.2}$$

The classical Shannon entropy is a natural candidate for estimating the concentration of a TFD. It can be viewed as the inverse of a measure of the distribution's concentration in the TF plane. Peaky TFDs of highly concentrated signals yield small entropy values, and vice versa. However, the negative values taken on by most TFDs prohibit direct application of the Shannon entropy because of the logarithm in Eqn. (2.2). Taking the absolute value of the distribution ensures that the logarithm in the sum is defined.
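A direct NumPy sketch of (2.2) with the absolute-value workaround described above is given below. Normalizing |Q| to unit sum, so that it behaves like a probability mass, is an assumption of this example.

```python
import numpy as np

def shannon_entropy(Q):
    """Shannon entropy (2.2) of a TFD, in bits, using |Q| normalized to unit total energy."""
    P = np.abs(np.asarray(Q, dtype=float))
    P = P / P.sum()
    P = P[P > 0]                  # convention: 0 * log2(0) = 0
    return float(-np.sum(P * np.log2(P)))

# Usage: a flat 4 x 4 TFD (maximally spread) versus a single-cell TFD (maximally peaky).
flat = np.ones((4, 4))
delta = np.zeros((4, 4))
delta[1, 2] = 5.0
h_flat, h_delta = shannon_entropy(flat), shannon_entropy(delta)   # 4 bits and 0 bits
```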

2.2.1.2 Rényi entropy

It is introduced as a more appropriate way of measuring the TF uncertainty

sidestepping the negativity issue, derived from the same set of axioms as the Shannon

entropy [36, 114]. The only difference being the employment of a more general

exponential mean instead of the arithmetic mean in the derivation [116], given as

ERENY I� =1

1� � log2

Xn

X!

Q�(n; !)

!(2.3)

where $\alpha$ is the order of the Rényi entropy. Here $\alpha$ is taken as 3, the smallest integer value that yields a well-defined, useful information measure for a large class of signals. The generalized Rényi entropies inspire new measures for estimating signal information and complexity in the TF plane. When applied to a TFD from Cohen's class, they conform closely to the notion of complexity used when visually inspecting TF plots.
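A similar sketch of Eqn. (2.3) uses the same assumptions: a 2-D NumPy array rescaled to unit sum, with `renyi_entropy` as a hypothetical name. It shows that an odd order such as α = 3 can handle negative TFD values without taking an absolute value.

```python
import numpy as np


def renyi_entropy(tfd, alpha=3):
    """Rényi entropy in bits, Eqn. (2.3), of a TFD rescaled to unit energy.

    No absolute value is taken: with an odd order such as alpha = 3,
    negative TFD values simply contribute negative terms to the sum.
    """
    q = np.asarray(tfd, dtype=float)
    q = q / q.sum()                              # assumes a positive total energy
    s = np.sum(q ** alpha)
    return float(np.log2(1.0 / s) / (alpha - 1))  # same as log2(s) / (1 - alpha)


one = np.zeros((8, 8))
one[2, 2] = 1.0                       # one component
two = np.zeros((8, 8))
two[2, 2] = two[5, 5] = 1.0           # two components of equal energy
print(renyi_entropy(one))             # 0.0
print(renyi_entropy(two))             # 1.0
```

Splitting the energy equally between two components raises the entropy by exactly one bit. The measure therefore behaves like a logarithmic count of the components.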

2.2.2 Normalized Entropy Measures

The Rényi entropy with $\alpha = 3$ does not detect zero-mean CTs, so normalization by either the signal energy or the distribution volume is necessary [114]. Normalization also reduces a distribution to the unit signal energy case.

2.2.2.1 Normalization with the signal energy

This normalization is important when comparing different distributions, or the same distribution when it is not energy unbiased. The measure behaves much like the non-normalized form and differs only in magnitude. By definition, the Rényi entropy normalized by the signal energy is given by

$$E_{NRE\alpha} = \frac{1}{1-\alpha}\log_2\left(\frac{\sum_{n}\sum_{\omega} Q^{\alpha}(n,\omega)}{\sum_{n}\sum_{\omega} Q(n,\omega)}\right), \qquad \alpha \ge 2 \qquad (2.4)$$

2.2.2.2 Normalization with the distribution volume

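The expression for this type of entropy measure can be written as

$$E_{NRE\alpha} = \frac{1}{1-\alpha}\log_2\left(\frac{\sum_{n}\sum_{\omega} Q^{\alpha}(n,\omega)}{\sum_{n}\sum_{\omega} \left|Q(n,\omega)\right|}\right), \qquad \alpha \ge 2 \qquad (2.5)$$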

This form of the measure has been used for adaptive kernel design in [114]. Suppose the distribution contains oscillatory values. Because they are summed in absolute value, large CTs increase the denominator in Eqn. (2.5) and hence the entropy. This indicates a smaller concentration due to the appearance of CTs.
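The two normalizations can be compared on a toy example. Eqns. (2.4) and (2.5) are implemented as written, with α = 3, and the function names are hypothetical. The 8 × 8 array and its alternating ±0.1 row are made-up stand-ins for auto-terms and a zero-mean CT, not data from this work.

```python
import numpy as np


def renyi_energy_normalized(tfd, alpha=3):
    """Eqn. (2.4): Rényi entropy normalized by the signal energy (plain sum of Q)."""
    q = np.asarray(tfd, dtype=float)
    return float(np.log2(np.sum(q ** alpha) / np.sum(q)) / (1 - alpha))


def renyi_volume_normalized(tfd, alpha=3):
    """Eqn. (2.5): Rényi entropy normalized by the distribution volume (sum of |Q|)."""
    q = np.asarray(tfd, dtype=float)
    return float(np.log2(np.sum(q ** alpha) / np.sum(np.abs(q))) / (1 - alpha))


# Two auto-terms with unit total energy, plus a toy zero-mean oscillating "cross-term" row.
auto = np.zeros((8, 8))
auto[2, 2] = auto[5, 5] = 0.5
ct = np.zeros((8, 8))
ct[3, :] = 0.1 * (-1.0) ** np.arange(8)   # +0.1, -0.1, ... sums to zero
with_ct = auto + ct

print(renyi_energy_normalized(auto), renyi_energy_normalized(with_ct))  # both about 1.0
print(renyi_volume_normalized(auto), renyi_volume_normalized(with_ct))  # about 1.0 vs 1.42
```

Adding the zero-mean oscillating row leaves the energy-normalized value unchanged. This matches the remark that this form behaves like the non-normalized measure. The volume-normalized value rises from 1.0 to about 1.42 bits, which flags the CTs.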

2.2.3 Ratio of Norms based Measure

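Jones and Parks proposed a concentration measure [112]. It divides the sum of the fourth powers of $|Q(n,\omega)|$ by the square of the sum of its second powers:

$$E_{JP} = \frac{\sum_{n}\sum_{\omega}\left|Q(n,\omega)\right|^{4}}{\left(\sum_{n}\sum_{\omega}\left|Q(n,\omega)\right|^{2}\right)^{2}} \qquad (2.6)$$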

The fourth power in the numerator favors a peaky distribution. The optimal distribution for a given signal is the one that maximizes this measure:

$$Q_{\text{optimum}}(n,\omega) = \arg\max_{Q}\, E_{JP} \qquad (2.7)$$
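The following is a short sketch of Eqns. (2.6) and (2.7), assuming the TFD is stored as a NumPy array. The function name `jones_parks` and the two candidate arrays are illustrative.

```python
import numpy as np


def jones_parks(tfd):
    """Ratio-of-norms concentration, Eqn. (2.6): larger values mean a peakier TFD."""
    a = np.abs(np.asarray(tfd, dtype=float))
    return float(np.sum(a ** 4) / np.sum(a ** 2) ** 2)


peak = np.zeros((8, 8))
peak[4, 4] = 1.0
spread = np.ones((8, 8))
candidates = {"peak": peak, "spread": spread}
# Eqn. (2.7): pick the candidate distribution that maximizes E_JP.
best = max(candidates, key=lambda name: jones_parks(candidates[name]))
print(jones_parks(peak), jones_parks(spread), best)  # 1.0 0.015625 peak
```

The numerator and denominator both scale with the fourth power of Q, so the measure does not depend on the overall energy. For K equal-valued cells it equals 1/K.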

2.2.4 LJubisa Measure

This simple criterion, presented by Stankovic [172], assesses TFD concentration objectively. It makes use of the duration of time-limited signals. Suppose a signal $x(\tau)$ is time-limited to the interval $\tau \in [\tau_1, \tau_2]$, i.e., $x(\tau) \neq 0$ only within this interval. Then the duration of the signal is $d = \tau_2 - \tau_1$. Consequently, it can be written as

$$d = \lim_{\beta\to\infty}\int_{-\infty}^{\infty}\left|x(\tau)\right|^{1/\beta}\,d\tau.$$

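It is assumed in the derivation that the distribution $Q(\tau,\omega) \neq 0$ only for $(\tau,\omega) \in D_x(\tau,\omega)$. For large $\beta$, this can be expressed mathematically as

$$J_\beta = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\left|Q(\tau,\omega)\right|^{1/\beta}\,d\tau\,d\omega \;\to\; \iint_{D_x(\tau,\omega)} 1\,d\tau\,d\omega = S_x \qquad (2.8)$$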

where $S_x$ is the area of $D_x(\tau,\omega)$. As a criterion for the concentration of distributions, the following is assumed: among several given energy-unbiased distributions, the best concentrated is the one with the smallest $S_x$. The value of $J_\beta$ raised to the $\beta$th power is referred to as the LJubisa concentration measure. Its discrete form is

$$J\left[Q(n,\omega)\right] = J_\beta^{\beta} = \left(\sum_{n}\sum_{\omega}\left|Q(n,\omega)\right|^{1/\beta}\right)^{\beta} \qquad (2.9)$$

where $\sum_{n}\sum_{\omega} Q(n,\omega) = 1$ is the normalized (unbiased) energy constraint and $\beta > 1$. The best choice according to this criterion (the optimal distribution with respect to this measure) is the distribution that produces the minimal value of $J[Q(n,\omega)]$.
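The discrete measure of Eqn. (2.9) can be sketched as follows. The sketch assumes a NumPy array TFD and a default of β = 2, and `ljubisa_measure` is a hypothetical name. The array is first rescaled to satisfy the unit-energy constraint.

```python
import numpy as np


def ljubisa_measure(tfd, beta=2.0):
    """Concentration measure of Eqn. (2.9); smaller values mean better concentration."""
    q = np.asarray(tfd, dtype=float)
    q = q / q.sum()                          # unbiased unit-energy constraint: sum of Q = 1
    return float(np.sum(np.abs(q) ** (1.0 / beta)) ** beta)


peak = np.zeros((8, 8))
peak[4, 4] = 1.0                             # energy in 1 cell
four_cells = np.zeros((8, 8))
four_cells[4, 3:7] = 1.0                     # energy over 4 cells
spread = np.ones((8, 8))                     # energy over 64 cells
for name, q in [("peak", peak), ("four_cells", four_cells), ("spread", spread)]:
    print(name, ljubisa_measure(q))
# prints: peak 1.0, four_cells 4.0, spread 64.0
```

For K cells of equal energy the measure equals K^(β−1), so it grows with the area the distribution occupies. Taking the minimum over the candidates therefore selects the most compact one.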

2.2.5 Boashash Performance Criteria

These objective measures were proposed by Boashash for the practical analysis of signals with closely spaced components. They account not only for the concentration of TFDs but also for their resolution. The TFD characteristics that influence resolution are combined to define these measures [33]. These include component concentration, component separation, and the minimization of interference terms.

2.2.5.1 Concentration measure

Consider a time slice $t = t_0$ of a typical quadratic TFD of an $n$-component signal. For each $n$th component, the slice has an instantaneous bandwidth $V_{i_n}(t_0)$, an IF $f_{i_n}(t_0)$, a sidelobe magnitude $A_{S_n}(t_0)$, and a mainlobe magnitude $A_{M_n}(t_0)$. $A_X(t_0)$ may be used to represent the CT magnitude.

At any instant, the concentration performance of a TFD improves if it minimizes two ratios for each signal component [33]. These are the sidelobe magnitude $A_{S_n}(t)$ relative to the mainlobe magnitude $A_{M_n}(t)$, and the mainlobe bandwidth $V_{i_n}(t)$ relative to the signal IF $f_{i_n}(t)$. Consider a given time slice $t = t_0$ of the TFD $\rho_z(t,f)$ of an $n$-component signal $z(t) = \sum_n z_n(t)$. Its concentration performance is quantified by [33]:

$$c_n(t) = \frac{A_{S_n}(t)}{A_{M_n}(t)}\,\frac{V_{i_n}(t)}{f_{i_n}(t)} \qquad (2.10)$$

where $c_n(t)$ is the concentration measure for each signal component.

2.2.5.2 Modified concentration measure

An alternative to the measure $c_n(t)$ defined in Eqn. (2.10) is suggested. It combines $A_{S_n}(t)/A_{M_n}(t)$ and $V_{i_n}(t)/f_{i_n}(t)$ into a sum rather than a product. It therefore accounts for their effects more independently. The new measure gives a better picture of the concentration performance of TFDs, even for TFDs with no sidelobes. For such TFDs the product in Eqn. (2.10) vanishes regardless of the mainlobe bandwidth. This results in the following definition of the instantaneous concentration measure for each signal component in $z(t) = \sum_n z_n(t)$:

$$C_n(t) = \frac{A_{S_n}(t)}{A_{M_n}(t)} + \frac{V_{i_n}(t)}{f_{i_n}(t)} \qquad (2.11)$$

A TFD performs well when this measure is close to zero.
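A few made-up numbers show the difference between Eqns. (2.10) and (2.11). The time-slice values below are hypothetical, not measurements from this work.

```python
def concentration_product(a_s, a_m, v_i, f_i):
    """Eqn. (2.10): c_n(t) = (A_S / A_M) * (V_i / f_i)."""
    return (a_s / a_m) * (v_i / f_i)


def concentration_sum(a_s, a_m, v_i, f_i):
    """Eqn. (2.11): C_n(t) = A_S / A_M + V_i / f_i; close to zero is good."""
    return a_s / a_m + v_i / f_i


# Hypothetical slice values for two sidelobe-free TFDs (A_S = 0) of the same
# component, IF 0.2 in normalized frequency, with wide vs narrow mainlobes.
wide = dict(a_s=0.0, a_m=1.0, v_i=0.10, f_i=0.2)
narrow = dict(a_s=0.0, a_m=1.0, v_i=0.01, f_i=0.2)

print(concentration_product(**wide), concentration_product(**narrow))  # 0.0 0.0
print(concentration_sum(**wide), concentration_sum(**narrow))          # 0.5 and about 0.05
```

The product form scores both sidelobe-free TFDs as perfect. The sum form still tells the narrow mainlobe apart from the wide one.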

2.2.5.3 Resolution measure

Consider a power spectral density estimate of a signal composed of two single tones $f_1$ and $f_2$. Its frequency resolution is defined as the minimum difference $f_2 - f_1$ for which the following inequality holds:

$$f_1 + \frac{V_1}{2} < f_2 - \frac{V_2}{2}, \qquad f_1 < f_2 \qquad (2.12)$$

where $V_1$ and $V_2$ are the bandwidths of the first and the second sinusoid, respectively.

From Eqn. (2.12) and the earlier discussion, the frequency resolution of a TFD can be quantified for a pair of components in a multicomponent signal. Let $f_{i_1}(t) < f_{i_2}(t)$ be their IFs, and let $D$ be a separation measure between the components' mainlobes, centred about these IFs. The resolution is the minimum difference $f_{i_2}(t) - f_{i_1}(t)$ for which $D$ is positive. $D(t)$ measures the separation of the components' mainlobes in frequency and is defined as

$$D(t) = \frac{\left(f_{i_2}(t) - \frac{V_{i_2}(t)}{2}\right) - \left(f_{i_1}(t) + \frac{V_{i_1}(t)}{2}\right)}{f_{i_2}(t) - f_{i_1}(t)} = 1 - \frac{V_i(t)}{\Delta f_i(t)} \qquad (2.13)$$

where $V_i(t) = \sum_n V_{i_n}(t)/2$ is the average instantaneous bandwidth of the components' mainlobes, and $\Delta f_i(t) = f_{i_{n+1}}(t) - f_{i_n}(t)$ is the difference between the components' IFs. The measure $D(t)$ must be computed for each adjacent pair of components present in the signal, indicated by the subscript $n$.
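The following is a minimal sketch of Eqn. (2.13) for one adjacent pair at one time slice. The frequencies and bandwidths are hypothetical values in normalized frequency. The explicit left-hand form is computed alongside the compact right-hand form as a check.

```python
def separation_measure(f_i1, f_i2, v_i1, v_i2):
    """Mainlobe separation D(t) of Eqn. (2.13) for adjacent components f_i1 < f_i2."""
    if not f_i1 < f_i2:
        raise ValueError("expected f_i1 < f_i2")
    v_avg = (v_i1 + v_i2) / 2.0           # average mainlobe bandwidth V_i(t)
    return 1.0 - v_avg / (f_i2 - f_i1)    # compact right-hand form of Eqn. (2.13)


# Hypothetical values at one time slice (normalized frequency).
f1, f2, v1, v2 = 0.20, 0.30, 0.04, 0.06
explicit = ((f2 - v2 / 2) - (f1 + v1 / 2)) / (f2 - f1)   # left-hand form of Eqn. (2.13)
d = separation_measure(f1, f2, v1, v2)
print(d, explicit)                                       # both about 0.5: mainlobes separated
print(separation_measure(f1, f2, 0.12, 0.12))            # about -0.2: mainlobes overlap
```

A negative D(t) means the mainlobes overlap. In that case the inequality of Eqn. (2.12) fails and the pair is not resolved.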

Better resolution performance from quadratic TFDs requires two things at once: the separation measure $D$ should be maximized, and the interference terms (CTs and the components' sidelobes) should be minimized. These constraints give an overall measure $R$ of the resolution performance of a TFD for a pair of components in a multicomponent signal, expressed as [33]

$$R(t) = \frac{A_S(t)}{A_M(t)}\,\frac{A_X(t)}{A_M(t)}\,\frac{1}{D(t)} \qquad (2.14)$$

Here $A_M(t) = \sum_n A_{M_n}(t)/2$ is the average magnitude of the components' mainlobes, and $A_S(t) = \sum_n A_{S_n}(t)/2$ is the average magnitude of their sidelobes. $A_X(t)$ is the CT magnitude for any two adjacent signal components. A TFD resolves a given pair of components well when $R$ takes a small (close to zero) positive value.

The normalized instantaneous resolution performance measure $R_i$ is defined so that it is close to 1 for well-performing TFDs and close to 0 for poorly performing ones, i.e., TFDs with large interference terms and poorly resolved components [33]:

$$R_i(t) = 1 - \frac{1}{3}\left(\frac{A_S(t)}{A_M(t)} + \frac{1}{2}\,\frac{A_X(t)}{A_M(t)} + \bigl(1 - D(t)\bigr)\right), \qquad 0 < R_i(t) < 1 \qquad (2.15)$$
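Eqns. (2.14) and (2.15) can also be sketched for a single pair of components. All magnitudes below are hypothetical slice measurements, and $D(t)$ is assumed to come from Eqn. (2.13).

```python
def resolution_measure(a_s, a_m, a_x, d):
    """Eqn. (2.14): R(t); small positive values indicate good resolution."""
    return (a_s / a_m) * (a_x / a_m) / d


def normalized_resolution(a_s, a_m, a_x, d):
    """Eqn. (2.15): R_i(t); close to 1 for good TFDs, close to 0 for poor ones."""
    return 1.0 - (a_s / a_m + 0.5 * a_x / a_m + (1.0 - d)) / 3.0


# Hypothetical slice measurements for one pair of adjacent components.
a_m = (1.0 + 0.8) / 2      # A_M(t): average of the two mainlobe magnitudes
a_s = (0.10 + 0.08) / 2    # A_S(t): average of the two sidelobe magnitudes
a_x = 0.2                  # A_X(t): cross-term magnitude between the pair
d = 0.5                    # D(t) from Eqn. (2.13)

print(resolution_measure(a_s, a_m, a_x, d))       # about 0.0444
print(normalized_resolution(a_s, a_m, a_x, d))    # about 0.763
print(normalized_resolution(0.0, 1.0, 0.0, 1.0))  # 1.0 for an ideal pair
```

An ideal pair has no sidelobes, no CTs and D(t) = 1. It gives R(t) = 0 and R_i(t) = 1.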

2.3 Summary

The most fundamental and challenging aspects of TF analysis are to understand a time-varying spectrum clearly and to represent the properties of a signal simultaneously in time and frequency without ambiguity. Historically, various TF techniques have been defined for these tasks, but it is important to find the one that fits the application. The first part of this Chapter therefore describes why high concentration and good resolution matter for TF signal processing. It specifically discusses the motivations and ingenuity of various researchers in developing newer techniques to improve the spectrum in the TF domain. It covers the basic concepts of various methods and well-tested algorithms and emphasizes the significance of each technique for signal analysis. Indeed, different applications have different preferences and requirements for TFDs. In general, the choice of a TFD in a particular situation depends on many factors: the relevance of the properties the TFD satisfies, its computational cost and speed, and the tradeoffs involved in using it.

The second part of this Chapter describes objective methods for assessing the concentration and resolution performance of TFDs. These methods are used to quantify and compare the de-blurred TFDs obtained by the proposed ANN-based framework in Chapter 5.


Chapter 3

Neural Network based Framework for Computing De-blurred TFDs – Part I

This Chapter presents the preliminary ANN-based framework and then optimizes it for computing de-blurred TFDs. Section 3.1 formulates the problem, states the constraints, and presents an initial ANN-based framework to solve it. Section 3.2 compares ANN training algorithms and selects LMB as the best-performing one. Section 3.3 presents an experimental study on optimizing the design and architecture of the ANN, including the number of neurons, the number of layers, and the type of activation functions. Section 3.4 establishes the advantage of clustering the data and training multiple ANNs for each cluster.

3.1 TFDs using ANN – Binary Case

The goal is to obtain a TFD that is free of any blurring effect, without assuming any prior knowledge of the signal components. To this end, the binary spectrograms of several known signals are used as input to train an ANN (Figs. 3.2 and 3.3). The target is a TFD constructed by adding the expected TFDs of the individual components present in the signal. Each expected individual TFD is constructed from the IF of that component (Figs. 3.4 and 3.5). Fig. 3.1 gives a graphical explanation of the method. The TFD is treated as a 2-D image matrix. As a first step, both the training and target TFDs are converted to binary versions. The corresponding region vectors in the input and target TFDs are paired to form the training and validation sets.

The entropy [107] of the TFD $Q(n,\omega)$ is used here as a measure of concentration, given by:

$$E_Q = -\sum_{n=0}^{N-1}\int Q(n,\omega)\,\log_2 Q(n,\omega)\,d\omega \ge 0 \qquad (3.1)$$

The lower the entropy of a distribution, the more concentrated it is.

3.1.1 Selected ANN Architecture

The method trains a feed-forward ANN with the LMB training algorithm, using five neurons in a single hidden layer. The input and output layers have three neurons and one neuron, respectively.

3.1.1.1 Training Set

The spectrograms of two known signals are used as initial blurred estimates. The first is a sinusoidal FM signal, and the second is composed of two parallel chirps. The first training signal is given by:

$$\mathrm{trg}_1(n) = e^{i\pi\left[\frac{1}{2} - \omega(n)\,n\right]}, \qquad \omega(n) = 0.1\sin\left(\frac{2\pi n}{N}\right) \qquad (3.2)$$

while the second signal is given by:

$$\mathrm{trg}_2(n) = e^{i\omega_1(n)\,n} + e^{i\omega_2(n)\,n} \qquad (3.3)$$

with $\omega_1(n) = \frac{\pi n}{4N}$ and $\omega_2(n) = \frac{\pi}{3} + \frac{\pi n}{4N}$, where $N$ is the total number of sampling points in the signal.

Figure 3.1: Graphical explanation of the method (panel labels: Preprocessing; X1, X2, ..., Xj; Area of concern; Target; Normalized images).

Figure 3.2: Input training TFD image of sinusoidal FM signal.

The binary spectrograms of these signals are shown in Figs. 3.2 and 3.3. The respective target TFDs, computed from the known IF laws, are shown in Figs. 3.4 and 3.5.
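The construction of a training pair can be sketched for the parallel chirps of Eqn. (3.3). This is a hypothetical NumPy illustration, not the code used in this work. The values N = K = 128, the Hann window of length 31, and the 50% threshold used to binarize the spectrogram are all assumptions. The IF of each component is taken as the derivative of its phase $\omega_k(n)\,n$, estimated with central differences. The target marks the nearest frequency bin at every time instant.

```python
import numpy as np

N = 128   # number of samples (illustrative choice)
K = 128   # FFT bins covering [0, 2*pi) rad/sample (illustrative choice)
n = np.arange(N)

# Eqn. (3.3): the phase of each chirp is omega_k(n) * n.
phase1 = (np.pi * n / (4 * N)) * n
phase2 = (np.pi / 3 + np.pi * n / (4 * N)) * n
trg2 = np.exp(1j * phase1) + np.exp(1j * phase2)


def binary_target(phases, num_bins):
    """Binary target TFD: at each time, mark the bin nearest each component's IF."""
    length = len(phases[0])
    target = np.zeros((num_bins, length))
    for phi in phases:
        inst_freq = np.gradient(phi)   # IF in rad/sample (central differences)
        bins = np.round(inst_freq * num_bins / (2 * np.pi)).astype(int) % num_bins
        target[bins, np.arange(length)] = 1.0
    return target


def spectrogram(x, num_bins, win_len=31):
    """|STFT|^2 with a centred Hann window; rows are frequency bins, columns are time."""
    half = win_len // 2
    window = np.hanning(win_len)
    padded = np.pad(x, half)           # zero-pad so every frame is centred on its sample
    spec = np.empty((num_bins, len(x)))
    for t in range(len(x)):
        spec[:, t] = np.abs(np.fft.fft(padded[t:t + win_len] * window, num_bins)) ** 2
    return spec


target = binary_target([phase1, phase2], K)
spec = spectrogram(trg2, K)
binary_spec = (spec > 0.5 * spec.max()).astype(float)   # threshold is an assumption
```

Each phase is quadratic in n, so the central-difference IF is exact away from the end points. Every column of the target therefore contains exactly two marked bins, one per chirp.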

3.1.2 Test Result

A bat echolocation chirp provides an excellent motivation for TF-based signal processing. Fig. 3.6 shows the TFD of this signal obtained by an existing optimum kernel method (OKM) [132]. The spectrogram of this chirp signal (Fig. 3.7) is used to test the trained ANN, and the resulting TFD is shown in Fig. 3.8. This TFD is highly concentrated and has the lowest entropy of the three, as shown in Table 3.1.

Figure 3.3: Input training TFD image of parallel chirps.

Figure 3.4: Target TFD image for the sinusoidal FM signal.

Figure 3.5: Target TFD image of parallel chirp signal.

Table 3.1: Comparison of Entropy

| Description | Proposed approach using ANN | OKM used by [132] | Spectrogram |
|---|---|---|---|
| EQ (bits) | 7.301 | 11.826 | 12.798 |

3.2 Analysis & Comparison of the ANN Training Algorithms

Working with an ANN raises some fundamental questions. How are the weights initialized? How is the learning rate chosen? How many hidden layers and neurons should be used? How many examples should the training set include? Which training algorithm should be used? This section examines how different training algorithms affect the ANN's performance and identifies the best training algorithm for de-blurring TFDs.

Figure 3.6: Binary TFD obtained by the OKM [132].

Figure 3.7: Spectrogram of the bat echolocation chirp signal.

Figure 3.8: The deblurred TFD obtained by the proposed ANN model.

The algorithms analyzed are PBCGB, RPB, GDALB, and LMB, which are among the most frequently used in the ANN literature. Their theoretical description is provided in Appendix A, Section A.5. The advance made here is the use of grayscale, rather than binary, spectrograms of two known signals as input to train the ANN (Figs. 3.9 and 3.10). The target TFDs are still binary. They are constructed by adding the expected TFDs of the individual components present in the signal (Figs. 3.4 and 3.5). Training is carried out with each of the four algorithms mentioned above. The spectrogram of a combined chirp and sinusoidal FM signal (Fig. 3.11) is used as the test TFD. The entropy given by Eqn. (3.1) is used to measure the information in the resultant TFD $Q(n,\omega)$.

Figure 3.9: Input training TFD image of the sinusoidal FM signal.

Figure 3.10: Input training TFD image of parallel chirps signal.

3.2.1 The Method

Since the input TFDs for the ANN are now grayscale, the method used to process them differs from the one described in the previous Section. The idea is based on divide and conquer. A complex computational task is divided into a set of less complex tasks, and their solutions are combined at a later stage to solve the original problem. It is of paramount importance to pre-process the available data to make it suitable for training an ANN. Here, pre-processing means converting attributes into variables. This is achieved in four steps: (i) treating the TFD as a matrix, (ii) vectorizing the TFD image matrix, (iii) clustering these vectors, and (iv) forming pairs of vectors from the input and target TFDs.

Vectors of appropriate length are extracted from the TFD image matrix by working along each row. Based on the visual quality of the output, the vector length is set to $1 \times 3$ (i.e., three pixels along a row). Vectors of this length are accumulated into two vector spaces: one from the training TFD images (Figs. 3.9 and 3.10) and one from the target TFD images (Figs. 3.4 and 3.5). Next, each vector is correlated with three directional vectors, each representing one type of edge, and the vectors are clustered accordingly. The objective is to divide the input space into a number of subspaces $S_n$ that correspond to some useful information. Each subspace is described by a directional unit vector $v_n$. This clusters the input vectors, because each vector lies in the subspace $S_n$ whose $v_n$ is most similar to it in information content. The number of subspaces was varied experimentally and the final results observed. It is set to three, because this value keeps the advantage of clustering while adding the least computational complexity to the algorithm. The three directional vectors characterize three types of edges in the image. The de-blurring problem dictates this choice, for the following reasons:

• Edges are important image characteristics.

• Blurring causes loss of edge information from images.

• The process of de-blurring may produce a more useful image if enhancement is also achieved along with de-blurring.

For each cluster, an ANN is trained with each of the four selected training algorithms. This removes the need to force a single network to learn input vectors that are distant from each other. The appropriate number of directional vectors, of course, still depends on the problem. A pseudo-procedure is given below:

1. The TFD matrices are converted to vectors of a specific length.

2. The number of subspaces $N_s$ is decided based on a suitable criterion.

3. The subspace direction vectors $v_n$ ($n = 1, 2, \ldots, N_s$) that best represent the subspaces $S_n$ are selected.

4. The direction vectors $v_n$ are normalized.

5. The correlation between each input vector and each $v_n$ is found, and the input vector is assigned to the corresponding subspace. Three directional vectors $v_h$, $v_c$, $v_l$ are computed in the following manner:

(a) $v_h$ is obtained by arranging (any) 3 integers in descending order, representing a downhill edge.

(b) $v_c$ is obtained by arranging (any) 3 integers in a triangular fashion, where the highest value occurs in the middle and the values on either side are in descending order. This represents a triangular type of edge.

(c) $v_l$ is obtained by arranging (any) 3 integers in ascending order, representing an uphill edge.

6. For each cluster, an ANN is trained on pairs of input vectors from the spectrograms and the means of the respective vectors from the binary target TFDs.

7. The trained ANNs are then tested on unknown blurred TFDs, after vectorization and clustering according to steps 1 and 5. Each test vector is fed to the network trained specifically for its type of vector. The resulting values are zero-padded and placed at the original grid locations to construct the resultant TFDs.
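The pseudo-procedure can be sketched in Python as follows. Several choices here are assumptions made for illustration: the direction vectors built from 1, 2 and 3, the sliding (stride-one) windows with zero-padded row ends, and the use of the normalized inner product as the "correlation". The function names are hypothetical, and the per-cluster networks themselves are left out.

```python
import numpy as np

# Illustrative direction vectors built from the integers 1, 2, 3 (the text allows
# "any" three integers arranged in these patterns; this choice is an assumption).
V_H = np.array([3.0, 2.0, 1.0])   # downhill edge
V_C = np.array([2.0, 3.0, 1.0])   # triangular edge, peak in the middle
V_L = np.array([1.0, 2.0, 3.0])   # uphill edge
DIRECTIONS = np.stack([v / np.linalg.norm(v) for v in (V_H, V_C, V_L)])  # step 4


def vectorize_rows(tfd, width=3):
    """Step 1: one 1 x width vector per pixel, taken along the rows (zero-padded ends)."""
    pad = width // 2
    padded = np.pad(np.asarray(tfd, dtype=float), ((0, 0), (pad, pad)))
    windows = np.lib.stride_tricks.sliding_window_view(padded, width, axis=1)
    return windows.reshape(-1, width)


def assign_clusters(vectors):
    """Step 5: label each vector with the direction of highest normalized correlation."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = np.divide(vectors, norms, out=np.zeros_like(vectors), where=norms > 0)
    return np.argmax(unit @ DIRECTIONS.T, axis=1)   # 0 = v_h, 1 = v_c, 2 = v_l


def training_pairs(input_tfd, target_tfd):
    """Step 6: per cluster, pair input vectors with the mean of the target vectors."""
    x = vectorize_rows(input_tfd)
    t = vectorize_rows(target_tfd).mean(axis=1)
    labels = assign_clusters(x)
    return {k: (x[labels == k], t[labels == k]) for k in range(len(DIRECTIONS))}


def reassemble(outputs, labels, shape):
    """Step 7: put each network output back at the grid location it came from."""
    image = np.zeros(labels.size)
    for k, values in outputs.items():
        image[labels == k] = values
    return image.reshape(shape)


rng = np.random.default_rng(0)
spec = rng.random((4, 6))                  # stand-in grayscale spectrogram
target = (spec > 0.7).astype(float)        # stand-in binary target
pairs = training_pairs(spec, target)
labels = assign_clusters(vectorize_rows(spec))
# An "identity" network that returns each vector's centre pixel reproduces the input.
outputs = {k: x[:, 1] for k, (x, _) in pairs.items()}
restored = reassemble(outputs, labels, spec.shape)
```

In this sketch an all-zero vector (empty TF background) scores zero against every direction and falls into the first cluster. A practical implementation may wish to treat such vectors separately.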


3.2.2 Selected ANN Architecture

The four shortlisted algorithms are compared on the basis of the entropy values of the resultant TFDs. For this comparison, a single hidden layer with 10 neurons is used. The input and output layers are fixed at 3 neurons and 1 neuron, respectively. A single hidden layer is used because most nonlinear problems can be solved by a single hidden layer with a suitable number of neurons [218]. The 'tansig' transfer function is used between the input and hidden layers, and 'purelin' between the hidden and output layers. This gives a hidden layer of sigmoid neurons followed by an output layer of linear neurons. Multiple layers of neurons with nonlinear transfer functions allow the network to learn both nonlinear and linear relationships between input and output vectors. The linear output layer lets the network produce values outside the range −1 to +1.
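The 3-10-1 topology can be sketched as a forward pass. MATLAB's 'tansig' is mathematically identical to tanh, 'purelin' is the identity, and 'poslin' (used in Section 3.3) is max(0, z). The weights below are deterministic illustrative values, not trained ones.

```python
import numpy as np


def tansig(z):
    """MATLAB-style tansig: 2 / (1 + exp(-2z)) - 1, which is identical to tanh(z)."""
    return np.tanh(z)


def purelin(z):
    """Linear transfer function."""
    return z


def poslin(z):
    """Positive-linear transfer function, max(0, z)."""
    return np.maximum(z, 0.0)


def forward(x, w1, b1, w2, b2, output_fn=purelin):
    """One hidden tansig layer followed by one output neuron; x has shape (num_vectors, 3)."""
    hidden = tansig(x @ w1.T + b1)       # (num_vectors, num_hidden), values in (-1, 1)
    return output_fn(hidden @ w2 + b2)   # (num_vectors,)


# 3-10-1 network as in Sec. 3.2.2, with deterministic illustrative weights.
w1, b1 = np.ones((10, 3)), np.zeros(10)
w2, b2 = np.ones(10), 0.0
x = np.ones((1, 3))                           # one 1 x 3 input vector
print(forward(x, w1, b1, w2, b2))             # about [9.95]: a linear output can exceed +1
print(forward(x, w1, b1, -w2, b2, poslin))    # [0.]: poslin clips negative outputs
```

Every hidden neuron is near saturation here, and the linear output reaches about 9.95. This shows why a 'purelin' output layer is not confined to the range −1 to +1.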


The spectrograms of the two known signals given in Eqns. (3.2) and (3.3) are used as input for training the ANNs. The grayscale spectrograms of these signals are shown in Figs. 3.9 and 3.10, and the respective target TF plane images are shown in Figs. 3.4 and 3.5.

3.2.3 Performance Evaluation

The four selected algorithms are compared on the basis of the time to converge, the MSE in the last epoch, and the entropy of the resultant TFDs. The spectrogram of the combined chirp and sinusoidal FM signal (Fig. 3.11) is used as the test TFD. Figs. 3.12 to 3.15 show the output TFDs obtained with the RPB, PBCGB, GDALB, and LMB training algorithms, respectively. Visually, the ANN trained with the LMB algorithm gives the result most concentrated along the individual components. The comparative graph in Fig. 3.16 indicates that the LMB algorithm converges sharply within a few epochs and reaches a lower error than the other training algorithms. It also yields the lowest entropy of all, as shown in Table 3.2, although it takes the longest wall-clock time to converge (30.18 s). The other algorithms leave various parameters at their default values. If these were tuned appropriately through an exhaustive search, their results might improve as well.

Figure 3.11: Test TFD image of combined sinusoidal FM & parallel chirps signal.
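The core Levenberg-Marquardt update can be sketched for a small one-hidden-layer network. This is a simplified illustration, not MATLAB's trainlm. The Jacobian is approximated by forward differences rather than backpropagation. The starting damping μ = 0.01 and the factor-of-10 adjustment are a common but assumed choice, and the data are stand-ins.

```python
import numpy as np


def unpack(w, n_in, n_hid):
    """Split a flat parameter vector into the weights of a one-hidden-layer network."""
    i = n_hid * n_in
    return w[:i].reshape(n_hid, n_in), w[i:i + n_hid], w[i + n_hid:i + 2 * n_hid], w[-1]


def predict(w, x, n_hid):
    w1, b1, w2, b2 = unpack(w, x.shape[1], n_hid)
    return np.tanh(x @ w1.T + b1) @ w2 + b2      # tansig hidden layer, linear output


def jacobian(w, x, n_hid, eps=1e-6):
    """Forward-difference Jacobian of the outputs with respect to the parameters."""
    base = predict(w, x, n_hid)
    jac = np.empty((len(x), len(w)))
    for k in range(len(w)):
        step = np.zeros_like(w)
        step[k] = eps
        jac[:, k] = (predict(w + step, x, n_hid) - base) / eps
    return jac


def train_lm(x, t, n_hid=5, iterations=30, mu=1e-2, seed=0):
    """Levenberg-Marquardt: solve (J^T J + mu I) dw = -J^T e, adapting mu each step."""
    n_par = n_hid * x.shape[1] + 2 * n_hid + 1
    w = np.random.default_rng(seed).normal(0.0, 0.5, n_par)
    err = predict(w, x, n_hid) - t
    history = [float(np.mean(err ** 2))]
    for _ in range(iterations):
        jac = jacobian(w, x, n_hid)
        while mu < 1e10:
            dw = np.linalg.solve(jac.T @ jac + mu * np.eye(n_par), -jac.T @ err)
            new_err = predict(w + dw, x, n_hid) - t
            if np.mean(new_err ** 2) < history[-1]:
                w, err, mu = w + dw, new_err, mu / 10   # accept: move toward Gauss-Newton
                break
            mu *= 10                                     # reject: move toward gradient descent
        else:
            break                                        # no improving step found
        history.append(float(np.mean(err ** 2)))
    return w, history


rng = np.random.default_rng(1)
x = rng.random((60, 3))    # stand-in 1 x 3 input vectors
t = x.mean(axis=1)         # stand-in targets
w, history = train_lm(x, t)
print(history[0], history[-1])
```

A large μ turns the step into a short gradient-descent step, and a small μ approaches a Gauss-Newton step. This switching is one reason LMB can converge within a few epochs.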

Figure 3.12: The resultant TFD obtained after passing the spectrogram of the test signal through the trained ANN with the RPROP backpropagation algorithm.

Figure 3.13: The resultant TFD obtained after passing the spectrogram of the test signal through the trained ANN with the Powell-Beale conjugate gradient backpropagation algorithm.

Figure 3.14: The resultant TFD obtained after passing the spectrogram of the test signal through the trained ANN with the gradient descent with adaptive learning rate backpropagation algorithm.

Figure 3.15: The resultant TFD obtained after passing the spectrogram of the test signal through the trained neural network with the Levenberg-Marquardt training algorithm.

Figure 3.16: The comparative graph showing error convergence with respect to the number of iterations.

Table 3.2: Comparison of Training Algorithms

| Parameters | LMB | PBCGB | GDALB | RPB |
|---|---|---|---|---|
| MSE performance | Available | Not available | Not available | Not available |
| Learning rate | Available | Not available | Available | Available |
| Memory/speed trade-off | Available | Not available | Not available | Not available |
| Tolerance for linear search | Not available | Available | Not available | Not available |
| Entropy value, EQ (bits) | 7.30 | 8.03 | 12.79 | 11.83 |
| Error converged | 4.49 × 10⁻⁵ | 3.69 × 10⁻⁴ | 4.32 × 10⁻² | 8.25 × 10⁻⁴ |
| Time taken for convergence (s) | 30.18 | 26.53 | 14.43 | 25.46 |

3.3 Impact of varying number of Neurons and the Hidden Layers

This section presents an experimental investigation of how the number of neurons and hidden layers in a feed-forward backpropagation ANN architecture affects the de-blurred TFDs it produces. Appendix A, Sections A.2 and A.3, discusses the significance of neurons and hidden layers in an ANN. For a specific problem, the number of neurons and hidden layers is expected to affect the performance of the ANN greatly. For this reason, ANNs with varying numbers of hidden layers and neurons are trained on the TFD of the parallel chirps signal. The spectrogram of a single chirp signal is used as the test image (Fig. 3.17). The test TFD is processed through these ANNs, and the visual effect is recorded together with the entropy and MSE values. The method remains the same as discussed in Section 3.2.1 and is not repeated here.


Figure 3.17: Test spectrogram image of single chirp signal.

3.3.1 ANN topology

Based on the results of the previous Section, the LMB training algorithm is used for learning. The activation functions are fixed as 'tansig' between the input and hidden layers and 'poslin' between the hidden and output layers. The number of hidden layers and neurons is varied to find the optimum solution for the stated problem. The input and output layers are fixed at 3 neurons and 1 neuron, respectively.


The spectrogram of the parallel chirps signal given in Eqn. (3.3) is used as input for training the ANNs. The grayscale spectrogram of this signal is shown in Fig. 3.10, and the respective target TF plane image is shown in Fig. 3.5.

Figure 3.18: The comparative graph of error vs. number of neurons in a single hidden layer (x-axis: number of neurons, 5 to 50; y-axis: error converged).

3.3.2 Effect of varying the number of Hidden Neurons

To study how the number of hidden neurons affects the computation of de-blurred TFDs, networks with 2, 3, ..., 5, 10, ..., 40, and 50 neurons in 1, 2, and 3 hidden layers were trained and tested. The networks never converged to a stable point with 30 or fewer neurons. A likely reason is that too few neurons cannot interpret all the information taken from the input grid, so they fail to convey the correct information to the next layers. The results were satisfactory with 35 neurons in the hidden layer(s). Increasing the number of neurons further brought no significant reduction in the error in the last epoch, as shown in Fig. 3.18. The entropy values are minimal for 40 or more neurons, irrespective of the number of hidden layers, as given in Table 3.3.

Table 3.3: Impact of varying neurons and hidden layers on the entropy of the resultant image

| Description | 10 neurons | 20 neurons | 30 neurons | 40 neurons | 50 neurons |
|---|---|---|---|---|---|
| EQ bits, single layer | 10.20 | 9.31 | 9.01 | 8.20 | 8.20 |
| EQ bits, two layers | 20.41 | 16.31 | 15.60 | 11.21 | 11.21 |
| EQ bits, three layers | 22.56 | 14.30 | 12.10 | 10.24 | 10.24 |

3.3.3 Effect of varying the number of Hidden Layers

Choosing the optimum number of hidden layers in an ANN architecture is one of the most important decisions in solving any problem. Fig. 3.19 shows no significant difference in the converged MSE values as the number of hidden layers varies from 1 to 3: the networks converge to approximately the same MSE values in the last epochs. Convergence also takes a similar time for 1, 2, and 3 layers, as indicated by the number of epochs in Fig. 3.19. Next, the effect on the quality of the resultant TFD is examined by trying all rational combinations of the number of layers and the number of neurons in those layers. This Section shows only the resultant TFD images for a few selected combinations (Figs. 3.20 to 3.26). The important point is that the resultant TFDs actually deteriorate when more hidden layers are used. This empirical study is consistent with the observation that many complex nonlinear problems can be solved with only a single hidden layer. The entropy values recorded in Table 3.3 point to the same conclusion. For the stated problem, the ANN architecture with a single hidden layer and an appropriate number of neurons yields the most concentrated (lowest-entropy) output.

Figure 3.19: The comparative graph of error vs. epochs for various numbers of hidden layers (x-axis: epochs; y-axis: MSE, logarithmic scale; curves: 1, 2, and 3 hidden layers).

Figure 3.20: Resultant TFD (2 hidden layers with 50 neurons in each).


Figure 3.25: Resultant TFD (single layer with 40 neurons).

Figure 3.26: Resultant TFD (single hidden layer with 30 neurons).

3.4 Effect of Data Clustering and using Multiple ANNs for each Cluster

This Section examines how clustering the data and training multiple ANNs for each cluster (CDMN) helps to obtain high-resolution TFDs. Training does not give the same results every time. The weights are initialized to random values, and a rising validation error may stop the training early. Moreover, a network trained with selected input performs significantly better than one that does not receive selected input data for training. The training TFDs remain the same: their mathematical description is given in Section 3.1, and Figs. 3.9–3.10 and 3.4–3.5 show the input and target TFDs, respectively. The spectrogram of the combined chirp and sinusoidal FM signal (Fig. 3.11) is used as the test TFD. The performance of CDMN is checked by computing the entropy, the MSE, and the time consumed for convergence.

3.4.1 Advantages of Clustering and Training Multiple ANNs

Clustering groups entities based on the similarity of their features. Entities within a group are similar to each other and different from entities in other groups. A different network is then trained for each cluster, so no single network is forced to learn input vectors that are distant from each other.

Using multiple ANNs for each cluster is also expected to be advantageous, because the weights are initialized to random values and each trained network may therefore reach a different solution. When a network begins to overfit the data, the error on the validation set typically begins to rise. Once the validation error has increased for a specified number of iterations, training stops, and the weights and biases at the minimum of the validation error are returned. The network error, or performance, is available in the training record. Tracking it allows the best network in terms of training performance to be selected for each cluster.
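The following sketch trains several randomly initialized networks with early stopping and keeps the best one. For brevity, it uses plain gradient descent in place of the LMB algorithm. It also ranks networks by their best validation MSE rather than the MSE in the last epoch. The data, network size, learning rate and patience are all illustrative assumptions.

```python
import numpy as np


def init_params(rng, n_in, n_hidden):
    return {"w1": rng.normal(0.0, 0.5, (n_hidden, n_in)), "b1": np.zeros(n_hidden),
            "w2": rng.normal(0.0, 0.5, n_hidden), "b2": np.zeros(1)}


def predict(p, x):
    h = np.tanh(x @ p["w1"].T + p["b1"])        # tansig hidden layer
    return h, h @ p["w2"] + p["b2"][0]           # linear output neuron


def mse(p, x, t):
    return float(np.mean((predict(p, x)[1] - t) ** 2))


def train_with_early_stopping(x_tr, t_tr, x_val, t_val, seed, n_hidden=8,
                              lr=0.1, max_epochs=1000, patience=6):
    """Gradient-descent training that keeps the weights at the validation minimum."""
    p = init_params(np.random.default_rng(seed), x_tr.shape[1], n_hidden)
    best = {k: v.copy() for k, v in p.items()}
    best_val, bad_epochs = mse(p, x_val, t_val), 0
    for _ in range(max_epochs):
        h, y = predict(p, x_tr)
        e = 2.0 * (y - t_tr) / len(t_tr)              # d(MSE)/d(output)
        dh = np.outer(e, p["w2"]) * (1.0 - h ** 2)    # back-propagate through tanh
        p["w2"] -= lr * (h.T @ e)
        p["b2"] -= lr * e.sum()
        p["w1"] -= lr * (dh.T @ x_tr)
        p["b1"] -= lr * dh.sum(axis=0)
        val = mse(p, x_val, t_val)
        if val < best_val:
            best, best_val, bad_epochs = {k: v.copy() for k, v in p.items()}, val, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:                # validation error stopped improving
                break
    return best, best_val


def select_expert(x_tr, t_tr, x_val, t_val, n_networks=3):
    """Train several randomly initialized networks and keep the lowest-error one."""
    runs = [train_with_early_stopping(x_tr, t_tr, x_val, t_val, seed=s)
            for s in range(n_networks)]
    errors = [val for _, val in runs]
    best_index = int(np.argmin(errors))
    return best_index, runs[best_index][0], errors


rng = np.random.default_rng(42)
x = rng.random((200, 3))       # stand-in 1 x 3 vectors of one cluster
t = x.mean(axis=1)             # stand-in targets (mean of target vectors)
index, expert, errors = select_expert(x[:150], t[:150], x[150:], t[150:])
print("expert network:", index, "validation MSEs:", errors)
```

Each run returns the weights at its validation minimum, mirroring the early-stopping behavior described above. The selected index plays the role of the expert network for that cluster.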

3.4.2 The Network Architecture and Procedure

Based on the results of Sections 3.2 and 3.3, the LMB training algorithm is used for learning, with a single hidden layer of 40 neurons [224]. The activation functions are fixed as 'tansig' between the input and hidden layers and 'poslin' between the hidden and output layers. The input and output layers are fixed at 3 neurons and 1 neuron, respectively.

The procedural steps for training and testing are the same as described in Section 3.2.1, with two exceptions: multiple ANNs are trained for each cluster, and the best ANN for each cluster is selected based on the MSE in the last epoch. These selected networks are termed expert networks (ENs).


The available data is clustered into three regions based on the three intuitive edge types present in the data, as discussed in Section 3.2.1. Three ANNs are trained for each cluster. The spectrogram of the combined chirp and sinusoidal FM signals (Fig. 3.11) is used as the test TFD. The results are shown in Figs. 3.27 and 3.28. The result obtained with CDMN is clearly more concentrated than the result obtained without it.

Figure 3.27: Resultant TFD obtained after processing the test TFD with a single ANN without data clustering.

3.4.3.1 MSE performance

To compare MSE performance, Fig. 3.29 plots the MSEs converged by the various ANNs trained for each cluster. ANN-1 displays the minimum MSE only once out of three times (about 33%), whereas ANN-2 is found to be the best, as it meets this criterion twice. Fig. 3.30 plots the MSE convergence rates of the various ANNs. For clusters 1 and 3, the MSE of ANN-2 converges sharply within 22 and 12 epochs, respectively, before early stopping halts training to avoid overfitting. ANN-1 converges rather flatly over 50 and 15 epochs. ANN-2 therefore has a sharper rate of MSE convergence than the other ANNs trained for these clusters. Significantly, for the third cluster, ANN-1 is even the worst of the three ANNs, as is evident from Fig. 3.30(c).

Figure 3.28: Resultant TFD obtained after processing the test TFD with multiple ANNs after data clustering.

3.4.3.2 Information analysis

The advantage of CDMN is further confirmed by computing the information content of each resultant TFD, i.e., by measuring the entropy of the resultant images according to Eqn. (3.1). CDMN produces the most concentrated resultant TFD, with the lowest entropy of all, as shown in Table 3.4.

Figure 3.29: The MSE in the last epoch for (a) ANNs trained for cluster 1, (b) ANNs trained for cluster 2, (c) ANNs trained for cluster 3.

Figure 3.30: Rate of MSE convergence against epochs for (a) ANNs trained for cluster 1, (b) ANNs trained for cluster 2, (c) ANNs trained for cluster 3.

Table 3.4: Comparison of Entropies

| Description of approach | EQ (bits) |
|---|---|
| Single ANN without clustering | 15.987 |
| Single ANN with clustering | 13.768 |
| Multiple ANNs without clustering | 11.667 |
| CDMN | 8.223 |
| Spectrogram | 29.743 |
| WD | 18.915 |

3.4.3.3 Convergence time taken by various ANNs

The MSE converged by the various ANNs is plotted against epochs in Fig. 3.31, which indicates the time consumed by the different ANNs trained for each cluster to approach the minimum MSE. This time is not exactly controllable, owing to factors such as early stopping and random weight initialization for the same number of epochs, so it is not guaranteed that the EN will consume the minimum time for convergence. However, it can be observed from these comparative graphs that ANN-3 consumes the minimum time for two of the clusters, and that ANN-1 takes more time to converge than the other networks trained for the same cluster. This supports the view that training only one network for a task may not always give the best results.

Figure 3.31: The convergence time taken by various ANNs for each cluster of data: (a) ANNs trained for cluster 1, (b) ANNs trained for cluster 2, (c) ANNs trained for cluster 3.

3.5 Summary

A novel ANN based approach is presented for computing de-blurred TFDs. The approach is progressively refined by optimizing various ANN parameters. Section 3.1 presented a method of computing informative and highly concentrated binary TFDs of signals whose frequency components vary with time. In Section 3.2, the performance of various ANN training algorithms is evaluated on the basis of time, error and entropy analysis. The LMB training algorithm is found to be the best training algorithm, resulting in high resolution TFDs. The simulation results presented in Section 3.3 indicate that the ANN architecture composed of a single hidden layer with 40 neurons effectively removes the blur from the unknown spectrograms. The MSE in the last epoch is minimum for this ANN topology, and it yields the lowest entropy value in Table 3.3. Increasing the number of neurons and hidden layers is found to increase the complexity of the network; moreover, such networks are unsuitable, as manifested by both the visual (Figs. 3.20 to 3.26) and the numerical findings (Table 3.3). Understanding the effect of these parameters has a major bearing on future research in this direction.

Section 3.4 evaluates the performance of training the CDMN on the basis of time, error and entropy analysis. It is found that a mixture of ENs, each focused on a specific task, delivers a TFD that is highly concentrated along the IF. Experimental results demonstrate the effectiveness of the approach.

Chapter 4 Neural Network based Framework for Computing De-blurred TFDs: Part II

This Chapter presents an optimized BRNNM-based correlation vectored taxonomy algorithm to compute TFDs that are highly concentrated in the TF plane, extending the scope of the approach presented in the previous Chapter. The ideas of improved clustering, a generalized training set and Bayesian regularization in the ANNs' training phase are incorporated, which greatly enhance the final results. The degree of regularization is automatically controlled in the Bayesian inference framework, producing networks with better generalization performance and lower susceptibility to over-fitting. The elbow criterion is used to find the optimum number of clusters for the stated problem, which is found to have a positive impact on the results. The input and target TFDs are also made more general: the grayscale spectrograms and pre-processed WDs of known signals are vectorized and clustered as per the elbow criterion to constitute the training data for multiple ANNs. The best trained networks are selected and made part of the localized neural networks (LNNs). Test TFDs of unknown signals are then processed through the algorithm and presented to the LNNs. Experimental results demonstrate that appropriately vectored and clustered data, combined with regularized training under MacKay's evidence framework, produce high resolution TFDs once processed through the LNNs. A real life signal is tested to show the effectiveness of the proposed algorithm via analysis based on entropy and visual interpretation.

4.1 The ANN based Framework's Description

Fig. 4.1 shows the overall block diagram of the model. The method employs Bayesian regularization during the ANNs' training phase to obtain energy concentration along the IF of individual components for unknown blurred TFDs. The TFDs are treated as 2-D images, and vectors of different nature are recognized based on the various edges present in these images. These vectors are separated and clustered according to the elbow criterion. Multiple Bayesian regularized neural networks (BRNNs) are trained for each group of vectors in a cluster, and by keeping track of the network error or performance, accessible via the training record, the best network is selected in terms of training performance. These selected networks are the LNNs, each specialized for one type of vector and having better generalization abilities. Together, the LNNs are termed a network of expert neural networks (NENNs). In this way, forcing one network to learn input patterns that are distant from each other is avoided [110].

The block diagram of Fig. 4.1 highlights three major modules of the method, which are drawn separately in Fig. 4.2 for clarity. The modules are (i) pre-processing of training data, (ii) processing through the BRNNM and (iii) post-processing of output data. These major modules are further elaborated in Figs. 4.3 to 4.13. The modules and the rationale of the proposed method are described below.

Figure 4.1: Flow diagram of the method

Figure 4.2: Major modules of the method: input pre-processing, BRNNM and output post-processing.

4.1.1 Pre-processing of Training Data

Fig. 4.3 depicts the block diagram of this module. It consists of five major steps, namely (i) two-step pre-processing of target TFDs, (ii) vectorization, (iii) subspace selection and direction vectors, (iv) correlation, and (v) taxonomy. They are described as follows.

4.1.1.1 Two-step pre-processing of target TFDs

The highly concentrated WDs of various known signals are used as the target TFDs. As shown in Fig. 4.5, the WD suffers from CTs, which make it unsuitable to be presented as a target to the ANNs [218]. This is further illustrated separately for these target TFDs in Figs. 4.6 and 4.8, where the CTs are clearly visible in the binary versions.

Figure 4.3: Pre-processing of training data

Figure 4.4: The spectrograms used as input training images of the (a) sinusoidal FM and (b) parallel chirps signals.

The CTs are therefore eliminated before the WD is fed to the ANN. This is achieved in two steps:

1. The WD is multiplied point by point with the spectrogram of the same signal, obtained with a Hamming window of reasonable size.

2. All values below a certain threshold are set to zero.
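The two steps can be sketched as follows, assuming the WD and the spectrogram have already been computed on the same TF grid. The helper name and the relative threshold (a fraction of the product's maximum) are hypothetical choices; the text states only that values below "a certain threshold" are set to zero.

```python
import numpy as np

def remove_cross_terms(wd, spec, rel_threshold=0.05):
    """Two-step CT suppression for a target WD (hypothetical helper).

    Step 1: multiply the WD point by point with the spectrogram of the same signal.
    Step 2: set every value below rel_threshold * max to zero.
    """
    wd = np.asarray(wd, dtype=float)
    spec = np.asarray(spec, dtype=float)
    if wd.shape != spec.shape:
        raise ValueError("WD and spectrogram must share the same TF grid")
    product = wd * spec                      # CTs where the spectrogram is ~0 are damped
    threshold = rel_threshold * product.max()
    return np.where(product >= threshold, product, 0.0)

# Toy 2x2 example: the negative WD value and the value at a TF cell where the
# spectrogram has no energy play the role of cross-terms.
wd = np.array([[5.0, -3.0], [4.0, 2.0]])
spec = np.array([[1.0, 0.9], [0.0, 0.8]])
target = remove_cross_terms(wd, spec, rel_threshold=0.2)
```

The product keeps the sharp auto-terms of the WD only where the spectrogram also sees energy, and negative WD values (typical of cross-terms) fall below any positive threshold.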

The resultant target TFDs are shown in Figs. 4.7(a) and 4.9(a); they are fed to the ANN after the vectorization described below.

Figure 4.5: Target TFDs with CTs, unsuitable for training the ANN, taking the WD of the (a) parallel chirps signal and (b) sinusoidal FM signal.

Figure 4.6: The non-processed WD target images of the sinusoidal FM signal: (a) grayscale version, (b) binary version.

Figure 4.7: The pre-processed WD target image of the sinusoidal FM signal: (a) grayscale version, (b) binary version.

Figure 4.8: The non-processed WD target images of the parallel chirps signal: (a) grayscale version, (b) binary version.

Figure 4.9: The pre-processed WD target image of the parallel chirps signal: (a) grayscale version, (b) binary version.

4.1.1.2 Vectorization

(1) Input TFDs. Fig. 4.4 depicts the input spectrograms. They are considered as 2-D images consisting of pixels having appropriate grayscale values, e.g.,

$$\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}$$

These pixel values can be used to generate vectors; for example, a vector of length three contains three consecutive pixel values from a row of the TFD image, and this is the choice adopted for the input TFDs. The size of the TFD image matrix is pre-adjusted to avoid any leftover row or column. The vector length was decided after experimenting with various vector lengths (3, 5, 7 and 9), the decision being based on the visual results. Each input TFD image is thus converted to vectors of a particular length. These vectors are paired with the vectors obtained from the target TFDs, to be subsequently used for training.

(2) Target TFDs. The target WDs are made free of CTs using the two-step procedure described above. The mean value of each group of three pixels is computed from the region of the target TFD corresponding to the input vector. For example, if ⟨a_11, a_12, a_13⟩ is a vector of pixels from the input TFD and ⟨b_11, b_12, b_13⟩ is the vector representing the corresponding region of the target TFD, then (b_11 + b_12 + b_13)/3 becomes the target numerical value for that input vector of length three. Mean values are taken as targets because the IF can be computed by averaging the frequencies at each time instant, a definition suggested by many researchers [31].
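A minimal sketch of the vectorization and target computation, assuming non-overlapping row-wise vectors of length three and cropping of leftover columns as the way the image size is "pre-adjusted"; the helper name is hypothetical. The k-th vector comes from row k // (n/L) and starts at column (k mod (n/L))·L, which is the grid position needed later to rebuild the TFD.

```python
import numpy as np

def vectorize_pairs(input_tfd, target_tfd, L=3):
    """Cut each image row into non-overlapping length-L vectors (row-major order)
    and pair each input vector with the mean of the co-located target pixels."""
    x = np.asarray(input_tfd, dtype=float)
    t = np.asarray(target_tfd, dtype=float)
    if x.shape != t.shape:
        raise ValueError("input and target TFDs must have the same size")
    m, n = x.shape
    n_used = n - n % L                        # drop leftover columns
    X = x[:, :n_used].reshape(-1, L)          # one row of X per input vector
    y = t[:, :n_used].reshape(-1, L).mean(axis=1)
    return X, y

X, y = vectorize_pairs(np.arange(12).reshape(2, 6), np.arange(12).reshape(2, 6))
```

Here the 2 × 6 toy image yields four vectors, [0 1 2], [3 4 5], [6 7 8] and [9 10 11], with targets 1, 4, 7 and 10.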

4.1.1.3 Subspace selection and direction vectors

1. Elbow Criterion. The elbow criterion is a common rule of thumb for determining how many clusters should be chosen. It states that the number of clusters should be chosen so that adding another cluster does not add sufficient information [223]. More precisely, if the percentage of variance explained by the clusters is plotted against the number of clusters, the first clusters will add much information (explain a lot of variance), but at some point the marginal gain will drop, giving an angle in the graph (the elbow). This elbow cannot always be unambiguously identified. In the graph of Fig. 4.10, which is drawn for the problem at hand, the elbow is indicated by the "goose egg". The number of clusters chosen is therefore three.

2. The number of subspaces Ns into which the vectors will be distributed is selected based on the elbow criterion in relation to underlying image features, such as the edges present in the data. As mentioned in the previous Chapter, the edge is considered because it is one of the most important underlying image features. Moreover, it is a well-established fact that blurring mostly causes loss of edge information [111]. An edge could be ascending (1, 2, 3), descending (3, 2, 1), wedge (1, 3, 2), flat (1, 1, 1), triangular (1, 3, 1), etc. Empirically, it is found that going from three to four clusters does not add sufficient information, as the end result shows no significant change in entropy values, as indicated in Table 4.1 and evident from Fig. 4.10. The impact of clustering is noted for six different test images (TIs), shown in Chapter 5. As a result of this study, Ns = 3 is chosen, considering the first three most general types of edges.

3. The subspace direction vectors v_n (n = 1, 2, ..., Ns) that best represent the subspaces are selected. As these subspaces are defined on the basis of edges, three direction vectors v_h, v_c, v_l are computed in the following manner:

(a) v_h is obtained by arranging (any) 3 integers in descending order.

(b) v_c is obtained by arranging (any) 3 integers in a wedge shape, where the highest value occurs in the middle and the values on either side are in descending order.

(c) v_l is obtained by arranging (any) 3 integers in ascending order.

4. All the direction vectors v_h, v_c, v_l are normalized.

Figure 4.10: Elbow criterion
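The text locates the elbow of Fig. 4.10 visually. One way to make that choice reproducible, sketched below as an assumption rather than the thesis's own procedure, is to pick the number of clusters after which the marginal entropy reduction drops the most. The helper name is hypothetical; "No cluster" is treated as one cluster, and the input is the TI 1 column of Table 4.1.

```python
def elbow_from_entropies(entropies):
    """entropies[k - 1] is the entropy of the resultant TFD obtained with k clusters.
    Returns the k after which the marginal entropy reduction drops the most."""
    if len(entropies) < 3:
        raise ValueError("need entropies for at least three cluster counts")
    # gains[j]: entropy reduction when going from j + 1 to j + 2 clusters
    gains = [entropies[j] - entropies[j + 1] for j in range(len(entropies) - 1)]
    # drops[j]: how much smaller the next gain is than gains[j]
    drops = [gains[j] - gains[j + 1] for j in range(len(gains) - 1)]
    j_best = max(range(len(drops)), key=lambda j: drops[j])
    return j_best + 2

ti1 = [20.539, 13.523, 8.623, 8.101, 7.998, 7.877]   # TI 1: 1 to 6 clusters
print(elbow_from_entropies(ti1))
```

For TI 1 the reductions are about 7.02, 4.90, 0.52, 0.10 and 0.12 bits, so the largest fall in marginal gain occurs after three clusters, in line with Ns = 3.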

4.1.1.4 Correlation

An input vector x_i is chosen from the input spectrogram. The correlation between each input vector x_i from the input TFD and each direction vector v_h, v_c, v_l is calculated, i.e. t_ij = x_i^T v_j is computed, where j = h, c, l.
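The direction vectors and the correlation step can be sketched as below, using the integers 1, 2 and 3 shown in Fig. 4.11 (the text allows any three integers). The row order (h, c, l) and the helper name are assumptions.

```python
import numpy as np

# Rows: v_h (descending), v_c (wedge, maximum in the middle), v_l (ascending)
V = np.array([[3, 2, 1],
              [1, 3, 2],
              [1, 2, 3]], dtype=float)
V /= np.linalg.norm(V, axis=1, keepdims=True)   # normalise each direction vector

def correlate(X):
    """T[i, j] = x_i^T v_j for every input vector x_i and j in (h, c, l)."""
    return np.atleast_2d(np.asarray(X, dtype=float)) @ V.T

T = correlate([[1, 2, 3], [3, 2, 1], [1, 3, 2]])
```

For x = [1 2 3] the three products are (10, 13, 14)/√14, so the ascending direction vector gives the largest value.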

4.1.1.5 Taxonomy

Table 4.1: Entropy values vs. number of clusters

              EQ (bits) for test TFDs
Description   TI 1     TI 2     TI 3     TI 4     TI 5     TI 6
No cluster    20.539   18.113   18.323   19.975   21.548   17.910
2 clusters    13.523   12.294   12.421   11.131   14.049   11.940
3 clusters    8.623    6.629    7.228    5.672    8.175    6.948
4 clusters    8.101    6.300    7.202    5.193    8.025    6.733
5 clusters    7.998    6.187    7.111    5.012    7.939    6.678
6 clusters    7.877    6.015    7.019    5.995    7.883    6.661

1. For each input vector x_i, Ns product values are obtained as a result of the last step. The best match is the largest value: if t_ic has the largest value, this indicates that the input x_i is most similar to the direction vector v_c, which implies that the vector is of wedge type.

2. Step (1) is repeated for all input vectors. Consequently, all the vectors are classified based on the type of edge they represent, and Ns clusters are obtained. The input spectrograms are depicted in Fig. 4.4, and the statistical data giving the number of vectors of each type for these two TFD images are shown in Table 4.2.

3. Pairs of input vectors and targets (mean values of the pixels of the corresponding

window from the target TFD) are formed.

4. These pairs are divided into a training set and a validation set for the training phase, and over-fitting is avoided by observing the error on these two sets.

These steps of vectorization, correlation and taxonomy are further elaborated

in graphical form by Fig. 4.11.
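The taxonomy steps can be sketched as follows: each (x_i, y_i) pair goes to the edge type with the largest correlation t_ij, and each cluster is then split into training and validation sets by taking alternate pairs, as described for the training phase. The cluster labels, the dictionary layout and the helper name are assumptions.

```python
import numpy as np

LABELS = ("descending", "wedge", "ascending")           # order of the rows of V
V = np.array([[3, 2, 1], [1, 3, 2], [1, 2, 3]], dtype=float)
V /= np.linalg.norm(V, axis=1, keepdims=True)

def taxonomy(X, y):
    """Assign each (x_i, y_i) pair to the edge type with the largest t_ij, then
    split every cluster into training and validation sets by alternate pairs."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    winner = np.argmax(X @ V.T, axis=1)   # np.argmax returns the first index on exact ties
    clusters = {}
    for j, label in enumerate(LABELS):
        idx = np.flatnonzero(winner == j)
        clusters[label] = {"train": (X[idx[0::2]], y[idx[0::2]]),
                           "val": (X[idx[1::2]], y[idx[1::2]])}
    return clusters

X = [[1, 2, 3], [3, 2, 1], [1, 3, 2], [2, 4, 6]]
y = [2.0, 2.0, 2.1, 4.0]
clusters = taxonomy(X, y)
```

Both [1 2 3] and [2 4 6] fall into the ascending cluster; the first goes to its training set and the second to its validation set.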

Figure 4.11: Vectorization, correlation and taxonomy of a TFD image.

4.1.2 Processing through Bayesian Regularized Neural Network Model

Fig. 4.12 represents this module. There are three steps in this module, namely (i) training of the BRNNM, (ii) selecting the LNNs, and (iii) testing the LNNs. They are discussed in the following subsections.

4.1.2.1 Training of BRNNM

Figure 4.12: Bayesian regularised neural network model

Table 4.2: Cluster parameters

Parameter (input training TFDs)               Cluster 1       Cluster 2       Cluster 3
                                              (ascending)     (descending)    (wedge)
Vectors from spectrogram of sinusoidal FM     19157           18531           112
Vectors from spectrogram of parallel chirps   4817            4959            52
Best ANN taken as LNN                         ANN-3           ANN-2           ANN-1
Time taken by the best ANN to train           308 s           114 s           55 s
MSE converged by the best ANN (LNN)           2.54 × 10^-4    3.56 × 10^-4    1.38 × 10^-2

1. Since the ANN is being used in a data-rich environment to provide high resolution TFDs, it is important that it performs well on data it has not seen before, i.e. that it can generalize. To make sure that the network does not become over-trained, the error is monitored on a subset of the data that does not take part in the training. This subset, kept separate from the training set, is called the validation set. If the error on the validation set increases, the training stops. For this purpose, alternate pairs of vectors from the input and target TFDs are assigned to the training and validation sets.

2. The input vectors x_i and the mean values y_i of the pixel values of the corresponding window from the target TFD image are used to train multiple ANNs under the Bayesian framework. Three ANNs are trained for each cluster, three being taken as the smallest number with which to check the advantage of training multiple ANNs. This choice has no relation to the number of subspaces or direction vectors.

3. Step (2) is repeated until all pairs of input and corresponding target vectors are

used for training.
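Bayesian-regularized Levenberg-Marquardt training, in which the regularization weights are re-estimated under MacKay's evidence framework, is too long for a short example. The sketch below replaces it with plain full-batch gradient descent and a fixed weight-decay penalty, and keeps the other ingredients described above: a 40-neuron tansig hidden layer with a poslin output, validation-based early stopping, and three networks per cluster that differ only in their random initial weights. The learning rate, decay, max_fail, the toy data and all names are hypothetical.

```python
import numpy as np

def forward(params, X):
    """tansig hidden layer followed by a single poslin output neuron."""
    W1, b1, w2, b2 = params
    H = np.tanh(X @ W1.T + b1)          # tansig is mathematically equal to tanh
    z = H @ w2 + b2
    return H, z, np.maximum(z, 0.0)     # poslin output

def train_one(Xtr, ytr, Xval, yval, seed, hidden=40, lr=0.05, decay=1e-4,
              epochs=2000, max_fail=6):
    """Full-batch gradient descent on 0.5*MSE plus weight decay, with early stopping."""
    rng = np.random.default_rng(seed)                   # random weight initialisation
    params = [rng.normal(0.0, 0.5, (hidden, Xtr.shape[1])), np.zeros(hidden),
              rng.normal(0.0, 0.5, hidden), 0.1]
    best_mse, best_params, fails = np.inf, params, 0
    for _ in range(epochs):
        W1, b1, w2, b2 = params
        H, z, out = forward(params, Xtr)
        dz = (out - ytr) * (z > 0) / len(ytr)            # gradient through poslin
        dH = np.outer(dz, w2) * (1.0 - H ** 2)           # gradient through tanh
        params = [W1 - lr * (dH.T @ Xtr + decay * W1),
                  b1 - lr * dH.sum(axis=0),
                  w2 - lr * (H.T @ dz + decay * w2),
                  b2 - lr * dz.sum()]
        val_mse = float(np.mean((forward(params, Xval)[2] - yval) ** 2))
        if val_mse < best_mse:                           # keep weights at the validation minimum
            best_mse, best_params, fails = val_mse, params, 0
        else:
            fails += 1
            if fails >= max_fail:                        # no improvement for max_fail epochs
                break
    return best_mse, best_params

def train_cluster(Xtr, ytr, Xval, yval, n_nets=3):
    """Train n_nets networks that differ only in their random initialisation."""
    return [train_one(Xtr, ytr, Xval, yval, seed=s) for s in range(n_nets)]

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, (200, 3))
y = X.mean(axis=1)                                       # toy targets: window means
results = train_cluster(X[0::2], y[0::2], X[1::2], y[1::2])
```

Each entry of results holds a network's best validation MSE and its weights; because the initial weights differ, the three networks generally stop at different MSEs, which is what makes selecting the best one worthwhile.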


4.1.2.2 LNNs' selection

1. As mentioned above, three ANNs are trained for each cluster, and the best one for each cluster has to be selected. The "training record" is a programmed structure in which the training algorithm saves the performance on the training set, test set and validation set, as well as the epoch number and the learning rate. By keeping track of the network error or performance, accessible via the training record, the best network is selected for each cluster. These best networks for the respective clusters are called the LNNs.

2. Training multiple networks for each cluster is found to be advantageous because the weights are initialized to random values. When a network begins to over-fit the data, the error on the validation set typically begins to rise; if this happens for a specified number of iterations, the training is stopped, and the weights and biases at the minimum of the validation error are retained. As a result, different networks end with different MSEs in the last training epoch. The ANN with the minimum MSE is the winner and is included in the LNNs. Three ANNs are trained for each of the three clusters, and it is found that ANN-3 and ANN-2 are the best for the first and second clusters respectively, while ANN-1 is the best only for the third cluster, as is also evident from Table 4.2. It is assumed that these selected ANNs are optimally trained and possess better generalization abilities.
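The selection rule itself is short. The sketch below assumes each training record has been reduced to the MSE in the last epoch; the cluster names, network names and MSE values are hypothetical, chosen only so that the winners mirror the pattern reported in Table 4.2.

```python
def select_lnns(training_records):
    """For every cluster, return the name of the network with the smallest
    MSE in its last training epoch (the LNN for that cluster)."""
    return {cluster: min(nets, key=nets.get)
            for cluster, nets in training_records.items()}

# Hypothetical last-epoch MSEs, for illustration only
records = {
    "ascending":  {"ANN-1": 6e-4, "ANN-2": 5e-4, "ANN-3": 3e-4},
    "descending": {"ANN-1": 7e-4, "ANN-2": 4e-4, "ANN-3": 6e-4},
    "wedge":      {"ANN-1": 2e-2, "ANN-2": 5e-2, "ANN-3": 4e-2},
}
lnns = select_lnns(records)
```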


4.1.2.3 Testing of BRNNM

1. Test TFDs are converted to vectors (z_i) and clustered after correlating them with the direction vectors, as done for the input TFDs.

2. Each test vector z_i is fed to the LNN trained for its type.

3. These steps are repeated until all test vectors have been tested.
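The testing steps amount to routing each test vector to the LNN of its edge type. In the sketch below the LNNs are represented by arbitrary callables (any trained network with a predict-like function would do); the helper name and the stub networks are hypothetical.

```python
import numpy as np

V = np.array([[3, 2, 1], [1, 3, 2], [1, 2, 3]], dtype=float)   # v_h, v_c, v_l
V /= np.linalg.norm(V, axis=1, keepdims=True)

def apply_lnns(Z, lnns):
    """Route each test vector z_i to the LNN of its edge type.

    lnns holds three callables (for v_h, v_c, v_l), each mapping a (k, 3)
    array of vectors to k output values."""
    Z = np.asarray(Z, dtype=float)
    winner = np.argmax(Z @ V.T, axis=1)
    out = np.zeros(len(Z))
    for j, net in enumerate(lnns):
        idx = np.flatnonzero(winner == j)
        if idx.size:
            out[idx] = net(Z[idx])
    return out

# Stub "networks" that simply report which LNN was used (1 = h, 2 = c, 3 = l)
stubs = [lambda Z, k=k: np.full(len(Z), float(k)) for k in (1, 2, 3)]
print(apply_lnns([[1, 2, 3], [3, 2, 1], [1, 3, 2]], stubs))
```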

4.1.3 Post-processing of the Output Data

This module is illustrated in Fig. 4.13. After the testing phase, the resultant data are post-processed to obtain the resultant TFD. Since one value is obtained for each vector of length three from the test TFD after processing through the LNNs, there are two possibilities for filling the remaining two pixels: either (i) replicate the same value in the other two places, or (ii) use zero padding around this single value to complete the number of pixels. Zero padding is preferred because it is found to reduce the blur in the TF plane. Next, the resultant vectors of correct length are placed back at the original positions from which they were correlated and clustered, according to the initially stored grid positions.
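A minimal sketch of option (ii), assuming that the outputs are kept in the same row-major order in which the test TFD was vectorized, so that the stored grid position of vector k is implied by its index. Placing the value in the centre of the three pixels is an assumed reading of "zero padding around this single value", and the helper name is hypothetical.

```python
import numpy as np

def rebuild_tfd(values, shape, L=3):
    """Place one LNN output per length-L vector back on the TFD grid.

    values[k] is the output for the k-th vector, in the row-major order used during
    vectorization; shape = (m, n) is the trimmed TFD size with n divisible by L.
    Each value is zero-padded into a length-L vector, here placed in the centre."""
    m, n = shape
    values = np.asarray(values, dtype=float)
    if n % L or values.size != m * n // L:
        raise ValueError("expected one value per length-L vector of an (m, n) grid")
    vectors = np.zeros((values.size, L))
    vectors[:, L // 2] = values          # zero padding around the single value
    return vectors.reshape(m, n)         # row-major order restores grid positions

tfd = rebuild_tfd([1.0, 2.0, 3.0, 4.0], (2, 6))
```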

Figure 4.13: Post-processing of the output data

4.2 Performance Evaluation

To address the stated problem, the Bayesian regularized LMB training algorithm is used with a feed-forward back-propagation ANN architecture having 40 neurons in its single hidden layer. This architecture is chosen after an empirical study [224, 226], in which we experimented with various training algorithms and different parameters, such as the activation functions between layers, the number of hidden layers and the number of neurons. The positive impact of localised processing, by selecting the best trained ANN out of many, is also ascertained [227]. The 'tansig' and 'poslin' transfer functions are used for the hidden layer of sigmoid neurons and for the output layer of positive linear neurons, respectively. Multiple layers of neurons with nonlinear transfer functions allow the network to learn linear and nonlinear relationships between input and output vectors, and the linear output layer lets the network produce values outside the range [-1, +1].

To train the BRNNM, the spectrograms and WDs of two signals are used as the input and target TFDs respectively. The first signal is a sinusoidal FM signal, given by:

$$x(n) = e^{-i\pi\left\{\frac{5}{2}+\omega(n)\right\}n} \qquad (4.1)$$

where $\omega(n) = 0.1\sin\left(\frac{2\pi n}{N}\right)$ and N refers to the number of sampling points. The spectrogram of this signal is depicted in Fig. 4.4(a), and the respective target TFD, obtained through the WD, is depicted in Fig. 4.7(a).

The second signal consists of two parallel chirps, given by:

$$Y(n) = x_1(n) + x_2(n) \qquad (4.2)$$

where

$$x_1(n) = e^{i\omega_1(n)n} \quad \text{with} \quad \omega_1(n) = \frac{\pi n}{4N}$$

and

$$x_2(n) = e^{i\omega_2(n)n} \quad \text{with} \quad \omega_2(n) = \frac{\pi}{3} + \frac{\pi n}{4N},$$

where N refers to the total number of sampling points in the signal. The spectrogram of this signal is depicted in Fig. 4.4(b), and the respective target TFD, obtained through the WD, is depicted in Fig. 4.9(a). The model's performance is evaluated using the test TFD of a bat echolocation chirps signal, whose spectrogram is shown in Fig. 4.14(a). As discussed, entropy can be considered a measure of concentration [107]: the lower the entropy of a distribution, the more concentrated it is. The expression given by Eqn. (3.1) is used to quantify the TFDs' performance in terms of concentration.
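The two training signals and a Hamming-window spectrogram can be generated as sketched below. The sketch reads Eqn. (4.1) as x(n) = exp(-iπ(5/2 + ω(n))n) with ω(n) = 0.1 sin(2πn/N), and Eqn. (4.2) as the sum of chirps with ω_1(n) = πn/(4N) and ω_2(n) = π/3 + πn/(4N); these readings of the extracted equations are assumptions, as are the window length, hop, FFT size and function names.

```python
import numpy as np

def sinusoidal_fm(N):
    """Eqn. (4.1), read as x(n) = exp(-i*pi*(5/2 + w(n))*n), w(n) = 0.1*sin(2*pi*n/N)."""
    n = np.arange(N)
    w = 0.1 * np.sin(2 * np.pi * n / N)
    return np.exp(-1j * np.pi * (2.5 + w) * n)

def parallel_chirps(N):
    """Eqn. (4.2), read as exp(i*w1*n) + exp(i*w2*n), w1 = pi*n/(4N), w2 = pi/3 + w1."""
    n = np.arange(N)
    w1 = np.pi * n / (4 * N)
    w2 = np.pi / 3 + np.pi * n / (4 * N)
    return np.exp(1j * w1 * n) + np.exp(1j * w2 * n)

def spectrogram(x, L=64, hop=1, nfft=256):
    """|STFT|^2 with a Hamming window of length L (rows: frequency bins, columns: time)."""
    win = np.hamming(L)
    starts = range(0, len(x) - L + 1, hop)
    frames = np.array([x[s:s + L] * win for s in starts])
    return (np.abs(np.fft.fft(frames, n=nfft, axis=1)) ** 2).T

S_fm = spectrogram(sinusoidal_fm(256), L=32, hop=2)      # input image for the FM signal
S_pc = spectrogram(parallel_chirps(128), L=32, hop=4)    # input image for the chirps
```

The spectrogram images computed this way would serve as input TFDs, and the WD of the same signals, after the two-step pre-processing, as targets.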

The optimum number of clusters into which the input TFDs are divided for processing by the LNNs still has to be determined. The elbow criterion states that the number of clusters should be chosen so that adding another cluster does not add sufficient information [223]. Entropy has an inverse relation to information [107]: TFDs with lower entropy values contain more information. This concept contributes significantly to finding the optimum number of clusters as per the elbow criterion. Table 4.1 shows that the TFDs processed through LNNs without clustering (where the correlation vectored taxonomy is not used) carry the minimum information, as their entropy values are the largest. This is further illustrated in Fig. 4.10, where the percentage of variance explained by the clusters is plotted against the number of clusters. The first clusters clearly add much information (explain a lot of variance), i.e. the entropy is reduced (see Table 4.1). Adding another cluster, based on the various edges present in the signal, further reduces the entropy of the resultant TFD images of the test signals and improves the visual result (shown in Fig. 5.2). Beyond that, the visual results are found to be indistinguishable for any additional cluster while the computational complexity increases, and a sharp drop in the marginal gain is observed, giving an angle in the graph (the elbow). This elbow is unambiguously identified in Fig. 4.10, where it is indicated by the "goose egg". The number of clusters is therefore chosen to be 3.

Figure 4.14: Test TFDs for the bat chirps signal: (a) the spectrogram TFD, and (b) the resultant TFD after processing through the proposed framework.

Table 4.3: Entropy values for various techniques

Method                                                Resultant EQ (bits) for test TFD
Correlation vectored taxonomy algorithm with
three clusters as per the elbow criterion             7.228
WD                                                    18.623
Spectrogram                                           24.986
Approach used by [132]                                12.125

Figure 4.15: Resultant TFD obtained by the method of [132].

The entropy values of the test results obtained by various techniques are recorded in Table 4.3. It is found that the proposed framework produces an output TFD with a lower entropy value than the other techniques considered, namely the WD, the spectrogram and the OKM [132].

The de-blurred test TFD obtained by the proposed algorithm is shown in Fig. 4.14(b). It can be compared with existing methods such as [132], which proposes a signal-dependent kernel that changes shape for each signal to offer improved TF representation for a large class of signals, based on quantitative optimization criteria. The resultant TFD of this technique, depicted in Fig. 4.15, hides some important signal information by losing the uppermost chirp, which is obvious in the spectrogram of the same signal (Fig. 4.14(a)).

4.3 Summary

The method presented in this Chapter provides an effective way to obtain high resolution TFDs of signals whose frequency components vary with time, by using LNNs specially trained for the various clusters of training data. As discussed earlier, although the WD and the spectrogram are often the easiest QTFDs to use, they do not always provide an accurate characterization of real data. The idea is to use the spectrogram to obtain an overall characterization of the STSC's structure, and then to use this information to invest in a WD that is well matched to the data, for further processing that requires information not provided by the spectrogram. As a result, the IFs of the individual components present in non-stationary signals can be determined visually and computed mathematically by calculating the average frequency at each time [31, 34].

Chapter 5 Discussion on Experimental Results

The discussion of the experimental results obtained by the proposed approach and the performance evaluation of various BDs are presented in this Chapter. Objective methods of assessment are used to evaluate the performance of the de-blurred TFDs estimated through the BRNNM (henceforth the NTFDs). As discussed in Section 2.2, the objective methods allow the quality of TFDs to be quantified instead of relying solely on visual inspection of their plots. In particular, the computation regularities show the criteria's effectiveness in quantifying the TFDs' concentration and resolution information. A performance comparison with various other quadratic TFDs is provided too. This Chapter is organized in three sections. Section 5.1 discusses the NTFDs' performance on the basis of the visual results and quantifies their information by measuring the entropy values only. In Section 5.2, the concept and importance of the TFDs' objective assessment are described, and these objective methods are used to evaluate the performance of the de-blurred TFDs obtained by the proposed BRNNM for both real life and synthetic signals. Finally, Section 5.3 summarizes the Chapter.

5.1 Visual Interpretation and Entropy Analysis

In the first phase, five synthetic signals are tested to evaluate the effectiveness of the proposed algorithm on the basis of visual results and their entropy analysis. They include (i) two sets of parallel chirps intersecting at four places, (ii) a mono-component linear chirp signal, (iii) combined quadratic swept-frequency signals whose spectrograms are concave and convex parabolic chirps respectively, (iv) a combined crossing chirps and sinusoidal FM signal, and (v) a quadratic chirp signal. Spectrograms of these signals are shown in Figs. 5.1(a) to 5.1(e) respectively. Keeping in mind that estimation of the IF is rather difficult at the intersections of chirps, the first and fourth test cases are considered to check the performance of the proposed algorithm at the intersections of the IFs of the individual components present in the signals.

The spectrogram of the two sets of parallel chirps crossing each other at four points, depicted in Fig. 5.1(a), is obtained by point-by-point addition of the following two sets of parallel chirps with different phases:

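$$TS_1(n) = X_1(n) + X_2(n), \qquad (5.1)$$

with the first set of parallel chirps computed as $X_1(n) = x_{11}(n) + x_{12}(n)$, where

$$x_{11}(n) = e^{i\left[-\frac{\pi n}{6N}\right]n} \quad \text{and} \quad x_{12}(n) = e^{i\left[\frac{\pi}{3} - \frac{\pi n}{6N}\right]n},$$

and the second set of parallel chirps computed as $X_2(n) = x_{21}(n) + x_{22}(n)$, where

$$x_{21}(n) = e^{i\left[\frac{\pi n}{N}\right]n} \quad \text{and} \quad x_{22}(n) = e^{i\left[\pi + \frac{\pi n}{N}\right]n}.$$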

The spectrogram of the resultant signal where individual components intersect each

other at multiple points is fed as the �rst test signal and is depicted in Fig. 5.1(a).

The second test signal is a mono-component chirp signal given by:

$$TS_2(n) = e^{i\left[\pi + \frac{\pi n}{N}\right]n} \qquad (5.2)$$

The spectrogram of the resultant signal is depicted in Fig. 5.1(b).

The third test signal is obtained by point-by-point addition of two quadratic swept-frequency signals whose spectrograms are concave and convex parabolic chirps respectively. Mathematically, both signals can be obtained by setting different parameters in the following equation:

$$TS_3(n) = \cos\left(2\pi\left(\frac{\beta}{1+\alpha}\,n^{1+\alpha} + f_0 n + \frac{\phi}{360}\right)\right), \qquad (5.3)$$

where $\beta = (f_1 - f_0)\,t_1^{-\alpha}$ with $t_1 = 1$ s, and α, f_0, f_1, φ and f_s are defined as the matching string constant, start frequency, frequency after one second, initial phase of the signal and sample rate respectively. The spectrogram of the first quadratic swept-frequency signal is a concave parabolic chirp which starts at 250 Hz and goes down to 0 Hz at a 1 kHz sample rate, whereas the spectrogram of the second quadratic swept-frequency signal is a convex parabolic chirp starting at 250 Hz and going up to 500 Hz at a 1 kHz sample rate. These aspects are evident in the combined spectrogram depicted in Fig. 5.1(c).

Figure 5.1: Test TFDs: (a) crossing chirps (TI 1), (b) mono-component linear chirp (TI 2), (c) combined quadratic swept-frequency signals whose spectrograms are concave and convex parabolic chirps respectively (TI 3), (d) combined sinusoidal FM and crossing chirps (TI 4), and (e) quadratic chirp (TI 5).

Another test signal is obtained by combining the crossing chirps given in Eqn. (5.4) below and the sinusoidal FM signal of Eqn. (3.2):

$$TS_4(n) = e^{i\left[\frac{\pi n}{N}\right]n} + e^{i\left[\pi + \frac{\pi n}{N}\right]n} \qquad (5.4)$$

The spectrogram of the signal is depicted in Fig. 5.1(d).

Yet another test signal is a quadratic chirp that starts at 100 Hz and crosses 200 Hz at 1 second with a 1 kHz sample rate. It is obtained from Eqn. (5.3) after the necessary adjustment of its parameters. The spectrogram of this signal is depicted in Fig. 5.1(e).
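Eqn. (5.3) appears to have the form of the swept-frequency cosine produced by MATLAB's chirp function, with time t = n/f_s in seconds. The NumPy sketch below is a hedged reading of it, assuming α = 2 for a quadratic sweep and t_1 = 1 s; the function name is hypothetical. Its instantaneous frequency is f_0 + βt^α, so the sweep starts at f_0 and passes through f_1 at t = 1 s.

```python
import numpy as np

def swept_cosine(t, f0, f1, alpha=2.0, phi=0.0, t1=1.0):
    """cos(2*pi*(beta/(1+alpha)*t**(1+alpha) + f0*t + phi/360)), beta = (f1 - f0)*t1**(-alpha).

    t is time in seconds; the instantaneous frequency f0 + beta*t**alpha starts
    at f0 and equals f1 at t = t1."""
    beta = (f1 - f0) * t1 ** (-alpha)
    return np.cos(2 * np.pi * (beta / (1 + alpha) * t ** (1 + alpha) + f0 * t + phi / 360))

fs = 1000.0                                   # 1 kHz sample rate
t = np.arange(0, 1.0, 1 / fs)                 # t = n / fs
ts5 = swept_cosine(t, f0=100.0, f1=200.0)     # quadratic chirp: 100 Hz at 0 s, 200 Hz at 1 s
ts3 = swept_cosine(t, 250.0, 0.0) + swept_cosine(t, 250.0, 500.0)   # concave + convex pair
```

With f_0 = 250 Hz, choosing f_1 = 0 Hz gives the downward (concave) parabola and f_1 = 500 Hz the upward (convex) one, matching the description of TI 3.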

5.1.1 Resultant NTFDs: Experimental Results

The five synthetic test signals are: a combined parallel chirps signal crossing at four points, a mono-component linear chirp signal, combined quadratic swept-frequency signals whose spectrograms are concave and convex parabolic chirps respectively, combined crossing chirps and sinusoidal FM signals without any intersection, and a quadratic chirp signal. The spectrograms of these signals constitute test image 1 (TI 1), test image 2 (TI 2), test image 3 (TI 3), test image 4 (TI 4) and test image 5 (TI 5), depicted in Figs. 5.1(a) to (e) respectively. In the initial attempt, the expression given by Eqn. (3.1) is used to quantify the TFDs' information in the form of entropy values, which have an inverse relation with information [107].

Table 5.1: Entropy values for various techniques

              Resultant EQ (bits) for test TFDs
Method        TI 1     TI 2     TI 3     TI 4     TI 5
NTFDs         8.623    6.629    5.672    8.175    6.948
WD            21.562   10.334   18.511   20.637   18.134
Spectrogram   28.231   18.987   27.743   28.785   23.774

The entropy values for the various types of TFDs are recorded in Table 5.1. It is found that the NTFDs obtained by the proposed ANN based framework have lower entropy values than those of the other techniques, i.e. the WD and the spectrogram. TI 1 and TI 4 are used to check the performance of the proposed algorithm with LNNs in estimating the IFs at the intersections of the individual components in the signals. Even though estimation of the IF is considered rather difficult at intersections, the algorithm performs well, as depicted in Figs. 5.2(a) and (d). The test images TI 2, TI 3 and TI 5 present ideal cases for checking the performance of the proposed algorithm with LNNs trained with signals of a different nature. The resultant TFD images are highly concentrated along the IFs of the individual components present in the signals, as shown in Figs. 5.2(b), (c) and (e).

120

(a) (b)

(d) (e)

(c)

Figure 5.2: Resultant TFDs after processing through correlation vectored taxonomyalgorithm with LNNs for (a) Crossing chirps (TI 1), (b) mono-component linear chirp(TI 2), (c) combined quadratic swept-frequency signals whose spectrograms are con-cave and convex parabolic chirps respectively (TI 3), (d) combined sinusoidal FMand crossing chirps (TI 4), and (e) quadratic chirp (TI 5)


5.2 Objective Assessment

In this section, the objective measures described in Section 2.2 are used to analyze the NTFDs' performance in comparison with other TFDs. The aim is to find, on the basis of these measures, the most informative TFDs, i.e. those with the best concentration and the highest resolution. Five examples different from those of the previous section, including both real-life and synthetic multicomponent signals, are considered. The signals include (i) a multicomponent bat echolocation chirp signal, (ii) a two-component intersecting sinusoidal FM signal, (iii) a signal with two sets of nonparallel, nonintersecting chirps, and (iv) a closely spaced three-component signal containing a sinusoidal FM component intersecting the crossing chirps. The respective spectrograms, termed test image A (TI A), test image B (TI B), test image C (TI C) and test image D (TI D), are shown in Figs. 4.14(a) and 5.3(a)–5.5(a), respectively. To illustrate the evaluation of the NTFDs' performance through the measures in Eqns. (2.10) and (2.14), we further consider a closely spaced multicomponent signal containing two very close parallel chirps. The spectrogram of this signal, termed test image E (TI E), is depicted in Fig. 5.6(a). The resultant NTFDs for the test signals are shown in Fig. 4.14(b) and Figs. 5.3(b)–5.6(b), respectively. The visual results indicate the NTFDs' high resolution and concentration along the IF of the individual components present in the signals.
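The test images above are spectrograms; the figure captions state a Hamming window of length L = 90. A minimal sketch of such a computation is given below, assuming a hop of one sample, zero padding at the signal edges and a 256-point FFT. These settings are illustrative assumptions rather than values stated in the thesis, and `spectrogram` is a hypothetical helper, not the code used to produce the figures.

```python
import numpy as np


def spectrogram(x, win_len=90, nfft=256, hop=1):
    """Squared-magnitude STFT with a Hamming window.

    Returns an array of shape (nfft, number of frames): rows are frequency
    bins (bin k corresponds to normalized frequency k / nfft) and columns are
    the time instants on which the window is (approximately) centred.
    """
    x = np.asarray(x, dtype=complex)
    w = np.hamming(win_len)
    half = win_len // 2
    # Zero-pad so that every frame holds win_len samples and frame n sits around x[n].
    xp = np.concatenate([np.zeros(half, complex), x, np.zeros(win_len - half, complex)])
    frames = [np.abs(np.fft.fft(xp[n:n + win_len] * w, nfft)) ** 2
              for n in range(0, len(x), hop)]
    return np.array(frames).T


# Example: a complex tone at normalized frequency 0.25 peaks in bin 0.25 * 256 = 64.
n = np.arange(400)
S = spectrogram(np.exp(2j * np.pi * 0.25 * n))
print(S.shape, int(np.argmax(S[:, 200])))  # (256, 400) 64
```

A hop of one sample gives one column per time instant, matching the n = 1, 2, ..., 400 indexing used for the bat signal slices; a larger hop would reduce cost at the expense of time sampling.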


Figure 5.3: (a) The test spectrogram (TI B) [Hamming window, L = 90]. (b) The NTFD of a synthetic signal consisting of two sinusoidal FM components intersecting each other.


Figure 5.4: (a) The test spectrogram (TI C) [Hamming window, L = 90]. (b) The NTFD of a synthetic signal consisting of two sets of nonparallel, nonintersecting chirps.


Figure 5.5: (a) The test spectrogram (TI D) [Hamming window, L = 90]. (b) The NTFD of a synthetic signal consisting of crossing chirps and a sinusoidal FM component.


Figure 5.6: (a) The test spectrogram (TI E), and (b) the NTFD of test case 4.


5.2.1 Real Life Test Case

Real-life bat echolocation chirp data (adopted from [134]) provide an excellent multicomponent test case. The nonstationary nature of the signal is obvious only from its TFD; neither the time-domain nor the frequency-domain representation presents a clear picture of its true nature. The spectrogram of this signal is shown in Fig. 4.14(a), and the resultant NTFD is depicted in Fig. 4.14(b). For the same test case, a TFD is also computed using the existing OKM [132] and plotted in Fig. 4.15. The OKM uses a signal-dependent kernel that changes shape for each signal, based on quantitative optimization criteria, to offer improved TF representation for a large class of signals. Close inspection of the OKM output in Fig. 4.15 reveals that this TFD does not fully recover all the components and thus loses some useful information about the signal. The NTFD, by contrast, is not only highly concentrated along the IF of the individual components present in the signal but also more informative, showing all the components.

For further analysis, slices of the test spectrogram and the resultant NTFD are taken at the time instants n = 150 and n = 310 (recall that n = 1, 2, ..., 400), and their normalized amplitudes are plotted in Fig. 5.7. These instants are chosen because three chirps are visible at them (see Fig. 4.14(b)). Fig. 5.7 confirms the peaky appearance of three distinct frequencies at these time instants. There are no CTs, and the proposed method offers better frequency resolution. It is worth mentioning that the NTFD not only successfully recovers the fourth (weakest) component but also has the best resolution, i.e. a narrower main lobe and no side lobes, compared with all the other distributions considered, e.g. those in Fig. 2.1 (drawn with the optimum parameters). The highest frequency seen in Fig. 5.7(b) is not recovered by any other TFD drawn in Fig. 2.1.

Figure 5.7: The time slices of the spectrogram (blue) and the NTFD (red) for the bat echolocation chirp signal, at n = 150 (left) and n = 310 (right).
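The slice analysis above amounts to extracting one time column of a TFD, scaling it to unit peak and counting its distinct spectral peaks. A minimal sketch follows, assuming the TFD is stored as a (frequency × time) array; the 10 % relative threshold used to ignore small ripples is an illustrative assumption, and both helper names are hypothetical.

```python
import numpy as np


def normalized_slice(tfd, n):
    """Return column n (time instant n) of a frequency-by-time TFD with unit peak."""
    s = np.abs(np.asarray(tfd))[:, n].astype(float)
    peak = s.max()
    return s / peak if peak > 0 else s


def count_peaks(s, rel_height=0.1):
    """Count strict local maxima that exceed rel_height times the slice maximum."""
    s = np.asarray(s, dtype=float)
    threshold = rel_height * s.max()
    mid = s[1:-1]
    is_peak = (mid > s[:-2]) & (mid > s[2:]) & (mid >= threshold)
    return int(np.count_nonzero(is_peak))


# Toy TFD with three components visible at time index 5.
tfd = np.zeros((64, 10))
tfd[[10, 20, 40], 5] = [1.0, 0.5, 0.3]
print(count_peaks(normalized_slice(tfd, 5)))  # 3
```

With the 1-based time index used in the text, the slice at n = 150 corresponds to column index 149 of a zero-based array.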

5.2.2 Synthetic Test Cases

Four further specially synthesized signals of different natures are fed to the model to check its performance at IF intersections and for closely spaced components, keeping in mind that IF estimation is rather difficult in these situations. The test cases are described as follows:


5.2.2.1 Test case 1.

The first is a synthetic signal consisting of two intersecting sinusoidal FM components, given as:

$$\mathrm{SynTS1}(n) = e^{-i\pi\left(\frac{5}{2} - 0.1\sin(2\pi n/N)\right)n} + e^{i\pi\left(\frac{5}{2} - 0.1\sin(2\pi n/N)\right)n} \qquad (5.5)$$

The spectrogram of the signal is shown in Fig. 5.3(a).


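5.2.2.2 Test case 2.

The second synthetic signal contains two sets of nonparallel, nonintersecting chirps when plotted on the TF plane. Mathematically, it can be written as:

$$\mathrm{SynTS2}(n) = e^{i\pi\left(\frac{n}{6N}\right)n} + e^{i\pi\left(1+\frac{n}{6N}\right)n} + e^{-i\pi\left(\frac{n}{6N}\right)n} + e^{-i\pi\left(1+\frac{n}{6N}\right)n} \qquad (5.6)$$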



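5.2.2.3 Test case 3.

The third synthetic signal is a three-component signal containing a sinusoidal FM component intersecting two crossing chirps. It is expressed as:

$$\mathrm{SynTS3}(n) = e^{i\pi\left(\frac{5}{2} - 0.1\sin(2\pi n/N)\right)n} + e^{i\pi\left(\frac{n}{6N}\right)n} + e^{i\pi\left(\frac{1}{3} - \frac{n}{6N}\right)n} \qquad (5.7)$$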

The spectrogram of the signal is shown in Fig. 5.5(a). Between 150 and 200 Hz, near 0.5 s, the frequency separation between the sinusoidal FM and chirp components is small enough that they only just avoid intersecting. This case is included to confirm the model's effectiveness in de-blurring closely spaced components.


5.2.2.4 Test case 4.

This test case is adopted from Boashash [33] to compare the TFDs' concentration and resolution performance at the middle of the signal duration using the Boashash performance measures in Eqns. (2.10) and (2.14). The signal consists of two LFM components whose frequencies increase from 0.15 to 0.25 Hz and from 0.2 to 0.3 Hz, respectively, over the time interval t ∈ [1, 128]. The sampling frequency is fs = 1 Hz. The authors of [33] specifically found the modified B distribution (β = 0.01) to be the best performing TFD for this signal at the middle of its duration, after measuring the signal components' parameters needed in Eqn. (2.14) (see Table 5.3). The signal is defined as:

$$\mathrm{SynTS4}(t) = \cos\left(2\pi\left(0.15t + 0.0004t^{2}\right)\right) + \cos\left(2\pi\left(0.2t + 0.0004t^{2}\right)\right) \qquad (5.8)$$


The above test cases are processed through the BRNNM, and the estimated NTFDs are shown in Figs. 5.3(b)–5.6(b). Visual inspection of these plots shows high resolution and concentration along the IF of the individual components.
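The two-LFM signal of Eqn. (5.8) is simple to reproduce. The sketch below generates it on t = 1, ..., 128 (fs = 1 Hz) and evaluates the analytic IF of each component, obtained by differentiating its phase: f1(t) = 0.15 + 0.0008t and f2(t) = 0.2 + 0.0008t. At t = 64 these give 0.2012 Hz and 0.2512 Hz, a separation of 0.05 Hz, in line with the fi1(64), fi2(64) and Δfi(64) entries of Tables 5.3 and 5.4. The function names are hypothetical.

```python
import numpy as np


def syn_ts4(t):
    """Two-component LFM test signal of Eqn. (5.8); t in samples (fs = 1 Hz)."""
    return (np.cos(2 * np.pi * (0.15 * t + 0.0004 * t ** 2))
            + np.cos(2 * np.pi * (0.2 * t + 0.0004 * t ** 2)))


def syn_ts4_ifs(t):
    """Instantaneous frequencies (Hz) of the two components: phase derivative / (2*pi)."""
    return 0.15 + 0.0008 * t, 0.2 + 0.0008 * t


t = np.arange(1, 129)          # t = 1, 2, ..., 128
x = syn_ts4(t)
f1, f2 = syn_ts4_ifs(64)
print(round(f1, 4), round(f2, 4), round(f2 - f1, 4))  # 0.2012 0.2512 0.05
```

Because the signal is real, a quadratic TFD computed directly from `x` also contains the mirrored negative-frequency components; TFDs are therefore usually computed from the analytic signal.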



To evaluate the performance quantitatively, the ratio-of-norms measure, the Shannon and Rényi entropy measures, the normalized Rényi entropy measures and the LJubisa measure are computed and recorded in Table 5.2. The entropy measures, i.e. the Shannon and Rényi entropies with or without normalization, are excellent measures of the information extraction performance of TFDs. By the probabilistic analogy, minimizing the complexity or information in a particular TFD is equivalent to maximizing its concentration, peakiness and, therefore, resolution [100]. For the optimum distribution of a given signal, the ratio-of-norms and Boashash resolution measures should be maximal [112], whereas the TFDs yielding the smallest values of the LJubisa and Boashash concentration measures are considered the best performing in terms of concentration and resolution [33, 172].
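The measures listed above can be sketched compactly. The versions below follow commonly used forms and are assumptions rather than a transcription of Section 2.2: the Shannon entropy of |TFD| normalized to unit sum, the order-α Rényi entropy with energy (unit sum) or volume (unit absolute sum) normalization in the spirit of Baraniuk et al. [37], a Jones-Parks style ratio of the fourth-power sum to the squared second-power sum [112], and a Stanković-style (LJubisa) measure (Σ|P|^(1/p))^p with p = 2. All entropies are in bits. Since the exact normalizations behind Table 5.2 are not restated here, the sketch is not expected to reproduce the tabulated values.

```python
import numpy as np


def _normalize(tfd, volume=False):
    """Scale a TFD to unit sum (energy) or unit absolute sum (volume)."""
    p = np.asarray(tfd, dtype=float)
    return p / (np.abs(p).sum() if volume else p.sum())


def shannon_entropy(tfd):
    """Shannon entropy (bits) of |TFD| normalized to unit sum."""
    p = _normalize(np.abs(np.asarray(tfd, dtype=float)))
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


def renyi_entropy(tfd, alpha=3, volume=False):
    """Order-alpha Rényi entropy in bits.

    An odd integer alpha (3 is the usual choice) keeps the sum real when the
    TFD has negative values, as the WD does.
    """
    p = _normalize(tfd, volume)
    return float(np.log2(np.sum(p ** alpha)) / (1 - alpha))


def ratio_of_norms(tfd):
    """Sum |P|^4 / (sum |P|^2)^2; larger values indicate higher concentration."""
    a = np.abs(np.asarray(tfd, dtype=float))
    return float(np.sum(a ** 4) / np.sum(a ** 2) ** 2)


def stankovic_measure(tfd, p=2):
    """(sum |P|^(1/p))^p of the unit-sum TFD; smaller values indicate higher concentration."""
    a = np.abs(_normalize(tfd))
    return float(np.sum(a ** (1.0 / p)) ** p)


spread = np.ones((4, 4))       # energy spread evenly over 16 cells
peaked = np.zeros((4, 4))
peaked[1, 2] = 1.0             # all energy in a single cell
print(shannon_entropy(spread), renyi_entropy(spread),
      stankovic_measure(spread), ratio_of_norms(spread))   # 4.0 4.0 16.0 0.0625
print(stankovic_measure(peaked), ratio_of_norms(peaked))    # 1.0 1.0
```

The toy inputs show the intended orderings: spreading energy raises the entropies and the Stanković-style measure and lowers the norm ratio, which is why the first group is minimized and the ratio of norms maximized.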

The values in Table 5.2 identify the NTFDs as the best TFDs according to most of the criteria. A few anomalies are present in the data, mainly attributable to inherent shortcomings of the measures and the assumptions made in their derivation. For example, the simple Rényi entropies, being unable to detect zero-mean CTs, indicate the ZAMD as the best concentrated TFD for most test images. However, the more commonly used volume-normalized Rényi entropies are minimal for the NTFDs for TI A–TI C.

It is instructive to plot these measures separately for each test image (TI A–TI D), as shown in Fig. 5.8. These plots agree with the visual results and highlight that the NTFDs outperform the other distributions considered. The congruence and regularity of the curves in these plots also indicate the validity of the objective criteria.

Table 5.2: Performance Measures Comparison for Various TFDs

Measure                      Test   Spec    WVD     ZAMD    MHD     CWD     BJD     NTFD    SNN     OKM TFD [132]
Shannon entropy              TI A   13.46   36.81   102.23  42.98   17.27   17.73   7.27    10.18   14.68
                             TI B   13.45   64.33   76.81   37.74   20.82   20.43   8.75    10.88   18.08
                             TI C   18.66   185.49  274.73  126.02  28.08   28.05   7.87    13.45   21.42
                             TI D   18.94   74.82   87.30   49.24   35.31   29.92   17.25   24.23   23.57
Ratio of norms (×10^-4)      TI A   3.81    3.84    2.94    1.05    2.89    2.73    66      13.88   8.32
                             TI B   1.94    1.91    2.18    1.10    3.10    4.67    24      18.12   1.59
                             TI C   51.23   58.0    1.02    48.71   38.53   26.37   44      33.90   10.26
                             TI D   0.95    0.92    1.19    0.12    1.11    2.68    14      8       4.60
Rényi entropy                TI A   12.45   10.90   7       11.47   12.67   12.54   7.26    9.25    11.65
                             TI B   12.98   9.95    7.56    11.03   12.06   11.85   8.74    10.89   13.82
                             TI C   17.07   14.01   8.62    14.74   16.24   15.84   7.85    12.82   17.22
                             TI D   12.47   9.48    7.06    10.50   11.54   11.34   8.23    10.03   13.31
Energy normalized Rényi      TI A   12.45   10.90   7       11.47   12.67   12.54   7.26    9.25    11.65
entropy                      TI B   12.98   9.95    7.56    11.03   12.06   11.85   8.74    10.89   13.82
                             TI C   17.07   14.01   8.62    14.74   16.24   15.84   7.85    12.82   17.22
                             TI D   12.47   9.48    7.06    10.50   11.54   11.34   8.23    10.03   13.31
Volume normalized Rényi      TI A   12.45   12.02   9.18    12.75   12.93   12.85   7.26    12.97   11.77
entropy                      TI B   12.98   11.62   9.54    12.26   12.60   12.38   8.74    11.68   10.98
                             TI C   17.07   16.28   11.35   16.70   16.77   16.41   7.85    14.49   15.43
                             TI D   12.47   9.48    7.06    10.50   11.54   11.34   8.23    10.03   10.31
LJubisa measure (×10^5)      TI A   0.2219  3.30    13.14   2.9200  1.06    1.01    0.0015  0.0912  0.6300
                             TI B   0.1600  4.68    5.6266  1.1861  1.0123  0.8946  0.0024  0.0145  8.6564
                             TI C   6.03    47.05   39.64   36.47   33.08   29.39   0.0043  0.9973  14.73
                             TI D   0.1553  8.67    9.6253  5.1848  6.0110  5.8933  1.0030  3.0223  8.5551

Figure 5.8: Comparison plots of criterion values vs TFDs for test images A–D: (a) the Shannon entropy measure, (b) the Rényi entropy measure, (c) the volume normalized Rényi entropy measure, (d) the ratio-of-norms measure, and (e) the LJubisa measure.

The Boashash performance measures for concentration and resolution defined in Eqns. (2.10) and (2.14) are computationally expensive because they require calculations at various time instants. To limit the scope, these measures are computed at the middle of the synthetic signal defined in Eqn. (5.8), and the results are compared with those reported in [33]. A slice is taken at t = 64, and the signal components' parameters AM1(64), AM2(64), AM(64), AS1(64), AS2(64), AS(64), Vi1(64), Vi2(64), Vi(64), fi1(64), fi2(64) and Δfi(64), as well as the CTs' magnitude AX(64), are measured. These are then used to calculate the TFDs' normalized instantaneous resolution and modified concentration performance measures Ri(t) and Cn(t), defined by Eqns. (2.15) and (2.11). The results for Ri(64) and Cn(64) are recorded in Tables 5.3 and 5.4, respectively. The slice of the signal's NTFD at t = 64 is shown in Fig. 5.9(f).

Table 5.3: Parameters and the Normalized Instantaneous Resolution Performance Measure of TFDs for the Time Instant t = 64

TFD (optimal parameter)      AM(64)  AS(64)  AX(64)  Vi(64)  Δfi(64) D(64)   R(64)
Spectrogram (Hann, L = 35)   0.9119  0.0087  0.5527  0.0266  0.0501  0.4691  0.7188
WVD                          0.9153  0.3365  1       0.0130  0.0574  0.7735  0.6199
ZAMD (a = 2)                 0.9146  0.4847  0.4796  0.0214  0.0420  0.4905  0.5661
CWD (σ = 2)                  0.9355  0.0178  0.4415  0.0238  0.0493  0.5172  0.7541
BJD                          0.9320  0.1222  0.3798  0.0219  0.0488  0.5512  0.7388
Modified B (β = 0.01)        0.9676  0.0099  0.0983  0.0185  0.0526  0.5957  0.8449
NTFD                         0.9013  0       0       0.0110  0.0550  0.800   0.9333

Table 5.4: Parameters and the Modified Instantaneous Concentration Performance Measure of TFDs for the Time Instant t = 64

TFD (optimal parameters)     AS1(64) AS2(64) AM1(64) AM2(64) Vi1(64) Vi2(64) fi1(64) fi2(64) C1(64)  C2(64)
Spectrogram (Hann, L = 35)   0.0087  0.0087  1       0.8238  0.03200 0.0200  0.1990  0.2500  0.1695  0.0905
WVD                          0.3365  0.3365  0.9153  0.9153  0.0130  0.013   0.1980  0.2554  0.4333  0.4185
ZAMD (a = 2)                 0.4848  0.4900  1       0.8292  0.0224  0.0204  0.2075  0.2495  0.5927  0.6727
CWD (σ = 2)                  0.0176  0.0179  1       0.8710  0.0300  0.0176  0.205   0.2543  0.1639  0.0898
BJD                          0.1240  0.1204  1       0.8640  0.0270  0.0168  0.2042  0.2530  0.2562  0.2058
Modified B (β = 0.01)        0.0100  0.0098  1       0.9352  0.0190  0.0180  0.200   0.2526  0.1050  0.0817
NTFD                         0       0       0.8846  0.9180  0.0110  0.0110  0.2035  0.2585  0.0541  0.0425
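The resolution entries of Table 5.3 can be checked directly from the tabulated parameters. The sketch below assumes that Eqn. (2.15) has the form given by Boashash and Sucic [33], D(t) = 1 − Vi(t)/Δfi(t) and Ri(t) = 1 − (1/3)[AS(t)/AM(t) + AX(t)/(2AM(t)) + (1 − D(t))]; with this form the spectrogram and NTFD rows of Table 5.3 are reproduced to four decimal places. The function name is hypothetical.

```python
def resolution_measure(a_m, a_s, a_x, v_i, delta_f):
    """Return (D, R_i) at one time instant.

    a_m, a_s: average mainlobe and sidelobe magnitudes of the components
    a_x:      cross-term magnitude
    v_i:      average instantaneous bandwidth of the components
    delta_f:  IF separation of the two components
    """
    d = 1.0 - v_i / delta_f                      # normalized component separation
    r = 1.0 - (a_s / a_m + 0.5 * a_x / a_m + (1.0 - d)) / 3.0
    return d, r


# Rows of Table 5.3 at t = 64: (AM, AS, AX, Vi, delta fi)
rows = {
    "Spectrogram (Hann, L = 35)": (0.9119, 0.0087, 0.5527, 0.0266, 0.0501),
    "NTFD": (0.9013, 0.0, 0.0, 0.0110, 0.0550),
}
for name, params in rows.items():
    d, r = resolution_measure(*params)
    print(f"{name}: D = {d:.4f}, R = {r:.4f}")
# Spectrogram (Hann, L = 35): D = 0.4691, R = 0.7188
# NTFD: D = 0.8000, R = 0.9333
```

Each of the three penalty terms (sidelobes, cross-terms, component overlap) lies roughly in [0, 1], so R approaches 1 only when all three vanish, which is the case for the NTFD apart from its finite bandwidth.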

Figure 5.9: The normalized slices of TFDs at t = 64. (a) The spectrogram. (b) WD. (c) ZAMD. (d) CWD. (e) BJD. (f) NTFD. The first five TFDs (dashed) are compared against the modified B distribution (solid), adopted from Boashash [33].

As mentioned earlier, the TFD that, at a given time instant, has the largest positive value (close to 1) of the measure Ri is the TFD with the best resolution performance at that time instant for the signal under consideration. From Table 5.3, the NTFD of the synthetic signal given by Eqn. (5.8) gives the largest value of Ri at t = 64 and is hence selected as the best performing TFD for this signal at t = 64. Along similar lines, the TFDs' concentration performance is compared at the middle of the signal duration. A TFD is considered to have the best energy concentration for a given multicomponent signal if, for each signal component, it yields the smallest

1. instantaneous bandwidth relative to the component IF (Vi(t)/fi(t)), and

2. sidelobe magnitude relative to mainlobe magnitude (AS(t)/AM(t)).
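The two criteria above can be combined as Cn(t) = AS(t)/AM(t) + Vi(t)/fi(t), evaluated separately for each component. Assuming this form for Eqn. (2.11), the sketch below recomputes C1(64) and C2(64) for four of the TFDs in Table 5.4 (parameter values copied from the table; the results agree with the tabulated C columns to within rounding) and selects the TFD that gives the smallest value for every component. The dictionary and function names are hypothetical.

```python
def concentration_measure(a_s, a_m, v_i, f_i):
    """Modified instantaneous concentration of one component; smaller is better."""
    return a_s / a_m + v_i / f_i


# Table 5.4 at t = 64: (AS1, AS2, AM1, AM2, Vi1, Vi2, fi1, fi2)
TABLE_5_4 = {
    "Spectrogram": (0.0087, 0.0087, 1.0, 0.8238, 0.0320, 0.0200, 0.1990, 0.2500),
    "WVD": (0.3365, 0.3365, 0.9153, 0.9153, 0.0130, 0.0130, 0.1980, 0.2554),
    "CWD": (0.0176, 0.0179, 1.0, 0.8710, 0.0300, 0.0176, 0.2050, 0.2543),
    "NTFD": (0.0, 0.0, 0.8846, 0.9180, 0.0110, 0.0110, 0.2035, 0.2585),
}


def component_measures(row):
    """Return (C1, C2) for one table row."""
    as1, as2, am1, am2, v1, v2, f1, f2 = row
    return (concentration_measure(as1, am1, v1, f1),
            concentration_measure(as2, am2, v2, f2))


def best_concentrated(table):
    """Name of the TFD with the smallest C for every component, or None if no single TFD wins."""
    scores = {name: component_measures(row) for name, row in table.items()}
    winners = {min(scores, key=lambda name: scores[name][k]) for k in range(2)}
    return winners.pop() if len(winners) == 1 else None


for name, row in TABLE_5_4.items():
    c1, c2 = component_measures(row)
    print(f"{name}: C1 = {c1:.4f}, C2 = {c2:.4f}")
print(best_concentrated(TABLE_5_4))  # NTFD
```

Requiring the same winner for every component, rather than summing C1 and C2, mirrors the "for each signal component" wording of the criterion; `None` signals that no TFD dominates.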

The measured results recorded in Table 5.4 indicate that the NTFD of the signal given by Eqn. (5.8) yields the smallest values of C1,2(t) at t = 64, and it is hence selected as the best concentrated TFD at t = 64. For a clearer comparison, the values of Ri and C1,2 computed for the different TFDs are plotted in Fig. 5.10. The plot supports the tabulated results and confirms the NTFD's superiority over the other TFDs considered.

5.3 Summary

In this chapter, objective assessment methods are used to compare the concentration and resolution performance of TFDs for multicomponent signal analysis, providing a quantitative measure of goodness for TFDs rather than relying solely on visual inspection of their plots. Given a TFD, the Boashash normalized instantaneous resolution and modified concentration performance measures can be considered the most appropriate for quantifying its TF concentration and resolution. This is because these objective measures take into account both the concentration and the resolution aspects, thus providing a better picture for signals with closely spaced components. What makes them the better choice is that they include the characteristics of TFDs that influence resolution, such as component concentration and separation and the minimization of interference terms. The quantitative framework is found to be significantly effective for analyzing and evaluating the proposed model's performance using both synthetic and real-life examples. Experimental results demonstrate the effectiveness of the approach.

Figure 5.10: Comparison plots of the Boashash TFD performance measures vs TFDs: (a) the modified concentration measure (Cn(64)), (b) the normalized instantaneous resolution measure (Ri).


Chapter 6 Conclusion and Future Directions

The attempt to understand clearly what a time-varying spectrum is, and to represent the properties of a signal simultaneously in time and frequency without any ambiguity, is one of the most fundamental and challenging aspects of signal analysis. A large body of published scientific literature highlights the significance of TF processing with regard to improved concentration and resolution. However, as this task can be achieved by many different types of TF techniques, it is important to search for the one most pertinent to the application. Although the WD and the spectrogram are often the easiest QTFDs to use, they do not always provide an accurate characterization of real data. The spectrogram gives a blurred representation, and the use of the WD in practical applications has been limited by the presence of CTs and its inability to produce ideal concentration for nonlinear IF variations. The spectrogram could, for example, be used to obtain an overall characterization of the structure of the STSC, and this information could then be used to select another QTFD that is well matched to the data for further processing requiring information not provided by the spectrogram; this is the idea conceived and implemented in this thesis [217].

The first part of the thesis attempts to answer the following questions: Why are high concentration and good resolution important? What motivated various researchers to devise and implement newer methods for this purpose? How have they used new ideas and implemented techniques to achieve the desired objectives? Concentrating on various methods and well-tested algorithms, the discussion focuses on the basic concepts, important properties, implementation methods and simulation results that emphasize the importance and significance of each technique for signal analysis. However, since a large number of methods have been proposed, only a few have been explored, in a sequence intended to present the ideas and techniques in a logical order.

In the rest of the thesis, a new ANN based approach incorporating Bayesian regularization is implemented and evaluated for computing informative, non-blurred and high-resolution TFDs. The resulting TFDs do not have the CTs that appear for multicomponent signals in some distributions, such as the WD, and thus provide a visual way to determine the IF of nonstationary signals. The technique shows that a mixture of ENs, each focused on a specific task, delivers a TFD that is highly concentrated along the IF with no CTs, in contrast to an ANN trained without the selected input. Experimental results presented in Chapter 5 demonstrate the effectiveness of the approach.

To complete the proposed framework, the NTFDs' performance is further assessed by information theoretic criteria. These quantitative measures of goodness are used instead of relying solely on visual inspection of TFD plots. The mathematical framework for quantifying the TFDs' information is found effective in establishing the superiority of the results obtained by the ANN based multiprocess technique, using both synthetic and real-life examples. The NTFDs are compared with some popular distributions known for their CT suppression and high energy concentration in the TF domain. It is shown that the NTFDs exhibit high resolution, are highly concentrated and have no interference terms between the signal components. They are also found to be better at detecting the number of components in a given signal than the conventional distributions.

6.1 Future Directions

In this thesis, a framework is proposed for computing informative, non-blurred, high-resolution TFDs through a novel use of ANNs. To assess and further improve the efficiency of this framework, and to identify other useful extensions, the following may be investigated:

1. If a TFD is positive and satisfies the marginals, it may be considered a proper TFD for extracting time-varying frequency parameters such as the IF. This is because positivity coupled with correct marginals ensures that the TFD is a true probability density function and that the extracted parameters are meaningful [135]. The NTFD may be modified to satisfy the marginal requirements while still preserving its other important characteristics, such as positivity. One way to optimize the NTFD is the minimum cross-entropy (MCE) method [197, 198]. MCE optimization was first applied to TFDs by Loughlin et al. [246].


2. Different applications have different preferences and requirements for TFDs. In general, the choice of a TFD in a particular situation depends on many factors, such as the relevance of the properties it satisfies, its computational cost and speed, and the trade-offs involved in using it. Although the NTFDs are de-blurred and highly concentrated, they are discontinuous, showing energy gaps and thus missing some signal information. Moreover, the resulting TFDs are not valid energy distributions, because they lack signature continuity, do not satisfy the marginals and do not provide weak-signal mitigation. For this reason, the results may not be suitable for certain applications. The discontinuity can be attributed to pre-processing limitations: the processed target WD images, as shown in Fig. 4.9(a), are discontinuous at various places when seen at high resolution. This aspect is expected to improve if the target TFDs are made continuous along the IFs of the individual components present in the signal. Nevertheless, the BRNNM produces results that are closer to the actual TFD images than the initial blurred estimates (spectrograms). Furthermore, several TFDs, especially those satisfying the marginals, also have discontinuities [31]. Other possibilities, such as region-growing algorithms and interpolation along the individual components, may also be considered.

3. The approach can be extended to the analysis of signals with more complicated IF laws, possibly by incorporating other techniques, for example, piecewise linear approximation of the IF using the Hough transform and the evolutionary spectrum [45], and the combined Wigner-Hough transform [46, 103] for CT suppression, optimal detection and parameter estimation. Separate work is needed to determine how the algorithm's performance changes for signals that are not linear or sinusoidal chirps, with or without added noise.

4. The method yields an image rather than a mathematical expression for the IF, which is important for certain applications such as jammer excision. The IF can be computed from the resultant NTFDs by calculating the average frequency at each time [31].

5. The objective assessment part of the thesis is essentially an incremental study that combines existing results on concentration measure evaluation and TFD kernels, and thus merely scratches the surface of the potential application of these criteria in TF analysis. Worth pursuing are the axiomatic derivation of an ideal TF complexity measure, along the lines of Jones and Parks' ratio of distribution norms [112], Baraniuk and Jones's optimal kernel distribution design [132] and Rényi's work in probability theory [118], and the investigation of other possible measures.
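Items 1 and 4 above suggest two post-processing steps for an NTFD. The sketch below illustrates them under stated assumptions and is not part of the proposed framework. (i) With only the time and frequency marginals as constraints, the minimum cross-entropy solution relative to a positive prior has the product form prior(f, t)·a(f)·b(t), which can be found by iterative proportional fitting, i.e. alternately rescaling rows and columns. (ii) The IF is estimated as the average frequency at each time instant, as item 4 suggests [31]. The prior must be strictly positive and the two marginals must carry the same total energy; the (frequency × time) layout, the iteration count and the function names are illustrative assumptions.

```python
import numpy as np


def fit_marginals(prior, time_marg, freq_marg, n_iter=200):
    """Rescale a positive prior TFD so that it matches the given marginals.

    Iterative proportional fitting: alternately scale each frequency row and
    each time column. For strictly positive priors and marginals with equal
    totals this converges to the minimum cross-entropy solution.
    """
    p = np.asarray(prior, dtype=float).copy()
    time_marg = np.asarray(time_marg, dtype=float)
    freq_marg = np.asarray(freq_marg, dtype=float)
    for _ in range(n_iter):
        p *= (freq_marg / p.sum(axis=1))[:, None]   # match the frequency marginal
        p *= (time_marg / p.sum(axis=0))[None, :]   # match the time marginal
    return p


def mean_frequency_if(tfd, freqs):
    """IF estimate: average frequency at each time instant (column)."""
    p = np.asarray(tfd, dtype=float)
    return (np.asarray(freqs)[:, None] * p).sum(axis=0) / p.sum(axis=0)


rng = np.random.default_rng(0)
prior = rng.random((8, 6)) + 0.1                  # stand-in for a positive NTFD
time_marg = rng.random(6) + 0.5                   # stand-in for |s(t)|^2
freq_marg = rng.random(8) + 0.5                   # stand-in for |S(f)|^2
freq_marg *= time_marg.sum() / freq_marg.sum()    # equal total energy (Parseval)
p = fit_marginals(prior, time_marg, freq_marg)
print(np.allclose(p.sum(axis=0), time_marg), np.allclose(p.sum(axis=1), freq_marg))
freqs = np.linspace(0.0, 0.5, 8)
print(mean_frequency_if(p, freqs))                # one IF estimate per time instant
```

Because the scaling factors are strictly positive, the fitted distribution stays positive, so the positivity of the NTFD noted in item 1 is preserved while the marginals are enforced.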


References

[1] L. Cohen, "Time-frequency distributions: A review," Proc. IEEE, vol. 77, pp. 941–981, July 1989.

[2] E. P. Wigner, "On the quantum correction for thermodynamic equilibrium," Phys. Rev., vol. 40, pp. 749–759, 1932.

[3] J. Ville, "Theorie et applications de la notion de signal analytique," Cables et Transmission, vol. 2, no. 1, pp. 61–74, 1946.

[4] S. Erkucuk, S. Krishnan, and M. Zeytinoglu, "Robust audio watermarking using a chirp based technique," Proc. IEEE Intl. Conf. Multimedia and Expo (ICME '03), vol. 2, pp. 513–516, Baltimore, MD, USA, July 2003.

[5] A. Ramalingam and S. Krishnan, "A novel robust image watermarking using a chirp based technique," Proc. IEEE Canadian Conf. Electrical and Computer Engineering (CCECE '04), vol. 4, pp. 1889–1892, Ontario, Canada, May 2004.

[6] P. E. Gill, W. Murray, and M. H. Wright, Numerical Linear Algebra and Optimization, Addison-Wesley, Redwood City, CA, 1991.

[7] W. Rudin, Real and Complex Analysis. New York: McGraw-Hill, 1987.

[8] H. Margenau and R. N. Hill, "Correlation between measurements in quantum theory," Prog. Theor. Phys., vol. 26, pp. 722–738, 1961.

[9] P. Goupillaud, A. Grossmann, and J. Morlet, "Cycle-octave and related transforms in seismic signal analysis," Geoexploration, vol. 23, pp. 85–102, 1984.

[10] I. Daubechies, "The wavelet transform, time-frequency localization, and signal analysis," IEEE Trans. Inform. Theory, vol. 36, pp. 961–1005, 1990.

[11] I. Daubechies, "Time-frequency localization operators: A geometric phase space approach," IEEE Trans. Inform. Theory, vol. 34, pp. 605–612, 1988.

[12] O. Rioul and P. Flandrin, "Time-scale energy distributions: A general class extending wavelet transforms," IEEE Trans. Signal Process., vol. 40, pp. 1746–1757, 1992.

[13] J. Bertrand and P. Bertrand, "Time-frequency representations of broadband signals," Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), pp. 2196–2199, 1988.

[14] L. Cohen, "Distribution concentrated along the instantaneous frequency," SPIE Advanced Signal Processing Alg., Architect., Implement., vol. 1348, pp. 149–157, 1990.

[15] LJ. Stankovic, "An analysis of some time-frequency and time-scale distributions," Annales des Telecommun., vol. 49, pp. 505–517, 1994.

[16] C. Eichmann and B. Z. Dong, "Two-dimensional optical filtering of 1-D signals," Appl. Opt., vol. 21, pp. 3152–3156, 1982.

[17] B. Boashash and B. Ristic, "Polynomial WVD's and time-varying polyspectra," in Higher Order Statistical Proc., B. Boashash et al., Eds. London, U.K.: Longman Cheshire, 1993.

[18] B. Boashash and P. O'Shea, "Polynomial Wigner-Ville distributions and their relationship to time-varying higher order spectra," IEEE Trans. Signal Process., vol. 42, no. 1, pp. 216–220, Jan. 1994.

[19] J. E. Allen and L. R. Rabiner, "A unified approach to short-time Fourier analysis and synthesis," Proc. IEEE, vol. 65, pp. 1558–1564, 1977.

[20] R. Altes, "Detection, estimation and classification with spectrograms," J. Acoust. Soc. Am., vol. 67, pp. 1232–1246, 1980.

[21] A. Dziewonski, S. Bloch, and M. Landisman, "A technique for the analysis of transient signals," Bull. Seism. Soc. Am., vol. 59, pp. 427–444, 1969.

[22] J. Flanagan, Speech Analysis Synthesis and Perception. New York, NY: Springer, 1972.

[23] A. L. Levshin, V. F. Pisarenko, and G. A. Pogrebinsky, "On a frequency-time analysis of oscillations," Ann. Geophys., vol. 28, pp. 211–218, 1972.

[24] A. V. Oppenheim, "Speech spectrograms using the fast Fourier transform," IEEE Spectrum, vol. 7, pp. 57–62, 1970.

[25] M. R. Portnoff, "Time-frequency representation of digital signals and systems based on short-time Fourier analysis," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-28, pp. 55–69, 1980.

[26] B. Boashash, "Estimating and interpreting the instantaneous frequency of a signal. Part 1: Fundamentals," Proc. IEEE, vol. 80, pp. 519–538, Apr. 1992.

[27] B. Boashash, "Estimating and interpreting the instantaneous frequency of a signal. II. Algorithms and applications," Proc. IEEE, vol. 80, no. 4, pp. 540–568, Apr. 1992.

[28] V. Katkovnik and L. Stankovic, "Instantaneous frequency estimation using the Wigner distribution with varying and data-driven window length," IEEE Trans. Signal Process., vol. 46, pp. 2315–2325, Sep. 1998.

[29] Z. M. Hussain and B. Boashash, "Adaptive instantaneous frequency estimation of multicomponent FM signals," Proc. IEEE ICASSP, vol. 5, pp. 657–660, Jun. 2000.

[30] V. Katkovnik, "Nonparametric estimation of instantaneous frequency," IEEE Trans. Info. Theory, vol. 43, pp. 183–189, Jan. 1997.

[31] L. Cohen, Time-Frequency Analysis, Prentice-Hall, NJ, 1995.

[32] S. Qian and D. Chen, "Joint time-frequency analysis," IEEE Signal Processing Magazine, vol. 16, no. 2, pp. 52–67, Mar. 1999.

[33] B. Boashash and V. Sucic, "Resolution measure criteria for the objective assessment of the performance of quadratic time-frequency distributions," IEEE Trans. Signal Process., vol. 51, no. 5, pp. 1253–1263, May 2003.

[34] B. Boashash, Ed., Time-Frequency Signal Analysis and Processing. Englewood Cliffs, NJ: Prentice-Hall, 2003.

[35] __________, Time-Frequency Signal Analysis: Methods and Applications, B. Boashash, Ed. Melbourne, Australia/New York: Longman-Cheshire/Wiley, 1992.

[36] S. Aviyente and W. J. Williams, "Minimum entropy time-frequency distributions," IEEE Signal Process. Lett., vol. 12, no. 1, pp. 37–40, Jan. 2005.

[37] R. G. Baraniuk, P. Flandrin, A. J. E. M. Janssen, and O. Michel, "Measuring time-frequency information content using the Rényi entropies," IEEE Trans. Info. Theory, vol. 47, no. 4, pp. 1391–1409, May 2001.

[38] A. Kayhan, A. El-Jaroudi, and L. Chaparro, "The evolutionary periodogram for non-stationary signals," IEEE Trans. Signal Process., vol. 42, no. 6, 1994.

[39] A. S. Kayhan, A. El-Jaroudi, and L. F. Chaparro, "Data-adaptive evolutionary spectral estimation," IEEE Trans. Signal Process., vol. 43, no. 1, pp. 204–213, Jan. 1995.

[40] A. Akan and L. F. Chaparro, "Evolutionary chirp representation of non-stationary signals via Gabor transform," Signal Processing, vol. 81, no. 11, pp. 2429–2436, Nov. 2001.

[41] A. Akan and L. F. Chaparro, "Evolutionary spectral analysis using a warped Gabor expansion," Proc. IEEE ICASSP, vol. 3, pp. 1403–1406, May 1996.

[42] A. Akan, "Signal-adaptive evolutionary spectral analysis using instantaneous frequency estimation," FREQUENZ Journal of RF-Engineering and Telecommunications, vol. 59, no. 7–8, pp. 201–205, July–Aug. 2005.

[43] L. F. Chaparro, R. Suleesathira, A. Akan, and B. Unsal, "Instantaneous frequency estimation using discrete evolutionary transform for jammer excision," Proc. IEEE ICASSP, vol. 6, pp. 3525–3528, 7–11 May 2001.

[44] R. Suleesathira, L. F. Chaparro, and A. Akan, "Discrete evolutionary transform for time-frequency analysis," Conf. Record of the 32nd Asilomar Conference on Signals, Systems & Computers, vol. 1, pp. 812–816, 1–4 Nov. 1998.

[45] R. Suleesathira and L. F. Chaparro, "Interference mitigation in spread spectrum using discrete evolutionary and Hough transforms," Proc. IEEE ICASSP, vol. 5, pp. 2821–2824, 5–9 June 2000.

[46] S. Barbarossa, A. Scaglione, S. Spalletta, and S. Votini, "Adaptive suppression of wideband interferences in spread-spectrum communications using the Wigner-Hough transform," Proc. IEEE ICASSP, vol. 5, pp. 3861–3864, 21–24 April 1997.

[47] Chaparro, L.F., Alshehri, A., "Jammer excision in spread spectrum commu-nication via wiener masking and frequency�frequency evolutionary trans-form," Proc. IEEE ICASSP, Vol. 4, pp.473�476, 6�10 April 2003.

[48] Akan, A., & Chaparro, L.F.,�Multi�window Gabor Expansion for Evolution-ary Spectral Analysis�, Signal Processing, Vol. 63, pp. 249�262, Dec. 1997.

[49] Jachan, M. Matz, G. Hlawatsch, F. , �Time�Frequency ARMA Models andParameter Estimators for Underspread Nonstationary Random Processes�,IEEE Trans. Signal Process., Vol. 55, Number 9, pp. 4366�4381, Sept 2007.

[50] M. Niedz´wiecki, Identi�cation of Time�Varying Processes. New York: Wi-ley, 2000.

[51] G. Matz and F. Hlawatsch, �Nonstationary spectral analysis based on time�frequency operator symbols and underspread approximations,� IEEE Trans.Info. Theory, vol. 52, pp. 1067�1086, Mar. 2006.

[52] G. Matz and F. Hlawatsch, �Time�varying power spectra of nonstationaryrandom processes,� in Time�Frequency Signal Analysis and Processing: AComprehensive Reference, B. Boashash, Ed. Oxford, U.K.: Elsevier, ch. 9.4,pp. 400�409, 2003.

[53] M. Wax and T. Kailath, �Ef�cient inversion of Toeplitz�block Toeplitz ma-trix,� IEEE Trans. Acoust., Speech, Signal Process., vol. 31, pp. 1218�1221,Oct. 1983.

[54] Y. Grenier, �Parametric time�frequency representations,� in Traitement duSignal/Signal Processing, Les Houches, Session XLV, J. L. Lacoume, T. S.Durrani, and R. Stora, Eds. Amsterdam, The Netherlands: Elsevier, pp. 338�397, 1987.

[55] M. Jachan, G. Matz, and F. Hlawatsch, "Time-frequency-autoregressive random processes: Modeling and fast parameter estimation," Proc. IEEE ICASSP, Hong Kong, vol. VI, pp. 125–128, Apr. 2003.

[56] Y. Grenier, "Time-dependent ARMA modeling of nonstationary signals," IEEE Trans. Acoust., Speech, Signal Process., vol. 31, pp. 899–911, Aug. 1983.

[57] N. A. Abdrabbo and M. B. Priestley, "On the prediction of nonstationary processes," J. Roy. Stat. Soc. Ser. B, vol. 29, no. 3, pp. 570–585, 1967.

[58] M. Jachan, F. Hlawatsch, and G. Matz, "Linear methods for TFARMA parameter estimation and system approximation," Proc. 13th IEEE Workshop Statistical Signal Processing, Bordeaux, France, pp. 909–914, Jul. 2005.

[59] P. Flandrin, Time-Frequency/Time-Scale Analysis. San Diego, CA: Academic, 1999.

[60] F. Hlawatsch and P. Flandrin, "The interference structure of the Wigner distribution and related time-frequency signal representations," in The Wigner Distribution: Theory and Applications in Signal Processing, W. Mecklenbräuker and F. Hlawatsch, Eds. Amsterdam, The Netherlands: Elsevier, pp. 59–133, 1997.

[61] M. Jachan, G. Matz, and F. Hlawatsch, "TFARMA models: Order estimation and stabilization," Proc. IEEE ICASSP, Philadelphia, PA, vol. IV, pp. 301–304, Mar. 2005.

[62] H. Akaike, "A new look at the statistical model identification," IEEE Trans. Autom. Control, vol. 19, pp. 716–723, Dec. 1974.

[63] S. M. Kay, Modern Spectral Estimation. Englewood Cliffs, NJ: Prentice-Hall, 1988.

[64] Shah, S.I., Chaparro, L.F., & El-Jaroudi, A., "Generalized Transfer Function Estimation using Evolutionary Spectral Deblurring", IEEE Trans. Signal Process., vol. 47, no. 8, pp. 2335–2339, August 1999.

[65] Shah, S.I., "Generalized Transfer Function Estimation and Informative Priors for Positive Time-Frequency Distributions", PhD Dissertation, University of Pittsburgh, Pittsburgh, PA, 1997.

[66] Unsal Artan, R.B., Akan, A., Chaparro, L.F., "Higher order evolutionary spectral analysis," Proc. IEEE ICASSP, vol. 4, pp. 633–636, 6–10 April 2003.

[67] Wexler, J., and Raz, S., "Discrete Gabor Expansions," Signal Processing, vol. 21, no. 3, pp. 207–220, Nov. 1990.

[68] L. Cohen, "Generalized phase-space distribution functions," J. Math. Phys., vol. 7, pp. 781–786, 1966.

[69] T. A. C. M. Claasen and W. F. G. Mecklenbräuker, "The Wigner distribution: a tool for time-frequency signal analysis; part I: continuous-time signals," Philips Journal of Research, vol. 35, pp. 217–250, 1980.

[70] T. A. C. M. Claasen and W. F. G. Mecklenbräuker, "The Wigner distribution: a tool for time-frequency signal analysis; part II: discrete-time signals," Philips Journal of Research, vol. 35, pp. 276–300, 1980.

[71] T. A. C. M. Claasen and W. F. G. Mecklenbräuker, "The Wigner distribution: a tool for time-frequency signal analysis; part III: relations with other time-frequency signal transformations," Philips Journal of Research, vol. 35, pp. 372–389, 1980.

[72] A. J. E. M. Janssen, "On the locus and spread of pseudo-density functions in the time-frequency plane," Philips Journal of Research, vol. 37, pp. 79–110, 1982.

[73] C. P. Janse and J. M. Kaizer, "Time-frequency distributions of loudspeakers: the application of the Wigner distribution," J. Audio Eng. Soc., vol. 31, pp. 198–223, 1983.

[74] B. Boashash, "Representation temps-frequence," Soc. Nat. ELF Aquitaine, Pau, France, Publ. Recherches, no. 373–378, 1978.

[75] P. Flandrin and W. Martin, "A general class of estimators for the Wigner-Ville spectrum of nonstationary processes," in Systems Analysis and Optimization of Systems, Lecture Notes in Control and Information Sciences. Berlin, Vienna, New York: Springer-Verlag, pp. 15–23, 1984.

[76] R. D. Hippenstiel and P. M. de Oliveira, "Time-varying spectral estimation using the instantaneous power spectrum (IPS)," IEEE Trans. Acoust., Speech, Signal Process., vol. 38, pp. 1752–1759, 1990.

[77] P. Flandrin and B. Escudie, "Time and frequency representation of finite energy signals: a physical property as a result of a Hilbertian condition," Signal Processing, vol. 2, pp. 93–100, 1980.

[78] H. Margenau and L. Cohen, "Probabilities in quantum mechanics," in Quantum Theory and Reality, M. Bunge, Ed. New York, NY: Springer, 1967.

[79] L. Stankovic, "A Time-Frequency Distribution Concentrated Along the Instantaneous Frequency", IEEE Signal Process. Lett., vol. 3, no. 3, pp. 89–91, March 1996.

[80] LJ. Stankovic and S. Stankovic, "An analysis of the instantaneous frequency representation using time-frequency distributions: Generalized Wigner distribution," IEEE Trans. Signal Process., vol. 43, no. 2, Feb. 1995.

[81] LJ. Stankovic, "A method for improved distribution concentration in the time-frequency signal analysis using the L-Wigner distribution," IEEE Trans. Signal Process., vol. 43, no. 5, May 1995.

[82] __________, "A multitime definition of the Wigner higher order distribution: L-Wigner distribution," IEEE Signal Process. Lett., vol. 1, no. 7, pp. 106–109, July 1994.

[83] __________, "An analysis of the Wigner higher order spectra of multicomponent signals," Ann. Telecomm., vol. 49, no. 3–4, pp. 132–136, Mar./Apr. 1994.

[84] I. Djurović and LJ. Stanković, "Influence of high noise on the instantaneous frequency estimation using time-frequency distributions," IEEE Signal Process. Lett., vol. 7, pp. 317–319, Nov. 2000.

[85] J. R. Fonollosa and C. L. Nikias, "Wigner higher order moment spectra: Definitions, properties, computation and application to the transient signal analysis," IEEE Trans. Signal Process., vol. 41, no. 1, pp. 245–266, Jan. 1993.

[86] F. Hlawatsch and G. F. Boudreaux-Bartels, "Linear and quadratic time-frequency signal representations," IEEE Signal Processing Mag., pp. 21–67, Apr. 1992.

[87] Suleesathira, R., Chaparro, L.F., Akan, A., "Discrete Evolutionary Transform for Positive Time-Frequency Signal Analysis", Journal of the Franklin Institute, vol. 337, no. 4, pp. 347–364, 2000.

[88] LJ. Stankovic, "L-class of time-frequency distributions," IEEE Signal Process. Lett., vol. 3, pp. 22–25, Jan. 1996.

[89] __________, "On the realization of the highly concentrated time-frequency distributions," Proc. IEEE Symp. TFTSA, Paris, pp. 461–464, Jun. 1996.

[90] __________, "Highly Concentrated Time-Frequency Distributions: Pseudo Quantum Signal Representation", IEEE Trans. Signal Process., vol. 45, no. 3, pp. 543–551, March 1997.

[91] http://en.wikipedia.org/wiki/Reassignment_method.

[92] F. Hlawatsch and P. Flandrin, "The interference structure of the Wigner distribution and related time-frequency signal representations," in The Wigner Distribution: Theory and Applications in Signal Processing, W. Mecklenbräuker, Ed., Amsterdam, The Netherlands: Elsevier, 1994.

[93] F. Auger and P. Flandrin, "Improving the readability of time-frequency and time-scale representations by the reassignment method," IEEE Trans. Signal Process., vol. 43, pp. 1068–1089, May 1995.

[94] P. Flandrin, F. Auger, and E. Chassande-Mottin, "Time-frequency reassignment: From principles to algorithms," in Applications in Time-Frequency Signal Processing (A. Papandreou-Suppappola, ed.), ch. 5, pp. 179–203, CRC Press, 2003.

[95] K. Kodera, C. de Villedary, and R. Gendrin, "A new method for the numerical analysis of non-stationary signals," Phys. Earth and Planetary Interiors, vol. 12, pp. 142–150, 1976.

[96] K. Kodera, R. Gendrin, and C. de Villedary, "Analysis of time-varying signals with small BT values," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-26, pp. 64–76, 1978.

[97] D. J. Nelson, "Cross-spectral methods for processing speech," Journal of the Acoustical Society of America, vol. 110, pp. 2575–2592, Nov. 2001.

[98] S. A. Fulop and K. Fitz, "A spectrogram for the twenty-first century," Acoustics Today, vol. 2, no. 3, pp. 26–33, 2006.

[99] __________, "Algorithms for computing the time-corrected instantaneous frequency (reassigned) spectrogram, with applications," Journal of the Acoustical Society of America, vol. 119, pp. 360–371, Jan. 2006.

[100] D. L. Jones, T. W. Parks, "A Resolution Comparison of Several Time-Frequency Representations," IEEE Trans. Signal Process., vol. 40, no. 2, Feb. 1992.

[101] LJ. Stankovic and V. Katkovnik, "Algorithm for the Instantaneous Frequency Estimation Using Time-Frequency Distributions with Adaptive Window Width," IEEE Signal Process. Lett., vol. 5, no. 9, pp. 224–227, 1998.

[102] Barkat, B., Abed-Meraim, K., "Algorithms for Blind Components Separation and Extraction from the Time-Frequency Distribution of Their Mixture," EURASIP Journal on Applied Signal Processing, vol. 2004, no. 13, pp. 2025–2033, 2004.

[103] Barbarossa, S., "Analysis of Multicomponent LFM Signals by a Combined Wigner-Hough Transform", IEEE Trans. Signal Process., vol. 43, no. 6, pp. 1511–1515, 1995.

[104] Yagle, A.E., Torres-Fernandez, J.E., "Construction of Signal-Dependent Cohen's-class time-frequency distributions using iterative blind deconvolution", Proc. SPIE, Advanced Signal Processing Algorithms, Architectures, and Implementations XIII, F. T. Luk, Ed., vol. 5205, pp. 47–58, 2003.

[105] C. Stergiou, "What is a Neural Network", http://www.doc.ic.ac.uk.

[106] C. Stergiou, "Neural Networks, the Human Brain and Learning", http://www.doc.ic.ac.uk.

[107] R. M. Gray, Entropy and Information Theory. New York: Springer-Verlag, 1990.

[108] A. K. Jain, J. Mao, and K. M. Mohiuddin, "Artificial Neural Networks: A Tutorial," IEEE Computer, vol. 29, no. 3, pp. 31–44, 1996.

[109] Basu, M., and Su, M., "Deblurring images using projection pursuit learning network," Proc. Int. Joint Conf. on Neural Networks, IJCNN'99, Washington, DC, 1999.

[110] A. E. Ruano, "Intelligent Control Systems Using Computational Intelligence Techniques", July 2005.

[111] R. C. Gonzalez & P. Wintz, Digital Image Processing, 2nd Ed., Addison-Wesley, 1987.

[112] D. Jones and T. Parks, "A high resolution data-adaptive time-frequency representation," IEEE Trans. Acoust., Speech, Signal Process., vol. 38, pp. 2127–2135, Dec. 1990.

[113] W. J. Williams and T. Sang, "Adaptive RID kernels which minimize time-frequency uncertainty," Proc. IEEE-SP Int. Symp. Time-Freq. Time-Scale Anal., Philadelphia, PA, pp. 96–99, Oct. 1994.

[114] T. H. Sang and W. J. Williams, "Rényi information and signal-dependent optimal kernel design," Proc. IEEE ICASSP, vol. 2, Detroit, MI, pp. 997–1000, May 1995.

[115] P. M. Oliveira and V. Barroso, "Uncertainty in the time-frequency plane," Proc. 10th IEEE Workshop Statist. Signal Array Process., Pocono Manor, PA, pp. 607–611, Aug. 2000.

[116] W. J. Williams, M. Brown, and A. Hero, "Uncertainty, information and time-frequency distributions," in SPIE Advanced Signal Processing Algorithms, vol. 1556, pp. 144–156, 1991.

[117] C. E. Shannon, "A mathematical theory of communication, Part I," Bell Sys. Tech. J., vol. 27, pp. 379–423, July 1948.

[118] A. Rényi, "On measures of entropy and information," Proc. 4th Berkeley Symp. Math. Stat. and Prob., vol. 1, pp. 547–561, 1961.

[119] C. Arndt, "Information Measures: Information and its Description in Science and Engineering", Springer, Berlin, 2001.

[120] D. Gabor, "Theory of communication," J. Inst. Electron. Eng., vol. 93, no. 11, pp. 429–457, Nov. 1946.

[121] D. Vakman, "Optimum signals which minimizes partial volume under an ambiguity surface," Radio Eng., Electron. Phys., vol. 27, pp. 1260–1268, Aug. 1968.

[122] A. Dziewonski, S. Bloch, and M. Landisman, "A technique for the analysis of transient seismic signals," Bull. Seismological Soc. Amer., pp. 427–449, Feb. 1969.

[123] G. L. Duckworth, "Processing and inversion of Arctic Ocean refraction data," Sc.D. dissertation, Massachusetts Inst. Technol., Cambridge, MA, 1983.

[124] G. W. Deley, "Waveform design," in Radar Handbook, M. I. Skolnik, Ed. New York, NY: McGraw-Hill, 1970.

[125] A. W. Rihaczek, Principles of High-Resolution Radar. New York, NY: McGraw-Hill, 1969.

[126] M. I. Skolnik, Introduction to Radar Systems. New York, NY: McGraw-Hill, 1980.

[127] P. M. Woodward, Probability and Information Theory with Applications to Radar. London, England: Pergamon, 1953.

[128] H. H. Szu and J. Blodgett, "Wigner distribution and ambiguity functions," in Optics in Four Dimensions, L. M. Narducci, Ed. New York, NY: Am. Inst. of Physics, pp. 355–381, 1981.

[129] C. Eichmann and N. M. Marinovic, "Scale-invariant Wigner distribution and ambiguity functions," Proc. Int. Soc. Opt. Eng., Proc. SPIE, vol. 519, pp. 18–24, 1985.

[130] R. G. Baraniuk, "Shear Madness: Signal-Dependent and Metaplectic Time-Frequency Representations," Ph.D. Thesis, Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, August 1992.

[131] R. G. Baraniuk, D. L. Jones, "A Signal-Dependent Time-Frequency Representation: Optimal Kernel Design," IEEE Trans. Signal Process., vol. 41, no. 4, pp. 1589–1602, April 1993.

[132] R. G. Baraniuk and D. L. Jones, "Signal-Dependent Time-Frequency Analysis Using a Radially Gaussian Kernel," Signal Processing, vol. 32, no. 3, pp. 263–284, June 1993.

[133] D. L. Jones and R. G. Baraniuk, "An Adaptive Optimal-Kernel Time-Frequency Representation," IEEE Trans. Signal Process., vol. 43, no. 11, pp. 2361–2371, October 1995.

[134] http://www-dsp.rice.edu.

[135] L. Cohen and T. E. Posch, "Positive Time-Frequency Distribution Functions," IEEE Trans. Acoust., Speech, Signal Process., vol. 33, pp. 31–37, Feb. 1985.

[136] W. D. Mark, "Spectral analysis of the convolution and filtering of non-stationary stochastic processes," J. Sound Vib., vol. 11, pp. 19–63, 1970.

[137] R. M. Fano, "Short-time autocorrelation functions and power spectra," J. Acoust. Soc. Am., vol. 22, pp. 546–550, 1950.

[138] M. R. Schroeder and B. S. Atal, "Generalized short-time power spectra and autocorrelation functions," J. Acoust. Soc. Am., vol. 34, pp. 1679–1683, 1962.

[139] M. H. Ackroyd, "Instantaneous and time-varying spectra: an introduction," Radio Electron. Eng., vol. 39, pp. 145–152, 1970.

[140] __________, "Short-time spectra and time-frequency energy distributions," J. Acoust. Soc. Am., vol. 50, pp. 1229–1231, 1970.

[141] D. G. Lampard, "Generalization of the Wiener-Khintchine theorem to nonstationary processes," J. Appl. Phys., vol. 25, p. 802, 1954.

[142] S. Grassin and R. Garello, "Spectral analysis of the swell using the reassigned Wigner-Ville Representation," Proc. IEEE Conf. Oceans'96, pp. 1539–1544, Fort Lauderdale, FL, Sep. 1996.

[143] M. Born and P. Jordan, "Zur Quantenmechanik," Z. Phys., vol. 34, pp. 858–888, 1925.

[144] H. Choi and W. J. Williams, "Improved time-frequency representation of multicomponent signals using exponential kernels," IEEE Trans. Acoust., Speech, Signal Process., vol. 37, no. 6, pp. 862–871, June 1989.

[145] J. Jeong and W. J. Williams, "Alias-free generalized discrete-time time-frequency distributions," IEEE Trans. Signal Process., vol. 40, pp. 2757–2765, Nov. 1992.

[146] A. Papandreou and G. F. Boudreaux-Bartels, "Distributions for time-frequency analysis: A generalization of Choi-Williams and the Butterworth distributions," Proc. IEEE ICASSP, vol. 5, pp. 181–184, 1992.

[147] A. Papandreou-Suppappola, "Generalized time-shift covariant quadratic time-frequency representations with arbitrary group delays," Proc. 29th Asilomar Conf. Signals, Systems and Computers, Pacific Grove, CA, pp. 553–557, Oct. 1995.

[148] A. Papandreou, F. Hlawatsch, and G. F. Boudreaux-Bartels, "Quadratic time-frequency representations with scale covariance and generalized time-shift covariance: a unified framework for the affine, hyperbolic, and power classes," Digital Signal Process.: A Review Journal, vol. 8, pp. 3–48, Jan. 1998.

[149] A. Papandreou and G. F. Boudreaux-Bartels, "The Exponential class and Generalized time-shift covariant quadratic time-frequency representations," Proc. IEEE-SP Intl. Symposium on Time-Frequency and Time-Scale Analysis, Paris, France, pp. 429–432, Jun. 1996.

[150] A. Papandreou, F. Hlawatsch, and G. F. Boudreaux-Bartels, "A unified framework for the Scale covariant Affine, Hyperbolic, and Power class Time-Frequency Representations Using Generalized Time-Shifts," Proc. IEEE ICASSP, Detroit, MI, May 1995.

[151] A. Papandreou-Suppappola, "New Classes of Quadratic Time-Frequency Representations with Scale Covariance and Generalized Time-Shift Covariance: Analysis, Detection, and Estimation", Ph.D. thesis, University of Rhode Island, Kingston, RI, 1995.

[152] A. Papandreou, F. Hlawatsch, and G. F. Boudreaux-Bartels, "The hyperbolic class of Quadratic time-frequency representations, Part I. Constant-Q warping, the hyperbolic paradigm, properties and members," IEEE Trans. Signal Process., Special issue on wavelets and signal processing, vol. 41, pp. 3425–3444, Dec. 1993.

[153] F. Hlawatsch, A. Papandreou, and G. F. Boudreaux-Bartels, "The Power classes of Quadratic time-frequency representations: A Generalization of the Affine and Hyperbolic Classes," Proc. 27th Asilomar Conf. on Signals, Systems and Computers, Pacific Grove, CA, pp. 1265–1270, Nov. 1993.

[154] F. Hlawatsch, A. Papandreou, and G. F. Boudreaux-Bartels, "The Power classes: quadratic time-frequency representations with scale covariance and dispersive time-shift covariance," IEEE Trans. Signal Process., vol. 47, pp. 3067–3083, Nov. 1999.

[155] A. Papandreou-Suppappola, R. L. Murray, B. G. Iem, and G. F. Boudreaux-Bartels, "Group delay shift covariant quadratic time-frequency representations," IEEE Trans. Signal Process., vol. 49, pp. 2549–2564, Nov. 2001.

[156] A. Papandreou-Suppappola, "Time-Frequency Representations covariant to frequency-dependent time shifts", in Time-Frequency Signal Analysis and Processing, B. Boashash, Ed., Prentice Hall, New York, 2002.

[157] A. Papandreou-Suppappola, B. G. Iem, and G. F. Boudreaux-Bartels, "Time-Frequency symbols for statistical signal processing," in Time-Frequency Signal Analysis and Processing, B. Boashash, Ed., Prentice Hall, New York, 2002.

[158] A. Papandreou and G. F. Boudreaux-Bartels, "The Effect of mismatching analysis signals and time-frequency representations," Proc. IEEE-SP Intl. Symposium on Time-Frequency and Time-Scale Analysis, Paris, France, pp. 149–152, Jun. 1996.

[159] P. Guillemain and P. White, "Wavelet transform for the analysis of dispersive systems," Proc. IEEE UK Symposium on Applications of Time-Frequency and Time-Scale Methods, University of Warwick, Coventry, UK, pp. 32–39, Aug. 1995.

[160] M. J. Freeman, M. E. Dunham, and S. Qian, "Trans-ionospheric Signal Detection by Time-Scale Representation," Proc. IEEE UK Symposium on Applications of Time-Frequency and Time-Scale Methods, University of Warwick, Coventry, UK, pp. 152–158, Aug. 1995.

[161] D. E. Newland, "Time-Frequency and Time-Scale analysis by harmonic wavelets," in Signal Analysis and Prediction, A. Prochazka, Ed., Birkhäuser, Boston, Chap. 1, 1998.

[162] J. P. Sessarego, J. Sageloli, P. Flandrin, and M. Zakharia, "Time-Frequency Wigner-Ville analysis of echoes scattered by a spherical shell," in Wavelets, Time-Frequency Methods and Phase Space, J. M. Combes, A. Grossmann, and P. Tchamitchian, Eds., Springer-Verlag, Heidelberg, pp. 147–153, 1989.

[163] P. M. Morse and H. Feshbach, Methods of Theoretical Physics, McGraw-Hill, New York, 1953.

[164] V. Szekely, "Distributed RC networks," in The Circuits and Filters Handbook, W. K. Chen, Ed., CRC Press/IEEE Press, Boca Raton, FL, pp. 1203–1221, 1995.

[165] A. Papandreou and L. T. Antonelli, "Use of quadratic time-frequency representations to analyze Cetacean mammal sounds," Technical Rep. 11,284, Naval Undersea Warfare Center, Newport, RI, Dec. 2001.

[166] A. H. Costa and G. F. Boudreaux-Bartels, "Design of time-frequency representations using a multiform, tiltable exponential kernel," IEEE Trans. Signal Process., vol. 43, pp. 2283–2301, Oct. 1995.

[167] A. Papandreou-Suppappola, F. Hlawatsch, and G. F. Boudreaux-Bartels, "Power class Time-Frequency Representations: Interference geometry, Smoothing, and Implementation," Proc. IEEE-SP Intl. Symposium on Time-Frequency and Time-Scale Analysis, Paris, France, pp. 193–196, Jun. 1996.

[168] A. Papandreou and G. F. Boudreaux-Bartels, "Distortion that occurs when the signal group delay does not match the Time-Shift Covariance of a Time-Frequency Representation," Proc. 30th Annual Conf. on Information Sciences and Systems, Princeton, NJ, pp. 520–525, Mar. 1996.

[169] Y. Zhao, L. E. Atlas, and R. J. Marks, "The use of cone-shaped kernels for generalized time-frequency representations of nonstationary signals," IEEE Trans. Acoust., Speech, Signal Process., vol. 38, pp. 1084–1091, July 1990.

[170] Sridhar Krishnan, "A New Approach for Estimation of Instantaneous Mean Frequency of a Time-Varying Signal," EURASIP Journal on Applied Signal Processing, vol. 2005, no. 17, pp. 2848–2855, 2005.

[171] V. Sucic and B. Boashash, "Optimisation algorithm for selecting quadratic time-frequency distributions: performance results and calibration," Proc. 6th International Symposium on Signal Processing and Its Applications (ISSPA'01), vol. 1, pp. 331–334, Kuala Lumpur, Malaysia, August 2001.

[172] Ljubisa Stankovic, "A Measure of Some Time-Frequency Distributions Concentration," Signal Processing, vol. 81, no. 3, pp. 212–223, Mar. 2001.

[173] S. S. Chen, D. L. Donoho, M. A. Saunders, "Atomic Decomposition by Basis Pursuit," SIAM Journal on Scientific Computing, vol. 20, no. 1, pp. 33–61, 1998.

[174] R. R. Coifman and M. V. Wickerhauser, "Entropy-based algorithms for best-basis selection," IEEE Trans. Info. Theory, vol. 38, pp. 713–718, 1992.

[175] S. G. Mallat and Z. Zhang, "Matching pursuits with time-frequency dictionaries," IEEE Trans. Signal Process., vol. 41, no. 12, pp. 3397–3415, 1993.

[176] N. E. Huang, Z. Shen, S. R. Long, M. C. Wu, H. H. Shih, Q. Zheng, N. C. Yen, C. C. Tung, and H. H. Liu, "The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis," Proc. R. Soc. Lond. A, Math. Phys. Sci., vol. 454, no. 1971, pp. 903–995, Mar. 1998.

[177] A. O. Boudraa, J. C. Cexus, F. Salzenstein, and L. Guillon, "IF estimation using empirical mode decomposition and nonlinear Teager energy operator," Proc. IEEE ISCCSP, Hammamet, Tunisia, pp. 45–48, 2004.

[178] J. C. Cexus and A. O. Boudraa, "Nonstationary signals analysis by Teager-Huang transform (THT)," Proc. EUSIPCO, Florence, Italy, 5 p., 2006.

[179] A. O. Boudraa and J. C. Cexus, "EMD-Based Signal Filtering," IEEE Trans. Instrumentation and Measurement, vol. 56, no. 6, pp. 2196–2202, Dec. 2007.

[180] J. C. Cexus and A. O. Boudraa, "Teager-Huang analysis applied to sonar target recognition," Int. J. Signal Process., vol. 1, no. 1, pp. 23–27, 2004.

[181] A. O. Boudraa, J. C. Cexus, and Z. Saidi, "EMD-based signal noise reduction," Int. J. Signal Process., vol. 1, no. 1, pp. 33–37, 2004.

[182] Z. Liu and S. Peng, "Boundary processing of bidimensional EMD using texture synthesis," IEEE Signal Process. Lett., vol. 12, no. 1, pp. 33–36, Jan. 2005.

[183] A. O. Boudraa, J. C. Cexus, F. Salzenstein, and A. Beghdadi, "EMD-based multibeam echosounder images segmentation," Proc. IEEE ISCCSP, Marrakech, Morocco, 2006.

[184] K. Zeng and M. X. He, "A simple boundary process technique for empirical mode decomposition," Proc. IEEE IGARSS, vol. 6, pp. 4258–4261, 2004.

[185] P. Flandrin, P. Goncalves, and G. Rilling, "Detrending and denoising with empirical mode decompositions," Proc. EUSIPCO, Vienna, Austria, pp. 1581–1584, 2004.

[186] P. Flandrin and P. Gonçalves, "Empirical mode decompositions as data-driven wavelet-like expansions," Int. J. Wavelets, Multires. Inf. Process., vol. 2, no. 4, pp. 477–496, 2004.

[187] G. Rilling, P. Flandrin, and P. Goncalves, "Empirical mode decomposition, fractional Gaussian noise, and Hurst exponent estimation," Proc. IEEE ICASSP, Philadelphia, PA, vol. 4, pp. 489–492, 2005.

[188] R. Deering and J. F. Kaiser, "The use of a masking signal to improve empirical mode decomposition," Proc. IEEE ICASSP, Philadelphia, USA, vol. 4, pp. 485–488, 2005.

[189] S. Benramdane, J. C. Cexus, A. O. Boudraa, and J. A. Astolfi, "Transient turbulent pressure signal processing using empirical mode decomposition," Proc. Phys. Signal Image Process., Mulhouse, France, 2007.

[190] P. Flandrin, G. Rilling, and P. Goncalves, "Empirical mode decomposition as a filter bank," IEEE Signal Process. Lett., vol. 11, no. 2, pp. 112–114, Feb. 2004.

[191] Z. Wu and N. E. Huang, "A study of the characteristics of white noise using the empirical mode decomposition method," Proc. R. Soc. Lond. A, Math. Phys. Sci., vol. 460, no. 2046, pp. 1597–1611, Jun. 2004.

[192] Flandrin, P., P. Goncalves and G. Rilling, "EMD equivalent filter banks, from interpretation to applications," in Introduction to Hilbert-Huang Transform and its Applications, N. E. Huang and S. S. P. Shen, Eds., pp. 57–74, World Scientific, New Jersey, 2005.

[193] G. Rilling and P. Flandrin, "One or Two Frequencies? The Empirical Mode Decomposition Answers," IEEE Trans. Signal Process., vol. 56, no. 1, pp. 85–95, Jan. 2008.

[194] K. T. Coughlin and K. K. Tung, "11-year solar cycle in the stratosphere extracted by the Empirical Mode Decomposition method," Adv. Space Res., vol. 34, pp. 323–329, 2004.

[195] M. Chavez, C. Adam, V. Navarro, S. Boccaletti, and J. Martinerie, "On the intrinsic time scales involved in synchronization: A data-driven approach," Chaos: An Interdisciplinary J. Nonlin. Sci., vol. 15, no. 2, p. 023904, 2005.

[196] A. Aïssa-El-Bey, K. Abed-Meraim, and Y. Grenier, "Underdetermined blind audio source separation using modal decomposition," EURASIP J. Audio, Speech, Music Process., vol. 2007, pp. 15–15, 2007.

[197] J. Shore and R. Johnson, "Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy," IEEE Trans. Info. Theory, vol. 26, no. 1, pp. 26–37, 1980.

[198] J. Shore and R. Johnson, "Properties of cross-entropy minimization," IEEE Trans. Info. Theory, vol. 27, no. 4, pp. 472–482, 1981.

[199] M. G. Amin and W. J. Williams, "High spectral Resolution Time-Frequency Kernels," IEEE Trans. Signal Process., vol. 46, no. 10, pp. 2796–2804, 1998.

[200] B. Boashash, B. Lovell, and H. Whitehouse, "High resolution time frequency signal analysis by parametric modeling of the Wigner-Ville distribution," Proc. ISSPA, Brisbane, Australia, Aug. 1987.

[201] P. Ramamoorthy, V. Iyer, and Y. Ploysongsang, "Autoregressive modeling of the Wigner spectrum," Proc. IEEE ICASSP, Dallas, TX, Apr. 1987.

[202] L. I. Rudin, S. Osher, and E. Fatemi, "Nonlinear total-variation-based noise removal algorithms," Phys. D, vol. 60, pp. 259–268, 1992.

[203] E. F. Velez and R. G. Absher, "Smoothed Wigner-Ville parametric modeling for the analysis of nonstationary signals," Proc. IEEE Int. Symp. Circuits Syst., May 1989, pp. 507–510.

[204] __________, "Wigner half kernel modeling," Signal Process., vol. 26, no. 2, pp. 162–175, Feb. 1992.

[205] R. Kumaresan, "On the zeros of the linear prediction-error filter for deterministic signals," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-31, pp. 217–220, Feb. 1983.

[206] S. L. Marple, Jr., Digital Spectral Analysis with Applications. Englewood Cliffs, NJ: Prentice-Hall, 1987, ch. 11.

[207] Y. Zhang, M. G. Amin and G. J. Frazer, "High-resolution time-frequency distributions for maneuvering target detection in over-the-horizon radars," IEE Proc. Radar Sonar Navig., vol. 150, no. 4, pp. 299–304, Aug. 2003.

[208] B. Barkat and B. Boashash, "A High-resolution Quadratic time-frequency distribution for Multicomponent signals analysis," IEEE Trans. Signal Process., vol. 49, no. 10, pp. 2232–2239, Oct. 2001.

[209] A. Papandreou-Suppappola, "Applications in time-frequency signal processing," CRC Press LLC, 2003.

[210] H. G. Feichtinger and T. Strohmer, Eds., Gabor Analysis and Algorithms: Theory and Applications, Springer, 1998.

[211] H. G. Feichtinger and T. Strohmer, Eds., Advances in Gabor Analysis, Birkhäuser, 2001.

[212] S. Qian and D. Chen, "Decomposition of the Wigner distribution and time-frequency distribution series," IEEE Trans. Signal Process., vol. 42, pp. 2836–2842, Oct. 1994.

[213] S. Qian and D. Chen, "Signal representation using adaptive normalized Gaussian functions," Signal Processing, vol. 36, pp. 1–11, 1994.

[214] L. F. Villemoes, "Best approximation with Walsh atoms," Constr. Approx., vol. 13, pp. 329–355, 1997.

[215] A. Bultan, "A four-parameter atomic decomposition of chirplets," IEEE Trans. Signal Process., vol. 47, pp. 731–745, March 1999.

[216] M. R. McClure and L. Carin, "Matching pursuits with a wave-based dictionary," IEEE Trans. Signal Process., vol. 45, pp. 2912–2927, Dec. 1997.

[217] I. Shafi, J. Ahmad, S. I. Shah, and F. M. Kashif, "Evolutionary time-frequency distributions using Bayesian regularised neural network model", IET Signal Process., vol. 1, no. 2, pp. 97–106, June 2007.

[218] M. T. Hagan, H. B. Demuth & M. Beale, "Neural Network Design", Thomson Learning USA, 1996.

[219] Chauvin, Y., & Rumelhart, D.E., "Backpropagation: Theory, Architectures, and Applications", Lawrence Erlbaum Associates Publishers, UK, 1995.

[220] MacKay, D.J.C., "A Practical Bayesian Framework for Backpropagation Networks", Neural Computation, vol. 4, no. 3, pp. 448–472, 1992.

[221] M. Riedmiller and H. Braun, "A direct adaptive method for faster backpropagation learning: The RPROP algorithm," Proc. IEEE Int. Conf. on ANN (ICNN), San Francisco, pp. 586–591, 1993.

[222] S. Pei and J. Ding, "Relations between Gabor Transforms and Fractional Fourier Transforms and their applications for signal processing," IEEE Trans. Signal Process., vol. 55, no. 10, pp. 4839–4850, Oct. 2007.

[223] http://en.wikipedia.org/wiki/Data_clustering.

[224] Shafi, I., Ahmad, J., Shah, S.I., & Kashif, F.M., "Impact of varying Neurons and Hidden layers in Neural Network Architecture for a Time Frequency Application", Proc. 10th IEEE Intl. Multitopic Conf., INMIC 2006, Islamabad, Pakistan, 23–24 Dec. 2006.

[225] I. Shafi, J. Ahmad, S. I. Shah, F. M. Kashif, "Time Frequency Distribution using Neural Networks", Proc. IEEE Intl. Conf. on Emerging Technologies, pp. 32–35, Pakistan, 2005.

[226] Ahmad, J., Shafi, I., Shah, S.I., & Kashif, F.M., "Analysis and Comparison of Neural Network Training Algorithms for the Joint Time-Frequency Analysis", Proc. IASTED Intl. Conf. on Artificial Intelligence and Applications, pp. 193–198, Austria, Feb. 2006.

[227] S. I. Shah, I. Shafi, J. Ahmad, and F. M. Kashif, "Multiple Neural Networks over Clustered Data (MNCD) to Obtain Instantaneous Frequencies (IFs)", Proc. IEEE Intl. Conf. on Information and Emerging Technologies (ICIET), pp. 1–6, 6–7 July 2007, Karachi, Pakistan.

[228] I. Shafi, J. Ahmad, S. I. Shah, and F. M. Kashif, "Computing De-blurred Time Frequency Distributions using Artificial Neural Networks", Circuits, Systems, and Signal Processing, Birkhäuser Boston, Springer Verlag, vol. 27, no. 3, pp. 277–294, Jun. 2008.

[229] M. B. Priestley, "Evolutionary spectra and nonstationary processes," J. Royal Stat. Soc. B, vol. 27, no. 2, pp. 204–237, 1965.

[230] __________, Spectral Analysis and Time Series. London: Academic, 1981.

[231] __________, Nonlinear and Non-stationary Time Series Analysis. London: Academic, 1988.

[232] G. Melard and A. Herteleer-de Schutter, "Contributions to evolutionary spectral theory," J. Time Series Anal., vol. 10, no. 1, pp. 41–63, 1989.

[233] A. M. Yaglom, An Introduction to the Theory of Stationary Random Functions. Englewood Cliffs, NJ: Prentice-Hall, 1962.

[234] C. S. Detka, A. El-Jaroudi, and L. F. Chaparro, "Relating the bilinear distribution and the evolutionary spectrum," Proc. IEEE ICASSP, vol. 4, pp. 496–499, 1993.

[235] Y. Grenier, "Time-dependent ARMA modeling of nonstationary signals," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-31, no. 4, pp. 899–911, Aug. 1983.

[236] T. Subba Rao, "The fitting of non-stationary time-series models with time-dependent parameters," J. Royal Stat. Soc., B, vol. 32, no. 2, pp. 312–322, 1970.

[237] M. Kahn, L. F. Chaparro, and E. W. Kamen, "Frequency analysis of nonstationary signal models," Proc. Conf. Inform. Sci. Syst., Baltimore, pp. 617–622, Mar. 1989.

[238] S. M. Kay, Modern Spectral Estimation: Theory and Application. Engle-wood Cliffs, NJ: Prentice�Hall 1988.

[239] J. Capon, �High�resolution frequency�wavenumber spectrum analysis,� Proc.IEEE, vol. 57, no. 8, pp 1408�1419, Aug. 1969.

[240] D. J. Thomson.,"Spectrum estimation and harmonic analysis," Proc. IEEE,70:1055�1096, 1982.

[241] D. Thomson and A. Chave. Jackknifed error estimates for spectra, coher-ences, and transfer functions. S. Haykin, (ed.), Advances in Spectrum Analy-sis and Array Processing, Vol. 1, 58�113. Prentice�Hall, 1991.

[242] Pitton, JamesW., �Positive Time�Frequency Distributions via Quadratic Pro-gramming�, Journal of Multidimensional Systems and Signal Processing,SpringerLink, Vol. 9, No. 4, pp. 439�445, October 1998.

[243] Emresoy, M.K., Loughlin, P.J., "Weighted Least Square Cohen�Posch Time�Frequency Distribution Functions", IEEE Trans. Signal Process., Vol. 46,No. 3, pp. 753�757, 1998.

[244] Groutage, D., A Fast Algorithm for Computing Minimum Cross�EntropyPositive Time�Frequency Distributions, IEEE Trans. Signal Process., Vol.45, No. 8, pp. 1954�1970, 1997.


[245] S. I. Shah, P. J. Loughlin, L. F. Chaparro, and A. El-Jaroudi, "Informative Priors for Minimum Cross-Entropy Positive Time Frequency Distributions," IEEE Signal Process. Lett., vol. 4, no. 6, pp. 176-177, 1997.

[246] P. Loughlin, J. Pitton, and L. Atlas, "Construction of positive time-frequency distributions," IEEE Trans. Signal Process., vol. 42, no. 10, pp. 2697-2705, 1994.

[247] P. Loughlin, J. Pitton, and B. Hannaford, "Approximating time-frequency density functions via optimal combinations of spectrograms," IEEE Signal Process. Lett., vol. 1, no. 12, pp. 199-202, 1994.

[248] J. Pitton, "An algorithm for weighted least squares positive time-frequency distributions," in SPIE Advanced Sig. Proc. Algs. Archs., Impl. VII, vol. 3162, 1997.

[249] J. Pitton, "Linear and quadratic methods for positive time-frequency distributions," in Proc. IEEE ICASSP, vol. V, pp. 3649-3652, 1997.

[250] J. Pitton, L. Atlas, and P. Loughlin, "Applications of positive time-frequency distributions to speech processing," IEEE Trans. Sp. Audio Proc., vol. 2, no. 4, pp. 554-566, 1994.

[251] S. Stankovic and LJ. Stankovic, "Introducing time-frequency distribution with a 'complex-time' argument," Electronics Letters, vol. 32, no. 14, pp. 1265-1267, July 1996.

[252] L. Stankovic, "Time-frequency distributions with complex argument," IEEE Trans. Signal Process., vol. 50, no. 3, pp. 475-486, Mar. 2002.

[253] C. Cornu, S. Stankovic, C. Ioana, A. Quinquis, and LJ. Stankovic, "Generalized Representations of Phase Derivatives for Regular Signals," IEEE Trans. Signal Process., vol. 55, no. 10, pp. 4831-4838, Oct. 2007.

[254] P. O'Shea, "A new technique for instantaneous frequency rate estimation," IEEE Signal Process. Lett., vol. 9, no. 8, pp. 251-252, 2002.

[255] I. Shafi, J. Ahmad, S. I. Shah, and F. M. Kashif, "Techniques to obtain good resolution and concentrated time-frequency distributions: a review," EURASIP Journal on Advances in Signal Processing, vol. 2009, Article ID 673539, 43 pages, 2009.


Appendix A: ANN Fundamentals

A novel ANN-based method for computing de-blurred TFDs is presented in Chapters 3 and 4 [217, 228]; Fig. A.1 shows its general block diagram. The resulting TFDs are highly concentrated, have better resolution, and are free of CTs, so they can be used for STSC analysis. The method employs Bayesian regularization during the training phase of the ANN to obtain energy concentration along the IFs of individual components in unknown blurred TFDs. De-blurring TFDs is particularly well suited to learning [218] by an ANN for the following reasons [109, 111]:

1. Little information is available on the source of blurring.

2. Blurring is usually the result of a combination of events, which makes it too complex to describe mathematically.

3. Sufficient data is available, and it is conceivable that the data captures the fundamental principle at work.

The important theoretical aspects of an ANN setup are discussed next. They are needed for a clear understanding of the proposed ANN-based multi-process framework described in Chapters 3 and 4.


[Figure A.1: flow diagram with blocks labelled Training TFDs; Pre Processing; Vectorization; Correlation & Clusters Formation; Multiple BRNNs Training & NENNs Selection; Test TFDs Pre Processing; Post Processing; Resultant TFDs (Output Data)]

Figure A.1: The flow diagram of the neural network based method.


A.1 Brain Vs ANN

The brain is a remarkably efficient tool. Although its response time is much slower than that of computer chips, it outperforms computers in complex tasks such as image and sound recognition, and it is far more efficient in terms of energy consumed per operation. An ANN is an information processing paradigm inspired by the way the brain processes information [108]. The key element of this paradigm is the novel structure of the information processing system: a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems. ANNs, like people, learn by example. An ANN is configured for a specific application, such as pattern recognition or data classification, through a learning process. Learning in biological systems involves adjustments to the synaptic connections between neurons, and the same is true of ANNs [218].

A.2 Human Vs Artificial Neuron

A typical human neuron collects signals from other neurons through a host of fine structures called dendrites. It sends out spikes of electrical activity through a long, thin strand known as an axon, which splits into thousands of branches. At the end of each branch, a structure called a synapse converts the activity from the axon into electrical effects that inhibit or excite activity in the connected neurons. When a neuron receives excitatory input that is sufficiently large compared with its inhibitory input, it sends a spike of electrical activity down its axon. Learning occurs by changing the effectiveness of the synapses so that the influence of one neuron on another changes [105, 106].

Figure A.2: (a) Human neuron (b) Artificial neuron

The essential features of human neurons and their interconnections are first approximated, and a computer is then typically programmed to simulate these features. However, because knowledge about neurons is still incomplete and computing power is limited, these models are necessarily gross idealizations of real networks of neurons. A human neuron and its artificial counterpart are compared in Fig. A.2.
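The artificial neuron of Fig. A.2(b) can be sketched in a few lines of Python. It forms a weighted sum of the inputs plus a bias, which plays the role of the dendrites and synapses, and passes that sum through an activation that decides how strongly the unit "fires". The logistic activation, the bias term, and the function name `neuron` are assumptions made for illustration. The logistic function is chosen because its derivative $y(1-y)$ is the one that appears in Eq. (A.2).

```python
import math

def neuron(inputs, weights, bias):
    """Hypothetical artificial neuron: weighted sum of the inputs plus a bias,
    squashed by a logistic activation into an activity level in (0, 1)."""
    x = sum(w * a for w, a in zip(weights, inputs)) + bias  # total input
    return 1.0 / (1.0 + math.exp(-x))                        # output activity

# Strong excitatory input drives the output towards 1,
# strong inhibitory input drives it towards 0.
print(round(neuron([1.0, 1.0], [4.0, 2.0], -1.0), 4))   # 0.9933
print(round(neuron([1.0, 1.0], [-4.0, -2.0], 1.0), 4))  # 0.0067
```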

A.3 ANN Layers

The commonest type of ANN consists of three groups, or layers, of units: a layer of "input" units is connected to a layer of "hidden" units, which is connected to a layer of "output" units [105, 106]. The activity of the input units represents the raw information that is fed into the network. The activity of each hidden unit is determined by the activities of the input units and the weights on the connections between the input and the hidden units. The behavior of the output units depends on the activity of the hidden units and the weights between the hidden and output units.
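The input, hidden and output arrangement just described can be sketched with NumPy. The layer sizes, the random weights, the zero biases and the logistic activation are illustrative assumptions, not the configuration used in this thesis. `forward` is a hypothetical helper name.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, W1, b1, W2, b2):
    """Input -> hidden -> output pass.
    x: (n_in,), W1: (n_in, n_hidden), W2: (n_hidden, n_out)."""
    h = sigmoid(x @ W1 + b1)  # hidden activity: input activity and input-to-hidden weights
    y = sigmoid(h @ W2 + b2)  # output activity: hidden activity and hidden-to-output weights
    return h, y

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(3)  # 4 input units, 3 hidden units
W2, b2 = rng.normal(size=(3, 2)), np.zeros(2)  # 2 output units
h, y = forward(np.array([0.5, -1.0, 0.2, 0.0]), W1, b1, W2, b2)
print(h.shape, y.shape)  # (3,) (2,)
```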

This simple type of network is interesting because the hidden units are free to construct their own representations of the input. The weights between the input and hidden units determine when each hidden unit is active, so by modifying these weights a hidden unit can choose what it represents. We also distinguish between single-layer and multi-layer architectures. The single-layer organization, in which all units are connected to one another, constitutes the most general case and has more potential computational power than hierarchically structured multi-layer organizations. In multi-layer networks, units are often numbered by layer instead of following a global numbering.

The most widely used ANN architecture has been the multilayer perceptron trained with the error back propagation learning algorithm. However, it suffers from fundamental problems such as long convergence time, local minima, and the absence of a simple rule for choosing the right number of neurons and hidden layers.

A.4 Weights and Error Adjustment

To train an ANN to perform a task, the weights of each unit are adjusted so that the error between the desired output and the actual output is reduced [105, 106, 218]. This requires the ANN to compute the derivative of the error with respect to each weight, denoted EW. In other words, it must calculate how the error changes as each weight is increased or decreased slightly. The back propagation algorithm is the most widely used method for determining EW.
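Before turning to back propagation, the definition of EW can be made concrete by estimating it numerically. The idea is to nudge one weight up and down by a small amount and measure how the error changes. The sketch below assumes a single linear unit and the squared error $E = \frac{1}{2}\sum (y-d)^2$. `numerical_ew` is a hypothetical helper. It needs two error evaluations per weight, which is why an analytic method such as back propagation is preferred for real networks.

```python
import numpy as np

def error(w, x, d):
    """Squared error E = 1/2 * sum((y - d)^2) of one linear unit y = w . x."""
    y = w @ x
    return 0.5 * np.sum((y - d) ** 2)

def numerical_ew(w, x, d, eps=1e-6):
    """Estimate EW = dE/dw by increasing and decreasing each weight slightly."""
    ew = np.zeros_like(w)
    for i in range(w.size):
        step = np.zeros_like(w)
        step[i] = eps
        ew[i] = (error(w + step, x, d) - error(w - step, x, d)) / (2 * eps)
    return ew

w = np.array([0.2, -0.4])
x = np.array([1.0, 2.0])
d = 1.0
print(numerical_ew(w, x, d))  # analytic value is (y - d) * x = [-1.6, -3.2]
```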

The back propagation algorithm is easiest to understand if all the units in the network are linear. The algorithm computes each EW by first computing EA, the rate at which the error changes as the activity level of a unit is changed. For output units, EA is simply the difference between the actual and the desired output. To compute EA for a hidden unit in the layer just before the output layer, we first identify all the weights between that hidden unit and the output units to which it is connected. We then multiply those weights by the EAs of those output units and add the products; this sum equals EA for the chosen hidden unit. After calculating all the EAs in the hidden layer just before the output layer, we can compute the EAs for the other layers in the same fashion, moving from layer to layer in the direction opposite to the way activities propagate through the network. This is what gives back propagation its name. Once EA has been computed for a unit, it is straightforward to compute EW for each incoming connection of the unit: EW is the product of EA and the activity through the incoming connection.
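The linear-unit recipe above can be sketched with NumPy. The sketch assumes that weight `W[i, j]` connects unit `i` in the lower layer to unit `j` in the layer above. It also assumes the error $E = \frac{1}{2}\sum_j (y_j - d_j)^2$, so that EA for an output unit is exactly the actual minus the desired output. Biases are omitted, and `backprop_linear` is a hypothetical name.

```python
import numpy as np

def backprop_linear(x, d, W1, W2):
    """Back propagation through a two-layer network of *linear* units.
    Returns the error derivatives EW for both weight matrices."""
    h = x @ W1                 # hidden activities
    y = h @ W2                 # output activities
    ea_out = y - d             # EA of output units: actual minus desired output
    ea_hid = W2 @ ea_out       # EA of hidden unit i: sum over j of W2[i, j] * EA_j
    ew2 = np.outer(h, ea_out)  # EW = EA of receiving unit * activity on the connection
    ew1 = np.outer(x, ea_hid)
    return ew1, ew2
```

For linear units the activity equals the total input, so no activation derivative appears. The logistic case in the four steps that follow adds the factor $y_j(1-y_j)$.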

A.4.1 Back propagation Algorithm

The back propagation algorithm consists of four steps [105, 106, 218]:


1. Compute how fast the error changes as the activity of an output unit is changed. This error derivative, denoted EA, is the difference between the actual and the desired activity (taking $E = \frac{1}{2}\sum_j (y_j - d_j)^2$):

$$EA_j = \frac{\partial E}{\partial y_j} = y_j - d_j \tag{A.1}$$

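2. Compute how fast the error changes as the total input received by an output unit is changed. This quantity, denoted EI, is the answer from step 1 multiplied by the rate at which the output of a unit changes as its total input is changed. For the logistic activation $y_j = 1/(1 + e^{-x_j})$ this rate is $y_j(1 - y_j)$:

$$EI_j = \frac{\partial E}{\partial x_j} = \frac{\partial E}{\partial y_j}\,\frac{\partial y_j}{\partial x_j} = EA_j\, y_j (1 - y_j) \tag{A.2}$$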

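3. Compute how fast the error changes as a weight on a connection into an output unit is changed. This quantity, denoted EW, is the answer from step 2 multiplied by the activity level of the unit from which the connection emanates, since the total input to unit $j$ is $x_j = \sum_i W_{ij}\, y_i$:

$$EW_{ij} = \frac{\partial E}{\partial W_{ij}} = \frac{\partial E}{\partial x_j}\,\frac{\partial x_j}{\partial W_{ij}} = EI_j\, y_i \tag{A.3}$$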

4. Compute how fast the error changes as the activity of a unit in the previous layer is changed. This crucial step allows back propagation to be applied to multilayer networks. When the activity of a unit in the previous layer changes, it affects the activities of all the output units to which it is connected. To compute the overall effect on the error, we therefore add together all these separate effects on the output units. Each effect is simple to calculate: it is the answer from step 2 multiplied by the weight on the connection to that output unit:

$$EA_i = \frac{\partial E}{\partial y_i} = \sum_j \frac{\partial E}{\partial x_j}\,\frac{\partial x_j}{\partial y_i} = \sum_j EI_j\, W_{ij} \tag{A.4}$$

Using steps 2 and 4, the EAs of one layer of units are converted into EAs for the previous layer. This procedure can be repeated to obtain the EAs for as many previous layers as desired. Once the EA of a unit is known, steps 2 and 3 can be used to compute the EWs on its incoming connections.
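Steps 1 to 4 map almost one-to-one onto array operations. The NumPy sketch below makes several assumptions: logistic units, so that Eq. (A.2) holds; the squared error implied by Eq. (A.1); no biases; and the convention that `W[i, j]` connects unit `i` to unit `j` in the layer above. The learning rate of 0.5 in the illustrative update loop is also arbitrary. `backprop` is a hypothetical helper, and each comment names the equation it implements.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop(x, d, W1, W2):
    """One forward/backward pass for input -> hidden -> output logistic units."""
    # Forward pass
    y_h = sigmoid(x @ W1)               # hidden activities
    y_o = sigmoid(y_h @ W2)             # output activities
    # Step 1, Eq. (A.1): EA_j = y_j - d_j for the output units
    ea_o = y_o - d
    # Step 2, Eq. (A.2): EI_j = EA_j * y_j * (1 - y_j)
    ei_o = ea_o * y_o * (1.0 - y_o)
    # Step 3, Eq. (A.3): EW_ij = EI_j * y_i for the hidden-to-output weights
    ew2 = np.outer(y_h, ei_o)
    # Step 4, Eq. (A.4): EA_i = sum_j EI_j * W_ij for the hidden units
    ea_h = W2 @ ei_o
    # Steps 2 and 3 again, one layer down
    ei_h = ea_h * y_h * (1.0 - y_h)
    ew1 = np.outer(x, ei_h)
    return ew1, ew2

# A few plain gradient-descent updates on a single training pair
rng = np.random.default_rng(0)
x, d = np.array([0.0, 1.0, 1.0]), np.array([1.0, 0.0])
W1, W2 = rng.normal(size=(3, 4)), rng.normal(size=(4, 2))
for _ in range(100):
    ew1, ew2 = backprop(x, d, W1, W2)
    W1 -= 0.5 * ew1
    W2 -= 0.5 * ew2
```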

A.5 Learning Algorithms

The brain learns from experience. ANNs are sometimes called machine learning algorithms because changing their connection weights (training) causes the network to learn the solution to a problem. The strength of the connection between two neurons is stored as a weight value for that connection, and the system learns new knowledge by adjusting these connection weights. The learning ability of an ANN is determined by its architecture and by the algorithm chosen for training. The following subsections briefly describe the most popular ANN training algorithms based on back propagation. The comparison and selection of the best training algorithm for computing de-blurred TFDs is made in Section 3.2 of Chapter 3.

A.5.1 The Levenberg-Marquardt back propagation training algorithm

The Levenberg-Marquardt back propagation (LMB) algorithm is a variation of Newton's method [218, 219] that was designed for minimizing functions that are sums of squares of other nonlinear functions. This makes it very well suited to ANN training, where the performance index is the MSE. For such a sum-of-squares function, Newton's method can be approximated by the Gauss-Newton method, which, after a number of substitutions, transforms into the LMB algorithm:

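$$\mathbf{x}_{k+1} = \mathbf{x}_k - \left[\mathbf{J}^T(\mathbf{x}_k)\,\mathbf{J}(\mathbf{x}_k) + \mu_k \mathbf{I}\right]^{-1} \mathbf{J}^T(\mathbf{x}_k)\,\mathbf{v}(\mathbf{x}_k) \tag{A.5}$$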

where $\mathbf{x}$, $\mathbf{J}(\mathbf{x})$, $\mathbf{v}$, $\mathbf{I}$, and $\mu$ are the learning vector, the Jacobian matrix of $\mathbf{v}$, the vector of nonlinear functions, the identity matrix, and the step size, respectively. This algorithm has the very useful feature that, as $\mu_k$ is increased, it approaches the steepest descent algorithm with a small learning rate:

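$$\mathbf{x}_{k+1} \simeq \mathbf{x}_k - \frac{1}{\mu_k}\mathbf{J}^T(\mathbf{x}_k)\,\mathbf{v}(\mathbf{x}_k) = \mathbf{x}_k - \frac{1}{2\mu_k}\nabla F(\mathbf{x}), \quad \text{for large } \mu_k \tag{A.6}$$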

whereas, as $\mu_k$ is decreased to zero, the algorithm becomes Gauss-Newton. Here we assume that $F(\mathbf{x})$ is a sum-of-squares function:

$$F(\mathbf{x}) = \mathbf{v}^T(\mathbf{x})\,\mathbf{v}(\mathbf{x}) \tag{A.7}$$

The algorithm begins with $\mu_k$ set to some small value (e.g., $\mu_k = 0.01$). If a step does not yield a smaller value for $F(\mathbf{x})$, the step is repeated with $\mu_k$ multiplied by some factor $\vartheta > 1$ (e.g., $\vartheta = 10$). Eventually $F(\mathbf{x})$ should decrease, since we would be taking a small step in the direction of steepest descent. If a step does produce a smaller value for $F(\mathbf{x})$, then $\mu_k$ is divided by $\vartheta$ for the next step, so that the algorithm approaches Gauss-Newton, which should provide faster convergence. The algorithm thus provides a good compromise between the speed of Newton's method and the guaranteed convergence of steepest descent.
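The following is a compact NumPy sketch of the update in Eq. (A.5), together with the $\mu_k$ schedule just described, applied to a small curve-fitting problem. The exponential model, the noise-free data, the iteration cap and the give-up threshold `mu_max` are all assumptions for illustration. In ANN training, $\mathbf{x}$ would hold the network weights and $\mathbf{v}$ the output errors over the training set. `levenberg_marquardt` is a hypothetical helper and not a library function.

```python
import numpy as np

def levenberg_marquardt(residual, jacobian, x0, mu=0.01, theta=10.0,
                        iters=100, mu_max=1e10):
    """Minimise F(x) = v(x)^T v(x), Eq. (A.7), with the update of Eq. (A.5)."""
    x = np.asarray(x0, dtype=float)
    v = residual(x)
    F = v @ v
    for _ in range(iters):
        J = jacobian(x)
        while True:
            # Eq. (A.5): x_new = x - (J^T J + mu I)^{-1} J^T v
            step = np.linalg.solve(J.T @ J + mu * np.eye(x.size), J.T @ v)
            x_new = x - step
            v_new = residual(x_new)
            F_new = v_new @ v_new
            if F_new < F:        # success: divide mu, move towards Gauss-Newton
                mu /= theta
                x, v, F = x_new, v_new, F_new
                break
            mu *= theta          # failure: multiply mu, move towards steepest descent
            if mu > mu_max:      # no further decrease seems possible; stop
                return x
    return x

# Illustrative problem: fit y = a * exp(b * t) to noise-free samples.
t = np.linspace(0.0, 2.0, 20)
y = 2.0 * np.exp(-1.5 * t)

def exp_residual(p):
    return p[0] * np.exp(p[1] * t) - y

def exp_jacobian(p):
    e = np.exp(p[1] * t)
    return np.column_stack([e, p[0] * t * e])  # columns: dv/da, dv/db

p = levenberg_marquardt(exp_residual, exp_jacobian, [1.0, -1.0])
print(p)  # should be close to [2.0, -1.5]
```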

A.5.2 The Powell-Beale conjugate gradient back propagation training algorithm

The Powell-Beale conjugate gradient back propagation (PBCGB) training algorithm can train any network as long as its weight, net input, and transfer functions have derivative functions [218]. Back propagation is used to calculate the derivatives of performance with respect to the weight and bias variables $X$. Each variable is adjusted according to:

$$X = X + \alpha \cdot dX \tag{A.8}$$

where $dX$ is the search direction. The parameter $\alpha$ is selected to minimize the performance along the search direction, and a line search function is used to locate the minimum point. The first search direction is the negative of the gradient of performance. In succeeding iterations, the search direction is computed from the new gradient and the previous search direction according to:

$$dX = -g_X + dX_{old} \cdot Z \tag{A.9}$$

where $g_X$ is the gradient. The parameter $Z$ can be computed in several different ways. The Powell-Beale variation of conjugate gradient is distinguished by two features. First, the algorithm uses a test to determine when to reset the search direction to the negative of the gradient. Second, the search direction is computed from the negative gradient, the previous search direction, and the last search direction before the previous reset.
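The sketch below illustrates Eqs. (A.8) and (A.9) together with a reset test. It is a simplified illustration and not the full Powell-Beale method. It minimises a quadratic $f(x)=\tfrac12 x^T A x - b^T x$, so the line search for $\alpha$ has a closed form. It computes $Z$ with the Fletcher-Reeves formula and applies Powell's restart test $|g_{k-1}^T g_k| \ge 0.2\,\|g_k\|^2$. Beale's three-term direction, which also uses the search direction before the last reset, is omitted. The matrices in the usage example are made up, and `cg_powell_restart` is a hypothetical name.

```python
import numpy as np

def cg_powell_restart(A, b, x0, iters=50, tol=1e-10):
    """Conjugate gradient on f(x) = 0.5 x^T A x - b^T x (A symmetric positive
    definite), Fletcher-Reeves Z and Powell's restart test; no Beale term."""
    x = np.asarray(x0, dtype=float)
    g = A @ x - b                        # gradient g_X
    d = -g                               # first direction: negative gradient
    for _ in range(iters):
        if np.linalg.norm(g) < tol:
            break
        alpha = -(g @ d) / (d @ A @ d)   # exact line search along d (quadratic case)
        x = x + alpha * d                # Eq. (A.8)
        g_new = A @ x - b
        if abs(g @ g_new) >= 0.2 * (g_new @ g_new):
            d = -g_new                   # restart: back to the negative gradient
        else:
            Z = (g_new @ g_new) / (g @ g)
            d = -g_new + d * Z           # Eq. (A.9)
        g = g_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x_min = cg_powell_restart(A, b, np.zeros(2))  # should match np.linalg.solve(A, b)
```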

A.5.3 The Gradient descent with adaptive learning rate back propagation training algorithm

In the gradient descent with adaptive learning rate back propagation (GDALB) training algorithm, back propagation is used to calculate the derivatives of performance $perf$ with respect to the weight and bias variables $X$ [218]. Each variable is adjusted according to gradient descent:

$$dX = -lr \cdot \frac{\partial\, perf}{\partial X} \tag{A.10}$$

where $lr$ is the learning rate. At each epoch, if the performance decreases toward the predefined goal, the learning rate is increased by the factor $lr_{inc}$. If the performance increases by more than the factor $maxperf_{inc}$, the learning rate is adjusted by the factor $lr_{dec}$ and the change that increased the performance is not made.
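The epoch-wise rules above can be sketched in NumPy as follows. The default values (initial learning rate 0.01, $lr_{inc} = 1.05$, $lr_{dec} = 0.7$, $maxperf_{inc} = 1.04$) are illustrative assumptions, as is the simple quadratic used to exercise the function. `gda` is a hypothetical helper. A rejected step leaves the weights unchanged and only shrinks the learning rate, as described.

```python
import numpy as np

def gda(perf, grad, x0, lr=0.01, lr_inc=1.05, lr_dec=0.7,
        max_perf_inc=1.04, epochs=100):
    """Gradient descent with an adaptive learning rate (Eq. A.10 plus the
    epoch-wise rules above). Returns the final point and learning rate."""
    x = np.asarray(x0, dtype=float)
    p = perf(x)
    for _ in range(epochs):
        x_new = x - lr * grad(x)          # Eq. (A.10): dX = -lr * dperf/dX
        p_new = perf(x_new)
        if p_new > p * max_perf_inc:      # performance grew too much: reject the change
            lr *= lr_dec
            continue
        if p_new < p:                     # progress: larger learning rate next epoch
            lr *= lr_inc
        x, p = x_new, p_new
    return x, lr

# Usage on a simple quadratic with its minimum at c
c = np.array([1.0, -2.0])
x_end, lr_end = gda(lambda x: np.sum((x - c) ** 2), lambda x: 2.0 * (x - c),
                    [0.0, 0.0], epochs=60)
```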

A.5.4 The Resilient propagation back propagation training algorithm

The resilient propagation back propagation (RPB) training algorithm is an efficient learning scheme that performs a direct adaptation of the weight step based on local gradient information [218, 221]. Unlike previously developed adaptation techniques, its adaptation effort is not blurred by the behavior of the gradient: only the sign of each partial derivative is used, not its magnitude. Back propagation is used to calculate the derivatives of performance with respect to the weight and bias variables $X$. Each variable is adjusted, element by element, according to:

$$dX = -\Delta X \cdot \operatorname{sign}(g_X) \tag{A.11}$$

where the elements of $\Delta X$ are all initialized to $\Delta_0$ and $g_X$ is the gradient. At each iteration the elements of $\Delta X$ are modified. If an element of $g_X$ changes sign from one iteration to the next, the corresponding element of $\Delta X$ is decreased by the factor $\delta_{dec}$. If an element of $g_X$ keeps the same sign from one iteration to the next, the corresponding element of $\Delta X$ is increased by the factor $\delta_{inc}$. During training, it was found that the number of neurons in the hidden layer had to be increased from 10 to 20 for the algorithm to converge.
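The following is a simplified NumPy sketch of the update in Eq. (A.11). It omits the weight-backtracking step of the original resilient propagation algorithm. It also assumes illustrative values for $\Delta_0$, $\delta_{inc}$, $\delta_{dec}$ and the step-size bounds. `rprop` is a hypothetical helper name. Because only the sign of each gradient element is used, the curvature of the error surface does not affect the step sizes. The badly scaled quadratic in the usage lines illustrates this.

```python
import numpy as np

def rprop(grad, x0, delta0=0.07, delta_inc=1.2, delta_dec=0.5,
          delta_max=50.0, delta_min=1e-12, iters=500):
    """Simplified Rprop without weight backtracking: each element of X keeps
    its own step size, adapted from the sign pattern of its gradient."""
    x = np.asarray(x0, dtype=float)
    delta = np.full_like(x, delta0)       # elements of Delta X start at Delta_0
    g_prev = np.zeros_like(x)
    for _ in range(iters):
        g = grad(x)
        same = g * g_prev                 # > 0: sign kept, < 0: sign changed
        delta = np.where(same > 0, np.minimum(delta * delta_inc, delta_max), delta)
        delta = np.where(same < 0, np.maximum(delta * delta_dec, delta_min), delta)
        x = x - delta * np.sign(g)        # Eq. (A.11)
        g_prev = g
    return x

# Badly scaled quadratic: the curvature differs by a factor of 100 between axes.
c = np.array([1.0, -2.0])
x_opt = rprop(lambda x: np.array([2 * (x[0] - c[0]), 200 * (x[1] - c[1])]), [0.0, 0.0])
```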


A.6 Bayesian Regularisation

The approach involves modifying the commonly used objective function, such as the mean of the sum of squared network errors [220]:

$$mse = \frac{1}{N}\sum_{k=1}^{N} (e_k)^2 \tag{A.12}$$

where $mse$, $e_k$, and $N$ represent the MSE, the network error, and the number of network errors being averaged, respectively. Generalization can be improved by modifying the performance function, adding a term that consists of the mean of the sum of squares of the network weights and biases:

$$msereg = \gamma\, mse + (1 - \gamma)\, msw \tag{A.13}$$

where $\gamma$, $msereg$, and $msw$ are the performance ratio, the performance function, and the mean of the sum of squares of the network weights and biases, respectively. $msw$ is defined as:

$$msw = \frac{1}{n}\sum_{j=1}^{n} (w_j)^2 \tag{A.14}$$

where $n$ is the number of network weights and biases $w_j$.
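The following is a short NumPy sketch of Eqs. (A.12) to (A.14). It also includes a closed-form fit of a linear model under `msereg`, which exhibits the weight-shrinking effect of the regularized performance function. Using a linear model instead of a neural network is a simplification made so that the minimiser can be written in closed form. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def msereg(errors, weights, gamma):
    """Regularised performance, Eqs. (A.12)-(A.14): gamma*mse + (1-gamma)*msw."""
    mse = np.mean(np.square(errors))   # Eq. (A.12)
    msw = np.mean(np.square(weights))  # Eq. (A.14)
    return gamma * mse + (1.0 - gamma) * msw

def fit_linear_msereg(X, y, gamma):
    """Weights of a linear model y ~ X @ w that minimise msereg in closed form.
    Setting the gradient to zero gives
    (gamma/N * X^T X + (1-gamma)/n * I) w = gamma/N * X^T y."""
    N, n = X.shape
    A = gamma / N * (X.T @ X) + (1.0 - gamma) / n * np.eye(n)
    return np.linalg.solve(A, gamma / N * (X.T @ y))

print(msereg([1.0, 1.0], [2.0, 0.0], gamma=0.5))  # 0.5*1 + 0.5*2 = 1.5

# A smaller gamma puts more emphasis on msw and yields smaller weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.arange(1.0, 6.0) + rng.normal(size=50)
for g in (0.99, 0.5, 0.1):
    print(g, np.linalg.norm(fit_linear_msereg(X, y, g)))
```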

Using this performance function causes the network to have smaller weights and biases, which forces the network response to be smoother and less likely to overfit. Moreover, it is desirable to determine the optimal regularization parameters in an automated fashion. One approach to this process is the Bayesian framework of David MacKay [220]. In this framework, the weights and biases of the network are assumed to be random variables with specified distributions. The regularization parameters are related to the unknown variances associated with these distributions, and statistical techniques can then be used to estimate these parameters.
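MacKay's re-estimation of the regularization parameters can be sketched for the simplest case, a model that is linear in its weights, where the Hessian is exact. This is an assumption made for brevity. For a neural network, the Hessian would be approximated (for example with the Gauss-Newton term $\mathbf{J}^T\mathbf{J}$), and the re-estimation would be interleaved with weight training. In the sketch, $\alpha$ weights the penalty on the weights and $\beta$ weights the data error. $\gamma$ is MacKay's effective number of parameters, not the performance ratio of Eq. (A.13). The function name and the synthetic data are hypothetical.

```python
import numpy as np

def evidence_linear(X, y, alpha=1.0, beta=1.0, iters=50):
    """MacKay-style re-estimation of alpha (weight prior) and beta (noise level)
    for a model linear in its weights, minimising
    M(w) = beta * E_D + alpha * E_W, E_D = 0.5*sum(e^2), E_W = 0.5*sum(w^2)."""
    N, k = X.shape
    for _ in range(iters):
        A = beta * X.T @ X + alpha * np.eye(k)          # Hessian of M(w)
        w = beta * np.linalg.solve(A, X.T @ y)          # most probable weights
        E_D = 0.5 * np.sum((y - X @ w) ** 2)
        E_W = 0.5 * np.sum(w ** 2)
        gamma = k - alpha * np.trace(np.linalg.inv(A))  # effective number of parameters
        alpha = gamma / (2.0 * E_W)                     # re-estimate the prior precision
        beta = (N - gamma) / (2.0 * E_D)                # re-estimate the noise precision
    return w, alpha, beta, gamma

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=200)             # noise std 0.1, so beta near 100
w, alpha, beta, gamma = evidence_linear(X, y)
```

With well-determined weights, $\gamma$ approaches the number of weights, and $\beta$ approaches the inverse of the noise variance.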


Appendix B: List of Publications

B.1 Journal Publications

B.1.1 Published

1. "Techniques to obtain good resolution and concentrated time-frequency distributions: a review", EURASIP Journal on Advances in Signal Processing, Volume 2009 (2009), Article ID 673539, 43 pages.

2. "Computing De-blurred Time Frequency Distributions using Artificial Neural Networks", Circuits, Systems, and Signal Processing, Birkhäuser Boston, Springer Verlag, Volume 27, no. 3, pp. 277-294, Jun 2008.

3. "Evolutionary time-frequency distributions using Bayesian regularised neural network model", IET Signal Process., Volume 1, no. 2, pp. 97-106, June 2007.

B.2 Conference Publications

1. "Quantitative evaluation of concentrated Time Frequency Distributions", accepted for publication in Proc. IEEE EUSIPCO, 24-26 Aug 2009, Glasgow, UK.

2. "Neural Network Solution for Compensating Distortions of Time Frequency Representations", Proc. IEEE Intl. Conf. on Signal Process. & Comm., pp. 1575-1578, 24-27 Nov 2007, Dubai, UAE.

3. "Multiple Neural Networks over Clustered Data (MNCD) to Obtain Instantaneous Frequencies (IFs)", Proc. IEEE Intl. Conf. on Information and Emerging Technologies (ICIET), pp. 1-6, 6-7 July 2007, Karachi, Pakistan.

4. "Impact of Varying Neurons and Hidden Layers in Neural Network Architecture for a Time Frequency Application", Proc. 10th IEEE Intl. Multi Topic Conf. (INMIC 2006), pp. 188-193, 23-24 Dec 2006, Islamabad, Pakistan.

5. "Time Frequency Image analysis using Neural Networks", Proc. IMACS Multi-conference on Computational Engineering in Systems Applications (CESA), vol. 1, pp. 315-320, 4-6 Oct. 2006, Beijing, China.

6. "Analysis and Comparison of Neural Network Training Algorithms for the Joint Time-Frequency Analysis", Proc. IASTED Intl. Conf. on Artificial Intelligence and Applications, pp. 193-198, Austria, Feb 2006.

7. "Time Frequency Distribution using Neural Networks", Proc. IEEE Intl. Conf. on Emerging Technologies (ICET), pp. 32-35, Pakistan, 2005.
