advanced methods of biomedical signal processing (cerutti/advanced) || molecular bioengineering and...

Page 1: Advanced Methods of Biomedical Signal Processing (Cerutti/Advanced) || Molecular Bioengineering and Nanobioscience: Data Analysis and Processing Methods

PART VI: INFORMATION PROCESSING OF MOLECULAR BIOLOGY DATA

Advanced Methods of Biomedical Signal Processing. Edited by S. Cerutti and C. Marchesi. Copyright © 2011 the Institute of Electrical and Electronics Engineers, Inc.

17 MOLECULAR BIOENGINEERING AND NANOBIOSCIENCE
Data Analysis and Processing Methods

Carmelina Ruggiero

17.1 INTRODUCTION

In the past few years, a growing interest has arisen in using nanotechnology in biomedical applications, particularly molecular bioengineering, whereas microscale technology is often used in cellular engineering. Contributions to nanobioscience and molecular bioengineering are made, at times independently and at times collaboratively, by scientists with various backgrounds: physicists, chemists, biologists, engineers, mathematicians, and computer scientists.

Molecular and subcellular aspects of bioengineering and the related technology at the nanometer scale have been considered extensively. The recent developments in microscopy, measurement methods, nanostructure fabrication, and informatics have fueled this interest in nanometer-scale research. Specifically, atomic force microscopy has made it possible to measure nanometer-scale surface properties of biological samples such as living cells, DNA, proteins, and biomaterials. Other methods, such as microarray-based analysis and spectroscopy for the characterization of biomolecular structures, have also seen significant developments. A body of knowledge is being established that provides a basis for measurement methods and techniques at the nanometer scale.

The developments mentioned above in measurement methods and techniques have been paralleled by methods and techniques for the fabrication of ordered nanostructures, nanomaterials, and nanodevices for a wide variety of applications, among which the biological, biotechnological, and medical ones are particularly relevant. Such fabrication methods include self-assembly, electron-beam lithography, and nanocontact printing. Some related areas in the biomedical field are biosensors for experimental laboratory work, clinical purposes, and environmental applications (Vo-Dinh and Cullum, 2000; Liu et al., 2003; Castillo et al., 2004; Kissinger, 2005; Rodriguez-Mozaz et al., 2005; Mohanty and Kougianos, 2006; Pastorino et al., 2006a; Erickson et al., 2008; Borisov and Wolfbeis, 2008); the fabrication of implants to be inserted in the human body for tissue engineering and regenerative medicine; the biocompatibility of materials at the nanoscale level; the design and fabrication of surfaces by assembly of ensembles of molecules; and surface patterning to guide specific interactions with molecules and cells (Brown, 2000; Desai, 2000; Evans, 2001; Wang et al., 2001; Tyroen-Toth et al., 2001; Wilkinson et al., 2002; Sinani et al., 2003; Wang and Lineaweaver, 2003; Belkas et al., 2004; Li et al., 2005; Pastorino et al., 2006b; Soumetz et al., 2008). Most interestingly, cells have been found to be capable of perceiving details on surfaces at the nanoscale; therefore, the nanotopography of surfaces (and, specifically, order and symmetry) affects cell adhesion onto surfaces and cell-cell interactions in general (Curtis, 2004; Ruggiero et al., 2005).

The design of molecules, especially in genomics and proteomics, is closely related to the areas described above, with respect both to experimental methods and techniques and to bioinformatics, which plays a key role in the present postgenomic period. Some significant examples are software tools for data mining, workflows, and integrative bioinformatics; methods and techniques for the sequence-structure analysis of proteins; molecular dynamics tools; docking modeling and simulation; and DNA microarray and protein microarray data processing tools.

A further related area is molecular electronics, which focuses on the use of molecules to perform tasks that are at present performed by semiconductor-based elements in electronic circuits. Molecules have much smaller dimensions than those that can be obtained today by semiconductor technology, and their use would allow reducing dimensions and, therefore, increasing potential applications, such as computational power.

The convergence of molecular-oriented approaches by different scientific communities has led to a need for interdisciplinary and more coherent approaches. In this respect, as regards biomedical data and signal analysis, the methods and techniques employed are often common to more than one of the scientific communities involved in molecular engineering. Specifically, some methods that are being used for the study of DNA and proteins have also been used for the analysis of data sequences of various kinds, and methods and techniques used in biomedical signal analysis have been successfully used in genomics and proteomics.

Since the 1980s, Fourier analysis has been applied, alongside its use in biomedical signal analysis, to the prediction of protein structures, specifically (but not only) for the study of the hydrophobicity of sequences. At the same time, work based on Fourier analysis has been carried out on DNA sequences, both for the identification of structural patterns in the double helix and for singling out variations in the double helix in relation to specific nucleotides. Since then, Fourier-analysis-based methods have been successfully used for the analysis of various aspects of DNA and protein sequences.

At the end of the 1980s, the wavelet transform was introduced. Whereas in Fourier analysis the basis functions are localized in frequency but not in time, the basis functions in wavelet analysis are localized both in the time domain and in the frequency domain. For several kinds of signals, wavelet-transform-based analysis yields a more compact representation than Fourier analysis. This is the case for signals with local peaks, such as some biomedical signals, in which a peak in the time domain gives rise to many components in the frequency domain. Wavelet-transform analysis has been successfully applied to various kinds of signals and images, including biomedical signals and images, and in sequence and image analysis for genomics (Arneodo et al., 1995; Audit et al., 2002; Audit and Ouzounis, 2003; Chain et al., 2003; Aggarwal et al., 2005; Touchon et al., 2005; Haimovich, 2006; Kwan et al., 2006; Thurman et al., 2007).

The recent achievements in genome sequencing and in DNA microarray technology have generated a significant amount of data analysis and processing work. A variety of methods have been used. In genomics, especially as regards microarrays and the problem of inferring significance from microarray data, clustering methods and other statistical methods are often used, as well as machine learning methods and data mining methods. Such methods had been used for many years, prior to the advent of microarrays, for biomedical signal and image analysis. Simple clustering methods have been used since the 1960s for the analysis of biomedical signals such as evoked potentials and ECGs. More recently, artificial neural networks (whose main feature is the capability of learning from examples, that is, of generalizing and extracting knowledge from data) have been successfully applied to the solution of many types of pattern recognition, prediction, and classification problems, both for biomedical signals and images and for genomics and proteomics.
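The compactness argument made above for signals with local peaks can be illustrated with a minimal sketch. It compares the number of non-negligible coefficients needed to represent a single-sample peak in a discrete Fourier basis and in an orthonormal Haar wavelet basis. The Haar wavelet, the 64-sample test signal, and the helper names (dft, haar, count_significant) are illustrative assumptions and are not taken from the works cited here.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(n^2)); adequate for a short demo."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def haar(x):
    """Orthonormal Haar wavelet transform; len(x) must be a power of two."""
    details = []
    approx = list(x)
    while len(approx) > 1:
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        # Differences capture local detail, sums carry the coarse approximation.
        level = [(a - b) / math.sqrt(2) for a, b in pairs]
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
        details = level + details
    return approx + details


def count_significant(values, tol=1e-9):
    """Number of coefficients whose magnitude exceeds tol."""
    return sum(1 for v in values if abs(v) > tol)


signal = [0.0] * 64
signal[20] = 1.0  # a single local peak

n_fourier = count_significant(dft(signal))
n_haar = count_significant(haar(signal))
print(n_fourier, n_haar)
```

In this toy case every Fourier coefficient is needed to describe the peak, whereas the Haar representation needs only one coefficient per scale plus the overall average.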

In the following, analysis and processing methods that play a key role in genomics and proteomics are described.

17.2 DATA ANALYSIS AND PROCESSING METHODS FOR GENOMICS IN THE POSTGENOMIC ERA

In the years since the sequencing of the human genome was completed, rapid progress has been achieved in the understanding of the human genome and of the genetic basis of disease. Moreover, high-throughput technologies related to microarrays have brought about the rapid generation of large-scale datasets focused on genes and gene products. The great increase in the available information on DNA sequences and the development of methods and techniques for the use of this knowledge are the main aspects of a most significant transition phase in biomedical research. The recent achievements are opening up the way to the identification of further topics and to the solution of related problems. Using sequence-coupling techniques, it is possible to identify sequences that are related to diseases, to locate such sequences in the human genome, and to identify specific genes. However, in spite of the constant increase in these data, many genes are not known, and the same applies to most functions of the genes that have been discovered, to the processes leading to proteins, and to the regulatory mechanisms that control such processes.

The production of data by centers that obtain sequences has increased to a great extent in the last two decades, whereas the analysis of sequences and genomes has not progressed at the same rate.

The continuing improvements in gene sequencing and the continuing increases in sequence databases have led to a demand for the functional analysis that follows sequencing. Comparing complete genomes is the next step in solving problems such as coding region or regulation-signal identification (Xie and Hood, 2003; Frazer et al., 2004; El-Sayed et al., 2005; Notredame, 2007; Zhou et al., 2008). The main requirements for such analyses are sequence comparison, visualization, and analysis.

17.2.1 Genome Sequence Alignment

Sequence alignment has provided one of the main tools in sequence analysis and has led to the development of a great number of informatic and statistical genome-oriented tools (Margulies, 2008; Huang et al., 2007). Moreover, Web-based tools have been developed that provide shared databases and data mining and processing software (Carmona-Saez et al., 2007; Brudno, 2007; Brudno et al., 2007; Bruford et al., 2008; Karolchik et al., 2008). A variety of alignment algorithms are available (Li, 1997; Bradley et al., 2008; Kapustin, 2008). They are based on scoring all possible alignments according to similarity/identity parameters for each residue, followed by alignment optimization. The early algorithms were designed for DNA sequences containing a single gene, whereas longer alignments raise computational speed problems. Comparing complete genomes requires solving a great number of problems, such as dealing with repeated elements, insertions or deletions of elements, and rearrangements. Some of these problems have been solved using algorithms that find exact matches, starting from matches of a minimum length and chaining them into contiguous, more extended matches.
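The score-then-optimize scheme described above can be sketched with a global dynamic-programming alignment in the style of Needleman and Wunsch. The text does not name a specific algorithm, so the choice of algorithm, the function name global_align, and the scoring parameters (match, mismatch, gap) are illustrative assumptions rather than the method of any of the cited tools.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment by dynamic programming (Needleman-Wunsch style).

    Returns (score, aligned_a, aligned_b). Scoring values are assumptions.
    """
    def sub(x, y):
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Fill: each cell holds the best score for aligning a[:i] with b[:j].
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Traceback: recover one optimal alignment from the filled table.
    i, j = len(a), len(b)
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))


print(global_align("ACGT", "ACT"))
```

The table has one cell per pair of prefixes, so time and memory grow with the product of the two sequence lengths. This is the computational speed problem mentioned above for long alignments, and it is one reason why whole-genome comparison relies on exact-match seeding rather than on a full table.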

On the basis of the observation of similar aspects of analogous genes, it is possible to obtain insights into the possible regulation and functions of such genes and into their evolutionary history.

17.2.2 Genome Sequence Analysis

DNA sequences can be represented as symbolic strings over the nucleotides adenine, cytosine, guanine, and thymine (A, C, G, T), similarly to the strings over the 20 amino acids used for proteins. The correlation structure of strings can be completely characterized starting from the possible nucleotide-nucleotide correlation functions or from their corresponding power spectra. More generally, the statistical analysis of DNA sequences plays an important role in the understanding of the structure and function of genomes.
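As a minimal sketch of a nucleotide-nucleotide correlation function, the code below uses one common definition, C_ab(k) = P_ab(k) - P_a P_b, where P_ab(k) is the observed frequency of base a followed by base b at distance k, and P_a and P_b are the overall base frequencies. This definition, the function name correlation, and the periodic toy sequence are assumptions made for illustration.

```python
def correlation(seq, a, b, k):
    """C_ab(k) = P_ab(k) - P_a * P_b for symbols a, b at distance k."""
    n = len(seq) - k
    if n <= 0:
        raise ValueError("distance k must be smaller than the sequence length")
    # Joint frequency of finding a at position i and b at position i + k.
    joint = sum(1 for i in range(n) if seq[i] == a and seq[i + k] == b) / n
    p_a = seq.count(a) / len(seq)
    p_b = seq.count(b) / len(seq)
    return joint - p_a * p_b


toy = "ACGT" * 25  # strictly periodic toy sequence, not real genomic data
print(correlation(toy, "A", "A", 4))  # positive: A recurs every 4 bases
print(correlation(toy, "A", "A", 1))  # negative: A is never followed by A
```

A value of zero at every distance would indicate statistically independent positions; evaluating all 16 base pairs (a, b) over a range of k gives the full set of correlation functions referred to in the text.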

The correlation among bases in DNA sequences has been of great interest for a long time. Before the human genome project, long and continuous DNA sequences were not available; therefore, only short-range correlations were considered. Since then, long DNA sequences have become easily available and it has been possible to obtain more complete characterizations of correlations among base pairs both for short distances and for long distances (Herzel and Große, 1997; Choe et al., 2000).

From a molecular biology point of view, long-distance correlations are not surprising, since the complex organization of genomes involves relations at very diverse distances. For example, it has been experimentally shown that fragments containing up to 10^4 base pairs exhibit rather large variances in guanine plus cytosine (G+C) content, and this cannot be explained by considering fluctuations relating to short distances [49]. More recently, pronounced fluctuations in G+C content with a period of about 10^5 base pairs have been found (Choe et al., 2000).
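The variance argument above can be made concrete with a small sketch that compares the observed variance of G+C content across windows with the variance expected if bases were independent draws (a binomial model). The window size, the function names, and the two-block toy sequence are assumptions chosen for illustration; real analyses use windows of thousands of base pairs.

```python
from statistics import pvariance


def gc_fractions(seq, window):
    """G+C fraction in consecutive, non-overlapping windows (remainder dropped)."""
    return [
        sum(base in "GC" for base in seq[i:i + window]) / window
        for i in range(0, len(seq) - window + 1, window)
    ]


def binomial_variance(seq, window):
    """Variance expected if each base were G or C independently with the overall rate."""
    p = sum(base in "GC" for base in seq) / len(seq)
    return p * (1 - p) / window


toy = "GC" * 10 + "AT" * 10  # GC-rich block followed by an AT-rich block
fractions = gc_fractions(toy, 10)
observed = pvariance(fractions)
expected = binomial_variance(toy, 10)
print(fractions, observed, expected)
```

An observed variance well above the binomial expectation, as in this toy case, is the kind of excess that points to compositional structure at scales longer than the window.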

A DNA sequence can be regarded as a string of characters whose correlation structure can be characterized by all possible base-base correlation functions or by their corresponding power spectra. The correlation structure of DNA sequences can be evaluated by analyzing various estimators such as Fourier spectra and the wavelet transform (the latter has been found to be very useful for the study of the heterogeneity of DNA sequences) (Herzel and Große, 1997). Other algorithms that can be used are based on machine learning methods, such as artificial neural networks, which provide powerful tools for pattern recognition problems and have, therefore, been employed to extract information from DNA sequences. Some examples are the identification of the genome regions that code for proteins, the prediction of mRNA donor and acceptor sites for the DNA sequence (Abe et al., 2003), and the use of oligonucleotide frequencies in order to distinguish genomes (Wu, 1997).

Other applications relate to gene identification, which can be achieved by two complementary approaches: search by content (which takes into account the protein coding potential of sequences) and search by signal (based on the identification of sequences that limit coding regions) (Oakley and Hanna, 2004). Artificial neural networks have been successfully used for the identification and analysis of sites (for example, regulation sites and transcription sites), for sequence classification, for the identification of significant sequence features, and for the understanding of biological rules that guide the structure and regulation of genes (Oakley and Hanna, 2004).
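Search by signal can be illustrated, in its simplest form, by scanning for open reading frames delimited by a start codon and a stop codon. The sketch below is a deliberately naive assumption-laden example: it uses ATG as the only start signal, the three standard stop codons, the forward strand only, and a hypothetical min_codons threshold; it ignores introns and is not the method of the cited works.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end) index pairs of forward-strand ORFs, end exclusive.

    min_codons counts codons from the start codon up to, but not including,
    the stop codon.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start signal in this frame opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # the stop signal closes the ORF
    return orfs


toy = "CCATGAAATTTTAGGG"
for begin, end in find_orfs(toy):
    print(begin, end, toy[begin:end])
```

Search by content would complement this by scoring the region between the two signals, for example by its codon usage, before accepting it as a candidate coding region.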


17.2.3 DNA Microarray Data Analysis

DNA microarrays, which are very frequently used for genome analysis, make highly relevant contributions to the understanding of DNA sequences. Data mining tools (such as Bayesian networks, clustering algorithms, genetic algorithms, Markov models, and artificial neural networks) are being extensively used in DNA microarray data analysis (Bertone and Gerstein, 2001; Valafar, 2002; Greer and Khan, 2007; Kim and Cho, 2008; Tan et al., 2008).

Microarray technology is based on the immobilization of oligonucleotide fragments with known sequences on matrices and on their hybridization by exposure to DNA markers. The signals corresponding to hybridized fragments are quantified and give rise to an image that is the result of the simultaneous examination of thousands of genes.

The analysis of microarray data includes the search for genes that have similar or related expression patterns. In order to understand and interpret data deriving from microarray technology, specifically tailored computational methods have been developed (Salzburg, 1998), even though their basis lies in statistical and computational intelligence methods (as is the case for genome data analysis in a broader sense). Data analysis methods aim to identify correlations between the microarray data and an underlying function or biological condition.

For a specific function or biological condition, the question to be answered is whether, when this function or condition is present, the expression levels of genes or gene sequences change significantly with respect to the case in which it is absent. A simple example of such analysis is the t-test, which compares averages of observations. Another example is principal-component analysis, a linear technique that finds basis vectors (principal components) that span the space of the problem (the gene expression space). A principal component can be regarded as a relevant pattern in the ensemble of the gene expression data. Other statistical methods, such as Bayesian analysis, can take into account aspects relating to noise and to the typical variability of microarray data. Clustering methods, both k-means and hierarchical clustering, give rise to simple and easy-to-set-up tools that have been successfully applied to gene expression data. For example, k-means analysis partitions microarray data into a fixed number of groups with similar expression patterns. Initially, all data can be randomly assigned to the groups. Subsequently, the distances between the data and the average of each group, and the distances among the group averages, are calculated, and the data are reassigned iteratively so that distances within each group are minimized and distances among groups are maximized. Hierarchical clustering algorithms can be used for microarray data, establishing similarities among groups and using them as a basis to form new groups.
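The k-means procedure described above can be sketched as follows. To keep the example reproducible, the initial random assignment is replaced by a fixed round-robin assignment; this substitution, the function name kmeans, and the two-dimensional toy expression profiles are assumptions made for illustration only.

```python
import math


def kmeans(points, k, max_iter=100):
    """Plain k-means on tuples of floats; returns (labels, centroids)."""
    # Stand-in for the random initial assignment: point i goes to group i mod k.
    labels = [i % k for i in range(len(points))]
    centroids = []
    for _ in range(max_iter):
        centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if not members:
                members = [points[c]]  # re-seed an empty group with an arbitrary point
            # The group average is the coordinate-wise mean of its members.
            centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
        # Reassign every point to the nearest group average.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
    return labels, centroids


# Toy expression profiles (two conditions per gene); not real microarray data.
profiles = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels, centroids = kmeans(profiles, k=2)
print(labels)
print(centroids)
```

The number of groups k must be fixed in advance, and the result can depend on the initial assignment, which is why practical analyses usually repeat the procedure from several starting points.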

Artificial neural networks provide a further, highly suitable analysis method for the classification of DNA microarray data. Self-organizing neural networks have features that are to some extent similar to k-means algorithms, but each group is represented by a node to which a specific weight is associated. Weights and positions of nodes are updated during a learning process in which relations among groups are obtained. When it is possible to establish the number of classes, the most appropriate method is supervised learning by backpropagation training, in which training takes place starting from a dataset for which the classification is known. In the subsequent phase, the trained network classifies the data for which the classification is not available.
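The two phases of supervised learning described above (training on data with known classes, then classifying new data) can be sketched with the simplest possible case: a single logistic neuron trained by gradient descent. A multilayer network trained by backpropagation applies the same kind of update through its hidden layers. The single-neuron simplification, the function names, the learning rate, and the toy two-gene samples are all assumptions made for illustration.

```python
import math


def train_neuron(samples, labels, epochs=500, lr=0.5):
    """Fit one logistic neuron by full-batch gradient descent on cross-entropy."""
    n_features = len(samples[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            p = 1.0 / (1.0 + math.exp(-z))
            error = p - y  # derivative of the cross-entropy loss with respect to z
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(samples)
    return weights, bias


def classify(x, weights, bias):
    """Return class 1 if the neuron's pre-activation is positive, else class 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


# Phase 1: toy samples (log-ratios of two genes) with known classes.
training = [(-1.0, 0.2), (-0.8, -0.1), (-0.6, 0.1), (1.0, -0.2), (0.8, 0.1), (0.6, -0.1)]
known = [0, 0, 0, 1, 1, 1]
weights, bias = train_neuron(training, known)

# Phase 2: classify samples whose class is not available.
print(classify((-0.9, 0.0), weights, bias), classify((0.7, 0.0), weights, bias))
```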

17.3 FROM GENOMICS TO PROTEOMICS

The analysis of DNA sequences is mostly aimed at characterizing sequences and genes. For proteins, the analysis of the amino acid sequences is mostly aimed at secondary and tertiary structure prediction, the identification of sequence patterns related to functional domains, and the prediction of the function of molecules or of domains of proteins (Tyers and Mann, 2003). The most frequently used methods are based on homologies with known molecular structures and on analyses starting from basic principles (using knowledge of fundamental atomic interactions and energy-based approaches).

The term proteomics originated within the work carried out on the genome. This term relates to the study of the proteins that are present in the cell, to structural descriptions of proteins and their interactions, to the description of protein complexes, and to protein structure modifications needed to change protein structure. Work on such aspects has great prospects for improving the understanding of cellular functions and drug design.

Proteomics can be regarded as an area that complements functional genomic aspects, such as gene expression profiles based on microarrays and phenotypic profiles at the cellular and organism levels (Baker and Sali, 2001). Proteomics is based on recent results in genomics, which elucidated aspects relating to genes and thereby prompted specific analyses of the related proteins. Genome sequencing reveals the amino acid sequences of proteins. However, in order to understand the biological role of the corresponding proteins it is necessary to know their structure, which determines their function. Functional genomics, which uses experimental and algorithmic methods to characterize protein sequences, is focused on this aim. Various approaches have been adopted, ranging from focusing on folded structure only to analyzing all proteins that are present in one genome, which is taken as a model (Pei, 2008).

17.4 PROTEIN STRUCTURE DETERMINATION

After many years of work, the determination of protein structure remains one of the key goals of computational biology. Protein structure knowledge is particularly important for functional and structural genomics. Experimental methods (especially X-ray crystallography and NMR spectroscopy) have determined the structure of a great number of proteins. Often, such methods are complemented by computer-based structure-prediction methods, which play a key role in cases in which experimental determination is difficult or not possible.

For proteins whose structure is similar to the structure of known proteins, structure prediction can be carried out by locating similar structure parts and aligning them with the unknown sequence. This can be achieved by simple methods that have been available for quite some time if the sequence identity with a known protein is greater than 25-30%. The most effective methods that are presently available use sequence alignment, prediction algorithms without use of known structures, and algorithms based on conformational energy (Lim, 1974a, b; Dumas and Ninio, 1982; Keskin et al., 2005; Floudas et al., 2006; Katzman et al., 2008; Viklund and Elofsson, 2008).
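The identity threshold mentioned above is evaluated on an alignment between the unknown sequence and a known protein. A minimal sketch of the computation follows; several conventions exist for the denominator, and the one used here (aligned columns without gaps), together with the function name and the toy aligned sequences, is an assumption made for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over the aligned, gap-free columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only the columns in which both sequences have a residue.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# Toy aligned fragments, not real proteins.
identity = percent_identity("MKT-AYI", "MRTWAY-")
print(identity, identity > 30.0)
```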

Computer-based methods for protein structure prediction have been used since the 1970s. The most frequently used methods are the ones for the prediction of secondary structure elements. The first methods that were set up are based on simple stereochemical principles (Lim, 1974b; Fasman, 1989) that take into account structural features such as compactness and the presence of an internal hydrophobic, tightly packed part and an external, polar part. Another very early method (Garnier et al., 1978) focuses on the frequency of occurrence of each of three conformations (alpha helix, beta sheet, and coil) in the residues present in a set of proteins. Yet another method uses parameters based on the frequency of occurrence of an amino acid in one protein, on its presence in each type of secondary structure, and on the percentage of amino acids in that structure type, together with empirical rules. A further early method is based on the observation that the conformation of an amino acid depends on the amino acids that surround it in the sequence.

For amphipathic structure prediction, the hydrophobic pattern has been recognized as a key element, so effective prediction methods are based on it. Fourier analysis of the hydrophobic profile has the great advantage of taking into account cooperativity among amino acids in protein folding. This aspect is difficult to take into account in methods based on the use of databases (Garnier et al., 1978).
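Fourier analysis of a hydrophobic profile can be sketched as follows. Each residue is replaced by a hydrophobicity value, and the power of the mean-centered profile is evaluated at a given angular frequency, expressed in degrees per residue: about 100 degrees corresponds to the 3.6-residue period of an alpha helix, and about 180 degrees to the alternating pattern typical of a beta strand. The use of the Kyte-Doolittle scale, the function names, and the two toy sequences (built here to have the desired periodicity) are assumptions made for illustration.

```python
import cmath
import math

# Kyte-Doolittle hydropathy values (one of several published scales).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_power(seq, angle_deg):
    """Spectral power of the mean-centered hydrophobicity profile at angle_deg."""
    values = [KYTE_DOOLITTLE[res] for res in seq]
    mean = sum(values) / len(values)
    omega = math.radians(angle_deg)
    total = sum((h - mean) * cmath.exp(1j * omega * n) for n, h in enumerate(values))
    return abs(total) ** 2


def dominant_angle(seq, angles=range(1, 181)):
    """Angle (degrees per residue) at which the profile has maximum power."""
    return max(angles, key=lambda a: hydrophobic_power(seq, a))


helix_like = "LKKLLKKLLKLLKKLLKK"  # leucines recur with a period near 3.6 residues
strand_like = "LK" * 9             # strictly alternating pattern

print(hydrophobic_power(helix_like, 100), hydrophobic_power(helix_like, 180))
print(dominant_angle(strand_like))
```

Because the power at a given frequency sums contributions from the whole window, the result reflects the joint arrangement of residues rather than the propensity of each residue taken alone.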

Starting from the late 1980s, artificial neural networks have been used to a significant extent, and better results have been obtained with respect to previously used methods. Moreover, encouraging results have been obtained even in some tertiary-structure prediction cases (Oakley and Hanna, 2004). It can be noticed that, in general, the use of neural networks has given very good results for many kinds of molecular sequence analysis problems. Moreover, the combination of neural-net-based methods with other methods, such as de novo structure-prediction methods, has given most promising results (Pei, 2008). The latter are based on the assumption that the native state of a protein corresponds to the minimum of the free energy. Free-energy-based methods have limitations deriving from the great number of variables that are involved, from the uncertainties regarding the formulae of the terms that represent the energy, and from the fact that many conformations exist that correspond to local minima of the global potential energy. Using fragments of known structures allows one to reduce such limitations to a great extent.
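The difficulty caused by local minima can be illustrated with a toy one-dimensional "energy" that has two wells. Gradient descent started from different points settles into different minima, only one of which is the global minimum. The function E(x) = x^4 - 3x^2 + x, the step size, and the starting points are arbitrary assumptions; the sketch is not a protein force field.

```python
def energy(x):
    """Toy double-well energy; stands in for a conformational energy surface."""
    return x ** 4 - 3 * x ** 2 + x


def gradient(x):
    """Analytical derivative of the toy energy."""
    return 4 * x ** 3 - 6 * x + 1


def descend(x0, lr=0.01, steps=2000):
    """Plain gradient descent from x0; returns the point reached."""
    x = x0
    for _ in range(steps):
        x -= lr * gradient(x)
    return x


x_right = descend(2.0)   # ends in the shallower well (a local minimum)
x_left = descend(-2.0)   # ends in the deeper well (the global minimum)
print(x_right, energy(x_right))
print(x_left, energy(x_left))
```

With many variables the number of such wells grows rapidly, which is why restricting the search, for example by assembling fragments of known structures, makes the problem more tractable.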

17.5 CONCLUSIONS

The methods described above play a key role in the achievement of a body of knowledge relating to molecular bioengineering and nanobioscience. These methods include statistical ones, databases, mathematical modeling, and machine learning, and relate to several existing disciplines.

REFERENCES

Abe, T., Kanaya, S., Kinouchi, M., Ichiba, Y., Kozuki, T., and Ikemura, T., Informatics for Unveiling Hidden Genome Signatures, Genome Research, 693-702, 2003.

Aggarwal, A., Leong, S. H., Lee, C., Kon, O. L., and Tan, P., Wavelet Transformations of Tumor Expression Profiles Reveals a Pervasive Genome-Wide Imprinting of Aneuploidy on the Cancer Transcriptome, Cancer Res., Vol. 65, 186-194, 2005.

Arneodo, A., Bacry, E., Graves, P. V., and Muzy, J. F., Characterizing Long-Range Correlations in DNA Sequences from Wavelet Analysis, Phys. Rev. Lett., Vol. 74, 3293-3296, 1995.

Audit, B., and Ouzounis, C. A., From Genes to Genomes: Universal Scale-Invariant Properties of Microbial Chromosome Organisation, J. Mol. Biol., Vol. 332, 617-633, 2003.

Audit, B., Bacry, E., Muzy, J. F., and Arneodo, A., Wavelet-Based Estimators of Scaling Behavior, IEEE Transactions on Information Theory, Vol. 48, No. 11, 2002.

Baker, D., and Sali, A., Protein Structure Prediction and Structural Genomics, Science, Vol. 294, No. 5540, 93-96, 2001.

Belkas, J. S., Shoichet, M. S., and Rajiv, M., Peripheral Nerve Regeneration Through Guidance Tubes, Neurological Research, Vol. 26, 151-160, 2004.

Bertone, P., and Gerstein, M., Integrative Data Mining: The New Direction in Bioinformatics, IEEE Engineering in Medicine and Biology, July/August, 2001.

Borisov, S. M., and Wolfbeis, O. S., Optical Biosensor, Chem. Rev., Vol. 108, No. 2, 423-461, 2008.

Bradley, R. K., Pachter, L., and Holmes, I., Specific Alignment of Structured RNA: Stochastic Grammars and Sequence Annealing, Bioinformatics, Vol. 24, No. 23, 2677-2683, 2008.

Brown, R. A., Bioartificial Implants: Design and Tissue Engineering, In Structural Biological Materials: Design and Structure-Property Relationships, Elices, M. (Ed.), pp. 105-160, Pergamon, 2000.

Brudno, M., An Introduction to the Lagan Alignment Toolkit, Comparative Genomics, Vol. 395, 205-219, 2007.

Brudno, M., Poliakov, A., Minovitsky, S., Ratnere, I., and Dubchak, I., Multiple Whole Genome Alignments and Novel Biomedical Applications at the VISTA Portal, Nucleic Acids Res., Vol. 35 (Web Server issue), W669-674, 2007.

Bruford, E. A., Lush, M. J., Wright, M. W., Sneddon, T. P., Povey, S., and Birney, E., The HGNC Database in 2008: A Resource for the Human Genome, Nucleic Acids Res., Vol. 36 (Database issue), D445-448, 2008.

Carmona-Saez, P., Chagoyen, M., Tirado, T., Carazo, J. M., and Pascual-Montano, A., GENECODIS: A Web-Based Tool for Finding Significant Concurrent Annotations in Gene Lists, Genome Biology, Vol. 8, R3, 2007.

Castillo, J., Gaspar, S., Leth, S., Niculescu, M., Mortari, A., Bontidean, I., Soukharev, V., Dorneanu, S. A., Ryabov, A. D., and Csöregi, E., Biosensors for Life Quality: Design, Development and Applications, Sensors and Actuators B, Chemical, Vol. 102, 79, 2004.

Chain, P., Kurtz, S., Ohlebush, E., and Slezak, T., An Applications-Focused Review of Comparative Genomics Tools: Capabilities, Limitations and Future Challenges, Briefings in Bioinformatics, Vol. 4, No. 2, 105-123, 2003.

Choe, W., Ersoy, O. K., and Bina, M., Neural Network Schemes for Detecting Rare Events in Human Genomic DNA, Bioinformatics, Vol. 16, No. 12, 1062-1072, 2000.

Curtis, A., Tutorial on the Biology of Nanotopography, IEEE Trans Nanobio-science,Vol. 3, No. 4, 293-295, 2004. Database: 2008 Update, Nucl. Acids Res., Vol. 36, D773-D779, 2008.

Desai, T. A., Micro- and Nanoscale Structures for Tissue Engineering Constructs, Med. Eng. Phys., Vol. 22, 595-606, 2000.

Dumas, J., and Ninio, J., Efficient Algorithms for Folding and Comparing Nucleic Acid Sequences, Nucleic Acids Res., Vol. 10,

197-206, 1982. El-Sayed, N.M., Myler, P. J., Blandin, G., Berriman, M., Crabtree, J., Aggarwal,

G., Caler, E., Renauld, H., Worthey, E. A., Hertz-Fowler, C, Ghedin, E., Pea-cock, C, Bartholomeu, D. C , Haas, B. J. Tran, A., Wortman, J. R., Aismark, U. C. M., Angiuoli, S., Anupama, A., Badger, J., Bringaud, F., Cadag, E., Carlton, J. M., Cerqueira, G. C , Creasy, T., Deicher, A. L., Djikeng, A.,Embley, T. M., Häuser, C, Ivens, A. C , Kummerfeld, S. K., Pereira-Leal, J. B., Nilsson, D., Peterson, J., Salzberg, S. L., Shallom, J., Silva, J. C , Sundaram, J., Westen-berger, S., White, O., Melville, S. E., Donelson, J. E., Andersson, B., Stuart, K. D., and Hall, N., Comparative Genomics of Trypanosomatid Parasitic Protozoa, Science, Vol. 309, No. 5733, 404, 2005.

Erickson, D., Mandai, S., Yang, A. H. J., and Cordovez, B., Nanobiosensors:

Page 12: Advanced Methods of Biomedical Signal Processing (Cerutti/Advanced) || Molecular Bioengineering and Nanobioscience: Data Analysis and Processing Methods

REFERENCES 439

Optofluidic, Electrical and Mechanical Approaches to Biomolecular Detection at the Nanoscale, Microfluid Nanofluid., Vol. 4, 33-52, 2008.

Evans, G. R., Peripheral Nerve Injury: A Review and Approach to Tissue Engi-neered Constructs, Anatomical Record, Vol. 263, 396-404, 2001.

Fasman, G. D., Prediction of Protein Structure and the Principles of Protein Con-formation, Plenum Press, 1989.

Floudas, C. A., Fung, H. K., et al. Advances in Protein Structure Prediction and De Novo Protein Design: A Review, Chemical Engineering Science, Vol. 61, No. 3, 966-988, 2006.

Frazer, K. A., Pachter, L., Poliakov, A., Rubin, E. M., and Dubchak, I., VISTA: Computational Tools for Comparative Genomics, Nucl. Acids Res., Vol. 32, W273-W279, 2004.

Garnier, J., Osguthorpe, D. J., and Robson, B., Analysis of the Accuracy and Implications of Simple Methods for Predicting the Secondary Structure of Globular Proteins, J. Mol. Biol., Vol. 120, 97-120, 1978.

Greer, B., and Khan, J., Online Analysis of Microarray Data Using Artificial Neural Networks, Microarray Data Analysis, Vol. 377, 61-73, 2008.

Haimovich, A. D., Byrne, B., Ramaswamy, R., and Welsh, W. J., Wavelet Analysis of DNA Walks, Journal of Computational Biology, Vol. 13, No. 7, 1289-1298, 2006.

Herzel, H., and Große, I., Correlations in DNA Sequences: The Role of Protein Coding Segments, Physical Review E, Vol. 55, No. 1, 1997.

Huang, D. W., Sherman, B. T., Tan, Q., Collins, J. R., Alvord, W. G., Roayaei, J., Stephens, R., Baseler, M. W., Lane, H. C., and Lempicki, R. A., The DAVID Gene Functional Classification Tool: A Novel Biological Module-Centric Algorithm to Functionally Analyze Large Gene Lists, Genome Biology, Vol. 8, R183, 2007.

Kapustin, Y., Souvorov, A., Tatusova, T., and Lipman, D., Splign: Algorithms for Computing Spliced Alignments with Identification of Paralogs, Biol. Direct, Vol. 3, 20, 2008.

Karolchik, D., Kuhn, R. M., Baertsch, R., Barber, G. P., Clawson, H., Diekhans, M., Giardine, B., Harte, R. A., Hinrichs, A. S., Hsu, F., Kober, K. M., Miller, W., Pedersen, J. S., Pohl, A., Raney, B. J., Rhead, B., Rosenbloom, K. R., Smith, K. E., Stanke, M., Thakkapallayil, A., Trumbower, H., Wang, T., Zweig, A. S., Haussler, D., and Kent, W. J., The UCSC Genome Browser Database: 2008 Update, Nucl. Acids Res., Vol. 36, D773-D779, 2008.

Katzman, S., Barrett, C., et al., PREDICT-2ND: A Tool for Generalized Protein Local Structure Prediction, Bioinformatics, Vol. 24, No. 21, 2453-2459, 2008.

Keskin, O., Nussinov, R., and Gursoy, A., Prism: Protein-Protein Interaction Prediction by Structural Matching, Functional Proteomics, Vol. 484, 505-521, 2005.

Kim, K.-J., and Cho, S.-B., An Evolutionary Algorithm Approach to Optimal Ensemble Classifiers for DNA Microarray Data Analysis, IEEE Transactions on Evolutionary Computation, Vol. 12, No. 3, 377-388.

Kissinger, P. T., Biosensors—A Perspective, Biosensors and Bioelectronics, Vol. 20, 2512, 2005.

Kwan, B. Y. M., Kwan, J. Y. Y., and Kwan, H. K., Wavelet Analysis of the Genome of the Model Plant Arabidopsis thaliana, in TENCON 2006. 2006 IEEE Region 10 Conference, 14-17 Nov. 2006.

Li, M., Mills, D. K., Cui, T., and McShane, M. J., Cellular Response to Gelatin- and Fibronectin-Coated Multilayer Polyelectrolyte Nanofilms, IEEE Transactions on Nanobioscience, Vol. 4, No. 2, 170-179, 2005.

Li, W., The Study of Correlation Structures of DNA Sequences—A Critical Review, Computers & Chemistry, Vol. 21, No. 4, 257-272, 1997.

Lim, V. I., Algorithms for Prediction of α-Helices and β-Structural Regions in Globular Proteins, J. Mol. Biol., Vol. 88, 873-894, 1974b.

Lim, V. I., Structural Principles of the Globular Organization of Protein Chains: A Stereochemical Theory of Globular Protein Secondary Structure, J. Mol. Biol., Vol. 88, 857-872, 1974a.

Liu, Y., Yu, X., Zhao, R., Shangguan, D. H., Bo, Z. Y., and Liu, G. Q., Quartz Crystal Biosensor for Real-Time Monitoring of Molecular Recognition Between Protein and Small Molecular Drug, Biosensors and Bioelectronics, Vol. 19, 9, 2003.

Margulies, E. H., Confidence in Comparative Genomics, Genome Res., Vol. 18, 199-200, 2008.

Mohanty, S. P., and Kougianos, E., Biosensors: A Tutorial Review, IEEE Potentials, March/April, 35-40, 2006.

Notredame, C., Recent Evolutions of Multiple Sequence Alignment Algorithms, PLoS Computational Biology, Vol. 3, No. 8, 2007.

Oakley, B. A., and Hanna, D. M., A Review of Nanobioscience and Bioinformatics Initiatives in North America, IEEE Transactions on Nanobioscience, Vol. 3, No. 1, 74-84, 2004.

Pastorino, L., Soumetz, F. C., and Ruggiero, C., Nanofunctionalisation for the Treatment of Peripheral Nervous System Injuries, IEE Proceedings Nanobiotechnology, Vol. 153, No. 2, 16-20, 2006.

Pastorino, L., Soumetz, F. C., Giacomini, M., and Ruggiero, C., Development of a Piezoelectric Immunosensor for Paclitaxel Measurement, Journal of Immunological Methods, Vol. 313, 191-198, 2006a.

Pei, J., Multiple Protein Sequence Alignment, Current Opinion in Structural Biology, Vol. 18, No. 3, 382-386, 2008.

Rodriguez-Mozaz, S., López de Alda, M. J., Marco, M. P., and Barceló, D., Biosensors for Environmental Monitoring: A Global Perspective, Talanta, Vol. 65, 291, 2005.

Ruggiero, C., Mantelli, M., Curtis, A., and Rolfe, P., Protein-Surface Interactions: An Energy-Based Mathematical Model, Cell Biochem. Biophys., Vol. 43, No. 3, 407-417, 2005.

Salzberg, S. L., Searls, D. B., and Kasif, S., Computational Methods in Molecular Biology, Elsevier, 1998.

Sinani, V. A., Koktysh, D. S., Yun, B. G., Matts, R. L., Pappas, T. C., Motamedi, M., Thomas, S. N., and Kotov, N. A., Collagen Coating Promotes Biocompatibility of Semiconductor Nanoparticles in Stratified LBL Films, Nano Letters, Vol. 3, No. 9, 1177-1182, 2003.

Soumetz, F. C., Pastorino, L., and Ruggiero, C., Human Osteoblast-Like Cells Response to Nanofunctionalised Surfaces for Tissue Engineering, Journal of Biomedical Materials Research Part B: Applied Biomaterials, Vol. 84B, No. 1, 249-255, 2008.

Tan, M. P., Smith, E. N., Broach, J. R., and Floudas, C. A., Microarray Data Mining: A Novel Optimization-Based Approach to Uncover Biologically Coherent Structures, BMC Bioinformatics, Vol. 9, 268, 2008.

Thurman, R. E., Day, N., Noble, W. S., and Stamatoyannopoulos, J. A., Identification of Higher-Order Functional Domains in the Human ENCODE Regions, Genome Res., Vol. 17, 917-927, 2007.

Touchon, M., Nicolay, S., Audit, B., Brodie of Brodie, E.-B., d'Aubenton-Carafa, Y., Arneodo, A., and Thermes, C., Replication-Associated Strand Asymmetries in Mammalian Genomes: Toward Detection of Replication Origins, PNAS, Vol. 102, 9836-9841, 2005.

Tyers, M., and Mann, M., From Genomics to Proteomics, Nature, Vol. 422, 193-197, 2003.

Tryoen-Tóth, P., Vautier, D., Haikel, Y., Voegel, J., Schaaf, P., Chluba, J., and Ogier, J., Viability, Adhesion, and Bone Phenotype of Osteoblast-Like Cells on Polyelectrolyte Multilayer Films, J. Biomed. Mater. Res., Vol. 60, 657-667, 2002.

Valafar, F., Pattern Recognition Techniques in Microarray Data Analysis: A Survey, Annals of the New York Academy of Sciences, Vol. 980, 41-64, December 2002.

Viklund, H., and Elofsson, A., OCTOPUS: Improving Topology Prediction by Two-Track ANN-Based Preference Scores and an Extended Topological Grammar, Bioinformatics, Vol. 24, No. 15, 1662-1668, 2008.

Vo-Dinh, T., and Cullum, B., Biosensors and Biochips: Advances in Biological and Medical Diagnostics, Fresenius J. Anal. Chem., Vol. 366, 540-551, 2000.

Wang, H., and Lineaweaver, W., Nerve Conduits for Nerve Reconstruction, Operative Techniques in Plastic and Reconstructive Surgery, Vol. 9, 59-66, 2003.

Wang, Y., Du, W., Spillman, W. B., and Claus, R. O., Biocompatible Thin Film Coatings Fabricated Using the Electrostatic Self-Assembly Process, Proc. SPIE, Vol. 4265, 142-151, 2001.

Wilkinson, C. D. W., Riehle, M., Wood, M., Gallagher, J., and Curtis, A. S. G., The Use of Materials Patterned on a Nano- and Micro-Metric Scale in Cellular Engineering, Mater. Sci. Eng. C, Vol. 19, 263-269, 2002.

Wu, C. H., Artificial Neural Networks for Molecular Sequence Analysis, Computers & Chemistry, Vol. 21, No. 4, 237-256, 1997.

Xie, T., and Hood, L., ACGT—A Comparative Genomics Tool, Bioinformatics, Vol. 19, No. 8, 1039-1040, 2003.

Zhou, J., Zhu, T., Hu, C., Li, H., Chen, G., Xu, G., Wang, S., Zhou, J., and Ma, D., Comparative Genomics and Function Analysis on BI1 Family, Computational Biology and Chemistry, Vol. 32, No. 3, 159-162, 2008.