THESIS
3D VISUALIZATION SYSTEM TO AID
UNDERSTANDING HEMAGGLUTININ MUTATIONS IN
H5N1 INFLUENZA VIRUS
TAWEESAK POOCHAI
GRADUATE SCHOOL, KASETSART UNIVERSITY
2008
THESIS
3D VISUALIZATION SYSTEM TO AID UNDERSTANDING
HEMAGGLUTININ MUTATIONS IN H5N1 INFLUENZA VIRUS
TAWEESAK POOCHAI
A Thesis Submitted in Partial Fulfillment of
the Requirements for the Degree of
Master of Science (Chemistry)
Graduate School, Kasetsart University
2008
ACKNOWLEDGEMENTS
I wish to express my deep gratitude to a number of people who gave me the
guidance, help and support to reach the goal of this thesis. First of all, most of the
credit for this thesis should justifiably go to my advisor, Dr. Chak Sangma, for his
valuable guidance, continuous support, kindness and encouragement throughout the
course of my graduate study. I also thank my co-supervisors, Dr. Pipat Khongpracha
and Dr. Songwut Suramitr, for their very valuable comments and suggestions for this
work. I would like to thank the rest of my thesis committee, Dr. Jakkapan
Sirijaraensre and Dr. Piti Treesukol, the representative of the Graduate School of
Kasetsart University, for their helpful comments and suggestions. I am also greatly
indebted to many teachers: Prof. Dr. Jumras Limtrakul, Assoc. Prof. Dr. Supa
Hannongbua, Asst. Prof. Piboon Pantu, Dr. Pensri Bunsawansong, Dr. Tanin Nanok,
Dr. Patchreenart Saparpakorn, Dr. Somkiat Nokbin and Dr. Pimpa Hormnirun.
Furthermore, I would like to acknowledge Miss Wipawee Punnopasri, Miss
Waraporn Jungtanasombat, Miss Daungmanee Chuakhaew, Miss Nipa Jongol and
all of my colleagues at LCAC for their thoughtful help and support.
During the course of this work at Kasetsart University, I was supported by the
Ministry of University Affairs under the Higher Education Development Project
Scholarship (MUA-ADB funds). Thanks are also due to the Laboratory for
Computational and Applied Chemistry (LCAC) and the Cheminformatics Research
Unit, Kasetsart University, for offering such good computational knowledge-based
systems.
Finally, I would like to dedicate this thesis to my dad, Pol.Col. Somsak
Poochai, my mum, Somradee Poochai, and my sister, Phattarawan Poochai. Their love
and support for me are priceless.
Taweesak Poochai April 2008
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
INTRODUCTION
OBJECTIVES
LITERATURE REVIEW
MATERIALS AND METHODS
Materials
Methods
RESULTS AND DISCUSSIONS
CONCLUSION AND RECOMMENDATION
Conclusion
Recommendation
LITERATURE CITED
APPENDICES
Appendix A Methodologies implemented in MODELLER
Appendix B Poster contribution to conferences
Appendix C User satisfaction survey for the 3dvis system
CURRICULUM VITAE
LIST OF TABLES
Table
1 DOPE scores of the mutated residues of the KAN-1 model, calculated
with this system and with the SWISS-MODEL workspace, and of the
1JSO structure, with the percentage difference between each model
residue and the corresponding residue of the template structure.
2 DOPE scores of the mutated residues of the KAN-1 model, calculated
with this system and with the SWISS-MODEL workspace, and of the
1JSN structure, with the percentage difference between each model
residue and the corresponding residue of the template structure.
3 DOPE scores of the mutated residues of the KAN-1 model, calculated
with this system and with the SWISS-MODEL workspace, and of the
1JSM structure, with the percentage difference between each model
residue and the corresponding residue of the template structure.
4 DOPE scores of the mutated residues of the KAN-1 model, calculated
with this system and with the SWISS-MODEL workspace, and of the
2FK0 structure, with the percentage difference between each model
residue and the corresponding residue of the template structure.
5 DOPE scores of the mutated residues of the KAN-1 model, calculated
with this system and with the SWISS-MODEL workspace, and of the
2IBX structure, with the percentage difference between each model
residue and the corresponding residue of the template structure.
Appendix table
A1 Numerical restraint forms.
A2 Numerical feature types.
C1 Percentage of questionnaire respondents by sex, age, education level,
status, work, field of expertise/interest and current work.
C2 Percentage of questionnaire respondents by experience with homology
modeling and homology-related programs.
C3 Statistical analysis results for the 3dvis system by sex.
C4 Statistical analysis results for the 3dvis system by age.
C5 Statistical analysis results for the 3dvis system by education level.
C6 Statistical analysis results for the 3dvis system by work.
C7 Statistical analysis results for the 3dvis system by field of
expertise/interest.
C8 Statistical analysis results for the 3dvis system by current work.
LIST OF FIGURES
Figure
1 Schematic representation of influenza A virion.
2 Influenza virus replication cycle.
3 Steps in comparative protein structure modeling.
4 Schematic of the 3D visualization system to aid understanding of
mutations in the hemagglutinin protein (H5N1).
5 Schematic diagram of the 2D Alignment page.
6 Schematic of the 3D Automodel page.
7 Schematic of the brief result page.
8 Layout of the related tables in the database.
9 The three sections of the data source collection in the database
system, connected by username and jobname.
10 Schematic diagram of the Bots program.
11 Home page of the system.
12 Workspace page.
13 2dalign page.
14 3dauto page.
15 Result page.
16 Download page.
17 DOPE score profiles of the KAN-1 3D structures calculated with the
3dvis system (green line) and the SWISS-MODEL workspace (red line),
and of the 1JSO 3D structure.
18 DOPE score profiles of the KAN-1 3D structures calculated with the
3dvis system (green line) and the SWISS-MODEL workspace (red line),
and of the 1JSN 3D structure.
19 DOPE score profiles of the KAN-1 3D structures calculated with the
3dvis system (green line) and the SWISS-MODEL workspace (red line),
and of the 1JSM 3D structure.
20 DOPE score profiles of the KAN-1 3D structures calculated with the
3dvis system (green line) and the SWISS-MODEL workspace (red line),
and of the 2FK0 3D structure.
21 DOPE score profiles of the KAN-1 3D structures calculated with the
3dvis system (green line) and the SWISS-MODEL workspace (red line),
and of the 2IBX 3D structure.
Appendix Figure
A1 Schematic representation of the reference state.
A2 Input alignment file in PIR format.
A3 Restraint file.
LIST OF ABBREVIATIONS
3D = Three-dimensional
3dvis = Three-dimensional visualization system to aid understanding
hemagglutinin mutations in H5N1 influenza virus
Ala (A) = Alanine
Asn (N) = Asparagine
Asp (D) = Aspartic acid
Cys (C) = Cysteine
DOPE = Discrete Optimized Protein Energy
Glu (E) = Glutamic acid
Gly (G) = Glycine
GUI = Graphical User Interface
HA = Hemagglutinin
His (H) = Histidine
HTML = Hypertext Markup Language
Ile (I) = Isoleucine
LD = Langevin Dynamics
Leu (L) = Leucine
LSTA = Sialyllacto-N-tetraose A
LSTC = Sialyllacto-N-tetraose C
Lys (K) = Lysine
M = Matrix protein
MC = Monte Carlo
MD = Molecular Dynamics
mRNA = Messenger ribonucleic acid
NA = Neuraminidase
NMR = Nuclear Magnetic Resonance
NP = Nucleoprotein
NS = Non-structural protein
PA, PB2 = RNA polymerase
PB1 = RNA polymerase and PB1-F2 protein
Phe (F) = Phenylalanine
PHP = PHP: Hypertext Preprocessor
Pro (P) = Proline
QM = Quantum mechanics
RCSB PDB = The Research Collaboratory for Structural Bioinformatics
(RCSB), the non-profit consortium that manages the Protein
Data Bank (PDB)
rmsd = Root mean square deviation
RNA = Ribonucleic acid
RNP = Ribonucleoprotein
3D VISUALIZATION SYSTEM TO AID UNDERSTANDING
HEMAGGLUTININ MUTATIONS IN H5N1 INFLUENZA VIRUS
INTRODUCTION
The influenza A virus is an RNA virus that carries two surface
glycoproteins, hemagglutinin (HA) and neuraminidase (NA), shown in Figure 1, which
mediate viral fusion and the subsequent budding of new virions from the infected cell
(Sears and Wong, 1999). Both of these glycoproteins are present on the surface of
the influenza virus and are essential for virion propagation. Hemagglutinin recognizes
target cells via sialic acid binding sites and then promotes viral fusion (Huang et al.,
1981).
Figure 1 Schematic representation of influenza A virion. Eight ribonucleoprotein
segments (RNP) are surrounded by a layer of matrix (M1) protein and a lipid
bilayer taken from the host cell at budding. NS2 protein is associated with
M1. Three viral proteins are incorporated into the lipid bilayer: HA, NA,
and M2 protein. HA trimers and NA tetramers form spikes on the surface
of the virion. RNP segments contain viral RNA surrounded by
nucleoprotein and associated with the polymerase complex.
Source: Gubareva et al. (2000)
The hemagglutinin (HA) of influenza A virus (H5N1), which infects a variety of
birds and mammals (Edwin D. K., 2006), is an antigenic glycoprotein found on the
surface of the virus. It is the receptor-binding and membrane fusion glycoprotein of
influenza virus and the target of infectivity-neutralizing antibodies
(John J. S. and Don C. W., 2000).
Figure 2 Influenza virus replication cycle.
Source: Gubareva et al. (2000)
As shown in Figure 2, infection of the host by the virus can be described in the
following steps. First, the influenza virus binds through HA to sialic acid sugars on
the surfaces of epithelial cells, typically in the nose, throat and lungs of mammals and
the intestines of birds. Second, the cell imports the virus by endocytosis. In the acidic
endosome, part of the HA protein fuses the viral envelope with the vacuole's
membrane, releasing the viral RNA molecules into the cytoplasm (Melike L. et al.,
2003). Next, proteins and viral RNA form a complex that is transported into the cell
nucleus (Cros, J and Palese P, 2003). Viral proteins are then synthesized and new
viral genome particles are formed, while translation of host cell mRNAs is inhibited
(Kash J. et al., 2006). Finally, the mature virions depart from the cell in a sphere of
host phospholipid membrane. They detach once their neuraminidase has cleaved
sialic acid residues from the host cell, after which the cell dies.
HA, of which at least 16 different antigenic subtypes are known, is an antigenic
glycoprotein found on the surface of influenza viruses. It is responsible for binding
the virus to the cell that is being infected, including in human H5N1 infections in Asia
(Suzuki Y., 2005). Mutations, including substitutions, deletions, and insertions, are
among the most important mechanisms for producing variation in influenza viruses
(Robert G. W., 1992). Because HA is the receptor-binding and membrane fusion
glycoprotein of influenza virus and the target of infectivity-neutralizing antibodies
(John J. S. and Don C. W., 2000), mutations in HA are characterized by the position
of the mutation in the protein sequence relative to glycosylation sites, the cleavage
site, residues of the H5 receptor binding site and antigenic sites (Shiuh-Ming L.
et al., 1992).
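The three mutation types named above can be located by comparing two rows of a
sequence alignment position by position. The following is a minimal Python sketch
under stated assumptions: the function name, the gap character "-" and the short
example sequences are hypothetical and are not taken from the 3dvis system.

```python
def classify_mutations(reference, variant, gap="-"):
    """List differences between two rows of the same alignment.

    Returns tuples (position, kind, reference_residue, variant_residue).
    Positions are 1-based and count reference residues only, so an
    insertion is reported at the position of the preceding reference residue.
    """
    if len(reference) != len(variant):
        raise ValueError("aligned sequences must have equal length")
    mutations = []
    ref_pos = 0
    for ref_res, var_res in zip(reference, variant):
        if ref_res != gap:
            ref_pos += 1
        if ref_res == var_res:
            continue  # identical residue (or gap in both rows)
        if ref_res == gap:
            kind = "insertion"
        elif var_res == gap:
            kind = "deletion"
        else:
            kind = "substitution"
        mutations.append((ref_pos, kind, ref_res, var_res))
    return mutations


# Hypothetical aligned fragments, for illustration only.
example = classify_mutations("MEK-IV", "MQKAI-")
```

Each reported position could then be checked against the glycosylation, cleavage,
receptor binding and antigenic sites of interest.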
To date, the number of X-ray structures of the hemagglutinin protein of
influenza A virus subtype H5N1 is far lower than the number of sequences: 5 full
X-ray structures are available from http://www.pdb.org/pdb/home/home.do, whereas
991 full-length sequences are available from
http://www.ncbi.nlm.nih.gov/genomes/FLU/FLU.html. Understanding protein
function requires three-dimensional structures, so researchers use homology
modeling to generate initial structures for calculating physical and chemical
properties. Homology modeling combines sequence analysis and molecular modeling
to predict three-dimensional structure. A protein whose structure has not yet been
solved is modeled from a homologue using the SwissModel WWW resource, the
MODELLER homology modeling tool, or Scwrl3, a modeling tool based on graph
theory. The theoretical structure is then visualized with SwissPDBViewer, RasMol or
PyMol to gain insight into the way in which its structure relates to its function.
Color coding of different physical attributes, such as residue charge, hydrophobicity,
and secondary structure elements; different representations, such as alpha-carbon
traces, cartoon graphics, and space-filling models; and superposition of the model
on an actual structure all assist in the interpretation.
Therefore, we have developed our own 3D visualization system, a web portal
tool that automates modeling and visualization for bioinformatics research. In this
work, we are interested in molecular structure and protein function, in particular the
glycosylation sites, receptor binding site and antigenic sites related to mutated
residues. We have developed a web-based graphical user interface to generate 3D
structures of the hemagglutinin protein of influenza A virus subtype H5N1.
The system for this research uses a client/server web architecture, so users
interact with the system through their web browser. The Apache web server, a
well-known, open source and widely used web server, is used for transferring data
between the client and the server. The database and all scripts run on a server with a
3.2 GHz Intel Pentium 4 processor and 2 GB of memory, under the GNU/Linux
operating system.
Based on this architecture, the description of the system can be divided into
two parts: the client and the server.
1. The Client
The client is an individual user's computer or a user application that does a
certain amount of processing on its own. It also sends and receives requests to and
from one or more servers for other processing and/or data. The client interacts with
the system through web browser.
In recent years, the internet has become the medium of choice for multi-user
application program distribution. The major advantage of web-based database
searching is that users can search the most recent data without regularly downloading
and updating the whole database on their local disks. In addition, there is no need to
install additional software on the user's computer except a standard web browser. In
this research, the 3D viewer program Jmol, written in Java, is used as the Jmol applet
(www.jmol.sourceforge.net/applet), which can be downloaded through the network
and integrated into the web document. With this application, users can see the 3D
structure directly within the web page through a Java-supported web browser, such as
Netscape or Internet Explorer (IE).
2. The Server
The server consists of one or more computers that receive and process requests
from one or more client machines. A server is typically designed with some
redundancy in power, network, computing, and file storage. However, a machine with
dual processors is not necessarily a server, and an individual workstation can function
as a server.
The server side in this research runs the Linux operating system, the Apache
web server, the MySQL database engine, and the PHP (PHP: Hypertext Preprocessor)
scripting language, a combination called LAMP (Linux, Apache, MySQL, PHP).
LAMP is an open source web server solution that is both powerful and stable. The
homology modeling tool for this work was MODELLER, which models the 3D
structure of a protein from a sequence and a template and is controlled by Python
scripts. This combination of software allows one to build and customize a high
performance and innovative web server.
2.1 Linux operating system
Linux is an open source and constantly evolving operating system which
allows the administrator full control over almost every system aspect. In recent years,
Linux has proven itself as a production level competitor in the operating system world
(www.linuxforum.com).
2.2 Apache web server
Apache is the most popular web server on the net. It is very secure, fast,
and reliable. Moreover, it is used for transferring data between the client and the
server (www.linuxhelp.net). Apache 2.2 is used in this study.
2.3 MySQL database engine
MySQL is a fast, multithreaded, multi-user, platform-independent, and
robust Structured Query Language (SQL) database server and a powerful relational
database management system (RDBMS) that is widely used on the Internet. It is well
suited to web-based applications and can hold content of many kinds. For example,
one could store website member information in a MySQL database for use on a
website. A database server allows a webmaster to keep data organized and accessible
(www.mysql.com). MySQL-3.23.58-1.9.i386 is used as the back-end database engine
in this research to achieve scalability, flexibility, and high performance.
2.4 PHP scripting language
PHP is a relatively new server-side programming language that allows
webmasters to easily add dynamic pages to their websites (www.php.net). PHP is
extremely versatile: it can do everything from printing database data to a web page
to setting browser cookies. All of these applications are open source and free for
experiment and modification. Many tools have been developed for MySQL with PHP,
such as phpMyAdmin (www.mysql.com), a very good web-based administration tool
for MySQL. PHP has many advantages over its competitors (ASP, Perl, and Java): it
is object oriented, can be embedded into Hypertext Markup Language (HTML), is
very fast, is cross-platform compatible, and can run as an Apache module. PHP 5 is
used in this research.
2.5 Python scripting language
Python, developed by Guido van Rossum in 1991, is a scripting language,
like Perl, PHP and Tcl (Alfred V. A., 2004). It is an interpreted, interactive,
object-oriented programming language designed around a philosophy that emphasizes
readability and the importance of programmer effort over computer effort (John K.
O., 1998; M. F. Sanner, 1999). Python scripts can be developed easily with open
source integrated development environments (IDEs) such as IDLE
(http://www.python.org/idle/), PyDev (http://pydev.sourceforge.net/), IPython
(http://ipython.scipy.org/moin/) and others.
2.6 MODELLER
MODELLER is a computer program for comparative protein structure
modeling presented by Andrej Sali (Sali A. et al., 1993; Fiser A. et al., 2000). The
input is an alignment of the sequence to be modeled with the template structure, the
atomic coordinates of the template, and a simple Python script.
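To make the first of these inputs concrete, the following minimal Python sketch
writes a two-entry alignment in the PIR layout that MODELLER reads (a ">P1;code"
line, a ten-field colon-separated description line, and the aligned sequence
terminated by "*"). It uses only the standard library and does not call MODELLER.
The helper name pir_entry, the codes "1jsm" and "kan1" as used here, the residue
range, the chain identifier and the short sequences are assumptions for illustration,
not the alignment used in this work.

```python
def pir_entry(code, aligned_seq, is_template, first_res="", first_chain="",
              last_res="", last_chain="", name="", source="", width=75):
    """Format one PIR alignment entry (hypothetical helper).

    The description line has ten colon-separated fields: entry type,
    structure code, first residue, first chain, last residue, last chain,
    protein name, source, resolution and R-factor (the last two left blank).
    """
    kind = "structureX" if is_template else "sequence"
    fields = [kind, code, str(first_res), first_chain, str(last_res),
              last_chain, name, source, "", ""]
    header = ":".join(fields)
    text = aligned_seq + "*"  # "*" marks the end of the sequence
    lines = [text[i:i + width] for i in range(0, len(text), width)]
    return ">P1;{}\n{}\n{}\n".format(code, header, "\n".join(lines))


# Hypothetical template and target rows of the same alignment.
template = pir_entry("1jsm", "MEKIVLL", True, 1, "A", 7, "A", "hemagglutinin")
target = pir_entry("kan1", "MEK-VLL", False)
alignment_text = template + target
```

The resulting text would be saved to a file and named, together with the template
code and the target code, in the Python script that drives the modeling run.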
OBJECTIVES
1. To propose a 3D visualization system to aid understanding of hemagglutinin
mutations in influenza A virus.
2. To integrate web-based tools for comparative analysis of influenza virus data.
3. To model the impact of mutations on the protein structure of influenza
virus.
LITERATURE REVIEW
Mutations in HA are among the most important mechanisms for producing
variation in influenza viruses (Robert G. W., 1992). They are characterized by the
position of the mutation in the protein sequence relative to glycosylation sites, the
cleavage site, residues of the receptor binding site and antigenic sites of the protein
sequence (Shiuh-Ming L. et al., 1992). Thus, HA is the main determinant of the host
range of the virus (Kanta S. et al., 1998).
Clayton W. N. et al. (1984) reported that two mutations in the receptor-binding
site of a human hemagglutinin, at residues 226 and 228, allowed replication in
ducks.
Kanta S. et al. (1998) isolated an avian H5N1 influenza A virus (A/Hong
Kong/156/97) from a tracheal aspirate obtained from a 3-year-old child in Hong Kong
with a fatal illness consistent with influenza. They found that the hemagglutinin
protein contained multiple basic amino acids adjacent to the cleavage site, a feature
characteristic of highly pathogenic avian influenza A viruses.
Ya H. et al. (2001) determined four new three-dimensional structures of avian
H5 and swine H9 influenza hemagglutinins (HAs), from viruses closely related to
those that caused outbreaks of human disease in Hong Kong in 1997 and 1999, bound
to avian and human cell receptor analogs. The structures show that HA binding sites
specific for human receptors appear to be wider than those preferring avian receptors,
and how avian and human receptors are distinguished by atomic contacts at the
glycosidic linkage. They compared the new structures with previously reported crystal
structures of HA/sialoside complexes of the H3 subtype that caused the 1968 Hong
Kong influenza pandemic, and analyzed them in relation to the HA sequences of all
15 subtypes and to receptor affinity.
Ya H. et al. (2002) determined the three-dimensional structures of the
hemagglutinins (HAs) from H5 avian and H9 swine viruses closely related to the
viruses isolated from humans in Hong Kong. They compared them with known
structures of the H3 HA from the virus that caused the 1968 H3 pandemic and of the
HA-esterase-fusion (HEF) glycoprotein from an influenza C virus. The results suggest
that HA subtypes may have originated by diversification of properties that affected the
metastability of HAs required for their membrane fusion activities in viral infection.
Enrique T. M. and Michael W. D. (2005) proposed a method, drawing on
statistical mechanics and probabilistic statistics, to quantify the non-monotonic
immune response that results from antigenic drift in the epitopes of the hemagglutinin
and neuraminidase proteins. Comparing the epitope sequences of the hemagglutinin
proteins of A/Fujian/411/2002 and A/Panama/2007/99, they found that the results
explain the ineffectiveness of the 2003-2004 influenza vaccine in the United States
and provide an accurate measure by which to optimize the effectiveness of future
annual influenza vaccines.
James S. et al. (2006) investigated the relationship between the hemagglutinin
(HA) structure of a highly pathogenic Vietnamese H5N1 influenza virus (Viet04) and
the HAs of the 1918 and other human H1 influenza A viruses. They found that the
Viet04 HA is more closely related to the 1918 and other human H1 HAs than to a 1997
duck H5 HA. Studying variation in the antigenic site and receptor binding site, they
reported that changes affecting α2-3 and α2-6 receptor specificity only enhanced or
reduced affinity for avian-type receptors. Mutations that can convert avian H2 and H3
HAs to human receptor specificity, when inserted onto the Viet04 H5 HA framework,
permitted binding to a natural human receptor.
Functional characterization of a protein sequence is one of the most frequent
problems in biology. This task is usually facilitated by an accurate three-dimensional
structure of the studied protein. Several computer programs and web servers automate
the homology modeling process. Web servers for automated homology modeling
include ModLoop (Andras F. and Andrej S., 2003) and the SWISS-MODEL
workspace (Konstantin A. et al., 2006). Free computer programs for automated
structure modeling, based on several theories, include Scwrl, Sccomp and
MODELLER (Sali A. et al., 1993; Fiser A. et al., 2000).
In 1995, Andrej S. reported that the number of sequences that can be modeled is
an order of magnitude larger than the number of experimentally determined protein
structures, provided that a protein sequence has at least 40% identity to a known
structure. Evaluation techniques are available that can estimate errors in different
regions of the model. In the same year, he evaluated three-dimensional protein
structure models of human nucleoside diphosphate kinase, mouse cellular retinoic
acid binding protein I, and human eosinophil neurotoxin that were calculated by
MODELLER, a program for comparative protein modeling by satisfaction of spatial
restraints. When a template structure with more than 40% sequence identity to the
target protein was available, the model was likely to have about 90% of the main
chain atoms modeled with a root mean square (rms) deviation from the X-ray
structure of ~1 Å, in large part because the templates were likely to be that similar to
the X-ray structure of the target. The rms deviation was compared with the overall
differences between refined NMR and X-ray crystallography structures of the same
protein.
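The two quantities used in this evaluation, percent sequence identity and rms
deviation, can be sketched in a few lines of Python. This is a minimal illustration
under stated assumptions: identity is counted over alignment columns in which both
sequences have a residue (other conventions exist), the two coordinate sets are
assumed to be already superposed and listed in the same atom order, and the function
names and example values are hypothetical.

```python
import math


def percent_identity(seq_a, seq_b, gap="-"):
    """Percent of identical residues over columns without a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either row
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no aligned residue pairs to compare")
    return 100.0 * identical / compared


def rms_deviation(coords_a, coords_b):
    """Root mean square deviation between paired (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical values, for illustration only.
identity = percent_identity("MEK-IV", "MQKAIV")
deviation = rms_deviation([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)])
```

With these definitions, a template would pass the 40% threshold discussed above when
percent_identity returns a value greater than 40.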
Adrian A. C. et al. (2003) presented SCWRL, which is designed to be used as a
homology modeling tool and therefore preserves all input coordinate features, in
contrast to many publicly available programs. They presented a new algorithm that
uses results from graph theory to solve the combinatorial problem encountered in
side-chain prediction: side chains are represented as vertices in an undirected graph,
and the resulting graph can be partitioned into connected subgraphs with no edges
between them. The algorithm is able to complete predictions on a set of 180 proteins
with 34,342 side chains in <7 min of computer time. The total χ1 and χ1+2 dihedral
angle accuracies are 82.6% and 73.7%, using a simple energy function based on the
backbone-dependent rotamer library and a linear repulsive steric energy.
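The partitioning step described above can be illustrated with a short Python sketch
that splits an undirected graph into its connected subgraphs. It shows only this one
step and is not the SCWRL implementation; the function name is hypothetical, vertices
are numbered from 0, and an edge is assumed here to join two side chains that can
interact.

```python
def connected_components(n_vertices, edges):
    """Partition vertices 0..n_vertices-1 into connected subgraphs.

    edges is an iterable of (a, b) pairs of an undirected graph.
    Returns a list of sorted vertex lists, ordered by smallest vertex.
    """
    adjacency = {v: set() for v in range(n_vertices)}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in range(n_vertices):
        if start in seen:
            continue
        # Depth-first search from an unvisited vertex collects one subgraph.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            vertex = stack.pop()
            component.append(vertex)
            for neighbour in adjacency[vertex]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(sorted(component))
    return components


# Five hypothetical side chains: 0-1-2 interact, 3-4 interact.
groups = connected_components(5, [(0, 1), (1, 2), (3, 4)])
```

Because no edge joins two different subgraphs, each group can be optimized
independently of the others, which is what makes the partition useful.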
Torsten S. et al. (2003) presented SWISS-MODEL, a server for automated
comparative modeling of three-dimensional protein structures. It provides several
levels of user interaction through its World Wide Web interface; template selection,
alignment and model building are done completely automatically by the server.
Complex modeling tasks can be handled in the 'project mode' using DeepView
(Swiss-PdbViewer), an integrated sequence-to-structure workbench. All models are
sent back via email with a detailed modeling report. WhatCheck analyses and
ANOLEA evaluations are provided optionally.
Andras F. and Andrej S. (2003) proposed ModLoop, a web server for
automated modeling of loops in protein structures. The input is the atomic
coordinates of the protein structure in the Protein Data Bank format, and the output is
the coordinates of the nonhydrogen atoms in the modeled segments. The server relies
on the loop modeling routine in MODELLER, which predicts loop conformations by
satisfaction of spatial restraints. A user provides the input to the server via a simple
web interface and receives the output by email.
Eran E. et al. (2004) presented Sccomp, a program used to concurrently predict
the conformations of multiple amino acid side chains on a fixed protein backbone. It
has relatively fast execution times and correctly predicts χ1 angles for 92–93% of
buried residues and 82–84% of all residues, with an RMSD of 1.7 Å for side chain
heavy atoms. The influence of crystal packing, completeness of the rotamer library
and precise positioning of Cβ atoms on the accuracy of side-chain prediction was also
calculated.
Konstantin A. et al. (2006) presented the SWISS-MODEL workspace, a web-
based integrated service dedicated to protein structure homology modeling. It assists
and guides the user in building protein homology models at different levels of
complexity. The system provides a workspace for each user in which several
modeling projects can be carried out in parallel. Protein sequence and structure
databases necessary for modeling are accessible from the workspace and are updated
at regular intervals. Tools for template selection, model building and structure quality
evaluation can be invoked from within the workspace.
Nowadays, many researchers have collected and/or developed tools for
sequence alignment, protein visualization or protein function prediction to aid
understanding of protein function. These tools are as follows:
Patrick A. et al. (2001) automated and benchmarked a method based on the
evolutionary trace approach. Using multiple sequence alignments, they identified
invariant polar residues, which were mapped onto the protein structure to predict
functional sites. This algorithm for functional site prediction was used to assess the
validity of transferring function between homologues. It was also used to filter
putative docked complexes, with a discrimination similar to that obtained by manually
including biological information about active sites or binding residues.
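The first step of this approach, finding invariant polar residues in a multiple
sequence alignment, can be sketched as follows. This is a simplified illustration and
not the published method: the function name is hypothetical, the set of residues
treated as polar is an assumption made here, and the short alignment is invented for
the example.

```python
# Assumption: one-letter codes treated as polar for this sketch.
POLAR_RESIDUES = set("RNDQEHKSTY")


def invariant_polar_columns(alignment):
    """Return (column, residue) for columns that are invariant and polar.

    alignment is a list of equal-length aligned sequences.
    Columns are numbered from 1.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    hits = []
    for index in range(length):
        column = {seq[index] for seq in alignment}
        if len(column) == 1:  # the same residue in every sequence
            residue = next(iter(column))
            if residue in POLAR_RESIDUES:
                hits.append((index + 1, residue))
    return hits


# Hypothetical three-sequence alignment, for illustration only.
hits = invariant_polar_columns(["MKDLS", "MKELS", "MKDIS"])
```

The reported columns would then be mapped onto residue positions in the 3D structure
to look for spatial clusters that suggest a functional site.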
Yutaka U. and Kiyoshi A. (2002) proposed MOSBY, a molecular structure
viewer program for understanding protein molecules. It is designed to be portable,
with a comprehensive user interface built on a high-throughput graphics library.
MOSBY illustrates that portability and extensibility are prerequisites for a software
platform in scientific computing that supports a variety of analyses and calculations
with atomic coordinates.
Andreas E. et al. (2003) proposed MOBILE, a procedure that models protein
binding sites including bound ligand molecules as restraints. They applied a
homology modeling method based on MODELLER to generate models of the target
protein, which were then refined iteratively by including information about bioactive
ligands as spatial restraints and optimizing the mutual interactions between the
ligands and the binding sites. The optimized models can be used for structure-based
drug design and virtual screening.
Marc A. et al. (2003) presented a procedure for modeling protein structures from
protein sequences by comparative modeling, using the MODELLER program as the
main tool. They described the steps in comparative modeling: fold assignment and
template selection, target-template alignment, model building, and prediction of
model accuracy.
Valentin A. I. et al. (2003) proposed ModView, a web application for
visualization of multiple protein sequences and structures that integrates a multiple
structure viewer, a multiple sequence alignment editor and a database querying
engine. It can interactively manipulate hundreds of proteins and visualize conserved
and variable residues. In addition, it can be included in HTML pages along with text
and figures, which makes it useful for teaching and presentations.
Thomas J. O. (2004) proposed a visualization tool that makes correlations
between distinct biological disciplines, using visualization techniques to highlight the
critical information. The tool provides context maintenance, a fisheye sequence view
and a magic lens, which were developed to display protein structure and sequence
information.
Vladimir S. et al. (2005) presented SPACE, a suite of tools for analysis and
prediction of structures of biomolecules and their complexes. It includes the following
servers and programs: LPC/CSU software, which provides a common definition of
inter-atomic contacts and complementarity of contacting surfaces to analyze protein
structures and complexes; the CryCo server, for building a crystal environment and
analyzing crystal contacts; CMA, for construction and analysis of protein contact
maps; MutaProt, for structural analysis of point mutations; SCCOMP, for side chain
modeling based on surface complementarity; and LIGIN, molecular docking software.
David E. et al. (2006) studied 24 individual assessment scores, including
physics-based energy functions, statistical potentials and machine learning based
scoring functions. Individual scores were also used to construct 85000 composite
scoring functions using support vector machine (SVM) regression. The scores were
tested for their ability to identify the most native-like models from a set of 6000
comparative models of 20 representative protein structures. Each of the 20 targets was
modeled using a template of <30% sequence identity. The best SVM score
outperformed all individual scores by decreasing the average root mean square
deviation (RMSD) difference between the model identified as the best of the set and
the model with the lowest RMSD from 0.63 Å to 0.45 Å, while having a higher
Pearson correlation coefficient to RMSD (r = 0.87) than any other tested score. They
found that the most accurate score is based on a combination that includes the DOPE
non-hydrogen atom statistical potential.
Ursula P. et al. (2006) described MODBASE, a database of annotated
comparative protein structure models for all available protein sequences that can be
matched to at least one known protein structure. It is updated regularly to reflect the
growth in protein sequence and structure databases and model assessment. MODBASE
also allows users to generate comparative models for proteins of interest with the
automated modeling server, MODWEB.
B. Balamurugan et al. (2007) presented PSAP, the Protein Structure Analysis
Package, which calculates and displays various hidden structural and functional
features of three-dimensional protein structures. The proposed computing engine
provides an easy-to-use Web interface to compute and visualize the necessary features
dynamically on the client machine, and its options are intended to better serve
researchers working in the area of structural biology.
Marc A. M. et al. (2007) proposed the DBAli tools, which use a
comprehensive set of structural alignments in the DBAli database to leverage the
structural information deposited in the Protein Data Bank (PDB). The tools
allow users to input the 3D coordinates of a protein structure for comparison by
MAMMOTH against all chains in the PDB. The target structure is annotated with
the AnnoLite and AnnoLyze tools, based on its stored relationships to other structures.
The ModClus program clusters structures by sequence and structure similarities,
the ModDom program identifies domains as recurrent structural fragments,
and an implementation of the COMPARER method in the SALIGN command of
MODELLER creates a multiple structure alignment for a set of related protein
structures. The tools are freely accessible via the World Wide Web and allow users to
mine the protein structure space by establishing relationships between protein
structures and their functions.
Michal J. P. et al. (2007) proposed PROTMAP2D, a software tool written in the
Python programming language for the calculation of contact and distance maps based
on user-defined criteria and for the quantitative comparison of pairs or series of
contact maps. It calculates the maps from 3D models, provides many options for their
visualization and statistics, and allows saving the output as bitmap graphics or ASCII
files. It can also be used for MD trajectories comprising multiple conformations.
Stephen C. (2007) presented SChiSM2, a web server-based program for creating
web pages that include interactive molecular graphics, using Jmol for illustration. The
SChiSM2 interface provides two options for choosing a structure file to display in
the page: providing the URL of the structure or using the Protein Data Bank file
format. It works with World Wide Web implementation software.
MATERIALS AND METHODS
Materials
1. Hardware - DELL workstations; Intel Pentium IV 3.0 GHz, 2 GB of RAM
(Cheminformatics Research Unit, Department of Chemistry, Kasetsart University,
Bangkok)
2. Software
2.1 Linux Operating System – Fedora Core 6.0
2.2 MODELLER version 9
2.3 Jmol version 11.3
2.4 Python Integrated Development Environment version 2.5
2.5 Java Runtime Environment version 5
2.6 PHP version 5
2.7 phpMyAdmin version 2.5.9
Methods
1. Homology modeling
MODELLER was used as the main program to compute the 3D
structure of the hemagglutinin (HA) protein of influenza A virus subtype H5N1. The
process consists of the four steps shown in Figure 3.
Figure 3 Steps in comparative protein structure modeling
1.1 Target and template selection
The first step in homology modeling is to identify one or more template
structures that have detectable similarity to the target. In this work, eight
template structures of HA are provided, from which the user selects one.
1.2 Sequence alignment method
The sequence-structure alignment is calculated using the alignment module of
MODELLER, which is based on a global dynamic programming algorithm (Needleman, S. B.
and Wunsch, C. D., 1970). It differs from the standard sequence-sequence alignment
method because it takes into account structural information from the template when
constructing an alignment. This is achieved through a variable gap penalty function.
Consider two sequences of elements and an M × N score matrix W, where M and N are
the numbers of elements in the first and second sequence. The score matrix is
composed of scores $W_{i,j}$ describing the differences between elements i and j from the
first and second sequence, respectively. The recursive dynamic programming formulae that
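give a matrix D are:
$$D_{i,j} = \min \begin{cases} P_{i,j} \\ D_{i-1,j-1} + W_{i,j} \\ Q_{i,j} \end{cases} \tag{1}$$
$$P_{i,j} = \min \begin{cases} D_{i-1,j} + g(1) \\ P_{i-1,j} + v \end{cases} \tag{2}$$
$$Q_{i,j} = \min \begin{cases} D_{i,j-1} + g(1) \\ Q_{i,j-1} + v \end{cases} \tag{3}$$
where $g(l)$ is a linear gap penalty function:
$$g(l) = u + v \cdot l \tag{4}$$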
Note that only a vector is needed for the storage of P and Q. The uppermost
formula, Equation (1), is calculated for i = M and j = N. Variable l is the gap length and
parameters u and v are gap penalty constants.
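The arrays D, P and Q are initialized as follows:
$$D_{i,0} = \begin{cases} 0, & i \le e \\ g(i-e), & e < i \le M \end{cases} \tag{6}$$
$$D_{0,j} = \begin{cases} 0, & j \le e \\ g(j-e), & e < j \le N \end{cases} \tag{7}$$
$$P_{i,0} = Q_{i,0} = \infty, \quad i = 1, 2, 3, \ldots, M \tag{8}$$
$$P_{0,j} = Q_{0,j} = \infty, \quad j = 1, 2, 3, \ldots, N \tag{9}$$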
where parameter e is the maximal number of elements at the sequence termini
that are not penalized with a gap penalty if they have no equivalences. A segment of
length e at a terminus is termed an “overhang”. The minimal score $d_{M,N}$ is obtained from
$$d_{M,N} = \min(D_{i,N}, D_{M,j}) \tag{10}$$
where i = M, M-1, …, M-e and j = N, N-1, …, N-e to allow for the
overhangs. The equivalence assignments are obtained by backtracking in matrix D.
Backtracking starts from the element $D_{i,j} = d_{M,N}$.
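As an illustration only, the recursion in Equations (1) to (4), the initialization in Equations (6) and (7) and the overhang rule in Equation (10) can be sketched in Python. This is not the MODELLER implementation. The function names, the toy 0/1 score matrix and the default values of u and v are assumptions made for the example, and the boundary values of P and Q are assumed to be +∞.

```python
import math


def gap(l, u, v):
    """Linear gap penalty g(l) = u + v*l, as in Equation (4)."""
    return u + v * l


def alignment_score(W, u=1.0, v=0.5, e=0):
    """Fill D, P and Q for an M x N score matrix W and return (d, D).

    W[i-1][j-1] holds W_{i,j}; lower values mean more similar elements.
    e is the overhang length; it is assumed that e <= min(M, N).
    """
    M, N = len(W), len(W[0])
    inf = math.inf
    D = [[inf] * (N + 1) for _ in range(M + 1)]
    P = [[inf] * (N + 1) for _ in range(M + 1)]  # boundary assumed +infinity
    Q = [[inf] * (N + 1) for _ in range(M + 1)]  # boundary assumed +infinity

    # Initialization of the first column and row of D, Equations (6) and (7).
    D[0][0] = 0.0
    for i in range(1, M + 1):
        D[i][0] = 0.0 if i <= e else gap(i - e, u, v)
    for j in range(1, N + 1):
        D[0][j] = 0.0 if j <= e else gap(j - e, u, v)

    # Recursion, Equations (1) to (3).
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            P[i][j] = min(D[i - 1][j] + gap(1, u, v), P[i - 1][j] + v)
            Q[i][j] = min(D[i][j - 1] + gap(1, u, v), Q[i][j - 1] + v)
            D[i][j] = min(P[i][j], D[i - 1][j - 1] + W[i - 1][j - 1], Q[i][j])

    # Minimal score over the last e+1 cells of the final column and row,
    # Equation (10).
    d = min(min(D[i][N] for i in range(M - e, M + 1)),
            min(D[M][j] for j in range(N - e, N + 1)))
    return d, D


def mismatch_matrix(a, b):
    """Toy score matrix: 0 for identical letters, 1 otherwise (illustrative)."""
    return [[0.0 if x == y else 1.0 for y in b] for x in a]


score, _ = alignment_score(mismatch_matrix("ACD", "AD"))
print(score)
```

With this toy matrix and e = 0, aligning ACD against AD costs one gap of length 1, that is g(1) = u + v. With e = 1, aligning ACD against CD costs nothing, because the leading unmatched element falls inside the overhang.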
1.3 Structure building
Comparative modeling by MODELLER, as implemented in the automodel
class, can be described by the following inputs, outputs and steps.
Input: a script file, an alignment file and the PDB files of the templates.
Output: job.log, log file; job.ini, initial conformation for
optimization; job.rsr, restraints file; job.sch, VTFM schedule file; job.B9999????, PDB
atom file for the model of the target sequence; job.V9999????, violation profiles for
the model; job.D9999????, progress of optimization; job.BL9999????, optional loop
model; job.DL9999????, progress of optimization for the loop model; and job.IL9999????,
initial structures for the loop model. Here “????” indicates the model number, a
value between 0001 and 9999. The main steps carried out by MODELLER are listed
below.
1.3.1 Read and check the alignment between the target sequence and
the template structures.
1.3.2 Calculate restraints on the target from its alignment with the
templates:
1.3.2.1 Generate molecular topology for the target sequence.
Disulfides in the target are assigned here from the equivalent disulfides in the
templates. Any user defined patches are also done here.
1.3.2.2 Calculate coordinates for atoms that have equivalent
atoms in the templates as an average over all templates.
1.3.2.3 Build the remaining unknown coordinates using internal
coordinates from the CHARMM topology library.
1.3.2.4 Write the initial model to a file with extension .ini.
1.3.2.5 Write all restraints to a file with extension .rsr.
1.3.3 Calculate model(s) that satisfy the restraints as well as possible.
For each model, first, generate the optimization schedule for the variable target
function method (VTFM). Next, read the initial model. Last, randomize the initial
structure by adding a random number between ±automodel.deviation angstroms to all
atomic positions.
1.3.4 Optimize the model:
1.3.4.1 Partially optimize the model by VTFM. Repeat the
following steps as many times as specified by the optimization schedule: select only
the restraints that operate on the atoms that are close enough in sequence, as specified
by the current step of VTFM; then optimize the model by conjugate gradients, using
only the currently selected restraints.
1.3.4.2 Refine the model by simulated annealing with molecular
dynamics, if so selected. First, do a short conjugate gradients optimization. Next,
increase temperature in several steps and do molecular dynamics optimization at each
temperature. Next, decrease temperature in several steps and do molecular dynamics
optimization at each temperature. Last, do a short conjugate gradients optimization.
1.3.5 Calculate the remaining restraint violations and write them out.
1.3.6 Write out the final model to a file with extension .B9999????.pdb
where ???? indicates the model number. Also write out the violations profile.
1.3.7 Superpose the models and the templates, if so selected by
automodel.final_malign3d = True, and write them out.
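The per-model file naming described in the Output list of this section can be illustrated with a small helper. This is only a sketch of the naming pattern as stated in the text (job.B9999????, job.V9999????, job.D9999????). The function name and the dictionary keys are hypothetical and are not part of MODELLER.

```python
def modeller_output_names(job, model_number):
    """Return per-model output file names following the job.X9999???? pattern,
    where ???? is the zero-padded model number (0001 to 9999)."""
    if not 1 <= model_number <= 9999:
        raise ValueError("model number must be between 1 and 9999")
    tag = f"9999{model_number:04d}"
    return {
        "model": f"{job}.B{tag}.pdb",    # PDB atom file of the model
        "violations": f"{job}.V{tag}",   # violation profile
        "progress": f"{job}.D{tag}",     # progress of optimization
    }


print(modeller_output_names("kan1", 1)["model"])
```

A web page or bot that needs to locate the first model of a job could use such a helper instead of hard-coding file names.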
1.4 Model evaluation
The accuracy of a model is assessed using the Discrete Optimized
Protein Energy (DOPE) score to increase or decrease confidence in the model (M.A.
Marti-Renom et al, 2000). DOPE is based on an improved reference state that
corresponds to non-interacting atoms in a homogeneous sphere with the radius
dependent on a sample native structure; it thus accounts for the finite and spherical
shape of the native structures. It is implemented in the popular homology modeling
program MODELLER and is used to assess the energy of the protein models generated
through many iterations by MODELLER, which produces homology models by the
satisfaction of spatial restraints. The models returning the minimum score can be
chosen as the best probable structures for further evaluation.
DOPE can also generate a residue-by-residue energy profile for the input
model, making it possible for the user to spot problematic regions in the structure
model (Min-yi Shen and Andrej Sali, 2006).
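The two uses of DOPE described above, choosing the model with the minimum score and spotting problematic regions in a per-residue profile, can be sketched as follows. The function names, the example scores and the cutoff are hypothetical and serve only to show the logic. They are not values produced by this work.

```python
def best_model(dope_scores):
    """Return the name of the model with the lowest (most negative) DOPE score."""
    return min(dope_scores, key=dope_scores.get)


def high_energy_residues(profile, cutoff):
    """Return residue numbers whose per-residue DOPE energy exceeds the cutoff."""
    return [resnum for resnum, energy in profile if energy > cutoff]


# Hypothetical total scores for two models of the same target.
scores = {"job.B99990001.pdb": -38000.5, "job.B99990002.pdb": -38250.1}
print(best_model(scores))

# Hypothetical (residue number, energy) pairs from a residue profile.
profile = [(93, -0.025), (94, -0.025), (139, -0.040)]
print(high_energy_residues(profile, cutoff=-0.030))
```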
2. System Architecture
For automated comparative modeling of the hemagglutinin H5N1 protein
structure using MODELLER, the schematic diagram in Figure 4 is proposed for this
system.
Figure 4 Schematic of the 3D visualization system to aid understanding of mutations
in the hemagglutinin protein (H5N1).
[Flowchart boxes: How to use; Home Page; login; New member; login pass? (yes/no);
Workspace Page; 2D alignment; 3D Automodel; View job; Download job; Briefly Result;
Download Result; Post sequence; Type of alignment? (pairwise/multiple); Sequence
alignment; Pairwise sequence alignment; Multiple sequence alignment; Model building;
Generate evaluation data. Legend: Web pages; User actions; Bot processes.]
2.1 Web-page Design
2.1.1 Workspace page
In the workspace page, a query form offers six separate query
fields, including jobname, Result, Delete, Download and Status. The query fields are
connected to the database through MySQL commands. This is the main page, from which
the user can start automated structure modeling and sequence alignment, view results,
delete results and check job status.
2.1.2 2D Alignment page
When the user selects the sequence alignment option, the user should
select the type of sequence alignment: pairwise alignment with a template sequence or
multiple sequence alignment. The input sequence is uploaded to the server side and
processed by the bots program.
Figure 5 Schematic diagram for 2D Alignment page.
[Flowchart boxes: Sequence data, sequence name and type of alignment; Multiple
sequence alignment? (yes/no); Write uploaded file to directory, write input file:
malign.py; Write input file: make_align.py; Update jobname, sequence, sequencename
and templatename to 3dvis database.]
2.1.3 3D Automodel page
Figure 6 Schematic for 3D Automodel page.
[Flowchart boxes: Sequence data, sequence name and template selected; Write input
files: make_align.py, model-single.py, profile-gen.py, plot.gp and jobname.in; Update
jobname, sequence, sequencename and templatename to 3dvis database.]
If the user selects the 3D automodel option, the user should select a
template from the list of protein template names and give an input sequence for
automated 3D structure modeling. The input is uploaded to the server side and
processed by the bots program.
2.1.4 Briefly Result page
Figure 7 Schematic for Briefly Result page.
[Flowchart boxes: username and jobname; Get outputname, templatename and position
of residue from 3dvis database; Write out 3D structure viewer (Jmol), sequence
alignment and DOPE score graph between template and sequence structure.]
The Briefly Result page shows a brief result of the automated 3D structure
modeling of the target sequence. The page is divided into three parts. The first
displays the 3D structure with Jmol and highlights important sites, such as
glycosylation sites, the receptor-binding site and mutated residues. The second shows
the sequence alignment of the template and the output structure. The last shows the
details of the model evaluation as a DOPE score graph. All of these data are shown in
one web page.
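The "write input file" step of the 3D Automodel page (Section 2.1.3) can be illustrated by a function that fills in a MODELLER script from the job name and the selected template. This is a minimal sketch and not the code used in 3dvis. The alignment file name `<jobname>.ali`, the function name and the name check are assumptions. The generated script follows the usual MODELLER 9 single-model pattern (environ, automodel, assess.DOPE).

```python
import re

# Template for a MODELLER 9 style single-model script. The alignment file
# name '<jobname>.ali' is an assumption made for this sketch.
MODEL_SINGLE_TEMPLATE = """\
from modeller import *
from modeller.automodel import *

env = environ()
a = automodel(env,
              alnfile='{jobname}.ali',
              knowns='{template}',
              sequence='{jobname}',
              assess_methods=(assess.DOPE,))
a.starting_model = 1
a.ending_model = 1
a.make()
"""

# Only letters, digits, underscore and hyphen are accepted, because the
# values come from a web form and end up inside generated code.
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def build_model_single(jobname, template):
    """Return the text of a model-single.py script for one job."""
    for value in (jobname, template):
        if not SAFE_NAME.fullmatch(value):
            raise ValueError(f"unsafe name: {value!r}")
    return MODEL_SINGLE_TEMPLATE.format(jobname=jobname, template=template)


print(build_model_single("kan1", "1jso"))
```

Validating the names before formatting is a design choice for this sketch: it keeps user input from injecting arbitrary Python into the script that a bot later executes.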
2.1.5 Download Result page
The Download Result page shows all of the data for each job, such as the sequence
alignment, 3D structure, DOPE score graph, log files and profile file.
2.2 Database Design
Figure 8 The multiple-table layout of related tables in the database
3dvis: 3d_id, 3d_jobname, 3d_date, 3d_sequence, 3d_sequencename, 3d_chain, mem_username, template_name
member: mem_id, mem_username, mem_password, mem_email, mem_name, mem_surname, mem_address, mem_gender, mem_birthday, mem_jobnum
member_directory: dir_id, dir_name, job_name, job_stat, mem_username
templatedetail: template_id, template_name, template_pdbdirectory, template_profiledirectory, template_detail
chaindetail: chain_id, chain_name, chain_sequence, chain_sequencelength, chain_glycosite, chain_antigensite, chain_receptorsite, chain_cleavsite, chain_residuesite, template_name
MySQL was selected as the back-end database
engine in this research. It is a free, powerful and robust relational database
management system (RDBMS) and SQL database server. The database system consists of
three parts: tables in the MySQL database, template structures, and outputs for the
target sequences. The parts are connected with one another by username and jobname,
as shown in Figure 9.
Figure 9 The three parts of the data source collection in the database system,
connected by username and jobname.
2.2.1 Tables in MySQL database
Seven tables were designed to store all property fields of the data. Three
tables store the templates and the details of each template structure. One table stores
membership information and the last two tables are connection tables.
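How the tables are linked by username and template name can be sketched with an in-memory database. SQLite (from the Python standard library) is used here only as a stand-in for MySQL. The column types, the constraints and the reduced set of columns are assumptions, while the table and column names are taken from Figure 8.

```python
import sqlite3

# Reduced schema: only the columns needed to show the links between tables.
SCHEMA = """
CREATE TABLE member (
    mem_id INTEGER PRIMARY KEY,
    mem_username TEXT UNIQUE NOT NULL
);
CREATE TABLE templatedetail (
    template_id INTEGER PRIMARY KEY,
    template_name TEXT UNIQUE NOT NULL
);
CREATE TABLE "3dvis" (
    "3d_id" INTEGER PRIMARY KEY,
    "3d_jobname" TEXT NOT NULL,
    mem_username TEXT REFERENCES member(mem_username),
    template_name TEXT REFERENCES templatedetail(template_name)
);
"""


def open_demo_db():
    """Create an in-memory database with the reduced schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def jobs_for_user(conn, username):
    """Return (jobname, template_name) pairs submitted by one member."""
    rows = conn.execute(
        'SELECT "3d_jobname", template_name FROM "3dvis" '
        'WHERE mem_username = ? ORDER BY "3d_id"',
        (username,),
    )
    return rows.fetchall()


conn = open_demo_db()
conn.execute("INSERT INTO member (mem_username) VALUES ('alice')")
conn.execute("INSERT INTO templatedetail (template_name) VALUES ('1jso')")
conn.execute(
    'INSERT INTO "3dvis" ("3d_jobname", mem_username, template_name) '
    "VALUES (?, ?, ?)",
    ("kan1", "alice", "1jso"),
)
print(jobs_for_user(conn, "alice"))
```

The user name 'alice' and the job name 'kan1' are made-up sample data. The query uses a bound parameter rather than string concatenation, which is also advisable for the PHP and MySQL code of a web system.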
2.2.2 Job data folder
The sequence alignment and structure data are stored in the db_Sys
folder and linked to the tables by username and jobname.
2.2.3 PDB data folder
The template structure data in PDB (Protein Data Bank) format
are stored in the template3d folder and linked to the templatedetail table by
templatename.
2.2.4 Profile data folder
The template structure data in profile format, used in the evaluation
process, are stored in the template3d folder and linked to the templatedetail table by
templatename.
2.3 Bots
Bots are autonomous software that operates as an agent simulating human
activity. Here they are used to find and execute any job that a user submits to the
system. They are divided into three parts: sequence alignment, automated 3D structure
modeling and adding a ligand to the 3D structure. The bots use the appearance of the
output file of each step as the condition indicating that the step has finished.
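The completion check described above, where a step counts as finished once its output file has appeared, can be sketched as follows. The directory layout (one sub-folder per job), the file name `model.pdb` and the function names are hypothetical. The actual bots of 3dvis may be organized differently.

```python
from pathlib import Path


def job_state(job_dir, expected_outputs):
    """Return 'finished' if every expected output file exists, else 'pending'."""
    job_dir = Path(job_dir)
    missing = [name for name in expected_outputs
               if not (job_dir / name).exists()]
    return "finished" if not missing else "pending"


def pending_jobs(root, expected_outputs):
    """List the job folders under root that still lack an expected output."""
    root = Path(root)
    return sorted(d.name for d in root.iterdir()
                  if d.is_dir() and job_state(d, expected_outputs) == "pending")
```

A bot could call `pending_jobs` at regular intervals and start the next step only for the jobs it returns, for example `pending_jobs("db_Sys/alice", ["model.pdb"])`, where the path is again only an example.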
Figure 10 Schematic diagram of the Bots program.
RESULTS AND DISCUSSION
A graphical user interface was developed and named “3dvis” (3D Visualization
system). It contains 5 hemagglutinin protein structures, in PDB file format, for use as
template structures.
1. Homepage
The home page is the first page that users see when they visit the system, and it is
open to all users. This page is composed of two parts, as shown in Figure 11.
Figure 11 Home page of the system.
The first part is the header and menu. The menus, which are open to all users,
include How to use and About this work. Users who need member access
can register with their personal information and purpose of access through the New
member link. The second part is the member login. The member must fill in both username
and password. The system then checks whether the user is valid and whether the user's
status authorizes entry into the system.
2. Membership page
After the system has validated the user, it proceeds to the
workspace page. The user can select a process menu to run an alignment, to build an
automated model of a protein structure using hemagglutinin H5N1 template structures,
or to post a question or problem about the system.
2.1 Workspace page
This page contains the membership menu, including 2D alignment, 3D
Automate model, PDB lists, webboard, wiki, Changepassword and Logout.
Figure 12 Workspace page.
2.2 2dalign page
If the user selects the 2D alignment menu, the 2dalign page appears. The user
should select a template structure to align with the sequence posted to the server
(the target sequence), or can download a sequence file in FASTA format (.fa) to perform
a multiple sequence alignment.
Figure 13 2dalign page.
2.3 3dauto page
If the user selects the 3D Automate model menu, the 3dauto page appears. The user
should select a template structure for generating a model from the target sequence
posted to the server and give a name for the target model. In addition, the user can
select a ligand type to add to the model.
Figure 14 3dauto page.
2.4 Result page
After the user has submitted a 3D automate model job and the system
has finished calculating the model, the user can use the view menu to view the result.
This page contains the details of the model: the 3D structure, the sequence alignment
and the DOPE score evaluation graph, which help the user choose the best probable
structures.
Figure 15 Result page.
2.5 Download page
When the system has finished calculating the structure, the user can download all
files generated as input and output files for the MODELLER program.
Figure 16 Download page.
3. Example Model – KAN-1 (chain A)
3.1 With 1JSO template structure.
Figures 15 and 16 show the result of the structure calculated from the sequence
of KAN-1 hemagglutinin (Influenza A virus (A/Thailand/1(KAN-1)/2004(H5N1))).
The result shows mutations at residue positions 45, 71, 83, 88, 93, 94, 107, 108,
115, 119, 124, 126, 129, 138, 139, 140, 155, 156, 174, 198, 209, 212, 217, 261, 263,
268, 309 and 320 when compared with the 1JSO template (structure of avian H5
hemagglutinin bound to LSTC receptor). The result files comprise the alignment file
(.pap), 3D structure file (.pdb), DOPE score graph picture (.png), profile file
(.profile), and other input and output files of MODELLER.
3.2 With 1JSM template structure.
For the 1JSM template, mutated residues appear at positions 45, 71, 83,
88, 93, 94, 107, 108, 115, 119, 124, 126, 129, 138, 139, 140, 155, 156, 174, 198, 209,
212, 217, 261, 263, 268, 308 and 320.
3.3 With 1JSN template structure.
For the 1JSN template, mutated residues appear at positions 45, 71, 83,
88, 93, 94, 107, 108, 115, 119, 124, 126, 129, 138, 139, 140, 155, 156, 174, 198, 209,
212, 217, 261, 263, 268, 308 and 320.
3.4 With 2FK0 template structure.
For the 2FK0 template, mutated residues appear at positions 36 and 139.
3.5 With 2IBX template structure.
For the 2IBX template, a mutated residue appears at position 139.
4. Comparison of structures with another resource
To determine whether structures from the system can serve as initial structures, we
compared the model with a model calculated by SWISS-MODEL WORKSPACE
(http://swissmodel.expasy.org/workspace/index.php). DOPE scores were calculated
for the structures from both resources. The results are shown in Figure 17 and Table 1.
Figure 17 DOPE score graph comparing the KAN-1 structures calculated
with the 3dvis system (green line) and SWISS-MODEL workspace (red
line) with the 3D structure of 1JSO.
Table 1 DOPE scores at the mutated residues of the KAN-1 model calculated by this
system and by SWISS-MODEL workspace, DOPE scores of the corresponding
residues of the 1JSO structure, and the absolute ratio of each model score to
the template score.
Residue  Residue         DOPE score                         Absolute ratio
number   1jso   KAN-1    3dvis      swiss      1jso         3dvis/1jso  swiss/1jso
45       ASN    ASP      -3.30E-02  -3.70E-02  -3.90E-02    0.846       0.949
71       LEU    ILE      -3.30E-02  -3.40E-02  -3.40E-02    0.971       1.000
83       ASP    ALA      -3.50E-02  -3.50E-02  -3.40E-02    1.029       1.029
88       GLY    ASP      -3.40E-02  -3.40E-02  -3.30E-02    1.030       1.030
93       GLU    GLY      -3.30E-02  -3.40E-02  -2.50E-02    1.320       1.360
94       ASN    ASP      -3.30E-02  -3.40E-02  -2.50E-02    1.320       1.360
107      SER    ARG      -3.50E-02  -3.80E-02  -3.50E-02    1.000       1.086
108      THR    ILE      -3.30E-02  -3.60E-02  -3.40E-02    0.971       1.059
115      ARG    GLN      -3.50E-02  -3.60E-02  -3.70E-02    0.946       0.973
119      ARG    LYS      -3.00E-02  -3.10E-02  -3.20E-02    0.938       0.969
124      ASN    SER      -2.50E-02  -2.40E-02  -2.60E-02    0.962       0.923
126      ASP    GLU      -2.80E-02  -2.60E-02  -2.60E-02    1.077       1.000
129      SER    LEU      -3.00E-02  -3.00E-02  -2.50E-02    1.200       1.200
138      ASN    GLN      -3.80E-02  -3.80E-02  -4.00E-02    0.950       0.950
139      GLY    ARG      -3.70E-02  -3.70E-02  -4.00E-02    0.925       0.925
140      ARG    LYS      -3.50E-02  -3.60E-02  -3.80E-02    0.921       0.947
155      ASN    SER      -3.20E-02  -3.30E-02  -2.90E-02    1.103       1.138
156      ALA    THR      -3.20E-02  -3.40E-02  -3.00E-02    1.067       1.133
174      ILE    VAL      -4.40E-02  -4.60E-02  -4.60E-02    0.957       1.000
198      VAL    ILE      -3.80E-02  -3.90E-02  -3.80E-02    1.000       1.026
209      SER    LEU      -3.20E-02  -3.50E-02  -3.10E-02    1.032       1.129
212      GLU    ARG      -2.90E-02  -3.10E-02  -2.90E-02    1.000       1.069
217      PRO    SER      -2.90E-02  -3.10E-02  -3.20E-02    0.906       0.969
261      GLY    ASP      -2.70E-02  -2.80E-02  -2.50E-02    1.080       1.120
263      ALA    THR      -2.90E-02  -2.90E-02  -2.80E-02    1.036       1.036
268      GLY    GLU      -2.90E-02  -2.90E-02  -3.00E-02    0.967       0.967
309      GLY    ASN      -3.50E-02  -3.80E-02  -3.30E-02    1.061       1.152
320      VAL    SER      -3.70E-02  -3.60E-02  -3.20E-02    1.156       1.125
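The "Absolute ratio" columns appear to be the absolute value of the model score divided by the template score at the same residue. The small sketch below reproduces the two ratios of the first row of Table 1 (residue 45) under that assumption. The function name is hypothetical, and some rows may differ in the last digit because the tabulated scores are rounded.

```python
def absolute_ratio(model_score, template_score):
    """Absolute ratio of a per-residue model DOPE score to the template score."""
    return abs(model_score / template_score)


# Residue 45 of Table 1: 3dvis, swiss and 1jso scores.
print(round(absolute_ratio(-3.30e-02, -3.90e-02), 3))  # 3dvis/1jso
print(round(absolute_ratio(-3.70e-02, -3.90e-02), 3))  # swiss/1jso
```

A ratio close to 1 means that the per-residue score of the model is close to that of the template at the mutated position.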
Figure 18 DOPE score graph comparing the KAN-1 structures calculated
with the 3dvis system (green line) and SWISS-MODEL workspace (red
line) with the 3D structure of 1JSN.
Table 2 DOPE scores at the mutated residues of the KAN-1 model calculated by this
system and by SWISS-MODEL workspace, DOPE scores of the corresponding
residues of the 1JSN structure, and the absolute ratio of each model score to
the template score.
Residue  Residue         DOPE score                         Absolute ratio
number   1jsn   KAN-1    3dvis      swiss      1jsn         3dvis/1jsn  swiss/1jsn
45       ASN    ASP      -3.50E-02  -3.10E-02  -3.90E-02    0.897       0.795
71       LEU    ILE      -3.30E-02  -3.40E-02  -3.40E-02    0.971       1.000
83       ASP    ALA      -3.50E-02  -3.50E-02  -3.30E-02    1.061       1.061
88       GLY    ASP      -3.40E-02  -3.40E-02  -3.20E-02    1.063       1.063
93       GLU    GLY      -3.30E-02  -3.40E-02  -2.50E-02    1.320       1.360
94       ASN    ASP      -3.20E-02  -3.40E-02  -2.50E-02    1.280       1.360
107      SER    ARG      -3.70E-02  -3.80E-02  -3.30E-02    1.121       1.152
108      THR    ILE      -3.50E-02  -3.60E-02  -3.20E-02    1.094       1.125
115      ARG    GLN      -3.50E-02  -3.60E-02  -3.60E-02    0.972       1.000
119      ARG    LYS      -3.00E-02  -3.10E-02  -3.20E-02    0.938       0.969
124      ASN    SER      -2.50E-02  -2.40E-02  -2.60E-02    0.962       0.923
126      ASP    GLU      -2.70E-02  -2.60E-02  -2.60E-02    1.038       1.000
129      SER    LEU      -3.00E-02  -3.00E-02  -2.50E-02    1.200       1.200
138      ASN    GLN      -3.60E-02  -3.80E-02  -4.00E-02    0.900       0.950
139      GLY    ARG      -3.50E-02  -3.70E-02  -3.70E-02    0.946       1.000
140      ARG    LYS      -3.30E-02  -3.60E-02  -3.80E-02    0.868       0.947
155      ASN    SER      -3.10E-02  -3.30E-02  -2.90E-02    1.069       1.138
156      ALA    THR      -3.10E-02  -3.40E-02  -3.00E-02    1.033       1.133
174      ILE    VAL      -4.50E-02  -4.60E-02  -4.60E-02    0.978       1.000
198      VAL    ILE      -3.90E-02  -3.90E-02  -3.70E-02    1.054       1.054
209      SER    LEU      -3.30E-02  -3.50E-02  -3.10E-02    1.065       1.129
212      GLU    ARG      -2.80E-02  -3.10E-02  -2.70E-02    1.037       1.148
217      PRO    SER      -2.80E-02  -3.10E-02  -2.90E-02    0.966       1.069
261      GLY    ASP      -2.70E-02  -2.80E-02  -2.40E-02    1.125       1.167
263      ALA    THR      -2.70E-02  -2.90E-02  -2.60E-02    1.038       1.115
268      GLY    GLU      -2.70E-02  -2.90E-02  -3.00E-02    0.900       0.967
309      GLY    ASN      -3.60E-02  -3.80E-02  -3.20E-02    1.125       1.188
320      VAL    SER      -3.60E-02  -3.60E-02  -3.10E-02    1.161       1.161
Figure 19 DOPE score graph comparing the KAN-1 structures calculated
with the 3dvis system (green line) and SWISS-MODEL workspace (red
line) with the 3D structure of 1JSM.
Table 3 DOPE scores at the mutated residues of the KAN-1 model calculated by this
system and by SWISS-MODEL workspace, DOPE scores of the corresponding
residues of the 1JSM structure, and the absolute ratio of each model score to
the template score.
Residue  Residue         DOPE score                 Absolute ratio
number   1jsm   KAN-1    3dvis    swiss    1jsm     3dvis/1jsm  swiss/1jsm
45       ASN    ASP      -0.033   -0.038   -0.039   0.846       0.974
71       LEU    ILE      -0.033   -0.034   -0.035   0.943       0.971
83       ASP    ALA      -0.034   -0.035   -0.034   1.000       1.029
88       GLY    ASP      -0.033   -0.034   -0.033   1.000       1.030
93       GLU    GLY      -0.033   -0.034   -0.025   1.320       1.360
94       ASN    ASP      -0.032   -0.034   -0.025   1.280       1.360
107      SER    ARG      -0.037   -0.038   -0.035   1.057       1.086
108      THR    ILE      -0.036   -0.036   -0.033   1.091       1.091
115      ARG    GLN      -0.035   -0.037   -0.036   0.972       1.028
119      ARG    LYS      -0.030   -0.031   -0.032   0.938       0.969
124      ASN    SER      -0.025   -0.025   -0.026   0.962       0.962
126      ASP    GLU      -0.027   -0.027   -0.026   1.038       1.038
129      SER    LEU      -0.030   -0.030   -0.025   1.200       1.200
138      ASN    GLN      -0.037   -0.035   -0.040   0.925       0.875
139      GLY    ARG      -0.036   -0.034   -0.040   0.900       0.850
140      ARG    LYS      -0.035   -0.033   -0.038   0.921       0.868
155      ASN    SER      -0.031   -0.033   -0.029   1.069       1.138
156      ALA    THR      -0.032   -0.034   -0.030   1.067       1.133
174      ILE    VAL      -0.044   -0.046   -0.046   0.957       1.000
198      VAL    ILE      -0.038   -0.039   -0.038   1.000       1.026
209      SER    LEU      -0.032   -0.035   -0.031   1.032       1.129
212      GLU    ARG      -0.028   -0.031   -0.029   0.966       1.069
217      PRO    SER      -0.029   -0.031   -0.032   0.906       0.969
261      GLY    ASP      -0.027   -0.028   -0.025   1.080       1.120
263      ALA    THR      -0.028   -0.029   -0.028   1.000       1.036
268      GLY    GLU      -0.028   -0.029   -0.031   0.903       0.935
309      GLY    ASN      -0.036   -0.038   -0.032   1.125       1.188
320      VAL    SER      -0.038   -0.037   -0.032   1.188       1.156
Figure 20 DOPE score graph comparing the KAN-1 structures calculated
with the 3dvis system (green line) and SWISS-MODEL workspace (red
line) with the 3D structure of 2FK0.
Table 4 DOPE scores at the mutated residues of the KAN-1 model calculated by this
system and by SWISS-MODEL workspace, DOPE scores of the corresponding
residues of the 2FK0 structure, and the absolute ratio of each model score to
the template score.
Residue  Residue         DOPE score                 Absolute ratio
number   2fk0   KAN-1    3dvis    swiss    2fk0     3dvis/2fk0  swiss/2fk0
36       LYS    THR      -0.028   -0.029   -0.020   1.400       1.450
139      GLY    ARG      -0.034   -0.037   -0.026   1.308       1.423
Figure 21 DOPE score graph comparing the KAN-1 structures calculated
with the 3dvis system (green line) and SWISS-MODEL workspace (red
line) with the 3D structure of 2IBX.
Table 5 DOPE scores at the mutated residue of the KAN-1 model calculated by this
system and by SWISS-MODEL workspace, DOPE score of the corresponding
residue of the 2IBX structure, and the absolute ratio of each model score to
the template score.
Residue  Residue         DOPE score                 Absolute ratio
number   2ibx   KAN-1    3dvis    swiss    2ibx     3dvis/2ibx  swiss/2ibx
139      GLY    ARG      -0.036   -0.037   -0.040   0.900       0.925
The results show that the KAN-1 sequence is more conserved with respect to the 2IBX
structure than to 2FK0, and that it shows the same mutations with respect to the 1JSO,
1JSN and 1JSM structures. The template structures differ in ligand coordinates and
function: the 2IBX structure is the hexamer form of an H5N1 virus isolated from a
human; the 2FK0 structure is the hexamer form of the hemagglutinin of influenza A
virus (A/Viet Nam/1203/2004(H5N1)); and the 1JSO, 1JSN and 1JSM structures are
structures of avian H5 hemagglutinin, of which 1JSO is bound to the LSTC
(Sialyllacto-N-tetraose C) receptor and 1JSN is bound to the LSTA
(Sialyllacto-N-tetraose A) receptor. The user can therefore select the template that is
most appropriate as the initial structure for the target. We found that the DOPE scores
of the structures calculated with the system and with SWISS-MODEL workspace
differed only slightly.
CONCLUSION AND RECOMMENDATION
Conclusion
This research developed the 3D Visualization System to Aid Understanding
Hemagglutinin Mutations in H5N1 Influenza Virus in order to prepare initial structures
for researchers to predict the theoretical function of the HA protein. The objective of
the system was to integrate software for homology modeling and structure
visualization, controlled by server-side and client-side scripting languages, to help
users assess the calculated models. The development of the system consisted of several
steps. First, the X-ray structures of hemagglutinin protein subtype H5N1 were collected
in a database. Second, the system architecture was designed as shown in Figure 4; the
system contains three parts: alignment, automated modeling and portal. Last, the
system was designed and coded. The server side was developed with MySQL commands
and the Python and PHP scripting languages to control the processes and manage all
jobs that users post to the server, and the client side was developed with hypertext
markup language (HTML) and JavaScript as the graphical user interface of the system.
The efficiency of the system was acceptable when its results were compared with those
of an accepted resource (SWISS-MODEL workspace) using an accepted score (DOPE
score).
Recommendation
A user satisfaction survey of the 3dvis system, using purposive sampling of
students and researchers who have skills in computation and protein modeling, showed
good agreement with the system (see Appendix C). Suggestions for the system
included security, frequently asked questions (FAQ), error checking and artificial
intelligence for sequence alignment.
To support development of the next version of the 3dvis system, we have prepared
data such as the system architecture (Methods section) and the user satisfaction survey
data (Appendix C).
LITERATURE CITED
Alfred, V. A.. 2004. Software and the Future of Programming Languages. Science
303: 1331-1333.
Andras, F. and S., Andrej. 2003. ModLoop: automated modeling of loops in protein
structures. BIOINFORMATICS. 19(18): 2500-2501.
Andrej, S.. 1995. MODELER: Implementing 3D protein modeling. mc2 Molecular
Simulations Inc Burlington MA. 2(5).
_________. 1995. Modeling mutations and homologous proteins. Curr. Opin.
Biotech. 6: 437-451.
_________ and T. L., Blundell. 1993. Comparative protein modeling by satisfaction
of spatial restraints. J. Mol. Biol. 234: 779-815.
_________, P., Liz, Y., Feng, V., Herman and K., Martin. 1995. Evaluation of
Comparative Protein Modeling by MODELLER. PROTEINS: Struct, Funct
and Genom. 23:318-326.
_________, L., Potterton, F., Yuan, H., Vlijmen and M., Karplus. 1995. Evaluation of
comparative protein modeling by MODELLER. Prot.: Struct., Func. Gene.
29:318-326.
Adrian, A. C., A. S., Andrew and L. D., Roland. 2003. A graph-theory algorithm for
rapid protein side-chain prediction. Prot. Sci. 12: 2001–2014.
Basak, I., D., Pemra and B., Ivet. 2002. Functional Motions of Influenza Virus
Hemagglutinin: A Structure-Based Analytical Approach. Biophys. J. 82: 569-
581.
Balamurugan, B., M. N. A. M., Roshan, B. S., Hameed, K., Sumathi, R.,
Senthilkumar, A., Udayakumar, K. H. V., Babu, M., Kalaivani, G., Sowmiya,
P., Sivasankari, S., Saravanan, C. V., Ranjani, K., Gopalakrishnan, K. N.,
Selvakumar, M., Jaikumar, T., Brindha, D., Michaela and K., Sekar. 2007.
PSAP: protein structure analysis package. Journal of Applied
Crystallography. 40: 773-777.
Clayton, W. N., S. H., Virginia and G. W., Robert. 1984. Mutations in the
Hemagglutinin Receptor-Binding Site Can Change the Biological Properties
of an Influenza Virus. J. Virol. 51(2): 567–569.
Cros, J. and P., Palese. 2003. Trafficking of viral genomic RNA into and out of the
nucleus: influenza, Thogoto and Borna disease viruses. Virus Res. 95(1-2):3-
12.
David, E., S., Min-yi, D., Damien, M., Francisco, S., Andrej and A. M., Marc. 2006.
A composite score for predicting errors in protein structure models. Prot. Sci.
15: 1653-1666.
Edwin, D. K.. 2006. Influenza Pandemics of the 20th Century. Emerging Infectious
Diseases. 12(1): 9-14.
Eran, E., N., Rafael, J. M., Brendan, E., Marvin and S., Vladimir. 2004. Importance of
Solvent Accessibility and Contact Surfaces in Modeling Side-Chain
Conformations in Proteins. J. Comput. Chem. 25(5): 712-724.
Eswar, N., B., Webb, M. A., Marti-Renom, M. S., Madhusudhan, D., Eramian, M.,
Shen, U., Pieper and A., Sali. 2006. Comparative protein structure modeling
using Modeller. Curr. Prot. Bioinfo. Suppl. 15: 5.6.1-5.6.30.
Fiser, A., R. K., Do and A., Sali. 2000. Modeling of loops in protein structures. Prot.
Sci. 9: 1753-1773.
Gubareva, L. V., L., Kaiser and F. G., Hayden. 2000. Influenza virus neuraminidase
inhibitors. Lancet 355(9206): 827-835.
John, J. S. and C. W., Don. 2000. RECEPTOR BINDING AND MEMBRANE
FUSION IN VIRUS ENTRY: The Influenza Hemagglutinin. Annu. Rev.
Biochem. 2000. 69: 531–569.
John, K. O.. 1998. Scripting: Higher Level Programming Languages for the 21st
Century. IEEE Computer. 98: 23–30.
Kash, J., A., Goodman, M., Korth and M., Katze. 2006. Hijacking of the host-cell
response and translational control during influenza virus infection. Virus Res.
119(1):111-120.
Konstantin, A., B., Lorenza, K., Jurgen and S., Torsten. 2006. The SWISS-MODEL
workspace: a web-based environment for protein structure homology
modeling. BIOINFORMATICS. 22(2): 195-201.
Marc, A. M., P., Ursula, M. S., Madhusudhan, R., Andrea, E., Narayanan, P. D., Fred,
A., Fatima, D., Joaquin and S., Andrej. 2007. DBAli tools: mining the protein
structure space. Nucleic. Acids. Res. 35: W393-397.
___________, F., Andras, M. S., Madhusudhan, J., Bino, S., Ashley, E., Narayanan,
P., Ursula, S., Min-yi, and S., Andrej. 2003. Modeling Protein Structure from
Its Sequence. Curr. Prot. Bioinfo. 5.1: 5.1.1-5.1.32.
Melike, L., J. R., Michael, P. B., Hazen and Z., Xiaowei. 2003. Visualizing infection
of individual influenza viruses. PNAS 100(16):9280–9285.
Michal, J. P., T., Irina and M. B., Janusz. 2007. PROTMAP2D: visualization,
comparison and analysis of 2D maps of protein structure.
BIOINFORMATICS. 23(11): 1429-1430.
Narayanan, E., J., Bino, M., Nebojsa, F., Andras, A., Valentin, P., Ursula, C. S.,
Ashley, A. M., Marc, M. S., Madhusudhan, Y., Bozidar and S., Andrej. 2003.
Tools for comparative protein structure modeling and analysis. Nucleic.
Acids. Res. 31(13): 3375-3380.
Needleman, S. B. and C. D., Wunsch. 1970. A general method applicable to the
search for similarities in the amino acid sequence of two proteins. J. Mol.
Biol. 48: 443-453.
Patrick, A., Q., Enrique, X. A., Francesc and J. E. S., Michael. 2001. Automated
Structure-based Prediction of Functional Sites in Proteins: Applications to
Assessing the Validity of Inheriting Protein Function from Homology in
Genome Annotation and to Protein Docking. J. Mol. Biol. 311: 395-408.
Roshan, B. B., M. N. A., Md, B. S., Hameed, K., Sumathi, R., Senthilkumar, A.,
Udayakumar, K. H. V., Babu, M., Kalaivani, G., Sowiya, P., Sivasankari, S.,
Saravanan, C. V., Ranjani, K., Gopalakrishnan, K.N., Selvakumar, M.,
Jaikumar, T., Brindha, D., Michael and K., Sekar. 2007. PSAP: protein
structure analysis package. J.Appl.Cryst. 40: 773–777.
Robert, G. W., J. B., William, T. G., Owen, M. C., Thomas, and K., Yoshihiro. 1992.
Evolution and Ecology of Influenza A Viruses. Microbiol Rev. 56(1): 152-
179.
Shiuh-Ming, L, W. M., Martha and S. H., Virginia. 1992. Hemagglutinin Mutations
Related to Antigenic Variation in H1 Swine Influenza Viruses. J. Virol.
66(2):1066-1073.
Stephen, C.. 2007. SChiSM2: creating interactive web page annotations of molecular
structure models using Jmol. BIOINFORMATICS. 23(3): 383-384.
Suzuki, Y.. 2005. Sialobiology of Influenza Molecular Mechanism of Host Range
Variation of Influenza Viruses. Biol. Pharm. Bull. 28(3): 399-408.
Torsten, S., K., Jurgen, G., Nicolas and C. P., Manuel. 2003. SWISS-MODEL: an
automated protein homology-modeling server. Nucleic. Acids. Res. 31(13):
3381–3385.
Thomas, J. O.. 2004. A Java applet for multiple linked visualization of protein
structure and sequence. J. Comp. Aid. Mol. Design. 18: 225-234.
Ursula, P., E., Narayanan, P. D., Fred, B., Hannes, M. S., Madhusudhan, R., Andrea,
M., Marc, K., Rachel, M. W., Ben, E., David, S., Min-Yi, K., Libusha, M.,
Francisco and S., Andrej. 2006. MODBASE: a database of annotated
comparative protein structure models and associated resources. Nucleic.
Acids. Res. 34: D291-D295.
Valentin, A. I., P., Ursula, C. S., Ashley, A. M., Marc, M., Linda and S., Andrej. 2003.
ModView, visualization of multiple protein sequences and structures.
BIOINFORMATICS. 19(1): 165-166.
Vladimir, S., E., Eran, G., Sergey, P., Vladimir, B., Mariana, P., Jaime and E.,
Marvin. 2005. SPACE: a suite of tools for protein structure prediction and
analysis based on complementarity and environment. Nucleic. Acids. Res. 33:
W39-W43.
Yutaka, U. and A., Kiyoshi. 2002. MOSBY: a molecular structure viewer program
with portability and extensibility. J. Mol. Graph. Mod. 20: 411-413.
APPENDICES
Appendix A
Methodologies implemented in MODELLER
1. Structure optimization method
The general form of the objective function and the structure of the optimization
implemented in MODELLER are similar to those of molecular dynamics programs,
such as CHARMM.
1.1 Objective function
MODELLER minimizes the objective function F with respect to the
Cartesian coordinates of about 10,000 atoms (3D points) that form a system of one or
more molecules:

$$F = F(R) = F_{symm} + \sum_i c_i(f_i, p_i) \qquad (1)$$

where $F_{symm}$ is an optional symmetry term, $R$ are the Cartesian
coordinates of all atoms, $c_i$ is restraint term $i$, $f_i$ is a geometric feature of a
molecule, and $p_i$ are parameters. For a 10,000-atom system there can be on the order
of 200,000 restraints. The form of $c$ is simple; it includes a quadratic function, cosine,
a weighted sum of a few Gaussian functions, Coulomb law, Lennard-Jones potential,
cubic splines, and some other simple functions. The geometric features presently
include a distance, an angle, a dihedral angle, and a pair of dihedral angles, defined
between two, three, four, and eight atoms, respectively; the shortest distance in a set of
distances (not documented further); solvent accessibility in Å$^2$; and atom density
expressed as the number of atoms around the central atom. A pair of dihedral angles
can be used to restrain such strongly correlated features as the mainchain dihedral
angles $\Phi$ and $\Psi$. Each of the restraints also depends on a few parameters $p_i$ that
generally vary from restraint to restraint. Some restraints can restrain pseudo-atoms
such as the gravity center of several atoms.
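The sum over restraint terms in Equation (1) can be illustrated with a minimal Python sketch. It is not MODELLER code: the function names, the toy coordinates and the choice of a harmonic (Gaussian-derived) distance restraint are assumptions made only for illustration.

```python
import math


def distance(coords, i, j):
    """Geometric feature f: Euclidean distance between atoms i and j."""
    return math.dist(coords[i], coords[j])


def gaussian_restraint(value, mean, stdev):
    """Restraint term c(f, p): harmonic penalty, i.e. the negative log of a
    Gaussian pdf with the constant term dropped (illustrative choice)."""
    return 0.5 * ((value - mean) / stdev) ** 2


def objective(coords, restraints, f_symm=0.0):
    """F = F_symm + sum_i c_i(f_i, p_i) for (i, j, mean, stdev) restraints."""
    total = f_symm
    for i, j, mean, stdev in restraints:
        total += gaussian_restraint(distance(coords, i, j), mean, stdev)
    return total


# Hypothetical three-atom system with two distance restraints.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.6, 0.0)]
restraints = [(0, 1, 1.5, 0.1), (1, 2, 1.5, 0.1)]
print(objective(coords, restraints))
```

The first restraint is satisfied exactly, so only the second one, which is violated by one standard deviation, contributes to the total.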
There are two kinds of restraints, static and dynamic, that both
contribute to the objective function:

$$F = F_{symm} + F_s + F_d \qquad (2)$$
The static restraints and their parameters are pre-defined. The
dynamic restraints are re-generated repeatedly during optimization. All dynamic
restraints are always selected and they can restrain only pairs of atoms. In all other
respects, the two kinds of restraints are the same.
The dynamic restraints are obtained from a dynamic pairs list (the
non-bonded pairs list). Each dynamic pair corresponds to at least one restraint, which
may or may not be violated. The dynamic pairs list includes only the pairs of atoms
that satisfy the following three conditions: (1) One or both atoms in a pair are allowed
to move. (2) The two atoms are not connected through one, two, or three chemical
bonds. (3) The two atoms are closer than a preset cutoff distance. There are on the
order of 5000 atom pairs in the dynamic pairs list when only soft-sphere overlap
restraints are used. Currently, the restraint types on the dynamic atom pairs that can be
selected include the soft-sphere overlap, Lennard-Jones, Coulomb interactions, and
MODELLER non-bonded spline restraints. The existence of the dynamic pairs list is
justified by the fact that dynamic pairs are usually a small fraction of all possible
atom-atom pairs, $N(N-1)/2$, where $N$ is the number of atoms in a system.
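The three conditions for membership of the dynamic pairs list can be sketched in Python as follows. This is a brute-force illustration over all atom pairs, not the algorithm MODELLER uses; the function names, the `movable` flags and the toy chain are assumptions.

```python
import math
from itertools import combinations


def bonded_within(bonds, n_atoms, max_bonds=3):
    """Return the set of atom pairs connected through 1..max_bonds bonds."""
    neighbors = {a: set() for a in range(n_atoms)}
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)
    excluded = set()
    for start in range(n_atoms):
        frontier, seen = {start}, {start}
        for _ in range(max_bonds):
            # Expand one bond further from the current frontier.
            frontier = {n for a in frontier for n in neighbors[a]} - seen
            seen |= frontier
        excluded |= {(min(start, a), max(start, a)) for a in seen if a != start}
    return excluded


def dynamic_pairs(coords, bonds, movable, cutoff):
    """List the atom pairs that satisfy conditions (1) to (3)."""
    excluded = bonded_within(bonds, len(coords))
    pairs = []
    for i, j in combinations(range(len(coords)), 2):
        if not (movable[i] or movable[j]):            # (1) at least one atom can move
            continue
        if (i, j) in excluded:                        # (2) not within three bonds
            continue
        if math.dist(coords[i], coords[j]) < cutoff:  # (3) closer than the cutoff
            pairs.append((i, j))
    return pairs


# Hypothetical linear chain of six atoms spaced 1.0 apart.
chain = [(float(k), 0.0, 0.0) for k in range(6)]
chain_bonds = [(k, k + 1) for k in range(5)]
print(dynamic_pairs(chain, chain_bonds, [True] * 6, cutoff=4.5))
```

For this chain only the pairs separated by four bonds and lying within the cutoff remain, which shows how small the list is compared with all $N(N-1)/2$ pairs.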
The dynamic pairs list is not necessarily re-generated each time the
objective function is evaluated, although the contribution of the restraint to the
objective function is calculated in each call to the objective function routine with the
current values of the Cartesian coordinates. The dynamic pairs list is re-generated
only when maximal atomic shifts accumulate to a value larger than a preset cutoff.
This cutoff is chosen such that there cannot be a violation of a restraint without
having its atom pair on the dynamic pairs list. The dynamic pairs list is recalculated in
~20% and ~2% of the objective function calls at the beginning and the end of
optimization, respectively.
Each evaluation of the objective function or of its first derivatives
with respect to the Cartesian coordinates involves the following steps:
a) Calculate non-fixed pseudo-atoms from the current atomic
positions.
b) Update the dynamic pairs list, if necessary.
c) Calculate the violations of selected restraints and all other
quantities that are shared between the calculations of the objective function and its
derivatives.
d) Sum the contributions of all violated restraints to the
objective function and the derivatives.
1.2 Optimizer
MODELLER currently implements a Beale restart conjugate gradients
algorithm and a molecular dynamics procedure with the leap-frog Verlet integrator.
The conjugate gradients optimizer is usually used in combination with the variable
target function method which is implemented with the automodel class. The
molecular dynamics procedure can be used in a simulated annealing protocol that is
also implemented with the automodel class.
1.2.1 Molecular dynamics
Force in MODELLER is obtained by equating the objective
function F with internal energy in kcal/mole. The atomic masses are all set to that of
$^{12}$C (MODELLER unit is kg/mole). The initial velocities at a given temperature are
obtained from a Gaussian random number generator with a mean and standard
deviation of:

$$\bar{v}_x = 0 \qquad (4)$$

$$\sigma_x = \sqrt{\frac{k_B T}{m_i}} \qquad (5)$$

where $k_B$ is the Boltzmann constant, $m_i$ is the mass of one $^{12}$C
atom, and the velocity is expressed in angstroms/femtosecond. The Newtonian
equations of motion are integrated by the leap-frog Verlet algorithm:

$$\dot{r}_i\left(t + \frac{\delta t}{2}\right) = \dot{r}_i\left(t - \frac{\delta t}{2}\right) - \frac{\delta t}{m_i} \frac{\partial F}{\partial r_i}(t) \qquad (6)$$

$$r_i(t + \delta t) = r_i(t) + \dot{r}_i\left(t + \frac{\delta t}{2}\right) \delta t \qquad (7)$$
where $r_i$ is the position of atom i. In addition, velocity is
capped at a maximum value, before calculating the shift, such that the maximal shift
along one axis can only be cap_atom_shift. The velocities can be equilibrated every
equilibrate steps to stabilize temperature. This is achieved by scaling the velocities
with a factor f:
$$f = \sqrt{\frac{T}{E_{kin}}} \qquad (8)$$

$$E_{kin} = \frac{1}{2} \sum_i^N m_i \dot{r}_i^2 \qquad (9)$$

where $E_{kin}$ is the current kinetic energy of the system.
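A leap-frog Verlet update with an optional cap on the per-step shift can be sketched in Python for a single one-dimensional coordinate. This is an illustrative sketch, not MODELLER's implementation: the harmonic test function, the parameter name `cap_shift` (standing in for cap_atom_shift) and the unit mass are assumptions.

```python
def harmonic_gradient(r, k=1.0):
    """dF/dr for the toy objective F = 0.5 * k * r**2."""
    return k * r


def leapfrog(grad, r, v_half, mass, dt, n_steps, cap_shift=None):
    """Integrate one coordinate with the leap-frog scheme.

    v(t + dt/2) = v(t - dt/2) - (dt / m) * dF/dr(t)
    r(t + dt)   = r(t) + dt * v(t + dt/2)
    If cap_shift is given, the velocity is capped before the shift so that
    a single step never moves the coordinate by more than cap_shift.
    """
    for _ in range(n_steps):
        v_half = v_half - dt / mass * grad(r)
        if cap_shift is not None:
            v_max = cap_shift / dt
            v_half = max(-v_max, min(v_max, v_half))
        r = r + dt * v_half
    return r, v_half


r_end, v_end = leapfrog(harmonic_gradient, 1.0, 0.0, 1.0, 0.01, 1000)
# Total energy of the harmonic oscillator should stay close to its initial 0.5.
print(0.5 * r_end ** 2 + 0.5 * v_end ** 2)
```

Because positions and velocities are stored half a step apart, the printed energy fluctuates slightly around the initial value rather than matching it exactly.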
1.2.2 Langevin dynamics
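Langevin dynamics (LD) is implemented by modifying the equations
of motion as follows:

$$\dot{r}_i\left(t + \frac{\delta t}{2}\right) = \dot{r}_i\left(t - \frac{\delta t}{2}\right) \frac{1 - \frac{1}{2}\gamma\,\delta t}{1 + \frac{1}{2}\gamma\,\delta t} + \frac{\delta t}{m_i} \, \frac{1}{1 + \frac{1}{2}\gamma\,\delta t} \left( R_i - \frac{\partial F}{\partial r_i}(t) \right) \qquad (10)$$

where $\gamma$ is a friction factor (in fs$^{-1}$) and $R_i$ a random force,
chosen to have zero mean and standard deviation

$$\sigma(R_i) = \sqrt{\frac{2 \gamma m_i k_B T}{\delta t}} \qquad (11)$$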
1.2.3 Self-guided MD and LD
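MODELLER also implements the self-guided MD and LD
methods. For self-guided MD, the equations of motion are modified as follows:

$$g_i(t) = \left(1 - \frac{\delta t}{t_l}\right) g_i(t - \delta t) + \frac{\delta t}{t_l} \left( \xi\, g_i(t - \delta t) - \frac{\partial F}{\partial r_i}(t) \right) \qquad (12)$$

$$\dot{r}_i\left(t + \frac{\delta t}{2}\right) = \dot{r}_i\left(t - \frac{\delta t}{2}\right) + \frac{\delta t}{m_i} \left( \xi\, g_i(t) - \frac{\partial F}{\partial r_i}(t) \right) \qquad (13)$$

where $\xi$ is the guiding factor, the same for all atoms, $t_l$ the
guide time in femtoseconds, and $g_i$ a guiding force, set to zero at the start of the
simulation. Position $r_i$ is updated in the usual way. For self-guided Langevin
dynamics, the guiding forces are determined as follows:

$$g_i(t) = \left(1 - \frac{\delta t}{t_l}\right) g_i(t - \delta t) + \frac{\delta t}{t_l}\, \gamma\, m_i\, \dot{r}_i\left(t - \frac{\delta t}{2}\right) \qquad (14)$$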
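A scaling parameter $\chi$ is then determined by first making an
unconstrained half step:

$$\dot{r}'_i(t) = \dot{r}_i\left(t - \frac{\delta t}{2}\right) + \frac{1}{2} \frac{\delta t}{m_i} \left( \xi\, g_i(t) + R_i - \frac{\partial F}{\partial r_i}(t) \right) \qquad (15)$$

$$\lambda = \frac{\xi\, \delta t}{2} \, \frac{\sum_i^N g_i(t) \cdot \dot{r}'_i(t)}{\sum_i^N m_i\, \dot{r}'_i(t)^2} \qquad (16)$$

$$\chi = \left( 1 + \lambda + \frac{\gamma\, \delta t}{2} \right)^{-1} \qquad (17)$$

Finally, the velocities are advanced using the scaling factor:

$$\dot{r}_i\left(t + \frac{\delta t}{2}\right) = (2\chi - 1)\, \dot{r}_i\left(t - \frac{\delta t}{2}\right) + \chi\, \frac{\delta t}{m_i} \left( \xi\, g_i(t) + R_i - \frac{\partial F}{\partial r_i}(t) \right) \qquad (18)$$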
1.2.4 Rigid molecular dynamics
Where rigid bodies are used, these are optimized separately
from the other atoms in the system. This has the additional advantage of reducing the
number of degrees of freedom. The state of each rigid body is specified by the
position of the center of mass, $r_{COM}$, and an orientation quaternion, $\tilde{q}$. The quaternion
has 4 components, $q_1$ through $q_4$, of which the first three refer to the vector part, and
the last to the scalar. The translational and rotational motions of each body are
separated. Each body is translated about its center of mass using the standard Verlet
equations using the force:

$$\frac{\partial F}{\partial r_{COM}} = \sum_i \frac{\partial F}{\partial r_i} \qquad (19)$$

where the sum over $i$ runs over all atoms in the rigid body, and
$r_i$ is the position of atom $i$ in real space. For the rotational motion, the orientation
quaternions are again integrated using the same Verlet equations. For this, the
quaternion accelerations are calculated using the following relation:

$$W = \begin{pmatrix} q_4 & q_3 & -q_2 & -q_1 \\ -q_3 & q_4 & q_1 & -q_2 \\ q_2 & -q_1 & q_4 & -q_3 \\ q_1 & q_2 & q_3 & q_4 \end{pmatrix} \qquad (20)$$

where $W$ is the orthogonal matrix and $\dot{\omega}'_k$ is the first derivative
of the angular velocity (in the body-fixed frame) about axis $k$, that is, the angular
acceleration. These angular accelerations are in turn calculated from the Euler
equations for rigid body rotation, such as:

$$\dot{\omega}'_x = \frac{T_x + (I_y - I_z)\, \omega'_y\, \omega'_z}{I_x} \qquad (21)$$
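Similar equations exist for the y and z components. The
angular velocities $\omega'$ are obtained from the quaternion velocities $\dot{\tilde{q}}$:

$$\begin{pmatrix} \omega'_x \\ \omega'_y \\ \omega'_z \\ 0 \end{pmatrix} = 2\, W\, \dot{\tilde{q}} \qquad (22)$$

The torque, $T$, in the body-fixed frame, is calculated as

$$T = A \sum_i (r_i - r_{COM}) \times \left( -\frac{\partial F}{\partial r_i} \right) \qquad (23)$$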
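and $A$ is the rotation matrix to convert from world space to
body space:

$$A = 2 \begin{pmatrix} q_4^2 + q_1^2 - \frac{1}{2} & q_1 q_2 + q_3 q_4 & q_1 q_3 - q_2 q_4 \\ q_1 q_2 - q_3 q_4 & q_4^2 + q_2^2 - \frac{1}{2} & q_2 q_3 + q_1 q_4 \\ q_1 q_3 + q_2 q_4 & q_2 q_3 - q_1 q_4 & q_4^2 + q_3^2 - \frac{1}{2} \end{pmatrix} \qquad (24)$$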
Finally, the component of the inertia tensor, $I_x$, is given
by

$$I_x = \sum_i m_i \left( r'^{\,2}_{i,y} + r'^{\,2}_{i,z} \right) \qquad (25)$$

where $r'_i$ is the position of each atom in body space, and $m_i$ is
the mass of atom $i$ (taken to be the mass of one $^{12}$C atom, as above). Similar relations
exist for the y and z components. The kinetic energy of each rigid body, used for
temperature control, is given as a combination of translational and rotational
components:

$$E^{body}_{kin} = \frac{1}{2} \left( \sum_i m_i \right) \dot{r}_{COM}^2 + \frac{1}{2} \left( I_x \omega'^{\,2}_x + I_y \omega'^{\,2}_y + I_z \omega'^{\,2}_z \right) \qquad (26)$$

Initial translational and rotational velocities of each rigid body
are set in the same way as for atomistic dynamics.
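The conversion from an orientation quaternion to the world-to-body rotation matrix can be checked numerically with a short Python sketch. It assumes the common unit-quaternion convention with the scalar part stored last, which is our reading of Equation (24); the function names and test quaternions are illustrative and are not part of MODELLER.

```python
import math


def normalize(q):
    """Scale a quaternion (q1, q2, q3, q4) to unit length."""
    norm = math.sqrt(sum(c * c for c in q))
    return tuple(c / norm for c in q)


def rotation_matrix(q):
    """World-to-body rotation matrix from a unit quaternion, scalar part q4 last."""
    q1, q2, q3, q4 = q
    return [
        [2 * (q4 * q4 + q1 * q1 - 0.5), 2 * (q1 * q2 + q3 * q4), 2 * (q1 * q3 - q2 * q4)],
        [2 * (q1 * q2 - q3 * q4), 2 * (q4 * q4 + q2 * q2 - 0.5), 2 * (q2 * q3 + q1 * q4)],
        [2 * (q1 * q3 + q2 * q4), 2 * (q2 * q3 - q1 * q4), 2 * (q4 * q4 + q3 * q3 - 0.5)],
    ]


def times_transpose(a):
    """Return A * A^T, which is the identity for any rotation matrix."""
    return [[sum(a[i][k] * a[j][k] for k in range(3)) for j in range(3)] for i in range(3)]


identity = rotation_matrix((0.0, 0.0, 0.0, 1.0))
arbitrary = rotation_matrix(normalize((0.1, -0.4, 0.3, 0.8)))
print(identity)
print(times_transpose(arbitrary))
```

The identity quaternion gives the identity matrix, and any normalized quaternion gives an orthogonal matrix, which is a quick sanity check on the signs of the off-diagonal terms.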
1.2.5 Rigid minimization
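The state of each rigid body is specified by 6 parameters: the
position of the center of mass, $r_{COM}$, and the rotations in radians about the body-fixed
axes: $\theta_x$, $\theta_y$, and $\theta_z$. The first derivatives of the objective function F with respect to
the center of mass are obtained from Equation (19), and those with respect to the angles from:

$$\frac{\partial F}{\partial \theta_k} = \sum_i M_k\, r'_i \cdot \frac{\partial F}{\partial r_i} \qquad (27)$$

The transformation matrices $M_k$ are given as:

$$M_x = \begin{pmatrix}
0 & \sin\theta_z \sin\theta_x + \cos\theta_z \sin\theta_y \cos\theta_x & \sin\theta_z \cos\theta_x - \cos\theta_z \sin\theta_y \sin\theta_x \\
0 & -\cos\theta_z \sin\theta_x + \sin\theta_z \sin\theta_y \cos\theta_x & -\cos\theta_z \cos\theta_x - \sin\theta_z \sin\theta_y \sin\theta_x \\
0 & \cos\theta_y \cos\theta_x & -\cos\theta_y \sin\theta_x
\end{pmatrix}$$

$$M_y = \begin{pmatrix}
-\cos\theta_z \sin\theta_y & \cos\theta_z \cos\theta_y \sin\theta_x & \cos\theta_z \cos\theta_y \cos\theta_x \\
-\sin\theta_z \sin\theta_y & \sin\theta_z \cos\theta_y \sin\theta_x & \sin\theta_z \cos\theta_y \cos\theta_x \\
-\cos\theta_y & -\sin\theta_y \sin\theta_x & -\sin\theta_y \cos\theta_x
\end{pmatrix}$$

$$M_z = \begin{pmatrix}
-\sin\theta_z \cos\theta_y & -\cos\theta_z \cos\theta_x - \sin\theta_z \sin\theta_y \sin\theta_x & \cos\theta_z \sin\theta_x - \sin\theta_z \sin\theta_y \cos\theta_x \\
\cos\theta_z \cos\theta_y & -\sin\theta_z \cos\theta_x + \cos\theta_z \sin\theta_y \sin\theta_x & \sin\theta_z \sin\theta_x + \cos\theta_z \sin\theta_y \cos\theta_x \\
0 & 0 & 0
\end{pmatrix} \qquad (28)$$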
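The atomic positions $r_i$ are reconstructed when necessary from
the body's orientation by means of the following relation, where $M$ is the rotation
matrix:

$$r_i = M\, r'_i + r_{COM} \qquad (29)$$

$$M = \begin{pmatrix}
\cos\theta_z \cos\theta_y & -\sin\theta_z \cos\theta_x + \cos\theta_z \sin\theta_y \sin\theta_x & \sin\theta_z \sin\theta_x + \cos\theta_z \sin\theta_y \cos\theta_x \\
\sin\theta_z \cos\theta_y & \cos\theta_z \cos\theta_x + \sin\theta_z \sin\theta_y \sin\theta_x & -\cos\theta_z \sin\theta_x + \sin\theta_z \sin\theta_y \cos\theta_x \\
-\sin\theta_y & \cos\theta_y \sin\theta_x & \cos\theta_y \cos\theta_x
\end{pmatrix} \qquad (30)$$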
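The relation between the rotation matrix of Equation (30) and the transformation matrices of Equation (28) can be checked numerically. The Python sketch below assumes that $M$ is the product of rotations about the z, y and x axes (in that order) and that $M_x$ is its partial derivative with respect to $\theta_x$; both are our reading of the equations, and the function names are illustrative.

```python
import math


def rotation(tx, ty, tz):
    """Rotation matrix M = Rz(tz) * Ry(ty) * Rx(tx)."""
    sx, cx = math.sin(tx), math.cos(tx)
    sy, cy = math.sin(ty), math.cos(ty)
    sz, cz = math.sin(tz), math.cos(tz)
    return [
        [cz * cy, -sz * cx + cz * sy * sx, sz * sx + cz * sy * cx],
        [sz * cy, cz * cx + sz * sy * sx, -cz * sx + sz * sy * cx],
        [-sy, cy * sx, cy * cx],
    ]


def rotation_dx(tx, ty, tz):
    """Analytic derivative of M with respect to tx (the matrix M_x)."""
    sx, cx = math.sin(tx), math.cos(tx)
    sy, cy = math.sin(ty), math.cos(ty)
    sz, cz = math.sin(tz), math.cos(tz)
    return [
        [0.0, sz * sx + cz * sy * cx, sz * cx - cz * sy * sx],
        [0.0, -cz * sx + sz * sy * cx, -cz * cx - sz * sy * sx],
        [0.0, cy * cx, -cy * sx],
    ]


def finite_difference_dx(tx, ty, tz, h=1e-6):
    """Central finite-difference estimate of dM/dtx."""
    plus, minus = rotation(tx + h, ty, tz), rotation(tx - h, ty, tz)
    return [[(plus[i][j] - minus[i][j]) / (2 * h) for j in range(3)] for i in range(3)]


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two 3x3 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))


angles = (0.3, -0.7, 1.1)  # arbitrary test angles in radians
print(max_abs_diff(rotation_dx(*angles), finite_difference_dx(*angles)))
```

The same check can be repeated for $\theta_y$ and $\theta_z$ by differencing the corresponding argument.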
2 Statistical potential
The statistical potential for the assessment and prediction of protein
structure that is implemented in MODELLER and was used in this work is the
Discrete Optimized Protein Energy (DOPE) score. DOPE is an atomic distance-dependent
statistical potential calculated from a sample of native protein structures.
Prediction of the native structure of a protein would be enabled by
a scoring function whose global optimum corresponds to the native
structure. One such function is the joint probability density function (pdf) of the
Cartesian coordinates of the protein atoms, given available information $I$ about the
system, $p(\vec{x}_1, \vec{x}_2, \vec{x}_3, \ldots, \vec{x}_N \mid I)$, where $N$ is the number of atoms in the protein and $\vec{x}_i$
are the Cartesian coordinates of atom $i$. For each atom in a given protein, the joint
pdf $p$ gives the probability density that atom $i$ of the
native structure is positioned very close to $\vec{x}_i$. In general, information $I$ may include
the sequence of the protein, a molecular mechanics force field, experimental structural
information, a sample of known native structures and an alignment of the sequence to a
related known protein structure. The joint pdf $p$ can be approximated by a normalized
product of the pair pdfs for all protein atom pairs:

$$p(\vec{x}_1, \vec{x}_2, \vec{x}_3, \ldots, \vec{x}_N) \approx \frac{\prod_{i \neq j}^{N} p(\vec{x}_i, \vec{x}_j)}{\prod_i^N p(\vec{x}_i)^{N-2}} \propto \prod_{i \neq j}^{N} p(\vec{x}_i, \vec{x}_j) \qquad (31)$$

The denominator is derived from the condition that the joint pdf must be
a product of single-body pdfs when all the pair pdfs are uncorrelated with each other.
The terms in the denominator, $p(\vec{x}_i)$, are single-body distribution functions that
depend only on the composition of the protein and the total volume of the system. In
other words, $p(\vec{x}_i)$ is the number density of atom $i$, equal to the reciprocal volume of
the system. Because $p(\vec{x}_i)$ is constant for a given protein, it does not affect the rank
order of different conformations and is ignored here.
In the context of statistical mechanical liquid state theory, Equation
(31) is also known as the Kirkwood superposition approximation. The superposition
approximation would be exact only if all the pair pdfs were mutually independent.
The pair pdfs $p(\vec{x}_i, \vec{x}_j)$ of atom pairs are generally interdependent
because each atom in the system interacts with more than one other atom. In general,
the Kirkwood approximation of the joint pdf $p(\vec{x}_1, \vec{x}_2, \vec{x}_3, \ldots, \vec{x}_N)$ by pair pdfs $p(\vec{x}_i, \vec{x}_j)$
is clearly more accurate than a product of $N$ single-body pdfs, $p(\vec{x}_i)$.
The pair pdf $p(\vec{x}_i, \vec{x}_j)$ is estimated for all atom pairs $(i, j)$ using a single
sample native structure. A structure is defined by internal coordinates that are
invariant with respect to translation and rotation. Thus, the interparticle distance $r$
between $\vec{x}_i$ and $\vec{x}_j$ is the most relevant internal coordinate for a pair of atoms.
Consequently, the distribution that can be estimated directly from a sample native
structure is the distance pdf for a pair of atom types:

$$p_{mn}(r) = \frac{N_{mn}(r)}{\sum_{r_i} N_{mn}(r_i)} \qquad (32)$$

where $m$ and $n$ denote the atom types and $N_{mn}(r)$ is the number of atom
type pairs $(m,n)$ at a distance within $[r, r + \Delta r]$. The distance pdf is proportional to the
number of $(m,n)$ pairs in a spherical shell of volume $4\pi r^2 \Delta r$; thus the density of the
$(m,n)$ pairs in the shell is $p_{mn}(r) / (4\pi r^2 \Delta r)$. For a finite and nonspherical native
structure, only a fraction $\xi(r)$ of the spherical shell between $r$ and $r + \Delta r$ centered on $\vec{x}_i$ is
occupied by protein atoms, $0 \le \xi(r) \le 1$ (Appendix Figure A1). Thus, the density of the
$(m,n)$ pairs at the distance $r$ is $p_{mn}(r) / (4\pi r^2 \xi(r) \Delta r)$.
Appendix Figure A1 Schematic representation of the reference state
Source: Min-yi Shen and Andrej Sali, (2006)
In Appendix Figure A1, (A) illustrates why only a
fraction of a spherical shell generally contributes to the normalization function (see
Equation (33)). (B) A pair of noninteracting atoms in a protein is modeled by two
points positioned randomly inside a sphere with radius a; the points are at distance r from
each other. (C) The large and small spheres are the reference and probe spheres.
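The distance pdf $p_{m,n}(r)$ is related to the pair pdf $p(\vec{x}_i, \vec{x}_j)$ as follows. The
probability of finding atom $i$ at $\vec{x}_i$ and atom $j$ at $\vec{x}_j$ is $p(\vec{x}_i)\, p(\vec{x}_j)$. Therefore, the
pair pdf $p(\vec{x}_i, \vec{x}_j)$ is the product of the pair probability $p(\vec{x}_i)\, p(\vec{x}_j)$ and the $(m,n)$ pair
density:

$$p(\vec{x}_i, \vec{x}_j) = p(\vec{x}_i)\, p(\vec{x}_j)\, \frac{p_{m,n}(r)}{4\pi r^2 \xi(r) \Delta r} = p(\vec{x}_i)\, p(\vec{x}_j)\, \frac{p_{m,n}(r)}{n(r)} \propto \frac{p_{m,n}(r)}{n(r)} \qquad (33)$$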
where $n(r)$ is the normalization function equal to $4\pi r^2 \xi(r) \Delta r$, and $m$ and $n$
are the types of atoms $i$ and $j$, respectively. The single-body pdf $p(\vec{x}_i)$ is the number
density of atom $i$ and is ignored because it does not affect the ranking of different
conformations of the same protein. The calculation of the normalization function $n(r)$
is not straightforward because the native structures are finite and vary in size
(Appendix Figure A1). Therefore, we explicitly denote $n(r)$ as dependent on the size
$a$ of the sample native structure, $n(r; a)$. The size $a$ is defined as the radius of the sphere
of uniform density that has the same radius of gyration $R_g$ as the sample native
structure; thus, $a = \sqrt{5/3}\, R_g$. Similarly, the distance pdf $p_{m,n}(r)$ is written as $p_{m,n}(r; a)$.
The pair pdf $p(\vec{x}_i, \vec{x}_j)$ for the sample of all native structures is calculated
as a weighted sum of the pair pdfs corresponding to the individual sample structures:

$$p(\vec{x}_i, \vec{x}_j) \propto \sum_s w_s\, \frac{p_{m,n}(r; a)}{n(r; a)} \qquad (34)$$

where index $s$ runs over all sample native structures. This averaging
procedure is based on the presumed independence of the pair pdf from the protein
size. The joint pdf $p$ defines the N-body correlation function $g$:

$$p(\vec{x}_1, \vec{x}_2, \vec{x}_3, \ldots, \vec{x}_N) = p(\vec{x}_1)\, p(\vec{x}_2) \cdots p(\vec{x}_N)\; g^{(n)}(\vec{x}_1, \vec{x}_2, \vec{x}_3, \ldots, \vec{x}_N) \qquad (35)$$

The total free energy $G$ of the system can then be expressed in terms of
the correlation function $g$:

$$G(\vec{x}_1, \vec{x}_2, \vec{x}_3, \ldots, \vec{x}_N) = -k_B T \ln g^{(n)}(\vec{x}_1, \vec{x}_2, \vec{x}_3, \ldots, \vec{x}_N) \qquad (36)$$

where $k_B$ is the Boltzmann constant. Therefore, an approximate free
energy of a system is

$$G(\vec{x}_1, \vec{x}_2, \vec{x}_3, \ldots, \vec{x}_N) \approx -k_B T \sum_{i \neq j}^{N} \ln g^{(n)}_{i,j}(r) = \sum_{i \neq j}^{N} u_{i,j}(r) \qquad (37)$$

where $g^{(n)}_{i,j}(r)$ is the radial distribution function equal to
$p_{m,n}(r) / n(r)$, and $u_{i,j}(r)$ is the potential of mean force for a pair of atoms,
$u_{i,j}(r) = -k_B T \ln g^{(n)}_{i,j}(r)$. Because $p^{REF}_{m,n}(r) = n(r)$ and thus $u_{i,j}(r) = 0$ for the
reference state, the potential of mean force derived from the observed distance pdf
$p_{m,n}(r)$ is

$$u_{i,j}(r) = -k_B T \ln \left[ \frac{p_{m,n}(r)}{p^{REF}_{m,n}(r)} \right] = -k_B T \ln \left[ \frac{N^{OBS}_{m,n}(r)}{N^{REF}_{m,n}(r)} \right] \qquad (38)$$

where $N^{OBS}_{m,n}(r)$ and $N^{REF}_{m,n}(r)$ are the numbers of atom type pairs $(m,n)$
at a distance within $(r, r + \Delta r)$ for the interacting real system and the
noninteracting reference state, respectively, and $p^{REF}_{m,n}(r)$ is the distance pdf of the
reference state, in which uncorrelated atoms have uniform density within the bounds
of the native structures. The reference state
is identical to the discharged state used for free energy calculations in statistical
mechanics. Equation (38) establishes the relation between the statistical potential
derived from a sample of known structures and the potential of mean force.
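The last step, Equation (38), turns observed and reference pair counts per distance bin into a potential of mean force. The Python sketch below illustrates only that step; the counts are invented, the temperature default and the treatment of empty bins are assumptions, and it is not the DOPE implementation in MODELLER.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def potential_of_mean_force(n_obs, n_ref, temperature=300.0):
    """u(r) = -k_B * T * ln(N_obs(r) / N_ref(r)) for each distance bin.

    Bins with no observed or no reference pairs are assigned +infinity,
    which is an illustrative choice rather than a documented convention.
    """
    potential = []
    for obs, ref in zip(n_obs, n_ref):
        if obs <= 0 or ref <= 0:
            potential.append(math.inf)
        else:
            potential.append(-K_B * temperature * math.log(obs / ref))
    return potential


# Hypothetical counts for three distance bins of one atom-type pair (m, n).
observed = [10, 20, 5]
reference = [10, 10, 10]
print(potential_of_mean_force(observed, reference))
```

A bin that is populated exactly as in the reference state contributes zero, an over-represented distance is favorable (negative), and an under-represented distance is unfavorable (positive).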
3 File format description
Two files are important in homology modeling using MODELLER:
3.1 Alignment file (PIR)
The preferred alignment file format for comparative modeling is
related to the PIR database format. A sample in PIR format follows.
C; A sample alignment in the PIR format; KAN-1
>P1;1jso
structureX:1jso :A :325 : : : : : : :
DQICIGYHANNSTEQVDTIMEKNVTVTHAQDILEKTHNGKLCDLNGVKPLILRDCSVAGWLLGNPMCDEFLNVPEWSYIVEKDNPVNGLCYPENFNDYEELKHLLSSTNHFEKIRIIPRSSWSNHDASSGVSSACPYNGRSSFFRNVVWLIKKNNAYPTIKRSYNNTNQEDLLILWGIHHPNDAAEQTKLYQNPTTYVSVGTSTLNQRSVPEIATRPKVNGQSGRMEFFWTILKPNDAINFESNGNFIAPEYAYKIVKKGGSAIMKSGLEYGNCNTKCQTPMGAINSSMPFHNIHPLTIGECPKYVKSGRLVLATGLRNVPQRET*
>P1;1jso
structureX:1jso :B :176 : : : : : : :
GLFGAIAGFIEGGWQGMVDGWYGYHHSNEQGSGYAADKESTQKAIDGTTNKVNSIIDKMNTQFEAVGKEFNNLERRIENLNKKMEDGFLDVWTYNAELLVLMENERTLDFHDSNVKNLYDKVRLQLRDNAKELGNGCFEFYHKCDNECMESVKNGTYDYPQYSEEARLNREEISGV*
>P1;KAN-1
sequence:KAN-1 :321 : : : : : : :
DQICIGYHANNSTEQVDTIMEKNVTVTHAQDILEKTHNGKLCDLDGVKPLILRDCSVAGWLLGNPMCDEFINVPEWSYIVEKANPVNDLCYPGDFNDYEELKHLLSRINHFEKIQIIPKSSWSSHEASLGVSSACPYQRKSSFFRNVVWLIKKNSTYPTIKRSYNNTNQEDLLVLWGIHHPNDAAEQTKLYQNPTTYISVGTSTLNQRLVPRIATRSKVNGQSGRMEFFWTILKPNDAINFESNGNFIAPEYAYKIVKKGDSTIMKSELEYGNCNTKCQTPMGAINSSMPFHNIHPLTIGECPKYVKSNRLVLATGLRNSP*
Appendix Figure A2 Input alignment file in PIR format.
In Appendix Figure A2, the first line of each sequence entry specifies the
protein code after the >P1; line identifier. The line identifier must occur at the
beginning of the line. For example, 1jso is the protein code of the first entry in the
alignment above. The protein code corresponds to the alnsequence.code variable.
(Conventionally, this code is the PDB code followed by an optional one-letter chain
ID, but this is not required.) The second line of each entry contains information
necessary to extract atomic coordinates of the segment from the original PDB
coordinate set. The fields in this line are separated by colon characters, ':'. The fields
are as follows:
Field 1: A specification of whether or not 3D structure is available
and of the type of the method used to obtain the structure (structureX, X-ray;
structureN, NMR; structureM, model; sequence, sequence). Only structure is also a
valid value.
Field 2: The PDB code. While the protein code in the first line of
an entry, which is used to identify the entry, must be unique for all proteins in the file,
the PDB code in this field, which is used to get structural data, does not have to be
unique. It is a good idea to use the PDB code with an optional chain identifier as the
protein code. The PDB code corresponds to the alnsequence.atom_file variable and
can also contain the full atom filename, directory included.
Fields 3-6: The residue and chain identifiers (see below) for the
first (fields 3-4) and last residue (fields 5-6) of the sequence in the subsequent lines.
There is no need to edit the coordinate file if a contiguous sequence of residues is
required -- simply specify the beginning and ending residues of the required
contiguous region of the chain. If the beginning residue is not found, no segment is
read in. If the ending residue identifier is not found in the coordinate file, the last
residue in the coordinate file is used. By default, the whole file is read in. The
unspecified beginning and ending residue numbers and chain id's for a structure entry
in an alignment file are taken automatically from the corresponding atom file, if
possible. The first matching sequence in the atom file that also satisfies the explicitly
specified residue numbers and chain id's is used. A residue number is not specified
when a blank character or a dot, '.', is given. A chain id is not specified when a dot, '.',
is given. This slight difference between residue and chain id's is necessary because a
blank character is a valid chain id.
Field 7: Protein name. Optional.
Field 8: Source of the protein. Optional.
Field 9: Resolution of the crystallographic analysis. Optional.
Field 10: R-factor of the crystallographic analysis. Optional.
Note that each sequence must be terminated by the terminating
character, '*', and chain breaks are indicated by '/'. There should not be more than one
chain break character to indicate a single chain break (use gap characters instead, '-').
The alignment file can contain any number of blank lines between
the protein entries. Comment lines can occur outside protein entries and must begin
with the identifiers 'C;' or 'R;' as the first two characters in the line. An alignment file
is also used to input non-aligned sequences.
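The ten colon-separated fields of the second line of a PIR entry can be read with a few lines of Python. The sketch below is a standalone illustration and not part of MODELLER; the function name, the dictionary keys and the example header line are hypothetical.

```python
FIELD_NAMES = [
    "structure_type", "pdb_code",
    "first_residue", "first_chain", "last_residue", "last_chain",
    "protein_name", "source", "resolution", "r_factor",
]


def parse_pir_header(line):
    """Split the second line of a PIR entry into its ten named fields.

    Residue numbers given as a blank or '.' and chain ids given as '.'
    are returned as None (unspecified); a blank chain id is kept as ''.
    """
    fields = [field.strip() for field in line.split(":")]
    if len(fields) != len(FIELD_NAMES):
        raise ValueError(f"expected 10 fields, found {len(fields)}")
    header = dict(zip(FIELD_NAMES, fields))
    for key in ("first_residue", "last_residue"):
        if header[key] in ("", "."):
            header[key] = None
    for key in ("first_chain", "last_chain"):
        if header[key] == ".":
            header[key] = None
    return header


# Hypothetical header line, used only to exercise the parser.
example = "structureX:1abc:1:A:120:A:example protein:example organism:2.00:0.19"
print(parse_pir_header(example))
```

Keeping the blank chain id distinct from the unspecified one mirrors the rule that a blank character is itself a valid chain id.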
3.2 Restraints file
The first line of a restraints file should read 'MODELLER5
VERSION: MODELLER FORMAT'. After this, there is one entry per line. The
format is free, except that the first character has to be at the beginning of the line.
When the line starts with 'R', it contains a restraint, 'E' indicates a pair of atoms to be
excluded from the calculation of the dynamic non-bonded pairs list, 'P' indicates a
pseudo atom definition, 'S' a symmetry restraint, and 'B' a rigid body. The restraints
files used in this work contain only lines that start with 'R'.
MODELLER5 VERSION: MODELLER FORMAT
R 3 1 1 1 2 2 1 3 2 1.5380 0.0364
…
Appendix Figure A3 Restraints file.
As shown in Appendix Figure A3, an 'R' line should look like: R Form Modality
Feature Group Numb_atoms Numb_parameters Numb_Feat Atom_indices
Parameters. The 'R' line in the figure creates a Gaussian restraint on the distance
between atoms 3 and 2, with a mean of 1.5380 and a standard deviation of 0.0364.
Form is the restraint form type (see Appendix Table A1). Modality
is an integer argument to Form, and specifies the number of single Gaussians in a
poly-Gaussian pdf, periodicity n of the cosine in the cosine potential, and the number
of spline points for cubic splines. Feature is the feature that this restraint acts on (see
Appendix Table A2). Group is the physical feature type. Numb_atoms is the total
number of atoms this restraint acts on, Numb_parameters is the number of defined
parameters, and Numb_Feat is the number of features the restraint acts on.
Numb_Feat is typically 1, except for the multiple binormal (where it should be 2) and
ND spline (where it can be any number). In cases where Numb_Feat is greater than 1,
the modality, feature type, and number of atoms of each subsequent feature should be
listed in order after Numb_Feat. Finally, the integer atom indices and floating point
parameters are listed.
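The layout of an 'R' line can be made concrete with a small Python parser. It is a sketch written for this appendix, not MODELLER code: the function name and dictionary keys are hypothetical, and only the common case of a single feature (Numb_Feat equal to 1) is handled.

```python
def parse_restraint_line(line):
    """Parse an 'R' line: R Form Modality Feature Group Numb_atoms
    Numb_parameters Numb_Feat Atom_indices Parameters."""
    tokens = line.split()
    if not tokens or tokens[0] != "R":
        raise ValueError("not a restraint ('R') line")
    form, modality, feature, group, n_atoms, n_params, n_feat = map(int, tokens[1:8])
    if n_feat != 1:
        # Multi-feature restraints list extra per-feature fields; not covered here.
        raise NotImplementedError("only Numb_Feat == 1 is handled in this sketch")
    atoms = [int(t) for t in tokens[8:8 + n_atoms]]
    params = [float(t) for t in tokens[8 + n_atoms:8 + n_atoms + n_params]]
    if len(atoms) != n_atoms or len(params) != n_params:
        raise ValueError("line is shorter than its declared counts")
    return {
        "form": form, "modality": modality, "feature": feature, "group": group,
        "atoms": atoms, "parameters": params,
    }


restraint = parse_restraint_line("R 3 1 1 1 2 2 1 3 2 1.5380 0.0364")
print(restraint)
```

For the sample line, form 3 and feature 1 correspond to a Gaussian restraint on a distance, the atom indices are 3 and 2, and the two parameters are the mean and the standard deviation.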
Appendix Table A1 Numerical restraint forms.
Numeric form Form type
1 forms.lower_bound
2 forms.upper_bound
3 forms.gaussian
4 forms.multi_gaussian
5 forms.lennard_jones
6 forms.coulomb
7 forms.cosine
8 forms.factor
9 forms.multi_binormal
10 forms.spline or forms.nd_spline
50+ user-defined restraint forms
Appendix Table A2 Numerical feature types.
Numeric feature Feature type
1 features.distance
2 features.angle
3 features.dihedral
6 features.minimal_distance
7 features.solvent_access
8 features.density
9 features.x_coordinate
10 features.y_coordinate
11 features.z_coordinate
12 features.dihedral_diff
50+ user-defined feature types
Appendix B
Poster contribution to conferences
Poster presentation
Web-Based Application for Automated 3D Structure of Hemagglutinin (H5N1)
Taweesak Poochai, Daungmanee Chuakhaew, Chak Sangma Web-Based Application for Automated 3D Structure of Hemagglutinin (H5N1). Pure
and Applied Chemistry International Conference (PACCON) 2008, January 30 - February 1, 2008, Sofitel Centara Grand Bangkok, Bangkok, Thailand
Appendix C
User satisfaction survey for the 3dvis system
User satisfaction with the 3dvis system was surveyed using purposive sampling of
students and researchers who have skills in computational work and protein modeling.
The results of the questionnaire (in Thai) were interpreted on five levels of a
Likert scale, as follows:
Mean 1.00-1.50: no agreement
Mean 1.51-2.50: little agreement
Mean 2.51-3.50: moderate agreement
Mean 3.51-4.50: much agreement
Mean 4.51-5.00: most agreement
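The interpretation bands above can be applied mechanically to a mean score. The Python sketch below is illustrative only; the function name and the English level labels are assumptions, and the band edges are the ones listed above.

```python
LEVELS = [
    (1.50, 1, "no agreement"),
    (2.50, 2, "little agreement"),
    (3.50, 3, "moderate agreement"),
    (4.50, 4, "much agreement"),
    (5.00, 5, "most agreement"),
]


def agreement_level(mean_score):
    """Map a mean Likert score in [1.00, 5.00] to its level number and label."""
    if not 1.0 <= mean_score <= 5.0:
        raise ValueError("mean score must lie between 1.00 and 5.00")
    for upper_bound, level, label in LEVELS:
        if mean_score <= upper_bound:
            return level, label
    raise AssertionError("unreachable: 5.00 is the last upper bound")


print(agreement_level(3.6))
```

Scores reported to two decimals fall into exactly one band, because each band starts 0.01 above the previous upper bound.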
Symbols used in the statistical analysis
$\bar{X}$ denotes the mean
S.D. denotes the standard deviation
SS denotes the sum of squares
df denotes the degrees of freedom
MS denotes the mean square
$\chi^2$ denotes chi-square
Sig denotes significance
The statistics used in this work were means and standard deviations.
The results are reported as the percentage of respondents to the questionnaire,
broken down by sex, age, education level, status, work, field of expertise/interest and
current work.
Appendix Table C1 Percentage of respondents to the questionnaire by sex, age,
education level, status, work, field of expertise/interest and
current work.

Detail                        Number of people   Percentage
Sex                                  11             100
  Male                                3             27.3
  Female                              8             72.7
Age (years)                          11             100
  ≤ 25                                5             45.5
  26-30                               5             45.5
  31-35                               1             9.1
Education level                      11             100
  Bachelor                            1             9.1
  Master                              7             63.6
  Doctor                              3             27.3
Status                               11             100
  Student                             9             81.8
  Researcher                          2             18.2
Expert/Interesting field             11             100
  Protein modeling                    4             36.4
  Computational                       7             63.6
Current working                      11             100
  Theoretical                         8             72.7
  Other                               3             27.3
From Appendix Table C1, more of the respondents were female than male
(72.7% female and 27.3% male), and most respondents were aged 30 years or
younger. More respondents were at master's
degree level (63.6%) than at doctoral (27.3%) or bachelor's degree level (9.1%), and most
respondents were students (81.8%). In terms of current work, most respondents worked in
the theoretical field (72.7%), and the rest in both theoretical and experimental fields (27.3%).
Next, the data were analyzed for the percentage of respondents with
experience of homology modeling and homology-related programs.
Appendix Table C2 Percentage of respondents to the questionnaire by experience of
homology modeling and homology-related programs.

Detail                                                 Number of people   Percentage
Experience of homology modeling                               11             100
  Yes                                                          8             72.7
  No                                                           3             27.3
Experience in homology program: WHAT-IF                        8             100
  Yes                                                          2             25.0
  No                                                           6             75.0
Experience in homology program: MODELLER                       8             100
  Yes                                                          4             50.0
  No                                                           4             50.0
Experience in homology program: SCWRL                          8             100
  Yes                                                          1             12.5
  No                                                           7             87.5
Experience in homology program: SCCOMP                         8             100
  Yes                                                          -             -
  No                                                           8             100.0
Experience in homology program: other                          8             100
  Yes                                                          3             37.5
  No                                                           5             62.5
Experience of molecular viewer                               8             100
  Yes                                                        8            72.7
  No                                                         -            27.3
Experience of molecular viewer: DS viewer pro                8             100
  Yes                                                        5            37.5
  No                                                         3            62.5
Experience of molecular viewer: Jmol                         8             100
  Yes                                                        4            50.0
  No                                                         4            50.0
Experience of molecular viewer: GaussView                    8             100
  Yes                                                        6            75.0
  No                                                         2            25.0
Experience of molecular viewer: WebMol                       8             100
  Yes                                                        1            12.5
  No                                                         7            87.5
Experience of molecular viewer: KING                         8             100
  Yes                                                        -               -
  No                                                         8           100.0
Experience of molecular viewer: QuickPDB                     8             100
  Yes                                                        2            75.0
  No                                                         6            25.0
Experience of molecular viewer: RasMol                       8             100
  Yes                                                        2            75.0
  No                                                         6            25.0
Experience of molecular viewer: PyMOL                        8             100
  Yes                                                        4            50.0
  No                                                         4            50.0
Experience of molecular viewer: other                        8             100
  Yes                                                        -               -
  No                                                         8           100.0
Experience of molecular statistical analysis                 8             100
  Yes                                                        5            62.5
  No                                                         3            37.5
Experience of molecular statistical analysis:
Ramachandran plot                                            8             100
  Yes                                                        5            62.5
  No                                                         3            37.5
Experience of molecular statistical analysis: DOPE           8             100
  Yes                                                        2            25.0
  No                                                         6            75.0
Experience of molecular statistical analysis: Anolea         8             100
  Yes                                                        -               -
  No                                                         8           100.0
Experience of molecular statistical analysis: Gromos         8             100
  Yes                                                        -               -
  No                                                         8           100.0
Experience of molecular statistical analysis:
Verify3D                                                     8             100
  Yes                                                        2            25.0
  No                                                         6            75.0
Experience of molecular statistical analysis: other          8             100
  Yes                                                        1            12.5
  No                                                         7            87.5
Experience of using homology modeling web server             8             100
  Yes                                                        3            37.5
  No                                                         5            62.5
Experience of using homology modeling web server:
SWISS-MODEL workspace                                        8             100
  Yes                                                        3            37.5
  No                                                         5            62.5
Experience of using homology modeling web server:
3DJIGSAW                                                     8             100
  Yes                                                        1            12.5
  No                                                         7            87.5
Experience of using homology modeling web server:
MODWEB                                                       8             100
  Yes                                                        -               -
  No                                                         8           100.0
Experience of using homology modeling web server:
other                                                        8             100
  Yes                                                        -               -
  No                                                         8           100.0
Appendix Table C2 shows that most respondents had experience with homology
modeling (72.7%), and that MODELLER was the homology modeling program they had
used most (50.0%). Among molecular viewer programs, GaussView was the most
widely used in this group (75.0%). Experience with homology modeling web servers
was less common (37.5% for SWISS-MODEL workspace and 12.5% for 3DJIGSAW).
Appendix Table C3  Statistical analysis of the responses about the 3dvis system,
by sex.

Detail                                                Male           Female          Total
                                                    Mean    S.D.    Mean    S.D.    Mean    S.D.
Using the 3dvis system
1. Interest in the 3dvis system                     3.67   0.577    4.25   0.463
2. Ease of use                                      4.00   0.000    3.88   0.641
3. Correctness of language in the 3dvis system      3.67   0.577    3.88   0.641
4. System efficiency in processing                  3.33   1.160    3.75   1.040
5. Results of the system were satisfactory          3.00   0.000    3.63   0.744
6. Results of the system were convincing            3.67   0.577    3.75   0.463
7. Usefulness of the 3dvis system                   2.67   0.577    3.88   0.641
8. System efficiency                                3.67   0.577    3.75   0.463
9. Satisfaction with using the 3dvis system         3.67   0.577    3.88   0.641
Overall satisfaction with the 3dvis system          3.67   0.577    3.88   0.641   3.820   0.603
User guide for the 3dvis system
1. User guide is sufficient to use the system       3.33   0.577    3.75   0.707
2. User guide explains usage clearly                3.00   0.000    3.75   0.707
3. User guide content is simply organized           3.33   0.577    3.88   0.641
4. Content is easy to understand                    3.33   0.577    3.88   0.641
5. Characters in the user guide are clear           4.33   1.155    4.13   0.354
6. Correctness of language in the user guide        4.00   0.000    4.00   0.756
Overall satisfaction with the user guide            4.00   0.000    3.88   0.641   3.910   0.539
Total                                               3.67   0.577    3.87   0.640   3.810   0.603
Appendix Table C4  Statistical analysis of the responses about the 3dvis system,
by age.

Detail                                                ≤ 25           26-30           31-35           Total
                                                    Mean    S.D.    Mean    S.D.    Mean    S.D.    Mean    S.D.
Using the 3dvis system
1. Interest in the 3dvis system                     3.80   0.447    4.20   0.447    5.00       .
2. Ease of use                                      3.80   0.447    3.80   0.447    5.00       .
3. Correctness of language in the 3dvis system      3.80   0.447    3.60   0.548    5.00       .
4. System efficiency in processing                  3.00    1.00    4.00   0.707    5.00       .
5. Results of the system were satisfactory          3.20   0.837    3.60   0.548    4.00       .
6. Results of the system were convincing            3.40   0.894    3.60   0.548    4.00       .
7. Usefulness of the 3dvis system                   3.00   0.707    3.00   1.000    5.00       .
8. System efficiency                                3.40   0.894    4.00   0.000    4.00       .
9. Satisfaction with using the 3dvis system         3.40   0.548    4.00   0.000    5.00       .
Overall satisfaction with the 3dvis system          3.60   0.548    3.80   0.447    5.00       .    3.82   0.603
User guide for the 3dvis system
1. User guide is sufficient to use the system       3.60   0.548    3.40   0.548    5.00       .
2. User guide explains usage clearly                3.40   0.548    3.40   0.548    5.00       .
3. User guide content is simply organized           3.60   0.548    3.60   0.548    5.00       .
4. Content is easy to understand                    3.60   0.548    3.60   0.548    5.00       .
5. Characters in the user guide are clear           4.00   0.707    4.40   0.548    4.00       .
6. Correctness of language in the user guide        3.80   0.447    4.00   0.707    5.00       .
Overall satisfaction with the user guide            3.80   0.447    3.80   0.447    5.00       .    3.91   0.539
Total                                               3.40   0.547    4.00   0.000    5.00       .    3.81   0.603
Appendix Table C5  Statistical analysis of the responses about the 3dvis system,
by education level.

Detail                                              Bachelor         Master          Ph.D.           Total
                                                    Mean    S.D.    Mean    S.D.    Mean    S.D.    Mean    S.D.
Using the 3dvis system
1. Interest in the 3dvis system                     3.00       .    4.14   0.378    4.33   0.577
2. Ease of use                                      4.00       .    3.86   0.378    4.00   1.000
3. Correctness of language in the 3dvis system      3.00       .    3.86   0.378    4.00   1.000
4. System efficiency in processing                  2.00       .    3.43   0.787    4.67   0.577
5. Results of the system were satisfactory          3.00       .    3.29   0.756    4.00   0.000
6. Results of the system were convincing            3.00       .    3.71   0.756    3.33   0.577
7. Usefulness of the 3dvis system                   3.00       .    2.86   0.900    4.00   1.000
8. System efficiency                                3.00       .    3.71   0.488    4.00   0.000
9. Satisfaction with using the 3dvis system         3.00       .    3.71   0.488    4.33   0.577
Overall satisfaction with the 3dvis system          3.00       .    3.86   0.378    4.00   1.000    3.82   0.603
User guide for the 3dvis system
1. User guide is sufficient to use the system       4.00       .    3.43   0.535    4.00   1.000
2. User guide explains usage clearly                3.00       .    3.43   0.535    4.00   1.000
3. User guide content is simply organized           4.00       .    3.43   0.535    4.33   0.577
4. Content is easy to understand                    3.00       .    3.57   0.535    4.33   0.577
5. Characters in the user guide are clear           3.00       .    4.43   0.535    4.00   0.000
6. Correctness of language in the user guide        4.00       .    3.86   0.690    4.33   0.577
Overall satisfaction with the user guide            4.00       .    3.71   0.488    4.33   0.577    3.91   0.539
Total                                               3.00       .    3.71   0.488    4.33   0.577    3.81   0.603
Appendix Table C6  Statistical analysis of the responses about the 3dvis system,
by work.

Detail                                                  Student                Researcher                Total
                                                        Mean        S.D.        Mean        S.D.        Mean        S.D.
Using the 3dvis system
1. Interest in the 3dvis system                  [illegible]       0.601 [illegible]       0.000
2. Ease of use                                   [illegible]       0.601 [illegible]       0.000
3. Correctness of language in the 3dvis system   [illegible]       0.667 [illegible]       0.000
4. System efficiency in processing               [illegible]       1.014 [illegible]       1.414
5. Results of the system were satisfactory       [illegible]       0.726 [illegible]       0.707
6. Results of the system were convincing         [illegible]       0.726 [illegible]       0.707
7. Usefulness of the 3dvis system                [illegible]       1.054 [illegible]       0.707
8. System efficiency                             [illegible]       0.500 [illegible]       0.000
9. Satisfaction with using the 3dvis system      [illegible]       0.667 [illegible]       0.000
Overall satisfaction with the 3dvis system       [illegible]       0.601 [illegible]       0.707        3.82       0.603
User guide for the 3dvis system
1. User guide is sufficient to use the system    [illegible]       0.726 [illegible]       0.000
2. User guide explains usage clearly             [illegible]       0.726 [illegible]       0.707
3. User guide content is simply organized        [illegible]       0.707 [illegible]       0.000
4. Content is easy to understand                 [illegible]       0.707 [illegible]       0.000
5. Characters in the user guide are clear        [illegible]       0.601 [illegible]       0.707
6. Correctness of language in the user guide     [illegible]       0.601 [illegible]       0.707
Overall satisfaction with the user guide         [illegible]       0.601 [illegible]       0.000        3.91       0.539
Total                                                   3.77       0.667        4.00       0.000        3.82       0.603
Appendix Table C7  Statistical analysis of the responses about the 3dvis system,
by field of expertise/interest.

Detail                                              Protein modeling         Computational               Total
                                                        Mean        S.D.        Mean        S.D.        Mean        S.D.
Using the 3dvis system
1. Interest in the 3dvis system                  [illegible]       0.500        4.29       0.488
2. Ease of use                                   [illegible]       0.500        4.00       0.577
3. Correctness of language in the 3dvis system   [illegible]       0.577        4.00       0.577
4. System efficiency in processing               [illegible]       0.957        4.14       0.690
5. Results of the system were satisfactory       [illegible]       0.957        3.57       0.535
6. Results of the system were convincing         [illegible]       0.816        3.86       0.378
7. Usefulness of the 3dvis system                [illegible]       0.577        3.00       1.155
8. System efficiency                             [illegible]       0.500        4.00       0.000
9. Satisfaction with using the 3dvis system      [illegible]       0.500        4.14       0.378
Overall satisfaction with the 3dvis system       [illegible]       0.577        4.00       0.577        3.82       0.603
User guide for the 3dvis system
1. User guide is sufficient to use the system    [illegible]       0.500        3.57       0.787
2. User guide explains usage clearly             [illegible]       0.500        3.43       0.787
3. User guide content is simply organized        [illegible]       0.000        3.57       0.787
4. Content is easy to understand                 [illegible]       0.500        3.71       0.756
5. Characters in the user guide are clear        [illegible]       0.500        4.43       0.535
6. Correctness of language in the user guide     [illegible]       0.000        4.00       0.816
Overall satisfaction with the user guide         [illegible]       0.000        3.86       0.690        3.91       0.539
Total                                                   3.50       0.577        4.00       0.577        3.82       0.603
Appendix Table C8  Statistical analysis of the responses about the 3dvis system,
by current work.

Detail                                            Theoretical        Other           Total
                                                    Mean    S.D.    Mean    S.D.    Mean    S.D.
Using the 3dvis system
1. Interest in the 3dvis system                     4.13   0.641    4.00   0.000
2. Ease of use                                      3.88   0.641    4.00   0.000
3. Correctness of language in the 3dvis system      3.75   0.707    4.00   0.000
4. System efficiency in processing                  4.00   0.926    2.67   0.577
5. Results of the system were satisfactory          3.63   0.518    3.00   1.000
6. Results of the system were convincing            3.63   0.518    3.33   1.155
7. Usefulness of the 3dvis system                   3.00   1.069    3.67   0.577
8. System efficiency                                3.88   0.354    3.33   0.577
9. Satisfaction with using the 3dvis system         4.00   0.535    3.33   0.577
Overall satisfaction with the 3dvis system          3.88   0.641    3.67   0.577    3.82   0.603
User guide for the 3dvis system
1. User guide is sufficient to use the system       3.50   0.756    4.00   0.000
2. User guide explains usage clearly                3.38   0.744    4.00   0.000
3. User guide content is simply organized           3.63   0.744    4.00   0.000
4. Content is easy to understand                    3.63   0.744    4.00   0.000
5. Characters in the user guide are clear           4.13   0.641    4.33   0.577
6. Correctness of language in the user guide        3.88   0.641    4.33   0.577
Overall satisfaction with the user guide            3.88   0.641    4.00   0.000    3.91   0.539
Total                                               3.88   0.641    3.67   0.577    3.82   0.603
Appendix Tables C3 to C8 show that satisfaction with the 3dvis system and with
its user guide was rated between the agreement and high-agreement levels (mean
scores of 3.00-4.51).
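
As a check on how the group columns relate to the Total column, an overall mean
can be recomputed as the size-weighted average of the group means. The Python
sketch below does this for the overall 3dvis satisfaction row of Appendix Table
C3 (3 male and 8 female respondents). It also illustrates why a group with a
single respondent has no S.D. (printed as "." in the tables), under the
assumption that the S.D. columns are sample standard deviations with an n - 1
denominator; the thesis does not state this. The function names and the
individual scores are hypothetical.

```python
import statistics


def combined_mean(groups):
    """Size-weighted mean of group means; groups is a list of (n, mean)."""
    total_n = sum(n for n, _ in groups)
    return sum(n * mean for n, mean in groups) / total_n


def mean_sd(scores):
    """Mean and sample S.D.; the S.D. is None when only one score exists."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores) if len(scores) > 1 else None
    return mean, sd


# Group means from Appendix Table C3: male 3.67 (n=3), female 3.88 (n=8).
print(round(combined_mean([(3, 3.67), (8, 3.88)]), 2))  # 3.82

# Hypothetical scores for a group of three respondents.
mean, sd = mean_sd([4, 4, 3])
print(round(mean, 2), round(sd, 3))  # 3.67 0.577

# A single respondent has a mean but no sample S.D.
print(mean_sd([5]))  # (5, None)
```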
Overall, user satisfaction with the 3dvis system was surveyed by purposive
sampling of students and researchers skilled in computation and protein
modeling, and the respondents rated the system favorably. Their suggestions for
the system include security, a frequently asked questions (FAQ) page, error
checking, and artificial intelligence for sequence alignment.
Using the 3D Visualization System to Aid Understanding Hemagglutinin
Mutations in H5N1 Influenza Virus
To use the system, a user account is required. Users register through the web
site on the system login page the first time they use it, and on every later
visit they must log in before using the system.
1. User: a person who is not a member. Users can access only the About and
How to use sections.
2. Member: a person who has registered and received a user account. After
logging in to the system, members have the right to use all of the data that
belongs to members of this web site.
Description of the web pages
1. Home page and login section
Figure 1  Using the login section.
Users can request access to the system at the URL http://[illegible]dvis. Users
who are not members can access only the About and How to use pages. To use the
sequence alignment and 3D structure building sections, a user must apply for a
user account in order to become a member. After logging in to the system,
members have the right to use all of the data that belongs to members of this
web site.
2. My Workspace page
Figure 2  Using the main working area, My Workspace.
After a member logs in, the My Workspace page appears. If the member has
earlier jobs, the page shows the existing data files, and the member can
download them, delete them, or view a summary of the existing jobs.
3. 2D alignment
Figure 3  Using the section for computing the two-dimensional alignment of a new
protein sequence.
This section supports computing the two-dimensional alignment of a new protein
sequence (sequence alignment). Members can choose which type of hemagglutinin
protein structure to compare against. The system contains [number illegible]
X-ray structures of the hemagglutinin protein, and their details can be viewed
in the PDB list. The user must enter the amino acid sequence and a name for that
sequence, or run a multiple sequence alignment by uploading a protein sequence
file (FASTA format only).
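
The 2D alignment page accepts uploaded sequence files only in FASTA format. To
illustrate what such a file contains, the following Python sketch parses FASTA
text into (name, sequence) pairs. It is not the upload code of the 3dvis system;
the function name `parse_fasta` and the short example sequences are hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (name, sequence) tuples."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A header line starts a new record; store the previous one first.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)  # sequence lines may be wrapped
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


# Hypothetical two-record file; the residues are made up for illustration.
example = """>query_sequence
MEKIVLLFAI
VSLVKSDQIC
>second_sequence
MEKIVLLLAI
"""

for name, sequence in parse_fasta(example):
    print(name, len(sequence))
```

Each record begins with a ">" header line holding the sequence name, and the
wrapped sequence lines that follow are joined into one string.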
4. 3D Automate Model
Figure 4  Using the section for computing a 3D structure from a protein sequence.
This section computes a 3D structure from a protein sequence by the comparative
modeling method, using a hemagglutinin protein structure (here, structure 1JSO)
as the template. The user can enter a name and a protein sequence, or select
data from an existing two-dimensional sequence alignment.
5. Result
Figure 5  Using the results summary section.
This section shows the results of the calculation after additional processing.
It provides the 3D structure, the positions at which the amino acid sequence
submitted by the user differs from the amino acid sequence of the template
structure, and a graph of the Discrete Optimized Protein Energy score (DOPE
score) comparing the computed 3D structure with the template structure.
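
The Result page reports the positions at which the submitted amino acid sequence
differs from the template sequence. One simple way to obtain such a list from a
pairwise alignment is sketched below in Python. This is an assumption about how
the comparison could be done, not the method used by the 3dvis system; the
function name `find_substitutions` and the short example sequences are
hypothetical.

```python
def find_substitutions(template, query, gap="-"):
    """Return (position, template_residue, query_residue) for each mismatch.

    Both strings must be rows of the same alignment, so they have equal
    length. Positions are 1-based and count template residues only; columns
    with a gap in either row are not reported as substitutions.
    """
    if len(template) != len(query):
        raise ValueError("aligned sequences must have the same length")
    differences = []
    position = 0
    for t, q in zip(template, query):
        if t != gap:
            position += 1  # advance only on real template residues
        if t != gap and q != gap and t != q:
            differences.append((position, t, q))
    return differences


# Hypothetical aligned fragments (not real hemagglutinin data).
print(find_substitutions("MEKIVLLLAI", "MEKIVLLFAI"))  # [(8, 'L', 'F')]
print(find_substitutions("ME-KI", "MEAKV"))            # [(4, 'I', 'V')]
```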
������������}��Download
DXP]Ud�}�GBDHIJKBL@_LGBDg@ZKW\YZ[@DjP
�������������HLCLJB�download WXJHIJ@BTBDm�eEZ\SGbJSTX\]EfKCTZ]UdOGaZbnfLHLGBD@DJBKYeDK@DJBK@BTTaRaZJ_[
YPDgGDT�MODELLER g\MbJSTX\SldLh�iZJ]EfKCTZ
102
CURRICULUM VITAE
NAME : Mr. Taweesak Poochai
BIRTH DATE : July 6, 1983
BIRTH PLACE : Bangkok, Thailand
NATIONALITY : Thai
EDUCATION : 2001 – 2005 Kasetsart University, B.Sc. (Physics).
SCHOLARSHIPS : Higher Education Development Project Scholarship
Postgraduate Education and Research Program in
Physical Chemistry (2005-2006).