
THESIS

3D VISUALIZATION SYSTEM TO AID

UNDERSTANDING HEMAGGLUTININ MUTATIONS IN

H5N1 INFLUENZA VIRUS

TAWEESAK POOCHAI

GRADUATE SCHOOL, KASETSART UNIVERSITY

2008

THESIS

3D VISUALIZATION SYSTEM TO AID UNDERSTANDING

HEMAGGLUTININ MUTATIONS IN H5N1 INFLUENZA VIRUS

TAWEESAK POOCHAI

A Thesis Submitted in Partial Fulfillment of

the Requirements for the Degree of

Master of Science (Chemistry)

Graduate School, Kasetsart University

2008

ACKNOWLEDGEMENTS

I wish to express my deep gratitude to the many people who gave me guidance, help and support in reaching the goal of this thesis. First of all, most of the credit for this thesis should justifiably go to my advisor, Dr. Chak Sangma, for his valuable guidance, continuous support, kindness and encouragement throughout the course of my graduate study. I also thank my co-supervisors, Dr. Pipat Khongpracha and Dr. Songwut Suramitr, for their very valuable comments and suggestions on this work. I would like to thank the rest of my thesis committee, Dr. Jakkapan Sirijaraensre and Dr. Piti Treesukol, the representative of the Graduate School of Kasetsart University, for their helpful comments and suggestions. I am also greatly indebted to many teachers: Prof. Dr. Jumras Limtrakul, Assoc. Prof. Dr. Supa Hannongbua, Asst. Prof. Piboon Pantu, Dr. Pensri Bunsawansong, Dr. Tanin Nanok, Dr. Patchreenart Saparpakorn, Dr. Somkiat Nokbin and Dr. Pimpa Hormnirun.

Furthermore, I would like to thank Miss Wipawee Punnopasri, Miss Waraporn Jungtanasombat, Miss Daungmanee Chuakhaew, Miss Nipa Jongol and all of my colleagues at LCAC for their thoughtful help and support.

During the course of this work at Kasetsart University, I was supported by the Ministry of University Affairs under the Higher Education Development Project Scholarship (MUA-ADB funds). Thanks are also due to the Laboratory for Computational and Applied Chemistry (LCAC) and the Cheminformatics Research Unit, Kasetsart University, for providing such good computational and knowledge-based systems.

Finally, I would like to dedicate this thesis to my dad, Pol.Col. Somsak Poochai, my mum, Somradee Poochai, and my sister, Phattarawan Poochai. Their love and support for me are priceless.

Taweesak Poochai
April 2008


TABLE OF CONTENTS

TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
INTRODUCTION
OBJECTIVES
LITERATURE REVIEW
MATERIALS AND METHODS
    Materials
    Methods
RESULTS AND DISCUSSIONS
CONCLUSION AND RECOMMENDATION
    Conclusion
    Recommendation
LITERATURE CITED
APPENDICES
    Appendix A Methodologies implemented in MODELLER
    Appendix B Poster contribution to conferences
    Appendix C The survey satisfy of user about 3dvis system
CURRICULUM VITAE


LIST OF TABLES

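Table

1 DOPE score for the mutated residues of the KAN-1 model, calculated by this system and by the SWISS-MODEL workspace, and of the 1JSO structure, with the percentage difference between each residue of the model and the corresponding residue of the template structure.
2 DOPE score for the mutated residues of the KAN-1 model, calculated by this system and by the SWISS-MODEL workspace, and of the 1JSN structure, with the percentage difference between each residue of the model and the corresponding residue of the template structure.
3 DOPE score for the mutated residues of the KAN-1 model, calculated by this system and by the SWISS-MODEL workspace, and of the 1JSM structure, with the percentage difference between each residue of the model and the corresponding residue of the template structure.
4 DOPE score for the mutated residues of the KAN-1 model, calculated by this system and by the SWISS-MODEL workspace, and of the 2FK0 structure, with the percentage difference between each residue of the model and the corresponding residue of the template structure.
5 DOPE score for the mutated residues of the KAN-1 model, calculated by this system and by the SWISS-MODEL workspace, and of the 2IBX structure, with the percentage difference between each residue of the model and the corresponding residue of the template structure.

Appendix table

A1 Numerical restraint forms.
A2 Numerical feature types.
C1 Statistical analysis, in percent, of the people who answered this questionnaire, separated by sex, age, education level, status, work, field of expertise/interest and current work.
C2 Statistical analysis, in percent, of the people who answered this questionnaire, by experience of homology modeling and homology-related programs.
C3 Statistical analysis results for the 3dvis system, separated by sex.
C4 Statistical analysis results for the 3dvis system, separated by age.
C5 Statistical analysis results for the 3dvis system, separated by education level.
C6 Statistical analysis results for the 3dvis system, separated by work.
C7 Statistical analysis results for the 3dvis system, separated by field of expertise/interest.
C8 Statistical analysis results for the 3dvis system, separated by current work.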


LIST OF FIGURES

Figure

1 Schematic representation of influenza A virion.
2 Influenza virus replication cycle.
3 Steps in comparative protein structure modeling.
4 Schematic of the 3D visualization system to aid understanding mutations in hemagglutinin protein (H5N1).
5 Schematic diagram for 2D Alignment page.
6 Schematic for 3D Automodel page.
7 Schematic for brief result page.
8 The multiple table layout of related tables in the database.
9 Three sections of the data source collection in the database system, connected by username and jobname.
10 Schematic diagram of the Bots program.
11 Home page of the system.
12 Workspace page.
13 2dalign page.
14 3dauto page.
15 Result page.
16 Download page.
17 DOPE score graph for the 3D structure of KAN-1 calculated with the 3dvis system (green line) and the SWISS-MODEL workspace (red line), and for the 3D structure of 1JSO.
18 DOPE score graph for the 3D structure of KAN-1 calculated with the 3dvis system (green line) and the SWISS-MODEL workspace (red line), and for the 3D structure of 1JSN.
19 DOPE score graph for the 3D structure of KAN-1 calculated with the 3dvis system (green line) and the SWISS-MODEL workspace (red line), and for the 3D structure of 1JSM.
20 DOPE score graph for the 3D structure of KAN-1 calculated with the 3dvis system (green line) and the SWISS-MODEL workspace (red line), and for the 3D structure of 2FK0.
21 DOPE score graph for the 3D structure of KAN-1 calculated with the 3dvis system (green line) and the SWISS-MODEL workspace (red line), and for the 3D structure of 2IBX.

Appendix Figure

A1 Schematic representation of the reference state.
A2 Input alignment file in PIR format.
A3 Restraint file.


LIST OF ABBREVIATIONS

3D = Three-dimensional
3dvis = Three-dimensional visualization system to aid understanding hemagglutinin mutations in H5N1 influenza virus
Ala (A) = Alanine
Asn (N) = Asparagine
Asp (D) = Aspartic acid
Cys (C) = Cysteine
DOPE = Discrete Optimized Protein Energy
Glu (E) = Glutamic acid
Gly (G) = Glycine
GUI = Graphical User Interface
HA = Hemagglutinin
His (H) = Histidine
HTML = Hypertext Markup Language
Ile (I) = Isoleucine
LD = Langevin Dynamics
Leu (L) = Leucine
LSTA = Sialyllacto-N-tetraose A
LSTC = Sialyllacto-N-tetraose C
Lys (K) = Lysine
M = Matrix protein
MC = Monte Carlo
MD = Molecular Dynamics
mRNA = Messenger ribonucleic acid
NA = Neuraminidase
NMR = Nuclear Magnetic Resonance
NP = Nucleoprotein
NS = Non-structural protein
PA, PB2 = RNA polymerase
PB1 = RNA polymerase and PB1-F2 protein
Phe (F) = Phenylalanine
PHP = PHP: Hypertext Preprocessor
Pro (P) = Proline
QM = Quantum mechanics
RCSB PDB = Research Collaboratory for Structural Bioinformatics (RCSB), the non-profit consortium that manages the Protein Data Bank (PDB)
rmsd = Root mean square deviation
RNA = Ribonucleic acid
RNP = Ribonucleoprotein


3D VISUALIZATION SYSTEM TO AID UNDERSTANDING

HEMAGGLUTININ MUTATIONS IN H5N1 INFLUENZA VIRUS

INTRODUCTION

The influenza A virus is an RNA virus that contains two surface glycoproteins, hemagglutinin (HA) and neuraminidase (NA), shown in Figure 1, which initiate viral fusion and the subsequent budding of new virions from the infected cell (Sears and Wong, 1999). Both glycoproteins are present on the surface of the influenza virus and are essential for virion propagation. Hemagglutinin recognizes target cells via sialic acid binding sites and then promotes viral fusion (Huang et al., 1981).

Figure 1 Schematic representation of influenza A virion. Eight ribonucleoprotein

segments (RNP) are surrounded by a layer of matrix (M1) protein and a lipid

bilayer taken from the host cell at budding. NS2 protein is associated with

M1. Three viral proteins are incorporated into the lipid bilayer: HA, NA,

and M2 protein. HA trimers and NA tetramers form spikes on the surface

of the virion. RNP segments contain viral RNA surrounded by

nucleoprotein and associated with the polymerase complex.

Source: Gubareva et al. (2000)


The hemagglutinin (HA) of influenza A virus (H5N1), which infects a variety of birds and mammals (Edwin D. K., 2006), is an antigenic glycoprotein found on the surface of influenza viruses. It is the receptor-binding and membrane fusion glycoprotein of the virus and the target for infectivity-neutralizing antibodies (John J. S. and Don C. W., 2000).

Figure 2 Influenza virus replication cycle.

Source: Gubareva et al. (2000)

As shown in Figure 2, infection of the host by the virus can be described in the following steps. First, the influenza virus binds through HA to sialic acid sugars on the surfaces of epithelial cells, typically in the nose, throat and lungs of mammals and the intestines of birds. Second, the cell imports the virus by endocytosis. In the acidic endosome, part of the HA protein fuses the viral envelope with the vacuole's membrane, releasing the viral RNA molecules into the cytoplasm (Melike L. et al., 2003). Next, proteins and viral RNA form a complex that is transported into the cell nucleus (Cros, J. and Palese, P., 2003); viral proteins are then synthesized and new viral genome particles are formed, while translation of host cell mRNAs is inhibited (Kash J. et al., 2006). Finally, the mature viruses leave the cell in a sphere of host phospholipid membrane. They detach once their neuraminidase has cleaved sialic acid residues from the host cell. After the new viruses are released, the host cell dies.

HA, of which at least 16 different antigens are known, is an antigenic glycoprotein found on the surface of influenza viruses. It is responsible for binding the virus to the cell being infected, including human infections with type H5N1 in Asia (Suzuki Y., 2005). Mutations, including substitutions, deletions and insertions, are one of the most important mechanisms for producing variation in influenza viruses (Robert G. W., 1992). Because HA is the receptor-binding and membrane fusion glycoprotein of influenza virus and the target for infectivity-neutralizing antibodies (John J. S. and Don C. W., 2000), studies of HA mutations consider the position of each mutation in the protein sequence, characterized by its association with glycosylation sites, the cleavage site, residues of the H5 receptor binding site and antigenic sites (Shiuh-Ming L. et al., 1992).
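As an illustration of how substitutions can be located once a variant HA sequence has been aligned to a reference, the following minimal Python sketch lists the differing positions. It assumes that the two sequences are already aligned to equal length with '-' as the gap character. The function name `list_substitutions` and the toy sequences are hypothetical and are not part of the 3dvis system.

```python
def list_substitutions(reference, variant, gap="-"):
    """Return substitutions between two pre-aligned, equal-length sequences.

    Each substitution is written as <reference residue><position><variant
    residue>, where the position counts residues of the reference sequence
    (gap columns in the reference do not advance the counter).
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    substitutions = []
    position = 0
    for ref_res, var_res in zip(reference, variant):
        if ref_res != gap:
            position += 1
        if gap in (ref_res, var_res):
            continue  # insertions and deletions are not reported in this sketch
        if ref_res != var_res:
            substitutions.append(f"{ref_res}{position}{var_res}")
    return substitutions


# Toy fragments for illustration only, not real H5 sequences.
print(list_substitutions("MEKIVLLFAI", "MEKIVLLLAT"))
```

The positions found in this way could then be compared with the known glycosylation, cleavage, receptor binding and antigenic sites.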

To date, far fewer X-ray structures than sequences are available for the hemagglutinin protein of influenza A virus subtype H5N1: 5 full X-ray structures from http://www.pdb.org/pdb/home/home.do versus 991 full-length sequences from http://www.ncbi.nlm.nih.gov/genomes/FLU/FLU.html. Understanding protein function requires three-dimensional structures, so researchers use homology modeling to generate initial structures for calculating physical and chemical properties. Homology modeling combines sequence analysis and molecular modeling to predict three-dimensional structure. A homologue of the protein of interest whose structure has not yet been solved is chosen and modeled with the SwissModel WWW resource, the MODELLER homology modeling tool, or Scwrl3, a modeling tool based on graph theory. The theoretical structure is then visualized with SwissPDBViewer, RasMol or PyMol to gain insight into the way in which its structure relates to its function. Color coding of different physical attributes, such as residue charge, hydrophobicity and secondary structure elements; different representations, such as alpha-carbon traces, cartoon graphics and space-filling models; and superposition of the model on an experimental structure all assist in the interpretation.

Therefore, we have developed our own 3D visualization system that automates modeling and visualization as a web portal tool for bioinformatics research. In this work, we are interested in the molecular structures and protein functions, as well as the glycosylation sites, receptor binding site and antigenic sites, that are related to mutated residues. We have developed a web-based graphical user interface to generate the 3D structure of the hemagglutinin protein of influenza A virus subtype H5N1.

The system for this research uses a client/server web architecture, so users interact with the system through their web browser. The Apache web server, which is open source and widely used, transfers data between the client and the server. The database and all scripts run on a server with a 3.2 GHz Intel Pentium 4 processor and 2 GB of memory, under the GNU/Linux operating system.

Following this architecture, the description of the system can be divided into two parts: the client and the server.

1. The Client

The client is an individual user's computer or a user application that does a certain amount of processing on its own. It also sends requests to, and receives responses from, one or more servers for other processing and/or data. The client interacts with the system through a web browser.

In recent years, the Internet has become the medium of choice for distributing multi-user application programs. The major advantage of web-based database searching is that users can search the most recent data without regularly downloading and updating the whole database on their local disks. In addition, there is no need to install additional software on the user's computer except a standard web browser. In this research, the 3D viewer program Jmol, written in Java, is used. The Jmol applet (www.jmol.sourceforge.net/applet) can be downloaded through the network and integrated into the web document. With this applet, users can see the 3D structure directly within the web page through a Java-enabled web browser, such as Netscape or Internet Explorer (IE).

2. The Server

The server consists of one or more computers that receive and process requests

from one or more client machines. A server is typically designed with some

redundancy in power, network, computing, and file storage. However, a machine with

dual processors is not necessarily a server. An individual workstation can function as

a server.

The server side in this research runs the Linux operating system, the Apache web server, the MySQL database engine and the PHP (Hypertext Preprocessor) scripting language, a combination known as LAMP (Linux, Apache, MySQL, PHP). LAMP is an open source web server solution that is both powerful and stable. The homology modeling tool for this work was MODELLER, which models the 3D structure of a protein from its sequence and a template and is controlled by Python scripts. This combination of software allows one to build and customize a high-performance and innovative web server.

2.1 Linux operating system

Linux is an open source and constantly evolving operating system which

allows the administrator full control over almost every system aspect. In recent years,

Linux has proven itself as a production level competitor in the operating system world

(www.linuxforum.com).


2.2 Apache web server

Apache is the most popular web server on the net. It is very secure, fast,

and reliable. Moreover, it is used for transferring data between the client and the

server (www.linuxhelp.net). Apache 2.2 is used in this study.

2.3 MySQL database engine

MySQL is a fast, multithreaded, multi-user, platform-independent and robust Structured Query Language (SQL) database server and a powerful relational database management system (RDBMS) that is widely used on the Internet. It is well suited to web-based applications. For example, one could store website member information in a MySQL database for use on a website. A database server allows a webmaster to keep data organized and accessible (www.mysql.com). MySQL-3.23.58-1.9.i386 is the back-end database engine in this research, chosen for scalability, flexibility and high performance.

2.4 PHP scripting language

PHP is a relatively new server-side programming language that allows webmasters to easily add dynamic pages to their websites (www.php.net). PHP is extremely versatile: it can do everything from printing database data to a web page to setting browser cookies. All of these applications are open source and free for experiment and modification. Many tools have been developed for MySQL with PHP, such as phpMyAdmin (www.mysql.com), a very good web-based administration tool for MySQL. PHP has several advantages over its competitors (ASP, Perl and Java): it is object oriented, can be embedded into Hypertext Markup Language (HTML), is very fast and cross-platform compatible, and can run as an Apache module. PHP 5 is used in this research.


2.5 Python scripting language

Python, developed by Guido van Rossum in 1991, is a scripting language like Perl, PHP and Tcl (Alfred V. A., 2004). It is an interpreted, interactive, object-oriented programming language designed around a philosophy that emphasizes readability and the importance of programmer effort over computer effort (John K. O., 1998; M.F. Sanner, 1999). Python scripts can be developed easily with open source integrated development environments (IDEs) and interactive shells, such as IDLE (http://www.python.org/idle/), PyDev (http://pydev.sourceforge.net/), IPython (http://ipython.scipy.org/moin/) and others.

2.6 MODELLER

MODELLER is a computer program for comparative protein structure modeling presented by Andrej Sali (Sali A. et al., 1993; Fiser A. et al., 2000). Its input is an alignment of the sequence to be modeled with the template structure, the atomic coordinates of the template, and a simple Python script.
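A minimal sketch of how such an alignment input could be written from Python is given below. It assumes the PIR layout used by MODELLER as we understand it: a `>P1;code` line, a header line of ten colon-separated fields, and a sequence terminated by `*`. The helper `format_pir_entry`, the entry codes and the toy sequences are hypothetical, and the exact field conventions should be checked against the MODELLER manual.

```python
def format_pir_entry(code, sequence, kind="sequence",
                     first_res="", first_chain="",
                     last_res="", last_chain="", width=75):
    """Format one alignment entry in the PIR layout assumed above."""
    # Ten colon-separated header fields; the last four (protein name,
    # source, resolution, R-factor) are left empty in this sketch.
    header = [kind, code, first_res, first_chain,
              last_res, last_chain, "", "", "", ""]
    body = sequence + "*"  # '*' terminates the sequence
    lines = [">P1;" + code, ":".join(header)]
    # Break the sequence into fixed-width lines.
    lines += [body[i:i + width] for i in range(0, len(body), width)]
    return "\n".join(lines) + "\n"


# Hypothetical template and target entries with toy sequences.
alignment = (
    format_pir_entry("template", "DQICIGYHANNSTEQ", kind="structureX",
                     first_res="1", first_chain="A",
                     last_res="15", last_chain="A")
    + format_pir_entry("target", "DQICIGYHANNSTEK")
)
print(alignment)
```

The resulting text would be saved to a file and passed, together with the template coordinates, to the modeling script.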


OBJECTIVES

1. To propose a 3D visualization system to aid understanding of hemagglutinin mutations in influenza A virus.

2. To integrate web-based tools for comparative analysis of influenza virus data.

3. To model the impact of mutations on the protein structure of influenza virus.


LITERATURE REVIEW

Mutations in HA are one of the most important mechanisms for producing variation in influenza viruses (Robert G. W., 1992). Studies consider the position of each mutation in the protein sequence, characterized by its association with glycosylation sites, the cleavage site, residues of the receptor binding site and antigenic sites (Shiuh-Ming L. et al., 1992). HA is thus the main determinant of the host range of the virus (Kanta S. et al., 1998).

Clayton W. N. et al. (1984) reported that two mutations in the receptor-binding site of a human hemagglutinin, at residues 226 and 228, allowed replication in ducks.

Kanta S. et al. (1998) isolated an avian H5N1 influenza A virus (A/Hong Kong/156/97) from a tracheal aspirate obtained from a 3-year-old child in Hong Kong with a fatal illness consistent with influenza. They found that the hemagglutinin protein contained multiple basic amino acids adjacent to the cleavage site, a feature characteristic of highly pathogenic avian influenza A viruses.

Ya H. et al. (2001) determined four new three-dimensional structures of avian H5 and swine H9 influenza hemagglutinins (HAs), from viruses closely related to those that caused outbreaks of human disease in Hong Kong in 1997 and 1999, bound to avian and human cell receptor analogs. The structures show that HA binding sites specific for human receptors appear to be wider than those preferring avian receptors, and how avian and human receptors are distinguished by atomic contacts at the glycosidic linkage. They compared the new structures with previously reported crystal structures of HA/sialoside complexes of the H3 subtype that caused the 1968 Hong Kong influenza pandemic and analyzed them in relation to the HA sequences of all 15 subtypes and to receptor affinity.


Ya H. et al. (2002) determined the three-dimensional structures of the hemagglutinins (HAs) from H5 avian and H9 swine viruses closely related to the viruses isolated from humans in Hong Kong. They compared them with the known structures of the H3 HA from the virus that caused the 1968 H3 pandemic and of the HA-esterase-fusion (HEF) glycoprotein from an influenza C virus. The results suggest that HA subtypes may have originated by diversification of properties that affected the metastability of HAs required for their membrane fusion activities in viral infection.

Enrique T. M. and Michael W. D. (2005) proposed a method, based on statistical mechanics and probabilistic statistics, to quantify the non-monotonic immune response that results from antigenic drift in the epitopes of the hemagglutinin and neuraminidase proteins. Their results, which compare epitope sequences of the hemagglutinin proteins of A/Fujian/411/2002 and A/Panama/2007/99, explain the ineffectiveness of the 2003-2004 influenza vaccine in the United States and provide an accurate measure by which to optimize the effectiveness of future annual influenza vaccines.

James S. et al. (2006) investigated the relation between the hemagglutinin (HA) structure from a highly pathogenic Vietnamese H5N1 influenza virus (Viet04) and the HAs of the 1918 and other human H1 influenza A viruses. They found that the Viet04 HA is more closely related to the 1918 and other human H1 HAs than to a 1997 duck H5 HA. Variations in the antigenic site and receptor binding site that affect α2-3 and α2-6 receptor specificity only enhanced or reduced affinity for avian-type receptors. Mutations that can convert avian H2 and H3 HAs to human receptor specificity, when inserted onto the Viet04 H5 HA framework, permitted binding to a natural human receptor.

Functional characterization of a protein sequence is one of the most frequent problems in biology. This task is usually facilitated by an accurate three-dimensional structure of the studied protein. Several computer programs and web servers automate the homology modeling process. Web servers for automated modeling include ModLoop (Andras F. and Andrej S., 2003) and the SWISS-MODEL workspace (Konstantin A. et al., 2006). Free computer programs for automated structure modeling, based on several different theories, include Scwrl, Sccomp and MODELLER (Sali A. et al., 1993; Fiser A. et al., 2000).

In 1995, Andrej S. showed that the number of sequences that can be modeled is an order of magnitude larger than the number of experimentally determined protein structures, when a protein sequence has at least 40% identity to a known structure. Evaluation techniques are available that can estimate errors in different regions of the model. In the same year, he evaluated the three-dimensional structures of human nucleoside diphosphate kinase, mouse cellular retinoic acid binding protein I and human eosinophil neurotoxin calculated by MODELLER, a program for comparative protein modeling by satisfaction of spatial restraints. When a template structure with more than 40% sequence identity to the target protein was available, the model was likely to have about 90% of the main chain atoms modeled with a root mean square (rms) deviation from the X-ray structure of ~1 Å, in large part because the templates were likely to be that similar to the X-ray structure of the target. This rms deviation was compared with the overall differences between refined NMR and X-ray crystallography structures of the same protein.
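The two quantities used above, percent sequence identity and rms deviation, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sequences are already aligned with '-' as the gap character, identity is taken over the columns where both sequences have a residue (other definitions exist), and the two coordinate sets are already superposed and listed in the same atom order. The function names are hypothetical.

```python
import math


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where both sequences have a residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != gap and b != gap]
    if not pairs:
        raise ValueError("no aligned residue pairs")
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)


def rms_deviation(coords_a, coords_b):
    """Rms deviation (same units as the input) between two lists of
    (x, y, z) points that are already superposed and in the same order."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy data for illustration only.
print(percent_identity("ACDE-G", "ACNEFG"))
print(rms_deviation([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)]))
```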

Adrian A. C. et al. (2003) presented SCWRL, which is designed to be used as a homology modeling tool and therefore, in contrast to many publicly available programs, preserves all input coordinate features. The new algorithm for this tool uses results from graph theory to solve the combinatorial problem encountered in side-chain prediction. Side chains are represented as vertices in an undirected graph, and the resulting graph can be partitioned into connected subgraphs with no edges between them. The algorithm completes predictions on a set of 180 proteins with 34,342 side chains in <7 min of computer time. The total χ1 and χ1+2 dihedral angle accuracies are 82.6% and 73.7%, using a simple energy function based on the backbone-dependent rotamer library and a linear repulsive steric energy.
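The partitioning step described above can be illustrated with a short Python sketch. It assumes that an edge joins two side chains whenever they can interact (our reading of the graph construction, not a detail given here), and it only finds the connected subgraphs by depth-first search. It is not the SCWRL implementation, which does considerably more. The function name and the toy edge list are hypothetical.

```python
def connected_components(n_residues, interacting_pairs):
    """Partition residues 0..n_residues-1 into connected subgraphs.

    interacting_pairs is a list of (i, j) index pairs, one per edge of the
    undirected interaction graph.
    """
    adjacency = {i: set() for i in range(n_residues)}
    for a, b in interacting_pairs:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in range(n_residues):
        if start in seen:
            continue
        # Depth-first search from an unvisited residue collects one subgraph.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(sorted(component))
    return components


# Toy graph: residues 0-1-2 interact in a chain, 4 and 5 interact, 3 is isolated.
print(connected_components(6, [(0, 1), (1, 2), (4, 5)]))
```

Because no edges connect different subgraphs, each one can be treated as an independent, smaller side-chain placement problem.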

Torsten S. et al. (2003) presented SWISS-MODEL, a server for automated comparative modeling of three-dimensional protein structures. It provides several levels of user interaction through its World Wide Web interface; template selection, alignment and model building are done completely automatically by the server. Complex modeling tasks can be handled in ‘project mode’ using DeepView (Swiss-PdbViewer), an integrated sequence-to-structure workbench. All models are sent back via email with a detailed modeling report. WhatCheck analyses and ANOLEA evaluations are provided optionally.

Andras F. and Andrej S. (2003) proposed ModLoop, a web server for automated modeling of loops in protein structures. The input is the atomic coordinates of the protein structure in Protein Data Bank format, and the output is the coordinates of the non-hydrogen atoms in the modeled segments. ModLoop relies on the loop modeling routine in MODELLER, which predicts loop conformations by satisfaction of spatial restraints. A user provides the input to the server via a simple web interface and receives the output by email.

Eran E. et al. (2004) presented SCCOMP, a program used to concurrently predict the conformations of multiple amino acid side chains on a fixed protein backbone. With relatively fast execution times, it correctly predicts χ1 angles for 92–93% of buried residues and 82–84% of all residues, with an RMSD of 1.7 Å for side chain heavy atoms. The authors also examined the influence of crystal packing, completeness of the rotamer library and precise positioning of Cβ atoms on the accuracy of side-chain prediction.

Konstantin A. et al. (2006) presented the SWISS-MODEL workspace, a web-based integrated service dedicated to protein structure homology modeling. It assists and guides the user in building protein homology models at different levels of complexity. The system provides a workspace for each user, in which several modeling projects can be carried out in parallel. Protein sequence and structure databases necessary for modeling are accessible from the workspace and are updated at regular intervals. Tools for template selection, model building and structure quality evaluation can be invoked from within the workspace.


Nowadays, many researchers collect and/or develop tools for sequence alignment, protein visualization or protein function prediction to aid understanding of protein function. Such tools include the following:

Patrick A. et al. (2001) automated and benchmarked a method based on the evolutionary trace approach. Using multiple sequence alignment, they identified invariant polar residues, which were mapped onto the protein structure to predict functional sites. This algorithm for functional site prediction was used to assess the validity of transferring function between homologues. It was also used to filter putative docked complexes, with a discrimination similar to that obtained by manually including biological information about active sites or binding residues.
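The idea of picking out invariant polar residues from a multiple sequence alignment can be sketched as follows. This is a simplified illustration rather than the evolutionary trace method itself: it assumes equal-length aligned sequences, treats a column as invariant only if every sequence has the same residue, and uses a hypothetical set of one-letter codes, `POLAR`, to decide which residues count as polar.

```python
# Assumption: residues treated as polar in this sketch (one-letter codes).
POLAR = set("STNQYCHDEKR")


def invariant_polar_columns(alignment):
    """Return (column number, residue) for columns that are identical in
    every sequence and hold a polar residue. Columns are numbered from 1."""
    if not alignment:
        return []
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    hits = []
    for index, column in enumerate(zip(*alignment), start=1):
        if len(set(column)) == 1 and column[0] in POLAR:
            hits.append((index, column[0]))
    return hits


# Toy alignment for illustration only.
toy_alignment = ["ASKDL", "GSKEL", "ASKDL"]
print(invariant_polar_columns(toy_alignment))
```

In a full analysis, the columns found in this way would then be mapped onto residue positions in the 3D structure.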

Yutaka U. and Kiyoshi A. (2002) proposed MOSBY, a molecular structure viewer program used to understand protein molecules. It is designed to be portable, with a comprehensive user interface built on a high-throughput graphics library. MOSBY illustrates that portability and extensibility are prerequisites for a software platform in scientific computing that supports a variety of analyses and calculations with atomic coordinates.

Andreas E. et al. (2003) proposed MOBILE, a process that models protein binding sites including bound ligand molecules as restraints. They applied a homology modeling method based on MODELLER to generate the target protein models, which were then refined iteratively by including information about bioactive ligands as spatial restraints and optimizing the mutual interactions between the ligands and the binding sites. The optimized models can be used for structure-based drug design and virtual screening.

Marc A. et al. (2003) presented a procedure for modeling protein structures from protein sequences by comparative modeling, using the MODELLER program as the main tool. They described the steps in comparative modeling: fold assignment and template selection, target-template alignment, model building, and predicting the model accuracy.


Valentin A. I. et al. (2003) proposed ModView, a web application for visualization of multiple protein sequences and structures that integrates a multiple structure viewer, a multiple sequence alignment editor and a database querying engine. It makes it possible to interactively manipulate hundreds of proteins and to visualize conserved and variable residues. In addition, it can be included in HTML pages along with text and figures, which makes it useful for teaching and presentations.

Thomas J. O. (2004) proposed a visualization tool that makes correlations between distinct biological disciplines, using visualization techniques to highlight the critical information. The tool provides context-maintaining views, a fisheye sequence view and a magic lens, developed to display protein structure and sequence information.

Vladimir S. et al. (2005) presented SPACE, a suite of tools for analysis and prediction of the structures of biomolecules and their complexes. It includes the following servers and programs: the LPC/CSU software, which provides a common definition of inter-atomic contacts and complementarity of contacting surfaces for analyzing protein structures and complexes; the CryCo server for building a crystal environment and analyzing crystal contacts; CMA for construction and analysis of protein contact maps; MutaProt for structural analysis of point mutations; SCCOMP for side chain modeling based on surface complementarity; and the LIGIN molecular docking software.

David E. et al. (2006) studied 24 individual assessment scores, including physics-based energy functions, statistical potentials and machine learning-based scoring functions. Individual scores were also used to construct 85,000 composite scoring functions using support vector machine (SVM) regression. The scores were tested for their ability to identify the most native-like models from a set of 6,000 comparative models of 20 representative protein structures. Each of the 20 targets was modeled using a template of <30% sequence identity. The best SVM score outperformed all individual scores by decreasing the average root mean square deviation (RMSD) difference between the model identified as the best of the set and the model with the lowest RMSD from 0.63 Å to 0.45 Å, while having a higher Pearson correlation coefficient to RMSD (r = 0.87) than any other tested score. They found that the most accurate score is based on a combination of scores that includes the DOPE non-hydrogen atom statistical potential.
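The two evaluation measures mentioned above can be sketched in Python as follows. The sketch assumes that a lower score means a better model (as for an energy-like score such as DOPE) and uses made-up numbers. The function names `pearson_r` and `rmsd_gap` are hypothetical and do not come from the cited study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists.
    Raises ZeroDivisionError if either list has zero variance."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rmsd_gap(scores, rmsds):
    """RMSD of the best-scoring model minus the lowest RMSD in the set.
    Assumes that a lower score indicates a better model."""
    best_by_score = min(range(len(scores)), key=scores.__getitem__)
    return rmsds[best_by_score] - min(rmsds)


# Made-up scores and RMSD values (in angstroms) for three models of one target.
scores = [-3.0, -1.0, -2.0]
rmsds = [1.5, 4.0, 1.0]
print(pearson_r(scores, rmsds))
print(rmsd_gap(scores, rmsds))
```

A gap of zero would mean that the score picked the model closest to the native structure.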

Ursula P. et al. (2006) proposed MODBASE, a database of annotated comparative protein structure models for all available protein sequences that can be matched to at least one known protein structure. It is updated regularly to reflect the growth of protein sequence and structure databases and changes in model assessment. MODBASE also allows users to generate comparative models for proteins of interest with the automated modeling server, MODWEB.

B. Balamurugan et al. (2007) presented PSAP, the Protein Structure Analysis Package, which calculates and displays various hidden structural and functional features of three-dimensional protein structures. The proposed computing engine provides an easy-to-use web interface to compute and visualize the necessary features dynamically on the client machine, and its options are intended to better serve researchers working in the area of structural biology.

Marc A. M. et al. (2007) proposed the DBAli tools, which use a comprehensive set of structural alignments in the DBAli database to leverage the structural information deposited in the Protein Data Bank (PDB). The tools allow users to input the 3D coordinates of a protein structure for comparison by MAMMOTH against all chains in the PDB; annotate a target structure based on its stored relationships to other structures with the AnnoLite and AnnoLyze tools; cluster structures by sequence and structure similarities with the ModClus program; identify domains as recurrent structural fragments with the ModDom program; and create a multiple structure alignment for a set of related protein structures with an implementation of the COMPARER method in the SALIGN command of MODELLER. The tools are freely accessible via the World Wide Web and allow users to mine the protein structure space by establishing relationships between protein structures and their functions.

16

Michal J. P. et al. (2007) proposed PROTMAP2D, a software tool written in the Python programming language for calculating contact and distance maps based on user-defined criteria and for quantitative comparison of pairs or series of contact maps. It works on 3D models, provides many options for visualization and statistics of the maps, allows the output to be saved as bitmap graphics or ASCII files, and can be used for MD trajectories comprising multiple conformations.

Stephen C. (2007) presented SChiSM2, a web server-based program for creating web pages that include interactive molecular graphics, using Jmol for illustration. The SChiSM2 interface provides two options for choosing a structure file to display in the page: providing the URL of a structure, or supplying the structure as text in Protein Data Bank file format. It works with standard World Wide Web server software.

MATERIALS AND METHODS

Materials

1. Hardware - DELL workstations; Intel Pentium IV 3.0 GHz, 2 GB of RAM (Cheminformatics Research Unit, Department of Chemistry, Kasetsart University, Bangkok)

2. Software

2.1 Linux Operating System – Fedora Core 6.0

2.2 MODELLER version 9

2.3 Jmol version 11.3

2.4 Python Integrated Development Environment version 2.5

2.5 Java Runtime Environment version 5

2.6 PHP version 5

2.7 phpMyAdmin version 2.5.9


Methods

1. Homology modeling

In this work, MODELLER was used as the main program to compute the 3D structure of the hemagglutinin (HA) protein of influenza A virus subtype H5N1. The process consists of the four steps shown in Figure 3.

Figure 3 Steps in comparative protein structure modeling

1.1 Target and Template selection

The first step in homology modeling is to identify one or more template structures that have detectable similarity to the target. In this work, the system provides eight template structures of HA, from which the user selects one.

1.2 Sequence alignment method

The sequence-structure alignment is calculated using the MODELLER module based on a global dynamic programming algorithm (Needleman, S. B. and Wunsch, C. D., 1970). It differs from standard sequence-sequence alignment methods because it takes structural information from the template into account when constructing the alignment. This is achieved through a variable gap penalty function. Consider two sequences of elements and an M × N score matrix W, where M and N are the numbers of elements in the first and second sequence. The scoring matrix is composed of scores \(W_{i,j}\) describing the differences between elements i and j from the first and second sequence, respectively. The recursive dynamic programming formulae that give a matrix D are:

\[
D_{i,j} = \min \left\{ \begin{array}{l} P_{i,j} \\ D_{i-1,j-1} + W_{i,j} \\ Q_{i,j} \end{array} \right. \tag{1}
\]

\[
P_{i,j} = \min \left\{ \begin{array}{l} D_{i-1,j} + g(1) \\ P_{i-1,j} + v \end{array} \right. \tag{2}
\]

\[
Q_{i,j} = \min \left\{ \begin{array}{l} D_{i,j-1} + g(1) \\ Q_{i,j-1} + v \end{array} \right. \tag{3}
\]

where g(l) is a linear gap penalty function:

\[
g(l) = u + v \cdot l \tag{4}
\]

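Note that only a vector is needed for the storage of P and Q. The uppermost formula, equation (1), is calculated for i = M and j = N. Variable l is the gap length and parameters u and v are gap penalty constants.

The arrays D, P and Q are initialized as follows:

\[
D_{i,0} = \left\{ \begin{array}{ll} 0, & i \le e \\ g(i-e), & e < i \le M \end{array} \right. \tag{6}
\]

\[
D_{0,j} = \left\{ \begin{array}{ll} 0, & j \le e \\ g(j-e), & e < j \le N \end{array} \right. \tag{7}
\]

\[
P_{i,0} = Q_{i,0} = \infty, \quad i = 1, 2, 3, \ldots, M \tag{8}
\]

\[
P_{0,j} = Q_{0,j} = \infty, \quad j = 1, 2, 3, \ldots, N \tag{9}
\]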

where parameter e is the maximal number of elements at the sequence termini that are not penalized with a gap penalty if they are not equivalenced. A segment of length e at a terminus is termed an "overhang". The minimal score \(d_{M,N}\) is obtained from

\[
d_{M,N} = \min(D_{i,N}, D_{M,j}) \tag{10}
\]

where i = M, M-1, ..., M-e and j = N, N-1, ..., N-e to allow for the overhangs. The equivalence assignments are obtained by backtracking in matrix D. Backtracking starts from the element \(D_{i,j} = d_{M,N}\).
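The recursion of this section can be illustrated with a minimal Python sketch. It is not MODELLER's implementation: it assumes constant gap parameters u and v (the variable, structure-dependent gap penalty is not reproduced), a precomputed dissimilarity matrix W, and borders of P and Q initialized to infinity. The function name align_score is illustrative, and backtracking is omitted.

```python
import math


def align_score(W, u, v, e=0):
    """Return the minimal global alignment score d_{M,N}.

    W : M x N list of lists; W[i-1][j-1] is the dissimilarity of element i
        of the first sequence and element j of the second sequence.
    u, v : gap penalty constants, g(l) = u + v * l.
    e : overhang length (terminal elements that are not gap-penalized).
    """
    M, N = len(W), len(W[0])
    inf = math.inf

    def g(l):
        return u + v * l

    # Full matrices are kept for readability; a vector suffices for P and Q.
    D = [[inf] * (N + 1) for _ in range(M + 1)]
    P = [[inf] * (N + 1) for _ in range(M + 1)]
    Q = [[inf] * (N + 1) for _ in range(M + 1)]

    # Initialization of the borders of D; P and Q borders stay at infinity
    # (an assumption of this sketch).
    for i in range(M + 1):
        D[i][0] = 0 if i <= e else g(i - e)
    for j in range(N + 1):
        D[0][j] = 0 if j <= e else g(j - e)

    # Recursion: open a gap with g(1) or extend an existing one with v.
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            P[i][j] = min(D[i - 1][j] + g(1), P[i - 1][j] + v)
            Q[i][j] = min(D[i][j - 1] + g(1), Q[i][j - 1] + v)
            D[i][j] = min(P[i][j], D[i - 1][j - 1] + W[i - 1][j - 1], Q[i][j])

    # Minimal score over the last column and row, allowing overhangs of e.
    last_col = [D[i][N] for i in range(max(M - e, 0), M + 1)]
    last_row = [D[M][j] for j in range(max(N - e, 0), N + 1)]
    return min(last_col + last_row)
```

For example, with W = [[0, 1]] (one element aligned against two), u = 1 and v = 1, the unmatched element costs one gap of length 1, g(1) = 2; with e = 1 the same gap falls in the overhang and is not penalized.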

1.3 Structure building

Comparative modeling by MODELLER, as implemented in the automodel class, can be described as the following sequence of steps.

Input: a script file, an alignment file and the PDB file of the template.

Output: job.log, log file; job.ini, initial conformation for optimization; job.rsr, restraints file; job.sch, VTFM schedule file; job.B9999????, PDB atom file for the model of the target sequence; job.V9999????, violation profile for the model; job.D9999????, progress of optimization; job.BL9999????, optional loop model; job.DL9999????, progress of optimization for the loop model; and job.IL9999????, initial structure for the loop model. Here "????" indicates the model number, a value between 0001 and 9999.

1.3.1 Read and check the alignment between the target sequence and

the template structures.

1.3.2 Calculate restraints on the target from its alignment with the

templates:

1.3.2.1 Generate molecular topology for the target sequence.

Disulfides in the target are assigned here from the equivalent disulfides in the

templates. Any user defined patches are also done here.

1.3.2.2 Calculate coordinates for atoms that have equivalent

atoms in the templates as an average over all templates.

1.3.2.3 Build the remaining unknown coordinates using internal

coordinates from the CHARMM topology library.

1.3.2.4 Write the initial model to a file with extension .ini.

1.3.2.5 Write all restraints to a file with extension .rsr.

1.3.3 Calculate model(s) that satisfy the restraints as well as possible. For each model, first generate the optimization schedule for the variable target function method (VTFM). Next, read the initial model. Last, randomize the initial structure by adding a random number between ±automodel.deviation angstroms to all atomic positions.

1.3.4 Optimize the model:

1.3.4.1 Partially optimize the model by VTFM. Repeat the following steps as many times as specified by the optimization schedule: select only the restraints that operate on atoms that are close enough in sequence, as specified by the current step of VTFM; then optimize the model by conjugate gradients, using only the currently selected restraints.

1.3.4.2 Refine the model by simulated annealing with molecular

dynamics, if so selected. First, do a short conjugate gradients optimization. Next,

increase temperature in several steps and do molecular dynamics optimization at each

temperature. Next, decrease temperature in several steps and do molecular dynamics

optimization at each temperature. Last, do a short conjugate gradients optimization.

1.3.5 Calculate the remaining restraint violations and write them out.

1.3.6 Write out the final model to a file with extension .B9999????.pdb

where ???? indicates the model number. Also write out the violations profile.

1.3.7 Superpose the models and the templates, if so selected by

automodel.final_malign3d = True, and write them out.
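The system writes a MODELLER input script for each job (model-single.py is one of the input files named in the 3D Automodel schematic). A minimal sketch of such a generator is shown below. The helper name write_model_script, the template text and the example file names are assumptions made for illustration; the generated script uses the standard automodel calls of MODELLER 9 (environ, automodel, make) and requests DOPE assessment of each model.

```python
from pathlib import Path

# Text of the generated MODELLER script. The placeholders in braces are
# filled in per job; the script itself only runs where MODELLER is installed.
SCRIPT_TEMPLATE = """\
from modeller import *
from modeller.automodel import *

env = environ()
a = automodel(env,
              alnfile='{alnfile}',
              knowns='{template}',
              sequence='{target}',
              assess_methods=(assess.DOPE,))
a.starting_model = 1
a.ending_model = {n_models}
a.make()
"""


def write_model_script(job_dir, alnfile, template, target, n_models=1):
    """Write model-single.py into job_dir and return its path.

    alnfile  : alignment file relating the target to the template
    template : alignment code of the template structure (e.g. '1jso')
    target   : alignment code of the target sequence
    """
    path = Path(job_dir) / "model-single.py"
    path.write_text(
        SCRIPT_TEMPLATE.format(
            alnfile=alnfile, template=template, target=target, n_models=n_models
        )
    )
    return path
```

A bot could then run the written script with the MODELLER Python interpreter inside the job directory, which produces the output files listed at the start of this section.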

1.4 Models evaluation

The accuracy of a model should be assessed to increase or decrease confidence in the model (M.A. Marti-Renom et al, 2000); in this work the Discrete Optimized Protein Energy (DOPE) score is used. DOPE is based on an improved reference state that corresponds to non-interacting atoms in a homogeneous sphere with the radius dependent on a sample native structure; it thus accounts for the finite and spherical shape of native structures. It is implemented in the homology modeling program MODELLER and is used to assess the energy of the protein models generated through many iterations by MODELLER, which produces homology models by satisfaction of spatial restraints. The models with the lowest DOPE score can be chosen as the most probable structures for further evaluation. DOPE can also generate a residue-by-residue energy profile for the input model, making it possible for the user to spot problematic regions in the structure model (Min-yi Shen and Andrej Sali, 2006).
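The two uses of the DOPE score described above can be sketched as follows, assuming the scores have already been read into Python dictionaries. The function names and all numeric values are illustrative and are not output of the system.

```python
def best_model(scores):
    """Return the (name, score) pair with the lowest total DOPE score."""
    return min(scores.items(), key=lambda item: item[1])


def worst_residues(profile, n=3):
    """Return the n residue numbers with the highest per-residue energy.

    profile maps residue number -> per-residue DOPE energy; higher (less
    negative) values point to potentially problematic regions.
    """
    return sorted(profile, key=profile.get, reverse=True)[:n]


# Hypothetical values for illustration only.
scores = {"model_1": -38000.5, "model_2": -39120.1}
profile = {45: -0.039, 93: -0.025, 129: -0.030, 174: -0.046}

print(best_model(scores))          # the model with the lowest score
print(worst_residues(profile, 2))  # the two least favourable residues
```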

2. System Architectures

For automated comparative modeling of the hemagglutinin H5N1 protein structure using MODELLER, I propose the system shown schematically in Figure 4.

Figure 4 Schematic of the 3D visualization system to aid understanding mutations in hemagglutinin protein (H5N1).

[Flowchart. Legend: web pages, user actions, bot processes. Elements: Home Page (How to use, login, New member); decision "login pass?" (yes/no); Workspace Page (2D alignment, 3D Automodel, View job, Download job); Post sequence; decision "Type of alignment?" (pairwise/multiple); Pairwise sequence alignment; Multiple sequence alignment; Sequence alignment; Model building; Generate evaluation data; Briefly Result; Download Result.]

2.1 Web-page Design

2.1.1 Workspace page

In the workspace page, a query form offers six separate query fields, including jobname, Result, Delete, Download and Status. These fields are connected to the database through MySQL commands. This is the main page, from which the user can start structure modeling and sequence alignment, view results, delete results and check job status.

2.1.2 2D Alignment page

When the user selects the sequence alignment option, the user chooses the type of sequence alignment: pairwise alignment with a template sequence, or multiple sequence alignment. The input sequence is uploaded to the server and processed by the bot programs.

Figure 5 Schematic diagram of the 2D Alignment page.

[Flowchart boxes: input (sequence data, sequence name and type of alignment); decision "Multiple sequence alignment?" (yes/no); "Write input file: make_align.py"; "Write uploaded file to directory. Write input file: malign.py"; "Update jobname, sequence, sequencename and templatename to 3dvis database".]

2.1.3 3D Automodel page

Figure 6 Schematic of the 3D Automodel page.

[Flowchart boxes: input (sequence data, sequence name and template selected); "Write input file: make_align.py, model-single.py, profile-gen.py, plot.gp and jobname.in"; "Update jobname, sequence, sequencename and templatename to 3dvis database".]

If the user selects the 3D automodel option, the user selects a template from the list of protein template names and supplies an input sequence for automated 3D structure modeling. The input is uploaded to the server and processed by the bot programs.

2.1.4 Briefly Result page

Figure 7 Schematic of the Briefly Result page.

[Flowchart boxes: input (username and jobname); "Get outputname and templatename and position of residue from 3dvis database"; "Write out 3D structure viewer (Jmol), sequence alignment and DOPE score graph between template and sequence structure".]

The Briefly Result page summarizes the automated 3D structure modeling of the target sequence. The page is divided into three parts. The first displays the 3D structure with Jmol and highlights important sites, such as glycosylation sites, the receptor-binding site and mutated residues. The second shows the sequence alignment of the template and the output structure. The last shows the evaluation of the model as a DOPE score graph. All of these data are shown on one web page.

2.1.5 Download Result page

The Download Result page lists all data for each job, such as the sequence alignment, 3D structure, DOPE score graph, log files and profile file.

2.2 Database Design

Figure 8 Layout of the related tables in the database.

3dvis: 3d_id, 3d_jobname, 3d_date, 3d_sequence, 3d_sequencename, 3d_chain, mem_username, template_name

member: mem_id, mem_username, mem_password, mem_email, mem_name, mem_surname, mem_address, mem_gender, mem_birthday, mem_jobnum

member_directory: dir_id, dir_name, job_name, job_stat, mem_username

templatedetail: template_id, template_name, template_pdbdirectory, template_profiledirectory, template_detail

chaindetail: chain_id, chain_name, chain_sequence, chain_sequencelength, chain_glycosite, chain_antigensite, chain_receptorsite, chain_cleavsite, chain_residuesite, template_name

MySQL was selected as the back-end database engine in this research. It is a free, powerful and robust relational database management system (RDBMS) and SQL database server. The database system consists of three parts: tables in the MySQL database, template structures and the output of the target sequences. The parts are connected to one another by username and jobname, as shown in Figure 9.

Figure 9 The three parts of the data collection in the database system, connected by username and jobname.

2.2.1 Tables in MySQL database

Seven tables were designed to store all property fields of the data. Three tables store the templates and the details of each template structure, one table stores membership information, and the last two tables are connection tables.

2.2.2 Job data folder

The sequence alignment and structure data are stored in the db_Sys folder and linked to the tables by username and jobname.

2.2.3 PDB data folder

The template structures in PDB (Protein Data Bank) format are stored in the template3d folder and linked to the templatedetail table by templatename.

2.2.4 Profile data folder

The template structure profiles used in the evaluation process are stored in the template3d folder and linked to the templatedetail table by templatename.
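The linkage by username, jobname and templatename can be sketched with a small in-memory database. This is only an illustration: it uses Python's sqlite3 module as a stand-in for MySQL, keeps a subset of the columns listed in Figure 8, and the column types, constraints and the helper names make_db and job_info are assumptions.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a subset of the Figure 8 tables."""
    conn = sqlite3.connect(":memory:")
    # Identifiers that start with a digit (3dvis, 3d_id, ...) must be quoted.
    conn.executescript('''
        CREATE TABLE member (
            mem_id INTEGER PRIMARY KEY,
            mem_username TEXT UNIQUE NOT NULL
        );
        CREATE TABLE templatedetail (
            template_id INTEGER PRIMARY KEY,
            template_name TEXT UNIQUE NOT NULL,
            template_pdbdirectory TEXT
        );
        CREATE TABLE "3dvis" (
            "3d_id" INTEGER PRIMARY KEY,
            "3d_jobname" TEXT NOT NULL,
            "3d_sequencename" TEXT,
            mem_username TEXT REFERENCES member(mem_username),
            template_name TEXT REFERENCES templatedetail(template_name)
        );
    ''')
    return conn


def job_info(conn, username, jobname):
    """Return (sequence name, template name, template PDB path) for a job,
    or None if the user has no job of that name."""
    return conn.execute(
        'SELECT j."3d_sequencename", t.template_name, t.template_pdbdirectory '
        'FROM "3dvis" AS j '
        'JOIN templatedetail AS t ON t.template_name = j.template_name '
        'WHERE j.mem_username = ? AND j."3d_jobname" = ?',
        (username, jobname),
    ).fetchone()
```

Looking a job up by the (username, jobname) pair mirrors how the job data folder and the template folder are tied to the tables in the design above.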

2.3 Bots

Bots are autonomous programs that act as agents simulating human activity; here they are used to find and execute the jobs that users submit to the system. The bots are divided into three parts, for sequence alignment, automated 3D structure modeling and adding a ligand to the 3D structure. Each bot uses the appearance of the output file of a step as the condition that the step has finished.
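The completion check used by the bots can be sketched as follows. The step names follow the three parts named above; the output file name patterns and the function name next_step are hypothetical and chosen only for illustration.

```python
from pathlib import Path

# Hypothetical mapping: for each step, the output file whose presence
# marks that step as finished.
STEPS = [
    ("alignment", "{job}.pap"),
    ("model", "{job}.B99990001.pdb"),
    ("ligand", "{job}_ligand.pdb"),
]


def next_step(job_dir, job):
    """Return the first step whose output file is missing in job_dir,
    or None when every step has produced its output."""
    for name, pattern in STEPS:
        if not (Path(job_dir) / pattern.format(job=job)).exists():
            return name
    return None
```

A bot built this way would call next_step periodically for every submitted job and launch the corresponding program until the function returns None.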


Figure 10 Schematic diagram of the Bots program.

RESULTS AND DISCUSSION

A graphical user interface was developed and named "3dvis" (3D Visualization system). It contains 5 hemagglutinin protein structures, in PDB file format, for use as template structures.

1. Homepage

The home page is the first page users see when they visit the system, and it is open to all users. The page is composed of two parts, as shown in Figure 11.

Figure 11 Home page of the system.

The first part contains the header and menus. The menus open to all users include How to use and About this work. Users who need member access can register through the New member link, providing personal information and their purpose for access. The second part is the member login. The member must fill in both username and password. The system then checks the validity and status of the user to decide whether an authorized member may enter the system.

2. Membership page

After the system has validated the user, it proceeds to the workspace page. The user can select menu items to run an alignment, automatically build a model of a protein structure using hemagglutinin H5N1 template structures, or post questions or problems about the system.

2.1 Workspace page

This page contains the membership menu, including 2D alignment, 3D Automate model, PDB lists, webboard, wiki, Changepassword and Logout.

Figure 12 Workspace page.


2.2 2dalign page

If the user selects the 2D alignment menu, the 2dalign page appears. The user selects a template structure to align with the sequence posted to the server (the target sequence), or uploads a sequence file in FASTA format (.fa) to perform multiple sequence alignment.

Figure 13 2dalign page.

2.3 3dauto page

If the user selects the 3D Automate model menu, the 3dauto page appears. The user selects a template structure with which to generate a model from the target sequence posted to the server, and gives the target model a name. In addition, the user can select a ligand type to add to the model.

Figure 14 3dauto page.

2.4 Result page

After the user has submitted a 3D automate model job and the system has finished calculating the model, the user can use the view menu to see the result. This page shows the details of the model: the 3D structure, the sequence alignment and the DOPE score evaluation graph, which help the user choose the most probable structure.

Figure 15 Result page.

2.5 Download page

When the system has finished calculating the structure, the user can download all files generated as input and output of the MODELLER program.

Figure 16 Download page.


3. Example Model – KAN-1 (chain A)

3.1 With 1JSO template structure.

Figures 15 and 16 show the result of the structure calculated from the sequence of KAN-1 hemagglutinin (Influenza A virus (A/Thailand/1(KAN-1)/2004(H5N1))). Compared with the 1JSO template (structure of avian H5 hemagglutinin bound to LSTC receptor), the result shows mutations at residue positions 45, 71, 83, 88, 93, 94, 107, 108, 115, 119, 124, 126, 129, 138, 139, 140, 155, 156, 174, 198, 209, 212, 217, 261, 263, 268, 309 and 320. The result files comprise the alignment file (.pap), 3D structure file (.pdb), DOPE score graph picture (.png), profile file (.profile) and the other input and output files of MODELLER.

3.2 With 1JSM template structure.

For the 1JSM template, mutated residues appear at positions 45, 71, 83, 88, 93, 94, 107, 108, 115, 119, 124, 126, 129, 138, 139, 140, 155, 156, 174, 198, 209, 212, 217, 261, 263, 268, 308 and 320.

3.3 With 1JSN template structure.

For the 1JSN template, mutated residues appear at positions 45, 71, 83, 88, 93, 94, 107, 108, 115, 119, 124, 126, 129, 138, 139, 140, 155, 156, 174, 198, 209, 212, 217, 261, 263, 268, 308 and 320.

3.4 With 2FK0 template structure.

For the 2FK0 template, mutated residues appear at positions 36 and 139.

3.5 With 2IBX template structure.

For the 2IBX template, a mutated residue appears at position 139.
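Lists of mutated positions such as those in this section can be derived from a pairwise alignment of the template and target sequences. The sketch below assumes one-letter sequences of equal aligned length with "-" for gaps and numbers positions along the template; the function name and the numbering convention are assumptions, since the text does not state how the system numbers residues.

```python
def mutation_positions(template_aln, target_aln, start=1):
    """Return (position, template_residue, target_residue) for every aligned
    column where the two sequences differ.

    Positions are counted along the template, beginning at `start`;
    columns with a gap in either sequence are not reported as mutations.
    """
    if len(template_aln) != len(target_aln):
        raise ValueError("aligned sequences must have equal length")
    mutations = []
    pos = start - 1
    for t, q in zip(template_aln, target_aln):
        if t == "-":
            continue  # insertion in the target: no template position
        pos += 1
        if q != "-" and q != t:
            mutations.append((pos, t, q))
    return mutations
```

For two short aligned fragments that differ only in their last residue, the function returns a single entry giving that position and the two residues.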


4. Comparison of structures with another resource

To determine whether structures from the system are suitable as initial structures, we compared each model with the model calculated by the SWISS-MODEL WORKSPACE (http://swissmodel.expasy.org/workspace/index.php). DOPE scores were calculated for the structures from both resources. The results are shown in Figure 17 and Table 1.

Figure 17 DOPE score graph of the KAN-1 3D structure calculated with the 3dvis system (green line) and the SWISS-MODEL workspace (red line), and of the 3D structure of 1JSO.


Table 1 DOPE scores of the mutated residues in the KAN-1 models calculated by this system and by the SWISS-MODEL workspace, and in the 1JSO structure, with the absolute ratio of each model residue's score to that of the corresponding template residue.

Residue   Residue         DOPE score                        Absolute ratio
number    1jso   KAN-1    3dvis      swiss      1jso        3dvis-1jso  swiss-1jso
45        ASN    ASP      -3.30E-02  -3.70E-02  -3.90E-02   0.846       0.949
71        LEU    ILE      -3.30E-02  -3.40E-02  -3.40E-02   0.971       1.000
83        ASP    ALA      -3.50E-02  -3.50E-02  -3.40E-02   1.029       1.029
88        GLY    ASP      -3.40E-02  -3.40E-02  -3.30E-02   1.030       1.030
93        GLU    GLY      -3.30E-02  -3.40E-02  -2.50E-02   1.320       1.360
94        ASN    ASP      -3.30E-02  -3.40E-02  -2.50E-02   1.320       1.360
107       SER    ARG      -3.50E-02  -3.80E-02  -3.50E-02   1.000       1.086
108       THR    ILE      -3.30E-02  -3.60E-02  -3.40E-02   0.971       1.059
115       ARG    GLN      -3.50E-02  -3.60E-02  -3.70E-02   0.946       0.973
119       ARG    LYS      -3.00E-02  -3.10E-02  -3.20E-02   0.938       0.969
124       ASN    SER      -2.50E-02  -2.40E-02  -2.60E-02   0.962       0.923
126       ASP    GLU      -2.80E-02  -2.60E-02  -2.60E-02   1.077       1.000
129       SER    LEU      -3.00E-02  -3.00E-02  -2.50E-02   1.200       1.200
138       ASN    GLN      -3.80E-02  -3.80E-02  -4.00E-02   0.950       0.950
139       GLY    ARG      -3.70E-02  -3.70E-02  -4.00E-02   0.925       0.925
140       ARG    LYS      -3.50E-02  -3.60E-02  -3.80E-02   0.921       0.947
155       ASN    SER      -3.20E-02  -3.30E-02  -2.90E-02   1.103       1.138
156       ALA    THR      -3.20E-02  -3.40E-02  -3.00E-02   1.067       1.133
174       ILE    VAL      -4.40E-02  -4.60E-02  -4.60E-02   0.957       1.000
198       VAL    ILE      -3.80E-02  -3.90E-02  -3.80E-02   1.000       1.026
209       SER    LEU      -3.20E-02  -3.50E-02  -3.10E-02   1.032       1.129
212       GLU    ARG      -2.90E-02  -3.10E-02  -2.90E-02   1.000       1.069
217       PRO    SER      -2.90E-02  -3.10E-02  -3.20E-02   0.906       0.969
261       GLY    ASP      -2.70E-02  -2.80E-02  -2.50E-02   1.080       1.120
263       ALA    THR      -2.90E-02  -2.90E-02  -2.80E-02   1.036       1.036
268       GLY    GLU      -2.90E-02  -2.90E-02  -3.00E-02   0.967       0.967
309       GLY    ASN      -3.50E-02  -3.80E-02  -3.30E-02   1.061       1.152
320       VAL    SER      -3.70E-02  -3.60E-02  -3.20E-02   1.156       1.125
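The text does not state how the "absolute ratio" columns are computed. The tabulated values are consistent with dividing the per-residue DOPE score of the model by that of the template and rounding to three decimals, which the following sketch assumes; the function name is illustrative.

```python
def absolute_ratio(model_score, template_score):
    """Absolute ratio of a model residue's DOPE score to the template's,
    rounded to three decimals as in the tables."""
    return round(abs(model_score / template_score), 3)


# Residue 45 of Table 1: 3dvis and SWISS-MODEL scores against 1JSO.
print(absolute_ratio(-3.30e-2, -3.90e-2))
print(absolute_ratio(-3.70e-2, -3.90e-2))
```

Under this reading, a ratio below 1 means the model residue has a less negative (less favourable) score than the template residue, and a ratio above 1 means a more negative score.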




DOPE score graph of the KAN-1 3D structure calculated with the 3dvis system (green line) and the SWISS-MODEL workspace (red line), and of the 3D structure of 1JSN.

DOPE scores of the mutated residues in the KAN-1 models calculated by this system and by the SWISS-MODEL workspace, and in the 1JSN structure, with the absolute ratio of each model residue's score to that of the corresponding template residue.

Residue   Residue         DOPE score                        Absolute ratio
number    1jsn   KAN-1    3dvis      swiss      1jsn        3dvis-1jsn  swiss-1jsn
45        ASN    ASP      -3.50E-02  -3.10E-02  -3.90E-02   0.897       0.795
71        LEU    ILE      -3.30E-02  -3.40E-02  -3.40E-02   0.971       1.000
83        ASP    ALA      -3.50E-02  -3.50E-02  -3.30E-02   1.061       1.061
88        GLY    ASP      -3.40E-02  -3.40E-02  -3.20E-02   1.063       1.063
93        GLU    GLY      -3.30E-02  -3.40E-02  -2.50E-02   1.320       1.360
94        ASN    ASP      -3.20E-02  -3.40E-02  -2.50E-02   1.280       1.360
107       SER    ARG      -3.70E-02  -3.80E-02  -3.30E-02   1.121       1.152
108       THR    ILE      -3.50E-02  -3.60E-02  -3.20E-02   1.094       1.125
115       ARG    GLN      -3.50E-02  -3.60E-02  -3.60E-02   0.972       1.000
119       ARG    LYS      -3.00E-02  -3.10E-02  -3.20E-02   0.938       0.969
124       ASN    SER      -2.50E-02  -2.40E-02  -2.60E-02   0.962       0.923
126       ASP    GLU      -2.70E-02  -2.60E-02  -2.60E-02   1.038       1.000
129       SER    LEU      -3.00E-02  -3.00E-02  -2.50E-02   1.200       1.200
138       ASN    GLN      -3.60E-02  -3.80E-02  -4.00E-02   0.900       0.950
139       GLY    ARG      -3.50E-02  -3.70E-02  -3.70E-02   0.946       1.000
140       ARG    LYS      -3.30E-02  -3.60E-02  -3.80E-02   0.868       0.947
155       ASN    SER      -3.10E-02  -3.30E-02  -2.90E-02   1.069       1.138
156       ALA    THR      -3.10E-02  -3.40E-02  -3.00E-02   1.033       1.133
174       ILE    VAL      -4.50E-02  -4.60E-02  -4.60E-02   0.978       1.000
198       VAL    ILE      -3.90E-02  -3.90E-02  -3.70E-02   1.054       1.054
209       SER    LEU      -3.30E-02  -3.50E-02  -3.10E-02   1.065       1.129
212       GLU    ARG      -2.80E-02  -3.10E-02  -2.70E-02   1.037       1.148
217       PRO    SER      -2.80E-02  -3.10E-02  -2.90E-02   0.966       1.069
261       GLY    ASP      -2.70E-02  -2.80E-02  -2.40E-02   1.125       1.167
263       ALA    THR      -2.70E-02  -2.90E-02  -2.60E-02   1.038       1.115
268       GLY    GLU      -2.70E-02  -2.90E-02  -3.00E-02   0.900       0.967
309       GLY    ASN      -3.60E-02  -3.80E-02  -3.20E-02   1.125       1.188
320       VAL    SER      -3.60E-02  -3.60E-02  -3.10E-02   1.161       1.161



DOPE score graph of the KAN-1 3D structure calculated with the 3dvis system (green line) and the SWISS-MODEL workspace (red line), and of the 3D structure of 1JSM.

DOPE scores of the mutated residues in the KAN-1 models calculated by this system and by the SWISS-MODEL workspace, and in the 1JSM structure, with the absolute ratio of each model residue's score to that of the corresponding template residue.

Residue   Residue         DOPE score                        Absolute ratio
number    1jsm   KAN-1    3dvis      swiss      1jsm        3dvis-1jsm  swiss-1jsm
45        ASN    ASP      -0.033     -0.038     -0.039      0.846       0.974
71        LEU    ILE      -0.033     -0.034     -0.035      0.943       0.971
83        ASP    ALA      -0.034     -0.035     -0.034      1.000       1.029
88        GLY    ASP      -0.033     -0.034     -0.033      1.000       1.030
93        GLU    GLY      -0.033     -0.034     -0.025      1.320       1.360
94        ASN    ASP      -0.032     -0.034     -0.025      1.280       1.360
107       SER    ARG      -0.037     -0.038     -0.035      1.057       1.086
108       THR    ILE      -0.036     -0.036     -0.033      1.091       1.091
115       ARG    GLN      -0.035     -0.037     -0.036      0.972       1.028
119       ARG    LYS      -0.030     -0.031     -0.032      0.938       0.969
124       ASN    SER      -0.025     -0.025     -0.026      0.962       0.962
126       ASP    GLU      -0.027     -0.027     -0.026      1.038       1.038
129       SER    LEU      -0.030     -0.030     -0.025      1.200       1.200
138       ASN    GLN      -0.037     -0.035     -0.040      0.925       0.875
139       GLY    ARG      -0.036     -0.034     -0.040      0.900       0.850
140       ARG    LYS      -0.035     -0.033     -0.038      0.921       0.868
155       ASN    SER      -0.031     -0.033     -0.029      1.069       1.138
156       ALA    THR      -0.032     -0.034     -0.030      1.067       1.133
174       ILE    VAL      -0.044     -0.046     -0.046      0.957       1.000
198       VAL    ILE      -0.038     -0.039     -0.038      1.000       1.026
209       SER    LEU      -0.032     -0.035     -0.031      1.032       1.129
212       GLU    ARG      -0.028     -0.031     -0.029      0.966       1.069
217       PRO    SER      -0.029     -0.031     -0.032      0.906       0.969
261       GLY    ASP      -0.027     -0.028     -0.025      1.080       1.120
263       ALA    THR      -0.028     -0.029     -0.028      1.000       1.036
268       GLY    GLU      -0.028     -0.029     -0.031      0.903       0.935
309       GLY    ASN      -0.036     -0.038     -0.032      1.125       1.188
320       VAL    SER      -0.038     -0.037     -0.032      1.188       1.156



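DOPE score graph of the KAN-1 3D structure calculated with the 3dvis system (green line) and the SWISS-MODEL workspace (red line), and of the 3D structure of 2FK0.

DOPE scores of the mutated residues in the KAN-1 models calculated by this system and by the SWISS-MODEL workspace, and in the 2FK0 structure, with the absolute ratio of each model residue's score to that of the corresponding template residue.

Residue   Residue         DOPE score                        Absolute ratio
number    2fk0   KAN-1    3dvis      swiss      2fk0        3dvis-2fk0  swiss-2fk0
36        LYS    THR      -0.028     -0.029     -0.020      1.400       1.450
139       GLY    ARG      -0.034     -0.037     -0.026      1.308       1.423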



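DOPE score graph of the KAN-1 3D structure calculated with the 3dvis system (green line) and the SWISS-MODEL workspace (red line), and of the 3D structure of 2IBX.

DOPE scores of the mutated residue in the KAN-1 models calculated by this system and by the SWISS-MODEL workspace, and in the 2IBX structure, with the absolute ratio of each model residue's score to that of the corresponding template residue.

Residue   Residue         DOPE score                        Absolute ratio
number    2ibx   KAN-1    3dvis      swiss      2ibx        3dvis-2ibx  swiss-2ibx
139       GLY    ARG      -0.036     -0.037     -0.040      0.900       0.925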

From the results, the KAN-1 sequence is more conserved with respect to the 2IBX structure than to 2FK0, and shows the same mutations against the 1JSO, 1JSN and 1JSM structures. The template structures differ in their bound ligands and function: 2IBX is the hexameric form of hemagglutinin from H5N1 viruses isolated from humans; 2FK0 is the hexameric form of hemagglutinin of influenza A virus (A/Viet Nam/1203/2004(H5N1)); and 1JSO, 1JSN and 1JSM are structures of avian H5 hemagglutinin, bound to the LSTC (Sialyllacto-N-tetraose C) receptor in 1JSO and to the LSTA (Sialyllacto-N-tetraose A) receptor in 1JSN. The user can therefore select the template that best suits the intended initial structure. We found that the DOPE scores of the structures calculated with the system and with the SWISS-MODEL workspace differed only slightly.

CONCLUSION AND RECOMMENDATION

Conclusion

This research developed a 3D Visualization System to Aid Understanding Hemagglutinin Mutations in H5N1 Influenza Virus, which prepares initial structures for researchers to predict the theoretical function of the HA protein. The objective of the system was to integrate software for homology modeling and structure visualization, controlled by server-side and client-side scripts, to help users assess the calculated models. The development of the system consisted of several steps. First, the X-ray structures of hemagglutinin protein subtype H5N1 were collected into a database. Second, the system architecture was designed as shown in Figure 4; the system contains three parts: alignment, automated modeling and portal. Last, the system was designed and coded. The server side was developed with MySQL commands and the Python and PHP scripting languages to control the processes and manage all jobs that users post to the server. The client side was developed with hypertext markup language (HTML) and JavaScript as the graphical user interface of the system. The results of the system were acceptable when compared with those of an accepted resource (SWISS-MODEL workspace) using an accepted score (DOPE score).

Recommendation

User satisfaction with the 3dvis system was surveyed using purposive sampling of students and researchers who have skills in computational and protein modeling; the respondents were in good agreement with the system (see Appendix C). Suggested improvements to the system include security, a frequently asked questions (FAQ) page, error checking and artificial intelligence for sequence alignment.

To support development of the next version of the 3dvis system, we have prepared data to aid improvement, such as the system architecture (Methods section) and the user satisfaction survey data (Appendix C).

LITERATURE CITED

Alfred, V. A.. 2004. Software and the Future of Programming Languages. Science

303: 1331-1333.

Andras, F.and S., Andrej. 2003. ModLoop: automated modeling of loops in protein

structures. BIOINFORMATICS. 19(18): 2500-2501.

Andrej, S.. 1995. MODELER: Implementing 3D protein modeling. mc2 Molecular Simulations Inc Burlington MA. 2(5).

_________. 1995. Modeling mutations and homologous proteins. Curr. Opin.

Biotech. 6: 437-451.

_________ and T. L., Blundell. 1993. Comparative protein modeling by satisfaction

of spatial restraints. J. Mol. Biol. 234: 779-815.

_________, P., Liz, Y., Feng, V., Herman and K., Martin. 1995. Evaluation of

Comparative Protein Modeling by MODELLER. PROTEINS: Struct, Funct

and Genom. 23:318-326.

_________, L., Potterton, F., Yuan, H., Vlijmen and M., Karplus. 1995. Evaluation of

comparative protein modeling by MODELLER. Prot.: Struct., Func. Gene.

29:318-326.

Adrian, A. C., A. S., Andrew and L. D., Roland. 2003. A graph-theory algorithm for

rapid protein side-chain prediction. Prot. Sci. 12: 2001–2014.

Basak, I., D., Pemra and B., Ivet. 2002. Functional Motions of Influenza Virus

Hemagglutinin: A Structure-Based Analytical Approach. Biophys. J. 82: 569-

581.


Balamurugan, B., M. N. A. M., Roshan, B. S., Hameed, K., Sumathi, R.,

Senthilkumar, A., Udayakumar, K. H. V., Babu, M., Kalaivani, G., Sowmiya,

P., Sivasankari, S., Saravanan, C. V., Ranjani, K., Gopalakrishnan, K. N.,

Selvakumar, M., Jaikumar, T., Brindha, D., Michaela and K., Sekar .2007.

PSAP: protein structure analysis package. Journal of Applied

Crystallography. 40: 773-777.

Clayton, W. N., S. H., Virginia and G. W., Robert. 1984. Mutations in the

Hemagglutinin Receptor-Binding Site Can Change the Biological Properties

of an Influenza Virus. J. Virol. 51(2): 567–569.

Cros, J and P., Palese 2003. Trafficking of viral genomic RNA into and out of the

nucleus: influenza, Thogoto and Borna disease viruses. Virus Res. 95(1-2):3-

12.

David, E., S., Min-yi, D., Damien, M., Francisco, S., Andrej and A. M., Marc. 2006.

A composite score for predicting errors in protein structure models. Prot. Sci.

15: 1653-1666.

Edwin, D. K.. 2006. Influenza Pandemics of the 20th Century. Emerging Infectious

Diseases. 12(1): 9-14.

Eran, E., N., Rafael, J. M., Brendan, E., Marvin and S., Vladimir. 2004. Importance of

Solvent Accessibility and Contact Surfaces in Modeling Side-Chain

Conformations in Proteins. J. Comput. Chem. 25(5): 712-724.

Eswar, N., B., Webb, M. A., Marti-Renom, M. S., Madhusudhan, D., Eramian, M., Shen, U., Pieper and A., Sali. 2006. Comparative protein structure modeling using Modeller. Curr. Prot. Bioinfo. Suppl. 15: 5.6.1-5.6.30.

Fiser, A., R. K., Do and A., Sali. 2000. Modeling of loops in protein structures. Prot.

Sci. 9: 1753-1773.


Gubareva, L. V., L., Kaiser and F. G., Hayden. 2000. Influenza virus neuraminidase

inhibitors. Lancet 355(9206): 827-835.

John, J. S. and C. W., Don. 2000. RECEPTOR BINDING AND MEMBRANE

FUSION IN VIRUS ENTRY: The Influenza Hemagglutinin. Annu. Rev.

Biochem. 2000. 69: 531–569.

John, K. O.. 1998. Scripting: Higher Level Programming Languages for the 21st

Century. IEEE Computer. 98: 23–30.

Kash, J., A., Goodman, M., Korth and M., Katze. 2006. Hijacking of the host-cell

response and translational control during influenza virus infection. Virus Res.

119(1):111-120.

Konstantin, A., B., Lorenza, K., Jurgen and S., Torsten. 2006. The SWISS-MODEL

workspace: a web-based environment for protein structure homology

modeling. BIOINFORMATICS. 22(2): 195-201.

Marc, A. M., P., Ursula, M. S., Madhusudhan, R., Andrea, E., Narayanan, P. D., Fred,

A., Fatima, D., Joaquin and S., Andrej. 2007. DBAli tools: mining the protein

structure space. Nucleic. Acids. Res. 35: W393-397.

___________, F., Andras, M. S., Madhusudhan, J., Bino, S., Ashley, E., Narayanan,

P., Ursula, S., Min-yi, and S., Andrej. 2003. Modeling Protein Structure from

Its Sequence. Curr. Prot. Bioinfo. 5.1: 5.1.1-5.1.32.

Melike, L., J. R., Michael, P. B., Hazen and Z., Xiaowei. 2003. Visualizing infection

of individual influenza viruses. PNAS 100(16):9280–9285.

Michal, J. P., T., Irina and M. B., Janusz. 2007. PROTMAP2D: visualization,

comparison and analysis of 2D maps of protein structure.

BIOINFORMATICS. 23(11): 1429-1430.


Narayanan, E., J., Bino, M., Nebojsa, F., Andras, A., Valentin, P., Ursula, C. S.,

Ashley, A. M., Marc, M. S., Madhusudhan, Y., Bozidar and S., Andrej. 2003.

Tools for comparative protein structure modeling and analysis. Nucleic.

Acids. Res. 31(13): 3375-3380.

Needleman, S. B. and C. D., Wunsch. 1970. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48: 443-453.

Patrick, A., Q., Enrique, X. A., Francesc and J. E. S., Michael. 2001. Automated

Structure-based Prediction of Functional Sites in Proteins: Applications to

Assessing the Validity of Inheriting Protein Function from Homology in

Genome Annotation and to Protein Docking. J. Mol. Biol. 311: 395-408.

Roshan, B. B., M. N. A., Md, B. S., Hameed, K., Sumathi, R., Senthilkumar, A.,

Udayakumar, K. H. V., Babu, M., Kalaivani, G., Sowiya, P., Sivasankari, S.,

Saravanan, C. V., Ranjani, K., Gopalakrishnan, K.N., Selvakumar, M.,

Jaikumar, T., Brindha, D., Michael and K., Sekar. 2007. PSAP: protein

structure analysis package. J.Appl.Cryst. 40: 773–777.

Robert, G. W., J. B., William, T. G., Owen, M. C., Thomas, and K., Yoshihiro. 1992.

Evolution and Ecology of Influenza A Viruses. Microbiol Rev. 56(1): 152-

179.

Shiuh-Ming, L, W. M., Martha and S. H., Virginia. 1992. Hemagglutinin Mutations

Related to Antigenic Variation in H1 Swine Influenza Viruses. J. Virol.

66(2):1066-1073.

Stephen, C.. 2007. SChiSM2: creating interactive web page annotations of molecular

structure models using Jmol. BIOINFORMATICS. 23(3): 383-384.


Suzuki, Y.. 2005. Sialobiology of Influenza Molecular Mechanism of Host Range

Variation of Influenza Viruses. Biol. Pharm. Bull. 28(3): 399-408.

Torsten, S., K., Jurgen, G., Nicolas and C. P., Manuel. 2003. SWISS-MODEL: an

automated protein homology-modeling server. Nucleic. Acids. Res. 31(13):

3381–3385.

Thomas, J. O.. 2004. A Java applet for multiple linked visualization of protein

structure and sequence. J. Comp. Aid. Mol. Design. 18: 225-234.

Ursula, P., E., Narayanan, P. D., Fred, B., Hannes, M. S., Madhusudhan, R., Andrea,

M., Marc, K., Rachel, M. W., Ben, E., David, S., Min-Yi, K., Libusha, M.,

Francisco and S., Andrej. 2006. MODBASE: a database of annotated

comparative protein structure models and associated resources. Nucleic.

Acids. Res. 34: D291-D295.

Valentin, A. I., P., Ursula, C. S., Ashley, A. M., Marc, M., Linda and S., Andrej. 2003.

ModView, visualization of multiple protein sequences and structures.

BIOINFORMATICS. 19(1): 165-166.

Vladimir, S., E., Eran, G., Sergey, P., Vladimir, B., Mariana, P., Jaime and E.,

Marvin. 2005. SPACE: a suite of tools for protein structure prediction and

analysis based on complementarity and environment. Nucleic. Acids. Res. 33:

W39-W43.

Yutaka, U. and A., Kiyoshi. 2002. MOSBY: a molecular structure viewer program

with portability and extensibility. J. Mol. Graph. Mod. 20: 411-413.


APPENDICES


Appendix A

Methodologies implemented in MODELLER


1. Structure optimization method

The general form of the objective function and the structure of the optimizers implemented in MODELLER are similar to those of molecular dynamics programs such as CHARMM.

1.1 Objective function

MODELLER minimizes the objective function F with respect to the Cartesian coordinates of the ~10,000 atoms (3D points) that form a system of one or more molecules:

$$F = F(R) = F_{symm} + \sum_i c_i(f_i, p_i) \tag{1}$$

where $F_{symm}$ is an optional symmetry term, $R$ are the Cartesian coordinates of all atoms, $c_i$ is restraint $i$, $f_i$ is a geometric feature of a molecule, and $p_i$ are parameters. For a 10,000 atom system there can be on the order of 200,000 restraints. The form of $c$ is simple; it includes a quadratic function, cosine, a weighted sum of a few Gaussian functions, Coulomb law, Lennard-Jones potential, cubic splines, and some other simple functions. The geometric features presently include a distance, an angle, a dihedral angle and a pair of dihedral angles (defined by two, three, four and eight atoms, respectively), the shortest distance in a set of distances (not documented further), solvent accessibility in Å², and atom density expressed as the number of atoms around the central atom. A pair of dihedral angles can be used to restrain such strongly correlated features as the mainchain dihedral angles $\Phi$ and $\Psi$. Each of the restraints also depends on a few parameters $p_i$ that generally vary from restraint to restraint. Some restraints can restrain pseudo-atoms such as the gravity center of several atoms.


There are two kinds of restraints, static and dynamic, that both contribute to the objective function:

$$F = F_{symm} + F_s + F_d \tag{2}$$

The static restraints and their parameters are pre-defined. The

dynamic restraints are re-generated repeatedly during optimization. All dynamic

restraints are always selected and they can restrain only pairs of atoms. In all other

respects, the two kinds of restraints are the same.

The dynamic restraints are obtained from a dynamic pairs list (the

non-bonded pairs list). Each dynamic pair corresponds to at least one restraint, which

may or may not be violated. The dynamic pairs list includes only the pairs of atoms

that satisfy the following three conditions: (1) One or both atoms in a pair are allowed

to move. (2) The two atoms are not connected through one, two, or three chemical

bonds. (3) The two atoms are closer than a preset cutoff distance. There are on the

order of 5000 atom pairs in the dynamic pairs list when only soft-sphere overlap

restraints are used. Currently, the restraint types on the dynamic atom pairs that can be

selected include the soft-sphere overlap, Lennard-Jones, Coulomb interactions, and

MODELLER non-bonded spline restraints. The existence of the dynamic pairs list is

justified by the fact that dynamic pairs are usually a small fraction of all possible

atom-atom pairs, $N(N-1)/2$, where N is the number of atoms in a system.

The dynamic pairs list is not necessarily re-generated each time the

objective function is evaluated, although the contribution of the restraint to the

objective function is calculated in each call to the objective function routine with the

current values of the Cartesian coordinates. The dynamic pairs list is re-generated

only when maximal atomic shifts accumulate to a value larger than a preset cutoff.

This cutoff is chosen such that there cannot be a violation of a restraint without

having its atom pair on the dynamic pairs list. The dynamic pairs list is recalculated in ~20% and ~2% of the objective function calls at the beginning and the end of optimization, respectively.

Each evaluation of the objective function or of its first derivatives

with respect to the Cartesian coordinates involves the following steps:

a) Calculate non-fixed pseudo-atoms from the current atomic

positions.

b) Update the dynamic pairs list, if necessary.

c) Calculate the violations of selected restraints and all other

quantities that are shared between the calculations of the objective function and its

derivatives.

d) Sum the contributions of all violated restraints to the

objective function and the derivatives.
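The evaluation steps above can be sketched in Python as follows. This is a minimal illustration rather than MODELLER's own code: the function names, the harmonic form used for the Gaussian restraint, the soft-sphere penalty and its force constant `k` are all assumptions made for the example, and pseudo-atoms and derivatives are left out.

```python
import math
from itertools import combinations


def build_dynamic_pairs(coords, movable, excluded, cutoff):
    """Return atom pairs meeting the three dynamic-pairs-list conditions.

    excluded: set of (i, j) pairs with i < j that are joined by one, two or
    three bonds (hypothetical input; a real program derives it from topology).
    """
    pairs = []
    for i, j in combinations(range(len(coords)), 2):
        if not (movable[i] or movable[j]):            # (1) one atom may move
            continue
        if (i, j) in excluded:                        # (2) not within 3 bonds
            continue
        if math.dist(coords[i], coords[j]) < cutoff:  # (3) closer than cutoff
            pairs.append((i, j))
    return pairs


def gaussian_term(value, mean, stdev):
    """Assumed harmonic penalty for a Gaussian restraint on one feature."""
    return 0.5 * ((value - mean) / stdev) ** 2


def soft_sphere_term(dist, contact, k=1.0):
    """Assumed soft-sphere overlap penalty: non-zero only when violated."""
    return 0.5 * k * (contact - dist) ** 2 if dist < contact else 0.0


def objective(coords, static_restraints, dynamic_pairs, contact, f_symm=0.0):
    """F = F_symm + F_s + F_d, restricted to distance restraints."""
    f_s = sum(gaussian_term(math.dist(coords[i], coords[j]), mean, sd)
              for i, j, mean, sd in static_restraints)
    f_d = sum(soft_sphere_term(math.dist(coords[i], coords[j]), contact)
              for i, j in dynamic_pairs)
    return f_symm + f_s + f_d
```

In this sketch the pairs list is rebuilt only when the caller decides to do so, which mirrors the idea that the list is regenerated after atomic shifts accumulate past a cutoff rather than at every call.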

1.2 Optimizer

MODELLER currently implements a Beale restart conjugate gradients

algorithm and a molecular dynamics procedure with the leap-frog Verlet integrator.

The conjugate gradients optimizer is usually used in combination with the variable

target function method which is implemented with the automodel class. The

molecular dynamics procedure can be used in a simulated annealing protocol that is

also implemented with the automodel class.

1.2.1 Molecular dynamics

Force in MODELLER is obtained by equating the objective function with internal energy in kcal/mole. The atomic masses are all set to that of ¹²C (MODELLER unit is kg/mole). The initial velocities at a given temperature are obtained from a Gaussian random number generator with a mean and standard deviation of:

$$\bar{v}_x = 0 \tag{4}$$

$$\sigma_x = \sqrt{\frac{k_B T}{m_i}} \tag{5}$$

where $k_B$ is the Boltzmann constant, $m_i$ is the mass of one ¹²C atom, and the velocity is expressed in angstroms/femtosecond. The Newtonian equations of motion are integrated by the leap-frog Verlet algorithm:

$$\dot{r}_i\left(t + \frac{\delta t}{2}\right) = \dot{r}_i\left(t - \frac{\delta t}{2}\right) - \frac{\delta t}{m_i}\,\frac{\partial F}{\partial r_i} \tag{6}$$

$$r_i(t + \delta t) = r_i(t) + \delta t\,\dot{r}_i\left(t + \frac{\delta t}{2}\right) \tag{7}$$

where $r_i$ is the position of atom i. In addition, velocity is capped at a maximum value, before calculating the shift, such that the maximal shift along one axis can only be cap_atom_shift. The velocities can be equilibrated every equilibrate steps to stabilize temperature. This is achieved by scaling the velocities with a factor f:

$$f = \sqrt{\frac{T}{T_{kin}}} \tag{8}$$

$$E_{kin} = \frac{1}{2}\sum_i^N m_i \dot{r}_i^2 \tag{9}$$

where $E_{kin}$ is the current kinetic energy of the system and $T_{kin}$ is the kinetic temperature that corresponds to it.
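As an illustration of the leap-frog scheme and the velocity scaling described above, the following Python sketch advances a set of atoms by one step. It is not MODELLER code: the function names and argument order are hypothetical, units are left to the caller, and the kinetic temperature is computed from equipartition, $T_{kin} = 2E_{kin}/(N_d k_B)$ with $N_d = 3N$ degrees of freedom, which is an assumption of this example.

```python
import math
import random


def initial_velocities(n_atoms, temperature, mass, k_b, rng):
    """Draw each velocity component from a Gaussian with zero mean and
    standard deviation sqrt(k_B T / m)."""
    sigma = math.sqrt(k_b * temperature / mass)
    return [[rng.gauss(0.0, sigma) for _ in range(3)] for _ in range(n_atoms)]


def leapfrog_step(pos, vel_half, gradient, mass, dt, cap_atom_shift=None):
    """One leap-frog Verlet step.

    vel_half holds v(t - dt/2); gradient(pos) returns dF/dr for every atom.
    Returns (r(t + dt), v(t + dt/2)).
    """
    grad = gradient(pos)
    new_vel = [[v - dt * g / mass for v, g in zip(vi, gi)]
               for vi, gi in zip(vel_half, grad)]
    if cap_atom_shift is not None:
        # Cap each velocity component so the shift along an axis cannot
        # exceed cap_atom_shift in this step.
        vmax = cap_atom_shift / dt
        new_vel = [[max(-vmax, min(vmax, v)) for v in vi] for vi in new_vel]
    new_pos = [[x + dt * v for x, v in zip(xi, vi)]
               for xi, vi in zip(pos, new_vel)]
    return new_pos, new_vel


def kinetic_energy(vel, mass):
    """E_kin = 1/2 * sum_i m_i v_i^2 with equal masses for all atoms."""
    return 0.5 * mass * sum(v * v for vi in vel for v in vi)


def rescale_velocities(vel, target_temperature, mass, k_b):
    """Scale velocities by f = sqrt(T / T_kin) (equipartition assumed)."""
    n_dof = 3 * len(vel)
    t_kin = 2.0 * kinetic_energy(vel, mass) / (n_dof * k_b)
    f = math.sqrt(target_temperature / t_kin)
    return [[f * v for v in vi] for vi in vel]
```

A typical loop would call `initial_velocities` once, then alternate `leapfrog_step` with an occasional `rescale_velocities`.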

1.2.2 Langevin dynamics

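Langevin dynamics (LD) is implemented by modifying the equations of motion as follows:

$$\dot{r}_i\left(t + \frac{\delta t}{2}\right) = \dot{r}_i\left(t - \frac{\delta t}{2}\right)\frac{1 - \frac{1}{2}\gamma\,\delta t}{1 + \frac{1}{2}\gamma\,\delta t} + \frac{\delta t}{m_i}\,\frac{1}{1 + \frac{1}{2}\gamma\,\delta t}\left(R_i - \frac{\partial F}{\partial r_i}\right) \tag{10}$$

where $\gamma$ is a friction factor (in fs⁻¹) and $R_i$ a random force, chosen to have zero mean and standard deviation

$$\sigma(R_i) = \sqrt{\frac{2\gamma\, m_i k_B T}{\delta t}} \tag{11}$$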
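The Langevin velocity update can be sketched for a single atom as below. The function name and argument order are hypothetical, and units are left to the caller; the code simply transcribes the friction-damped leap-frog update with a Gaussian random force.

```python
import math
import random


def langevin_velocity_step(v_half, grad, mass, dt, gamma, k_b, temperature, rng):
    """Advance one atom's velocity from t - dt/2 to t + dt/2.

    v_half: velocity components at t - dt/2
    grad:   components of dF/dr_i at time t
    gamma:  friction factor; the random force has zero mean and standard
            deviation sqrt(2 * gamma * m * k_B * T / dt)
    """
    sigma = math.sqrt(2.0 * gamma * mass * k_b * temperature / dt)
    damp = 1.0 + 0.5 * gamma * dt
    new_v = []
    for v, g in zip(v_half, grad):
        r = rng.gauss(0.0, sigma)  # random force component R_i
        new_v.append(v * (1.0 - 0.5 * gamma * dt) / damp
                     + (dt / mass) * (r - g) / damp)
    return new_v
```

With `gamma = 0` the random force vanishes and the update reduces to the plain leap-frog velocity step, which is a convenient sanity check.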

1.2.3 Self-guided MD and LD

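MODELLER also implements the self-guided MD and LD methods. For self-guided MD, the equations of motion are modified as follows:

$$g_i(t) = \left(1 - \frac{\delta t}{t_l}\right) g_i(t - \delta t) + \frac{\delta t}{t_l}\left(\lambda\, g_i(t - \delta t) - \frac{\partial F}{\partial r_i}\right) \tag{12}$$

$$\dot{r}_i\left(t + \frac{\delta t}{2}\right) = \dot{r}_i\left(t - \frac{\delta t}{2}\right) + \frac{\delta t}{m_i}\left(\lambda\, g_i(t) - \frac{\partial F}{\partial r_i}\right) \tag{13}$$

where $\lambda$ is the guiding factor, the same for all atoms, $t_l$ the guide time in femtoseconds, and $g_i$ a guiding force, set to zero at the start of the simulation. Position $r_i$ is updated in the usual way. For self-guided Langevin dynamics, the guiding forces are determined as follows:

$$g_i(t) = \left(1 - \frac{\delta t}{t_l}\right) g_i(t - \delta t) + \frac{\delta t}{t_l}\,\gamma\, m_i\, \dot{r}_i\left(t - \frac{\delta t}{2}\right) \tag{14}$$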

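A scaling parameter $\chi$ is then determined by first making an unconstrained half step:

$$\dot{r}'_i(t) = \dot{r}_i\left(t - \frac{\delta t}{2}\right) + \frac{1}{2}\,\frac{\delta t}{m_i}\left(\lambda\, g_i(t) + R_i - \frac{\partial F}{\partial r_i}\right) \tag{15}$$

[Equations (16) and (17), which define $\chi(t)$ from the sums $\sum_i^N g_i(t)\cdot\dot{r}'_i(t)$ and $\sum_i^N m_i\,\dot{r}'_i(t)^2$, are not legible in the source.]

Finally, the velocities are advanced using the scaling factor:

$$\dot{r}_i\left(t + \frac{\delta t}{2}\right) = (2\chi - 1)\,\dot{r}_i\left(t - \frac{\delta t}{2}\right) + \chi\,\frac{\delta t}{m_i}\left(\lambda\, g_i(t) + R_i - \frac{\partial F}{\partial r_i}\right) \tag{18}$$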

1.2.4 Rigid molecular dynamics

Where rigid bodies are used, these are optimized separately from the other atoms in the system. This has the additional advantage of reducing the number of degrees of freedom. The state of each rigid body is specified by the position of the center of mass, $r_{COM}$, and an orientation quaternion, $\tilde{q}$. The quaternion has 4 components, $q_1$ through $q_4$, of which the first three refer to the vector part, and the last to the scalar. The translational and rotational motions of each body are separated. Each body is translated about its center of mass using the standard Verlet equations using the force:

$$\frac{\partial F}{\partial r_{COM}} = \sum_i \frac{\partial F}{\partial r_i} \tag{19}$$

where the sum over i runs over all atoms in the rigid body, and $r_i$ is the position of atom i in real space. For the rotational motion, the orientation quaternions are again integrated using the same Verlet equations. For this, the quaternion accelerations are calculated from the angular accelerations using the matrix W:

$$W = \begin{pmatrix} q_4 & q_3 & -q_2 & -q_1 \\ -q_3 & q_4 & q_1 & -q_2 \\ q_2 & -q_1 & q_4 & -q_3 \\ q_1 & q_2 & q_3 & q_4 \end{pmatrix} \tag{20}$$

where W is an orthogonal matrix and $\dot{\omega}'_k$ is the first derivative of the angular velocity (in the body-fixed frame) about axis k, that is, the angular acceleration. These angular accelerations are in turn calculated from the Euler equations for rigid body rotation, such as:

$$\dot{\omega}'_x = \frac{T_x + (I_y - I_z)\,\omega'_y\,\omega'_z}{I_x} \tag{21}$$

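Similar equations exist for the y and z components. The angular velocities $\omega'$ are obtained from the quaternion velocities $\dot{\tilde{q}}$:

$$\begin{pmatrix} \omega'_x \\ \omega'_y \\ \omega'_z \\ 0 \end{pmatrix} = 2W\dot{\tilde{q}} \tag{22}$$

The torque, T, in the body-fixed frame, is calculated as

$$T = A \sum_i (r_i - r_{COM}) \times \left(-\frac{\partial F}{\partial r_i}\right) \tag{23}$$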

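where A is the rotation matrix that converts from world space to body space:

$$A = 2\begin{pmatrix} q_1^2 + q_4^2 - \frac{1}{2} & q_1 q_2 + q_3 q_4 & q_1 q_3 - q_2 q_4 \\ q_1 q_2 - q_3 q_4 & q_2^2 + q_4^2 - \frac{1}{2} & q_2 q_3 + q_1 q_4 \\ q_1 q_3 + q_2 q_4 & q_2 q_3 - q_1 q_4 & q_3^2 + q_4^2 - \frac{1}{2} \end{pmatrix} \tag{24}$$

Finally, the x component of the inertia tensor, $I_x$, is given by

$$I_x = \sum_i m_i \left(r'^{\,2}_{i,y} + r'^{\,2}_{i,z}\right) \tag{25}$$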

where $r'_i$ is the position of each atom in body space, and $m_i$ is the mass of atom i (taken to be the mass of one ¹²C atom, as above). Similar relations exist for the y and z components. The kinetic energy of each rigid body, used for temperature control, is given as a combination of translational and rotational components:

$$E^{body}_{kin} = \frac{1}{2}\sum_i m_i\, \dot{r}_{COM}^2 + \frac{1}{2}\left(I_x \omega'^{\,2}_x + I_y \omega'^{\,2}_y + I_z \omega'^{\,2}_z\right) \tag{26}$$

Initial translational and rotational velocities of each rigid body

are set in the same way as for atomistic dynamics.
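The quaternion relations used for rigid-body rotation can be checked numerically with a short Python sketch. The function names are hypothetical, the quaternion is stored with the scalar part last ($q_4$), as in the text, and the sign conventions of W and A follow the common rigid-body MD formulation, which is an assumption of this example.

```python
def w_matrix(q):
    """Orthogonal 4x4 matrix W built from a unit quaternion (scalar last)."""
    q1, q2, q3, q4 = q
    return [[q4, q3, -q2, -q1],
            [-q3, q4, q1, -q2],
            [q2, -q1, q4, -q3],
            [q1, q2, q3, q4]]


def rotation_matrix(q):
    """Rotation matrix A converting world-space vectors to body space."""
    q1, q2, q3, q4 = q
    return [
        [2 * (q1 * q1 + q4 * q4 - 0.5), 2 * (q1 * q2 + q3 * q4), 2 * (q1 * q3 - q2 * q4)],
        [2 * (q1 * q2 - q3 * q4), 2 * (q2 * q2 + q4 * q4 - 0.5), 2 * (q2 * q3 + q1 * q4)],
        [2 * (q1 * q3 + q2 * q4), 2 * (q2 * q3 - q1 * q4), 2 * (q3 * q3 + q4 * q4 - 0.5)],
    ]


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def angular_velocity(q, q_dot):
    """Body-frame angular velocity from 2 * W * q_dot (first three rows).

    The fourth component equals 2 * (q . q_dot), which is zero for a
    quaternion that stays normalised.
    """
    return matvec(w_matrix(q), [2.0 * x for x in q_dot])[:3]
```

For example, the quaternion for a 90 degree rotation about z maps the world x axis onto the body -y axis, as expected for a world-to-body transformation.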

1.2.5 Rigid minimization

The state of each rigid body is specified by 6 parameters: the position of the center of mass, $r_{COM}$, and the rotations in radians about the body-fixed axes, $\theta_x$, $\theta_y$ and $\theta_z$. The first derivatives of the objective function F with respect to the center of mass are obtained from Equation (19), and those with respect to the angles from:

$$\frac{\partial F}{\partial \theta_k} = \sum_i \left(M_k\, r'_i\right)\cdot\frac{\partial F}{\partial r_i} \tag{27}$$

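The transformation matrices $M_k$ are given as:

$$M_x = \begin{pmatrix} 0 & -\sin\theta_z\sin\theta_x + \cos\theta_z\sin\theta_y\cos\theta_x & \sin\theta_z\cos\theta_x + \cos\theta_z\sin\theta_y\sin\theta_x \\ 0 & -\cos\theta_z\sin\theta_x - \sin\theta_z\sin\theta_y\cos\theta_x & \cos\theta_z\cos\theta_x - \sin\theta_z\sin\theta_y\sin\theta_x \\ 0 & -\cos\theta_y\cos\theta_x & -\cos\theta_y\sin\theta_x \end{pmatrix}$$

$$M_y = \begin{pmatrix} -\cos\theta_z\sin\theta_y & \cos\theta_z\cos\theta_y\sin\theta_x & -\cos\theta_z\cos\theta_y\cos\theta_x \\ \sin\theta_z\sin\theta_y & -\sin\theta_z\cos\theta_y\sin\theta_x & \sin\theta_z\cos\theta_y\cos\theta_x \\ \cos\theta_y & \sin\theta_y\sin\theta_x & -\sin\theta_y\cos\theta_x \end{pmatrix}$$

$$M_z = \begin{pmatrix} -\sin\theta_z\cos\theta_y & \cos\theta_z\cos\theta_x - \sin\theta_z\sin\theta_y\sin\theta_x & \cos\theta_z\sin\theta_x + \sin\theta_z\sin\theta_y\cos\theta_x \\ -\cos\theta_z\cos\theta_y & -\sin\theta_z\cos\theta_x - \cos\theta_z\sin\theta_y\sin\theta_x & -\sin\theta_z\sin\theta_x + \cos\theta_z\sin\theta_y\cos\theta_x \\ 0 & 0 & 0 \end{pmatrix} \tag{28}$$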

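The atomic positions $r_i$ are reconstructed when necessary from the body's orientation by means of the following relation, where M is the rotation matrix:

$$r_i = M r'_i + r_{COM} \tag{29}$$

$$M = \begin{pmatrix} \cos\theta_z\cos\theta_y & \sin\theta_z\cos\theta_x + \cos\theta_z\sin\theta_y\sin\theta_x & \sin\theta_z\sin\theta_x - \cos\theta_z\sin\theta_y\cos\theta_x \\ -\sin\theta_z\cos\theta_y & \cos\theta_z\cos\theta_x - \sin\theta_z\sin\theta_y\sin\theta_x & \cos\theta_z\sin\theta_x + \sin\theta_z\sin\theta_y\cos\theta_x \\ \sin\theta_y & -\cos\theta_y\sin\theta_x & \cos\theta_y\cos\theta_x \end{pmatrix} \tag{30}$$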
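The reconstruction of atomic positions from the six rigid-body parameters can be sketched in Python as follows. The function names are hypothetical and the sign convention of the rotation matrix is an assumption of this example; `rotation_m_dx` is the analytic derivative of the matrix with respect to $\theta_x$, and it can be verified against a finite difference.

```python
import math


def rotation_m(tx, ty, tz):
    """Rotation matrix M for rotations tx, ty, tz (radians) about the
    body-fixed x, y and z axes."""
    cx, sx = math.cos(tx), math.sin(tx)
    cy, sy = math.cos(ty), math.sin(ty)
    cz, sz = math.cos(tz), math.sin(tz)
    return [
        [cz * cy, sz * cx + cz * sy * sx, sz * sx - cz * sy * cx],
        [-sz * cy, cz * cx - sz * sy * sx, cz * sx + sz * sy * cx],
        [sy, -cy * sx, cy * cx],
    ]


def rotation_m_dx(tx, ty, tz):
    """Transformation matrix M_x, the derivative of M with respect to tx."""
    cx, sx = math.cos(tx), math.sin(tx)
    cy, sy = math.cos(ty), math.sin(ty)
    cz, sz = math.cos(tz), math.sin(tz)
    return [
        [0.0, -sz * sx + cz * sy * cx, sz * cx + cz * sy * sx],
        [0.0, -cz * sx - sz * sy * cx, cz * cx - sz * sy * sx],
        [0.0, -cy * cx, -cy * sx],
    ]


def reconstruct(body_coords, m, r_com):
    """World-space positions r_i = M r'_i + r_COM for each body-space r'_i."""
    return [[sum(m[r][c] * p[c] for c in range(3)) + r_com[r]
             for r in range(3)]
            for p in body_coords]
```

The same pattern applies to $M_y$ and $M_z$; only the matrix entries change.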


2. Statistical potential

The statistical potential implemented in MODELLER that was used in this work for the assessment and prediction of protein structure is the Discrete Optimized Protein Energy (DOPE) score. DOPE is an atomic distance-dependent statistical potential calculated from a sample of native protein structures.

Prediction of the native structure of a protein would be enabled by a scoring function whose global optimum corresponds to the native structure. One such function is a joint probability density function (pdf) of the Cartesian coordinates of the protein atoms, given available information I about the system, $p(\vec{x}_1, \vec{x}_2, \vec{x}_3, \ldots, \vec{x}_N \mid I)$, where N is the number of atoms in the protein and $\vec{x}_i$ are the Cartesian coordinates of atom i. For each atom in a given protein, the joint pdf p gives the probability density that atom i of the native structure is positioned very close to $\vec{x}_i$. In general, information I may include the sequence of the protein, a molecular mechanics force field, experimental structural information, a sample of known native structures and an alignment of the sequence to a related known protein structure. The joint pdf p can be approximated by a normalized product of the pair pdfs for all protein atom pairs:

$$p(\vec{x}_1, \vec{x}_2, \vec{x}_3, \ldots, \vec{x}_N) \approx \frac{\prod_{i \neq j}^{N} p(\vec{x}_i, \vec{x}_j)}{\prod_{i}^{N} p(\vec{x}_i)^{N-2}} \propto \prod_{i \neq j}^{N} p(\vec{x}_i, \vec{x}_j) \tag{31}$$

The denominator is derived from the condition that the joint pdf must be a product of single-body pdfs when all the pair pdfs are uncorrelated with each other. The terms in the denominator, $p(\vec{x}_i)$, are single-body distribution functions that depend only on the composition of the protein and the total volume of the system. In other words, $p(\vec{x}_i)$ is the number density of atom i, equal to the reciprocal volume of the system. Because $p(\vec{x}_i)$ is constant for a given protein, it does not affect the rank order of different conformations and is ignored here.

In the context of the statistical mechanical liquid state theory, Equation (31) is also known as the Kirkwood superposition approximation. The superposition approximation would be exact only if all the pair pdfs were mutually independent. The pair pdfs $p(\vec{x}_i, \vec{x}_j)$ of atom pairs are generally interdependent because each atom in the system interacts with more than one other atom. In general, the Kirkwood approximation of the joint pdf $p(\vec{x}_1, \vec{x}_2, \vec{x}_3, \ldots, \vec{x}_N)$ by pair pdfs $p(\vec{x}_i, \vec{x}_j)$ is clearly more accurate than a product of N single-body pdfs, $p(\vec{x}_i)$.

The next step is to estimate the pair pdf $p(\vec{x}_i, \vec{x}_j)$ for all atom pairs $(i, j)$, using a single sample native structure. A structure is defined by internal coordinates that are invariant with respect to translation and rotation. Thus, the interparticle distance r between $\vec{x}_i$ and $\vec{x}_j$ is the most relevant internal coordinate for a pair of atoms. Consequently, the distribution that can be estimated directly from a sample native structure is the distance pdf for a pair of atom types:

$$p_{mn}(r) = \frac{N_{mn}(r)}{\sum_{r_i} N_{mn}(r_i)} \tag{32}$$

where m and n denote the atom types and $N_{mn}(r)$ is the number of atom type pairs (m,n) at a distance within $[r, r + \Delta r]$. The distance pdf is proportional to the number of (m,n) pairs in a spherical shell of volume $4\pi r^2 \Delta r$; thus the density of the (m,n) pairs in the shell is proportional to $p_{mn}(r)/(4\pi r^2)$. For a finite and nonspherical native structure, only a fraction $\xi(r)$ of the spherical shell between $r$ and $r + \Delta r$ centered on $\vec{x}_i$ is occupied by protein atoms, $0 \le \xi(r) \le 1$ (Appendix Figure A1). Thus, the density of the (m,n) pairs at the distance r is proportional to $p_{mn}(r)/(4\pi r^2 \xi(r))$.

Appendix Figure A1 Schematic representation of the reference state.

Source: Min-yi Shen and Andrej Sali (2006)

In Appendix Figure A1, (A) illustrates why only a fraction of a spherical shell generally contributes to the normalization function (see Equation (33)). (B) A pair of noninteracting atoms in a protein is modeled by two points positioned randomly inside a sphere with radius a; the points are at distance r from each other. (C) The large and small spheres are the reference and probe spheres.

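The next step is to relate the distance pdf $p_{mn}(r)$ to the pair pdf $p(\vec{x}_i, \vec{x}_j)$. The probability of finding atom i at $\vec{x}_i$ and atom j at $\vec{x}_j$ is $p(\vec{x}_i)\,p(\vec{x}_j)$. Therefore, the pair pdf $p(\vec{x}_i, \vec{x}_j)$ is the product of the pair probability $p(\vec{x}_i)\,p(\vec{x}_j)$ and the (m,n) pair density:

$$\begin{aligned} p(\vec{x}_i, \vec{x}_j) &= p(\vec{x}_i)\,p(\vec{x}_j)\,p_{m,n}(r)/(4\pi r^2 \xi(r)) \\ &= p(\vec{x}_i)\,p(\vec{x}_j)\,p_{m,n}(r)/n(r) \propto p_{m,n}(r)/n(r) \end{aligned} \tag{33}$$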

where $n(r)$ is the normalization function equal to $4\pi r^2 \xi(r)$, and m and n are the types of atoms i and j, respectively. The single-body pdf $p(\vec{x}_i)$ is the number density of atom i and is ignored because it does not affect the ranking of different conformations of the same protein. The calculation of the normalization function $n(r)$ is not straightforward because the native structures are finite and vary in size (Appendix Figure A1). Therefore, we explicitly denote $n(r)$ as dependent on the size a of the sample native structure, $n(r; a)$. The size a is defined as the radius of the sphere of uniform density that has the same radius of gyration $R_g$ as the sample native structure; thus, $a = \sqrt{5/3}\,R_g$. Similarly, the distance pdf $p_{m,n}(r)$ is written as $p_{m,n}(r; a)$.

The pair pdf $p(\vec{x}_i, \vec{x}_j)$ for the sample of all native structures is calculated as a weighted sum of the pair pdfs corresponding to the individual sample structures:

$$p(\vec{x}_i, \vec{x}_j) \propto \sum_s w_s\, p_{m,n}(r; a_s)/n(r; a_s) \tag{34}$$

where index s runs over all sample native structures. This averaging procedure is based on the presumed independence of the pair pdf from the protein size. The joint pdf p defines the N-body correlation function $g^{(N)}$:

$$p(\vec{x}_1, \vec{x}_2, \vec{x}_3, \ldots, \vec{x}_N) = p(\vec{x}_1)\,p(\vec{x}_2) \cdots p(\vec{x}_N)\, g^{(N)}(\vec{x}_1, \vec{x}_2, \vec{x}_3, \ldots, \vec{x}_N) \tag{35}$$

The total free energy G of the system can then be expressed in terms of the correlation function $g^{(N)}$:

$$G(\vec{x}_1, \vec{x}_2, \vec{x}_3, \ldots, \vec{x}_N) = -k_B T \ln g^{(N)}(\vec{x}_1, \vec{x}_2, \vec{x}_3, \ldots, \vec{x}_N) \tag{36}$$

where $k_B$ is the Boltzmann constant. Therefore, an approximate free energy of a system is

$$G(\vec{x}_1, \vec{x}_2, \vec{x}_3, \ldots, \vec{x}_N) \approx -k_B T \sum_{i \neq j}^{N} \ln g^{(2)}_{i,j}(r) = \sum_{i \neq j}^{N} u_{i,j}(r) \tag{37}$$

where $g^{(2)}_{i,j}(r)$ is the radial distribution function equal to $p_{m,n}(r)/n(r)$, and $u_{i,j}(r)$ is the potential of mean force for a pair of atoms, $u_{i,j}(r) = -k_B T \ln g^{(2)}_{i,j}(r)$. Because $p^{REF}_{m,n}(r) = n(r)$ and thus $u_{i,j}(r) = 0$ for the reference state, the potential of mean force derived from the observed distance pdf $p_{m,n}(r)$ is

$$u_{i,j}(r) = -k_B T \ln\left[\frac{p_{m,n}(r)}{p^{REF}_{m,n}(r)}\right] = -k_B T \ln\left[\frac{N^{OBS}_{m,n}(r)}{N^{REF}_{m,n}(r)}\right] \tag{38}$$

where $N^{OBS}_{m,n}(r)$ and $N^{REF}_{m,n}(r)$ are the numbers of atom type pairs (m,n) at a distance within $(r, r + \Delta r)$ for the interacting real system and the noninteracting reference state, respectively, and $p^{REF}_{m,n}(r)$ is the distance pdf of the reference state, in which uncorrelated atoms have a uniform density within the bounds of the native structures. The reference state is identical to the discharged state used for free energy calculations in statistical mechanics. Equation (38) establishes the relation between the statistical potential derived from a sample of known structures and the potential of mean force.
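The last step of the derivation, turning pair counts into a potential of mean force, can be sketched in Python as below. This is only an illustration of the formulas in this section, not the DOPE implementation in MODELLER: the function names, the binning, and the choice of returning infinity for empty observed bins are assumptions of the example.

```python
import math


def radius_of_gyration(coords):
    """Root-mean-square distance of the atoms from their centroid."""
    n = len(coords)
    center = [sum(p[k] for p in coords) / n for k in range(3)]
    return math.sqrt(sum(math.dist(p, center) ** 2 for p in coords) / n)


def reference_sphere_radius(coords):
    """Radius a of the uniform sphere with the same radius of gyration,
    a = sqrt(5/3) * R_g."""
    return math.sqrt(5.0 / 3.0) * radius_of_gyration(coords)


def potential_of_mean_force(n_obs, n_ref, kt=1.0):
    """Per-bin u(r) = -kT * ln(N_obs(r) / N_ref(r)) for one atom-type pair.

    n_obs and n_ref are counts per distance bin; empty observed bins are
    given an infinite energy here (an assumption of this sketch).
    """
    u = []
    for obs, ref in zip(n_obs, n_ref):
        if ref <= 0:
            raise ValueError("reference count must be positive")
        u.append(math.inf if obs == 0 else -kt * math.log(obs / ref))
    return u


def pair_score(bin_indices, u):
    """Sum the potential over the distance bins of the observed pairs."""
    return sum(u[b] for b in bin_indices)
```

Bins in which pairs are observed more often than in the reference state receive a negative (favourable) energy, and bins with fewer pairs than expected receive a positive one.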

3. File format description

Two input files are important in homology modeling with MODELLER:

3.1 Alignment file (PIR)

The preferred alignment file format for comparative modeling is related to the PIR database format. A sample file in PIR format is shown in Appendix Figure A2.

    C; A sample alignment in the PIR format; KAN-1
    >P1;1jso
    structureX:1jso :A :325 : : : : : : :
    DQICIGYHANNSTEQVDTIMEKNVTVTHAQDILEKTHNGKLCDLNGVKPLILRDCSVAGWLLGNPMCDEFLNVPEWSYIVEKDNPVNGLCYPENFNDYEELKHLLSSTNHFEKIRIIPRSSWSNHDASSGVSSACPYNGRSSFFRNVVWLIKKNNAYPTIKRSYNNTNQEDLLILWGIHHPNDAAEQTKLYQNPTTYVSVGTSTLNQRSVPEIATRPKVNGQSGRMEFFWTILKPNDAINFESNGNFIAPEYAYKIVKKGGSAIMKSGLEYGNCNTKCQTPMGAINSSMPFHNIHPLTIGECPKYVKSGRLVLATGLRNVPQRET*
    >P1;1jso
    structureX:1jso :B :176 : : : : : : :
    GLFGAIAGFIEGGWQGMVDGWYGYHHSNEQGSGYAADKESTQKAIDGTTNKVNSIIDKMNTQFEAVGKEFNNLERRIENLNKKMEDGFLDVWTYNAELLVLMENERTLDFHDSNVKNLYDKVRLQLRDNAKELGNGCFEFYHKCDNECMESVKNGTYDYPQYSEEARLNREEISGV*
    >P1;KAN-1
    sequence:KAN-1 :321 : : : : : : :
    DQICIGYHANNSTEQVDTIMEKNVTVTHAQDILEKTHNGKLCDLDGVKPLILRDCSVAGWLLGNPMCDEFINVPEWSYIVEKANPVNDLCYPGDFNDYEELKHLLSRINHFEKIQIIPKSSWSSHEASLGVSSACPYQRKSSFFRNVVWLIKKNSTYPTIKRSYNNTNQEDLLVLWGIHHPNDAAEQTKLYQNPTTYISVGTSTLNQRLVPRIATRSKVNGQSGRMEFFWTILKPNDAINFESNGNFIAPEYAYKIVKKGDSTIMKSELEYGNCNTKCQTPMGAINSSMPFHNIHPLTIGECPKYVKSNRLVLATGLRNSP*

Appendix Figure A2 Input alignment file in PIR format.

In Appendix Figure A2, the first line of each sequence entry specifies the protein code after the >P1; line identifier. The line identifier must occur at the beginning of the line. For example, 1jso is the protein code of the first entry in the alignment above. The protein code corresponds to the alnsequence.code variable. (Conventionally, this code is the PDB code followed by an optional one-letter chain ID, but this is not required.) The second line of each entry contains information necessary to extract atomic coordinates of the segment from the original PDB coordinate set. The fields in this line are separated by colon characters, ':'. The fields are as follows:

Field 1: A specification of whether or not 3D structure is available and of the type of the method used to obtain the structure (structureX, X-ray; structureN, NMR; structureM, model; sequence, sequence). The bare value structure is also valid.

Field 2: The PDB code. While the protein code in the first line of an entry, which is used to identify the entry, must be unique for all proteins in the file, the PDB code in this field, which is used to get structural data, does not have to be unique. It is a good idea to use the PDB code with an optional chain identifier as the protein code. The PDB code corresponds to the alnsequence.atom_file variable and can also contain the full atom filename, directory included.

Fields 3-6: The residue and chain identifiers (see below) for the

first (fields 3-4) and last residue (fields 5-6) of the sequence in the subsequent lines.

There is no need to edit the coordinate file if a contiguous sequence of residues is

required -- simply specify the beginning and ending residues of the required

contiguous region of the chain. If the beginning residue is not found, no segment is

read in. If the ending residue identifier is not found in the coordinate file, the last

residue in the coordinate file is used. By default, the whole file is read in. The

unspecified beginning and ending residue numbers and chain id's for a structure entry

in an alignment file are taken automatically from the corresponding atom file, if

possible. The first matching sequence in the atom file that also satisfies the explicitly

specified residue numbers and chain id's is used. A residue number is not specified

when a blank character or a dot, '.', is given. A chain id is not specified when a dot, '.',

is given. This slight difference between residue and chain id's is necessary because a

blank character is a valid chain id.

Field 7: Protein name. Optional.

Field 8: Source of the protein. Optional.

Field 9: Resolution of the crystallographic analysis. Optional.

Field 10: R-factor of the crystallographic analysis. Optional.

Note that each sequence must be terminated by the terminating character, '*', and chain breaks are indicated by '/'. There should not be more than one chain break character to indicate a single chain break (use gap characters, '-', instead).

The alignment file can contain any number of blank lines between the protein entries. Comment lines can occur outside protein entries and must begin with the identifiers 'C;' or 'R;' as the first two characters in the line. An alignment file is also used to input non-aligned sequences.
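The entry layout described above (a `>P1;` line, a colon-separated header line, then sequence lines ending in `*`) can be read with a few lines of Python. This is a minimal sketch for illustration, not part of MODELLER: `parse_pir` is a hypothetical helper, and because it strips surrounding blanks from every field it cannot distinguish a blank chain id from an unspecified one.

```python
def parse_pir(text):
    """Parse PIR alignment text into a list of entries.

    Each entry is a dict with the protein code, the list of header fields
    and the sequence without the terminating '*'.
    """
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if current is None:
            # Outside an entry: blank lines and comment lines (C; or R;)
            # are skipped until the next '>P1;' identifier.
            if line.startswith(">P1;"):
                current = {"code": line[4:].strip(), "fields": None,
                           "sequence": ""}
            continue
        if current["fields"] is None:
            # Second line of the entry: colon-separated header fields.
            current["fields"] = [f.strip() for f in line.split(":")]
            continue
        # Sequence lines may be split over several lines; '*' ends the entry.
        current["sequence"] += line.replace(" ", "")
        if current["sequence"].endswith("*"):
            current["sequence"] = current["sequence"][:-1]
            entries.append(current)
            current = None
    return entries
```

Field 1 of each entry is then `entry["fields"][0]`, the PDB code is `entry["fields"][1]`, and so on through field 10.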

3.2 Restraints file

The first line of a restraints file should read 'MODELLER5

VERSION: MODELLER FORMAT'. After this, there is one entry per line. The

format is free, except that the first character has to be at the beginning of the line.

When the line starts with 'R', it contains a restraint, 'E' indicates a pair of atoms to be

excluded from the calculation of the dynamic non-bonded pairs list, 'P' indicates a

pseudo atom definition, 'S' a symmetry restraint, and 'B' a rigid body. The restraints files used in this work contain only lines that start with 'R'.

    MODELLER5 VERSION: MODELLER FORMAT
    R 3 1 1 1 2 2 1 3 2 1.5380 0.0364
    …

Appendix Figure A3 Restraints file.

As shown in Appendix Figure A3, an 'R' line has the form: R Form Modality Feature Group Numb_atoms Numb_parameters Numb_Feat Atom_indices Parameters. The 'R' line in the figure creates a Gaussian restraint on the distance between atoms 3 and 2, with a mean of 1.5380 and a standard deviation of 0.0364.

Form is the restraint form type (see Appendix Table A1). Modality is an integer argument to Form, and specifies the number of single Gaussians in a poly-Gaussian pdf, periodicity n of the cosine in the cosine potential, and the number of spline points for cubic splines. Feature is the feature that this restraint acts on (see Appendix Table A2). Group is the physical feature type. Numb_atoms is the total number of atoms this restraint acts on, Numb_parameters is the number of defined parameters, and Numb_Feat is the number of features the restraint acts on. Numb_Feat is typically 1, except for the multiple binormal (where it should be 2) and ND spline (where it can be any number). In cases where Numb_Feat is greater than 1, the modality, feature type, and number of atoms of each subsequent feature should be listed in order after Numb_Feat. Finally, the integer atom indices and floating point parameters are listed.
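The layout of an 'R' line can be made concrete with a small Python parser. This is a sketch for illustration only: `parse_restraint_line` is a hypothetical helper, the name tables cover just a few of the numeric codes from Appendix Tables A1 and A2, and only the common case Numb_Feat = 1 is handled.

```python
# Subset of the numeric codes listed in Appendix Tables A1 and A2.
FORM_NAMES = {1: "lower_bound", 2: "upper_bound", 3: "gaussian",
              4: "multi_gaussian", 7: "cosine"}
FEATURE_NAMES = {1: "distance", 2: "angle", 3: "dihedral"}


def parse_restraint_line(line):
    """Split an 'R' line into its named parts (Numb_Feat = 1 only)."""
    tokens = line.split()
    if not tokens or tokens[0] != "R":
        raise ValueError("not a restraint line")
    form, modality, feature, group, n_atoms, n_params, n_feat = (
        int(t) for t in tokens[1:8])
    if n_feat != 1:
        raise NotImplementedError("multi-feature restraints are not handled")
    atoms = [int(t) for t in tokens[8:8 + n_atoms]]
    params = [float(t) for t in tokens[8 + n_atoms:8 + n_atoms + n_params]]
    return {
        "form": FORM_NAMES.get(form, form),
        "modality": modality,
        "feature": FEATURE_NAMES.get(feature, feature),
        "group": group,
        "atoms": atoms,
        "parameters": params,
    }
```

Applied to the line `R 3 1 1 1 2 2 1 3 2 1.5380 0.0364`, the parser reports a Gaussian restraint on a distance between atoms 3 and 2 with the two parameters 1.5380 and 0.0364.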

Appendix Table A1 Numerical restraint forms.

Numeric form Form type

1 forms.lower_bound

2 forms.upper_bound

3 forms.gaussian

4 forms.multi_gaussian

5 forms.lennard_jones

6 forms.coulomb

7 forms.cosine

8 forms.factor

9 forms.multi_binormal

10 forms.spline or forms.nd_spline

50+ user-defined restraint forms


Appendix Table A2 Numerical feature types.

Numeric feature Feature type

1 features.distance

2 features.angle

3 features.dihedral

6 features.minimal_distance

7 features.solvent_access

8 features.density

9 features.x_coordinate

10 features.y_coordinate

11 features.z_coordinate

12 features.dihedral_diff

50+ user-defined feature types


Appendix B

Poster contribution to conferences

Poster presentation

Taweesak Poochai, Daungmanee Chuakhaew and Chak Sangma. Web-Based Application for Automated 3D Structure of Hemagglutinin (H5N1). Pure and Applied Chemistry International Conference (PACCON) 2008, January 30 - February 1, 2008, Sofitel Centara Grand Bangkok, Bangkok, Thailand.

Appendix C

User satisfaction survey for the 3dvis system

User satisfaction with the 3dvis system was surveyed by purposive sampling of students and researchers with skills in computation and protein modeling. The questionnaire (in Thai) was scored on a five-level Likert scale, and mean scores were interpreted as follows:

Mean 1.00-1.50: no agreement

Mean 1.51-2.50: little agreement

Mean 2.51-3.50: moderate agreement

Mean 3.51-4.50: much agreement

Mean 4.51-5.00: most agreement

Symbols used in the statistical analysis

X̄ = mean

S.D. = standard deviation

SS = sum of squares

df = degrees of freedom

MS = mean square

χ² = chi-square

Sig = significance

The statistics used in this work were the mean and the standard deviation. The respondents to the questionnaire are first summarized as percentages by sex, age, education level, status, work, field of expertise or interest, and current work.
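The summary statistics used in this appendix can be reproduced with a short Python sketch. The function names are hypothetical, and the use of the sample standard deviation (dividing by n - 1) is an assumption of this example. The raw scores in the usage note are hypothetical as well; they are simply one set of three answers that would give a mean of 3.67 and a standard deviation of 0.577.

```python
import math

# Upper bound of each Likert interpretation band and its label.
LEVELS = [(1.50, "no agreement"), (2.50, "little agreement"),
          (3.50, "moderate agreement"), (4.50, "much agreement"),
          (5.00, "most agreement")]


def mean_sd(scores):
    """Mean and sample standard deviation (n - 1 in the denominator)."""
    n = len(scores)
    mean = sum(scores) / n
    var = sum((x - mean) ** 2 for x in scores) / (n - 1) if n > 1 else 0.0
    return mean, math.sqrt(var)


def agreement_level(mean):
    """Map a mean score between 1 and 5 to its interpretation band."""
    if not 1.0 <= mean <= 5.0:
        raise ValueError("mean score must lie between 1 and 5")
    for upper, label in LEVELS:
        if mean <= upper:
            return label


def pooled_mean(groups):
    """Overall mean from (group size, group mean) pairs."""
    total = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total
```

For instance, `mean_sd([4, 4, 3])` gives a mean of about 3.67 and a standard deviation of about 0.577, and `agreement_level(3.82)` falls in the "much agreement" band.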

Appendix Table C1 Respondents to the questionnaire by sex, age, education level, status, work, field of expertise or interest, and current work, as counts and percentages.

Detail                        Number of people   Percentage
Sex                                  11             100
  Male                                3              27.3
  Female                              8              72.7
Age (years)                          11             100
  ≤ 25                                5              45.5
  26-30                               5              45.5
  31-35                               1               9.1
Education level                      11             100
  Bachelor                            1               9.1
  Master                              7              63.6
  Doctor                              3              27.3
Status                               11             100
  Student                             9              81.8
  Researcher                          2              18.2
Expert/Interesting field             11             100
  Protein modeling                    4              36.4
  Computational                       7              63.6
Current working                      11             100
  Theoretical                         8              72.7
  Other                               3              27.3

Appendix Table C1 shows that more respondents were female than male (72.7% and 27.3%, respectively) and that most were aged 30 years or younger. The most common education level was master's degree (63.6%), followed by doctoral (27.3%) and bachelor's degree (9.1%), and most respondents were students (81.8%). In terms of current work, most respondents worked in theoretical fields (72.7%), and the rest in both theoretical and experimental fields (27.3%).

Next, the respondents' experience with homology modeling and related programs was analyzed.

Appendix Table C2 Respondents' experience with homology modeling and related programs, as counts and percentages.

Detail                                                         Number of people   Percentage
Experience of homology modeling                                       11             100
  Yes                                                                  8              72.7
  No                                                                   3              27.3
Experience in homology program: WHAT-IF                                8             100
  Yes                                                                  2              25.0
  No                                                                   6              75.0
Experience in homology program: MODELLER                               8             100
  Yes                                                                  4              50.0
  No                                                                   4              50.0
Experience in homology program: SCWRL                                  8             100
  Yes                                                                  1              12.5
  No                                                                   7              70.0
Experience in homology program: SCCOMP                                 8             100
  Yes                                                                  -               -
  No                                                                   8             100.0
Experience in homology program: other                                  8             100
  Yes                                                                  3              37.5
  No                                                                   5              62.5
Experience of molecular viewer                                         8             100
  Yes                                                                  8              72.7
  No                                                                   -              27.3
Experience of molecular viewer: DS viewer pro                          8             100
  Yes                                                                  5              37.5
  No                                                                   3              62.5
Experience of molecular viewer: Jmol                                   8             100
  Yes                                                                  4              50.0
  No                                                                   4              50.0
Experience of molecular viewer: Gauss view                             8             100
  Yes                                                                  6              75.0
  No                                                                   2              25.0
Experience of molecular viewer: WebMol                                 8             100
  Yes                                                                  1              12.5
  No                                                                   7              87.5
Experience of molecular viewer: KING                                   8             100
  Yes                                                                  -               -
  No                                                                   8             100.0
Experience of molecular viewer: QuickPDB                               8             100
  Yes                                                                  2              75.0
  No                                                                   6              25.0
Experience of molecular viewer: RasMol                                 8             100
  Yes                                                                  2              75.0
  No                                                                   6              25.0
Experience of molecular viewer: PyMol                                  8             100
  Yes                                                                  4              50.0
  No                                                                   4              50.0
Experience of molecular viewer: other                                  8             100
  Yes                                                                  -               -
  No                                                                   8             100.0
Experience of molecular statistical analysis                           8             100
  Yes                                                                  5              62.5
  No                                                                   3              37.5
Experience of molecular statistical analysis: Ramachandran plot        8             100
  Yes                                                                  5              62.5
  No                                                                   3              37.5
Experience of molecular statistical analysis: DOPE                     8             100
  Yes                                                                  2              25.0
  No                                                                   6              75.0
Experience of molecular statistical analysis: Anolea                   8             100
  Yes                                                                  -               -
  No                                                                   8             100.0
Experience of molecular statistical analysis: Gromos                   8             100
  Yes                                                                  -               -
  No                                                                   8             100.0
Experience of molecular statistical analysis: Verify3D                 8             100
  Yes                                                                  2              25.0
  No                                                                   6              75.0
Experience of molecular statistical analysis: other                    8             100
  Yes                                                                  1              12.5
  No                                                                   7              87.5
Experience of using homology modeling web server                       8             100
  Yes                                                                  3              37.5
  No                                                                   5              62.5
Experience of using homology modeling web server: SWISS-MODEL workspace   8          100
  Yes                                                                  3              37.5
  No                                                                   5              62.5
Experience of using homology modeling web server: 3DJIGSAW             8             100
  Yes                                                                  1              12.5
  No                                                                   7              87.5
Experience of using homology modeling web server: MODWEB               8             100
  Yes                                                                  -               -
  No                                                                   8             100.0
Experience of using homology modeling web server: other                8             100
  Yes                                                                  -               -
  No                                                                   8             100.0

From Appendix Table C2, most respondents had experience in homology modeling (72.7%), and MODELLER was the most commonly used homology program (50.0%). Among the molecular viewer programs, GaussView was the one most respondents in this group had used (75.0%). Experience with homology modeling web servers was less common in this group (37.5% for SWISS-MODEL workspace and 12.5% for 3DJIGSAW).
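The percentages in Appendix Table C2 are each answer count divided by the number of respondents for that item. The following is a minimal sketch of that arithmetic, assuming rounding to one decimal place; the function name `percentage` is illustrative and not part of the 3dvis system.

```python
def percentage(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * count / total, 1)


# Counts reported for "Experience of homology modeling" in Appendix Table C2.
respondents = 11
answers = {"Yes": 8, "No": 3}

for answer, count in answers.items():
    print(f"{answer}: {count} of {respondents} = {percentage(count, respondents)}%")
```

The same function can be applied to any row of the table to check that a count and its percentage agree.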


Appendix Table C3 Statistical analysis of responses about the 3dvis system, by sex.

Detail                                                 Male          Female        Total
                                                       Mean  S.D.    Mean  S.D.    Mean  S.D.
Using the 3dvis system
1. Interest in the 3dvis system                        3.67  0.577   4.25  0.463
2. Ease of use                                         4.00  0.000   3.88  0.641
3. Language correctness in the 3dvis system            3.67  0.577   3.88  0.641
4. System efficiency in processing                     3.33  1.160   3.75  1.040
5. Results of the system were satisfactory             3.00  0.000   3.63  0.744
6. Results of the system were convincing               3.67  0.577   3.75  0.463
7. Usefulness of the 3dvis system                      2.67  0.577   3.88  0.641
8. System efficiency                                   3.67  0.577   3.75  0.463
9. Satisfaction with using the 3dvis system            3.67  0.577   3.88  0.641
Overall satisfaction with the 3dvis system             3.67  0.577   3.88  0.641   3.82  0.603
User guide for the 3dvis system
1. Can use the 3dvis system after reading the guide    3.33  0.577   3.75  0.707
2. Guide explains clearly how to use the 3dvis system  3.00  0.000   3.75  0.707
3. Guide content is well organized                     3.33  0.577   3.88  0.641
4. Content is easy to understand                       3.33  0.577   3.88  0.641
5. Lettering in the user guide is clear                4.33  1.155   4.13  0.354
6. Language correctness of the user guide              4.00  0.000   4.00  0.756
Overall satisfaction with the user guide               4.00  0.000   3.88  0.641   3.91  0.539
Total                                                  3.67  0.577   3.87  0.640   3.81  0.603

Appendix Table C4 Statistical analysis of responses about the 3dvis system, by age.

Detail                                                 ≤ 25          26-30         31-35         Total
                                                       Mean  S.D.    Mean  S.D.    Mean  S.D.    Mean  S.D.
Using the 3dvis system
1. Interest in the 3dvis system                        3.80  0.447   4.20  0.447   5.00  .
2. Ease of use                                         3.80  0.447   3.80  0.447   5.00  .
3. Language correctness in the 3dvis system            3.80  0.447   3.60  0.548   5.00  .
4. System efficiency in processing                     3.00  1.000   4.00  0.707   5.00  .
5. Results of the system were satisfactory             3.20  0.837   3.60  0.548   4.00  .
6. Results of the system were convincing               3.40  0.894   3.60  0.548   4.00  .
7. Usefulness of the 3dvis system                      3.00  0.707   3.00  1.000   5.00  .
8. System efficiency                                   3.40  0.894   4.00  0.000   4.00  .
9. Satisfaction with using the 3dvis system            3.40  0.548   4.00  0.000   5.00  .
Overall satisfaction with the 3dvis system             3.60  0.548   3.80  0.447   5.00  .       3.82  0.603
User guide for the 3dvis system
1. Can use the 3dvis system after reading the guide    3.60  0.548   3.40  0.548   5.00  .
2. Guide explains clearly how to use the 3dvis system  3.40  0.548   3.40  0.548   5.00  .
3. Guide content is well organized                     3.60  0.548   3.60  0.548   5.00  .
4. Content is easy to understand                       3.60  0.548   3.60  0.548   5.00  .
5. Lettering in the user guide is clear                4.00  0.707   4.40  0.548   4.00  .
6. Language correctness of the user guide              3.80  0.447   4.00  0.707   5.00  .
Overall satisfaction with the user guide               3.80  0.447   3.80  0.447   5.00  .       3.91  0.539
Total                                                  3.40  0.547   4.00  0.000   5.00  .       3.81  0.603

Appendix Table C5 Statistical analysis of responses about the 3dvis system, by education level.

Detail                                                 Bachelor      Master        Ph.D.         Total
                                                       Mean  S.D.    Mean  S.D.    Mean  S.D.    Mean  S.D.
Using the 3dvis system
1. Interest in the 3dvis system                        3.00  .       4.14  0.378   4.33  0.577
2. Ease of use                                         4.00  .       3.86  0.378   4.00  1.000
3. Language correctness in the 3dvis system            3.00  .       3.86  0.378   4.00  1.000
4. System efficiency in processing                     2.00  .       3.43  0.787   4.67  0.577
5. Results of the system were satisfactory             3.00  .       3.29  0.756   4.00  0.000
6. Results of the system were convincing               3.00  .       3.71  0.756   3.33  0.577
7. Usefulness of the 3dvis system                      3.00  .       2.86  0.900   4.00  1.000
8. System efficiency                                   3.00  .       3.71  0.488   4.00  0.000
9. Satisfaction with using the 3dvis system            3.00  .       3.71  0.488   4.33  0.577
Overall satisfaction with the 3dvis system             3.00  .       3.86  0.378   4.00  1.000   3.82  0.603
User guide for the 3dvis system
1. Can use the 3dvis system after reading the guide    4.00  .       3.43  0.535   4.00  1.000
2. Guide explains clearly how to use the 3dvis system  3.00  .       3.43  0.535   4.00  1.000
3. Guide content is well organized                     4.00  .       3.43  0.535   4.33  0.577
4. Content is easy to understand                       3.00  .       3.57  0.535   4.33  0.577
5. Lettering in the user guide is clear                3.00  .       4.43  0.535   4.00  0.000
6. Language correctness of the user guide              4.00  .       3.86  0.690   4.33  0.577
Overall satisfaction with the user guide               4.00  .       3.71  0.488   4.33  0.577   3.91  0.539
Total                                                  3.00  .       3.71  0.488   4.33  0.577   3.81  0.603

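Appendix Table C6 Statistical analysis of responses about the 3dvis system, by occupation. Mean values marked [?] are illegible in the source.

Detail                                                 Student       Researcher    Total
                                                       Mean  S.D.    Mean  S.D.    Mean  S.D.
Using the 3dvis system
1. Interest in the 3dvis system                        [?]   0.601   [?]   0.000
2. Ease of use                                         [?]   0.601   [?]   0.000
3. Language correctness in the 3dvis system            [?]   0.667   [?]   0.000
4. System efficiency in processing                     [?]   1.014   [?]   1.414
5. Results of the system were satisfactory             [?]   0.726   [?]   0.707
6. Results of the system were convincing               [?]   0.726   [?]   0.707
7. Usefulness of the 3dvis system                      [?]   1.054   [?]   0.707
8. System efficiency                                   [?]   0.500   [?]   0.000
9. Satisfaction with using the 3dvis system            [?]   0.667   [?]   0.000
Overall satisfaction with the 3dvis system             [?]   0.601   [?]   0.707   3.82  0.603
User guide for the 3dvis system
1. Can use the 3dvis system after reading the guide    [?]   0.726   [?]   0.000
2. Guide explains clearly how to use the 3dvis system  [?]   0.726   [?]   0.707
3. Guide content is well organized                     [?]   0.707   [?]   0.000
4. Content is easy to understand                       [?]   0.707   [?]   0.000
5. Lettering in the user guide is clear                [?]   0.601   [?]   0.707
6. Language correctness of the user guide              [?]   0.601   [?]   0.707
Overall satisfaction with the user guide               [?]   0.601   [?]   0.000   3.91  0.539
Total                                                  3.77  0.667   4.00  0.000   3.82  0.603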

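Appendix Table C7 Statistical analysis of responses about the 3dvis system, by field of expertise or interest. Mean values marked [?] are illegible in the source.

Detail                                                 Protein       Computational Total
                                                       modeling
                                                       Mean  S.D.    Mean  S.D.    Mean  S.D.
Using the 3dvis system
1. Interest in the 3dvis system                        [?]   0.500   4.29  0.488
2. Ease of use                                         [?]   0.500   4.00  0.577
3. Language correctness in the 3dvis system            [?]   0.577   4.00  0.577
4. System efficiency in processing                     [?]   0.957   4.14  0.690
5. Results of the system were satisfactory             [?]   0.957   3.57  0.535
6. Results of the system were convincing               [?]   0.816   3.86  0.378
7. Usefulness of the 3dvis system                      [?]   0.577   3.00  1.155
8. System efficiency                                   [?]   0.500   4.00  0.000
9. Satisfaction with using the 3dvis system            [?]   0.500   4.14  0.378
Overall satisfaction with the 3dvis system             [?]   0.577   4.00  0.577   3.82  0.603
User guide for the 3dvis system
1. Can use the 3dvis system after reading the guide    [?]   0.500   3.57  0.787
2. Guide explains clearly how to use the 3dvis system  [?]   0.500   3.43  0.787
3. Guide content is well organized                     [?]   0.000   3.57  0.787
4. Content is easy to understand                       [?]   0.500   3.71  0.756
5. Lettering in the user guide is clear                [?]   0.500   4.43  0.535
6. Language correctness of the user guide              [?]   0.000   4.00  0.816
Overall satisfaction with the user guide               [?]   0.000   3.86  0.690   3.91  0.539
Total                                                  3.50  0.577   4.00  0.577   3.82  0.603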

Appendix Table C8 Statistical analysis of responses about the 3dvis system, by current area of work.

Detail                                                 Theoretical   Other         Total
                                                       Mean  S.D.    Mean  S.D.    Mean  S.D.
Using the 3dvis system
1. Interest in the 3dvis system                        4.13  0.641   4.00  0.000
2. Ease of use                                         3.88  0.641   4.00  0.000
3. Language correctness in the 3dvis system            3.75  0.707   4.00  0.000
4. System efficiency in processing                     4.00  0.926   2.67  0.577
5. Results of the system were satisfactory             3.63  0.518   3.00  1.000
6. Results of the system were convincing               3.63  0.518   3.33  1.155
7. Usefulness of the 3dvis system                      3.00  1.069   3.67  0.577
8. System efficiency                                   3.88  0.354   3.33  0.577
9. Satisfaction with using the 3dvis system            4.00  0.535   3.33  0.577
Overall satisfaction with the 3dvis system             3.88  0.641   3.67  0.577   3.82  0.603
User guide for the 3dvis system
1. Can use the 3dvis system after reading the guide    3.50  0.756   4.00  0.000
2. Guide explains clearly how to use the 3dvis system  3.38  0.744   4.00  0.000
3. Guide content is well organized                     3.63  0.744   4.00  0.000
4. Content is easy to understand                       3.63  0.744   4.00  0.000
5. Lettering in the user guide is clear                4.13  0.641   4.33  0.577
6. Language correctness of the user guide              3.88  0.641   4.33  0.577
Overall satisfaction with the user guide               3.88  0.641   4.00  0.000   3.91  0.539
Total                                                  3.88  0.641   3.67  0.577   3.82  0.603


From Appendix Tables C3 to C8, satisfaction with the 3dvis system and its user guide ranged from the "agree" level to the "strongly agree" level (mean scores of 3.00-4.51).
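Each cell in Appendix Tables C3 to C8 pairs a mean with a standard deviation (S.D.) of the questionnaire scores in one group. The sketch below shows how such a pair could be computed, assuming a 1-5 rating scale and the sample standard deviation. The raw scores are hypothetical, since the thesis reports only the summaries, and the reading that a "." entry marks a group with a single respondent is likewise an assumption.

```python
import statistics


def summarize(scores):
    """Return (mean, sample S.D.) for a list of scores; S.D. is None when n < 2."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores) if len(scores) > 1 else None
    return mean, sd


# Hypothetical raw scores on a 1-5 scale (not taken from the thesis).
group_of_three = [4, 4, 3]
single_respondent = [5]

mean, sd = summarize(group_of_three)
print(f"{mean:.2f} {sd:.3f}")

mean, sd = summarize(single_respondent)
print(f"{mean:.2f}", "." if sd is None else f"{sd:.3f}")
```

With a single respondent the sample standard deviation is undefined, which is why the sketch prints "." in that case.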

Overall, the user satisfaction survey of the 3dvis system, conducted by purposive sampling of students and researchers skilled in computational methods and protein modeling, showed good agreement with the system. Suggestions for improving the system included security, a frequently asked questions (FAQ) section, error checking, and artificial intelligence for sequence alignment.

[User satisfaction questionnaire for the 3dvis system, in Thai; the text is not recoverable from the source.]

Using the 3D Visualization System to Aid Understanding Hemagglutinin Mutations in H5N1 Influenza Virus

A user account is required to use the system. Users must register through the website on the login page the first time they use the system, and must log in every time they use it afterwards.

1. User: a person who is not a member. A user can access only the About and How to use sections.

2. Member: a person who has registered and received a user account. After logging in, a member has the right to use all data on this website that belongs to that member.

Description of the web pages

1. Home page and login section

Figure 1 Using the login section.

Users can reach the system at its URL (http://… dvis). Without an account, users can access only the About and How to use pages. Users who want to use the sequence alignment and 3D structure building functions must apply for a user account to become a member. After logging in, a member has the right to use all data on this website that belongs to that member.

2. My Workspace page

Figure 2 Using the main working area, My Workspace.

After a member logs in, the My Workspace page appears. If the member has earlier jobs, their data files are listed, and the member can download them, delete them, and view a summary of the existing jobs.

3. 2D alignment

Figure 3 Using the section for aligning a new protein sequence in two dimensions.

This section supports aligning a new protein sequence in two dimensions (sequence alignment). Members can choose which hemagglutinin structure to align against. The system holds X-ray structures of hemagglutinin, whose details can be viewed in the PDB list. The user must enter the amino acid sequence and a name for it, or run a multiple sequence alignment by uploading a protein sequence file, in FASTA format only.
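Because the 2D alignment page accepts uploaded sequence files in FASTA format only, it may help to see what that format contains: a header line starting with ">" followed by one or more sequence lines. The following is a minimal sketch of reading such a file. The function name `parse_fasta`, the record name, and the short sequence are hypothetical and are not taken from the 3dvis system.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping record name to sequence."""
    records = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A header line starts a new record.
            name = line[1:].strip()
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            # Sequence lines are concatenated, normalized to upper case.
            records[name].append(line.upper())
    return {key: "".join(parts) for key, parts in records.items()}


# Hypothetical two-line record used only for illustration.
example = """>query_HA
MEKIVLLFAI
VSLVKSDQIC
"""

for record_name, sequence in parse_fasta(example).items():
    print(record_name, len(sequence), sequence)
```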

4. 3D Automate Model

Figure 4 Using the section for computing a 3D structure from a protein sequence.

This section computes a 3D structure from a protein sequence. It uses a hemagglutinin structure, here the structure 1JSO, as the template, by the Comparative Modeling method. The user can enter a name and a protein sequence, or select data from an existing 2D alignment of a new protein sequence.

5. Result

Figure 5 Using the summary results display.

This section shows the results of the calculation after further processing. It provides the 3D structure, the positions where the amino acid sequence submitted by the user differs from the amino acid sequence of the template structure, and a graph of the Discrete Optimized Protein Energy Score (DOPE Score) comparing the computed 3D structure with the template structure.
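The Result page reports the positions where the submitted sequence differs from the template sequence. The following is a minimal sketch of how such positions could be listed from two already aligned sequences. It is not the 3dvis implementation; the function name `find_substitutions`, the gap symbol "-", and the example sequences are all assumptions made for illustration.

```python
def find_substitutions(template, query, gap="-"):
    """List (position, template_residue, query_residue) where aligned sequences differ.

    Positions are 1-based and counted along the template, skipping template gaps.
    Columns where either sequence has a gap are not reported as substitutions.
    """
    if len(template) != len(query):
        raise ValueError("aligned sequences must have equal length")
    differences = []
    position = 0
    for t_res, q_res in zip(template, query):
        if t_res != gap:
            position += 1
        if t_res != q_res and t_res != gap and q_res != gap:
            differences.append((position, t_res, q_res))
    return differences


# Hypothetical aligned fragments (not real hemagglutinin data).
template = "MEKIVLLFA"
query = "MEKTVLLLA"

for pos, old, new in find_substitutions(template, query):
    print(f"{old}{pos}{new}")
```

Each printed item follows the common mutation notation of template residue, position, and new residue.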

6. Download

Figure 6 Using the summary results display.

On the Download page, the user can copy all the data generated while building the 3D structure with the MODELLER program, together with all other data.


CURRICULUM VITAE

NAME : Mr. Taweesak Poochai

BIRTH DATE : July 6, 1983

BIRTH PLACE : Bangkok, Thailand

NATIONALITY : Thai

EDUCATION : 2001 – 2005 Kasetsart University, B.Sc. (Physics).

SCHOLARSHIPS : Higher Education Development Project Scholarship

Postgraduate Education and Research Program in

Physical Chemistry (2005-2006).
