bioinformatics as a approach to new generation of ...d93012/20041014.pdf2004/10/14  · introduction...


1Introduction 1.0

Bioinformatics as an Approach to a New Generation of Biological Studies

Lin, Chung-Yen (林仲彥), Ph.D.

cylin@nhri.org.tw

Assistant Investigator, Division of Biostatistics and Bioinformatics, National Health Research Institutes

Introduction - Objectives

• Why does bioinformatics exist?
• What is bioinformatics?
• What are the big challenges in bioinformatics?
– Research
– Discipline differences between Bio and CS

Why Is There Bioinformatics?

• Lots of new sequences being added
– Automated sequencers
– Genome projects
– EST sequencing, microarray studies, proteomics
• Patterns in datasets that can be analyzed using computers
• Huge datasets


Need for Informatics in Biology: Origins

• Gramicidin S (Consden et al., 1947), partial insulin sequence (Sanger and Tuppy, 1951)

• 1961: tRNA fragments
• Francis Crick, Sydney Brenner, and colleagues propose the existence of transfer RNA that uses a three-base code and mediates in the synthesis of proteins (Crick et al., 1961). General nature of the genetic code for proteins. Nature 192: 1227-1232. In Microbiology: A Centenary Perspective, edited by Wolfgang K. Joklik, ASM Press, 1999, p. 384.

• First codon assignment UUU/phe (Nirenberg and Matthaei, 1961)


Need for Informatics in Biology: Origins

• The key to the whole field of nucleic acid-based identification of microorganisms: the introduction of molecular systematics using proteins and nucleic acids by the American Nobel laureate Linus Pauling.

Zuckerkandl, E., and L. Pauling. "Molecules as Documents of Evolutionary History." 1965. Journal of Theoretical Biology 8:357-366

• Another landmark: Nucleic acid sequencing (Sanger and Coulson, 1975)


Need for Informatics in Biology: Origins

• First genomes sequenced:
– 3.5 kb RNA bacteriophage MS2 (Fiers et al., 1976)
– 5.4 kb bacteriophage ϕX174 (Sanger et al., 1977)
– 1.83 Mb: first complete genome sequence of a free-living organism, Haemophilus influenzae KW20 (Fleischmann et al., 1995)
– First multicellular organism to be sequenced: C. elegans (C. elegans Sequencing Consortium, 1998)

• Early databases: Dayhoff, 1972; Erdmann, 1978

• Early programs: restriction enzyme sites, promoters, etc… circa 1978.

• 1978 – 1993: Nucleic Acids Research published supplemental information


GenBank Doubles Every 16 Months

(from the National Center for Biotechnology Information)

Shorter than Moore’s law (computer power doubling every 20 months!)
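To make the comparison concrete, the two doubling periods can be turned into growth factors. The sketch below is only an illustration of the arithmetic: it takes the 16-month and 20-month figures from the slide at face value, and the function name `growth_factor` is made up for this example.

```python
def growth_factor(months: float, doubling_months: float) -> float:
    """Fold increase after `months` for a quantity that doubles every `doubling_months`."""
    return 2.0 ** (months / doubling_months)


# Doubling periods as quoted on the slide (assumed, not re-derived here).
GENBANK_DOUBLING_MONTHS = 16
COMPUTE_DOUBLING_MONTHS = 20

if __name__ == "__main__":
    for years in (5, 10):
        months = 12 * years
        data = growth_factor(months, GENBANK_DOUBLING_MONTHS)
        compute = growth_factor(months, COMPUTE_DOUBLING_MONTHS)
        # The ratio shows how far data growth outpaces computing power.
        print(f"{years} years: data x{data:.0f}, compute x{compute:.0f}, "
              f"ratio {data / compute:.2f}")
```

Under these assumptions, ten years gives 7.5 doublings of the data against 6 doublings of computing power, so the amount of data per unit of computing power nearly triples.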


Today: So many genomes…

As of Oct 6, 2004, how many….

• published, complete genomes?

• eukaryotic genome projects in progress?

• prokaryote genome projects in progress?

Guess closest number without going over!


Today: The Human Genome Project

The genome sequence is complete - almost!
– Approximately 3 billion base pairs.

The next step is obviously to locate all of the genes and regulatory regions, describe their functions, and identify how they differ between different groups (e.g. “disease” vs “healthy”). Bioinformatics plays a critical role.


Implications for Biomedicine and Bioinformatics

• Physicians will use genetic information to diagnose and treat disease.
– Virtually all medical conditions (other than trauma) have a genetic component
– Individualize drugs – reduce side effects
– Single Nucleotide Polymorphisms (SNPs)
• Faster drug development research
– More targets
– Faster clinical trials (selected trial populations)
• Most biologists will analyze gene sequence information in their daily work


Bioinformatics will help with DNA Sequencing

• Automated sequencers > 40,000 bp per day
• 500 bp reads must be assembled into complete sequences
– Detecting errors, especially insertions and deletions
• Data flow management
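The assembly step can be illustrated with a toy greedy merger. This is a minimal sketch under strong assumptions: reads are error-free, all come from the same strand, and overlaps must match exactly. Real assemblers have to tolerate the insertion and deletion errors mentioned above, which this example does not attempt. The function names are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest exact overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing left that overlaps; return the separate contigs
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    # Three short made-up reads standing in for 500 bp reads.
    print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
```

The all-pairs comparison is quadratic in the number of reads, which is one reason production tools rely on indexing rather than this direct approach.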


Bioinformatics will help with Similarity Searching Sequence Databases

• What is similar to my sequence?
• Searching gets harder as the databases get bigger - and quality changes
• Tools: BLAST and FASTA = time saving heuristics (approximate methods)
• Statistics + informed judgement of the biologist
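The idea behind a time-saving heuristic can be sketched with a word index: instead of aligning the query against every database sequence, look up short words shared with the query and rank sequences by how many they contain. This is a simplified illustration of word-based seeding, not the actual BLAST or FASTA algorithm, and it omits the extension and statistics steps. The function names, word length, and example sequences are all assumptions.

```python
from collections import defaultdict


def build_index(database: dict[str, str], k: int = 4) -> dict[str, set[str]]:
    """Map every k-letter word to the names of the database sequences containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, seq in database.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def rank_hits(query: str, index: dict[str, set[str]], k: int = 4) -> list[tuple[str, int]]:
    """Count distinct query words shared with each sequence; most shared words first."""
    counts: dict[str, int] = defaultdict(int)
    words = {query[i:i + k] for i in range(len(query) - k + 1)}
    for word in words:
        for name in index.get(word, ()):
            counts[name] += 1
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Made-up sequences; real databases hold millions of entries.
    db = {"seqA": "ATGGCGTACCTGA", "seqB": "TTTTCCCCGGGG", "seqC": "GGCGTACA"}
    print(rank_hits("GCGTACC", build_index(db)))
```

Only the top-ranked candidates would then be passed to a slower, exact alignment, which is where the statistics and the biologist's judgement come in.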


Bioinformatics will help with Structure-Function Relationships

• Can we predict the function of protein molecules from their sequence?

sequence > structure > function

• Prediction of some simple 3-D structures (α-helix, β-sheet, membrane spanning, etc.)
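One of the simplest sequence-based predictions of this kind is a hydropathy scan for membrane-spanning segments. The sketch below averages the Kyte-Doolittle hydropathy scale over a sliding window and flags strongly hydrophobic windows. The window length of 19 and the threshold of 1.6 are conventional choices for this scale and are assumptions here, not something stated on the slide; the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of each window of `window` residues along the sequence."""
    # Non-standard residue letters raise KeyError in this sketch.
    scores = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def membrane_candidates(seq: str, window: int = 19, threshold: float = 1.6) -> list[int]:
    """0-based start positions of windows whose mean hydropathy exceeds the threshold."""
    return [i for i, score in enumerate(hydropathy_profile(seq, window))
            if score > threshold]


if __name__ == "__main__":
    # Artificial sequence: charged flanks around a 19-residue leucine stretch.
    toy = "DEKR" * 5 + "L" * 19 + "DEKR" * 5
    print(membrane_candidates(toy))
```

A scan like this only suggests candidate segments; it says nothing about the rest of the sequence > structure > function chain.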

Bioinformatics will help with Phylogenetics

• Can we define evolutionary relationships between organisms by comparing DNA sequences?
– What is the molecular clock?
– Lots of methods and software; what is the "correct" analysis?
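As one concrete example of comparing DNA sequences, the sketch below computes the proportion of differing sites between two aligned sequences and applies the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3), which estimates substitutions per site under the assumption that all substitutions are equally likely. It is only one of the many methods the slide alludes to, and choosing it here is an assumption for illustration. The sequences must already be aligned.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, skipping gaps."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(a: str, b: str) -> float:
    """Jukes-Cantor corrected distance in expected substitutions per site."""
    p = p_distance(a, b)
    if p >= 0.75:
        return math.inf  # saturated: the correction is undefined
    return -0.75 * math.log(1 - 4 * p / 3)


if __name__ == "__main__":
    # Made-up aligned sequences differing at one site out of eight.
    print(p_distance("ACGTACGT", "ACGTACGA"), jukes_cantor("ACGTACGT", "ACGTACGA"))
```

The corrected distance is always at least as large as the raw proportion, because it accounts for multiple substitutions at the same site. Converting such distances into divergence times is where the molecular clock assumption enters.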


Top 10 Future Challenges for Bioinformatics

• Precise, predictive model of transcription initiation and termination: ability to predict where and when transcription will occur in a genome

• Precise, predictive model of RNA splicing/alternative splicing: ability to predict the splicing pattern of any primary transcript in any tissue

• Precise, quantitative models of signal transduction pathways: ability to predict cellular responses to external stimuli

• Determining effective protein:DNA, protein:RNA and protein:protein recognition codes

• Accurate ab initio protein structure prediction

Top 10 Future Challenges for Bioinformatics

• Rational design of small molecule inhibitors of proteins
• Mechanistic understanding of protein evolution: understanding exactly how new protein functions evolve
• Mechanistic understanding of speciation: molecular details of how speciation occurs
• Continued development of effective gene ontologies - systematic ways to describe the functions of any gene or protein
• Education: development of appropriate bioinformatics curricula for secondary, undergraduate and graduate education

Reference: Chris Burge, Ewan Birney, Jim Fickett. Genome Technology, issue No. 17, January, 2002


What is Bioinformatics?

• Think – Pair – Share!


The Biologist in the Age of Information


The Job of the Biologist Is Changing

• As more biological information becomes available …
– The biologist will spend more time using computers
– The biologist will spend more time on data analysis
– Biology will become a more quantitative science (think how the periodic table and atomic theory affected chemistry)


The challenge: Putting it all together

• The current state of the art requires the biologist to jump around from Web to mainframe to personal computer
• The trend is for integration
• Real power: being able to use and customize all resources


The Computer Scientist in the Age of Genomics


How much biology to understand?

• Increasing sophistication required for computational biologists in terms of biological knowledge
• What knowledge is important? What about all those exceptions?
• What problems are important?


What Computational Tools to Understand?

• Perl is still used extensively in bioinformatics
• Open source is prevalent in bioinformatics (Linux, MySQL, BioPerl)
• Need to be knowledgeable about both the standard bioinformatics algorithms and the common tools that are based on them
• Appreciate the different databases and programs out there and what their strengths and weaknesses are – databases have widely varying quality


High Quality Bioinformatics Research

Excellent Communication and Cooperation Between Biologists and Computer Scientists Are Key


The computer scientist and biologist compared

Computer scientist
• Logic
• Problem-solving
• Process-oriented
• Algorithmic
• Optimizing

Biologist
• Knowledge gathering
• Experimentally-focused
• Exceptions are as common as rules
• Describe work as a story
• Develop conclusions and models


Computer Science vs Biology

The result….
• see the world differently
• ask different questions
• come to problems with different assumptions
• pick up on different details
• use different metaphors to organize knowledge
• have different sets of analytical tools at their disposal
• can even interact with people differently

Coming together
• Communicate constantly!
• Gain a better understanding of different ways of thinking
• Try communicating in different ways
• Remember there are others…. Statisticians, mathematicians, engineers, physicists, chemists, physiologists….


Thoughts for the day

• What is bioinformatics?

• Why does bioinformatics exist?

• How can I use bioinformatics more effectively in my career?

• Questions?


Real World Applications of bioinformatics

• 1. Molecular medicine
– 1.1 More drug targets
– 1.2 Personalised medicine
– 1.3 Preventative medicine
– 1.4 Gene therapy
• 2. Microbial genome applications
– 2.1 Waste cleanup
– 2.2 Climate change
– 2.3 Alternative energy sources
– 2.4 Biotechnology
– 2.5 Antibiotic resistance
– 2.6 Forensic analysis of microbes
– 2.7 The reality of bioweapon creation
– 2.8 Evolutionary studies
• 3. Agriculture
– 3.1 Crops
– 3.2 Insect resistance
– 3.3 Improve nutritional quality
– 3.4 Grow crops in poorer soils and crops that are drought resistant
• 4. Animals
• 5. Comparative studies


BIOINFORMATICS INTRODUCTION


What is Bioinformatics

• The application of computer technology to the management of biological information

• Software applications used to gather, store, analyze and integrate biological information

• Databases and algorithms designed for the purpose of enhancing the process of biological research


What is Bioinformatics

• NCBI: “Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data including those to acquire, store, archive, analyze, or visualize such data.”

• Lincoln Stein: “Biologists using computers, or the other way around.”


“Hot” Bioinformatics Topics

• Gene Expression / Regulation

• Protein / RNA Structure

• Ontologies

• Genome Sequencing / Annotation

• Molecular Interactions


Where is Bioinformatics Used

• Pharmaceuticals

• Universities

• Biotech Companies

• Public Good / Health Research Institutes

• Hardware Manufacturers

• Government Agencies

THESE ARE OUR CLIENTS


Why Do We Need Bioinformatics

• Accessibility of biological data

• Data integration… at least within an organization

• Processing of data (data mining)

• Prediction and analysis

• Storage of mass amounts of data (high-throughput experiments)


DATA INTRODUCTION


How Much Data - GenBank

Source: NCBI


How Much Data - PDB

Source: RCSB


How Much Data - BIND

Source: Blueprint North America


How Much Data - PubMed

Source: Israel Institute of Technology


What Do We Do With All This Data?

• Design data structures to represent this information unambiguously

• Develop databases to house the data

• Develop accessible software to submit new data

• Develop fast applications to query the data

• Develop fast applications to analyze the data (data mining)
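The steps above can be sketched end to end on a very small scale: a record type that represents a sequence entry unambiguously, a database to house it, and functions to submit, query, and analyze. The example uses Python's standard `sqlite3` module with an in-memory database. The schema, the function names, and the `DEMO…` accessions are all invented for illustration and do not correspond to any real database.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class SequenceRecord:
    """Minimal data structure for one sequence entry (hypothetical schema)."""
    accession: str
    organism: str
    sequence: str


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the table and an index that keeps organism queries fast."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records ("
        "accession TEXT PRIMARY KEY, organism TEXT NOT NULL, sequence TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_organism ON records (organism)")
    return conn


def submit(conn: sqlite3.Connection, rec: SequenceRecord) -> None:
    """Insert a new record; a duplicate accession raises sqlite3.IntegrityError."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO records VALUES (?, ?, ?)",
                     (rec.accession, rec.organism, rec.sequence))


def query_by_organism(conn: sqlite3.Connection, organism: str) -> list[SequenceRecord]:
    """Return all records for one organism, ordered by accession."""
    rows = conn.execute(
        "SELECT accession, organism, sequence FROM records "
        "WHERE organism = ? ORDER BY accession", (organism,))
    return [SequenceRecord(*row) for row in rows]


def gc_content(rec: SequenceRecord) -> float:
    """A trivial analysis step: fraction of G and C bases in the sequence."""
    seq = rec.sequence.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


if __name__ == "__main__":
    conn = open_db()
    submit(conn, SequenceRecord("DEMO0001", "Escherichia coli", "ATGGCGCC"))
    submit(conn, SequenceRecord("DEMO0002", "Homo sapiens", "ATATATGC"))
    for rec in query_by_organism(conn, "Escherichia coli"):
        print(rec.accession, gc_content(rec))
```

Making the accession the primary key is one simple way to keep entries unambiguous: the database itself rejects a second submission under the same identifier.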


APPLICATIONS INTRODUCTION


Bioinformatics Application Trends

• Web based GUI tool accessibility

• Data marts

• Web services

• Integration Services

• Pre-analyzed Data Services


Languages of Bioinformatics

• Perl

• Python

• Java

• C++

• C

• And More…


Today’s World of Bioinformatics

Note: This is not intended to be an exhaustive list of bioinformatics institutions


All Sorts of Bioinformatics Tools

Note: This is not intended to be an exhaustive list of bioinformatics tools
