Bioinformatics As an Approach to a New Generation of Biological Studies (...d93012/20041014.pdf, 2004/10/14, Introduction 1.0)

Bioinformatics As an Approach to a New Generation of Biological Studies

林仲彥 (Lin, Chung-Yen, Ph.D.)

[email protected] (domain: nhri.org.tw)

Assistant Investigator, Division of Biostatistics and Bioinformatics, National Health Research Institutes (國家衛生研究院)

Introduction - Objectives

• Why does bioinformatics exist?

• What is bioinformatics?

• What are the big challenges in bioinformatics?
  – Research
  – Discipline differences between Bio and CS

Why Is There Bioinformatics?

• Lots of new sequences being added
  – Automated sequencers
  – Genome projects
  – EST sequencing, microarray studies, proteomics

• Patterns in datasets that can be analyzed using computers

• Huge datasets

Need for Informatics in Biology: Origins

• Gramicidin S (Consden et al., 1947), partial insulin sequence (Sanger and Tuppy, 1951)

• 1961: tRNA fragments

• Francis Crick, Sydney Brenner, and colleagues propose the existence of transfer RNA that uses a three-base code and mediates in the synthesis of proteins (Crick et al., 1961. General nature of the genetic code for proteins. Nature 192: 1227-1232). In Microbiology: A Centenary Perspective, edited by Wolfgang K. Joklik, ASM Press, 1999, p. 384

• First codon assignment UUU/phe (Nirenberg and Matthaei, 1961)

Need for Informatics in Biology: Origins

• The key to the whole field of nucleic acid-based identification of microorganisms: the introduction of molecular systematics using proteins and nucleic acids by the American Nobel laureate Linus Pauling.

Zuckerkandl, E., and L. Pauling. "Molecules as Documents of Evolutionary History." 1965. Journal of Theoretical Biology 8:357-366

• Another landmark: Nucleic acid sequencing (Sanger and Coulson, 1975)

Need for Informatics in Biology: Origins

• First genomes sequenced:
  – 3.5 kb RNA bacteriophage MS2 (Fiers et al., 1976)
  – 5.4 kb bacteriophage ϕX174 (Sanger et al., 1977)
  – 1.83 Mb, the first complete genome sequence of a free-living organism: Haemophilus influenzae KW20 (Fleischmann et al., 1995)
  – First multicellular organism to be sequenced: C. elegans (C. elegans Sequencing Consortium, 1998)

• Early databases: Dayhoff, 1972; Erdmann, 1978

• Early programs: restriction enzyme sites, promoters, etc… circa 1978.

• 1978 – 1993: Nucleic Acids Research published supplemental information

GenBank Doubles Every 16 Months

(from the National Center for Biotechnology Information)

A shorter doubling time than Moore's law (computer power doubling every 20 months!)
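To make the comparison concrete, here is a minimal sketch of the arithmetic, assuming both quantities grow exponentially with the doubling times quoted on the slide (16 and 20 months). The function name `growth_factor` is only illustrative. Over ten years the gap compounds to roughly 181-fold growth in data against 64-fold growth in computing power.

```python
def growth_factor(months: float, doubling_time_months: float) -> float:
    """Return how many times larger a quantity becomes after `months`,
    assuming it doubles every `doubling_time_months`."""
    return 2.0 ** (months / doubling_time_months)


if __name__ == "__main__":
    decade = 120  # months
    genbank = growth_factor(decade, 16)   # doubling time quoted for GenBank
    compute = growth_factor(decade, 20)   # doubling time quoted for Moore's law
    print(f"GenBank after 10 years:        x{genbank:.0f}")
    print(f"Computer power after 10 years: x{compute:.0f}")
    print(f"Data outgrows compute by a factor of {genbank / compute:.1f}")
```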

Today: So many genomes…

As of Oct 6, 2004, how many….

• published, complete genomes?

• eukaryotic genome projects in progress?

• prokaryote genome projects in progress?

Guess closest number without going over!

Today: The Human Genome Project

The genome sequence is complete, almost!
  – Approximately 3 billion base pairs

The next step is to locate all of the genes and regulatory regions, describe their functions, and identify how they differ between groups (e.g. "disease" vs "healthy"). Bioinformatics plays a critical role.

Implications for Biomedicine and Bioinformatics

• Physicians will use genetic information to diagnose and treat disease.
  – Virtually all medical conditions (other than trauma) have a genetic component
  – Individualize drugs to reduce side effects
  – Single Nucleotide Polymorphisms (SNPs)

• Faster drug development research
  – More targets
  – Faster clinical trials (selected trial populations)

• Most biologists will analyze gene sequence information in their daily work

Bioinformatics will help with DNA Sequencing

• Automated sequencers: more than 40,000 bp per day

• 500 bp reads must be assembled into complete sequences
  – Detecting errors, especially insertions and deletions

• Data flow management
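The assembly step can be illustrated with a toy greedy overlap merger. This is a sketch under strong assumptions, not how production assemblers work: the reads are error-free, come from a single strand, and contain no repeats, so the insertion and deletion errors mentioned on the slide are not handled. The function names `overlap` and `greedy_assemble` are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`
    (0 if the best overlap is shorter than `min_len`)."""
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two sequences with the largest overlap
    until no pair overlaps by at least `min_len` bases."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    # Three short made-up reads standing in for ~500 bp sequencer reads.
    reads = ["ATGGCGTA", "GCGTACGT", "ACGTTAGC"]
    print(greedy_assemble(reads))
```

The quadratic pairwise comparison is fine for a handful of reads but would not scale to a real sequencing run, which is one reason assembly is a research topic in its own right.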

Bioinformatics will help with Similarity Searching Sequence Databases

• What is similar to my sequence?

• Searching gets harder as the databases get bigger, and quality changes

• Tools: BLAST and FASTA = time-saving heuristics (approximate methods)

• Statistics + informed judgement of the biologist
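As a rough illustration of why heuristics save time, the sketch below indexes a database by short words (k-mers) and ranks entries by how many words they share with the query, instead of aligning the query against every entry. It only mimics the word-matching idea behind such tools; it is not BLAST or FASTA, it computes no alignment and no statistics, and the sequences and function names are made up for the example.

```python
from collections import defaultdict


def kmers(seq: str, k: int) -> set[str]:
    """All distinct substrings of length `k` in `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(database: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map each k-mer to the names of the database sequences containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, seq in database.items():
        for word in kmers(seq, k):
            index[word].add(name)
    return index


def rank_hits(query: str, index: dict[str, set[str]], k: int) -> list[tuple[str, int]]:
    """Rank database sequences by the number of k-mers shared with `query`."""
    counts: dict[str, int] = defaultdict(int)
    for word in kmers(query, k):
        for name in index.get(word, ()):
            counts[name] += 1
    # Highest shared-word count first; ties broken by name for determinism.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    database = {
        "seqA": "ATGGCGTACGTTAGC",
        "seqB": "TTTTCCCCGGGGAAAA",
        "seqC": "GGCGTACGAAAA",
    }
    index = build_index(database, k=4)
    print(rank_hits("GCGTACGTT", index, k=4))
```

The index is built once and reused for every query, which is where the time saving comes from; the price is that distant similarities sharing no exact word are missed, so the biologist's judgement is still needed.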

Bioinformatics will help with Structure-Function Relationships

• Can we predict the function of protein molecules from their sequence?

sequence → structure → function

• Prediction of some simple 3-D structures (α-helix, β-sheet, membrane spanning, etc.)
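One of the simplest sequence-based predictions is flagging candidate membrane-spanning stretches from average hydrophobicity. The sketch below uses the Kyte-Doolittle hydropathy scale with a sliding window; the window of 19 residues and cutoff of 1.6 are a commonly quoted rule of thumb and are assumptions here, as are the function names. It is an illustration of the idea, not a validated predictor.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every window of `window` consecutive residues."""
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def candidate_membrane_windows(protein: str, window: int = 19,
                               threshold: float = 1.6) -> list[int]:
    """Start positions (0-based) of windows whose mean hydropathy
    reaches the assumed cutoff."""
    profile = hydropathy_profile(protein, window)
    return [i for i, score in enumerate(profile) if score >= threshold]


if __name__ == "__main__":
    # Made-up sequence: a hydrophobic core flanked by charged residues.
    toy_protein = "DDDD" + "L" * 19 + "KKKK"
    print(candidate_membrane_windows(toy_protein))
```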

Bioinformatics will help with Phylogenetics

• Can we define evolutionary relationships between organisms by comparing DNA sequences?
  – What is the molecular clock?
  – Lots of methods and software; what is the "correct" analysis?
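A small worked example shows why the choice of method matters even for two sequences. The sketch below computes the raw proportion of differing sites (p-distance) between two aligned DNA sequences and then applies the Jukes-Cantor correction, d = -3/4 ln(1 - 4p/3), which assumes equal base frequencies and equal substitution rates. Other models give other distances from the same data. The function names are illustrative.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned, gap-free sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance (expected substitutions per site)."""
    if p >= 0.75:
        return math.inf  # saturated: the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    p = p_distance("ACGTACGT", "ACGAACGA")
    print(f"p-distance:   {p:.3f}")
    print(f"Jukes-Cantor: {jukes_cantor(p):.3f}")
```

The corrected distance is always at least as large as the raw one because it accounts for multiple substitutions at the same site, which the raw count cannot see.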

Top 10 Future Challenges for Bioinformatics

• Precise, predictive model of transcription initiation and termination: ability to predict where and when transcription will occur in a genome

• Precise, predictive model of RNA splicing/alternative splicing: ability to predict the splicing pattern of any primary transcript in any tissue

• Precise, quantitative models of signal transduction pathways: ability to predict cellular responses to external stimuli

• Determining effective protein:DNA, protein:RNA and protein:protein recognition codes

• Accurate ab initio protein structure prediction

Top 10 Future Challenges for Bioinformatics

• Rational design of small molecule inhibitors of proteins

• Mechanistic understanding of protein evolution: understanding exactly how new protein functions evolve

• Mechanistic understanding of speciation: molecular details of how speciation occurs

• Continued development of effective gene ontologies: systematic ways to describe the functions of any gene or protein

• Education: development of appropriate bioinformatics curricula for secondary, undergraduate and graduate education

Reference: Chris Burge, Ewan Birney, Jim Fickett. Genome Technology, issue No. 17, January, 2002

What is Bioinformatics?

• Think – Pair – Share!

The Biologist in the Age of Information

The Job of the Biologist Is Changing

• As more biological information becomes available …
  – The biologist will spend more time using computers
  – The biologist will spend more time on data analysis
  – Biology will become a more quantitative science (think how the periodic table and atomic theory affected chemistry)

The challenge: Putting it all together

• The current state of the art requires the biologist to jump around from Web to mainframe to personal computer

• The trend is for integration

• Real power: being able to use and customize all resources

The Computer Scientist in the Age of Genomics

How much biology to understand?

• Increasing sophistication required for computational biologists in terms of biological knowledge

• What knowledge is important? What about all those exceptions?

• What problems are important?

What Computational Tools to Understand?

• Perl is still used extensively in bioinformatics

• Open source is prevalent in bioinformatics (Linux, MySQL, BioPerl)

• Need to be knowledgeable about both the standard bioinformatics algorithms and the common tools that are based on them

• Appreciate the different databases and programs out there and what their benefits and pitfalls are: databases vary widely in quality

High Quality Bioinformatics Research

Excellent Communication and Cooperation Between Biologists and Computer Scientists Are Key

The computer scientist and biologist compared

Computer scientist
• Logic
• Problem-solving
• Process-oriented
• Algorithmic
• Optimizing

Biologist
• Knowledge gathering
• Experimentally-focused
• Exceptions are as common as rules
• Describe work as a story
• Develop conclusions and models

Computer Science vs Biology

The result…
• see the world differently
• ask different questions
• come to problems with different assumptions
• pick up on different details
• use different metaphors to organize knowledge
• have different sets of analytical tools at their disposal
• can even interact with people differently

Coming together
• Communicate constantly!
• Gain a better understanding of different ways of thinking
• Try communicating in different ways
• Remember there are others: statisticians, mathematicians, engineers, physicists, chemists, physiologists…

Thoughts for the day

• What is bioinformatics?

• Why does bioinformatics exist?

• How can I use bioinformatics more effectively in my career?

• Questions?

Real World Applications of bioinformatics

• 1. Molecular medicine
  – 1.1 More drug targets
  – 1.2 Personalised medicine
  – 1.3 Preventative medicine
  – 1.4 Gene therapy

• 2. Microbial genome applications
  – 2.1 Waste cleanup
  – 2.2 Climate change
  – 2.3 Alternative energy sources
  – 2.4 Biotechnology
  – 2.5 Antibiotic resistance
  – 2.6 Forensic analysis of microbes
  – 2.7 The reality of bioweapon creation
  – 2.8 Evolutionary studies

• 3. Agriculture
  – 3.1 Crops
  – 3.2 Insect resistance
  – 3.3 Improve nutritional quality
  – 3.4 Grow crops in poorer soils and that are drought resistant

• 4. Animals

• 5. Comparative studies

BIOINFORMATICS INTRODUCTION

What is Bioinformatics

• The application of computer technology to the management of biological information

• Software applications used to gather, store, analyze and integrate biological information

• Databases and algorithms designed for the purpose of enhancing the process of biological research

What is Bioinformatics

• NCBI: “Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data including those to acquire, store, archive, analyze, or visualize such data.”

• Lincoln Stein: “Biologists using computers, or the other way around.”

“Hot” Bioinformatics Topics

• Gene Expression / Regulation

• Protein / RNA Structure

• Ontologies

• Genome Sequencing / Annotation

• Molecular Interactions

Where is Bioinformatics Used

• Pharmaceuticals

• Universities

• Biotech Companies

• Public Good / Health Research Institutes

• Hardware Manufacturers

• Government Agencies

THESE ARE OUR CLIENTS

Why Do We Need Bioinformatics

• Accessibility of biological data

• Data integration… at least within an organization

• Processing of data (data mining)

• Prediction and analysis

• Storage of mass amounts of data (high-throughput experiments)

DATA INTRODUCTION

How Much Data - GenBank

Source: NCBI

How Much Data - PDB

Source: RCSB

How Much Data - BIND

Source: Blueprint North America

How Much Data - PubMed

Source: Israel Institute of Technology

What Do We Do With All This Data?

• Design data structures to represent this information unambiguously

• Develop databases to house the data

• Develop accessible software to submit new data

• Develop fast applications to query the data

• Develop fast applications to analyze the data (data mining)
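The steps in this list can be sketched end to end on a very small scale. The example below is only an illustration using Python's standard `sqlite3` module and an in-memory database; the record layout, table name, function names, and the `DEMO...` accessions are all hypothetical and do not reflect the schema of GenBank or any other real resource.

```python
import sqlite3
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class SequenceRecord:
    """Data structure: one unambiguous record per accession."""
    accession: str
    organism: str
    sequence: str


def create_database() -> sqlite3.Connection:
    """Database: an in-memory table keyed by accession, indexed by organism."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sequences ("
        "accession TEXT PRIMARY KEY, "
        "organism TEXT NOT NULL, "
        "sequence TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX idx_organism ON sequences (organism)")
    return conn


def submit(conn: sqlite3.Connection, record: SequenceRecord) -> None:
    """Submission: the primary key rejects a second record with the same accession."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO sequences VALUES (?, ?, ?)", astuple(record))


def query_by_organism(conn: sqlite3.Connection, organism: str) -> list[SequenceRecord]:
    """Query: the organism index avoids scanning the whole table."""
    rows = conn.execute(
        "SELECT accession, organism, sequence FROM sequences "
        "WHERE organism = ? ORDER BY accession",
        (organism,),
    )
    return [SequenceRecord(*row) for row in rows]


def gc_content(sequence: str) -> float:
    """Analysis: fraction of G and C bases in a sequence."""
    seq = sequence.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


if __name__ == "__main__":
    conn = create_database()
    submit(conn, SequenceRecord("DEMO0001", "Haemophilus influenzae", "ATGCGC"))
    submit(conn, SequenceRecord("DEMO0002", "Haemophilus influenzae", "ATATAT"))
    for rec in query_by_organism(conn, "Haemophilus influenzae"):
        print(rec.accession, f"GC={gc_content(rec.sequence):.2f}")
```

Making the accession a primary key is one simple way to keep records unambiguous: a duplicate submission fails loudly instead of silently creating a second copy.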

APPLICATIONS INTRODUCTION

Bioinformatics Application Trends

• Web based GUI tool accessibility

• Data marts

• Web services

• Integration Services

• Pre-analyzed Data Services

Languages of Bioinformatics

• Perl

• Python

• Java

• C++

• C

• And More…

Today’s World of Bioinformatics

Note: This is not intended to be an exhaustive list of bioinformatics institutions

All Sorts of Bioinformatics Tools

Note: This is not intended to be an exhaustive list of bioinformatics tools