Henrik Bjorn Nielsen, Rasmus Wernersson and Steen Knudsen

DESCRIPTION

Design of oligonucleotides for microarrays and perspectives for design of multi-transcriptome arrays. Henrik Bjorn Nielsen, Rasmus Wernersson and Steen Knudsen. Nucleic Acids Research, 2003, Vol. 31, No. 13, 3491–3496. Speaker: Chui-Wei Wong. Advisor: 薛 佑 玲, PhD

TRANSCRIPT

Page 1: Henrik Bjorn Nielsen, Rasmus Wernersson and Steen Knudsen

Design of oligonucleotides for microarrays and perspectives for design of multi-transcriptome arrays

Henrik Bjorn Nielsen, Rasmus Wernersson and Steen Knudsen

Nucleic Acids Research, 2003, Vol. 31, No. 13, 3491–3496

Speaker: Chui-Wei Wong

Advisor: 薛 佑 玲, PhD

Institute of Biomedical Science

Page 2: Henrik Bjorn Nielsen, Rasmus Wernersson and Steen Knudsen

Outline

– Introduction

– Method

– Designing Oligonucleotides

– Result

– Discussion

Page 3: Henrik Bjorn Nielsen, Rasmus Wernersson and Steen Knudsen

Introduction

Center for Biological Sequence Analysis (CBS), Technical University of Denmark

Founded in 1993

Conducts basic research in the field of bioinformatics and systems biology

Research groups include

– molecular biologists

– biochemists

– medical doctors

– physicists

– computer scientists

Page 4: Henrik Bjorn Nielsen, Rasmus Wernersson and Steen Knudsen

Introduction

Oligonucleotides of 20–70 bp

OligoWiz

– evaluates input sequences according to a collection of parameters and displays the results graphically

– can be used to design arrays that detect transcripts from multiple organisms

Page 5: Henrik Bjorn Nielsen, Rasmus Wernersson and Steen Knudsen

Introduction

OligoWiz is implemented as a client-server solution

The server is responsible for the calculation of the scores

Freely available

OligoWiz web page: http://www.cbs.dtu.dk/services/OligoWiz/

Page 6: Henrik Bjorn Nielsen, Rasmus Wernersson and Steen Knudsen

Method

Client

– written in Java 1.3.1

– runs on MacOS X, Linux and Windows

Server

– developed on an SGI Unix system

– written in Perl 5

– utilizes the BLAST program for homology database searches

– parallelized using the Perl module ChildManager

Page 7: Henrik Bjorn Nielsen, Rasmus Wernersson and Steen Knudsen


Page 8: Henrik Bjorn Nielsen, Rasmus Wernersson and Steen Knudsen


Download Java

Page 9: Henrik Bjorn Nielsen, Rasmus Wernersson and Steen Knudsen


Page 10: Henrik Bjorn Nielsen, Rasmus Wernersson and Steen Knudsen

Designing Oligonucleotides

– Cross-hybridization

– ΔTm

– Position within transcript

– Low-complexity filtering

– GATC-only score

Page 11: Henrik Bjorn Nielsen, Rasmus Wernersson and Steen Knudsen

Cross-hybridization

To avoid cross-hybridization, the affinity difference between the intended target and all other targets should ideally be maximized

Experimental evidence suggests that a significant false signal can be detected

– if a 50 bp oligonucleotide has >75–80% of its bases complementary to a false target

– if continuous stretches of >15 bp are complementary to a false target
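The two thresholds above can be written as a simple check on a gapless comparison between an oligonucleotide and a candidate false target. This is a minimal sketch and not OligoWiz's own code: the function name `cross_hybridization_risk`, the equal-length string comparison and the exact cutoffs (75% and 15 bp) are assumptions chosen for illustration. The false target is assumed to be written in the same orientation as the oligonucleotide, so that a matching character stands for a complementary base pair.

```python
def cross_hybridization_risk(oligo: str, off_target: str,
                             max_identity: float = 0.75,
                             max_stretch: int = 15) -> bool:
    """Return True if the pair exceeds either illustrative threshold."""
    if len(oligo) != len(off_target):
        raise ValueError("sequences must have equal length (gapless comparison)")
    matches = 0   # total number of matching positions
    longest = 0   # longest continuous run of matching positions
    current = 0   # length of the run that ends at the current position
    for a, b in zip(oligo.upper(), off_target.upper()):
        if a == b:
            matches += 1
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    identity = matches / len(oligo) if oligo else 0.0
    return identity > max_identity or longest > max_stretch
```

A real design pipeline would take the identity and stretch length from a BLAST alignment rather than from a position-by-position string comparison.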

Page 12: Henrik Bjorn Nielsen, Rasmus Wernersson and Steen Knudsen

Homology score

m is the number of BLAST hits considered in position i of the oligonucleotide

h = {h1i, . . . , hmi} are the BLAST hits in position i

L is the length of the oligonucleotide

A BLAST hit along the full length of the oligonucleotide will get a

– score of 0 at 100% identity

– score of 1 at 0% identity (no homology)
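The equation for the homology score is not reproduced in this transcript. One reading that is consistent with the definitions and with the two boundary cases (score 0 at 100% identity, score 1 with no homology) is to take, at each position i, the highest identity among the BLAST hits covering that position, average over the L positions, and subtract the result from 1. The sketch below assumes that reading; the `(start, end, identity)` tuple layout and the function name are hypothetical.

```python
from typing import Iterable, Tuple

# (start, end_exclusive, identity as a fraction 0..1), in oligonucleotide coordinates
Hit = Tuple[int, int, float]

def homology_score(oligo_length: int, hits: Iterable[Hit]) -> float:
    """Assumed form: 1 - mean over positions of the best hit identity."""
    if oligo_length <= 0:
        raise ValueError("oligo_length must be positive")
    best = [0.0] * oligo_length  # best identity seen at each position
    for start, end, identity in hits:
        for i in range(max(start, 0), min(end, oligo_length)):
            best[i] = max(best[i], identity)
    return 1.0 - sum(best) / oligo_length
```

With no hits the score is 1, and a single full-length hit at 100% identity gives 0, matching the two cases listed on the slide.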

Page 13: Henrik Bjorn Nielsen, Rasmus Wernersson and Steen Knudsen

ΔTm

For oligonucleotides to discriminate between targets, the hybridization and washing conditions need to be optimal

All oligonucleotides on an array should perform well under similar hybridization conditions

The melting temperature of the DNA:DNA duplex (Tm) is a good description of an oligonucleotide's hybridization properties

Goal: minimal difference between the Tm of the oligonucleotides

Page 14: Henrik Bjorn Nielsen, Rasmus Wernersson and Steen Knudsen

ΔTm

OligoWiz uses a nearest-neighbor model for Tm estimation:

– ΔH is the enthalpy change and ΔS is the entropy change of the nucleation reaction

– A is a constant correcting for helix initiation (-10.8)

– R is the universal gas constant (1.99 cal K^-1 mol^-1)

– Ct is the total molar concentration of strands

Since the total molar concentration of strands is unknown for most microarray experiments, OligoWiz uses a constant of 2.5 x 10^-10 M
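The Tm equation itself is not reproduced in this transcript. The sketch below uses the standard nearest-neighbor form, Tm = ΔH / (ΔS + A + R ln(Ct/4)) - 273.15, with the constants listed above. Treat it as an assumption about the slide's formula rather than a copy of it: the units of A (cal K^-1 mol^-1) are assumed, ΔH and ΔS are passed in directly instead of being summed from a nearest-neighbor table, and the salt correction term that many published variants add is left out because the slide does not mention it.

```python
import math

A_HELIX_INITIATION = -10.8  # helix initiation correction, assumed cal K^-1 mol^-1
R_GAS = 1.99                # gas constant in cal K^-1 mol^-1, as given on the slide
DEFAULT_CT = 2.5e-10        # total strand concentration in M, as given on the slide

def melting_temperature(delta_h: float, delta_s: float,
                        ct: float = DEFAULT_CT) -> float:
    """Tm in degrees Celsius; delta_h in cal/mol, delta_s in cal/(K mol)."""
    if ct <= 0:
        raise ValueError("ct must be positive")
    denominator = delta_s + A_HELIX_INITIATION + R_GAS * math.log(ct / 4.0)
    return delta_h / denominator - 273.15  # convert Kelvin to Celsius
```

For illustrative values ΔH = -400000 cal/mol and ΔS = -1100 cal/(K mol), this gives a Tm of roughly 72 °C at the default strand concentration.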

Page 15: Henrik Bjorn Nielsen, Rasmus Wernersson and Steen Knudsen

ΔTm

Based on the Tm estimation, a ΔTm score is calculated

OTm is by default the mean Tm of all oligonucleotides of the aim length (user specified) in all input sequences, or a specific user-specified optimal Tm

For each 5' position along the input sequence, the oligonucleotide length (extending toward the 3' end) with the best ΔTm score is chosen

Therefore the ΔTm score is the first calculation the OligoWiz server performs
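The length selection step can be sketched as follows: for every 5' start position, try each allowed length and keep the one whose Tm lies closest to the optimal Tm. This is an illustration only. The ΔTm score equation is not reproduced in this transcript, so the sketch ranks lengths by the absolute deviation |Tm - OTm| instead, and `gc_tm` is a simple GC-content stand-in for the nearest-neighbor model. All function and parameter names are hypothetical.

```python
from typing import Callable, Dict, Tuple

def gc_tm(seq: str) -> float:
    """Stand-in Tm estimate from GC content (not the nearest-neighbor model)."""
    gc = sum(seq.upper().count(base) for base in "GC")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)

def best_length_per_position(sequence: str, min_len: int, max_len: int,
                             optimal_tm: float,
                             tm_func: Callable[[str], float] = gc_tm
                             ) -> Dict[int, Tuple[int, float]]:
    """Map each 5' start to (chosen length, |Tm - optimal_tm|)."""
    result: Dict[int, Tuple[int, float]] = {}
    for start in range(len(sequence) - min_len + 1):
        best = None
        for length in range(min_len, max_len + 1):
            if start + length > len(sequence):
                break  # the oligonucleotide would run past the 3' end
            deviation = abs(tm_func(sequence[start:start + length]) - optimal_tm)
            if best is None or deviation < best[1]:
                best = (length, deviation)
        result[start] = best
    return result
```

Because the chosen length fixes which oligonucleotide is scored at each position, this step has to run before any of the other scores can be computed.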

Page 16: Henrik Bjorn Nielsen, Rasmus Wernersson and Steen Knudsen

ΔTm

Minimal Tm

Page 17: Henrik Bjorn Nielsen, Rasmus Wernersson and Steen Knudsen

Position within transcript

The position of the oligonucleotide within the target transcript can be of importance

The reverse transcriptase will fall off the transcript with a certain probability

The further an oligonucleotide lies from the starting point of labeling, the less signal will be generated

Page 18: Henrik Bjorn Nielsen, Rasmus Wernersson and Steen Knudsen

Briefings in Bioinformatics, Vol. 2, No. 4, 329–340, Dec 2001

Page 19: Henrik Bjorn Nielsen, Rasmus Wernersson and Steen Knudsen

Position within transcript

If the labeling commences from the 3' end (poly A tail), the following score is used:

– dp is the probability that the reverse transcriptase will fall off its template at any given base

– Δ3'end is the distance from the oligonucleotide to the 3' end of the input sequence
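The score equation for 3' end labeling is not reproduced in this transcript. Given the two definitions, a natural form is the probability that the reverse transcriptase is still on the template after Δ3'end bases, that is (1 - dp) raised to the power Δ3'end. The sketch below assumes that form; the function and parameter names are hypothetical.

```python
def position_score_3prime(distance_to_3prime: int, dp: float) -> float:
    """Assumed form: probability the enzyme survives the given number of bases."""
    if not 0.0 <= dp <= 1.0:
        raise ValueError("dp must be a probability between 0 and 1")
    if distance_to_3prime < 0:
        raise ValueError("distance_to_3prime must be non-negative")
    return (1.0 - dp) ** distance_to_3prime
```

Under this form, an oligonucleotide at the 3' end scores 1 and the score decays geometrically with distance, which matches the statement that signal drops further away from the starting point.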

Page 20: Henrik Bjorn Nielsen, Rasmus Wernersson and Steen Knudsen

Position within transcript

In cases where the labeling is done with random primers, as would be the case under prokaryote mRNA labeling, the chance of having an oligonucleotide upstream of a given position should be accounted for:

– c is a constant indicating the probability that a random primer will bind at any given position
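The random-priming equation is also missing from this transcript, so the sketch below is one possible model built from c and dp, not the formula on the slide. It assumes that a primer binds independently with probability c at each of `primer_sites` positions from which the reverse transcriptase could reach the oligonucleotide, and that an extension started k bases away survives with probability (1 - dp)^k. The score is the probability that at least one extension reaches the oligonucleotide. All names are hypothetical.

```python
def position_score_random_priming(primer_sites: int, c: float, dp: float) -> float:
    """Illustrative model: P(at least one primed extension reaches the oligo)."""
    if not (0.0 <= c <= 1.0 and 0.0 <= dp <= 1.0):
        raise ValueError("c and dp must be probabilities between 0 and 1")
    if primer_sites < 0:
        raise ValueError("primer_sites must be non-negative")
    p_none = 1.0  # probability that no extension reaches the oligonucleotide
    for k in range(1, primer_sites + 1):
        p_none *= 1.0 - c * (1.0 - dp) ** k
    return 1.0 - p_none
```

In this model the score rises with the number of usable primer sites and levels off once distant sites contribute almost nothing.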

Page 21: Henrik Bjorn Nielsen, Rasmus Wernersson and Steen Knudsen

Low-complexity filtering

To avoid oligonucleotides composed of very common sequence fragments, a low-complexity score was implemented in the probe design

Different sequences are common in different species

– to estimate a low-complexity measure for an oligonucleotide, a list of sequence subfragments and their information content is used

– the list is generated specifically for each species

Page 22: Henrik Bjorn Nielsen, Rasmus Wernersson and Steen Knudsen

Low-complexity filtering

The information content can be calculated by the following equation:

– n(w) is the number of occurrences of a pattern in the transcriptome

– l(w) is the pattern length

– nt is the total number of patterns found of a given length
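The information content equation is not reproduced in this transcript. One form that uses exactly the three quantities defined above is the relative-entropy term f(w) log2(f(w) / p(w)), with observed frequency f(w) = n(w) / nt and a uniform background p(w) = 4^(-l(w)). The sketch below assumes that form. The function name and the default word length of 8 are assumptions, not values taken from the slide.

```python
import math
from collections import Counter
from typing import Dict, Iterable

def word_information(transcripts: Iterable[str],
                     word_length: int = 8) -> Dict[str, float]:
    """Assumed form: I(w) = f(w) * log2(f(w) / 4**-l(w)), with f(w) = n(w) / nt."""
    counts: Counter = Counter()
    for seq in transcripts:
        seq = seq.upper()
        for i in range(len(seq) - word_length + 1):
            counts[seq[i:i + word_length]] += 1    # n(w)
    nt = sum(counts.values())                      # total number of patterns
    background = 4.0 ** -word_length               # uniform expectation per word
    info: Dict[str, float] = {}
    for word, n in counts.items():
        f = n / nt
        info[word] = f * math.log2(f / background)
    return info
```

Words that occur far more often than the uniform expectation, such as poly-A runs, receive the largest values.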

Page 23: Henrik Bjorn Nielsen, Rasmus Wernersson and Steen Knudsen

Low-complexity filtering

OligoWiz uses this list to calculate a low-complexity score for each oligonucleotide:

– L is the length of the oligonucleotide

– wi is the pattern in position i

– norm is a function that normalizes the summed information to a value between 1 and 0
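The score equation is not reproduced in this transcript. The sketch below assumes the form 1 - norm(sum of I(wi)), so that an oligonucleotide built from very common patterns scores close to 0, which agrees with the interpretation given on the next slide. The `norm` used here, division by a caller-supplied `max_info` followed by clamping to the range 0 to 1, is a stand-in; the slide does not say how OligoWiz normalizes. All names are hypothetical.

```python
from typing import Mapping

def low_complexity_score(oligo: str, word_info: Mapping[str, float],
                         word_length: int, max_info: float) -> float:
    """Assumed form: 1 - norm(sum of I(w_i) over all words in the oligo)."""
    if max_info <= 0:
        raise ValueError("max_info must be positive")
    oligo = oligo.upper()
    total = sum(word_info.get(oligo[i:i + word_length], 0.0)
                for i in range(len(oligo) - word_length + 1))
    normalized = min(max(total / max_info, 0.0), 1.0)  # stand-in for norm()
    return 1.0 - normalized
```

Patterns missing from the species-specific list contribute zero information here, which is a further assumption.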

Page 24: Henrik Bjorn Nielsen, Rasmus Wernersson and Steen Knudsen

Low-complexity filtering

Interpreting the low-complexity score:

– 0: an oligonucleotide with very low complexity

– Between 1 and 0.8: the range in which the majority of oligonucleotides score

Page 25: Henrik Bjorn Nielsen, Rasmus Wernersson and Steen Knudsen

GATC-only score

To allow filtering out sequences that contain ambiguity annotation, OligoWiz has a score called 'GATC-only'

– Oligonucleotides containing R, Y, M, K, X, S, W, H, B, V, D, N or anything else will be given a score of 0

– Oligonucleotides containing only G, A, T and C will be assigned a score of 1
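The rule above amounts to a set membership test. A minimal sketch, assuming that lower-case input is accepted and that an empty sequence scores 0 (neither point is stated on the slide); the function name is hypothetical.

```python
def gatc_only_score(oligo: str) -> int:
    """Return 1 if the oligo contains only G, A, T and C, otherwise 0."""
    if not oligo:
        return 0  # assumption: an empty sequence is not a usable oligonucleotide
    return 1 if set(oligo.upper()) <= set("GATC") else 0
```

For example, `gatc_only_score("GATTACA")` returns 1, while `gatc_only_score("GATNACA")` returns 0 because of the ambiguity code N.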

Page 26: Henrik Bjorn Nielsen, Rasmus Wernersson and Steen Knudsen

GATC-only score

Ambiguity codes besides A, C, T, G:

M = AC

R = AG

W = AT

S = CG

Y = CT

K = GT

V = ACG

H = ACT

D = AGT

B = CGT

X = AGCT

Page 27: Henrik Bjorn Nielsen, Rasmus Wernersson and Steen Knudsen

Result

Oligonucleotides were designed for the 6600 genes annotated in the Saccharomyces cerevisiae genome

Oligonucleotide length interval: 45–55 bp

The homology search and complexity score were based on whole-genome databases

Mean Tm of the oligonucleotides was 75.7 °C

Calculations were done in just 20 min

Page 28: Henrik Bjorn Nielsen, Rasmus Wernersson and Steen Knudsen


Page 29: Henrik Bjorn Nielsen, Rasmus Wernersson and Steen Knudsen

Score parameter/info

Page 30: Henrik Bjorn Nielsen, Rasmus Wernersson and Steen Knudsen


Page 31: Henrik Bjorn Nielsen, Rasmus Wernersson and Steen Knudsen


1. Graphs represent scores (y-axis) along the input sequence (x-axis).

2. Total (weighted) score

3. Oligonucleotide selected/predicted

4. Sequence of the oligonucleotide selected

5. Score function manipulation interface

6. Sequence info field

7. Input sequence table

8. Total score function manipulation interface

9. Applies score weights of the selected entry to all the entries

10. Predicted/custom button

11. W-score is the total weighted score for the selected oligonucleotide

12. "Oligos" per entry field

Page 32: Henrik Bjorn Nielsen, Rasmus Wernersson and Steen Knudsen


Page 33: Henrik Bjorn Nielsen, Rasmus Wernersson and Steen Knudsen


Page 34: Henrik Bjorn Nielsen, Rasmus Wernersson and Steen Knudsen

Thank you!