

Page 1:

96-Summer Bioinformatics Programming Practicum (II)

Bioinformatics with Perl

8/13~8/22 蘇中才
8/24~8/29 張天豪
8/31 曾宇鳯

Page 2:

Before class

Course web page:
http://gene.csie.ntu.edu.tw/~sbb/summer-course/

Setup steps:
Download PuTTY / PieTTY
Connect to 140.112.28.186
wget http://gene.csie.ntu.edu.tw/~sbb/summer-course/doc/course1.tgz
tar zxvf course1.tgz

Page 3:

No.  Name    Account
1    許郁彬  course1
2    杜羿樞  course2
3    黃裕雄  course3
4    王建智  course4
5    陳士杰  course5
6    莊智傑  course6
7    朱柏威  course7
8    洪文峯  course8
9    吳耿豪  course9
10   張雯琪  course10
11   王悅    course11
12   張嘉芸  course12
13   林義峰  course13
14   游棨元  course14
15   許育堂  course15
16   陳建瑋  course16
17   黃國鑫  course17
18   翁小涵  course18
19   郭建鴻  course19
20   曾意儒  course20

Page 4:

Appendix

Scalar, Array, Hash

Page 5:

Variable reset (1/2)

$scalar = undef;
$scalar = "";
$scalar = 0;

@array = ();
%hash = ();

Page 6:

Variable reset (2/2)

@array = undef;         # assigns a one-element list containing undef

print scalar(@array);   # prints 1, so the array is not empty
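The three reset styles can be compared side by side. This is a minimal sketch; the helper name `count_after` and the style labels are hypothetical and not part of the course material.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: apply one reset style to a copy of the list
# and return how many elements remain afterwards.
sub count_after {
    my ($style, @array) = @_;
    if    ($style eq 'empty_list')   { @array = (); }      # assign the empty list
    elsif ($style eq 'undef_func')   { undef @array; }     # undef as a function
    elsif ($style eq 'assign_undef') { @array = undef; }   # one element: undef
    return scalar(@array);
}

for my $style (qw(empty_list undef_func assign_undef)) {
    print "$style: ", count_after($style, 1, 2, 3), "\n";
}
# prints:
# empty_list: 0
# undef_func: 0
# assign_undef: 1
```

Only `@array = ();` and `undef @array;` leave the array empty.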

Page 7:

Array

my @number = ("one", "two", "three");
my $number = ("one", "two", "three");   # comma operator in scalar context: $number is "three"

print "@number\n";            # one two three
print scalar(@number)."\n";   # 3 (number of elements)
print $#number."\n";          # 2 (index of the last element)
print @number."\n";           # 3 (array evaluated in scalar context)
print $number."\n";           # three

Page 8:

Array

@array = qw"5 4 9 8 1 3 6 2 7 10";

print "@array\n";    # 5 4 9 8 1 3 6 2 7 10 (elements joined by a space)
print @array."\n";   # 10 (number of elements, scalar context)
print @array;        # 54981362710 (elements printed with no separator)

Page 9:

Array – sort by number

#! /usr/bin/perl

@test=(1, 5, 4, 22, 9, 101);

@mmm=sort {$a<=>$b} @test;

# Parenthesize join's arguments so the newlines are not joined in as an extra element
print join(',', @mmm), "\n\n";   # 1,4,5,9,22,101
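The numeric comparison block matters because the default `sort` compares elements as strings. A small sketch contrasting the two, plus a descending variant; the helper names are illustrative assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers wrapping the three sort styles.
sub sort_as_strings { return sort @_; }                # default: string comparison
sub sort_ascending  { return sort { $a <=> $b } @_; }  # numeric, smallest first
sub sort_descending { return sort { $b <=> $a } @_; }  # numeric, largest first

my @test = (1, 5, 4, 22, 9, 101);

print join(',', sort_as_strings(@test)), "\n";   # 1,101,22,4,5,9
print join(',', sort_ascending(@test)),  "\n";   # 1,4,5,9,22,101
print join(',', sort_descending(@test)), "\n";   # 101,22,9,5,4,1
```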

Page 10:

Hash – show all elements

#! /usr/bin/perl -w

# Quote the values: => only quotes the word on its left, and -w warns about bare words
%nucleotide_bases = ( A => 'Adenine', T => 'Thymine', G => 'Guanine', C => 'Cytosine' );

while (($key, $value)=each %nucleotide_bases) { print "$key ====> $value\n"; }

foreach $key (keys %nucleotide_bases) { print "$key ====> $nucleotide_bases{$key}\n"; }
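Both `each` and `keys` return entries in no particular order. When a stable listing is wanted, the keys can be sorted first. A minimal sketch; the helper name `sorted_lines` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: format the entries of a hash in sorted key order.
sub sorted_lines {
    my (%hash) = @_;
    return map { "$_ ====> $hash{$_}" } sort keys %hash;
}

my %nucleotide_bases = (
    A => 'Adenine', T => 'Thymine', G => 'Guanine', C => 'Cytosine',
);

print "$_\n" for sorted_lines(%nucleotide_bases);
# prints:
# A ====> Adenine
# C ====> Cytosine
# G ====> Guanine
# T ====> Thymine
```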

Page 11:

Hash – reverse with identical values

%nucleotide_bases = ( A => 'Adenine', T => 'Thymine', G => 'Adenine', C => 'Cytosine' );

while (($key, $value)=each %nucleotide_bases) { print "$key ====> $value\n"; }

# reverse turns values into keys; A and G share the value Adenine, so only one of them survives
%reverse = reverse %nucleotide_bases;
while (($key, $value)=each %reverse) { print "$key ====> $value\n"; }
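If every original key has to survive the inversion, each value can map to a list of keys instead of a single key. This is one possible approach, not something the slide prescribes; the helper name `invert_keep_all` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: invert a hash, collecting all keys that share a value.
sub invert_keep_all {
    my (%hash) = @_;
    my %keys_for;
    for my $key (sort keys %hash) {
        push @{ $keys_for{ $hash{$key} } }, $key;   # value => [key, key, ...]
    }
    return %keys_for;
}

my %nucleotide_bases = (
    A => 'Adenine', T => 'Thymine', G => 'Adenine', C => 'Cytosine',
);

my %keys_for = invert_keep_all(%nucleotide_bases);
for my $value (sort keys %keys_for) {
    print "$value ====> @{ $keys_for{$value} }\n";
}
# prints:
# Adenine ====> A G
# Cytosine ====> C
# Thymine ====> T
```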

Page 12:

Hash – the number of elements

How to know the number of elements in a hash?

Ex:
my %hash = ('a'=>1,'b'=>2);
print scalar(keys(%hash))."\n";   # 2

Page 13:

Comment

# This is a comment

=This is a comment, too
=This is a comment, three
=cut

print "Really ?\n";

Page 14:

Appendix

STDIN, <>, our/my

Page 15:

$_ - extract data from <STDIN>

while (<STDIN>) { print; }   # each line read is assigned to $_ and then printed

if (<STDIN>) { print; }      # reads one line but does not assign it to $_; print outputs whatever $_ already holds

Page 16:

Processing Data Files (like UNIX command: cat)

$line = <>;

#! /usr/bin/perl -w

while ( $line = <> ) { print $line; }

<>;

#! /usr/bin/perl -w

while (<>) { print; }

Page 17:

Others … (equivalent forms)

while (defined($_ = <>)) { print; }
while ($_ = <>) { print; }
while (<>) { print; }
for (;<>;) { print; }
print while defined($_ = <>);
print while ($_ = <>);
print while <>;

Page 18:

our/my

my $var;
$var = 1;
{
    my $var;             # a new lexical variable, visible only inside this block
    $var = 2;
    print $var,"\n";     # 2
}
print $var, "\n";        # 1

our $var;
$var = 1;
{
    our $var;            # the same package variable as outside the block
    $var = 2;
    print $var,"\n";     # 2
}
print $var, "\n";        # 2

Page 19:

Appendix

Regular expression

Page 20:

Reserved word

# "log" is also the name of a built-in function, so it is a risky name for a filehandle
open log, ">test.txt" or die "…";
print log "test\n";
close log;
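One way to avoid clashing with built-in names such as `log` is a lexical filehandle together with the three-argument form of `open`. A minimal sketch; the helper name `write_log` is an assumption made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: write one line to a file through a lexical filehandle.
sub write_log {
    my ($path, $message) = @_;
    open(my $fh, '>', $path) or die "Cannot open $path: $!";
    print {$fh} "$message\n";
    close($fh) or die "Cannot close $path: $!";
    return;
}

write_log('test.txt', 'test');
```

The filehandle lives in `$fh`, so no bareword is involved and the handle is closed automatically if it goes out of scope.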

Page 21:

Magic diamond - <>

print "$_" while (<>);       # prints every line of the files named on the command line (or of STDIN)

print "$_" while (<*.pl>);   # <*.pl> is a file glob: prints each matching file name

Page 22:

Get the list of files in the current directory

my @files = <*.pl>;

# or, equivalently, with an explicit call to glob
my @files = glob("*.pl");

Page 23:

Greedy matching

my $string = "course1:x:509:510::/home/course1:/bin/bash";

# .* is greedy and runs to the last colon: [course1:x:509:510::/home/course1]
if ($string =~ /(.*):/) { print "matched string = [$1]\n"; }

# How to match the first column ?

Page 24:

Greedy matching

my $string = "course1:x:509:510::/home/course1:/bin/bash";

# \S also matches ":", so this is still greedy: [course1:x:509:510::/home/course1]
if ($string =~ /^([\S]*):/) { print "matched string = [$1]\n"; }

# *? is non-greedy and stops at the first colon: [course1]
if ($string =~ /^([\S]*?):/) { print "matched string = [$1]\n"; }

# [^:] cannot cross a colon: [course1]
if ($string =~ /([^:]*):/) { print "matched string = [$1]\n"; }
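For colon-separated records like this one, `split` is an alternative to a regular-expression capture. A minimal sketch; the helper name `first_field` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the first colon-separated field of a record.
sub first_field {
    my ($record) = @_;
    my ($first) = split /:/, $record, 2;   # limit 2: stop splitting after the first colon
    return $first;
}

my $string = "course1:x:509:510::/home/course1:/bin/bash";
print "first column = [", first_field($string), "]\n";
# prints: first column = [course1]
```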

Page 25:

Substitution – remove all x

$_ = "China xxxxxx Taiwan";
s/x*//;   # How to rewrite ?
print;

Output (unchanged, because x* first matches the empty string at the start of the string):
China xxxxxx Taiwan
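One possible answer to the question on this slide: require at least one x with `x+` and add the `/g` modifier so every run is removed. A minimal sketch; the helper name `remove_all_x` is an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: delete every "x" from a copy of the text.
sub remove_all_x {
    my ($text) = @_;
    $text =~ s/x+//g;   # x+ needs at least one x; /g repeats the substitution
    return $text;
}

print remove_all_x("China xxxxxx Taiwan"), "\n";
# prints "China  Taiwan" (the two spaces around the removed run remain)
```

`tr/x//d` would delete the same characters without using a regular expression.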

Page 26:

Quoted syntax

Symbol    General    Description          Interpolated
' '       q/ /       String               No
" "       qq/ /      String               Yes
` `       qx/ /      Execution            Yes
( )       qw/ /      List of words        No
/ /       m/ /       Pattern matching     Yes
s/ / /    s/ / /     Substitution         Yes
y/ / /    tr/ / /    Transliteration      No
" "       qr/ /      Regular expression   Yes

Page 27:

Appendix

Useful techniques

Page 28:

Shell command – file/directory

mkdir("doc", 0744);   # permissions are octal (leading 0), not hexadecimal (0x)
chdir("doc");
rmdir("doc");
unlink("log.txt");
chmod(0700, "log1.txt", "log2.txt", "log3.txt");
rename("old_name", "new_name");
chown($uid, $gid, "log1.txt", "log2.txt", "log3.txt");   # $uid and $gid are numeric user and group IDs
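File permissions are conventionally written in octal, and each of these calls returns false on failure, so the result is worth checking. The sketch below assumes a Unix-like system; the helper name `make_dir_with_mode` and the directory name `doc_example` are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: create a directory, force its mode, and
# return the permission bits that were actually set.
sub make_dir_with_mode {
    my ($dir, $mode) = @_;
    mkdir($dir, $mode) or die "Cannot create $dir: $!";
    # mkdir's mode is filtered by the umask, so set it explicitly as well
    chmod($mode, $dir) == 1 or die "Cannot chmod $dir: $!";
    return (stat $dir)[2] & 07777;
}

# The same digits mean different numbers in hexadecimal and octal.
printf "0x744 = %d, 0744 = %d\n", 0x744, 0744;   # 0x744 = 1860, 0744 = 484

my $dir  = 'doc_example';
my $mode = make_dir_with_mode($dir, 0744);
printf "mode of %s: %04o\n", $dir, $mode;
rmdir($dir) or die "Cannot remove $dir: $!";
```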

Page 29:

Perl

Usage: perl [switches] [--] [programfile] [arguments]
  -c              check syntax only (runs BEGIN and CHECK blocks)
  -d[:debugger]   run program under debugger
  -e program      one line of program (several -e's allowed, omit programfile)
  -i[extension]   edit <> files in place (makes backup if extension supplied)
  -n              assume "while (<>) { ... }" loop around program
  -p              assume loop like -n but print line also, like sed
  -u              dump core after parsing program
  -v              print version, subversion
  -w              enable many useful warnings (RECOMMENDED)
  -W              enable all warnings
  -X              disable all warnings

Page 30:

Removal of ^M

perl -pi.bak -e 's/\r//g;' index.html
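The same substitution can be used inside a script. A minimal sketch with a hypothetical helper, `strip_cr`, applied to a single DOS-style line rather than a file:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: remove every carriage return (shown as ^M) from a string.
sub strip_cr {
    my ($text) = @_;
    $text =~ s/\r//g;
    return $text;
}

my $dos_line  = "<html>\r\n";
my $unix_line = strip_cr($dos_line);
printf "before: %d characters, after: %d characters\n",
    length($dos_line), length($unix_line);
# prints: before: 8 characters, after: 7 characters
```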

Page 31:

File Copy

#! /usr/bin/perl

use File::Copy;

# copy returns false on failure and sets $!
copy("file1", "file2") or die "Copy failed: $!";

Page 32:

Reserved word for debug

__FILE__ __LINE__

Ex:
print "FILE:".__FILE__." LINE:".__LINE__."\n";

Page 33:

Debug

perl -d "program name"

Page 34:

Debug

$ perlcc -d test.pl

Page 35:

Special variable

$_            the default variable (implicit target of input and pattern matching)
$!            error message
$$            current process ID
$?            the status returned when the previous child process ended
$"            the separator used when a list is interpolated into a string
$/            the input record separator (newline by default)
$`, $&, $'    string matching (text before the match, the match, text after the match)
$+            the last backreference
@-            @LAST_MATCH_START
@+            @LAST_MATCH_END
@_            arguments of a subroutine
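A few of these variables can be observed in a short script. This is only an illustrative sketch; the helper name `match_parts` and the sample string are assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: match the first run of digits and return
# the pre-match, match, post-match, and the match offsets.
sub match_parts {
    my ($text) = @_;
    return unless $text =~ /\d+/;
    return ($`, $&, $', $-[0], $+[0]);
}

my @bases = qw(A T G C);
{
    local $" = '-';      # list separator used when an array is interpolated
    print "@bases\n";    # prints A-T-G-C
}

my ($before, $match, $after, $start, $end) = match_parts("course1:x:509");
printf "before=[%s] match=[%s] after=[%s]\n", $before, $match, $after;
# prints: before=[course] match=[1] after=[:x:509]
printf "start=%d end=%d\n", $start, $end;
# prints: start=6 end=7

print "process ID: $$\n";
```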

Page 36:

Bytecode generator

$ perlcc -B -o test test3.pl

Page 37:

CPAN

perl -MCPAN -e "install GD"

Page 38:

BioPerl

Page 39:

PSI-BLAST

Position Specific Iterative BLAST constructs a multiple sequence alignment, then creates a position-specific scoring matrix (PSSM).

[Figure: PSI-BLAST workflow. Labels: Query sequence, BLAST, Sequence database, Homologous proteins, Multiple sequence alignment, PSSM, New homologous proteins]

Page 40:

PSSM (1/4)

Query sequence:
G H E G V G K V V K L G A G A

Homologous proteins:
G H E K K G Y F E D R G P S A
G H E G Y G G R S R G G G Y S
G H E F E G P K G C G A L Y I
G H E L R G T T F M P A L E C

Residue counts per column:
    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
A   0  0  0  0  0  0  0  0  0  0  0  2  1  0  2
C   0  0  0  0  0  0  0  0  0  1  0  0  0  0  1
D   0  0  0  0  0  0  0  0  0  1  0  0  0  0  0
E   0  0  5  0  1  0  0  0  1  0  0  0  0  1  0
F   0  0  0  1  0  0  0  1  1  0  0  0  0  0  0
G   5  0  0  2  0  5  1  0  1  0  2  3  1  1  0
H   0  5  0  0  0  0  0  0  0  0  0  0  0  0  0
I   0  0  0  0  0  0  0  0  0  0  0  0  0  0  1
K   0  0  0  1  1  0  1  1  0  1  0  0  0  0  0
L   0  0  0  1  0  0  0  0  0  0  1  0  2  0  0
M   0  0  0  0  0  0  0  0  0  1  0  0  0  0  0
N   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
P   0  0  0  0  0  0  1  0  0  0  1  0  1  0  0
Q   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
R   0  0  0  0  1  0  0  1  0  1  1  0  0  0  0
S   0  0  0  0  0  0  0  0  1  0  0  0  0  1  1
T   0  0  0  0  0  0  1  1  0  0  0  0  0  0  0
V   0  0  0  0  1  0  0  1  1  0  0  0  0  0  0
W   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
Y   0  0  0  0  1  0  1  0  0  0  0  0  0  2  0

Frequency

Column 1: f_{A,1} = 0/5, f_{C,1} = 0/5, …, f_{G,1} = 5/5, …
Column 2: f_{A,2} = 0/5, f_{C,2} = 0/5, …, f_{H,2} = 5/5, …
…
Column 15: f_{A,15} = 2/5, f_{C,15} = 1/5, …, f_{S,15} = 1/5, …
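The count table can be reproduced from the five aligned sequences (the query plus the four homologous proteins). A minimal sketch; the helper name `column_counts` and the data layout (one hash of residue counts per column) are assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count how often each residue occurs in each column.
# Returns a list with one hash reference per alignment column.
sub column_counts {
    my @sequences = @_;
    my @counts;
    for my $sequence (@sequences) {
        my @residues = split //, $sequence;
        for my $column (0 .. $#residues) {
            $counts[$column]{ $residues[$column] }++;
        }
    }
    return @counts;
}

# The query sequence followed by the four homologous proteins from the slide.
my @alignment = qw(
    GHEGVGKVVKLGAGA
    GHEKKGYFEDRGPSA
    GHEGYGGRSRGGGYS
    GHEFEGPKGCGALYI
    GHELRGTTFMPALEC
);

my @counts = column_counts(@alignment);
printf "column 1:  G=%d\n", $counts[0]{G} || 0;
printf "column 15: A=%d C=%d S=%d\n", map { $counts[14]{$_} || 0 } qw(A C S);
# prints:
# column 1:  G=5
# column 15: A=2 C=1 S=1
```

Dividing each count by the number of sequences (5) gives the frequencies listed above.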

Page 41:

PSSM (2/4)

The original data:

Column 1: f_{A,1} = 0/5, f_{C,1} = 0/5, …, f_{G,1} = 5/5, …
Column 2: f_{A,2} = 0/5, f_{C,2} = 0/5, …, f_{H,2} = 5/5, …
…
Column 15: f_{A,15} = 2/5, f_{C,15} = 1/5, …, f_{S,15} = 1/5, …

Set a pseudo-count of 1 (add 1 to each of the 20 residue counts, so the denominator becomes 5+20):

Column 1: f'_{A,1} = (0+1)/(5+20), f'_{C,1} = (0+1)/(5+20), …, f'_{G,1} = (5+1)/(5+20), …
Column 2: f'_{A,2} = (0+1)/(5+20), f'_{C,2} = (0+1)/(5+20), …, f'_{H,2} = (5+1)/(5+20), …
…
Column 15: f'_{A,15} = (2+1)/(5+20), f'_{C,15} = (1+1)/(5+20), …, f'_{S,15} = (1+1)/(5+20), …
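The pseudo-count adjustment can be written as a small function. The sketch below uses the counts from the PSSM (1/4) table (for example, G occurs 5 times in column 1); the helper name `smoothed_frequency` and its parameter order are assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: relative frequency after adding a pseudo-count
# to each of the residues in the alphabet.
sub smoothed_frequency {
    my ($count, $total, $pseudo, $alphabet_size) = @_;
    return ($count + $pseudo) / ($total + $pseudo * $alphabet_size);
}

# 5 aligned sequences, pseudo-count 1, 20 amino acids
printf "f'(G,1)  = %.2f\n", smoothed_frequency(5, 5, 1, 20);   # (5+1)/(5+20) = 0.24
printf "f'(A,1)  = %.2f\n", smoothed_frequency(0, 5, 1, 20);   # (0+1)/(5+20) = 0.04
printf "f'(A,15) = %.2f\n", smoothed_frequency(2, 5, 1, 20);   # (2+1)/(5+20) = 0.12
```

With the pseudo-count, a residue that never appears in a column still gets a non-zero frequency, so the logarithm taken in the next step is always defined.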

Page 42:

PSSM (3/4)

The score is derived from the ratio of the observed to the expected frequencies. More precisely, the logarithm of this ratio is taken and referred to as the log-likelihood ratio:

\[ \mathrm{Score}_{i,j} = \log\left(\frac{f'_{i,j}}{q_i}\right) \]

where Score_{i,j} is the score for residue i at position j, f'_{i,j} is the relative frequency of residue i at position j, and q_i is the expected relative frequency of residue i in a random sequence.
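The formula can be evaluated for each possible count in a column. The slide does not state the base of the logarithm or the values of q_i, so the sketch below assumes a base-2 logarithm and a uniform background q_i = 1/20; both choices, and the helper name `log_odds_score`, are assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: log-likelihood ratio, using a base-2 logarithm (assumed).
sub log_odds_score {
    my ($frequency, $background) = @_;
    return log($frequency / $background) / log(2);
}

my $background = 1 / 20;   # assumed uniform background frequency

for my $count (0, 1, 2, 3, 5) {
    my $frequency = ($count + 1) / (5 + 20);   # pseudo-count of 1, 5 sequences
    printf "count %d: score %.2f\n", $count, log_odds_score($frequency, $background);
}
# prints:
# count 0: score -0.32
# count 1: score 0.68
# count 2: score 1.26
# count 3: score 1.68
# count 5: score 2.26
```

Under these assumptions the scores for counts 1, 2, 3 and 5 round to the 0.7, 1.3, 1.7 and 2.3 shown in the PSSM (4/4) table, while a count of 0 gives about -0.32.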

Page 43:

PSSM (4/4)

      1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
A  -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2  1.3  0.7 -0.2  1.3
C  -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2  0.7 -0.2 -0.2 -0.2 -0.2  0.7
D  -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2  0.7 -0.2 -0.2 -0.2 -0.2 -0.2
E  -0.2 -0.2  2.3 -0.2  0.7 -0.2 -0.2 -0.2  0.7 -0.2 -0.2 -0.2 -0.2  0.7 -0.2
F  -0.2 -0.2 -0.2  0.7 -0.2 -0.2 -0.2  0.7  0.7 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2
G   2.3 -0.2 -0.2  1.3 -0.2  2.3  0.7 -0.2  0.7 -0.2  1.3  1.7  0.7  0.7 -0.2
H  -0.2  2.3 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2
I  -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2  0.7
K  -0.2 -0.2 -0.2  0.7  0.7 -0.2  0.7  0.7 -0.2  0.7 -0.2 -0.2 -0.2 -0.2 -0.2
L  -0.2 -0.2 -0.2  0.7 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2  0.7 -0.2  1.3 -0.2 -0.2
M  -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2  0.7 -0.2 -0.2 -0.2 -0.2 -0.2
N  -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2
P  -0.2 -0.2 -0.2 -0.2 -0.2 -0.2  0.7 -0.2 -0.2 -0.2  0.7 -0.2  0.7 -0.2 -0.2
Q  -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2
R  -0.2 -0.2 -0.2 -0.2  0.7 -0.2 -0.2  0.7 -0.2  0.7  0.7 -0.2 -0.2 -0.2 -0.2
S  -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2  0.7 -0.2 -0.2 -0.2 -0.2  0.7  0.7
T  -0.2 -0.2 -0.2 -0.2 -0.2 -0.2  0.7  0.7 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2
V  -0.2 -0.2 -0.2 -0.2  0.7 -0.2 -0.2  0.7  0.7 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2
W  -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2
Y  -0.2 -0.2 -0.2 -0.2  0.7 -0.2  0.7 -0.2 -0.2 -0.2 -0.2 -0.2 -0.2  1.3 -0.2