advanced python concepts: modules

17
10/21/2013 BCHB524 - 2013 - Edwards Advanced Python Concepts: Modules BCHB524 2013 Lecture 14

Upload: orly

Post on 16-Jan-2016

65 views

Category:

Documents


2 download

DESCRIPTION

Advanced Python Concepts: Modules. BCHB524 2013 Lecture 14. Modules. We've already used these from the python standard library and from BioPython. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Advanced Python Concepts: Modules

10/21/2013 BCHB524 - 2013 - Edwards

Advanced Python Concepts: Modules

BCHB5242013

Lecture 14

Page 2: Advanced Python Concepts: Modules

10/21/2013 BCHB524 - 2013 - Edwards 2

Modules

We've already used these from the python standard library and from BioPython

import sysfilename = sys.argv[1]file_handle = open(filename)

import Bio.SeqIOseq_records = Bio.SeqIO.parse(file_handle, "swiss")

import gzipfile_handle = gzip.open(filename)

import urllibfile_handle = urllib.urlopen(url)

import mathx = math.sqrt(y)

Page 3: Advanced Python Concepts: Modules

10/21/2013 BCHB524 - 2013 - Edwards 3

Modules

Modules contain previously defined functions, variables, and (soon!) classes.

Store useful code for re-use in many programs

Structurerelated codetogether

import sysimport MyNucStuff

seqfilename = sys.argv[1]

seq   = MyNucStuff.read_seq_from_filename(seqfilename)cseq  = MyNucStuff.complement(seq)rcseq = MyNucStuff.reverseComplement(seq)

print "   Sequence:",seqprint " C sequence:",cseqprint "RC sequence:",rcseq

Page 4: Advanced Python Concepts: Modules

10/21/2013 BCHB524 - 2013 - Edwards 4

Modules

In MyNucStuff.pydef read_seq_from_filename(seq_filename):    seq_file = open(seq_filename)    dna_seq = ''.join(seq_file.read().split())    dna_seq = dna_seq.upper()    seq_file.close()    return dna_seq

compl_dict = {'A':'T', 'T':'A', 'C':'G', 'G':'C'}

def complement(seq):    return ''.join(map(compl_dict.get,seq))

def revseq(seq):    return ''.join(reversed(seq))

def reverseComplement(seq):    return complement(revseq(seq))                   # plus anything else you like...

Page 5: Advanced Python Concepts: Modules

10/21/2013 BCHB524 - 2013 - Edwards 5

Modules

We can import specific functions directly… And reference them without the module name

import sysfrom MyNucStuff import read_seq_from_filenamefrom MyNucStuff import complementfrom MyNucStuff import reverseComplement

seqfilename = sys.argv[1]

seq   = read_seq_from_filename(seqfilename)cseq  = complement(seq)rcseq = reverseComplement(seq)

print "   Sequence:",seqprint " C sequence:",cseqprint "RC sequence:",rcseq

Page 6: Advanced Python Concepts: Modules

10/21/2013 BCHB524 - 2013 - Edwards 6

Modules

We can even import all the functions from a module…

import sysfrom MyNucStuff import *

seqfilename = sys.argv[1]

seq   = read_seq_from_filename(seqfilename)cseq  = complement(seq)rcseq = reverseComplement(seq)

print "   Sequence:",seqprint " C sequence:",cseqprint "RC sequence:",rcseq

Page 7: Advanced Python Concepts: Modules

10/21/2013 BCHB524 - 2013 - Edwards 7

Packages

Packages are collections of modules, grouped together. All equivalent:

Implemented using files and folders/directories.

import Bio.SeqIOBio.SeqIO.parse(handle, "swiss")

from Bio import SeqIOSeqIO.parse(handle, "swiss")

from Bio.SeqIO import parseparse(handle, "swiss")

Page 8: Advanced Python Concepts: Modules

What can go wrong?

Sometimes our own .py files can "collide" with Python's packages.

Test what happens with an "empty" module in files: xml.py (and then try to import ElementTree) Bio.py (and then try to import SeqIO) etc…

10/21/2013 BCHB524 - 2013 - Edwards 8

Page 9: Advanced Python Concepts: Modules

10/21/2013 BCHB524 - 2013 - Edwards 9

A module for codon tables Module is called: codon_table Functions:

read_codons_from_filename(filename) returns dictionary of codons –

value is pair: (amino-acid symbol, initiation codon true/false) amino_acid(codon_table,codon)

returns amino-acid symbol for codon is_init(codon_table,codon)

returns true if codon is an initiation codon, false, otherwise get_ambig_aa (codon_table,codon)

Returns the single amino-acid consistent with ambiguous codon (containing N's), or X.

translate(codon_table,seq,frame) returns amino-acid sequence for DNA sequence seq

Page 10: Advanced Python Concepts: Modules

10/21/2013 BCHB524 - 2013 - Edwards 10

A module for codon tables

from MyNucStuff import *from codon_table import *import sys

if len(sys.argv) < 3:    print "Require codon table and DNA sequence on command-line."    sys.exit(1)

table = read_codons_from_filename(sys.argv[1])seq = read_seq_from_filename(sys.argv[2])

if is_init(table,seq[:3]):    print "Initial codon is an initiation codon"

for frame in (1,2,3):  print "Frame",frame,"(forward):",translate(table,seq,frame)

Page 11: Advanced Python Concepts: Modules

A module for codons

In codon_table.py:

10/21/2013 BCHB524 - 2013 - Edwards 11

def read_codons_from_filename(codonfile):    # magic    return codon_table

def amino_acid(table,codon):    # magic    return aa

def is_init(table,codon):    # magic    return init

def get_ambig_aa(table,codon):    # magic    return aa

def translate(table,seq,frame):    # magic    return aaseq

Page 12: Advanced Python Concepts: Modules

A module for codons

In codon_table.py:

10/21/2013 BCHB524 - 2013 - Edwards 12

def read_codons_from_filename(codonfile):    f = open(codonfile)    data = {}    for l in f:        sl = l.split()        key = sl[0]        value = sl[2]        data[key] = value        f.close()

    b1 = data['Base1']    b2 = data['Base2']    b3 = data['Base3']    aa = data['AAs']    st = data['Starts']

    codon_table = {}    n = len(aa)    for i in range(n):        codon = b1[i] + b2[i] + b3[i]        isInit = (st[i] == 'M')        codon_table[codon] = (aa[i],isInit)    return codon_table

Page 13: Advanced Python Concepts: Modules

10/21/2013 BCHB524 - 2013 - Edwards 13

Lab exercises Rework the lecture, and your solutions (or mine) from the

homework exercises #1 through #3, to make a MyDNAStuff module. Put as many useful nucleotide functions as possible into

the module...

Rework the lecture, and your solutions (or mine) from homework exercises #4 and #5, to make the codon_table module with functions specified in this lecture.

Demonstrate the use of these modules to translate an amino-acid sequence in all six-frames with just a few lines of code. The final result should look similar to Slide 10. Your program should handle DNA sequence with N’s in it.

Page 14: Advanced Python Concepts: Modules

Class Project: Expectations

40% of your grade!

Project Report Long version of your homework write-up

Project Presentation Demo your program Describe your project solution

10/21/2013 BCHB524 - 2013 - Edwards 14

Page 15: Advanced Python Concepts: Modules

10/21/2013 BCHB524 - 2013 - Edwards 15

Class Project: Blast Database

1. Write a program that computes all pairwise blast alignments for two species' proteomes and stores the alignments in a relational database.

2. Write a program that retrieves the blast alignment for two proteins (specified by their accessions) from the relational database.

3. Write a program that finds pairs of orthologous proteins that are mutually best hits in the species' proteomes.

Page 16: Advanced Python Concepts: Modules

10/21/2013 BCHB524 - 2013 - Edwards 16

Class Project: MS/MS Viewer

Write a program to display peptide fragmentation spectra from an mzXML file. The program will take an mzXML file, a scan

number, and a peptide sequence as input. The peptide's b-ion and y-ion m/z values should

be computed, and peaks matching these m/z values annotated with appropriate labels.

The output figure/plot should aid the user in determining whether or not the peptide is a good match to the spectrum.

Page 17: Advanced Python Concepts: Modules

10/21/2013 BCHB524 - 2013 - Edwards 17

Class Project: Protein Digest

Write a simple web-server application using TurboGears to carry out an in silico enzymatic digest of a user-provided protein sequence. Users should be able to specify min and max

length, min and max molecular weight, # of missed cleavages, and specific enzyme.

Output should be a table of peptides, with their length, molecular weight, # of missed cleavages, and amino-acids to left and right of each peptide in the protein sequence.