2015 bioinformatics bio_python_partii

Post on 11-Jan-2017


FBW3-11-2015

Wim Van Criekinge

Bioinformatics.be

GitHub: Hosted GIT

• Largest open source git hosting site
• Public and private options
• User-centric rather than project-centric
• http://github.ugent.be (use your UGent login and password)
– Accept invitation from Bioinformatics-I-2015

URI:
– https://github.ugent.be/Bioinformatics-I-2015/Python.git

Control Structures

if condition:
    statements
[elif condition:
    statements] ...
else:
    statements

while condition:
    statements

for var in sequence:
    statements

break
continue
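As a small illustration of the control structures listed above, the sketch below combines them in two helper functions. The function names and the "*" end marker are made up for this example and are not part of the course material.

```python
def gc_count(sequence):
    """Count G and C bases, skipping N and stopping at a '*' marker."""
    count = 0
    for base in sequence:
        if base == "N":
            continue      # skip unknown bases and move on to the next one
        elif base == "*":
            break         # hypothetical end marker: leave the loop early
        elif base in "GC":
            count += 1
        else:
            pass          # A or T: nothing to count
    return count


def countdown(n):
    """Collect n, n-1, ..., 1 using a while loop."""
    values = []
    while n > 0:
        values.append(n)
        n -= 1
    return values


print(gc_count("GATTNCA*GGG"))
print(countdown(3))
```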

Lists

• Flexible arrays, not Lisp-like linked lists

• a = [99, "bottles of beer", ["on", "the", "wall"]]

• Same operators as for strings
    • a+b, a*3, a[0], a[-1], a[1:], len(a)

• Item and slice assignment
    • a[0] = 98
    • a[1:2] = ["bottles", "of", "beer"]
      -> [98, "bottles", "of", "beer", ["on", "the", "wall"]]
    • del a[-1] # -> [98, "bottles", "of", "beer"]
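The list operations above can be tried in sequence. This is a minimal sketch; the intermediate variable names (joined, after_slice, and so on) are only there to make each step visible.

```python
a = [99, "bottles of beer", ["on", "the", "wall"]]

# Same operators as for strings
joined = a + [1, 2]          # concatenation gives a new list of length 5
repeated = ["x"] * 3         # repetition
first, last = a[0], a[-1]    # indexing from the front and from the back
tail = a[1:]                 # slicing
size = len(a)

# Item and slice assignment
a[0] = 98
a[1:2] = ["bottles", "of", "beer"]   # one element replaced by three
after_slice = list(a)                # copy, to show the state at this point

del a[-1]                            # remove the nested list at the end
print(after_slice)
print(a)
```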

Dictionaries

• Hash tables, "associative arrays"
    • d = {"duck": "eend", "water": "water"}

• Lookup:
    • d["duck"] -> "eend"
    • d["back"] # raises KeyError exception

• Delete, insert, overwrite:
    • del d["water"] # {"duck": "eend"}
    • d["back"] = "rug" # {"duck": "eend", "back": "rug"}
    • d["duck"] = "duik" # {"duck": "duik", "back": "rug"}
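The dictionary operations can be run step by step. In this sketch the comments give the dictionary contents after each statement, and the try/except block shows one way to handle the KeyError raised by a missing key.

```python
d = {"duck": "eend", "water": "water"}

# Lookup
translation = d["duck"]          # "eend"
try:
    d["back"]
    missing = False
except KeyError:
    missing = True               # "back" is not a key yet

# Delete, insert, overwrite
del d["water"]                   # {"duck": "eend"}
d["back"] = "rug"                # {"duck": "eend", "back": "rug"}
d["duck"] = "duik"               # {"duck": "duik", "back": "rug"}

print(translation, missing, d)
```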

Regex.py

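import re

text = 'abbaaabbbbaaaaa'
pattern = 'ab'

# Report every non-overlapping occurrence of pattern in text
for match in re.finditer(pattern, text):
    s = match.start()
    e = match.end()
    print('Found "%s" at %d:%d' % (text[s:e], s, e))

# line is assumed to hold one line of input text;
# the group captures a leading capital letter followed by a space
m = re.search("^([A-Z]) ", line)
if m:
    from_letter = m.groups()[0]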
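A self-contained variant of the two regex snippets is sketched below. The value assigned to line is a made-up example, since the slide does not show where line comes from.

```python
import re

text = "abbaaabbbbaaaaa"
pattern = "ab"

# Collect (start, end) positions of every non-overlapping match
positions = [(m.start(), m.end()) for m in re.finditer(pattern, text)]
for s, e in positions:
    print('Found "%s" at %d:%d' % (text[s:e], s, e))

# Hypothetical input line: a capital letter, a space, then more text
line = "A adenine"
m = re.search("^([A-Z]) ", line)
from_letter = m.groups()[0] if m else None
print(from_letter)
```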

Install Biopython

pip is the preferred installer program. Starting with Python 3.4, it is included by default with the Python binary installers.

pip3.5 install Biopython

#pip3.5 install yahoo_finance

from yahoo_finance import Share
yahoo = Share('AAPL')
print(yahoo.get_open())

BioPython

• Make a histogram of the MW (in kDa) of all proteins in Swiss-Prot

• Find the most basic and most acidic protein in Swiss-Prot
• Biological relevance of the results?
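For the histogram exercise, the binning step could look roughly like the sketch below. It assumes the molecular weights (in Da) have already been computed for each record; the sample weights and the function name mw_histogram are invented for illustration. In Biopython, the per-protein values could come from ProteinAnalysis in Bio.SeqUtils.ProtParam (its molecular_weight() and isoelectric_point() methods), which is left out here to keep the sketch free of dependencies.

```python
from collections import Counter


def mw_histogram(weights_da, bin_kda=10):
    """Count proteins per molecular-weight bin.

    weights_da: iterable of molecular weights in Dalton
    bin_kda:    bin width in kDa
    Returns a dict mapping the lower bin edge (kDa) to the number of proteins.
    """
    bins = Counter()
    for weight in weights_da:
        kda = weight / 1000.0
        lower_edge = int(kda // bin_kda) * bin_kda
        bins[lower_edge] += 1
    return dict(sorted(bins.items()))


# Made-up weights in Da, not taken from Swiss-Prot
sample_weights = [5800.0, 14300.0, 16950.0, 66400.0]
histogram = mw_histogram(sample_weights)

# Simple text rendering of the histogram
for lower_edge, count in histogram.items():
    print("%3d-%3d kDa %s" % (lower_edge, lower_edge + 10, "#" * count))
```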

From AAIndex

H ZIMJ680104
D Isoelectric point (Zimmerman et al., 1968)
R LIT:2004109b PMID:5700434
A Zimmerman, J.M., Eliezer, N. and Simha, R.
T The characterization of amino acid sequences in proteins by statistical methods
J J. Theor. Biol. 21, 170-201 (1968)
C KLEP840101 0.941 FAUJ880111 0.813 FINA910103 0.805
I    A/L     R/K     N/M     D/F     C/P     Q/S     E/T     G/W     H/Y     I/V
    6.00   10.76    5.41    2.77    5.05    5.65    3.22    5.97    7.59    6.02
    5.98    9.74    5.74    5.48    6.30    5.68    5.66    5.89    5.66    5.96

Biopython AAindex -> Dictionary
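One way to turn the I block of the AAindex entry into a dictionary is sketched below, using only the standard library. In the AAindex layout the first row of numbers belongs to the first letter of each pair in the header (A, R, N, ...) and the second row to the second letter (L, K, M, ...). The function name parse_index_block is a made-up helper, not a Biopython call.

```python
# The I block of entry ZIMJ680104 (isoelectric point), as three text lines
entry_lines = [
    "I    A/L     R/K     N/M     D/F     C/P     Q/S     E/T     G/W     H/Y     I/V",
    "    6.00   10.76    5.41    2.77    5.05    5.65    3.22    5.97    7.59    6.02",
    "    5.98    9.74    5.74    5.48    6.30    5.68    5.66    5.89    5.66    5.96",
]


def parse_index_block(lines):
    """Map each amino acid letter to its value in an AAindex I block."""
    pairs = lines[0].split()[1:]                      # drop the leading "I"
    first_row = [float(x) for x in lines[1].split()]
    second_row = [float(x) for x in lines[2].split()]
    values = {}
    for pair, top, bottom in zip(pairs, first_row, second_row):
        upper, lower = pair.split("/")
        values[upper] = top       # first letter of the pair: first row
        values[lower] = bottom    # second letter of the pair: second row
    return values


isoelectric = parse_index_block(entry_lines)
print(isoelectric["R"], isoelectric["D"])
```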

… file parser

from Bio import SeqIO

c = 0
handle = open(r'/Users/wvcrieki/Downloads/uniprot_sprot.dat')
for seq_rec in SeqIO.parse(handle, "swiss"):
    print(seq_rec.id)
    print(repr(seq_rec.seq))
    print(len(seq_rec))
    c += 1
    if c > 5:
        break

Parsing sequences from the net

Parsing GenBank records from the net

Parsing SwissProt sequence from the net

Handles are not always from files

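>>> from Bio import Entrez
>>> from Bio import SeqIO
>>> Entrez.email = "your.name@example.org"  # placeholder; NCBI asks for a contact address
>>> handle = Entrez.efetch(db="nucleotide", rettype="fasta", id="6273291")
>>> seq_record = SeqIO.read(handle, "fasta")
>>> handle.close()
>>> seq_record.description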

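>>> from Bio import ExPASy
>>> from Bio import SeqIO
>>> # get_sprot_raw expects a Swiss-Prot accession; 6273291 is a GenBank identifier,
>>> # so the accession O23729 from the Biopython tutorial is used here instead
>>> handle = ExPASy.get_sprot_raw("O23729")
>>> seq_record = SeqIO.read(handle, "swiss")
>>> handle.close()
>>> print(seq_record.id)
>>> print(seq_record.name)
>>> print(seq_record.description)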
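The point that handles are not always files can be shown without a network connection. In the sketch below an io.StringIO object plays the role of the handle, and read_single_fasta is a made-up stand-in for SeqIO.read that works on anything that yields lines of text. The record content is invented.

```python
import io


def read_single_fasta(handle):
    """Read exactly one FASTA record from any line-iterable handle."""
    description = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if description is not None:
                raise ValueError("more than one record in handle")
            description = line[1:]
        else:
            chunks.append(line)
    if description is None:
        raise ValueError("no record found in handle")
    return description, "".join(chunks)


# An in-memory handle: same interface as an open file or a network response
handle = io.StringIO(">demo hypothetical record\nACGT\nTTGA\n")
description, sequence = read_single_fasta(handle)
handle.close()
print(description, sequence)
```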

Extra Questions

• How many records have a sequence of length 260?
• What are the first 20 residues of 143X_MAIZE?
• What is the identifier for the record with the shortest sequence? Is there more than one record with that length?
• What is the identifier for the record with the longest sequence? Is there more than one record with that length?
• How many contain the subsequence "ARRA"?
• How many contain the substring "KCIP-1" in the description?
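A possible approach to these questions is sketched below on a tiny invented data set. Each record is an (identifier, description, sequence) tuple; with Biopython these values would correspond to seq_rec.id, seq_rec.description and str(seq_rec.seq) from the SeqIO.parse loop. The records and the helper name ids_with_extreme_length are hypothetical.

```python
# Invented records standing in for parsed Swiss-Prot entries
records = [
    ("REC1", "hypothetical protein one KCIP-1", "MARRAK"),
    ("REC2", "hypothetical protein two", "MK"),
    ("REC3", "hypothetical protein three", "ML"),
]


def ids_with_extreme_length(records, pick=min):
    """Return (length, ids) for the shortest (pick=min) or longest (pick=max) sequences."""
    target = pick(len(seq) for _, _, seq in records)
    ids = [rec_id for rec_id, _, seq in records if len(seq) == target]
    return target, ids


n_length_2 = sum(1 for _, _, seq in records if len(seq) == 2)
first_residues = {rec_id: seq[:20] for rec_id, _, seq in records}
shortest = ids_with_extreme_length(records, min)
longest = ids_with_extreme_length(records, max)
n_arra = sum(1 for _, _, seq in records if "ARRA" in seq)
n_kcip = sum(1 for _, desc, _ in records if "KCIP-1" in desc)

print(n_length_2, shortest, longest, n_arra, n_kcip)
```

A list of identifiers longer than one answers the "more than one record with that length" part of the question directly.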
