2015 bioinformatics bio_python_partii


Upload: prof-wim-van-criekinge

Post on 11-Jan-2017


Page 1: 2015 bioinformatics bio_python_partii
Page 2: 2015 bioinformatics bio_python_partii

FBW3-11-2015

Wim Van Criekinge

Page 3: 2015 bioinformatics bio_python_partii

Bioinformatics.be

Page 4: 2015 bioinformatics bio_python_partii
Page 5: 2015 bioinformatics bio_python_partii

GitHub: Hosted GIT

• Largest open source git hosting site
• Public and private options
• User-centric rather than project-centric
• http://github.ugent.be (use your UGent login and password)
  – Accept invitation from Bioinformatics-I-2015
  – URI: https://github.ugent.be/Bioinformatics-I-2015/Python.git

Page 6: 2015 bioinformatics bio_python_partii

Control Structures

if condition:
    statements
[elif condition:
    statements] ...
else:
    statements

while condition:
    statements

for var in sequence:
    statements

break
continue
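A minimal sketch of these control structures on sequence data. The function names (gc_content, describe, first_stop) and the 0.4/0.6 thresholds are illustrative assumptions, not part of the course material.

```python
def gc_content(sequence):
    """Return the GC fraction of a DNA string, ignoring non-ACGT symbols."""
    gc = 0
    counted = 0
    for base in sequence.upper():
        if base not in "ACGT":
            continue              # skip ambiguous symbols such as N
        counted += 1
        if base == "G" or base == "C":
            gc += 1
    return gc / counted if counted else 0.0


def describe(fraction):
    """Label a GC fraction; the cut-offs are arbitrary, for illustration only."""
    if fraction > 0.6:
        return "GC-rich"
    elif fraction < 0.4:
        return "AT-rich"
    else:
        return "balanced"


def first_stop(sequence):
    """Return the index of the first in-frame stop codon, or -1 if there is none."""
    stops = ("TAA", "TAG", "TGA")
    position = -1
    i = 0
    while i + 3 <= len(sequence):
        if sequence[i:i + 3] in stops:
            position = i
            break                 # leave the loop at the first stop codon
        i += 3
    return position


print(describe(gc_content("GGCCAATT")))
print(first_stop("ATGAAATAGCCC"))
```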

Page 7: 2015 bioinformatics bio_python_partii

Lists

• Flexible arrays, not Lisp-like linked lists
  • a = [99, "bottles of beer", ["on", "the", "wall"]]
• Same operators as for strings
  • a+b, a*3, a[0], a[-1], a[1:], len(a)
• Item and slice assignment
  • a[0] = 98
  • a[1:2] = ["bottles", "of", "beer"]
    -> [98, "bottles", "of", "beer", ["on", "the", "wall"]]
  • del a[-1] # -> [98, "bottles", "of", "beer"]
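The list operations above can be run as one short script. The extra names (b, tail) are introduced here only to show that concatenation and slicing build new lists, while item and slice assignment change the list in place.

```python
a = [99, "bottles of beer", ["on", "the", "wall"]]

b = a + ["take one down"]      # concatenation returns a new list
tail = a[1:]                   # slicing returns a new list as well
length_before = len(a)         # three items at this point

a[0] = 98                                  # item assignment
a[1:2] = ["bottles", "of", "beer"]         # slice assignment: one item becomes three
del a[-1]                                  # drop the nested list at the end

print(a)
print(b[0])                    # b was built before a[0] changed
```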

Page 8: 2015 bioinformatics bio_python_partii

Dictionaries

• Hash tables, "associative arrays"
  • d = {"duck": "eend", "water": "water"}
• Lookup:
  • d["duck"] -> "eend"
  • d["back"] # raises KeyError exception
• Delete, insert, overwrite:
  • del d["water"] # {"duck": "eend"}
  • d["back"] = "rug" # {"duck": "eend", "back": "rug"}
  • d["duck"] = "duik" # {"duck": "duik", "back": "rug"}
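The same dictionary operations as a runnable sketch. The use of d.get() and the try/except around the failing lookup are additions for illustration; they show two common ways to handle a missing key.

```python
d = {"duck": "eend", "water": "water"}

translation = d["duck"]        # lookup of an existing key
missing = d.get("back")        # get() returns None instead of raising

try:
    d["back"]                  # plain indexing raises KeyError for a missing key
    lookup_failed = False
except KeyError:
    lookup_failed = True

del d["water"]                 # delete
d["back"] = "rug"              # insert
d["duck"] = "duik"             # overwrite

print(d)
```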

Page 9: 2015 bioinformatics bio_python_partii

Regex.py

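import re

text = 'abbaaabbbbaaaaa'
pattern = 'ab'

# Report every non-overlapping match of pattern in text
for match in re.finditer(pattern, text):
    s = match.start()
    e = match.end()
    print('Found "%s" at %d:%d' % (text[s:e], s, e))

# Capture a single leading capital letter that is followed by a space
# (line is assumed to hold one line of input text)
m = re.search("^([A-Z]) ", line)
if m:
    from_letter = m.groups()[0]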
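A self-contained sketch of the two regex idioms on this slide, wrapped in functions so the results can be inspected. The helper names (find_all, leading_letter) are hypothetical; the second one is the kind of test that could pick out one-letter line tags in a flat file, which is an assumption about how the slide's snippet is meant to be used.

```python
import re


def find_all(pattern, text):
    """Return (matched_text, start, end) for each non-overlapping match."""
    return [(m.group(0), m.start(), m.end()) for m in re.finditer(pattern, text)]


def leading_letter(line):
    """Return the capital letter that starts the line if a space follows it, else None."""
    m = re.search(r"^([A-Z]) ", line)
    return m.group(1) if m else None


text = "abbaaabbbbaaaaa"
hits = find_all("ab", text)     # literal pattern
runs = find_all("b+", text)     # one or more consecutive b characters

for matched, start, end in hits:
    print('Found "%s" at %d:%d' % (matched, start, end))
```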

Page 10: 2015 bioinformatics bio_python_partii

Install Biopython

pip is the preferred installer program. Starting with Python 3.4, it is included by default with the Python binary installers.

pip3.5 install Biopython

#pip3.5 install yahoo_finance

from yahoo_finance import Share
yahoo = Share('AAPL')
print (yahoo.get_open())

Page 11: 2015 bioinformatics bio_python_partii
Page 12: 2015 bioinformatics bio_python_partii

BioPython

• Make a histogram of the MW (in kDa) of all proteins in Swiss-Prot
• Find the most basic and most acidic protein in Swiss-Prot?
• Biological relevance of the results?

From AAIndex

H ZIMJ680104
D Isoelectric point (Zimmerman et al., 1968)
R LIT:2004109b PMID:5700434
A Zimmerman, J.M., Eliezer, N. and Simha, R.
T The characterization of amino acid sequences in proteins by statistical methods
J J. Theor. Biol. 21, 170-201 (1968)
C KLEP840101 0.941 FAUJ880111 0.813 FINA910103 0.805
I    A/L     R/K     N/M     D/F     C/P     Q/S     E/T     G/W     H/Y     I/V
    6.00   10.76    5.41    2.77    5.05    5.65    3.22    5.97    7.59    6.02
    5.98    9.74    5.74    5.48    6.30    5.68    5.66    5.89    5.66    5.96
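One way to turn the index block of this AAindex entry into a Python dictionary is sketched below. It assumes the usual layout of the I block: each header pair such as A/L names the amino acid for the first value row and the one for the second value row. The function name parse_index_block is hypothetical.

```python
AAINDEX_I_BLOCK = """\
I    A/L     R/K     N/M     D/F     C/P     Q/S     E/T     G/W     H/Y     I/V
    6.00   10.76    5.41    2.77    5.05    5.65    3.22    5.97    7.59    6.02
    5.98    9.74    5.74    5.48    6.30    5.68    5.66    5.89    5.66    5.96
"""


def parse_index_block(block):
    """Map each one-letter amino acid code to its value in an AAindex I block."""
    lines = block.strip().splitlines()
    pairs = lines[0].split()[1:]                      # drop the leading "I" tag
    first_row = [float(x) for x in lines[1].split()]
    second_row = [float(x) for x in lines[2].split()]
    values = {}
    for pair, top, bottom in zip(pairs, first_row, second_row):
        upper, lower = pair.split("/")
        values[upper] = top                           # e.g. A from "A/L"
        values[lower] = bottom                        # e.g. L from "A/L"
    return values


pI = parse_index_block(AAINDEX_I_BLOCK)
most_basic = max(pI, key=pI.get)
most_acidic = min(pI, key=pI.get)
print(most_basic, pI[most_basic])
print(most_acidic, pI[most_acidic])
```

These are per-residue index values, so the dictionary is only a starting point for the protein-level exercise.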

Page 13: 2015 bioinformatics bio_python_partii

Biopython AAindex → Dictionary

… file parser

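from Bio import SeqIO

c = 0
handle = open(r'/Users/wvcrieki/Downloads/uniprot_sprot.dat')
# Iterate over the Swiss-Prot flat file one record at a time
for seq_rec in SeqIO.parse(handle, "swiss"):
    print(seq_rec.id)
    print(repr(seq_rec.seq))
    print(len(seq_rec))
    c += 1
    if c > 5:
        break   # stop after the first six records
handle.close()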
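The manual counter in the loop above can also be expressed with itertools.islice, which stops any iterator after n items. The sketch below is an alternative, not the slide's method. To stay runnable without Biopython or the Swiss-Prot file, it uses a hypothetical stand-in record type with the same id and seq attributes; in practice the iterator would be the one returned by SeqIO.parse.

```python
from collections import namedtuple
from itertools import islice

# Hypothetical stand-in for the records yielded by SeqIO.parse
Record = namedtuple("Record", "id seq")


def preview(records, n=6):
    """Yield (id, sequence length) for the first n records of any iterator."""
    for rec in islice(records, n):
        yield rec.id, len(rec.seq)


# Toy data, invented for illustration only
demo = [Record("P%d" % i, "M" * i) for i in range(1, 10)]
first_six = list(preview(iter(demo)))
print(first_six)
```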

Page 14: 2015 bioinformatics bio_python_partii

Parsing sequences from the net

Handles are not always from files

Parsing GenBank records from the net

>>> from Bio import Entrez
>>> from Bio import SeqIO
>>> handle = Entrez.efetch(db="nucleotide", rettype="fasta", id="6273291")
>>> seq_record = SeqIO.read(handle, "fasta")
>>> handle.close()
>>> seq_record.description

Parsing SwissProt sequence from the net

>>> from Bio import ExPASy
>>> from Bio import SeqIO
>>> handle = ExPASy.get_sprot_raw("O23729")  # takes a Swiss-Prot accession; the GenBank GI 6273291 is not one
>>> seq_record = SeqIO.read(handle, "swiss")
>>> handle.close()
>>> print(seq_record.id)
>>> print(seq_record.name)
>>> print(seq_record.description)

Page 15: 2015 bioinformatics bio_python_partii

Extra Questions

• How many records have a sequence of length 260?
• What are the first 20 residues of 143X_MAIZE?
• What is the identifier for the record with the shortest sequence? Is there more than one record with that length?
• What is the identifier for the record with the longest sequence? Is there more than one record with that length?
• How many contain the subsequence "ARRA"?
• How many contain the substring "KCIP-1" in the description?
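A possible starting point for these questions is sketched below. All function names are hypothetical, and the records are toy stand-ins with the id, name, seq and description attributes that Biopython's Swiss-Prot records also carry. The sketch assumes the entry name (such as 143X_MAIZE) sits in the name attribute. With real data, each function would be given a fresh iterator from SeqIO.parse.

```python
from collections import namedtuple

# Hypothetical stand-in for a parsed Swiss-Prot record
Record = namedtuple("Record", "id name seq description")


def count_with_length(records, length):
    """Count records whose sequence has exactly the given length."""
    return sum(1 for rec in records if len(rec.seq) == length)


def first_residues(records, entry_name, n=20):
    """Return the first n residues of the record with that entry name, or None."""
    for rec in records:
        if rec.name == entry_name:
            return str(rec.seq[:n])
    return None


def ids_with_extreme_length(records, longest=False):
    """Return (length, ids) for the shortest (or longest) sequences in one pass."""
    best, ids = None, []
    for rec in records:
        n = len(rec.seq)
        if best is None or (n > best if longest else n < best):
            best, ids = n, [rec.id]       # new extreme: restart the id list
        elif n == best:
            ids.append(rec.id)            # tie with the current extreme
    return best, ids


def count_with_subsequence(records, motif):
    """Count records whose sequence contains the motif."""
    return sum(1 for rec in records if motif in str(rec.seq))


def count_with_description(records, text):
    """Count records whose description contains the text."""
    return sum(1 for rec in records if text in rec.description)


# Toy records, invented for illustration only
toy = [
    Record("A1", "TOY1_TEST", "MARRAK", "KCIP-1 homolog"),
    Record("A2", "TOY2_TEST", "MK", "toy protein"),
    Record("A3", "TOY3_TEST", "GG", "another toy protein"),
]
print(ids_with_extreme_length(toy))
print(count_with_subsequence(toy, "ARRA"))
```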