2015 bioinformatics bio_python_partii

FBW3-11-2015

Wim Van Criekinge

Bioinformatics.be

GitHub: Hosted GIT

• Largest open source git hosting site
• Public and private options
• User-centric rather than project-centric
• http://github.ugent.be (use your UGent login and password)
  – Accept invitation from Bioinformatics-I-2015
• URI:
  – https://github.ugent.be/Bioinformatics-I-2015/Python.git

Control Structures

if condition:
    statements
[elif condition:
    statements] ...
else:
    statements

while condition:
    statements

for var in sequence:
    statements

break
continue
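A minimal sketch of the constructs above in one small program. The codon list is made-up example data, and the variable names are chosen only for illustration.

```python
# Made-up example data: a short list of codons.
codons = ["ATG", "GGC", "NNN", "TAA", "CCC"]

labels = []
for codon in codons:
    if "N" in codon:
        continue                    # skip ambiguous codons
    elif codon == "ATG":
        labels.append("start")
    elif codon in ("TAA", "TAG", "TGA"):
        labels.append("stop")
        break                       # leave the loop at the first stop codon
    else:
        labels.append("other")

# while loop: repeat as long as the condition holds
countdown = []
n = 3
while n > 0:
    countdown.append(n)
    n -= 1

print(labels)
print(countdown)
```

The final "CCC" is never labelled because break ends the for loop at "TAA".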

Lists

• Flexible arrays, not Lisp-like linked lists
    a = [99, "bottles of beer", ["on", "the", "wall"]]
• Same operators as for strings
    a+b, a*3, a[0], a[-1], a[1:], len(a)
• Item and slice assignment
    a[0] = 98
    a[1:2] = ["bottles", "of", "beer"]
        # -> [98, "bottles", "of", "beer", ["on", "the", "wall"]]
    del a[-1]
        # -> [98, "bottles", "of", "beer"]

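The list operations above can be run as a short script. This is only a sketch: the second list b is a hypothetical helper that is not on the slide.

```python
a = [99, "bottles of beer", ["on", "the", "wall"]]
b = ["take", "one", "down"]        # hypothetical second list, for a+b

joined = a + b                      # concatenation gives a new list
repeated = b * 2                    # repetition
first, last, tail, size = a[0], a[-1], a[1:], len(a)

a[0] = 98                           # item assignment
a[1:2] = ["bottles", "of", "beer"]  # slice assignment: 1 item replaced by 3
after_slice = list(a)               # copy, to keep this intermediate state

del a[-1]                           # remove the nested list at the end
print(after_slice)
print(a)
```

Slicing (a[1:]) and list(a) both return new lists, so tail and after_slice are not changed by the later assignments.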
Dictionaries

• Hash tables, "associative arrays"
    d = {"duck": "eend", "water": "water"}
• Lookup:
    d["duck"]   # -> "eend"
    d["back"]   # raises KeyError exception
• Delete, insert, overwrite:
    del d["water"]      # {"duck": "eend"}
    d["back"] = "rug"   # {"duck": "eend", "back": "rug"}
    d["duck"] = "duik"  # {"duck": "duik", "back": "rug"}

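A runnable sketch of the dictionary operations above. Catching the KeyError and using d.get() with a default are additions for illustration, not part of the slide.

```python
d = {"duck": "eend", "water": "water"}

lookup = d["duck"]              # existing key

missing = False
try:
    d["back"]                   # key not present yet
except KeyError:
    missing = True

fallback = d.get("back", "?")   # get() returns a default instead of raising

del d["water"]                  # delete
d["back"] = "rug"               # insert
d["duck"] = "duik"              # overwrite
print(d)
```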
Regex.py

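import re

text = 'abbaaabbbbaaaaa'
pattern = 'ab'

# report every non-overlapping match of pattern in text
for match in re.finditer(pattern, text):
    s = match.start()
    e = match.end()
    print('Found "%s" at %d:%d' % (text[s:e], s, e))

# line is a string obtained elsewhere, e.g. one line read from a file
m = re.search("^([A-Z]) ", line)
if m:
    from_letter = m.groups()[0]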
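A small sketch that exercises both regex snippets on known input. The three example lines are an assumption, written to resemble the one-letter record codes of an AAindex entry.

```python
import re

text = "abbaaabbbbaaaaa"
# (start, end) of every non-overlapping match of "ab"
spans = [(m.start(), m.end()) for m in re.finditer("ab", text)]

# Assumed example lines: a capital letter plus a space marks a record line.
lines = ["H ZIMJ680104", "D Isoelectric point", "  6.00 10.76"]
letters = []
for line in lines:
    m = re.search("^([A-Z]) ", line)
    if m:                        # re.search returns None when nothing matches
        letters.append(m.group(1))

print(spans)
print(letters)
```

The third line starts with spaces, so the anchored pattern does not match and it is skipped.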

Install Biopython

pip is the preferred installer program. Starting with Python 3.4, it is included by default with the Python binary installers.

pip3.5 install Biopython

#pip3.5 install yahoo_finance

from yahoo_finance import Share
yahoo = Share('AAPL')
print(yahoo.get_open())

BioPython

• Make a histogram of the MW (in kDa) of all proteins in Swiss-Prot

• Find the most basic and the most acidic protein in Swiss-Prot
• Biological relevance of the results?
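One possible way to approach these exercises, as a sketch only. It assumes that a molecular weight (in Da) and an isoelectric point have already been computed for each record, for instance with Biopython's Bio.SeqUtils.ProtParam.ProteinAnalysis (molecular_weight() and isoelectric_point()). The identifiers and numbers below are made up and are not Swiss-Prot data.

```python
from collections import Counter

# (identifier, MW in Da, pI): made-up values for illustration only
records = [
    ("PROT_A", 5808.0, 5.4),
    ("PROT_B", 14307.0, 11.0),
    ("PROT_C", 16951.0, 7.2),
    ("PROT_D", 66430.0, 3.9),
    ("PROT_E", 23300.0, 8.6),
    ("PROT_F", 12400.0, 6.1),
]

bin_width_kda = 10
# lower edge (in kDa) of the bin each protein falls into
counts = Counter(
    int(mw / 1000 // bin_width_kda) * bin_width_kda for _, mw, _ in records
)

# text histogram, one row per occupied bin
for start in sorted(counts):
    print("%3d-%3d kDa %s" % (start, start + bin_width_kda, "#" * counts[start]))

most_basic = max(records, key=lambda r: r[2])    # highest pI
most_acidic = min(records, key=lambda r: r[2])   # lowest pI
print(most_basic[0], most_acidic[0])
```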

From AAIndex

H ZIMJ680104
D Isoelectric point (Zimmerman et al., 1968)
R LIT:2004109b PMID:5700434
A Zimmerman, J.M., Eliezer, N. and Simha, R.
T The characterization of amino acid sequences in proteins by statistical methods
J J. Theor. Biol. 21, 170-201 (1968)
C KLEP840101 0.941 FAUJ880111 0.813 FINA910103 0.805
I   A/L    R/K    N/M    D/F    C/P    Q/S    E/T    G/W    H/Y    I/V
   6.00  10.76   5.41   2.77   5.05   5.65   3.22   5.97   7.59   6.02
   5.98   9.74   5.74   5.48   6.30   5.68   5.66   5.89   5.66   5.96
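The I block pairs two residues per column: the first value row belongs to the letter before the slash, the second row to the letter after it. A minimal plain-Python sketch of turning that block into a dictionary follows. The function name parse_index is hypothetical, and the entry is abbreviated to the H, D and I records.

```python
entry = """H ZIMJ680104
D Isoelectric point (Zimmerman et al., 1968)
I   A/L    R/K    N/M    D/F    C/P    Q/S    E/T    G/W    H/Y    I/V
   6.00  10.76   5.41   2.77   5.05   5.65   3.22   5.97   7.59   6.02
   5.98   9.74   5.74   5.48   6.30   5.68   5.66   5.89   5.66   5.96"""


def parse_index(entry_text):
    """Return {residue letter: value} from the I block of one entry."""
    lines = entry_text.splitlines()
    # the I record holds the column headers; two value rows follow it
    start = next(i for i, line in enumerate(lines) if line.startswith("I "))
    pairs = lines[start].split()[1:]                  # ["A/L", "R/K", ...]
    top = [float(x) for x in lines[start + 1].split()]
    bottom = [float(x) for x in lines[start + 2].split()]
    index = {}
    for pair, upper, lower in zip(pairs, top, bottom):
        first, second = pair.split("/")
        index[first] = upper                          # row 1: letter before "/"
        index[second] = lower                         # row 2: letter after "/"
    return index


pI = parse_index(entry)
print(pI["R"], pI["K"], pI["D"])
```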

Biopython AAindex → Dictionary

… file parser

from Bio import SeqIO

c = 0
handle = open(r'/Users/wvcrieki/Downloads/uniprot_sprot.dat')
# print id, sequence and length for the first records only
for seq_rec in SeqIO.parse(handle, "swiss"):
    print(seq_rec.id)
    print(repr(seq_rec.seq))
    print(len(seq_rec))
    c += 1
    if c > 5:
        break

Parsing sequences from the net

Parsing GenBank records from the net

Parsing SwissProt sequence from the net

Handles are not always from files

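>>> from Bio import Entrez
>>> from Bio import SeqIO
>>> Entrez.email = "A.N.Other@example.com"  # placeholder: NCBI asks for a contact address
>>> handle = Entrez.efetch(db="nucleotide", rettype="fasta", id="6273291")
>>> seq_record = SeqIO.read(handle, "fasta")
>>> handle.close()
>>> seq_record.description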

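>>> from Bio import ExPASy
>>> from Bio import SeqIO
>>> # get_sprot_raw expects a Swiss-Prot accession; O23729 (the Biopython tutorial example)
>>> # replaces 6273291, which is the GenBank identifier from the previous slide
>>> handle = ExPASy.get_sprot_raw("O23729")
>>> seq_record = SeqIO.read(handle, "swiss")
>>> handle.close()
>>> print(seq_record.id)
>>> print(seq_record.name)
>>> print(seq_record.description)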

Extra Questions

• How many records have a sequence of length 260?
• What are the first 20 residues of 143X_MAIZE?
• What is the identifier for the record with the shortest sequence? Is there more than one record with that length?
• What is the identifier for the record with the longest sequence? Is there more than one record with that length?
• How many contain the subsequence "ARRA"?
• How many contain the substring "KCIP-1" in the description?
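A sketch of how such questions could be answered once the records are in memory. It uses made-up toy records as (id, description, sequence) tuples. With Biopython these could presumably be built from SeqIO.parse() records as (rec.id, rec.description, str(rec.seq)), but that step is an assumption and is not shown.

```python
# Made-up toy records: (identifier, description, sequence)
records = [
    ("REC_A", "toy protein KCIP-1", "MARRAKL"),
    ("REC_B", "toy protein", "MK"),
    ("REC_C", "another KCIP-1 homolog", "MSTARRAQQ"),
    ("REC_D", "toy protein", "GG"),
]

target_len = 7   # the exercise asks for 260; 7 fits the toy data
n_target = sum(1 for _, _, seq in records if len(seq) == target_len)

shortest = min(len(seq) for _, _, seq in records)
longest = max(len(seq) for _, _, seq in records)
shortest_ids = [rid for rid, _, seq in records if len(seq) == shortest]
longest_ids = [rid for rid, _, seq in records if len(seq) == longest]

n_arra = sum(1 for _, _, seq in records if "ARRA" in seq)
n_kcip = sum(1 for _, desc, _ in records if "KCIP-1" in desc)

# slicing never fails on short sequences: it returns at most 20 residues
first_20 = {rid: seq[:20] for rid, _, seq in records}

print(n_target, shortest_ids, longest_ids, n_arra, n_kcip)
```

Collecting all identifiers that share the minimum or maximum length answers the "more than one record" part directly.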
