2015 bioinformatics bio_python_partii
Posted on 11-Jan-2017
FBW3-11-2015
Wim Van Criekinge
Bioinformatics.be
GitHub: Hosted Git
• Largest open source Git hosting site
• Public and private options
• User-centric rather than project-centric
• http://github.ugent.be (use your UGent login and password)
  – Accept the invitation from Bioinformatics-I-2015
  – URI: https://github.ugent.be/Bioinformatics-I-2015/Python.git
Control Structures
if condition:
    statements
[elif condition:
    statements] ...
else:
    statements

while condition:
    statements

for var in sequence:
    statements

break
continue
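A small sketch that exercises each of the constructs above on DNA strings. The function names (gc_fraction, classify, first_stop) and the GC thresholds of 0.4 and 0.6 are illustrative choices, not part of the course material.

```python
def gc_fraction(dna):
    """for + continue: fraction of G/C among bases, skipping unknown 'N'."""
    gc = total = 0
    for base in dna.upper():
        if base == "N":
            continue            # unknown base: skip to the next one
        total += 1
        if base in "GC":
            gc += 1
    return gc / total if total else 0.0


def classify(gc):
    """if / elif / else: label a GC fraction (thresholds are arbitrary)."""
    if gc < 0.4:
        return "AT-rich"
    elif gc > 0.6:
        return "GC-rich"
    else:
        return "balanced"


def first_stop(dna):
    """while + break: index of the first in-frame stop codon, or -1 if none."""
    stops = {"TAA", "TAG", "TGA"}
    i = 0
    found = -1
    while i + 3 <= len(dna):
        if dna[i:i + 3] in stops:
            found = i
            break               # no need to scan further
        i += 3
    return found


print(classify(gc_fraction("ATGCNN")))   # balanced
print(first_stop("ATGAAATAGCC"))         # 6
```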
Lists
• Flexible arrays, not Lisp-like linked lists
• a = [99, "bottles of beer", ["on", "the", "wall"]]
• Same operators as for strings
  – a+b, a*3, a[0], a[-1], a[1:], len(a)
• Item and slice assignment
  – a[0] = 98
  – a[1:2] = ["bottles", "of", "beer"]
    -> [98, "bottles", "of", "beer", ["on", "the", "wall"]]
  – del a[-1]   # -> [98, "bottles", "of", "beer"]
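The list slide can be replayed as a script; the comments show what CPython prints for these lists. The last lines are an added contrast with strings, which are immutable.

```python
a = [99, "bottles of beer", ["on", "the", "wall"]]

# Operators shared with strings; they build new values and leave a unchanged
print(len(a + a), len(a * 3))   # 6 9
print(a[0], a[-1])              # 99 ['on', 'the', 'wall']
print(a[1:])                    # ['bottles of beer', ['on', 'the', 'wall']]

# Lists are mutable: items and slices can be reassigned in place
a[0] = 98
a[1:2] = ["bottles", "of", "beer"]   # one item replaced by three
print(a)                        # [98, 'bottles', 'of', 'beer', ['on', 'the', 'wall']]

del a[-1]
print(a)                        # [98, 'bottles', 'of', 'beer']

# The same item assignment on a string fails
s = "bottles"
try:
    s[0] = "B"
except TypeError as err:
    print("TypeError:", err)
```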
Dictionaries
• Hash tables, "associative arrays"
  – d = {"duck": "eend", "water": "water"}
• Lookup:
  – d["duck"] -> "eend"
  – d["back"]   # raises KeyError exception
• Delete, insert, overwrite:
  – del d["water"]      # {"duck": "eend"}
  – d["back"] = "rug"   # {"duck": "eend", "back": "rug"}
  – d["duck"] = "duik"  # {"duck": "duik", "back": "rug"}
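The same dictionary operations as a runnable script, plus `in` and `dict.get()` for lookups that should not raise KeyError. The printed key order assumes Python 3.7 or later, where dicts keep insertion order. The residue-counting loop at the end is an added illustration.

```python
d = {"duck": "eend", "water": "water"}

print(d["duck"])              # eend

# A missing key raises KeyError; `in` and .get() avoid the exception
try:
    d["back"]
except KeyError as err:
    print("KeyError:", err)   # KeyError: 'back'
print("back" in d)            # False
print(d.get("back", "?"))     # ?

del d["water"]                # {'duck': 'eend'}
d["back"] = "rug"             # {'duck': 'eend', 'back': 'rug'}
d["duck"] = "duik"            # overwrite keeps the key's position
print(d)                      # {'duck': 'duik', 'back': 'rug'}

# Typical bioinformatics use: counting residues with .get()
counts = {}
for aa in "MKKLL":
    counts[aa] = counts.get(aa, 0) + 1
print(counts)                 # {'M': 1, 'K': 2, 'L': 2}
```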
Regex.py
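import re

text = 'abbaaabbbbaaaaa'
pattern = 'ab'

for match in re.finditer(pattern, text):
    s = match.start()
    e = match.end()
    print('Found "%s" at %d:%d' % (text[s:e], s, e))

line = "H ZIMJ680104"   # hypothetical input line; normally read from a file
m = re.search("^([A-Z]) ", line)
if m:
    from_letter = m.group(1)   # first capture group, here "H"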
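`re.finditer` continues scanning after the end of each match, so overlapping occurrences are skipped. A minimal sketch of a motif finder (`motif_positions` is a hypothetical helper name) that optionally wraps the motif in a zero-width lookahead to report overlapping hits as well:

```python
import re


def motif_positions(seq, motif, overlapping=False):
    """Start positions of a literal motif in seq."""
    pattern = re.escape(motif)       # treat the motif as plain text, not regex syntax
    if overlapping:
        # A lookahead matches the empty string, so the scan moves on by one position
        pattern = "(?=" + pattern + ")"
    return [m.start() for m in re.finditer(pattern, seq)]


text = "abbaaabbbbaaaaa"
print(motif_positions(text, "aa"))                    # [3, 10, 12]
print(motif_positions(text, "aa", overlapping=True))  # [3, 4, 10, 11, 12, 13]
```

The same helper works on protein sequences, e.g. `motif_positions("ARRARRA", "ARRA", overlapping=True)` finds hits at 0 and 3, while the default mode reports only 0.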
Install Biopython
pip is the preferred installer program. Starting with Python 3.4, it is included by default with the Python binary installers.

pip3.5 install Biopython
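A quick way to confirm the install from Python, sketched as a hypothetical helper `biopython_version()`; it relies only on the `Bio.__version__` attribute that Biopython sets.

```python
def biopython_version():
    """Return Biopython's version string, or None if it cannot be imported."""
    try:
        import Bio
    except ImportError:
        return None
    return Bio.__version__


v = biopython_version()
print(v if v else "Biopython not found; try: pip3.5 install Biopython")
```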
# pip3.5 install yahoo_finance
from yahoo_finance import Share
yahoo = Share('AAPL')
print(yahoo.get_open())
Biopython
• Make a histogram of the MW (in kDa) of all proteins in Swiss-Prot
• Find the most basic and the most acidic protein in Swiss-Prot.
• What is the biological relevance of the results?
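One possible sketch for these two questions, assuming the Swiss-Prot flat file `uniprot_sprot.dat` has been downloaded. It takes the molecular weight from each entry's SQ line (Biopython's `Bio.SwissProt` parser stores it in `record.seqinfo` as length, MW, checksum) and estimates pI with Biopython's `ProteinAnalysis.isoelectric_point()`; that pI method is a swapped-in choice, not necessarily the AAindex route the next slide hints at. The helper names and the 10 kDa bin width are hypothetical. Biopython is imported inside the loader so the pure-Python helpers run without it.

```python
from collections import Counter


def kda_histogram(weights_da, bin_kda=10):
    """Count molecular weights (in Da) per bin of bin_kda kDa.

    Keys are lower bin edges in kDa: 20 means 20 <= MW < 30 kDa.
    """
    bins = Counter()
    for mw in weights_da:
        kda = mw / 1000.0
        bins[int(kda // bin_kda) * bin_kda] += 1
    return dict(sorted(bins.items()))


def print_histogram(bins, scale=1):
    """Crude text histogram: one '#' per `scale` proteins."""
    for edge, n in bins.items():
        print("%5d kDa | %s %d" % (edge, "#" * (n // scale), n))


def extremes(named_values):
    """Return ((name, lowest), (name, highest)) from (name, value) pairs."""
    pairs = list(named_values)
    lowest = min(pairs, key=lambda p: p[1])
    highest = max(pairs, key=lambda p: p[1])
    return lowest, highest


def swissprot_mw_and_pi(path):
    """Yield (entry_name, MW in Da, estimated pI) for each Swiss-Prot entry."""
    from Bio import SwissProt
    from Bio.SeqUtils.ProtParam import ProteinAnalysis

    with open(path) as handle:
        for rec in SwissProt.parse(handle):
            mw = rec.seqinfo[1]   # seqinfo = (length, MW, checksum) from the SQ line
            pi = ProteinAnalysis(rec.sequence).isoelectric_point()
            yield rec.entry_name, mw, pi
```

On the full file this might look like `rows = list(swissprot_mw_and_pi("uniprot_sprot.dat"))`, then `print_histogram(kda_histogram(mw for _, mw, _ in rows), scale=1000)` and `extremes((name, pi) for name, _, pi in rows)`, which returns the (name, pI) pairs of the most acidic and most basic entries. Computing pI for every entry takes a while.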
From AAIndex
H ZIMJ680104
D Isoelectric point (Zimmerman et al., 1968)
R LIT:2004109b PMID:5700434
A Zimmerman, J.M., Eliezer, N. and Simha, R.
T The characterization of amino acid sequences in proteins by statistical methods
J J. Theor. Biol. 21, 170-201 (1968)
C KLEP840101    0.941  FAUJ880111    0.813  FINA910103    0.805
I    A/L     R/K     N/M     D/F     C/P     Q/S     E/T     G/W     H/Y     I/V
      6.00   10.76    5.41    2.77    5.05    5.65    3.22    5.97    7.59    6.02
      5.98    9.74    5.74    5.48    6.30    5.68    5.66    5.89    5.66    5.96
Biopython AAindex ? Dictionary
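The AAindex entry above can be turned into a Python dictionary with a few lines of string handling. A minimal sketch, assuming the AAindex1 layout shown on the slide (an `I` line of residue pairs followed by two rows of ten values, with `NA` marking missing values); the entry text is trimmed to the lines the parser needs, and `parse_aaindex_values` is a hypothetical name.

```python
ZIMJ680104 = """\
H ZIMJ680104
D Isoelectric point (Zimmerman et al., 1968)
I    A/L     R/K     N/M     D/F     C/P     Q/S     E/T     G/W     H/Y     I/V
      6.00   10.76    5.41    2.77    5.05    5.65    3.22    5.97    7.59    6.02
      5.98    9.74    5.74    5.48    6.30    5.68    5.66    5.89    5.66    5.96
//
"""


def parse_aaindex_values(entry):
    """Map one-letter residue codes to values from an AAindex1 entry's I block."""
    lines = entry.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("I "):
            pairs = line[1:].split()                     # ['A/L', 'R/K', ...]
            top = [p.split("/")[0] for p in pairs]       # residues of the first value row
            bottom = [p.split("/")[1] for p in pairs]    # residues of the second value row
            values = lines[i + 1].split() + lines[i + 2].split()
            return {aa: (None if v == "NA" else float(v))
                    for aa, v in zip(top + bottom, values)}
    raise ValueError("entry has no 'I' line")


pi = parse_aaindex_values(ZIMJ680104)
print(pi["D"], pi["K"])                              # 2.77 9.74
print(min(pi, key=pi.get), max(pi, key=pi.get))      # D R
```

These are per-residue values; averaging them over a sequence is at best a rough proxy for a protein's isoelectric point, which depends on the charges of all its ionizable groups.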
… file parser
from Bio import SeqIO

c = 0
with open(r'/Users/wvcrieki/Downloads/uniprot_sprot.dat') as handle:
    for seq_rec in SeqIO.parse(handle, "swiss"):
        print(seq_rec.id)
        print(repr(seq_rec.seq))
        print(len(seq_rec))
        c += 1
        if c > 5:   # stop after the first six records
            break
Parsing sequences from the net
Parsing GenBank records from the net
Parsing SwissProt sequence from the net
Handles are not always from files
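>>> from Bio import Entrez
>>> from Bio import SeqIO
>>> Entrez.email = "your.name@example.org"   # placeholder: NCBI asks for a contact address
>>> handle = Entrez.efetch(db="nucleotide", rettype="fasta", retmode="text", id="6273291")
>>> seq_record = SeqIO.read(handle, "fasta")
>>> handle.close()
>>> seq_record.description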
>>> from Bio import ExPASy
>>> from Bio import SeqIO
>>> handle = ExPASy.get_sprot_raw("O23729")   # needs a UniProt accession; 6273291 is an NCBI GI number
>>> seq_record = SeqIO.read(handle, "swiss")
>>> handle.close()
>>> print(seq_record.id)
>>> print(seq_record.name)
>>> print(seq_record.description)
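Biopython's `SeqIO.read` only needs a file-like object to read from, which is why a network handle works as well as an open file. The same idea in a hypothetical pure-Python stand-in (`read_single_fasta`, not a Biopython function), fed from an `io.StringIO` that plays the role of a downloaded reply:

```python
import io


def read_single_fasta(handle):
    """Parse exactly one FASTA record from any iterable of text lines.

    Returns (description, sequence); raises ValueError for zero or several records.
    """
    description = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue                          # ignore blank lines
        if line.startswith(">"):
            if description is not None:
                raise ValueError("more than one record in handle")
            description = line[1:]
        else:
            chunks.append(line)               # sequence may span several lines
    if description is None:
        raise ValueError("no FASTA record in handle")
    return description, "".join(chunks)


fake_download = io.StringIO(">seq1 demo record\nMKT\nAYIA\n")
desc, seq = read_single_fasta(fake_download)
print(desc, seq)   # seq1 demo record MKTAYIA
```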
Extra Questions
• How many records have a sequence of length 260?
• What are the first 20 residues of 143X_MAIZE?
• What is the identifier for the record with the shortest sequence? Is there more than one record with that length?
• What is the identifier for the record with the longest sequence? Is there more than one record with that length?
• How many contain the subsequence "ARRA"?
• How many contain the substring "KCIP-1" in the description?
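A sketch of one way to answer these questions. The counting helpers are plain Python over a list of records that have `.id`, `.name`, `.seq` and `.description`, so they can be checked on the small hand-made records at the bottom (hypothetical data). It assumes that in Biopython's "swiss" parser `.id` holds the primary accession and `.name` the entry name such as 143X_MAIZE. Function names are illustrative.

```python
from types import SimpleNamespace


def load_swissprot(path):
    """Read every entry of a Swiss-Prot flat file into a list (needs Biopython)."""
    from Bio import SeqIO  # imported here so the helpers below run without Biopython

    with open(path) as handle:
        return list(SeqIO.parse(handle, "swiss"))


def count_length(records, n):
    """Number of records whose sequence has exactly n residues."""
    return sum(1 for r in records if len(r.seq) == n)


def first_residues(records, entry_name, k=20):
    """First k residues of the record with this entry name, or None."""
    for r in records:
        if r.name == entry_name:
            return str(r.seq)[:k]
    return None


def extreme_length_ids(records, longest=False):
    """(length, [ids]) of the shortest or longest sequence(s); records must be a list."""
    lengths = [len(r.seq) for r in records]
    target = max(lengths) if longest else min(lengths)
    return target, [r.id for r in records if len(r.seq) == target]


def count_subsequence(records, motif):
    """Number of records whose sequence contains motif."""
    return sum(1 for r in records if motif in str(r.seq))


def count_in_description(records, text):
    """Number of records whose description contains text."""
    return sum(1 for r in records if text in r.description)


# Hand-made stand-ins with the same attributes (hypothetical data)
demo = [
    SimpleNamespace(id="P00001", name="AAA_TEST", seq="MARRAK",
                    description="hypothetical protein KCIP-1"),
    SimpleNamespace(id="P00002", name="BBB_TEST", seq="MK", description="demo"),
    SimpleNamespace(id="P00003", name="CCC_TEST", seq="MA", description="demo"),
]

print(count_length(demo, 6))                   # 1
print(first_residues(demo, "AAA_TEST", k=3))   # MAR
print(extreme_length_ids(demo))                # (2, ['P00002', 'P00003'])
print(extreme_length_ids(demo, longest=True))  # (6, ['P00001'])
print(count_subsequence(demo, "ARRA"))         # 1
print(count_in_description(demo, "KCIP-1"))    # 1
```

`load_swissprot` keeps every record in memory, which is simple but needs a lot of RAM for the full Swiss-Prot file; a single pass that updates all counters at once would be leaner.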