TAC’2015 30/June/2015 Xevi Biarnés @xevibiarnes
Massive computation and data analysis for the design of enzymes with biotechnological applications
Xevi Biarnés
@xevibiarnes
Department of Bioengineering
IQS School of Engineering
TAC’2015 Meeting
Mataró, 30 June 2015
IQS School of Engineering
IQS School of Management
Via Augusta, 390 Barcelona
Bioengineering Department
Laboratory of Biochemistry
Laboratory of Biomaterials
Laboratory of Tissue Engineering
Laboratory of Microbiology
Laboratory of Bioprocesses
Degree in Biotechnology
Master’s of Science in Bioengineering
PhD in Bioengineering
Laboratory of Biochemistry
Main R&D topics:
Protein engineering and enzymology of glycosidases and glycosyltransferases
Therapeutic targets in infectious diseases
Amyloidogenic proteins in neurodegenerative diseases
Biocatalysis: enzyme redesign, directed evolution of enzymes
Metabolic engineering for the production of glycoglycerolipids
Headed by Prof. Antoni Planas
5 permanent staff
8 PhD students
6 MSc students
4 undergrad students
2 research assistants
Bioinformatics and Molecular Modelling Unit
Laboratory of Biochemistry
BIOMMIQS
Bioinformatics for comparative analysis of genomic sequences
Protein Structure Prediction
In silico tools to assist in experimental Protein Engineering
Simulation of the conformational dynamics of small molecules and macromolecules
BIOMMIQS @IQS and Anella Científica
BIOMMIQS
Direct access to consortium services:
CBUC/CCUC EDUROAM
BIOMMIQS @IQS and Anella Científica
Barcelona (BSC-CNS): MareNostrum 3, MinoTauro, Altix
Madrid (CeSViMa-UPM): Magerit
Islas Canarias (IAC, ITC): LaPalma 2, Atlante
Cantabria (UC): Altamira 2
Málaga (UMA): Picasso 2
València (UV): Tirant 2
Zaragoza (BIFI-UZ): CaesarAugusta 2
BIOMMIQS
Re-Evolution of Genomes
optimization of the genetic codes of living organisms to adapt to their
living environments
Burst of Genomic Data
• Growth of the GenBank database
http://www.ncbi.nlm.nih.gov/genbank/statistics
700 GBytes of raw data
ACTAACCCCTCAGTTTTTGTCAAGCTGTCAGACCCTCCAGCGCAGGTTTCAGTGCCATTCATGTCACCTGCGAGTGCTTATCAATGGTTTTATGACGGATATCCCACATTCGGAGAACACAAACAGGAGAAAGATCTTGAATACGGGGCATGTCCTAATAACATGATGGGCACGTTCTCAGTGCGGACTGTGGGGACCTCCAAGTCCAAGTACCCTTTAGTGGTTAGGATCTACATGAGAATGAAGCACGTCAGGGCGTGGATACCTCGCCCGATGCGTAACCAGAACTACCTATTCAAAGCCAACCCAAATTATGCTGGCAACTCCATTAAGCCAACTGGTGCCAGTCGTACAGCGATCACCACTCTTGGGAAATTTGGACAACAGTCTGGGGCTATTTATGTGGGCAACTTTAGAGTGGTCAACCGACATCTTGCCACTCACAATGATTGGGCAAATCTTGTTTGGGAAGACAGCTCTCGCGACTTGCTCGTGTCATGAACCACCGCCCAAGGCTGTGACACGATTGCTCGTTGCGATTGCCAGACAGGGGTGTACTACTGTAACTCGATGAGAAAACACTACCCAGTCAGTTTTTCAAAACCCAGCCTGATCTATGTAGAGGCTAGCGAGTATTACCCAGCCAGGTACCAATCACATCTCATGCTCGCACAGGGTCACTCAGAACCTGGTGATTGCGGTGGTATCCTTAGATGCCAACATGGCGTCGTCGGCATAGTGTCTACTGGTGGTAATGGGCTCGTTGGCTTTGCAGACGTTAGAGACCTCTTGTGGTTAGATGAAGAAGCTATGGAACAGGGCGTGTCCGACTACATCAAGGGTCTCGGAGATGCTTTTGGAACAGGCTTCACTGACGCAGTCTCAAGGGAGGTTGAAGCTCTCAAGAACTATCTTATAGGGTCTGAAGGAGCAGTTGAGAAAATCTTGAAAAATCTTATTAAACTAATCTCTGCACTGGTATTGTGATCAGAAGTGATTACGACATGGTTA
Where is the information?
Proteins are the Machinery of the Cells
genes (DNA) only keep the information
proteins (amino acids) perform the function
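As a minimal sketch of how the information kept in a gene maps onto a protein, the snippet below translates a DNA string into one-letter amino acid codes using the standard genetic code. The function name `translate` and the short example fragment are illustrative assumptions, not part of the talk; finding the correct reading frame and strand in a real genome is ignored here.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(dna: str) -> str:
    """Translate a DNA string codon by codon, with the reading frame starting at position 0."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore trailing bases that do not fill a codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

# Hypothetical fragment: a start codon, three more codons, and a stop codon.
print(translate("ATGGCCAAGTGGTAA"))
```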
The Inner Life of the Cell (youtube)
XVIVO Scientific Animation
Proteins are 3D Objects
nanometer size
>sequence_of_aminoacids
MYCSASTCTCSTTATRHYGCKLMNDSSCRFGH
KLISPRDTEDFSGFRTCSKLIPSCSFACVIPL
PSFACEERERWQSRTNCVISCRTEDPLKISCF
GRSRACGRSTTRSGCSPLYPLREDTSWASDFR
The 3D structure and function of a protein are dictated by its amino acid sequence
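Sequences like the one above are commonly stored in FASTA format: a header line starting with `>` followed by one or more sequence lines. The following is a minimal sketch of reading such a record, using the example from the slide; `parse_fasta` is a hypothetical helper name, and established libraries would normally be preferred for real work.

```python
from collections import Counter

def parse_fasta(text: str) -> dict:
    """Return {header: sequence} for every record in a FASTA-formatted string."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]          # drop the leading '>'
            records[header] = ""
        elif header is not None:
            records[header] += line    # sequence lines are concatenated
    return records

example = """>sequence_of_aminoacids
MYCSASTCTCSTTATRHYGCKLMNDSSCRFGH
KLISPRDTEDFSGFRTCSKLIPSCSFACVIPL
PSFACEERERWQSRTNCVISCRTEDPLKISCF
GRSRACGRSTTRSGCSPLYPLREDTSWASDFR
"""
records = parse_fasta(example)
sequence = records["sequence_of_aminoacids"]
print(len(sequence))                      # number of amino acids
print(Counter(sequence).most_common(3))   # most frequent residues
```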
Proteins are Dynamic 3D Objects
hemoglobin: oxygen transporter in blood
eppur si muove
$a = F / m$
$x(t) = x_0 + v_0 t + \tfrac{1}{2} a t^2$
protein motion can be simulated on the computer: Molecular Dynamics (MD)
a typical simulation: 1 billion steps!
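The two equations above are, in essence, what an MD code repeats at every step: compute the force, obtain the acceleration, advance the positions. A minimal sketch follows, assuming a velocity Verlet integrator (a common choice in MD, not necessarily the scheme used in the work presented) and a one-dimensional harmonic spring as a stand-in for a real molecular force field. All names and parameter values are hypothetical.

```python
def force(x: float, k: float = 1.0) -> float:
    """Toy force: a harmonic spring, standing in for a real molecular force field."""
    return -k * x

def simulate(x0: float, v0: float, mass: float, dt: float, n_steps: int):
    """Advance one particle with the velocity Verlet scheme; returns (x, v)."""
    x, v = x0, v0
    a = force(x) / mass                      # a = F / m
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt ** 2   # position update over one time step
        a_new = force(x) / mass              # acceleration at the new position
        v = v + 0.5 * (a + a_new) * dt       # velocity from the averaged acceleration
        a = a_new
    return x, v

x, v = simulate(x0=1.0, v0=0.0, mass=1.0, dt=0.01, n_steps=1000)
energy = 0.5 * v ** 2 + 0.5 * x ** 2         # should stay close to its initial value
print(x, v, energy)
```

A production simulation applies the same loop to tens of thousands of atoms in three dimensions, which is where the CPU cost comes from.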
Computer Simulations of Biological Processes @ BIOMMIQS
Implementation of computational algorithms
based on Molecular Dynamics (metadynamics)
that enhance the simulation of biologically relevant processes
BIOMMIQS
Protein Folding
Protein Aggregation
Computer Simulations of Biological Processes @ BIOMMIQS
Prion protein folding
Prion protein (the causative agent of spongiform encephalopathy) is an unstable protein that can adopt different structures.
One of these structures tends to form precipitates in the tissue of the central nervous system, leading to neurodegeneration.
Computer Simulations of Biological Processes @ BIOMMIQS
Prion protein folding
The structural determinants of prion protein stability were identified in silico by extensive computer simulations.
Benetti F. and Biarnés X. et al, JMB 2014
The simulations consumed 1,000,000 hours of total CPU time
and generated 1 TByte of data.
Computer Simulations of Biological Processes @ BIOMMIQS
Amyloid-β peptide aggregation
Amyloid-β is an intrinsically disordered protein. The final segment of this protein can form aggregates, which are associated with Alzheimer’s disease.
18 molecules of the final Amyloid-β segment were simulated, and a nascent fibril was detected.
Baftizadeh F. et al, PRL 2013
Technical Details of Computer Simulations
• Highly CPU-demanding
– millions of hours of CPU time → supercomputers
• Big Data Storage
– 100 GBytes per simulation (3-4 months) → cloud storage?
• Data Transfer
– 1 GByte of data generated daily
– Need to transfer locally for visualization → efficient communications
– Current download rates:
• From CESVIMA (UPM Madrid) to IQS: 5.3 MBytes/s
• From BSC (UPC Barcelona) to IQS: ??
• From SISSA (Trieste, Italy) to IQS: 9.5 MBytes/s
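To put the download rates above in perspective, here is a small back-of-the-envelope sketch of how long it would take to bring a 100 GByte simulation, or one day's 1 GByte of output, to a local machine. It assumes the quoted rates are sustained and that 1 GByte = 1000 MBytes; `transfer_hours` is a hypothetical helper.

```python
def transfer_hours(size_gbytes: float, rate_mbytes_per_s: float) -> float:
    """Hours needed to move size_gbytes at a sustained rate (1 GByte = 1000 MBytes assumed)."""
    seconds = size_gbytes * 1000.0 / rate_mbytes_per_s
    return seconds / 3600.0

# Download rates in MBytes/s, taken from the slide.
rates = {"CESVIMA (UPM Madrid)": 5.3, "SISSA (Trieste, Italy)": 9.5}
for source, rate in rates.items():
    full = transfer_hours(100, rate)        # one complete simulation
    daily = transfer_hours(1, rate) * 60    # one day's output, in minutes
    print(f"{source}: {full:.1f} h per 100 GBytes, {daily:.1f} min per daily GByte")
```

Under these assumptions the daily increment moves in a few minutes, while a complete simulation takes several hours.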
Technical Details of Computer Simulations
• Visualization
– Renders are done off-line on local computers
• Visualization on-line?
– Remote desktops are a solution, but not a nice one
– “Streaming” of xyz coordinates could be a solution?
Minimal Atomic Coordinates File
atom x y z
C 3.23 2.22 4.34
O 2.31 1.34 3.41
H 2.88 2.35 5.32
C 3.21 2.11 1.22
… …
30000-50000 atoms x 100000 frames
3D renders are generated by specific software from an atomic coordinate file containing the xyz coordinates of each atom in the protein structure
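The streaming idea can be sketched as a generator that hands over one frame at a time, so that a viewer never needs the whole trajectory in memory. The sketch below assumes the minimal "atom x y z" line format shown above with a fixed, known number of atoms per frame (the slide shows no frame separators); `stream_frames` and `raw_size_gbytes` are hypothetical names, and the size estimate assumes 4-byte binary floats.

```python
from typing import Iterable, Iterator, List, Tuple

Atom = Tuple[str, float, float, float]

def stream_frames(lines: Iterable[str], atoms_per_frame: int) -> Iterator[List[Atom]]:
    """Yield one frame (a list of atoms) at a time without loading the whole trajectory."""
    frame: List[Atom] = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank lines or lines without exactly four fields
        element, x, y, z = fields
        frame.append((element, float(x), float(y), float(z)))
        if len(frame) == atoms_per_frame:
            yield frame
            frame = []

def raw_size_gbytes(n_atoms: int, n_frames: int, bytes_per_coord: int = 4) -> float:
    """Size of the bare coordinates: 3 floats per atom per frame, 1 GByte = 1e9 bytes."""
    return n_atoms * n_frames * 3 * bytes_per_coord / 1e9

demo = ["C 3.23 2.22 4.34", "O 2.31 1.34 3.41", "H 2.88 2.35 5.32", "C 3.21 2.11 1.22"]
for frame in stream_frames(demo, atoms_per_frame=2):
    print(frame)
print(raw_size_gbytes(30000, 100000), raw_size_gbytes(50000, 100000))
```

In a real setting `lines` could be any iterable, for example a file object or a network stream, which is what makes the generator form convenient.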
Proteins as Chemical Machines: ENZYMES
Chemical transformations rule our life
PHOTOSYNTHESIS: $6\,\mathrm{CO_2} + 6\,\mathrm{H_2O} \rightarrow \mathrm{C_6H_{12}O_6} + 6\,\mathrm{O_2}$
Enzymes lower the activation energy required for a chemical transformation: they act as catalysts
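To give a feel for why lowering the activation energy matters, the Arrhenius equation $k = A\,e^{-E_a/RT}$ implies that, for an unchanged prefactor $A$, reducing the barrier by $\Delta E_a$ multiplies the rate by $e^{\Delta E_a/RT}$. The sketch below evaluates that factor; the 20 kJ/mol reduction and the temperature are illustrative assumptions, not figures from the talk.

```python
import math

R = 8.314462618  # gas constant, J/(mol·K)

def rate_enhancement(delta_ea_kj_mol: float, temperature_k: float = 298.15) -> float:
    """Factor by which the Arrhenius rate grows when the barrier drops by delta_ea."""
    return math.exp(delta_ea_kj_mol * 1000.0 / (R * temperature_k))

# Hypothetical example: a catalyst that lowers the barrier by 20 kJ/mol at 25 °C.
print(f"{rate_enhancement(20.0):.0f}-fold faster")
```

Even a modest reduction of the barrier therefore translates into a rate increase of several orders of magnitude.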
Proteins as “bio”catalysts (enzymes)
Protein structures are tightly tuned to accommodate their
natural ligands.
Maximum catalytic efficiency of enzymes is attained, in part, by the
binding forces in the active site.
Industrial applications of enzymes
Amylases: production of sugars from starch in syrup production
Glucanases: starch degradation prior to fermentation in beer production
Proteases
Cellulases: cellulose degradation prior to fermentation in bioethanol production
Lipases: esterification of lipids in biodiesel production
Amylases, Xylanases, Cellulases, Ligninases: starch degradation to lower viscosity, aiding the sizing and coating of paper
Industrial applications of enzymes
PRODUCTION OF NON-NATURAL ADDED-VALUE COMPOUNDS
Pharmaceuticals
Pigments
Biomaterials
···
Complementing traditional chemical industry
- Process optimization
- Green chemistry
Enzymes to Produce Added-Value Compounds
natural compound
novel compound
Natural enzymes are not optimized for non-natural compounds,
so there is room for enzyme optimization by PROTEIN ENGINEERING
The Protein Engineering Dilemma
MSDTAGSPWFSHSLKRNQDFGFYYSDFCNARSDTPQSCWREGQNESDRQTAVWPYRTSCNMLKCSRYTCVPM
Protein Engineering can be guided by Computer Simulations and Genomic-Data Mining
Protein Engineering by Data Mining @ BIOMMIQS
Setting-up of an integrative platform
to assist in protein engineering experiments
BIOMMIQS
The platform is based on biological data integration from different database sources
and complemented with computer simulations
Protein Engineering by Data Mining @ BIOMMIQS
UNIPROT: 50,000,000 protein sequences (http://uniprot.org)
PDB: 110,000 protein structures (http://pdb.org)
PFAM: 16,000 protein functions (http://pfam.xfam.org)
CAZY: 340,000 enzymes active on carbohydrates (http://cazy.org)
GENBANK: 185,000,000 genomic sequences (http://www.ncbi.nlm.nih.gov/genbank/)
Protein Engineering by Data Mining @ BIOMMIQS
Protein engineering of chitin deacetylases for the biotechnological production of chitosan
http://nano3bio.eu
[Figure: from chitin to chitosan]
Protein Engineering by Data Mining @ BIOMMIQS
Natural chitin deacetylase enzymes producing different chitosans
Andrés E. et al, Angew Chem Intl Ed 2014
Engineering of a non-natural chitin deacetylase to produce new-to-nature chitosans
Technical Details of Data Mining
• Public database sizes
– GENBANK 700 GBytes
– UNIPROT 27 GBytes
– PDB 373 GBytes
– PFAM 195 GBytes
– No local copies! For general purposes, public web services are used.
• BLAST
• JMOL
• HMMSEARCH
• Public Databases are updated regularly (weekly)
– Need to update local copies → mirrors?
Bioinformatics and Molecular Modelling Unit
Laboratory of Biochemistry
BIOMMIQS
Bioinformatics for comparative analysis of genomic sequences
Protein Structure Prediction
In silico tools to assist in experimental Protein Engineering
Simulation of the conformational dynamics of small molecules and macromolecules