The Role of Machine Learning in Modelling the Cell.
John Hawkins
ARC Centre for Complex Systems
University of Queensland
Australia
Overview of Talk
Overview of cell biology
Modelling the cell
Subcellular localisation signals
Machine Learning in General
Neural networks
  Feed Forward versus Recurrent
Cell Biology – Quick and Dirty
Membrane-bound organelles
Nucleus
DNA -> RNA -> Protein
Transport, e.g.
  Mitochondria
  Peroxisome
Modification, e.g.
  Disulphide bond formation
  Glycosylation
Cell Feedback
At a particular time point a set of genes will be expressed.
These do not remain constant. Instead, the emerging picture is that:
  There is some essential cycle of gene expression,
  with a capacity to indulge in alternative pathways of expression under external stimulus.
The pattern of expression is implemented by protein and RNA feedback onto the genes.
Modelling the cell
Ideally we would like to model the cell from the level of a 3D physical simulation.
Currently this is infeasible,
so numerous approaches are taken to form abstractions:
  Gene regulatory networks
  Differential equation models of particular pathways
  Machine learning models of particular processes
Biological Sequences
Many important biological molecules are polymers,
thus representable as a sequence of discrete symbols.
Sequence M = [m1, m2, …, mn] where:
  DNA: mi ∈ { A, T, G, C }
  RNA: mi ∈ { A, U, G, C }
  Protein: mi ∈ { G, A, V, L, I, P, S, C, T, M, D, E, H, K, R, N, Q, F, Y, W }
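The sequence definition above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the one-hot encoding is one common choice of numeric representation, assumed here rather than taken from the talk.

```python
# Alphabets as listed on the slide.
DNA = "ATGC"
RNA = "AUGC"
PROTEIN = "GAVLIPSCTMDEHKRNQFYW"


def is_valid(sequence, alphabet):
    """Return True if every symbol of the sequence belongs to the alphabet."""
    return all(symbol in alphabet for symbol in sequence)


def one_hot(sequence, alphabet=PROTEIN):
    """Encode each symbol as a 0/1 vector with a single 1 (assumed encoding)."""
    index = {symbol: i for i, symbol in enumerate(alphabet)}
    vectors = []
    for symbol in sequence:
        row = [0] * len(alphabet)
        row[index[symbol]] = 1
        vectors.append(row)
    return vectors


if __name__ == "__main__":
    print(is_valid("ATGC", DNA))   # a DNA string checked against the DNA alphabet
    print(len(one_hot("SKL")[0]))  # one vector entry per amino acid symbol
```

Each residue becomes a vector as long as the alphabet, so a protein of n residues becomes an n x 20 table of zeros and ones.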
Information Content
How much information is in a linear sequence?
Two crucial elements to function:
  Physical/chemical properties
  Molecular shape
Each residue has well-known properties.
Denaturation (Anfinsen, 1973).
Sequence defines the arrangement of chemical properties, which in turn defines folding.
Biological Patterns
Motifs – general term for patterns
Numerous definitions & visualisations:
  PROSITE Patterns – Regular Expression
  PROSITE Profiles – Probability Matrix
  LOGOs
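To make the "pattern as regular expression" idea concrete, here is a minimal sketch that translates PROSITE-style pattern syntax into a Python regular expression. The function name is hypothetical, only the core syntax is handled (x, [..], {..}, repetition counts, and the < and > anchors), and the example pattern is written in PROSITE style purely for illustration; it is not claimed to be the exact entry used in the talk.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regex string (sketch)."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchor at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchor at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repetition such as (3) or (2,4).
        core, low, high = re.fullmatch(
            r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element
        ).groups()
        if core == "x":                  # any residue
            core = "."
        elif core.startswith("{"):       # any residue except those listed
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


# Illustrative C-terminal tripeptide pattern (an assumption for this example).
C_TERMINAL_PATTERN = "[STAGCN]-[RKH]-[LIVMAFY]>"

if __name__ == "__main__":
    regex = prosite_to_regex(C_TERMINAL_PATTERN)
    print(regex)
    print(bool(re.search(regex, "MAAAASKL")))  # sequence ending in S-K-L
```

A profile or logo carries more information than such a pattern, because it weights each residue at each position instead of only allowing or forbidding it.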
Peroxisomal Localisation
Predominantly controlled by a C-terminal sequence called the PTS1 signal.
Roughly 12 residues long
Known dependencies between locations
Nuclear Export
Some proteins move continuously between the nucleus and cytoplasm of the cell, either as:
  Transporters
  Regulators
Machine Learning
Requires a set of examples, with
  Raw input (sequence data), and
  Known classes that the machine should predict
In essence, Function Approximation:
  Start with a general parametrised function over the input data
  Adjust the parameters until the output of the function is a good approximation to the known classes of the examples.
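The two steps above can be sketched with a very small example. The parametrised function here is a logistic unit and the adjustment rule is plain gradient descent; both are assumptions chosen for brevity, as are the toy data and all names, and none of it is the specific method used in the talk.

```python
import math


def predict(weights, bias, features):
    """General parametrised function: a weighted sum squashed into (0, 1)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def fit(examples, labels, learning_rate=0.5, epochs=2000):
    """Adjust the parameters until the outputs approximate the known classes."""
    weights = [0.0] * len(examples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(examples, labels):
            error = predict(weights, bias, features) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


# Toy data (hypothetical): the class is 1 when either input feature is 1.
EXAMPLES = [[0, 0], [0, 1], [1, 0], [1, 1]]
LABELS = [0, 1, 1, 1]

if __name__ == "__main__":
    weights, bias = fit(EXAMPLES, LABELS)
    for features, label in zip(EXAMPLES, LABELS):
        print(features, label, round(predict(weights, bias, features), 3))
```

In a sequence setting the features would be an encoding of the residues, and the classes would be the known localisations of the example proteins.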
Bias
Bias is generally unavoidable (Mitchell, 1980)
Three Sources of Bias:
  Input encoding
  Function structure (architecture)
  Parameter adjustment algorithm (learning)
Neural Networks
Graphical model consisting of layers of nodes connected by weights
Feed forward neural networks
  Fixed input window
  Signal propagates in a single pass through the layers
Recurrent neural networks
  Signal processed in parts
  Recurrent connections maintain a memory state
  Output generated after processing the last piece of the input signal
Simple Neural Networks
FFNN: O_h = S(W1 ∙ I1 + W2 ∙ I2 + b)
RNN: O_h = S(W1 ∙ I2 + W2 ∙ S(W1 ∙ I1 + b) + b)
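The two formulas can be written out directly. In this sketch S is assumed to be the logistic sigmoid (the slide does not say which squashing function is meant), the weights are scalars, and the function names and numeric values are hypothetical.

```python
import math


def S(x):
    """Squashing function; assumed here to be the logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def ffnn(i1, i2, w1, w2, b):
    """Feed forward: both inputs are seen at once through a fixed window."""
    return S(w1 * i1 + w2 * i2 + b)


def rnn(inputs, w1, w2, b):
    """Recurrent: inputs are processed one piece at a time.

    W1 weights the current input, W2 weights the previous state, and the
    output is the state after the last piece of the input.
    """
    state = 0.0  # empty memory, so the first step reduces to S(W1*I1 + b)
    for x in inputs:
        state = S(w1 * x + w2 * state + b)
    return state


if __name__ == "__main__":
    w1, w2, b = 1.0, 0.5, 0.0
    print(ffnn(1.0, 0.0, w1, w2, b))
    print(rnn([1.0, 0.0], w1, w2, b))
    print(rnn([0.0, 1.0], w1, w2, b))  # same pieces, different order
```

The recurrent version reuses the same two weights at every step, so it accepts an input of any length, whereas the feed forward version needs one weight per input position.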
RNNs in Bioinformatics
Bi-Directional RNN
Applications
We have applied these techniques to subcellular localisation to:
  Endoplasmic Reticulum
  Mitochondria
  Chloroplast
  Peroxisome
http://pprowler.imb.uq.edu.au
Working with whole genome data and wet lab biologists to use these tools for data mining.
The End…
?