
From Neurons to Neural Networks

Jeff Knisley, East Tennessee State University

Mathematics of Molecular and Cellular Biology Seminar

Institute for Mathematics and its Applications, April 2, 2008

Outline of the Talk

Brief Description of the Neuron
A “Hot-Spot” Dendritic Model
Classical Hodgkin-Huxley (HH) Model
A Recent Approach to HH Nonlinearity
Artificial Neural Nets (ANN’s)
1957 – 1969: Perceptron Models
1980’s – soon: MLP’s and Others
1990’s – : Neuromimetic (Spiking) Neurons

Components of a Neuron

Dendrites

Soma

nucleus

Axon

Myelin Sheaths

Synaptic Terminals

Pre-Synaptic to Post-Synaptic

If the threshold is exceeded, then the neuron “fires,” sending a signal along its axon.

Signal Propagation along Axon

Signal is electrical: membrane depolarization from a resting potential of -70 mV; myelin acts as an insulator.

Propagation is electro-chemical: sodium channels open at breaks in the myelin.

Much higher external sodium ion concentrations; potassium ions “work against” sodium; chloride and other influences are also very important.

Rapid depolarization at these breaks: the signal travels faster than if it were only electrical.

Signal Propagation along Axon

[Figure: membrane charge pattern (+++ / - - -) with successive local reversals as the signal propagates along the axon.]

Action Potentials

Sodium ion channels open and close

Which causes Potassium ion channels to open and close

Action Potentials

[Figure: a model “spike” compared with an actual recorded spike train.]

Post-Synaptic may be SubThreshold

Signals decay at the soma if they are below a certain threshold.

Models begin with a section of a dendrite.

Derivation of the Model

Some Assumptions:
Assume the neuron separates R³ into 3 regions: interior (i), exterior (e), and the boundary membrane surface (m).
Assume $\mathbf{E}_l$ is the electric field and $\mathbf{B}_l$ is the magnetic flux density, where $l = e, i$.

Maxwell’s Equations: $\nabla \times \mathbf{E}_l = -\dfrac{\partial \mathbf{B}_l}{\partial t}$

Assume magnetic induction is negligible, so $\nabla \times \mathbf{E}_l \approx 0$ and $\mathbf{E}_e = -\nabla V_e$, $\mathbf{E}_i = -\nabla V_i$ for potentials $V_l$, $l = i, e$.

Current Densities $\mathbf{j}_i$ and $\mathbf{j}_e$

Let $\sigma_l$ = conductivity 2-tensor, $l = i, e$. Intracellular: homogeneous, small radius. Extracellular: ion populations!

Ohm’s Law (local): $\mathbf{j}_l = \sigma_l \mathbf{E}_l = -\sigma_l \nabla V_l$

Since $\nabla \cdot \mathbf{j}_i = 0$, we have $\nabla^2 V_i = 0$.

Charges (ions) collect on the outside of the boundary surface (especially Na⁺), so $\mathbf{j}_e \cdot \mathbf{n} = I_m$, where $I_m$ = membrane currents. Thus $-\sigma_e \nabla V_e \cdot \mathbf{n} = I_m$.

Assume: Circular Cross-sections

Let V = Vi – Ve – Vrest be membrane potential difference, and let Rm, Ri , C be the membrane resistance, intracellular resistance, membrane capacitance, respectively. Let Isyn be a “catch all” for ion channel activity.

$$I_m = C\,\frac{\partial V}{\partial t} + \frac{V}{R_m} + I_{ion}$$

$$\frac{\partial}{\partial x}\!\left(\frac{d}{4 R_i}\,\frac{\partial V}{\partial x}\right) = C\,\frac{\partial V}{\partial t} + \frac{V}{R_m} + I_{syn}$$

Lord Kelvin: the Cable Equation (with $I_{ion}$)

Dimensionless Cables

$$\frac{\partial^2 V}{\partial X^2} = V + \frac{\partial V}{\partial t} + R_m I_{syn}$$

Let $X = \dfrac{x}{\lambda}$ with $\lambda = \sqrt{\dfrac{R_m d}{4 R_i}}$, and let $\tau_m = R_m C$ (constant).

Tapered Cylinders: $Z$ instead of $X$ and a taper constant $K$:

$$\frac{\partial^2 V}{\partial Z^2} + K\,\frac{\partial V}{\partial Z} = V + \frac{\partial V}{\partial t} + R_m I_{syn}$$
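To see the dimensionless cable equation in action, here is a minimal finite-difference sketch (not from the talk) that integrates it on a sealed-end cable with an explicit Euler scheme; the cable length, grid sizes, input location, and the sign convention for the input term are illustrative assumptions.

```python
import numpy as np

# Illustrative grid and input (assumed values, not from the talk)
L_cable, T_total = 2.0, 5.0        # electrotonic length and total dimensionless time
nx, nt = 101, 50000
dX, dt = L_cable / (nx - 1), T_total / nt
Rm_Isyn = np.zeros(nx)             # R_m * I_syn, the "catch all" input term
Rm_Isyn[nx // 2] = -5.0            # a depolarizing input at mid-cable (sign convention assumed)

V = np.zeros(nx)                   # start at rest (V = 0 in these units)
for _ in range(nt):
    Vxx = np.empty(nx)
    Vxx[1:-1] = (V[2:] - 2 * V[1:-1] + V[:-2]) / dX**2   # d^2V/dX^2 in the interior
    Vxx[0] = 2 * (V[1] - V[0]) / dX**2                   # sealed ends: dV/dX = 0
    Vxx[-1] = 2 * (V[-2] - V[-1]) / dX**2
    # cable equation rearranged: dV/dt = d^2V/dX^2 - V - R_m I_syn
    V = V + dt * (Vxx - V - Rm_Isyn)

print("V at the X = 0 end after t =", T_total, ":", V[0])
```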

Rall’s Theorem for Untapered

If at each branching the parent diameter and the daughter cylinder diameters satisfy

$$d_{parent}^{3/2} = \sum_{j \in daughters} d_j^{3/2}$$

then the dendritic tree can be reduced to a single equivalent cylinder.

Equivalent Cylinder
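A tiny sketch (illustrative, not from the talk) of applying the 3/2-power rule: given daughter diameters, it computes the parent diameter that satisfies Rall’s condition, so the branch point can be absorbed into an equivalent cylinder. The example diameters are made up.

```python
def rall_equivalent_diameter(daughter_diameters):
    """Parent diameter satisfying d_parent^(3/2) = sum_j d_j^(3/2)."""
    total = sum(d ** 1.5 for d in daughter_diameters)
    return total ** (2.0 / 3.0)

# Hypothetical daughter branches (diameters in micrometers)
print(rall_equivalent_diameter([1.0, 1.0]))   # ~1.587: the matching parent diameter
```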

Dendritic Models

[Diagram: a full arbor model collapsed to a tapered equivalent cylinder attached to the soma.]

Tapered Equivalent Cylinder

Rall’s theorem (modified for taper) allows us to collapse to an equivalent cylinder

Assume “hot spots” at x0, x1, …, xm

[Diagram: soma at 0, hot spots at x0, x1, …, xm along a cylinder of length l.]

Ion Channel Hot Spots

(Poznanski) Ij due to ion channel(s) at the jth hot spot

The Green’s function G(x, xj, t) is the solution of the hot-spot equation with Ij as a point source and the other inputs set to 0, plus boundary conditions and initial conditions; the Green’s function solves the Equivalent Cylinder model.

$$\frac{R_m d}{4 R_i}\,\frac{\partial^2 V}{\partial x^2} = R_m C\,\frac{\partial V}{\partial t} + V - R_m \sum_{j=1}^{n} I_j(t)\,\delta(x - x_j)$$

Equivalent Cylinder Model (Iion = 0)

$$\frac{\partial^2 V}{\partial X^2} = V + \frac{\partial V}{\partial t}, \qquad 0 < X < L$$

$\dfrac{\partial V}{\partial X}(L, t) = 0$ (no current through the end)

At $X = 0$: a lumped-soma boundary condition relating $V(0,t)$, $V_t(0,t)$, and $V_X(0,t)$ (involving the soma parameter $\sigma_s$ and $\tanh L$)

$V(X, 0)$ = steady state from a constant current

Soma: V ( 0, t ) = Vclamp (voltage clamp)

For the Tapered Equivalent Cylinder Model, the equation is of the form

$$\frac{\partial^2 V}{\partial Z^2} + F(Z)\,\frac{\partial V}{\partial Z} = V + \frac{\partial V}{\partial t}$$

Properties

Spectrum consists solely of non-negative eigenvalues. Eigenvectors are orthogonal under Voltage Clamp; eigenvectors are not orthogonal in the original problem.

Solutions are multi-exponential decays:

$$V(X, t) = \sum_{k=1}^{\infty} C_k(X)\, e^{-t/\tau_k}$$

Linear models are useful for subthreshold activation, assuming nonlinearities ($I_{ion}$) are not arbitrarily close to the soma (and no electric field (ephaptic) effects).

Somatic Voltage Recording

[Figure: somatic voltage recording over 0–10 ms showing a multi-exponential decay saturating to steady state, an experimental artifact, and ionic channel effects.]

Hodgkin-Huxley: Ionic Currents

1963 Nobel Prize in Medicine
Cable Equation plus Ionic Currents (I_syn)
From numerous voltage clamp experiments with the squid giant axon (0.5–1.0 mm in diameter)
Produces Action Potentials

Ionic Channels:
n = potassium activation variable
m = sodium activation variable
h = sodium inactivation variable

Hodgkin-Huxley Equations

$$\frac{d}{4 R_i}\,\frac{\partial^2 V}{\partial x^2} = C\,\frac{\partial V}{\partial t} + g_l (V - V_l) + \bar g_K\, n^4 (V - V_K) + \bar g_{Na}\, m^3 h\, (V - V_{Na})$$

$$\frac{\partial n}{\partial t} = \alpha_n (1 - n) - \beta_n n, \qquad \frac{\partial m}{\partial t} = \alpha_m (1 - m) - \beta_m m, \qquad \frac{\partial h}{\partial t} = \alpha_h (1 - h) - \beta_h h$$

where any V with a subscript is constant, any g with a bar is constant, and each of the α’s and β’s is of similar form, e.g.,

$$\alpha_n(V) = \frac{10 - V}{100\left(e^{(10 - V)/10} - 1\right)}, \qquad \beta_n(V) = \frac{1}{8}\, e^{-V/80}$$
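For concreteness, here is a minimal space-clamped simulation sketch (not from the talk) of the Hodgkin-Huxley system above, using the classical 1952 rate functions and parameter values with V measured as displacement from rest; the injected current, step size, and duration are illustrative assumptions.

```python
import numpy as np

# Classical HH parameters (V in mV relative to rest)
C = 1.0                               # membrane capacitance, uF/cm^2
g_Na, g_K, g_l = 120.0, 36.0, 0.3     # maximal conductances, mS/cm^2
V_Na, V_K, V_l = 115.0, -12.0, 10.6   # reversal potentials relative to rest, mV

# Rate functions (each has a removable singularity, ignored here)
def alpha_n(V): return 0.01 * (10 - V) / (np.exp((10 - V) / 10) - 1)
def beta_n(V):  return 0.125 * np.exp(-V / 80)
def alpha_m(V): return 0.1 * (25 - V) / (np.exp((25 - V) / 10) - 1)
def beta_m(V):  return 4.0 * np.exp(-V / 18)
def alpha_h(V): return 0.07 * np.exp(-V / 20)
def beta_h(V):  return 1.0 / (np.exp((30 - V) / 10) + 1)

dt, T = 0.01, 50.0                    # ms; illustrative step and duration
V, n, m, h = 0.0, 0.32, 0.05, 0.6     # approximate resting values
I_app = 10.0                          # injected current, uA/cm^2 (assumed)

trace = []
for _ in range(int(T / dt)):
    I_ion = (g_K * n**4 * (V - V_K)
             + g_Na * m**3 * h * (V - V_Na)
             + g_l * (V - V_l))
    V += dt * (I_app - I_ion) / C     # membrane equation (space-clamped)
    n += dt * (alpha_n(V) * (1 - n) - beta_n(V) * n)
    m += dt * (alpha_m(V) * (1 - m) - beta_m(V) * m)
    h += dt * (alpha_h(V) * (1 - h) - beta_h(V) * h)
    trace.append(V)

print("peak depolarization (mV above rest):", max(trace))  # spikes reach ~100 mV
```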

HH combined with “Hot Spots”

The solution to the equivalent cylinder with hot spots is

$$V(x, t) = V_{initial} + \sum_{j=0}^{n} \int_0^t G(x, x_j, t - \tau)\, I_j(\tau)\, d\tau$$

where $I_j$ is the restriction of $V$ to the jth “hot spot”. At a hot spot, $V$ satisfies an ODE of the form

$$C\,\frac{\partial V}{\partial t} = -\,g_l (V - V_l) - \bar g_K\, n^4 (V - V_K) - \bar g_{Na}\, m^3 h\, (V - V_{Na})$$

where m, n, and h are functions of V.

Brief description of an Approach to HH ion channel nonlinearities

Goal: Accessible Approximations that still produce action potentials.

Can be addressed using Linear Embedding, which is closely related to the method of Turning Variables. It maps a finite-degree, polynomially nonlinear dynamical system into an infinite-degree linear system. The result is an infinite-dimensional linear system which is as unmanageable as the original nonlinear equation:
Non-normal operators with continua of eigenvalues
Difficult to project back to the nonlinear system (convergence and stability are thorny)
But the approach still has some value (action potentials).

The Hot-Spot Model “Qualitatively”

$$V(0, t) = \sum_{j=0}^{n} \int_0^t G(0, x_j, t - \tau)\, I_j(\tau)\, d\tau$$

Inputs come from other neurons and from ion channels.

Key Features: summation of synaptic inputs; if V(0, t) is large, an action potential travels down the axon.

From Subthreshold (Rall Equivalent Cylinder or Full Arbor)

Artificial Neural Network (ANN)

Made of artificial neurons, each of which sums inputs x_i from other neurons, compares the sum to a threshold, and sends a signal to other neurons if the sum is above threshold.

Synapses have weights, which model relative ion collections and the efficacy (strength) of the synapse.

Artificial Neuron

$w_{ij}$ = synaptic weight between the ith and jth neuron
$\theta_j$ = threshold of the jth neuron
$\sigma$ = “firing” function that maps state to output (a nonlinear firing function)

$$s_i = \sum_j w_{ij}\, x_j, \qquad x_i = \sigma(s_i - \theta_i)$$

(Inputs $x_1, x_2, x_3, \ldots, x_n$ arrive through weights $w_{i1}, w_{i2}, w_{i3}, \ldots, w_{in}$.)
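A minimal sketch (illustrative, not from the talk) of the artificial neuron just described: a weighted sum of inputs, a threshold, and a nonlinear firing function. The logistic firing function and the sample numbers are assumptions.

```python
import math

def fire(s):
    """Sigmoidal firing function mapping state to output (assumed logistic)."""
    return 1.0 / (1.0 + math.exp(-s))

def artificial_neuron(x, w, theta):
    """x: inputs from other neurons, w: synaptic weights, theta: threshold."""
    s = sum(w_j * x_j for w_j, x_j in zip(w, x))   # s_i = sum_j w_ij x_j
    return fire(s - theta)                          # x_i = sigma(s_i - theta_i)

# Hypothetical inputs and weights
print(artificial_neuron([0.2, 0.9, 0.4], [0.5, -1.2, 2.0], theta=0.1))
```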

First Generation: 1957 - 1969

Best understood in terms of classifiers: partition a data space into regions containing data points of the same classification. The regions are predictions of the classification of new data points.

Simple Perceptron Model

Given 2 classes – Reference and Sample

Firing function (activation function) has only two values, 0 or 1.

“Learning” is by incremental updating of weights using a linear learning rule

$$\text{Output} = \begin{cases} 1 & \text{if from the sample} \\ 0 & \text{if from the reference} \end{cases}$$

(Inputs arrive through weights $w_1, w_2, \ldots, w_n$.)
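A short sketch (not from the talk) of the incremental linear learning rule on a made-up, linearly separable data set; the learning rate, number of epochs, and data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linearly separable data: label 1 ("sample") if x0 + x1 > 1, else 0 ("reference")
X = rng.uniform(0, 1, size=(200, 2))
y = (X[:, 0] + X[:, 1] > 1.0).astype(float)

w = np.zeros(2)
b = 0.0            # bias plays the role of (minus) the threshold
eta = 0.1          # learning rate (assumed)

for epoch in range(20):
    for xi, target in zip(X, y):
        out = 1.0 if np.dot(w, xi) + b > 0 else 0.0   # two-valued firing function
        # Linear learning rule: nudge the weights by the prediction error
        w += eta * (target - out) * xi
        b += eta * (target - out)

accuracy = np.mean([(1.0 if np.dot(w, xi) + b > 0 else 0.0) == t for xi, t in zip(X, y)])
print("training accuracy:", accuracy)
```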

Perceptron Limitations

Cannot do XOR (1969, Minsky and Papert): data must be linearly separable.

1970’s: ANN’s “Wilderness Experience” – only a handful working, and very “un-neuron-like”.

Support Vector Machine: a Perceptron on a Feature Space. Data is projected into a high-dimensional feature space and separated with a hyperplane. The choice of feature space (kernel) is key. Predictions are based on the location of the hyperplane.

Second Generation: 1981 - Soon

Big ideas from other fields: J. J. Hopfield compares neural networks to Ising spin glass models and uses statistical mechanics to prove that ANN’s minimize a total energy functional. Cognitive psychology provides new insights into how neural networks learn.

Big ideas from math: Kolmogorov’s Theorem

AND

Firing Functions are Sigmoidal

$$\sigma_j(s_j) = \frac{1}{1 + e^{-(s_j - \theta_j)}}$$

3 Layer Neural Network

Output

Hidden (usually much larger)

Input

The output layer may consist of a single neuron

Multilayer Network

Inputs $x_1, x_2, x_3, \ldots, x_n$ feed $N$ hidden units; the jth hidden unit computes $\sigma(\mathbf{w}_j^{T}\mathbf{x} - t_j)$, and the output is

$$out = \sum_{j=1}^{N} \alpha_j\, \sigma\!\left(\mathbf{w}_j^{T}\mathbf{x} - t_j\right)$$
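Here is a small sketch (illustrative, not from the talk) of the forward pass of such a network, computing out = Σ_j α_j σ(w_j·x − t_j); the sizes and random weights are made-up placeholders.

```python
import numpy as np

def sigma(s):
    # continuous sigmoidal firing function (assumed logistic)
    return 1.0 / (1.0 + np.exp(-s))

def mlp_output(x, W, t, alpha):
    """x: input (n,), W: hidden weights (N, n), t: thresholds (N,), alpha: output weights (N,)."""
    hidden = sigma(W @ x - t)        # sigma(w_j . x - t_j) for each hidden unit j
    return float(alpha @ hidden)     # out = sum_j alpha_j * hidden_j

rng = np.random.default_rng(1)
n, N = 3, 5                          # input dimension and hidden layer size (assumed)
x = rng.normal(size=n)
W, t, alpha = rng.normal(size=(N, n)), rng.normal(size=N), rng.normal(size=N)
print(mlp_output(x, W, t, alpha))
```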

Hilbert’s Thirteenth Problem

Original: “Are there continuous functions of 3 variables that are not representable by a superposition of composition of functions of 2 variables?”

Modern: Can a continuous function of n variables on a bounded domain of n-space be written as sums of compositions of functions of 1 variable?

Kolmogorov’s Theorem

Modified Version: Any continuous function f of n variables can be written

$$f(s_1, \ldots, s_n) = \sum_{i=1}^{2n+1} h\!\left(\sum_{j=1}^{n} w_{ij}\, g_{ij}(s_j)\right)$$

where only h and the w’s depend on f (that is, the g’s are fixed).

Cybenko (1989)

Let $\sigma$ be any continuous sigmoidal function, and let $\mathbf{x} = (x_1, \ldots, x_n)$. If $f$ is absolutely integrable over the n-dimensional unit cube, then for all $\varepsilon > 0$, there exists a (possibly very large) integer $N$ and vectors $\mathbf{w}_1, \ldots, \mathbf{w}_N$ such that

$$\left|\, f(\mathbf{x}) - \sum_{j=1}^{N} \alpha_j\, \sigma\!\left(\mathbf{w}_j^{T}\mathbf{x} - \theta_j\right) \right| < \varepsilon$$

where $\alpha_1, \ldots, \alpha_N$ and $\theta_1, \ldots, \theta_N$ are fixed parameters.
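To make the theorem concrete, here is a small numerical sketch (not from the talk): it approximates a target function on [0, 1] by a sum of N sigmoids with randomly chosen w_j and θ_j, solving only for the α_j by least squares. The target function, N, and the random choices are illustrative assumptions.

```python
import numpy as np

sigma = lambda s: 1.0 / (1.0 + np.exp(-s))
f = lambda x: np.sin(2 * np.pi * x)           # target function (assumed)

rng = np.random.default_rng(0)
N = 50                                        # number of sigmoidal terms (assumed)
w = rng.normal(scale=10.0, size=N)            # random "inner" weights w_j
theta = rng.uniform(-10.0, 10.0, size=N)      # random thresholds theta_j

x = np.linspace(0.0, 1.0, 200)
Phi = sigma(np.outer(x, w) - theta)           # Phi[i, j] = sigma(w_j * x_i - theta_j)
alpha, *_ = np.linalg.lstsq(Phi, f(x), rcond=None)  # fit only the alpha_j

approx = Phi @ alpha
print("max |f - sum_j alpha_j sigma(w_j x - theta_j)| =", np.max(np.abs(f(x) - approx)))
```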

Multilayer Network (MLP’s)

Inputs $x_1, x_2, x_3, \ldots, x_n$ feed $N$ hidden units computing $\sigma(\mathbf{w}_j^{T}\mathbf{x} - t_j)$, and the output

$$out = \sum_{j=1}^{N} \alpha_j\, \sigma\!\left(\mathbf{w}_j^{T}\mathbf{x} - t_j\right)$$

is exactly the form appearing in Cybenko’s theorem.

ANN as a Universal Classifier

Designs a function f : Data -> Classes. Example: f(Red) = 1, f(Blue) = 0. The support of f, supp(f), defines the regions.

Data is used to train (i.e., design) the function f.

Example – Predicting Trees that are or are not RNA-like

D        d-t      d-a      d-L      d-D      Lamb-2  E-ratio  Randics
0.333333 0.666667 0.666667 0.5      0.666667 0.2679  0.8      2.914214
0.333333 0.5      0.5      0.5      0.666667 0.3249  1        2.770056
0.5      0.5      0.5      0.5      0.5      0.382   1        2.80806
0.166667 0.333333 0.5      0.833333 0.833333 1       2        2.236068
0.333333 0.333333 0.333333 0.666667 0.666667 0.4384  1.2      2.642734
0.333333 0.333333 0.333333 0.666667 0.666667 0.4859  1.4      2.56066

(Each tree is labeled RNA-Like or Not-RNA-Like.)

Construct graphical invariants, train the ANN using known RNA-trees, and predict the others.

2nd Generation: Phenomenal Success

Data mining of micro-array data
Stock and commodities trading: ANN’s are an important part of “computerized trading”
Post office mail sorting

This tiny 3-Dimensional Artificial Neural Network, modeled after neural networks in the human brain, is helping machines better visualize their surroundings.

The Mars Rovers’ ANN decides between “rough” and “smooth”

“rough” and “smooth” are ambiguous

Learning via many “examples”

And a neural network can lose up to 10% of its neurons without significant loss in performance!

ANN Limitations

Overfitting: e.g., if the training set is “unbalanced”

Mislabeled data can lead to slow (or no) convergence or incorrect results.

Hard Margins: No “fuzzing” of the boundary

Overfitting may produce isolated regions.

Problems on the Horizon

Limitations are becoming very limiting.
Trained networks often are poor learners (and self-learners are hard to train).
In real neural networks, more neurons imply better networks (not so in ANN’s).
Temporal data is problematic – ANN’s have no concept, or a poor concept, of time.
“Hybridized ANN’s” are becoming the rule: SVM’s are probably the tool of choice at present; also SOFM’s, Fuzzy ANN’s, Connectionism.

Third Generation: 1997 -

Back to Bio: Spiking Neural Networks (SNN)
Asynchronous, action-potential driven ANN’s have been around for some time.
SNN’s show “promise,” but results beyond current ANN’s have been elusive.
Simulating actual HH equations (neuromimetic) has to date not been enough.
Time is both a promise and a curse.

A Possible Approach: Use current dendritic models to modify existing ANN’s.

ANN’s with Multiple Time Scales

An SNN that reduces to an ANN & preserves the Kolmogorov Theorem. The solution to the equivalent cylinder with hot spots is

$$V(0, t) = V_{initial} + \sum_{j=0}^{n} \int_0^t G(0, x_j, t - \tau)\, I_j(\tau)\, d\tau$$

where $I_j$ is the restriction of $V$ to the jth “hot spot”.

Equivalent Artificial Neuron:

$$s_i(t) = \sum_j \int_0^t w_{ij}(t - \tau)\, x_j(\tau)\, d\tau$$

Incorporating MultiExponentials

G(0, x, t) is often a multi-exponential decay. In terms of the time constants τ_k, the w_jk are synaptic “weights,” and the τ_k come from electrotonic and morphometric data: rate of taper, length of dendrites, branching, capacitance, resistance.

$$s_i(t) = \sum_{j=1}^{n} \sum_{k} w_{jk} \int_0^t e^{-(t-u)/\tau_k}\, x_j(u)\, du$$

Approximation and Simplification

If $x_j(u) \approx 1$ or $x_j(u) \approx 0$, then

$$s_i(t) \approx \sum_{j=1}^{n} \sum_{k} w_{jk}\, \tau_k \left(1 - e^{-t/\tau_k}\right) x_j$$

A Special Case ($\tau_k$ a constant $\tau$):

$$s_i(t) = \sum_{j=1}^{n} \left(w_j + p_j\left(1 - e^{-t/\tau}\right)\right) x_j$$

t = 0 yields the standard Neural Net Model
Standard Neural Net as the initial Steady State
Modify with a time-dependent transient

Artificial Neuron

" "firing function that maps state to output

i i ix s

1x2x3x

nx

..

.

thj threshold of j neuron

Nonlinear firing function

j

n

j

ktijiji xepws

1

1

wij, pij = synaptic weights

wi1, pi1

win, pin
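A small sketch (illustrative, not from the talk) of this neuron with steady-state weights w and transient weights p: at t = 0 it reduces to the standard artificial neuron, and as t → ∞ the effective weights become w + p. The logistic firing function, τ, and the numbers are assumptions.

```python
import math

def fire(s):
    return 1.0 / (1.0 + math.exp(-s))            # assumed logistic firing function

def transient_neuron(x, w, p, theta, t, tau=1.0):
    """Effective weight of input j at time t is w_j + p_j * (1 - exp(-t/tau))."""
    s = sum((w_j + p_j * (1.0 - math.exp(-t / tau))) * x_j
            for w_j, p_j, x_j in zip(w, p, x))
    return fire(s - theta)

x = [0.3, 0.8, 0.5]                              # hypothetical inputs
w, p = [0.4, -0.7, 1.1], [0.2, 0.5, -0.6]        # steady-state and transient weights
for t in (0.0, 1.0, 10.0):                       # t = 0 is the standard ANN; large t uses w + p
    print(t, transient_neuron(x, w, p, theta=0.1, t=t))
```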

Steady State and Transient

Sensitivity and Soft Margins:
t = 0 is a perceptron with weights w_ij
t = ∞ is a perceptron with weights w_ij + p_ij
For all t in (0, ∞), a traditional ANN with weights between w_ij and w_ij + p_ij
The transient is a perturbation scheme, giving many predictions over time (soft margins)

Algorithm: partition the training set into subsets; train at t = 0 for the initial subset; train at t > 0 values for the other subsets.

Training the Network

Define an energy function

$$E = \frac{1}{2} \sum_{i=1}^{n} \left\|\, \mathbf{y}_i - \mathbf{out}_i \right\|^2$$

The target vectors are the information to be “learned.” Neural networks minimize energy: the “information” in the network is equivalent to the minima of the total squared energy function.

Back Propagation

Minimize Energy: choose the $w_j$ and $\theta_j$ so that

$$\frac{\partial E}{\partial w_{ij}} = 0, \qquad \frac{\partial E}{\partial \theta_j} = 0$$

In practice, this is hard.
Back Propagation (with a continuous sigmoidal): feed forward, calculate E, and modify the weights.
Repeat until E is sufficiently close to 0.

$$w_j^{new} = w_j + \eta\, \delta_j\, x_j, \qquad \theta_j^{new} = \theta_j - \eta\, \delta_j$$

$$\delta_j = y_j (1 - y_j)(t_j - y_j) \ \text{(output units)}, \qquad \delta_j = y_j (1 - y_j) \sum_k \delta_k w_{jk} \ \text{(hidden units)}$$
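A compact sketch (illustrative, not from the talk) of gradient-descent training for a one-hidden-layer network with sigmoidal units: feed forward, compute the squared-error energy E, adjust the weights, and repeat until E is small. The architecture, learning rate, and data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = lambda s: 1.0 / (1.0 + np.exp(-s))

# Toy data: learn XOR (the classic task a single perceptron cannot do)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)

N = 4                                    # hidden units (assumed)
W1, b1 = rng.normal(size=(2, N)), np.zeros(N)
W2, b2 = rng.normal(size=(N, 1)), np.zeros(1)
eta = 0.5                                # learning rate (assumed)

for epoch in range(5000):
    H = sigma(X @ W1 + b1)               # feed forward: hidden layer
    out = sigma(H @ W2 + b2)             # output layer
    E = 0.5 * np.sum((Y - out) ** 2)     # energy (total squared error)
    # back-propagate: gradients of E w.r.t. weights and thresholds
    d_out = (out - Y) * out * (1 - out)
    d_hid = (d_out @ W2.T) * H * (1 - H)
    W2 -= eta * H.T @ d_out;  b2 -= eta * d_out.sum(axis=0)
    W1 -= eta * X.T @ d_hid;  b1 -= eta * d_hid.sum(axis=0)

print("final energy:", E, "outputs:", out.ravel())
```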

Back Propagation with Transient

Train the network initially (choose the w_j and θ_j); each “synapse” is then given a transient weight p_ij.

Algorithm addressing over-fitting/sensitivity: the weights must be given random initial values; the weights p_ij are also given random initial values; separate training of the w_j, θ_j and the p_ij ameliorates over-fitting during the training sequence.

(Update rules: the output-layer and hidden-layer transient weights $p_j$ are modified by back-propagation steps analogous to those for the $w_j$, with the transient factors $e^{-t/\tau_k}$ and $\left(1 - e^{-t/\tau_k}\right)$ appearing in the updates.)

Observations/Results

Spiking does occur, but only if the network is properly “initiated,” and the spikes only resemble action potentials.

This is one approach to SNN’s; it is not likely to be the final word. Other real-neuron features may be necessary (e.g., tapering axons can limit the frequency of action potentials; also, branching!).

This approach does show promise in handling temporal information

Any Questions?

Thank you!
