
Contributed article

A cascade associative memory model with a hierarchical memory structure

Makoto Hirahara a,*, Natsuki Oka a, Toshiki Kindo a,b

a Matsushita Research Institute Tokyo, Inc., 3-10-1 Higashimita, Tama-Ku, Kawasaki 214, Japan
b Japan Science and Technology Corporation, 4-1-8 Moto-machi, Kawaguchi 332, Japan

Received 17 September 1997; accepted 2 September 1999

Neural Networks 13 (2000) 41-50

* Corresponding author. Tel.: +81-44-911-6351; fax: +81-44-911-8760. E-mail address: [email protected] (M. Hirahara).

Abstract

The introduction of a hierarchical memory structure into a cascade associative memory model for storing hierarchically correlated patterns improves the storage capacity and the size of the basins of attraction remarkably. A learning algorithm groups descendants (second-level patterns) according to their ancestors (first-level ones), and organizes the memory structure in a weight matrix where the groups are memorized separately. The weight matrix is, thus, in the form of a pile of covariance matrices, each of which is responsible for recalling only the descendants of one ancestor. Put simply, the model is a multiplex associative memory. The recalling process proceeds as follows: the model first recalls the ancestor of a target descendant. Then, the dynamics with dynamic threshold combines the ancestor and the weight matrix to activate the covariance matrix for recalling only the descendants of that ancestor. This mechanism suppresses the cross-talk noise generated by the descendants of the other ancestors, and the recalling ability is enhanced. © 2000 Elsevier Science Ltd. All rights reserved.

Keywords: Cascade associative memory; Hetero-associative memory; Hierarchical memory structure; Difference patterns; Sparse encoding; Dynamic threshold; Storage capacity; Basins of attraction

1. Introduction

Hopfield (1982) demonstrated that the storage capacity $\alpha_C$ of standard associative memory storing uncorrelated patterns is about 0.15, where $\alpha_C$ is the ratio between the maximum number of patterns that can be stored as equilibria and the number of units $N$. However, the standard associative memory cannot deal with correlated patterns, since the cross-talk noise generated by the other stored patterns in the recalling of each pattern does not average to zero (Amit, Gutfreund & Sompolinsky, 1987). Even if some objects are similar to one another, their representations must be orthogonal to one another. This is considered to be quite unnatural in the biological sense, and it also greatly reduces the possibility of applying associative memory in the field of engineering.

Here, we focus on a special case of correlated patterns, namely hierarchically correlated patterns. In the case of a two-level hierarchy, $P_1$ ancestor patterns $\{\xi_i^\mu\}$ $(\mu = 1, \ldots, P_1;\ i = 1, \ldots, N)$ exist in the first level. For each $\{\xi_i^\mu\}$, $P_2$ descendant patterns $\{\xi_i^{\mu\nu}\}$ $(\nu = 1, \ldots, P_2;\ i = 1, \ldots, N)$ are generated at the second level so as to have a correlation $b$ $(0 < b < 1)$ with $\{\xi_i^\mu\}$. Thus, the descendants $\{\xi_i^{\mu\nu}\}$ belonging to the same ancestor $\{\xi_i^\mu\}$ are distributed in the cluster $\mu$ whose center is $\{\xi_i^\mu\}$. Such hierarchically organized data frequently appear in real-world problems, and neural networks in the human brain memorize and recall them flexibly. To model more realistic neural networks, many researchers have extended the standard associative memory to deal with hierarchically correlated patterns (Feigelman & Ioffe, 1987; Gutfreund, 1988; Morita, 1993; Kakeya & Kindo, 1996; Hirahara, Oka & Kindo, 1997).

Feigelman and Ioffe (1987) proposed a model for storing the ancestors and their descendants in the same weight matrix. The storage capacity of this model is known to be of the same order of magnitude as that of the Hopfield model. Nonmonotone dynamics, proposed by Morita (1993), is very attractive and significantly enhances the abilities of the standard associative memory, such as the storage capacity (Yoshizawa, Morita & Amari, 1993) and the storage of hierarchically correlated patterns. Morita's model stores only the descendants by Hebbian learning, and recalls only the descendants. If one does not require detailed information, the ancestors, which are the abstract concepts among the descendants, become useful information, and should be recalled in the human brain.


To overcome this problem, Kakeya and Kindo (1996) adopted neuro-window units, which are nonmonotonic units with a controllable parameter. The parameter determines the form of the nonmonotonic function and enables the model to select whether an ancestor or a descendant should be recalled. However, this model needs an additional mechanism to control the parameter value carefully. Furthermore, its storage capacity is very small.

Gutfreund (1988) proposed a different type of model consisting of two auto-associative memories, each of which has $N$ units. The first associative memory (AM1) and the second one (AM2) store the $P_1$ ancestors $\{\xi_i^\mu\}$ and the $P_1 P_2$ descendants $\{\xi_i^{\mu\nu}\}$, respectively. Although AM1 stores $\{\xi_i^\mu\}$ explicitly, one can adopt a model of concept formation (Amari, 1977) to form $\{\xi_i^\mu\}$ automatically. However, Gutfreund's model has a troublesome parameter on which the storage capacity and the size of the basins of attraction strongly depend. CASM, presented by Hirahara et al. (1997), does not have such a parameter, although it is similar in structure to Gutfreund's model. CASM is characterized by AM2 storing not $\{\xi_i^{\mu\nu}\}$ but difference patterns $\{\eta_i^{\mu\nu}\}$ $(\mu = 1, \ldots, P_1;\ \nu = 1, \ldots, P_2;\ i = 1, \ldots, N)$, which contain only information on the differences between $\{\xi_i^\mu\}$ and $\{\xi_i^{\mu\nu}\}$. The difference patterns $\{\eta_i^{\mu\nu}\}$ become sparser with increasing correlation $b$, which allows CASM to have a larger storage capacity than Gutfreund's model. The storage capacity increases with increasing $b$ and is as large as that of the sparsely encoded associative memory (Amari, 1989; Okada, Mimura & Kurata, 1993; Okada, 1996).

The basins of attraction in CASM become larger with decreasing $b$. Their size does not depend on the loading level $\alpha$ $(\equiv P_1 P_2 / N)$ and is almost constant for a fixed value of $b$. In the case of $\alpha = 0.1$, CASM has larger basins of attraction than Gutfreund's model and the standard associative memory storing uncorrelated patterns. However, the dependence of the basin size on $\alpha$ in Gutfreund's model is not known. Furthermore, the basins of attraction in the standard associative memory become larger with decreasing $\alpha$ (Amari & Maginu, 1988; Okada, 1995, 1996), and are larger than those in CASM when $\alpha < 0.08$.

If neural networks such as CASM exist in the human brain, they should be very robust to input noise. To apply CASM in the field of engineering, such as associative databases (Lim & Cherkassky, 1992) and natural language processing (Collier, 1997), it is important to enlarge the size of the basins of attraction further. To approach these problems, this paper improves the original CASM (CASM1) to enlarge the size of the basins of attraction, and presents CASM2.

The problem in CASM1 is that all the difference patterns $\{\eta_i^{\mu\nu}\}$ are uniformly stored in the same weight matrix of AM2, although hierarchically correlated patterns form a tree structure: i.e. the descendants $\{\xi_i^{\mu\nu}\}$ belonging to the same ancestor $\{\xi_i^\mu\}$ are distributed in the cluster $\mu$ whose center is $\{\xi_i^\mu\}$. In other words, CASM1 does not retain the tree structure, and its memory structure is uniform in every level of the tree.

The presented CASM2 has a hierarchical memory structure which is organized in the weight matrix of AM2. AM2 in CASM2 is a recurrent hetero-associative memory associating $\{\xi_i^{\mu\nu}\}$ with $\{\eta_i^{\mu\nu}\}$, and its weight matrix is in the form of a pile of covariance matrices, each of which is responsible for recalling only the difference patterns $\{\eta_i^{\mu\nu}\}$ $(\nu = 1, \ldots, P_2)$ originating from the descendants $\{\xi_i^{\mu\nu}\}$ $(\nu = 1, \ldots, P_2)$ belonging to each ancestor $\{\xi_i^\mu\}$. In other words, AM2 is a type of multiplex associative memory.

Recalling in CASM2 proceeds as follows. When a target descendant $\{\xi_i^{\mu\nu}\}$ is input to CASM2, AM1 first recalls its ancestor $\{\xi_i^\mu\}$. Then, the dynamics of AM2 combines the recalled ancestor $\{\xi_i^\mu\}$ and its weight matrix to activate the covariance matrix for recalling only the difference patterns $\{\eta_i^{\mu\nu}\}$ $(\nu = 1, \ldots, P_2)$ originating from the descendants $\{\xi_i^{\mu\nu}\}$ $(\nu = 1, \ldots, P_2)$ of the recalled ancestor $\{\xi_i^\mu\}$. This weight switching mechanism suppresses the cross-talk noise generated by the difference patterns $\{\eta_i^{\mu'\nu}\}$ $(\mu' \neq \mu;\ \nu = 1, \ldots, P_2)$ originating from the descendants $\{\xi_i^{\mu'\nu}\}$ $(\mu' \neq \mu;\ \nu = 1, \ldots, P_2)$ of the other ancestors $\{\xi_i^{\mu'}\}$ $(\mu' \neq \mu)$. Furthermore, the dynamics of AM2 is adaptively tuned using a dynamic threshold. The results of the analysis and numerical simulations show the advantages of CASM2 over conventional models with respect to the storage capacity and the size of the basins of attraction.

Nomenclature

$\{\xi_i^\mu\}$: ancestor pattern
$\{\xi_i^{\mu\nu}\}$: descendant pattern of $\{\xi_i^\mu\}$
$\{\eta_i^{\mu\nu}\}$: difference pattern between $\{\xi_i^\mu\}$ and $\{\xi_i^{\mu\nu}\}$
$\{\zeta_i^{\mu\nu}\}$: noisy input pattern originating from $\{\xi_i^{\mu\nu}\}$
$\{r_i\}$: noise pattern (biased pattern)
$\{x_i^{(1)}(t)\}$: network state of AM1 at time $t$
$\{x_i^{(2)}(t)\}$: network state of AM2 at time $t$
$\{y_i^{(2)}(t)\}$: network state in the output layer of AM2 at time $t$
$\{x_i(t)\}$: network state of CASM at time $t$
$\{z_i^{(1)}(\{\zeta_j^{\mu\nu}\})\}$: stable state of AM1 when $\{\zeta_i^{\mu\nu}\}$ is input
$\{z_i^{(2)}(\{\zeta_j^{\mu\nu}\})\}$: stable state of AM2 when $\{\zeta_i^{\mu\nu}\}$ is input
$\{z_i^{(\ast)}(\{\zeta_j^{\mu\nu}\})\}$: stable state of CASM when $\{\zeta_i^{\mu\nu}\}$ is input
$\{w_{ij}^{(1)}\}$: weight matrix of AM1
$\{w_{ij}^{(2)}\}$: weight matrix of AM2
$\{w_{ij}^{(2)}(\mu)\}$: covariance matrix for recalling $\{\eta_i^{\mu\nu}\}$ $(\nu = 1, \ldots, P_2)$ in the cluster $\mu$
$N$: number of units in an associative memory model
$N^{(-)}$: number of $-1$ components in $\{\eta_i^{\mu\nu}\}$
$P_1$: number of ancestors
$P_2$: number of descendants for each ancestor
$\alpha$: loading level $(\equiv P_1 P_2 / N)$
$\alpha_C$: storage capacity
$\alpha_C^\ast$: storage capacity of standard associative memory storing uncorrelated patterns
$b$: correlation parameter between $\{\xi_i^\mu\}$ and $\{\xi_i^{\mu\nu}\}$
$c$: bias value of $\{r_i\}$
$t_c$: time when AM1 converges to $\{z_i^{(1)}(\{\zeta_j^{\mu\nu}\})\}$
$\theta(t)$: dynamic threshold at time $t$
$s(t)$: dynamic shift parameter at time $t$
$m(t)$: overlap between $\{\xi_i^{\mu\nu}\}$ and $\{x_i^{(2)}(t)\}$
$m_C(b)$: critical overlap as a function of $b$

2. Hierarchically correlated patterns

We introduce a procedure for generating a two-level hierarchical tree of correlated patterns (Gutfreund, 1988). This paper assumes that the $P_1$ ancestors $\{\xi_i^\mu\}$ $(\mu = 1, \ldots, P_1;\ i = 1, \ldots, N)$ are random patterns whose components independently take the values $\pm 1$ with equal probability. Then, for each $\{\xi_i^\mu\}$, $P_2$ descendants $\{\xi_i^{\mu\nu}\}$ $(\nu = 1, \ldots, P_2;\ i = 1, \ldots, N)$ are generated so as to have a correlation $b$ $(0 < b < 1)$ with $\{\xi_i^\mu\}$ as follows:

$$\Pr(\xi_i^{\mu\nu}) = \tfrac{1}{2}(1 + \xi_i^\mu b)\,\delta(\xi_i^{\mu\nu} - 1) + \tfrac{1}{2}(1 - \xi_i^\mu b)\,\delta(\xi_i^{\mu\nu} + 1),$$

where $\delta(u) = 1$ for $u = 0$ and 0 otherwise. The total number of descendants generated by this procedure is $P_1 P_2$.
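As a concrete illustration of this generation procedure, the following NumPy sketch draws the ancestors and descendants (a minimal sketch; the function and variable names are ours, not the paper's). A descendant component copies its ancestor's component with probability $(1+b)/2$, which yields the correlation $E[\xi_i^{\mu\nu} \xi_i^\mu] = b$.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_patterns(N, P1, P2, b):
    """Generate P1 random +/-1 ancestors and, for each, P2 descendants whose
    components agree with the ancestor with probability (1 + b) / 2."""
    xi_anc = rng.choice(np.array([-1, 1]), size=(P1, N))   # ancestors
    flip = rng.random((P1, P2, N)) < (1 - b) / 2           # independent sign flips
    xi_desc = np.where(flip, -xi_anc[:, None, :], xi_anc[:, None, :])
    return xi_anc, xi_desc                                 # shapes (P1, N), (P1, P2, N)

# Example: the empirical correlation should be close to b.
xi_anc, xi_desc = generate_patterns(N=1000, P1=4, P2=4, b=0.6)
print((xi_desc * xi_anc[:, None, :]).mean())               # approximately 0.6
```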

3. Model

The original CASM1 is introduced, following which we present CASM2. To describe the models below, we define a noisy input pattern $\{\zeta_i^{\mu\nu}\}$ originating from $\{\xi_i^{\mu\nu}\}$ as

$$\zeta_i^{\mu\nu} = \xi_i^{\mu\nu} r_i, \qquad (1)$$

where $\{r_i\}$ is a biased pattern whose bias is $c$ $(-1 \le c \le 1)$. The components of $\{r_i\}$ take values of $\pm 1$ independently of $\{\xi_i^{\mu\nu}\}$. Since the sign of $\xi_i^{\mu\nu}$ is reversed when $r_i = -1$, $\{r_i\}$ acts as noise embedded in $\{\xi_i^{\mu\nu}\}$.
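Continuing the sketch above, a noisy input per Eq. (1) can be produced as follows (`noisy_input` is our name, not the paper's): the biased noise pattern $\{r_i\}$ takes $+1$ with probability $(1+c)/2$, so its mean is $c$, and the initial overlap with the stored pattern is $c$ on average.

```python
def noisy_input(xi, c):
    """Eq. (1): multiply a stored pattern by a biased +/-1 noise pattern
    with Pr(r_i = +1) = (1 + c) / 2, i.e. E[r_i] = c."""
    r = np.where(rng.random(xi.shape) < (1 + c) / 2, 1, -1)
    return xi * r

zeta = noisy_input(xi_desc[0, 0], c=0.4)
print((zeta * xi_desc[0, 0]).mean())   # initial overlap m(0), close to c = 0.4
```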

3.1. CASM1

We introduce CASM1 briefly (Hirahara et al., 1997). Fig. 1 shows the structure of CASM1, which is composed of two auto-associative memories, AM1 and AM2. When a noisy pattern $\{\zeta_i^{\mu\nu}\}$ is input to CASM1, AM1, which stores $\{\xi_i^\mu\}$, starts to change the state $\{x_i^{(1)}(t)\}$ as follows:

$$x_i^{(1)}(0) = \zeta_i^{\mu\nu},$$
$$x_i^{(1)}(t+1) = \mathrm{sgn}\Big[\sum_{j \neq i} w_{ij}^{(1)} x_j^{(1)}(t)\Big],$$
$$w_{ij}^{(1)} = \frac{1}{N-1} \sum_\mu \xi_i^\mu \xi_j^\mu,$$

where $\mathrm{sgn}(u) = 1$ for $u \ge 0$, and $-1$ otherwise. Iterating this dynamics, AM1 converges to $\{z_i^{(1)}(\{\zeta_j^{\mu\nu}\})\}$, which is the stable state when $\{\zeta_i^{\mu\nu}\}$ is input to CASM1.
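A direct NumPy transcription of AM1 might look as follows (a sketch, with names of our choosing; we update all units synchronously, whereas the paper does not specify the update schedule).

```python
def am1_weights(xi_anc):
    """Hebbian storage of the ancestors: w_ij = (1/(N-1)) sum_mu xi_i xi_j."""
    P1, N = xi_anc.shape
    W1 = xi_anc.T @ xi_anc / (N - 1)
    np.fill_diagonal(W1, 0.0)              # enforce the j != i restriction
    return W1

def am1_recall(W1, zeta, max_steps=100):
    """Iterate x_i(t+1) = sgn(sum_j w_ij x_j(t)) until a fixed point."""
    x = zeta.copy()
    for _ in range(max_steps):
        x_new = np.where(W1 @ x >= 0, 1, -1)   # sgn(u) = 1 for u >= 0
        if np.array_equal(x_new, x):
            break
        x = x_new
    return x
```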

Here, we assume that $\{\zeta_i^{\mu\nu}\}$ is in the basin of attraction around $\{\xi_i^\mu\}$, which can be expressed as

$$\frac{1}{N} \sum_i \xi_i^\mu z_i^{(1)}(\{\zeta_j^{\mu\nu}\}) = 1 \quad \text{for all } \mu \text{ and } \nu. \qquad (2)$$

Fig. 1. Structure of CASM1, composed of two auto-associative memories: AM1 storing the ancestors $\{\xi_i^\mu\}$, and AM2 storing the difference patterns $\{\eta_i^{\mu\nu}\}$, which contain only information on the differences between $\{\xi_i^\mu\}$ and the descendants $\{\xi_i^{\mu\nu}\}$. The difference patterns $\{\eta_i^{\mu\nu}\}$ are biased and independent of one another even if $\mu$ is the same. AM2 stores all the difference patterns uniformly. Hence, the memory structure of AM2 is uniform.

Then, the difference patterns $\{\eta_i^{\mu\nu}\}$ to be stored in AM2 are defined by

$$\eta_i^{\mu\nu} \equiv \xi_i^{\mu\nu} z_i^{(1)}(\{\zeta_j^{\mu\nu}\}) = \xi_i^{\mu\nu} \xi_i^\mu.$$

Since $\eta_i^{\mu\nu} = 1$ for $\xi_i^\mu = \xi_i^{\mu\nu}$ and $-1$ otherwise, $\{\eta_i^{\mu\nu}\}$ contains only information on the differences between $\{\xi_i^\mu\}$ and $\{\xi_i^{\mu\nu}\}$. The difference patterns $\{\eta_i^{\mu\nu}\}$ are biased patterns whose bias is equal to the correlation $b$, and they become sparser with increasing $b$. They have the following relationship:

$$E[\eta_i^{\mu\nu} \eta_i^{\mu'\nu'}] = \begin{cases} 1 & \text{if } \mu = \mu' \text{ and } \nu = \nu', \\ b^2 & \text{otherwise.} \end{cases} \qquad (3)$$

Thus, the difference patterns $\{\eta_i^{\mu\nu}\}$ are independent of one another even if $\mu$ is the same. All of them are stored uniformly in the weight matrix $\{w_{ij}^{(2)}\}$ of AM2 as follows:

$$w_{ij}^{(2)} = \frac{1}{(N-1)(1-b^2)} \sum_{\mu\nu} (\eta_i^{\mu\nu} - b)(\eta_j^{\mu\nu} - b). \qquad (4)$$

Recalling in AM2 starts at time $t = t_c$, where $t_c$ is the time when AM1 converges to $\{z_i^{(1)}(\{\zeta_j^{\mu\nu}\})\}$. Starting from the initial state $\{x_i^{(2)}(t_c)\} = \{\zeta_i^{\mu\nu} z_i^{(1)}(\{\zeta_j^{\mu\nu}\})\}$ produced by the left circle "⊗" shown in Fig. 1, AM2 changes the state $\{x_i^{(2)}(t)\}$ as follows:

$$h_i^{(2)}(t) = \sum_{j \neq i} w_{ij}^{(2)} (x_j^{(2)}(t) - b) + b,$$
$$x_i^{(2)}(t+1) = \mathrm{sgn}(h_i^{(2)}(t)).$$

Thus, recalling in AM2 proceeds without any interaction with AM1. This is also shown in Fig. 1, where the self-feedback of AM2 has no interaction with AM1. Iterating this dynamics, AM2 converges to $\{z_i^{(2)}(\{\zeta_j^{\mu\nu}\})\}$, which is the stable state when $\{\zeta_i^{\mu\nu}\}$ is input to CASM1. At the same time, CASM1 also converges to the stable state $\{z_i^{(\ast)}(\{\zeta_j^{\mu\nu}\})\} = \{z_i^{(1)}(\{\zeta_j^{\mu\nu}\})\, z_i^{(2)}(\{\zeta_j^{\mu\nu}\})\}$, which is produced by the right circle "⊗" shown in Fig. 1. Hence, the recalling process of CASM1 is divided into two stages, and its network state $\{x_i(t)\}$ is written in the form

$$x_i(t) = \begin{cases} x_i^{(1)}(t) & \text{if } t < t_c, \\ z_i^{(1)}(\{\zeta_j^{\mu\nu}\})\, x_i^{(2)}(t) & \text{otherwise.} \end{cases}$$

When $\{\xi_i^{\mu\nu}\}$ is input to CASM1, AM1 and AM2 converge to $\{z_i^{(1)}(\{\xi_j^{\mu\nu}\})\} = \{\xi_i^\mu\}$ and $\{z_i^{(2)}(\{\xi_j^{\mu\nu}\})\} = \{\eta_i^{\mu\nu}\}$, respectively. Then, CASM1 converges to $\{z_i^{(\ast)}(\{\xi_j^{\mu\nu}\})\} = \{\xi_i^\mu \eta_i^{\mu\nu}\} = \{\xi_i^{\mu\nu}\}$, which is just the recalling target.
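Putting Eqs. (2)-(4) together, AM2 of CASM1 can be sketched as below (continuing the sketches above; names are ours, and `eta` holds all $P_1 P_2$ difference patterns row by row). Once AM1 has converged, the difference patterns follow directly from the stored patterns, $\eta_i^{\mu\nu} = \xi_i^{\mu\nu} \xi_i^\mu$.

```python
def difference_patterns(xi_anc, xi_desc):
    """eta_i^{mu,nu} = xi_i^{mu,nu} * xi_i^{mu}; returned with shape (P1*P2, N)."""
    P1, P2, N = xi_desc.shape
    return (xi_desc * xi_anc[:, None, :]).reshape(P1 * P2, N)

def casm1_am2_weights(eta, b):
    """Eq. (4): one covariance matrix over all difference patterns."""
    N = eta.shape[1]
    H = eta - b
    W2 = H.T @ H / ((N - 1) * (1 - b**2))
    np.fill_diagonal(W2, 0.0)
    return W2

def casm1_am2_step(W2, x2, b):
    """One AM2 update: x_i(t+1) = sgn(sum_j w_ij (x_j(t) - b) + b)."""
    h = W2 @ (x2 - b) + b
    return np.where(h >= 0, 1, -1)
```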

3.2. CASM2

In this section, we point out some problems in CASM1, following which we present CASM2. Since AM1 is the same in both models, the description of AM1 is omitted. However, AM1 in CASM2 plays an important role in recalling, as discussed below.

The difference patterns $\{\eta_i^{\mu\nu}\}$ are biased and independent of one another even if $\mu$ is the same. The problem of AM2 in CASM1 is that all of the difference patterns $\{\eta_i^{\mu\nu}\}$ are uniformly stored in the same weight matrix as in Eq. (4), although hierarchically correlated patterns form a tree structure: i.e. the descendants $\{\xi_i^{\mu\nu}\}$ $(\nu = 1, \ldots, P_2)$ belonging to the same ancestor $\{\xi_i^\mu\}$ are distributed in the cluster $\mu$ whose center is $\{\xi_i^\mu\}$. In other words, CASM1 does not retain the tree structure, and its memory structure is uniform in every level of the tree.

As is easily predicted from the tree structure, the recalling of a descendant $\{\xi_i^{\mu\nu}\}$ will be improved by restricting the network state $\{x_i(t)\}$ so as not to differ greatly from the center of the cluster $\mu$ in which the target descendant $\{\xi_i^{\mu\nu}\}$ exists. The network state $\{x_i(t)\}$ should be kept close to the ancestor $\{\xi_i^\mu\}$ to which the target $\{\xi_i^{\mu\nu}\}$ belongs. In order to realize this, we propose CASM2, shown in Fig. 2. The point is that AM2 in CASM2 is a recurrent hetero-associative memory which stores the $P_1 P_2$ input-output pairs $\{\{\xi_i^{\mu\nu}\}, \{\eta_i^{\mu\nu}\}\}$ $(\mu = 1, \ldots, P_1;\ \nu = 1, \ldots, P_2)$ in the weight matrix $\{w_{ij}^{(2)}\}$ as follows:

$$w_{ij}^{(2)} = \frac{1}{(N-1)(1-b^2)} \sum_{\mu\nu} (\eta_i^{\mu\nu} - b)\big(\xi_j^{\mu\nu} - z_j^{(1)}(\{\zeta_k^{\mu\nu}\})\, b\big), \qquad (5)$$

where $\{z_i^{(1)}(\{\zeta_j^{\mu\nu}\})\}$ is identical to $\{\xi_i^\mu\}$ from the assumption (2). Since AM2 is a hetero-associative memory, it is convenient to consider that AM2 has an input and an output layer, so that $w_{ij}^{(2)}$ can be regarded as the weight from input unit $j$ to output unit $i$.

Fig. 2. Structure of CASM2, composed of two associative memories. AM1 is an auto-associative memory storing $\{\xi_i^\mu\}$, and AM2 is a recurrent hetero-associative memory associating $\{\xi_i^{\mu\nu}\}$ with $\{\eta_i^{\mu\nu}\}$. When a target descendant $\{\xi_i^{\mu\nu}\}$ is input, AM1 first recalls its ancestor $\{\xi_i^\mu\}$. Then, the dynamics of AM2 combines the ancestor $\{\xi_i^\mu\}$ and the weight matrix to activate the covariance matrix which is responsible for recalling only the difference patterns $\{\eta_i^{\mu\nu}\}$ $(\nu = 1, \ldots, P_2)$ originating from the recalled ancestor $\{\xi_i^\mu\}$ and its descendants $\{\xi_i^{\mu\nu}\}$ $(\nu = 1, \ldots, P_2)$. This mechanism suppresses the cross-talk noise generated by the other difference patterns $\{\eta_i^{\mu'\nu}\}$ $(\mu' \neq \mu;\ \nu = 1, \ldots, P_2)$, and the recalling ability is enhanced.

In CASM2 as well as in CASM1, recalling in AM2 starts at time $t = t_c$. The initial state $\{x_i^{(2)}(t_c)\}$ in the input layer is the same as the input pattern $\{\zeta_i^{\mu\nu}\}$, and the input state $\{x_i^{(2)}(t)\}$ is changed as

$$h_i^{(2)}(t) = \sum_{j \neq i} w_{ij}^{(2)} \big(x_j^{(2)}(t) - z_j^{(1)}(\{\zeta_k^{\mu\nu}\})\, s(t)\big) + \theta(t), \qquad (6)$$
$$y_i^{(2)}(t) = \mathrm{sgn}(h_i^{(2)}(t)),$$


$$x_i^{(2)}(t+1) = z_i^{(1)}(\{\zeta_j^{\mu\nu}\})\, y_i^{(2)}(t), \qquad (7)$$

where $\theta(t)$ is the threshold at time $t$, and $\{z_i^{(1)}(\{\zeta_j^{\mu\nu}\})\}$ is identical to $\{\xi_i^\mu\}$ from the assumption (2). The input state $\{x_i^{(2)}(t)\}$, shifted by $\{z_i^{(1)}(\{\zeta_j^{\mu\nu}\})\, s(t)\}$, is propagated to the output layer, and the state $\{y_i^{(2)}(t)\}$ in the output layer is calculated. As shown in Fig. 2, the output state $\{y_i^{(2)}(t)\}$ is fed back through the ascending pathway, which intersects with the descending pathway from AM1 at the circle "⊗". The function of the circle "⊗" is to produce the next input state $\{x_i^{(2)}(t+1)\}$ by combining the information on $\{y_i^{(2)}(t)\}$ from the ascending pathway and $\{z_i^{(1)}(\{\zeta_j^{\mu\nu}\})\}$ $(= \{\xi_i^\mu\})$ from the descending pathway, as in Eq. (7). This function forces the input state $\{x_i^{(2)}(t+1)\}$ to be close to the center $\{z_i^{(1)}(\{\zeta_j^{\mu\nu}\})\}$ $(= \{\xi_i^\mu\})$ of the cluster in which the recalling target $\{\xi_i^{\mu\nu}\}$ exists. Thus, AM1 and AM2 in CASM2 cooperate to recall $\{\xi_i^{\mu\nu}\}$, in contrast to CASM1.
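In code, the hetero-associative weight matrix of Eq. (5) and one update of Eqs. (6) and (7) might look as follows (a sketch under the assumption (2), so the recalled ancestor stands in for $\{z_i^{(1)}\}$; all names are ours).

```python
def casm2_am2_weights(eta, xi_desc_flat, xi_anc_rep, b):
    """Eq. (5): w_ij = (1/((N-1)(1-b^2))) sum (eta_i - b)(xi_desc_j - xi_anc_j * b).
    eta, xi_desc_flat and xi_anc_rep all have shape (P1*P2, N); xi_anc_rep
    repeats each ancestor P2 times, e.g. np.repeat(xi_anc, P2, axis=0), so
    that row k holds the ancestor of descendant k."""
    N = eta.shape[1]
    W2 = (eta - b).T @ (xi_desc_flat - xi_anc_rep * b) / ((N - 1) * (1 - b**2))
    np.fill_diagonal(W2, 0.0)
    return W2

def casm2_am2_step(W2, x2, anc, s, theta):
    """Eqs. (6)-(7): shift the input state by the recalled ancestor, apply the
    threshold, and gate the output back through the ancestor (the circle of
    Fig. 2)."""
    h = W2 @ (x2 - anc * s) + theta
    y = np.where(h >= 0, 1, -1)
    return anc * y                      # next input-layer state x2(t+1)
```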

As with CASM1, the recalling process of CASM2 is divided into two stages, and its network state $\{x_i(t)\}$ is defined by

$$x_i(t) = \begin{cases} x_i^{(1)}(t) & \text{if } t < t_c, \\ x_i^{(2)}(t) = z_i^{(1)}(\{\zeta_j^{\mu\nu}\})\, y_i^{(2)}(t-1) & \text{otherwise.} \end{cases}$$

When $\{\xi_i^{\mu\nu}\}$ is input to CASM2, AM1 converges to $\{z_i^{(1)}(\{\zeta_j^{\mu\nu}\})\} = \{\xi_i^\mu\}$. In AM2, the weights are asymmetric, $w_{ij}^{(2)} \neq w_{ji}^{(2)}$, so that we cannot regard the recalling in AM2 as a monotonically decreasing process in an energy function. Although the convergence of the dynamics in AM2 is not guaranteed, the input state $\{x_i^{(2)}(t)\}$ approaches the target descendant $\{\xi_i^{\mu\nu}\}$ with some fluctuations, and substantially converges to $\{z_i^{(2)}(\{\zeta_j^{\mu\nu}\})\} = \{\xi_i^{\mu\nu}\}$ (the oscillation of a few units was often observed in our simulations).

To understand the convergence property of AM2 intuitively, we rewrite Eq. (5) as

$$w_{ij}^{(2)} = \sum_\mu \xi_j^\mu\, w_{ij}^{(2)}(\mu),$$
$$w_{ij}^{(2)}(\mu) = \frac{1}{(N-1)(1-b^2)} \sum_\nu (\eta_i^{\mu\nu} - b)(\eta_j^{\mu\nu} - b),$$

where we use the assumption (2) and the equality $\{\xi_i^{\mu\nu}\} = \{\xi_i^\mu \eta_i^{\mu\nu}\}$. The symmetric matrix $\{w_{ij}^{(2)}(\mu)\}$ is responsible for recalling only the $P_2$ difference patterns $\{\eta_i^{\mu\nu}\}$ $(\nu = 1, \ldots, P_2)$ originating from the $P_2$ descendants $\{\xi_i^{\mu\nu}\}$ $(\nu = 1, \ldots, P_2)$ of the ancestor $\{\xi_i^\mu\}$, and we call it the covariance matrix for the cluster $\mu$. The weight matrix $\{w_{ij}^{(2)}\}$ is, thus, in the form of a pile of the covariance matrices $\{w_{ij}^{(2)}(\mu)\}$ $(\mu = 1, \ldots, P_1)$ weighted by the ancestors $\{\xi_j^\mu\}$. The difference patterns $\{\eta_i^{\mu\nu}\}$ $(\nu = 1, \ldots, P_2)$ of the cluster $\mu$ are memorized separately from those of the other clusters $\mu'$ $(\neq \mu)$. Using the covariance matrices $\{w_{ij}^{(2)}(\mu)\}$, when $\{\xi_i^{11}\}$ is input, Eq. (6) can be rewritten in the form

$$h_i^{(2)}(t) = \sum_{j \neq i} \sum_\mu w_{ij}^{(2)}(\mu)\, \xi_j^\mu \xi_j^1 (\eta_j^{11} - b) + b = \sum_{j \neq i} w_{ij}^{(2)}(1)(\eta_j^{11} - b) + \sum_{\mu \neq 1} \sum_{j \neq i} w_{ij}^{(2)}(\mu)\, \xi_j^\mu \xi_j^1 (\eta_j^{11} - b) + b, \qquad (8)$$

where we put $s(t) = \theta(t) = b$. Since the ancestors $\{\xi_i^\mu\}$ are random patterns and are independent of the other patterns that appear in the second term, we assume that the second term is negligible. Then, we roughly reduce Eq. (8) to

$$h_i^{(2)}(t) \approx \sum_{j \neq i} w_{ij}^{(2)}(1)(\eta_j^{11} - b) + b. \qquad (9)$$

Since the covariance matrix $\{w_{ij}^{(2)}(1)\}$ for the cluster 1 is symmetric, AM2 substantially converges to a stable state $\{z_i^{(2)}(\{\xi_j^{11}\})\}$. Thus, the dynamics of AM2 works as a soft weight switching mechanism that selects a symmetric covariance matrix $\{w_{ij}^{(2)}(\mu)\}$ by merging the asymmetric weight matrix $\{w_{ij}^{(2)}\}$ and the ancestor $\{\xi_i^\mu\}$ recalled in AM1. Thus, AM2 in CASM2 closely communicates with AM1, in contrast to CASM1.

Roughly speaking, the memory structure of CASM2 is hierarchical. According to the $P_1$ ancestors $\{\xi_i^\mu\}$ $(\mu = 1, \ldots, P_1)$, the learning algorithm of AM2 groups the $P_1 P_2$ input-output pairs $\{\{\xi_i^{\mu\nu}\}, \{\eta_i^{\mu\nu}\}\}$ $(\mu = 1, \ldots, P_1;\ \nu = 1, \ldots, P_2)$ into $P_1$ groups $\mu$ $(\mu = 1, \ldots, P_1)$, each of which consists of the $P_2$ input-output pairs $\{\{\xi_i^{\mu\nu}\}, \{\eta_i^{\mu\nu}\}\}$ $(\nu = 1, \ldots, P_2)$ belonging to the same ancestor $\{\xi_i^\mu\}$. This procedure organizes the hierarchical memory structure in the weight matrix $\{w_{ij}^{(2)}\}$, where the groups are memorized separately. Fig. 3 shows a schematic diagram of recalling in CASM2. In recalling a target descendant $\{\xi_i^{\mu\nu}\}$, AM1 recalls its ancestor $\{\xi_i^\mu\}$ and provides it to AM2. Then, the dynamics of AM2 combines the recalled ancestor $\{\xi_i^\mu\}$ and the weight matrix $\{w_{ij}^{(2)}\}$ as in Eqs. (8) and (9), dominantly activates the covariance matrix $\{w_{ij}^{(2)}(\mu)\}$ for the cluster $\mu$, and recalls the target descendant $\{\xi_i^{\mu\nu}\}$ on the input layer.

Fig. 3. Schematic of recalling in CASM2. CASM2 has a hierarchical memory structure, which is organized in the weight matrix of AM2. The weight matrix is in the form of a pile of the covariance matrices $\{w_{ij}^{(2)}(\mu)\}$ $(\mu = 1, \ldots, P_1)$ weighted by the ancestors $\{\xi_j^\mu\}$. The difference patterns $\{\eta_i^{\mu\nu}\}$ $(\nu = 1, \ldots, P_2)$ of the cluster $\mu$ are memorized separately from those of the other clusters $\mu'$ $(\neq \mu)$. In recalling a target descendant $\{\xi_i^{\mu\nu}\}$, AM1 first recalls its ancestor $\{\xi_i^\mu\}$. Then, the dynamics of AM2 combines the ancestor $\{\xi_i^\mu\}$ and the weight matrix to activate the covariance matrix $\{w_{ij}^{(2)}(\mu)\}$. AM2 communicates with AM1 to recall the target $\{\xi_i^{\mu\nu}\}$.

4. Model comparisons and dynamic threshold

Under the assumption (2), the AM2s in CASM1 and CASM2 are compared using signal-to-noise ratio (S/N) analysis. Although the AM1s in the two models have different roles, as discussed above, they are the same under the S/N analysis. Furthermore, a dynamic threshold is introduced to enlarge the size of the basins of attraction.

When $\{\zeta_i^{11}\}$ given by Eq. (1) is input to CASM1, the initial state of AM2 becomes $\{x_i^{(2)}(t_c)\} = \{\eta_i^{11} r_i\}$ from the assumption (2). Then, the inner state $\{h_i^{(2)}(t_c)\}$ is written in the form

$$h_i^{(2)}(t_c) = S_i + R_{1i},$$
$$S_i = c\,\eta_i^{11} + b(1-c),$$
$$R_{1i} = \sum_\mu R_{1\mu i},$$
$$R_{1\mu i} = \begin{cases} \dfrac{1}{(N-1)(1-b^2)} \displaystyle\sum_{\nu \neq 1} (\eta_i^{\mu\nu} - b) \sum_{j \neq i} (\eta_j^{\mu\nu} - b)(\eta_j^{11} r_j - b) & \text{if } \mu = 1, \\[2mm] \dfrac{1}{(N-1)(1-b^2)} \displaystyle\sum_\nu (\eta_i^{\mu\nu} - b) \sum_{j \neq i} (\eta_j^{\mu\nu} - b)(\eta_j^{11} r_j - b) & \text{otherwise,} \end{cases} \qquad (10)$$

where $S_i$ is the signal, and $R_{1\mu i}$ is the cross-talk noise generated by the difference patterns $\{\eta_i^{\mu\nu}\}$ belonging to the same ancestor $\{\xi_i^\mu\}$. The total noise $R_{1i}$ is the sum of $R_{1\mu i}$ over all $\mu$.

Here, we assume that $N$ is large enough. Then, $E[\eta_i^{11} r_i] = bc$, since $\{r_i\}$ is a biased pattern whose bias value is $c$ and which is independent of $\{\eta_i^{11}\}$. Taking this into account and using Eq. (3), the characteristics of CASM1 are summarized as follows:

$$S_i = c\,\eta_i^{11} + b(1-c),$$
$$E[R_{1i}] = 0,$$
$$V[R_{1i}] = \frac{P_1 P_2}{N}(1 - 2b^2 c + b^2).$$

If we put $c = 1$ $(\{\zeta_i^{11}\} = \{\xi_i^{11}\})$, the results reduce to $|S_i| = 1$, $E[R_{1i}] = 0$, and $V[R_{1i}] = P_1 P_2 (1 - b^2)/N$. In this case, the storage capacity increases drastically with increasing correlation $b$, and is given by $\alpha_C = \alpha_C^\ast / (1 - b^2)$, where $\alpha_C^\ast$ $(\approx 0.15)$ is the storage capacity of standard associative memory storing uncorrelated patterns (Okada et al., 1993; Okada, 1996).

In CASM2, the initial state $\{x_i^{(2)}(t_c)\}$ of the input layer in AM2 is identical to $\{\zeta_i^{11}\}$. Then, the inner state $\{h_i^{(2)}(t_c)\}$ becomes

$$h_i^{(2)}(t_c) = S_i + R_{2i},$$
$$S_i = c\,\eta_i^{11} + (\theta(t) - bc), \qquad (11)$$
$$R_{2i} = \sum_\mu R_{2\mu i},$$
$$R_{2\mu i} = \begin{cases} \dfrac{1}{(N-1)(1-b^2)} \displaystyle\sum_{\nu \neq 1} (\eta_i^{\mu\nu} - b) \sum_{j \neq i} (\eta_j^{\mu\nu} - b)(\eta_j^{11} r_j - s(t)) & \text{if } \mu = 1, \\[2mm] \dfrac{1}{(N-1)(1-b^2)} \displaystyle\sum_\nu (\eta_i^{\mu\nu} - b) \sum_{j \neq i} \xi_j^\mu \xi_j^1 (\eta_j^{\mu\nu} - b)(\eta_j^{11} r_j - s(t)) & \text{otherwise,} \end{cases} \qquad (12)$$

where we use the assumption (2) and the equality $\{\xi_i^{\mu\nu}\} = \{\xi_i^\mu \eta_i^{\mu\nu}\}$. If we fix the parameters as

$$s(t) = \theta(t) = b, \qquad (13)$$

we get the same results as for CASM1:

$$S_i = c\,\eta_i^{11} + b(1-c), \qquad (14)$$
$$E[R_{2i}] = 0, \quad V[R_{2i}] = \frac{P_1 P_2}{N}(1 - 2b^2 c + b^2). \qquad (15)$$

Hence, the storage capacity of CASM2 under Eq. (13) also increases with increasing $b$, and is given by $\alpha_C = \alpha_C^\ast / (1 - b^2)$. Thus, CASM1 and CASM2 under Eq. (13) have the same characteristics in the simple S/N analysis, even though Eq. (10) differs from Eq. (12) when $\mu \neq 1$. However, the difference is not negligible when $N$ is finite; the results of the following simulations with $N = 1000$ show significant advantages of CASM2 with respect to the storage capacity.

Here, we try to enlarge the size of the basins of attraction in CASM2, assuming that the cross-talk noise $R_{2i}$ obeys a Gaussian distribution with mean zero. This assumption implies that the performance in recalling might be enhanced as the minimum absolute value among the signals $S_i$ $(i = 1, \ldots, N)$ becomes larger (Okada, 1996). Under Eq. (13), the minimum absolute value is given by

$$|S_i| = |-c + b(1-c)|, \qquad (16)$$

where we assume $c \ge 0$ and put $\eta_i^{11} = -1$ in Eq. (14). Therefore, the units taking a value of $-1$ are easily affected by the cross-talk noise $R_{2i}$, in comparison with those taking a value of $+1$. The maximization of the minimum absolute value is realized by putting $\theta(t) = bc$ in Eq. (11). In this case, the signals have the same absolute value $|S_i| = |c|$, irrespective of their signs. However, we cannot set $\theta(t)$ to the fixed value $bc$, since $c$ is not constant over time $t$: in successful recalling, $c$ increases as the recalling progresses and finally becomes $c = 1$. Hence, we need a dynamic threshold $\theta(t)$ whose value takes $bc$ at time $t = t_c$ and increases with increasing time $t$ up to $\theta(t) = b$.

The dynamic threshold $\theta(t)$ is realized by the overlap between $\{z_i^{(1)}(\{\zeta_j^{\mu\nu}\})\}$ $(= \{\xi_i^\mu\})$ and $\{x_i^{(2)}(t)\}$. Since the initial state in the input layer of AM2 is given by $\{x_i^{(2)}(t_c)\} = \{\xi_i^{\mu\nu} r_i\}$, $\theta(t_c)$ becomes $\theta(t_c) = N^{-1} \sum_i \xi_i^\mu \xi_i^{\mu\nu} r_i = bc$. If the recalling is successful, $\{x_i^{(2)}(t)\}$ converges to $\{\xi_i^{\mu\nu}\}$, so that $\theta(t)$ becomes $\theta(t) = b$. If we also set $s(t)$ to the same overlap,

$$s(t) = \theta(t) = \frac{1}{N} \sum_i z_i^{(1)}(\{\zeta_j^{\mu\nu}\})\, x_i^{(2)}(t) = \frac{1}{N} \sum_i \xi_i^\mu x_i^{(2)}(t), \qquad (17)$$

we get the following characteristics of CASM2:

$$|S_i| = |c|, \qquad (18)$$
$$E[R_{2i}] = 0, \quad V[R_{2i}] = \frac{P_1 P_2}{N}(1 - b^2 c^2). \qquad (19)$$

Note that the sign of $c$ does not affect the behavior of CASM2 under Eq. (17). For all values of $c$ except $|c| = 1$, the minimum absolute value $|S_i|$ is larger than that given by Eq. (16), and the variance $V[R_{2i}]$ is smaller than that given by Eq. (15). These results imply that CASM2 under Eq. (17) has a higher recalling ability than CASM1 and than CASM2 under Eq. (13).
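As a sketch, the adaptive parameter of Eq. (17) is simply the running overlap between the recalled ancestor and the current input-layer state, recomputed at every step (names are ours, continuing the code above):

```python
def casm2_recall(W2, zeta, anc, max_steps=100):
    """Recall in AM2 of CASM2 with the dynamic threshold of Eq. (17):
    s(t) = theta(t) = (1/N) sum_i xi_i^mu x_i^{(2)}(t)."""
    N = zeta.shape[0]
    x2 = zeta.copy()
    for _ in range(max_steps):
        s = anc @ x2 / N                    # starts near b*c, grows toward b
        x2_new = casm2_am2_step(W2, x2, anc, s=s, theta=s)
        if np.array_equal(x2_new, x2):      # convergence is not guaranteed;
            break                           # a few units may keep oscillating
        x2 = x2_new
    return x2
```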

5. Numerical simulations

We carried out numerical simulations with $N = 1000$ to evaluate the storage capacity and the size of the basins of attraction. In the following simulations, we assumed that AM1 had already converged to $\{z_i^{(1)}(\{\zeta_j^{\mu\nu}\})\} = \{\xi_i^\mu\}$. This means that we only examined the ability of AM2.

Following Hirahara et al. (1997), "successful recall" was defined by satisfaction of the following condition:

$$\frac{1}{2N} \sum_i |\xi_i^{\mu\nu} - z_i^{(\ast)}(\{\zeta_j^{\mu\nu}\})| \le 0.1\, \frac{N^{(-)}}{N},$$

where $N^{(-)}$ $(= N(1-b)/2)$ is the number of components taking a value of $-1$ in $\{\eta_i^{\mu\nu}\}$. Using Eq. (1), the relationship between the recalling target $\{\xi_i^{\mu\nu}\}$ and the noisy input pattern $\{\zeta_i^{\mu\nu}\}$ is expressed by the initial overlap $m(0)$:

$$m(0) = \frac{1}{N} \sum_i \xi_i^{\mu\nu} \zeta_i^{\mu\nu} = c.$$
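In code, the success criterion and the initial overlap read as follows (a sketch with names of our choosing; the factor $|\xi - z|/2$ counts disagreeing components):

```python
def successful_recall(xi_target, z_star, b):
    """Normalized Hamming distance at most 0.1 * N_minus / N, where
    N_minus = N(1 - b)/2 is the number of -1 components in the
    corresponding difference pattern."""
    N = xi_target.shape[0]
    n_minus = N * (1 - b) / 2
    return np.abs(xi_target - z_star).sum() / (2 * N) <= 0.1 * n_minus / N

def initial_overlap(xi_target, zeta):
    """m(0) = (1/N) sum_i xi_i^{mu,nu} zeta_i^{mu,nu} (equal to c on average)."""
    return xi_target @ zeta / xi_target.shape[0]
```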

5.1. Storage capacity

The dependence of the storage capacity $\alpha_C$ on the correlation $b$ was examined. In the following simulations, $\alpha_C$ was defined as the maximum value of the loading level $\alpha$ $(\equiv P_1 P_2 / N)$ below which the average number of recall failures over 10 trials was smaller than 1. Each trial involved presenting every descendant $\{\xi_i^{\mu\nu}\}$ $(m(0) = 1)$ once to each model. We performed two types of simulations. The first type was carried out by incrementing $P_2$ while $P_1$ was fixed at 4; the loading level $\alpha$ rises gradually as $P_2$ is incremented, and finally exceeds $\alpha_C$. In contrast, the second type was done by incrementing $P_1$ while $P_2$ was fixed at 4.
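The first type of capacity experiment could be organized roughly as below (a sketch of the protocol built from the functions above, not the authors' code; it is slow for $N = 1000$, so a smaller $N$ is advisable for a quick check).

```python
def capacity_first_type(N, P1, b, n_trials=10, P2_max=200):
    """Increase P2 at fixed P1 until, on average over the trials, at least one
    descendant fails to be recalled from a noiseless input (m(0) = 1)."""
    for P2 in range(1, P2_max + 1):
        failures = 0
        for _ in range(n_trials):
            xi_anc, xi_desc = generate_patterns(N, P1, P2, b)
            eta = difference_patterns(xi_anc, xi_desc)
            W2 = casm2_am2_weights(eta, xi_desc.reshape(P1 * P2, N),
                                   np.repeat(xi_anc, P2, axis=0), b)
            for mu in range(P1):
                for nu in range(P2):
                    z = casm2_recall(W2, xi_desc[mu, nu].copy(), xi_anc[mu])
                    if not successful_recall(xi_desc[mu, nu], z, b):
                        failures += 1
        if failures / n_trials >= 1:
            return P1 * (P2 - 1) / N    # last loading level below failure
    return P1 * P2_max / N
```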

Fig. 4 shows $\alpha_C$ as a function of $b$ for $P_1 = 4$ (the first type of simulation) and $P_2 = 4$ (the second type). The dotted and solid lines are the results for the AM2s in CASM1 and CASM2 under Eq. (13), respectively. In both models, $\alpha_C$ increases sharply with $b$, which agrees with our prediction discussed above. Fig. 4 also demonstrates that AM2 in CASM2 has a larger storage capacity than that in CASM1. Furthermore, the results of the first and second types of simulation are almost the same in CASM1, while the result of the second type $(P_2 = 4)$ in CASM2 is better than that of the first type $(P_1 = 4)$. These results are explained as follows. In the second type $(P_2 = 4)$, the number of ancestors, $P_1$, is incremented while the condition $\alpha < \alpha_C$ is satisfied. This means that the first term in Eq. (8) remains the same, while the second term is changed. Since the second term is assumed to be negligible, an increase in $P_1$ (the second type) has a smaller influence on CASM2 than an increase in $P_2$ (the first type). The result of the second type in CASM2 was, therefore, better than that of the first type. In CASM1, all the difference patterns $\{\eta_i^{\mu\nu}\}$ are uniformly stored in the weight matrix as in Eq. (4), so that $\alpha_C$ does not depend on the ratio $P_1/P_2$. Thus, the storage capacity of AM2 in CASM2 is larger than that in CASM1 and becomes larger with increasing ratio $P_1/P_2$. The change in parameter setting (Eq. (13) → Eq. (17)) did not show any significant differences in storage capacity (not shown in this paper).

Fig. 4. Dependence of the storage capacity $\alpha_C$ on the correlation $b$, where $N = 1000$. The dotted and solid lines are the results obtained by the AM2s in CASM1 and CASM2, respectively. The lines with $P_1 = 4$ are the results of the simulation carried out by incrementing $P_2$ while $P_1$ was fixed at 4, and the lines with $P_2 = 4$ are the simulation results when $P_2$ was fixed at 4. In both models, $\alpha_C$ increases sharply with $b$. Moreover, AM2 in CASM2 has a larger storage capacity than that in CASM1. The storage capacity of AM2 in CASM2 becomes larger with increasing ratio $P_1/P_2$, while that of AM2 in CASM1 does not depend on it.

5.2. Basins of attraction

We examined the dependence of the size of the basins of attraction on the correlation $b$, where $P_1 = 50$ and $P_2 = 4$ $(\alpha = 0.2)$. Fig. 5a shows the results of AM2 in CASM1, where the surface demonstrates the average probability of successful recall over 10 trials as a function of $m(0)$ and $b$. Each dot indicates that the average probability is above 0.99. In AM2 of CASM1, the basins of attraction become larger as $b$ decreases, until $b = 0.76$. When $b < 0.76$, a further decrease in $b$ sharply reduces the basin size. The reason for this is that $\alpha_C$ becomes small with decreasing $b$.

Fig. 5b shows the results of AM2 in CASM2 under Eq. (13). A comparison between Fig. 5a and b in the range $b \ge 0.76$ does not show any significant differences. When $b < 0.76$, AM2 in CASM2 has larger basins of attraction than that in CASM1. This is because AM2 in CASM2 has a larger storage capacity than that in CASM1, as shown in Fig. 4. Fig. 5c demonstrates the performance of AM2 in CASM2 under Eq. (17), where an enlargement in the direction of the embedded noise $c$ $(= m(0))$ was also observed. This result is easily predicted from the characteristics of AM2 in CASM2: for all values of $c$ except $|c| = 1$, the minimum absolute signal $|S_i|$ given by Eq. (18) is larger than that given by Eq. (16), and the variance $V[R_{2i}]$ given by Eq. (19) is smaller than that given by Eq. (15).

For reference, we demonstrate the dynamical recalling process of CASM2 under Eq. (17) in Fig. 6. This figure shows a typical example of the time courses of the overlap $m(t)$ between the input state $\{x_i^{(2)}(t)\}$ and a recalling target $\{\xi_i^{\mu\nu}\}$, where we carried out the simulation under $N = 1000$, $P_1 = 50$, $P_2 = 4$ $(\alpha = 0.2)$ and $b = 0.6$. We also assumed that AM1 had already converged to the ancestor $\{\xi_i^\mu\}$ of the target descendant $\{\xi_i^{\mu\nu}\}$ at time $t = 0$. From this figure, if $m(0) \ge 0.12$, AM2 in CASM2 recalled the target $\{\xi_i^{\mu\nu}\}$, although the lines starting from $m(0) \ge 0.12$ actually converged to $m(t = \infty) = 0.996$. This indicates that the outputs of two units are in disagreement with the target $\{\xi_i^{\mu\nu}\}$. In CASM2 under Eq. (17), as well as under Eq. (13), these phenomena were often observed in our simulations, and their frequency was about twice that of completely correct recalling $(m(t = \infty) = 1)$. The oscillation of a few units was also observed. The line converging to $m(t = \infty) = -0.6$ indicates that all components of the output state $\{y_i^{(2)}(t)\}$ took $-1$ and the input state $\{x_i^{(2)}(t)\}$ coincided with the reverse pattern of the ancestor $\{\xi_i^\mu\}$. Furthermore, the lines converging to $m(t = \infty) = 0.356$ and $-0.356$ imply that AM2 recalled one of the other descendants $\{\xi_i^{\mu\nu'}\}$ $(\nu' \neq \nu)$ belonging to the same ancestor $\{\xi_i^\mu\}$, and its reverse pattern, respectively. These recalling properties of AM2 in CASM2, including the recall of spurious states, will be studied in our future work.

Fig. 5. Dependence of the size of the basins of attraction on the correlation $b$, where $N = 1000$, $P_1 = 50$ and $P_2 = 4$ $(\alpha = 0.2)$. The surfaces demonstrate the average probabilities of successful recall over 10 trials as a function of $m(0)$ and $b$, where (a)-(c) show the results obtained by the AM2s in CASM1, CASM2 under Eq. (13) and CASM2 under Eq. (17), respectively. AM2 in CASM2 under Eq. (13) has larger basins of attraction in the direction of $b$ than that in CASM1. AM2 in CASM2 under Eq. (17) shows an enlargement of the size of the basins of attraction in the direction of the embedded noise $c$ $(= m(0))$.

Fig. 6. Time courses of the overlap $m(t)$ between the input state $\{x_i^{(2)}(t)\}$ of AM2 in CASM2 and a recalling target $\{\xi_i^{\mu\nu}\}$, where $N = 1000$, $P_1 = 50$, $P_2 = 4$ $(\alpha = 0.2)$ and $b = 0.6$. In this simulation, we assumed that AM1 had already converged to $\{\xi_i^\mu\}$ at time $t = 0$. From this figure, if $m(0) \ge 0.12$, AM2 recalled the target $\{\xi_i^{\mu\nu}\}$. The line converging to $m(t = \infty) = -0.6$ indicates that $\{x_i^{(2)}(t)\}$ coincided with the reverse pattern of $\{\xi_i^\mu\}$. Furthermore, the lines converging to $m(t = \infty) = 0.356$ and $-0.356$ imply that AM2 recalled one of the other descendants $\{\xi_i^{\mu\nu'}\}$ $(\nu' \neq \nu)$ belonging to the same ancestor $\{\xi_i^\mu\}$, and its reverse pattern, respectively.

6. Discussion

Although the simple S/N analysis shows that the AM2s in CASM1 and CASM2 under Eq. (13) are the same, the simulation results shown in Fig. 4 demonstrate that AM2 in CASM2 has a larger storage capacity than that in CASM1. This enhancement of the storage capacity induces the enlargement of the size of the basins of attraction in the direction of the correlation $b$, as shown in Fig. 5b. Noise tolerance is also improved by setting the parameters as in Eq. (17), while the storage capacity remains almost the same as that under Eq. (13). In this case (Fig. 5c), the basin size enlarges considerably in the direction of the embedded noise $c$ $(= m(0))$. This improvement is easily predicted from the results of the S/N analysis given by Eqs. (18) and (19).

Here, we compare the size of the basins of attraction in CASM2 under Eq. (17) with that of the auto-associative memory with nonmonotone dynamics. Morita (1993) carried out numerical simulations under the same condition $(N = 1000,\ P_1 = 50,\ P_2 = 4\ (\alpha = 0.2))$ to evaluate the dependence of the critical overlap $m_C(b)$ on the correlation $b$, where $m_C(b)$ is the minimum value of $m(0)$ above which recalling ends in success. The dotted and solid lines shown in Fig. 7 demonstrate $m_C(b)$ of Morita's model and of AM2 in CASM2, respectively. The data for the dotted line were obtained from Morita (1993, Fig. 11). To draw the solid line, we used Fig. 5c: for every $b$, the minimum value of $m(0)$ at which the dotted marks appeared on the surface was regarded as $m_C(b)$. Although AM2 in CASM2 has large basins of attraction, we cannot conclude that CASM2 has larger basins of attraction than Morita's model. One of the reasons is that the evaluation criterion used to determine $m_C(b)$ is not given in Morita (1993). The most important reason is that, in all of the above simulations, we assumed that AM1 had already converged to $\{z_i^{(1)}(\{\zeta_j^{\mu\nu}\})\} = \{\xi_i^\mu\}$. This means that we only examined the ability of AM2. The upper limit of the storage and recalling abilities of CASM2 is bounded by that of AM1. Even if the basins of attraction in AM2 are very large, the basin size in CASM2 never exceeds that in AM1. From the viewpoint of the storage capacity, the maximum number of ancestors stored in the current AM1 as equilibria is about $0.15N$, which is the upper limit of $P_1$.

In the case of $P_1 = 50$, which is the condition of the above simulations, we can, however, compare the size of the basins of attraction between Morita's model and CASM2 under Eq. (17), although we still have to deal with the difference in the evaluation criterion used to determine $m_C(b)$. In this case, the loading level of AM1 is 0.05, and the critical overlap of AM1 (standard associative memory) is about 0.17 (Okada, 1996). Since the initial overlap in AM1 is given by $N^{-1} \sum_i \xi_i^\mu \zeta_i^{\mu\nu} = bc$, the actual critical overlap of AM1 becomes the function of the correlation $b$ given by $0.17/b$. The broken line shown in Fig. 7 demonstrates the estimated critical overlaps of AM1, and crosses the solid line of AM2 at $b \approx 0.66$. The upper area bounded by the solid and broken lines can be regarded as the basins of attraction in CASM2 under Eq. (17), and the boundary line gives the critical overlaps. From this figure, if $b < 0.72$, Morita's model has larger basins of attraction than CASM2, although the basin size in AM2 is larger than that in Morita's model when $b < 0.6$. If $b > 0.72$, CASM2 has larger basins of attraction than Morita's model, and this tendency becomes more pronounced with increasing $b$. Thus, CASM2 works better than Morita's model when hierarchically correlated patterns have a strong correlation.

Fig. 7. Comparison of the critical overlap $m_C(b)$ between CASM2 and the auto-associative memory with nonmonotone dynamics (Morita, 1993), where $N = 1000$, $P_1 = 50$, $P_2 = 4$ $(\alpha = 0.2)$. The dotted, broken and solid lines demonstrate $m_C(b)$ of Morita's model, AM1 in CASM2 and AM2, respectively. The upper area bounded by the solid and broken lines can be regarded as the basins of attraction in CASM2, and the boundary line gives $m_C(b)$. If $b < 0.72$, Morita's model has larger basins of attraction than CASM2, although the basin size in AM2 is larger than that in Morita's model when $b < 0.6$. If $b > 0.72$, CASM2 has larger basins of attraction than Morita's model, and this tendency becomes clearer with increasing $b$.

In conclusion, we have presented a cascade associative memory model (CASM2) with a hierarchical memory structure for storing hierarchically correlated patterns. CASM2 is characterized by the second associative memory (AM2), which is a hetero-associative memory with an asymmetric weight matrix $\{w_{ij}^{(2)}\}$ associating descendants $\{\xi_i^{\mu\nu}\}$ with difference patterns $\{\eta_i^{\mu\nu}\}$. The learning algorithm of AM2 groups the $P_1 P_2$ input-output pairs $\{\{\xi_i^{\mu\nu}\}, \{\eta_i^{\mu\nu}\}\}$ $(\mu = 1, \ldots, P_1;\ \nu = 1, \ldots, P_2)$ according to their ancestors $\{\xi_i^\mu\}$ $(\mu = 1, \ldots, P_1)$, and organizes the hierarchical memory structure in the weight matrix $\{w_{ij}^{(2)}\}$, where the groups are stored separately. The asymmetric weight matrix $\{w_{ij}^{(2)}\}$ is in the form of a sum of covariance matrices $\{w_{ij}^{(2)}(\mu)\}$ $(\mu = 1, \ldots, P_1)$ weighted by the ancestors $\{\xi_j^\mu\}$, where each covariance matrix $\{w_{ij}^{(2)}(\mu)\}$ is symmetric and is constructed from only the $P_2$ difference patterns $\{\eta_i^{\mu\nu}\}$ $(\nu = 1, \ldots, P_2)$ originating from the $P_2$ descendants $\{\xi_i^{\mu\nu}\}$ $(\nu = 1, \ldots, P_2)$ of each ancestor $\{\xi_i^\mu\}$. In recalling a target descendant $\{\xi_i^{\mu\nu}\}$, AM1 first recalls its ancestor $\{\xi_i^\mu\}$. Then, the dynamics of AM2 with dynamic threshold combines the asymmetric matrix $\{w_{ij}^{(2)}\}$ and the recalled ancestor $\{\xi_i^\mu\}$, and works as a soft weight switching mechanism to activate the symmetric covariance matrix $\{w_{ij}^{(2)}(\mu)\}$, which is responsible for recalling only the $P_2$ difference patterns $\{\eta_i^{\mu\nu}\}$ $(\nu = 1, \ldots, P_2)$ of the cluster $\mu$. This mechanism suppresses the cross-talk noise generated by the other difference patterns $\{\eta_i^{\mu'\nu'}\}$ $(\mu' \neq \mu;\ \nu = 1, \ldots, P_2)$, and enables CASM2 to have a larger storage capacity and larger basins of attraction than the original model (CASM1). Furthermore, CASM2 works better than the associative memory with nonmonotone dynamics proposed by Morita (1993) when hierarchically correlated patterns have a strong correlation. In future work, we will further examine the recalling properties of AM2, as discussed above. Furthermore, we will extend CASM2 to successively recall the descendants belonging to the same ancestor.

Acknowledgements

This work was supported in part by the Information-Technology Promotion Agency, Japan, for Advanced Software Enrichment.

References

Amari, S. (1977). Neural theory of association and concept-formation. Biological Cybernetics, 26, 175-185.

Amari, S. (1989). Characteristics of sparsely encoded associative memory. Neural Networks, 2, 451-457.

Amari, S., & Maginu, K. (1988). Statistical neurodynamics of associative memory. Neural Networks, 1, 63-73.

Amit, D. J., Gutfreund, H., & Sompolinsky, H. (1987). Information storage in neural networks with low levels of activity. Physical Review A, 35, 2293-2303.

Collier, N. (1997). Convergence time characteristics of an associative memory for natural language processing. Proceedings of the 15th International Joint Conference on Artificial Intelligence, Nagoya (pp. 1106-1111).

Feigelman, M. V., & Ioffe, L. B. (1987). The augmented models of associative memory: asymmetric interaction and hierarchy of patterns. International Journal of Modern Physics B, 1, 51-68.

Gutfreund, H. (1988). Neural networks with hierarchically correlated patterns. Physical Review A, 37, 570-577.

Hirahara, M., Oka, N., & Kindo, T. (1997). Associative memory with a sparse encoding mechanism for storing correlated patterns. Neural Networks, 10, 1627-1636.

Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences USA, 79, 2554-2558.

Kakeya, H., & Kindo, T. (1996). Hierarchical concept formation in associative memory composed of neuro-window elements. Neural Networks, 9, 1095-1098.

Lim, E. P., & Cherkassky, V. (1992). Semantic networks and associative databases: two approaches to knowledge representation and reasoning. IEEE Expert, 7, 31-40.

Morita, M. (1993). Associative memory with nonmonotone dynamics. Neural Networks, 6, 115-126.

Okada, M. (1995). A hierarchy of macrodynamical equations for associative memory. Neural Networks, 8, 833-838.

Okada, M. (1996). Notions of associative memory and sparse coding. Neural Networks, 9, 1429-1458.

Okada, M., Mimura, K., & Kurata, K. (1993). Sparsely encoded associative memory: static synaptic noise and static threshold noise. Proceedings of the International Joint Conference on Neural Networks, Nagoya (pp. 2624-2627).

Yoshizawa, S., Morita, M., & Amari, S. (1993). Capacity of associative memory using a nonmonotonic neuron model. Neural Networks, 6, 167-176.
