
IT 14 046

Degree project, 30 credits (Examensarbete 30 hp)
August 2014

On collective bandit behaviour

Valentin Lecheval

Institutionen för informationsteknologi
Department of Information Technology

Faculty of Science and Technology (Teknisk-naturvetenskaplig fakultet), UTH unit
Visiting address: Ångströmlaboratoriet, Lägerhyddsvägen 1, Hus 4, Plan 0
Postal address: Box 536, 751 21 Uppsala
Telephone: 018 – 471 30 03
Fax: 018 – 471 30 00
Website: http://www.teknat.uu.se/student

Abstract

On collective bandit behaviour

Valentin Lecheval

The collective decision process of Gambusia affinis (the mosquitofish) is investigated from the standpoint of online machine learning algorithms. A new algorithm, the Collaborative Exp3 algorithm, is derived from the adversarial bandits framework to model how groups of fish make collective decisions leading to consensus. Parameters are tuned by maximum likelihood estimation, and the performance of the algorithm is compared with the data. This work provides promising results for recovering information transfer within fish groups as well as for understanding the individual mechanisms involved in the collective decision process. It is the first published approach to connect online machine learning algorithms with data, hence bridging a gap between theory and biological practice.

Printed by: Reprocentralen ITC
IT 14 046
Examiner: Jarmo Rantakokko
Subject reviewer: Richard Mann
Supervisor: Kristiaan Pelckmans

Contents

1 Introduction
2 Model
  2.1 Adversarial bandit problem
    2.1.1 The Exp3 Algorithm
    2.1.2 Implementation
  2.2 Collaborative bandits
    2.2.1 Biological assumptions
    2.2.2 Formulation of the model
    2.2.3 Collection of data
    2.2.4 Fitting the model to data
3 Results
  3.1 Tuning parameters
  3.2 Model properties
4 Discussion
A Upper bound on the weak regret
  A.1 Theorem
  A.2 Proof
B Code for adjacency matrix generation


Chapter 1

Introduction

Humans often face problems that involve a trade-off between exploration and exploitation. For instance, an engineer may aim to reduce the cost of transporting packets between two points in a communication network. In such cases, we have to choose between gathering additional information (at the cost of possibly obtaining bad rewards) and exploiting known paths (at the cost of possibly missing the best reward). To address this common situation, research in mathematics has provided the multi-armed bandit framework, with algorithms implementing optimal strategies (see Chapter 2) and many applications in industry (Bubeck and Cesa-Bianchi, 2012).

Other animals are also regularly confronted with decisions involving an exploration/exploitation trade-off, and they too have strategies for finding optimal choices. One outstanding example is how animal societies or groups make collective choices. Most of the time, the patterns displayed collectively arise in reaction to external factors such as predation risk, but they are also constrained by the behavioural mechanisms animals use to stay together. These consist of multiple local interactions between individuals (e.g. individuals tend to copy their neighbours' choices, or use less direct mechanisms such as stigmergy¹), so that the collective patterns emerge from the complex coupling between the state of the group at a given time and the decisions individuals make in response: this is the self-organization principle (Sumpter, 2006). For instance, many ant species, through communication involving pheromones and/or tactile recruitment, are able to exploit food patches efficiently, with mechanisms allowing flexibility and exploration (e.g. following a pheromone track is a stochastic process, so workers can find new food patches or shorter tracks even while exploiting a rich spot) (Dussutour et al., 2009; Dussutour and Nicolis, 2013). Self-organization, as defined above, is a decentralized paradigm and does not require any leadership in animal groups, which is why biologists speak of swarm intelligence. The challenging question for researchers in the field of animal collective behaviour is therefore to find the individual mechanisms and interactions controlling the collective patterns (Weitz et al., 2012). It has been reported that group size can increase both the accuracy and the speed of decision-making in various species (Sumpter and Pratt, 2009), such as the cockroach Periplaneta americana (Canonge et al., 2011) or fish such as the mosquitofish (Gambusia affinis) and the three-spined stickleback (Gasterosteus aculeatus) (Ward et al., 2012, 2008), with mechanisms such as the quorum response. With respect to decision-making processes, these studies document how consensus emerges without leadership.

¹ Idea introduced by Pierre-Paul Grassé, who was working on the building behaviour of termites. He proposed that individuals can communicate efficiently through local modifications of their environment (Grassé, 1959).

In this work, I use the mosquitofish data published in (Ward et al., 2011). The set-up for these experiments is as follows: groups of n = 1, 2, 4, 8 and 16 fish are presented with a choice between the two branches of a Y-maze. One of these branches hides a predator (fake, but recognizable by the mosquitofish). According to (Ward et al., 2011), the larger the group, the more individuals concentrate their vigilance on smaller portions of their immediate habitat and use the social information provided by their congeners. This strategy results in more accurate decisions. At the individual scale, this theory can be seen as a trade-off: at each time step, either the fish exploits the information it has gathered on its own or from social cues, or it explores its local environment to obtain new information. This point of view relates nicely to the multi-armed bandit framework, developed to address exactly such a problem. Whereas these algorithms are commonly used to provide an optimal choice to the user, I aim here to investigate whether the strategy used by fish can reasonably be related to multi-armed bandits. This will provide quantitative and qualitative ideas about the mechanisms involved in information transfer and accurate decisions in fish shoals. The idea of using multi-armed bandits to address questions in animal behaviour (even with three-spined sticklebacks) is not new (McNamara and Houston, 1980, 1985; Thomas et al., 1985), but it has been restricted to individual behaviour and learning processes, and learning does not occur in our motivating question.

The challenge of this work is to provide a multi-armed bandit model that takes social cues and the absence of a learning procedure into account. I propose a formulation as well as an implementation of a new strategy motivated by biology, called the Collaborative bandits algorithm. I also provide the formulation of the likelihood associated with the algorithm and its parameters. The biological relevance of this new model is then discussed on the basis of data fitting and simulations.


Chapter 2

Model

2.1 Adversarial bandit problem

A single fish faces $K$ branches (arms), each potentially hosting a predator, and has to choose one of them; each branch (action) is denoted $i \in \{1, \dots, K\}$. This situation is repeated with the same branches and the same fish over several trials $t = 1, 2, \dots$. The choices of the fish are denoted $I_t \in \{1, \dots, K\}$ and the associated "feelings" (gain or reward) $g_{i,t} \in [0, 1]$, where $g_{i,t} = 0$ denotes a very stressful or harmful experience and $g_{i,t} = 1$ a pleasant one (this is the same as a loss of 1 when meeting the predator and a loss of 0 when avoiding it). It is assumed that the fish remembers the rewards of the previously chosen actions.

The choice of a theoretical model inspired by the multi-armed bandit framework, and specifically derived from the adversarial bandit problem (whose set-up is exactly the one described above), is motivated by three particularities of the set-up.

1. The rewards (information gathered through the choices of the fish) are presented in pieces over many rounds, trial after trial t. Such sequential decision problems are treated within the online convex optimization framework (Cesa-Bianchi and Lugosi, 2006).

2. At each trial t, the decision-maker receives only the reward $g_{I_t,t}$ associated with its choice $I_t$ and not the rewards $g_{i,t}$ of the other actions $i \neq I_t$. The multi-armed bandit problem is dedicated to this special case and comes in several formalizations, among which the stochastic and the adversarial¹ (non-stochastic) ones (see Cesa-Bianchi and Lugosi, 2006, chap. 6).

3. Each of these formalizations can be efficiently addressed by a specific playing strategy: the Upper Confidence Bound (UCB) algorithm in the stochastic case and the Exp3 randomized algorithm in the adversarial case (Bubeck and Cesa-Bianchi, 2012). In the stochastic case, the rewards $g_{i,t}$ are drawn independently from unknown probability distributions $\nu_i$ associated with the arms i. With fish, there is no obvious reason to assume that rewards come from independent draws; instead, I assume that all fish choosing the same arm i at time t would receive the same reward. I therefore address our multi-armed bandit problem with the Exp3 playing strategy.

¹ The adversarial formalization takes its name from the fact that it allows reward values to be set arbitrarily at each trial t, and even adversely (i.e. against the success of the game).


2.1.1 The Exp3 Algorithm

Exp3 stands for Exponential-weight algorithm for Exploration and Exploitation. Exponential weighting is a widely used procedure in what (Arora et al., 2012) call the Multiplicative Weights framework. It works by maintaining a weight for each action, using these weights to decide randomly which action to take next, and increasing (decreasing) the relevant weights when a reward is good (bad). A factor γ ∈ ]0, 1] sets the propensity to pick an action uniformly at random, that is, it tunes the trade-off between exploration and exploitation. If γ = 1, the weights have no effect on the choices at any step: the decision-maker performs only exploration.

(Auer et al., 2002) show that, for any T > 0 and given any Γ such that Γ > $G_{\max}$, choosing

$$\gamma_\Gamma = \min\left\{1, \sqrt{\frac{K \ln K}{(e - 1)\,\Gamma}}\right\} \tag{2.1}$$

with K the number of arms, in Equation A.1 leads to the following bound (see proof in Appendix A):

$$G_{\max} - \mathbb{E}[G_{\mathrm{Exp3}}] \le 2\sqrt{e - 1}\,\sqrt{\Gamma K \ln K} \le 2.63\,\sqrt{\Gamma K \ln K} \tag{2.2}$$

where $G_{\max} - \mathbb{E}[G_{\mathrm{Exp3}}]$ is the weak regret, $G_{\max} = \max_j \sum_{t=1}^{T} g_j(t)$ and $\mathbb{E}[G_{\mathrm{Exp3}}]$ is the expected total reward obtained by the Exp3 algorithm. The weak regret is an intuitive notion that compares the results of the algorithm with the best single action over all rounds. This is therefore the quantity I aim to minimize. I present the pseudo-code of Exp3 as given in (Auer et al., 2002) in Algorithm 1.

Algorithm 1 Exp3 algorithm - Pseudo-code
Parameters: γ ∈ ]0, 1]
1: Initialize the weights $w_i(0)$ for $i = 1, \dots, K$
2: for t = 1, 2, ... do
3:   Set $d_i(t) = (1 - \gamma) \frac{w_i(t)}{\sum_{j=1}^{K} w_j(t)} + \frac{\gamma}{K}$ for each i.
4:   Draw the next action $I_t$ randomly according to the distribution $d_i(t)$.
5:   Observe reward $g_{I_t}(t)$.
6:   Set the estimated reward $\hat{g}_{I_t}(t) = g_{I_t}(t) / d_{I_t}(t)$.
7:   Set $w_{I_t}(t+1) = w_{I_t}(t)\, e^{\gamma \hat{g}_{I_t}(t) / K}$.
8:   Set all other $w_j(t+1) = w_j(t)$.
9: end for
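To get a feeling for the orders of magnitude in Equations 2.1 and 2.2, both expressions can be evaluated numerically. The sketch below is only illustrative: the function names `exp3_gamma` and `exp3_regret_bound` are hypothetical, and taking Γ = 500 for K = 2 arms and 500 trials is an assumption made for the example (rewards lie in [0, 1], so no arm can gain more than one unit per trial).

```r
# Exploration parameter of Equation 2.1 for arms_nb arms (K in the text) and
# an upper bound upper_gain (Gamma in the text) on the gain of the best arm.
exp3_gamma <- function(arms_nb, upper_gain) {
  min(1, sqrt(arms_nb * log(arms_nb) / ((exp(1) - 1) * upper_gain)))
}

# Right-hand side of the weak regret bound in Equation 2.2.
exp3_regret_bound <- function(arms_nb, upper_gain) {
  2 * sqrt(exp(1) - 1) * sqrt(upper_gain * arms_nb * log(arms_nb))
}

# Illustrative values: two branches, Gamma assumed equal to 500
exp3_gamma(arms_nb = 2, upper_gain = 500)         # about 0.04
exp3_regret_bound(arms_nb = 2, upper_gain = 500)  # about 69
```

With these assumed values, the tuned γ is small, so the strategy mostly exploits, and the bound on the weak regret stays well below the 500 units of reward at stake.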

2.1.2 Implementation

Algorithm 1 is implemented in R (Team, 2014) with the following set-up (Listing 2.1). At each time step, the fish faces K = 2 branches. Branch i = 1 is predator free while branch i = 2 hosts a predator. When the fish chooses the branch hosting the predator, $g_{i,t} = 0$; in the absence of a predator, $g_{i,t} = 1$. The chosen branches and the evolution of the probability distribution over time are shown in Figure 2.1 for simulations with 500 time steps and different values of γ. With γ = 0.02, branch 1 is rapidly favoured by the algorithm, but branch 2 is still chosen from time to time, even in the last time steps: some exploration remains. When γ increases, the algorithm favours exploration over exploitation, up to the extreme case γ = 1 with no exploitation at all: the chosen arms are drawn from a uniform distribution.

Listing 2.1: Exp3 Algorithm

## Exp3 with arms_nb arms, exploration parameter gamma and t_max trials.
## reward_function(t_step, arm) is not defined in this listing: it must be
## supplied by the caller and return the reward of the chosen arm in [0, 1].
exp3 <- function(arms_nb, gamma, t_max) {
  distribution <- matrix(NA, nrow = t_max, ncol = arms_nb)
  weights <- rep(1, arms_nb)
  arm_chosen <- rep(NA, t_max)
  for (t_step in seq_len(t_max)) {
    ## Mix the normalised weights with the uniform distribution
    distribution[t_step, ] <- (1 - gamma) * (weights / sum(weights)) + gamma / arms_nb
    arm <- sample(1:arms_nb, 1, prob = distribution[t_step, ])
    arm_chosen[t_step] <- arm
    reward <- reward_function(t_step, arm)
    ## Importance-weighted estimate of the reward
    reward_est <- reward / distribution[t_step, arm]
    ## Only the weight of the chosen arm changes
    weights[arm] <- weights[arm] * exp(gamma * reward_est / arms_nb)
  }
  return(list(distribution, arm_chosen))
}
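Listing 2.1 relies on a `reward_function` that is not shown in the text. A minimal sketch consistent with the set-up described above (branch 1 predator free, branch 2 hosting the predator) could look as follows. Both this definition and the helper `exp3_step`, which replays a single update of Algorithm 1 by hand, are illustrative assumptions and not the code used for the simulations.

```r
# Hypothetical reward function for the two-branch set-up:
# branch 1 is predator free (reward 1), branch 2 hosts the predator (reward 0).
reward_function <- function(t_step, arm) {
  if (arm == 1) 1 else 0
}

# One Exp3 update for a given chosen arm (illustrative helper).
exp3_step <- function(weights, gamma, arm, t_step = 1) {
  arms_nb <- length(weights)
  distribution <- (1 - gamma) * weights / sum(weights) + gamma / arms_nb
  reward_est <- reward_function(t_step, arm) / distribution[arm]
  weights[arm] <- weights[arm] * exp(gamma * reward_est / arms_nb)
  list(distribution = distribution, weights = weights)
}

first_step <- exp3_step(weights = c(1, 1), gamma = 0.5, arm = 1)
first_step$distribution  # 0.5 0.5: equal weights give a uniform distribution
first_step$weights       # the weight of branch 1 grows to exp(0.5), branch 2 stays at 1
```

Choosing branch 2 instead leaves both weights unchanged, since its reward is 0 in this set-up: only rewarding choices shift the distribution.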

[Figure 2.1: three pairs of panels, for γ = 0.02, γ = 0.5 and γ = 1. Panel A: probability to pick Branch 1 and Branch 2 (from 0 to 1) against time (0 to 500). Panel B: branch chosen (1 or 2) against time (0 to 500).]

Figure 2.1: Results of three simulations of the Exp3 algorithm with T = 500 and for different values of γ ∈ {0.02, 0.5, 1}, with A the probability to pick branch 1 or 2 with respect to time and B the chosen branches with respect to time. When γ is small, the probability to pick the branch without predator is high and the densities of chosen arms are asymmetrical. For γ = 1, the distributions of chosen arms are identical.

2.2 Collaborative bandits

2.2.1 Biological assumptions

My biological model, following (Ward et al., 2011), is that the detection of predators by mosquitofish is facilitated by increasing group size thanks to social communication between shoal mates. The use of social information enables individuals to respond to threats without having to verify the presence of danger independently. When inspecting for predators, fish can rely on chemical substances or on visual cues, either emitted by the environment (e.g. the odour or the sight of the predator) or shared (intentionally or not) by congeners (e.g. the chemical alarm substance diffusing from an injured fish, or a fish escaping some undetected stimulus with a strong flight behaviour). Unlike visual cues, chemical substances might be hard for a predator to manipulate and may therefore be more reliable (Brown, 2003). However, as visual cues are likely to propagate much faster than any other cues through the members of a shoal (Brown and Laland, 2003), I assume that in the experiment from (Ward et al., 2011) considered here, fish only make use of visual cues, either private (direct detection of the predator) or public (monitoring the positions and behaviours of their shoal mates).

The information flow in a shoal can be altered by the structure of the group, which depends on various parameters such as the age, the sex or the number of congeners (Hoare and Krause, 2003). It seems that the more homogeneous the shoal is, the less the information flow is restricted to a sub-population within the group. For instance, in adult guppies, novel foraging information has been found to spread at a significantly faster rate through subgroups of females than of males (Reader and Laland, 2000).

It has also been reported that these parameters (e.g. the sex, the age or the number of individuals in a group) can affect an individual fish's propensity to explore, that is, its exploitation/exploration trade-off. For instance, the tendency to explore is expected to decrease in bigger groups because conformity may make individuals unlikely to break with the group (Day et al., 2001).

In short, I assume that the mosquitofish of a group communicate through visual cues, with an information flow that may depend on various parameters. These parameters can also affect the propensity of the individuals within a group to explore. I now turn to the formulation of these assumptions.

2.2.2 Formulation of the model

Starting from the set-up presented in section 2.1, I extend the formulation so as to take information flow and social structure in fish shoals into account. Several decision-makers are considered: n fish have to make a decision among K paths. There are as many trials as fish in the group, that is t = 1, 2, ..., n. At each time step, one fish makes a decision, shares information with congeners and then disappears from the set-up. Social communication links between fish in a shoal are represented by an adjacency matrix A of size n × n, where $A_{p,q} = 1$ if fish p shares information with fish q and $A_{p,q} = 0$ otherwise. If A is a diagonal matrix, no information is shared between individuals. If $A_{p,q} = 1$ for all p, q, each fish shares social cues with all its congeners.

This model requires only small modifications of Algorithm 1. The procedure is described in Algorithm 2 and implemented in Listing 2.2.

Algorithm 2 Collaborative Exp3 algorithm - Pseudo-code
Parameters: γ ∈ ]0, 1]
1: Set a fish pool F with n individuals
2: Initialize the weights $w_i(0)$ for $i = 1, \dots, K$ for each individual in F
3: Set the adjacency matrix $A_{p,q}$
4: for t = 1, 2, ..., n do
5:   Pick one decision-maker f in F
6:   Set $d_i(t) = (1 - \gamma) \frac{w_i^f(t)}{\sum_{j=1}^{K} w_j^f(t)} + \frac{\gamma}{K}$ for each i.
7:   Draw the next action $I_t$ randomly according to the distribution $d_i(t)$.
8:   Observe reward $g_{I_t}(t)$.
9:   Set the estimated reward $\hat{g}_{I_t}(t) = g_{I_t}(t) / d_{I_t}(t)$.
10:  Set $w_{I_t}(t+1) = w_{I_t}(t)\, e^{\gamma \hat{g}_{I_t}(t) / K}$ for each fish q such that $A_{f,q} = 1$.
11:  Set all other $w_j(t+1) = w_j(t)$ for $j \neq I_t$.
12:  Update $F = F \setminus \{f\}$
13: end for

Listing 2.2: Collaborative Exp3 Algorithm

collexp3 <- function(arms_nb, gamma, neighbours) {
  ## neighbours is an adjacency matrix modelling the communication of social
  ## cues: neighbours[p, q] == 1 if fish p shares information with fish q.
  ## Time steps and number of fish are merged.
  ## reward_function(t_step, arm) must be supplied by the caller.
  fish_nb <- nrow(neighbours)
  distribution <- matrix(NA, nrow = fish_nb, ncol = arms_nb)
  ## Initial weights for arms_nb = 2 according to Ward, 2011 (one row per fish)
  weights <- matrix(c(0.85, 0.15), nrow = fish_nb, ncol = arms_nb, byrow = TRUE)
  arm_chosen <- rep(NA, fish_nb)
  fish_pool <- seq_len(fish_nb)
  for (t_step in seq_len(fish_nb)) {
    ## A fish is randomly selected from the pool of available decision makers.
    ## Indexing avoids sample() drawing from 1:x when a single fish x is left.
    decision_maker <- fish_pool[sample.int(length(fish_pool), 1)]
    distribution[t_step, ] <- (1 - gamma) *
      (weights[decision_maker, ] / sum(weights[decision_maker, ])) + gamma / arms_nb
    arm <- sample(1:arms_nb, 1, prob = distribution[t_step, ])
    arm_chosen[t_step] <- arm
    reward <- reward_function(t_step, arm)
    reward_est <- reward / distribution[t_step, arm]
    ## Fish informed by the decision maker update their weight for the chosen arm
    dm_neighbours <- which(neighbours[decision_maker, ] == 1)
    weights[dm_neighbours, arm] <- weights[decision_maker, arm] *
      exp(gamma * reward_est / arms_nb)
    ## The decision maker leaves the pool
    fish_pool <- fish_pool[fish_pool != decision_maker]
  }
  return(list(distribution, arm_chosen))
}

Table 2.1: Collective performances in decision-making per group size

Group size   Number of fish          Number of fish         Proportion of fish      Proportion of fish
             avoiding the predator   meeting the predator   avoiding the predator   meeting the predator
1            60                      48                     0.56                    0.44
2            19                      7                      0.73                    0.27
4            37                      14                     0.71                    0.29
8            95                      15                     0.85                    0.15
16           186                     25                     0.83                    0.17

Figure 2.2: Group of 8 fish facing the bifurcation of the Y-maze. One of the arms contains a fake but recognizable predator while the other does not. The choice of each fish group is recorded as a sequence of chosen arms.

2.2.3 Collection of data

The data used in this work were published in (Ward et al., 2011), and I report here a short description of their experimental set-up. According to the article, experiments took place in a Y-maze constructed from white Perspex (Figure 2.2) with mosquitofish obtained from Lake Northam, Sydney, Australia. The stem of the "Y" was raised so that the water gradually increased in depth from 1 cm at the foot of the Y to 12 cm at the tips of the arms. A replica predator, measuring 12 cm in length, was allocated to one of the arms of the Y-maze at random and suspended in midwater using fine monofilament line. In pilot trials, the fish showed a strong aversive response to the predator once they detected it. Experimental fish were added to a clear container set in the stem of the Y. After 120 s the box was raised, releasing the fish. In all cases, the fish made their way down the Y and into one of the arms. Five different group sizes (1, 2, 4, 8 and 16 fish) were used, with respectively 108, 13, 13, 14 and 14 replications.

The dataset consists of sequences of chosen arms for each experiment (e.g. 1 1 2 1 for a group of 4 fish choosing successively branches 1, 1, 2 and 1, with branch 1 the clear branch and branch 2 the branch with the fake predator). Some fish do not take part in the decision process (e.g. a fish swimming backward) and are excluded from the analysis. The overall performances per group size are presented in Table 2.1.


2.2.4 Fitting the model to data

When playing the Collaborative bandits strategy introduced here, the sequence of arms chosen by the strategy depends on the distribution introduced in Algorithm 2:

$$d_i(t) = (1 - \gamma) \frac{w_i^f(t)}{\sum_{j=1}^{K} w_j^f(t)} + \frac{\gamma}{K}$$

for each branch i. We can therefore obtain the likelihood associated with a sequence of arms and a set of parameters θ by calculating

$$L(I_1, I_2, \dots, I_t, \dots, I_n; \theta(t)) = \prod_{t=1}^{n} d_{I_t}(t) \tag{2.3}$$

Equation 2.3 can be used in a maximum likelihood estimation to fit the model to data (section 3.1). An example of use is shown in Figure 2.3. Given a sequence, and thanks to the likelihood formula, I can recover the group structure A most likely to produce this sequence.
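The likelihood of Equation 2.3 can be evaluated by replaying the weight updates along an observed sequence. The sketch below is one possible reading, not the code of Appendix B. It assumes that the fish in row t of the adjacency matrix is the decision-maker at step t, that rewards are fixed per arm, and that informed fish are updated as in Listing 2.2. The function name `sequence_likelihood` and its default arguments are hypothetical.

```r
# Likelihood of an observed sequence of arms (Equation 2.3).
# Assumptions: fish t decides at step t, rewards are fixed per arm,
# and weights are shared as in Listing 2.2.
sequence_likelihood <- function(arm_sequence, neighbours, gamma,
                                init_weights = c(1, 1), rewards = c(1, 0)) {
  fish_nb <- length(arm_sequence)
  arms_nb <- length(init_weights)
  # One row of weights per fish
  weights <- matrix(init_weights, nrow = fish_nb, ncol = arms_nb, byrow = TRUE)
  likelihood <- 1
  for (t_step in seq_len(fish_nb)) {
    arm <- arm_sequence[t_step]
    d <- (1 - gamma) * weights[t_step, ] / sum(weights[t_step, ]) + gamma / arms_nb
    likelihood <- likelihood * d[arm]
    reward_est <- rewards[arm] / d[arm]
    # Fish informed by the current decision-maker update the chosen arm
    informed <- which(neighbours[t_step, ] == 1)
    weights[informed, arm] <- weights[t_step, arm] * exp(gamma * reward_est / arms_nb)
  }
  likelihood
}

no_sharing <- matrix(0, nrow = 4, ncol = 4)
full_sharing <- matrix(1, nrow = 4, ncol = 4)
sequence_likelihood(c(1, 1, 1, 1), no_sharing, gamma = 0.5)    # 0.5^4: every choice is a coin flip
sequence_likelihood(c(1, 1, 1, 1), full_sharing, gamma = 0.5)  # larger: shared rewards favour arm 1
```

Under these assumptions, a sequence in which all fish pick the rewarding arm is more likely when information is shared, which is the effect the maximum likelihood estimation exploits to recover A.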

[Figure 2.3: greyscale image of a 4 × 4 matrix for the sequence 1 1 1 2, with rows and columns indexed by fish 1 to 4 and a shade scale from 0.0 to 1.0.]

Figure 2.3: Adjacency matrix recovered from maximum likelihood estimation given the sequence 1 1 1 2. The arm denoted 1 is predator free while arm 2 contains a fake predator. γ = 0.5, $w_1(0) = w_2(0) = 1$, $g_1(t) = 1$ ∀t and $g_2(t) = 0$ ∀t. According to Equation 2.3, the likelihoods of all possible adjacency matrices of size 4 × 4 are computed. We then sum all matrices maximizing the likelihood given the sequence of arms and represent the result as the image above. The darker the cell, the closer to 1 the associated element. Here, we can see that the first decision-maker is likely to have informed the second fish well and the fourth one badly. Elements of the lower triangular matrix (including diagonal elements) have no influence on the calculation of the likelihood, since current decision-makers cannot communicate their findings to previous decision-makers.

Chapter 3

Results

3.1 Tuning parameters

The efficiency of the modelled shoal in making the right decision (i.e. avoiding the predator) can be tuned through the trade-off parameter γ, the value of the rewards, the number n of individuals, the initial weights, the quality of information transfer and the structure of the fish shoal (i.e. respectively the values in and the shape of the adjacency matrix A). One could think of a reliability parameter for information transfer but, for the sake of simplicity in this exploratory work, I do not consider it here and all interactions have the same importance.

In the experiments conducted by (Ward et al., 2011), I assume the age of the fish to be controlled: only fish of 26 ± 5 mm were used, and there is a strong correlation between size and age (Bell and Foster, 1994, chap. 5). Sex is not controlled because the authors did not report any difference in the accuracy of the decision-making process between males and females. However, they did not test the effect of the ratio of males to females within a group: both the communication flow and the group's propensity to explore might depend on this ratio.

I assume here that differences between groups occur only because of the number of fish in each group. In short, the number of fish can affect two components of our model: the information flow and the propensity to explore (that is, the exploitation/exploration trade-off of the individuals of the group), formalized respectively as A and γ.

From the results reported in Table 2.1 for the group size n = 1, I can infer that fish are able to detect the predator on their own, without the use of social cues. Accordingly, I arbitrarily set the initial weights to $w_1(0) = 0.85$ and $w_2(0) = 0.15$ for branches 1 and 2. The associated rewards are also arbitrarily set to $g_{1,t} = 1$ and $g_{2,t} = 0$ ∀t. The tuning of A and γ is addressed with a maximum likelihood estimation performed using Equation 2.3 over all experiments involving the group size n. The point is, given E experiments (i.e. sequences of chosen branches) with groups of n fish, to maximize the likelihood, now expressed as

$$L(I_{1,1}, \dots, I_{t,\mathrm{exp}}, \dots, I_{n,E}; A, \gamma) = \prod_{\mathrm{exp}=1}^{E} \prod_{t=1}^{n} d_{I_{t,\mathrm{exp}}}(t) \tag{3.1}$$

Two different methods are used to generate adjacency matrices A to be evaluated by the maximum likelihood estimation.

[Figure 3.1: four greyscale likelihood maps, panels A to D, titled "Groups of 2 fish", "Groups of 4 fish", "Groups of 8 fish" and "Groups of 16 fish". x-axis: number of interactions; y-axis: γ (0 to 1); shade: likelihood.]

Figure 3.1: Maximum likelihood estimation of the adjacency matrix A and of the exploration/exploitation parameter γ, with A, B, C and D respectively the results for groups of n = 2, 4, 8 and 16 fish. Likelihood minimum; maximum values are A $1.49 \times 10^{-8}$; $2.65 \times 10^{-7}$, B $3.55 \times 10^{-15}$; $6.17 \times 10^{-13}$, C $1.26 \times 10^{-29}$; $2.77 \times 10^{-11}$ and D $8.27 \times 10^{-25}$; $2.21 \times 10^{-12}$. The darker the grey shade, the more likely the pair (ι, γ). ι stands for the number of interactions reported in a given A (i.e. the number of elements equal to 1). If a triplet (ι, γ, L) appears several times, the mean value of L involved is reported. To obtain the complete grid, linear extrapolation from the likelihood estimations is performed. A and B are obtained with method 1 described in section 3.1 whereas C and D are obtained with method 2. The γ values evaluated lie in the range [0.01, 0.02, ..., 1]. The pair (A, γ) maximizing L for each group size is marked with a red cross. For n = 4 fish, there are two different pairs maximizing the likelihood. These values can differ from the darker regions because the ι used in the figures does not take into account the structure of A involved in the calculation of L. The γ values are respectively equal to 0.34, 0.44, 0.01 and 0.03.

[Figure 3.2: five panels showing the likelihood L against γ (0 to 1) for n = 1, 2, 4, 8 and 16, and a sixth panel, "Value maximizing the likelihood per group size", showing γ against group size.]

Figure 3.2: Maximum likelihood estimation of the γ parameter with the adjacency matrix A set to $\mathbf{1}_{n \times n}$. The γ values evaluated lie in the range [0.001, 0.002, ..., 1]. The maxima of the likelihood L are reached at γ = 0.841, 0.36, 0.445, 0.001 and 0.026 for respectively n = 1, 2, 4, 8 and 16 (reported in the last plot).

For n ∈ {2, 4}, I generate all possible upper triangular matrices (with diagonal elements set to 0), because these are the only elements actually used by the algorithm: once the fish making the decision has chosen a branch, it disappears from the experiment (i.e. the following fish cannot communicate with it). Accordingly, the number of possible adjacency matrices is $2^v$, with $v = n(n-1)/2$ the number of elements in an upper triangular matrix of size n, not counting diagonal elements. We have v = 1, 6, 28 and 120, giving 2, 64, 268435456 and $1.329228 \times 10^{36}$ possible adjacency matrices for respectively n = 2, 4, 8 and 16. The possible binary combinations are then found by iterating an index j from 0 to $2^v - 1$ and taking j′, the binary pattern of j with v bits. For instance, for n = 4, we have v = 6 so that $2^v - 1 = 63$, j ∈ [0, 1, 2, 3, 4, ..., 62, 63] and j′ ∈ [000000, 000001, 000010, 000011, 000100, ..., 111110, 111111]. I then reconstruct a matrix $A_{n \times n}$ from j′ by filling the upper triangular part with these bits and setting all elements of the lower triangular part and of the diagonal to 0.
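The enumeration of method 1 can be sketched as follows. This is an illustration of the binary-pattern idea described above and not the code of Appendix B: the function name `upper_triangular_matrices` is hypothetical, and the order in which bits are placed in the upper triangle is an arbitrary choice.

```r
# Enumerate all 0/1 matrices of size fish_nb whose lower triangle and
# diagonal are 0 (method 1). Only practical for small groups: the list has 2^v entries.
upper_triangular_matrices <- function(fish_nb) {
  v <- fish_nb * (fish_nb - 1) / 2
  lapply(0:(2^v - 1), function(j) {
    # Binary pattern of j with v bits, least significant bit first
    bits <- as.integer(intToBits(as.integer(j)))[seq_len(v)]
    adjacency <- matrix(0, nrow = fish_nb, ncol = fish_nb)
    adjacency[upper.tri(adjacency)] <- bits
    adjacency
  })
}

candidates <- upper_triangular_matrices(4)
length(candidates)  # 64 matrices, as stated in the text for n = 4
candidates[[64]]    # all six upper triangular elements equal to 1
```

Each candidate can then be scored with the likelihood of Equation 3.1. For n = 8 the list would already hold 268435456 matrices, which is why a second method is needed.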

The number of possible matrices when n = 8 or 16 calls for a second method to construct the matrices. Instead of evaluating all possible adjacency matrices (which would be very time consuming compared with the benefits of the expected results), I randomly construct adjacency matrices in which each fish shares information with 0%, 25%, 50%, 75% or 100% of the following decision-makers, and then evaluate the likelihood of each of them. I repeat the procedure ten times to avoid biased results due to the random construction of the matrices. The γ values evaluated in the maximum likelihood estimation lie in the range [0.01, 0.02, ..., 0.99, 1]. The code generating the matrices (method 1 as well as method 2) and evaluating the likelihoods is presented in Appendix B.
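A possible reading of method 2 is sketched below. It is not the code of Appendix B. It assumes that the sharing percentage is applied to each fish separately and rounded to a whole number of followers. The function name `random_adjacency` and the argument `share_fraction` are hypothetical.

```r
# Random upper triangular adjacency matrix (method 2, illustrative):
# each fish p informs a fraction share_fraction of the fish deciding after it.
random_adjacency <- function(fish_nb, share_fraction) {
  adjacency <- matrix(0, nrow = fish_nb, ncol = fish_nb)
  for (p in seq_len(fish_nb - 1)) {
    followers <- (p + 1):fish_nb
    informed_nb <- round(share_fraction * length(followers))
    # Pick the informed followers uniformly at random, without replacement
    informed <- followers[sample.int(length(followers), informed_nb)]
    adjacency[p, informed] <- 1
  }
  adjacency
}

set.seed(1)
# One random matrix per sharing level for a group of 8 fish
matrices <- lapply(c(0, 0.25, 0.5, 0.75, 1), function(s) random_adjacency(8, s))
sapply(matrices, sum)  # number of interactions in each matrix
```

Repeating the draw several times per sharing level, as described above, limits the influence of any particular random matrix on the estimated likelihood.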

According to the results of (Ward et al., 2011), I expect that many fish share information within the shoals (i.e. a high number of elements of the upper triangular adjacency matrix should be equal to 1) and that the greater the number of fish, the more the decision-making process relies on information emitted by congeners (i.e. when the number n of fish in a group increases, γ decreases, meaning that exploitation is favoured over exploration). The results of the maximum likelihood estimation are consistent with these expectations (Figure 3.1). The adjacency matrices maximizing the likelihood have almost all of the elements of their upper triangular part equal to one (i.e. all decision-makers share information with their congeners). This is particularly noticeable for larger groups (n = 8 and n = 16), even if the use of the second method presented earlier can also explain this result. When it is not the case (for n = 2 for instance), the likelihood has almost the same value across the x-axis: in small groups, the shoal structure matters less for maximizing the likelihood than the value of γ. The values of γ maximizing the likelihood are γ = 0.84, 0.34, 0.44, 0.01 and 0.03 for respectively n = 1, 2, 4, 8 and 16. Note that 0.01 is the smallest value of γ evaluated by the maximum likelihood estimation, meaning that the best γ for n = 8 may be smaller.

As all fish seem to be involved in the information flow even for larger group sizes, I set the adjacency matrices to $\mathbf{1}_{n \times n}$ (all elements equal to 1) and evaluate γ again, now within the set {0.001, 0.002, ..., 1}. This gives γ = 0.841, 0.36, 0.445, 0.001 and 0.026 for n = 1, 2, 4, 8 and 16 respectively (Figure 3.2). These values are kept as tuned parameters.
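The grid search over γ with a fully connected shoal can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of Appendix B: the function name `sequence_loglik` is hypothetical, the sequences are invented toy data rather than the experimental ones, every update is assumed to be shared with all the fish still to decide (so that a single weight vector is enough), and the log-likelihood is summed instead of multiplying the probabilities.

```r
# Log-likelihood of one observed sequence of choices (1 = arm without predator,
# 2 = arm with predator) when every fish shares its update with all the others.
sequence_loglik <- function(choices, gamma, w0 = c(0.85, 0.15)) {
  K <- length(w0)
  w <- w0                                        # weights common to the whole shoal
  loglik <- 0
  for (arm in choices) {
    d <- (1 - gamma) * w / sum(w) + gamma / K    # Exp3 distribution over the arms
    loglik <- loglik + log(d[arm])
    reward <- if (arm == 1) 1 else 0             # reward 1 only without predator
    w[arm] <- w[arm] * exp(gamma * (reward / d[arm]) / K)
  }
  loglik
}

# Toy sequences, invented for the illustration (not the experimental data).
toy_sequences <- list(c(1, 1, 2, 1), c(1, 1, 1, 1), c(2, 1, 1, 1))
gamma_grid <- seq(0.001, 1, by = 0.001)
total_loglik <- sapply(gamma_grid, function(g)
  sum(sapply(toy_sequences, sequence_loglik, gamma = g)))
gamma_hat <- gamma_grid[which.max(total_loglik)]
```

Working with log-likelihoods avoids the very small products (of the order of 1e-19 or less) that arise when many probabilities are multiplied, while leaving the location of the maximum unchanged.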

3.2 Model properties

The collaborative bandits algorithm produces results which are compatible with the data, thanks to the γ parameter I have tuned for each group size (Figure 3.3). Unfortunately, given the tuned parameters as well as the arbitrarily set parameters, the model is not able to recover the biological properties I inferred in section 2.2.1. Even if the maximum likelihood estimation leads us to consider only shoals where fish communicate extensively, simulations show that the increase in performance of the decision-making process is not due to social communication (Figure 3.4). The combination of the γ values and of the initial weights, arbitrarily set to $w_1(0) = 0.85$ and $w_2(0) = 0.15$, is sufficient to make the simulations compatible with the data. To run simulations for group sizes larger than those considered in our data (i.e. for n > 16), I extrapolate the γ values with a non-linear function $f : n \mapsto \exp(\alpha + \beta n)$ (Figure 3.5). Simulations are then run for n ∈ {1, 2, 4, 8, 16, 25, 50, 100}. It appears that the performance for large group sizes is bounded by the larger of the initial weights, $w_1(0) = 0.85$ (Figure 3.6). The higher performance observed for larger group sizes is due to small values of γ, which make each decision maker rely only on its initial weights, and these already encode the right choice: no information flow is required.
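The role of γ in this result can be made concrete with the Exp3 distribution itself. The sketch below only evaluates the mixing formula of the algorithm for the initial weights used here; the function name `exp3_distribution` is hypothetical.

```r
# Exp3 mixes the normalised weights with uniform exploration:
# d_i = (1 - gamma) * w_i / sum(w) + gamma / K
exp3_distribution <- function(weights, gamma) {
  K <- length(weights)
  (1 - gamma) * weights / sum(weights) + gamma / K
}

w0 <- c(0.85, 0.15)                     # initial weights used in the text
exp3_distribution(w0, gamma = 0.001)    # tuned gamma for n = 8
exp3_distribution(w0, gamma = 0.841)    # tuned gamma for n = 1
```

With γ = 0.001 the probability of choosing the first arm is about 0.850, essentially the initial weight, whereas with γ = 0.841 it drops to about 0.556, close to a uniform choice. A first decision maker with a very small γ therefore already reaches the level of $w_1(0)$ without receiving any information.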

My model highlights that the collective patterns we observe in this set-up do not imply that fish actually communicate more and share the vigilance tasks in large groups. It could be that the more fish there are in the group, the more each fish relies on its own information to make its decision. However, this requires that all fish already have good information about the surroundings when making their choice, which is rather unlikely.

I repeat the entire procedure (tuning of the parameters and simulations) with two new sets of initial weights: when they are slightly different ($w_1(0) = 0.556$ and $w_2(0) = 0.444$, the values observed experimentally when n = 1) and when they are equal ($w_1(0) = w_2(0) = 1$). Given these new initial values, the Collaborative bandits algorithm is able to recover biological properties inferred by previous studies, such as the role of social communication and of group size in improving the accuracy of the decision-making process (Figures 3.7 and 3.8, A, B and D). However, given these initial values, the simulations are not compatible with the data (Figures 3.7 and 3.8, C). It could be that, by choosing new values for the rewards while keeping such initial values for the weights, the algorithm would produce results compatible with the data, but this procedure highlights the problem of models including too many free parameters (i.e. parameters we cannot measure in the data). With these free parameters (namely the rewards and the initial values of the weights, which I have set arbitrarily), it seems that the Collaborative bandits algorithm can be made compatible with the data under several very different underlying biological hypotheses. As long as these free parameters cannot be measured experimentally, we cannot decide which of these biological hypotheses actually holds. This also makes it difficult to assess the actual contribution of this new algorithm.

[Figure 3.3: plot of the proportion of fish avoiding the predator (y-axis, 0.0 to 1.0) against group size (x-axis: 1, 2, 4, 8, 16).]

Figure 3.3: Goodness of fit of the Collaborative Exp3 algorithm. I report here the mean ± s.d. of the proportion of success recovered from data and a 95% confidence interval derived from 1000 simulations of the Collaborative Exp3 algorithm with tuned parameters (red lines). The fitted algorithm is able to recover most of the patterns exhibited by the data.

[Figure 3.4: boxplots of the proportion of fish avoiding the predator (y-axis, 0.4 to 1.0) for two conditions: No social communication and Social communication.]

Figure 3.4: Effect of social communication on decision-making performance as simulated with the Collaborative Exp3 algorithm. I present here results of simulations with 1000 replications of a group of n = 8 fish facing K = 2 paths, one of which contains a predator. Simulations with $A_{p,q}$ a diagonal matrix are reported in the "No social communication" boxplot whereas simulations with $A_{p,q} = 1\ \forall (p, q)$ are reported in the "Social communication" boxplot. γ = 0.001 and the reward is $g_{i_t}(t) = 1$ when the predator is absent and $g_{i_t}(t) = 0$ when the predator is present. With this choice of γ (estimated by the maximum likelihood procedure), the social structure has no effect on the performance.

[Figure 3.5: plot of γ (y-axis, 0.0 to 0.8) against group size (x-axis, 0 to 100), with five estimated values shown as circles.]

Figure 3.5: Extrapolation of γ for large group sizes (n > 16). I model γ = f(n) with f the function $f : n \mapsto \exp(\alpha + \beta n)$ (red line). α = 0.059 and β = −0.332 are found by the non-linear least squares estimation performed by the R function nls. The γ values estimated by the maximum likelihood estimation are reported with circles.
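As a quick check of the extrapolation, the fitted function can be evaluated directly with the coefficients reported in the caption. The sketch below does not refit the model with `nls`; the function name `gamma_extrapolated` is hypothetical.

```r
# Extrapolate gamma for large groups with f(n) = exp(alpha + beta * n),
# using the coefficients reported in the caption of Figure 3.5.
alpha <- 0.059
beta <- -0.332
gamma_extrapolated <- function(n) exp(alpha + beta * n)

gamma_extrapolated(c(25, 50, 100))
```

The three values agree with those used in the simulations for n = 25, 50 and 100 (2.63 × 10⁻⁴, 6.53 × 10⁻⁸ and 4.03 × 10⁻¹⁵) to within about 1%, the small difference being attributable to the rounding of α and β.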


[Figure 3.6: plot of the proportion of fish avoiding the predator (y-axis, 0.0 to 1.0) against group size (x-axis: 1, 2, 4, 8, 16, 25, 50, 100).]

Figure 3.6: Effect of group size on decision-making performance as simulated by the Collaborative bandits algorithm. I report here the mean ± s.d. of the proportion of success through simulations with 1000 replications of groups of n ∈ {1, 2, 4, 8, 16, 25, 50, 100} fish facing K = 2 paths, one of which contains a predator. Simulations are conducted with $A_{p,q} = 1\ \forall (p, q)$ and γ = 0.841, 0.36, 0.445, 0.001, 0.026, $2.63 \times 10^{-4}$, $6.53 \times 10^{-8}$ and $4.03 \times 10^{-15}$ for n = 1, 2, 4, 8, 16, 25, 50 and 100 respectively (for the last three values, see Figure 3.5). The rewards are set to $g_{i_t}(t) = 1$ when the predator is absent and $g_{i_t}(t) = 0$ when the predator is present. The red dashed line stands for a proportion of $w_1(0) = 0.85$. The performance converges to the value of $w_1(0)$.

[Figure 3.7: four panels. A: likelihood map for groups of 8 fish, with γ on one axis (0.0 to 1.0). B: proportion of fish avoiding the predator with and without social communication. C: proportion of fish avoiding the predator against group size (1, 2, 4, 8, 16). D: proportion of fish avoiding the predator against group size (1, 2, 4, 8, 16, 25, 50, 100).]

Figure 3.7: Summary of the results obtained with the Collaborative bandits algorithm when the initial weights are slightly different. Initial weights are set to $w_1(0) = 0.556$ and $w_2(0) = 0.444$. Panel A: the maximum likelihood estimation of the adjacency matrix A and of γ for n = 8 fish; panel B: the effect of social communication with n = 8 fish; panel C: the performance of the algorithm with respect to data; panel D: the results of 1000 simulations for n = 1, 2, 4, 8, 16, 25, 50, 100. See respectively Figures 3.1, 3.4, 3.3 and 3.6 for details. Given these initial weights, the collaborative bandits algorithm hardly recovers the collective pattern seen in the data, whereas there is an effect of social communication and of group size on the collective performance.

[Figure 3.8: four panels. A: likelihood map for groups of 8 fish, with γ on one axis (0.0 to 1.0). B: proportion of fish avoiding the predator with and without social communication. C: proportion of fish avoiding the predator against group size (1, 2, 4, 8, 16). D: proportion of fish avoiding the predator against group size (1, 2, 4, 8, 16, 25, 50, 100).]

Figure 3.8: Summary of the results obtained with the Collaborative bandits algorithm when the initial weights are equal. Initial weights are set to $w_1(0) = w_2(0) = 1$. Panel A: the maximum likelihood estimation of the adjacency matrix A and of γ for n = 8 fish; panel B: the effect of social communication with n = 8 fish; panel C: the performance of the algorithm with respect to data; panel D: the results of 1000 simulations for n = 1, 2, 4, 8, 16, 25, 50, 100. See respectively Figures 3.1, 3.4, 3.3 and 3.6 for details. Given these initial weights, the collaborative bandits algorithm hardly recovers the collective pattern seen in the data, whereas there is an effect of social communication and of group size on the collective performance.

Chapter 4

Discussion

Biology has already inspired several optimization algorithms. One could for instance mention Ant Colony Optimization or Ant Colony Routing algorithms (Bonabeau et al., 2000). This work suggests that a two-way flow between the two fields would be worth considering. Under the controversial hypothesis (see the “adaptationist program”, Gould and Lewontin, 1979; Mayr, 1983) that natural selection has designed organisms to solve problems in an optimal way, it makes sense to connect biology with the models designed by researchers to optimally solve problems met by humans.

Such a method (adapting models developed in optimization or distributed control to biological problems) raises difficulties that are not trivial to overcome. For instance, some of the parameters of these models can, as in this work, be difficult or impossible to measure, which makes producing new biological insights hard. Even if I have shown that my algorithm can give results comparable to data observed at the collective level, it would be risky to claim any improvement in the understanding of the decision-making process in the mosquitofish, given the number of free parameters: the model can be validated under several opposed biological hypotheses. For example, I have validated here an unlikely model where, in small groups, fish have a strong preference for one of the branches but do not take it into account when making their decisions, whereas in large groups the choice is made according to these early preferences, which are not due to social communication, at least not as formulated in the model. With another set of arbitrarily fixed parameters, one can validate the expected assumption that improved accuracy in decisions is due to effective social communication in larger groups. These limits have already been expressed for models of collective motion, where many purely theoretical rules can produce the same patterns as those observed at the collective level in animal groups, making it unclear which cognitive rules are actually implemented by individuals (Weitz et al., 2012; Gautrais et al., 2012).

Nevertheless, two ideas will be investigated to improve the results obtained when setting the initial weights to $w_1(0) = 0.556$ and $w_2(0) = 0.444$. First, it could be that in experiments with one fish (n = 1) the actual probability that the fish chooses the branch without predator is higher than 0.556, while still lying within the 95% confidence interval [0.46, 0.65]. Second, the model may be formulated differently. The theoretical set-up formulated here states that fish make their decisions one by one, queuing in front of the bifurcation, and are then removed from the experiment. Another, more realistic formulation, allowing fish to share information even when they do not make a final decision (i.e. letting the algorithm have more iterations than there are fish in the group), has to be addressed.

In addition to a framework implying that fish vote to allow consensus to emerge (Ward et al., 2008; Couzin et al., 2011; Kao and Couzin, 2014), the algorithm presented here sheds light on new ideas that can be derived from a weighted majority algorithm (Arora et al., 2012). These new ideas now need to be confronted with neuroscience and cognitive science to provide a new model relying more on a data-driven approach than on the theory-driven approach I have followed here, according to the terminology and methodology proposed by Sumpter et al. (2012). In my opinion these further investigations are worth considering, given the simple way of recovering fish shoal structure that the Collaborative bandits algorithm provides. Information transfer in groups could then be investigated simply, which would make it possible to address the effects of several parameters, such as sex ratio, age ratio or brain size, on the information flow in fish shoals. In this scope, it would also be worth considering another parameter allowing relationships within the group to be more complex than binary interactions (interaction / no interaction), so as to obtain quantitative measures of the strength of communication between individuals. Another interesting source of additional work lies in the possibility of integrating this model in a spatial perspective where, for instance, decision makers could communicate only with their direct neighbours: this would require the adjacency matrices A to be time dependent.

In short, this is the first published approach that tries to match those algorithms to data, hence bridging a gap between theory and biological practice. I am confident this work can be used in the near future to explore, in a simple and elegant way, information transfer as well as optimality in the decision-making processes of animal groups.

Bibliography

Sanjeev Arora, Elad Hazan, and Satyen Kale. The multiplicative weights update method: A meta-algorithm and applications. Theory of Computing, 8:121–164, 2012.

Peter Auer, Nicolo Cesa-Bianchi, Yoav Freund, and Robert E. Schapire. The non-stochastic multi-armed

bandit problem. SIAM Journal on Computing, 32(1):48–77, 2002. URL http://epubs.siam.org/doi/abs/

10.1137/S0097539701398375.

Michael A Bell and Susan Adlai Foster. The evolutionary biology of the threespine stickleback. Oxford University

Press, 1994.

Eric Bonabeau, Marco Dorigo, and Guy Theraulaz. Inspiration for optimization from social insect behaviour. Nature, 406(6791):39–42, 2000. URL http://www.nature.com/nature/journal/v406/n6791/abs/406039a0.html.

Culum Brown and Kevin N. Laland. Social learning in fishes: a review. Fish and Fisheries, 4(3):280–288, 2003.

URL http://onlinelibrary.wiley.com/doi/10.1046/j.1467-2979.2003.00122.x/full.

Grant E. Brown. Learning about danger: chemical alarm cues and local risk assessment in prey fishes. Fish and

Fisheries, 4(3):227–234, 2003. URL http://onlinelibrary.wiley.com/doi/10.1046/j.1467-2979.2003.

00132.x/full.

Sebastien Bubeck and Nicolo Cesa-Bianchi. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations and Trends® in Machine Learning, 5(1):1–122, 2012. ISSN 1935-8237, 1935-8245. doi: 10.1561/2200000024. URL http://www.nowpublishers.com/product.aspx?product=MAL&doi=2200000024.

Stephane Canonge, Jean-Louis Deneubourg, and Gregory Sempo. Group living enhances individual resources

discrimination: The use of public information by cockroaches to assess shelter quality. PLoS ONE, 6(6):

e19748, June 2011. ISSN 1932-6203. doi: 10.1371/journal.pone.0019748. URL http://dx.plos.org/10.

1371/journal.pone.0019748.

Nicolo Cesa-Bianchi and Gabor Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.

URL http://dx.doi.org/10.1017/CBO9780511546921.

I. D. Couzin, C. C. Ioannou, G. Demirel, T. Gross, C. J. Torney, A. Hartnett, L. Conradt, S. A. Levin, and N. E. Leonard. Uninformed individuals promote democratic consensus in animal groups. Science, 334(6062):1578–1580, December 2011. ISSN 0036-8075, 1095-9203. doi: 10.1126/science.1210280. URL http://www.sciencemag.org/cgi/doi/10.1126/science.1210280.

Rachel L. Day, Tom MacDonald, Culum Brown, Kevin N. Laland, and Simon M. Reader. Interactions between shoal size and conformity in guppy social foraging. Animal Behaviour, 62(5):917–925, November 2001. ISSN 00033472. doi: 10.1006/anbe.2001.1820. URL http://linkinghub.elsevier.com/retrieve/pii/S0003347201918202.

A. Dussutour, S. C. Nicolis, G. Shephard, M. Beekman, and D. J. T. Sumpter. The role of multiple pheromones

in food recruitment by ants. Journal of Experimental Biology, 212(15):2337–2348, July 2009. ISSN 0022-0949,

1477-9145. doi: 10.1242/jeb.029827. URL http://jeb.biologists.org/cgi/doi/10.1242/jeb.029827.

Audrey Dussutour and Stamatios C. Nicolis. Flexibility in collective decision-making by ant colonies: Tracking

food across space and time. Chaos, Solitons & Fractals, 50:32–38, May 2013. ISSN 09600779. doi: 10.1016/

j.chaos.2013.02.004. URL http://linkinghub.elsevier.com/retrieve/pii/S0960077913000325.

Jacques Gautrais, Francesco Ginelli, Richard Fournier, Stephane Blanco, Marc Soria, Hugues Chate, and Guy

Theraulaz. Deciphering interactions in moving animal groups. PLoS Computational Biology, 8(9):e1002678,

September 2012. ISSN 1553-7358. doi: 10.1371/journal.pcbi.1002678. URL http://dx.plos.org/10.1371/

journal.pcbi.1002678.

S. J. Gould and R. C. Lewontin. The spandrels of San Marco and the Panglossian paradigm: A critique of the adaptationist programme. Proceedings of the Royal Society B: Biological Sciences, 205(1161):581–598, September 1979. ISSN 0962-8452, 1471-2954. doi: 10.1098/rspb.1979.0086. URL http://rspb.royalsocietypublishing.org/cgi/doi/10.1098/rspb.1979.0086.

Pierre-P. Grassé. La reconstruction du nid et les coordinations interindividuelles chez Bellicositermes natalensis et Cubitermes sp. La théorie de la stigmergie: Essai d’interprétation du comportement des termites constructeurs. Insectes Sociaux, 6(1):41–80, 1959. URL http://link.springer.com/article/10.1007/BF02223791.

D. J. Hoare and J. Krause. Social organisation, shoal structure and information transfer. Fish and Fisheries,

4(3):269–279, 2003. URL http://onlinelibrary.wiley.com/doi/10.1046/j.1467-2979.2003.00130.x/

full.

A. B. Kao and I. D. Couzin. Decision accuracy in complex environments is often maximized by small group

sizes. Proceedings of the Royal Society B: Biological Sciences, 281(1784):20133305–20133305, April 2014.

ISSN 0962-8452, 1471-2954. doi: 10.1098/rspb.2013.3305. URL http://rspb.royalsocietypublishing.

org/cgi/doi/10.1098/rspb.2013.3305.

Ernst Mayr. How to carry out the adaptationist program? American Naturalist, page 324–334, 1983.

John McNamara and Alasdair Houston. The application of statistical decision theory to animal behaviour. Journal of Theoretical Biology, 85(4):673–690, 1980. URL http://www.sciencedirect.com/science/article/pii/0022519380902659.

John M. McNamara and Alasdair I. Houston. Optimal foraging and learning. Journal of Theoretical Biology,

117(2):231–249, 1985. URL http://www.sciencedirect.com/science/article/pii/S0022519385802198.


Simon M. Reader and Kevin N. Laland. Diffusion of foraging innovations in the guppy. Animal Behaviour, 60(2):

175–180, August 2000. ISSN 00033472. doi: 10.1006/anbe.2000.1450. URL http://linkinghub.elsevier.

com/retrieve/pii/S0003347200914507.

D. J. T. Sumpter. The principles of collective animal behaviour. Philosophical Transactions of the Royal Society

B: Biological Sciences, 361(1465):5–22, January 2006. ISSN 0962-8436, 1471-2970. doi: 10.1098/rstb.2005.

1733. URL http://rstb.royalsocietypublishing.org/cgi/doi/10.1098/rstb.2005.1733.

D. J. T. Sumpter, R. P. Mann, and A. Perna. The modelling cycle for collective animal behaviour. Interface

Focus, 2(6):764–773, December 2012. ISSN 2042-8898, 2042-8901. doi: 10.1098/rsfs.2012.0031. URL http:

//rsfs.royalsocietypublishing.org/cgi/doi/10.1098/rsfs.2012.0031.

D. J.T Sumpter and S. C Pratt. Quorum responses and consensus decision making. Philosophical Transactions

of the Royal Society B: Biological Sciences, 364(1518):743–753, March 2009. ISSN 0962-8436, 1471-2970. doi:

10.1098/rstb.2008.0204. URL http://rstb.royalsocietypublishing.org/cgi/doi/10.1098/rstb.2008.

0204.

R. Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical

Computing, Vienna, Austria, 2014. URL http://www.R-project.org/.

G. Thomas, A. Kacelnik, and J. Van Der Meulen. The three-spined stickleback and the two-armed bandit.

Behaviour, page 227–240, 1985. URL http://www.jstor.org/stable/4534444.

Ashley J. W. Ward, Jens Krause, and David J. T. Sumpter. Quorum decision-making in foraging fish shoals.

PLoS ONE, 7(3):e32411, March 2012. ISSN 1932-6203. doi: 10.1371/journal.pone.0032411. URL http:

//dx.plos.org/10.1371/journal.pone.0032411.

Ashley JW Ward, David JT Sumpter, Iain D. Couzin, Paul JB Hart, and Jens Krause. Quorum decision-

making facilitates information transfer in fish shoals. Proceedings of the National Academy of Sciences, 105

(19):6948–6953, 2008. URL http://www.pnas.org/content/105/19/6948.short.

Ashley JW Ward, James E Herbert-Read, David JT Sumpter, and Jens Krause. Fast and accurate decisions

through collective vigilance in fish shoals. Proceedings of the National Academy of Sciences, 108(6):2312–2315,

2011.

Sebastian Weitz, Stephane Blanco, Richard Fournier, Jacques Gautrais, Christian Jost, and Guy Theraulaz.

Modeling collective animal behavior with a cognitive perspective: A methodological framework. PLoS ONE,

7(6):e38588, June 2012. ISSN 1932-6203. doi: 10.1371/journal.pone.0038588. URL http://dx.plos.org/

10.1371/journal.pone.0038588.


Appendix A

Upper bound on the weak regret

A.1 Theorem

Auer et al. (2002) show that, for any $T > 0$ and any $\Gamma$ such that $\Gamma > G_{\max}$, choosing

$$\gamma_\Gamma = \min\left\{1, \sqrt{\frac{K \ln K}{(e-1)\Gamma}}\right\} \tag{A.1}$$

with $K$ the number of arms leads to the following bound:

$$G_{\max} - E[G_{\mathrm{Exp3}}] \le 2\sqrt{e-1}\,\sqrt{\Gamma K \ln K} \le 2.63\,\sqrt{\Gamma K \ln K} \tag{A.2}$$

with $G_{\max} - E[G_{\mathrm{Exp3}}]$ the weak regret, where $G_{\max} = \max_j \sum_{t=1}^{T} g_j(t)$ and $E[G_{\mathrm{Exp3}}]$ is the expectation of the rewards the Exp3 algorithm induces.
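To give an order of magnitude, Equations A.1 and A.2 can be evaluated numerically. The sketch below is only an illustration: the function names are hypothetical and the values K = 2 and Γ = 16 are chosen arbitrarily (two arms, and at most 16 rewards of at most 1).

```r
# Tuning of gamma (Equation A.1) and upper bound on the weak regret
# (Equation A.2) for K arms and a bound Gamma on the best cumulative reward.
exp3_gamma <- function(K, Gamma) {
  min(1, sqrt(K * log(K) / ((exp(1) - 1) * Gamma)))
}
exp3_regret_bound <- function(K, Gamma) {
  2 * sqrt(exp(1) - 1) * sqrt(Gamma * K * log(K))
}

exp3_gamma(K = 2, Gamma = 16)          # about 0.22
exp3_regret_bound(K = 2, Gamma = 16)   # about 12.3
```

For such a short horizon the bound is loose, since it is of the same order as Γ itself; it becomes informative when Γ grows, because it increases only as the square root of Γ.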

A.2 Proof

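Let $g_i(t) \in [0, 1]$ be the reward of action $i \in [1, K]$ at time $t \in [0, T]$, $\hat{g}_i(t)$ the estimated reward (equal to $g_i(t)/d_i(t)$ if $i = i_t$ and to 0 otherwise) and $G_{\max} - E[G_{\mathrm{Exp3}}]$ the weak regret, where $G_{\max} = \max_j \sum_{t=1}^{T} g_j(t)$ and $E[G_{\mathrm{Exp3}}]$ is the expectation of the rewards the Exp3 algorithm induces. From the definitions presented in section 2.1, the following useful relations can be derived:

$$\hat{g}_i(t) \le \frac{1}{d_i(t)} \le \frac{K}{\gamma} \tag{A.3a}$$

$$\sum_{i=1}^{K} d_i(t)\,\hat{g}_i(t) = d_{i_t}(t)\,\frac{g_{i_t}(t)}{d_{i_t}(t)} = g_{i_t}(t) \tag{A.3b}$$

$$\sum_{i=1}^{K} d_i(t)\,\hat{g}_i(t)^2 = d_{i_t}(t)\,\frac{g_{i_t}(t)}{d_{i_t}(t)}\,\hat{g}_{i_t}(t) \le \hat{g}_{i_t}(t) = \sum_{i=1}^{K} \hat{g}_i(t) \tag{A.3c}$$

Let $W_t = w_1(t) + ... + w_K(t)$. For all sequences $i_1, ..., i_T$ of actions drawn by the Exp3 algorithm, we have:

$$\frac{W_{t+1}}{W_t} = \sum_{i=1}^{K} \frac{w_i(t+1)}{W_t} = \sum_{i=1}^{K} \frac{w_i(t)}{W_t} \exp\left(\frac{\gamma}{K}\hat{g}_i(t)\right) = \sum_{i=1}^{K} \frac{d_i(t) - \gamma/K}{1-\gamma} \exp\left(\frac{\gamma}{K}\hat{g}_i(t)\right) \tag{A.4a}$$

$$\le \sum_{i=1}^{K} \frac{d_i(t) - \gamma/K}{1-\gamma} \left[1 + \frac{\gamma}{K}\hat{g}_i(t) + (e-2)\left(\frac{\gamma}{K}\hat{g}_i(t)\right)^2\right] \tag{A.4b}$$

$$\le 1 + \frac{\gamma/K}{1-\gamma} \sum_{i=1}^{K} d_i(t)\,\hat{g}_i(t) + \frac{(e-2)(\gamma/K)^2}{1-\gamma} \sum_{i=1}^{K} d_i(t)\,\hat{g}_i(t)^2$$

$$\le 1 + \frac{\gamma/K}{1-\gamma}\, g_{i_t}(t) + \frac{(e-2)(\gamma/K)^2}{1-\gamma} \sum_{i=1}^{K} \hat{g}_i(t) \tag{A.4c}$$

(A.4a): $d_i(t) = (1-\gamma)\frac{w_i(t)}{\sum_{j=1}^{K} w_j(t)} + \frac{\gamma}{K}$ as defined in Algorithm 1

(A.4b): because $e^x \le 1 + x + (e-2)x^2$ for $x \le 1$ and $\hat{g}_i(t) \le \frac{K}{\gamma} \Leftrightarrow \frac{\gamma}{K}\hat{g}_i(t) \le 1$ from (A.3a)

(A.4c): combine (A.3b) and (A.3c)

Taking logarithms and using $1 + x \le e^x$ gives

$$\ln\frac{W_{t+1}}{W_t} \le \frac{\gamma/K}{1-\gamma}\, g_{i_t}(t) + \frac{(e-2)(\gamma/K)^2}{1-\gamma} \sum_{i=1}^{K} \hat{g}_i(t)$$

Summing over t we then get

$$\ln\frac{W_{T+1}}{W_1} \le \frac{\gamma/K}{1-\gamma}\, G_{\mathrm{Exp3}} + \frac{(e-2)(\gamma/K)^2}{1-\gamma} \sum_{t=1}^{T}\sum_{i=1}^{K} \hat{g}_i(t) \tag{A.5}$$

For any action j,

$$\ln\frac{W_{T+1}}{W_1} \ge \ln\frac{w_j(T+1)}{W_1} = \frac{\gamma}{K} \sum_{t=1}^{T} \hat{g}_j(t) - \ln K$$

Combining with (A.5), we get

$$G_{\mathrm{Exp3}} \ge (1-\gamma) \sum_{t=1}^{T} \hat{g}_j(t) - \frac{K \ln K}{\gamma} - (e-2)\frac{\gamma}{K} \sum_{t=1}^{T}\sum_{i=1}^{K} \hat{g}_i(t) \tag{A.6}$$

We next take the expectation of both sides of (A.6) with respect to the distribution of $\langle i_1, ..., i_T \rangle$. For the expected value of each $\hat{g}_i(t)$, we have:

$$E[\hat{g}_i(t) \mid i_1, ..., i_{t-1}] = E\left[d_i(t) \cdot \frac{g_i(t)}{d_i(t)} + (1 - d_i(t)) \cdot 0\right] = g_i(t) \tag{A.7}$$

Combining (A.6) and (A.7), we find that

$$E[G_{\mathrm{Exp3}}] \ge (1-\gamma) \sum_{t=1}^{T} g_j(t) - \frac{K \ln K}{\gamma} - (e-2)\frac{\gamma}{K} \sum_{t=1}^{T}\sum_{i=1}^{K} g_i(t)$$

Since j was chosen arbitrarily and

$$\sum_{t=1}^{T}\sum_{i=1}^{K} g_i(t) \le K G_{\max}$$

we obtain the inequality stated in Equation A.2 by choosing $\gamma_\Gamma = \min\left\{1, \sqrt{\frac{K \ln K}{(e-1)\Gamma}}\right\}$.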
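The last step can be spelled out as follows. This is a sketch of the closing argument under the assumptions of the theorem ($\Gamma > G_{\max}$), written for the case $\gamma_\Gamma < 1$; it takes j to be the best action and uses the bound on the double sum stated above.

$$
\begin{aligned}
E[G_{\mathrm{Exp3}}] &\ge (1-\gamma)\,G_{\max} - \frac{K \ln K}{\gamma} - (e-2)\,\gamma\,G_{\max} \\
G_{\max} - E[G_{\mathrm{Exp3}}] &\le \frac{K \ln K}{\gamma} + (e-1)\,\gamma\,G_{\max} \le \frac{K \ln K}{\gamma} + (e-1)\,\gamma\,\Gamma \\
\text{with } \gamma = \sqrt{\frac{K \ln K}{(e-1)\Gamma}}: \quad
G_{\max} - E[G_{\mathrm{Exp3}}] &\le 2\sqrt{(e-1)\,\Gamma K \ln K} = 2\sqrt{e-1}\,\sqrt{\Gamma K \ln K}
\end{aligned}
$$

Both terms of the second line are equal to $\sqrt{(e-1)\Gamma K \ln K}$ for this choice of γ, which is the value that minimises their sum, and $2\sqrt{e-1} \approx 2.62 \le 2.63$. When $\gamma_\Gamma = 1$, i.e. when $\Gamma \le K \ln K/(e-1)$, the bound holds trivially because the weak regret cannot exceed $G_{\max} < \Gamma$.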


Appendix B

Code for adjacency matrix generation

Listing B.1: R code implementing the two methods to generate adjacency matrices and to evaluate their

likelihood with respect to data

1 ##This file is used for maximum likelihood estimation

2 ##of both the adjacency matrix and gamma

3 ##Two methods are used / proposed depending on your machine

4 ##For computation time reasons, we provide the second method

5 ##for group sizes of 8 and 16 fish. They won’t evaluate the

6 ##likelihood of each possible combination of adjacency matrix

7 ##but will generate some of them with criteria of "% of fish

8 ##communicating".

9

10 ##Dependencies

11 library(R.utils)##for intToBin() used in method 1

12

13 ##Parameters

14 n <- c(1,2,4,8,16);

15 arms_nb <- 2;

16 gamma_sequences <- seq(from=0.01,to=1,by=0.01);

17 nb_gamma <- length(gamma_sequences);

18

19 ##Build the dataset used for the likelihood estimation

20 load("bygroupsize_list.rda");##This file contains all sequences of chosen arms

21 ##replace 0 by 2

22 for(list_index in 1:5){

23 bygroupsize_list[[list_index]] <- replace(bygroupsize_list[[list_index]],

24 bygroupsize_list[[list_index]] == 0,2);

25 }

26

27 ##Set-up 1 : 2 arms, one without predator, the other with a predator

32

28 reward_function <- function(t_step,arm_chosen){

29 if(arm_chosen == 1){

30 return(1);

31 } else {

32 return(0);

33 }

34 }

35

36 ##Method 1 for group sizes of n = 2 and n = 4

37 ##Create all matrices for n = 2 and n = 4

38 ##To find number of permutations to be computed for upper triangular

39 ##matrices:

40 # a <- 0

41 # for(i in 1:(n-1)){

42 # a <- a + i;

43 # }

44 # print(a)

45 n_permuted <- c(NA,1,6,28,120);

46 allmatrices_2_4 <- list();

47

48 for(shoalsize_index in c(2,3)){

49 fish_nb <- n[shoalsize_index];

50 n_permuted_current <- n_permuted[shoalsize_index];

51 allmatrices_current <- matrix(rep(NA, fish_nb**2),ncol=fish_nb**2);

52 for(perm_index in 0:(2**n_permuted_current-1)){

53 current_perm <- as.integer(unlist(strsplit(intToBin(perm_index),NULL)));

54 current_perm <- c(rep(0,n_permuted_current-length(current_perm)),current_perm);

55 groups_factor <- NULL;

56 for(row_index in 1:(fish_nb-1)){

57 groups_factor <- c(groups_factor,rep(row_index,fish_nb-row_index));

58 }

59 splitted_perms <- split(current_perm,groups_factor);


61 splitted_perms[[row_index]] <- c(rep(0,row_index),

62 splitted_perms[[row_index]]);

63 }

64 current_perm <- c(unlist(splitted_perms,groups_factor),rep(0,fish_nb));

65 allmatrices_current <- rbind(allmatrices_current,current_perm);

66 }

67 allmatrices_2_4 <- c(allmatrices_2_4,list(allmatrices_current[-1,]));

68 }

69

70 detachPackage("R.utils");

71

33

72 for(groupsize_index in c(2,3)){

73 fish_nb <- n[groupsize_index];

74 ##Exclude fish swimming back in data

75 sequences_datamatrix <- bygroupsize_list[[groupsize_index]];

76 excluded_rows <- unique(arrayInd(which(sequences_datamatrix==-1),

77 dim(sequences_datamatrix)[1]));

78 if(length(excluded_rows)>0){

79 sequences_datamatrix <- sequences_datamatrix[-excluded_rows,];

80 }

81 ##Looping over adj_matrices

82 nb_adjmat <- dim(allmatrices_2_4[[groupsize_index-1]])[1];

83 likelihoods_df <- data.frame(n=NA,likelihood=NA,adj_matrix=NA,nb_interacting=NA,gamma=NA);

84 for(adjmatrix_index in 1:nb_adjmat){

85 nb_fish_sharing <- sum(allmatrices_2_4[[groupsize_index-1]][adjmatrix_index,]);

86 neighbours <- matrix(allmatrices_2_4[[groupsize_index-1]][adjmatrix_index,],

87 nrow=fish_nb,byrow=TRUE);

88 if(fish_nb > 1){

89 nb_experiments <- dim(sequences_datamatrix)[1];

90 } else {

91 nb_experiments <- length(sequences_datamatrix);

92 }

93 for(gamma_index in 1:nb_gamma){

94 likelihood <- NULL;

95 gamma <- gamma_sequences[gamma_index];

96 for(experiment_index in 1:nb_experiments){

97 if(fish_nb > 1){

98 chosenarms_sequence <- unlist(sequences_datamatrix[experiment_index,]);

99 } else {

100 chosenarms_sequence <- unlist(sequences_datamatrix[experiment_index]);

101 }

102 weights <- matrix(rep(c(0.85,0.15),fish_nb*2),

103 byrow=TRUE,nrow=fish_nb,ncol=2);

104 for(fish in 1:fish_nb){

105 decision_maker <- fish;

106 arm_chosen <- chosenarms_sequence[fish];

107 distribution <- (1 - gamma) * (weights[decision_maker,arm_chosen]

108 / sum(weights[decision_maker,])) + gamma / arms_nb;

109 reward <- reward_function(fish,arm_chosen);

110 reward_est <- reward / distribution;


112 weights[dm_neighbours,arm_chosen] <-

113 weights[decision_maker,arm_chosen] * exp(gamma * reward_est / arms_nb);

114 likelihood <- c(likelihood,distribution);

115 }

34

116 }

117 likelihoods_df <- rbind(likelihoods_df,c(fish_nb,prod(likelihood),

118 adjmatrix_index,nb_fish_sharing,gamma));

119 }

120 }

121 write.table(likelihoods_df[-1,],

122 file=paste("optim_gamma_adjmat/likelihoods_adjmatxgamma_",fish_nb,".csv",sep=""),

123 sep=",",row.names=FALSE);

124 }

125

126 #Method 2 for group sizes of n = 8 and n = 16 fish.

127 proportions_adjmat <- c(0,0.25,0.50,0.75,1);

128 nb_adjmat <- length(proportions_adjmat);

129 nb_simu <- 10;

130

131 ##/!\ we just look n = 8 and n = 16 - some hard coded stuff in the def of de df

132 ##replace each 2 by length(n) for generalities

133

134 likelihood_df <- data.frame(id=seq(from=1,to=2*nb_gamma*nb_adjmat),

135 n=rep(c(8,16),each=nb_gamma*nb_adjmat),

136 likelihood=rep(0,2*nb_gamma*nb_adjmat),adj_matrix=rep(1:nb_adjmat,each=nb_gamma,2),

137 nb_interacting=rep(NA,2*nb_gamma*nb_adjmat),

138 gamma=rep(gamma_sequences,2*nb_adjmat));

for(simulation_index in 1:nb_simu){
  count_row <- 0;
  for(groupsize_index in 4:length(n)){
    fish_nb <- n[groupsize_index];
    sequences_datamatrix <- bygroupsize_list[[groupsize_index]];
    excluded_rows <- unique(arrayInd(which(sequences_datamatrix==-1),
      dim(sequences_datamatrix)[1]));
    if(length(excluded_rows)>0){
      sequences_datamatrix <- sequences_datamatrix[-excluded_rows,];
    }
    ##Generating the adjacency matrix
    for(proportions_index in 1:nb_adjmat){
      triang_values <- NULL;
      for(triang_index in 1:(fish_nb-1)){
        nb_1 <- round(proportions_adjmat[proportions_index]*triang_index);
        nb_0 <- triang_index - nb_1;
        triang_values <- c(triang_values,
          sample(c(rep(1,nb_1),rep(0,nb_0)),triang_index));
      }
      triang_values <- rev(triang_values);

      groups_factor <- NULL;
      ## [loop header missing from the source listing; reconstructed from the loop body]
      for(row_index in 1:(fish_nb-1)){
        groups_factor <- c(groups_factor,rep(row_index,fish_nb-row_index));
      }
      splitted_perms <- split(triang_values,groups_factor);
      ## [loop header missing from the source listing; reconstructed from the loop body]
      for(row_index in 1:(fish_nb-1)){
        splitted_perms[[row_index]] <- c(rep(0,row_index),
          splitted_perms[[row_index]]);
      }
      neighbours <- c(unlist(splitted_perms,groups_factor),rep(0,fish_nb));
      nb_interacting <- sum(neighbours);
      neighbours <- matrix(neighbours,nrow=fish_nb,byrow=TRUE);
      if(fish_nb > 1){
        nb_experiments <- dim(sequences_datamatrix)[1];
      } else {
        nb_experiments <- length(sequences_datamatrix);
      }

      likelihood_pergamma <- rep(NULL,nb_gamma);
      for(gamma_index in 1:nb_gamma){
        likelihood <- NULL;
        gamma <- gamma_sequences[gamma_index];
        for(experiment_index in 1:nb_experiments){
          if(fish_nb > 1){
            chosenarms_sequence <- unlist(sequences_datamatrix[experiment_index,]);
          } else {
            chosenarms_sequence <- unlist(sequences_datamatrix[experiment_index]);
          }
          weights <- matrix(rep(c(0.85,0.15),fish_nb*2),
            byrow=TRUE,nrow=fish_nb,ncol=2);
          for(fish in 1:fish_nb){
            decision_maker <- fish;
            arm_chosen <- chosenarms_sequence[fish];
            distribution <- (1 - gamma) * (weights[decision_maker,arm_chosen]
              / sum(weights[decision_maker,])) + gamma / arms_nb;
            reward <- reward_function(fish,arm_chosen);
            reward_est <- reward / distribution;
            ## [the line defining dm_neighbours is missing from the source listing]
            weights[dm_neighbours,arm_chosen] <-
              weights[decision_maker,arm_chosen] * exp(gamma * reward_est / arms_nb);
            likelihood <- c(likelihood,distribution);
          }
        }
        count_row <- count_row + 1;
        likelihood_df$likelihood[count_row] <-
          likelihood_df$likelihood[count_row] + prod(likelihood);
        likelihood_df$nb_interacting[count_row] <-
          nb_interacting;
      }
    }
  }
}
likelihood_df$likelihood <- likelihood_df$likelihood / nb_simu;
save(likelihood_df,file="optim_gamma_adjmat/likelihoods_biggroups.RData");
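
The listing above scores each parameter setting with prod(likelihood), a product of many probabilities below 1 that can underflow for long sequences. A minimal sketch of the same per-sequence computation on the log scale is given below. It is an illustration under stated assumptions, not the code used for the results: the function name exp3_loglik and its arguments are hypothetical, the vector rewards stands in for reward_function(fish, arm_chosen), and each decision maker is assumed to update itself together with its neighbours in the adjacency matrix adj.

```r
# Sketch: log-likelihood of one observed sequence of choices under the
# collaborative Exp3 update of the listing above.
# Assumptions (not taken from the thesis code):
#   - `rewards` is a hypothetical numeric vector, one reward per fish,
#     standing in for reward_function(fish, arm_chosen);
#   - the updated rows are the decision maker plus its neighbours in `adj`.
exp3_loglik <- function(choices, rewards, adj, gamma,
                        init_weights = c(0.85, 0.15)) {
  fish_nb <- length(choices)
  arms_nb <- length(init_weights)
  # One row of weights per fish, one column per arm
  weights <- matrix(init_weights, nrow = fish_nb, ncol = arms_nb, byrow = TRUE)
  loglik <- 0
  for (fish in seq_len(fish_nb)) {
    arm <- choices[fish]
    # Exp3 probability: normalised weight mixed with uniform exploration
    p <- (1 - gamma) * weights[fish, arm] / sum(weights[fish, ]) +
      gamma / arms_nb
    # Sum of logs instead of a product of probabilities
    loglik <- loglik + log(p)
    # Importance-weighted reward estimate
    reward_est <- rewards[fish] / p
    # Rows receiving the update (assumed: self plus neighbours)
    targets <- union(fish, which(adj[fish, ] == 1))
    weights[targets, arm] <- weights[fish, arm] *
      exp(gamma * reward_est / arms_nb)
  }
  loglik
}
```

For short sequences, exp(exp3_loglik(...)) gives the same value as the product of the per-fish probabilities. Log-likelihoods of independent experiments are added rather than multiplied.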
