IT 14 046
Degree project, 30 credits, August 2014
On collective bandit behaviour
Valentin Lecheval
Department of Information Technology
Faculty of Science and Technology, UTH unit. Visiting address: Ångströmlaboratoriet, Lägerhyddsvägen 1, House 4, Floor 0. Postal address: Box 536, 751 21 Uppsala. Telephone: 018 – 471 30 03. Fax: 018 – 471 30 00. Website: http://www.teknat.uu.se/student
Abstract
The collective decision process of Gambusia affinis (the mosquitofish) is investigated from the standpoint of online machine learning algorithms. A new algorithm, the Collaborative Exp3 algorithm, is derived from the adversarial bandits framework to model how groups of fish make collective decisions leading to consensus. Thanks to maximum likelihood estimation, parameters are tuned and comparisons between data and algorithm performances are addressed. This work provides promising results in the scope of recovering information transfer within fish groups as well as to understand the individual mechanisms involved in the collective decision process. It is the first published approach to connect online machine learning algorithms with data, hence bridging a gap between theory and biological practice.
Printed by: Reprocentralen ITC. IT 14 046. Examiner: Jarmo Rantakokko. Subject reviewer: Richard Mann. Supervisor: Kristiaan Pelckmans
Contents
1 Introduction
2 Model
  2.1 Adversarial bandit problem
    2.1.1 The Exp3 Algorithm
    2.1.2 Implementation
  2.2 Collaborative bandits
    2.2.1 Biological assumptions
    2.2.2 Formulation of the model
    2.2.3 Collection of data
    2.2.4 Fitting the model to data
3 Results
  3.1 Tuning parameters
  3.2 Model properties
4 Discussion
A Upper bound on the weak regret
  A.1 Theorem
  A.2 Proof
B Code for adjacency matrix generation
Chapter 1
Introduction
Humans often face problems that involve a trade-off between exploration and exploitation. For instance,
an engineer may aim to reduce the cost of transporting packets between two points in a communication network.
In such cases, we have to choose between gathering additional information (at the cost of possibly obtaining
bad rewards) and exploiting known paths (at the cost of possibly missing the best available reward). To
address this common situation, research in mathematics has provided the multi-armed bandit framework, with
algorithms that implement optimal strategies (see section 2) and many applications in industry (Bubeck and
Cesa-Bianchi, 2012).
Other animals are also regularly confronted with decisions involving an exploration/exploitation
trade-off, and also have strategies to find optimal choices. One outstanding example is how animal
societies or groups make collective choices. Most of the time, the patterns displayed collectively arise in reaction
to external factors such as predation risk, but they are also constrained by the behavioural mechanisms animals
use to stay together. These consist of multiple local interactions between individuals (e.g. individuals tend to
copy their neighbours' choices, or use less direct mechanisms such as stigmergy, see footnote 1), so that the collective patterns
emerge from the complex coupling between the state of the group at a given time and the decisions individuals make
in response - this is the self-organization principle (Sumpter, 2006). For instance, many ant species, through
communication involving pheromones and/or tactile recruitment, are able to exploit food patches efficiently,
with mechanisms allowing flexibility and exploration (e.g. following a pheromone track is a stochastic process, so
workers can find new food patches or shorter tracks even while exploiting a rich spot) (Dussutour
et al., 2009; Dussutour and Nicolis, 2013). Self-organization, as defined above, is a decentralized paradigm and
does not require any leadership in animal groups - that is why biologists speak of swarm intelligence. The
challenging question for researchers in the field of animal collective behaviour is therefore to find the individual
mechanisms and interactions controlling the collective patterns (Weitz et al., 2012). It has been reported that
group size can increase both the accuracy and the speed of decision-making in various species
(Sumpter and Pratt, 2009), such as the cockroach Periplaneta americana (Canonge et al., 2011) or fish such as the
mosquitofish (Gambusia affinis) or the three-spined stickleback (Gasterosteus aculeatus) (Ward et al., 2008,
2012) - with mechanisms such as the quorum response. With respect to the decision-making processes, these
studies document how consensus emerges without leadership.

Footnote 1: Idea introduced by Pierre-Paul Grassé, who was working on the building behaviour of termites. He proposed that individuals
can communicate efficiently through local modifications of their environment (Grassé, 1959).
In this work, I use data on mosquitofish published in (Ward et al., 2011). The set-up for these
experiments is as follows: groups of n = 1, 2, 4, 8 and 16 fish are presented with a choice between the two arms
of a Y-maze. One of these arms hides a predator (fake, but recognizable by the mosquitofish). According to
(Ward et al., 2011), the larger the group, the more individuals concentrate their vigilance on smaller
portions of their immediate habitat and use the social information provided by their congeners. This
strategy results in more accurate decisions. At the individual scale, this theory can be seen as a
trade-off: at each time step, either the fish exploits the information it has gathered on its own or from social
cues, or it explores its local environment to obtain new information. This point of view relates nicely to the
multi-armed bandit framework, developed to address exactly such a problem. Whereas these algorithms are
commonly used to provide an optimal choice to the user, I aim here to investigate whether the strategy used by fish
can reasonably be related to multi-armed bandits. This will provide quantitative and qualitative insight
into the mechanisms involved in information transfer and accurate decision-making in fish shoals. The idea
of using multi-armed bandits to address questions in animal behaviour (even with three-spined sticklebacks)
is not new (McNamara and Houston, 1980, 1985; Thomas et al., 1985), but it has been restricted to individual behaviour
and learning processes - the latter do not occur in our motivating question.
The challenge of this work is to provide a multi-armed bandit model that takes social cues and the absence of
a learning procedure into account. I propose a formulation as well as an implementation of a new strategy
motivated by biology, called the Collaborative bandits algorithm. I also provide the formulation of the
likelihood associated with the algorithm and its parameters. The biological relevance of this new model is then
discussed using data fitting and simulations.
Chapter 2
Model
2.1 Adversarial bandit problem
A single fish faces K branches (arms), each potentially hosting a predator, and has to choose one of them. Each
branch (action) is denoted i ∈ {1, ..., K}. This situation is repeated with the same branches and the same
fish over several trials t = 1, 2, .... The choices of the fish are denoted I_t ∈ {1, ..., K} and the associated “feelings”
(gain or reward) g_{i,t} ∈ [0, 1], where g_{i,t} = 0 denotes a very stressful/harmful experience and g_{i,t} = 1 a pleasant
one (this is the same as a loss of 1 when meeting the predator and a loss of 0 when avoiding it). It is assumed
that the fish remembers the rewards of the previously chosen actions.
The choice of a theoretical model inspired by the multi-armed bandit framework, and specifically derived
from the adversarial bandit problem (whose set-up is exactly the one described above), is motivated by three
particularities of the set-up.
1. The rewards (information gathered through the choices of the fish) are revealed piece by piece over many rounds,
trial after trial t. Such sequential decision problems are treated within the online convex optimization
framework (Cesa-Bianchi and Lugosi, 2006).
2. At each trial t, the decision-maker receives only the reward g_{I_t,t} associated with its choice I_t and not the
rewards g_{i,t} of the other actions i ≠ I_t. The multi-armed bandit problem is designed for this special
case and comes in several formalizations, among which the stochastic and the adversarial (non-stochastic,
see footnote 1) ones (see Cesa-Bianchi and Lugosi, 2006, chap. 6).
3. Each of these formalizations can be efficiently addressed by a specific playing strategy: the Upper Confidence
Bound (UCB) algorithm in the stochastic case and the Exp3 randomized algorithm in the adversarial
case (Bubeck and Cesa-Bianchi, 2012). In the stochastic case, the rewards g_{i,t} are drawn independently
from unknown probability distributions ν_i associated with the arms i. With fish, there is no obvious reason
to assume that rewards are drawn independently; instead, I assume that all fish choosing
the same arm i at time t would receive the same reward. I therefore address our multi-armed bandit
problem with the Exp3 playing strategy.

Footnote 1: The adversarial formalization takes its name from the fact that it allows the reward values to be set arbitrarily anew at each trial t, and even
to be set adversely (i.e. against the success of the game).
2.1.1 The Exp3 Algorithm
Exp3 stands for Exponential-weight algorithm for Exploration and Exploitation. Exponential weighting is a
widely used procedure in what Arora et al. (2012) call the Multiplicative Weights framework. It works by
maintaining a weight for each action, using these weights to decide randomly which action to
take next, and increasing (decreasing) the relevant weights when a reward is good (bad). A factor γ ∈ ]0, 1]
tunes the propensity to pick an action uniformly at random, that is, it tunes the trade-off between exploration and
exploitation. If γ = 1, the weights have no effect on the choices at any step - the decision-maker performs only
exploration.
Auer et al. (2002) show that, for any T > 0 and any Γ such that Γ > G_max, choosing

    \gamma_\Gamma = \min\left\{ 1, \sqrt{\frac{K \ln K}{(e - 1)\,\Gamma}} \right\}    (2.1)

with K the number of arms leads, through Equation A.1, to the following bound (see proof in Appendix A):

    G_{\max} - \mathbb{E}[G_{\mathrm{Exp3}}] \le 2\sqrt{e - 1}\,\sqrt{\Gamma K \ln K} \le 2.63\,\sqrt{\Gamma K \ln K}    (2.2)

where G_max − E[G_Exp3] is the weak regret, with

    G_{\max} = \max_j \sum_{t=1}^{T} g_j(t)

and E[G_Exp3] the expectation of the rewards
the Exp3 algorithm induces. The weak regret is an intuitive notion that compares the results of the
algorithm to those of the best single action over all rounds. This is therefore the quantity I aim to minimize. I present
the pseudo-code of Exp3, as given in (Auer et al., 2002), in Algorithm 1.
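As a rough numerical illustration of Equations 2.1 and 2.2 (a sketch, not part of the original code), the snippet below evaluates the tuned γ and the corresponding bound on the weak regret. The function names `exp3_gamma` and `exp3_regret_bound` are hypothetical, and the value Γ = 501 is an assumption chosen only because rewards in [0, 1] over T = 500 trials give G_max ≤ 500.

```r
# Hypothetical helpers evaluating Equation 2.1 and the first bound of Equation 2.2.
exp3_gamma <- function(K, Gamma_bound) {
  min(1, sqrt(K * log(K) / ((exp(1) - 1) * Gamma_bound)))
}
exp3_regret_bound <- function(K, Gamma_bound) {
  2 * sqrt(exp(1) - 1) * sqrt(Gamma_bound * K * log(K))
}

K <- 2              # two branches, as in the Y-maze
Gamma_bound <- 501  # assumption: any value above G_max, and G_max <= T = 500

gamma_star <- exp3_gamma(K, Gamma_bound)
bound <- exp3_regret_bound(K, Gamma_bound)
print(c(gamma = gamma_star, bound = bound))
```

With these assumed values, γ comes out at roughly 0.04, which is of the same order as the small γ used in the simulations of section 2.1.2.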
Algorithm 1 Exp3 algorithm - Pseudo-code
Parameters: γ ∈ ]0, 1]
1: Initialize the weights w_i(0) for i = 1, ..., K
2: for t = 1, 2, ... do
3:   Set d_i(t) = (1 - \gamma) \frac{w_i(t)}{\sum_{j=1}^{K} w_j(t)} + \frac{\gamma}{K} for each i.
4:   Draw the next action I_t randomly according to the distribution of d_i(t).
5:   Observe reward g_{I_t}(t).
6:   Set the estimated reward \hat{g}_{I_t}(t) = \frac{g_{I_t}(t)}{d_{I_t}(t)}.
7:   Set w_{I_t}(t + 1) = w_{I_t}(t) \, e^{\gamma \hat{g}_{I_t}(t) / K}.
8:   Set all other w_j(t + 1) = w_j(t).
9: end for
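To make steps 3 to 8 of Algorithm 1 concrete, here is a single update worked through in R. This is only an illustrative sketch: the values γ = 0.5, unit initial weights, and a reward of 1 on arm 1 are assumptions picked for readability.

```r
# One hand-worked Exp3 update with assumed values.
gamma <- 0.5
K <- 2
weights <- c(1, 1)

# Step 3: mix the normalized weights with the uniform distribution.
d <- (1 - gamma) * weights / sum(weights) + gamma / K

# Steps 4-5: suppose arm 1 is drawn and yields a reward of 1.
arm <- 1
reward <- 1

# Step 6: importance-weighted reward estimate.
reward_est <- reward / d[arm]

# Step 7: only the weight of the chosen arm is updated.
weights[arm] <- weights[arm] * exp(gamma * reward_est / K)

# Distribution used at the next trial.
d_next <- (1 - gamma) * weights / sum(weights) + gamma / K
print(d_next)
```

After this single rewarded choice, the probability of arm 1 rises from 0.5 to about 0.56, while the γ/K term keeps the probability of arm 2 from ever dropping below 0.25.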
2.1.2 Implementation
Algorithm 1 is implemented in R (R Core Team, 2014) with the following set-up (Listing 2.1). At each time step, the
fish faces K = 2 branches. Branch i = 1 is predator free while branch i = 2 hosts one. When the fish chooses
the branch hosting a predator, g_{i,t} = 0. In the absence of a predator, g_{i,t} = 1. The chosen branches and the
evolution of the probability distribution over time are shown in Figure 2.1 for simulations with 500 time steps and
different values of γ. With γ = 0.02, branch 1 is rapidly favoured by the algorithm but branch 2 continues
to be chosen from time to time, even in the last time steps - there is still some exploration. When γ increases, the
algorithm favours exploration over exploitation, up to the extreme case γ = 1 with no exploitation at all:
the chosen arms are drawn from a uniform distribution.
Listing 2.1: Exp3 Algorithm
## reward_function(t_step, arm) is not defined in this listing; it must be
## available in the calling environment and return a reward in [0, 1].
exp3 <- function(arms_nb, gamma, t_max) {
  t_steps <- 1:t_max
  ## One row of arm probabilities per time step (row t_max + 1 is never filled)
  distribution <- matrix(NA, nrow = t_max + 1, ncol = arms_nb)
  weights <- rep(1, arms_nb)
  arm_chosen <- rep(NA, t_max)
  for (t_step in t_steps) {
    ## Mix the normalized weights with the uniform distribution
    distribution[t_step, ] <- (1 - gamma) * (weights / sum(weights)) + gamma / arms_nb
    ## Draw an arm according to this distribution
    arm_chosen[t_step] <- sample(1:arms_nb, 1, prob = distribution[t_step, ])
    reward <- reward_function(t_step, arm_chosen[t_step])
    ## Importance-weighted estimate of the reward
    reward_est <- reward / distribution[t_step, arm_chosen[t_step]]
    ## Only the weight of the chosen arm changes; all other weights are kept
    weights[arm_chosen[t_step]] <- weights[arm_chosen[t_step]] * exp(gamma * reward_est / arms_nb)
  }
  return(list(distribution, arm_chosen))
}
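Listing 2.1 calls a `reward_function` that is not shown. A minimal sketch consistent with the set-up described in the text (branch 1 predator free with reward 1, branch 2 hosting the predator with reward 0) could look as follows; the body is an assumption, since the original definition is not given.

```r
# Assumed reward function: branch 1 is predator free (reward 1),
# branch 2 hosts the predator (reward 0). The time step is unused because
# the predator does not move in this set-up.
reward_function <- function(t_step, arm) {
  if (arm == 1) 1 else 0
}

print(reward_function(1, 1))
print(reward_function(1, 2))
```

With such a function in scope, a call like `exp3(2, 0.02, 500)` would return a list holding the matrix of arm probabilities and the vector of chosen arms.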
[Figure 2.1 image: for each of γ = 0.02, γ = 0.5 and γ = 1, panel A plots the probability to pick Branch 1 and Branch 2 against Time (0 to 500), and panel B plots the branch chosen (1 or 2) against Time.]
Figure 2.1: Results of three simulations of the Exp3 algorithm with T = 500 and for different values
of γ ∈ {0.02, 0.5, 1}, with A the probability to pick branch 1 or 2 with respect to time and B the chosen
branches with respect to time. When γ is small, the probability to pick the branch without predator is high
and the densities of chosen arms are asymmetrical. For γ = 1, the distributions of chosen arms are identical.
2.2 Collaborative bandits
2.2.1 Biological assumptions
My biological model, following (Ward et al., 2011), is that the detection of predators by mosquitofish is
facilitated by increasing group size thanks to social communication between shoal mates. The use of social
information enables individuals to respond to threats without having to verify the presence of danger independently.
Fish, when inspecting for predators, can rely on chemical substances or on visual cues either emitted
by the environment (e.g. odour of the predator or visual detection of the predator), or shared (intentionally
or not) by congeners (e.g. the chemical alarm substance diffusing from an injured fish, or a fish escaping some
undetected stimulus with a strong flight behaviour). Unlike visual cues, chemical substances might be hard
for a predator to manipulate and may therefore be more reliable (Brown, 2003). However, as visual cues are
likely to propagate much faster than any other cues through members of a shoal (Brown and Laland, 2003),
I assume that in the considered experiment from (Ward et al., 2011), fish only make use of visual cues, either
private (direct detection of the predator) or public (monitoring the positions and the behaviours of their
shoal mates).
The information flow in a shoal can be altered by the structure of the group, which depends on various
parameters such as the age, the sex or the number of congeners (Hoare and Krause, 2003). It seems that
the more homogeneous the shoal is, the less the information flow is restricted to a sub-population within the
group. For instance, in adult guppies, it has been found that novel foraging information spreads at a significantly
faster rate through subgroups of females than of males (Reader and Laland, 2000).
It has also been reported that these parameters (e.g. the sex, the age or the number of individuals in a
group) can affect a fish's tendency to explore, that is, its exploitation/exploration
trade-off. For instance, the tendency to explore is expected to decrease in bigger groups because
conformity may make individuals unlikely to break with the group (Day et al., 2001).
In short, I assume that mosquitofish in a group communicate through visual cues, with an information
flow allowed to depend on various parameters. These parameters can also affect the propensity
of individuals within a group to explore. I now turn to the formulation of these assumptions.
2.2.2 Formulation of the model
Starting from the set-up presented in section 2.1, I extend the formulation so as to take information flow and
social structure in fish shoals into account. Several decision-makers are considered: n fish have to make a
decision among K paths. There are as many trials as fish in the group, that is t = 1, 2, ..., n. At each time
step, one fish makes a decision, shares information with its congeners and then disappears from the set-up. Social
communication links between fish in a shoal are represented by an adjacency matrix A of size n × n, where
A_{p,q} = 1 if fish p shares information with fish q and A_{p,q} = 0 otherwise. If A is a diagonal matrix, no information
is shared between individuals. If A_{p,q} = 1 for all p, q, each fish shares social cues with all its congeners.
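The two extreme group structures described above can be written down directly in R. This is a small sketch with an assumed group size of n = 4; the variable names are illustrative only.

```r
n <- 4  # assumed group size for the illustration

# Diagonal matrix: each fish is linked only to itself, so nothing is shared.
A_none <- diag(n)

# All-ones matrix: every fish shares its findings with every other fish.
A_full <- matrix(1, nrow = n, ncol = n)

# Fish that receive the update when fish 2 is the decision-maker.
receivers_none <- which(A_none[2, ] == 1)
receivers_full <- which(A_full[2, ] == 1)
print(receivers_none)
print(receivers_full)
```

In the diagonal case only fish 2 itself is listed, and since it leaves the set-up after deciding, its update reaches no later decision-maker.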
This model induces only small modifications of Algorithm 1. The procedure is described in Algorithm 2 and
implemented in Listing 2.2.
Algorithm 2 Collaborative Exp3 algorithm - Pseudo-code
Parameters: γ ∈ ]0, 1]
1: Set a fish pool F with n individuals
2: Initialize the weights w_i(0) for i = 1, ..., K for each individual in F
3: Set the adjacency matrix A_{p,q}
4: for t = 1, 2, ..., n do
5:   Pick one decision-maker f in F
6:   Set d_i(t) = (1 - \gamma) \frac{w_i^f(t)}{\sum_{j=1}^{K} w_j^f(t)} + \frac{\gamma}{K} for each i.
7:   Draw the next action I_t randomly according to the distribution of d_i(t).
8:   Observe reward g_{I_t}(t).
9:   Set the estimated reward \hat{g}_{I_t}(t) = \frac{g_{I_t}(t)}{d_{I_t}(t)}.
10:  Set w_{I_t}(t + 1) = w_{I_t}(t) \, e^{\gamma \hat{g}_{I_t}(t) / K} for each fish in {A_{f,q} = 1}.
11:  Set all other w_j(t + 1) = w_j(t) for j ≠ I_t.
12:  Update F = F \ {f}
13: end for

Listing 2.2: Collaborative Exp3 Algorithm
collexp3 <- function(arms_nb, gamma, neighbours) {
  ## neighbours is an adjacency matrix (undirected graph) modelling links between
  ## fish with respect to communication of social cues.
  ## Time steps and number of fish are merged.
  ## reward_function(t_step, arm) must be defined in the calling environment.
  fish_nb <- dim(neighbours)[1]
  t_steps <- 1:fish_nb
  distribution <- matrix(NA, nrow = fish_nb, ncol = arms_nb)
  ## Initial weights for arms_nb = 2 according to Ward, 2011 (one row per fish)
  weights <- matrix(rep(c(0.85, 0.15), fish_nb),
                    byrow = TRUE, nrow = fish_nb, ncol = arms_nb)
  arm_chosen <- rep(NA, fish_nb)
  fish_pool <- seq(from = 1, to = fish_nb, by = 1)
  for (t_step in t_steps) {
    ## A fish is randomly selected from the pool of available decision makers.
    ## Indexing by position avoids sample() treating a single remaining
    ## fish index k as the range 1:k.
    decision_maker <- fish_pool[sample.int(length(fish_pool), 1)]
    distribution[t_step, ] <- (1 - gamma) * (weights[decision_maker, ] / sum(weights[decision_maker, ])) + gamma / arms_nb
    arm_chosen[t_step] <- sample(1:arms_nb, 1, prob = distribution[t_step, ])
    reward <- reward_function(t_step, arm_chosen[t_step])
    reward_est <- reward / distribution[t_step, arm_chosen[t_step]]
    ## The updated weight of the chosen arm is passed on to every fish
    ## linked to the decision maker
    dm_neighbours <- which(neighbours[decision_maker, ] == 1)
    weights[dm_neighbours, arm_chosen[t_step]] <- weights[decision_maker, arm_chosen[t_step]] * exp(gamma * reward_est / arms_nb)
    ## The decision maker leaves the pool
    fish_pool <- fish_pool[fish_pool != decision_maker]
  }
  return(list(distribution, arm_chosen))
}
Table 2.1: Collective performances in decision-making per group size
Group size   Number of fish avoiding   Number of fish meeting   Proportion of fish       Proportion of fish
             the predator              the predator             avoiding the predator    meeting the predator
1            60                        48                       0.56                     0.44
2            19                        7                        0.73                     0.27
4            37                        14                       0.71                     0.29
8            95                        15                       0.85                     0.15
16           186                       25                       0.83                     0.17
Figure 2.2: Group of 8 fish facing the bifurcation of the Y-maze. One of the arms contains a fake but recognizable predator while the other does not. The choice of each fish group is recorded as a sequence of chosen arms.
2.2.3 Collection of data
The data used in this work were published in (Ward et al., 2011) and I give here a short description of
their experimental set-up. According to the article, experiments took place in a Y-maze constructed from white
Perspex (Figure 2.2) with mosquitofish obtained from Lake Northam, Sydney, Australia. The stem of the “Y”
was raised so that the water gradually increased in depth from 1 cm at the foot of the Y to 12 cm at the tips
of the arms. A replica predator, measuring 12 cm in length, was allocated to one of the arms of the Y-maze
at random and suspended in midwater using fine monofilament line. In pilot trials, the fish showed a strong
aversive response to the predator once they detected it. Experimental fish were added to a clear container set in
the stem of the Y. After 120 s the box was raised, releasing the fish. In all cases, the fish made their way down
the Y and into one of the arms. Five different group sizes (1, 2, 4, 8 and 16 fish) were used, with respectively 108,
13, 13, 14 and 14 replications.
The dataset consists of sequences of chosen arms for each experiment (e.g. 1 1 2 1 for a group of 4 fish
choosing successively branches 1, 1, 2 and 1, with branch 1 the clear branch and branch 2 the branch with the
fake predator). Some fish declined to take part in the decision process (e.g. a fish swimming backward) and are excluded from
the analysis. The overall performances per group size are presented in Table 2.1.
2.2.4 Fitting the model to data
When playing the Collaborative bandits strategy introduced here, the sequence of arms chosen by the strategy
depends on the distribution introduced in Algorithm 2,

    d_i(t) = (1 - \gamma) \frac{w_i^f(t)}{\sum_{j=1}^{K} w_j^f(t)} + \frac{\gamma}{K}

for each branch i. We
can therefore obtain the likelihood associated with a sequence of arms and a set of parameters θ by calculating

    L(I_1, I_2, ..., I_t, ..., I_n; \theta(t)) = \prod_{t=1}^{n} d_{I_t}(t)    (2.3)

Equation 2.3 can be used with a maximum likelihood estimation to fit the model to data (section 3.1). An
example of use is shown in Figure 2.3. Given a sequence, and thanks to the likelihood formula, I can recover
the group structure A most likely to have produced this sequence.
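A possible way to evaluate Equation 2.3 for one observed sequence is sketched below. It is not the code of Appendix B: the function name `sequence_likelihood` is hypothetical, and the sketch assumes that fish t is the decision-maker at step t, that arm 1 always yields reward 1 and arm 2 reward 0, and that all fish start from the same initial weights (the settings quoted in the caption of Figure 2.3).

```r
# Hypothetical helper: likelihood of an observed sequence of arms (Equation 2.3).
# Assumptions: fish t decides at step t; reward is 1 for arm 1 and 0 otherwise.
sequence_likelihood <- function(arms, A, gamma, init_weights = c(1, 1)) {
  n <- length(arms)
  K <- length(init_weights)
  weights <- matrix(init_weights, nrow = n, ncol = K, byrow = TRUE)
  lik <- 1
  for (t in seq_len(n)) {
    d <- (1 - gamma) * weights[t, ] / sum(weights[t, ]) + gamma / K
    arm <- arms[t]
    lik <- lik * d[arm]                    # factor d_{I_t}(t) of the product
    reward <- if (arm == 1) 1 else 0
    reward_est <- reward / d[arm]
    receivers <- which(A[t, ] == 1)        # fish informed by decision-maker t
    weights[receivers, arm] <- weights[t, arm] * exp(gamma * reward_est / K)
  }
  lik
}

arms <- c(1, 1, 1, 2)
A_none <- matrix(0, nrow = 4, ncol = 4)   # no information shared
A_full <- A_none
A_full[upper.tri(A_full)] <- 1            # every fish informs all later fish

lik_none <- sequence_likelihood(arms, A_none, gamma = 0.5)
lik_full <- sequence_likelihood(arms, A_full, gamma = 0.5)
print(c(none = lik_none, full = lik_full))
```

Under these assumptions the matrix without any sharing gives 0.5^4 = 0.0625, and the fully connected upper triangular matrix gives a slightly lower value, because the fourth fish chose arm 2 despite the accumulated evidence in favour of arm 1.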
[Figure 2.3 image: 4 × 4 grey-scale grid with rows and columns labelled 1 to 4 and a grey scale from 0.0 to 1.0, for the sequence 1 1 1 2.]
Figure 2.3: Adjacency matrix recovered from maximum likelihood estimation given a sequence 1 1 1 2.
The arm denoted 1 is predator free while arm 2 contains a fake predator. γ = 0.5, w_1(0) = w_2(0) = 1,
g_1(t) = 1 ∀t and g_2(t) = 0 ∀t. According to Equation 2.3, the likelihoods of all possible adjacency matrices of
size 4 × 4 are computed. We then sum all matrices maximizing the likelihood given the arm sequence and
represent the result as the image above. The darker the cell, the closer to 1 the associated element. Here,
we can see that the first decision-maker is likely to have informed the second fish well and the fourth one badly.
Elements of the lower triangular matrix (including diagonal elements) have no influence on the calculation of
the likelihood since current decision-makers cannot communicate their findings to previous decision-makers.
Chapter 3
Results
3.1 Tuning parameters
The efficiency of the modelled shoal in making the right decision (i.e. avoiding the predator) can be tuned through the
trade-off parameter γ, the value of the rewards, the number n of individuals, the initial weights, the quality of
information transfer (e.g. one could think of a parameter for the reliability of information transfer but, for the sake
of simplicity in this exploratory work, I do not consider it here and all interactions have the same importance)
and the structure of the fish shoal (i.e. respectively the values in and the shape of the adjacency matrix A).
In the experiments conducted by (Ward et al., 2011), I assume the age of fish to be controlled: only fish
of 26± 5 mm were used and there is a strong correlation between the size and the age (Bell and Foster, 1994,
chap. 5). The sex is not controlled because the authors have not reported any differences in accuracy of the
decision-making process between males and females. However, they have not tested the effect of the ratio of
males and females within a group: it might be that communication flow would be different as well as the appeal
for exploration of the group with respect to the male/female ratio.
I will assume here that differences between groups only occur because of the number of fish in each group. In
short, the number of fish can affect two different components of our model: the information flow and the
appeal for exploration (that is, the exploitation/exploration trade-off for the individuals of the group), respectively
formalized as A and γ.
From the results reported in Table 2.1 for the group size n = 1, I can infer that fish are able to detect the
predator on their own, without the use of social cues. Accordingly, I arbitrarily set the initial weights to w_1(0) = 0.85
and w_2(0) = 0.15 for branches 1 and 2. The associated rewards are also arbitrarily set to g_{1,t} = 1 and g_{2,t} = 0 ∀t.
The tuning of A and γ is addressed with a maximum likelihood estimation performed using Equation 2.3 over all
experiments involving the group size n. The point is, given E experiments (i.e. sequences of chosen branches)
with groups of n fish, to maximize the likelihood now expressed as

    L(I_{1,1}, ..., I_{t,\mathrm{exp}}, ..., I_{n,E}; A, \gamma) = \prod_{\mathrm{exp}=1}^{E} \prod_{t=1}^{n} d_{I_{t,\mathrm{exp}}}(t)    (3.1)
Two different methods are used to generate adjacency matrices A to be evaluated by the maximum likelihood
estimation.

[Figure 3.1 image: four grey-scale panels, A (Groups of 2 fish), B (Groups of 4 fish), C (Groups of 8 fish) and D (Groups of 16 fish); x-axis: Number of interactions; y-axis: γ; grey scale: likelihood.]
Figure 3.1: Maximum likelihood estimation of the adjacency matrix A and of the exploration/exploitation
parameter γ, with A, B, C and D respectively the results for groups of n = 2, 4, 8 and 16
fish. The minimum and maximum likelihood values are A 1.49 × 10^-8 and 2.65 × 10^-7, B 3.55 × 10^-15 and 6.17 × 10^-13,
C 1.26 × 10^-29 and 2.77 × 10^-11, and D 8.27 × 10^-25 and 2.21 × 10^-12. The darker the grey shade, the more likely
the pair (ι, γ). ι stands for the number of interactions reported in a given A (i.e. the number of elements
equal to 1). If a triplet (ι, γ, L) appears several times, the mean value of L involved is reported. To obtain the
complete grid, linear extrapolation from the likelihood estimations is performed. A and B are obtained with
method 1 described in section 3.1 whereas C and D are obtained with method 2. The γ values evaluated lie in the
range [0.01, 0.02, ..., 1]. The pair (A, γ) maximizing L for each group size is reported with a red cross. For n = 4
fish, there are two different pairs maximizing the likelihood. These values can differ from the darker regions
because ι, as used in the figures, does not take into account the structure of A involved in the calculation of L.
The γ values are respectively equal to 0.34, 0.44, 0.01 and 0.03.

[Figure 3.2 image: likelihood L against γ for n = 1, 2, 4, 8 and 16, and a last panel titled "Value maximizing the likelihood per group size" plotting γ against Group size.]
Figure 3.2: Maximum likelihood estimation of the γ parameter with the adjacency matrix A set to
1_{n×n}. The γ values estimated lie in the range [0.001, 0.002, ..., 1]. The maxima of the likelihood L are at γ = 0.841,
0.36, 0.445, 0.001 and 0.026 for respectively n = 1, 2, 4, 8 and 16 (reported in the last plot).
For n = {2, 4}, I generate all possible upper triangular matrices (with diagonal elements set to 0) because
these are the only elements actually used by the algorithm: once the fish making the decision
has chosen a branch, it disappears from the experiment (i.e. the following fish cannot communicate with it).
Accordingly, the number of possible adjacency matrices is 2^v, with v the number
of elements in an upper triangular matrix of size n, not counting diagonal elements. v = 1, 6, 28 and 120,
giving 2, 64, 268435456 and 1.329228 × 10^36 possible adjacency matrices for respectively n = 2, 4, 8 and 16. The
possible binary combinations are then found by iterating an index j from 0 to 2^v − 1 and taking j′, the binary
pattern of j with v bits. For instance, for n = 4, we have v = 6 so that 2^v − 1 = 63, j ∈ [0, 1, 2, 3, 4, ..., 62, 63] and
j′ ∈ [000000, 000001, 000010, 000011, 000100, ..., 111110, 111111]. I then reconstruct a matrix A_{n×n} from j′ by setting
all elements of the lower triangular matrix and of the diagonal to 0.
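The enumeration described above can be sketched in a few lines of R. This is an illustration rather than the code of Appendix B: the function name `all_upper_triangular` is hypothetical, and the order in which the bits of j fill the upper triangle is an arbitrary choice (any fixed order enumerates the same set of matrices).

```r
# Hypothetical helper: list every n x n adjacency matrix whose only possible
# non-zero entries lie strictly above the diagonal. Only practical for small n.
all_upper_triangular <- function(n) {
  v <- n * (n - 1) / 2                        # number of free entries
  lapply(0:(2^v - 1), function(j) {
    # v lowest bits of j, least significant bit first
    bits <- as.integer(intToBits(j))[seq_len(v)]
    A <- matrix(0, nrow = n, ncol = n)        # lower triangle and diagonal stay 0
    A[upper.tri(A)] <- bits
    A
  })
}

mats <- all_upper_triangular(4)
print(length(mats))
print(mats[[64]])
```

For n = 4 this yields the 64 matrices mentioned in the text, from the all-zero matrix (j = 0) to the matrix with all six upper triangular entries set to 1 (j = 63).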
The number of possible matrices when n = 8 or 16 requires a second method to construct them. Instead
of evaluating all possible adjacency matrices (which would be very time consuming compared to the benefits of
the expected results), I randomly construct adjacency matrices in which each fish shares information with 0%,
25%, 50%, 75% and 100% of the following decision makers, and then I evaluate the likelihood of each one of
them. I repeat the procedure ten times to avoid biased results due to the random construction of the matrices.
The γ values evaluated in the maximum likelihood estimation lie in the range [0.01, 0.02, ..., 0.99, 1]. The code
generating the matrices (method 1 as well as method 2) and evaluating the likelihoods is presented in Appendix
B.
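The second method could be sketched as follows. This is not the code of Appendix B: the function name `random_upper_matrix` is hypothetical, and rounding the number of informed followers with `round()` is an assumption, since the text does not say how fractional counts are handled.

```r
# Hypothetical helper: each fish p informs a random subset of the fish that
# decide after it, the subset size being a fixed share of those followers.
random_upper_matrix <- function(n, share) {
  A <- matrix(0, nrow = n, ncol = n)
  for (p in seq_len(n - 1)) {
    followers <- (p + 1):n
    k <- round(share * length(followers))   # assumption: simple rounding
    chosen <- followers[sample.int(length(followers), k)]
    A[p, chosen] <- 1
  }
  A
}

set.seed(1)  # only to make the illustration reproducible
shares <- c(0, 0.25, 0.5, 0.75, 1)
candidates <- lapply(shares, function(s) random_upper_matrix(8, s))
print(sapply(candidates, sum))  # number of interactions in each matrix
```

Repeating the construction several times, as the text does ten times, would give several different candidate matrices for the intermediate shares, while the 0% and 100% cases are always the same.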
According to the results of (Ward et al., 2011), I expect that many fish share information
in the fish shoals (i.e. a high number of elements of the upper triangular adjacency matrix
should be equal to 1) and that the greater the number of fish, the more decision-making relies on
information emitted by congeners (i.e. when the number of fish in a group n increases, γ decreases, meaning
that exploitation is favoured over exploration). The results of the maximum likelihood estimation are consistent
with my expectations (Figure 3.1). The adjacency matrices maximizing the likelihood have almost all of their elements
in the upper triangular part equal to one (i.e. all decision makers share information with their congeners).
This is particularly noticeable for larger groups (n = 8 and n = 16), even if the use of the second method
presented earlier can also explain this result. When it is not true (for n = 2 for instance), it can be seen that
the likelihood has almost the same value across the x-axis: in small groups, the shoal structure matters less
for maximizing the likelihood than the value of γ. The values of γ maximizing the likelihood are γ = 0.84, 0.34,
0.44, 0.01 and 0.03 for respectively n = 1, 2, 4, 8 and 16. Note that 0.01 is the smallest value of γ evaluated by
the maximum likelihood estimation, meaning that γ for n = 8 may be smaller.
As it seems that all fish are involved in the information flow even for larger group sizes, I set the adjacency
matrices to the all-ones matrix $\mathbf{1}_{n \times n}$ and evaluate γ again, now over the grid 0.001, 0.002, ..., 1. This gives γ = 0.841, 0.36,
0.445, 0.001 and 0.026 for n = 1, 2, 4, 8 and 16 respectively (Figure 3.2). These values are kept as tuned
parameters.
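The grid search over γ can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the results: the function `sequence_loglik` is a hypothetical rewrite of the update in Appendix B that works on log-probabilities, and `toy_sequences` is made-up data standing in for the recorded choices.

```r
# Log-likelihood of one observed sequence of arm choices (1 = safe branch,
# 2 = predator branch). A[p, q] == 1 means that decision maker p passes its
# updated weight on to fish q, as in the listing of Appendix B.
sequence_loglik <- function(choices, gamma, A, w0 = c(0.85, 0.15)) {
  n <- length(choices)
  K <- length(w0)
  w <- matrix(w0, nrow = n, ncol = K, byrow = TRUE)
  ll <- 0
  for (p in seq_len(n)) {
    arm <- choices[p]
    prob <- (1 - gamma) * w[p, arm] / sum(w[p, ]) + gamma / K
    reward <- if (arm == 1) 1 else 0
    shared <- which(A[p, ] == 1)
    w[shared, arm] <- w[p, arm] * exp(gamma * (reward / prob) / K)
    ll <- ll + log(prob)
  }
  ll
}

# Made-up data: three experiments with groups of four fish (assumption)
toy_sequences <- list(c(1, 1, 2, 1), c(1, 2, 1, 1), c(2, 1, 1, 1))
A_full <- matrix(1, 4, 4)

gammas <- seq(0.001, 1, by = 0.001)
total_ll <- sapply(gammas, function(g) {
  sum(sapply(toy_sequences, sequence_loglik, gamma = g, A = A_full))
})
gamma_hat <- gammas[which.max(total_ll)]
```

Because the grid is bounded below by 0.001, an estimate that lands on that boundary only says that the maximizer is at most 0.001.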
3.2 Model properties
The collaborative bandits algorithm produces results which are compatible with the data, thanks to the γ
parameter I have tuned for each group size (Figure 3.3). Unfortunately, given the tuned parameters as well as
the arbitrarily set parameters, the model is not able to recover the biological properties I have inferred in section
2.2.1. Even if the maximum likelihood estimation leads me to consider only shoals in which fish communicate heavily,
simulations show that the increase in performance of the decision-making process is not due to social
communication (Figure 3.4). The combination of the γ values and of the initial weights, arbitrarily set to
w1(0) = 0.85 and w2(0) = 0.15, is sufficient to make the simulations compatible with the data. To run simulations
for group sizes larger than those considered in our data (i.e. for n > 16), I extrapolate the γ values with a
non-linear function f : n ↦ exp(α + βn) (Figure 3.5). Simulations are then run for n ∈ {1, 2, 4, 8, 16, 25, 50, 100}.
It appears that the performance for large group sizes is bounded by the largest initial weight,
w1(0) = 0.85 (Figure 3.6). The higher performances observed for larger group sizes are due to small values of
γ, which make each decision-maker rely only on its initial weights, and these already encode the right
choice: no information flow is required.
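The extrapolation step can be sketched in R as follows. The coefficients α = 0.059 and β = −0.332 are those reported in the caption of Figure 3.5; the variable names and the starting values passed to `nls` are assumptions, and the refit is wrapped in `try` because convergence depends on them.

```r
group_size <- c(1, 2, 4, 8, 16)
gamma_hat  <- c(0.841, 0.36, 0.445, 0.001, 0.026)  # tuned values from section 3.1

# Exponential model with the coefficients reported in Figure 3.5
f <- function(n, alpha = 0.059, beta = -0.332) exp(alpha + beta * n)

# Extrapolated gamma for the group sizes not covered by the data
gamma_large <- f(c(25, 50, 100))

# Refitting the model by non-linear least squares; the start values are an
# assumption and the call is protected in case the fit does not converge.
fit <- try(nls(gamma_hat ~ exp(alpha + beta * group_size),
               start = list(alpha = 0, beta = -0.3)), silent = TRUE)
if (!inherits(fit, "try-error")) print(coef(fit))
```

The extrapolated values are so close to zero that the mixing term γ/K almost vanishes, which is what ties the simulated performance to the initial weights.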
My model highlights that the collective patterns we observe in this set-up do not imply that fish actually
communicate more and share the vigilance tasks in large groups. It could be that the more fish there are in the group,
the more each fish relies on its own information to make its decision. However, this requires that all fish
already have good information about their surroundings when making their choice, which is rather unlikely.
I repeat the entire procedure (tuning of the parameters and simulations) with two new sets of initial weights:
when they are slightly different (w1(0) = 0.556 and w2(0) = 0.444, the values observed experimentally when
n = 1) and when they are equal (w1(0) = w2(0) = 1). Given these new initial values, the Collaborative
bandits algorithm is able to recover biological properties inferred by previous studies, such as the role of social
communication and of group size in improving the accuracy of the decision-making process (Figures
3.7 and 3.8, A, B and D). However, given these initial values, the simulations are not compatible with the data
(Figures 3.7 and 3.8, C). It could also be that, by choosing new values for the rewards while keeping such initial
values for the weights, the algorithm would be able to produce results compatible with the data, but this procedure
highlights the problem of models including too many free parameters (i.e. parameters we cannot measure in
the data). With these free parameters (namely the rewards and the initial values of the weights, which I have set
arbitrarily), it seems that the Collaborative bandits algorithm can be compatible with the data under several very
different underlying biological hypotheses. As long as these free parameters cannot be measured experimentally,
we cannot decide which of these biological hypotheses actually holds. This also makes it difficult to assess
the actual contribution of this new algorithm.
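The dependence on the initial weights can be illustrated with a small simulation. This is a sketch under stated assumptions rather than the code used for Figures 3.6 to 3.8: `simulate_group` is a hypothetical function that reuses the weight update of Appendix B, with arm 1 the branch without predator, and the same small γ is used for both sets of weights.

```r
# One simulated group: returns the proportion of fish choosing arm 1 (safe).
# A[p, q] == 1 means that decision maker p passes its updated weight to fish q.
simulate_group <- function(n, gamma, w0, A) {
  K <- length(w0)
  w <- matrix(w0, nrow = n, ncol = K, byrow = TRUE)
  choices <- integer(n)
  for (p in seq_len(n)) {
    d <- (1 - gamma) * w[p, ] / sum(w[p, ]) + gamma / K
    arm <- sample.int(K, 1, prob = d)
    reward <- if (arm == 1) 1 else 0
    shared <- which(A[p, ] == 1)
    w[shared, arm] <- w[p, arm] * exp(gamma * (reward / d[arm]) / K)
    choices[p] <- arm
  }
  mean(choices == 1)
}

set.seed(42)
A_full <- matrix(1, 8, 8)
perf_085 <- mean(replicate(1000, simulate_group(8, 0.001, c(0.85, 0.15), A_full)))
perf_556 <- mean(replicate(1000, simulate_group(8, 0.001, c(0.556, 0.444), A_full)))
```

With γ this small the weights barely move, so the simulated performance stays close to the normalised initial weight of the safe arm in both cases, whatever the adjacency matrix.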
[Figure: proportion of fish avoiding the predator (y-axis, 0.0 to 1.0) against group size (x-axis: 1, 2, 4, 8, 16)]
Figure 3.3: Goodness of fit of the Collaborative Exp3 algorithm. I report here the mean ± s.d. of the proportion of success recovered from data and a 95% confidence interval derived from 1000 simulations of the Collaborative Exp3 algorithm with tuned parameters (red lines). The fitted algorithm is able to recover most of the patterns exhibited by data.
[Figure: boxplots of the proportion of fish avoiding the predator (y-axis, 0.4 to 1.0) for "No social communication" and "Social communication"]
Figure 3.4: Effect of social communication on decision-making performance as simulated with the Collaborative Exp3 algorithm. I present here the results of simulations with 1000 replications of a group of n = 8 fish facing K = 2 paths, one of which contains a predator. Simulations with A a diagonal matrix are reported in the No social communication boxplot, whereas simulations with $A_{p,q} = 1\ \forall (p, q)$ are reported in the Social communication boxplot. γ = 0.001 and the reward is $g_{i_t}(t) = 1$ when the predator is absent and $g_{i_t}(t) = 0$ when the predator is present. With this choice of γ (estimated by the maximum likelihood procedure), the social structure has no effect on the performance.
[Figure: γ (y-axis, 0.0 to 0.8) against group size (x-axis, 0 to 100)]
Figure 3.5: Extrapolation of γ for large group sizes (n > 16). I model γ = f(n) with f the function f : n ↦ exp(α + βn) (red line). α = 0.059 and β = −0.332 are found by the non-linear least squares estimation performed by the R function nls. The γ values estimated by the maximum likelihood estimation are shown as circles.
[Figure: proportion of fish avoiding the predator (y-axis, 0.0 to 1.0) against group size (x-axis: 1, 2, 4, 8, 16, 25, 50, 100)]
Figure 3.6: Effect of group size on decision-making performance as simulated by the Collaborative bandits algorithm. I report here the mean ± s.d. of the proportion of success through simulations with 1000 replications of groups of n ∈ {1, 2, 4, 8, 16, 25, 50, 100} fish facing K = 2 paths, one of which contains a predator. Simulations are conducted with $A_{p,q} = 1\ \forall (p, q)$ and γ = 0.841, 0.36, 0.445, 0.001, 0.026, $2.63 \times 10^{-4}$, $6.53 \times 10^{-8}$ and $4.03 \times 10^{-15}$ for n = 1, 2, 4, 8, 16, 25, 50 and 100 respectively (for the last three values, see Figure 3.5). The rewards are set to $g_{i_t}(t) = 1$ when the predator is absent and $g_{i_t}(t) = 0$ when the predator is present. The red dashed line stands for a proportion of w1(0) = 0.85. The performance converges to the value of w1(0).
[Figure, four panels. A: "Groups of 8 fish", γ (y-axis, 0.0 to 1.0) against number of interactions (x-axis, 0 to 25), colour scale from 0 to 2.5e−19. B: proportion of fish avoiding the predator (0.0 to 1.0) for "No social communication" and "Social communication". C: proportion of fish avoiding the predator (0.0 to 1.0) against group size (1, 2, 4, 8, 16). D: proportion of fish avoiding the predator (0.0 to 1.0) against group size (1, 2, 4, 8, 16, 25, 50, 100).]
Figure 3.7: Summary of the results obtained with the Collaborative bandits algorithm when the initial weights are slightly different. Initial weights are set to w1(0) = 0.556 and w2(0) = 0.444. Panel A shows the maximum likelihood estimation of the adjacency matrix A and of γ for n = 8 fish, panel B the effect of social communication with n = 8 fish, panel C the performance of the algorithm with respect to data and panel D the results of 1000 simulations for n = 1, 2, 4, 8, 16, 25, 50, 100. See Figures 3.1, 3.4, 3.3 and 3.6 respectively for details. Given these initial weights, the collaborative bandits algorithm hardly recovers the collective pattern seen in the data, whereas there is an effect of social communication and of group size on the collective performance.
[Figure, four panels. A: "Groups of 8 fish", γ (y-axis, 0.0 to 1.0) against number of interactions (x-axis, 0 to 25), colour scale from 0 to 1.5e−20. B: proportion of fish avoiding the predator (0.0 to 1.0) for "No social communication" and "Social communication". C: proportion of fish avoiding the predator (0.0 to 1.0) against group size (1, 2, 4, 8, 16). D: proportion of fish avoiding the predator (0.0 to 1.0) against group size (1, 2, 4, 8, 16, 25, 50, 100).]
Figure 3.8: Summary of the results obtained with the Collaborative bandits algorithm when the initial weights are equal. Initial weights are set to w1(0) = w2(0) = 1. Panel A shows the maximum likelihood estimation of the adjacency matrix A and of γ for n = 8 fish, panel B the effect of social communication with n = 8 fish, panel C the performance of the algorithm with respect to data and panel D the results of 1000 simulations for n = 1, 2, 4, 8, 16, 25, 50, 100. See Figures 3.1, 3.4, 3.3 and 3.6 respectively for details. Given these initial weights, the collaborative bandits algorithm hardly recovers the collective pattern seen in the data, whereas there is an effect of social communication and of group size on the collective performance.
Chapter 4
Discussion
Biology has already inspired several optimization algorithms. One could for instance mention Ant
Colony Optimization or Ant Colony Routing algorithms (Bonabeau et al., 2000). It appears, thanks to this
work, that a two-way flow between the two fields would be worth considering. Making the controversial
hypothesis (see the “adaptationist program”, Gould and Lewontin, 1979, and Mayr, 1983) that natural
selection has led the design of organisms to solve problems in an optimal way, it makes sense to connect
biology with the models designed by researchers to optimally solve problems met by humans.
Such a method (adapting models developed in optimization or distributed control to biological problems)
raises difficulties that are non-trivial to overcome. For instance, some of the parameters of these models can, as
in this work, be difficult or impossible to measure, which makes producing new biological insights hard.
Even if I have shown that my algorithm can give results comparable to the data observed at the
collective level, it would be hazardous to claim improvements in the understanding of the decision-making process in the
mosquitofish, given the number of free parameters: the model can be validated under several opposed biological
hypotheses. For example, I have validated here an unlikely model where, in small groups, fish have a strong
preference for one of the branches but do not take it into account when making their decisions, whereas in
large groups the choice is made according to these early preferences, which are not due to social communication,
at least not as formulated in the model. With another set of arbitrarily fixed parameters, one can validate
the expected assumptions, where improved accuracy in decisions is due to effective social communication in
larger groups. These limits have already been expressed when trying to model collective motion, where many
purely theoretical rules can produce the same patterns as observed at the collective level in animal groups,
obscuring which cognitive rules are actually implemented in individuals (Weitz et al., 2012; Gautrais et al., 2012).
Nevertheless, two ideas are going to be investigated to improve the results observed when setting the initial weights to
w1(0) = 0.556 and w2(0) = 0.444. It could be that in experiments with one fish (n = 1) the actual probability
that the fish chooses the branch without predator is higher than 0.556, while still lying within the 95% confidence
interval [0.46, 0.65]. The model may also be formulated differently. The theoretical set-up formulated here
states that fish take their decisions one by one, queuing in front of the bifurcation, and are then removed
from the experiment. Another, more realistic formulation, allowing fish to share information even when they
do not take a final decision (i.e. letting the algorithm run for more iterations than there are fish in the group), has to be
addressed.
In addition to a framework implying that fish vote to allow consensus to emerge (Ward et al., 2008; Couzin
et al., 2011; Kao and Couzin, 2014), the algorithm presented here sheds light on new ideas that can be derived
from a weighted majority algorithm (Arora et al., 2012). These new ideas now need to be confronted with
neuroscience and cognitive sciences to provide a new model relying more on a data-driven approach than on
the theory-driven approach I have followed here, according to the terminology and methodology proposed
in (Sumpter et al., 2012). In my opinion these further investigations are worth considering, given the
simple way to recover fish shoal structure that the Collaborative bandits algorithm provides. Information transfer in
groups could then be investigated simply, making it possible to address the effects of several parameters, such as sex
ratio in the group, age ratio or brain size, on the information flow of fish shoals. In this scope, it would also be
worth considering another parameter allowing relationships within the group to be more complex than binary
interactions (interaction / no interaction), so as to obtain quantitative measures of the strength of communication
between individuals. Another interesting source of additional work lies in the possibility of integrating this model
in a spatial perspective where, for instance, decision-makers could communicate only with their direct neighbours:
it would then require the adjacency matrices A to be time-dependent.
In short, this is the first published approach to try to match those algorithms to data, hence bridging a gap
between theory and biological practice. I am confident this work can be used in the near future to explore, in a
simple and elegant way, information transfer as well as optimality in the decision-making processes of animal groups.
Bibliography
Sanjeev Arora, Elad Hazan, and Satyen Kale. The multiplicative weights update method: A meta-algorithm
and applications. Theory of Computing, 8:121–164, 2012.
Peter Auer, Nicolo Cesa-Bianchi, Yoav Freund, and Robert E. Schapire. The non-stochastic multi-armed
bandit problem. SIAM Journal on Computing, 32(1):48–77, 2002. URL http://epubs.siam.org/doi/abs/
10.1137/S0097539701398375.
Michael A Bell and Susan Adlai Foster. The evolutionary biology of the threespine stickleback. Oxford University
Press, 1994.
Eric Bonabeau, Marco Dorigo, and Guy Theraulaz. Inspiration for optimization from social insect behaviour.
Nature, 406(6791):39–42, 2000. URL http://www.nature.com/nature/journal/v406/n6791/abs/406039a0.html.
Culum Brown and Kevin N. Laland. Social learning in fishes: a review. Fish and Fisheries, 4(3):280–288, 2003.
URL http://onlinelibrary.wiley.com/doi/10.1046/j.1467-2979.2003.00122.x/full.
Grant E. Brown. Learning about danger: chemical alarm cues and local risk assessment in prey fishes. Fish and
Fisheries, 4(3):227–234, 2003. URL http://onlinelibrary.wiley.com/doi/10.1046/j.1467-2979.2003.
00132.x/full.
Sebastien Bubeck and Nicolo Cesa-Bianchi. Regret analysis of stochastic and nonstochastic multi-armed bandit
problems. Foundations and Trends® in Machine Learning, 5(1):1–122, 2012. ISSN 1935-8237, 1935-8245.
doi: 10.1561/2200000024. URL http://www.nowpublishers.com/product.aspx?product=MAL&doi=2200000024.
Stephane Canonge, Jean-Louis Deneubourg, and Gregory Sempo. Group living enhances individual resources
discrimination: The use of public information by cockroaches to assess shelter quality. PLoS ONE, 6(6):
e19748, June 2011. ISSN 1932-6203. doi: 10.1371/journal.pone.0019748. URL http://dx.plos.org/10.
1371/journal.pone.0019748.
Nicolo Cesa-Bianchi and Gabor Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.
URL http://dx.doi.org/10.1017/CBO9780511546921.
I. D. Couzin, C. C. Ioannou, G. Demirel, T. Gross, C. J. Torney, A. Hartnett, L. Conradt, S. A. Levin,
and N. E. Leonard. Uninformed individuals promote democratic consensus in animal groups. Science,
334(6062):1578–1580, December 2011. ISSN 0036-8075, 1095-9203. doi: 10.1126/science.1210280. URL
http://www.sciencemag.org/cgi/doi/10.1126/science.1210280.
Rachel L. Day, Tom MacDonald, Culum Brown, Kevin N. Laland, and Simon M. Reader. Interactions between
shoal size and conformity in guppy social foraging. Animal Behaviour, 62(5):917–925, November 2001.
ISSN 00033472. doi: 10.1006/anbe.2001.1820. URL http://linkinghub.elsevier.com/retrieve/pii/S0003347201918202.
A. Dussutour, S. C. Nicolis, G. Shephard, M. Beekman, and D. J. T. Sumpter. The role of multiple pheromones
in food recruitment by ants. Journal of Experimental Biology, 212(15):2337–2348, July 2009. ISSN 0022-0949,
1477-9145. doi: 10.1242/jeb.029827. URL http://jeb.biologists.org/cgi/doi/10.1242/jeb.029827.
Audrey Dussutour and Stamatios C. Nicolis. Flexibility in collective decision-making by ant colonies: Tracking
food across space and time. Chaos, Solitons & Fractals, 50:32–38, May 2013. ISSN 09600779. doi: 10.1016/
j.chaos.2013.02.004. URL http://linkinghub.elsevier.com/retrieve/pii/S0960077913000325.
Jacques Gautrais, Francesco Ginelli, Richard Fournier, Stephane Blanco, Marc Soria, Hugues Chate, and Guy
Theraulaz. Deciphering interactions in moving animal groups. PLoS Computational Biology, 8(9):e1002678,
September 2012. ISSN 1553-7358. doi: 10.1371/journal.pcbi.1002678. URL http://dx.plos.org/10.1371/
journal.pcbi.1002678.
S. J. Gould and R. C. Lewontin. The spandrels of San Marco and the Panglossian paradigm: A critique
of the adaptationist programme. Proceedings of the Royal Society B: Biological Sciences, 205(1161):581–598,
September 1979. ISSN 0962-8452, 1471-2954. doi: 10.1098/rspb.1979.0086. URL
http://rspb.royalsocietypublishing.org/cgi/doi/10.1098/rspb.1979.0086.
Pierre-P. Grassé. La reconstruction du nid et les coordinations interindividuelles chez Bellicositermes natalensis
et Cubitermes sp. La théorie de la stigmergie: Essai d’interprétation du comportement des termites constructeurs.
Insectes sociaux, 6(1):41–80, 1959. URL http://link.springer.com/article/10.1007/BF02223791.
D. J. Hoare and J. Krause. Social organisation, shoal structure and information transfer. Fish and Fisheries,
4(3):269–279, 2003. URL http://onlinelibrary.wiley.com/doi/10.1046/j.1467-2979.2003.00130.x/
full.
A. B. Kao and I. D. Couzin. Decision accuracy in complex environments is often maximized by small group
sizes. Proceedings of the Royal Society B: Biological Sciences, 281(1784):20133305–20133305, April 2014.
ISSN 0962-8452, 1471-2954. doi: 10.1098/rspb.2013.3305. URL http://rspb.royalsocietypublishing.
org/cgi/doi/10.1098/rspb.2013.3305.
Ernst Mayr. How to carry out the adaptationist program? American Naturalist, pages 324–334, 1983.
John McNamara and Alasdair Houston. The application of statistical decision theory to animal behaviour.
Journal of Theoretical Biology, 85(4):673–690, 1980. URL http://www.sciencedirect.com/science/article/pii/0022519380902659.
John M. McNamara and Alasdair I. Houston. Optimal foraging and learning. Journal of Theoretical Biology,
117(2):231–249, 1985. URL http://www.sciencedirect.com/science/article/pii/S0022519385802198.
Simon M. Reader and Kevin N. Laland. Diffusion of foraging innovations in the guppy. Animal Behaviour, 60(2):
175–180, August 2000. ISSN 00033472. doi: 10.1006/anbe.2000.1450. URL http://linkinghub.elsevier.
com/retrieve/pii/S0003347200914507.
D. J. T. Sumpter. The principles of collective animal behaviour. Philosophical Transactions of the Royal Society
B: Biological Sciences, 361(1465):5–22, January 2006. ISSN 0962-8436, 1471-2970. doi: 10.1098/rstb.2005.
1733. URL http://rstb.royalsocietypublishing.org/cgi/doi/10.1098/rstb.2005.1733.
D. J. T. Sumpter, R. P. Mann, and A. Perna. The modelling cycle for collective animal behaviour. Interface
Focus, 2(6):764–773, December 2012. ISSN 2042-8898, 2042-8901. doi: 10.1098/rsfs.2012.0031. URL http:
//rsfs.royalsocietypublishing.org/cgi/doi/10.1098/rsfs.2012.0031.
D. J.T Sumpter and S. C Pratt. Quorum responses and consensus decision making. Philosophical Transactions
of the Royal Society B: Biological Sciences, 364(1518):743–753, March 2009. ISSN 0962-8436, 1471-2970. doi:
10.1098/rstb.2008.0204. URL http://rstb.royalsocietypublishing.org/cgi/doi/10.1098/rstb.2008.
0204.
R. Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical
Computing, Vienna, Austria, 2014. URL http://www.R-project.org/.
G. Thomas, A. Kacelnik, and J. Van Der Meulen. The three-spined stickleback and the two-armed bandit.
Behaviour, pages 227–240, 1985. URL http://www.jstor.org/stable/4534444.
Ashley J. W. Ward, Jens Krause, and David J. T. Sumpter. Quorum decision-making in foraging fish shoals.
PLoS ONE, 7(3):e32411, March 2012. ISSN 1932-6203. doi: 10.1371/journal.pone.0032411. URL http:
//dx.plos.org/10.1371/journal.pone.0032411.
Ashley JW Ward, David JT Sumpter, Iain D. Couzin, Paul JB Hart, and Jens Krause. Quorum decision-
making facilitates information transfer in fish shoals. Proceedings of the National Academy of Sciences, 105
(19):6948–6953, 2008. URL http://www.pnas.org/content/105/19/6948.short.
Ashley JW Ward, James E Herbert-Read, David JT Sumpter, and Jens Krause. Fast and accurate decisions
through collective vigilance in fish shoals. Proceedings of the National Academy of Sciences, 108(6):2312–2315,
2011.
Sebastian Weitz, Stephane Blanco, Richard Fournier, Jacques Gautrais, Christian Jost, and Guy Theraulaz.
Modeling collective animal behavior with a cognitive perspective: A methodological framework. PLoS ONE,
7(6):e38588, June 2012. ISSN 1932-6203. doi: 10.1371/journal.pone.0038588. URL http://dx.plos.org/
10.1371/journal.pone.0038588.
Appendix A
Upper bound on the weak regret
A.1 Theorem
Auer et al. (2002) show that choosing

$$\gamma_\Gamma = \min\left\{1, \sqrt{\frac{K \ln K}{(e-1)\,\Gamma}}\right\} \tag{A.1}$$

with $K$ the number of arms, for any $T > 0$ and given any $\Gamma$ such that $\Gamma > G_{\max}$,
leads to the following bound:

$$G_{\max} - E[G_{\mathrm{Exp3}}] \le 2\sqrt{e-1}\,\sqrt{\Gamma K \ln K} \le 2.63\,\sqrt{\Gamma K \ln K} \tag{A.2}$$

with $G_{\max} - E[G_{\mathrm{Exp3}}]$ the weak regret, where $G_{\max} = \max_j \sum_{t=1}^{T} g_j(t)$ and $E[G_{\mathrm{Exp3}}]$ is the expectation of the rewards
the Exp3 algorithm induces.
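As a numerical illustration of Equations A.1 and A.2, the two quantities can be computed directly. The function names `exp3_gamma` and `exp3_regret_bound` are hypothetical, and the values K = 2 and Γ = 100 are arbitrary choices made for the example.

```r
# Exploration parameter of Equation A.1 for K arms and an upper bound Gamma
# on the best cumulative reward G_max.
exp3_gamma <- function(K, Gamma) {
  min(1, sqrt(K * log(K) / ((exp(1) - 1) * Gamma)))
}

# Upper bound on the weak regret (middle term of Equation A.2).
exp3_regret_bound <- function(K, Gamma) {
  2 * sqrt(exp(1) - 1) * sqrt(Gamma * K * log(K))
}

gamma_example <- exp3_gamma(K = 2, Gamma = 100)
bound_example <- exp3_regret_bound(K = 2, Gamma = 100)
```

The constant 2.63 in Equation A.2 is simply a rounded upper value of $2\sqrt{e-1}$, and the bound grows with the square root of Γ, so the regret per round shrinks as the horizon increases.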
A.2 Proof
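Let $g_i(t) \in [0, 1]$ be the reward of action $i \in [1, K]$ at time $t \in [0, T]$, and let $\hat{g}_i(t)$ be the estimated reward,
equal to $g_i(t)/d_i(t)$ for the drawn action $i = i_t$ and to 0 otherwise. Let $G_{\max} - E[G_{\mathrm{Exp3}}]$ be the weak regret,
where $G_{\max} = \max_j \sum_{t=1}^{T} g_j(t)$ and $E[G_{\mathrm{Exp3}}]$ is the expectation of the rewards the Exp3 algorithm induces. From
the definitions presented in section 2.1, the following useful relations can be derived:

$$\hat{g}_i(t) \le \frac{1}{d_i(t)} \le \frac{K}{\gamma} \tag{A.3a}$$

$$\sum_{i=1}^{K} d_i(t)\,\hat{g}_i(t) = d_{i_t}(t)\,\frac{g_{i_t}(t)}{d_{i_t}(t)} = g_{i_t}(t) \tag{A.3b}$$

$$\sum_{i=1}^{K} d_i(t)\,\hat{g}_i(t)^2 = d_{i_t}(t)\,\frac{g_{i_t}(t)}{d_{i_t}(t)}\,\hat{g}_{i_t}(t) \le \hat{g}_{i_t}(t) = \sum_{i=1}^{K} \hat{g}_i(t) \tag{A.3c}$$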
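Let $W_t = w_1(t) + \dots + w_K(t)$. For all sequences $i_1, \dots, i_T$ of actions drawn by the Exp3 algorithm, we have:

$$\begin{aligned}
\frac{W_{t+1}}{W_t} &= \sum_{i=1}^{K} \frac{w_i(t+1)}{W_t} = \sum_{i=1}^{K} \frac{w_i(t)}{W_t} \exp\left(\frac{\gamma}{K}\hat{g}_i(t)\right) = \sum_{i=1}^{K} \frac{d_i(t) - \gamma/K}{1-\gamma} \exp\left(\frac{\gamma}{K}\hat{g}_i(t)\right) && \text{(A.4a)} \\
&\le \sum_{i=1}^{K} \frac{d_i(t) - \gamma/K}{1-\gamma} \left[1 + \frac{\gamma}{K}\hat{g}_i(t) + (e-2)\left(\frac{\gamma}{K}\hat{g}_i(t)\right)^2\right] && \text{(A.4b)} \\
&\le 1 + \frac{\gamma/K}{1-\gamma} \sum_{i=1}^{K} d_i(t)\,\hat{g}_i(t) + \frac{(e-2)(\gamma/K)^2}{1-\gamma} \sum_{i=1}^{K} d_i(t)\,\hat{g}_i(t)^2 \\
&\le 1 + \frac{\gamma/K}{1-\gamma}\, g_{i_t}(t) + \frac{(e-2)(\gamma/K)^2}{1-\gamma} \sum_{i=1}^{K} \hat{g}_i(t) && \text{(A.4c)}
\end{aligned}$$

(A.4a): $d_i(t) = (1-\gamma)\,\dfrac{w_i(t)}{\sum_{j=1}^{K} w_j(t)} + \dfrac{\gamma}{K}$ as defined in Algorithm 1
(A.4b): because $e^x \le 1 + x + (e-2)x^2$ for $x \le 1$, and $\hat{g}_i(t) \le \dfrac{K}{\gamma} \Leftrightarrow \dfrac{\gamma}{K}\hat{g}_i(t) \le 1$ from (A.3a)
(A.4c): combine (A.3b) and (A.3c)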
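The elementary inequality used in (A.4b) can be checked numerically on a grid. This is only a sanity check over an arbitrary range of x, not a proof; the tolerance absorbs floating-point rounding at the two points where equality holds (x = 0 and x = 1).

```r
# Gap between the quadratic upper bound and exp(x) for x <= 1
x <- seq(-5, 1, by = 0.01)
gap <- 1 + x + (exp(1) - 2) * x^2 - exp(x)

# The gap should never be negative, up to rounding error
all(gap >= -1e-12)
```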
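Taking logarithms and using $1 + x \le e^x$ gives

$$\ln\frac{W_{t+1}}{W_t} \le \frac{\gamma/K}{1-\gamma}\, g_{i_t}(t) + \frac{(e-2)(\gamma/K)^2}{1-\gamma} \sum_{i=1}^{K} \hat{g}_i(t)$$

Summing over $t$ we then get

$$\ln\frac{W_{T+1}}{W_1} \le \frac{\gamma/K}{1-\gamma}\, G_{\mathrm{Exp3}} + \frac{(e-2)(\gamma/K)^2}{1-\gamma} \sum_{t=1}^{T} \sum_{i=1}^{K} \hat{g}_i(t) \tag{A.5}$$

For any action $j$,

$$\ln\frac{W_{T+1}}{W_1} \ge \ln\frac{w_j(T+1)}{W_1} = \frac{\gamma}{K} \sum_{t=1}^{T} \hat{g}_j(t) - \ln K$$

Combining with (A.5), we get

$$G_{\mathrm{Exp3}} \ge (1-\gamma) \sum_{t=1}^{T} \hat{g}_j(t) - \frac{K \ln K}{\gamma} - (e-2)\frac{\gamma}{K} \sum_{t=1}^{T} \sum_{i=1}^{K} \hat{g}_i(t) \tag{A.6}$$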
We next take the expectation of both sides of (A.6) with respect to the distribution of $\langle i_1, \dots, i_T \rangle$. For the
expected value of each $\hat{g}_i(t)$, we have:

$$E[\hat{g}_i(t) \mid i_1, \dots, i_{t-1}] = E\left[d_i(t) \cdot \frac{g_i(t)}{d_i(t)} + (1 - d_i(t)) \cdot 0\right] = g_i(t) \tag{A.7}$$
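Equation A.7 states that the importance-weighted estimate is unbiased. The sketch below checks this both exactly and by Monte Carlo for one round; the distribution `d` and the rewards `g` are arbitrary values chosen for the illustration.

```r
d <- c(0.7, 0.3)   # probabilities of drawing each arm (arbitrary)
g <- c(1, 0.4)     # true rewards of each arm (arbitrary)

# Exact expectation: arm i is drawn with probability d[i], in which case the
# estimate is g[i] / d[i]; otherwise the estimate is 0.
exact <- d * (g / d) + (1 - d) * 0

# Monte Carlo version of the same expectation
set.seed(7)
draws <- sample.int(2, 1e5, replace = TRUE, prob = d)
mc <- sapply(seq_along(g), function(i) mean(ifelse(draws == i, g[i] / d[i], 0)))
```

Both `exact` and `mc` should be close to `g`, which is what allows the estimated rewards in (A.6) to be replaced by the true rewards after taking expectations.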
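Combining (A.6) and (A.7), we find that

$$E[G_{\mathrm{Exp3}}] \ge (1-\gamma) \sum_{t=1}^{T} g_j(t) - \frac{K \ln K}{\gamma} - (e-2)\frac{\gamma}{K} \sum_{t=1}^{T} \sum_{i=1}^{K} g_i(t)$$

Since $j$ was chosen arbitrarily and

$$\sum_{t=1}^{T} \sum_{i=1}^{K} g_i(t) \le K\, G_{\max}$$

we obtain the inequality stated in Equation A.2 by choosing $\gamma_\Gamma = \min\left\{1, \sqrt{\dfrac{K \ln K}{(e-1)\,\Gamma}}\right\}$.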
Appendix B
Code for adjacency matrix generation
Listing B.1: R code implementing the two methods to generate adjacency matrices and to evaluate their
likelihood with respect to data
## This file is used for maximum likelihood estimation
## of both the adjacency matrix and gamma.
## Two methods are used / proposed depending on your machine.
## For computation time reasons, we provide the second method
## for group sizes of 8 and 16 fish. It does not evaluate the
## likelihood of each possible adjacency matrix but generates
## some of them according to a "% of fish communicating" criterion.

## Dependencies
library(R.utils)  ## for intToBin() used in method 1

## Parameters
n <- c(1, 2, 4, 8, 16)
arms_nb <- 2
gamma_sequences <- seq(from = 0.01, to = 1, by = 0.01)
nb_gamma <- length(gamma_sequences)

## Build the dataset used for the likelihood estimation
load("bygroupsize_list.rda")  ## This file contains all sequences of chosen arms
## Replace 0 by 2 so that the two arms are indexed 1 and 2
for (list_index in 1:5) {
  bygroupsize_list[[list_index]] <- replace(bygroupsize_list[[list_index]],
                                            bygroupsize_list[[list_index]] == 0, 2)
}

## Set-up 1: 2 arms, one without predator, the other with a predator
reward_function <- function(t_step, arm_chosen) {
  if (arm_chosen == 1) {
    return(1)
  } else {
    return(0)
  }
}
## Method 1 for group sizes of n = 2 and n = 4
## Create all matrices for n = 2 and n = 4
## To find the number of free entries of an upper triangular
## matrix (i.e. n * (n - 1) / 2):
# a <- 0
# for(i in 1:(n-1)){
#   a <- a + i
# }
# print(a)
n_permuted <- c(NA, 1, 6, 28, 120)
allmatrices_2_4 <- list()

for (shoalsize_index in c(2, 3)) {
  fish_nb <- n[shoalsize_index]
  n_permuted_current <- n_permuted[shoalsize_index]
  ## One row per adjacency matrix, flattened row by row (first row is a placeholder)
  allmatrices_current <- matrix(rep(NA, fish_nb^2), ncol = fish_nb^2)
  for (perm_index in 0:(2^n_permuted_current - 1)) {
    ## Binary expansion of perm_index, left-padded with zeros
    current_perm <- as.integer(unlist(strsplit(intToBin(perm_index), NULL)))
    current_perm <- c(rep(0, n_permuted_current - length(current_perm)), current_perm)
    ## Assign each bit to a row of the upper triangular part
    groups_factor <- NULL
    for (row_index in 1:(fish_nb - 1)) {
      groups_factor <- c(groups_factor, rep(row_index, fish_nb - row_index))
    }
    splitted_perms <- split(current_perm, groups_factor)
    ## Pad each row with zeros for the diagonal and the lower triangular part
    for (row_index in 1:(fish_nb - 1)) {
      splitted_perms[[row_index]] <- c(rep(0, row_index),
                                       splitted_perms[[row_index]])
    }
    current_perm <- c(unlist(splitted_perms), rep(0, fish_nb))
    allmatrices_current <- rbind(allmatrices_current, current_perm)
  }
  allmatrices_2_4 <- c(allmatrices_2_4, list(allmatrices_current[-1, ]))
}

detachPackage("R.utils")
for (groupsize_index in c(2, 3)) {
  fish_nb <- n[groupsize_index]
  ## Exclude experiments in which a fish swims back (coded -1 in data)
  sequences_datamatrix <- bygroupsize_list[[groupsize_index]]
  excluded_rows <- unique(arrayInd(which(sequences_datamatrix == -1),
                                   dim(sequences_datamatrix)[1]))
  if (length(excluded_rows) > 0) {
    sequences_datamatrix <- sequences_datamatrix[-excluded_rows, ]
  }
  ## Looping over adjacency matrices
  nb_adjmat <- dim(allmatrices_2_4[[groupsize_index - 1]])[1]
  likelihoods_df <- data.frame(n = NA, likelihood = NA, adj_matrix = NA,
                               nb_interacting = NA, gamma = NA)
  for (adjmatrix_index in 1:nb_adjmat) {
    nb_fish_sharing <- sum(allmatrices_2_4[[groupsize_index - 1]][adjmatrix_index, ])
    neighbours <- matrix(allmatrices_2_4[[groupsize_index - 1]][adjmatrix_index, ],
                         nrow = fish_nb, byrow = TRUE)
    if (fish_nb > 1) {
      nb_experiments <- dim(sequences_datamatrix)[1]
    } else {
      nb_experiments <- length(sequences_datamatrix)
    }
    for (gamma_index in 1:nb_gamma) {
      likelihood <- NULL
      gamma <- gamma_sequences[gamma_index]
      for (experiment_index in 1:nb_experiments) {
        if (fish_nb > 1) {
          chosenarms_sequence <- unlist(sequences_datamatrix[experiment_index, ])
        } else {
          chosenarms_sequence <- unlist(sequences_datamatrix[experiment_index])
        }
        ## Every fish starts each experiment with the weights (0.85, 0.15)
        weights <- matrix(rep(c(0.85, 0.15), fish_nb),
                          byrow = TRUE, nrow = fish_nb, ncol = 2)
        for (fish in 1:fish_nb) {
          decision_maker <- fish
          arm_chosen <- chosenarms_sequence[fish]
          ## Probability of the observed choice under the current weights
          distribution <- (1 - gamma) * (weights[decision_maker, arm_chosen] /
                                           sum(weights[decision_maker, ])) + gamma / arms_nb
          reward <- reward_function(fish, arm_chosen)
          reward_est <- reward / distribution
          ## Fish receiving information from the decision maker
          dm_neighbours <- which(neighbours[decision_maker, ] == 1)
          weights[dm_neighbours, arm_chosen] <-
            weights[decision_maker, arm_chosen] * exp(gamma * reward_est / arms_nb)
          likelihood <- c(likelihood, distribution)
        }
      }
      likelihoods_df <- rbind(likelihoods_df, c(fish_nb, prod(likelihood),
                                                adjmatrix_index, nb_fish_sharing, gamma))
    }
  }
  write.table(likelihoods_df[-1, ],
              file = paste("optim_gamma_adjmat/likelihoods_adjmatxgamma_", fish_nb, ".csv", sep = ""),
              sep = ",", row.names = FALSE)
}
## Method 2 for group sizes of n = 8 and n = 16 fish
proportions_adjmat <- c(0, 0.25, 0.50, 0.75, 1)
nb_adjmat <- length(proportions_adjmat)
nb_simu <- 10

## /!\ Only n = 8 and n = 16 are considered: the definition of the data frame
## is hard-coded for two group sizes. Replace each 2 by length(n) to generalise.

likelihood_df <- data.frame(id = seq(from = 1, to = 2 * nb_gamma * nb_adjmat),
                            n = rep(c(8, 16), each = nb_gamma * nb_adjmat),
                            likelihood = rep(0, 2 * nb_gamma * nb_adjmat),
                            adj_matrix = rep(1:nb_adjmat, each = nb_gamma, times = 2),
                            nb_interacting = rep(NA, 2 * nb_gamma * nb_adjmat),
                            gamma = rep(gamma_sequences, 2 * nb_adjmat))

for (simulation_index in 1:nb_simu) {
  count_row <- 0
  for (groupsize_index in 4:length(n)) {
    fish_nb <- n[groupsize_index]
    ## Exclude experiments in which a fish swims back (coded -1 in data)
    sequences_datamatrix <- bygroupsize_list[[groupsize_index]]
    excluded_rows <- unique(arrayInd(which(sequences_datamatrix == -1),
                                     dim(sequences_datamatrix)[1]))
    if (length(excluded_rows) > 0) {
      sequences_datamatrix <- sequences_datamatrix[-excluded_rows, ]
    }
    ## Generating the adjacency matrix
    for (proportions_index in 1:nb_adjmat) {
      ## Random 0/1 values for the upper triangular part, built row by row
      triang_values <- NULL
      for (triang_index in 1:(fish_nb - 1)) {
        nb_1 <- round(proportions_adjmat[proportions_index] * triang_index)
        nb_0 <- triang_index - nb_1
        triang_values <- c(triang_values,
                           sample(c(rep(1, nb_1), rep(0, nb_0)), triang_index))
      }
      triang_values <- rev(triang_values)
      groups_factor <- NULL
      for (row_index in 1:(fish_nb - 1)) {
        groups_factor <- c(groups_factor, rep(row_index, fish_nb - row_index))
      }
      splitted_perms <- split(triang_values, groups_factor)
      ## Pad each row with zeros for the diagonal and the lower triangular part
      for (row_index in 1:(fish_nb - 1)) {
        splitted_perms[[row_index]] <- c(rep(0, row_index),
                                         splitted_perms[[row_index]])
      }
      neighbours <- c(unlist(splitted_perms), rep(0, fish_nb))
      nb_interacting <- sum(neighbours)
      neighbours <- matrix(neighbours, nrow = fish_nb, byrow = TRUE)
      if (fish_nb > 1) {
        nb_experiments <- dim(sequences_datamatrix)[1]
      } else {
        nb_experiments <- length(sequences_datamatrix)
      }
      for (gamma_index in 1:nb_gamma) {
        likelihood <- NULL
        gamma <- gamma_sequences[gamma_index]
        for (experiment_index in 1:nb_experiments) {
          if (fish_nb > 1) {
            chosenarms_sequence <- unlist(sequences_datamatrix[experiment_index, ])
          } else {
            chosenarms_sequence <- unlist(sequences_datamatrix[experiment_index])
          }
          ## Every fish starts each experiment with the weights (0.85, 0.15)
          weights <- matrix(rep(c(0.85, 0.15), fish_nb),
                            byrow = TRUE, nrow = fish_nb, ncol = 2)
          for (fish in 1:fish_nb) {
            decision_maker <- fish
            arm_chosen <- chosenarms_sequence[fish]
            ## Probability of the observed choice under the current weights
            distribution <- (1 - gamma) * (weights[decision_maker, arm_chosen] /
                                             sum(weights[decision_maker, ])) + gamma / arms_nb
            reward <- reward_function(fish, arm_chosen)
            reward_est <- reward / distribution
            ## Fish receiving information from the decision maker
            dm_neighbours <- which(neighbours[decision_maker, ] == 1)
            weights[dm_neighbours, arm_chosen] <-
              weights[decision_maker, arm_chosen] * exp(gamma * reward_est / arms_nb)
            likelihood <- c(likelihood, distribution)
          }
        }
        ## Accumulate the likelihood over the nb_simu random matrices
        count_row <- count_row + 1
        likelihood_df$likelihood[count_row] <-
          likelihood_df$likelihood[count_row] + prod(likelihood)
        likelihood_df$nb_interacting[count_row] <- nb_interacting
      }
    }
  }
}
## Average over the random constructions
likelihood_df$likelihood <- likelihood_df$likelihood / nb_simu
save(likelihood_df, file = "optim_gamma_adjmat/likelihoods_biggroups.RData")
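The listing multiplies the per-decision probabilities with prod(likelihood), which gives values of the order of 1e−19 or 1e−20 in panel A of Figures 3.7 and 3.8. As a hedged side note, and not the method used for the results, the sketch below shows on made-up probabilities that such a product can underflow to zero when many decisions are pooled, while the sum of logarithms stays finite and ranks candidates in the same order.

```r
# Made-up per-decision probabilities for a long pooled sequence (assumption)
p_long <- rep(0.5, 2000)
prod(p_long)       # underflows in double precision
sum(log(p_long))   # remains finite

# On short sequences both criteria select the same candidate
candidates <- list(c(0.85, 0.85, 0.15), c(0.5, 0.5, 0.5), c(0.6, 0.7, 0.4))
best_by_product <- which.max(sapply(candidates, prod))
best_by_loglik  <- which.max(sapply(candidates, function(p) sum(log(p))))
```

Working with log-likelihoods would therefore leave the selected adjacency matrix and γ unchanged while making the comparison robust for larger data sets.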