
Page 1: How do cognitive agents handle the tradeoff between speed and accuracy?

How do cognitive agents handle the tradeoff between speed and accuracy?
Tatsuji Takahashi 高橋達二
Tokyo Denki University 東京電機大学
[email protected]
28 Dec. 2012, Matsumoto lab., NAIST 松本研 奈良先端科学技術大学院大学

Page 2: How do cognitive agents handle the tradeoff between speed and accuracy?

Tatsuji Takahashi 高橋達二
★ Studied philosophy and history of science with Komachi-san.
★ Got a Ph.D. in science of complex systems at Kobe University (supervisor: Yukio-Pegio Gunji 郡司ペギオ幸夫教授).
★ Teaching at Tokyo Denki University (Hiki, Saitama campus), running a lab of "internal measurement" 内部観測研究室, and gradually changing the research area to cognitive science.
★ http://takalabo.rd.dendai.ac.jp/

Page 3: How do cognitive agents handle the tradeoff between speed and accuracy?

Purpose and meta-theory
★ Purpose:
★ To analytically and constructively understand the flexibility and creativity of the human mind,
★ under ambiguity, uncertainty or even indeterminacy, in this interminable world,
★ which can work in the face of the frame problem and the self-referential paradox.
★ To this end, we treat the frame problem and self-referential paradoxes as empirically as possible,
★ in cognitive psychology, machine learning and robotics; not in philosophy itself.

Page 4: How do cognitive agents handle the tradeoff between speed and accuracy?

The problem
★ How do cognitive agents like us handle the speed–accuracy tradeoff that is inevitable in this uncertain world?
★ There should be many things we can learn from ourselves in understanding and engineering cleverer systems.

Page 5: How do cognitive agents handle the tradeoff between speed and accuracy?

Illogical biases in cognition
★ In our classroom experience:
★ we have difficulty in understanding material implication ("if then" in logic), with which "if p then q" is true if p is false or q is true.
★ we confuse necessary and sufficient conditions
★ ("if p then q" read also as "if q then p," or in effect "p iff q")
★ we judge the probability of and gain from a situation differently, depending on the expression of the state description
★ this is called "the framing effect" (popular in behavioral economics, by Tversky & Kahneman)

Page 6: How do cognitive agents handle the tradeoff between speed and accuracy?

Illogical biases in cognition
★ We don't follow P(if p then q) = P(not-p or q) (material implication)
★ Generally P(p) is small, hence P(not-p) is big, making the probability P(not-p or q) too big to be informative.
★ We read conditionals (if) as biconditionals (if and only if; iff) and often loosely identify necessary and sufficient conditions
★ Merits in information acquisition using conditionals (Oaksford & Chater, 1994; Hattori, 2002)
★ Merits in causal learning for not strictly distinguishing forward prediction and backward diagnosis (with Markov equivalence)?
★ The framing effect
★ The expression in a state description represents the past history and the speaker's prediction of the state. (McKenzie & Mikkelsen, 2000)

Page 7: How do cognitive agents handle the tradeoff between speed and accuracy?

Illogical biases can be rational and even logical
★ The illogical biases in human cognition can be rationalized when considered in an appropriate context.
★ Sometimes the theory we have at hand is too old or primitive to understand the rationality in human cognition.
★ It should then be possible to analyze human cognitive biases and apply them to machine learning or artificial intelligence.

Page 8: How do cognitive agents handle the tradeoff between speed and accuracy?

Two topics of this talk:
★ (pARIs part) Study of how we reason, with emphasis on conditionals (sentences of the form "if p then q").
★ Humans seem illogical and irrational, but the form of our reasoning actually follows some newly invented theories.
★ (LS part) Application of cognitive properties of humans to machine learning.
★ The adaptiveness of some biases and heuristics in human cognition can actually be applied.

Page 9: How do cognitive agents handle the tradeoff between speed and accuracy?

pARIs part

Page 10: How do cognitive agents handle the tradeoff between speed and accuracy?

Reasoning and conditional
★ Three forms of reasoning: deduction, induction, abduction
★ Deduction uses conditionals
★ p and "if p then q" → q (modus ponens)
★ Induction forms conditionals
★ co-occurrence of p and q → "if p then q"
★ Abduction retrogresses conditionals and forms explanations
★ q and "if p then q" → p (affirmation of the consequent)

             deduction   induction   abduction
premise 1    p           p           q
premise 2    p→q         q           p→q
conclusion   q           p→q         p

Page 11: How do cognitive agents handle the tradeoff between speed and accuracy?

Causality and conditional
★ A causal relationship is usually expressed by a conditional.
★ If global warming continues (W) then London will be flooded (L).
★ (If cause then effect)
★ We can also use conditionals of the form (If effect then cause)
★ The utility of confusing the two forms:
★ We should test independence to find a causal relationship, before considering the directionality.
★ If we allow for directionality, we need two Bayes networks, and must test and choose one of the two. This is cognitively heavy for intuition.

[Figure: directed mode, Model 1 (C → E) vs. Model 2 (C ← E); undirected mode, a single Model (C — E).]

Page 12: How do cognitive agents handle the tradeoff between speed and accuracy?

Material implication
★ Modeling the conditional by material implication
★ "if p, then q" ⇔ "not p, or q"
★ Paradoxes of material implication 1
★ If there is no gravity, then I am the king of Japan.
★ If p (antecedent) is false, "if p then q" is true no matter what q is.
★ Paradoxes of material implication 2
★ If I am the king of Japan, then Tokyo is the capital of Japan.
★ If q (consequent) is true, "if p then q" is true no matter what p is.
★ Experiments show that humans do not follow material implication.

A ⊃ C   C=T   C=F
A=T     T     F
A=F     T     T

Page 13: How do cognitive agents handle the tradeoff between speed and accuracy?

Material implication
★ Why don't humans follow material implication?
★ Old paradigm psychology of reasoning: it's because humans are irrational or effortless (e.g., mental models theory)
★ New paradigm psychology of reasoning: humans reason factoring the uncertainty and the context (environment structure) into their reasoning.
★ Considering uncertainty (the truth value of a proposition as a probability in [0,1] with 1 (true) and 0 (false)),
★ with the probability of an event (proposition) usually being very small, material implication doesn't work.
★ Humans reason allowing for uncertainty.
★ The meaning of "if p then q" for humans is modeled not by p ⊃ q but by q|p.
★ With q|p, ¬p cases are ignored.

A ⊃ C   C=T   C=F
A=T     T     F
A=F     T     T

Page 14: How do cognitive agents handle the tradeoff between speed and accuracy?

Defective conditional
★ For half a century (since 1966), it has been known that humans follow the "defective truth table" when understanding and using conditionals, as in the Table.
★ Is the conditional not truth-functional?
★ For a conditional p = "If A, then C,"
★ if the truth value combination of antecedent A and consequent C is TT, p is true. If TF, p is false. When A is false, participants of experiments answer that FT and FF make p neither true nor false, but irrelevant to the truth value of p.
★ defective (no truth value assigned)

Table. Defective truth table
If A then C   C=T          C=F
A=T           true         false
A=F           irrelevant   irrelevant

Psychologically: Wason, 1966; Johnson-Laird and Tagart, 1969; Wason and Johnson-Laird, 1972; Evans et al., 1993.
Theoretically: Strawson 1950; Quine 1952

Page 15: How do cognitive agents handle the tradeoff between speed and accuracy?

Defective biconditional
★ We have a tendency to interpret "if A then C" as "if A then C, and if C then A," or "A if and only if C" (biconditional reading).
★ Here the interpreted biconditional is called the defective biconditional.
★ True for TT, false for TF and FT, irrelevant only for FF.
★ In deductive tasks, this pattern has been known (Evans & Over, 2004).

If and only if A then C   C=T     C=F
A=T                       true    false
A=F                       false   irrelevant

It is the conjunction of the two defective conditionals:

If A then C   C=T   C=F        If C then A   C=T   C=F
A=T           T     F          A=T           T     I
A=F           I     I          A=F           F     I

Page 16: How do cognitive agents handle the tradeoff between speed and accuracy?

From defective conditional to conditional event
★ P(if p then q) = P(q|p)
★ Not P(if p then q) = P(p ⊃ q) = P(¬p or q)
★ q|p as an event (conditional event)
★ A Boolean algebra (ring) R cannot non-trivially include q|p (Lewis's triviality result).
★ We need to extend R to R|R (conditional event algebra: Goodman, Nguyen, Walker, 1991).

Page 17: How do cognitive agents handle the tradeoff between speed and accuracy?

Overview
★ New paradigm psychology of reasoning
★ De Finetti's conditional and biconditional event
★ Biconditional event in causal induction:
★ the pARIs (proportion of assumed-to-be rare instances) rule
★ Meta-analysis and three experiments to confirm the validity of pARIs
★ Theoretical background and connections to other areas, such as:
★ Developmental study of conditionals by Gauffroy and Barrouillet (2009),
★ Amos Tversky's study of similarity (1977), and
★ the Jaccard similarity index and some other popular indices in mathematics, statistics and machine learning.

Page 18: How do cognitive agents handle the tradeoff between speed and accuracy?

toc
★ New paradigm psychology of reasoning
★ Reasoning and conditional
★ Conditional and biconditional event
★ Biconditional event in causal induction: pARIs (proportion of assumed-to-be rare instances)
★ Meta-analysis
★ Three experiments
★ Theoretical background

Page 19: How do cognitive agents handle the tradeoff between speed and accuracy?

New paradigm psychology of reasoning
★ Very naively expressed...
★ Old paradigm:
★ The normative theory is the classical bivalent logic with conditionals modeled by material implication: P(if p then q) = P(p ⊃ q) = P(¬p or q).
★ Doesn't fit the data in many areas: from this some said humans are irrational, or that our intelligence is quite limited.
★ New paradigm:
★ Probability logic with P(if p then q) = P(q|p)
★ de Finetti gives the appropriate theory of subjective probability.
★ Fits the data; human cognition is designed to treat uncertainty by nature. It is formed through evolution.

Page 20: How do cognitive agents handle the tradeoff between speed and accuracy?

toc
★ New paradigm psychology of reasoning
★ Reasoning and conditional
★ Conditional and biconditional event
★ Biconditional event in causal induction: pARIs (proportion of assumed-to-be rare instances)
★ Meta-analysis
★ Three experiments
★ Theoretical background

Page 21: How do cognitive agents handle the tradeoff between speed and accuracy?

defective conditional and defective biconditional
★ The defective truth table in the older paradigms
★ (Wason, 1966; Johnson-Laird and Tagart, 1969; Wason and Johnson-Laird, 1972; Evans et al., 1993)
★ is normative and coherent in the new paradigm

old paradigm              →   new paradigm
defective conditional     →   conditional event q|p
defective biconditional   →   biconditional event p⟛q

Page 22: How do cognitive agents handle the tradeoff between speed and accuracy?

de Finetti's conditional event
★ The conditional event, formerly called the defective conditional, is a core notion in the new paradigm psychology of reasoning.
★ The Equation: the probability of a conditional is the conditional probability of the consequent given the antecedent.
★ P(if p then q) = P(q|p) (the Equation)
★ ¬p cases are neglected, and "q|p" is itself a (conditional) event.

de Finetti's tables (V: void case; the biconditional event is the conjunction of the two conditional events):

        material      conditional   conditional   biconditional
        conditional   event         event         event
p  q    p⊃q           q|p           p|q           p⟛q
T  T    T             T             T             T
T  F    F             F             V             F
F  T    T             V             F             F
F  F    T             V             V             V
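These three-valued tables are easy to mechanize. Below is a minimal sketch in Python (my own illustration, not from the slides) that reproduces the de Finetti table, using the string "V" for the void case:

```python
# A minimal sketch (my own illustration) of de Finetti's three-valued tables.
# "V" marks the void case: no truth value is assigned.
from itertools import product

T, F, V = "T", "F", "V"

def material(p, q):
    # p ⊃ q : false only for the TF case
    return F if (p == T and q == F) else T

def conditional_event(p, q):
    # q|p : void whenever the antecedent p is false
    if p == F:
        return V
    return T if q == T else F

def biconditional_event(p, q):
    # p ⟛ q : conjunction of q|p and p|q; void only when both are false
    if p == F and q == F:
        return V
    return T if p == q else F

print("p q  p⊃q q|p p⟛q")
for p, q in product([T, F], repeat=2):
    print(p, q, "", material(p, q), " ", conditional_event(p, q),
          " ", biconditional_event(p, q))
```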

Page 23: How do cognitive agents handle the tradeoff between speed and accuracy?

toc
★ New paradigm psychology of reasoning
★ Reasoning and conditional
★ Conditional and biconditional event
★ Biconditional event in causal induction: pARIs (proportion of assumed-to-be rare instances)
★ Meta-analysis
★ Three experiments
★ Theoretical background

Page 24: How do cognitive agents handle the tradeoff between speed and accuracy?

Causal induction
★ Example: We want to know the cause of a health problem, right now just from pure observation, with no intervention.
★ I sometimes have stiff shoulders and a headache. What's the cause? How about coffee?
★ a: (cause=present / effect=present)
★ How frequently did I get a headache after having a cup of coffee?
★ b: (present / absent)
★ How frequently did I get no headache after coffee?
★ c: (absent / present)
★ How frequently did I get a headache without coffee?
★ d: (absent / absent)
★ How frequently did I get no headache without coffee?

Page 25: How do cognitive agents handle the tradeoff between speed and accuracy?

Causal induction experiment
Stimulus presentation: a pair of two kinds of pictures illustrating the presence and absence of cause and effect, at left and right, respectively.
Response: participants evaluate the causal intensity they felt from 0 to 100, using a slider.

      E   ¬E
C     a   b
¬C    c   d

[Figure: an example stimulus showing a b-cell type joint event.]

Page 26: How do cognitive agents handle the tradeoff between speed and accuracy?

Causal (intensity) induction
★ Two phases of causal induction (Hattori & Oaksford 2007)
★ Phase 1: observational (statistical)
★ Phase 2: interventional (experimental)
★ We focus on phase 1 causal induction for generative causes, because preventive causes are confusing and hard to treat, especially in the observation phase (Hattori & Oaksford, 2007).

Page 27: How do cognitive agents handle the tradeoff between speed and accuracy?

Causal induction
★ Here we study causal intensity.
★ Recent studies emphasize the structure (the topology of a Bayes network) rather than the intensity (node weight). However, structure and intensity have a mutual relationship. In an unknown situation, intensity is what matters, since the structure is not known.
★ Many problems about intensity remain untouched.
★ Why don't normative models such as ΔP and Power PC fit the data?

Page 28: How do cognitive agents handle the tradeoff between speed and accuracy?

Framework and models of causal induction
★ The data (input) is the co-occurrence of the target effect (E) and a candidate cause (C).
★ Normative: ΔP and Power PC (Cheng, 1997)
★ Descriptive: H (Dual Factor Heuristics) (Hattori & Oaksford 2007)

      E   ¬E
C     a   b
¬C    c   d

ΔP = P(E|C) − P(E|¬C) = (ad − bc) / ((a+b)(c+d))

PowerPC = ΔP / (1 − P(E|¬C)) = (ad − bc) / ((a+b)d)

H = √(P(E|C)·P(C|E)) = a / √((a+b)(a+c))
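As a concrete reading of these formulas, here is a minimal Python sketch (my own, not the authors' code) that computes the three indices from the four cell counts; the example counts are hypothetical, and nonzero margins are assumed:

```python
# A minimal sketch of the three indices, computed from the 2x2 cell counts
# (a, b, c, d) defined above.  Nonzero margins are assumed.
from math import sqrt

def delta_p(a, b, c, d):
    # ΔP = P(E|C) - P(E|¬C) = (ad - bc) / ((a+b)(c+d))
    return a / (a + b) - c / (c + d)

def power_pc(a, b, c, d):
    # Power PC = ΔP / (1 - P(E|¬C)) = (ad - bc) / ((a+b) d)
    return delta_p(a, b, c, d) / (1 - c / (c + d))

def dfh(a, b, c, d):
    # H = sqrt(P(E|C) P(C|E)) = a / sqrt((a+b)(a+c))
    return a / sqrt((a + b) * (a + c))

print(delta_p(9, 1, 1, 9), power_pc(9, 1, 1, 9), dfh(9, 1, 1, 9))
# -> 0.8  0.888...  0.9
```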

Page 29: How do cognitive agents handle the tradeoff between speed and accuracy?

The pARIs rule
★ The frequency information of rare instances conveys more information than that of abundant instances (rational analysis and the rarity assumption; see esp. McKenzie 2007).
★ Because of the frame-problem-like aspect, the d-cell information can be unreliable (it depends strongly on how we frame and count).
★ Hence we calculate the causal intensity only from the proportion of assumed-to-be rare instances (pARIs)
★ named after pCI: proportion of confirmatory instances, White 2003.

Page 30: How do cognitive agents handle the tradeoff between speed and accuracy?

Rarity assumption
★ We assume the effect in focus and the candidate cause to be rare: P(C) and P(E) to be small.
★ Originally in Oaksford & Chater, 1994,
★ then in Hattori & Oaksford, 2007, and McKenzie 2007, in the study of causal induction
★ C and E take a small proportion of the universe U.

[Figure: Venn diagram of C and E in the universe U, with cells a (C∩E), b (C only), c (E only) and d (neither); under extreme rarity, d dominates U.]

Under extreme rarity, the phi coefficient converges to H:

lim_{d→∞} φ = √(P(E|C)·P(C|E)) = H

φ: correlation coefficient
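A quick numerical check of this limit (my own sketch, with arbitrary cell counts) shows φ approaching H as the d cell grows:

```python
# Numerical check: the phi coefficient tends to H = sqrt(P(E|C) P(C|E))
# as the d cell grows (extreme rarity).
from math import sqrt

def phi(a, b, c, d):
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

def dfh(a, b, c, d):
    return a / sqrt((a + b) * (a + c))

a, b, c = 8, 2, 3                       # arbitrary rare-cell counts
for d in (10, 1_000, 100_000):
    print(d, round(phi(a, b, c, d), 4), round(dfh(a, b, c, d), 4))
# phi approaches dfh = 0.7628 as d grows; dfh never depends on d
```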

Page 31: How do cognitive agents handle the tradeoff between speed and accuracy?

The pARIs rule
★ C and E are both generally assumed to be rare (P(C) and P(E) low).
★ pARIs = proportion of assumed-to-be rare instances (a, b, and c).

pARIs = P(p⟛q) = a / (a+b+c)

      E   ¬E
C     a   b
¬C    c   d

Conditional event, biconditional event, and inferring causal intensity:

C  E    E|C   C⟛E   pARIs
T  T    T     T     positive
T  F    F     F     negative
F  T    V     F     negative
F  F    V     V     irrelevant

Page 32: How do cognitive agents handle the tradeoff between speed and accuracy?

The pARIs rule
★ C and E are both assumed to be rare (P(C) and P(E) low)
★ pARIs = proportion of assumed-to-be rare instances (a, b, and c).
★ The probability of the conjunction of cause and effect given the disjunction of cause and effect (conditioned on the disjunction):

pARIs = P(C iff E) = P(C and E | C or E) = P(C and E) / P(C or E) = a / (a+b+c)

      E   ¬E
C     a   b
¬C    c   d
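In code, the rule is a one-liner; the sketch below (my own, with hypothetical counts) makes explicit that the d cell never enters the computation:

```python
# The pARIs rule as code: the d cell is accepted but never used.
def paris(a, b, c, d):
    # pARIs = P(C and E | C or E) = a / (a + b + c)
    return a / (a + b + c)

print(paris(9, 1, 1, 9), paris(9, 1, 1, 9000))   # identical: 0.818..., 0.818...
```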

Page 33: How do cognitive agents handle the tradeoff between speed and accuracy?

Why ignore the d-cell?
★ Hempel's paradox
★ All ravens are black.
★ = If something is a raven, then it is black.
★ Is a non-black non-raven confirmatory?
★ If a non-raven that is not black is rare, it is informative and hence not ignored. (McKenzie & Mikkelsen, 2000)
★ If raven:non-raven = 5:5 and black:non-black = 5:5:
★ "All men are more stupid than the average human being." ("If one is a man, then he is relatively stupid.")
★ A thoughtful woman can be confirmatory.

Page 34: How do cognitive agents handle the tradeoff between speed and accuracy?

toc
★ New paradigm psychology of reasoning
★ Reasoning and conditional
★ Conditional and biconditional event
★ Biconditional event in causal induction: pARIs (proportion of assumed-to-be rare instances)
★ Meta-analysis
★ Three experiments
★ Theoretical background

Page 35: How do cognitive agents handle the tradeoff between speed and accuracy?

Data-fit of pARIs and PowerPC

[Eight scatter plots of model prediction (0–1) vs. human rating (0–100), one per data set: AS95, BCC03 exp1 (generative), BCC03 exp3, H03, H06, LS00 exp1–3, W03 JEP exp2, W03 JEP exp6.]

Page 36: How do cognitive agents handle the tradeoff between speed and accuracy?

Meta-analysis
★ Fit with experiments (the same ones as Hattori & Oaksford, 2007)
★ pARIs fits the data sets with no correlation below r = 0.90, the highest correlation on almost all the data sets, and the smallest average error.

experiment \ model   pARIs   DFH     PowerPC   ∆P      Phi     P(E|C)   P(C|E)   pCI
AS95                 0.94    0.95    0.95      0.88    0.89    0.91     0.76     0.87
BCC03: exp1          0.98    0.97    0.89      0.92    0.91    0.82     0.51     0.92
BCC03: exp3          0.99    0.99    0.98      0.93    0.93    0.95     0.88     0.93
H03                  0.99    0.98    -0.09     0.01    0.70    -0.01    0.98     0.40
H06                  0.97    0.96    0.74      0.71    0.71    0.89     0.58     0.70
LS00                 0.93    0.95    0.86      0.83    0.84    0.58     0.34     0.83
W03.2                0.90    0.85    0.44      0.29    0.55    0.47     0.18     0.77
W03.6                0.93    0.90    0.46      0.46    0.46    0.77     0.56     0.54
average r            0.95    0.94    0.65      0.63    0.75    0.67     0.60     0.75
average error        11.97   18.48   33.39     24.30   27.18   27.78    24.75    29.93

Values other than those in the error row are correlation coefficients r.
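For readers who want to reproduce this kind of fit, here is a sketch of the two measures as I understand them, Pearson r between model predictions and mean human ratings, and the mean absolute error after scaling predictions to the rating scale; the arrays are made-up placeholders, not data from the table:

```python
# Sketch of the fit measures: Pearson r and mean absolute error after
# scaling predictions from [0, 1] to the 0-100 rating scale.
import numpy as np

model = np.array([0.82, 0.50, 0.33, 0.90, 0.10])   # hypothetical predictions
human = np.array([78.0, 55.0, 30.0, 85.0, 15.0])   # hypothetical mean ratings

r = np.corrcoef(model, human)[0, 1]
error = np.abs(100 * model - human).mean()
print(round(r, 3), round(error, 2))
```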

Page 37: How do cognitive agents handle the tradeoff between speed and accuracy?

[Bar charts restating the correlations from the table above for each data set and model, and a chart of the average error (0–300) per model.]

Page 38: How do cognitive agents handle the tradeoff between speed and accuracy?

toc
★ New paradigm psychology of reasoning
★ Reasoning and conditional
★ Conditional and biconditional event
★ Biconditional event in causal induction: pARIs (proportion of assumed-to-be rare instances)
★ Meta-analysis
★ Three experiments
★ Theoretical background

Page 39: How do cognitive agents handle the tradeoff between speed and accuracy?

Experiments
★ Experiment 1
★ To test the validity of the rarity assumption in ordinary causal induction from 2x2 covariation information
★ Experiment 2
★ To test the validity of the rarity assumption in causal induction from 3x2 covariation information
★ Difference in the cognition of rare events (a-, b-, and c-type) and non-rare d-type events: people just vaguely recognize and memorize the occurrence of d-type events.
★ Experiment 3
★ Rarity vs. presence-absence (yes-no)

Page 40: How do cognitive agents handle the tradeoff between speed and accuracy?

Experiment 1: c and d in a 2x2 table
★ 27 undergraduates, 9 stimuli.
★ p: giving an artificial diet to your horse; q: your horse gets ill.
★ After the presentation of (a, b, c, d), participants are asked the causal intensity and then the frequency of c- and d-type events.

stim.   a   b   c   d
1       1   9   1   9
2       1   9   5   5
3       1   9   9   1
4       5   5   1   9
5       5   5   5   5
6       5   5   9   1
7       9   1   1   9
8       9   1   5   5
9       9   1   9   1

Page 41: How do cognitive agents handle the tradeoff between speed and accuracy?

Result of exp. 1
★ Participants' estimation of c and d occurrence was basically faithful, but d was estimated larger than in the real stimuli.

[Two charts: real vs. estimated c-cell and d-cell frequencies (0–10) across the nine stimuli.]

Page 42: How do cognitive agents handle the tradeoff between speed and accuracy?

Experiment 2: c and d in a 3x2 table
★ 54 undergraduates, 2 stimuli.
★ As a medical scientist, p: giving a medicine (three types: p1, p2 and p3) to a patient; q: the patient develops antibodies against a virus.
★ After the presentation of the six kinds of events, participants are asked the causal intensity of p1 to q and of p2 to q, and then the frequency of c- and d-type events.

stimulus A   q   not-q
p1           6   4
p2           9   1
p3           2   8

stimulus B   q   not-q
p1           5   5
p2           8   2
p3           1   9

Page 43: How do cognitive agents handle the tradeoff between speed and accuracy?

Experiment 2: c and d in a 3x2 table
★ Each participant estimates the intensity of the causal relationship from p1 to q.
★ Then they are asked the value of c, as "How often did q happen in the absence of p1?" The given value of c is 9+2 = 11.

stimulus A   q   not-q
p1           6   4
p2           9   1
p3           2   8

[When p1 is the focus, the p1 row gives the a and b cells; the p2 and p3 rows are summed to give c and d.]

Page 44: How do cognitive agents handle the tradeoff between speed and accuracy?

Exp. 2: Result
★ Participants' estimations of c and d occurrence were very different. The correlation between the estimated d and the real, given value of d (r² = 0.49) was significantly smaller than for c (r² = 0.99).

[Two charts: real vs. estimated c-cell and d-cell values across the four questions.]

Page 45: How do cognitive agents handle the tradeoff between speed and accuracy?

Exp 3. Rarity vs. affirmation-negation
★ Do people respond to rarity (hence informativeness), or more simply (as in the matching heuristic/bias) to yes/no (presence/absence of cause and effect)?
★ 132 undergraduates, 4 stimuli x 2 conditions.
★ Participants evaluate the causal relationship from mental unstableness to dropout in college students.

Page 46: How do cognitive agents handle the tradeoff between speed and accuracy?

Exp 3. Rarity vs. affirmation-negation
★ Participants are randomly divided into 4x4 = 16 groups, four wording forms in two conditions (coinciding and contradicting)
★ Group 1: Yes/Yes means "unstable and dropped out"
★ Group 2: Yes/No means "unstable and not graduated"
★ Group 3: No/Yes means "not healthy and dropped out"
★ Group 4: No/No means "not healthy and not graduated"

Page 47: How do cognitive agents handle the tradeoff between speed and accuracy?

Exp 3. On rarity
★ Story:
★ Mentally unstable: rare
★ Dropout: rare
★ In the sample (stimuli):
★ whether the sample P(unstable) is small or not
★ whether the sample P(dropout) is small or not
★ Two conditions:
★ Coinciding condition: the sample P(unstable) and P(dropout) are both small (coincides with the story / prior knowledge)
★ Contradicting condition: the sample P(unstable) and P(dropout) are both large (contradicts the story / prior knowledge)

Page 48: How do cognitive agents handle the tradeoff between speed and accuracy?

Exp 3. The combinations of affirmation and negation

Form 1 (Yes/Yes):
               dropped out   not dropped out
unstable       a             b
not unstable   c             d

Form 2 (Yes/No):
               graduated     not graduated
unstable       a             b
not unstable   c             d

Form 3 (No/Yes):
                       dropped out   not dropped out
mentally healthy       a             b
not mentally healthy   c             d

Form 4 (No/No):
                       graduated     not graduated
mentally healthy       a             b
not mentally healthy   c             d

(On the slide: orange = confirmatory instances, yellow = disconfirmatory instances, white = irrelevant.)

Participants evaluate the intensity of the causal relationship from the cause (unstableness) to the effect (dropout).

Page 49: How do cognitive agents handle the tradeoff between speed and accuracy?

Exp. 3 Result (coinciding condition)

stimuli: (a,b,c,d) = (2,2,2,8), (1,1,3,10), (1,1,1,15), (1,1,3,14)

[Four bar charts of mean pARIs (0–100) for the stimuli, one per wording form: coinciding yes/yes, yes/no, no/yes, no/no.]

Page 50: How do cognitive agents handle the tradeoff between speed and accuracy?

Exp. 3 Result (contradicting condition)

stimuli: (a,b,c,d) = (6,1,1,1), (8,1,2,3), (7,3,1,3), (6,2,2,3)

[Four bar charts of mean pARIs (0–100) for the stimuli, one per wording form: contradicting yes/yes, yes/no, no/yes, no/no.]

Page 51: How do cognitive agents handle the tradeoff between speed and accuracy?

Exp 3. Discussion
★ In both conditions, coinciding and contradicting,
★ participants responded to the rarity (hence informativeness),
★ not to mere yes/no (presence/absence of cause and effect).
★ If they had responded to yes/no rather than to rarity, we would have observed something like a matching bias.

Page 52: How do cognitive agents handle the tradeoff between speed and accuracy?

toc
★ New paradigm psychology of reasoning
★ Reasoning and conditional
★ Conditional and biconditional event
★ Biconditional event in causal induction: pARIs (proportion of assumed-to-be rare instances)
★ Meta-analysis
★ Three experiments
★ Theoretical background

Page 53: How do cognitive agents handle the tradeoff between speed and accuracy?

Theoretical background of biconditional event and pARIs
★ Angelo Gilio and Giuseppe Sanfilippo (manuscript under review) are studying the biconditional event p⟛q (named by Andy Fugard) in relation to quasi-conjunction.
★ Bart Kosko (2004) studied probable equivalence, an equivalent idea in his fuzzy probability theory.
★ There are some equivalent indices defined for computing similarity.
★ Computer simulations show that pARIs is very efficient, reconciling speed and accuracy, or variance and bias (their tradeoff), in inferring the correlation of the population from a small sample set, with the highest reliability and precision.

Page 54: How do cognitive agents handle the tradeoff between speed and accuracy?

Simulation

[Two line charts from a Monte Carlo simulation, sample size on the x-axis: the mean value of each index (pARIs, DFH, Delta P, Phi, PowerPC) and its standard deviation, with the correlation of the population fixed at 0.2.]

pARIs both speedily and accurately grasps the population correlation with a very small sample.

Hattori & Oaksford, 2007:
DFH: accurate but slow. ΔP: fast but inaccurate.
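The slides do not include the simulation code; the following is a rough Python re-creation of the setup as described, with hypothetical population cell probabilities (giving a small positive φ) and pARIs as the tracked index; the other indices can be plugged in the same way:

```python
# A rough re-creation (not the authors' code) of the Monte Carlo setup:
# draw small samples from a 2x2 population and track an index's mean and sd
# across sample sizes.
import numpy as np

rng = np.random.default_rng(0)

# hypothetical population cell probabilities (a, b, c, d); C and E are rare
# and positively correlated (phi of these cells is about 0.26)
p = np.array([0.06, 0.10, 0.10, 0.74])

def paris(a, b, c, d):
    s = a + b + c
    return a / s if s else np.nan    # undefined if no rare instance is seen

for n in (5, 10, 20, 40):
    vals = np.array([paris(*rng.multinomial(n, p)) for _ in range(2000)])
    print(n, np.nanmean(vals), np.nanstd(vals))
# DFH, Delta P, Phi and Power PC can be substituted for paris() here
```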

Page 55: How do cognitive agents handle the tradeoff between speed and accuracy?

Indices equivalent to the probability of the biconditional event
★ Psychology
★ Tversky index of similarity, Tversky (1977)
★ Asymmetric similarity measure comparing a variant to a prototype. Also in: Gregson (1975) and Sjöberg (1972)
★ Mathematics, machine learning and statistics:
★ Probable equivalence, or the probabilistic identity of two sets A and B, P(A=B), by Kosko (2004)
★ Tanimoto similarity coefficient
★ Jaccard similarity measure

Page 56: How do cognitive agents handle the tradeoff between speed and accuracy?

Tversky index

[The slide reproduces pages from Tversky (1977), "Features of Similarity," Psychological Review, 84(4), 327–352: the set-theoretical contrast model, in which the similarity of a to b is a matching function s(a,b) = F(A∩B, A−B, B−A) over common and distinctive features, and the ratio model

S(a,b) = f(A∩B) / (f(A∩B) + αf(A−B) + βf(B−A)),  α, β ≥ 0,

which for α = β = 1 reduces to f(A∩B)/f(A∪B) (Gregson, 1975; Sjöberg, 1972). The excerpt also covers asymmetry and focus: since the subject of a comparison is weighted more heavily than the referent (α > β), the variant is more similar to the prototype than vice versa.]

Page 57: How do cognitive agents handle the tradeoff between speed and accuracy?

Biconditional event
★ Developmental
★ Merely transient in the process of narrowing the scope, between conjunctive and conditional? (Gauffroy and Barrouillet, 2009)
★ Probably there are theoretical reasons for the dominance of the defective biconditional (biconditional event).

[Excerpt and Fig. 3 from Gauffroy & Barrouillet (2009), Developmental Review, 29, 249–282: percent of response patterns categorized as conjunctive, defective biconditional, defective conditional, matching (MP), and others, as a function of grade (3, 6, 9, adults) for strong and weak causal conditionals in their Experiment 2.]

Page 58: How do cognitive agents handle the tradeoff between speed and accuracy?

Conclusion
★ Our intuition for generative causality from co-occurrence data is the probability of the biconditional event (or defective biconditional).
★ The conditional event is the conditional in the new paradigm.
★ The biconditional event is the biconditional in the new paradigm.
★ In causal induction, the biconditional event focuses on rare events and neglects abundant events, in the uncertain world.
★ pARIs: proportion of assumed-to-be rare instances
★ The defective biconditional is turning out to have some normative nature and theoretical grounds as the biconditional event.

Page 59: How do cognitive agents handle the tradeoff between speed and accuracy?

Future Issues
★ Information-theoretical analysis of the efficiency of computing pARIs, the defective biconditional, or the biconditional event
★ Gilio and Sanfilippo proved the biconditional event is a kind of norm, and Kosko defined it as a measure for the identity (binary relation) of two random variables
★ The relationship of causal induction and (causal) conditionals
★ Semantic and pragmatic analysis, and the conditionals of the diagnostic/abductive form "if effect, then cause" (Over)
★ To determine the scope of the pARIs rule
★ In other words, when can ΔP or Power PC be descriptive? (w/ Hattori, Hattori, Over)
★ To establish a full connection with the new paradigm psychology of reasoning (Over, Evans, ...) and the de Finetti table (Baratgin, Politzer, ...) (w/ Baratgin, Hattori, Hattori)
★ Toward an integration of conditional reasoning and statistical inference
★ The four cards in Wason selection tasks fall into the four cells of the de Finetti table. (Over)

Page 60: How do cognitive agents handle the tradeoff between speed and accuracy?

Conditionals in development
★ Development of understanding of conditionals (Gauffroy & Barrouillet, 2009)
★ Four developmental stages: 3rd graders, 6th graders, 9th graders, adults (respectively 8, 11, 15, 24 years old on average)
★ The defective biconditional = biconditional event shows up.

        conjunctive   defective     defective       material
        probability   conditional   biconditional   conditional
p  q    p∧q           q|p           p⟛q             p⊃q
T  T    T             T             T               T
T  F    F             F             F               F
F  T    F             V             F               T
F  F    F             V             V               T

Page 61: How do cognitive agents handle the tradeoff between speed and accuracy?

Indicative conditional in development

name          form
Conjunctive   = TT/All
Def Bicond    = TT/(TT+TF+FT)
Def Cond      = TT/(TT+TF)
MP            = (TT+FT+FF)/All
Other         = other forms
All := TT+TF+FT+FF
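These pattern scores are proportions over the four truth-table cell frequencies; a minimal Python sketch (my own illustration, with made-up counts):

```python
# Pattern scores as proportions over the truth-table cases TT, TF, FT, FF.
def pattern_scores(tt, tf, ft, ff):
    total = tt + tf + ft + ff                  # All := TT+TF+FT+FF
    return {
        "Conjunctive": tt / total,             # TT / All
        "Def Bicond": tt / (tt + tf + ft),     # the biconditional event
        "Def Cond": tt / (tt + tf),            # the conditional event
        "MP": (tt + ft + ff) / total,          # material implication
    }

print(pattern_scores(6, 2, 1, 7))
```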

[Excerpt and Fig. 1 from Gauffroy & Barrouillet (2009), Developmental Review, 29, 249–282: percent of response patterns categorized as conjunctive, defective biconditional, defective conditional, matching (MP), and others, as a function of grade for NN and BB indicative conditionals in their Experiment 1. Conjunctive patterns predominate in younger participants and decrease with age; for NN conditionals, the defective biconditional is an intermediate developmental step, followed by defective conditional responses that increase with age and predominate in adults.]

Page 62: How do cognitive agents handle the tradeoff between speed and accuracy?

Causal conditional in development

name          form
Conjunctive   = TT/All
Def Bicond    = TT/(TT+TF+FT)
Def Cond      = TT/(TT+TF)
MP            = (TT+FT+FF)/All
Other         = other forms
All := TT+TF+FT+FF

[Excerpt, appendix and Fig. 3 from Gauffroy & Barrouillet (2009): response patterns for the eight causal conditionals of their Experiment 2 (strong and weak causal relations), categorized as conjunctive, defective biconditional, defective conditional, matching (MP), and others, as a function of grade. The matching pattern was produced only by third graders; conjunctive patterns predominated in younger participants.]

Page 63: How do cognitive agents handle the tradeoff between speed and accuracy?

Promise and threat conditionals in development

name          form
Conjunctive   = TT/All
Def Bicond    = TT/(TT+TF+FT)
Def Cond      = TT/(TT+TF)
MP            = (TT+FT+FF)/All
Equivalence   = (TT+FF)/All
All := TT+TF+FT+FF

dratic trend already described with NN conditionals, and defective conditional patterns that in-creased with age.

As in the previous experiments, it is worth noting that, although the promise and threats did notdiffer in the rate of equivalence responses they elicited (74%, 78%, 72%, and 74% for the four promisesand 76%, 78%, 74%, and 74% for the four threats), the effect of pragmatic implicature was not universal.When it did not occur (i.e., when the interpretation differed from the equivalence reading), the stan-dard developmental trend from a conjunctive, to a defective biconditional and finally a defective con-ditional interpretation reappeared and, in line with the 33% of indeterminate responses on :p q cases,adults produced 30% of defective conditional patterns (Fig. 4).

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

3 6 9 AdultsGrades

Promises

ConjunctiveDef Bicond Def Cond

EquivalenceOther

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

3 6 9 AdultsGrades

Threats

ConjunctiveDef Bicond Def Cond

EquivalenceOther

Fig. 4. Percent of response patterns categorized as conjunctive, defective biconditional (Def Bicond), defective conditional (DefCond), equivalence, and others as a function of grades for promises and threats in Experiment 3.

C. Gauffroy, P. Barrouillet / Developmental Review 29 (2009) 249–282 267

Appendix

BB conditionals used in Experiment 1

‘‘If the pupil is a boy then he wears glasses”.‘‘If the door is open then the light is switched on”.‘‘If the student is a woman then she wears a shirt with long sleeves”.‘‘If the piece is big then it is pierced”.

NN conditionals used in Experiment 1

"If the card is yellow then a triangle is printed on it."
"If there is a star on the screen then there is a circle."
"If he wears a red t-shirt then he wears green trousers."
"If there is a rabbit in the cage then there is a cat."

Strong causal relations used in Experiment 2

"If button 3 is turned then the blackboard's lights are switched on."
"If lever 2 is down, then the rabbit's cage is open."
"If the second button of the machine is green then the machine makes sweets."
"If I pour pink liquid in the vase then stars appear on it."

Weak causal relations used in Experiment 2

"If the F5 key is pressed then the computer screen becomes black."
"If the boy eats alkali pills then his skin tans."
"If the fisherman puts flour in the water then he catches a lot of fish."
"If the gardener pours out buntil in his garden then he gathers a lot of tomatoes."

Promises used in Experiment 3

"If you gather the leaves in the garden then I give you 5 francs."
"If you score a goal then I name you captain."
"If you exercise the dog then I cook you a cake for dinner."
"If you clean your room then you can watch TV."

Threats used in Experiment 3

"If you break the vase then I take your ball."
"If you do not buy the bread then you do not play video games."
"If you do not do your homework then you do not go to the amusement park."
"If you have a bad mark then you do not go to the movie."


Gauffroy & Barrouillet, 2009

Page 64: How do cognitive agents handle the tradeoff between speed and accuracy?

Probability judgment in development

As could be expected from previous studies (Evans et al., 2003; Oberauer & Wilhelm, 2003), conjunctive responses were very frequent, even in adults. Our interpretation is that the difficulty of the task leads many participants to base their evaluation on the sole initial model provided by heuristic processes. As a consequence, it can be observed that the developmental trend resulting from the intervention of the analytic system is delayed in the probability task, with sixth graders producing almost 80% of conjunctive responses, a rate never observed with the truth table task in the present study or the inference task in previous research (e.g., Barrouillet et al., 2000).

Though our theoretical account of the evaluation of the probability of conditionals is akin to Evans' conception, the two proposals differ in important aspects. According to Evans, the meaning of the conditional is suppositional in nature. This means that understanding or evaluating a conditional involves the construction of mental models of the antecedent and, on that basis, the assessment of the believability of the consequent through the Ramsey test (Evans et al., 2003). In this account, the irrelevance of ¬p cases is a consequence of the suppositional nature of the conditional: it is because the Ramsey test focuses on p possibilities that ¬p cases are disregarded as irrelevant. Our account is different, because the developmental approach makes clear that the irrelevance of ¬p cases is a gradual construction. Importantly, some ¬p cases such as ¬p ¬q can be considered as irrelevant while others (i.e., ¬p q) are still considered as falsifying the conditional. Thus, understanding the irrelevance of ¬p developmentally precedes the suppositional level of interpretation, reversing the causal relation between the suppositional nature of the conditional and the irrelevance of false-antecedent cases postulated by Evans. In our theory, the main fact is the developmental increase in the number of models constructed through fleshing out, ¬p cases being considered as irrelevant as far as the corresponding mental models can be constructed through analytic processes. Thus, in our approach, the suppositional nature of the conditional is not the cause but the consequence of the irrelevance of ¬p cases, which results in turn from the complete fleshing out of the initial representation. We do not deny that most adults use a procedure like the Ramsey test to assess conditionals, but we regard this procedure as resulting from the process of constructing a complete set of mental models rather than as the basis of the meaning of conditionals.

[Fig. 6 data: percent of each response pattern (Conjunctive, Def Bicond, Def Cond, Other) across Grades 6, 9 and Adults in the probability task; y-axis 0%–90%.]

Fig. 6. Percent of response patterns categorized as conjunctive, defective biconditional (Def Bicond), and defective conditional (Def Cond) responses to the probability task in Experiment 4.


How does our theory account for the way people evaluate the probability of conditional statements, and what are its developmental predictions? Our hypothesis is that people evaluate the probability of a given conditional statement from the mental models they have constructed, by focusing on those cases that are relevant for the truth or falsity of this conditional (i.e., that make it either true or false). The probability that the conditional is either true or false would be given by the ratio between those cases that make the conditional either true or false and the relevant cases. We have seen that, for adults, the relevant cases are most often the p q and p ¬q cases whereas the ¬p cases are irrelevant, leaving the truth value of the conditional indeterminate. Thus, the probability of a basic conditional being true or false should be P(p q)/[P(p q) + P(p ¬q)] and P(p ¬q)/[P(p q) + P(p ¬q)] respectively, as Evans et al. (2003) observed. For the sake of simplicity, we will call this response the defective conditional response, in reference to the corresponding interpretation in the truth table task. For example, with the pack of cards represented in Fig. 5 and the conditional "If the card is black, then there is a square printed on it", the defective conditional response is 1/4 for true and 3/4 for false. However, what would these evaluations be at the other developmental levels? The different levels of interpretation of conditionals previously identified permit critical predictions. For children and adolescents who favor a conjunctive interpretation, there is no irrelevant case: p q cases make the conditional true whereas the other cases make it false. Thus, the probability that the conditional is true should be P(p q) (i.e., 1/8 in the example of Fig. 5) whereas the probability that it is false should be P(p ¬q) + P(¬p q) + P(¬p ¬q), that is 7/8 in the example. More interesting is the intermediary level of interpretation we observed above, which involves a defective biconditional reading of conditionals. Within this interpretation, p q is the sole case making the conditional true, p ¬q and ¬p q cases make it false, and ¬p ¬q is the sole case deemed irrelevant. As a consequence, at this developmental level, the relevant cases are p q, p ¬q, and ¬p q. Thus, the probability for "true" should equal P(p q)/[P(p q) + P(p ¬q) + P(¬p q)], whereas the probability for "false" should be [P(p ¬q) + P(¬p q)]/[P(p q) + P(p ¬q) + P(¬p q)]. In our example, this leads to a probability of 1/6 for true and 5/6 for false. We will consider this response as a defective biconditional response. Thus our theory predicts that evaluating the probability of conditionals as the conditional probability should be a developmental achievement related to the defective conditional interpretation observed in adults. This level should be preceded by different evaluations related to the incomplete interpretations resulting from immature analytic processes, as we observed in the truth table task.
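A quick numeric check of the three response types, as a minimal Python sketch; the four card-pack proportions below are an assumption of ours (the actual Fig. 5 pack is not reproduced here), chosen only because they give back the fractions quoted in this passage (1/8 and 7/8; 1/4 and 3/4; 1/6 and 5/6).

```python
from fractions import Fraction as F

# Assumed proportions of the four card types (pq, p¬q, ¬pq, ¬p¬q);
# hypothetical, but consistent with the worked fractions in the text.
p_pq, p_pnq, p_npq, p_npnq = F(1, 8), F(3, 8), F(2, 8), F(2, 8)

conjunctive_true = p_pq                               # no irrelevant case -> 1/8
def_cond_true    = p_pq / (p_pq + p_pnq)              # all ¬p cases irrelevant -> 1/4
def_bicond_true  = p_pq / (p_pq + p_pnq + p_npq)      # only ¬p¬q irrelevant -> 1/6

print(conjunctive_true, def_cond_true, def_bicond_true)  # 1/8 1/4 1/6
```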

How could the frequent conjunctive responses observed in adults be explained within this theoretical framework? Like Evans et al. (2003), we do not believe that more than 40% of adults really have a conjunctive understanding of basic conditionals, and it can be noted that we practically never observed such a response pattern in the previous experiments, nor did Evans et al. (2007) with causal conditionals. However, we do not endorse Evans' explanation of an incomplete Ramsey test, our theory providing a fairly simple account of this phenomenon. Adults have the capacities to go beyond a…

Fig. 5. Example of material given to participants in the probability task.


Gauffroy & Barrouillet, 2009

Page 65: How do cognitive agents handle the tradeoff between speed and accuracy?

LS part

Page 66: How do cognitive agents handle the tradeoff between speed and accuracy?

LS and pARIs

★ pARIs almost coincides with LS under extreme rarity (the limit d → ∞):

$$LS_R(q|p) = \lim_{d \to \infty} LS(q|p) \approx \mathrm{pARIs}(q|p)$$

Page 67: How do cognitive agents handle the tradeoff between speed and accuracy?

Dilemma and tradeoff

The dilemma between exploitation (information utilization) and exploration (information acquisition)

leads to

the tradeoff between speed (short-term reward) and accuracy (long-term reward).

Page 68: How do cognitive agents handle the tradeoff between speed and accuracy?

Dilemma and tradeoff

"While it is desirable to be fast and accurate, quality often comes at the cost of speed." (Jiang et al., NIPS 2012)

We cannot locally optimize while broadening the range of "local" at the same time:

choosing a known option vs. looking for a new, unknown option

leads to the dilemma, and hence the tradeoff.

Page 69: How do cognitive agents handle the tradeoff between speed and accuracy?

n-armed bandit problems

★ The simplest framework exhibiting the dilemma and the tradeoff.
★ The task is to maximize the total reward acquired from n sources with unknown reward distributions.
★ A one-armed bandit is a slot machine that gives a reward (win) or not (lose).
★ An n-armed bandit is a slot machine with n arms that have different probabilities of winning.

Page 70: How do cognitive agents handle the tradeoff between speed and accuracy?

n-armed bandit problems

★ In this study, we let the reward be binary: 1 (win) or 0 (lose).
★ This form is the most important one used in Monte-Carlo Tree Search (MCTS), which has been extremely successful and popular in AIs for the game of Go.
★ Each arm of the slot machine has a probability of giving 1 (win).
★ The n probabilities define an n-armed bandit problem (a minimal sketch follows).
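As a concrete anchor for the rest of the talk, here is a minimal sketch of such a binary-reward bandit in Python; the class name and interface are ours, not from any cited code.

```python
import random

class BernoulliBandit:
    """n-armed bandit: arm i pays reward 1 with probability probs[i], else 0."""

    def __init__(self, probs, seed=None):
        self.probs = list(probs)       # one win probability per arm
        self.rng = random.Random(seed)

    @property
    def n_arms(self):
        return len(self.probs)

    def pull(self, arm):
        """Return a binary reward for the chosen arm."""
        return 1 if self.rng.random() < self.probs[arm] else 0

# A 2-armed instance like the one used later in the talk: (0.6, 0.4).
bandit = BernoulliBandit([0.6, 0.4], seed=0)
rewards = [bandit.pull(0) for _ in range(10)]
```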

Page 71: How do cognitive agents handle the tradeoff between speed and accuracy?

Exploitation vs. exploration in bandits

★ Exploitation is to utilize the existing information, attempting local optimization.
★ In bandits, this is to choose the (greedy) arm with the highest estimated probability of winning.
★ Exploration is to broaden the range of information at hand, searching for the best yet-unknown arm.
★ In bandits, this is to choose a (non-greedy) arm with an unknown or lower estimated probability of winning than the greedy arm.
★ Hence exploitation and exploration are mutually exclusive and incompatible.

Page 72: How do cognitive agents handle the tradeoff between speed and accuracy?

Exploitation vs. exploration in bandits

★ ... Hence exploitation and exploration are mutually exclusive and incompatible.

★ QUESTION: Is this true? On what grounds? Isn't there a cost of well-definedness?

Page 73: How do cognitive agents handle the tradeoff between speed and accuracy?

"Policies" to handle the dilemma

★ Policies are basically designed to "balance" exploitation and exploration, accepting the incompatibility between them and probabilistically recombining the two. A minimal sketch of both policies below follows this list.
★ ε-greedy policy:
★ Given a parameter ε, choose the greedy action with probability 1−ε and one of the non-greedy actions with probability ε.
★ Softmax action selection policy:
★ Roulette selection of actions, with the probability of choosing each action given by the Gibbs (Boltzmann) distribution with a noise (temperature) parameter τ.
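The sketch announced above, assuming arm-value estimates are already given as a list; following the slide's wording, the ε-greedy variant here explores only among non-greedy arms.

```python
import math
import random

rng = random.Random(0)

def epsilon_greedy(values, epsilon):
    """With prob. 1 - epsilon pick the greedy arm; with prob. epsilon a non-greedy arm."""
    greedy = max(range(len(values)), key=lambda i: values[i])
    if len(values) > 1 and rng.random() < epsilon:
        return rng.choice([i for i in range(len(values)) if i != greedy])
    return greedy

def softmax_select(values, tau):
    """Roulette selection: P(i) is proportional to exp(values[i] / tau),
    the Gibbs/Boltzmann distribution with temperature tau."""
    weights = [math.exp(v / tau) for v in values]
    r = rng.random() * sum(weights)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            return i
    return len(values) - 1  # fallback for numeric edge cases
```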

Page 74: How do cognitive agents handle the tradeoff between speed and accuracy?

Speed–Accuracy Tradeoff

Speed and accuracy are usually not compatible.

[Figure: accuracy (the rate of the optimal action chosen) over 1,000 steps (the number of choices) for two softmax agents; softmax 1 is speedy, softmax 2 is accurate.]

Page 75: How do cognitive agents handle the tradeoff between speed and accuracy?

Models for bandits

★ Policy-based models
★ The ε-greedy policy and the softmax action selection rule.
★ Value function models
★ UCB1 (this enabled the current performance of Game-of-Go AI with MCTS).
★ LS (our cognitively inspired model, implementing cognitive properties that appear to be illogical and useless).

[Diagram: components of a reinforcement learning model. The environment sends a state and reward to the agent; the agent's value function assigns each action a value, and its policy turns the action values into the action taken.]

Page 76: How do cognitive agents handle the tradeoff between speed and accuracy?

The currently best model for bandits

UCB1 (Auer et al., Machine Learning, 2002): choose the arm A_i maximizing

$$\mathrm{UCB1}(A_i) = \bar{x}_i + \sqrt{\frac{2 \ln n}{n_i}}$$

UCB1-tuned: replace the exploration term, the term that suspends judgment and induces "search", by the tighter

$$\sqrt{\frac{\ln n}{n_i}\,\min\!\left\{\frac{1}{4},\; V_i(n_i)\right\}}, \qquad V_i(n_i) = \overline{x^2_i} - \bar{x}_i^2 + \sqrt{\frac{2 \ln n}{n_i}}$$

★ A_i is an action (arm), and \bar{x}_i is its mean reward so far.
★ E is the presence of reward (E = 1).
★ n is the current step (= the total number of times arms have been chosen).
★ n_i is the number of times the agent chose the arm A_i.

This is a value function considering the reliability (sample size) of each estimate; a minimal sketch in code follows.
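A minimal sketch of the UCB1 index in Python (the helper name is ours); UCB1-tuned would only swap in the tighter exploration term shown above.

```python
import math

def ucb1_values(wins, pulls, total_pulls):
    """UCB1 index per arm: mean reward plus an exploration bonus that decays
    as the arm's sample size grows. Unpulled arms get +inf so that every arm
    is tried at least once."""
    out = []
    for w, n_i in zip(wins, pulls):
        if n_i == 0:
            out.append(float("inf"))
        else:
            out.append(w / n_i + math.sqrt(2.0 * math.log(total_pulls) / n_i))
    return out
```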

Page 77: How do cognitive agents handle the tradeoff between speed and accuracy?

Illustration of UCB1

★ The reason for the performance of UCB1-tuned is that it delays the judgment of value as long as possible.

[Diagram: arms A1 (0.4) and A2 (0.6). Early on, the exploration term can make A1's upper bound exceed A2's (">"), but as the arms are chosen many times the extra term decays and the comparison settles to the means ("<").]

Page 78: How do cognitive agents handle the tradeoff between speed and accuracy?

Current models for bandits

[Figure: accuracy over 1,000 steps for softmax 1 (speedy), softmax 2 (accurate), UCB1, and UCB1-tuned (both speedy and accurate).]

Page 79: How do cognitive agents handle the tradeoff between speed and accuracy?

Problems of UCB1

★ Worse in the initial stage (the speed is low) compared with other valid models.
★ An agent must be both fast and accurate, but UCB1 pursues accuracy at the cost of speed.
★ UCB1 requires very many steps.
★ It does not work well when the reward is sparse.
★ In the real world, we cannot choose actions limitlessly; we do not have such massive resources. Also, the reward for an action can come much later.

Page 80: How do cognitive agents handle the tradeoff between speed and accuracy?

What do we do?

★ Propose a new model for overcoming the speed–accuracy tradeoff by weakening the dilemma between greedy and non-greedy actions.
★ We implement our ideas as a value function, not as a policy, because:
★ A value function, such as an expected value or a conditional probability, is much more portable.
★ A policy often needs many parameters and therefore requires parameter tuning, becoming specific to a certain problem (some knowledge of the problem is required a priori).
★ The ideas to implement are based on cognitive properties from cognitive science, with empirical support from brain science.

Page 81: How do cognitive agents handle the tradeoff between speed and accuracy?

Three cognitive properties

★ A. Satisficing
★ Coined from "satisfy" + "suffice" (Simon, Psy. Rev., 1956).
★ B. Risk attitude
★ Kahneman & Tversky, Am. Psy., 1984.
★ C. Relative estimation
★ Tversky & Kahneman, Science, 1974.

Page 82: How do cognitive agents handle the tradeoff between speed and accuracy?

Irrationality of the three cognitive properties

★ A. Satisficing
★ No optimization; falling into a local optimum.
★ B. Risk attitude
★ Groundless introduction of an asymmetry between gain and loss.
★ C. Relative estimation
★ Superstitious assumption that the values of arms are mutually dependent.

Page 83: How do cognitive agents handle the tradeoff between speed and accuracy?

Rationality of the three cognitive properties

★ A. Satisficing
★ Do not optimize, but look for and choose a satisfactory answer over a reference level, when global optimization is intractable.
★ If only the reference is properly set (just between the best and second-best arm), satisficing means optimization.
★ B. Risk attitude
★ Consider the reliability of information.
★ C. Relative estimation
★ Evaluate the value of an action in comparison with other actions.

Page 84: How do cognitive agents handle the tradeoff between speed and accuracy?

[Slide: the three properties, each paired with psychological and brain-science evidence.]

    Property                 Psychology                              Brain science
    A. Satisficing           Simon, Psy. Rev., 1956                  Kolling et al., Science, 2012
    B. Risk attitude         Kahneman & Tversky, Am. Psy., 1984      Boorman et al., Neuron, 2009
    C. Relative estimation   Tversky & Kahneman, Science, 1974       Daw et al., Nature, 2006

★ Property A: Satisficing. No pursuit of arms over the given reference level: when all arms are over the reference, keep choosing the current one; when all arms are under the reference, search hard for an arm over the reference level.
★ Property B: Risk attitude (reliability consideration), a comparison considering reliability (sample size) of win/lose histories. Risk-avoiding over the reference: rely on 15/20 wins rather than 3/4, although both have expected value 0.75. Risk-seeking under the reference: gamble on 1/4 rather than 5/20, although both have expected value 0.25. This is the reflection effect, with the boundary at 0.5.
★ Property C: Relative evaluation. If evaluation is relative (a see-saw), choosing A1 and losing raises the value of A2, prompting tries of arms other than A1; if absolute, it does not.

Page 85: How do cognitive agents handle the tradeoff between speed and accuracy?

Relative evaluation is especially important

★ Relative evaluation:
★ is what even slime molds (粘菌) and real neural networks (conservation of synaptic weights) do. Behavioral economics has found that humans evaluate actions and states comparatively.
★ weakens the dilemma between exploitation and exploration through a see-saw-like competition among arms:
★ Through failure (low reward), choice of the greedy action may quickly trigger the next choice of the previously second-best, non-greedy arm.
★ Through success (high reward), choice of the greedy action may quickly trigger focusing on the currently greedy action, lessening the possibility of choosing non-greedy arms by decreasing the value of the other arms.

[Diagram: choose A1 and lose. If evaluation is relative, the value of A2 rises as A1's falls (see-saw), so arms other than A1 get tried; if absolute, A2's value is unchanged.]

Page 86: How do cognitive agents handle the tradeoff between speed and accuracy?

The framework of models of the three properties

★ Let there be only two arms, A1 and A2.
★ Consider the 2×2 contingency table of the two actions and two reward levels:

          reward 1   reward 0
    A1        a          b
    A2        c          d

★ The expected reward value for each arm is
★ V(A1) = E(A1) = P(1|A1) = a/(a+b)
★ V(A2) = E(A2) = P(1|A2) = c/(c+d)

Page 87: How do cognitive agents handle the tradeoff between speed and accuracy?

A model ("RS") of the three properties

★ A value function V_RS equipped with the three properties can be given, on the same 2×2 table, as:
★ V_RS(A1) = (a+d)/(a+d+b+c)
★ V_RS(A2) = (b+c)/(b+c+a+d)
★ With the denominators identical, argmax_{A_i} V_RS(A_i) is given simply by the sign of (a+d) − (b+c).
★ This is the RS heuristic:
★ "if (a+d > b+c) then choose A1, else choose A2" (a sketch in code follows).
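The RS heuristic is tiny in code; a minimal Python sketch over the table counts (function names ours).

```python
def rs_values(a, b, c, d):
    """V_RS(A1) = (a+d)/(a+b+c+d), V_RS(A2) = (b+c)/(a+b+c+d)."""
    total = a + b + c + d
    return (a + d) / total, (b + c) / total

def rs_choose(a, b, c, d):
    """Since the denominators are identical, only the sign of (a+d)-(b+c) matters."""
    return "A1" if (a + d) > (b + c) else "A2"

# Risk-attitude example from a later slide: (70, 30, 7, 3) -> 0.73 vs 0.37 -> A1.
print(rs_values(70, 30, 7, 3), rs_choose(70, 30, 7, 3))
```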

Page 88: How do cognitive agents handle the tradeoff between speed and accuracy?

RS heuristics

★ Property C (relative estimation of value):
★ Failing to get reward with arm A2 means A1 is relatively good, and vice versa.
★ The values of A1 and A2 are proportional to a+d and c+b, respectively:

          reward 1   reward 0     V_RS
    A1        a          b        a+d
    A2        c          d        c+b

Page 89: How do cognitive agents handle the tradeoff between speed and accuracy?

RS heuristics

★ Property B (risk attitude):
★ Let (a, b, c, d) = (70, 30, 7, 3).
★ V(A1) : V(A2) = 73 : 37.
★ The more reliable arm (A1) is preferred.
★ Let (a, b, c, d) = (30, 70, 3, 7).
★ V(A1) : V(A2) = 37 : 73.
★ The less reliable arm (A2) is preferred (since A2 has more chance of having a better value than the 30% chance of reward).

Page 90: How do cognitive agents handle the tradeoff between speed and accuracy?

RS heuristics

★ Property A (satisficing):
★ Efficiently realized by properties C & B, with the reference r = 0.5.
★ If P(1|A1) = P(1|A2) > 0.5 and N(A1) > N(A2), then V_RS(A1) > V_RS(A2): keep choosing A1, indifferently.
★ When (a, b, c, d) = (70, 30, 7, 3), V_RS(A1) : V_RS(A2) = 73 : 37.
★ If P(1|A1) = P(1|A2) < 0.5 and N(A1) > N(A2), then V_RS(A1) < V_RS(A2): try A2, wondering if P(1|A2) > r (= 0.5).
★ When (a, b, c, d) = (30, 70, 3, 7), V_RS(A1) : V_RS(A2) = 37 : 73.

Page 91: How do cognitive agents handle the tradeoff between speed and accuracy?

Result by RS

★ The result shown is of a 2-armed bandit problem with reward probabilities (0.6, 0.4) for A1 and A2.

[Figure: accuracy rate over steps 1–1,000 (log scale), rising from 0.5 toward 1.0, for RS and several LS variants.]

Page 92: How do cognitive agents handle the tradeoff between speed and accuracy?

The problem of RS

★ The naive relative evaluation of RS works only with 2 arms.
★ With n arms, RS is not definable, and straightforward generalizations do not work well.
★ So we need another model that keeps the same high performance.
★ We introduce our LS model, first proposed by Shinohara (2007) (kind of haphazardly).
★ Shinohara, S., Taguchi, R., Katsurada, K., & Nitta, T. (2007). A model of belief formation based on causality and its application to the N-armed bandit problem. Transactions of the Japanese Society for Artificial Intelligence, 22(1), 58–68. (In Japanese: 因果性に基づく信念形成モデルとN本腕バンディット問題への適用.)

Page 93: How do cognitive agents handle the tradeoff between speed and accuracy?

LS model

★ The performance of LS in 2-armed bandit problems is the same as RS, and LS can be applied to n-armed bandit problems.
★ While RS compares an arm with the other arm,
★ LS compares an arm with the "ground" formed from the whole set of arms.
★ LS fits human intuition about causal relationships with very high, actually the highest, correlation (r > 0.85 for all experiments).

On the same 2×2 table:

$$P(1|A_1) = \frac{a}{a+b}$$

$$RS(1|A_1) = \frac{a+d}{a+d+b+c}$$

$$LS(1|A_1) = \frac{a + \frac{bd}{b+d}}{a + \frac{bd}{b+d} + b + \frac{ac}{a+c}}$$

A minimal sketch in code follows.
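As announced above, a minimal Python sketch of the 2-armed LS value; the helper names and the zero-denominator conventions are ours.

```python
def ls_value(a, b, c, d):
    """LS(1|A1) for the table A1 = (a wins, b losses), A2 = (c wins, d losses).

    The 'ground' terms bd/(b+d) and ac/(a+c) stay the same when the focus
    shifts to A2; this is LS's figure-ground invariance.
    """
    gp = b * d / (b + d) if b + d > 0 else 0.0   # positive ground term
    gn = a * c / (a + c) if a + c > 0 else 0.0   # negative ground term
    den = a + gp + b + gn
    return (a + gp) / den if den > 0 else 0.5    # 0.5 = the no-correlation level

def ls_pair(a, b, c, d):
    """LS values of both arms; A2's value is the same formula with rows swapped."""
    return ls_value(a, b, c, d), ls_value(c, d, a, b)
```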

Page 94: How do cognitive agents handle the tradeoff between speed and accuracy?

LS describing causal intuition

★ LS fits the experimental data of causal induction (inductive inference of causal relationships) best among the other 42 models, including the most popular one, ΔP = P(E|C) − P(E|¬C).
★ Experiment of causal induction:
★ Given an effect E in focus (e.g., stomachache) and a candidate cause C (e.g., drinking milk), answer the strength of the causal relationship from C to E. The co-occurrence information of C and E is given as a 2×2 table (row C: a, b; row ¬C: c, d; against columns E, ¬E).

Meta-analysis (r² with human ratings, per experiment):

    Experiment   AS95   BCC03.1  BCC03.3  H03    H06    LS00   W03.2  W03.6
    r² for LS    0.90   0.96     0.96     0.97   0.94   0.73   0.91   0.72
    r² for ΔP    0.78   0.84     0.70     0.00   0.50   0.77   0.08   0.21

Page 95: How do cognitive agents handle the tradeoff between speed and accuracy?

The properties of LS

★ Figure–ground segregation, and invariance of the ground against a change in focus (figure).
★ Just as the background stays invariant whichever of the two possible objects you see in the rabbit–duck figure: a rabbit or a duck, but never both at the same time.

$$LS(1|A_1) = \frac{a + \frac{bd}{b+d}}{a + \frac{bd}{b+d} + b + \frac{ac}{a+c}} \qquad LS(1|A_2) = \frac{c + \frac{db}{d+b}}{c + \frac{db}{d+b} + d + \frac{ca}{c+a}}$$

Writing A1's ground as A1C and A2's ground as A2C:
★ P: A1 ≠ A2, A1C ≠ A2C
★ LS: A1 ≠ A2, A1C = A2C
★ RS: A1 ≠ A2, A1C = A2

Page 96: How do cognitive agents handle the tradeoff between speed and accuracy?

The properties of LS

$$P(1|A_1) = \frac{a}{a+b} \;\neq\; P(1|A_2) = \frac{c}{c+d}$$

$$RS(1|A_1) = \frac{a+d}{a+d+b+c} \qquad RS(1|A_2) = \frac{c+b}{c+b+d+a}$$

★ For P, both the figure and its ground change with the arm: A1 ≠ A2 and A1C ≠ A2C.
★ For LS, the figures differ but the ground terms bd/(b+d) and ac/(a+c) are common to both arms: A1 ≠ A2 and A1C = A2C.
★ For RS, the "ground" of one arm is simply the other arm: A1C = A2 and A2C = A1.

Page 97: How do cognitive agents handle the tradeoff between speed and accuracy?

Policy derived from value function

★ A kind of policy can be derived from a value function such as UCB1 or LS.

(From a paper at the 28th Annual Meeting of the Japanese Cognitive Science Society, 2011, P2-12; translated from Japanese and condensed.)

Environments can be broadly classified by the reward probabilities (P_A, P_B) of the two arms [7] (Fig. 4): single-high-reward environments (only one arm above 0.5, e.g., P_A = .8, P_B = .2), multi-high-reward environments (both above 0.5, e.g., P_A = .8, P_B = .7), and low-reward environments (both below 0.5, e.g., P_A = .3, P_B = .2).

Fig. 5 plots the LS values of both arms as functions of the decision variable P(A), the proportion of choices allocated to arm A, for each of the three environments. Whether the two LS curves cross as P(A) varies determines whether the agent explores against the objectively observed probabilities: in the easy single-high-reward environment there is no crossing point and no exploration, while in the difficult multi-high-reward and low-reward environments a crossing appears and the agent explores. In the limits,

$$\lim_{P(A)\to 1.0} LS(E|A) = P(E|A), \qquad \lim_{P(A)\to 0.0} LS(E|A) = 0.5 .$$

That is, LS values a barely observed, unknown action at 0.5, which coincides with the level humans judge as "no correlation" in causal induction [6], while the value of a sufficiently observed action converges to the objective conditional probability P(E|A). LS can thus be read as carrying the policy "at least one of the actions should have a reward probability above 0.5" and searching for such an action.

State transitions of the choice bias (Fig. 6): LS explores persistently in low-reward environments and maximizes reward in multi-high-reward environments, making gradual transitions between exploration states and convergence-candidate states according to the choice counts N_A, N_B and the win/loss counts N_E, N_{¬E}. Compared with UCB1 (Fig. 7), which is considered excellent in 2-armed bandit problems but whose policy converges to the objective evaluation as trials accumulate, LS expresses more states and never loses this flexibility, so it can keep making flexible judgments even in a changing environment.

General discussion of this part. LS takes the complexity of the environment into its values through the invariance of the ground and a bias-correction term, changing its behavior per environment to handle uncertain information; the rough evaluation of an action is expressed as the causal relation between action and reward, and the 0.5 level can be interpreted as the boundary between loss and gain from further trials. LS can also be regarded as describing a kind of subjective probability: it satisfies the probability axioms, gives efficient approximate answers early on, adapts well to environmental change, and converges to the objective probability once information is sufficient. Whereas fuzzy theory quantifies qualitative human judgments and brings them into logic, subjective-probability models like LS give qualitative character to quantitative probability; since LS satisfies the probability axioms, any system that handles probabilities can adopt it, or an improved variant, with little modification, and thereby gain adaptivity to complex environments. In this way LS may be a good toy model of the basic human cognitive ability to adapt to chance and uncertain information; analyzing the decision coefficients separately and experimentally studying humans' subjective valuation of probability are future work.

References for this part:
[1] Shinohara, S., Taguchi, R., Katsurada, K., & Nitta, T. (2007). A model of belief formation based on causality and its application to the N-armed bandit problem. Transactions of the Japanese Society for Artificial Intelligence, 22(1), 58–68.
[2] Sutton, R. S., & Barto, A. G. (2000). Reinforcement Learning (Japanese edition: Morikita Publishing).
[3] Takahashi, T., Nakano, M., & Shinohara, S. (2010). Cognitive symmetry: Illogical but rational biases. Symmetry: Culture and Science, 21(1–3), 275–294.
[4] Hattori, M., & Oaksford, M. (2007). Adaptive non-interventional heuristics for covariation detection in causal induction: Model comparison and rational analysis. Cognitive Science, 31(5), 765–814.
[5] Oyo, K., & Takahashi, T. (2010). A loosely symmetric model connecting causal induction and decision making. Proceedings of the 27th Annual Meeting of the Japanese Cognitive Science Society, 799–800.
[6] Takahashi, T., Oyo, K., & Shinohara, S. (2011). A loosely symmetric model of cognition. Lecture Notes in Computer Science, 5778, 234–241. Springer.
[7] Oyo, K., Kohno, Y., & Takahashi, T. (2011). Generalization of the LS model for n-armed bandit problems. Proceedings of the 25th Annual Conference of the Japanese Society for Artificial Intelligence, 1G1-2in.
[8] Shimizu, T., Yokokawa, J., Kohno, Y., & Takahashi, T. (2011). Implementation of the cognitive-bias adjustment mechanism LS into Q-learning and its function. Proceedings of the 25th Annual Conference of the Japanese Society for Artificial Intelligence, 1P2-12in.

Page 98: How do cognitive agents handle the tradeoff between speed and accuracy?

LSVR: Generalized LS

★ LS has had two problems:
★ LS cannot evaluate more than two actions in its original form.
★ The reference for satisficing in the LS model is unchangeable.
★ So we analyze and generalize LS to make it practical:
★ Kohno, Y., & Takahashi, T. (2012). Loosely symmetric reasoning to cope with the speed–accuracy trade-off. In Proceedings of SCIS-ISIS 2012, Kobe, Japan, November 20–24, 2012.

Page 99: How do cognitive agents handle the tradeoff between speed and accuracy?

Improvement to LS (1)

★ Generalization from 2-armed to n-armed.

Page 100: How do cognitive agents handle the tradeoff between speed and accuracy?

★ What is relative evaluation for n > 2 actions? Between which arms does ">" or "<" hold?

[Diagram: n slot machines A1, A2, A3, …, An.]

Generalization from 2-armed to n-armed

Page 101: How do cognitive agents handle the tradeoff between speed and accuracy?

★ It is hard to define an effective relative evaluation over multiple actions, so we generate a virtual arm as the ground.

[Diagram: besides A1, the arms A2, …, An are summarized into a virtual machine Ag, an abstract image.]

Generalization from 2-armed to n-armed

Page 102: How do cognitive agents handle the tradeoff between speed and accuracy?

★ In this way, we evaluate each arm by relative evaluation against Ag, the virtual "ground" arm, which is invariant against the change in focus (which arm is being evaluated).

[Diagram: each of A1, A2, …, An is compared (">" or "<") with the same virtual machine Ag.]

Generalization from 2-armed to n-armed

Page 103: How do cognitive agents handle the tradeoff between speed and accuracy?

Abstract image

★ The virtual, abstract image, i.e. the ground arm, enters the value of each arm like this:

$$LS(E|A_i) = \frac{P(A_i, E) + S_p}{P(A_i, E) + S_p + P(A_i, \lnot E) + S_n}$$

where S_p and S_n are the positive and negative ground terms formed from the whole set of arms (a hedged sketch follows).

Generalization from 2-armed to n-armed
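The slide leaves S_p and S_n abstract. A hedged sketch, under an assumption of ours (not necessarily the construction in Kohno & Takahashi, 2012): the virtual arm Ag simply pools the win/loss counts of all other arms, after which the 2-armed LS ground terms are formed against Ag.

```python
def ls_n_armed(wins, losses, i):
    """Value of arm i against a virtual ground arm Ag pooling all other arms.

    Assumption: S_p and S_n are the 2-armed LS ground terms computed from
    arm i's counts (a, b) and Ag's pooled counts (c, d). The actual LSVR
    construction of the ground may differ.
    """
    a, b = wins[i], losses[i]
    c = sum(w for j, w in enumerate(wins) if j != i)    # Ag's wins
    d = sum(l for j, l in enumerate(losses) if j != i)  # Ag's losses
    sp = b * d / (b + d) if b + d > 0 else 0.0          # S_p
    sn = a * c / (a + c) if a + c > 0 else 0.0          # S_n
    den = a + sp + b + sn
    return (a + sp) / den if den > 0 else 0.5
```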

Page 104: How do cognitive agents handle the tradeoff between speed and accuracy?

Improvement to LS (2)

★ Online learning of the reference.

Page 105: How do cognitive agents handle the tradeoff between speed and accuracy?

Control of reference

★ If the reference is fixed and invariable, we cannot respond to various environments in general.

[Diagram: four arms A1–A4 with reward probabilities 0.6, 0.4, 0.8, 0.2 and a fixed reference of 0.5: 0.6 and 0.8 count as good, 0.4 and 0.2 as bad; the reference is unchangeable.]

Page 106: How do cognitive agents handle the tradeoff between speed and accuracy?

Control of reference

★ So we make the reference variable and define an update procedure.

[Diagram: the same four arms with a variable reference R raised above 0.5, so that only the best arm (0.8) counts as good.]

Page 107: How do cognitive agents handle the tradeoff between speed and accuracy?

Control of reference

★ With the variable reference R set just between the best and the second-best arm, only the best arm is "good": now satisficing is optimization!

[Same diagram, with the reference R learned so that it separates 0.8 from the rest.]

Page 108: How do cognitive agents handle the tradeoff between speed and accuracy?

LS with variable reference (LSVR)

★ LSVR is LS with a parameter to change the reference. The reference is learned online, updated through a recurrence equation (a hedged sketch follows).
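The talk does not spell the recurrence out, so the update below is only an illustrative assumption: an exponential moving average toward the best observed reward rate, with the slide's α = 0.8 as the default.

```python
def update_reference(reference, best_rate, alpha=0.8):
    """Hypothetical reference update: drift the satisficing reference toward
    the best reward rate observed so far. The actual LSVR recurrence may differ."""
    return alpha * reference + (1.0 - alpha) * best_rate
```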

Page 109: How do cognitive agents handle the tradeoff between speed and accuracy?

Simulation setting for LSVR

★ We ran simulations to examine and compare the performance of LS-VR.
★ The simulation setting is as follows:
★ Steps: 100,000 per run; turns (runs): 10,000
★ Arms: 2, 22, 42, 82, 102
★ Reference learning parameter α = 0.8
★ Models: LS-VR, LS, UCB1, UCB1-tuned

A hypothetical runner in code follows.
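A hypothetical runner tying the earlier sketches together (BernoulliBandit and ucb1_values from above); "accuracy" here means how often the optimal arm was chosen, matching the plots that follow. LS-VR would slot in the same way, with its value function replacing ucb1_values.

```python
def run_ucb1(bandit, steps):
    """Play UCB1 on a bandit and return the fraction of optimal choices."""
    n = bandit.n_arms
    wins, pulls = [0] * n, [0] * n
    optimal = max(range(n), key=lambda i: bandit.probs[i])
    hits = 0
    for t in range(1, steps + 1):
        values = ucb1_values(wins, pulls, t)
        arm = max(range(n), key=lambda i: values[i])
        r = bandit.pull(arm)
        wins[arm] += r
        pulls[arm] += 1
        hits += (arm == optimal)
    return hits / steps

print(run_ucb1(BernoulliBandit([0.6, 0.4], seed=1), 1000))
```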

Page 110: How do cognitive agents handle the tradeoff between speed and accuracy?

LSVR: Result

[Figures: correct rate over 5,000 steps for LS, LS-VR, UCB1 and UCB1-tuned, on a 12-armed (left) and a 100-armed (right) bandit problem.]

Page 111: How do cognitive agents handle the tradeoff between speed and accuracy?

LSVR: Result

[Figure: LSVR vs. UCB1-tuned, side by side.]

Page 112: How do cognitive agents handle the tradeoff between speed and accuracy?

Robot motion learning

★ A robot learns the giant-swing motion (Acrobot) with reinforcement learning.
★ Non-Markov dynamics result from the coarse-graining necessary for the Q-learning framework.

(From a paper at the 29th Annual Meeting of the Japanese Cognitive Science Society, 2012, P1-22; translated from Japanese and condensed.)

Task. In the Acrobot swing-up task, a two-link robot hanging from a freely rotating bar swings its body until its end rises above a goal line (Fig. 5); the settings follow Sutton [10]. The agent learns the torque τ ∈ {+1.0, −1.0, 0.0} Nm applied at the hip joint between the two links; the bar joint cannot be driven directly. The state consists of the angle θ1 of the upper body about the bar (θ = 0 pointing down) and the angle θ2 between upper and lower body, together with their angular velocities; initial values are all 0, θ2 is restricted to −(3/4)π < θ2 < (3/4)π, the velocities to −4π < θ̇ < 4π rad/s, and θ1 is unrestricted, so the robot may rotate any number of times within an episode. Link parameters (Fig. 6): l1 = l2 = 1.0, lc1 = lc2 = 0.5, m1 = m2 = 1.0, g = 9.8 m/s². The angular accelerations θ̈1, θ̈2 follow the standard Acrobot equations of motion and are integrated by the Euler method with Δt = 0.05.

Discretization (Fig. 4: tile coding). Each of the four continuous state variables is divided into 6 intervals (6⁴ = 1,296 states per tiling), with 12 randomly offset tilings per simulation. The reward is r = −1.0 per step and r = 0.0 when the Acrobot crosses the goal line; an episode ends at the goal or after 2,000 steps.

Policies compared. ε-greedy selects a random action with probability ε and the greedy (highest-Q) action otherwise; here ε starts at 0.5 and decays as ε ← 0.9ε per step. Softmax selects actions by roulette over the Boltzmann distribution

$$\pi(s_t, a_k) = \frac{\exp(Q(s_t, a_k)/T)}{\sum_{i=1}^{n} \exp(Q(s_t, a_i)/T)}$$

which explores more efficiently than ε-greedy because the selection probabilities reflect the Q-values.

LSRL. To apply the LS computation to Sarsa learning while keeping its properties, for each state s we keep, per action a_k, its value Q_k and its trial frequency ν_k; the per-trial reward rate of a_k is R_k = Q_k / ν_k. The reference point R_c, in the same units (value per trial), is set to

$$R_c = \frac{\text{total reward obtained in the previous episode}}{\text{total number of trials in the previous episode}}$$

(still a trial-and-error choice; a more theoretically grounded value may exist). With a_H the most observed action in s (largest ν) and a_L the least observed (smallest ν), and following the bias terms of LS, the LSRL value of any action a_k in s is

$$LSRL(a_k) = \frac{Q_k + 2R_c\,\frac{\nu_H \nu_L}{\nu_H + \nu_L} - \frac{Q_H Q_L}{Q_H + Q_L}}{\nu_k + \frac{\nu_H \nu_L}{\nu_H + \nu_L}}$$

and the action with the highest LSRL value is executed and used for the backup. In the limit ν_H/(ν_H + ν_L) → 1.0, LSRL(a_H) → Q_H/ν_H and LSRL(a_L) → R_c, corresponding to the limits of ordinary LS: a well-tried action is valued objectively, a barely tried one at the reference.

Results (Fig. 7: steps to the terminal state, averaged over episodes). Six agents (Sarsa(λ) with ε-greedy, Sarsa(λ) with softmax, and LSRL(λ), each for λ = 0.0 and λ = 0.9) ran 500 episodes × 1,000 simulations. At λ = 0.0, the ε-greedy and softmax Sarsa agents failed to learn well, while LSRL reached the terminal state in comparatively few steps; at λ = 0.9, LSRL also learned better than ε-greedy and crossed the goal line earlier. Since the Acrobot task has continuous state and nonlinear dynamics and is only coarsely discretized by the tile coding, the learning speed of LSRL indicates robustness to tasks with complex environmental information. Even under the very stoic reference used here (always comparing with the best result so far), LSRL obtained significant results, so a reference better matched to the actual physics should improve performance further.

Summary of this part. To show the usefulness of symmetric reasoning in reinforcement learning, LS, which describes human causal induction and decision making well [5], was generalized into the value function LSRL. LSRL adapts its behavior to the complexity of each environment, and becomes still more flexible when its reference point tracks the agent's state; the formulation applies to reinforcement learning in general and should carry over to more biological schemes such as actor-critic.

Page 113: How do cognitive agents handle the tradeoff between speed and accuracy?

Robot motion learning

★ Another version of a reinforcement learning agent for the Acrobot.

The Efficacy of Symmetric Cognitive Biases in Robotic Motion Learning
Daisuke Uragami (Tokyo University of Technology), Tatsuji Takahashi (Tokyo Denki University), Hisham Alsubeheen, Akinori Sekiguchi and Yoshiki Matsuo (Tokyo University of Technology); Proceedings of the 2011 IEEE International Conference on Mechatronics and Automation, August 7–10, Beijing, China.

Abstract: We propose an application of human-like decision making to robotic motion learning. Humans are known to have illogical symmetric cognitive biases that induce "if p then q" and "if not q then not p" from "if q then p". The loosely symmetric Shinohara model quantitatively represents these tendencies (Shinohara et al., 2007). Previous studies by one of the authors have revealed that an agent with the model used as the action value function shows great performance in n-armed bandit problems, because of the illogical biases. In this study, we apply the model to reinforcement learning with the Q-learning algorithm. Testing the model on a simulated giant-swing robot, we confirm its efficacy in increasing convergence speed and avoiding local optima.

Index terms: reinforcement learning, exploration-exploitation dilemma, speed-accuracy tradeoff, giant-swing motion, non-Markov property.

I. Introduction. Reinforcement learning has been utilized for robots to acquire motions through interaction with uncertain environments [1-3], but because the systems learn through trial and error, they require exponentially more learning steps for a bigger state space. Hara et al. [8], Sakai et al. [9] and Toyoda et al. [10] studied the problems of applying Q-learning, the representative reinforcement learning algorithm, to real robots: they coarse-grained the state space of giant-swing motion learning to make Q-learning practicable, and the resulting incomplete state identification turns the task into a dynamic one that violates the Markov property, so that (1) convergence slows because the Q-values fluctuate, and (2) the acquired motion often enters a loop among low-reward states, i.e., the learning gets trapped in a local optimum. In this study, we solve these two problems by applying illogical human decision-making tendencies to Q-learning.

II. Loosely symmetric model. Human inference deviates from the standards of classical logic and probability theory; among the resulting cognitive biases, symmetry (S: inferring the converse "if q then p" from "if p then q") and mutual exclusivity (MX: inferring the inverse "if not p then not q") may be the most important for understanding humans, since no animal but humans appears to have them [15], and they accelerate sign and language acquisition by assuming a one-to-one correspondence between labels and objects [16]. On the 2×2 co-occurrence table (Table I: row p = (a, b), row ¬p = (c, d), against columns q, ¬q), the conditional probability of q given p is P(q|p) = a/(a+b), and LS is its symmetric and mutually exclusive modification:

$$LS(q|p) = \frac{a + \frac{bd}{b+d}}{a + \frac{bd}{b+d} + b + \frac{ac}{a+c}} \quad (1)$$

Scatter plots over 5,000 random tables, with a, b, c, d sampled uniformly from [0, 100], show a loose correlation between LS(q|p) and LS(p|q) (the S bias, Fig. 1) and between LS(q|p) and LS(¬q|¬p) (the MX bias, Fig. 2); ordinary conditional probability is totally asymmetric and has no such correlation. Though the definition may look complicated, LS derives from a natural condition expressing a general cognitive principle: when a human sees a scene, she segregates a focus (the figure, recognized accurately) from the rest (the ground, incompletely and dimly neglected). When assessing the directed relation from p to q, the figure is P(q|p) = a/(a+b), and the uniformly evaluated ground is represented by the added terms bd/(b+d) and ac/(a+c). Just as the ground remains invariant against focal shifts, these context terms are common to the LS estimation from ¬p to q:

$$LS(q|\lnot p) = \frac{c + \frac{bd}{b+d}}{c + \frac{bd}{b+d} + d + \frac{ac}{a+c}} \quad (2)$$

When p and ¬p stand for two actions and q, ¬q for positive and negative rewards, the ordinary expected values a/(a+b) and c/(c+d) are completely independent, while LS evaluates them dependently through the shared terms: if action ¬p causes the undesired result ¬q, then, meanwhile, action p is plausible to cause the desirable result q. This relative estimation or comparative valuation comes from the MX bias, for LS's negation satisfies the law of the excluded middle, LS(¬q|p) = 1 − LS(q|p). It is especially useful for deciding the better action with the greedy method, where an argmax operator is used and hence only the sign of LS(q|p) − LS(q|¬p) matters, and it enables optimal balancing, or rather fusion, of speed and accuracy [14].

III. Giant-swing robot. The robot imitates high-bar gymnastics, a common control test bed with nonlinear dynamics [7-9, 17-19] (Fig. 3). The first joint, between the high bar and the robot, is free; only the second joint at the robot's hip actively drives (a nonholonomic system). The goal of motion learning is to rotate the body by bending and stretching the hip joint, starting from the stationary state right below the bar. The simulator is built with Open Dynamics Engine (ODE); following Hara et al. [8, 9], both links are 0.2 m and 0.5 kg, and the hip joint moves at angular velocity 4.0 rad/s within its operational range.

VI. Discussion (Fig. 12 compares the three algorithms GQ, NS and LS; Figs. 13-15 show typical time developments of LS, NS and GQ, respectively). Naive application of Q-learning to robotic motion learning is destined to slow convergence and sinking into local optima; as the results show, our LS policy improved the situation. It acquired more reward after the early stage, in which any agent must accumulate enough information by trial and error, and in the final phase, where agents deterministically decide actions based on what was learned, the LS policy increased the acquired reward monotonically. With greedy choice in GQ, the robot sinks into the position states below the high bar after the 100,000th step; getting out requires distinguishing an initial phase around the position right below the bar from a later phase where the angle is large enough for the giant swing. Hara et al. [8] and Sakai et al. [9] realized this distinction substantially by a two-level reward and by switching between greedy and random actions, and other work has the robot itself segment motion modes using an internal model and a measure of expected prediction error (Tani et al. [20]; Taniguchi et al. [21]). The model proposed here carried off the giant-swing motion without such a hand-made distinction or an internal model prepared in advance. Our analysis is that giant-swing motion has two hierarchical levels, the global rotation modes and the local action choice (bending, stretching and holding the arm), and the MX bias efficiently structures the local level by comparative valuation: whereas ordinary Q-learning estimates the value of actions independently, LS evaluates them dependently (estimation relativity, see [11]).

Page 114: How do cognitive agents handle the tradeoff between speed and accuracy?

Future applications of LS

★ More robot control
★ Monte-Carlo tree search
★ Online advertisement
★ LS works well with sparse reward, compared to other models.
★ ACO (ant colony optimization) and other heuristics

Page 115: How do cognitive agents handle the tradeoff between speed and accuracy?

Summary (LS)

★ LS implements cognitive properties and describes human intuition.
★ The relative evaluation weakens the dilemma and overcomes the tradeoff.
★ LS exhibits the best result when the reference is given (information on the best and the second-best arms). This reference parameter is extremely intuitive.
★ With the reference learned online, the more arms there are, the better LS performs. The reference update is currently totally naive, and it can easily be much improved.
★ Because LS is just a value function, it can be ported to many applications.

Page 116: How do cognitive agents handle the tradeoff between speed and accuracy?

General conclusion

★ Many of the cognitive biases found in humans look illogical and irrational, but we have been finding that humans are actually rational (considering the environment structure and the purpose of cognition), and often even logical.
★ Some of the biases can be applied to machine learning and other practical areas and show good performance.
★ "Cognitively inspired computing" like this may be a fruitful field of study.