RIMS Workshop: Statistical Inference on Quantum Systems and Its Geometric Structure (量子系の統計的推測とその幾何学的構造)

November 10-12, 2014

Organizer: Fuyuhiko Tanaka (田中冬彦)

Graduate School of Engineering Science, Osaka University

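Preface: What Is Bayesian Statistics? (Briefly, in Three Slides)

The Mission of Statistics, to Begin With

Difference from pure mathematics

Statistics is a discipline that responds to the needs of society.

→ The problems it cares about change with the times.

(This is the part that people in other fields rarely know.)

The mission of statistics

To provide desirable methods according to the statistical model, the purpose of inference (point estimation, prediction, testing, model selection, etc.), the loss function, and so on.

* Evaluating the estimation error is secondary.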

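What Is Bayesian Statistics?

Bayesian statistics

Statistics built around the prior distribution.

Prior distribution = a probability distribution introduced on the unknown parameter* (what it means is left open).

*Some people insist on a particular interpretation.

The prior may be purely formal (a noninformative prior).

Relation to frequentism (traditional statistics)

- Bayesian statistics greatly widens the framework of traditional statistics and answers today's broad range of needs.

(It is not an either-or choice between frequentism and Bayes.)

- Frequentist results (e.g., maximum likelihood estimation) correspond to stubbornly using one particular prior.

(This does not always work well in applications; there is no dependence on the loss function, etc.)

- Frequentist results and asymptotic theory are the starting point of a statistical analysis (not its goal).

*For details see, e.g., Robert, The Bayesian Choice, Chap. 11.

In particular cases the frequentist and the Bayesian methods coincide.

Example: maximum likelihood estimation (frequentist) = MAP estimation under a uniform prior (Bayesian).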
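The coincidence of maximum likelihood estimation and MAP estimation under a uniform prior can be checked numerically. The following is a minimal sketch, not part of the original slides: the Bernoulli model, the data (7 successes in 10 trials), the grid search, and the Beta(3, 3) comparison prior are all assumptions chosen for illustration.

```python
import math

def log_likelihood(theta, successes, trials):
    """Bernoulli log-likelihood of `successes` out of `trials`."""
    return successes * math.log(theta) + (trials - successes) * math.log(1.0 - theta)

def argmax_on_grid(objective, grid):
    """Return the grid point that maximizes `objective`."""
    return max(grid, key=objective)

# Hypothetical data: 7 successes in 10 trials.
successes, trials = 7, 10
grid = [i / 1000 for i in range(1, 1000)]  # grid on the open interval (0, 1)

# Frequentist: maximize the likelihood.
mle = argmax_on_grid(lambda t: log_likelihood(t, successes, trials), grid)

# Bayesian: maximize log posterior = log likelihood + log prior.
log_uniform_prior = math.log(1.0)  # uniform density 1 on (0, 1)
map_uniform = argmax_on_grid(
    lambda t: log_likelihood(t, successes, trials) + log_uniform_prior, grid)

# A non-uniform prior (Beta(3, 3), up to a constant) moves the MAP estimate.
map_beta = argmax_on_grid(
    lambda t: log_likelihood(t, successes, trials)
    + 2 * math.log(t) + 2 * math.log(1.0 - t), grid)

print(mle, map_uniform, map_beta)
```

Under the uniform prior the log prior is a constant, so the two maximizers agree; under the Beta(3, 3) prior the estimate is pulled toward 1/2.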

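Misconceptions about Bayesian Statistics

"Bayesian statistics cannot be used for scientific experiments, which ought to be objective"?

"How to choose a noninformative prior is an unsolved problem"?

This is not a problem of Bayesian statistics; it is a problem that small-sample statistics has carried for a long time.

Choosing a good inference method is merely folded into the choice of the prior.

(Frequentism has no choice but to rely on asymptotic theory or on the symmetry of the problem.)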

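Reference: For Non-statisticians

Preface of RIMS Kôkyûroku 1834, "Theory and Applications of Statistical Inference in Quantum Theory" (「量子論における統計的推測の理論と応用」)

Points

- Points out some of the misconceptions about "statistics"

- Gives a comprehensive definition of "statistical inference on quantum systems"

Please help us prepare this year's Kôkyûroku as well!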

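Quantum Minimax Theorem in Statistical Decision Theory

Fuyuhiko Tanaka (田中冬彦)

Affiliation: Graduate School of Engineering Science, Osaka University

RIMS workshop "Statistical Inference on Quantum Systems and Its Geometric Structure", Kyoto

First version: November 10, 2014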

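0. Notation

Describing probability in a quantum system

Quantum mechanics = the theory of operators (operator algebras) on a Hilbert space $\mathcal{H}$

Quantum probability = described by a density operator and a measurement (its mathematical model)

Density operator $\rho$ → measurement device (POVM) $M(\mathrm{d}x)$ → outcome $x$ → statistical processing of the data $a(x)$

Probability distribution of the outcome: $x \sim \mathrm{Tr}\,\rho\,M(\mathrm{d}x)$

Density operator (density matrix, state)

$\mathrm{Tr}\,\rho = 1$, $\rho \ge 0$ (*)

When $\dim \mathcal{H} = n$:

$\rho = U\,\mathrm{diag}(p_1, \dots, p_n)\,U^{*}$, $p_j \ge 0$, $\sum_{j=1}^{n} p_j = 1$

*A density operator is the extension of a probability distribution to quantum mechanics.

$S(\mathcal{H}) := \{\rho : \text{density operators on the Hilbert space } \mathcal{H}\} = \{\rho \in L(\mathcal{H}) \mid \mathrm{Tr}\,\rho = 1,\ \rho \ge 0\}$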

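Measurement in a Quantum System (1/2)

Mathematical model of a measurement device = positive operator-valued measure (POVM)

POVM (definition for a finite sample space)

Sample space $\{x_1, x_2, \dots, x_k\}$ = the set of values the measurement device can output

POVM $\{M_j\}_{j=1,\dots,k}$ = a family of linear operators on $\mathcal{H}$ satisfying

$M_j := M(\{x_j\}) \ge O$ (positive operator; positive semidefinite matrix)

$\sum_j M_j = I$

Measurement in a Quantum System (2/2)

Construction of the probability space (axiom of quantum probability)

Sample space $\{x_1, x_2, \dots, x_k\}$

Density operator $\rho \in S(\mathcal{H})$

POVM (measurement device) $\{M_j\}_{j=1,\dots,k}$ on $\mathcal{H}$

Then the probability of obtaining $x_j$ as the measurement outcome is

$p_j := \mathrm{Tr}\,\rho\,M_j$

*By the definitions of a density operator ($\mathrm{Tr}\,\rho = 1$, $\rho \ge 0$) and of a POVM ($\sum_j M_j = I$, $M_j \ge O$), the $p_j$ satisfy the axioms of probability.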
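The rule $p_j = \mathrm{Tr}\,\rho\,M_j$ for a finite sample space can be tried out on a small example. This is a minimal sketch under stated assumptions: the qubit state `rho` and the two-outcome POVM `povm` are hypothetical values chosen for illustration and do not come from the slides.

```python
def trace_of_product(a, b):
    """Tr(ab) for square matrices stored as nested lists."""
    n = len(a)
    return sum(a[i][k] * b[k][i] for i in range(n) for k in range(n))

def born_probabilities(rho, povm):
    """p_j = Tr(rho M_j) for each POVM element M_j."""
    return [trace_of_product(rho, m) for m in povm]

# Hypothetical qubit state: trace 1 and positive semidefinite.
rho = [[0.75, 0.25],
       [0.25, 0.25]]

# Hypothetical two-outcome POVM with M_1 + M_2 = I (not a projective measurement).
povm = [
    [[0.8, 0.0], [0.0, 0.3]],
    [[0.2, 0.0], [0.0, 0.7]],
]

probs = born_probabilities(rho, povm)
print(probs)       # approximately [0.675, 0.325]
print(sum(probs))  # approximately 1.0
```

The probabilities are nonnegative and sum to one because the POVM elements are positive semidefinite and sum to the identity.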

1. Introduction

Quantum Bayesian Hypothesis Testing (1/2)

1. Alice chooses one letter (denoted by an integer) $i \in \{1, 2, \dots, k\}$ and sends the corresponding quantum state (described by a density operator) from $\{\rho_1, \rho_2, \dots, \rho_k\} \subset S(\mathcal{H})$.

2. Bob receives the quantum state, performs a measurement, and guesses the letter Alice actually sent him. The whole process is described by a POVM $\{M_j\}_{j=1,2,\dots,k}$.

3. The proportion of letters is given by a distribution $\pi_1, \dots, \pi_k \ge 0$, $\pi_1 + \dots + \pi_k = 1$ (called a prior distribution).

Quantum Bayesian Hypothesis Testing (2/2)

* The probability that Bob guesses $j$ when Alice sends $i$: $p(B = j \mid A = i) = \mathrm{Tr}\,\rho_i M_j$

* Bob's loss (estimation error) when he guesses $j$ while Alice sends $i$: $w(i, j)$

* Bob's risk for the $i$-th letter: $R_M(i) = \sum_{j=1}^{k} w(i, j)\,p(B = j \mid A = i)$

* Bob's average risk (his task is to minimize it by choosing a good POVM): $r_{\pi, M} = \sum_{i=1}^{k} \pi_i R_M(i)$

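Summary of Q-BHT

1. Alice chooses one letter $i \in \{1, 2, \dots, k\}$ and sends one quantum state from $\{\rho_1, \rho_2, \dots, \rho_k\} \subset S(\mathcal{H})$.

2. Bob guesses the letter by choosing a measurement (POVM) $\{M_j\}_{j=1,2,\dots,k}$.

Proportion of each letter (prior distribution): $\pi_1, \dots, \pi_k \ge 0$, $\pi_1 + \dots + \pi_k = 1$

Risk for the $i$-th letter: $R_M(i) = \sum_{j=1}^{k} w(i, j)\,\mathrm{Tr}\,\rho_i M_j$

Average risk: $r_{\pi, M} := \sum_{i=1}^{k} \pi_i R_M(i)$

Bayes POVM w.r.t. $\pi$ = POVM minimizing the average risk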

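Minimax POVM

1. Alice chooses one letter $i \in \{1, 2, \dots, k\}$ and sends one quantum state from $\{\rho_1, \rho_2, \dots, \rho_k\} \subset S(\mathcal{H})$.

2. Bob guesses the letter by choosing a measurement (POVM) $\{M_j\}_{j=1,2,\dots,k}$.

Bob has no prior information.

→ Minimize the worst-case risk

The worst-case risk: $r^{*}_{M} := \sup_{i} R_M(i)$

Minimax POVM = POVM minimizing the worst-case risk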

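Quantum Minimax Theorem (Simple Ver.)

1. Alice chooses one letter $i \in \{1, 2, \dots, k\}$ and sends one quantum state from $\{\rho_1, \rho_2, \dots, \rho_k\} \subset S(\mathcal{H})$.

2. Bob guesses the letter by choosing a measurement (POVM) $\{M_j\}_{j=1,2,\dots,k}$.

Theorem (Hirota and Ikehara, 1982)

$\inf_{M} \sup_{i} R_M(i) = \sup_{\pi} \inf_{M} \sum_{i=1}^{k} \pi_i R_M(i)$

The minimax POVM agrees with the Bayes POVM w.r.t. the worst-case prior.

*The worst-case prior for Bob is called a least favorable prior.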
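One direction of the minimax equality is elementary and holds for any family of POVMs; the content of the theorem is the reverse inequality. A short derivation of the easy direction, written as a sketch with $R_M(i)$ the risk for the $i$-th letter and $\pi = (\pi_1, \dots, \pi_k)$ ranging over priors on $\{1, \dots, k\}$:

```latex
% A weighted average never exceeds the maximum:
\sum_{i=1}^{k} \pi_i R_M(i) \;\le\; \sup_{i} R_M(i)
  \quad \text{for every POVM } M \text{ and every prior } \pi .

% Take the infimum over M on both sides:
\inf_{M} \sum_{i=1}^{k} \pi_i R_M(i) \;\le\; \inf_{M} \sup_{i} R_M(i)
  \quad \text{for every prior } \pi .

% The right-hand side no longer depends on \pi, so take the supremum over \pi:
\sup_{\pi} \inf_{M} \sum_{i=1}^{k} \pi_i R_M(i) \;\le\; \inf_{M} \sup_{i} R_M(i) .
```

A prior attaining the supremum on the left-hand side is exactly what the slides call a least favorable prior.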

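Example (1/2)

Quantum states

$\rho_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$, $\rho_2 = \begin{pmatrix} 1/2 & 1/2 & 0 \\ 1/2 & 1/2 & 0 \\ 0 & 0 & 0 \end{pmatrix}$, $\rho_3 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$

0-1 loss: $w(i, j) = 1 - \delta_{ij}$

Minimax POVM

$M_1 = \frac{1}{2}\begin{pmatrix} 1 + 1/\sqrt{2} & -1/\sqrt{2} & 0 \\ -1/\sqrt{2} & 1 - 1/\sqrt{2} & 0 \\ 0 & 0 & 0 \end{pmatrix}$, $M_2 = \frac{1}{2}\begin{pmatrix} 1 - 1/\sqrt{2} & 1/\sqrt{2} & 0 \\ 1/\sqrt{2} & 1 + 1/\sqrt{2} & 0 \\ 0 & 0 & 0 \end{pmatrix}$, $M_3 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$

*This is a counterexample to Theorem 2 in Hirota and Ikehara (1982), whose proof does not seem to be mathematically rigorous.

Example (2/2)

Quantum states: $\rho_1, \rho_2, \rho_3$ as in Example (1/2)

0-1 loss: $w(i, j) = 1 - \delta_{ij}$

LFP: $\pi_{LF}(1) = \pi_{LF}(2) = 1/2$, $\pi_{LF}(3) = 0$

The minimax POVM is constructed as a Bayes POVM w.r.t. the LFP.

Important Implication

When the state is completely unknown, it is not necessarily wise to look for an optimal POVM under the uniform prior.

(Although statisticians have known this fact for decades...)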
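The three-state example can be checked numerically. The sketch below assumes the states are $\rho_1 = \mathrm{diag}(1,0,0)$, $\rho_2$ = the projector onto $(1,1,0)/\sqrt{2}$, $\rho_3 = \mathrm{diag}(0,0,1)$, and that $M_1, M_2$ are the two projectors of the optimal two-state (Helstrom) measurement for $\rho_1$ versus $\rho_2$; the signs and square roots in $M_1, M_2$ are an assumption where the extracted slide is ambiguous. The helper names are hypothetical.

```python
import math

def trace_of_product(a, b):
    """Tr(ab) for square matrices stored as nested lists."""
    n = len(a)
    return sum(a[i][k] * b[k][i] for i in range(n) for k in range(n))

def risks_zero_one(states, povm):
    """0-1 loss: R_M(i) = 1 - Tr(rho_i M_i)."""
    return [1.0 - trace_of_product(rho, m) for rho, m in zip(states, povm)]

s = 1.0 / math.sqrt(2.0)
rho1 = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
rho2 = [[0.5, 0.5, 0], [0.5, 0.5, 0], [0, 0, 0]]
rho3 = [[0, 0, 0], [0, 0, 0], [0, 0, 1]]
states = [rho1, rho2, rho3]

# Assumed minimax POVM (M_1 + M_2 + M_3 = I).
m1 = [[(1 + s) / 2, -s / 2, 0], [-s / 2, (1 - s) / 2, 0], [0, 0, 0]]
m2 = [[(1 - s) / 2, s / 2, 0], [s / 2, (1 + s) / 2, 0], [0, 0, 0]]
m3 = [[0, 0, 0], [0, 0, 0], [0, 0, 1]]
povm = [m1, m2, m3]

# Completeness check: the elements sum to the identity.
total = [[m1[i][j] + m2[i][j] + m3[i][j] for j in range(3)] for i in range(3)]

r = risks_zero_one(states, povm)
worst_case_risk = max(r)

# Average risk of the same POVM under the least favorable prior (1/2, 1/2, 0).
lfp = [0.5, 0.5, 0.0]
average_risk_lfp = sum(p * x for p, x in zip(lfp, r))

print(r, worst_case_risk, average_risk_lfp)
```

Under these assumptions the risks for letters 1 and 2 coincide at $(2 - \sqrt{2})/4 \approx 0.146$ and the risk for letter 3 is zero, so the worst-case risk of this POVM equals its average risk under the prior $(1/2, 1/2, 0)$, which is the situation the minimax theorem describes.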

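Rewritten in Technical Terms

1. Nature

Parameter space $\Theta = \{1, 2, \dots, k\}$

Quantum statistical model $\{\rho(\theta)\}_{\theta \in \Theta} \subset S(\mathcal{H})$

2. Experimenter and Statistician

Decision space $U = \{1, 2, \dots, k\}$

POVM on $U$: $\{M_u\}_{u \in U}$

Loss function $w(\theta, u)$

Risk function $R_M(\theta) = \sum_{u \in U} w(\theta, u)\,\mathrm{Tr}\,\rho(\theta) M_u$

Theorem

$\inf_{M} \sup_{\theta} R_M(\theta) = \sup_{\pi} \inf_{M} \sum_{\theta \in \Theta} R_M(\theta)\,\pi(\theta)$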

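Main Result (Brief Summary)

Quantum Minimax Theorem

$\inf_{M \in Po(U)} \sup_{\theta \in \Theta} R_M(\theta) = \sup_{\pi \in P(\Theta)} \inf_{M \in Po(U)} \int_\Theta R_M(\theta)\,\pi(\mathrm{d}\theta)$

where $R_M(\theta) := \int_U w(\theta, u)\,\mathrm{Tr}\,\rho(\theta)\,M(\mathrm{d}u)$

cf) the classical minimax theorem: $\inf_{\delta \in D} \sup_{\theta \in \Theta} R_\delta(\theta) = \sup_{\pi \in P(\Theta)} \inf_{\delta \in D} \int_\Theta R_\delta(\theta)\,\pi(\mathrm{d}\theta)$

Conditions, assumptions, and the essence of the proof are explained.

Previous Works

Statistical Decision Theory: Wald (1950)

(Classical) minimax theorem in statistical decision theory: the first version is given by Wald (1950); Le Cam (1964)

Quantum Statistical Decision Theory: Holevo (1973)

Recent results and applications (many!): Hayashi (1998); Guta and Jencova (2007); Kumagai and Hayashi (2013)

Quantum minimax theorem in quantum statistical decision theory: we show this.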

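2. Quantum Statistical Models and Measurements

Formal Definition of Statistical Models

(Naive) quantum statistical model = a parametric family of density operators

Basic Idea (Holevo, 1976)

A quantum statistical model is defined by a map $\rho : \Theta \to L_1(\mathcal{H})$.

Ex.

$\rho(\theta) = \frac{1}{2}\begin{pmatrix} 1 + \theta_3 & \theta_1 - i\theta_2 \\ \theta_1 + i\theta_2 & 1 - \theta_3 \end{pmatrix}$, $\theta = (\theta_1, \theta_2, \theta_3) \in \mathbb{R}^3$, $0 \le (\theta_1)^2 + (\theta_2)^2 + (\theta_3)^2 \le 1$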
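The qubit example can be explored with a few lines of code. This is a minimal sketch assuming the standard Bloch-vector form $\rho(\theta) = \frac{1}{2}\begin{pmatrix} 1 + \theta_3 & \theta_1 - i\theta_2 \\ \theta_1 + i\theta_2 & 1 - \theta_3 \end{pmatrix}$; the function names and the sample parameter values are hypothetical.

```python
import math

def qubit_state(t1, t2, t3):
    """rho(theta) = (1/2) [[1 + t3, t1 - i t2], [t1 + i t2, 1 - t3]]."""
    return [[(1 + t3) / 2, complex(t1, -t2) / 2],
            [complex(t1, t2) / 2, (1 - t3) / 2]]

def eigenvalues_2x2_hermitian(rho):
    """Eigenvalues of a 2x2 Hermitian matrix from its trace and determinant."""
    tr = (rho[0][0] + rho[1][1]).real
    det = (rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]).real
    disc = math.sqrt(max(tr * tr - 4 * det, 0.0))
    return ((tr - disc) / 2, (tr + disc) / 2)

# Inside the unit ball: a valid density operator (trace 1, eigenvalues >= 0).
inside = eigenvalues_2x2_hermitian(qubit_state(0.3, 0.4, 0.5))

# Outside the unit ball: trace is still 1, but one eigenvalue is negative.
outside = eigenvalues_2x2_hermitian(qubit_state(1.0, 1.0, 0.0))

print(inside, outside)
```

The eigenvalues are $(1 \pm |\theta|)/2$, so the constraint $|\theta| \le 1$ is exactly the positivity condition $\rho(\theta) \ge 0$.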

)()()();,( srsrsr R

Next, we look at the conditions required of this map $\rho(\theta)$.

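Regular Operator-Valued Function (1/2)

$\Theta$: locally compact metric space

$L_1(\mathcal{H})$: trace-class operators on a Hilbert space $\mathcal{H}$

$T : \Theta \to L_1(\mathcal{H})$: an operator-valued function

Definition

For a map $T$, a compact set $K \subset \Theta$, and $\delta > 0$, we define

$K_\delta := \{(\theta, \eta) \in K \times K : d(\theta, \eta) \le \delta\}$

$\omega_T(K_\delta) := \inf\{\|X\|_1 : -X \le T(\theta) - T(\eta) \le X,\ \forall (\theta, \eta) \in K_\delta\}$

Definition

$T$ is regular ⇔ $\lim_{\delta \to 0} \omega_T(K_\delta) = 0$ for every compact set $K \subset \Theta$

Regular Operator-Valued Function (2/2)

Remark 1

If $T$ is regular, then it is uniformly continuous w.r.t. the trace norm on every compact set:

$\sup_{(\theta, \eta) \in K_\delta} \|T(\theta) - T(\eta)\|_1 \to 0$ as $\delta \to 0$

The converse does not hold if $\dim \mathcal{H} = \infty$.

Remark 2

The regularity assures that the following operator-valued integral is well-defined:

$\int_\Theta f(\theta)\,T(\theta)\,\pi(\mathrm{d}\theta)$ for every $f \in C_0(\Theta)$ and $\pi \in P(\Theta)$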

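Quantum Statistical Models

$S(\mathcal{H}) := \{X \in L_1(\mathcal{H}) : \mathrm{Tr}\,X = 1,\ X \ge 0\}$

Definition

$\rho : \Theta \to S(\mathcal{H})$ is called a quantum statistical model if it satisfies the conditions below.

Conditions

1. Identifiability (one-to-one map): $\rho(\theta_1) \ne \rho(\theta_2)$ whenever $\theta_1 \ne \theta_2$

2. Regularity: $\lim_{\delta \to 0} \omega_T(K_\delta) = 0$ with $T = \rho$, for every compact set $K \subset \Theta$

↑Necessary for our arguments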

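Measurements (POVMs)

$U$: decision space (locally compact space)

$\mathcal{A}(U)$: Borel sets (the smallest sigma algebra containing all open sets)

$Po(U)$: all POVMs over $U$

Definition

A positive-operator-valued function $M$ on $\mathcal{A}(U)$ is called a POVM if it satisfies

$M(B) \ge 0$ for every $B \in \mathcal{A}(U)$

$M\left(\bigcup_{j=1}^{\infty} B_j\right) = \sum_{j=1}^{\infty} M(B_j)$ for $B_1, B_2, \dots \in \mathcal{A}(U)$ with $B_i \cap B_j = \emptyset$ ($i \ne j$)

$M(U) = I$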

Born Rule

Axiom of Quantum Mechanics

For the quantum system described by $\rho(\theta)$ and a measurement described by a POVM $M(\mathrm{d}x)$, the measurement outcome is distributed according to

$x \sim \mathrm{Tr}\,\rho(\theta)\,M(\mathrm{d}x)$

$\rho(\theta)$ → POVM $M(\mathrm{d}x)$ → $x$

3. Quantum Statistical Decision Problems

Basic Setting

Situation

For the quantum system specified by the unknown parameter $\theta$, experimenters extract information for some purpose.

Typical sequence of tasks

1. Choose a measurement (POVM).

2. Perform the measurement on the quantum system.

3. Take an action $a(x)$ among the available choices based on the measurement outcome $x$. (The map $x \mapsto a(x)$ is formally called a decision function.)

$\rho(\theta)$ → POVM $M(\mathrm{d}x)$ → $x$ → $a(x)$

Decision Functions

Example

- Estimate the unknown parameter: $a(x) = \frac{x_1 + \dots + x_n}{n}$

- Construct a confidence region (credible region): $[a_L(x), a_R(x)]$

- Validate the entanglement/separability: $a(x) \in \{0, 1\}$

- Estimate the unknown density operator rather than the parameter: $\begin{pmatrix} a_1(x) & a_2(x) & 0 \\ a_2^{*}(x) & a_3(x) & 0 \\ 0 & 0 & 1 - a_1(x) - a_3(x) \end{pmatrix}$

Remarks for Non-statisticians

$\rho(\theta)$ → POVM $M(\mathrm{d}x)$ → $x$ → $a(x)$

Remarks

1. If the quantum state is completely known, then the distribution of the measurement outcome is also known and the best action can be chosen.

2. The action has to be chosen from finite data.

3. Precise estimation of the parameter is only one typical example.

Loss Functions

The performance of decision functions is compared by adopting a loss function (smaller is better).

Ex: Parameter Estimation

Quantum statistical model: $\{\rho(\theta) : \theta \in \Theta \subset \mathbb{R}^m\}$

Action space: $\Theta$

Loss function (squared error): $w(\theta, a) = \|\theta - a\|^2$

Outcome $x$ → action $a = a(x)$

Later we see the formal definition of the loss function.

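Measurements Over Decisions

Basic Idea

Put together the POVM and the decision function:

$\rho(\theta)$ → POVM $M(\mathrm{d}x)$ → $x$ → $a(x)$

becomes

$\rho(\theta)$ → POVM $N(\mathrm{d}a)$ → $a$

From the beginning, we only consider POVMs over the action space.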
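For a finite sample space, putting a POVM and a decision function together amounts to summing the POVM elements over all outcomes that lead to the same action, $N_a = \sum_{x : a(x) = a} M_x$. The sketch below illustrates this; the function name `push_forward`, the four-outcome POVM, and the decision function are hypothetical choices, not taken from the slides.

```python
def push_forward(povm, decision):
    """Build the POVM over actions: N_a = sum of M_x over outcomes x with decision(x) == a.

    povm: dict mapping each outcome x to a square matrix (nested lists).
    decision: function mapping an outcome x to an action a.
    """
    n = len(next(iter(povm.values())))
    induced = {}
    for x, m in povm.items():
        a = decision(x)
        acc = induced.setdefault(a, [[0.0] * n for _ in range(n)])
        for i in range(n):
            for j in range(n):
                acc[i][j] += m[i][j]
    return induced

# Hypothetical four-outcome POVM on a qubit (the elements sum to the identity).
povm = {
    0: [[0.5, 0.0], [0.0, 0.0]],
    1: [[0.5, 0.0], [0.0, 0.0]],
    2: [[0.0, 0.0], [0.0, 0.5]],
    3: [[0.0, 0.0], [0.0, 0.5]],
}

# Hypothetical decision function: outcomes 0, 1 give action 0; outcomes 2, 3 give action 1.
induced = push_forward(povm, lambda x: x // 2)
print(induced)
```

The induced family is again a POVM, now indexed by actions, which is why the later definitions work with POVMs over the decision space directly.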

Quantum Statistical Decision Problems

Formal Definitions

$\Theta$: parameter space (locally compact space)

$\rho(\theta)$: quantum statistical model

$U$: decision space* (locally compact space)

$w : \Theta \times U \to \mathbb{R} \cup \{+\infty\}$: loss function (lower semicontinuous**; bounded below)

The triplet $(\rho(\theta), U, w)$ is called a quantum statistical decision problem.

cf) statistical decision problem $(p_\theta, U, w)$

$p_\theta(\mathrm{d}x)$: statistical model

$w : \Theta \times U \to \mathbb{R} \cup \{+\infty\}$: loss function

$U$: decision space

*We follow Holevo's terminology instead of saying "action space".

**We impose a slightly stronger condition on the loss than Le Cam and Holevo.

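4. Risk Functions and Optimality Criteria

Comparison of POVMs

The loss (e.g., squared error) depends on both the unknown parameter $\theta$ and our decision $u$, which is a random variable ($u \sim \mathrm{Tr}\,\rho(\theta)\,M(\mathrm{d}u)$).

In order to compare two POVMs, we focus on the average of the loss w.r.t. the distribution of $u$.

$\rho(\theta)$ → POVM $M(\mathrm{d}u)$ → $u$: $\mathrm{E}_\theta[w(\theta, u)]$

$\rho(\theta)$ → POVM $N(\mathrm{d}u')$ → $u'$: $\mathrm{E}'_\theta[w(\theta, u')]$

The two POVMs are compared at the same $\theta$.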

Risk Functions

Definition

The risk function of $M \in Po(U)$:

$R_M(\theta) := \int_U w(\theta, u)\,\mathrm{Tr}\,\rho(\theta)\,M(\mathrm{d}u) = \mathrm{E}_\theta[w(\theta, u)]$, where $u \sim \mathrm{Tr}\,\rho(\theta)\,M(\mathrm{d}u)$

Remarks for Non-statisticians

1. Smaller risk is better.

2. Generally there exists no POVM achieving the uniformly smallest risk among POVMs.

Since $R_M(\theta)$ depends on the unknown parameter $\theta$, we need additional optimality criteria.

Optimality Criteria (1/2)

Definition

A POVM $M_{*}$ is said to be minimax if it minimizes the supremum of the risk function, i.e.,

$\sup_{\theta \in \Theta} R_{M_{*}}(\theta) = \inf_{M \in Po(U)} \sup_{\theta \in \Theta} R_M(\theta)$

Historical Remark

Bogomolov showed the existence of a minimax POVM in a more general framework (Bogomolov, 1981).

(His description of quantum states and measurements differs from the current one.)

Optimality Criteria (2/2)

$P(\Theta)$: all probability distributions on $\Theta$ (defined on Borel sets)

Definition

The average risk of $M \in Po(U)$ w.r.t. $\pi \in P(\Theta)$:

$r_{\pi, M} := \int_\Theta R_M(\theta)\,\pi(\mathrm{d}\theta)$

A POVM $M_\pi$ is said to be Bayes if it minimizes the average risk, i.e.,

$r_{\pi, M_\pi} = \inf_{M \in Po(U)} r_{\pi, M}$

In Bayesian statistics, $\pi \in P(\Theta)$ is called a prior distribution.

Historical Remark

Holevo showed the existence of a Bayes POVM in a more general framework (Holevo, 1976; see also Ozawa, 1980).

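Ex. of Loss Functions and Risk Functions

Parameter Estimation

Quantum statistical model: $\{\rho(\theta) : \theta \in \Theta \subset \mathbb{R}^m\}$

$U = \Theta$, $w(\theta, u) = \|\theta - u\|^2$

$R_M(\theta) = \int_U \|\theta - u\|^2\,\mathrm{Tr}\,\rho(\theta)\,M(\mathrm{d}u) = \mathrm{E}_\theta[\|\theta - u\|^2]$

Construction of Predictive Density Operator

Quantum statistical model: $\{\rho(\theta)^{\otimes n} : \theta \in \Theta \subset \mathbb{R}^p\}$

$U = S(\mathcal{H}^{\otimes m})$, $w(\theta, u) = D(\rho(\theta)^{\otimes m} \,\|\, u)$

$R_M(\theta) = \int_U D(\rho(\theta)^{\otimes m} \,\|\, u)\,\mathrm{Tr}\,\rho(\theta)^{\otimes n}\,M(\mathrm{d}u)$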
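For the parameter-estimation case, the squared-error risk can be computed explicitly in a toy model. The sketch below is an illustration under stated assumptions, not an example from the slides: the one-parameter qubit model $\rho(\theta) = \mathrm{diag}((1+\theta)/2, (1-\theta)/2)$ with $\theta \in [-1, 1]$, a projective POVM whose two outcomes are reported as the decisions $u = \pm 1$, and a trivial POVM that always decides $u = 0$.

```python
def outcome_probabilities(theta):
    """Tr rho(theta) M_u for u = +1, -1 under the assumed model and projective POVM."""
    return {+1.0: (1.0 + theta) / 2.0, -1.0: (1.0 - theta) / 2.0}

def risk(theta, decision_probabilities):
    """R_M(theta) = sum over u of (theta - u)^2 * P_theta(u), for a finite decision set."""
    return sum(p * (theta - u) ** 2 for u, p in decision_probabilities.items())

def risk_measure(theta):
    """Risk of the assumed projective POVM with decisions u = +1 or -1."""
    return risk(theta, outcome_probabilities(theta))

def risk_constant_zero(theta):
    """Risk of the trivial POVM M({0}) = I, which always decides u = 0."""
    return risk(theta, {0.0: 1.0})

for theta in (0.0, 0.5, 0.9):
    print(theta, risk_measure(theta), risk_constant_zero(theta))
```

In this toy model the first risk function equals $1 - \theta^2$ and the second equals $\theta^2$, so neither POVM has uniformly smaller risk than the other. This is the kind of situation that motivates optimality criteria such as the minimax and Bayes criteria.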

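5. Main Result

Main Result (Quantum Minimax Theorem)

Theorem (*)

Let $\Theta$ and $U$ be compact metric spaces. Then the following equality holds for every quantum statistical decision problem:

$\inf_{M \in Po(U)} \sup_{\theta \in \Theta} R_M(\theta) = \sup_{\pi \in P(\Theta)} \inf_{M \in Po(U)} \int_\Theta R_M(\theta)\,\pi(\mathrm{d}\theta)$

where the risk function is given by $R_M(\theta) := \int_U w(\theta, u)\,\mathrm{Tr}\,\rho(\theta)\,M(\mathrm{d}u)$.

Corollary

For every closed convex subset $Q \subset Po(U)$ the above assertion holds.

*See arXiv:1410.3639 [quant-ph] for the proof.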

Key Lemmas for the Theorem

Compactness of the POVMs (Holevo, 1976)

If the decision space $U$ is compact, then $Po(U)$ is also compact.

Equicontinuity of risk functions (Lemma by FT)

Loss function $w : \Theta \times U \to \mathbb{R}$

If $w$ is bounded and continuous, then $F := \{R_M : M \in Po(U)\} \subset C(\Theta)$ is (uniformly) equicontinuous.

The equicontinuity implies the compactness of $F \subset C(\Theta)$ under the uniform convergence topology.

We show the main theorem by using Le Cam's result together with both lemmas. However, it is not a direct consequence of those earlier results.

6. Minimax POVMs and Least Favorable Priors

Previous Works (LFPs)

Wald (1950): Statistical decision theory

Jeffreys (1961): Jeffreys prior

Bernardo (1979, 2005): Reference prior; reference analysis

Komaki (2011): Latent information prior

Tanaka (2012): MDP prior (reference prior for pure-states model)

Objective priors in Bayesian analysis are indeed least favorable priors.

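Main Result (Existence Theorem of LFP)

Definition

If a prior $\pi_{LF}$ achieves the supremum, i.e., the following holds,

$\inf_{M \in Po(U)} \int_\Theta R_M(\theta)\,\pi_{LF}(\mathrm{d}\theta) = \sup_{\pi \in P(\Theta)} \inf_{M \in Po(U)} \int_\Theta R_M(\theta)\,\pi(\mathrm{d}\theta)$

where $R_M(\theta) := \int_U w(\theta, u)\,\mathrm{Tr}\,\rho(\theta)\,M(\mathrm{d}u)$, then the prior $\pi_{LF}$ is called a least favorable prior (LFP).

Theorem

Let $\Theta$ and $U$ be compact metric spaces and let $w$ be a continuous loss function. Then, for every decision problem, there exists an LFP.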

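Pathological Example

Remark

Even on a compact space, a bounded lower semicontinuous (and not continuous) loss function does not necessarily admit an LFP.

Example

$\Theta = [0, 1]$, $R_M(\theta) = L(\theta, u)$

(Figure: graph of $R(\theta)$ over $\theta \in [0, 1]$, vertical scale from 0 to 1.)

$\sup_{\theta \in [0, 1]} R_M(\theta) = \sup_{\pi} \int_{[0, 1]} R_M(\theta)\,\pi(\mathrm{d}\theta) = 1$

But for every prior $\pi$, $\int_{[0, 1]} R_M(\theta)\,\pi(\mathrm{d}\theta) < 1$.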
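The mechanism behind this pathology can be illustrated with a concrete function. The sketch below assumes a risk of the form $R(\theta) = \theta$ for $\theta < 1$ and $R(1) = 0$, which is bounded and lower semicontinuous but not continuous; this specific function is an assumption made for illustration, since the exact function on the slide is not recoverable. Priors are restricted to finitely supported ones for simplicity.

```python
def risk(theta):
    """Assumed risk: theta on [0, 1), dropping to 0 at theta = 1."""
    return theta if theta < 1.0 else 0.0

def average_risk(prior):
    """Average risk under a finitely supported prior given as {theta: probability}."""
    return sum(p * risk(t) for t, p in prior.items())

# Point masses approaching theta = 1: the average risk approaches 1 from below.
point_mass_values = [average_risk({1.0 - 1.0 / n: 1.0}) for n in (2, 10, 100, 1000)]

# A mixed prior, including mass at theta = 1 itself, also stays below 1.
mixed = average_risk({0.5: 0.25, 0.999: 0.5, 1.0: 0.25})

print(point_mass_values, mixed)
```

The supremum of the average risk is 1, but no prior attains it, because the only candidate point $\theta = 1$ has risk 0. Continuity of the loss rules out this kind of jump.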

Minimax POVM is constructed using LFP

Corollary

Let $\Theta$ and $U$ be compact metric spaces. If an LFP exists for a quantum statistical decision problem, every minimax POVM is a Bayes POVM with respect to the LFP.

In particular, if the Bayes POVM w.r.t. the LFP is unique, then it is minimax.

Corollary 2

For every closed convex subset $Q \subset Po(U)$ the above assertion holds.

Many theoretical results are derived from the quantum minimax theorem.

7. Summary

Discussion

1. We show the quantum minimax theorem, which gives a theoretical basis for quantum statistical decision theory, in particular for objective Bayesian analysis.

2. For every closed convex subset of POVMs, all of the assertions still hold. Our result thus also has practical meaning for experimenters.

Future Works

- Derive other decision-theoretic results, and rederive previously known results, by using the quantum minimax theorem

- Propose a practical algorithm for finding a least favorable prior

- Define a geometrical structure in the quantum statistical decision problem
