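RIMS Workshop
November 10-12, 2014
Fuyuhiko Tanaka (田中冬彦)
Statistical Inference for Quantum Systems and Its Geometric Structure
Organizer
Graduate School of Engineering Science, Osaka University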
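Preface: What Is Bayesian Statistics? (briefly, in three slides)
To begin with: the mission of statistics
Statistics is a discipline that responds to the needs of society
→ The questions it asks change with the times
(This is the part that people in other fields rarely know)
The mission of statistics
To provide a suitable method according to the statistical model, the purpose of inference (point estimation, prediction, testing, model selection, etc.), the loss function, and so on
Difference from pure mathematics
* Evaluating the estimation error is secondary.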
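What is Bayesian statistics?
Statistics built around the prior distribution
Prior distribution = a probability distribution introduced on the unknown parameter* (its meaning is not questioned)
* Some people insist on a particular interpretation
Relation to frequentism (traditional statistics)
Bayesian statistics
- Bayesian statistics greatly widens the framework of traditional statistics and meets a broad range of modern needs
(It is not an either-or choice between frequentism and Bayes.)
- A frequentist result (e.g., maximum likelihood estimation) amounts to stubbornly using one particular prior
(This does not always work well in applications; there is no dependence on the loss function, etc.)
- Frequentist results and asymptotic theory are the starting point of a statistical analysis (not the goal)
* For details see, e.g., Robert, The Bayesian Choice, Chap. 11
A purely formal prior may also be used (noninformative prior)
In specific cases the frequentist and Bayesian methods coincide
Ex.: maximum likelihood estimation (frequentist) = MAP estimation under a uniform prior (Bayesian)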
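The coincidence of maximum likelihood and MAP estimation under a uniform prior can be checked numerically. The following is a minimal sketch, assuming a Bernoulli model with made-up data (7 successes in 10 trials) and a grid search; none of these specifics come from the slides.

```python
import numpy as np

# Hypothetical data (illustrative only): 7 successes in 10 Bernoulli trials.
successes, trials = 7, 10

# Candidate parameter values on a grid with spacing 0.001 inside (0, 1).
theta = np.linspace(0.001, 0.999, 999)

# Log-likelihood of the Bernoulli model.
log_lik = successes * np.log(theta) + (trials - successes) * np.log(1 - theta)

# A uniform prior contributes the same constant (here 0) at every grid point,
# so the unnormalized log-posterior differs from the log-likelihood by a constant.
log_prior = np.zeros_like(theta)
log_post = log_lik + log_prior

mle = theta[np.argmax(log_lik)]           # frequentist estimate
map_uniform = theta[np.argmax(log_post)]  # Bayesian MAP estimate, uniform prior
print(mle, map_uniform)
```

With any non-constant `log_prior` the two maximizers generally differ, which is the sense in which the frequentist answer corresponds to one particular prior.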
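Misconceptions about Bayesian statistics
Bayesian statistics cannot be used in scientific experiments, which ought to be objective?
The choice of noninformative prior is an unsolved problem?
This is not a problem of Bayesian statistics; it is a problem that small-sample statistics has had all along
The choice of a good inference method is simply folded into the choice of prior
(Frequentism has no option but to rely on asymptotic theory or on the symmetry of the problem)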
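Reference: for non-statisticians
RIMS Kôkyûroku (数理解析研究所講究録) 1834
Preface to "Theory and Applications of Statistical Inference in Quantum Theory"
Key points
- Points out some of the misconceptions about "statistics"
- Gives a comprehensive definition of "statistical inference for quantum systems"
We ask for your cooperation in preparing the Kôkyûroku again this year!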
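RIMS "Statistical Inference for Quantum Systems and Its Geometric Structure" @ Kyoto
November 10, 2014, first version
Fuyuhiko Tanaka (田中冬彦)
Quantum Minimax Theorem in Statistical Decision Theory
Affiliation: Graduate School of Engineering Science, Osaka University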
0. Notation
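Quantum mechanics = theory of operators (operator algebras) on a Hilbert space H
Description of probability in quantum systems
Quantum probability = described by density operators and (mathematical models of) measurements
Density operator $\rho$ → measurement device (POVM) $M(\mathrm{d}x)$ → outcome $x \sim \mathrm{Tr}\,\rho\,M(\mathrm{d}x)$ (probability distribution) → statistical processing of the data, $x \mapsto a(x)$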
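Density operator (density matrix, state)
$S(H) := \{\rho : \rho \text{ is a density operator on the Hilbert space } H\} = \{\rho \in L(H) \mid \mathrm{Tr}\,\rho = 1,\ \rho \ge 0\}$ (*)
When $\dim H = n$:
$\rho = U\,\mathrm{diag}(p_1, \dots, p_n)\,U^*$, $p_i \ge 0$, $\mathrm{Tr}\,\rho = \sum_{j=1}^{n} p_j = 1$
* A density operator is the extension of a probability distribution to quantum mechanics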
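Measurement in Quantum Systems (1/2)
Mathematical model of a measurement device
= positive operator valued measure (POVM)
POVM (definition for a finite sample space)
$\Omega = \{x_1, x_2, \dots, x_k\}$ = set of values the measurement device can take (sample space)
POVM $\{M_j\}_{j=1,\dots,k}$ = a family of linear operators on H satisfying
$M_j := M(\{x_j\}) \ge O$ (positive operator; positive semidefinite matrix)
$\sum_j M_j = I$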
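Measurement in Quantum Systems (2/2)
Construction of the probability space (axiom of quantum probability)
Sample space $\Omega = \{x_1, x_2, \dots, x_k\}$
Density operator $\rho \in S(H)$
POVM (measurement device) on H: $\{M_j\}_{j=1,\dots,k}$
Then the probability of obtaining $x_j$ as the measurement result is given by
$p_j := \mathrm{Tr}\,\rho M_j$
* By the definitions of a density operator ($\mathrm{Tr}\,\rho = 1$ and $\rho \ge 0$) and of a POVM ($M_j \ge O$, $\sum_j M_j = I$), the $p_j$ satisfy the axioms of probability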
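The rule $p_j = \mathrm{Tr}\,\rho M_j$ can be tried out numerically. The sketch below assumes a hypothetical qubit state and a three-outcome "trine" POVM chosen only for illustration; neither appears in the slides.

```python
import numpy as np

# Hypothetical qubit state rho = U diag(p1, p2) U^*, with U a rotation matrix.
p = np.array([0.8, 0.2])
angle = np.pi / 6
U = np.array([[np.cos(angle), -np.sin(angle)],
              [np.sin(angle),  np.cos(angle)]])
rho = U @ np.diag(p) @ U.conj().T

# Hypothetical three-outcome POVM: M_j = (2/3)|psi_j><psi_j| with real unit
# vectors psi_j at angles 0, 120 and 240 degrees.
povm = []
for j in range(3):
    phi = 2 * np.pi * j / 3
    psi = np.array([np.cos(phi), np.sin(phi)])
    povm.append((2.0 / 3.0) * np.outer(psi, psi))

# Outcome probabilities p_j = Tr(rho M_j).
probs = [np.trace(rho @ M).real for M in povm]
print(probs, sum(probs))
```

Because the POVM elements are positive semidefinite and sum to the identity, the printed values are nonnegative and add up to one for any density matrix `rho`.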
1. Introduction
Quantum Bayesian Hypothesis Testing (1/2)
1. Alice chooses one letter (denoted by an integer $\theta \in \Theta = \{1, 2, \dots, k\}$) and sends the corresponding quantum state (described by a density operator) from $\{\rho_1, \rho_2, \dots, \rho_k\} \subset S(H)$.
2. Bob receives the quantum state, performs a measurement, and guesses the letter Alice actually sent. The whole process is described by a POVM $\{M_j\}_{j=1,2,\dots,k}$.
3. The proportion of letters is given by a distribution $\pi_1 + \dots + \pi_k = 1$, $\pi_1, \dots, \pi_k \ge 0$ (called a prior distribution).
Quantum Bayesian Hypothesis Testing (2/2)
* The probability that Bob guesses j when Alice sends i: $p(B = j \mid A = i) = \mathrm{Tr}\,\rho_i M_j$
* Bob's loss (estimation error) when he guesses j while Alice sends i: $w(i, j)$
* Bob's risk for the i-th letter: $R_M(i) = \sum_{j=1}^{k} w(i, j)\, p(B = j \mid A = i)$
* Bob's average risk (his task is to minimize it by using a good POVM): $\sum_{i=1}^{k} \pi_i R_M(i)$
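The four quantities listed above (guess probability, loss, risk, average risk) translate directly into a few lines of NumPy. This is a minimal sketch; the function names `risks` and `average_risk`, the two qubit states, and the prior are hypothetical choices made for illustration.

```python
import numpy as np

def risks(states, povm, loss):
    """Return R_M(i) = sum_j w(i, j) Tr(rho_i M_j) for every letter i."""
    probs = np.array([[np.trace(rho @ M).real for M in povm] for rho in states])
    return (loss * probs).sum(axis=1)

def average_risk(states, povm, loss, prior):
    """Return sum_i pi_i R_M(i)."""
    return float(np.dot(prior, risks(states, povm, loss)))

# Hypothetical two-letter problem: Alice sends |0> or |+>.
ket0 = np.array([1.0, 0.0])
plus = np.array([1.0, 1.0]) / np.sqrt(2)
states = [np.outer(ket0, ket0), np.outer(plus, plus)]

# Bob measures in the computational basis and guesses the outcome index.
povm = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]

loss = 1.0 - np.eye(2)          # 0-1 loss w(i, j) = 1 - delta_ij
prior = np.array([0.7, 0.3])    # hypothetical proportions of the two letters

R = risks(states, povm, loss)
r = average_risk(states, povm, loss, prior)
print(R, r)
```

Here Bob never errs on the first letter and errs half the time on the second, so the average risk is just the prior weight of the second letter times one half.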
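Summary of Q-BHT
1. Alice chooses one letter $\theta \in \Theta = \{1, 2, \dots, k\}$ and sends one quantum state from $\{\rho_1, \rho_2, \dots, \rho_k\} \subset S(H)$.
Proportion of each letter (prior distribution): $\pi_1 + \dots + \pi_k = 1$, $\pi_i \ge 0$
2. Bob guesses the letter by choosing a measurement (POVM) $\{M_j\}_{j=1,2,\dots,k}$.
Risk: $R_M(i) = \sum_{j=1}^{k} w(i, j)\, \mathrm{Tr}\,\rho_i M_j$
Average risk: $r_{M,\pi} := \sum_{i=1}^{k} \pi_i R_M(i)$
Bayes POVM w.r.t. $\pi$ = POVM minimizing the average risk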
Minimax POVM
1. Alice chooses one letter $\theta \in \Theta = \{1, 2, \dots, k\}$ and sends one quantum state from $\{\rho_1, \rho_2, \dots, \rho_k\} \subset S(H)$.
Bob has no prior information.
2. Bob guesses the letter by choosing a measurement (POVM) $\{M_j\}_{j=1,2,\dots,k}$.
Minimize the worst-case risk
The worst-case risk: $r_M^* := \sup_i R_M(i)$
Minimax POVM = POVM minimizing the worst-case risk
Quantum Minimax Theorem (Simple Ver.)
1. Alice chooses one letter $\theta \in \Theta = \{1, 2, \dots, k\}$ and sends one quantum state from $\{\rho_1, \rho_2, \dots, \rho_k\} \subset S(H)$.
2. Bob guesses the letter by choosing a measurement (POVM) $\{M_j\}_{j=1,2,\dots,k}$.
Theorem (Hirota and Ikehara, 1982)
$$\inf_{M} \sup_{i} R_M(i) = \sup_{\pi} \inf_{M} \sum_{i=1}^{k} \pi_i R_M(i)$$
The minimax POVM agrees with the Bayes POVM w.r.t. the worst-case prior.
* The worst-case prior for Bob is called a least favorable prior.
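Example (1/2)
Quantum states
$$\rho_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},\quad \rho_2 = \begin{pmatrix} 1/2 & 1/2 & 0 \\ 1/2 & 1/2 & 0 \\ 0 & 0 & 0 \end{pmatrix},\quad \rho_3 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
0-1 loss: $w(i, j) = 1 - \delta_{ij}$
Minimax POVM
$$M_1 = \frac{1}{2}\begin{pmatrix} 1 + 1/\sqrt{2} & -1/\sqrt{2} & 0 \\ -1/\sqrt{2} & 1 - 1/\sqrt{2} & 0 \\ 0 & 0 & 0 \end{pmatrix},\quad M_2 = \frac{1}{2}\begin{pmatrix} 1 - 1/\sqrt{2} & 1/\sqrt{2} & 0 \\ 1/\sqrt{2} & 1 + 1/\sqrt{2} & 0 \\ 0 & 0 & 0 \end{pmatrix},\quad M_3 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
* This is a counterexample to Theorem 2 in Hirota and Ikehara (1982), whose proof does not seem to be mathematically rigorous.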
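Example (2/2)
Quantum states
$$\rho_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},\quad \rho_2 = \begin{pmatrix} 1/2 & 1/2 & 0 \\ 1/2 & 1/2 & 0 \\ 0 & 0 & 0 \end{pmatrix},\quad \rho_3 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
0-1 loss: $w(i, j) = 1 - \delta_{ij}$
LFP: $\pi_{LF}(1) = \pi_{LF}(2) = 1/2$, $\pi_{LF}(3) = 0$
The minimax POVM is constructed as a Bayes POVM w.r.t. the LFP.
Important Implication
When the state is completely unknown, it is not necessarily wise to find an optimal POVM with the uniform prior.
(Statisticians have known this fact for decades, though...)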
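The example can be checked numerically. The sketch below uses the three states of the example and the 0-1 loss; as an assumption not stated on the slide, it builds the POVM from the spectral projectors of $\rho_1 - \rho_2$ (the Helstrom measurement for two equally weighted states) and compares the worst-case risk with the average risk under the LFP and under the uniform prior.

```python
import numpy as np

# States of the example.
rho1 = np.diag([1.0, 0.0, 0.0])
rho2 = np.array([[0.5, 0.5, 0.0],
                 [0.5, 0.5, 0.0],
                 [0.0, 0.0, 0.0]])
rho3 = np.diag([0.0, 0.0, 1.0])
states = [rho1, rho2, rho3]

# Assumption: take the spectral projectors of rho1 - rho2 (Helstrom measurement).
# eigh returns eigenvalues in ascending order: -1/sqrt(2), 0, +1/sqrt(2).
eigvals, eigvecs = np.linalg.eigh(rho1 - rho2)
v_neg, v_pos = eigvecs[:, 0], eigvecs[:, -1]
M1 = np.outer(v_pos, v_pos)   # guess letter 1
M2 = np.outer(v_neg, v_neg)   # guess letter 2
M3 = np.eye(3) - M1 - M2      # guess letter 3
povm = [M1, M2, M3]

# 0-1 loss and risk R_M(i) = sum_j w(i, j) Tr(rho_i M_j).
loss = 1.0 - np.eye(3)
probs = np.array([[np.trace(r @ M).real for M in povm] for r in states])
risk = (loss * probs).sum(axis=1)

lfp = np.array([0.5, 0.5, 0.0])
uniform = np.full(3, 1.0 / 3.0)
print("risk per letter:      ", risk)
print("worst-case risk:      ", risk.max())
print("average risk (LFP):   ", lfp @ risk)
print("average risk (uniform):", uniform @ risk)
```

The worst-case risk and the average risk under the LFP coincide, while the average under the uniform prior is strictly smaller, so the uniform prior is not the worst case for Bob in this example.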
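Rewritten in Technical Terms
1. Nature
Parameter space: $\Theta = \{1, 2, \dots, k\}$
Quantum statistical model: $\{\rho(\theta)\} \subset S(H)$
2. Experimenter and Statistician
Decision space: $U = \{1, 2, \dots, k\}$
POVM on U: $\{M_u\}_{u \in U}$
Loss function: $w(\theta, u)$
Risk function: $R_M(\theta) = \sum_{u \in U} w(\theta, u)\, \mathrm{Tr}\,\rho(\theta) M_u$
Theorem
$$\inf_{M} \sup_{\theta} R_M(\theta) = \sup_{\pi} \inf_{M} \sum_{\theta} R_M(\theta)\, \pi(\theta)$$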
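Main Result (Brief Summary)
Quantum Minimax Theorem
$$\inf_{M \in Po(U)} \sup_{\theta \in \Theta} R_M(\theta) = \sup_{\pi \in P(\Theta)} \inf_{M \in Po(U)} \int_\Theta R_M(\theta)\, \pi(\mathrm{d}\theta)$$
where $R_M(\theta) := \int_U w(\theta, u)\, \mathrm{Tr}\,\rho(\theta) M(\mathrm{d}u)$
The conditions, assumptions, and essence of the proof are explained.
cf) the classical counterpart:
$$\inf_{\delta \in D} \sup_{\theta \in \Theta} R_\delta(\theta) = \sup_{\pi \in P(\Theta)} \inf_{\delta \in D} \int_\Theta R_\delta(\theta)\, \pi(\mathrm{d}\theta)$$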
Previous Works
Statistical Decision Theory: Wald (1950)
Quantum Statistical Decision Theory: Holevo (1973)
(Classical) Minimax Theorem in statistical decision theory: the first version is given by Wald (1950); Le Cam (1964)
Quantum Minimax Theorem in quantum statistical decision theory: we show
Recent results and applications (many!): Hayashi (1998), Guta and Jencova (2007), Kumagai and Hayashi (2013)
2. Quantum Statistical Models and Measurements
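Formal Definition of Statistical Models
(Naive) Quantum statistical model = a parametric family of density operators
Basic Idea (Holevo, 1976)
A quantum statistical model is defined by a map $\rho: \Theta \to L_1(H)$
Ex.
$$\rho(\theta) = \frac{1}{2}\begin{pmatrix} 1 + \theta_3 & \theta_1 - i\theta_2 \\ \theta_1 + i\theta_2 & 1 - \theta_3 \end{pmatrix},\quad \theta = (\theta_1, \theta_2, \theta_3) \in \mathbb{R}^3,\quad 0 \le \theta_1^2 + \theta_2^2 + \theta_3^2 \le 1$$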
)()()();,( srsrsr R
)(
Next, we see the required conditions for this map
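Regular Operator-Valued Function (1/2)
$\Theta$: locally compact metric space
$L_1(H)$: trace-class operators on a Hilbert space $H$
Definition
For a map $T: \Theta \to L_1(H)$, a compact set $K \subset \Theta$, and $\delta > 0$, we define
$$K_\delta := \{(\theta, \eta) \in K \times K : d(\theta, \eta) < \delta\}$$
$$\omega_T(K_\delta) := \inf\{\|X\|_1 : -X \le T(\theta) - T(\eta) \le X,\ \forall (\theta, \eta) \in K_\delta\}$$
Definition
An operator-valued function $T$ is regular $\iff$ $\lim_{\delta \to 0} \omega_T(K_\delta) = 0$ for every compact set $K$
Regular Operator-Valued Function (2/2)
Remark 1
$T$ is regular $\Rightarrow$ $T$ is uniformly continuous w.r.t. the trace norm on every compact set:
$$\sup_{(\theta, \eta) \in K_\delta} \|T(\theta) - T(\eta)\|_1 \to 0 \quad \text{as } \delta \to 0$$
The converse does not hold if $\dim H = \infty$.
Remark 2
The regularity assures that the following operator-valued integral is well-defined
$$\int_\Theta f(\theta)\, T(\theta)\, \pi(\mathrm{d}\theta)$$
for every $\pi \in P(\Theta)$ and $f \in C_0(\Theta)$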
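Quantum Statistical Models
$S(H) := \{X \in L_1(H) : \mathrm{Tr}\,X = 1,\ X \ge 0\}$
Definition
A map $\rho: \Theta \to S(H)$ is called a quantum statistical model if it satisfies the conditions below.
Conditions
1. Identifiability (one-to-one map): $\rho(\theta_1) \ne \rho(\theta_2)$ whenever $\theta_1 \ne \theta_2$
2. Regularity: $\lim_{\delta \to 0} \omega_\rho(K_\delta) = 0$ for every compact set $K$
↑ Necessary for our arguments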
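Measurements (POVMs)
$U$: decision space (locally compact space)
$\mathcal{A}(U)$: Borel sets (the smallest sigma algebra containing all open sets)
$Po(U)$: all POVMs over $U$
Definition
A positive-operator-valued function $M$ on $\mathcal{A}(U)$ is called a POVM if it satisfies
- $M(B) \ge 0$ for every $B \in \mathcal{A}(U)$
- $M\left(\bigcup_{j=1}^{\infty} B_j\right) = \sum_{j=1}^{\infty} M(B_j)$ for $B_1, B_2, \dots \in \mathcal{A}(U)$ with $B_i \cap B_j = \emptyset$ $(i \ne j)$
- $M(U) = I$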
Born Rule
Axiom of Quantum Mechanics
For the quantum system described by $\rho$ and a measurement described by a POVM $M(\mathrm{d}x)$, the measurement outcome $x$ is distributed according to
$$x \sim \mathrm{Tr}\,\rho\, M(\mathrm{d}x)$$
$\rho$ → POVM $M(\mathrm{d}x)$ → $x$
3. Quantum Statistical Decision Problems
Basic Setting
Situation
For the quantum system specified by the unknown parameter $\theta$, experimenters extract information for some purpose.
Typical sequence of tasks
1. To choose a measurement (POVM)
2. To perform the measurement on the quantum system
3. To take an action $a(x)$ among the available choices based on the measurement outcome $x$ (the map $a$ is formally called a decision function)
$\rho(\theta)$ → POVM $M(\mathrm{d}x)$ → $x$ → $a(x)$
Decision Functions
Example
- Estimate the unknown parameter: $a(x) = \dfrac{x_1 + \dots + x_n}{n}$
- Construct a confidence region (credible region): $[a_L(x), a_R(x)]$
- Validate the entanglement/separability: $a(x) \in \{0, 1\}$
- Estimate the unknown density operator rather than the parameter:
$$\begin{pmatrix} a_1(x) & a_2(x) & 0 \\ a_2^*(x) & a_3(x) & 0 \\ 0 & 0 & 1 - a_1(x) - a_3(x) \end{pmatrix}$$
Remarks for Non-statisticians
Remarks
1. If the quantum state is completely known, then the distribution of the measurement outcome is also known and the best action can be chosen.
2. An action has to be chosen from finite data.
3. Precise estimation of the parameter is only a typical example.
$\rho(\theta)$ → POVM $M(\mathrm{d}x)$ → $x$ → $a(x)$
Loss Functions
The performance of decision functions is compared by adopting a loss function (smaller is better).
Ex: Parameter Estimation
Quantum statistical model: $\{\rho(\theta) : \theta \in \Theta \subset \mathbb{R}^m\}$
Action space: $a \in \Theta$
Loss function (squared error): $w(\theta, a) = \|\theta - a\|^2$
Later we see the formal definition of the loss function.
Measurements Over Decisions
Basic Idea
$\rho(\theta)$ → POVM $M(\mathrm{d}x)$ → $x$ → $a(x)$
Put together:
$\rho(\theta)$ → POVM $N(\mathrm{d}a)$ → $a$
From the beginning, we only consider POVMs over the action space.
Quantum Statistical Decision Problems
Formal Definitions
$\Theta$: parameter space (locally compact space)
$\rho(\theta)$: quantum statistical model
$U$: decision space* (locally compact space)
$w: \Theta \times U \to \mathbb{R} \cup \{+\infty\}$: loss function (lower semicontinuous**; bounded below)
The triplet $(\rho(\theta), U, w)$ is called a quantum statistical decision problem.
cf) statistical decision problem $(p_\theta, U, w)$
$p_\theta(\mathrm{d}x)$: statistical model
$U$: decision space
$w: \Theta \times U \to \mathbb{R} \cup \{+\infty\}$: loss function
* We follow Holevo's terminology instead of saying "action space".
** We impose a slightly stronger condition on the loss than Le Cam and Holevo.
4. Risk Functions and Optimality Criteria
Comparison of POVMs
The loss (e.g. squared error) depends on both the unknown parameter $\theta$ and our decision $u$, which is a random variable ($u \sim \mathrm{Tr}\,\rho(\theta) M(\mathrm{d}u)$).
In order to compare two POVMs, we focus on the average of the loss w.r.t. the distribution of $u$.
$\rho(\theta)$ → POVM $M(\mathrm{d}u)$ → $u$: $E_\theta[w(\theta, u)]$
$\rho(\theta)$ → POVM $N(\mathrm{d}u')$ → $u'$: $E'_\theta[w(\theta, u')]$
Compared at the same $\theta$
Risk Functions
Definition
The risk function of $M \in Po(U)$:
$$R_M(\theta) := \int_U w(\theta, u)\, \mathrm{Tr}\,\rho(\theta) M(\mathrm{d}u) = E_\theta[w(\theta, u)],\quad u \sim \mathrm{Tr}\,\rho(\theta) M(\mathrm{d}u)$$
Remarks for Non-statisticians
1. Smaller risk is better.
2. Generally there exists no POVM achieving the uniformly smallest risk among POVMs.
Since $R_M(\theta)$ depends on the unknown parameter, we need additional optimality criteria.
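The remark that no POVM is uniformly best can be seen in a toy problem. The sketch below is an illustration under assumed ingredients (two qubit states $|0\rangle$ and $|+\rangle$, 0-1 loss, and two projective measurements), none of which come from the slides: each POVM has zero risk at one parameter value and risk 1/2 at the other.

```python
import numpy as np

def proj(v):
    """Rank-one projector onto the real unit vector v."""
    return np.outer(v, v)

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])
plus = (ket0 + ket1) / np.sqrt(2)
minus = (ket0 - ket1) / np.sqrt(2)

# Hypothetical model with two parameter values: theta=1 -> |0>, theta=2 -> |+>.
states = [proj(ket0), proj(plus)]
loss = 1.0 - np.eye(2)  # 0-1 loss

def risk(povm):
    """Risk function theta -> sum_u w(theta, u) Tr(rho(theta) M_u)."""
    probs = np.array([[np.trace(rho @ M).real for M in povm] for rho in states])
    return (loss * probs).sum(axis=1)

# POVM A: projects onto |0>, |1>; outcome |0> means "guess theta=1".
povm_a = [proj(ket0), proj(ket1)]
# POVM B: projects onto |->, |+>; outcome |+> means "guess theta=2".
povm_b = [proj(minus), proj(plus)]

risk_a, risk_b = risk(povm_a), risk(povm_b)
print("risk of A:", risk_a)
print("risk of B:", risk_b)
```

Neither risk vector dominates the other, which is why an additional criterion such as the worst-case risk or the average risk under a prior is needed to rank the two measurements.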
Optimality Criteria (1/2)
Definition
A POVM $M_*$ is said to be minimax if it minimizes the supremum of the risk function, i.e.,
$$\sup_{\theta \in \Theta} R_{M_*}(\theta) = \inf_{M \in Po(U)} \sup_{\theta \in \Theta} R_M(\theta)$$
Historical Remark
Bogomolov showed the existence of a minimax POVM (Bogomolov 1981) in a more general framework.
(The description of quantum states and measurements there differs from the recent one.)
Optimality Criteria (2/2)
$P(\Theta)$: all probability distributions (defined on Borel sets)
Definition
The average risk of $M \in Po(U)$ w.r.t. $\pi \in P(\Theta)$:
$$r_{M,\pi} := \int_\Theta R_M(\theta)\, \pi(\mathrm{d}\theta)$$
A POVM $M_\pi$ is said to be Bayes if it minimizes the average risk, i.e.,
$$r_{M_\pi,\pi} = \inf_{M \in Po(U)} r_{M,\pi}$$
In Bayesian statistics, $\pi \in P(\Theta)$ is called a prior distribution.
Historical Remark
Holevo showed the existence of a Bayes POVM (Holevo 1976; see also Ozawa, 1980) in a more general framework.
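Ex. of Loss Functions and Risk Functions
Parameter Estimation
Model: $\{\rho(\theta) : \theta \in \Theta \subset \mathbb{R}^m\}$
$U = \Theta$, $w(\theta, u) = \|\theta - u\|^2$
$$R_M(\theta) = \int_U \|\theta - u\|^2\, \mathrm{Tr}\,\rho(\theta) M(\mathrm{d}u) = E_\theta[\|\theta - u\|^2]$$
Construction of Predictive Density Operator
Model: $\{\rho(\theta)^{\otimes n} : \theta \in \Theta \subset \mathbb{R}^p\}$
$U = S(H^{\otimes m})$, $w(\theta, u) = D(\rho(\theta)^{\otimes m} \,\|\, u)$
$$R_M(\theta) = \int_U D(\rho(\theta)^{\otimes m} \,\|\, u)\, \mathrm{Tr}\,\rho(\theta)^{\otimes n} M(\mathrm{d}u)$$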
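The slide does not spell out the divergence $D$ used as the loss for predictive density operators. A minimal sketch, assuming $D$ is the quantum relative entropy $D(\rho \| \sigma) = \mathrm{Tr}\,\rho(\log\rho - \log\sigma)$ and that both arguments are full rank (the helper names `logm_psd` and `relative_entropy` are hypothetical):

```python
import numpy as np

def logm_psd(a):
    """Matrix logarithm of a positive definite Hermitian matrix via eigh."""
    vals, vecs = np.linalg.eigh(a)
    return vecs @ np.diag(np.log(vals)) @ vecs.conj().T

def relative_entropy(rho, sigma):
    """Assumed loss: D(rho || sigma) = Tr rho (log rho - log sigma).

    Both inputs are assumed to be full-rank density matrices.
    """
    return np.trace(rho @ (logm_psd(rho) - logm_psd(sigma))).real

# Hypothetical "true" state and a hypothetical predictive state u.
rho = np.array([[0.6, 0.2],
                [0.2, 0.4]])
u = np.diag([0.5, 0.5])

print(relative_entropy(rho, u))    # loss incurred by predicting u
print(relative_entropy(rho, rho))  # zero when the prediction is exact
```

For commuting (diagonal) arguments this reduces to the classical Kullback-Leibler divergence between the two diagonals.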
5. Main Result
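Main Result (Quantum Minimax Theorem)
Theorem (*)
Let $\Theta$ and $U$ be compact metric spaces.
Then the following equality holds for every quantum statistical decision problem:
$$\inf_{M \in Po(U)} \sup_{\theta \in \Theta} R_M(\theta) = \sup_{\pi \in P(\Theta)} \inf_{M \in Po(U)} \int_\Theta R_M(\theta)\, \pi(\mathrm{d}\theta)$$
where the risk function is given by $R_M(\theta) := \int_U w(\theta, u)\, \mathrm{Tr}\,\rho(\theta) M(\mathrm{d}u)$
Corollary
For every closed convex subset $Q \subset Po(U)$ the above assertion holds.
* See arXiv:1410.3639 (quant-ph) for the proof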
Key Lemmas for the Theorem
Compactness of the POVMs, Holevo (1976)
If the decision space $U$ is compact, then $Po(U)$ is also compact.
Equicontinuity of risk functions (lemma by FT)
Loss function $w: \Theta \times U \to \mathbb{R}$
If $w$ is bounded and continuous, then
$$F := \{R_M : M \in Po(U)\} \subset C(\Theta)$$
is (uniformly) equicontinuous.
The equicontinuity implies the compactness of $F \subset C(\Theta)$ under the uniform convergence topology.
We show the main theorem by using Le Cam's result together with both lemmas.
However, the theorem is not a direct consequence of those earlier results.
6. Minimax POVMs and Least Favorable Priors
Previous Works (LFPs)
Statistical Decision Theory: Wald (1950)
Jeffreys prior: Jeffreys (1961)
Reference prior; reference analysis: Bernardo (1979, 2005)
Latent information prior: Komaki (2011)
MDP prior (reference prior for the pure-states model): Tanaka (2012)
Objective priors in Bayesian analysis are indeed least favorable priors.
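Definition
If a prior $\pi_{LF}$ achieves the supremum, i.e., the following holds
$$\inf_{M \in Po(U)} \int_\Theta R_M(\theta)\, \pi_{LF}(\mathrm{d}\theta) = \sup_{\pi \in P(\Theta)} \inf_{M \in Po(U)} \int_\Theta R_M(\theta)\, \pi(\mathrm{d}\theta)$$
where $R_M(\theta) := \int_U w(\theta, u)\, \mathrm{Tr}\,\rho(\theta) M(\mathrm{d}u)$,
the prior $\pi_{LF}$ is called a least favorable prior (LFP).
Main Result (Existence Theorem of LFP)
Theorem
Let $\Theta$ and $U$ be compact metric spaces and $w$ a continuous loss function.
Then, for every decision problem, there exists an LFP.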
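Pathological Example
Remark
Even on a compact space, a bounded lower semicontinuous (and not continuous) loss function does not necessarily admit an LFP.
Example
$\Theta = [0, 1]$, $R_M(\theta) = L(\theta, u)$
[Figure: graph of $R(\theta)$ over $\theta \in [0, 1]$, vertical axis from 0 to 1]
$$\sup_{\theta \in [0,1]} R_M(\theta) = \sup_{\pi} \int_{[0,1]} R_M(\theta)\, \pi(\mathrm{d}\theta) = 1$$
But for every prior $\pi$, $\int_{[0,1]} R_M(\theta)\, \pi(\mathrm{d}\theta) < 1$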
Minimax POVM is constructed using LFP
Corollary
Let $\Theta$ and $U$ be compact metric spaces.
If an LFP exists for a quantum statistical decision problem, every minimax POVM is a Bayes POVM with respect to the LFP.
In particular, if the Bayes POVM w.r.t. the LFP is unique, then it is minimax.
Corollary 2
For every closed convex subset $Q \subset Po(U)$ the above assertion holds.
Many theoretical results are derived from the quantum minimax theorem.
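The corollary that every minimax POVM is Bayes with respect to the LFP follows from a short chain of inequalities. The derivation below is a sketch, assuming that a minimax POVM $M_*$ and a least favorable prior $\pi_{LF}$ exist, that the risks are finite, and that the quantum minimax theorem holds; the symbol $M_*$ is notation introduced here.

```latex
% Assumptions: M_* is a minimax POVM, \pi_{LF} is a least favorable prior,
% and the quantum minimax theorem holds.
\begin{align*}
\int_\Theta R_{M_*}(\theta)\,\pi_{LF}(\mathrm{d}\theta)
  &\le \sup_{\theta\in\Theta} R_{M_*}(\theta)
   && \text{(an average never exceeds the supremum)} \\
  &= \inf_{M\in Po(U)} \sup_{\theta\in\Theta} R_M(\theta)
   && \text{($M_*$ is minimax)} \\
  &= \sup_{\pi\in P(\Theta)} \inf_{M\in Po(U)}
     \int_\Theta R_M(\theta)\,\pi(\mathrm{d}\theta)
   && \text{(quantum minimax theorem)} \\
  &= \inf_{M\in Po(U)} \int_\Theta R_M(\theta)\,\pi_{LF}(\mathrm{d}\theta)
   && \text{($\pi_{LF}$ is least favorable)} \\
  &\le \int_\Theta R_{M_*}(\theta)\,\pi_{LF}(\mathrm{d}\theta).
\end{align*}
```

The first and last expressions coincide, so every inequality is an equality; in particular $M_*$ attains the infimum of the average risk under $\pi_{LF}$, which is the definition of a Bayes POVM w.r.t. $\pi_{LF}$.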
7. Summary
Discussion
1. We show the quantum minimax theorem, which gives a theoretical basis for quantum statistical decision theory, in particular for objective Bayesian analysis.
2. For every closed convex subset of POVMs, all of the assertions still hold. Our result also has practical meaning for experimenters.
Future Works
- Derive other decision-theoretic results and previously known results by using the quantum minimax theorem
- Propose a practical algorithm for finding a least favorable prior
- Define a geometrical structure in the quantum statistical decision problem