
arXiv:quant-ph/0603254 v1 28 Mar 2006

Hypothesis testing for an entangled state produced

by spontaneous parametric down conversion

Masahito Hayashi∗, Bao-Sen Shi∗, Akihisa Tomita∗,§, Keiji Matsumoto∗,†, Yoshiyuki Tsuda‡, and Yun-Kun Jiang∗

∗ERATO Quantum Computation and Information Project, Japan Science and Technology Agency (JST), Tokyo 113-0033, Japan
†National Institute of Informatics, Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan
‡COE, Chuo University, Kasuga, Bunkyo-ku, Tokyo 112-8551, Japan
§Fundamental Research Laboratories, NEC, Tsukuba 305-8501, Japan

Generation and characterization of entanglement are crucial tasks in quantum information processing. A hypothesis testing scheme for entanglement has been formulated. Three designs were proposed to test the entangled photon states created by spontaneous parametric down conversion. The time allocations between the measurement bases were designed to account for the anisotropic deviation of the generated photon states from the maximally entangled states. The designs were evaluated in terms of the asymptotic variance and the p-value. It has been shown that the optimal time allocation between the coincidence and anti-coincidence measurement bases improves the entanglement test. The test can be further improved by optimizing the time allocation between the anti-coincidence bases. Analysis of the data obtained in the experiment verified the advantage of the entanglement test designed by the optimal time allocation.

PACS numbers: 03.65.Wj,42.50.-p,03.65.Ud

I. INTRODUCTION

The concept of entanglement has been thought to be the heart of quantum mechanics. The seminal experiment by Aspect et al. [1] proved the 'spooky' non-local action of quantum mechanics by observing violation of the Bell inequality [2] with entangled photon pairs. Recently, entanglement has also been recognized as an important resource for information processing. It has been revealed that entanglement plays an essential role, explicitly or implicitly, in quantum information processing, which provides unconditional security in cryptographic communication and an exponential speed-up in some computational tasks [3]. For example, entangled states are indispensable in quantum teleportation [4], a key protocol in a quantum repeater [5]. Even in the BB84 quantum cryptographic protocol [6], a hidden entanglement between the legitimate parties guarantees the security [7]. Practical realization of entangled states is therefore one of the most important issues in quantum information technology.

Practical implementation raises the problem of how to make sure that we have highly entangled states, which are required to achieve the quantum information protocols. Unavoidable imperfections will reduce the entanglement in the generation process. Moreover, decoherence and dissipation due to coupling with the environment degrade the entanglement during the processing. It is therefore important to characterize the entanglement of the generated (or stored) states to guarantee successful quantum information processing. Quantum state estimation and quantum state tomography are known as methods of identifying an unknown state [8, 9, 10]. Quantum state tomography [11] has recently been applied to obtain full information on the 4 × 4 two-particle density matrix from the coincidence counts of 16 combinations. For practical applications, however, characterization is not the goal of an experiment, but only a part of the preparation. It is thus favorable to reduce the time for characterization and the number of consumed particles as much as possible. In most applications, we do not need to know the full information on the states; we only need to know whether the states are sufficiently entangled or not by a test. The test should be simpler than the full characterization. Barbieri et al. [12] introduced an entanglement witness to test the entanglement of polarized entangled photon pairs. We will treat the optimization problem of the tests in the framework of hypothesis testing. This enables us to handle the fluctuation in the data properly with mathematical statistics [13].

In the following, we consider experimental designs to test the polarization entangled states of two-photon pairs generated by spontaneous parametric down conversion (SPDC), though the concept of the design is applicable to other two-particle entangled states. Two-photon states can be characterized by the correlation of photon detection events in several measurement bases. For example, if the state is close to |Φ(+)〉 = (1/√2)(|HH〉 + |V V〉), the coincidence counts on the bases {|HH〉, |V V〉, |DD〉, |XX〉, |RL〉, |LR〉} yield the maximum values, whereas the coincidence counts on the bases {|HV〉, |V H〉, |DX〉, |XD〉, |RR〉, |LL〉} take the minimum values, where H, V, X, D, R, and L stand for horizontal, vertical, 45◦ linear, 135◦ linear, clock-wise circular, and anti-clock-wise circular polarizations, respectively. We will refer to the former set of bases as the coincidence bases, and to the latter as the anti-coincidence bases. The ratio of the minimum counts to the maximum counts measures the degree of entanglement. For example, the visibility of two-photon interference, which has been widely used to characterize the entangled states since Aspect's experiment [1], measures the entanglement by the ratios obtained at two fixed bases for one particle. We will show that the visibility is not sufficient and needs to be reformulated from the viewpoint of statistics. We then improve the test by optimizing the allocation of the measurement time for each measurement basis, considering that the counts on the anti-coincidence bases are much smaller than those on the coincidence bases. The test can be further improved if we utilize knowledge of the tendency of the entanglement degradation. In general, the error from the maximally entangled states can be anisotropic, reflecting the generation process of the states. We can improve the sensitivity to the entanglement degradation by focusing the measurement on the expected error directions.

SPDC is now widely used to generate entangled photon pairs. Several experimental settings have been demonstrated to provide highly entangled states. In particular, Kwiat et al. [15] have obtained a high flux of polarization entangled photon pairs from a stack of two type-I phase matched nonlinear crystals. One nonlinear crystal generates a photon pair polarized in the horizontal direction (|HH〉), and the other generates pairs polarized in the vertical direction (|V V〉). If the two pairs are indistinguishable, the generated photons are entangled. Otherwise, the state will be a mixture of HH pairs and V V pairs. Quantum state tomography has shown that only the HHHH, V V V V, V V HH, and HHV V elements are dominant [16], which implies that the density matrix can be approximately given by a classical mixture of |Φ(+)〉〈Φ(+)| and |Φ(−)〉〈Φ(−)|. We can improve the entanglement test based on this property of the photon pairs, as described in the following sections.

In this article, we reformulate hypothesis testing to be applicable to the SPDC experiments, and demonstrate the effectiveness of the optimized time allocation in the entanglement test. The construction of this article is as follows. Section 2 defines the hypothesis testing scheme for the entanglement of the two-photon states generated by SPDC. Section 3 gives the mathematical formulation of statistical hypothesis testing. Sections 4 - 10 describe the mathematical aspects, and Section 11 examines the experimental aspects of the hypothesis testing of the entanglement. The designs of the time allocation are evaluated on the experimental data. Hence, a reader interested only in experiments and data analysis can skip sections 4 - 10 and proceed to section 11, while a reader concerned with the mathematical discussion should read sections 4 - 10 before section 11.

The following is the organization of sections 4 - 10, which discuss the more theoretical issues. Sections 4 and 5 give the fundamental properties of hypothesis testing: section 4 introduces the likelihood ratio test, and section 5 gives the asymptotic theory of hypothesis testing. Sections 6 - 9 are devoted to the designs of the time allocation between the coincidence and anti-coincidence bases: section 6 defines the modified visibility method, section 7 optimizes the time allocation when the total photon flux λ is unknown, section 8 gives the results with known λ, and section 9 compares the designs in terms of the asymptotic variance. Section 10 gives a further improvement by optimizing the time allocation between the anti-coincidence bases. The Appendices give the details of the proofs used in the optimization.

II. HYPOTHESIS TESTING SCHEME FOR ENTANGLEMENT

This section introduces the hypothesis test for entanglement. We consider the two-photon states generated by SPDC, which are described by a density matrix σ. We assume each two-photon generation process to be identical and independent. The target state is the maximally entangled state |Φ(+)〉. Here we measure the entanglement by the fidelity between the generated state and the target state:

θ = 〈Φ(+)|σ|Φ(+)〉. (1)
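The fidelity of Eq. (1) is a simple quadratic form, and can be evaluated numerically; the following sketch (an illustration, not part of the original analysis) computes θ for a state mixing the target |Φ(+)〉 with white noise:

```python
import numpy as np

# Computational basis: |H> = (1, 0), |V> = (0, 1); two-photon states live in C^4.
H = np.array([1.0, 0.0])
V = np.array([0.0, 1.0])

# Target maximally entangled state |Phi(+)> = (|HH> + |VV>)/sqrt(2).
phi_plus = (np.kron(H, H) + np.kron(V, V)) / np.sqrt(2)

def fidelity(sigma: np.ndarray) -> float:
    """Fidelity theta = <Phi(+)| sigma |Phi(+)> of Eq. (1)."""
    return float(np.real(phi_plus.conj() @ sigma @ phi_plus))

# Illustrative state: mixture of |Phi(+)><Phi(+)| and the maximally mixed state.
p = 0.9
sigma = p * np.outer(phi_plus, phi_plus) + (1 - p) * np.eye(4) / 4
print(round(fidelity(sigma), 6))  # 0.925 = 0.9 * 1 + 0.1 * 1/4
```

The mixing weight p = 0.9 is an arbitrary choice for the example; the white-noise term contributes 1/4 to the fidelity because |Φ(+)〉 is one of four orthogonal states.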

The purpose of the test is to guarantee that the state is sufficiently close to the maximally entangled state with a certain significance. That is, we are required to disprove, with a small error probability, that the fidelity θ is less than a threshold θ0. In mathematical statistics, this situation is formulated as hypothesis testing; we introduce the null hypothesis H0 that the entanglement is not sufficient and the alternative H1 that the entanglement is sufficient:

H0 : θ ≤ θ0 v.s. H1 : θ > θ0, (2)

with a threshold θ0.

The outcome of the coincidence count measurement in the SPDC experiment can be assumed to be a random variable independently distributed according to the Poisson distribution. From now on, a symbol labeled by a pair (x, y) refers to a random variable or parameter related to the measurement basis |xA, yB〉. The number of detection events nxy on the basis |xA, yB〉 is a random variable obeying the Poisson distribution Poi((λµxy + δ)txy) of mean (λµxy + δ)txy, where

• λ is a known constant, related to the photon detection rate, determined from the average number of photon pairs generated in unit time and the detection efficiency,

• µxy = 〈xA, yB|σ|xA, yB〉 is an unknown constant,

• txy is a known constant giving the measurement time for detection,

• δ is a known constant giving the average dark count rate.

The probability function of nxy is

exp(−(λµxy + δ)txy) ((λµxy + δ)txy)^nxy / nxy! .
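This Poisson count model can be evaluated directly; a minimal sketch (with illustrative parameter values, not from the experiment):

```python
from math import exp, factorial

def count_probability(n_xy: int, lam: float, mu_xy: float, delta: float, t_xy: float) -> float:
    """Probability of observing n_xy events on basis |x_A, y_B>:
    Poisson with mean (lam * mu_xy + delta) * t_xy."""
    mean = (lam * mu_xy + delta) * t_xy
    return exp(-mean) * mean**n_xy / factorial(n_xy)

# Sanity check: the probabilities over n = 0, 1, 2, ... sum to (nearly) 1.
# Parameter values below are arbitrary illustrations (mean count = 5.1).
total = sum(count_probability(n, lam=1000.0, mu_xy=0.05, delta=1.0, t_xy=0.1)
            for n in range(60))
print(round(total, 9))  # 1.0 up to a negligible truncated tail
```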


Because the detections at different times are mutually independent, nxy is independent of nx′y′ (x ≠ x′ or y ≠ y′). In this paper, we discuss quantum hypothesis testing under the above assumption, while Usami et al. [17] discussed state estimation under this assumption.

Visibility of the two-photon interference is an indicator of entanglement commonly used in experiments. When the dark-count parameter δ can be regarded as 0, the visibility is calculated as follows: first, A's measurement basis |xA〉 is fixed, then the measurement |xA, yB〉 is performed by rotating B's measurement basis |yB〉 to obtain the maximum and minimum numbers of coincidence counts, nmax and nmin. We need to make the measurement with at least two bases of A in order to exclude the possibility of classical correlation. We may choose the two bases |H〉 and |D〉 as |xA〉, for example. Finally, the visibility is given by the ratio between nmax − nmin and nmax + nmin for the respective A's measurement basis |xA〉. However, our decision will contain a bias if we choose only two bases as A's measurement basis |xA〉. Hence, we cannot estimate the fidelity between the target maximally entangled state and the given state in a statistically proper way from the visibility.

Since the equation

|HH〉〈HH| + |V V〉〈V V| + |DD〉〈DD| + |XX〉〈XX| + |RL〉〈RL| + |LR〉〈LR| = 2|Φ(+)〉〈Φ(+)| + I (3)

holds, we can estimate the fidelity by measuring the sum of the coincidence counts on the following bases: |HH〉, |V V〉, |DD〉, |XX〉, |RL〉, and |LR〉, when λ and δ are known [12, 13]. This is because the sum n1 := nHH + nV V + nDD + nXX + nRL + nLR obeys the Poisson distribution with the expectation value (λ(1 + 2θ)/6 + δ)t1, where the measurement time for each basis is t1/6.

The parameter λ is usually unknown, however. We need to perform another measurement on different bases to obtain additional information. Since

|HV〉〈HV| + |V H〉〈V H| + |XD〉〈XD| + |DX〉〈DX| + |RR〉〈RR| + |LL〉〈LL| = 2I − 2|Φ(+)〉〈Φ(+)| (4)

also holds, we can estimate the fidelity by measuring the sum of the coincidence counts on the following bases: |HV〉, |V H〉, |DX〉, |XD〉, |RR〉, and |LL〉. The sum n2 := nHV + nV H + nDX + nXD + nRR + nLL obeys the Poisson distribution Poi((λ(2 − 2θ)/6 + δ)t2), where the measurement time for each basis is t2/6. Combining the two measurements, we can estimate the fidelity without knowledge of λ.

We can also consider a different type of measurement of λ. If we prepare our device to detect all photons, the detected number n3 obeys the distribution Poi((λ + δ)t3) with the measurement time t3. We will refer to it as the total flux measurement. In the following, we consider the best time allocation for estimation and testing of the fidelity, by applying methods of mathematical statistics. We will assume that λ is known or estimated from the detected number n3.
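Combining the two count sums eliminates λ, as described above. A minimal sketch of the resulting moment estimator (illustrative only; it assumes negligible dark counts, δ ≈ 0, which the paper does not generally assume):

```python
def estimate_fidelity(n1: int, t1: float, n2: int, t2: float) -> float:
    """Moment estimator of theta from the coincidence sum n1 and the
    anti-coincidence sum n2, assuming delta ~ 0.
    With E[n1]/t1 = lam*(1 + 2*theta)/6 and E[n2]/t2 = lam*(2 - 2*theta)/6,
    the sum of the two rates equals lam/2, so lam cancels in the ratio."""
    r1, r2 = n1 / t1, n2 / t2
    return (3 * r1 / (r1 + r2) - 1) / 2

# Perfect |Phi(+)>: all counts on the coincidence bases -> theta = 1.
print(estimate_fidelity(n1=6000, t1=1.0, n2=0, t2=1.0))      # 1.0
# Completely mixed state: equal rates -> theta = 1/4.
print(estimate_fidelity(n1=3000, t1=1.0, n2=3000, t2=1.0))   # 0.25
```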

III. HYPOTHESIS TESTING FOR PROBABILITY DISTRIBUTIONS

A. Formulation

In this section, we review the fundamentals of hypothesis testing for probability distributions [14]. Suppose that a random variable X is distributed according to a probability measure Pθ identified by the unknown parameter θ. We also assume that the unknown parameter θ belongs to one of two mutually disjoint sets Θ0 and Θ1. When we want to guarantee that the true parameter θ belongs to the set Θ1 with a certain significance, we choose the null hypothesis H0 and the alternative hypothesis H1

as

H0 : θ ∈ Θ0 versus H1 : θ ∈ Θ1. (5)

Then, our decision method is described by a test, which is a function φ(x) taking values in {0, 1}; H0 is rejected if 1 is observed, and H0 is not rejected if 0 is observed. That is, we make our decision only when 1 is observed, and do not otherwise. This is because the purpose is to accept H1 by rejecting H0 while guaranteeing the quality of our decision, not to reject H1 or accept H0.

From a theoretical viewpoint, we often consider randomized tests, in which we probabilistically make the decision for given data. Such a test is given by a function φ mapping to the interval [0, 1]. When we observe the data x, H0 is rejected with probability φ(x). In the following, we treat randomized tests as well as deterministic tests.

In statistical hypothesis testing, we minimize the error probabilities of the test φ. There are two types of errors. The type one error is the case where H0 is rejected though it is true. The type two error is the converse case: H0 is accepted though it is false. Hence, the type one error probability is given by Pθ(φ) (θ ∈ Θ0), and the type two error probability is given by 1 − Pθ′(φ) (θ′ ∈ Θ1), where

Pθ(φ) = ∫ φ(x) dPθ(x).

It is in general impossible to minimize both Pθ(φ) and 1 − Pθ′(φ) simultaneously because of the trade-off relation between them. Since we make our decision, with its quality guaranteed, only when 1 is observed, it is definitively required that the type one error probability Pθ(φ) be less than a certain constant α. For this reason, we minimize the type two error probability 1 − Pθ′(φ) under the condition Pθ(φ) ≤ α. The constant α in the condition is called the risk probability, which guarantees the quality of our decision. If the risk probability is large, our decision has less reliability. Under this constraint on the risk probability, we maximize the probability of rejecting the hypothesis H0 when the true parameter is θ′ ∈ Θ1. This probability is given as Pθ′(φ), and is called the power of φ. Hence, a test φ of risk probability α is said to be most powerful (MP) at θ′ ∈ Θ1 if Pθ′(φ) ≥ Pθ′(ψ) holds for any test ψ of risk probability α. A test is said to be uniformly most powerful (UMP) if it is MP at every θ′ ∈ Θ1.

B. p-values

In hypothesis testing, we usually fix our test before applying it to data. However, we sometimes focus on the minimum risk probability among tests in a class T rejecting the hypothesis H0 with the given data. This value is called the p-value, which depends on the observed data x as well as the subset Θ0 to be rejected.

In fact, in order to define the p-value, we have to fix a class T of tests. Then, for x and Θ0, the p-value is defined as

min_{φ∈T : φ(x)=1} max_{θ∈Θ0} Pθ(φ). (6)

Since the p-value expresses the risk of rejecting the hypothesis H0, this concept is useful for comparison among several experiments.

Note that if we are allowed to choose any function φ as a test, the above minimum is attained by the function δx:

δx(y) := 0 if y ≠ x, and 1 if y = x. (7)

In this case, the p-value is maxθ∈Θ0 Pθ(x). However, the function δx is unnatural as a test. Hence, we should fix a class of tests to define the p-value.

IV. LIKELIHOOD TEST

A. Definition

The likelihood ratio test is a standard test that is UMP in typical cases [14]. When both Θ0 and Θ1 consist of single elements, Θ0 = {θ0} and Θ1 = {θ1}, the likelihood ratio test φr is defined as

φr(x) := 0 if Pθ0(x)/Pθ1(x) ≥ r,
         1 if Pθ0(x)/Pθ1(x) < r,

where r is a constant, and the ratio Pθ0(x)/Pθ1(x) is called the likelihood ratio. From the definition, any test φ satisfies

(rPθ1 − Pθ0)(φr) ≥ (rPθ1 − Pθ0)(φ). (8)

When a likelihood ratio test φr satisfies

α = Pθ0(φr), (9)

the test φr is MP of level α. Indeed, when a test φ satisfies Pθ0(φ) ≤ α,

−α + rPθ1(φ) ≤ −Pθ0(φ) + rPθ1(φ) ≤ −Pθ0(φr) + rPθ1(φr) = −α + rPθ1(φr).

Hence, 1 − Pθ1(φ) ≥ 1 − Pθ1(φr). This is known as the Neyman-Pearson fundamental lemma [19].

The likelihood ratio test is generalized to the cases where Θ0 or Θ1 has at least two elements as

φr(x) := 0 if sup_{θ∈Θ0} pθ(x) / sup_{θ∈Θ1} pθ(x) ≥ r,
         1 if sup_{θ∈Θ0} pθ(x) / sup_{θ∈Θ1} pθ(x) < r.

B. Monotone Likelihood Ratio Test

In cases where the hypothesis is one-sided, that is, the parameter space Θ is an interval of R and the hypothesis is given as

H0 : θ ≥ θ0 versus H1 : θ < θ0, (10)

we often use so-called interval tests, for their optimality under some conditions as well as for their naturalness.

When the likelihood ratio Pθ(x)/Pη(x) is monotone increasing in x for any θ, η such that θ > η, the likelihood ratio is called monotone. In this case, the likelihood ratio test φr between Pθ0 and Pθ1 is UMP of level α := Pθ0(φr), where θ1 is an arbitrary element satisfying θ1 < θ0. Indeed, many important examples satisfy this condition. Hence, it is convenient to give the proof here.

From the monotonicity, the likelihood ratio test φr has the form

φr(x) = 1 if x < x0,
        0 if x ≥ x0, (11)

with a threshold value x0. Since the monotonicity implies Pθ0(φr) ≥ Pθ(φr) for any θ ∈ Θ0, it follows from the Neyman-Pearson lemma that the test φr is MP of level α. From (11), the test φr is also a likelihood ratio test between Pθ0 and Pη, where η is another element satisfying η < θ0. Hence, the test φr is also MP of level α.

From the above discussion, it is suitable to treat the p-value based on the class of likelihood ratio tests. In this case, when we observe x0, the p-value is equal to

∫_{−∞}^{x0} Pθ0(dx). (12)

C. One-Parameter Exponential Family

In mathematical statistics, exponential families are known as a class of typical statistical models [18]. A family of probability distributions {pθ | θ ∈ Θ} is called an exponential family when there exists a random variable x such that

Pθ(x) := P0(x) exp(θx + g(θ)), (13)

where g(θ) := − log ∫ exp(θx) P0(dx). It is known that this class of families includes, for example, the Poisson distributions, normal distributions, binomial distributions, etc. In this case, the likelihood ratio

exp(θ0x + g(θ0)) / exp(θ1x + g(θ1)) = exp((θ0 − θ1)x + g(θ0) − g(θ1))

is monotone in x for θ0 > θ1. Hence, the likelihood ratio test is UMP for the hypothesis (10). Note that this argument is valid even if we choose a different parameter, as long as the family has a parameter satisfying (13).

For example, in the case of the normal distribution

Pθ(x) = (1/√(2πV)) e^{−(x−θ)²/(2V)} = (1/√(2πV)) e^{−x²/(2V) + θx/V − θ²/(2V)},

the UMP test φUMP,α of level α is given as

φUMP,α(x) := 1 if x < θ0 − εα√V,
             0 if x ≥ θ0 − εα√V, (14)

where

α = ∫_{−∞}^{−εα} (1/√(2π)) e^{−x²/2} dx. (15)
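The threshold in (14) is fixed by inverting the normal tail integral (15); a minimal sketch using the standard-library inverse normal CDF (the numerical values below are illustrative only):

```python
from statistics import NormalDist

def ump_normal_test(x: float, theta0: float, V: float, alpha: float) -> int:
    """One-sided UMP test (14): return 1 (reject H0) when
    x < theta0 - eps_alpha * sqrt(V), with eps_alpha fixed by Eq. (15),
    i.e. Phi(-eps_alpha) = alpha."""
    eps_alpha = NormalDist().inv_cdf(1.0 - alpha)  # = -Phi^{-1}(alpha)
    threshold = theta0 - eps_alpha * V**0.5
    return 1 if x < threshold else 0

# With alpha = 0.05, eps_alpha ~ 1.645; for theta0 = 0.9 and V = 0.01
# the rejection region is x < 0.9 - 1.645 * 0.1 ~ 0.7355.
print(ump_normal_test(x=0.70, theta0=0.9, V=0.01, alpha=0.05))  # 1 (reject H0)
print(ump_normal_test(x=0.80, theta0=0.9, V=0.01, alpha=0.05))  # 0
```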

The n-trial binomial distributions Pⁿp(k) = (n choose k) (1 − p)^{n−k} p^k also form an exponential family, because the parameter θ := log(p/(1 − p)) satisfies Pⁿp(k) = (n choose k) e^{θk} / (1 + e^θ)^n. Hence, in the case of the n-trial binomial distribution, the UMP test φⁿUMP,α of level α is given as

φⁿUMP,α(k) := 1 if k < k0,
              γ if k = k0,
              0 if k > k0, (16)

where k0 is the maximum value k′ satisfying α ≥ Σ_{k=0}^{k′−1} (n choose k) (1 − p)^{n−k} p^k, and γ is defined by

α = γ (n choose k0) (1 − p)^{n−k0} p^{k0} + Σ_{k=0}^{k0−1} (n choose k) (1 − p)^{n−k} p^k. (17)
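The threshold k0 and the randomization probability γ of (16)-(17) can be computed directly from the binomial tail; a minimal sketch (with an arbitrary illustrative choice n = 10, p0 = 0.5, α = 0.05):

```python
from math import comb

def binomial_ump(n: int, p0: float, alpha: float):
    """Threshold k0 and randomization gamma of the UMP test (16)-(17)
    for the one-sided hypothesis on the binomial parameter at level alpha."""
    pmf = [comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(n + 1)]
    # k0: the largest k' with alpha >= sum_{k=0}^{k'-1} pmf[k]
    k0 = 0
    while k0 < n and sum(pmf[:k0 + 1]) <= alpha:
        k0 += 1
    # gamma from Eq. (17): exhaust the remaining level on the boundary point k0.
    gamma = (alpha - sum(pmf[:k0])) / pmf[k0]
    return k0, gamma

k0, gamma = binomial_ump(n=10, p0=0.5, alpha=0.05)
print(k0, round(gamma, 4))  # 2 0.8933
```

The randomization on k = k0 makes the type one error probability exactly α despite the discreteness of the binomial distribution.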

The Poisson distributions also form an exponential family, because the parameter θ := log µ satisfies Poi(µ)(n) = (1/n!) e^{θn − e^θ}. In this case, the UMP test φUMP,α of level α is similarly characterized. When n is sufficiently large, the distribution Pⁿθ(k) can be approximated by the normal distribution with variance n(1 − θ)θ. That is, the UMP test φⁿUMP,α of level α is approximately given as

φⁿUMP,α(k) := 1 if k < nθ0 − εα√(nθ0(1 − θ0)),
              0 if k ≥ nθ0 − εα√(nθ0(1 − θ0)). (18)

Similarly, in the case of the Poisson distribution Poi(θ), the UMP test φUMP,α of level α is approximately given as

φUMP,α(k) := 1 if k < θ0 − εα√θ0,
             0 if k ≥ θ0 − εα√θ0. (19)
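The approximate Poisson test (19) can be cross-checked against the exact lower-tail probability of Eq. (12); a sketch with an illustrative mean θ0 = 100:

```python
from math import exp, factorial
from statistics import NormalDist

def poisson_ump_approx(k: int, theta0: float, alpha: float) -> int:
    """Approximate UMP test (19) for Poi(theta): return 1 (reject H0)
    when k < theta0 - eps_alpha * sqrt(theta0)."""
    eps_alpha = NormalDist().inv_cdf(1.0 - alpha)
    return 1 if k < theta0 - eps_alpha * theta0**0.5 else 0

def poisson_exact_pvalue(k: int, theta0: float) -> float:
    """Exact lower-tail p-value P(K <= k) under Poi(theta0), cf. Eq. (12)."""
    return sum(exp(-theta0) * theta0**j / factorial(j) for j in range(k + 1))

# For theta0 = 100 and alpha = 0.05 the threshold is ~ 100 - 1.645*10 ~ 83.55.
print(poisson_ump_approx(k=80, theta0=100.0, alpha=0.05))   # 1 (reject H0)
print(poisson_exact_pvalue(k=80, theta0=100.0) < 0.05)      # True
```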

Next, we consider testing the following hypothesis in the case of the binomial Poisson distribution Poi(µ1, µ2):

H0 : µ1/(µ1 + µ2) ≥ θ0 versus H1 : µ1/(µ1 + µ2) < θ0. (20)

In this case, the test φ(k1, k2) = φ^{k1+k2}_{UMP,α}(k1) is a test of level α. This is because the conditional distribution

Poi(µ1, µ2)(k, n − k) / Σ_{k′=0}^{n} Poi(µ1, µ2)(k′, n − k′)

is equal to the binomial distribution Pⁿ_{µ1/(µ1+µ2)}(k). Therefore, when we observe k1, k2, the p-value of this class of tests is equal to Σ_{k=0}^{k1} (k1+k2 choose k) θ0^k (1 − θ0)^{k1+k2−k}.
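The conditional binomial p-value above is a finite sum and is straightforward to evaluate; a minimal sketch (the counts below are toy values, not experimental data):

```python
from math import comb

def conditional_pvalue(k1: int, k2: int, theta0: float) -> float:
    """p-value of the conditional test for H0: mu1/(mu1+mu2) >= theta0,
    conditioning on n = k1 + k2:
    sum_{k=0}^{k1} C(k1+k2, k) theta0^k (1-theta0)^(k1+k2-k)."""
    n = k1 + k2
    return sum(comb(n, k) * theta0**k * (1 - theta0)**(n - k)
               for k in range(k1 + 1))

# Toy example: 0 counts in the first stream and 10 in the second, theta0 = 0.5.
print(conditional_pvalue(k1=0, k2=10, theta0=0.5))  # 0.0009765625 = 0.5**10
```

Because the test conditions on the total k1 + k2, the unknown overall rate drops out, which is exactly what makes this construction useful when λ is unknown.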

D. Multi-parameter case

In the one-parameter case, UMP tests can often be characterized as likelihood ratio tests. However, in the multi-parameter case, this type of characterization is generally impossible, and the UMP test does not always exist. In this case, we have to choose our test among non-UMP tests. One idea is to choose our test among likelihood ratio tests, because likelihood ratio tests always exist and we can expect them to perform well. Generally, it is not easy to give an explicit form of the likelihood ratio test. When the family is a multi-parameter exponential family, the likelihood ratio test has a simpler form. A family of probability distributions {pθ | θ = (θ1, . . . , θm) ∈ R^m} is called an m-parameter exponential family when there exist m random variables x = (x1, . . . , xm) such that

Pθ(x) := P0(x) exp(θ · x + g(θ)),

where g(θ) := − log ∫ exp(θ · x) P0(dx). In this case the likelihood ratio test φr has the form

φr(x) = 0 if inf_{θ1∈Θ1} D(P_{θ(x)}‖Pθ1) − inf_{θ0∈Θ0} D(P_{θ(x)}‖Pθ0) ≥ log r,
        1 if inf_{θ1∈Θ1} D(P_{θ(x)}‖Pθ1) − inf_{θ0∈Θ0} D(P_{θ(x)}‖Pθ0) < log r, (21)

where the divergence D(Pη‖Pθ) is defined as

D(Pη‖Pθ) := ∫ log(Pη(x′)/Pθ(x′)) Pη(dx′) = (η − θ) · ∫ x Pη(dx) + g(η) − g(θ),

and θ(x) is defined by [18]

∫ x′ P_{θ(x)}(dx′) = x. (22)


This is because the logarithm of the likelihood ratio is calculated as

log [ sup_{θ0∈Θ0} Pθ0(x) / sup_{θ1∈Θ1} Pθ1(x) ]
= sup_{θ0∈Θ0} inf_{θ1∈Θ1} log(Pθ0(x)/Pθ1(x))
= sup_{θ0∈Θ0} inf_{θ1∈Θ1} (θ0 − θ1) · x + g(θ0) − g(θ1)
= sup_{θ0∈Θ0} inf_{θ1∈Θ1} (θ0 − θ1) · ∫ x′ P_{θ(x)}(dx′) + g(θ0) − g(θ1)
= sup_{θ0∈Θ0} inf_{θ1∈Θ1} D(P_{θ(x)}‖Pθ1) − D(P_{θ(x)}‖Pθ0)
= inf_{θ1∈Θ1} D(P_{θ(x)}‖Pθ1) − inf_{θ0∈Θ0} D(P_{θ(x)}‖Pθ0).

In addition, θ(x) coincides with the maximum likelihood estimate (MLE) when x is observed.

In the following, we treat two hypotheses given as

H0 : w · θ ≥ c0 versus H1 : w · θ < c0. (23)

In the case of the multivariate normal distribution family

Pθ(x) = (2π)^{−m/2} (det V)^{−1/2} e^{−(1/2)(x−θ)ᵀV⁻¹(x−θ)} = (2π)^{−m/2} (det V)^{−1/2} e^{−(1/2)xᵀV⁻¹x + θᵀV⁻¹x − (1/2)θᵀV⁻¹θ},

we have D(Pη‖Pθ) = (1/2)(η − θ)ᵀV⁻¹(η − θ) and θ(x) = x. Since

min_{w·θ=c0} (1/2)(η − θ)ᵀV⁻¹(η − θ) = (w · η − c0)² / (2 wᵀVw),

the likelihood function is calculated as

inf_{θ1∈Θ1} D(Px‖Pθ1) − inf_{θ0∈Θ0} D(Px‖Pθ0) = inf_{c<c0} (w · x − c)² / (2 wᵀVw) − inf_{c≥c0} (w · x − c)² / (2 wᵀVw). (24)

That is, the likelihood function has the same form as the likelihood function of a one-parameter normal distribution family with variance wᵀVw.

The multinomial Poisson distributions

Poi(µ1, . . . , µm)(k1, . . . , km) := e^{−Σ_{i=1}^{m} µi} (µ1^{k1} · · · µm^{km}) / (k1! · · · km!)

also form an exponential family. The divergence is calculated as

D(Poi(µ1, . . . , µm)‖Poi(µ′1, . . . , µ′m)) = Σ_{i=1}^{m} (µ′i − µi) + Σ_{i=1}^{m} µi log(µi/µ′i). (25)

Hence, using this formula and (21), we can calculate the likelihood ratio test. Now, we calculate the p-value concerning the class of likelihood ratio tests when we observe the data k1, . . . , km. When Σ_{i=1}^{m} wi ki < c0, this value is equal to

max_{w·µ′=c0} Poi(µ′1, . . . , µ′m){ (k′1, . . . , k′m) | min_{w·µ=c0} Σ_{i=1}^{m} (µi − k′i) + Σ_{i=1}^{m} k′i log(k′i/µi) ≥ R, Σ_{i=1}^{m} wi k′i < c0 }, (26)

where R := min_{w·µ=c0} Σ_{i=1}^{m} (µi − ki) + Σ_{i=1}^{m} ki log(ki/µi). As is shown in Appendix D, this value is bounded from above by

is shown in Appendix D, this value is upperly boundedby

maxw·µ′=c0

Poi(µ′1, . . . , µ

′m)

{

(k′1, . . . , k′m)

k∑

i=1

k′iµi

≤ 1

}

,

(27)

where µi is defined as follows:

c0wi

− µi + µi logµiwi

c0= R if R ≤ R0 (28)

c0wM

+ µi logwM − wi

wM= R if R > R0, (29)

where wM := maxiwi and R0 := c0

wM−

c0(wM−wi)wiwM

log wM−wi

wM.
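The divergence formula (25) is easy to check numerically; a minimal sketch (the rate vectors below are arbitrary illustrative values):

```python
from math import log

def poisson_divergence(mu, mu_prime):
    """Divergence (25) between product Poisson distributions
    Poi(mu_1, ..., mu_m) and Poi(mu'_1, ..., mu'_m)."""
    return (sum(mp - m for m, mp in zip(mu, mu_prime))
            + sum(m * log(m / mp) for m, mp in zip(mu, mu_prime)))

print(poisson_divergence([1.0, 2.0], [1.0, 2.0]))   # 0.0 (identical rates)
print(round(poisson_divergence([2.0], [1.0]), 6))   # (1 - 2) + 2*log(2) ~ 0.386294
```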

V. ASYMPTOTIC THEORY

A. Fisher information

Assume that the data x1, . . . , xn obeys the identicaland independent distribution of the same distributionfamily pθ and n is sufficiently large. When the true pa-rameter θ is close to θ0, it is known that the meaning-ful information for θ is essentially given as the randomvariable 1

n

∑ni=1 lθ0(xi), where the logarithmic derivative

lθ0(xi) is defined by

lθ(x) :=d log pθ(x)

dθ. (30)

In this case, the random variable 1n

∑ni=1 lθ0(xi) can be

approximated by the normal distribution with the expec-tation value θ−θ0 and the variance 1

nJθ0, where the Fisher

information Jθ is defined as Jθ :=∫

(lθ(x))2Pθ(dx).

Hence, the testing problem can be approximated by thetesting of this normal distribution family [14, 18]. Thatis, the quality of testing is approximately evaluated by


the Fisher information $J_{\theta_0}$ at the threshold $\theta_0$.

In the case of the Poisson distribution family $\mathrm{Poi}(\theta t)$, the parameter $\theta$ can be estimated by $\frac{X}{t}$. The asymptotic case corresponds to the case with large $t$. In this case, the Fisher information is $\frac{t}{\theta}$. When $X$ obeys the unknown Poisson distribution family $\mathrm{Poi}(\theta t)$, the estimation error $\frac{X}{t}-\theta$ is close to the normal distribution with the variance $\frac{\theta}{t}$, i.e., $\sqrt{t}\,(\frac{X}{t}-\theta)$ approaches the random variable obeying the normal distribution with variance $\theta$. That is, the Fisher information corresponds to the inverse of the variance of the estimator.
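This correspondence between the Fisher information and the estimator variance is easy to confirm by simulation; a small sketch (the parameter values are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
theta, t = 0.8, 5000.0

# Estimate theta by X/t from X ~ Poi(theta * t); the variance of the
# estimator should match the inverse Fisher information theta / t.
estimates = rng.poisson(theta * t, size=200_000) / t
print(estimates.var())  # close to theta / t = 1.6e-4
```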

This approximation can be extended to the multi-parameter case $\{p_\theta \mid \theta\in\mathbb{R}^m\}$. Similarly, it is known that the testing problem can be approximated by the testing of the normal distribution family with the covariance matrix $(nJ_\theta)^{-1}$, where the Fisher information matrix $J_{\theta;i,j}$ is given by

$$J_{\theta;i,j} := \int l_{\theta;i}(x)\,l_{\theta;j}(x)\,P_\theta(dx), \quad (31)$$

$$l_{\theta;i}(x) := \frac{\partial\log p_\theta(x)}{\partial\theta_i}. \quad (32)$$

When the hypotheses are given by (10), the testing problem can be approximated by the testing of the normal distribution family with variance $\frac{w\cdot J_{\theta_0}^{-1}w}{n}$.

Indeed, the same fact holds for the multinomial Poisson distribution family $\mathrm{Poi}(t\mu_1,\ldots,t\mu_m)$. When $X_j$ is the $j$-th random variable, the random variable $\sum_{j=1}^{m}\lambda_j\sqrt{t}\,(\frac{X_j}{t}-\mu_j)$ converges in distribution to the random variable obeying the normal distribution with the variance $\sum_{j=1}^{m}\lambda_j^2\mu_j$:

$$\sum_{j=1}^{m}\lambda_j\sqrt{t}\left(\frac{X_j}{t}-\mu_j\right) \xrightarrow{d} N\!\left(0,\ \sum_{j=1}^{m}\lambda_j^2\mu_j\right). \quad (33)$$

This convergence is compact uniform concerning the parameters $\mu_1,\ldots,\mu_m$. In this case, the Fisher information matrix $J_\mu$ is the diagonal matrix with the diagonal elements $(\frac{t}{\mu_1},\ldots,\frac{t}{\mu_m})$. When our distribution family is given as a subfamily $\mathrm{Poi}(t\mu_1(\theta),\ldots,t\mu_m(\theta))$, the Fisher information matrix is $A_\theta^t J_{\mu(\theta)}A_\theta$, where $A_{\theta;i,j}=\frac{\partial\mu_j}{\partial\theta_i}$. Hence, when the hypotheses are given by (23), the testing problem can be approximated by the testing of the normal distribution family with variance

$$w\cdot(A_\theta^t J_{\mu(\theta)}A_\theta)^{-1}w. \quad (34)$$

In the following, we call this value the Fisher information. Based on this value, the quality can be compared when we have several testing schemes.

B. p-values

In the following, we treat the p-values. First, we focus on the testing hypothesis (10) in the one-parameter normal distributions with the variance $v$. In this case, the p-value of the likelihood ratio tests is the function of the data $x$, the variance $v$, and the threshold $\theta_0$, which is equal to $\Phi(\frac{x-\theta_0}{\sqrt{v}})$, where

$$\Phi(x) := \int_{-\infty}^{x}\frac{1}{\sqrt{2\pi}}e^{-\frac{y^2}{2}}\,dy.$$

In the $n$-trial binomial distribution case, the distribution can be approximated by the normal distribution if $n$ is sufficiently large. In this case, the p-value of the likelihood ratio tests is the function of the data $k$, the number of trials $n$, and the threshold $\theta_0$, which is equal to

$$\Phi\!\left(\frac{k-n\theta_0}{\sqrt{n(1-\theta_0)\theta_0}}\right). \quad (35)$$

Next, focus on the Poisson distribution case $\mathrm{Poi}(\mu t)$. The p-value of the likelihood ratio tests is the function of the data $n$, the time $t$, and the threshold $\mu_0$, which is almost equal to

$$\Phi\!\left(\frac{n-\mu_0 t}{\sqrt{\mu_0 t}}\right), \quad (36)$$

when the time $t$ is sufficiently large.

Using (35), we consider the hypothesis (20) in the binomial Poisson distribution $\mathrm{Poi}(\mu_1 t,\mu_2 t)$. The p-value of the likelihood ratio tests is the function of the data $n_1$, $n_2$, and the threshold $\theta_0$, which is almost equal to

$$\Phi\!\left(\frac{n_1-(n_1+n_2)\theta_0}{\sqrt{(n_1+n_2)\theta_0(1-\theta_0)}}\right), \quad (37)$$

when the total number $n_1+n_2$ is sufficiently large.

Next, we consider the hypothesis (23) in the multinomial Poisson distribution $\mathrm{Poi}(\mu_1,\cdots,\mu_m)$. In this case, by using $\mu_i$ defined in (28) and (29), the upper bound of the p-value is approximated to

$$\max_{w\cdot\mu'=c_0}\Phi\!\left(\frac{1-\sum_{j=1}^{m}\frac{\mu'_j}{\mu_j}}{\sqrt{\sum_{i=1}^{m}\frac{\mu'_i}{\mu_i^2}}}\right) = \Phi\!\left(\max_{w\cdot\mu'=c_0}\frac{1-\sum_{j=1}^{m}\frac{\mu'_j}{\mu_j}}{\sqrt{\sum_{i=1}^{m}\frac{\mu'_i}{\mu_i^2}}}\right), \quad (38)$$

because the convergence (33) is compact uniform concerning the parameters $\mu_1,\ldots,\mu_m$. Letting $x_i=\frac{c_0}{w_i\mu_i}$ and $y_i=\frac{c_0}{w_i\mu_i^2}$, we have

$$\max_{w\cdot\mu'=c_0}\frac{1-\sum_{j=1}^{m}\frac{\mu'_j}{\mu_j}}{\sqrt{\sum_{i=1}^{m}\frac{\mu'_i}{\mu_i^2}}} = \max_{(x,y)\in\mathrm{Co}}\frac{1-x}{\sqrt{y}}, \quad (39)$$

where $\mathrm{Co}$ is the convex hull of $(x_1,y_1),\ldots,(x_m,y_m)$. That is, the p-value is given by

$$\Phi\!\left(\max_{(x,y)\in\mathrm{Co}}\frac{1-x}{\sqrt{y}}\right). \quad (40)$$
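The maximization in (40) can be carried out numerically by writing points of the convex hull Co as convex combinations of the vertices $(x_i,y_i)$; a sketch using scipy (the function and variable names are ours):

```python
import numpy as np
from math import erf, sqrt
from scipy.optimize import minimize

def p_value_upper_bound(mu, w, c0):
    """Evaluate Eq. (40) with x_i = c0/(w_i mu_i), y_i = c0/(w_i mu_i^2),
    maximizing (1 - a.x)/sqrt(a.y) over weights a on the simplex."""
    mu, w = np.asarray(mu, float), np.asarray(w, float)
    x = c0 / (w * mu)
    y = c0 / (w * mu**2)

    def neg(a):
        a = np.abs(a) / np.abs(a).sum()   # project onto the simplex
        return -(1.0 - a @ x) / np.sqrt(a @ y)

    starts = np.eye(len(x)) * 0.9 + 0.1 / len(x)   # near-vertex starts
    best = min(minimize(neg, a0).fun for a0 in starts)
    return 0.5 * (1.0 + erf(-best / sqrt(2.0)))

print(p_value_upper_bound([4.0], [1.0], 2.0))   # Phi(0.5/sqrt(0.125)) ~ 0.921
```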


VI. MODIFICATION OF VISIBILITY

In the two-photon interference, the coincidence counts on the bases $|HH\rangle$, $|VV\rangle$, $|DD\rangle$, $|XX\rangle$, $|RL\rangle$, and $|LR\rangle$ yield the maximum values (coincidence), whereas the coincidence counts on the bases $|HV\rangle$, $|VH\rangle$, $|DX\rangle$, $|XD\rangle$, $|RR\rangle$, and $|LL\rangle$ yield the minimum values (anti-coincidence). We can test the fidelity between the maximally entangled state $|\Phi^{(+)}\rangle\langle\Phi^{(+)}|$ and the given state $\sigma$, using the total coincidence count $k_1$ and the total anti-coincidence count $k_2$ obtained by measuring on all the bases with the time $\frac{t}{12}$. The total coincidence count $k_1$ obeys $\mathrm{Poi}(\lambda\frac{2\theta+1}{12}t)$, the total anti-coincidence count $k_2$ obeys the distribution $\mathrm{Poi}(\lambda\frac{2-2\theta}{12}t)$, and the Fisher information matrix concerning the parameters $\theta$ and $\lambda$ is

$$\begin{pmatrix} \lambda\left(\frac{t}{3(2\theta+1)}+\frac{t}{3(2-2\theta)}\right) & 0 \\ 0 & \frac{\frac{2\theta+1}{12}t+\frac{2-2\theta}{12}t}{\lambda} \end{pmatrix}, \quad (41)$$

where the first element corresponds to the parameter $\theta$ and the second one to the parameter $\lambda$. Then, we can apply the testing method given at the end of subsection IV C. On the basis of the discussion in subsection V A, the asymptotic variance (34) is calculated to be

$$\frac{1}{\lambda\left(\frac{t}{3(2\theta+1)}+\frac{t}{3(2-2\theta)}\right)} = \frac{(2\theta+1)(2-2\theta)}{\lambda t}. \quad (42)$$

The above method uses the ratio

$$\frac{\mu_2}{\mu_1+\mu_2},$$

where $\mu_1=\lambda\frac{2\theta+1}{12}t$ and $\mu_2=\lambda\frac{2-2\theta}{12}t$ are the expectation values of the total coincidence counts and the total anti-coincidence counts, respectively. Considering the definition of visibility, $(n_{\max}-n_{\min})/(n_{\max}+n_{\min}) = 1-2n_{\min}/(n_{\max}+n_{\min})$, we can regard the above estimation of the fidelity as a modification of the visibility in a well-defined statistical manner. We will refer to it as the modified visibility method. In the following, we will propose several designs of experiment to improve the modified visibility method.

VII. DESIGN I (λ: UNKNOWN, ONE STAGE)

In this section, we consider the problem of testing the fidelity between the maximally entangled state $|\Phi^{(+)}\rangle\langle\Phi^{(+)}|$ and the given state $\sigma$ using the data $(k_1,k_2,k_3)$ subject to the multinomial Poisson distribution $\mathrm{Poi}(\lambda\frac{2\theta+1}{6}t_1,\ \lambda\frac{2-2\theta}{6}t_2,\ \lambda t_3)$ with the assumption that the parameter $\lambda$ is unknown. In this problem, it is natural to assume that we can select the time allocation under the constraint for the total time $t_1+t_2+t_3=t$.

The performance of the time allocation $(t_1,t_2,t_3)$ is evaluated by the variance (34). The Fisher information matrix concerning the parameters $\theta$ and $\lambda$ is

$$\begin{pmatrix} \lambda\left(\frac{2t_1}{3(2\theta+1)}+\frac{2t_2}{3(2-2\theta)}\right) & \frac{t_1-t_2}{3} \\ \frac{t_1-t_2}{3} & \frac{\frac{2\theta+1}{6}t_1+\frac{2-2\theta}{6}t_2+t_3}{\lambda} \end{pmatrix}, \quad (43)$$

where the first element corresponds to the parameter $\theta$ and the second one to the parameter $\lambda$. Then, the asymptotic variance (34) is calculated as

$$\frac{\frac{2\theta+1}{6}t_1+\frac{2-2\theta}{6}t_2+t_3}{\lambda\left(\left(\frac{2\theta+1}{6}t_1+\frac{2-2\theta}{6}t_2+t_3\right)\left(\frac{2t_1}{3(2\theta+1)}+\frac{2t_2}{3(2-2\theta)}\right)-\left(\frac{t_1-t_2}{3}\right)^2\right)}. \quad (44)$$

We optimize the time allocation by minimizing the variance (44). We perform the minimization by maximizing the inverse:

$$\lambda\left(\frac{2t_1}{3(2\theta+1)}+\frac{2t_2}{3(2-2\theta)}-\frac{\left(\frac{t_1-t_2}{3}\right)^2}{\frac{2\theta+1}{6}t_1+\frac{2-2\theta}{6}t_2+t_3}\right).
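The closed form (44) can be checked against a direct numerical inversion of the matrix (43); a small sketch (the parameter values are chosen for illustration only):

```python
import numpy as np

theta, lam = 0.875, 290.0
t1, t2, t3 = 54.0, 186.0, 0.0

# Fisher information matrix of Eq. (43) for the parameters (theta, lambda).
J = np.array([
    [lam * (2*t1/(3*(2*theta+1)) + 2*t2/(3*(2-2*theta))), (t1 - t2)/3],
    [(t1 - t2)/3, ((2*theta+1)/6*t1 + (2-2*theta)/6*t2 + t3)/lam],
])

w = np.array([1.0, 0.0])            # we only test the fidelity theta
var_numeric = w @ np.linalg.inv(J) @ w

# Closed form of Eq. (44).
s = (2*theta+1)/6*t1 + (2-2*theta)/6*t2 + t3
var_closed = s / (lam * (s*(2*t1/(3*(2*theta+1)) + 2*t2/(3*(2-2*theta)))
                         - ((t1 - t2)/3)**2))
print(var_numeric, var_closed)      # the two expressions agree
```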

Applying Lemmas 1 and 2 shown in Appendix A to the case of $a=\frac{2}{3(2\theta+1)}$, $b=\frac{2}{3(2-2\theta)}$, $c=\frac{2\theta+1}{6}$, $d=\frac{2-2\theta}{6}$, we obtain

$$\text{(i)}\quad \lambda\max_{t_1+t_3=t}\left(\frac{2t_1}{3(2\theta+1)}-\frac{(\frac{t_1}{3})^2}{\frac{2\theta+1}{6}t_1+t_3}\right) = \frac{2\lambda t}{3(2\theta+1)\left(1+\sqrt{\frac{2\theta+1}{6}}\right)^2} \quad (45)$$

$$\text{(ii)}\quad \lambda\max_{t_2+t_3=t}\left(\frac{2t_2}{3(2-2\theta)}-\frac{(\frac{t_2}{3})^2}{\frac{2-2\theta}{6}t_2+t_3}\right) = \frac{2\lambda t}{3(2-2\theta)\left(1+\sqrt{\frac{2-2\theta}{6}}\right)^2} \quad (46)$$

and

$$\text{(iii)}\quad \lambda\max_{t_1+t_2=t}\left(\frac{2t_1}{3(2\theta+1)}+\frac{2t_2}{3(2-2\theta)}-\frac{(\frac{t_1-t_2}{3})^2}{\frac{2\theta+1}{6}t_1+\frac{2-2\theta}{6}t_2}\right) = \frac{\lambda\left(\frac13\sqrt{\frac{2-2\theta}{2\theta+1}}+\frac13\sqrt{\frac{2\theta+1}{2-2\theta}}\right)^2 t}{\left(\sqrt{\frac{2\theta+1}{6}}+\sqrt{\frac{2-2\theta}{6}}\right)^2} = \frac{6\lambda t}{(2\theta+1)(2-2\theta)\left(\sqrt{2\theta+1}+\sqrt{2-2\theta}\right)^2}, \quad (47)$$

using the results of (i) coincidence and total flux measurements, (ii) anti-coincidence and total flux measurements, and (iii) coincidence and anti-coincidence measurements, respectively. The ratio of (47) to (45) is equal to

$$\frac{3\left(\sqrt{6}+\sqrt{2\theta+1}\right)^2}{2(2-2\theta)\left(\sqrt{2\theta+1}+\sqrt{2-2\theta}\right)^2} > 1, \quad (48)$$

as shown in Appendix B. That is, the measurement using the coincidence and the anti-coincidence provides a better


test than that using the coincidence and the total flux. Hence, we compare (ii) with (iii), and obtain

$$\max_{t_1+t_2+t_3=t}\lambda\left(\frac{2t_1}{3(2\theta+1)}+\frac{2t_2}{3(2-2\theta)}-\frac{(\frac{t_1-t_2}{3})^2}{\frac{2\theta+1}{6}t_1+\frac{2-2\theta}{6}t_2+t_3}\right) = \begin{cases} \dfrac{4\lambda t}{(2-2\theta)(\sqrt{6}+\sqrt{2-2\theta})^2} & \text{if } \theta_1 < \theta \le 1 \\[2ex] \dfrac{6\lambda t}{(2\theta+1)(2-2\theta)(\sqrt{2\theta+1}+\sqrt{2-2\theta})^2} & \text{if } 0 \le \theta \le \theta_1, \end{cases} \quad (49)$$

where $\theta_1 < 1$ is defined by

$$\frac{2(2\theta_1+1)\left(\sqrt{2\theta_1+1}+\sqrt{2-2\theta_1}\right)^2}{3\left(\sqrt{6}+\sqrt{2-2\theta_1}\right)^2} = 1. \quad (50)$$

The approximated value of $\theta_1$ is 0.899519. The equation (49) is derived in Appendix C.
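The threshold θ1 can be reproduced numerically by solving (50) with a root finder; a sketch using scipy:

```python
from math import sqrt
from scipy.optimize import brentq

# Left-hand side of Eq. (50) minus 1; theta_1 is its root in (0, 1).
def g(th):
    return (2*(2*th + 1)*(sqrt(2*th + 1) + sqrt(2 - 2*th))**2
            / (3*(sqrt(6) + sqrt(2 - 2*th))**2) - 1.0)

theta1 = brentq(g, 0.5, 0.99)
print(theta1)   # about 0.899519
```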

Fig. 1 shows the ratio of the optimal Fisher information obtained from the result of the anti-coincidence and total flux measurements to that obtained from the result of the coincidence and anti-coincidence measurements. When $\theta_1\le\theta\le 1$, the maximum Fisher information is attained by $t_1=0$, $t_2=\frac{\sqrt{6}}{\sqrt{6}+\sqrt{2(1-\theta)}}t$, $t_3=\frac{\sqrt{2(1-\theta)}}{\sqrt{6}+\sqrt{2(1-\theta)}}t$. Otherwise, the maximum is attained by $t_1=\frac{\sqrt{2-2\theta}}{\sqrt{2\theta+1}+\sqrt{2-2\theta}}t$, $t_2=\frac{\sqrt{2\theta+1}}{\sqrt{2\theta+1}+\sqrt{2-2\theta}}t$, $t_3=0$. The optimal time allocation shown in Fig. 1 implies that we should measure the counts on the anti-coincidence bases preferentially over the other bases.

FIG. 1: The ratio of the optimal Fisher information (solid line) and the optimal time allocation as a function of the fidelity θ. The measurement time is divided into three periods: coincidence t1 (plus signs), anti-coincidence t2 (circles), and total flux t3 (squares), which are normalized as t1 + t2 + t3 = 1 in the plot.

The optimal asymptotic variance is $\frac{(2\theta+1)(2-2\theta)(\sqrt{2-2\theta}+\sqrt{1+2\theta})^2}{6\lambda t}$ when the threshold $\theta_0$ is less than $\theta_1$. This asymptotic variance is much better than that obtained by the modified visibility method. The ratio of the optimal asymptotic variance is given by

$$\frac{\left(\sqrt{2-2\theta}+\sqrt{1+2\theta}\right)^2}{6} < 1. \quad (51)$$

In the following, we give the optimal test of level $\alpha$ in the hypothesis testing (5). Assume that the threshold $\theta_0$ is less than $\theta_1$. In this case, we can apply testing of the hypothesis (20). First, we measure the two-photon coincidence count on the coincidence bases for a period of $t_1=\frac{t\sqrt{2-2\theta_0}}{\sqrt{2\theta_0+1}+\sqrt{2-2\theta_0}}$, to obtain the total count $n_1$. Then, we measure the count on the anti-coincidence bases for a period of $t_2=\frac{t\sqrt{2\theta_0+1}}{\sqrt{2\theta_0+1}+\sqrt{2-2\theta_0}}$ to obtain the total count $n_2$. Note that the optimal time allocation depends on the threshold of our hypothesis. Finally, we apply the UMP test of level $\alpha$ of the hypothesis

$$H_0: p \ge \frac{\sqrt{2-2\theta_0}}{\sqrt{2-2\theta_0}+\sqrt{1+2\theta_0}} \quad\text{versus}\quad H_1: p < \frac{\sqrt{2-2\theta_0}}{\sqrt{2-2\theta_0}+\sqrt{1+2\theta_0}}$$

with the binomial distribution family $P_p^{n_1+n_2}$ to the data $n_1$. We can apply a similar testing for $\theta_0 > \theta_1$. It is sufficient to replace the time allocation by $t_1=0$, $t_2=\frac{t\sqrt{6}}{\sqrt{6}+\sqrt{2(1-\theta_0)}}$, $t_3=\frac{t\sqrt{2(1-\theta_0)}}{\sqrt{6}+\sqrt{2(1-\theta_0)}}$.
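For the threshold θ0 = 0.875 used later in section XI, this allocation (for θ0 < θ1) works out as follows; a small sketch (the function name is ours):

```python
from math import sqrt

def design1_allocation(theta0, t):
    """Optimal coincidence/anti-coincidence time split of design I
    for a threshold theta0 below theta_1 (so t3 = 0)."""
    s1, s2 = sqrt(2*theta0 + 1), sqrt(2 - 2*theta0)
    return t*s2/(s1 + s2), t*s1/(s1 + s2)

t1, t2 = design1_allocation(0.875, 240.0)
print(t1, t2)   # about 55.6 and 184.4 seconds
```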

If the dark count parameter $\delta$ is known but is not negligible, the Fisher information matrix is given by

$$\begin{pmatrix} \lambda\left(\frac{2\lambda t_1}{3(\lambda(2\theta+1)+6\delta)}+\frac{2\lambda t_2}{3(\lambda(2-2\theta)+6\delta)}\right) & \frac{\lambda(2\theta+1)}{3(\lambda(2\theta+1)+6\delta)}t_1-\frac{\lambda(2-2\theta)}{3(\lambda(2-2\theta)+6\delta)}t_2 \\ \frac{\lambda(2\theta+1)}{3(\lambda(2\theta+1)+6\delta)}t_1-\frac{\lambda(2-2\theta)}{3(\lambda(2-2\theta)+6\delta)}t_2 & \frac{2\theta+1}{\lambda(2\theta+1)+6\delta}\frac{2\theta+1}{6}t_1+\frac{2-2\theta}{\lambda(2-2\theta)+6\delta}\frac{2-2\theta}{6}t_2+\frac{1}{\lambda}t_3 \end{pmatrix}. \quad (52)$$

Hence, from (34), the inverse of the minimum variance is equal to

$$f(t_1,t_2,t_3) := \lambda\left(\frac{2\lambda t_1}{3(\lambda(2\theta+1)+6\delta)}+\frac{2\lambda t_2}{3(\lambda(2-2\theta)+6\delta)}-\frac{\left(\frac{\lambda(2\theta+1)}{3(\lambda(2\theta+1)+6\delta)}t_1-\frac{\lambda(2-2\theta)}{3(\lambda(2-2\theta)+6\delta)}t_2\right)^2}{\frac{\lambda(2\theta+1)}{\lambda(2\theta+1)+6\delta}\frac{2\theta+1}{6}t_1+\frac{\lambda(2-2\theta)}{\lambda(2-2\theta)+6\delta}\frac{2-2\theta}{6}t_2+t_3}\right).


Then, we apply Lemmas 1 and 2 in Appendix A to $\frac{f(t_1,t_2,t_3)}{\lambda}$ with $a=\frac{2\lambda}{3(\lambda(2\theta+1)+6\delta)}$, $b=\frac{2\lambda}{3(\lambda(2-2\theta)+6\delta)}$, $c=\frac{\lambda(2\theta+1)}{\lambda(2\theta+1)+6\delta}\frac{2\theta+1}{6}$, $d=\frac{\lambda(2-2\theta)}{\lambda(2-2\theta)+6\delta}\frac{2-2\theta}{6}$, and obtain the optimized values:

(i) coincidence and total flux

$$\max_{t_1+t_3=t} f(t_1,0,t_3) = \frac{4\lambda t}{\left((2\theta+1)+\sqrt{\frac{6(\lambda(2\theta+1)+6\delta)}{\lambda}}\right)^2} \quad (53)$$

(ii) anti-coincidence and total flux

$$\max_{t_2+t_3=t} f(0,t_2,t_3) = \frac{4\lambda t}{\left((2-2\theta)+\sqrt{\frac{6(\lambda(2-2\theta)+6\delta)}{\lambda}}\right)^2} \quad (54)$$

and

(iii) coincidence and anti-coincidence

$$\max_{t_1+t_2=t} f(t_1,t_2,0) = \frac{\lambda t\left(\sqrt{ad}+\sqrt{bc}\right)^2}{\left(\sqrt{c}+\sqrt{d}\right)^2} = \frac{6\lambda^2 t}{\left((2\theta+1)\sqrt{\lambda(2-2\theta)+6\delta}+(2-2\theta)\sqrt{\lambda(2\theta+1)+6\delta}\right)^2}. \quad (55)$$

The ratio of (55) to (53) is

$$\frac{3\lambda\left((2\theta+1)+\sqrt{\frac{6(\lambda(2\theta+1)+6\delta)}{\lambda}}\right)^2}{2\left((2\theta+1)\sqrt{\lambda(2-2\theta)+6\delta}+(2-2\theta)\sqrt{\lambda(2\theta+1)+6\delta}\right)^2} = \frac{3}{2}\left(\frac{(2\theta+1)\sqrt{\lambda}+\sqrt{6(\lambda(2\theta+1)+6\delta)}}{(2\theta+1)\sqrt{\lambda(2-2\theta)+6\delta}+(2-2\theta)\sqrt{\lambda(2\theta+1)+6\delta}}\right)^2 > 1, \quad (56)$$

where the final inequality is derived in Appendix B. Therefore, the measurement using the coincidence and the anti-coincidence provides a better test than that using the coincidence and the total flux, as in the case of $\delta=0$.

Define $\delta_1$ and $\theta_{\delta'}$ for $\delta' = 6\delta/\lambda < \delta_1$ as

$$\sqrt{\delta_1+3}-\sqrt{\delta_1} = \sqrt{3/2},$$
$$\sqrt{1+2\theta_{\delta'}+\delta'}-\sqrt{2-2\theta_{\delta'}+\delta'} = \sqrt{3/2}.$$

The parameter $\delta_1$ is calculated to be 0.375. As shown in Appendix C, the measurement using the coincidence and the anti-coincidence provides a better test than that using the anti-coincidence and the total flux if the fidelity is smaller than the threshold $\theta_{\delta'}$:

$$\max_{t_1+t_2+t_3=t} f(t_1,t_2,t_3) = \begin{cases} \dfrac{4\lambda^2 t}{\left((2-2\theta)\sqrt{\lambda}+\sqrt{6(\lambda(2-2\theta)+6\delta)}\right)^2} & \text{if } \theta > \theta_{\delta'} \\[2ex] \dfrac{6\lambda^2 t}{\left((2\theta+1)\sqrt{\lambda(2-2\theta)+6\delta}+(2-2\theta)\sqrt{\lambda(2\theta+1)+6\delta}\right)^2} & \text{otherwise.} \end{cases} \quad (57)$$

The optimal time allocation is given by

$$t_1=0,\quad t_2=\frac{t\sqrt{6(\lambda(2-2\theta)+6\delta)}}{\sqrt{6(\lambda(2-2\theta)+6\delta)}+(2-2\theta)\sqrt{\lambda}},\quad t_3=\frac{t(2-2\theta)\sqrt{\lambda}}{\sqrt{6(\lambda(2-2\theta)+6\delta)}+(2-2\theta)\sqrt{\lambda}}$$

for $\theta > \theta_{\delta'}$, and

$$t_1=\frac{t(2-2\theta)\sqrt{\lambda(2\theta+1)+6\delta}}{(2-2\theta)\sqrt{\lambda(2\theta+1)+6\delta}+(2\theta+1)\sqrt{\lambda(2-2\theta)+6\delta}},\quad t_2=\frac{t(2\theta+1)\sqrt{\lambda(2-2\theta)+6\delta}}{(2-2\theta)\sqrt{\lambda(2\theta+1)+6\delta}+(2\theta+1)\sqrt{\lambda(2-2\theta)+6\delta}},\quad t_3=0$$

for $\theta \le \theta_{\delta'}$. The threshold $\theta_{\delta'}$ for the optimal time allocation increases with the normalized dark count, as illustrated in Fig. 2.

FIG. 2: The threshold θδ′ for optimal time allocation as a function of normalized dark counts δ′.
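The constants δ1 and θδ′ defined above can be reproduced numerically; a sketch using scipy:

```python
from math import sqrt
from scipy.optimize import brentq

# delta_1 solves sqrt(delta + 3) - sqrt(delta) = sqrt(3/2).
delta1 = brentq(lambda d: sqrt(d + 3) - sqrt(d) - sqrt(1.5), 0.0, 1.0)

# theta_{delta'} solves sqrt(1 + 2 th + d') - sqrt(2 - 2 th + d') = sqrt(3/2).
def theta_threshold(dp):
    return brentq(lambda th: sqrt(1 + 2*th + dp) - sqrt(2 - 2*th + dp)
                  - sqrt(1.5), 0.0, 1.0)

print(delta1)               # 0.375
print(theta_threshold(0.0)) # recovers theta_1 = 0.8995... at delta' = 0
```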

VIII. DESIGN II (λ: KNOWN, ONE STAGE)

In this section, we consider the case where $\lambda$ is known. Then, the Fisher information is

$$\lambda\left(\frac{2\lambda t_1}{3(\lambda(2\theta+1)+6\delta)}+\frac{2\lambda t_2}{3(\lambda(2-2\theta)+6\delta)}\right). \quad (58)$$

The maximum value is calculated as

$$\max_{t_1+t_2=t,\ t_3=0}(58) = \begin{cases} \dfrac{2\lambda^2 t}{3(\lambda(2\theta+1)+6\delta)} & \text{if } \theta < \frac14 \\[2ex] \dfrac{2\lambda^2 t}{3(\lambda(2-2\theta)+6\delta)} & \text{if } \theta \ge \frac14. \end{cases} \quad (59)$$

The above optimization shows that when $\theta\ge\frac14$, the anti-coincidence count ($t_1=0$, $t_2=t$) is better than the coincidence count ($t_1=t$, $t_2=0$). In fact, Barbieri et al. [12] measured the sum of the counts on the anti-coincidence bases $|HV\rangle$, $|VH\rangle$, $|DX\rangle$, $|XD\rangle$, $|RR\rangle$, $|LL\rangle$ to realize the entanglement witness in their experiment. In this case, the variance is $\frac{3(\lambda(2-2\theta)+6\delta)}{2\lambda^2 t}$. When we observe the sum of the counting numbers $n_2$, the estimated value of $\theta$ is given by $1+\frac{3}{\lambda}(\delta-\frac{n_2}{t})$, which is the solution of $(\lambda\frac{2-2\theta}{6}+\delta)t=n_2$. The UMP test is given from the UMP test of the Poisson distribution.

IX. COMPARISON OF THE ASYMPTOTIC VARIANCES

We compare the asymptotic variances of the following designs for time allocation, when the dark count parameter $\delta$ is zero.

(i) Modified visibility: The asymptotic variance is $\frac{(2\theta+1)(2-2\theta)}{\lambda t}$.

(iia) Design I ($\lambda$ unknown), optimal time allocation between the anti-coincidence count and the coincidence count: The asymptotic variance is $\frac{(2\theta+1)(2-2\theta)(\sqrt{2\theta+1}+\sqrt{2-2\theta})^2}{6\lambda t}$.

(iib) Design I ($\lambda$ unknown), optimal time allocation between the anti-coincidence count and the total flux count: The asymptotic variance is $\frac{(2-2\theta)(\sqrt{6}+\sqrt{2-2\theta})^2}{4\lambda t}$.

(iiia) Design II ($\lambda$ known), estimation from the anti-coincidence count: The asymptotic variance is $\frac{3(2-2\theta)}{2\lambda t}$.

(iiib) Design II ($\lambda$ known), estimation from the coincidence count: The asymptotic variance is $\frac{3(2\theta+1)}{2\lambda t}$.

Fig. 3 shows the comparison, where the asymptotic variances in (iia)-(iiib) are normalized by the one in (i). The anti-coincidence measurement provides the best estimation for high ($\theta > 0.25$) fidelity. When $\lambda$ is unknown, the measurement with the anti-coincidence count and the coincidence count is better than that with the anti-coincidence count and the total flux count for $\theta < 0.899519$. For higher fidelity, the anti-coincidence count and the total flux count turns out to be better, but the difference is small.
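The comparison can be reproduced by evaluating the five expressions directly; a sketch (the function name is ours; the common factor λt is normalized out):

```python
from math import sqrt

def normalized_variances(theta):
    """Asymptotic variances of (iia)-(iiib) normalized by the modified
    visibility variance (i) = (2 theta + 1)(2 - 2 theta)/(lambda t)."""
    v0 = (2*theta + 1)*(2 - 2*theta)
    return {
        "iia": (sqrt(2*theta + 1) + sqrt(2 - 2*theta))**2 / 6,
        "iib": (2 - 2*theta)*(sqrt(6) + sqrt(2 - 2*theta))**2 / (4*v0),
        "iiia": 3*(2 - 2*theta) / (2*v0),
        "iiib": 3*(2*theta + 1) / (2*v0),
    }

print(normalized_variances(0.5))    # (iia) below (iib) for theta < theta_1
print(normalized_variances(0.95))   # the order flips above theta_1
```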

FIG. 3: Comparison of the designs for time allocation. The asymptotic variances normalized by the value of the modified visibility method are shown as a function of fidelity, where dots: (iia), solid: (iib), thick: (iiia), and dash: (iiib).


X. DESIGN III (λ: KNOWN, TWO STAGE)

A. Optimal Allocation

The comparison in the previous section shows that the measurement on the anti-coincidence bases yields a better variance than the measurement on the coincidence bases, when the fidelity is close to 1 and the parameters $\lambda$ and $\delta$ are known. We will explore further improvement in the measurement on the anti-coincidence bases. In the previous sections, we allocated an equal time to the measurement on each of the anti-coincidence bases. Here we minimize the variance by optimizing the time allocation $t_{HV}$, $t_{VH}$, $t_{DX}$, $t_{XD}$, $t_{RR}$, and $t_{LL}$ between the anti-coincidence bases $|HV\rangle$, $|VH\rangle$, $|DX\rangle$, $|XD\rangle$, $|RR\rangle$, and $|LL\rangle$, respectively. The number of the coincidence counts $n_{xy}$ obeys the Poisson distribution $\mathrm{Poi}((\lambda\mu_{xy}+\delta)t_{xy})$ with unknown parameter $\mu_{xy}$. Then, the Fisher information matrix is the diagonal matrix with the diagonal elements

$$\left(\frac{\lambda^2 t_{HV}}{\lambda\mu_{HV}+\delta},\ \frac{\lambda^2 t_{VH}}{\lambda\mu_{VH}+\delta},\ \frac{\lambda^2 t_{DX}}{\lambda\mu_{DX}+\delta},\ \frac{\lambda^2 t_{XD}}{\lambda\mu_{XD}+\delta},\ \frac{\lambda^2 t_{RR}}{\lambda\mu_{RR}+\delta},\ \frac{\lambda^2 t_{LL}}{\lambda\mu_{LL}+\delta}\right).$$

Since we are interested in the parameter $1-\theta = \frac12(\mu_{HV}+\mu_{VH}+\mu_{DX}+\mu_{XD}+\mu_{RR}+\mu_{LL})$, the variance is given by

$$\frac14\left(\frac{\lambda\mu_{HV}+\delta}{\lambda^2 t_{HV}}+\frac{\lambda\mu_{VH}+\delta}{\lambda^2 t_{VH}}+\frac{\lambda\mu_{DX}+\delta}{\lambda^2 t_{DX}}+\frac{\lambda\mu_{XD}+\delta}{\lambda^2 t_{XD}}+\frac{\lambda\mu_{RR}+\delta}{\lambda^2 t_{RR}}+\frac{\lambda\mu_{LL}+\delta}{\lambda^2 t_{LL}}\right), \quad (60)$$

as mentioned in section V A. Under the restriction of the measurement time $t_{HV}+t_{VH}+t_{DX}+t_{XD}+t_{RR}+t_{LL}=t$, the minimum value of (60) is

$$\frac{\left(\sqrt{\lambda\mu_{HV}+\delta}+\sqrt{\lambda\mu_{VH}+\delta}+\sqrt{\lambda\mu_{DX}+\delta}+\sqrt{\lambda\mu_{XD}+\delta}+\sqrt{\lambda\mu_{RR}+\delta}+\sqrt{\lambda\mu_{LL}+\delta}\right)^2}{4\lambda^2 t}, \quad (61)$$

which is attained by the optimal time allocation

$$t_{xy} = \frac{\sqrt{\lambda\mu_{xy}+\delta}\;t}{\sqrt{\lambda\mu_{HV}+\delta}+\sqrt{\lambda\mu_{VH}+\delta}+\sqrt{\lambda\mu_{DX}+\delta}+\sqrt{\lambda\mu_{XD}+\delta}+\sqrt{\lambda\mu_{RR}+\delta}+\sqrt{\lambda\mu_{LL}+\delta}}, \quad (62)$$

called the Neyman allocation. The variance with the equal allocation is

$$\frac{3(\lambda(2-2\theta)+6\delta)}{2\lambda^2 t} = \frac{3(\lambda(\mu_{HV}+\mu_{VH}+\mu_{DX}+\mu_{XD}+\mu_{RR}+\mu_{LL})+6\delta)}{2\lambda^2 t}. \quad (63)$$

The inequality (61) $\le$ (63) can be derived from Schwarz's inequality applied to the vectors $(1,\ldots,1)$ and $(\sqrt{\lambda\mu_{HV}+\delta},\ldots,\sqrt{\lambda\mu_{LL}+\delta})$. The equality holds if and only if $\mu_{HV}=\mu_{VH}=\mu_{DX}=\mu_{XD}=\mu_{RR}=\mu_{LL}$. Therefore, the Neyman allocation has an advantage over the equal allocation when there is a bias in the parameters $\mu_{HV},\mu_{VH},\mu_{DX},\mu_{XD},\mu_{RR},\mu_{LL}$. In other words, the Neyman allocation is effective when the expectation values of the coincidence counts on some bases are larger than those on the other bases.
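The gain of the Neyman allocation (61)-(62) over the equal allocation (63) is easy to quantify numerically; a sketch with illustrative, made-up error parameters µxy (not measured values):

```python
from math import sqrt

lam, delta, t = 290.0, 0.0, 240.0
# Hypothetical anisotropic error parameters mu_xy for the six bases.
mu = [0.010, 0.005, 0.050, 0.070, 0.040, 0.075]

s = [sqrt(lam*m + delta) for m in mu]

# Eq. (61): minimum variance under the Neyman allocation (62).
var_neyman = sum(s)**2 / (4 * lam**2 * t)

# Eq. (63): variance under the equal allocation t_xy = t/6.
var_equal = 6 * sum(lam*m + delta for m in mu) / (4 * lam**2 * t)

print(var_neyman, var_equal)   # Neyman never exceeds equal (Schwarz)
```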

B. Two-stage Method

The optimal time allocation derived above is not applicable in the experiment, because it depends on the unknown parameters $\mu_{HV}$, $\mu_{VH}$, $\mu_{DX}$, $\mu_{XD}$, $\mu_{RR}$, and $\mu_{LL}$. In order to resolve this problem, we introduce a two-stage method, where the total measurement time $t$ is divided into $t_f$ for the first stage and $t_s$ for the second stage under the condition $t=t_f+t_s$. In the first stage, we measure the coincidence counts on each basis for $t_f/6$ and estimate the expectation values for the Neyman allocation on the measurement time $t_s$. In the second stage, we measure the coincidence counts on a basis $|x_Ay_B\rangle$ according to the estimated Neyman allocation. The two-stage method is formulated as follows.

(i) The measurement time for each basis in the first stage is given by $t_f/6$.

(ii) In the second stage, we measure the coincidence counts on a basis $|x_Ay_B\rangle$ for $t_{xy}$ defined as

$$t_{xy} = \frac{\sqrt{m_{xy}}}{\sum_{(x,y)\in B}\sqrt{m_{xy}}}\,(t-t_f),$$

where $m_{xy}$ is the observed count in the first stage.

(iii) Define $\mu_{xy}$ and $\theta$ as

$$\mu_{xy} = \frac{1}{\lambda}\left(\frac{n_{xy}}{t_{xy}}-\delta\right),$$

$$\theta = 1-\frac12\sum_{(x,y)\in B}\mu_{x,y},$$

where $n_{x,y}$ is the number of the counts on $|x_Ay_B\rangle$ for $t_{xy}$. We test the hypothesis (5) by

$$T = \begin{cases} 0 & \text{if } \theta \le c_0, \\ 1 & \text{if } \theta > c_0, \end{cases}$$

where $c_0$ is a constant which attains the level $\alpha$.


XI. ANALYSIS OF EXPERIMENTAL DATA

The experimental set-up for the hypothesis testing is shown in Fig. 4. The nonlinear crystals (BBO), the optical axes of which were set orthogonal to one another, were pumped by a pulsed UV light polarized in the 45° direction to the optical axes of the crystals. One nonlinear crystal generates two photons polarized in the horizontal direction ($|HH\rangle$) from the vertical component of the pump light, and the other generates photons polarized in the vertical direction ($|VV\rangle$) from the horizontal component of the pump. The second harmonic of the mode-locked Ti:S laser light of about 100 fs duration and 150 mW average power was used to pump the nonlinear crystals. The wavelength of the SPDC photons was thus 800 nm. The group velocity dispersion and birefringence in the crystal may shift the space-time positions of the generated photons and make the two processes distinguishable [16]. Fortunately, this timing information can be erased by compensation; the horizontal component of the pump pulse should arrive at the nonlinear crystals earlier than the vertical component. The compensation can be done by putting a set of birefringent plates (quartz) and a variable wave plate before the crystals. We could control the two-photon state from highly entangled states to separable states by shifting the compensation from the optimal setting.

The coincidence count on the basis $|x_Ay_B\rangle$ was measured by adjusting the half wave plates (HWPs) and the quarter wave plates (QWPs) in Fig. 4. We accumulated the coincidence counts for one second, and recorded the counts every second. Therefore, the time allocated to the measurement on a basis must be an integral multiple of one second. Figure 5 shows the histogram of the coincidence counts in one second on the bases

$$B = \{|VH\rangle, |HV\rangle, |XD\rangle, |DX\rangle, |RR\rangle, |LL\rangle\},$$

when the visibility of the two-photon states was estimated to be 0.92. The measurement time was 40 seconds on each basis. The distribution of the coincidence events obeys the Poisson distribution. Only small numbers of coincidences were observed on the $|HV\rangle$ and $|VH\rangle$ bases. These observations agree with the prediction; therefore, we expect that the hypothesis testing in the previous section can be applied.

In the following, we compare four testing methods on the experimental data with the fixed total time $t$. The testing methods employ different time allocations $\{t_{HH}, t_{VV}, t_{DD}, t_{XX}, t_{RL}, t_{LR}, t_{HV}, t_{VH}, t_{DX}, t_{XD}, t_{RR}, t_{LL}\}$ between the measurement bases:

(i) Modified visibility method: $\lambda$ is unknown. The coincidence and the anti-coincidence are measured with the equal time allocation;

$$t_{HH}=t_{VV}=t_{DD}=t_{XX}=t_{RL}=t_{LR}=t_{HV}=t_{VH}=t_{DX}=t_{XD}=t_{RR}=t_{LL}=\frac{t}{12}. \quad (64)$$

FIG. 4: Schematic of the entangled photon pair generation by spontaneous parametric down conversion. A cascade of the nonlinear crystals (NLC) generates the photon pairs. Group velocity dispersion and birefringence in the NLCs are pre-compensated with quartz plates and a Berek compensator. Two-photon states are analyzed with half wave plates (HWP), quarter wave plates (QWP), and polarization beam splitters (PBS). Interference filters (IF) are placed before the single photon counting modules (SPCM).

FIG. 5: Distribution of the coincidence counts obtained in one second on the bases |VH⟩, |HV⟩, |XD⟩, |DX⟩, |RR⟩, and |LL⟩. Bars present the histograms of the measured numbers, and lines show the Poisson distribution with the mean values estimated from the experiment. Measurement time was 40 seconds for each basis.

(ii) Design I: $\lambda$ is unknown. The coincidence and anti-coincidence counts are measured with the optimal time allocation at the target threshold $\theta_0$;

$$t_{HH}=t_{VV}=t_{DD}=t_{XX}=t_{RL}=t_{LR}=\frac{t_1}{6},\quad t_{HV}=t_{VH}=t_{DX}=t_{XD}=t_{RR}=t_{LL}=\frac{t_2}{6}, \quad (65)$$

where

$$t_1=\frac{t\sqrt{2-2\theta_0}}{\sqrt{2\theta_0+1}+\sqrt{2-2\theta_0}},\quad t_2=\frac{t\sqrt{2\theta_0+1}}{\sqrt{2\theta_0+1}+\sqrt{2-2\theta_0}}. \quad (66)$$

(iii) Design II: $\lambda$ is known. Only the anti-coincidence counts are measured with the equal time allocation;

$$t_{HV}=t_{VH}=t_{DX}=t_{XD}=t_{RR}=t_{LL}=\frac{t}{6}. \quad (67)$$

(iv) Design III: $\lambda$ is known. Only the anti-coincidence counts are measured. The time allocation is given by the two-stage method:

$$t_{HV}=t_{VH}=t_{DX}=t_{XD}=t_{RR}=t_{LL}=\frac{t_f}{6} \quad (68)$$

in the first stage, and

$$t_{xy}=\frac{\sqrt{m_{xy}}}{\sum_{(x,y)\in B}\sqrt{m_{xy}}}\,(t-t_f) \quad (69)$$

in the second stage. The observed count $m_{xy}$ in the first stage determines the time allocation in the second stage.

We have compared the p-values at the fixed threshold $\theta_0 = 7/8 = 0.875$ with the total measurement time $t = 240$ seconds. As shown in section III B, the p-value measures the minimum risk probability to reject the hypothesis $H_0$, i.e., the probability of making an erroneous decision to accept insufficiently entangled states with the fidelity less than the threshold. The results of the experiment and the analysis of the obtained data are described in the following.

In the method (i), we measured the coincidence on each basis for 20 seconds. Using the total coincidence count $n_1$ and the total anti-coincidence count $n_2$, and applying (37), we calculated the p-value approximately by $\Phi\!\left(\frac{n_2(2\theta_0+1)-n_1(2-2\theta_0)}{\sqrt{(n_1+n_2)(2\theta_0+1)(2-2\theta_0)}}\right)$. We obtained $n_1=9686$ and $n_2=868$ in the experiment, which yielded the p-value 0.343.

In the method (ii), the optimal time allocation was calculated with (66) to be $t_1=55.6$ seconds and $t_2=184.4$ seconds. However, since the time allocation should be an integral multiple of a second in our experiment, we used the time allocation $t_1=54$ and $t_2=186$. That is, we measured the coincidence count on each coincidence basis for 9 seconds and on each anti-coincidence basis for 31 seconds. Using the total coincidence count $n_1$ and the total anti-coincidence count $n_2$, and applying (37), we calculated the p-value approximately by $\Phi\!\left(\frac{n_2\cdot(t_1/6)(2\theta_0+1)-n_1\cdot(t_2/6)(2-2\theta_0)}{\sqrt{(n_1+n_2)\cdot(t_1/6)(2\theta_0+1)\cdot(t_2/6)(2-2\theta_0)}}\right)$. We obtained $n_1=7239$ and $n_2=2188$ in the experiment, which yielded the p-value 0.0736.

In the method (iii), we measured the coincidence count on each anti-coincidence basis for 40 seconds. Using the total anti-coincidence count $n$, and applying (37), we calculated the p-value approximately by $\Phi\!\left(\frac{n-\lambda(t/6)(2-2\theta_0)}{\sqrt{\lambda(t/6)(2-2\theta_0)}}\right)$. We used $\lambda=290$ estimated from another experiment. We obtained $n=2808$ in the experiment, which yielded the p-value 0.0438.
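The three p-values quoted above can be reproduced from the reported counts; a sketch (Φ is implemented with the error function):

```python
from math import erf, sqrt

def Phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

theta0, lam, t = 0.875, 290.0, 240.0

# (i) modified visibility: n1 = 9686 coincidences, n2 = 868 anti-coincidences.
n1, n2 = 9686, 868
p1 = Phi((n2*(2*theta0 + 1) - n1*(2 - 2*theta0))
         / sqrt((n1 + n2)*(2*theta0 + 1)*(2 - 2*theta0)))

# (ii) design I with t1 = 54 s, t2 = 186 s: n1 = 7239, n2 = 2188.
t1, t2, n1, n2 = 54.0, 186.0, 7239, 2188
p2 = Phi((n2*(t1/6)*(2*theta0 + 1) - n1*(t2/6)*(2 - 2*theta0))
         / sqrt((n1 + n2)*(t1/6)*(2*theta0 + 1)*(t2/6)*(2 - 2*theta0)))

# (iii) design II: total anti-coincidence count n = 2808 over t = 240 s.
n = 2808
p3 = Phi((n - lam*(t/6)*(2 - 2*theta0)) / sqrt(lam*(t/6)*(2 - 2*theta0)))

print(p1, p2, p3)   # approximately 0.343, 0.0736, 0.0438
```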

In the method (iv), the calculation is rather complicated. Similarly to (iii), $\lambda$ was estimated to be 290 from another experiment. In the first stage, we measured the coincidence count on each anti-coincidence basis for $t_f/6=1$ second. We obtained the counts 6, 3, 13, 20, 11, and 23 on the bases $|HV\rangle$, $|VH\rangle$, $|DX\rangle$, $|XD\rangle$, $|RR\rangle$, and $|LL\rangle$, respectively. We made the time allocation of the remaining 234 seconds for the second stage according to (69), and obtained $t_{HV}=28.14$, $t_{VH}=19.90$, $t_{DX}=41.42$, $t_{XD}=51.37$, $t_{RR}=38.10$, and $t_{LL}=55.09$. Since the time allocation should be an integral multiple of a second in our experiment, we used the time allocation $\{t_{HV}, t_{VH}, t_{DX}, t_{XD}, t_{RR}, t_{LL}\}=\{28, 20, 42, 51, 38, 55\}$. We obtained the anti-coincidence counts $n_{HV}=99$, $n_{VH}=66$, $n_{DX}=703$, $n_{XD}=863$, $n_{RR}=531$, and $n_{LL}=853$. Applying the counts and the time allocation to the formula (40), we obtained the p-value 0.0308.
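The second-stage Neyman time allocation follows from the first-stage counts by formula (69); a sketch reproducing the numbers above:

```python
from math import sqrt

# First-stage counts (1 s per anti-coincidence basis) from the experiment.
m = {"HV": 6, "VH": 3, "DX": 13, "XD": 20, "RR": 11, "LL": 23}
t, tf = 240.0, 6.0

total = sum(sqrt(v) for v in m.values())
alloc = {k: sqrt(v)/total*(t - tf) for k, v in m.items()}
print(alloc)   # close to 28.14, 19.90, 41.42, 51.37, 38.10, 55.09 seconds
```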

The p-values obtained in the four methods are summarized in the table. We also calculated the p-values at different values of the threshold $\theta_0$. The results are shown in Figs. 6 and 7. As clearly seen, the optimal time allocation between the coincidence bases measurement and the anti-coincidence bases measurement reduces the risk of a wrong decision on the fidelity (the p-value) in analyzing the experimental data. The counts on the anti-coincidence bases are much more sensitive to the degradation of the entanglement. This matches our intuition that the deviation from zero provides a more efficient measure than the deviation from the maximum does. The comparison between (iii) and (iv) shows that the risk can be reduced further by the time allocation between the anti-coincidence bases, as shown in Fig. 7. The optimal (Neyman) allocation implies that the measurement time should be allocated preferably to the bases that yield more coincidence counts. Under the present experimental conditions, the optimal allocation reduces the risk probability to about 75%. The improvement should increase with the fidelity. However, the experiment showed almost no gain when the visibility was larger than 0.95. At such high visibility, errors from the maximally entangled state are covered by dark counts, which are independent of the setting of the measurement apparatus.

Design: (i) (ii) (iii) (iv)
p-value at θ0 = 0.875: 0.343 0.0736 0.0438 0.0308


FIG. 6: Calculated p-value as a function of the threshold. Dash-dot: (i) the modified visibility, dash: (ii) design I, dot: (iii) design II, solid: (iv) design III.

FIG. 7: Calculated p-value as a function of the threshold (magnified). Dash: (ii) design I, dots: (iii) design II, solid: (iv) design III.

XII. CONCLUSION

We have formulated the hypothesis testing scheme to test the entanglement of the two-photon state generated by SPDC. Our statistical method can handle the fluctuation in the experimental data properly. It has been shown that the optimal time allocation improves the test: the measurement time should be allocated preferably to the anti-coincidence bases. This design is particularly useful for the experimental test, because the optimal time allocation depends only on the threshold of the test. We do not need any further information on the probability distribution or the tested state. The test can be further improved by optimizing the time allocation between the anti-coincidence bases, when the error from the maximally entangled state is anisotropic. However, this time allocation requires the expectation values of the coincidence counts, so that we need to apply the two-stage method.

APPENDIX A: OPTIMIZATION OF FISHER INFORMATION

In this section, we maximize the quantities appearing in the Fisher information.

Lemma 1 The equation

$$\max_{t_1,t_3\ge 0,\ t_1+t_3=t}\left(at_1-\frac{ac\,t_1^2}{ct_1+t_3}\right)=\frac{at}{(\sqrt{c}+1)^2} \quad (A1)$$

holds, and the maximum value is attained when $t_1=\frac{t}{\sqrt{c}+1}$, $t_3=\frac{\sqrt{c}\,t}{\sqrt{c}+1}$.

Proof: Letting $x := ct_1+t_3$, we have $t_1=\frac{x-t}{c-1}$. Then,

$$at_1-\frac{ac\,t_1^2}{ct_1+t_3}=\frac{a}{(c-1)^2}\left(-x-\frac{ct^2}{x}+(c+1)t\right).$$

Hence, the maximum is attained at $x=\sqrt{c}\,t$, i.e., $t_1=\frac{t}{\sqrt{c}+1}$ and $t_3=\frac{\sqrt{c}\,t}{\sqrt{c}+1}$. Thus,

$$\max_{t_1,t_3\ge 0,\ t_1+t_3=t}\left(at_1-\frac{ac\,t_1^2}{ct_1+t_3}\right)=\frac{a}{(c-1)^2}\left(-2\sqrt{c}\,t+(c+1)t\right)=\frac{at}{(\sqrt{c}+1)^2}.

Lemma 2 The equation

$$\max_{t_1,t_2\ge 0,\ t_1+t_2=t}\left(at_1+bt_2-\frac{(\sqrt{ac}\,t_1-\sqrt{bd}\,t_2)^2}{ct_1+dt_2}\right)=\frac{t(\sqrt{ad}+\sqrt{bc})^2}{(\sqrt{c}+\sqrt{d})^2} \quad (A2)$$

holds, and this maximum value is attained when $t_1=\frac{\sqrt{d}\,t}{\sqrt{c}+\sqrt{d}}$, $t_2=\frac{\sqrt{c}\,t}{\sqrt{c}+\sqrt{d}}$.

Proof: Letting $x := ct_1+dt_2$, we have $t_1=\frac{dt-x}{d-c}$ and $t_2=\frac{x-ct}{d-c}$. Then,

$$at_1+bt_2-\frac{(\sqrt{ac}\,t_1-\sqrt{bd}\,t_2)^2}{ct_1+dt_2}=\left(\frac{\sqrt{ad}+\sqrt{bc}}{d-c}\right)^2\left((c+d)t-x-\frac{cd\,t^2}{x}\right).$$

Hence, the maximum is attained at $x=\sqrt{cd}\,t$, i.e., $t_1=\frac{\sqrt{d}\,t}{\sqrt{c}+\sqrt{d}}$ and $t_2=\frac{\sqrt{c}\,t}{\sqrt{c}+\sqrt{d}}$. Thus,

$$\max_{t_1,t_2\ge 0,\ t_1+t_2=t}\left(at_1+bt_2-\frac{(\sqrt{ac}\,t_1-\sqrt{bd}\,t_2)^2}{ct_1+dt_2}\right)=\left(\frac{\sqrt{ad}+\sqrt{bc}}{d-c}\right)^2\left((c+d)t-2\sqrt{cd}\,t\right)=\frac{t(\sqrt{ad}+\sqrt{bc})^2}{(\sqrt{c}+\sqrt{d})^2}.
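Lemma 2 can be sanity-checked numerically by a dense grid over the constraint t1 + t2 = t; a sketch with arbitrary positive constants:

```python
import numpy as np

a, b, c, d, t = 1.0, 2.0, 3.0, 4.0, 1.0

t1 = np.linspace(0.0, t, 200_001)
t2 = t - t1
vals = a*t1 + b*t2 - (np.sqrt(a*c)*t1 - np.sqrt(b*d)*t2)**2 / (c*t1 + d*t2)

closed = t * (np.sqrt(a*d) + np.sqrt(b*c))**2 / (np.sqrt(c) + np.sqrt(d))**2
print(vals.max(), closed)   # the grid maximum matches the closed form
```

The maximizer on the grid also sits at the predicted point t1 = sqrt(d) t / (sqrt(c) + sqrt(d)).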

Further, the three-parameter case can be maximized as follows.

Lemma 3 The maximum value

$$\max_{t_1,t_2,t_3\ge 0,\ t_1+t_2+t_3=t}\left(at_1+bt_2-\frac{(\sqrt{ac}\,t_1-\sqrt{bd}\,t_2)^2}{ct_1+dt_2+t_3}\right)$$

is equal to the maximum among the three values

$$\max_{t_1,t_3\ge 0,\ t_1+t_3=t}\left(at_1-\frac{ac\,t_1^2}{ct_1+t_3}\right),\quad \max_{t_2,t_3\ge 0,\ t_2+t_3=t}\left(bt_2-\frac{bd\,t_2^2}{dt_2+t_3}\right),\quad \max_{t_1,t_2\ge 0,\ t_1+t_2=t}\left(at_1+bt_2-\frac{(\sqrt{ac}\,t_1-\sqrt{bd}\,t_2)^2}{ct_1+dt_2}\right).$$

Proof: Define two parameters x := ct1 + dt2 + t3 andy :=

√cdt1 −

√bdt2. Then, the range of x and y forms a

convex set. Since

t1 =

√bd(x− t) + (d− 1)y√bd(c− 1) +

√ac(d− 1)

,

t2 =

√ac(x− t) − (c− 1)y

√ac(c− 1) +

√bd(d− 1)

.

Hence,

at1 + bt2 −(√act1 −

√bdt2)

2

ct1 + dt2 + t3

=

(

a√bd√

bd(c− 1) +√ac(d− 1)

+b√ac√

ac(c− 1) +√bd(d− 1)

)

(x− t)

+

(

a(d− 1)√bd(c− 1) +

√ac(d− 1)

− b(c− 1)√ac(c− 1) +

√bd(d− 1)

)

y − y2

x

= − 1

x(y − 1

2Bx)2 + (

B2

4+A)x−At,

where A := a√

bd√bd(c−1)+

√ac(d−1)

+ b√

ac√ac(c−1)+

√bd(d−1)

, B :=

a(d−1)√bd(c−1)+

√ac(d−1)

− b(c−1)√ac(c−1)+

√bd(d−1)

. Applying Lemma

4, we obtain this lemma.
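The boundary reduction of Lemma 3 can be spot-checked numerically (our illustration, arbitrary parameter values): the maximum over the full simplex should coincide with the maximum over its three edges.

```python
# Check of Lemma 3: on the simplex t1 + t2 + t3 = t, the maximum of
# a*t1 + b*t2 - (sqrt(ac)*t1 - sqrt(bd)*t2)^2/(c*t1 + d*t2 + t3)
# coincides with the maximum over the edges t2 = 0, t1 = 0 and t3 = 0.
import math

def f(a, b, c, d, t1, t2, t3):
    y = math.sqrt(a * c) * t1 - math.sqrt(b * d) * t2
    return a * t1 + b * t2 - y ** 2 / (c * t1 + d * t2 + t3)

a, b, c, d, t = 1.5, 0.7, 2.0, 5.0, 3.0   # arbitrary positive test values
n = 400
grid = [k * t / n for k in range(n + 1)]
simplex_max = max(
    f(a, b, c, d, grid[i], grid[j], t - grid[i] - grid[j])
    for i in range(n + 1) for j in range(n + 1 - i)
)
edge_max = max(
    max(f(a, b, c, d, s, 0.0, t - s) for s in grid),   # t2 = 0
    max(f(a, b, c, d, 0.0, s, t - s) for s in grid),   # t1 = 0
    max(f(a, b, c, d, s, t - s, 0.0) for s in grid),   # t3 = 0
)
```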

Lemma 4 Define the function $f(x,y):=-\frac{1}{x}(y-\alpha x)^2+\beta x$ on a closed convex set $C$ contained in the half-plane $x>0$. The maximum value is realized at the boundary $\mathrm{bd}\,C$.

Proof: The condition can be classified into two cases: (i) $\mathrm{bd}\,C\cap\{y=\alpha x\}=\emptyset$; (ii) $\mathrm{bd}\,C\cap\{y=\alpha x\}\neq\emptyset$. In case (i), when $x$ is fixed, $\max_{y:(x,y)\in C}f(x,y)=\max_{y:(x,y)\in\mathrm{bd}\,C}f(x,y)$. Then, we obtain $\max_{(x,y)\in C}f(x,y)=\max_{(x,y)\in\mathrm{bd}\,C}f(x,y)$. In case (ii), when $(x,\alpha x)\in C$, $\max_{y:(x,y)\in C}f(x,y)=f(x,\alpha x)=\beta x$. Hence, $\max_{x:(x,\alpha x)\in C}\max_{y:(x,y)\in C}f(x,y)=\max_{x:(x,\alpha x)\in C}\beta x$. This maximum is attained at $x=\max\{x\,|\,(x,\alpha x)\in C\}$ or $x=\min\{x\,|\,(x,\alpha x)\in C\}$, and these points belong to the boundary $\mathrm{bd}\,C$. Further, $\max_{x:(x,\alpha x)\notin C}\max_{y:(x,y)\in C}f(x,y)=\max_{x:(x,\alpha x)\notin C}\max_{y:(x,y)\in\mathrm{bd}\,C}f(x,y)$. Thus, the proof is completed.
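Lemma 4 can be illustrated numerically (our sketch, not part of the original argument), taking a disk in the half-plane $x>0$ as the closed convex set:

```python
# Illustration of Lemma 4: for f(x, y) = -(y - alpha*x)^2/x + beta*x on a
# closed convex set with x > 0 (here a disk), no interior grid point
# exceeds the maximum over the boundary circle.
import math

alpha, beta = 0.7, 0.3            # arbitrary parameters
cx, cy, r = 3.0, 1.0, 1.5         # disk kept inside the half-plane x > 0

def f(x, y):
    return -((y - alpha * x) ** 2) / x + beta * x

n = 500
inner_max = max(
    f(cx - r + 2 * r * i / n, cy - r + 2 * r * j / n)
    for i in range(n + 1) for j in range(n + 1)
    if (2 * r * i / n - r) ** 2 + (2 * r * j / n - r) ** 2 <= r * r
)
boundary_max = max(
    f(cx + r * math.cos(2 * math.pi * k / 200000),
      cy + r * math.sin(2 * math.pi * k / 200000))
    for k in range(200000)
)
```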

APPENDIX B: PROOF OF INEQUALITIES (48) AND (56)

It is sufficient to show
$$
\sqrt{\frac{3}{2}}\left((2\theta+1)\sqrt{\lambda}+\sqrt{6(\lambda(2\theta+1)+6\delta)}\right)
-\left((2\theta+1)\sqrt{\lambda(2-2\theta)+6\delta}+(2-2\theta)\sqrt{\lambda(2\theta+1)+6\delta}\right)>0. \qquad\text{(B1)}
$$
By putting $\delta':=\frac{6\delta}{\lambda}$, the LHS is evaluated as
$$
\frac{\text{LHS of (B1)}}{\sqrt{\lambda}}
=\sqrt{\frac{3}{2}}(2\theta+1)+3\sqrt{(2\theta+1)+\delta'}
-(2\theta+1)\sqrt{(2-2\theta)+\delta'}-(2-2\theta)\sqrt{(2\theta+1)+\delta'}
$$
$$
=\sqrt{\frac{3}{2}}(2\theta+1)+(2\theta+1)\sqrt{(2\theta+1)+\delta'}-(2\theta+1)\sqrt{(2-2\theta)+\delta'}
=(2\theta+1)\left(\sqrt{\frac{3}{2}}+\sqrt{(2\theta+1)+\delta'}-\sqrt{(2-2\theta)+\delta'}\right).
$$
Since $0\le\theta\le 1$, we have
$$
\sqrt{\frac{3}{2}}+\sqrt{(2\theta+1)+\delta'}-\sqrt{(2-2\theta)+\delta'}
\ge\sqrt{\frac{3}{2}}+\sqrt{1+\delta'}-\sqrt{2+\delta'}.
$$
Further, the function $\delta'\mapsto\sqrt{1+\delta'}-\sqrt{2+\delta'}$ ($\delta'\in[0,\infty)$) attains its minimum $\sqrt{1}-\sqrt{2}>-1>-\sqrt{\frac{3}{2}}$ at $\delta'=0$. Hence, $\frac{\text{LHS of (B1)}}{\sqrt{\lambda}}>0$.
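The factorization of the LHS of (B1) and its positivity can be confirmed numerically (our sketch; the grids of $\theta$, $\lambda$, $\delta$ are arbitrary):

```python
# Check that the LHS of (B1) equals the factored form
# (2*theta+1)*(sqrt(3/2) + sqrt(2*theta+1+d') - sqrt(2-2*theta+d'))*sqrt(lam)
# with d' = 6*delta/lam, and that it is positive for 0 <= theta <= 1.
import math

def lhs_b1(theta, lam, delta):
    return (math.sqrt(1.5) * ((2 * theta + 1) * math.sqrt(lam)
                              + math.sqrt(6 * (lam * (2 * theta + 1) + 6 * delta)))
            - ((2 * theta + 1) * math.sqrt(lam * (2 - 2 * theta) + 6 * delta)
               + (2 - 2 * theta) * math.sqrt(lam * (2 * theta + 1) + 6 * delta)))

def factored(theta, lam, delta):
    dp = 6 * delta / lam
    return ((2 * theta + 1) * math.sqrt(lam)
            * (math.sqrt(1.5) + math.sqrt(2 * theta + 1 + dp)
               - math.sqrt(2 - 2 * theta + dp)))

ok = all(
    lhs_b1(t / 10, lam, d) > 0
    and abs(lhs_b1(t / 10, lam, d) - factored(t / 10, lam, d)) < 1e-9
    for t in range(11)
    for lam in (0.5, 1.0, 3.0)
    for d in (0.0, 0.2, 1.0, 5.0)
)
```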


APPENDIX C: PROOF OF EQUATIONS (49) AND (57)

It is sufficient to show that
$$
\sqrt{\frac{3}{2}}\left((2-2\theta)\sqrt{\lambda}+\sqrt{6(\lambda(2-2\theta)+6\delta)}\right)
-\left((2\theta+1)\sqrt{\lambda(2-2\theta)+6\delta}+(2-2\theta)\sqrt{\lambda(2\theta+1)+6\delta}\right)>0 \qquad\text{(C1)}
$$
if and only if $\frac{6\delta}{\lambda}<\delta_1$ and $\theta\ge\theta_{6\delta/\lambda}$. By putting $\delta':=\frac{6\delta}{\lambda}$, the LHS of (C1) is evaluated as
$$
\frac{\text{LHS of (C1)}}{\sqrt{\lambda}}
=\sqrt{\frac{3}{2}}(2-2\theta)+3\sqrt{(2-2\theta)+\delta'}
-(2\theta+1)\sqrt{(2-2\theta)+\delta'}-(2-2\theta)\sqrt{(2\theta+1)+\delta'}
$$
$$
=\sqrt{\frac{3}{2}}(2-2\theta)+(2-2\theta)\sqrt{(2-2\theta)+\delta'}-(2-2\theta)\sqrt{(2\theta+1)+\delta'}
=(2-2\theta)\left(\sqrt{\frac{3}{2}}+\sqrt{(2-2\theta)+\delta'}-\sqrt{(2\theta+1)+\delta'}\right).
$$
Since $0\le\theta\le 1$ and $\delta\ge 0$,
$$
\sqrt{\frac{3}{2}}+\sqrt{(2-2\theta)+\delta'}-\sqrt{(2\theta+1)+\delta'}>0
$$
if and only if $\delta_1>\delta'$ and $\theta>\theta_{\delta'}$.

APPENDIX D: PROOF OF (26) ≤ (27)

Define $\mu_i$ by
$$
\min_{w\cdot\mu'=c_0}D(\mathrm{Poi}(0,\ldots,0,\mu_i,0,\ldots,0)\|\mathrm{Poi}(\mu'_1,\ldots,\mu'_m))=R.
$$
In fact,
$$
\min_{\mu'_i\ge 0:\ w\cdot\mu'=c_0}D(\mathrm{Poi}(0,\ldots,0,a,0,\ldots,0)\|\mathrm{Poi}(\mu'_1,\ldots,\mu'_m))
=\min_{\mu'_i\ge 0:\ w\cdot\mu'=c_0}\sum_{j=1}^{m}\mu'_j-a+a\log\frac{a}{\mu'_i}
$$
$$
=\min_{\alpha\ge 0,\beta\ge 0:\ w_i\alpha+w_M\beta=c_0}\alpha+\beta-a+a\log\frac{a}{\alpha}
=\begin{cases}
\dfrac{c_0}{w_i}-a+a\log\dfrac{aw_i}{c_0} & \text{if } a\ge\dfrac{c_0(w_M-w_i)}{w_Mw_i},\\[2ex]
\dfrac{c_0}{w_M}+a\log\dfrac{w_M-w_i}{w_M} & \text{if } a<\dfrac{c_0(w_M-w_i)}{w_Mw_i},
\end{cases}
$$
where $w_M$ denotes the maximum weight. This value is monotonically decreasing in $a$. When $a=\frac{c_0(w_M-w_i)}{w_Mw_i}$, this value is $\frac{c_0}{w_M}+\frac{c_0(w_M-w_i)}{w_iw_M}\log\frac{w_M-w_i}{w_M}$. Hence, we obtain (28) and (29).
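The two-branch closed form above can be checked against a direct grid search over the reduced two-variable problem (our sketch; the test values of $a$, $c_0$, $w_i$, $w_M$ are arbitrary, with one case per branch):

```python
# Check the closed-form minimum of alpha + beta - a + a*log(a/alpha) under
# w_i*alpha + w_M*beta = c0, alpha, beta >= 0, against a grid search.
import math

def closed_form(a, c0, w_i, w_M):
    if a >= c0 * (w_M - w_i) / (w_M * w_i):
        return c0 / w_i - a + a * math.log(a * w_i / c0)
    return c0 / w_M + a * math.log((w_M - w_i) / w_M)

def grid_min(a, c0, w_i, w_M, n=100000):
    best = float("inf")
    for k in range(1, n + 1):
        alpha = k * (c0 / w_i) / n          # beta >= 0 forces alpha <= c0/w_i
        beta = (c0 - w_i * alpha) / w_M
        best = min(best, alpha + beta - a + a * math.log(a / alpha))
    return best

# one case per branch of the closed form (arbitrary test values)
cases = [(0.1, 1.0, 0.5, 2.0), (3.0, 1.0, 0.5, 2.0)]
diffs = [abs(closed_form(*c) - grid_min(*c)) for c in cases]
```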

Next, we prove (27). It is sufficient to show that
$$
\left\{(k'_1,\ldots,k'_m)\ \middle|\ \min_{w\cdot\mu'=c_0}D(\mathrm{Poi}(k'_1,\ldots,k'_m)\|\mathrm{Poi}(\mu'_1,\ldots,\mu'_m))\ge R,\ \sum_{i=1}^{m}w_ik'_i<c_0\right\}
\subset\left\{(k'_1,\ldots,k'_m)\ \middle|\ \sum_{i=1}^{m}\frac{k'_i}{\mu_i}\le 1\right\}. \qquad\text{(D1)}
$$

The above relation follows from the relation
$$
\min_{w\cdot\mu'=c_0}D(\mathrm{Poi}(p_1a_1,\ldots,p_ma_m)\|\mathrm{Poi}(\mu'_1,\ldots,\mu'_m))
\le\sum_{i=1}^{m}p_i\min_{w\cdot\mu'=c_0}D(\mathrm{Poi}(0,\ldots,0,a_i,0,\ldots,0)\|\mathrm{Poi}(\mu'_1,\ldots,\mu'_m)).
$$
We choose $\mu'_{1,i},\ldots,\mu'_{m,i}$ such that
$$
\min_{w\cdot\mu'=c_0}D(\mathrm{Poi}(0,\ldots,0,a_i,0,\ldots,0)\|\mathrm{Poi}(\mu'_1,\ldots,\mu'_m))
=D(\mathrm{Poi}(0,\ldots,0,a_i,0,\ldots,0)\|\mathrm{Poi}(\mu'_{1,i},\ldots,\mu'_{m,i})).
$$
Then, Lemma 5 implies
$$
\sum_{i=1}^{m}p_i\min_{w\cdot\mu'=c_0}D(\mathrm{Poi}(0,\ldots,0,a_i,0,\ldots,0)\|\mathrm{Poi}(\mu'_1,\ldots,\mu'_m))
=\sum_{i=1}^{m}p_iD(\mathrm{Poi}(0,\ldots,0,a_i,0,\ldots,0)\|\mathrm{Poi}(\mu'_{1,i},\ldots,\mu'_{m,i}))
$$
$$
\ge D\!\left(\mathrm{Poi}(p_1a_1,\ldots,p_ma_m)\,\middle\|\,\mathrm{Poi}\!\left(\sum_{i=1}^{m}p_i\mu'_{1,i},\ldots,\sum_{i=1}^{m}p_i\mu'_{m,i}\right)\right)
\ge\min_{w\cdot\mu'=c_0}D(\mathrm{Poi}(p_1a_1,\ldots,p_ma_m)\|\mathrm{Poi}(\mu'_1,\ldots,\mu'_m)).
$$

Lemma 5 Any real number $0\le p\le 1$ and any four sequences of positive numbers $(\mu_i)$, $(\nu_i)$, $(\mu'_i)$, and $(\nu'_i)$ satisfy
$$
p\left(\sum_{i=1}^{m}(\mu_i-\nu_i)+\sum_{i=1}^{m}\nu_i\log\frac{\nu_i}{\mu_i}\right)
+(1-p)\left(\sum_{i=1}^{m}(\mu'_i-\nu'_i)+\sum_{i=1}^{m}\nu'_i\log\frac{\nu'_i}{\mu'_i}\right)
$$
$$
\ge\sum_{i=1}^{m}\left((p\mu_i+(1-p)\mu'_i)-(p\nu_i+(1-p)\nu'_i)\right)
+\sum_{i=1}^{m}(p\nu_i+(1-p)\nu'_i)\log\frac{p\nu_i+(1-p)\nu'_i}{p\mu_i+(1-p)\mu'_i}.
$$

Proof: Since the linear terms on both sides coincide, it is sufficient to show
$$
p\sum_{i=1}^{m}\nu_i\log\frac{\nu_i}{\mu_i}+(1-p)\sum_{i=1}^{m}\nu'_i\log\frac{\nu'_i}{\mu'_i}
\ge\sum_{i=1}^{m}(p\nu_i+(1-p)\nu'_i)\log\frac{p\nu_i+(1-p)\nu'_i}{p\mu_i+(1-p)\mu'_i}.
$$
The convexity of $-\log$ implies that
$$
-\log\frac{p\mu_i+(1-p)\mu'_i}{p\nu_i+(1-p)\nu'_i}
=-\log\left(\frac{p\nu_i}{p\nu_i+(1-p)\nu'_i}\cdot\frac{\mu_i}{\nu_i}
+\frac{(1-p)\nu'_i}{p\nu_i+(1-p)\nu'_i}\cdot\frac{\mu'_i}{\nu'_i}\right)
$$
$$
\le\frac{p\nu_i}{p\nu_i+(1-p)\nu'_i}\left(-\log\frac{\mu_i}{\nu_i}\right)
+\frac{(1-p)\nu'_i}{p\nu_i+(1-p)\nu'_i}\left(-\log\frac{\mu'_i}{\nu'_i}\right).
$$
Hence,
$$
p\sum_{i=1}^{m}\nu_i\log\frac{\nu_i}{\mu_i}+(1-p)\sum_{i=1}^{m}\nu'_i\log\frac{\nu'_i}{\mu'_i}
=\sum_{i=1}^{m}(p\nu_i+(1-p)\nu'_i)\left(\frac{p\nu_i}{p\nu_i+(1-p)\nu'_i}\left(-\log\frac{\mu_i}{\nu_i}\right)
+\frac{(1-p)\nu'_i}{p\nu_i+(1-p)\nu'_i}\left(-\log\frac{\mu'_i}{\nu'_i}\right)\right)
$$
$$
\ge\sum_{i=1}^{m}(p\nu_i+(1-p)\nu'_i)\log\frac{p\nu_i+(1-p)\nu'_i}{p\mu_i+(1-p)\mu'_i}.
$$
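Lemma 5 states the joint convexity of the classical Poisson relative entropy, which can be spot-checked on random inputs (our sketch, not part of the original argument):

```python
# Random spot check of Lemma 5: the Poisson divergence
# D(nu||mu) = sum_i (mu_i - nu_i + nu_i*log(nu_i/mu_i)) is jointly convex,
# so mixing two pairs with weights p, 1-p can only decrease it.
import math
import random

def div(nu, mu):
    return sum(m - n + n * math.log(n / m) for n, m in zip(nu, mu))

random.seed(0)
ok = True
for _ in range(1000):
    m = 4
    mu, nu, mu2, nu2 = (
        [random.uniform(0.1, 5.0) for _ in range(m)] for _ in range(4)
    )
    p = random.random()
    mixed = div([p * x + (1 - p) * y for x, y in zip(nu, nu2)],
                [p * x + (1 - p) * y for x, y in zip(mu, mu2)])
    ok = ok and p * div(nu, mu) + (1 - p) * div(nu2, mu2) >= mixed - 1e-9
```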

[1] A. Aspect, P. Grangier, and G. Roger, Phys. Rev. Lett. 49, 91 (1982).

[2] J. S. Bell, Speakable and Unspeakable in Quantum Mechanics: Collected Papers on Quantum Philosophy (Cambridge University Press, Cambridge, 1993).

[3] P. W. Shor, SIAM J. Comput. 26, 1484 (1997).

[4] C. H. Bennett, G. Brassard, C. Crépeau, R. Jozsa, A. Peres, and W. K. Wootters, Phys. Rev. Lett. 70, 1895 (1993).

[5] H.-J. Briegel, W. Dür, J. I. Cirac, and P. Zoller, Phys. Rev. Lett. 81, 5932 (1998).

[6] C. H. Bennett and G. Brassard, Proc. Int. Conf. Comput. Syst. Signal Process., Bangalore, 1984, pp. 175-179.

[7] P. W. Shor and J. Preskill, Phys. Rev. Lett. 85, 441 (2000).

[8] C. W. Helstrom, Quantum Detection and Estimation Theory (Academic Press, 1976).

[9] A. S. Holevo, Probabilistic and Statistical Aspects of Quantum Theory (North-Holland, 1982).

[10] M. Hayashi, Asymptotic Theory of Quantum Statistical Inference: Selected Papers (World Scientific, 2005).

[11] A. G. White, D. F. V. James, P. H. Eberhard, and P. G. Kwiat, Phys. Rev. Lett. 83, 3103 (1999).

[12] M. Barbieri, F. De Martini, G. Di Nepi, P. Mataloni, G. M. D'Ariano, and C. Macchiavello, Phys. Rev. Lett. 91, 227901 (2003).

[13] Y. Tsuda, K. Matsumoto, and M. Hayashi, "Hypothesis testing for a maximally entangled state," quant-ph/0504203.

[14] E. L. Lehmann, Testing Statistical Hypotheses, 2nd ed. (Wiley, 1986).

[15] P. G. Kwiat, E. Waks, A. G. White, I. Appelbaum, and P. H. Eberhard, Phys. Rev. A 60, R773 (1999).

[16] Y. Nambu, K. Usami, Y. Tsuda, K. Matsumoto, and K. Nakamura, Phys. Rev. A 66, 033816 (2002).

[17] K. Usami, Y. Nambu, Y. Tsuda, K. Matsumoto, and K. Nakamura, "Accuracy of quantum-state estimation utilizing Akaike's information criterion," Phys. Rev. A 68, 022314 (2003).

[18] S. Amari and H. Nagaoka, Methods of Information Geometry (AMS & Oxford University Press, 2000).