improved semidefinite programming hierarchy for h`39`42 ... · with tools from algebraic geometry...

Improved Semidefinite ProgrammingHierarchy for hSep

with tools from Algebraic Geometry

Aram W. Harrow 1 Anand Natarajan 1 Xiaodi Wu 2

1MIT

2University of Oregon

QMA(2) Workshop, QuICS, Aug 3rd 2016arXiv:1506.08834

The problem hSep

Definition (Separable states)A bipartite state ρ ∈ D (X ⊗ Y) is separable if

ρ ∈ conv(σX ⊗ σY : σX ∈ D (X ) , σY ∈ D (Y)).

Let Sep def= separable states .

Definition (hSep)Given a Hermitian operator M over X ⊗ Y,

hSep(M) := maxρ∈Sep

Tr[Mρ].

The problem WOpt(M, ε) is to compute an ε-additiveapproximation of hSep(M).

Entanglement Testing

Definition (Weak Membership)WMem(ε, ‖·‖) : for any ρ ∈ D (X ⊗ Y), decide whether ρ ∈ Sepor ‖ρ− Sep‖ ≥ ε.

I By “GLS theorems”in convexoptimization, cansolve WMem(ε, ‖·‖1)efficiently with anoracle toWOpt(M, ε). (Usesellipsoid method)

I Any M withhSep(M) ≤ λmax(M)is an entanglementwitness.

Applications

I Finding the maximum acceptance probability of a QMA(2)machine⇔ solving WMem(ε, ‖·‖1) for M arising frompoly-time quantum circuit.

I Many more (mean-field aproximations, output entropies ofchannels, etc.)

Algorithms and Hardness

I Complexity is a function of dimension n and accuracy ε.I Algorithms based on Sum-of-Squares [DPS] and ε-nets,

[SW, BH, ...].I When ε = Ω(1):

I Need time nΩ(log n) assuming Exponential-Time Hypothesis[HM]. (Unconditionally for SoS hierarchy [HNW]).

I Exists nO(log(n)/ε2) algorithm for 1-LOCC M [BCY, BH].I When ε = 1/poly(n):

I WOpt(·, ε) is NP-hard. [Gur, Ioa, Gha]I Exist (n/

√ε)O(n) algorithms based on SDPs. [NOP]

I What about scaling with ε?

Why care about ε scaling?

I Surprising jumps in complexity in high-accuracy regime:I QIP jumps from PSPACE to EXP with inverse doubly

exponential ε. [IKW]I QMA equals PSPACE for inverse-exponential ε. [FL]I QMA(2) equals NEXP for inverse-exponential ε. [Per]

I Better understand “brute-force algorithms” for quantumproblems

I Ex: no known algorithm to approximate entangled valueof non-local game to constant factor.

I Could lead to algorithms that perform better in practice

Results

Theorem (Main)There exists an algorithm that estimates hSep(n)(M) to error ε intime exp(poly(n)) poly log(1/ε). similar for the multi-partite case.

I One algorithm based on quantifier elimination—lessconceptually interesting, so skip for this talk.

I Other algorithm based on SoS hierarchy with addedconstraints, inspired by [Nie, NR].

I Algorithm is exact: only approximation comes fromnumerical error in SDP solver

I As a complexity result:

QMAlog(2)[c, s = c−1/exp(exp(n))] ⊆ DTIME(exp(poly(n))).

Comparison with DPS

Advantages

I Primal of SDP leads to new monogamy relations relativeto particular observables.

I Dual of SDP leads to a new class of entanglementwitness.

I Analog of the exact convergence achievable for discreteoptimization, e.g., SoS for Boolean CSPs. (Note: DPSdoes not converge at any finite level)

Disadvantages

I Unlike [DPS], set of entanglement witnesses could benon-convex.

I Hence, cannot solve WMem directly (need to use GLSreduction).

Proof Outline

I Treat WOpt as a constained polynomial optimizationproblem.

I Use Karush-Kuhn-Tucker conditions to restrict to criticalset (technique from [Nie, NR])

I Claim 1: For generic M, critical set consists ofexp(n)-many discrete points. (From Bertini’s thm)

I Algorithm: use SoS to approximately optimize over criticalpoints by searching for certificates of positivity.

I Claim 2: For generic M, optimum has SoS certificate ofdegree exp(n). (From Claim 1 and Grobner basis theory)

I (Continuity argument to handle non-generic M).

Simplification: the symmetric real case

By simple reductions [HM], we can restrict to states of the form|ψ〉 ⊗ |ψ〉 , |ψ〉 ∈ Rn.

hProdSym(n)(M) :=

maxx∈Rn

f0(x) :=∑

i1,i2,j1,j2

M(i1,i2),(j1,j2)xi1xi2xj1xj2

subject to f1(x) := ||x ||2 − 1 = 0.(1)

REMARK: this is a polynomial optimization problem with ahomogenous degree 4 objective and a degree 2 constraints.

Principle of Sum-of-Squares

One way to show that a polynomial f (x) is nonnegative could be

f (x) =∑

ai(x)2 ≥ 0.

Example

f (x) = 2x2 − 6x + 5= (x2 − 2x + 1) + (x2 − 4x + 4)

= (x − 1)2 + (x − 2)2 ≥ 0.

Such a decomposition is called a sum of squares (SoS)certificate for the non-negativity of f .

Principle of SoS : constrained domain

Definition (Variety)A set V ⊆ Cn is called an algebraic variety ifV = x ∈ Cn : g1(x) = · · · = gk (x) = 0.

Non-negativity of f (x) on V can be shown by

f (x) =∑

ai(x)2 +∑

bj(x)gj(x) ≥ 0.

Question: do all nonnegative polynomials on certain varietyhave a SoS certificate?

Putinar’s Positivstellensatz

Definition (Ideal)The polynomial ideal I generated by g1, . . . ,gk∈ C[x1, . . . , xn] is

I = ∑

aigi : ai ∈ C[x1, . . . , xn] := 〈g1, · · · ,gk 〉.

Theorem (Putinar’s Positivstellensatz)Under the Archimedean condition, if f (x) > 0 on V (I)∩Rn, then

f (x) = σ(x) + g(x),

for some choice of SoS polynomial σ(x) and g(x) ∈ I.

SoS in Optimization

max f (x)

subject to gi(x) = 0 ∀i(2)

is equivalent to (under Archimedean condition)

min ν

such that ν − f (x) = σ(x) +∑

i

bi(x)gi(x), (3)

where σ(x) is SoS and bi(x) is any polynomial.

SoS hierarchy

I If σ(x) and bi(x) can have arbitrarily high degrees, then theoptimization problem (3) is equivalent to problem (2).

I By bounding the degrees, i.e., deg(σ(x)),deg(bi(x)gi(x)) ≤ 2D for some integer D, we get SoShierarchy at level D.

min ν

such that ν − f (x) = σ(x) +∑

i

bi(x)gi(x), (4)

where σ(x) is SoS and bi(x) is any polynomial and deg(σ(x)),deg(bi(x)gi(x)) ≤ 2D.DPS algorithm is SoS applied to hSep

Why is it an SDP?

Observation

I Any p(x) (of degree 2D) = mT Qm, where m is the vectorof monomials of degree up to 2D and Q is the coefficients.

I p(x) is a SoS iff Q ≥ 0.

minν,biα∈R

ν

such that νA0 − F −∑iα

biαGiα ≥ 0.(5)

Complexity: poly(m) poly log(1/ε), where m =(n+D

D

).

Karush-Kuhn-Tucker ConditionsFor any optimization problem

max f (x) s.t. gi(x) ≤ 0,hj(x) = 0,∀i , j ,

if x∗ is a local optimizer, then ∃µi , λj ,

∇f (x∗) =∑

µi∇gi(x∗) +∑

λj∇hj(x∗)

gi(x∗) ≤ 0,hj(x∗) = 0,µi ≥ 0, µigi(x∗) = 0.

Remark: for convexoptimization (our case), anyglobal optimizer satisfies KKT.

Our caseRecall our optimization problem is

max f0(x) s.t. f1(x) = 0.

The KKT condition is ∇f0(x) = λ∇f1(x), which is equivalent to

rank

∂f0(x)∂x1

∂f1(x)∂x1

......

∂f0(x)∂x2n

∂f1(x)∂x2n

< 2.

gij(x) :=∂f0(x)

∂xi

∂f1(x)

∂xj− ∂f0(x)

∂xj

∂f1(x)

∂xi= 0, ∀i , j

DPS+: Optimization with KKT constraints

min ν

such that ν − f0(x) ≥ 0f1(x) = 0

KKT gij(x) = 0 ∀1 ≤ i 6= j ≤ 2n

I DPS+ hierarchy: SoS applied to above optimizationproblem

I Will show finite convergence when D = exp(poly(n)). Thenm =

(n+DD

)= exp(poly(n)). Thus the final time is

exp(poly(n)) poly log(1/ε).

KKT Ideal

Definition (KKT Ideal & Variety)

IK =

v(x)f1(x) +∑

hij(x)gij(x)

=< f1(x),gij(x) > .

V (IK ) =

x ∈ C2n : ∀p(x) ∈ IK ,p(x) = 0

Definition (KKT Ideal to degree m)

ImK = v(x)f1(x) +

∑hij(x)gij(x) : deg(v(x)f1(x)) ≤ m,

∀ i , j ,deg(hijgij) ≤ m.

Main Technical Theorems

Theorem (Zero-dimensionality of generic IK )For a generic M, |V (IK )| <∞ and IK is zero-dimensional.

Theorem (Degree bound)There exists m = O(exp(poly(n))), s.t. for a generic M, ε > 0,

v − f0(x) + ε = σ(x) + g(x),

where σ(x) is SoS and deg(σ(x)) ≤ m,g(x) ∈ ImK .

Corollary (SDP solution)For a generic M, hProdSym(n)(M) can be estimated to error ε intime exp(poly(n))poly log(1/ε).

From generic to arbitrary M

Observations

I Generic Ms are dense, and value of SDP should be acontinuous function of M.

I Issue: SDP might be infeasible for non-generic M.

Solution

I Switch to the dual SDP: satisfies Slater’s condition, i.e,strictly feasible.

I For a generic M, by strong duality,hProdSym(n)(M) =OPTmom(M).

I For non-generic inputs M, use the continuity of the dualSDP.

Proof of Theorem 1

Let U = f1(x) = 0,W = ∀ i , j ,gij = 0. then V (IK ) ⊆ U ∩W.It suffices to show |U ∩W| <∞. Construct A =X ∩U s.t.

A ∩W = ∅ and dim(X ) = n − 1. NoteW ∩A = (W ∩ U) ∩ X .By Bezout’s theorem, two varieties with dimension sum≥ nmust intersect. Thus

dim(W ∩ U) + dim(X ) = dim(W ∩ U) + n − 1 < n.

This implies dim(W ∩ U) = 0 and thus |V (IK )| <∞.

Proof of Theorem 1: construct X

Let X = f0(x) = µ for generic (µ,M). dim(X ) = n − 1.By Bertini’s theorem, dim(A) = dim(U ∩ X ) = n − 2.

The Jacobian matrix JA =

∂f0∂x1

∂f1∂x1

......

∂f0∂xn

∂f1∂xn

has rank(JA) = 2.

W by definition says rank(JA) = 1. Thus no intersection!

Subtleties: varieties over projective space, so need toconsider point at infinity

Proof of Theorem 2

I Goal: given SoS certificate

ν − f0(x) = σ(x) + g(x), σ(x) ∈ SoS,g(x) ∈ ImK ,

upper bound deg(σ(x)),deg(g(x)).I Idea: start from any SoS certificate and lower degree ofσ(x) using Grobner basis.

I Then, show that remaining high-degree terms in g(x)vanish

I Grobner basis: set of polynomials generating Ik suchthat leading term of any poly in Ik is divisible byleading term of a basis element.

I [MR]:

|V (IK )| <∞ =⇒ max degγi ≤ D = exp(poly(n)).

Proof of Theorem 2: SoS term

I Let γi be a Grobner basis for IK .I Take any SoS certificate:

v − f0(x) = σ(x) + g(x). s.t. σ(x) SoS ,g(x) ∈ ImK .

I Let σ(x) =∑

sa(x)2. By properties of Grobner basis

sa(x) = ga(x) + ua(x), s.t. ga(x) ∈ IK ,deg(ua(x)) ≤ nD.

I Thus

v−f0(x) = σ′(x)+g′(x),deg(σ′(x)) ≤ exp(poly(n)),g′ ∈ IK .

Proof of Theorem 2: Ideal term

Suffices to show g′ ∈ ImK ,m = exp(poly(n)).

I Since f0(x) and σ(x) have degree at most m,deg(g′(x)) ≤ m. Remains to show that g′(x) can beobtained as sum of degree-m multiples of originalgenerators gij .

I First step: decompose g′(x) in Grobner basis:g′(x) =

∑tkγk (x) with deg(tkγk (x)) ≤ m.

I Second step: decompose Grobner basis in terms oforiginal generators: γk (x) =

∑uij(x)gij(x) with

deg(uij) ≤ m.I Thus

g′(x) =∑

tkuijgij(x),deg(tkuij) ≤ m, =⇒ g′(x) ∈ ImK .

Numerical Results

DPS+ beats DPS at very high levels, but what about low levelsof the hierarchy?

I DPS+ beats DPS atlevel 2, n = 3 for afamily ofmeasurementsintroduced by[DPS].

Mγ = Id− (Aγ ⊗ Id)Z (Aγ ⊗ Id)

Aγ = diag(1,1/γ, . . . , 1γ).

I For all γ ∈ [0,1],hSep(Mγ) = 1.

Open Questions

DPS+I Analyze the low levels of DPS+.I Advantages of adding KKT conditions other than

presented here.Nonlocal games

I What is the correct version of KKT conditions fornon-commutative polynomials?

I Can we have finite convergence for thecommuting-operator game value? Would imply an upperbound on commuting-operator version of MIP* (noneknown so far)

SoS hierarchyI Any other applications to quantum information?

Thank you!Q & A

improved semidefinite programming hierarchy for h`39`42 ... · with tools from algebraic geometry...

Documents