Approximation II
Introduction
A large fraction of the theory of approximation algorithms is built
around linear programming (LP).
We will review some key concepts from this theory.
We will introduce the two fundamental algorithm design techniques of
rounding and the primal-dual schema, as well as the method of dual
fitting.
Linear programming
Linear programming is the problem of optimizing a linear function
subject to linear inequality constraints.
The function being optimized is called the objective function.
Example (standard form):

$$\begin{aligned}
\text{minimize} \quad & 7x_1 + x_2 + 5x_3 \\
\text{subject to} \quad & x_1 - x_2 + 3x_3 \ge 10 \\
& 5x_1 + 2x_2 - x_3 \ge 6 \\
& x_1, x_2, x_3 \ge 0
\end{aligned}$$

Any solution, i.e., a setting for the variables in this linear program, that satisfies all the constraints is said to be a feasible solution.
Upper bound
Let z* denote the optimum value of this linear program.
Let us consider the question, "Is z* at most α?", where α is a given rational number.
Example: Is z* at most 30?
A Yes certificate for this question is simply a feasible solution
whose objective function value is at most 30.
For example, x = (2,1,3).
Any Yes certificate to this question provides an upper bound on z*.
Lower bound
How do we provide a No certificate for such a question?
How do we place a good lower bound on z* ?
One such bound is given by the first constraint: since the x_i's are restricted to be nonnegative, term-by-term comparison of coefficients (7 ≥ 1, 1 ≥ −1, 5 ≥ 3) shows that

$$7x_1 + x_2 + 5x_3 \ge x_1 - x_2 + 3x_3.$$

Since the right-hand side of the first constraint is 10, the objective function is at least 10 for any feasible solution.
Better lower bound
A better lower bound, 16, is obtained by taking the sum of the two constraints:

$$7x_1 + x_2 + 5x_3 \ge (x_1 - x_2 + 3x_3) + (5x_1 + 2x_2 - x_3) \ge 16.$$

The problem of finding the best such lower bound can itself be formulated as a linear program, the dual program; the original LP is called the primal program. One is a minimization problem and the other is a maximization problem:

$$\begin{aligned}
\text{maximize} \quad & 10y_1 + 6y_2 \\
\text{subject to} \quad & y_1 + 5y_2 \le 7 \\
& -y_1 + 2y_2 \le 1 \\
& 3y_1 - y_2 \le 5 \\
& y_1, y_2 \ge 0
\end{aligned}$$
Primal and Dual
Every feasible solution to the dual program gives a lower bound on the
optimum value of the primal.
The reverse also holds; every feasible solution to the primal gives an
upper bound on the optimum of the dual.
If we can find feasible solutions for the dual and the primal with
matching values, then both solutions must be optimal.
For example, x = (7/4, 0, 11/4) and y = (2, 1) both achieve the objective function value 26, so both are optimal. This matching of optima is the content of the LP-duality theorem.

[Figure: a number line; dual solutions lie at or below 26 and primal solutions at or above 26, with dual opt = primal opt = 26.]
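As a sanity check on this example, both programs can be handed to any LP solver. A minimal sketch using SciPy's linprog (the choice of SciPy is an assumption; any LP library works):

from scipy.optimize import linprog

# Primal: minimize 7x1 + x2 + 5x3
#         s.t. x1 - x2 + 3x3 >= 10, 5x1 + 2x2 - x3 >= 6, x >= 0.
# linprog takes "<=" constraints, so the ">=" rows are negated.
primal = linprog(c=[7, 1, 5],
                 A_ub=[[-1, 1, -3], [-5, -2, 1]],
                 b_ub=[-10, -6],
                 bounds=[(0, None)] * 3)

# Dual: maximize 10y1 + 6y2
#       s.t. y1 + 5y2 <= 7, -y1 + 2y2 <= 1, 3y1 - y2 <= 5, y >= 0.
# linprog minimizes, so the objective is negated.
dual = linprog(c=[-10, -6],
               A_ub=[[1, 5], [-1, 2], [3, -1]],
               b_ub=[7, 1, 5],
               bounds=[(0, None)] * 2)

print(primal.x, primal.fun)    # approx. [1.75, 0, 2.75], 26.0
print(dual.x, -dual.fun)       # approx. [2, 1], 26.0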
Primal program
Minimization problem
$$\begin{aligned}
\text{minimize} \quad & \sum_{j=1}^{n} c_j x_j \\
\text{subject to} \quad & \sum_{j=1}^{n} a_{ij} x_j \ge b_i, \quad i = 1, \ldots, m \\
& x_j \ge 0, \quad j = 1, \ldots, n
\end{aligned}$$

where the a_{ij}, b_i, and c_j are given rational numbers.
Dual program
Maximization problem
$$\begin{aligned}
\text{maximize} \quad & \sum_{i=1}^{m} b_i y_i \\
\text{subject to} \quad & \sum_{i=1}^{m} a_{ij} y_i \le c_j, \quad j = 1, \ldots, n \\
& y_i \ge 0, \quad i = 1, \ldots, m
\end{aligned}$$
LP-duality theorem
Theorem.
The primal program has finite optimum iff its dual has finite optimum.
Moreover, if x* = (x_1*, …, x_n*) and y* = (y_1*, …, y_m*) are optimal solutions for the primal and dual programs, respectively, then

$$\sum_{j=1}^{n} c_j x_j^* = \sum_{i=1}^{m} b_i y_i^*.$$

The design of several exact algorithms has its basis in the LP-duality theorem.
Weak duality theorem
Theorem.
If x = (x_1, …, x_n) and y = (y_1, …, y_m) are feasible solutions for the primal and dual programs, respectively, then

$$\sum_{j=1}^{n} c_j x_j \ge \sum_{i=1}^{m} b_i y_i.$$
Proof
Since y is dual feasible and the x_j's are nonnegative,

$$\sum_{j=1}^{n} c_j x_j \ge \sum_{j=1}^{n} \left( \sum_{i=1}^{m} a_{ij} y_i \right) x_j.$$

Similarly, since x is primal feasible and the y_i's are nonnegative,

$$\sum_{i=1}^{m} \left( \sum_{j=1}^{n} a_{ij} x_j \right) y_i \ge \sum_{i=1}^{m} b_i y_i.$$

The theorem follows by observing that

$$\sum_{j=1}^{n} \left( \sum_{i=1}^{m} a_{ij} y_i \right) x_j = \sum_{i=1}^{m} \left( \sum_{j=1}^{n} a_{ij} x_j \right) y_i.$$
Structure of optimal solutions
Theorem.
Let x and y be primal and dual feasible solutions, respectively. Then, x
and y are both optimal iff all of the following conditions are satisfied:
Primal complementary slackness conditions:

$$\text{For each } 1 \le j \le n: \text{ either } x_j = 0 \text{ or } \sum_{i=1}^{m} a_{ij} y_i = c_j.$$

Dual complementary slackness conditions:

$$\text{For each } 1 \le i \le m: \text{ either } y_i = 0 \text{ or } \sum_{j=1}^{n} a_{ij} x_j = b_i.$$
Usefulness of LP
Many combinatorial optimization problems can be stated as integer
programs.
Once this is done, the linear relaxation of this program provides a
natural way of lower bounding the cost of the optimal solution.
This is a key step in the design of an approximation algorithm.
LP-duality theory has also been useful in analyzing combinatorially obtained approximation algorithms, via the method of dual fitting.
Two fundamental algorithm design techniques
1. Rounding
Solve the LP and then convert the fractional solution obtained into
an integral solution, trying to ensure that in the process the cost
does not increase much.
The approximation guarantee is established by comparing the cost
of integral and fractional solutions.
2. Primal-dual schema
Use the dual of the LP-relaxation.
Let us call the LP-relaxation the primal program.
Under this schema, an integral solution to the primal program and a
feasible solution to the dual program are constructed iteratively.
The approximation guarantee is established by comparing the two
solutions.
Integrality gap
Given an LP-relaxation for a minimization problem, let OPT_f(I) denote the cost of an optimal fractional solution to instance I, i.e., the objective function value of an optimal solution to the LP-relaxation.
Define the integrality gap (or integrality ratio) of the relaxation to be

$$\sup_{I} \frac{\mathrm{OPT}(I)}{\mathrm{OPT}_f(I)},$$

i.e., the supremum of the ratio of the optimal integral and fractional solutions.
In the case of a maximization problem, the integrality gap is defined to be the infimum of this ratio.
If the cost of the solution found by the algorithm is compared directly with the cost of an optimal fractional solution, as is done in most algorithms, then the best approximation factor we can hope to prove is the integrality gap of the relaxation.
A comparison of the two techniques
LP-rounding vs Primal-dual schema
For many problems, both techniques have been successful in yielding
algorithms having guarantees essentially equal to the integrality gap of
the relaxation.
The main difference lies in the running times. An LP-rounding algorithm needs to find an optimal solution to the LP-relaxation, so its running time tends to be high.
The primal-dual schema leaves enough room to exploit the special
combinatorial structure of individual problems and is thereby able to
yield algorithms having good running times.
From a practical standpoint, a combinatorial algorithm is more useful,
since it is easier to adapt it to specific applications and fine tune its
performance for specific types of inputs.
Set cover
Problem (Set cover)
Given a universe U of n elements, a collection of subsets of U, S = {S_1, …, S_k}, and a cost function c: S → Q⁺, find a minimum cost subcollection of S that covers all elements of U.
Let f be the frequency of the most frequent element.
[Figure: an element e contained in three sets, so f = 3.]
Dual fitting
Outline, assuming a minimization problem:
The basic algorithm is combinatorial.
Using the LP-relaxation of the problem and its dual, one shows that the
primal integral solution found by the algorithm is fully paid for by the
dual computed; however, the dual is infeasible.
By fully paid we mean that the objective function value of the primal
solution found is at most the objective function value of the dual
computed.
The main step in the analysis consists of dividing the dual by a suitable
factor and showing that the shrunk dual is feasible, i.e., it fits into the
given instance.
The shrunk dual is then a lower bound on OPT, and the factor is the approximation guarantee of the algorithm.
IP-formulation of set cover
Let us assign a variable x_S for each set S ∈ S, which is allowed 0/1 values: x_S = 1 iff S is picked in the set cover. The constraint is that for each element e ∈ U, at least one of the sets containing it is picked.

$$\begin{aligned}
\text{minimize} \quad & \sum_{S \in \mathcal{S}} c(S)\, x_S \\
\text{subject to} \quad & \sum_{S :\, e \in S} x_S \ge 1, \quad e \in U \\
& x_S \in \{0, 1\}, \quad S \in \mathcal{S}
\end{aligned} \qquad \text{(IP)}$$

[Figure: an element e covered by three sets, with x_1 = 0, x_2 = 1, x_3 = 0.]
LP-relaxation
The LP-relaxation of (IP) is obtained by letting the domain of the variables x_S be 0 ≤ x_S ≤ 1. Since the upper bound on x_S is redundant, we get the following LP. A solution to this LP can be viewed as a fractional set cover.

$$\begin{aligned}
\text{minimize} \quad & \sum_{S \in \mathcal{S}} c(S)\, x_S \\
\text{subject to} \quad & \sum_{S :\, e \in S} x_S \ge 1, \quad e \in U \\
& x_S \ge 0, \quad S \in \mathcal{S}
\end{aligned} \qquad \text{(LP)}$$
Dual program
Introducing a variable y_e corresponding to each element e ∈ U, we obtain the dual program:

$$\begin{aligned}
\text{maximize} \quad & \sum_{e \in U} y_e \\
\text{subject to} \quad & \sum_{e :\, e \in S} y_e \le c(S), \quad S \in \mathcal{S} \\
& y_e \ge 0, \quad e \in U
\end{aligned} \qquad \text{(DP)}$$

An intuitive way of thinking about (DP) is that it is packing "stuff" into elements, trying to maximize the total amount packed, subject to the constraint that no set is overpacked. A set is said to be overpacked if the total amount packed into its elements exceeds the cost of the set.
Covering and packing
Whenever the coefficients in the constraint matrix, objective function, and right-hand side are all nonnegative, the minimization LP is called a covering LP, and the maximization LP is called a packing LP.
Thus, (LP) and (DP) are a covering-packing pair.
Now, we can state the lower bounding scheme:
GREEDY SET COVER
1. C ← ∅.
2. While C ≠ U, do:
   Find the set whose cost-effectiveness is smallest, say S.
   Let α = c(S)/|S − C|, i.e., the cost-effectiveness of S.
   Pick S, and for each e ∈ S − C, set price(e) = α.
   C ← C ∪ S.
3. Output the picked sets.
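To make this concrete, here is a sketch of GREEDY SET COVER in Python. The instance representation (universe as a set, collection as a list of sets, costs as a list) and the example instance are illustrative assumptions; the price(e) values recorded here are what the dual-fitting analysis below works with.

def greedy_set_cover(universe, sets, cost):
    covered = set()                      # C
    picked = []                          # indices of the picked sets
    price = {}                           # price(e) for each element e
    while covered != universe:
        # Find the most cost-effective set: minimum cost per new element.
        best, best_alpha = None, float("inf")
        for i, s in enumerate(sets):
            new_elements = s - covered
            if new_elements:
                alpha = cost[i] / len(new_elements)
                if alpha < best_alpha:
                    best, best_alpha = i, alpha
        for e in sets[best] - covered:
            price[e] = best_alpha
        covered |= sets[best]
        picked.append(best)
    return picked, price

# Illustrative instance (assumes the universe is coverable):
U = {1, 2, 3, 4, 5}
S = [{1, 2, 3}, {2, 4}, {3, 4, 5}, {5}]
print(greedy_set_cover(U, S, cost=[5, 2, 4, 1]))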
Lower bounding scheme (revisited)
Denote by OPTf the cost of an optimal fractional set cover, i.e., an
optimal solution to (LP).
Then, OPT_f ≤ OPT.
The cost of any feasible solution to (DP) is a lower bound on OPTf, and
hence also on OPT.
GREEDY SET COVER algorithm uses this as the lower bound.
[Figure: a number line; dual fractional solutions lie at or below OPT_f, primal fractional solutions at or above OPT_f, and primal integral solutions at or above OPT.]
Lower bounding scheme (revisited)
GREEDY SET COVER defines dual variables price(e), for each
element, e.
Observe that the cover picked by the algorithm is fully paid for by this
dual solution.
However, in general, this dual solution is not feasible.
We show in the next slide that if this dual is shrunk by a factor of H_n (the n-th harmonic number, H_n = 1 + 1/2 + ⋯ + 1/n), it fits into the given set cover instance, i.e., no set is overpacked.
For each element e, define

$$y_e = \frac{\mathrm{price}(e)}{H_n}.$$

GREEDY SET COVER uses this dual feasible solution, y, as the lower bound on OPT.
Lower bounding scheme (revisited)
Lemma:
The vector y defined in the previous slide is a feasible solution for the
dual program (DP).
Proof.
Consider a set S S consisting of k elements.
Number the elements in the order in which they are covered by the
algorithm, breaking ties arbitrarily, say, e1, …, ek.
Consider the iteration in which the algorithm covers element ei.
At this point, S contains at least k–i+1 uncovered elements.
Proof
In this iteration, S itself can cover e_i at an average cost of at most c(S)/(k − i + 1). Since the algorithm chose the most cost-effective set, price(e_i) ≤ c(S)/(k − i + 1). Thus,

$$y_{e_i} \le \frac{1}{H_n} \cdot \frac{c(S)}{k - i + 1}.$$

Summing over all elements in S,

$$\sum_{i=1}^{k} y_{e_i} \le \frac{c(S)}{H_n} \left( \frac{1}{k} + \frac{1}{k-1} + \cdots + 1 \right) = \frac{H_k}{H_n}\, c(S) \le c(S).$$

Therefore, S is not overpacked.
Approximation ratio
Theorem:
The approximation guarantee of the GREEDY SET COVER algorithm is H_n.
Proof.
The cost of the set cover picked is

$$\sum_{e \in U} \mathrm{price}(e) = H_n \sum_{e \in U} y_e \le H_n \cdot \mathrm{OPT}_f \le H_n \cdot \mathrm{OPT},$$

where the first inequality follows from the fact that y is dual feasible.
A simple rounding algorithm
An LP-relaxation is given in (LP).
One way of converting a solution to (LP) into an integral solution is to
round up all nonzero variables to 1.
Let us consider a slight modification of this algorithm that is easier to analyze and, in general, picks fewer sets:
LP-ROUNDING
1. Find an optimal solution to the LP-relaxation.
2. Pick all sets S for which x_S ≥ 1/f in this solution.

Here f is again the frequency of the most frequent element.

[Figure: an element e of frequency 3.]
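A sketch of this algorithm in Python, assuming SciPy's linprog for step 1 and the same illustrative instance representation as before:

from scipy.optimize import linprog

def lp_rounding_set_cover(universe, sets, cost):
    elems = sorted(universe)
    # One ">= 1" covering row per element, negated into linprog's "<=" form.
    A_ub = [[-1.0 if e in s else 0.0 for s in sets] for e in elems]
    b_ub = [-1.0] * len(elems)
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None)] * len(sets))
    f = max(sum(e in s for s in sets) for e in elems)   # max frequency
    # Round: pick every set whose fractional value is at least 1/f.
    return [i for i, x in enumerate(res.x) if x >= 1.0 / f - 1e-9]

U = {1, 2, 3, 4, 5}
S = [{1, 2, 3}, {2, 4}, {3, 4, 5}, {5}]
print(lp_rounding_set_cover(U, S, cost=[5, 2, 4, 1]))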
Approximation ratio of LP-ROUNDING
Theorem:
LP-ROUNDING algorithm achieves an approximation factor of f for
the set cover problem.
Proof.
Let C be the collection of picked sets.
Consider an arbitrary element e.
Since e is in at most f sets, one of these sets must be picked to the extent
of at least 1/f in the fractional cover.
Thus, e is covered by C, and hence C is a valid set cover.
The rounding process increases x_S, for each set S ∈ C, by a factor of at most f.
Therefore, the cost of C is at most f times the cost of the fractional
cover, thereby proving the desired approximation guarantee.
Overview of the schema
The primal program is:

$$\begin{aligned}
\text{minimize} \quad & \sum_{j=1}^{n} c_j x_j \\
\text{subject to} \quad & \sum_{j=1}^{n} a_{ij} x_j \ge b_i, \quad i = 1, \ldots, m \\
& x_j \ge 0, \quad j = 1, \ldots, n
\end{aligned}$$
Overview of the schema
The dual program is:

$$\begin{aligned}
\text{maximize} \quad & \sum_{i=1}^{m} b_i y_i \\
\text{subject to} \quad & \sum_{i=1}^{m} a_{ij} y_i \le c_j, \quad j = 1, \ldots, n \\
& y_i \ge 0, \quad i = 1, \ldots, m
\end{aligned}$$
Overview of the schema
Most known approximation algorithms using the primal-dual schema run by ensuring one set of conditions and suitably relaxing the other. In the following description we capture both situations by relaxing both conditions: if the primal conditions are ensured, we set α = 1, and if the dual conditions are ensured, we set β = 1.

Relaxed primal complementary slackness conditions: Let α ≥ 1.

$$\text{For each } 1 \le j \le n: \text{ either } x_j = 0 \text{ or } \frac{c_j}{\alpha} \le \sum_{i=1}^{m} a_{ij} y_i \le c_j.$$

Relaxed dual complementary slackness conditions: Let β ≥ 1.

$$\text{For each } 1 \le i \le m: \text{ either } y_i = 0 \text{ or } b_i \le \sum_{j=1}^{n} a_{ij} x_j \le \beta\, b_i.$$
Overview of the schema
Proposition:
If x and y are primal and dual feasible solutions satisfying the conditions stated above, then

$$\sum_{j=1}^{n} c_j x_j \le \alpha \beta \sum_{i=1}^{m} b_i y_i.$$

Proof.

$$\sum_{j=1}^{n} c_j x_j \le \alpha \sum_{j=1}^{n} \left( \sum_{i=1}^{m} a_{ij} y_i \right) x_j = \alpha \sum_{i=1}^{m} \left( \sum_{j=1}^{n} a_{ij} x_j \right) y_i \le \alpha \beta \sum_{i=1}^{m} b_i y_i.$$

The first and second inequalities follow from the primal and dual conditions, respectively. The equality follows by simply changing the order of summation.
Overview of the schema
The algorithm starts with a primal infeasible solution and a dual feasible
solution; these are usually the trivial solutions x = 0 and y = 0.
It iteratively improves the feasibility of the primal solution, and the
optimality of the dual solution, ensuring that in the end a primal feasible
solution is obtained and all conditions are satisfied with a suitable
choice of and .
The primal solution is always extended integrally, thus ensuring that the
final solution is integral.
The current primal solution is used to determine the improvement to the
dual, and vice versa.
Finally, the cost of the dual solution is used as a lower bound on OPT. By the proposition, the approximation guarantee of the algorithm is αβ.
Primal-dual schema applied to SC
We choose α = 1 and β = f, where f is the frequency of the most frequent element.
Primal conditions:

$$\forall S \in \mathcal{S}: \ x_S \ne 0 \Rightarrow \sum_{e \in S} y_e = c(S).$$

A set S is said to be tight if

$$\sum_{e \in S} y_e = c(S).$$

Since we increment the primal variables integrally, we can state the conditions as "Pick only tight sets in the cover". In order to maintain dual feasibility, we are not allowed to overpack any set.
Primal-dual schema applied to SC
Again, α = 1 and β = f.
Dual conditions:

$$\forall e: \ y_e \ne 0 \Rightarrow \sum_{S :\, e \in S} x_S \le f.$$

Since we find a 0/1 solution for x, these conditions are equivalent to "Each element having a nonzero dual value can be covered at most f times". Since each element is in at most f sets, this condition is trivially satisfied for all elements.
Primal-dual algorithm
PRIMAL-DUAL
1. Initialization: x ← 0, y ← 0.
2. Until all elements are covered, do:
   Pick an uncovered element, say e, and raise y_e until some set goes tight.
   Pick all tight sets in the cover and update x.
   Declare all the elements occurring in these sets as "covered".
3. Output the set cover x.
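A sketch of PRIMAL-DUAL in Python, under the same illustrative instance representation; it maintains the dual slack c(S) − Σ_{e∈S} y_e of each set and picks sets as they go tight:

def primal_dual_set_cover(universe, sets, cost):
    y = {e: 0.0 for e in universe}       # dual variables
    slack = list(cost)                   # c(S) minus the dual packed into S
    picked = set()                       # indices of tight sets picked
    covered = set()
    while covered != universe:
        e = next(iter(universe - covered))       # any uncovered element
        # Raise y_e until some set containing e goes tight.
        delta = min(slack[i] for i, s in enumerate(sets) if e in s)
        y[e] += delta
        for i, s in enumerate(sets):
            if e in s:
                slack[i] -= delta
        # Pick all tight sets and mark their elements as covered.
        for i, s in enumerate(sets):
            if slack[i] <= 1e-9 and i not in picked:
                picked.add(i)
                covered |= s
    return sorted(picked), y

U = {1, 2, 3, 4, 5}
S = [{1, 2, 3}, {2, 4}, {3, 4, 5}, {5}]
print(primal_dual_set_cover(U, S, cost=[5, 2, 4, 1]))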
Approximation ratio
Theorem:
PRIMAL-DUAL achieves an approximation factor of f.
Proof.
At the end of the algorithm there are no uncovered elements and no overpacked sets, so the primal and dual solutions are both feasible. They satisfy the relaxed complementary slackness conditions with α = 1 and β = f, so by the proposition the approximation factor is f.
Terminologies
Random variable:
Given a probability space, a random variable X is a function from the
underlying sample space to the natural numbers.
Expectation:
Given a discrete random variable X, its expectation E[X] is defined by:
Linearity of expectation:
Given two random variables X and Y defined over the same probability
space, we can define X+Y to be the random variable equal to
X()+Y() on a sample point . For any X and Y, we have
45
0
]Pr[][j
jXjXE
][][][ YEXEYXE
Conditional Expectation
Suppose that we have a random variable X and an event E of positive
probability.
We define the conditional expectation of X, given E, to be the expected
value of X computed only over the part of the sample space
corresponding to E as follows:
$$E[X \mid E] = \sum_{j \ge 0} j \cdot \Pr[X = j \mid E].$$
Maximum satisfiability (MAX-SAT)
Problem (MAX-SAT)
Input:
A conjunctive normal form formula f on Boolean variables x1, …, xn,
and nonnegative weights w_c for each clause c of formula f.
Goal:
Find a truth assignment to the Boolean variables that maximizes the
total weight of satisfied clauses.
Let size(c) denote the size of clause c, i.e., the number of literals in it.
We assume that the sizes of clauses in f are arbitrary.
MAX-kSAT: each clause is of size at most k.
MAX-3SAT
Given a 3-SAT formula, find a truth assignment that satisfies as many
clauses as possible.
Remark: NP-hard optimization problem
[Example: a 3-SAT formula with three clauses C_1, C_2, C_3 over the variables x_1, …, x_4.]
Notation
Random variable W denotes the total weight of satisfied clauses.
For each clause c, random variable Wc denotes the weight contributed
by clause c to W.
Thus,

$$W = \sum_{c \in \mathcal{C}} W_c.$$

The expectation of each term is

$$E[W_c] = w_c \cdot \Pr[c \text{ is satisfied}].$$
Dealing with large clauses
SIMPLE-RANDOM
1. Set each Boolean variable to be true independently with probability 1/2.
2. Output the resulting truth assignment, say, τ.

Lemma:
If size(c) = k, then E[W_c] = α_k w_c, where α_k = 1 − 2^{−k}.

Proof.
Clause c is not satisfied by τ iff all its literals are set to false. The probability of this event is 2^{−k}.
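A sketch of SIMPLE-RANDOM in Python, under a toy CNF representation (literal +i means x_i, literal −i means its negation); the instance is illustrative:

import random

def simple_random(num_vars, clauses, weights):
    # Each variable is set to true independently with probability 1/2.
    tau = {i: random.random() < 0.5 for i in range(1, num_vars + 1)}
    def satisfied(cl):
        return any(tau[abs(lit)] == (lit > 0) for lit in cl)
    weight = sum(w for cl, w in zip(clauses, weights) if satisfied(cl))
    return tau, weight

clauses = [[1, 2, -3], [-1, 4], [2, 3, 4]]
print(simple_random(4, clauses, weights=[1.0, 2.0, 1.0]))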
Dealing with large clauses
For k ≥ 1, α_k ≥ 1/2. By linearity of expectation,

$$E[W] = \sum_{c \in \mathcal{C}} E[W_c] \ge \frac{1}{2} \sum_{c \in \mathcal{C}} w_c \ge \frac{1}{2}\, \mathrm{OPT},$$

since OPT is at most the total weight of clauses in C. Observe that α_k = 1 − 2^{−k} increases with k, so the guarantee of this algorithm is 3/4 (α_2 = 3/4) if each clause has two or more literals.
Self-reducibility of SAT
Let φ be a SAT formula on n Boolean variables, x_1, …, x_n, and let a truth assignment be an n-bit 0/1 vector. Let S be the set of satisfying truth assignments, i.e., solutions to φ. For the setting x_1 = 0 (or 1), we can find, in polynomial time, a formula φ_0 (φ_1) on the remaining n − 1 variables whose solutions S_0 (S_1) are precisely the solutions of φ having x_1 = 0 (1).
Self-reducibility tree T:
[Figure: the self-reducibility tree, branching on x_1 = 0 and x_1 = 1 at the root and continuing down to the n-bit assignments at the leaves.]
Derandomization
Consider the self-reducibility tree T for formula f.
Each internal node at level i corresponds to a setting for Boolean
variables x1, …, xi.
Let us label each node of T with its conditional expectation as follows.
Let a1, …, ai be a (partial) truth assignment to x1, …, xi.
The node corresponding to this assignment is labeled with

$$E[W \mid x_1 = a_1, \ldots, x_i = a_i].$$

If i = n, this is a leaf node and its conditional expectation is simply the total weight of clauses satisfied by its truth assignment.
Derandomizing via conditional expectation
Lemma
The conditional expectation of any node in T can be computed in
polynomial time.
Proof
Consider a node x_1 = a_1, …, x_i = a_i, and let φ be the formula on variables x_{i+1}, …, x_n obtained from f under this partial assignment. The expected weight of satisfied clauses of φ under a random truth assignment to the variables x_{i+1}, …, x_n can be computed in polynomial time. Adding to this the total weight of clauses of f already satisfied by the partial assignment x_1 = a_1, …, x_i = a_i gives the answer.
Derandomizing via conditional expectation
Theorem
We can compute, in polynomial time, a path from the root to a leaf such
that the conditional expectation of each node on this path is E[W].
Proof.
The conditional expectation of a node is the average of the conditional expectations of its two children, i.e.,

$$E[W \mid x_1 = a_1, \ldots, x_i = a_i] = \tfrac{1}{2}\, E[W \mid x_1 = a_1, \ldots, x_i = a_i, x_{i+1} = \mathrm{true}] + \tfrac{1}{2}\, E[W \mid x_1 = a_1, \ldots, x_i = a_i, x_{i+1} = \mathrm{false}].$$

The reason is that x_{i+1} is equally likely to be set to true or false. As a result, the child with the larger value has a conditional expectation at least as large as that of the parent. This establishes the existence of the desired path, and it can be computed in polynomial time.
Derandomizing via conditional expectation
DERANDOMIZATION
1. Simply output the truth assignment on the leaf node of the path computed above.

The total weight of clauses satisfied by it is at least E[W].

[Figure: the top of the self-reducibility tree, with the root labeled E[W], its children E[W | x_1 = true] and E[W | x_1 = false], and grandchildren such as E[W | x_1 = false, x_2 = true] and E[W | x_1 = false, x_2 = false].]
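A sketch of this derandomization in Python for the fair-coin algorithm, using the same toy CNF representation as before; the helper names are illustrative:

def cond_expectation(clauses, weights, partial):
    # E[W | the variables in `partial` are fixed as given], under the
    # fair-coin algorithm for the remaining variables.
    total = 0.0
    for cl, w in zip(clauses, weights):
        unfixed = 0
        satisfied = False
        for lit in cl:
            v = abs(lit)
            if v in partial:
                if partial[v] == (lit > 0):
                    satisfied = True
            else:
                unfixed += 1
        # Pr[clause satisfied] = 1 if already satisfied,
        # else 1 - 2^-(number of still-free literals).
        total += w if satisfied else w * (1 - 2.0 ** -unfixed)
    return total

def derandomize(num_vars, clauses, weights):
    partial = {}
    for i in range(1, num_vars + 1):
        # Follow the child with the larger conditional expectation.
        t = cond_expectation(clauses, weights, {**partial, i: True})
        f = cond_expectation(clauses, weights, {**partial, i: False})
        partial[i] = t >= f
    return partial

clauses = [[1, 2, -3], [-1, 4], [2, 3, 4]]
print(derandomize(4, clauses, weights=[1.0, 2.0, 1.0]))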
General method
Derandomization of more complex randomized algorithms:
Suppose that the algorithm does not set the Boolean variables
independently of each other.
Now,

$$\begin{aligned}
E[W \mid x_1 = a_1, \ldots, x_i = a_i]
= \ & E[W \mid x_1 = a_1, \ldots, x_i = a_i, x_{i+1} = \mathrm{true}] \cdot \Pr[x_{i+1} = \mathrm{true} \mid x_1 = a_1, \ldots, x_i = a_i] \\
{} + \ & E[W \mid x_1 = a_1, \ldots, x_i = a_i, x_{i+1} = \mathrm{false}] \cdot \Pr[x_{i+1} = \mathrm{false} \mid x_1 = a_1, \ldots, x_i = a_i].
\end{aligned}$$

The sum of the two conditional probabilities is again 1, since the two events are exhaustive. Then, the conditional expectation of the parent is still a convex combination of the conditional expectations of the two children.
General method
If we can determine, in polynomial time, which of the two children has
a larger value, we can again derandomize the algorithm.
(However, computing the conditional expectations may not be easy.)
In general, a randomized algorithm may pick from a larger set of choices, and not necessarily with equal probability. But the convex combination of the conditional expectations of these choices, weighted by the probabilities of picking them, equals the conditional expectation of the parent. Hence, there must be a choice whose conditional expectation is at least as large as the parent's.
Dealing with small clauses via LP-rounding
Consider an integer program (IP) for MAX-SAT (in the next slide).
For each clause c, let S_c⁺ (S_c⁻) denote the set of Boolean variables occurring nonnegated (negated) in c. The truth assignment is encoded by a vector y: picking y_i = 1 (y_i = 0) denotes setting x_i to true (false). For example, if c = (x_1 ∨ x_2 ∨ ¬x_3 ∨ ¬x_4), then S_c⁺ = {x_1, x_2} and S_c⁻ = {x_3, x_4}.

Constraint:
The constraint for clause c ensures that z_c (0 or 1) can be set to 1 only if at least one of the literals occurring in c is set to true, i.e., only if clause c is satisfied by the picked truth assignment.
IP and LP-relaxation
Integer program for MAX-SAT:

$$\begin{aligned}
\text{maximize} \quad & \sum_{c \in \mathcal{C}} w_c z_c \\
\text{subject to} \quad & \forall c \in \mathcal{C}: \ \sum_{i \in S_c^{+}} y_i + \sum_{i \in S_c^{-}} (1 - y_i) \ge z_c \\
& \forall c \in \mathcal{C}: \ z_c \in \{0, 1\} \\
& \forall i: \ y_i \in \{0, 1\}
\end{aligned} \qquad \text{(IP)}$$

LP-relaxation:

$$\begin{aligned}
\text{maximize} \quad & \sum_{c \in \mathcal{C}} w_c z_c \\
\text{subject to} \quad & \forall c \in \mathcal{C}: \ \sum_{i \in S_c^{+}} y_i + \sum_{i \in S_c^{-}} (1 - y_i) \ge z_c \\
& \forall c \in \mathcal{C}: \ 1 \ge z_c \ge 0 \\
& \forall i: \ 1 \ge y_i \ge 0
\end{aligned} \qquad \text{(LP)}$$
LP-rounding
LP-RANDOM
1. Solve (LP). Let (y*, z*) be the optimal solution.
2. Independently set x_i to true with probability y_i*.
3. Output the resulting truth assignment, say, τ.

For k ≥ 1, define

$$\beta_k = 1 - \left(1 - \frac{1}{k}\right)^{k}.$$

Lemma:
If size(c) = k, then

$$E[W_c] \ge \beta_k\, w_c z_c^*.$$
Proof
Assume without loss of generality that all literals in c appear nonnegated. By renaming, we may assume c = (x_1 ∨ ⋯ ∨ x_k). Clause c is satisfied iff x_1, …, x_k are not all set to false. The probability of this event is

$$1 - \prod_{i=1}^{k} (1 - y_i^*) \ge 1 - \left( \frac{\sum_{i=1}^{k} (1 - y_i^*)}{k} \right)^{k} = 1 - \left( 1 - \frac{\sum_{i=1}^{k} y_i^*}{k} \right)^{k} \ge 1 - \left( 1 - \frac{z_c^*}{k} \right)^{k},$$

where the first inequality follows from the arithmetic-geometric mean inequality, which states that for nonnegative a_1, …, a_k,

$$\frac{a_1 + \cdots + a_k}{k} \ge \sqrt[k]{a_1 \cdots a_k}.$$

The second inequality uses the constraint in (LP) that y_1 + ⋯ + y_k ≥ z_c.
Proof (cont’d)
Define the function g by

$$g(z) = 1 - \left(1 - \frac{z}{k}\right)^{k}.$$

This is a concave function with g(0) = 0 and g(1) = β_k. Therefore, for z ∈ [0, 1], g(z) ≥ β_k z. Hence,

$$\Pr[c \text{ is satisfied}] \ge 1 - \left(1 - \frac{z_c^*}{k}\right)^{k} = g(z_c^*) \ge \beta_k z_c^*.$$

The lemma follows, since E[W_c] = w_c · Pr[c is satisfied].
Dealing with small clauses
Note that β_k is a decreasing function of k. Thus, if all clauses are of size at most k,

$$E[W] = \sum_{c \in \mathcal{C}} E[W_c] \ge \beta_k \sum_{c \in \mathcal{C}} w_c z_c^* = \beta_k \cdot \mathrm{OPT}_f \ge \beta_k \cdot \mathrm{OPT}.$$

This algorithm can also be derandomized using the method of conditional expectation. Hence, for MAX-SAT instances with clause sizes at most k, it is a β_k factor approximation algorithm. Since

$$\forall k \in \mathbb{Z}^{+}: \quad 1 - \left(1 - \frac{1}{k}\right)^{k} > 1 - \frac{1}{e},$$

this is a 1 − 1/e factor algorithm for MAX-SAT.
A 3/4 factor algorithm
We combine the two algorithms as follows:
Let b be the flip of a fair coin.
If b = 0, then run SIMPLE-RANDOM, and if b = 1, then run LP-RANDOM.
Let (y*, z*) be an optimal solution to (LP) on the given instance.
Lemma:

$$E[W_c] \ge \frac{3}{4}\, w_c z_c^*.$$
Proof
Let size(c) = k. By the lemma for SIMPLE-RANDOM and the fact that z_c* ≤ 1,

$$E[W_c \mid b = 0] = \alpha_k w_c \ge \alpha_k w_c z_c^*.$$

By the lemma for LP-RANDOM,

$$E[W_c \mid b = 1] \ge \beta_k w_c z_c^*.$$

Combining, we get

$$E[W_c] = \frac{1}{2}\left( E[W_c \mid b = 0] + E[W_c \mid b = 1] \right) \ge \frac{\alpha_k + \beta_k}{2}\, w_c z_c^*.$$

Now, α_1 + β_1 = α_2 + β_2 = 3/2, and for k ≥ 3,

$$\alpha_k + \beta_k \ge \frac{7}{8} + \left(1 - \frac{1}{e}\right) \ge \frac{3}{2}.$$

The lemma follows.
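A quick numeric check of this case analysis (α_k and β_k as defined earlier):

for k in range(1, 11):
    alpha_k = 1 - 2.0 ** -k
    beta_k = 1 - (1 - 1.0 / k) ** k
    print(k, (alpha_k + beta_k) / 2)    # always >= 0.75; equality at k = 1, 2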
Algorithm
By linearity of expectation,

$$E[W] = \sum_{c \in \mathcal{C}} E[W_c] \ge \frac{3}{4} \sum_{c \in \mathcal{C}} w_c z_c^* = \frac{3}{4}\, \mathrm{OPT}_f \ge \frac{3}{4}\, \mathrm{OPT}.$$

Derandomization:

DERANDOMIZED MAX-SAT
1. Use the derandomized factor 1/2 algorithm to get a truth assignment, τ_1.
2. Use the derandomized factor 1 − 1/e algorithm to get a truth assignment, τ_2.
3. Output the better of the two assignments.
Approximation ratio
Theorem
The DERANDOMIZED MAX-SAT algorithm is a deterministic factor 3/4 approximation algorithm for MAX-SAT.
Proof.
The average of the weights of satisfied clauses under τ_1 and τ_2 is at least (3/4)·OPT, since the following holds for each clause c:

$$E[W_c] \ge \frac{3}{4}\, w_c z_c^*.$$

Hence, the better of these two assignments also does at least as well.
Outline
We provide a class of relaxations, called vector programs.
These serve as relaxations for several NP-hard problems, in particular, for problems that can be expressed as strict quadratic programs.
Vector programs are equivalent to a powerful and well-studied
generalization of linear programs, called semidefinite programs.
Semidefinite programs (and vector programs) can be solved within an additive error of ε, for any ε > 0, in time polynomial in n and log(1/ε), using the ellipsoid algorithm.
We will see the use of vector programs by deriving a 0.87856 factor approximation algorithm for MAX-CUT.
Maximum cut
Problem (MAX-CUT)
Given an undirected graph G = (V, E) with edge weights w: E → Q⁺, find a partition (S, V – S) of V so as to maximize the total weight of the edges in this cut, i.e., the edges that have one endpoint in S and the other in V – S.

[Figure: a partition (S, V – S); the cut edges cross between the two sides.]
Strict quadratic programs
A quadratic program is the problem of optimizing a quadratic function
of integer valued variables, subject to quadratic constraints on these
variables.
If each monomial in the objective function, as well as in each of the constraints, is of degree 0 (constant) or 2, then we will say that this is a strict quadratic program.

MAX-CUT:

$$\begin{aligned}
\text{maximize} \quad & \frac{1}{2} \sum_{1 \le i < j \le n} w_{ij} (1 - y_i y_j) \\
\text{subject to} \quad & y_i^2 = 1, \quad v_i \in V \\
& y_i \in \mathbb{Z}, \quad v_i \in V
\end{aligned} \qquad \text{(SQP)}$$
Strict quadratic program of MAX-CUT
Let yi be an indicator variable for vertex vi which is constrained to be
either +1 or –1.
The partition (S, V – S) is defined as follows: v_i ∈ S iff y_i = 1, and v_i ∈ V – S iff y_i = −1.
If v_i and v_j are on opposite sides of this partition, then y_i y_j = −1, and edge (v_i, v_j) contributes w_ij to the objective function. If they are on the same side, then y_i y_j = 1, and edge (v_i, v_j) makes no contribution.
Vector program
We relax the strict quadratic program to a vector program. A vector program is defined over n vector variables in R^n, say v_1, …, v_n, and is the problem of optimizing a linear function of the inner products v_i · v_j, 1 ≤ i ≤ j ≤ n, subject to linear constraints on these inner products. A vector program can be thought of as being obtained from a linear program by replacing each variable with an inner product of a pair of these vectors.

$$\begin{aligned}
\text{maximize} \quad & \frac{1}{2} \sum_{1 \le i < j \le n} w_{ij} (1 - v_i \cdot v_j) \\
\text{subject to} \quad & v_i \cdot v_i = 1, \quad v_i \in V \\
& v_i \in \mathbb{R}^n, \quad v_i \in V
\end{aligned} \qquad \text{(VP)}$$
Vector program of MAX-CUT
A strict quadratic program over n integer variables defines a vector
program over n vector variables in Rn as follows:
Establish a correspondence between the n integer variables and the
n vector variables, and replace each degree 2 term with the
corresponding inner product.
Applying this to (SQP) yields the vector program (VP) shown above.
Vector is a relaxation of Quadratic
Because of the constraint v_i · v_i = 1, the vectors v_1, …, v_n are constrained to lie on the n-dimensional unit sphere S^{n−1}. Any feasible solution to the strict quadratic program (SQP) yields a solution to the vector program (VP) having the same objective function value, by assigning the vector (y_i, 0, …, 0) to v_i. Therefore, the vector program (VP) is a relaxation of the strict quadratic program (SQP). In general, the vector program corresponding to a strict quadratic program is a relaxation of that quadratic program.
Approximability of vector programs
Vector programs are approximable to any desired degree of accuracy in
polynomial time, and thus the relaxation (VP) provides an upper bound
on OPT for MAX-CUT.
To show the above, we need to recall some interesting and powerful
properties of positive semidefinite matrices.
Properties of positive semidefinite matrices
Let A be a real, symmetric n × n matrix.
Then, A has real eigenvalues and has n linearly independent
eigenvectors (even if the eigenvalues are not distinct).
We say that A is positive semidefinite if

$$\forall x \in \mathbb{R}^n: \ x^T A x \ge 0.$$

Theorem:
The following are equivalent:
1. ∀x ∈ R^n: x^T A x ≥ 0.
2. All eigenvalues of A are nonnegative.
3. There is an n × n real matrix W such that A = W^T W.
Proof
(1) ⇒ (2):
Let λ be an eigenvalue of A, and let v be a corresponding eigenvector; thus Av = λv. Pre-multiplying by v^T, we get v^T A v = λ v^T v. By (1), v^T A v ≥ 0, so λ v^T v ≥ 0. Since v^T v > 0, λ ≥ 0.

(2) ⇒ (3):
Let λ_1, …, λ_n be the n eigenvalues of A, and v_1, …, v_n the corresponding complete collection of orthonormal eigenvectors. Let Q be the matrix whose columns are v_1, …, v_n, and let Λ be the diagonal matrix with entries λ_1, …, λ_n. Since Av_i = λ_i v_i for each i, we have AQ = QΛ.
Proof (cont’d)
Since Q is orthogonal, i.e., QQ^T = I, we get that Q^T = Q^{-1}. Therefore, A = QΛQ^{-1} = QΛQ^T.
Let D be the diagonal matrix whose diagonal entries are the positive square roots of λ_1, …, λ_n. Note that by (2), λ_1, …, λ_n are nonnegative, and thus their square roots are real. Then Λ = DD^T. Substituting, we get A = QDD^TQ^T = (QD)(QD)^T. Now, (3) follows by letting W = (QD)^T.

(3) ⇒ (1):
For any x ∈ R^n, x^T A x = x^T W^T W x = (Wx)^T (Wx) ≥ 0.
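The (2) ⇒ (3) direction is easy to reproduce numerically; a small NumPy sketch with an illustrative PSD matrix:

import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])    # a symmetric PSD example
lam, Q = np.linalg.eigh(A)                # eigenvalues (ascending) and Q
D = np.diag(np.sqrt(lam))                 # nonnegative square roots
W = (Q @ D).T                             # W = (QD)^T as in the proof
print(np.allclose(W.T @ W, A))            # True: A = W^T W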
Semidefinite programming problem
Let Y be an n × n matrix of real variables. The problem of maximizing a linear function of the y_ij's, subject to linear constraints on them, and the additional constraint that Y be symmetric and positive semidefinite, is called the semidefinite programming problem (SDP).
Notation
Denote by R^{n×n} the space of n × n real matrices. The trace of a matrix A ∈ R^{n×n} is the sum of its diagonal entries and is denoted by tr(A). The Frobenius inner product of matrices A, B ∈ R^{n×n}, denoted A • B, is defined to be

$$A \bullet B = \mathrm{tr}(A^T B) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} b_{ij}.$$

Let A ⪰ 0 denote the fact that matrix A is positive semidefinite.
Semidefinite program
Let C, D_1, …, D_k be symmetric n × n real matrices and d_1, …, d_k real numbers. The following is a statement of the general semidefinite programming problem:

$$\begin{aligned}
\text{maximize} \quad & C \bullet Y \\
\text{subject to} \quad & D_i \bullet Y = d_i, \quad 1 \le i \le k \\
& Y \succeq 0
\end{aligned} \qquad \text{(SDP)}$$

Observe that if C, D_1, …, D_k are all diagonal matrices, this is simply a linear programming problem.
Separating hyperplane
Let us call a matrix satisfying all the constraints of (SDP) a feasible
solution.
Since a convex combination of positive semidefinite matrices is positive
semidefinite, it is easy to see that the set of feasible solutions is convex.
I.e., if A and B are feasible solutions, then so is any convex
combination of these solutions.
Let A be an infeasible point. A hyperplane C • Y ≤ b is called a separating hyperplane for A if all feasible points satisfy it and point A does not.
In the next slide, we show how to find a separating hyperplane in polynomial time. As a consequence, for any ε > 0, semidefinite programs can be solved within an additive error of ε, in time polynomial in n and log(1/ε), using the ellipsoid algorithm.
Separating hyperplane
Theorem:
Let SDP be a semidefinite programming problem, and A a point in R^{n×n}. We can determine in polynomial time whether A is feasible for SDP and, if it is not, find a separating hyperplane.
[Figure: the convex feasible region, an infeasible point A, and a separating hyperplane C • Y = b between them.]
Proof
Testing for feasibility involves ensuring that A is symmetric and
positive semidefinite, and that it satisfies all the linear constraints. This
can be done in polynomial time (proof is omitted).
If A is infeasible, a separating hyperplane is obtained as follows:
If A is not symmetric, then a_ij > a_ji for some i, j. Then y_ij ≤ y_ji is a separating hyperplane.
If A is not positive semidefinite, then it has a negative eigenvalue, say λ. Let v be the corresponding eigenvector. Now (vv^T) • Y = v^T Y v ≥ 0 is a separating hyperplane.
If any of the linear constraints is violated, it directly yields a separating hyperplane.
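A sketch of the symmetry and semidefiniteness tests in NumPy, returning the separating hyperplane from the proof; names and tolerances are illustrative:

import numpy as np

def psd_separation(A, tol=1e-9):
    # Symmetry test: a violated pair yields the hyperplane y_ij <= y_ji.
    n = len(A)
    for i in range(n):
        for j in range(n):
            if A[i, j] - A[j, i] > tol:
                return ("hyperplane", f"y[{i},{j}] <= y[{j},{i}]")
    # PSD test: a negative eigenvalue with eigenvector v yields the
    # hyperplane (v v^T) . Y >= 0, violated by A since v^T A v < 0.
    lam, vecs = np.linalg.eigh(A)
    if lam[0] < -tol:
        v = vecs[:, 0]
        return ("hyperplane", np.outer(v, v))
    return ("feasible", None)

print(psd_separation(np.array([[1.0, 2.0], [2.0, 1.0]])))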
Equivalency
Theorem:
Vector program VP is equivalent to semidefinite program SDP.
Proof.
Let a_1, …, a_n be a feasible solution to the VP, and let W be the matrix whose columns are a_1, …, a_n. Then A = W^T W is a feasible solution to the corresponding SDP S (over n² variables) having the same objective function value.
Conversely, let A be a feasible solution to the SDP S. Since A is symmetric and positive semidefinite, there is an n × n matrix W such that A = W^T W. Let a_1, …, a_n be the columns of W. Then a_1, …, a_n is a feasible solution to the VP having the same objective function value.
Semidefinite programming relaxation of MAX-CUT
The semidefinite programming relaxation of MAX-CUT, which is equivalent to (VP):

$$\begin{aligned}
\text{maximize} \quad & \frac{1}{2} \sum_{1 \le i < j \le n} w_{ij} (1 - y_{ij}) \\
\text{subject to} \quad & y_{ii} = 1, \quad v_i \in V \\
& Y \succeq 0
\end{aligned}$$
Randomized rounding algorithm
Assume that we have an optimal solution to (VP). (In fact, (VP) can only be solved to within a small additive error, which we ignore here.) Let a_1, …, a_n be an optimal solution, and let OPT_v denote its objective function value. These vectors lie on the n-dimensional unit sphere S^{n−1}. We need to obtain a cut (S, V – S) whose weight is a large fraction of OPT_v.

[Figure: vectors on the sphere partitioned into S and V – S by the cut.]
Randomized rounding algorithm
Let θ_ij denote the angle between the vectors a_i and a_j. The contribution of this pair of vectors to OPT_v is

$$\frac{w_{ij}}{2} (1 - \cos\theta_{ij}),$$

since a_i · a_j = cos θ_ij. The closer θ_ij is to π, the larger this contribution will be. We want vertices v_i and v_j to be separated if θ_ij is large.

Idea:
Pick r to be a uniformly distributed vector on the unit sphere S^{n−1}, and select

$$S = \{ v_i : a_i \cdot r \ge 0 \}.$$
Randomized rounding algorithm
Lemma:

$$\Pr[v_i \text{ and } v_j \text{ are separated}] = \frac{\theta_{ij}}{\pi}.$$

Proof.
Project r onto the plane containing a_i and a_j. The projected direction is uniformly distributed on a circle in this plane, and v_i and v_j are separated iff it falls in one of two opposite arcs of angle θ_ij each; hence the probability is 2θ_ij / 2π = θ_ij / π.

[Figure: the plane of a_i and a_j, showing the arcs of r's projection for which v_i and v_j land on the same or on opposite sides.]
Algorithm
VECTOR ROUNDING MAX CUT
1. Solve vector program (VP). Let a_1, …, a_n be an optimal solution.
2. Pick r to be a uniformly distributed vector on the unit sphere S^{n−1}.
3. Output the cut (S, V – S), where S = {v_i : a_i · r ≥ 0}.

Let W be the random variable denoting the weight of edges in the cut picked by the above algorithm.
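A sketch of the rounding step in Python, assuming the unit vectors a_1, …, a_n from the (VP) solution are already given (solving the SDP itself is outside this sketch); the direction of a standard Gaussian vector is uniformly distributed, which suffices since only the sign of a_i · r matters:

import numpy as np

def round_vectors(a, weights):
    # a: array whose rows are the unit vectors a_1, ..., a_n.
    # weights: dict mapping vertex pairs (i, j) to edge weights w_ij.
    r = np.random.normal(size=a.shape[1])    # random hyperplane normal
    side = a @ r >= 0                        # True: v_i goes into S
    cut = sum(w for (i, j), w in weights.items() if side[i] != side[j])
    return side, cut

a = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]])   # hypothetical vectors
print(round_vectors(a, {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0}))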
Approximation ratio
Lemma:

$$E[W] \ge \alpha \cdot \mathrm{OPT}_v, \qquad \text{where } \alpha = \frac{2}{\pi} \min_{0 \le \theta \le \pi} \frac{\theta}{1 - \cos\theta}.$$

Proof.
For any 0 ≤ θ ≤ π, the definition of α gives

$$\frac{\theta}{\pi} \ge \alpha \cdot \frac{1 - \cos\theta}{2}.$$

Then,

$$E[W] = \sum_{1 \le i < j \le n} w_{ij} \Pr[v_i \text{ and } v_j \text{ are separated}] = \frac{1}{\pi} \sum_{1 \le i < j \le n} w_{ij}\, \theta_{ij} \ge \alpha \sum_{1 \le i < j \le n} \frac{w_{ij}}{2} (1 - \cos\theta_{ij}) = \alpha \cdot \mathrm{OPT}_v.$$
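The constant α can be evaluated numerically with a simple grid search:

import math

alpha = min((2 / math.pi) * t / (1 - math.cos(t))
            for t in (k * math.pi / 100000 for k in range(1, 100001)))
print(alpha)    # approx. 0.87856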
Approximation ratio
Theorem.
There is a randomized approximation algorithm for MAX-CUT
achieving an approximation factor of 0.87856.
Proof.
Let T denote the sum of weights of all edges in G, and define a so that E[W] = aT. Let

$$p = \Pr[W < (1 - \varepsilon)\, aT],$$

where ε > 0 is a small constant. Since the random variable W is always bounded by T, we get

$$E[W] \le p (1 - \varepsilon) aT + (1 - p) T.$$
Proof (cont’d)
Therefore,

$$p \le \frac{1 - a}{1 - (1 - \varepsilon) a} = \frac{1 - a}{1 - a + \varepsilon a}.$$

Now,

$$aT = E[W] \ge \alpha \cdot \mathrm{OPT}_v \ge \alpha \cdot \mathrm{OPT} \ge \frac{\alpha T}{2},$$

where the last inequality follows from the fact that OPT ≥ T/2. Therefore, a ≥ α/2. Using this upper bound on p and this lower bound on a (the bound on p is decreasing in a), we get

$$p \le \frac{1 - \alpha/2}{1 - \alpha/2 + \varepsilon \alpha/2} = \frac{1}{1 + c}, \qquad \text{where } c = \frac{\varepsilon \alpha}{2 - \alpha} > 0.$$

Hence, with probability at least 1 − p ≥ c/(1 + c), the cut output by the algorithm has weight at least (1 − ε) aT ≥ (1 − ε) α · OPT, which establishes the claimed approximation factor.