
Polarization in the presence of memory

Eren Şaşoğlu
EPFL, Lausanne, Switzerland
eren.sasoglu@epfl.ch

Abstract—It is shown that Arıkan's recursive construction polarizes all q-ary, κ-th order ergodic Markov processes for all κ and prime q.

Index Terms—Channel polarization, source polarization, memory, Markov process, mixing.

I. INTRODUCTION

In [1], Arıkan introduced polar codes, a class of codes that achieve the 'symmetric capacity' of binary-input memoryless channels. The same author showed later that the techniques of [1] can be used to compress binary memoryless sources to their entropy [2]. The main observation underlying binary polar codes for channel and source coding is the following: Suppose (X_1, Y_1), (X_2, Y_2), ... is a sequence of i.i.d. random variables, where X_1, X_2, ... are {0, 1}-valued. Here, the X's can be thought of as inputs to a memoryless channel, and the Y's as the corresponding outputs. Alternatively, the X's can be the output of an i.i.d. source, and the Y's side information about the source.

The first step of Arıkan's recursive construction consists in placing (X_1, X_2) in one-to-one correspondence with (S_1, S_2) via the mapping

$$S_1 = X_1 + X_2, \qquad S_2 = X_2, \qquad (1)$$

where the summation is modulo-2 (cf. Fig. 1). It is then easy to see that $H(S_1^2 \mid Y_1^2) = 2H(X_1 \mid Y_1)$, and also that

$$H(S_2 \mid Y_1^2 S_1) \le H(X_1 \mid Y_1) \le H(S_1 \mid Y_1^2). \qquad (2)$$
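These one-step relations are easy to check numerically. The following is a minimal sketch, assuming a toy memoryless setup: a Bernoulli(0.3) source observed through a binary symmetric channel with crossover probability 0.1 (both parameters are arbitrary illustrative choices, not from the paper).

```python
import itertools
import math

p_x = {0: 0.7, 1: 0.3}   # source distribution, an illustrative choice
eps = 0.1                 # BSC crossover probability, also illustrative

def p_y_given_x(y, x):
    return 1 - eps if y == x else eps

# Joint distribution of (X1, X2, Y1, Y2) for two independent uses.
joint = {}
for x1, x2, y1, y2 in itertools.product((0, 1), repeat=4):
    joint[(x1, x2, y1, y2)] = (p_x[x1] * p_x[x2]
                               * p_y_given_x(y1, x1) * p_y_given_x(y2, x2))

def cond_entropy(a_of, b_of):
    """H(A | B) in bits, where A = a_of(outcome) and B = b_of(outcome)."""
    p_ab, p_b = {}, {}
    for outcome, p in joint.items():
        a, b = a_of(outcome), b_of(outcome)
        p_ab[(a, b)] = p_ab.get((a, b), 0.0) + p
        p_b[b] = p_b.get(b, 0.0) + p
    return -sum(p * math.log2(p / p_b[b]) for (a, b), p in p_ab.items() if p > 0)

# S1 = X1 + X2 (mod 2) and S2 = X2, as in (1).
h_x  = cond_entropy(lambda o: o[0], lambda o: o[2])                # H(X1 | Y1)
h_s1 = cond_entropy(lambda o: (o[0] + o[1]) % 2, lambda o: o[2:])  # H(S1 | Y1 Y2)
h_s2 = cond_entropy(lambda o: o[1],
                    lambda o: (o[2], o[3], (o[0] + o[1]) % 2))     # H(S2 | Y1 Y2 S1)

print(h_s2, "<=", h_x, "<=", h_s1)   # the inequalities in (2)
print(h_s1 + h_s2, 2 * h_x)          # conservation: H(S1 S2 | Y1 Y2) = 2 H(X1 | Y1)
```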

That is, S_1 is 'more random' than X_1, and S_2 is 'less random', conditioned on the corresponding side information. Furthermore, as S_1 and S_2 are binary random variables, the transform in (1) can similarly be applied to them. Proceeding recursively in this fashion n times, one obtains N = 2^n random variables U_{n,i}, i = 1, ..., N, with

$$U_{n,1}^N = X_1^N \Pi_n G_n, \qquad (3)$$

where $\Pi_n$ is a permutation matrix known as the 'bit reversal' operator, and $G_n = \left[\begin{smallmatrix} 1 & 0 \\ 1 & 1 \end{smallmatrix}\right]^{\otimes n}$. Here, '$\otimes n$' denotes the n-th Kronecker power of a matrix. We know from [1] that an equivalent way of defining $U_{n,1}^N$ is through

$$U_{n,2i-1} = U_{n-1,i} + V_{n-1,i}, \qquad U_{n,2i} = V_{n-1,i}, \qquad i = 1, \ldots, N/2, \qquad (4)$$

where $V_{n,1}^N = X_{N+1}^{2N} \Pi_n G_n$.
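The equivalence between (3) and (4) can be verified directly. Below is a minimal sketch for the binary case q = 2; the helper names `transform_matrix` and `transform_recursive` are hypothetical, and the kernel used is the one consistent with (1).

```python
import numpy as np

def bit_reversal_perm(n):
    # perm[j] = the integer whose n-bit binary expansion reverses that of j
    return [int(format(j, f"0{n}b")[::-1], 2) for j in range(1 << n)]

def transform_matrix(x):
    # (3): U = X Pi_n G_n over F_2, with G_n the n-th Kronecker power of the kernel.
    n = len(x).bit_length() - 1
    G = np.array([[1]], dtype=int)
    for _ in range(n):
        G = np.kron(G, np.array([[1, 0], [1, 1]]))
    return (x[bit_reversal_perm(n)] @ G) % 2

def transform_recursive(x):
    # (4): U_{n,2i-1} = U_{n-1,i} + V_{n-1,i} and U_{n,2i} = V_{n-1,i},
    # where U, V are the transforms of the two halves of x.
    if len(x) == 1:
        return x.copy()
    u = transform_recursive(x[: len(x) // 2])
    v = transform_recursive(x[len(x) // 2 :])
    out = np.empty_like(x)
    out[0::2] = (u + v) % 2
    out[1::2] = v
    return out

x = np.random.default_rng(0).integers(0, 2, size=16)
assert np.array_equal(transform_matrix(x), transform_recursive(x))
```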

Fig. 1. One level of Arıkan's recursive construction.

The main result in [1] and [2] is that as the construction size in (3) increases, the resulting entropies $H(U_{n,i} \mid Y_1^N U_{n,1}^{i-1})$ polarize, i.e., they approach either 0 or 1. That is, the resulting random variables U_{n,i} become either almost deterministic or almost uniformly distributed, conditioned on their past:

Theorem 1 ([1],[2]). For (X_1, Y_1), (X_2, Y_2), ... and $U_{n,1}^N$ defined as above, we have

$$\lim_{N \to \infty} \frac{1}{N} \left|\left\{ i : H(U_{n,i} \mid Y_1^N U_{n,1}^{i-1}) < \delta \right\}\right| = 1 - H(X_1 \mid Y_1),$$

$$\lim_{N \to \infty} \frac{1}{N} \left|\left\{ i : H(U_{n,i} \mid Y_1^N U_{n,1}^{i-1}) > 1 - \delta \right\}\right| = H(X_1 \mid Y_1),$$

for any δ > 0.

The proof of this theorem is based on the following observations:

(i) Regardless of the joint distribution of (X_1, Y_1), the inequalities in (2) are strict unless H(X_1 | Y_1) = 0 or H(X_1 | Y_1) = 1.

(ii) At the n-th level of the construction, the pairs $(U_{n,i}, Y_1^N U_{n,1}^{i-1})$ and $(V_{n,i}, Y_{N+1}^{2N} V_{n,1}^{i-1})$ are independent and identically distributed. Therefore, in the next step of the construction we have

$$H(U_{n+1,2i} \mid Y_1^{2N} U_{n+1,1}^{2i-1}) \le H(U_{n,i} \mid Y_1^N U_{n,1}^{i-1}) \le H(U_{n+1,2i-1} \mid Y_1^{2N} U_{n+1,1}^{2i-2}). \qquad (5)$$

Also, by virtue of (i), these inequalities are strict unless $H(U_{n,i} \mid Y_1^N U_{n,1}^{i-1})$ is 0 or 1.

(iii) As n grows, both inequalities in (5) approach equalities for almost all indices i. It then follows from (ii) that the entropies $H(U_{n,i} \mid Y_1^N U_{n,1}^{i-1})$ must approach either 0 or 1.
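The statement of Theorem 1 can be observed at small blocklengths by exact enumeration. The sketch below assumes an i.i.d. Bernoulli(0.11) source with trivial side information (the Y's constant), a simplification chosen so that the conditional entropies H(U_i | U^{i-1}) can be computed exactly; the blocklength and source bias are arbitrary illustrative choices.

```python
import itertools
import math
from collections import defaultdict

def polar(x):
    # the recursion in (4), binary case
    if len(x) == 1:
        return x
    u, v = polar(x[: len(x) // 2]), polar(x[len(x) // 2 :])
    out = []
    for a, b in zip(u, v):
        out += [(a + b) % 2, b]
    return out

N, p = 16, 0.11
prefix = [defaultdict(float) for _ in range(N + 1)]
for bits in itertools.product((0, 1), repeat=N):
    u = tuple(polar(list(bits)))
    w = p ** sum(bits) * (1 - p) ** (N - sum(bits))
    for i in range(N + 1):
        prefix[i][u[:i]] += w

def H_prefix(i):
    # H(U_1^i) in bits
    return -sum(q * math.log2(q) for q in prefix[i].values() if q > 0)

spectrum = sorted(H_prefix(i) - H_prefix(i - 1) for i in range(1, N + 1))
print([round(h, 3) for h in spectrum])
# Most of the N conditional entropies H(U_i | U^{i-1}) already crowd
# toward 0 or 1; their average equals the binary entropy h(0.11) for every N.
```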

The aim of this note is to show that Arıkan's construction (3) can be used as is to polarize a large class of processes with memory. In order to do so, one unfortunately cannot follow the above agenda exactly, as observation (i) depends strongly on the memorylessness of the underlying process (X_1, Y_1), (X_2, Y_2), ..., and does not necessarily hold when the process has memory (and therefore neither does property (ii)). In fact, not only may the inequalities in (2) be equalities, but it is also easy to find examples of processes for which they do not hold at all. We will see, however, that the inequalities in (ii) become strict for most indices as the construction size grows, although they may not hold in the initial stages of the construction.

II. PROBLEM STATEMENT AND RESULT

Let X_1, X_2, ... be an ergodic (i.e., positive recurrent and aperiodic), stationary Markov process of order κ < ∞, taking values in X = {0, ..., q − 1}. Throughout, we suppose that q is a prime number. Let

$$H_X = \lim_{N \to \infty} H(X_N \mid X_1^{N-1}) = \lim_{N \to \infty} \frac{1}{N} H(X_1^N)$$

denote the entropy rate of the process. For all n and N = 2^n, define random variables U_{n,i}, i = 1, ..., N, through

$$U_{n,1}^N = X_1^N \Pi_n G_n,$$

where $\Pi_n$ and $G_n$ are as in the previous section, and matrix multiplication is over $\mathbb{F}_q$.

Let H[0] := H(X_1), and define, for all n and i = 1, ..., N, the shorthand notation

$$H[n]_i := H(U_{n,i} \mid U_{n,1}^{i-1}), \qquad H[n]_i^- := H[n+1]_{2i-1}, \qquad H[n]_i^+ := H[n+1]_{2i}.$$

As $\Pi_n G_n$ is invertible, we have

$$\sum_{i=1}^N H[n]_i = H(U_{n,1}^N) = H(X_1^N),$$

and therefore

$$\lim_{n \to \infty} \frac{1}{N} \sum_{i=1}^N H[n]_i = H_X. \qquad (6)$$

Entropies above and throughout this note are computed with base-q logarithms, and are thus [0, 1]-valued. The main result reported here is a generalization of Arıkan's polarization theorem:

Theorem 2. Let X_1, X_2, ... be as above. Then, for all δ > 0,

$$\lim_{n \to \infty} \frac{1}{N} \left|\left\{ i : H[n]_i > 1 - \delta \right\}\right| = H_X, \qquad \lim_{n \to \infty} \frac{1}{N} \left|\left\{ i : H[n]_i < \delta \right\}\right| = 1 - H_X. \qquad (7)$$
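As a small-blocklength sanity check of Theorem 2, the following sketch assumes a binary (q = 2), first-order (κ = 1) symmetric Markov source with flip probability 0.2, an arbitrary illustrative choice. It verifies the sum identity preceding (6) exactly and shows the entropies H[n]_i already drifting toward the extremes despite the memory.

```python
import itertools
import math
from collections import defaultdict

def polar(x):
    # the recursion in (4), binary case
    if len(x) == 1:
        return x
    u, v = polar(x[: len(x) // 2]), polar(x[len(x) // 2 :])
    out = []
    for a, b in zip(u, v):
        out += [(a + b) % 2, b]
    return out

N, a = 16, 0.2   # blocklength and flip probability of the Markov chain
prefix = [defaultdict(float) for _ in range(N + 1)]
for bits in itertools.product((0, 1), repeat=N):
    w = 0.5      # the stationary distribution of x_1 is uniform
    for prev, cur in zip(bits, bits[1:]):
        w *= a if cur != prev else 1 - a
    u = tuple(polar(list(bits)))
    for i in range(N + 1):
        prefix[i][u[:i]] += w

def H_prefix(i):
    return -sum(q * math.log2(q) for q in prefix[i].values() if q > 0)

H_n = [H_prefix(i) - H_prefix(i - 1) for i in range(1, N + 1)]
print(round(sum(H_n), 4), round(H_prefix(N), 4))  # both equal H(X_1^N)
print([round(h, 2) for h in sorted(H_n)])         # values spreading toward 0 and 1
```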

The remainder of this note is devoted to the proof of Theorem 2. In order to prove the convergence in (7), we will have recourse to techniques introduced in [1]; we therefore find it useful to introduce the following notation and restate the theorem in a slightly different form: Let B_0, B_1, ... be i.i.d., {−,+}-valued random variables with Pr[B_0 = +] = 1/2. Define a random process H_0, H_1, ... through

$$H_0 = H[0] = H(X_1), \qquad H_{n+1} = H_n^{B_n}, \quad n = 0, 1, \ldots$$

It is clear that the sets $\{H[n]_i : i = 1, \ldots, N\}$ and $\{H[0]^s : s \in \{-,+\}^n\}$ are identical. As the process B_0, B_1, ... induces a uniform distribution on H_n over the set $\{H[0]^s : s \in \{-,+\}^n\}$, it follows that

$$\frac{1}{N} \left|\left\{ i : H[n]_i \in A \right\}\right| = \Pr[H_n \in A] \qquad (8)$$

for any set A. Consequently, Theorem 2 is equivalent to

Theorem 2*. H_n converges in distribution to a {0, 1}-valued random variable H_∞, with Pr[H_∞ = 1] = 1 − Pr[H_∞ = 0] = H_X.
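To get a feel for the process {H_n}, one can simulate it in a case where the one-step updates are available in closed form. The sketch below assumes the memoryless binary erasure setting, in which H^- = 2H − H² and H^+ = H² hold exactly; this special case is borrowed purely for illustration and is not the general process analyzed in this note.

```python
import numpy as np

rng = np.random.default_rng(0)
H0, n_levels, n_paths = 0.5, 20, 100_000
H = np.full(n_paths, H0)
for _ in range(n_levels):
    plus = rng.random(n_paths) < 0.5        # the i.i.d. signs B_0, B_1, ...
    H = np.where(plus, H ** 2, 2 * H - H ** 2)

# The empirical distribution of H_n piles up at 0 and 1, with the mass
# near 1 approaching H_0, as Theorem 2* asserts in general.
print((H < 1e-3).mean(), (H > 1 - 1e-3).mean())
```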

III. AUXILIARY RESULTS

We will prove Theorem 2* through a similar path to (i)–(iii) above. For that purpose, we will first identify a class of joint distributions on (X_1, Y_1), (X_2, Y_2) for which the inequality

$$H(X_1 \mid Y_1) \le H(X_1 + X_2 \mid Y_1 Y_2) \qquad (9)$$

is strict provided that the left-hand entropy is away from 0 and 1. We will then show that the random variables obtained by Arıkan's recursive construction belong to this class. We will omit the proofs of the auxiliary results throughout.

We begin by quoting an unconditional version of (9) for independent q-ary random variables, which states that modulo-q addition of independent random variables strictly increases entropy unless the summands are uniformly distributed or are constants:

Lemma 1 ([3]). Let X_1, X_2 ∈ X be independent random variables with H(X_1) ≤ H(X_2). If

$$\min\{H(X_2),\, 1 - H(X_1)\} = \delta > 0,$$

then there exists an $\varepsilon_1(\delta) > 0$ such that

$$H(X_1) + \varepsilon_1(\delta) \le H(X_1 + X_2).$$
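A direct numeric check of Lemma 1, assuming q = 3 and two arbitrary non-uniform, non-constant distributions (all numbers are illustrative):

```python
import math

q = 3

def H_q(p):
    # entropy with base-q logs, so values lie in [0, 1]
    return -sum(x * math.log(x, q) for x in p if x > 0)

def mod_q_convolution(p1, p2):
    # distribution of X1 + X2 (mod q) for independent X1 ~ p1, X2 ~ p2
    return [sum(p1[a] * p2[(s - a) % q] for a in range(q)) for s in range(q)]

p1 = [0.7, 0.2, 0.1]
p2 = [0.5, 0.3, 0.2]
print(H_q(p1), H_q(p2))                # about 0.73 and 0.94: inside (0, 1)
print(H_q(mod_q_convolution(p1, p2)))  # about 0.98: a strict increase
```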

For the proof of Theorem 2, we will need a conditional version of Lemma 1. However, even if X_1 and X_2 are independent conditioned on a random variable Y, with moderate conditional entropies, it is not true in general that H(X_1 + X_2 | Y) is strictly greater than either H(X_1 | Y) or H(X_2 | Y):

Example 1. Let X_1, X_2 ∈ {0, ..., q − 1} and Y ∈ {0, 1}, with

$$p_{X_1 X_2 \mid Y}(x_1, x_2 \mid y) = \begin{cases} 1/q^2 & \text{if } y = 0, \\ 1 & \text{if } y = 1 \text{ and } x_1 = x_2 = 0, \\ 0 & \text{otherwise.} \end{cases}$$

Note that X_1 and X_2 are i.i.d. conditioned on Y. It is also easy to see that H(X_1 + X_2 | Y) = H(X_1 | Y) = p_Y(0).
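Example 1 is straightforward to verify numerically; the sketch below assumes q = 3 and p_Y(0) = 0.4, both illustrative choices:

```python
import itertools
import math

q, pY0 = 3, 0.4
# the joint p(x1, x2, y) induced by the conditional in Example 1
joint = {(x1, x2, 0): pY0 / q ** 2
         for x1, x2 in itertools.product(range(q), repeat=2)}
joint[(0, 0, 1)] = 1 - pY0

def cond_entropy(f):
    # H(f(X1, X2) | Y), with base-q logs as elsewhere in the note
    p_ab, p_b = {}, {}
    for (x1, x2, y), p in joint.items():
        a = f(x1, x2)
        p_ab[(a, y)] = p_ab.get((a, y), 0.0) + p
        p_b[y] = p_b.get(y, 0.0) + p
    return -sum(p * math.log(p / p_b[y], q) for (a, y), p in p_ab.items() if p > 0)

print(cond_entropy(lambda x1, x2: (x1 + x2) % q))  # H(X1 + X2 | Y) = 0.4
print(cond_entropy(lambda x1, x2: x1))             # H(X1 | Y)      = 0.4
```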

The above example illustrates the only case where summing conditionally independent random variables does not increase entropy: the case in which X_1 and X_2 are simultaneously constant or simultaneously uniform for all realizations of Y. The next result states that a conditional version of Lemma 1 holds excluding this anomalous case.


Given X_1, X_2 ∈ X and Y, let h_i(Y) denote the random variable that takes the value h_i(y) := H(X_i | Y = y) whenever Y = y. Given 0 < δ < 1/2, define two random variables S_1 and S_2 through

$$S_i = \begin{cases} 0 & \text{if } h_i(Y) \in [0, \delta), \\ 1 & \text{if } h_i(Y) \in [\delta, 1 - \delta], \\ 2 & \text{if } h_i(Y) \in (1 - \delta, 1], \end{cases} \qquad i = 1, 2.$$

Note that the irregularity described in Example 1 corresponds to the case where S_1 = S_2 ∈ {0, 2} with probability 1.

Lemma 2. If

(i) I(X_1; X_2 | Y) ≤ ε_2, and
(ii) Pr[S_1 = S_2 ∈ {0, 2}] < 1 − η for some η > 0,

then there exist µ(δ, η) > 0 and ν(ε_2) such that

$$H(X_1 + X_2 \mid Y) \ge \min_{i \in \{1,2\}} H(X_i \mid Y) + \mu(\delta, \eta) - \nu(\varepsilon_2),$$

where ν(ε_2) → 0 as ε_2 → 0.

Our proof of Theorem 2 relies heavily on Lemma 2. In particular, we will show that as the size of the construction in (4) increases, almost all the resulting random variables satisfy the conditions of the lemma. This in turn will imply that for large n, if $H[n]_i = H(U_{n,i} \mid U_{n,1}^{i-1})$ is moderate (i.e., away from 0 and 1), then the inequality

$$H(U_{n,i} \mid U_{n,1}^{i-1}) \le H(U_{n,i} + V_{n,i} \mid U_{n,1}^{i-1} V_{n,1}^{i-1})$$

is strict for almost all indices i. As we will see in the proof, showing that U_{n,i} and V_{n,i} are almost independent conditioned on their pasts (i.e., that condition (i) of Lemma 2 holds) is easy. The difficulty is that the conditioning random variables $U_{n,1}^{i-1}$ and $V_{n,1}^{i-1}$ are not necessarily independent. The next result (Lemma 3) states, however, that there is sufficient independence between $U_{n,1}^{i-1}$ and $V_{n,1}^{i-1}$ to satisfy condition (ii) of Lemma 2:

Given X_1, X_2, ... and the resulting {U_{n,i}} and {V_{n,i}} as in Section II, let $h^u_{n,i}(U_{n,1}^{i-1})$ denote the random variable that takes the value $h^u_{n,i}(u^{i-1}) := H(U_{n,i} \mid U_{n,1}^{i-1} = u^{i-1})$ whenever $U_{n,1}^{i-1} = u^{i-1}$, similarly to h_i(Y) above. Also analogously to the above, define a sequence of random variables $S^u_{n,i}$ through

$$S^u_{n,i} = \begin{cases} 0 & \text{if } h^u_{n,i}(U_{n,1}^{i-1}) \in [0, \delta/2), \\ 1 & \text{if } h^u_{n,i}(U_{n,1}^{i-1}) \in [\delta/2, 1 - \delta/2], \\ 2 & \text{if } h^u_{n,i}(U_{n,1}^{i-1}) \in (1 - \delta/2, 1], \end{cases} \qquad i = 1, \ldots, N.$$

Similarly define random variables $h^v_{n,i}(V_{n,1}^{i-1})$ and $S^v_{n,i}$ by replacing the U's with V's above.

Lemma 3. For any δ > 0, there exist n_0(δ) and η(δ) > 0 such that whenever n > n_0(δ), H[n]_i ∈ (δ, 1 − δ) implies

$$\Pr\left[ S^u_{n,i} = S^v_{n,i} \in \{0, 2\} \right] < 1 - \eta(\delta).$$

IV. PROOF OF THEOREM 2∗

We begin by writing

$$\begin{aligned}
H[n]_i^- + H[n]_i^+ &= H(U_{n+1,2i-1} \mid U_{n+1,1}^{2i-2}) + H(U_{n+1,2i} \mid U_{n+1,1}^{2i-1}) \\
&= H(U_{n+1,2i-1}, U_{n+1,2i} \mid U_{n+1,1}^{2i-2}) \\
&= H(U_{n,i}, V_{n,i} \mid U_{n,1}^{i-1} V_{n,1}^{i-1}) \\
&\le H(U_{n,i} \mid U_{n,1}^{i-1}) + H(V_{n,i} \mid V_{n,1}^{i-1}) \\
&= H[n]_i + H[n]_i = 2H[n]_i. \qquad (10)
\end{aligned}$$

In the above, the third equality follows by virtue of the one-to-one correspondence between $(U_{n,1}^i, V_{n,1}^i)$ and $U_{n+1,1}^{2i}$, which in turn is due to (4). The last equality is due to the stationarity assumption. Since H_n takes values in [0, 1], it follows from (10) that the process {H_n} is a bounded supermartingale, and therefore converges almost surely to a random variable H_∞, implying convergence in distribution. The claim on the probability distribution of H_∞ will follow from (6) and (8) once we show that H_∞ is {0, 1}-valued. The latter claim is equivalent to the statement that for all δ > 0 and ε > 0, there exists an n_0(δ, ε) such that

$$\Pr[H_n \in (\delta, 1 - \delta)] < \varepsilon \qquad (11)$$

for all n ≥ n_0(δ, ε), which we show next to conclude the proof. Note first that

$$\frac{1}{2} \mathbb{E}\left[ |H_n^- - H_n| \right] \le \frac{1}{2} \mathbb{E}\left[ |H_n^- - H_n| \right] + \frac{1}{2} \mathbb{E}\left[ |H_n^+ - H_n| \right] = \mathbb{E}\left[ |H_{n+1} - H_n| \right] \to 0,$$

where the convergence to zero is due to the almost sure convergence of H_n. It then follows that for all ζ > 0, there exists n_1(ζ) such that

$$\Pr\left[ |H_n^- - H_n| \le \zeta \right] \ge 1 - \frac{\varepsilon}{4} \quad \text{for all } n \ge n_1(\zeta). \qquad (12)$$

Now take η = η(δ/2) as in Lemma 3 and µ(δ/2, η) as in Lemma 2, and let ζ = µ(δ/2, η)/4. Then, (12) and (8) imply that the set

$$\mathcal{M}_n := \left\{ i : |H[n]_i^- - H[n]_i| < \frac{\mu(\delta/2, \eta)}{4} \right\}$$

satisfies

$$\frac{|\mathcal{M}_n|}{N} \ge 1 - \frac{\varepsilon}{4} \qquad (13)$$

for all n ≥ n_1(µ(δ/2, η)/4). We will prove (11) by contradiction. To that end, define the set

$$\mathcal{L}_n := \left\{ i : H(U_{n,i} \mid U_{n,1}^{i-1}) \in (\delta, 1 - \delta) \right\}$$

and suppose, contrary to (11), that there exists n > n_1(µ(δ/2, η)/4) for which

$$\frac{|\mathcal{L}_n|}{N} \ge \varepsilon. \qquad (14)$$


Define the sets

$$\mathcal{K}_n := \left\{ i : I(U_{n,i}; V_{n,i} \mid U_{n,1}^{i-1} V_{n,1}^{i-1}) \le \sqrt{\kappa/N} \right\},$$

$$\mathcal{J}_{n,1} := \left\{ i : I(U_{n,i}; V_{n,1}^{i-1} \mid U_{n,1}^{i-1}) \le \sqrt{\kappa/N} \right\},$$

$$\mathcal{J}_{n,2} := \left\{ i : I(V_{n,i}; U_{n,1}^{i-1} \mid V_{n,1}^{i-1}) \le \sqrt{\kappa/N} \right\},$$

$$\mathcal{J}_n := \mathcal{J}_{n,1} \cap \mathcal{J}_{n,2}.$$

Note that for all n we have

$$\begin{aligned}
\frac{\kappa}{N} &\ge \frac{1}{N} I(X_1^N; X_{N+1}^{2N}) \\
&= \frac{1}{N} I(U_{n,1}^N; V_{n,1}^N) \\
&= \frac{1}{N} \sum_{i=1}^N I(U_{n,i}; V_{n,1}^N \mid U_{n,1}^{i-1}) \\
&\ge \frac{1}{N} \sum_{i=1}^N I(U_{n,i}; V_{n,1}^i \mid U_{n,1}^{i-1}) \\
&= \frac{1}{N} \sum_{i=1}^N \left[ I(U_{n,i}; V_{n,1}^{i-1} \mid U_{n,1}^{i-1}) + I(U_{n,i}; V_{n,i} \mid U_{n,1}^{i-1} V_{n,1}^{i-1}) \right].
\end{aligned}$$

This in particular implies that

$$\frac{|\mathcal{J}_{n,1} \cap \mathcal{K}_n|}{N} \ge 1 - \sqrt{\kappa/N}.$$

By swapping the U's with the V's above, one also obtains $|\mathcal{J}_{n,2}|/N \ge 1 - \sqrt{\kappa/N}$. Hence,

$$\frac{|\mathcal{J}_n \cap \mathcal{K}_n|}{N} = \frac{|\mathcal{J}_{n,1} \cap \mathcal{J}_{n,2} \cap \mathcal{K}_n|}{N} \ge 1 - 2\sqrt{\kappa/N}. \qquad (15)$$

Take n > max{n_0(δ), n_1(µ(δ/2, η)/4)} (where n_0(δ) is as in Lemma 3) such that

$$\sqrt{\kappa/N} < \frac{\delta}{2}, \qquad (16)$$

$$2\sqrt{\kappa/N} < \frac{\varepsilon}{2}, \qquad (17)$$

$$\nu\!\left(\sqrt{\kappa/N}\right) + \sqrt{\kappa/N} \le \frac{\mu(\delta/2, \eta)}{2}.$$

Observe that for such n and for all i ∈ $\mathcal{J}_n \cap \mathcal{L}_n$, relation (16) implies

$$H(U_{n,i} \mid U_{n,1}^{i-1}, V_{n,1}^{i-1}),\ H(V_{n,i} \mid U_{n,1}^{i-1}, V_{n,1}^{i-1}) \in (\delta/2,\, 1 - \delta/2).$$

Now let $\tilde h^u_{n,i}(U_{n,1}^{i-1} V_{n,1}^{i-1})$ be the random variable that takes the value $H(U_{n,i} \mid U_{n,1}^{i-1} V_{n,1}^{i-1} = u^{i-1} v^{i-1})$ whenever $U_{n,1}^{i-1} V_{n,1}^{i-1} = u^{i-1} v^{i-1}$, and define

$$\tilde S^u_{n,i} = \begin{cases} 0 & \text{if } \tilde h^u_{n,i}(U_{n,1}^{i-1} V_{n,1}^{i-1}) \in [0, \delta/2), \\ 1 & \text{if } \tilde h^u_{n,i}(U_{n,1}^{i-1} V_{n,1}^{i-1}) \in [\delta/2, 1 - \delta/2], \\ 2 & \text{if } \tilde h^u_{n,i}(U_{n,1}^{i-1} V_{n,1}^{i-1}) \in (1 - \delta/2, 1]. \end{cases}$$

Also define $\tilde S^v_{n,i}$ analogously. It can easily be shown that for i ∈ $\mathcal{J}_n$, the joint distribution of the pair $(\tilde S^u_{n,i}, \tilde S^v_{n,i})$ approaches that of $(S^u_{n,i}, S^v_{n,i})$ as n grows. It then follows from Lemma 3 that for sufficiently large n we have

$$\Pr\left[ \tilde S^u_{n,i} = \tilde S^v_{n,i} \in \{0, 2\} \right] < 1 - \eta/2.$$

For such n, and for all i ∈ $\mathcal{J}_n \cap \mathcal{K}_n \cap \mathcal{L}_n$, it is easily seen that $(U_{n,1}^i, V_{n,1}^i)$, along with $\tilde S^u_{n,i}$ and $\tilde S^v_{n,i}$, satisfy the conditions of Lemma 2 with

$$X_1 = U_{n,i}, \quad X_2 = V_{n,i}, \quad Y = (U_{n,1}^{i-1}, V_{n,1}^{i-1}), \quad S_1 = \tilde S^u_{n,i}, \quad S_2 = \tilde S^v_{n,i}, \quad \varepsilon_2 = \sqrt{\kappa/N}, \quad \eta = \eta/2.$$

Consider now i ∈ $\mathcal{J}_n \cap \mathcal{K}_n \cap \mathcal{L}_n$. We have

$$\begin{aligned}
H[n]_i^- - H[n]_i &= H(U_{n,i} + V_{n,i} \mid U_{n,1}^{i-1}, V_{n,1}^{i-1}) - H(U_{n,i} \mid U_{n,1}^{i-1}) \\
&\ge H(U_{n,i} + V_{n,i} \mid U_{n,1}^{i-1}, V_{n,1}^{i-1}) - H(U_{n,i} \mid U_{n,1}^{i-1}, V_{n,1}^{i-1}) - \sqrt{\kappa/N} \\
&\ge \mu(\delta/2, \eta) - \nu\!\left(\sqrt{\kappa/N}\right) - \sqrt{\kappa/N} \\
&\ge \frac{\mu(\delta/2, \eta)}{2},
\end{aligned}$$

from which we obtain

$$\mathcal{J}_n \cap \mathcal{K}_n \cap \mathcal{L}_n \cap \mathcal{M}_n = \emptyset.$$

This, in addition to (14), (15) and (17), implies

$$\frac{\varepsilon}{2} \le \frac{|\mathcal{J}_n \cap \mathcal{K}_n \cap \mathcal{L}_n|}{N} = \frac{|\mathcal{J}_n \cap \mathcal{K}_n \cap \mathcal{L}_n \cap \mathcal{M}_n^c|}{N} \le \frac{|\mathcal{M}_n^c|}{N},$$

which contradicts (13), yielding the claim.

V. CHANNELS WITH MEMORY

Let W be a finite-state channel with input alphabet X, output alphabet Y, and state space A = {0, 1, ..., M − 1}. Let x_i, y_i, and s_i denote the input, output, and state of the channel at time i. The channel is described by the joint probability distribution

$$p(y_1^N, x_1^N, s_1^N) = p_X(x_1^N)\, p_S(s_1^N) \prod_{i=1}^N W(y_i \mid x_i, s_i), \qquad N \in \mathbb{N}.$$

Note that the above description excludes intersymbol interference channels. We assume that the state sequence forms a Markov chain. That is,

$$p(s_k^l) = p_S(s_k) \prod_{i=k+1}^{l} p_{S_2 \mid S_1}(s_i \mid s_{i-1})$$

for all k ≤ l ∈ N. Note that unlike the channel input alphabet size, the number of channel states is not assumed to be prime, and thus all channels with finite memory length fit in this model. We also assume that the channel input sequence is a κ-th order ergodic and stationary Markov process with

$$p(x_1^N) = p(x_1^\kappa) \prod_{i=\kappa+1}^{N} p(x_i \mid x_{i-\kappa}^{i-1}).$$

Fix n and let N = 2^n. Let $X_1^N$ and $Y_1^N$ denote the inputs and outputs of N uses of the channel W. Define the matrices $\Pi_n$ and $G_n$ as in Section II, and let $U_1^N = X_1^N \Pi_n G_n$. Define

$$I[0] := I(X_1; Y_1), \qquad I[n]_i := I(U_i; Y_1^N \mid U_1^{i-1}).$$

Theorem 3. For all δ > 0,

$$\lim_{n \to \infty} \frac{1}{N} \left|\left\{ i : I[n]_i \in (\delta, 1 - \delta) \right\}\right| = 0.$$

The proof of the above theorem follows similar arguments to those of Section IV.

VI. DISCUSSION

Any recursive transformation that polarizes memoryless channels will also polarize the class of processes considered in this note. This can easily be seen by noting that the only bearing of such transforms on our proofs is through equation (2) (equivalently, Lemma 1).

Although our main result is stated only for finite-memory processes, we believe that this restriction is an artifact of our proof technique, and is not necessary for polarization to take place. In fact, one easily sees that most of the crucial steps in the proofs remain valid without the finite-memory assumption. We conjecture that Arıkan's construction polarizes all mixing processes with prime alphabet sizes. Such a result would also translate directly to the channel polarization setting to include infinite-memory and intersymbol-interference channels.

ACKNOWLEDGMENT

I would like to thank Emre Telatar for helpful discussions and comments on the paper.

REFERENCES

[1] E. Arıkan, "Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels," IEEE Trans. Inform. Theory, vol. 55, no. 7, pp. 3051–3073, Jul. 2009.

[2] E. Arıkan, "Source polarization," in Proc. IEEE Int. Symp. Inform. Theory, Jun. 2010.

[3] E. Şaşoğlu, "An entropy inequality for q-ary random variables and its application to channel polarization," in Proc. IEEE Int. Symp. Inform. Theory, Jun. 2010.
