Transcript
Page 1: Recent Results in  Non-Asymptotic Shannon Theory

Recent Results in Non-Asymptotic Shannon Theory

Dror Baron

Supported by AFOSR, DARPA, NSF, ONR, and Texas Instruments
Joint work with M. A. Khojastepour, R. G. Baraniuk, and S. Sarvotham

Page 2: Recent Results in  Non-Asymptotic Shannon Theory

“we may someday see the end of wireline”

• S. Cherry, “Edholm’s law of bandwidth,” IEEE Spectrum, vol. 41, no. 7, July 2004, pp. 58-60

Page 3: Recent Results in  Non-Asymptotic Shannon Theory

But will there ever be enough data rate?

• R. Lucky, 1989: “We are not very good at predicting uses until the actual service becomes available. I am not worried; we will think of something when it happens.”

There will always be new applications that gobble up more data rate!

Page 4: Recent Results in  Non-Asymptotic Shannon Theory

How much can we improve wireless?

• Spectrum is a limited natural resource
• Information theory says we need lots of power for high data rates - even with infinite bandwidth!

• Solution: transmit more power, BUT
– Limited by environmental concerns
– Will batteries support all that power?

• Sooner or later wireless rates will hit a wall!

Page 5: Recent Results in  Non-Asymptotic Shannon Theory

Where can we improve?

• Algorithms and hardware gains
– Power-efficient computation
– Efficient power amplifiers
– Advances in batteries
– Directional antennas

• Communication gains
– Channel coding
– Source coding
– Better source and channel models

Page 6: Recent Results in  Non-Asymptotic Shannon Theory

Where will “the last dB” of communication gains come from?

Network information theory (Shannon theory)

Page 7: Recent Results in  Non-Asymptotic Shannon Theory

Traditional point-to-point information theory

• Single source
• Single transmitter
• Single receiver
• Single communication stream

• Most aspects are well-understood

[Block diagram: Encoder → Channel → Decoder]

Page 8: Recent Results in  Non-Asymptotic Shannon Theory

Network information theory

• Network of:
– Multiple sources
– Multiple transmitters
– Multiple receivers
– Multiple communication streams

• Few results

• My goal: understand various costs of network information theory

[Block diagram: multiple Encoders → Channel → Decoder]

Page 9: Recent Results in  Non-Asymptotic Shannon Theory

What costs has information theory overlooked?

Page 10: Recent Results in  Non-Asymptotic Shannon Theory

Channel coding has arrived…

• Turbo codes [Berrou et al., 1993]
– ~0.5 dB gap to capacity (rate R below capacity)
– BER=10^-5
– Block length n=6.5×10^4

• Regular LDPC codes [Gallager, 1963]

• Irregular LDPC [Richardson et al., 2001]
– 0.13 dB gap to capacity
– BER=10^-6
– n=10^6

[Block diagram: Encoder → Channel → Decoder]

Page 11: Recent Results in  Non-Asymptotic Shannon Theory

Distributed source coding has also arrived…

• Encoder for x based on syndrome of channel code
• Decoder for x has correlated side information y

• Various types of channel codes can be used

• Slepian-Wolf via LDPC codes [Xiong et al., 2004]
– H(X|Y)=0.47
– R=0.5 (rate above Slepian-Wolf limit)
– BER=10^-6
– Block length n=10^5

[Block diagram: x → Encoder → Decoder; correlated side information y at the Decoder]
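To make the syndrome idea above concrete, here is a minimal sketch (my own toy example, not the LDPC construction of [Xiong et al., 2004]) using the (7,4) Hamming code as the channel code: the encoder sends only the 3-bit syndrome of x, and the decoder uses side information y that differs from x in at most one position.

```python
import numpy as np

# Parity-check matrix of the (7,4) Hamming code; column j is the binary
# representation of j, so a single-bit error is located by its syndrome.
H = np.array([[1, 0, 1, 0, 1, 0, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])

def encode(x):
    """Slepian-Wolf encoder: send only the 3-bit syndrome of x."""
    return H @ x % 2

def decode(syndrome, y):
    """Decoder: side information y is assumed to differ from x in at most one bit."""
    diff = (syndrome + H @ y) % 2                 # syndrome of the pattern x XOR y
    if not diff.any():
        return y                                   # x and y already agree
    err_pos = int("".join(map(str, diff[::-1])), 2) - 1   # column of H matching diff
    e = np.zeros(7, dtype=int)
    e[err_pos] = 1
    return (y + e) % 2

x = np.array([1, 0, 1, 1, 0, 0, 1])
y = x.copy(); y[4] ^= 1                            # correlated side information: one bit flipped
assert np.array_equal(decode(encode(x), y), x)
print("recovered x from 3 bits plus side information y")
```

For n this small the syndrome rate is 3/7, far above H(X|Y); the long LDPC constructions cited above are what bring the rate close to the Slepian-Wolf limit.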

Page 12: Recent Results in  Non-Asymptotic Shannon Theory

Hey! Did you notice those block lengths?

• Information theory provides results in the asymptotic regime
– Channel coding: ∀δ>0, rate R=C−δ achievable with ε→0 as n→∞
– Slepian-Wolf coding: ∀δ>0, rate R=H(X|Y)+δ achievable with ε→0 as n→∞

• Best practical results achieved for n≥10^5

• Do those results require large n?

Page 13: Recent Results in  Non-Asymptotic Shannon Theory

But we live in a finite world…

• Real-world data doesn't always have n≥10^6
– IP packets
– Emails, text messages
– Sensornet applications (small battery → small n)

• How do those methods perform for n=10^4? 10^3?

• How quickly can we approach the performance limits of information theory?

Page 14: Recent Results in  Non-Asymptotic Shannon Theory

And we don’t know the statistics either!

• Lossless coding (single source):
– Length-n input x~Ber(p)
– Encode with wrong parameter q
– K-L divergence penalty with variable-rate codes
⇒ Performance loss (minor bitrate penalty)

• Channel coding, distributed source coding:
– Encode with wrong parameter q<p<0.5
– Fixed-rate codes based on joint typicality
– Typical set Tq for q is smaller than Tp for p
– As n→∞, Pr(error)→1
⇒ Performance collapse!
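As a rough numerical illustration of this collapse (my own sketch, taking the q-typical set to be sequences whose fraction of ones is at most q plus three standard deviations), the probability that a Ber(p) sequence lands inside the q-typical set vanishes as n grows, whereas a variable-rate source code would only pay a D(p||q) rate penalty.

```python
import numpy as np
from scipy.stats import binom

p, q = 0.15, 0.10                                # true vs. assumed Bernoulli parameter
for n in [10**2, 10**3, 10**4]:
    thresh = q + 3 * np.sqrt(q * (1 - q) / n)    # upper edge of the q-typical set (sketch)
    pr_covered = binom.cdf(np.floor(n * thresh), n, p)
    print(f"n={n:>6}:  Pr(z in T_q) = {pr_covered:.3g}")
# The coverage probability drops toward 0 as n grows: a fixed-rate code built
# for q simply stops covering the sequences that actually occur under p.
```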

Page 15: Recent Results in  Non-Asymptotic Shannon Theory

Main challenges

• How quickly can we approach the performance limits of information theory?

• Will address for channel coding and Slepian-Wolf

• What can we do when the source statistics are unknown?

• Will address for Slepian-Wolf

Page 16: Recent Results in  Non-Asymptotic Shannon Theory

But first . . .

What does the prior art indicate?

Page 17: Recent Results in  Non-Asymptotic Shannon Theory

Underlying problem

• Shannon [1958]: “This inverse problem is perhaps the more natural in applications: given a required level of probability of error, how long must the code be?”
– Motivation may have been phone and space communication
– “Small” probability of codeword error

• Wireless paradigm: Given k bits, what are the minimal channel resources to attain probability of error ε?
– Can retransmit packet ⇒ fix “large” ε
– n depends on packet length
– Need to characterize R(n,ε)

Page 18: Recent Results in  Non-Asymptotic Shannon Theory

Error exponents

• Fix rate R<C and codeword length n
– Bounds on probability of error
– Random coding: Pr[error] ≤ 2^(-n Er(R))
– Sphere packing: Pr[error] ≥ 2^(-n Esp(R)+o(n))
– Er(R)=Esp(R) for R near C
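For concreteness, a small sketch of these two exponents for the BSC, assuming Gallager's parametric form E0(ρ) = ρ − (1+ρ)·log2(p^(1/(1+ρ)) + (1−p)^(1/(1+ρ))) with a uniform input distribution; Er optimizes ρ over [0,1] and Esp over ρ ≥ 0 (truncated here to a finite grid).

```python
import numpy as np

def E0(rho, p):
    # Gallager's E0 for the BSC with uniform inputs (assumed form, base-2 logs)
    s = 1.0 / (1.0 + rho)
    return rho - (1 + rho) * np.log2(p**s + (1 - p)**s)

def exponent(R, p, rho_max):
    rhos = np.linspace(0.0, rho_max, 2001)
    return max(E0(rho, p) - rho * R for rho in rhos)

p = 0.11
C = 1 + p * np.log2(p) + (1 - p) * np.log2(1 - p)    # BSC capacity ~0.5
for R in [0.3, 0.4, 0.45]:
    Er  = exponent(R, p, rho_max=1.0)     # random-coding exponent
    Esp = exponent(R, p, rho_max=20.0)    # sphere-packing exponent (rho unbounded in principle)
    print(f"R={R:.2f} (C={C:.3f}):  Er={Er:.4f}  Esp={Esp:.4f}")
# The exponents agree near C, but once n is fixed and R approaches C the
# exponential bounds stop being informative, as the next slides argue.
```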

Page 19: Recent Results in  Non-Asymptotic Shannon Theory

Error exponents

• Fix rate R<C and codeword length n
– Bounds on probability of error
– Random coding: Pr[error] ≤ 2^(-n Er(R))
– Sphere packing: Pr[error] ≥ 2^(-n Esp(R)+o(n))
– Er(R)=Esp(R) for R near C

• Shannon's regime: “This inverse problem is perhaps the more natural in applications: given a required level of probability of error, how long must the code be?”
– Fix R<C ⇒ E(R)=O(1) ⇒ log(ε)=O(n) ⇒ good for “small” ε

Page 20: Recent Results in  Non-Asymptotic Shannon Theory

Error exponents

• Fix rate R<C and codeword length n
– Bounds on probability of error
– Random coding: Pr[error] ≤ 2^(-n Er(R))
– Sphere packing: Pr[error] ≥ 2^(-n Esp(R)+o(n))
– Er(R)=Esp(R) for R near C

• Wireless paradigm: Given k bits, what are the minimal channel resources to attain probability of error ε?
– Fix n ⇒ E(R)=O(1)
– o(n) term dominates ⇒ Bounds diverge

Page 21: Recent Results in  Non-Asymptotic Shannon Theory

Error exponents fail for R=C−K/n^0.5

Page 22: Recent Results in  Non-Asymptotic Shannon Theory

How quickly can we approach the channel capacity?

(known statistics)

Page 23: Recent Results in  Non-Asymptotic Shannon Theory

Binary symmetric channel (BSC) setup

• s∈{1,…,M} input message
• x, y, and z binary length-n sequences
• z~Bernoulli(n,p) implies crossover probability p
• Code (f,g,n,M,ε) includes:
– Encoder x=f(s), f: {1,…,M} → {0,1}^n
– Rate R=log(M)/n
– Channel y=x⊕z
– Decoder g reconstructs s by s'=g(y)
– Error probability Pr[g(y)≠s] ≤ ε

[Block diagram: s → Encoder f → x=f(s) → BSC with noise z~Ber(n,p), y=x⊕z → Decoder g → s'=g(y)]

Page 24: Recent Results in  Non-Asymptotic Shannon Theory

Non-asymptotic capacity

• Definition: CNA(n,ε) = max{log(M)/n : ∃ code (f,g,n,M,ε)}
• Theorem: Capacity of BSC is C = 1 − H(Z) = 1 − H2(p)

• Prior art by Wolfowitz [1978]
– Converse result: CNA(n,ε) ≤ C − KC(ε)/n^0.5
– Achievable result: CNA(n,ε) ≥ C − KA(ε)/n^0.5
– Bounds are loose: KA(ε) > KC(ε)

• Can we tighten Wolfowitz’s bounds?

[Plot: CNA(n,ε) vs. n approaching C; the gap between the converse and achievable curves shows the looseness of the bounds]

Page 25: Recent Results in  Non-Asymptotic Shannon Theory

Key to solution – Packing typical sets

• Need to encode typical set TZ for z

– Code needs to “cover” z∈TZ
⇒ Need Pr(z∈TZ) ≈ 1−ε ⇒ probability of codeword error ≈ ε

• What about rate?
– Output space = 2^n possible sequences
– Can't pack more than 2^n/|TZ| sets into the output space ⇒ M ≤ 2^n/|TZ|
– Minimal-cardinality Tmin covers ⇒ CNA ≤ 1 − log(|Tmin|)/n

[Diagram: typical sets TZ packed into the 2^n output space]

Page 26: Recent Results in  Non-Asymptotic Shannon Theory

What’s the cardinality of Tmin?

• Consider empirical statistics nz = Σi zi, PZ = nz/n
– p<0.5 ⇒ Pr(z) monotone decreasing in nz
⇒ Minimal Tmin has the form Tmin = {z: PZ ≤ τ}

• Determine τ(ε) with the central limit theorem (CLT)
– E[PZ]=p, Var(PZ)=p(1−p)/n
⇒ PZ ≈ N(p, p(1−p)/n)

• Asymptotic
– τ = p + δ
– LLN: δ → 0

• Non-asymptotic
– τ = p + λ·[p(1−p)/n]^0.5
– CLT: λ → λ(ε)
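A numerical check of this packing argument (my own sketch): pick the CLT threshold τ so that Tmin = {z : PZ ≤ τ} captures probability about 1−ε under Ber(p) noise, then evaluate the implied rate bound 1 − log2|Tmin|/n.

```python
import numpy as np
from scipy.stats import norm, binom
from scipy.special import gammaln

def log2_binom_sum(n, kmax):
    """log2 of sum_{k<=kmax} C(n,k), computed stably in log space."""
    ks = np.arange(kmax + 1)
    logs = (gammaln(n + 1) - gammaln(ks + 1) - gammaln(n - ks + 1)) / np.log(2)
    m = logs.max()
    return m + np.log2(np.sum(2.0 ** (logs - m)))

p, eps = 0.11, 1e-3
C = 1 + p * np.log2(p) + (1 - p) * np.log2(1 - p)            # BSC capacity
for n in [10**3, 10**4, 10**5]:
    tau = p + norm.ppf(1 - eps) * np.sqrt(p * (1 - p) / n)   # CLT threshold
    kmax = int(np.floor(n * tau))
    coverage = binom.cdf(kmax, n, p)                          # Pr(z in Tmin), ~1-eps
    bound = 1 - log2_binom_sum(n, kmax) / n                   # packing bound on C_NA(n,eps)
    print(f"n={n:>6}: coverage={coverage:.4f}, rate bound={bound:.4f}  (C={C:.4f})")
```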

Page 27: Recent Results in  Non-Asymptotic Shannon Theory

Tight non-asymptotic capacity

• Theorem:
– CNA(n,ε) = C − K(ε)/n^0.5 + o(n^-0.5)
– K(ε) = Φ^-1(ε)·[p(1−p)]^0.5·log((1−p)/p)
– Gap to capacity is K(ε)/n^0.5 + o(n^-0.5)

• Note: o(n^-0.5) asymptotically negligible w.r.t. K/n^0.5

⇒ Tightened Wolfowitz bounds up to o(n^-0.5)
⇒ Gap to capacity of LDPC codes is 2-3x greater
⇒ We know how quickly we can approach C

[Plot: CNA(n,ε) vs. n, the tight bound approaching C]
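Evaluating the theorem's leading term numerically (a sketch under the assumption that Φ^-1(ε) denotes the upper-tail Gaussian quantile, SciPy's norm.isf, so that the gap to capacity comes out positive):

```python
import numpy as np
from scipy.stats import norm

def C_NA(n, eps, p):
    C = 1 + p * np.log2(p) + (1 - p) * np.log2(1 - p)            # BSC capacity
    K = norm.isf(eps) * np.sqrt(p * (1 - p)) * np.log2((1 - p) / p)
    return C - K / np.sqrt(n), C                                  # leading-order C_NA and C

p, eps = 0.11, 1e-3
for n in [10**3, 10**4, 10**5, 10**6]:
    cna, C = C_NA(n, eps, p)
    print(f"n={n:>7}:  C_NA ~ {cna:.4f}   gap to C = {C - cna:.4f}")
# The O(1/sqrt(n)) gap shrinks slowly: even at n=10^6 it is still visible.
```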

Page 28: Recent Results in  Non-Asymptotic Shannon Theory

Non-asymptotic capacity of BSC

Page 29: Recent Results in  Non-Asymptotic Shannon Theory

Gaussian channel results

• Continuous channel

• Power constraint Σi (xi)^2 ≤ nP

• Shannon [1958] derived CNA(n,ε) for Gaussian channel via cone packing (non-i.i.d. codebook)

• Information spectrum bounds on probabilities of error indicate Gaussian codebooks are sub-optimal
⇒ i.i.d. codebooks aren't good enough!

[Block diagram: s → Encoder f → x=f(s) → AWGN channel with z~N(0,σ²), y=x+z → Decoder g → s'=g(y)]

Page 30: Recent Results in  Non-Asymptotic Shannon Theory

Excess power of Gaussian channel

Page 31: Recent Results in  Non-Asymptotic Shannon Theory

How quickly can we approach the Slepian-Wolf limit?

(known statistics)

Page 32: Recent Results in  Non-Asymptotic Shannon Theory

But first . . .

Slepian-Wolf Review

Page 33: Recent Results in  Non-Asymptotic Shannon Theory

Slepian-Wolf setup

• x and y are correlated length-n sequences

• Code (fX,fY,gX,gY,n,MX,MY,εX,εY) includes:

– Encoders fX(x)∈{1,…,MX}, fY(y)∈{1,…,MY}

– Rates RX=log(MX)/n, RY=log(MY)/n

– Decoder g reconstructs x and y by gX(fX(x),fY(y)) and gY(fX(x),fY(y))

– Error probabilities Pr[gX(fX(x),fY(y)) ≠ x] ≤ εX and Pr[gY(fX(x),fY(y)) ≠ y] ≤ εY

[Block diagram: x → Encoder fX → fX(x); y → Encoder fY → fY(y); joint Decoder g outputs gX(fX(x),fY(y)) and gY(fX(x),fY(y))]

Page 34: Recent Results in  Non-Asymptotic Shannon Theory

Slepian-Wolf theorem

• Theorem [Slepian & Wolf, 1973]:
– RX ≥ H(X|Y) (conditional entropy)
– RY ≥ H(Y|X)
– RX + RY ≥ H(X,Y) (joint entropy)

[Plot: Slepian-Wolf rate region in the (RX, RY) plane, with corner points at (H(X|Y), H(Y)) and (H(X), H(Y|X))]
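A tiny worked example of these corner points (my own joint pmf, chosen so that H(X|Y) matches the H2(0.1) ≈ 0.47 correlation used later in the talk):

```python
import numpy as np

pxy = np.array([[0.45, 0.05],      # Pr(X=0,Y=0), Pr(X=0,Y=1)
                [0.05, 0.45]])     # Pr(X=1,Y=0), Pr(X=1,Y=1)

def H(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

Hxy = H(pxy.ravel())
Hx, Hy = H(pxy.sum(axis=1)), H(pxy.sum(axis=0))
print(f"H(X|Y) = {Hxy - Hy:.3f}, H(Y|X) = {Hxy - Hx:.3f}, H(X,Y) = {Hxy:.3f}")
# Any (RX, RY) with RX >= H(X|Y), RY >= H(Y|X), RX+RY >= H(X,Y) is achievable.
```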

Page 35: Recent Results in  Non-Asymptotic Shannon Theory

Slepian-Wolf with binary symmetric correlation structure

(known statistics)

Page 36: Recent Results in  Non-Asymptotic Shannon Theory

Binary symmetric correlation setup

• x, y, and z are length-n Bernoulli sequences
• Correlation channel z is independent of y
• Bernoulli parameters p,q∈[0,0.5), r=p(1−q)+(1−p)q
• Code (f,g,n,M,ε) includes:
– Encoder f(x)∈{1,…,M}
– Rate R=log(M)/n
– Decoder g(f(x),y)∈{0,1}^n
– Error probability Pr[g(f(x),y) ≠ x] ≤ ε

[Block diagram: x~Ber(r), y~Ber(p), z~Ber(q) with x=y⊕z; x → Encoder f → f(x)∈{1,…,M} → Decoder g with side information y → g(f(x),y)∈{0,1}^n]
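A quick simulation of this correlation structure (my own sketch): with y~Ber(p) and z~Ber(q) independent and x = y ⊕ z, the source satisfies x~Ber(r) with r = p(1−q)+(1−p)q, and the crossover between x and y is exactly q, so recovering x from f(x) and y is equivalent to recovering z.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 10**6, 0.3, 0.1
y = (rng.random(n) < p).astype(int)
z = (rng.random(n) < q).astype(int)
x = y ^ z                                        # correlation channel x = y XOR z
r = p * (1 - q) + (1 - p) * q
print(f"empirical Pr(x=1)  = {x.mean():.4f}   vs  r = {r:.4f}")   # ~0.34
print(f"empirical Pr(x!=y) = {(x != y).mean():.4f}  vs  q = {q}")  # ~0.10
```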

Page 37: Recent Results in  Non-Asymptotic Shannon Theory

Relation to general Slepian-Wolf setup

• x, y, and z are Bernoulli
• Correlation z independent of y
• Focus on encoding x at rate approaching H(Z)

• Neglect well-known encoding of y at rate RY=H(Y)

[Plot: Slepian-Wolf rate region; our setup is the corner point (RX, RY) = (H(Z), H(Y))]

Page 38: Recent Results in  Non-Asymptotic Shannon Theory

Non-asymptotic Slepian-Wolf rate

• Definition: RNA(n,ε) = min{log(M)/n : ∃ code (f,g,n,M,ε)}

• Prior art [Wolfowitz, 1978]
– Converse result: RNA(n,ε) ≥ H(X|Y) + KC(ε)/n^0.5
– Achievable result: RNA(n,ε) ≤ H(X|Y) + KA(ε)/n^0.5
– Bounds are loose: KA(ε) > KC(ε)

• Can we tighten Wolfowitz’s bounds?

[Plot: RNA(n,ε) vs. n approaching H(X|Y); the gap between the bounds shows their looseness]

Page 39: Recent Results in  Non-Asymptotic Shannon Theory

Tight non-asymptotic rate

• Theorem:
– RNA(n,ε) = H(Z) + K(ε)/n^0.5 + o(n^-0.5)
– K(ε) = Φ^-1(ε)·[q(1−q)]^0.5·log((1−q)/q)
– Redundancy rate is K(ε)/n^0.5 + o(n^-0.5)

• Note: o(n^-0.5) decays faster than K/n^0.5

⇒ Tightened Wolfowitz bounds up to o(n^-0.5)
⇒ We know how quickly we can approach H(Z) with known statistics

[Plot: RNA(n,ε) vs. n, the tight bound approaching H(X|Y)]

Page 40: Recent Results in  Non-Asymptotic Shannon Theory

What can we do when the source statistics are unknown?

(universality)

Page 41: Recent Results in  Non-Asymptotic Shannon Theory

Universal setup

• Unknown Bernoulli parameters p, q, r

• Encoder observes x and ny = Σi yi

• Communication of ny requires log(n) bits

• Variable rate used

– Need distribution for nz

– Distribution depends on nx and ny (not x)

– Codebook size M_{nx,ny}

[Block diagram: as before, but the Encoder f also observes ny = Σi yi and sends f(x)∈{1,…,M_{nx,ny}}; Decoder g outputs g(f(x),y)∈{0,1}^n]

Page 42: Recent Results in  Non-Asymptotic Shannon Theory

Distribution of nz

• CLT was key to solution with known statistics
• How can we apply CLT when q is unknown?

• Consider a numerical example
– p=0.3, q=0.1, r=p(1−q)+(1−p)q

– PX=r, PY=p, PZ=q (empirical = true)

– We plot Pr(nz|nx,ny) as a function of nz∈{0,…,n}

Page 43: Recent Results in  Non-Asymptotic Shannon Theory

Pr(nz|nx,ny) for n=10^2

Page 44: Recent Results in  Non-Asymptotic Shannon Theory

Pr(nz|nx,ny) for n=10^3

Page 45: Recent Results in  Non-Asymptotic Shannon Theory

Pr(nz|nx,ny) for n=10^4

Page 46: Recent Results in  Non-Asymptotic Shannon Theory

Pr(nz|nx,ny) for n=10^4


Page 47: Recent Results in  Non-Asymptotic Shannon Theory

Universal rate

• Theorem:

– RNA(n,ε) = H(P*_Z) + K'(ε)/n^0.5 + o(n^-0.5)
– K'(ε) = f(PY)·K(ε)
– f(PY) = 2[PY(1−PY)]^0.5 / |1−2PY|

[Plot: f(PY) vs. PY; f(PY)→0 as PY→0 and f(PY)→∞ as PY→0.5]
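Evaluating the multiplier f(PY) from the theorem (a small sketch of the formula above) shows the two regimes discussed on the next slides: f vanishes as PY→0 and blows up as PY→0.5.

```python
import numpy as np

def f(py):
    # universal-rate multiplier from the theorem above
    return 2 * np.sqrt(py * (1 - py)) / abs(1 - 2 * py)

for py in [0.05, 0.1, 0.3, 0.4, 0.45, 0.49]:
    print(f"P_Y = {py:<5}  f(P_Y) = {f(py):7.2f}")
```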

Page 48: Recent Results in  Non-Asymptotic Shannon Theory

Why is f(PY) small when PY is small?

• Known statistics ⇒ Var(nz)=nq(1−q) regardless of empirical statistics

• PY→0 ⇒ can estimate nz with small variance
⇒ Universal scheme outperforms known statistics when PY is small

• Key issue: variable rate coding (universal) beats fixed rate coding (known statistics)

⇒ Can cut down expected redundancy (known statistics) by communicating ny to the encoder
• log(n) bits for ny will save O(n^0.5) bits

Page 49: Recent Results in  Non-Asymptotic Shannon Theory

Redundancy for PY ≈ 0.5

• f(PY) blows up as PY approaches 0.5

• Redundancy is O(n^-0.5) with enormous constant

• Another scheme has O(n^-1/3) redundancy

• Better performance for PY = 0.5 + O(n^-1/6)

⇒ Universal redundancy can be huge!

• Ongoing research: improvement of O(n^-1/3)

Page 50: Recent Results in  Non-Asymptotic Shannon Theory

Numerical example

• n=10^4

• q=0.1

• Slepian-Wolf requires nH2(q)=4690 bits

• Non-asymptotic approach (known statistics) with ε=10^-2 requires n·RNA(n,ε)=4907 bits

• Universal approach with PY=0.3 requires 5224 bits

• With PY=0.4 we need 5863 bits

• In practice, penalty for universality is huge!
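These figures can be approximately reproduced from the leading-order formulas above (my sketch; it drops the o(n^-0.5) term and uses H2(q) in place of H(P*_Z), so the totals land near, but not exactly on, the slide's numbers).

```python
import numpy as np
from scipy.stats import norm

n, q, eps = 10**4, 0.1, 1e-2
H2 = lambda t: -t * np.log2(t) - (1 - t) * np.log2(1 - t)
K = norm.isf(eps) * np.sqrt(q * (1 - q)) * np.log2((1 - q) / q)   # known-statistics constant
f = lambda py: 2 * np.sqrt(py * (1 - py)) / abs(1 - 2 * py)       # universal multiplier

print(f"Slepian-Wolf limit: {n * H2(q):.0f} bits")                         # ~4690
print(f"Known statistics:   {n * H2(q) + np.sqrt(n) * K:.0f} bits")        # slide: 4907, leading order ~4911
for py in (0.3, 0.4):
    print(f"Universal, P_Y={py}: {n * H2(q) + np.sqrt(n) * f(py) * K:.0f} bits")
    # slide: 5224 and 5863; the leading-order totals come out slightly lower
```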

Page 51: Recent Results in  Non-Asymptotic Shannon Theory

Summary

• Network information theory (Shannon theory) may enable increased wireless data rates
• Practical channel codes and distributed source codes approach the limits, but rely on large n

• How quickly can we approach the performance limits of information theory?
CNA = C − K(ε)/n^1/2 + o(n^-1/2)
RNA = H(Z) + K(ε)/n^1/2 + o(n^-1/2)
⇒ Gap to capacity of LDPC codes is 2-3x greater

Page 52: Recent Results in  Non-Asymptotic Shannon Theory

Universality

• What can we do when the source statistics are unknown? (Slepian-Wolf)
PY<0.5: H(P*_Z) + K'(ε)/n^1/2 + o(n^-1/2)
PY≈0.5: H(P*_Z) + O(n^-1/3) – can be huge!

• Universal channel coding with feedback for BSC
– Capacity-achieving code requires PY=0.5 ⇒ Universality with current scheme is O(n^-1/3)

[Block diagram: Encoder → Channel → Decoder, with a feedback link from Decoder to Encoder]

Page 53: Recent Results in  Non-Asymptotic Shannon Theory

Further directions

• Gaussian channel (briefly discussed)
– Shannon [1958] derived CNA(n,ε) for the Gaussian channel with cone packing (non-i.i.d. codebook)
– Gaussian codebooks are sub-optimal!

• Other channels:

– CNA(n,ε) ≥ C − KA(ε)/n^0.5 via information spectrum

– Gaussian codebook distribution sub-optimal
⇒ Must consider non-i.i.d. codebook constructions

• Penalties for finite n and unknown statistics exist everywhere in Shannon theory!!

www.dsp.rice.edu

