
Page 1: Recent Results in  Non-Asymptotic Shannon Theory

Recent Results in Non-Asymptotic Shannon Theory

Dror Baron

Supported by AFOSR, DARPA, NSF, ONR, and Texas Instruments. Joint work with M. A. Khojastepour, R. G. Baraniuk, and S. Sarvotham

Page 2: Recent Results in  Non-Asymptotic Shannon Theory

“we may someday see the end of wireline”

• S. Cherry, “Edholm’s law of bandwidth,” IEEE Spectrum, vol. 41, no. 7, July 2004, pp. 58-60

Page 3: Recent Results in  Non-Asymptotic Shannon Theory

But will there ever be enough data rate?

• R. Lucky, 1989: “We are not very good at predicting uses until the actual service becomes available. I am not worried; we will think of something when it happens.”

There will always be new applications that gobble up more data rate!

Page 4: Recent Results in  Non-Asymptotic Shannon Theory

How much can we improve wireless?

• Spectrum is a limited natural resource
• Information theory says we need lots of power for high data rates – even with infinite bandwidth!

• Solution: transmit more power, BUT
– Limited by environmental concerns
– Will batteries support all that power?

• Sooner or later wireless rates will hit a wall!

Page 5: Recent Results in  Non-Asymptotic Shannon Theory

Where can we improve?

• Algorithms and hardware gains
– Power-efficient computation
– Efficient power amplifiers
– Advances in batteries
– Directional antennas

• Communication gains
– Channel coding
– Source coding
– Better source and channel models

Page 6: Recent Results in  Non-Asymptotic Shannon Theory

Where will “the last dB” of communication gains come from?

Network information theory (Shannon theory)

Page 7: Recent Results in  Non-Asymptotic Shannon Theory

Traditional point-to-point information theory

• Single source
• Single transmitter
• Single receiver
• Single communication stream

• Most aspects are well-understood

[Block diagram: Encoder → Channel → Decoder]

Page 8: Recent Results in  Non-Asymptotic Shannon Theory

Network information theory

• Network of:
– Multiple sources
– Multiple transmitters
– Multiple receivers
– Multiple communication streams

• Few results

• My goal: understand various costs of network information theory

[Block diagram: several encoders and decoders communicating over a shared channel]

Page 9: Recent Results in  Non-Asymptotic Shannon Theory

What costs has information theory overlooked?

Page 10: Recent Results in  Non-Asymptotic Shannon Theory

Channel coding has arrived…

• Turbo codes [Berrou et al., 1993]
– ~0.5 dB gap to capacity (rate R below capacity)
– BER = 10^-5
– Block length n = 6.5×10^4

• Regular LDPC codes [Gallager, 1963]

• Irregular LDPC codes [Richardson et al., 2001]
– 0.13 dB gap to capacity
– BER = 10^-6
– n = 10^6

[Block diagram: Encoder → Channel → Decoder]

Page 11: Recent Results in  Non-Asymptotic Shannon Theory

Distributed source coding has also arrived…

• Encoder for x based on syndrome of a channel code
• Decoder for x has correlated side information y

• Various types of channel codes can be used

• Slepian-Wolf via LDPC codes [Xiong et al., 2004]
– H(X|Y) = 0.47
– R = 0.5 (rate above Slepian-Wolf limit)
– BER = 10^-6
– Block length n = 10^5

[Block diagram: x → Encoder → Decoder, with side information y available at the decoder]

Page 12: Recent Results in  Non-Asymptotic Shannon Theory

Hey! Did you notice those block lengths?

• Information theory provides results in the asymptotic regime
– Channel coding: ∀δ>0, rate R = C − δ is achievable with Pr(error) → 0 as n → ∞
– Slepian-Wolf coding: ∀δ>0, rate R = H(X|Y) + δ is achievable with Pr(error) → 0 as n → ∞

• Best practical results achieved for n ≥ 10^5

• Do those results require large n?

Page 13: Recent Results in  Non-Asymptotic Shannon Theory

But we live in a finite world…

• Real-world data doesn’t always have n ≥ 10^6
– IP packets
– Emails, text messages
– Sensornet applications (small battery → small n)

• How do those methods perform for n = 10^4? 10^3?

• How quickly can we approach the performance limits of information theory?

Page 14: Recent Results in  Non-Asymptotic Shannon Theory

And we don’t know the statistics either!

• Lossless coding (single source):
– Length-n input x ~ Ber(p)
– Encode with the wrong parameter q
– K-L divergence penalty with variable-rate codes
⇒ Performance loss (minor bitrate penalty)

• Channel coding, distributed source coding:
– Encode with the wrong parameter q < p < 0.5
– Fixed-rate codes based on joint typicality
– Typical set T_q for q is smaller than T_p for p
– As n → ∞, Pr(error) → 1
⇒ Performance collapse! (a small numerical sketch follows below)
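The collapse can be seen numerically. Below is a minimal sketch (my own illustration, not from the slides): for z ~ Ber(p)^n it computes the exact probability that z still lands inside a one-sided typicality window built around the wrong parameter q < p; the window slack delta and the values of p and q are illustrative assumptions.

```python
import math

# Hypothetical mismatch: code designed for crossover q, true crossover p > q.
p, q, delta = 0.15, 0.10, 0.02       # delta: typicality slack above q (assumed value)

def log_binom_pmf(n, k, prob):
    """log Pr[Binomial(n, prob) = k], computed in the log domain to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(prob) + (n - k) * math.log(1 - prob))

def prob_in_Tq(n):
    """Exact Pr[z in T_q] = Pr[n_z <= n*(q + delta)] when z ~ Ber(p)^n."""
    kmax = math.floor(n * (q + delta))
    return sum(math.exp(log_binom_pmf(n, k, p)) for k in range(kmax + 1))

for n in (100, 1000, 10000):
    print(n, prob_in_Tq(n))           # shrinks toward 0, i.e. Pr(error) -> 1 as n grows
```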

Page 15: Recent Results in  Non-Asymptotic Shannon Theory

Main challenges

• How quickly can we approach the performance limits of information theory?

• Will address for channel coding and Slepian-Wolf

• What can we do when the source statistics are unknown?

• Will address for Slepian-Wolf

Page 16: Recent Results in  Non-Asymptotic Shannon Theory

But first . . .

What does the prior art indicate?

Page 17: Recent Results in  Non-Asymptotic Shannon Theory

Underlying problem

• Shannon [1958]: “This inverse problem is perhaps the more natural in applications: given a required level of probability of error, how long must the code be?”
– Motivation may have been phone and space communication
– “Small” probability of codeword error

• Wireless paradigm: Given k bits, what are the minimal channel resources needed to attain probability of error ε?
– Can retransmit packet ⇒ fixes a “large” ε
– n depends on packet length
– Need to characterize R(n,ε)

Page 18: Recent Results in  Non-Asymptotic Shannon Theory

Error exponents

• Fix rate R < C and codeword length n
– Bounds on probability of error:
– Random coding: Pr[error] ≤ 2^(−n·E_r(R))
– Sphere packing: Pr[error] ≥ 2^(−n·E_sp(R) + o(n))
– E_r(R) = E_sp(R) for R near C

Page 19: Recent Results in  Non-Asymptotic Shannon Theory

Error exponents

• Fix rate R < C and codeword length n
– Bounds on probability of error:
– Random coding: Pr[error] ≤ 2^(−n·E_r(R))
– Sphere packing: Pr[error] ≥ 2^(−n·E_sp(R) + o(n))
– E_r(R) = E_sp(R) for R near C

• Shannon’s regime: “This inverse problem is perhaps the more natural in applications: given a required level of probability of error, how long must the code be?”
– Fix R < C ⇒ E(R) = O(1) ⇒ log(1/ε) = O(n) ⇒ good for “small” ε

Page 20: Recent Results in  Non-Asymptotic Shannon Theory

Error exponents

• Fix rate R < C and codeword length n
– Bounds on probability of error:
– Random coding: Pr[error] ≤ 2^(−n·E_r(R))
– Sphere packing: Pr[error] ≥ 2^(−n·E_sp(R) + o(n))
– E_r(R) = E_sp(R) for R near C

• Wireless paradigm: Given k bits, what are the minimal channel resources needed to attain probability of error ε?
– Fix n ⇒ E(R) = O(1)
– o(n) term dominates
⇒ Bounds diverge

Page 21: Recent Results in  Non-Asymptotic Shannon Theory

Error exponents fail for R = C − δ/n^0.5
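A rough sketch of why this regime defeats the exponent bounds, assuming the standard quadratic behavior of the exponents just below capacity (not stated on the slides), with V a channel-dependent constant:

```latex
E_r(R) \;\approx\; E_{sp}(R) \;\approx\; \frac{(C-R)^2}{2V}
\quad\text{as } R \uparrow C,
\qquad\text{so for } R = C - \tfrac{\delta}{\sqrt{n}}:\quad
n\,E_r(R) \;\approx\; \frac{\delta^2}{2V} \;=\; O(1).
```

The random-coding bound 2^(−n·E_r(R)) then no longer decays with n, while the o(n) slack in the sphere-packing bound dominates the O(1) exponent, so the two bounds no longer pin down the error probability in this regime.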

Page 22: Recent Results in  Non-Asymptotic Shannon Theory

How quickly can we approach the channel capacity?

(known statistics)

Page 23: Recent Results in  Non-Asymptotic Shannon Theory

Binary symmetric channel (BSC) setup

• s ∈ {1,…,M} input message
• x, y, and z binary length-n sequences
• z ~ Ber(n,p) implies crossover probability p
• Code (f,g,n,M,ε) includes:
– Encoder x = f(s), s ∈ {1,…,M}
– Rate R = log(M)/n
– Channel y = x ⊕ z
– Decoder g reconstructs s by s' = g(y)
– Error probability Pr[g(y) ≠ s] ≤ ε

[Block diagram: s → Encoder f → x = f(s) → ⊕ z ~ Ber(n,p) → y → Decoder g → s' = g(y)]
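To make the code tuple (f, g, n, M, ε) concrete, here is a toy instance of my own (not from the slides): a length-3 repetition code with M = 2 messages over the BSC, with ε estimated by Monte Carlo.

```python
import random

# Toy instance of a code (f, g, n, M, eps) for the BSC: a rate-1/3 repetition code.
n, M, p = 3, 2, 0.1                      # n = block length, M = 2 messages, R = log2(M)/n = 1/3

def f(s):                                 # encoder: message s in {0, 1} -> codeword of length n
    return [s] * n

def g(y):                                 # decoder: majority vote
    return int(sum(y) > n / 2)

trials, errors = 100_000, 0
for _ in range(trials):
    s = random.randint(0, M - 1)
    z = [int(random.random() < p) for _ in range(n)]    # z ~ Ber(n, p)
    y = [xi ^ zi for xi, zi in zip(f(s), z)]            # channel: y = x XOR z
    errors += (g(y) != s)

print(errors / trials)   # empirical eps, here ~ 3p^2(1-p) + p^3 ~ 0.028
```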

Page 24: Recent Results in  Non-Asymptotic Shannon Theory

Non-asymptotic capacity

• Definition: C_NA(n,ε) = max log(M)/n over all codes (f,g,n,M,ε)

• Theorem: capacity of the BSC is C = 1 − H(Z) = 1 − H_2(p)

• Prior art by Wolfowitz [1978]
– Converse result: C_NA(n,ε) ≤ C − K_C(ε)/n^0.5
– Achievable result: C_NA(n,ε) ≥ C − K_A(ε)/n^0.5
– Bounds are loose: K_A(ε) > K_C(ε)

• Can we tighten Wolfowitz’s bounds?

[Plot: C_NA(n,ε) vs. n approaching C; the shaded region marks the looseness of the bounds]

Page 25: Recent Results in  Non-Asymptotic Shannon Theory

Key to solution – Packing typical sets

• Need to encode the typical set T_Z for z
– Code needs to “cover” z ∈ T_Z
⇒ Need Pr(z ∈ T_Z) ≈ 1 − ε, where ε is the probability of codeword error

• What about rate?
– Output space = 2^n possible sequences
– Can’t pack more than 2^n/|T_Z| sets into the output space ⇒ M ≤ 2^n/|T_Z|
– Minimal-cardinality T_min that covers ⇒ C_NA ≤ 1 − log(|T_min|)/n

[Figure: typical sets T_Z packed into the 2^n-point output space]

Page 26: Recent Results in  Non-Asymptotic Shannon Theory

What’s the cardinality of Tmin?

• Consider the empirical statistics n_z = Σ_i z_i, P_Z = n_z/n
– p < 0.5 ⇒ Pr(z) is monotone decreasing in n_z
⇒ Minimal T_min has the form T_min = {z : P_Z ≤ τ}

• Determine τ(ε) with the central limit theorem (CLT)
– E[P_Z] = p, Var(P_Z) = p(1−p)/n
⇒ P_Z ≈ N(p, p(1−p)/n)

• Asymptotic
– τ = p + δ
– LLN: ε → 0

• Non-asymptotic
– τ = p + δ·[p(1−p)/n]^0.5
– CLT: ε → Q(δ), so δ = Q^-1(ε)
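A quick numerical check of the CLT step (a sketch assuming δ is set to Q⁻¹(ε), so that the threshold τ = p + Q⁻¹(ε)·[p(1−p)/n]^0.5 leaves probability ≈ ε above it; the values of p, ε, and n are illustrative):

```python
import math
from statistics import NormalDist

p, eps, n = 0.1, 1e-2, 1000                     # illustrative values (assumed)
Qinv = NormalDist().inv_cdf(1 - eps)            # Q^{-1}(eps), Gaussian tail quantile

tau = p + Qinv * math.sqrt(p * (1 - p) / n)     # CLT-based typicality threshold

# Exact binomial tail Pr[n_z > n*tau] for z ~ Ber(p)^n; should land near eps.
tail = sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
           for k in range(math.floor(n * tau) + 1, n + 1))
print(round(tau, 4), tail)                      # tau ~ 0.122, tail ~ 0.009 (close to eps = 0.01)
```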

Page 27: Recent Results in  Non-Asymptotic Shannon Theory

Tight non-asymptotic capacity

• Theorem:
– C_NA(n,ε) = C − K(ε)/n^0.5 + o(n^-0.5)
– K(ε) = Q^-1(ε)·[p(1−p)]^0.5·log((1−p)/p)
– Gap to capacity is K(ε)/n^0.5 + o(n^-0.5)

• Note: o(n^-0.5) is asymptotically negligible w.r.t. K/n^0.5

⇒ Tightened Wolfowitz bounds up to o(n^-0.5)
⇒ Gap to capacity of LDPC codes is 2-3x greater
⇒ We know how quickly we can approach C

[Plot: tight bound on C_NA(n,ε) approaching C as n grows]
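A minimal sketch tying the last two slides together, under my reading of the garbled constant as K(ε) = Q⁻¹(ε)·[p(1−p)]^0.5·log₂((1−p)/p): it compares the direct packing estimate 1 − H₂(τ), with τ from the CLT argument, against the first-order expansion C − K(ε)/√n for an illustrative crossover probability.

```python
import math
from statistics import NormalDist

def H2(t):
    """Binary entropy in bits."""
    return -t * math.log2(t) - (1 - t) * math.log2(1 - t)

p, eps = 0.11, 1e-3                               # hypothetical BSC crossover and target error
C = 1 - H2(p)
Qinv = NormalDist().inv_cdf(1 - eps)              # Q^{-1}(eps)
K = Qinv * math.sqrt(p * (1 - p)) * math.log2((1 - p) / p)

for n in (10**3, 10**4, 10**5, 10**6):
    tau = p + Qinv * math.sqrt(p * (1 - p) / n)   # CLT threshold from the packing argument
    direct = 1 - H2(tau)                          # rate from packing T_min = {z : P_Z <= tau}
    expand = C - K / math.sqrt(n)                 # first-order expansion C - K(eps)/sqrt(n)
    print(n, round(C, 4), round(direct, 4), round(expand, 4))
```

The two estimates agree to first order, and both pull away from C at the K(ε)/√n rate.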

Page 28: Recent Results in  Non-Asymptotic Shannon Theory

Non-asymptotic capacity of BSC

Page 29: Recent Results in  Non-Asymptotic Shannon Theory

Gaussian channel results

• Continuous channel

• Power constraint Σ_i (x_i)^2 ≤ nP

• Shannon [1958] derived C_NA(n,ε) for the Gaussian channel via cone packing (non-i.i.d. codebook)

• Information spectrum bounds on probabilities of error indicate Gaussian codebooks are sub-optimal

⇒ i.i.d. codebooks aren’t good enough!

[Block diagram: s → Encoder f → x = f(s) → + z ~ N(0,σ²) → y → Decoder g → s' = g(y)]

Page 30: Recent Results in  Non-Asymptotic Shannon Theory

Excess power of Gaussian channel

Page 31: Recent Results in  Non-Asymptotic Shannon Theory

How quickly can we approach the Slepian-Wolf limit?

(known statistics)

Page 32: Recent Results in  Non-Asymptotic Shannon Theory

But first . . .

Slepian-Wolf Review

Page 33: Recent Results in  Non-Asymptotic Shannon Theory

Slepian-Wolf setup

• x and y are correlated length-n sequences

• Code (f_X, f_Y, g_X, g_Y, n, M_X, M_Y, ε_X, ε_Y) includes:

– Encoders f_X(x) ∈ {1,…,M_X}, f_Y(y) ∈ {1,…,M_Y}

– Rates R_X = log(M_X)/n, R_Y = log(M_Y)/n

– Decoder g reconstructs x and y by g_X(f_X(x), f_Y(y)) and g_Y(f_X(x), f_Y(y))

– Error probabilities Pr[g_X(f_X(x), f_Y(y)) ≠ x] ≤ ε_X and Pr[g_Y(f_X(x), f_Y(y)) ≠ y] ≤ ε_Y

[Block diagram: x → Encoder f_X → f_X(x); y → Encoder f_Y → f_Y(y); joint Decoder g outputs g_X(f_X(x), f_Y(y)) and g_Y(f_X(x), f_Y(y))]

Page 34: Recent Results in  Non-Asymptotic Shannon Theory

Slepian-Wolf theorem

• Theorem [Slepian & Wolf, 1973]:

– R_X ≥ H(X|Y) (conditional entropy)

– R_Y ≥ H(Y|X)

– R_X + R_Y ≥ H(X,Y) (joint entropy)

[Figure: Slepian-Wolf rate region in the (R_X, R_Y) plane, bounded by R_X ≥ H(X|Y), R_Y ≥ H(Y|X), and R_X + R_Y ≥ H(X,Y); axis marks at H(X|Y), H(X), H(Y|X), H(Y)]
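A trivial helper (my own illustration) that encodes the three inequalities of the theorem; the numeric entropies below are assumptions for the example, loosely echoing the H(X|Y) = 0.47 setup from the LDPC slide.

```python
def in_sw_region(Rx, Ry, Hx_given_y, Hy_given_x, Hxy):
    """Return True iff (Rx, Ry) satisfies all three Slepian-Wolf inequalities."""
    return Rx >= Hx_given_y and Ry >= Hy_given_x and Rx + Ry >= Hxy

# Assumed example entropies (bits/symbol): H(X|Y) = H(Y|X) = 0.47, H(X,Y) = 1.46
print(in_sw_region(0.50, 1.00, 0.47, 0.47, 1.46))   # True: all three constraints hold
print(in_sw_region(0.40, 1.20, 0.47, 0.47, 1.46))   # False: Rx falls below H(X|Y)
```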

Page 35: Recent Results in  Non-Asymptotic Shannon Theory

Slepian-Wolf with binary symmetric correlation structure

(known statistics)

Page 36: Recent Results in  Non-Asymptotic Shannon Theory

Binary symmetric correlation setup

• x, y, and z are length-n Bernoulli sequences
• Correlation channel z is independent of y
• Bernoulli parameters p, q ∈ [0, 0.5), r = p(1−q) + (1−p)q
• Code (f, g, n, M, ε) includes:
– Encoder f(x) ∈ {1,…,M}
– Rate R = log(M)/n
– Decoder g(f(x), y) ∈ {0,1}^n
– Error probability Pr[g(f(x), y) ≠ x] ≤ ε

[Block diagram: x ~ Ber(r), with x = y ⊕ z, y ~ Ber(p), z ~ Ber(q); x → Encoder f → f(x) ∈ {1,…,M} → Decoder g (side information y) → g(f(x), y) ∈ {0,1}^n]
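A small sketch with illustrative parameters (not from the slides), assuming x = y ⊕ z as the diagram and the formula for r indicate; it confirms the relation r = p(1−q) + (1−p)q and shows the rate target H(Z) = H_2(q).

```python
import math, random

def H2(t):
    """Binary entropy in bits."""
    return -t * math.log2(t) - (1 - t) * math.log2(1 - t)

p, q, n = 0.3, 0.1, 100_000
r = p * (1 - q) + (1 - p) * q            # Bernoulli parameter of x = y XOR z
print(r, H2(q))                           # r = 0.34; side info reduces the rate target to H(Z) = H2(q)

# Quick simulation confirming that x = y XOR z is Ber(r)
y = [int(random.random() < p) for _ in range(n)]
z = [int(random.random() < q) for _ in range(n)]
x = [yi ^ zi for yi, zi in zip(y, z)]
print(sum(x) / n)                         # empirical estimate, close to r = 0.34
```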

Page 37: Recent Results in  Non-Asymptotic Shannon Theory

Relation to general Slepian-Wolf setup

• x, y, and z are Bernoulli
• Correlation z independent of y
• Focus on encoding x at rate approaching H(Z)

• Neglect the well-known encoding of y at rate R_Y = H(Y)

[Figure: Slepian-Wolf rate region with the corner point (R_X, R_Y) = (H(Z), H(Y)) marked as “our setup”; axis marks at H(X|Y), H(X), H(Y|X), H(Y)]

Page 38: Recent Results in  Non-Asymptotic Shannon Theory

Non-asymptotic Slepian-Wolf rate

• Definition: R_NA(n,ε) = min log(M)/n over all codes (f,g,n,M,ε)

• Prior art [Wolfowitz, 1978]
– Converse result: R_NA(n,ε) ≥ H(X|Y) + K_C(ε)/n^0.5
– Achievable result: R_NA(n,ε) ≤ H(X|Y) + K_A(ε)/n^0.5
– Bounds are loose: K_A(ε) > K_C(ε)

• Can we tighten Wolfowitz’s bounds?

[Plot: R_NA(n,ε) vs. n approaching H(X|Y); the shaded region marks the looseness of the bounds]

Page 39: Recent Results in  Non-Asymptotic Shannon Theory

Tight non-asymptotic rate

• Theorem:
– R_NA(n,ε) = H(Z) + K(ε)/n^0.5 + o(n^-0.5)
– K(ε) = Q^-1(ε)·[q(1−q)]^0.5·log((1−q)/q)
– Redundancy rate is K(ε)/n^0.5 + o(n^-0.5)

• Note: o(n^-0.5) decays faster than K/n^0.5

⇒ Tightened Wolfowitz bounds up to o(n^-0.5)
⇒ We know how quickly we can approach H(Z) with known statistics

[Plot: tight bound on R_NA(n,ε) approaching H(X|Y) as n grows]

Page 40: Recent Results in  Non-Asymptotic Shannon Theory

What can we do when the source statistics are unknown?

(universality)

Page 41: Recent Results in  Non-Asymptotic Shannon Theory

Universal setup

• Unknown Bernoulli parameters p, q, r

• Encoder observes x and n_y = Σ_i y_i

• Communication of n_y requires log(n) bits

• Variable rate used
– Need distribution for n_z
– Distribution depends on n_x and n_y (not x)
– Codebook size M_{n_x,n_y}

[Block diagram: x ~ Ber(r), y ~ Ber(p), z ~ Ber(q); x → Encoder f → f(x) ∈ {1,…,M_{n_x,n_y}} → Decoder g → g(f(x), y) ∈ {0,1}^n; n_y = Σ_i y_i communicated to the encoder]

Page 42: Recent Results in  Non-Asymptotic Shannon Theory

Distribution of nz

• CLT was key to the solution with known statistics
• How can we apply the CLT when q is unknown?

• Consider a numerical example
– p = 0.3, q = 0.1, r = p(1−q) + (1−p)q
– P_X = r, P_Y = p, P_Z = q (empirical = true)
– We plot Pr(n_z | n_x, n_y) as a function of n_z ∈ {0,…,n}

Page 43: Recent Results in  Non-Asymptotic Shannon Theory

Pr(n_z | n_x, n_y) for n = 10^2

Page 44: Recent Results in  Non-Asymptotic Shannon Theory

Pr(n_z | n_x, n_y) for n = 10^3

Page 45: Recent Results in  Non-Asymptotic Shannon Theory

Pr(n_z | n_x, n_y) for n = 10^4

Page 46: Recent Results in  Non-Asymptotic Shannon Theory

Pr(n_z | n_x, n_y) for n = 10^4

Page 47: Recent Results in  Non-Asymptotic Shannon Theory

Universal rate

• Theorem:

– R_NA(n,ε) = H(P*_Z) + K'(ε)/n^0.5 + o(n^-0.5)

– K'(ε) = f(P_Y)·K(ε)

– f(P_Y) = 2·[P_Y(1−P_Y)]^0.5 / |1 − 2P_Y|

– As P_Y → 0: f(P_Y) → 0
– As P_Y → 0.5: f(P_Y) → ∞
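A quick evaluation of the penalty factor f(P_Y), using the formula as reconstructed above (the values of P_Y are illustrative):

```python
import math

def f(py):
    """Penalty factor from the universal-rate theorem (as reconstructed above)."""
    return 2 * math.sqrt(py * (1 - py)) / abs(1 - 2 * py)

for py in (0.01, 0.1, 0.3, 0.45, 0.49):
    print(py, round(f(py), 2))   # grows from ~0.2 toward infinity as P_Y -> 0.5
```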

Page 48: Recent Results in  Non-Asymptotic Shannon Theory

Why is f(PY) small when PY is small?

• Known statistics: Var(n_z) = nq(1−q) regardless of the empirical statistics

• P_Y → 0 ⇒ can estimate n_z with small variance
⇒ Universal scheme outperforms known statistics when P_Y is small

• Key issue: variable-rate coding (universal) beats fixed-rate coding (known statistics)

⇒ Can cut down expected redundancy (known statistics) by communicating n_y to the encoder

• log(n) bits for n_y will save O(n^0.5) bits

Page 49: Recent Results in  Non-Asymptotic Shannon Theory

Redundancy for PY¼0.5

• f(P_Y) blows up as P_Y approaches 0.5

• Redundancy is O(n^-0.5) with an enormous constant

• Another scheme has O(n^-1/3) redundancy

• Better performance for P_Y = 0.5 + O(n^-1/6)

⇒ Universal redundancy can be huge!

• Ongoing research: improvement of O(n^-1/3)

Page 50: Recent Results in  Non-Asymptotic Shannon Theory

Numerical example

• n = 10^4

• q = 0.1

• Slepian-Wolf requires nH_2(q) = 4690 bits

• Non-asymptotic approach (known statistics) with ε = 10^-2 requires nR_NA(n,ε) = 4907 bits

• Universal approach with P_Y = 0.3 requires 5224 bits

• With P_Y = 0.4 we need 5863 bits

• In practice, the penalty for universality is huge!
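A rough back-of-the-envelope check of these figures (a sketch under my Q⁻¹(ε) reading of K(ε) and taking the empirical P*_Z equal to q; the slide's exact numbers presumably include lower-order terms the sketch drops):

```python
import math
from statistics import NormalDist

def H2(t):
    """Binary entropy in bits."""
    return -t * math.log2(t) - (1 - t) * math.log2(1 - t)

n, q, eps = 10**4, 0.1, 1e-2
Qinv = NormalDist().inv_cdf(1 - eps)                       # Q^{-1}(0.01) ~ 2.33
K = Qinv * math.sqrt(q * (1 - q)) * math.log2((1 - q) / q)

sw  = n * H2(q)                                            # asymptotic Slepian-Wolf limit
na  = n * (H2(q) + K / math.sqrt(n))                       # known statistics, finite n
f   = lambda py: 2 * math.sqrt(py * (1 - py)) / abs(1 - 2 * py)
uni = lambda py: n * (H2(q) + f(py) * K / math.sqrt(n))    # universal, P*_Z = q assumed

print(round(sw), round(na), round(uni(0.3)), round(uni(0.4)))
# ~4690, ~4911, ~5197, ~5774 -- the same ballpark as the slide's 4690, 4907, 5224, 5863
```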

Page 51: Recent Results in  Non-Asymptotic Shannon Theory

Summary

• Network information theory (Shannon theory) may enable increased wireless data rates

• Practical channel codes and distributed source codes approach the limits, but rely on large n

• How quickly can we approach the performance limits of information theory?
– C_NA = C − K(ε)/n^1/2 + o(n^-1/2)
– R_NA = H(Z) + K(ε)/n^1/2 + o(n^-1/2)
– Gap to capacity of LDPC codes is 2-3x greater

Page 52: Recent Results in  Non-Asymptotic Shannon Theory

Universality

• What can we do when the source statistics are unknown? (Slepian-Wolf)
– P_Y < 0.5: H(P*_Z) + K'(ε)/n^1/2 + o(n^-1/2)
– P_Y ≈ 0.5: H(P*_Z) + O(n^-1/3) – can be huge!

• Universal channel coding with feedback for BSC
– Capacity-achieving code requires P_Y = 0.5
⇒ Universality with the current scheme is O(n^-1/3)

[Block diagram: Encoder → Channel → Decoder, with a feedback link from the decoder back to the encoder]

Page 53: Recent Results in  Non-Asymptotic Shannon Theory

Further directions

• Gaussian channel (briefly discussed)
– Shannon [1958] derived C_NA(n,ε) for the Gaussian channel with cone packing (non-i.i.d. codebook)
– Gaussian codebooks are sub-optimal!

• Other channels:

– C_NA(n,ε) ≥ C − K_A(ε)/n^0.5 via information spectrum
– Gaussian codebook distribution is sub-optimal
⇒ Must consider non-i.i.d. codebook constructions

• Penalties for finite n and unknown statistics exist everywhere in Shannon theory!!

www.dsp.rice.edu