coding theory and its applications 編碼及其應用 hung-lin fu dept. of applied mathematics...

Coding Theory and Its Applications (編碼及其應用)

Hung-Lin Fu

Dept. of Applied Mathematics

National Chiao Tung University

Hsin Chu, Taiwan

Basic Ideas

Messages and their transmission: correctness and security.

Save time and expense.

The study of security is the main job of cryptography.

Coding theory deals not only with the correctness of transmission but also with its speed.

The Flow of Transmission

Message → Encode → Modulation → (through noisy channel) → Demodulation → Decode → Original message

Examples

Grades A, B, C, and D. Use digits 0 and 1 to encode:

A : 00

B : 01

C : 10

D : 11

Send A: 00

Receiving

Following demodulation and decoding, we expect to receive the original message A.

Unfortunately, errors are possible because of the “noise”.

Probability of Errors

Let p denote the probability of sending “0” and receiving “1”. In a “symmetric channel”, sending “1” and receiving “0” has the same error probability p.

If t digits are transmitted, then the probability of making exactly s errors is $C(t,s)\,p^{s}(1-p)^{t-s}$.

The probability of making at least one error is $C(t,1)\,p(1-p)^{t-1} + C(t,2)\,p^{2}(1-p)^{t-2} + \dots + p^{t} = 1-(1-p)^{t}$.
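The two formulas above can be checked numerically. This is a minimal sketch; the function names are illustrative and not part of the slides.

```python
from math import comb

def prob_exactly(t: int, s: int, p: float) -> float:
    """Probability of exactly s errors among t transmitted digits."""
    return comb(t, s) * p**s * (1 - p)**(t - s)

def prob_at_least_one(t: int, p: float) -> float:
    """Sum of prob_exactly over s = 1..t; equals 1 - (1 - p)**t."""
    return sum(prob_exactly(t, s, p) for s in range(1, t + 1))

# Two digits sent over a channel with p = 0.01.
print(round(prob_at_least_one(2, 0.01), 6))  # 0.0199
```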

[Figure: Symmetric Channel. Each digit (0 or 1) is received correctly with probability 1-p and flipped with probability p.]

It Happens!

Let p = 0.01. It looks small, but it is in fact very large for real-world transmission: if a million digits are transmitted in a minute, we expect about 10,000 erroneous digits per minute.

Therefore, if we use 00, 01, 10, and 11 for A, B, C, and D, then errors in transmitted words do occur. The probability that a word is received in error is $2 \times 0.01 \times 0.99 + (0.01)^{2} = 0.0199$.

An Improvement

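Parity check digits: append one digit so that every codeword has an even number of 1’s.

00 → 000

01 → 011

10 → 101

11 → 110

The probability of making errors “without noticing” is smaller! It is at most $C(3,2) \times (0.01)^{2} \times 0.99 + (0.01)^{3} = 0.000298$. (A triple error changes the parity and is therefore noticed, so the exact value is $C(3,2) \times (0.01)^{2} \times 0.99 = 0.000297$.)

We can add more digits instead of just one.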
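A small sketch of the parity-check idea: append one digit, then test the parity on reception. The helper names `add_parity` and `passes_check` are illustrative.

```python
from math import comb

def add_parity(word: str) -> str:
    """Append one digit so that the total number of 1's is even."""
    return word + str(word.count("1") % 2)

def passes_check(received: str) -> bool:
    """A received word is accepted only if it has an even number of 1's."""
    return received.count("1") % 2 == 0

code = {w: add_parity(w) for w in ["00", "01", "10", "11"]}
print(code)  # {'00': '000', '01': '011', '10': '101', '11': '110'}

print(passes_check("001"))  # False: a single error in 000 is noticed
print(passes_check("011"))  # True: a double error in 000 goes unnoticed

# Probability of two or more errors in three digits with p = 0.01.
p = 0.01
two_or_more = comb(3, 2) * p**2 * (1 - p) + p**3
print(round(two_or_more, 6))  # 0.000298
```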

Error Correction

When an error is detected, we may not know which digit is wrong. So, “ask for retransmission”.

Retransmission is not always possible.

The Idea of Correcting Errors

00 → 000000

01 → 010101

10 → 101010

11 → 111111

Assume that 101110 is received. We shall conclude that the word sent is 101010!

Hamming Distance

A message of n digits can be expressed as an n-dimensional vector over the finite field GF(2).

E.g. 010101 ↔ (0,1,0,1,0,1).

Let K = GF(2). Then $K^{n}$ is a set of $2^{n}$ vectors.

A New Metric

Let $(a_1, a_2, \dots, a_n)$ and $(b_1, b_2, \dots, b_n)$ be two vectors of $K^{n}$. The Hamming distance of the two vectors is the number of indices k, k = 1, 2, …, n, such that $a_k - b_k \neq 0$.

E.g.

d(101010,101110) = 1

d(000000,101110) = 4

d(111111,101110) = 2

d(010101,101110) = 5

Hamming distance is a “metric”!
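The four example distances can be reproduced with a few lines. This sketch assumes words are written as strings of 0's and 1's.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions in which the two words differ."""
    if len(a) != len(b):
        raise ValueError("words must have the same length")
    return sum(x != y for x, y in zip(a, b))

received = "101110"
for word in ["101010", "000000", "111111", "010101"]:
    print(word, hamming_distance(word, received))
# 101010 1
# 000000 4
# 111111 2
# 010101 5
```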

Distance and Decoding

If the distance between two words u and v of length n is d, then the probability of sending u and receiving v is $p^{d}(1-p)^{n-d}$.

Fact: Assume p < 1/2. If d(w,u) > d(v,u) and u is received, then v is more probable than w as the word sent.

E.g. let 000000, 010101, 101010, and 111111 be the four possible words sent, and suppose 101110 is received. Then we choose 101010 as the word sent.

Maximum Likelihood Decoding

Let C be the code used for transmission and let u be the word received through the channel.

CMLD (Complete Maximum Likelihood Decoding): If v is a codeword for which d(v,u) is minimum over all codewords in C, then we conclude that v is the transmitted codeword, whether or not v is unique.

IMLD (Incomplete Maximum Likelihood Decoding): If such a v is not unique, then ask for retransmission.
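The two decoding rules can be sketched as follows. Breaking ties by the order of the list in `cmld` is an assumption made for illustration, since the slides do not say how to choose among equally near codewords. In `imld`, `None` stands for "ask for retransmission".

```python
from typing import List, Optional

def distance(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))

def nearest(code: List[str], u: str) -> List[str]:
    """All codewords at minimum Hamming distance from the received word u."""
    best = min(distance(v, u) for v in code)
    return [v for v in code if distance(v, u) == best]

def cmld(code: List[str], u: str) -> str:
    """Complete decoding: always answer (ties broken by list order, an assumption)."""
    return nearest(code, u)[0]

def imld(code: List[str], u: str) -> Optional[str]:
    """Incomplete decoding: answer only when the nearest codeword is unique."""
    candidates = nearest(code, u)
    return candidates[0] if len(candidates) == 1 else None

code = ["000000", "010101", "101010", "111111"]
print(cmld(code, "101110"))  # 101010
print(imld(code, "101110"))  # 101010

parity_code = ["000", "011", "101", "110"]
print(nearest(parity_code, "001"))  # ['000', '011', '101']
print(imld(parity_code, "001"))     # None
```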

Linear Codes

A code of length n is a subset of $K^{n}$.

A linear code of length n is a linear subspace of $K^{n}$. (The sum of two vectors is taken coordinatewise under the addition of K.)

A linear (n,k,d)-code is a linear code of length n with dimension k and distance d, where d is the minimum distance between two distinct vectors of the code.

Weights of Codewords

Each vector of a code is called a codeword.

The weight of a codeword is the number of 1’s in the codeword. E.g. wt(101011) = 4.

Proposition. The distance of a linear code is equal to the minimum weight of a nonzero codeword.
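The proposition can be checked by brute force on a small linear code. The code used here is the even-weight code of length 3, chosen only as an illustration.

```python
from itertools import combinations

def weight(word: str) -> int:
    """Number of 1's in the word."""
    return word.count("1")

def distance(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))

print(weight("101011"))  # 4

# A small linear code: closed under coordinatewise addition mod 2.
code = ["000", "011", "101", "110"]
min_distance = min(distance(a, b) for a, b in combinations(code, 2))
min_weight = min(weight(c) for c in code if weight(c) > 0)
print(min_distance, min_weight)  # 2 2
```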

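Main Theorem

Theorem. A code with distance d can detect d-1 errors and correct $\lfloor (d-1)/2 \rfloor$ errors.

Proof. Let w be the codeword sent and v the word received. If $1 \le d(v,w) \le d-1$, then v is not a codeword, so the error is detected. If $d(v,w) \le \lfloor (d-1)/2 \rfloor$, then for each codeword u ≠ w in C the triangle inequality gives $d(v,u) \ge d(u,w) - d(v,w) \ge d - \lfloor (d-1)/2 \rfloor > \lfloor (d-1)/2 \rfloor \ge d(v,w)$, so w is the unique codeword nearest to v.

[Figure: codewords u and w, with the received word v.]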

Better Codes

The length of a codeword determines the “time” of transmission.

The dimension k of a linear code gives the information rate k/n.

The distance of a code tells you how many errors can be detected (or corrected).

The bits which are not information bits are parity check bits; there are n-k of them.

A(n,d) is the maximum number of words of length n such that the distance between any two distinct words is at least d. A code C is (n,d)-optimal if C has A(n,d) codewords. (A[n,d] denotes the corresponding maximum for linear codes.)

The most Important Problem in Coding Theory

Given two positive integers n and d where d < n, determine A(n,d) and A[n,d].

$A(7,3) \le 2^{7}/(1+7) = 16$ (sphere packing bound).

A(7,3) = 16. (By direct constructions.)
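The sphere packing bound divides the number of words, $2^{n}$, by the number of words in a ball of radius $\lfloor (d-1)/2 \rfloor$. A minimal sketch of that arithmetic, with an illustrative function name:

```python
from math import comb

def sphere_packing_bound(n: int, d: int) -> int:
    """Upper bound on A(n,d) for binary codes."""
    t = (d - 1) // 2                                  # errors that can be corrected
    ball = sum(comb(n, i) for i in range(t + 1))      # words within distance t of a centre
    return 2**n // ball

print(sphere_packing_bound(7, 3))  # 16
```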

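Two Constructions

Use a Steiner triple system of order 7:

{1,2,4}, {2,3,5}, {3,4,6}, {4,5,7}, {5,6,1}, {6,7,2}, {7,1,3}.

The codewords are the incidence vectors of the seven triples, their complements, and the words 0000000 and 1111111:

1101000   0010111   0000000

0110100   1001011   1111111

0011010   1100101

0001101   1110010

1000110   0111001

0100011   1011100

1010001   0101110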
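The construction from the Steiner triple system can be reproduced and checked directly. This sketch builds the incidence vectors, adds their complements and the two constant words, and then computes the minimum distance by brute force.

```python
from itertools import combinations

blocks = [{1, 2, 4}, {2, 3, 5}, {3, 4, 6}, {4, 5, 7}, {5, 6, 1}, {6, 7, 2}, {7, 1, 3}]

def incidence(block, n=7):
    """Word with a 1 in position i exactly when i belongs to the block."""
    return "".join("1" if i in block else "0" for i in range(1, n + 1))

def complement(word):
    return "".join("1" if c == "0" else "0" for c in word)

vectors = [incidence(b) for b in blocks]
code = ["0000000", "1111111"] + vectors + [complement(v) for v in vectors]

min_dist = min(sum(x != y for x, y in zip(a, b)) for a, b in combinations(code, 2))
print(len(code), min_dist)  # 16 3
```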

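Parity Check Matrix

The code we plan to construct is a linear code of dimension 4.

Using a 7×3 matrix H of rank 3, we conclude that the set of vectors v satisfying vH = 0 forms a linear subspace of $K^{7}$ of dimension 4.

Let

$$H^{t} = \begin{pmatrix} 0 & 0 & 1 & 0 & 1 & 1 & 1 \\ 0 & 1 & 0 & 1 & 0 & 1 & 1 \\ 1 & 0 & 1 & 1 & 1 & 0 & 1 \end{pmatrix}.$$

(As printed, the third and fifth columns of $H^{t}$ coincide; for the argument that follows, the rows of H must be the seven distinct nonzero vectors of $K^{3}$.)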
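A brute-force sketch of the parity-check construction. The matrix used here is an assumption, not the matrix printed on the slide: its rows are the binary expansions of 1, 2, …, 7, that is, all seven nonzero vectors of $K^{3}$ in a convenient order.

```python
from itertools import product

# Assumption: rows of H are the binary expansions of 1..7 (all nonzero vectors of K^3).
H = [[(i >> 2) & 1, (i >> 1) & 1, i & 1] for i in range(1, 8)]

def syndrome(v):
    """Compute vH over GF(2) for a length-7 vector v."""
    return tuple(sum(v[i] * H[i][j] for i in range(7)) % 2 for j in range(3))

# The code is the set of all v in K^7 with vH = 0.
code = [v for v in product([0, 1], repeat=7) if syndrome(v) == (0, 0, 0)]
min_weight = min(sum(v) for v in code if any(v))
print(len(code), min_weight)  # 16 3
```

With 16 = $2^{4}$ codewords the dimension is 4, and the minimum weight 3 is the distance of the code.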

BCH Codes

BCH stands for Bose, Chaudhuri, and Hocquenghem.

The code we just constructed is a 1-error-correcting BCH code.

Since no row of H is zero and no two rows (vectors) are the same, a nonzero vector v satisfying vH = 0 has weight at least 3. Hence the distance of the code is 3 (there are 3 rows which are dependent).

The rows of H can be considered as the set of all nonzero elements of GF($2^{3}$).

A Different Point of View

$K^{n}$ can be viewed as the set of all polynomials of degree at most n-1 with coefficients in K.

Let $R_n = K[x]/(x^{n}+1)$ (so $x^{n} = 1$). Then $R_n$ with polynomial addition and multiplication is a ring.

If f(x) is a divisor of $x^{n}+1$, then the set of all multiples of f(x) in $R_n$ is a linear (cyclic) code of dimension n - deg f(x).

Quiz

Consider $R_7$.

$x^{7}+1 = (1+x)(1+x+x^{3})(1+x^{2}+x^{3})$ (?) (Hint: 1 = -1, $(1+x)^{2} = 1+x^{2}$.)

The set of all polynomials in $R_7$ which are multiples of $1+x+x^{3}$ forms a linear code with 16 codewords. This is “essentially the same” code as constructed above.
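The quiz can be checked by machine. In this sketch a polynomial over GF(2) is stored as an integer whose bit i is the coefficient of $x^{i}$; this representation is a choice made for the example.

```python
def gf2_mul(a: int, b: int) -> int:
    """Multiply two GF(2) polynomials stored as bit masks (bit i = coefficient of x^i)."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # addition in GF(2) is XOR
        a <<= 1
        b >>= 1
    return result

f1 = 0b11      # 1 + x
f2 = 0b1011    # 1 + x + x^3
f3 = 0b1101    # 1 + x^2 + x^3
print(bin(gf2_mul(gf2_mul(f1, f2), f3)))  # 0b10000001, i.e. x^7 + 1

# Multiples of 1 + x + x^3 by all polynomials of degree at most 3.
codewords = {gf2_mul(m, f2) for m in range(16)}
min_weight = min(bin(c).count("1") for c in codewords if c)
print(len(codewords), min_weight)  # 16 3
```

The products have degree at most 6, so no reduction modulo $x^{7}+1$ is needed when listing the codewords.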

Reed-Solomon Codes

Instead of using K = GF(2), we shall use K = GF(q) where q is a prime power. (It is well known that a finite field of order q exists.) So the codewords are vectors with coordinates from GF(q). The one used in CDs takes q = $2^{8}$.

An RS($2^{r}$,d)-code is a linear cyclic ($2^{r}-1$, $2^{r}-d$, d)-code over GF(q) generated by $(x+b^{m+1})(x+b^{m+2})\cdots(x+b^{m+d-1})$, where q = $2^{r}$, m is a nonnegative integer, and b is a primitive element of GF(q).

Design of Compact Discs (Key Contributions)

1948, C.E. Shannon publishes “A Mathematical Theory of Communication”.

1950, R.W. Hamming publishes his work on error-detecting and error-correcting codes.

1958, Invention of the laser.

1960, Experiments in computer music start.

Story, Continued

1960, I.S. Reed and G. Solomon construct Reed-Solomon codes.

1969, Klaas Compaan, a Dutch physicist, comes up with the idea for the compact disc.

1970, Compaan completes a glass disc prototype and decides to use a laser.

1978, Philips releases the video disc player, and the type of laser is selected for CD players.

1980, CD standard proposed by Philips and Sony.

1982, Philips and Sony both have products ready to go.

Keep Going

1983, 30,000 CD players and 800,000 CDs sold in the U.S.

1984, Portable CD players (Sony Discman) sold.

1985, CD-ROM drives hit the computer market.

1990, 9.2 million players sold in the U.S. alone and about one billion CDs sold worldwide.

1997, DVD released. DVD players and movies hit the consumer market.

Now, we cannot live without it.

A Brief Overview

Data storage in CD format is not simple. Typically, a user pictures the "1’s" and "0’s" in the memory of the computer as being directly transferred to "pits" and "bumps" on the CD, but this is not what happens.

To begin with, the incoming data is subjected to a series of coding operations. These coding operations add a number of parity bits to the data for error detection and correction purposes. The data is also subjected to an interleaving process.

Concealment (隱藏)

Interpolation (添寫): In this technique, some “average” is constructed using the valid data around an error. This average is then substituted for the erroneous data. Since most music (with the possible exception of heavy metal!) is continuous, this method works well for concealing relatively short errors.

Muting (消音): Muting is a last-ditch technique, as it effectively creates a brief period of silence in the audio stream. However, it is not effective to simply set all the binary digits to zero, as this produces exactly the click that we are trying to avoid! Instead, the volume is faded out (淡出) and then back in again to conceal the error.

Error-Correcting Ability

CD players use parity and interleaving techniques to minimize the effects of an error on the disc. Theoretically, the combination of parity and interleaving in a CD player can detect and correct a burst error of up to 4000 bad bits, or a physical defect 2.47 mm long. Interpolation can conceal errors of up to 13,700 bits, or physical defects up to 8.5 mm long. (Burst-error-correcting codes)

EFM Modulation

EFM means Eight-to-Fourteen Modulation and is an incredibly clever way of reducing errors. The idea is to minimize the number of 0-to-1 and 1-to-0 transitions (臨時轉調), thus avoiding small pits. In EFM only those 14-bit combinations are used in which at least two and at most ten zeros appear between successive ones.

E.g. 0000 1010 → (EFM) → 10010001000000.
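The run-length rule can be tested on the example word. This sketch assumes the usual EFM constraint of at least two and at most ten 0's between successive 1's, which the example word satisfies. Runs at the very start or end of a word are not checked here, since they depend on the neighbouring words.

```python
def zero_runs_between_ones(word: str):
    """Lengths of the runs of 0's lying between two successive 1's."""
    ones = [i for i, bit in enumerate(word) if bit == "1"]
    return [b - a - 1 for a, b in zip(ones, ones[1:])]

def satisfies_run_length(word: str, lo: int = 2, hi: int = 10) -> bool:
    """True if every inner run of 0's has length between lo and hi (assumed bounds)."""
    return all(lo <= run <= hi for run in zero_runs_between_ones(word))

print(zero_runs_between_ones("10010001000000"))  # [2, 3]
print(satisfies_run_length("10010001000000"))    # True
print(satisfies_run_length("10100000000000"))    # False: only one 0 between the 1's
```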

Figure 2

Figure 4

Encoding

The original musical signal is a waveform in time. A sample of this waveform is taken and "digitized" into two 16-bit words, one for the left channel and one for the right channel.

For example, a single sample of the musical signal might look like:

L1 = 0111 0000 1010 1000

R1 = 1100 0111 1010 1000

Six samples (six of the left and six of the right, for a total of twelve) are taken to form a frame such as L1 R1 L2 R2 L3 R3 L4 R4 L5 R5 L6 R6.

Sound Has $2^{16}$ Levels

The frame is then encoded in the form of 8-bit words. Each 16-bit audio signal turns into two 8-bit words (its left half and its right half), such as

L1-left L1-right R1-left R1-right L2-left L2-right R2-left R2-right L3-left L3-right R3-left R3-right

L4-left L4-right R4-left R4-right L5-left L5-right R5-left R5-right L6-left L6-right R6-left R6-right

This gives a grand total of 24 8-bit words. ((L,R) produces the stereo effect, and 44,100 samples are taken per second.)
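Splitting the 16-bit words into 8-bit words can be sketched as follows. The frame built here repeats the single example sample six times, which is a hypothetical input used only to show the layout.

```python
def split_sample(word16: int):
    """Split a 16-bit word into its left (high) and right (low) 8-bit halves."""
    return (word16 >> 8) & 0xFF, word16 & 0xFF

def frame_bytes(samples):
    """samples: six (left, right) pairs of 16-bit values -> list of 24 bytes."""
    out = []
    for left, right in samples:
        out.extend(split_sample(left))
        out.extend(split_sample(right))
    return out

L1 = 0b0111_0000_1010_1000
R1 = 0b1100_0111_1010_1000
frame = frame_bytes([(L1, R1)] * 6)   # hypothetical frame: the same sample six times
print(len(frame), [format(b, "08b") for b in frame[:4]])
# 24 ['01110000', '10101000', '11000111', '10101000']
```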

The even words are then delayed by two blocks and the resulting "word" scrambled.

This delay and scramble is the first part of the interleaving process.

RS Codes Show Up!

Encoded by C(227): (28,24,5)-RS. The resulting 24-byte word (remember, it has an included two-block delay, so some symbols in this word are from blocks two blocks behind) has 4 bytes of parity added. This particular parity is called "Q" parity. Parity errors found in this part of the algorithm are called C1 errors. More on the Q parity later.

4-frame delay interleaving: Now the resulting 24 + 4Q = 28-byte word is interleaved. Each of the 28 bytes is delayed by a different period. Each period is an integral multiple of 4 blocks. So the first byte might be delayed by 4 blocks, the second by 8 blocks, the third by 12 blocks, and so on. The interleaving spreads the word over a total of 28 × 4 = 112 blocks.
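The effect of the delay interleaving can be illustrated with a toy model. It assumes, as in the "might be" example above, that byte i of every block is delayed by 4(i+1) blocks; the delays used in a real CD player may differ, and the function name and test data are hypothetical.

```python
def interleave(blocks, step=4, fill=None):
    """Delay byte position i of every block by step*(i+1) blocks (assumed delays)."""
    width = len(blocks[0])
    delays = [step * (i + 1) for i in range(width)]
    out = []
    for t in range(len(blocks) + max(delays)):
        row = []
        for i, delay in enumerate(delays):
            src = t - delay
            row.append(blocks[src][i] if 0 <= src < len(blocks) else fill)
        out.append(row)
    return out

# Hypothetical data: byte i of input block b is labelled (b, i).
blocks = [[(b, i) for i in range(28)] for b in range(200)]
out = interleave(blocks)

# One damaged output block touches 28 different input blocks, one byte each.
damaged = out[120]
sources = sorted(b for b, _ in damaged)
print(len(set(sources)), sources[0], sources[-1])  # 28 8 116
```

A burst that destroys one block on the disc therefore costs each affected codeword only a single byte, which the RS code can correct.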

Another RS Code

Encoded by C(223): (32,28,5)-RS. The resulting 28-byte words are again subjected to a parity operation. This generates four more parity bytes, called P bytes, which are placed at the end of the 28-byte data word. The word is now a total of 28 + 4 = 32 bytes long. Parity errors found in this part of the algorithm are called C2 errors.

Finally, another odd-even delay is performed, but this time the delay is just a single block. Both the P and Q parity bits are inverted (turning the "1’s" into "0’s" and vice versa) to assist data readout during muting.

EFM

A subcode of length 8 is then added to the front end of the word. The subcode specifies the total number of selections on the disc, their length, and so on.

Next, the data words are converted to EFM format: each 8-bit word is replaced by a 14-bit word chosen to avoid small pits, as described under EFM Modulation above.

Encode the Sound

Each frame finally has a 24-bit synchronization word attached to the very front end (just for completeness, the word is 100000000001000000000010), and each group of 14 symbols is then coupled by three merge bits.

SO! The final frame (which started at 6*16*2 = 192 data bits) now contains:

1 sync word: 24 bits

1 subcode signal: 14 bits

6*2*2*14 data bits: 336 bits (each 8-bit word becomes 14 bits)

8*14 parity bits: 112 bits

34*3 merge bits: 102 bits

GRAND TOTAL: 588 bits.
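The bit count of the final frame is simple arithmetic and can be verified directly. The variable names are illustrative.

```python
EFM_BITS = 14                       # each 8-bit word becomes 14 bits

sync = 24                           # synchronization word
subcode = 1 * EFM_BITS              # one subcode word
data = 6 * 2 * 2 * EFM_BITS         # 6 samples x 2 channels x 2 bytes
parity = 8 * EFM_BITS               # 4 Q bytes + 4 P bytes
merge = 34 * 3                      # 3 merge bits after the sync word and each of the 33 words

total = sync + subcode + data + parity + merge
print(data, parity, merge, total)   # 336 112 102 588
```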

Music:

Encouragement from Tamkang (來自淡江的鼓勵)

Final Words

Exercise more, and your body will be healthy! (多運動,身體好!)

Study more mathematics, and your mind will be sharp! (多唸數學,頭腦好!)

You are lucky!
