Lesson 10: Huffman (data compression)

Upload: so-sad

Post on 19-Jul-2015


Data Structure & Algorithm - Nguyen Tri Tuan - Faculty of Information Technology, University of Science, Ho Chi Minh City (Khoa CNTT, ĐH KHTN Tp.HCM)

DATA COMPRESSION: HUFFMAN COMPRESSION
Lecture: Data Structures & Algorithms

Outline
- Introduction to data compression
- The RLE compression algorithm
- Static Huffman
- Adaptive Huffman (dynamic Huffman)

Introduction

Common terms:
- Data Compression
- Lossless Compression
- Lossy Compression
- Encoding
- Decoding
- Run / Run Length
- RLE, Arithmetic, Huffman, LZ77, LZ78

Introduction (cont.)

Purposes of data compression:
- Reduce the size of the data
  - when storing it
  - when transmitting it
- Increase security

Introduction (cont.)

There are two kinds of compression:
- Lossless compression:
  - No loss of the original information
  - Modest compression efficiency: 10% to 60%
  - Typical algorithms: RLE, Arithmetic, Huffman, LZ77, LZ78, ...
- Lossy compression:
  - Some of the original information is lost
  - High compression efficiency: 40% to 90%
  - Typical algorithms: JPEG, MP3, MP4, ...

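Definition

Compression efficiency (%): the percentage by which the data size is reduced after the compression algorithm is applied.

    D (%) = (N - M) / N * 100

- D: compression efficiency
- N: size of the data before compression
- M: size of the data after compression

Compression efficiency depends on:
- the compression method
- the characteristics of the data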
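The formula for D can be checked with a few lines of C. This is only a sketch: the function name `compression_efficiency` is made up for illustration, and returning 0 for an empty input is an assumption the slides do not address.

```c
/* Compression efficiency D (%) = (N - M) / N * 100.
   n_before: size before compression (N), m_after: size after compression (M).
   Returns 0.0 when n_before is not positive (an assumption, to avoid
   dividing by zero). */
double compression_efficiency(long n_before, long m_after)
{
    if (n_before <= 0)
        return 0.0;
    return 100.0 * (double)(n_before - m_after) / (double)n_before;
}
```

For example, compressing 25 bytes down to 10 bytes gives `compression_efficiency(25, 10)`, which is 60%.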

Applications

File compression:
- Used for backing up and restoring data
- Uses lossless compression algorithms
- Does not depend on the format of the file
- Software: PKzip, WinZip, WinRar, ...

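The RLE Compression Algorithm

RLE = Run Length Encoding: encoding based on the length of repeated data.

Idea:
- A simple way of representing redundant information: a "run" is a sequence of consecutive repeated characters.
- A run is represented compactly as a (run length, character) pair.
- When runs are long, the savings are significant.

Example:
    Data            = AAAABBBBBBBBCCCCCCCCCCDEE   (25 bytes)
    Compressed data = 4A8B10C1D2E                 (10 bytes, one byte per count and one per character)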

The RLE Compression Algorithm (cont.)

Idea: in practice, a safeguard is needed against the counterproductive case of single-character runs, where the output is larger than the input:
    X (1 byte) → 1X (2 bytes)
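The RLE idea above can be sketched in C as follows. The function name `rle_encode` and its interface are illustrative assumptions, not part of the lecture. The sketch writes each count as decimal text, so it reproduces the string 4A8B10C1D2E; the 10-byte figure on the slide corresponds to storing each count in a single byte instead. It also assumes the input contains no digit characters and makes no attempt to avoid the single-character expansion described above.

```c
#include <stdio.h>
#include <string.h>

/* Encodes src as <count><char> pairs in decimal text, e.g. "AAAAB" -> "4A1B".
   Returns the number of runs found, or -1 if dst is too small. */
int rle_encode(const char *src, char *dst, size_t dst_size)
{
    size_t n = strlen(src);
    size_t i = 0, out = 0;
    int runs = 0;

    if (dst_size == 0)
        return -1;
    dst[0] = '\0';
    while (i < n) {
        size_t j = i;
        while (j < n && src[j] == src[i])
            j++;                          /* j ends one past the current run */
        int written = snprintf(dst + out, dst_size - out, "%zu%c", j - i, src[i]);
        if (written < 0 || (size_t)written >= dst_size - out)
            return -1;                    /* output would not fit */
        out += (size_t)written;
        runs++;
        i = j;
    }
    return runs;
}
```

Calling it on the slide's data yields 5 runs, and a lone "X" becomes "1X", which is the expansion the slide warns about.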

The Huffman Compression Algorithm
- Introduction
- Static Huffman
- Adaptive Huffman (dynamic Huffman)

Huffman Algorithm - Introduction

Origin. The problem:
- A lossless compression algorithm;
- That does not depend on the nature of the data;
- That applies broadly to any kind of data, with good efficiency.

Huffman Algorithm - Introduction

Main idea:
- Old method: use a fixed-length sequence (8 bits) to represent each character.
- Huffman:
  - Use only a few bits to represent a character (called its "bit code").
  - The length of the bit code differs from character to character:
    - characters that occur often are given short codes;
    - characters that occur rarely are given long codes.
  - This is variable-length encoding.
- David Huffman, 1952: found a method for determining the optimal code on static data.

Huffman Algorithm - Introduction

Suppose we have the following data:
    f = ADDAABBCCBAAABBCCCBBBCDAADDEEAA

    Character            A   B   C   D   E
    Occurrences in f     10  8   6   5   2

Ordinary representation (8 bits per character):
    Sizeof(f) = 10*8 + 8*8 + 6*8 + 5*8 + 2*8 = 248 bits

Huffman Algorithm - Introduction

Representation with variable-length codes (according to the table):

    Character   A   B   C   D    E
    Code        11  10  00  011  010

    Sizeof(f) = 10*2 + 8*2 + 6*2 + 5*3 + 2*3 = 69 bits
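The two size computations above follow the same pattern: sum, over all characters, the number of occurrences times the code length. A minimal C sketch (the name `encoded_bits` is hypothetical):

```c
#include <stddef.h>

/* Total size in bits when symbol i occurs freq[i] times and is encoded
   with code_len[i] bits. */
long encoded_bits(const long *freq, const int *code_len, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += freq[i] * code_len[i];
    return total;
}
```

With the frequencies 10, 8, 6, 5, 2, code lengths of 8 bits each give 248 bits, and the lengths 2, 2, 2, 3, 3 from the table give 69 bits.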

Static Huffman
- Compression algorithm
- Building the Huffman tree
- Generating the bit-code table
- Storing the information needed for decompression
- Decompression algorithm

Static Huffman - Algorithm

Compression algorithm:
[b1] Scan the file → build a table counting the occurrences of each character
[b2] Generate the Huffman tree from the frequency table
[b3] From the Huffman tree → generate the bit-code table for the characters
[b4] Scan the file → replace each character with its bit code
[b5] Save the Huffman tree information for use in decompression
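Step [b1] is a single pass over the data. A minimal sketch in C, assuming the whole input is already in memory (the name `count_frequencies` is hypothetical):

```c
#include <stddef.h>

/* [b1] Counts how many times each byte value occurs in data[0..n-1].
   freq must have room for 256 counters; it is cleared first. */
void count_frequencies(const unsigned char *data, size_t n, long freq[256])
{
    for (int i = 0; i < 256; i++)
        freq[i] = 0;
    for (size_t k = 0; k < n; k++)
        freq[data[k]]++;
}
```

Applied to f = ADDAABBCCBAAABBCCCBBBCDAADDEEAA, this gives the counts 10, 8, 6, 5, 2 for A, B, C, D, E.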

Static Huffman - Algorithm (example)

    f = ADDAABBCCBAAABBCCCBBBCDAADDEEAA

[b1] Frequency table:

    Character    A   B   C   D   E
    Occurrences  10  8   6   5   2

[b2] Huffman tree (node number, characters, weight, bit on the edge from the parent):

    1  CEDBA  31          (root node, level 1)
      2  CED  13  0       (level 2)
        4  C   6  0       (level 3)
        5  ED  7  1       (level 3)
          8  E  2  0      (level 4)
          9  D  5  1      (level 4)
      3  BA   18  1       (level 2)
        6  B   8  0       (level 3)
        7  A  10  1       (level 3)

[b3] Bit-code table:

    Character  A   B   C   D    E
    Bit code   11  10  00  011  010

[b4] Encoded data:

    f = 11011011111110100000101111111010000000 1010100001111110110110100101111

Static Huffman - The Huffman Tree

Building the Huffman tree

Description: a Huffman code is represented by a binary tree.
- Each leaf holds one character.
- A parent node holds the characters of its child nodes.
- Every node is assigned a weight:
  - a leaf's weight is the number of times its character occurs in the file;
  - a parent's weight is the sum of its children's weights.

[Figure: the example tree for f, from the root CEDBA (31) down to the leaves E (2) and D (5).]

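Static Huffman - Algorithm

Building the Huffman tree. Properties of the Huffman tree:
- The left branch corresponds to bit 0; the right branch corresponds to bit 1.
- Nodes with low frequency lie far from the root → long bit codes.
- Nodes with high frequency lie near the root → short bit codes.
- Number of nodes in the tree: 2n - 1, where n is the number of leaves (distinct characters).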

Static Huffman - Data

    // Data structure for storing the Huffman tree
    #define MAX_NODES 511   // 2*256 - 1
    typedef struct {
        char c;       // character
        long nFreq;   // weight
        int nLeft;    // left subtree
        int nRight;   // right subtree
    } HUFFNode;
    HUFFNode HuffTree[MAX_NODES];

Static Huffman - Generating the Tree

Building the Huffman tree. Tree generation algorithm:
[b1] From the frequency table, pick the two entries x, y with the lowest weights and combine them under a new parent node z:
    z.c = x.c + y.c;
    z.nFreq = x.nFreq + y.nFreq;
    z.nLeft = x (*)
    z.nRight = y (*)
[b2] Remove nodes x and y from the table;
[b3] Add node z to the table;
[b4] Repeat steps [b1] to [b3] until only one node remains in the table

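Static Huffman - Generating the Tree

(*) Convention:
- the node with the smaller weight goes on the left branch; the node with the larger weight goes on the right branch;
- if the weights are equal, the node with the smaller character goes on the left branch; the node with the larger character goes on the right branch.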
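Steps [b1] to [b4] and the left/right convention can be sketched in C on top of an array of nodes like the lecture's `HUFFNode`. Several details here are assumptions made for the sketch: `c` is widened to `int`, `nLeft`/`nRight` are array indices with -1 meaning "no child", an internal node stores only the first character of its concatenated label in `c` (used for tie-breaking), and the two minima are found by a plain linear scan rather than a priority queue. The names `build_huffman` and `comes_first` are hypothetical.

```c
#define MAX_NODES 511            /* 2*256 - 1, as in the lecture */

typedef struct {
    int  c;       /* leaf: the character; internal: first character of its label */
    long nFreq;   /* weight */
    int  nLeft;   /* index of the left child, -1 for a leaf */
    int  nRight;  /* index of the right child, -1 for a leaf */
} HUFFNode;

/* Convention (*): smaller weight first; on equal weights, smaller character first. */
static int comes_first(const HUFFNode *t, int a, int b)
{
    if (t[a].nFreq != t[b].nFreq)
        return t[a].nFreq < t[b].nFreq;
    return t[a].c < t[b].c;
}

/* Builds the tree in t[] from freq[256]. Returns the index of the root,
   or -1 if no character occurs. The tree then occupies t[0..root]. */
int build_huffman(const long freq[256], HUFFNode t[MAX_NODES])
{
    int active[256];             /* indices of the nodes still in the table */
    int n_active = 0, n_nodes = 0;

    for (int ch = 0; ch < 256; ch++) {
        if (freq[ch] > 0) {
            t[n_nodes] = (HUFFNode){ ch, freq[ch], -1, -1 };
            active[n_active++] = n_nodes++;
        }
    }
    if (n_active == 0)
        return -1;

    while (n_active > 1) {
        /* [b1] positions of the lowest (px) and second-lowest (py) entries */
        int px = 0, py = 1;
        if (comes_first(t, active[py], active[px])) { px = 1; py = 0; }
        for (int k = 2; k < n_active; k++) {
            if (comes_first(t, active[k], active[px])) { py = px; px = k; }
            else if (comes_first(t, active[k], active[py])) { py = k; }
        }
        int x = active[px], y = active[py];
        t[n_nodes] = (HUFFNode){ t[x].c, t[x].nFreq + t[y].nFreq, x, y };

        /* [b2] remove x and y from the table, [b3] add z in their place */
        int hi = px > py ? px : py;
        int lo = px < py ? px : py;
        active[hi] = active[--n_active];
        active[lo] = n_nodes++;
    }                            /* [b4] repeat until one node remains */
    return active[0];
}
```

For the example frequencies (A 10, B 8, C 6, D 5, E 2) the sketch produces 9 nodes (2n - 1 with n = 5), a root of weight 31, and the same left/right arrangement as the lecture's tree.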

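Static Huffman - Example

Illustration of the tree-building process (SLXH = number of occurrences):

Step 1. Table: A 10, B 8, C 6, D 5, E 2
    Combine E (2, bit 0) and D (5, bit 1) → ED (7)

Step 2. Table: A 10, B 8, ED 7, C 6
    Combine C (6, bit 0) and ED (7, bit 1) → CED (13)

Step 3. Table: CED 13, A 10, B 8
    Combine B (8, bit 0) and A (10, bit 1) → BA (18)

Step 4. Table: BA 18, CED 13
    Combine CED (13, bit 0) and BA (18, bit 1) → CEDBA (31)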

Static Huffman - Example

[Figure: the Huffman tree after construction. Root CEDBA (31) has children CED (13) and BA (18); CED has children C (6) and ED (7); BA has children B (8) and A (10); ED has children E (2) and D (5).]

Static Huffman - Generating the Codes

Generating the bit codes for the characters:
- The code of each character is produced by walking from the root to the leaf that holds the character;
- Moving to the left produces a 0 bit;
- Moving to the right produces a 1 bit.
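The traversal rule above can be sketched as a short recursive function in C. The `Node` type, the array `example_tree` (the lecture's example tree written out by hand, with indices chosen for this sketch) and the function name `gen_codes` are all assumptions for illustration; codes are stored as '0'/'1' text, and the tree is assumed to be less than 32 levels deep.

```c
#include <string.h>

typedef struct {
    int c;        /* character for a leaf, 0 for an internal node */
    int nLeft;    /* index of the left child, -1 for a leaf */
    int nRight;   /* index of the right child, -1 for a leaf */
} Node;

/* The example tree: index 0 is the root CEDBA. */
static const Node example_tree[9] = {
    {0, 1, 2},        /* 0: CEDBA */
    {0, 3, 4},        /* 1: CED   */
    {0, 5, 6},        /* 2: BA    */
    {'C', -1, -1},    /* 3 */
    {0, 7, 8},        /* 4: ED    */
    {'B', -1, -1},    /* 5 */
    {'A', -1, -1},    /* 6 */
    {'E', -1, -1},    /* 7 */
    {'D', -1, -1}     /* 8 */
};

/* Walks from node i down to every leaf below it. prefix holds the bits of the
   path so far (depth characters). Each leaf's code is copied to codes[character]. */
void gen_codes(const Node *t, int i, char *prefix, int depth, char codes[256][32])
{
    if (t[i].nLeft < 0 && t[i].nRight < 0) {       /* leaf reached */
        prefix[depth] = '\0';
        strcpy(codes[(unsigned char)t[i].c], prefix);
        return;
    }
    prefix[depth] = '0';                            /* going left produces 0 */
    gen_codes(t, t[i].nLeft, prefix, depth + 1, codes);
    prefix[depth] = '1';                            /* going right produces 1 */
    gen_codes(t, t[i].nRight, prefix, depth + 1, codes);
}
```

Called as `gen_codes(example_tree, 0, prefix, 0, codes)` with a 32-byte `prefix` buffer, it fills in the codes 11, 10, 00, 011, 010 for A, B, C, D, E.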

Static Huffman - Generating the Codes

Generating the bit codes for the characters:

    Character  A   B   C   D    E
    Bit code   11  10  00  011  010

[Figure: the example tree with node numbers 1 to 9, where every left edge is labeled 0 and every right edge is labeled 1. For instance, D is reached from the root by left (0), right (1), right (1), giving the code 011.]

Static Huffman - Storage

Storing the information needed for decompression:

Method 1: store the bit-code table

    Character  A   B   C   D    E
    Bit code   11  10  00  011  010

Method 2: store the number of occurrences

    Character    A   B   C   D   E
    Occurrences  10  8   6   5   2

Static Huffman - Decompression

Decompression algorithm:
[b1] Rebuild the Huffman tree (from the stored information)
[b2] Set the current node: pCurr = pRoot
[b3] Read one bit b from the compressed file fn
[b4] If (b == 0) then pCurr = pCurr.nLeft, otherwise pCurr = pCurr.nRight
[b5] If pCurr is a leaf:
     - Output the character at pCurr to the file
     - Go back to step [b2]
     otherwise:
     - Go back to step [b3]
[b6] The algorithm stops when the end of file fn is reached
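Steps [b2] to [b6] can be sketched in C as a loop over the bits. To keep the sketch short, the compressed input is assumed to be a text string of '0' and '1' characters rather than packed bits, step [b1] is replaced by a hand-written copy of the example tree, and a tree consisting of a single leaf is not handled. `Node`, `example_tree` and `huff_decode` are hypothetical names.

```c
#include <stddef.h>

typedef struct {
    int c;        /* character for a leaf, 0 for an internal node */
    int nLeft;    /* index of the left child, -1 for a leaf */
    int nRight;   /* index of the right child, -1 for a leaf */
} Node;

/* The lecture's example tree written out by hand: index 0 is the root. */
static const Node example_tree[9] = {
    {0, 1, 2}, {0, 3, 4}, {0, 5, 6}, {'C', -1, -1}, {0, 7, 8},
    {'B', -1, -1}, {'A', -1, -1}, {'E', -1, -1}, {'D', -1, -1}
};

/* Decodes bits (a string of '0'/'1') into out. Returns the number of
   characters decoded, or -1 on invalid input or if out is too small. */
int huff_decode(const Node *t, int root, const char *bits, char *out, size_t out_size)
{
    size_t n = 0;
    int cur = root;                                     /* [b2] */

    if (out_size == 0)
        return -1;
    for (const char *p = bits; *p != '\0'; p++) {       /* [b3], [b6] */
        if (*p != '0' && *p != '1')
            return -1;
        cur = (*p == '0') ? t[cur].nLeft : t[cur].nRight;   /* [b4] */
        if (cur < 0)
            return -1;
        if (t[cur].nLeft < 0 && t[cur].nRight < 0) {    /* [b5] leaf */
            if (n + 1 >= out_size)
                return -1;
            out[n++] = (char)t[cur].c;
            cur = root;                                 /* back to [b2] */
        }
    }
    out[n] = '\0';
    return (int)n;
}
```

Decoding the 69-bit string from the compression example with this tree returns the original 31 characters of f.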

Static Huffman - Decompression

[Figure: an example decoding tree with the leaves E, A, T, S, N, O; left edges are labeled 0 and right edges are labeled 1.]

Adaptive Huffman
- Introduction
- Idea
- The dynamic Huffman tree
- Compression algorithm (Encoding)
- Decompression algorithm (Decoding)

Adaptive Huffman - Introduction

Limitations of static Huffman:
- The file must be scanned twice during compression → high cost
- The information needed for decompression must be stored → larger compressed data
- The data to compress must be available in advance → data generated in real time (online) cannot be compressed

Adaptive Huffman - Advantages

- No need to compute the character frequencies in advance
- Compression needs only a single pass over the file
- No need to store extra information for decompression
- Online compression: works on data generated in real time

Adaptive Huffman - Idea

- Static Huffman: the Huffman tree is built from the table of character frequencies.
- Adaptive Huffman:
  - Online compression → no frequency table is available in advance.
  - How is the tree built?
  - Method: initialize a minimal tree and update it gradually (that is, adapt it) based on the data that appears during compression/decompression.

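Adaptive Huffman - Idea

In Adaptive Huffman, compression/decompression is carried out at the same time as the tree is updated.

Adaptive Huffman compression:
- Initialize the tree
- Read an input character
- Encode the character and update the tree

Adaptive Huffman decompression:
- Initialize the tree
- Read a character from the compressed data
- Decode it and update the tree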

Adaptive Huffman - The Dynamic Huffman Tree

A binary tree with n leaves is called a Huffman tree if it satisfies:
- The leaves have weights Wi >= 0, for i in [1..n]
- Each internal node has a weight equal to the sum of the weights of its children
- Sibling property:
  - Every node except the root has a sibling (a node with the same parent)
  - When the nodes of the tree are sorted in increasing order of weight, every node is adjacent to its sibling

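Adaptive Huffman - The Dynamic Huffman Tree (cont.)

To make the tree easier to build, we adopt these conventions:
- Nodes are assigned order numbers in decreasing order.
- Newly added nodes always get smaller order numbers than the nodes already in the tree.
- While the tree is being built, the sibling property must be preserved: nodes with larger order numbers must have weights >= those of nodes with smaller order numbers.
- Otherwise → violation → the tree must be adjusted.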
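Under the numbering convention above, checking the tree reduces to checking that weights never increase as order numbers decrease. A minimal C sketch, assuming the caller lists the node weights from the highest order number (the root) down to the lowest; the name `first_sibling_violation` is hypothetical:

```c
/* w[0] is the weight of the node with the highest order number (the root),
   w[n-1] the weight of the node with the lowest order number.
   Returns -1 if the weights never increase along the list; otherwise the
   index of the first node that is heavier than the node listed just before it. */
int first_sibling_violation(const long *w, int n)
{
    for (int i = 1; i < n; i++)
        if (w[i] > w[i - 1])
            return i;
    return -1;
}
```

For a tree with nodes #50 to #46 and weights 3, 1, 2, 2, 0 the check reports a violation at index 2 (node #48), while 3, 2, 1, 1, 0 passes. The check only detects a violation; deciding which two nodes to exchange is a separate step.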

Adaptive Huffman - Building the Tree

How the tree is built:
- Initialize a "minimal" tree containing only the Escape node (0-node), also called the NYT (Not Yet Transmitted) node: a node with no children and no character.
- To update the tree with a character c:
  - If c is not yet in the tree → add a new leaf
  - If c is already in the tree → increase the weight of node c by 1
  - Update the weights of the related nodes in the tree

Adaptive Huffman - Adding a Leaf

When adding a leaf for character c:
- Create two nodes: one for character c (W = 1) and a new Escape node (W = 0)
- The two new nodes become the children of the current Escape node
- Check for violations of the sibling property and update the weights of the related nodes

Adaptive Huffman - Example

Initialize the tree (compressed data: empty):

    #50  ESC  W = 0

Add A: encode A, then update the tree (compressed data: A):

    #50  W = 1
      #48  ESC  W = 0
      #49  A    W = 1

Adaptive Huffman - Updating the Weights

- Start from the node that was just added or updated → walk back up to the root, increasing the weight of each node by 1
- Check the nodes for violations of the sibling property and adjust
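The upward walk can be sketched in C with a parent-index array. The array layout and the name `increment_to_root` are assumptions made for this sketch; it only performs the weight increments, and the sibling-property check would run afterwards. For a freshly added leaf that was created with W = 1, the walk would start at its parent instead.

```c
/* weight[i] is the weight of node i; parent[i] is the index of its parent,
   or -1 for the root. Adds 1 to `node` and to every ancestor up to the root. */
void increment_to_root(long *weight, const int *parent, int node)
{
    while (node >= 0) {
        weight[node]++;
        node = parent[node];
    }
}
```

As an example, take a five-node tree stored as root, A, an internal node, B, ESC, with weights 2, 1, 1, 1, 0, where B and ESC are children of the internal node. Incrementing from B gives 3, 1, 2, 2, 0.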

Adaptive Huffman - Example (cont.)

Before (compressed data: A):

    #50  W = 1
      #48  ESC  W = 0
      #49  A    W = 1

Add B: encode B, then update the tree (compressed data: A0B):

    #50  W = 2
      #48  W = 1
        #46  ESC  W = 0
        #47  B    W = 1
      #49  A    W = 1

Adaptive Huffman - Checking the Sibling Property

The sibling property is violated when:
- There is a node X with weight W + 1 whose order number is smaller than that of a node Y with weight W

Adjustment:
- Swap (the data of) the current node X with the node that has the largest order number among the nodes of weight W.
- Update the weights of the related nodes, and propagate the check to the other nodes.

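Adaptive Huffman - Example (cont.)

Add B (compressed data: A0B01)

After the weights are increased, B (#47, W = 2) has a smaller order number than A (#49, W = 1), which violates the sibling property:

    #50  W = 3
      #48  W = 2
        #46  ESC  W = 0
        #47  B    W = 2
      #49  A    W = 1

After swapping B and A:

    #50  W = 3
      #48  W = 1
        #46  ESC  W = 0
        #47  A    W = 1
      #49  B    W = 2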

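Adaptive Huffman - Example (cont.)

Add CC (compressed data: A0B0100C001)

After the weights are increased, C (#45, W = 2) has a smaller order number than A (#47, W = 1):

    #50  W = 5
      #48  W = 3
        #46  W = 2
          #44  ESC  W = 0
          #45  C    W = 2
        #47  A    W = 1
      #49  B    W = 2

After swapping C and A:

    #50  W = 5
      #48  W = 3
        #46  W = 1
          #44  ESC  W = 0
          #45  A    W = 1
        #47  C    W = 2
      #49  B    W = 2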

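Adaptive Huffman - Example (cont.)

Adjusting the tree (the internal node of weight 3 is swapped with B, so the heavier node gets the larger order number):

    #50  W = 5
      #48  B    W = 2
      #49  W = 3
        #46  W = 1
          #44  ESC  W = 0
          #45  A    W = 1
        #47  C    W = 2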

Adaptive Huffman - Example (cont.)

Add AA:
- What is the compressed data?
- What does the Huffman tree look like?

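Adaptive Huffman - Decompression

- Initialize the tree.
- Read the bits of the compressed data one at a time:
  - Traverse the Huffman tree to decode a character:
    - If it is the Escape symbol, output the next uncompressed character (8 bits)
    - If it is an ordinary character, output that character
  - Update the Huffman tree with the character just read (in the same way as during compression).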