Lesson 10 - Huffman (data compression)
TRANSCRIPT
Data Structure & Algorithm - Nguyen Tri Tuan - Faculty of Information Technology, University of Science, Ho Chi Minh City (Khoa CNTT, ĐH KHTN Tp.HCM)
1. DATA COMPRESSION - HUFFMAN COMPRESSION
Lecture: Data Structures & Algorithms
2. Contents
- Introduction to data compression
- The RLE compression algorithm
- Static Huffman
- Adaptive Huffman
3. Introduction
Commonly used terms:
- Data Compression
- Lossless Compression
- Lossy Compression
- Encoding
- Decoding
- Run / Run Length
- RLE, Arithmetic, Huffman, LZ77, LZ78
4. Introduction (cont.)
Purposes of data compression:
- Reduce the size of the data
  - when storing it
  - when transmitting it
- Increase security
5. Introduction (cont.)
There are two forms of compression:
- Lossless compression:
  - None of the original information is lost
  - Compression efficiency is modest: 10% to 60%
  - Typical algorithms: RLE, Arithmetic, Huffman, LZ77, LZ78, ...
- Lossy compression:
  - Some of the original information is lost
  - Compression efficiency is high: 40% to 90%
  - Typical algorithms: JPEG, MP3, MP4, ...
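6. Definition
Compression efficiency (%): the percentage by which the data size is reduced after applying the compression algorithm.
D (%) = (N - M) / N * 100
- D: compression efficiency
- N: size of the data before compression
- M: size of the data after compression
Compression efficiency depends on:
- The compression method
- The characteristics of the data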
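As a quick check of the definition, the formula can be written as a one-line helper. This is only a sketch: the function name `compression_efficiency` and the choice to return 0 for empty input are assumptions made here, not part of the lecture.

```c
/* Compression efficiency D (%) = (N - M) / N * 100.
   n: size before compression, m: size after compression (same unit). */
double compression_efficiency(double n, double m) {
    if (n <= 0.0) return 0.0;   /* assumption: report 0% for empty input */
    return (n - m) / n * 100.0;
}
```

For example, shrinking 25 bytes to 10 bytes gives an efficiency of about 60%.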
7. Applications
File compression:
- Used for backing up and restoring data, ...
- Uses lossless compression algorithms
- Does not depend on the format of the file
- Software: PKzip, WinZip, WinRar, ...
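8. The RLE compression algorithm
RLE = Run Length Encoding: encode the data by the length of its repetitions.
Idea:
- A simple way to represent redundant information: a "run" is a sequence of consecutive repeated characters
- A run is written compactly as its repeat count followed by the character
- The longer the runs, the greater the saving
Example:
Data = AAAABBBBBBBBCCCCCCCCCCDEE (# 25 bytes)
Compressed data = 4A8B10C1D2E (# 10 bytes, one count byte plus one character byte per run)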
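The example above can be reproduced with a small encoder. This is a minimal sketch, assuming each run is stored as one count byte followed by the character byte (which is what makes the example 10 bytes long) and that runs longer than 255 are split; the name `rle_encode` is hypothetical.

```c
#include <stddef.h>

/* Encode src[0..n) as (count, byte) pairs and return the number of bytes
   written to dst. dst must have room for 2*n bytes (the worst case).
   Runs longer than 255 are split so the count fits in one byte. */
size_t rle_encode(const unsigned char *src, size_t n, unsigned char *dst) {
    size_t out = 0, i = 0;
    while (i < n) {
        unsigned char c = src[i];
        size_t run = 1;
        while (i + run < n && src[i + run] == c && run < 255) run++;
        dst[out++] = (unsigned char)run;  /* repeat count */
        dst[out++] = c;                   /* the repeated character */
        i += run;
    }
    return out;
}
```

On the 25-byte example this writes the pairs (4,A) (8,B) (10,C) (1,D) (2,E). On data with no repetition, such as ABCDE, the same encoder doubles the size, which is the counterproductive case that single-character runs cause.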
9. The RLE compression algorithm
Idea:
- In practice, a measure is needed to avoid the counterproductive case of special runs of just 1 character:
  X (# 1 byte) becomes 1X (# 2 bytes)
10. The Huffman compression algorithm
- Introduction
- Static Huffman
- Adaptive Huffman
11. Huffman algorithm - Introduction
Origins. The problem:
- A lossless compression algorithm;
- Independent of the nature of the data;
- Widely applicable to any kind of data, with good efficiency
12. Huffman algorithm - Introduction
Main idea:
- Old method: use a fixed-length sequence (8 bits) to represent 1 character
- Huffman:
  - Use a few bits to represent 1 character (called a "bit code")
  - The bit-code length differs between characters:
    - A character that occurs many times is represented by a short code;
    - A character that occurs rarely is represented by a long code
  - This is encoding with variable-length codes (Variable Length Encoding)
  - David Huffman, 1952: found a method for determining the optimal code on static data
13. Huffman algorithm - Introduction
Suppose we have the following data:
f = ADDAABBCCBAAABBCCCBBBCDAADDEEAA

  Character               A   B   C   D   E
  Occurrences in file f   10  8   6   5   2

Normal representation (8 bits/character):
Sizeof(f) = 10*8 + 8*8 + 6*8 + 5*8 + 2*8 = 248 bits
14. Huffman algorithm - Introduction
Representation with variable-length codes (according to the table):

  Character   A   B   C   D    E
  Code        11  10  00  011  010

Sizeof(f) = 10*2 + 8*2 + 6*2 + 5*3 + 2*3 = 69 bits
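Both size calculations follow the same rule: multiply each character's occurrence count by its code length and add the products. A minimal sketch (the function name `encoded_bits` is an assumption for illustration):

```c
/* Total encoded size in bits: the sum over all symbols of
   (occurrence count) * (code length in bits). */
long encoded_bits(const long count[], const int codeLen[], int nSymbols) {
    long bits = 0;
    for (int i = 0; i < nSymbols; i++)
        bits += count[i] * codeLen[i];
    return bits;
}
```

With counts 10, 8, 6, 5, 2 this gives 248 bits for code lengths 8, 8, 8, 8, 8 and 69 bits for code lengths 2, 2, 2, 3, 3, matching the two slides.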
15. Static Huffman
- Compression algorithm
- Building the Huffman tree
- Generating the bit-code table
- Storing the information used for decompression
- Decompression algorithm
16. Static Huffman - Algorithm
Compression algorithm:
[b1] Scan the file and build a table counting the occurrences of each character
[b2] Generate the Huffman tree from the count table
[b3] From the Huffman tree, generate the bit-code table for the characters
[b4] Scan the file again and replace each character with its bit code
[b5] Save the Huffman tree information for use in decompression
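Step [b1] is a single pass over the data. A minimal sketch, assuming the file has already been read into memory as bytes; the name `count_frequencies` is hypothetical.

```c
#include <stddef.h>

/* [b1] Count how many times each byte value occurs in data[0..n). */
void count_frequencies(const unsigned char *data, size_t n, long freq[256]) {
    for (int s = 0; s < 256; s++) freq[s] = 0;
    for (size_t i = 0; i < n; i++) freq[data[i]]++;
}
```

Run on f = ADDAABBCCBAAABBCCCBBBCDAADDEEAA, it yields A = 10, B = 8, C = 6, D = 5, E = 2, the count table used throughout the example.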
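17. Static Huffman - Algorithm (worked example)
f = ADDAABBCCBAAABBCCCBBBCDAADDEEAA

[b1] Count table:
  Character     A   B   C   D   E
  Occurrences   10  8   6   5   2

[b2] Huffman tree (each node: node number, characters, frequency, branch bit):
  Level 1:  1 CEDBA 31 (root node)
  Level 2:  2 CED 13 (0)   3 BA 18 (1)
  Level 3:  4 C 06 (0)     5 ED 07 (1)    6 B 08 (0)    7 A 10 (1)
  Level 4:  8 E 02 (0)     9 D 05 (1)
  Nodes 4 and 5 are children of node 2, nodes 6 and 7 are children of node 3, and nodes 8 and 9 are children of node 5.

[b3] Bit-code table:
  Character   A   B   C   D    E
  Bit code    11  10  00  011  010

[b4] Encoded data:
f = 110110111111101000001011111110100000001010100001111110110110100101111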
18. Static Huffman - The Huffman tree
Building the Huffman tree.
Description of a Huffman tree: the Huffman code is represented by a binary tree.
- Each leaf node holds 1 character
- A parent node holds the characters of its child nodes
- Each node is assigned a weight:
  - A leaf's weight is the number of occurrences of its character in the file
  - A parent's weight is the sum of the weights of its children
[Figure: the example tree with root CEDBA (31), internal nodes CED (13), BA (18) and ED (07), and leaves C (06), B (08), A (10), E (02), D (05); each node is labelled with node number, character, frequency and code bit.]
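19. Static Huffman - Algorithm
Building the Huffman tree. Properties of a Huffman tree:
- A left branch corresponds to encoding bit '0'; a right branch corresponds to encoding bit '1'
- Nodes with low frequency lie far from the root, so their bit codes are long
- Nodes with high frequency lie near the root, so their bit codes are short
- Number of nodes in the tree: (2n-1), where n is the number of leaves (distinct characters)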
20. Static Huffman - Data
// Data structure for storing the Huffman tree
#define MAX_NODES 511 // 2*256 - 1
typedef struct {
    char c;       // character
    long nFreq;   // weight
    int nLeft;    // left subtree
    int nRight;   // right subtree
} HUFFNode;
HUFFNode HuffTree[MAX_NODES];
21. Static Huffman - Generating the tree
Building the Huffman tree. Tree-generation algorithm:
[b1] Pick from the count table the 2 elements x, y with the lowest weights and form their parent node z:
  z.c = x.c + y.c;
  z.nFreq = x.nFreq + y.nFreq;
  z.nLeft = x (*)
  z.nRight = y (*)
[b2] Remove nodes x and y from the table;
[b3] Add node z to the table;
[b4] Repeat steps [b1] - [b3] until only 1 node remains in the table
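22. Static Huffman - Generating the tree
(*) Convention:
- The node with the smaller weight goes on the left branch; the node with the larger weight goes on the right branch;
- If the weights are equal, the node with the smaller character goes on the left branch; the node with the larger character goes on the right branch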
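Steps [b1] to [b4] and the convention (*) can be sketched in C as follows. This is an illustration under stated assumptions, not the lecture's own implementation: the struct mirrors HUFFNode but uses `unsigned char` and child indices with -1 for "no child"; because a single `char` cannot hold a concatenated label such as "ED", a parent stores the smallest character below it and uses that for tie-breaking; the helper names `lighter`, `pop_lightest` and `build_huffman_tree` are hypothetical.

```c
#define MAX_NODES 511          /* 2*256 - 1 */

typedef struct {
    unsigned char c;   /* leaf: the character; parent: smallest character below it */
    long nFreq;        /* weight */
    int nLeft;         /* index of the left child, -1 for a leaf */
    int nRight;        /* index of the right child, -1 for a leaf */
} HUFFNode;

/* Ordering from convention (*): lower weight first, then lower character. */
static int lighter(const HUFFNode *a, const HUFFNode *b) {
    if (a->nFreq != b->nFreq) return a->nFreq < b->nFreq;
    return a->c < b->c;
}

/* Remove the lightest node from the work table and return its index. */
static int pop_lightest(const HUFFNode tree[], int table[], int *nTable) {
    int best = 0;
    for (int i = 1; i < *nTable; i++)
        if (lighter(&tree[table[i]], &tree[table[best]])) best = i;
    int node = table[best];
    table[best] = table[--*nTable];   /* fill the hole with the last entry */
    return node;
}

/* Build the tree from freq[]; returns the root index, or -1 if freq is all zero. */
int build_huffman_tree(const long freq[256], HUFFNode tree[MAX_NODES]) {
    int table[256], nTable = 0, nNodes = 0;
    for (int s = 0; s < 256; s++) {
        if (freq[s] > 0) {
            tree[nNodes] = (HUFFNode){(unsigned char)s, freq[s], -1, -1};
            table[nTable++] = nNodes++;
        }
    }
    if (nTable == 0) return -1;
    while (nTable > 1) {                               /* [b4] */
        int x = pop_lightest(tree, table, &nTable);    /* [b1] lowest weight, goes left */
        int y = pop_lightest(tree, table, &nTable);    /* [b1] second lowest, goes right */
        unsigned char rep = tree[x].c < tree[y].c ? tree[x].c : tree[y].c;
        tree[nNodes] = (HUFFNode){rep, tree[x].nFreq + tree[y].nFreq, x, y};
        table[nTable++] = nNodes++;                    /* [b2] and [b3] */
    }
    return table[0];
}
```

For the counts A = 10, B = 8, C = 6, D = 5, E = 2 the merges happen in the order E+D, C+ED, B+A, CED+BA, giving a root of weight 31. The linear search for the two lightest nodes keeps the sketch short; a priority queue would be the usual choice for large alphabets.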
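23. Static Huffman - Example
Illustration of the tree-building process (Occ. = number of occurrences; the trailing digit on each child is its branch bit):

Step 1:
  Character   A   B   C   D   E
  Occ.        10  8   6   5   2
  Merge E (02, bit 0) and D (05, bit 1) into ED (07)

Step 2:
  Character   A   B   ED  C
  Occ.        10  8   7   6
  Merge C (06, bit 0) and ED (07, bit 1) into CED (13)

Step 3:
  Character   CED  A   B
  Occ.        13   10  8
  Merge B (08, bit 0) and A (10, bit 1) into BA (18)

Step 4:
  Character   BA  CED
  Occ.        18  13
  Merge CED (13, bit 0) and BA (18, bit 1) into CEDBA (31)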
24. Static Huffman - Example
The Huffman tree after construction (each node: node number, characters, frequency, branch bit):
  Level 1:  1 CEDBA 31 (root node)
  Level 2:  2 CED 13 (0)   3 BA 18 (1)
  Level 3:  4 C 06 (0)     5 ED 07 (1)    6 B 08 (0)    7 A 10 (1)
  Level 4:  8 E 02 (0)     9 D 05 (1)
25. Static Huffman - Generating the codes
Generating the bit codes for the characters:
- The code of each character is produced by walking from the root node to the leaf node that holds the character;
- Moving to the left produces a bit 0;
- Moving to the right produces a bit 1
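The root-to-leaf walk can be sketched as a short recursion. The assumptions here are illustrative only: the tree is an array of nodes with child indices (-1 marks a leaf), codes are kept as strings of '0' and '1' characters for readability, and the names `walk`, `generate_codes` and `MAX_CODE_LEN` are hypothetical.

```c
#include <string.h>

#define MAX_CODE_LEN 256       /* longest possible code (255 bits) plus '\0' */

typedef struct {
    unsigned char c;   /* character (meaningful for leaves) */
    long nFreq;        /* weight */
    int nLeft;         /* index of the left child, -1 for a leaf */
    int nRight;        /* index of the right child, -1 for a leaf */
} HUFFNode;

/* Depth-first walk: prefix[0..depth) holds the bits taken so far. */
static void walk(const HUFFNode tree[], int node, char *prefix, int depth,
                 char codes[][MAX_CODE_LEN]) {
    if (tree[node].nLeft < 0) {            /* leaf: the path so far is its code */
        prefix[depth] = '\0';
        strcpy(codes[tree[node].c], prefix);
        return;
    }
    prefix[depth] = '0';                   /* going left produces bit 0 */
    walk(tree, tree[node].nLeft, prefix, depth + 1, codes);
    prefix[depth] = '1';                   /* going right produces bit 1 */
    walk(tree, tree[node].nRight, prefix, depth + 1, codes);
}

/* Fill codes[ch] for every character ch that appears as a leaf. */
void generate_codes(const HUFFNode tree[], int root, char codes[][MAX_CODE_LEN]) {
    char prefix[MAX_CODE_LEN];
    walk(tree, root, prefix, 0, codes);
}
```

Applied to the example tree this produces A = 11, B = 10, C = 00, D = 011, E = 010. A tree with a single leaf yields an empty code, so a real compressor has to treat one-symbol files specially.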
26. Static Huffman - Generating the codes
Generating the bit codes for the characters:

  Character   A   B   C   D    E
  Bit code    11  10  00  011  010

Tree used (each node: node number, characters, frequency, branch bit):
  Level 1:  1 CEDBA 31 (root node)
  Level 2:  2 CED 13 (0)   3 BA 18 (1)
  Level 3:  4 C 06 (0)     5 ED 07 (1)    6 B 08 (0)    7 A 10 (1)
  Level 4:  8 E 02 (0)     9 D 05 (1)
27. Static Huffman - Storage
Storing the information used for decompression.

Method 1: store the bit-code table
  Character   A   B   C   D    E
  Bit code    11  10  00  011  010

Method 2: store the occurrence counts
  Character     A   B   C   D   E
  Occurrences   10  8   6   5   2
28. Static Huffman - Decompression
Decompression algorithm:
[b1] Rebuild the Huffman tree (from the stored information)
[b2] Initialise the current node: pCurr = pRoot
[b3] Read 1 bit b from the compressed file fn
[b4] If (b==0) then pCurr = pCurr.nLeft, otherwise pCurr = pCurr.nRight
[b5] If pCurr is a leaf node then:
  - Output the character at pCurr to the file
  - Go back to step [b2]
  otherwise:
  - Go back to step [b3]
[b6] The algorithm stops when the end of file fn is reached
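Steps [b2] to [b6] can be sketched as one loop over the bits. The assumptions are for illustration only: the tree has already been rebuilt ([b1]) as an array of nodes with child indices (-1 marks a leaf), and the compressed data is given as a string of '0' and '1' characters rather than packed bits; the name `huff_decode` is hypothetical.

```c
#include <stddef.h>

typedef struct {
    unsigned char c;   /* character (meaningful for leaves) */
    long nFreq;        /* weight */
    int nLeft;         /* index of the left child, -1 for a leaf */
    int nRight;        /* index of the right child, -1 for a leaf */
} HUFFNode;

/* Decode a '0'/'1' string into out and return the number of characters
   written. out must have room for one character per input bit. */
size_t huff_decode(const HUFFNode tree[], int root, const char *bits,
                   unsigned char *out) {
    size_t n = 0;
    int cur = root;                                    /* [b2] */
    if (tree[root].nLeft < 0) return 0;                /* one-leaf tree: no bits to follow */
    for (const char *p = bits; *p != '\0'; p++) {      /* [b3], stops at end of input [b6] */
        cur = (*p == '0') ? tree[cur].nLeft            /* [b4] 0 goes left */
                          : tree[cur].nRight;          /*      anything else goes right */
        if (tree[cur].nLeft < 0) {                     /* [b5] reached a leaf */
            out[n++] = tree[cur].c;
            cur = root;                                /* back to [b2] */
        }
    }
    return n;
}
```

Decoding the 69-bit string from the worked example with the example tree gives back the 31 characters of f. Because no code is a prefix of another, the decoder never needs a separator between codes.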
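29. Static Huffman - Decompression
[Figure: an example Huffman tree with leaves E, A, T, S, N, O; left branches are labelled 0 and right branches are labelled 1.]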
30. Adaptive Huffman
- Introduction
- Idea
- The adaptive Huffman tree
- Compression algorithm (Encoding)
- Decompression algorithm (Decoding)
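31. Adaptive Huffman - Introduction
Limitations of static Huffman:
- The file must be scanned twice during compression, which is costly
- The information for decompression must be stored, which increases the size of the compressed data
- The data to compress must be available in advance, so data generated in real time (online) cannot be compressed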
32. Adaptive Huffman - Advantages
- No need to compute the occurrence counts of the characters in advance
- Compression needs only 1 pass over the file
- No need to store information for decompression
- Online compression: works on data generated in real time
33. Adaptive Huffman - Idea
- Static Huffman: the Huffman tree is built from the table of occurrence counts of the characters
- Adaptive Huffman:
  - With online compression there is no count table in advance
  - So how is the tree built?
  - Method: initialise a minimal tree at the start and update it gradually (that is, adaptively) based on the data that appears during compression/decompression
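34. Adaptive Huffman - Idea
In Adaptive Huffman, compression/decompression is carried out at the same time as the tree is updated.
Adaptive Huffman compression:
- Initialise the tree
- Read a character from the input
- Compress the character and update the tree
Adaptive Huffman decompression:
- Initialise the tree
- Read a character from the compressed data
- Decompress it and update the tree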
35. Adaptive Huffman - The adaptive Huffman tree
A binary tree with n leaf nodes is called a Huffman tree if it satisfies:
- The leaf nodes have weights Wi >= 0, i in [1..n]
- Each branch node has a weight equal to the sum of the weights of its children
- The Sibling Property:
  - Every node except the root has 1 sibling (a node with the same parent)
  - When the nodes of the tree are sorted in increasing order of weight, every node is adjacent to its sibling
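36. Adaptive Huffman - The adaptive Huffman tree (cont.)
To make building the tree easier, we adopt these conventions:
- Nodes are assigned order numbers in decreasing sequence
- A newly added node always has a smaller order number than the nodes already in the tree
- While the tree is being built, the sibling property must be preserved: nodes with larger order numbers must have weights >= those of nodes with smaller order numbers
- If not, the property is violated and the tree must be adjusted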
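37. Adaptive Huffman - Building the tree
How the tree is built:
- Initialise a minimal tree containing only the Escape node (0-node), also called the NYT (Not Yet Transmitted) node: a node with no children and no character
- Update the tree with a character c:
  - If c is not yet in the tree, add a new leaf node
  - If c is already in the tree, increase the weight of c's node by 1 (+1)
  - Update the weights of the related nodes in the tree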
38. Adaptive Huffman - Adding a leaf node
When adding a leaf node for character c:
- Create 2 nodes: one node for character c (W = 1) and one new Escape node (W = 0)
- The two new nodes are added as children of the current Escape node
- Check for violations of the sibling property and update the weights of the related nodes
39. Adaptive Huffman - Example
Initialise the tree:
  Compressed data: (empty)
  Tree:
    ESC (W = 0; #50)

Add A (compress A, then update the tree):
  Compressed data: A
  Tree:
    (W = 1; #50)
      ESC (W = 0; #48)
      A (W = 1; #49)
40. Adaptive Huffman - Updating the weights
- Start from the node that was just added or updated, walk back up to the root, and increase the weight of each node on the way by 1
- Check the nodes for violations of the sibling property and adjust where needed
41. Adaptive Huffman - Example (cont.)
Before:
  Compressed data: A
  Tree:
    (W = 1; #50)
      ESC (W = 0; #48)
      A (W = 1; #49)

Add B (compress B, then update the tree):
  Compressed data: A0B
  Tree:
    (W = 2; #50)
      (W = 1; #48)
        ESC (W = 0; #46)
        B (W = 1; #47)
      A (W = 1; #49)
42. Adaptive Huffman - Checking the sibling property
The sibling property is violated when:
- There exists a node X with weight W + 1 whose order number is smaller than that of a node Y with weight W
Adjustment:
- Swap (the data of) the current node X with the node that has the largest order number among those with weight W
- Update the weights of the related nodes, and propagate the check to the other nodes
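The violation test can be sketched as a scan over the node weights listed from the largest order number down to the smallest. This only detects a violation; it does not perform the swap. The function name `find_sibling_violation` and the array layout are assumptions made for illustration.

```c
/* w[i] is the weight of the node with the i-th largest order number
   (w[0] belongs to the root). Returns the index of the first node that is
   heavier than the node just above it in order number, or -1 if the
   weights never increase as the order number decreases. */
int find_sibling_violation(const long w[], int n) {
    for (int i = 1; i < n; i++)
        if (w[i] > w[i - 1]) return i;
    return -1;
}
```

For nodes #50, #49, #48, #47, #46 with weights 3, 1, 2, 2, 0 the scan reports index 2: node #48 (weight 2) is heavier than node #49 (weight 1). With weights 3, 2, 1, 1, 0 it reports no violation.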
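43. Adaptive Huffman - Example (cont.)
Add B:
  Compressed data: A0B01

  Tree before adjustment:
    (W = 3; #50)
      (W = 2; #48)
        ESC (W = 0; #46)
        B (W = 2; #47)
      A (W = 1; #49)

  Tree after adjustment (B and A swapped):
    (W = 3; #50)
      (W = 1; #48)
        ESC (W = 0; #46)
        A (W = 1; #47)
      B (W = 2; #49)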
44. Adaptive Huffman - Example (cont.)
Add CC:
  Compressed data: A0B0100C001

  Tree before adjustment:
    (W = 5; #50)
      (W = 3; #48)
        (W = 2; #46)
          ESC (W = 0; #44)
          C (W = 2; #45)
        A (W = 1; #47)
      B (W = 2; #49)

  Tree after swapping C and A:
    (W = 5; #50)
      (W = 3; #48)
        (W = 1; #46)
          ESC (W = 0; #44)
          A (W = 1; #45)
        C (W = 2; #47)
      B (W = 2; #49)
45. Adaptive Huffman - Example (cont.)
Adjusting the tree:
    (W = 5; #50)
      B (W = 2; #48)
      (W = 3; #49)
        (W = 1; #46)
          ESC (W = 0; #44)
          A (W = 1; #45)
        C (W = 2; #47)
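The worked example can be replayed with a compact encoder. This is a sketch under several assumptions, not the lecture's implementation: nodes live in an array whose index is the order number (the root uses 512 here instead of #50); a swap exchanges the contents of two slots while the slots keep their numbers; the swap is done before the weight is incremented, which reaches the same final trees as the slides; and, as on the slides, the output is a display string that mixes raw characters with '0'/'1' bits (a real encoder would write 8 raw bits per new character). All names (`ATree`, `atree_encode`, ...) are hypothetical.

```c
#include <string.h>

#define NSYM 256
#define ROOT (2 * NSYM)          /* largest order number; slots 0..ROOT */

typedef struct {
    long weight;
    int parent, left, right;     /* slot indices, -1 if absent */
    int symbol;                  /* 0..255 for a leaf, -1 for internal nodes and ESC */
} ANode;

typedef struct {
    ANode node[ROOT + 1];        /* array index == order number */
    int leafOf[NSYM];            /* slot of each character already seen, else -1 */
    int esc;                     /* slot of the ESC (NYT) node */
} ATree;

void atree_init(ATree *t) {
    for (int s = 0; s < NSYM; s++) t->leafOf[s] = -1;
    t->esc = ROOT;
    t->node[ROOT] = (ANode){0, -1, -1, -1, -1};
}

/* Append the root-to-slot path to out ('0' = left, '1' = right). */
static void emit_path(const ATree *t, int slot, char *out) {
    char rev[ROOT + 1];
    int n = 0;
    for (; t->node[slot].parent >= 0; slot = t->node[slot].parent)
        rev[n++] = (t->node[t->node[slot].parent].left == slot) ? '0' : '1';
    size_t len = strlen(out);
    while (n > 0) out[len++] = rev[--n];
    out[len] = '\0';
}

/* Exchange the subtrees held in slots a and b (equal weights assumed). */
static void swap_slots(ATree *t, int a, int b) {
    ANode *x = &t->node[a], *y = &t->node[b];
    int sym = x->symbol, l = x->left, r = x->right;
    x->symbol = y->symbol; x->left = y->left; x->right = y->right;
    y->symbol = sym;       y->left = l;       y->right = r;
    int slot[2] = {a, b};
    for (int k = 0; k < 2; k++) {          /* repair links into the moved subtrees */
        ANode *n = &t->node[slot[k]];
        if (n->left >= 0) {
            t->node[n->left].parent = slot[k];
            t->node[n->right].parent = slot[k];
        } else if (n->symbol >= 0) {
            t->leafOf[n->symbol] = slot[k];
        }
    }
}

static void atree_update(ATree *t, int sym) {
    int n = t->leafOf[sym];
    if (n < 0) {                           /* first occurrence: split the ESC node */
        int old = t->esc;
        t->node[old - 1] = (ANode){1, old, -1, -1, sym};  /* new leaf, W = 1 */
        t->node[old - 2] = (ANode){0, old, -1, -1, -1};   /* new ESC, W = 0 */
        t->node[old].left = old - 2;
        t->node[old].right = old - 1;
        t->node[old].weight = 1;
        t->leafOf[sym] = old - 1;
        t->esc = old - 2;
        n = t->node[old].parent;
    }
    while (n >= 0) {                       /* walk up to the root */
        int leader = n;                    /* largest order number with the same weight */
        for (int i = n + 1; i <= ROOT; i++)
            if (t->node[i].weight == t->node[n].weight) leader = i;
        if (leader != n && leader != t->node[n].parent) {
            swap_slots(t, n, leader);
            n = leader;
        }
        t->node[n].weight++;
        n = t->node[n].parent;
    }
}

/* Append the encoding of sym to out, then update the tree. */
void atree_encode(ATree *t, unsigned char sym, char *out) {
    if (t->leafOf[sym] < 0) {
        emit_path(t, t->esc, out);         /* code of ESC ... */
        size_t len = strlen(out);
        out[len] = (char)sym;              /* ... then the raw character */
        out[len + 1] = '\0';
    } else {
        emit_path(t, t->leafOf[sym], out);
    }
    atree_update(t, sym);
}
```

Encoding A, B, B, C, C in turn, starting from an empty output string, produces A0B0100C001. Afterwards the root has weight 5, its right child (order number ROOT - 1) is the internal node of weight 3, and its left child (ROOT - 2) is the leaf B, which matches the adjusted tree above.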
46. Adaptive Huffman - Example (cont.)
Add AA:
- What is the compressed data?
- What does the Huffman tree look like?
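47. Adaptive Huffman - Decompression
- Initialise the tree
- Read the bits one at a time from the compressed data:
  - Walk the Huffman tree to decode a character
    - If it is the Escape character, output the next uncompressed (8-bit) character
    - If it is an ordinary character, output the corresponding character
  - Update the Huffman tree with the character just read (as in the compression process)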