Tree-based Search Focused on Red-Black Tree and Trie and Some Fresh Algorithms

Upload: bree-kemp

Post on 30-Dec-2015


Page 1: Tree-based Search Focused on Red-Black Tree and Trie and Some Fresh Algorithms

Tree-based Search Focused on Red-Black Tree and Trie and Some Fresh Algorithms

Computer Science & Engineering Department 9746079 Bok-Youn Lee

9746080 Sun-oh Choi

INDEX

- Tree Historical Overview
- 2-3-4 Tree
- Red-Black Tree
- Trie
- Burning Tree
- Thick Tree
- Conclusion

Tree Historical Overview

- Binary Search Tree
- AVL Tree
- 2-3 Tree
- 2-3-4 Tree
- Red-Black Tree
- Trie

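2-3-4 Tree

Properties
- Each node holds up to 3 keys and 4 links.
- With N items and tree height h: log4(N) <= h <= log2(N).
- Every path from the root to an external node has the same length.
- Search: in a node with keys A < B < C and subtrees a, b, c, d, a search key smaller than A continues in a, one between A and B in b, one between B and C in c, and one larger than C in d.

[Figure: search in a 4-node with keys A, B, C and subtrees a, b, c, d]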
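Searching a 2-3-4 tree generalizes binary search: inside a node the keys are compared in order, and the search either stops or descends into the single subtree whose key range can contain the target. A minimal Python sketch follows; the `Node234` class and the `search` function are hypothetical names for illustration, and keys are single-letter strings as on these slides.

```python
from __future__ import annotations

from bisect import bisect_left
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node234:
    """A 2-, 3- or 4-node: 1 to 3 sorted keys, plus len(keys) + 1 children unless it is a leaf."""
    keys: list[str]
    children: list[Node234] = field(default_factory=list)


def search(node: Optional[Node234], key: str) -> bool:
    """Return True if key occurs in the 2-3-4 tree rooted at node."""
    while node is not None:
        i = bisect_left(node.keys, key)  # number of keys in this node smaller than key
        if i < len(node.keys) and node.keys[i] == key:
            return True
        if not node.children:  # a leaf: nowhere left to look
            return False
        node = node.children[i]  # subtree i holds the keys between keys[i-1] and keys[i]
    return False


# The tree reached at the end of the insertion example: root [C K], children [A], [E F], [L].
root = Node234(["C", "K"], [Node234(["A"]), Node234(["E", "F"]), Node234(["L"])])
print(search(root, "F"), search(root, "G"))  # True False
```

Because every leaf sits at the same depth, the loop runs at most h + 1 times, which is where the logarithmic bounds on h come in.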

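2-3-4 Tree

Split: performed during insertion to keep the tree balanced.

a. If the root is a 4-node: the root [A B C] with subtrees a, b, c, d is split into a new root [B] whose children are [A] (subtrees a, b) and [C] (subtrees c, d). The tree grows by one level.

b. Other cases, e.g. a 4-node [A B C] whose parent is the 2-node [F] (with another subtree e): the middle key B moves up into the parent, which becomes the 3-node [B F] with children [A], [C], and e. The height does not change.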

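2-3-4 Tree

Insert

(a) insert K: [K]
(b) insert L: [K L]
(c) insert C: [C K L]
(d) insert A (split): the full root [C K L] is split first, giving the root [K] with children [C] and [L]; A then joins [C], giving [A C].
(e) insert E: E joins [A C], so the tree is [K] with children [A C E] and [L].
(f) insert F (split): the 4-node [A C E] is split on the way down and C moves up into the root, giving [C K]; F then joins [E], so the tree is [C K] with children [A], [E F], and [L].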
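The insertion trace is what top-down insertion produces when every 4-node met on the way down is split before descending (case a for the root, case b otherwise), so a leaf always has room for the new key. The following Python sketch illustrates that assumption and is not the authors' code; `Node234`, `split_child`, `insert`, and `show` are hypothetical names.

```python
from __future__ import annotations

from bisect import bisect_left
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node234:
    keys: list[str]
    children: list[Node234] = field(default_factory=list)  # empty for a leaf

    def is_full(self) -> bool:
        return len(self.keys) == 3  # a 4-node


def split_child(parent: Node234, i: int) -> None:
    """Split the 4-node parent.children[i]; its middle key moves up into parent."""
    child = parent.children[i]
    a, b, c = child.keys
    left, right = Node234([a], child.children[:2]), Node234([c], child.children[2:])
    parent.keys.insert(i, b)
    parent.children[i:i + 1] = [left, right]


def insert(root: Optional[Node234], key: str) -> Node234:
    """Top-down insertion: split every 4-node on the way down; returns the (possibly new) root."""
    if root is None:
        return Node234([key])
    if root.is_full():  # case a: the root splits and the tree grows by one level
        root = Node234([], [root])
        split_child(root, 0)
    node = root
    while node.children:
        i = bisect_left(node.keys, key)
        if node.children[i].is_full():  # case b: the parent absorbs the middle key
            split_child(node, i)
            if key > node.keys[i]:
                i += 1
        node = node.children[i]
    node.keys.insert(bisect_left(node.keys, key), key)  # the leaf is never full here
    return root


def show(node: Node234) -> str:
    """Render a tree as nested brackets, e.g. [K]([A C] [L])."""
    text = "[" + " ".join(node.keys) + "]"
    if node.children:
        text += "(" + " ".join(show(c) for c in node.children) + ")"
    return text


root = None
for k in "KLCAEF":
    root = insert(root, k)
    print(k, show(root))
```

It prints one line per step; the last line is `F [C K]([A] [E F] [L])`, which matches step (f).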

2-3-4 Tree

Delete

[Figure: deletion examples. (a) Delete G: G is removed from the leaf [F G], which becomes [F]. (b) Delete A (borrowing & merging): A lies in a 2-node, so keys are borrowed from a sibling through the parent, or nodes are merged, before A is removed; all leaves stay at the same depth.]
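Deletion can also be kept balanced top-down: before descending into a child that is a 2-node, borrow a key from a sibling through the parent, or merge the child with a sibling and the separating parent key, so the key to be removed never sits in a 2-node. A hypothetical Python helper for that single step (`fix_child` is an illustrative name; locating the key and swapping an internal key with its successor are omitted):

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node234:
    keys: list[str]
    children: list[Node234] = field(default_factory=list)  # empty for a leaf


def fix_child(parent: Node234, i: int) -> int:
    """Before descending into parent.children[i], make sure it holds at least 2 keys.

    Returns the index of the child to descend into (it shifts left after merging
    with the left sibling)."""
    child = parent.children[i]
    if len(child.keys) >= 2:
        return i
    left = parent.children[i - 1] if i > 0 else None
    right = parent.children[i + 1] if i + 1 < len(parent.children) else None
    if left is not None and len(left.keys) >= 2:
        # Borrow: the separating parent key moves down, left's largest key moves up.
        child.keys.insert(0, parent.keys[i - 1])
        parent.keys[i - 1] = left.keys.pop()
        if left.children:
            child.children.insert(0, left.children.pop())
        return i
    if right is not None and len(right.keys) >= 2:
        # Mirror image: borrow through the parent from the right sibling.
        child.keys.append(parent.keys[i])
        parent.keys[i] = right.keys.pop(0)
        if right.children:
            child.children.append(right.children.pop(0))
        return i
    if right is not None:
        # Merge: child + separating parent key + right sibling become one 4-node.
        child.keys += [parent.keys.pop(i)] + right.keys
        child.children += right.children
        del parent.children[i + 1]
        return i
    assert left is not None, "a child always has at least one sibling"
    left.keys += [parent.keys.pop(i - 1)] + child.keys
    left.children += child.children
    del parent.children[i]
    return i - 1


# Borrowing: A is alone in its leaf, and the right sibling [D E] can spare a key.
p = Node234(["C"], [Node234(["A"]), Node234(["D", "E"])])
j = fix_child(p, 0)
p.children[j].keys.remove("A")
print(p.keys, [c.keys for c in p.children])  # ['D'] [['C'], ['E']]

# Merging: both children are 2-nodes, so they fuse with the root key.
root = Node234(["C"], [Node234(["A"]), Node234(["D"])])
j = fix_child(root, 0)
root.children[j].keys.remove("A")
if not root.keys:  # the emptied root is dropped and the tree shrinks by one level
    root = root.children[0]
print(root.keys)  # ['C', 'D']
```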

2-3-4 Tree

Conclusion
- Keeps the tree balanced and is easy to implement.
- Wastes memory when most nodes are 2-nodes.
- Evolved into the B-tree, which is used for external search.

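Red-Black Tree

Introduction
- A red-black tree is a binary representation of a 2-3-4 tree.
- Search works exactly as in a binary search tree.

[Figure: 2-3-4 nodes and their red-black equivalents. A 2-node [A] becomes the black node A; a 3-node [A B] becomes either a black B with a red left child A or a black A with a red right child B; a 4-node [A B C] becomes a black B with red children A and C.]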
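The correspondence in the figure can be written as a conversion from 2-3-4 nodes to red-black nodes. In this minimal Python sketch (class and function names are hypothetical), a 3-node is always encoded with a red left child, which is one of the two equivalent choices shown.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

RED, BLACK = "red", "black"


@dataclass
class Node234:
    keys: list[str]
    children: list[Node234] = field(default_factory=list)


@dataclass
class RBNode:
    key: str
    color: str
    left: Optional[RBNode] = None
    right: Optional[RBNode] = None


def to_red_black(node: Optional[Node234]) -> Optional[RBNode]:
    """Convert a 2-3-4 (sub)tree into an equivalent red-black (sub)tree."""
    if node is None:
        return None
    # Convert the subtrees first; a leaf has len(keys) + 1 empty subtrees.
    kids = [to_red_black(c) for c in node.children] or [None] * (len(node.keys) + 1)
    if len(node.keys) == 1:  # 2-node [A]: a single black node
        return RBNode(node.keys[0], BLACK, kids[0], kids[1])
    if len(node.keys) == 2:  # 3-node [A B]: black B with a red left child A
        a, b = node.keys
        return RBNode(b, BLACK, RBNode(a, RED, kids[0], kids[1]), kids[2])
    a, b, c = node.keys  # 4-node [A B C]: black B with red children A and C
    return RBNode(b, BLACK, RBNode(a, RED, kids[0], kids[1]), RBNode(c, RED, kids[2], kids[3]))


def inorder(t: Optional[RBNode]) -> list[str]:
    """Keys in sorted order, each tagged with the initial of its color."""
    return [] if t is None else inorder(t.left) + [f"{t.key}:{t.color[0]}"] + inorder(t.right)


tree = Node234(["K"], [Node234(["A", "C", "E"]), Node234(["L"])])
print(inorder(to_red_black(tree)))  # ['A:r', 'C:b', 'E:r', 'K:b', 'L:b']
```

Only black nodes start a new 2-3-4 node, so every root-to-leaf path passes the same number of black nodes, mirroring the equal path lengths of the 2-3-4 tree.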

Red-Black Tree

Color Flip & Rotation
- Together they correspond to a split in a 2-3-4 tree.

[Figure: an LL rotation example on a tree with nodes A, B, C, D, F, H, shown in three stages]
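A sketch of both operations in Python, assuming each node stores its color as a boolean `red` flag (all names are hypothetical). `color_flip` splits the 4-node represented by a black node with two red children; `rotate_right` is the LL rotation, which also swaps colors so the rotated-up node becomes black and its former parent red. The left rotation is the mirror image.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

RED, BLACK = True, False


@dataclass
class RBNode:
    key: str
    red: bool
    left: Optional[RBNode] = None
    right: Optional[RBNode] = None


def color_flip(h: RBNode) -> None:
    """Split the 4-node formed by black h and its two red children: h turns red
    (its key moves up into the parent's 2-3-4 node) and both children turn black."""
    assert h.left is not None and h.right is not None
    assert h.left.red and h.right.red and not h.red
    h.red, h.left.red, h.right.red = RED, BLACK, BLACK


def rotate_right(h: RBNode) -> RBNode:
    """LL rotation: h's left child x rises; x takes h's color and h turns red.
    Returns the new root of the subtree."""
    x = h.left
    assert x is not None
    h.left, x.right = x.right, h
    x.red, h.red = h.red, RED
    return x


b = RBNode("B", BLACK, RBNode("A", RED), RBNode("C", RED))
color_flip(b)
print(b.red, b.left.red, b.right.red)  # True False False

f = RBNode("F", BLACK, RBNode("D", RED, RBNode("B", RED)))  # two reds in a row on the left
top = rotate_right(f)
print(top.key, top.red, top.left.key, top.left.red, top.right.key, top.right.red)
# D False B True F True
```

After the rotation, D is black with two red children, i.e. the three keys form one well-shaped 4-node [B D F].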

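Red-Black Tree

Insert

(a) insert K: K becomes the black root.
(b) insert L: L becomes the red right child of K.
(c) insert C: C becomes the red left child of K, so C, K, and L together represent the 4-node [C K L].
(d) color flip & insert A: the 4-node is split by a color flip (C and L turn black; K stays black because it is the root), and A becomes the red left child of C.
(e) insert E: E becomes the red right child of C.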
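For a runnable reference, the sketch below swaps in the textbook bottom-up insertion (insert as in a binary search tree, then repair red-red violations by recoloring or rotating); the slides' top-down algorithm flips colors on the way down instead. On the keys K, L, C, A, E both end with the same tree: the color flip that step (d) performs before inserting A happens here right after A is inserted. Class and helper names are hypothetical; `black_height` asserts the red-black rules and can be used to test the height bound on the next slide.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Node:
    key: object
    red: bool = True  # a new node is always red
    left: Optional[Node] = None
    right: Optional[Node] = None
    parent: Optional[Node] = field(default=None, repr=False)


class RedBlackTree:
    def __init__(self) -> None:
        self.root: Optional[Node] = None

    def _rotate(self, x: Node, left: bool) -> None:
        """Left rotation lifts x.right above x; right rotation lifts x.left."""
        y = x.right if left else x.left
        inner = y.left if left else y.right
        if left:
            x.right, y.left = inner, x
        else:
            x.left, y.right = inner, x
        if inner is not None:
            inner.parent = x
        y.parent, x.parent = x.parent, y
        if y.parent is None:
            self.root = y
        elif y.parent.left is x:
            y.parent.left = y
        else:
            y.parent.right = y

    def insert(self, key) -> None:
        parent, cur = None, self.root
        while cur is not None:  # ordinary BST descent
            parent, cur = cur, (cur.left if key < cur.key else cur.right)
        z = Node(key, parent=parent)
        if parent is None:
            self.root = z
        elif key < parent.key:
            parent.left = z
        else:
            parent.right = z
        while z.parent is not None and z.parent.red:  # repair a red node with a red parent
            p = z.parent
            g = p.parent  # exists: the root is always black, so a red p is not the root
            p_left = p is g.left
            uncle = g.right if p_left else g.left
            if uncle is not None and uncle.red:  # color flip at g (splitting a 4-node)
                p.red = uncle.red = False
                g.red = True
                z = g
                continue
            if z is (p.right if p_left else p.left):  # zig-zag: rotate into the LL/RR shape
                self._rotate(p, left=p_left)
                z, p = p, z
            p.red, g.red = False, True
            self._rotate(g, left=not p_left)  # LL -> right rotation at g, RR -> left
        self.root.red = False


def show(n: Optional[Node]) -> str:
    """Pre-order rendering; red nodes carry a '*', empty subtrees are '-'."""
    if n is None:
        return "-"
    label = f"{n.key}{'*' if n.red else ''}"
    if n.left is None and n.right is None:
        return label
    return f"{label}({show(n.left)},{show(n.right)})"


def black_height(n: Optional[Node]) -> int:
    """Assert the red-black rules below n; return its black height (empty leaves count 1)."""
    if n is None:
        return 1
    for c in (n.left, n.right):
        assert not (n.red and c is not None and c.red), "red node with a red child"
    lh, rh = black_height(n.left), black_height(n.right)
    assert lh == rh, "paths with different numbers of black nodes"
    return lh + (0 if n.red else 1)


tree = RedBlackTree()
for k in "KLCAE":
    tree.insert(k)
print(show(tree.root))  # K(C(A*,E*),L)
black_height(tree.root)
```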

Red-Black Tree

Conclusion
- With N items and tree height h, h <= 2log2(N+1).
- Uses less memory than a 2-3-4 tree.
- More complex to implement than a 2-3-4 tree.
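The height bound follows from the standard counting argument for red-black trees, sketched below. Here bh(x), the black-height of x, is the number of black nodes on any path from x down to a leaf, not counting x, with the empty (nil) leaves counted as black.

```latex
\begin{aligned}
&\text{(1) By induction on height, the subtree rooted at } x \text{ has at least } 2^{bh(x)} - 1 \text{ internal nodes.}\\
&\text{(2) A red node has only black children, so at least half of the nodes on any root-to-leaf path,}\\
&\qquad \text{not counting the root, are black: } bh(\text{root}) \ge h/2.\\
&\text{(3) Therefore } N \ge 2^{h/2} - 1, \text{ i.e. } h \le 2\log_2(N+1).
\end{aligned}
```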

Trie Search
- An index structure that is useful for keys of variable size.
- Built from branch nodes and element nodes.
- For a trie with l levels, the worst-case search time is O(l).

[Figure: an example trie whose keys are strings over the characters a, b, c]
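A minimal Python sketch of a trie built from the two node kinds named above. The class names are hypothetical, and the empty string is used as an assumed end-of-key slot that points to the element node. Search follows one branch per character, so it visits at most len(key) + 1 levels no matter how many keys are stored.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional, Union

END = ""  # assumed end-of-key slot; iterating over a key never yields "", so it cannot clash


@dataclass
class ElementNode:
    key: str  # an element node stores the complete key (and, in practice, its record)


@dataclass
class BranchNode:
    children: dict[str, Union[BranchNode, ElementNode]] = field(default_factory=dict)


def insert(root: BranchNode, key: str) -> None:
    node = root
    for ch in key:  # one branch node per character
        nxt = node.children.get(ch)
        if nxt is None:
            nxt = node.children[ch] = BranchNode()
        node = nxt
    node.children[END] = ElementNode(key)


def search(root: BranchNode, key: str) -> Optional[ElementNode]:
    """Follow one branch per character; cost depends on len(key), not on the number of keys."""
    node = root
    for ch in key:
        nxt = node.children.get(ch)
        if not isinstance(nxt, BranchNode):
            return None
        node = nxt
    hit = node.children.get(END)
    return hit if isinstance(hit, ElementNode) else None


root = BranchNode()
for word in ["a", "ab", "abc", "ca", "bbc"]:
    insert(root, word)
print(search(root, "abc"), search(root, "b"))  # ElementNode(key='abc') None
```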

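Conclusion

Search time with the number of searches fixed at 1,000,000 (unit: clock)

Number of items   1M     2M     3M     4M
2-3-4 tree        2388   4770   7102   9577
Trie              1452   1450   1462   1450

Search time with the number of items fixed at 1,000,000 (unit: clock)

Number of searches   1M     2M     3M     4M
2-3-4 tree           2388   2368   2363   2365
Trie                 1452   2895   4339   5819

- With the number of searches fixed, the 2-3-4 tree's time grows with the number of items, whereas the trie's search time stays constant regardless of the number of items. The trie is well suited as an index structure for variable-size keys.
- With the number of items fixed, the 2-3-4 tree's search time stays constant regardless of the number of searches, whereas the trie's search time grows with the number of searches. For general-purpose search, the 2-3-4 tree is more useful.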

Burning Tree

Idea
- Searches usually concentrate on particular topics (concentration).
- Content that has just been searched is likely to be searched again soon (locality).
- A cache gives fast access to the nodes on which requests concentrate.

Burning Tree

[Figure: concept diagram of the Burning Tree, showing the Cache, a subroot, and its burning subtree]

Burning Tree

Issues
- A suitable mechanism for computing hit rates (one that reflects time information)
- Dynamic management of the cache entries as search topics change
- Minimizing the overhead of the additional implementation

Burning Tree

Hit counting mechanism
- Hits are counted per subtree.
- Only hits on nodes that belong to a subtree are counted.
- Each hit increments the count by 1.
- At fixed points in time, every hit count is reset to 0.
- The cache entries are updated just before the hit counts are reset.
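A hypothetical Python sketch of these counting rules, with the Model-1 limits (at most 5 cache entries, at least 5% of hits) as defaults. Two details are assumptions, not taken from the slides: nodes outside every subtree point at a dummy slot 0, so every hit can increment a counter without a branch while only subtree slots are ever cached; and the cache is refilled with the hottest subtrees, ranked by hit count.

```python
from __future__ import annotations

DUMMY = 0  # assumed slot for nodes that belong to no burning subtree


class HitCounter:
    def __init__(self, n_subtrees: int, max_entries: int = 5, min_percent: int = 5) -> None:
        self.hit_counts = [0] * (n_subtrees + 1)  # slot i <-> subtree with hitCountIndex i
        self.max_entries = max_entries            # Model-1: at most 5 cache entries
        self.min_percent = min_percent            # Model-1: cache only subtrees with >= 5% of hits
        self.cache: list[int] = []                # cached subtree indices, hottest first

    def hit(self, index: int) -> None:
        """Called on every successful search with the hit node's hitCountIndex."""
        self.hit_counts[index] += 1

    def end_period(self) -> list[int]:
        """At each reset point: rebuild the cache from this period's counts, then zero them."""
        total = sum(self.hit_counts)
        ranked = sorted(range(1, len(self.hit_counts)),  # skip the dummy slot 0
                        key=lambda i: -self.hit_counts[i])
        self.cache = [i for i in ranked[: self.max_entries]
                      if self.hit_counts[i] > 0
                      and self.hit_counts[i] * 100 >= self.min_percent * total]
        self.hit_counts = [0] * len(self.hit_counts)  # "initializing (overwrite)"
        return self.cache


counter = HitCounter(n_subtrees=8)
for index, hits in [(3, 7), (5, 7), (1, 6), (2, 5), (8, 5), (4, 2), (DUMMY, 68)]:
    for _ in range(hits):
        counter.hit(index)
print(counter.end_period())  # [3, 5, 1, 2, 8]
```

The percentages are compared with integer arithmetic so that a subtree at exactly 5% is not lost to floating-point rounding.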

Burning Tree

Hit counting mechanism

[Figure: each subroot refers through its index to an entry of HitCountArray (4 bytes per entry); resetting the counts overwrites the array with zeros ("Initializing (overwrite)"); the figure also shows a dummy entry]

Burning Tree

Node and SubtreeInfo
- SubtreeInfo: subroot, hitCountIndex, minKey, maxKey
- Node: key, left, right, parent, subtreeInfo
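The two records can be sketched in Python as below, together with a search that consults the cache. The snake_case field names mirror the slide's fields; everything else (the `search` and `build` functions, matching cache entries by key range, and counting one step per cache entry checked and per node visited) is an assumption chosen to match the cost model of Model-1.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class SubtreeInfo:
    subroot: Node = field(repr=False)
    hit_count_index: int  # slot in the shared hit-count array
    min_key: int          # key range covered by the burning subtree
    max_key: int


@dataclass(eq=False)
class Node:
    key: int
    left: Optional[Node] = None
    right: Optional[Node] = None
    parent: Optional[Node] = field(default=None, repr=False)
    subtree_info: Optional[SubtreeInfo] = field(default=None, repr=False)  # None: outside every subtree


def search(root: Optional[Node], cache: list[SubtreeInfo], hit_counts: list[int],
           key: int) -> tuple[Optional[Node], int]:
    """Return (node or None, steps); one step per cache entry checked and per node visited."""
    steps, node = 0, root
    for info in cache:  # entries are kept hottest first
        steps += 1
        if info.min_key <= key <= info.max_key:
            node = info.subroot  # jump straight to the burning subtree
            break
    while node is not None:
        steps += 1
        if key == node.key:
            if node.subtree_info is not None:  # count only hits inside a subtree
                hit_counts[node.subtree_info.hit_count_index] += 1
            return node, steps
        node = node.left if key < node.key else node.right
    return None, steps


def build(lo: int, hi: int, parent: Optional[Node] = None) -> Optional[Node]:
    """Perfectly balanced BST over the keys lo..hi."""
    if lo > hi:
        return None
    mid = (lo + hi) // 2
    node = Node(mid, parent=parent)
    node.left, node.right = build(lo, mid - 1, node), build(mid + 1, hi, node)
    return node


root = build(1, 7)                # 4 at the root, 6 at the root of the right subtree
hot = root.right
info = SubtreeInfo(hot, 1, 5, 7)  # the subtree rooted at 6 is the cached "burning" subtree
for n in (hot, hot.left, hot.right):
    n.subtree_info = info
counts = [0, 0]                   # slot 0 is unused in this example
print(search(root, [info], counts, 7)[1], search(root, [info], counts, 1)[1], counts)  # 3 4 [0, 1]
```

The second search shows the price of a miss: it checks the cache entry in vain and then walks down from the root, one step more than an ordinary search.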

Burning Tree

[Model-1]
- Height of Burning Tree: 20
- Height of subroot: 10
- Maximum number of cache entries: 5
- Minimum hit percentage for caching: 5%
- State: the tree is complete and the cache entries are full. The hit rates of the subtrees in the cache entries are 7, 7, 6, 5, and 5%.

[Expectation]

$$
\frac{7}{100}\left(1 + \tfrac{1}{2}\ln N\right) + \frac{7}{100}\left(2 + \tfrac{1}{2}\ln N\right) + \frac{6}{100}\left(3 + \tfrac{1}{2}\ln N\right) + \frac{5}{100}\left(4 + \tfrac{1}{2}\ln N\right) + \frac{5}{100}\left(5 + \tfrac{1}{2}\ln N\right) + \frac{70}{100}\left(5 + \ln N\right) = \frac{85}{100}\ln N + 4.34
$$

Substituting ln N = 20 gives 21.34, i.e. 1.34 steps slower than the 20 steps of an ordinary search.
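The arithmetic can be checked with a small function (the name `expected_steps` is hypothetical) that encodes the same cost model: a search served by cache entry i costs i probes plus the subroot height, half the tree height in Model-1, and every other search probes all cache entries and then walks the full tree height.

```python
from __future__ import annotations


def expected_steps(cached_shares: list[float], tree_height: float) -> float:
    """Expected search steps under the Model-1 cost model.

    A search served by cache entry i (1-based) costs i probes plus the subroot height,
    taken as tree_height / 2; every other search probes all entries, then the full height."""
    n_entries = len(cached_shares)
    cost = sum(p * (i + tree_height / 2) for i, p in enumerate(cached_shares, start=1))
    cost += (1 - sum(cached_shares)) * (n_entries + tree_height)
    return cost


print(round(expected_steps([0.07, 0.07, 0.06, 0.05, 0.05], tree_height=20), 2))  # 21.34
```

With an empty cache the model gives exactly the tree height (20), the baseline against which the 1.34-step slowdown is measured.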

Burning Tree

Problems and conclusion
- Adequate performance requires an unrealistic model (do related topics really reside in the same subtree? generality).
- Can the subtrees be maintained when nodes are removed?
- Is the algorithm itself too complex? (simplicity)
- We tentatively regard this algorithm as invalid.

Thick Tree

Idea
- Improve performance when search requests concentrate on particular elements.

Features
- Each node maintains a hit count.
- The hit count is used as the priority, so nodes with high hit counts are placed close to the root (an idea borrowed from the Treap).

Thick Tree

Treap features
- Exploits the fact that inserting random numbers into an ordinary binary tree yields a roughly balanced tree: besides the key, each node carries a priority.
- The shape of the whole tree is determined by the priorities as well as the keys.
- Priorities are generated randomly, so the tree stays roughly balanced without any separate balancing technique.
- key (in order): left child < parent < right child
- priority (heap order): child < parent
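A minimal treap sketch in Python (names are hypothetical): insertion places the key as in a binary search tree and then rotates the new node upward while its priority is larger than its parent's, restoring heap order. Feeding it sorted keys, the worst case for a plain binary search tree, still yields a shallow tree.

```python
from __future__ import annotations

import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class TreapNode:
    key: int
    priority: int
    left: Optional[TreapNode] = None
    right: Optional[TreapNode] = None


def rotate_right(t: TreapNode) -> TreapNode:
    l = t.left
    t.left, l.right = l.right, t
    return l


def rotate_left(t: TreapNode) -> TreapNode:
    r = t.right
    t.right, r.left = r.left, t
    return r


def insert(t: Optional[TreapNode], key: int, priority: int) -> TreapNode:
    """BST insertion by key, then rotate the new node up while it breaks heap order."""
    if t is None:
        return TreapNode(key, priority)
    if key < t.key:
        t.left = insert(t.left, key, priority)
        if t.left.priority > t.priority:
            t = rotate_right(t)
    else:
        t.right = insert(t.right, key, priority)
        if t.right.priority > t.priority:
            t = rotate_left(t)
    return t


def check(t: Optional[TreapNode], lo: float = float("-inf"), hi: float = float("inf")) -> int:
    """Assert key (in-order) and priority (heap) order; return the height."""
    if t is None:
        return 0
    assert lo < t.key < hi, "key order violated"
    for c in (t.left, t.right):
        assert c is None or c.priority <= t.priority, "heap order violated"
    return 1 + max(check(t.left, lo, t.key), check(t.right, t.key, hi))


rng = random.Random(42)
root = None
for k in range(1, 10_001):  # sorted input: a plain BST would become a 10,000-node chain
    root = insert(root, k, rng.randrange(1 << 30))
print(check(root))  # a height far below 10,000
```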

Thick Tree

Concept diagram of the Thick Tree (each node is labeled key:hit count)

- Level 0: 50:15
- Level 1: 23:8, 55:9
- Level 2: 10:5, 34:5, 53:7, 68:3
- Level 3: 8:2, 11:0, 30:1, 44:3, 51:2, 54:6, 59:0, 81:0

Node fields: key, hit count, left, right
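The concept diagram can be checked mechanically. This Python sketch (names are hypothetical) rebuilds the tree from the keys and hit counts labeled in the diagram and tests the two Thick Tree invariants: in-order key order and heap order on hit counts (no child has more hits than its parent).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    hit: int
    left: Optional[Node] = None
    right: Optional[Node] = None


def is_thick_tree(t: Optional[Node], lo: float = float("-inf"), hi: float = float("inf")) -> bool:
    """True if keys are in BST (in-order) order and hit counts are in max-heap order."""
    if t is None:
        return True
    if not lo < t.key < hi:
        return False
    if any(c is not None and c.hit > t.hit for c in (t.left, t.right)):
        return False
    return is_thick_tree(t.left, lo, t.key) and is_thick_tree(t.right, t.key, hi)


root = Node(50, 15,
            Node(23, 8, Node(10, 5, Node(8, 2), Node(11, 0)),
                        Node(34, 5, Node(30, 1), Node(44, 3))),
            Node(55, 9, Node(53, 7, Node(51, 2), Node(54, 6)),
                        Node(68, 3, Node(59, 0), Node(81, 0))))
print(is_thick_tree(root))  # True
```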

Thick Tree

Searching & Hit Counting
- The search procedure is the same as in a binary search tree.
- Each hit increments the node's hit count by 1.
- When a search triggers a rotation (child.hit > parent.hit + alpha), the hit counts of the entire subtree involved are halved, so older counts carry less weight.
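A recursive Python sketch of this rule (names are hypothetical). Two details are assumptions: the rotation check is applied to the child on the search path at every level as the recursion unwinds, and the entire subtree is taken to be the subtree rooted at the node that was rotated up, whose counts are halved with integer division. The slides' own search functions (searchLL, searchLR, and so on) may organize the rotations differently.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    hit: int
    left: Optional[Node] = None
    right: Optional[Node] = None


def halve(t: Optional[Node]) -> None:
    """Halve every hit count in the subtree so that old hits lose influence."""
    if t is not None:
        t.hit //= 2
        halve(t.left)
        halve(t.right)


def search(t: Optional[Node], key: int, alpha: int) -> tuple[Optional[Node], bool]:
    """BST search that counts the hit and rotates a child above its parent once
    child.hit > parent.hit + alpha. Returns (new subtree root, found)."""
    if t is None:
        return None, False
    if key == t.key:
        t.hit += 1
        return t, True
    if key < t.key:
        t.left, found = search(t.left, key, alpha)
        child = t.left
        if found and child is not None and child.hit > t.hit + alpha:
            t.left, child.right = child.right, t  # right rotation
            halve(child)
            return child, True
    else:
        t.right, found = search(t.right, key, alpha)
        child = t.right
        if found and child is not None and child.hit > t.hit + alpha:
            t.right, child.left = child.left, t  # left rotation
            halve(child)
            return child, True
    return t, found


root = Node(50, 3, Node(23, 0))
for _ in range(6):  # the 6th hit gives 23 a count of 6 > 3 + 2
    root, found = search(root, 23, alpha=2)
print(root.key, root.hit, root.right.key, root.right.hit)  # 23 3 50 1
```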


Thick Tree

search(key : int, tree : ThickTree) : ThickTree

searchLL(key : int, tree : ThickTree, parent : ThickTree, grand : ThickTree) : ThickTree

searchL(key : int, tree : ThickTree, parent : ThickTree) : ThickTree

searchR(key : int, tree : ThickTree, parent : ThickTree) : ThickTree

searchRL(key : int, tree : ThickTree, parent : ThickTree, grand : ThickTree) : ThickTree

searchRR(key : int, tree : ThickTree, parent : ThickTree, grand : ThickTree) : ThickTree

searchLR(key : int, tree : ThickTree, parent : ThickTree, grand : ThickTree) : ThickTree

Thick Tree

Rotation (Left/Right)

[Figure: y with left subtree A and right child x (subtrees B and C) is rotated left into x with left child y (subtrees A and B) and right subtree C; the right rotation is the inverse]

Thick Tree

Insertion & Deletion
- Same as Treap insertion and deletion, except that the hit count, which is initialized randomly on insertion, must not be too large: an upper bound is set, because if it is too large, newly inserted nodes rise toward the top of the tree.
- If it is too small, the balance of the whole tree is disturbed. Choosing an appropriate value is important.
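A Python sketch of both operations (names are hypothetical): insertion is Treap insertion whose priority is a random initial hit count drawn from 0 to `max_init_hit`, and deletion rotates the node down toward its child with the larger hit count until it has at most one child, then splices it out.

```python
from __future__ import annotations

import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    hit: int
    left: Optional[Node] = None
    right: Optional[Node] = None


def rot_right(t: Node) -> Node:
    l = t.left
    t.left, l.right = l.right, t
    return l


def rot_left(t: Node) -> Node:
    r = t.right
    t.right, r.left = r.left, t
    return r


def insert(t: Optional[Node], key: int, max_init_hit: int, rng: random.Random) -> Node:
    """Treap insertion; the priority is a random initial hit count in [0, max_init_hit]."""
    if t is None:
        return Node(key, rng.randint(0, max_init_hit))
    if key < t.key:
        t.left = insert(t.left, key, max_init_hit, rng)
        if t.left.hit > t.hit:
            t = rot_right(t)
    else:
        t.right = insert(t.right, key, max_init_hit, rng)
        if t.right.hit > t.hit:
            t = rot_left(t)
    return t


def delete(t: Optional[Node], key: int) -> Optional[Node]:
    """Treap deletion: rotate the node down toward its higher-hit child, then splice it out."""
    if t is None:
        return None
    if key < t.key:
        t.left = delete(t.left, key)
    elif key > t.key:
        t.right = delete(t.right, key)
    elif t.left is None:
        return t.right
    elif t.right is None:
        return t.left
    elif t.left.hit > t.right.hit:
        t = rot_right(t)
        t.right = delete(t.right, key)
    else:
        t = rot_left(t)
        t.left = delete(t.left, key)
    return t


def keys(t: Optional[Node]) -> list[int]:
    return [] if t is None else keys(t.left) + [t.key] + keys(t.right)


rng = random.Random(7)
root = None
for k in [50, 23, 55, 10, 34, 53, 68]:
    root = insert(root, k, max_init_hit=500, rng=rng)
root = delete(root, 23)
print(keys(root))  # [10, 34, 50, 53, 55, 68]
```

A larger `max_init_hit` spreads the random priorities over a wider range, which is one way to read the insertion-time effect of MaxInitHit in the measurements.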

Thick Tree

Entries: 1,000,000

Search: 1,000,000

                 Insert    Search   Concentration  Alpha  MaxInitHit
Red-Black Tree   3160.2    718      -              -      -
Treap            3935.7    958.7    -              -      -
Thick Tree 1     11295.3   800      50             50     500
Thick Tree 2     11405.8   857.2    100            50     500
Thick Tree 3     11133.8   658.5    200            50     500
Thick Tree 4     11109.8   542.5    500            50     500
Thick Tree 5     11317.9   458.7    1000           50     500

Search performance generally improves as the concentration increases (from 800 at concentration 50 to 458.7 at 1000), while insertion time stays between roughly 11,100 and 11,400.

Thick Tree

[Chart: insert and search times of Red-Black Tree, Treap, and Thick Tree 1 to 5 for concentrations 50 to 1000]

Thick Tree

Entries: 1,000,000

Search: 1,000,000

                 Insert    Search   Concentration  Alpha  MaxInitHit
Red-Black Tree   3160.2    718      -              -      -
Treap            3935.7    958.7    -              -      -
Thick Tree 1     11247.2   483.4    1000           10     500
Thick Tree 2     11220.9   464.9    1000           30     500
Thick Tree 3     11317.9   458.7    1000           50     500
Thick Tree 4     11096.8   472.1    1000           100    500
Thick Tree 5     11187.4   470      1000           200    500

Alpha has little effect. Search performance tends to degrade gradually as Alpha moves away from 50 (458.7 at Alpha = 50, versus 483.4 at 10 and 472.1 at 100).

Thick Tree

[Chart: insert and search times of Red-Black Tree, Treap, and Thick Tree 1 to 5 for Alpha from 10 to 200]

Thick Tree

Entries: 1,000,000

Search: 1,000,000

                 Insert    Search   Concentration  Alpha  MaxInitHit
Red-Black Tree   3160.2    718      -              -      -
Treap            3935.7    958.7    -              -      -
Thick Tree 1     11317.9   458.7    1000           50     500
Thick Tree 2     8611.1    484.7    1000           50     700
Thick Tree 3     7102.8    463.8    1000           50     1000
Thick Tree 4     4394.5    521.2    1000           50     5000
Thick Tree 5     4113.6    558.5    1000           50     10000

MaxInitHit has a large effect on insertion performance (insertion time falls from 11317.9 to 4113.6 as MaxInitHit grows from 500 to 10000) and also affects search time (which rises from 458.7 to 558.5).

Thick Tree

[Chart: insert and search times of Red-Black Tree, Treap, and Thick Tree 1 to 5 for MaxInitHit from 500 to 10000]

Thick Tree

We also measured performance at low concentration. The Thick Tree still performs impressively: at concentration 200, its search times (596.8 to 658.5) stay below the red-black tree's 718, and at concentration 50 (Thick Tree 5) the search time of 836.6 lies between the red-black tree and the Treap.

Entries: 1,000,000

Search: 1,000,000

                 Insert    Search   Concentration  Alpha  MaxInitHit
Red-Black Tree   3160.2    718      -              -      -
Treap            3935.7    958.7    -              -      -
Thick Tree 1     11133.8   658.5    200            50     500
Thick Tree 2     5327.4    596.8    200            30     2000
Thick Tree 3     4373.9    630.6    200            50     5000
Thick Tree 4     5312.3    604.2    200            70     2000
Thick Tree 5     4750.1    836.6    50             7      3000

Thick Tree

[Chart: insert and search times of Red-Black Tree, Treap, and Thick Tree 1 to 5 for the low-concentration settings]

Thick Tree

Conclusion
- As the measurements show, the Thick Tree's search performance is quite impressive.
- The algorithm itself is simple and clear.
- There is still room for optimization.

Conclusion

Plain binary trees carry many risks (for example, a sorted insertion order degenerates them into a linear chain), so various balancing techniques have been devised.

Among them, the red-black tree is one of the most mature and stable structures.

Yet a tree without perfect balancing can, depending on the environment, outperform a balanced tree.