Parallelization of Regular Expression Matching and Its Evaluation on Hadoop

Post on 14-Jul-2015

KIMINORI MATSUZAKI, KENTO EMOTO, YU LIU

IPSJ Transactions on Programming (情報処理学会論文誌 プログラミング), Vol.4 No.4, pp.1-11 (Sep. 2011)

0 INTRODUCTION AND MOTIVATION

REGULAR EXPRESSION

LIST HOMOMORPHISM

HADOOP

FINITE AUTOMATON

PARALLELIZATION

DFA IS BETTER

PROCESSOR SCALABILITY

1 OPTIMIZATION OF REGULAR EXPRESSION MATCHING

Hadoop

hadoop

Hadooooop

hadop

Hadooop

hadoooooop

Hadop

REGULAR EXPRESSION

(H|h)adoo*p
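
The spellings listed on this slide can be checked against the pattern with an ordinary regex engine. The sketch below uses Python's standard `re` module; the helper name `matches` and the list `SLIDE_WORDS` are introduced here for illustration and are not part of the slides.

```python
import re

# The pattern from the slide: "H" or "h", then "ado", then zero or more
# further "o", then "p".
PATTERN = re.compile(r"(H|h)adoo*p")

# The spellings shown on the slide.
SLIDE_WORDS = [
    "Hadoop", "hadoop", "Hadooooop", "hadop",
    "Hadooop", "hadoooooop", "Hadop",
]


def matches(word: str) -> bool:
    """Return True when the whole word belongs to the language of PATTERN."""
    return PATTERN.fullmatch(word) is not None


if __name__ == "__main__":
    for word in SLIDE_WORDS:
        print(word, matches(word))
```

`fullmatch` is used rather than `search` so that the whole word, not just a substring, has to fit the pattern.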

ACHIEVED WITH REGULAR EXPRESSIONS

full-text search

search engine

XML processing

access log analysis

natural language processing

text replacement

network security

compiler front-end

URL router

FINITE AUTOMATON

[State diagrams: NON-DETERMINISTIC FINITE AUTOMATON (with an ε-transition) and DETERMINISTIC FINITE AUTOMATON]
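
As a concrete instance of a deterministic finite automaton, here is a minimal hand-built DFA for the pattern (H|h)adoo*p, run sequentially over a word. The state numbering, the explicit dead state, and the names `TRANSITIONS`, `step`, and `dfa_accepts` are assumptions made for this sketch; the automata drawn on the slide are not recoverable from the transcript.

```python
# Hypothetical state numbering (not taken from the slides):
#   0 --H/h--> 1 --a--> 2 --d--> 3 --o--> 4 --o--> 4 --p--> 5 (accepting)
# State 6 is a dead state that absorbs every unexpected character.
START, ACCEPT, DEAD = 0, 5, 6

TRANSITIONS = {
    (0, "H"): 1, (0, "h"): 1,
    (1, "a"): 2,
    (2, "d"): 3,
    (3, "o"): 4,
    (4, "o"): 4, (4, "p"): 5,
}


def step(state: int, ch: str) -> int:
    """Deterministic transition: exactly one next state per (state, char)."""
    return TRANSITIONS.get((state, ch), DEAD)


def dfa_accepts(word: str) -> bool:
    """Run the DFA from START over the whole word, one character at a time."""
    state = START
    for ch in word:
        state = step(state, ch)
    return state == ACCEPT
```

Each character costs one table lookup, so a sequential run is linear in the length of the word.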

2 PARALLELISM

LIST HOMOMORPHISM

({[a],[b],[a, b],[b, c, d],[e, f],..}, ++)

({1,1,2,3,2,..}, +)

HOMOMORPHISM

[1, 2, 3] ++ [7, 8] = [1, 2, 3, 7, 8]

3 + 2 = 5

HOMOMORPHISM
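
The two examples on this slide describe the length function as a homomorphism from lists under concatenation (++) to numbers under addition (+). A small check, with Python's list `+` standing in for `++`; the function name `length` is chosen here for illustration.

```python
def length(xs: list) -> int:
    """Map a list to its length; this is the homomorphism from (lists, ++) to (numbers, +)."""
    return len(xs)


# The lists from the slide map to 1, 1, 2, 3, 2.
slide_lists = [["a"], ["b"], ["a", "b"], ["b", "c", "d"], ["e", "f"]]
slide_lengths = [length(xs) for xs in slide_lists]

# Concatenating first and then measuring gives the same answer as
# measuring both parts and then adding.
left, right = [1, 2, 3], [7, 8]
concatenated = left + right
assert concatenated == [1, 2, 3, 7, 8]
assert length(concatenated) == length(left) + length(right)
```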

DIVIDE AND CONQUER

LIST HOMOMORPHISM

B C D A BA ...

foldl

O(n/p + log p), where n is the length of the input string and p is the number of compute nodes
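
The divide-and-conquer shape behind this bound can be sketched as follows. A list homomorphism h satisfies h(xs ++ ys) = h(xs) ⊙ h(ys) for an associative operator ⊙, so the input can be cut into p chunks, each chunk folded sequentially (the foldl step, n/p work per node), and the p partial results combined (log p steps when done as a tree). The names `hom` and `hom_chunked` are hypothetical, and this sketch runs the chunks one after another rather than on separate nodes.

```python
from functools import reduce


def hom(f, combine, identity, xs):
    """Reference definition: h([]) = identity, h([x]) = f(x),
    h(xs ++ ys) = combine(h(xs), h(ys))."""
    if len(xs) == 0:
        return identity
    if len(xs) == 1:
        return f(xs[0])
    mid = len(xs) // 2
    return combine(hom(f, combine, identity, xs[:mid]),
                   hom(f, combine, identity, xs[mid:]))


def hom_chunked(f, combine, identity, xs, p):
    """Split xs into at most p chunks, fold each chunk left to right,
    then combine the partial results in chunk order."""
    size = max(1, -(-len(xs) // p))  # ceiling division
    chunks = [xs[i:i + size] for i in range(0, len(xs), size)]
    # Each fold is independent of the others, so a real system could
    # assign one chunk to each compute node.
    partials = [reduce(lambda acc, x: combine(acc, f(x)), chunk, identity)
                for chunk in chunks]
    return reduce(combine, partials, identity)


# Example: count the letter "A" in a short input.
letters = list("BCDABA")
count_a = hom_chunked(lambda c: 1 if c == "A" else 0,
                      lambda a, b: a + b, 0, letters, 3)
```

The only requirement on `combine` is associativity; it does not have to be commutative, which is why the partial results are combined in chunk order.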

DFA: O((n/p + log p) |Q_D|), where n is the length of the input string, p is the number of compute nodes, and |Q_D| is the number of DFA states
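
One standard way to obtain a bound of this form, offered here as a reading of the slide rather than as the authors' implementation, is to summarise each chunk of the input as a table that maps every DFA state to the state reached after reading the chunk. Building a table costs |Q_D| lookups per character, and composing two tables costs |Q_D| lookups. The DFA below for (H|h)adoo*p and all names are assumptions of this sketch.

```python
from functools import reduce

# Hypothetical DFA for (H|h)adoo*p: states 0..6, 5 accepts, 6 is a dead state.
NUM_STATES, START, ACCEPT, DEAD = 7, 0, 5, 6
TRANSITIONS = {
    (0, "H"): 1, (0, "h"): 1,
    (1, "a"): 2,
    (2, "d"): 3,
    (3, "o"): 4,
    (4, "o"): 4, (4, "p"): 5,
}
IDENTITY = tuple(range(NUM_STATES))  # table of the empty chunk


def step(state, ch):
    return TRANSITIONS.get((state, ch), DEAD)


def chunk_table(chunk):
    """table[q] = state reached from q after reading chunk.
    Cost: len(chunk) * |Q_D| lookups."""
    table = list(IDENTITY)
    for ch in chunk:
        table = [step(q, ch) for q in table]
    return tuple(table)


def compose(first, second):
    """Table for reading first's chunk and then second's chunk.
    Cost: |Q_D| lookups. Associative, but not commutative."""
    return tuple(second[q] for q in first)


def parallel_accepts(text, p):
    """Split text into at most p chunks, summarise each, compose in order."""
    size = max(1, -(-len(text) // p))  # ceiling division
    chunks = [text[i:i + size] for i in range(0, len(text), size)]
    tables = [chunk_table(c) for c in chunks]  # independent per chunk
    total = reduce(compose, tables, IDENTITY)
    return total[START] == ACCEPT
```

A chunk in the middle of the input does not know which state it will be entered in, which is why the table tracks all |Q_D| possible entry states at once.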

NFA: O((n/p + log p) |Q_N|^3), where n is the length of the input string, p is the number of compute nodes, and |Q_N| is the number of NFA states
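
A cubic factor is what one would expect if each chunk is summarised as a |Q_N| × |Q_N| boolean reachability matrix and summaries are combined by naive boolean matrix multiplication. The sketch below follows that reading; it is an assumption, since the slides do not show the NFA construction. The NFA used here accepts (H|h)ado*op, the same language as (H|h)adoo*p, and is non-deterministic on "o"; all names are hypothetical.

```python
from functools import reduce

# Hypothetical NFA: 0 -H/h-> 1 -a-> 2 -d-> 3, 3 -o-> {3, 4}, 4 -p-> 5 (accepting).
NUM_STATES, START, ACCEPTING = 6, 0, {5}
NFA = {
    (0, "H"): {1}, (0, "h"): {1},
    (1, "a"): {2},
    (2, "d"): {3},
    (3, "o"): {3, 4},
    (4, "p"): {5},
}
IDENTITY = [[i == j for j in range(NUM_STATES)] for i in range(NUM_STATES)]


def char_matrix(ch):
    """m[i][j] is True when the NFA can move from state i to state j on ch."""
    return [[j in NFA.get((i, ch), set()) for j in range(NUM_STATES)]
            for i in range(NUM_STATES)]


def multiply(a, b):
    """Naive boolean matrix product, |Q_N|^3 steps.
    Associative, so chunk summaries can be combined in any grouping."""
    n = NUM_STATES
    return [[any(a[i][k] and b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def chunk_matrix(chunk):
    """Reachability matrix of a whole chunk, built one character at a time."""
    return reduce(multiply, (char_matrix(ch) for ch in chunk), IDENTITY)


def parallel_nfa_accepts(text, p):
    """Split text into at most p chunks, summarise each, multiply in order."""
    size = max(1, -(-len(text) // p))  # ceiling division
    chunks = [text[i:i + size] for i in range(0, len(text), size)]
    total = reduce(multiply, [chunk_matrix(c) for c in chunks], IDENTITY)
    return any(total[START][q] for q in ACCEPTING)
```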

3 EVALUATION ON HADOOP

MAP REDUCE

[Diagram: INPUT, four MAPPERs, one REDUCER, OUTPUT]
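
The mapper and reducer roles in this diagram can be imitated in plain Python to show where ordering matters. This is a simulation under stated assumptions, not Hadoop API code: `mapper`, `reducer`, and `run_job` are hypothetical names, the DFA is a hand-built one for (H|h)adoo*p, and the shuffle only imitates mapper outputs arriving in arbitrary order.

```python
import random

# Hypothetical DFA for (H|h)adoo*p: states 0..6, 5 accepts, 6 is a dead state.
NUM_STATES, START, ACCEPT, DEAD = 7, 0, 5, 6
TRANSITIONS = {
    (0, "H"): 1, (0, "h"): 1, (1, "a"): 2, (2, "d"): 3,
    (3, "o"): 4, (4, "o"): 4, (4, "p"): 5,
}


def mapper(index, chunk):
    """Emit (chunk index, table mapping each entry state to its exit state)."""
    table = list(range(NUM_STATES))
    for ch in chunk:
        table = [TRANSITIONS.get((q, ch), DEAD) for q in table]
    return index, tuple(table)


def reducer(pairs):
    """Sort the tables by chunk index, then thread the start state through them."""
    state = START
    for _, table in sorted(pairs):
        state = table[state]
    return state == ACCEPT


def run_job(text, num_mappers):
    """Split the input, run one mapper per split, and reduce the results."""
    size = max(1, -(-len(text) // num_mappers))  # ceiling division
    splits = [(i, text[off:off + size])
              for i, off in enumerate(range(0, len(text), size))]
    emitted = [mapper(i, chunk) for i, chunk in splits]
    random.shuffle(emitted)  # outputs may reach the reducer in any order
    return reducer(emitted)
```

Because combining chunk summaries is not commutative, the mapper output carries the chunk index as its key so that the reducer can restore the original order.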

SMALL REGULAR EXPRESSION

[Figure: execution time (0s to 500s) against number of nodes (0 to 40), for DFA and NFA]

LARGE REGULAR EXPRESSION

[Figure: execution time (0s to 7000s) against number of nodes (0 to 40), for DFA and NFA]

DFA: LINEAR

[Figure: execution time (0s to 300s) against number of states (0 to 6000)]

NFA: CUBIC

[Figure: execution time (0s to 4000s) against number of states (0 to 40)]

4 RELEVANT STUDIES

MAXIMUM MARKING PROBLEMS

Matsuzaki, K., Hu, Z. and Takeichi, M.: リスト上の最大マーク付け問題を解く並列プログラムの導出 [Derivation of parallel programs for solving maximum marking problems on lists], IPSJ Transactions on Programming, Vol.49, No.SIG 3 (PRO 36), pp.16-27 (2008).

TREE HOMOMORPHISM

Skillicorn, D.B.: Structured Parallel Computation in Structured Documents, Journal of Universal Computer Science, Vol.3, No.1, pp.42-68 (1997).

Nomura, Y., Emoto, K., Matsuzaki, K., Hu, Z. and Takeichi, M.: 木スケルトンによるXPathクエリの並列化とその評価 [Parallelization of XPath queries using tree skeletons and its evaluation], Computer Software, Vol.24, No.3, pp.51-62 (2007).

GPGPU BASED

Naghmouchi, J., Scarpazza, D.P. and Berekovic, M.: Small-ruleset Regular Expression Matching on GPGPUs: Quantitative Performance Analysis and Optimization, Proc. 24th International Conference on Supercomputing, 2010, Tsukuba, Ibaraki, Japan, June 2-4, 2010, Boku, T., Nakashima, H. and Mendelson, A. (Eds.), pp.337-348, ACM (2010).
