Parallelization of Regular Expression Matching and Its Evaluation on Hadoop
Posted on 14-Jul-2015
KIMINORI MATSUZAKI, KENTO EMOTO, YU LIU
IPSJ Transactions on Programming, Vol.4, No.4, pp.1-11 (Sep. 2011)
0. INTRODUCTION AND MOTIVATION
REGULAR EXPRESSION
LIST HOMOMORPHISM
HADOOP
FINITE AUTOMATON
PARALLELIZATION
DFA IS BETTER
PROCESSOR SCALABILITY
1. OPTIMIZATION OF REGULAR EXPRESSION MATCHING
Hadoop, hadoop, Hadooooop, hadop, Hadooop, hadoooooop, Hadop
REGULAR EXPRESSION: (H|h)adoo*p matches every string in this list
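The slide's pattern can be checked with Python's standard `re` module. This is a minimal sketch: the rejected strings are illustrative additions, not from the slide, and `re` uses a backtracking engine rather than the automaton-based matching this talk is about.

```python
import re

# (H|h): upper- or lower-case "h"; "adoo*p": "ado", zero or more further
# "o"s, then "p". So at least one "o" is required.
PATTERN = re.compile(r"(H|h)adoo*p")

# Strings from the slide.
SLIDE_SAMPLES = ["Hadoop", "hadoop", "Hadooooop", "hadop",
                 "Hadooop", "hadoooooop", "Hadop"]
# Illustrative strings (not from the slide) that the pattern rejects.
NON_MATCHES = ["Hadp", "hadoopx", "Xadoop"]


def matches(s: str) -> bool:
    """True if the whole of s belongs to the language of PATTERN."""
    return PATTERN.fullmatch(s) is not None


if __name__ == "__main__":
    for s in SLIDE_SAMPLES + NON_MATCHES:
        print(f"{s:12} {matches(s)}")
```

`fullmatch` is used instead of `search` so that a string only counts when the entire string is in the language, which is the notion of matching an automaton decides.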
ACHIEVED WITH REGULAR EXPRESSIONS
full-text search
search engine
XML processing
access log analysis
natural language processing
text replacement
network security
compiler front-end
URL router
FINITE AUTOMATON
[Figure: NON-DETERMINISTIC FINITE AUTOMATON, transitions labeled a and ε]
[Figure: DETERMINISTIC FINITE AUTOMATON, transitions labeled a, b, c, d, e]
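To make the NFA/DFA distinction concrete, here is a sketch with two hand-built automata for (H|h)adoo*p. The state numbering, the transition tables, and the single ε-transition are assumptions chosen for illustration, not the automata drawn on the slides. Running the DFA keeps exactly one current state per character; running the NFA keeps a set of states and must compute ε-closures, which is the extra work a DFA avoids.

```python
from typing import Dict, FrozenSet, Optional, Set, Tuple

# Hypothetical DFA for (H|h)adoo*p. States: 0 start, 1 after H/h,
# 2 after "a", 3 after "d", 4 after one or more "o", 5 after "p" (accepting).
# A missing entry means the (implicit) dead state.
DFA_DELTA: Dict[Tuple[int, str], int] = {
    (0, "H"): 1, (0, "h"): 1, (1, "a"): 2, (2, "d"): 3,
    (3, "o"): 4, (4, "o"): 4, (4, "p"): 5,
}
DFA_ACCEPT = {5}


def dfa_match(s: str) -> bool:
    state: Optional[int] = 0
    for ch in s:
        state = DFA_DELTA.get((state, ch))  # exactly one successor, or dead
        if state is None:
            return False
    return state in DFA_ACCEPT


# Hypothetical NFA for the same language: after the mandatory "o" (state 4)
# it may loop on "o" or take an ε-transition to state 5, which reads "p".
NFA_DELTA: Dict[Tuple[int, str], Set[int]] = {
    (0, "H"): {1}, (0, "h"): {1}, (1, "a"): {2}, (2, "d"): {3},
    (3, "o"): {4}, (4, "o"): {4}, (5, "p"): {6},
}
NFA_EPS: Dict[int, Set[int]] = {4: {5}}
NFA_ACCEPT = {6}


def eps_closure(states: Set[int]) -> FrozenSet[int]:
    """All states reachable from `states` using only ε-transitions."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in NFA_EPS.get(q, set()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return frozenset(seen)


def nfa_match(s: str) -> bool:
    current = eps_closure({0})
    for ch in s:
        step: Set[int] = set()
        for q in current:  # every active state may move
            step |= NFA_DELTA.get((q, ch), set())
        current = eps_closure(step)
        if not current:
            return False
    return bool(current & NFA_ACCEPT)
```

Both functions accept the same language; the difference is the per-character cost, one table lookup for the DFA versus a set update plus closure for the NFA.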
2. PARALLELISM: LIST HOMOMORPHISM
HOMOMORPHISM (list length)
({[a], [b], [a, b], [b, c, d], [e, f], ...}, ++) → ({1, 1, 2, 3, 2, ...}, +)
[1, 2, 3] ++ [7, 8] = [1, 2, 3, 7, 8] is mapped to 3 + 2 = 5
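The mapping in this example sends each list to its length, and the homomorphism condition is h(xs ++ ys) = h(xs) + h(ys). A small check of that condition; `respects_concat` is a hypothetical helper name, and the "first element" counterexample is an illustrative addition.

```python
from typing import Callable, List


def length(xs: List[object]) -> int:
    # The mapping above: [a] -> 1, [a, b] -> 2, [b, c, d] -> 3, ...
    return len(xs)


def plus(a: int, b: int) -> int:
    return a + b


def respects_concat(h: Callable[[list], int],
                    op: Callable[[int, int], int],
                    xs: list, ys: list) -> bool:
    """Check the homomorphism condition h(xs ++ ys) == op(h(xs), h(ys))
    for one pair of lists (Python's + on lists plays the role of ++)."""
    return h(xs + ys) == op(h(xs), h(ys))


if __name__ == "__main__":
    xs, ys = [1, 2, 3], [7, 8]
    print(xs + ys, length(xs + ys))               # [1, 2, 3, 7, 8] 5
    print(respects_concat(length, plus, xs, ys))  # True
    # Hypothetical counterexample: "first element" does not respect ++ and +.
    print(respects_concat(lambda l: l[0], plus, xs, ys))  # False (1 != 1 + 7)
```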
HOMOMORPHISM: DIVIDE AND CONQUER
LIST HOMOMORPHISM
[Figure: input list B C D A B A ...; label: foldl]
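A homomorphism whose combining operator is associative can be evaluated either left to right, like foldl, or by splitting the list among p workers and merging partial results in a balanced tree; with constant-time operations that gives the O(n/p + log p) bound. The sketch below simulates both evaluation orders sequentially. The function names are hypothetical, and real parallelism (threads, processes, or cluster nodes) is deliberately left out.

```python
from functools import reduce
from operator import add
from typing import Callable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def hom_foldl(f: Callable[[A], B], op: Callable[[B, B], B], xs: List[A]) -> B:
    """Sequential, left-to-right evaluation (n - 1 applications of op)."""
    return reduce(op, map(f, xs))


def hom_divide_and_conquer(f: Callable[[A], B], op: Callable[[B, B], B],
                           xs: List[A], p: int) -> B:
    """Simulated p-worker evaluation; op must be associative, xs non-empty.

    Each worker folds one contiguous chunk of about n/p elements, then
    adjacent partial results are merged pairwise in ceil(log2 p) rounds.
    """
    n = len(xs)
    bounds = [(i * n // p, (i + 1) * n // p) for i in range(p)]
    partial = [reduce(op, map(f, xs[lo:hi])) for lo, hi in bounds if hi > lo]
    while len(partial) > 1:  # one level of a balanced reduction tree
        merged = [op(partial[i], partial[i + 1])
                  for i in range(0, len(partial) - 1, 2)]
        if len(partial) % 2 == 1:
            merged.append(partial[-1])  # odd one out moves up unchanged
        partial = merged
    return partial[0]


if __name__ == "__main__":
    letters = list("BCDABA")
    # String concatenation is associative but not commutative,
    # so this also checks that input order is preserved.
    print(hom_foldl(str, add, letters))                  # BCDABA
    print(hom_divide_and_conquer(str, add, letters, 4))  # BCDABA
```

Only associativity is needed, not commutativity: partial results are always merged with their right-hand neighbour, so the tree evaluation agrees with foldl.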
O(n/p + log p), where n is the length of the input string and p is the number of compute nodes
DFA: O((n/p + log p)|Q_D|), where |Q_D| is the number of DFA states
NFA: O((n/p + log p)|Q_N|^3), where |Q_N| is the number of NFA states
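One way to see where the |Q_D| factor comes from: represent the effect of a chunk of text as a table mapping every start state to its end state. Building a table costs |Q_D| runs over the chunk, and composing two tables costs |Q_D| lookups; composition is associative, so chunk tables form a list homomorphism. This is a sketch under assumptions: the DFA (with an explicit dead state), the chunking, and all names are illustrative, and the final fold is sequential here although it could be arranged as a tree of depth log p.

```python
from functools import reduce
from typing import Dict, Tuple

# Hypothetical DFA for (H|h)adoo*p. State 6 is an explicit dead state so that
# every (state, character) pair has exactly one successor.
NUM_STATES = 7
DEAD = 6
ACCEPT = {5}
DELTA: Dict[Tuple[int, str], int] = {
    (0, "H"): 1, (0, "h"): 1, (1, "a"): 2, (2, "d"): 3,
    (3, "o"): 4, (4, "o"): 4, (4, "p"): 5,
}

# table[q] = state reached after reading a chunk when starting in state q
Table = Tuple[int, ...]


def chunk_table(chunk: str) -> Table:
    """Run the chunk once from every state: O(len(chunk) * |Q_D|)."""
    ends = []
    for start in range(NUM_STATES):
        q = start
        for ch in chunk:
            q = DELTA.get((q, ch), DEAD)
        ends.append(q)
    return tuple(ends)


def compose(left: Table, right: Table) -> Table:
    """Table for (left chunk ++ right chunk): O(|Q_D|) and associative."""
    return tuple(right[left[q]] for q in range(NUM_STATES))


def parallel_match(text: str, p: int) -> bool:
    """Split text into p chunks, build one table per chunk, then combine."""
    n = len(text)
    chunks = [text[i * n // p:(i + 1) * n // p] for i in range(p)]
    tables = [chunk_table(c) for c in chunks]  # independent per chunk
    identity = tuple(range(NUM_STATES))        # table of the empty string
    whole = reduce(compose, tables, identity)  # sequential fold in this sketch
    return whole[0] in ACCEPT


if __name__ == "__main__":
    print(parallel_match("Hadoooop", 3))  # True
    print(parallel_match("hadp", 2))      # False
```

Because a chunk in the middle of the text does not know which state it starts in, it has to be run from all |Q_D| states; that speculative work is the price paid for independence between chunks.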
3. EVALUATION ON HADOOP
MAP REDUCE
[Figure: INPUT is split across four MAPPERs, whose outputs go to a single REDUCER that produces the OUTPUT]
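The MAPPER/REDUCER dataflow can be simulated in-process to show how a list homomorphism fits it. This sketch assumes each mapper receives one contiguous split and emits its partial result under a single shared key together with the split's index, so one reducer can restore input order and fold with an associative operator. The key name "all" and the function names are hypothetical; this is not Hadoop's Java API.

```python
from collections import defaultdict
from functools import reduce
from operator import add
from typing import Callable, Dict, List, Tuple, TypeVar

B = TypeVar("B")


def run_mapreduce(splits: List[str], f: Callable[[str], B],
                  op: Callable[[B, B], B]) -> B:
    """In-process simulation of several MAPPERs feeding one REDUCER.

    Map: each split yields ("all", (split_index, f(split))).
    Shuffle: values are grouped by key (a single key here).
    Reduce: values are sorted back by split_index, because an associative
    but non-commutative op needs the original input order, then folded.
    """
    grouped: Dict[str, List[Tuple[int, B]]] = defaultdict(list)
    for index, split in enumerate(splits):        # map phase
        grouped["all"].append((index, f(split)))  # shuffle: group by key
    results: Dict[str, B] = {}
    for key, values in grouped.items():           # reduce phase
        arrived = values[::-1]  # pretend mapper outputs arrive out of order
        ordered = [v for _, v in sorted(arrived, key=lambda kv: kv[0])]
        results[key] = reduce(op, ordered)
    return results["all"]


if __name__ == "__main__":
    splits = ["Had", "oo", "p"]
    print(run_mapreduce(splits, str, add))  # Hadoop (order restored)
    print(run_mapreduce(splits, len, add))  # 6 (total length)
```

Using a single key keeps all partial results on one reducer, which matches the single REDUCER in the figure; the combine step then has to be cheap relative to the per-split work.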
[Figure: SMALL REGULAR EXPRESSION. Execution time vs. number of nodes; series: DFA, NFA]
[Figure: LARGE REGULAR EXPRESSION. Execution time vs. number of nodes; series: DFA, NFA]
[Figure: DFA. Execution time vs. number of states; growth labeled LINEAR]
[Figure: NFA. Execution time vs. number of states; growth labeled CUBIC]
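The cubic factor for NFAs can be traced to how chunk results are combined: an NFA chunk maps each start state to a set of end states, i.e. a Boolean |Q_N| × |Q_N| relation matrix, and joining two chunks is a Boolean matrix product, O(|Q_N|^3) when done naively, consistent with the |Q_N|^3 factor in the NFA complexity given in section 2. The sketch assumes an ε-free NFA (ε-closures folded into the edges beforehand); the three-state NFA over {a, b} is a hypothetical example.

```python
from typing import Dict, List, Set, Tuple

# m[i][j] is True when state j is reachable from state i over some chunk.
Matrix = List[List[bool]]


def char_relation(n: int, edges: Dict[Tuple[int, str], Set[int]],
                  ch: str) -> Matrix:
    """Relation matrix of a single character in an ε-free NFA."""
    m = [[False] * n for _ in range(n)]
    for (src, c), dsts in edges.items():
        if c == ch:
            for dst in dsts:
                m[src][dst] = True
    return m


def compose_relations(a: Matrix, b: Matrix) -> Matrix:
    """Relation for (chunk_a ++ chunk_b): Boolean matrix product.

    Three nested loops over n = |Q_N| states give the O(|Q_N|^3) cost.
    """
    n = len(a)
    return [[any(a[i][k] and b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Hypothetical 3-state NFA over {a, b}; state 0 on "a" may stay or move to 1.
EDGES = {(0, "a"): {0, 1}, (1, "b"): {2}, (2, "b"): {2}}
N = 3
R_A = char_relation(N, EDGES, "a")
R_B = char_relation(N, EDGES, "b")

if __name__ == "__main__":
    r_aab = compose_relations(compose_relations(R_A, R_A), R_B)
    print(r_aab[0])  # states reachable from 0 after "aab": [False, False, True]
```

Compared with the DFA case, where a chunk result is a function stored as |Q_D| integers and composition is |Q_D| lookups, the relation here needs |Q_N|^2 bits and composition needs a matrix product, which is why the DFA side of the evaluation scales linearly in the number of states and the NFA side cubically.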
4. RELEVANT STUDIES
TREE HOMOMORPHISM
GPGPU BASED
MAXIMUM MARKING PROBLEMS
Matsuzaki, K., Hu, Z. and Takeichi, M.: Derivation of Parallel Programs for Solving Maximum Marking Problems on Lists (in Japanese), IPSJ Transactions on Programming, Vol.49, No.SIG 3 (PRO 36), pp.16-27 (2008).
Skillicorn, D.B.: Structured Parallel Computation in Structured Documents, Journal of Universal Computer Science, Vol.3, No.1, pp.42-68 (1997).
Nomura, Y., Emoto, K., Matsuzaki, K., Hu, Z. and Takeichi, M.: Parallelization of XPath Queries Using Tree Skeletons and Its Evaluation (in Japanese), Computer Software, Vol.24, No.3, pp.51-62 (2007).
Naghmouchi, J., Scarpazza, D.P. and Berekovic, M.: Small-ruleset Regular Expression Matching on GPGPUs: Quantitative Performance Analysis and Optimization, Proc. 24th International Conference on Supercomputing, Tsukuba, Ibaraki, Japan, June 2-4, 2010, Boku, T., Nakashima, H. and Mendelson, A. (Eds.), pp.337-348, ACM (2010).