fastregular%expression%matching% based%on%dual%glushkov%nfa · fastregular%expression%matching%...

39
Fast Regular Expression Matching Based On Dual Glushkov NFA Ryutaro Kurai , Norihito Yasuda, Hiroki Arimura, Shinobu Nagayama and ShinIchi Minato 1 PSC 2014 Prague, Czech Republic, September 1–3, 2014

Upload: others

Post on 22-Mar-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA · FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA Ryutaro%Kurai,%Norihito%Yasuda,% Hiroki%Arimura,%Shinobu%Nagayama

Fast  Regular  Expression  Matching  Based  On  Dual  Glushkov  NFA Ryutaro  Kurai,  Norihito  Yasuda,  

Hiroki  Arimura,  Shinobu  Nagayama  and  Shin-­‐Ichi  Minato

1

PSC  2014  Prague,  Czech  Republic,  September  1–3,  2014  

Page 2: FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA · FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA Ryutaro%Kurai,%Norihito%Yasuda,% Hiroki%Arimura,%Shinobu%Nagayama

Outline

•  Research  DefiniRon  •  Regular  Expression  Matching  with  NFA  – Previous  works  – Look-­‐Ahead(LA)  Matching  – Glushkov  NFA,  and  Dual  Glushkov  NFA  

•  LA  Matching  by  Using  Dual  Glushkov  NFA  – Memory  Space  Analysis  – Experimental  Results  

2

Page 3: FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA · FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA Ryutaro%Kurai,%Norihito%Yasuda,% Hiroki%Arimura,%Shinobu%Nagayama

Outline

•  Research  DefiniRon  •  Regular  Expression  Matching  with  NFA  – Previous  works  – Look-­‐Ahead(LA)  Matching  – Glushkov  NFA,  and  Dual  Glushkov  NFA  

•  LA  Matching  by  Using  Dual  Glushkov  NFA  – Memory  Space  Analysis  – Experimental  Results  

3

Page 4: FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA · FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA Ryutaro%Kurai,%Norihito%Yasuda,% Hiroki%Arimura,%Shinobu%Nagayama

Research  DefiniRon  and  Purpose  •  Regular  Expression(RE)  Matching  Problem  – Searching  substring  s  that  matches  to  the  paXern  p  of  regular  expression  R  from  given  text  T.  

– Example  •  R  =  ‘aa(a|b)’    •  T  =  ‘aabababbaaa’  

•  Purpose  of  the  Research  – Develop  high-­‐speed  and  space  efficient  matching  method  for  RE  Matching  to  process  huge  amount  of  data.  

–  InvesRgate  efficient  algorithm  for  RE  Matching.  4

Page 5: FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA · FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA Ryutaro%Kurai,%Norihito%Yasuda,% Hiroki%Arimura,%Shinobu%Nagayama

Targeted  ApplicaRon  of  Our  Research  

•  Network  intrusion  detecRon  systems.  •  Using  huge  size  of  RE  paXerns.  

•  Above  paXerns  come  from  ‘Snort’.  •  Each  line  has  about  100  symbols.  •  Each  paXerns  will  be  joined  by  conjuncRon  symbol  ‘|’.  •  Number  of  lines  is  about  6,000 5

^Remote-Party-ID\x3A\s+[^\r\n]+\x40[^\r\n]*?[\x80-\xFF]�^Authorization\x3A[^\r\n]+?response=[\x00-\x09\x0B\x0C\x0E-\x7F]*[\x80-\xFF]�^Date\x3A[^\r\n]*[\x2D\x2B]�^Content-Type[^\r\n]+[\x01-\x08\x0B\x0C\x0E-\x1F\x80-\xFF]�^Content-Type\x3A\s*[^\r\n%]*%�^Contact\x3a\s+\x22[^\x22]*\x3c�…  

Page 6: FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA · FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA Ryutaro%Kurai,%Norihito%Yasuda,% Hiroki%Arimura,%Shinobu%Nagayama

Major  Approaches  To  The  RE  Matching  Problem

Method Pros Cons

Backtracking Can  use  back  reference  Good  space  efficiency

Slows  down  for  some  bad  paXern  (when  it  backtracks  many  Rmes)

DeterminisRc  Finite  Automata  (DFA)

Fast  matching  speed Use  exponenRal  memory  space  O(2m),  m  is  the  size  of  RE  paXern

NondeterminisRc  Finite  Automata  (NFA)

Good  space  efficiency  Matches  fast  enough  

Slows  in  case  acRve  states  grow

6

Page 7: FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA · FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA Ryutaro%Kurai,%Norihito%Yasuda,% Hiroki%Arimura,%Shinobu%Nagayama

Major  Approaches  To  The  RE  Matching  Problem

Method Pros Cons

Backtracking Can  use  back  reference  Good  space  efficiency

Slows  down  for  some  bad  paXern  (when  it  backtracks  many  Rmes)

DeterminisRc  Finite  Automata  (DFA)

Fast  matching  speed Use  exponenRal  memory  space  O(2m),  m  is  the  size  of  RE  paXern

NondeterminisRc  Finite  Automata  (NFA)

Good  space  efficiency  Matches  fast  enough  

Slows  in  case  acRve  states  grow

7

We  have  to  Control  growth  of  ac6ve  states!

Page 8: FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA · FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA Ryutaro%Kurai,%Norihito%Yasuda,% Hiroki%Arimura,%Shinobu%Nagayama

Outline

•  Research  DefiniRon  •  Regular  Expression  Matching  with  NFA  – Previous  works  – Look-­‐Ahead(LA)  Matching  – Glushkov  NFA,  and  Dual  Glushkov  NFA  

•  LA  Matching  by  Using  Dual  Glushkov  NFA  – Memory  Space  Analysis  – Experimental  Results  

8

Page 9: FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA · FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA Ryutaro%Kurai,%Norihito%Yasuda,% Hiroki%Arimura,%Shinobu%Nagayama

NondeterminisRc  Finite  Automata

•  NFA  is  defined  as  following  5-­‐tuple.  – a  finite  set  of  states  E  – a  finite  set  of  input  symbols  Σ  – a  transiRon  funcRon  σ  – a  finite  set  of  iniRal  states  I  – a  finite  set  of  final  states  F    

9

Page 10: FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA · FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA Ryutaro%Kurai,%Norihito%Yasuda,% Hiroki%Arimura,%Shinobu%Nagayama

Previous  Research  on    Improving  RE  Matching  Using  NFA

•  Bit  parallel  techniques  –  S.Wu  and  U.  Manber,  92  – G.Navarro  and  M.Raffinot,  99  –  Efficient  for  the  paXerns  that  is  smaller  than  word  size  of  processors.  

•  MulR-­‐stride/mulR-­‐character  NFA  –  B.  Brodie  et  al.  06  –  TransiRons  are  labeled  by  mulR  symbols.  

•  Look-­‐ahead  Matching  – Well  known  remedy.  

10

Page 11: FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA · FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA Ryutaro%Kurai,%Norihito%Yasuda,% Hiroki%Arimura,%Shinobu%Nagayama

Outline

•  Research  DefiniRon  •  Regular  Expression  Matching  with  NFA  – Previous  works  – Look-­‐Ahead(LA)  Matching  – Glushkov  NFA,  and  Dual  Glushkov  NFA  

•  LA  Matching  by  Using  Dual  Glushkov  NFA  – Memory  Space  Analysis  – Experimental  Results  

11

Page 12: FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA · FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA Ryutaro%Kurai,%Norihito%Yasuda,% Hiroki%Arimura,%Shinobu%Nagayama

Regular  expression  example •  paXern  ‘a’  – matches  to  text  “a”  

•  paXern  ‘ab’  (concatenaRon)  – matches  to  text  “ab”  

•  paXern  ‘(a|b)’  (disjuncRon)  – matches  to  text  “a”  or  “b”  

•  paXern  ‘a*’  (cline  closure)  – matches  to  text  “”  ,  “a”,  “aa”,  or  “aa…a”    

•  combinaRon  of  above  parts  –  paXern  ‘(ab|ac)*’  matches  to  “ab”,  “ac”  or  “abac…”  

12

Page 13: FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA · FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA Ryutaro%Kurai,%Norihito%Yasuda,% Hiroki%Arimura,%Shinobu%Nagayama

Look-­‐Ahead  Matching  Method  for  Regular  Expression

•  NFA  treats  mulRple  states  as  acRve  state.  •  Each  acRve  state  needs  simulaRon  of  transiRon.  It  costs  almost  all  of  calculaRon  Rme.  

•  Therefore,  we  want  to  decrease  acRve  states.  

•  Main  Idea  – Make  state  acRve  if  and  only  if  the  state  have  transiRon  that  use  next  input  symbol.  

–  If  we  use  such  transiRon,  we  only  use  states  that  can  transit  in  connected  next  state.  This  property  can  decrease  acRve  states.

13

Page 14: FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA · FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA Ryutaro%Kurai,%Norihito%Yasuda,% Hiroki%Arimura,%Shinobu%Nagayama

Normal  NFA  Matching  Example

14

•  Input  text:  “aab…”  – We  acRvate  state  No.  2.  and  No.  3  –  Then  we  acRvate  states  No.  4  but  No.  5.  

•  Input  text:  “adb..”  –  we  acRvate  state  No.  2.  and  No.3  –  But,  for  next  symbol,  we  can  not  acRvate  any  state.

1

2 4

3 5

a a

a

a

b 7

b

6

a b

R  =  (aa|ab)(a|b) t1 source des6na6on

a 1 2

a 1 3

a 2 4

b 3 5

a 4 6

b 4 7

a 5 6

b 6 7

Page 15: FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA · FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA Ryutaro%Kurai,%Norihito%Yasuda,% Hiroki%Arimura,%Shinobu%Nagayama

Look-­‐Ahead  Matching  Example

15

•  Input  text:  “aab…”  – We  acRvate  state  only  No.  2.  #  of  acRve  state  is  decreased.  

–  Then  we  acRvate  states  No.  4  •  Input  text:  “adb..”  –  Since  ‘d’  is  not  present  in  t2  column,  no  state  acRvated.

1

2 4

3 5

a a

a

a

b 7

b

6

a b

R  =  (aa|ab)(a|b) t1 t2 source des6na6on

a a 1 2

a b 1 3

a a 2 4

a b 2 4

b a 3 5

b b 3 5

a * 4 6

b * 4 7

a * 5 6

b * 6 7

Page 16: FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA · FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA Ryutaro%Kurai,%Norihito%Yasuda,% Hiroki%Arimura,%Shinobu%Nagayama

Look-­‐Ahead  Matching  Example

16

•  LA  matching  can  decrease  number  of  acRve  states.  

•  LA  matching  can  give  up  matching  in  early  Rming.  

•  But,  it  use  extra  memory  space  for  extended  transiRon  table.

1

2 4

3 5

a a

a

a

b 7

b

6

a b

R  =  (aa|ab)(a|b) t1 t2 source des6na6on

a a 1 2

a b 1 3

a a 2 4

a b 2 4

b a 3 5

b b 3 5

a * 4 6

b * 4 7

a * 5 6

b * 6 7

Page 17: FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA · FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA Ryutaro%Kurai,%Norihito%Yasuda,% Hiroki%Arimura,%Shinobu%Nagayama

Outline

•  Research  DefiniRon  •  Regular  Expression  Matching  with  NFA  – Previous  works  – Look-­‐Ahead(LA)  Matching  – Glushkov  NFA,  and  Dual  Glushkov  NFA  

•  LA  Matching  by  Using  Dual  Glushkov  NFA  – Space  Analysis  – Experimental  Results  

17

Page 18: FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA · FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA Ryutaro%Kurai,%Norihito%Yasuda,% Hiroki%Arimura,%Shinobu%Nagayama

Thompson  NFA  (T-­‐NFA)

•  NFA  that  transformed  from  regular  expression  by  Thompson’s  method.  

•  Thompson’s  method  convert  regular  expression  syntax  tree  into  parRal  NFA  inducRvely.  

•  It  has  following  property  –  It  has  epsilon  transiRon  – #  of  transiRons  is  4m  at  most  .  – #  of  states  is  2m  at  most.    

0

1

2

3A

4G

5T

6A7

8

9

10

11A

12T

13G

14A 1615A�

18 R=  (AT|GA)(AG|TAA)*

Page 19: FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA · FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA Ryutaro%Kurai,%Norihito%Yasuda,% Hiroki%Arimura,%Shinobu%Nagayama

Glushkov  NFA  (G-­‐NFA) •  NFA  that  transformed  from  regular  expression.  •  Glushkov  NFA  has  following  properRes.  –  It  has  no  ε-­‐transiRons.  –  Incoming  transiRons  are  labeled  by  the  same  symbol.  –  It  has  only  one  iniRal  state.  –  It  has  one  or  more  final  states.  

19

1

2 4

3 5

a a

a

a

b 7 b

6

a b

R  =  (aa|ab)(a|b)

Page 20: FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA · FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA Ryutaro%Kurai,%Norihito%Yasuda,% Hiroki%Arimura,%Shinobu%Nagayama

Dual  Glushkov  NFA

•  NFA  with  dual  property  of  Glushkov  NFA.  –  no  ε-­‐transiRons.  – Outgoing  transiRons  are  labeled  by  the  same  symbol.  –  It  has  only  one  final  state.  –  It  has  one  or  more  iniRal  states.  

20

1 3

2 4

a a

a 6 b

5

ba 7

a

b

R  =  (aa|ab)(a|b)

Page 21: FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA · FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA Ryutaro%Kurai,%Norihito%Yasuda,% Hiroki%Arimura,%Shinobu%Nagayama

Outline

•  Research  DefiniRon  •  Regular  Expression  Matching  with  NFA  – Previous  works  – Look-­‐Ahead(LA)  Matching  – Glushkov  NFA,  and  Dual  Glushkov  NFA  

•  LA  Matching  by  Using  Dual  Glushkov  NFA  – Memory  Space  Analysis  – Experimental  Results  

21

Page 22: FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA · FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA Ryutaro%Kurai,%Norihito%Yasuda,% Hiroki%Arimura,%Shinobu%Nagayama

Proposed  Method  •  Look-­‐Ahead  Matching  by  Using  Dual  Glushkov  NFA  

•  To  solve  the  problem  of  consumpRon  of  extra  memory  space  by  the  Look-­‐Ahead  matching  method,  we  propose  the  approach  of  Dual  G-­‐NFA  to  reduce  memory  space  consumpRon.  

•  If  we  use  Look-­‐Ahead  Matching  by  dual  G-­‐NFA,  we  can  simulate  NFAs  without  increasing  the  size  of  transiRon  tables.  

22

Page 23: FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA · FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA Ryutaro%Kurai,%Norihito%Yasuda,% Hiroki%Arimura,%Shinobu%Nagayama

Memory  Space  Comparison

23

R  =  (aa|ab)(a|b) 1

2 4

3 5

aa

a

a

b7

6

ab

1 3

2 4

aa

a6

b

5

ba

7

a

bb

t1 t2 source des6na6on a a 1 2 a b 1 3 a a 2 4 a b 2 4 b a 3 5 b b 3 5 a * 4 6 b * 4 7 a * 5 6 b * 6 7

t1 t2 source des6na6on a a 1 3 a b 2 4 a a 3 5 a b 3 6 b a 4 5 b b 4 6 a * 5 7 b * 6 7

Glushkov  NFA Dual  Glushkov  NFA

Page 24: FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA · FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA Ryutaro%Kurai,%Norihito%Yasuda,% Hiroki%Arimura,%Shinobu%Nagayama

Memory  Space  Comparison

24

t1 t2 source des6na6on a a 1 2 a b 1 3 a a 2 4 a b 2 4 b a 3 5 b b 3 5 a * 4 6 b * 4 7 a * 5 6 b * 6 7

t1 t2 source des6na6on a a 1 3 a b 2 4 a a 3 5 a b 3 6 b a 4 5 b b 4 6 a * 5 7 b * 6 7

Glushkov  NFA Dual  Glushkov  NFA

•  The  size  of  transiRon  table  of  Glushkov  NFA  is  O(|E||Σ|)  

•  The  size  of  Dual  Glushkov  NFA  is  O(|E|)  •  Our  method  use  memory  space  by|Σ|  Rmes  smaller.  

Page 25: FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA · FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA Ryutaro%Kurai,%Norihito%Yasuda,% Hiroki%Arimura,%Shinobu%Nagayama

Look-­‐Ahead  Matching    by  Using  Dual  Glushkov  NFA  

•  The  size  of  transiRon  table  of  Glushkov  NFA  is  O(|E||Σ|)  

•  The  size  of  Dual  Glushkov  NFA  is  O(|E|)  •  Our  method  use  memory  space  by|Σ|  Rmes  smaller.  

25

Page 26: FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA · FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA Ryutaro%Kurai,%Norihito%Yasuda,% Hiroki%Arimura,%Shinobu%Nagayama

Outline

•  Research  DefiniRon  •  Regular  Expression  Matching  with  NFA  – Previous  works  – Look-­‐Ahead(LA)  Matching  – Glushkov  NFA,  and  Dual  Glushkov  NFA  

•  LA  Matching  by  Using  Dual  Glushkov  NFA  – Memory  Space  Analysis  – Experimental  Results  

26

Page 27: FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA · FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA Ryutaro%Kurai,%Norihito%Yasuda,% Hiroki%Arimura,%Shinobu%Nagayama

Experimental  Serngs •  Method:  match  3  types  of  regular  expression  paXerns  with  one  text  data.  

•  The  text  data  used  in  the  experiments  is    “English.100MB”  from  Pizza  &  Chili  corpus.    

•  We  compared  proposed  method  (Dual  G-­‐NFA  with  LA)  with  Dual  G-­‐NFA,  G-­‐NFA,  G-­‐NFA  with  LA,  Thompson  NFA  and  NR-­‐grep[Navarro  01].  

•  Each  experiment  is  iterated  10  Rmes,  and  the  average  elapsed  Rme  is  recorded.  

•  The  elapsed  Rme  and  average  size  of  acRve  states  are  compared.  

27

Page 28: FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA · FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA Ryutaro%Kurai,%Norihito%Yasuda,% Hiroki%Arimura,%Shinobu%Nagayama

Fixed  paXern

•  PaXern  – n  fixed  strings  are  joined  by  disjuncRon  symbol.  – n  fixed  strings  are  randomly  collected  from  /usr/share/dict/words  on  Mac  OS  X  10.9  

•  Example  R  =  alpha|blabo|chary|delta|…|Juliet      

28

Page 29: FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA · FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA Ryutaro%Kurai,%Norihito%Yasuda,% Hiroki%Arimura,%Shinobu%Nagayama

Results:  Elapsed  Time(sec)

0  

20  

40  

60  

80  

100  

120  

140  

160  

180  

200  

20   40   60   80   100   120   140   160   180   200  

G-­‐NFA  

G-­‐NFA  with  LA  

Dual  G-­‐NFA  

Dual  G-­‐NFA  with  LA  

Thompson  NFA  

NR-­‐grep  

29

#  of  strings  n

sec

Our  proposed  method  (red  line)  is  2nd  fastest,  and  it  has  beXer  space  efficiency  than  1st  fastest  G-­‐NFA  with  LA

Page 30: FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA · FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA Ryutaro%Kurai,%Norihito%Yasuda,% Hiroki%Arimura,%Shinobu%Nagayama

Results:  #  of  Average  AcRve  States

0  

1  

2  

3  

4  

5  

6  

7  

8  

9  

20   40   60   80   100   120   140   160   180   200  

G-­‐NFA  

G-­‐NFA  with  LA  

Dual  G-­‐NFA  

Dual  G-­‐NFA  with  LA  

30

#  of  strings  n

#  of  average  states

AcRve  state  of  our  proposed  method  (red  line)  is  smallest.

Page 31: FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA · FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA Ryutaro%Kurai,%Norihito%Yasuda,% Hiroki%Arimura,%Shinobu%Nagayama

Flexible  paXern

•  PaXern  – n  fixed  strings  used  same  as  previous  paXern.  – a  special  symbol  is  inserted  to  each  of  the  n  fixed  strings.  

–  Inserted  strings  are  joined  by  disjuncRon  symbol.  

•  Example  – R  =  al*pha|bla+bo|ch+ary|d*elta|…|Juliet      

31

Page 32: FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA · FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA Ryutaro%Kurai,%Norihito%Yasuda,% Hiroki%Arimura,%Shinobu%Nagayama

Results:  Elapsed  Time(sec)

0  

20  

40  

60  

80  

100  

120  

140  

160  

180  

200  

20   40   60   80   100   120   140   160   180   200  

G-­‐NFA  

G-­‐NFA  with  LA  

Dual  G-­‐NFA  

Dual  G-­‐NFA  with  LA  

Thompson  NFA  

NR-­‐grep  

32

sec

#  of  strings  n

If  the  paXerns  have  special  symbols  of  RE,  our  proposed  method  (red  line)  is  2nd  fastest  again.

Page 33: FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA · FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA Ryutaro%Kurai,%Norihito%Yasuda,% Hiroki%Arimura,%Shinobu%Nagayama

Results:  #  of  Average  AcRve  States

0  

1  

2  

3  

4  

5  

6  

7  

8  

9  

20   40   60   80   100   120   140   160   180   200  

G-­‐NFA  

G-­‐NFA  with  LA  

Dual  G-­‐NFA  

Dual  G-­‐NFA  with  LA  

33

#  of  strings  n

#  of  average  states

If  the  paXerns  have  special  symbol,  the  number  of  acRve  states  of  our  proposed  method  (red  line)  is  smallest.

Page 34: FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA · FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA Ryutaro%Kurai,%Norihito%Yasuda,% Hiroki%Arimura,%Shinobu%Nagayama

Reasonable  paXerns •  Some  reasonable  regular  expression  paXerns  are  made  as  

follows.  •  “Suffix”  

–  [a-­‐zA-­‐Z]+(able|ible|al|…|ise)                          (total  35  suffixes)    •  “Prefix”  

–  (in|il|im|infra|…|under)  [a-­‐zA-­‐Z]+  (total  32  prefixes)  •  “Names”  

–  (Jackson|Aiden|...|Jack)  (Smith|Johnson|...|Wilson)    •  “User”  

–  [a-­‐zA-­‐Z]+@[a-­‐zA-­‐Z]+  •  “Title”  

–  ([A-­‐Z]+␣)+  

34

Page 35: FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA · FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA Ryutaro%Kurai,%Norihito%Yasuda,% Hiroki%Arimura,%Shinobu%Nagayama

Results paXern T-­‐NFA NR-­‐grep G-­‐NFA G-­‐NFA  

with  LA Dual  G-­‐NFA   Dual  G-­‐NFA  

with  LA

suffix 113.48   20.24 9.74 7.51 106.64 3.35

prefix 14.33   5.295 2.74 3.97 78.39 3.82

names 12.95 0.216 2.97 2.74 3.21 2.76

user 78.14 0.08 12.11 7.41 185.22 3.36

Rtle 38.88 0.186 2.93 2.38 2.68 2.21

paXern G-­‐NFA G-­‐NFA  with  LA

Dual  G-­‐NFA   Dual  G-­‐NFA   with  LA

suffix 2.33 1.65 50.44 1.15

prefix 1.50 1.15 8.11 0.33

names 1.01 1.00 0.01 0.001

user 1.77 1.59 40.67 0.59

Rtle 1.03 1.01 0.75   0.01 35

Elapsed  Rme  (sec)

Average  number  of  acRve  states  

Our  Method:  •  Fastest  for  ‘suffix’;  •  2nd  fastest  for  other  

paXerns;  •  Decreases  acRve  

states  in  all  paXerns.

Colors    •  1st  •  2nd  

Page 36: FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA · FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA Ryutaro%Kurai,%Norihito%Yasuda,% Hiroki%Arimura,%Shinobu%Nagayama

Conclusion

•  We  developed  efficient  regular  expression  matching  method  that  combines  Dual  Glushkov  NFA  and  Look-­‐Ahead  Matching.  

•  Our  proposed  method  is  very  fast  especially  in  "large-­‐scale"  &  "sparsely  acRve"  paXerns.  

•  We  plan  to  use  our  method  for  extended  regular  expressions  such  as  ‘character  class’,  ‘wild  card’,  and  so  on.  

•  We  also  plan  to  use  our  method  to  network  intrusion  detecRon  paXerns.  

36

Page 37: FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA · FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA Ryutaro%Kurai,%Norihito%Yasuda,% Hiroki%Arimura,%Shinobu%Nagayama

Thank  you!

37

Page 38: FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA · FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA Ryutaro%Kurai,%Norihito%Yasuda,% Hiroki%Arimura,%Shinobu%Nagayama

Thank  you!

38

Page 39: FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA · FastRegular%Expression%Matching% Based%On%Dual%Glushkov%NFA Ryutaro%Kurai,%Norihito%Yasuda,% Hiroki%Arimura,%Shinobu%Nagayama

What  is  Duality?

39

•  Glushkov  NFA  –  Incoming  transiRons  are  labeled  by  the  same  symbol.  

–  It  has  only  one  iniRal  state.  –  It  has  one  or  more  final  states.  

39

1

2 4

3 5

a a

a

a

b 7

6

a b

R  =  (aa|ab)(a|b)

•  Dual  Glushkov  NFA  –  Outgoing  transi6ons  are  labeled  by  the  same  symbol.  

–  It  has  only  one  final  state.  –  It  has  one  or  more  iniRal  states.  

39

1 3

2 4

a a

a 6

b

5

b a

7

a

b

R  =  (aa|ab)(a|b)

b