Notes on Sequence Binary Decision Diagrams:
Relationship to Acyclic Automata and Complexities of Binary Set Operations
Shuhei Denzumi (1), Ryo Yoshinaka (2, 1), Shin-ichi Minato (1, 2), and Hiroki Arimura (1)
1) Hokkaido University
2) JST ERATO Minato Discrete Structure Manipulation System Project
Background
Research on string processing has become active.
Massive online data: the Internet and sensor networks.
String matching and string mining problems.
Data mining
Input data should be represented in a compact form.
Computation on the compressed structure is needed.
Notes on Sequence Binary Decision Diagrams: Relationship to Acyclic Automata and Complexities of Binary Set Operations, by Shuhei Denzumi, Ryo Yoshinaka, Shin-ichi Minato, and Hiroki Arimura, 2011-08-30 (TUE), Prague Stringology Conference 2011
[Figure: inputs are compressed into a data structure; operations applied to the data structure produce the result.]
Manipulatable & Compact
Manipulatable compact data structures
Represent data in compressed form
Provide operations that manipulate the data while it stays compressed
Have received much attention in recent years
Binary Decision Diagram (BDD): LSI area
Deterministic Finite Automata (DFA): natural language processing area
[Figure: several inputs are compacted into data structures D1, D2, and D3, which are then combined by operations.]
What is Sequence BDD?
Sequence Binary Decision Diagram (SeqBDD, SDD): Loekito, Bailey, and Pei (2009)
Graph structure
Represents finite sets of strings of finite length
Basic properties of SDDs are unknown
Minimization
Size complexity
Operation time
Applications
Data mining
Graph mining
Human genome sequencing
[Figure: a collection of texts is stored in a Sequence Binary Decision Diagram.]
Family of BDDs
Compact representations for discrete structures
With rich algebraic operations
SDD [Loekito et al. 2009]: sets of strings
{a, b, ab, bab, abbab}, {abc, acb, bac, bca}
ZDD [Minato 1993]: sets of combinations
{{a}, {b}, {a, b}}, {{a}, {b}, {c}, {a, b, c}}
BDD [Bryant 1986]: Boolean functions
xy ∨ yz ∨ zx
¬xyz ∨ x¬yz ∨ xy¬z
Result
Relationship to Acyclic Deterministic Finite Automata (ADFA)
Translation from an SDD to an ADFA and vice versa
An SDD is never larger than an equivalent ADFA
An SDD can be |Σ| times smaller than an equivalent ADFA
Computational complexity of binary set operations
Generalization of eight set operations
Tight analysis of the time complexity of the binary set operation algorithm
Experimental results
SDDs can be smaller than ADFAs
Binary operation time
Preliminary
Definition
Σ: alphabet (totally ordered by ≺)
Internal nodes: labeled with letters of Σ (a, b, …, z)
1/0-terminal nodes: 1 / 0
1/0-edges: edges to the 1-child / 0-child
SDD: directed acyclic graph
Each internal node S has a triple τ(S) = 〈S.lab, S.1, S.0〉
S.lab: label
S.1: 1-child
S.0: 0-child
Ordering rule: N.lab ≺ (N.0).lab (when N.0 is an internal node)
[Figure: an internal node S with label S.lab, its 1-child S.1, and its 0-child S.0; labels are ordered a ≺ b ≺ … ≺ z.]
Semantics
L(N): the set of strings that N represents
L(1-terminal) = {ε}
L(0-terminal) = {}
L(N) = N.lab · L(N.1) ∪ L(N.0)
A path from the root to the 1-terminal node represents a string.
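The three equations for L(N) can be read directly as a recursive procedure. The following is a minimal Python sketch, not code from the talk: the names `Node` and `language` are hypothetical, the terminals are modelled as the Python values `True` (1-terminal) and `False` (0-terminal), and the example SDD is one structure that is consistent with the sets {b}, {a, b}, {bb}, and {aa, ab, bb} shown on the slides (its exact edges are an assumption).

```python
from typing import NamedTuple

class Node(NamedTuple):
    lab: str       # label, a letter of the alphabet
    one: object    # 1-child: a Node, True (1-terminal) or False (0-terminal)
    zero: object   # 0-child: a Node, True (1-terminal) or False (0-terminal)

def language(n):
    """Return L(n) as a Python set, following the three defining equations."""
    if n is True:
        return {""}        # L(1-terminal) = {ε}
    if n is False:
        return set()       # L(0-terminal) = {}
    # L(N) = N.lab · L(N.1) ∪ L(N.0)
    return {n.lab + w for w in language(n.one)} | language(n.zero)

# Assumed example structure (labels respect a ≺ b along 0-edges).
b = Node("b", True, False)      # {b}
ab = Node("a", True, b)         # {a, b}
bb = Node("b", b, False)        # {bb}
root = Node("a", ab, bb)        # {aa, ab, bb}

print(sorted(language(root)))   # ['aa', 'ab', 'bb']
```

Enumerating L(N) can take time exponential in the number of nodes, so this is only meant for checking small examples.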
[Figure: an example SDD in which every node is annotated with the set it represents: the 1-terminal with {ε}, the 0-terminal with {}, the internal nodes with {b}, {a, b}, and {bb}, and the root with {aa, ab, bb}.]
Comparison to ADFA
[Figure: the SDD for {aa, ab, bb} next to an equivalent ADFA; the 1-terminal corresponds to the accept state and the 0-terminal to the reject state.]
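Reduction process
Suppression
Every node N must satisfy N.1 ≠ 0-terminal node; a node whose 1-child is the 0-terminal is replaced by its 0-child
In an ADFA, this corresponds to removing edges that point to the dead state
Merging
τ(N) = τ(N') ⇒ N = N'
In an ADFA, this corresponds to sharing all equivalent states
Theorem
Under these rules, the SDD for a given set of strings is unique and minimal
Likewise, ADFAs have a unique canonical form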
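Both reduction rules can be enforced at the moment a node is requested. The sketch below is an illustration under assumptions, not the authors' implementation: `Node`, `getnode`, and `uniquetable` are hypothetical names, terminals are the Python values `True` and `False`, and nodes are compared by identity so that "same node" means "same object".

```python
class Node:
    """Internal SDD node; hashed and compared by identity."""
    __slots__ = ("lab", "one", "zero")

    def __init__(self, lab, one, zero):
        self.lab, self.one, self.zero = lab, one, zero

uniquetable = {}   # maps the triple (lab, 1-child, 0-child) to its unique node

def getnode(lab, one, zero):
    """Return the reduced node for the triple <lab, one, zero>."""
    # Suppression: lab · {} ∪ L(zero) = L(zero), so no node is needed.
    if one is False:
        return zero
    # Merging: reuse the node if the same triple was requested before.
    key = (lab, one, zero)
    node = uniquetable.get(key)
    if node is None:
        node = Node(lab, one, zero)
        uniquetable[key] = node
    return node

b1 = getnode("b", True, False)
b2 = getnode("b", True, False)
print(b1 is b2)                        # True: equivalent nodes are merged
print(getnode("a", False, b1) is b1)   # True: the node is suppressed
print(len(uniquetable))                # 1
```

Because every node is built through `getnode`, two nodes with equal triples are the same object, which is what makes the triple a valid hash key.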
[Figure: merging shares two nodes N and N' that have the same label x and the same children; suppression replaces a node N with label a whose 1-child is the 0-terminal by N.0, because a · {} ∪ L(N.0) = L(N.0).]
Characteristic
Almost isomorphic to acyclic deterministic finite automata
BDD/ZDD techniques are applicable
Binary form
Simple recursive algorithms
Easy to implement
Rich collection of operations
Use of hash tables
To share equivalent nodes
To share intermediate computation results
[Figure: SDD placed between BDD/ZDD and ADFA.]
Relationship to Acyclic Automata
Size
An SDD node corresponds to an ADFA edge
The description size is proportional to
|N|: the number of internal nodes in SDD N
|A|: the number of edges in ADFA A
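To make the two size measures concrete, the following sketch counts |N| for a small SDD and lists the edges of an ADFA read off from it. The translation used here is an assumption made for illustration (it need not be the construction in the paper): a state is the root or any 1-child, and each node on a state's 0-chain contributes one outgoing edge. The names `Node`, `count_internal_nodes`, and `adfa_edges` are hypothetical, and the example SDD is the assumed structure for {aa, ab, bb}.

```python
class Node:
    __slots__ = ("lab", "one", "zero")

    def __init__(self, lab, one, zero):
        self.lab, self.one, self.zero = lab, one, zero

def count_internal_nodes(root):
    """|N|: number of distinct internal nodes reachable from root."""
    seen, stack = set(), [root]
    while stack:
        n = stack.pop()
        if isinstance(n, Node) and id(n) not in seen:
            seen.add(id(n))
            stack += [n.one, n.zero]
    return len(seen)

def adfa_edges(root):
    """Edges (state, letter, target) of an ADFA whose states are SDD nodes.

    Every node M on the 0-chain of a state contributes one edge labelled
    M.lab that leads to the state M.one. A state would be accepting when
    its 0-chain ends in the 1-terminal (True).
    """
    edges, seen, stack = [], set(), [root]
    while stack:
        state = stack.pop()
        if not isinstance(state, Node) or id(state) in seen:
            continue
        seen.add(id(state))
        m = state
        while isinstance(m, Node):       # walk the 0-chain of this state
            edges.append((state, m.lab, m.one))
            stack.append(m.one)
            m = m.zero
    return edges

b = Node("b", True, False)     # {b}
ab = Node("a", True, b)        # {a, b}; b is shared as a 0-child
bb = Node("b", b, False)       # {bb}
root = Node("a", ab, bb)       # {aa, ab, bb}
print(count_internal_nodes(root), len(adfa_edges(root)))   # 4 5
```

In this example the node `b` is visited on two different 0-chains, so one SDD node yields two ADFA edges; this kind of sharing is why the SDD can have fewer nodes than the ADFA has edges.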
[Figure: SDD nodes labeled a, b, and c shown with the corresponding ADFA edges labeled a, b, and c.]
Theorem: Size comparison
For an equivalent SDD N and ADFA A
From an ADFA A to an SDD N: |N| ≤ |A| (an SDD is never larger than an equivalent ADFA)
From an SDD N to an ADFA A: an upper bound on |A| in terms of |N| [formula not recoverable from the transcript]
An SDD can be |Σ| times smaller than an ADFA
0-child sharing
[Figure: an example of 0-child sharing, with nodes and edges labeled a, b, c, d, and e.]
Example
[Figure: SDD S and ADFA A for the set {a^n b^i c^j : n = 0, …, 4, i, j = 0, 1}; |S| = 6, |A| = 14.]
Experiment
Input: Canterbury corpus
BibleAll: bible.txt, BibleBi: all bigrams from bible.txt, Ecoli: E.coli.txt
Fac means that all factors of the input data are stored
[Figure: size ratio (SDD size / DFA size, axis from 0.6 to 1.0) versus input size (bytes, axis from 0 to 2,000,000) for BibleAll, BibleBi, BibleAll (Fac), BibleBi (Fac), and Ecoli (Fac).]
Binary Set Operation Algorithm
Set operation
A binary set operation ♢ ∈ {∪, ∩, \, …}
Input: two SDDs P, Q
Output: an SDD R such that L(R) = L(P) ♢ L(Q)
[Figure: a binary set operation takes SDDs P and Q and produces the SDD P ♢ Q.]
Apply algorithm
Originally for BDD [Bryant 1986], applied to SDD
Based on the definition L(N) = N.lab · L(N.1) ∪ L(N.0)
In the operation (when P.lab = Q.lab):
L(P) ♢ L(Q) = P.lab · (L(P.1) ♢ L(Q.1)) ∪ (L(P.0) ♢ L(Q.0))
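The recursion can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the algorithm as given in the talk: nodes are plain tuples `(lab, one, zero)`, terminals are `True` and `False`, Python's string order stands in for ≺, and the names `node`, `cofactors`, and `apply` are hypothetical. The slide only spells out the case P.lab = Q.lab; the handling of different labels and of terminals below is filled in here by treating a missing label as an empty 1-side. No caching is done in this version.

```python
import operator

def node(lab, one, zero):
    """Build a node as a plain tuple, applying the suppression rule."""
    return zero if one is False else (lab, one, zero)

def cofactors(p, x):
    """Split p into (1-side for letter x, remaining 0-side)."""
    if isinstance(p, tuple) and p[0] == x:
        return p[1], p[2]
    return False, p          # no string of L(p) starts with x

def apply(op, p, q):
    """Return an SDD for L(p) op L(q).

    op acts on membership booleans and must map (False, False) to False,
    e.g. or for union, and for intersection, 'a and not b' for difference.
    """
    if isinstance(p, bool) and isinstance(q, bool):
        return op(p, q)
    x = min(n[0] for n in (p, q) if isinstance(n, tuple))
    p1, p0 = cofactors(p, x)
    q1, q0 = cofactors(q, x)
    return node(x, apply(op, p1, q1), apply(op, p0, q0))

b = ("b", True, False)                    # {b}
P = ("a", True, b)                        # {a, b}
Q = ("b", ("b", True, True), False)       # {b, bb}

print(apply(operator.and_, P, Q) == b)                              # True
print(apply(operator.or_, P, Q) == ("a", True, Q))                  # True
print(apply(lambda s, t: s and not t, P, Q) == ("a", True, False))  # True
```

The three results represent {b}, {a, b, bb}, and {a}. Without a cache the same pair of sub-diagrams may be processed many times, which is the problem the hash tables address.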
[Figure: for nodes P and Q with the same label a, with children P1, P0 and Q1, Q0, the node for P ♢ Q has label a, 1-child P1 ♢ Q1, and 0-child P0 ♢ Q0.]
Hash table technique
Key-value hash tables
Uniquetable
Key: 〈letter x, SDD node N1, SDD node N0〉
Value: SDD node N with τ(N) = 〈x, N1, N0〉
Opcache
Key: 〈operation id ♢, SDD node P, SDD node Q〉
Value: SDD node R such that R = P ♢ Q
[Figure: the Uniquetable maps a key triple 〈x, N1, N0〉 to the node N; the Opcache maps a key triple 〈♢, P, Q〉 to the node R = P ♢ Q.]
Node create process
Any SDD node needed during computation is created via this process
Once an internal node is registered in the Uniquetable, no equivalent node is created again.
Check the Uniquetable for the key 〈x, N1, N0〉.
If it exists: return the registered node.
If it does not exist: create a new node and return it.
Time complexity
When P ♢ Q is executed
Every recursive call consults the Opcache
At most |P| × |Q| distinct recursive calls are invoked
(Assuming that access to the hash tables takes constant time)
Naïve method
Prepares a table of size |P| × |Q|
This method
Creates no useless or redundant nodes
Theorem
The worst-case running time is O(|P| |Q|)
There exist examples that need Ω(|P| |Q|) time
Hence the upper and lower bounds match
Check the Opcache for the key 〈♢, P, Q〉.
If it exists: P ♢ Q has already been computed, so return it.
If it does not exist: continue the computation on the 0-side and the 1-side.
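The effect of the Opcache can be observed on a small experiment. The sketch below is an assumption-laden illustration, not the authors' code: `Node`, `getnode`, `cofactors`, `apply`, `uniquetable`, and `opcache` are hypothetical names, terminals are `True` and `False`, and Python's string order stands in for ≺. The test input is an SDD with 2k nodes that represents all 2^k strings of length k over {a, b}.

```python
import operator

class Node:
    __slots__ = ("lab", "one", "zero")

    def __init__(self, lab, one, zero):
        self.lab, self.one, self.zero = lab, one, zero

uniquetable, opcache = {}, {}

def getnode(lab, one, zero):
    if one is False:                       # suppression rule
        return zero
    key = (lab, one, zero)
    if key not in uniquetable:             # merging rule
        uniquetable[key] = Node(lab, one, zero)
    return uniquetable[key]

def cofactors(p, x):
    if isinstance(p, Node) and p.lab == x:
        return p.one, p.zero
    return False, p

def apply(name, op, p, q):
    if isinstance(p, bool) and isinstance(q, bool):
        return op(p, q)
    key = (name, p, q)
    if key in opcache:                     # already computed: return it
        return opcache[key]
    x = min(n.lab for n in (p, q) if isinstance(n, Node))
    p1, p0 = cofactors(p, x)
    q1, q0 = cofactors(q, x)
    r = getnode(x, apply(name, op, p1, q1), apply(name, op, p0, q0))
    opcache[key] = r
    return r

def all_strings(k):
    """SDD for every string of length k over {a, b}; it has 2k nodes."""
    n = True
    for _ in range(k):
        n = getnode("a", n, getnode("b", n, False))
    return n

p = all_strings(20)
r = apply("and", operator.and_, p, p)
print(r is p, len(opcache))    # True 40
```

Only 40 pairs are ever expanded, although the diagram has more than a million root-to-terminal paths; without the Opcache the recursion would follow each of them separately. The result is the very same object as `p` because every node goes through `getnode`.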
Experiment
Operation time
Prepare two SDDs, each storing all factors of a random text of length n
Measure the time needed to compute each operation
[Figure: execution time (ms, axis from 0 to 1600) of union, intersection, and difference versus length of text (letters, axis from 0 to 100,000).]
Conclusion
Relationship to Acyclic Automata
An SDD can be |Σ| times smaller than an ADFA
For real data, SDDs are 10 to 20% more compact than ADFAs
Computational complexity of binary set operations
The worst-case time complexity is quadratic
A tight time bound is given
In our experiments, the operation time is almost linear
Future work
Efficient implementation of various operations
A substring index based on SDDs
A construction algorithm for factor SDDs