Network Motifs: Simple Building Blocks of Complex Networks
Beyond Degree Distribution & Diameter
Network Motifs: Consider all possible ways to connect 3 nodes with directed edges:
(Milo et al., Science, 2002)
Finding Over-represented Subgraphs
For each possible motif M:
  Let cM be the number of times M occurs in graph G.
  Estimate pM = Pr[# occurrences ≥ cM] when edges are shuffled.
  Output M if pM < 0.01 and cM > 4.
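The test above can be sketched in a few lines. This is a minimal illustration, not the authors' code: `count_feed_forward`, `empirical_p_value`, and `is_motif` are hypothetical names, the counter handles only one motif (the feed-forward loop x→y, y→z, x→z) and counts non-induced occurrences, and the counts for the shuffled graphs are assumed to be computed elsewhere.

```python
def count_feed_forward(edges):
    """Count feed-forward loops x->y, y->z, x->z in a directed edge list.

    Assumes no self-loops. Counts non-induced occurrences (extra edges
    among x, y, z are allowed).
    """
    edge_set = set(edges)
    successors = {}
    for u, v in edge_set:
        successors.setdefault(u, set()).add(v)
    count = 0
    for x, y in edge_set:
        for z in successors.get(y, ()):
            if z != x and (x, z) in edge_set:
                count += 1
    return count


def empirical_p_value(real_count, random_counts):
    """Estimate Pr[# occurrences >= real_count] from shuffled-graph counts."""
    hits = sum(1 for c in random_counts if c >= real_count)
    return hits / len(random_counts)


def is_motif(real_count, random_counts, alpha=0.01, min_count=4):
    """Apply the slide's thresholds: pM < 0.01 and cM > 4."""
    return (empirical_p_value(real_count, random_counts) < alpha
            and real_count > min_count)
```

With 1000 shuffled graphs, the smallest nonzero p-value this estimate can report is 0.001, so the number of shuffles bounds the resolution of the test.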
To generate a random graph for the 3-node motifs:
[Figure: edge swaps among nodes a, b, c, d]
Single and double edges are swapped separately.
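A minimal sketch of degree-preserving edge swapping, assuming the swap rewires a→b, c→d into a→d, c→b. The function name `swap_edges` and its parameters are hypothetical. As a simplification, this version does not treat single and double (mutual) edges separately, so it preserves each node's in- and out-degree but not its number of mutual edges.

```python
import random


def swap_edges(edges, n_swaps, seed=0):
    """Randomize a directed graph while keeping every in- and out-degree.

    Simplification (an assumption of this sketch): mutual edges are not
    handled separately from single edges.
    """
    rng = random.Random(seed)
    edges = list(edges)
    present = set(edges)
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        # Reject swaps that would create a self-loop or a duplicate edge.
        if a == d or c == b:
            continue
        if (a, d) in present or (c, b) in present:
            continue
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges
```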
To Generate Random Graphs With a Given Distribution of (n-1)-node subgraphs:
Define an “energy” on a vector of occurrences of motifs:
$$\mathrm{Energy}(V_{rand}) = \sum_{M} \frac{|V_{real,M} - V_{rand,M}|}{V_{real,M} + V_{rand,M}}$$
When Vrand = Vreal, the energy is 0.
Start with a randomized network.
Until Energy is small:
  Make a random swap.
  If the swap reduces the energy, keep it.
  Otherwise, keep it with probability exp(-ΔE/T).
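The energy and the acceptance rule can be sketched as follows. This assumes each term of the energy is normalized by the sum of the two counts (so each term lies between 0 and 1, and the energy is 0 when Vrand = Vreal); the function names and the dictionary representation of the occurrence vectors are hypothetical.

```python
import math
import random


def energy(v_real, v_rand):
    """Sum over motifs M of |real - rand| / (real + rand).

    v_real and v_rand map each motif to its occurrence count. Terms where
    both counts are 0 are skipped (they contribute no mismatch).
    """
    total = 0.0
    for motif, real in v_real.items():
        rand = v_rand.get(motif, 0)
        if real + rand > 0:
            total += abs(real - rand) / (real + rand)
    return total


def accept_swap(delta_e, temperature, rng=random):
    """Metropolis rule: always keep improvements; otherwise keep the swap
    with probability exp(-delta_e / temperature)."""
    if delta_e < 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)
```

Lowering the temperature T over time makes the search accept fewer energy-increasing swaps.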
“Information processing” networks tend to use the same motifs
Other networks each had their own distinct collection of motifs.
Feed forward, e.g.: filter out transient signals.
(Milo et al., Science, 2002)
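A toy Boolean sketch of how a feed-forward loop can filter out transient signals. The model is an assumption made for illustration, not taken from the slides: X activates Y only after X has been on for more than `delay` steps, and Z requires both X and Y (an AND gate). The name `ffl_response` is hypothetical.

```python
def ffl_response(x_signal, delay=2):
    """Return Z over time for a feed-forward loop X->Y, Y->Z, X->Z.

    Toy assumptions: Y switches on once X has been on for more than
    `delay` consecutive steps; Z = X AND Y.
    """
    z = []
    run = 0  # number of consecutive steps X has been on
    for x in x_signal:
        run = run + 1 if x else 0
        y = run > delay
        z.append(int(bool(x) and y))
    return z
```

In this model a pulse of X lasting two steps never switches Z on, while a longer pulse does after the delay.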
Network Motif Discovery Using Subgraph Enumeration and Symmetry-Breaking
(Grochow & Kellis, RECOMB 2007)
Backtracking (Recursive) Algorithm to Find Network Motifs
G = (Large Network)
H = (Small query graph)
Def. Node g supports node h if the degrees of g and h are compatible.
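One way to read "compatible" is sketched below, assuming undirected graphs stored as adjacency dictionaries of sets and assuming g must have at least as many neighbors as h. Both the representation and this reading are assumptions; `supports` is a hypothetical name.

```python
def supports(G, g, H, h):
    """Return True if node g of G has enough neighbors to play the role
    of node h of H (assumed reading of "degrees are compatible")."""
    return len(G[g]) >= len(H[h])
```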
f : VH → VG
Basic Algorithm:
For each node g ∈ G:
  For each node h ∈ H:                (try every possible mapping of a single node of H to a node of G)
    If g can’t support h: continue
    Let f = {(h→g)}                   (f is a partial map that maps h to g)
    L = Extend(f, G, H)               (grow this partial map into many full maps)
    For q in L: Output image of q     (each q : VH → VG is a full map)
  Remove g from G                     (no need to consider g again, since we tried all its possible matches already)
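Extend(f, G, H):
  If domain(f) = H: return [f]                        (base case)
  Let m = some node in N(domain(f))                   (choose a node in H)
  Let L = [ ]
  For each node u ∈ N(f(domain(f))):                  (try to map it to G)
    If adding (m→u) to f keeps f as a valid isomorphism then:
      Append the maps returned by Extend(f∪{(m→u)}, G, H) to L
  Return L
[Figure: the seed pair g and h; domain(f) and its neighborhood N(domain(f)) in H, containing m; f(domain(f)) and its neighborhood N(f(domain(f))) in G, containing the candidate nodes u]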
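The backtracking search can be sketched as runnable code. This is a simplified illustration under stated assumptions, not the authors' implementation: graphs are undirected adjacency dictionaries of sets, H is connected, a match only requires every edge of H to be present in G (extra edges in G are allowed), and "supports" is read as "degree at least as large". The names `extend` and `find_matches` are hypothetical.

```python
def extend(f, G, H):
    """Grow the partial map f (H node -> G node) into all full maps."""
    if len(f) == len(H):
        return [dict(f)]  # base case: every node of H is mapped
    used = set(f.values())
    # Choose an unmapped node of H next to the mapped part (H connected).
    m = next(h for h in H if h not in f and any(n in f for n in H[h]))
    # Candidates: unused nodes of G next to the image of the mapped part.
    candidates = {u for g in used for u in G[g]} - used
    results = []
    for u in candidates:
        # Every already-mapped neighbor of m must map to a neighbor of u.
        if all(f[n] in G[u] for n in H[m] if n in f):
            f[m] = u
            results.extend(extend(f, G, H))
            del f[m]
    return results


def find_matches(G, H):
    """Return the node sets of G that contain a copy of H."""
    G = {g: set(nbrs) for g, nbrs in G.items()}  # work on a copy
    images = set()
    for g in list(G):
        for h in H:
            if len(G[g]) < len(H[h]):  # g cannot support h
                continue
            for q in extend({h: g}, G, H):
                images.add(frozenset(q.values()))
        for n in G.pop(g):  # all matches using g are found; remove g
            G[n].discard(g)
    return images
```

The set `images` is needed here because this basic version reaches the same node set through several different maps.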
Speed-up #1
• Every time we can choose a node, we pick the one that is “most constrained”:
- Pick the node that already has the most mapped neighbors
- If there are ties, choose the node with the highest degree
- If there are still ties, choose the node with highest 2nd order degree (total degree of the neighbors)
• Just a heuristic: it cannot change the result, because we can pick the nodes in any order we want.
- If a map that we are building can’t be completed, we want to know sooner rather than later.
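The three tie-breaking rules map naturally onto a tuple comparison. A minimal sketch, assuming H is an undirected adjacency dictionary of sets and `mapped` is the set of nodes of H already in the partial map (both names, and `pick_most_constrained`, are hypothetical):

```python
def pick_most_constrained(H, mapped):
    """Pick the unmapped node of H with the most mapped neighbors,
    breaking ties by degree, then by total degree of its neighbors."""
    def key(h):
        mapped_neighbors = sum(1 for n in H[h] if n in mapped)
        degree = len(H[h])
        second_order = sum(len(H[n]) for n in H[h])
        return (mapped_neighbors, degree, second_order)

    candidates = [h for h in H if h not in mapped]
    return max(candidates, key=key)
```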
Automorphisms & Orbits
[Figure: example graph on nodes A, B, C, D, E, F]
Def. An automorphism is an isomorphism from a graph to itself.
The orbit of a node u is the set of all nodes v such that some automorphism maps u to v.
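For a small query graph, both definitions can be checked by brute force over all permutations of the nodes. A minimal sketch, assuming an undirected adjacency dictionary of sets with sortable node names; `automorphisms` and `orbits` are hypothetical names, and the approach is only practical for graphs with a handful of nodes.

```python
from itertools import permutations


def automorphisms(H):
    """Return every node permutation that maps the edge set onto itself."""
    nodes = sorted(H)
    edges = {(u, v) for u in H for v in H[u]}
    result = []
    for perm in permutations(nodes):
        p = dict(zip(nodes, perm))
        if {(p[u], p[v]) for u, v in edges} == edges:
            result.append(p)
    return result


def orbits(H):
    """Group nodes that some automorphism maps onto each other."""
    autos = automorphisms(H)
    return {frozenset(p[u] for p in autos) for u in H}
```

For the path a-b-c this gives two automorphisms (the identity and the flip) and two orbits, {a, c} and {b}.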
Main Speedup (#2)
Number each node of G.
f : VH → VG
A mapping f induces a numbering on H.
[Figure: the nodes of G labeled with numbers; query graph H with nodes A, B, C]
A < min{B,C}
C < min{B}
If we add these constraints, we get only one possible mapping.
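A small check of the claim, assuming H is a triangle on A, B, C matched onto three mutually adjacent nodes of G numbered 1, 2, 3 (the triangle and the numbers are assumptions chosen for illustration). Each constraint is stored as a pair (node, nodes it must be smaller than); `obeys` is a hypothetical name.

```python
from itertools import permutations


def obeys(f, constraints):
    """Check constraints of the form f[x] < min(f[y] for y in others)."""
    return all(f[x] < min(f[y] for y in others) for x, others in constraints)


# All 6 ways to map the triangle A, B, C onto G nodes 1, 2, 3.
maps = [dict(zip("ABC", p)) for p in permutations([1, 2, 3])]

# The two constraints from the slide: A < min{B, C} and C < min{B}.
constraints = [("A", ("B", "C")), ("C", ("B",))]

kept = [f for f in maps if obeys(f, constraints)]
```

Without the constraints the same triangle is found 6 times, once per automorphism of H; with them, only the map A→1, C→2, B→3 survives.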
Basic Algorithm, differences for symmetry breaking
For each node g ∈ G:
  For each node h ∈ H s.t. we haven’t already considered some q ∈ Orbit(h):
    If g can’t support h: continue
    Let f = {(h→g)}
    L = Extend(f, G, H, CH)           (CH: the symmetry-breaking constraints on H)
    For q in L: Output image of q     (each q : VH → VG is a full map)
  Remove g from G
Extend(f, G, H, CH), symmetry breaking differences
  If domain(f) = H: return [f]
  Let m = some node in N(domain(f))
  Let L = [ ]
  For each node u ∈ N(f(domain(f))):
    If adding (m→u) to f keeps f as a valid isomorphism and (m→u) obeys the constraints then:
      Append the maps returned by Extend(f∪{(m→u)}, G, H, CH) to L
  Return L
Really Large “motifs”? Meaningful?
Occurred 27,720 times in the real yeast PPI network (but rarely in a random network).
Really just a subgraph of this part of the yeast PPI: choose 4 nodes from the clique and 3 nodes from the oval.
(Figure from Grochow & Kellis, 2007)
Other Advantages
• Since symmetry breaking ensures each match is output only once, they don’t need to keep track of which graphs they’ve already output
- This saves a lot of space.
• Can be parallelized better