Network Science (Human and Social) #2 · 2019. 5. 3. · human disease network
Network Science (Human and Social)
#2
Carlos Sarraute, Instituto de Cálculo, April-June 2019
Fundamental Concepts of Graph Theory
COMPONENTS OF A COMPLEX SYSTEM
§ components: nodes, vertices (N)
§ interactions: links, edges (L)
§ system: network, graph (N, L)
DIRECTED VS. UNDIRECTED NETWORKS
Undirected
Links: undirected (symmetric).
Graph: [Figure: example undirected graph]
Undirected links: coauthorship ties, actor network, protein interactions.
Directed
Links: directed (arcs).
Digraph = directed graph: [Figure: example directed graph]
Directed links: URLs on the web, phone calls, metabolic reactions.
Degree distribution
NODE DEGREE
Node degree: the number of links connected to the node.
Undirected: $k_A = 1$, $k_B = 4$
Directed: in directed graphs we can define an in-degree and an out-degree. The (total) degree is the sum of the in- and out-degree.
$k_C^{in} = 2$, $k_C^{out} = 1$, $k_C = 3$
Source: node with $k^{in} = 0$; sink: node with $k^{out} = 0$.
[Figure: the example graph on nodes A to G, drawn undirected and directed]
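AVERAGE DEGREE
Undirected:
$\langle k \rangle \equiv \frac{1}{N} \sum_{i=1}^{N} k_i$, $\langle k \rangle \equiv \frac{2L}{N}$
Each undirected link adds one to the degree of both of its endpoints, hence the factor 2.
Directed:
$\langle k^{in} \rangle \equiv \frac{1}{N} \sum_{i=1}^{N} k_i^{in}$, $\langle k^{out} \rangle \equiv \frac{1}{N} \sum_{i=1}^{N} k_i^{out}$, $\langle k^{in} \rangle = \langle k^{out} \rangle$
$\langle k \rangle \equiv \frac{L}{N}$
[Figure: the example graph, undirected and directed, with a link between nodes i and j highlighted]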
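The averages above can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical four-node directed edge list (the labels and links are invented for the example, not read from the slide figure). It checks that the average in-degree and the average out-degree both equal L/N.

```python
from collections import Counter

def degree_stats(nodes, directed_edges):
    """Return (k_in, k_out, avg_in, avg_out) for a directed edge list."""
    k_out = Counter(src for src, _ in directed_edges)
    k_in = Counter(dst for _, dst in directed_edges)
    n = len(nodes)
    avg_in = sum(k_in[v] for v in nodes) / n
    avg_out = sum(k_out[v] for v in nodes) / n
    return k_in, k_out, avg_in, avg_out

# Hypothetical example: N = 4 nodes, L = 5 directed links.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "C")]

k_in, k_out, avg_in, avg_out = degree_stats(nodes, edges)
# Total degree of a node is the sum of its in- and out-degree.
total_degree = {v: k_in[v] + k_out[v] for v in nodes}
```

Every directed link contributes exactly one unit of out-degree and one unit of in-degree, which is why the two averages coincide.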
DEGREE DISTRIBUTION
Degree distribution P(k): the probability that a randomly chosen node has degree k.
$N_k$ = number of nodes with degree k
$P(k) = N_k / N$ ➔ plot
The degree distribution has taken a central role in network theory following the discovery of scale-free networks (Barabási & Albert, 1999). Another reason for its importance is that the calculation of most network properties requires us to know $p_k$. For example, the average degree of a network can be written as
$\langle k \rangle = \sum_{k=0}^{\infty} k \, p_k$
We will see in the coming chapters that the precise functional form of $p_k$ determines many network phenomena, from network robustness to the spread of viruses.
Image 2.4a Degree distribution.
The degree distribution is defined as the ratio $p_k = N_k / N$, where $N_k$ denotes the number of nodes with degree k in a network. For the network in (a) we have N = 4 and $p_1 = 1/4$ (one of the four nodes has degree $k_1 = 1$), $p_2 = 1/2$ (two nodes have $k_3 = k_4 = 2$), and $p_3 = 1/4$ (as $k_2 = 3$). As we lack nodes with degree k > 3, $p_k = 0$ for any k > 3. Panel (b) shows the degree distribution of a one-dimensional lattice. As each node has the same degree k = 2, the degree distribution is a Kronecker delta function, $p_k = \delta(k - 2)$.
Image 2.4b
In many real networks, the node degree can vary considerably. For example, as the degree distribution (a) indicates, the degrees of the proteins in the protein interaction network shown in (b) vary between k = 0 (isolated nodes) and k = 92, which is the degree of the largest node, called a hub. There are also wide differences in the number of nodes with different degrees: as (a) shows, almost half of the nodes have degree one (i.e. $p_1 = 0.48$), while there is only one copy of the biggest node, hence $p_{92} = 1/N = 0.0005$. (c) The degree distribution is often shown on a so-called log-log plot, in which we either plot $\log p_k$ as a function of $\log k$, or, as we did in (c), we use logarithmic axes.
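As a small worked example, the four-node network described in the caption of Image 2.4a (degrees $k_1 = 1$, $k_2 = 3$, $k_3 = k_4 = 2$) can be run through a short Python sketch. The function name `degree_distribution` is an illustrative choice, not something defined in the slides.

```python
from collections import Counter

def degree_distribution(degrees):
    """Map each observed degree k to p_k = N_k / N."""
    n = len(degrees)
    counts = Counter(degrees)
    return {k: counts[k] / n for k in sorted(counts)}

# Degrees of nodes 1..4 in the four-node example: k1=1, k2=3, k3=k4=2.
degrees = [1, 3, 2, 2]
p = degree_distribution(degrees)

# Average degree computed from the distribution: <k> = sum_k k * p_k.
avg_k = sum(k * pk for k, pk in p.items())
```

Here `p` reproduces the values quoted in the caption, and `avg_k` agrees with the direct average of the degree list.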
Adjacency matrix
Section 2
• Represents the links as a matrix
– $A_{ij} = 1$ if node i has a link to node j, and 0 otherwise
– $A_{ii} = 0$ unless the graph has self-loops
– $A_{ij} = A_{ji}$ if the graph is undirected, or if i and j share a reciprocal link
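Example of an adjacency matrix
[Figure: directed graph on nodes 1 to 5]
$$A = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 & 0 \end{pmatrix}$$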
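Node degrees from the adjacency matrix
$$A = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 & 0 \end{pmatrix}$$
Out-degree of node i = $\sum_{j=1}^{n} A_{ij}$
example: for the out-degree of node 3, sum the 3rd row: $\sum_{j=1}^{n} A_{3j}$
In-degree of node j = $\sum_{i=1}^{n} A_{ij}$
example: for the in-degree of node 3, sum the 3rd column: $\sum_{i=1}^{n} A_{i3}$
[Figure: directed graph on nodes 1 to 5]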
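The row and column sums can be checked with a short Python sketch using the 5×5 matrix from this slide. The helper names `out_degree` and `in_degree` are illustrative, and nodes are numbered from 1 as on the slide.

```python
# Adjacency matrix from the slide (nodes 1..5; row i lists the links leaving node i).
A = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0],
]

def out_degree(A, i):
    """Out-degree of node i (1-based): sum of row i."""
    return sum(A[i - 1])

def in_degree(A, j):
    """In-degree of node j (1-based): sum of column j."""
    return sum(row[j - 1] for row in A)

k_out_3 = out_degree(A, 3)  # row 3 is 0 1 0 1 0
k_in_3 = in_degree(A, 3)    # column 3 has a single 1, in row 2
```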
Edge list
• Edge list
– 2, 3
– 2, 4
– 3, 2
– 3, 4
– 4, 5
– 5, 2
– 5, 1
[Figure: directed graph on nodes 1 to 5]
Adjacency list
• Adjacency list
– Easier to work with for networks that are
  • large
  • sparse
– The neighbors of a node can be retrieved quickly
  • 1:
  • 2: 3 4
  • 3: 2 4
  • 4: 5
  • 5: 1 2
[Figure: directed graph on nodes 1 to 5]
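The three representations (edge list, adjacency list, adjacency matrix) describe the same graph. A minimal Python sketch of the conversions, using the seven-link edge list from the slides; the function names are illustrative assumptions.

```python
# Edge list from the slide: (source, target) pairs of a directed graph on nodes 1..5.
edge_list = [(2, 3), (2, 4), (3, 2), (3, 4), (4, 5), (5, 2), (5, 1)]
nodes = [1, 2, 3, 4, 5]

def to_adjacency_list(nodes, edges):
    """Map each node to the sorted list of nodes it links to."""
    adj = {v: [] for v in nodes}
    for src, dst in edges:
        adj[src].append(dst)
    return {v: sorted(nbrs) for v, nbrs in adj.items()}

def to_adjacency_matrix(nodes, edges):
    """Build A with A[i][j] = 1 if there is a link from nodes[i] to nodes[j]."""
    index = {v: i for i, v in enumerate(nodes)}
    A = [[0] * len(nodes) for _ in nodes]
    for src, dst in edges:
        A[index[src]][index[dst]] = 1
    return A

adj_list = to_adjacency_list(nodes, edge_list)
adj_matrix = to_adjacency_matrix(nodes, edge_list)
```

For a sparse network the adjacency list stores only the L existing links, while the matrix always stores N² entries, most of them zero.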
Example of an adjacency matrix
[Figure: undirected graph on nodes a to h]
  a b c d e f g h
a 0 1 0 0 1 0 1 0
b 1 0 1 0 0 0 0 1
c 0 1 0 1 0 1 1 0
d 0 0 1 0 1 0 0 0
e 1 0 0 1 0 0 0 0
f 0 0 1 0 0 0 1 0
g 1 0 1 0 0 1 0 0
h 0 1 0 0 0 0 0 0
Real networks are sparse
Section 3
The maximum number of links in a network with N nodes:
$L_{max} = \binom{N}{2} = \frac{N(N-1)}{2}$
Complete graph
A graph with $L = L_{max}$ links is called a complete graph; its average degree is $\langle k \rangle = N - 1$.
REAL NETWORKS ARE SPARSE
Most networks observed in real systems are sparse:
$L \ll L_{max}$
$\langle k \rangle \ll N - 1$
Network                   N         L          Lmax        <k>
WWW (ND Sample)           325,729   1.4×10^6   10^12       4.51
Protein (S. cerevisiae)   1,870     4,470      10^7        2.39
Coauthorship (Math)       70,975    2×10^5     3×10^10     3.9
Movie Actors              212,250   6×10^6     1.8×10^13   28.78
(Source: Albert, Barabási, RMP 2002)
ADJACENCY MATRICES ARE SPARSE
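To make the comparison between L and L_max concrete, here is a minimal Python sketch. It uses the eight-node undirected example (nodes a to h) from the adjacency-matrix slide, which has 10 links; the function names `max_links` and `density` are illustrative assumptions.

```python
def max_links(n):
    """L_max = N(N-1)/2: number of links in a complete undirected graph."""
    return n * (n - 1) // 2

def density(n, links):
    """Fraction of possible links that are present, L / L_max."""
    return links / max_links(n)

# Eight-node undirected example (a..h) from the adjacency-matrix slide: N = 8, L = 10.
N, L = 8, 10
l_max = max_links(N)         # links a complete graph on 8 nodes would have
avg_degree = 2 * L / N       # <k> = 2L/N for an undirected graph
fill = density(N, L)         # share of nonzero off-diagonal matrix entries
```

Even in this toy graph fewer than half of the possible links exist; in the real networks listed in the table the fraction is smaller by many orders of magnitude.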
BIPARTITE NETWORKS
Section 4
BIPARTITE GRAPH
A bipartite graph is a graph whose nodes can be divided into two disjoint sets U and V such that every link connects a node in U to a node in V; that is, U and V are independent sets.
Examples:
Network of Argentine film actors
Disease network
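The definition translates directly into a check: given a proposed split into U and V, every link must cross between the two sets. A minimal Python sketch follows; the actor and film labels are hypothetical placeholders, not data from any of the networks mentioned.

```python
def is_valid_bipartition(U, V, edges):
    """True if U and V are disjoint and every edge joins a node in U to a node in V."""
    U, V = set(U), set(V)
    if U & V:
        return False
    return all((a in U and b in V) or (a in V and b in U) for a, b in edges)

# Hypothetical actor-film network: a link means the actor appeared in the film.
actors = {"actor_1", "actor_2", "actor_3"}
films = {"film_a", "film_b"}
links = [("actor_1", "film_a"), ("actor_2", "film_a"),
         ("actor_2", "film_b"), ("actor_3", "film_b")]

ok = is_valid_bipartition(actors, films, links)
# A direct actor-actor link would violate the bipartite property.
broken = is_valid_bipartition(actors, films, links + [("actor_1", "actor_2")])
```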
GENE NETWORK AND DISEASE NETWORK
GENOME
PHENOME
DISEASOME
Gene network
Disease network
Goh, Cusick, Valle, Childs, Vidal & Barabási, PNAS (2007)
HUMAN DISEASE NETWORK
https://archive.nytimes.com/www.nytimes.com/interactive/2008/05/05/science/20080506_DISEASE.html?ref=health
BIPARTITE NETWORK OF INGREDIENTS AND FLAVORS
Y.-Y. Ahn, S. E. Ahnert, J. P. Bagrow, A.-L. Barabási, Flavor network and the principles of food pairing, Scientific Reports 1, 196 (2011).
Examples of bipartite graphs
• Scientists and Papers
• Actors and Movies
• Musicians and Bands, concerts
• Legislators and Laws
PATHS
Section 5
DISTANCE IN A GRAPH: shortest path, geodesic path
The distance (shortest path, geodesic path) between two nodes is defined as the number of edges along the shortest path connecting them.
If the two nodes are disconnected, the distance is infinity.
In directed graphs each path needs to follow the direction of the arrows. Thus in a digraph the distance from node A to B (on an AB path) is generally different from the distance from node B to A (on a BCA path).
[Figure: the same graph on nodes A, B, C, D, drawn undirected and directed]
Network Science: Graph Theory
COMPUTING DISTANCES: BREADTH-FIRST SEARCH
Distance between node 0 and node 4:
1. Start at 0.
2. Find the nodes adjacent to 0. Mark them as at distance 1. Put them in a queue.
3. Take the first node out of the queue. Find the unmarked nodes adjacent to it in the graph. Mark them with a label one greater than that node's own label (2 on the first pass). Put them in the queue.
4. Repeat until you find node 4 or there are no more nodes in the queue.
5. The distance between 0 and 4 is the label of 4 or, if 4 does not have a label, infinity.
[Figure: example network in which every node is labeled with its distance (0 to 4) from the start node]
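The breadth-first search steps can be sketched in Python as follows. This is a minimal illustration, assuming a hypothetical six-node adjacency list (the network in the slide figure is not recoverable from the text); `bfs_distance` is an illustrative name.

```python
from collections import deque
import math

def bfs_distance(adj, source, target):
    """Hop distance from source to target, or infinity if target is unreachable."""
    dist = {source: 0}            # the start node gets label 0
    queue = deque([source])
    while queue:
        node = queue.popleft()    # take the first node out of the queue
        if node == target:
            return dist[node]
        for nbr in adj[node]:
            if nbr not in dist:   # only unmarked neighbors get a label
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return math.inf               # queue exhausted without labeling the target

# Hypothetical undirected graph; node 5 is isolated.
adj = {
    0: [1, 2],
    1: [0, 3],
    2: [0, 3],
    3: [1, 2, 4],
    4: [3],
    5: [],
}

d_04 = bfs_distance(adj, 0, 4)
d_05 = bfs_distance(adj, 0, 5)
```

Because nodes leave the queue in order of increasing label, the first label a node receives is already its shortest distance from the start node.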
NETWORK DIAMETER AND AVERAGE DISTANCE
Diameter: $d_{max}$, the maximum distance between any pair of nodes in the graph.
Average path length/distance, $\langle d \rangle$, for a connected graph:
$\langle d \rangle \equiv \frac{1}{2 L_{max}} \sum_{i, j \neq i} d_{ij}$
where $d_{ij}$ is the distance from node i to node j.
In an undirected graph $d_{ij} = d_{ji}$, so we only need to count each pair once:
$\langle d \rangle \equiv \frac{1}{L_{max}} \sum_{i, j > i} d_{ij}$
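A minimal Python sketch of both quantities for an undirected, connected graph. The five-node edge list below is an assumption: it was chosen to be consistent with the values quoted for the five-node example in the paths summary (distance 2 from node 1 to node 5, diameter 3, average 1.6), but the actual figure is not recoverable from the text.

```python
from collections import deque
from itertools import combinations

def distances_from(adj, source):
    """Breadth-first search: hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def diameter_and_average_distance(adj):
    """Assumes an undirected, connected graph, so each pair (i, j > i) is counted once."""
    all_dist = {v: distances_from(adj, v) for v in adj}
    pair_dists = [all_dist[i][j] for i, j in combinations(sorted(adj), 2)]
    return max(pair_dists), sum(pair_dists) / len(pair_dists)

# Assumed edge list (hypothetical reconstruction of the five-node example).
edges = [(1, 2), (2, 3), (2, 5), (3, 4), (4, 5)]
adj = {v: [] for v in range(1, 6)}
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

d_max, d_avg = diameter_and_average_distance(adj)
```

The number of pairs with j > i is exactly $L_{max} = N(N-1)/2$, so dividing by `len(pair_dists)` matches the undirected formula.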
PATHS: SUMMARY
[Figure: example graph on nodes 1 to 5, repeated for each definition]
Shortest Path: the path with the shortest length between two nodes (distance).
$l_{1 \to 5} = 2$, $l_{1 \to 4} = 3$
Diameter: the longest shortest path in a graph.
$l_{1 \to 4} = 3$
Average Path Length: the average of the shortest paths for all pairs of nodes.
$(l_{1 \to 2} + l_{1 \to 3} + l_{1 \to 4} + l_{1 \to 5} + l_{2 \to 3} + l_{2 \to 4} + l_{2 \to 5} + l_{3 \to 4} + l_{3 \to 5} + l_{4 \to 5}) / 10 = 1.6$
Cycle: a path with the same start and end node.
Self-avoiding Path: a path that does not intersect itself.
Eulerian Path: a path that traverses each link exactly once.
Hamiltonian Path: a path that visits each node exactly once.
CONNECTIVITY
Section 6
CONNECTIVITY IN UNDIRECTED GRAPHS
Connected (undirected) graph: any two vertices can be joined by a path. A disconnected graph is made up of two or more connected components.
Bridge: a link whose removal disconnects the graph.
Largest component: giant connected component.
The rest: isolates.
[Figure: example graph on nodes A to G, connected and disconnected]
CONNECTIVITY IN UNDIRECTED GRAPHS: adjacency matrix
The adjacency matrix of a network with several components can be written in a block-diagonal form, so that nonzero elements are confined to squares, with all other elements being zero.
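Connected components can be found by repeated breadth-first search, each time starting from a node that has not been visited yet. A minimal Python sketch, assuming a hypothetical seven-node undirected graph (the labels echo the slide figure, but the links are invented for the example):

```python
from collections import deque

def connected_components(adj):
    """Return the connected components of an undirected graph as sorted lists."""
    seen = set()
    components = []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        components.append(sorted(component))
    return components

# Hypothetical graph: a four-node component, a two-node component, one isolated node.
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("F", "G")]
adj = {v: [] for v in "ABCDEFG"}
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

components = connected_components(adj)
largest = max(components, key=len)
```

Listing the nodes component by component and using that order for the rows and columns of the adjacency matrix yields the block-diagonal form, since no link joins two different components.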
CONNECTIVITY IN DIRECTED GRAPHS
Strongly connected directed graph: has a directed path from each node to every other node and vice versa (e.g. an AB path and a BA path).
Weakly connected directed graph: it is connected if we disregard the edge directions.
Strongly connected components can be identified, but not every node is part of a nontrivial strongly connected component.
[Figure: directed graphs on nodes A to G illustrating strong and weak connectivity]
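The two notions can be tested with plain reachability searches. The sketch below is a minimal illustration on a hypothetical graph and assumes a non-empty node list. It relies on the fact that a digraph is strongly connected exactly when one chosen node reaches every node along the links and along the reversed links.

```python
def reachable(adj, start):
    """Set of nodes reachable from start by following links in adj."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nbr in adj[node]:
            if nbr not in seen:
                seen.add(nbr)
                stack.append(nbr)
    return seen

def is_strongly_connected(nodes, edges):
    fwd = {v: [] for v in nodes}
    rev = {v: [] for v in nodes}
    for u, v in edges:
        fwd[u].append(v)
        rev[v].append(u)   # reversed link: who can reach v
    start = nodes[0]
    return len(reachable(fwd, start)) == len(nodes) == len(reachable(rev, start))

def is_weakly_connected(nodes, edges):
    both = {v: [] for v in nodes}
    for u, v in edges:     # disregard the direction of every link
        both[u].append(v)
        both[v].append(u)
    return len(reachable(both, nodes[0])) == len(nodes)

# Hypothetical examples: a directed 3-cycle, then the same cycle with a dead-end node D.
cycle = [("A", "B"), ("B", "C"), ("C", "A")]
with_sink = cycle + [("C", "D")]
```

In the second example D can be reached but has no outgoing link, so the graph is weakly but not strongly connected, and D forms a trivial strongly connected component on its own.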
Clustering Coefficient
Section 7
CLUSTERING COEFFICIENT
Clustering coefficient: what fraction of your neighbors are connected to each other?
Node i with degree $k_i$
$C_i$ in [0,1]
Watts & Strogatz, Nature 1998.
SECTION 10
CLUSTERING COEFFICIENT
The local clustering coefficient captures the degree to which the neighbors of a given node link to each other. For a node i with degree $k_i$ the local clustering coefficient is defined as [5]
$C_i = \frac{2 L_i}{k_i (k_i - 1)}$ (19)
where $L_i$ represents the number of links between the $k_i$ neighbors of node i. Note that $C_i$ is between 0 and 1:
$C_i = 0$ if none of the neighbors of node i link to each other;
$C_i = 1$ if the neighbors of node i form a complete graph, i.e. they all link to each other (Image 2.7).
In general $C_i$ is the probability that two neighbors of a node link to each other: $C_i = 0.5$ implies that there is a 50% chance that two neighbors of a node are linked.
In summary $C_i$ measures the network's local density: the more densely interconnected the neighborhood of node i, the higher is $C_i$.
The degree of clustering of a whole network is captured by the average clustering coefficient, $\langle C \rangle$, representing the average of $C_i$ over all nodes i = 1, ..., N [5],
$\langle C \rangle = \frac{1}{N} \sum_{i=1}^{N} C_i$ (20)
In line with the probabilistic interpretation, $\langle C \rangle$ is the probability that two neighbors of a randomly selected node link to each other.
While Eq. (19) is defined for undirected networks, the clustering coefficient can be generalized to directed and weighted [6,7,8,9] networks as well. Note that in the network literature one also often encounters the global clustering coefficient, defined in Appendix A.
Image 2.15 Clustering Coefficient.
The local clustering coefficient, $C_i$, of the central node with degree $k_i = 4$ for three different configurations of its neighborhood. The clustering coefficient measures the local density of links in a node's vicinity. The bottom figure shows a small network, with the local clustering coefficient of a node shown next to each node. Next to the figure we also list the network's average clustering coefficient $\langle C \rangle$, according to Eq. (20), and its global clustering coefficient C, defined in Appendix A, Eq. (21). Note that for nodes with degrees $k_i = 0, 1$, the clustering coefficient is taken to be zero.
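Eqs. (19) and (20) can be evaluated with a short Python sketch. The four-node graph below (a triangle with one pendant node) is a hypothetical example, not the network of Image 2.15. Nodes with degree 0 or 1 are assigned $C_i = 0$, following the convention stated in the caption.

```python
from itertools import combinations

def local_clustering(adj, i):
    """C_i = 2 L_i / (k_i (k_i - 1)), with C_i = 0 when k_i < 2."""
    nbrs = adj[i]
    k = len(nbrs)
    if k < 2:
        return 0.0
    # L_i: number of links among the neighbors of node i.
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))

def average_clustering(adj):
    """<C> = (1/N) * sum of C_i over all nodes."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)

# Hypothetical undirected graph: triangle 1-2-3 plus node 4 attached to node 1.
adj = {
    1: {2, 3, 4},
    2: {1, 3},
    3: {1, 2},
    4: {1},
}

c = {i: local_clustering(adj, i) for i in adj}
c_avg = average_clustering(adj)
```

Node 1 has three neighbors with a single link among them, so $C_1 = 1/3$; nodes 2 and 3 each have two linked neighbors, so $C_2 = C_3 = 1$.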
SUMMARY
Section 8
THREE CENTRAL METRICS IN NETWORK SCIENCE
Degree distribution: P(k)
Path length: $\langle d \rangle$
Clustering coefficient: $C_i = \frac{2 L_i}{k_i (k_i - 1)}$
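GRAPHS 1
Undirected
$$A_{ij} = \begin{pmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}$$
$A_{ii} = 0$, $A_{ij} = A_{ji}$
$L = \frac{1}{2} \sum_{i,j=1}^{N} A_{ij}$, $\langle k \rangle = \frac{2L}{N}$
Examples: actor network, protein-protein interactions
Directed
$$A_{ij} = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$
$A_{ii} = 0$, $A_{ij} \neq A_{ji}$
$L = \sum_{i,j=1}^{N} A_{ij}$, $\langle k \rangle = \frac{L}{N}$
Examples: WWW, citation networks
[Figure: the four-node example graph, undirected and directed]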
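The link counts and average degrees for the two 4×4 matrices on this slide can be verified with a short Python sketch. It assumes unweighted matrices without self-loops; the function names are illustrative.

```python
def link_count(A, directed):
    """L = sum of A_ij for a directed graph, half of that for an undirected one."""
    total = sum(sum(row) for row in A)
    return total if directed else total // 2

def average_degree(A, directed):
    """<k> = L/N for a directed graph, 2L/N for an undirected one."""
    n = len(A)
    links = link_count(A, directed)
    return links / n if directed else 2 * links / n

# Matrices from the slide.
A_undirected = [
    [0, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
]
A_directed = [
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

L_und = link_count(A_undirected, directed=False)
L_dir = link_count(A_directed, directed=True)
k_und = average_degree(A_undirected, directed=False)
k_dir = average_degree(A_directed, directed=True)
```

The factor 1/2 in the undirected case compensates for each link appearing twice in a symmetric matrix, once as $A_{ij}$ and once as $A_{ji}$.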
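GRAPHS 2
Unweighted (undirected)
$$A_{ij} = \begin{pmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}$$
$A_{ii} = 0$, $A_{ij} = A_{ji}$
$L = \frac{1}{2} \sum_{i,j=1}^{N} A_{ij}$, $\langle k \rangle = \frac{2L}{N}$
Examples: protein-protein interactions, WWW
Weighted (undirected)
$$A_{ij} = \begin{pmatrix} 0 & 2 & 0.5 & 0 \\ 2 & 0 & 1 & 4 \\ 0.5 & 1 & 0 & 0 \\ 0 & 4 & 0 & 0 \end{pmatrix}$$
$A_{ii} = 0$, $A_{ij} = A_{ji}$
$L = \frac{1}{2} \sum_{i,j=1}^{N} \mathrm{nonzero}(A_{ij})$, $\langle k \rangle = \frac{2L}{N}$
Examples: call graph, metabolic networks
[Figure: the four-node example graph, unweighted and weighted]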
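GRAPHS 3
Self-interactions
$$A_{ij} = \begin{pmatrix} 1 & 1 & 1 & 0 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 \end{pmatrix}$$
$A_{ii} \neq 0$, $A_{ij} = A_{ji}$
$L = \frac{1}{2} \sum_{i,j=1, i \neq j}^{N} A_{ij} + \sum_{i=1}^{N} A_{ii}$ ?
Examples: protein interaction network, WWW
Multigraph (undirected)
$$A_{ij} = \begin{pmatrix} 0 & 2 & 1 & 0 \\ 2 & 0 & 1 & 3 \\ 1 & 1 & 0 & 0 \\ 0 & 3 & 0 & 0 \end{pmatrix}$$
$A_{ii} = 0$, $A_{ij} = A_{ji}$
$L = \frac{1}{2} \sum_{i,j=1}^{N} \mathrm{nonzero}(A_{ij})$, $\langle k \rangle = \frac{2L}{N}$
Examples: social networks, collaboration networks
[Figure: the four-node example graph with self-loops and with multiple links]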
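GRAPHS 4
Complete Graph (undirected)
$$A_{ij} = \begin{pmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{pmatrix}$$
$A_{ii} = 0$, $A_{i \neq j} = 1$
$L = L_{max} = \frac{N(N-1)}{2}$, $\langle k \rangle = N - 1$
Examples: actor network, protein-protein interactions
[Figure: complete graph on four nodes]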