Network Science (Human and Social) #2 · 2019. 5. 3. · human disease network


Network Science (Human and Social)

#2

Carlos Sarraute, Instituto de Cálculo, April-June 2019

Fundamental Concepts of Graph Theory

COMPONENTS OF A COMPLEX SYSTEM

§ components: nodes, vertices N

§ interactions: links, edges L

§ system: network, graph (N,L)

DIRECTED VS. UNDIRECTED NETWORKS

Undirected

Links: undirected (symmetric).

Graph:

Undirected links: coauthorship ties, actor network, protein interactions.

[Figure: example undirected graph]

Directed

Links: directed (arcs).

Digraph = directed graph:

Directed links: URLs on the web, phone calls, metabolic reactions.

[Figure: example directed graph]

Degree distribution

NODE DEGREE

Node degree: the number of links connected to the node.

Undirected:

$k_A = 1$, $k_B = 4$

Directed:

In directed graphs we can define an in-degree and an out-degree. The (total) degree is the sum of the in- and out-degree.

$k_C^{in} = 2$, $k_C^{out} = 1$, $k_C = 3$

Source: a node with $k^{in} = 0$; Sink: a node with $k^{out} = 0$.

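AVERAGE DEGREE

Undirected:

$$\langle k \rangle \equiv \frac{1}{N} \sum_{i=1}^{N} k_i, \qquad \langle k \rangle \equiv \frac{2L}{N}$$

Directed:

$$\langle k^{in} \rangle \equiv \frac{1}{N} \sum_{i=1}^{N} k_i^{in}, \qquad \langle k^{out} \rangle \equiv \frac{1}{N} \sum_{i=1}^{N} k_i^{out}, \qquad \langle k^{in} \rangle = \langle k^{out} \rangle$$

$$\langle k \rangle \equiv \frac{L}{N}$$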
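The average-degree relations can be checked numerically. The following is a minimal sketch, assuming the network is given as an edge list; the 4-node graph and the helper names `degrees_undirected` and `degrees_directed` are hypothetical and do not come from the slides.

```python
from collections import Counter

def degrees_undirected(edges):
    """Return {node: degree} for an undirected edge list."""
    k = Counter()
    for u, v in edges:
        k[u] += 1
        k[v] += 1
    return dict(k)

def degrees_directed(edges):
    """Return ({node: in-degree}, {node: out-degree}) for a directed edge list."""
    k_in, k_out = Counter(), Counter()
    for u, v in edges:          # each pair is read as a link u -> v
        k_out[u] += 1
        k_in[v] += 1
    return dict(k_in), dict(k_out)

# Hypothetical 4-node example (not taken from the slides).
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "D")]
N, L = len(nodes), len(edges)

# Reading the edges as undirected: <k> = 2L/N.
k = degrees_undirected(edges)
avg_k_undirected = sum(k.get(n, 0) for n in nodes) / N

# Reading the same edges as directed: <k_in> = <k_out> = L/N.
k_in, k_out = degrees_directed(edges)
avg_k_in = sum(k_in.get(n, 0) for n in nodes) / N
avg_k_out = sum(k_out.get(n, 0) for n in nodes) / N

print(avg_k_undirected, 2 * L / N)   # 2.0 2.0
print(avg_k_in, avg_k_out, L / N)    # 1.0 1.0 1.0
```

Each undirected link contributes to the degree of two nodes, which is where the factor 2 comes from; each directed link contributes one out-degree and one in-degree.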

DEGREE DISTRIBUTION

Degree distribution P(k): the probability that a randomly chosen node has degree k.

N_k = number of nodes with degree k

P(k) = N_k / N ➔ plot

The degree distribution has taken a central role in network theory following the discovery of scale-free networks (Barabási & Albert, 1999). Another reason for its importance is that the calculation of most network properties requires us to know p_k. For example, the average degree of a network can be written as

$$\langle k \rangle = \sum_{k=0}^{\infty} k \, p_k$$

We will see in the coming chapters that the precise functional form of p_k determines many network phenomena, from network robustness to the spread of viruses.

Image 2.4a. Degree distribution.

The degree distribution is defined as the ratio p_k = N_k / N, where N_k denotes the number of nodes of degree k in a network. For the network in (a) we have N = 4 and p_1 = 1/4 (one of the four nodes has degree k_1 = 1), p_2 = 1/2 (two nodes have k_3 = k_4 = 2), and p_3 = 1/4 (as k_2 = 3). As we lack nodes with degree k > 3, p_k = 0 for any k > 3. Panel (b) shows the degree distribution of a one-dimensional lattice. As each node has the same degree k = 2, the degree distribution is a Kronecker delta function, p_k = δ(k − 2).
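The numbers in the caption can be reproduced in a few lines. This is a sketch under an assumption: the edge list below is one 4-node graph consistent with the quoted degrees (k_1 = 1, k_2 = 3, k_3 = k_4 = 2); the actual figure is not available in this text.

```python
from collections import Counter

# Assumed edge list, chosen to match the degrees quoted in the caption.
nodes = [1, 2, 3, 4]
edges = [(1, 2), (2, 3), (2, 4), (3, 4)]

degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

N = len(nodes)
N_k = Counter(degree[n] for n in nodes)        # N_k: number of nodes with degree k
p = {k: N_k[k] / N for k in sorted(N_k)}       # p_k = N_k / N

avg_k = sum(k * p_k for k, p_k in p.items())   # <k> = sum over k of k * p_k

print(p)       # {1: 0.25, 2: 0.5, 3: 0.25}
print(avg_k)   # 2.0
```

The result agrees with the link-based formula for an undirected graph: with L = 4 and N = 4, 2L/N = 2.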

Image 2.4b

In many real networks, the node degree can vary considerably. For example, as the degree distribution (a) indicates, the degrees of the proteins in the protein interaction network shown in (b) vary between k = 0 (isolated nodes) and k = 92, which is the degree of the largest node, called a hub. There are also wide differences in the number of nodes with different degrees: as (a) shows, almost half of the nodes have degree one (i.e. p_1 = 0.48), while there is only one copy of the biggest node, hence p_92 = 1/N = 0.0005. (c) The degree distribution is often shown on a so-called log-log plot, in which we either plot log p_k as a function of log k, or, as we did in (c), we use logarithmic axes.


Adjacency matrix

Section 2

Adjacency matrix

• Represents the links as a matrix
– A_ij = 1 if node i has a link to node j
– A_ij = 0 otherwise
– A_ii = 0 unless the graph has self-loops
– A_ij = A_ji if the graph is undirected, or if i and j have a reciprocal link

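Adjacency matrix example

[Figure: directed graph with nodes 1 to 5]

$$A = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 & 0 \end{pmatrix}$$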

Node degrees from the adjacency matrix

$$A = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 & 0 \end{pmatrix}$$

Out-degree of node i: $\sum_{j=1}^{n} A_{ij}$

Example: for the out-degree of node 3, sum the 3rd row: $\sum_{j=1}^{n} A_{3j}$

In-degree of node j: $\sum_{i=1}^{n} A_{ij}$

Example: for the in-degree of node 3, sum the 3rd column: $\sum_{i=1}^{n} A_{i3}$
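The row and column sums can be computed directly on the example matrix. A minimal sketch in plain Python; the function names `out_degree` and `in_degree` are illustrative, and node labels are taken to be 1-based as on the slide.

```python
# Adjacency matrix of the 5-node example: A[i][j] = 1 if node i+1 links to node j+1.
A = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0],
]

def out_degree(A, node):
    """Out-degree: sum of the row of `node` (1-based label)."""
    return sum(A[node - 1])

def in_degree(A, node):
    """In-degree: sum of the column of `node` (1-based label)."""
    return sum(row[node - 1] for row in A)

print(out_degree(A, 3))   # 2  (node 3 links to nodes 2 and 4)
print(in_degree(A, 3))    # 1  (only node 2 links to node 3)
```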

Edge list

• Edge list
– 2, 3
– 2, 4
– 3, 2
– 3, 4
– 4, 5
– 5, 2
– 5, 1

[Figure: directed graph with nodes 1 to 5]

Adjacency list

• Adjacency list
– Easier to work with for networks that are
  • large
  • sparse
– Makes it easy to retrieve the neighbors of a node

• 1:
• 2: 3 4
• 3: 2 4
• 4: 5
• 5: 1 2

[Figure: directed graph with nodes 1 to 5]
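The three representations describe the same directed graph, and converting between them is mechanical. A short sketch, assuming the edge list from the slides; the helper names `to_adjacency_list` and `to_adjacency_matrix` are illustrative.

```python
# Directed edge list from the slides: (i, j) means a link from node i to node j.
nodes = [1, 2, 3, 4, 5]
edges = [(2, 3), (2, 4), (3, 2), (3, 4), (4, 5), (5, 2), (5, 1)]

def to_adjacency_list(nodes, edges):
    """Map each node to the sorted list of nodes it links to."""
    adj = {n: [] for n in nodes}
    for u, v in edges:
        adj[u].append(v)
    return {n: sorted(nbrs) for n, nbrs in adj.items()}

def to_adjacency_matrix(nodes, edges):
    """Build the N x N matrix with A[i][j] = 1 for each link i -> j."""
    index = {n: i for i, n in enumerate(nodes)}
    A = [[0] * len(nodes) for _ in nodes]
    for u, v in edges:
        A[index[u]][index[v]] = 1
    return A

adj = to_adjacency_list(nodes, edges)
A = to_adjacency_matrix(nodes, edges)

print(adj)   # {1: [], 2: [3, 4], 3: [2, 4], 4: [5], 5: [1, 2]}
for row in A:
    print(row)
```

The adjacency list stores one entry per link (7 here), while the matrix always stores N × N entries (25 here), which is why lists are preferred for large, sparse networks.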

Adjacency matrix example

  a b c d e f g h
a 0 1 0 0 1 0 1 0
b 1 0 1 0 0 0 0 1
c 0 1 0 1 0 1 1 0
d 0 0 1 0 1 0 0 0
e 1 0 0 1 0 0 0 0
f 0 0 1 0 0 0 1 0
g 1 0 1 0 0 1 0 0
h 0 1 0 0 0 0 0 0

[Figure: undirected graph with nodes a to h]

Real networks are sparse

Section 3

Complete graph

The maximum number of links in a network with N nodes:

$$L_{max} = \binom{N}{2} = \frac{N(N-1)}{2}$$

A graph with L = L_max links is called a complete graph; its average degree is <k> = N − 1.
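A quick numerical check of the complete-graph formulas, as a minimal sketch. The function names and the sparse example (N = 1000, L = 2000) are hypothetical and only meant to show how far a sparse network sits from L_max.

```python
def max_links(n):
    """Maximum number of links in an undirected simple graph with n nodes."""
    return n * (n - 1) // 2

def average_degree(n, l):
    """Average degree of an undirected graph with n nodes and l links."""
    return 2 * l / n

def density(n, l):
    """Fraction of the possible links that are actually present."""
    return l / max_links(n)

# Complete graph with N = 4: L = L_max = 6 and <k> = N - 1 = 3.
n = 4
l = max_links(n)
print(l, average_degree(n, l))   # 6 3.0

# Hypothetical sparse network: L is far below L_max = 499500.
print(max_links(1000), density(1000, 2000))
```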

REAL NETWORKS ARE SPARSE

Most networks observed in real systems are sparse:

L << L_max

<k> << N − 1.

Network                    N          L           L_max          <k>
WWW (ND Sample)            325,729    1.4 × 10^6  10^12          4.51
Protein (S. cerevisiae)    1,870      4,470       10^7           2.39
Coauthorship (Math)        70,975     2 × 10^5    3 × 10^10      3.9
Movie Actors               212,250    6 × 10^6    1.8 × 10^13    28.78

(Source: Albert, Barabási, RMP 2002)

ADJACENCY MATRICES ARE SPARSE

BIPARTITE NETWORKS

Section 4

BIPARTITE GRAPH

A bipartite graph is a graph whose nodes can be divided into two disjoint sets U and V such that every link connects a node in U to a node in V; that is, U and V are independent sets.

Examples:

Argentine film actor network

Disease network
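The definition translates into a simple test: try to assign every node to U or V so that no link stays inside one set. The sketch below does this with a breadth-first 2-coloring. It assumes an undirected graph stored as a dictionary in which every neighbor also appears as a key; the actor and movie names are hypothetical placeholders.

```python
from collections import deque

def bipartition(adj):
    """Return (U, V) if the undirected graph is bipartite, otherwise None.

    adj maps each node to an iterable of its neighbors.
    """
    side = {}
    for start in adj:                 # loop so that every component is covered
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in side:
                    side[v] = 1 - side[u]     # put v in the opposite set
                    queue.append(v)
                elif side[v] == side[u]:
                    return None               # a link inside one set
    U = {n for n, s in side.items() if s == 0}
    V = {n for n, s in side.items() if s == 1}
    return U, V

# Hypothetical actor-movie network, for illustration only.
actor_movie = {
    "actor_1": ["movie_a", "movie_b"],
    "actor_2": ["movie_b"],
    "movie_a": ["actor_1"],
    "movie_b": ["actor_1", "actor_2"],
}
triangle = {1: [2, 3], 2: [1, 3], 3: [1, 2]}

print(bipartition(actor_movie))   # the two sets: actors and movies
print(bipartition(triangle))      # None: a triangle is not bipartite
```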

GENE NETWORK AND DISEASE NETWORK

[Figure: the DISEASOME linking the PHENOME and the GENOME; panels: Disease network, Gene network]

Goh, Cusick, Valle, Childs, Vidal & Barabási, PNAS (2007)

HUMAN DISEASE NETWORK

https://archive.nytimes.com/www.nytimes.com/interactive/2008/05/05/science/20080506_DISEASE.html?ref=health

BIPARTITE NETWORK OF INGREDIENTS AND FLAVORS

Y.-Y. Ahn, S. E. Ahnert, J. P. Bagrow, A.-L. Barabási, Flavor network and the principles of food pairing, Scientific Reports 1, 196 (2011).

Examples of bipartite graphs

Set U           Set V
• Scientists    • Papers
• Actors        • Movies
• Musicians     • Bands, concerts
• Legislators   • Laws

PATHS

Section 5

DISTANCE IN A GRAPH: shortest path, geodesic path

The distance (shortest path, geodesic path) between two nodes is defined as the number of edges along the shortest path connecting them.

*If the two nodes are disconnected, the distance is infinity.

In directed graphs each path needs to follow the direction of the arrows. Thus in a digraph the distance from node A to B (on an AB path) is generally different from the distance from node B to A (on a BCA path).

[Figure: an undirected and a directed graph on nodes A, B, C, D]

COMPUTING DISTANCES: BREADTH FIRST SEARCH

Distance between node 0 and node 4:

1. Start at 0.
2. Find the nodes adjacent to 0. Mark them as at distance 1. Put them in a queue.
3. Take the first node out of the queue. Find the unmarked nodes adjacent to it in the graph. Mark them with the label of 2 (in general, the label of the node just taken out plus 1). Put them in the queue.
4. Repeat until you find node 4 or there are no more nodes in the queue.
5. The distance between 0 and 4 is the label of 4 or, if 4 does not have a label, infinity.

[Figure: network in which each node is labeled with its distance (0 to 4) from the starting node]
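The breadth-first search procedure can be sketched in a few lines of Python. The graph below is a hypothetical example (the network in the figure is not recoverable from this text), stored as an adjacency list; `bfs_distance` is an illustrative name.

```python
from collections import deque
import math

def bfs_distance(adj, source, target):
    """Distance from source to target by breadth-first search.

    adj maps each node to a list of its neighbors.
    Returns math.inf if target cannot be reached.
    """
    dist = {source: 0}            # the label of each marked node
    queue = deque([source])
    while queue:
        u = queue.popleft()       # take the first node out of the queue
        if u == target:
            return dist[u]
        for v in adj[u]:
            if v not in dist:     # unmarked neighbor
                dist[v] = dist[u] + 1
                queue.append(v)
    return math.inf               # the target never received a label

# Hypothetical graph: node 5 is isolated.
adj = {
    0: [1, 2],
    1: [0, 3],
    2: [0, 3],
    3: [1, 2, 4],
    4: [3],
    5: [],
}

print(bfs_distance(adj, 0, 4))   # 3
print(bfs_distance(adj, 0, 5))   # inf
```

Because nodes leave the queue in order of increasing label, the first label a node receives is already its shortest distance from the source.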


NETWORK DIAMETER AND AVERAGE DISTANCE

Diameter: d_max, the maximum distance between any pair of nodes in the graph.

Average path length/distance, <d>, for a connected graph:

$$\langle d \rangle \equiv \frac{1}{2 L_{max}} \sum_{i,\, j \neq i} d_{ij}$$

where d_ij is the distance from node i to node j.

In an undirected graph d_ij = d_ji, so we only need to count them once:

$$\langle d \rangle \equiv \frac{1}{L_{max}} \sum_{i,\, j > i} d_{ij}$$

PATHS: SUMMARY

[Figure: example graph with nodes 1 to 5]

Shortest Path

The path with the shortest length between two nodes (distance).

$l_{1 \to 5} = 2$, $l_{1 \to 4} = 3$

Diameter

The longest shortest path in a graph.

$l_{1 \to 4} = 3$

Average Path Length

The average of the shortest paths for all pairs of nodes.

$$(l_{1 \to 2} + l_{1 \to 3} + l_{1 \to 4} + l_{1 \to 5} + l_{2 \to 3} + l_{2 \to 4} + l_{2 \to 5} + l_{3 \to 4} + l_{3 \to 5} + l_{4 \to 5}) / 10 = 1.6$$
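The diameter and the average path length can both be computed by running a breadth-first search from every node. The sketch below makes an assumption: the edge list is one 5-node graph chosen to be consistent with the values quoted on the slide (l_{1→5} = 2, l_{1→4} = 3, average 1.6); the original figure is not available, so the actual graph may differ.

```python
from collections import deque
from itertools import combinations

def bfs_all(adj, source):
    """Distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# Assumed edge list, consistent with the distances quoted on the slide.
edges = [(1, 2), (2, 3), (2, 5), (3, 4), (4, 5)]
adj = {n: [] for n in range(1, 6)}
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

dist = {s: bfs_all(adj, s) for s in adj}
pairs = list(combinations(sorted(adj), 2))     # each pair i < j counted once

diameter = max(dist[i][j] for i, j in pairs)
avg_path_length = sum(dist[i][j] for i, j in pairs) / len(pairs)

print(dist[1][5], dist[1][4])   # 2 3
print(diameter)                 # 3
print(avg_path_length)          # 1.6
```

This only works as written for a connected graph: for a disconnected one, some pairs have no entry in `dist` and their distance is infinite.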

PATHS: SUMMARY

[Figure: example graph with nodes 1 to 5]

Cycle

A path with the same start and end node.

Self-avoiding Path

A path that does not intersect itself.

Eulerian Path

A path that traverses each link exactly once.

Hamiltonian Path

A path that visits each node exactly once.
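Each of these definitions can be turned into a check on a given sequence of nodes. A minimal sketch for undirected simple graphs; the function names and the 5-node graph are assumptions made for illustration (the graph in the figure is not recoverable from this text).

```python
def is_path(adj, seq):
    """Every consecutive pair of nodes in seq is joined by a link."""
    return all(v in adj[u] for u, v in zip(seq, seq[1:]))

def is_cycle(adj, seq):
    """A path with the same start and end node."""
    return len(seq) > 1 and is_path(adj, seq) and seq[0] == seq[-1]

def is_self_avoiding(adj, seq):
    """A path that never visits the same node twice."""
    return is_path(adj, seq) and len(set(seq)) == len(seq)

def is_hamiltonian_path(adj, seq):
    """A path that visits each node of the graph exactly once."""
    return is_self_avoiding(adj, seq) and set(seq) == set(adj)

def is_eulerian_path(adj, seq):
    """A path that traverses each link of the graph exactly once."""
    used = [frozenset(e) for e in zip(seq, seq[1:])]
    all_links = {frozenset((u, v)) for u in adj for v in adj[u]}
    return is_path(adj, seq) and len(used) == len(set(used)) and set(used) == all_links

# Hypothetical undirected graph with links 1-2, 2-3, 2-5, 3-4, 4-5.
adj = {1: [2], 2: [1, 3, 5], 3: [2, 4], 4: [3, 5], 5: [2, 4]}

print(is_cycle(adj, [2, 3, 4, 5, 2]))              # True
print(is_self_avoiding(adj, [1, 2, 3, 4]))         # True
print(is_hamiltonian_path(adj, [1, 2, 3, 4, 5]))   # True
print(is_eulerian_path(adj, [1, 2, 3, 4, 5, 2]))   # True
print(is_eulerian_path(adj, [1, 2, 3, 4, 5]))      # False: link 2-5 is not used
```

These functions only verify a candidate sequence; finding such a path is a separate problem.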

CONNECTIVITY

Section 6

CONNECTIVITY IN UNDIRECTED GRAPHS

Connected (undirected) graph: any two vertices can be joined by a path. A disconnected graph is made up of two or more connected components.

Bridge: a link whose removal disconnects the graph.

Largest Component: Giant Connected Component

The rest: Isolates

[Figure: a connected and a disconnected graph on nodes A, B, C, D, F, G]
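Connected components can be found by repeating a breadth-first search from every node that has not been reached yet. The sketch below also tests whether a link is a bridge by removing it and counting components again, which follows the definition directly but is not the most efficient method. The graph and the function names are hypothetical.

```python
from collections import deque

def connected_components(adj):
    """Components of an undirected graph, largest first."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        comp = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    comp.add(v)
                    queue.append(v)
        components.append(comp)
    return sorted(components, key=len, reverse=True)

def is_bridge(adj, u, v):
    """True if removing the link u-v increases the number of components."""
    without = {n: [m for m in nbrs if {n, m} != {u, v}] for n, nbrs in adj.items()}
    return len(connected_components(without)) > len(connected_components(adj))

# Hypothetical graph: one component of 4 nodes, one of 2, and an isolated node.
adj = {
    "A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"],
    "E": ["F"], "F": ["E"],
    "G": [],
}

components = connected_components(adj)
giant = components[0]

print(giant == {"A", "B", "C", "D"})   # True
print(len(components))                 # 3
print(is_bridge(adj, "C", "D"))        # True: D is cut off without it
print(is_bridge(adj, "A", "B"))        # False: A and B stay linked through C
```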

CONNECTIVITY IN UNDIRECTED GRAPHS: Adjacency Matrix

The adjacency matrix of a network with several components can be written in a block-diagonal form, so that nonzero elements are confined to square blocks along the diagonal, with all other elements being zero.

CONNECTIVITY IN DIRECTED GRAPHS

Strongly connected directed graph: has a path from each node to every other node and vice versa (e.g. an AB path and a BA path).

Weakly connected directed graph: it is connected if we disregard the edge directions.

Strongly connected components can be identified, but not every node is part of a nontrivial strongly connected component.

[Figure: two directed graphs on nodes A to G]
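The difference between strong and weak connectivity can be tested with plain reachability searches. The sketch below relies on a standard observation: a digraph is strongly connected exactly when, from one chosen node, every node is reachable both in the graph and in the graph with all links reversed. The two small digraphs and the function names are hypothetical.

```python
def reachable(adj, source):
    """Set of nodes reachable from source following link directions."""
    seen = {source}
    stack = [source]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def reverse(adj):
    """Digraph with every link u -> v turned into v -> u."""
    rev = {n: [] for n in adj}
    for u, nbrs in adj.items():
        for v in nbrs:
            rev[v].append(u)
    return rev

def is_strongly_connected(adj):
    if not adj:
        return True
    s = next(iter(adj))
    everyone = set(adj)
    return reachable(adj, s) == everyone and reachable(reverse(adj), s) == everyone

def is_weakly_connected(adj):
    if not adj:
        return True
    rev = reverse(adj)
    undirected = {n: list(adj[n]) + rev[n] for n in adj}   # ignore directions
    return reachable(undirected, next(iter(adj))) == set(adj)

# Hypothetical digraphs.
cycle = {"A": ["B"], "B": ["C"], "C": ["A"]}   # A -> B -> C -> A
chain = {"A": ["B"], "B": ["C"], "C": []}      # A -> B -> C

print(is_strongly_connected(cycle), is_weakly_connected(cycle))   # True True
print(is_strongly_connected(chain), is_weakly_connected(chain))   # False True
```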

Clustering Coefficient

Section 7

CLUSTERING COEFFICIENT

Clustering coefficient: what fraction of your neighbors are connected to each other?

Node i with degree k_i

C_i in [0,1]

Watts & Strogatz, Nature 1998.

SECTION 10

CLUSTERING COEFFICIENT

The local clustering coefficient captures the degree to which the neighbors of a given node link to each other. For a node i with degree k_i the local clustering coefficient is defined as [5]

$$C_i = \frac{2 L_i}{k_i (k_i - 1)} \qquad (19)$$

where L_i represents the number of links between the k_i neighbors of node i. Note that C_i is between 0 and 1:

C_i = 0 if none of the neighbors of node i link to each other;

C_i = 1 if the neighbors of node i form a complete graph, i.e. they all link to each other (Image 2.7).

In general C_i is the probability that two neighbors of a node link to each other: C = 0.5 implies that there is a 50% chance that two neighbors of a node are linked.

In summary C_i measures the network's local density: the more densely interconnected the neighborhood of node i, the higher is C_i.

The degree of clustering of a whole network is captured by the average clustering coefficient, <C>, representing the average of C_i over all nodes i = 1, ..., N [5],

$$\langle C \rangle = \frac{1}{N} \sum_{i=1}^{N} C_i \qquad (20)$$

In line with the probabilistic interpretation, <C> is the probability that two neighbors of a randomly selected node link to each other.

While Eq. (19) is defined for undirected networks, the clustering coefficient can be generalized to directed and weighted [6,7,8,9] networks as well. Note that in the network literature one also often encounters the global clustering coefficient, defined in Appendix A.

Image 2.15. Clustering Coefficient.

The local clustering coefficient, C_i, of the central node with degree k_i = 4 for three different configurations of its neighborhood. The clustering coefficient measures the local density of links in a node's vicinity. The bottom figure shows a small network, with the local clustering coefficient of a node shown next to each node. Next to the figure we also list the network's average clustering coefficient <C>, according to Eq. (20), and its global clustering coefficient C, defined in Appendix A, Eq. (21). Note that for nodes with degrees k_i = 0, 1, the clustering coefficient is taken to be zero.
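Eqs. (19) and (20) can be evaluated directly on a small graph. A minimal sketch, assuming an undirected graph stored as a dictionary of neighbor sets; the 4-node graph (a triangle 1-2-3 with node 4 attached to node 2) and the function names are hypothetical. Nodes with degree 0 or 1 get C_i = 0, following the convention stated in the caption.

```python
from itertools import combinations

def local_clustering(adj, i):
    """C_i = 2 L_i / (k_i (k_i - 1)), with C_i = 0 when k_i < 2."""
    nbrs = adj[i]
    k = len(nbrs)
    if k < 2:
        return 0.0
    # L_i: number of links between pairs of neighbors of node i.
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))

def average_clustering(adj):
    """<C>: the average of C_i over all nodes."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)

# Hypothetical graph: triangle 1-2-3 plus node 4 attached to node 2.
adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2}, 4: {2}}

for node in adj:
    print(node, local_clustering(adj, node))
print(average_clustering(adj))
```

Node 2 has three neighbors and only one of the three possible links between them (1-3) exists, so C_2 = 1/3; nodes 1 and 3 have C = 1, node 4 has C = 0, and <C> = 7/12.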



SUMMARY

Section 8

THREE CENTRAL METRICS IN NETWORK SCIENCE

Degree distribution: P(k)

Path length: <d>

Clustering coefficient: $C_i = \frac{2 L_i}{k_i (k_i - 1)}$

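GRAPHS 1

Undirected

$$A_{ij} = \begin{pmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}$$

$$A_{ii} = 0, \qquad A_{ij} = A_{ji}$$

$$L = \frac{1}{2} \sum_{i,j=1}^{N} A_{ij}, \qquad \langle k \rangle = \frac{2L}{N}$$

Examples: actor network, protein-protein interactions

Directed

$$A_{ij} = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$

$$A_{ii} = 0, \qquad A_{ij} \neq A_{ji}$$

$$L = \sum_{i,j=1}^{N} A_{ij}, \qquad \langle k \rangle = \frac{L}{N}$$

Examples: WWW, citation networks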
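The link counts and average degrees for the two matrices on this slide can be verified with a few sums. A minimal sketch in plain Python; the helper names `total` and `is_symmetric` are illustrative.

```python
# Matrices from the slide (4 nodes each).
A_undirected = [
    [0, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
]
A_directed = [
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

def total(A):
    """Sum of all entries of the matrix."""
    return sum(sum(row) for row in A)

def is_symmetric(A):
    """True if A[i][j] == A[j][i] for every pair of indices."""
    n = len(A)
    return all(A[i][j] == A[j][i] for i in range(n) for j in range(n))

N = 4

# Undirected: every link appears twice in A, so L is half the total.
L_undirected = total(A_undirected) // 2
k_undirected = 2 * L_undirected / N

# Directed: every link appears once.
L_directed = total(A_directed)
k_directed = L_directed / N

print(is_symmetric(A_undirected), L_undirected, k_undirected)   # True 4 2.0
print(is_symmetric(A_directed), L_directed, k_directed)         # False 4 1.0
```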

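GRAPHS 2

Unweighted (undirected)

$$A_{ij} = \begin{pmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}$$

$$A_{ii} = 0, \qquad A_{ij} = A_{ji}$$

$$L = \frac{1}{2} \sum_{i,j=1}^{N} A_{ij}, \qquad \langle k \rangle = \frac{2L}{N}$$

Examples: protein-protein interactions, WWW

Weighted (undirected)

$$A_{ij} = \begin{pmatrix} 0 & 2 & 0.5 & 0 \\ 2 & 0 & 1 & 4 \\ 0.5 & 1 & 0 & 0 \\ 0 & 4 & 0 & 0 \end{pmatrix}$$

$$A_{ii} = 0, \qquad A_{ij} = A_{ji}$$

$$L = \frac{1}{2} \sum_{i,j=1}^{N} \mathrm{nonzero}(A_{ij}), \qquad \langle k \rangle = \frac{2L}{N}$$

Examples: call graph, metabolic networks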

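GRAPHS 3

Self-interactions

$$A_{ij} = \begin{pmatrix} 1 & 1 & 1 & 0 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 \end{pmatrix}$$

$$A_{ii} \neq 0, \qquad A_{ij} = A_{ji}$$

$$L = \frac{1}{2} \sum_{i,j=1,\, i \neq j}^{N} A_{ij} + \sum_{i=1}^{N} A_{ii}, \qquad \langle k \rangle = \; ?$$

Examples: protein interaction network, WWW

Multigraph (undirected)

$$A_{ij} = \begin{pmatrix} 0 & 2 & 1 & 0 \\ 2 & 0 & 1 & 3 \\ 1 & 1 & 0 & 0 \\ 0 & 3 & 0 & 0 \end{pmatrix}$$

$$A_{ii} = 0, \qquad A_{ij} = A_{ji}$$

$$L = \frac{1}{2} \sum_{i,j=1}^{N} \mathrm{nonzero}(A_{ij}), \qquad \langle k \rangle = \frac{2L}{N}$$

Examples: social networks, collaboration networks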
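Two of the link-count formulas differ from the plain sum over A: the weighted case counts nonzero entries, and the self-interaction case treats the diagonal separately. A minimal sketch applying both to the weighted matrix from the Weighted (undirected) slide and to the self-interaction matrix; the function names are illustrative.

```python
# Weighted (undirected) matrix and self-interaction matrix from the slides.
A_weighted = [
    [0,   2, 0.5, 0],
    [2,   0, 1,   4],
    [0.5, 1, 0,   0],
    [0,   4, 0,   0],
]
A_self = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
]

def links_weighted(A):
    """L = (1/2) * number of nonzero entries (weights are ignored)."""
    return sum(1 for row in A for a in row if a != 0) // 2

def links_with_self_loops(A):
    """L = (1/2) * sum of off-diagonal entries + sum of diagonal entries."""
    n = len(A)
    off_diagonal = sum(A[i][j] for i in range(n) for j in range(n) if i != j)
    diagonal = sum(A[i][i] for i in range(n))
    return off_diagonal // 2 + diagonal

print(links_weighted(A_weighted))       # 4
print(links_with_self_loops(A_self))    # 6  (4 ordinary links + 2 self-loops)
```

A self-loop appears only once in the matrix, on the diagonal, so it must not be halved along with the off-diagonal entries.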

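GRAPHS 4

Complete Graph (undirected)

$$A_{ij} = \begin{pmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{pmatrix}$$

$$A_{ii} = 0, \qquad A_{i \neq j} = 1$$

$$L = L_{max} = \frac{N(N-1)}{2}, \qquad \langle k \rangle = N - 1$$

Examples: actor network, protein-protein interactions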
