네트워크의 과학, 정하웅 kaist, 물리학과 albert lászló barabási, réka albert (univ....

Post on 20-Jan-2016

216 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

네트워크의 과학네트워크의 과학 ,,

정하웅정하웅KAIST, KAIST, 물리학과물리학과

Albert László Barabási, Réka AlbertAlbert László Barabási, Réka Albert(Univ. of Notre Dame)(Univ. of Notre Dame)

Zoltán N. Oltvai (Northwestern Univ.)Zoltán N. Oltvai (Northwestern Univ.)www.nd.edu/~networks

복잡한세상을 바라보는 새로운 복잡한세상을 바라보는 새로운 시각시각

A Few Good Man

Robert Wagner

Austin Powers: The spy who shagged me

Wild Things

Let’s make it legal

Barry Norton

What Price Glory

Monsieur Verdoux

What is Complexity?

Main Entry: 1com·plexFunction: nounEtymology: Late Latin complexus totality, from Latin, embrace, from complectiDate: 16431 : a whole made up of complicated or interrelated parts

non-linear systems chaos fractals

A popular paradigm: Simple systems display complex behavior

3 Body Problem

Earth( ) Jupiter ( ) Sun ( )

Society Human body : chemical network

Internet

NETWORKS!

Society

Nodes: individuals

Links: social relationship (family/work/friendship/etc.)

S. Milgram (1967)

Social networks: Many individuals with diverse social interactions between them.

John Guare “Six Degrees of Separation”

9-11 Terror Hijacker’s Network

Communication networksThe Earth is developing an electronic nervous system, a network with diverse nodes and links are

-computers

-routers

-satellites

-phone lines

-TV cables

-EM waves

Communication networks: Many non-identical components with diverse connections between them.

protein-gene interactions

protein-protein interactions

PROTEOME

GENOME

Citrate Cycle

METABOLISM

Bio-chemical reactions

Complex systemsMade of

many non-identical elements connected by diverse interactions.

NETWORK

Erdös-Rényi model (1960)

Pál ErdösPál Erdös (1913-1996)

- Democratic

Poisson distribution

Connect with probability p

p=1/6 N=10 k ~ 1.5

- Random

Degree Distribution P(k) : prob. that a certain

node will have k links

ARE COMPLEX NETWORKS REALLY

RANDOM?

To test this:

We need to pragmatically investigate the topology of large real networks.

World Wide Web

800 million documents (S. Lawrence, Nature,1999)

Nodes: WWW documents Links: URL links

ROBOT: collects all URL’s found in a document and follows them recursively

R. Albert, H. Jeong, A-L Barabasi, Nature, 401 130 (1999)

What did we expect?

We find:

Pout(k) ~ k-out

out= 2.45 in = 2.1

Pin(k) ~ k- in

k ~ 6

P(k=500) ~ 10-99

NWWW ~ 109

N(k=500)~10-90

P(k=500) ~ 10-6

NWWW ~ 109

N(k=500) ~ 103

What does it mean?Poisson distribution

Exponential Network

Power-law distribution

Scale-free Network

INTERNET BACKBONE

(Faloutsos, Faloutsos and Faloutsos, 1999)

Nodes: computers, routers Links: physical lines

Nodes: people (females; males)Links: sexual relationships

(Liljeros et al. Nature 2001)

4781 Swedes; 18-74; 59% response rate.

SEX-Web

ACTOR CONNECTIVITIES

Nodes: actors Links: cast jointly

N = 212,250 actors k = 28.78

P(k) ~k-

Days of Thunder (1990) Far and Away

(1992) Eyes Wide Shut (1999)

=2.3

SCIENCE CITATION INDEX

( = 3)

Nodes: papers Links: citations

(S. Redner, 1998)

P(k) ~k-

1736 PRL papers (1988)

Econo networkNodes: individual, company, country... Links: economic activities

)1(ln)(ln tPtPY iiiPi(t) : stock price at time t ,

protein-gene interactions

protein-protein interactions

PROTEOME

GENOME

Citrate Cycle

METABOLISM

Bio-chemical reactions

protein-protein network(yeast)

p53 network

(mammals)

metabolic network(E. coli)

Jeong et al. Nature 411, 41 (2001)

Jeong et al. Nature 407, 651 (2000).

Other Examples of Scale-Free Network

Email networkNodes: individual email address Links: email communication

Phone-call networksNodes: phone-number Links: completed phone call

Networks in linguisticsNodes: words Links: appear next or one word apart from each other

Nodes: agents, individuals Links: bids for the same item

(Abello et al, 1999)

(Ferrer et al, 2001)

(H. Jeong et al, 2001)

Networks in Electronic auction (eBay)

Most real world networks have the same internal

structure:

Scale-free networks

Why?

How?

ORIGIN OF SCALE-FREE NETWORKS

(1) The number of nodes (N) is NOT fixed. Networks continuously expand

by the addition of new nodes

Examples: WWW : addition of new documents Citation : publication of new papers

(2) The attachment is NOT uniform.A node is linked with higher probability to a

node that already has a large number of links.

Examples : WWW : new documents link to well known sites (CNN, YAHOO, NewYork Times, etc) Citation : well cited papers are more likely to be cited again

Origins SF

Scale-Free Model(1) GROWTH : At every timestep we add a new node with m edges (connected to the nodes already present in the system).

(2) PREFERENTIAL ATTACHMENT : The probability Π that a new node will be connected to node i depends on the connectivity ki of that node

A.-L.Barabási, R. Albert, Science 286, 509 (1999)

jj

ii k

kk

)(

P(k) ~k-3

Continuum Theory

γ = 3

t

k

k

kAk

t

k i

j j

ii

i

2)(

ii t

tmtk )(

, with initial condition mtk ii )(

)(1)(1)())((

02

2

2

2

2

2

tmk

tm

k

tmtP

k

tmtPktkP ititi

33

2

~12))((

)(

kktm

tm

k

ktkPkP

o

i

A.-L.Barabási, R. Albert and H. Jeong, Physica A 272, 173 (1999)

Most real world networks have the same internal

structure:

Scale-free networks

Why?

How?

Diameter (Scale-free) < Diameter (Exponential)

(Diameter : average distance between two nodes)

Efficiency of resource usage

With same number of nodes and links (same amount of resources),

construct scale-free and exponential networks.

Scale-free network is more efficient than exponential network!

RobustnessComplex systems maintain their basic functions even under errors and failures (cell mutations; Internet router breakdowns)

node failure

fc

0 1Fraction of removed nodes, f

1

Relative size of largest cluster

S

Robustness of scale-free networks

1

S

0 1f

3 : fc=1

(R. Cohen et al PRL, 2000)

Failures

Topological error tolerance

fc

Extreme failure

tolerance

fc

Attacks

Low survivability

under attacks!

Achilles’ Heel of complex network

Internet Protein network

failureattack

R. Albert, H. Jeong, A.L. Barabasi, Nature 406 378 (2000)

Bio-informatics vs. Networks

Human Genome Project completed! Inventory of all genes Only list of proteins

Post Genome Era

• Transcriptomics

• Proteomics

“Human Network Project”

Needs information on interactions

protein-gene interactions

protein-protein interactions

PROTEOME

GENOME

Citrate Cycle

METABOLISM

Bio-chemical reactions

Citrate Cycle

METABOLISM

Bio-chemical reactions

Metabolic NetworksNodes: chemicals (proteins, substrates)

Links: bio-chem. reaction

Metabolic networks

Organisms from all three domains of life are scale-free networks!

H. Jeong, B. Tombor, R. Albert, Z.N. Oltvai, and A.L. Barabasi, Nature, 407 651 (2000)

Archaea Bacteria Eukaryotes

Properties of metabolic networks

Average distances are independent of organisms! by making more links between nodes. based on “design principles” of the cell through evolution.

cf. Other scale-free network: D~log(N)

protein-gene interactions

protein-protein interactions

PROTEOME

GENOME

Citrate Cycle

METABOLISM

Bio-chemical reactions

protein-protein interactions

PROTEOME

Yeast protein networkNodes: proteins

Links: physical interactions (binding)

P. Uetz, et al. Nature 403, 623-7 (2000).

Topology of the protein network

)exp()(~)( 00

k

kkkkkP

H. Jeong, S.P. Mason, A.-L. Barabasi, Z.N. Oltvai, Nature, 411, 41 (2001)

Power-law with exponential cut-off : (physical limitation)

Ito et al, PNAS 97, 1143 (2000); PNAS 98, 4569 (2001).

[3280 protein with 4434 interactions]

While there is only 13% overlap between the Uetz et al and Ito et al data, their large-scale topology is identical.

P(k) ~ k-

Uetz 2.4

Ito 2.3

www.nd.edu/~networks/cell

Yeast protein network- lethality and topological position -

Highly connected proteins are more essential (lethal) than less connected proteins.

protein-gene interactions

protein-protein interactions

PROTEOME

GENOME

Citrate Cycle

METABOLISM

Bio-chemical reactions

protein-gene interactions

Nature 408 307 (2000)

“… since 1989 … there have been over 17,000 publications centered on p53 … this work has led to considerable confusion and controversy.”

“One way to understand the p53 network is to compare it to the Internet. The cell, like the Internet, appears to be a ‘scale-free network’.”

p53 network (mammals)

Complexity

Network

Scale-free network

Science collaboration WWW

Internet CellCitation pattern

UNCOVERING ORDER HIDDEN WITHIN COMPLEX SYSTEMS

Food Web

Conclusions• Scale-free networks are ubiquitous

• Effective use of resources

• Robust against errors and random failures

• Vulnerable under attack

• New way of understanding complexity

• Applicable to many areas

Outlook• Directed Networks

• Weighted Networks

• Dynamics on networks

A Few Good Man

Robert Wagner

Austin Powers: The spy who shagged me

Wild Things

Let’s make it legal

Barry Norton

What Price Glory

Monsieur Verdoux

Rank NameAveragedistance

# ofmovies

# oflinks

1 Rod Steiger 2.537527 112 25622 Donald Pleasence 2.542376 180 28743 Martin Sheen 2.551210 136 35014 Christopher Lee 2.552497 201 29935 Robert Mitchum 2.557181 136 29056 Charlton Heston 2.566284 104 25527 Eddie Albert 2.567036 112 33338 Robert Vaughn 2.570193 126 27619 Donald Sutherland 2.577880 107 2865

10 John Gielgud 2.578980 122 294211 Anthony Quinn 2.579750 146 297812 James Earl Jones 2.584440 112 3787…

876 Kevin Bacon 2.786981 46 1811…

Bonus: Why Kevin Bacon?

Measure the average distance between Kevin Bacon and all other actors.

No. of movies : 46 No. of actors : 1811 Average separation: 2.79

Kevin Bacon

Is Kevin Bacon the most

connected actor?

NO!

876 Kevin Bacon 2.786981 46 1811

Rod Steiger

Martin Sheen

Donald Pleasence

#1

#2

#3

#876Kevin Bacon

References

• R. Albert, H. Jeong, A.L. Barabasi, Nature 401 130 (1999).

• R. Albert, A.L. Barabasi, Science 286 509 (1999).

• A.L. Barabási, R. Albert and H. Jeong, Physica A 272, 173 (1999)

• R. Albert, H. Jeong, A.L. Barabasi, Nature 406 378 (2000).

• H.Jeong, B.Tombor, R.Albert, Z.N.Oltvai, A.L.Barabasi, Nature 407 651 (2000).

• H. Jeong, S.P. Mason, A.L. Barabasi, Z.N. Oltvai, Nature 411 41 (2001).

• J. Podani, Z.N. Oltvai, H. Jeong, B. Tombor, A.L. Barabasi, Nature Genetics (2001).

• S.H. Yook, H. Jeong, A.-L. Barabasi, PNAS (2002)

URL: http://stat.kaist.ac.kr

Email: Hawoong Jeong hjeong@mail.kaist.ac.kr

Paul Erdös has a Bacon number of 4! Paul Erdös was in N is a number (1993) with Gene Patterson

Gene Patterson was in Box of Moonlight (1996) with John Turturro

John Turturro was in Gung Ho (1986) with Clint Howard

Clint Howard was in My Dog Skip (2000) with Kevin Bacon

Information Tech. vs. Networks

(83% commercial content excluded Feb. 1999)

Relative Coverage of Search Engines

NON-EQUAL ACCESS OF INFORMATION

Based on network structure, such as number of incoming links & popularity,

find better information!

Spatial Distribution of RoutersFractal set

Box counting: N() No. of boxes of size that contain routers

N() ~ -Df Df=1.5

Designing better Internet!

Supercomputer

대형연구장비

첨단연구장비

학제간공동연구

대규모데이터

(BT, NT, etc.)

고성능클러스터

가상실험시스템

첨단대형연구장비원격제어및 감시연구망

(Gbps)

N*Grid

NT ET

CT

BT

선박

반도체 자동차

ST

기계기상

The Grid

인터넷의 미래

Distributed, high performance(parallel) computing and data handling infrastructure

데이터 그리드와 컴퓨터 그리드를 이용한 대규모

분산컴퓨팅 가능

(PC 등 컴퓨터의 유휴자원을 이용 , 슈퍼컴퓨터의 수천 -

수만배 이상의 계산능력을 가짐 .)

분산컴퓨터 (Distributed Computer)사용이 허가되어진 컴퓨터들에게 커다란 작업을 분할하여 계산 후 결과를 수집 , 분석하는 방법 .단점 : 컴퓨터 소유주가 자원사용을 허용해야함

( 참여율 저조 )

스텔스 - 컴퓨터 (Stealth Computer)분산 컴퓨터가 소유자의 허가를 받아야하는 것과 달리

소유자의 허가와 상관없이 네트워크에 연결된 모든 컴퓨터에 기본적으로 제공되는 표준 통신프로토콜을 이용 , 모든 컴퓨터에 분산컴퓨팅 기법을 적용하는

방법 .문제점 : 인터넷에 연결된 기본자원의 소유개념 등

여러가지 새로운 윤리적인 문제 발생 .

좀더 적극적인 접근 방법

HTTP 프로토콜과 TCP/IP checksum을 이용한 스텔스 컴퓨팅의 한 예 .

분산컴퓨터의 미래

Because of scale-free structure of the networkit is hard to stop AIDS with standard methods.

(There is no threshold.)

Can we stop AIDS?

By treating hubs (or at least, hub-like nodes)

with higher probability, we can

restore finite threshold.

Food Web

Nodes: trophic species Links: trophic interactions

R.J. Williams, N.D. Martinez Nature (2000)

Nodes: scientist (authors) Links: write paper together

(Newman, 2000, H. Jeong et al 2001)

SCIENCE COAUTHORSHIP

Integrated drug target identification based on genomic data

Cy3

Cy5

Taxonomy using networks

A: Archaea

B: Bacteria

E: Eukaryotes

J. Podani et al, Nature Genetics (2001)

Based on Topology

of metabolic networks

Taxonomy

Because of scale-free structure of the networkit is hard to stop AIDS with standard methods.

(There is no threshold.)

Can we stop AIDS?

By treating hubs (or at least, hub-like nodes)

with higher probability, we can

restore finite threshold.

Modeling the Internet’s large scale topology

S.H. Yook, H. Jeong, A.-L. Barabasi

[cond-mat/0107417]

INTERNET BACKBONE

(Faloutsos, Faloutsos and Faloutsos, 1999)

Nodes: computers, routers Links: physical lines

Preferential Attachment

• Compare maps taken at different times (t = 6 months)• Measure k(k), increase in No. of links for a node with k links

Preferential Attachment:

k(k) ~ k

Population density

Router density

Spatial Distributions

Spatial Distribution of Routers

Fractal set

Box counting: N() No. of boxes of size that contain routers

N() ~ -Df Df=1.5

Distance dep.

INTERNET

N() ~ -Df Df=1.5

k(k) ~ k =1

P(d) ~ d- =1

ij

j

d

kji ~),( # of links

distance between two nodes

Good agreement with real Internet Data!

Traditional modeling: Network as a static graphGiven a network with N nodes and L links

Create a graph with statistically identical topology

RESULT: model the static network topology

PROBLEM: real networks are not static!

Evolve through local processes

• new nodes

• new links between existing nodes

• nodes removed

• links removed

Real networks are dynamical systems!

:

Erdös-Rényi model (1960)

Pál ErdösPál Erdös (1913-1996)

- Democratic

Poisson distribution

Connect with probability p

p=1/6 N=10 k ~ 1.5

- Random

Watts-Strogatz

(Nature 393, 440 (1998))

N nodes forms a regular lattice. With probability p, each edge is rewired randomly.

Clustering: My friends will know each other with high probability!

Probability to be connected C » p

C =# of links between 1,2,…n neighbors

n(n-1)/2

Small-World Phenomena

Networks are clustered [large C(p)]

but have a small characteristic path length [small L(p)].

C(p) : clustering coeff.

L(p) : average path length

Evolving networks

OBJECTIVE: capture the network dynamics

METHOD :

• identify the processes that contribute to the network topology

• measure their frequency from real data

• develop dynamical models that capture these processes

BONUS: get the topology correctly.

Scale-free model(1) GROWTH : At every timestep we add a new node with m edges (connected to the nodes already present in the system).

(2) PREFERENTIAL ATTACHMENT : The probability Π that a new node will be connected to node i depends on the connectivity ki of that node

A.-L.Barabási, R. Albert, Science 286, 509 (1999)

jj

ii k

kk

)(

P(k) ~k-3

Model - A Model - B(Without Growth) (Without Preferential Attachment)

For given fixed number of nodes N,

links are added with preferential attachment.

Number of nodes are increasing,

but links are added with random probability.

Temporarily shows scale-free behavior, but

ends up to exponential network

Poisson distribution (exponential network)

Preferential Attachment

Citation network

Internet

k vs. k : increment of # of links during unit time

t

kk

t

k ii

i

~)( For given t,k (k)

Universality?

P(k) ~ (k+(p,q,m))-(p,q,m)

[1,)

• Predict the network topology from microscopic processes with parameters (p,q,m)

• Scaling but no universality

Extended Model

p=0.937

m=1

= 31.68

= 3.07

Actor network

• prob. p : internal links

• prob. q : link deletion

• prob. 1-p-q : add node

WWW(in)

WWW(out)

Internet(domain)

Internet(router)

ActorCitation

indexPower-

grid = 2.1 = 2.45 = 2.2 = 2.5 = 2.3 = 3 = 4

Other Models

• Model with initial attractiveness : (k) ~ A+k

P(k) ~ k- where =2 + A/m

(Dorogovtsev et al (2000).)

• Model with non-linear preferential attachment : (k) ~ k P(k) ~ no scaling for 1

<1 : stretch-exponential

>1 : no-scaling (>2 : “winner-takes-all”)

(Krapivsky et al (2000).)

• Model with aging : each node has lifetime node cannot get links after finite time. (actor)

P(k) : power-law with exponential cutoff

(Amaral et al (2000).)

• Model with limitation : each node has capacity limit.

node cannot get links after finite # of links.(internet)

P(k) : power-law with exponential cutoff

(Amaral et al (2000).)

Other Models (continued)

• Model with fitness (i) : each node has an intrinsic ability to get links at the expense of other nodes : (ki) ~ iki

(Bianconi et al (2000).)

1

)()()(

C

k

mCkP

• Alternative mechanisms for preferential attachment:

1. Copying mechanism: (concept from WWW)

: after adding new node, it is connected to

with prob. p: randomly selected node

with prob. 1-p: copied target from selected ‘prototye node’

2. Attaching to edges:

: after adding new node, instead of selecting target node,

select target edges and connect new node to end point of the edges.

Other Models (continued)

Whole cellular network

References

• R. Albert, H. Jeong, A.L. Barabasi, Nature 401 130 (1999).

• R. Albert, A.L. Barabasi, Science 286 509 (1999).

• A.L. Barabási, R. Albert and H. Jeong, Physica A 272, 173 (1999)

• R. Albert, H. Jeong, A.L. Barabasi, Nature 406 378 (2000).

• H.Jeong, B.Tombor, R.Albert, Z.N.Oltvai, A.L.Barabasi, Nature 407 651 (2000).

• H. Jeong, S.P. Mason, A.L. Barabasi, Z.N. Oltvai, Nature 411 41 (2001).

• J. Podani, Z.N. Oltvai, H. Jeong, B. Tombor, A.L. Barabasi, Nature Genetics (2001).

URL: http://www.nd.edu/~networks

< l

>

Finite size scaling: create a network with N nodes with Pin(k) and Pout(k)

< l > = 0.35 + 2.06 log(N)

Diameter of WWW: 19 degrees of separation

l15=2 [125]

l17=4 [1346 7]

… < l > = ??

1

2

3

4

5

6

7

nd.edu

19 degrees of separation R. Albert et al Nature (99)

based on 800 million webpages [S. Lawrence et al Nature (99)]

A. Broder et al WWW9 (00)IBM

Parameter (,,Df) dependence of P(k) and P(d)

N~6209 <k> ~ 3.93

N~325,729 <k> ~ 4.6

attack

attack

failure

failure

Fraction of nodes eliminated

Fraction of nodes eliminated

WWW

Internet

Robustness of scale-free networks

Three domains of Life

Taxonomy by using Cellular networks

1 2 3

A

UPGMA (unweighted pair grouping method with

arithmetic means)

22313

3

ddd A

dij = Jacard indexor rank order

top related