네트워크의 과학, 정하웅 kaist, 물리학과 albert lászló barabási, réka albert (univ....
TRANSCRIPT
네트워크의 과학네트워크의 과학 ,,
정하웅정하웅KAIST, KAIST, 물리학과물리학과
Albert László Barabási, Réka AlbertAlbert László Barabási, Réka Albert(Univ. of Notre Dame)(Univ. of Notre Dame)
Zoltán N. Oltvai (Northwestern Univ.)Zoltán N. Oltvai (Northwestern Univ.)www.nd.edu/~networks
복잡한세상을 바라보는 새로운 복잡한세상을 바라보는 새로운 시각시각
A Few Good Man
Robert Wagner
Austin Powers: The spy who shagged me
Wild Things
Let’s make it legal
Barry Norton
What Price Glory
Monsieur Verdoux
What is Complexity?
Main Entry: 1com·plexFunction: nounEtymology: Late Latin complexus totality, from Latin, embrace, from complectiDate: 16431 : a whole made up of complicated or interrelated parts
non-linear systems chaos fractals
A popular paradigm: Simple systems display complex behavior
3 Body Problem
Earth( ) Jupiter ( ) Sun ( )
Society Human body : chemical network
Internet
NETWORKS!
Society
Nodes: individuals
Links: social relationship (family/work/friendship/etc.)
S. Milgram (1967)
Social networks: Many individuals with diverse social interactions between them.
John Guare “Six Degrees of Separation”
9-11 Terror Hijacker’s Network
Communication networksThe Earth is developing an electronic nervous system, a network with diverse nodes and links are
-computers
-routers
-satellites
-phone lines
-TV cables
-EM waves
Communication networks: Many non-identical components with diverse connections between them.
protein-gene interactions
protein-protein interactions
PROTEOME
GENOME
Citrate Cycle
METABOLISM
Bio-chemical reactions
Complex systemsMade of
many non-identical elements connected by diverse interactions.
NETWORK
Erdös-Rényi model (1960)
Pál ErdösPál Erdös (1913-1996)
- Democratic
Poisson distribution
Connect with probability p
p=1/6 N=10 k ~ 1.5
- Random
Degree Distribution P(k) : prob. that a certain
node will have k links
ARE COMPLEX NETWORKS REALLY
RANDOM?
To test this:
We need to pragmatically investigate the topology of large real networks.
World Wide Web
800 million documents (S. Lawrence, Nature,1999)
Nodes: WWW documents Links: URL links
ROBOT: collects all URL’s found in a document and follows them recursively
R. Albert, H. Jeong, A-L Barabasi, Nature, 401 130 (1999)
What did we expect?
We find:
Pout(k) ~ k-out
out= 2.45 in = 2.1
Pin(k) ~ k- in
k ~ 6
P(k=500) ~ 10-99
NWWW ~ 109
N(k=500)~10-90
P(k=500) ~ 10-6
NWWW ~ 109
N(k=500) ~ 103
What does it mean?Poisson distribution
Exponential Network
Power-law distribution
Scale-free Network
INTERNET BACKBONE
(Faloutsos, Faloutsos and Faloutsos, 1999)
Nodes: computers, routers Links: physical lines
Nodes: people (females; males)Links: sexual relationships
(Liljeros et al. Nature 2001)
4781 Swedes; 18-74; 59% response rate.
SEX-Web
ACTOR CONNECTIVITIES
Nodes: actors Links: cast jointly
N = 212,250 actors k = 28.78
P(k) ~k-
Days of Thunder (1990) Far and Away
(1992) Eyes Wide Shut (1999)
=2.3
SCIENCE CITATION INDEX
( = 3)
Nodes: papers Links: citations
(S. Redner, 1998)
P(k) ~k-
1736 PRL papers (1988)
Econo networkNodes: individual, company, country... Links: economic activities
)1(ln)(ln tPtPY iiiPi(t) : stock price at time t ,
protein-gene interactions
protein-protein interactions
PROTEOME
GENOME
Citrate Cycle
METABOLISM
Bio-chemical reactions
protein-protein network(yeast)
p53 network
(mammals)
metabolic network(E. coli)
Jeong et al. Nature 411, 41 (2001)
Jeong et al. Nature 407, 651 (2000).
Other Examples of Scale-Free Network
Email networkNodes: individual email address Links: email communication
Phone-call networksNodes: phone-number Links: completed phone call
Networks in linguisticsNodes: words Links: appear next or one word apart from each other
Nodes: agents, individuals Links: bids for the same item
(Abello et al, 1999)
(Ferrer et al, 2001)
(H. Jeong et al, 2001)
Networks in Electronic auction (eBay)
Most real world networks have the same internal
structure:
Scale-free networks
Why?
How?
ORIGIN OF SCALE-FREE NETWORKS
(1) The number of nodes (N) is NOT fixed. Networks continuously expand
by the addition of new nodes
Examples: WWW : addition of new documents Citation : publication of new papers
(2) The attachment is NOT uniform.A node is linked with higher probability to a
node that already has a large number of links.
Examples : WWW : new documents link to well known sites (CNN, YAHOO, NewYork Times, etc) Citation : well cited papers are more likely to be cited again
Origins SF
Scale-Free Model(1) GROWTH : At every timestep we add a new node with m edges (connected to the nodes already present in the system).
(2) PREFERENTIAL ATTACHMENT : The probability Π that a new node will be connected to node i depends on the connectivity ki of that node
A.-L.Barabási, R. Albert, Science 286, 509 (1999)
jj
ii k
kk
)(
P(k) ~k-3
Continuum Theory
γ = 3
t
k
k
kAk
t
k i
j j
ii
i
2)(
ii t
tmtk )(
, with initial condition mtk ii )(
)(1)(1)())((
02
2
2
2
2
2
tmk
tm
k
tmtP
k
tmtPktkP ititi
33
2
~12))((
)(
kktm
tm
k
ktkPkP
o
i
A.-L.Barabási, R. Albert and H. Jeong, Physica A 272, 173 (1999)
Most real world networks have the same internal
structure:
Scale-free networks
Why?
How?
Diameter (Scale-free) < Diameter (Exponential)
(Diameter : average distance between two nodes)
Efficiency of resource usage
With same number of nodes and links (same amount of resources),
construct scale-free and exponential networks.
Scale-free network is more efficient than exponential network!
RobustnessComplex systems maintain their basic functions even under errors and failures (cell mutations; Internet router breakdowns)
node failure
fc
0 1Fraction of removed nodes, f
1
Relative size of largest cluster
S
Robustness of scale-free networks
1
S
0 1f
3 : fc=1
(R. Cohen et al PRL, 2000)
Failures
Topological error tolerance
fc
Extreme failure
tolerance
fc
Attacks
Low survivability
under attacks!
Achilles’ Heel of complex network
Internet Protein network
failureattack
R. Albert, H. Jeong, A.L. Barabasi, Nature 406 378 (2000)
Bio-informatics vs. Networks
Human Genome Project completed! Inventory of all genes Only list of proteins
Post Genome Era
• Transcriptomics
• Proteomics
“Human Network Project”
Needs information on interactions
protein-gene interactions
protein-protein interactions
PROTEOME
GENOME
Citrate Cycle
METABOLISM
Bio-chemical reactions
Citrate Cycle
METABOLISM
Bio-chemical reactions
Metabolic NetworksNodes: chemicals (proteins, substrates)
Links: bio-chem. reaction
Metabolic networks
Organisms from all three domains of life are scale-free networks!
H. Jeong, B. Tombor, R. Albert, Z.N. Oltvai, and A.L. Barabasi, Nature, 407 651 (2000)
Archaea Bacteria Eukaryotes
Properties of metabolic networks
Average distances are independent of organisms! by making more links between nodes. based on “design principles” of the cell through evolution.
cf. Other scale-free network: D~log(N)
protein-gene interactions
protein-protein interactions
PROTEOME
GENOME
Citrate Cycle
METABOLISM
Bio-chemical reactions
protein-protein interactions
PROTEOME
Yeast protein networkNodes: proteins
Links: physical interactions (binding)
P. Uetz, et al. Nature 403, 623-7 (2000).
Topology of the protein network
)exp()(~)( 00
k
kkkkkP
H. Jeong, S.P. Mason, A.-L. Barabasi, Z.N. Oltvai, Nature, 411, 41 (2001)
Power-law with exponential cut-off : (physical limitation)
Ito et al, PNAS 97, 1143 (2000); PNAS 98, 4569 (2001).
[3280 protein with 4434 interactions]
While there is only 13% overlap between the Uetz et al and Ito et al data, their large-scale topology is identical.
P(k) ~ k-
Uetz 2.4
Ito 2.3
www.nd.edu/~networks/cell
Yeast protein network- lethality and topological position -
Highly connected proteins are more essential (lethal) than less connected proteins.
protein-gene interactions
protein-protein interactions
PROTEOME
GENOME
Citrate Cycle
METABOLISM
Bio-chemical reactions
protein-gene interactions
Nature 408 307 (2000)
“… since 1989 … there have been over 17,000 publications centered on p53 … this work has led to considerable confusion and controversy.”
…
“One way to understand the p53 network is to compare it to the Internet. The cell, like the Internet, appears to be a ‘scale-free network’.”
p53 network (mammals)
Complexity
Network
Scale-free network
Science collaboration WWW
Internet CellCitation pattern
UNCOVERING ORDER HIDDEN WITHIN COMPLEX SYSTEMS
Food Web
Conclusions• Scale-free networks are ubiquitous
• Effective use of resources
• Robust against errors and random failures
• Vulnerable under attack
• New way of understanding complexity
• Applicable to many areas
Outlook• Directed Networks
• Weighted Networks
• Dynamics on networks
A Few Good Man
Robert Wagner
Austin Powers: The spy who shagged me
Wild Things
Let’s make it legal
Barry Norton
What Price Glory
Monsieur Verdoux
Rank NameAveragedistance
# ofmovies
# oflinks
1 Rod Steiger 2.537527 112 25622 Donald Pleasence 2.542376 180 28743 Martin Sheen 2.551210 136 35014 Christopher Lee 2.552497 201 29935 Robert Mitchum 2.557181 136 29056 Charlton Heston 2.566284 104 25527 Eddie Albert 2.567036 112 33338 Robert Vaughn 2.570193 126 27619 Donald Sutherland 2.577880 107 2865
10 John Gielgud 2.578980 122 294211 Anthony Quinn 2.579750 146 297812 James Earl Jones 2.584440 112 3787…
876 Kevin Bacon 2.786981 46 1811…
Bonus: Why Kevin Bacon?
Measure the average distance between Kevin Bacon and all other actors.
No. of movies : 46 No. of actors : 1811 Average separation: 2.79
Kevin Bacon
Is Kevin Bacon the most
connected actor?
NO!
876 Kevin Bacon 2.786981 46 1811
Rod Steiger
Martin Sheen
Donald Pleasence
#1
#2
#3
#876Kevin Bacon
References
• R. Albert, H. Jeong, A.L. Barabasi, Nature 401 130 (1999).
• R. Albert, A.L. Barabasi, Science 286 509 (1999).
• A.L. Barabási, R. Albert and H. Jeong, Physica A 272, 173 (1999)
• R. Albert, H. Jeong, A.L. Barabasi, Nature 406 378 (2000).
• H.Jeong, B.Tombor, R.Albert, Z.N.Oltvai, A.L.Barabasi, Nature 407 651 (2000).
• H. Jeong, S.P. Mason, A.L. Barabasi, Z.N. Oltvai, Nature 411 41 (2001).
• J. Podani, Z.N. Oltvai, H. Jeong, B. Tombor, A.L. Barabasi, Nature Genetics (2001).
• S.H. Yook, H. Jeong, A.-L. Barabasi, PNAS (2002)
URL: http://stat.kaist.ac.kr
Email: Hawoong Jeong [email protected]
Paul Erdös has a Bacon number of 4! Paul Erdös was in N is a number (1993) with Gene Patterson
Gene Patterson was in Box of Moonlight (1996) with John Turturro
John Turturro was in Gung Ho (1986) with Clint Howard
Clint Howard was in My Dog Skip (2000) with Kevin Bacon
Information Tech. vs. Networks
(83% commercial content excluded Feb. 1999)
Relative Coverage of Search Engines
NON-EQUAL ACCESS OF INFORMATION
Based on network structure, such as number of incoming links & popularity,
find better information!
Spatial Distribution of RoutersFractal set
Box counting: N() No. of boxes of size that contain routers
N() ~ -Df Df=1.5
Designing better Internet!
Supercomputer
대형연구장비
첨단연구장비
학제간공동연구
대규모데이터
(BT, NT, etc.)
고성능클러스터
가상실험시스템
첨단대형연구장비원격제어및 감시연구망
(Gbps)
N*Grid
NT ET
CT
BT
선박
반도체 자동차
ST
기계기상
The Grid
인터넷의 미래
Distributed, high performance(parallel) computing and data handling infrastructure
데이터 그리드와 컴퓨터 그리드를 이용한 대규모
분산컴퓨팅 가능
(PC 등 컴퓨터의 유휴자원을 이용 , 슈퍼컴퓨터의 수천 -
수만배 이상의 계산능력을 가짐 .)
분산컴퓨터 (Distributed Computer)사용이 허가되어진 컴퓨터들에게 커다란 작업을 분할하여 계산 후 결과를 수집 , 분석하는 방법 .단점 : 컴퓨터 소유주가 자원사용을 허용해야함
( 참여율 저조 )
스텔스 - 컴퓨터 (Stealth Computer)분산 컴퓨터가 소유자의 허가를 받아야하는 것과 달리
소유자의 허가와 상관없이 네트워크에 연결된 모든 컴퓨터에 기본적으로 제공되는 표준 통신프로토콜을 이용 , 모든 컴퓨터에 분산컴퓨팅 기법을 적용하는
방법 .문제점 : 인터넷에 연결된 기본자원의 소유개념 등
여러가지 새로운 윤리적인 문제 발생 .
좀더 적극적인 접근 방법
HTTP 프로토콜과 TCP/IP checksum을 이용한 스텔스 컴퓨팅의 한 예 .
분산컴퓨터의 미래
Because of scale-free structure of the networkit is hard to stop AIDS with standard methods.
(There is no threshold.)
Can we stop AIDS?
By treating hubs (or at least, hub-like nodes)
with higher probability, we can
restore finite threshold.
Food Web
Nodes: trophic species Links: trophic interactions
R.J. Williams, N.D. Martinez Nature (2000)
Nodes: scientist (authors) Links: write paper together
(Newman, 2000, H. Jeong et al 2001)
SCIENCE COAUTHORSHIP
Integrated drug target identification based on genomic data
Cy3
Cy5
Taxonomy using networks
A: Archaea
B: Bacteria
E: Eukaryotes
J. Podani et al, Nature Genetics (2001)
Based on Topology
of metabolic networks
Taxonomy
Because of scale-free structure of the networkit is hard to stop AIDS with standard methods.
(There is no threshold.)
Can we stop AIDS?
By treating hubs (or at least, hub-like nodes)
with higher probability, we can
restore finite threshold.
Modeling the Internet’s large scale topology
S.H. Yook, H. Jeong, A.-L. Barabasi
[cond-mat/0107417]
INTERNET BACKBONE
(Faloutsos, Faloutsos and Faloutsos, 1999)
Nodes: computers, routers Links: physical lines
Preferential Attachment
• Compare maps taken at different times (t = 6 months)• Measure k(k), increase in No. of links for a node with k links
Preferential Attachment:
k(k) ~ k
Population density
Router density
Spatial Distributions
Spatial Distribution of Routers
Fractal set
Box counting: N() No. of boxes of size that contain routers
N() ~ -Df Df=1.5
Distance dep.
INTERNET
N() ~ -Df Df=1.5
k(k) ~ k =1
P(d) ~ d- =1
ij
j
d
kji ~),( # of links
distance between two nodes
Good agreement with real Internet Data!
Traditional modeling: Network as a static graphGiven a network with N nodes and L links
Create a graph with statistically identical topology
RESULT: model the static network topology
PROBLEM: real networks are not static!
Evolve through local processes
• new nodes
• new links between existing nodes
• nodes removed
• links removed
Real networks are dynamical systems!
:
Erdös-Rényi model (1960)
Pál ErdösPál Erdös (1913-1996)
- Democratic
Poisson distribution
Connect with probability p
p=1/6 N=10 k ~ 1.5
- Random
Watts-Strogatz
(Nature 393, 440 (1998))
N nodes forms a regular lattice. With probability p, each edge is rewired randomly.
Clustering: My friends will know each other with high probability!
Probability to be connected C » p
C =# of links between 1,2,…n neighbors
n(n-1)/2
Small-World Phenomena
Networks are clustered [large C(p)]
but have a small characteristic path length [small L(p)].
C(p) : clustering coeff.
L(p) : average path length
Evolving networks
OBJECTIVE: capture the network dynamics
METHOD :
• identify the processes that contribute to the network topology
• measure their frequency from real data
• develop dynamical models that capture these processes
BONUS: get the topology correctly.
Scale-free model(1) GROWTH : At every timestep we add a new node with m edges (connected to the nodes already present in the system).
(2) PREFERENTIAL ATTACHMENT : The probability Π that a new node will be connected to node i depends on the connectivity ki of that node
A.-L.Barabási, R. Albert, Science 286, 509 (1999)
jj
ii k
kk
)(
P(k) ~k-3
Model - A Model - B(Without Growth) (Without Preferential Attachment)
For given fixed number of nodes N,
links are added with preferential attachment.
Number of nodes are increasing,
but links are added with random probability.
Temporarily shows scale-free behavior, but
ends up to exponential network
Poisson distribution (exponential network)
Preferential Attachment
Citation network
Internet
k vs. k : increment of # of links during unit time
t
kk
t
k ii
i
~)( For given t,k (k)
Universality?
P(k) ~ (k+(p,q,m))-(p,q,m)
[1,)
• Predict the network topology from microscopic processes with parameters (p,q,m)
• Scaling but no universality
Extended Model
p=0.937
m=1
= 31.68
= 3.07
Actor network
• prob. p : internal links
• prob. q : link deletion
• prob. 1-p-q : add node
WWW(in)
WWW(out)
Internet(domain)
Internet(router)
ActorCitation
indexPower-
grid = 2.1 = 2.45 = 2.2 = 2.5 = 2.3 = 3 = 4
Other Models
• Model with initial attractiveness : (k) ~ A+k
P(k) ~ k- where =2 + A/m
(Dorogovtsev et al (2000).)
• Model with non-linear preferential attachment : (k) ~ k P(k) ~ no scaling for 1
<1 : stretch-exponential
>1 : no-scaling (>2 : “winner-takes-all”)
(Krapivsky et al (2000).)
• Model with aging : each node has lifetime node cannot get links after finite time. (actor)
P(k) : power-law with exponential cutoff
(Amaral et al (2000).)
• Model with limitation : each node has capacity limit.
node cannot get links after finite # of links.(internet)
P(k) : power-law with exponential cutoff
(Amaral et al (2000).)
Other Models (continued)
• Model with fitness (i) : each node has an intrinsic ability to get links at the expense of other nodes : (ki) ~ iki
(Bianconi et al (2000).)
1
)()()(
C
k
mCkP
• Alternative mechanisms for preferential attachment:
1. Copying mechanism: (concept from WWW)
: after adding new node, it is connected to
with prob. p: randomly selected node
with prob. 1-p: copied target from selected ‘prototye node’
2. Attaching to edges:
: after adding new node, instead of selecting target node,
select target edges and connect new node to end point of the edges.
Other Models (continued)
Whole cellular network
References
• R. Albert, H. Jeong, A.L. Barabasi, Nature 401 130 (1999).
• R. Albert, A.L. Barabasi, Science 286 509 (1999).
• A.L. Barabási, R. Albert and H. Jeong, Physica A 272, 173 (1999)
• R. Albert, H. Jeong, A.L. Barabasi, Nature 406 378 (2000).
• H.Jeong, B.Tombor, R.Albert, Z.N.Oltvai, A.L.Barabasi, Nature 407 651 (2000).
• H. Jeong, S.P. Mason, A.L. Barabasi, Z.N. Oltvai, Nature 411 41 (2001).
• J. Podani, Z.N. Oltvai, H. Jeong, B. Tombor, A.L. Barabasi, Nature Genetics (2001).
URL: http://www.nd.edu/~networks
< l
>
Finite size scaling: create a network with N nodes with Pin(k) and Pout(k)
< l > = 0.35 + 2.06 log(N)
Diameter of WWW: 19 degrees of separation
l15=2 [125]
l17=4 [1346 7]
… < l > = ??
1
2
3
4
5
6
7
nd.edu
19 degrees of separation R. Albert et al Nature (99)
based on 800 million webpages [S. Lawrence et al Nature (99)]
A. Broder et al WWW9 (00)IBM
Parameter (,,Df) dependence of P(k) and P(d)
N~6209 <k> ~ 3.93
N~325,729 <k> ~ 4.6
attack
attack
failure
failure
Fraction of nodes eliminated
Fraction of nodes eliminated
WWW
Internet
Robustness of scale-free networks
Three domains of Life
Taxonomy by using Cellular networks
1 2 3
A
UPGMA (unweighted pair grouping method with
arithmetic means)
22313
3
ddd A
dij = Jacard indexor rank order