Theory of αBiNs: Alphabetic Bipartite Networks Animesh Mukherjee Dept. of Computer Science and Engineering Indian Institute of Technology, Kharagpur Collaborators: Monojit Choudhury, Microsoft Research India, Bangalore Niloy Ganguly, Abyayananda Maiti, Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur Fernando Peruani, Service de Physique de l'Etat Condense & Complex System Institute Paris - Ile-de-France, Paris, France Lutz Brusch and Andreas Deutsch, Centre for


Post on 12-Jan-2016


Page 1: Theory of αBiNs: Alphabetic Bipartite Networks Animesh Mukherjee Dept. of Computer Science and Engineering Indian Institute of Technology, Kharagpur Collaborators:

Theory of αBiNs: Alphabetic Bipartite Networks

Animesh Mukherjee
Dept. of Computer Science and Engineering, Indian Institute of Technology, Kharagpur

Collaborators:
• Monojit Choudhury, Microsoft Research India, Bangalore
• Niloy Ganguly, Abyayananda Maiti, Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur
• Fernando Peruani, Service de Physique de l'Etat Condense & Complex System Institute Paris - Ile-de-France, Paris, France
• Lutz Brusch and Andreas Deutsch, Centre for Information Services and High Performance Computing, Technical University of Dresden, Germany

Page 2

Discrete Combinatorial System (DCS)

• A DCS is a system whose basic building blocks are a finite set of elementary units, and the system itself is a collection of a potentially infinite number of discrete combinations of these units

• Examples include two of the greatest wonders on earth – life and language

• Life: the elementary units are the nucleotides or codons, and their discrete combinations give rise to the different genes

• Language: the elementary units are the letters or words, and the discrete combinations are the sentences formed from them

Page 3

αBiNs to Model a DCS

• αBiNs: a special class of complex networks
  o Bipartite in nature
  o One partition contains nodes corresponding to the basic units (or alphabets), while the other contains nodes that represent the discrete combinations of the basic units
  o An edge represents that a particular basic unit is a part of a discrete combination
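The bipartite structure described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `build_abin` and the toy data (units "a" to "e", combinations "c1" to "c3") are hypothetical and do not come from the slides.

```python
from collections import defaultdict


def build_abin(combinations):
    """Build an alphabetic bipartite network from {combination: units}.

    Returns one adjacency map per partition:
    unit -> set of combinations, and combination -> set of units.
    """
    unit_to_combos = defaultdict(set)
    combo_to_units = {}
    for combo, units in combinations.items():
        combo_to_units[combo] = set(units)
        for unit in combo_to_units[combo]:
            # An edge means "this basic unit is part of this combination".
            unit_to_combos[unit].add(combo)
    return dict(unit_to_combos), combo_to_units


# Hypothetical toy data, invented for illustration only.
toy = {
    "c1": ["a", "b", "c"],
    "c2": ["b", "c", "d"],
    "c3": ["c", "e"],
}
unit_to_combos, combo_to_units = build_abin(toy)

# Degree of a node = number of edges incident on it.
unit_degree = {u: len(c) for u, c in unit_to_combos.items()}
combo_degree = {c: len(u) for c, u in combo_to_units.items()}
```

Both degree maps count the same edges, so their totals agree; the degree distributions discussed on the later slides are histograms of these two maps.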

Page 4

Example: Phoneme-Language Network (PlaNet)

• Basic unit: phonemes that human beings can articulate
• Discrete combination: the phoneme inventory of a language, i.e., the repertoire of phonemes that the speakers of the language use for communication

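[Figure: PlaNet, the Phoneme-Language Network. A bipartite graph with language nodes l1, l2, l3, l4 in one partition and phoneme nodes /s/, /p/, /k/, /d/, /t/, /n/ in the other.]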

Page 5

Topological Properties of PlaNet

Figure (degree distribution of language nodes): x-axis is the language inventory size (degree k), y-axis is $p_k$. The data are fitted by a Beta distribution with α = 7.06 and β = 47.64:

$$p_k = \frac{\Gamma(54.7)}{\Gamma(7.06)\,\Gamma(47.64)}\; k^{6.06}\,(1-k)^{46.64}$$

$k_{min} = 5$, $k_{max} = 173$, $k_{avg} = 21$

Figure (degree distribution of phoneme nodes, log-log axes): x-axis is the degree of a consonant, k; y-axis is $P_k$. Fit: $P_k = k^{-0.71}$ with an exponential cut-off.

The networks were constructed from the UCLA Phonological Segment Inventory Database (UPSID), which hosts 317 inventories with 541 different consonants found across them.
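As a rough check of the Beta fit quoted for the language-node degree distribution (α = 7.06, β = 47.64), the density can be evaluated numerically. This sketch assumes the argument is a rescaled degree in (0, 1), since a Beta density is only defined there; the slide does not say how k is rescaled, and the helper name `beta_pdf` is hypothetical.

```python
import math


def beta_pdf(x, a, b):
    """Density of the Beta(a, b) distribution at x, for 0 < x < 1."""
    # Work in log space so the large Gamma values do not overflow.
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


ALPHA, BETA = 7.06, 47.64  # fit parameters quoted on the slide

# Midpoint-rule integration over (0, 1): total mass and mean of the density.
n = 20000
xs = [(i + 0.5) / n for i in range(n)]
total = sum(beta_pdf(x, ALPHA, BETA) for x in xs) / n
mean = sum(x * beta_pdf(x, ALPHA, BETA) for x in xs) / n
```

The numerical mean should agree with the textbook value α / (α + β) for a Beta distribution, which is a quick way to confirm that the normalising constant Γ(54.7) / (Γ(7.06) Γ(47.64)) has been applied correctly.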

Page 6

Network Synthesis

• Can we simulate a stochastic network growth model that has a similar degree distribution (DD)?

• Clue: Preferential attachment leads to power-law degree distributions in both unipartite and unbounded bipartite networks

Page 7

Evolution of PlaNet

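Rules of the game:
• A new language is born
• It chooses from the set of existing phonemes preferentially, based on the degree k; a phoneme is picked with probability

$$\frac{k + \epsilon}{\sum_{\text{all phonemes}} (k + \epsilon)}$$

[Figure: bipartite network with a Phonemes partition and a Languages partition]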
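The growth rule on this slide can be sketched as a small stochastic simulation. Several things here are assumptions, not taken from the slides: the function name `grow_planet`, a fixed inventory size `mu` for every language, a positive additive constant `eps` in the attachment weight, and the choice to update degrees only after a language has picked its whole inventory. Phonemes are drawn without replacement, because an inventory is a set.

```python
import random


def grow_planet(n_phonemes, n_languages, mu, eps, seed=0):
    """Grow a phoneme-language network by preferential attachment.

    Each new language picks `mu` distinct phonemes; a phoneme with current
    degree k is chosen with weight (k + eps). Requires mu <= n_phonemes
    and eps > 0.
    """
    rng = random.Random(seed)
    degree = [0] * n_phonemes
    inventories = []
    for _ in range(n_languages):
        chosen = set()
        for _ in range(mu):
            # Only phonemes not yet in this inventory are eligible.
            candidates = [p for p in range(n_phonemes) if p not in chosen]
            weights = [degree[p] + eps for p in candidates]
            pick = rng.choices(candidates, weights=weights, k=1)[0]
            chosen.add(pick)
        # Degrees are updated once the language is complete (an assumption).
        for p in chosen:
            degree[p] += 1
        inventories.append(chosen)
    return degree, inventories


degree, inventories = grow_planet(n_phonemes=50, n_languages=200, mu=5, eps=0.5, seed=1)
```

A histogram of `degree` gives the simulated phoneme degree distribution. In the real data the inventory sizes vary, so a closer simulation would draw each language's size from the empirical distribution instead of using a constant `mu`.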

Page 8

Wow! We are quite close

ACL 2006

Page 9

Theoretical Investigation: The Three Sides of the Coin

• Sequential Attachment
  o Only one edge per incoming node
  o Exclusive set-membership: language – {speaker, webpage}, country – citizen

• Parallel Attachment With Replacement
  o All incoming nodes have > 1 edges
  o Sequences: letter-word, word-document

• Parallel Attachment Without Replacement
  o Sets: phoneme-languages, station-train

Page 10

Sequential Attachment

Markov Chain Formulation

Notations:
• t – number of nodes in the growing partition
• N – number of nodes in the fixed partition
• $p_{k,t}$ – $p_k$ after adding t nodes
* One edge added per node

EPL, 2007
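The Markov chain itself is not reproduced in this transcript. As a hedged sketch, one plausible form can be derived by assuming that a fixed-partition node of degree k receives the single new edge with probability (k + eps) / (t + N·eps), where `eps` is an assumed name for the additive constant and the denominator is the sum of (k + eps) over all N nodes after t edges. The code below iterates that assumed recurrence numerically; it is not claimed to be the formulation of the EPL paper.

```python
def sequential_pk(N, t_max, eps):
    """Iterate p_{k,t} for sequential attachment (one edge per new node).

    Assumed recurrence, derived from a (k + eps) attachment kernel:
        p_{k,t+1} = (1 - w(k,t)) * p_{k,t} + w(k-1,t) * p_{k-1,t}
        w(k,t)    = (k + eps) / (t + N * eps)
    Returns the list [p_{0,t_max}, ..., p_{t_max,t_max}].
    """
    p = [0.0] * (t_max + 1)
    p[0] = 1.0  # before any node arrives, every fixed node has degree 0
    for t in range(t_max):
        denom = t + N * eps
        new = [0.0] * (t_max + 1)
        for k in range(t + 1):  # after t edges no degree exceeds t
            w = (k + eps) / denom
            new[k] += (1.0 - w) * p[k]   # node is not selected
            new[k + 1] += w * p[k]       # node gains the new edge
        p = new
    return p


p = sequential_pk(N=10, t_max=50, eps=0.5)
mean_degree = sum(k * pk for k, pk in enumerate(p))
```

Two sanity checks follow from the construction: the probabilities sum to 1, and the mean degree equals t / N, because t edges are spread over N fixed nodes.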

Page 11

The Hard Part
• Average degree of the fixed partition diverges
• Methods based on steady-state and continuous-time assumptions fail

Closed-form Solution

EPL, 2007

Page 12

A tunable distribution

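[Figure: $p_k$ (probability that a randomly chosen node has degree k) plotted against k (degree), in several panels for different parameter settings. Only fragments of the panel labels survive: "= 2", "= 1", "= 4e-4", "1 < … <" and "< (N/… − 1)⁻¹".]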

EPL, 2007

Page 13

Parallel attachment with replacement

• Either use an approximation: $p_{k,t} \sim B(k/t;\ \epsilon,\ N\epsilon/\mu - \epsilon)$, where $\mu$ (> 1) is the number of incoming edges

• Or an exact Markov chain:

• Could not solve it for an exact solution

• But we have some closer approximations
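A quick consistency check on the Beta approximation, using only the textbook mean of a Beta distribution, $a/(a+b)$: with $a = \epsilon$ and $b = N\epsilon/\mu - \epsilon$, the expected degree comes out as the total number of edges divided by the number of fixed nodes. This is a worked sanity check, not a derivation taken from the slides.

```latex
\begin{align*}
a + b &= \epsilon + \left(\frac{N\epsilon}{\mu} - \epsilon\right) = \frac{N\epsilon}{\mu}, \\
\mathbb{E}\!\left[\frac{k}{t}\right] &= \frac{a}{a+b}
  = \frac{\epsilon}{N\epsilon/\mu} = \frac{\mu}{N}, \\
\mathbb{E}[k] &= \frac{t\,\mu}{N}.
\end{align*}
```

After t incoming nodes with μ edges each there are tμ edges shared among N fixed nodes, so a mean degree of tμ/N is what one would expect, and the approximation reproduces it.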

To be Submitted to PRE

Page 14

Parallel Attachment with replacement results

= 1 = 0.0625

• =40, N = 100

• Red broken line: approximation

• Blue symbols: stochastic simulation

• Black line: numerical integration of the Markov chain

• For very low the approximation falls out of range

Page 15

One-Mode Projection of the Fixed Partition

• The one-mode projection onto the nodes of the fixed partition is a network of basic units in which two basic units are connected as many times as they occur together in a discrete combination. Example: the Phoneme-Phoneme Network (PhoNet)

[Figure: PhoNet, the Phoneme-Phoneme Network. The phoneme nodes /s/, /n/, /k/, /p/, /t/, /d/ are joined by weighted edges, with weights of 1 or 2.]
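The one-mode projection can be sketched as a co-occurrence count over the combinations. The function names and the toy inventories below are hypothetical (the actual edges of the slide's figure are not recoverable from the transcript). Every toy inventory has the same size mu = 3, so each unit's weighted degree equals its bipartite degree times (mu − 1).

```python
from collections import Counter
from itertools import combinations


def one_mode_projection(combo_to_units):
    """Project a bipartite network onto its basic units.

    The weight of edge (u, v) is the number of combinations that contain
    both u and v. Keys are ordered pairs with u < v.
    """
    weights = Counter()
    for units in combo_to_units.values():
        for u, v in combinations(sorted(set(units)), 2):
            weights[(u, v)] += 1
    return weights


def weighted_degree(weights):
    """Sum of the edge weights incident on each unit in the projection."""
    q = Counter()
    for (u, v), w in weights.items():
        q[u] += w
        q[v] += w
    return q


# Hypothetical inventories of equal size mu = 3, invented for illustration.
toy = {
    "l1": {"/s/", "/p/", "/k/"},
    "l2": {"/p/", "/k/", "/t/"},
    "l3": {"/k/", "/d/", "/n/"},
    "l4": {"/s/", "/k/", "/t/"},
}
weights = one_mode_projection(toy)
q = weighted_degree(weights)
k = Counter(unit for units in toy.values() for unit in units)  # bipartite degree
```

When inventory sizes vary, the simple proportionality between weighted degree and bipartite degree no longer holds exactly, because each combination of size s contributes s − 1 to the weighted degree of every unit it contains.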

Page 16

Weighted Degree Distribution (DD)

= 5 = 15

N = 500, = 1

Blue dots: stochastic simulation; black line: theory

$q = k(\mu - 1)$

Page 17

Comparison with real data

Not a very good match

Page 18

A lot of work for the future

• Derive closed-form solutions for

  o Parallel attachment with replacement

  o Parallel attachment without replacement

• Devise a model and its associated theory to match the properties of the one-mode projection

• Study other real-world systems with an underlying αBiN structure

Page 19

To-DAH