d2mce


Upload: zongying-lyu

Post on 23-Jun-2015


DESCRIPTION

Distributed Mobile Computing Environment
Document: http://ir.ntut.edu.tw/ir/retrieve/35374/ntut-98-95598029-1.pdf
Source: https://github.com/ffbli666/D2MCE

TRANSCRIPT

Page 1: D2MCE

D2MCE

Speaker: Zong-Ying Lyu (呂宗螢)
Adviser: Prof. Wen-Yew Liang (梁文耀)
Date: 2008/07/14

Page 2: D2MCE

Embedded and Parallel Systems Lab 2

D2MCE

Wireless Network

Page 3: D2MCE

DSM

Page 4: D2MCE

Three State

States: Invalid, Shared, Exclusive

Transition labels recovered from the state diagram:
  read miss: fetch, sharers = sharers + {node}
  write miss: sharers = {node}
  write hit: sharers = {node}
  read hit / write hit
  read hit
  invalidate

Legend: Black = processed on all nodes; Red = processed only on the home node
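The three states and the sharers bookkeeping can be read as a small state machine. The following is a minimal sketch in C, not the D2MCE implementation: the slide only preserves the edge labels, so which edge connects which pair of states is an assumption here, and `page_state`, `directory_entry`, `on_read`, `on_write`, and `on_invalidate` are hypothetical names. The sharers set is modelled as a bitmask, which assumes fewer than 32 nodes.

```c
#include <stdint.h>

/* Page states named on the slide. */
typedef enum { PAGE_INVALID, PAGE_SHARED, PAGE_EXCLUSIVE } page_state;

/* Hypothetical directory entry kept for one page: one bit per node. */
typedef struct {
    uint32_t sharers;
} directory_entry;

/* Read on `node`: a miss fetches the page and adds the node to sharers. */
page_state on_read(page_state s, directory_entry *dir, unsigned node)
{
    if (s == PAGE_INVALID) {            /* read miss */
        dir->sharers |= 1u << node;     /* sharers = sharers + {node} */
        return PAGE_SHARED;
    }
    return s;                           /* read hit: state unchanged */
}

/* Write on `node`: afterwards the writer is the only sharer. */
page_state on_write(page_state s, directory_entry *dir, unsigned node)
{
    if (s != PAGE_EXCLUSIVE)            /* write miss, or write hit on a shared page */
        dir->sharers = 1u << node;      /* sharers = {node} */
    return PAGE_EXCLUSIVE;
}

/* Invalidate message received because another node wrote the page. */
page_state on_invalidate(page_state s)
{
    (void)s;                            /* every state drops to invalid */
    return PAGE_INVALID;
}
```

Starting from `PAGE_INVALID`, a read by node 2 yields `PAGE_SHARED` with bit 2 set in `sharers`, and a following write by the same node yields `PAGE_EXCLUSIVE` with node 2 as the only sharer.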

Page 5: D2MCE

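Invalidate & update

Figure: two message diagrams across Node 1, Node 2, Node 3, and Node 4.
Update: store(A) on one node sends an update to each of the other three nodes before they load(A).
Invalidate: store(A) sends an invalidate to each of the other three nodes; a later load(A) on an invalidated node is answered with an update.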

Page 6: D2MCE

Release Consistency Definition

1. Before an ordinary access is allowed to perform with respect to any other processor, all previous acquires must be performed.

2. Before a release is allowed to perform with respect to any other processor, all previous ordinary reads and writes must be performed.

3. Special accesses are sequentially consistent with respect to one another.

Page 7: D2MCE

ERC & LRC

Figure: two sequence diagrams across Node 1, Node 2, and Node 3, one for Lazy RC and one for Eager RC. In each diagram the three nodes perform acquire, store(A), release.

Page 8: D2MCE

Home-based & Homeless

Homeless
  Diffs are scattered across all the nodes
  Diffs must be stored
  Garbage collection is needed

Home-based
  Centralized processing; the home is always updated
  No diff storage
  No garbage collection
  The home node accesses the shared memory without communication

Page 9: D2MCE

HLRC

Figure: sequence diagram across Node 1, Home Node 2, and Node 3.
Labels: acquire, store(A), twin, diff, release, apply diff, Invalidate(A), acquire, Load(A), fetch page, release.
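The twin, diff, and apply diff labels follow the usual home-based lazy release consistency write path: the writer copies the page before its first store, compares the page with that copy at release, and sends only the changed bytes to the home node. A minimal sketch in C, assuming byte-granularity diffs; `diff_entry`, `make_twin`, `make_diff`, and `apply_diff` are hypothetical names and not part of the D2MCE API.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* One modified byte: where it is and its new value (hypothetical encoding). */
typedef struct {
    size_t offset;
    unsigned char value;
} diff_entry;

/* Before the first write, keep an unmodified copy (the twin) of the page. */
unsigned char *make_twin(const unsigned char *page, size_t size)
{
    unsigned char *twin = malloc(size);
    if (twin != NULL)
        memcpy(twin, page, size);
    return twin;
}

/* At release, compare page and twin and record the changed bytes in out[].
 * Returns the number of entries written (at most max_entries). */
size_t make_diff(const unsigned char *page, const unsigned char *twin,
                 size_t size, diff_entry *out, size_t max_entries)
{
    size_t n = 0;
    for (size_t i = 0; i < size && n < max_entries; i++) {
        if (page[i] != twin[i]) {
            out[n].offset = i;
            out[n].value = page[i];
            n++;
        }
    }
    return n;
}

/* On the home node, patch the home copy with the received diff. */
void apply_diff(unsigned char *home_page, const diff_entry *diff, size_t n)
{
    for (size_t i = 0; i < n; i++)
        home_page[diff[i].offset] = diff[i].value;
}
```

Real systems usually encode runs of changed bytes rather than single bytes; the per-byte form is used here only to keep the sketch short.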

Page 10: D2MCE

Send invalidate only to nodes that are not invalid

Figure: sequence diagram across Node 1, Home Node 2, Node 3 (not invalid), and Node 4 (invalid).
Labels: acquire, store(A), release, invalidate, req, update, load(A), reply.

Page 11: D2MCE

HERC Worst Case

4*W count
8*W byte

Figure: sequence diagram across Node 1, Home Node 2, Node 3, and Node 4.
Labels: acquire, store(A), release, Invalidate, reply.
Page states shown: A (invalid), A (shared), A (exclusive).

Page 12: D2MCE

Traditional ERC Worst Case

2(n-1) count
8*W byte

Figure: sequence diagram across Node 1, Node 2, Node 3, and Node 4.
Labels: acquire, store(A), release, Invalidate, reply.
Page states shown: A (invalid), A (shared), A (exclusive).

Page 13: D2MCE

HLRC Worst Case

1 count
3*4*n+8*sm byte

Figure: sequence diagram across Node 1, Home Node 2, Node 3, and Node 4.
Labels: acquire, store(A), release, reply, Invalidate(A).
Page states shown: A (invalid), A (shared).

Page 14: D2MCE

HERC Best Case

4 count
8*W byte

Figure: sequence diagram across Node 1, Home Node 2, and Node 3.
Labels: acquire, store(A), release, Invalidate, reply.
Page states shown: A (invalid), A (exclusive).

Page 15: D2MCE

Traditional ERC Best Case

2(n-1) count
8*W byte

Figure: sequence diagram across Node 1, Node 2, Node 3, and Node 4.
Labels: acquire, store(A), release, Invalidate, reply.
Page states shown: A (invalid), A (exclusive).

Page 16: D2MCE

HLRC Best Case

1 count
3*4*n+8*sm W byte

Figure: sequence diagram across Node 0 and Home Node 1.
Labels: acquire, store(A), release, reply, Invalidate(A).
Page states shown: A (invalid), A (exclusive).

Page 17: D2MCE

D2MCE Architecture

Figure: layered architecture diagram. Components:
  Application
  D2MCE Libraries: Join / Leave, Share Memory, Barrier, Mutex, Semaphore
  Thread Manager
  Resource Manager, Share Memory Manager, Barrier Manager, Mutex Manager, Semaphore Manager
  Communication: Sender, Receiver
  TCP/IP Based Socket

Page 18: D2MCE

Processing framework

Figure: inside a Node, the Receiver of the Communication Process handles request assignment. Each request goes into one of three Queues, and each Queue is paired with a Thread pool.

Page 19: D2MCE

Figure: a Node contains a Computing Thread (Application) and a Communication Process. The Communication Process holds the Resource, Share Memory, Barrier, Mutex, and Semaphore services together with a Receiver and a Sender. Request and Reply messages travel between this Node and the other Nodes.

Page 20: D2MCE

Thread pool process request

Figure: inside the Communication Process of a Node, the Receiver puts incoming requests into a Queue served by four Share Memory threads. Threads 1 and 3 are busy; threads 2 and 4 are sleeping. A Sender is also shown.
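A queue drained by a pool of sleeping and busy worker threads can be sketched with POSIX threads as follows. This is an illustration under stated assumptions rather than D2MCE code: `request_queue`, `worker`, and `run_pool` are hypothetical names, a request is reduced to an integer, and the "work" is only counted. Workers sleep on a condition variable while the queue is empty and wake up when the receiver side enqueues a request.

```c
#include <pthread.h>

#define QUEUE_CAP 64
#define MAX_WORKERS 8

/* Bounded request queue shared by the receiver and the worker threads. */
typedef struct {
    int items[QUEUE_CAP];
    int head, count;
    int closed;                 /* set when no more requests will arrive */
    long handled;               /* number of requests processed so far */
    pthread_mutex_t lock;
    pthread_cond_t not_empty;
} request_queue;

/* Worker: sleep while the queue is empty, wake up to handle a request. */
static void *worker(void *arg)
{
    request_queue *q = arg;
    pthread_mutex_lock(&q->lock);
    for (;;) {
        while (q->count == 0 && !q->closed)
            pthread_cond_wait(&q->not_empty, &q->lock);   /* sleeping */
        if (q->count == 0)
            break;                                        /* closed and drained */
        int req = q->items[q->head];                      /* take one request */
        q->head = (q->head + 1) % QUEUE_CAP;
        q->count--;
        (void)req;               /* a real handler would unlock and process it */
        q->handled++;
    }
    pthread_mutex_unlock(&q->lock);
    return NULL;
}

/* Receiver side: enqueue nrequests (capped at QUEUE_CAP) and let nthreads
 * workers (capped at MAX_WORKERS) drain them. Returns the number handled. */
long run_pool(int nthreads, int nrequests)
{
    request_queue q = {0};
    pthread_t tid[MAX_WORKERS];
    int started = 0;

    if (nthreads > MAX_WORKERS) nthreads = MAX_WORKERS;
    if (nrequests > QUEUE_CAP) nrequests = QUEUE_CAP;
    pthread_mutex_init(&q.lock, NULL);
    pthread_cond_init(&q.not_empty, NULL);

    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&tid[started], NULL, worker, &q) == 0)
            started++;

    for (int i = 0; i < nrequests; i++) {
        pthread_mutex_lock(&q.lock);
        q.items[(q.head + q.count) % QUEUE_CAP] = i;
        q.count++;
        pthread_cond_signal(&q.not_empty);                /* wake one sleeper */
        pthread_mutex_unlock(&q.lock);
    }

    pthread_mutex_lock(&q.lock);
    q.closed = 1;
    pthread_cond_broadcast(&q.not_empty);
    pthread_mutex_unlock(&q.lock);

    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    pthread_cond_destroy(&q.not_empty);
    pthread_mutex_destroy(&q.lock);
    return q.handled;
}
```

Calling `run_pool(4, 10)` mirrors the four-thread picture on the slide. On most toolchains the sketch needs to be compiled with `-pthread`.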

Page 21: D2MCE

Memory Pool

Figure: the memory pool laid out from low to high memory address, with regions labeled 64, 1024, 10240, Other, and Free.

Page 22: D2MCE

Memory Pool

struct memory_info {
    size_t size;
};

Table 1: memory information structure

Figure 5: memory pool memory block

Functions: mem_malloc, mem_free
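One way a size header and fixed size classes can work together is sketched below. It is not the D2MCE `mem_malloc` / `mem_free` code: `pool_alloc`, `pool_free`, `size_class`, and `block_header` are hypothetical names, the classes 64, 1024, and 10240 are taken from the memory pool figure and assumed to be byte sizes, and blocks come from the system heap on first use instead of a preallocated region. The sketch also omits locking, so it is not thread safe.

```c
#include <stddef.h>
#include <stdlib.h>

/* Header placed in front of every block, holding its size as in Table 1.
 * The union pads the header so the user area behind it stays aligned. */
typedef union {
    struct { size_t size; } info;
    max_align_t align;
} block_header;

#define NUM_CLASSES 3
static const size_t class_size[NUM_CLASSES] = { 64, 1024, 10240 };

/* One singly linked free list per class; the link lives in the user area. */
static void *free_list[NUM_CLASSES];

/* Index of the smallest class that fits `size`, or -1 for "other". */
int size_class(size_t size)
{
    for (int i = 0; i < NUM_CLASSES; i++)
        if (size <= class_size[i])
            return i;
    return -1;
}

void *pool_alloc(size_t size)
{
    int c = size_class(size);
    size_t capacity = (c >= 0) ? class_size[c] : size;
    block_header *h;

    if (c >= 0 && free_list[c] != NULL) {       /* reuse a cached block */
        void *user = free_list[c];
        free_list[c] = *(void **)user;
        h = (block_header *)user - 1;
    } else {                                    /* fall back to the system heap */
        h = malloc(sizeof *h + capacity);
        if (h == NULL)
            return NULL;
    }
    h->info.size = capacity;
    return h + 1;                               /* user area follows the header */
}

void pool_free(void *user)
{
    if (user == NULL)
        return;
    block_header *h = (block_header *)user - 1;
    int c = size_class(h->info.size);
    if (c >= 0 && h->info.size == class_size[c]) {
        *(void **)user = free_list[c];          /* cache pooled sizes */
        free_list[c] = user;
    } else {
        free(h);                                /* "other" sizes are returned */
    }
}
```

Because the header records the block's capacity, `pool_free` needs no size argument: it reads the header just before the user pointer to find the right free list.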

Page 23: D2MCE

Thread safe

All functions are thread safe

struct request_header {
    unsigned short msg_type;   // message type
    unsigned int size;         // package size
    unsigned int src_node;     // source node id
    unsigned int src_index;    // source index number
    unsigned int des_index;    // destination index number
};

Page 24: D2MCE

Two Level Parallel

Figure: a Job is distributed over two CPUs, each drawn with two cores (both labeled Core1 on the slide).
Level 1: parallel on the cluster
Level 2: parallel on multi-core or multi-CPU

Page 25: D2MCE

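Multi-thread call d2mce function

Figure: sequence diagram with thread1 and thread2 on Node 1 and Home node 2.
Labels: load(A) on both threads, block, store(A), barrier.
Page states shown: A(invalid), A(shared), A(exclusive).
Note on the slide: A's state is shared, so don't send a request.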

Page 26: D2MCE

False Sharing

Figure: Node1 and Node2 access different parts of the same Page.
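False sharing can be checked arithmetically: two accesses that touch disjoint bytes still conflict when they fall on the same page, because coherence is tracked per page. A minimal sketch in C follows; the 4096-byte page size is an assumption for illustration (the slides do not state one), and `page_of` and `falsely_shared` are hypothetical helper names.

```c
#include <stddef.h>

/* Page size is an assumption for illustration. */
#define PAGE_SIZE 4096u

/* Index of the page that contains byte `offset` of the shared region. */
size_t page_of(size_t offset)
{
    return offset / PAGE_SIZE;
}

/* Byte ranges [a, a+alen) and [b, b+blen) are falsely shared when they do
 * not overlap but still touch at least one common page. */
int falsely_shared(size_t a, size_t alen, size_t b, size_t blen)
{
    if (alen == 0 || blen == 0)
        return 0;
    int overlap = a < b + blen && b < a + alen;
    if (overlap)
        return 0;                       /* true sharing, not false sharing */
    size_t a_first = page_of(a), a_last = page_of(a + alen - 1);
    size_t b_first = page_of(b), b_last = page_of(b + blen - 1);
    return a_first <= b_last && b_first <= a_last;
}
```

With these definitions, `falsely_shared(0, 100, 2048, 100)` is true because both ranges sit in page 0, while `falsely_shared(0, 100, 4096, 100)` is false because the second range starts on the next page.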

Page 27: D2MCE

Multiple-Writer Protocols

Page 29: D2MCE

multiple-writer protocol

int d2mce_mload(void *share_memory, unsigned int offset, unsigned int length);
int d2mce_mstore(void *share_memory, unsigned int offset, unsigned int length);

Table 3: Multiple-writer protocol functions

Figure 8: Multiple-writer protocol

Page 30: D2MCE

multiple-writer protocol

if (node_id == 0)
    d2mce_store(SM);                 // SM = share memory
d2mce_barrier(&barrier, nodes);      // nodes = number of nodes
d2mce_mload(SM, start*sizeof(TYPE), end*sizeof(TYPE));

Table 4: Scatter program pattern

d2mce_mstore(SM, start*sizeof(TYPE), end*sizeof(TYPE));
d2mce_barrier(&barrier, nodes);
if (node_id == 0)
    d2mce_load(SM);

Table 5: Gather program pattern
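Both patterns rely on each node knowing which slice of the shared array it owns. The slides do not show how `start` and `end` are derived, so the following C sketch is one possible block partitioning and an assumption throughout: `slice` and `node_slice` are hypothetical names, and the result is expressed as a byte offset and byte length, matching the parameter names in the `d2mce_mload` / `d2mce_mstore` prototypes.

```c
#include <stddef.h>

/* Byte range of the shared array owned by one node. */
typedef struct {
    size_t offset;   /* first byte of the slice */
    size_t length;   /* size of the slice in bytes */
} slice;

/* Split `count` elements of `elem_size` bytes across `nodes` nodes.
 * The first (count % nodes) nodes receive one extra element. */
slice node_slice(size_t count, size_t elem_size, unsigned nodes, unsigned node_id)
{
    size_t base = count / nodes;
    size_t extra = count % nodes;
    size_t start = node_id * base + (node_id < extra ? node_id : extra);
    size_t len = base + (node_id < extra ? 1 : 0);
    slice s = { start * elem_size, len * elem_size };
    return s;
}
```

For 10 four-byte elements on 4 nodes this gives slices of 12, 12, 8, and 8 bytes that cover all 40 bytes without overlap, so each node loads or stores only its own range.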

Page 31: D2MCE

Dynamic manager migration

int d2mce_sethome(void *share_memory);
int d2mce_ibarrier_manager();
int d2mce_isem_manager();
int d2mce_imutex_manager();
int d2mce_iresource_manager();

Page 32: D2MCE

manager migration

Figure: sequence diagram across Node 0 (new manager), Node 1 (old manager), Node 2, and Node 3.
Labels: I home request, lock & wait service, manage information, Init & set manage information, ok, new manager, unlock & forward, request, forward, block.

Page 33: D2MCE

HRC broadcast

Figure: sequence diagram across Node 1, Home Node 2, Node 3, and Node 4. One node performs acquire, store(A), release; the others then perform acquire, load(A), release one after another. The elapsed time is marked as latency.

Page 34: D2MCE

HRC broadcast barrier

Figure: sequence diagram across Node 1, Home Node 2, Node 3, and Node 4.
Labels: acquire, store(A), release, barrier, load(A) on the other nodes, latency.

Page 35: D2MCE

Home based Disseminate Update

Figure: sequence diagram across Node 1, Node 2, Home Node 3, Node 4, Node 5, and Node 6.
Labels: store(A), update (sent three times, to the register nodes), invalidate, load(A) on four nodes.
Node states shown: not invalid, invalid.

Page 36: D2MCE

Broadcast coding pattern

Use mutex
  Store node:
    d2mce_mutex_lock(&m1);
    d2mce_store(A);
    d2mce_mutex_unlock(&m1);
  Every node that needs to load:
    d2mce_mutex_lock(&m1);
    d2mce_load(A);
    d2mce_mutex_unlock(&m1);

Use barrier
  Store node:
    d2mce_store(A);
    d2mce_barrier(&b1, neednodes);
  Every node that needs to load:
    d2mce_barrier(&b1, neednodes);
    d2mce_load(A);

Use semaphore
  Store node:
    d2mce_store(A);
    for (i = 0; i < neednodes; i++)
        d2mce_sem_post(&m1);
  Every node that needs to load:
    d2mce_sem_wait(&m1);
    d2mce_load(A);

Page 37: D2MCE

Home based Disseminate Update

int d2mce_update_register(void* share_memory);
int d2mce_update_unregister(void* share_memory);

Page 38: D2MCE

Home based Disseminate Register

Figure, register: Node 1 sends "Register update" to Home Node 2, which inputs Node 1 into its table (entry 1).
Figure, unregister: Node 1 sends "Unregister update" to Home Node 2, which clears the node from its table.

Page 39: D2MCE

Event driven

int d2mce_checkUpdate(void* share_memory);

Page 40: D2MCE

Event driven (update)

Figure: Node 1 performs store(A) and sends an update to Node 2. On Node 2 the Share Memory thread updates A and signals the Computing thread, which is waiting in checkupdate(A). The Computing thread then performs load(A).

Page 41: D2MCE

Event driven (invalid)

Figure: Node 1 performs store(A) and sends an invalidate to Node 2. On Node 2 the Share Memory thread marks A invalid and signals the Computing thread, which is waiting in checkupdate(A). The following load(A) sends a request and receives the update.
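The signal from the share memory thread to the computing thread can be modelled with a mutex, a condition variable, and a version counter. The sketch below is an assumption about how such a wait could be built, not the implementation behind `d2mce_checkUpdate`; `update_event`, `update_event_init`, `update_event_signal`, and `update_event_wait` are hypothetical names.

```c
#include <pthread.h>

/* Hypothetical per-object tracker for incoming updates or invalidations. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t changed;
    unsigned long version;      /* bumped on every update or invalidate */
} update_event;

void update_event_init(update_event *e)
{
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->changed, NULL);
    e->version = 0;
}

/* Called by the share memory thread when a message for the object arrives. */
void update_event_signal(update_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->version++;
    pthread_cond_broadcast(&e->changed);
    pthread_mutex_unlock(&e->lock);
}

/* Called by the computing thread: block until the version moves past `seen`,
 * then return the new version to remember for the next call. */
unsigned long update_event_wait(update_event *e, unsigned long seen)
{
    pthread_mutex_lock(&e->lock);
    while (e->version == seen)
        pthread_cond_wait(&e->changed, &e->lock);
    unsigned long now = e->version;
    pthread_mutex_unlock(&e->lock);
    return now;
}
```

Comparing against a remembered version instead of a boolean flag means a signal that arrives before the wait starts is not lost, and several signals in a row wake the waiter only once.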

Page 42: D2MCE

Write and immediately load coding pattern

Use mutex
  Store node:
    d2mce_mutex_lock(&m1);
    d2mce_store(A);
    d2mce_mutex_unlock(&m1);
  Load node:
    while (1) {
        d2mce_mutex_lock(&m1);
        d2mce_load(A);
        d2mce_mutex_unlock(&m1);
    }

Use barrier
  Store node:
    d2mce_store(A);
    d2mce_barrier(&b1, neednodes);
  Load node:
    while (1) {
        d2mce_barrier(&b1, neednodes);
        d2mce_load(A);
    }

Use semaphore
  Store node:
    d2mce_store(A);
    for (i = 0; i < neednodes; i++)
        d2mce_sem_post(&m1, neednodes);
  Load node:
    while (1) {
        d2mce_sem_wait(&m1);
        d2mce_load(A);
    }

Use event driven
  Store node:
    d2mce_store(A);
  Load node:
    while (1) {
        d2mce_checkUpdate(A);
        d2mce_load(A);
    }

Page 43: D2MCE

Evaluation

MM

Size         1              2                            4
128*128      0.0224598      0.0150916 [1.488231864]      0.0149468 [1.502649397]
256*256      0.1624132      0.09476025 [1.71393807]      0.07156825 [2.269347092]
512*512      1.3165244      0.6979126 [1.886374311]      0.438122 [3.004926482]
1024*1024    38.787176      20.96464 [1.850123637]       10.51557 [3.688547173]
2048*2048    362.6819634    184.635501 [1.964313263]     91.1462238 [3.979122209]

Values in brackets are the column 1 value divided by that entry.
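The bracketed figures match the standard speedup ratio, and dividing by the column header gives a parallel efficiency. Reading the headers 1, 2, and 4 as the number of nodes p is an assumption; the slide does not label them. Worked for the smallest and largest problem sizes in the last column:

```latex
\begin{align*}
S_p &= \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p} \\
S_4^{(2048)} &= \frac{362.6819634}{91.1462238} \approx 3.979, \qquad
E_4^{(2048)} \approx \frac{3.979}{4} \approx 0.995 \\
S_4^{(128)} &= \frac{0.0224598}{0.0149468} \approx 1.503, \qquad
E_4^{(128)} \approx \frac{1.503}{4} \approx 0.376
\end{align*}
```

Under that reading, efficiency rises with problem size, from about 0.38 at 128*128 to about 0.99 at 2048*2048.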
