d2mce
DESCRIPTION
Distributed Mobile Computing Environment
Document: http://ir.ntut.edu.tw/ir/retrieve/35374/ntut-98-95598029-1.pdf
Source: https://github.com/ffbli666/D2MCE
D2MCE
Speaker: Zong-Ying Lyu (呂宗螢)
Adviser: Prof. Wen-Yew Liang (梁文耀)
Date: 2008/07/14
Embedded and Parallel Systems Lab 2
D2MCE
[Figure: D2MCE nodes on a wireless network]
DSM
Three-state protocol: Invalid, Shared, Exclusive
[Figure: state diagram. Transition labels:]
  read hit
  read hit / write hit
  read miss: sharers = sharers + {node}
  fetch
  write hit: sharers = {node}
  write miss: sharers = {node}
  invalidate
Legend: black = processed by all nodes; red = processed only by the home node
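The three states and the sharers bookkeeping can be sketched as a small directory kept at the home node. This is a minimal illustration, not the D2MCE implementation: `page_dir`, `dir_read`, `dir_write`, and `MAX_NODES` are hypothetical names, and downgrading an exclusive owner to shared on a remote read miss is an assumption borrowed from conventional three-state protocols.

```c
#include <assert.h>
#include <stdint.h>

/* Page states named on the slide. */
typedef enum { PAGE_INVALID, PAGE_SHARED, PAGE_EXCLUSIVE } page_state;

#define MAX_NODES 32  /* hypothetical limit so sharers fits in one word */

/* Hypothetical home-node directory entry for one shared page. */
typedef struct {
    page_state state[MAX_NODES]; /* state of the page on each node */
    uint32_t   sharers;          /* bitmask of nodes holding a valid copy */
} page_dir;

static void dir_init(page_dir *d) {
    for (int i = 0; i < MAX_NODES; i++) d->state[i] = PAGE_INVALID;
    d->sharers = 0;
}

/* Read: a hit changes nothing; a miss fetches the page and
 * sets sharers = sharers + {node}. */
static void dir_read(page_dir *d, int node) {
    assert(node >= 0 && node < MAX_NODES);
    if (d->state[node] != PAGE_INVALID) return;        /* read hit */
    for (int i = 0; i < MAX_NODES; i++)                /* assumption: owner */
        if (d->state[i] == PAGE_EXCLUSIVE)             /* drops to shared   */
            d->state[i] = PAGE_SHARED;
    d->state[node] = PAGE_SHARED;
    d->sharers |= (uint32_t)1 << node;
}

/* Write (hit or miss): invalidate every other copy, sharers = {node}. */
static void dir_write(page_dir *d, int node) {
    assert(node >= 0 && node < MAX_NODES);
    for (int i = 0; i < MAX_NODES; i++)
        if (i != node) d->state[i] = PAGE_INVALID;
    d->state[node] = PAGE_EXCLUSIVE;
    d->sharers = (uint32_t)1 << node;
}
```

After `dir_read` by nodes 1 and 2 the mask holds both nodes; a `dir_write` by node 2 leaves only node 2 in the mask and marks node 1 invalid.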
Invalidate & update
Update: Node 1 performs store(A) and sends an update to Node 2, Node 3, and Node 4; a later load(A) on those nodes reads the updated copy.
Invalidate: Node 1 performs store(A) and sends an invalidate to Node 2, Node 3, and Node 4; a later load(A) on one of those nodes must obtain an update first.
Release Consistency Definition
1. Before an ordinary access is allowed to perform with respect to any other processor, all previous acquires must be performed.
2. Before a release is allowed to perform with respect to any other processor, all previous ordinary reads and writes must be performed.
3. Special accesses are sequentially consistent with respect to one another.
ERC & LRC
[Figure: two timelines for Node 1, Node 2, and Node 3, each running acquire, store(A), release. Panel titles: Lazy RC, Eager RC]
Eager RC propagates a node's writes when it performs the release; lazy RC defers propagation until another node performs the next acquire.
Home-based & Homeless
Homeless
  Diffs are scattered across all the nodes
  Needs a diff store
  Needs garbage collection
Home-based
  Centralized processing; the home copy is always up to date
  No diff store
  No garbage collection
  The home node accesses the shared memory without communication
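HLRC
[Figure: sequence diagram for Node 1, Node 2, Home, and Node 3. The writing node runs acquire, store(A), release, creating a twin before the write and a diff at release. The home applies the diff and sends Invalidate(A). A later acquire, Load(A), release on another node fetches the page from the home.]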
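The twin, diff, and apply-diff steps named in the HLRC figure can be sketched as follows. This is an illustrative byte-granularity version under stated assumptions: `diff_run`, `make_twin`, `make_diff`, and `apply_diff` are hypothetical names, and the run-list encoding is not taken from the D2MCE source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* One changed byte range [offset, offset + length). Hypothetical layout. */
typedef struct { size_t offset; size_t length; } diff_run;

/* Twin: a private copy of the page taken before the first local write. */
static unsigned char *make_twin(const unsigned char *page, size_t size) {
    unsigned char *twin = malloc(size);
    if (twin) memcpy(twin, page, size);
    return twin;
}

/* Diff: record the runs where the page now differs from its twin.
 * Returns the number of runs stored; stops early if max_runs is reached. */
static size_t make_diff(const unsigned char *page, const unsigned char *twin,
                        size_t size, diff_run *runs, size_t max_runs) {
    assert(page && twin && runs);
    size_t n = 0, i = 0;
    while (i < size && n < max_runs) {
        if (page[i] == twin[i]) { i++; continue; }
        size_t start = i;
        while (i < size && page[i] != twin[i]) i++;
        runs[n].offset = start;
        runs[n].length = i - start;
        n++;
    }
    return n;
}

/* Apply diff at the home: copy only the changed runs into the home copy,
 * so bytes written by other nodes are left untouched. */
static void apply_diff(unsigned char *home, const unsigned char *page,
                       const diff_run *runs, size_t n) {
    for (size_t k = 0; k < n; k++)
        memcpy(home + runs[k].offset, page + runs[k].offset, runs[k].length);
}
```

Because only the changed runs travel to the home, two nodes that wrote disjoint parts of one page can both merge their changes there.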
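Only send to nodes that are not invalid
[Figure: sequence diagram for Node 1, Home, Node 2 (not invalid), Node 3 (invalid), and Node 4. After acquire, store(A), release, the invalidate is sent only to nodes whose copy is not already invalid. Later load(A) operations send a req and receive an update or reply.]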
HERC Worst Case
4*W count
8*W byte
[Figure: Node 1, Home, Node 2, Node 3, and Node 4 take turns running acquire, store(A), release. Each store makes A exclusive on the writer and invalid elsewhere, with an Invalidate and reply exchange.]
Traditional ERC Worst Case
2(n-1) count
8*W byte
[Figure: Node 1, Node 2, Node 3, and Node 4 take turns running acquire, store(A), release. Each store makes A exclusive on the writer and invalid elsewhere, with an Invalidate and reply exchange.]
HLRC Worst Case
1 count
3*4*n+8*sm byte
[Figure: Node 1, Home, Node 2, Node 3, and Node 4 take turns running acquire, store(A), release. Invalidate(A) is delivered with the reply to a later acquire.]
HERC Best Case
4 count
8*W byte
[Figure: Node 1, Home, Node 2, and Node 3. The same node repeats acquire, store(A), release; A is exclusive on the writer and invalid elsewhere, with a single Invalidate and reply exchange.]
Traditional ERC Best Case
2(n-1) count
8*W byte
[Figure: Node 1, Node 2, Node 3, and Node 4. The same node repeats acquire, store(A), release; A is exclusive on the writer and invalid elsewhere, with an Invalidate and reply exchange.]
HLRC Best Case
1 count
3*4*n+8*sm W byte
[Figure: Node 0, Home, and Node 1. The writer repeats acquire, store(A), release; Invalidate(A) is delivered with the reply to an acquire.]
D2MCE Architecture
[Figure: layered architecture, top to bottom]
  Application
  D2MCE Libraries: Join / Leave, Share Memory, Barrier, Mutex, Semaphore
  Thread Manager
  Managers: Resource Manager, Share Memory Manager, Barrier Manager, Mutex Manager, Semaphore Manager, …
  Communication: Sender, Receiver
  TCP/IP Based Socket
Processing framework
[Figure: inside a node's communication process, the receiver assigns each incoming request to a queue, and each queue is served by its own thread pool.]
Node
[Figure: structure of one node]
  Computing thread (application)
  Communication process: Receiver, Sender
  Resources: Share Memory, Barrier, Mutex, Semaphore
  Nodes exchange Request and Reply messages with the other nodes.
Thread pool process request
[Figure: in a node's communication process, the receiver places incoming requests in a queue served by the share memory threads. Threads 1 and 3 are busy; threads 2 and 4 are sleeping. Other labels: Sender.]
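The receiver, queue, and sleeping or busy worker threads in the figure follow the usual bounded-queue thread pool pattern, sketched here with POSIX threads. The names `request_queue`, `queue_push`, `queue_close`, and `worker_main` are hypothetical, a request is reduced to an integer id, and none of this is taken from the D2MCE source.

```c
#include <assert.h>
#include <pthread.h>

#define QUEUE_CAP 64

typedef struct {
    int  items[QUEUE_CAP];       /* hypothetical request ids */
    int  head, count;
    int  closed;                 /* set when no more requests will arrive */
    long handled;                /* total requests processed */
    pthread_mutex_t lock;
    pthread_cond_t  not_empty;
} request_queue;

static void queue_init(request_queue *q) {
    q->head = q->count = q->closed = 0;
    q->handled = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
}

/* Receiver side: enqueue a request and wake one sleeping worker.
 * Returns 0 on success, -1 if the queue is full. */
static int queue_push(request_queue *q, int req) {
    int rc = -1;
    pthread_mutex_lock(&q->lock);
    if (q->count < QUEUE_CAP) {
        q->items[(q->head + q->count) % QUEUE_CAP] = req;
        q->count++;
        rc = 0;
        pthread_cond_signal(&q->not_empty);
    }
    pthread_mutex_unlock(&q->lock);
    return rc;
}

/* Tell the workers to exit once the queue has drained. */
static void queue_close(request_queue *q) {
    pthread_mutex_lock(&q->lock);
    q->closed = 1;
    pthread_cond_broadcast(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Worker: sleep while the queue is empty, otherwise take one request. */
static void *worker_main(void *arg) {
    request_queue *q = arg;
    for (;;) {
        pthread_mutex_lock(&q->lock);
        while (q->count == 0 && !q->closed)
            pthread_cond_wait(&q->not_empty, &q->lock);
        if (q->count == 0) {                 /* closed and drained */
            pthread_mutex_unlock(&q->lock);
            return NULL;
        }
        assert(q->count > 0);
        int req = q->items[q->head];
        q->head = (q->head + 1) % QUEUE_CAP;
        q->count--;
        q->handled++;    /* a real handler would run outside the lock */
        (void)req;
        pthread_mutex_unlock(&q->lock);
    }
}
```

A worker blocked in `pthread_cond_wait` corresponds to a "sleeping" thread in the figure; one that has dequeued a request is "busy".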
Memory Pool
[Figure: memory pool laid out from low to high memory address: 64, 1024, 10240, Other, Free. Block size classes: 64, 1024, 10240, other.]
Memory Pool
struct memory_info {
    size_t size;
};
Table 1 memory information structure
Figure 5 memory pool memory block
mem_malloc mem_free
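A pool with the size classes 64, 1024, 10240, and "other" and a `memory_info` header in front of each block could look like the sketch below. It is a simplified illustration, not the D2MCE allocator: blocks come from `malloc` instead of one contiguous region, `block_header`, `free_node`, and `size_class` are hypothetical helpers, and the locking that a thread-safe version needs is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct memory_info { size_t size; };      /* header shown on the slide */

/* Header padded so the user area keeps malloc's alignment (assumption). */
typedef union { struct memory_info info; max_align_t pad; } block_header;

/* A freed block reuses its user area as a free-list link. */
typedef struct free_node { struct free_node *next; } free_node;

static const size_t class_size[3] = { 64, 1024, 10240 };
static free_node *free_list[3];

/* Index of the smallest class that fits, or -1 for "other". */
static int size_class(size_t size) {
    for (int i = 0; i < 3; i++)
        if (size <= class_size[i]) return i;
    return -1;
}

void *mem_malloc(size_t size) {
    int c = size_class(size);
    size_t block = (c >= 0) ? class_size[c] : size;
    block_header *h;
    if (c >= 0 && free_list[c]) {         /* reuse a pooled block */
        free_node *n = free_list[c];
        free_list[c] = n->next;
        h = (block_header *)n - 1;
    } else {                              /* pool empty: get a new block */
        h = malloc(sizeof *h + block);
        if (!h) return NULL;
    }
    h->info.size = block;
    return h + 1;
}

void mem_free(void *ptr) {
    if (!ptr) return;
    block_header *h = (block_header *)ptr - 1;
    assert(h->info.size >= class_size[0]);    /* catches a corrupt header */
    int c = size_class(h->info.size);
    if (c < 0) { free(h); return; }           /* "other": back to the system */
    free_node *n = ptr;                       /* pooled: push on free list */
    n->next = free_list[c];
    free_list[c] = n;
}
```

The header lets `mem_free` find the right free list from the pointer alone, which is why the size is stored with every block.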
Thread safe
All functions are thread safe.
struct request_header {
    unsigned short msg_type;   // message type
    unsigned int   size;       // package size
    unsigned int   src_node;   // source node id
    unsigned int   src_index;  // source index number
    unsigned int   des_index;  // destination index number
};
Two Level Parallel
Parallel on the cluster
Parallel on multi-core or multi-CPU nodes
[Figure: a job distributed across CPUs, each with multiple cores]
Multi-thread call d2mce function
[Figure: thread1 and thread2 on Node 1 both call load(A) while A is invalid; the home is node 2. One thread sends the request and the other blocks. Once A's state is shared, no further request is sent. Other labels: store(A), barrier, A(exclusive).]
False Sharing
[Figure: Node1 and Node2 access different parts of the same page]
Multiple-Writer Protocols
multiple-writer protocol
int d2mce_mload(void *share_memory, unsigned int offset, unsigned int length);
int d2mce_mstore(void *share_memory, unsigned int offset, unsigned int length);
Table 3 Multiple-writer protocol functions
Figure 8 Multiple-writer protocol
multiple-writer protocol
if (node_id == 0)
    d2mce_store(SM);                 // SM = share memory
d2mce_barrier(&barrier, nodes);      // nodes = number of nodes
d2mce_mload(SM, start*sizeof(TYPE), end*sizeof(TYPE));
Table 4 Scatter program pattern

d2mce_mstore(SM, start*sizeof(TYPE), end*sizeof(TYPE));
d2mce_barrier(&barrier, nodes);
if (node_id == 0)
    d2mce_load(SM);
Table 5 Gather program pattern
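The scatter and gather patterns need each node's own byte range of the shared array. One way to compute it is sketched below. `slice` and `node_slice` are hypothetical helpers that are not part of D2MCE, and the sketch assumes the third parameter of `d2mce_mload` and `d2mce_mstore` is a byte length, as the declaration names it.

```c
#include <assert.h>
#include <stddef.h>

/* Byte range of one node's part of a shared array. */
typedef struct { size_t offset; size_t length; } slice;

/* Split `count` elements of `elem_size` bytes across `nodes` nodes.
 * The first (count % nodes) nodes receive one extra element each. */
static slice node_slice(size_t count, size_t elem_size,
                        unsigned nodes, unsigned node_id) {
    assert(nodes > 0 && node_id < nodes);
    size_t base  = count / nodes;
    size_t extra = count % nodes;
    size_t start = node_id * base + (node_id < extra ? node_id : extra);
    size_t n     = base + (node_id < extra ? 1 : 0);
    slice s = { start * elem_size, n * elem_size };
    return s;
}
```

A node would then pass `s.offset` and `s.length` as the second and third arguments of `d2mce_mload` or `d2mce_mstore`, so that the slices of all nodes cover the array without overlapping.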
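Dynamic manager migration
int d2mce_sethome(void *share_memory);
int d2mce_ibarrier_manager();
int d2mce_isem_manager();
int d2mce_imutex_manager();
int d2mce_iresource_manager();
manager migration
[Figure: sequence diagram for Node 0 (new manager), the old manager, and Node 1, Node 2, Node 3. Node 0 sends an "I home" request. The old manager locks, waits for service in progress, and sends the manage information. The new manager initializes and sets the manage information and replies ok. The old manager then unlocks and forwards requests to the new manager; requests arriving during the migration block or are forwarded.]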
HRC broadcast
[Figure: Node 1, Home, Node 2, Node 3, Node 4. Node 1 runs acquire, store(A), release; the other nodes then each run acquire, load(A), release in turn. Label: latency.]
HRC broadcast barrier
[Figure: Node 1, Home, Node 2, Node 3, Node 4. Node 1 runs acquire, store(A), release; the other nodes then acquire and load(A). Label: barrier latency.]
Home based Disseminate Update
[Figure: Node 1 to Node 6 and the Home. After store(A), registered nodes that are not invalid receive update; other nodes receive invalidate and become invalid. Each node then runs load(A). Labels: register node, not invalid, invalid.]
Broadcast coding pattern
One store node; every node that needs the data is a load node.
Use mutex
  Store node:
    d2mce_mutex_lock(&m1);
    d2mce_store(A);
    d2mce_mutex_unlock(&m1);
  Load node:
    d2mce_mutex_lock(&m1);
    d2mce_load(A);
    d2mce_mutex_unlock(&m1);
Use barrier
  Store node:
    d2mce_store(A);
    d2mce_barrier(&b1, neednodes);
  Load node:
    d2mce_barrier(&b1, neednodes);
    d2mce_load(A);
Use semaphore
  Store node:
    d2mce_store(A);
    for (i = 0; i < neednodes; i++)
        d2mce_sem_post(&m1);
  Load node:
    d2mce_sem_wait(&m1);
    d2mce_load(A);
Home based Disseminate Update
int d2mce_update_register(void* share_memory);
int d2mce_update_unregister(void* share_memory);
Home based Disseminate Register
[Figure: Node 1 sends Register update to the Home, which enters the node in its table. Node 1 sends Unregister update to the Home, which clears the node from the table.]
Event driven
int d2mce_checkUpdate(void* share_memory);
Event driven (update)
[Figure: Node 1 runs store(A) and sends update to Node 2. On Node 2 the share memory thread applies update A and signals the computing thread, which waits in checkupdate(A) and then runs load(A).]
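Event driven (invalid)
[Figure: Node 1 runs store(A) and sends invalid to Node 2. On Node 2 the share memory thread marks A invalid and signals the computing thread, which waits in checkupdate(A). The following load(A) sends a request and receives an update.]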
Write and immediately load coding pattern
One store node; the load node reads A repeatedly.
Use mutex
  Store node:
    d2mce_mutex_lock(&m1);
    d2mce_store(A);
    d2mce_mutex_unlock(&m1);
  Load node:
    while (1) {
        d2mce_mutex_lock(&m1);
        d2mce_load(A);
        d2mce_mutex_unlock(&m1);
    }
Use barrier
  Store node:
    d2mce_store(A);
    d2mce_barrier(&b1, neednodes);
  Load node:
    while (1) {
        d2mce_barrier(&b1, neednodes);
        d2mce_load(A);
    }
Use semaphore
  Store node:
    d2mce_store(A);
    for (i = 0; i < neednodes; i++)
        d2mce_sem_post(&m1, neednodes);
  Load node:
    while (1) {
        d2mce_sem_wait(&m1);
        d2mce_load(A);
    }
Use event driven
  Store node:
    d2mce_store(A);
  Load node:
    while (1) {
        d2mce_checkUpdate(A);
        d2mce_load(A);
    }
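The event-driven variant has the computing thread block until the share memory thread reports a change. A common way to build such a wait is a flag guarded by a mutex and a condition variable, sketched below with POSIX threads. `update_event`, `event_signal`, `event_wait`, and `signal_thread` are hypothetical names; how `d2mce_checkUpdate` is actually implemented is not shown in the slides.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical per-region flag shared by the two threads on a node. */
typedef struct {
    int updated;                 /* set when an update or invalidate arrived */
    pthread_mutex_t lock;
    pthread_cond_t  changed;
} update_event;

static void event_init(update_event *e) {
    e->updated = 0;
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->changed, NULL);
}

/* Share memory thread: called when a message about the region arrives. */
static void event_signal(update_event *e) {
    pthread_mutex_lock(&e->lock);
    e->updated = 1;
    pthread_cond_signal(&e->changed);
    pthread_mutex_unlock(&e->lock);
}

/* Computing thread: block until the region changed, then clear the flag.
 * The loop guards against spurious wakeups. */
static void event_wait(update_event *e) {
    pthread_mutex_lock(&e->lock);
    while (!e->updated)
        pthread_cond_wait(&e->changed, &e->lock);
    assert(e->updated);
    e->updated = 0;
    pthread_mutex_unlock(&e->lock);
}

/* Stand-in for the share memory thread in a small demonstration. */
static void *signal_thread(void *arg) {
    event_signal(arg);
    return NULL;
}
```

Because the flag is set under the lock, a signal that arrives before the wait starts is not lost: `event_wait` sees `updated` already set and returns at once.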
Evaluation
MM: execution time, with [speedup relative to column 1]
Size         1              2                          4
128*128      0.0224598      0.0150916 [1.488231864]    0.0149468 [1.502649397]
256*256      0.1624132      0.09476025 [1.71393807]    0.07156825 [2.269347092]
512*512      1.3165244      0.6979126 [1.886374311]    0.438122 [3.004926482]
1024*1024    38.787176      20.96464 [1.850123637]     10.51557 [3.688547173]
2048*2048    362.6819634    184.635501 [1.964313263]   91.1462238 [3.979122209]
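The bracketed figures are the column-1 time divided by the time in that column. The sketch below reproduces that calculation. The `efficiency` helper is an addition that does not appear in the slides; it uses the conventional definition and assumes the column headings 1, 2, and 4 count parallel workers.

```c
#include <assert.h>

/* Speedup of a parallel run relative to the column-1 run. */
static double speedup(double t_base, double t_parallel) {
    assert(t_parallel > 0.0);
    return t_base / t_parallel;
}

/* Parallel efficiency: speedup divided by the number of workers
 * (not reported in the slides; conventional definition). */
static double efficiency(double t_base, double t_parallel, int workers) {
    assert(workers > 0);
    return speedup(t_base, t_parallel) / workers;
}
```

For the 2048*2048 row, `speedup(362.6819634, 91.1462238)` gives about 3.979, matching the bracketed value, while the 128*128 row gains almost nothing from going from 2 to 4.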