![Page 1: An Active Reliable Multicast Framework for the Grids](https://reader036.vdocuments.pub/reader036/viewer/2022062309/56813509550346895d9c5a2d/html5/thumbnails/1.jpg)
An Active Reliable Multicast Framework for the Grids
M. Maimour & C. Pham
ICCS 2002, Amsterdam
Network Support and Services for Computational Grids
Sunday, April 21st, 2002
http://www.ens-lyon.fr/LIP/RESAM
Action INRIA-RESO
![Page 2: An Active Reliable Multicast Framework for the Grids](https://reader036.vdocuments.pub/reader036/viewer/2022062309/56813509550346895d9c5a2d/html5/thumbnails/2.jpg)
Outline
Motivations behind (reliable) multicast
Use of active networks : the DyRAM protocol
DyRAM main services
Simulation results
Conclusion
![Page 3: An Active Reliable Multicast Framework for the Grids](https://reader036.vdocuments.pub/reader036/viewer/2022062309/56813509550346895d9c5a2d/html5/thumbnails/3.jpg)
From unicast…
Problem: sending the same data to many receivers via unicast is inefficient.
[Figure: a Sender transmitting a separate copy of the data to each of three Receivers]
![Page 4: An Active Reliable Multicast Framework for the Grids](https://reader036.vdocuments.pub/reader036/viewer/2022062309/56813509550346895d9c5a2d/html5/thumbnails/4.jpg)
…to multicast on the Internet.
[Figure: a Sender, the data, and three Receivers]
Problem: sending the same data to many receivers via unicast is inefficient.
Solution: using multicast is more efficient.
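A minimal sketch of why multicast is cheaper, counting how many times one packet crosses a link. The topology (one sender, one router, three receivers) and all names are assumptions chosen for illustration, not taken from the slides.

```python
# Hypothetical delivery tree: each node maps to its parent (towards the sender).
PARENT = {"router": "sender", "r1": "router", "r2": "router", "r3": "router"}


def path_links(node):
    """Return the links (parent, child) on the path from the sender to node."""
    links = []
    while node in PARENT:
        links.append((PARENT[node], node))
        node = PARENT[node]
    return links


def unicast_cost(receivers):
    """One copy per receiver: every link on every path carries its own copy."""
    return sum(len(path_links(r)) for r in receivers)


def multicast_cost(receivers):
    """One copy per link: shared links carry the packet only once."""
    links = set()
    for r in receivers:
        links.update(path_links(r))
    return len(links)


receivers = ["r1", "r2", "r3"]
print("unicast link transmissions:  ", unicast_cost(receivers))
print("multicast link transmissions:", multicast_cost(receivers))
```

In this toy tree the sender-to-router link carries three copies under unicast and a single copy under multicast; the gap grows with the number of receivers behind a shared link.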
![Page 5: An Active Reliable Multicast Framework for the Grids](https://reader036.vdocuments.pub/reader036/viewer/2022062309/56813509550346895d9c5a2d/html5/thumbnails/5.jpg)
Reliable multicast
At the routing level, IP Multicast efficiently delivers packets to all the receivers subscribed to a multicast session, but without any reliability guarantees.
Reliability (including flow and congestion control) is to be addressed at the transport level.
![Page 6: An Active Reliable Multicast Framework for the Grids](https://reader036.vdocuments.pub/reader036/viewer/2022062309/56813509550346895d9c5a2d/html5/thumbnails/6.jpg)
Reliable multicast: a big win for grids
Data replications
Database updates
Code & data transfers
Data communications for distributed applications (collective & gather operations, sync. barrier)
[Figure: multicast group address 224.2.0.1 joining three sites: SDSC IBM SP (1024 procs, 5x12x17 = 1020), NCSA Origin Array (256+128+128, 5x12x(4+2+2) = 480), CPlant cluster (256 nodes)]
![Page 7: An Active Reliable Multicast Framework for the Grids](https://reader036.vdocuments.pub/reader036/viewer/2022062309/56813509550346895d9c5a2d/html5/thumbnails/7.jpg)
Reliable multicast strategies
End-to-end solutions: only the end hosts (the source and/or the receivers) are involved. Problem: the lack of topology information at the end hosts.
In-network solutions: some intermediate nodes (router/server) are involved in the recovery process.
![Page 8: An Active Reliable Multicast Framework for the Grids](https://reader036.vdocuments.pub/reader036/viewer/2022062309/56813509550346895d9c5a2d/html5/thumbnails/8.jpg)
Active networking solutions
Active routers are able to perform customized computations on incoming packets: cache of data, feedback aggregation, filtering, subcasting, …
![Page 9: An Active Reliable Multicast Framework for the Grids](https://reader036.vdocuments.pub/reader036/viewer/2022062309/56813509550346895d9c5a2d/html5/thumbnails/9.jpg)
The DyRAM framework for grids (Dynamic Replier Active Reliable Multicast)
To enable distributed grid applications, the main design goals are:
low recovery latency, using local recovery
low memory usage in routers: local recovery is performed from the receivers (no cache in routers)
low processing overhead in routers: light active services
![Page 10: An Active Reliable Multicast Framework for the Grids](https://reader036.vdocuments.pub/reader036/viewer/2022062309/56813509550346895d9c5a2d/html5/thumbnails/10.jpg)
DyRAM loss recovery strategy : main active services
DyRAM is NACK-based …
Global NACK suppression
Early packet loss detection
Subcast of repair packets
Dynamic replier election
![Page 11: An Active Reliable Multicast Framework for the Grids](https://reader036.vdocuments.pub/reader036/viewer/2022062309/56813509550346895d9c5a2d/html5/thumbnails/11.jpg)
Global NACK suppression
[Figure: several NACK4 messages sent after the loss of data4]
Only one NACK is forwarded to the source.
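The suppression idea can be sketched as a small piece of router state. This is a simplified illustration, not the DyRAM implementation: the class and method names are hypothetical, and timers and state expiry are left out.

```python
class NackSuppressor:
    """Hypothetical per-router state: forward only the first NACK per packet."""

    def __init__(self):
        # Sequence numbers for which a NACK has already been forwarded upstream.
        self.pending = set()

    def on_nack(self, seq):
        """Return True if this NACK should be forwarded towards the source."""
        if seq in self.pending:
            return False  # duplicate NACK: suppressed
        self.pending.add(seq)
        return True

    def on_repair(self, seq):
        """Clear the state once the repair for seq has passed through."""
        self.pending.discard(seq)


router = NackSuppressor()
decisions = [router.on_nack(4) for _ in range(5)]  # five receivers send NACK4
print(decisions)
```

Only the first of the five NACK4 messages yields `True`; the other four are dropped at the router.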
![Page 12: An Active Reliable Multicast Framework for the Grids](https://reader036.vdocuments.pub/reader036/viewer/2022062309/56813509550346895d9c5a2d/html5/thumbnails/12.jpg)
Early lost packet detection
The repair latency can be reduced if the lost packet is requested as soon as possible.
[Figure: data3, data4 and data5 flowing through a router. Label on the router: "A NACK is sent by the router". Label on the receivers' NACK4 messages: "These NACKs are ignored!"]
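A rough sketch of early detection, assuming the router spots a loss from a gap in the sequence numbers of the data it forwards. The class, its fields and the starting sequence number are illustrative assumptions; the slides do not specify the actual mechanism in this detail.

```python
class EarlyLossDetector:
    """Hypothetical router-side gap detector for one multicast session."""

    def __init__(self, next_expected=0):
        self.next_expected = next_expected  # next in-order sequence number
        self.requested = set()              # losses already NACKed by the router

    def on_data(self, seq):
        """Process a data packet; return the sequence numbers the router NACKs."""
        self.requested.discard(seq)  # a repair for seq has arrived
        missing = []
        if seq >= self.next_expected:
            # Every skipped sequence number is treated as lost upstream.
            missing = list(range(self.next_expected, seq))
            self.requested.update(missing)
            self.next_expected = seq + 1
        return missing

    def on_receiver_nack(self, seq):
        """Return True to forward a receiver's NACK, False to ignore it."""
        return seq not in self.requested


router = EarlyLossDetector(next_expected=3)
print(router.on_data(3))           # in order, nothing to request
print(router.on_data(5))           # data4 is missing, the router NACKs it
print(router.on_receiver_nack(4))  # late NACK4 from a receiver is ignored
```

Because the router requests data4 as soon as data5 arrives, the NACK4 messages that receivers send later add nothing and can be dropped.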
![Page 13: An Active Reliable Multicast Framework for the Grids](https://reader036.vdocuments.pub/reader036/viewer/2022062309/56813509550346895d9c5a2d/html5/thumbnails/13.jpg)
Replier election
A receiver is elected to be a replier for each lost packet (one recovery tree per packet)
Load balancing can be taken into account for the replier election
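One possible way to combine per-packet election with load balancing, shown as a sketch. It assumes that candidate repliers are the downstream links that did not send a NACK for the packet, and that ties are broken by how often a link was elected before; both rules, and all names, are assumptions made for illustration.

```python
from collections import Counter


class ReplierElector:
    """Hypothetical election among the downstream links of one active router."""

    def __init__(self, links):
        self.links = list(links)
        self.elected_count = Counter()  # how often each link served as replier

    def elect(self, nack_links):
        """Pick a replier link for one lost packet, or None if no link has it."""
        candidates = [link for link in self.links if link not in nack_links]
        if not candidates:
            return None  # assumption: the request then has to go upstream
        # Least-used link first; the link number breaks ties deterministically.
        chosen = min(candidates, key=lambda link: (self.elected_count[link], link))
        self.elected_count[chosen] += 1
        return chosen


elector = ReplierElector(links=[0, 1, 2, 3])
print(elector.elect(nack_links={1, 2}))  # first loss
print(elector.elect(nack_links={1, 2}))  # second loss goes to another link
```

Counting past elections is only one way to spread the repair load; any other per-link cost could be used as the sort key.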
![Page 14: An Active Reliable Multicast Framework for the Grids](https://reader036.vdocuments.pub/reader036/viewer/2022062309/56813509550346895d9c5a2d/html5/thumbnails/14.jpg)
Replier election and repair subcast
[Figure: a tree of nodes labelled D0, D1 and R1 to R7, connected through IP multicast and DyRAM segments, with downstream links numbered 0, 1 and 2. Message labels: "NAK 2,@", "NAK 2 from link 1", "NAK 2 from link 2", "NAK 2", "Repair 2".]
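The subcast step can be sketched as a table that remembers which downstream links asked for a packet, so the repair is sent only there. The table layout and names are hypothetical; the link numbers follow the "NAK 2 from link 1" and "NAK 2 from link 2" labels of the figure.

```python
from collections import defaultdict


class SubcastTable:
    """Hypothetical router state mapping a lost packet to the links that NACKed it."""

    def __init__(self):
        self.nack_links = defaultdict(set)

    def record_nack(self, seq, link):
        """Remember that a NACK for seq arrived on the given downstream link."""
        self.nack_links[seq].add(link)

    def repair_targets(self, seq):
        """Return the links that get the repair, then forget the entry."""
        return sorted(self.nack_links.pop(seq, set()))


table = SubcastTable()
table.record_nack(2, link=1)  # NAK 2 from link 1
table.record_nack(2, link=2)  # NAK 2 from link 2
print(table.repair_targets(2))
```

Links that never sent a NACK, such as link 0 here, do not receive the repair, which keeps it away from receivers that already hold the packet.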
![Page 15: An Active Reliable Multicast Framework for the Grids](https://reader036.vdocuments.pub/reader036/viewer/2022062309/56813509550346895d9c5a2d/html5/thumbnails/15.jpg)
The DyRAM framework for grids
[Figure: a core network (Gbit rate, 1000 Base FX) surrounded by active routers, with 100 Base FX links and the source]
The backbone is very fast, so it performs nothing other than fast forwarding functions.
A hierarchy of active routers can be used for processing specific functions at different layers of the hierarchy.
• NACK suppression • Subcast • Loss detection
• NACK suppression • Subcast • Replier election
Any receiver can be elected as a replier for a lost packet.
![Page 16: An Active Reliable Multicast Framework for the Grids](https://reader036.vdocuments.pub/reader036/viewer/2022062309/56813509550346895d9c5a2d/html5/thumbnails/16.jpg)
Some simulation results
Network model and metrics used
Local recovery from the receivers
DyRAM vs. ARM (cache in routers)
DyRAM: early lost packet detection
![Page 17: An Active Reliable Multicast Framework for the Grids](https://reader036.vdocuments.pub/reader036/viewer/2022062309/56813509550346895d9c5a2d/html5/thumbnails/17.jpg)
Network model
10 MBytes file transfer
Source router
![Page 18: An Active Reliable Multicast Framework for the Grids](https://reader036.vdocuments.pub/reader036/viewer/2022062309/56813509550346895d9c5a2d/html5/thumbnails/18.jpg)
Metrics
Load at the source: the number of retransmissions from the source.
Load at the network: the consumed bandwidth.
Completion time per packet (latency).
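A sketch of how two of these metrics could be computed from a simulation trace. The trace format, the sender labels and the definition of completion time (the delay until the last receiver holds the packet) are assumptions for illustration, and the numbers are made up rather than taken from the simulations.

```python
from statistics import mean


def source_load(retransmissions):
    """Count the repairs sent by the source.

    retransmissions: list of (sequence number, sender) pairs, where sender
    is "source" or any other label for a local replier.
    """
    return sum(1 for _, sender in retransmissions if sender == "source")


def completion_time(send_time, receive_times):
    """Delay between the first transmission and the last receiver's delivery."""
    return max(receive_times) - send_time


# Made-up trace: two repairs from the source, one from a receiver.
trace = [(4, "source"), (4, "receiver"), (7, "source")]
print("source load:", source_load(trace))

# Made-up delivery times (seconds) for two packets.
latencies = [completion_time(0.0, [1.0, 2.0]), completion_time(1.0, [2.0, 4.0])]
print("mean completion time per packet:", mean(latencies))
```

The network load would be accumulated in the same way, by summing the bytes carried on each link over the whole transfer.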
![Page 19: An Active Reliable Multicast Framework for the Grids](https://reader036.vdocuments.pub/reader036/viewer/2022062309/56813509550346895d9c5a2d/html5/thumbnails/19.jpg)
Local recovery from the receivers (1)
Local recovery reduces the end-to-end delay (especially for high loss rates and a large number of receivers).
#grp: 6…24
4 receivers/group
p=0.25
![Page 20: An Active Reliable Multicast Framework for the Grids](https://reader036.vdocuments.pub/reader036/viewer/2022062309/56813509550346895d9c5a2d/html5/thumbnails/20.jpg)
Local recovery from the receivers (2)
As the group size increases, doing the recoveries from the receivers greatly reduces the bandwidth consumption
48 receivers distributed in g groups
#grp: 2…24
![Page 21: An Active Reliable Multicast Framework for the Grids](https://reader036.vdocuments.pub/reader036/viewer/2022062309/56813509550346895d9c5a2d/html5/thumbnails/21.jpg)
DyRAM vs ARM
ARM performs better than DyRAM only for very low loss rates and with considerable caching requirements
![Page 22: An Active Reliable Multicast Framework for the Grids](https://reader036.vdocuments.pub/reader036/viewer/2022062309/56813509550346895d9c5a2d/html5/thumbnails/22.jpg)
DyRAM: early lost packet detection
#grp: 6…24
4 receivers/group
The end-to-end latency is decreased when the early lost packet detection is enabled
![Page 23: An Active Reliable Multicast Framework for the Grids](https://reader036.vdocuments.pub/reader036/viewer/2022062309/56813509550346895d9c5a2d/html5/thumbnails/23.jpg)
Conclusions
Reliability in large-scale multicast is difficult.
Active services can provide more efficient solutions to reliable multicast problems.
The main DyRAM design goal is to reduce end-to-end latency using active services, which are kept as light as possible, making DyRAM more suitable for grid applications.