1
VLDB - Data Management in Grids
B. Del-Fabbro, D. Laiymani, J.M. Nicod and L. Philippe
Laboratoire d’Informatique de l’Université de Franche-Comté
Seoul, Korea, 11 September 2006
Design and experimentations of an efficient data management service for NES architectures
2
Outline
Introduction: the NES context
Related work
Motivations and issues
The data management service
Experimental results
Conclusion and future work
VLDB - DMG Workshop - Seoul, Korea - 11 September 2006
3
Introduction: the NES context
Example: antenna positioning
4
Introduction: the NES context
Exec (EXTRACTION, img1,img2)
Agent (Broker)
RPC-based model
Servers (provide services)
Client
5
Introduction: the NES context
Exec (ANTENNA, img3)
Agent
Data can be reused for further computations
6
Introduction: the NES context
Exec (EXTRACTION, img1,img2)
Agent
It is necessary to allow the storage of some data: data persistency
7
Introduction: the NES context
Exec (ANTENNA, &img3)
Agent
It is necessary to allow the storage of some data: data persistency
8
Introduction: the NES context
Exec(ANTENNA, &img3)
Exec(RENDU, &img3)
Exec(ANTENNA, &img3)
Agent
It is necessary to take advantage of the parallelism offered by independent tasks: data replication
9
Goal
To propose a data management service for NES architectures which implements the data persistency and data replication concepts in the most transparent way for end-users
10
Outline
Introduction: the NES context
Related work
Motivations and issues
The data management service
Experimental results
Conclusion and future work
11
Related work: non-NES architectures
Concepts
Data Grid context: separating the physical and logical views of data (European Data Grid…)
Grid Computing context: large number of widely distributed nodes (GASS, LegionFS…)
Stork: pre-placement tool, generally coupled with a meta-scheduler
12
Related work: non-NES architectures
Drawbacks
Mainly storage and system oriented
Difficult to adapt to NES environments
Data transfers are explicitly performed at the client level
Lack of transparency
13
Related work: NES architectures
Concepts
Decreasing network traffic between clients and servers
Ensuring that no unnecessary data are transmitted
NetSolve: Request Sequencing, Distributed Storage Infrastructure (DSI)
Drawbacks
Data management is performed for only one computation sequence
Data transfers are explicit at the client level
14
Outline
Introduction: the NES context
Related work
Motivations and issues
The data management service
Experimental results
Conclusion and future work
15
Issues
A NES data management service must address the following issues:
Replica consistency: for update operations, do all the replicas have to be updated, or are all the replicas independent copies?
Data storage: store data as close as possible to servers; physical limitations of storage resources
Security: secure access policy; data can be shared (access rights)
16
Issues
A NES data management service must address the following issues:
Data localization: for a data item stored inside the platform, find where it is stored
Data identification: a data item must be fully identified; a client does not have to know where its data are stored
Data handle = unique reference to a data item
Data redistribution: bandwidth is better between servers than between clients and servers, so move data between computational servers
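The identification and localization requirements can be illustrated with a minimal Python sketch. It assumes a hypothetical in-memory catalog; the names `DataCatalog`, `register`, `add_copy` and `locate` are invented for illustration and do not come from the platform presented in these slides.

```python
import uuid


class DataCatalog:
    """Hypothetical catalog mapping opaque data handles to storage locations."""

    def __init__(self):
        # handle -> set of server names that hold a copy of the data item
        self._locations = {}

    def register(self, server):
        """Record a new data item stored on `server` and return its handle."""
        handle = uuid.uuid4().hex  # unique reference, carries no location info
        self._locations[handle] = {server}
        return handle

    def add_copy(self, handle, server):
        """Record that `server` now also holds the item."""
        self._locations[handle].add(server)

    def locate(self, handle):
        """Return the servers that currently hold the item, sorted by name."""
        return sorted(self._locations[handle])


catalog = DataCatalog()
img3 = catalog.register("server-a")   # the client keeps only the handle
catalog.add_copy(img3, "server-b")    # redistribution between servers
print(catalog.locate(img3))
```

The client only ever manipulates the handle; where the item lives is resolved on the platform side.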
17
Outline
Introduction: the NES context
Related work
Motivations and issues
The data management service
Experimental results
Conclusion and future work
18
The data management service: DTM
Basics
Data Tree Manager (DTM): distributed as part of the DIET platform; flexible enough to be implemented in other platforms
Distributed Interactive Engineering Toolbox (DIET): CORBA-based NES platform; hierarchical architecture with Master and Local Agents; performance forecasting tool (FAST)
19
The data management service: DTM
Architecture
20
The data management service: DTM
Components
The Logical Data Manager
It manages a list of tuples (data handle, owners) for the data present in its sub-tree
It provides the localization knowledge
The Physical Data Manager
It manages a list of persistent data
It stores data and provides them to its server
It informs its parent when update operations (add, move, delete) occur
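The split between the Logical and Physical Data Managers can be pictured with a small Python sketch. All class and method names here are hypothetical, and the sketch assumes that an update is forwarded from parent to parent up to the root, which the slides do not state explicitly.

```python
class LogicalDataManager:
    """Hypothetical agent-side manager: knows which child holds each handle."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.owners = {}  # data handle -> set of child names in the sub-tree

    def notify_add(self, handle, child_name):
        self.owners.setdefault(handle, set()).add(child_name)
        if self.parent is not None:
            # Assumption: the update is forwarded all the way up to the root.
            self.parent.notify_add(handle, self.name)

    def notify_delete(self, handle, child_name):
        children = self.owners.get(handle, set())
        children.discard(child_name)
        if not children:
            # No copy left in this sub-tree: forget the handle and tell the parent.
            self.owners.pop(handle, None)
            if self.parent is not None:
                self.parent.notify_delete(handle, self.name)


class PhysicalDataManager:
    """Hypothetical server-side manager: stores the persistent data itself."""

    def __init__(self, name, parent):
        self.name = name
        self.parent = parent
        self.store = {}  # data handle -> bytes

    def add(self, handle, payload):
        self.store[handle] = payload
        self.parent.notify_add(handle, self.name)

    def delete(self, handle):
        del self.store[handle]
        self.parent.notify_delete(handle, self.name)


root = LogicalDataManager("MA")
la1 = LogicalDataManager("LA1", parent=root)
server1 = PhysicalDataManager("server1", parent=la1)
server1.add("img3", b"...")
print(root.owners)  # the root only knows which child sub-tree holds img3
```

Each level stores only the name of the child that leads to the data, so locating an item is a walk down the tree rather than a global lookup.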
21
The data management service: DTM
Components
The Data Mover
It provides mechanisms for data transfers between Data Managers
Data transfer management and data recording are separated
Integration of different transfer protocols: GridFTP, RFT…
The Replica Manager
It sends replication orders to the Data Mover
It allows the choice of the best replica to be transferred (NWS tool)
It uses a distributed protocol: no distinction between the original data and its replicas
Replicas are read-only, but the architecture allows the implementation of any consistency technique
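Choosing the best replica to transfer can be sketched as picking the holder with the lowest estimated transfer time. The cost model below (latency plus size divided by bandwidth) and the forecast values are assumptions made for illustration; they do not reproduce the NWS interface or the actual selection policy.

```python
def estimated_transfer_time(size_mbit, bandwidth_mbit_s, latency_s):
    """Simple cost model: latency plus size divided by bandwidth."""
    return latency_s + size_mbit / bandwidth_mbit_s


def choose_best_replica(size_mbit, forecasts):
    """Return the holder with the lowest estimated transfer time.

    `forecasts` maps a replica holder to a (bandwidth in Mbit/s, latency in s)
    pair towards the destination server. In a real deployment these values
    would come from a forecasting tool; here they are made-up inputs.
    """
    return min(
        forecasts,
        key=lambda holder: estimated_transfer_time(size_mbit, *forecasts[holder]),
    )


# Hypothetical forecasts towards the server that needs the data.
forecasts = {
    "server-a": (100.0, 0.001),
    "server-b": (16.0, 0.020),
}
best = choose_best_replica(800.0, forecasts)
print(best)
```

Because the original and its replicas are not distinguished, any holder is a valid source and the choice reduces to this kind of cost comparison.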
22
The data management service: DTM
Architecture advantages
Communication occurs between DIET and DTM components: low bandwidth consumption for data management
Update operations are limited to sub-trees: again, low bandwidth consumption for data management
DTM minimizes the number of data copy operations (CORBA): crucial for large data
23
The data management service: DTM
The end-user point of view
Only end-users have the knowledge of the application they submit
Only end-users have the knowledge of the data that must be managed
The persistence mode: it allows end-users to choose whether data must be persistent or not
The data handle: end-users do not need to know where data are stored
The API: based on the profile concept
Profile = problem name + data or data handle + persistence mode
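The profile idea (problem name, plus data or a data handle, plus a persistence mode) can be sketched as a small Python data structure. Every type and field name below is hypothetical and is not the DIET API; only the problem names and `img3` are borrowed from the earlier slides.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Union


class Persistence(Enum):
    VOLATILE = "volatile"      # data is discarded after the computation
    PERSISTENT = "persistent"  # data stays on the platform for reuse


@dataclass(frozen=True)
class DataHandle:
    """Opaque reference to a data item already stored on the platform."""
    ident: str


@dataclass
class Argument:
    value: Union[bytes, DataHandle]   # raw data, or a handle to stored data
    mode: Persistence = Persistence.VOLATILE


@dataclass
class Profile:
    problem: str                      # name of the requested service
    arguments: List[Argument] = field(default_factory=list)

    def bytes_to_send(self):
        """Only raw data travels from the client; handles cost nothing."""
        return sum(len(a.value) for a in self.arguments
                   if isinstance(a.value, bytes))


first = Profile("EXTRACTION", [Argument(b"\x00" * 1024, Persistence.PERSISTENT)])
later = Profile("ANTENNA", [Argument(DataHandle("img3"), Persistence.PERSISTENT)])
print(first.bytes_to_send(), later.bytes_to_send())
```

The second request carries only a handle, which is the point of persistence from the end-user's side: the data does not leave the client again.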
24
Outline
Introduction: the NES context
Related work
Motivations and issues
The data management service
Experimental results
Conclusion and future work
25
Experimental results
Description
Previous experiments show the good scalability and low overhead of DTM
The following tests show the relevance of the data persistency approach and the performance of the data replication policy
Platform: DTM deployed over two laboratories 100 km apart
26
Experimental results
Data persistence benefits
1 MA, 2 LAs and 2 servers, locally interconnected (100 Mbit/s)
1 client at the remote site (16 Mbit/s)
Linear algebra application
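A back-of-the-envelope calculation shows why keeping data on the platform matters with these link speeds. The two bandwidths come from the slide; the 100 Mbytes data size is an assumption chosen for illustration, and the model ignores latency and protocol overhead, so it is not a measured result.

```python
def transfer_seconds(size_mbytes, link_mbit_s):
    """Ideal transfer time, ignoring latency and protocol overhead."""
    return size_mbytes * 8 / link_mbit_s


SIZE_MBYTES = 100      # assumed data size, not taken from the slides
CLIENT_LINK = 16       # Mbit/s, remote client to platform (from the slide)
LOCAL_LINK = 100       # Mbit/s, between local servers (from the slide)

from_client = transfer_seconds(SIZE_MBYTES, CLIENT_LINK)
between_servers = transfer_seconds(SIZE_MBYTES, LOCAL_LINK)
print(from_client, between_servers)
```

Under these assumptions, resending the data from the client for every request costs several times more than moving an already stored copy between servers.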
27
Experimental results
Data persistence benefits
28
Experimental results
Replication benefits
1 MA, 6 servers
Computing the number of occurrences of a letter in a file
Synchronous requests are sent to the platform
When data items are not present, they are replicated
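The test service itself is simple, which keeps the focus on data movement rather than computation. A minimal Python sketch of such a letter-counting service might look as follows; the function name and the chunked reading strategy are assumptions, not the code used in the experiment.

```python
import os
import tempfile


def count_letter(path, letter, chunk_size=1 << 16):
    """Count occurrences of a single character in a text file, chunk by chunk."""
    total = 0
    with open(path, "r", encoding="utf-8") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            total += chunk.count(letter)
    return total


# Small self-contained demonstration on a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("data replication and data persistency")
    sample_path = tmp.name

result = count_letter(sample_path, "a")
os.remove(sample_path)
print(result)
```

Since the cost of the count is small compared with moving a large file, the time a request takes depends mostly on whether the file is already present on the chosen server.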
29
Experimental results
Replication benefits
30
Experimental results
Use case: Dividing Cubes
Medical imaging application
Input files (from 0.1 Mbytes up to 500 Mbytes)
Several extraction parameters are applied
Result = JPEG file
31
Experimental results
Use case: Dividing Cubes
32
Conclusion
Feasibility for NES environments
Fully implemented and integrated in DIET since version 1.1
Promising experimental results
Standardization proposal (GGF)
33
Future work
Finalization of the GGF proposal
Tests on the Grid5000 platform
Fault tolerance
Integration of DTM in data grids