the role of the euclid archive system in distributed data
Post on 22-Mar-2022
Euclid Consortium
Slide 1
The Role of the EUCLID Archive System in Distributed Data Management & Processing
Rees Williams on behalf of
A. N. Belikov, K. Begeman, D. Boxhoorn, B. Dröge, E. Helmich, J. McFarland, T. Nutma, A. Tsyganov, E. A. Valentijn, W.-J. Vriend
University of Groningen, Groningen, The Netherlands
B. Altieri, G. Buenadicha, J. Hoar, S. Nieto, J. Salgado, P. de Teodoro
European Space Astronomy Center, European Space Agency, Spain
Martin Melchior (IAL Data Model)
University of Applied Science, Brugg
Slide 2
The Euclid Mission Archive (EAS) has a central role in the Euclid mission
• EAS acts as the interface between all ground system components
• EAS data distributed across nine national SDCs
• EAS metadata is centrally stored
• EAS metadata contains all information except images and spectra
• EAS stores dependencies of data products
o Avoid unnecessary reprocessing
o Data provenance ensured
• Builds on experience with the WISE information system used by
o OmegaCAM (VST), MUSE (VLT), LOFAR LTA
• Builds on experience with earlier ESA archives
• Traceability and data lineage are strong requirements
Slide 4
EAS Design
DPS-MDR: Metadata Repository
DPS-MAL: Metadata Access Layer
DPS-MIS: Metadata Ingestion Service
DPS-CUS: Consortium User Services
DPS-CPS: Consortium Processing Services
DPS-MRS: Metadata Replication Service
SAS-MTS: Metadata Transfer Service
SAS-AUS: Archive User Services
COORS: Coordination & Orchestration Syst.
IAL: Infrastructure Abstraction Layer
HPCI: High Performance Computing I.
DSS-SVR: Distributed Storage System Syst.
DSS-SNI: Storage Node Interface
CDM: Euclid Common Data Model
SEDM: Science Exploitation Data Model
[Figure: EAS architecture diagram. Blocks shown: EAS-DPS @ESAC/Master and EAS-DPS @SDC-NL/Slave (each with DPS-MDR, DPS-MAL, DPS-CUS, DPS-CPS, DPS-MIS), connected by DPS-MRS; EAS-SAS @ESAC (SAS-MDR, SAS-MAL, SAS-MTS, SAS-AUS with CLI, web and GUI access, SEDM, Advanced Application Services); SDC 1 … SDC n and SOC ESAC (DSS-SVR, DSS-SNI, IAL, HPCI, FS); COORS. Users shown: Science Community, EC Users, SDC Community. Labelled flows: Metadata transfer, Scientific access, Notify SDCs, Ingest Metadata, Ingestion Job specification, Job spec., Query/Retrieval Data (from any DSS-SVR), CDM.]
Slide 5
• DSS server installed at each SDC provides access to data at that SDC to other SDCs
• File location in metadata provides a global “file system”
• No constraint on storage solutions at SDCs
• DSS server supports the access of data in SDCs using different protocols
o sftp, https, GridFTP, Astro-WISE dataserver, iRODS
• Python common code with modules to support local file systems
• Simple interface: store, retrieve, copy, delete, check
• Extended interface: cut-out services
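The simple interface listed above (store, retrieve, copy, delete, check) could look roughly like the following for a local file system. This is a minimal sketch and not the DSS code: the class name `LocalStorageNode`, the method signatures, and the choice to make `check` return a SHA-256 digest are all assumptions made for illustration.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


class LocalStorageNode:
    """Hypothetical local-file-system backend for the simple interface."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, name):
        # Keep only the final path component so callers cannot escape the root.
        return self.root / Path(name).name

    def store(self, source, name):
        """Copy a local file into this storage node under the given name."""
        shutil.copyfile(source, self._path(name))

    def retrieve(self, name, destination):
        """Copy a stored file out of this storage node."""
        shutil.copyfile(self._path(name), destination)

    def copy(self, name, other_node):
        """Replicate a stored file to another storage node."""
        other_node.store(self._path(name), name)

    def delete(self, name):
        """Remove a stored file; deleting a missing file is not an error."""
        self._path(name).unlink(missing_ok=True)

    def check(self, name):
        """Return the SHA-256 digest of a stored file, or None if absent.

        What "check" verifies in the real system is not stated on the slide;
        a checksum is an assumption.
        """
        path = self._path(name)
        if not path.is_file():
            return None
        return hashlib.sha256(path.read_bytes()).hexdigest()


def demo():
    """Store a file at one node, replicate it, then delete the original."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        source = tmp / "frame.fits"
        source.write_bytes(b"example payload")
        node_a = LocalStorageNode(tmp / "sdc_a")
        node_b = LocalStorageNode(tmp / "sdc_b")
        node_a.store(source, "frame.fits")
        node_a.copy("frame.fits", node_b)
        same = node_a.check("frame.fits") == node_b.check("frame.fits")
        node_a.delete("frame.fits")
        return same, node_a.check("frame.fits"), node_b.check("frame.fits") is not None
```

A backend for sftp, https or another protocol would expose the same five methods, which is what would let the SDCs keep their own storage solutions behind a common interface.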
[Figure: the Data Processing System (EAS-DPS), Science Archive System (EAS-SAS) and Distributed Storage System (EAS-DSS); DSS Servers link the data files storage and computing facilities at SOC, SDC-FR, SDC-ES, SDC-NL, SDC-DE, SDC-UK, SDC-IT, SDC-CH and SDC-FI.]
Slide 6
Euclid Common Data Model
• Data model in XSD, objects in XML
• Common thesaurus (dictionary)
• EAS is the only interface for data exchange in SGS, common data model is the only model for this interface
• Pipeline Processing Order (PPO)
• Processing plan
• SGS infrastructure
• IAL processing definitions
• VIS, NIR, NISP, EXT, MER data products
• Metadata exchanged between pipelines
• Input for scientific data release
• Quality flags
• Configuration parameters
• Common definitions and templates
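To illustrate "objects in XML", the sketch below reads one XML metadata object into a Python dictionary using only the standard library. The element names, attribute names and values are invented for the example and are not taken from the Euclid Common Data Model. Validation against the XSD is not shown, since the Python standard library has no XSD validator.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML object; every tag, attribute and value here is made up
# for illustration and does not come from the Euclid Common Data Model.
PRODUCT_XML = """
<DataProduct id="example-0001">
  <Instrument>VIS</Instrument>
  <FileName>example_frame.fits</FileName>
  <QualityFlags>
    <Flag name="saturated">false</Flag>
    <Flag name="astrometry_ok">true</Flag>
  </QualityFlags>
</DataProduct>
"""


def parse_product(xml_text):
    """Turn one XML metadata object into a plain dictionary."""
    root = ET.fromstring(xml_text.strip())
    # Map each flag name to a boolean parsed from its text content.
    flags = {
        flag.get("name"): flag.text.strip() == "true"
        for flag in root.iter("Flag")
    }
    return {
        "id": root.get("id"),
        "instrument": root.findtext("Instrument"),
        "file_name": root.findtext("FileName"),
        "quality_flags": flags,
    }


product = parse_product(PRODUCT_XML)
```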
Slide 7
Euclid Common Data Model Implementation
Original Euclid Data Model (XSD)
Python classes
Database schema
User Interface and services
Detailed ECDM objects implemented in full in the EAS
Provides lineage & traceability
Slide 8
Processing and EAS
• Survey Plan → Processing Plans → PPOs
• PPO allocated to an SDC
• EAS keeps all information on each stage of processing
• EAS supports re-processing granularity at PPO level
• Back- and forward-lineage for PPOs and data products
Slide 9
Metadata/data flow during processing
• SGS components interact via EAS-DPS (services CPS and MIS)
• Data files exchanged via EAS-DSS
• Processing triggered by publishing a PPO from COORS to EAS-DPS (retrieved by IAL)
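The flow on this slide (COORS publishes a PPO to EAS-DPS, the IAL at the destination SDC retrieves it, and metadata about the outputs is ingested back) can be sketched with an in-memory stand-in. The class names, status strings and method names below are hypothetical and do not reflect the real service interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class PPO:
    """Hypothetical, minimal Pipeline Processing Order."""
    ppo_id: str
    pipeline: str
    destination: str          # SDC that should run the pipeline
    status: str = "PUBLISHED"


@dataclass
class MetadataService:
    """In-memory stand-in for EAS-DPS: holds PPOs and product metadata."""
    ppos: dict = field(default_factory=dict)
    products: list = field(default_factory=list)

    def publish(self, ppo):
        # COORS side: make the order visible to the SDCs.
        self.ppos[ppo.ppo_id] = ppo

    def fetch_orders(self, sdc):
        # IAL side: retrieve pending orders addressed to this SDC.
        return [p for p in self.ppos.values()
                if p.destination == sdc and p.status == "PUBLISHED"]

    def ingest(self, ppo_id, product_name, file_location):
        # Only metadata is ingested; the data file itself stays in the DSS.
        self.products.append(
            {"ppo_id": ppo_id, "name": product_name, "location": file_location})
        self.ppos[ppo_id].status = "COMPLETED"


def run_cycle():
    """Publish two orders and process the one addressed to SDC-NL."""
    eas = MetadataService()
    eas.publish(PPO("ppo-1", "example_pipeline", "SDC-NL"))
    eas.publish(PPO("ppo-2", "example_pipeline", "SDC-FR"))
    for order in eas.fetch_orders("SDC-NL"):
        order.status = "RUNNING"
        # The pipeline would run here and write its output files to the DSS.
        eas.ingest(order.ppo_id, "example_product", "SDC-NL:/example/path")
    return eas
```

In this sketch the order for SDC-FR stays untouched until an IAL at that SDC asks for it, which mirrors the pull-style retrieval described on the slide.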
Slide 11
Dbview and processing information
• Order to execute pipeline
• Contains pipeline name and path to executable
• Contains destination (SDC)
• Contains status of execution and timestamps for begin/end of execution
• Info on pipeline run (actual running of pipeline)
• Reference to input and output data
• List of completed pipeline steps (tasks)
• Pipeline Task – minimum part of pipeline which can be run in isolation
• Input/output data for task
• Id of pipeline task in pipeline
• For each produced data product → Pipeline Run Id → PPO Id → Input data products
• For each input raw frame → PPO list → Pipeline Run list → Output data products
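The two lineage chains above can be sketched as lookups over recorded pipeline runs. This is an illustrative in-memory model only: the class `LineageStore`, its fields and the example identifiers are assumptions, and the real system keeps this information in the metadata database.

```python
from collections import defaultdict


class LineageStore:
    """Hypothetical in-memory record of pipeline runs and their products."""

    def __init__(self):
        self.runs = {}                        # run_id -> (ppo_id, inputs, outputs)
        self.produced_by = {}                 # product -> run_id
        self.consumed_by = defaultdict(list)  # product -> [run_id, ...]

    def record_run(self, run_id, ppo_id, inputs, outputs):
        """Register one pipeline run with its PPO, inputs and outputs."""
        self.runs[run_id] = (ppo_id, list(inputs), list(outputs))
        for product in outputs:
            self.produced_by[product] = run_id
        for product in inputs:
            self.consumed_by[product].append(run_id)

    def backward(self, product):
        """Product -> pipeline run -> PPO -> input products (one step back)."""
        run_id = self.produced_by.get(product)
        if run_id is None:
            return None  # raw data: no recorded run produced it
        ppo_id, inputs, _ = self.runs[run_id]
        return {"run": run_id, "ppo": ppo_id, "inputs": inputs}

    def forward(self, product):
        """All products derived, directly or indirectly, from this one."""
        derived, stack = [], [product]
        while stack:
            current = stack.pop()
            for run_id in self.consumed_by.get(current, []):
                for out in self.runs[run_id][2]:
                    if out not in derived:
                        derived.append(out)
                        stack.append(out)
        return derived


store = LineageStore()
store.record_run("run-1", "ppo-1", ["raw-1", "raw-2"], ["calibrated-1"])
store.record_run("run-2", "ppo-2", ["calibrated-1"], ["catalogue-1"])
```

Here `store.backward("catalogue-1")` follows the first chain one step back, and `store.forward("raw-1")` follows the second chain to every product that depends on a raw frame, which is the set that would need attention if that frame were reprocessed.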
Slide 16
Calibration Service
• Astro-WISE example
• Change any calibration frame/parameter
• Set default
Slide 17
EAS Science Analysis System
• Third component of the EAS
• Data and metadata for public release are copied to the EAS Science Analysis System (SAS) by the Metadata Transfer Service (SAS-MTS)
• EAS Science Analysis System contains a reduced set of metadata
o Optimised for Science Analysis
o Uses the Euclid Science Exploitation Data Model
• See poster in hall on the EAS Science Analysis System for more information