Computing Representatives' meeting 2016 Martin Grønlien Pejcoch




The current SMS pipeline with NFS shared disk

[Diagram: Opdata, SMS, jobrestart, XCDP, Operator]

Continuous delivery


@littleidea:

Continuous delivery


source: http://derberg.github.io/documentation-continuous-delivery/img/continuous-delivery-cycle.jpg

Post Processing Infrastructure

• Our “ecGate”
• Spread over 2 data rooms
• 1 PB and 380 TB Lustre storage
• GridEngine
  - Operational queue
  - Research queue
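As an illustration of the two-queue setup, here is a minimal sketch of how a submission wrapper might route a job to either queue. The queue names `operational.q` and `research.q`, the script name and the job name are hypothetical; only the `qsub` options `-q` (queue) and `-N` (job name) are standard Grid Engine options.

```python
import shlex

# Hypothetical queue names; the slide only mentions an operational and a research queue.
QUEUES = {"operational": "operational.q", "research": "research.q"}


def qsub_command(script, kind, job_name):
    """Build a qsub argument list that targets one of the two queues."""
    if kind not in QUEUES:
        raise ValueError(f"unknown queue kind: {kind!r}")
    # -q selects the queue, -N sets the job name.
    return ["qsub", "-q", QUEUES[kind], "-N", job_name, script]


# Print the command instead of running it, so the sketch works without a cluster.
print(shlex.join(qsub_command("postprocess.sh", "operational", "pp_arome")))
```

Building the argument list separately from executing it keeps the routing rule easy to test without access to the cluster.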


Rewrite of the ECPDS receiver 1/3

~600 GB daily (increased from ~160 GB)
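To put the daily volume in perspective, the short calculation below converts it to a growth factor and an average sustained rate. It assumes decimal gigabytes and an even spread over 24 hours; neither is stated on the slide, and real traffic is likely bursty around model run times.

```python
GB = 10**9  # assumption: decimal gigabytes
SECONDS_PER_DAY = 24 * 60 * 60

old_daily_gb = 160
new_daily_gb = 600

growth_factor = new_daily_gb / old_daily_gb
avg_bytes_per_s = new_daily_gb * GB / SECONDS_PER_DAY
avg_mbit_per_s = avg_bytes_per_s * 8 / 10**6

print(f"growth: {growth_factor:.2f}x")
print(f"average rate: {avg_bytes_per_s / 10**6:.1f} MB/s ({avg_mbit_per_s:.1f} Mbit/s)")
```

Under these assumptions the volume grew by a factor of 3.75, and the average rate is roughly 7 MB/s.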


Rewrite of the ECPDS receiver 2/3


• Apache Mesos (with Docker containers running in Marathon): cluster manager. mesos.apache.org

• Productstatus: production overview (DB for production metadata and an API). github.com/metno/productstatus

• Apache Kafka: distributed message broker. kafka.apache.org

• EVA (EVent Adapter): listens to Kafka and triggers actions. github.com/metno/EVA

• InfluxDB, Telegraf, Grafana, Kapacitor: monitoring system. influxdata.com, grafana.org

• Postprocessing infrastructure: similar to ecgate, built around Lustre FS and GridEngine

• ELK stack (Elasticsearch, Logstash and Kibana): log handling. https://www.elastic.co/webinars/introduction-elk-stack

Rewrite of the ECPDS receiver 3/3


Data flow

[Diagram. Components: Data processing (finished or incoming data processing job); Productstatus; Kafka message queue; Data processing jobs. Arrow labels: Store metadata; Publish metadata; Distribute published metadata to all; Read additional metadata; Processing completed.]

Author: Kim T. Jensen
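One possible reading of the data flow diagram, sketched as a small in-memory simulation: a job stores metadata, the store publishes a message, the queue distributes it to all listeners, and each interested listener reads the additional metadata before starting its own processing. All class names, product names and paths here are hypothetical stand-ins. This is not the Productstatus or EVA API, and no real Kafka client is used.

```python
class MessageQueue:
    """In-memory stand-in for a broker topic with fan-out to all subscribers."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, message):
        # "Distribute published metadata to all": every subscriber sees every message.
        for callback in self.subscribers:
            callback(message)


class MetadataStore:
    """In-memory stand-in for a metadata DB that announces new entries on the queue."""

    def __init__(self, queue):
        self.queue = queue
        self.records = {}

    def store(self, product, metadata):
        self.records[product] = metadata          # "Store metadata"
        self.queue.publish({"product": product})  # "Publish metadata"

    def read(self, product):
        return self.records[product]              # "Read additional metadata"


class Job:
    """A processing job that reacts to one product and may produce another."""

    def __init__(self, name, consumes, produces, store, log):
        self.name, self.consumes, self.produces = name, consumes, produces
        self.store, self.log = store, log

    def on_message(self, message):
        if message["product"] != self.consumes:
            return  # not relevant for this job
        meta = self.store.read(message["product"])
        self.log.append(f"{self.name} processed {self.consumes} from {meta['path']}")
        if self.produces:
            # "Processing completed": register the new product, which triggers the next job.
            self.store.store(self.produces, {"path": f"/hypothetical/{self.produces}.nc"})


queue = MessageQueue()
store = MetadataStore(queue)
log = []
for job in (Job("regrid", "model_output", "regridded", store, log),
            Job("plot", "regridded", None, store, log)):
    queue.subscribe(job.on_message)

# An incoming dataset starts the chain.
store.store("model_output", {"path": "/hypothetical/model_output.nc"})
print("\n".join(log))
```

In this sketch the jobs never call each other directly. The only coupling is through the stored metadata and the published messages, which is what lets new jobs be attached without changing existing ones.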

MetCoOp

NWP operational cooperation with Sweden

• HPC upgrade every 2 years
• Currently running 2.5 km Harmonie Arome and 11 km Hirlam
• Ensemble runs of the same model in test phase
  - 9 members
  - 1 AROME and ALARO control run and 8 AROME members
  - Control member of the ENS to replace today’s deterministic run
  - Members distributed on both HPCs (5, 4)
• Uses ecFlow on VMs to trigger the model
• Postprocessing done separately for each institute (the PPI and ecFlow at MET Norway)
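The (5, 4) split of the 9 ensemble members over the two HPCs can be reproduced with a simple round-robin assignment, sketched below. The slide does not say how members are actually assigned, so the rule, the host names `hpc_a` and `hpc_b`, and the member names are all assumptions.

```python
from itertools import cycle


def distribute(members, hosts):
    """Round-robin assignment of ensemble members to hosts."""
    placement = {host: [] for host in hosts}
    for member, host in zip(members, cycle(hosts)):
        placement[host].append(member)
    return placement


# 1 control run plus 8 perturbed members, 9 in total.
members = ["control"] + [f"member_{i}" for i in range(1, 9)]
placement = distribute(members, ["hpc_a", "hpc_b"])

for host, assigned in placement.items():
    print(host, len(assigned), assigned)
```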


Arome Arctic

• Same model and area size as for the MetCoOp (2.5km Arome)

• ecFlow on MET Norway VMs and PPI used to fetch, postprocess and create products


VGL (selective access to large amounts of data without having to rewrite all our software tools)

[Diagram: Lustre; Server with GPU; VGL client, transferring already rendered context; Users]

• netCDF in ECPDS
• SBU overview
