Manchester Computing Supercomputing, Visualization & e-Science Stephen Pickles, Andrew Porter, Robin Pinning & Rob Haines <[email protected]> http://www.realitygrid.org Royal Society, Tuesday 15 June, 2004 RealityGrid RealityGrid Software Infrastructure: Achievements and Prospects


Stephen Pickles, Andrew Porter, Robin Pinning & Rob Haines

<[email protected]>

http://www.realitygrid.org

Royal Society, Tuesday 15 June, 2004

RealityGrid

Software Infrastructure: Achievements and Prospects


RealityGrid Annual Workshop, 15/6/2004

Outline

Review
– How we got here

Status
– Where we are today

Prospects
– Where we’re going


Review

How we got here


The pieces

Fast track
– Computational Steering Library and tools (MC)
– On-line Visualization (MC)
– Web portal (EPCC)
– Human-Computer Interfaces (HCI)

Deep track
– Performance Control (CNC)
– Resource management, component frameworks (IC)
– Instruments: LUSI, XMT (not this talk)

This talk will emphasise fast track work.


Design philosophies

Grid-enabled

Component-based and service-oriented
– plug in and compose new components and services, from partners and third parties

Independence and modularity
– to minimize dependencies on third-party software
  • Should be able to steer locally without any Grid middleware
– to facilitate parallel development within project

Integration and/or interoperability
– Things should work together

Respect autonomy of application owners
– Prefer light-weight instrumentation of application codes to wholesale re-factoring
– Same source (or binary) should work with or without steering

Dynamism and adaptability
– Attach/detach steering client from running application
– Adapt to prevailing conditions

Intuitive and appropriate user interfaces


Historical Context – Messages from above

In 2002, we were told “use Globus, SRB or Condor”.

Then we were told “Web services are OK too”.

Then the Open Grid Services Architecture (OGSA) effort was announced.

OGSA would be based on the Open Grid Services Infrastructure (OGSI), and specifications began in earnest with (it seemed) overwhelming industrial support.

“You must be on an OGSA-convergence track. You must use e-Science certificates.”

GT3 appears in 2003. Some people build GT3 services. No-one builds production grids based on GT3.

Early in 2004, we hear “OGSI was a great success. OGSI is dead. Long live WS-RF. GT3 is obsolescent.”


2002 - Enter Grid Services

OGSI brought the hope of convergence between Web services (technology of choice for business process integration) and Grid computing.

It offered state, 2-level naming (GSH, GSR), lifetime management, and infrastructure support for common patterns (factories, registries, notification)…

With Dave Snelling, we experimented with a UNICORE-based OGSI prototype (pre-dating GT3 preview).


First “Fast Track” Demonstration

Jens Harting at UK e-Science All Hands Meeting, September 2002


“Fast Track” Steering Demo, UK e-Science AHM 2002

[Diagram: components of the demonstration]
– Bezier (SGI Onyx @ Manchester): VTK + VizServer (SGI OpenGL VizServer)
– Dirac (SGI Onyx @ QMUL): LB3D with RealityGrid Steering API
– Laptop (SHU Conference Centre): VizServer client, Steering GUI
– UNICORE Gateway and NJS at Manchester and at QMUL
– The Mind Electric GLUE web service hosting environment with OGSA extensions
– Single sign-on using UK e-Science digital certificates
– Other labels: Firewall; Simulation Data; Steering (XML)


Steering architecture in 2002

Communication modes:
• Shared file system
• Files moved by UNICORE daemon
• GLOBUS-IO

[Diagram labels: Client, Simulation and Visualization, each with a Steering library; data transfer]


Dilemma

Wanted to separate steering from job management

Architecture was brittle and firewall unfriendly
– Client needed to know too much about application deployment
– Direct connection between client and simulation is problematic when client is mobile

OGSI’s lifetime management, registries, language neutrality and notification seemed ideal for steering
– (ended up not using OGSI notification for firewall reasons)

But all “production” grids were based on Globus Toolkit version 2 (GT2)


Serendipity – OGSI::Lite

Mark McKeown’s OGSI::Lite started life as a spare-time exercise to understand Web services, then OGSI.

Soon became a near-complete OGSI implementation.

Minimal pre-requisites (Perl and SOAP::Lite) meant we could deploy it trivially in user space when the job is run. Only need permission to listen on a port. (This would be highly non-trivial using the deep stack of GT3.)

So we could have our OGSI cake and eat it on a GT2 grid.

Our steering architecture quickly got a middle-tier implemented in OGSI::Lite.


The Architecture of Steering

[Diagram: steering architecture with an OGSI middle tier]
– Components: Steering client; Simulation and Visualization, each linked with the Steering library; Registry; Steering GS (two shown); Display
– Labelled interactions: publish, find, bind, connect; data transfer (Globus-IO)

components start independently and attach/detach dynamically

multiple clients: Qt/C++, .NET on PocketPC, GridSphere Portlet (Java)

remote visualization through SGI VizServer, Chromium, and/or streamed to Access Grid
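
The publish / find / bind interactions named on this slide can be illustrated with a small in-memory sketch. It is not the OGSI::Lite interface: the class names `Registry` and `SteeringService`, the handle strings and the method names are all assumptions made for illustration, and no networking or lifetime management is modelled.

```python
from dataclasses import dataclass, field


@dataclass
class SteeringService:
    """Hypothetical stand-in for one Steering GS fronting a running component."""
    handle: str
    application: str
    clients: set = field(default_factory=set)

    def attach(self, client):
        self.clients.add(client)

    def detach(self, client):
        self.clients.discard(client)


class Registry:
    """Toy registry: services publish themselves, clients find them by application name."""

    def __init__(self):
        self._entries = {}

    def publish(self, service):
        self._entries[service.handle] = service

    def find(self, application):
        return [s for s in self._entries.values() if s.application == application]


registry = Registry()
registry.publish(SteeringService("sgs-1", "LB3D"))
registry.publish(SteeringService("sgs-2", "visualization"))

# A client discovers the simulation's service, binds to it, then detaches again.
(found,) = registry.find("LB3D")
found.attach("qt-client")
attached_while_bound = set(found.clients)
found.detach("qt-client")
```

Because the client only ever holds a handle obtained from the registry, it does not need to know where the simulation was deployed, which is the separation the middle tier was introduced to provide.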


The TeraGyroid Project

Funding from EPSRC (UK) & NSF (USA)

Ran LB3D across UK e-Science Grid and US TeraGrid

Study of defect dynamics in liquid crystalline surfactant systems using lattice-Boltzmann methods

Featured world’s largest Lattice Boltzmann simulation

TRICEPS was the HPC-Challenge aspect of this work
– Transcontinental RealityGrids for Interactive Collaborative Exploration of Parameter Space
– “most innovative data-intensive application” at SC’03

Later picked up ISC 2004 award in the “Integrated Data and Information Management” category

More in Richard Blake’s talk


New for TeraGyroid

Access Grid integration

use of Chromium to complement VizServer

job migration based on malleable checkpoints

user friendly “wizard” to drive job launching and migration

support for parameter space exploration through checkpoint trees
– also implemented in OGSI::Lite
– services thrown together for TeraGyroid have been upgraded in flight
– still running 8 months later

file transfer service
– to get around issues with systems homed on two networks

port forwarding (Stephen Booth, EPCC)
– to work around lack of public IP address on compute nodes (e.g. HPCx)


Checkpoint trees and parameter space exploration

Initial condition: Random water/ surfactant mixture.

Self-assembly starts.

Rewind and restart from checkpoint.

Lamellar phase: surfactant bilayers between water layers.

Cubic micellar phase, low surfactant density gradient.

Cubic micellar phase, high surfactant density gradient.
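
The rewind-and-restart pattern in these captions can be pictured as a tree whose nodes are checkpoints and whose branches are restarts with changed parameters. The sketch below is only an illustration of that data structure: the `Checkpoint` class, the `surfactant_density` parameter and its values are invented here and do not come from the RealityGrid checkpoint tree service.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Checkpoint:
    """One node: a saved state plus the parameter values in force when it was taken."""
    label: str
    params: dict
    parent: Optional["Checkpoint"] = None
    children: list = field(default_factory=list)

    def branch(self, label, **changed):
        """Restart from this checkpoint, optionally with some parameters changed."""
        child = Checkpoint(label, {**self.params, **changed}, parent=self)
        self.children.append(child)
        return child

    def lineage(self):
        """Labels from the root down to this node."""
        node, path = self, []
        while node is not None:
            path.append(node.label)
            node = node.parent
        return path[::-1]


# Hypothetical parameter name and values, chosen only to make the example concrete.
root = Checkpoint("initial mixture", {"surfactant_density": 0.2})
start = root.branch("self-assembly starts")
lamellar = start.branch("lamellar", surfactant_density=0.4)
cubic = start.branch("cubic micellar", surfactant_density=0.1)
```

Each leaf records both where it came from (`lineage`) and which parameters differ from its parent, which is the information needed to revisit or extend a branch of the parameter space later.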


Access Grid integration - SC Global


TeraGyroid Testbed

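[Diagram: TeraGyroid testbed network map]
– US locations: PSC, ANL, NCSA, Phoenix, Caltech, SDSC
– UK sites: UCL, Daresbury, Manchester
– Exchange points: Starlight (Chicago), Netherlight (Amsterdam)
– Networks: SJ4, MB-NG, BT provision, production network
– Legend: Visualization, Computation, Network PoP, Access Grid node, Service Registry, Dual-homed system
– Link speeds: 10 Gbps, 2 x 1 Gbps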


EPSRC e-Science Meeting 2004

Multiple steering clients driving same simulation
– Qt client on laptop
– .NET client on PDA
  • Simon Nee (Loughborough)
– Web client
  • GridSphere Portlet
  • Access through web browser
  • Matthew Egbert (EPCC)
– not all at same time
– significant achievement in terms of OGSI interoperability

Collaborative steering prototype
– using ICENI and client proxy
– Java bindings to client side of steering library (JNI)
– Gary Kong (LeSC)


Public Release – April 2004

Steering Library released as version 1.1
– version 1.0 was project internal
– very liberal open source license (FreeBSD)
– API specification version 1.1
– Library (C and Fortran90 bindings)
– Tools, including Qt steerer
– User Manual
– Examples

Available for download at: http://www.sve.man.ac.uk/Research/AtoZ/RealityGrid/

Globus-IO replaced by vanilla sockets
– major simplification to build process
– only way to complete integration of NAMD and VMD into RealityGrid


Status

Where we are today


Steering library

We instrument (add "knobs" and "dials" to) simulation codes through a steering library, written in C
– Bindings in Fortran90, C/C++ (complete) and Java (partial)

Library features:
– Pause/resume
– Checkpoint and restart
– Set values of steerable parameters (parameter steer)
– Report values of monitored (read-only) parameters (parameter watch)
– Emit "samples" to remote systems for e.g. on-line visualization
– Consume "samples" from remote systems for e.g. resetting boundary conditions
– Automatic emit/consume with steerable frequency
– No restrictions on parallelisation paradigm

You only implement what you need
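
To make the idea of light-weight instrumentation concrete, here is a toy main loop with a single steering call per timestep. The real library is written in C and its function names are not given in these slides, so everything below (the `Steerer` class, its `control` method, the command tuples and the `temperature` parameter) is a hypothetical stand-in written in Python; pause/resume and checkpointing are left out to keep it short.

```python
class Steerer:
    """Hypothetical stand-in for a steering library handle (not the RealityGrid API)."""

    def __init__(self):
        self.steerable = {}    # name -> value a client may change
        self.monitored = {}    # name -> read-only value reported to clients
        self.pending = []      # commands queued by a client
        self.emitted = []      # (step, sample) pairs "sent" for visualization
        self.emit_every = 2    # emit frequency in timesteps

    def control(self, step, sample):
        """Call once per timestep; returns False if a client asked the run to stop."""
        commands, self.pending = self.pending, []
        for command, args in commands:
            if command == "set":
                name, value = args
                self.steerable[name] = value
            elif command == "stop":
                return False
        if step % self.emit_every == 0:
            self.emitted.append((step, sample))
        return True


def run(steerer, max_steps, client_script):
    """client_script maps a timestep to the commands a client issues at that step."""
    steerer.steerable["temperature"] = 1.0
    energy = 0.0
    for step in range(max_steps):
        energy += steerer.steerable["temperature"]   # stand-in for real physics
        steerer.monitored["energy"] = energy
        steerer.pending.extend(client_script.get(step, []))
        if not steerer.control(step, energy):
            break
    return energy


steerer = Steerer()
final_energy = run(steerer, 6, {3: [("set", ("temperature", 0.5))]})
```

The simulation's own loop is unchanged apart from the one `control` call, and running it with an empty `client_script` behaves like an unsteered run, which mirrors the "same source should work with or without steering" goal stated earlier in the talk.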


Qt Steering client

Built using C++ and Qt

Attaches to any steerable RealityGrid application

Discovers what commands are supported

Discovers steerable & monitored parameters

Constructs appropriate widgets on the fly
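
One way to picture "widgets on the fly" is a function that maps each discovered parameter to a widget kind. This is a guess at the general shape only: the descriptor fields (`type`, `steerable`, `min`, `max`) and the widget names are assumptions, and the actual client is written in C++ with Qt rather than Python.

```python
def widget_for(param):
    """Pick a widget kind from a parameter descriptor (a plain dict in this sketch)."""
    if not param["steerable"]:
        return "label"                       # monitored values are display-only
    if param["type"] is bool:
        return "checkbox"
    if param["type"] in (int, float) and "min" in param and "max" in param:
        return "slider"                      # bounded numeric values
    return "text field"                      # anything else is edited as text


# Hypothetical result of interrogating an attached application.
discovered = [
    {"name": "timestep", "type": int, "steerable": False},
    {"name": "temperature", "type": float, "steerable": True, "min": 0.0, "max": 2.0},
    {"name": "output_prefix", "type": str, "steerable": True},
]
layout = {p["name"]: widget_for(p) for p in discovered}
```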


On-line visualisation

Fast track uses open source VTK for on-line visualisation
– Simple GUI built with Tk/Tcl, polls for new data to refresh image

– Some in-built parallelism

– extended to use the steering library

– AVS-format data supported

– XDR-format data for sample transfer between platforms

– Volume render (parallel)

– Isosurface

– Hedgehog

– Cut-plane

New work on atom-centric meshes for Steve Kenny
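
XDR is used for sample transfer because it fixes one byte order and alignment regardless of platform. The sketch below shows only the XDR wire rules for a variable-length array of doubles (a 4-byte big-endian count followed by 8-byte big-endian IEEE values), using Python's `struct` module; how the steering library actually frames its samples is not described in the slides, so the function names and layout here are assumptions.

```python
import struct


def pack_sample(values):
    """Encode floats as an XDR variable-length array of doubles."""
    count = len(values)
    return struct.pack(">I", count) + struct.pack(f">{count}d", *values)


def unpack_sample(data):
    """Decode the array written by pack_sample."""
    (count,) = struct.unpack_from(">I", data, 0)
    return list(struct.unpack_from(f">{count}d", data, 4))


wire = pack_sample([1.0, -2.5, 3.25])
decoded = unpack_sample(wire)
```

The explicit `>` prefix forces big-endian encoding on every host, so a sample packed on one architecture decodes to the same values on another.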


OGSI is dead. Long live WS-RF!

WS-ResourceFramework preserves most OGSI ideas in a way which is friendlier (less abusive) to Web services.

Open Middleware Infrastructure Institute (OMII) has a conservative roadmap based on Web services.
– WS-I plus as little else as possible

UK National Grid Service is aligned with EGEE.
– This means Globus Toolkit version 2 for at least 12 months.

WS-RF (and WS-Notification) are moving targets. What does this mean for us?


Our response to WS-RF

We must be able to exploit the grids that exist
– GT4 is unlikely to be stable and widely deployed in lifetime of RealityGrid

OGSI::Lite works fine for us, so continue to use it for now.

In time, WS-RF may be appropriate.
– seems indicated for the Steering Grid Service, which is a very dynamic thing
– optional for persistent services such as Checkpoint Metadata Tree and Registry. These could be implemented in plain Web services.

WSRF::Lite is already an option
– prototype released within a few weeks of first publication of WS-RF drafts
– featured in WS-RF interop fest in April, and interop demo at GGF 11 last week


Standards, generally

Very slow progress on Advance Reservation
– RealityGrid requires co-allocation of compute, viz, AG resources at time to suit the humans
– LSF, PBS(Pro), SGE now support it, but not accessible through middleware
– GRAAP-WG at GGF is bogged down in WS-Agreement and has yet to address protocols and apply them to Advance Reservation problem

Practical WS-RF interoperability will require coherent, global security strategy for Web services, and a delegation model
– not clear that GT4 interoperability is the driver.
– GT3 and GT4 security has never been on the standards table
– what is GSI-SecureConversation anyway?

OGSA itself is a massive undertaking and will not settle in RealityGrid’s lifetime

RealityGrid is a provider of use case drivers for GRAAP, GridCPR, OGSA, SAGA (and other) groups in GGF
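
The co-allocation requirement mentioned under Advance Reservation amounts to finding one time window that every resource can honour. A minimal sketch of that search follows, assuming each resource simply advertises its free windows as (start, end) pairs in hours; the function name, the data layout and the example figures are illustrative and do not correspond to any scheduler or middleware interface.

```python
def earliest_common_slot(availability, duration):
    """Return the earliest (start, end) of the given duration that is free on
    every resource, or None. availability maps a resource name to a list of
    (start, end) free windows."""
    # Any feasible slot can be shifted earlier until it begins at the start of
    # some free window, so only those start times need to be tried.
    candidates = sorted(s for windows in availability.values() for s, _ in windows)
    for start in candidates:
        end = start + duration
        if all(any(s <= start and end <= e for s, e in windows)
               for windows in availability.values()):
            return (start, end)
    return None


# Hypothetical free windows (hours of the day) for the three resource types.
availability = {
    "compute": [(9, 12), (14, 18)],
    "visualization": [(10, 11), (15, 20)],
    "access_grid": [(8, 17)],
}
slot = earliest_common_slot(availability, 2)
```

In practice the difficulty described on the slide is not this calculation but obtaining and reserving the windows through middleware in the first place.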


Prospects

Where we’re going


Steering

Plans
– Tabbed steerer (work in progress)
  • single client tabs between multiple steerable simulations
  • required for thermodynamic integration work using NAMD
– Steering of multi-component simulations (coupled models)
  • requires metadata about component interactions and schedule
– Quantitative study of the overhead of steering and on-line visualization
– Support use of steering within project
– Final release of steering library, toolkit and documentation

Significant Gap - Security!!!
– contingent on additional funding for WSRF::Lite
– and coherent global security strategy for Web services


Steering - Wishlist

Port of steering services to WS-RF
– probably in a follow-on project

Provenance of steering and parameter space exploration

Collaborative steering
– i.e. support simultaneous connection of multiple clients

Scripted steering
– Breakpoints ( IF (temperature > TOO_HOT) THEN … )
– Replay of previous steering actions

Integration of steering into selected MVEs
– entirely feasible, but can’t do them all
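
The scripted-steering wish (breakpoints plus replay) could look roughly like the sketch below. Since this is a wishlist item, nothing here reflects an existing implementation: the `ScriptedSteerer` class, the rule format, the `heating` parameter and the threshold value are all assumptions chosen to mirror the slide's `IF (temperature > TOO_HOT)` example.

```python
import operator

OPS = {">": operator.gt, "<": operator.lt}


class ScriptedSteerer:
    """Applies breakpoint rules to monitored values and logs every action taken."""

    def __init__(self, params):
        self.params = dict(params)
        self.rules = []    # (watched name, comparison, threshold, target name, new value)
        self.log = []      # (step, name, value) for each steering action

    def when(self, watched, op, threshold, set_name, set_value):
        self.rules.append((watched, OPS[op], threshold, set_name, set_value))

    def check(self, step, monitored):
        for watched, compare, threshold, name, value in self.rules:
            if compare(monitored[watched], threshold) and self.params[name] != value:
                self.params[name] = value
                self.log.append((step, name, value))


def replay(log, params):
    """Re-apply a recorded sequence of steering actions to a fresh parameter set."""
    params = dict(params)
    for _step, name, value in log:
        params[name] = value
    return params


TOO_HOT = 350.0   # hypothetical threshold
steerer = ScriptedSteerer({"heating": 1.0})
steerer.when("temperature", ">", TOO_HOT, "heating", 0.0)
for step, temperature in enumerate([300.0, 340.0, 360.0, 355.0]):
    steerer.check(step, {"temperature": temperature})
replayed = replay(steerer.log, {"heating": 1.0})
```

Keeping the log as plain (step, name, value) records means the same data serves both replay and the provenance item listed on this slide.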


Standardisation of Steering

Opportunities:
– Standardise an API for computational steering
– Standardise the WSDL of the Steering Grid Service

These could be input to the GGF research group “Simple APIs for Grid Applications” (SAGA-RG)

Is there critical mass?


Visualization

Plans
– Finish atom-centric meshes
– High-performance visualization
  • re-evaluate AVS with Parallel Support Toolkit
– “Thin visualization”
  • delivered to PDA or Web browser
  • thumbnails in checkpoint tree

Possibilities
– Use of *-ray from Utah
– AVS module for streaming to Access Grid
– VizServer integration:
  • Put GSI authentication into VizServer PAM when released
  • Liaison with Platform and SGI regarding use of VizServer API for Advance Reservation of graphics pipes


Launching and packaging

Plans
– Continue to improve usability
– Reduce deployment overhead
  • wizard can now work with Java CoG kit (easier to deploy than Globus client bundles)

Possibilities
– Integrate RLS or SRB into checkpoint tree
– Pick up Web service approaches to job submission


HCI

Plans
– Update of HCI Audit report in light of experiences
– Journal paper on the HCI of TeraGyroid
– .NET client
  • deployable demonstrator with renderings on PDA and Windows laptop

Identified activities, off critical path, for PhD student
– VizServer QoS experiments with MB-NG or UK-Light
– Thin visualization for PDAs and Web portals


Portal

Currently provides Web client for steering
– GridSphere portlet communicates with Steering Grid Service via SOAP

Prototype portlet for checkpoint tree browsing

Little resource (2-3 PM) remains for second phase of portal work.

Plans
– Finish checkpoint tree browsing
– Incorporate use of registry for simulation discovery
– Hope to inherit JSR168 portlets for job launching and monitoring
– limited visualization capability
  • slice of scalar field
  • subject to resources


Resource Management – Deep Track

Advance Reservation
– proof of concept using SGE 6.0

Implemented within Job Submission Web Service separated from ICENI
– using Job Definition Markup Language (JDML)
  • which is evolving into Job Submission Description Language (JSDL) through Global Grid Forum JSDL working group
– designed to support plug-in of other job submission systems
  • e.g. Globus, gsi-ssh, UNICORE, LSF,...
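
The plug-in design mentioned above can be sketched as a service that dispatches a job description to whichever back end is registered under the requested system name. This is a generic illustration, not the ICENI or Job Submission Web Service interface: the class name, the dictionary-based job description and the lambda back ends are placeholders.

```python
class JobSubmissionService:
    """Dispatches a job description to the back end registered for its target system."""

    def __init__(self):
        self._backends = {}

    def register(self, name, submit_fn):
        self._backends[name] = submit_fn

    def submit(self, job):
        try:
            backend = self._backends[job["system"]]
        except KeyError:
            raise ValueError(f"no plug-in for {job['system']!r}") from None
        return backend(job)


service = JobSubmissionService()
# Stand-in back ends; real ones would talk to Globus, UNICORE, LSF and so on.
service.register("globus", lambda job: f"globus:{job['executable']}")
service.register("unicore", lambda job: f"unicore:{job['executable']}")
receipt = service.submit({"system": "unicore", "executable": "lb3d"})
```

Adding support for another submission system then means registering one more function, without touching the callers.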


ICENI integration – Deep Track

[Diagram: an Application linked with the Steering library and attached to a Steering GS; labelled flows: Control, Status, Data in / Data out]

Technical report on feasibility of integrating fast-track steerable binary (with associated SGS) as an ICENI component

If practical, do it.


Performance Control – Deep Track

Performance Control of coupled models
– working with HybridMD code and Bespoke Framework Generator (BFG)
– outcomes: technology demonstrator & research papers
– deployment in production is unlikely

Performance prediction of same
– Steering of BFG-coupled models

Integration of PERCO and ICENI is not likely

Generalised malleable-checkpoint library is unlikely
– major undertaking, re-inventing SRS from UTK
– application specific alternatives always possible for those that need it

Proven to be possible to support steering or PERCO through a common API
– which simplifies instrumentation of application codes
– but doing both at the same time leads to frighteningly complex interactions


Conclusions

We will not solve everything during the lifetime of RealityGrid

We must be ruthless about what we do and do not undertake


Partners

Academic
– University College London
– Queen Mary, University of London
– Imperial College
– University of Manchester
– University of Edinburgh
– University of Oxford
– University of Loughborough

Industrial
– Schlumberger
– Edward Jenner Institute for Vaccine Research
– Silicon Graphics Inc
– Computation for Science Consortium
– Advanced Visual Systems
– Fujitsu
– BT Exact


Bringing Science and Supercomputers Together

http://www.sve.man.ac.uk

SVE @ Manchester Computing