SURA Cyberinfrastructure Workshop Series: Grid Technology: The Rough Guide
December 8 & 9, 2005, Austin, TX
Desktop Grids
Ashok Adiga
Texas Advanced Computing Center
adiga@tacc.utexas.edu
Topics
• What makes Desktop Grids different?
• What applications are suitable?
• Three Solutions:
  – Condor
  – United Devices Grid MP
  – BOINC
Compute Resources on the Grid
• Traditional: SMPs, MPPs, clusters, …
  – High speed, reliable, homogeneous, dedicated, expensive (but getting cheaper)
  – High-speed interconnects
  – Up to 1000s of CPUs
• Desktop PCs and workstations
  – Low speed (but improving!), heterogeneous, unreliable, non-dedicated, inexpensive
  – Generic connections (Ethernet)
  – 1000s-10,000s of CPUs
  – Grid compute power increases as desktops are upgraded
Desktop Grid Challenges
• Unobtrusiveness
  – Harness underutilized computing resources without impacting the primary desktop user
• Added security requirements
  – Desktop machines are typically not in a secure environment
  – Must protect the desktop and the program from each other (sandboxing)
  – Must ensure secure communications between grid nodes
• Connectivity characteristics
  – Not always connected to the network (e.g. laptops)
  – Might not have a fixed identifier (e.g. dynamic IP addresses)
• Limited network bandwidth
  – Ideal applications have a high compute-to-communication ratio
  – Data management is critical to performance
Desktop Grid Challenges (cont’d)
• Job scheduling on heterogeneous, non-dedicated resources is complex
  – Must match application requirements to resource characteristics
  – Meeting QoS is difficult since the program might have to share the CPU with other desktop activity
• Desktops are typically unreliable
  – The system must detect and recover from node failures
• Scalability issues
  – Software has to manage thousands of resources
  – Conventional application licensing is not set up for desktop grids
Application Feasibility
• Only some applications map well to desktop grids
  – Coarse-grain data parallelism
  – Parallel chunks are relatively independent
  – High computation-to-communication ratios (see the worked example below)
  – Non-intrusive behavior on the client device
• Small memory footprint on the client
• I/O activity is limited
  – Executable and data sizes depend on available bandwidth
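A rough feasibility check with illustrative numbers (not from the original slides): suppose a work unit ships a 10 MB input, runs for 2 CPU hours, and returns a 1 MB result over a 1 Mbit/s desktop link. Transfer time is roughly (11 MB x 8 bits/byte) / 1 Mbit/s ≈ 90 seconds against 7,200 seconds of computation, a compute-to-communication ratio of about 80:1, which is comfortably in desktop-grid territory. A work unit that moved 1 GB for the same 2 hours of compute would spend more time transferring data than computing and would be a poor fit.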
Typical Applications
• Desktop Grids naturally support data-parallel applications
  – Monte Carlo methods
  – Large database searches
  – Genetic algorithms
  – Exhaustive search techniques
  – Parametric design
  – Asynchronous iterative algorithms
Condor
• Condor manages pools of workstations and dedicated clusters to create a distributed high-throughput computing (HTC) facility
  – Created at the University of Wisconsin
  – Project established in 1985
• Initially targeted at scheduling clusters, providing functions such as:
  – Queuing
  – Scheduling
  – Priority schemes
  – Resource classifications
• Later extended to manage non-dedicated resources
  – Sandboxing
  – Job preemption
Why use Condor?
• Condor has several unique mechanisms, such as:
  – ClassAd matchmaking
  – Process checkpoint / restart / migration
  – Remote system calls
  – Grid awareness
  – Glideins
• Support for multiple “Universes”
  – Vanilla, Java, MPI, PVM, Globus, …
• Very simple to install, manage, and use (a minimal usage sketch follows)
  – Natural environment for application developers
• Free!
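A minimal usage sketch (the submit file and job id are hypothetical; the commands themselves are standard Condor tools):

$ condor_status                # list machines in the pool and their current state
$ condor_submit myjob.submit   # queue a job described by a submit description file
$ condor_q                     # check the status of jobs in the local queue
$ condor_rm 42.0               # remove job 42.0 (cluster.process) from the queue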
Typical Condor Pool
[Diagram: a typical Condor pool. The Central Manager runs the master, collector, and negotiator daemons (and, acting as a regular node, also a schedd and startd). Submit-only machines run a master and schedd; execute-only machines run a master and startd; regular nodes run a master, schedd, and startd. Arrows in the figure distinguish ClassAd communication pathways from spawned processes.]
Condor ClassAds
• ClassAds are at the heart of Condor
• ClassAds
  – are a set of uniquely named expressions; each expression is called an attribute
  – combine query and data
  – are semi-structured: no fixed schema
  – are extensible
Sample ClassAd
MyType = "Machine"TargetType = "Job"Machine = "froth.cs.wisc.edu"Arch = "INTEL"OpSys = "SOLARIS251"Disk = 35882Memory = 128KeyboardIdle = 173LoadAvg = 0.1000Requirements = TARGET.Owner=="smith" || LoadAvg<=0.3 && KeyboardIdle>15*60
Condor Flocking
• Central managers can allow schedds from other pools to submit to them (an example configuration sketch follows).
[Diagram: the schedd on a submit machine talks to its own Central Manager (CONDOR_HOST) and, via flocking, to the collector and negotiator of the Pool-Foo and Pool-Bar central managers.]
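A minimal configuration sketch (host names are hypothetical; FLOCK_TO and FLOCK_FROM are the standard Condor configuration macros for flocking):

# On the submit machine: remote pools to flock to, tried in order after the local pool
FLOCK_TO = cm.pool-foo.example.edu, cm.pool-bar.example.edu

# On each remote pool's central manager: submit machines allowed to flock in
FLOCK_FROM = submit.home-pool.example.edu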
Example: POVray on UT Grid Condor
[Figure: the image is split into slices rendered in parallel on the Condor pool; each slice takes 5-8 minutes, and the total rendering time drops from 2 h 17 min to about 15 min.]
Parallel POVray on Condor
A. Submitting POVray to the Condor pool with a Perl script
   1. Automated creation of image “slices” (see the sample slice .ini after this list)
   2. Automated creation of Condor submit files
   3. Automated creation of the DAG file
   4. Using DAGMan for job flow control
B. Multiple architecture support
   1. Executable = povray.$$(OpSys).$$(Arch)
C. Post-processing with a C executable
   1. “Stitching” image slices back together into one image file
   2. Using “xv” to display the image back on the user's desktop
      • Alternatively, transferring the image file back to the user's desktop
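A sketch of what one generated slice file (e.g. glasschess_0.ini) might contain. Start_Row/End_Row are standard POV-Ray INI options for rendering one horizontal band of the image; the particular image size and row range here are assumed:

Input_File_Name=glasschess.pov
Output_File_Name=glasschess_0.ppm
Output_File_Type=P        ; PPM, matching the .ppm slices that are stitched together later
Width=800
Height=600
Start_Row=1
End_Row=50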
POVray Submit Description File
Universe = vanilla
Executable = povray.$$(OpSys).$$(Arch)
Requirements = (Arch == "INTEL" && OpSys == "LINUX") || \
               (Arch == "INTEL" && OpSys == "WINNT51") || \
               (Arch == "INTEL" && OpSys == "WINNT52")
transfer_files = ONEXIT
Input = glasschess_0.ini
Error = Errfile_0.err
Output = glasschess_0.ppm
transfer_input_files = glasschess.pov,chesspiece1.inc
arguments = glasschess_0.ini
log = glasschess_0_condor.log
notification = NEVER
queue
DAGMan Job Flow
[Diagram: a fan-in DAG. Render jobs A0, A1, A2, A3, A4, A5, … An are all PARENTs of the single CHILD job B; pre-processing (a PRE script) runs prior to executing Job B.]
DAGMan Submission Script

# Filename: povray.dag
Job A0  ./submit/povray_submit_0.cmd
Job A1  ./submit/povray_submit_1.cmd
Job A2  ./submit/povray_submit_2.cmd
Job A3  ./submit/povray_submit_3.cmd
Job A4  ./submit/povray_submit_4.cmd
Job A5  ./submit/povray_submit_5.cmd
Job A6  ./submit/povray_submit_6.cmd
Job A7  ./submit/povray_submit_7.cmd
Job A8  ./submit/povray_submit_8.cmd
Job A9  ./submit/povray_submit_9.cmd
Job A10 ./submit/povray_submit_10.cmd
Job A11 ./submit/povray_submit_11.cmd
Job A12 ./submit/povray_submit_12.cmd
Job B   barrier_job_submit.cmd
PARENT A0 CHILD B
PARENT A1 CHILD B
PARENT A2 CHILD B
PARENT A3 CHILD B
PARENT A4 CHILD B
PARENT A5 CHILD B
PARENT A6 CHILD B
PARENT A7 CHILD B
PARENT A8 CHILD B
PARENT A9 CHILD B
PARENT A10 CHILD B
PARENT A11 CHILD B
PARENT A12 CHILD B
Script PRE B postprocessing.sh glasschess

Submitting the DAG:
$ condor_submit_dag povray.dag

postprocessing.sh (the PRE script for Job B):
#!/bin/sh
./stitchppms glasschess > glasschess.ppm 2> /dev/null
rm *_*.ppm *.ini Err* *.log povray.dag.*
/usr/X11R6/bin/xv $1.ppm

Barrier job executable (Job B only sleeps; the real work happens in its PRE script):
#!/bin/sh
/bin/sleep 1
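To watch the DAG as it runs, the standard Condor tools suffice (no project-specific options assumed):

$ condor_q -dag                    # show the DAGMan job with its node jobs grouped beneath it
$ tail -f povray.dag.dagman.out    # DAGMan's own log of node submission and completion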
United Devices Grid MP
• Commercial product that aggregates unused cycles on desktop machines to provide a computing resource
• Originally designed for non-dedicated resources
  – Security, non-intrusiveness, scheduling, …
  – Screensaver/graphical GUI on the client desktop
• Support for multiple clients
  – Windows, Linux, Mac, AIX, & Solaris clients
How Grid MP™ Works
[Diagram: Grid MP architecture.
 • Users (via a web browser interface, a command-line interface, or an XML web services API) submit jobs, monitor job progress, and process results.
 • Grid MP Services authenticate users and devices, dispatch jobs based on priority, monitor and reschedule failed jobs, and collect job results; an administrator manages the services.
 • Grid MP Agents on clusters, workstations/desktops, and servers advertise capability, launch jobs, execute them securely, return results, and cache data for reuse.
 • Work is routed by type: low-latency parallel jobs, large sequential jobs, and large data-parallel jobs go to the appropriate agent class.]
UD Management Features
• Enterprise features make it easier to convince traditional IT organizations and individual desktop users to install the software
  – Browser-based administration tools allow local management/policy specification for
    • Devices
    • Users
    • Workloads
  – Single-click install of the client on PCs
    • Easily customizable to work with software management packages
Grid MP™ Provisioning Example
[Diagram: the root administrator and device group administrators manage the Grid MP Services; user groups A and B are provisioned across device groups X, Y, and Z, each with its own policy, for example:
 • Device Group X: User Groups A = 50%, B = 25%; usage 8am-5pm, 2 hr cut-off; runnable application list …
 • Device Group Y: User Group B = 100%; usage 24 hrs, 1 hr cut-off; runnable application list …
 • Device Group X: User Groups A = 50%, B = 50%; usage 6pm-8am, 8 hr cut-off; runnable application list …]
Application Types Supported
• Batch jobs
  – Use the mpsub command to run a single executable on a single remote desktop
• MPI jobs
  – Use the ud_mpirun command to run an MPI job across a set of desktop machines
• Data-parallel jobs
  – A single job consists of several independent workunits that can be executed in parallel
  – The application developer must create program modules and write application scripts to create workunits
Hosted Applications
• Hosted applications are easier to manage
  – Provide users with a managed application
  – Great for applications that are run frequently but rarely updated
  – Data-parallel applications fit best in the hosted scenario
  – Users do not have to deal with application maintenance; only the developer does
• Grid MP is optimized for running hosted applications
  – Applications and data are cached at client nodes
  – Affinity scheduling minimizes data movement by re-using cached executables and data
  – A hosted application can be run across multiple platforms by registering executables for each platform
Example: Reservoir Simulation
• Landmark’s VIP product benchmarked on Grid MP
• Workload consisted of 240 simulations for 5 wells
  – Sensitivities investigated include:
    • 2 PVT cases
    • 2 fault connectivity cases
    • 2 aquifer cases
    • 2 relative permeability cases
    • 5 combinations of 5 wells
    • 3 combinations of vertical permeability multipliers
  – Each simulation packaged as a separate piece of work
• A similar reservoir simulation application has been developed at TACC (with Dr. W. Bangerth, Institute of Geophysics)
Example: Drug Discovery
• Think & LigandFit applications
  – Internet project in partnership with Oxford University to model interactions between proteins and potential drug molecules
  – Virtual screening of drug molecules to reduce time-consuming, expensive lab testing by 90%
  – Drug database of 3.5 billion candidate molecules
  – Over 350K active computers participating all over the world
Think
• Code developed at Oxford University
• Application characteristics
  – Typical input data file: < 1 KB
  – Typical output file: < 20 KB
  – Typical execution time: 1000-5000 minutes
  – Floating-point intensive
  – Small memory footprint
  – Fully resolved executable is ~3 MB in size
Grid MP: POVray Application Portal
BOINC
• Berkeley Open Infrastructure for Network Computing (BOINC)
  – Open-source follow-on to SETI@home
  – General architecture supports multiple applications
  – Solution targets volunteer resources, not enterprise desktops/workstations
  – More information at http://boinc.berkeley.edu
• Currently being used by several internet projects
Structure of a BOINC project
[Diagram: structure of a BOINC project. A scheduling server (C++) and data servers (HTTP) talk to clients; web interfaces (PHP) serve participants; back-end daemons handle work generation, retry generation, result validation, result processing, and garbage collection, all against the BOINC DB (MySQL). Ongoing tasks for a project: monitor server correctness, monitor server performance, develop and maintain applications.]
BOINC
• No enterprise management tools
  – Focus on the “volunteer grid”
    • Provides incentives (points, teams, website)
    • Basic browser interface to set usage preferences on PCs
    • Support for the user community (forums)
• Simple interface for job management
  – The application developer creates scripts to submit jobs and retrieve results (a minimal sketch follows)
• Provides a sandbox on the client
• No encryption: uses redundant computing to prevent spoofing
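A minimal sketch of such a submission script, assuming a standard BOINC server layout with an application already registered as myapp (file names, template paths, and exact option spellings may vary by BOINC version):

#!/bin/sh
# Stage the input file into the project's download hierarchy
cp input_0001.dat `bin/dir_hier_path input_0001.dat`

# Create one workunit; the templates describe its input and expected output files
bin/create_work --appname myapp \
                --wu_name myapp_wu_0001 \
                --wu_template templates/myapp_wu.xml \
                --result_template templates/myapp_result.xml \
                input_0001.dat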
Projects using BOINC
• Climateprediction.net: study climate change
• Einstein@home: search for gravitational signals emitted by pulsars
• LHC@home: improve the design of the CERN LHC particle accelerator
• Predictor@home: investigate protein-related diseases
• Rosetta@home: help researchers develop cures for human diseases
• SETI@home: look for radio evidence of extraterrestrial life
• Cell Computing: biomedical research (Japanese; requires nonstandard client software)
• World Community Grid: advance our knowledge of human disease (requires BOINC 5.2.1 or greater)
SETI@home
• Analysis of radio telescope data from Arecibo
  – SETI: search for narrowband signals
  – Astropulse: search for short broadband signals
• 0.3 MB in, ~4 CPU hours, 10 KB out
Climateprediction.net
• Climate change study (Oxford University)
  – Met Office model (FORTRAN, 1M lines)
• Input: ~10 MB executable, 1 MB data
• Output per workunit:
  – 10 MB summary (always uploaded)
  – 1 GB detail file (archived on the client, may be uploaded)
• CPU time: 2-3 months (can't migrate)
  – Trickle messages
  – Preemptive scheduling
Why use Desktop Grids?
• Desktop grid solutions are typically complete & standalone
  – Easy to set up and manage
  – Good entry vehicle for trying out grids
• Use existing (but underutilized) resources
  – The number of desktops/workstations on a campus (or in an enterprise) is typically an order of magnitude greater than traditional compute resources
  – The power of the grid grows over time as new, faster desktops are added
• The typically large number of resources on desktop grids enables new approaches to solving problems