SURA Cyberinfrastructure Workshop Series: Grid Technology: The Rough Guide
December 8 & 9, 2005, Austin, TX
Desktop Grids
Ashok Adiga
Texas Advanced Computing Center
adiga@tacc.utexas.edu
Topics
• What makes Desktop Grids different?
• What applications are suitable?
• Three Solutions:
  – Condor
  – United Devices Grid MP
  – BOINC
Compute Resources on the Grid
• Traditional: SMPs, MPPs, clusters, …
  – High speed, reliable, homogeneous, dedicated, expensive (but getting cheaper)
  – High-speed interconnects
  – Up to 1000s of CPUs
• Desktop PCs and workstations
  – Low speed (but improving!), heterogeneous, unreliable, non-dedicated, inexpensive
  – Generic connections (Ethernet)
  – 1000s-10,000s of CPUs
  – Grid compute power increases as desktops are upgraded
Desktop Grid Challenges
• Unobtrusiveness
  – Harness underutilized computing resources without impacting the primary desktop user
• Added security requirements
  – Desktop machines are typically not in a secure environment
  – Must protect the desktop and the program from each other (sandboxing)
  – Must ensure secure communications between grid nodes
• Connectivity characteristics
  – Not always connected to the network (e.g. laptops)
  – Might not have a fixed identifier (e.g. dynamic IP addresses)
• Limited network bandwidth
  – Ideal applications have a high compute-to-communication ratio
  – Data management is critical to performance
Desktop Grid Challenges (cont’d)
• Job scheduling on heterogeneous, non-dedicated resources is complex
  – Must match application requirements to resource characteristics
  – Meeting QoS is difficult since the program might have to share the CPU with other desktop activity
• Desktops are typically unreliable
  – The system must detect and recover from node failures
• Scalability issues
  – Software has to manage thousands of resources
  – Conventional application licensing is not set up for desktop grids
Application Feasibility
• Only some applications map well to desktop grids
  – Coarse-grain data parallelism
  – Parallel chunks are relatively independent
  – High computation-to-communication ratios (see the worked example below)
  – Non-intrusive behavior on the client device
• Small memory footprint on the client
• I/O activity is limited
  – Executable and data sizes depend on available bandwidth
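A rough feasibility check with illustrative numbers (not from the original slides): suppose a work unit ships a 10 MB input, runs for 2 CPU hours, and returns a 1 MB result over a 1 Mbit/s desktop link. Transfer time is roughly (11 MB x 8 bits/byte) / 1 Mbit/s ≈ 90 seconds against 7,200 seconds of computation, a compute-to-communication ratio of about 80:1, which is comfortably in desktop-grid territory. A work unit that moved 1 GB for the same 2 hours of compute would spend more time transferring data than computing and would be a poor fit.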
Typical Applications
• Desktop Grids naturally support data-parallel applications
  – Monte Carlo methods
  – Large database searches
  – Genetic algorithms
  – Exhaustive search techniques
  – Parametric design
  – Asynchronous iterative algorithms
Condor
• Condor manages pools of workstations and dedicated clusters to create a distributed high-throughput computing (HTC) facility
  – Created at the University of Wisconsin
  – Project established in 1985
• Initially targeted at scheduling clusters, providing functions such as:
  – Queuing
  – Scheduling
  – Priority schemes
  – Resource classifications
• Later extended to manage non-dedicated resources
  – Sandboxing
  – Job preemption
Why use Condor?
• Condor has several unique mechanisms, such as:
  – ClassAd matchmaking
  – Process checkpoint / restart / migration
  – Remote system calls
  – Grid awareness
  – Glideins
• Support for multiple “Universes”
  – Vanilla, Java, MPI, PVM, Globus, …
• Very simple to install, manage, and use (a minimal usage sketch follows)
  – Natural environment for application developers
• Free!
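A minimal usage sketch (the submit file and job id are hypothetical; the commands themselves are standard Condor tools):

$ condor_status                # list machines in the pool and their current state
$ condor_submit myjob.submit   # queue a job described by a submit description file
$ condor_q                     # check the status of jobs in the local queue
$ condor_rm 42.0               # remove job 42.0 (cluster.process) from the queue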
Typical Condor Pool
[Diagram: a typical Condor pool. The Central Manager runs the master, collector, and negotiator daemons (and, acting as a regular node, also a schedd and startd). Submit-only machines run a master and schedd; execute-only machines run a master and startd; regular nodes run a master, schedd, and startd. Arrows in the figure distinguish ClassAd communication pathways from spawned processes.]
Condor ClassAds
• ClassAds are at the heart of Condor
• ClassAds
  – are a set of uniquely named expressions; each expression is called an attribute
  – combine query and data
  – are semi-structured: no fixed schema
  – are extensible
Sample ClassAd
MyType = "Machine"TargetType = "Job"Machine = "froth.cs.wisc.edu"Arch = "INTEL"OpSys = "SOLARIS251"Disk = 35882Memory = 128KeyboardIdle = 173LoadAvg = 0.1000Requirements = TARGET.Owner=="smith" || LoadAvg<=0.3 && KeyboardIdle>15*60
Condor Flocking
• Central managers can allow schedds from other pools to submit to them (an example configuration sketch follows).
[Diagram: the schedd on a submit machine talks to its own Central Manager (CONDOR_HOST) and, via flocking, to the collector and negotiator of the Pool-Foo and Pool-Bar central managers.]
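A minimal configuration sketch (host names are hypothetical; FLOCK_TO and FLOCK_FROM are the standard Condor configuration macros for flocking):

# On the submit machine: remote pools to flock to, tried in order after the local pool
FLOCK_TO = cm.pool-foo.example.edu, cm.pool-bar.example.edu

# On each remote pool's central manager: submit machines allowed to flock in
FLOCK_FROM = submit.home-pool.example.edu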
Example: POVray on UT Grid Condor
[Figure: the image is split into slices rendered in parallel on the Condor pool; each slice takes 5-8 minutes, and the total rendering time drops from 2 h 17 min to about 15 min.]
Parallel POVray on Condor
A. Submitting POVray to the Condor pool with a Perl script
   1. Automated creation of image “slices” (see the sample slice .ini after this list)
   2. Automated creation of Condor submit files
   3. Automated creation of the DAG file
   4. Using DAGMan for job flow control
B. Multiple architecture support
   1. Executable = povray.$$(OpSys).$$(Arch)
C. Post-processing with a C executable
   1. “Stitching” image slices back together into one image file
   2. Using “xv” to display the image back on the user's desktop
      • Alternatively, transferring the image file back to the user's desktop
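A sketch of what one generated slice file (e.g. glasschess_0.ini) might contain. Start_Row/End_Row are standard POV-Ray INI options for rendering one horizontal band of the image; the particular image size and row range here are assumed:

Input_File_Name=glasschess.pov
Output_File_Name=glasschess_0.ppm
Output_File_Type=P        ; PPM, matching the .ppm slices that are stitched together later
Width=800
Height=600
Start_Row=1
End_Row=50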
POVray Submit Description File
Universe = vanilla
Executable = povray.$$(OpSys).$$(Arch)
Requirements = (Arch == "INTEL" && OpSys == "LINUX") || \
               (Arch == "INTEL" && OpSys == "WINNT51") || \
               (Arch == "INTEL" && OpSys == "WINNT52")
transfer_files = ONEXIT
Input = glasschess_0.ini
Error = Errfile_0.err
Output = glasschess_0.ppm
transfer_input_files = glasschess.pov,chesspiece1.inc
arguments = glasschess_0.ini
log = glasschess_0_condor.log
notification = NEVER
queue
DAGMan Job Flow
[Diagram: a fan-in DAG. Render jobs A0, A1, A2, A3, A4, A5, … An are all PARENTs of the single CHILD job B; pre-processing (a PRE script) runs prior to executing Job B.]
DAGMan Submission Script

# Filename: povray.dag
Job A0  ./submit/povray_submit_0.cmd
Job A1  ./submit/povray_submit_1.cmd
Job A2  ./submit/povray_submit_2.cmd
Job A3  ./submit/povray_submit_3.cmd
Job A4  ./submit/povray_submit_4.cmd
Job A5  ./submit/povray_submit_5.cmd
Job A6  ./submit/povray_submit_6.cmd
Job A7  ./submit/povray_submit_7.cmd
Job A8  ./submit/povray_submit_8.cmd
Job A9  ./submit/povray_submit_9.cmd
Job A10 ./submit/povray_submit_10.cmd
Job A11 ./submit/povray_submit_11.cmd
Job A12 ./submit/povray_submit_12.cmd
Job B   barrier_job_submit.cmd
PARENT A0 CHILD B
PARENT A1 CHILD B
PARENT A2 CHILD B
PARENT A3 CHILD B
PARENT A4 CHILD B
PARENT A5 CHILD B
PARENT A6 CHILD B
PARENT A7 CHILD B
PARENT A8 CHILD B
PARENT A9 CHILD B
PARENT A10 CHILD B
PARENT A11 CHILD B
PARENT A12 CHILD B
Script PRE B postprocessing.sh glasschess

Submitting the DAG:
$ condor_submit_dag povray.dag

postprocessing.sh (the PRE script for Job B):
#!/bin/sh
./stitchppms glasschess > glasschess.ppm 2> /dev/null
rm *_*.ppm *.ini Err* *.log povray.dag.*
/usr/X11R6/bin/xv $1.ppm

Barrier job executable (Job B only sleeps; the real work happens in its PRE script):
#!/bin/sh
/bin/sleep 1
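To watch the DAG as it runs, the standard Condor tools suffice (no project-specific options assumed):

$ condor_q -dag                    # show the DAGMan job with its node jobs grouped beneath it
$ tail -f povray.dag.dagman.out    # DAGMan's own log of node submission and completion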
United Devices Grid MP
• Commercial product that aggregates unused cycles on desktop machines to provide a computing resource
• Originally designed for non-dedicated resources
  – Security, non-intrusiveness, scheduling, …
  – Screensaver/graphical GUI on the client desktop
• Support for multiple clients
  – Windows, Linux, Mac, AIX, & Solaris clients
How Grid MP™ Works
[Diagram: Grid MP architecture.
 • Users (via a web browser interface, a command-line interface, or an XML web services API) submit jobs, monitor job progress, and process results.
 • Grid MP Services authenticate users and devices, dispatch jobs based on priority, monitor and reschedule failed jobs, and collect job results; an administrator manages the services.
 • Grid MP Agents on clusters, workstations/desktops, and servers advertise capability, launch jobs, execute them securely, return results, and cache data for reuse.
 • Work is routed by type: low-latency parallel jobs, large sequential jobs, and large data-parallel jobs go to the appropriate agent class.]
UD Management Features
• Enterprise features make it easier to convince traditional IT organizations and individual desktop users to install the software
  – Browser-based administration tools allow local management/policy specification for
    • Devices
    • Users
    • Workloads
  – Single-click install of the client on PCs
    • Easily customizable to work with software management packages
Grid MP™ Provisioning Example
[Diagram: the root administrator and device group administrators manage the Grid MP Services; user groups A and B are provisioned across device groups X, Y, and Z, each with its own policy, for example:
 • Device Group X: User Groups A = 50%, B = 25%; usage 8am-5pm, 2 hr cut-off; runnable application list …
 • Device Group Y: User Group B = 100%; usage 24 hrs, 1 hr cut-off; runnable application list …
 • Device Group X: User Groups A = 50%, B = 50%; usage 6pm-8am, 8 hr cut-off; runnable application list …]
Application Types Supported
• Batch jobs
  – Use the mpsub command to run a single executable on a single remote desktop
• MPI jobs
  – Use the ud_mpirun command to run an MPI job across a set of desktop machines
• Data-parallel jobs
  – A single job consists of several independent workunits that can be executed in parallel
  – The application developer must create program modules and write application scripts to create workunits
Hosted Applications
• Hosted applications are easier to manage
  – Provide users with a managed application
  – Great for applications that are run frequently but rarely updated
  – Data-parallel applications fit best in the hosted scenario
  – Users do not have to deal with application maintenance; only the developer does
• Grid MP is optimized for running hosted applications
  – Applications and data are cached at client nodes
  – Affinity scheduling minimizes data movement by re-using cached executables and data
  – A hosted application can be run across multiple platforms by registering executables for each platform
Example: Reservoir Simulation
• Landmark’s VIP product benchmarked on Grid MP
• Workload consisted of 240 simulations for 5 wells
  – Sensitivities investigated include:
    • 2 PVT cases
    • 2 fault connectivity cases
    • 2 aquifer cases
    • 2 relative permeability cases
    • 5 combinations of 5 wells
    • 3 combinations of vertical permeability multipliers
  – Each simulation packaged as a separate piece of work
• A similar reservoir simulation application has been developed at TACC (with Dr. W. Bangerth, Institute of Geophysics)
Example: Drug Discovery
• Think & LigandFit applications
  – Internet project in partnership with Oxford University to model interactions between proteins and potential drug molecules
  – Virtual screening of drug molecules to reduce time-consuming, expensive lab testing by 90%
  – Drug database of 3.5 billion candidate molecules
  – Over 350K active computers participating all over the world
Think
• Code developed at Oxford University
• Application characteristics
  – Typical input data file: < 1 KB
  – Typical output file: < 20 KB
  – Typical execution time: 1000-5000 minutes
  – Floating-point intensive
  – Small memory footprint
  – Fully resolved executable is ~3 MB in size
Grid MP: POVray Application Portal
BOINC
• Berkeley Open Infrastructure for Network Computing (BOINC)
  – Open-source follow-on to SETI@home
  – General architecture supports multiple applications
  – Solution targets volunteer resources, not enterprise desktops/workstations
  – More information at http://boinc.berkeley.edu
• Currently being used by several internet projects
Structure of a BOINC project
[Diagram: structure of a BOINC project. A scheduling server (C++) and data servers (HTTP) talk to clients; web interfaces (PHP) serve participants; back-end daemons handle work generation, retry generation, result validation, result processing, and garbage collection, all against the BOINC DB (MySQL). Ongoing tasks for a project: monitor server correctness, monitor server performance, develop and maintain applications.]
BOINC
• No enterprise management tools
  – Focus on the “volunteer grid”
    • Provides incentives (points, teams, website)
    • Basic browser interface to set usage preferences on PCs
    • Support for the user community (forums)
• Simple interface for job management
  – The application developer creates scripts to submit jobs and retrieve results (a minimal sketch follows)
• Provides a sandbox on the client
• No encryption: uses redundant computing to prevent spoofing
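A minimal sketch of such a submission script, assuming a standard BOINC server layout with an application already registered as myapp (file names, template paths, and exact option spellings may vary by BOINC version):

#!/bin/sh
# Stage the input file into the project's download hierarchy
cp input_0001.dat `bin/dir_hier_path input_0001.dat`

# Create one workunit; the templates describe its input and expected output files
bin/create_work --appname myapp \
                --wu_name myapp_wu_0001 \
                --wu_template templates/myapp_wu.xml \
                --result_template templates/myapp_result.xml \
                input_0001.dat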
Projects using BOINC
• Climateprediction.net: study climate change
• Einstein@home: search for gravitational signals emitted by pulsars
• LHC@home: improve the design of the CERN LHC particle accelerator
• Predictor@home: investigate protein-related diseases
• Rosetta@home: help researchers develop cures for human diseases
• SETI@home: look for radio evidence of extraterrestrial life
• Cell Computing: biomedical research (Japanese; requires nonstandard client software)
• World Community Grid: advance our knowledge of human disease (requires BOINC 5.2.1 or greater)
SETI@home
• Analysis of radio telescope data from Arecibo
  – SETI: search for narrowband signals
  – Astropulse: search for short broadband signals
• 0.3 MB in, ~4 CPU hours, 10 KB out
Climateprediction.net
• Climate change study (Oxford University)
  – Met Office model (FORTRAN, 1M lines)
• Input: ~10 MB executable, 1 MB data
• Output per workunit:
  – 10 MB summary (always uploaded)
  – 1 GB detail file (archived on the client, may be uploaded)
• CPU time: 2-3 months (can't migrate)
  – Trickle messages
  – Preemptive scheduling
Why use Desktop Grids?
• Desktop grid solutions are typically complete & standalone
  – Easy to set up and manage
  – Good entry vehicle for trying out grids
• Use existing (but underutilized) resources
  – The number of desktops/workstations on a campus (or in an enterprise) is typically an order of magnitude greater than traditional compute resources
  – The power of the grid grows over time as new, faster desktops are added
• The typically large number of resources on desktop grids enables new approaches to solving problems