Lessons from the Beowulf
Bob Lucas
USC – Lockheed Martin Quantum Computing Center
Oct 14, 2014
“Segue”
I met Thomas Sterling in Oct., 1988
Supercomputing Research Center (SRC)
MIT-trained dataflow expert
Really big vocabulary
Skipper of the Floating Point
Supercomputing 1988
Spoke of the perils of overhead
Rebutted by MIT professor in the audience
Guerrilla research
Often in Thomas’s home
Dataflow execution of a linear solver
Would have been more efficient than a Y/MP
Supercomputing in the 1980s
ECL shared-memory, vector mainframes
Primarily from Cray Research
~$10M
SRC Cray-2
Four 250 MHz CPUs
Three people
NASA Cray-2 from Wikipedia
Machines were expensive
People were cheap
FET Technology Revolution
FET patent filed in 1925
MOSFET invented in 1959
COSMIC Cube’s 8086s were nMOS
CMOS matured in the mid-1980s
Latch-up finally addressed
New manufacturing technology launched a broad range of parallel computer architecture research
1980s and early 1990s
Early 1990s Message Passing Systems
(aka Communicating Sequential Processes)
PC components
Intel Touchstone Delta
512 CPUs
Custom network
OSF/1
Workstation components
IBM SP1
128 RS/6000 CPUs
Custom network
AIX
Contemporary Shared Memory Alternatives
Convex SPP
2048 PA-RISC CPUs
ccNUMA
SCI network
Cray T3D
2048 Alpha CPUs
Shared address space
3D torus network
Y/MP packaging
Beowulf was Underwhelming
“Lowest Common Denominator”
Cheap PC components
Mediocre performance (10s of CPUs)
Large form factor
Message passing execution model
OS from a Finnish teenager, and Don
Mosaic was underwhelming too
I Began to Take Notice
Tom Blank quit MasPar
Knew he couldn’t compete with Beowulf cost structure
Boeing engineer’s “office equipment”
IDC’s dark matter
LSTC classroom outperformed the SGI Origin
Not all applications need fancy networks
USC “Condo complex”
HPC with modest institutional investment
Managed by only three people
Beowulf Triumphed
Hardware costs are effectively minimized
System software too
ISV license fees often exceed hardware cost
Vendor integrated systems
Better form factors
Competitive with custom systems at all but extreme scales
Low margins
Large users still integrate their own
Google and Facebook among top five server manufacturers
Outsourcing of infrastructure
Eliminate labor of system administrators and operators
Cloud purveyors have economies of scale
Computing “Too Cheap to Meter”
“Flops are free”
Applications often used inefficiently
E.g., rectilinear meshes to track turbulent fluids
Easier than more sophisticated, adaptive grids
Large parallel systems used inefficiently
Map-Reduce execution model easy to use
Virtual machine layers make them easy to manage
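To illustrate why the Map-Reduce execution model is considered easy to use, here is a minimal single-process sketch of a word count. It is not the API of any particular framework; the function names `map_phase`, `reduce_phase`, and `word_count` are hypothetical, chosen only for this illustration. The point is that the programmer writes two small functions and leaves distribution to the system.

```python
from collections import Counter
from functools import reduce


def map_phase(chunk):
    """Map: count the words in one chunk, independently of all other chunks."""
    return Counter(chunk.split())


def reduce_phase(left, right):
    """Reduce: merge two partial counts into one."""
    return left + right


def word_count(chunks):
    # In a real deployment each map call could run on a different node;
    # here everything runs in one process for illustration only.
    partial_counts = map(map_phase, chunks)
    return reduce(reduce_phase, partial_counts, Counter())


if __name__ == "__main__":
    print(word_count(["flops are free", "people are expensive"]))
```

The simplicity comes at a price the slide alludes to: the model says nothing about data placement or communication cost, so the hardware may be used far below its peak.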
False Economy?
People are expensive
Sophisticated codes are costly to write
Concurrency makes them more so
Mitigate some of this with libraries
Electricity is expensive too
Tyranny of Beowulf
Not all algorithms parallelize well
CSP execution model limits those that do
Unpredictable distribution of data and operations
Commodity hardware overheads further impact scaling
Beowulf cost advantage has squeezed out alternatives
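The scaling points above can be made concrete with a rough model. The sketch below uses Amdahl's law, which the slides do not name, plus a communication term that grows with the process count. The linear overhead term and every parameter value are assumptions chosen for illustration, not measurements of any system.

```python
def speedup(p, serial_fraction, overhead_per_proc=0.0):
    """Amdahl-style speedup on p processes.

    Runtime is normalized so that one process takes 1.0 time unit.
    overhead_per_proc is a hypothetical communication cost added for
    each process beyond the first (an assumed linear model).
    """
    parallel_fraction = 1.0 - serial_fraction
    runtime = (serial_fraction
               + parallel_fraction / p
               + overhead_per_proc * (p - 1))
    return 1.0 / runtime


if __name__ == "__main__":
    for p in (1, 16, 256, 4096):
        ideal = speedup(p, serial_fraction=0.01)
        real = speedup(p, serial_fraction=0.01, overhead_per_proc=1e-5)
        print(f"{p:5d} processes: {ideal:6.1f}x without overhead, "
              f"{real:6.1f}x with overhead")
```

With a 1% serial fraction the speedup can never exceed 100x, however many nodes are added. Once the assumed overhead term is included, adding processes eventually makes the run slower, which is one way to read "commodity hardware overheads further impact scaling".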
Looking to the Future
Need to change focus to maximizing human productivity
Reduce cognitive burden on developers and users
e.g., shared address spaces
Software legacy represents huge labor investment
Evolution onto Beowulf an ongoing process, after two decades
Need to evolve these codes into the future
Yes, that means Fortran and MPI where they work
Add new features where needed
Launch ParalleX applications by typing “mpirun”
Threatened by diversity of rapidly evolving environment
Beowulf fostered a stable execution model for two decades
Gracefully incorporated local node changes
Shared memory and accelerators
Revisit Execution Model
Pentium core performance asymptoting
Room for innovation that wasn’t possible for two decades
Rediscover E-registers and other lost 1990s technology
Anton is illustrative of the engineering that’s needed
Order-of-magnitude lower communication overhead
I expect more application (or domain) specific systems
Thomas Sterling’s current research focus
Informed by three decades of prior research
Dataflow, Beowulf, PIM, HTMT, ParalleX
He set us on the path to Beowulf
He could do it again