cluster computing: an introduction

64
金金金 金金金金金金金金金金金金 [email protected] Cluster Computing: An Introduction

Upload: aman

Post on 08-Jan-2016

24 views

Category:

Documents


4 download

DESCRIPTION

Cluster Computing: An Introduction. 金仲達 國立清華大學資訊工程學系 [email protected]. Clusters Have Arrived. What is a Cluster?. A collection of independent computer systems working together as if a single system Coupled through a scalable, high bandwidth, low latency interconnect - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Cluster Computing: An Introduction

金仲達國立清華大學資訊工程學系

[email protected]

Cluster Computing:

An Introduction

Page 2: Cluster Computing: An Introduction

2

Clusters Have Arrived

Page 3: Cluster Computing: An Introduction

3

What is a Cluster?

A collection of independent computer systems working together as if a single system

Coupled through a scalable, high bandwidth, low latency interconnect

The nodes can exist in a single cabinet or be separated and connected via a network

Faster, closer connection than a network (LAN) Looser connection than a symmetric multiprocessor

Page 4: Cluster Computing: An Introduction

4

Outline

Motivations of Cluster Computing Cluster Classifications Cluster Architecture & its Components Cluster Middleware Representative Cluster Systems Task Forces on Cluster Resources and Conclusions

Page 5: Cluster Computing: An Introduction

5

Motivations of

Cluster Computing

Page 6: Cluster Computing: An Introduction

6

How to Run Applications Faster ?

There are three ways to improve performance: Work harder Work smarter Get help

Computer analogy Use faster hardware: e.g. reduce the time per

instruction (clock cycle) Optimized algorithms and techniques Multiple computers to solve problem

=> techniques of parallel processing is mature and can be exploited commercially

Page 7: Cluster Computing: An Introduction

7

Motivation for Using Clusters

Performance of workstations and PCs is rapidly improving

Communications bandwidth between computers is increasing

Vast numbers of under-utilized workstations with a huge number of unused processor cycles

Organizations are reluctant to buy large, high performance computers, due to the high cost and short useful life span

Page 8: Cluster Computing: An Introduction

8

Motivation for Using Clusters

Workstation clusters are thus a cheap and readily available approach to high performance computing Clusters are easier to integrate into existing networks Development tools for workstations are mature

Threads, PVM, MPI, DSM, C, C++, Java, etc.

Use of clusters as a distributed compute resource is cost effective --- incremental growth of system!!! Individual node performance can be improved by adding

additional resource (new memory blocks/disks) New nodes can be added or nodes can be removed Clusters of Clusters and Metacomputing

Page 9: Cluster Computing: An Introduction

9

Key Benefits of Clusters

High performance: running cluster enabled programs

Scalability: adding servers to the cluster or by adding more clusters to the network as the need arises or CPU to SMP

High throughput System availability (HA): offer inherent high system

availability due to the redundancy of hardware, operating systems, and applications

Cost-effectively

Page 10: Cluster Computing: An Introduction

10

Why Cluster Now?

Page 11: Cluster Computing: An Introduction

11

Hardware and Software Trends

Important advances taken place in the last five year Network performance increased with reduced cost Workstation performance improved

Average number of transistors on a chip grows 40% per year Clock frequency growth rate is about 30% per year Expect 700-MHz processors with 100M transistors in early 2000

Availability of powerful and stable operating systems (Linux, FreeBSD) with source code access

Page 12: Cluster Computing: An Introduction

12

Why Clusters NOW?

Clusters gained momentum when three technologies converged:Very high performance microprocessors

workstation performance = yesterday supercomputers

High speed communicationStandard tools for parallel/ distributed computing &

their growing popularity Time to market => performance Internet services: huge demands for scalable, available,

dedicated internet servers big I/O, big compute

Page 13: Cluster Computing: An Introduction

13

Efficient Communication

The key enabling technology:from killer micro to killer switch Single chip building block for

scalable networks high bandwidth low latency very reliable

Challenges for clusters greater routing delay and less than

complete reliability constraints on where the network

connects into the node UNIX has a rigid device and

scheduling interface

Page 14: Cluster Computing: An Introduction

14

Putting Them Together ...

Building block = complete computers(HW & SW) shipped in 100,000s:Killer micro, Killer DRAM, Killer disk,Killer OS, Killer packaging, Killer investment

Leverage billion $ per year investment Interconnecting building blocks => Killer Net

High bandwidth Low latency Reliable Commodity (ATM, Gigabit Ethernet,

MyridNet)

Page 15: Cluster Computing: An Introduction

15

Windows of Opportunity

The resources available in the average clusters offer a number of research opportunities, such as Parallel processing: use multiple computers to build

MPP/DSM-like system for parallel computing Network RAM: use the memory associated with each

workstation as an aggregate DRAM cache Software RAID: use the arrays of workstation disks to

provide cheap, highly available, and scalable file storage Multipath communication: use the multiple networks

for parallel data transfer between nodes

Page 16: Cluster Computing: An Introduction

16

Windows of Opportunity

Most high-end scalable WWW servers are clusters end services (data, web, enhanced information services,

reliability)

Network mediation services also cluster-based Inktomi traffic server, etc. Clustered proxy caches, clustered firewalls, etc. => These object web applications increasingly compute

intensive => These applications are an increasing part of the

“scientific computing”

Page 17: Cluster Computing: An Introduction

17

Classification of

Cluster Computers

Page 18: Cluster Computing: An Introduction

18

Clusters Classification 1

Based on Focus (in Market)High performance (HP) clusters

Grand challenging applicationsHigh availability (HA) clusters

Mission critical applications

Page 19: Cluster Computing: An Introduction

19

HA Clusters

Page 20: Cluster Computing: An Introduction

20

Clusters Classification 2

Based on Workstation/PC OwnershipDedicated clustersNon-dedicated clusters

Adaptive parallel computingCan be used for CPU cycle stealing

Page 21: Cluster Computing: An Introduction

21

Clusters Classification 3

Based on Node ArchitectureClusters of PCs (CoPs)Clusters of Workstations (COWs)Clusters of SMPs (CLUMPs)

Page 22: Cluster Computing: An Introduction

22

Clusters Classification 4

Based on Node Components Architecture & Configuration:Homogeneous clusters

All nodes have similar configuration

Heterogeneous clusters Nodes based on different processors and running

different OS

Page 23: Cluster Computing: An Introduction

23

Clusters Classification 5

Based on Levels of Clustering:Group clusters (# nodes: 2-99)

A set of dedicated/non-dedicated computers --- mainly connected by SAN like Myrinet

Departmental clusters (# nodes: 99-999)Organizational clusters (# nodes: many 100s)Internet-wide clusters = Global clusters

(# nodes: 1000s to many millions) Metacomputing

Page 24: Cluster Computing: An Introduction

24

Clusters and Their

Commodity Components

Page 25: Cluster Computing: An Introduction

25

Cluster Computer Architecture

Page 26: Cluster Computing: An Introduction

26

Cluster Components...1aNodes

Multiple high performance components:PCsWorkstationsSMPs (CLUMPS)Distributed HPC systems leading to

Metacomputing They can be based on different architectures and

running different OS

Page 27: Cluster Computing: An Introduction

27

Cluster Components...1bProcessors There are many (CISC/RISC/VLIW/Vector..)

Intel: Pentiums, Xeon, Merced…. Sun: SPARC, ULTRASPARC HP PA IBM RS6000/PowerPC SGI MPIS Digital Alphas

Integrating memory, processing and networking into a single chip IRAM (CPU & Mem): (http://iram.cs.berkeley.edu) Alpha 21366 (CPU, Memory Controller, NI)

Page 28: Cluster Computing: An Introduction

28

Cluster Components…2OS

State of the art OS: Tend to be modular: can easily be extended and new

subsystem can be added without modifying the underlying OS structure

Multithread has added a new dimension to parallel processing

Popular OS used on nodes of clusters: Linux (Beowulf) Microsoft NT (Illinois HPVM) SUN Solaris (Berkeley NOW) IBM AIX (IBM SP2) …..

Page 29: Cluster Computing: An Introduction

29

Cluster Components…3High Performance Networks

Ethernet (10Mbps) Fast Ethernet (100Mbps) Gigabit Ethernet (1Gbps) SCI (Dolphin - MPI- 12 usec latency) ATM Myrinet (1.2Gbps) Digital Memory Channel FDDI

Page 30: Cluster Computing: An Introduction

30P

Cluster Components…4Network Interfaces

Dedicated Processing power and storage embedded in the Network Interface

An I/O card today Tomorrow on chip?

$

M I/O bus (S-Bus)50 MB/s

MryicomNet

P

Sun Ultra 170

MyricomNIC

160 MB/s

M

Page 31: Cluster Computing: An Introduction

31

Cluster Components…4Network Interfaces

Network interface cardMyrinet has NICUser-level access support: VIAAlpha 21364 processor integrates processing,

memory controller, network interface into a single chip..

Page 32: Cluster Computing: An Introduction

32

Cluster Components…5 Communication Software

Traditional OS supported facilities (but heavy weight due to protocol processing).. Sockets (TCP/IP), Pipes, etc.

Light weight protocols (user-level): minimal Interface into OS User must transmit directly into and receive from the

network without OS intervention Communication protection domains established by

interface card and OS Treat message loss as an infrequent case Active Messages (Berkeley), Fast Messages (UI), ...

Page 33: Cluster Computing: An Introduction

33

Cluster Components…6aCluster Middleware

Resides between OS and applications and offers an infrastructure for supporting:Single System Image (SSI)System Availability (SA)

SSI makes collection of computers appear as a single machine (globalized view of system resources)

SA supports check pointing and process migration, etc.

Page 34: Cluster Computing: An Introduction

34

Cluster Components…6bMiddleware Components

Hardware DEC Memory Channel, DSM (Alewife, DASH) SMP

techniques

OS/gluing layers Solaris MC, Unixware, Glunix

Applications and Subsystems System management and electronic forms Runtime systems (software DSM, PFS etc.) Resource management and scheduling (RMS):

CODINE, LSF, PBS, NQS, etc.

Page 35: Cluster Computing: An Introduction

35

Cluster Components…7aProgramming Environments

Threads (PCs, SMPs, NOW, ..) POSIX Threads Java Threads

MPI Linux, NT, on many Supercomputers

PVM Software DSMs (Shmem)

Page 36: Cluster Computing: An Introduction

36

Cluster Components…7bDevelopment Tools?

Compilers C/C++/Java/

RAD (rapid application development tools):GUI based tools for parallel processing modeling

Debuggers Performance monitoring and analysis tools Visualization tools

Page 37: Cluster Computing: An Introduction

37

Cluster Components…8Applications

Sequential Parallel/distributed (cluster-aware applications)

Grand challenging applications Weather Forecasting Quantum Chemistry Molecular Biology Modeling Engineering Analysis (CAD/CAM) ……………….

Web servers, data-mining

Page 38: Cluster Computing: An Introduction

38

Cluster Middleware and

Single System Image

Page 39: Cluster Computing: An Introduction

39

Middleware Design Goals

Complete transparency Let users see a single cluster system

Single entry point, ftp, telnet, software loading...

Scalable performance Easy growth of cluster

no change of API and automatic load distribution

Enhanced availability Automatic recovery from failures

Employ checkpointing and fault tolerant technologies

Handle consistency of data when replicated..

Page 40: Cluster Computing: An Introduction

40

Single System Image (SSI)

A single system image is the illusion, created by software or hardware, that a collection of computers appear as a single computing resource

Benefits: Usage of system resources transparently Improved reliability and higher availability Simplified system management Reduction in the risk of operator errors User need not be aware of the underlying system

architecture to use these machines effectively

Page 41: Cluster Computing: An Introduction

41

Desired SSI Services

Single entry point telnet cluster.my_institute.edu telnet node1.cluster.my_institute.edu

Single file hierarchy: AFS, Solaris MC Proxy Single control point: manage from single GUI Single virtual networking Single memory space - DSM Single job management: Glunix, Condin, LSF Single user interface: like workstation/PC

windowing environment

Page 42: Cluster Computing: An Introduction

42

SSI Levels

Single system support can exist at different levels within a system, one is able to be built on another

Application and Subsystem Level

Operating System Kernel Level

Hardware Level

Page 43: Cluster Computing: An Introduction

46

Availability Support Functions

Single I/O space (SIO): Any node can access any peripheral or disk devices

without the knowledge of physical location.

Single process space (SPS) Any process can create processes on any node, and

they can communicate through signals, pipes, etc, as if they were one a single node

Checkpointing and process migration Saves the process state and intermediate results in memory

or disk; process migration for load balancing

Reduction in the risk of operator errors

Page 44: Cluster Computing: An Introduction

47

Relationship among Middleware Modules

Page 45: Cluster Computing: An Introduction

48

Strategies for SSI

Build as a layer on top of existing OS (e.g. Glunix) Benefits:

Makes the system quickly portable, tracks vendor software upgrades, and reduces development time

New systems can be built quickly by mapping new services onto the functionality provided by the layer beneath, e.g. Glunix/Solaris-MC

Build SSI at the kernel level (True Cluster OS) Good, but can’t leverage of OS improvements by

vendor e.g. Unixware and Mosix (built using BSD Unix)

Page 46: Cluster Computing: An Introduction

49

Representative Cluster Systems

Page 47: Cluster Computing: An Introduction

50

Research Projects of Clusters

Beowulf: CalTech, JPL, and NASA Condor: Wisconsin State University DQS (Distributed Queuing System): Florida

State U. HPVM (High Performance Virtual Machine):

UIUC& UCSB Gardens: Queensland U. of Technology, AU NOW (Network of Workstations): UC Berkeley PRM (Prospero Resource Manager): USC

Page 48: Cluster Computing: An Introduction

51

Commercial Cluster Software

Codine (Computing in Distributed Network Environment): GENIAS GmbH, Germany

LoadLeveler: IBM Corp. LSF (Load Sharing Facility): Platform Computing NQE (Network Queuing Environment): Craysoft RWPC: Real World Computing Partnership, Japan Unixware: SCO Solaris-MC: Sun Microsystems

Page 49: Cluster Computing: An Introduction

55

Comparison of 4 Cluster Systems

Page 50: Cluster Computing: An Introduction

56

Task Forces

on Cluster Computing

Page 51: Cluster Computing: An Introduction

57

IEEE Task Force on Cluster Computing (TFCC)

http://www.dgs.monash.edu.au/~rajkumar/tfcc/

http://www.dcs.port.ac.uk/~mab/tfcc/

Page 52: Cluster Computing: An Introduction

58

TFCC Activities

Mailing list, workshops, conferences, tutorials, web-resources etc.

Resources for introducing the subject in senior undergraduate and graduate levels

Tutorials/workshops at IEEE Chapters ….. and so on.

Visit TFCC Page for more details: http://www.dgs.monash.edu.au/~rajkumar/tfcc/

Page 53: Cluster Computing: An Introduction

59

Efforts in Taiwan

PC Farm Project at Academia Sinica Computing Center: http://www.pcf.sinica.edu.tw/

NCHC PC Cluster Project: http://www.nchc.gov.tw/project/pccluster/

Page 54: Cluster Computing: An Introduction

60

NCHC PC Cluster

A Beowulf class cluster

Page 55: Cluster Computing: An Introduction

61

System Hardware

5 Fast Ethernet switching hubs

Page 56: Cluster Computing: An Introduction

62

System Software

Page 57: Cluster Computing: An Introduction

63

Conclusions

Clusters are promising and funOffer incremental growth and match with

funding patternNew trends in hardware and software

technologies are likely to make clusters more promising

Cluster-based HP and HA systems can be seen everywhere!

Page 58: Cluster Computing: An Introduction

64

The Future

Cluster system using idle cycles from computers will continue

Individual nodes will have of multiple processors Widespread usage of Fast and Gigabit Ethernet and

they will become de facto network for clusters Cluster software bypass OS as much as possible Unix-based OS are likely to be most popular, but

the steady improvement and acceptance of NT will not be far behind

Page 59: Cluster Computing: An Introduction

65

The Challenges

Programming enable applications, reduce programming effort, distributed

object/component models?

Reliability (RAS) programming effort, reliability with scalability to 1000’s

Heterogeneity performance, configuration, architecture and interconnect

Resource Management (scheduling, perf. pred.) System Administration/Management Input/Output (both network and storage)

Page 60: Cluster Computing: An Introduction

66

Pointers to Literature on

Cluster Computing

Page 61: Cluster Computing: An Introduction

67

Reading Resources..1aInternet & WWW

Computer architecture:http://www.cs.wisc.edu/~arch/www/

PFS and parallel I/O:http://www.cs.dartmouth.edu/pario/

Linux parallel processing:http://yara.ecn.purdue.edu/~pplinux/Sites/

Distributed shared memory:http://www.cs.umd.edu/~keleher/dsm.html

Page 62: Cluster Computing: An Introduction

68

Reading Resources..1bInternet & WWW

Solaris-MC:http://www.sunlabs.com/research/solaris-mc

Microprocessors: recent advanceshttp://www.microprocessor.sscc.ru

Beowulf:http://www.beowulf.org

Metacomputinghttp://www.sis.port.ac.uk/~mab/Metacomputing/

Page 63: Cluster Computing: An Introduction

69

Reading Resources..2Books

In Search of Clusterby G.Pfister, Prentice Hall (2ed), 98

High Performance Cluster ComputingVolume1: Architectures and SystemsVolume2: Programming and Applications

Edited by Rajkumar Buyya, Prentice Hall, NJ, USA.

Scalable Parallel Computingby K Hwang & Zhu, McGraw Hill,98

Page 64: Cluster Computing: An Introduction

70

Reading Resources..3Journals

“A Case of NOW”, IEEE Micro, Feb1995 by Anderson, Culler, Paterson

“Fault Tolerant COW with SSI”, IEEE Concurrency by Kai Hwang, Chow, Wang, Jin, Xu

“Cluster Computing: The Commodity Supercomputing”, Journal of Software Practice and Experience by Mark Baker & Rajkumar Buyya