Martin Tsenkov; ELFE; 221210014; 77B


Slide 1/17

Computing is changing more rapidly than ever before, and scientists have the unprecedented opportunity to change computing directions.

Slide 2/17

• Largest computer at a given time
• Technical use for science and engineering calculations
• Large government (defense, weather, aero) laboratories are the first buyers
• Price is no object
• Market size is 3-5

(Copyright G. Bell & TCM History Center)

Slide 3/17

Major challenges lie ahead for extreme computing:

• Power
• Parallelism
• and many others not discussed here

We will need completely new approaches and technologies to reach the exascale level. This opens up a unique opportunity for science applications to lead extreme-scale systems development.

Slide 4/17

Commercial Parallel Computer Architecture (from loosely coupled to tightly coupled):

• Commodity processor with commodity interconnect (loosely coupled): clusters built from Pentium, Itanium, Opteron, or Alpha processors with GigE, Infiniband, Myrinet, Quadrics, or SCI interconnects; NEC TX7; HP Alpha
• Commodity processor with custom interconnect: SGI Altix (Intel Itanium 2), Cray Red Storm (AMD Opteron)
• Custom processor with custom interconnect (tightly coupled): Cray X1, NEC SX-7, IBM Regatta, IBM Blue Gene/L

Slide 5/17

SGI Altix: the Columbia supercomputer at NASA's Advanced Supercomputing Facility at Ames Research Center. It consists of a 10,240-processor SGI Altix system composed of 20 nodes, each with 512 Intel Itanium 2 processors, running a Linux operating system. Used for black hole simulations.

Other examples: Hitachi SR11000, NEC SX-7, Apple, Cray Red Storm, Cray BlackWidow, IBM Blue Gene/L

http://imagine.gsfc.nasa.gov/Images/news/columbia_computer.jpg
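
A quick check of the Columbia figures, sketched in Python. The per-processor peak is taken from the Itanium 2 entry on slide 7/17 (1.5 GHz, 6 Gflop/s); the resulting system peak is only a rough estimate, not an official figure.

# Node and processor counts from this slide; per-CPU peak assumed from slide 7/17.
nodes = 20
cpus_per_node = 512
peak_per_cpu_gflops = 6.0                     # 1.5 GHz Itanium 2 (assumption)

total_cpus = nodes * cpus_per_node            # 10,240 processors, as stated
system_peak_tflops = total_cpus * peak_per_cpu_gflops / 1000
print(total_cpus, "processors, about", system_peak_tflops, "Tflop/s theoretical peak")
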
Slide 6/17

Time   $M     Structure              Example
1950   1      mainframes             many...
1960   3      instruction //sm       IBM / CDC mainframe SMP
1970   10     pipelining             7600 / Cray 1
1980   30     vectors; SCI           Crays
1990   250    MIMDs: mC, SMP, DSM    Crays / MPP
2000   1,000  ASCI, COTS MPP         Grid, Legion

(Copyright G. Bell & TCM History Center)

Slide 7/17

Intel Pentium Xeon: 3.2 GHz, peak = 6.4 Gflop/s; Linpack 100 = 1.7 Gflop/s; Linpack 1000 = 3.1 Gflop/s
AMD Opteron: 2.2 GHz, peak = 4.4 Gflop/s; Linpack 100 = 1.3 Gflop/s; Linpack 1000 = 3.1 Gflop/s
Intel Itanium 2: 1.5 GHz, peak = 6 Gflop/s; Linpack 100 = 1.7 Gflop/s; Linpack 1000 = 5.4 Gflop/s
HP Alpha EV68: 1.25 GHz, 2.5 Gflop/s
Others: HP PA-RISC, Sun UltraSPARC IV, MIPS R16000

Linpack: a standard benchmark program that tests how fast your computer runs (it solves a dense system of linear equations).
Gflop/s: one billion floating-point operations per second.
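
A short worked example of how these numbers relate, sketched in Python: theoretical peak is clock rate times floating-point operations per cycle, and the Linpack result reaches only a fraction of that peak. The flops-per-cycle values are inferred from the slide's own peak and clock figures, not taken from vendor documentation.

# Clock (GHz), peak (Gflop/s) and Linpack 1000 (Gflop/s), all from this slide.
chips = {
    "Pentium Xeon": (3.2, 6.4, 3.1),
    "Opteron":      (2.2, 4.4, 3.1),
    "Itanium 2":    (1.5, 6.0, 5.4),
}
for name, (ghz, peak, linpack1000) in chips.items():
    flops_per_cycle = peak / ghz        # e.g. 6.4 / 3.2 = 2 for the Xeon
    efficiency = linpack1000 / peak     # fraction of peak reached by Linpack 1000
    print(f"{name}: {flops_per_cycle:.0f} flops/cycle, "
          f"Linpack 1000 at {efficiency:.0%} of peak")
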

Slide 8/17

Interconnect        Switch topology   Cost per NIC   Cost per node   MPI Lat (µs)   1-way speed (MB/s)   Bi-Dir speed (MB/s)
Gigabit Ethernet    Bus               $50            $100            30             100                  150
SCI                 Torus             $1,600         $1,600          5              300                  400
QsNetII (R)         Fat Tree          $1,200         $2,900          3              880                  900
Myrinet (D card)    Clos              $595           $995            6.5            240                  480
Myrinet (E card)    Clos              $995           $1,395          6              450                  900
Infiniband 4X       Fat Tree          $1,000         $1,400          6              820                  790


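A rough transfer-time model for the table above, sketched in Python: the time to deliver a message of m bytes is approximately latency + m / bandwidth. The numbers plugged in are the table's one-way figures; real performance also depends on protocol and contention effects.

def transfer_time_us(msg_bytes, latency_us, bandwidth_mb_per_s):
    """Estimated one-way delivery time in microseconds: latency plus serialization."""
    return latency_us + msg_bytes / (bandwidth_mb_per_s * 1e6) * 1e6

# Gigabit Ethernet vs QsNetII for a 1 MB message (values from the table above).
for name, lat, bw in [("Gigabit Ethernet", 30, 100), ("QsNetII", 3, 880)]:
    print(name, round(transfer_time_us(1_000_000, lat, bw)), "us")
# Latency dominates small messages; bandwidth dominates large ones.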

Slide 9/17

Tree network: there is only one path between any pair of processors.
Fat tree network: increase the number of communication links close to the root, so the root level has more physical connections.
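
A minimal sketch of the fat-tree idea in Python, assuming an idealised binary fat tree of illustrative depth 4 (the depth is not from the slide): link capacity doubles toward the root, so every level carries the same total bandwidth and the root bottleneck of a plain tree disappears.

levels = 4                                    # hypothetical tree depth
for level in range(levels):                   # level 0 is just below the root
    links = 2 ** (level + 1)                  # links at this level of a binary tree
    capacity = 2 ** (levels - 1 - level)      # relative capacity of each link
    print(f"level {level}: {links} links x capacity {capacity} "
          f"= total bandwidth {links * capacity}")
# Every level totals 16 here, whereas a plain tree would bottleneck at the root.
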

Slide 10/17

Also known as a wrapped-around-mesh topology.

[Figures: a three-dimensional mesh and a mesh with wraparound]
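
A small Python sketch of why the wraparound links matter: on a ring of size n the hop distance between positions a and b is min(|a - b|, n - |a - b|), so each dimension contributes at most n/2 hops instead of up to n - 1 on a plain mesh. The 8 x 8 x 8 size below is just an illustrative assumption.

def torus_hops(a, b, dims):
    """Minimal hop count between nodes a and b in a torus with the given dimensions."""
    return sum(min(abs(x - y), n - abs(x - y)) for x, y, n in zip(a, b, dims))

dims = (8, 8, 8)                                  # illustrative torus size
print(torus_hops((0, 0, 0), (7, 7, 7), dims))     # 3 hops, thanks to wraparound
print(torus_hops((0, 0, 0), (4, 4, 4), dims))     # 12 hops, the worst case here
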

Slide 11/17

A Clos network is a kind of multistage switching network:

• Three stages, each consisting of a number of crossbars.
• The middle stage has redundant switching boxes to reduce the blocking probability.
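
A sketch of the classical Clos sizing rule behind that last point, in Python: for a 3-stage Clos network whose ingress crossbars have n inputs and m outputs to the middle stage, m >= 2n - 1 middle switches give a strictly non-blocking network and m >= n a rearrangeably non-blocking one, so adding "redundant" middle-stage switches lowers the blocking probability.

def clos_blocking_class(n, m):
    """Classify a 3-stage Clos network by its middle-stage switch count m."""
    if m >= 2 * n - 1:
        return "strictly non-blocking"
    if m >= n:
        return "rearrangeably non-blocking"
    return "blocking possible"

print(clos_blocking_class(n=4, m=4))   # rearrangeably non-blocking
print(clos_blocking_class(n=4, m=7))   # strictly non-blocking (m >= 2n - 1)
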

Slide 12/17

• By the Myricom company; the first Myrinet appeared in 1994
• An alternative to Ethernet for connecting the nodes in a cluster
• Operated entirely in user space, with no operating system delays
• Myrinet switch: 10 Gbps, $12,800; Clos networks up to 128 host ports
• 10G PCI Express NIC with fiber connectors

Slide 13/17

• By Quadrics (formed in 1996); uses a 'fat tree' topology
• QsNetII scales up to 4,096 nodes; each node might have multiple CPUs
• Designed for use within SMP systems
• MPI latency on a standard AMD Opteron starts at 1.22 µs; bandwidth on Intel Xeon EM64T is 912 Mbytes/s
• QsNetII E-Series 128-way switch

Slide 14/17

• Each chip contains two nodes
• Each node is a PPC440 processor
• Each node has 512 MB of local memory
• Each node runs a lightweight OS with MPI
• Each node runs one user process; no context switching at the node

Slide 15/17

Uses five networks:

• GigE for the I/O nodes and for connections to external systems
• A control network using Fast Ethernet
• A 3-D torus for node-to-node message passing; it handles the majority of application traffic (MPI messaging); longest path: 64 hops

The MPI software is highly customized:

• A collective network for broadcasting
• A barrier network
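
Where the 64-hop figure comes from, sketched in Python: on a torus each dimension contributes at most half its size to a shortest path. Assuming the full Blue Gene/L machine is laid out as a 64 x 32 x 32 torus (65,536 nodes; this dimension split is an assumption, not stated on the slide), the worst case works out to 64 hops.

dims = (64, 32, 32)                        # assumed full-system torus dimensions
longest_path = sum(n // 2 for n in dims)   # 32 + 16 + 16
print(longest_path, "hops")                # 64, matching the slide
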

Slide 16/17

Slide 17/17

System attributes            2010        2015                     2018
System peak                  2 Pflop/s   200 Pflop/s              1 Eflop/s
Power                        6 MW        15 MW                    20 MW
System memory                0.3 PB      5 PB                     32-64 PB
Node performance             125 GF      0.5 TF or 7 TF           1 TF or 10 TF
Node memory BW               25 GB/s     0.1 TB/s or 1 TB/s       0.4 TB/s or 4 TB/s
Node concurrency             12          O(100) or O(1,000)       O(1,000) or O(10,000)
System size (nodes)          18,700      50,000 or 5,000          1,000,000 or 100,000
Total node interconnect BW   1.5 GB/s    20 GB/s                  200 GB/s
MTTI (mean time to interrupt)  days      O(1 day)                 O(1 day)

Where a cell lists two values, they are two alternative design points for that year: many lower-performance nodes or fewer higher-performance ones.
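
One derived figure from this table, sketched in Python: energy efficiency in Gflop/s per watt, i.e. system peak divided by power. Reaching the 2018 target requires roughly a 150x improvement over the 2010 system.

# System peak (flop/s) and power (W), taken from the table above.
systems = {
    "2010": (2e15, 6e6),
    "2015": (200e15, 15e6),
    "2018": (1e18, 20e6),
}
for year, (flops, watts) in systems.items():
    print(year, round(flops / watts / 1e9, 2), "Gflop/s per watt")
# 2010: 0.33, 2015: 13.33, 2018: 50.0
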