Innovation Unleashes the Potential of High-Performance Computing
images.nvidia.com/cn/gtc/downloads/pdf/partners/606...

Lin Jun: Chief Architect, Huawei Server Domain


Page 1

Innovation Unleashes the Potential of High-Performance Computing

Lin Jun: Chief Architect, Huawei Server Domain

Page 2

Market Trends

Page 3

Requirement for Compute

- 1972: 0.004 MIPS
- 1989: 20 MIPS
- 2014: 124,000 MIPS
- 2020: Millions of MIPS

Mobility, Cloud, Big Data, Security

Opportunity for Innovation

Internet of Things, Industry 4.0, Intelligent City

Traditional Architecture
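To put the MIPS figures above in perspective, the following minimal sketch derives the compound annual growth they imply. The three data points are taken from the slide; the helper names are illustrative, and a constant growth rate between the endpoints is an assumption.

```python
import math

# (year, MIPS) pairs as printed on the slide.
points = [(1972, 0.004), (1989, 20.0), (2014, 124_000.0)]

def annual_growth(y0, v0, y1, v1):
    """Compound annual growth factor between two (year, value) points."""
    return (v1 / v0) ** (1.0 / (y1 - y0))

def doubling_time_years(growth):
    """Years needed to double at a constant annual growth factor."""
    return math.log(2) / math.log(growth)

# Overall growth from the first to the last data point.
overall = annual_growth(*points[0], *points[-1])
print(f"1972-2014: x{overall:.2f} per year, "
      f"doubling every {doubling_time_years(overall):.1f} years")
```

Under that assumption the slide's numbers correspond to roughly 50% more compute each year, that is, a doubling about every 1.7 years.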

Page 4

Computing Innovation Slows Down

Low Utilization, High Power, Fast Growth, Not Secure

Past

The Doubling of Transistors Is Slowing Down

Single-Core Performance Increase Is Slowing Down

Multi-Core Performance Is Limited by Amdahl's Law

Uneven Subsystem Development

Now
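The multi-core limit named on this slide is Amdahl's Law: if a fraction p of a program can run in parallel on n cores, the speedup is 1 / ((1 - p) + p / n). A minimal sketch follows; the 95% parallel fraction is a hypothetical example value, not a figure from the slides.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup predicted by Amdahl's Law for a given parallel fraction."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

# Hypothetical workload: 95% of the runtime parallelizes perfectly.
for cores in (1, 8, 64, 1024):
    print(cores, round(amdahl_speedup(0.95, cores), 2))
```

However many cores are added, the speedup for this example stays below 1 / (1 - 0.95) = 20, which is why adding cores alone cannot restore the earlier pace of performance growth.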

Page 5

The End of Moore’s Law

Tick-Tock

Process, Architecture, Optimization

10 µm – 1971

6 µm – 1974

3 µm – 1977

1.5 µm – 1982

1 µm – 1985

800 nm – 1989

600 nm – 1994

350 nm – 1995

250 nm – 1997

180 nm – 1999

130 nm – 2001

90 nm – 2004

65 nm – 2006

45 nm – 2008

32 nm – 2010

22 nm – 2012

14 nm – 2014

10 nm – 2016

7 nm – 2018

5 nm – 2020

Covalent radius of Silicon Atom is 111 pm (0.111 nm)
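The comparison the slide draws between process nodes and the silicon atom can be made concrete with a little arithmetic. The sketch below treats each node label as a literal length, which is a simplification (node names are not exact physical dimensions), and approximates an atom's width as twice the covalent radius quoted above.

```python
SILICON_COVALENT_RADIUS_NM = 0.111  # value quoted on the slide

def atoms_across(feature_nm):
    """Approximate number of silicon atoms spanning a feature of this width.

    Assumes atom width = 2 x covalent radius and a literal feature size.
    """
    return feature_nm / (2 * SILICON_COVALENT_RADIUS_NM)

for node_nm in (14, 10, 7, 5):
    print(f"{node_nm} nm is about {atoms_across(node_nm):.1f} atoms wide")
```

On these assumptions a 5 nm feature is only a few tens of atoms wide, which illustrates why further shrinking becomes difficult.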

Page 6

Changes to CPU Power Consumption

Page 7

Increased Usage of Accelerators

Adoption has accelerated since 2010. NVIDIA still dominates.

Total performance share plateaued in the past year, mainly due to life cycle

Accelerator-based systems are projected to dominate for the next decade

Page 8

Heterogeneous Architecture

Processors are moving toward specialization

Performance per watt is becoming more important

• Heterogeneous CPUs can offer more flexibility, better price-performance, and better performance per watt

• First used by storage systems

• Internet servers began small-scale deployment in 2013~2014

• Enterprise server applications still lag behind by 3~5 years

Page 9

Solution

Huawei HPC Technology

Page 10

World Class HPC Solutions TODAY

- 170+ Countries
- 2015 Revenue: $63B
- 14.2% of Revenue in R&D
- 16 R&D Centers
- 36 Joint Innovation Centers
- 79,000 R&D Engineers

Huawei FusionServer, OceanStor, CloudEngine

- Standalone Compute Node 1P-32P
- Modular HPC Systems
- NVMe SSD
- HPC storage
- Big Data storage
- Network Fabric
- Modular & Container Data center

Reduce Complexity, More Performance / $, Design for Growth

HPC Private Cloud, Petascale System, Direct Liquid Cooling, Workload Optimization, Ecosystem Partnership

Page 11

Simplify HPC Systems TODAY

MAXIMIZE EFFICIENCY, ACCELERATE WORKLOAD

MAXIMIZE PERFORMANCE FOR INDIVIDUAL WORKLOAD

- Flexible, modular architecture
- Multiple innovative form factors
- Deep optimization with hardware acceleration
- Super Fat nodes

CONVERGED HPC & BIG DATA, MAXIMIZE HARDWARE ROI

- Single HPC cluster and storage system for both traditional HPC MPI workload and Hadoop
- Innovative big data analytic appliance with deep hardware and software optimization for maximum cost effectiveness

MORE COMPUTE, LESS SPACE, LOWER POWER CONSUMPTION

- End-to-end energy efficient design
- HVDC
- High Ambient Temperature ~40ºC
- Direct Liquid Cooling ~84% coverage
- Tight integration with Huawei data center infrastructure

SDS, Big Data, SDI

Page 12

HPC / IT Solutions for Tomorrow

Deep Learning

Enabling HPC Cloud

- FusionInsight: Big Data
- FusionSphere: Cloud OS
- ManageOne: Management software
- FusionStorage: Software Defined Storage Pool

Big Data Acceleration

New Heterogeneous CPU: DDR4, Next Gen CPU, GPU/FPGA Accelerator, HBM/HMC

New Memory Hierarchy: DDR4 DRAM as SCM Cache, DDR4-SCM-DIMM, SRAM Cache, X86 CPU, SCM-SSD, NVMe-SSD, HDD, HBA/RAID, HBM, GPU, MEM, FPGA

New Technology Enablement, Leverage Cross Disciplinary Assets, Creative Converged Solutions

Page 13

HPC Cloud Framework

- Open architecture
- Rapid deployment
- Efficient operation
- Demand based
- Maximize utilization
- Multi-tenant VPC isolation
- On-premises to cloud, secured end to end

Simulation, Graphic Visualization, Technical Computing

Head Node: Resource Pool, Job Scheduler, User Login, DB

Agile, Elastic, Secure

HPC Compute, HPC Distributed Storage, Low Latency Networking, Security Isolation

Compute Node: GPGPU, CPU Intense, Graphic Virtualized, Memory Intense

Storage Node: Object Store, HPC NAS, Distributed File System, HPC Block

Page 14

Converged HPC & Big Data

Page 15

Builds A Leading Computing Platform With NVIDIA

Page 16

Copyright © 2016 Huawei Technologies Co., Ltd. All Rights Reserved.

The information in this document may contain predictive statements including, without limitation, statements regarding the future financial and operating results, future product portfolio, new technology, etc. There are a number of factors that could cause actual results and developments to differ materially from those expressed or implied in the predictive statements. Therefore, such information is provided for reference purpose only and constitutes neither an offer nor an acceptance. Huawei may change the information at any time without notice.

THANK YOU

HPC Solutions

Page 17

HUAWEI HPC Momentum

- Manufacturing CAE/CFD
- Education/Research/Supercomputing
- Chip Design & Manufacturing
- Oil & Gas Exploration
- Energy Production & Distribution
- Digital Media

Page 18

Industrial CAE Simulation

Vibration and noise, Crash & safety, Indoor acoustics, Static strength

NVH, Electromagnetics, CFD

Physical component, Computing model, Result obtaining, Verification analysis

Processing before modeling, Processing after analysis, Computing resolution

Size model, Computing analysis

Design, Sample, Verification, Design, Sample, Verification, Product… Planning

Computational fluid dynamics (CFD), Structural mechanics, Electromagnetic simulation, System engineering

General development process disadvantages

- Long development period
- High design cost
- Weak process control

Customers' main requirements

- Short development cycle
- Low cost
- Intuitive analysis and controllable process

Industrial HPC cluster highlights

- Application integration and optimization
- Operation cost reduction
- Large-scale cluster management

Page 19

Huawei CAE Simulation Solution

Industrial simulation application scenarios

- Crash simulation test
- Fluid mechanics analysis
- Structure analysis simulation

Applications

- LS-DYNA, PAM-CRASH
- FLUENT, STAR-CCM+
- ABAQUS, NASTRAN

Hardware platforms

- Computing: X6800 & E9000
- Network: IB EDR
- Computing: 8100 V3 & KunLun
- Storage: OceanStor V3

Cluster capabilities

- High parallelism
- 100 Gbit/s
- 24 TB large memory capacity
- 400 GB/s high storage bandwidth

Application optimization and centralized management: Bright Computing, PARATERA, IBM Platform, Altair

Application optimization

- Star-CCM+ test, performance up 30%
- PAM-Crash test, performance up 10%

Energy saving

- Cabinet- and board-level liquid cooling, PUE ≤ 1.1
- 45ºC warm water cooling, lower power consumption for heat exchange

Converged management

- Software license and job scheduling, unified management
- Hardware platform fault diagnostics, high reliability

Page 20

Open/Cooperative HPC Ecosystem

Hardware

Application

Local partners

Software

Page 21

CAE Customer Success Stories

Volkswagen Builds Immersive Car Crash Simulation Test Platform with Huawei HPC

Saves 50% of design costs and shortens the product development cycle from 3 months to 1 week.

HiSilicon Builds Chip Simulation Cloud Platform with Huawei HPC

Increases computing capability from 1 million grids to 10 million grids, improving computing efficiency by 5x.

GlobalFoundries Builds Chip Simulation Platform with Huawei HPC

Shortens chip design computing simulation time from 1 day to 1 hour.

Daimler Mercedes-Benz Builds Core Vehicle R&D Capabilities with Huawei HPC

Improves simulation efficiency by 50% and reduces power consumption by 10%.

Page 22

HPC Computing Drives R&D

Hardware platform, Application optimization, Benefits

X6800, E9000, KunLun

- 149 TFLOPS/cabinet, 64 CPUs per cabinet
- 100G network, proprietary EDR switching technology
- 100 TFLOPS-level CPU computing capability
- 21.1 TFLOPS/chassis, 8 GPUs per chassis
- 50% density increase, 1U 4-socket ultra-high computing density
- 10 TFLOPS-level heterogeneous computing capability
- 24 TB in-memory capacity per node
- 2084 GB/s memory data bandwidth per node
- Fat node in-memory computing

- Animation rendering and production: 3DMax, Maya, Softimage
- Weather forecasting, environment monitoring, aviation simulation: WRF, MM5, CMAQ, CAMs
- Gene sequencing and molecular motion simulation: BLAST, FASTA, Gromacs, NAMD

Page 23

Fast Massive Data Transmission

Hardware platform, Application optimization, Benefits

Compute node cluster

Storage node cluster: OceanStor 9000

IB/10GE/GE

- 400 GB/s massive data bandwidth
- 100 GB file system, biggest in the industry

Huawei massive data storage solution

- Smart teaching management system
- Electronic reading library storage system
- Periodical and paper storage system

- 200 GB/s bandwidth
- 50 GB storage capacity per node
- 144 nodes per cluster

Common solution

3 to 288 nodes linear expansion

Page 24

University of Toronto

Up to 12 TB memory capacity per node, 5x the ecosystem modeling computing requirement, enough for long-term expansion

Shortens the 4D simulation computing result time for the integrated watershed-receiving waterbody model from 6 days to 4 hours
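The two claims on this slide reduce to simple ratios. The sketch below works them out from the slide's figures only; reading "5x the requirement" as capacity divided by requirement is an assumption, and the function and variable names are illustrative.

```python
def speedup(before_hours, after_hours):
    """How many times faster the new turnaround is."""
    return before_hours / after_hours

# 6 days -> 4 hours for the 4D simulation.
simulation_speedup = speedup(6 * 24, 4)

# 12 TB per node is stated to be 5x the modeling requirement,
# so the implied requirement is capacity / 5 (assumed reading).
implied_requirement_tb = 12 / 5

print(f"Simulation turnaround: {simulation_speedup:.0f}x faster")
print(f"Implied memory requirement: {implied_requirement_tb:.1f} TB per node")
```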

Page 25

Bibliotheca Alexandrina in Egypt

Deployment density improved 33%, and overall system energy efficiency improved 10%

NAMD application performance delivered in the cluster deployment test is 10% higher than that required by the customer

Page 26

Huawei Warm Water Cooling Solution

- Cooling system (including the primary loop)
- CDU system and cooling media
- Secondary loop between the CDU system and cooling cabinets
- Huawei FusionServer liquid cooling cabinets
- Air conditioning system

Cabinet- and board-level warm water cooling integrated delivery

Page 27

Huawei Warm Water Cooling Solution

- Integrated cooling loop component, low leakage risk
- Physical isolation of water flows from circuits, no short-circuit risks
- 217 system verification test items, high reliability

Warm water cooling reduces TCO by 30% compared with air cooling.

- Cooling PUE ≤ 1.1
- Up to 45ºC inlet water
- 80% warm water cooling

Air cooling TCO, Liquid cooling TCO, TCO ratio
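Power usage effectiveness (PUE) is the ratio of total facility power to IT equipment power, so a PUE of 1.1 leaves at most 10% of the IT load for cooling and other overhead. The sketch below illustrates that relationship; the 1000 kW IT load is a hypothetical value, not a figure from the slides, and all names are illustrative.

```python
def pue(it_power_kw, overhead_power_kw):
    """PUE = total facility power / IT equipment power."""
    if it_power_kw <= 0:
        raise ValueError("IT power must be positive")
    return (it_power_kw + overhead_power_kw) / it_power_kw

def max_overhead_kw(it_power_kw, pue_limit):
    """Largest non-IT power draw that still meets a PUE limit."""
    return it_power_kw * (pue_limit - 1.0)

IT_LOAD_KW = 1000.0  # hypothetical IT load, chosen for illustration only

budget = max_overhead_kw(IT_LOAD_KW, 1.1)
print(f"Overhead budget at PUE 1.1: {budget:.0f} kW")
print(f"PUE with that overhead: {pue(IT_LOAD_KW, budget):.2f}")
```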

Page 28

Poland PCSS

“Huawei’s liquid cooling HPC cluster helps PCSS significantly reduce hardware investments and TCO. This year, PCSS and Huawei will further their cooperation by building a joint innovation center to develop solutions covering computing, storage, and cluster architectures. The cooperation with Huawei has enabled PCSS to become one of the most competitive HPC service providers in Europe.”

Norbert Meyer, Manager of HPC & Data Department, PCSS

1.37 PFLOPS, PUE < 1.2, top 100 supercomputing center worldwide

Warm water cooling reduces electricity consumption by 3.26 million kWh per equipment room every year, lowering consumption by 40%+
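The savings figure can be turned into an implied baseline with one division. This is a rough sketch that assumes the 3.26 million kWh and the 40% both describe the same equipment room and the same consumption total, and it reads "40%+" as "at least 40%", so the result is an upper bound on the baseline. The names are illustrative.

```python
SAVED_KWH_PER_YEAR = 3.26e6   # from the slide
MIN_REDUCTION = 0.40          # "40%+" read as at least 40% (assumption)

def implied_baseline_kwh(saved_kwh, reduction_fraction):
    """Consumption before the saving, given the saving and its share."""
    return saved_kwh / reduction_fraction

baseline = implied_baseline_kwh(SAVED_KWH_PER_YEAR, MIN_REDUCTION)
remaining = baseline - SAVED_KWH_PER_YEAR

print(f"Baseline of at most {baseline / 1e6:.2f} million kWh per year")
print(f"Remaining consumption of at most {remaining / 1e6:.2f} million kWh per year")
```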
