Parallel Programming - cuvix.co.kr...phase/trend major constraints 2x efficient app runs…...

Parallel Programming

김명신, Technical Evangelist, Microsoft

First

Phase/Trend | Major Constraints | 2x Efficient App Runs…

(1950-90s) Compute-constrained | Processor | 2x compute speed, 2x users

(1995ish-2007ish) Surplus local compute + low UI innovation (e.g., 2nd party LOB client WIMP apps) | Programmer time | n/a

(200x-) Mobile + bigger experiences (e.g., tablet, ‘smartphone’) | Power (battery life), Processor | 2x battery life, 2x compute speed

(2009-) Cloud / datacenter (e.g., Office 365, Shazam, Siri) | Server HW (57%), Power (31%) * | 0.56x nodes, 0.56x power

(2009-) Heterogeneous cores (e.g., Cell, GPGPU) | Power (dark silicon), Processor | 0.5x power envelope, 2x compute speed

(2020ish-) Moore’s End | Processor | 2x compute speed forever

* http://perspectives.mvdirona.com/2010/09/18/OverallDataCenterCosts.aspx

WIMP: Windows, Icon, Menu, Pointing Device

Note: The final four are going to dominate for the rest of our careers.


Distributed: Telco network, Internet, DFS, Cluster computing, Grid computing

Parallel: Multi Processor, Multi Core, NUMA

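Overall speedup (Amdahl's law): $\dfrac{1}{(1 - P) + \dfrac{P}{S}}$

P : Parallel portion (fraction of the run time that is parallelized)

S : Speedup of that portion

When a 50% portion is made 2x faster:

$\dfrac{1}{(1 - 0.5) + \dfrac{0.5}{2}} = 1.333\ldots$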
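The speedup formula above can be checked numerically. The following is a minimal sketch; the function name `amdahl_speedup` and its parameter names are illustrative and do not come from the slides.

```cpp
#include <cassert>

// Overall speedup when a fraction p of the run time is made s times faster.
// amdahl_speedup is a hypothetical helper name chosen for this sketch.
double amdahl_speedup(double p, double s) {
    assert(p >= 0.0 && p <= 1.0 && s > 0.0);
    // The untouched part (1 - p) keeps its cost; the parallel part shrinks to p / s.
    return 1.0 / ((1.0 - p) + p / s);
}
```

Calling `amdahl_speedup(0.5, 2.0)` reproduces the 1.333… of the worked example. However large S becomes, the result for P = 0.5 stays below 1 / (1 - 0.5) = 2, which is why the size of the parallel portion matters more than the speedup of that portion.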

Performance Wizard, Concurrency Visualizer

How (Old Features)

Multithreaded Programming

OpenMP, PPL / TPL

How (VS2012 New Features)

Auto-Vectorization

Auto-Parallelization

C++ AMP

const int N = 1000;
float a[N], b[N];
// initialize a[i] = i, b[i] = 100 + i;
for (int i = 0; i < N; ++i)
    a[i] += b[i];

Enabled by default

Uses SSE instructions on Intel and NEON instructions on ARM

The SSE vector registers are named XMM0 to XMM15

Uses the SSE 4.2 instruction set if available

To disable vectorization for a loop:

#pragma loop(no_vector)
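As a rough illustration of the points above, the sketch below puts the slide's loop into two functions: one left to the auto-vectorizer and one that opts out with `#pragma loop(no_vector)`. The function names (`init`, `add_vectorizable`, `add_scalar`, `check`) are made up for this example, and the `_MSC_VER` guard is an added assumption so that other compilers do not see an unknown pragma.

```cpp
#include <cassert>

const int N = 1000;
float a[N], b[N];

// Fills the arrays the way the slide's comment describes.
void init() {
    for (int i = 0; i < N; ++i) {
        a[i] = static_cast<float>(i);
        b[i] = 100.0f + static_cast<float>(i);
    }
}

// Independent iterations over contiguous floats: a candidate for the
// auto-vectorizer, which is on by default.
void add_vectorizable() {
    for (int i = 0; i < N; ++i)
        a[i] += b[i];
}

// Same loop, but asks the Visual C++ compiler not to vectorize this one loop.
void add_scalar() {
#ifdef _MSC_VER
#pragma loop(no_vector)
#endif
    for (int i = 0; i < N; ++i)
        a[i] += b[i];
}

// Both versions compute the same values; only the generated code differs.
void check() {
    init();
    add_vectorizable();
    const float expected = a[N - 1];
    init();
    add_scalar();
    assert(a[N - 1] == expected);
}
```

The pragma applies only to the loop that immediately follows it, so the rest of the program keeps the default behavior.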

The compiler evaluates the code to find loops that might benefit from parallelization

Enable with the /Qpar compiler option

To request parallelization of a specific loop manually:

#pragma loop(hint_parallel(n))
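A minimal sketch of how the hint might be placed on a loop follows. The function `scale` and the thread count 4 are illustrative choices, not taken from the slides, and the `_MSC_VER` guard is an added assumption for portability. The pragma is only a hint and has an effect only when the code is built with /Qpar.

```cpp
#include <cassert>

// Multiplies n floats by k. Iterations are independent, so the loop is a
// candidate for the auto-parallelizer when building with /Qpar.
// scale is a hypothetical function name used only for this sketch.
void scale(float* data, int n, float k) {
    assert(data != nullptr || n == 0);
#ifdef _MSC_VER
    // Hint only: suggests splitting the loop across 4 threads (arbitrary value).
#pragma loop(hint_parallel(4))
#endif
    for (int i = 0; i < n; ++i)
        data[i] *= k;
}
```

Without /Qpar, or on another compiler, the function runs as an ordinary serial loop and returns the same result.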

C++ AMP: Accelerated Massive Parallelism

C++, not C

Just one general language extension

Portable: mix & match hardware from any vendor, one exe

General and future-proof

Open specification
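To give a feel for the "one general language extension" (`restrict(amp)`), here is a hedged sketch of the earlier array addition written with C++ AMP. The function name `add_arrays` and the macro `HAVE_CPP_AMP` are hypothetical. The serial fallback is an assumption added so the sketch also builds on compilers without `<amp.h>`, and `_SILENCE_AMP_DEPRECATION_WARNINGS` is defined because newer Visual Studio releases mark the AMP headers as deprecated.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if defined(_MSC_VER) && !defined(__clang__)
#define _SILENCE_AMP_DEPRECATION_WARNINGS
#include <amp.h>
#define HAVE_CPP_AMP 1   // hypothetical macro, local to this sketch
#endif

// Adds b to a element by element. add_arrays is an illustrative name.
void add_arrays(std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    if (a.empty()) return;   // nothing to do; also avoids a zero-sized extent
#ifdef HAVE_CPP_AMP
    using namespace concurrency;
    const int n = static_cast<int>(a.size());
    array_view<float, 1> av(n, a.data());        // read/write view of a
    array_view<const float, 1> bv(n, b.data());  // read-only view of b
    // restrict(amp) marks the lambda as code that may run on the accelerator.
    parallel_for_each(av.extent, [=](index<1> i) restrict(amp) {
        av[i] += bv[i];
    });
    av.synchronize();   // copy the results back into a
#else
    // Fallback for compilers without C++ AMP: plain serial loop.
    for (std::size_t i = 0; i < a.size(); ++i)
        a[i] += b[i];
#endif
}
```

The body of the lambda is ordinary C++; only the `restrict(amp)` qualifier and the `array_view` wrappers distinguish it from the CPU loop.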
