DESCRIPTION
Presentation PL-4050, An Introduction to SPIR for OpenCL Application Developers and Compiler Developers, by Peter Zion, at the AMD Developer Summit (APU13), November 11-13, 2013.
TRANSCRIPT
HSA AND FABRIC ENGINE: A GAME CHANGER FOR DIGITAL CONTENT CREATION
PETER ZION, CHIEF ARCHITECT
FABRIC ENGINE INC.
2 | HSA AND FABRIC ENGINE: A GAME CHANGER FOR DIGITAL CONTENT CREATION | NOVEMBER 19, 2013 | CONFIDENTIAL
PERFORMANCE AND 3D
• Performance is very important for high-end 3D
  ‒ Simulations: particles, crowds, materials, hair
  ‒ Rendering: scene culling, subdivisions, path tracing
• Quality of 3D content is largely driven by available performance
PERFORMANCE AND 3D
• GPU came from 3D, but still mostly used for rendering in high-end 3D content creation
  ‒ GPU compute is domain of “ninja coders”
  ‒ Still often done through “shader hacks”!
• Need to democratize the GPU!
WHAT IS FABRIC ENGINE?
• A high-performance platform for building 3D applications, effects and tools.
  ‒ Optimized native code
  ‒ Parallelism
  ‒ High-end 3D for media and entertainment
• Applications can be standalone and/or embedded in DCCs (Maya, Softimage, 3DSMax, …)
WHAT IS FABRIC ENGINE?
• Fabric Engine SIGGRAPH 2013 teaser video: http://vimeo.com/70421665
WHAT IS FABRIC ENGINE?
• Applications are a combination of Python (or a DCC) and KL
  ‒ Python/DCC: UI, construction of 3D scenes
  ‒ KL: rendering, simulation, effects and data import/export
  ‒ Python/DCC drives execution of KL code
HORDE
• Horde: High-End Crowd Simulation
  ‒ Thousands of interacting characters
  ‒ Rigging (puppetry) of each character
  ‒ Behaviour of characters
  ‒ A typical Fabric Engine application
THE KL LANGUAGE
• Procedural
• JavaScript-like syntax
• Rich type system
  ‒ Integers, Booleans, Floats, Strings
  ‒ Fixed- and variable-size arrays; dictionaries
  ‒ Structures and Objects
• Pointer-free
THE KL LANGUAGE
• A simple language
  ‒ Accessible to “technical artists”
THE KL LANGUAGE
• KL is built on LLVM
  ‒ Targets many platforms
  ‒ Rich optimizations
  ‒ Amazing API
• KL was originally designed with only CPUs in mind
  ‒ Can it target the GPU?
SUPPORTING HSA GPUS
• Goals
  ‒ Allow most KL code to run without modification on HSA GPUs
  ‒ Allow KL code on CPU to perform a parallel evaluation of other KL code on GPU
  ‒ Make memory management as easy as possible
SUPPORTING HSA GPUS
• Video demo of a water simulation running on HSA inside Maya
SUPPORTING HSA GPUS
• Challenges
  ‒ KL runtime library is C++
  ‒ Multiple address spaces on GPUs
  ‒ KL is high-level
  ‒ Dynamic memory management
  ‒ Exceptions
  ‒ “Virtual functions”
STAGE ONE
• Goal: get compiler unit tests passing on GPU
• Convert KL runtime library to LLVM IR
• Support multiple address spaces
  ‒ Automatic regeneration of LLVM functions for correct address spaces
• Create HSA-based test harness
KL RUNTIME LIBRARY
• Originally, KL runtime library was written in C++
  ‒ Not GPU-compatible
• LLVM is very good at inlining
• Entire runtime library was converted into code that builds LLVM IR
  ‒ Effectively, runtime library is now dynamically compiled
  ‒ Very low level, e.g. conversion of float to string
MULTIPLE ADDRESS SPACES
• GPU differentiates between pointers to private, local and global memory
• Rewrote KL code generators to account for address spaces
  ‒ If same function is used with two different combinations of pointer type, function is generated twice
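The "generated twice" rule can be pictured as a cache keyed by the function name plus the address space of each pointer argument. The following is a minimal Python sketch of that idea, not Fabric Engine's code generator: the class name, the name-mangling scheme, the address-space numbering and the emitted LLVM-IR-like text are all assumptions made for illustration.

```python
from typing import Dict, Tuple

# Hypothetical numbering; real targets define their own address-space IDs.
ADDRESS_SPACES = {"private": 0, "global": 1, "local": 3}


class FunctionSpecializer:
    """Emits one function body per distinct combination of pointer address spaces."""

    def __init__(self) -> None:
        self._cache: Dict[Tuple[str, Tuple[str, ...]], str] = {}
        self.generated = 0  # how many distinct bodies were emitted

    def get(self, name: str, spaces: Tuple[str, str]) -> str:
        """Return the body for (name, spaces), generating it only on first use."""
        key = (name, spaces)
        if key not in self._cache:
            self._cache[key] = self._emit_copy(name, spaces)
            self.generated += 1
        return self._cache[key]

    @staticmethod
    def _emit_copy(name: str, spaces: Tuple[str, str]) -> str:
        # Illustrative LLVM-IR-like text for "copy one i32 from src to dst".
        dst, src = (ADDRESS_SPACES[s] for s in spaces)
        mangled = f"{name}_as{dst}_as{src}"  # hypothetical mangling scheme
        return (
            f"define void @{mangled}(i32 addrspace({dst})* %dst, "
            f"i32 addrspace({src})* %src) {{\n"
            f"  %v = load i32 addrspace({src})* %src\n"
            f"  store i32 %v, i32 addrspace({dst})* %dst\n"
            f"  ret void\n"
            f"}}\n"
        )
```

Requesting the same function with (global, private) and then with (local, private) pointers yields two bodies; requesting (global, private) again reuses the first.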
KL UNIT TESTS
• KL has a rich set of unit tests (~400 tests)
• GPU test harness was easy to write
  ‒ HSA runtime API
  ‒ Pass LLVM IR to AMD compiler library in place of OpenCL
  ‒ Simulate a heap and “printf”
• A few HSA-related problems in our code
  ‒ Alignment, global initialization, intrinsics
STAGE ONE RESULTS
• Vast majority of KL unit tests pass on HSA
  ‒ Failures are very isolated
    ‒ e.g. unsupported transcendentals
  ‒ LLVM IR -> HSAIL path in AMD compiler library is stable
STAGE TWO
• Goal: support trampoline from CPU to GPU
  ‒ Meaning: GPU kernel execution from KL code running on CPU
  ‒ GPU-enable parallel execute (PEX) operation
• Use OpenGL interop for direct rendering
PARALLEL EXECUTE (PEX) OPERATION
• KL parallel PEX primitive adapted for GPU execution
  ‒ Simple one-dimensional parallel call
  ‒ Decision to run on GPU made at runtime
PARALLEL EXECUTE (PEX) OPERATION
operator gpuKernel<<<index>>>(MyStruct myStruct) {
  report("[" + index + "]: myStruct=" + myStruct);
}

operator cpuKernel() {
  UInt32 count = 4096;
  Boolean useGPU = true;
  MyStruct myStruct;
  // Execute kernel 4096 times on GPU
  gpuKernel<<<count@useGPU>>>(myStruct);
}
PARALLEL EXECUTE (PEX) OPERATION
• KL parallel PEX primitive adapted for GPU execution
  ‒ Compiles KL code to GPU kernel (if not cached)
  ‒ Creates “trampoline” from CPU to HSA in CPU code
  ‒ Passes arguments to kernel
    ‒ Direct values or pointers to shared memory
  ‒ Calls HsaSubmitAql
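The steps on this slide (compile on first use, cache, pass arguments, submit) together with the runtime CPU/GPU choice can be simulated in a few lines of Python. This is only a sketch under stated assumptions: `PexDispatcher`, `_compile_for_gpu` and `_submit_to_gpu` are hypothetical names, the "GPU" path is an ordinary loop standing in for a real HSA submission, and no HSA runtime call is made.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Tuple


class PexDispatcher:
    """Simulated one-dimensional parallel execute with a runtime CPU/GPU choice."""

    def __init__(self) -> None:
        self._gpu_cache: Dict[str, Callable[..., Any]] = {}
        self.compile_count = 0   # number of simulated GPU compilations
        self.gpu_dispatches = 0  # number of simulated GPU submissions

    def _compile_for_gpu(self, kernel: Callable[..., Any]) -> Callable[..., Any]:
        # Stand-in for compiling KL code to a GPU kernel; the Python
        # callable is simply reused here.
        self.compile_count += 1
        return kernel

    def _submit_to_gpu(self, compiled: Callable[..., Any], count: int,
                       args: Tuple[Any, ...]) -> None:
        # Stand-in for the trampoline: pass arguments and submit the dispatch.
        # A real implementation would hand this to the HSA runtime instead.
        self.gpu_dispatches += 1
        for index in range(count):
            compiled(index, *args)

    def execute(self, kernel: Callable[..., Any], count: int,
                use_gpu: bool, *args: Any) -> None:
        """Run kernel(index, *args) for index in [0, count)."""
        if use_gpu:
            key = kernel.__name__
            if key not in self._gpu_cache:  # compile only if not cached
                self._gpu_cache[key] = self._compile_for_gpu(kernel)
            self._submit_to_gpu(self._gpu_cache[key], count, args)
        else:
            with ThreadPoolExecutor() as pool:
                # list() forces completion and re-raises any kernel exception.
                list(pool.map(lambda index: kernel(index, *args), range(count)))


def square(index: int, out: list) -> None:
    """Example kernel: each index writes only its own slot."""
    out[index] = index * index
```

Calling `PexDispatcher().execute(square, 8, True, out)` twice compiles once and dispatches twice; passing `False` runs the same kernel on a CPU thread pool instead.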
MEMORY REGISTRATION
• HSA runtime: All memory shared between CPU and HSA must be registered
  ‒ HsaRegisterSystemMemory
  ‒ For dynamic memory, this is easy
    ‒ HSA runtime provides a heap!
  ‒ What about variables allocated on CPU stack?
MEMORY REGISTRATION
operator cpuCode() {
  UInt32 count = 4096;
  Boolean useGPU = true;
  MyStruct myStructOnStack;
  // Execute kernel 4096 times on GPU
  gpuKernel<<<count@useGPU>>>(myStructOnStack);
}
MEMORY REGISTRATION
• Solution: alternate stack
  ‒ Register stack for each CPU thread in HSA-registered memory
  ‒ Every call to KL code “trampolines” to registered stack
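Why an alternate stack solves the problem can be shown with simulated addresses: if the stack itself is carved out of a registered block, every local variable placed on it falls inside registered memory automatically. The Python sketch below is purely illustrative; `RegisteredMemory` and `AlternateStack` are hypothetical names, the addresses are plain integers, and nothing here calls the HSA runtime.

```python
from typing import List, Tuple


class RegisteredMemory:
    """Tracks address ranges that have been registered for sharing (simulated)."""

    def __init__(self) -> None:
        self._ranges: List[Tuple[int, int]] = []

    def register(self, base: int, size: int) -> None:
        self._ranges.append((base, base + size))

    def contains(self, address: int, size: int = 1) -> bool:
        """True if [address, address + size) lies inside one registered range."""
        return any(lo <= address and address + size <= hi
                   for lo, hi in self._ranges)


class AlternateStack:
    """A downward-growing stack carved out of a single registered block."""

    def __init__(self, registry: RegisteredMemory, base: int, size: int) -> None:
        registry.register(base, size)  # register the whole stack once, up front
        self._base = base
        self._top = base + size

    def alloca(self, size: int) -> int:
        """Reserve size bytes for a local variable and return its address."""
        if self._top - size < self._base:
            raise MemoryError("alternate stack overflow")
        self._top -= size
        return self._top
```

A local allocated with `AlternateStack.alloca` is always inside a registered range, whereas an address on an ordinary, unregistered stack is not.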
DYNAMIC MEMORY ALLOCATION
• KL supports dynamic allocation
  ‒ Internal to types (e.g. variable-length arrays, strings)
  ‒ HsaAllocateSystemMemory on CPU
  ‒ Well-known GPU allocation algorithms
    ‒ e.g. ScatterAlloc
  ‒ What about mixed allocation?
DYNAMIC MEMORY ALLOCATION
operator cpuKernel() {
  UInt32 a[][];
  a.resize(4096);              // alloc CPU mem
  for (Index i=0; i<4096; ++i)
    a[i].resize(i%32);         // alloc CPU mem
  gpuKernel<<<4096@true>>>(a);
  a.clear();                   // free GPU mem and CPU mem
}

operator gpuKernel<<<index>>>(UInt32 a[][]) {
  a[index].resize(index%64);   // free CPU mem, alloc GPU mem
}
DYNAMIC MEMORY ALLOCATION
• How to manage mixed allocation?
  ‒ Defer incompatible frees
  ‒ GPU kernels atomically append GPU pointers to be freed to a list
  ‒ CPU frees pointers when kernel finishes
  ‒ CPU can free GPU pointers
  ‒ Using either system atomics or a simple mutex
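The deferred-free scheme can be simulated with threads standing in for GPU work-items and integers standing in for pointers. This Python sketch uses the "simple mutex" variant mentioned on the slide; the names `DeferredFreeList` and `run_simulated_kernel` are hypothetical, and the heap is just a set of live "pointers".

```python
import threading
from typing import List, Set


class DeferredFreeList:
    """Mutex-protected list of pointers that must be freed by the other side."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pending: List[int] = []

    def defer(self, pointer: int) -> None:
        # Called from a work-item that cannot free this pointer itself.
        with self._lock:
            self._pending.append(pointer)

    def drain(self) -> List[int]:
        # Called by the CPU once the kernel has finished.
        with self._lock:
            pending, self._pending = self._pending, []
        return pending


def run_simulated_kernel(count: int, cpu_heap: Set[int],
                         deferred: DeferredFreeList) -> None:
    """Each work-item gives up block `index`; the frees happen after the kernel."""

    def work_item(index: int) -> None:
        deferred.defer(index)  # defer the incompatible free

    threads = [threading.Thread(target=work_item, args=(i,)) for i in range(count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # the "kernel" has finished

    for pointer in deferred.drain():  # CPU performs the deferred frees
        cpu_heap.remove(pointer)
```

Starting from a heap of 16 live blocks, running a 16-item kernel leaves the heap empty and the deferred list drained; no block is freed while the kernel is still running.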
STAGE TWO RESULTS
• For command-line tests (e.g. naïve matrix multiplies): 5x-15x performance improvement
• For real-world tests (e.g. embedded in UI): up to 5x performance improvement
• 3D effects can be run in real-time
STAGE TWO RESULTS
• Paradigm shift for programmatic effects
  ‒ Technical artists can make run-time changes to GPU code and see the results in real-time
ONGOING WORK
• OpenGL interop
  ‒ Tag KL arrays as bound to VBOs
• GPU-to-GPU PEX
• Virtual functions on GPU
• Debugger for GPU