pl-4050, an introduction to spir for opencl application developers and compiler developers, by ...

HSA AND FABRIC ENGINE: A GAME CHANGER FOR DIGITAL CONTENT CREATION PETER ZION CHIEF ARCHITECT FABRIC ENGINE INC.

Upload: amd-developer-central

Post on 12-Jan-2015


DESCRIPTION

Presentation PL-4050, An Introduction to SPIR for OpenCL Application Developers and Compiler Developers, by Peter Zion, at the AMD Developer Summit (APU13) November 11-13, 2013.

TRANSCRIPT

Page 1: PL-4050, An Introduction to SPIR for OpenCL Application Developers and Compiler Developers, by  Peter Zion

HSA AND FABRIC ENGINE: A GAME CHANGER FOR DIGITAL CONTENT CREATION

PETER ZION, CHIEF ARCHITECT

FABRIC ENGINE INC.

Page 2:

2 | HSA AND FABRIC ENGINE: A GAME CHANGER FOR DIGITAL CONTENT CREATION | NOVEMBER 19, 2013 | CONFIDENTIAL

PERFORMANCE AND 3D

• Performance is very important for high-end 3D
  ‒ Simulations: particles, crowds, materials, hair
  ‒ Rendering: scene culling, subdivisions, path tracing
• Quality of 3D content is largely driven by available performance

Page 3:

PERFORMANCE AND 3D

• GPU came from 3D, but still mostly used for rendering in high-end 3D content creation
  ‒ GPU compute is domain of “ninja coders”
  ‒ Still often done through “shader hacks”!
• Need to democratize the GPU!

Page 4:

WHAT IS FABRIC ENGINE?

• A high-performance platform for building 3D applications, effects and tools.
  ‒ Optimized native code
  ‒ Parallelism
  ‒ High-end 3D for media and entertainment
• Applications can be standalone and/or embedded in DCCs (Maya, Softimage, 3DSMax, …)

Page 5:

WHAT IS FABRIC ENGINE?

• Fabric Engine SIGGRAPH 2013 teaser video: http://vimeo.com/70421665

Page 6:

WHAT IS FABRIC ENGINE?

• Applications are a combination of Python (or a DCC) and KL
  ‒ Python/DCC: UI, construction of 3D scenes
  ‒ KL: rendering, simulation, effects and data import/export
  ‒ Python/DCC drives execution of KL code

Page 7:

HORDE

• Horde: High-End Crowd Simulation
  ‒ Thousands of interacting characters
  ‒ Rigging (puppetry) of each character
  ‒ Behaviour of characters
  ‒ A typical Fabric Engine application

Page 8:

THE KL LANGUAGE

• Procedural
• JavaScript-like syntax
• Rich type system
  ‒ Integers, Booleans, Floats, Strings
  ‒ Fixed- and variable-size arrays; dictionaries
  ‒ Structures and Objects
• Pointer-free

Page 9:

THE KL LANGUAGE

• A simple language
  ‒ Accessible to “technical artists”

Page 10:

THE KL LANGUAGE

• KL is built on LLVM
  ‒ Targets many platforms
  ‒ Rich optimizations
  ‒ Amazing API
• KL was originally designed with only CPUs in mind
  ‒ Can it target the GPU?

Page 11:

SUPPORTING HSA GPUS

• Goals
  ‒ Allow most KL code to run without modification on HSA GPUs
  ‒ Allow KL code on CPU to perform a parallel evaluation of other KL code on GPU
  ‒ Make memory management as easy as possible

Page 12:

SUPPORTING HSA GPUS

• Video demo: water simulation running on HSA, integrated inside Maya

Page 13:

SUPPORTING HSA GPUS

• Challenges
  ‒ KL runtime library is C++
  ‒ Multiple address spaces on GPUs
  ‒ KL is high-level
  ‒ Dynamic memory management
  ‒ Exceptions
  ‒ “Virtual functions”

Page 14:

STAGE ONE

• Goal: get compiler unit tests passing on GPU
• Convert KL runtime library to LLVM IR
• Support multiple address spaces
  ‒ Automatic regeneration of LLVM functions for correct address spaces
• Create HSA-based test harness

Page 15:

KL RUNTIME LIBRARY

• Originally, KL runtime library was written in C++
  ‒ Not GPU-compatible
• LLVM is very good at inlining
• Entire runtime library was converted into code that builds LLVM IR
  ‒ Effectively, runtime library is now dynamically compiled
  ‒ Very low level, e.g. conversion of float to string
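
As a rough illustration of what "code that builds IR" means, the sketch below uses a toy builder. `TinyIRBuilder`, `build_clamp` and the pseudo-instruction tuples are all hypothetical names invented for this example; this is not the LLVM API and not Fabric Engine's runtime. It only shows how a helper written as a builder function ends up emitted directly into the caller's body, with no call instruction left behind.

```python
from typing import List, Tuple


class TinyIRBuilder:
    """Hypothetical miniature IR builder; NOT the LLVM API."""

    def __init__(self) -> None:
        self.instrs: List[Tuple[str, ...]] = []
        self._next_tmp = 0

    def emit(self, op: str, *operands: str) -> str:
        # Record one pseudo-instruction and return the name of its result.
        self._next_tmp += 1
        dest = f"%t{self._next_tmp}"
        self.instrs.append((dest, op) + operands)
        return dest


def build_clamp(b: TinyIRBuilder, x: str, lo: str, hi: str) -> str:
    # A "runtime library" helper written as code that builds IR.
    # Its instructions are emitted straight into the caller's body.
    return b.emit("min", b.emit("max", x, lo), hi)


def build_user_function(b: TinyIRBuilder) -> None:
    # User code: load a value, clamp it via the helper, store it back.
    value = b.emit("load", "%arg0")
    clamped = build_clamp(b, value, "0.0", "1.0")
    b.emit("store", clamped, "%arg0")


builder = TinyIRBuilder()
build_user_function(builder)
ops = [instr[1] for instr in builder.instrs]
```

Because the helper's instructions are generated in place, the same helper can be re-emitted for whichever target is being compiled, which is one plausible reading of "dynamically compiled" on this slide.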

Page 16:

MULTIPLE ADDRESS SPACES

• GPU differentiates between pointers to private, local and global memory
• Rewrote KL code generators to account for address spaces
  ‒ If same function is used with two different combinations of pointer type, function is generated twice
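
The "generated twice" rule can be pictured as a cache keyed by the function name plus the address space of each pointer argument. The sketch below is an assumption about how such a cache might look; `FunctionSpecializer`, `copyVec3` and the mangled symbol names are hypothetical, and real code generation is replaced by building a string.

```python
from typing import Dict, Tuple

# Address-space labels taken from the slide's terminology.
PRIVATE, LOCAL, GLOBAL = "private", "local", "global"


class FunctionSpecializer:
    """Keeps one generated variant per combination of pointer address spaces."""

    def __init__(self) -> None:
        self._variants: Dict[Tuple[str, Tuple[str, ...]], str] = {}

    def get(self, name: str, arg_spaces: Tuple[str, ...]) -> str:
        key = (name, arg_spaces)
        if key not in self._variants:
            # Stand-in for real code generation: derive a unique symbol name.
            self._variants[key] = name + "__" + "_".join(arg_spaces)
        return self._variants[key]

    def variant_count(self, name: str) -> int:
        return sum(1 for (n, _spaces) in self._variants if n == name)


specializer = FunctionSpecializer()
a = specializer.get("copyVec3", (GLOBAL, PRIVATE))
b = specializer.get("copyVec3", (GLOBAL, GLOBAL))
c = specializer.get("copyVec3", (GLOBAL, PRIVATE))  # reuses the first variant
```

Two distinct combinations yield two variants, and a repeated combination is served from the cache rather than generated again.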

Page 17:

KL UNIT TESTS

• KL has a rich set of unit tests (~400 tests)
• GPU test harness was easy to write
  ‒ HSA runtime API
  ‒ Pass LLVM IR to AMD compiler library in place of OpenCL
  ‒ Simulate a heap and “printf”
• A few HSA-related problems in our code
  ‒ Alignment, global initialization, intrinsics

Page 18:

STAGE ONE RESULTS

• Vast majority of KL unit tests pass on HSA
  ‒ Failures are very isolated
    ‒ e.g. unsupported transcendentals
  ‒ LLVM IR -> HSAIL path in AMD compiler library is stable

Page 19:

STAGE TWO

• Goal: support trampoline from CPU to GPU
  ‒ Meaning: GPU kernel execution from KL code running on CPU
  ‒ GPU-enable parallel execute (PEX) operation
• Use OpenGL interop for direct rendering

Page 20:

PARALLEL EXECUTE (PEX) OPERATION

• KL parallel PEX primitive adapted for GPU execution
  ‒ Simple one-dimensional parallel call
  ‒ Decision to run on GPU made at runtime

Page 21:

PARALLEL EXECUTE (PEX) OPERATION

    operator gpuKernel<<<index>>>(MyStruct myStruct) {
      report("[" + index + "]: myStruct=" + myStruct);
    }

    operator cpuKernel() {
      UInt32 count = 4096;
      Boolean useGPU = true;
      MyStruct myStruct;
      // Execute kernel 4096 times on GPU
      gpuKernel<<<count@useGPU>>>(myStruct);
    }

Page 22:

PARALLEL EXECUTE (PEX) OPERATION

• KL parallel PEX primitive adapted for GPU execution
  ‒ Compiles KL code to GPU kernel (if not cached)
  ‒ Creates “trampoline” from CPU to HSA in CPU code
  ‒ Passes arguments to kernel
    ‒ Direct values or pointers to shared memory
  ‒ Calls HsaSubmitAql
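
The steps listed on this slide (compile if not cached, choose the target at runtime, pass arguments, dispatch) can be mimicked in a few lines. This is a simulation only: `PexRuntime` and its members are hypothetical names, "compilation" just stores the Python callable, and the dispatch loop runs sequentially where a real GPU would run the invocations in parallel. No HSA call is made.

```python
from typing import Callable, Dict, List


class PexRuntime:
    """Toy stand-in for a one-dimensional parallel-execute dispatcher."""

    def __init__(self, gpu_available: bool) -> None:
        self.gpu_available = gpu_available
        self._compiled: Dict[str, Callable] = {}
        self.compile_count = 0
        self.log: List[str] = []

    def _compile_for_gpu(self, kernel: Callable) -> Callable:
        # Compile only on first use; later launches hit the cache.
        name = kernel.__name__
        if name not in self._compiled:
            self.compile_count += 1
            self._compiled[name] = kernel  # placeholder for GPU code generation
        return self._compiled[name]

    def pex(self, kernel: Callable, count: int, use_gpu: bool, *args) -> None:
        # The CPU/GPU decision is taken at call time, not at compile time.
        if use_gpu and self.gpu_available:
            target = self._compile_for_gpu(kernel)
            self.log.append("gpu")
        else:
            target = kernel
            self.log.append("cpu")
        for index in range(count):  # sequential stand-in for a parallel launch
            target(index, *args)


def square(index: int, out: List[int]) -> None:
    out[index] = index * index


results = [0] * 8
rt = PexRuntime(gpu_available=True)
rt.pex(square, 8, True, results)   # first GPU launch: compiles
rt.pex(square, 8, True, results)   # second GPU launch: cached
rt.pex(square, 8, False, results)  # runtime flag selects the CPU path
```

The `results` list is passed by reference, loosely mirroring the slide's "pointers to shared memory" argument passing.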

Page 23:

MEMORY REGISTRATION

• HSA runtime: All memory shared between CPU and HSA must be registered
  ‒ HsaRegisterSystemMemory
  ‒ For dynamic memory, this is easy
    ‒ HSA runtime provides a heap!
  ‒ What about variables allocated on CPU stack?

Page 24:

MEMORY REGISTRATION

    operator cpuCode() {
      UInt32 count = 4096;
      Boolean useGPU = true;
      MyStruct myStructOnStack;
      // Execute kernel 4096 times on GPU
      kernel<<<count@useGPU>>>(myStructOnStack);
    }

Page 25:

MEMORY REGISTRATION

• Solution: alternate stack
  ‒ Register stack for each CPU thread in HSA-registered memory
  ‒ Every call to KL code “trampolines” to registered stack
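
One way to picture the alternate-stack idea is a bump allocator that lives entirely inside a registered address range, so every local placed on it is automatically shareable. The sketch below is a model under stated assumptions: `RegisteredRegion`, `AlternateStack` and `call_on_alternate_stack` are hypothetical names, addresses are plain integers, and the stack grows upward for simplicity, unlike a real CPU stack.

```python
class RegisteredRegion:
    """An address range assumed to be registered with the GPU runtime."""

    def __init__(self, base: int, size: int) -> None:
        self.base = base
        self.size = size

    def contains(self, addr: int, length: int) -> bool:
        return self.base <= addr and addr + length <= self.base + self.size


class AlternateStack:
    """Bump-allocated stack that lives entirely inside a registered region."""

    def __init__(self, region: RegisteredRegion) -> None:
        self.region = region
        self.top = region.base

    def push(self, size: int, align: int = 16) -> int:
        # Round the current top up to the requested alignment.
        addr = (self.top + align - 1) // align * align
        if addr + size > self.region.base + self.region.size:
            raise MemoryError("alternate stack overflow")
        self.top = addr + size
        return addr

    def pop_to(self, mark: int) -> None:
        self.top = mark


def call_on_alternate_stack(stack: AlternateStack, fn, *args):
    # "Trampoline": run fn with its locals on the registered stack,
    # then unwind to where the stack was before the call.
    mark = stack.top
    try:
        return fn(stack, *args)
    finally:
        stack.pop_to(mark)


region = RegisteredRegion(base=0x10000, size=4096)
stack = AlternateStack(region)


def cpu_code(stack: AlternateStack):
    addr = stack.push(64)  # plays the role of a stack-allocated struct
    return addr, region.contains(addr, 64)


addr, shareable = call_on_alternate_stack(stack, cpu_code)
```

Since the local's address falls inside the registered range, it could be handed to a kernel without a separate registration step, which is the point of the slide's solution.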

Page 26:

DYNAMIC MEMORY ALLOCATION

• KL supports dynamic allocation
  ‒ Internal to types (e.g. variable-length arrays, strings)
  ‒ HsaAllocateSystemMemory on CPU
  ‒ Well-known GPU allocation algorithms
    ‒ e.g. ScatterAlloc
  ‒ What about mixed allocation?

Page 27:

DYNAMIC MEMORY ALLOCATION

    operator cpuKernel() {
      UInt32 a[][];
      a.resize(4096); // alloc CPU mem
      for (Index i=0; i<4096; ++i)
        a[i].resize(i%32); // alloc CPU mem
      gpuKernel<<<4096@true>>>(a);
      a.clear(); // free GPU mem and CPU mem
    }

    operator gpuKernel<<<index>>>(UInt32 a[][]) {
      a[index].resize(index%64); // free CPU mem, alloc GPU mem
    }

Page 28:

DYNAMIC MEMORY ALLOCATION

• How to manage mixed allocation?
  ‒ Defer incompatible frees
  ‒ GPU kernels atomically append GPU pointers to be freed to a list
  ‒ CPU frees pointers when kernel finishes
  ‒ CPU can free GPU pointers
  ‒ Using either system atomics or a simple mutex
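
The deferred-free scheme can be simulated with threads standing in for kernel invocations. Everything here is an assumption made for illustration: `Heap`, `DeferredFreeList` and `run_kernel` are hypothetical names, pointers are integers, and a `threading.Lock` plays the role of the "system atomics or a simple mutex" the slide mentions. The kernel side only appends pointers it cannot free itself; the host drains the list once the kernel has finished.

```python
import threading
from typing import Callable, List, Set


class Heap:
    """Toy allocator that hands out integer 'pointers'."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._next = 1
        self.live: Set[int] = set()

    def alloc(self) -> int:
        with self._lock:
            ptr = self._next
            self._next += 1
            self.live.add(ptr)
            return ptr

    def free(self, ptr: int) -> None:
        with self._lock:
            self.live.remove(ptr)


class DeferredFreeList:
    """Pointers a kernel could not free itself, queued for the host."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # stands in for atomics or a mutex
        self._pending: List[int] = []

    def defer(self, ptr: int) -> None:
        with self._lock:
            self._pending.append(ptr)

    def drain(self) -> List[int]:
        with self._lock:
            pending, self._pending = self._pending, []
        return pending


def run_kernel(count: int, body: Callable[[int], None]) -> None:
    # Threads simulate parallel kernel invocations; join() is "kernel finishes".
    threads = [threading.Thread(target=body, args=(i,)) for i in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


cpu_heap, gpu_heap = Heap(), Heap()
deferred = DeferredFreeList()
buffers = [cpu_heap.alloc() for _ in range(16)]  # allocated on the CPU


def resize_on_gpu(index: int) -> None:
    old = buffers[index]
    buffers[index] = gpu_heap.alloc()  # new storage from the GPU allocator
    deferred.defer(old)                # old CPU storage: free it later


run_kernel(16, resize_on_gpu)
gpu_live_after_kernel = len(gpu_heap.live)

for ptr in deferred.drain():           # host frees once the kernel is done
    cpu_heap.free(ptr)
cpu_live_after_drain = len(cpu_heap.live)

for ptr in buffers:                    # the CPU may free GPU pointers directly
    gpu_heap.free(ptr)
```

Draining by swapping the list under the lock keeps the critical section short and leaves the list empty for the next launch.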

Page 29:

STAGE TWO RESULTS

• For command-line tests (e.g. naïve matrix multiplies): 5x-15x performance improvement
• For real-world tests (e.g. embedded in UI): up to 5x performance improvement

• 3D effects can be run in real-time

Page 30:

STAGE TWO RESULTS

• Paradigm shift for programmatic effects
  ‒ Technical artists can make run-time changes to GPU code and see the results in real-time

Page 31:

ONGOING WORK

• OpenGL interop
  ‒ Tag KL arrays as bound to VBOs
• GPU-to-GPU PEX
• Virtual functions on GPU
• Debugger for GPU