kloeckner slides
TRANSCRIPT
-
8/17/2019 Kloeckner Slides
1/88
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
GPU Metaprogramming using PyCUDA:Methods & Applications
Andreas Klöckner
Division of Applied MathematicsBrown University
GPU @ BU · November 12, 2009
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
http://goforward/http://find/http://goback/
-
8/17/2019 Kloeckner Slides
2/88
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
Thanks
Tim Warburton (Rice)
Jan Hesthaven (Brown)Nicolas Pinto (MIT)PyCUDA contributorsNvidia Corporation
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
http://find/http://goback/
-
8/17/2019 Kloeckner Slides
3/88
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
Outline
1 Why GPU Scripting?
2 Scripting CUDA
3 GPU Run-Time Code Generation
4 DG on GPUs
5 Perspectives
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
http://find/
-
8/17/2019 Kloeckner Slides
4/88
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
Outline
1 Why GPU Scripting?Combining two Strong Tools
2 Scripting CUDA
3 GPU Run-Time Code Generation
4 DG on GPUs
5 Perspectives
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
http://find/
-
8/17/2019 Kloeckner Slides
5/88
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
Combining two Strong Tools
How are High-Performance Codes constructed?
“Traditional” Construction of High-Performance Codes:
C/C++/Fortran
Libraries“Alternative” Construction of High-Performance Codes:
Scripting for ‘brains’GPUs for ‘inner loops’
Play to the strengths of eachprogramming environment.
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
http://find/
-
8/17/2019 Kloeckner Slides
6/88
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
Combining two Strong Tools
Scripting: Means
A scripting language. . .is discoverable and interactive.
has comprehensive built-in functionality.manages resources automatically.is dynamically typed.works well for “gluing” lower-level blockstogether.
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
h
http://find/
-
8/17/2019 Kloeckner Slides
7/88
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
Combining two Strong Tools
Scripting: Interpreted, not Compiled
Program creation workow:
Edit
Compile
Link
Run
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Wh GPU S i ti ? S i ti CUDA GPU RTCG DG GPU P ti
http://find/
-
8/17/2019 Kloeckner Slides
8/88
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
Combining two Strong Tools
Scripting: Interpreted, not Compiled
Program creation workow:
Edit
Compile
Link
Run
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
http://find/
-
8/17/2019 Kloeckner Slides
9/88
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
Combining two Strong Tools
Scripting: Interpreted, not Compiled
Program creation workow:
Edit
Compile
Link
Run
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
http://find/http://goback/
-
8/17/2019 Kloeckner Slides
10/88
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
-
8/17/2019 Kloeckner Slides
11/88
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
Combining two Strong Tools
Why do Scripting for GPUs?
GPUs are everything that scriptinglanguages are not.
Highly parallelVery architecture-sensitiveBuilt for maximum FP/memorythroughput
→ complement each otherCPU: largely restricted to control
tasks (∼1000/sec)Scripting fast enough
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
http://find/
-
8/17/2019 Kloeckner Slides
12/88
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
Combining two Strong Tools
Why do Scripting for GPUs?
GPUs are everything that scriptinglanguages are not.
Highly parallelVery architecture-sensitiveBuilt for maximum FP/memorythroughput
→ complement each otherCPU: largely restricted to control
tasks (∼1000/sec)Scripting fast enoughPython + CUDA = PyCUDA
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
http://find/
-
8/17/2019 Kloeckner Slides
13/88
W y G U Sc pt g? Sc pt g CUD G U CG DG o G Us e spect ves
Outline
1 Why GPU Scripting?
2 Scripting CUDA
PyCUDA in DetailDo More, Faster with PyCUDA
3 GPU Run-Time Code Generation
4 DG on GPUs
5 Perspectives
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
http://find/
-
8/17/2019 Kloeckner Slides
14/88
y p g p g p
PyCUDA in Detail
Whetting your appetite
1 import pycuda.driver as cuda2 import pycuda.autoinit3 import numpy45 a = numpy.random.randn(4,4).astype( numpy.oat32)6 a gpu = cuda.mem alloc(a.nbytes)7 cuda.memcpy htod(a gpu, a)
[This is examples/demo.py in the PyCUDA distribution.]
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
http://find/http://goback/
-
8/17/2019 Kloeckner Slides
15/88
PyCUDA in Detail
Whetting your appetite
9 mod = cuda.SourceModule(”””10 global void twice( oat ∗a)11 {12 int idx = threadIdx.x + threadIdx.y∗4;13 a[idx] ∗= 2;
14 }15 ”””)1617 func = mod.get function( ”twice”)18 func(a gpu, block=(4,4,1))1920 a doubled = numpy.empty like(a)21 cuda.memcpy dtoh(a doubled, a gpu)22 print a doubled23 print a
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
http://find/
-
8/17/2019 Kloeckner Slides
16/88
PyCUDA in Detail
Whetting your appetite
9 mod = cuda.SourceModule(”””10 global void twice( oat ∗a)11 {12 int idx = threadIdx.x + threadIdx.y∗4;13 a[idx] ∗= 2;
14 }15 ”””)1617 func = mod.get function( ”twice”)18 func(a gpu, block=(4,4,1))1920 a doubled = numpy.empty like(a)21 cuda.memcpy dtoh(a doubled, a gpu)22 print a doubled23 print a
Compute kernel
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
http://find/
-
8/17/2019 Kloeckner Slides
17/88
PyCUDA in Detail
Whetting your appetite, Part II
Did somebody say “Abstraction is good”?
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
http://find/
-
8/17/2019 Kloeckner Slides
18/88
PyCUDA in Detail
Whetting your appetite, Part II
1 import numpy2 import pycuda.autoinit3 import pycuda.gpuarray as gpuarray45 a gpu = gpuarray.to gpu(6 numpy.random.randn(4,4).astype( numpy.oat32))7 a doubled = (2∗a gpu).get()8 print a doubled
9 print a gpu
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
http://find/
-
8/17/2019 Kloeckner Slides
19/88
PyCUDA in Detail
PyCUDA Philosophy
Provide complete accessAutomatically manage resources
Provide abstractionsAllow interactive useCheck for and report errorsautomatically
Integrate tightly with numpy
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
http://find/
-
8/17/2019 Kloeckner Slides
20/88
PyCUDA in Detail
PyCUDA: Completeness
PyCUDA exposes all of CUDA.
For example:Arrays and TexturesPagelocked host memoryMemory transfers (asynchronous,structured)Streams and EventsDevice queriesGL Interop
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
http://find/
-
8/17/2019 Kloeckner Slides
21/88
PyCUDA in Detail
PyCUDA: Completeness
PyCUDA supports every OS thatCUDA supports.
LinuxWindowsOS X
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
http://find/
-
8/17/2019 Kloeckner Slides
22/88
PyCUDA in Detail
PyCUDA: Workow
Edit
Run
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
http://find/http://goback/
-
8/17/2019 Kloeckner Slides
23/88
PyCUDA in Detail
PyCUDA: Workow
Edit
Run
SourceModule("...")
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
http://find/
-
8/17/2019 Kloeckner Slides
24/88
PyCUDA in Detail
PyCUDA: Workow
Edit
PyCUDA
Run
SourceModule("...")
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
http://find/
-
8/17/2019 Kloeckner Slides
25/88
PyCUDA in Detail
PyCUDA: Workow
Edit
PyCUDA
Run
SourceModule("...")
Cache?
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
http://find/
-
8/17/2019 Kloeckner Slides
26/88
PyCUDA in Detail
PyCUDA: Workow
Edit
PyCUDA
Run
SourceModule("...")
Cache?
nvcc
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
http://find/http://goback/
-
8/17/2019 Kloeckner Slides
27/88
PyCUDA in Detail
PyCUDA: Workow
Edit
PyCUDA
Run
SourceModule("...")
Cache?
nvcc .cubin
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
http://find/http://goback/
-
8/17/2019 Kloeckner Slides
28/88
PyCUDA in Detail
PyCUDA: Workow
Edit
PyCUDA
Run
SourceModule("...")
Cache!
nvcc .cubin
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
P CUDA i D il
http://find/
-
8/17/2019 Kloeckner Slides
29/88
PyCUDA in Detail
PyCUDA: Workow
Edit
PyCUDA
Run
SourceModule("...")
Cache!
nvcc .cubin
Upload to GPU
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
P CUDA i D t il
http://find/
-
8/17/2019 Kloeckner Slides
30/88
PyCUDA in Detail
PyCUDA: Workow
Edit
PyCUDA
Run
SourceModule("...")
Cache!
nvcc .cubin
Upload to GPU
Run on GPU
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
PyCUDA in Detail
http://find/http://goback/
-
8/17/2019 Kloeckner Slides
31/88
PyCUDA in Detail
gpuarray : Simple Linear Algebra
pycuda.gpuarray :Meant to look and feel just like numpy.
gpuarray.to gpu(numpy array)
numpy array = gpuarray.get()
+, -, ∗, /, ll, sin, exp, rand,basic indexing, norm, inner product, . . .Mixed types (int32 + oat32 = oat64)print gpuarray for debugging.
Allows access to raw bitsUse as kernel arguments, textures, etc.
Andreas Klöckner Applied Math ·
Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
Do More Faster with PyCUDA
http://find/
-
8/17/2019 Kloeckner Slides
32/88
Do More, Faster with PyCUDA
gpuarray : Elementwise expressions
Avoiding extra store-fetch cycles for elementwise math:from pycuda.curandom import rand as curanda gpu = curand((50,))b gpu = curand((50,))
from pycuda.elementwise import ElementwiseKernellin comb = ElementwiseKernel(
” oat a, oat ∗x, oat b, oat ∗y, oat ∗z”,”z[ i ] = a∗x[i] + b∗y[i]”)
c gpu = gpuarray.empty like(a gpu)lin comb(5, a gpu, 6, b gpu, c gpu)
assert la .norm((c gpu − (5∗a gpu+6∗b gpu)).get()) < 1e− 5
Andreas Klöckner Applied Math ·
Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
Do More Faster with PyCUDA
http://find/
-
8/17/2019 Kloeckner Slides
33/88
Do More, Faster with PyCUDA
PyCUDA: Vital Information
http://mathema.tician.de/software/pycuda
Complete documentationX Consortium License(no warranty, free for all use)Requires: numpy, Boost C++,Python 2.4+.Support via mailing list.
Andreas Klöckner Applied Math ·
Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
http://mathema.tician.de/software/pycudahttp://mathema.tician.de/software/pycudahttp://mathema.tician.de/software/pycudahttp://mathema.tician.de/software/pycudahttp://find/
-
8/17/2019 Kloeckner Slides
34/88
Outline
1 Why GPU Scripting?
2 Scripting CUDA
3 GPU Run-Time Code GenerationPrograms that write Programs
4 DG on GPUs
5 Perspectives
Andreas Klöckner Applied Math ·
Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
Programs that write Programs
http://find/http://goback/
-
8/17/2019 Kloeckner Slides
35/88
g g
Metaprogramming
In GPU scripting,GPU code doesnot need to bea compile-time
constant.
Andreas Klöckner Applied Math ·
Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
http://find/http://goback/
-
8/17/2019 Kloeckner Slides
36/88
-
8/17/2019 Kloeckner Slides
37/88
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
Programs that write Programs
-
8/17/2019 Kloeckner Slides
38/88
Metaprogramming
Idea
Python Code
GPU Code
GPU Compiler
GPU Binary
GPU
Result
In GPU scripting,GPU code doesnot need to bea compile-time
constant.
(Key: Code is data–it wants to bereasoned about at run time)
Andreas Klöckner Applied Math ·
Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
Programs that write Programs
http://find/
-
8/17/2019 Kloeckner Slides
39/88
Metaprogramming
Idea
Python Code
GPU Code
GPU Compiler
GPU Binary
GPU
Result
Machine
In GPU scripting,GPU code doesnot need to bea compile-time
constant.
(Key: Code is data–it wants to bereasoned about at run time)
Andreas Klöckner Applied Math ·
Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
Programs that write Programs
http://find/
-
8/17/2019 Kloeckner Slides
40/88
Metaprogramming
Idea
Python Code
GPU Code
GPU Compiler
GPU Binary
GPU
Result
Human In GPU scripting,GPU code doesnot need to bea compile-time
constant.
(Key: Code is data–it wants to bereasoned about at run time)
Andreas Klöckner Applied Math ·
Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
Programs that write Programs
http://find/
-
8/17/2019 Kloeckner Slides
41/88
Metaprogramming
Idea
Python Code
GPU Code
GPU Compiler
GPU Binary
GPU
Result
In GPU scripting,GPU code doesnot need to bea compile-time
constant.
(Key: Code is data–it wants to bereasoned about at run time)
Good for codegeneration
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
Programs that write Programs
http://find/http://goback/
-
8/17/2019 Kloeckner Slides
42/88
Metaprogramming
Idea
Python Code
GPU Code
GPU Compiler
GPU Binary
GPU
Result
In GPU scripting,GPU code doesnot need to bea compile-time
constant.
(Key: Code is data–it wants to bereasoned about at run time)
Good for codegeneration
P yCUDA
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
Programs that write Programs
http://find/
-
8/17/2019 Kloeckner Slides
43/88
Machine-generated Code
Why machine-generate code?Automated Tuning(cf. ATLAS, FFTW)Data typesSpecialize code for given problemConstants faster than variables(→ register pressure)
Loop Unrolling
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
Programs that write Programs
http://find/
-
8/17/2019 Kloeckner Slides
44/88
RTCG via Templates
from jinja2 import Templatetpl = Template( ”””global void twice( {{ type name }} ∗tgt)
{int idx = threadIdx.x +
{{ thread block size }} ∗ {{block size }}∗ blockIdx .x;
{% for i in range( block size ) %}{% set offset = i∗ thread block size % }tgt[idx + {{ offset }}] ∗= 2;
{% endfor %}}””” )
rendered tpl = tpl . render(type name= ”oat” , block size =block size,thread block size =thread block size)
smod = SourceModule(rendered tpl)
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
Programs that write Programs
http://find/
-
8/17/2019 Kloeckner Slides
45/88
RTCG via AST Generation
from codepy.cgen import ∗from codepy.cgen. cuda import CudaGlobal
mod = Module([FunctionBody(
CudaGlobal(FunctionDeclaration(Value( ”void” , ”twice” ),arg decls =[Pointer(POD(dtype, ”tgt” ))])),
Block([Initializer (POD( numpy .int32, ”idx”),
”threadIdx.x + %d ∗blockIdx.x”% ( thread block size ∗block size )),
]+[
Assign( ”tgt[idx+%d]” % (o∗thread block size),”2 ∗tgt[idx+%d]” % (o∗thread block size))for o in range( block size )]))])
smod = SourceModule(mod)
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
http://find/
-
8/17/2019 Kloeckner Slides
46/88
Outline
1 Why GPU Scripting?
2 Scripting CUDA
3 GPU Run-Time Code Generation
4 DG on GPUsIntroduction
DG and MetaprogrammingResults
5 PerspectivesAndreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
Introduction
http://find/
-
8/17/2019 Kloeckner Slides
47/88
Discontinuous Galerkin Method
Let Ω := i Dk ⊂ R d .
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
Introduction
http://find/http://goback/
-
8/17/2019 Kloeckner Slides
48/88
Discontinuous Galerkin Method
Let Ω := i Dk ⊂ R d .GoalSolve a conservation law on Ω: u t + ∇ · F (u ) = 0
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs PerspectivesIntroduction
http://find/
-
8/17/2019 Kloeckner Slides
49/88
Discontinuous Galerkin Method
Let Ω := i Dk ⊂ R d .GoalSolve a conservation law on Ω: u t + ∇ · F (u ) = 0
Example
Maxwell’s Equations : EM eld: E (x , t ), H (x , t ) on Ω governed by
∂ t E − 1ε∇ × H = −
j ε , ∂ t H +
1µ∇ × E = 0 ,
∇ · E = ρε
, ∇ · H = 0 .
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs PerspectivesIntroduction
http://find/
-
8/17/2019 Kloeckner Slides
50/88
Discontinuous Galerkin Method
Multiply by test function, integrate by parts:
0 = ˆ Dk u t ϕ + [∇ · F (u )]ϕ dx =
ˆ Dk u t ϕ − F (u ) · ∇ϕ dx +
ˆ ∂ Dk (n̂ · F )∗ϕ dS x ,
Integrate by parts again, subsitute in basis functions, introduceelementwise differentiation and “lifting” matrices D , L:
∂ t u k
= − ν D ∂ ν ,k
[F (u k
)] + Lk
[n̂ · F − (n̂ · F )∗
]|A⊂∂ Dk .
For straight-sided simplicial elements:Reduce D ∂ ν and L to reference matrices.
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs PerspectivesDG and Metaprogramming
http://find/http://goback/
-
8/17/2019 Kloeckner Slides
51/88
Metaprogramming for GPU-DG
Specialize code for user-given problem:Flux Terms
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs PerspectivesDG and Metaprogramming
http://find/
-
8/17/2019 Kloeckner Slides
52/88
Metaprogramming for GPU-DG
Specialize code for user-given problem:Flux Terms
Automated Tuning:Memory layout
Loop slicingGather granularity
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs PerspectivesDG and Metaprogramming
http://find/
-
8/17/2019 Kloeckner Slides
53/88
Metaprogramming for GPU-DG
Specialize code for user-given problem:Flux Terms
Automated Tuning:Memory layout
Loop slicingGather granularityConstants instead of variables:
DimensionalityPolynomial degree
Element propertiesMatrix sizes
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs PerspectivesDG and Metaprogramming
http://find/
-
8/17/2019 Kloeckner Slides
54/88
Metaprogramming for GPU-DG
Specialize code for user-given problem:Flux Terms
Automated Tuning:Memory layout
Loop slicingGather granularityConstants instead of variables:
DimensionalityPolynomial degree
Element propertiesMatrix sizes
Loop Unrolling
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
DG and Metaprogramming
f
http://find/
-
8/17/2019 Kloeckner Slides
55/88
Metaprogramming for GPU-DG
Specialize code for user-given problem:Flux Terms (*)
Automated Tuning:Memory layout
Loop slicing (*)Gather granularityConstants instead of variables:
DimensionalityPolynomial degree
Element propertiesMatrix sizes
Loop Unrolling
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
DG and Metaprogramming
M i DG Fl T
http://find/http://goback/
-
8/17/2019 Kloeckner Slides
56/88
Metaprogramming DG: Flux Terms
0 = ˆ Dk u t ϕ + [∇ · F (u )]ϕ dx − ˆ ∂ Dk [n̂ · F − (n̂ · F )∗]ϕ dS x Flux term
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
DG and Metaprogramming
M i DG Fl T
http://find/http://goback/
-
8/17/2019 Kloeckner Slides
57/88
Metaprogramming DG: Flux Terms
0 = ˆ Dk u t ϕ + [∇ · F (u )]ϕ dx − ˆ ∂ Dk [n̂ · F − (n̂ · F )∗]ϕ dS x Flux term
Flux terms:vary by problemexpression specied by userevaluated pointwise
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
http://find/http://goback/
-
8/17/2019 Kloeckner Slides
58/88
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
DG and Metaprogramming
M t g i g DG Fl T E l
-
8/17/2019 Kloeckner Slides
59/88
Metaprogramming DG: Flux Terms Example
Example: Fluxes for Maxwell’s Equations
n̂ · (F − F ∗)E := 12
[n̂ × ( H − α n̂ × E )]
User writes: Vectorial statement in math. notation
ux = 1/2∗cross(normal, h. int − h.ext− alpha∗cross(normal, e. int − e.ext))
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
DG and Metaprogramming
Metaprogramming DG: Flux Terms Example
http://find/
-
8/17/2019 Kloeckner Slides
60/88
Metaprogramming DG: Flux Terms Example
Example: Fluxes for Maxwell’s Equations
n̂ · (F − F ∗)E := 12
[n̂ × ( H − α n̂ × E )]
We generate: Scalar evaluator in C (6× )
a ux += (((( val a eld5 − val b eld5 )∗fpair − > normal[2]− ( val a eld4 − val b eld4 )∗fpair − > normal[0])
+ val a eld0 − val b eld0 )∗fpair − > normal[0]− ((( val a eld4 − val b eld4 ) ∗fpair − > normal[1]
− ( val a eld1 − val b eld1 )∗fpair − > normal[2])+ val a eld3 − val b eld3 ) ∗ fpair− > normal[1]
)∗value type (0.5);
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
DG and Metaprogramming
Loop Slicing on the GPU: A Pattern
http://find/
-
8/17/2019 Kloeckner Slides
61/88
Loop Slicing on the GPU: A Pattern
Setting: N independent work units + preparationPreparation
Question: How should one assign work units to threads?
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
DG and Metaprogramming
Loop Slicing on the GPU: A Pattern
http://find/
-
8/17/2019 Kloeckner Slides
62/88
Loop Slicing on the GPU: A Pattern
Setting: N independent work units + preparationPreparation
Question: How should one assign work units to threads?
w s : in sequenceThread
t
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
DG and Metaprogramming
Loop Slicing on the GPU: A Pattern
http://find/
-
8/17/2019 Kloeckner Slides
63/88
Loop Slicing on the GPU: A Pattern
Setting: N independent work units + preparationPreparation
Question: How should one assign work units to threads?
w s : in sequenceThread
t
w p : in parallelThread
t
Andreas Klöckner Applied Math · Brown UniversityGPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
DG and Metaprogramming
Loop Slicing on the GPU: A Pattern
http://find/
-
8/17/2019 Kloeckner Slides
64/88
Loop Slicing on the GPU: A Pattern
Setting: N independent work units + preparationPreparation
Question: How should one assign work units to threads?
w s : in sequenceThread
t
w i : “inline-parallel”Thread
t
w p : in parallelThread
t
Andreas Klöckner Applied Math · Brown University
GPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
DG and Metaprogramming
Loop Slicing on the GPU: A Pattern
http://find/
-
8/17/2019 Kloeckner Slides
65/88
Loop Slicing on the GPU: A Pattern
Setting: N independent work units + preparationPreparation
Question: How should one assign work units to threads?
w s : in sequenceThread
t
w i : “inline-parallel”Thread
t
w p : in parallelThread
t
(amortize preparation)Andreas Klöckner Applied Math · Brown University
GPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
DG and Metaprogramming
Loop Slicing on the GPU: A Pattern
http://find/
-
8/17/2019 Kloeckner Slides
66/88
Loop Slicing on the GPU: A Pattern
Setting: N independent work units + preparationPreparation
Question: How should one assign work units to threads?
w s : in sequenceThread
t
w i : “inline-parallel”Thread
t
w p : in parallelThread
t
(amortize preparation) (exploit register sp ace)Andreas Klöckner Applied Math · Brown University
GPU Metaprogramming using PyCUDA: Methods & Applications
http://find/
-
8/17/2019 Kloeckner Slides
67/88
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
Results
Nvidia GTX280 vs. single core of Intel Core 2 Duo E8400
-
8/17/2019 Kloeckner Slides
68/88
Nvidia GTX280 vs. single core of Intel Core 2 Duo E8400
2 4 6 8
P o l y n o m i a l O r d e r N
0
5 0
1 0 0
1 5 0
2 0 0
2 5 0
3 0 0
G
F
l
o
p
s
/
s
G P U
C P U
2 4 6 8
1 0
2 0
3 0
4 0
5 0
6 0
7 0
S
p
e
e
d
u
p
F
a
c
t
o
r
S p e e d u p
Andreas Klöckner Applied Math · Brown University
GPU Metaprogramming using PyCUDA: Methods & Applications
http://find/
-
8/17/2019 Kloeckner Slides
69/88
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
Results
GPU DG Showcase
-
8/17/2019 Kloeckner Slides
70/88
Eletromagnetism
Andreas Klöckner Applied Math · Brown University
GPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
Results
GPU DG Showcase
http://find/
-
8/17/2019 Kloeckner Slides
71/88
Eletromagnetism
Poisson
Andreas Klöckner Applied Math · Brown University
GPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
Results
GPU DG Showcase
http://find/
-
8/17/2019 Kloeckner Slides
72/88
Eletromagnetism
Poisson
CFD
Andreas Klöckner Applied Math · Brown University
GPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
Results
GPU DG Showcase
http://find/
-
8/17/2019 Kloeckner Slides
73/88
Eletromagnetism
Poisson
CFDetc...
Andreas Klöckner Applied Math · Brown University
GPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
Outline
http://find/
-
8/17/2019 Kloeckner Slides
74/88
1 Why GPU Scripting?
2 Scripting CUDA
3 GPU Run-Time Code Generation
4 DG on GPUs
5 PerspectivesConclusions
Andreas Klöckner Applied Math · Brown University
GPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
Related Developments
Introducing. . . PyOpenCL
http://find/
-
8/17/2019 Kloeckner Slides
75/88
PyOpenCL is“PyCUDA for OpenCL”Complete, mature API wrapper
Features like PyCUDA: not yetTested on all availableImplementations, OSshttp://mathema.tician.de/
software/pyopencl
OpenCL
Andreas Klöckner Applied Math · Brown University
GPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
Related Developments
Automating GPU Programming
http://mathema.tician.de/software/pyopenclhttp://mathema.tician.de/software/pyopenclhttp://mathema.tician.de/software/pyopenclhttp://mathema.tician.de/software/pyopenclhttp://find/
-
8/17/2019 Kloeckner Slides
76/88
GPU programming can be time-consuming, unintuitive anderror-prone.
Obvious idea: Let the computer do it.
One way: Smart compilers
Andreas Klöckner Applied Math · Brown University
GPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
Related Developments
Automating GPU Programming
http://find/
-
8/17/2019 Kloeckner Slides
77/88
GPU programming can be time-consuming, unintuitive anderror-prone.
Obvious idea: Let the computer do it.
One way: Smart compilersGPU programming requires complex tradeoffsTradeoffs require heuristicsHeuristics are fragile
Andreas Klöckner Applied Math · Brown University
GPU Metaprogramming using PyCUDA: Methods & Applications
Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
Related Developments
Automating GPU Programming
http://find/http://goback/
-
8/17/2019 Kloeckner Slides
78/88
GPU programming can be time-consuming, unintuitive anderror-prone.
Obvious idea: Let the computer do it.
One way: Smart compilersGPU programming requires complex tradeoffsTradeoffs require heuristicsHeuristics are fragile
Another way: Dumb enumeration
Enumerate loop slicingsEnumerate prefetch optionsChoose by running resulting code on actual hardware
Andreas Klöckner Applied Math · Brown University
GPU Metaprogramming using PyCUDA: Methods & Applications Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
Related Developments
Loo.py Example
http://find/http://goback/
-
8/17/2019 Kloeckner Slides
79/88
Empirical GPU loop optimization:a , b , c , i , j , k = [var(s) for s in ”abcijk” ]n = 500k = make loop kernel([
LoopDimension( ”i” , n),LoopDimension( ”j” , n),LoopDimension( ”k” , n),
], [(c[i+n ∗ j], a[ i+n∗k]∗b[k+n ∗ j])])
gen kwargs = {”min threads” : 128,”min blocks” : 32,}
→ Ideal case: Finds 160 GF/s kernelwithout human intervention.
Andreas Klöckner Applied Math · Brown University
GPU Metaprogramming using PyCUDA: Methods & Applications Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
Related Developments
Loo.py Status
http://find/http://goback/
-
8/17/2019 Kloeckner Slides
80/88
Limited scope:Require input/output separationKernels must be expressible using“loopy” model(i.e. indices decompose into “output”and “reduction”)Enough for DG, LA, FD, . . .
Andreas Klöckner Applied Math · Brown University
GPU Metaprogramming using PyCUDA: Methods & Applications Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
Related Developments
Loo.py Status
http://find/
-
8/17/2019 Kloeckner Slides
81/88
Limited scope:Require input/output separationKernels must be expressible using“loopy” model(i.e. indices decompose into “output”and “reduction”)Enough for DG, LA, FD, . . .
Kernel compilation limits trial rateNon-Goal: Peak performance
Good results currently for dense linearalgebra and (some) DG subkernels
Andreas Klöckner Applied Math · Brown University
GPU Metaprogramming using PyCUDA: Methods & Applications Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
Conclusions
Conclusions
http://find/
-
8/17/2019 Kloeckner Slides
82/88
Fun time to be in computational science
Andreas Klöckner Applied Math · Brown University
GPU Metaprogramming using PyCUDA: Methods & Applications Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
Conclusions
Conclusions
http://find/
-
8/17/2019 Kloeckner Slides
83/88
Fun time to be in computational scienceUse Python and PyCUDA to have even more fun :-)
With no compromise in performance
Andreas Klöckner Applied Math · Brown University
GPU Metaprogramming using PyCUDA: Methods & Applications Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
Conclusions
Conclusions
http://find/
-
8/17/2019 Kloeckner Slides
84/88
Fun time to be in computational scienceUse Python and PyCUDA to have even more fun :-)
With no compromise in performance
GPUs and scripting work well togetherEnable Metaprogramming
Andreas Klöckner Applied Math · Brown University
GPU Metaprogramming using PyCUDA: Methods & Applications Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
Conclusions
Conclusions
http://find/
-
8/17/2019 Kloeckner Slides
85/88
Fun time to be in computational scienceUse Python and PyCUDA to have even more fun :-)
With no compromise in performance
GPUs and scripting work well togetherEnable MetaprogrammingFurther work in GPU-DG:
Other equations (Euler, Navier-Stokes)Curvilinear Elements
Local Time Stepping
Andreas Klöckner Applied Math · Brown University
GPU Metaprogramming using PyCUDA: Methods & Applications Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
Conclusions
Where to from here?
http://find/
-
8/17/2019 Kloeckner Slides
86/88
More at...→ http://mathema.tician.de/
CUDA-DG
AK, T. Warburton, J. Bridge, J.S. Hesthaven, “Nodal Discontinuous Galerkin Methods on Graphics Processors” ,J. Comp. Phys., 2009.
GPU RTCGAK, N. Pinto et al. PyCUDA: GPU Run-Time Code Generation for High-Performance Computing , in prep.
Andreas Klöckner Applied Math · Brown University
GPU Metaprogramming using PyCUDA: Methods & Applications Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
Conclusions
Questions?
http://mathema.tician.de/http://mathema.tician.de/http://find/http://goback/
-
8/17/2019 Kloeckner Slides
87/88
?
Thank you for your attention!
http://mathema.tician.de/
image credits
Andreas Klöckner Applied Math · Brown University
GPU Metaprogramming using PyCUDA: Methods & Applications Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives
Conclusions
Image Credits
http://mathema.tician.de/http://mathema.tician.de/http://find/
-
8/17/2019 Kloeckner Slides
88/88
Circuitry: ickr.com/oskayC870 GPU: Nvidia Corp.Old Books: ickr.com/ppdigitalOpenCL logo: Ars Technica/Apple Corp.OS Platforms: ickr.com/aOliN.Tk
Adding Machine: ickr.com/thomashawkFloppy disk: ickr.com/ethanheinMachine: ickr.com/13521837@N00OpenCL logo: Ars Technica/Apple Corp.
Andreas Klöckner Applied Math · Brown University
GPU Metaprogramming using PyCUDA: Methods & Applications
http://find/