kloeckner slides


  • 8/17/2019 Kloeckner Slides

    1/88

    Why GPU Scripting? Scripting CUDA GPU RTCG DG on GPUs Perspectives

    GPU Metaprogramming using PyCUDA: Methods & Applications

    Andreas Klöckner

    Division of Applied Mathematics
    Brown University

    GPU @ BU · November 12, 2009

    Andreas Klöckner · Applied Math · Brown University
    GPU Metaprogramming using PyCUDA: Methods & Applications

  • 8/17/2019 Kloeckner Slides

    2/88

    Thanks

    Tim Warburton (Rice)
    Jan Hesthaven (Brown)
    Nicolas Pinto (MIT)
    PyCUDA contributors
    Nvidia Corporation

  • 8/17/2019 Kloeckner Slides

    3/88

    Outline

    1 Why GPU Scripting?
    2 Scripting CUDA
    3 GPU Run-Time Code Generation
    4 DG on GPUs
    5 Perspectives

  • 8/17/2019 Kloeckner Slides

    4/88

    Outline

    1 Why GPU Scripting?
        Combining two Strong Tools
    2 Scripting CUDA
    3 GPU Run-Time Code Generation
    4 DG on GPUs
    5 Perspectives

  • 8/17/2019 Kloeckner Slides

    5/88

    Combining two Strong Tools

    How are High-Performance Codes constructed?

    “Traditional” Construction of High-Performance Codes:
        C/C++/Fortran
        Libraries
    “Alternative” Construction of High-Performance Codes:
        Scripting for ‘brains’
        GPUs for ‘inner loops’
    Play to the strengths of each programming environment.

  • 8/17/2019 Kloeckner Slides

    6/88

    Scripting: Means

    A scripting language...
        is discoverable and interactive.
        has comprehensive built-in functionality.
        manages resources automatically.
        is dynamically typed.
        works well for “gluing” lower-level blocks together.

  • 8/17/2019 Kloeckner Slides

    7/88

    Scripting: Interpreted, not Compiled

    Program creation workflow:
        Edit
        Compile
        Link
        Run


  • 8/17/2019 Kloeckner Slides

    11/88

    Why do Scripting for GPUs?

    GPUs are everything that scripting languages are not.
        Highly parallel
        Very architecture-sensitive
        Built for maximum FP/memory throughput
    → complement each other
    CPU: largely restricted to control tasks (∼1000/sec)
        Scripting fast enough
    Python + CUDA = PyCUDA

  • 8/17/2019 Kloeckner Slides

    13/88

    Outline

    1 Why GPU Scripting?
    2 Scripting CUDA
        PyCUDA in Detail
        Do More, Faster with PyCUDA
    3 GPU Run-Time Code Generation
    4 DG on GPUs
    5 Perspectives

  • 8/17/2019 Kloeckner Slides

    14/88

    PyCUDA in Detail

    Whetting your appetite

        import pycuda.driver as cuda
        import pycuda.autoinit
        import numpy

        a = numpy.random.randn(4,4).astype(numpy.float32)
        a_gpu = cuda.mem_alloc(a.nbytes)
        cuda.memcpy_htod(a_gpu, a)

    [This is examples/demo.py in the PyCUDA distribution.]

  • 8/17/2019 Kloeckner Slides

    15/88

    Whetting your appetite

        mod = cuda.SourceModule("""
            __global__ void twice(float *a)
            {
              int idx = threadIdx.x + threadIdx.y*4;
              a[idx] *= 2;
            }
            """)

        func = mod.get_function("twice")
        func(a_gpu, block=(4,4,1))

        a_doubled = numpy.empty_like(a)
        cuda.memcpy_dtoh(a_doubled, a_gpu)
        print a_doubled
        print a

    Compute kernel: the CUDA C function twice passed to SourceModule.
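The index arithmetic in the twice kernel can be checked without a GPU. The following is a minimal CPU-only sketch, assuming a single thread block of shape (4, 4, 1) as in the launch on the slide; the function name `twice_emulated` is hypothetical and not part of PyCUDA.

```python
# CPU-only emulation of the `twice` kernel's indexing (no GPU required).
# Assumption: one thread block of shape (4, 4, 1), as in the slide's launch.

def twice_emulated(a_flat, block=(4, 4, 1)):
    """Apply the kernel body once per (threadIdx.x, threadIdx.y) pair."""
    out = list(a_flat)
    for ty in range(block[1]):
        for tx in range(block[0]):
            idx = tx + ty * 4      # same formula as in the kernel
            out[idx] *= 2          # a[idx] *= 2
    return out

a = [float(v) for v in range(16)]   # stands in for the flattened 4x4 array
a_doubled = twice_emulated(a)
print(a_doubled[:4])
```

Each of the 16 emulated threads touches exactly one element, which is why the kernel needs no loop of its own.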

  • 8/17/2019 Kloeckner Slides

    17/88

    Whetting your appetite, Part II

    Did somebody say “Abstraction is good”?

  • 8/17/2019 Kloeckner Slides

    18/88

    Whetting your appetite, Part II

        import numpy
        import pycuda.autoinit
        import pycuda.gpuarray as gpuarray

        a_gpu = gpuarray.to_gpu(
            numpy.random.randn(4,4).astype(numpy.float32))
        a_doubled = (2*a_gpu).get()
        print a_doubled
        print a_gpu

  • 8/17/2019 Kloeckner Slides

    19/88

    PyCUDA Philosophy

    Provide complete access
    Automatically manage resources
    Provide abstractions
    Allow interactive use
    Check for and report errors automatically
    Integrate tightly with numpy

  • 8/17/2019 Kloeckner Slides

    20/88

    PyCUDA: Completeness

    PyCUDA exposes all of CUDA.

    For example:
        Arrays and Textures
        Pagelocked host memory
        Memory transfers (asynchronous, structured)
        Streams and Events
        Device queries
        GL Interop

  • 8/17/2019 Kloeckner Slides

    21/88

    PyCUDA: Completeness

    PyCUDA supports every OS that CUDA supports.
        Linux
        Windows
        OS X

  • 8/17/2019 Kloeckner Slides

    22/88

    PyCUDA: Workflow

    [Diagram labeled "PyCUDA"; boxes in order of appearance: Edit, Run, SourceModule("..."), Cache!, nvcc, .cubin, Upload to GPU, Run on GPU]

  • 8/17/2019 Kloeckner Slides

    31/88

    gpuarray: Simple Linear Algebra

    pycuda.gpuarray:
    Meant to look and feel just like numpy.

        gpuarray.to_gpu(numpy_array)
        numpy_array = gpuarray.get()
        +, -, *, /, fill, sin, exp, rand, basic indexing, norm, inner product, ...
        Mixed types (int32 + float32 = float64)
        print gpuarray for debugging.

    Allows access to raw bits
        Use as kernel arguments, textures, etc.

  • 8/17/2019 Kloeckner Slides

    32/88

    Do More, Faster with PyCUDA

    gpuarray: Elementwise expressions

    Avoiding extra store-fetch cycles for elementwise math:

        # imports the slide leaves implicit
        import pycuda.autoinit
        import pycuda.gpuarray as gpuarray
        import numpy.linalg as la

        from pycuda.curandom import rand as curand
        a_gpu = curand((50,))
        b_gpu = curand((50,))

        from pycuda.elementwise import ElementwiseKernel
        lin_comb = ElementwiseKernel(
            "float a, float *x, float b, float *y, float *z",
            "z[i] = a*x[i] + b*y[i]")

        c_gpu = gpuarray.empty_like(a_gpu)
        lin_comb(5, a_gpu, 6, b_gpu, c_gpu)

        assert la.norm((c_gpu - (5*a_gpu+6*b_gpu)).get()) < 1e-5
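To see why a fused elementwise kernel avoids extra store-fetch cycles, it helps to look at the kind of source such a helper could generate: one loop that reads x[i] and y[i] and writes z[i] once, with no temporaries for 5*a_gpu or 6*b_gpu. The sketch below is a simplified assumption about that shape, not PyCUDA's actual template; `make_elementwise_source` is a hypothetical name.

```python
def make_elementwise_source(name, arguments, operation):
    """Wrap a per-element `operation` in a grid-stride loop over index i."""
    return (
        "__global__ void %(name)s(%(arguments)s, unsigned n)\n"
        "{\n"
        "  unsigned start = blockIdx.x * blockDim.x + threadIdx.x;\n"
        "  unsigned stride = gridDim.x * blockDim.x;\n"
        "  for (unsigned i = start; i < n; i += stride)\n"
        "  {\n"
        "    %(operation)s;\n"
        "  }\n"
        "}\n"
    ) % {"name": name, "arguments": arguments, "operation": operation}

source = make_elementwise_source(
    "lin_comb",
    "float a, float *x, float b, float *y, float *z",
    "z[i] = a*x[i] + b*y[i]")
print(source)
```

The generated text is ordinary CUDA C and could be handed to a compiler as a string, which is the same mechanism the earlier SourceModule example relies on.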

  • 8/17/2019 Kloeckner Slides

    33/88

    PyCUDA: Vital Information

    http://mathema.tician.de/software/pycuda

    Complete documentation
    X Consortium License (no warranty, free for all use)
    Requires: numpy, Boost C++, Python 2.4+.
    Support via mailing list.

  • 8/17/2019 Kloeckner Slides

    34/88

    Outline

    1 Why GPU Scripting?
    2 Scripting CUDA
    3 GPU Run-Time Code Generation
        Programs that write Programs
    4 DG on GPUs
    5 Perspectives

  • 8/17/2019 Kloeckner Slides

    35/88

    Programs that write Programs

    Metaprogramming

    Idea

    [Diagram boxes, in order: Python Code, GPU Code, GPU Compiler, GPU Binary, GPU, Result; additional labels: Human, Machine, PyCUDA]

    In GPU scripting, GPU code does not need to be a compile-time constant.

    (Key: Code is data; it wants to be reasoned about at run time.)

    Good for code generation

  • 8/17/2019 Kloeckner Slides

    43/88

    Machine-generated Code

    Why machine-generate code?
        Automated Tuning (cf. ATLAS, FFTW)
        Data types
        Specialize code for given problem
        Constants faster than variables (→ register pressure)
        Loop Unrolling
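Two of the reasons listed above, constants instead of variables and loop unrolling, can be sketched with plain string formatting. This is an illustrative assumption of how such a generator might look, using only the Python standard library; `generate_scale_kernel` and its parameters are hypothetical names, and the emitted CUDA C is only printed, not compiled.

```python
def generate_scale_kernel(type_name, factor, unroll, threads_per_block):
    """Emit CUDA C source with sizes and the factor baked in as literals."""
    lines = [
        "__global__ void scale(%s *tgt)" % type_name,
        "{",
        "  int idx = threadIdx.x + %d * blockIdx.x;"
        % (threads_per_block * unroll),
    ]
    for i in range(unroll):                      # unrolled at generation time
        offset = i * threads_per_block
        lines.append("  tgt[idx + %d] *= %d;" % (offset, factor))
    lines.append("}")
    return "\n".join(lines)

source = generate_scale_kernel("float", 2, unroll=4, threads_per_block=64)
print(source)
```

Because every size is a literal in the generated text, the GPU compiler sees no loop counter or stride variable at all. A tuning script could call the generator with several values of `unroll` and time each variant.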

  • 8/17/2019 Kloeckner Slides

    44/88

    RTCG via Templates

        from jinja2 import Template
        from pycuda.compiler import SourceModule  # import not shown on the slide

        # block_size and thread_block_size are assumed to be defined earlier.
        tpl = Template("""
            __global__ void twice({{ type_name }} *tgt)
            {
              int idx = threadIdx.x +
                {{ thread_block_size }} * {{ block_size }}
                * blockIdx.x;

              {% for i in range(block_size) %}
                  {% set offset = i*thread_block_size %}
                  tgt[idx + {{ offset }}] *= 2;
              {% endfor %}
            }""")

        rendered_tpl = tpl.render(
            type_name="float", block_size=block_size,
            thread_block_size=thread_block_size)

        smod = SourceModule(rendered_tpl)

  • 8/17/2019 Kloeckner Slides

    45/88

    RTCG via AST Generation

        import numpy  # needed for numpy.int32 below; not shown on the slide
        from codepy.cgen import *
        from codepy.cgen.cuda import CudaGlobal

        mod = Module([
            FunctionBody(
                CudaGlobal(FunctionDeclaration(
                    Value("void", "twice"),
                    arg_decls=[Pointer(POD(dtype, "tgt"))])),
                Block([
                    Initializer(
                        POD(numpy.int32, "idx"),
                        "threadIdx.x + %d*blockIdx.x"
                        % (thread_block_size*block_size)),
                    ]+[
                    Assign(
                        "tgt[idx+%d]" % (o*thread_block_size),
                        "2*tgt[idx+%d]" % (o*thread_block_size))
                    for o in range(block_size)]))])

        smod = SourceModule(mod)

  • 8/17/2019 Kloeckner Slides

    46/88

    Outline

    1 Why GPU Scripting?
    2 Scripting CUDA
    3 GPU Run-Time Code Generation
    4 DG on GPUs
        Introduction
        DG and Metaprogramming
        Results
    5 Perspectives

  • 8/17/2019 Kloeckner Slides

    47/88

    Introduction

    Discontinuous Galerkin Method

    Let \Omega := \bigcup_k D_k \subset \mathbb{R}^d.

    Goal
    Solve a conservation law on \Omega:

        u_t + \nabla \cdot F(u) = 0

    Example
    Maxwell's Equations: EM field E(x, t), H(x, t) on \Omega governed by

        \partial_t E - \frac{1}{\varepsilon} \nabla \times H = -\frac{j}{\varepsilon},
        \qquad \partial_t H + \frac{1}{\mu} \nabla \times E = 0,

        \nabla \cdot E = \frac{\rho}{\varepsilon},
        \qquad \nabla \cdot H = 0.

  • 8/17/2019 Kloeckner Slides

    50/88

    Discontinuous Galerkin Method

    Multiply by test function, integrate by parts:

        0 = \int_{D_k} u_t \varphi + [\nabla \cdot F(u)] \varphi \, dx
          = \int_{D_k} u_t \varphi - F(u) \cdot \nabla \varphi \, dx
            + \int_{\partial D_k} (\hat{n} \cdot F)^* \varphi \, dS_x,

    Integrate by parts again, substitute in basis functions, introduce elementwise differentiation and “lifting” matrices D, L:

        \partial_t u^k = -\sum_\nu D^{\partial_\nu, k} [F(u^k)]
                         + L^k [\hat{n} \cdot F - (\hat{n} \cdot F)^*] |_{A \subset \partial D_k}.

    For straight-sided simplicial elements:
    Reduce D^{\partial_\nu} and L to reference matrices.
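The "integrate by parts again" step can be written out explicitly. The following is a sketch of that one step, assuming the usual divergence theorem on each element D_k and using the same symbols as the weak form above; it stops before basis functions are substituted.

```latex
\begin{align*}
\int_{D_k} F(u)\cdot\nabla\varphi \,dx
  &= -\int_{D_k} [\nabla\cdot F(u)]\,\varphi \,dx
     + \int_{\partial D_k} (\hat n\cdot F)\,\varphi \,dS_x, \\
0 &= \int_{D_k} u_t\,\varphi - F(u)\cdot\nabla\varphi \,dx
     + \int_{\partial D_k} (\hat n\cdot F)^*\,\varphi \,dS_x \\
  &= \int_{D_k} u_t\,\varphi + [\nabla\cdot F(u)]\,\varphi \,dx
     - \int_{\partial D_k} [\hat n\cdot F - (\hat n\cdot F)^*]\,\varphi \,dS_x .
\end{align*}
```

The boundary integral in the last line is the only place where the numerical flux (\hat n \cdot F)^* enters, so it is the only term that couples neighboring elements.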

  • 8/17/2019 Kloeckner Slides

    51/88

    DG and Metaprogramming

    Metaprogramming for GPU-DG

    Specialize code for user-given problem:
        Flux Terms (*)
    Automated Tuning:
        Memory layout
        Loop slicing (*)
        Gather granularity
    Constants instead of variables:
        Dimensionality
        Polynomial degree
        Element properties
        Matrix sizes
    Loop Unrolling

  • 8/17/2019 Kloeckner Slides

    56/88

    Metaprogramming DG: Flux Terms

        0 = \int_{D_k} u_t \varphi + [\nabla \cdot F(u)] \varphi \, dx
            - \int_{\partial D_k} [\hat{n} \cdot F - (\hat{n} \cdot F)^*] \varphi \, dS_x

    The boundary integral is the flux term.

    Flux terms:
        vary by problem
        expression specified by user
        evaluated pointwise

  • 8/17/2019 Kloeckner Slides

    59/88

    Metaprogramming DG: Flux Terms Example

    Example: Fluxes for Maxwell's Equations

        \hat{n} \cdot (F - F^*)_E := \frac{1}{2} [\hat{n} \times (\llbracket H \rrbracket - \alpha \hat{n} \times \llbracket E \rrbracket)]

    (The jump brackets are reconstructed from the code below, where each field enters as an int/ext difference.)

    User writes: Vectorial statement in math. notation

        flux = 1/2*cross(normal, h.int - h.ext
            - alpha*cross(normal, e.int - e.ext))

  • 8/17/2019 Kloeckner Slides

    60/88

    Metaprogramming DG: Flux Terms Example

    Example: Fluxes for Maxwell's Equations

        \hat{n} \cdot (F - F^*)_E := \frac{1}{2} [\hat{n} \times (\llbracket H \rrbracket - \alpha \hat{n} \times \llbracket E \rrbracket)]

    We generate: Scalar evaluator in C (6×)

        a_flux += (
            (((val_a_field5 - val_b_field5)*fpair->normal[2]
              - (val_a_field4 - val_b_field4)*fpair->normal[0])
             + val_a_field0 - val_b_field0)*fpair->normal[0]
            - (((val_a_field4 - val_b_field4)*fpair->normal[1]
                - (val_a_field1 - val_b_field1)*fpair->normal[2])
               + val_a_field3 - val_b_field3)*fpair->normal[1]
            )*value_type(0.5);
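The step from a vectorial flux statement to scalar C expressions can be sketched with symbolic strings. This is a toy illustration under stated assumptions, not the generator used in the talk: vectors are lists of three component strings, `dH` and `dE` stand for the int/ext differences of H and E, and all helper names (`vec`, `cross`, `sub`, `scale`) are hypothetical.

```python
def vec(name):
    """Three symbolic components, e.g. vec('dE') -> ['dE[0]', 'dE[1]', 'dE[2]']."""
    return ["%s[%d]" % (name, i) for i in range(3)]

def cross(a, b):
    """Component expressions of the cross product a x b."""
    return [
        "(%s*%s - %s*%s)" % (a[1], b[2], a[2], b[1]),
        "(%s*%s - %s*%s)" % (a[2], b[0], a[0], b[2]),
        "(%s*%s - %s*%s)" % (a[0], b[1], a[1], b[0]),
    ]

def sub(a, b):
    """Componentwise difference a - b."""
    return ["(%s - %s)" % (x, y) for x, y in zip(a, b)]

def scale(factor, a):
    """Multiply every component by a scalar expression."""
    return ["%s*%s" % (factor, x) for x in a]

normal, dH, dE = vec("normal"), vec("dH"), vec("dE")
# Mirrors: flux = 1/2*cross(normal, dH - alpha*cross(normal, dE))
flux = scale("0.5", cross(normal, sub(dH, scale("alpha", cross(normal, dE)))))
for i, expr in enumerate(flux):
    print("flux_E[%d] += %s;" % (i, expr))
```

A real generator would also simplify the expressions and map the symbolic names to field storage, but the principle is the same: the user writes one vector statement and the tool emits one scalar statement per component.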

  • 8/17/2019 Kloeckner Slides

    61/88

    Loop Slicing on the GPU: A Pattern

    Setting: N independent work units + preparation

    Question: How should one assign work units to threads?

        w_s: in sequence (amortize preparation)
        w_i: “inline-parallel” (exploit register space)
        w_p: in parallel

    [Diagrams: for each option, work units laid out over axes Thread and t, each preceded by a Preparation step]
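One way to picture the three slicing parameters is to enumerate which work units each thread would handle. The sketch below assumes a particular index mapping chosen for illustration (the slides do not specify one): each block covers w_p * w_i * w_s units, with w_p threads side by side, w_i units per thread per pass, and w_s passes in a loop. The function name `assign_work_units` is hypothetical.

```python
def assign_work_units(n_units, w_p, w_i, w_s):
    """Return {(block, thread): [unit, ...]} for one possible index mapping.

    Each block handles w_p * w_i * w_s units: w_p threads run side by side,
    and each thread handles w_i units per pass over w_s sequential passes.
    """
    per_block = w_p * w_i * w_s
    n_blocks = (n_units + per_block - 1) // per_block   # round up
    assignment = {}
    for block in range(n_blocks):
        for thread in range(w_p):                 # in parallel
            units = []
            for s in range(w_s):                  # in sequence (loop)
                for i in range(w_i):              # inline-parallel (unrolled)
                    unit = ((block * w_s + s) * w_i + i) * w_p + thread
                    if unit < n_units:
                        units.append(unit)
            assignment[(block, thread)] = units
    return assignment

plan = assign_work_units(n_units=24, w_p=4, w_i=2, w_s=3)
print(plan[(0, 0)])
```

Since the best choice of (w_p, w_i, w_s) depends on the hardware and the problem size, it is a natural candidate for the automated tuning mentioned earlier: generate a kernel for each candidate triple and keep the fastest.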

  • 8/17/2019 Kloeckner Slides

    67/88

    Results

    Nvidia GTX280 vs. single core of Intel Core 2 Duo E8400

    [Figure: GFlops/s for GPU and CPU, and speedup factor, versus polynomial order N]


    GPU DG Showcase

    Electromagnetism
    Poisson
    CFD
    etc.

    Outline

    1 Why GPU Scripting?
    2 Scripting CUDA
    3 GPU Run-Time Code Generation
    4 DG on GPUs
    5 Perspectives
        Conclusions

    Related Developments

    Introducing... PyOpenCL

    PyOpenCL is “PyCUDA for OpenCL”
    Complete, mature API wrapper
    Features like PyCUDA: not yet
    Tested on all available implementations, OSs
    http://mathema.tician.de/software/pyopencl


    Automating GPU Programming

    GPU programming can be time-consuming, unintuitive and error-prone.

    Obvious idea: Let the computer do it.

    One way: Smart compilers
        GPU programming requires complex tradeoffs
        Tradeoffs require heuristics
        Heuristics are fragile

    Another way: Dumb enumeration
        Enumerate loop slicings
        Enumerate prefetch options
        Choose by running resulting code on actual hardware
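The "dumb enumeration" idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the talk's implementation: `build_variant` is a hypothetical callback that turns one (slicing, prefetch) combination into a runnable callable, or returns `None` when the combination is infeasible, and `measure` defaults to a best-of-three wall-clock timing. In a real setting the callable would launch a generated GPU kernel and synchronize before the timer stops.

```python
import itertools
import time


def wall_clock(run, repeats=3):
    """Best-of-`repeats` wall-clock time, in seconds, of calling run()."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        timings.append(time.perf_counter() - start)
    return min(timings)


def pick_fastest(build_variant, slicings, prefetch_options, measure=wall_clock):
    """Try every (slicing, prefetch) pair and keep the cheapest one.

    Returns (cost, slicing, prefetch), or None if no variant was feasible.
    """
    best = None
    for slicing, prefetch in itertools.product(slicings, prefetch_options):
        run = build_variant(slicing, prefetch)
        if run is None:
            continue  # e.g. the variant would exceed a hardware limit
        cost = measure(run)
        if best is None or cost < best[0]:
            best = (cost, slicing, prefetch)
    return best


if __name__ == "__main__":
    # CPU stand-in for a generated kernel: the work done depends on the slicing.
    def build_variant(slicing, prefetch):
        amount = 1000 * sum(slicing)
        return lambda: sum(range(amount))

    print(pick_fastest(build_variant, [(1, 1, 2), (2, 2, 2)], [False, True]))
```

The selection uses no heuristics: the only judge is the measured cost on the machine at hand, so the result changes with the hardware.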

    Loo.py Example

    Empirical GPU loop optimization:

        a, b, c, i, j, k = [var(s) for s in "abcijk"]
        n = 500
        k = make_loop_kernel([
            LoopDimension("i", n),
            LoopDimension("j", n),
            LoopDimension("k", n),
            ], [
            (c[i+n*j], a[i+n*k]*b[k+n*j])
            ])

        gen_kwargs = {
            "min_threads": 128,
            "min_blocks": 32,
            }

    → Ideal case: Finds 160 GF/s kernel without human intervention.
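The pair `(c[i+n*j], a[i+n*k]*b[k+n*j])` can be read as "accumulate the right-hand expression into the left-hand entry". That reading is an assumption, since the slide does not spell out the semantics. Under it, the kernel is a dense matrix product on flat arrays. A plain-Python reference with the same indexing, which could be used to check a generated kernel on small inputs, might look like the sketch below. The name `reference_matmul` is hypothetical; `i` and `j` act as output indices and `k` as the reduction index.

```python
def reference_matmul(a, b, n):
    """Compute c[i + n*j] = sum_k a[i + n*k] * b[k + n*j] on flat lists."""
    c = [0.0] * (n * n)
    for i in range(n):              # output index
        for j in range(n):          # output index
            acc = 0.0
            for k in range(n):      # reduction index
                acc += a[i + n * k] * b[k + n * j]
            c[i + n * j] = acc
    return c


if __name__ == "__main__":
    identity = [1, 0, 0, 1]
    print(reference_matmul([1, 2, 3, 4], identity, 2))
```

Every write targets an entry indexed only by `i` and `j`, and every read of `a` and `b` is combined over `k`, so inputs and outputs stay separate.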


    Loo.py Status

    Limited scope:
        Require input/output separation
        Kernels must be expressible using “loopy” model
        (i.e. indices decompose into “output” and “reduction”)
        Enough for DG, LA, FD, ...

    Kernel compilation limits trial rate
    Non-Goal: Peak performance

    Good results currently for dense linear algebra and (some) DG subkernels


    Conclusions

    Fun time to be in computational science
    Use Python and PyCUDA to have even more fun :-)
        With no compromise in performance
    GPUs and scripting work well together
        Enable Metaprogramming
    Further work in GPU-DG:
        Other equations (Euler, Navier-Stokes)
        Curvilinear Elements
        Local Time Stepping

    Where to from here?

    More at... → http://mathema.tician.de/

    CUDA-DG
    AK, T. Warburton, J. Bridge, J.S. Hesthaven, “Nodal Discontinuous Galerkin Methods on Graphics Processors”, J. Comp. Phys., 2009.

    GPU RTCG
    AK, N. Pinto et al., PyCUDA: GPU Run-Time Code Generation for High-Performance Computing, in prep.

    Questions?

    Thank you for your attention!

    http://mathema.tician.de/

    Image Credits

    Circuitry: flickr.com/oskay
    C870 GPU: Nvidia Corp.
    Old Books: flickr.com/ppdigital
    OpenCL logo: Ars Technica/Apple Corp.
    OS Platforms: flickr.com/aOliN.Tk
    Adding Machine: flickr.com/thomashawk
    Floppy disk: flickr.com/ethanhein
    Machine: flickr.com/13521837@N00