GPU Programming Overview
Summer 2005 류승택
Introduction
■ GPGPU (General-Purpose Computation on GPUs)
  The first commodity, programmable parallel architecture
  GPU evolution driven by the computer game market
  Advantage of data parallelism
  • GPUs are >10x faster than CPUs for appropriate problems
  Advantage of commodity hardware
  • GPUs are inexpensive
  • GPUs are ubiquitous: desktops, laptops, PDAs, cell phones
  Achieving this speedup
  • Requires a large amount of GPU-specific knowledge
Motivation
■ Challenge Statement
  GPGPU signifies the dawn of the desktop parallel computing age
Real-time Rendering
■ Real-time Rendering
  Graphics hardware enables real-time rendering
  Real-time means a display rate of more than 10 images per second
  3D scene = collection of 3D primitives (triangles, lines, points)
  Image = array of pixels
PC Architecture
Bus Interface
■ ISA (Industry Standard Architecture) bus interface
  In use since the XT and AT era of the 1980s
  Theoretical maximum speed of 16 MB/s
  A serious bottleneck for peripherals
  • Now used only for devices where speed hardly matters, such as sound cards and modems
■ PCI (Peripheral Component Interconnect)
  Parallel connection
  Successor to ISA, used to connect peripheral devices
  Smaller than an ISA slot and supports IRQ sharing
  Typical 32-bit, 33 MHz PCI runs at 133 MB/s; 64-bit, 66 MHz PCI runs at 533 MB/s
  Used by most peripheral devices
[Figure: ISA, PCI, and AGP slots]
Bus Interface
■ AGP (Accelerated Graphics Port)
  Developed by Intel
  Based on PCI, but with more than twice the transfer rate
  Runs at a base clock of 66 MHz
  AGP = 2 x PCI (AGP 2x = 2 x AGP)
  • AGP 1x: up to 266 MB/s
  • AGP 2x: up to 533 MB/s
  For 3D graphics cards
■ PCIe (PCI Express)
  Serial connection (cheap, scalable)
  Up to 8.0 GB/s of bandwidth (PCIe = 2 x AGP 8x)
  Intel, ATI, and NVIDIA, which lead the worldwide graphics market, have firmly endorsed this new standard as the next-generation graphics interface
  AGP, which was created because of the limits of PCI and had been the unrivaled interface for graphics processing units (GPUs), is now being replaced by PCI Express
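The peak rates quoted for the parallel buses follow from bus width times clock rate times transfers per clock. The sketch below works through that arithmetic. It assumes the quoted rates are megabytes per second and that the nominal clocks are 33.33 MHz and 66.66 MHz, so the results can differ by a few MB/s from the rounded figures on the slide. The function name is hypothetical.

```python
def peak_bandwidth_mb_s(width_bits, clock_mhz, transfers_per_clock=1):
    """Theoretical peak rate of a parallel bus in MB/s (10**6 bytes per second)."""
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * clock_mhz * transfers_per_clock


# Assumed nominal clocks: 33.33 MHz for PCI, 66.66 MHz for 66 MHz PCI and AGP.
buses = {
    "PCI 32-bit / 33 MHz": peak_bandwidth_mb_s(32, 33.33),
    "PCI 64-bit / 66 MHz": peak_bandwidth_mb_s(64, 66.66),
    "AGP 1x": peak_bandwidth_mb_s(32, 66.66, transfers_per_clock=1),
    "AGP 2x": peak_bandwidth_mb_s(32, 66.66, transfers_per_clock=2),
}

for name, rate in buses.items():
    print(f"{name}: about {rate:.0f} MB/s")
```

This formula applies only to the parallel buses. PCIe is a serial link whose bandwidth is quoted per lane, so it is not modeled here.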
[Figure: PCI, PCIe x1, and PCIe x16 slots; GeForce 7800 GTX (PCIe x16)]
PC Graphics Software Architecture
- The application, 3D API and driver are written in C or C++
- The vertex and pixel programs are written in a high-level shading language (Cg, DirectX HLSL, OpenGL Shading Language)
- Pushbuffer: contains the commands to be executed on the GPU
Hardware Graphics Pipelines
GPU Fundamentals: The Graphics Pipeline
■ A simplified graphics pipeline
  Note that pipe widths vary
  Many caches, FIFOs, and so on are not shown
[Figure: simplified graphics pipeline. CPU: Application -> GPU: Transform -> Rasterizer -> Shade -> Video Memory (Textures). Data passed between stages: vertices (3D); transformed, lit vertices (2D); fragments (pre-pixels); final pixels (color, depth). Also shown: graphics state, render-to-texture.]
Stream Program => GPU
■ A stream is a sequence of data (could be numbers, colors, RGBA vectors,…)
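A stream program applies the same function to every element of a stream, with no dependence between elements. The sketch below shows that idea on a stream of RGBA colors. The word "kernel" for the per-element function is common GPGPU terminology rather than something this slide defines, and `run_kernel` and `brighten` are hypothetical names.

```python
def run_kernel(kernel, stream):
    """Apply `kernel` to each stream element independently.

    On a GPU the elements would be processed in parallel; here they are
    processed one after another, which gives the same result because no
    element depends on any other.
    """
    return [kernel(element) for element in stream]


def brighten(rgba, gain=1.5):
    """Example kernel: scale the color channels, clamp to 1.0, keep alpha."""
    r, g, b, a = rgba
    return (min(r * gain, 1.0), min(g * gain, 1.0), min(b * gain, 1.0), a)


colors = [(0.2, 0.4, 0.6, 1.0), (0.8, 0.8, 0.8, 1.0)]
print(run_kernel(brighten, colors))
```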
GPU Fundamentals: The Modern Graphics Pipeline
■ Programmable vertex processor!
■ Programmable pixel processor!
[Figure: modern graphics pipeline. CPU: Application -> GPU: Vertex Processor -> Rasterizer -> Pixel (Fragment) Processor -> Video Memory (Textures). Data passed between stages: vertices (3D); transformed, lit vertices (2D); fragments (pre-pixels); final pixels (color, depth). Also shown: graphics state, render-to-texture.]
GPU Pipeline: Transform
■ Vertex Processor (multiple operate in parallel)
  Transform from “world space” to “image space”
  Compute per-vertex lighting
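The two jobs of the vertex stage can be sketched as follows: multiply the position by a 4x4 matrix, divide by w, map the result to pixel coordinates, and compute a diffuse lighting term. The projection matrix, the viewport convention (y pointing down), and all names are assumptions chosen for illustration, not the conventions of any particular API.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def to_image_space(position, view_proj, width, height):
    """Transform a 3D position to pixel coordinates plus a depth value."""
    x, y, z, w = mat_vec(view_proj, [position[0], position[1], position[2], 1.0])
    ndc_x, ndc_y, depth = x / w, y / w, z / w      # perspective divide
    pixel_x = (ndc_x + 1.0) * 0.5 * width          # [-1, 1] -> [0, width]
    pixel_y = (1.0 - ndc_y) * 0.5 * height         # flip y: image origin at top
    return pixel_x, pixel_y, depth


def diffuse(normal, light_dir):
    """Per-vertex Lambert term for unit-length normal and light direction."""
    return max(0.0, sum(n * l for n, l in zip(normal, light_dir)))


# Assumed toy projection: camera at the origin looking down -z, so w = -z.
SIMPLE_PROJECTION = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, -1.0, 0.0],
]

print(to_image_space((1.0, 1.0, -2.0), SIMPLE_PROJECTION, 640, 480))
print(diffuse((0.0, 0.0, 1.0), (0.0, 0.0, 1.0)))
```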
GPU Pipeline: Rasterizer
■ Rasterizer
  Convert geometric representation (vertex) to image representation (fragment)
  • Fragment = image fragment: pixel + associated data (color, depth, stencil, etc.)
  Interpolate per-vertex quantities across pixels
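One common way to do both steps is with edge functions and barycentric weights: a pixel center is inside the triangle when all three weights are non-negative, and the same weights blend the per-vertex values. The sketch below assumes 2D screen-space vertices and a simple brute-force scan of the whole image. Real hardware is far more elaborate, and the names are hypothetical.

```python
def edge(a, b, p):
    """Twice the signed area of triangle (a, b, p)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(v0, v1, v2, c0, c1, c2, width, height):
    """Return a list of ((x, y), color) fragments covered by the triangle."""
    fragments = []
    area = edge(v0, v1, v2)
    if area == 0:
        return fragments  # degenerate triangle covers nothing
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)            # sample at the pixel center
            w0 = edge(v1, v2, p) / area       # weight of v0
            w1 = edge(v2, v0, p) / area       # weight of v1
            w2 = edge(v0, v1, p) / area       # weight of v2
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                color = tuple(w0 * a + w1 * b + w2 * c
                              for a, b, c in zip(c0, c1, c2))
                fragments.append(((x, y), color))
    return fragments


frags = rasterize((0, 0), (4, 0), (0, 4),
                  (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), 4, 4)
print(len(frags), frags[0])
```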
GPU Pipeline: Shade
■ Fragment Processors (multiple in parallel)
  Compute a color for each pixel
  Optionally read colors from textures (images)
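A fragment program can be pictured as a small function that runs once per fragment and may look up a texture. The sketch below assumes nearest-neighbor sampling with texture coordinates in [0, 1] and a simple multiply of the interpolated color by the texel. Both choices and all names are illustrative assumptions.

```python
def sample_nearest(texture, u, v):
    """Nearest-neighbor lookup; `texture` is a list of rows of RGB tuples."""
    height, width = len(texture), len(texture[0])
    u = min(max(u, 0.0), 1.0)                 # clamp coordinates to [0, 1]
    v = min(max(v, 0.0), 1.0)
    x = min(int(u * width), width - 1)
    y = min(int(v * height), height - 1)
    return texture[y][x]


def shade(fragment_color, uv, texture):
    """Per-fragment color: interpolated color modulated by the texel."""
    texel = sample_nearest(texture, uv[0], uv[1])
    return tuple(c * t for c, t in zip(fragment_color, texel))


WHITE, BLACK = (1.0, 1.0, 1.0), (0.0, 0.0, 0.0)
checker = [[WHITE, BLACK],
           [BLACK, WHITE]]

print(shade((0.5, 0.5, 0.5), (0.25, 0.25), checker))
print(shade((0.5, 0.5, 0.5), (0.75, 0.25), checker))
```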
Programming Graphics Hardware
1995-1998: Texture Mapping and Z-Buffer
- PCI: Peripheral Component Interconnect
- 3dfx’s Voodoo
Texture Mapping
Texture Mapping: Perspective-Correct Interpolation
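The slide names the technique without spelling it out, so the following is a sketch of the standard formulation rather than the slide's own material: interpolate u/w and 1/w linearly in screen space, then divide. The function names are hypothetical, and `w` stands for the clip-space w of each vertex.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b."""
    return (1.0 - t) * a + t * b


def affine(u0, u1, t):
    """Plain screen-space interpolation; wrong under perspective."""
    return lerp(u0, u1, t)


def perspective_correct(u0, w0, u1, w1, t):
    """Interpolate u/w and 1/w in screen space, then recover u by division."""
    u_over_w = lerp(u0 / w0, u1 / w1, t)
    one_over_w = lerp(1.0 / w0, 1.0 / w1, t)
    return u_over_w / one_over_w


# Halfway across the screen between a near vertex (w = 1) and a far one (w = 3).
print(affine(0.0, 1.0, 0.5))
print(perspective_correct(0.0, 1.0, 1.0, 3.0, 0.5))
```

When both vertices have the same w the two functions agree, which is why the error only shows up on surfaces that recede from the viewer.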
1998: Multitexturing
- AGP: Accelerated Graphics Port
- NVIDIA’s TNT, ATI’s Rage
Multitexturing
Light Mapping
1999-2000: Transform and Lighting
- Register Combiner: offers many more texture/color combinations
- NVIDIA’s GeForce 256 and GeForce2, ATI’s Radeon 7500
Bump Mapping
Environment Mapping
Projective Texture Mapping
2001: Programmable Vertex Shader
- Z-Cull: predicts which fragments will fail the Z test and discards them
- Texture Shader: offers more texture addressing modes and operations
- NVIDIA’s GeForce3 and GeForce4 Ti, ATI’s Radeon 8500
A programmable processor for any per-vertex computation
Volume Texture Mapping
2002-2003: Programmable Pixel Shader
- MRT: Multiple Render Target
- NVIDIA’s GeForce FX, ATI’s Radeon 9600 to 9800
A programmable processor for any per-pixel computation
Shader: Static vs. Dynamic flow control
■ Static flow control
  Condition varies per batch of triangles
■ Dynamic flow control
  Condition varies per vertex or pixel
■ Full flow control
  Static and dynamic flow control
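The difference is where the branch condition comes from. The sketch below uses a made-up fog effect: in the static case one flag decides for the whole batch, while in the dynamic case each pixel tests its own depth. The function names, the fog formula, and the threshold are illustrative assumptions.

```python
def apply_fog(value, fog=0.5):
    """Toy effect: darken a brightness value by a fixed fog amount."""
    return value * (1.0 - fog)


def shade_static(batch, fog_enabled):
    """Static flow control: the condition is constant for the whole batch."""
    if fog_enabled:
        return [apply_fog(v) for v in batch]
    return list(batch)


def shade_dynamic(batch, depths, fog_start):
    """Dynamic flow control: the condition is evaluated per pixel."""
    return [apply_fog(v) if d > fog_start else v
            for v, d in zip(batch, depths)]


pixels = [1.0, 0.5]
print(shade_static(pixels, fog_enabled=True))
print(shade_dynamic(pixels, depths=[0.2, 0.9], fog_start=0.5))
```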
2004: Shader Model 3.0 and 64 bit Color Support
- PCIe: Peripheral Component Interconnect Express
- NVIDIA’s GeForce 6800
[Figure: Fixed-function pipeline. 3D Application or Game -> (3D API commands) -> 3D API: OpenGL or Direct3D -> CPU-GPU boundary (AGP/PCIe) -> (GPU command & data stream) -> GPU Front End -> (vertex index stream) -> Primitive Assembly -> (assembled primitives) -> Rasterization and Interpolation -> (pixel location stream) -> Raster Operations -> (pixel updates) -> Frame Buffer. The Programmable Vertex Processor takes pre-transformed vertices and returns transformed vertices; the Programmable Fragment Processor takes pre-transformed fragments and returns transformed fragments.]
[Figure: Programmable pipeline. Same diagram as the fixed-function pipeline: 3D Application or Game -> 3D API: OpenGL or Direct3D -> CPU-GPU boundary (AGP/PCIe) -> GPU Front End -> Primitive Assembly -> Rasterization and Interpolation -> Raster Operations -> Frame Buffer, with the Programmable Vertex Processor and the Programmable Fragment Processor attached to the vertex and fragment stages.]
Real-time Tone Mapping
■ The image is entirely computed in 64-bit color and tone-mapped for display
  64-bit color
  • 16-bit floating-point value per channel (R, G, B, A)
  Tone mapping
  • HDRI (High Dynamic Range Image) -> low dynamic range device
  [Figure: low- to high-exposure images of the same scene]
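The sketch below illustrates both points: four 16-bit floats make one 64-bit pixel, and a tone-mapping curve compresses unbounded HDR values into the 0 to 255 range of a display. The slide does not say which operator is used, so the Reinhard-style curve x / (1 + x) is an assumption, as are the function names and the `exposure` parameter.

```python
import struct


def pack_rgba16f(r, g, b, a):
    """Pack one pixel as four little-endian half-precision floats."""
    return struct.pack("<4e", r, g, b, a)


def tone_map(value, exposure=1.0):
    """Assumed operator: Reinhard-style curve mapping [0, inf) into [0, 1)."""
    scaled = value * exposure
    return scaled / (1.0 + scaled)


def to_display(pixel_bytes, exposure=1.0):
    """Unpack a 64-bit HDR pixel and return 8-bit-per-channel RGB."""
    r, g, b, _alpha = struct.unpack("<4e", pixel_bytes)
    return tuple(round(255 * tone_map(c, exposure)) for c in (r, g, b))


pixel = pack_rgba16f(4.0, 3.0, 0.0, 1.0)
print(len(pixel) * 8, "bits per pixel")
for exposure in (0.5, 1.0, 2.0):              # low to high exposure
    print(exposure, to_display(pixel, exposure))
```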
2005: NVIDIA GeForce 7800
■ NVIDIA GeForce 7800
  NVIDIA SLI (Scalable Link Interface) Technology
  • Dramatically scales performance by allowing two graphics cards to run in parallel
  64-bit floating-point texture filtering and blending
  Designed for PCI Express x16
  API support
  • Complete DirectX support, including the latest version of Microsoft DirectX 9.0 Shader Model 3.0
  • Full OpenGL support, including OpenGL 2.0
Radiosity
■ A visual effect that shows how light bounces off some objects and contributes to the final lighting of another object
NVIDIA Demo: Mad Mod Mike
The Future
■ Unified general programming model at primitive, vertex and pixel levels
■ Scary amount of:
  Floating-point horsepower
  Video memory
  Bandwidth between system and video memory
■ Lower chip costs and power requirements to make 3D graphics hardware ubiquitous
  Automotive (gaming, navigation, head-up displays)
  Home (remotes, media center, automation)
  Mobile (PDAs, cell phones)
Programming the GPU
The Evolution of GPU Programming Languages
Programmable Pipeline
GPU Programming
■ GPU Programming
  Low-level language
  • Assembler-like
  • Best performance
  • Platform-dependent
  • Vertex programming, fragment programming
  • Ex) OpenGL extensions, DirectX 9
  High-level shading language
  • Easier programming
  • Easier code reuse
  • Easier debugging
  • Easy to read
  • Ex) Cg, HLSL, GLSL
Assembly vs. High-Level Language
Data Flow through Pipeline
GPU Programming
■ GPU Programming
  Low-level language
  • OpenGL extensions: GL_ARB_vertex_program, GL_ARB_fragment_program
  • DirectX 9: Vertex Shader 2.0, Pixel Shader 2.0
  High-level shading language
  • Cg: “C for Graphics”, by NVIDIA
  • HLSL: “High-Level Shading Language”, part of DirectX 9 (Microsoft)
  • GLSL: “OpenGL 2.0 Shading Language”, proposal by 3Dlabs
  HLSL and Cg are much more similar to each other than they are to GLSL
Workflow in Cg
Reference
■ Reference
  Course Notes
  • EG2004
  • SIGGRAPH2004
  • VIS2004
  David Luebke, General-Purpose Computation on Graphics Hardware
  Daniel Weiskopf, Basics of GPU-Based Programming
  Cyril Zeller, Introduction to the Hardware Graphics Pipeline
  Randy Fernando, Programming the GPU
  Suresh Venkatasubramanian, GPU Programming and Architecture
  GPGPU (http://www.gpgpu.org/)
  GPU Programming http://euclid.uits.iupui.edu/wiki/index.php/GPU_Programming
  Shader::Tech http://www.shadertech.com/
  Nvidia Developer http://developer.nvidia.com/object/gpu_programming_guide.html
  GPGPU DEVELOPER RESOURCES