GPU Tutorial
이윤진
Computer Game, Fall 2007
Fifth week of November and first week of December, 2007
Contents
Introduction to GPU
High-level shading languages
GPU applications
Introduction to GPU
이윤진
Computer Game, Fall 2007
November 26, 2007
Slide Credits
Marc Olano (UMBC)
◦ SIGGRAPH 2006 Course notes
David Luebke (University of Virginia)
◦ SIGGRAPH 2005, 2007 Course notes
Mark Kilgard (NVIDIA Corporation)
◦ SIGGRAPH 2006 Course notes
Rudolph Balaz and Sam Glassenberg (Microsoft Corporation)
◦ PDC 05
Randy Fernando and Cyril Zeller (NVIDIA Corporation)
◦ I3D 2005
America's Army
GPU
GPU: Graphics Processing Unit
◦ Designed for real-time graphics
◦ Present in almost every PC
◦ Increasing realism and complexity
Growth of GPU (NVIDIA)
Performance metrics
◦ Since 2000, the amount of horsepower applied to processing 3D vertices and fragments has been growing at a staggering rate
Computational Power
GPUs are fast…
◦ 3.0 GHz Intel Core2 Duo (Woodcrest Xeon 5160):
  • Computation: 48 GFLOPS peak
  • Memory bandwidth: 21 GB/s peak
  • Price: $874 (chip)
◦ NVIDIA GeForce 8800 GTX:
  • Computation: 330 GFLOPS observed
  • Memory bandwidth: 55.2 GB/s observed
  • Price: $599 (board)
GPUs are getting faster, faster
◦ CPUs: 1.4× annual growth
◦ GPUs: 1.7× (pixels) to 2.3× (vertices) annual growth
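To make the growth rates above concrete, the following Python sketch compounds them over several years and compares cost efficiency using the figures quoted on this slide. The five-year horizon and the helper names are assumptions chosen for illustration. The CPU figures are peak values and the GPU figures are observed values, so the ratio is only a rough indication.

```python
# Figures come from the slide; the 5-year horizon is an illustrative assumption.
CPU_GROWTH = 1.4          # annual growth factor for CPUs
GPU_PIXEL_GROWTH = 1.7    # annual growth factor for GPU pixel throughput
GPU_VERTEX_GROWTH = 2.3   # annual growth factor for GPU vertex throughput


def compound(annual_factor, years):
    """Total speedup after `years` of steady annual growth."""
    return annual_factor ** years


def gflops_per_dollar(gflops, price_usd):
    """Cost efficiency of a part, in GFLOPS per US dollar."""
    return gflops / price_usd


YEARS = 5
cpu_speedup = compound(CPU_GROWTH, YEARS)
gpu_pixel_speedup = compound(GPU_PIXEL_GROWTH, YEARS)
gpu_vertex_speedup = compound(GPU_VERTEX_GROWTH, YEARS)

cpu_value = gflops_per_dollar(48.0, 874.0)    # Core2 Duo, peak
gpu_value = gflops_per_dollar(330.0, 599.0)   # GeForce 8800 GTX, observed

print(f"CPU after {YEARS} years:          {cpu_speedup:.1f}x")
print(f"GPU (pixels) after {YEARS} years:  {gpu_pixel_speedup:.1f}x")
print(f"GPU (vertices) after {YEARS} years: {gpu_vertex_speedup:.1f}x")
print(f"GFLOPS per dollar, GPU vs CPU: {gpu_value / cpu_value:.1f}x")
```

Under these assumptions a 1.4× annual rate gives roughly a 5× gain in five years, while 1.7× to 2.3× gives roughly 14× to 64×.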
Computational Power
Why are GPUs getting faster so fast?
◦ Arithmetic intensity
  • The specialized nature of GPUs makes it easier to use additional transistors for computation
◦ Economics
  • The multi-billion dollar video game market is a pressure cooker that drives innovation to exploit this property
Flexible and Precise
Modern GPUs are deeply programmable
◦ Programmable pixel, vertex, and geometry engines
◦ Solid high-level language support
Modern GPUs support “real” precision
◦ 32-bit floating point throughout the pipeline
  • High enough for many (not all) applications
  • Vendors committed to double precision soon
◦ DX10-class GPUs add 32-bit integers
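As a small illustration of why 32-bit floating point is enough for many but not all applications, the Python sketch below rounds 64-bit values to 32-bit precision on the CPU. The helper `to_float32` is a hypothetical name introduced here; it only emulates the storage format and says nothing about how a particular GPU rounds intermediate results.

```python
import struct


def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


# A 32-bit float has a 24-bit significand, so above 2**24 not every
# integer is representable and a small increment can vanish.
big = float(2 ** 24)                          # 16777216.0
lost_increment = to_float32(big + 1.0) == big

# Decimal fractions such as 0.1 carry a larger error than in 64-bit.
error_32 = abs(to_float32(0.1) - 0.1)

print("increment lost above 2**24:", lost_increment)
print("error of 0.1 in 32-bit:", error_32)
```

Long accumulations, large counters, and large coordinate ranges are typical cases where this loss matters.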
GPU Fundamentals: Graphics Pipeline
A simplified graphics pipeline
◦ Note that pipe widths vary
◦ Many caches, FIFOs, and so on not shown
[Diagram: simplified graphics pipeline]
CPU: Application
GPU: Transform & Light → Assemble Primitives → Rasterize → Shade → Video Memory (Textures)
Data between stages: Vertices (3D) → Xformed, Lit Vertices (2D) → Screenspace triangles (2D) → Fragments (pre-pixels) → Final Pixels (Color, Depth)
Also shown: Graphics State, Render-to-texture
GPU Fundamentals: Modern Graphics Pipeline
Programmable vertex processor!
Programmable pixel processor!
[Diagram: modern graphics pipeline]
CPU: Application
GPU: Vertex Processor → Rasterize → Fragment Processor → Video Memory (Textures)
Data between stages: Vertices (3D) → Xformed, Lit Vertices (2D) → Screenspace triangles (2D) → Fragments (pre-pixels) → Final Pixels (Color, Depth)
Also shown: Graphics State, Render-to-texture
GPU Fundamentals: Modern Graphics Pipeline
Programmable primitive assembly!
More flexible memory access!
[Diagram labels: Assemble Primitives, Geometry Processor]
GPU Pipeline: Transform
Vertex processor (multiple in parallel)
◦ Transform from “world space” to “image space”
◦ Compute per-vertex lighting
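The two vertex-stage tasks above can be sketched on the CPU in plain Python. This is a minimal illustration, not how a driver or GPU implements the stage: `TOY_PROJECTION` is a hypothetical matrix chosen so the arithmetic is easy to follow, and the function names are assumptions introduced here.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def to_image_space(matrix, position):
    """Transform a 3D position and apply the perspective divide."""
    x, y, z, w = mat_vec(matrix, [position[0], position[1], position[2], 1.0])
    return (x / w, y / w, z / w)


def diffuse(normal, to_light):
    """Lambertian term max(0, N . L); both vectors are assumed unit length."""
    return max(0.0, sum(n * l for n, l in zip(normal, to_light)))


# Hypothetical combined matrix: copies x, y, z and sets w = -z, so points
# farther along -z move toward the center after the divide.
TOY_PROJECTION = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, -1.0, 0.0],
]

near = to_image_space(TOY_PROJECTION, (2.0, 4.0, -2.0))
far = to_image_space(TOY_PROJECTION, (2.0, 4.0, -4.0))
lit = diffuse((0.0, 0.0, 1.0), (0.0, 0.0, 1.0))      # surface faces the light
unlit = diffuse((0.0, 0.0, 1.0), (0.0, 0.0, -1.0))   # surface faces away

print(near, far, lit, unlit)
```

Each vertex is processed independently of the others, which is what allows several vertex processors to run in parallel.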
GPU Pipeline: Assemble Primitives
Geometry processor
◦ How the vertices connect to form a primitive
◦ Per-primitive operations
GPU Pipeline: Rasterize
Rasterizer
◦ Convert geometric rep. (vertex) to image rep. (fragment)
  • Pixel + associated data: color, depth, stencil, etc.
◦ Interpolate per-vertex quantities across pixels
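One common way to describe what the rasterizer does is with edge functions and barycentric weights. The Python sketch below is a simplified CPU illustration under that assumption; real hardware uses its own fill rules and fixed-point arithmetic, and the names `edge` and `rasterize` are introduced here only for the example.

```python
def edge(a, b, p):
    """Twice the signed area of triangle (a, b, p)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(v0, v1, v2, c0, c1, c2, width, height):
    """Return (x, y, color) fragments for pixel centers inside the triangle.

    v0..v2 are 2D screen positions and c0..c2 are per-vertex colors. Each
    fragment's color is the barycentric blend of the three vertex colors.
    """
    fragments = []
    area = edge(v0, v1, v2)
    if area == 0:
        return fragments  # degenerate triangle covers no pixels
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)  # sample at the pixel center
            w0 = edge(v1, v2, p) / area
            w1 = edge(v2, v0, p) / area
            w2 = edge(v0, v1, p) / area
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                color = tuple(w0 * a + w1 * b + w2 * c
                              for a, b, c in zip(c0, c1, c2))
                fragments.append((x, y, color))
    return fragments


frags = rasterize((0, 0), (4, 0), (0, 4),
                  (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0),
                  width=4, height=4)
print(len(frags), frags[0])
```

The same weights would be used to interpolate any other per-vertex quantity, such as texture coordinates or normals.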
GPU Pipeline: Shade
Fragment processors (multiple in parallel)
◦ Compute a color for each pixel
◦ Optionally read colors from textures (images)
GPU Parallelism
GeForce 7900 GTX
GPU Programming
Simplified computational model
◦ Consistent as hardware changes
All stages SIMD
Fixed conversion / remapping between each stage
[Diagram labels: Buffer, Vertex (stream), Geometry (stream), Fragment (array)]
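Example
Vertex shader
void main() {
    // Pass the per-vertex color on so it can be interpolated.
    gl_FrontColor = gl_Color;
    // Transform the vertex position to clip space.
    gl_Position = gl_ProjectionMatrix * gl_ModelViewMatrix * gl_Vertex;
}
Pixel shader
void main() {
    // Write the interpolated color as the fragment color.
    gl_FragColor = gl_Color;
}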
Vertex Shader
One element in / one out
No communication
Can select fragment address
Input:
◦ Vertex data (position, normal, color, …)
◦ Shader constants, texture data
Output:
◦ Required: transformed clip-space position
◦ Optional: colors, texture coordinates, normals (data you want passed on to the pixel shader)
Restrictions:
◦ Can’t create new vertices
Pixel Shader
Biggest computational resource
One element in / 0 – 1 out
Cannot change destination address
No communication
Input:
◦ Interpolated data from vertex shader
◦ Shader constants, texture data
Output:
◦ Required: pixel color (with alpha)
◦ Optional: can write additional colors to multiple render targets
Restrictions:
◦ Can’t read and write the same texture simultaneously
Example
Vertex shader
void main() {
    // Flatten the model by zeroing z before the transform.
    vec4 v = vec4(gl_Vertex);
    v.z = 0.0;
    gl_Position = gl_ProjectionMatrix * gl_ModelViewMatrix * v;
}
Pixel shader
void main() {
    // Constant color for every fragment.
    gl_FragColor = vec4(0.8, 0.4, 0.4, 1.0);
}
http://www.lighthouse3d.com/opengl/glsl/
Geometry Shader
One element in / 0 to ~100 out
◦ Limited by hardware buffer sizes
Like vertex:
◦ No communication
◦ Can select fragment address
Input:
◦ Entire primitive (point, line, or triangle)
◦ Optional: adjacency
Output:
◦ Zero or more primitives (a homogeneous list of points/lines or triangles)
Restrictions:
◦ Allow parallel processing but preserve serial order
Geometry Shader
Applications
◦ Fur/fins, procedural geometry/detailing,
◦ Data visualization techniques,
◦ Wide lines and strokes, …
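The "wide lines" application shows the one-primitive-in, several-primitives-out pattern well. The Python sketch below is a CPU-side illustration only: `widen_line` is a hypothetical function that takes one 2D line segment and returns two triangles forming a quad of the requested width, or nothing for a degenerate segment.

```python
import math


def widen_line(p0, p1, width):
    """Expand a 2D segment into two triangles covering a quad of `width`.

    Returns an empty list for a zero-length segment, mirroring a geometry
    stage that may emit zero primitives for an input primitive.
    """
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        return []
    # Offset perpendicular to the segment, half the width to each side.
    nx = -dy / length * width / 2
    ny = dx / length * width / 2
    a = (p0[0] + nx, p0[1] + ny)
    b = (p0[0] - nx, p0[1] - ny)
    c = (p1[0] + nx, p1[1] + ny)
    d = (p1[0] - nx, p1[1] - ny)
    return [(a, b, c), (c, b, d)]


triangles = widen_line((0.0, 0.0), (4.0, 0.0), width=2.0)
print(triangles)
```

Both output triangles have the same type, matching the requirement that the output be a homogeneous list of primitives.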
Multiple Passes
Communication
◦ None in one pass
◦ Arbitrary read addresses between passes
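A sum reduction is a simple way to see why multiple passes are needed when elements cannot communicate within a pass. The Python sketch below emulates the idea on the CPU: within `run_pass` each output is computed independently and writes only its own slot, while between passes the previous output becomes an input that can be read at any address. The function names are assumptions introduced for this illustration.

```python
def run_pass(src):
    """One pass: every output element is computed independently from `src`.

    Output i reads input addresses 2*i and 2*i + 1, writes only slot i, and
    never sees other outputs produced in the same pass.
    """
    half = (len(src) + 1) // 2
    return [src[2 * i] + (src[2 * i + 1] if 2 * i + 1 < len(src) else 0)
            for i in range(half)]


def reduce_sum(values):
    """Sum `values` by repeated passes; returns (total, number_of_passes)."""
    buf = list(values)
    passes = 0
    while len(buf) > 1:
        buf = run_pass(buf)  # output of this pass is input to the next
        passes += 1
    return (buf[0] if buf else 0), passes


total, passes = reduce_sum(range(1, 9))
print(total, passes)
```

Eight inputs need three passes here, since each pass halves the number of elements.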
Example
Image Space Silhouette Extraction Using Graphics Hardware [Wang 2005]
[Figure panels: Depth buffer, Normal buffer, Silhouettes, Creases, Final result]
GPU Applications
Bump/Displacement mapping
[Figure panels: Height map, Diffuse light without bump, Diffuse light with bump]
GPU Applications
Volume texture mapping
GPU Applications
Cloth simulation
GPU Applications
Real-time rendering
Image processing
General purpose GPU (GPGPU)
…
Contents
Introduction to GPU
High-level shading languages
GPU applications
GPU Applications
Soft Shadows
Percentage-closer soft shadows [Fernando 2005]