GPU Programming Overview


Page 1: GPU Programming Overview

GPU Programming Overview

Summer 2005 류승택

Page 2: GPU Programming Overview

Introduction

■ GPGPU (General-Purpose Computation on GPUs)
  The first commodity, programmable parallel architecture
  GPU evolution is driven by the computer game market
  Advantage of data parallelism
  • GPUs are >10x faster than CPUs for appropriate problems
  Advantage of commodity hardware
  • GPUs are inexpensive
  • GPUs are ubiquitous: desktops, laptops, PDAs, cell phones
  Achieving this speedup
  • Requires a large amount of GPU-specific knowledge

Page 3: GPU Programming Overview

Motivation

■ Challenge Statement
  GPGPU signifies the dawn of the desktop parallel computing age

Page 4: GPU Programming Overview

Real-time Rendering

■ Real-time Rendering
  Graphics hardware enables real-time rendering
  Real-time means a display rate of more than 10 images per second
  3D Scene = Collection of 3D primitives (triangles, lines, points)
  Image = Array of pixels

Page 5: GPU Programming Overview

PC Architecture

Page 6: GPU Programming Overview

Bus Interface

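■ ISA (Industry Standard Architecture) bus interface
  In use since the XT and AT era of the 1980s
  Theoretical maximum of about 16 MB/s
  A severe bottleneck for peripherals
  • Now used mainly for devices where throughput matters little, such as sound cards and modems

■ PCI (Peripheral Component Interconnect)
  Parallel connection
  Successor to ISA as the interface for connecting peripherals
  Smaller slot than ISA; supports IRQ sharing
  Typical 32-bit, 33 MHz PCI delivers 133 MB/s; 64-bit, 66 MHz PCI delivers 533 MB/s
  Most peripherals use the PCI interface

[Figure: ISA, PCI, and AGP slots]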

Page 7: GPU Programming Overview

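Bus Interface

■ AGP (Accelerated Graphics Port)
  Parallel, point-to-point connection dedicated to the graphics card
  Developed by Intel
  Based on PCI, but with more than twice the transfer rate
  Runs at a base clock of 66 MHz
  AGP 1x = 2 x PCI (AGP 2x = 2 x AGP 1x)
  • AGP 1x: up to 266 MB/s
  • AGP 2x: up to 533 MB/s
  Intended for 3D graphics cards

■ PCIe (PCI Express)
  Serial connection (cheap, scalable)
  Up to 8.0 GB/s of bandwidth for an x16 link, counting both directions (each direction is about 2 x AGP 8x)
  Intel, ATI, and NVIDIA, the companies that lead the worldwide graphics market, have firmly endorsed this new standard as the next-generation graphics interface
  AGP was created because of the limits of PCI and had been unrivaled as the interface for graphics processing units (GPUs); it is now being replaced by PCI Express

[Figure: PCI, PCIe x1, and PCIe x16 slots; GeForce 7800 GTX (PCIe x16)]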
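The bus figures on these two slides follow from simple arithmetic. The sketch below is a rough illustration, not vendor data: for a parallel bus, peak bandwidth is bus width times clock times transfers per clock; for PCIe it assumes first-generation links (2.5 GT/s per lane with 8b/10b encoding), which the slides do not state explicitly. The function names are hypothetical.

```python
def parallel_bus_mb_s(bus_width_bits, clock_mhz, transfers_per_clock=1):
    """Peak bandwidth of a parallel bus in MB/s (1 MB = 10**6 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * clock_mhz * transfers_per_clock


def pcie_gen1_mb_s(lanes):
    """Peak bandwidth per direction of a PCIe 1.x link in MB/s.

    Assumption: 2500 megatransfers/s per lane, 8 payload bits out of
    every 10 bits on the wire (8b/10b), 8 bits per byte.
    """
    per_lane = 2500 * 8 / 10 / 8  # 250 MB/s per lane, per direction
    return per_lane * lanes


if __name__ == "__main__":
    pci_clock = 100 / 3   # nominal 33.33 MHz
    agp_clock = 200 / 3   # nominal 66.67 MHz
    print("PCI 32-bit/33 MHz :", round(parallel_bus_mb_s(32, pci_clock)), "MB/s")
    print("PCI 64-bit/66 MHz :", round(parallel_bus_mb_s(64, agp_clock)), "MB/s")
    print("AGP 1x            :", round(parallel_bus_mb_s(32, agp_clock, 1)), "MB/s")
    print("AGP 2x            :", round(parallel_bus_mb_s(32, agp_clock, 2)), "MB/s")
    print("PCIe x16, one way :", round(pcie_gen1_mb_s(16)), "MB/s")
    print("PCIe x16, both    :", round(2 * pcie_gen1_mb_s(16)), "MB/s")
```

The AGP multiplier (1x, 2x) is modeled as extra transfers per clock on the same 32-bit, 66 MHz bus, which is why AGP 2x lands on the same number as 64-bit, 66 MHz PCI.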

Page 8: GPU Programming Overview

PC Graphics Software Architecture

- The application, 3D API, and driver are written in C or C++
- The vertex and pixel programs are written in a high-level shading language (Cg, DirectX HLSL, OpenGL Shading Language)
- Pushbuffer: contains the commands to be executed on the GPU

Page 9: GPU Programming Overview

Hardware Graphics Pipelines

Page 10: GPU Programming Overview

GPU Fundamentals: The Graphics Pipeline

■ A simplified graphics pipeline
  Note that pipe widths vary
  Many caches, FIFOs, and so on are not shown

[Figure: simplified graphics pipeline. CPU side: Application. GPU side: Transform, Rasterizer, Shade, Video Memory (Textures). Data between stages: Vertices (3D); Xformed, Lit Vertices (2D); Fragments (pre-pixels); Final pixels (Color, Depth). Also shown: Graphics State, Render-to-texture.]

Page 11: GPU Programming Overview

Stream Program => GPU

■ A stream is a sequence of data (numbers, colors, RGBA vectors, …)
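As a minimal sketch of the stream idea, the code below treats plain Python lists as streams and applies one small function to every element independently. The term "kernel" and the names `map_kernel`, `scale_and_add`, and `brighten` are assumptions made for illustration; they do not come from the slide. On a GPU the per-element calls would run in parallel, whereas here they run one after another.

```python
def map_kernel(kernel, *streams):
    """Apply `kernel` to corresponding elements of one or more streams.

    Each output element depends only on the input elements at the same
    position, which is what makes the work data-parallel.
    """
    return [kernel(*elements) for elements in zip(*streams)]


def scale_and_add(x, y):
    """Example kernel on numbers: 2*x + y."""
    return 2.0 * x + y


def brighten(rgba):
    """Example kernel on RGBA vectors: scale color by 1.5, clamp to 1.0."""
    r, g, b, a = rgba
    return (min(r * 1.5, 1.0), min(g * 1.5, 1.0), min(b * 1.5, 1.0), a)


if __name__ == "__main__":
    print(map_kernel(scale_and_add, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))
    print(map_kernel(brighten, [(0.5, 0.25, 1.0, 1.0)]))
```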

Page 12: GPU Programming Overview

GPU Fundamentals: The Modern Graphics Pipeline

■ Programmable vertex processor!

■ Programmable pixel processor!

[Figure: modern graphics pipeline. CPU side: Application. GPU side: Vertex Processor, Rasterizer, Pixel (Fragment) Processor, Video Memory (Textures). Data between stages: Vertices (3D); Xformed, Lit Vertices (2D); Fragments (pre-pixels); Final pixels (Color, Depth). Also shown: Graphics State, Render-to-texture.]

Page 13: GPU Programming Overview

GPU Pipeline: Transform

■ Vertex Processor (multiple operate in parallel)
  Transform from “world space” to “image space”
  Compute per-vertex lighting
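The two vertex-processor jobs can be sketched in a few lines. This is an illustration under stated assumptions, not how any particular GPU implements the stage: `TOY_PROJECTION` is a hypothetical matrix that only sets w = -z (camera looking down the negative z axis), the viewport mapping assumes a top-left pixel origin, and the lighting is a simple diffuse term with unit-length vectors.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def to_image_space(vertex, matrix, width, height):
    """Transform a 3D vertex to pixel coordinates plus a depth value."""
    x, y, z, w = mat_vec(matrix, [vertex[0], vertex[1], vertex[2], 1.0])
    ndc_x, ndc_y = x / w, y / w            # perspective divide
    px = (ndc_x + 1.0) * 0.5 * width       # [-1, 1] -> [0, width]
    py = (1.0 - ndc_y) * 0.5 * height      # flip y: row 0 is the top
    return (px, py, z / w)


def diffuse(normal, light_dir):
    """Per-vertex diffuse lighting: max(0, N . L) for unit vectors."""
    return max(0.0, sum(n * l for n, l in zip(normal, light_dir)))


# Hypothetical toy projection: leaves x, y, z alone and sets w = -z.
TOY_PROJECTION = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, -1.0, 0.0],
]

if __name__ == "__main__":
    print(to_image_space((1.0, 1.0, -2.0), TOY_PROJECTION, 640, 480))
    print(diffuse((0.0, 0.0, 1.0), (0.0, 0.0, 1.0)))
```

Each vertex is processed without reference to any other vertex, which is why several vertex processors can work in parallel.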

Page 14: GPU Programming Overview

GPU Pipeline: Rasterizer

■ Rasterizer
  Convert geometric representation (vertex) to image representation (fragment)
  • Fragment = image fragment: a pixel plus associated data (color, depth, stencil, etc.)
  Interpolate per-vertex quantities across pixels
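A minimal software sketch of what the rasterizer does, assuming 2D vertices already in pixel coordinates and using barycentric weights from edge functions (one common approach; the slide does not say which method the hardware uses). The names `edge` and `rasterize` are hypothetical.

```python
def edge(a, b, p):
    """Signed area term: positive or negative depending on which side of a->b p lies."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(v0, v1, v2, c0, c1, c2, width, height):
    """Return fragments (x, y, color) for pixel centers inside the triangle.

    The per-vertex colors c0, c1, c2 are interpolated with barycentric
    weights, so every fragment carries its own interpolated data.
    """
    area = edge(v0, v1, v2)
    if area == 0:
        return []  # degenerate triangle covers no pixels
    fragments = []
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)  # sample at the pixel center
            w0 = edge(v1, v2, p) / area
            w1 = edge(v2, v0, p) / area
            w2 = edge(v0, v1, p) / area
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                color = tuple(w0 * a + w1 * b + w2 * c
                              for a, b, c in zip(c0, c1, c2))
                fragments.append((x, y, color))
    return fragments


if __name__ == "__main__":
    red, green, blue = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
    for fragment in rasterize((0, 0), (4, 0), (0, 4), red, green, blue, 4, 4):
        print(fragment)
```

Dividing by the signed area makes the weights sum to one and keeps the inside test independent of the triangle's winding order.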

Page 15: GPU Programming Overview

GPU Pipeline: Shade

■ Fragment Processors (multiple in parallel)
  Compute a color for each pixel
  Optionally read colors from textures (images)
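The shading step can be sketched as a function called once per fragment. The version below is an illustration only: it assumes nearest-texel sampling with (u, v) in [0, 1] and combines the texel with the fragment's interpolated color by multiplication, which is one of several possible choices. `sample_nearest` and `shade_fragment` are hypothetical names.

```python
def sample_nearest(texture, u, v):
    """Read the texel nearest to (u, v); `texture` is a list of rows of RGB tuples."""
    height = len(texture)
    width = len(texture[0])
    x = min(int(u * width), width - 1)    # clamp so u == 1.0 stays in range
    y = min(int(v * height), height - 1)
    return texture[y][x]


def shade_fragment(base_color, uv=None, texture=None):
    """Compute a fragment's color, optionally modulated by a texture read."""
    if texture is None or uv is None:
        return base_color
    texel = sample_nearest(texture, uv[0], uv[1])
    return tuple(b * t for b, t in zip(base_color, texel))


if __name__ == "__main__":
    white, black = (1.0, 1.0, 1.0), (0.0, 0.0, 0.0)
    checker = [[white, black],
               [black, white]]
    print(shade_fragment((0.5, 0.5, 0.5)))                        # untextured
    print(shade_fragment((0.5, 0.5, 0.5), (0.25, 0.25), checker))  # white texel
    print(shade_fragment((0.5, 0.5, 0.5), (0.75, 0.25), checker))  # black texel
```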

Page 16: GPU Programming Overview

Programming Graphics Hardware

Page 17: GPU Programming Overview

1995-1998: Texture Mapping and Z-Buffer

- PCI: Peripheral Component Interconnect
- 3dfx’s Voodoo

Page 18: GPU Programming Overview

Texture Mapping

Page 19: GPU Programming Overview

Texture Mapping: Perspective-Correct Interpolation

Page 20: GPU Programming Overview

Texture Mapping: Perspective-Correct Interpolation

Page 21: GPU Programming Overview

1998: Multitexturing

- AGP: Accelerated Graphics Port
- NVIDIA’s TNT, ATI’s Rage

Page 22: GPU Programming Overview

Multitexturing

Light Mapping

Page 23: GPU Programming Overview

1999-2000: Transform and Lighting

- Register Combiner: offers many more texture/color combinations
- NVIDIA’s GeForce 256 and GeForce2, ATI’s Radeon 7500

Page 24: GPU Programming Overview

Bump Mapping

Page 25: GPU Programming Overview

Environment Mapping


Page 26: GPU Programming Overview

Projective Texture Mapping

Page 27: GPU Programming Overview

2001: Programmable Vertex Shader

- Z-Cull: predicts which fragments will fail the Z test and discards them
- Texture Shader: offers more texture addressing modes and operations
- NVIDIA’s GeForce3 and GeForce4 Ti, ATI’s Radeon 8500

A programmable processor for any per-vertex computation

Page 28: GPU Programming Overview

Volume Texture Mapping

Page 29: GPU Programming Overview

2002-2003: Programmable Pixel Shader

- MRT: Multiple Render Target
- NVIDIA’s GeForce FX, ATI’s Radeon 9600 to 9800

A programmable processor for any per-pixel computation

Page 30: GPU Programming Overview

Shader: Static vs. Dynamic flow control

■ Static flow control
  Condition varies per batch of triangles

■ Dynamic flow control
  Condition varies per vertex or pixel

■ Full flow control
  Static and dynamic flow control
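The difference is in what the branch condition depends on. The sketch below is a CPU-side analogy with hypothetical names (`shade_batch_static`, `shade_batch_dynamic`) and a made-up "darken" operation; it illustrates the distinction only and says nothing about how shader hardware executes branches.

```python
def darken(value):
    """Stand-in for some extra shading work."""
    return value * 0.5


def shade_batch_static(pixels, use_darken):
    """Static flow control: the condition is fixed for the whole batch.

    `use_darken` plays the role of a per-batch constant, so the branch is
    resolved once and every element follows the same path.
    """
    if use_darken:
        return [darken(p) for p in pixels]
    return list(pixels)


def shade_batch_dynamic(pixels, threshold):
    """Dynamic flow control: the condition is evaluated per element.

    Each pixel's own value decides which path it takes, so neighbors in
    the same batch may diverge.
    """
    return [darken(p) if p > threshold else p for p in pixels]


if __name__ == "__main__":
    batch = [0.25, 1.0]
    print(shade_batch_static(batch, use_darken=True))
    print(shade_batch_dynamic(batch, threshold=0.5))
```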

Page 31: GPU Programming Overview

2004: Shader Model 3.0 and 64-bit Color Support

- PCIe: Peripheral Component Interconnect Express
- NVIDIA’s GeForce 6800

Page 32: GPU Programming Overview

Fixed-function pipeline

[Figure: pipeline block diagram. Stages: 3D Application or Game; 3D API (OpenGL or Direct3D); GPU Front End; Programmable Vertex Processor; Primitive Assembly; Rasterization and Interpolation; Programmable Fragment Processor; Raster Operations; Frame Buffer. Data streams: 3D API Commands; GPU Command & Data Stream across the CPU-GPU Boundary (AGP/PCIe); Vertex Index Stream; Pre-transformed Vertices; Transformed Vertices; Assembled Primitives; Pixel Location Stream; Pre-transformed Fragments; Transformed Fragments; Pixel Updates.]

Page 33: GPU Programming Overview

Programmable pipeline

[Figure: pipeline block diagram. Stages: 3D Application or Game; 3D API (OpenGL or Direct3D); GPU Front End; Programmable Vertex Processor; Primitive Assembly; Rasterization and Interpolation; Programmable Fragment Processor; Raster Operations; Frame Buffer. Data streams: 3D API Commands; GPU Command & Data Stream across the CPU-GPU Boundary (AGP/PCIe); Vertex Index Stream; Pre-transformed Vertices; Transformed Vertices; Assembled Primitives; Pixel Location Stream; Pre-transformed Fragments; Transformed Fragments; Pixel Updates.]

Page 34: GPU Programming Overview

Real-time Tone Mapping

■ The image is entirely computed in 64-bit color and tone-mapped for display
  64-bit color
  • 16-bit floating-point value per channel (R, G, B, A)
  Tone mapping
  • Maps an HDRI (High Dynamic Range Image) to a low dynamic range device

[Figure: the same scene from low to high exposure]
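A small sketch of both ideas on this slide. The packing shows why four 16-bit floats make a 64-bit pixel. The tone-mapping curve x / (1 + x) is a simple, well-known operator chosen here purely for illustration; the slide does not say which operator the demo uses. The `exposure` parameter and the function names are assumptions.

```python
import struct


def pack_rgba16f(r, g, b, a):
    """Pack one pixel as four half-precision floats: 4 x 16 bits = 64 bits."""
    return struct.pack("<4e", r, g, b, a)


def tone_map(value):
    """Compress an HDR value in [0, inf) into [0, 1) with x / (1 + x)."""
    return value / (1.0 + value)


def to_display(hdr_pixel, exposure=1.0):
    """Convert an HDR RGBA pixel to 8-bit RGB for a low dynamic range display."""
    r, g, b, _alpha = hdr_pixel
    return tuple(int(255 * tone_map(c * exposure) + 0.5) for c in (r, g, b))


if __name__ == "__main__":
    pixel = (1.0, 3.0, 0.0, 1.0)        # channel values above 1.0 are allowed
    packed = pack_rgba16f(*pixel)
    print(len(packed) * 8, "bits per pixel")
    for exposure in (0.25, 1.0, 4.0):   # low to high exposure
        print(exposure, to_display(pixel, exposure))
```

Raising `exposure` before applying the curve brightens the result, which mirrors the low-to-high exposure series shown on the slide.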

Page 35: GPU Programming Overview

2005: NVIDIA GeForce 7800

■ NVIDIA GeForce 7800
  NVIDIA SLI (Scalable Link Interface) Technology
  • Dramatically scales performance by allowing two graphics cards to run in parallel
  64-bit floating-point texture filtering and blending
  Designed for PCI Express x16
  API support
  • Complete DirectX support, including the latest version of Microsoft DirectX 9.0 Shader Model 3.0
  • Full OpenGL support, including OpenGL 2.0

Page 36: GPU Programming Overview

Radiosity

■ A visual effect that shows how light bounces off some objects and contributes to the final lighting of another object

NVIDIA Demo: Mad Mod Mike

Page 37: GPU Programming Overview

The Future

■ Unified general programming model at the primitive, vertex, and pixel levels

■ Scary amounts of:
  Floating-point horsepower
  Video memory
  Bandwidth between system and video memory

■ Lower chip costs and power requirements to make 3D graphics hardware ubiquitous
  Automotive (gaming, navigation, head-up displays)
  Home (remotes, media center, automation)
  Mobile (PDAs, cell phones)

Page 38: GPU Programming Overview

Programming the GPU

Page 39: GPU Programming Overview

The Evolution of GPU Programming Language

Page 40: GPU Programming Overview

Programmable Pipeline

Page 41: GPU Programming Overview

Programmable Pipeline

Page 42: GPU Programming Overview

GPU Programming

■ GPU Programming
  Low-level language
  • Assembler-like
  • Best performance
  • Platform-dependent
  • Vertex programming, fragment programming
  • Ex) OpenGL extensions, DirectX 9
  High-level shading language
  • Easier programming
  • Easier code reuse
  • Easier debugging
  • Easy to read
  • Ex) Cg, HLSL, GLSL

Page 43: GPU Programming Overview

Assembly vs. High-Level Language

Page 44: GPU Programming Overview

Data Flow through Pipeline

Page 45: GPU Programming Overview

GPU Programming

■ GPU Programming
  Low-level language
  • OpenGL extensions: GL_ARB_vertex_program, GL_ARB_fragment_program
  • DirectX 9: Vertex Shader 2.0, Pixel Shader 2.0
  High-level shading language
  • Cg: “C for Graphics”, by NVIDIA
  • HLSL: “High-Level Shading Language”, part of DirectX 9 (Microsoft)
  • GLSL: “OpenGL 2.0 Shading Language”, proposal by 3Dlabs

HLSL and Cg are much more similar to each other than they are to GLSL

Page 46: GPU Programming Overview

Workflow in Cg

Page 47: GPU Programming Overview

Reference

■ Reference
  Course notes
  • EG2004
  • SIGGRAPH2004
  • VIS2004
  David Luebke, General-Purpose Computation on Graphics Hardware
  Daniel Weiskopf, Basics of GPU-Based Programming
  Cyril Zeller, Introduction to the Hardware Graphics Pipeline
  Randy Fernando, Programming the GPU
  Suresh Venkatasubramanian, GPU Programming and Architecture
  GPGPU: http://www.gpgpu.org/
  GPU Programming: http://euclid.uits.iupui.edu/wiki/index.php/GPU_Programming
  Shader::Tech: http://www.shadertech.com/
  NVIDIA Developer: http://developer.nvidia.com/object/gpu_programming_guide.html
  GPGPU DEVELOPER RESOURCES