8/3/2019 CUDA (body)
1/12
Compute Unified Device Architecture
Department of Computer Science Page 3
ACKNOWLEDGEMENT
The satisfaction that accompanies the successful completion of any task would be incomplete without
mentioning the people who made it possible and whose constant encouragement and guidance has been
a source of inspiration throughout the course.
I express my gratitude to Dr. H. C. Nagaraj, Principal for his continuous efforts in creating a
competitive environment in college and encouragement through this course.
I express my gratitude to Dr. Nalini N, H.O.D., Department of Computer Science and Engineering, for
her help and support.
I would like to express my deep sense of gratitude to our guide Ms. Prathibha Ballal, Assistant
Professor, CSE, for providing constant support and motivation both in class and outside it,
without which this seminar would not have been a reality.
I am thankful to the entire Department of Computer Science and Engineering for its co-operation and
suggestions.
I also thank all my friends who have helped me and proved to be a constant source of support.
Kamal Datta
Table of Contents
S No. Description Page No.
1. Introduction 5
2. CUDA (Compute Unified Device Architecture) 6
3. CPU (Central Processing Unit) and GPU (Graphical Processing Unit) 8
4. Comparison (CPU vs. GPU) 10
5. DirectX Support and Latest GPUs 12
6. Conclusion 13
7. References 14
CHAPTER 1
INTRODUCTION
If you work with a lot of complex programs on your PC, you need a lot of processing capacity.
For the longest time, you were forced to spend on more powerful CPUs to get better performance.
However, the GPU (Graphics Processing Unit), which can be found on the graphics card, can now
offload that work. Games demand complex processes that need to be carried out in real-time, which
means that a GPU often has to process more than the CPU. Thus, graphics cards are clocked at up to
1000 MHz and have super-fast memory with up to 2 GB of dedicated RAM. You can hardly ask for a better
co-processor in the system.
Clever programmers, supported by GPU manufacturers, came up with the idea to use the
processing capacity of graphics cards in a different way: for video processing, flow simulations or
market price predictions. Three years ago, NVidia developed CUDA (Compute Unified Device
Architecture), a programming environment through which some program processes can be run on the
graphics chip. Only NVidia chips from the GeForce 8000 series onwards supported this. Their
competitor AMD supports the general standard OpenCL, pioneered by the Khronos Group (which
NVidia now also supports), using which you can share your program's workloads among
OpenCL-compatible processors (CPUs and GPUs). Even Microsoft approved of this development,
equipping the new DirectX 11 instruction set with a new interface (Direct Compute), using which
you can run program processes on the GPU.
CHAPTER 2
CUDA (Compute Unified Device Architecture)
CUDA is a parallel computing platform and programming model invented by NVIDIA. It enables dramatic increases in computing performance by harnessing the power of the graphics
processing unit (GPU). With millions of CUDA-enabled GPUs sold to date, software developers,
scientists and researchers are finding broad-ranging uses for GPU computing with CUDA. Here are a
few examples:
1. Identify hidden plaque in arteries: Heart attacks are the leading cause of death worldwide. Harvard Engineering, Harvard Medical School and Brigham & Women's Hospital have teamed
up to use GPUs to simulate blood flow and identify hidden arterial plaque without invasive
imaging techniques or exploratory surgery.
2. Analyze air traffic flow: The National Airspace System manages the nationwide coordination of air traffic flow. Computer models help identify new ways to alleviate congestion and keep
airplane traffic moving efficiently. Using the computational power of GPUs, a team at NASA
obtained a large performance gain, reducing analysis time from ten minutes to three seconds.
3. Visualize molecules: A molecular simulation called NAMD (nanoscale molecular dynamics) gets a large performance boost with GPUs. The speed-up is a result of the parallel architecture of
GPUs, which enables NAMD developers to port compute-intensive portions of the application to the GPU using the CUDA Toolkit.
You're faced with imperatives: improve performance, solve a problem more quickly. Parallel
processing would be faster, but the learning curve is steep, isn't it? Not anymore. With CUDA, you can
send C, C++ and FORTRAN code straight to the GPU, no assembly language required. Developers at
companies such as Adobe, ANSYS, Autodesk, MathWorks and Wolfram Research are waking that
sleeping giant, the GPU, to do general-purpose scientific and engineering computing across a range
of platforms. Using high-level languages, GPU-accelerated applications run the sequential part of their
workload on the CPU, which is optimized for single-threaded performance, while accelerating parallel
processing on the GPU. This is called GPU computing.
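The "C code straight to the GPU" idea can be made concrete with a minimal CUDA C sketch. The vector addition below is a hypothetical example of our own, not taken from any application named above: the `__global__` kernel runs on the GPU with one thread per element, and the `<<<blocks, threads>>>` launch syntax is CUDA's extension to C.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Kernel: runs on the GPU; each thread adds one pair of elements.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard against overshoot
        c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1 << 20;                 // about a million elements
    size_t bytes = n * sizeof(float);
    float *ha = (float *)malloc(bytes);
    float *hb = (float *)malloc(bytes);
    float *hc = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    float *da, *db, *dc;                   // device (GPU) copies
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    int threads = 256;                     // threads per block
    int blocks = (n + threads - 1) / threads;  // enough blocks to cover n
    vecAdd<<<blocks, threads>>>(da, db, dc, n);

    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("hc[0] = %.1f\n", hc[0]);       // 1.0 + 2.0 = 3.0
    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}
```

Compiled with nvcc, the one source file contains both the sequential host part and the parallel device part, mirroring the CPU/GPU division of labor described above.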
GPU computing is possible because today's GPU does much more than render graphics: It sizzles
with a teraflop of floating point performance and crunches application tasks designed for anything from
finance to medicine. CUDA is widely deployed through thousands of applications and published
research papers and supported by an installed base of over 300 million CUDA-enabled GPUs in
notebooks, workstations, compute clusters and supercomputers.
CHAPTER 3
CPU (Central Processing Unit) and GPU (Graphical Processing Unit)
CPU:
GPUs and CPUs function very differently, and handle different types of processes better. Current
CPUs have up to four cores, with maybe double that number in the form of virtual cores via Hyper-Threading
(to use otherwise idle processing capacity), giving up to eight logical processing cores per CPU. Six-core/12-thread
CPUs will become common soon. This means a large number of processing threads can run in
parallel. CPU cores are developed for general tasks, so they are flexible and can cope with varied
situations. They are also designed to accommodate completely different threads in every single clock
tick on every single core.
Hyper-Threading:
Hyper-Threading is a form of simultaneous multithreading. It is used to improve the
parallelization of computation (doing multiple tasks at once). For each processor core that is physically
present, the operating system addresses two virtual processors and shares the workload between them
when possible. Hyper-Threading works by duplicating certain sections of the processor, those that store
the architectural state (the part of the CPU which holds the state of the process, including
control registers and general-purpose registers), but not duplicating the main execution
resources. This allows a hyper-threading processor to appear as two logical processors to the host
operating system. When execution resources would not be used by the current task in a processor
without hyper-threading, and especially when the processor is stalled, a hyper-threading-equipped
processor can use those execution resources to execute another scheduled task.
Graphical Processing Unit:
Today, graphics chips offer up to 240 cores, 40 times more than modern CPUs, even though
they are conceptually different and cannot handle many different types of tasks. The Radeon 5000 series
by ATI has up to 1,600 processing units. These cores (also known as stream processors in GPUs) take
on one thread each, but are packed into clusters that can only apply one processing operation to
the threads they handle. A GPU is therefore unsuitable for complex tasks which require multiple
different types of processing.
However, many programs demand only one operation, such as counting the number of times a
particular word has appeared in a book. It is exactly here that a GPU can show off its awesome total
processing power. The CPU has to start on page 1, go through the text word by word, and stop at the last
page. A GPU, on the other hand, divides the book into many small parts, distributes them to all its cores,
and then simply counts the appearances of the word in a fraction of the time. The real-world
processes that best use this capability are found in video editing and scientific work. There are no book
pages, but instead repeated additions and multiplications of floating point numbers in big matrices;
always the exact same operation carried out for thousands of numbers.
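The book-counting analogy maps almost directly onto a CUDA kernel. The sketch below is illustrative (the kernel name and the chunking scheme are our own): counting a single character for simplicity, each thread scans its share of the text and atomically adds its matches to one shared counter.

```cuda
#include <stdio.h>
#include <string.h>
#include <cuda_runtime.h>

// Each thread inspects a disjoint set of characters ("pages") and
// atomically accumulates its matches into one global counter.
__global__ void countChar(const char *text, int n, char target, int *count) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Grid-stride loop: the whole "book" is divided among all threads.
    for (; i < n; i += blockDim.x * gridDim.x)
        if (text[i] == target)
            atomicAdd(count, 1);
}

int main(void) {
    const char book[] = "to be or not to be";
    int n = (int)strlen(book);

    char *dText; int *dCount;
    cudaMalloc(&dText, n);
    cudaMalloc(&dCount, sizeof(int));
    cudaMemcpy(dText, book, n, cudaMemcpyHostToDevice);
    cudaMemset(dCount, 0, sizeof(int));

    countChar<<<4, 64>>>(dText, n, 'o', dCount);

    int count = 0;
    cudaMemcpy(&count, dCount, sizeof(int), cudaMemcpyDeviceToHost);
    printf("'o' appears %d times\n", count);   // 4 in this string
    cudaFree(dText); cudaFree(dCount);
    return 0;
}
```

A production version would first reduce per block in shared memory to cut contention on the atomic counter, but the pattern is the same one the matrix workloads use: the exact same operation applied independently to thousands of data items.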
If a program demands numerous types of processes, a GPU cannot keep up because of slower
clock rates and restrictions in individual processing steps. The flexible cores of the CPU have a strong
advantage at this point. However, if the processes and data packets are very similar, the GPU cores,
which are arranged in parallel masses, get cracking with spectacular results.
The specialization of GPU core design is, however, not the main limitation when
programming software. The greatest difficulty is posed by parallelism. It must be possible to divide a
program into at least 240 parts (or threads) to be able to use 240 cores. These must all be completely
independent of each other so that one thread can be processed parallel to another. In the end, you do not
know which thread is processed when, so even the sequence needs to be irrelevant. Current CPUs also
experience the same problem, but you have only eight threads to grapple with, not 240 or more.
Many programs have problems with eight threads as well, and it's true that there is hardly any
software today that makes full use of a current CPU. Many programs cannot be parallelized, or it is
extremely difficult to do so. The reason lies with compilers, the tools that build working software
from program code, which need to analyze the code and find elements that can be
parallelized. However, this automatic mechanism often fails. If the input of a process depends on the
output of another process, and it is not certain when either process takes place,
automatically parallelizing the sequence fails.
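The dependence problem described above can be seen in two tiny loops. The array names and functions below are hypothetical, written as plain C for clarity:

```cuda
// Independent iterations: c[i] uses only a[i] and b[i], never another
// element of c[], so each iteration can safely become its own GPU thread.
void scale_add(const float *a, const float *b, float *c, int n) {
    for (int i = 0; i < n; ++i)
        c[i] = 2.0f * a[i] + b[i];
}

// Dependent iterations: each s[i] needs s[i-1], the output of the
// previous step, so the order matters and a compiler cannot simply
// hand one iteration to each of 240 cores.
void prefix_sum(const float *a, float *s, int n) {
    s[0] = a[0];
    for (int i = 1; i < n; ++i)
        s[i] = s[i - 1] + a[i];
}
```

The second loop can in fact be parallelized, but only by restructuring the algorithm (a parallel scan), which is exactly the kind of rewrite that automatic compiler analysis fails at and a human programmer must supply.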
CHAPTER 4
Comparison (CPU vs. GPU)
Fig 4.1
Utilizing the graphics card power can result in massive speed advantages. If you have an ATI graphics
card of the HD 3000 or HD 4000 series, or an NVidia graphics card from the GeForce 8000 series
onwards, your PC is ready for a massive performance leap. You can check whether your NVidia
graphics card is CUDA compatible with the Cuda-Z tool (http://sourceforge.net/projects/cuda-z/).
You only need suitable software now. Not many free tools that benefit from the power of a
graphics card are available at present, but you can achieve a massive improvement in performance by using
paid ones. Multimedia tools, especially video and photo editors, in which the same process is
carried out on a lot of independent data, are ideal for running on a GPU. A majority of the programs
which benefit from the power of graphics cards can be found within the scope of video conversion and
processing.
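Instead of the Cuda-Z tool, a few lines against the CUDA runtime API can perform the same check. A sketch (exact fields reported may vary by CUDA version):

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Lists every CUDA-capable GPU in the system along with its compute
// capability and memory, similar to what the Cuda-Z tool reports.
int main(void) {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("No CUDA-capable GPU found.\n");
        return 1;
    }
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        printf("Device %d: %s, compute capability %d.%d, %zu MB RAM\n",
               d, prop.name, prop.major, prop.minor,
               prop.totalGlobalMem >> 20);   // bytes -> megabytes
    }
    return 0;
}
```

A GeForce 8000-series card reports compute capability 1.x; anything at or above that confirms the PC is CUDA-ready.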
Fig 4.2
If the requirements are different (i.e. different kinds of tasks are to be performed), any dual-core
processor can process two tasks in parallel (run two threads simultaneously), aided by hyper-threading
technology, whereas a graphical processing unit is incapable of processing two such threads simultaneously.
The GPU cannot work in parallel on complex problems and processes the tasks individually at a
much slower speed.
Fig 4.3
When the tasks to be done are identical, the GPU (graphical processing unit) has the advantage. The
dual-core CPU will still process only two tasks in parallel, even in the case of identical packets. The
slower-clocked GPU displays its strength with similar data: eight shaders run in parallel in the
graphical processing unit.
CHAPTER 5
DirectX Support and Latest GPUs
The power of a graphics card is far from exhausted though. NVidia's CUDA supports only one
graphics card right now, but combining the power of several in SLI configurations is in the offing.
DirectX 11 also makes GPUs secondary processing units. Microsoft has loaded its latest graphics
API with the Direct Compute programming interface. Direct Compute can be used to write programs
which use the power of the graphics card's unified shaders as independent compute units. These
programmable execution units can already be used as pixel, vertex and geometry shaders, but now they
will be able to handle functions outside of gaming and graphics. Since DirectX has been the most
popular programming interface for games since its inception in 1995, it is natural to assume that Direct
Compute will garner the same fan base amongst programmers. The basic advantage here is that it no
longer makes a difference whether you have an ATI, NVidia or Intel graphics card.
CHAPTER 6
Conclusion
Last, but not least, the world is not going to stick with GPUs in their traditional roles only. For a long
time, Intel has been planning its Larrabee project, a processor with several GPU-inspired cores, all of
which are as flexible as those in a CPU, and which is said to process graphics as well as programs.
AMD's Fusion initiative is also headed in a similar direction. And both ATI and NVidia already have
plug-in cards, named FireStream and Tesla respectively, which are certainly built with graphics
processors but are not intended for 3D applications at all (they do not even have video outputs). This is
the start of an era of co-processing. The CPU remains a vital part of the computer, but specialized
processors will take up other tasks, in the form of plug-in cards, additional chips on the motherboard,
and eventually, integration into the CPU.
CHAPTER 7
References
CHIP Magazine (April 2010)
http://www.nvidia.com
http://en.wikipedia.org/wiki/CUDA
http://developer.nvidia.com
http://www.cuda.com