GPU-accelerated Evaluation Platform for High Fidelity Networking Modeling
11 December 2007
Alex Donkers
Joost Schutte
Contents
Summary of the paper
Evaluation
Questions
Summary of the paper
Using commercial graphics cards to speed up the execution of network simulation models.
Network simulators:
• high-fidelity performance evaluation requires more detailed models
• more detailed models mean higher computation cost, motivating a speed-up technique
GPU = graphics processing unit
The gap in computational power between GPU and CPU is widening.
Computational power of GPU and CPU
(courtesy of Ian Buck, Stanford Univ.)
GPU superior because:
• stream processing model
• spatial parallelism
Necessities for GPU usage:
• identification of data parallelism in network simulations
• software abstraction
Goal: design an evaluation platform architecture for efficient utilisation of the computational processors of GPUs and CPU, memory, I/O and other resources available in commodity desktops.
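As an aside, the stream-processing model can be sketched in a few lines. NumPy stands in for the GPU here, and the kernel function is a made-up illustration, not taken from the paper:

```python
import numpy as np

# Stream-processing model: one arithmetic kernel is applied independently
# to every element of a large input stream. Because elements never
# interact, the work maps directly onto the GPU's spatially parallel units.
def kernel(x):
    # an illustrative arithmetic-intensive per-element computation
    return np.sin(x) * np.sqrt(np.abs(x)) + 1.0

stream = np.linspace(0.0, 10.0, 1_000_000)

# Sequential (CPU-style) execution: one element at a time.
seq = np.array([kernel(v) for v in stream[:4]])

# Data-parallel (GPU-style) execution: the whole stream in one call.
par = kernel(stream)
```

Both paths give identical results precisely because the elements are independent, which is what lets the computation be distributed across many GPU fragment processors.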
Commodity desktops can be equipped with multiple GPUs; with nVidia SLI technology, several GPUs fit in a single system.
Suitability for different types of computation:
• CPU = high performance on a single thread of execution
• GPU = many more arithmetic units; extremely high data-parallel and instruction-parallel execution
Evaluating high-fidelity network models involves:
• task-parallel computation on multiple CPUs
• data-parallel computation on GPUs
Features necessary for GPU acceleration:
• highly data-parallel
• arithmetic-intensive
The power of GPUs is shown by implementing two cases from a network environment on both CPU and GPU. Compared are the speed and accuracy of the simulation results.
Two cases:
• Fluid-flow-based TCP model = predicts the traffic dynamics at active queue management routers.
• Adaptive antenna model = calculates the weights of the beamformer in the direction minimizing mean squared error.
Fluid-flow-based TCP model
• TCP flows and active queue management routers are modelled with stochastic differential equations
• The stochastic differential equations are transformed into ordinary differential equations (ODEs) for CPU use
• The CPU-based implementation uses an ODE solver, ODE45, provided in Matlab
• The GPU implementation maps all data structures in the CPU to on-board memory in the GPU
Fluid-flow-based TCP model
• The time-varying state of the routers requires periodic recomputation by the ODE solvers
• The execution speed of the model is significantly affected by the execution speed of the ODE solvers
• Implementing the ODE solver on the GPU can significantly increase the size of the network that can be evaluated
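To make the ODE formulation concrete, here is a minimal sketch of a fluid-flow TCP/AQM model solved with a fixed-step RK4 integrator vectorized over all flows. The delay terms, the RED controller, and every constant below are simplified illustrations, not the paper's actual model; the point is that one vectorized solver advances every flow at once, which is exactly the data parallelism the GPU exploits:

```python
import numpy as np

# Simplified fluid-flow TCP/AQM model: the state vector holds N
# congestion windows W plus one shared queue length q. All parameters
# (link capacity C, marking threshold q_ref, propagation delay a)
# are illustrative.
def derivs(y, C=1000.0, q_ref=100.0, a=0.05):
    W, q = y[:-1], y[-1]
    R = a + q / C                          # RTT = propagation + queueing delay
    p = min(q / (2.0 * q_ref), 1.0)        # toy marking/drop probability
    dW = 1.0 / R - 0.5 * (W / R) * W * p   # AIMD window dynamics
    dq = W.sum() / R - C                   # arrival rate minus link capacity
    return np.concatenate([dW, [dq]])

# Classical 4th-order Runge-Kutta step over the whole state vector.
def rk4_step(y, h):
    k1 = derivs(y)
    k2 = derivs(y + 0.5 * h * k1)
    k3 = derivs(y + 0.5 * h * k2)
    k4 = derivs(y + h * k3)
    y = y + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    y[-1] = max(y[-1], 0.0)                # queue length cannot go negative
    return y

n_flows = 256
y = np.concatenate([np.ones(n_flows), [0.0]])
for _ in range(2000):
    y = rk4_step(y, 1e-3)
```

Every flow's window obeys the same update rule on different data, so on a GPU each window would be handled by its own fragment processor, with only the queue reduction requiring a gather step.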
Adaptive antenna model
• Recursively updates the weights of the beamformers in the direction minimizing mean squared error (MSE)
• The recursive least squares (RLS) algorithm is used
• The data layout and operations on arrays of complex numbers are implemented in the GPU
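A minimal sketch of the RLS weight recursion the slide describes, using complex arrays as the GPU implementation must. The array size, forgetting factor, and noiseless reference signal are illustrative assumptions, not the paper's setup:

```python
import numpy as np

# Recursive least squares (RLS) beamformer update: the weights w are
# moved in the direction minimizing the mean squared error between the
# array output w^H x and a desired reference signal d.
rng = np.random.default_rng(0)
M = 8                                   # antenna elements (illustrative)
lam = 0.99                              # forgetting factor
w = np.zeros(M, dtype=complex)
P = np.eye(M, dtype=complex) * 100.0    # inverse correlation estimate

w_true = rng.standard_normal(M) + 1j * rng.standard_normal(M)
for _ in range(500):
    x = rng.standard_normal(M) + 1j * rng.standard_normal(M)  # array snapshot
    d = np.vdot(w_true, x)              # noiseless desired output w_true^H x
    # standard RLS recursion
    Px = P @ x
    k = Px / (lam + np.vdot(x, Px))     # gain vector k = Px / (lam + x^H P x)
    e = d - np.vdot(w, x)               # a-priori estimation error
    w = w + k * np.conj(e)              # weight update
    P = (P - np.outer(k, np.conj(x)) @ P) / lam
```

Every quantity here is a dense complex vector or matrix operation, which is why the paper's main engineering task is laying out complex-number arrays in GPU texture memory.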
Evaluation
Strong points
Weak points
Simulation models
Conclusion & Future work
Strong Points
• Highly data-parallel
• Arithmetic-intensive
Weak Points
• Processes consisting largely of sequential operations
• Processes requiring bit-wise operations
Solution: use a DSP platform
Real-time simulation
Evaluation of simulation models
Hardware platform: Dell Dimension desktop
• Intel (dual-core) 3 GHz Pentium 4 CPU
• 1 GB DDR2 memory
• nVidia GeForce 7900GTX with 512 MB texture memory
Vertex & fragment programs:
• programmed with OpenGL and GLSL
Simulation models
Differences between GPU- and CPU-based simulation for the fluid-flow-based TCP model:
• Difference in prediction of traffic dynamics
• Difference in execution time: the GPU outperforms the CPU with 256 flows & 256 queues or more, because of the larger number of iterations in the GPU-based ODE solver
Normalized ODE solver evaluation time
Simulation models
Adaptive antenna model
• The GPU-based simulation runs faster than the CPU-based one when the antenna array size exceeds 256
• The execution time of the GPU-based implementation decreases linearly with the number of sub-carriers, due to parallel processing
Simulation execution times
Conclusions & Future work
• GPUs can achieve a speedup of 10x without loss of accuracy
• High-fidelity network simulations can be accelerated by parallel use of CPU & GPU units
• Future work: integrate GPU-implemented modules into an existing simulation-based network evaluation platform
Questions?