walberla::core: an overview
Post on 19-Mar-2022
WaLBerla::Core:
An Overview
C. Feichtinger
Chair for System Simulation, University of Erlangen-Nuremberg, Erlangen, Germany
RRZE Seminar
22.6.10
C. Feichtinger Core 0.2
Core 0.2: Implementation details
Outline
Introduction to LBM
Introduction to WaLBerla
Software Design
Sweeps
Functionality Management
Patches and Blocks
Parallelization
GPU Performance Study
Implementation Details
Lattice Boltzmann Method
Brief Introduction
Mesoscopic method for CFD simulations
Equivalent to a finite difference Navier-Stokes scheme
Two major steps: Stream step and collision step
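$$f_\alpha(x_i + e_{\alpha,i}\,\delta t,\; t + \delta t) - f_\alpha(x_i, t) = -\frac{\delta t}{\tau}\left[ f_\alpha(x_i, t) - f_\alpha^{(eq)}\big(\rho(x_i, t),\, u_i(x_i, t)\big) \right]$$

$$\rho u_i = \sum_{\alpha=0}^{18} e_{\alpha,i}\, f_\alpha, \qquad \rho = \sum_{\alpha=0}^{18} f_\alpha$$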
The Software Framework WaLBerla
Widely Applicable Lattice Boltzmann Solver from Erlangen
Massively Parallel LB Framework
Designed to
support a wide range of functionalities required by CFD applications
minimize the integration effort of new functionality
Software Design of WaLBerla::Core
New Design Objectives
Library
Organization of functionality
Heterogeneous computing
Dynamic load balancing
Grid refinement
New data structures
Optimized dynamic simulations
Software Design of WaLBerla::Core
Sweeps: A Kernel Management Concept
[Figure: Sweep Concept. Diagram labels: Time loop; Sweep Chain I (Sweep I, Sweep II, Sweep III); Sweep Chain II (Sweep I, Sweep II); Sweep: Preprocessing, Post-processing, Communication, Timing, Visualization; Block Sweep; Global Sweep. Legend: Iteration, Execution Order, Dependency.]
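One way to picture the sweep concept in code is a chain of kernels, each with optional pre- and post-processing hooks, executed in order inside a time loop. The sketch below is an assumption about how such a scheme could look, not the WaLBerla implementation: the types `Sweep` and `SweepChain` and the functions `runChain` and `timeLoop` are hypothetical, and the figure does not say whether all chains run in every time step.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical sweep: a kernel plus optional hooks around it.
struct Sweep {
    std::string name;
    std::function<void()> pre;   // optional preprocessing, may be empty
    std::function<void()> work;  // the kernel itself, must be set
    std::function<void()> post;  // optional post-processing, may be empty
};

using SweepChain = std::vector<Sweep>;

// Execute the sweeps of one chain in their registered order.
void runChain(const SweepChain& chain) {
    for (const Sweep& s : chain) {
        assert(s.work && "every sweep needs a kernel");
        if (s.pre) s.pre();
        s.work();
        if (s.post) s.post();
    }
}

// Assumed time loop: every chain is executed once per time step.
void timeLoop(const std::vector<SweepChain>& chains, int steps) {
    for (int t = 0; t < steps; ++t) {
        for (const SweepChain& chain : chains) runChain(chain);
    }
}
```

Keeping timing, communication, and visualization in the hooks rather than in the kernel is what lets the same kernel be reused in different chains.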
Software Design of WaLBerla::Core
Functionality Management
UID   Name                     Granularity   Example
fs    Functionality Selector   Simulation    Gravity on/off
hs    Hardware Selector        Process       CPU and/or GPU
bs    Block Selector           Block         LBM
Examples
useFunction(LBMSweep_CPU, fsNoFeat, hsCPU, bsPureLBM);
useFunction(LBMSweep_GPU, fsNoFeat, hsGPU, bsPureLBM);
useFunction(LBMSweep_Grav, fsGravity, hsCPU, bsPureLBM);
useFunction(LBMSweep_FreeSurf_Grav, fsGravity, hsCPU, bsFreeSurface);
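The registrations above suggest a lookup table keyed by the three selectors. The following sketch shows one possible realization under that assumption; it is not the WaLBerla API. The enums `FS`, `HS`, `BS`, the `FunctionRegistry` class, and its `select` method are hypothetical, and kernels simply return their name so the selection is visible.

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical selector types mirroring fs, hs and bs.
enum class FS { NoFeat, Gravity };
enum class HS { CPU, GPU };
enum class BS { PureLBM, FreeSurface };

using Kernel = std::function<std::string()>;
using Key = std::tuple<FS, HS, BS>;

class FunctionRegistry {
public:
    // Register a kernel for exactly one selector combination.
    void useFunction(Kernel k, FS fs, HS hs, BS bs) {
        table_[Key{fs, hs, bs}] = std::move(k);
    }

    // Look up the kernel for the current simulation, process and block.
    const Kernel& select(FS fs, HS hs, BS bs) const {
        const auto it = table_.find(Key{fs, hs, bs});
        if (it == table_.end()) {
            throw std::out_of_range("no kernel registered for this selector combination");
        }
        return it->second;
    }

private:
    std::map<Key, Kernel> table_;
};
```

With such a table, a process running on a GPU and a block marked as pure LBM would resolve to the kernel registered with `HS::GPU` and `BS::PureLBM`, while an unregistered combination fails loudly instead of silently running the wrong kernel.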
Software Design of WaLBerla::Core
Patch Data Structure
Software Design of WaLBerla::Core
Patch Data Structures
Software Design of WaLBerla::Core
MPI Parallelization
Software Design of WaLBerla::Core
Data: B = all blocks allocated on the process
for block ∈ B do
    // Go over all neighboring blocks
    for nBlock ∈ N do
        if nBlock.isAllocated then   // nBlock lies on the current process
            for data ∈ D do
                sendData = extract(block.data, Direction To nBlock, fs, hs, bs);
                insert(nBlock.data, sendData, Direction To nBlock, fs, hs, bs);
            end
        else   // nBlock lies on a different process
            for data ∈ D do
                sendData = extract(block.data, Direction To nBlock, fs, hs, bs);
                sendData.addHeader();
                insert(sendBuffer[nBlock.rank], sendData, fs, hs, bs);
            end
        end
    end
end
Algorithm 1: Data Extraction
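The control flow of Algorithm 1 can be illustrated with a heavily simplified C++ sketch. Everything here is an assumption made for illustration: blocks sit in a one-dimensional row, each block exchanges a single boundary value instead of the data set D, the selectors fs, hs and bs are omitted, `isAllocated` is modelled as "owned by my rank", and the types `Block`, `Neighbor`, `Message` and the function `exchange` are hypothetical. No actual MPI call is made; remote data is only packed into per-rank send buffers.

```cpp
#include <map>
#include <vector>

// Payload plus the header that tells the receiving process where the data belongs.
struct Message {
    int targetBlock;
    int direction;
    double payload;
};

struct Block {
    int id;
    int rank;                      // owning process
    double left, right;            // boundary values (stand-in for boundary layers)
    double ghostLeft, ghostRight;  // ghost layers filled by neighbors
};

struct Neighbor {
    int blockId;
    int direction;  // -1: neighbor lies to the left, +1: to the right
};

// Read the boundary value facing the given direction.
double extract(const Block& b, int direction) {
    return direction < 0 ? b.left : b.right;
}

// Data sent to the right arrives in the receiver's left ghost layer, and vice versa.
void insert(Block& nBlock, double value, int direction) {
    (direction < 0 ? nBlock.ghostRight : nBlock.ghostLeft) = value;
}

using SendBuffers = std::map<int, std::vector<Message>>;

void exchange(std::vector<Block>& blocks,
              const std::vector<std::vector<Neighbor>>& neighbors,
              int myRank, SendBuffers& sendBuffer) {
    for (Block& block : blocks) {
        if (block.rank != myRank) continue;  // B: blocks allocated on this process
        for (const Neighbor& n : neighbors[block.id]) {
            Block& nBlock = blocks[n.blockId];
            const double sendData = extract(block, n.direction);
            if (nBlock.rank == myRank) {
                insert(nBlock, sendData, n.direction);  // local copy
            } else {
                // remote neighbor: add header and queue for its process
                sendBuffer[nBlock.rank].push_back({n.blockId, n.direction, sendData});
            }
        }
    }
}
```

Collecting all messages for one rank in a single buffer means that each pair of processes needs only one message per exchange, regardless of how many block faces they share.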
Software Design of WaLBerla::Core
Multi-GPU Implementation
Software Design of WaLBerla::Core
Heterogeneous Multi-GPU Implementation
LBM Performance Study
Multi-GPU Performance
LBM Performance Study
Single-GPU Performance
LBM Performance Study
Multi-GPU Performance
LBM Performance Study
Multi-GPU Performance
LBM Performance Study
Heterogeneous Multi-GPU Performance
Blocks             Nodes   Processes                MFLUPS
GPU: 1             1       2 x GPU                  476
GPU: 1             30      60 x GPU                 14480
GPU: 22, CPU: 1    1       2 x GPU + 6 x CPU        459
GPU: 22, CPU: 1    30      60 x GPU + 180 x CPU     13267
GPU: 22, CPU: 1    60      60 x GPU + 420 x CPU     15684
GPU: 22, CPU: 1    90      60 x GPU + 660 x CPU     17846
Logging
Modifications to the Logger class (core/src/Logging.h)
Log levels: no log, log info, log progress, log progress detail
Logging macros:
Always on: LOG_ERROR, LOG_WARNING, LOG_ASSERT, LOG_RESULT
Input file activated: LOG_INFO, LOG_PROGRESS
Makefile activated: LOG_PROGRESS_DETAIL
Logging sections: LOG_INFO_SEC(){ ... LOG_INFO(); }
The logger only creates a file if logging output has to be written
ROOT PROCCESS sections
Timing
PerfLogger and WallTimeLogger
Can be wrapped around any function
Start/End measuring with begin() and end()
PerfLogger also provides trigger()
WallTimeLogger provides min/avg/max times in parallel simulations
Examples
PerfLogger logLoop("Timeloop PerfLogger");
logLoop.begin();
logLoop.trigger();
logLoop.end();
PerfLogger logPDF("PDFSweep PerfLogger");
wrapFunction(pdfSweep, logPDF);
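The idea of wrapping a timer around an arbitrary function can be sketched with the C++ standard library alone. This is an illustration under stated assumptions, not the PerfLogger or WallTimeLogger implementation: the class `WallTimer` is hypothetical, it measures wall-clock time with `std::chrono::steady_clock`, and this `wrapFunction` is assumed to return a new callable rather than modify the original.

```cpp
#include <chrono>
#include <functional>
#include <string>
#include <utility>

// Hypothetical wall-clock timer with the begin()/end() interface from the slide.
class WallTimer {
    using Clock = std::chrono::steady_clock;

public:
    explicit WallTimer(std::string name) : name_(std::move(name)) {}

    void begin() { start_ = Clock::now(); }

    // Accumulate the time elapsed since the matching begin().
    void end() {
        total_ += std::chrono::duration<double>(Clock::now() - start_).count();
        ++calls_;
    }

    double seconds() const { return total_; }
    int calls() const { return calls_; }
    const std::string& name() const { return name_; }

private:
    std::string name_;
    Clock::time_point start_{};
    double total_ = 0.0;
    int calls_ = 0;
};

// Return a callable that runs f between timer.begin() and timer.end().
// The timer is captured by reference and must outlive the returned callable.
std::function<void()> wrapFunction(std::function<void()> f, WallTimer& timer) {
    return [f, &timer] {
        timer.begin();
        f();
        timer.end();
    };
}
```

In a parallel run, the per-process totals from `seconds()` could then be reduced across processes to obtain minimum, average and maximum times.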
Timing
PerfLogger and WallTimeLogger
[RESULT ]------(0.341 sec) -----------------------------------------------------------------
Final TimeLoop PerfLogger : 0 MFLUPS, 8.54671 MLUPS
Time: 0.137654
-----------------------------------------------------------------
[RESULT ]------(0.342 sec) -----------------------------------------------------------------
Walltime of Communication :
Min: 0.000949621 sec, Max: 0.000949621 sec, Avg: 0.000949621 sec
-----------------------------------------------------------------
[RESULT ]------(0.342 sec) -----------------------------------------------------------------
Final PDF Logger : 0 MFLUPS, 8.65434 MLUPS
Time: 0.135942
-----------------------------------------------------------------
Memory Management
Class for grid-based data: bd::Field<Type T, Uint CellSize>.
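A field class with this template signature could store `CellSize` values of type `T` per grid cell in one contiguous array. The sketch below is only a guess at such a layout and is not the bd::Field implementation: the constructor arguments, the accessor `operator()(x, y, z, f)`, and the array-of-structures memory order are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical grid field: CellSize values of type T per cell, stored contiguously.
template <typename T, std::size_t CellSize>
class Field {
public:
    Field(std::size_t nx, std::size_t ny, std::size_t nz)
        : nx_(nx), ny_(ny), nz_(nz), data_(nx * ny * nz * CellSize, T{}) {}

    // Access value f of cell (x, y, z).
    T& operator()(std::size_t x, std::size_t y, std::size_t z, std::size_t f) {
        return data_[index(x, y, z, f)];
    }
    const T& operator()(std::size_t x, std::size_t y, std::size_t z, std::size_t f) const {
        return data_[index(x, y, z, f)];
    }

    std::size_t size() const { return data_.size(); }

private:
    // Assumed layout: all values of one cell are adjacent, x runs fastest among cells.
    std::size_t index(std::size_t x, std::size_t y, std::size_t z, std::size_t f) const {
        assert(x < nx_ && y < ny_ && z < nz_ && f < CellSize);
        return ((z * ny_ + y) * nx_ + x) * CellSize + f;
    }

    std::size_t nx_, ny_, nz_;
    std::vector<T> data_;
};
```

For a D3Q19 lattice, `Field<double, 19>` would hold the particle distribution functions, while `Field<double, 1>` could hold a scalar such as density.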