![Page 1: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/1.jpg)
Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18
S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI
![Page 2: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/2.jpg)
Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18
S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI
![Page 3: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/3.jpg)
3
THE PROBLEM
![Page 4: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/4.jpg)
4
CPU BOTTLENECK OF DL TRAINING
Half precision arithmetic, multi-GPU, dense systems are now common (DGX1V, DGX2)
Can’t easily scale CPU cores (expensive, technically challenging)
Falling CPU to GPU ratio:
DGX1V: 40 cores, 8 GPUs, 5 cores/ GPU
DGX2: 48 cores , 16 GPUs , 3 cores/ GPU
CPU : GPU ratio
![Page 5: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/5.jpg)
5
CPU BOTTLENECK OF DL TRAININGComplexity of I/O pipeline
2015
2012
![Page 6: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/6.jpg)
6
CPU BOTTLENECK OF DL TRAININGIn practice
Higher is
better
When we put 2x GPU we don’t get adequate perf improvement
Goal: 2x
8GPU 16GPU
![Page 7: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/7.jpg)
7
CPU BOTTLENECK OF DL TRAININGIn practice
Higher is
better
Reality: < 2x
When we put 2x GPU we don’t get adequate perf improvement
8GPU 16GPU
Goal: 2x
![Page 8: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/8.jpg)
8
DALI TO THE RESCUE
![Page 9: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/9.jpg)
9
WHAT IS DALI?High Performance Data Processing Library
![Page 10: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/10.jpg)
10
DALI RESULTSRN50 MXNet
2x
Higher is
betterHigher is
better
8 GPU 16 GPU
![Page 11: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/11.jpg)
11
DALI RESULTSRN50 MXNet
2x
Higher is
betterHigher is
better
2x
8 GPU 16 GPU
![Page 12: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/12.jpg)
12
DALI RESULTSRN50 PyTorch
Higher is
betterHigher is
better
8 GPU 16 GPU
![Page 13: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/13.jpg)
13
DALI RESULTSRN50 TensorFlow
Higher is
betterHigher is
better
8 GPU 16 GPU
![Page 14: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/14.jpg)
14
DALI RESULTS - MLPERFPerfect scaling
https://mlperf.org/results
![Page 15: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/15.jpg)
15
INSIDE DALI
![Page 16: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/16.jpg)
16
DALI: CURRENT ARCHITECTURE
![Page 17: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/17.jpg)
17
HOW TO USE DALIDefine Graph
Instantiate operators
def __init__(self, batch_size, num_threads, device_id):
super(SimplePipeline, self).__init__(batch_size, num_threads, device_id)
self.input = ops.FileReader(file_root = image_dir)
self.decode = ops.nvJPEGDecoder(device = "mixed", output_type = types.RGB)
self.resize = ops.Resize(device = "gpu", resize_x = 224, resize_y = 224)
Define graph in imperative way
def define_graph(self):
jpegs, labels = self.input()
images = self.decode(jpegs)
images = self.resize(images)
return (images, labels)
Use it
pipe.build()
images, labels = pipe.run()
![Page 18: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/18.jpg)
18
Instantiate operators
def __init__(self, batch_size, num_threads, device_id):
super(SimplePipeline, self).__init__(batch_size, num_threads, device_id)
self.input = ops.FileReader(file_root = image_dir)
self.decode = ops.nvJPEGDecoder(device = “mixed”, output_type = types.RGB)
self.resize = ops.Resize(device = "gpu", resize_x = 224, resize_y = 224)
Define graph in imperative way
def define_graph(self):
jpegs, labels = self.input()
images = self.decode(jpegs)
images = self.resize(images)
return (images, labels)
Use it
pipe.build()
images, labels = pipe.run()
HOW TO USE DALIDefine Graph
![Page 19: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/19.jpg)
19
Instantiate operators
def __init__(self, batch_size, num_threads, device_id):
super(SimplePipeline, self).__init__(batch_size, num_threads, device_id)
self.input = ops.FileReader(file_root = image_dir)
self.decode = ops.nvJPEGDecoder(device = "mixed", output_type = types.RGB)
self.resize = ops.Resize(device = "gpu", resize_x = 224, resize_y = 224)
Define graph in imperative way
def define_graph(self):
jpegs, labels = self.input()
images = self.decode(jpegs)
images = self.resize(images)
return (images, labels)
Use it
pipe.build()
images, labels = pipe.run()
HOW TO USE DALIDefine Graph
![Page 20: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/20.jpg)
20
Instantiate operators
def __init__(self, batch_size, num_threads, device_id):
super(SimplePipeline, self).__init__(batch_size, num_threads, device_id)
self.input = ops.FileReader(file_root = image_dir)
self.decode = ops.nvJPEGDecoder(device = "mixed", output_type = types.RGB)
self.resize = ops.Resize(device = "gpu", resize_x = 224, resize_y = 224)
Define graph in imperative way
def define_graph(self):
jpegs, labels = self.input()
images = self.decode(jpegs)
images = self.resize(images)
return (images, labels)
Use it
pipe.build()
images, labels = pipe.run()
HOW TO USE DALIDefine Graph
![Page 21: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/21.jpg)
21
HOW TO USE DALIUse in PyTorch
DALI iterator
dali_pipe = TrainPipe(...)
train_loader = DALIClassificationIterator(dali_pipe)
for i, data in enumerate(train_loader):
input = data[0]["data"]
target = data[0]["label"].squeeze()
(...)
PyTorch DataLoader
train_loader = torch.utils.data.DataLoader(...)
prefetcher = data_prefetcher(train_loader)
input, target = prefetcher.next()
i = -1
while input is not None:
i += 1
(...)
input, target = prefetcher.next()
![Page 22: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/22.jpg)
22
HOW TO USE DALIUse in MXNet
DALI iteratordali_pipes = [TrainPipe(...) for gpu_id in gpus]
train_data = DALIClassificationIterator(dali_pipe)
for i, batches in enumerate(train_data):
data = [b.data[0] for b in batches]
label = [b.label[0].as_in_context(b.data[0].context) for
b in batches]
(...)
MXNet DataIter and DataBatchtrain_data = SyntheticDataIter(...)
for i, batches in enumerate(train_data):
data = [b.data[0] for b in batches]
label = [b.label[0].as_in_context(b.data[0].context)
for b in batches]
(...)
![Page 23: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/23.jpg)
23
HOW TO USE DALIUse in TensorFlow
DALI TensorFlow operatordef get_data():
dali_pipe = TrainPipe(...)
daliop = dali_tf.DALIIterator()
with tf.device("/gpu:0"):
img, labels = daliop(pipeline=dali_pipe, ...)
return img, labels
classifier.train(input_fn=get_data,...)
TensorFlow Dataset
def get_data():
ds = tf.data.Dataset.from_tensor_slices(files)
ds.define_operations(...)
return ds
classifier.train(input_fn=get_data,...)
![Page 24: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/24.jpg)
24
NEW USE CASES
![Page 25: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/25.jpg)
25
OBJECT DETECTIONSingle Shot Multibox Detector Model (SSD)
Use operators in the DALI graph:
images = self.paste(images, paste_x = px, paste_y = py, ratio = ratio)
bboxes = self.bbpaste(bboxes, paste_x = px, paste_y = py, ratio = ratio)
crop_begin, crop_size, bboxes, labels = self.prospective_crop(bboxes, labels)
images = self.slice(images, crop_begin, crop_size)
images = self.flip(images, horizontal = rng, vertical = rng2)
bboxes = self.bbflip(bboxes, horizontal = rng, vertical = rng2)
return (images, bboxes, labels)
![Page 26: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/26.jpg)
26
VIDEOVideo Pipeline Example
Instantiate operator:
self.input = ops.VideoReader(device="gpu", filenames=data, sequence_length=len)
Use it in the DALI graph:
frames = self.input(name="Reader")
output_frames = self.Crop(frames)
return output_frames
![Page 27: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/27.jpg)
27
Instantiate operator:
self.input = ops.VideoReader(file_root = video_files, sequence_length = len, step = step)
self.opticalFlow = ops.OpticalFlow()
self.takeFirst = ops.ElementExtract(element_map = [0])
Use it in the DALI graph:
frames = self.input()
flow = self.opticalFlow(frames)
first = self.takeFirst(frames)
return first, flow
VIDEOOptical Flow Example
DALI+
![Page 28: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/28.jpg)
28
MAKING LIFE EASIER
![Page 29: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/29.jpg)
29
MORE EXAMPLES
ResNet50 for PyTorch, MXNet, TensorFlow
How to read data in various frameworks
How to create custom operators
Pipeline for the detection
Video pipeline
More to come...
Documentation available online:
https://docs.nvidia.com/deeplearning/sdk/dali-developer-guide/docs/index.html
Help you get started
![Page 30: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/30.jpg)
30
PLUGIN MANAGERAdds Extensibility
Create operatortemplate<>
void Dummy<GPUBackend>::RunImpl(DeviceWorkspace *ws, const int idx) {
(...)
}
DALI_REGISTER_OPERATOR(CustomDummy, Dummy<GPUBackend>, GPU);
Load Plugin from pythonimport nvidia.dali.plugin_manager as plugin_manager
plugin_manager.load_library('./customdummy/build/libcustomdummy.so')
ops.CustomDummy(...)
DALI
plugin1.so
plugin2.so
plugin3.so
![Page 31: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/31.jpg)
31
CHALLENGES
![Page 32: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/32.jpg)
32
CHALLENGES
Data-dependent random transformation
Object Detection
Random crop
![Page 33: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/33.jpg)
33
CHALLENGES
More types of data, not only images and labels - bounding boxes as well
Previously only images were processed
Now processing of bounding boxes drives image processing
Object Detection
![Page 34: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/34.jpg)
34
CHALLENGES
Integrated NVDEC to utilize H.264 and HEVC
Samples are no longer single image - sequence (NFHWC<->NCFHW)
Reuse operators - flatten the sequence
Video
![Page 35: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/35.jpg)
35
CHALLENGES
CPU/GPU high or network traffic consumes GPU cycles
• CPU operators coverage
Sweet spot for SSD mixed pipeline - part CPU, part GPU
• Test what works best for you
CPU based pipeline
![Page 36: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/36.jpg)
36
CHALLENGES
DGX - “works for me”
A lot of non-DGX users started using DALI
• Want to use CPU operators
• Memory consumption on the CPU side matters
• Usability more important than speed
Memory Consumption
![Page 37: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/37.jpg)
37
CHALLENGES
Multiple buffering
...but memory consumption
• Caching allocators?
• Subbatches?
Memory Consumption
![Page 38: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/38.jpg)
38
CHALLENGES
Significant image decoding time
• CPU decoding already pushed to the limits
Can we do better?
• nvJPEG - huge improvement
• ROI decoding
Decoding Time
![Page 39: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/39.jpg)
39
CHALLENGES
PyTorch and MXNet integration
• Python API - “easy-peasy”
TensorFlow - custom operator needed
• Frequent changes to TensorFlow C++ API
• Cannot preserve forward compatibility at the binary level
• DALI TF plug-in package is now available - compile your TensorFlow DALI op
TensorFlow Forward Compatibility
![Page 40: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/40.jpg)
40
CHALLENGESDiscrepancies Between Frameworks
https://hackernoon.com/how-tensorflows-tf-image-resize-stole-60-days-of-my-life-aba5eb093f35
Bilinear filter – OpenCV vs Pillow Bicubic filter – TensorFlow vs PIllow
![Page 41: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/41.jpg)
41
CHALLENGES
• MXNet is based on OpenCV
• PyTorch uses Pillow
• TensorFlow has its own augmentation operators
We want portability between frameworks,
but what about pre-trained models?
Discrepancies Between Frameworks
https://github.com/python-pillow/Pillow/issues/2718
![Page 42: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/42.jpg)
42
NEXT STEPS
![Page 43: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/43.jpg)
43
NEW USE CASES
Medical imaging (Volumetric data)
• Performant 3D augmentations library
Segmentation?
![Page 44: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/44.jpg)
44
NEW USE CASES
Extract augmentation operators in a separate library
• Inference - the same augmentation operation can be used in custom inference pipeline where full feature DALI is not required (i.e. embedded platform)
• Ability to use operator directly from Python code
import nvidia.dali.standaloneOps as standaloneOps
import cv2
image = cv2.imread('test.jpg',0)
standaloneOps.Rotate(image, device="gpu", angle=45, interp_type = types.INTERP_LINEAR)
cv2.imwrite("./img_tf.png", image)
![Page 45: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/45.jpg)
45
DALI
Open source, GPU-accelerated data augmentation and image loading library
Over 1100 GitHub stars
1) Top 50 ML/DL Projects (out of 22,000 in 2018)
Full pre-processing data pipeline ready for training and inference
Easy framework integration
Portable training workflows
1) https://github.com/Mybridge/amazing-machine-learning-opensource-2019
Summary
![Page 46: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/46.jpg)
![Page 47: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/47.jpg)
47
DALI
More questions? Connect with Experts Sessions: DALI Tue 19th, Wed 20th, 2pm (Expo Hall)
Meet us P9291 - Fast Data Pre-processing with DALI (Mon 18th, 6-8pm)
Attend S9818 - TensorRT with DALI on Xavier to learn about TensorRT inference workflow with DALI graphs and customer operators
We want to hear from you
https://github.com/NVIDIA/DALI
https://developer.nvidia.com/dali
Summary
![Page 48: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/48.jpg)
48
ACKNOWLEDGEMENTS
Joaquin Anton
Trevor Gale
Andrei Ivanov
Simon Layton
Krzysztof Łęcki
Serge Panev
Michał Szołucha
Przemek Trędak
Albert Wolant
Pablo Ribalta
Cliff Woolley
DL Frameworks @ NVIDIA
![Page 49: S9925: FAST AI DATA PRE- PROCESSING WITH NVIDIA DALI...Janusz Lisiecki, Michał Zientkiewicz, 2019-03-18 S9925: FAST AI DATA PRE-PROCESSING WITH NVIDIA DALI](https://reader033.vdocuments.pub/reader033/viewer/2022042309/5ed68d2f5cd0d56eef02e4a0/html5/thumbnails/49.jpg)