transfer learning based gpu accelerated deep learning for end-to-end...

Dr. Peter Pyun (변경석박사), Principal Solution Architect

Joint work with Dr. Andrew Liu

Transfer learning based GPU accelerated deep learning for end-to-end industrial inspection

2

Invitation

JOIN US FOR A 50-MINUTE SPEECH AT THE NEXT GTC SILICON VALLEY, USA (MARCH 17-19, 2019)

3

AGENDA

Industrial Defect Inspection for DAGM Dataset

1. NGC Docker Images

2. DL Model Set up - Unet

3. Data Preparation

4. Training with an NGC Tensorflow Container

5. Defect Segmentation – precision/recall

6. GPU Accelerated Inferencing – TF-TRT & TRT

Automatic Defect Inspection from Real GPU Production Dataset

Looking into the Future: Automating Defects Inspection for Advanced Packaging

4

INDUSTRIAL DEFECT INSPECTION FOR DAGM DATASET

(Nvidia white paper available)

5

INDUSTRIAL DEFECT INSPECTION

How to apply end-to-end deep learning?

6

NVIDIA END-TO-END DEEP LEARNING PLATFORM

DNN

Data (Curated/Annotated)

DGX Tesla

Nvidia GPU Cloud (NGC) docker container

AI TRAINING @DATA CENTER

TensorRT

DRIVE AGX

TensorRT

Optimizer Runtime

Jetson AGX

Tesla/Turing

AI INFERENCING @EDGE

7

NGC DOCKER IMAGES

Benefits for Deep Learning Workflow

High Level Benefits and Feature Set

Single software stack

Develop once, deploy anywhere

Scale across teams of practitioners

Developer, DevOps, QC

Automatic Defect Inspection Workflow

Training: TensorFlow, NGC optimized Docker image
  NGC tensorflow:18.08-py3

Inference: TF-TRT / TensorRT
  1. NGC tensorflow:18.08-py3 (for TensorFlow-TensorRT)
  2. NGC tensorrt:18.08-py3

Hardware: Titan V and DGX Station (pre-training); DGX; DGX, T4/V100

10

MODEL SET UP

11

WAYS FOR DEFECT INSPECTION

Depends on domain adaptation

Image Classification. Label: 0 / 1 (Defect / Non Defect)

Object Detection. Label: Bounding-Box

Image Segmentation. Label: Polygons Mask

12

FROM LITERATURE: CNN/LENET (2016)

Source: Design of Deep Convolutional Neural Network Architectures for Automated Feature Extraction in Industrial Inspection, D. Weimer et al, 2016

13

FROM LITERATURE: CNN/LENET (2016)

Coarse segmentation results - can we do better?

Source: Design of Deep Convolutional Neural Network Architectures for Automated Feature Extraction in Industrial Inspection, D. Weimer et al, 2016

[Figure: U-Net structure. Encoder feature maps shrink through 512², 256², 128², 64², 32² while channels grow 1 → 16 → 32 → 64 → 128 → 256; the decoder mirrors this back to 512² with channels 256 → 128 → 64 → 32 → 16. Legend: 3x3 Conv2D + ReLU; 2x2 MaxPool; 2x2 Conv2DTranspose; copy and concatenate.]
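The U-Net structure can be traced with a small shape calculator. This is only a sketch: the function name `unet_shapes`, its parameters, and the final 1x1 convolution to a single-channel mask are assumptions for illustration (the deck states only that the output is 512 x 512 x 1), and "same" padding is assumed so that convolutions keep the spatial size.

```python
def unet_shapes(input_size=512, in_channels=1, base_filters=16, depth=4):
    """Trace (name, height, width, channels) through a U-Net-style network.

    Assumptions (not confirmed by the slides): 'same' padding, filters
    doubling at every level, and a final 1x1 convolution to one channel.
    `in_channels` is kept only to document the grayscale input.
    """
    shapes = []
    skips = []
    size = input_size
    # Encoder: 3x3 convs keep the size, each 2x2 max-pool halves it.
    for level in range(depth):
        channels = base_filters * 2 ** level
        shapes.append(("enc%d" % level, size, size, channels))
        skips.append(channels)
        size //= 2
    # Bottleneck at the lowest resolution.
    shapes.append(("bottleneck", size, size, base_filters * 2 ** depth))
    # Decoder: each 2x2 transposed conv doubles the size, then the matching
    # encoder feature map is copied, concatenated, and convolved.
    for level in reversed(range(depth)):
        size *= 2
        shapes.append(("dec%d" % level, size, size, skips[level]))
    # Assumed output head: 1x1 conv producing a single-channel mask.
    shapes.append(("output", size, size, 1))
    return shapes


shapes = unet_shapes()
for row in shapes:
    print(row)
```

With the defaults this reproduces the sizes and channel counts recoverable from the figure (512² with 16 channels down to 32² with 256 channels, then back up). Passing `base_filters=8` gives the smaller starting network mentioned in the training slide.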

15

ENCODING IN KERAS-TF

Convolution

16

DECODING IN KERAS-TF

Deconvolution

17

Image segmentation on medical images

Same DL process for different application (i.e., healthcare)

Data Science Bowl 2016: MRI image, left ventricle, heart disease

Data Science Bowl 2017: CT image, nodule, lung cancer

Data Science Bowl 2018: image, nuclei, drug discovery

18

DATA PREPARATION

19

INDUSTRIAL OPTICAL INSPECTION

DAGM dataset by German Association for Pattern Recognition

• http://resources.mpi-inf.mpg.de/conferences/dagm/2007/prizes.html


20

INDUSTRIAL OPTICAL INSPECTION

DAGM dataset

[Figure: example DAGM images, each labeled Pass or NG]

21

DATA DETAILS

• Original images are 512 x 512 grayscale format

• Output is a tensor of size 512 x 512 x 1

• Each pixel belongs to one of two classes

• 6 defect classes

• 150 defects & 1000 defect-free images per class

Setting   Training set size   Validation set size   Testing set size (Defect/Defect-free)
1         100                 30                    20/1000
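A split like the one in the table can be sketched as follows. This assumes the 100/30/20 figures refer to the 150 defect images of a single class; the helper `split_indices` and the fixed seed are hypothetical choices for illustration, not the authors' procedure.

```python
import random


def split_indices(n_items, n_train, n_val, n_test, seed=0):
    """Shuffle item indices and cut them into train/validation/test lists."""
    if n_train + n_val + n_test != n_items:
        raise ValueError("split sizes must sum to n_items")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # deterministic for a given seed
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


# 150 defect images of one class -> 100 train, 30 validation, 20 test
train_ids, val_ids, test_ids = split_indices(150, 100, 30, 20)
print(len(train_ids), len(val_ids), len(test_ids))
```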

22

MORE EXAMPLES

23

TRAINING WITH NGC TENSORFLOW CONTAINER

24

Dice Metric (closely related to IoU) for unbalanced dataset

• Metric to compare the similarity of two samples:

$$\mathrm{Dice} = \frac{2 A_{nl}}{A_n + A_l}$$

• Where:
  • $A_n$ is the area of the contour predicted by the network
  • $A_l$ is the area of the contour from the label
  • $A_{nl}$ is the area of the intersection of the two

• Computes more accurately how well the predicted contour matches the label
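The Dice metric can be computed on binary masks in a few lines of NumPy. This is a minimal sketch: the function name `dice_coefficient` and the small `eps` smoothing term (which makes two empty masks score 1.0 instead of dividing by zero) are assumptions, not taken from the slides.

```python
import numpy as np


def dice_coefficient(pred_mask, label_mask, eps=1e-7):
    """Dice = 2 * A_nl / (A_n + A_l) for two binary masks of equal shape."""
    pred = np.asarray(pred_mask, dtype=bool)
    label = np.asarray(label_mask, dtype=bool)
    intersection = np.logical_and(pred, label).sum()  # A_nl
    total = pred.sum() + label.sum()                  # A_n + A_l
    # eps is an assumed smoothing term to avoid 0/0 on empty masks
    return (2.0 * intersection + eps) / (total + eps)


# Two 4-pixel squares that overlap in 2 pixels: Dice = 2*2 / (4+4) = 0.5
pred = np.zeros((4, 4), dtype=bool)
pred[0:2, 0:2] = True
label = np.zeros((4, 4), dtype=bool)
label[0:2, 1:3] = True
score = dice_coefficient(pred, label)
print(round(float(score), 3))
```

Because only the defect pixels enter the numerator, the score is not inflated by the large defect-free background, which is why it suits unbalanced data.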

25

TRAINING FOR SMALL DATASET

Check both train & validation loss to prevent overfitting

1. Start with a small U-Net (with 8 kernels in the 1st layer), then increase complexity (i.e., by doubling the kernels)

2. Once the model has enough capacity to fit the small dataset, try dropout and L1/L2 regularization.

*Trained with binary cross-entropy loss & Adam optimizer with learning rate = 0.001
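One way to act on the train/validation loss check is a patience-based stopping rule. The sketch below is an assumption about how such a check could be coded; the helper `best_epoch`, the patience value, and the loss numbers are all made up for illustration and are not results from the talk.

```python
def best_epoch(val_losses, patience=3):
    """Return the epoch index with the lowest validation loss, stopping the
    scan once the loss has failed to improve for `patience` epochs in a row."""
    best_idx, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_idx, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_idx


# Hypothetical curves: training loss keeps falling while validation loss
# turns upward after epoch 3, the usual signature of overfitting.
train_loss = [0.60, 0.40, 0.30, 0.22, 0.15, 0.10, 0.06]
val_loss = [0.62, 0.45, 0.38, 0.36, 0.39, 0.43, 0.50]
stop_at = best_epoch(val_loss)
print("keep weights from epoch", stop_at)
```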

26

APPLICATION: INDUSTRIAL INSPECTION

• Challenges: Not all deviations from the texture are necessarily defects.

27

DEFECT SEGMENTATION – PRECISION/RECALL

28

FINAL DECISION - SEGMENTATION

29

INFERENCE PIPELINE

[Diagram: a camera on the machine fetches an image; inference runs with TF-TRT & TensorRT (P40, DGX server; edge or data center/cloud); detectors/classifiers/segmentation produce composite result metadata (defect pattern ratio, defect level, defect region size, defect counts, …); domain criteria determine the threshold (precision/recall) for decision making (defect vs. non-defect).]

30

DEFECT VS. NON-DEFECT BY THRESHOLDING

Segmentation model outputs a NumPy array of per-class probabilities (i.e., 2 classes: defect/defect-free)

Thresholding: declare a pixel as defect (white) if its probability is higher than the threshold (e.g., 0.5)

Query image: 512x512
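The thresholding step might look like the sketch below. The function `segment_and_decide` and its `min_defect_pixels` parameter are hypothetical: the slides say only that a pixel is declared defect above the threshold and that criteria such as defect region size feed the final decision. The probability map here is synthetic.

```python
import numpy as np


def segment_and_decide(prob_map, threshold=0.5, min_defect_pixels=1):
    """Binarize a defect-probability map and make an image-level decision.

    min_defect_pixels is an assumed domain criterion (minimum defect size).
    """
    mask = np.asarray(prob_map) > threshold       # True = defect (white)
    defect_pixels = int(mask.sum())
    is_defect = defect_pixels >= min_defect_pixels
    return mask, defect_pixels, is_defect


# Synthetic 512x512 probability map with one 10x20 high-probability region
prob_map = np.zeros((512, 512), dtype=np.float32)
prob_map[100:110, 200:220] = 0.9
mask, n_pixels, is_defect = segment_and_decide(prob_map, threshold=0.5)
print(n_pixels, is_defect)
```

Raising `threshold` or `min_defect_pixels` makes the image-level decision more conservative, which is the tunable trade-off the following slides quantify.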

31

(Example) Precision/Recall diagram

32

(Example) Simple binary anomaly detector

TP: True Positive, FP: False Positive, FN: False Negative, TN: True Negative.

*Red arrow means increasing the probability threshold for defect detection.

Threshold on the defect probability: a higher value makes it harder for the classifier to assign the defect class.

Higher threshold: FP lower, so precision (TP/(TP+FP)) higher; FN higher, so recall (TP/(TP+FN)) lower.
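The effect of raising the threshold can be seen on a toy detector. The scores and labels below are invented for illustration, and `confusion_counts` is a hypothetical helper, not code from the talk.

```python
def confusion_counts(scores, labels, threshold):
    """Count (TP, FP, FN, TN) when 'defect' is predicted for score > threshold."""
    tp = fp = fn = tn = 0
    for score, is_defect in zip(scores, labels):
        predicted_defect = score > threshold
        if predicted_defect and is_defect:
            tp += 1
        elif predicted_defect:
            fp += 1
        elif is_defect:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Toy per-image defect scores (made up): four defects, four defect-free
scores = [0.95, 0.85, 0.55, 0.35, 0.65, 0.25, 0.15, 0.05]
labels = [True, True, True, True, False, False, False, False]
low = confusion_counts(scores, labels, 0.3)
high = confusion_counts(scores, labels, 0.7)
print("threshold 0.3:", low)
print("threshold 0.7:", high)
```

At the higher threshold the single false positive disappears (precision rises from 0.8 to 1.0) while two defects are missed (recall falls from 1.0 to 0.5).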

33

Precision/Recall Results

threshold  0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9
TP         137     135     135     135     135     135     135     133     131
TN         885     893     899     899     899     899     899     900     901
FP         16      8       2       2       2       2       2       1       0
FN         1       3       3       3       3       3       3       5       7
FP rate    0.0178  0.0089  0.0023  0.0023  0.0023  0.0023  0.0023  0.0011  0.0000
precision  0.8954  0.9441  0.9854  0.9854  0.9854  0.9854  0.9854  0.9925  1.0000
recall     0.9928  0.9783  0.9783  0.9783  0.9783  0.9783  0.9783  0.9638  0.9493

Sweeping the threshold verifies the precision/recall trade-off.

Choose the threshold according to your application's needs: is precision (fewer FP) or recall (fewer FN) more important?

Total images: 1039 (defects: 138, non-defects: 901)
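The reported ratios follow directly from the counts. As a check, the sketch below recomputes precision, recall, and FP rate for a few thresholds; the counts are copied from the slide, while the `metrics` helper is an illustrative name.

```python
def metrics(tp, tn, fp, fn):
    """Return (precision, recall, false positive rate) from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    fp_rate = fp / (fp + tn)
    return precision, recall, fp_rate


# threshold -> (TP, TN, FP, FN), copied from the precision/recall results
counts = {
    0.1: (137, 885, 16, 1),
    0.5: (135, 899, 2, 3),
    0.8: (133, 900, 1, 5),
    0.9: (131, 901, 0, 7),
}
results = {t: metrics(*c) for t, c in counts.items()}
for t, (p, r, fpr) in sorted(results.items()):
    print("threshold %.1f: precision %.4f, recall %.4f, FP rate %.4f" % (t, p, r, fpr))
```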

34

Precision/Recall - reducing false positives

Precision = TP/(TP+FP): 99.25%

Recall = TP/(TP+FN): 96.38%

False alarm rate = FP/(FP+TN): 0.11%

*sensitivity = recall = true positive rate; specificity = true negative rate = TN/(TN+FP); false alarm rate = false positive rate

                        Actual: Defect   Actual: Defect-free
Predicted: Defect       99.25% (TP)      0.75% (FP)
Predicted: Defect-free  0.55% (FN)       99.45% (TN)

35

Final decision

Final Defect Segmentation (U-net/thresholding)

36

GPU-ACCELERATED INFERENCING

37

Automatic Defect Inspection Workflow

Training: TensorFlow, NGC optimized Docker image
  NGC tensorflow:18.08-py3

Inference: TF-TRT / TensorRT
  1. NGC tensorflow:18.08-py3 (for TensorFlow-TensorRT)
  2. NGC tensorrt:18.08-py3

Hardware: Titan V and DGX Station (pre-training); DGX; DGX, T4/V100

38

TensorRT workflow

UFF (Universal Framework Format), ONNX

39

TensorRT Integrated With TensorFlow

Speed up TensorFlow model inference with TensorRT with new TensorFlow APIs

Simple API to use TensorRT within TensorFlow easily

Sub-graph optimization with fallback offers flexibility of TensorFlow and optimizations of TensorRT

Optimizations for FP32, FP16 and INT8 with use of Tensor Cores automatically

Speed Up TensorFlow Inference With TensorRT Optimizations

developer.nvidia.com/tensorrt

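# TF-TRT module in the TensorFlow 1.x contrib API
import tensorflow.contrib.tensorrt as trt

# Apply TensorRT optimizations to a frozen GraphDef.
# output_node_name is a list of output node names;
# precision is one of "FP32", "FP16" or "INT8".
trt_graph = trt.create_inference_graph(frozen_graph_def,
                                       output_node_name,
                                       max_batch_size=batch_size,
                                       max_workspace_size_bytes=workspace_size,
                                       precision_mode=precision)

# INT8 specific graph conversion (after running calibration data through calibGraph)
trt_graph = trt.calib_graph_to_infer_graph(calibGraph)

Available from TensorFlow 1.7+: https://github.com/tensorflow/tensorflow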

40

Inference Results

TF-TRT for fast prototyping, TRT for maximum performance

Inference method                      TF      TF-TRT   TRT
FP32 images/sec                       141.8   236.1    1079.8
FP32 perf. increase                   1       1.7      7.6
FP16* images/sec                      N/A     297.4    1219.7
FP16* perf. increase (vs. TF FP32)    1       2.1      8.6

*FP16: mixed precision using Tensor Cores on a V100 GPU
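The performance-increase rows are the throughputs divided by the TensorFlow FP32 baseline. A small check, using only the images/sec figures from the slide (the dictionary layout is an illustrative choice):

```python
# images/sec as reported on the slide
throughput = {
    "FP32": {"TF": 141.8, "TF-TRT": 236.1, "TRT": 1079.8},
    "FP16": {"TF-TRT": 297.4, "TRT": 1219.7},
}

baseline = throughput["FP32"]["TF"]  # plain TensorFlow, FP32

# Speedup of every configuration relative to the TF FP32 baseline
speedup = {
    precision: {method: round(rate / baseline, 1) for method, rate in rates.items()}
    for precision, rates in throughput.items()
}
print(speedup)
```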

41

SUMMARY

Challenge: Is deep learning for industrial inspection completely a black-box approach?
Delivers: Insert tunable parameter(s) for the final decision
  1. Governs the tradeoff between precision and recall.
  2. Cost and throughput requirements set the criteria for the parameters.

Challenge: Training and inference environments are hard to build, maintain, and share.
Delivers: Using NGC Docker images.

Challenge: Model optimization and speeding up throughput.
Delivers: TF-TRT or TensorRT

Challenge: Too many deep learning models out there; how to choose the right model?
Delivers: For a small labelled dataset, the U-Net model is a great choice for the segmentation task; then apply transfer learning.

42

AUTOMATIC DEFECT INSPECTION FROM REAL GPU PRODUCTION DATASET

Peter Pyun, Andrew Liu, Ryan Shen, Julius Chan, NV operations team at Shenzhen, China

43

AOI & DECISION FLOW

[Flow diagram: AOI test → OK: pass to next station, no image stored. AOI test → NG: verify by OP. Verify by OP → OK (false positives): pass to next station, store all images in AOI machine. Verify by OP → NG (true positives, defect): repair. Other labels shown: Escape.]

44

False Positive Stats

Boards: 200, Components: 332

[Figure: golden sample vs. false positive image pairs for components MEC21 and Y5]

Limitation: hand-tuned AOI parameters cannot reduce false positives for all components in the PCBA

45

WHY FALSE POSITIVES?

Golden Sample False Positives

GLARE

MEC21

46

True Positives (MEC)

MEC17 MEC16

Golden Sample

True Positive

True Positive

Golden Sample

BROKEN

SHIFT

47

WHERE IS MEC21?

48

False Positive Statistics

Capacitors: False Positives (defect-free): 1608, True Positives (defects): 60

49

False Positives (Capacitors)

50

True Positives (Capacitors)

51

Data Augmentation (Capacitors)

D0_C42

D0_C131

D0_C201

D0_C511

D0_C1090_3 D1_C1088 D1_C109

D1_C48

D1_C105 D0_C1085
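The slide does not state which augmentations were applied to the capacitor images. As a generic illustration only, the sketch below produces the eight rotation/mirror variants of an image with NumPy; the function `augment` is hypothetical, and whether rotations are appropriate depends on how components are oriented on the board.

```python
import numpy as np


def augment(image):
    """Return the 8 rotation/mirror variants of a 2-D image (assumed scheme)."""
    image = np.asarray(image)
    variants = []
    for k in range(4):                       # rotations by 0, 90, 180, 270 degrees
        rotated = np.rot90(image, k)
        variants.append(rotated)
        variants.append(np.fliplr(rotated))  # mirrored copy of each rotation
    return variants


# Small asymmetric test image so that all variants differ
image = np.arange(16).reshape(4, 4)
variants = augment(image)
print(len(variants))
```

For segmentation, the same transform would have to be applied to the image and to its label mask so that the two stay aligned.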

52

LOOKING INTO THE FUTURE:

AUTOMATING DEFECTS DETECTION IN ADVANCED PACKAGING

53

Rising Semiconductor Design & Verification Cost

Source: A. Olofsson, Program Manager, DARPA Microsystems Technology Office

* Courtesy of Jerry Chen

56

Defects in solder balls

57

Defects in micro-Bumps in packaging

58

X-ray Solder Balls (BGA): 3D movie

59

ACKNOWLEDGEMENT

SPECIAL THANKS TO VIRTUAL TEAM MEMBERS @NVIDIA:

JERRY CHEN, OWEN LI, BOBBY WANG, BILL LIAO, HOWARD MARKS, RICK COMPTON, TY MCKERCHER, MARC HAMILTON

60

Thank You For Your Attention
