transfer learning based gpu accelerated deep learning for end-to-end...
TRANSCRIPT
Dr. Peter Pyun (변경석박사), Principal Solution Architect
Joint work with Dr. Andrew Liu
Transfer learning based GPU
accelerated deep learning for
end-to-end industrial inspection
2
JOIN US FOR 50-MINUTE SPEECH AT NEXT GTC SILICON VALLEY USA
(MARCH 17-19, 2019)
Invitation
3
AGENDA
Industrial Defect Inspection for DAGM Dataset
1. NGC Docker Images
2. DL Model Set up - Unet
3. Data Preparation
4. Training with an NGC Tensorflow Container
5. Defect Segmentation – precision/recall
6. GPU Accelerated Inferencing – TF-TRT & TRT
Automatic Defect Inspection from Real GPU Production Dataset
Looking into the Future: Automating Defects Inspection for Advanced Packaging
4
INDUSTRIAL DEFECT INSPECTION FOR DAGM DATASET
(Nvidia white paper available)
5
INDUSTRIAL DEFECT INSPECTIONHow to apply end-to-end deep learning?
6
NVIDIA END-TO-END DEEP LEARNING PLATFORM
DNN
Data (Curated/Annotated)
DGX Tesla
Nvidia GPU Cloud (NGC) docker container
AI TRAINING @DATA CENTER
TensorRTDRIVE AGX
TensorRT
Optimizer Runtime
Jetson AGX
Tesla/Turing
AI INFERENCING @EDGE
7
NGC DOCKER IMAGES
Benefits for Deep Learning Workflow
High Level Benefits and Feature Set
Single software stack
Develop once, deploy anywhere
Scale across teams of practitioners
Developer, DevOp, QC
Automatic Defect Inspection Workflow
Training Inference
Tensorflow: NGC optimized docker image TF-TRT / TensorRT
NGC tensorflow:18.08-py3 1. NGC tensorflow:18.08-py3
(for TensorFlow-TensorRT)
2. NGC tensorrt:18.08-py3
Pre-Training
Titan VDGX-station
DGX DGX T4/V100
10
MODEL SET UP
11
WAYS FOR DEFECT INSPECTION
Depends on domain adaptation
Image SegmentationObject Detection
Image Classification
Label: 0 / 1 (Defect / Non Defect)
Label: Bounding-Box Label: Polygons Mask
12
FROM LITERATURE: CNN/LENET (2016)
Source: Design of Deep Convolutional Neural Network Architectures for Automated Feature Extraction in Industrial Inspection, D. Weimer et al, 2016
13
FROM LITERATURE CNN/LENET (2016)Coarse segmentation results - can we do better?
Source: Design of Deep Convolutional Neural Network Architectures for Automated Feature Extraction in Industrial Inspection, D. Weimer et al, 2016
51
22
51
22
51
22
1 16 16
16 32 32
32 64 64
64 128 128 256
128 256 256
25
62
25
62
25
62
12
82
12
82
12
82
64
2
64
2
64
2
32
2
32
2
32
2
12
82
12
82
12
82
128 64 64
128 128
64
2
64
2
64
2
64 32 32
25
62
25
62
25
62
51
22
51
22
51
22
32 16 16
3X3 Conv2d+ReLU
2X2 MaxPool
2X2 Conv2dTranspose
copy and concatenate
U-Net structure
15
ENCODING IN KERAS-TF
Convolution
16
deconvolution
DECODING IN KERAS-TF
17
Image segmentation on medical images
Same DL process for different application (i.e., healthcare)
MRI image
Left ventricle
heart disease
Data Science BOWL2016
Data Science BOWL 2017
CT image
Nodule
Lung cancer
Image
Nuclei
Drug discovery
Data Science BOWL2018
18
DATA PREPARATION
19
INDUSTRIAL OPTICAL INSPECTION
DAGM dataset by German Association for Pattern Recognition
• http://resources.mpi-inf.mpg.de/conferences/dagm/2007/prizes.html
20
INDUSTRIAL OPTICAL INSPECTIONDAGM dataset
Pass NG Pass
NGNGPass Pass
NG
21
DATA DETAILS
• Original images are 512 x 512 grayscale format
• Output is a tensor of size 512 x 512 x 1
• Each pixel belongs to one of two classes
• 6 defect classes
• 150 defects & 1000 defect-free images per class
Setting Training set
size
Validation set size Testing set size
Defect/Defect-free
1 100 30 20/1000
22
MORE EXAMPLES
23
TRAINING WITH NGC TENSORFLOW CONTAINER
24
Dice Metric (IOU) for unbalanced dataset
• Metric to compare the similarity of two samples:
2𝐴𝑛𝑙________________________________
𝐴𝑛 + 𝐴𝑙
• Where: • An is the area of the contour predicted by the network • Al is the area of the contour from the label• Anl is the intersection of the two
• More accurately compute how well we’re predicting the contour against the label
25
TRAINING FOR SMALL DATASET25
Check both train & validation loss to prevent overfitting
1. Start with small U-net (with 8 kernels in the 1st layer), then increases
complexity (i.e., doubling kernels)
2. Once model has enough capacity to fit small dataset, then try dropout, L1/L2
regularization.
*trained with binary cross entropy loss & Adam optimizer with learning rate = 0.001
26
APPLICATION: INDUSTRIAL INSPECTION
• Challenges: Not all deviations from
the texture are necessarily defects.
27
DEFECT SEGMENTATION – PRECISION/RECALL
28
FINAL DECISION - SEGMENTATION
29
INFERENCE PIPELINE
decision making(defect vs. non-defect)
Inference
Camera
Machine
P40
DGX Server
TF-TRT &
TensorRT
Fetch image
TF-TRT &
TensorRT
Detectors/Classifiers/Segment Composite Result Metadata
Data Center
/Cloud
Edge
Defect Pattern Ratio Defect Level Defect region size Defect counts
…
Domain Criteria Determine threshold
Precision/ Recall
30
DEFECT VS. NON-DEFECT BY THRESHOLDING
Segmentation model outputs Numpy array of
class probability of each class (i.e. 2 classes –
defect/defect-free)
Thresholding
Declare as defect (white)
if probability is higher
than threshold (i.e., 0.5)
query image 512x512
31
(Example) Precision/Recall diagram
32
(Example) Simple binary anomaly detector
TP: True Positive, FP: False Positive, FN: False Negative, TN: True Negative.
*Red arrow means increasing threshold of probability on defect detection.
Threshold of probability of defect: higher number means harder for classifier to detect as defect class.
Higher threshold: FP lower, precision (TP/(TP+FP)) higher
FN higher, recall (TP/(TP+FN)) lower
33
Precision/Recall Results
threshold 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
TP 137 135 135 135 135 135 135 133 131
TN 885 893 899 899 899 899 899 900 901
FP 16 8 2 2 2 2 2 1 0
FN 1 3 3 3 3 3 3 5 7
FP rate 0.0178 0.0089 0.0023 0.0023 0.0023 0.0023 0.0023 0.0011 0.0000
precision0.8954 0.9441 0.9854 0.9854 0.9854 0.9854 0.9854 0.9925 1.0000
recall 0.9928 0.9783 0.9783 0.9783 0.9783 0.9783 0.9783 0.9638 0.9493
Sweep experiments of thresholds verifies precision/recall trade-off.
Choose threshold per your application needs - precision (FP) or recall (FN) more important ?
Total image: 1039 (defects 138, non-defects: 901)
34
Precision/Recall - reducing false positives
Precision =TP/(TP+FP) : 99.25%
Recall = TP/(TP+FN) : 96.38%
False alarm rate = FP/(FP+TN): 0.11%
*sensitivity=recall=true positive rate,
specificity=true negative rate=TN/(TN+FP),
false alarm rate=false positive rate
Defect Defect-free
Defect 99.25% (TP) 0.75% (FP)
Defect-free 0.55% (FN) 99.45% (TN)
Actual
Predict
35
Final decisionFinal Defect Segmentation
(U-net/thresholding)
36
GPU-ACCELERATED INFERENCING
37
Automatic Defect Inspection Workflow
Training Inference
Tensorflow: NGC optimized docker image TF-TRT / TensorRT
NGC tensorflow:18.08-py3 1. NGC tensorflow:18.08-py3
(for TensorFlow-TensorRT)
2. NGC tensorrt:18.08-py3
Pre-Training
Titan VDGX-station
DGX DGX T4/V100
38
TensorRT workflow
Called UFF, Universal Framework FormatONNX
39
TensorRT IntegratedWith TensorFlow
Speed up TensorFlow model inference with TensorRT with new TensorFlow APIs
Simple API to use TensorRT within TensorFlow easily
Sub-graph optimization with fallback offers flexibility of TensorFlow and optimizations of TensorRT
Optimizations for FP32, FP16 and INT8 with use of Tensor Cores automatically
Speed Up TensorFlow Inference With TensorRT Optimizations
developer.nvidia.com/tensorrt
# Apply TensorRT optimizations
trt_graph = trt.create_inference_graph(frozen_graph_def,
output_node_name,
max_batch_size=batch_size,
max_workspace_size_bytes=workspace_size,
precision_mode=precision)
# INT8 specific graph conversion
trt_graph = trt.calib_graph_to_infer_graph(calibGraph)
Available from TensorFlow 1.7+https://github.com/tensorflow/tensorflow
40
Inference Results
TF-TRT for fast prototyping, TRT for maximum performance
Inference method TF TF-TRT TRT
FP 32 bit images/sec 141.8 236.1 1079.8
perf. Increase 1 1.7 7.6
FP 16 bit* images/sec N/A 297.4 1219.7
perf. Increase 1 2.1 8.6
FP 16 bit*: by mixed precision TensorCore in V100 GPU
41
SUMMARY
Challenges Delivers
Is Deep Learning for industrial
inspection completely a black-box
approach?
Insert tunable parameter(s) for final decision
1. Governs tradeoff between precision and recall.
2. Cost, throughput requirement sets criteria for parameters.
Training, inference environment is hard
to build, maintain, share.
Using NGC Docker images.
Model optimizations and speed up
throughput.
TF-TRT or TensorRT
Too many deep learning model out
there, how to choose the right model?
For small labelled dataset, U-Net model is great choice for
segmentation task, then apply transfer learning.
42
AUTOMATIC DEFECT INSPECTIONFROM REAL GPU PRODUCTION DATASET
Peter Pyun, Andrew Liu, Ryan Shen, Julius Chan, NV operations team at Shenzhen, China
43
AOI & DECISION FLOW
AOI testVerify
by OPNG
OK OK
NG
Pass to next station, no image
stored
Pass to next station, store all images in AOI
machine
Repair
OK NG
EscapeFalse Positives
True Positives
Defect
44
False Positive Stats
Boards: 200Components: 332
Boards: 400Components: 604
Boards: 200Components: 272
Golden Sample
False Positive
MEC21 Y5
Golden Sample
False Positive
Limitation: hand-tuned AOI parameter can not reduce false positives of all components in PCBA
45
WHY FALSE POSITIVES?
Golden Sample False Positives
GLARE
MEC21
46
True Positives (MEC)
MEC17 MEC16
Golden Sample
True Positive
True Positive
Golden Sample
BROKEN
SHIFT
47
WHERE IS MEC21?
48
False Positive Statistics
Capacitors: False Positives (defect-free) = 1608,
True Positives (defects): 60
49
False Positives (Capacitors)
50
True Positives (Capacitors)
51
Data Augmentation (Capacitors)
D0_C42
D0_C131
D0_C201
D0_C511
D0_C1090_3 D1_C1088 D1_C109
D1_C48
D1_C105D0_C1085
52
LOOKING INTO THE FUTURE:
AUTOMATING DEFECTS DETECTION IN ADVANCED PACKAGING
53
Rising Semiconductor Design & Verification Cost
Source: A. Olofsson, Program Manager, DARPA Microsystems Technology Office
* Courtesy of Jerry Chen
54
55
56
Defects in solder balls
57
Defects in micro-Bumps in packaging
58
X-ray Solder Balls (BGA): 3D movie
59
ACKNOWLEDGEMENT
SPECIAL THANKS TO VIRTUAL TEAM MEMBERS @NVIDIA:
JERRY CHEN,OWEN LI,
BOBBY WANG, BILL LIAO,
HOWARD MARKS,RICK COMPTON,TY MCKERCHER,MARC HAMILTON
60
Thank YouFor Your Attention