AIIA DNN benchmark (aiiaorg.cn/uploadfile/2019/0709/20190709091740147.pdf)


Page 1

DNN processor benchmark for inference at the edge: release of the second-round evaluation results

2019.6.28, Nanjing

AIIA DNN benchmark V0.5b evaluation results

Page 2

Contents

I. About AIIA DNN Benchmark

II. Introduction of Version 0.5

III. Metrics and scenarios

IV. Acknowledgement

V. v0.5 second-round results (v0.5b)

VI. Interpretation

Page 3

I. AIIA DNN benchmark

About us: we provide selection references for application companies and third-party evaluation results for chip companies.

Aims: during chip development, technical competition based on clear metrics helps companies progress quickly. The AIIA DNN benchmark aims to objectively reflect the current state of AI accelerator capabilities, and all metrics are designed to provide objective dimensions for comparison.

Evaluation method: step by step, through version iterations that are continually enriched and refined; covering both training and inference, on terminals (edge) and in the cloud.

Page 4

I. AIIA DNN benchmark: work already done

Two sets of evaluation specifications have been drafted, and two rounds of edge evaluation have been completed.

Timeline (dates shown on the slide: 2018.10, 2018.12, 2019.3, 2019.4, 2019.5.27, 2019.6.28, …):

Release the edge/inference evaluation method V0.5

Start the first round of the AIIA DNN benchmark v0.5 edge/inference evaluation

Release the AIIA DNN benchmark v0.5 edge/inference first-round evaluation results

Start the second round of the AIIA DNN benchmark v0.5 edge/inference evaluation

Release the AIIA DNN benchmark v0.5 edge/inference second-round evaluation results

Release the cloud/inference evaluation method V0.5

Page 5

II. Version 0.5: evaluation methods of the DNN processor benchmark for inference at the edge

AIIA DNN benchmark V0.5 tools (diagram labels: Device, Model, Start)

The v0.5 tools support both Android and Linux.

Page 6

III. Metrics and scenarios

Two categories of key evaluation metrics

Six typical application scenarios:

Classification

Object recognition

Super-resolution

Semantic segmentation

Face recognition

Face detection

Eighteen network models
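The benchmark reports performance as fps next to an accuracy metric, and the leaderboard rules call for a single-threaded inference task. The slides do not describe how the AIIA tool actually times inference, so the following is only a minimal sketch of one common approach: a few untimed warm-up calls, then sequential timed calls to a stand-in inference function. `measure_fps`, `infer` and the warm-up count are hypothetical names and parameters chosen for illustration.

```python
import time
from typing import Callable, Sequence


def measure_fps(infer: Callable[[object], object],
                inputs: Sequence[object],
                warmup: int = 5) -> float:
    """Return frames per second for running `infer` once per input.

    Hypothetical helper, not the AIIA tool. Warm-up runs are excluded from
    timing so one-time costs (graph compilation, cache fills) do not
    distort the result.
    """
    if not inputs:
        raise ValueError("need at least one input")
    for i in range(warmup):
        infer(inputs[i % len(inputs)])
    start = time.perf_counter()
    for x in inputs:
        infer(x)  # one request at a time, on the calling thread only
    elapsed = time.perf_counter() - start
    return len(inputs) / elapsed


if __name__ == "__main__":
    # Stand-in workload; a real run would invoke the model under test.
    def dummy_infer(n):
        return sum(v * v for v in range(n))

    print(f"{measure_fps(dummy_infer, [10_000] * 50):.1f} fps")
```

Timing the whole loop rather than averaging per-call latencies keeps clock overhead out of the measurement; for batch or multi-threaded settings the loop would need to change.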

Page 7

III. Metrics and scenarios: compared with the first round, V0.5 adds two application scenarios and nine network models

| No. | Application scenario | Test data | Network | Metrics | Framework | Source | State |
|---|---|---|---|---|---|---|---|
| 1 | Classification | ImageNet | MobileNet_v1 | fps, top1, top5 | TensorFlow | Qualcomm | b |
| | | | MobileNet_v2 | | caffe | AIIA | a |
| | | | | | TensorFlow | AIIA | b |
| | | | | | tflite | Imagination | |
| | | | | | caffe | Xilinx | |
| | | | Resnet101 | | caffe | AIIA | a |
| | | | | | TensorFlow | AIIA | b |
| | | | VGG16 | | caffe | AIIA | a |
| | | | | | TensorFlow | AIIA | b |
| | | | | | TensorFlow | AIIA | a |
| | | | Inception_v3 | | TensorFlow | Qualcomm | b |
| | | | | | caffe | Xilinx | |
| 2 | Object recognition | VOC2012 | SSD_VGG16 | fps, mAP | caffe | AIIA | a |
| | | | SSD_VGG | | caffe | ARM | b |
| | | | ssd_mobilenet_v1 | | caffe | AIIA | a |
| | | | | | TensorFlow | Qualcomm | b |
| | | | ssd_mobilenet_v2 | | caffe | AIIA | a |
| | | | SSD | | TensorFlow | Xilinx | b |
| 3 | Super-resolution | 2017 CVPR | VDSR | fps, PSNR | caffe | AIIA | a |
| | | | | | TensorFlow | Qualcomm | b |
| | | | VGG19 | | TFlite | Imagination | |
| 4 | Semantic segmentation | VOC2012 | DeepLabv3+ | fps, mIoU | TensorFlow | AIIA | a |
| | | | | | TensorFlow | Qualcomm | b |
| | | | FCN | | caffe | AIIA | a |
| | | | FPN | | caffe | Xilinx | b |
| 5 | Face recognition | LFW | Light CNN | fps, accuracy | caffe | ARM | b |
| | | | Inception-ResNet-v1 | | TensorFlow | AIIA | b |
| 6 | Face detection | Xilinx data | DenseBox | fps | caffe | Xilinx | b |

a: model evaluated in the first round; b: model added in the second round
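The classification scenario is scored with top1 and top5. The slides do not spell out the computation, so here is a minimal sketch of the standard definition: a sample counts as correct if its true label is among the k highest-scoring classes. The input layout (one list of class scores per image, integer labels) and the name `topk_accuracy` are assumptions for illustration.

```python
from typing import Sequence


def topk_accuracy(scores: Sequence[Sequence[float]],
                  labels: Sequence[int], k: int) -> float:
    """Fraction of samples whose true label is among the k highest scores."""
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and aligned")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices of the k largest scores for this sample.
        top = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top
    return hits / len(labels)


if __name__ == "__main__":
    scores = [[0.1, 0.7, 0.2],   # best guess: class 1
              [0.5, 0.3, 0.2],   # best guess: class 0
              [0.2, 0.3, 0.5]]   # best guess: class 2
    labels = [1, 1, 0]
    print(topk_accuracy(scores, labels, 1))  # one hit out of three
    print(topk_accuracy(scores, labels, 2))  # two hits out of three
```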

Page 8

IV. The AIIA DNN Benchmark tools were mainly supported by the following organizations. We thank the more than 20 companies and institutions that gave strong support during the evaluation.

HK 南京华科广发

Page 9

V. Version 0.5b Results

Page 10

Key words: inference at the edge; integer vs. floating point (int8, fp16, fp32)

Figure: industry applications placed on a log-scale power axis, Log(Power) (W): security cameras, robots, IoT, mobile phones and autonomous driving

Page 11

DUT1 information (basic chip information disclosure): mobile phone SoC (UniSoc T710), engineering prototype

| Item | Value |
|---|---|
| Processor | UniSoc T710 |
| Description | Mobile phone SoC |
| Process | TSMC 12FFC |
| CPU | 4 Cortex-A75 + 4 Cortex-A55 |
| NNA (NPU) | Imagination PowerVR AX2185 |
| GPU | Imagination PowerVR GM9446 |
| Interface | PCIe 3.0, USB 3.0, UFS 2.1 |
| System | Android, Ubuntu |
| Supported mobile frameworks | TensorFlow, Caffe, ONNX |
| Year | 2018 |

Page 12

(The scenario table from page 7 is repeated on this slide.)

UniSoc T710: participating scenarios

a: model evaluated in the first round; b: model added in the second round

Five models across two scenario types, each run with two acceleration methods (PowerVR NN / Android NN)

Page 13

Completely standalone H/W accelerator: all key (currently mainstream) layers are fully hardware-accelerated, with industry-leading INT8/INT16 edge inference performance.

Top1/Top5 accuracy is well preserved from INT16 to INT8: using its own toolchain, the chip delivers strong performance while keeping Top1/Top5 accuracy stable.
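The slide states that accuracy holds from INT16 to INT8 but does not describe the quantization scheme used by the PowerVR NNA tools. As a generic illustration only (an assumption, not the vendor's method), the sketch below applies per-tensor symmetric quantization at both bit widths and shows that the worst-case rounding error is bounded by half the scale, which is much larger for INT8 than for INT16. `quantize_symmetric` and `dequantize` are hypothetical helper names.

```python
import numpy as np


def quantize_symmetric(x: np.ndarray, num_bits: int = 8):
    """Per-tensor symmetric quantization; returns (integer tensor, scale)."""
    if num_bits not in (8, 16):
        raise ValueError("only 8- and 16-bit quantization are sketched here")
    qmax = 2 ** (num_bits - 1) - 1          # 127 for INT8, 32767 for INT16
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax, qmax)
    dtype = np.int8 if num_bits == 8 else np.int16
    return q.astype(dtype), scale


def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Map integers back to real values."""
    return q.astype(np.float32) * scale


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(0.0, 0.05, size=10_000).astype(np.float32)  # fake weights
    for bits in (16, 8):
        q, s = quantize_symmetric(w, bits)
        err = np.max(np.abs(dequantize(q, s) - w))
        print(f"INT{bits}: max abs error {err:.2e} (bound {s / 2:.2e})")
```

Whether this error actually moves Top1/Top5 depends on the network; real toolchains typically add per-channel scales and calibration data, which this sketch omits.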

UniSoc T710

Page 14

Figure: MobileNet v2, IMG NNA Tools vs. Android NN API (series: Android NN API INT8, IMG NNA Tools INT8)

| Test case | Network | FPS | Accuracy / PSNR | Input | API |
|---|---|---|---|---|---|
| Face recognition | Inception_resnet_v1_quant8 | 46.7 | 88.8% | 160x160 | Android NN |
| Super_Resolution | VGG19-Quant8 | 10.24 | 58.25 (PSNR) | 192x192 | Android NN |
| Object_Classification | MobileNetV2_quant8 | 108.28 | 85.50% (Top5) | 224x224 | Android NN |
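The super-resolution row is scored by PSNR. The slides do not state the peak value or colour channel used, so this minimal sketch assumes 8-bit pixels (peak value 255) given as flat sequences; it implements the standard definition PSNR = 10 · log10(MAX² / MSE). The function name `psnr` and its parameters are illustrative.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], test: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    # Every pixel off by one grey level -> MSE = 1, PSNR = 20*log10(255).
    print(f"{psnr([0, 0, 0, 0], [1, 1, 1, 1]):.2f} dB")
```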

Offline tools support network productization, including conversion from popular frameworks. Using the offline tools can substantially improve performance and speed up productization.

Wide network-layer support and good Android NN API support.

China Artificial Intelligence Industry Development Alliance

UniSoc T710

Page 15

DUT2 information (basic chip information disclosure): smart speech recognition module CI1006A1CSD02

| Item | Value |
|---|---|
| Processor | CI1006A1CSD02 |
| Description | DNN speech recognition chip based on an ASIC architecture |
| Process | - |
| CPU | ARM M4 |
| NPU | BNPU |
| Memory | 16M |
| Interface | UART, I2C, SPI, PWM, infrared and other peripheral control interfaces |
| System | RTOS |
| Supported mobile framework | - |
| Year | 2017 |

Page 16

Environment:

Test conditions for speech chip modules:

Metrics:

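Page 17

Detailed results for the four categories of metrics (at every distance the sound source is placed at multiple angles):

| No. | Test item | Reverberation | Distance | Quiet | Stationary noise | Non-stationary noise | Self-noise |
|---|---|---|---|---|---|---|---|
| 1 | False wake-up rate | Normal | 1 m | 0% | 0% | 10 per 50 h | 0% |
| | | | 3 m | 0% | 0% | 10 per 50 h | 0% |
| | | | 5 m | 0% | 0% | 10 per 50 h | 0% |
| | | Strong | 1 m | 0% | 0% | 10 per 50 h | 0% |
| | | | 3 m | 0% | 0% | 10 per 50 h | 0% |
| | | | 5 m | 0% | 0% | 10 per 50 h | 0% |
| 2 | Wake-up rate | Normal | 1 m | 99.9% | 99% | 93% | 99.9% |
| | | | 3 m | 99.9% | 98% | 92% | 99.9% |
| | | | 5 m | 99% | 97% | 90% | 99% |
| | | Strong | 1 m | 99% | 98% | 92% | 99% |
| | | | 3 m | 97% | 96% | 92% | 97% |
| | | | 5 m | 95% | 94% | 90% | 95% |
| 3 | Recognition accuracy | Normal | 1 m | 99.9% | 98% | 93% | 99.9% |
| | | | 3 m | 99.9% | 97% | 92% | 99.9% |
| | | | 5 m | 99% | 96% | 90% | 99% |
| | | Strong | 1 m | 99% | 97% | 92% | 99% |
| | | | 3 m | 97% | 96% | 90% | 97% |
| | | | 5 m | 96% | 92% | 88% | 96% |
| 4 | False recognitions | Normal | 1 m | 0% | 2% | 6% | 0% |
| | | | 3 m | 0% | 3% | 7% | 0% |
| | | | 5 m | 1% | 4% | 8% | 1% |
| | | Strong | 1 m | 1% | 2% | 6% | 1% |
| | | | 3 m | 2% | 3% | 8% | 2% |
| | | | 5 m | 3% | 7% | 10% | 3% |

Test set basis: Chengdu Chipintelli Technology Co., Ltd. standard "Speech Recognition and Performance Test Standard for Local Voice Modules"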
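The table mixes a percentage (wake-up rate) with a count normalized to a listening window (false wake-ups per 50 hours). The referenced Chipintelli test standard is not reproduced in the slides, so the definitions below are assumptions: wake-up rate as successful wake-ups divided by wake-word trials, and false wake-ups rescaled to a 50-hour window of audio that contains no wake word. `WakeTrialLog` and both helpers are hypothetical, and the example counts are made up.

```python
from dataclasses import dataclass


@dataclass
class WakeTrialLog:
    """Hypothetical tally for one test condition (e.g. 3 m, normal reverb)."""
    wake_word_trials: int     # times the wake word was played
    successful_wakes: int     # times the device woke on the wake word
    background_hours: float   # hours of audio without the wake word
    false_wakes: int          # wake-ups triggered during that audio


def wake_up_rate(log: WakeTrialLog) -> float:
    """Percentage of wake-word utterances that woke the device."""
    if log.wake_word_trials <= 0:
        raise ValueError("need at least one wake-word trial")
    return 100.0 * log.successful_wakes / log.wake_word_trials


def false_wakes_per(log: WakeTrialLog, hours: float = 50.0) -> float:
    """False wake-ups rescaled to a fixed listening window (default 50 h)."""
    if log.background_hours <= 0:
        raise ValueError("need a positive amount of background audio")
    return log.false_wakes * hours / log.background_hours


if __name__ == "__main__":
    log = WakeTrialLog(wake_word_trials=1000, successful_wakes=920,
                       background_hours=100.0, false_wakes=20)  # made-up counts
    print(f"wake-up rate: {wake_up_rate(log):.1f}%")            # 92.0%
    print(f"false wake-ups: {false_wakes_per(log):.0f} per 50 h")  # 10 per 50 h
```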

Page 18

DUT3 information (basic chip information disclosure): ZCU104 board

| Item | Value |
|---|---|
| Processor | ZU7EV |
| Description | Evaluation kit for embedded vision applications |
| Process | 16nm |
| CPU | Quad-core ARM® Cortex™-A53 applications processor, dual-core Cortex-R5 real-time processor |
| GPU | Mali™-400 MP2 |
| Interface | USB3, DP, SATA, LPC FMC |
| System | Linux |
| Supported mobile framework | N |
| Year | 2017 |

Page 19

(The scenario table from page 7 is repeated on this slide.)

ZCU104: participating scenarios

a: model evaluated in the first round; b: model added in the second round

Seven models across four scenario types

Page 20

INT8 (ZU7EV): performance and accuracy results for seven models across four scenario types

Page 21

DUT4 information (basic chip information disclosure): Qualcomm QRD855 reference test device

| Item | Value |
|---|---|
| Processor | Snapdragon 855 Mobile Platform |
| Description | First mobile platform to collectively commercialize 5G, AI, XR |
| Process | 7nm |
| CPU | Qualcomm® Kryo™ 485 CPU (octa-core) |
| GPU | Qualcomm® Adreno™ 640 GPU |
| Interface | USB 3.1; USB Type-C support |
| System | Android |
| Supported mobile framework | SNPE |
| Year | 2018 |

Page 22

(The scenario table from page 7 is repeated on this slide.)

QRD855: participating scenarios

a: model evaluated in the first round; b: model added in the second round

Five models across four scenario types

Page 23

INT8: performance and accuracy results for five models across four scenario types (SNPE v1.27.0)

| | ssd_mobilenet_v1 (300x300) | DeepLabv3+ (513x513) | VDSR (256x256) |
|---|---|---|---|
| INT8 | 0.385 | 0.6993 | 25.5544 |
| Original accuracy | - | - | - |
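The scenario table lists mIoU as the semantic-segmentation metric. The slides give no evaluation details, so this is a minimal sketch of the usual definition: per-class intersection over union, averaged over classes that appear in either map. Ignoring pixels labelled 255 follows the common VOC convention and is an assumption here, as are the function and parameter names. For a dataset-level score, VOC-style evaluation accumulates intersections and unions over all images before dividing; passing all label maps flattened and concatenated into one array does exactly that with this function.

```python
import numpy as np


def mean_iou(pred, target, num_classes: int, ignore_index: int = 255) -> float:
    """Mean intersection-over-union across classes present in pred or target."""
    pred = np.asarray(pred)
    target = np.asarray(target)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    valid = target != ignore_index  # drop "void"/boundary pixels
    ious = []
    for c in range(num_classes):
        p = (pred == c) & valid
        t = (target == c) & valid
        union = np.count_nonzero(p | t)
        if union == 0:
            continue  # class absent from both maps: leave it out of the mean
        ious.append(np.count_nonzero(p & t) / union)
    return float(np.mean(ious)) if ious else float("nan")


if __name__ == "__main__":
    target = np.array([[0, 0], [1, 1]])
    pred = np.array([[0, 1], [1, 1]])
    # class 0: IoU 1/2, class 1: IoU 2/3
    print(mean_iou(pred, target, num_classes=2))
```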

Page 24

AIIA DNN benchmark v0.5 Top1

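Page 25

AIIA DNN benchmark v0.5: Top1 leaderboard, 12 models across five scenario types (mobile phones, INT8)

Columns: performance, accuracy

ImageNet validation set: 1,000 images

Leaderboard entries (items fixed by the benchmark): model; test dataset; preprocessing method; single-threaded inference task

Adding a test scenario (items the participant submits): original FP32 model file; preprocessing; accuracy; dataset; post-processing script

Updated data are published regularly; companies are welcome to submit results to the leaderboard.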

Page 26

AIIA DNN benchmark v0.5: Top1 leaderboard, 10 models across five scenario types (mobile phones, FP16)

Columns: performance, accuracy

ImageNet validation set: 1,000 images

Page 27

AIIA DNN benchmark v0.5: Top1 leaderboard, 10 models across five scenario types (boards, INT8)

Columns: performance, accuracy

Page 28

AIIA DNN benchmark model details

Page 29

Work plan

Application scenario iteration: richer evaluation targets, including voice interaction, ADAS/autonomous driving and smart cameras/security

Metrics expansion: power

Benchmark demo update

Release the Version 1.0 guidelines: Guidelines of artificial intelligence chip benchmark, Part 1: Metrics and evaluation methods for terminal-based deep neural network processor benchmark

Iterative release of benchmark results: AIIA 2019 Artificial Intelligence Developer Conference …

Release the Benchmark v0.5 evaluation method (DNN processor benchmark for inference at the cloud) and start the first round of cloud inference v0.5 testing

Page 30

Thanks

Contact:[email protected]