TRANSCRIPT
© 2019 Arm Limited
Arm Ethos NPUs: Delivering Solutions for All AI Workloads
What is AI Being Used for in Endpoints?
AI performs well with ‘patterns’ of data
Vision: Images and video
• Object detection, face unlock, defocus (bokeh), beautification, scaling, etc.
Voice: Recognition and creation
• Keyword spotting, speech recognition, natural language processing, speech synthesis, etc.
Vibration: Any ‘signal’
• Accelerometer, pressure, lidar/radar, speed, shock, vibration, pollution, density, viscosity, etc.
Diversity of AI Requirements in the Market
Premium
• Best user experience and responsiveness
• Highest performance in power-efficient design
Balanced
• Superior user experience in mid-range designs
• Balance performance with area and power
Cost Sensitive
• Delivering advanced user experiences for the most cost-sensitive designs
• Optimized for performance in the smallest area
Introducing Ethos NPUs for Every Market Segment
• Performance-critical AI applications delivering premium experiences
• Supporting AI applications in the most cost-sensitive endpoint devices
• Enabling AI applications in mid-range devices, balancing performance with cost and battery life constraints
Arm Ethos NPU Family Enables Multiple IP Choices For Devices
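[Chart: Ethos NPU products positioned by performance and bandwidth]
Products: Ethos-N37, Ethos-N57, Ethos-N77
Performance levels shown: 1 TOP/s, 2 TOP/s, 4 TOP/s
Bandwidth levels shown: < 2 GB/s, 4 GB/s, 8 GB/s
Target devices: Smart Cameras, Entry Smartphones, DTV, Mainstream Smartphones, Smart Home Hub, Computational Photography, Premium Smartphones, AR/VR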
ML for All: Arm Ethos Family of NPUs
Common Software Stack based on Arm NN
Common Hardware Architecture
Datatypes: Int8, Int16
Frequency: up to 1 GHz
Software and Distributed Hardware Compression
Winograd and Sparsity
Ethos-N77
Up to 4 TOP/s @ 1 GHz
1 - 4MB Configurable Internal SRAM
Ethos-N57
Up to 2 TOP/s @ 1 GHz
512 KB Internal SRAM
Ethos-N37
Up to 1 TOP/s @ 1 GHz
512 KB Internal SRAM
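One way to read the family figures above is as a lookup: given a required peak throughput, pick the smallest Ethos NPU whose "up to" figure covers it. The sketch below is illustrative only. The `EthosNpu` class and `smallest_npu_for` helper are hypothetical names, and a quoted peak TOP/s figure is not the same as sustained performance on a real network.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EthosNpu:
    name: str
    peak_tops: int     # "up to" TOP/s at 1 GHz, as quoted on the slide
    min_sram_kb: int   # smallest internal SRAM configuration
    max_sram_kb: int   # largest internal SRAM configuration


# Figures copied from the slide, ordered from smallest to largest part.
FAMILY = (
    EthosNpu("Ethos-N37", 1, 512, 512),
    EthosNpu("Ethos-N57", 2, 512, 512),
    EthosNpu("Ethos-N77", 4, 1024, 4096),
)


def smallest_npu_for(required_tops: float) -> Optional[EthosNpu]:
    """Return the smallest family member whose quoted peak covers the need."""
    for npu in FAMILY:
        if npu.peak_tops >= required_tops:
            return npu
    return None  # requirement exceeds the quoted peak of any single NPU


if __name__ == "__main__":
    for need in (0.5, 1.5, 3.0, 6.0):
        choice = smallest_npu_for(need)
        print(need, "TOP/s ->", choice.name if choice else "no single NPU")
```

A real selection would also weigh area, power and DRAM bandwidth, which this lookup ignores.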
Benefits of Arm Ethos NPU Family
Flexible architecture enables compatibility with current and future ML networks
High-performance ML in highly cost, power and bandwidth-efficient designs
Arm unified software stack facilitates ML app portability across Cortex-A CPUs, Mali GPUs and Ethos NPUs
Advanced compression and data management unlocks ML capabilities with limited DRAM bandwidth
Multicore and mesh support providing increased NPU performance
Flexible, Scalable, Programmable NPU Architecture
Computation Engine (CE) = MCE + PLE
Ethos-N77/N57/N37 = 16x/8x/4x CE
[Block diagram: SRAM, MAC Computation Engine (MCE), Programmable Layer Engine (PLE), Network Control Unit, DMA, External Memory System]
Network Control Unit
• Independently controls execution of the neural network end-to-end
MAC Computation Engine (MCE)
• Fixed-function MAC engine with Winograd supports highly efficient convolutional operations
Programmable Layer Engine (PLE)
• Programmable, software-upgradable engine for activations and vector operations
DMA
• Manages data transfer to/from system memory and optimized internal data movement
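The CE counts on this slide can be combined with the quoted peak figures (4/2/1 TOP/s at 1 GHz) to see how throughput scales with the number of Computation Engines. The sketch below is back-of-envelope arithmetic, not an Arm specification. It assumes one multiply-accumulate counts as two operations, and the function name `implied_macs_per_ce` is hypothetical.

```python
FREQ_HZ = 10**9    # 1 GHz, the clock at which the peak figures are quoted
OPS_PER_MAC = 2    # assumption: one MAC = one multiply + one add

# name -> (quoted peak TOP/s, number of Computation Engines)
NPUS = {
    "Ethos-N77": (4, 16),
    "Ethos-N57": (2, 8),
    "Ethos-N37": (1, 4),
}


def implied_macs_per_ce(peak_tops: int, num_ce: int) -> float:
    """MACs per cycle per CE implied by a quoted peak TOP/s figure."""
    ops_per_cycle = peak_tops * 10**12 // FREQ_HZ
    macs_per_cycle = ops_per_cycle // OPS_PER_MAC
    return macs_per_cycle / num_ce


if __name__ == "__main__":
    for name, (tops, ces) in NPUS.items():
        print(name, implied_macs_per_ce(tops, ces))  # 125.0 for each part
```

All three parts give the same per-CE figure, which is consistent with performance scaling linearly with CE count. The quoted "up to" numbers are rounded, so the result should be treated as approximate.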
Compute Engine: Micro-architecture Block Diagram
Compute Engine (CE) = MCE + PLE
Ethos-N77/N57/N37 = 16x/8x/4x CE
[Block diagram]
MAC Computation Engine (MCE): MAC Unit, Weight Decompression, Winograd, Scaling, Control
Shared SRAM (Ethos-N77: 64KB/256KB*, Ethos-N57: 64KB, Ethos-N37: 128KB)
Programmable Layer Engine (PLE): PLE Memory (SRAM), Vector Unit, Register File, Load-store Unit
Quad Routing Block (QRB): links to Compute Engine, to NCU and DMA, and to other QRB
Ethos-N77: 4 Quads, NCU, DMA
Ethos-N57: 2 Quads, NCU, DMA
Ethos-N37: 1 Quad, NCU, DMA
Total Memory
• Ethos-N77 (1MB) = 16 x 64KB
• Ethos-N77 (4MB) = 16 x 256KB
• Ethos-N57 (512KB) = 8 x 64KB
• Ethos-N37 (512KB) = 4 x 128KB
* For 4MB Ethos-N77 configuration
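The Total Memory figures follow directly from the CE count multiplied by the shared SRAM per CE. A short check, using only numbers from the slide (the tuple layout and the helper name `total_sram_kb` are illustrative):

```python
# (configuration, number of CEs, shared SRAM per CE in KB, quoted total in KB)
CONFIGS = [
    ("Ethos-N77 (1MB)", 16, 64, 1024),
    ("Ethos-N77 (4MB)", 16, 256, 4096),
    ("Ethos-N57 (512KB)", 8, 64, 512),
    ("Ethos-N37 (512KB)", 4, 128, 512),
]


def total_sram_kb(num_ce: int, kb_per_ce: int) -> int:
    """Total internal SRAM when every CE carries the same shared SRAM."""
    return num_ce * kb_per_ce


if __name__ == "__main__":
    for name, num_ce, kb_per_ce, quoted_kb in CONFIGS:
        total = total_sram_kb(num_ce, kb_per_ce)
        print(f"{name}: {num_ce} x {kb_per_ce}KB = {total}KB "
              f"(quoted {quoted_kb}KB)")
```

Note that Ethos-N37 reaches the same 512KB total as Ethos-N57 with half the CEs because each of its CEs carries twice the SRAM.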
Arm Ethos NPU: System-level View
[System diagram]
Host (Cortex-A CPU) software stack, top to bottom: ML Application, ML Framework, Arm NN, NPU Compiler/Driver
Arm Ethos NPU: NCU, MCE, PLE, DMA
Memory: SRAM, SRAM, DRAM
Minimized Data Movement is Key to Highest ML Performance
• Distributed, local SRAM architecture allows 90% of data accesses to occur locally (inside Ethos NPU)
• Reduces loading on system DRAM allowing increased throughput and lower power
• Highly effective utilization of MACs minimizes bandwidth constraints
• Software and hardware compression minimizes amount of data that needs to be moved and managed
[Chart: SRAM and DRAM Access Per Inference. Share of accesses (0% to 100%) split between DRAM and SRAM for MobileNet_v1, Inception_v3, ResNet_v1-50 and VGG-16]
Source: Arm Engineering Analysis
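To see why the share of local accesses matters, the arithmetic can be sketched as follows. Only the 90% figure comes from the slide. The 200 MB of data touched per inference and the 30 inferences per second are hypothetical inputs chosen for illustration, and the function names are not from any Arm tool.

```python
def dram_bytes_per_inference(total_bytes_accessed: int, local_pct: int) -> int:
    """Bytes that still go to DRAM when local_pct percent of accesses hit SRAM."""
    if not 0 <= local_pct <= 100:
        raise ValueError("local_pct must be between 0 and 100")
    return total_bytes_accessed * (100 - local_pct) // 100


def dram_bandwidth_bytes_per_s(dram_bytes: int, inferences_per_s: int) -> int:
    """Sustained DRAM traffic for a given inference rate."""
    return dram_bytes * inferences_per_s


if __name__ == "__main__":
    total = 200_000_000   # hypothetical: 200 MB touched per inference
    rate = 30             # hypothetical: 30 inferences per second
    for pct in (0, 90):
        dram = dram_bytes_per_inference(total, pct)
        bw = dram_bandwidth_bytes_per_s(dram, rate)
        print(f"{pct}% local: {dram / 1e6:.0f} MB/inference, {bw / 1e9:.1f} GB/s")
```

Under these assumed inputs, moving from 0% to 90% local accesses cuts DRAM traffic by a factor of ten, which is the effect the bullets above describe.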
High-efficiency Gains from Augmented NN Techniques
• Highly accurate Winograd techniques provide up to 3x reduction in calculations
• 16-bit quantized support for HDR image processing, high-fidelity audio, etc.; support for mixed precision
• Native hardware support for key computation patterns: convolution, deconvolution, depth-wise separable, vector products, stride modes
• Built-in sparsity support: zero-gating techniques to eliminate redundant computations and save power
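The idea behind the Winograd saving can be shown with the smallest textbook case, F(2,3): two outputs of a 3-tap 1D filter computed with 4 multiplications instead of 6. The sketch below is a generic illustration of the technique, not a description of how the MCE implements it. The slides do not state which tile sizes the hardware uses, and the function names are hypothetical. `Fraction` is used so the comparison with the direct method is exact.

```python
from fractions import Fraction


def direct_conv_1d(d, g):
    """Two outputs of a 3-tap filter over 4 inputs: 6 multiplications."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f2_3(d, g):
    """The same two outputs via Winograd F(2,3): 4 multiplications."""
    d = [Fraction(x) for x in d]
    g = [Fraction(x) for x in g]
    # Filter transform: depends only on the weights, so it can be
    # computed once per filter and reused for every input tile.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]
    # The input transform uses only additions and subtractions;
    # the four products below are the only data-dependent multiplies.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3
    return [m1 + m2 + m3, m2 - m3 - m4]


if __name__ == "__main__":
    inputs, weights = [3, -1, 4, 2], [2, 5, -3]
    print(direct_conv_1d(inputs, weights))                    # [-11, 12]
    print([int(y) for y in winograd_f2_3(inputs, weights)])   # [-11, 12]
```

In 1D this is a 1.5x reduction in multiplications. Nesting the same transform for a 2x2 output tile of a 3x3 filter needs 16 multiplications instead of 36 (2.25x), and larger tiles reduce the count further at the cost of more additions and tighter numerical margins.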
Arm Ethos NPUs Enable Secure AI
• AI security needs driven by data
  • Heightened need to keep personal data and protected content secure and on device
• Use case: Personal information protection
  • Face unlock, voice verification, biometrics
• Use case: Authorization and payments
  • Protects against SW and HW attacks
• Use case: Protected content
  • Models and content created by OEMs and AI vendors
• Security is a system problem
  • Security cannot be achieved by a single IP alone
  • Complementary system HW IP and SW packages are required
  • Arm NPU security scheme is highly programmable to accommodate different needs from partners
[Icons: Mobile Payment, Content Protection, Authentication]
Open-source Standard Software Speeds ML Deployment
An inference engine for Machine Learning
350M+ devices shipped with Arm NN
[Stack diagram]
Application
Training-time tooling
Model conversion and optimization tooling
Network import (e.g. TensorFlow, Caffe, ONNX)
Parsers
NN API
NN inference engine
• Optimized NN algorithms
• Seamless dispatch to dedicated IP
Hardware: NPUs, CPUs, GPUs, 3rd party IP
Profiling and debugging tooling
Callouts:
1 Connect to high-level frameworks
2 Connect to inference engines
3 Integrate additional IP
4 Supported by end-to-end tooling
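The "seamless dispatch to dedicated IP" idea can be pictured as walking a list of preferred backends for each layer and falling back when the first choice does not support the operation. The sketch below is a simplified model of that idea, not the Arm NN API. The function `assign_backends`, the backend labels "NPU", "GPU" and "CPU", and the supported-operation sets are all hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple


def assign_backends(
    layers: Sequence[Tuple[str, str]],
    preferred: Sequence[str],
    supported_ops: Dict[str, Set[str]],
) -> List[Tuple[str, str]]:
    """Map each (layer_name, op_type) to the first preferred backend
    that supports op_type."""
    plan = []
    for layer_name, op_type in layers:
        for backend in preferred:
            if op_type in supported_ops.get(backend, set()):
                plan.append((layer_name, backend))
                break
        else:
            # No backend in the preference list can run this operation.
            raise ValueError(
                f"no backend supports {op_type!r} for layer {layer_name!r}"
            )
    return plan


# Hypothetical capabilities, for illustration only.
SUPPORTED = {
    "NPU": {"conv2d", "depthwise_conv2d", "relu"},
    "GPU": {"conv2d", "depthwise_conv2d", "relu", "softmax"},
    "CPU": {"conv2d", "depthwise_conv2d", "relu", "softmax", "custom_op"},
}

MODEL = [
    ("conv1", "conv2d"),
    ("act1", "relu"),
    ("post", "custom_op"),
    ("prob", "softmax"),
]

if __name__ == "__main__":
    for layer, backend in assign_backends(MODEL, ["NPU", "GPU", "CPU"], SUPPORTED):
        print(f"{layer}: {backend}")
```

With this preference order the convolution and activation land on the NPU, while the unsupported layers fall back to the GPU or CPU, so the application code does not change when the available hardware does.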
Arm NN is Supported by a Broad Ecosystem
“We're delighted to work with Arm on enabling high performance ML across the breadth of Android devices. The Arm Compute Library
provides excellent performance. We're looking forward to using it.”
The Android Neural Networks team
On Arm NN: “… one of the largest open-source teams in the world, and is in the top 2% of all project teams on Open Hub [which tracks
many repositories including GitHub]”
Open Hub
Facts and Figures
• Around 445K lines of code
• Approx. 120 Eng. Years of effort to date
• Estimated that Arm NN is already shipping in over 350M Android devices
• Arm NN SDK donated to Linaro to become the centrepiece of their Machine Intelligence Initiative
“The combination of Amazon SageMaker Neo and the Arm NN SDK will help developers optimize ML models to run efficiently on a
wide variety of connected edge devices.”
from Amazon SageMaker Neo PR
“The TensorFlow team is excited to work with Arm and Linaro to expand support for edge devices, and we’re looking forward to
integrating with the Arm NN library.”
Pete Warden, Google
Ethos NPUs Deliver Superior Experiences in Mid-range Mobile
• Super Resolution: Upscale images from 1080p to 2K/4K for a superior viewing experience
• Face Unlock: Unlock phone and authorize payments using facial biometrics
• Avatars: Support innovative real-time animations on selfies
• Night Mode: Enhance images in low/no light conditions
Ethos NPUs Bring Theatre-quality Experiences to your Home
• Super Resolution: Upscale images from 1080p to 2K/4K for a superior viewing experience
• Auto Pause: Auto detect children and pause age-restricted content
• Context-aware Sound: Tune and amplify sound based on what is playing
• Picture Quality: Enhance colors based on scene analysis
Raising a Successful AI Ecosystem Takes a Global Village
Leverage this new world through trusted partnerships that only Arm can bring
Arm AI Ecosystem starts with optimizing software for Cortex CPUs and expands to Mali GPUs and Arm NPUs
Arm AI Ecosystem embraces open-source software Arm NN
Arm AI Ecosystem delivers end-to-end developer resources to quickly and easily deploy AI applications everywhere
Arm’s central position in the market guarantees the strongest AI Ecosystem
AI Ecosystem is about helping you build complete solutions
• Arm’s scalable and efficient NPU family enables optimized ML capabilities
• Delivering best-in-class efficiency
• Arm ML software makes it easy to run AI on Arm, whatever the SoC
Thank You, Danke, Merci, 谢谢, ありがとう, Gracias, Kiitos, 감사합니다, धन्यवाद, شكًرا, תודה
The Arm trademarks featured in this presentation are registered trademarks or trademarks of Arm Limited (or its subsidiaries) in
the US and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners.
www.arm.com/company/policies/trademarks