arm compute library

ARM Compute Library

https://developer.arm.com/technologies/compute-library

An image processing and CNN library released by ARM

Available on Linux, Android, and bare metal

2017.04.01 (Sat)

@Vengineer

Preparing the cross compiler

AArch64 (arm64-v8a) : gcc-linaro-5.3-2016.02-x86_64_aarch64-linux-gnu

ARM (armv7a) : aro/gcc-linaro-5.3-2016.02-x86_64_arm-linux-gnueabihf

Build

% scons debug=1 neon=1 opencl=0 arch=arm64-v8a

OpenCL support

Available only when libOpenCL.so supports the GPU (ARM Mali)

This deck covers NEON only

Image processing functions

　・Basic arithmetic, mathematical and binary operator functions
　・Colour manipulation (conversion, channel extraction, and more)
　・Convolution filters (Sobel, Gaussian, and more)
　・Canny Edge, Harris corners, optical flow and more
　・Pyramids (such as Laplacians)
　・HOG (Histogram of Oriented Gradients)
　・SVM (Support Vector Machines)
　・H/SGEMM (Half and Single precision General Matrix Multiply)

Convolutional Neural Network functions

　・Activation
　・Convolution
　・Fully connected
　・Locally connected
　・Normalization
　・Pooling
　・Soft-max

Sample code: scale (NEON)

PPMLoader ppm;                     // PPM file loader
Image src, dst;                    // image buffers

ppm.open(argv[1]);                 // open the file
ppm.init_image(src, Format::U8);   // initialize src from the PPM file

constexpr int scale_factor = 2;

// Tensor info for dst: the source dimensions divided by scale_factor
TensorInfo dst_tensor_info(
    src.info()->dimension(0) / scale_factor,
    src.info()->dimension(1) / scale_factor,
    Format::U8);

Sample code: scale (NEON), continued

dst.allocator()->init(dst_tensor_info);   // initialize dst

NEScale scale;                            // scale function
scale.configure(&src, &dst,               // configure
                InterpolationPolicy::NEAREST_NEIGHBOR,
                BorderMode::UNDEFINED);

src.allocator()->allocate();              // allocate memory
dst.allocator()->allocate();              // allocate memory

scale.run();                              // run
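To make concrete what a nearest-neighbour downscale by scale_factor computes, here is a minimal standalone sketch in plain C++. It does not use the library: GrayImage and scale_nearest are hypothetical names, and sampling the top-left pixel of each block is a simplification, so the library's NEON kernel may pick a different sample point.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a U8 image; not the library's Image class.
struct GrayImage
{
    std::size_t width;
    std::size_t height;
    std::vector<std::uint8_t> pixels; // row-major, width * height bytes
};

// Downscale by an integer factor with nearest-neighbour sampling:
// each destination pixel copies the source pixel at (x * factor, y * factor).
GrayImage scale_nearest(const GrayImage &src, std::size_t factor)
{
    assert(factor > 0);
    assert(src.pixels.size() == src.width * src.height);

    // Destination dimensions are the source dimensions divided by the factor,
    // mirroring how dst_tensor_info is built in the sample above.
    GrayImage dst{ src.width / factor, src.height / factor, {} };
    dst.pixels.resize(dst.width * dst.height);

    for(std::size_t y = 0; y < dst.height; ++y)
    {
        for(std::size_t x = 0; x < dst.width; ++x)
        {
            dst.pixels[y * dst.width + x] = src.pixels[(y * factor) * src.width + (x * factor)];
        }
    }
    return dst;
}
```

Calling scale_nearest(src, 2) on a 4x2 image yields a 2x1 image whose pixels are taken from columns 0 and 2 of the first source row.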

Sample code: convolution (NEON)

PPMLoader ppm;                        // PPM file loader
Image src, tmp, dst;                  // image buffers

ppm.open(argv[1]);                    // open the file
ppm.init_image(src, Format::U8);      // initialize src from the PPM file

tmp.allocator()->init(*src.info());   // initialize with the same info as src
dst.allocator()->init(*src.info());   // initialize with the same info as src

NEConvolution3x3 conv3x3;             // 3x3 convolution
NEConvolution5x5 conv5x5;             // 5x5 convolution

Sample code: convolution (NEON), continued

conv3x3.configure(&src, &tmp,         // configure: src -> tmp
                  gaussian3x3, 0, BorderMode::UNDEFINED);

conv5x5.configure(&tmp, &dst,         // configure: tmp -> dst
                  gaussian5x5, 0, BorderMode::UNDEFINED);

src.allocator()->allocate();          // allocate memory
tmp.allocator()->allocate();          // allocate memory
dst.allocator()->allocate();          // allocate memory

conv3x3.run();                        // run
conv5x5.run();                        // run
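The 3x3 step of the sample can be illustrated with a standalone sketch in plain C++ that does not use the library. Several things here are assumptions: the coefficient values in gaussian3x3_sketch (the slides do not show the contents of gaussian3x3), the reading of a scale argument of 0 as "divide by the sum of the coefficients", and the choice to leave border pixels at 0 as a stand-in for an undefined border. convolve3x3 is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed 3x3 Gaussian coefficients; the slides only name `gaussian3x3`.
const std::int16_t gaussian3x3_sketch[9] = { 1, 2, 1,
                                             2, 4, 2,
                                             1, 2, 1 };

// Apply a 3x3 matrix to a row-major U8 image as a sliding weighted sum.
// scale == 0 is treated as "use the sum of the coefficients" (assumption).
// Border pixels are not computed and stay 0 in this sketch.
std::vector<std::uint8_t> convolve3x3(const std::vector<std::uint8_t> &src,
                                      std::size_t width, std::size_t height,
                                      const std::int16_t (&matrix)[9], int scale)
{
    assert(src.size() == width * height);

    if(scale == 0)
    {
        for(std::int16_t c : matrix)
        {
            scale += c;
        }
        if(scale == 0)
        {
            scale = 1; // avoid dividing by zero for zero-sum matrices
        }
    }

    std::vector<std::uint8_t> dst(src.size(), 0);
    for(std::size_t y = 1; y + 1 < height; ++y)
    {
        for(std::size_t x = 1; x + 1 < width; ++x)
        {
            int acc = 0;
            for(std::size_t ky = 0; ky < 3; ++ky)
            {
                for(std::size_t kx = 0; kx < 3; ++kx)
                {
                    acc += matrix[ky * 3 + kx] * src[(y + ky - 1) * width + (x + kx - 1)];
                }
            }
            int value = acc / scale;
            // Clamp to the U8 range before storing.
            if(value < 0)   value = 0;
            if(value > 255) value = 255;
            dst[y * width + x] = static_cast<std::uint8_t>(value);
        }
    }
    return dst;
}
```

Chaining a second call with a 5x5 variant on the result would mirror the src -> tmp -> dst pipeline of the sample.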

Scheduler

arm_compute/runtime/NEON/CPPScheduler.h

arm_compute/runtime/NEON/NEScheduler.h

namespace arm_compute
{
using NEScheduler = CPPScheduler;
}

NEScheduler is simply an alias for CPPScheduler

multithread (no threads)

void CPPScheduler::multithread(ICPPKernel *kernel, const size_t split_dimension)
{
    const Window &max_window     = kernel->window();
    const int     num_iterations = max_window.num_iterations(split_dimension);
    int           num_threads    = std::min(num_iterations, _num_threads);

    // Run on the calling thread when the kernel cannot be parallelised
    // or only one thread would be used
    if(!kernel->is_parallelisable() || 1 == num_threads)
    {
        kernel->run(max_window);
    }

multithread (with threads)

    for(int t = 0; t < num_threads; ++t)
    {
        // Each thread gets its own slice of the window
        Window win = max_window.split_window(split_dimension, t, num_threads);
        win.set_thread_id(t);
        win.set_num_threads(num_threads);

        if(t != num_threads - 1)
        {
            _threads[t].start(kernel, win);   // worker thread
        }
        else
        {
            kernel->run(win);                 // last slice runs on the calling thread
        }
    }
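The splitting pattern above can be tried out with a standalone sketch that uses std::thread. It does not use the library: Range, split_range and run_split are hypothetical names, a one-dimensional half-open range stands in for one dimension of a Window, and the near-equal slicing rule is my own choice rather than what split_window does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical half-open range [begin, end) standing in for one Window dimension.
struct Range
{
    std::size_t begin;
    std::size_t end;
};

// Return the t-th of `total` near-equal slices of `full`.
Range split_range(const Range &full, std::size_t t, std::size_t total)
{
    assert(total > 0 && t < total);
    const std::size_t n    = full.end - full.begin;
    const std::size_t base = n / total;
    const std::size_t rem  = n % total;
    // The first `rem` slices receive one extra iteration each.
    const std::size_t begin = full.begin + t * base + std::min(t, rem);
    const std::size_t size  = base + (t < rem ? 1 : 0);
    return Range{ begin, begin + size };
}

// Run `kernel` over `full`, using at most `max_threads` threads.
// As in the slide, the last slice runs on the calling thread.
void run_split(const Range &full, std::size_t max_threads,
               const std::function<void(const Range &)> &kernel)
{
    const std::size_t n           = full.end - full.begin;
    const std::size_t num_threads = std::min(n, max_threads);

    if(num_threads <= 1)
    {
        kernel(full); // nothing to split: run directly
        return;
    }

    std::vector<std::thread> workers;
    workers.reserve(num_threads - 1);
    for(std::size_t t = 0; t < num_threads; ++t)
    {
        const Range slice = split_range(full, t, num_threads);
        if(t != num_threads - 1)
        {
            workers.emplace_back(kernel, slice);
        }
        else
        {
            kernel(slice);
        }
    }
    // Wait for every worker before returning.
    for(std::thread &w : workers)
    {
        w.join();
    }
}
```

Running the last slice on the calling thread means only num_threads - 1 workers are started, so the caller does useful work instead of just waiting.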

Sample kernel: NEScaleKernel

void NEScaleKernel::run(const Window &window)
{
    ARM_COMPUTE_ERROR_ON_UNCONFIGURED_KERNEL(this);
    ARM_COMPUTE_ERROR_ON_INVALID_SUBWINDOW(INEKernel::window(), window);
    ARM_COMPUTE_ERROR_ON(_func == nullptr);

    (this->*_func)(window);   // call through the member function pointer
}

_func holds one of the following:

_func = &NEScaleKernel::scale_nearest;
_func = &NEScaleKernel::scale_bilinear;
_func = &NEScaleKernel::scale_area;
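The kernel picks its implementation through a pointer to member function. A minimal standalone sketch of that dispatch style follows. ScaleKernelSketch, its Policy enum and the string return values are hypothetical and exist only to make the selection observable; the real kernel takes a Window and returns nothing.

```cpp
#include <cassert>
#include <string>

// Hypothetical kernel showing dispatch through a pointer to member function.
class ScaleKernelSketch
{
public:
    enum class Policy { Nearest, Bilinear, Area };

    // Choose the implementation once, at configuration time.
    void configure(Policy policy)
    {
        switch(policy)
        {
            case Policy::Nearest:  _func = &ScaleKernelSketch::scale_nearest;  break;
            case Policy::Bilinear: _func = &ScaleKernelSketch::scale_bilinear; break;
            case Policy::Area:     _func = &ScaleKernelSketch::scale_area;     break;
        }
    }

    // Call whichever implementation configure() selected.
    std::string run()
    {
        assert(_func != nullptr); // run() before configure() is a usage error
        return (this->*_func)();
    }

private:
    std::string scale_nearest()  { return "nearest"; }
    std::string scale_bilinear() { return "bilinear"; }
    std::string scale_area()     { return "area"; }

    // Pointer to the selected member function; null until configured.
    std::string (ScaleKernelSketch::*_func)() = nullptr;
};
```

Selecting the function once in configure() keeps run() free of per-call branching on the policy.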

The end
