HPE HPC & AI Forum 2018: An Innovation Platform That Accelerates AI Adoption

Hewlett Packard Enterprise, Pointnext Hybrid IT COE, Lead Architect 吉瀬 淳一

Post on 05-Jun-2020

Page 1: HPE HPC & AI Forum 2018 presentation material...Design and build of container as a service (based on Mesosphere DCOS) with OSS CICD lifecycle tool chain such as Jenkins, Tensorflow,

HPE HPC & AI Forum 2018

Hewlett Packard Enterprise, Pointnext Hybrid IT COE, Lead Architect 吉瀬 淳一

An Innovation Platform That Accelerates AI Adoption

Page 2:

Digital Transformation Support from HPE Pointnext

Page 3:

Expertise That Supports Our Customers: HPE Pointnext

Advisory / Professional / Operational

– Understanding customer outcomes and challenges
– Validating feasibility through proofs of concept and pilots
– Designing the transformation plan
– Rapid deployment and implementation of solutions
– Design and configuration of large-scale IT solutions
– Providing ongoing operation and support for solutions
– Flexible delivery models and optimized consumption models

Page 4:

Broader Use Cases of Container Based Transformation

By 2018, more than 50% of new workloads will be deployed into containers in at least one stage of the application life cycle. (Gartner, Mar. 2016)

Adoption has been accelerating because more use cases are being built in various areas. Originally, container use started in application development.

Now expanding to:
• Replacement of VM
• AI / Deep Learning
• CICD Automation

Replacement of Virtualization
• Use container technology instead of virtualization to minimize infrastructure tax
• Benefit: Performance, TCO Optimization, Flexibility

AI / Deep Learning Platform
• Higher utilization and performance of AI / DL frameworks such as Tensorflow could be provided
• Benefit: Efficiency, Agility, Innovation

CICD Automation
• Seamless application development and deployment could be accelerated and automated by the use of containers
• Benefit: Agility, IP Protection

Page 5:

References of Container based Transformation

CICD Automation @ 2017-
Source: https://h50146.www5.hpe.com/products/servers/news/casestudy/jcb-synergy/
Integration of container based application development lifecycle tool chain to transform and accelerate application development with HPE Pointnext Center of Excellence expertise. Design and build of container as a service (based on Mesosphere DCOS) with OSS CICD lifecycle tool chain such as Jenkins, Tensorflow, etc. on HPE Synergy platform. With the transformation, JCB would be able to acquire “agility” and “flexibility” in their application development.

Replacement of Virtualization @ 2016-
Source: https://h20195.www2.hpe.com/v2/Getdocument.aspx?docname=a00045370enw
Adoption of container platform in order to bring efficiency, agility, and flexibility as one package for competitive semicon manufacturing process. Testing of SSD firmware could be

CICD Automation with OpenShift @ 2018-
Design and integration of CICD automation with OSS tools on Red Hat OpenShift in order to increase the speed of application development and to bring clarity and standardization of application development platform for security and governance purpose.
Customer : Financial bank in Japan

Container based automated data analysis on AWS @ 2018-

DL Platform with Container based Distributed GPU @ 2018-
Bring efficiency and high utilization to GPU platform for DL framework, Tensorflow by transforming the platform to containerization.
Customer : Manufacturer in Korea

Customer : Financial Institution in Singapore

Page 6:

Related Technologies and Products We Handle

– Cloud Native Platform (Kubernetes distributions)
  – Mesosphere DC/OS
  – SUSE CaaS Platform
  – Red Hat OpenShift
  – Docker Enterprise
– Infrastructure
  – HPE servers / storage / networking
  – Public cloud
    – AWS
    – Azure
    – GCP
  – Private cloud
    – OpenStack
    – VMware

Page 7:

Deep Learning Starter Package w/ Tensorflow

Best Optimized Platform for Deep Learning

– HPE Apollo 6500 Gen10 System provides superior performance-per-dollar for GPU intensive workloads, with eight NVIDIA Tesla V100 GPUs per server and NVLink interconnect, delivering up to 125 TFlops single precision compute for faster intelligence.
  – Unprecedented performance delivering economical AI and deep learning
  – Rock-solid, enterprise-level reliability, availability, serviceability (RAS) features
  – Supports a wide range of workloads, including deep learning and HPC workloads of complex simulation and modeling
– Open Sourced Deep Learning Framework would be loaded out of box
  – Major deep learning framework in the market, Tensorflow, would be configured for your innovation
– Open Sourced Deep Learning Library for Multiple GPUs
  – Easy to execute complex Deep Neural Network structure with Python
  – Simple and visual management console for deep learning process with TensorBoard
  – No code changes for enabling multiple GPUs to maximize its process power
– Proven Architecture by HPE
  – Various services leveraged Tensorflow in the world
  – Apollo specially developed for innovation of our customers
– Deep Learning Ready Platform
  – Start innovation from today with built-in deep learning architecture
  – Certified architecture by Hewlett Packard Enterprise
  – Various platform support tools are enabled from the beginning, such as iLO Management, CUDA toolkits, and TensorBoard

HPE Deep Learning Development Platform

8 x Tesla GPU with NVLINK2.0 could be loaded on HPE ProLiant XL270d Accelerator Tray

Baremetal TensorFlow

Solution Characteristics: Best GPU Density, Leading DL Framework, Flexible Storage Option, Superior Management Tool

Benefit of HPE Deep Learning Development Platform

Service Description

– Service Duration : 2.0 weeks
– Design & Integration
  – Configuration of Apollo System
  – Installation of CentOS or Ubuntu
  – NVIDIA Driver Implementation
  – CUDA Toolkit / cuDNN Implementation
  – TensorFlow Installation
  – Sample Program Test
– Skill Transfer
  – Sample Program Handover
  – Q&A for 1.0 week
– Output
  – Implementation Report
– Optional Services
  – Deep Learning Consulting Service
  – Inception Integration

[Stack diagram labels: CentOS / Ubuntu; CUDA; Deep Learning Framework (TensorFlow); Sample DL Apps; HPE Apollo 6500 Gen10 + GPU (NVIDIA)]
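The "up to 125 TFlops single precision" figure quoted for the eight-GPU configuration can be sanity-checked with a line of arithmetic. The sketch below assumes a per-GPU peak of 15.7 TFLOPS (FP32) for the Tesla V100 in its NVLink form factor; that per-GPU number is not stated on the slide and is brought in here only as an assumption for illustration.

```python
# Assumption (not from the slide): peak FP32 throughput of one Tesla V100
# with NVLink, in TFLOPS.
V100_FP32_TFLOPS = 15.7

# From the slide: eight GPUs per Apollo 6500 Gen10 server.
GPUS_PER_SERVER = 8

# Aggregate theoretical peak; real workloads will achieve less.
aggregate_tflops = GPUS_PER_SERVER * V100_FP32_TFLOPS
print(f"{aggregate_tflops:.1f} TFLOPS peak across {GPUS_PER_SERVER} GPUs")
```

Under that assumption the aggregate comes out just above 125 TFLOPS, which is consistent with the slide's rounded claim. It is a theoretical peak, not a measured training throughput.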

Page 8:

Deep Learning Starter Package w/ Tensorflow on Container based Distributed GPUs Platform w/ Red Hat

Best Optimized and Scalable Container based Platform for Deep Learning

– HPE Apollo 6500 Gen10 System provides superior performance-per-dollar for GPU intensive workloads, with eight NVIDIA Tesla V100 GPUs per server and NVLink interconnect, delivering up to 125 TFlops single precision compute for faster intelligence.
  – Unprecedented performance delivering economical AI and deep learning
  – Rock-solid, enterprise-level reliability, availability, serviceability (RAS) features
  – Supports a wide range of workloads, including deep learning and HPC workloads of complex simulation and modeling
– Container based Deep Learning Framework would be loaded out of box
  – Major deep learning framework in the market, Tensorflow, would be configured for your innovation
  – Provide the highest level of GPU resource utilization with container technologies
– Open Sourced Deep Learning Library for Multiple GPUs
  – Easy to execute complex Deep Neural Network structure with Python
  – Simple and visual management console for deep learning process with TensorBoard
  – No code changes for enabling multiple GPUs to maximize its process power
– Innovative Architecture by HPE
  – Various services leveraged Tensorflow in the world
  – Apollo specially developed for innovation of our customers
  – Container based Tensorflow would increase the efficiency of GPU resource usage
  – Easy to develop entire application ecosystem by integration of Deep Learning framework on Red Hat OpenShift
– Deep Learning Ready Platform
  – Start innovation from today with built-in deep learning architecture
  – Certified architecture by Hewlett Packard Enterprise
  – Various platform support tools are enabled from the beginning, such as Red Hat OpenShift, iLO Management, CUDA toolkits, and TensorBoard

HPE Deep Learning Development Platform

8 x Tesla GPU with NVLINK2.0 could be loaded on HPE ProLiant XL270d Accelerator Tray

Solution Characteristics: Best GPU Density, Leading DL Framework, Flexible Storage Option, Superior Management Tool

Benefit of HPE Deep Learning Development Platform

Service Description

– Service Duration : 3.0 weeks
– Design & Integration
  – Configuration of Apollo System
  – Installation of CentOS or Ubuntu
  – Installation of Red Hat OpenShift
  – Configuration of master, worker, and infra nodes
  – NVIDIA Driver Implementation
  – CUDA Toolkit / cuDNN Implementation
  – TensorFlow Installation
  – Sample Program Test
– Skill Transfer
  – Sample Program Handover
  – Q&A for 1.0 week
– Output
  – Implementation Report
– Optional Services
  – Deep Learning Consulting Service
  – Inception Integration

[Stack diagram labels: CentOS / Ubuntu; CUDA; Deep Learning Framework (TensorFlow); Sample DL Apps; HPE Apollo 6500 Gen10 + GPU (NVIDIA); Container Based TensorFlow; OpenShift]

Page 9:

Deep Learning Starter Package w/ Tensorflow on Container based Distributed GPUs Platform w/ DCOS

Best Optimized and Scalable Container based Platform for Deep Learning

– HPE Apollo 6500 Gen10 System provides superior performance-per-dollar for GPU intensive workloads, with eight NVIDIA Tesla V100 GPUs per server and NVLink interconnect, delivering up to 125 TFlops single precision compute for faster intelligence.
  – Unprecedented performance delivering economical AI and deep learning
  – Rock-solid, enterprise-level reliability, availability, serviceability (RAS) features
  – Supports a wide range of workloads, including deep learning and HPC workloads of complex simulation and modeling
– Container based Deep Learning Framework would be loaded out of box
  – Major deep learning framework in the market, Tensorflow, would be configured for your innovation
  – Provide the highest level of GPU resource utilization with container technologies
– Open Sourced Deep Learning Library for Multiple GPUs
  – Easy to execute complex Deep Neural Network structure with Python
  – Simple and visual management console for deep learning process with TensorBoard
  – No code changes for enabling multiple GPUs to maximize its process power
– Innovative Architecture by HPE
  – Various services leveraged Tensorflow in the world
  – Apollo specially developed for innovation of our customers
  – Container based Tensorflow would increase the efficiency of GPU resource usage
  – Easy to develop entire application ecosystem by integration of Deep Learning framework on Mesosphere DCOS
– Deep Learning Ready Platform
  – Start innovation from today with built-in deep learning architecture
  – Certified architecture by Hewlett Packard Enterprise
  – Various platform support tools are enabled from the beginning, such as Mesosphere DCOS, iLO Management, CUDA toolkits, and TensorBoard

HPE Deep Learning Development Platform

8 x Tesla GPU with NVLINK2.0 could be loaded on HPE ProLiant XL270d Accelerator Tray

Solution Characteristics: Best GPU Density, Leading DL Framework, Flexible Storage Option, Superior Management Tool

Benefit of HPE Deep Learning Development Platform

Service Description

– Service Duration : 3.0 weeks
– Design & Integration
  – Configuration of Apollo System
  – Installation of CentOS or Ubuntu
  – Installation of Mesosphere DCOS
  – Configuration of master, worker, and infra nodes
  – NVIDIA Driver Implementation
  – CUDA Toolkit / cuDNN Implementation
  – TensorFlow Installation
  – Sample Program Test
– Skill Transfer
  – Sample Program Handover
  – Q&A for 1.0 week
– Output
  – Implementation Report
– Optional Services
  – Deep Learning Consulting Service
  – Inception Integration

[Stack diagram labels: CentOS / Ubuntu; CUDA; Deep Learning Framework (TensorFlow); Sample DL Apps; HPE Apollo 6500 Gen10 + GPU (NVIDIA); Container Based TensorFlow; DCOS]

Page 10:

How Is AI Put Together in the First Place?

Page 11:

The Machine Learning / Deep Learning Process

[Diagram: Raw data → Preprocessing → Training (a Training Dataset of images labeled “dog” and “cat” is fed to a Neural Network Model) → Trained model → Inference (New Data goes in, “cat” comes out)]

Page 12:

This Is Roughly What Gets Called AI

[Diagram: Raw data → Preprocessing → Training (a Training Dataset of images labeled “dog” and “cat” is fed to a Neural Network Model) → Trained model → Inference (New Data goes in, “cat” comes out)]

In other words, the general impression is that AI is something that:
• learns nicely from training data
• uses what it has learned to come up with a good answer to a question

Page 13:

What AI Development Looks Like

– What we want to do (for example)
  – An application that, given a photo of a pet, tells whether the subject is a cat or a dog
– Very well, then:
  – Prepare a large number of cat and dog images
  – Develop a training algorithm that extracts features with machine learning and builds a model
  – Train the model
  – Develop an application that performs inference using the trained model
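The train-then-infer flow described on this slide can be sketched in a few lines. This is a toy illustration only: it swaps the deck's neural network for a nearest-centroid classifier over made-up two-number feature vectors, and the names `train` and `infer` and all of the data are hypothetical.

```python
from collections import defaultdict
from math import dist


def train(dataset):
    """Build a 'model': the mean feature vector (centroid) for each label."""
    sums = {}
    counts = defaultdict(int)
    for features, label in dataset:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] += 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def infer(model, features):
    """Return the label whose centroid is closest to the new sample."""
    return min(model, key=lambda label: dist(model[label], features))


# Made-up, already preprocessed features standing in for labeled pet photos.
training_dataset = [
    ([0.9, 0.2], "dog"), ([0.8, 0.3], "dog"), ([0.85, 0.1], "dog"),
    ([0.2, 0.9], "cat"), ([0.1, 0.8], "cat"), ([0.3, 0.85], "cat"),
]

model = train(training_dataset)      # training step
print(infer(model, [0.15, 0.9]))     # inference on new data
```

The split into a training step that produces a model artifact and a separate inference step that only consumes it mirrors the slide's steps; everything a real image classifier needs beyond that (feature extraction, a framework, GPUs) is left out on purpose.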

Page 14:

What AI Development Looks Like

– What we want to do (for example)
  – An application that, given a photo of a pet, tells whether the subject is a cat or a dog
– Very well, then:
  – Prepare a large number of cat and dog images
  – Develop a training algorithm that extracts features with machine learning and builds a model
  – Train the model
  – Develop an application that performs inference using the trained model

It is not that simple.

Page 15:

The Machine Learning / Deep Learning Process: Points to Consider

[Diagram: Raw data → Preprocessing → Training (a Training Dataset of images labeled “dog” and “cat” is fed to a Neural Network Model) → Trained model → Inference (New Data goes in, “cat” comes out)]

- Allocating GPU power
- Convenience for developers
- Managing models and frameworks / libraries

- How to store the data
- How it should be processed
- Managing the data fed to training jobs

- Real-time processing
- Making application development more efficient

- Managing models
- Access from inference applications

Page 16:

Google's paper: Hidden Technical Debt in Machine Learning Systems (NIPS 2015)

Page 17:

Google's paper: Hidden Technical Debt in Machine Learning Systems (NIPS 2015)

The cost of building everything needed for even a small AI application is enormous.

Page 18:

What's more

Requirements for putting AI to use in business:
– Continuous retraining on data that keeps changing
– Tracking and tuning model accuracy
– Training for a variety of purposes from the same dataset
– Supporting a variety of inference applications
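The first two requirements above, continuous retraining and accuracy tracking, are often wired together: accuracy is monitored on recent labeled traffic, and a drop below some threshold triggers retraining. The following is a minimal sketch of that idea. The class name `AccuracyTracker`, the window size, and the threshold are all hypothetical choices, not something the deck specifies.

```python
from collections import deque


class AccuracyTracker:
    """Track accuracy over a sliding window of recent predictions."""

    def __init__(self, window=100, threshold=0.9):
        self.results = deque(maxlen=window)  # True for each correct prediction
        self.threshold = threshold

    def record(self, predicted, actual):
        self.results.append(predicted == actual)

    @property
    def accuracy(self):
        # None until at least one prediction has been recorded.
        return sum(self.results) / len(self.results) if self.results else None

    def needs_retraining(self):
        acc = self.accuracy
        return acc is not None and acc < self.threshold


tracker = AccuracyTracker(window=4, threshold=0.75)
for predicted, actual in [("cat", "cat"), ("dog", "dog"), ("cat", "dog"), ("dog", "cat")]:
    tracker.record(predicted, actual)

print(tracker.accuracy, tracker.needs_retraining())
```

In a real platform the retraining signal would start a training job on fresh data rather than print a flag, and ground-truth labels usually arrive with a delay; both are outside the scope of this sketch.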

Page 19:

Facebook's paper: Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective (HPCA 2018)

Page 20:

The Paradigm Shift in AI Platforms: MLaaS

Page 21:

DataOps: DevOps in Data Science and ML

• What is DataOps?
  • An automated, process-oriented methodology
  • Its purpose is for data and analytics teams to improve quality and shorten the data analytics cycle
• What it aims for
  • Continuous delivery of value
  • Less dependence on specific individuals
  • Loose coupling
  • Flexible use of resources
• In other words, it is a methodology that takes the following from application development and applies them to data science and machine learning:
  - DevOps
  - Agile development practices

Page 22:

This Is the Kind of Thing We Need (MLaaS: Machine Learning as a Service)

[Diagram: the people and processes that use the platform (data ingestion, data preprocessing, data analysis, training, model evaluation, inference apps, tuning) reach the shared resources (dynamically accumulated data, managed models, the available frameworks, and compute resources: GPU, CPU, memory) through "something that nicely brokers access to whatever resources are needed"]

Page 23:

This Looks Familiar

[Diagram: containerized application workloads (microservices) reach the shared resources (distributed data services, storage and network services, repositories and registries, and compute resources: GPU, CPU, memory) through "something that nicely brokers access to whatever resources are needed"]

Page 24:

This Looks Familiar

[Diagram: containerized application workloads (microservices) reach the shared resources (distributed data services, storage and network services, repositories and registries, and compute resources: GPU, CPU, memory) through container orchestration + DevOps]

Page 25:

By the Way, There Are Many Container Platforms

[Landscape diagram. Axis and category labels: Managed, Private, Proprietary, Kubernetes aaS, Kubernetes-based, non-k8s, pure k8s, distro type, part of something greater, multi-cluster type, independently enhanced and evolved type. Products and vendors shown: GKE, Azure Kubernetes Service, IBM, Oracle, Red Hat, Pivotal, etc., DC/OS, IBM Cloud Private, Docker]

Page 26:

Kubernetes: The De Facto Standard for Container Orchestration

– The concepts behind Google's service infrastructure, reimplemented in Go and released as open source
– The core project of the Cloud Native Computing Foundation, a sub-organization of the Linux Foundation
– Uses containers as the infrastructure technology for achieving “Cloud Native” application development and operations

Page 27:

An Example from the Kubernetes Ecosystem

– Mesosphere DC/OS
  – An integrated platform for various distributed services
  – Supports Kubernetes for container orchestration
  – Data services, CI/CD tools, and more can be deployed from a catalog

Page 28:

Kubeflow: A Project for Realizing MLaaS on Kubernetes

– Integrates the functions needed for MLaaS on top of Kubernetes
  – Jupyterhub: multi-team model authoring environment (notebooks)
  – Frameworks: Tensorflow, Pytorch, Caffe, Chainer, etc.
  – Katib: hyperparameter tuning
  – Argo: container workflow engine
  – Pachyderm: data pipeline management
  – Serving: API serving for models and inference
– Currently at Ver 0.2
  – Ver 1.0 is scheduled for release on 2018/12/16

Page 29:

Concrete Examples

Page 30:

Training (Model Training)

Page 31:

Challenges in Training Machine Learning Models

– Preparing datasets
– Efficient use of GPUs by a growing number of projects and data scientists
– Keeping up with rapidly evolving frameworks and libraries

Capabilities required of the platform:

• Scalability and flexibility of the data store
• Data preprocessing / ETL
• GPU scheduling
• Image management and flexible deployment of machine learning job execution environments
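To make the "GPU scheduling" capability above more concrete, here is a toy best-fit placement sketch: each training job asks for a number of GPUs, and the scheduler picks the node with the fewest free GPUs that can still satisfy the request, which keeps larger blocks free for larger jobs. The node names, job names, and the best-fit policy itself are assumptions for illustration. Real orchestrators such as Kubernetes or DC/OS use their own, more elaborate mechanisms.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    total_gpus: int
    allocated: dict = field(default_factory=dict)  # job name -> GPU count

    @property
    def free_gpus(self):
        return self.total_gpus - sum(self.allocated.values())


def schedule(nodes, job, gpus):
    """Best-fit placement: return the chosen node name, or None if nothing fits."""
    candidates = [n for n in nodes if n.free_gpus >= gpus]
    if not candidates:
        return None  # a real scheduler would queue the job instead
    node = min(candidates, key=lambda n: n.free_gpus)
    node.allocated[job] = gpus
    return node.name


# Two hypothetical 8-GPU servers shared by several data scientists.
nodes = [Node("apollo-1", 8), Node("apollo-2", 8)]
requests = [("train-a", 6), ("train-b", 2), ("train-c", 4), ("train-d", 8)]
placements = [schedule(nodes, job, gpus) for job, gpus in requests]
print(placements)
```

In this run the last request cannot be placed because no single node has eight free GPUs left. That is the kind of contention a shared platform has to handle through queuing, preemption, or quotas.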

Page 32:

Container Orchestration for Machine Learning

[Diagram: a Data Scientist supplies Training Code and an App Developer supplies Application Code to container image build automation. The resulting images (for example TensorFlow 1.0 on Ubuntu16.04 with CUDA8.0 and cuDNN v6, or another framework/version) run as Training Jobs and Inference apps on the Container Orchestration Platform, which spreads distributed workers over a shared pool of GPUs and CPUs. Other labels: Training Data, Model, Inference, User]

Page 33:

Example of Running a Training Job with DC/OS + Distributed Tensorflow

[Diagram: a Data Scientist submits Learning Code and Scheduling Parameters through a DC/OS Universe Package. DC/OS schedules a Distributed Tensorflow Job consisting of a Tensorflow Server and three Tensorflow Workers, each worker shown with one CPU and two GPUs. Other labels: Training data, Data preprocessing pipeline (CPU), Checkpoint, Model, Scheduling, Workloads that DC/OS can host]
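One concrete piece of such a distributed job is the cluster description that each TensorFlow task receives. TensorFlow reads it from the `TF_CONFIG` environment variable as JSON with a `cluster` map and a `task` entry. The sketch below only builds those JSON strings. The host names, the port, and the reading of the diagram's "Tensorflow Server" as a single parameter server are assumptions, and how the DC/OS package injects the value in practice is not shown on the slide.

```python
import json


def build_tf_config(ps_hosts, worker_hosts, task_type, task_index):
    """Return the TF_CONFIG JSON string for one task of a distributed job."""
    cluster = {"ps": list(ps_hosts), "worker": list(worker_hosts)}
    if task_type not in cluster or not 0 <= task_index < len(cluster[task_type]):
        raise ValueError(f"no such task: {task_type}/{task_index}")
    return json.dumps({
        "cluster": cluster,
        "task": {"type": task_type, "index": task_index},
    })


# Hypothetical endpoints: one parameter server and three GPU workers.
ps = ["tf-ps-0.example:2222"]
workers = [f"tf-worker-{i}.example:2222" for i in range(3)]

# One environment per worker task; an orchestrator would inject each one
# into the matching container.
envs = {
    f"worker-{i}": {"TF_CONFIG": build_tf_config(ps, workers, "worker", i)}
    for i in range(len(workers))
}
print(envs["worker-1"]["TF_CONFIG"])
```

Every task sees the same `cluster` map and differs only in its `task` entry, so the orchestrator can scale the worker count by changing one list.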

Page 34:

Example: Instantly Providing a Ready-to-Run Environment That Uses GPU Power

[Diagram labels: Cloud management; Hardware resources (CPU, GPU, Memory, Disk); Container orchestration; Data services; Image registry; Datasets; Dataset management; Flavor selection; Deployment; Requirements definition; Use]

Page 35:

Inference

Page 36:

Challenges in Inference (Inference Applications That Use Trained Models)

– Access to trained models
– Real-time processing of input data
– Keeping up with rapidly evolving frameworks and libraries
– Making application development more efficient

Capabilities required of the platform:

• Real-time data pipelines
• Data services suited to each purpose
• Image management for application runtimes
• CI/CD for applications

Page 37:

Example: Real-Time Visualization and Money Laundering Detection

[Diagram labels: Financial transaction data; Message queue (Kafka); Trained model; Laundering detector; Time-series DB (Influxdb); Dashboard (Grafana)]

Sample detector output:
POTENTIAL MONEY LAUNDERING: 856 -> 804 totalling 8994 now
POTENTIAL MONEY LAUNDERING: 233 -> 954 totalling 8710 now
POTENTIAL MONEY LAUNDERING: 318 -> 273 totalling 8883 now
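The detector stage of this pipeline can be sketched as a function over a stream of transactions. The slide does not say what rule produces the alerts, so the rule used here, flagging an account pair once its running transfer total exceeds a fixed threshold, is purely an assumption chosen to reproduce the shape of the log lines. An in-memory list stands in for the Kafka topic, and the trained model from the diagram is not represented at all.

```python
from collections import defaultdict


def detect(transactions, threshold=8000):
    """Yield an alert whenever a (source, destination) pair's running total
    exceeds the threshold. The rule and the threshold are hypothetical."""
    totals = defaultdict(int)
    for src, dst, amount in transactions:
        totals[(src, dst)] += amount
        if totals[(src, dst)] > threshold:
            yield (f"POTENTIAL MONEY LAUNDERING: {src} -> {dst} "
                   f"totalling {totals[(src, dst)]} now")


# Made-up transactions standing in for messages consumed from a queue.
stream = [
    (856, 804, 5000),
    (233, 954, 8710),
    (856, 804, 3994),
    (318, 273, 100),
]

alerts = list(detect(stream))
for line in alerts:
    print(line)
```

Because `detect` is a generator, it can be pointed at an unbounded iterator of messages just as easily as at a list. In the architecture on the slide, its output would then be written to the time-series database for the dashboard to plot.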

Page 38:

Summary

Page 39:

⚫ Data matters. Without data, AI cannot even begin.

⚫ MLaaS built on container orchestration technology will be the platform that supports AI development and use from here on.

⚫ HPE Pointnext supports customers' innovation with its extensive experience and forward-looking initiatives.

Page 40:

Thank you