From a single droplet to a full bottle, our journey to Hadoop at Coca-Cola East Japan


Uploaded by hadoop-summit, posted on 07-Jan-2017


Category: Technology



TRANSCRIPT

Page 1: From a single droplet to a full bottle, our journey to Hadoop at Coca-Cola East Japan

Coca-Cola East Japan Co., Ltd.

From a single droplet to a full bottle, our journey to Hadoop at Coca-Cola East Japan

October 27, 2016

Damien Contreras (ダミアン コントレラ)
Information Systems, Enterprise Architect & Innovation project manager

Page 2

In this session

• About Coca-Cola East Japan
• Hadoop journey at CCEJ
• Hadoop projects
• Hadoop for the manufacturing industry
• Hadoop for CCEJ: what's next

Page 3

About Coca-Cola East Japan

• Coca-Cola East Japan was established on Jul. 1, 2013 through the merger of four bottlers.
• On Apr. 1, 2015, it underwent further business integration with Sendai Coca-Cola Bottling Co., Ltd.
• Announced an MOU with Coca-Cola West on April 26, 2016 to proceed with discussions and review of business integration opportunities.
• Japan's largest Coca-Cola bottler, with an extensive local network, selling the most popular beverage brands in Japan.

Data as of December 2015

Page 4

HADOOP JOURNEY AT CCEJ

Page 5

CCEJ Data Landscape

• Data in silos (datamart, ERP, DWH, staging, mainframe, …)
• Point-to-point (P2P) interfaces (no ESB, multiple ETL & interface servers)
• No governance (multiple data formats for the same business context, no metadata management)
• Batch oriented (files, scheduler, …)

Page 6

Hadoop Journey: Genesis

July 2015
• Pilot phase
• 5 nodes (Azure A1, A4)
• 100GB
• 70GB of RAM
• Team: 1 person

Layers: data source, integration, processing, system, analytics, data restitution
Components: flat files, HDFS, MapReduce, YARN, Tez, Hive, KNIME, WEKA, Ambari, CentOS

Page 7

Hadoop Journey: Stability

November 2015
• Pilot phase
• 6 nodes (Azure A4, D & DS13)
• 1TB of data
• 336GB of RAM
• Team: 2 people

Layers: data source, integration, processing, system, analytics, data restitution
Components: flat files, NiFi, HDFS, MapReduce, YARN, Tez, Hive, Ranger, Active Directory, KNIME, Python Notebook, Zeppelin, Ambari, CentOS

Page 8

Hadoop Journey: Production

March 2016
• 8 nodes (Azure D/DS13)
• 3TB of data
• 64 cores
• 448GB of RAM
• Team: 2 people

Layers: data source, integration, processing, system, analytics, data restitution
Components: flat files, web services, NiFi, HDFS, MapReduce, YARN, Tez, Hive, Spark, Ranger, Active Directory, KNIME, Zeppelin, Python Notebook, BW on HANA, Ambari, CentOS

Page 9

Hadoop Eco-system at CCEJ

• 13 nodes
• 20TB
• 104 cores
• 728GB RAM
• 1000+ tables
• 3 production systems

Layers: data source, integration, processing, system, analytics, data restitution

1. …: past data, forecast data
2. Data Hub: aggregated data, visualization
3. Master Data: centralize, lineage, governance

Components: flat files, web services, SAP ECC, MySQL, Boomi, NiFi, HDFS, MapReduce, YARN, Tez, Hive, Spark, Presto, Drill, Sparkling Water, TensorFlow, Ranger, Active Directory, KNIME, Zeppelin, Python Notebook, AirPal, BW on HANA, HTML report, Ambari, CentOS
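The sizing figures quoted for the stability, production and eco-system stages can be cross-checked with simple arithmetic. The sketch below divides total cores and RAM by the node count; the stage labels are shortened from the slides, and the November 2015 core count is not stated, so it is left as unknown.

```python
from typing import Optional, Tuple

# Cluster sizing as quoted on the slides: (label, nodes, cores, ram_gb).
# Cores are not given for the November 2015 stage, so that entry uses None.
STAGES = [
    ("Stability, Nov 2015", 6, None, 336),
    ("Production, Mar 2016", 8, 64, 448),
    ("Eco-system, at time of talk", 13, 104, 728),
]


def per_node(nodes: int, cores: Optional[int], ram_gb: int) -> Tuple[Optional[float], float]:
    """Return (cores per node, GB of RAM per node); cores may be unknown."""
    cores_per_node = cores / nodes if cores is not None else None
    return cores_per_node, ram_gb / nodes


for label, nodes, cores, ram_gb in STAGES:
    c, r = per_node(nodes, cores, ram_gb)
    print(f"{label}: {nodes} nodes, cores/node={c}, RAM/node={r:.0f}GB")
```

Every stage works out to 56GB of RAM per node, and both stages with a core count work out to 8 cores per node, which suggests the cluster grew by adding nodes of the same size rather than larger machines.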

Page 10

Timeline: May 2015 to October 2016

• Hadoop / NiFi platform POC, platform
• VM analytics POC
• Forecast implementation
• VM placement POC
• Flow implementation
• BW report integration
• SAP integration & MDM
• Write-off report

Page 11

HADOOP PROJECTS

20TB

Page 12

Vending Replenishment: The Business Case

• High number of machines: 550,000 vending machines (VMs), online and offline
• Number of SKUs per VM: 25 SKUs, hot and cold
• External factors: weather, city data, geo-location, events
• Vending routes: visit list per truck, logistics dependence

How to:
• Reduce the number of visits
• Optimize truck stock
• Avoid out-of-stocks

Page 13

Vending replenishment forecast: The Project

The challenge:
• Deployment in 3 months
• 1½ hours to generate the forecast
• +20% accuracy versus the previous version
• 120 steps in the program

Daily process (every day): online VMs, offline VMs, forecast generation, arbitration (yes/no), visit plan, picking list

Hadoop has delivered:
• Feeds 5GB+ of new data every day
• Processes a high volume of data in memory (300GB+)
• Integrates data from different sources
• Generates more complex forecasts than legacy systems

14 million items
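The slide names the daily steps (forecast generation, arbitration, visit plan, picking list) but not the logic behind them. The following is a minimal sketch under a hypothetical rule: a machine stays in the visit plan only if at least one SKU is forecast to drop below a threshold before the next possible visit, and the truck's picking list is the sum of the units needed to refill the visited machines. The `SlotState` fields, the `min_units` threshold and the machine IDs are illustrative assumptions, not CCEJ's actual model.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SlotState:
    """One SKU column in a vending machine (hypothetical structure)."""
    sku: str
    stock: int      # units in the column now (assumed known; offline VMs would need an estimate)
    capacity: int   # maximum units the column holds
    forecast: int   # forecast units sold before the next possible visit


def needs_visit(slots: List[SlotState], min_units: int = 1) -> bool:
    """Hypothetical arbitration rule: visit if any SKU is projected to fall below min_units."""
    return any(s.stock - s.forecast < min_units for s in slots)


def refill_quantities(slots: List[SlotState]) -> Dict[str, int]:
    """Units needed to bring each column back to capacity, assuming the forecast sales happen first."""
    return {s.sku: s.capacity - max(s.stock - s.forecast, 0) for s in slots}


def picking_list(machines: Dict[str, List[SlotState]],
                 min_units: int = 1) -> Tuple[List[str], Dict[str, int]]:
    """Return the visit plan and the aggregated picking list for one truck's machines."""
    visit_plan = [vm for vm, slots in machines.items() if needs_visit(slots, min_units)]
    totals: Counter = Counter()
    for vm in visit_plan:
        totals.update(refill_quantities(machines[vm]))  # Counter.update adds quantities per SKU
    return visit_plan, dict(totals)


if __name__ == "__main__":
    # Made-up machines: VM-001 is about to sell out of cola, VM-002 is fine.
    machines = {
        "VM-001": [SlotState("cola", 5, 30, 8), SlotState("tea", 20, 30, 4)],
        "VM-002": [SlotState("cola", 25, 30, 3), SlotState("coffee-hot", 12, 20, 2)],
    }
    print(picking_list(machines))
```

Only VM-001 is kept in the visit plan here, so the truck loads 30 units of cola and 14 of tea. A real arbitration would also weigh route constraints and truck capacity, which this sketch ignores.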

Page 14

But Hadoop is not only a data lake

Page 15

Staging: The Case of the “write-off” report

Diagram: 7 source systems (x7) and master data; Drill and a web server on Azure; an HTML interface that generates SQL queries and receives JSON; verify & check, combine, report.
Operations: comparison, aggregation, enrichment, analytics, transformation (conversion)

Challenges:
• Data set harmonization (sales, billing, inventory)
• Data volume from source systems
• Complex computation logic
• Unclear functional requirements

Objectives:
• Aggregate a large number of datasets: 40+ flows, 4GB of data every day
• Single view of the data, anywhere, for Finance, SC & Commercial
• Dynamic transactions vs. static Excel
• Reduce manual work to zero
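The diagram shows an HTML interface generating SQL queries that Drill answers in JSON. Apache Drill exposes a REST endpoint (`POST /query.json`, by default on port 8047) that accepts a JSON body with `queryType` and `query` and returns the result `rows`; a minimal client might look like the sketch below. The table name, the `sku`/`quantity`/`dt` columns and the write-off query itself are hypothetical.

```python
import json
import urllib.request
from typing import List

# Assumption: Drill's web server on its default port; adjust host/port for a real cluster.
DRILL_URL = "http://localhost:8047/query.json"


def build_write_off_query(table: str, start_dt: str, end_dt: str) -> str:
    """Hypothetical SQL the HTML interface could generate for a date range (YYYYMMDD).

    `table` is expected to come from application code, not from the user.
    """
    for value in (start_dt, end_dt):
        # Only accept 8 ASCII digits so user-supplied dates cannot inject SQL.
        if not (len(value) == 8 and value.isascii() and value.isdigit()):
            raise ValueError(f"expected YYYYMMDD, got {value!r}")
    return (
        f"SELECT sku, SUM(quantity) AS write_off_qty FROM {table} "
        f"WHERE dt BETWEEN '{start_dt}' AND '{end_dt}' GROUP BY sku"
    )


def run_drill_query(sql: str, url: str = DRILL_URL) -> List[dict]:
    """POST the query to Drill's REST API and return the result rows from the JSON reply."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    return reply.get("rows", [])
```

Each returned row is a dict keyed by column name, which maps naturally onto an HTML table for the report page.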

Page 16

MDM: Centralization and Dispatch

Diagram: 1 MDM creation → 2 MDM registration (lineage) → 3 consistency check (rule engine) → 4 replicate data to external systems (event driven). Components: rule engine, replication engine, MDM repository.

Challenges:
• Rule engine definition and implementation
• MDM on Hadoop & ESB integration
• MDM & SAP synchronization

Objectives:
• Single MDM repository
• Centralized bridge tables & mapping table
• Standardization of MDM across the data landscape
• Targeted distribution / replication of MDM to external systems

Realization:
• MySQL and Hadoop synchronization (300+ tables)
• Replication engine with ESB
• MDM tool: pilot with customer master
• Full go-live: April 2017
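The numbered steps on the MDM slide (creation, registration with lineage, consistency check by a rule engine, event-driven replication) can be illustrated with a small in-memory sketch. Everything here is an assumption for illustration: the `MdmHub` class, the rule signature and the subscriber callbacks stand in for the MySQL/Hadoop repository and the ESB-based replication engine described on the slide.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# A rule inspects a master-data record and returns an error message, or None if it passes.
Rule = Callable[[dict], Optional[str]]


def require_name(record: dict) -> Optional[str]:
    """Example consistency rule (hypothetical): every record needs a non-empty name."""
    return None if record.get("name") else "missing name"


@dataclass
class MdmHub:
    """Toy stand-in for the MDM repository, rule engine and replication engine."""
    rules: List[Rule]
    subscribers: Dict[str, Callable[[dict], None]] = field(default_factory=dict)
    repository: Dict[str, dict] = field(default_factory=dict)
    lineage: List[Tuple[str, str]] = field(default_factory=list)

    def submit(self, record: dict, source: str) -> List[str]:
        """Register a newly created record, check it, and replicate it if consistent."""
        # Step 2, registration: store the record and note which source created it.
        self.repository[record["id"]] = record
        self.lineage.append((record["id"], source))
        # Step 3, consistency check: run every rule and collect the failures.
        errors = [msg for msg in (rule(record) for rule in self.rules) if msg is not None]
        # Step 4, replication: only consistent records are pushed to external systems.
        if not errors:
            for push in self.subscribers.values():
                push(record)
        return errors
```

In this sketch each subscriber is just a callback per target system; in the architecture on the slide that role belongs to the replication engine and the ESB, which would also handle targeted distribution (only some systems receive a given record).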

Page 17

Use case: SAP integration / sales interface report

Objectives:
• Leverage the most granular data already in Hadoop
• Leverage the processing power of Hadoop

Diagram: vending sales data from Company 1, Company 2 and Company 3 (x9, x4 and x7 flows) arrives as legacy format data and becomes CCEJ format data; it is combined with bridge tables and master data (x9 flows), then calculated into x9 output tables, on Azure.

Challenges:
• Many data formats requiring complex data transformation
• Wide variety of data sources & technologies to transfer data
• Data mapping between systems

Realization:
• Data structure in Hadoop
• Logic implemented for one type of sales channel
• Full go-live: April 2017
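The key step in this flow is combining legacy-format sales data with bridge tables, so that each company's codes map to a single CCEJ format before the calculation. A minimal sketch, assuming a bridge table keyed by (company, legacy material code); the codes, company keys and field names below are made up.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical bridge table: (company, legacy material code) -> CCEJ material code.
BRIDGE: Dict[Tuple[str, str], str] = {
    ("company1", "A-100"): "CCEJ-0001",
    ("company2", "7781"): "CCEJ-0001",
    ("company3", "X9"): "CCEJ-0002",
}


def to_ccej_format(rows: List[dict], bridge=BRIDGE) -> Tuple[List[dict], List[dict]]:
    """Translate legacy material codes to CCEJ codes; rows with no bridge entry are set aside."""
    mapped, unmapped = [], []
    for row in rows:
        code = bridge.get((row["company"], row["material"]))
        if code is None:
            unmapped.append(row)
        else:
            mapped.append({**row, "material": code})  # copy of the row carrying the CCEJ code
    return mapped, unmapped


def sales_by_material(mapped_rows: List[dict]) -> Dict[str, float]:
    """The 'calculate' step, reduced here to total sales per CCEJ material."""
    totals: Dict[str, float] = defaultdict(float)
    for row in mapped_rows:
        totals[row["material"]] += row["amount"]
    return dict(totals)
```

Keeping unmapped rows separate, rather than dropping them, makes gaps in the bridge tables visible, which relates to the data-mapping challenge listed on the slide.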

Page 18

Hadoop: What’s Next

• Increase data velocity & create a true data lake
• Improve data collection, quality, profiling and metadata, and offer end users a catalog of curated data
• Move toward a data-driven decision process
• Develop support & operational excellence

Page 19

I thank CCEJ management, who had the courage to believe in an Agile approach.

Thanks to my teammate and comrade Vinay Mahadev for all the long hours we put in together to make this project a reality.

Page 20

Your turn: let's share ideas & a Coke!

Damien Contreras
Email: [email protected]: Damien Contreras
Twitter: @dvolute

Page 21

The inside of Hadoop

Page 22

Integration Landscape Overview

Stages: acquisition, transformation, restitution
Components: SAP ECC, IDocs, Oracle, MySQL, other systems, flat files, FTP, JDBC, Boomi, NiFi Prod, Hadoop Prod, Hive JDBC, Drill, BW on HANA, HTTP HTML interface for power users

Example flow: Myflow-data (database)
• Acquisition: daily files My_file_20161024.csv and My_file_20161025.csv land in partitions dt=20161024 and dt=20161025 of t_my_table_txt_p (external text tables)
• Transformation: t_my_table_txt_p combined with t_my_bridge_table_txt_p
• Restitution: t_my_report_orc_p (ORC tables)
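The example flow lands one CSV file per day in a `dt=YYYYMMDD` partition of an external text table. A sketch of how such a partition could be registered in Hive, assuming files named like `My_file_20161024.csv` and a hypothetical HDFS base directory; `ALTER TABLE … ADD IF NOT EXISTS PARTITION … LOCATION …` is standard HiveQL, while the database name `myflow_data` and the paths are illustrative only.

```python
import re

# Daily files are assumed to be named <name>_<YYYYMMDD>.csv, e.g. My_file_20161024.csv.
FILE_PATTERN = re.compile(r"^(?P<name>.+)_(?P<dt>\d{8})\.csv$")


def partition_for(file_name: str) -> str:
    """Return the dt partition value (YYYYMMDD) encoded in a daily file name."""
    match = FILE_PATTERN.match(file_name)
    if match is None:
        raise ValueError(f"unexpected file name: {file_name}")
    return match.group("dt")


def add_partition_sql(database: str, table: str, file_name: str, base_dir: str) -> str:
    """HiveQL registering the day's directory as a partition of an external text table."""
    dt = partition_for(file_name)
    return (
        f"ALTER TABLE {database}.{table} ADD IF NOT EXISTS "
        f"PARTITION (dt='{dt}') LOCATION '{base_dir}/dt={dt}'"
    )


if __name__ == "__main__":
    # Hypothetical database name and HDFS path, for illustration only.
    print(add_partition_sql("myflow_data", "t_my_table_txt_p",
                            "My_file_20161025.csv", "/data/myflow/t_my_table_txt_p"))
```

Because the partition value is derived from the file name, re-running the statement for the same file is harmless thanks to `IF NOT EXISTS`, which suits flows that may be retried.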

Page 23

Guidelines around NiFi flows

• Environments: Prod and Dev, on Azure
• Triggers from the source system to NiFi: listener, or extraction via web call or JDBC
• Encryption
• … / flow: master data, transaction data (processing group)

Page 24

Guidelines around NiFi flows

• Processor: on success, send data
• On error: write to the error log
• Every 5 minutes: read from the error log, re-process, and update the error log
• … / flow: master data, transaction data
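The error-handling pattern on this slide (send on success, write failures to an error log, re-process the log every 5 minutes and update it) does not depend on NiFi itself. The sketch below models it in plain Python to show the bookkeeping; in NiFi the same idea would be built from processors and a timer-driven schedule, and the class and field names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ErrorEntry:
    """One failed payload waiting in the error log."""
    payload: dict
    attempts: int = 1
    last_error: str = ""


@dataclass
class FlowWithErrorLog:
    process: Callable[[dict], dict]   # the "processor" step; raises on failure
    send: Callable[[dict], None]      # delivery to the target system
    error_log: List[ErrorEntry] = field(default_factory=list)

    def handle(self, payload: dict) -> None:
        """Main path: process and send, or write the failure to the error log."""
        try:
            self.send(self.process(payload))
        except Exception as exc:
            self.error_log.append(ErrorEntry(payload, last_error=str(exc)))

    def reprocess_errors(self) -> None:
        """Scheduled path (every 5 minutes on the slide): retry logged payloads, update the log."""
        still_failing = []
        for entry in self.error_log:
            try:
                self.send(self.process(entry.payload))
            except Exception as exc:
                entry.attempts += 1
                entry.last_error = str(exc)
                still_failing.append(entry)
        self.error_log = still_failing  # successes leave the log, failures stay with updated info
```

Something external (cron, or a loop calling `reprocess_errors()` every 300 seconds) drives the retry path; keeping the attempt count in the log makes it easy to alert on payloads that keep failing.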

Page 25

NiFi enhancement: example

Page 26

Technical Architecture

• Hadoop production environment: Node 0, Node 1, …, Node 11, plus AD (Active Directory) and NiFi
• Hadoop dev environment: Node 0 to Node 3
• Prod and dev environments each have NiFi instances
• Other components: Azure, RDBMS, FTP server, SAP ECC