コカ コーライーストジャパン株式会社・
From a Single Droplet to a Full Bottle: Our Journey to Hadoop at Coca-Cola East Japan
October 27, 2016
Damien Contreras (ダミアン コントレラ)
Information Systems, Enterprise Architect & Innovation Project Manager
In this session
• About Coca-Cola East Japan
• Hadoop Journey at CCEJ
• Hadoop Projects
• Hadoop for the manufacturing industry
• Hadoop for CCEJ: What's Next
About Coca-Cola East Japan
• Coca-Cola East Japan was established on July 1, 2013 through the merger of four bottlers.
• On April 1, 2015, it underwent further business integration with Sendai Coca-Cola Bottling Co., Ltd.
• Announced an MOU with Coca-Cola West on April 26, 2016 to proceed with discussions and review of business integration opportunities.
• Japan's largest Coca-Cola bottler, with an extensive local network, selling the most popular beverage brands in Japan.
Data as of December 2015
HADOOP JOURNEY AT CCEJ
CCEJ Data Landscape
• DATA IN SILOS (Datamart, ERP, DWH, Staging, Mainframe, …)
• P2P INTERFACES (No ESB, multiple ETL & interface servers)
• NO GOVERNANCE (Multiple data formats for the same business context, no metadata management)
• BATCH ORIENTED (File, Scheduler, …)
Hadoop Journey: Genesis
July 2015
• Pilot phase
• 5 nodes
• Azure A1 A4
• 100GB
• 70GB of RAM
• Team: 1 person
Diagram layers: Analytics, System, Processing, Integration, Data source, Data Restitution
Diagram components: KNIME, WEKA, Hive, Tez, MR, YARN, HDFS, Ambari, CentOS, flat files
Hadoop Journey: Stability
November 2015
• Pilot phase
• 6 nodes
• Azure A4 D & DS13
• 1TB of data
• 336GB of RAM
• Team: 2 people
Diagram layers: Analytics, System, Processing, Integration, Data source, Data Restitution
Diagram components: KNIME, Zeppelin, Python Notebook, Hive, Tez, MR, YARN, HDFS, NiFi, Ranger, Ambari, Active Directory, CentOS, flat files
Hadoop Journey: Production
March 2016
• 8 nodes
• Azure D/DS13
• 3TB of data
• 64 cores
• 448GB RAM
• Team: 2 people
Diagram layers: Analytics, System, Processing, Integration, Data source, Data Restitution
Diagram components: KNIME, Zeppelin, Python Notebook, Hive, Spark, Tez, MR, YARN, HDFS, NiFi, BW on Hana, Ranger, Ambari, Active Directory, CentOS, flat files, Web Services
Hadoop Eco-system at CCEJ
• 13 nodes
• 20TB
• 104 cores
• 728GB RAM
• 1000+ tables
• 3 production systems
Numbered systems on the diagram: 2 Data Hub, 3 Master Data (the label for 1 is garbled in the source)
Other diagram labels: Aggregated data, Visualization, Past data, Forecast data, Centralize, Lineage, Governance
Diagram layers: Analytics, System, Processing, Integration, Data source, Data Restitution
Diagram components: KNIME, Zeppelin, Python Notebook, AirPal, HTML Report, Sparkling Water, Tensorflow, Hive, Spark, Tez, Presto, Drill, MR, YARN, HDFS, NiFi, Boomi, BW on Hana, SAP ECC, MySQL, Ranger, Ambari, Active Directory, CentOS, flat files, Web Services
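As a quick sanity check on the sizing figures quoted across the journey slides, the totals can be divided by the node count. The snippet below is only an illustrative sketch: the dictionary, its stage labels and the helper name are made up for this example, and only the node, core and RAM totals come from the slides.

```python
# Cluster totals as quoted on the slides: (nodes, cores, RAM in GB).
# Cores are None where the slide does not state them.
CLUSTER_STAGES = {
    "Stability (November 2015)": (6, None, 336),
    "Production (March 2016)": (8, 64, 448),
    "Eco-system slide": (13, 104, 728),
}


def per_node(nodes, cores, ram_gb):
    """Return (cores per node, RAM in GB per node); cores stay None if unknown."""
    cores_each = None if cores is None else cores / nodes
    return cores_each, ram_gb / nodes


for stage, (nodes, cores, ram_gb) in CLUSTER_STAGES.items():
    cores_each, ram_each = per_node(nodes, cores, ram_gb)
    print(f"{stage}: {cores_each} cores/node, {ram_each:g} GB RAM/node")
```

With these figures every stage averages 56 GB of RAM per node, and 8 cores per node wherever cores are stated, which appears consistent with the Azure D13 size named on the slides.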
Timeline (May 2015 to October 2016)
• Platform POC, then Hadoop / NiFi Platform
• VM Analytics POC, POC VM Placement, Forecast Implementation
• Flow implementation, BW Report integration
• 1 SAP integration & MDM 3
• 2 Write-Off report
HADOOP PROJECTS
20TB
Vending Replenishment: The Business Case
• HIGH NUMBER OF MACHINES: 550,000 vending machines (VM), online and offline
• NUMBER OF SKUs PER VM: 25 SKUs, hot & cold
• EXTERNAL FACTORS (Weather, city data, geo-location, events)
• VENDING ROUTES (Visit list per truck, logistics dependence)
How to:
• Reduce the number of visits
• Optimize truck stock
• Avoid out-of-stocks
Vending replenishment forecast: The Project
The Challenge:
• Deployment in 3 months
• 1½ hours to generate the forecast
• +20% accuracy versus the previous version
• 120 steps in the program
Hadoop has delivered:
• Feed 5GB+ of new data every day
• Process a high volume of data (in-memory): 300GB+
• Integrate data from different sources
• Generate more complicated forecasts than legacy systems
Diagram labels: Every day, Online VM, Offline VM, Forecast generation, Arbitration (Yes/No), Visit Plan, Picking list, 14 million items
But Hadoop is not only a data lake
Staging: The Case of the "write-off" report
Objectives:
• Aggregate a large number of datasets: 40+ flows, 4GB of data every day
• Single view of data, anywhere, for Finance, SC & Commercial
• Dynamic transactions vs. static data in Excel
• Reduce manual work to zero
Challenges:
• Data set harmonization (Sales, Billing, Inventory)
• Data volume from source systems
• Complex computation logic
• Unclear functional requirements
Diagram labels: x7 systems, Master Data, Azure, Drill, Web Server, HTML Interface, JSON, Generate SQL query, Verify & Check, Combine, Report
Processing types: Comparison, Aggregation, Enrichment, Analytics, Transformation (conversion)
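The diagram labels suggest that the HTML interface sends a JSON request from which a SQL query is generated for Drill. A minimal sketch of that step follows; the request format, the table name and the column names are all hypothetical, since the slides do not show them.

```python
import json

# Hypothetical whitelist of columns the report may expose; not from the slides.
ALLOWED_COLUMNS = {"plant", "material", "posting_date", "write_off_qty"}


def build_query(request_json, table="write_off_report"):
    """Turn a JSON report request into a SQL string plus bind parameters.

    Column names are checked against a whitelist; filter values are returned
    as parameters instead of being concatenated into the SQL text.
    """
    request = json.loads(request_json)
    columns = request.get("columns", [])
    filters = request.get("filters", {})

    if not columns:
        raise ValueError("request must select at least one column")
    unknown = (set(columns) | set(filters)) - ALLOWED_COLUMNS
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")

    sql = f"SELECT {', '.join(columns)} FROM {table}"
    params = []
    if filters:
        clauses = []
        for name in sorted(filters):
            clauses.append(f"{name} = ?")
            params.append(filters[name])
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params
```

Whitelisting column names keeps a browser-supplied request from injecting arbitrary SQL, which matters when the query text is generated rather than hand-written.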
MDM: Centralization and Dispatch
Process steps:
• 1 MDM Creation
• 2 MDM registration
• 3 Consistency check
• 4 Replicate data (event driven) to external systems
Components: Rule engine, Replication Engine, MDM Repository, Lineage
Objectives:
• Single MDM repository
• Centralized bridge tables & mapping tables
• Standardization of MDM across the data landscape
• Targeted distribution / replication of MDM to external systems
Challenges:
• Rule engine definition and implementation
• MDM on Hadoop & ESB integration
• MDM & SAP synchronization
Realization:
• MySQL and Hadoop synchronization: 300+ tables
• Replication engine with ESB
• MDM-Tool: pilot with Customer Master
• Full go-live: April 2017
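Step 3 (consistency check) relies on a rule engine. The slides do not describe how rules are expressed, so the following is only a sketch of one possible shape: each rule is a named predicate over a record, and a record counts as consistent only if no rule fails. The rule names and the customer-master fields are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Rule = Tuple[str, Callable[[Record], bool]]


def _is_seven_digit_postal_code(record: Record) -> bool:
    """True if postal_code has exactly 7 digits once hyphens are removed."""
    digits = record.get("postal_code", "").replace("-", "")
    return digits.isdigit() and len(digits) == 7


# Hypothetical rules for a customer master record; not taken from the slides.
RULES: List[Rule] = [
    ("customer_id is present", lambda r: bool(r.get("customer_id"))),
    ("name is present", lambda r: bool(r.get("name", "").strip())),
    ("postal code is 7 digits", _is_seven_digit_postal_code),
]


def check_consistency(record: Record, rules: List[Rule] = RULES) -> List[str]:
    """Return the names of all rules the record violates (empty list = consistent)."""
    return [name for name, predicate in rules if not predicate(record)]
```

Returning the names of the failed rules, rather than a bare boolean, makes it possible to report why a record was held back from replication.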
Use case: SAP Integration / sales interface report
Objectives:
• Leverage the most granular data already in Hadoop
• Leverage the processing power of Hadoop
Challenges:
• Many data formats requiring complex data transformation
• Wide variety of data sources & technologies to transfer data
• Data mapping between systems
Realization:
• Data structure in Hadoop
• Logic for one type of sales channel implemented
• Full go-live: April 2017
Diagram inputs: MD & Bridge, Vending Sales Data, Legacy format data, CCEJ format data (x9, x4, x7 and x9 flows)
Diagram steps: Bridge table & Master, Combine, Calculate, x9 output tables
Other diagram labels: Company 1, Company 2, Company 3, Azure
Hadoop: What's Next
• Increase data velocity & create a true Data Lake
• Improve data collection, quality, profiling and metadata, and propose a catalog of curated data to end users
• Move toward a data-driven decision process
• Develop support & operational excellence
I thank CCEJ management, who had the courage to believe in an Agile approach.
Thanks to my team member and comrade, Vinay Mahadev, for all the long hours we've put in together to make this project a reality.
Your turn: let's share ideas & a Coke!
Damien Contreras
Email: [email protected]: Damien Contreras
Twitter: @dvolute
The inside of Hadoop
Integration Landscape overview
Stages: Acquisition, Transformation, Restitution
Systems shown: Hadoop Prod, NiFi Prod, BW on Hana, Oracle, MySQL, SAP ECC, Boomi, Hive, Drill, flat files, other systems, HTML interface, power users
Connections shown: JDBC, IDocs, FTP, HTTP
Data layout example:
• Files: My_file_20161024.csv, My_file_20161025.csv
• Partitions of t_my_table_txt_p: dt=20161024, dt=20161025
• Database: Myflow-data
• External text tables: t_my_table_txt_p, t_my_bridge_table_txt_p
• ORC tables: t_my_report_orc_p (t_my_table_txt_p + t_my_bridge_table_txt_p)
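The layout example pairs daily files such as My_file_20161024.csv with partitions such as dt=20161024 of the table t_my_table_txt_p. A small sketch of that naming convention follows. It assumes the date is always the trailing eight digits of the file name, and the base directory is hypothetical because the slides do not give the actual path.

```python
import re
from datetime import datetime

# Hypothetical HDFS base directory; the slides do not give the actual path.
BASE_DIR = "/data/myflow"

# Matches a trailing _YYYYMMDD.csv suffix, e.g. My_file_20161024.csv
_DATE_SUFFIX = re.compile(r"_(\d{8})\.csv$")


def partition_path(file_name, table="t_my_table_txt_p", base_dir=BASE_DIR):
    """Map a daily file name to the dt=YYYYMMDD partition directory of a table."""
    match = _DATE_SUFFIX.search(file_name)
    if match is None:
        raise ValueError(f"no YYYYMMDD date suffix in {file_name!r}")
    day = match.group(1)
    datetime.strptime(day, "%Y%m%d")  # raises ValueError for impossible dates
    return f"{base_dir}/{table}/dt={day}"
```

Deriving the partition from the file name keeps each daily load in its own directory, so a single day can be reloaded without touching the others.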
Guidelines around NiFi flows
• Environments: Prod and Dev, on Azure
• System source to NiFi
• Triggers: Listener, Extraction (webCall, JDBC)
• Encryption
• Processing Group / Flow: Master Data, Transaction Data
Guidelines around NiFi flows: error handling
• Processor: on success, send data; on error, write to error log
• Every 5 mins: read from error log, re-process, update error log
• Per flow: Master Data, Transaction Data
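The error-handling guideline (write failures to an error log, then read, re-process and update the log every 5 minutes) can be illustrated outside NiFi with a plain Python sketch. This is not NiFi code; the in-memory log and the function names are assumptions made for the example.

```python
from typing import Callable, List

RETRY_INTERVAL_SECONDS = 5 * 60  # "Every 5 mins" on the slide


def process(item: str, send: Callable[[str], None], error_log: List[str]) -> bool:
    """Send one item; on failure, record it in the error log instead of raising."""
    try:
        send(item)
        return True
    except Exception:
        error_log.append(item)
        return False


def reprocess_errors(send: Callable[[str], None], error_log: List[str]) -> int:
    """One retry pass: read the error log, re-send each item, keep only new failures."""
    pending = list(error_log)
    error_log.clear()
    succeeded = 0
    for item in pending:
        if process(item, send, error_log):
            succeeded += 1
    return succeeded
```

A scheduler would call `reprocess_errors` once per `RETRY_INTERVAL_SECONDS`; items that fail again are simply written back to the log for the next pass.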
NiFi enhancement: example
Technical Architecture
• Hadoop Production environment: Node 0 to Node 6, …, Node 11
• Hadoop Dev environment: Node 0 to Node 3
• Prod environment and Dev environment: NiFi, AD
• Connected systems: RDBMS, FTP Server, SAP ECC
• Azure