Breathe New Life Into Your Data Warehouse by Offloading ETL Processes to Hadoop
Breathe New Life Into Your Data Warehouse by Offloading ETL on Hadoop
Shahab Kamal, Supreet Oberoi
ABOUT CONCURRENT
• TRUSTED by over 10,000 companies as their big data app platform
• BACKED by top Silicon Valley investors: True Ventures, Rembrandt VP, Bain Capital
• FOUNDED in 2008, with headquarters in San Francisco
ABOUT BITWISE
• Founded in 1995
• HQ in Chicago, IL
• Offices in India & Australia
• ISO 9001:2008 & ISO 27001:2005 Certified
• Most experienced data professionals
• Proprietary frameworks and accelerators for guaranteed, efficient and cost-effective services for data projects
ENTERPRISE ENGAGEMENT WITH HADOOP IS GAINING DEPTH…
Improving brand experience, creating new revenue channels, enhancing operational visibility into risk and compliance, and reducing TCO have been the key drivers, engaging at the levels of CEO, CIO, and CDO.
EMERGING ENTERPRISE ARCHITECTURE FOR HADOOP
[Architecture diagram. Labels: Data Lake; Stage; Transform; Archive; Data Mart; Reporting; Data Mining; Analytics; Exploratory; Discovery; Search]
CASE STUDY
[Diagram. Labels: Data Source; RDBMS; Recovery Application; Analytics; Reportings; ETL Application; Developer UI; XML; Custom Code; Execution Service; Cascading Framework; Generate Cascading Flow; Launch MapReduce Jobs; On Execution; Automated ETL Conversion; Data Quality Monitoring; ETL Testing; ETL Conversion; QualiDI; Data Quality Framework]
BITWISE ETL TOOL ARCHITECTURE
[Architecture diagram. Labels: Development Environment; Developer UI; XML; Custom Code; Execution Service; Cascading Framework]
DRIVEN PROVIDES OPERATIONAL READINESS TO ETL WORKLOADS
PERFORMANCE MANAGEMENT FOR BIG DATA APPLICATIONS
• BUILD higher quality big data apps
• RUN big data apps more reliably
• MANAGE big data apps more effectively
BUILD HIGHER QUALITY BIG DATA APPS
• Fully visualize your entire data pipeline
• Quickly and easily identify execution errors
[Pipeline diagram. Labels: SOURCES; OPERATIONS (functions, filters, joins, and aggregators); RESULTS]
RUN BIG DATA APPS MORE RELIABLY
• Watch your apps execute in real time
• Easily detect apps that violate SLAs and policies
• Pinpoint bottlenecks and identify causes
[Screenshot labels: CURRENTLY EXECUTING; EXECUTING; WAITING; DETAILED MAPPER/REDUCER STATS]
For example, see metrics for all apps on the production cluster that failed to execute in under 5 minutes…
…or all applications that use more than their allotment of mappers
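The two example queries above amount to simple filters over per-app run records. The following is a minimal Python sketch of that idea only. The `AppRun` record, its field names, and the sample values are hypothetical; this is not Driven's data model or query syntax.

```python
from dataclasses import dataclass


@dataclass
class AppRun:
    # Hypothetical record of one app execution; not Driven's schema.
    name: str
    cluster: str
    duration_s: float
    mappers_used: int
    mappers_allotted: int


def not_under(runs, cluster, limit_s=300.0):
    """Runs on `cluster` that did not finish in under `limit_s` seconds."""
    return [r for r in runs if r.cluster == cluster and r.duration_s >= limit_s]


def over_mapper_allotment(runs):
    """Runs that used more mappers than they were allotted."""
    return [r for r in runs if r.mappers_used > r.mappers_allotted]


# Made-up sample data for illustration.
runs = [
    AppRun("etl-orders", "production", 420.0, 40, 50),
    AppRun("etl-customers", "production", 180.0, 60, 50),
    AppRun("etl-test", "qa", 900.0, 10, 50),
]
```

Calling `not_under(runs, "production")` selects the production runs that took 5 minutes or longer, and `over_mapper_allotment(runs)` selects the runs that exceeded their mapper allotment.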
MANAGE BIG DATA APPS MORE EFFECTIVELY
• See how all apps consume resources as they run
• Compare performance, resource consumption, and other metrics across departments, teams, and any segment you define
• Segment performance by team, by department, or by custom tags for role-based views, chargeback models, and capacity planning
For example, see performance of all apps owned by the DevOps team
[Segment labels shown: Marketing; Sales; Compliance; Data science team; QA cluster; Production cluster]
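Segmenting by team or tag for a chargeback model comes down to grouping a resource metric by a tag value. The sketch below illustrates that grouping under stated assumptions: the run dictionaries, the `tags` and `cpu_hours` keys, and the figures are all hypothetical and do not reflect how Driven stores or reports usage.

```python
from collections import defaultdict


def usage_by_tag(runs, tag_key):
    """Sum a resource metric per value of one tag (e.g. per team)."""
    totals = defaultdict(float)
    for run in runs:
        # Runs missing the tag are grouped under "untagged".
        segment = run["tags"].get(tag_key, "untagged")
        totals[segment] += run["cpu_hours"]
    return dict(totals)


# Made-up sample data for illustration.
runs = [
    {"app": "campaign-etl", "tags": {"team": "Marketing"}, "cpu_hours": 1.5},
    {"app": "leads-etl", "tags": {"team": "Sales"}, "cpu_hours": 4.0},
    {"app": "email-etl", "tags": {"team": "Marketing"}, "cpu_hours": 2.5},
    {"app": "adhoc", "tags": {}, "cpu_hours": 0.5},
]

team_usage = usage_by_tag(runs, "team")
```

The same function works for any tag key, such as a department or cluster tag, which is what makes one set of run records usable for several role-based views.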
MANAGE BIG DATA APPS FOR COMPLIANCE
• Visualize lineage: see exactly how each app ingests, manipulates, and outputs data
• Further inspect lineage by detecting apps that write to, or read from, a given dataset
[Pipeline diagram. Labels: SOURCES; OPERATIONS (functions, filters, joins, and aggregators); RESULTS]
For example, show all apps that interact with the dataset in “rain.txt”
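A lineage lookup like the "rain.txt" example can be thought of as a search over each app's recorded inputs and outputs. The sketch below shows that lookup in Python. The `lineage` mapping, the app names, and every dataset name other than "rain.txt" are hypothetical, and the code is not Driven's API.

```python
def apps_touching(lineage, dataset):
    """Names of apps that read from or write to `dataset`, sorted."""
    return sorted(
        app
        for app, io in lineage.items()
        if dataset in io["reads"] or dataset in io["writes"]
    )


# Made-up lineage records: app name -> datasets read and written.
lineage = {
    "weather-ingest": {"reads": {"rain.txt"}, "writes": {"rain_clean.tsv"}},
    "weather-report": {"reads": {"rain_clean.tsv"}, "writes": {"report.csv"}},
    "archive": {"reads": set(), "writes": {"rain.txt"}},
}

touching_rain = apps_touching(lineage, "rain.txt")
```

Here `touching_rain` contains both the app that reads "rain.txt" and the app that writes it, which is the kind of two-way view a compliance review would need.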
MANAGE BIG DATA APPS FOR COLLABORATION
• Create JIRA issues with views and data to collaborate quickly on resolving performance problems
• Integrate alerts with popular notification platforms like HipChat, PagerDuty, and Nagios
For example, with one click, create a JIRA issue with a link to this view
Automatically send app status notifications via webhooks or JMX
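A webhook notification is, in general, an HTTP POST carrying a small JSON body. The sketch below builds such a request with the Python standard library. The URL, the payload fields `app` and `status`, and the function name are assumptions made for illustration; the slides do not describe the payload Driven actually sends.

```python
import json
import urllib.request


def build_status_request(url, app_name, status):
    """Build (but do not send) a JSON POST describing an app's status."""
    # Hypothetical payload shape; not Driven's actual webhook format.
    payload = json.dumps({"app": app_name, "status": status}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_status_request(
    "https://example.com/hooks/app-status", "etl-orders", "FAILED"
)
# urllib.request.urlopen(request) would send it; omitted so the sketch
# has no network side effects.
```

Keeping request construction separate from sending makes the payload easy to inspect and test before it is wired to a real endpoint.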
NURTURE A CULTURE OF OPERATIONAL EXCELLENCE
“The coolest part about Driven is being able to visualize data pipelines and inspect components in real time for easy troubleshooting and optimization. I don't know of any other tool that's close in functionality.”
- Neville Li, Software Engineer, Spotify
“Driven has given us a way to monitor the performance of our data-driven applications in a manner which is visually intuitive to both engineering and business users.”
- Joao Vicente, Performance Architect, Dun & Bradstreet
… THROUGH A SCALABLE, SEARCHABLE METADATA STORE
• End-to-end operational telemetry metadata for big data applications
• Accessible via Web browser, command-line interface (CLI), or simple search queries
• Easy integrations through JMX and the upcoming Driven SDK
[Architecture diagram. Labels: Hadoop clusters; Hadoop apps and infrastructure; YARN; Applications; Plugin; Telemetry metadata (SSL); Server; Web app server; WAR files; Web; CLI; JMX]
THANK YOU