Tableau and Hadoop
DESCRIPTION
Architecture patterns for using Tableau and Hadoop.
TRANSCRIPT
TABLEAU AND HADOOP
Tableau’s Place in a Big Data Architecture
DAMA, Tableau User Group Meeting
November 13, 2014
Agenda
BI/DW Workload Categories & Tableau
Three Integration Models
Capability Models
Architecture Patterns
Summary
Q & A
Workload Categories
Operational BI
• Operational processes
• Reports and dashboards
• Transactional sys integration
• Automatic distribution
• 100s – 1,000s of consumers: front-line staff, data analysts, business leaders, executives
• Production data prep
• High availability
• Report archiving
• Op sys response time & SLA
• Enterprise governance
• Enterprise security
• Self-service: report access & interactivity
Data Exploration
• Decision support processes
• Less strict definition
• Ad hoc reports and dashboards
• Perf mgmt analysis
• 100s of users: data analysts, business leaders
• Production & manual data prep
• Enterprise or div governance
• Corporate security
• Self-service: query, report/analysis authoring, data design, metadata definition
Data Science
• Complex data exploration
• Descriptive analytics
• Predictive statistical models
• Machine learning algorithms
• Large data volumes
• Wide data variety
• 10s of users: data scientists, technologists
• Departmental governance
• Raw data (Bus & IT)
• Derivative data (Bus & IT)
• Self-service: full
Three Integration Models
Isolated Exploration Environment (aka Sandbox)
Snapshot of data cached on desktop or server
Frequency of data change is analyst dependent
Integrations occur through analyst, not enterprise, work
Live Interactive Query (aka BI/DW)
Constantly changing data stored in an enterprise data platform.
Frequency of data change is independent of analyst
Integrations occur primarily through enterprise work
Integrated Advanced Analytic Platform
Access to [custom] advanced analytic algorithms through Tableau
Application of analytic algorithms to new datasets
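The difference between the first two models can be sketched in a few lines of Python. This is only an illustration, not anything from the deck: an in-memory SQLite database stands in for the enterprise data platform, and the `sales` table, its columns, and both helper functions are invented names.

```python
import sqlite3

# Hypothetical stand-in for an enterprise data platform; the table and
# column names are invented for this sketch.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
source.executemany(
    "INSERT INTO sales VALUES (?, ?)", [("east", 100), ("west", 250)]
)

QUERY = "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"


def take_extract(conn):
    """Sandbox model: copy the result once; the analyst decides when to refresh."""
    return conn.execute(QUERY).fetchall()


def live_query(conn):
    """Live model: every request goes back to the platform."""
    return conn.execute(QUERY).fetchall()


extract = take_extract(source)

# The platform changes independently of the analyst.
source.execute("INSERT INTO sales VALUES ('east', 50)")

snapshot_view = extract          # still the data as of extract time
live_view = live_query(source)   # reflects the newly inserted row
```

After the insert, `snapshot_view` still shows the old total for `east` while `live_view` shows the new one, which is the trade-off between the sandbox and BI/DW models.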
Isolated Exploration Environment
Capabilities: Visual Exploration; Prototype Analytical Applications
[Diagram. An analyst works with Tableau, SAS, and a metadata tool (marked "?"). Labels: visual navigation, measures, hierarchies; statistical profile; technical & business metadata; integrations, data design, visual organization, granularity.]
Live Interactive Query
Capabilities: Dashboarding; Performance Management Analysis
[Diagram, dashboarding. Analyst defines, developer builds, business leaders & staff use. Tableau labels: visually engaging, KPIs, defined analysis paths.]
[Diagram, performance management analysis. Analyst iterates and generates analysis and recommendation. Tableau labels: KPIs, ad hoc analysis paths, detail records.]
Integrated Advanced Analytic Platform
Enabling a “Clinical Trials” Model for Data Science
Phase I: Model Discovery
Data Science Team (Centralized)
• Appropriate modeling technique
• Rapid iterations
• Tool & algorithm variety
Phase II: Confirmation
Data Analysts (Decentralized)
• Confirm value
• Wider application
• Tool & data conformity
Phase III: Pilot
Select Business Leaders, Staff or Customers
• Demo business value
• Demo feasibility
Phase IV: Rollout
All Business Leaders, Staff or Customers
• Realized value
• Refine through application
Analytic Capabilities & Hadoop
Architecture Pattern / Capability: Suitable for Hadoop? Considerations
Isolated Exploration Environment / Visual Exploration: Possibly
• Dataset has limited joins
• Dataset is large enough to warrant Hadoop as the “cache”
Isolated Exploration Environment / Prototype Analytical Apps: No
• Too many joins typically required for a prototype
• Prototypes can be confirmed on data subsets
Live Interactive Query / Dashboards: No
• Too many concurrent users
• Response time requirements are too stringent
Live Interactive Query / Performance Mgmt Analysis: Possibly
• Dataset has limited joins
• Dataset is large enough to warrant Hadoop as the repository
Integrated Advanced Analytic Platform / “Clinical Trial” approach: Yes
• Tableau’s R integration
• Hadoop’s UDF, UDAF features
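The capability table above can also be held as a small lookup, which may be handy when screening candidate workloads. The sketch below only restates the slide's ratings and considerations. The names `Assessment`, `HADOOP_FIT`, `hadoop_fit`, and `candidates` are invented for illustration.

```python
from typing import NamedTuple, Tuple


class Assessment(NamedTuple):
    suitable: str                     # "Yes", "Possibly", or "No", as on the slide
    considerations: Tuple[str, ...]   # the slide's bullet points


# Keys are (architecture pattern, capability); values restate the slide.
HADOOP_FIT = {
    ("Isolated Exploration Environment", "Visual Exploration"): Assessment(
        "Possibly",
        ("Dataset has limited joins",
         "Dataset is large enough to warrant Hadoop as the cache"),
    ),
    ("Isolated Exploration Environment", "Prototype Analytical Apps"): Assessment(
        "No",
        ("Too many joins typically required for a prototype",
         "Prototypes can be confirmed on data subsets"),
    ),
    ("Live Interactive Query", "Dashboards"): Assessment(
        "No",
        ("Too many concurrent users",
         "Response time requirements are too stringent"),
    ),
    ("Live Interactive Query", "Performance Mgmt Analysis"): Assessment(
        "Possibly",
        ("Dataset has limited joins",
         "Dataset is large enough to warrant Hadoop as the repository"),
    ),
    ("Integrated Advanced Analytic Platform", "Clinical Trial approach"): Assessment(
        "Yes",
        ("Tableau's R integration", "Hadoop's UDF, UDAF features"),
    ),
}


def hadoop_fit(pattern: str, capability: str) -> Assessment:
    """Look up the slide's rating for one pattern/capability pair."""
    return HADOOP_FIT[(pattern, capability)]


def candidates():
    """Return the pairs the slide does not rule out ("Yes" or "Possibly")."""
    return sorted(key for key, a in HADOOP_FIT.items() if a.suitable != "No")
```

For example, `hadoop_fit("Live Interactive Query", "Dashboards").suitable` gives the slide's "No", and `candidates()` lists the three pairs rated "Yes" or "Possibly".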
Architecture Pattern
Isolated Exploration Environment
[Diagram. Components: Tableau Desktop with cache, private data, Tableau Server, enterprise data asset. Users: data analyst, business leader. Labeled flows: on demand, extract, interactive query.]
Architecture Pattern
Live Interactive Query
[Diagram. Components: Tableau Desktop, Tableau browser & mobile clients, two caches. Users: data analyst, developer. Labeled flows: cached live query, live query.]
Architecture Pattern
Integrated Advanced Analytic Platform
[Diagram. Components: enterprise data asset, analytic workbench (python, R, SAS, …), analytic models (M), cache, Tableau Server. Users: data analyst, data scientist. Labeled flows: live query; live query via SQL extensions & R integration.]
References:
http://www.tableausoftware.com/about/blog/tableau-and-marklogic
http://developer.marklogic.com/blog/the-art-of-the-possible-marklogic-tableau-public
https://cwiki.apache.org/confluence/display/Hive/HivePlugins
SQL Extension Examples
MarkLogic SPARQL
SELECT name, affiliation
FROM emails
WHERE subject MATCH "answer"
HiveQL
SELECT my_function(…),
sum(freq)
FROM myDataTable;
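The idea behind the HiveQL example, custom functions registered with the engine and then called from ordinary SQL, can be tried locally with Python's built-in `sqlite3` module. This is a stand-in rather than Hive itself: `create_function` plays the role of a UDF and `create_aggregate` the role of a UDAF. The body of `my_function`, the `Spread` aggregate, the column names, the sample rows, and the `GROUP BY` clause are all assumptions added for illustration.

```python
import sqlite3


def my_function(word):
    """Hypothetical scalar function (UDF analogue): lower-case a token."""
    return word.lower()


class Spread:
    """Hypothetical aggregate (UDAF analogue): max minus min of a column."""

    def __init__(self):
        self.values = []

    def step(self, value):
        # Called once per row in the group.
        self.values.append(value)

    def finalize(self):
        # Called once per group to produce the aggregate value.
        return max(self.values) - min(self.values) if self.values else None


conn = sqlite3.connect(":memory:")
conn.create_function("my_function", 1, my_function)
conn.create_aggregate("spread", 1, Spread)

conn.execute("CREATE TABLE myDataTable (word TEXT, freq INTEGER)")
conn.executemany(
    "INSERT INTO myDataTable VALUES (?, ?)",
    [("Hadoop", 3), ("hadoop", 4), ("Tableau", 5)],
)

# Same shape as the slide's query, with a GROUP BY added so the scalar
# function and the aggregates can appear together.
rows = conn.execute(
    "SELECT my_function(word), SUM(freq), spread(freq) "
    "FROM myDataTable GROUP BY my_function(word) ORDER BY 1"
).fetchall()
```

Once registered, the custom functions are visible to any client that sends SQL to the engine, which is what lets a tool such as Tableau reach them through a live query.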
Architecture Pattern
Integrated Advanced Analytic Platform
References:
https://boraberan.wordpress.com/2013/12/24/sentiment-analysis-in-tableau-with-r/
http://cran.r-project.org/src/contrib/Archive/sentiment/
http://kb.tableausoftware.com/articles/knowledgebase/r-implementation-notes
http://www.tableausoftware.com/about/blog/2013/10/tableau-81-and-r-25327
R integration example
• Install the R package called sentiment
• Call the classify_polarity R function using Tableau’s SCRIPT_STR function
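The calling pattern in this example is that Tableau hands the external script whole columns of values and expects one string back per row. A rough Python analogue of that contract is sketched below. It is not Tableau's or R's API: `script_str` here is a hypothetical helper that mimics the vector-in, vector-out shape in simplified form, and the toy `classify_polarity` with its word lists is an invented stand-in for the sentiment package's function.

```python
# Toy stand-in for the sentiment package's classify_polarity; the word
# lists are invented for this sketch and are not the package's lexicon.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "poor", "hate"}


def classify_polarity(texts):
    """Return one label per input string, preserving order and length."""
    labels = []
    for text in texts:
        words = text.lower().split()
        score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
        if score > 0:
            labels.append("positive")
        elif score < 0:
            labels.append("negative")
        else:
            labels.append("neutral")
    return labels


def script_str(func, *args):
    """Mimic the calling contract: column vectors in, one string per row out."""
    result = func(*args)
    if len(result) != len(args[0]):
        raise ValueError("script must return one value per input row")
    return [str(value) for value in result]


comments = ["Great dashboard, love it", "Poor response time", "Uses Hadoop"]
polarity = script_str(classify_polarity, comments)
```

Because the result lines up row for row with the input, the labels can be used like any other string field in a view.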
Consolidated Architecture
[Diagram. Combines the three architecture patterns. Components: Tableau Desktop with cache, private data, enterprise data asset, Tableau Server (shown twice, once with cache), analytic workbench (python, R, SAS, …), analytic models (M). Users: data analyst, business leader, developer, data scientist. Labeled flows: on demand, extract, interactive query, cached live query, live query, live query via SQL extensions & R integration.]
Summary, Q&A
– Thank you –
Contact Information
Craig Jordan
LinkedIn: www.linkedin.com/in/crjordan/
Email: [email protected]