Security and Governance on Hadoop with Apache Atlas and Apache Ranger, by Srikanth Venkat
TRANSCRIPT
1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Enterprise Ready Security & Governance with Hortonworks Data Platform
Srikanth Venkat, Senior Director, Product Management
Protecting the Elephant in the Castle
• Kerberos
• Wire Encryption
• HDFS Encryption
• Apache Ranger
• Network Segmentation, Firewalls
• LDAP/AD
• Apache Knox
Apache Ranger
Authorization
• Centralized platform to define, administer and manage security policies consistently across Hadoop components
– HDFS, Hive, HBase, YARN, Kafka, Solr, Storm, Knox, NiFi
• Extensible architecture
– Custom policy conditions, user context enrichers
– Easy to add new component types for authorization
Auditing
• Central audit location for all access requests
• Support multiple destination sources (HDFS, Solr, etc.)
• Real-time visual query interface
Ranger KMS
• Store and manage encryption keys
• Support HDFS Transparent Data Encryption
• Integration with HSM
– SafeNet Luna
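As a rough illustration of what a centrally defined authorization policy can look like, the sketch below builds a JSON document in the general shape of a Ranger resource-based policy for Hive. The database, table, group, and policy name come from the demo slides later in this deck; the service name `demo_hive` and the helper `build_hive_policy` are hypothetical, and the exact fields should be checked against the Ranger version in use.

```python
import json


def build_hive_policy(service, name, database, table, groups, access_types):
    """Return a dict shaped like a Ranger resource-based policy for Hive.

    This is an illustrative helper, not part of any Ranger client library.
    """
    return {
        "service": service,
        "name": name,
        "isEnabled": True,
        "resources": {
            "database": {"values": [database]},
            "table": {"values": [table]},
            "column": {"values": ["*"]},
        },
        "policyItems": [
            {
                "groups": list(groups),
                "accesses": [
                    {"type": access, "isAllowed": True} for access in access_types
                ],
            }
        ],
    }


policy = build_hive_policy(
    service="demo_hive",          # hypothetical Ranger service name
    name="access_us_customers",   # policy name used in the demo
    database="hortoniabank",
    table="us_customers",
    groups=["us_employees"],
    access_types=["select"],
)

# The resulting JSON is the kind of payload an administrator would submit
# to the Ranger Admin portal or its REST API.
print(json.dumps(policy, indent=2))
```

Defining the policy once as data, rather than as per-component configuration, is what lets the same administration portal cover every component that runs a Ranger plugin.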
Ranger Architecture
[Architecture diagram. Labeled elements: Enterprise Users; Legacy Tools and Data Governance; Integration API; Ranger Administration Portal; Ranger Policy Server; Ranger Audit Server; Hadoop Components, each shown with a Ranger Plugin: HDFS, HBase, HiveServer2, Knox, NiFi, Solr, Kafka, YARN, Storm, Atlas. HDFS and Solr also appear a second time as separate boxes.]
Enterprise Data Governance: Apache Atlas
Data Management along the entire data lifecycle with integrated provenance and lineage capability
• Cross component lineage
Modeling with Metadata enables comprehensive business metadata vocabulary with enhanced tagging and attribute capabilities
• Common Business Language
• Hierarchically organized, no duplicates
Interoperable Solutions across the Hadoop ecosystem, through a common metadata store
• Combine and Exchange Metadata
[Diagram. Labels: Atlas Metadata; Hive, Sqoop, Falcon, Kafka, Storm, Ranger, Custom, Partners; Structured, Unstructured, Streaming, Traditional RDBMS, MPP Appliances, Metadata.]
High Level Architecture: 4 Key Points
1 Data Lineage: Only product that captures lineage across Hadoop components at platform level.
2 Agile Data Modeling: Type system allows custom metadata structures in a hierarchical taxonomy
3 REST API: Modern, flexible access to Atlas services, HDP components, UI & external tools
4 Exchange: Leverage existing metadata / models by importing them from current tools. Export metadata to downstream systems
[Diagram. Labels: Type System, Repository, Search DSL, REST API, Graph DB, Search, Bridge, Connectors, Messaging Framework, Hive, Storm, Falcon, Custom, Kafka, Sqoop.]
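One way to picture cross-component lineage is as a directed graph in which each edge records that a process read one dataset and wrote another. The sketch below is a toy model only: the dataset and process names are hypothetical, and it does not use the Atlas type system or REST API. It walks the graph backwards to list everything a dataset was derived from.

```python
from collections import deque

# Each edge reads: (input dataset, process, output dataset).
# All names are hypothetical examples, not from the slides.
LINEAGE_EDGES = [
    ("rdbms.customers", "sqoop_import", "hdfs:/landing/customers"),
    ("hdfs:/landing/customers", "hive_load", "hive.staging_customers"),
    ("hive.staging_customers", "hive_ctas", "hive.customer_report"),
]


def upstream(dataset, edges):
    """Return all ancestors of `dataset`, nearest first (breadth-first)."""
    parents = {}
    for src, _process, dst in edges:
        parents.setdefault(dst, []).append(src)

    seen, order, queue = set(), [], deque([dataset])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


print(upstream("hive.customer_report", LINEAGE_EDGES))
```

Because the edges span Sqoop, HDFS, and Hive in this toy graph, a single traversal crosses component boundaries, which is the property the slide attributes to platform-level lineage.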
Apache Atlas Component Integration
• Cross-component dataset lineage. Centralized location for all metadata inside HDP
• Single interface point for metadata exchange with platforms outside of HDP
[Diagram: Apache Atlas integrations with Hive, Ranger, Falcon, Sqoop, Storm, Kafka, Spark, NiFi, and HBase, grouped by release: HDP 2.3, HDP 2.5, and beyond HDP 2.5.]
Next Generation Security & Governance for Hadoop NEW
Demo Scenario
• HortoniaBank: mid-size financial services company (bank + health insurance services) expanding from US to international markets
• Employees in EU and US
• Multiple business units need access to customer data: Analysts, Compliance Admins, HR
• Customer data is co-mingled as well as isolated
• Leases data from external data brokers
• Needs rational security policies that provide the right level of access control to customer data across geographies and business functions, and that comply with external regulations (PII, HIPAA, EU Privacy etc.)
all user passwords: hadoop
Demo Data
Customer data in hortoniabank DB
• 2 Customer Tables: 50K customer records each with 38 fields (PII, PHI, PCI & non-sensitive data)
– us_customers: USA person data only
– ww_customers: multi-language, multi-country, localized person data across the world
• 1 Reference table: eu_countries (reference table for looking up EU country codes to country mappings, with Brexit etc.)
Finance DB: 1 data set leased from a data broker
– tax_2015: Data lease expired already (on Dec 31st 2015)
all user passwords: hadoop
Ranger Policies Setup for Demo
• Only US employees can see data in us_customers table and only from locations within the US (access_us_customers)
• Only US employees can see data rows of US persons in ww_customers table (filter_ww_customers_table + access_ww_customers)
• Only EU employees can see rows with EU person data in ww_customers table (filter_ww_customers_table + access_ww_customers)
• US HR team members can see all original unmasked data (PCI, PII, …)
• Analysts can view masked versions of sensitive data from the WW customers table but are prohibited from viewing PII data in US tables (all masking policies are under the Masking tab of Resource based policies)
• Zip code, MRN, and blood group data may not be combined in any single query (prohibition policy)
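The row-filtering rules above can be sketched as a per-group predicate applied to each row before results are returned. In Ranger the filter is expressed as a SQL condition on the policy; the Python below only emulates that behavior on in-memory rows. The column name `country`, the sample rows, and the three-country EU set are assumptions made for illustration.

```python
# Tiny hypothetical stand-in for the eu_countries reference table.
EU_COUNTRIES = {"DE", "FR", "IT"}

# Group name -> predicate deciding whether a row is visible.
# The `country` column name is an assumption, not taken from the slides.
ROW_FILTERS = {
    "us_employees": lambda row: row["country"] == "US",
    "eu_employees": lambda row: row["country"] in EU_COUNTRIES,
}


def visible_rows(user_groups, rows):
    """Return the rows a user may see; users with no matching group see none."""
    for group, predicate in ROW_FILTERS.items():
        if group in user_groups:
            return [row for row in rows if predicate(row)]
    return []


ww_customers = [
    {"id": 1, "country": "US"},
    {"id": 2, "country": "DE"},
    {"id": 3, "country": "JP"},
]

print(visible_rows({"us_employees", "hr"}, ww_customers))
print(visible_rows({"eu_employees", "hr"}, ww_customers))
```

The same query therefore returns different rows depending on who runs it, without any change to the table or to the query text.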
Personas Setup for Demo
User | Group | Access Privileges
joe-analyst | us_employees, analyst | US Data Only, non-sensitive data only, rest masked or forbidden depending on sensitivity
kate-hr | us_employees, hr | US Data Only, All sensitive data (PCI, PII, PHI)
ivana-eu-hr | eu_employees, hr | EU Data Only, All sensitive data
compliance-admin | compliance, us_employees | Compliance with licensing, can only see leased data sets
Data Masking Policies setup for us_customers data for analyst group
Data Column | Data Column Description | Masking Type | Sample Output | Ranger Masking Policy
password | Password | Hash | 237672b21819462ff39fcea7d990c3e5 | mask_password_hash
nationalid | National ID | Show Last 4 | xx-xx-9324 | mask_nationalid_last4
ccnumber | Credit Card Number | Show First 4 | 4532xxxxxxxxxxxx | mask_ccnumber_first4
streetaddress | Street Address | Redact | nnn Xxxxxx Xxxxx | mask_streetaddress_redact
MRN | MRN | Nullify | null | mask_mrn_nullify
age | Age | CUSTOM | (Adds a random number below 20 to actual age) | mask_age_custom
birthday | Date of Birth | CUSTOM | 01-01-1987 (Keep year of birth and make date & month 01-01) | mask_dob_custom
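To make the masking types above concrete, the sketch below emulates each one in plain Python. It is not Ranger's or Hive's implementation: the function names are hypothetical, MD5 is assumed only because the sample hash is 32 hex characters long, and the birthday input format `MM-DD-YYYY` is assumed from the sample output.

```python
import hashlib
import random


def mask_hash(value):
    """Replace the value with a hex digest (MD5 is an assumption here)."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def _mask_alnum(text):
    """Replace letters and digits with 'x', leaving punctuation untouched."""
    return "".join("x" if ch.isalnum() else ch for ch in text)


def show_last4(value):
    return _mask_alnum(value[:-4]) + value[-4:]


def show_first4(value):
    return value[:4] + _mask_alnum(value[4:])


def redact(value):
    """Digits become 'n', uppercase 'X', lowercase 'x'; the rest is kept."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("n")
        elif ch.isupper():
            out.append("X")
        elif ch.islower():
            out.append("x")
        else:
            out.append(ch)
    return "".join(out)


def nullify(_value):
    return None


def mask_age(age, rng=random):
    """Add a random whole number below 20 to the real age."""
    return age + rng.randrange(20)


def mask_birthday(mm_dd_yyyy):
    """Keep the year of birth and set month and day to 01-01."""
    year = mm_dd_yyyy.rsplit("-", 1)[1]
    return "01-01-" + year


print(show_last4("12-34-9324"))
print(show_first4("4532123456789012"))
print(redact("123 Market Drive"))
print(mask_birthday("07-23-1987"))
```

The input values in the usage lines are made-up examples chosen so that the masked results have the same shape as the sample outputs in the table.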
Tag Based Policy for Leased Data
Tagging leased data set in Atlas
• tax_2015 table tagged with EXPIRES_ON with expiry_date:2015-12-31
Tag based policy in Ranger for leased data set (Policy name: tag_EXPIRES_ON)
Group | Access Privileges
public | No access after data lease expiration date (denied)
compliance | Compliance team allowed to access data after expiration date
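The expiry rule can be sketched as a check that combines the tag attached in Atlas with the requesting user's groups. This is a simplified model, not Ranger's policy engine: the function and variable names are hypothetical, and it assumes access is denied strictly after `expiry_date` and that tables without the tag are not affected by this policy.

```python
from datetime import date

# Tags as they might be attached to tables in Atlas (values from the demo).
TAGS = {"tax_2015": {"EXPIRES_ON": {"expiry_date": date(2015, 12, 31)}}}

# Groups still allowed to read the data after the lease has expired.
EXEMPT_GROUPS = {"compliance"}


def allowed_by_expiry_policy(table, user_groups, today):
    """Return False only when the EXPIRES_ON tag policy denies access."""
    tag = TAGS.get(table, {}).get("EXPIRES_ON")
    if tag is None:
        return True  # untagged tables are outside this policy's scope
    if today <= tag["expiry_date"]:
        return True  # lease still valid (assumed inclusive of the expiry day)
    return bool(EXEMPT_GROUPS & set(user_groups))


check_day = date(2016, 6, 1)
print(allowed_by_expiry_policy("tax_2015", {"us_employees", "analyst"}, check_day))
print(allowed_by_expiry_policy("tax_2015", {"compliance", "us_employees"}, check_day))
```

Because the decision keys off the tag rather than the table name, tagging another leased data set with EXPIRES_ON would bring it under the same rule without writing a new policy.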
HDP Security Benefits
• Comprehensive Security through a platform approach, providing administrators with complete visibility into the security administration process
• Data Protection: encryption of data at rest and in motion, dynamic masking & row filtering
• Centralized Administration of security policies and user authentication. Consistently define, administer and manage security policies. Define a policy once and apply it to all the applicable components across the stack
• Fine-Grain Authorization for data access control for database, table, column, LDAP groups & specific users. Dynamic tag based policies
• Integrated with Data Governance via Apache Atlas
[Diagram. Labels: YARN: Data Operating System; Storage; Batch, Interactive, Streaming, Search, Machine Learning; Operations, Security, Governance.]