TriHUG October: Apache Ranger
Hadoop Data Security with Apache Ranger
Biren Saini
© Hortonworks Inc. 2011 – 2015. All Rights Reserved
About me
• Biren Saini
• Senior Solutions Engineer and Governance SME Lead @ Hortonworks
• Overall 15 years of technology experience
Agenda
• Hadoop Security Overview
• Apache Ranger
  – Introduction
  – Architecture
  – Sample Flow
  – Best Practices
  – Ranger Stacks
  – Demo
Overview of Security in Hadoop
5 Pillars of Security
• Authentication
• Authorization
• Audit
• Encryption
• Centralized Administration
Security Tools in Hadoop world
• Kerberos (authentication)
• Apache Knox (authentication)
• AD/LDAP (authentication)
• Apache Ranger (authorization, audit, KMS)
• HDFS TDE (data encryption)
• Wire Encryption (data protection)
Typical Flow – SQL Access through Beeline client
[Diagram: Beeline Client → HiveServer 2 → HDFS (A, B, C)]
Typical Flow – Authenticate through Kerberos
[Diagram: Beeline Client, Active Directory, KDC, HiveServer 2, HDFS (A, B, C)]
• Log in to Hive using AD password
• Client gets service ticket for Hive
• Hive gets Namenode (NN) service ticket
• Hive creates MapReduce job using NN service ticket (ST)
Typical Flow – Add Authorization through Apache Ranger
[Diagram: Beeline Client, Active Directory, KDC, Ranger, HiveServer 2, HDFS (A, B, C)]
• Log in to Hive using AD password
• Hive gets Namenode (NN) service ticket
• Import users/groups from LDAP
• Column-level access control, auditing
• File-level access control
Typical Flow – Firewall, Route through Knox Gateway
[Diagram: Beeline Client, Apache Knox, Active Directory, KDC, Ranger, HiveServer 2, HDFS (A, B, C)]
• Original request with user ID/password
• Knox gets service ticket for Hive
• Knox runs as proxy user using Hive service ticket (ST)
• Use Hive ST, submit query
• Hive gets Namenode (NN) service ticket
• Hive creates MapReduce job using NN ST
• Client gets query result
Typical Flow – Add Wire and File Encryption
[Diagram: Beeline Client, Apache Knox, Active Directory, KDC, Ranger, HiveServer 2, HDFS (A, B, C); connections are labeled SSL, with one labeled SASL]
• Original request with user ID/password
• Knox gets service ticket for Hive
• Knox runs as proxy user using Hive service ticket (ST)
• Use Hive ST, submit query
• Hive gets Namenode (NN) service ticket
• Hive creates MapReduce job using NN ST
• Client gets query result
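From the client's side, the difference between the direct Kerberos flow and the Knox-routed flow shows up mainly in the JDBC URL handed to Beeline. The sketch below builds both forms. The host names, realm, ports and the Knox topology name ("default") are placeholders chosen for illustration, not values from the slides.

```python
def direct_kerberos_url(host, realm, port=10000, database="default"):
    """JDBC URL for a direct, Kerberos-authenticated HiveServer2 connection."""
    # _HOST is a placeholder for the HiveServer2 host name in the principal.
    return f"jdbc:hive2://{host}:{port}/{database};principal=hive/_HOST@{realm}"


def knox_url(gateway_host, topology="default", port=8443):
    """JDBC URL that routes the query through a Knox gateway over SSL/HTTP."""
    return (
        f"jdbc:hive2://{gateway_host}:{port}/;ssl=true;"
        f"transportMode=http;httpPath=gateway/{topology}/hive"
    )


# Placeholder hosts and realm, for illustration only.
print(direct_kerberos_url("hs2.example.com", "EXAMPLE.COM"))
print(knox_url("knox.example.com"))
```

With the Knox form, the client supplies a user ID and password to the gateway instead of holding a Kerberos ticket itself, which matches the "Original request w/user id/password" step in the diagram.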
Apache Ranger
• Provides centralized policy definition for authorizing & auditing access to resources in a consistent manner.
• Supported components as of v0.5
  – HDFS
  – HBase
  – Hive
  – YARN
  – Knox
  – Storm
  – Solr
  – Kafka
Setup Authorization Policies
• File-level access control, flexible definition
• Control permissions
Monitor through Auditing
Apache Ranger authZ Architecture
[Diagram]
• Agents: one per component (HDFS, HBase, Hive, YARN, Knox, Storm, Solr, Kafka)
• Administration Portal (Ranger UI) and REST APIs
• Policy Server
• Audit Server
• User Sync Server: LDAP/AD user/group sync
• KMS
• Stores shown: DB, SOLR, HDFS, Log4j
Sample Simplified Workflow - HDFS
[Diagram: Ranger (Administration Portal, Policy Server, Policy Store, Audit Server, Audit Store); Hadoop Cluster (Namenode with Agent)]
• No policy defined.
• Unauthorized user attempts to access the data
• User access is denied
Sample Simplified Workflow - HDFS
[Diagram: Ranger and Hadoop Cluster components, with steps 1a to 1d]
• 1a: Admin sets policies for HDFS files/folder
• 1b, 1c, 1d: numbered on the diagram without captions
Sample Simplified Workflow - HDFS
[Diagram: Ranger and Hadoop Cluster components, with steps 1a to 1d and 2a to 2d]
• 1a: Admin sets policies for HDFS files/folder
• 2a: Data scientist runs a MapReduce job; analysts access HDFS data through a user application; IT users access HDFS through CLI
• 2b: Namenode uses Agent for authorization
• Namenode provides resource access to user/client
• 1b, 1c, 1d, 2c, 2d: numbered on the diagram without captions
Sample Simplified Workflow - HDFS
[Diagram: Ranger and Hadoop Cluster components, with steps 1a to 1d, 2a to 2d and 3a to 3c]
• 1a: Admin sets policies for HDFS files/folder
• 2a: Data scientist runs a MapReduce job; analysts access HDFS data through a user application; IT users access HDFS through CLI
• 2b: Namenode uses Agent for authorization
• Namenode provides resource access to user/client
• 3a, 3b: Admin requests the Audit report
• 1b, 1c, 1d, 2c, 2d, 3c: numbered on the diagram without captions
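The workflow on these slides can be mimicked with a toy model: an admin writes policies to a store, an agent next to the Namenode authorizes each request against them, and every decision lands in an audit log. This is only a sketch under simplifying assumptions. The class names, the path-prefix matching and the sample paths are made up for illustration and are not Ranger's API; a real agent works from policies fetched from the Policy Server rather than reading the store directly.

```python
from dataclasses import dataclass, field


@dataclass
class PolicyStore:
    # Maps an HDFS path prefix to {user: set of allowed access types}.
    policies: dict = field(default_factory=dict)

    def set_policy(self, path, user, accesses):
        self.policies.setdefault(path, {})[user] = set(accesses)


@dataclass
class Agent:
    store: PolicyStore
    audit_log: list = field(default_factory=list)

    def is_allowed(self, user, path, access):
        allowed = False
        for prefix, grants in self.store.policies.items():
            under_prefix = path == prefix or path.startswith(prefix.rstrip("/") + "/")
            if under_prefix and access in grants.get(user, set()):
                allowed = True
                break
        # Every decision, allowed or denied, is recorded for later reporting.
        self.audit_log.append(
            {"user": user, "path": path, "access": access, "allowed": allowed}
        )
        return allowed


store = PolicyStore()
agent = Agent(store)

# No policy defined yet: the request is denied.
first = agent.is_allowed("analyst", "/data/sales/2015.csv", "read")

# 1a: the admin defines a policy; 2b: the agent authorizes against it.
store.set_policy("/data/sales", "analyst", ["read"])
second = agent.is_allowed("analyst", "/data/sales/2015.csv", "read")

# 3a: the admin asks for the audit report.
for event in agent.audit_log:
    print(event)
```

Keeping the audit write inside the authorization call means denied attempts are captured as reliably as allowed ones, which is what makes the later audit report useful.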
Ranger UserSync Best Practice
• Ensure LDAPS is used to integrate with Ranger
• Create an OU ONLY for Hadoop users, for performance
• Only run usersync when necessary
  – How many users are being added, and how often
  – How many users are changing roles
  – Too much syncing can degrade LDAP performance
• Do not sync anonymously
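These practices can be turned into a small pre-flight check on whatever settings are used to configure usersync. The sketch below is an illustration only. The dictionary keys (ldap_url, bind_dn, bind_password, user_search_base, sync_interval_minutes) are hypothetical names, not Ranger property names, and the 60-minute threshold is an arbitrary assumption.

```python
from urllib.parse import urlparse


def check_usersync_settings(settings):
    """Return a list of warnings for a (hypothetical) usersync settings dict."""
    warnings = []
    if urlparse(settings.get("ldap_url", "")).scheme != "ldaps":
        warnings.append("LDAP URL should use ldaps://")
    if not settings.get("bind_dn") or not settings.get("bind_password"):
        warnings.append("bind DN and password are required; do not sync anonymously")
    if "ou=" not in settings.get("user_search_base", "").lower():
        warnings.append("user search base should point at a dedicated Hadoop OU")
    # The 60-minute floor is an assumed value for illustration.
    if settings.get("sync_interval_minutes", 0) < 60:
        warnings.append("sync interval is short; frequent syncs can load the LDAP server")
    return warnings


good = {
    "ldap_url": "ldaps://ad.example.com:636",
    "bind_dn": "cn=ranger-sync,ou=service,dc=example,dc=com",
    "bind_password": "not-a-real-password",
    "user_search_base": "ou=hadoop,dc=example,dc=com",
    "sync_interval_minutes": 360,
}
bad = {"ldap_url": "ldap://ad.example.com:389"}

print(check_usersync_settings(good))
for warning in check_usersync_settings(bad):
    print(warning)
```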
Ranger Audit Best Practices
• HDFS
  – Long-term storage that can be used to understand user event trends and predict anomalies
• RDBMS
  – When SQL is preferred by auditors
  – MySQL, Oracle, Postgres, SQL Server
• Solr
  – Nice quick reporting metrics to understand user event trends
• Log4j Appenders
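As a rough illustration of what "understanding user event trends" can mean once audit events are collected, the sketch below counts denied events per user and flags anyone at or above a threshold. The record fields (user, resource, allowed) and the threshold are assumptions made for this example, not Ranger's audit schema, and a fixed threshold is a naive stand-in for real anomaly detection.

```python
from collections import Counter


def denied_counts(events):
    """Count denied access events per user."""
    return Counter(e["user"] for e in events if not e["allowed"])


def flag_unusual_users(events, threshold=3):
    """Return users whose denied-event count reaches the threshold."""
    return sorted(u for u, n in denied_counts(events).items() if n >= threshold)


# Hypothetical audit records, for illustration only.
events = [
    {"user": "sam", "resource": "/demo/data/trihug", "allowed": False},
    {"user": "sam", "resource": "/demo/data/trihug", "allowed": False},
    {"user": "sam", "resource": "/demo/data/other", "allowed": False},
    {"user": "tom", "resource": "/demo/data/trihug", "allowed": False},
    {"user": "tom", "resource": "/demo/data/trihug", "allowed": True},
]

print(denied_counts(events))
print(flag_unusual_users(events))
```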
Ranger Stacks
• Apache Ranger v0.5 supports a stack model to enable easier onboarding of new components, without requiring code changes in Apache Ranger.
Ranger Side Changes: Define Service-type
• Create a JSON file with the following details:
  – Resources
  – Access types
  – Config to connect
• Load the JSON into Ranger.
Secured Components Side Changes: Develop Ranger Authorization Plugin
• Include the plugin library in the secured component.
• During initialization of the service: init the RangerBasePlugin & RangerDefaultAuditHandler classes.
• To authorize access to a resource: build a RangerAccessRequest and call RangerBasePlugin.isAccessAllowed()
• To support resource lookup: implement RangerBaseService.lookupResource() & RangerBaseService.validateConfig()
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=53741207
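The service-type definition is a JSON document covering the three items listed: resources, access types and connection config. The sketch below assembles such a document for an imaginary component. The component name ("mystore"), its resources and config keys are invented for illustration, and the field names (itemId, resources, accessTypes, configs and so on) are an assumption modeled on the general shape of Ranger service definitions; check the linked wiki page for the authoritative schema.

```python
import json


def build_service_def(name, resources, access_types, config_keys):
    """Assemble a service-type definition as a plain dict (assumed field names)."""
    return {
        "name": name,
        "label": name.capitalize(),
        "resources": [
            {"itemId": i, "name": r, "level": i * 10,
             "mandatory": True, "lookupSupported": True}
            for i, r in enumerate(resources, start=1)
        ],
        "accessTypes": [
            {"itemId": i, "name": a, "label": a.capitalize()}
            for i, a in enumerate(access_types, start=1)
        ],
        "configs": [
            {"itemId": i, "name": c, "type": "string", "mandatory": True}
            for i, c in enumerate(config_keys, start=1)
        ],
    }


# "mystore" and everything below is a made-up component for illustration.
service_def = build_service_def(
    name="mystore",
    resources=["bucket", "object"],
    access_types=["read", "write"],
    config_keys=["endpoint.url", "username"],
)

print(json.dumps(service_def, indent=2))
```

The printed JSON is the kind of file that would then be loaded into Ranger; the loading step itself is left out here.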
Summary & Misc. points
• All functions are available as REST API
• Ranger integrates with AD/LDAP for Ranger login as well as user sync.
• Support for High Availability (HA)
• Support for Transparent Data Encryption with KMS implementation
• Tighter integration with Apache Ambari
• Stack-based implementation of plugins
• Ranger also has the KMS for HDFS TDE.
• Some features in development are
  – Spark support
  – Time-based authorization
  – Geolocation-based authorization
Demo - HDFS
[Diagram: Admin (Elevated Privileges); Sam and Tom (Restricted Privileges); Ranger UI; Ranger Plugin; directory /demo/data/trihug]
• Directory already exists: /demo/data/trihug, hdfs:hdfs rwx --- ---
• 1: Sam and Tom access /demo/data/trihug: WRITE Access denied, READ Access denied
• 2: Admin grants access through Ranger UI (READ for Sam, WRITE for Tom); Ranger Plugin gets the update
• 3: Sam and Tom access /demo/data/trihug: READ Access allowed, WRITE Access allowed, WRITE Access denied (Sam was granted READ only)
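The grant in step 2 of this demo is made in the Ranger UI, but since the same operations are exposed over REST, it can also be pictured as a JSON policy document. The sketch below builds one for the demo's path and grants. The service name ("cluster_hadoop"), the policy name and the exact field layout are assumptions for illustration and should be checked against the Ranger REST API documentation before use.

```python
import json


def hdfs_policy(service, name, path, grants, recursive=True):
    """Build a policy payload; `grants` maps user -> list of access types."""
    return {
        "service": service,
        "name": name,
        "isEnabled": True,
        "resources": {"path": {"values": [path], "isRecursive": recursive}},
        "policyItems": [
            {
                "users": [user],
                "accesses": [{"type": a, "isAllowed": True} for a in accesses],
            }
            for user, accesses in grants.items()
        ],
    }


# Service and policy names are placeholders; path and grants come from the demo.
policy = hdfs_policy(
    service="cluster_hadoop",
    name="trihug-demo",
    path="/demo/data/trihug",
    grants={"sam": ["read"], "tom": ["write"]},
)

print(json.dumps(policy, indent=2))
```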
Demo - Hive
[Diagram: Admin (Elevated Privileges); Sam and Tom (Restricted Privileges); Ranger UI; Ranger Plugin; hive tables tickers and eod]
• DB already exists: hive tables tickers and eod, created by “hive” user in trihug schema
• 1: Sam and Tom access the tables: WRITE Access denied, READ Access denied
• 2: Admin grants access through Ranger UI (READ for Sam, ALL for Tom); Ranger Plugin gets the update
• 3: Sam and Tom access the tables: WRITE Access allowed, GRANT Access allowed, READ Access to SOME COLUMNS allowed, READ Access to ALL COLUMNS denied, WRITE Access denied
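The column-level outcome in this demo (READ on some columns allowed, READ on all columns denied) can be pictured with a toy check that splits the columns a query touches into allowed and denied sets. This is a sketch only. The column names and the choice of the tickers table are hypothetical, and the function is not how the Ranger Hive plugin is implemented.

```python
def allowed_columns(grants, user, table, requested):
    """Split requested columns into (allowed, denied) for a user and table."""
    granted = grants.get((user, table), set())
    if "*" in granted:
        return list(requested), []
    allowed = [c for c in requested if c in granted]
    denied = [c for c in requested if c not in granted]
    return allowed, denied


# Hypothetical grants: column names are invented for illustration.
grants = {
    ("sam", "tickers"): {"symbol", "name"},  # READ on some columns
    ("tom", "tickers"): {"*"},               # ALL
}

print(allowed_columns(grants, "sam", "tickers", ["symbol", "sector"]))
print(allowed_columns(grants, "tom", "tickers", ["symbol", "sector"]))
```

A query is authorized only if its denied list is empty, which is why a SELECT * by Sam fails while a SELECT on the granted columns succeeds.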
Demo time..
Thank you. Questions?