TriHUG October: Apache Ranger

Post on 12-Apr-2017

589 views

Category:

Technology


0 download

TRANSCRIPT

Page 1: TriHUG October: Apache Ranger

Hadoop Data Security with Apache Ranger

Biren Saini

© Hortonworks Inc. 2011 – 2015. All Rights Reserved

Page 2: TriHUG October: Apache Ranger

About me

•  Biren Saini
•  Senior Solutions Engineer @ Hortonworks
•  Governance SME Lead @ Hortonworks
•  Overall 15 years of technology experience

Page 3: TriHUG October: Apache Ranger

Agenda

•  Hadoop Security Overview
•  Apache Ranger
    –  Introduction
    –  Architecture
    –  Sample Flow
    –  Best Practices
    –  Ranger Stacks
    –  Demo

Page 4: TriHUG October: Apache Ranger

Overview of Security in Hadoop

Page 5: TriHUG October: Apache Ranger

5 Pillars of Security

•  Authentication
•  Authorization
•  Audit
•  Encryption
•  Centralized Administration

Page 6: TriHUG October: Apache Ranger

Security Tools in Hadoop world

•  Kerberos (authentication)
•  Apache Knox (authentication)
•  AD/LDAP (authentication)
•  Apache Ranger (authorization, audit, KMS)
•  HDFS TDE (data encryption)
•  Wire Encryption (data protection)

Page 7: TriHUG October: Apache Ranger

Typical Flow – SQL Access through Beeline client

[Diagram: Beeline Client, HiveServer 2, HDFS (A, B, C)]

Page 8: TriHUG October: Apache Ranger

Typical Flow – Authenticate through Kerberos

[Diagram: Beeline Client, Active Directory, KDC, HiveServer 2, HDFS (A, B, C)]

•  Log in to Hive using AD password
•  Client gets service ticket for Hive
•  Hive gets Namenode (NN) service ticket (ST)
•  Hive creates MapReduce job using NN ST

Page 9: TriHUG October: Apache Ranger

Typical Flow – Add Authorization through Apache Ranger

[Diagram: Beeline Client, Active Directory, KDC, Ranger, HiveServer 2, HDFS (A, B, C)]

•  Log in to Hive using AD password
•  Import users/groups from LDAP
•  Column level access control, auditing
•  Hive gets Namenode (NN) service ticket
•  File level access control

Page 10: TriHUG October: Apache Ranger

Typical Flow – Firewall, Route through Knox Gateway

[Diagram: Beeline Client, Apache Knox, Active Directory, KDC, Ranger, HiveServer 2, HDFS (A, B, C)]

•  Original request w/user id/password
•  Knox gets service ticket for Hive
•  Knox runs as proxy user using Hive ST
•  Use Hive ST, submit query
•  Hive gets Namenode (NN) service ticket
•  Hive creates MapReduce job using NN ST
•  Client gets query result

Page 11: TriHUG October: Apache Ranger

Typical Flow – Add Wire and File Encryption

[Diagram: Beeline Client, Apache Knox, Active Directory, KDC, Ranger, HiveServer 2, HDFS (A, B, C); the connections between components are labelled SSL or SASL]

•  Original request w/user id/password
•  Knox gets service ticket for Hive
•  Knox runs as proxy user using Hive ST
•  Use Hive ST, submit query
•  Hive gets Namenode (NN) service ticket
•  Hive creates MapReduce job using NN ST
•  Client gets query result

Page 12: TriHUG October: Apache Ranger

Apache Ranger

Page 13: TriHUG October: Apache Ranger

Apache Ranger

•  Provides centralized policy definition for authorizing & auditing access to resources in a consistent manner.
•  Supported components as of v0.5
    –  HDFS
    –  HBase
    –  Hive
    –  YARN
    –  Knox
    –  Storm
    –  Solr
    –  Kafka

Page 14: TriHUG October: Apache Ranger

Setup Authorization Policies

[Screenshot: Ranger policy editor]

•  File level access control, flexible definition
•  Control permissions
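
As a rough illustration of what such a file level policy could look like outside the UI, here is a minimal Python sketch that builds a policy document. The field names follow the policy JSON of Ranger's public v2 REST API as best understood here and should be checked against your Ranger version; the service name "cluster_hadoop", the policy name and the helper build_hdfs_policy are made up for this example. Nothing is sent over the network.

```python
import json


def build_hdfs_policy(service, name, path, grants, recursive=True):
    """Return a dict shaped like a Ranger HDFS policy (assumed v2 JSON layout)."""
    return {
        "service": service,          # name of the HDFS service registered in Ranger
        "name": name,
        "isEnabled": True,
        "isAuditEnabled": True,      # audit every access decision made by this policy
        "resources": {
            "path": {"values": [path], "isRecursive": recursive}
        },
        # One policy item per user, listing the access types that user is granted.
        "policyItems": [
            {
                "users": [user],
                "accesses": [{"type": t, "isAllowed": True} for t in access_types],
            }
            for user, access_types in grants.items()
        ],
    }


policy = build_hdfs_policy(
    service="cluster_hadoop",        # hypothetical service name
    name="trihug-demo",              # hypothetical policy name
    path="/demo/data/trihug",
    grants={"sam": ["read"], "tom": ["write"]},
)
print(json.dumps(policy, indent=2))
```

The same document could then be submitted to the Ranger admin server through its REST API instead of being entered in the policy editor.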

Page 15: TriHUG October: Apache Ranger

Monitor through Auditing

Page 16: TriHUG October: Apache Ranger

Apache Ranger authZ Architecture

[Diagram]

•  Agent inside each secured component: HDFS, HBase, Hive, YARN, Knox, Storm, Solr, Kafka
•  Administration Portal (Ranger UI)
•  REST APIs
•  Policy Server
•  Audit Server
•  User Sync Server: LDAP/AD user/group sync
•  KMS
•  Stores shown: DB, SOLR, HDFS, Log4j

Page 17: TriHUG October: Apache Ranger

Sample Simplified Workflow - HDFS

[Diagram: Ranger (Administration Portal, Policy Server, Policy Store, Audit Server, Audit Store) and Hadoop Cluster (Namenode with Agent)]

•  No Policy defined.
•  Unauthorized user attempts to access the data
•  User access is denied

Page 18: TriHUG October: Apache Ranger

Sample Simplified Workflow - HDFS

[Diagram: Ranger (Administration Portal, Policy Server, Policy Store, Audit Server, Audit Store) and Hadoop Cluster (Namenode with Agent)]

•  1a: Admin sets policies for HDFS files/folder
•  1b, 1c, 1d: unlabelled arrows in the diagram

Page 19: TriHUG October: Apache Ranger

Sample Simplified Workflow - HDFS

[Diagram: Ranger (Administration Portal, Policy Server, Policy Store, Audit Server, Audit Store), Hadoop Cluster (Namenode with Agent) and User Application]

•  1a: Admin sets policies for HDFS files/folder
•  2a: Data scientist runs a map reduce job
•  2a: Analysts access HDFS data through application
•  2a: IT users access HDFS through CLI
•  2b: Namenode uses Agent for Authorization
•  Namenode provides resource access to user/client
•  1b, 1c, 1d, 2c, 2d: unlabelled arrows in the diagram

Page 20: TriHUG October: Apache Ranger

Sample Simplified Workflow - HDFS

[Diagram: Ranger (Administration Portal, Policy Server, Policy Store, Audit Server, Audit Store), Hadoop Cluster (Namenode with Agent) and User Application]

•  1a: Admin sets policies for HDFS files/folder
•  2a: Data scientist runs a map reduce job
•  2a: Analysts access HDFS data through application
•  2a: IT users access HDFS through CLI
•  2b: Namenode uses Agent for Authorization
•  Namenode provides resource access to user/client
•  3a: Admin requests the Audit report
•  1b, 1c, 1d, 2c, 2d, 3b, 3c: unlabelled arrows in the diagram
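
The workflow on the last four slides can be sketched as a toy model in Python. This is not Ranger code: the class names PolicyStore and Agent, the exact-path matching and the in-memory audit list are all simplifications invented for this example. It only mirrors the three phases shown: policies are set centrally, the agent answers authorization requests from its local copy of the policies, and every decision is recorded for later audit reports.

```python
from dataclasses import dataclass, field


@dataclass
class PolicyStore:
    """Stands in for the Policy Server and Policy Store (step 1)."""
    grants: dict = field(default_factory=dict)  # path -> {user: set of access types}

    def set_policy(self, path, user, access_types):
        self.grants.setdefault(path, {}).setdefault(user, set()).update(access_types)


@dataclass
class Agent:
    """Stands in for the agent running inside the Namenode (step 2)."""
    store: PolicyStore
    cache: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)  # stands in for the Audit Store

    def refresh(self):
        # Pull a private copy of the policies so decisions are made locally.
        self.cache = {
            path: {user: set(types) for user, types in users.items()}
            for path, users in self.store.grants.items()
        }

    def is_access_allowed(self, user, path, access_type):
        # Deny unless a cached policy grants this access type to this user.
        allowed = access_type in self.cache.get(path, {}).get(user, set())
        self.audit_log.append(
            {"user": user, "path": path, "access": access_type, "allowed": allowed}
        )
        return allowed


store = PolicyStore()
agent = Agent(store)

# No policy defined yet: the request is denied.
before = agent.is_access_allowed("sam", "/demo/data/trihug", "read")

# Admin sets a policy, the agent picks it up, and the same request succeeds.
store.set_policy("/demo/data/trihug", "sam", {"read"})
agent.refresh()
after = agent.is_access_allowed("sam", "/demo/data/trihug", "read")
write_attempt = agent.is_access_allowed("sam", "/demo/data/trihug", "write")

# Step 3: an audit report is just a query over the recorded decisions.
denied_events = [event for event in agent.audit_log if not event["allowed"]]
print(before, after, write_attempt, len(denied_events))
```

Keeping a local copy of the policies means the agent does not need a round trip to the policy server for each request, at the cost of a short delay before new policies take effect.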

Page 21: TriHUG October: Apache Ranger

Ranger UserSync Best Practice

•  Ensure LDAPS is used to integrate with Ranger
•  Create an OU ONLY for Hadoop users, for performance
•  Only run usersync when necessary
    –  How many users are being added, and how often
    –  How many users are changing roles
    –  Too much syncing can degrade LDAP performance
•  Do not sync anonymously

Page 22: TriHUG October: Apache Ranger

Ranger Audit Best Practices

•  HDFS
    –  Long term storage that can be used to understand user event trends and predict anomalies
•  RDBMS
    –  When SQL is preferred by auditors
    –  MySQL, Oracle, Postgres, SQL Server
•  Solr
    –  Nice quick reporting metrics to understand user event trends
•  Log4j Appenders

Page 23: TriHUG October: Apache Ranger

Ranger Stacks

•  Apache Ranger v0.5 supports a stack model to enable easier onboarding of new components, without requiring code changes in Apache Ranger.

Ranger Side Changes: Define Service-type

•  Create a JSON file with the following details:
    –  Resources
    –  Access types
    –  Config to connect
•  Load the JSON into Ranger.

Secured Components Side Changes: Develop Ranger Authorization Plugin

•  Include plugin library in the secure component.
•  During initialization of the service: Init RangerBasePlugin & RangerDefaultAuditHandler class.
•  To authorize access to a resource: Use RangerBasePlugin.isAccessAllowed() with a RangerAccessRequest
•  To support resource lookup: Implement RangerBaseService.lookupResource() & RangerBaseService.validateConfig()

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=53741207
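
To make the three parts of a service-type definition concrete, here is a minimal Python sketch that assembles one and serializes it to JSON. The top-level keys and per-item fields follow the service definition files shipped with Ranger as best recalled here and should be verified against the wiki page above; the service name "myservice", the class com.example.ranger.MyRangerService, the config entries and the helper missing_sections are hypothetical.

```python
import json

service_def = {
    "name": "myservice",                                # hypothetical service-type name
    "implClass": "com.example.ranger.MyRangerService",  # hypothetical RangerBaseService subclass
    "label": "My Service",
    "description": "Example service-type for a new component",
    # Resources: what can be protected.
    "resources": [
        {
            "itemId": 1,
            "name": "path",
            "type": "path",
            "level": 10,
            "mandatory": True,
            "lookupSupported": True,
            "recursiveSupported": True,
            "label": "Resource Path",
        }
    ],
    # Access types: what can be granted on a resource.
    "accessTypes": [
        {"itemId": 1, "name": "read", "label": "Read"},
        {"itemId": 2, "name": "write", "label": "Write"},
    ],
    # Config to connect: what Ranger needs in order to reach the component.
    "configs": [
        {"itemId": 1, "name": "url", "type": "string", "mandatory": True, "label": "Service URL"},
        {"itemId": 2, "name": "username", "type": "string", "mandatory": True, "label": "Username"},
        {"itemId": 3, "name": "password", "type": "password", "mandatory": True, "label": "Password"},
    ],
}


def missing_sections(definition):
    """Return the names of required sections that are absent or empty."""
    required = ("resources", "accessTypes", "configs")
    return [section for section in required if not definition.get(section)]


problems = missing_sections(service_def)
service_def_json = json.dumps(service_def, indent=2)
print(problems)
print(service_def_json)
```

The resulting JSON file is what would be loaded into Ranger to register the new service-type.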

Page 24: TriHUG October: Apache Ranger

Summary & Misc. points

•  All functions are available as REST API
•  Ranger integrates with AD/LDAP for Ranger login as well as user sync.
•  Support for High Availability (HA)
•  Support for Transparent Data Encryption with KMS implementation
•  Tighter integration with Apache Ambari
•  Stack based implementation of Plugins
•  Ranger also has the KMS for HDFS TDE.
•  Some features in development are
    –  Spark support
    –  Time based authorization
    –  Geo Location based authorization

Page 25: TriHUG October: Apache Ranger

Demo - HDFS

[Diagram: Admin (Elevated Privileges), Sam and Tom (Restricted Privileges), Ranger UI, Ranger Plugin]

•  Directory /demo/data/trihug already exists, owned by hdfs:hdfs with permissions rwx --- ---
•  1: Sam and Tom access /demo/data/trihug: READ Access denied, WRITE Access denied
•  2: Admin grants access through the Ranger UI: READ for Sam, WRITE for Tom; the Ranger Plugin gets the update
•  3: Sam and Tom access /demo/data/trihug again: READ Access allowed, WRITE Access allowed; WRITE Access denied where it was not granted

Page 26: TriHUG October: Apache Ranger

Demo - Hive

[Diagram: Admin (Elevated Privileges), Sam and Tom (Restricted Privileges), Ranger UI, Ranger Plugin, hive tables tickers and eod]

•  DB already exists; tables tickers and eod were created by the "hive" user in the trihug schema
•  1: Sam and Tom access the hive tables: READ Access denied, WRITE Access denied
•  2: Admin grants access through the Ranger UI: READ for Sam (SOME COLUMNS), ALL for Tom; the Ranger Plugin gets the update
•  3: READ Access to SOME COLUMNS allowed, READ Access to ALL COLUMNS denied, WRITE Access denied; WRITE Access allowed, GRANT Access allowed
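
The column level behaviour in this demo can be sketched in a few lines of Python. This is a toy model rather than the Ranger Hive plugin: the column names "symbol" and "name", the access type names and the helper is_allowed are invented for illustration; only the users and the tickers table come from the slide.

```python
ALL_COLUMNS = "*"

# (user, table) -> {access type: set of columns the grant covers}
grants = {
    ("sam", "tickers"): {"select": {"symbol", "name"}},  # READ on SOME COLUMNS
    ("tom", "tickers"): {"select": {ALL_COLUMNS}, "update": {ALL_COLUMNS}},
}


def is_allowed(user, table, access_type, columns):
    """Allow only if every requested column is covered by the user's grant."""
    granted = grants.get((user, table), {}).get(access_type, set())
    if ALL_COLUMNS in granted:
        return True
    return bool(columns) and set(columns) <= granted


some_columns = is_allowed("sam", "tickers", "select", ["symbol"])
all_columns = is_allowed("sam", "tickers", "select", [ALL_COLUMNS])
sam_write = is_allowed("sam", "tickers", "update", ["symbol"])
tom_write = is_allowed("tom", "tickers", "update", [ALL_COLUMNS])
print(some_columns, all_columns, sam_write, tom_write)
```

A query that touches even one column outside the grant is rejected as a whole, which matches the "READ Access to ALL COLUMNS denied" outcome on the slide.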

Page 27: TriHUG October: Apache Ranger

Demo time..

Page 28: TriHUG October: Apache Ranger

Thank you. Questions?