Deploying enterprise grade security for Hadoop
Brock Noland | Software Engineer, Cloudera
February 27, 2014

Outline
• Introduction
• Hadoop security primer
  • Authentication
  • Authorization
• Security options
  • Default
  • Kerberos with Impersonation
  • Kerberos with Sentry
• Demo
  
Introduction
Tonight's focus is SQL-on-Hadoop
• Vast majority of Hadoop users use Hive or Cloudera Impala
• Data warehouse offload is the most common use case
• Data warehouse offload is a two step process
  1. Automatic transformations moved to Hadoop
  2. Data analysts given query access
  
Data warehouse use case
[Diagram: Online Database, Hadoop, Data Warehouse]
  
Authentication
• Authentication is who you are
• Hadoop models
  • Default - “trusted network”
  • Strong - Kerberos
  
Default Authentication – trusted network
• Default security mechanism
• Hadoop client uses local username
• Used in
  • POCs
  • Startups
  • Demos
  • Pre-prod environments
  
Default Authentication – trusted network
[Diagram: Client Host (User: brock, File: a.txt, Contents: some data) sends to Hadoop]
$ whoami
brock
$ cat a.txt
some data
$ hadoop fs -put a.txt .

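As a rough illustration of why this model only suits a trusted network, the Python sketch below mimics how a client picks its identity under Hadoop's default ("simple") authentication: it reports the local OS username, and the `HADOOP_USER_NAME` environment variable overrides it. The function name `effective_hadoop_user` is hypothetical; this models the behaviour and is not Hadoop code.

```python
def effective_hadoop_user(os_user, env):
    """Identity a client would claim under simple (non-Kerberos) authentication.

    Illustrative model only: the cluster accepts whatever name the client
    sends, and HADOOP_USER_NAME (if set) replaces the local OS username.
    """
    return env.get("HADOOP_USER_NAME") or os_user


# Normal case: the client claims the local username.
print(effective_hadoop_user("brock", {}))
# Anyone with shell access can claim another identity, e.g. the HDFS superuser.
print(effective_hadoop_user("brock", {"HADOOP_USER_NAME": "hdfs"}))
```

Nothing in this flow proves the claim, which is the gap Kerberos closes.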
  
Strong Authentication – Kerberos
• Hadoop is secured with Kerberos
  • Provides mutual authentication
  • Protects against eavesdropping and replay attacks
• Every user and service has a Kerberos “principal”
  • Service: impala/hostname@MYCOMPANY.COM
  • User: brock@MYCOMPANY.COM
• Credentials
  • Service: keytabs
  • User: password

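The two principal forms on this slide follow the usual Kerberos layout of `primary/instance@REALM`, where the instance part is optional. A minimal Python sketch of splitting a principal into those parts follows; the names `Principal` and `parse_principal` are hypothetical helpers for illustration and do not belong to any Kerberos library.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str             # user or service name, e.g. "brock" or "impala"
    instance: Optional[str]  # usually the host for a service; None for users
    realm: str               # e.g. "MYCOMPANY.COM"


def parse_principal(text: str) -> Principal:
    """Split 'primary[/instance]@REALM' into its parts."""
    name, sep, realm = text.rpartition("@")
    if not sep or not name or not realm:
        raise ValueError(f"not a full principal: {text!r}")
    primary, _, instance = name.partition("/")
    return Principal(primary, instance or None, realm)


print(parse_principal("impala/hostname@MYCOMPANY.COM"))
print(parse_principal("brock@MYCOMPANY.COM"))
```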
  
Strong Authentication – Kerberos
[Diagram: Client Host sends <kerberos ticket> and <encrypted data> * to Hadoop]
$ whoami
brock
$ kinit
Password: *******
$ cat a.txt
some data
$ hadoop fs -put a.txt .
* RPC Encryption must be enabled
  
Strong Authentication – Kerberos
• Keytab
  • Encrypted key for servers (similar to a “password”)
  • Generated by server such as MIT Kerberos or Active Directory
  
Strong Authentication – Kerberos
• Impersonation
  • Services such as Hive Server 2 impersonate users
  • Data loaded by “joe” via HS2 is owned by “joe”
  • Oozie jobs submitted by “brock” are run as “brock”

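The effect of impersonation can be sketched as a small Python model: the service authenticates with its own credentials, but the work is attributed to the end user who submitted it. The `Request` type, the `effective_user` function and the service account name "hive" are assumptions made for illustration, not part of Hive or Oozie.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    authenticated_as: str  # account the service logged in with (assumed "hive")
    on_behalf_of: str      # end user who submitted the query or job


def effective_user(request: Request, impersonation: bool) -> str:
    """User that ends up owning the data or running the job."""
    return request.on_behalf_of if impersonation else request.authenticated_as


load = Request(authenticated_as="hive", on_behalf_of="joe")
print(effective_user(load, impersonation=True))   # joe
print(effective_user(load, impersonation=False))  # hive
```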
  
Hive Server 2 and Oozie
[Diagram components: Beeline (Hive CLI), Tableau, JDBC, Hive Server 2 (HS2), Oozie CLI, Control-M, Oozie, Hadoop]
  
Authorization
• HDFS permissions
  • Unix style
  • Read/Write/Execute for Owner/Group/Other
  • Coarse grained
• Other Hadoop components have authorization
  • MapReduce: who can use which job queues
  • HBase table ACLs
  
HDFS Permissions
$ hadoop fs -ls file
-rw-r-----   1 analyst1 analysts   2244 2014-01-19 12:15 file
• Permissions
  • Unix style permissions
  • Read/Write/Execute
  • Owner/Group/Other
• Owner
  • One and only one owner
• Group
  • One and only one group

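To make the owner/group/other rule concrete, here is a minimal Python sketch of the check applied to the listing above. It is a simplified model under stated assumptions: the function `can_access` and the user `analyst2` are hypothetical, and the HDFS superuser and ACLs are ignored. Exactly one permission class applies to a caller: owner first, then group, then other.

```python
def can_access(mode, owner, group, user, user_groups, want):
    """Return True if `user` has permission `want` ('r', 'w' or 'x').

    `mode` is the 9-character permission string without the leading type
    character, e.g. 'rw-r-----' for the file in the listing.
    """
    if len(mode) != 9:
        raise ValueError("expected 9 permission characters, e.g. 'rw-r-----'")
    if want not in ("r", "w", "x"):
        raise ValueError("want must be 'r', 'w' or 'x'")
    # A sticky bit shows as 't' (with execute) or 'T' (without) in the last slot.
    mode = mode[:8] + {"t": "x", "T": "-"}.get(mode[8], mode[8])
    if user == owner:
        bits = mode[0:3]
    elif group in user_groups:
        bits = mode[3:6]
    else:
        bits = mode[6:9]
    return want in bits


# The file from the listing: owner analyst1, group analysts, mode rw-r-----.
print(can_access("rw-r-----", "analyst1", "analysts", "analyst2", ["analysts"], "r"))
print(can_access("rw-r-----", "analyst1", "analysts", "jranalyst1", ["jranalyst1"], "r"))
```

Because a file carries a single group, sharing with a second set of users means changing that group or opening the "other" bits.
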
Back to our use case
• Scenario facts
  • ETL offload is a success
  • Data warehouse is expensive and at capacity
  • Same data is in Hadoop
• Next step
  • End users start using Hadoop to augment the DW
  • Security becomes primary concern
  
End users need to share data
• Unlike automated ETL jobs, end users want to share data with peers
• Must manage HDFS permissions manually
• Each file has a single group
• End result is users set permissions to world readable/writeable
  
Hive: Security holes
CREATE TEMPORARY FUNCTION custom_udf
AS 'com.mycompany.MaliciousClass';

SELECT TRANSFORM(stuff)
USING 'malicious-script.pl'
AS thing1, thing;

CREATE EXTERNAL TABLE external_table(column1 string)
LOCATION '/path/to/any/table';

Hive: Security holes
CREATE TABLE test (c1 string)
ROW FORMAT SERDE 'com.mycompany.MaliciousClass';

FROM (
  FROM t1
  MAP t1.c1
  USING 'malicious-script1.pl'
  CLUSTER BY key) map_output
INSERT OVERWRITE TABLE t2
REDUCE t2.c1
USING 'malicious-script2.pl'
AS c2;

Default: Authorization
• Hive ships with an “advisory” authorization system
  • All users see all databases/tables/columns
  • Does not fix any security holes
  • Users grant themselves permissions
  
Kerberos with impersonation: Sharing data
The user “manager1” wants to share the table “manager1_table” with senior analysts but not junior analysts.
# hadoop fs -ls -R /user/hive/warehouse
drwxr-x--T   - analyst1   analyst1     0 analyst1_table
drwxr-x--T   - jranalyst1 jranalyst1   0 jranalyst1_table
drwxr-x--T   - manager1   manager1     0 manager1_table

Kerberos with impersonation: Sharing data
IT must create a group
# groupadd senioranalysts
Then add the appropriate members to the group
# usermod -G analyst,senioranalysts analyst1
# usermod -G management,analyst,senioranalysts manager1

Kerberos with impersonation: Sharing data
Then “manager1” can manually change the file permissions
$ hadoop fs -chgrp -R senioranalysts …/warehouse/manager1_table
$ hadoop fs -ls /user/hive/warehouse/
Found 3 items
drwxr-x--T   - analyst1   analyst1         0 analyst1_table
drwxr-x--T   - jranalyst1 jranalyst1       0 jranalyst1_table
drwxr-x--T   - manager1   senioranalysts   0 manager1_table

Kerberos with impersonation: Sharing data
Now any senior-level analyst can query the data
$ whoami
analyst1
$ beeline ...
Connected to: Hive (version 0.10.0)
0: jdbc:hive2://localhost:10000/default> select count(*) from manager1_table;
+------------+
| count(*)   |
+------------+
| 47         |
+------------+

Kerberos with impersonation: Sharing data
Junior analysts cannot query the data:
$ whoami
jranalyst1
$ beeline ...
Connected to: Hive (version 0.10.0)
0: jdbc:hive2://localhost:10000/default> select * from manager1_table;
Error: java.io.IOException:
org.apache.hadoop.security.AccessControlException: Permission denied:
user=jranalyst1, access=READ_EXECUTE,
inode="/user/hive/warehouse/manager1_table":manager1:senioranalysts:drwxr-x--T

Kerberos with impersonation: Sharing data
What happens in the real world?

Kerberos with impersonation: Sharing data
Table “manager1_table” is owned by user/group “manager1”
$ hadoop fs -ls /user/hive/warehouse/
Found 3 items
drwxr-x--T   - analyst1   analyst1     0 analyst1_table
drwxr-x--T   - jranalyst1 jranalyst1   0 jranalyst1_table
drwxr-x--T   - manager1   manager1     0 manager1_table

Kerberos with impersonation: Sharing data
User “manager1” makes “manager1_table” world readable/writable
$ hadoop fs -chmod -R 777 /user/hive/warehouse/manager1_table
$ hadoop fs -ls /user/hive/warehouse/
Found 3 items
drwxr-x--T   - analyst1   analyst1     0 analyst1_table
drwxr-x--T   - jranalyst1 jranalyst1   0 jranalyst1_table
drwxrwxrwt   - manager1   manager1     0 manager1_table

Kerberos with impersonation: Summary
• Securing Hive with Kerberos and impersonation makes Hive unusable for DW offload
  • Manual file permission management
  • End state is world writable/readable
  • No ability to restrict access to columns or rows
  • All users see all databases/tables/columns
  
Fine Grained Security: Apache Sentry
Authorization module for Hive, Search, & Impala
• Unlocks Key RBAC Requirements
  • Secure, fine-grained, role-based authorization
  • Multi-tenant administration
• Open Source
  • Apache Incubator project
• Ecosystem Support
  • Apache SOLR, HiveServer2, & Impala 1.1+

Key Benefits of Sentry
• Store Sensitive Data in Hadoop
• Extend Hadoop to More Users
• Comply with Regulations

Key Capabilities of Sentry
• Fine-Grained Authorization
  • Specify security for SERVERS, DATABASES, TABLES & VIEWS
• Role-Based Authorization
  • SELECT privilege on views & tables
  • INSERT privilege on tables
  • ALL privilege on the server, databases, tables & views
  • ALL privilege is needed to create/modify schema
• Multi-Tenant Administration
  • Separate policies for each database/schema
  • Can be maintained by separate admins

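The privilege model on this slide can be sketched as a small Python lookup: groups map to roles, roles hold privileges scoped to a server, database or table, and ALL covers SELECT and INSERT. This is a toy model under stated assumptions, not Sentry's implementation; the role names, the server name `server1` and the database `default` are hypothetical.

```python
from typing import NamedTuple, Optional


class Privilege(NamedTuple):
    server: str
    db: Optional[str]     # None = every database on the server
    table: Optional[str]  # None = every table or view in the database
    action: str           # "SELECT", "INSERT" or "ALL"


def implies(granted, server, db, table, action):
    """True if the granted privilege covers the requested object and action."""
    if granted.server != server:
        return False
    if granted.db is not None and granted.db != db:
        return False
    if granted.table is not None and granted.table != table:
        return False
    return granted.action in ("ALL", action)


def is_authorized(user_groups, group_roles, role_privileges,
                  server, db, table, action):
    """Walk groups -> roles -> privileges and look for one that applies."""
    for group in user_groups:
        for role in group_roles.get(group, ()):
            for privilege in role_privileges.get(role, ()):
                if implies(privilege, server, db, table, action):
                    return True
    return False


# Hypothetical policy: senior analysts may read one table, managers own the database.
group_roles = {"senioranalysts": ["senior_analyst_role"],
               "management": ["manager_role"]}
role_privileges = {
    "senior_analyst_role": [Privilege("server1", "default", "manager1_table", "SELECT")],
    "manager_role": [Privilege("server1", "default", None, "ALL")],
}

print(is_authorized(["analyst", "senioranalysts"], group_roles, role_privileges,
                    "server1", "default", "manager1_table", "SELECT"))
print(is_authorized(["jranalyst1"], group_roles, role_privileges,
                    "server1", "default", "manager1_table", "SELECT"))
```

Unlike HDFS permissions, one table can be shared with any number of groups by attaching the same role, or different roles, to each of them.
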
Sentry Architecture
[Diagram: Impala, HiveServer2 and SOLR sit on a Binding Layer (Impala, Hive, Search, Pig); the Authorization Provider contains the Policy Engine and Policy Provider; policy sources are File (Local FS/HDFS), Database, …]
  
Query Execution Flow
[Diagram: SQL → Parse → Build → Check → Plan → MR Query, with Sentry at the Check step]
• Parse: Validate SQL grammar
• Build: Construct statement tree
• Check: Validate statement objects
  • First check: Authorization
• Plan: Forward to execution planner

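The ordering of these stages can be sketched in Python: the authorization check runs after the statement tree is built and before anything reaches the planner, so a denied query is never planned. Everything here is a toy stand-in (a one-pattern parser, a set of allowed tuples, a string as the "plan"); none of the function names come from Hive, Impala or Sentry.

```python
import re


class AuthorizationError(Exception):
    """Raised when the user lacks the privilege the statement needs."""


def parse(sql):
    """Validate grammar; this toy parser accepts only SELECT ... FROM table."""
    match = re.fullmatch(r"\s*select\s+(.+?)\s+from\s+(\w+)\s*;?\s*", sql, re.IGNORECASE)
    if not match:
        raise SyntaxError("unsupported statement")
    return match.group(1), match.group(2)


def build(parsed):
    """Construct a minimal statement tree."""
    columns, table = parsed
    return {"action": "SELECT", "table": table, "columns": columns}


def check(statement, user, allowed):
    """Authorization is the first check, before any planning happens."""
    if (user, statement["table"], statement["action"]) not in allowed:
        raise AuthorizationError(f"{user} may not {statement['action']} {statement['table']}")
    return statement


def plan(statement):
    """Stand-in for handing the statement to the execution planner."""
    return f"scan {statement['table']}"


def run(sql, user, allowed):
    return plan(check(build(parse(sql)), user, allowed))


allowed = {("analyst1", "manager1_table", "SELECT")}
print(run("select count(*) from manager1_table;", "analyst1", allowed))
```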
  
TriHUG 2/14: Apache Sentry

  • 1. Deploying enterprise-grade security for Hadoop
      Brock Noland | Software Engineer, Cloudera
      February 27, 2014
  • 2. Outline
      • Introduction
      • Hadoop security primer
          • Authentication
          • Authorization
      • Security options
          • Default
          • Kerberos with Impersonation
          • Kerberos with Sentry
      • Demo
  • 3. Introduction
      Tonight's focus is SQL-on-Hadoop
      • Vast majority of Hadoop users use Hive or Cloudera Impala
      • Data warehouse offload is the most common use case
      • Data warehouse offload is a two-step process
          1. Automatic transformations moved to Hadoop
          2. Data analysts given query access
  • 4. Data warehouse use case
      [Diagram: Online Database, Hadoop, Data Warehouse]
  • 5. Outline
      • Introduction
      • Hadoop Security Primer
          • Authentication
          • Authorization
      • Security options
          • Default
          • Kerberos with Impersonation
          • Kerberos with Sentry
      • Demo
  • 6. Authentication
      • Authentication is who you are
      • Hadoop models
          • Default - "trusted network"
          • Strong - Kerberos
  • 7. Default Authentication – trusted network
      • Default security mechanism
      • Hadoop client uses local username
      • Used in
          • POCs
          • Startups
          • Demos
          • Pre-prod environments
  • 8. Default Authentication – trusted network
      [Diagram: Client Host and Hadoop; message: User: brock, File: a.txt, Contents: some data]
          $ whoami
          brock
          $ cat a.txt
          some data
          $ hadoop fs -put a.txt .
  • 9. Strong Authentication – Kerberos
      • Hadoop is secured with Kerberos
          • Provides mutual authentication
          • Protects against eavesdropping and replay attacks
      • Every user and service has a Kerberos "principal"
          • Service: impala/hostname@MYCOMPANY.COM
          • User: brock@MYCOMPANY.COM
      • Credentials
          • Service: keytabs
          • User: password
  • 10. Strong Authentication – Kerberos
      [Diagram: Client Host and Hadoop; message: <kerberos ticket>, <encrypted data> *]
          $ whoami
          brock
          $ kinit
          Password: *******
          $ cat a.txt
          some data
          $ hadoop fs -put a.txt .
      * RPC Encryption must be enabled
  • 11. Strong Authentication – Kerberos
      • Keytab
          • Encrypted key for servers (similar to a "password")
          • Generated by a server such as MIT Kerberos or Active Directory
  • 12. Strong Authentication – Kerberos
      • Impersonation
          • Services such as Hive Server 2 impersonate users
          • Data loaded by "joe" via HS2 is owned by "joe"
          • Oozie jobs submitted by "brock" are run as "brock"
  • 13. Hive Server 2 and Oozie
      [Diagram components: Beeline (Hive CLI), Tableau, JDBC, Hive Server 2 (HS2), Oozie CLI, Control-M, Oozie, Hadoop]
  • 14. Authorization
      • HDFS permissions
          • Unix style
          • Read/Write/Execute for Owner/Group/Other
          • Coarse grained
      • Other Hadoop components have authorization
          • MapReduce: who can use which job queues
          • HBase table ACLs
  • 15. HDFS Permissions
          $ hadoop fs -ls file
          -rw-r----- 1 analyst1 analysts 2244 2014-01-19 12:15 file
      • Permissions
          • Unix style permissions
          • Read/Write/Execute
          • Owner/Group/Other
      • Owner
          • One and only one owner
      • Group
          • One and only one group
  • 16. Back to our use case
      • Scenario facts
          • ETL offload is a success
          • Data warehouse is expensive and at capacity
          • Same data is in Hadoop
      • Next step
          • End users start using Hadoop to augment the DW
          • Security becomes the primary concern
  • 17. End users need to share data
      • Unlike automated ETL jobs, end users want to share data with peers
      • Must manage HDFS permissions manually
      • Each file has a single group
      • End result is users set permissions to world readable/writeable
  • 18. Outline
      • Introduction
      • Hadoop Security Primer
          • Authentication
          • Authorization
      • Security options
          • Default
          • Kerberos with Impersonation
          • Kerberos with Sentry
      • Demo
  • 19. Hive: Security holes
          CREATE TEMPORARY FUNCTION custom_udf AS 'com.mycompany.MaliciousClass';

          SELECT TRANSFORM(stuff) USING 'malicious-script.pl' AS thing1, thing;

          CREATE EXTERNAL TABLE external_table(column1 string) LOCATION '/path/to/any/table';
  • 20. Hive: Security holes
          CREATE TABLE test (c1 string) ROW FORMAT SERDE 'com.mycompany.MaliciousClass';

          FROM (
            FROM t1
            MAP t1.c1 USING 'malicious-script1.pl'
            CLUSTER BY key) map_output
          INSERT OVERWRITE TABLE t2
            REDUCE t2.c1 USING 'malicious-script2.pl' AS c2;
  • 21. Default: Authorization
      • Hive ships with an "advisory" authorization system
          • All users see all databases/tables/columns
          • Does not fix any security holes
          • Users grant themselves permissions
  • 22. Outline
      • Introduction
      • Hadoop Security Primer
          • Authentication
          • Authorization
      • Security options
          • Default
          • Kerberos with Impersonation
          • Kerberos with Sentry
      • Demo
  • 23. Kerberos with impersonation: Sharing data
      The user "manager1" wants to share the table "manager1_table" with senior analysts but not junior analysts.
          # hadoop fs -ls -R /user/hive/warehouse
          drwxr-x--T - analyst1   analyst1   0 analyst1_table
          drwxr-x--T - jranalyst1 jranalyst1 0 jranalyst1_table
          drwxr-x--T - manager1   manager1   0 manager1_table
  • 24. Kerberos with impersonation: Sharing data
      IT must create a group
          # groupadd senioranalysts
      Then add the appropriate members to the group
          # usermod -G analyst,senioranalysts analyst1
          # usermod -G management,analyst,senioranalysts manager1
  • 25. Kerberos with impersonation: Sharing data
      Then "manager1" can manually change the file permissions
          $ hadoop fs -chgrp -R senioranalysts …/warehouse/manager1_table
          $ hadoop fs -ls /user/hive/warehouse/
          Found 3 items
          drwxr-x--T - analyst1   analyst1       0 analyst1_table
          drwxr-x--T - jranalyst1 jranalyst1     0 jranalyst1_table
          drwxr-x--T - manager1   senioranalysts 0 manager1_table
  • 26. Kerberos with impersonation: Sharing data
      Now any senior-level analyst can query the data
          $ whoami
          analyst1
          $ beeline
          ...
          Connected to: Hive (version 0.10.0)
          0: jdbc:hive2://localhost:10000/default> select count(*) from manager1_table;
          +------------+
          | count(*)   |
          +------------+
          | 47         |
          +------------+
  • 27. Kerberos with impersonation: Sharing data
      Junior analysts cannot query the data:
          $ whoami
          jranalyst1
          $ beeline
          ....
          Connected to: Hive (version 0.10.0)
          0: jdbc:hive2://localhost:10000/default> select * from manager1_table;
          Error: java.io.IOException: org.apache.hadoop.security.AccessControlException: Permission denied: user=jranalyst1, access=READ_EXECUTE, inode="/user/hive/warehouse/manager1_table":manager1:senioranalysts:drwxr-x--T
  • 28. Kerberos with impersonation: Sharing data
      What happens in the real world?
  • 29. Kerberos with impersonation: Sharing data
      Table "manager1_table" is owned by user/group "manager1"
          $ hadoop fs -ls /user/hive/warehouse/
          Found 3 items
          drwxr-x--T - analyst1   analyst1   0 analyst1_table
          drwxr-x--T - jranalyst1 jranalyst1 0 jranalyst1_table
          drwxr-x--T - manager1   manager1   0 manager1_table
  • 30. Kerberos with impersonation: Sharing data
      User "manager1" makes "manager1_table" world readable/writable
          $ hadoop fs -chmod -R 777 /user/hive/warehouse/manager1_table
          $ hadoop fs -ls /user/hive/warehouse/
          Found 3 items
          drwxr-x--T - analyst1   analyst1   0 analyst1_table
          drwxr-x--T - jranalyst1 jranalyst1 0 jranalyst1_table
          drwxrwxrwt - manager1   manager1   0 manager1_table
  • 31. Kerberos with impersonation: Summary
      • Securing Hive with Kerberos and impersonation makes Hive unusable for DW offload
          • Manual file permission management
          • End state is world writable/readable
          • No ability to restrict access to columns or rows
          • All users see all databases/tables/columns
  • 32. Outline
      • Introduction
      • Hadoop Security Primer
          • Authentication
          • Authorization
      • Security options
          • Default
          • Kerberos with Impersonation
          • Kerberos with Sentry
      • Demo
  • 33. Fine Grained Security: Apache Sentry
      Authorization module for Hive, Search, & Impala
      • Unlocks Key RBAC Requirements
          Secure, fine-grained, role-based authorization
          Multi-tenant administration
      • Open Source
          Apache Incubator project
      • Ecosystem Support
          Apache SOLR, HiveServer2, & Impala 1.1+
  • 34. Key Benefits of Sentry
      • Store Sensitive Data in Hadoop
      • Extend Hadoop to More Users
      • Comply with Regulations
  • 35. Key Capabilities of Sentry
      • Fine-Grained Authorization
          Specify security for SERVERS, DATABASES, TABLES & VIEWS
      • Role-Based Authorization
          SELECT privilege on views & tables
          INSERT privilege on tables
          ALL privilege on the server, databases, tables & views
          ALL privilege is needed to create/modify schema
      • Multi-Tenant Administration
          Separate policies for each database/schema
          Can be maintained by separate admins
  • 36. Sentry Architecture
      [Diagram labels: Binding Layer (Impala, HiveServer2/Hive, SOLR/Search, Pig); Authorization Provider (Policy Engine, Policy Provider); File (Local FS/HDFS); Database; …]
  • 37. Query Execution Flow
      • Parse: Validate SQL grammar
      • Build: Construct statement tree
      • Check: Validate statement objects
          • First check: Authorization
      • Plan: Forward to execution planner
      [Other diagram labels: SQL, Sentry, MR, Query]
  • 38. Outline
      • Introduction
      • Hadoop Security Primer
          • Authentication
          • Authorization
      • Security options
          • Default
          • Kerberos with Impersonation
          • Kerberos with Sentry
      • Demo
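
The Unix-style owner/group/other model from slides 14 and 15, and the sharing walkthrough on slides 23 to 30, can be sketched in a few lines of Python. This is a simplified illustration, not HDFS code: the function name `can_access` is hypothetical, and the sketch ignores the HDFS superuser, extended ACLs, and the deletion semantics of the sticky bit. It only shows why one group per file makes sharing so coarse.

```python
def can_access(mode, owner, group, user, user_groups, access):
    """Simplified Unix-style check (hypothetical helper, not an HDFS API).

    mode        -- 10-character listing string such as "drwxr-x--T"
    access      -- one of "r", "w", "x"
    Exactly one permission class applies: owner, else group, else other.
    """
    perms = mode[1:10]               # drop the leading file-type character
    if user == owner:
        triad = perms[0:3]
    elif group in user_groups:
        triad = perms[3:6]
    else:
        triad = perms[6:9]
    flag = triad["rwx".index(access)]
    if access == "x":
        # Lowercase s/t: special bit set and execute granted.
        # Uppercase S/T: special bit set but execute NOT granted.
        return flag in ("x", "s", "t")
    return flag == access


# State after slide 25: manager1_table belongs to group "senioranalysts".
shared = ("drwxr-x--T", "manager1", "senioranalysts")
print(can_access(*shared, "analyst1", {"analyst", "senioranalysts"}, "r"))
print(can_access(*shared, "jranalyst1", {"jranalyst1"}, "r"))
```

Under these assumptions the first call grants the senior analyst read access through the group triad, while the junior analyst falls through to the "other" triad (`--T`) and is denied, which matches the `AccessControlException` on slide 27. After `chmod -R 777` (slide 30) the "other" triad becomes `rwt` and every user passes.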
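
The role-based privileges listed on slide 35 (SELECT, INSERT, and ALL, scoped to a server, database, or table) can be modeled roughly as follows. This is a minimal sketch and not Sentry's actual API or policy format: the `Privilege` class, the `is_authorized` function, the group-to-role mapping, and the names `server1`, `senior_analyst_role`, and `admin_role` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Privilege:
    """Hypothetical privilege record; None means 'every object at that level'."""
    action: str                      # "SELECT", "INSERT", or "ALL"
    server: str
    database: Optional[str] = None
    table: Optional[str] = None

    def implies(self, action, server, database, table):
        # ALL covers every action; otherwise the action must match exactly.
        if self.action not in ("ALL", action):
            return False
        if self.server != server:
            return False
        if self.database is not None and self.database != database:
            return False
        if self.table is not None and self.table != table:
            return False
        return True


# Assumed policy: groups map to roles, roles map to privileges.
GROUP_ROLES = {
    "senioranalysts": {"senior_analyst_role"},
    "admins": {"admin_role"},
}
ROLE_PRIVILEGES = {
    "senior_analyst_role": {
        Privilege("SELECT", "server1", "default", "manager1_table"),
    },
    "admin_role": {Privilege("ALL", "server1")},
}


def is_authorized(user_groups, action, server, database, table):
    """Return True if any role held through any group implies the request."""
    for group in user_groups:
        for role in GROUP_ROLES.get(group, ()):
            for privilege in ROLE_PRIVILEGES.get(role, ()):
                if privilege.implies(action, server, database, table):
                    return True
    return False


print(is_authorized({"analyst", "senioranalysts"},
                    "SELECT", "server1", "default", "manager1_table"))
```

Compared with the file-permission approach, a table can be shared here by granting a role to a group, without changing file ownership or opening files to the world. A check like this would sit at the "First check: Authorization" step of the query flow on slide 37, before the statement reaches the planner.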