Deploying enterprise-grade security for Hadoop with Apache Sentry (incubating).
Apache Hive is deployed in the vast majority of Hadoop use cases despite the major practical flaws in its most secure operational mode (Kerberos + User Impersonation).
In this talk we will discuss these flaws and how Apache Sentry addresses them. We will then enable Apache Sentry on an existing cluster. Additional topics will include Hadoop security and Role Based Access Control (RBAC).
1. Deploying enterprise-grade security for Hadoop
Brock Noland | Software Engineer, Cloudera
February 27, 2014
2. Outline
• Introduction
• Hadoop security primer
  • Authentication
  • Authorization
• Security options
  • Default
  • Kerberos with Impersonation
  • Kerberos with Sentry
• Demo
3. Introduction
Tonight's focus is SQL-on-Hadoop
• Vast majority of Hadoop users use Hive or Cloudera Impala
• Data warehouse offload is the most common use case
• Data warehouse offload is a two-step process
  1. Automatic transformations moved to Hadoop
  2. Data analysts given query access
7. Default Authentication – trusted network
• Default security mechanism
• Hadoop client uses local username
• Used in
  • POCs
  • Startups
  • Demos
  • Pre-prod environments
8. Default Authentication – trusted network
[Diagram: Client Host sends "User: brock", "File: a.txt", "Contents: some data" to Hadoop]
$ whoami
brock
$ cat a.txt
some data
$ hadoop fs -put a.txt .
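The trusted-network default shown above can be sketched in a few lines. This is a simplified model rather than Hadoop's own code: `asserted_identity` is a hypothetical helper, and it assumes that under simple (non-Kerberos) authentication the `HADOOP_USER_NAME` environment variable, when set, replaces the local operating-system username. The point is that the cluster accepts whatever name the client presents.

```python
def asserted_identity(local_user, env):
    """Return the username a client would present under simple authentication.

    Hypothetical model: nothing verifies the claim, so the identity is
    whatever the client side says it is.
    """
    # Assumption: HADOOP_USER_NAME, when set, takes precedence over the OS user.
    return env.get("HADOOP_USER_NAME") or local_user


if __name__ == "__main__":
    # The honest case from the slide: the local user is "brock".
    print(asserted_identity("brock", {}))
    # The same person claiming to be another account by setting one variable.
    print(asserted_identity("brock", {"HADOOP_USER_NAME": "hdfs"}))
```

This is why the mode suits POCs and demos but not clusters that hold sensitive data.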
9. Strong Authentication – Kerberos
• Hadoop is secured with Kerberos
  • Provides mutual authentication
  • Protects against eavesdropping and replay attacks
• Every user and service has a Kerberos "principal"
  • Service: impala/hostname@MYCOMPANY.COM
  • User: brock@MYCOMPANY.COM
• Credentials
  • Service: keytabs
  • User: password
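The two principal forms above follow the usual Kerberos layout of `primary[/instance]@REALM`. As a rough illustration, the sketch below splits a principal string into those parts. `Principal` and `parse_principal` are hypothetical names introduced here, and the parser deliberately ignores escaping rules and multi-component names.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str              # user or service name, e.g. "brock" or "impala"
    instance: Optional[str]   # host part for services, None for plain users
    realm: str                # Kerberos realm, e.g. "MYCOMPANY.COM"


def parse_principal(text):
    """Split 'primary[/instance]@REALM' into its components (simplified)."""
    name, sep, realm = text.rpartition("@")
    if not sep or not name or not realm:
        raise ValueError("expected primary[/instance]@REALM, got %r" % text)
    primary, _, instance = name.partition("/")
    return Principal(primary, instance or None, realm)


if __name__ == "__main__":
    print(parse_principal("impala/hostname@MYCOMPANY.COM"))
    print(parse_principal("brock@MYCOMPANY.COM"))
```

A service principal carries the host it runs on, which is why each server needs its own keytab, while a user principal has no instance part.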
10. Strong Authentication – Kerberos
[Diagram: Client Host sends <kerberos ticket> and <encrypted data>* to Hadoop]
$ whoami
brock
$ kinit
Password: *******
$ cat a.txt
some data
$ hadoop fs -put a.txt .
* RPC Encryption must be enabled
11. Strong Authentication – Kerberos
• Keytab
  • Encrypted key for servers (similar to a "password")
  • Generated by a server such as MIT Kerberos or Active Directory
12. Strong Authentication – Kerberos
• Impersonation
  • Services such as Hive Server 2 impersonate users
  • Data loaded by "joe" via HS2 is owned by "joe"
  • Oozie jobs submitted by "brock" are run as "brock"
13. Hive Server 2 and Oozie
[Diagram]
Beeline (Hive CLI), Tableau → JDBC → Hive Server 2 (HS2) → Hadoop
Oozie CLI, Control-M → Oozie → Hadoop
14. Authorization
• HDFS permissions
  • Unix style
  • Read/Write/Execute for Owner/Group/Other
  • Coarse grained
• Other Hadoop components have authorization
  • MapReduce: who can use which job queues
  • HBase: table ACLs
15. HDFS Permissions
$ hadoop fs -ls file
-rw-r----- 1 analyst1 analysts 2244 2014-01-19 12:15 file
• Permissions
  • Unix style permissions
  • Read/Write/Execute
  • Owner/Group/Other
• Owner
  • One and only one owner
• Group
  • One and only one group
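The owner/group/other rules above can be modeled in a few lines. The sketch below is a simplified illustration, not HDFS's implementation: `is_permitted` is a hypothetical function, the second analyst `analyst2` is invented for the example, and superuser bypass and parent-directory traversal checks are left out. Exactly one permission class is consulted per request, and each file has only one group to consult.

```python
def is_permitted(mode, owner, group, user, user_groups, access):
    """Return True if `user` gets `access` ('r', 'w', or 'x') under `mode`.

    `mode` is the nine-character permission string without the leading
    file-type character, for example 'rw-r-----'.
    """
    if len(mode) != 9 or access not in ("r", "w", "x"):
        raise ValueError("expected a 9-character mode and access in r/w/x")
    # Treat the sticky-bit markers as plain execute / no-execute for "other".
    mode = mode.replace("t", "x").replace("T", "-")
    if user == owner:
        bits = mode[0:3]   # owner class applies, even if it is more restrictive
    elif group in user_groups:
        bits = mode[3:6]   # the file's single group
    else:
        bits = mode[6:9]   # everyone else
    return access in bits


if __name__ == "__main__":
    # The file from the listing: -rw-r----- analyst1 analysts
    args = ("rw-r-----", "analyst1", "analysts")
    print(is_permitted(*args, "analyst1", {"analysts"}, "w"))       # owner write
    print(is_permitted(*args, "analyst2", {"analysts"}, "r"))       # group read
    print(is_permitted(*args, "jranalyst1", {"jranalyst1"}, "r"))   # other read
```

Because a file carries a single group, sharing with a second set of users means either creating a new group that merges both sets or widening the "other" bits.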
16. Back to our use case
• Scenario facts
  • ETL offload is a success
  • Data warehouse is expensive and at capacity
  • Same data is in Hadoop
• Next step
  • End users start using Hadoop to augment the DW
  • Security becomes primary concern
17. End users need to share data
• Unlike automated ETL jobs, end users want to share data with peers
• Must manage HDFS permissions manually
• Each file has a single group
• End result is users set permissions to world readable/writable
18. Outline
• Introduction
• Hadoop Security Primer
  • Authentication
  • Authorization
• Security options
  • Default
  • Kerberos with Impersonation
  • Kerberos with Sentry
• Demo
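19. Hive: Security holes
-- Loads an arbitrary Java class as a UDF
CREATE TEMPORARY FUNCTION custom_udf
  AS 'com.mycompany.MaliciousClass';
-- Runs an arbitrary script over the rows
SELECT TRANSFORM(stuff)
  USING 'malicious-script.pl'
  AS thing1, thing;
-- Points a table at any path in the file system
CREATE EXTERNAL TABLE external_table(column1 string)
  LOCATION '/path/to/any/table';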
20. Hive: Security holes
CREATE TABLE test (c1 string)
  ROW FORMAT SERDE 'com.mycompany.MaliciousClass';
FROM (
  FROM t1
  MAP t1.c1
  USING 'malicious-script1.pl'
  CLUSTER BY key) map_output
INSERT OVERWRITE TABLE t2
  REDUCE t2.c1
  USING 'malicious-script2.pl'
  AS c2;
21. Default: Authorization
• Hive ships with an "advisory" authorization system
  • All users see all databases/tables/columns
  • Does not fix any security holes
  • Users grant themselves permissions
22. Outline
• Introduction
• Hadoop Security Primer
  • Authentication
  • Authorization
• Security options
  • Default
  • Kerberos with Impersonation
  • Kerberos with Sentry
• Demo
23. Kerberos with impersonation: Sharing data
The user "manager1" wants to share the table "manager1_table" with senior analysts but not junior analysts.
# hadoop fs -ls -R /user/hive/warehouse
drwxr-x--T   - analyst1   analyst1   0 analyst1_table
drwxr-x--T   - jranalyst1 jranalyst1 0 jranalyst1_table
drwxr-x--T   - manager1   manager1   0 manager1_table
24. Kerberos with impersonation: Sharing data
IT must create a group
# groupadd senioranalysts
Then add the appropriate members to the group
# usermod -G analyst,senioranalysts analyst1
# usermod -G management,analyst,senioranalysts manager1
25. Kerberos with impersonation: Sharing data
Then "manager1" can manually change the file permissions
$ hadoop fs -chgrp -R senioranalysts …/warehouse/manager1_table
$ hadoop fs -ls /user/hive/warehouse/
Found 3 items
drwxr-x--T   - analyst1   analyst1       0 analyst1_table
drwxr-x--T   - jranalyst1 jranalyst1     0 jranalyst1_table
drwxr-x--T   - manager1   senioranalysts 0 manager1_table
26. Kerberos with impersonation: Sharing data
Now any senior-level analyst can query the data
$ whoami
analyst1
$ beeline ...
Connected to: Hive (version 0.10.0)
0: jdbc:hive2://localhost:10000/default> select count(*) from manager1_table;
+------------+
| count(*)   |
+------------+
| 47         |
+------------+
27. Kerberos with impersonation: Sharing data
Junior analysts cannot query the data:
$ whoami
jranalyst1
$ beeline ....
Connected to: Hive (version 0.10.0)
0: jdbc:hive2://localhost:10000/default> select * from manager1_table;
Error: java.io.IOException: org.apache.hadoop.security.AccessControlException: Permission denied: user=jranalyst1, access=READ_EXECUTE, inode="/user/hive/warehouse/manager1_table":manager1:senioranalysts:drwxr-x--T
29. Kerberos with impersonation: Sharing data
Table "manager1_table" is owned by user/group "manager1"
$ hadoop fs -ls /user/hive/warehouse/
Found 3 items
drwxr-x--T   - analyst1   analyst1   0 analyst1_table
drwxr-x--T   - jranalyst1 jranalyst1 0 jranalyst1_table
drwxr-x--T   - manager1   manager1   0 manager1_table
30. Kerberos with impersonation: Sharing data
User "manager1" makes "manager1_table" world readable/writable
$ hadoop fs -chmod -R 777 /user/hive/warehouse/manager1_table
$ hadoop fs -ls /user/hive/warehouse/
Found 3 items
drwxr-x--T   - analyst1   analyst1   0 analyst1_table
drwxr-x--T   - jranalyst1 jranalyst1 0 jranalyst1_table
drwxrwxrwt   - manager1   manager1   0 manager1_table
31. Kerberos with impersonation: Summary
• Securing Hive with Kerberos and impersonation makes Hive unusable for DW offload
  • Manual file permission management
  • End state is world writable/readable
  • No ability to restrict access to columns or rows
  • All users see all databases/tables/columns
32. Outline
• Introduction
• Hadoop Security Primer
  • Authentication
  • Authorization
• Security options
  • Default
  • Kerberos with Impersonation
  • Kerberos with Sentry
• Demo
33. Fine Grained Security: Apache Sentry
Authorization module for Hive, Search, & Impala
• Unlocks Key RBAC Requirements
  • Secure, fine-grained, role-based authorization
  • Multi-tenant administration
• Open Source
  • Apache Incubator project
• Ecosystem Support
  • Apache SOLR, HiveServer2, & Impala 1.1+
34. Key Benefits of Sentry
• Store Sensitive Data in Hadoop
• Extend Hadoop to More Users
• Comply with Regulations
35. Key Capabilities of Sentry
• Fine-Grained Authorization
  • Specify security for SERVERS, DATABASES, TABLES & VIEWS
• Role-Based Authorization
  • SELECT privilege on views & tables
  • INSERT privilege on tables
  • ALL privilege on the server, databases, tables & views
  • ALL privilege is needed to create/modify schema
• Multi-Tenant Administration
  • Separate policies for each database/schema
  • Can be maintained by separate admins
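The role-based model listed above can be illustrated with a small sketch. It is a toy model under stated assumptions, not Sentry's implementation or API: the names `Privilege`, `implies`, and `is_authorized` are hypothetical, as are the role name `senior_analyst_role` and the server name `server1`. It assumes groups map to roles, roles hold privileges, a privilege on a server or database also covers the objects beneath it, and ALL covers SELECT and INSERT. It does not enforce which privileges are valid at which level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Privilege:
    action: str    # "SELECT", "INSERT", or "ALL"
    scope: tuple   # ("server",), ("server", "db"), or ("server", "db", "table")


def implies(granted, action, target):
    """True if the granted privilege covers `action` on the `target` object."""
    covers_scope = target[:len(granted.scope)] == granted.scope
    covers_action = granted.action in ("ALL", action)
    return covers_scope and covers_action


def is_authorized(user_groups, group_roles, role_privileges, action, target):
    """Walk groups -> roles -> privileges and look for one that covers the request."""
    for group in user_groups:
        for role in group_roles.get(group, ()):
            for privilege in role_privileges.get(role, ()):
                if implies(privilege, action, target):
                    return True
    return False


if __name__ == "__main__":
    table = ("server1", "default", "manager1_table")
    group_roles = {"senioranalysts": ["senior_analyst_role"]}
    role_privileges = {"senior_analyst_role": [Privilege("SELECT", table)]}

    # analyst1 is in the senioranalysts group; jranalyst1 is not.
    print(is_authorized({"analyst", "senioranalysts"}, group_roles,
                        role_privileges, "SELECT", table))
    print(is_authorized({"jranalyst1"}, group_roles,
                        role_privileges, "SELECT", table))
```

In this model, sharing "manager1_table" with senior analysts is a single grant to a role, with no change to file ownership or permission bits.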