SlideShare a Scribd company logo
1 of 45
© Hortonworks Inc. 2014
Securing Hadoop’s REST APIs
Apache Knox Gateway
Hadoop Summit 2014
Kevin Minder
Larry McCayhttp://knox.apache.org/
user (at) knox.apache.org
dev (at) knox.apache.org
© Hortonworks Inc. 2014
Agenda
• Introduction
• The What, Why and When of Apache Knox
• Hadoop Context
• Basic Knox operation and extensibility
• How Knox
• Enhances security
• Simplifies access
• Centralizes control
• Integrates with the enterprise
• What is next for Knox
• Q & A
© Hortonworks Inc. 2014
Introductions
Kevin Minder
Middleware &
WebServices
Hortonworks
Oracle
HP
Bluestone
Larry McCay
Middleware &
Security
Hortonworks
Oracle
Probaris
HP
Bluestone
Tony Soprano
Barone Sanitation
Bada Bing
Crime Boss
Pauly D
Jersey Shore House Member
Disk Jockey
Jersey really
isn’t like this!
Mostly…
Just your “normal”
Hadoop security
guys.
© Hortonworks Inc. 2014
What is Apache Knox?
• The Apache Knox Gateway is…
• an extensible reverse proxy framework
• for securely exposing REST APIs and HTTP based services at a
perimeter
• out of the box it provides:
• support for several of the most common Hadoop services
• integration with enterprise authentication systems
• several other useful features
© Hortonworks Inc. 2014
What the Apache Knox Gateway isn’t
• Not an alternative to Kerberos for strong Hadoop core authentication
• Not a channel for high volume data ingest or export
© Hortonworks Inc. 2014
History and Status of the Apache Knox Gateway?
• 2013-02: Accepted into Apache Incubator
• 2013-04: Released 0.2.0
• 2013-10: Released 0.3.0
• 2014-02: Graduated to Apache TLP
• 2014-04: Released 0.4.0, Included in HDP 2.1
© Hortonworks Inc. 2014
Why Knox?
Simplified Access
• Kerberos encapsulation
• Extends API reach
• Single access point
• Multi-cluster support
• Single SSL certificate
Centralized Control
• Central REST API auditing
• Service-level authorization
• Alternative to SSH “edge node”
Enterprise Integration
• LDAP integration
• Active Directory integration
• SSO integration
• Apache Shiro extensibility
• Custom extensibility
Enhanced Security
• Protect network details
• Partial SSL for non-SSL services
• WebApp vulnerability filter
© Hortonworks Inc. 2014
Layers Of Hadoop Security
Perimeter Level Security
• Network Security (i.e. Firewalls)
• Apache Knox (i.e. Gateways)
Authentication
• Kerberos
• Delegation Tokens
OS Security
• File Permissions
• Process Isolation
Authorization
• MR ACLs
• HDFS Permissions
• HDFS ACLs
• HiveATZ-NG
• HBase ACLs
• Accumulo Label Security
• XA Security Policies
Data Protection
• Transport
• Storage
© Hortonworks Inc. 2014
REST API
Hadoop
Services
What does Perimeter Security really mean?
Gateway
REST API
Firewall
User
Firewall
required at
perimeter
(today)
Knox Gateway
controls all
Hadoop REST
API access
through firewall
Hadoop
cluster
mostly
unaffected
Firewall only
allows
connections
through specific
ports from Knox
host
© Hortonworks Inc. 2014
What REST APIs does Hadoop support?
Service URL Example
WebHDFS http://localhost:50070/webhdfs
WebHCat (aka Templeton) http://localhost:50111/templeton
Oozie http://localhost:11000/oozie
HBase (via Stargate) http://localhost:60080
Hive (HiveServer2) http://localhost:10001/cliservice
jdbc:hive2://localhost:10001/?hive.server2.transport.mode=http;hive.server2.thrif
t.http.path=cliservice
© Hortonworks Inc. 2014
Basic Knox Operation & Extensibility
© Hortonworks Inc. 2014
Authentication and Identity Propagation
1. REST API Request
2. HTTP Basic Auth Challenge
kminder:secret
3. Authenticate kminder:secret
knox
keytab
4. Authenticates as
knox via SPNego
(i.e. Kerberos)
5. REST API Request
doAs kminder
0. Configure
knox user to be
known as
trusted proxy
LDAP
© Hortonworks Inc. 2014
Scalability and Fault Tolerance
Hadoop
Apache HTTPD+mod_proxy_balancer
f5 BIG-IP
HAProxy
Knox Cluster
(no shared state)
Really any
traditional
web tier
load balancer
© Hortonworks Inc. 2014
Extensibility: Providers and Services
• Both are dynamically discovered on the class path via Java’s ServiceLoader
• Providers
• Add new features to the gateway that can be used by Services
• Typically result in one or more filters being added to one or more chains
• Services
• Add new endpoints to the gateway to expose a specific service
• Assemble filter chains to enable specific features via providers
• Includes providing configuration to providers
• For example URL rewrite rules
• Associates endpoints with filter chains
© Hortonworks Inc. 2014
Topology Files
• Describe the services that should be exposed for a specific cluster
• Found in <GATEWAY_HOME>/conf/topologies
• Name of topology file dictates URL component
• sandbox.xml -> http://localhost:8443/gateway/sandbox/webhdfs/…
<topology>
<gateway>
<provider>
<role>authentication</role>
<name>custom</name>
</provider>
</gateway>
<service>
<role>WEBHDFS</role>
<url>http://localhost:50070</url>
</service>
</topology>
Location of
WebHDFS in
target cluster
Selects an
authentication
provider
implementation
© Hortonworks Inc. 2014
Enhanced Security
© Hortonworks Inc. 2014
Protect Network Details: WebHDFS Example
• WebHDFS direct
curl -i -X PUT 'http://localhost:50070/webhdfs/v1/user/guest/file1?op=CREATE&user.name=guest’
HTTP/1.1 307 TEMPORARY_REDIRECT
Location:
http://sandbox.hortonworks.com:50075/webhdfs/v1/user/guest/file1?op=CREATE&user.name=guest&namenoderp
caddress=sandbox.hortonworks.com:8020&overwrite=false
• WebHDFS via Knox
curl -u guest:guest-password -i -k -X PUT 'https://localhost:8443/webhdfs/v1/user/guest/file2?op=CREATE’
HTTP/1.1 307 Temporary Redirect
Location:
https://localhost:8443/gateway/sandbox/webhdfs/data/v1/webhdfs/v1/user/guest/file2?_=AAAACAAAABAAAACAg
UDT7-QQZlpkcm09lxrxI0Bgo9d-
Egghp_qxmd4pQsmm3zvYc3M_LrDBQpMBNA48DnMS9QOhyzywCMl1WAShyX4RUETPjEcZa6x9Jwz7TMANj
SRKMR6F3rKf93ME-VsI2Phe8CX72L6oiI778--8F9DQCO8LHFHzLL70iB13Hm2BLyj-x9p3tn7FOHxkbPl5d-
eHxVop7Dk
RPC and
HTTP address
of DataNode is
leaked
unnecessarily
to REST client
Encrypted query param contains
dispatch information used by gateway
when redirect followed
© Hortonworks Inc. 2014
Protect Network Details: Oozie Example
• Oozie direct
<configuration>
<property>
<name>oozie.wf.application.path</name>
<value>hdfs://foo:9000/user/bansalm/myapp/</value>
</property>
...
</configuration>
• Oozie via Knox
<configuration>
<property>
<name>oozie.wf.application.path</name>
<value>/user/bansalm/myapp/</value>
</property>
...
</configuration>
• Example of submitting an Oozie job from Apache docs
• https://oozie.apache.org/docs/4.0.1/WebServicesAPI.html
• HTTP POST XML below to /oozie/v1/jobs
REST client
must know
RPC address
of NameNode
© Hortonworks Inc. 2014
Partial SSL for non-SSL enabled services
REST API REST API
WebHCat
DMZ
Desktop
Gateway
HTTPS HTTP
First “hop”
through
public/corp
networks
protected with
SSL
Last “hop”
within
secure
network
non-SSL
© Hortonworks Inc. 2014
WebApp Vulnerability Filter
• The Knox WebAppSec provider allows for the plugin of vulnerability prevention filters
• Cross Site Request Forgery CSRF is currently provided
• Uses common required header technique
• Later releases will include more filters based on standard techniques
<provider
<role>webappsec</role>
<name>WebAppSec</name>
<enabled>true</enabled>
<param><name>csrf.enabled</name><value>true</value></param>
<param><name>csrf.customHeader</name><value>X-XSRF-Header</value></param>
<param><name>csrf.methodsToIgnore</name><value>GET,OPTIONS,HEAD</value></param>
</provider>
© Hortonworks Inc. 2014
Simplified Access
© Hortonworks Inc. 2014
Knox Service URLs vs. direct URLs
Service Direct URL Knox URL
WebHDFS http://namenode-host:50070/webhdfs https://knox-host:8443/webhdfs
WebHCat http://webhcat-host:50111/templeton https://knox-host:8443/templeton
Oozie http://ooziehost:11000/oozie https://knox-host:8443/oozie
HBase http://hbasehost:60080 https://knox-host:8443/hbase
Hive http://hivehost:10001/cliservice https://knox-host:8443/hive
Masters could
be on many
different hosts
One hosts,
one port
Consistent
paths
© Hortonworks Inc. 2014
Hadoop CLIs need almost full server configs
/etc/hive/conf/hive-site.xml
<property>
<name>hive.server2.thrift.http.port</name>
<value>10001</value>
</property>
<property>
<name>hive.server2.thrift.http.path</name>
<value>cliservice</value>
</property>
/etc/hadoop/conf/core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://sandbox.hortonworks.com:8020</value>
</property>
/etc/hadoop/conf/hdfs-site.xml
<property>
<name>dfs.namenode.http-address</name>
<value>sandbox.hortonworks.com:50070</value>
</property>
/etc/hadoop/conf/yarn-site.xml
<property>
<name>yarn.resourcemanager.address</name>
<value>sandbox.hortonworks.com:8050</value>
</property>
/etc/hive-webhcat/conf/webhcat-site.xml
<property>
<name>templeton.port</name>
<value>50111</value>
</property>
/etc/oozie/conf/oozie-site.xml
<property>
<name>oozie.base.url</name>
<value>http://sandbox.hortonworks.com:11000/oozie</value>
</property>
HBase – Command line
These files
may all be
on different
nodes on
the cluster
too!
© Hortonworks Inc. 2014
Kerberos Encapsulation
1. REST API Request
2. HTTP Basic Auth Challenge
kminder:secret
3. Authenticate kminder:secret
knox
keytab
4. Authenticates as
knox via SPNego
(i.e. Kerberos)
5. REST API Request
doAs kminder
0. Configure
knox as trusted
proxy
The client isn’t
even aware the
cluster is secured
with Kerberos
© Hortonworks Inc. 2014
REST API REST API
Hadoop
REST API Reach: Intranet Access Model
DMZ
Desktop
Gateway
Users will
discover novel
ways to use easily
accessible REST
APIs
© Hortonworks Inc. 2014
HTML/JS REST
Hadoop
REST API Reach: Middleware Access Model
Web Tier / DMZ
Browser
“Give the APIs to the Apps”
GatewayApp
Server
REST
Most enterprises
cannot deal with
Kerberos in the
web tier and don’t
have CLI access
© Hortonworks Inc. 2014
REST API REST API
Hadoop
REST API Reach: Internet Access Model
DMZ
“Give the APIs to the Everyone”
Gateway
Internet
HaaS vendors
are exposing
Hadoop REST
APIs to the
internet. What
does the API tell
these clients to
know about your
cluster?
© Hortonworks Inc. 2014
Multi-Cluster Support
Gateway
http://knox:8443/gateway/green/webhdfs/v1 http://knox:8443/gateway/blue/webhdfs/v1
green
Production
Cluster
blue
Research
Cluster
One hosts,
one port for
many
clusters
© Hortonworks Inc. 2014
Simplified Client Certificate Management
hdfs
cert
hive
cert
hbase
cert
knox
cert
knox
pubkey
hive
pubkey
hbase
pubkey
hdfs
pubkey
• User only needs to trust Knox’s cert
• Admin only needs to manage multiple keys on Knox hosts
© Hortonworks Inc. 2014
Centralized Control
© Hortonworks Inc. 2014
SCP/SSHLogin Hadoop CLIs
Hadoop
SSH Edge Node CLI Access Model
DMZ
Edge Node
Desktop
“Take the Users to the CLI”Limited
auditing on
edge node
CLI too hard
to install on
desktops
© Hortonworks Inc. 2014
REST APILogin REST API
Hadoop
Improved auditing and access control
DMZ
Desktop
Gateway
All activity
audited
consistently
Additional
authorization
control
available
© Hortonworks Inc. 2014
Service Level Authorization
• Control access to services by user, group or IP address
• Resource level authorization should always be done at resource manager (e.g. HDFS)
<provider>
<role>authorization</role>
<name>AclsAuthz</name>
<enabled>true</enabled>
<param>
<name>WEBHDFS.acl</name>
<value>*;admin;127.0.0.1</value>
</param>
</provider>
© Hortonworks Inc. 2014
XA Secure Integration Thoughts
1. REST API Request
0. Distribute
policy
3. REST API Request
Policy Server
Agent
2. Service level
authorization decision
Agent
integrated as
authorization
provider
Policies
authored in
the portal and
distributed by
the policy
server
© Hortonworks Inc. 2014
KNOX-250: SSH Bastion Auditing Functionality
• Community is developing an extension
• Based on Apache MINA SSHD
• Provides administrative Hadoop SSH access via Knox
• Further centralizes auditing of cluster administration
© Hortonworks Inc. 2014
KNOX-250: SSH Bastion Auditing Functionality
SSHLogin Hadoop CLI
Hadoop
DMZ
Desktop
Gateway
All activity
audited
consistently
© Hortonworks Inc. 2014
Enterprise Integration
© Hortonworks Inc. 2014
Apache Shiro Authentication Provider
• Apache Shiro is the primary authentication provider for Knox
• Used for both LDAP and Active Directory
• Apache Shiro is a popular JEE and JSE security framework
• Very modular and flexible architecture
• Many community extensions
• Integrated into Knox as normal authentication provider
© Hortonworks Inc. 2014
Apache Shiro Authentication Provider
<provider>
<role>authentication</role>
<name>ShiroProvider</name>
<enabled>true</enabled>
<param>
<name>main.ldapRealm</name>
<value>org.apache.shiro.realm.ldap.JndiLdapRealm</value>
</param>
<param>
<name>main.ldapRealm.userDnTemplate</name>
<value>uid={0},ou=people,dc=hadoop,dc=apache,dc=org</value>
</param>
<param>
<name>main.ldapRealm.contextFactory.url</name>
<value>ldap://localhost:33389</value>
</param>
<param>
<name>main.ldapRealm.contextFactory.authenticationMechanism</name>
<value>simple</value>
</param>
<param>
<name>urls./**</name>
<value>authcBasic</value>
</param>
</provider>
© Hortonworks Inc. 2014
SSO Integration
• Similar in concept Hadoop’s trusted proxy model
• Preconfigured for SiteMinder use case
• HTTP Headers used to propagate pre-authenticated user and group info
• Only acceptable for use in a tightly controlled network environment
<provider>
<role>federation</role>
<name>HeaderPreAuth</name>
<enabled>true</enabled>
<param>
<name>preauth.validation.method</name>
<value>preauth.ip.validation</value>
</param>
<param>
<name>preauth.ip.addresses</name>
<value>127.0.*</value>
</param>
</provider>
© Hortonworks Inc. 2014
OAuth 2
• OAuth is becoming the defacto standard for communicating a user’s
identity to REST APIs
• It allows for explicit authorization by the user for the application to
access resources
• It has a number of ways to represent the user and authentication
information to go over the wire
• JSON Web Token (JWT) is an emerging standard for representing the
various claims, attributes and scopes of an identity
• Can be used as a bearer token, URL parameter or Header
• OAuth is also gaining popularity as a federation token for SSO
integrations
© Hortonworks Inc. 2014
KNOX-393: OAuth Resource Provider
• Community investigating OAuth Federation Provider extension
• Considering Apache Oltu
• Warning: Diagram dramatically oversimplified
• There are a number of other potential flows
2. REST API Request
Authorization: Bearer <token>
3. validateAccessToken(<token>)
4. Authenticates as
knox via SPNego
(i.e. Kerberos)
5. REST API Request
doAs kminder
0. Configure
knox user to be
known as
trusted proxy
1. requestAccessToken(JWT)
return Bearer token
kminder
© Hortonworks Inc. 2014
What is next for Knox?
Jira Assignee Description
KNOX-393: OAuth Resource Provider for
Middleware and Application Integration
COMMUNITY OAuth 2 federation provider potentially based on Apache
Oltu for external application SSO to Knox and Hadoop
KNOX-355: Support Knox Authentication
Provider based on Hadoop Auth Module
(SPNEGO)
KNOX Team SPNEGO authentication support for Knox clients
KNOX-250: SSH Bastion Auditing Functionality COMMUNITY SSH tunneling and auditing functionality in addition to
REST gateway services.
KNOX-353: Support Hadoop Java Client URLs KNOX Team In order to be used Hadoop CLIs that can use REST, we
need to support the expected URLs. This is in addition to
the extended URLs for multiple Hadoop cluster support
by Knox.
KNOX-242: LDAP Authentication
Enhancements
KNOX Team Search attribute based authentication rather than simple
LDAP bind.
KNOX-74: Support YARN REST API KNOX Team Add support for the YARN REST API
KNOX-66: Support Ambari REST API access
via the Gateway
KNOX Team Add support for the Ambari REST API
TBD TBD What is important to you?
© Hortonworks Inc. 2014
Interested?
• We’re hiring!
• http://hortonworks.com/careers/open-positions/
• Especially hands on platform level development experience with
• Kerberos
• LDAP
• OAuth
• SAML
• JAAS/GSS-API
• Crypto
© Hortonworks Inc. 2014
Questions and Answers

More Related Content

What's hot

Ambari: Agent Registration Flow
Ambari: Agent Registration FlowAmbari: Agent Registration Flow
Ambari: Agent Registration Flow
Hortonworks
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
DataWorks Summit
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
Dvir Volk
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
DataWorks Summit
 

What's hot (20)

Managing your Hadoop Clusters with Apache Ambari
Managing your Hadoop Clusters with Apache AmbariManaging your Hadoop Clusters with Apache Ambari
Managing your Hadoop Clusters with Apache Ambari
 
Ambari: Agent Registration Flow
Ambari: Agent Registration FlowAmbari: Agent Registration Flow
Ambari: Agent Registration Flow
 
Manage Add-On Services with Apache Ambari
Manage Add-On Services with Apache AmbariManage Add-On Services with Apache Ambari
Manage Add-On Services with Apache Ambari
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
 
Introduction to Vault
Introduction to VaultIntroduction to Vault
Introduction to Vault
 
[211] HBase 기반 검색 데이터 저장소 (공개용)
[211] HBase 기반 검색 데이터 저장소 (공개용)[211] HBase 기반 검색 데이터 저장소 (공개용)
[211] HBase 기반 검색 데이터 저장소 (공개용)
 
Ceph issue 해결 사례
Ceph issue 해결 사례Ceph issue 해결 사례
Ceph issue 해결 사례
 
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise Users
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise UsersApache Knox Gateway "Single Sign On" expands the reach of the Enterprise Users
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise Users
 
The Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersThe Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and Containers
 
Introduction to memcached
Introduction to memcachedIntroduction to memcached
Introduction to memcached
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query ProcessingApache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
Intro to HBase
Intro to HBaseIntro to HBase
Intro to HBase
 
Kafka Security
Kafka SecurityKafka Security
Kafka Security
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
 
[오픈소스컨설팅] 쿠버네티스와 쿠버네티스 on 오픈스택 비교 및 구축 방법
[오픈소스컨설팅] 쿠버네티스와 쿠버네티스 on 오픈스택 비교  및 구축 방법[오픈소스컨설팅] 쿠버네티스와 쿠버네티스 on 오픈스택 비교  및 구축 방법
[오픈소스컨설팅] 쿠버네티스와 쿠버네티스 on 오픈스택 비교 및 구축 방법
 
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
 

Similar to Hadoop REST API Security with Apache Knox Gateway

Hadoop Security Today and Tomorrow
Hadoop Security Today and TomorrowHadoop Security Today and Tomorrow
Hadoop Security Today and Tomorrow
DataWorks Summit
 
Apache Knox - Hadoop Security Swiss Army Knife
Apache Knox - Hadoop Security Swiss Army KnifeApache Knox - Hadoop Security Swiss Army Knife
Apache Knox - Hadoop Security Swiss Army Knife
DataWorks Summit
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
DataWorks Summit
 

Similar to Hadoop REST API Security with Apache Knox Gateway (20)

Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...
Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...
Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...
 
Hadoop Security Today and Tomorrow
Hadoop Security Today and TomorrowHadoop Security Today and Tomorrow
Hadoop Security Today and Tomorrow
 
Apache Kafka Security
Apache Kafka Security Apache Kafka Security
Apache Kafka Security
 
Hdp security overview
Hdp security overview Hdp security overview
Hdp security overview
 
Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...
Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...
Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...
 
TriHUG October: Apache Ranger
TriHUG October: Apache RangerTriHUG October: Apache Ranger
TriHUG October: Apache Ranger
 
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache Knox
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache KnoxFortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache Knox
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache Knox
 
Troubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the BeastTroubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the Beast
 
August 2014 HUG : Comprehensive Security for Hadoop
August 2014 HUG : Comprehensive Security for HadoopAugust 2014 HUG : Comprehensive Security for Hadoop
August 2014 HUG : Comprehensive Security for Hadoop
 
2014 sept 4_hadoop_security
2014 sept 4_hadoop_security2014 sept 4_hadoop_security
2014 sept 4_hadoop_security
 
Hortonworks Technical Workshop: Interactive Query with Apache Hive
Hortonworks Technical Workshop: Interactive Query with Apache Hive Hortonworks Technical Workshop: Interactive Query with Apache Hive
Hortonworks Technical Workshop: Interactive Query with Apache Hive
 
Kafka Security
Kafka SecurityKafka Security
Kafka Security
 
Curb your insecurity with HDP - Tips for a Secure Cluster
Curb your insecurity with HDP - Tips for a Secure ClusterCurb your insecurity with HDP - Tips for a Secure Cluster
Curb your insecurity with HDP - Tips for a Secure Cluster
 
Apache Knox - Hadoop Security Swiss Army Knife
Apache Knox - Hadoop Security Swiss Army KnifeApache Knox - Hadoop Security Swiss Army Knife
Apache Knox - Hadoop Security Swiss Army Knife
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
 
Secure Hadoop clusters on Windows platform
Secure Hadoop clusters on Windows platformSecure Hadoop clusters on Windows platform
Secure Hadoop clusters on Windows platform
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data Warehouse
 
Hadoop and Data Access Security
Hadoop and Data Access SecurityHadoop and Data Access Security
Hadoop and Data Access Security
 
Accumulo Summit 2014: Monitoring Apache Accumulo
Accumulo Summit 2014: Monitoring Apache AccumuloAccumulo Summit 2014: Monitoring Apache Accumulo
Accumulo Summit 2014: Monitoring Apache Accumulo
 

More from DataWorks Summit

HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Recently uploaded (20)

Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 

Hadoop REST API Security with Apache Knox Gateway

  • 1. © Hortonworks Inc. 2014 Securing Hadoop’s REST APIs Apache Knox Gateway Hadoop Summit 2014 Kevin Minder Larry McCayhttp://knox.apache.org/ user (at) knox.apache.org dev (at) knox.apache.org
  • 2. © Hortonworks Inc. 2014 Agenda • Introduction • The What, Why and When of Apache Knox • Hadoop Context • Basic Knox operation and extensibility • How Knox • Enhances security • Simplifies access • Centralizes control • Integrates with the enterprise • What is next for Knox • Q & A
  • 3. © Hortonworks Inc. 2014 Introductions Kevin Minder Middleware & WebServices Hortonworks Oracle HP Bluestone Larry McCay Middleware & Security Hortonworks Oracle Probaris HP Bluestone Tony Soprano Barone Sanitation Bada Bing Crime Boss Pauly D Jersey Shore House Member Disk Jockey Jersey really isn’t like this! Mostly… Just your “normal” Hadoop security guys.
  • 4. © Hortonworks Inc. 2014 What is Apache Knox? • The Apache Knox Gateway is… • an extensible reverse proxy framework • for securely exposing REST APIs and HTTP based services at a perimeter • out of the box it provides: • support for several of the most common Hadoop services • integration with enterprise authentication systems • several other useful features
  • 5. © Hortonworks Inc. 2014 What the Apache Knox Gateway isn’t • Not an alternative to Kerberos for strong Hadoop core authentication • Not a channel for high volume data ingest or export
  • 6. © Hortonworks Inc. 2014 History and Status of the Apache Knox Gateway? • 2013-02: Accepted into Apache Incubator • 2013-04: Released 0.2.0 • 2013-10: Released 0.3.0 • 2014-02: Graduated to Apache TLP • 2014-04: Released 0.4.0, Included in HDP 2.1
  • 7. © Hortonworks Inc. 2014 Why Knox? Simplified Access • Kerberos encapsulation • Extends API reach • Single access point • Multi-cluster support • Single SSL certificate Centralized Control • Central REST API auditing • Service-level authorization • Alternative to SSH “edge node” Enterprise Integration • LDAP integration • Active Directory integration • SSO integration • Apache Shiro extensibility • Custom extensibility Enhanced Security • Protect network details • Partial SSL for non-SSL services • WebApp vulnerability filter
  • 8. © Hortonworks Inc. 2014 Layers Of Hadoop Security Perimeter Level Security • Network Security (i.e. Firewalls) • Apache Knox (i.e. Gateways) Authentication • Kerberos • Delegation Tokens OS Security • File Permissions • Process Isolation Authorization • MR ACLs • HDFS Permissions • HDFS ACLs • HiveATZ-NG • HBase ACLs • Accumulo Label Security • XA Security Policies Data Protection • Transport • Storage
  • 9. © Hortonworks Inc. 2014 REST API Hadoop Services What does Perimeter Security really mean? Gateway REST API Firewall User Firewall required at perimeter (today) Knox Gateway controls all Hadoop REST API access through firewall Hadoop cluster mostly unaffected Firewall only allows connections through specific ports from Knox host
  • 10. © Hortonworks Inc. 2014 What REST APIs does Hadoop support? Service URL Example WebHDFS http://localhost:50070/webhdfs WebHCat (aka Templeton) http://localhost:50111/templeton Oozie http://localhost:11000/oozie HBase (via Stargate) http://localhost:60080 Hive (HiveServer2) http://localhost:10001/cliservice jdbc:hive2://localhost:10001/?hive.server2.transport.mode=http;hive.server2.thrif t.http.path=cliservice
  • 11. © Hortonworks Inc. 2014 Basic Knox Operation & Extensibility
  • 12. © Hortonworks Inc. 2014 Authentication and Identity Propagation 1. REST API Request 2. HTTP Basic Auth Challenge kminder:secret 3. Authenticate kminder:secret knox keytab 4. Authenticates as knox via SPNego (i.e. Kerberos) 5. REST API Request doAs kminder 0. Configure knox user to be known as trusted proxy LDAP
  • 13. © Hortonworks Inc. 2014 Scalability and Fault Tolerance Hadoop Apache HTTPD+mod_proxy_balancer f5 BIG-IP HAProxy Knox Cluster (no shared state) Really any traditional web tier load balancer
  • 14. © Hortonworks Inc. 2014 Extensibility: Providers and Services • Both are dynamically discovered on the class path via Java’s ServiceLoader • Providers • Add new features to the gateway that can be used by Services • Typically result in one or more filters being added to one or more chains • Services • Add new endpoints to the gateway to expose a specific service • Assemble filter chains to enable specific features via providers • Includes providing configuration to providers • For example URL rewrite rules • Associates endpoints with filter chains
  • 15. © Hortonworks Inc. 2014 Topology Files • Describe the services that should be exposed for a specific cluster • Found in <GATEWAY_HOME>/conf/topologies • Name of topology file dictates URL component • sandbox.xml -> http://localhost:8443/gateway/sandbox/webhdfs/… <topology> <gateway> <provider> <role>authentication</role> <name>custom</name> </provider> </gateway> <service> <role>WEBHDFS</role> <url>http://localhost:50070</url> </service> </topology> Location of WebHDFS in target cluster Selects an authentication provider implementation
  • 16. © Hortonworks Inc. 2014 Enhanced Security
  • 17. © Hortonworks Inc. 2014 Protect Network Details: WebHDFS Example • WebHDFS direct curl -i -X PUT 'http://localhost:50070/webhdfs/v1/user/guest/file1?op=CREATE&user.name=guest’ HTTP/1.1 307 TEMPORARY_REDIRECT Location: http://sandbox.hortonworks.com:50075/webhdfs/v1/user/guest/file1?op=CREATE&user.name=guest&namenoderp caddress=sandbox.hortonworks.com:8020&overwrite=false • WebHDFS via Knox curl -u guest:guest-password -i -k -X PUT 'https://localhost:8443/webhdfs/v1/user/guest/file2?op=CREATE’ HTTP/1.1 307 Temporary Redirect Location: https://localhost:8443/gateway/sandbox/webhdfs/data/v1/webhdfs/v1/user/guest/file2?_=AAAACAAAABAAAACAg UDT7-QQZlpkcm09lxrxI0Bgo9d- Egghp_qxmd4pQsmm3zvYc3M_LrDBQpMBNA48DnMS9QOhyzywCMl1WAShyX4RUETPjEcZa6x9Jwz7TMANj SRKMR6F3rKf93ME-VsI2Phe8CX72L6oiI778--8F9DQCO8LHFHzLL70iB13Hm2BLyj-x9p3tn7FOHxkbPl5d- eHxVop7Dk RPC and HTTP address of DataNode is leaked unnecessarily to REST client Encrypted query param contains dispatch information used by gateway when redirect followed
  • 18. © Hortonworks Inc. 2014 Protect Network Details: Oozie Example • Oozie direct <configuration> <property> <name>oozie.wf.application.path</name> <value>hdfs://foo:9000/user/bansalm/myapp/</value> </property> ... </configuration> • Oozie via Knox <configuration> <property> <name>oozie.wf.application.path</name> <value>/user/bansalm/myapp/</value> </property> ... </configuration> • Example of submitting an Oozie job from Apache docs • https://oozie.apache.org/docs/4.0.1/WebServicesAPI.html • HTTP POST XML below to /oozie/v1/jobs REST client must know RPC address of NameNode
  • 19. © Hortonworks Inc. 2014 Partial SSL for non-SSL enabled services REST API REST API WebHCat DMZ Desktop Gateway HTTPS HTTP First “hop” through public/corp networks protected with SSL Last “hop” within secure network non-SSL
  • 20. © Hortonworks Inc. 2014 WebApp Vulnerability Filter • The Knox WebAppSec provider allows for the plugin of vulnerability prevention filters • Cross Site Request Forgery CSRF is currently provided • Uses common required header technique • Later releases will include more filters based on standard techniques <provider <role>webappsec</role> <name>WebAppSec</name> <enabled>true</enabled> <param><name>csrf.enabled</name><value>true</value></param> <param><name>csrf.customHeader</name><value>X-XSRF-Header</value></param> <param><name>csrf.methodsToIgnore</name><value>GET,OPTIONS,HEAD</value></param> </provider>
  • 21. © Hortonworks Inc. 2014 Simplified Access
  • 22. © Hortonworks Inc. 2014 Knox Service URLs vs. direct URLs Service Direct URL Knox URL WebHDFS http://namenode-host:50070/webhdfs https://knox-host:8443/webhdfs WebHCat http://webhcat-host:50111/templeton https://knox-host:8443/templeton Oozie http://ooziehost:11000/oozie https://knox-host:8443/oozie HBase http://hbasehost:60080 https://knox-host:8443/hbase Hive http://hivehost:10001/cliservice https://knox-host:8443/hive Masters could be on many different hosts One hosts, one port Consistent paths
  • 23. © Hortonworks Inc. 2014 Hadoop CLIs need almost full server configs /etc/hive/conf/hive-site.xml <property> <name>hive.server2.thrift.http.port</name> <value>10001</value> </property> <property> <name>hive.server2.thrift.http.path</name> <value>cliservice</value> </property> /etc/hadoop/conf/core-site.xml <property> <name>fs.defaultFS</name> <value>hdfs://sandbox.hortonworks.com:8020</value> </property> /etc/hadoop/conf/hdfs-site.xml <property> <name>dfs.namenode.http-address</name> <value>sandbox.hortonworks.com:50070</value> </property> /etc/hadoop/conf/yarn-site.xml <property> <name>yarn.resourcemanager.address</name> <value>sandbox.hortonworks.com:8050</value> </property> /etc/hive-webhcat/conf/webhcat-site.xml <property> <name>templeton.port</name> <value>50111</value> </property> /etc/oozie/conf/oozie-site.xml <property> <name>oozie.base.url</name> <value>http://sandbox.hortonworks.com:11000/oozie</value> </property> HBase – Command line These files may all be on different nodes on the cluster too!
  • 24. © Hortonworks Inc. 2014 Kerberos Encapsulation 1. REST API Request 2. HTTP Basic Auth Challenge kminder:secret 3. Authenticate kminder:secret knox keytab 4. Authenticates as knox via SPNego (i.e. Kerberos) 5. REST API Request doAs kminder 0. Configure knox as trusted proxy The client isn’t even aware the cluster is secured with Kerberos
  • 25. © Hortonworks Inc. 2014 REST API REST API Hadoop REST API Reach: Intranet Access Model DMZ Desktop Gateway Users will discover novel ways to use easily accessible REST APIs
  • 26. © Hortonworks Inc. 2014 HTML/JS REST Hadoop REST API Reach: Middleware Access Model Web Tier / DMZ Browser “Give the APIs to the Apps” GatewayApp Server REST Most enterprises cannot deal with Kerberos in the web tier and don’t have CLI access
  • 27. © Hortonworks Inc. 2014 REST API REST API Hadoop REST API Reach: Internet Access Model DMZ “Give the APIs to the Everyone” Gateway Internet HaaS vendors are exposing Hadoop REST APIs to the internet. What does the API tell these clients to know about your cluster?
  • 28. © Hortonworks Inc. 2014 Multi-Cluster Support Gateway http://knox:8443/gateway/green/webhdfs/v1 http://knox:8443/gateway/blue/webhdfs/v1 green Production Cluster blue Research Cluster One hosts, one port for many clusters
  • 29. © Hortonworks Inc. 2014 Simplified Client Certificate Management hdfs cert hive cert hbase cert knox cert knox pubkey hive pubkey hbase pubkey hdfs pubkey • User only needs to trust Knox’s cert • Admin only needs to manage multiple keys on Knox hosts
  • 30. © Hortonworks Inc. 2014 Centralized Control
  • 31. © Hortonworks Inc. 2014 SCP/SSHLogin Hadoop CLIs Hadoop SSH Edge Node CLI Access Model DMZ Edge Node Desktop “Take the Users to the CLI”Limited auditing on edge node CLI too hard to install on desktops
  • 32. © Hortonworks Inc. 2014 REST APILogin REST API Hadoop Improved auditing and access control DMZ Desktop Gateway All activity audited consistently Additional authorization control available
  • 33. © Hortonworks Inc. 2014 Service Level Authorization • Control access to services by user, group or IP address • Resource level authorization should always be done at resource manager (e.g. HDFS) <provider> <role>authorization</role> <name>AclsAuthz</name> <enabled>true</enabled> <param> <name>WEBHDFS.acl</name> <value>*;admin;127.0.0.1</value> </param> </provider>
  • 34. © Hortonworks Inc. 2014 XA Secure Integration Thoughts 1. REST API Request 0. Distribute policy 3. REST API Request Policy Server Agent 2. Service level authorization decision Agent integrated as authorization provider Policies authored in the portal and distributed by the policy server
  • 35. © Hortonworks Inc. 2014 KNOX-250: SSH Bastion Auditing Functionality • Community is developing an extension • Based on Apache MINA SSHD • Provides administrative Hadoop SSH access via Knox • Further centralizes auditing of cluster administration
  • 36. © Hortonworks Inc. 2014 KNOX-250: SSH Bastion Auditing Functionality SSHLogin Hadoop CLI Hadoop DMZ Desktop Gateway All activity audited consistently
  • 37. © Hortonworks Inc. 2014 Enterprise Integration
  • 38. © Hortonworks Inc. 2014 Apache Shiro Authentication Provider • Apache Shiro is the primary authentication provider for Knox • Used for both LDAP and Active Directory • Apache Shiro is a popular JEE and JSE security framework • Very modular and flexible architecture • Many community extensions • Integrated into Knox as normal authentication provider
  • 39. © Hortonworks Inc. 2014 Apache Shiro Authentication Provider <provider> <role>authentication</role> <name>ShiroProvider</name> <enabled>true</enabled> <param> <name>main.ldapRealm</name> <value>org.apache.shiro.realm.ldap.JndiLdapRealm</value> </param> <param> <name>main.ldapRealm.userDnTemplate</name> <value>uid={0},ou=people,dc=hadoop,dc=apache,dc=org</value> </param> <param> <name>main.ldapRealm.contextFactory.url</name> <value>ldap://localhost:33389</value> </param> <param> <name>main.ldapRealm.contextFactory.authenticationMechanism</name> <value>simple</value> </param> <param> <name>urls./**</name> <value>authcBasic</value> </param> </provider>
  • 40. © Hortonworks Inc. 2014 SSO Integration • Similar in concept Hadoop’s trusted proxy model • Preconfigured for SiteMinder use case • HTTP Headers used to propagate pre-authenticated user and group info • Only acceptable for use in a tightly controlled network environment <provider> <role>federation</role> <name>HeaderPreAuth</name> <enabled>true</enabled> <param> <name>preauth.validation.method</name> <value>preauth.ip.validation</value> </param> <param> <name>preauth.ip.addresses</name> <value>127.0.*</value> </param> </provider>
  • 41. © Hortonworks Inc. 2014 OAuth 2 • OAuth is becoming the defacto standard for communicating a user’s identity to REST APIs • It allows for explicit authorization by the user for the application to access resources • It has a number of ways to represent the user and authentication information to go over the wire • JSON Web Token (JWT) is an emerging standard for representing the various claims, attributes and scopes of an identity • Can be used as a bearer token, URL parameter or Header • OAuth is also gaining popularity as a federation token for SSO integrations
  • 42. © Hortonworks Inc. 2014 KNOX-393: OAuth Resource Provider • Community investigating OAuth Federation Provider extension • Considering Apache Oltu • Warning: Diagram dramatically oversimplified • There are a number of other potential flows 2. REST API Request Authorization: Bearer <token> 3. validateAccessToken(<token>) 4. Authenticates as knox via SPNego (i.e. Kerberos) 5. REST API Request doAs kminder 0. Configure knox user to be known as trusted proxy 1. requestAccessToken(JWT) return Bearer token kminder
  • 43. © Hortonworks Inc. 2014 What is next for Knox? Jira Assignee Description KNOX-393: OAuth Resource Provider for Middleware and Application Integration COMMUNITY OAuth 2 federation provider potentially based on Apache Oltu for external application SSO to Knox and Hadoop KNOX-355: Support Knox Authentication Provider based on Hadoop Auth Module (SPNEGO) KNOX Team SPNEGO authentication support for Knox clients KNOX-250: SSH Bastion Auditing Functionality COMMUNITY SSH tunneling and auditing functionality in addition to REST gateway services. KNOX-353: Support Hadoop Java Client URLs KNOX Team In order to be used Hadoop CLIs that can use REST, we need to support the expected URLs. This is in addition to the extended URLs for multiple Hadoop cluster support by Knox. KNOX-242: LDAP Authentication Enhancements KNOX Team Search attribute based authentication rather than simple LDAP bind. KNOX-74: Support YARN REST API KNOX Team Add support for the YARN REST API KNOX-66: Support Ambari REST API access via the Gateway KNOX Team Add support for the Ambari REST API TBD TBD What is important to you?
  • 44. © Hortonworks Inc. 2014 Interested? • We’re hiring! • http://hortonworks.com/careers/open-positions/ • Especially hands on platform level development experience with • Kerberos • LDAP • OAuth • SAML • JAAS/GSS-API • Crypto
  • 45. © Hortonworks Inc. 2014 Questions and Answers