New In HDP 2.3
© Hortonworks Inc. 2011 – 2015. All Rights Reserved
$(whoami)
Ajay Singh
Director Technical Channels
About Hortonworks
Customer Momentum
•  556 customers (as of August 5, 2015)
•  119 customers added in Q2 2015
•  Publicly traded on NASDAQ: HDP
Hortonworks Data Platform
•  Completely open multi-tenant platform
for any app and any data
•  Consistent enterprise services for security,
operations, and governance
Partner for Customer Success
•  Leader in open-source community, focused
on innovation to meet enterprise needs
•  Unrivaled Hadoop support subscriptions
Founded in 2011
Original 24 architects, developers,
operators of Hadoop from Yahoo!
740+ employees
1350+ ecosystem partners
HDP Is Enterprise Hadoop
Hortonworks Data Platform
YARN: Data Operating System (Cluster Resource Management)
HDFS (Hadoop Distributed File System)
Data access engines running on YARN: Script (Pig), SQL (Hive on Tez), Java/Scala (Cascading on Tez), Stream (Storm), Search (Solr), NoSQL (HBase, Accumulo via Slider), In-Memory (Spark), and other ISV engines.
Enterprise services for batch, interactive and real-time data access:
•  Security – Authentication, Authorization, Accounting, Data Protection; applied to Storage (HDFS), Resources (YARN), Access (Hive, …), Pipeline (Falcon) and Cluster (Knox, Ranger)
•  Operations – Provision, Manage & Monitor (Ambari, Zookeeper); Scheduling (Oozie)
•  Governance – Data Workflow, Lifecycle & Governance (Falcon); data ingest via Sqoop, Flume, Kafka, NFS, WebHDFS
Deployment choice: Linux, Windows, on-premises, cloud.
YARN is the architectural center of HDP. It enables batch, interactive and real-time workloads, provides comprehensive enterprise capabilities, and offers the widest range of deployment options.
Delivered Completely in the OPEN
Hortonworks Data Platform
Components: Hadoop & YARN, Flume, Oozie, Pig, Hive, Tez, Sqoop, Cloudbreak, Ambari, Slider, Kafka, Knox, Solr, Zookeeper, Spark, Falcon, Ranger, HBase, Atlas, Accumulo, Storm, Phoenix — spanning data management, data access, governance & integration, operations and security.
[Slide graphic: component version matrix across HDP 2.0 (Oct 2013), HDP 2.1 (April 2014), HDP 2.2 (Dec 2014) and HDP 2.3 (July 2015) — for example, Hadoop 2.2.0 → 2.4.0 → 2.6.0 → 2.7.1 and Hive 0.12.0 → 0.13.0 → 0.14.0 → 1.2.1.]
Ongoing Innovation in Apache
New Capabilities in Hortonworks Data Platform 2.3
Breakthrough User
Experience
Dramatic Improvement in the User Experience
HDP 2.3 eliminates much of the complexity of administering
Hadoop and improves developer productivity.
Enhanced Security
and Governance
Enhanced Security and Data Governance
HDP 2.3 delivers new encryption of data at rest, and
extends the data governance initiative with Apache™ Atlas.
Proactive Support
Extending the Value of a Hortonworks Subscription
Hortonworks® SmartSense™ adds proactive cluster monitoring,
enhancing Hortonworks’ award-winning support in key areas.
Apache is a trademark of the Apache Software Foundation.
Page 7 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
New In Apache Hadoop
Page 8 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
HDP Core
User Experience
•  Guided Configuration
•  Install/Manage/Monitor
NFS Gateway
•  Customizable Dashboards
•  Files View
•  Capacity Scheduler
Workload Management
•  Non-Exclusive Node
Labels
•  Fair Sharing Policy
•  [TP] Local Disk Isolation
Security
•  HDFS Data Encryption at
Rest
•  YARN Queue ACLs through Ranger
Operations
•  Report on Bad Disks
•  Enhanced Distcp (using
snapshots)
•  Quotas for Storage Tiers
Page 9 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Simplified Configuration Management
Page 10 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Deploy/Manage/Monitor NFS through Ambari
Deploy • Manage • Monitor
Starts ‘portmap’ and ‘nfs’
Page 11 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Detect Bad Disks
Detect “bad” disk volumes on a DataNode
HDFS-7604
Page 12 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Enhanced HDFS Mirroring
Efficiency: copy only the files in the delta, using a snapshot diff to calculate the differential; snapshot diff is faster than a MapReduce-based diff for large directories.
Reliability: create the first snapshot during the initial copy; snapshots ensure that any changes to the source directory during DistCp do not disrupt the mirror.
HDFS-7535
[Diagram: first and second snapshots on the source cluster drive incremental copies to a backup directory on the target cluster.]
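As an illustration only (the directory paths and snapshot names below are hypothetical), a snapshot-based DistCp run looks roughly like this:
# allow snapshots and take the first one during the initial copy
hdfs dfsadmin -allowSnapshot /data/source
hdfs dfs -createSnapshot /data/source s1
hadoop distcp /data/source/.snapshot/s1 hdfs://target:8020/data/backup
# later: take a second snapshot and copy only the delta between s1 and s2
# (the target must also hold snapshot s1 for the diff-based copy to apply)
hdfs dfs -createSnapshot /data/source s2
hadoop distcp -update -diff s1 s2 /data/source hdfs://target:8020/data/backup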
Page 13 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Quota Management By Storage Tiers
[Diagram: an HDP cluster with DISK and ARCHIVE storage tiers. DataSet A (warm): 1 replica on DISK, the others on ARCHIVE. DataSet B (cold): all replicas on ARCHIVE. Available since HDP 2.2.]
Page 14 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
HDFS Quotas: Extending to Tiered Storage
Quota: Number of files for a directory
hdfs dfsadmin -setQuota n <list of directories>
Sets the total number of files that can be stored in each directory.
Quota: Total disk space for a directory
hdfs dfsadmin -setSpaceQuota n <list of directories>
Sets the total disk space that can be used by each directory.
New in HDP 2.3: Quota by Storage Tier
hdfs dfsadmin -setSpaceQuota n [-storageType <type>] <list of directories>
Sets the total disk space of a given storage type that can be used by each directory.
HDFS-7584
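For example (the directory name and sizes are hypothetical), capping how much DISK-tier space a warm dataset may consume:
# cap the dataset at 10 TB overall and at 1 TB on the DISK tier
hdfs dfsadmin -setSpaceQuota 10t /apps/dataset-a
hdfs dfsadmin -setSpaceQuota 1t -storageType DISK /apps/dataset-a
# verify name and space quota usage for the directory
hdfs dfs -count -q /apps/dataset-a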
Page 15 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Node Labels in YARN
Enable configuration of node partitions
Now with HDP 2.3, two options:
Non-exclusive Node Labels
Exclusive Node Labels
Page 16 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Exclusive Node Labels enable Isolated Partitions
[Diagram: nodes labeled for Storm form an isolated partition — only the Storm application (S) is scheduled there, and application B cannot use those nodes. Available since HDP 2.2.]
Page 17 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Non-Exclusive Node Labels
[Diagram: nodes carrying a non-exclusive Spark label prefer the Spark application (S), but application B can be scheduled there whenever free capacity is available.]
YARN-3214
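A minimal sketch of configuring a non-exclusive label (the label name, node names and the exclusive=false form are assumptions based on the YARN node-labels CLI):
# register a non-exclusive partition and attach it to some nodes
yarn rmadmin -addToClusterNodeLabels "spark(exclusive=false)"
yarn rmadmin -replaceLabelsOnNode "node1.example.com=spark node2.example.com=spark"
# confirm the label is registered
yarn cluster --list-node-labels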
Page 18 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Fair Sharing: Pluggable Queue Policies
Choose scheduling policy per leaf queue
FIFO
Application Container requests are accommodated on a first-come, first-served basis
Multi-fair weight
Application Container requests accommodated according to:
•  Order of least resources used – multiple applications make progress
•  (Optional) Size based weight – adjustment to boost large applications making progress
YARN-3319, YARN-3318
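A hedged sketch of the corresponding capacity-scheduler settings (the queue path root.analytics is hypothetical; the property names follow the CapacityScheduler ordering-policy configuration as we understand it):
# capacity-scheduler.xml entries for a fair ordering policy with optional size-based weight
yarn.scheduler.capacity.root.analytics.ordering-policy=fair
yarn.scheduler.capacity.root.analytics.ordering-policy.fair.enable-size-based-weight=true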
Page 19 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
New In Apache Hive
Page 20 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Hive
§  Performance
§  Vectorized Map Joins and other improvements
§  SQL
§  Union
§  Interval types
§  CURRENT_TIMESTAMP, CURRENT_DATE
§  Usability
§  Configurations
§  Hive View
§  Tez View
Page 21 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Vectorized Map Join
SELECT Count(*)
FROM store_sales
JOIN customer_demographics2
ON ss_cdemo_sk = cd_demo_sk
AND cd_demo_sk2 < 96040
AND ss_sold_date_sk BETWEEN 2450815 AND 2451697
SELECT Count(*)
FROM store_sales
LEFT OUTER JOIN customer_demographics2
ON ss_cdemo_sk = cd_demo_sk
AND cd_demo_sk2 < 96040
AND ss_sold_date_sk BETWEEN 2450815 AND 2451697
Map Join is up to 5x faster, making the overall query up to 2x faster in HDP 2.3 over Champlain.
(mapjoin_20.sql means the query had a selectivity of 20, i.e. 20% of the rows end up joining.)
Page 22 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
New SQL Syntax: Union
create table sample_03(name varchar(50), age int, gpa decimal(3, 2));
create table sample_04(name varchar(50), age int, gpa decimal(3, 2));
insert into table sample_03 values
('aaa', 35, 3.00),
('bbb', 32, 3.00),
('ccc', 32, 3.00),
('ddd', 35, 3.00),
('eee', 32, 3.00);
insert into table sample_04 values
('ccc', 32, 3.00),
('ddd', 35, 3.00),
('eee', 32, 3.00),
('fff', 35, 3.00),
('ggg', 32, 3.00);
hive> select * from sample_03 UNION select * from sample_04;
Query ID = ambari-qa_20150526023228_198786c5-5c89-4a38-9246-cbba9b903ab4
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id
application_1432604373833_0002)
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 .......... SUCCEEDED 1 1 0 0 0 0
Map 4 .......... SUCCEEDED 1 1 0 0 0 0
Reducer 3 ...... SUCCEEDED 1 1 0 0 0 0
--------------------------------------------------------------------------------
VERTICES: 03/03 [==========================>>] 100% ELAPSED TIME: 8.48 s
--------------------------------------------------------------------------------
OK
aaa 35 3
bbb 32 3
ccc 32 3
ddd 35 3
eee 32 3
fff 35 3
ggg 32 3
Time taken: 11.208 seconds, Fetched: 7 row(s)
Page 23 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
New SQL Syntax: Interval Type in Expressions
hive> select timestamp '2015-03-08 01:00:00' + interval '1' hour;
OK
2015-03-08 02:00:00
Time taken: 0.136 seconds, Fetched: 1 row(s)
hive> select timestamp '2015-03-08 00:00:00' + interval '23' hour;
OK
2015-03-08 23:00:00
Time taken: 0.057 seconds, Fetched: 1 row(s)
hive> select timestamp '2015-03-08 00:00:00' + interval '24' hour;
OK
2015-03-09 00:00:00
Time taken: 0.149 seconds, Fetched: 1 row(s)
hive> select timestamp '2015-03-08 00:00:00' + interval '1' day;
OK
2015-03-09 00:00:00
Time taken: 0.063 seconds, Fetched: 1 row(s)
hive> select timestamp '2015-02-09 00:00:00' + interval '1' month;
OK
2015-03-09 00:00:00
Time taken: 0.107 seconds, Fetched: 1 row(s)
hive> select current_timestamp - interval '24' hour;
OK
2015-05-25 02:35:13.89
Time taken: 0.181 seconds, Fetched: 1 row(s)
hive> select current_date;
OK
2015-05-26
Time taken: 0.102 seconds, Fetched: 1 row(s)
hive> select current_timestamp;
OK
2015-05-26 02:33:15.428
Time taken: 0.091 seconds, Fetched: 1 row(s)
Page 24 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Not Supported: Interval Type in Tables
hive> CREATE TABLE t1 (c1 INTERVAL YEAR TO MONTH);
NoViableAltException(142@[])
at org.apache.hadoop.hive.ql.parse.HiveParser.type(HiveParser.java:38574)
at org.apache.hadoop.hive.ql.parse.HiveParser.colType(HiveParser.java:38331)
...
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
FAILED: ParseException line 1:20 cannot recognize input near 'INTERVAL' 'YEAR' 'TO' in column
type
hive> CREATE TABLE t1 (c1 INTERVAL DAY(5) TO SECOND(3));
NoViableAltException(142@[])
at org.apache.hadoop.hive.ql.parse.HiveParser.type(HiveParser.java:38574)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
FAILED: ParseException line 1:20 cannot recognize input near 'INTERVAL' 'DAY' '(' in column type
Page 25 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Simplified Configuration Management
Page 26 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
New In Apache HBASE
Page 27 © Hortonworks Inc. 2011 – 2015. All Rights Reserved HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION
HBase and Phoenix in HDP 2.3
Categories: Operations • Scale and Robustness • Developer
HBase
•  Next Generation Ambari UI
•  Customizable Dashboards
•  Supported init.d scripts
•  Improved HMaster Reliability
•  Security: Namespaces, Encryption, Authorization Improvements, Cell-Level Security
•  LOB support
Phoenix
•  Phoenix Slider Support
•  HBase Read HA Support
•  Functional Indexes
•  Query Tracing
•  Phoenix SQL: UNION ALL, UDFs, 7 New Date/Time Functions
•  Spark Driver
•  Phoenix Query Server
Page 28 © Hortonworks Inc. 2011 – 2015. All Rights Reserved HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION
Simplified Configuration Management
Guide configuration and provide recommendations for the most common settings.
Page 29 © Hortonworks Inc. 2011 – 2015. All Rights Reserved HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION
Build Your Own HBase Dashboard
Monitor the metrics that matter to you.
1.  Select a pre-defined visualization.
2.  Choose from more than 1,000 metrics, ranging across HBase, HDFS, MapReduce2 and YARN.
3.  Define custom aggregations for metrics within one component or across components.
Page 30 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Namespaces and Delegated Admin
Namespaces
•  Namespaces are like RDBMS schemas.
•  Introduced in HBase 0.96.
•  Many security gaps until HBase 1.0.
Delegated Administration
•  Goal: Create a namespace and hand it over to a DBA.
•  People in the namespace can’t do anything outside their namespace.
Page 31 © Hortonworks Inc. 2011 – 2015. All Rights Reserved HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION
Security: Namespaces, Tables, Authorizations
Scopes:
•  Authorization scopes: Global -> namespace -> table -> column family -> cell.
Access Levels:
•  Read, Write, Execute, Create, Admin
Page 32 © Hortonworks Inc. 2011 – 2015. All Rights Reserved HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION
Delegated Administration Example
Give a user their own namespace to play in.
•  Step 1: Superuser (e.g. user hbase) creates namespace foo:
•  create_namespace 'foo'
•  Step 2: Admin gives dba-bar full permissions to the namespace:
•  grant 'dba-bar', 'RWXCA', '@foo'
•  Note: namespaces are prefixed by @.
•  Step 3: dba-bar creates tables within the namespace:
•  create 'foo:t1', 'f1'
•  Step 4: dba-bar hands out permissions to the tables:
•  grant 'user-x', 'RWXCA', 'foo:t1'
•  Note: All users will be able to see namespaces and tables within namespaces, but not the data.
Page 33 © Hortonworks Inc. 2011 – 2015. All Rights Reserved HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION
Turning Authorization On
Turn Authorization On in Non-Kerberized (test) Clusters:
•  Set hbase.security.authorization = true
•  Set hbase.coprocessor.master.classes = org.apache.hadoop.hbase.security.access.AccessController
•  Set hbase.coprocessor.region.classes = org.apache.hadoop.hbase.security.access.AccessController
•  Set hbase.coprocessor.regionserver.classes = org.apache.hadoop.hbase.security.access.AccessController
Authorization in Kerberized Clusters:
•  hbase.coprocessor.region.classes should have both org.apache.hadoop.hbase.security.token.TokenProvider and org.apache.hadoop.hbase.security.access.AccessController
Page 34 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
SQL in Phoenix / HDP 2.3
UNION ALL
Date / Time Functions
•  now(), year, month, week, dayofmonth, curdate
•  hour, minute, second
Custom UDFs
•  Row-level UDFs.
Tracing
•  Trace a query to pinpoint bottlenecks.
Page 35 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Phoenix Query Server: Supporting Non-Java Drivers
[Diagram: Python and .NET clients issue RPC calls over HTTP to a Query Server endpoint colocated with any HBase RegionServer; requests are proxied between endpoints if needed.]
1.  Endpoints are colocated with RegionServers: no single point of failure, optional load balancer.
2.  Endpoints can proxy requests or perform local aggregations.
Page 36 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Using Phoenix Query Server
Client Side:
•  Thin JDBC Driver: /usr/hdp/current/phoenix/phoenix-thin-client.jar (1.7 MB versus 44 MB)
•  Does not require Zookeeper access.
•  Wrapper Script: sqlline-thin.py
•  sqlline-thin.py https://host:8765
Server Side:
•  Ambari Install and Management: Yes
•  Port: Default = 8765
HTTP Example:
•  curl -XPOST -H 'request: {"request":"prepareAndExecute","connectionId":"aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa","sql":"select count(*) from PRICES","maxRowCount":-1}' http://localhost:8765/
Page 37 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Phoenix / Spark integration in HDP 2.3
Phoenix / Spark Connector
•  Load Phoenix tables / views into RDDs or DataFrames.
•  Integrate with Spark, Spark Streaming and SparkSQL.
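A rough sketch of loading a Phoenix table into a Spark DataFrame from the PySpark shell (the table name, ZooKeeper quorum and client-jar path are assumptions):
pyspark --jars /usr/hdp/current/phoenix-client/phoenix-client.jar <<'EOF'
# load a Phoenix table as a DataFrame through the phoenix-spark data source
df = sqlContext.load(source="org.apache.phoenix.spark",
                     table="WEB_STAT",
                     zkUrl="zk1.example.com:2181")
df.printSchema()
df.groupBy("HOST").count().show()
EOF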
Page 38 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
New In Apache Storm
Page 39 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Stream Processing Ready For Mainstream Adoption
Stream analysis, scalable across the cluster
Nimbus High Availability
No single point of failure for stream processing job
management
Ease of Deployment
Quickly create stream processing pipelines via Flux
Rolling Upgrades
Update Storm to newer versions, with zero downtime
Enhanced Security for Kafka
Authorization via Ranger and authentication via Kerberos
Page 40 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Connectivity Enhancements
Apache Storm 0.10.0
•  Microsoft Azure Event Hubs Integration
•  Redis Support
•  JDBC/RDBMS Integration
•  Solr 5.2.1 -- Storm Bolt: some assembly
required
Kafka 0.8.2
•  Flume Integration (originally released in HDP
2.2) – not supported when Kafka Security is
activated
Page 41 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Storm Nimbus High Availability
[Diagram: multiple Nimbus servers (NIMBUS-1 … NIMBUS-N) coordinate through a ZooKeeper ensemble; Supervisors, the Storm UI and DRPC connect to the current leader.]
Nimbus HA uses leader election to determine the primary.
Page 42 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Productivity
Partial Key Groupings
•  The stream is partitioned by the fields specified in the grouping, like the Fields grouping, but tuples for each key are load-balanced between two downstream bolts, which provides better utilization of resources when the incoming data is skewed.
Reduced Dependency Conflicts with shaded JARs
•  This enhancement provides clear separation between the Storm engine (and its supporting code) and the topology code provided by developers.
Page 43 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Productivity
Declarative Topology Wiring with Flux
•  Define Storm Core API (Spouts/Bolts) using a flexible YAML DSL
•  YAML DSL support for most Storm components (storm-kafka, storm-hdfs,
storm-hbase, etc.)
•  Convenient support for multi-lang components
•  External property substitution/filtering for easily switching between
configurations/environments (similar to Maven-style ${variable.name}
substitution)
Examples
https://github.com/apache/storm/tree/master/external/flux/flux-examples
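A hedged example of running a Flux-defined topology (the jar and YAML file names are hypothetical; the invocation follows the Flux README):
# test the YAML-defined topology locally, then submit the same definition to the cluster
storm jar mytopology-0.1.0.jar org.apache.storm.flux.Flux --local topology.yaml
storm jar mytopology-0.1.0.jar org.apache.storm.flux.Flux --remote topology.yaml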
Page 44 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Security
§  Storm
§  User Impersonation
§  SSL Support for Storm UI, Log Viewer, and DRPC (Distributed Remote Procedure Call)
§  Automatic credential renewal
§  Kafka
§  Kerberos-based Authentication
§  Pluggable Authorization and Apache Ranger Integration
Page 45 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
New In HDP Search
Page 46 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
HDP Search 2.3
                  HDP Search 2.2   HDP Search 2.3
Package           jar              RPM                      Solr, SiLK (Banana) and connectors all in one package
Solr              4.10.2           5.2.1                    Latest stable release of Solr (included with package)
HDFS              2.5              2.7.1                    Batch indexing from HDFS (included with package)
Hive              0.14.0           1.2.1                    Batch indexing from Hive tables (included with package)
Pig               0.14.0           0.15.0                   Batch indexing from Pig jobs (included with package)
Storm             X                0.10.0                   Streaming real-time indexing (access from https://github.com/LucidWorks/storm-solr)
Spark Streaming   X                1.3.1                    Streaming real-time indexing (included with package)
Security          X                Included in Solr 5.2.1   Kerberos and Ranger support (included with Solr)
HBase             X                1.1.1                    Near real-time and batch indexing from HBase tables (included with package)
Ranger            X                0.5.0                    Extend Ranger security configuration to HDP Search
Page 47 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
HDP Search: Packaging and Access
Available as RPM package
Downloadable from HDP-UTILS repo
yum install "lucidworks-hdp-search"
Page 48 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
HBase Near Real Time Indexing into Solr
[Diagram: HBase → HBase Indexer → SolrCloud. An indexer maps an HBase table to a Solr collection; row updates are asynchronously replicated and turned into document inserts into the index.]
Page 49 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
HBase Indexer
HBase Real-Time Indexer:
•  The HBase Indexer provides the ability to stream events from HBase to Solr for near real time searching.
•  HBase indexer is included with Lucidworks HDPSearch as an additional service
•  The indexer works by acting as an HBase replication sink.
•  As updates are written to HBase, the events are asynchronously replicated to the HBase Indexer processes, which in
turn creates Solr documents and pushes them to Solr.
Bulk Indexing:
•  Run a batch indexing job that will index data already contained within an HBase table.
•  The batch indexing tool operates with the same indexing semantics as the near-real-time indexer, and it is run as a
MapReduce job.
•  The batch indexing can be run as multiple indexers that run over HBase regions and write data directly to Solr
•  Indexing shards can be generated offline and then merged into a running SolrCloud cluster using the --go-live flag
The number of indexing threads is a parameter that can be used to parallelize the indexing process
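As a rough illustration only (the indexer name and ZooKeeper host are hypothetical, and exact flag names may differ between hbase-indexer versions), a batch indexing run is submitted as a MapReduce job:
# index an existing HBase table; --go-live merges the generated index shards into the running SolrCloud cluster
hadoop jar hbase-indexer-mr-job.jar \
  --hbase-indexer-zk zk1.example.com:2181 \
  --hbase-indexer-name myIndexer \
  --go-live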
Page 50 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
HDP Search Security
•  Apache Solr supports authentication using Kerberos
•  Apache Solr supports ACLs for authorization for a collection
•  Following permissions are supported through Ranger, at a
collection and core level
•  Query
•  Update
•  Admin
Why is it
important?
•  Secure users
using Solr
•  Apply security
policies for
Solr Query
•  Audit Solr
Queries
Page 51 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
SiLK: Visualize Big Data Insights
•  Bundled with HDP Search RPM package
•  Real time interactive analytics
–  Dashboards display real-time user interaction
–  Integration delivers pre-defined dashboards with the most common analytics
–  Drill down into the analytics data all the way to a single event or user
interaction
–  Create time-series to understand patterns and anomalies over time
•  Configure personalized dashboards
–  Administration interface to build new dashboards with minimal effort
–  Create personalized dashboard views based on business unit or job role
–  Admin can setup dashboards per their business requirements to enable
real time analysis of their products and user activity
•  Proactive alerts (Fusion only)
–  Configure alerts to notify new events
–  Realtime proactive alerts help businesses react in real time
•  Security:
–  No authentication or authorization support for SiLK with HDP Search
–  Use Lucidworks Fusion to secure SiLK as well
Page 52 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
New In Apache Spark
Page 53 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Made for Data Science
All apps need to get predictive at scale and fine granularity
Democratizes Machine Learning
Spark is doing for ML on Hadoop what Hive did for SQL on Hadoop
Elegant Developer APIs
DataFrames, Machine Learning, and SQL
Realize Value of Data Operating System
A key tool in the Hadoop toolbox
Community
Broad developer, customer and partner interest
Spark In HDP
Page 54 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
HDP 2.3 Includes Spark 1.3.1
§  DataFrame API – (Alpha)
§  SchemaRDD has become DataFrame API
§  New ML algorithms:
§  LDA (Latent Dirichlet Allocation),
§  GMM (Gaussian Mixture Model)
§  & others
§  ML Pipeline API in PySpark
§  Spark Streaming support for Direct Kafka API gives exactly-once delivery w/o WAL
§  Python Kafka API
Page 55 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
DataFrames: Represents Tabular Data
§ RDD is a low level abstraction
§ DataFrames attach schema to RDDs
§ Allows us to perform aggressive query optimizations
§ Brings the power of SQL to RDDs!
Page 56 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
DataFrames are Intuitive
RDD vs. DataFrame
dept    name        age
Bio     H Smith     48
CS      A Turing    54
Bio     B Jones     43
Phys    E Witten    61
Page 57 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
DataFrame Operations
• Select, withColumn, filter etc.
• Explode
• groupBy
• Agg
• Join
• Window Functions
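A minimal PySpark sketch of a few of these operations (the column names and rows are made up):
pyspark <<'EOF'
# build a small DataFrame, then exercise filter, groupBy and agg
df = sqlContext.createDataFrame(
    [("Bio", "H Smith", 48), ("CS", "A Turing", 54),
     ("Bio", "B Jones", 43), ("Phys", "E Witten", 61)],
    ["dept", "name", "age"])
df.filter(df.age > 45).show()
df.groupBy("dept").agg({"age": "avg"}).show()
EOF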
Page 58 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
The Data Science Workflow Is Complex
[Diagram: the workflow runs from Plan (What is the question I'm answering? What data will I need?) through Acquire the data, Analyze data quality, Clean data (reformat, impute, etc.), and Analyze data (visualize, create features, create model, evaluate results) to Create report, Deploy in production, and Publish & share — with scripting and visualization at each analysis step.]
Page 59 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
ML Pipelines
Transformer – transforms one dataset into another.
Estimator – fits a model to data.
Pipeline – a sequence of stages, consisting of estimators or transformers.
Page 60 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Tools for Data Science with Spark
§  DataFrame – intuitive manipulation of tabular data
§  ML Pipeline API – construct ML workflows
§  ML algorithms
§  Notebooks (iPython, Zeppelin) – Data Exploration, Visualization, Code
Page 61 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Apache Atlas
Page 62 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Enterprise Data Governance Goals
GOALS: Provide a common approach to
data governance across all systems
and data within the organization
•  Transparent
Governance standards & protocols must be
clearly defined and available to all
•  Reproducible
Recreate the relevant data landscape at a
point in time
•  Auditable
All relevant events and assets must be traceable with appropriate historical lineage
•  Consistent
Compliance practices must be consistent
[Diagram: a common governance framework spanning ETL/DQ, BPM, business analytics, visualization & dashboards, ERP, CRM, SCM, MDM and archive systems.]
Page 63 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
DGI becomes Apache Atlas
[Diagram: the Data Governance Initiative links a common governance framework across enterprise systems (ETL/DQ, BPM, business analytics, visualization & dashboards, ERP, CRM, SCM, MDM, archive) and the Hadoop stack (Apache Pig, Hive, HBase, Accumulo, Solr, Spark, Storm).]
TWO Requirements
1.  Hadoop must snap in to the existing frameworks and be a good citizen
2.  Hadoop must also provide governance within its own stack of technologies
A group of companies, including a major bank, dedicated to meeting these requirements in the open
Page 64 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Data Steward
Responsibilities include:
•  Ensuring Data Integrity & Quality
•  Creating Data Standards
•  Ensuring Data Lineage
Hadoop Data Governance for the Data Steward
Resolve issues before they occur
Scalable Metadata Service
Business modeling with industry-specific vocabulary
Extend visibility into HDFS path
REST API
Hive Integration
Leverage existing metadata with import/ export capability
Enhanced User Interface
Hive table lineage and Search DSL
Page 65 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Apache Atlas
Metadata Services
•  Business Taxonomy - classification
•  Operational Data – model for Hive: databases, tables, columns
•  Centralized location for all metadata inside
HDP
•  Single Interface point for Metadata
Exchange with platforms outside of HDP.
•  Search & Prescriptive Lineage – Model
and Audit
[Diagram: Apache Atlas integrates with Hive, Ranger, Falcon, Kafka and Storm.]
Page 66 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Apache Atlas
Apache Atlas Overview
Taxonomy
Knowledge store categorized with appropriate business-
oriented taxonomy
•  Data sets & objects
•  Tables / Columns
•  Logical context
•  Source, destination
Support exchange of metadata between foundation
components and third-party applications/governance tools
Leverages existing Hadoop metastores
[Diagram: Atlas architecture — a Knowledge Store (type system, models, taxonomies, policy rules) alongside an Audit Store, Policy Engine, Data Lifecycle Management and Security, exposed through REST API services for Search, Lineage and Exchange, with domain taxonomies such as Healthcare (HIPAA, HL7), Financial (SOX, Dodd-Frank), Retail (PCI, PII) and Custom (CWM).]
Page 67 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Apache Atlas
Knowledge Store
Apache Atlas
RESTful interface
•  Extensible enterprise classification of data assets,
relationships and policies organized in a meaningful
way -- aligned to business organization.
•  Supports exploration via user interface
•  Supports extensibility via API and CLI exposure
Page 68 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Apache Atlas
Knowledge Store
Apache Atlas Overview
Search & Lineage (Browse)
•  Pre-defined navigation paths to explore the data
classification and audit information
•  Text-based search features locates relevant data and
audit event across Data Lake quickly and accurately
•  Browse visualization of data set lineage allowing
users to drill-down into operational, security, and
provenance related information
•  SQL like DSL – domain specific language
Page 69 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
New In Apache Ambari 2.1
Page 70 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
New in Ambari 2.1
§  Core Platform
§  Guided Configs (AMBARI-9794)
§  Customizable Dashboards (AMBARI-9792)
§  Manual Kerberos Setup (AMBARI-9783)
§  Rack Awareness (AMBARI-6646)
§  Stack Support
§  NFS Gateway, Atlas, Accumulo, others…
§  Storm Nimbus HA (AMBARI-10457)
§  Ranger HA (AMBARI-10281, AMBARI-10863)
§  User Views
§  Hive, Pig, Files, Capacity Scheduler
§  Ambari Platform
§  New OS: RHEL/CentOS 7 (AMBARI-9791)
§  New JDKs: Oracle 1.8 (AMBARI-9784)
§  Blueprints API
§  Host Discovery (AMBARI-10750)
§  Views Framework
§  Auto-Cluster Configuration (AMBARI-10306)
§  Auto-Create Instance (AMBARI-10424)
Page 71 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Ambari 2.1 HDP Stack Support Matrix
Support for HDP 2.3 and HDP 2.2
Deprecated Support for HDP 2.1 and HDP 2.0
•  Plan to remove support for HDP 2.1 and HDP 2.0 in the NEXT Ambari release
[Support matrix: Ambari 2.1 supports HDP 2.3 and HDP 2.2; support for HDP 2.1 and HDP 2.0 is deprecated.]
Page 72 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Ambari 2.1 HDP Stack Components
Stack components managed across HDP versions:
•  HDFS, YARN, MapReduce, Hive, HBase, Pig, ZooKeeper, Oozie, Sqoop
•  Tez, Storm, Falcon, Flume
•  Knox, Slider, Kafka
•  Ranger, Spark, Phoenix
•  NEW with Ambari 2.1: Accumulo, NFS Gateway, Mahout, DataFu, Atlas
Page 73 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Ambari 2.1 HDP Stack High Availability
Component                  HDP Stack   HA Mode
HDFS: NameNode             HDP 2.0+    Active/Standby
YARN: ResourceManager      HDP 2.1+    Active/Standby
HBase: HBase Master        HDP 2.1+    Multi-master
Hive: HiveServer2          HDP 2.1+    Multi-instance
Hive: Hive Metastore       HDP 2.1+    Multi-instance
Hive: WebHCat Server       HDP 2.1+    Multi-instance
Oozie: Oozie Server        HDP 2.1+    Multi-instance
Storm: Nimbus Server       HDP 2.3     Multi-instance (new with Ambari 2.1)
Ranger: Admin Server       HDP 2.3     Multi-instance (new with Ambari 2.1)
Page 74 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Ambari 2.1 JDK Support
Important: If you plan on installing HDP 2.2 or earlier with Ambari 2.1, be sure to use JDK 1.7.
Important: If you are using JDK 1.6, you must switch to JDK 1.7 before upgrading to Ambari 2.1
[Support matrix: JDK 1.7 is supported across HDP 2.0–2.3; JDK 1.8 is supported with HDP 2.3 only; JDK 1.6 is not supported with Ambari 2.1 (switch to JDK 1.7 first).]
Page 75 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Ambari 2.1 Platform Support
[Support matrix across RHEL 7, RHEL 6, RHEL 5, SLES 11, Ubuntu 12, Ubuntu 14 and Debian 7 for Ambari 2.1 M10, Ambari 2.1 GA and Ambari 2.0; key changes below.]
•  Add RHEL/CentOS/Oracle Linux 7 support
•  Removed RHEL/CentOS/Oracle Linux 5 support
•  Ubuntu and Debian support is NOT AVAILABLE until the first Ambari 2.1 and HDP 2.3 maintenance releases
Page 76 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Ambari 2.1 Database Support
Ambari 2.1 + HDP 2.3 added support for Oracle 12c
Ambari 2.1 DB: SQL Server *Tech Preview*
Page 77 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Blueprints Challenge Today
•  Today: Blueprints need ALL VMs available to provision cluster
•  This can be a challenge when trying to build a large cluster, especially in Cloud environments
•  Blueprints Host Discovery feature allows you to provision cluster with
all, some or no hosts
•  When Hosts come online and Agents register with Ambari, Blueprints will automatically put
the hosts into the cluster
Page 78 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Blueprints Host Discovery (AMBARI-10750)
Ambari
POST /api/v1/clusters/MyCluster/hosts
[
{
"blueprint" : "single-node-hdfs-test2",
"host_groups" :[
{
"host_group" : "slave",
"host_count" : 3,
"host_predicate" : "Hosts/cpu_count>1”
}
]
}
]
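For reference, a hedged example of submitting that payload with curl (the cluster name, credentials and file name are placeholders; Ambari's REST API expects the X-Requested-By header):
curl -u admin:admin -H "X-Requested-By: ambari" \
  -X POST -d @add-hosts.json \
  http://ambari-server.example.com:8080/api/v1/clusters/MyCluster/hosts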
Page 79 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Guided Configurations
•  Improved layout and
grouping of configurations
•  New UI controls to make it
easier to set values
•  Better recommendations and
cross-service dependency
checks
•  Implemented for HDFS,
YARN, HBase and Hive
•  Driven by Stack definition
Page 80 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Alert Changes
Alerts Log (AMBARI-10249)
•  Alert state changes are written to /var/log/ambari-server/ambari-alerts.log
Script-based Alert Notifications (AMBARI-9919)
•  Define a custom script-based notification dispatcher
•  Executed on alert state changes
•  Only available via API
2015-07-13 14:58:03,744 [OK] [ZOOKEEPER] [zookeeper_server_process] (ZooKeeper Server
Process) TCP OK - 0.000s response on port 2181
2015-07-13 14:58:03,768 [OK] [HDFS] [datanode_process_percent] (Percent DataNodes Available)
affected: [0], total: [1]
Page 81 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
HDFS Topology Script + Host Mappings
Set Rack ID from Ambari
Ambari generates + distributes topology script with mappings file
Sets core-site “net.topology.script.file.name” property
If you modify a Rack ID, restart HDFS and YARN to pick up the change
Page 82 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
New User Views
Capacity Scheduler View
Browse + manage YARN queues
Tez View
View information related to Tez jobs
that are executing on the cluster.
Page 83 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
New User Views
Pig View
Author and execute Pig
Scripts.
Hive View
Author, execute and debug
Hive queries.
Files View
Browse HDFS file system.
Page 84 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Separate Ambari Servers
•  For Hadoop Operators:
Deploy Views in an Ambari Server that is managing a Hadoop cluster
•  For Data Workers:
Run Views in a “standalone” Ambari Server
Ambari
Server
HDP CLUSTER
Store & Process
Ambari
Server
Operators
manage the
cluster, may
have Views
deployed
Data
Workers use
the cluster
and use a
“standalone”
Ambari
Server for
Views
Page 85 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Views <-> Cluster Communications
[Diagram: one or more Ambari Servers (sharing an Ambari DB, with LDAP for authentication, optionally behind a proxy) host Views; deployed Views talk with the cluster using REST APIs (as applicable).]
Important: It is NOT a requirement to operate your cluster
with Ambari to use Views with your cluster.
Page 86 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Upgrading Ambari 2.1
1.  Prepare: perform the preparation steps, which include making backups of critical cluster metadata.
2.  Stop Ambari: on all hosts in the cluster, stop the Ambari Server and Ambari Agents.
3.  Upgrade Ambari Server + Agents: on the host running Ambari Server, upgrade the Ambari Server; on all hosts in the cluster, upgrade the Ambari Agent.
4.  Upgrade Ambari Schema: on the host running Ambari Server, upgrade the Ambari Server database schema.
5.  Complete + Start: complete any post-upgrade tasks (such as LDAP setup, database driver setup).
Page 87 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Ambari Upgrade Tips
•  After the Ambari upgrade, you will see prompts to restart services. Because of the new guided configurations, Ambari has added new configuration properties to the services.
•  Review the changes by comparing config versions.
•  Use the config filter to identify any config issues.
•  Do not change to JDK 1.8 until you are running HDP 2.3.
•  HDP 2.3 is the ONLY version of HDP that is certified and supported with JDK 1.8.
•  Before upgrading to HDP 2.3, you must upgrade to Ambari 2.1 first.
•  Be sure your cluster has landed on Ambari 2.1 cleanly and is working properly.
•  Recommendation: schedule Ambari upgrade separate from HDP upgrade
Page 88 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
HDP Upgrade Options
MAINTENANCE UPGRADE (HDP 2.2.x → HDP 2.2.y, or HDP 2.3.x → HDP 2.3.y): Rolling Upgrade or manual "Stop the World" upgrade.
MINOR UPGRADE, 2.2 → 2.3 (HDP 2.2.x → HDP 2.3.y): Rolling Upgrade or manual "Stop the World" upgrade.
MINOR UPGRADE, 2.0/2.1 → 2.3 (HDP 2.0/2.1 → HDP 2.3.y): manual "Stop the World" upgrade only (not available at GA; must go to HDP 2.2 FIRST).
Page 89 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
New In Apache Ranger
Page 90 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Security today in HDP: centralized security administration with Ranger (enterprise services: security)
•  Authentication – Who am I / prove it? Kerberos; API security with Apache Knox
•  Authorization – What can I do? Fine-grained access control with Apache Ranger
•  Audit – What did I do? Centralized audit reporting with Apache Ranger
•  Data Protection – Can data be encrypted at rest and over the wire? Wire encryption in Hadoop; native and partner encryption (HDP 2.3)
Page 91 © Hortonworks Inc. 2011 – 2015. All Rights Reserved HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION
Security items planned in HDP 2.3
New Components Support
•  Ranger to support authorization and auditing for Solr, Kafka and YARN
Extending Security
•  Hooks for creating dynamic policy conditions
•  Protect metadata in Hive
•  Introduce Ranger KMS to support HDFS Transparent Encryption
–  UI to manage policies for key management
Auditing changes
•  Ranger to support queries for audit stored in HDFS using Solr
•  Optimization of auditing at source
Page 92 © Hortonworks Inc. 2011 – 2015. All Rights Reserved HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION
Security items planned in HDP 2.3
Extensible Architecture
•  Pluggable architecture for Ranger – Ranger Stacks
•  Configuration-driven addition of new components – Knox Stacks
Enterprise Readiness
•  Knox to support LDAP caching
•  Knox to support 2 way SSL queries
•  Ranger to support Postgres and MS SQL databases for storing policy data
•  Ranger permission changes
Page 93 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Kafka Security
•  Kafka now supports authentication using Kerberos
•  Kafka also supports ACLs for authorization for a topic
per user/group
•  Following permissions are supported through Ranger
–  Publish
–  Consume
–  Create
–  Delete
–  Configure
–  Describe
–  Replicate
–  Connect
Page 94 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Solr Security
§  Apache Solr now supports authentication using
Kerberos
§  Apache Solr also supports ACLs for authorization for a collection
§  Following permissions are supported through
Ranger, at a collection level
§  Query
§  Update
§  Admin
Page 95 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
YARN Integration
•  YARN supports ACLs for queue submission
•  Ranger is now integrated with the YARN ResourceManager to manage these permissions from Ranger
•  The following permissions are supported through Ranger:
•  Submit-app
•  Admin-queue
Page 96 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Dynamic Policy Conditions
•  Currently Ranger supports static “role” based policy controls
•  Users are looking for dynamic attributes such as geo, time and data
attributes to drive policy decisions
•  Ranger has introduced hooks for these dynamic conditions
Page 97 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Dynamic Policy Hooks - Config
•  Conditions can be added as part of service definition
Conditions can vary by service (HDFS, Hive, etc.)
Page 98 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Protect Metadata in Hiveserver2
§  In Hive, metadata listing can be protected by
underlying permissions
§  Following commands are protected
§  Show Databases
§  Show Tables
§  Describe table
§  Show Columns
Page 99 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
[Diagram: an HDFS client reads and writes encrypted files through a crypto stream using a DEK. The Name Node stores encryption-zone attributes (EZ key ID, version) and per-file attributes (EDEK, IV) and only ever sees the encrypted DEK; the Key Management System (KMS), reached through the KeyProvider API, holds the EZ keys and decrypts EDEKs into DEKs for authorized clients.]
Acronyms:
EZ – Encryption Zone (an HDFS directory)
EZK – Encryption Zone Key; master key associated with all files in an EZ
DEK – Data Encryption Key; unique key associated with each file (the EZ key is used to generate the DEK)
EDEK – Encrypted DEK; the Name Node only has access to the encrypted DEK
IV – Initialization Vector
Open source KMS based on file-level storage (HDP 2.2).
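A brief sketch of creating an encryption zone against the KMS-backed key provider (the key name and path are hypothetical; run as an HDFS administrator):
# create an encryption-zone key in the KMS, then mark a directory as an encryption zone
hadoop key create ezkey1
hdfs dfs -mkdir -p /secure/finance
hdfs crypto -createZone -keyName ezkey1 -path /secure/finance
# list zones to verify
hdfs crypto -listZones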
Page 100 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
HDFS Encryption in HDP 2.3
[Diagram: same architecture as above, with Ranger KMS providing the KeyProvider API and storing keys in a database (DB storage). HDP 2.3]
Page 101 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Audit setup in HDP 2.2 – Simplified View
[Diagram: in HDP 2.2 the Ranger plugin embedded in each Hadoop component writes audit events to the RDBMS and to HDFS; the Ranger Administration Portal serves audit queries from the RDBMS, and policies live in the Ranger Policy DB.]
Page 102 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Audit setup in HDP 2.3 – Solr Based Query
[Diagram: the Ranger plugin in each Hadoop component writes audit events to HDFS and to the audit store; the Ranger Administration Portal serves audit queries through Solr instead of querying the RDBMS directly, and policies remain in the Ranger Policy DB.]
Why is it
important?
•  Scalable
approach
•  Remove
dependency on
DB for audit
•  Ability to use
banana for
dashboards
Page 103 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Lab
https://github.com/abajwa-hw/hdp22-hive-streaming/blob/master/LAB-STEPS.md
Page 104 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Lab Overview
§  Tenants
§  Groups - IT & Marketing
§  Users – it1 (IT) & mktg1 (Marketing)
§  Responsibility
§  IT – Onboard Data & Manage Security
§  Marketing – Analyze Data
§  Lab Environment
§  Using HDP 2.3 Sandbox
§  Linux and Ranger users it1 and mktg1 pre-created
§  Global Allow policy set in Ranger
Page 105 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Lab Steps
§  Step 1
§  Create hdfs directories for users it1 and mktg1 (a command sketch follows this list)
§  Step 2
§  Disable Ranger Global Allow Policy
§  Enable hdfs & hive permissions for it1
§  Step 3
§  Create interactive and batch queues in
YARN
§  Assign user it1 to batch queue and
mktg1 to default queue
§  Step 4
§  Create ambari users it1 and mktg1 and
enable hive views
§  Step 5
§  Load data at it1
§  Step 6
§  Enable table access for mktg1
§  Step 7
§  Query Data as mktg1
Page 106 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Thank You
Page 107 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
This presentation contains forward-looking statements involving risks and uncertainties. Such forward-
looking statements in this presentation generally relate to future events, our ability to increase the
number of support subscription customers, the growth in usage of the Hadoop framework, our ability
to innovate and develop the various open source projects that will enhance the capabilities of the
Hortonworks Data Platform, anticipated customer benefits and general business outlook. In some
cases, you can identify forward-looking statements because they contain words such as “may,” “will,”
“should,” “expects,” “plans,” “anticipates,” “could,” “intends,” “target,” “projects,” “contemplates,”
“believes,” “estimates,” “predicts,” “potential” or “continue” or similar terms or expressions that concern
our expectations, strategy, plans or intentions. You should not rely upon forward-looking statements as
predictions of future events. We have based the forward-looking statements contained in this
presentation primarily on our current expectations and projections about future events and trends that
we believe may affect our business, financial condition and prospects. We cannot assure you that the
results, events and circumstances reflected in the forward-looking statements will be achieved or
occur, and actual results, events, or circumstances could differ materially from those described in the
forward-looking statements.
The forward-looking statements made in this prospectus relate only to events as of the date on which
the statements are made and we undertake no obligation to update any of the information in this
presentation.
Trademarks
Hortonworks is a trademark of Hortonworks, Inc. in the United States and other jurisdictions. Other
names used herein may be trademarks of their respective owners.
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015Rajit Saha
 
Hive Performance Dataworks Summit Melbourne February 2019
Hive Performance Dataworks Summit Melbourne February 2019Hive Performance Dataworks Summit Melbourne February 2019
Hive Performance Dataworks Summit Melbourne February 2019alanfgates
 
Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?DataWorks Summit
 
DataEng Mad - 03.03.2020 - Tibero 30-min Presentation.pdf
DataEng Mad - 03.03.2020 - Tibero 30-min Presentation.pdfDataEng Mad - 03.03.2020 - Tibero 30-min Presentation.pdf
DataEng Mad - 03.03.2020 - Tibero 30-min Presentation.pdfMiguel Angel Fajardo
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Cécile Poyet
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Hortonworks
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Cécile Poyet
 
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...Sumeet Singh
 
Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?DataWorks Summit
 
Taming Big Data with Big SQL 3.0
Taming Big Data with Big SQL 3.0Taming Big Data with Big SQL 3.0
Taming Big Data with Big SQL 3.0Nicolas Morales
 
What it takes to run Hadoop at Scale: Yahoo! Perspectives
What it takes to run Hadoop at Scale: Yahoo! PerspectivesWhat it takes to run Hadoop at Scale: Yahoo! Perspectives
What it takes to run Hadoop at Scale: Yahoo! PerspectivesDataWorks Summit
 
Hive 3 - a new horizon
Hive 3 - a new horizonHive 3 - a new horizon
Hive 3 - a new horizonThejas Nair
 
1 extreme performance - part i
1   extreme performance - part i1   extreme performance - part i
1 extreme performance - part isqlserver.co.il
 
SPCA2013 - Windows Azure for SharePoint People
SPCA2013 - Windows Azure for SharePoint PeopleSPCA2013 - Windows Azure for SharePoint People
SPCA2013 - Windows Azure for SharePoint PeopleNCCOMMS
 

Similar to New Capabilities in HDP 2.3 (20)

What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?
 
What's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - TokyoWhat's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - Tokyo
 
20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting
 
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
 
Hive Performance Dataworks Summit Melbourne February 2019
Hive Performance Dataworks Summit Melbourne February 2019Hive Performance Dataworks Summit Melbourne February 2019
Hive Performance Dataworks Summit Melbourne February 2019
 
Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?
 
DataEng Mad - 03.03.2020 - Tibero 30-min Presentation.pdf
DataEng Mad - 03.03.2020 - Tibero 30-min Presentation.pdfDataEng Mad - 03.03.2020 - Tibero 30-min Presentation.pdf
DataEng Mad - 03.03.2020 - Tibero 30-min Presentation.pdf
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It!
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It!
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It!
 
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
 
Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?
 
Taming Big Data with Big SQL 3.0
Taming Big Data with Big SQL 3.0Taming Big Data with Big SQL 3.0
Taming Big Data with Big SQL 3.0
 
What it takes to run Hadoop at Scale: Yahoo! Perspectives
What it takes to run Hadoop at Scale: Yahoo! PerspectivesWhat it takes to run Hadoop at Scale: Yahoo! Perspectives
What it takes to run Hadoop at Scale: Yahoo! Perspectives
 
Hive 3 a new horizon
Hive 3  a new horizonHive 3  a new horizon
Hive 3 a new horizon
 
Hive 3 - a new horizon
Hive 3 - a new horizonHive 3 - a new horizon
Hive 3 - a new horizon
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
1 extreme performance - part i
1   extreme performance - part i1   extreme performance - part i
1 extreme performance - part i
 
SPCA2013 - Windows Azure for SharePoint People
SPCA2013 - Windows Azure for SharePoint PeopleSPCA2013 - Windows Azure for SharePoint People
SPCA2013 - Windows Azure for SharePoint People
 
What's new in Ambari
What's new in AmbariWhat's new in Ambari
What's new in Ambari
 

More from Hortonworks

Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks
 
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyIoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyHortonworks
 
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakGetting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakHortonworks
 
Johns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsJohns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsHortonworks
 
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysCatch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysHortonworks
 
HDF 3.2 - What's New
HDF 3.2 - What's NewHDF 3.2 - What's New
HDF 3.2 - What's NewHortonworks
 
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerCuring Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerHortonworks
 
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsInterpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsHortonworks
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeHortonworks
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidHortonworks
 
Accelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleAccelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleHortonworks
 
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATAHortonworks
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Hortonworks
 
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseDelivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseHortonworks
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseHortonworks
 
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationWebinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationHortonworks
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementHortonworks
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHortonworks
 
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks
 
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCUnlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCHortonworks
 

More from Hortonworks (20)

Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
 
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyIoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
 
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakGetting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with Cloudbreak
 
Johns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsJohns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log Events
 
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysCatch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
 
HDF 3.2 - What's New
HDF 3.2 - What's NewHDF 3.2 - What's New
HDF 3.2 - What's New
 
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerCuring Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
 
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsInterpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data Landscape
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache Druid
 
Accelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleAccelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at Scale
 
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
 
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseDelivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with Ease
 
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationWebinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data Management
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
 
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
 
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCUnlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDC
 

Recently uploaded

What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 

Recently uploaded (20)

What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 

New Capabilities in HDP 2.3

  • 1. New In HDP 2.3 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
  • 3. About Hortonworks Customer Momentum •  556 customers (as of August 5, 2015) •  119 customers added in Q2 2015 •  Publicly traded on NASDAQ: HDP Hortonworks Data Platform •  Completely open multi-tenant platform for any app and any data •  Consistent enterprise services for security, operations, and governance Partner for Customer Success •  Leader in open-source community, focused on innovation to meet enterprise needs •  Unrivaled Hadoop support subscriptions Founded in 2011 Original 24 architects, developers, operators of Hadoop from Yahoo! 740+ E M P L O Y E E S 1350+ E C O S Y S T E M PA R T N E R S
  • 4. HDP Is Enterprise Hadoop Hortonworks Data Platform YARN: Data Operating System (Cluster Resource Management) 1 ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° Script Pig SQL Hive Tez Tez Java Scala Cascading Tez ° ° ° ° ° ° ° ° ° ° ° ° ° ° Others ISV Engines HDFS (Hadoop Distributed File System) Stream Storm Search Solr NoSQL HBase Accumulo Slider Slider SECURITYGOVERNANCE OPERATIONSBATCH, INTERACTIVE & REAL-TIME DATA ACCESS In-Memory Spark Provision, Manage & Monitor Ambari Zookeeper Scheduling Oozie Data Workflow, Lifecycle & Governance Falcon Sqoop Flume Kafka NFS WebHDFS Authentication Authorization Accounting Data Protection Storage: HDFS Resources: YARN Access: Hive, … Pipeline: Falcon Cluster: Knox Cluster: Ranger Deployment ChoiceLinux Windows On-Premises Cloud YARN is the architectural center of HDP Enables batch, interactive and real-time workloads Provides comprehensive enterprise capabilities The widest range of deployment options Delivered Completely in the OPEN
  • 5. Hortonworks Data Platform HORTONWORKS DATA PLATFORM Hadoop& YARN Flume Oozie Pig Hive Tez Sqoop Cloudbreak Ambari Slider Kafka Knox Solr Zookeeper Spark Falcon Ranger HBase Atlas Accumulo Storm Phoenix 4.10.2 DATA MGMT DATA ACCESS GOVERNANCE & INTEGRATION OPERATIONS SECURITY HDP 2.2 Dec 2014 HDP 2.1 April 2014 HDP 2.0 Oct 2013 HDP 2.2 Dec 2014 HDP 2.1 April 2014 HDP 2.0 Oct 2013 0.12.0 0.12.0 0.12.1 0.13.0 0.4.0 1.4.4 1.4.4 3.3.23.4.5 0.4.00.5.0 0.14.0 0.14.0 3.4.6 0.5.0 0.4.00.9.30.5.2 4.0.04.7.2 1.2.1 0.60.0 0.98.4 4.2.0 1.6.1 0.6.0 1.5.21.4.5 4.1.02.0.0 1.4.0 1.5.1 4.0.0 1.3.1 1.5.1 1.4.4 3.4.5 2.2.0 2.4.0 2.6.0 0.96.1 0.98.0 0.9.1 0.8.1 HDP 2.3 July 2015 1.3.12.7.1 1.4.6 1.0.0 0.6.0 0.5.02.1.00.8.2 3.4.61.5.25.2.1 0.80.0 1.1.1 0.5.01.7.04.4.0 0.10.0 0.6.10.7.01.2.10.15.0 4.2.0 Ongoing Innovation in Apache
  • 6. New Capabilities in Hortonworks Data Platform 2.3 Breakthrough User Experience Dramatic Improvement in the User Experience HDP 2.3 eliminates much of the complexity administering Hadoop and improves developer productivity. Enhanced Security and Governance Enhanced Security and Data Governance HDP 2.3 delivers new encryption of data at rest, and extends the data governance initiative with Apache™ Atlas. Proactive Support Extending the Value of a Hortonworks Subscription Hortonworks® SmartSense™ adds proactive cluster monitoring, enhancing Hortonworks’ award-winning support in key areas. Apache is a trademark of the Apache Software Foundation.
  • 7. Page 7 © Hortonworks Inc. 2011 – 2015. All Rights Reserved New In Apache Hadoop
  • 8. Page 8 © Hortonworks Inc. 2011 – 2015. All Rights Reserved HDP Core User Experience •  Guided Configuration •  Install/Manage/Monitor NFS Gateway •  Customizable Dashboards •  Files View •  Capacity Scheduler Workload Management •  Non-Exclusive Node Labels •  Fair Sharing Policy •  [TP] Local Disk Isolation Security •  HDFS Data Encryption at Rest •  Yarn Queue ACLs through Ranger Operations •  Report on Bad Disks •  Enhanced Distcp (using snapshots) •  Quotas for Storage Tiers
  • 9. Page 9 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Simplified Configuration Management
  • 10. Page 10 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Deploy/Manage/Monitor NFS through Ambari: Ambari deploys, manages and monitors the NFS Gateway, and starts the 'portmap' and 'nfs' services.
  • 11. Page 11 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Detect Bad Disks Detect “bad” disk volumes on a DataNode HDFS-7604
  • 12. Page 12 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Enhanced HDFS Mirroring (HDFS-7535). Efficiency: create the first snapshot during the initial copy, use the snapshot diff to calculate the delta, and copy only the files in the delta; the snapshot diff is faster than the MapReduce-based diff for large directories. Reliability: snapshots ensure that any changes to the source directory during DistCp do not disrupt the mirror. (Diagram: 1st and 2nd snapshots on the source cluster, backup copy on the target cluster.)
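A minimal command-line sketch of the snapshot-based DistCp flow described above; the snapshot names (s1, s2), paths and NameNode hosts are illustrative, and it assumes the source directory is snapshottable and the target already matches the first snapshot.

    # Take a new snapshot on the source, then sync only the delta to the mirror (illustrative names).
    hdfs dfs -createSnapshot /data/src s2
    hadoop distcp -update -diff s1 s2 hdfs://source-nn:8020/data/src hdfs://target-nn:8020/data/mirror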
  • 13. Page 13 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Quota Management By Storage Tiers (HDP 2.2). Diagram of an HDP cluster with DISK and ARCHIVE volumes: Warm (DataSet A) keeps 1 replica on DISK and the others on ARCHIVE; Cold (DataSet B) keeps all replicas on ARCHIVE.
  • 14. Page 14 © Hortonworks Inc. 2011 – 2015. All Rights Reserved HDFS Quotas: Extending to Tiered Storage. Quota: number of files for a directory: hdfs dfsadmin -setQuota n <list of directories> sets the total number of files that can be stored in each directory. Quota: total disk space for a directory: hdfs dfsadmin -setSpaceQuota n <list of directories> sets the total disk space that can be used by each directory. New in HDP 2.3: quota by storage tier: hdfs dfsadmin -setSpaceQuota n [-storageType <type>] <list of directories> sets the total disk space of the given storage type that can be used by each directory. HDFS-7584
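A hedged usage sketch of the new storage-type quota; the directory names and quota sizes are illustrative and assume the DISK/ARCHIVE tiers from the previous slide.

    # Cap how much DISK (hot) and ARCHIVE capacity each illustrative data set may consume.
    hdfs dfsadmin -setSpaceQuota 10t -storageType DISK /apps/warm
    hdfs dfsadmin -setSpaceQuota 100t -storageType ARCHIVE /apps/cold
    # Remove a storage-type quota again if needed.
    hdfs dfsadmin -clrSpaceQuota -storageType DISK /apps/warm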
  • 15. Page 15 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Node Labels in YARN Enable configuration of node partitions Now with HDP 2.3, two options: Non-exclusive Node Labels Exclusive Node Labels
  • 16. Page 16 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Exclusive Node Labels enable Isolated Partitions (HDP 2.2). Diagram: nodes carry labels and partitions are configured per label (e.g. a Storm partition); exclusive labels enforce isolation, so only the application assigned to the labeled partition (App Storm) runs on those nodes while other applications (App B) stay off them.
  • 17. Page 17 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Non-Exclusive Node Labels (YARN-3214). Diagram: labels are configured as non-exclusive (e.g. a Spark partition), so another application (App B) can be scheduled onto the labeled nodes when there is free capacity.
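A minimal sketch of defining a non-exclusive label from the command line, assuming node labels are enabled on the ResourceManager; the label name "spark" and the host name are illustrative.

    # Register a non-exclusive label, then attach it to a node (illustrative label and host).
    yarn rmadmin -addToClusterNodeLabels "spark(exclusive=false)"
    yarn rmadmin -replaceLabelsOnNode "node1.example.com=spark"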
  • 18. Page 18 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Fair Sharing: Pluggable Queue Policies. Choose a scheduling policy per leaf queue. FIFO: application container requests are accommodated on a first-come, first-served basis. Multi-fair weight: application container requests are accommodated according to: •  order of least resources used – multiple applications make progress •  (optional) size-based weight – an adjustment to help large applications make progress. YARN-3318, YARN-3319
  • 19. Page 19 © Hortonworks Inc. 2011 – 2015. All Rights Reserved New In Apache Hive
  • 20. Page 20 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Hive §  Performance §  Vectorized Map Joins and other improvements §  SQL §  Union §  Interval types §  CURRENT_TIMESTAMP, CURRENT_DATE §  Usability §  Configurations §  Hive View §  Tez View
  • 21. Page 21 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Vectorized Map Join SELECT Count(*) FROM store_sales JOIN customer_demographics2 ON ss_cdemo_sk = cd_demo_sk AND cd_demo_sk2 < 96040 AND ss_sold_date_sk BETWEEN 2450815 AND 2451697 SELECT Count(*) FROM store_sales LEFT OUTER JOIN customer_demographics2 ON ss_cdemo_sk = cd_demo_sk AND cd_demo_sk2 < 96040 AND ss_sold_date_sk BETWEEN 2450815 AND 2451697 Map Join is up to 5x faster, making the overall query up to 2x faster in HDP 2.3 than in HDP 2.2 (Champlain). mapjoin_20.sql indicates a selectivity of 20, i.e. 20% of rows end up joining.
  • 22. Page 22 © Hortonworks Inc. 2011 – 2015. All Rights Reserved New SQL Syntax: Union create table sample_03(name varchar(50), age int, gpa decimal(3, 2)); create table sample_04(name varchar(50), age int, gpa decimal(3, 2)); insert into table sample_03 values ('aaa', 35, 3.00), ('bbb', 32, 3.00), ('ccc', 32, 3.00), ('ddd', 35, 3.00), ('eee', 32, 3.00); insert into table sample_04 values ('ccc', 32, 3.00), ('ddd', 35, 3.00), ('eee', 32, 3.00), ('fff', 35, 3.00), ('ggg', 32, 3.00); hive> select * from sample_03 UNION select * from sample_04; Query ID = ambari-qa_20150526023228_198786c5-5c89-4a38-9246-cbba9b903ab4 Total jobs = 1 Launching Job 1 out of 1 Status: Running (Executing on YARN cluster with App id application_1432604373833_0002) -------------------------------------------------------------------------------- VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED -------------------------------------------------------------------------------- Map 1 .......... SUCCEEDED 1 1 0 0 0 0 Map 4 .......... SUCCEEDED 1 1 0 0 0 0 Reducer 3 ...... SUCCEEDED 1 1 0 0 0 0 -------------------------------------------------------------------------------- VERTICES: 03/03 [==========================>>] 100% ELAPSED TIME: 8.48 s -------------------------------------------------------------------------------- OK aaa 35 3 bbb 32 3 ccc 32 3 ddd 35 3 eee 32 3 fff 35 3 ggg 32 3 Time taken: 11.208 seconds, Fetched: 7 row(s)
  • 23. Page 23 © Hortonworks Inc. 2011 – 2015. All Rights Reserved New SQL Syntax: Interval Type in Expressions hive> select timestamp '2015-03-08 01:00:00' + interval '1' hour; OK 2015-03-08 02:00:00 Time taken: 0.136 seconds, Fetched: 1 row(s) hive> select timestamp '2015-03-08 00:00:00' + interval '23' hour; OK 2015-03-08 23:00:00 Time taken: 0.057 seconds, Fetched: 1 row(s) hive> select timestamp '2015-03-08 00:00:00' + interval '24' hour; OK 2015-03-09 00:00:00 Time taken: 0.149 seconds, Fetched: 1 row(s) hive> select timestamp '2015-03-08 00:00:00' + interval '1' day; OK 2015-03-09 00:00:00 Time taken: 0.063 seconds, Fetched: 1 row(s) hive> select timestamp '2015-02-09 00:00:00' + interval '1' month; OK 2015-03-09 00:00:00 Time taken: 0.107 seconds, Fetched: 1 row(s) hive> select current_timestamp - interval '24' hour; OK 2015-05-25 02:35:13.89 Time taken: 0.181 seconds, Fetched: 1 row(s) hive> select current_date; OK 2015-05-26 Time taken: 0.102 seconds, Fetched: 1 row(s hive> select current_timestamp; OK 2015-05-26 02:33:15.428 Time taken: 0.091 seconds, Fetched: 1 row(s
  • 24. Page 24 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Not Supported: Interval Type in Tables hive> CREATE TABLE t1 (c1 INTERVAL YEAR TO MONTH); NoViableAltException(142@[]) at org.apache.hadoop.hive.ql.parse.HiveParser.type(HiveParser.java:38574) at org.apache.hadoop.hive.ql.parse.HiveParser.colType(HiveParser.java:38331) ... at org.apache.hadoop.util.RunJar.main(RunJar.java:136) FAILED: ParseException line 1:20 cannot recognize input near 'INTERVAL' 'YEAR' 'TO' in column type hive> CREATE TABLE t1 (c1 INTERVAL DAY(5) TO SECOND(3)); NoViableAltException(142@[]) at org.apache.hadoop.hive.ql.parse.HiveParser.type(HiveParser.java:38574) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) FAILED: ParseException line 1:20 cannot recognize input near 'INTERVAL' 'DAY' '(' in column type
  • 25. Page 25 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Simplified Configuration Management
  • 26. Page 26 © Hortonworks Inc. 2011 – 2015. All Rights Reserved New In Apache HBase
  • 27. Page 27 © Hortonworks Inc. 2011 – 2015. All Rights Reserved HBase and Phoenix in HDP 2.3 – Operations, Scale and Robustness, Developer. HBase: •  Next Generation Ambari UI •  Customizable Dashboards •  Supported init.d scripts •  Improved HMaster Reliability •  Security: Namespaces, Encryption, Authorization Improvements, Cell-Level Security •  LOB support. Phoenix: •  Phoenix Slider Support •  HBase Read HA Support •  Functional Indexes •  Query Tracing •  Phoenix SQL: UNION ALL, UDFs, 7 New Date/Time Functions •  Spark Driver •  Phoenix Query Server
  • 28. Page 28 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Simplified Configuration Management: guide configuration and provide recommendations for the most common settings.
  • 29. Page 29 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Build Your Own HBase Dashboard: monitor the metrics that matter to you. 1. Select a pre-defined visualization. 2. Choose from more than 1,000 metrics, ranging from HBase, HDFS, MapReduce2 and YARN. 3. Define custom aggregations for metrics within one component or across components.
  • 30. Page 30 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Namespaces and Delegated Admin Namespaces •  Namespaces are like RDBMS schemas. •  Introduced in HBase 0.96. •  Many security gaps until HBase 1.0. Delegated Administration •  Goal: Create a namespace and hand it over to a DBA. •  People in the namespace can’t do anything outside their namespace.
  • 31. Page 31 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Security: Namespaces, Tables, Authorizations Scopes: •  Authorization scopes: Global -> namespace -> table -> column family -> cell. Access Levels: •  Read, Write, Execute, Create, Admin
  • 32. Page 32 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Delegated Administration Example. Give a user their own namespace to play in. •  Step 1: Superuser (e.g. user hbase) creates namespace foo: create_namespace 'foo' •  Step 2: Admin gives dba-bar full permissions to the namespace: grant 'dba-bar', 'RWXCA', '@foo' •  Note: namespaces are prefixed by @. •  Step 3: dba-bar creates tables within the namespace: create 'foo:t1', 'f1' •  Step 4: dba-bar hands out permissions to the tables: grant 'user-x', 'RWXCA', 'foo:t1' •  Note: all users will be able to see namespaces and tables within namespaces, but not the data.
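For reference, a hedged sketch of running the first two grants from this slide non-interactively; it assumes the hbase shell client is on the PATH and that the dba-bar principal exists.

    # Pipe the slide's commands into a non-interactive hbase shell session (illustrative principals).
    {
      echo "create_namespace 'foo'"
      echo "grant 'dba-bar', 'RWXCA', '@foo'"
    } | hbase shell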
  • 33. Page 33 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Turning Authorization On. Turn authorization on in non-Kerberized (test) clusters: •  Set hbase.security.authorization = true •  Set hbase.coprocessor.master.classes = org.apache.hadoop.hbase.security.access.AccessController •  Set hbase.coprocessor.region.classes = org.apache.hadoop.hbase.security.access.AccessController •  Set hbase.coprocessor.regionserver.classes = org.apache.hadoop.hbase.security.access.AccessController Authorization in Kerberized clusters: •  hbase.coprocessor.region.classes should have both org.apache.hadoop.hbase.security.token.TokenProvider and org.apache.hadoop.hbase.security.access.AccessController
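A small verification sketch, assuming HBase configuration lives in the standard HDP location; it only checks that the properties listed above made it into hbase-site.xml.

    # Confirm the AccessController (and TokenProvider on Kerberized clusters) are configured.
    grep -A1 'hbase.coprocessor' /etc/hbase/conf/hbase-site.xml
    grep -A1 'hbase.security.authorization' /etc/hbase/conf/hbase-site.xml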
  • 34. Page 34 © Hortonworks Inc. 2011 – 2015. All Rights Reserved SQL in Phoenix / HDP 2.3 UNION ALL Date / Time Functions •  now(), year, month, week, dayofmonth, curdate •  hour, minute, second Custom UDFs •  Row-level UDFs. Tracing •  Trace a query to pinpoint bottlenecks.
  • 35. Page 35 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Phoenix Query Server: Supporting Non-Java Drivers. Diagram: each HBase RegionServer exposes an HTTP endpoint; Python and .NET clients connect to the endpoint on any RegionServer (Thrift RPC to the endpoint, requests proxied if needed). 1. Endpoints are colocated with RegionServers: no single point of failure, optional load balancer. 2. Endpoints can proxy requests or perform local aggregations.
  • 36. Page 36 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Using Phoenix Query Server. Client side: •  Thin JDBC Driver: /usr/hdp/current/phoenix/phoenix-thin-client.jar (1.7 MB versus 44 MB) •  Does not require ZooKeeper access. •  Wrapper script: sqlline-thin.py •  sqlline-thin.py https://host:8765 Server side: •  Ambari install and management: yes •  Port: default = 8765 HTTP example: •  curl -XPOST -H 'request: {"request":"prepareAndExecute","connectionId":"aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa","sql":"select count(*) from PRICES","maxRowCount":-1}' http://localhost:8765/
  • 37. Page 37 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Phoenix / Spark integration in HDP 2.3 Phoenix / Spark Connector •  Load Phoenix tables / views into RDDs or DataFrames. •  Integrate with Spark, Spark Streaming and SparkSQL.
  • 38. Page 38 © Hortonworks Inc. 2011 – 2015. All Rights Reserved New In Apache Storm
  • 39. Page 39 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Stream Processing Ready For Mainstream Adoption Stream analysis, scalable across the cluster Nimbus High Availability No single point of failure for stream processing job management Ease of Deployment Quickly create stream processing pipelines via Flux Rolling Upgrades Update Storm to newer versions, with zero downtime Enhanced Security for Kafka Authorization via Ranger and authentication via Kerberos Storage YARN: Data Operating System Governance Security Operations Resource Management
  • 40. Page 40 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Connectivity Enhancements Apache Storm 0.10.0 •  Microsoft Azure Event Hubs Integration •  Redis Support •  JDBC/RDBMS Integration •  Solr 5.2.1 -- Storm Bolt: some assembly required Kafka 0.8.2 •  Flume Integration (originally released in HDP 2.2) – not supported when Kafka Security is activated
  • 41. Page 41 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Storm Nimbus High Availability. Diagram: multiple Nimbus instances (Nimbus-1, Nimbus-2 … Nimbus-N) alongside Supervisors, a ZooKeeper ensemble, Storm UI and DRPC. Nimbus HA uses leader election to determine the primary.
  • 42. Page 42 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Productivity. Partial Key Groupings •  The stream can be partitioned by the fields specified in the grouping, like the Fields grouping, but in this case tuples are load balanced between two downstream bolts, which provides better utilization of resources when the incoming data is skewed. Reduced Dependency Conflicts with shaded JARs •  This enhancement cleanly separates the Storm engine and its supporting code from the topology code provided by developers.
  • 43. Page 43 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Productivity Declarative Topology Wiring with Flux •  Define Storm Core API (Spouts/Bolts) using a flexible YAML DSL •  YAML DSL support for most Storm components (storm-kafka, storm-hdfs, storm-hbase, etc.) •  Convenient support for multi-lang components •  External property substitution/filtering for easily switching between configurations/environments (similar to Maven-style ${variable.name} substitution) Examples https://github.com/apache/storm/tree/master/external/flux/flux-examples
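A hedged sketch of running a Flux-defined topology; the jar and YAML file names are illustrative, and the Flux main class is used as shown in the Flux examples linked above.

    # Run the YAML-defined topology locally for a quick test, then submit it to the cluster.
    storm jar my-topology.jar org.apache.storm.flux.Flux --local my_topology.yaml
    storm jar my-topology.jar org.apache.storm.flux.Flux --remote my_topology.yaml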
  • 44. Page 44 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Security §  Storm §  User Impersonation §  SSL Support for Storm UI, Log Viewer, and DRPC (Distributed Remote Procedure Call) §  Automatic credential renewal §  Kafka §  Kerberos-based Authentication §  Pluggable Authorization and Apache Ranger Integration
  • 45. Page 45 © Hortonworks Inc. 2011 – 2015. All Rights Reserved New In HDP Search
  • 46. Page 46 © Hortonworks Inc. 2011 – 2015. All Rights Reserved HDP Search 2.3 (HDP Search 2.2 -> HDP Search 2.3):
  Package: jar -> RPM; Solr, SiLK (Banana) and connectors all in one package.
  Solr: 4.10.2 -> 5.2.1; latest stable release of Solr (included with package).
  HDFS: 2.5 -> 2.7.1; batch indexing from HDFS (included with package).
  Hive: 0.14.0 -> 1.2.1; batch indexing from Hive tables (included with package).
  Pig: 0.14.0 -> 0.15.0; batch indexing from Pig jobs (included with package).
  Storm: not available -> 0.10.0; streaming data real-time index (access from https://github.com/LucidWorks/storm-solr).
  Spark Streaming: not available -> 1.3.1; streaming data real-time index (included with package).
  Security: not available -> included in Solr 5.2.1; Kerberos and Ranger support (included with Solr).
  HBase: not available -> 1.1.1; 1. near real-time indexing of data from HBase tables, 2. batch indexing from HBase tables (included with package).
  Ranger: not available -> 0.5.0; extend Ranger security configuration to HDP Search.
  • 47. Page 47 © Hortonworks Inc. 2011 – 2015. All Rights Reserved HDP Search: Packaging and Access Available as RPM package Downloadable from HDP-UTILS repo yum install “lucidworks-hdp-search”
  • 48. Page 48 © Hortonworks Inc. 2011 – 2015. All Rights Reserved HBase Near Real Time Indexing into Solr. Diagram: HBase, the HBase Indexer, HDFS and a SolrCloud cluster; an indexer maps a table to a collection, with asynchronous replication from row update to document insert into the index.
  • 49. Page 49 © Hortonworks Inc. 2011 – 2015. All Rights Reserved HBase Indexer. HBase real-time indexer: •  The HBase Indexer provides the ability to stream events from HBase to Solr for near real time searching. •  The HBase Indexer is included with Lucidworks HDPSearch as an additional service. •  The indexer works by acting as an HBase replication sink. •  As updates are written to HBase, the events are asynchronously replicated to the HBase Indexer processes, which in turn create Solr documents and push them to Solr. Bulk indexing: •  Run a batch indexing job that will index data already contained within an HBase table. •  The batch indexing tool operates with the same indexing semantics as the near-real-time indexer, and it is run as a MapReduce job. •  The batch indexing can be run as multiple indexers that run over HBase regions and write data directly to Solr. •  Indexing shards can be generated offline and then merged into a running SolrCloud cluster using the --go-live flag. •  The number of threads is a parameter that can parallelize the indexing process.
  • 50. Page 50 © Hortonworks Inc. 2011 – 2015. All Rights Reserved HDP Search Security •  Apache Solr supports authentication using Kerberos •  Apache Solr supports ACLs for authorization for a collection •  The following permissions are supported through Ranger, at a collection and core level: •  Query •  Update •  Admin Why is it important? •  Secure users using Solr •  Apply security policies for Solr queries •  Audit Solr queries
  • 51. Page 51 © Hortonworks Inc. 2011 – 2015. All Rights Reserved SiLK: Visualize Big Data Insights •  Bundled with the HDP Search RPM package •  Real-time interactive analytics –  Dashboards display real-time user interaction –  Integration will deliver pre-defined dashboards with the most common analytics –  Drill down into the analytics data all the way to a single event or user interaction –  Create time series to understand patterns and anomalies over time •  Configure personalized dashboards –  Administration interface to build new dashboards with minimal effort –  Create personalized dashboard views based on business unit or job role –  Admins can set up dashboards per their business requirements to enable real-time analysis of their products and user activity •  Proactive alerts (Fusion only) –  Configure alerts to notify on new events –  Real-time proactive alerts help businesses react in real time •  Security: –  No authentication or authorization support for SiLK with HDP Search –  Use Lucidworks Fusion to secure SiLK as well
  • 52. Page 52 © Hortonworks Inc. 2011 – 2015. All Rights Reserved New In Apache Spark
  • 53. Page 53 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Made for Data Science All apps need to get predictive at scale and fine granularity Democratizes Machine Learning Spark is doing to ML on Hadoop what Hive did for SQL on Hadoop Elegant Developer APIs DataFrames, Machine Learning, and SQL Realize Value of Data Operating System A key tool in the Hadoop toolbox Community Broad developer, customer and partner interest Spark In HDP Storage YARN: Data Operating System Governance Security Operations Resource Management
  • 54. Page 54 © Hortonworks Inc. 2011 – 2015. All Rights Reserved HDP 2.3 Includes Spark 1.3.1 §  DataFrame API – (Alpha) §  SchemaRDD has become DataFrame API §  New ML algorithms: §  LDA (Latent Dirichlet Allocation), §  GMM (Gaussian Mixture Model) §  & others §  ML Pipeline API in PySpark §  Spark Streaming support for Direct Kafka API gives exactly-once delivery w/o WAL §  Python Kafka API
  • 55. Page 55 © Hortonworks Inc. 2011 – 2015. All Rights Reserved DataFrames: Represent Tabular Data § RDD is a low-level abstraction § DataFrames attach a schema to RDDs § Allows us to perform aggressive query optimizations § Brings the power of SQL to RDDs!
  • 56. Page 56 © Hortonworks Inc. 2011 – 2015. All Rights Reserved DataFrames are Intuitive. RDD vs. DataFrame, example table (dept, name, age): Bio, H Smith, 48; CS, A Turing, 54; Bio, B Jones, 43; Phys, E Witten, 61.
  • 57. Page 57 © Hortonworks Inc. 2011 – 2015. All Rights Reserved DataFrame Operations • Select, withColumn, filter etc. • Explode • groupBy • Agg • Join • Window Functions
  • 58. Page 58 © Hortonworks Inc. 2011 – 2015. All Rights Reserved The Data Science Workflow Is Complex. Plan: what is the question I'm answering? What data will I need? Clean data: acquire the data, analyze data quality, reformat, impute, etc. Then analyze the data, visualize, create features, create a model and evaluate results. Publish & share: create a report, deploy in production. (Diagram: start here ... end here, iterating between script and visualize steps.)
  • 59. Page 59 © Hortonworks Inc. 2011 – 2015. All Rights Reserved ML Pipelines. Transformer: transforms one dataset into another. Estimator: fits a model to data. Pipeline: a sequence of stages, consisting of estimators or transformers.
  • 60. Page 60 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Tools for Data Science with Spark §  DataFrame – intuitive manipulation of tabular data §  ML Pipeline API – construct ML workflows §  ML algorithms §  Notebooks (iPython, Zeppelin) – Data Exploration, Visualization, Code
  • 61. Page 61 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Apache Atlas
  • 62. Page 62 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Enterprise Data Governance Goals GOALS: Provide a common approach to data governance across all systems and data within the organization •  Transparent: governance standards & protocols must be clearly defined and available to all •  Reproducible: recreate the relevant data landscape at a point in time •  Auditable: all relevant events and assets must be traceable with appropriate historical lineage •  Consistent: compliance practices must be consistent (Diagram: governance framework spanning ETL/DQ, BPM, Business Analytics, Visualization & Dashboards, ERP, CRM, SCM, MDM, ARCHIVE.)
  • 63. Page 63 © Hortonworks Inc. 2011 – 2015. All Rights Reserved DGI becomes Apache Atlas. The Data Governance Initiative defines a common governance framework spanning ETL/DQ, BPM, Business Analytics, Visualization & Dashboards, ERP, CRM, SCM, MDM and ARCHIVE, alongside the HDP stack (Apache Pig, Apache Hive, Apache HBase, Apache Accumulo, Apache Solr, Apache Spark, Apache Storm). TWO requirements: 1. Hadoop must snap into the existing frameworks and be a good citizen. 2. Hadoop must also provide governance within its own stack of technologies. A group of companies dedicated to meeting these requirements in the open (partner logos include a major bank).
  • 64. Page 64 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Data Steward Responsibilities include: •  Ensuring Data Integrity & Quality •  Creating Data Standards •  Ensure Data Lineage Hadoop Data Governance for the Data Steward Resolve issues before they occur Scalable Metadata Service Business modeling with industry-specific vocabulary Extend visibility into HDFS path REST API Hive Integration Leverage existing metadata with import/ export capability Enhanced User Interface Hive table lineage and Search DSL
  • 65. Page 65 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Apache Atlas Metadata Services •  Business taxonomy – classification •  Operational data – model for Hive: databases, tables, columns •  Centralized location for all metadata inside HDP •  Single interface point for metadata exchange with platforms outside of HDP •  Search & prescriptive lineage – model and audit. (Integrates with Hive, Ranger, Falcon, Kafka and Storm.)
  • 66. Page 66 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Apache Atlas Overview – Taxonomy: a knowledge store categorized with an appropriate business-oriented taxonomy •  Data sets & objects •  Tables / columns •  Logical context •  Source, destination. Supports exchange of metadata between foundation components and third-party applications/governance tools. Leverages existing Hadoop metastores. (Architecture: knowledge store with type-system, models, taxonomies and policy rules; audit store; policy engine; data lifecycle management; security; REST API services for search, lineage and exchange; industry taxonomies such as Healthcare HIPAA/HL7, Financial SOX/Dodd-Frank, Retail PCI/PII, custom CWM, other.)
  • 67. Page 67 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Apache Atlas Knowledge Store – RESTful interface •  Extensible enterprise classification of data assets, relationships and policies organized in a meaningful way, aligned to the business organization •  Supports exploration via user interface •  Supports extensibility via API and CLI exposure. (Same architecture diagram as the previous slide.)
  • 68. Page 68 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Apache Atlas Overview – Search & Lineage (Browse) •  Pre-defined navigation paths to explore the data classification and audit information •  Text-based search locates relevant data and audit events across the Data Lake quickly and accurately •  Browse visualization of data set lineage, allowing users to drill down into operational, security and provenance related information •  SQL-like DSL (domain specific language). (Same architecture diagram as the previous slides.)
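As a hedged illustration of the search DSL exposed over the Atlas REST API, a Python sketch that runs a DSL query against the Atlas web service; the host name, the hive_table query and the "results" key in the response are assumptions to verify against your Atlas version (21000 is the usual default port).

import requests

# Hypothetical Atlas endpoint; adjust host and port for your environment.
ATLAS = "http://atlas-host.example.com:21000"

# DSL query over the Hive model types registered by the Hive bridge;
# the query string and response shape are illustrative and may differ by version.
resp = requests.get(ATLAS + "/api/atlas/discovery/search/dsl",
                    params={"query": "hive_table where name = 'customer_details'"})
resp.raise_for_status()
for entity in resp.json().get("results", []):
    print(entity)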
  • 69. Page 69 © Hortonworks Inc. 2011 – 2015. All Rights Reserved New In Apache Ambari 2.1
  • 70. Page 70 © Hortonworks Inc. 2011 – 2015. All Rights Reserved New in Ambari 2.1 §  Core Platform §  Guided Configs (AMBARI-9794) §  Customizable Dashboards (AMBARI-9792) §  Manual Kerberos Setup (AMBARI-9783) §  Rack Awareness (AMBARI-6646) §  Stack Support §  NFS Gateway, Atlas, Accumulo, others… §  Storm Nimbus HA (AMBARI-10457) §  Ranger HA (AMBARI-10281, AMBARI-10863) §  User Views §  Hive, Pig, Files, Capacity Scheduler §  Ambari Platform §  New OS: RHEL/CentOS 7 (AMBARI-9791) §  New JDKs: Oracle 1.8 (AMBARI-9784) §  Blueprints API §  Host Discovery (AMBARI-10750) §  Views Framework §  Auto-Cluster Configuration (AMBARI-10306) §  Auto-Create Instance (AMBARI-10424)
  • 71. Page 71 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Ambari 2.1 HDP Stack Support Matrix: Ambari 2.1 supports HDP 2.3 and HDP 2.2, with deprecated support for HDP 2.1 and HDP 2.0 •  Plan to remove support for HDP 2.1 and HDP 2.0 in the NEXT Ambari release
  • 72. Page 72 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Ambari 2.1 HDP Stack Components (HDP 2.0 through HDP 2.3): HDFS, YARN, MapReduce, Hive, HBase, Pig, ZooKeeper, Oozie, Sqoop; Tez, Storm, Falcon, Flume; Knox, Slider, Kafka; Ranger, Spark, Phoenix; and NEW for Ambari 2.1 with HDP 2.3: Accumulo, NFS Gateway, Mahout, DataFu, Atlas
  • 73. Page 73 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Ambari 2.1 HDP Stack High Availability (component | HDP stack | mode): HDFS NameNode | HDP 2.0+ | Active/Standby; YARN ResourceManager | HDP 2.1+ | Active/Standby; HBase Master | HDP 2.1+ | Multi-master; Hive HiveServer2 | HDP 2.1+ | Multi-instance; Hive Metastore | HDP 2.1+ | Multi-instance; Hive WebHCat Server | HDP 2.1+ | Multi-instance; Oozie Server | HDP 2.1+ | Multi-instance; Storm Nimbus Server | HDP 2.3 | Multi-instance (new in Ambari 2.1); Ranger Admin Server | HDP 2.3 | Multi-instance (new in Ambari 2.1)
  • 74. Page 74 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Ambari 2.1 JDK Support: JDK 1.7 is supported across HDP 2.0 – 2.3; JDK 1.8 is supported only with HDP 2.3; JDK 1.6 is no longer supported. Important: If you plan on installing HDP 2.2 or earlier with Ambari 2.1, be sure to use JDK 1.7. Important: If you are using JDK 1.6, you must switch to JDK 1.7 before upgrading to Ambari 2.1.
  • 75. Page 75 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Ambari 2.1 Platform Support (RHEL/CentOS/Oracle Linux 7 and 6, SLES 11, Ubuntu 12, Ubuntu 14, Debian 7) •  Added RHEL/CentOS/Oracle Linux 7 support •  Removed RHEL/CentOS/Oracle Linux 5 support •  Ubuntu + Debian NOT AVAILABLE until the first Ambari 2.1 and HDP 2.3 maintenance releases
  • 76. Page 76 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Ambari 2.1 Database Support: Ambari 2.1 + HDP 2.3 add support for Oracle 12c; SQL Server as the Ambari database is available as a *Tech Preview*
  • 77. Page 77 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Blueprints Challenge Today •  Today, Blueprints need ALL VMs available to provision a cluster •  This can be a challenge when trying to build a large cluster, especially in Cloud environments •  The Blueprints Host Discovery feature allows you to provision a cluster with all, some or no hosts •  When hosts come online and their Agents register with Ambari, Blueprints will automatically put the hosts into the cluster
  • 78. Page 78 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Blueprints Host Discovery (AMBARI-10750): POST /api/v1/clusters/MyCluster/hosts with body [ { "blueprint" : "single-node-hdfs-test2", "host_groups" : [ { "host_group" : "slave", "host_count" : 3, "host_predicate" : "Hosts/cpu_count>1" } ] } ]
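A hedged sketch of submitting that request with Python; the Ambari host, port and credentials are placeholders, while the URL and body come from the example above (the X-Requested-By header is required by the Ambari REST API for modifying calls).

import json
import requests

# Placeholders for the Ambari server and credentials.
AMBARI = "http://ambari-host.example.com:8080"
AUTH = ("admin", "admin")

body = [{
    "blueprint": "single-node-hdfs-test2",
    "host_groups": [{
        "host_group": "slave",
        "host_count": 3,
        "host_predicate": "Hosts/cpu_count>1"
    }]
}]

resp = requests.post(AMBARI + "/api/v1/clusters/MyCluster/hosts",
                     auth=AUTH,
                     headers={"X-Requested-By": "ambari"},  # required on Ambari write calls
                     data=json.dumps(body))
print(resp.status_code, resp.text)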
  • 79. Page 79 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Guided Configurations •  Improved layout and grouping of configurations •  New UI controls to make it easier to set values •  Better recommendations and cross-service dependency checks •  Implemented for HDFS, YARN, HBase and Hive •  Driven by Stack definition
  • 80. Page 80 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Alert Changes Alerts Log (AMBARI-10249) •  Alert state changes are written to /var/log/ambari-server/ambari-alerts.log Script-based Alert Notifications (AMBARI-9919) •  Define a custom script-based notification dispatcher •  Executed on alert state changes •  Only available via API 2015-07-13 14:58:03,744 [OK] [ZOOKEEPER] [zookeeper_server_process] (ZooKeeper Server Process) TCP OK - 0.000s response on port 2181 2015-07-13 14:58:03,768 [OK] [HDFS] [datanode_process_percent] (Percent DataNodes Available) affected: [0], total: [1]
  • 81. Page 81 © Hortonworks Inc. 2011 – 2015. All Rights Reserved HDFS Topology Script + Host Mappings: set the Rack ID from Ambari; Ambari generates and distributes the topology script plus a host mappings file, and sets the core-site “net.topology.script.file.name” property used by HDFS and YARN. If you modify a Rack ID, the script and mappings are regenerated and redistributed.
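Ambari generates and distributes the topology script for you; purely for reference, a minimal hand-rolled script of the kind net.topology.script.file.name points at might look like the sketch below. The mappings file path, its one-pair-per-line format and the default rack name are assumptions, not the exact format Ambari emits.

#!/usr/bin/env python
# Minimal rack-topology script: Hadoop invokes it with one or more host names
# or IP addresses as arguments and expects one rack path per argument on stdout.
import sys

MAPPING_FILE = "/etc/hadoop/conf/topology_mappings.data"  # hypothetical path
DEFAULT_RACK = "/default-rack"

def load_mappings(path):
    # Hypothetical format: one "host rack" pair per line,
    # e.g. "datanode1.example.com /rack01"
    mappings = {}
    try:
        with open(path) as f:
            for line in f:
                parts = line.split()
                if len(parts) == 2:
                    mappings[parts[0]] = parts[1]
    except IOError:
        pass  # fall back to the default rack if the file is missing
    return mappings

if __name__ == "__main__":
    mappings = load_mappings(MAPPING_FILE)
    for host in sys.argv[1:]:
        print(mappings.get(host, DEFAULT_RACK))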
  • 82. Page 82 © Hortonworks Inc. 2011 – 2015. All Rights Reserved New User Views Capacity Scheduler View Browse + manage YARN queues Tez View View information related to Tez jobs that are executing on the cluster.
  • 83. Page 83 © Hortonworks Inc. 2011 – 2015. All Rights Reserved New User Views Pig View Author and execute Pig Scripts. Hive View Author, execute and debug Hive queries. Files View Browse HDFS file system.
  • 84. Page 84 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Separate Ambari Servers •  For Hadoop Operators: deploy Views in the Ambari Server that is managing a Hadoop cluster •  For Data Workers: run Views in a “standalone” Ambari Server. Operators manage the HDP cluster (store & process) and may have Views deployed; Data Workers use the cluster and use a “standalone” Ambari Server for Views.
  • 85. Page 85 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Views <-> Cluster Communications: deployed Views talk with the HDP cluster using REST APIs (as applicable). (Diagram: one or more standalone Ambari Server instances, backed by an Ambari DB and LDAP authentication behind a proxy, communicate with the HDP cluster.) Important: It is NOT a requirement to operate your cluster with Ambari to use Views with your cluster.
  • 86. Page 86 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Upgrading Ambari 2.1 – 1. Prepare: perform the preparation steps, which include making backups of critical cluster metadata. 2. Stop Ambari: on all hosts in the cluster, stop the Ambari Server and Ambari Agents. 3. Upgrade Ambari Server + Agents: on the host running Ambari Server, upgrade the Ambari Server; on all hosts in the cluster, upgrade the Ambari Agent. 4. Upgrade the Ambari Schema: on the host running Ambari Server, upgrade the Ambari Server database schema. 5. Complete + Start: complete any post-upgrade tasks (such as LDAP setup, database driver setup).
  • 87. Page 87 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Ambari Upgrade Tips •  After the Ambari upgrade, you will see prompts to restart services: because of the new guided configurations, Ambari has added new configuration properties to the services. •  Review the changes by comparing config versions. •  Use the config filter to identify any config issues. •  Do not change to JDK 1.8 until you are running HDP 2.3. •  HDP 2.3 is the ONLY version of HDP that is certified and supported with JDK 1.8. •  Before upgrading to HDP 2.3, you must upgrade to Ambari 2.1 first. •  Be sure your cluster has landed on Ambari 2.1 cleanly and is working properly. •  Recommendation: schedule the Ambari upgrade separately from the HDP upgrade
  • 88. Page 88 © Hortonworks Inc. 2011 – 2015. All Rights Reserved HDP Upgrade Options •  MAINTENANCE UPGRADE (HDP 2.2.x -> HDP 2.2.y, or HDP 2.3.x -> HDP 2.3.y): Rolling Upgrade OR Manual “Stop the World” •  2.2 -> 2.3 MINOR UPGRADE (HDP 2.2.x -> HDP 2.3.y): Rolling Upgrade OR Manual “Stop the World” •  2.0/2.1 -> 2.3 UPGRADE (HDP 2.0/2.1 -> HDP 2.3.y): Manual “Stop the World” (not available at GA; must go to HDP 2.2 FIRST)
  • 89. Page 89 © Hortonworks Inc. 2011 – 2015. All Rights Reserved New In Apache Ranger
  • 90. Page 90 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Security today in HDP (Enterprise Services: Security) •  Authentication (Who am I / prove it?): Kerberos; API security with Apache Knox •  Authorization (What can I do?): fine-grained access control with Apache Ranger •  Audit (What did I do?): centralized audit reporting with Apache Ranger •  Data Protection (Can data be encrypted at rest and over the wire?): wire encryption in Hadoop; native and partner encryption •  HDP 2.3: centralized security administration with Ranger
  • 91. Page 91 © Hortonworks Inc. 2011 – 2015. All Rights Reserved HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION Security items planned in HDP 2.3 New Components Support •  Ranger to support authorization and auditing for Solr, Kafka and YARN Extending Security •  Hooks for creating dynamic policy conditions •  Protect metadata in Hive •  Introduce Ranger KMS to support HDFS Transparent Encryption –  UI to manage policies for key management Auditing Changes •  Ranger to support queries for audit stored in HDFS, using Solr •  Optimization of auditing at the source
  • 92. Page 92 © Hortonworks Inc. 2011 – 2015. All Rights Reserved HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION Security items planned in HDP 2.3 Extensible Architecture •  Pluggable architecture for Ranger – Ranger Stacks •  Configuration-driven addition of new components – Knox Stacks Enterprise Readiness •  Knox to support LDAP caching •  Knox to support two-way SSL queries •  Ranger to support PostgreSQL and MS SQL Server databases for storing policy data •  Ranger permission changes
  • 93. Page 93 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Kafka Security •  Kafka now supports authentication using Kerberos •  Kafka also supports per-topic ACLs for authorization by user/group •  The following permissions are supported through Ranger –  Publish –  Consume –  Create –  Delete –  Configure –  Describe –  Replicate –  Connect
  • 94. Page 94 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Solr Security §  Apache Solr now supports authentication using Kerberos §  Apache Solr also supports collection-level ACLs for authorization §  The following permissions are supported through Ranger, at the collection level §  Query §  Update §  Admin
  • 95. Page 95 © Hortonworks Inc. 2011 – 2015. All Rights Reserved YARN Integration •  YARN supports ACLs for queue submission •  Ranger is now integrated with the YARN ResourceManager to manage these permissions from Ranger •  The following permissions are supported through Ranger •  Submit-app •  Admin-queue
  • 96. Page 96 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Dynamic Policy Conditions •  Currently Ranger supports static, role-based policy controls •  Users are looking for dynamic attributes such as geo, time and data attributes to drive policy decisions •  Ranger has introduced hooks for these dynamic conditions
  • 97. Page 97 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Dynamic Policy Hooks – Config •  Conditions can be added as part of the service definition, and the available conditions can vary by service (HDFS, Hive, etc.)
  • 98. Page 98 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Protect Metadata in Hiveserver2 §  In Hive, metadata listing can be protected by underlying permissions §  Following commands are protected §  Show Databases §  Show Tables §  Describe table §  Show Columns
  • 99. Page 99 © Hortonworks Inc. 2011 – 2015. All Rights Reserved HDFS Transparent Encryption (HDP 2.2). Architecture: the HDFS client reads and writes encrypted files through a crypto stream using a per-file DEK; the NameNode stores only the encrypted DEK (EDEK) along with encryption zone attributes (EZKey ID, version) and encrypted file attributes (EDEK, IV); the client, NameNode and Key Management System (KMS) all interact through the KeyProvider API, and the KMS holds the EZ keys and unwraps EDEKs into DEKs. In HDP 2.2 this is the open-source KMS based on file-level key storage. Acronyms: EZ – Encryption Zone (an HDFS directory); EZK – Encryption Zone Key, the master key associated with all files in an EZ; DEK – Data Encryption Key, a unique key associated with each file (the EZ key is used to generate the DEK); EDEK – Encrypted DEK (the NameNode only has access to the encrypted DEK); IV – Initialization Vector.
  • 100. Page 100 © Hortonworks Inc. 2011 – 2015. All Rights Reserved HDFS Encryption in HDP 2.3: the same architecture as the previous slide, with Ranger KMS taking the place of the file-based KMS. The HDFS client, NameNode and Ranger KMS still interact through the KeyProvider API (EDEKs at the NameNode, DEKs used by the client crypto stream, encryption zone and encrypted file attributes unchanged), but the encryption zone keys are now kept in database (DB) storage managed by Ranger KMS.
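For context, a hedged sketch of how an encryption zone is typically created once a KMS (Ranger KMS in HDP 2.3) is in place, driven from Python via the standard hadoop/hdfs CLI; the key name and path are placeholders, and the commands assume appropriate key-admin and HDFS privileges.

import subprocess

KEY_NAME = "ez-key-demo"    # placeholder key name
ZONE_PATH = "/apps/secure"  # placeholder HDFS directory

# 1. Create an encryption-zone key in the KMS (served through the KeyProvider API).
subprocess.check_call(["hadoop", "key", "create", KEY_NAME])

# 2. Create the directory that will become the encryption zone.
subprocess.check_call(["hdfs", "dfs", "-mkdir", "-p", ZONE_PATH])

# 3. Mark the directory as an encryption zone tied to the key; files written
#    underneath are then encrypted with per-file DEKs wrapped by this EZ key.
subprocess.check_call(["hdfs", "crypto", "-createZone", "-keyName", KEY_NAME, "-path", ZONE_PATH])

# 4. Verify the zone was created.
subprocess.check_call(["hdfs", "crypto", "-listZones"])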
  • 101. Page 101 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Audit setup in HDP 2.2 – Simplified View: the Ranger plugin embedded in each Hadoop component writes audit events to an RDBMS and to HDFS, and the Ranger Administration Portal runs its audit queries against the RDBMS (which also holds the Ranger Policy DB).
  • 102. Page 102 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Audit setup in HDP 2.3 – Solr-Based Query: the Ranger Administration Portal now runs audit queries through Solr, while the Ranger plugin in each Hadoop component continues to write audit events to HDFS (the RDBMS still holds the Ranger Policy DB). Why is it important? •  Scalable approach •  Removes the dependency on the DB for audit •  Ability to use Banana for dashboards
  • 103. Page 103 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Lab https://github.com/abajwa-hw/hdp22-hive-streaming/blob/master/LAB-STEPS.md
  • 104. Page 104 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Lab Overview §  Tenants §  Groups - IT & Marketing §  Users – it1 (IT) & mktg1 (Marketing) §  Responsibility §  IT – Onboard Data & Manage Security §  Marketing – Analyze Data §  Lab Environment §  Using HDP 2.3 Sandbox §  Linux and Ranger users it1 and mktg1 pre-created §  Global Allow policy set in Ranger
  • 105. Page 105 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Lab Steps §  Step 1 §  Create hdfs directories for users it1 and mktg1 §  Step 2 §  Disable Ranger Global Allow Policy §  Enable hdfs & hive permissions for it1 §  Step 3 §  Create interactive and batch queues in YARN §  Assign user it1 to batch queue and mktg1 to default queue §  Step 4 §  Create ambari users it1 and mktg1 and enable hive views §  Step 5 §  Load data as it1 §  Step 6 §  Enable table access for mktg1 §  Step 7 §  Query data as mktg1
  • 106. Page 106 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Thank You
  • 107. Page 107 © Hortonworks Inc. 2011 – 2015. All Rights Reserved This presentation contains forward-looking statements involving risks and uncertainties. Such forward- looking statements in this presentation generally relate to future events, our ability to increase the number of support subscription customers, the growth in usage of the Hadoop framework, our ability to innovate and develop the various open source projects that will enhance the capabilities of the Hortonworks Data Platform, anticipated customer benefits and general business outlook. In some cases, you can identify forward-looking statements because they contain words such as “may,” “will,” “should,” “expects,” “plans,” “anticipates,” “could,” “intends,” “target,” “projects,” “contemplates,” “believes,” “estimates,” “predicts,” “potential” or “continue” or similar terms or expressions that concern our expectations, strategy, plans or intentions. You should not rely upon forward-looking statements as predictions of future events. We have based the forward-looking statements contained in this presentation primarily on our current expectations and projections about future events and trends that we believe may affect our business, financial condition and prospects. We cannot assure you that the results, events and circumstances reflected in the forward-looking statements will be achieved or occur, and actual results, events, or circumstances could differ materially from those described in the forward-looking statements. The forward-looking statements made in this prospectus relate only to events as of the date on which the statements are made and we undertake no obligation to update any of the information in this presentation. Trademarks Hortonworks is a trademark of Hortonworks, Inc. in the United States and other jurisdictions. Other names used herein may be trademarks of their respective owners. Page 107 © Hortonworks Inc. 2011 – 2015. All Rights Reserved