1 © Hortonworks Inc. 2011–2018. All rights reserved
Hadoop Operations –
Past, Present, and Future
Santhosh B Gowda
Feb 2019
Disclaimer
This document may contain product features and technology directions that are under
development, may be under development in the future or may ultimately not be developed.
Project capabilities are based on information that is publicly available within the Apache Software
Foundation project websites ("Apache"). Progress of the project capabilities can be tracked from
inception to release through Apache; however, technical feasibility, market demand, user
feedback and the overarching Apache Software Foundation community development process can
all affect timing and final delivery.
This document’s description of these features and technology directions does not represent a
contractual commitment, promise or obligation from Hortonworks to deliver these features in
any generally available product.
Product features and technology directions are subject to change, and must not be included in
contracts, purchase orders, or sales agreements of any kind.
Since this document contains an outline of general product development plans, customers should
not rely upon it when making purchasing decisions.
Agenda
• Hadoop Operations: Ambari
• Hadoop Operations: Data Challenge
• Cloud Key Considerations
• Cloudbreak
  • What Is Cloudbreak?
  • Custom Images
  • Kerberos Security
  • Recipes
  • Auto Scaling
What Is Apache Ambari?
A completely open source management platform for provisioning, managing, monitoring, and securing Apache Hadoop clusters.
Apache Ambari takes the guesswork out of operating Hadoop.
Hadoop Operations - Ambari
Simplified Installation, Configuration and Management
• Wizard-driven and automated cluster provisioning
• Smart Configurations and cluster recommendations
• Automated Rolling and Express cluster upgrades
Centralized Security Setup
• Reduce the complexity of administering security across the platform
• Automate Kerberos setup
• Simplify the configuration of Apache Ranger
Full Visibility into Cluster Health
• Predefined alerts based on operational best practices
• Advanced metrics visualization with Grafana
• Integrated with SmartSense for proactive issue prevention
Highly Extensible and Customizable
• Seamlessly fit into your enterprise environment
• Bring custom services under management via Ambari Stacks
• Customize the UI with Ambari Views
Early Adopters
[Figure: two early-adopter deployment patterns, each running Ambari, HDFS, YARN, Hive, and Spark with shared services (Atlas, Ranger, Metastore, Knox): (1) Large Shared Workloads, supported by Shared Services, On-Premise; (2) Long-Running Cluster on Cloud IaaS, using public cloud storage and compute.]
Hadoop Operations: Data Challenge
• Data is becoming more and more distributed…
• Across data center and cloud environments…
• Accessed using multi- and single-workload clusters…
• But must be discoverable and accessed by all who seek it.
[Figure: The Virtual Data Lake, with many clusters spread across the data center and multiple clouds]
• Business User: very difficult to find data (leading to inefficient use of time)
• Platform Operator: hard to secure and hard to operate (can be time consuming and prone to error)
Cloud: Key Considerations
• Cloud is infrastructure… need a Data Strategy
• Hybrid (on-premise & cloud) requirements are real.
• Multi-Cloud (i.e. portability) is a key emerging requirement
  • Logistics & Physics
  • Regulatory & Compliance
  • Economic arbitrage
• Consistent and familiar Security & Governance across on-premise & cloud environments
• Free movement of data, regardless of origin or destination
• Global data catalog, regardless of location
Data Management across On-Prem & Multi-Cloud
[Figure: Hortonworks DataPlane Service spanning (1) Large Shared Workloads, supported by Shared Services, On-Premise (Ambari, HDFS, YARN, Hive, Spark; Atlas, Ranger, Metastore, Knox) and (2) Multiple Ephemeral Workloads, supported by Shared Services, Multi-Cloud, on Public Cloud A and Public Cloud B. Each cloud has its own storage and compute, Cloudbreak, shared Atlas, Ranger, Metastore, and Knox, and several Ambari-managed clusters running YARN, Hive LLAP, Spark, and NiFi.]
Hortonworks: Architecting and Optimizing for the Cloud
• Cloud Storage (Durable): When data resides in cloud object stores (e.g. Amazon S3), Hadoop optimizes reads/writes and acts as an intermediate cache to increase performance and decrease latency.
• Workloads (Ephemeral and Long Running): Secure access to workload clusters via a Protected Gateway enabled for authentication (AuthN) and HTTPS.
• Shared Data Lake Services (Metastore: schema; Atlas: catalog; Ranger: policy): Define your data schema, security policies, and metadata catalog once for your ephemeral and always-on workloads.
Cloudbreak
What Is Cloudbreak?
Cloudbreak is a tool for provisioning Hadoop clusters on cloud infrastructure across multiple cloud providers.
Simplified cluster provisioning: prescriptive setup and simple automation.
Cloudbreak: Harness the Agility of Cloud with Ease
• Declarative workload provisioning across multiple cloud providers
• Flexible topologies and security configuration options
• DevOps friendly, easy setup and simple to automate
• Built-in elasticity and auto-scaling
• Prescriptive integration with cloud services
Cloudbreak Building Blocks
• Cloud Credentials
• Ambari Blueprints
• Auto Scaling
• Custom Recipes
• Custom Images
• Network
• Gateway
• Kerberos Security
• Dynamic Blueprints
• Cloud Storage
Simple and Flexible | Prescriptive | Secure
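An Ambari blueprint is a JSON document that names the stack and declares host groups, each listing the components its hosts run; Cloudbreak uses it to drive the Ambari cluster install. The sketch below builds a minimal blueprint as a Python dictionary and serializes it. The blueprint name, stack version, host group names, and component layout are hypothetical and only illustrate the structure; this is not a validated HDP topology.

```python
import json


def make_blueprint(name: str, stack_version: str) -> dict:
    """Build a minimal Ambari blueprint with one master and one worker host group.

    The component layout is an illustrative assumption, not a complete HDP cluster.
    """
    return {
        "Blueprints": {
            "blueprint_name": name,
            "stack_name": "HDP",
            "stack_version": stack_version,
        },
        "configurations": [],
        "host_groups": [
            {
                "name": "master",
                "cardinality": "1",
                "components": [
                    {"name": "NAMENODE"},
                    {"name": "RESOURCEMANAGER"},
                    {"name": "ZOOKEEPER_SERVER"},
                ],
            },
            {
                "name": "worker",
                "cardinality": "3",
                "components": [
                    {"name": "DATANODE"},
                    {"name": "NODEMANAGER"},
                ],
            },
        ],
    }


if __name__ == "__main__":
    # Print the blueprint as the JSON text that would be registered.
    print(json.dumps(make_blueprint("example-hdp", "3.0"), indent=2))
```

Keeping the blueprint as plain data makes it easy to generate variants (for example, different worker cardinalities) from scripts, which fits the "simple to automate" goal above.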
Custom Images
Background: Cloudbreak
1. Cloudbreak creates VM instances using a default base image.
2. Cloudbreak installs Ambari on a VM instance.
3. Cloudbreak instructs Ambari to install a cluster on the remaining VM instances.
[Figure: Cloudbreak provisions a cluster made up of nodes, each running on its own VM instance.]
Custom Images Overview
1. Create the custom image.
2. Register the custom image.
3. Use the custom image when creating a cluster.
Recipes
Background: Recipes
• Cloudbreak lets you provision clusters using Ambari blueprints; however, blueprints cannot address every use case, for example:
  • Installing additional software
  • Making system configuration changes
• A recipe is a script that runs on all nodes of a selected node group at a specific time.
• Bash and Python scripts are supported.
• Available hooks:
  • pre-ambari-start
  • post-ambari-start
  • post-cluster-install
  • pre-termination
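To make the hook model concrete, the sketch below orders the recipes attached to one node group by the hook they run on. The `Recipe` class, the `plan_recipes` helper, and the assumption that hooks fire in the order listed above are illustrative; Cloudbreak itself executes the scripts on the nodes, and this code only models which scripts would run and when.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hook names from the slide, in the order they occur during a cluster's lifecycle (assumed).
HOOK_ORDER = [
    "pre-ambari-start",
    "post-ambari-start",
    "post-cluster-install",
    "pre-termination",
]


@dataclass(frozen=True)
class Recipe:
    """Hypothetical model of a recipe: a script bound to one hook."""
    name: str
    hook: str
    script: str  # bash or python source text


def plan_recipes(attached: Dict[str, List[Recipe]], node_group: str) -> List[Tuple[str, str]]:
    """Return (hook, recipe name) pairs for one node group, sorted by hook order.

    `attached` maps a node group name to the recipes selected for it.
    Python's sort is stable, so recipes on the same hook keep their attachment order.
    """
    recipes = attached.get(node_group, [])
    for r in recipes:
        if r.hook not in HOOK_ORDER:
            raise ValueError(f"unknown hook {r.hook!r} in recipe {r.name!r}")
    ordered = sorted(recipes, key=lambda r: HOOK_ORDER.index(r.hook))
    return [(r.hook, r.name) for r in ordered]
```

For example, a worker group with a `pre-termination` cleanup recipe and a `pre-ambari-start` package install would run the install first; a node group with no attached recipes yields an empty plan.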
Cloudbreak: Add Recipes
• Cluster Extensions > Recipes > Create
• Add the recipe as a file, a URL, or text
Cloudbreak: Add Recipes
• Clusters > Create Cluster > Cluster Extensions
Kerberos Security
Background: Kerberos
• Strongly authenticating and establishing a user’s identity is the basis for secure access in
Hadoop. Users need to be able to reliably “identify” themselves and then have that
identity propagated throughout the Hadoop cluster.
• Once this is done, those users can access resources (such as files or directories) or
interact with the cluster (such as running MapReduce jobs).
• Besides users, Hadoop cluster resources themselves (such as hosts and services) need
to authenticate with each other to prevent malicious systems or daemons from
“posing as” trusted components of the cluster to gain access to data.
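Both users and services authenticate as Kerberos principals, written in the standard form `primary/instance@REALM`: service principals typically use the host as the instance (for example, a NameNode service on a specific host), while user principals usually omit it. The sketch below parses that format; the example host and realm names are made up, and the parser deliberately ignores escaped `/` and `@` characters.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str             # user name or service name, e.g. "nn"
    instance: Optional[str]  # usually a host for service principals; None for most users
    realm: str               # Kerberos realm, conventionally upper case


def parse_principal(text: str) -> Principal:
    """Split a principal of the form primary[/instance]@REALM into its parts.

    Simplified: does not handle escaped '/' or '@' characters.
    """
    name, sep, realm = text.rpartition("@")
    if not sep or not name or not realm:
        raise ValueError(f"missing realm in principal {text!r}")
    primary, slash, instance = name.partition("/")
    if not primary or (slash and not instance):
        raise ValueError(f"malformed principal {text!r}")
    return Principal(primary, instance if slash else None, realm)
```

For example, `parse_principal("nn/master1.example.com@EXAMPLE.COM")` separates the service name, the host it runs on, and the realm, which is exactly the identity a daemon presents when it authenticates to other cluster components.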
Cloudbreak: Enable Kerberos Security
• Create Cluster > Security > Advanced
• Select the Enable Kerberos Security checkbox
Options: Use Existing KDC or Use Test KDC
• Use Test KDC
  - Not for production use. For testing and evaluation purposes only.
  - Installs and configures an MIT KDC on the master node.
  - Configures the cluster to leverage that KDC.
• Use Existing KDC (Basic)
  - Provide basic information about your existing KDC.
  - Ambari Kerberos descriptors are generated automatically.
• Use Existing KDC (Advanced)
  - Provide basic information about your existing KDC.
  - Provide your own Ambari Kerberos descriptors.
Auto Scaling
Auto-Scaling
• Alerts: Create metric-based or time-based alerts for cluster scaling
• Policies: Scaling policies adjust cluster size based on activity and workload alerts
• General Configurations: Boundaries (minimum and maximum cluster size) and cooldown period
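The general configuration acts as a guard around every scaling decision. Assuming a policy has already proposed a new node count, the sketch below clamps it to the configured boundaries and ignores proposals that arrive inside the cooldown window after the last change. `ScalingGuard` and its fields are hypothetical names, not Cloudbreak's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ScalingGuard:
    """Hypothetical guard that applies size boundaries and a cooldown period."""
    min_size: int
    max_size: int
    cooldown: timedelta
    last_scaled_at: Optional[datetime] = None

    def apply(self, current: int, proposed: int, now: datetime) -> int:
        """Return the node count to use; returns `current` when no change is allowed."""
        # Still cooling down from the previous scaling event: keep the current size.
        if self.last_scaled_at is not None and now - self.last_scaled_at < self.cooldown:
            return current
        # Clamp the proposal into [min_size, max_size].
        target = max(self.min_size, min(self.max_size, proposed))
        if target != current:
            # Only an actual size change starts a new cooldown window.
            self.last_scaled_at = now
        return target
```

The cooldown prevents a cluster from oscillating when alerts keep firing while newly added nodes are still joining.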
Auto-Scaling Time-Based Alert
Fires at 10:15 AM every day
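A time-based alert like this is a recurring daily schedule. The sketch below computes the next firing time for a daily alert at a fixed time of day. It is illustrative only: `next_daily_fire` is a hypothetical helper, it uses naive datetimes (no time zone or daylight-saving handling), and it does not reproduce how Cloudbreak stores or evaluates schedules.

```python
from datetime import datetime, time, timedelta


def next_daily_fire(now: datetime, at: time) -> datetime:
    """Return the next moment at or after `now` when a daily alert at `at` fires.

    Naive datetimes only: time zones and DST transitions are ignored.
    """
    candidate = datetime.combine(now.date(), at)
    if candidate < now:
        # Today's slot has already passed, so the alert fires tomorrow.
        candidate += timedelta(days=1)
    return candidate
```

For example, with `at=time(10, 15)`, a check at 09:00 returns 10:15 the same day, while a check at 11:00 returns 10:15 the following day.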
Auto-Scaling Metric-Based Alert
Fires after NodeManagers have been CRITICAL for 10 minutes
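A metric-based alert of this kind fires only when a condition persists. Assuming the alert state is sampled as a time-ordered list of `(timestamp, state)` pairs, the sketch below reports whether the state has been CRITICAL continuously for at least a given period. The sampling model and the `critical_for` helper are hypothetical; Ambari and Cloudbreak evaluate alerts with their own machinery.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence, Tuple

Sample = Tuple[datetime, str]  # (observation time, alert state such as "OK" or "CRITICAL")


def critical_for(samples: Sequence[Sample], now: datetime, period: timedelta) -> bool:
    """True if the latest unbroken run of CRITICAL samples began at least `period` before `now`.

    `samples` must be sorted by time; any non-CRITICAL sample resets the run.
    """
    run_start: Optional[datetime] = None
    for ts, state in samples:
        if state == "CRITICAL":
            if run_start is None:
                run_start = ts  # a new CRITICAL streak begins here
        else:
            run_start = None    # the streak is broken
    return run_start is not None and now - run_start >= period
```

Requiring a sustained state rather than a single CRITICAL sample keeps a brief NodeManager blip from triggering a scaling policy.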
Auto-Scaling Policies
• Define the scale adjustment (Node Count, Percentage, Exact)
• Select the host group to scale
• Select the alert that, when fired, executes the policy
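The three adjustment types can be read as three ways of turning a fired policy into a proposed size for the selected host group. The sketch below is one plausible interpretation, assumed rather than taken from Cloudbreak: Node Count adds (or, if negative, removes) a fixed number of nodes, Percentage changes the size by a percentage of the current size with the change truncated toward zero, and Exact sets the size directly. `AdjustmentType` and `target_size` are hypothetical names.

```python
from enum import Enum


class AdjustmentType(Enum):
    NODE_COUNT = "node_count"  # add or remove a fixed number of nodes
    PERCENTAGE = "percentage"  # change size by a percentage of the current size
    EXACT = "exact"            # set the host group to an exact size


def target_size(current: int, kind: AdjustmentType, value: int) -> int:
    """Compute the proposed host group size for one policy firing.

    A negative `value` scales down for NODE_COUNT and PERCENTAGE. The percentage
    change is truncated toward zero (an assumption), and the result is never negative.
    """
    if kind is AdjustmentType.NODE_COUNT:
        proposed = current + value
    elif kind is AdjustmentType.PERCENTAGE:
        proposed = current + int(current * value / 100)
    elif kind is AdjustmentType.EXACT:
        proposed = value
    else:
        raise ValueError(f"unsupported adjustment type: {kind}")
    return max(0, proposed)
```

For a 10-node host group, +25% proposes 12 nodes (2.5 truncated to 2) and -25% proposes 8. In practice the proposal would still be subject to the boundaries and cooldown period from the general configuration.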
Learn More
• Try Ambari
  • https://docs.hortonworks.com/HDPDocuments/Ambari/Ambari-2.7.0.0/index.html
• Try Cloudbreak 2.8 (TP)
  • https://docs.hortonworks.com/HDPDocuments/Cloudbreak/Cloudbreak-2.8.0/index.html
Questions?
Thank you!

Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 

Recently uploaded (20)

A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 

Hadoop Operations – Past, Present, and Future

  • 1. Hadoop Operations – Past, Present, and Future
    Santhosh B Gowda, Feb 2019
  • 2. Disclaimer
    This document may contain product features and technology directions that are under development, may be under development in the future or may ultimately not be developed. Project capabilities are based on information that is publicly available within the Apache Software Foundation project websites ("Apache"). Progress of the project capabilities can be tracked from inception to release through Apache; however, technical feasibility, market demand, user feedback and the overarching Apache Software Foundation community development process can all affect timing and final delivery. This document’s description of these features and technology directions does not represent a contractual commitment, promise or obligation from Hortonworks to deliver these features in any generally available product. Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind. Since this document contains an outline of general product development plans, customers should not rely upon it when making purchasing decisions.
  • 3. Agenda
      • Hadoop Operations: Ambari
      • Hadoop Operations: Data Challenge
      • Cloud: Key Considerations
      • Cloudbreak
          • What Is Cloudbreak?
          • Custom Images
          • Kerberos Security
          • Recipes
          • Auto Scaling
  • 4. What Is Apache Ambari?
    A completely open source management platform for provisioning, managing, monitoring and securing Apache Hadoop clusters. Apache Ambari takes the guesswork out of operating Hadoop.
  • 5. Hadoop Operations - Ambari
      • Simplified Installation, Configuration and Management
          • Wizard-driven and automated cluster provisioning
          • Smart configurations and cluster recommendations
          • Automated Rolling and Express cluster upgrades
      • Centralized Security Setup
          • Reduce the complexity of administering security across the platform
          • Automate Kerberos setup
          • Simplify the configuration of Apache Ranger
      • Full Visibility into Cluster Health
          • Predefined alerts based on operational best practices
          • Advanced metrics visualization with Grafana
          • Integrated with SmartSense for proactive issue prevention
      • Highly Extensible and Customizable
          • Seamlessly fit into your enterprise environment
          • Bring custom services under management via Ambari Stacks
          • Customize the UI with Ambari Views
  • 6. Early Adopters
    [Diagram: two deployment patterns, each running Ambari, HDFS, YARN, Hive and Spark with the shared services Atlas, Ranger, Metastore and Knox: (1) large shared workloads, supported by shared services, on-premise; (2) a long-running cluster on cloud IaaS, backed by public cloud storage and public cloud compute.]
  • 7. Hadoop Operations: Data Challenge
      • Data is becoming more and more distributed…
      • Across data center and cloud environments…
      • Accessed using multi- and single-workload clusters…
      • But must be discoverable and accessible to all who seek it.
    [Diagram: "The Virtual Data Lake", with many clusters spread across the DATA CENTER and multiple CLOUDS.]
      • Business User: very difficult to find data (leading to inefficient use of time)
      • Platform Operator: hard to secure and hard to operate (can be time-consuming and prone to error)
  • 8. Cloud: Key Considerations
      • Cloud is infrastructure… you need a data strategy
      • Hybrid (on-premise & cloud) requirements are real.
      • Multi-cloud (i.e. portability) is a key emerging requirement
          • Logistics & physics
          • Regulatory & compliance
          • Economic arbitrage
      • Consistent and familiar security & governance across on-premise & cloud environments
      • Free movement of data, regardless of origin or destination
      • Global data catalog, regardless of location
  • 9. Data Management across On-Prem & Multi-Cloud
    [Diagram: Hortonworks DataPlane Service spanning (1) large shared workloads, supported by shared services, on-premise (Ambari, HDFS, YARN, Hive, Spark; Atlas, Ranger, Metastore, Knox), and (2) multiple ephemeral workloads, supported by shared services, on Public Cloud A and Public Cloud B. Each cloud has its own storage and compute, Cloudbreak, several Ambari-managed YARN clusters running Hive LLAP, NiFi and Spark, and shared Atlas, Ranger, Metastore and Knox services.]
  • 10. Hortonworks: Architecting and Optimizing for the Cloud
      • CLOUD STORAGE (durable): When data resides in cloud object stores (e.g. Amazon S3), Hadoop optimizes reads/writes and acts as an intermediate cache to increase performance and decrease latency.
      • WORKLOADS (ephemeral and long-running): Secure access to workload clusters via a protected gateway enabled for AuthN and HTTPS.
      • SHARED DATA LAKE SERVICES (Metastore for SCHEMA, Atlas for CATALOG, Ranger for POLICY): Define your data schema, security policies, and metadata catalog once for your ephemeral and always-on workloads.
  • 11. Cloudbreak
  • 12. What Is Cloudbreak?
    Cloudbreak is a tool for provisioning Hadoop clusters on any cloud infrastructure.
    Simplified cluster provisioning: prescriptive setup, simple automation.
  • 13. Cloudbreak: Harness the Agility of Cloud with Ease
      • Declarative workload provisioning across multiple cloud providers
      • Flexible topologies and security configuration options
      • DevOps friendly, easy setup and simple to automate
      • Built-in elasticity and auto-scaling
      • Prescriptive integration with cloud services
  • 14. Cloudbreak Building Blocks (Simple and Flexible, Prescriptive, Secure)
      • Cloud Credentials
      • Ambari Blueprints
      • Auto Scaling
      • Custom Recipes
      • Custom Images
      • Network
      • Gateway
      • Kerberos Security
      • Dynamic Blueprints
      • Cloud Storage
  • 15. Custom Images
  • 16. Background: Cloudbreak
      1. Cloudbreak creates VM instances using a default base image.
      2. Cloudbreak installs Ambari on one of the VM instances.
      3. Cloudbreak instructs Ambari to install a cluster on the remaining VM instances.
    [Diagram: Cloudbreak provisioning a cluster of node VMs.]
  • 17. Custom Images Overview
      1. Create the custom image.
      2. Register the custom image.
      3. Use the custom image when creating a cluster.
  • 18. Recipes
  • 19. Background: Recipes
      • Cloudbreak lets you provision clusters using Ambari blueprints; however, blueprints cannot address every use case, such as:
          • Installing additional software
          • Making system configuration changes
      • A recipe is a script that runs on all nodes of a selected node group at a specific time.
      • Bash and Python scripts are supported.
      • Available hooks:
          • Pre-ambari-start
          • Post-ambari-start
          • Post-cluster-install
          • Pre-termination
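As an illustration of the "system config changes" use case, here is a minimal sketch of the core of a Python recipe, assuming it would run as root on every node of the selected node group (for example at the Pre-ambari-start hook). The file path `/etc/sysctl.d/99-hadoop.conf`, the `vm.swappiness` setting and the `ensure_line` helper are hypothetical examples, not something the slides prescribe. Because a recipe runs once per node, the sketch is written to be idempotent: re-running it leaves the file unchanged.

```python
import os

# Hypothetical example values, not taken from the slides.
CONFIG_PATH = "/etc/sysctl.d/99-hadoop.conf"
SETTING = "vm.swappiness = 1"


def ensure_line(path, line):
    """Append `line` to the file at `path` unless an identical line exists.

    Returns True if the file was changed and False if it was already
    configured, so running the recipe twice on the same node is harmless.
    """
    content = ""
    if os.path.exists(path):
        with open(path) as f:
            content = f.read()
        if line in content.splitlines():
            return False
    with open(path, "a") as f:
        # Start on a fresh line if the existing file lacks a trailing newline.
        if content and not content.endswith("\n"):
            f.write("\n")
        f.write(line + "\n")
    return True
```

On a real node the recipe script would end by calling `ensure_line(CONFIG_PATH, SETTING)`. Writing a file under `/etc/sysctl.d` only persists the setting; applying it to the running kernel is a separate step that this sketch does not attempt.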
  • 20. Cloudbreak: Add Recipes
      • Cluster Extensions > Recipes > Create
      • Add the recipe as a file, URL, or text
  • 21. Cloudbreak: Add Recipes
      • Clusters > Create Cluster > Cluster Extensions
  • 22. Kerberos Security
  • 23. Background: Kerberos
      • Strongly authenticating and establishing a user’s identity is the basis for secure access in Hadoop. Users need to be able to reliably “identify” themselves and then have that identity propagated throughout the Hadoop cluster.
      • Once this is done, those users can access resources (such as files or directories) or interact with the cluster (such as running MapReduce jobs).
      • Besides users, Hadoop cluster resources themselves (such as hosts and services) need to authenticate with each other, so that potentially malicious systems or daemons cannot pose as trusted components of the cluster to gain access to data.
  • 24. Cloudbreak: Enable Kerberos Security
      • Create Cluster > Security > Advanced
      • Select the "Enable Kerberos Security" checkbox
  • 25. Options: Use Existing KDC or Use Test KDC
      • Use Test KDC
          - Not for production use. For testing and evaluation purposes only.
          - Installs and configures an MIT KDC on the master node.
          - Configures the cluster to leverage that KDC.
      • Use Existing KDC: Basic
          - Provide basic information about your existing KDC.
          - Ambari Kerberos descriptors are generated automatically.
      • Use Existing KDC: Advanced
          - Provide basic information about your existing KDC.
          - Provide your own Ambari Kerberos descriptors.
  • 26. Auto Scaling
  • 27. Auto-Scaling
      • Alerts: create metric-based or time-based alerts for cluster scaling
      • Policies: scaling policies adjust cluster size based on activity and workload alerts
      • General Configurations: boundaries and cooldown period
  • 28. Auto-Scaling: Time-Based Alert
    Example: fire at 10:15 am every day
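A time-based alert boils down to "when is the next moment matching the schedule?". The sketch below computes that for the slide's daily 10:15 example; it is an illustration of the idea, not Cloudbreak's scheduler, and `next_daily_fire` is a hypothetical helper. It uses naive local datetimes, so a real scheduler would also need the configured time zone. In standard five-field cron notation the same schedule would be written `15 10 * * *`; the slide does not show which syntax Cloudbreak expects.

```python
from datetime import datetime, time, timedelta


def next_daily_fire(now, at=time(10, 15)):
    """Return the first datetime at or after `now` whose clock time is `at`.

    If today's occurrence has already passed, the alert fires tomorrow.
    """
    candidate = datetime.combine(now.date(), at)
    if candidate < now:
        candidate += timedelta(days=1)
    return candidate
```

For example, at 09:00 the alert fires at 10:15 the same day; at 11:00 it fires at 10:15 the next day.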
  • 29. Auto-Scaling: Metric-Based Alert
    Example: fire after NodeManagers have been CRITICAL for 10 minutes
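The key detail of a metric-based alert is the duration: a single CRITICAL reading is not enough; the state must persist for the whole period. The minimal sketch below shows that logic over a time-ordered list of `(timestamp, state)` observations. The sampling model and the `first_fire_time` helper are assumptions for illustration, not how Cloudbreak or Ambari actually collect alert states.

```python
from datetime import timedelta


def first_fire_time(samples, state="CRITICAL", period=timedelta(minutes=10)):
    """Return the timestamp at which the alert fires, or None if it never does.

    `samples` is a time-ordered list of (timestamp, state) observations.
    The alert fires at the first sample where the observed state has been
    `state` without interruption for at least `period`.
    """
    since = None  # start of the current uninterrupted run of `state`
    for ts, observed in samples:
        if observed == state:
            if since is None:
                since = ts
            if ts - since >= period:
                return ts
        else:
            since = None  # any other state resets the clock
    return None
```

A brief recovery (for example an OK reading after five minutes) resets the clock, which keeps short blips from triggering a scale-up.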
  • 30. Auto-Scaling Policies
      • Define the scale adjustment (node count, percentage, or exact)
      • Select the host group to scale
      • Select the alert that, when fired, executes the policy
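Slides 27 and 30 together describe what happens when an alert fires: the policy's adjustment is applied to a host group, subject to the size boundaries and the cooldown period. The sketch below models that under stated assumptions. The identifiers `NODE_COUNT`, `PERCENTAGE` and `EXACT` simply mirror the slide's wording and are not presented as Cloudbreak's API; rounding percentage changes up to whole nodes is also an assumption.

```python
import math


def target_size(current, adjustment_type, value, min_size, max_size):
    """Return the new host-group size for one policy execution.

    NODE_COUNT: add `value` nodes (a negative value removes nodes).
    PERCENTAGE: change the size by `value` percent of the current size.
    EXACT:      set the size to exactly `value`.
    The result is clamped to the [min_size, max_size] boundaries.
    """
    if adjustment_type == "NODE_COUNT":
        desired = current + value
    elif adjustment_type == "PERCENTAGE":
        # Assumption: round the change up so a small percentage still
        # moves the group by at least one node.
        delta = math.ceil(current * abs(value) / 100)
        desired = current + delta if value >= 0 else current - delta
    elif adjustment_type == "EXACT":
        desired = value
    else:
        raise ValueError("unknown adjustment type: %r" % adjustment_type)
    return max(min_size, min(max_size, desired))


def apply_policy(current, adjustment_type, value, min_size, max_size,
                 now, last_scaling, cooldown):
    """Apply the policy unless the previous scaling event is within `cooldown`."""
    if last_scaling is not None and now - last_scaling < cooldown:
        return current  # still cooling down: ignore this alert
    return target_size(current, adjustment_type, value, min_size, max_size)
```

The cooldown check comes first so that a burst of alerts right after a resize cannot stack several adjustments on top of each other; the boundaries then guarantee the group never leaves its configured size range.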
  • 31. Learn More
      • Try Ambari
          • https://docs.hortonworks.com/HDPDocuments/Ambari/Ambari-2.7.0.0/index.html
      • Try Cloudbreak 2.8 (TP)
          • https://docs.hortonworks.com/HDPDocuments/Cloudbreak/Cloudbreak-2.8.0/index.html
  • 32. Questions?
  • 33. Thank you!