SlideShare a Scribd company logo
1 of 16
Hadoop Now, Next & Beyond
Community Driven Enterprise Apache Hadoop
Eric Baldeschwieler, “Eric14”
Hortonworks CTO
@jeric14




© Hortonworks Inc. 2013
Quick History: Hadoop at Yahoo!




Source: http://developer.yahoo.com/blogs/ydn/posts/2013/02/hadoop-at-yahoo-more-than-ever-before/

          © Hortonworks Inc. 2013
Hortonworks Approach to Enterprise Hadoop

Community Driven Enterprise Apache Hadoop
                    Identify and introduce enterprise
                    requirements into the pubic domain

                    Work with the community to advance and
                    incubate open source projects

                    Apply Enterprise Rigor to provide the most
                    stable and reliable distribution



    © Hortonworks Inc. 2013
Making Hadoop Enterprise-Ready
                     OPERATIONAL                     DATA
                       SERVICES                    SERVICES
                             Manage &
                              AMBARI      FLUME    Store, HIVE
                                                   PIG
                             Operate at         Process and       HBASE
                               Scale      SQOOP Access Data
                               OOZIE               HCATALOG

                                                    MAP REDUCE
                                          Distributed
                      HADOOP CORE         Storage & Processing
                                                      HDFS

                                              Enterprise Readiness: HA,
                      PLATFORM SERVICES       DR, Snapshots, Security, …

                                          HORTONWORKS
                                          DATA PLATFORM (HDP)

                         OS / VM           Cloud           Appliance




   © Hortonworks Inc. 2013
HCatalog: Table-level Abstractions

 • Consistency of data models across tools (MapReduce, Pig, HBase and Hive)
 • Accessibility: share data as tables inside and out of Hadoop




                                 HCatalog                       Shared table
                                                                and schema
                                                                management
  • Raw Hadoop data                         Table access        opens the
  • Inconsistent, unknown                   Consistent schema   platform
  • Tool specific access                    REST API




       © Hortonworks Inc. 2013
Ambari: Provision > Manage > Monitor
 A framework for operating Hadoop…with APIs for integration



                               Manage     Monitor




    Provision                       Ambari          Integrate




                                 Hadoop Cluster




     © Hortonworks Inc. 2013
Ambari: Latest Highlights
                                     • Job Diagnostics


                                     • Cluster History


                                     • Instant Insight


                                     • Cluster Navigation


                                     • REST interface
Apache Ambari Dashboard




           © Hortonworks Inc. 2013
See Hadoop > Learn Hadoop > Do Hadoop




      Hands on
                               Full environment
    step-by- step
                                 to evaluate
  tutorials to learn
                                    Hadoop




     © Hortonworks Inc. 2013
Hadoop 2.0 Innovations - YARN

• Focus on scale and innovation
   – Support 10,000+ computer clusters
   – Extensible to encourage innovation




                                                                    Graph Processing
• Next generation execution




                                             MapReduce
   – Improves MapReduce performance




                                                                                       Other
                                                          Tez
• Supports new frameworks beyond
  MapReduce                                 YARN: Cluster Resource Management
   – Low latency, Streaming, Services
   – Do more with a single Hadoop cluster
                                            HDFS         Redundant, Reliable Storage




       © Hortonworks Inc. 2013
Tez on YARN: Going Beyond Batch




                Tez Task



 Tez Optimizes Execution          Always-On Tez Service
  New runtime engine for         Low latency processing for
more efficient data processing   all Hadoop data processing


      © Hortonworks Inc. 2013
Stinger Initiative
• Community initiative around Hive
• Enables Hive to support interactive workloads
• Enhances Hive’s standard SQL interface for Hadoop
• Improves existing tools & preserves investments


 Execution                       Query         File
  Engine                        Planner       Format

                        +                 +              = 100X
    Tez                          Hive         ORC file




      © Hortonworks Inc. 2013
Stinger: Make Hive Fly For All BI Needs
Parameterized Reports                                                       Enterprise Reports

                                            Dashboard / Scorecard




    Visualization                                                              Data Mining

                                                More SQL
                                                     +

                                            100X Faster

                                  Interactive                       Batch

        © Hortonworks Inc. 2013
Knox: Make Hadoop Security Simple

                                Authentication &
                                  Verification
                                    User Store     Hadoop Cluster
                                     KDC, AD,
                                      LDAP




                       {REST}        Knox
 Client                             Gateway




      © Hortonworks Inc. 2013
Hortonworks Data Platform 2.0 Alpha 2

Key New Features                                  Business Value
 –    Hive performance
 –    First distribution to include Tez
                                                  Single Platform Multiple Use
 From HDP 2.0 Alpha                          BATCH           INTERACTIVE            ONLINE
 –    Yarn
 –    Full Stack HA
 –    Snapshots
 –    Disaster Recovery                                      Big Data
                                                Transactions, Interactions, Observations
 –    Rolling Upgrades




          Available today                 http://Hortonworks.com/products

                                                                                           Page 14
             © Hortonworks Inc. 2013
Falcon: Data Lifecycle Management
• New Apache Incubator Project

• Introduced by InMobi, Hortonworks and Yahoo!

• Data Lifecycle Management Framework for Hadoop

• Configure and Manage Workflows & Policies for:

  – Data Movement
  – Disaster Recovery
  – Data Retention



     © Hortonworks Inc. 2013
Join the Community & Get Involved!


                                             • INNOVATE
                               Open Source

    Vendors
                                             • INTEGRATE


                             End Users       • COMMUNICATE




   © Hortonworks Inc. 2013

More Related Content

What's hot

Hadoop World 2011: Mike Olson Keynote Presentation
Hadoop World 2011: Mike Olson Keynote PresentationHadoop World 2011: Mike Olson Keynote Presentation
Hadoop World 2011: Mike Olson Keynote PresentationCloudera, Inc.
 
Контроль зверей: инструменты для управления и мониторинга распределенных сист...
Контроль зверей: инструменты для управления и мониторинга распределенных сист...Контроль зверей: инструменты для управления и мониторинга распределенных сист...
Контроль зверей: инструменты для управления и мониторинга распределенных сист...yaevents
 
Houston Hadoop Meetup Presentation by Vikram Oberoi of Cloudera
Houston Hadoop Meetup Presentation by Vikram Oberoi of ClouderaHouston Hadoop Meetup Presentation by Vikram Oberoi of Cloudera
Houston Hadoop Meetup Presentation by Vikram Oberoi of ClouderaMark Kerzner
 
Stinger.Next by Alan Gates of Hortonworks
Stinger.Next by Alan Gates of HortonworksStinger.Next by Alan Gates of Hortonworks
Stinger.Next by Alan Gates of HortonworksData Con LA
 
Sgi hadoop clusters_ds_us_4381
Sgi hadoop clusters_ds_us_4381Sgi hadoop clusters_ds_us_4381
Sgi hadoop clusters_ds_us_4381Michael Moretti
 
Cloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a championCloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a championAmeet Paranjape
 
Big data on virtualized infrastucture
Big data on virtualized infrastuctureBig data on virtualized infrastucture
Big data on virtualized infrastuctureDataWorks Summit
 
Delivering Apache Hadoop for the Modern Data Architecture
Delivering Apache Hadoop for the Modern Data Architecture Delivering Apache Hadoop for the Modern Data Architecture
Delivering Apache Hadoop for the Modern Data Architecture Hortonworks
 
Webinar: Productionizing Hadoop: Lessons Learned - 20101208
Webinar: Productionizing Hadoop: Lessons Learned - 20101208Webinar: Productionizing Hadoop: Lessons Learned - 20101208
Webinar: Productionizing Hadoop: Lessons Learned - 20101208Cloudera, Inc.
 
Why Every NoSQL Deployment Should Be Paired with Hadoop Webinar
Why Every NoSQL Deployment Should Be Paired with Hadoop WebinarWhy Every NoSQL Deployment Should Be Paired with Hadoop Webinar
Why Every NoSQL Deployment Should Be Paired with Hadoop WebinarCloudera, Inc.
 
YARN - Strata 2014
YARN - Strata 2014YARN - Strata 2014
YARN - Strata 2014Hortonworks
 
Discover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
Discover HDP2.1: Apache Storm for Stream Data Processing in HadoopDiscover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
Discover HDP2.1: Apache Storm for Stream Data Processing in HadoopHortonworks
 
Processing Big Data
Processing Big DataProcessing Big Data
Processing Big Datacwensel
 
Enabling Diverse Workload Scheduling in YARN
Enabling Diverse Workload Scheduling in YARNEnabling Diverse Workload Scheduling in YARN
Enabling Diverse Workload Scheduling in YARNDataWorks Summit
 
Hadoop summit cloudera keynote_v5
Hadoop summit cloudera keynote_v5Hadoop summit cloudera keynote_v5
Hadoop summit cloudera keynote_v5Cloudera, Inc.
 
Hadoop crashcourse v3
Hadoop crashcourse v3Hadoop crashcourse v3
Hadoop crashcourse v3Hortonworks
 
Deploying Docker applications on YARN via Slider
Deploying Docker applications on YARN via SliderDeploying Docker applications on YARN via Slider
Deploying Docker applications on YARN via SliderHortonworks
 

What's hot (20)

Hadoop World 2011: Mike Olson Keynote Presentation
Hadoop World 2011: Mike Olson Keynote PresentationHadoop World 2011: Mike Olson Keynote Presentation
Hadoop World 2011: Mike Olson Keynote Presentation
 
Контроль зверей: инструменты для управления и мониторинга распределенных сист...
Контроль зверей: инструменты для управления и мониторинга распределенных сист...Контроль зверей: инструменты для управления и мониторинга распределенных сист...
Контроль зверей: инструменты для управления и мониторинга распределенных сист...
 
201305 hadoop jpl-v3
201305 hadoop jpl-v3201305 hadoop jpl-v3
201305 hadoop jpl-v3
 
Houston Hadoop Meetup Presentation by Vikram Oberoi of Cloudera
Houston Hadoop Meetup Presentation by Vikram Oberoi of ClouderaHouston Hadoop Meetup Presentation by Vikram Oberoi of Cloudera
Houston Hadoop Meetup Presentation by Vikram Oberoi of Cloudera
 
Stinger.Next by Alan Gates of Hortonworks
Stinger.Next by Alan Gates of HortonworksStinger.Next by Alan Gates of Hortonworks
Stinger.Next by Alan Gates of Hortonworks
 
Sgi hadoop clusters_ds_us_4381
Sgi hadoop clusters_ds_us_4381Sgi hadoop clusters_ds_us_4381
Sgi hadoop clusters_ds_us_4381
 
Cloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a championCloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a champion
 
Big data on virtualized infrastucture
Big data on virtualized infrastuctureBig data on virtualized infrastucture
Big data on virtualized infrastucture
 
Delivering Apache Hadoop for the Modern Data Architecture
Delivering Apache Hadoop for the Modern Data Architecture Delivering Apache Hadoop for the Modern Data Architecture
Delivering Apache Hadoop for the Modern Data Architecture
 
Webinar: Productionizing Hadoop: Lessons Learned - 20101208
Webinar: Productionizing Hadoop: Lessons Learned - 20101208Webinar: Productionizing Hadoop: Lessons Learned - 20101208
Webinar: Productionizing Hadoop: Lessons Learned - 20101208
 
Virtualized Hadoop
Virtualized HadoopVirtualized Hadoop
Virtualized Hadoop
 
Why Every NoSQL Deployment Should Be Paired with Hadoop Webinar
Why Every NoSQL Deployment Should Be Paired with Hadoop WebinarWhy Every NoSQL Deployment Should Be Paired with Hadoop Webinar
Why Every NoSQL Deployment Should Be Paired with Hadoop Webinar
 
YARN - Strata 2014
YARN - Strata 2014YARN - Strata 2014
YARN - Strata 2014
 
Discover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
Discover HDP2.1: Apache Storm for Stream Data Processing in HadoopDiscover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
Discover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
 
Processing Big Data
Processing Big DataProcessing Big Data
Processing Big Data
 
Enabling Diverse Workload Scheduling in YARN
Enabling Diverse Workload Scheduling in YARNEnabling Diverse Workload Scheduling in YARN
Enabling Diverse Workload Scheduling in YARN
 
Hadoop summit cloudera keynote_v5
Hadoop summit cloudera keynote_v5Hadoop summit cloudera keynote_v5
Hadoop summit cloudera keynote_v5
 
Hadoop crashcourse v3
Hadoop crashcourse v3Hadoop crashcourse v3
Hadoop crashcourse v3
 
Deploying Docker applications on YARN via Slider
Deploying Docker applications on YARN via SliderDeploying Docker applications on YARN via Slider
Deploying Docker applications on YARN via Slider
 
Cloud computing era
Cloud computing eraCloud computing era
Cloud computing era
 

Similar to Apache Hadoop Now Next and Beyond

Big Data Analytics - Is Your Elephant Enterprise Ready?
Big Data Analytics - Is Your Elephant Enterprise Ready?Big Data Analytics - Is Your Elephant Enterprise Ready?
Big Data Analytics - Is Your Elephant Enterprise Ready?Hortonworks
 
Discover.hdp2.2.ambari.final[1]
Discover.hdp2.2.ambari.final[1]Discover.hdp2.2.ambari.final[1]
Discover.hdp2.2.ambari.final[1]Hortonworks
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemShivaji Dutta
 
Introduction to Hortonworks Data Platform
Introduction to Hortonworks Data PlatformIntroduction to Hortonworks Data Platform
Introduction to Hortonworks Data PlatformHortonworks
 
Hadoop - Now, Next and Beyond
Hadoop - Now, Next and BeyondHadoop - Now, Next and Beyond
Hadoop - Now, Next and BeyondTeradata Aster
 
Introduction to Microsoft HDInsight and BI Tools
Introduction to Microsoft HDInsight and BI ToolsIntroduction to Microsoft HDInsight and BI Tools
Introduction to Microsoft HDInsight and BI ToolsDataWorks Summit
 
Why hadoop for data science?
Why hadoop for data science?Why hadoop for data science?
Why hadoop for data science?Hortonworks
 
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSDiscover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSHortonworks
 
Discover HDP 2.1: Apache Solr for Hadoop Search
Discover HDP 2.1: Apache Solr for Hadoop SearchDiscover HDP 2.1: Apache Solr for Hadoop Search
Discover HDP 2.1: Apache Solr for Hadoop SearchHortonworks
 
Discover hdp 2.2 hdfs - final
Discover hdp 2.2   hdfs - finalDiscover hdp 2.2   hdfs - final
Discover hdp 2.2 hdfs - finalHortonworks
 
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...Hortonworks
 
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopRescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopHortonworks
 
Strata feb2013
Strata feb2013Strata feb2013
Strata feb2013alanfgates
 
Discover.hdp2.2.h base.final[2]
Discover.hdp2.2.h base.final[2]Discover.hdp2.2.h base.final[2]
Discover.hdp2.2.h base.final[2]Hortonworks
 
Mrinal devadas, Hortonworks Making Sense Of Big Data
Mrinal devadas, Hortonworks Making Sense Of Big DataMrinal devadas, Hortonworks Making Sense Of Big Data
Mrinal devadas, Hortonworks Making Sense Of Big DataPatrickCrompton
 
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache HiveDiscover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache HiveHortonworks
 
Discover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.finalDiscover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.finalHortonworks
 
Discover HDP 2.2: Apache Falcon for Hadoop Data Governance
Discover HDP 2.2: Apache Falcon for Hadoop Data GovernanceDiscover HDP 2.2: Apache Falcon for Hadoop Data Governance
Discover HDP 2.2: Apache Falcon for Hadoop Data GovernanceHortonworks
 
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.nextDiscover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.nextHortonworks
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataHortonworks
 

Similar to Apache Hadoop Now Next and Beyond (20)

Big Data Analytics - Is Your Elephant Enterprise Ready?
Big Data Analytics - Is Your Elephant Enterprise Ready?Big Data Analytics - Is Your Elephant Enterprise Ready?
Big Data Analytics - Is Your Elephant Enterprise Ready?
 
Discover.hdp2.2.ambari.final[1]
Discover.hdp2.2.ambari.final[1]Discover.hdp2.2.ambari.final[1]
Discover.hdp2.2.ambari.final[1]
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystem
 
Introduction to Hortonworks Data Platform
Introduction to Hortonworks Data PlatformIntroduction to Hortonworks Data Platform
Introduction to Hortonworks Data Platform
 
Hadoop - Now, Next and Beyond
Hadoop - Now, Next and BeyondHadoop - Now, Next and Beyond
Hadoop - Now, Next and Beyond
 
Introduction to Microsoft HDInsight and BI Tools
Introduction to Microsoft HDInsight and BI ToolsIntroduction to Microsoft HDInsight and BI Tools
Introduction to Microsoft HDInsight and BI Tools
 
Why hadoop for data science?
Why hadoop for data science?Why hadoop for data science?
Why hadoop for data science?
 
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSDiscover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
 
Discover HDP 2.1: Apache Solr for Hadoop Search
Discover HDP 2.1: Apache Solr for Hadoop SearchDiscover HDP 2.1: Apache Solr for Hadoop Search
Discover HDP 2.1: Apache Solr for Hadoop Search
 
Discover hdp 2.2 hdfs - final
Discover hdp 2.2   hdfs - finalDiscover hdp 2.2   hdfs - final
Discover hdp 2.2 hdfs - final
 
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
 
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopRescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
 
Strata feb2013
Strata feb2013Strata feb2013
Strata feb2013
 
Discover.hdp2.2.h base.final[2]
Discover.hdp2.2.h base.final[2]Discover.hdp2.2.h base.final[2]
Discover.hdp2.2.h base.final[2]
 
Mrinal devadas, Hortonworks Making Sense Of Big Data
Mrinal devadas, Hortonworks Making Sense Of Big DataMrinal devadas, Hortonworks Making Sense Of Big Data
Mrinal devadas, Hortonworks Making Sense Of Big Data
 
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache HiveDiscover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
 
Discover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.finalDiscover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.final
 
Discover HDP 2.2: Apache Falcon for Hadoop Data Governance
Discover HDP 2.2: Apache Falcon for Hadoop Data GovernanceDiscover HDP 2.2: Apache Falcon for Hadoop Data Governance
Discover HDP 2.2: Apache Falcon for Hadoop Data Governance
 
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.nextDiscover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big Data
 

More from DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Apache Hadoop Now Next and Beyond

  • 1. Hadoop Now, Next & Beyond Community Driven Enterprise Apache Hadoop Eric Baldeschwieler, “Eric14” Hortonworks CTO @jeric14 © Hortonworks Inc. 2013
  • 2. Quick History: Hadoop at Yahoo! Source: http://developer.yahoo.com/blogs/ydn/posts/2013/02/hadoop-at-yahoo-more-than-ever-before/ © Hortonworks Inc. 2013
  • 3. Hortonworks Approach to Enterprise Hadoop Community Driven Enterprise Apache Hadoop Identify and introduce enterprise requirements into the pubic domain Work with the community to advance and incubate open source projects Apply Enterprise Rigor to provide the most stable and reliable distribution © Hortonworks Inc. 2013
  • 4. Making Hadoop Enterprise-Ready OPERATIONAL DATA SERVICES SERVICES Manage & AMBARI FLUME Store, HIVE PIG Operate at Process and HBASE Scale SQOOP Access Data OOZIE HCATALOG MAP REDUCE Distributed HADOOP CORE Storage & Processing HDFS Enterprise Readiness: HA, PLATFORM SERVICES DR, Snapshots, Security, … HORTONWORKS DATA PLATFORM (HDP) OS / VM Cloud Appliance © Hortonworks Inc. 2013
  • 5. HCatalog: Table-level Abstractions • Consistency of data models across tools (MapReduce, Pig, HBase and Hive) • Accessibility: share data as tables inside and out of Hadoop HCatalog Shared table and schema management • Raw Hadoop data Table access opens the • Inconsistent, unknown Consistent schema platform • Tool specific access REST API © Hortonworks Inc. 2013
  • 6. Ambari: Provision > Manage > Monitor A framework for operating Hadoop…with APIs for integration Manage Monitor Provision Ambari Integrate Hadoop Cluster © Hortonworks Inc. 2013
  • 7. Ambari: Latest Highlights • Job Diagnostics • Cluster History • Instant Insight • Cluster Navigation • REST interface Apache Ambari Dashboard © Hortonworks Inc. 2013
  • 8. See Hadoop > Learn Hadoop > Do Hadoop Hands on Full environment step-by- step to evaluate tutorials to learn Hadoop © Hortonworks Inc. 2013
  • 9. Hadoop 2.0 Innovations - YARN • Focus on scale and innovation – Support 10,000+ computer clusters – Extensible to encourage innovation Graph Processing • Next generation execution MapReduce – Improves MapReduce performance Other Tez • Supports new frameworks beyond MapReduce YARN: Cluster Resource Management – Low latency, Streaming, Services – Do more with a single Hadoop cluster HDFS Redundant, Reliable Storage © Hortonworks Inc. 2013
  • 10. Tez on YARN: Going Beyond Batch Tez Task Tez Optimizes Execution Always-On Tez Service New runtime engine for Low latency processing for more efficient data processing all Hadoop data processing © Hortonworks Inc. 2013
  • 11. Stinger Initiative • Community initiative around Hive • Enables Hive to support interactive workloads • Enhances Hive’s standard SQL interface for Hadoop • Improves existing tools & preserves investments Execution Query File Engine Planner Format + + = 100X Tez Hive ORC file © Hortonworks Inc. 2013
  • 12. Stinger: Make Hive Fly For All BI Needs Parameterized Reports Enterprise Reports Dashboard / Scorecard Visualization Data Mining More SQL + 100X Faster Interactive Batch © Hortonworks Inc. 2013
  • 13. Knox: Make Hadoop Security Simple Authentication & Verification User Store Hadoop Cluster KDC, AD, LDAP {REST} Knox Client Gateway © Hortonworks Inc. 2013
  • 14. Hortonworks Data Platform 2.0 Alpha 2 Key New Features Business Value – Hive performance – First distribution to include Tez Single Platform Multiple Use From HDP 2.0 Alpha BATCH INTERACTIVE ONLINE – Yarn – Full Stack HA – Snapshots – Disaster Recovery Big Data Transactions, Interactions, Observations – Rolling Upgrades Available today http://Hortonworks.com/products Page 14 © Hortonworks Inc. 2013
  • 15. Falcon: Data Lifecycle Management • New Apache Incubator Project • Introduced by InMobi, Hortonworks and Yahoo! • Data Lifecycle Management Framework for Hadoop • Configure and Manage Workflows & Policies for: – Data Movement – Disaster Recovery – Data Retention © Hortonworks Inc. 2013
  • 16. Join the Community & Get Involved! • INNOVATE Open Source Vendors • INTEGRATE End Users • COMMUNICATE © Hortonworks Inc. 2013

Editor's Notes

  1. HCatalog – metadata shared across whole platformFile locations become abstract (not hard-coded)Data types become shared (not redefined per tool)Partitioning and HDFS-optimized
  2. Job DiagnosticsVisualize and troubleshoot Hadoop job execution and performanceCluster History View historical job execution & performanceInstant InsightView health of Core Hadoop (HDFS, MapReduce) and related projectsCluster Navigation “Quick link” buttons jump into namenode web UI for a serverREST interface provides external access to Ambari for existing tools. Facilitates integration with Microsoft System Center and Teradata Viewpoint
  3. Hortonworks SandboxHortonworks accelerates Hadoop skills development with an easy-to-use, flexible and extensible platform to learn, evaluate and use Apache HadoopWhat is it: virtualized single-node implementation of the enterprise-ready Hortonworks Data PlatformProvides demos, videos and step-by-step hands-on tutorialsPre-built partner integrations and access to datasetsWhat it does: Dramatically accelerates the process of learning Apache HadoopSee It -- demos and videos to illustrate use casesLearn It -- multi level step by step tutorials Do It -- hands on exercises for faster skills developmentHow it helps: Accelerate and validates the use of Hadoop within your unique data architectureUse your data to explore and investigate your use casesZERO to big data in 15 minutes
  4. Community developed frameworksMachine learning / Analytics (MPI, GraphLab, Giraph, Hama, Spark, …)Services inside Hadoop (memcache, HBase, Storm…)Low latency computing (CEP or stream processing)
  5. Tez Approved as New Apache Incubator ProjectHortonworks Introduces Next-Generation Runtime for Improving Latency and Throughput of Hadoop Apps
  6. Buzz about low latency access in Hadoop
  7. Hortonworks Unveils Stinger Initiative to Make Apache Hive 100X Faster for Interactive QueriesHortonworks leading effort with a group of community contributors focusing on enhancing Apache Hive, the defacto standard for SQL access to HadoopEnterprise Reports – Your cell phone bill is an exampleDashboard – KPI trackingParameterized Reports – What are the hot prospects in my region?Visualization – Visual exploration of dataData Mining – Large scale data processing and extraction usually fed to other toolsHow?Improve Latency & ThroughputQuery engine improvementsNew “Optimized RCFile” column storeNext-gen runtime (elim’s M/R latency)Extend Deep Analytical AbilityAnalytics functionsImproved SQL coverageContinued focus on core Hive use cases
  8. Operators can firewall cluster without end user access to “gateway node”Users see one cluster end-point that aggregates capabilities for data access, metadata and job controlProvide perimeter security to make Hadoop security setup easierEnable integration enterprise and cloud identity management environmentsVerificationVerify identity tokenSAML, propagation of identityAuthenticationEstablish identity at Gateway to Authenticate with LDAP + AD
  9. Hadoop 2.0 represents the next generation of the foundation of big data. Under development for nearly three years now, It is a more mature version of Hadoop that has been architected for broader use by more generic enterprise. The main focus for this nest generation has been the broader enterprise. They have very explicit requirements that are a little bit different than the typical web properties who first adopted hadoop. Some of the requirements required the community to rethink the approach. Plus, our experience running hadoop at yahoo provided much insight into how we could architect things to make them better.Some of the critical features are listed here. Go through them.Highlight workloads and explain how 2.0 is engineered to meet these exacting demands. There is a graphic to help illustrate. We have moved beyond just batch…