Integration of Apache Hive and HBase
Enis Soztutar
enis [at] apache [dot] org
@enissoz




Architecting the Future of Big Data
 © Hortonworks Inc. 2011              Page 1
About Me

•  User and committer of Hadoop since 2007
•  Contributor to Apache Hadoop, HBase, Hive and Gora
•  Joined Hortonworks as Member of Technical Staff
•  Twitter: @enissoz




Agenda

•  Overview of Hive and HBase
•  Hive + HBase Features and Improvements
•  Future of Hive and HBase
•  Q&A




Apache Hive Overview
• Apache Hive is a data warehouse system for Hadoop
• SQL-like query language called HiveQL
• Built for PB scale data
• Main purpose is analysis and ad hoc querying
• Database / table / partition / bucket – DDL Operations
• SQL Types + Complex Types (ARRAY, MAP, etc)
• Very extensible
• Not for: small data sets, low-latency queries, OLTP



Apache Hive Architecture
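[Diagram, summarized]
• Clients: CLI, JDBC/ODBC, Hive Thrift Server, Hive Web Interface
• Driver: Parser, Planner, Execution, Optimizer
• MSClient connects to the Metastore, which is backed by an RDBMS
• Execution runs on MapReduce over HDFS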

Overview of Apache HBase
• Apache HBase is the Hadoop database
• Modeled after Google’s BigTable
• A sparse, distributed, persistent, multidimensional sorted
  map
• The map is indexed by a row key, column key, and a
  timestamp
• Each value in the map is an un-interpreted array of bytes
• Low latency random data access
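
The data model above can be sketched in a few lines of Python. This is only a toy, in-memory illustration of a sorted map indexed by (row key, column key, timestamp) with byte-array values; the class and method names are made up for this sketch and say nothing about how HBase stores data internally.

```python
from typing import Dict, Iterator, Optional, Tuple

# Cell coordinates: (row key, column key, timestamp) -> uninterpreted bytes.
CellKey = Tuple[bytes, bytes, int]


class SortedCellMap:
    """Toy stand-in for the sparse, sorted, multidimensional map."""

    def __init__(self) -> None:
        self._cells: Dict[CellKey, bytes] = {}

    def put(self, row: bytes, column: bytes, ts: int, value: bytes) -> None:
        self._cells[(row, column, ts)] = value

    def get(self, row: bytes, column: bytes) -> Optional[bytes]:
        """Return the newest value for (row, column); None if the cell is absent."""
        versions = [(ts, v) for (r, c, ts), v in self._cells.items()
                    if r == row and c == column]
        return max(versions)[1] if versions else None

    def scan(self, start_row: bytes = b"") -> Iterator[Tuple[bytes, bytes, int, bytes]]:
        """Yield cells in row-key order, newest version first within a column."""
        for key in sorted(self._cells, key=lambda k: (k[0], k[1], -k[2])):
            if key[0] >= start_row:
                yield key[0], key[1], key[2], self._cells[key]
```

Because the map is sparse, a row only pays for the cells it actually has; a missing column is simply an absent key.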




Overview of Apache HBase
• Logical view (figure from: Bigtable: A Distributed Storage System for Structured Data, Chang, et al.)




Apache HBase Architecture

[Diagram, summarized]
• Client, HMaster and Zookeeper
• Three Region servers, each hosting multiple Regions
• All storage on HDFS

Hive + HBase Features and Improvements




Hive + HBase Motivation
• Hive and HBase have different characteristics:
  Hive                         vs.            HBase
  High latency                                Low latency
  Structured                                  Unstructured
  Analysts                                    Programmers

• Hive data warehouses on Hadoop are high latency
  – Long ETL times
  – Limited access to real-time data
• Analyzing HBase data with MapReduce requires custom
  coding
• Hive and SQL are already known by many analysts

Use Case 1: HBase as ETL Data Sink




From HUG - Hive/HBase Integration or, MaybeSQL? April 2010 John Sichi Facebook
http://www.slideshare.net/hadoopusergroup/hive-h-basehadoopapr2010


Use Case 2: HBase as Data Source




From HUG - Hive/HBase Integration or, MaybeSQL? April 2010 John Sichi Facebook
http://www.slideshare.net/hadoopusergroup/hive-h-basehadoopapr2010


Use Case 3: Low Latency Warehouse




From HUG - Hive/HBase Integration or, MaybeSQL? April 2010 John Sichi Facebook
http://www.slideshare.net/hadoopusergroup/hive-h-basehadoopapr2010


Example: Hive + HBase (HBase table)
hbase(main):001:0> create 'short_urls', {NAME => 'u'}, {NAME => 's'}

hbase(main):014:0> scan 'short_urls'
ROW                   COLUMN+CELL
 bit.ly/aaaa          column=s:hits, value=100
 bit.ly/aaaa          column=u:url, value=hbase.apache.org/
 bit.ly/abcd          column=s:hits, value=123
 bit.ly/abcd          column=u:url, value=example.com/foo
Example: Hive + HBase (Hive table)
-- EXTERNAL, because the HBase table short_urls already exists
CREATE EXTERNAL TABLE short_urls(
   short_url string,
   url string,
   hit_count int
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES
("hbase.columns.mapping" = ":key,u:url,s:hits")
TBLPROPERTIES
("hbase.table.name" = "short_urls");
Storage Handler
• Hive defines the HiveStorageHandler interface for different storage
  backends: HBase / Cassandra / MongoDB / etc.
• A storage handler has hooks for
  –  Getting input / output formats
  –  Metadata operations: CREATE TABLE, DROP TABLE, etc.
• A storage handler is a table-level concept
  –  Does not support Hive partitions and buckets
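
As a rough illustration of the two kinds of hooks listed above, here is a Python analogue of a storage handler. The real interface is Java; every class and method name below is hypothetical and chosen only for this sketch, and the in-memory backend is a toy.

```python
from abc import ABC, abstractmethod
from typing import Dict


class StorageHandler(ABC):
    """Hypothetical analogue: format hooks plus metadata (DDL) hooks."""

    @abstractmethod
    def input_format(self) -> str:
        """Name of the reader used to scan the backend."""

    @abstractmethod
    def output_format(self) -> str:
        """Name of the writer used to store rows in the backend."""

    @abstractmethod
    def on_create_table(self, name: str, properties: Dict[str, str]) -> None:
        """Called when a table is created on the warehouse side."""

    @abstractmethod
    def on_drop_table(self, name: str) -> None:
        """Called when a table is dropped on the warehouse side."""


class InMemoryHandler(StorageHandler):
    """Toy backend that tracks its tables in a dict."""

    def __init__(self) -> None:
        self.tables: Dict[str, Dict[str, str]] = {}

    def input_format(self) -> str:
        return "InMemoryInputFormat"

    def output_format(self) -> str:
        return "InMemoryOutputFormat"

    def on_create_table(self, name: str, properties: Dict[str, str]) -> None:
        if name in self.tables:
            raise ValueError(f"table {name!r} already exists in the backend")
        self.tables[name] = dict(properties)

    def on_drop_table(self, name: str) -> None:
        self.tables.pop(name, None)
```

The hooks operate on whole tables, which matches the point that a storage handler is a table-level concept.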




Apache Hive + HBase Architecture
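[Diagram, summarized]
• Clients: CLI, Hive Thrift Server, Hive Web Interface
• Driver: Parser, Planner, Execution, Optimizer
• MSClient connects to the Metastore, which is backed by an RDBMS
• A StorageHandler sits between the Driver and HBase
• MapReduce and HBase both run on HDFS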

Hive + HBase Integration
• For Input/OutputFormat, getSplits(), etc., the underlying HBase
  classes are used
• Column selection and certain filters can be pushed down
• HBase tables can be used with other (Hadoop native) tables
  and SQL constructs
• Hive DDL operations are converted to HBase DDL
  operations via the client hook
  – All operations are performed by the client
  – No two-phase commit




Schema / Type Mapping




Schema Mapping
•  Hive table + columns + column types <=> HBase table + column
   families (+ column qualifiers)
•  Every field in Hive table is mapped in order to either
   – The table key (using :key as selector)
   – A column family (cf:) -> MAP fields in Hive
   – A column (cf:cq)
•  Hive table does not need to include all columns in HBase
•  Example:
   CREATE TABLE short_urls(
       short_url string,
       url string,
       hit_count int,
       props map<string,string>
   )
   WITH SERDEPROPERTIES
   ("hbase.columns.mapping" = ":key,u:url,s:hits,p:")
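
The positional mapping rules above can be sketched as a small parser. This is an illustration only, not Hive's actual parsing code: the function and type names are invented here, and the sketch trims whitespace around entries for convenience, which should not be assumed of Hive itself.

```python
from typing import List, NamedTuple, Optional


class ColumnMapping(NamedTuple):
    hive_column: str
    kind: str                  # "key", "family", or "column"
    family: Optional[str]
    qualifier: Optional[str]


def parse_columns_mapping(hive_columns: List[str], mapping: str) -> List[ColumnMapping]:
    """Pair each Hive column, in order, with one mapping entry."""
    entries = [e.strip() for e in mapping.split(",")]
    if len(entries) != len(hive_columns):
        raise ValueError("need exactly one mapping entry per Hive column")
    result = []
    for col, entry in zip(hive_columns, entries):
        if entry == ":key":
            result.append(ColumnMapping(col, "key", None, None))
            continue
        family, sep, qualifier = entry.partition(":")
        if not sep or not family:
            raise ValueError(f"bad mapping entry: {entry!r}")
        if qualifier:
            # cf:cq maps a single HBase column to a Hive field.
            result.append(ColumnMapping(col, "column", family, qualifier))
        else:
            # cf: maps a whole column family to a Hive MAP field.
            result.append(ColumnMapping(col, "family", family, None))
    return result
```

For the table above, `parse_columns_mapping(["short_url", "url", "hit_count", "props"], ":key,u:url,s:hits,p:")` classifies the four fields as key, column, column and family respectively.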

Type Mapping
• Recently added to Hive (0.9.0)
• Previously all types were being converted to strings in HBase
• Hive has:
  – Primitive types: INT, STRING, BINARY, DATE, etc
  – ARRAY<Type>
  – MAP<PrimitiveType, Type>
  – STRUCT<a:INT, b:STRING, c:STRING>
• HBase does not have types
  – Bytes.toBytes()
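
To make the difference between string and binary storage concrete, here is a small Python sketch. The helper names are invented for illustration; the binary form assumes the 4-byte big-endian layout that Bytes.toBytes(int) produces.

```python
import struct


def encode_as_string(value: int) -> bytes:
    """String storage: the decimal text of the number."""
    return str(value).encode("utf-8")


def encode_as_binary(value: int) -> bytes:
    """Binary storage: 4-byte big-endian two's complement."""
    return struct.pack(">i", value)


# HBase compares raw bytes, so the encoding decides the sort order.
string_order = sorted([99, 100], key=encode_as_string)   # "100" sorts before "99"
binary_order = sorted([99, 100], key=encode_as_binary)   # numeric order is kept
signed_order = sorted([-1, 1], key=encode_as_binary)     # negatives sort last
```

The last line shows why plain binary encoding is not enough for signed values: the sign bit makes negative numbers compare greater than positive ones when bytes are compared as unsigned.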




Type Mapping
• Table-level property
  "hbase.table.default.storage.type" = "binary"
• Type mapping can be given per column after #
  – Any prefix of "binary", e.g. u:url#b
  – Any prefix of "string", e.g. u:url#s
  – The dash char "-", e.g. u:url#- (use the table-level default)

CREATE TABLE short_urls(
   short_url string,
   url string,
   hit_count int,
   props map<string,string>
)
WITH SERDEPROPERTIES
("hbase.columns.mapping" = ":key#b,u:url#b,s:hits#b,p:#s")
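
The per-column suffix rules can be sketched as follows. This is a simplified illustration rather than Hive's implementation: the function name is invented, and the fallback of "string" when no table-level default is supplied is an assumption of this sketch.

```python
from typing import Tuple


def resolve_storage_type(entry: str, table_default: str = "string") -> Tuple[str, str]:
    """Split a mapping entry such as 'u:url#b' into (column, storage type)."""
    column, sep, suffix = entry.partition("#")
    if not sep or suffix == "-":
        # No suffix, or an explicit dash: fall back to the table-level default.
        return column, table_default
    for full_name in ("binary", "string"):
        # Any non-empty prefix of the full name is accepted.
        if suffix and full_name.startswith(suffix):
            return column, full_name
    raise ValueError(f"unknown storage type suffix in {entry!r}")
```

For example, `resolve_storage_type("s:hits#b")` yields `("s:hits", "binary")`, while `resolve_storage_type("u:url#-", table_default="binary")` picks up the table default.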

Type Mapping
• If the type is not a primitive or Map, it is converted to a JSON
  string and serialized
• Still a few rough edges for schema and type mapping:
   – No Hive BINARY support in HBase mapping
   – No mapping of HBase timestamp (can only provide put
     timestamp)
   – No arbitrary mapping of Structs / Arrays into HBase schema




Bulk Load
• Steps to bulk load:
   – Sample source data for range partitioning
   – Save sampling results to a file
   – Run CLUSTER BY query using HiveHFileOutputFormat and
     TotalOrderPartitioner
   – Import HFiles into the HBase table
• Ideally, the setup should be as simple as
   SET hive.hbase.bulk=true
   INSERT OVERWRITE TABLE web_table SELECT ….
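
The sampling and range-partitioning steps can be illustrated with a short sketch. It only shows the idea behind total-order partitioning (split points taken from a sample, keys routed by binary search); the function names are invented and it does not use Hive or Hadoop classes.

```python
import bisect
from typing import List, Sequence


def choose_split_points(sample: Sequence[bytes], num_partitions: int) -> List[bytes]:
    """Pick up to num_partitions - 1 evenly spaced keys from the sorted sample."""
    ordered = sorted(set(sample))
    if num_partitions < 2 or not ordered:
        return []
    step = len(ordered) / num_partitions
    points = [ordered[min(int(step * i), len(ordered) - 1)]
              for i in range(1, num_partitions)]
    return sorted(set(points))


def partition_for(key: bytes, split_points: List[bytes]) -> int:
    """Index of the key range holding key; a key equal to a split point goes right."""
    return bisect.bisect_right(split_points, key)
```

Every key in partition i sorts before every key in partition i + 1, so each partition can be written out as sorted files covering a disjoint key range.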




Filter Pushdown




Filter Pushdown
• The idea is to pass filter expressions down to the storage layer to
  minimize the data scanned
• Also gives access to indexes in HDFS or HBase
• Example:
   CREATE EXTERNAL TABLE users (userid BIGINT, email STRING, … )
   STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
   WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,…")

   SELECT ... FROM users WHERE userid > 1000000 and email LIKE '%@gmail.com';

-> scan.setStartRow(Bytes.toBytes(1000000))
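
A minimal sketch of what the pushed-down start row buys, using a sorted in-memory list in place of an HBase table. The names and the 8-byte big-endian key encoding are assumptions of this sketch; it treats the start row as inclusive, so the strict comparison is re-checked afterwards together with the email filter.

```python
import bisect
import struct
from typing import List, Tuple

Row = Tuple[bytes, str]  # (encoded row key, email)


def encode_key(userid: int) -> bytes:
    """8-byte big-endian; byte order matches numeric order for non-negative ids."""
    return struct.pack(">q", userid)


def decode_key(key: bytes) -> int:
    return struct.unpack(">q", key)[0]


def select_gmail_users(rows: List[Row], lower_bound: int) -> Tuple[List[str], int]:
    """Evaluate userid > lower_bound AND email LIKE '%@gmail.com'.

    rows must be sorted by key. Returns (matching emails, rows scanned).
    """
    # Pushed down: jump to the first row whose key >= the start row.
    # A 1-tuple sorts before any 2-tuple with the same first element.
    start = bisect.bisect_left(rows, (encode_key(lower_bound),))
    scanned = rows[start:]
    # Residual: re-check the strict bound and apply the email filter.
    emails = [email for key, email in scanned
              if decode_key(key) > lower_bound and email.endswith("@gmail.com")]
    return emails, len(scanned)
```

Rows below the start key are never touched, which is the whole point of pushing the key predicate into the scan.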

Filter Decomposition
• The optimizer pushes predicates down in the query plan
• Storage handlers can negotiate with the Hive optimizer to
  decompose the filter
  x > 3 AND upper(y) = 'XYZ'
• The handler takes x > 3 and sends upper(y) = 'XYZ' back as a residual for Hive
• Works with:
  key = 3, key > 3, etc.
  key > 3 AND key < 100
• Only works against constant expressions
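
The decomposition can be sketched as splitting a conjunction into a pushed part and a residual part. This is an illustration under stated assumptions, not Hive's optimizer API: the `Predicate` type, the `decompose` function and the exact set of pushable operators are all invented for the sketch.

```python
from typing import List, NamedTuple, Tuple


class Predicate(NamedTuple):
    expr: str                  # left-hand side, e.g. "x" or "upper(y)"
    op: str                    # comparison operator
    operand: object            # right-hand side
    is_constant: bool = True   # False when the operand is not a constant


# Assumed set of comparisons the storage layer can evaluate on the row key.
PUSHABLE_OPS = {"=", ">", ">=", "<", "<="}


def decompose(conjuncts: List[Predicate],
              key_column: str) -> Tuple[List[Predicate], List[Predicate]]:
    """Split AND-ed predicates into (pushed to storage, residual for Hive)."""
    pushed, residual = [], []
    for p in conjuncts:
        if p.expr == key_column and p.op in PUSHABLE_OPS and p.is_constant:
            pushed.append(p)
        else:
            residual.append(p)
    return pushed, residual
```

With `x` as the key column, `x > 3 AND upper(y) = 'XYZ'` splits into `x > 3` for the storage handler and `upper(y) = 'XYZ'` as the residual.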


Security Aspects
Towards fully secure deployments




Security – Big Picture
• Security becomes more important to support enterprise-level
  and multi-tenant applications
• Five different components to ensure / impose security
  – HDFS
  – MapReduce
  – HBase
  – Zookeeper
  – Hive
• Each component has:
  – Authentication
  – Authorization


HBase Security – Closer look
• Released with HBase 0.92
• Fully optional module, disabled by default
• Needs an underlying secure Hadoop release
• SecureRPCEngine: optional engine enforcing SASL
  authentication
  – Kerberos
  – DIGEST-MD5 based tokens
  – TokenProvider coprocessor
• Access control is implemented as a Coprocessor:
  AccessController
• Stores and distributes ACL data via Zookeeper
  – Sensitive data is only accessible by HBase daemons
  – Client does not need to authenticate to Zookeeper
Hive Security – Closer look
• Hive has different deployment options; security considerations
  should take the different deployments into account
• Authentication is only supported at the Metastore, not on
  HiveServer, the web interface, or JDBC
• Authorization is enforced at the query layer (Driver)
• Pluggable authorization providers. The default one stores
  global / table / partition / column permissions in the Metastore

GRANT ALTER ON TABLE web_table TO USER bob;
CREATE ROLE db_reader;
GRANT SELECT, SHOW_DATABASE ON DATABASE mydb TO ROLE db_reader;

Hive Deployment Option 1
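[Diagram, summarized]
• Client: CLI and Driver (Parser, Planner, Execution, Optimizer)
• Authorization at the Driver; Authentication at the Metastore (via MSClient)
• A/A labels at MapReduce and HBase
• A12n/A11N (authentication/authorization) labels at HDFS and the RDBMS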
Hive Deployment Option 2
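[Diagram, summarized]
• Client: CLI and Driver (Parser, Planner, Execution, Optimizer)
• Authorization at the Driver; Authentication at the Metastore (via MSClient)
• A/A labels at MapReduce and HBase
• A12n/A11N (authentication/authorization) labels at HDFS and the RDBMS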
Hive Deployment Option 3
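[Diagram, summarized]
• Client: JDBC/ODBC, CLI, Hive Thrift Server, Hive Web Interface
• Driver: Parser, Planner, Execution, Optimizer
• Authorization at the Driver; Authentication at the Metastore (via MSClient)
• A/A labels at MapReduce and HBase
• A12n/A11N (authentication/authorization) labels at HDFS and the RDBMS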
Hive + HBase + Hadoop Security
• Regardless of Hive's own security, for Hive to work on
  secure Hadoop and HBase, we should:
  – Obtain delegation tokens for Hadoop and HBase jobs
  – Obey the storage-level (HDFS, HBase) permission checks
  – In HiveServer deployments, authenticate and impersonate the user

• Delegation tokens for Hadoop are already working
• Support for obtaining HBase delegation tokens was released in Hive
  0.9.0




Future of Hive + HBase
• Improve on schema / type mapping
• Fully secure Hive deployment options
• HBase bulk import improvements
• Sortable signed numeric types in HBase
• Filter pushdown: non key column filters
• Hive random access support for HBase
  – https://cwiki.apache.org/HCATALOG/random-access-framework.html




References
• Security
  – https://issues.apache.org/jira/browse/HIVE-2764
  – https://issues.apache.org/jira/browse/HBASE-5371
  – https://issues.apache.org/jira/browse/HCATALOG-245
  – https://issues.apache.org/jira/browse/HCATALOG-260
  – https://issues.apache.org/jira/browse/HCATALOG-244
  – https://cwiki.apache.org/confluence/display/HCATALOG/Hcat+Security+Design
• Type mapping / Filter Pushdown
  – https://issues.apache.org/jira/browse/HIVE-1634
  – https://issues.apache.org/jira/browse/HIVE-1226
  – https://issues.apache.org/jira/browse/HIVE-1643
  – https://issues.apache.org/jira/browse/HIVE-2815
Other Resources

• Hadoop Summit
  – June 13-14
  – San Jose, California
  – www.Hadoopsummit.org

• Hadoop Training and Certification
  – Developing Solutions Using Apache Hadoop
  – Administering Apache Hadoop
  – Online classes available US, India, EMEA
  – http://hortonworks.com/training/




Thanks
Questions?





HBaseCon 2013: Integration of Apache Hive and HBase
 
Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix
 
Jan 2012 HUG: HCatalog
Jan 2012 HUG: HCatalogJan 2012 HUG: HCatalog
Jan 2012 HUG: HCatalog
 
SoCal BigData Day
SoCal BigData DaySoCal BigData Day
SoCal BigData Day
 
Apache Ratis - In Search of a Usable Raft Library
Apache Ratis - In Search of a Usable Raft LibraryApache Ratis - In Search of a Usable Raft Library
Apache Ratis - In Search of a Usable Raft Library
 
Cloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a championCloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a champion
 
Mapreduce over snapshots
Mapreduce over snapshotsMapreduce over snapshots
Mapreduce over snapshots
 
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
Apache HBase + Spark: Leveraging your Non-Relational Datastore in Batch and S...
 
Keynote from ApacheCon NA 2011
Keynote from ApacheCon NA 2011Keynote from ApacheCon NA 2011
Keynote from ApacheCon NA 2011
 
Hadoop Trends
Hadoop TrendsHadoop Trends
Hadoop Trends
 
Big data processing engines, Atlanta Meetup 4/30
Big data processing engines, Atlanta Meetup 4/30Big data processing engines, Atlanta Meetup 4/30
Big data processing engines, Atlanta Meetup 4/30
 
Hive 3.0 - HDPの最新バージョンで実現する新機能とパフォーマンス改善
Hive 3.0 - HDPの最新バージョンで実現する新機能とパフォーマンス改善Hive 3.0 - HDPの最新バージョンで実現する新機能とパフォーマンス改善
Hive 3.0 - HDPの最新バージョンで実現する新機能とパフォーマンス改善
 
Hadoop: today and tomorrow
Hadoop: today and tomorrowHadoop: today and tomorrow
Hadoop: today and tomorrow
 
Apache HBase Internals you hoped you Never Needed to Understand
Apache HBase Internals you hoped you Never Needed to UnderstandApache HBase Internals you hoped you Never Needed to Understand
Apache HBase Internals you hoped you Never Needed to Understand
 
Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0
 
BIGDATA ppts
BIGDATA pptsBIGDATA ppts
BIGDATA ppts
 

More from Hortonworks

Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks
 
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyIoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyHortonworks
 
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakGetting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakHortonworks
 
Johns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsJohns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsHortonworks
 
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysCatch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysHortonworks
 
HDF 3.2 - What's New
HDF 3.2 - What's NewHDF 3.2 - What's New
HDF 3.2 - What's NewHortonworks
 
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerCuring Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerHortonworks
 
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsInterpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsHortonworks
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeHortonworks
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidHortonworks
 
Accelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleAccelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleHortonworks
 
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATAHortonworks
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Hortonworks
 
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseDelivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseHortonworks
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseHortonworks
 
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationWebinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationHortonworks
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementHortonworks
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHortonworks
 
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks
 
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCUnlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCHortonworks
 

More from Hortonworks (20)

Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
 
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyIoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
 
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakGetting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with Cloudbreak
 
Johns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsJohns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log Events
 
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysCatch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
 
HDF 3.2 - What's New
HDF 3.2 - What's NewHDF 3.2 - What's New
HDF 3.2 - What's New
 
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerCuring Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
 
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsInterpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data Landscape
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache Druid
 
Accelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleAccelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at Scale
 
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
 
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseDelivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with Ease
 
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationWebinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data Management
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
 
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
 
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCUnlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDC
 

Recently uploaded

How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 

Recently uploaded (20)

How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 

Integration of Hive and HBase

  • 1. Integration of Apache Hive and HBase Enis Soztutar enis [at] apache [dot] org @enissoz Architecting the Future of Big Data © Hortonworks Inc. 2011 Page 1
  • 2. About Me
      • User and committer of Hadoop since 2007
      • Contributor to Apache Hadoop, HBase, Hive and Gora
      • Joined Hortonworks as Member of Technical Staff
      • Twitter: @enissoz
  • 3. Agenda
      • Overview of Hive and HBase
      • Hive + HBase Features and Improvements
      • Future of Hive and HBase
      • Q&A
  • 4. Apache Hive Overview
      • Apache Hive is a data warehouse system for Hadoop
      • SQL-like query language called HiveQL
      • Built for PB-scale data
      • Main purpose is analysis and ad hoc querying
      • Database / table / partition / bucket – DDL operations
      • SQL types + complex types (ARRAY, MAP, etc.)
      • Very extensible
      • Not for: small data sets, low-latency queries, OLTP
  • 5. Apache Hive Architecture
      [Architecture diagram. Components shown: JDBC/ODBC, CLI, Hive Thrift Server, Hive Web Interface, Driver (Parser, Planner, Execution, Optimizer), MSClient, Metastore, MapReduce, HDFS, RDBMS]
  • 6. Overview of Apache HBase
      • Apache HBase is the Hadoop database
      • Modeled after Google’s BigTable
      • A sparse, distributed, persistent multi-dimensional sorted map
      • The map is indexed by a row key, column key, and a timestamp
      • Each value in the map is an uninterpreted array of bytes
      • Low-latency random data access
  • 7. Overview of Apache HBase
      • Logical view: [figure from "Bigtable: A Distributed Storage System for Structured Data", Chang et al.]
  • 8. Apache HBase Architecture
      [Architecture diagram. Components shown: Client, HMaster, ZooKeeper, three region servers each hosting several regions, HDFS]
  • 9. Hive + HBase Features and Improvements
  • 10. Hive + HBase Motivation
      • Hive and HBase have different characteristics:
          Hive: high latency, structured, analysts
          HBase: low latency, unstructured, programmers
      • Hive data warehouses on Hadoop are high latency
          – Long ETL times
          – Access to real-time data
      • Analyzing HBase data with MapReduce requires custom coding
      • Hive and SQL are already known by many analysts
  • 11. Use Case 1: HBase as ETL Data Sink
      [Figure from "HUG - Hive/HBase Integration or, MaybeSQL?", April 2010, John Sichi, Facebook: http://www.slideshare.net/hadoopusergroup/hive-h-basehadoopapr2010]
  • 12. Use Case 2: HBase as Data Source
      [Figure from "HUG - Hive/HBase Integration or, MaybeSQL?", April 2010, John Sichi, Facebook: http://www.slideshare.net/hadoopusergroup/hive-h-basehadoopapr2010]
  • 13. Use Case 3: Low Latency Warehouse
      [Figure from "HUG - Hive/HBase Integration or, MaybeSQL?", April 2010, John Sichi, Facebook: http://www.slideshare.net/hadoopusergroup/hive-h-basehadoopapr2010]
  • 14. Example: Hive + HBase (HBase table)
      hbase(main):001:0> create 'short_urls', {NAME => 'u'}, {NAME => 's'}
      hbase(main):014:0> scan 'short_urls'
      ROW            COLUMN+CELL
      bit.ly/aaaa    column=s:hits, value=100
      bit.ly/aaaa    column=u:url, value=hbase.apache.org/
      bit.ly/abcd    column=s:hits, value=123
      bit.ly/abcd    column=u:url, value=example.com/foo
  • 15. Example: Hive + HBase (Hive table)
      CREATE TABLE short_urls(
        short_url string,
        url string,
        hit_count int
      )
      STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
      WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,u:url,s:hits")
      TBLPROPERTIES ("hbase.table.name" = "short_urls");
  • 16. Storage Handler
      • Hive defines the HiveStorageHandler interface for different storage backends: HBase / Cassandra / MongoDB / etc.
      • Storage Handler has hooks for
          – Getting input / output formats
          – Metadata operations hook: CREATE TABLE, DROP TABLE, etc.
      • Storage Handler is a table-level concept
          – Does not support Hive partitions and buckets
  • 17. Apache Hive + HBase Architecture
      [Architecture diagram. Components shown: CLI, Hive Thrift Server, Hive Web Interface, Driver (Parser, Planner, Execution, Optimizer), MSClient, Metastore, StorageHandler, MapReduce, HBase, HDFS, RDBMS]
  • 18. Hive + HBase Integration
      • For Input/OutputFormat, getSplits(), etc., the underlying HBase classes are used
      • Column selection and certain filters can be pushed down
      • HBase tables can be used with other (Hadoop native) tables and SQL constructs
      • Hive DDL operations are converted to HBase DDL operations via the client hook
          – All operations are performed by the client
          – No two-phase commit
  • 19. Schema / Type Mapping
  • 20. Schema Mapping
      • Hive table + columns + column types <=> HBase table + column families (+ column qualifiers)
      • Every field in the Hive table is mapped, in order, to one of
          – The table key (using :key as selector)
          – A column family (cf:) -> MAP fields in Hive
          – A column (cf:cq)
      • The Hive table does not need to include all columns in HBase
      • CREATE TABLE short_urls(
          short_url string,
          url string,
          hit_count int,
          props map<string,string>
        )
        WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,u:url,s:hits,p:")
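The positional rule on the Schema Mapping slide can be illustrated with a small Python sketch. This is not Hive's parser; `pair_columns` and the labels it returns are hypothetical names chosen here, and the sketch only assumes that entries are comma-separated and matched to Hive columns in order.

```python
def pair_columns(hive_columns, mapping):
    """Pair each Hive column with its HBase target, position by position.

    Hypothetical helper for illustration only, not part of Hive.
    """
    entries = [entry.strip() for entry in mapping.split(",")]
    if len(entries) != len(hive_columns):
        raise ValueError("mapping needs exactly one entry per Hive column")

    paired = []
    for name, entry in zip(hive_columns, entries):
        if entry == ":key":
            # The HBase row key.
            paired.append((name, "row key", None, None))
            continue
        family, _, qualifier = entry.partition(":")
        if qualifier:
            # cf:cq addresses a single column.
            paired.append((name, "column", family, qualifier))
        else:
            # cf: addresses a whole column family (a MAP field in Hive).
            paired.append((name, "column family", family, None))
    return paired


columns = ["short_url", "url", "hit_count", "props"]
result = pair_columns(columns, ":key,u:url,s:hits,p:")
for row in result:
    print(row)
```

Keeping the match purely positional is why the Hive column names (`hit_count`) are free to differ from the HBase qualifiers (`hits`).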
  • 21. Type Mapping
      • Recently added to Hive (0.9.0)
      • Previously all types were being converted to strings in HBase
      • Hive has:
          – Primitive types: INT, STRING, BINARY, DATE, etc.
          – ARRAY<Type>
          – MAP<PrimitiveType, Type>
          – STRUCT<a:INT, b:STRING, c:STRING>
      • HBase does not have types
          – Bytes.toBytes()
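To make the difference between string and binary storage concrete, here is a minimal Python sketch. It mimics the byte layout that HBase's `Bytes.toBytes(int)` produces (4 bytes, big-endian); the two function names are made up for this example.

```python
import struct


def encode_int_as_string(value):
    # String storage: the decimal digits of the number as UTF-8 text.
    return str(value).encode("utf-8")


def encode_int_as_binary(value):
    # Binary storage: 4-byte big-endian two's complement,
    # the same layout as Bytes.toBytes(int) in HBase.
    return struct.pack(">i", value)


print(encode_int_as_string(100))
print(encode_int_as_binary(100))
```

One practical consequence: HBase compares values byte by byte, so string-encoded numbers sort as text ("123" before "20"), while binary-encoded non-negative integers sort in numeric order.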
  • 22. Type Mapping
      • Table-level property "hbase.table.default.storage.type" = "binary"
      • Type mapping can be given per column after #
          – Any prefix of "binary", e.g. u:url#b
          – Any prefix of "string", e.g. u:url#s
          – The dash char "-", e.g. u:url#-
        CREATE TABLE short_urls(
          short_url string,
          url string,
          hit_count int,
          props map<string,string>
        )
        WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key#b,u:url#b,s:hits#b,p:#s")
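The per-column suffix rule can be sketched in Python as follows. This is an illustration, not Hive's code: `storage_type` is a hypothetical helper, and two behaviours are assumptions on my part rather than statements from the slide: that "-" falls back to the table-level default, and that the default is "string" when the table property is not set.

```python
def storage_type(entry, table_default="string"):
    """Split one mapping entry into (column spec, storage type).

    Assumption: a missing suffix or "-" means "use the table default".
    """
    spec, separator, suffix = entry.partition("#")
    if not separator or suffix == "-":
        return spec, table_default
    for kind in ("binary", "string"):
        # Any non-empty prefix of the type name is accepted: b, bin, binary...
        if suffix and kind.startswith(suffix):
            return spec, kind
    raise ValueError("unknown storage type suffix: %r" % suffix)


for entry in ":key#b,u:url#b,s:hits#b,p:#s".split(","):
    print(storage_type(entry))
```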
  • 23. Type Mapping
      • If the type is not a primitive or Map, it is converted to a JSON string and serialized
      • Still a few rough edges for schema and type mapping:
          – No Hive BINARY support in HBase mapping
          – No mapping of HBase timestamp (can only provide put timestamp)
          – No arbitrary mapping of Structs / Arrays into HBase schema
  • 24. Bulk Load
      • Steps to bulk load:
          – Sample source data for range partitioning
          – Save sampling results to a file
          – Run CLUSTER BY query using HiveHFileOutputFormat and TotalOrderPartitioner
          – Import HFiles into HBase table
      • Ideal setup should be
          SET hive.hbase.bulk=true;
          INSERT OVERWRITE TABLE web_table SELECT …
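The first bulk-load steps (sample, derive ranges, route rows by range) can be sketched in a few lines of Python. This only shows the idea behind range partitioning with a total order; it is not the TotalOrderPartitioner implementation, and `split_points`, `partition_for`, and the sample keys are invented for the example.

```python
import bisect


def split_points(sampled_keys, num_partitions):
    """Pick num_partitions - 1 boundary keys from a sample of row keys."""
    if num_partitions < 1 or not sampled_keys:
        raise ValueError("need at least one partition and one sampled key")
    ordered = sorted(sampled_keys)
    step = len(ordered) / num_partitions
    return [ordered[int(step * i)] for i in range(1, num_partitions)]


def partition_for(key, points):
    """Index of the key range (and so the output file) a row key falls into."""
    return bisect.bisect_right(points, key)


sample = ["b", "e", "a", "d", "f", "c"]
points = split_points(sample, 3)
print(points)
print([partition_for(k, points) for k in ["a", "c", "zzz"]])
```

Because every key in partition i sorts before every key in partition i + 1, each partition can be written out as a sorted file covering a disjoint key range, which is what the final import step needs.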
  • 25. Filter Pushdown
  • 26. Filter Pushdown
      • The idea is to pass filter expressions down to the storage layer to minimize the data scanned
      • Also lets queries use indexes in HDFS or HBase
      • Example:
          CREATE EXTERNAL TABLE users (userid BIGINT, email STRING, … )
          STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
          WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,…");
          SELECT ... FROM users WHERE userid > 1000000 AND email LIKE '%@gmail.com';
          -> scan.setStartRow(Bytes.toBytes(1000000))
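The benefit of turning the key predicate into a scan start row can be shown with a toy model in Python. A sorted list of `(userid, email)` pairs stands in for an HBase table ordered by row key; the data and function names are invented for illustration and no HBase API is involved.

```python
import bisect

# Toy table, sorted by row key as HBase keeps it
rows = [
    (500_000, "a@gmail.com"),
    (900_000, "b@example.org"),
    (1_000_000, "c@gmail.com"),
    (1_200_000, "d@gmail.com"),
    (1_500_000, "e@example.org"),
    (2_000_000, "f@gmail.com"),
]


def full_scan(rows, min_id, suffix):
    """No pushdown: every row is read and both predicates run afterwards."""
    examined, matches = 0, []
    for userid, email in rows:
        examined += 1
        if userid > min_id and email.endswith(suffix):
            matches.append(userid)
    return matches, examined


def pushed_down_scan(rows, min_id, suffix):
    """Pushdown: the key predicate becomes the position where the scan starts."""
    keys = [userid for userid, _ in rows]
    start = bisect.bisect_right(keys, min_id)   # first row with key > min_id
    examined, matches = 0, []
    for userid, email in rows[start:]:
        examined += 1
        if email.endswith(suffix):              # the non-key predicate still runs per row
            matches.append(userid)
    return matches, examined


print(full_scan(rows, 1_000_000, "@gmail.com"))
print(pushed_down_scan(rows, 1_000_000, "@gmail.com"))
```

Both scans return the same user ids, but the pushed-down version reads only the rows after the start position. In this toy model the LIKE predicate on email is still checked row by row.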
  • 27. Filter Decomposition
      • The optimizer pushes the predicates down the query plan
      • Storage handlers can negotiate with the Hive optimizer to decompose the filter
          x > 3 AND upper(y) = 'XYZ'
      • The handler takes x > 3 and returns upper(y) = 'XYZ' as the residual for Hive to evaluate
      • Works with:
          key = 3, key > 3, etc.
          key > 3 AND key < 100
      • Only works against constant expressions
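The negotiation described on this slide amounts to splitting a conjunction into a part the storage handler accepts and a residual that Hive evaluates itself. The Python sketch below models only that split. The `Predicate` class, the `decompose` function and the set of accepted operators are assumptions for illustration, not Hive's actual interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    column: str   # column the predicate reads
    op: str       # comparison operator, or "call" for a function expression
    text: str     # original expression text


# Assumed set of comparisons a handler could turn into row-key ranges
PUSHABLE_OPS = {"=", ">", ">=", "<", "<="}


def decompose(conjuncts, key_column):
    """Split AND-ed predicates into (pushed to storage, residual for Hive)."""
    pushed, residual = [], []
    for predicate in conjuncts:
        if predicate.column == key_column and predicate.op in PUSHABLE_OPS:
            pushed.append(predicate)
        else:
            residual.append(predicate)
    return pushed, residual


conjuncts = [
    Predicate("x", ">", "x > 3"),
    Predicate("y", "call", "upper(y) = 'XYZ'"),
]
pushed, residual = decompose(conjuncts, key_column="x")
print("pushed:  ", [p.text for p in pushed])
print("residual:", [p.text for p in residual])
```

Hive still applies the residual to every row the handler returns, so results stay correct even when the handler accepts only part of the filter.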
  • 28. Security Aspects
      Towards fully secure deployments
  • 29. Security – Big Picture
      • Security is becoming more important for supporting enterprise-level and multi-tenant applications
      • 5 different components to ensure / impose security:
          – HDFS
          – MapReduce
          – HBase
          – ZooKeeper
          – Hive
      • Each component has:
          – Authentication
          – Authorization
  • 30. HBase Security – Closer look
      • Released with HBase 0.92
      • Fully optional module, disabled by default
      • Needs an underlying secure Hadoop release
      • SecureRpcEngine: optional engine enforcing SASL authentication
          – Kerberos
          – DIGEST-MD5 based tokens
          – TokenProvider coprocessor
      • Access control is implemented as a coprocessor: AccessController
      • Stores and distributes ACL data via ZooKeeper
          – Sensitive data is only accessible by HBase daemons
          – The client does not need to authenticate to ZooKeeper
  • 31. Hive Security – Closer look
      • Hive has different deployment options; security considerations should take each of them into account
      • Authentication is only supported at the Metastore, not on HiveServer, the web interface, or JDBC
      • Authorization is enforced at the query layer (Driver)
      • Pluggable authorization providers; the default one stores global / table / partition / column permissions in the Metastore
          GRANT ALTER ON TABLE web_table TO USER bob;
          CREATE ROLE db_reader;
          GRANT SELECT, SHOW_DATABASE ON DATABASE mydb TO ROLE db_reader;
  • 32. Hive Deployment Option 1
      [Diagram. Components shown: Client with CLI and Driver (Parser, Planner, Optimizer, Execution), MSClient, Metastore, MapReduce, HBase, HDFS, RDBMS. Labels shown: Authorization, Authentication, A/A (twice), A12n/A11N (twice).]
  • 33. Hive Deployment Option 2
      [Diagram. Components shown: Client with CLI and Driver (Parser, Planner, Optimizer, Execution), MSClient, Metastore, MapReduce, HBase, HDFS, RDBMS. Labels shown: Authorization, Authentication, A/A (twice), A12n/A11N (twice).]
  • 34. Hive Deployment Option 3
      [Diagram. Components shown: Client, JDBC/ODBC, CLI, Hive Thrift Server, Hive Web Interface, Driver (Parser, Planner, Optimizer, Execution), MSClient, Metastore, MapReduce, HBase, HDFS, RDBMS. Labels shown: Authorization, Authentication, A/A (twice), A12n/A11N (twice).]
  • 35. Hive + HBase + Hadoop Security
      • Regardless of Hive's own security, for Hive to work on secure Hadoop and HBase, we should:
          – Obtain delegation tokens for Hadoop and HBase jobs
          – Obey the storage-level (HDFS, HBase) permission checks
          – In HiveServer deployments, authenticate and impersonate the user
      • Delegation tokens for Hadoop are already working
      • Support for obtaining HBase delegation tokens was released in Hive 0.9.0
  • 36. Future of Hive + HBase
      • Improve schema / type mapping
      • Fully secure Hive deployment options
      • HBase bulk import improvements
      • Sortable signed numeric types in HBase
      • Filter pushdown: non-key column filters
      • Hive random access support for HBase
          – https://cwiki.apache.org/HCATALOG/random-access-framework.html
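The "sortable signed numeric types" item is easier to see with a concrete case. In plain two's-complement big-endian form, negative integers compare as larger than positive ones when HBase compares bytes. The Python sketch below shows that problem and one common remedy, flipping the sign bit. This illustrates the general technique only and does not claim to be the encoding that Hive or HBase adopted; the function names are made up.

```python
import struct


def plain_bytes(value):
    # Two's-complement, big-endian: the layout of a plain 32-bit int
    return struct.pack(">i", value)


def sortable_bytes(value):
    # Shift the range by 2**31 (equivalent to flipping the sign bit) and
    # store as unsigned, so that byte order matches numeric order
    return struct.pack(">I", value + 2**31)


values = [-5, -1, 0, 3, 1000]
print(sorted(values, key=plain_bytes))
print(sorted(values, key=sortable_bytes))
```

With the plain encoding, the negative values sort after the positive ones, so a key-range scan over signed numbers would return the wrong rows. The shifted encoding restores numeric order.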
  • 37. References
      • Security
          – https://issues.apache.org/jira/browse/HIVE-2764
          – https://issues.apache.org/jira/browse/HBASE-5371
          – https://issues.apache.org/jira/browse/HCATALOG-245
          – https://issues.apache.org/jira/browse/HCATALOG-260
          – https://issues.apache.org/jira/browse/HCATALOG-244
          – https://cwiki.apache.org/confluence/display/HCATALOG/Hcat+Security+Design
      • Type mapping / Filter Pushdown
          – https://issues.apache.org/jira/browse/HIVE-1634
          – https://issues.apache.org/jira/browse/HIVE-1226
          – https://issues.apache.org/jira/browse/HIVE-1643
          – https://issues.apache.org/jira/browse/HIVE-2815
  • 38. Other Resources
      • Hadoop Summit
          – June 13-14
          – San Jose, California
          – www.Hadoopsummit.org
      • Hadoop Training and Certification
          – Developing Solutions Using Apache Hadoop
          – Administering Apache Hadoop
          – Online classes available in the US, India, and EMEA
          – http://hortonworks.com/training/
  • 39. Thanks
      Questions?