Submit Search
Upload
LLAP: Sub-Second Analytical Queries in Hive
•
Download as PPTX, PDF
•
20 likes
•
6,871 views
DataWorks Summit/Hadoop Summit
Follow
LLAP: Sub-Second Analytical Queries in Hive
Read less
Read more
Technology
Report
Share
Report
Share
1 of 46
Download now
Recommended
Speed Up Your Queries with Hive LLAP Engine on Hadoop or in the Cloud
Speed Up Your Queries with Hive LLAP Engine on Hadoop or in the Cloud
gluent.
Achieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on Tez
DataWorks Summit/Hadoop Summit
Evolving HDFS to Generalized Storage Subsystem
Evolving HDFS to Generalized Storage Subsystem
DataWorks Summit/Hadoop Summit
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
DataWorks Summit/Hadoop Summit
Taming the Elephant: Efficient and Effective Apache Hadoop Management
Taming the Elephant: Efficient and Effective Apache Hadoop Management
DataWorks Summit/Hadoop Summit
LLAP Nov Meetup
LLAP Nov Meetup
t3rmin4t0r
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
DataWorks Summit
Llap: Locality is Dead
Llap: Locality is Dead
t3rmin4t0r
Recommended
Speed Up Your Queries with Hive LLAP Engine on Hadoop or in the Cloud
Speed Up Your Queries with Hive LLAP Engine on Hadoop or in the Cloud
gluent.
Achieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on Tez
DataWorks Summit/Hadoop Summit
Evolving HDFS to Generalized Storage Subsystem
Evolving HDFS to Generalized Storage Subsystem
DataWorks Summit/Hadoop Summit
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
DataWorks Summit/Hadoop Summit
Taming the Elephant: Efficient and Effective Apache Hadoop Management
Taming the Elephant: Efficient and Effective Apache Hadoop Management
DataWorks Summit/Hadoop Summit
LLAP Nov Meetup
LLAP Nov Meetup
t3rmin4t0r
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
DataWorks Summit
Llap: Locality is Dead
Llap: Locality is Dead
t3rmin4t0r
Evolving HDFS to a Generalized Distributed Storage Subsystem
Evolving HDFS to a Generalized Distributed Storage Subsystem
DataWorks Summit/Hadoop Summit
ORC 2015: Faster, Better, Smaller
ORC 2015: Faster, Better, Smaller
The Apache Software Foundation
Tune up Yarn and Hive
Tune up Yarn and Hive
rxu
How the Internet of Things are Turning the Internet Upside Down
How the Internet of Things are Turning the Internet Upside Down
DataWorks Summit
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
alanfgates
Optimizing Hive Queries
Optimizing Hive Queries
DataWorks Summit
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
DataWorks Summit
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
DataWorks Summit/Hadoop Summit
Debunking the Myths of HDFS Erasure Coding Performance
Debunking the Myths of HDFS Erasure Coding Performance
DataWorks Summit/Hadoop Summit
Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5
Chris Nauroth
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
Mich Talebzadeh (Ph.D.)
What's new in hadoop 3.0
What's new in hadoop 3.0
Heiko Loewe
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
DataWorks Summit/Hadoop Summit
Optimizing Hive Queries
Optimizing Hive Queries
Owen O'Malley
Sub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scale
Yifeng Jiang
HDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFS
DataWorks Summit/Hadoop Summit
Unified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache Samza
DataWorks Summit
Scaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value Stores
DataWorks Summit
Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...
Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...
DataWorks Summit
Network for the Large-scale Hadoop cluster at Yahoo! JAPAN
Network for the Large-scale Hadoop cluster at Yahoo! JAPAN
DataWorks Summit/Hadoop Summit
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
DataWorks Summit/Hadoop Summit
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
DataWorks Summit/Hadoop Summit
More Related Content
What's hot
Evolving HDFS to a Generalized Distributed Storage Subsystem
Evolving HDFS to a Generalized Distributed Storage Subsystem
DataWorks Summit/Hadoop Summit
ORC 2015: Faster, Better, Smaller
ORC 2015: Faster, Better, Smaller
The Apache Software Foundation
Tune up Yarn and Hive
Tune up Yarn and Hive
rxu
How the Internet of Things are Turning the Internet Upside Down
How the Internet of Things are Turning the Internet Upside Down
DataWorks Summit
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
alanfgates
Optimizing Hive Queries
Optimizing Hive Queries
DataWorks Summit
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
DataWorks Summit
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
DataWorks Summit/Hadoop Summit
Debunking the Myths of HDFS Erasure Coding Performance
Debunking the Myths of HDFS Erasure Coding Performance
DataWorks Summit/Hadoop Summit
Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5
Chris Nauroth
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
Mich Talebzadeh (Ph.D.)
What's new in hadoop 3.0
What's new in hadoop 3.0
Heiko Loewe
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
DataWorks Summit/Hadoop Summit
Optimizing Hive Queries
Optimizing Hive Queries
Owen O'Malley
Sub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scale
Yifeng Jiang
HDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFS
DataWorks Summit/Hadoop Summit
Unified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache Samza
DataWorks Summit
Scaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value Stores
DataWorks Summit
Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...
Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...
DataWorks Summit
Network for the Large-scale Hadoop cluster at Yahoo! JAPAN
Network for the Large-scale Hadoop cluster at Yahoo! JAPAN
DataWorks Summit/Hadoop Summit
What's hot
(20)
Evolving HDFS to a Generalized Distributed Storage Subsystem
Evolving HDFS to a Generalized Distributed Storage Subsystem
ORC 2015: Faster, Better, Smaller
ORC 2015: Faster, Better, Smaller
Tune up Yarn and Hive
Tune up Yarn and Hive
How the Internet of Things are Turning the Internet Upside Down
How the Internet of Things are Turning the Internet Upside Down
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Optimizing Hive Queries
Optimizing Hive Queries
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Debunking the Myths of HDFS Erasure Coding Performance
Debunking the Myths of HDFS Erasure Coding Performance
Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
Query Engines for Hive: MR, Spark, Tez with LLAP – Considerations!
What's new in hadoop 3.0
What's new in hadoop 3.0
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
Optimizing Hive Queries
Optimizing Hive Queries
Sub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scale
HDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFS
Unified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache Samza
Scaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value Stores
Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...
Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...
Network for the Large-scale Hadoop cluster at Yahoo! JAPAN
Network for the Large-scale Hadoop cluster at Yahoo! JAPAN
Similar to LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
DataWorks Summit/Hadoop Summit
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
DataWorks Summit/Hadoop Summit
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
DataWorks Summit/Hadoop Summit
LLAP: long-lived execution in Hive
LLAP: long-lived execution in Hive
DataWorks Summit
Stinger.Next by Alan Gates of Hortonworks
Stinger.Next by Alan Gates of Hortonworks
Data Con LA
LLAP: Building Cloud First BI
LLAP: Building Cloud First BI
DataWorks Summit
Hive acid and_2.x new_features
Hive acid and_2.x new_features
Alberto Romero
Kinesis vs-kafka-and-kafka-deep-dive
Kinesis vs-kafka-and-kafka-deep-dive
Yifeng Jiang
Hive Performance Dataworks Summit Melbourne February 2019
Hive Performance Dataworks Summit Melbourne February 2019
alanfgates
Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?
DataWorks Summit
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
DataWorks Summit
Curing the Kafka blindness—Streams Messaging Manager
Curing the Kafka blindness—Streams Messaging Manager
DataWorks Summit
What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?
DataWorks Summit
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?
DataWorks Summit
What's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - Tokyo
DataWorks Summit
Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017
alanfgates
An Apache Hive Based Data Warehouse
An Apache Hive Based Data Warehouse
DataWorks Summit
What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?
DataWorks Summit
Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Hive 3 New Horizons DataWorks Summit Melbourne February 2019
alanfgates
Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
DataWorks Summit
Similar to LLAP: Sub-Second Analytical Queries in Hive
(20)
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
LLAP: long-lived execution in Hive
LLAP: long-lived execution in Hive
Stinger.Next by Alan Gates of Hortonworks
Stinger.Next by Alan Gates of Hortonworks
LLAP: Building Cloud First BI
LLAP: Building Cloud First BI
Hive acid and_2.x new_features
Hive acid and_2.x new_features
Kinesis vs-kafka-and-kafka-deep-dive
Kinesis vs-kafka-and-kafka-deep-dive
Hive Performance Dataworks Summit Melbourne February 2019
Hive Performance Dataworks Summit Melbourne February 2019
Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Curing the Kafka blindness—Streams Messaging Manager
Curing the Kafka blindness—Streams Messaging Manager
What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - Tokyo
Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017
An Apache Hive Based Data Warehouse
An Apache Hive Based Data Warehouse
What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?
Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
More from DataWorks Summit/Hadoop Summit
Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
DataWorks Summit/Hadoop Summit
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
DataWorks Summit/Hadoop Summit
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
DataWorks Summit/Hadoop Summit
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
DataWorks Summit/Hadoop Summit
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
DataWorks Summit/Hadoop Summit
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
DataWorks Summit/Hadoop Summit
Hadoop Crash Course
Hadoop Crash Course
DataWorks Summit/Hadoop Summit
Data Science Crash Course
Data Science Crash Course
DataWorks Summit/Hadoop Summit
Apache Spark Crash Course
Apache Spark Crash Course
DataWorks Summit/Hadoop Summit
Dataflow with Apache NiFi
Dataflow with Apache NiFi
DataWorks Summit/Hadoop Summit
Schema Registry - Set you Data Free
Schema Registry - Set you Data Free
DataWorks Summit/Hadoop Summit
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
DataWorks Summit/Hadoop Summit
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
DataWorks Summit/Hadoop Summit
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
DataWorks Summit/Hadoop Summit
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
DataWorks Summit/Hadoop Summit
HBase in Practice
HBase in Practice
DataWorks Summit/Hadoop Summit
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
DataWorks Summit/Hadoop Summit
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
DataWorks Summit/Hadoop Summit
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
DataWorks Summit/Hadoop Summit
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
DataWorks Summit/Hadoop Summit
More from DataWorks Summit/Hadoop Summit
(20)
Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
Hadoop Crash Course
Hadoop Crash Course
Data Science Crash Course
Data Science Crash Course
Apache Spark Crash Course
Apache Spark Crash Course
Dataflow with Apache NiFi
Dataflow with Apache NiFi
Schema Registry - Set you Data Free
Schema Registry - Set you Data Free
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
HBase in Practice
HBase in Practice
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Recently uploaded
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
Radu Cotescu
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
Ridwan Fadjar
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
Delhi Call girls
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
naman860154
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
Rafal Los
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
Pixlogix Infotech
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
soniya singh
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
Safe Software
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
hans926745
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
2toLead Limited
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
Allon Mureinik
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
Gabriella Davis
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
Delhi Call girls
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
Sinan KOZAK
Slack Application Development 101 Slides
Slack Application Development 101 Slides
praypatel2
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
Softradix Technologies
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Patryk Bandurski
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
Pooja Nehwal
Recently uploaded
(20)
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
Slack Application Development 101 Slides
Slack Application Development 101 Slides
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
LLAP: Sub-Second Analytical Queries in Hive
1.
Page1 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved LLAP: Sub-Second Analytical Queries in Hive Sergey Shelukhin, Siddharth Seth
2.
Page2 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved LLAP: long-lived execution in Hive What is LLAP?+ + LLAP in Hive; performance primer+ + LLAP as a unified data access layer+ + Deploying and managing LLAP+ + Future work+ + + How does LLAP make queries faster?+
3.
Page3 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved What is LLAP?
4.
Page4 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved What is LLAP? • Hybrid model combining daemons and containers for fast, concurrent execution of analytical workloads (e.g. Hive SQL queries) • Concurrent queries without specialized YARN queue setup • Multi-threaded execution of vectorized operator pipelines • Asynchronous IO and efficient in-memory caching • Relational view of the data available thru the API • High performance scans, execution code pushdown • Centralized data security Node LLAP Process Cache Query Fragment HDFS Query Fragment
5.
Page5 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved What LLAP isn't • Not another "execution engine" (like Tez, MR, Spark…) • Tez, Spark (work in progress) can work on top of LLAP – Engines provide coordination and scheduling – some work (e.g. large shuffles) can be scheduled in containers • Not a storage substrate • Daemons are stateless and read (and cache) data from HDFS/S3/Azure/..
6.
Page6 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved LLAP Queue Technical overview – execution • LLAP daemon has a number of executors (think containers) that execute work "fragments" • Fragments are parts of one, or multiple parallel workloads (e.g. Hive SQL queries) • Work queue with pluggable priority • Geared towards low latency queries over long- running queries (by default) • I/O is similar to containers – read/write to HDFS, shuffle, other storages and formats • Streaming output for data API Executor Q1 Map 1 Executor External read Executor Q3 Reducer 3 Q1 Map 1 Q1 Map 1 Q3 Map 19 HDFS Waiting for shuffle inputs HBase Container (shuffle input) Spark executor
7.
Page7 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Executor Technical overview – IO layer • Optional when executing inside LLAP • Wraps existing InputFormat • Currently only ORC is supported • Asynchronous IO for Hive • Transparent, compressed in-memory cache • Format-specific, extensible • SSD cache is work in progress RecordReade r Fragment Cache IO thread Plan & decode Read, decompress Metadata cache Actual data (HDFS, S3, …) What to read Data buffers Splits Vectorized data
8.
Page8 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved LLAP in Hive; performance primer
9.
Page9 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Overview - Hive + LLAP • Transparent to Hive users, BI tools, etc. • Except the queries become faster :) • Number of concurrent queries throttled by Hive Server • Hive decides where query fragments run (LLAP, Container, AM) based on configuration, data size, format, etc. • Each Query coordinated independently by a Tez AM • Hive Operators used for processing • Tez Runtime used for data transfer HiveServer2 Query/AM Controller Client(s) YARN Cluster AM1 llapd llapd Container AM1 Container AM1 llapd Container AM2 AM2 AM3 llapd
10.
Page10 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved How does LLAP speed up the queries? • Eliminates container startup costs • JIT optimizer has a chance to work • Especially important for vectorized processing • Asynchronous IO • Caching • Data sharing (hash join tables, etc.) • Optimizations for parallel queries
11.
Page11 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Industry benchmark – 10Tb scale 0 5 10 15 20 25 30 35 40 45 50 query3 query12 query20 query21 query26 query27 query42 query52 query55 query73 query89 query91 query98 Time(seconds) LLAP vs Hive 1.x 10TB Scale Hive 1.x LLAP90% faster
12.
Page12 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Evaluation for a real use case 0 20000 1 3 5 7 9 11 13 15 17 19 21 23 25 27 • Aggregate daily statistics for a time interval: SELECT yyyymmdd, sum(total_1), sum(total_2), ... from table where yyyymmdd >= xxx and yyyymmdd < xxx and userid = xxx group by userid, yyyymmdd; Max Max Max Avg Avg Avg 0 1 2 3 4 5 6 7 8 D W M Y D W M Y D W M Y Tez Phoenix LLAP Executiontime,s
13.
Page13 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Evaluation for a real use case ExecutionTime,s. • Display a large report • Phoenix runtimes not displayed (30-250s on large ranges) Execution Time in seconds over time range Max Max Avg Avg 0 5 10 15 20 D W M Y D W M Y Tez LLAP Executiontime,s
14.
Page14 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved How does LLAP make queries faster?
15.
Page15 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Persistent daemons (recap) • Reduced startup time • Better use of JIT optimizer • Hash join hash tables, fragment plans are shared • Multiple tasks do not all generate the table or deserialize the plans
16.
Page16 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Performance – first query • Cold LLAP is nearly as fast as shared pre-warmed containers (impractical on real clusters) • Realistic (long-running) LLAP ~3x faster than realistic (no prewarm) Tez No prewarm, 20.94 Shared prewarm 11.96 Cold LLAP 13.23 Realistic LLAP 7.49 0 5 10 15 20 25 Firstqueryruntime,s
17.
Page17 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Performance – repeated query • Cache disabled! 13.23 7.93 7.7 6.29 5.44 5.3 5.01 4.89 4.83 7.49 6.1 4.61 4.59 4.93 4.62 4.63 4.45 4.25 0 2 4 6 8 10 12 14 0 1 2 3 4 5 6 7 8 Queryruntime,s Query run Cold LLAP LLAP after an unrelated workload
18.
Page18 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Parallel queries – priorities, preemption • Lower-priority fragments can be preempted • For example, a fragment can start running before its inputs are ready, for better pipelining; such fragments may be preempted • LLAP work queue examines the DAG parameters to give preference to interactive (BI) queries LLAP QueueExecutor Executor Interactive query map 1/3 … Interactive query map 3/3 Executor Interactive query map 2/3 Wide query reduce waiting Time ClusterUtilization Long-Running Query Short-Running Query
19.
Page19 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Parallel query execution – LLAP vs Hive 1.2 3131 2416 1741 1420 1508 531 291 147 94 75 100% 91% 90% 71% 44% 45% 13% 0.00% 20.00% 40.00% 60.00% 80.00% 100.00% 0 500 1000 1500 2000 2500 3000 3500 1 2 4 8 16 Scaling(comparedtolinear) Totalruntimeforallqueries,s. Number of concurrent users Hive 1.2 linear scaling Hive 1.2 total runtime LLAP linear scaling LLAP total runtime Hive+LLAP efficiency Hive 1.2 efficiency
20.
Page20 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Parallel query execution – 10Tb scale 531 291 147 94 75 100% 91% 90% 71% 44% 0.00% 20.00% 40.00% 60.00% 80.00% 100.00% 0 100 200 300 400 500 600 1 2 4 8 16 Scaling(comparedtolinear) Totalruntimeforallqueries,s. Number of concurrent users LLAP linear scaling LLAP total runtime Hive+LLAP efficiency
21.
Page21 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved IO elevator and caching • Reading, decoding and processing are parallel • Unlike in regular Hive, where processing can wait for e.g. an S3 read • Logically-compresses (e.g. ORC-encoded) data is cached off-heap • Metadata is cached (currently on-heap) with high priority • Replacement policy is pluggable (LRFU is the default) • Tez schedules work to preferred location to improve locality • Tradeoff between HDFS reads and operator speed • Cache takes memory away from operators (sort buffers, hash join tables, etc.) • Depends on how much memory you have, workflow, dataset size, etc. • Ex. 4Gb per executor x 16 CPUs – 64Gb for executors, rest for cache
22.
Page22 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved I/OExecution In-memory processing – present and future Decoder Distributed FS Fragment Hive operator Hive operator Vectorized processin g col1 col2 Low-level pushdown (e.g. filter)* SSD cache* Off-heap cache Compression codec Native data vectors - work in progress * Compact encoded data Compressed data col1 col2
23.
Page23 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Fragment I/OExecution In-memory processing – future? Distributed FSHive operator Hive operator col1 col2 SSD cache* Off-heap cache Compression codec - work in progress * Compact encoded data Compressed data Processing on encoded data, codegen Zero copy - next steps? * Low-level operators (e.g. filter) *
24.
Page24 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Performance – cache on slow FS • Industry-standard cloud FS, much slower than on-prem HDFS • Large scan on 1Tb scale • Local HD/SDD cache (WIP) • Massive improvement on slow storage with little memory cost 0 50 100 150 200 250 300 350 400 Cold cache Hot cache Runtime,s Entire query Scan portion
25.
Page25 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Performance – cache on HDFS, 1Tb scale 20.2 18.3 17.6 16.2 16.2 16.8 14.5 11.3 0.00% 20.00% 40.00% 60.00% 80.00% 100.00% 0 5 10 15 20 25 24 26 28 30 32 34 36 38 40 42 Cachehitrate Queryruntime,s Cache size, Gb Query runtime, after unrelated workload Query runtime, after related workload Metadata hit rate Data hit rate
26.
Page26 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Perf overview – a demo in a gif
27.
Page27 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved LLAP as the unified data access layer
28.
Page28 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Overview • LLAP can provide a "relational datanode" view of the data • Security via endpoint ACLs, or fragment signing (for external engines) • Granular, e.g. column-level, security is possible • Anyone (with access) can push the (approved) code in, from complex query fragments to simple data reads • SparkSQL/LLAP integration is based on SparkSQL data source APIs • DataFrame can be created with LlapInputFormat • Hive execution engines and other DAG executors can use LLAP directly • like Tez does now
29.
Page29 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Example - SparkSQL integration • Allowing direct access to Hive table data from Spark breaks security model for warehouses secured using Ranger or SQL Standard Authorization • Other features may not work correctly unless support added in Spark • Row/Cell level security (when implemented), ACID, Schema Evolution, … • SparkSQL has interfaces for external sources; they can be implemented to access Hive data via HS2/LLAP • SparkSQL Catalyst optimizer hooks can be used to push processing into LLAP (e.g. filters on the tables) • Some optimizations, like dynamic partition pruning, can be used by Hive even if SparkSQL doesn't support them; if execution pushdown is advanced enough
30.
Page30 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Example - SparkSQL integration – execution flow Page 30 HadoopRDD.compute() LlapInputFormat.getRecordReader() - Open socket for incoming data - Send package/plan to LLAP Check permissions Generate splits w/LLAP locations Return securely signed splits HiveServer2 Spark Executor HadoopRDD.getPartitions() Get Hive splits Cluster nodes Verify splits Run query plan Send data back LLAP Partitions/ Splits Request splits : : var llapContext = LlapContext.newInstance( sparkContext, jdbcUrl) var df: DataFrame = llapContext.sql("select * from tpch_text_5.region") DataFrame for Hive/LLAP data
31.
Page31 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Example - Tez/LLAP integration • Tez DAG coordination remains mostly unchanged • Tez plugins (TEZ-2003) • Scheduling – allows scheduling parts of a DAG as fragments on LLAP • Task communication – custom execution specifications, protocols – allows talking to LLAP, manages security tokens, etc. • Hive operators used for processing • Few or no changes necessary to support any Hive query complexity • All new Hive features automatically supported
32.
Page32 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Deploying and managing LLAP
33.
Page33 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Availability • First version shipped in Apache Hive 2.0 • Hive 2.0.1 contains important bugfixes to make it production ready • Hive 2.1 would have new features and further perf improvements • A solid platform for the basic scenario – run LLAP-only queries on a secure cluster • or a fraction thereof • Work in progress to support additional features (ACID, UDFs, hybrid deployments, etc.) • Unified data access layer is the next step
34.
Page34 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved General overview • LLAP daemons run on YARN; deployed via Apache Slider • Easy to bring up, tear down, flex clusters via slider commands • Hive-on-Tez should be set up with a compatible version (0.8.2+) • Zookeeper for service discovery and coordination • On HS2, using doAs is not recommended; use SQL auth or Ranger • However, you can keep using storage-based auth for other queries, as usual • Java 8 strongly recommended • April 2015 called and they want their Java 7 End-of-Lived
35.
Page35 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Starting the cluster • Apache Ambari integration – WIP (will do all of this for you) • Hive service (hive --service llap) • Generates a slider package, and a script to deploy it and start the cluster • Run as the correct user – slider paths are user-specific! • HADOOP_HOME, JAVA_HOME should be set • kinit on secure cluster • Specify a name, e.g. --name llap0; number of instances (e.g. one per machine); memory, cache size, number of executors (e.g. one per CPU); see --help • G1 GC recommended (e.g. --args " -XX:+UseG1GC -XX:+ResizeTLAB - XX:-ResizePLAB")
36.
Page36 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Monitoring • LLAP exposes a UI for monitoring • Also has jmx endpoint with much more data, logs and jstack endpoints as usual • Aggregate monitoring UI is work in progress
37.
Page37 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Running queries • Once the cluster is started, queries can be run from HS2 and CLI • --hiveconf hive.execution.mode=llap --hiveconf hive.llap.execution.mode=all --hiveconf hive.llap.io.enabled=true --hiveconf hive.llap.daemon.service.hosts=@<cluster name> • Set the usual perf settings (recommended, if not already set) • Enable vectorization, PPD, map join; use ORC • LLAP-specific: set hive.tez.input.generate.consistent.splits=true; • For interactive queries, disable CBO • Parallel compilation on HS2 – otherwise your parallel queries might bottleneck there! • Good to go!
38.
Page38 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Watching queries – Tez UI integration
39.
Page39 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Security • Again, Ambari will do this for you soon • SQL or Ranger auth recommended, with hive user running queries • LLAP uses keytabs and ACLs, similar to datanode, to secure endpoints • Keytabs need to be set up; certs may need to be set up for Web UI • HS2 (or client) obtains a token to access a particular LLAP cluster, and gives it to Tez AM • LLAP-s share tokens using secure ZK, similar to HA token sharing • Signing fragments on HS2 and granular security checks are work in progress
40.
Page40 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Summary and future work
41.
Page41 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Future work • Data view for external services • Column-level security • Better integration and deployment • Ambari, monitoring UI, fine-grained log aggregation • Hybrid deployment • Resizing LLAP to accommodate containers, YARN resource delegation, etc. • Feature work • ACID, text cache, better cache policies, UDFs, SSD cache, etc. • More performance work
42.
Page42 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Summary • LLAP is a new execution substrate for fast, concurrent analytical workloads, harnessing Hive vectorized SQL engine and efficient in-memory caching layer • Provides secure relational view of the data thru a standard InputFormat API • Available in Hive 2, with better ecosystem integration in the works
43.
Page43 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Questions? ? Interested? Stop by the Hortonworks booth to learn more
44.
Page44 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Backup slides
45.
Page45 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved Example execution: MR vs Tez vs Tez+LLAP M M M R R M M R M M R M M R HDFS HDFS HDFS T T T T T R T T T R M M M R R R M M R R HDFS MapReduce Tez Tez with LLAP
46.
Page46 © Hortonworks
Inc. 2011 – 2015. All Rights Reserved IO Elevator • Reading, decoding and processing are parallel • Unlike in regular Hive • HDFS reads are expensive (more so on S3, Azure) • Data decompression and decoding is expensive
Editor's Notes
6 months note
Download now