Apache Hive: Now and What's Next - 2016
Joe Ooura & Yuta Imai
2016/4/22
© Hortonworks Inc. 2011 – 2015. All Rights Reserved

Introduction
•  Please submit questions with the QUESTIONS button. They are visible only to the presenters.
•  Comments and questions via Twitter are very welcome too! #hwxjp

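About the speaker
•  Jotaro Ooura  Twitter: @JOOOURA
•  Father of a 5-year-old and an 8-year-old
•  After working in systems sales for servers and storage, joined the launch of the Japanese subsidiary of a flash memory storage company in 2011, covering the roles of evangelist, presales SE, PR, and sales. Deployed the ioDrive series, which became synonymous with enterprise flash, at many telecom carriers, financial institutions, web service providers, ad tech companies, and data center operators in Japan.
•  Joined Hortonworks Japan in January 2016 as its second sales hire. Currently works on evangelism, enterprise sales, and partner support.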

About Hortonworks
Customer momentum
•  ~800 customers (as of February 2016)
•  152 of them added in Q3 2015
•  Listed on NASDAQ in December 2014 (ticker: HDP)
The Leader in Connected Data Platforms
•  Hortonworks DataFlow for data in motion
•  Hortonworks Data Platform for data at rest
•  Powering new modern data applications
Partner for Customer Success
•  Leader in open-source community, focused on innovation to meet enterprise needs
•  Unrivaled support subscriptions
Founded in 2011
•  Founded by 24 architects, developers, and operators who built the original Hadoop at Yahoo!
•  1000+ employees
•  1500+ ecosystem partners

Our Model: Drive an Enterprise-focused Roadmap
1.  Innovate Existing Projects
–  Hive/Stinger, YARN, HDFS, common ops & security via Ambari & Ranger
2.  Incubate New Projects
–  Metron (was OpenSOC), Ranger, Knox, Atlas, Falcon, Ambari, Tez, etc.
3.  Acquire IP & Contribute
–  Acquired XASecure and created Apache Ranger; contributed OpenSOC
4.  Partner & Deliver Joint Solutions
–  Microsoft, EMC, HP, SAS, Pivotal, Red Hat, Teradata, etc.
5.  Rally the Ecosystem
–  Fast SQL via Stinger initiative, Data Governance initiative, ODPi
Areas: Data Access (batch, interactive, real-time), Integration & Governance, Operations, Security
Apache Project   Hortonworks Committers   Hortonworks PMC   HWX % of Committers
Hadoop           29                       24                31%
Accumulo         2                        2                 9%
Calcite          6                        3                 43%
HBase            8                        5                 17%
Hive             19                       11                38%
NiFi             5                        5                 42%
Phoenix          5                        5                 22%
Pig              5                        5                 24%
Slider           12                       12                100%
Spark            1                        0                 2%
Storm            4                        4                 19%
Tez              15                       15                44%
Atlas            7                        0                 35%
Falcon           7                        5                 41%
Flume            1                        1                 4%
Kafka            0                        0                 0%
Sqoop            1                        1                 4%
Ambari           39                       30                76%
Oozie            4                        2                 22%
Zookeeper        2                        1                 13%
Knox             12                       2                 80%
Ranger           13                       11                76%
TOTAL            197                      144
Source: Apache Software Foundation. As of October 5, 2015.
A committer is someone who has “earned their stripes” within the Apache community and has the ability to commit code directly to their corresponding Apache project source code repository.

100% Open Source Connected Data Platforms
•  Eliminates Risk of vendor lock-in by delivering 100% Apache open source technology
•  Maximizes Community Innovation with hundreds of developers across hundreds of companies
•  Integrates Seamlessly through committed co-engineering partnerships with other leading technologies
[Figure: "The Innovation Advantage: Maximum Community Innovation". Innovation plotted against time for open community versus proprietary Hadoop.]

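About the speaker
•  Yuta Imai  Twitter: @imai_factory
•  Solutions Engineer
•  First met Hadoop when using MapReduce (perl + streaming!) to build reports for an ad delivery server.
•  Later worked at AWS with ad tech and gaming customers, mainly covering big data products such as EMR and S3. That connection led to joining Hortonworks, where he now works on Hadoop.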

Recent Apache Hive: Key highlights
•  Up to Hive 1.2.1
–  Tez
–  Cost Based Optimizer (CBO)
–  ORC File format
–  Vectorization
•  Hive 2.0
–  LLAP

Stinger Initiative
Making Hive more than 100x faster
•  Up to Hive 1.2.1
–  Tez
–  Vectorization
•  Hive 2.0
–  LLAP
Already available on HDP!

Sub-second
Aiming for responses of under one second on short queries
•  Up to Hive 1.2.1
–  Tez
–  Vectorization
•  Hive 2.0
–  LLAP
Stinger Initiative

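•  Each of these improvements can be enabled with a few lines of configuration or a few commands.
–  As of today (April 22), Hive 2.0 has not yet been incorporated into HDP.
•  Today's talk focuses on how these features work.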

Hive performance recap
•  Stinger:
–  A project started with the goal of making Apache Hive 100x faster
–  Vectorized SQL Engine, Tez Execution Engine, ORC Columnar format, Cost Based Optimizer

Hive 0.10 (batch processing) → Hive 0.14 (human interactive, 5 seconds): 100-150x query speedup

TPC-DS Benchmark at 30 Terabyte Scale
•  50 sample queries from TPC-DS were run at 30 terabyte scale
•  52x faster on average, up to 160x faster
•  Total benchmark run time dropped from 7.8 days to 9.3 hours
•  The Cost-Based Optimizer added in Hive 0.14 delivers a further 2.5x speedup

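Tez
Beyond MapReduce

Apache Tez
•  A general-purpose distributed execution engine for data processing applications
–  Aimed at applications (frameworks), not at end users
–  Hive on Tez, Pig on Tez, Cascading on Tez, …
•  Built on the lessons learned from MapReduce
–  Major performance improvements
–  Batch and interactive
–  Petabyte scale
•  Runs on YARN
–  Makes use of cluster resources
•  DAG (directed acyclic graph)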

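MapReduce & Tez
[Figure: the same job drawn twice. Map – Reduce: a chain of M/R stages with intermediate results in HDFS between stages. Tez: a single optimized pipeline of M and R vertices.]
Tez: Optimized Pipeline
•  Intermediate data is not written out to HDFS
•  Shapes such as Map-Reduce-Reduce are possible
•  Containers are reused within a session
•  The pipeline is optimized across the whole job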

What is DAG & Why DAG
•  Directed Acyclic Graph
•  However complex a DAG is, it can basically be broken down into the following three patterns
–  Sequential: Projection, Filter, GroupBy, …
–  Merge: Join, Union, Intersect, …
–  Divide: Split, …
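The three patterns above can be sketched with Python's standard `graphlib` module. This is only an illustration of the DAG idea, not Tez's API; the vertex names (`scan_sales`, `top_n`, and so on) are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical query plan: each vertex maps to the set of vertices it reads from.
dag = {
    "filter": {"scan_sales"},             # sequential: one input, one output
    "join": {"filter", "scan_product"},   # merge: two inputs feed one vertex
    "group_by": {"join"},                 # divide: the join output feeds
    "top_n": {"join"},                    #         two downstream vertices
}

# A valid execution order never runs a vertex before its inputs.
order = list(TopologicalSorter(dag).static_order())
print(order)
```

Any order produced this way respects the edges, which is the property a DAG scheduler relies on when it decides which vertices can start.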

How Tez works, roughly
Input → Processor → Output

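Tez – Key benefits
•  Expressiveness of DAGs
–  Easier to express computation in DAG
•  Intermediate data is not spilled to HDFS
–  Lower latency
–  Less load on the NameNode
•  Tez sessions / container reuse
–  Less overhead for allocating AM and task containers
–  Less load on the ResourceManager
–  Data reuse through the Object Registry (for example, MapJoin tables)
–  JIT optimization of the executed code
•  Optimization across the whole DAG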

Tez - architecture
•  Client
–  Starts session
–  Submits DAG
•  Application Master
–  DAG Scheduler
–  Task Scheduler
–  Vertex Manager
•  TezTask Containers
–  Execution

ORC
Optimized Row Columnar

File formats used in Hadoop
•  Text
•  SequenceFile
•  RCFile
–  + Can read only the required columns
–  + Compression on each column
–  - Type-free binary blobs
–  - No index
–  - Compression by stream-based codec

ORCFile – columnar storage for Hive
•  High Compression
–  Type-specific compression applied per column
–  Per-stream compression with ZLIB or SNAPPY
•  High Performance
–  Indexes and metadata at the file, stripe, and row level
–  Predicate Pushdown
•  Flexible Data Model
–  Complex types (struct, list, map, union)
–  New types (datetime, decimal)

ORC at Facebook
•  Compression: saved more than 1,400 servers worth of storage. (2)
•  Compression: compression ratio increased from 5x to 8x globally. (2)

ORC at Spotify
•  IO: 16x less HDFS read when using ORC versus Avro. (3)
•  CPU: 32x less CPU when using ORC versus Avro. (3)

ORC at Yahoo!
•  Speedup: 6-50x speedup when using ORC versus Text File. (4)
•  Speedup: 1.6-30x speedup when using ORC versus RCFile. (4)

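ORCFile – file format
•  Stripe: the file contents are split into large chunks, 256 MB by default.
•  File Footer: holds the location of each stripe, the schema, and the file-wide min/max/sum values of each column.
•  Post Script: holds the file's compression format and the size of the compressed footer. Custom metadata can be stored as well. This is the first part of the file to be read.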

[Figure: ORC file layout]
Streams: INDEX, BLOOM FILTER, DATA, LENGTH, DICTIONARY
Row Group (default: 10K records for each RG)

File
- Column1 … ColumnN: min, max, sum, hasNull
- Compression
- Footer Length
Stripe1 … StripeN
- Column1 … ColumnN: min, max, sum, hasNull
- Per column, for each row group (RG): min, max, sum, hasNull, pos

Compression
•  Type-specific compression (Light-Weight Compression)
–  Applied per column
–  Always applied
–  RLE, Direct, Patched Base, Delta
•  Data stream compression (Generic Compression)
–  One codec shared across the whole file
–  In practice applied to each stream and to the footer
–  Applied on top of the light-weight compression above
–  NONE, ZLIB, SNAPPY, LZO
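The two layers above can be illustrated with a small Python sketch. It uses textbook run-length and delta encoding followed by `zlib`; this is an assumption-laden toy, not ORC's actual encoders, whose byte layouts are more elaborate. The column names and values are made up.

```python
import zlib

def rle_encode(values):
    """Collapse runs of equal values into [value, run_length] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

def delta_encode(values):
    """Keep the first value, then store differences between neighbours."""
    return values[:1] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas):
    """Rebuild the original values by accumulating the differences."""
    out = []
    for d in deltas:
        out.append(d if not out else out[-1] + d)
    return out

state_ids = [7, 7, 7, 7, 3, 3, 9]            # hypothetical low-cardinality column
timestamps = [1000, 1005, 1010, 1015, 1020]  # hypothetical increasing column

print(rle_encode(state_ids))     # [[7, 4], [3, 2], [9, 1]]
print(delta_encode(timestamps))  # [1000, 5, 5, 5, 5]

# Generic compression is applied on top of the already-encoded bytes.
encoded = str(delta_encode(timestamps)).encode("ascii")
compressed = zlib.compress(encoded)
```

The light-weight step exploits what is known about the column (repeats, steady increments); the generic codec then works on whatever redundancy is left in the byte stream.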

High Compression

High Performance
•  File-level index
•  Stripe-level index
•  Row-group-level index

Dumping ORC file information
orcfiledump
hive --service orcfiledump /apps/hive/warehouse/rankings/000045_0
To include the per-row-group index information, specify --rowindex <column number>. Specifying 0 returns the information for all columns.
hive --service orcfiledump --rowindex 1 /apps/hive/warehouse/rankings/000045_0

File Statistics
File Statistics:
Column 0: count: 1620325 hasNull: false
Column 1: count: 1620325 hasNull: false min: 1.0.100.215 max: 99.99.97.199 sum: 21531540
Column 2: count: 1620325 hasNull: false min…max: …sum: 88890214
Column 3: count: 1620325 hasNull: false min: 1970-01-01 max: 2012-04-30
Column 4: count: 1620325 hasNull: false min: …-8 max: …sum: 810757.3001111746
Column 5: count: 1620325 hasNull: false min… max: … sum: 85357610
Column 6: count: 1620325 hasNull: false min: ALB max: ZAF sum: 4860975

Stripe Statistics
Stripe Statistics:
Stripe 1:
Column 0: count: 1545000 hasNull: false
Column 1: count: 1545000 hasNull: false min: 1.0.100.215 max: 99.99.97.199 sum: 20530443
Column 2: count: 1545000 hasNull: false min: … max: … sum: 84763272
Column 3: count: 1545000 hasNull: false min: 1970-01-01 max: 2012-04-30
Column 4: count: 1545000 hasNull: false min: … max: … sum: 773016.625769496
Column 5: count: 1545000 hasNull: false min: … max: … sum: 81385950
Column 6: count: 1545000 hasNull: false min: ALB max: ZAF sum: 4635000

Row Group Indexes
Row group indices for column 1:
Entry 0: count: 10000 hasNull: false min: 1.101.125.195 max: 99.98.152.204 sum: 132919 positions: 0,0,0,0,0

SARG & Predicate Pushdown
•  SARG: Search ARGument
•  SELECT COUNT(*) FROM CUSTOMER WHERE CUSTOMER.state = 'CA';
•  For a query like the one above, the RecordReader reads from storage only the ORC files, stripes, and row groups that can match the WHERE clause.
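A minimal sketch of the pruning idea, assuming each row group carries only a min and max for the `state` column. The `RowGroup` class and the sample data are hypothetical; the real reader works on ORC index streams rather than Python lists.

```python
from dataclasses import dataclass

@dataclass
class RowGroup:
    min_state: str   # smallest value of the column in this row group
    max_state: str   # largest value of the column in this row group
    rows: list       # the column values themselves (normally on disk)

def might_contain(rg, value):
    """Min/max check: False means the row group cannot contain the value."""
    return rg.min_state <= value <= rg.max_state

row_groups = [
    RowGroup("AK", "AZ", ["AK", "AL", "AZ"]),
    RowGroup("CA", "CO", ["CA", "CA", "CO"]),
    RowGroup("NY", "WA", ["NY", "TX", "WA"]),
]

def count_equal(groups, value):
    """Count matching rows, reading only row groups that pass the check."""
    groups_read, matches = 0, 0
    for rg in groups:
        if not might_contain(rg, value):
            continue  # skipped without touching its data
        groups_read += 1
        matches += sum(1 for s in rg.rows if s == value)
    return matches, groups_read

print(count_equal(row_groups, "CA"))  # (2, 1): two matches, one row group read
```

The same check applies at the file and stripe levels, each time ruling out a larger unit of data before any of it is read.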

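Bloom Filter Index
[Figure: values x, y, z, and w hashed into the bit array 1 0 1 1 1 0 1 0 1 1, with m=10 and k=3]
Keep an array of m bits. For each input value, apply k hash functions and set the resulting positions to 1.
To test a value, hash it the same k times. If every resulting position is 1, the value may be covered by this index. Otherwise it is definitely absent, so the data can be skipped. False positives are possible.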
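The description above can be sketched in a few lines of Python, using the slide's m=10 and k=3. Deriving the k hash functions by salting SHA-256 is an assumption made for this illustration; it is not how ORC computes its hashes.

```python
import hashlib

class BloomFilter:
    def __init__(self, m=10, k=3):
        self.m, self.k = m, k
        self.bits = [0] * m

    def _positions(self, value):
        # Derive k positions by salting one hash function with the index i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value):
        # All k bits set: possibly present. Any bit clear: definitely absent.
        return all(self.bits[pos] for pos in self._positions(value))

bf = BloomFilter(m=10, k=3)
for key in ["x", "y", "z"]:
    bf.add(key)

print(bf.might_contain("x"))  # True: an added value is never reported absent
print(bf.might_contain("w"))  # may be True even though "w" was never added
```

With only 10 bits the false positive rate is high; a larger m relative to the number of stored values makes skipping far more effective.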

Bloom Filter Indexes Improvements
select * from tpch_1000.lineitem where l_orderkey = 1212000001;

Rows Read (log scale – smaller is better)
No Indexes             5,999,989,709
Min-Max Indexes        540,000
Bloomfilter Indexes    10,000

Time Taken in seconds (smaller is better)
No Indexes             74
Min-Max Indexes        4.5     (~16x improvement)
Bloomfilter Indexes    1.34    (~3.3x improvement)

ORCFile – table definition example
•  Defined per table or per partition
•  The compression codec is selectable
create table Addresses (
name string,
street string,
city string,
state string,
zip int
) stored as orc tblproperties ("orc.compress"="ZLIB");

ORCFile – converting text to ORC
•  There is no reason not to use ORC
•  A single SQL statement converts text to ORC
-- Create Text & ORC tables (the text table is comma-delimited to match the CSV file)
CREATE TABLE test_details_txt( visit_id INT, store_id SMALLINT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
CREATE TABLE test_details_orc( visit_id INT, store_id SMALLINT) STORED AS ORC;
-- Load into Text table
LOAD DATA LOCAL INPATH '/home/user/test_details.csv' INTO TABLE test_details_txt;
-- Copy to ORC table
INSERT OVERWRITE TABLE test_details_orc SELECT * FROM test_details_txt;

Vectorized Query Execution
Process 1024 Rows at a Time

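Vectorization – vectorized SQL engine
•  What it does:
–  Processes 1024 rows at a time instead of one row at a time
–  Takes advantage of modern hardware architectures
•  Benefits:
–  Large queries run up to 3x faster
–  Less CPU time, so cluster resources are used more efficiently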
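The batch-at-a-time style can be sketched as follows. Python cannot show the CPU-level gains, so this only illustrates the control flow: each operator runs a tight loop over one column vector of up to 1024 values and passes on a list of selected row indexes. The column names and data are hypothetical.

```python
BATCH_SIZE = 1024  # rows per batch, as stated above

def batches(column_a, column_b, size=BATCH_SIZE):
    """Yield the two columns in aligned slices of up to `size` rows."""
    for start in range(0, len(column_a), size):
        yield column_a[start:start + size], column_b[start:start + size]

def filter_gt(vector, threshold):
    """Tight loop over one column vector; returns indexes of passing rows."""
    return [i for i, v in enumerate(vector) if v > threshold]

def sum_selected(vector, selected):
    """Aggregate only the rows that survived the filter."""
    return sum(vector[i] for i in selected)

# SELECT SUM(price) ... WHERE quantity > 5, over made-up columns
quantity = [i % 10 for i in range(3000)]
price = [2] * 3000

total, n_batches = 0, 0
for qty_vec, price_vec in batches(quantity, price):
    selected = filter_gt(qty_vec, 5)
    total += sum_selected(price_vec, selected)
    n_batches += 1

print(total, n_batches)  # 2400 3
```

Per-row overhead (operator dispatch, type checks) is paid once per batch instead of once per row, which is where the savings come from in a real engine.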

Column Store Layout
Table:
     A    B
1    A1   B1
2    A2   B2
Row Store: 1, A1, B1 | 2, A2, B2
Column Store: A: A1, A2 | B: B1, B2

Column Store Characteristics
Row Store
•  TextFile, SequenceFile, Avro
•  Slower read performance
–  Reads whole rows, including columns the query does not need
•  Lower compression ratio
–  Higher local cardinality
Column Store
•  RCFile, Parquet, ORC
•  Faster read performance
–  Reads needed columns only
•  Higher compression ratio
–  Lower local cardinality
•  Room for further optimization
–  Vectorization

Hive Vectorization 2014
Rewriting Hive execution engine for performance
•  No method calls
•  Low instruction count
•  Cache locality to 1,024 values
•  No pipeline stalls
•  SIMD in Java 8
But not excellent without SIMD
set hive.vectorized.execution.enabled = true;
J. Sompolski, M. Zukowski, P. Boncz. Vectorization vs. Compilation in Query Execution. 2011

Cost Based Optimizer

Cost Based Optimizer
•  Uses Apache Calcite
•  What does it do?
–  Ordering joins
–  Bushy Join Tree
–  Converting join algorithms
•  Paper: https://cwiki.apache.org/confluence/display/Hive/Cost-based+optimization+in+Hive
•  Anatomy: http://hortonworks.com/blog/hive-0-14-cost-based-optimizer-cbo-technical-overview/

Expression tree
SELECT p."product_name", COUNT(*) AS c
FROM "splunk"."splunk" AS s
JOIN "mysql"."products" AS p
ON s."product_id" = p."product_id"
WHERE s."action" = 'purchase'
GROUP BY p."product_name"
ORDER BY c DESC
[Figure: expression tree. scan (Table: splunk, on Splunk) and scan (Table: products, on MySQL) → join (Key: product_id) → filter (Condition: action = 'purchase') → group (Key: product_name, Agg: count) → sort (Key: c DESC)]
Expression tree (optimized)
SELECT p."product_name", COUNT(*) AS c
FROM "splunk"."splunk" AS s
JOIN "mysql"."products" AS p
ON s."product_id" = p."product_id"
WHERE s."action" = 'purchase'
GROUP BY p."product_name"
ORDER BY c DESC
[Figure: optimized expression tree. The filter (Condition: action = 'purchase') sits with scan (Table: splunk) on the Splunk side, below the join (Key: product_id) with scan (Table: products, on MySQL); group (Key: product_name, Agg: count) and sort (Key: c DESC) follow.]

Query preparation – Hive 0.13
Hive SQL → SQL parser → Abstract Syntax Tree (AST) → Semantic analyzer → Annotated AST → Logical Optimizer → Plan → Physical Optimizer → Tuned Plan → Tez
Query preparation – Hive 0.14
Hive SQL → SQL parser → Semantic analyzer → Translate to algebra → Optiq optimizer → AST with optimized join-ordering → Logical Optimizer → Physical Optimizer → Tuned Plan → Tez

Star schema
[Figure: the fact tables Sales and Inventory linked by keys to the dimension tables Time, Product, Customer, and Warehouse. Legend: Key, Fact table, Dimension table, Many-to-one relationship.]

Query combining two stars
SELECT product.id, sum(sales.units), sum(inventory.on_hand)
FROM sales
JOIN customer ON …
JOIN time ON …
JOIN product ON …
JOIN inventory ON …
JOIN warehouse ON …
WHERE time.year = 2014
AND time.quarter = 'Q1'
AND product.color = 'Red'
AND warehouse.state = 'WA'
GROUP BY …
[Figure: the Sales and Inventory stars with Time, Product, Customer, and Warehouse]

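Left-deep tree
All joins are performed serially. The order of the joins is taken into account, but the shape of the tree is not.
A typical plan:
•  Start with the largest table at the bottom left
•  Apply the most selective joins first
[Figure: left-deep join tree over Sales, Customer, Time, Product, Inventory, Warehouse]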

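Bushy tree (bush as in shrub)
•  No constraint on where joins take place
•  "Bushes" are formed from a fact table (Sales and Inventory) and its related dimension tables
•  The dimension tables act as filters
•  As a result, fewer rows are read and less data is exchanged over the network
[Figure: bushy join tree over Sales, Customer, Time, Product, Inventory, Warehouse]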

Cost variables
•  Hr – the cost of reading 1 byte from HDFS, in nanoseconds
•  Hw – the cost of writing 1 byte to HDFS, in nanoseconds
•  Lr – the cost of reading 1 byte from the local FS, in nanoseconds
•  Lw – the cost of writing 1 byte to the local FS, in nanoseconds
•  NEt – the average cost of transferring 1 byte over the network in the Hadoop cluster from any node to any node, in nanoseconds
•  T(R) – the number of tuples in the relation
•  Tsz – the average size of a tuple in the relation
•  V(R, a) – the number of distinct values for attribute a in relation R
•  CPUc – the CPU cost of a comparison, in nanoseconds

Assumed values
•  CPUc = 1 nanosecond
•  NEt = 150 * CPUc nanoseconds
•  Lw = 4 * NEt
•  Lr = 4 * NEt
•  Hw = 10 * Lw
•  Hr = 1.5 * Lr
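Working the assumed values through gives concrete per-byte costs. The sketch below just evaluates the definitions above; the `hdfs_read_cost_ns` helper and its sample numbers are illustrative assumptions that follow directly from "Hr is the cost of reading 1 byte from HDFS", not a formula quoted from the slides.

```python
# Assumed values from the slide, all in nanoseconds.
CPUc = 1            # one comparison
NEt = 150 * CPUc    # one byte over the network
Lw = 4 * NEt        # one byte written to the local FS
Lr = 4 * NEt        # one byte read from the local FS
Hw = 10 * Lw        # one byte written to HDFS
Hr = 1.5 * Lr       # one byte read from HDFS

def hdfs_read_cost_ns(tuples, avg_tuple_bytes):
    """Cost of scanning a relation from HDFS: T(R) * Tsz * Hr."""
    return tuples * avg_tuple_bytes * Hr

print(NEt, Lw, Lr, Hw, Hr)  # 150 600 600 6000 900.0

# Hypothetical relation: 1,000,000 tuples of 100 bytes each.
print(hdfs_read_cost_ns(1_000_000, 100) / 1e9, "seconds")  # 90.0 seconds
```

The absolute numbers matter less than the ratios: under these assumptions an HDFS write costs ten times a local write, so plans that avoid materializing intermediate data score much better.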

Profile Hive queries
•  hive.tez.exec.print.summary=true
[Screenshot annotation: "← this is roughly where the work happens"]

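LLAP: Live Long And Process
Challenge for Sub-Second

What is LLAP?
•  A long-running resident process for executing Hive work
•  Lower task startup cost
•  The JIT optimizer becomes more effective
•  Executors are threads rather than processes
•  Metadata, MapJoin tables, and the like can be shared between tasks
•  Asynchronous IO and a cache
•  Query fragment API
[Figure: a node running an LLAP process with a cache and query fragments, reading from HDFS]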

What LLAP isn't
•  Not a Hive execution engine (like Tez, MR, Spark…)
–  Execution engines are what assemble and coordinate the processing
•  Not a storage layer
–  LLAP daemons are stateless and use HDFS as the source of truth for data
•  Does not supersede existing Hive
–  Container-based execution will continue to evolve as well

Example execution: MR vs Tez vs Tez+LLAP
[Figure: the same query drawn as three execution graphs]
•  Map – Reduce: intermediate results in HDFS
•  Tez: optimized pipeline
•  Tez with LLAP: resident process on nodes. Labels: "Map tasks read HDFS", "In-Memory columnar cache".
LLAP in your cluster
•  LLAP daemons run on YARN
•  Apache Slider provisions and recovers the containers for the daemons
•  Resource management via YARN delegation model (WIP)
•  LLAP and containers dynamically balance resource usage (WIP)

Query execution
Tez + LLAP – overview
•  The DAG-based assembly of work is used as is, and so is the Tez runtime.
•  Fragments/tasks can run in LLAP, in regular containers, or inside the AM
•  The Hive client decides where they run
–  Configurable – all in LLAP, none in LLAP, intelligent mix
•  Policy for assigning tasks to LLAP (in auto mode)
–  No user code (or only blessed user code)
–  Data source – HDFS
–  ORC and vectorized execution (for now)
–  Others can still run in LLAP in "all" mode, w/o IO elevator and cache
–  Data size limitations (avoid heavy / long running processing within LLAP)

So…
[Figure sequence: the same task graph shown for Tez, Tez with LLAP (auto), and Tez with LLAP (all), each with its AM]

Scheduling for LLAP in Tez AM
•  Greedy scheduling per query
–  Scheduling assumes the whole cluster is available
•  Schedule work to preferred location (HDFS locality)
–  For multiple queries that access the same data, the preferred location setting lets their tasks run on the same daemon

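Queuing fragments
•  LLAP daemons run tasks/fragments on a thread pool
•  Each daemon has an internal queue, with a pluggable prioritization mechanism
[Figure: an LLAP daemon whose executors run Q1 Reducer 2, Q1 Map 1, Q1 Map 1, and Q3 Map 19, with further fragments (Q1 Reducer 2, Q1 Map 1, Q3 Map 19, Q1 Reducer 2) waiting in the queue]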

LLAP Scheduling – pipelining and preemption
•  A fragment can start running before all of its input data has arrived
•  Once all of its input data has arrived, it is flagged as "finishable"
•  A fragment does not release its executor until it becomes finishable
•  Non-finishable fragments are preempted
[Figure sequence: an LLAP daemon's queue and executors holding interactive query maps 1/3, 2/3, and 3/3 and a wide query reduce ("Well, 10 mappers out of 100 are done!")]
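A toy model of the preemption rule, written under strong simplifying assumptions: a fixed number of executor slots, a plain FIFO queue, and a `finishable` flag set by the caller. The `Fragment` and `Daemon` classes are hypothetical and do not mirror LLAP's scheduler code.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Fragment:
    name: str
    finishable: bool  # True once all of its input data is available

class Daemon:
    def __init__(self, num_executors):
        self.num_executors = num_executors
        self.running = []
        self.queue = deque()

    def submit(self, fragment):
        # Free executor: start the fragment right away (pipelining).
        if len(self.running) < self.num_executors:
            self.running.append(fragment)
            return
        victim = next((f for f in self.running if not f.finishable), None)
        if fragment.finishable and victim is not None:
            # Preempt: the non-finishable fragment goes back to the queue.
            self.running.remove(victim)
            self.queue.append(victim)
            self.running.append(fragment)
        else:
            self.queue.append(fragment)

daemon = Daemon(num_executors=3)
daemon.submit(Fragment("wide query reduce", finishable=False))
for i in (1, 2, 3):
    daemon.submit(Fragment(f"interactive query map {i}/3", finishable=True))

print([f.name for f in daemon.running])  # the three interactive maps
print([f.name for f in daemon.queue])    # ['wide query reduce']
```

The reduce started early, while its inputs were still being produced, but gave up its executor as soon as work that could actually finish needed the slot.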

IO elevator and other internals

Asynchronous IO
•  In Hive so far, IO has been performed synchronously
•  Compression and decompression were synchronous as well
•  In LLAP, IO elevator threads handle disk IO, decompression, and so on asynchronously
•  IO threads can be spindle aware (WIP)
•  Depending on workload, IO and processing threads can balance resource usage (throttle IO, etc.) (WIP)

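Caching and off-heap data
•  Decompressed data is cached off-heap
–  So the cache does not have to worry about GC
–  Removes HDFS IO and decompression cost; especially effective for dimension tables
•  Pluggable eviction policy
–  FIFO and LRFU are currently supported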
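To make the eviction policies concrete, here is a sketch of LRFU as it is usually described in the caching literature: every entry carries a score that grows by 1 on each access and decays by half every 1/λ time units, and the lowest score is evicted. The class, its `lam` parameter, and the logical clock are assumptions for illustration, not LLAP's implementation.

```python
class LRFUCache:
    def __init__(self, capacity, lam=0.5):
        self.capacity, self.lam = capacity, lam
        self.clock = 0       # logical time: one tick per access
        self.entries = {}    # key -> (score at last access, last access time)

    def _decay(self, age):
        return 0.5 ** (self.lam * age)

    def _score(self, key):
        """Current score: the stored score decayed by the time since last use."""
        crf, last = self.entries[key]
        return crf * self._decay(self.clock - last)

    def access(self, key):
        """Touch `key`; return the key evicted to make room, if any."""
        self.clock += 1
        evicted = None
        if key in self.entries:
            crf = 1.0 + self._score(key)   # frequency and recency combined
        else:
            if len(self.entries) >= self.capacity:
                evicted = min(self.entries, key=self._score)
                del self.entries[evicted]
            crf = 1.0
        self.entries[key] = (crf, self.clock)
        return evicted

cache = LRFUCache(capacity=2, lam=0.5)
for key in ["a", "a", "a", "b"]:
    cache.access(key)

print(cache.access("c"))  # b
```

FIFO would evict "a" here because it entered first; LRFU keeps it because three recent accesses outweigh one slightly more recent access to "b". With `lam=0` the score is a pure access count (LFU), and larger values weight recency more heavily.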

Other benefits
•  File metadata and indexes are cached as well
–  Faster Predicate Pushdown
•  Hash tables for MapJoin and fragment execution plans are shared within the JVM
–  Less plan deserialization cost per task/fragment
•  Better use of JIT optimizer
–  Because the daemon stays up, the JIT has more time to do its work
–  Especially good with vectorization!

Summary

Sub-second
Aiming for responses of under one second on short queries
•  Up to Hive 1.2.1
–  Tez
–  Vectorization
•  Hive 2.0
–  LLAP
Stinger Initiative
