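7. What is Treasure Data?
• A big data company founded by Japanese entrepreneurs in Silicon Valley
– Founded in December 2011 in Mountain View, California
– Japan branch established in Marunouchi, Tokyo, in November 2012
• Provides the cloud-based data management service "Treasure Data Service"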
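Founders
• Hironobu Yoshikawa – CEO
– Open source business veteran
• Kazuki Ohta – CTO
– Founder of the world's largest Hadoop group
• Sadayuki Furuhashi – Software Engineer
– Developer of MessagePack and Fluentd
Major investors
• Bill Tai
– Charles River Ventures; investor in Twitter and others
• Yukihiro Matsumoto
– Creator of the Ruby language
• Sierra Ventures (Tim Guleri)
– Leading VC in enterprise software and databases
• Jerry Yang
– Founder of Yahoo! Inc.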
8. Treasure Data Service
An integrated "cloud + management" service for big data
One-stop support from data collection through storage to analysis
• Tens of billions of records are imported every day
– Reached 5 trillion records in May 2014
• Provides SQL-based query services (Hive, Presto, Pig, etc.)
9. Importing more than 500,000 records / sec.
[Figure: Data (records) imported over time, in billions (y-axis 0 to 7,000). Marked milestones: service launched, Series A funding, 100 customers, Gartner Cool Vendor report, and 1, 2, 3, 4, and 5 trillion records.]
17. Query Planner
SQL
  SELECT
    name,
    count(*) AS c
  FROM impressions
  GROUP BY name

Table schema
  impressions (
    name varchar
    time bigint
  )

Logical query plan (SQL + table schema)
  Table scan (name:varchar)
  -> GROUP BY (name, count(*))
  -> Output (name, c)

Distributed query plan
  Table scan -> Partial aggregation -> Sink
  -> Exchange -> Final aggregation -> Sink
  -> Exchange -> Output
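The split of the GROUP BY into a partial and a final aggregation can be illustrated with a small Python sketch. It is not Presto code: the function names, the in-memory "splits", and the hash-based exchange are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def partial_aggregate(split):
    """Count names within one split (one worker's share of the table scan)."""
    return Counter(row["name"] for row in split)


def exchange(partials, num_partitions):
    """Route each (name, partial count) pair to a partition chosen by hash(name)."""
    partitions = defaultdict(list)
    for partial in partials:
        for name, count in partial.items():
            partitions[hash(name) % num_partitions].append((name, count))
    return partitions


def final_aggregate(partition):
    """Sum the partial counts of every name that landed in this partition."""
    totals = Counter()
    for name, count in partition:
        totals[name] += count
    return totals


def distributed_count(splits, num_partitions=2):
    """SELECT name, count(*) AS c FROM impressions GROUP BY name, in three steps."""
    partials = [partial_aggregate(split) for split in splits]
    result = {}
    for partition in exchange(partials, num_partitions).values():
        # All partial counts for a given name are in the same partition,
        # so each name appears in exactly one final aggregate.
        result.update(final_aggregate(partition))
    return result


# Hypothetical rows of the impressions table, divided into two splits.
splits = [
    [{"name": "a", "time": 1}, {"name": "b", "time": 2}],
    [{"name": "a", "time": 3}, {"name": "a", "time": 4}],
]
counts = distributed_count(splits)
```

Because the exchange sends every partial count for a given name to the same partition, the final aggregation can run independently per partition.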
23. Challenges in Database as a Service
• Trade-off
– Run each query on its own cluster: fast but very expensive ($$$)
– Run all queries on a minimally sized cluster: performance is limited, but the price is affordable
• Reference
– Workload Management for Big Data Analytics. A. Aboulnaga [SIGMOD 2013 Tutorial]
49. Query Collection in TD
• SQL query logs
– query, detailed query plan, elapsed time, processed rows, etc.
• Presto is used for analyzing the query history
51. Deployment
• Building Presto takes more than 20 minutes
• Facebook frequently releases new versions
• Let CircleCI build Presto
– Deploy jar files to private Maven repository
– We sometimes use non-release versions
• for fixing serious bugs
• hot-fix patches
• Integration Test
– td-presto connector
• PlazmaDB, Multi-tenant query scheduler
• Query optimizer
– Run test queries on staging cluster
52. Production: Blue-Green Deployment
• http://martinfowler.com/bliki/BlueGreenDeployment.html
• 2 Presto Coordinators (Blue/Green)
– Route Presto queries to the active cluster
– No down-time upon deployment
• Launch Presto worker instances with Chef (takes less than 5 min. on AWS)
• The inactive cluster is used for pre-production testing and customer support
– Investigation and tuning of customer query performance
– Troubleshooting
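The routing idea can be sketched as a tiny switch between two coordinators. This is a minimal illustration, not the production router: the class name and the coordinator URLs are hypothetical.

```python
class BlueGreenRouter:
    """Sends queries to the active coordinator; the other one stays available for testing."""

    def __init__(self, blue, green):
        self.clusters = {"blue": blue, "green": green}
        self.active = "blue"

    def _other(self):
        return "green" if self.active == "blue" else "blue"

    def route(self):
        """Coordinator that should receive customer queries."""
        return self.clusters[self.active]

    def inactive(self):
        """Coordinator free for pre-production testing and investigation."""
        return self.clusters[self._other()]

    def switch(self):
        """Promote the inactive cluster; new queries go to it from now on."""
        self.active = self._other()


# Hypothetical coordinator addresses.
router = BlueGreenRouter(
    blue="http://presto-blue.example:8080",
    green="http://presto-green.example:8080",
)
before = router.route()
router.switch()  # deploy the new version to green first, then flip
after = router.route()
```

Switching only changes where new queries are sent, which is what makes deployment possible without downtime.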
54. Admission control is necessary
• Adjust resource utilization
– Running Drivers (Splits)
– MPL (Multi-Programming Level)
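A minimal sketch of MPL-based admission control follows, assuming the MPL is simply the maximum number of queries allowed to run at once. The class and its methods are hypothetical and do not reflect the scheduler described in the talk.

```python
from collections import deque


class AdmissionController:
    """Admits at most `mpl` queries at a time; the rest wait in FIFO order."""

    def __init__(self, mpl):
        self.mpl = mpl
        self.running = set()
        self.waiting = deque()

    def submit(self, query_id):
        """Return True if the query starts now, False if it is queued."""
        if len(self.running) < self.mpl:
            self.running.add(query_id)
            return True
        self.waiting.append(query_id)
        return False

    def finish(self, query_id):
        """Mark a query as done and admit the next waiting query, if any."""
        self.running.discard(query_id)
        if self.waiting and len(self.running) < self.mpl:
            next_id = self.waiting.popleft()
            self.running.add(next_id)
            return next_id
        return None


controller = AdmissionController(mpl=2)
started = [controller.submit(q) for q in ("q1", "q2", "q3")]
admitted_next = controller.finish("q1")
```

Lowering `mpl` trades throughput for predictable resource usage. A limit on running drivers (splits) could be enforced the same way, with a counter of splits instead of queries.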
55. Challenge: Auto Scaling
• Setting the cluster size based on the peak usage is expensive
• But predicting customer usage is difficult
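The cost gap can be made concrete with simple arithmetic. The hourly demand figures below are hypothetical, and the sketch assumes cost is proportional to node-hours.

```python
def node_hours_fixed(hourly_demand):
    """Node-hours when the cluster is sized for the peak hour all the time."""
    return max(hourly_demand) * len(hourly_demand)


def node_hours_on_demand(hourly_demand):
    """Node-hours if the cluster size followed demand exactly (perfect prediction)."""
    return sum(hourly_demand)


# Hypothetical number of worker nodes needed in each of six hours.
demand = [4, 4, 4, 20, 4, 4]
peak_sized = node_hours_fixed(demand)
demand_sized = node_hours_on_demand(demand)
```

With this input, the peak-sized cluster uses three times the node-hours of an ideally scaled one. Reaching the ideal requires knowing the demand in advance, which is the hard part.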
56. Problematic Queries
• 90% of queries finish within 2 min.
– But the remaining 10% is still a large number
• 10% of 10,000 queries is 1,000.
• Long-running queries
• Hog queries
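The arithmetic above can be reproduced from a query log. The sketch below assumes each log entry has been reduced to an elapsed time in seconds. The log contents are hypothetical, chosen to match the 90%/10% split on the slide.

```python
def summarize(elapsed_seconds, threshold=120):
    """Split queries into those finishing within `threshold` seconds and the rest."""
    total = len(elapsed_seconds)
    slow = sum(1 for t in elapsed_seconds if t > threshold)
    return {
        "total": total,
        "within_threshold": total - slow,
        "slow": slow,
        "slow_ratio": slow / total if total else 0.0,
    }


# Hypothetical log: 9,000 queries take 30 s each, 1,000 take 10 minutes each.
elapsed = [30] * 9000 + [600] * 1000
summary = summarize(elapsed)
```

Even a 10% tail means 1,000 slow queries out of 10,000, so the slow queries have to be examined individually rather than dismissed as outliers.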
57. Long Running Queries
• Typical bottlenecks
– Cross joins
– IN (a, b, c, …)
• semi-join filtering process is slow
– Complex scan condition
• pushing down selection
• but delays column scan
– Tuple materialization
• coordinator generates json data
– Many aggregation columns
• group by 1, 2, 3, 4, 5, 6, …
– Full scan
• Scanning 100 billion rows…
• Adding more resources does not always make a query faster
• Storing intermediate data to disks is necessary
[Figure: query pipeline with two fast steps and one slow process; results are buffered (waiting for fetch).]
58. Hog Query
• Queries consuming a lot of CPU/memory resources
– Coined by S. Krompass et al. [EDBT2009]
• Example:
– select 1 as day, count(…) from … where time <= current_date - interval 1 day
union all
select 2 as day, count(…) from … where time <= current_date - interval 2 day
union all
– …
– (up to 190 days)
• More than 1000 query stages.
• Presto tries to run all of the stages at once.
– High CPU usage at coordinator
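To show what such a query computes, the sketch below evaluates the same per-day cumulative counts in two ways: once per UNION ALL branch (one pass over the data for every day) and once in a single pass with buckets. This is an illustration in plain Python under stated assumptions, not a rewrite proposed in the talk: `time` is assumed to be an integer epoch timestamp in seconds, and the function names are hypothetical.

```python
DAY = 86400  # seconds per day


def counts_per_branch(times, today, max_days):
    """One full pass per day, mirroring one UNION ALL branch per day."""
    return {
        d: sum(1 for t in times if t <= today - d * DAY)
        for d in range(1, max_days + 1)
    }


def counts_single_pass(times, today, max_days):
    """Bucket each row by age in whole days, then take suffix sums."""
    buckets = [0] * (max_days + 1)
    for t in times:
        age = (today - t) // DAY
        if age >= 1:  # rows newer than one day match no branch
            buckets[min(age, max_days)] += 1
    counts, running = {}, 0
    for d in range(max_days, 0, -1):
        running += buckets[d]  # rows at least d days old
        counts[d] = running
    return counts


# Hypothetical timestamps relative to an arbitrary "today".
today = 1_000_000
times = [today - 3600, today - DAY, today - 2 * DAY - 5, today - 10 * DAY]
per_branch = counts_per_branch(times, today, max_days=3)
single_pass = counts_single_pass(times, today, max_days=3)
```

Both functions return the same counts. The per-branch form repeats the scan once per day, which is the shape that produces the large number of stages.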
59. Presto Team at Facebook
• Currently 12 members
• Talked with core developers of Presto
• 2015 Q2 Plan
– Per-query memory resource management
– Stage-wise resource allocation
– Raptor connector
• native storage
• + MySQL index
• (internal-use only)
60. Huge Query Processing
• Idea
– Bushy plan -> Deep plan
– Introduce stage-wise resource assignment
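Stage-wise resource assignment can be sketched as scheduling stages in waves instead of starting all of them at once. This is a simplified model, not Presto's scheduler: it assumes each stage's output is materialized so a parent can start after its inputs finish, and all names are hypothetical.

```python
def schedule_waves(stages, deps, max_running):
    """Group stages into waves of at most `max_running` stages.

    A stage becomes eligible once every stage it depends on has completed.
    """
    done = set()
    remaining = list(stages)
    waves = []
    while remaining:
        ready = [s for s in remaining if all(d in done for d in deps.get(s, ()))]
        if not ready:
            raise ValueError("dependency cycle or unknown stage in deps")
        wave = ready[:max_running]
        waves.append(wave)
        done.update(wave)
        remaining = [s for s in remaining if s not in done]
    return waves


# Hypothetical plan: six scan stages feeding a single union stage.
scans = [f"scan{i}" for i in range(6)]
stages = scans + ["union"]
deps = {"union": scans}
waves = schedule_waves(stages, deps, max_running=2)
peak_all_at_once = len(stages)
peak_stage_wise = max(len(w) for w in waves)
```

In this model the peak number of concurrently running stages drops from seven to two, at the cost of a longer schedule and the need to hold intermediate results.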