3. Introduction to DLA
Data Lake Analytics (DLA) is a large-scale serverless data federation service on Alibaba Cloud.
• Serverless data federation
• Database-like user experience
• High performance
4. Column-store table
[Diagram: Introduction to DLA. Components shown: Data Lake Storage (OSS); One Click Data Lake (DB data, streaming data); Spark Streaming; LogService application logs; archived transactional data; the Data Lake Engine (DLA) with Serverless Spark (ETL & ML), Serverless Presto, and Metadata Management with Auto Discovery; consumers DW, DMS, APP, and QuickBI.]
5. DLA Presto
• Multi-coordinators
• Lake Formation: One Click Data Warehouse, metadata discovery
• Enterprise-level access control
• Cost: billing based on either the volume of scanned data or the number of compute units used
• MySQL protocol support
• Caching
• Data sources: more than 15 types of data sources are supported, including Alibaba Cloud OSS, ADB, Table Store, etc.
8. DLA Presto Architecture
[Architecture diagram. Components shown:]
• MySQL Protocol
• FrontNode: SQL dialect transformation, submit query, fetch result
• Unified Meta Service: meta operations
• Presto Clusters: a Default Cluster and a CU Cluster, each with a Coordinator and multiple Workers
• TableScan/Pushdown to the data sources: OSS, MySQL, SQLServer, Oracle, PostgreSQL, TableStore, MaxCompute, ElasticSearch, Druid, …
Highlights: multiple charging models; unified meta and access control.
9. About Presto
Presto is an open source distributed SQL query engine for running interactive analytic queries
against data sources of all sizes ranging from gigabytes to petabytes.
• Full memory processing: blazing fast, suitable for ad hoc queries, data exploration, and lightweight ETL.
• Full SQL semantics: compliant with ANSI SQL, so there is no need to worry about unsupported SQL syntax.
• Pluggable connectors
• Great community
10. Challenges to DLA Presto
[Diagram: the DLA Presto architecture. FrontNode (SQL dialect transformation, submit query, fetch result), Unified Meta Service (meta operations), Presto Clusters (Default Cluster and CU Cluster, each with a Coordinator and Workers), and the data sources reached via TableScan/Pushdown: OSS, MySQL, SQLServer, Oracle, PostgreSQL, TableStore, MaxCompute, ElasticSearch, Druid, …]
11. Challenges to DLA Presto
[Diagram: the same DLA Presto architecture, annotated with the challenges it faces.]
• Request costs
• Bandwidth limit
• Performance of pulling large data
• Latency of getting metadata and partitions
• Pressure on the data source
12. Challenges to DLA Presto - Analysis
[Chart: data sources placed by data size (small data to big data) and update frequency (updated frequently to updated infrequently). Sources shown: OTS, OSS, ODPS, MySQL, Redis, MongoDB, PostgreSQL, …, plus one position marked "?".]
• Big Data/NoSQL: performance of pulling large data
• Online System: pressure on the data source
• Big Data/Offline: performance of pulling large data
13. Challenges to DLA Presto - Analysis
[Chart repeated: data sources by data size and update frequency.]
• Big Data/NoSQL: performance of pulling large data
• Online System: pressure on the data source
• Big Data/Offline: performance of pulling large data
Solutions added on this slide:
• Concurrency limitation
• Avoid reading the master
14. Challenges to DLA Presto - Analysis
[Chart repeated: data sources by data size and update frequency.]
• Big Data/NoSQL: performance of pulling large data
• Online System: pressure on the data source
• Big Data/Offline: performance of pulling large data
Solutions so far:
• Concurrency limitation
• Avoid reading the master
• Caching
15. Challenges to DLA Presto - Analysis
[Chart repeated: data sources by data size and update frequency.]
• Big Data/NoSQL: performance of pulling large data
• Online System: pressure on the data source
• Big Data/Offline: performance of pulling large data
Solutions:
• Concurrency limitation
• Avoid reading the master
• Caching
• Pushdown
16. Solutions
[Diagram: the DLA Presto architecture with the solutions applied. FrontNode: SQL rewrite, submit query, fetch query results. Unified metadata management: metadata operations. Presto Clusters (Default Cluster and CU Cluster, each with a Coordinator and Workers): TableScan/Pushdown to OSS, MySQL, SQLServer, Oracle, PostgreSQL, TableStore, MaxCompute, ElasticSearch, Druid, … Three points in the diagram are labeled "impact on the source database".]
• Decrease request count
• Alluxio data cache
• Data cache
• Partition meta cache
• Splits cache
• Limit concurrency
• Read from the slave (replica)
• One Click Data Lake
• Pushdown
19. Decreasing OSS API request count
Background
Users report that OSS API call fees are high, in some cases even higher than their DLA fees.
OSS call fees = actual calls × unit price per 10,000 calls / 10,000
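The fee formula above can be worked through as a small sketch. The unit price and call counts below are made-up numbers chosen for easy arithmetic, not real OSS pricing; the function name `oss_call_fee` is also just illustrative.

```python
def oss_call_fee(actual_calls: int, unit_price_per_10k: float) -> float:
    """OSS call fee = actual calls x unit price per 10,000 calls / 10,000."""
    return actual_calls * unit_price_per_10k / 10_000


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
UNIT_PRICE_PER_10K = 0.01          # assumed price per 10,000 calls, not real OSS pricing
calls_before = 50_000_000          # assumed number of API calls in a billing period
calls_after = calls_before // 10   # the same workload issuing one tenth of the calls

fee_before = oss_call_fee(calls_before, UNIT_PRICE_PER_10K)
fee_after = oss_call_fee(calls_after, UNIT_PRICE_PER_10K)
print(f"before: {fee_before:.2f}, after: {fee_after:.2f}")
```

Because the fee is linear in the number of calls, any reduction in call count translates directly into the same proportional reduction in call fees.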
20. Decreasing OSS API request count
How Hadoop FileSystem API invocations map to Alibaba Cloud OSS API invocations:
• read, read, …: request #1 reads as much data as possible with one request
• seek(100), read: small seek, continue reading from request #1
• seek(128MB), read: big seek, start a new request (#2) and continue reading from it
Results:
1. API call count reduced to 1/10 for data stored in Text format.
2. API call count reduced to 1/3 for data stored in ORC/Parquet format.
3. Costs reduced by about 60% to 90% on average.
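The seek-handling policy on this slide can be sketched as a small simulation that counts how many requests a sequence of reads and seeks would issue. This is only an illustration of the idea: the 1 MiB threshold for what counts as a "small seek" and the function names are assumptions, and the slides do not describe how DLA actually implements it.

```python
from typing import List, Tuple

Op = Tuple[str, int]  # ("read", n_bytes) or ("seek", absolute_position)


def naive_requests(ops: List[Op]) -> int:
    """Baseline for comparison: every read call issues its own request."""
    return sum(1 for kind, _ in ops if kind == "read")


def coalesced_requests(ops: List[Op], small_seek_threshold: int) -> int:
    """Keep one request open; only a big (or backward) seek starts a new one."""
    requests = 0
    pos = 0            # logical position requested by the caller
    stream_pos = None  # position of the open request's stream, None if none is open
    for kind, value in ops:
        if kind == "seek":
            pos = value
            continue
        if stream_pos is None:
            requests += 1                     # first read opens request #1
        else:
            gap = pos - stream_pos
            if gap < 0 or gap > small_seek_threshold:
                requests += 1                 # big seek: start a new request
            # otherwise: small forward seek, skip the gap and keep reading
        pos += value
        stream_pos = pos
    return requests


MIB = 1024 * 1024
ops = [("read", 10), ("read", 10), ("seek", 100), ("read", 10),
       ("seek", 128 * MIB), ("read", 10)]
naive = naive_requests(ops)
coalesced = coalesced_requests(ops, small_seek_threshold=1 * MIB)  # threshold is an assumption
print(naive, coalesced)
```

The trade-off is that a small seek wastes the skipped bytes of bandwidth in exchange for saving a request, so the threshold has to balance request cost against transfer cost.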
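22. Alluxio Data Cache - Local Cache vs. Cluster
[Diagram: two deployments side by side. Cluster mode: a Presto Cluster (Coordinator and Workers) reads from an Alluxio Cluster (Master and Workers) in front of OSS, with the steps "read alluxio", "on cache miss cache to alluxio", and "return data". Local cache mode: a Presto Cluster only.]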
23. Alluxio Data Cache - Local Cache
The Alluxio data cache is a library residing in the Presto worker.
Cached data is stored on local disk.
24. Alluxio Data Cache - Local Cache
SOFT_AFFINITY
Makes a best-effort attempt to assign the same split to the same worker during scheduling.
Preferred(0) -> Preferred(1) -> LeastBusy
[Diagram: splits assigned to their Preferred(0) worker, then to Preferred(1), then to the least busy worker.]
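The fallback order Preferred(0) -> Preferred(1) -> LeastBusy can be illustrated with a minimal selection function. This is a sketch of the idea only, not Presto's scheduler code; the worker names, the `load` dictionary, and the per-node cap parameter are illustrative assumptions.

```python
from typing import Dict, List


def pick_worker(preferred: List[str], load: Dict[str, int], max_splits_per_node: int) -> str:
    """Try Preferred(0), then Preferred(1), then fall back to the least busy worker."""
    for worker in preferred[:2]:
        if load[worker] < max_splits_per_node:
            return worker
    return min(load, key=lambda w: load[w])


# Hypothetical cluster state: number of splits currently assigned to each worker.
load = {"worker-a": 2, "worker-b": 1, "worker-c": 0}

# worker-a (Preferred(0)) is at the cap, so Preferred(1) is used.
first = pick_worker(["worker-a", "worker-b"], load, max_splits_per_node=2)

# Once both preferred workers are full, the least busy worker is chosen.
load["worker-b"] = 2
second = pick_worker(["worker-a", "worker-b"], load, max_splits_per_node=2)
print(first, second)
```

Only the first two outcomes keep a split on a worker that is likely to hold its data in the local cache; the least-busy fallback trades cache locality for load balance.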
25. Alluxio Data Cache - Cluster
[Diagram: a Presto Cluster (Coordinator and Workers) reads from an Alluxio Cluster (Master and Workers) in front of OSS, with the steps "read alluxio", "on cache miss cache to alluxio", and "return data".]
Alluxio is a distributed caching service for Presto.
Short-circuit reads are supported.
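The three steps in the diagram (read from the cache, populate it on a miss, return the data) follow the usual read-through caching pattern, sketched below. An in-memory dictionary stands in for the Alluxio cluster and a fake function stands in for OSS; none of this uses the real Alluxio or OSS APIs.

```python
from typing import Callable, Dict, List


class ReadThroughCache:
    """In-memory stand-in for the cache tier; a dict replaces Alluxio for illustration."""

    def __init__(self, read_from_source: Callable[[str], bytes]) -> None:
        self._read_from_source = read_from_source
        self._cached: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def read(self, key: str) -> bytes:
        if key in self._cached:              # "read alluxio": served from the cache
            self.hits += 1
            return self._cached[key]
        self.misses += 1
        data = self._read_from_source(key)   # cache miss: go to the source (OSS in the diagram)
        self._cached[key] = data             # "on cache miss cache to alluxio"
        return data                          # "return data"


source_reads: List[str] = []


def fake_oss_read(key: str) -> bytes:
    """Hypothetical source read that records each call."""
    source_reads.append(key)
    return f"contents of {key}".encode("utf-8")


cache = ReadThroughCache(fake_oss_read)
first = cache.read("bucket/part-0.orc")
second = cache.read("bucket/part-0.orc")
print(cache.hits, cache.misses, len(source_reads))
```

The second read never reaches the source, which is the effect the cache is meant to have on OSS request counts and bandwidth.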
26. Alluxio Data Cache - Local Cache vs. Cluster
[Diagram: cluster mode (Presto Cluster reading through an Alluxio Cluster in front of OSS) next to local cache mode (Presto Cluster only).]
Local Cache vs. Cluster
• Data is closer to the compute node
• No extra nodes needed
Local Cache vs. Collocated Cluster
• Easier to maintain
• No resources wasted if the user has no OSS data source
28. Alluxio Data Cache - Improvements in DLA
Community solution scenario vs. DLA scenario
• Queries mainly on Hive data sources vs. this cannot be assumed for a specific user
• SSD vs. ultra cloud disk
Challenges
• A performance improvement that holds only in the statistical sense may not be perceivable by users, so the cache hit ratio has to increase for every single query.
• Low disk throughput limits the acceleration effect.
Goals
• Increase the cache hit ratio for every single query
• Increase disk throughput
29. Alluxio Data Cache - Improvements in DLA
Increase cache hit ratio
Analysis
• SOFT_AFFINITY: Preferred(0) -> Preferred(1) -> LeastBusy
• The key is to submit more splits to the preferred nodes.
• The limit is set by node-scheduler.max-splits-per-node.
Solution: increase node-scheduler.max-splits-per-node
• Effect: the cache hit ratio increases.
• Side effect: the load across workers becomes unbalanced.
[Diagram: three workers holding 4 splits, 1 split, and 1 split. split1 to split4 land on the first worker, split5 and split6 on the other two.]
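The effect and the side effect of raising the per-node split limit can be reproduced with a toy scheduler. The sketch below is a simplification under stated assumptions: six splits, three workers, hand-picked preference lists, and a "cache-friendly" count that only credits splits landing on Preferred(0). It is not Presto's scheduler.

```python
from typing import Dict, List, Tuple


def schedule(preferred_of: Dict[str, List[str]], workers: List[str],
             max_splits_per_node: int) -> Tuple[Dict[str, int], int]:
    """Assign each split to the first preferred worker below the cap, else the least busy one.

    Returns the per-worker load and the number of splits placed on Preferred(0).
    """
    load = {w: 0 for w in workers}
    on_first_choice = 0
    for prefs in preferred_of.values():
        target = next((w for w in prefs if load[w] < max_splits_per_node), None)
        if target is None:
            target = min(load, key=lambda w: load[w])   # LeastBusy fallback
        if target == prefs[0]:
            on_first_choice += 1
        load[target] += 1
    return load, on_first_choice


workers = ["w0", "w1", "w2"]
# Hypothetical preferences: split1..split4 belong to one big file that prefers w0.
preferred_of = {
    "split1": ["w0", "w1"], "split2": ["w0", "w1"],
    "split3": ["w0", "w1"], "split4": ["w0", "w1"],
    "split5": ["w1", "w2"], "split6": ["w2", "w0"],
}

low_cap = schedule(preferred_of, workers, max_splits_per_node=2)
high_cap = schedule(preferred_of, workers, max_splits_per_node=4)
print(low_cap, high_cap)
```

With the low cap the load is even but only half of the splits reach their first-choice worker; with the high cap every split does, at the price of the 4/1/1 imbalance shown on the slide.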
30. Alluxio Data Cache - Improvements in DLA
Increase cache hit ratio: unbalanced load
• HiveSplit preferred node: path.hashCode() % numWorkers
• A big file generates more splits, so the corresponding worker gets more load.
• Splits of a big file need to be submitted to different nodes:
(path.hashCode() + (start / (fileSize / numWorkers))) % numWorkers
[Diagram: three workers holding 2 splits each.]
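The two preferred-node formulas can be compared directly. In the sketch below, `zlib.crc32` is used as a stand-in for Java's `path.hashCode()`, the guard against a zero chunk size for very small files is an addition not present on the slide, and the file path and sizes are made up.

```python
import zlib
from collections import Counter


def path_hash(path: str) -> int:
    """Stand-in for Java's path.hashCode(); any stable non-negative hash works here."""
    return zlib.crc32(path.encode("utf-8"))


def preferred_by_path(path: str, num_workers: int) -> int:
    """Original rule: every split of a file maps to the same worker."""
    return path_hash(path) % num_workers


def preferred_by_path_and_offset(path: str, start: int, file_size: int, num_workers: int) -> int:
    """Improved rule: the split's start offset shifts it to a different worker."""
    chunk = max(1, file_size // num_workers)   # guard for tiny files (an added assumption)
    return (path_hash(path) + start // chunk) % num_workers


MIB = 1024 * 1024
path = "oss://example-bucket/lineitem/part-0.orc"   # hypothetical file
file_size = 384 * MIB
starts = [i * 64 * MIB for i in range(6)]           # six 64 MiB splits
num_workers = 3

by_path = Counter(preferred_by_path(path, num_workers) for _ in starts)
by_offset = Counter(preferred_by_path_and_offset(path, s, file_size, num_workers) for s in starts)
print(sorted(by_path.values()), sorted(by_offset.values()))
```

Because the mapping still depends only on the path and the split offset, the same split keeps going to the same worker across queries, so cache affinity is preserved while the load of a big file is spread out.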
31. Alluxio Data Cache - Improvements in DLA
Improve disk throughput
• Throughput of one 20 GB ultra disk: write 109 MB/s, read 108 MB/s
• Use multiple disks: 6 ultra disks deliver 600 MB/s read/write
Implementation
page.path = $root/$page_path
=>
page.path = $roots[page.hash % roots.size]/$page_path
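The change from a single cache root to several can be sketched as follows. The mount points and page names are hypothetical; only the `roots[page.hash % roots.size]` selection comes from the slide.

```python
from collections import Counter
from typing import List


def page_path(roots: List[str], page_hash: int, page_rel_path: str) -> str:
    """Pick the cache root for a page by hashing it across all configured disks."""
    # Python's % is non-negative for a positive modulus; in Java a negative hash
    # would need Math.floorMod to stay within the list bounds.
    root = roots[page_hash % len(roots)]
    return f"{root}/{page_rel_path}"


# Hypothetical mount points for six disks.
roots = [f"/mnt/disk{i}" for i in range(6)]
paths = [page_path(roots, h, f"page-{h}") for h in range(12)]
pages_per_disk = Counter(p.rsplit("/", 1)[0] for p in paths)
print(paths[7], dict(pages_per_disk))
```

Spreading pages by hash lets reads and writes hit all disks in parallel, which is how the aggregate throughput can approach the sum of the individual disks.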
32. Alluxio Data Cache - Performance
Environment:
• Cluster: 16 nodes, each with 16 CPUs and 64 GB
• Disk: 20 GB ultra disk × 6
• Data: TPC-H 1 TB, ORC format, stored on OSS
Queries chosen from TPC-H:
• Include a scan of table lineitem (the biggest table)
• Have no join between three or more tables
34. Future Plan
• Alluxio Cluster
  • Shared by multiple users
  • Suitable when Presto is auto scaling
• Improvements for the OSS data source
• Fragment result cache
• Query result cache
• Improve the performance of querying small files
35. More Information about DLA
• DLA Homepage: https://www.aliyun.com/product/datalakeanalytics
• DLA SQL Introduction: https://developer.aliyun.com/article/770819
We are hiring :)