Architecture of the distributed DB Apache Kudu: what is "HybridTime", the mechanism that reconciles DB performance and consistency?
- 2. 2 © Cloudera, Inc. All rights reserved.
• ( ) / takahiko at cloudera.com
•
• Cloudera
•
• Internet & Network
• RDBMS 1
• NoSQL 2
• Hadoop 3 ←Now!
- 3.
• Apache Kudu
• Kudu OLTP OLAP HTAP
DB #dbts2017 Kudu
• BI/DWH DB Kudu
Google Spanner
https://www.slideshare.net/Cloudera_jp/apache-kududb-dbts2017
• HybridTime
DB HybridTime
Kudu
- 5.
• 275 3PB
• 1000 PB
• /
• 1 GB/
• DB
• BLOB
•
• 1000
Kudu
1
...
- 6.
Kudu
[Figure: platform component diagram: Impala, Kudu, S3, HDFS, Spark, Hive, MapReduce, ADLS, SQL, DB, HMS]
- 7.
SQL Kudu
Impala + Kudu
[Figure: platform component diagram: Impala, Kudu, S3, HDFS, Spark, Hive, MapReduce, ADLS, HMS]
• Kudu SQL
• Impala SQL
- 8.
• Impala SQL Impala Kudu
• Impala Kudu predicate push down
• Kudu SCAN Impala aggregation
SQL
SQL Impala
3
90
- 9.
Spark Kudu
Spark + Kudu
[Figure: platform component diagram: Impala, Kudu, S3, HDFS, Spark, Hive, MapReduce, ADLS, HMS]
• Spark SQL
Kudu API
• SparkSQL
- 10.
• Kudu 1
Kudu
Tablet
Kudu
TabletServer
- 11.
• 1 3
• 3
• Raft
•
•
Tablet
TabletServer
- 12.
•
• INSERT/UPDATE/UPSERT/DELETE
• DECIMAL
•
•
• Kerberos
•
•
•
•
Kudu
- 14.
• OLTP 1TB RAM
•
OLTP OLAP
• 2
DB DB
insert/update/delete
OLTP OLAP DWH BI
select
ETL
- 15.
Hadoop
• (PB)
HDFS OLAP
• Impala/Hive SQL
• HDFS OLTP HBase
HBase
Hadoop DB
put / delete OLTP OLAP BIselect
HBase ImpalaHadoop
data ingestion
HDFS
ETL
- 16.
• OLTP? OLAP?
•
•
• OLTP OLAP
• HTAP(Hybrid Transactional/Analytic Processing)
•
• ...
•
• Kudu HTAP
OLTP OLAP
HTAP
HTAP
- 17.
• OLTP OLAP 1 DB
•
HTAP DB
Kudu
insert/update/delete
HTAP DWH BI
select
Kudu
- 18.
(HDFS)
SQL
Impala
(Spark Streaming)
(Flume)
ETL SQL
Hive/Spark
DB DB
(Kudu)
IoT
(Flume)
BI
BI
BI
ETL
MQTT
BrokerIoT
BI
DB DB
/
DB DB
(Kafka)
( )
- 20.
• DB
•
• ! f
•
• 12:30:00 < 12:30:03
• 2 < 3
• Log Sequence Number LSN
• LSN
• DB
• DB LSN
• DB
DB
12:30:00 12:30:03
2 3
! f
- 21.
• Physical Clock
•
•
•
•
• Logical Clock
•
• ...
•
•
•
DB
B
A
A B
- 22.
•
•
•
• +1
- Lamport Clock
[Figure: Lamport Clock example timeline with per-event counter values]
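The Lamport Clock rule on this slide (add 1 per event, and on receive move past the sender's counter) can be sketched in a few lines of Python. The class and method names are illustrative assumptions, not something taken from the deck.

```python
class LamportClock:
    """Logical clock: one integer counter per process (illustrative sketch)."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        # Local event or message send: advance the counter by one.
        self.time += 1
        return self.time

    def receive(self, sent_time: int) -> int:
        # Message receive: take the larger of the two counters, then add one,
        # so the receive event is always ordered after the send event.
        self.time = max(self.time, sent_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
t_send = a.tick()            # a's counter becomes 1
t_recv = b.receive(t_send)   # b's counter becomes max(0, 1) + 1 = 2
t_next = b.tick()            # b's counter becomes 3
```

The counters give an ordering consistent with causality, but two unrelated events can still get arbitrary relative values.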
- 23.
•
• +1
•
•
•
- Vector Clock
[Figure: Vector Clock example with three-element vectors, progressing from {1,0,0} through {1,1,0}, {1,2,0}, {1,2,1} up to {5,5,4}]
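A small Python sketch of the Vector Clock idea, reproducing the first vectors of the figure ({1,0,0}, {1,1,0}, {1,2,0}, {1,2,1}). The class, the helper functions, and the assignment of events to participants are assumptions made for illustration.

```python
from typing import List


class VectorClock:
    """One counter per participant; `me` is this participant's index."""

    def __init__(self, n: int, me: int) -> None:
        self.v = [0] * n
        self.me = me

    def tick(self) -> List[int]:
        # Local event or send: increment only our own slot.
        self.v[self.me] += 1
        return list(self.v)

    def receive(self, other: List[int]) -> List[int]:
        # Receive: element-wise maximum, then increment our own slot.
        self.v = [max(x, y) for x, y in zip(self.v, other)]
        self.v[self.me] += 1
        return list(self.v)


def happened_before(x: List[int], y: List[int]) -> bool:
    # x -> y iff every slot of x is <= y and the vectors differ.
    return all(a <= b for a, b in zip(x, y)) and x != y


def concurrent(x: List[int], y: List[int]) -> bool:
    return x != y and not happened_before(x, y) and not happened_before(y, x)


p0, p1, p2 = VectorClock(3, 0), VectorClock(3, 1), VectorClock(3, 2)
e1 = p0.tick()        # [1, 0, 0]
e2 = p1.receive(e1)   # [1, 1, 0]
e3 = p1.tick()        # [1, 2, 0]
e4 = p2.receive(e3)   # [1, 2, 1]
e5 = p0.tick()        # [2, 0, 0], unrelated to e4
```

Unlike a Lamport Clock, comparing two vectors tells us whether the events are causally ordered or concurrent, at the cost of carrying one counter per participant.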
- 25.
•
•
•
12:30:00
12:29:59
A B
B
!
f
- 26.
• Spanner: Google’s Globally Distributed Database
• DB
ACID
• GPS
• TrueTime API
error bound
Google Spanner
- 27.
• API
• GPS
•
• TrueTime API TT.now() TTinterval
• TT.now()
• Google DC: error of 1 to 7 ms, 4 ms on average
Google Spanner TrueTime API
TT.now() returns TTinterval: [earliest, latest]
- 28.
• commit wait
• TrueTime API
• e f
2ε
• External Consistency
• f e T $ < T &
Google Spanner commit-wait
[Figure: commit-wait timeline; each event waits out an uncertainty window of 2ε before the next one is ordered after it]
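The commit-wait idea can be sketched as follows, assuming a TrueTime-like call that returns an [earliest, latest] interval around the local clock. Everything here (`FakeClock`, `tt_now`, `commit_with_wait`, integer milliseconds, the 4 ms epsilon) is a hypothetical illustration, not Spanner's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: int
    latest: int


class FakeClock:
    """Hypothetical test clock in integer milliseconds."""

    def __init__(self, start_ms: int) -> None:
        self.now_ms = start_ms

    def sleep(self, ms: int) -> None:
        self.now_ms += ms


def tt_now(clock: FakeClock, epsilon_ms: int) -> TTInterval:
    # Assumption: true time lies within +/- epsilon of the local clock.
    return TTInterval(clock.now_ms - epsilon_ms, clock.now_ms + epsilon_ms)


def commit_with_wait(clock: FakeClock, epsilon_ms: int) -> int:
    # Use the latest possible "now" as the commit timestamp ...
    commit_ts = tt_now(clock, epsilon_ms).latest
    # ... then wait until even the earliest possible "now" has passed it.
    while tt_now(clock, epsilon_ms).earliest <= commit_ts:
        clock.sleep(1)
    return commit_ts


clock = FakeClock(start_ms=100)
commit_ts = commit_with_wait(clock, epsilon_ms=4)
waited_ms = clock.now_ms - 100
```

With epsilon at 4 ms the loop waits just over 2ε (9 ms in this integer simulation), which is why a small clock error bound matters so much for commit latency.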
- 29.
Technical Report: HybridTime - Accessible Global Consistency
with High Clock Uncertainty
- 30.
• Technical Report: HybridTime - Accessible Global Consistency with High Clock
Uncertainty
•
• Google DC
• HybridTime NTP DB
• Kudu Kudu
HybridTime
•
• 2014 (
)
Kudu
- 31.
• Google Spanner DC
• Amazon Dynamo Cassandra DB
Eventual Consistency
•
•
•
[ ] DC DB
- 32.
• Consistency
•
•
• CAP Consistency ACID Consistency/Isolation
• Consistency
• (Anomaly)
• Lost Update, Dirty Read, Non-Repeatable Read, Phantom Read, Read Skew, Write Skew, etc.
•
• Lost Update SELECT FOR UPDATE
Consistency
- 33.
• Lamport Clocks Vector Clocks
•
•
•
• RDB Point-in-Time
• Vector Clocks
[ ]
- 34.
• Spinnaker Paxos
•
•
• Spanner commit-wait
•
• GPS
•
[ ]
- 35.
• HybridTime
•
•
• Point-in-time
• Lamport Clock
• HybridTime
• Vector Clocks Lamport Clocks 2
( commit-wait )
[ ] HybridTime
HTC: {physical, logical}
- 36.
•
• NTP
• NTP
• commit-wait
•
• NTP
• commit-wait
•
•
• Kudu DB
HybridTime
- 37.
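• $PC_i(e)$: physical clock reading of server i for event e
• $PC_{ref}(e)$: reference time for event e
• $E_i(e)$: clock error of server i for event e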
• 1:
• 2:
[ ] HybridTime
- 38.
• HybridTime HTC
• (error)
• Spanner TrueTime API
• HybridTime
• NTP
HybridTime
- 39.
• ntp_adjtime
• timex
• maxerror
HybridTime
- 40.
• Kudu macOS
• macOS OS
macOS
- 42.
• 1 2
• 2 1
• $T_2 - T_1$ ...
•
• $\delta_1$
• $\theta = T_2 - T_1 - \delta_1$
NTP
[Figure: one-way exchange with T1 = 100 ms and T2 = 160 ms; 160 - 100 = 60 ms, which still contains the delay $\delta_1$; marked NG]
- 43.
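• $\delta_1$
• RTT: $\delta$; $\frac{\delta}{2}$
• $\delta = \delta_1 + \delta_2 = T_4 - T_1 - (T_3 - T_2)$
$\frac{\delta}{2} = \delta_1 = \frac{(T_4 - T_1) - (T_3 - T_2)}{2}$
• $\theta$
• $\theta = T_2 - T_1 - \frac{(T_4 - T_1) - (T_3 - T_2)}{2}$
• $\theta = \frac{2(T_2 - T_1)}{2} - \frac{(T_4 - T_1) - (T_3 - T_2)}{2}$
• $\theta = \frac{2T_2 - 2T_1 - T_4 + T_1 + T_3 - T_2}{2}$
• $\theta = \frac{T_2 - T_1 - T_4 + T_3}{2}$
• $\theta = \frac{(T_2 - T_1) + (T_3 - T_4)}{2}$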
NTP
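[Figure: NTP exchange with timestamps T1 to T4; +10 ms network delay each way, +5 ms between T2 and T3, RTT 20 ms. Example 1) offset 50 ms: T1 = 100 ms, T2 = 160 ms, T3 = 165 ms, T4 = 125 ms. Example 2) offset -30 ms: T1 = 100 ms, T2 = 80 ms, T3 = 85 ms, T4 = 125 ms.]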
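The offset and round-trip formulas derived on this slide can be checked against the two examples in the figure. The function name and the mapping of the figure's numbers onto T1 to T4 are assumptions made for this sketch.

```python
from typing import Tuple


def ntp_offset_and_delay(t1: float, t2: float, t3: float, t4: float) -> Tuple[float, float]:
    """Return (offset, round-trip delay) from the four NTP timestamps.

    t1: client send time (client clock)
    t2: server receive time (server clock)
    t3: server send time (server clock)
    t4: client receive time (client clock)
    """
    delay = (t4 - t1) - (t3 - t2)          # time spent on the network only
    offset = ((t2 - t1) + (t3 - t4)) / 2   # assumes symmetric one-way delays
    return offset, delay


# Example 1: server clock 50 ms ahead of the client.
offset_1, delay_1 = ntp_offset_and_delay(100, 160, 165, 125)
# Example 2: server clock 30 ms behind the client.
offset_2, delay_2 = ntp_offset_and_delay(100, 80, 85, 125)
```

Both examples use a symmetric 10 ms one-way delay, so the estimated offset matches the real one exactly.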
- 44.
•
•
•
• $\theta = \frac{151 - 100 - (125 - 156)}{2} = 41\,\mathrm{ms}$
• 50ms -9ms
• RTT $\frac{\delta}{2}$
• ± $\frac{\delta}{2}$
NTP
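[Figure: NTP exchange with asymmetric delays of +1 ms outbound and +19 ms return, +5 ms between T2 and T3; one timeline reads 100, 101, 106, 125 ms and the other 150, 151, 156, 175 ms; offset 50 ms, RTT 20 ms]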
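To see why the estimate is off by 9 ms here and why the error stays within half the RTT, the exchange can be simulated with unequal outbound and return delays. The function and parameter names are hypothetical; the numbers are the ones from the slide.

```python
from typing import Tuple


def simulate_exchange(true_offset: float, d_out: float, d_back: float,
                      processing: float, t1: float = 100) -> Tuple[float, float]:
    """Simulate one NTP exchange and return (estimated offset, RTT)."""
    t2 = t1 + d_out + true_offset            # server clock at receive
    t3 = t2 + processing                     # server clock at reply
    t4 = t1 + d_out + processing + d_back    # client clock at receive
    estimated = ((t2 - t1) + (t3 - t4)) / 2
    rtt = (t4 - t1) - (t3 - t2)
    return estimated, rtt


# Asymmetric path from the slide: 1 ms out, 19 ms back, true offset 50 ms.
estimated, rtt = simulate_exchange(true_offset=50, d_out=1, d_back=19, processing=5)
error = estimated - 50      # equals (d_out - d_back) / 2
bound = rtt / 2             # the error can never exceed half the RTT
```

The error equals half the difference between the two one-way delays, so its magnitude is at most RTT/2, which is the ± bound stated on the slide.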
- 45.
•
• 1
•
•
•
• NTP !
•
• ! +
#
$
NTP
- 46.
• NTP
• DC NTP or Google Public NTP
• AWS Amazon Time Sync Service
• Azure Hyper-V time synchronization Google Public NTP
• GCE Google Public NTP
•
NTP
- 47.
• UPDATE
• 2
• HybridTime
+1
[ ] HybridTime
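A hedged sketch of how a hybrid clock combining a physical and a logical component can be updated. This follows the generic hybrid-clock rule (use the physical clock when it has advanced, otherwise add 1 to the logical part) and is not Kudu's actual implementation; `HybridClock` and its methods are hypothetical names.

```python
from typing import Callable, Tuple

Timestamp = Tuple[int, int]  # (physical, logical), compared lexicographically


class HybridClock:
    def __init__(self, physical_clock: Callable[[], int]) -> None:
        self._pc = physical_clock
        self._physical = 0
        self._logical = 0

    def now(self) -> Timestamp:
        wall = self._pc()
        if wall > self._physical:
            # Physical time moved forward: adopt it and reset the logical part.
            self._physical, self._logical = wall, 0
        else:
            # Physical time did not advance: order events with the logical part.
            self._logical += 1
        return (self._physical, self._logical)

    def update(self, remote: Timestamp) -> Timestamp:
        # On receiving a timestamp, never fall behind the sender.
        if remote > (self._physical, self._logical):
            self._physical, self._logical = remote
        return self.now()


fast = HybridClock(lambda: 105)   # server whose clock runs 5 ms ahead
slow = HybridClock(lambda: 100)   # server whose clock is behind
ts_sent = fast.now()              # (105, 0)
ts_recv = slow.update(ts_sent)    # (105, 1), still ordered after ts_sent
```

Even though the receiving server's physical clock is behind, the propagated timestamp keeps the causal order without any waiting.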
- 48.
• !" #, % !" HTC #, %
[ ] HybridTime
- 49.
• HTC
• HTC
• ) ! → #
• −%& ! < !(()( # < %* #
[ ] Kudu HybridTime
j
i
#
−%& !
%* #−%& !
!
%& !
- 50.
• KUDU-146 Deal with leap seconds
• leap second
• Stratum 0 NTP Leap Indicator OS
• 23:59:59 -> 23:59:60 -> 00:00:00
• 23:59:59 -> 23:59:59-> 00:00:00
• 2 1
• HybridTime propagate
• Kudu commit-wait
• wait ms
1
• NTP TIME_INS/TIME_OOP max error
• Leap Smearing
https://issues.apache.org/jira/browse/KUDU-430
- 51.
• HybridTime
•
• Kudu RDB
• ACID
• Commit-wait
• NTP
• CLIENT_PROPAGATED
• HybridTime
• HybridTime propagate
Kudu
- 52.
• Kudu MVCC Multi-version Concurrency Control
•
• WAL REDO UNDO
•
• READ_LATEST
•
• READ_AT_SNAPSHOT
• MVCC
•
Repeatable Read
Kudu
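The difference between the two read modes named on this slide can be illustrated with a toy multi-version cell. Only the mode names READ_LATEST and READ_AT_SNAPSHOT come from the slide; the `MvccCell` class is a hypothetical model and not Kudu's storage format or client API.

```python
import bisect
from typing import Any, List, Optional


class MvccCell:
    """Keeps every committed version of one value, ordered by timestamp."""

    def __init__(self) -> None:
        self._timestamps: List[int] = []
        self._values: List[Any] = []

    def write(self, ts: int, value: Any) -> None:
        # Insert the new version while keeping timestamps sorted.
        i = bisect.bisect_right(self._timestamps, ts)
        self._timestamps.insert(i, ts)
        self._values.insert(i, value)

    def read_latest(self) -> Optional[Any]:
        # READ_LATEST-style read: whatever version is newest right now.
        return self._values[-1] if self._values else None

    def read_at_snapshot(self, snapshot_ts: int) -> Optional[Any]:
        # READ_AT_SNAPSHOT-style read: newest version at or before the snapshot.
        i = bisect.bisect_right(self._timestamps, snapshot_ts)
        return self._values[i - 1] if i else None


cell = MvccCell()
cell.write(10, "a")
cell.write(20, "b")
at_15 = cell.read_at_snapshot(15)   # sees only the version written at ts 10
latest = cell.read_latest()         # sees the version written at ts 20
```

A reader that fixes its snapshot timestamp keeps seeing the same version no matter how many later writes arrive, which is the repeatable behaviour the slide associates with READ_AT_SNAPSHOT.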
- 53.
Kudu
https://blog.cloudera.co.jp/11c3a749a81b
- 54.
• YCSB
•
• 3 8
• insert 60%, update 20%, single-row read 20%
•
• GCE: n1-standard-8 x10
• RAM 30GB
• Disk 350GB
• NTP
• GCE
[ ]
NTP
- 55.
[ ]
HybridTime Commit Wait Commit Wait
Clock Error
- 57.
• OLTP OLAP 1 DB Kudu
• HybridTime
DB
• HybridTime → P.36
• Kudu
HybridTime Serializable
• OLAP Kudu #dbts2017
• 11 6 Cloudera World Tokyo 2018
Kudu
Kudu
- 58.
Cloudera World Tokyo 2018
http://www.clouderaworldtokyo.com/