A Closer Look at Apache Kudu


Presentation conducted at Morning@Lohika:
http://morning.lohika.com/past-events/fast-data


1. A Closer Look at Apache Kudu. By Andriy Zabavskyy, March 2017
2. A species of antelope from the Big Data zoo
3. Why Kudu
4. Analytics on Hadoop before Kudu (diagram: fast scans vs. fast random access)
5. Weak sides of combining Parquet and HBase • Complex code to manage the flow and synchronization of data between the two systems • Managing consistent backups, security policies, and monitoring across multiple distinct systems
6. Lambda Architecture Challenges • In the real world, systems often need to accommodate: • late-arriving data • corrections to past records • privacy-related deletions of data that has already been migrated to the immutable store
7. Happy Medium • High throughput: goal within 2x of Impala • Low latency for random reads/writes: goal 1 ms on SSD • SQL- and NoSQL-style APIs (diagram: fast scans and fast random access)
8. Why Kudu / Data Model
9. Tables, Schemas, Keys • Kudu is a storage system for tables of structured data • A schema consists of a finite number of columns • Each column has a name and a type: Boolean, Integer, Unixtime_Micros, Float, String, Binary
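To make the table, schema, and key model concrete, here is a minimal sketch using the Apache Kudu Python client (kudu-python). The master address, table name, columns, and bucket count are illustrative assumptions, not taken from the slides.

```python
import kudu
from kudu.client import Partitioning

# Connect to a Kudu master (placeholder address).
client = kudu.connect(host='kudu-master.example.com', port=7051)

# A schema is a finite set of typed columns; 'id' is declared as the primary key.
builder = kudu.schema_builder()
builder.add_column('id').type(kudu.int64).nullable(False).primary_key()
builder.add_column('name').type(kudu.string).nullable(False)
builder.add_column('updated_at').type(kudu.unixtime_micros)
schema = builder.build()

# Kudu tables must declare a partitioning scheme; hash partitioning on the key
# is the simplest choice (partitioning is covered later in the deck).
partitioning = Partitioning().add_hash_partitions(column_names=['id'], num_buckets=4)
client.create_table('products', schema, partitioning)
```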
10. Keys • Some ordered subset of those columns is specified to be the table’s primary key • The primary key: • enforces a uniqueness constraint • acts as the sole index by which rows may be efficiently updated or deleted
11. Write Operations • The user mutates the table using the Insert, Update, and Delete APIs • Note: the primary key must be fully specified • Java, C++, and Python APIs • No multi-row transactional APIs: • each mutation conceptually executes as its own transaction, • despite being automatically batched with other mutations for better performance
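A hedged sketch of these write APIs with the Python client (not the deck's original sample): mutations are applied through a session against the hypothetical 'products' table created above, and the primary key is fully specified for every operation.

```python
import kudu

client = kudu.connect(host='kudu-master.example.com', port=7051)
table = client.table('products')

# Each mutation conceptually runs as its own transaction, even though the
# session batches operations for better throughput.
session = client.new_session()
session.apply(table.new_insert({'id': 1, 'name': 'widget'}))
session.apply(table.new_update({'id': 1, 'name': 'widget v2'}))  # key fully specified
session.apply(table.new_delete({'id': 1}))

try:
    session.flush()
except kudu.KuduBadStatus:
    # Per-row failures are reported as pending errors on the session.
    print(session.get_pending_errors())
```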
12. Read Operations • Scan operation: • any number of predicates to filter the results • two types of predicates: • comparisons between a column and a constant value • composite primary key ranges • A user may specify a projection for a scan • A projection consists of a subset of columns to be retrieved
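A minimal scan sketch with the Python client against the same hypothetical table, showing comparison predicates and a projection; it assumes the scanner exposes set_projected_column_names.

```python
import kudu

client = kudu.connect(host='kudu-master.example.com', port=7051)
table = client.table('products')

scanner = table.scanner()
# Comparison predicates between a column and constant values.
scanner.add_predicate(table['id'] >= 100)
scanner.add_predicate(table['id'] < 200)
# Projection: retrieve only a subset of the columns.
scanner.set_projected_column_names(['id', 'name'])

# Fine for small result sets; read in batches for large scans.
for row in scanner.open().read_all_tuples():
    print(row)
```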
13. Read/Write Python API Sample
14. Why Kudu / Storage Layout
15. Storage Layout Goals • Fast columnar scans • best-of-breed immutable data formats such as Parquet • efficiently encoded columnar data files • Low-latency random updates • O(lg n) lookup complexity for random access • Consistency of performance • most users are willing to trade peak performance for predictability
16. MemRowSet • In-memory concurrent B-tree • No removal from the tree: MVCC records instead • No in-place updates: only modifications that do not change the value size • Leaf nodes are linked together for sequential scans • Row-wise layout
17. DiskRowSet • Column-organized • Each column is written to disk in a single contiguous block of data • The column itself is subdivided into small pages for granular random reads, with an embedded B-tree index
18. Deltas • A DeltaMemStore is a concurrent B-tree that shares the implementation of MemRowSet • A DeltaMemStore flushes into a DeltaFile • A DeltaFile is a simple binary column
19. Insert Path • Each DiskRowSet stores a Bloom filter of the set of keys present • For each DiskRowSet, the minimum and maximum primary key are also stored, so only DiskRowSets that might contain the new key need to be checked for duplicates
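To illustrate that duplicate-key check, here is a purely conceptual Python model (Kudu's actual implementation is in C++ and uses real Bloom filters): only DiskRowSets that pass the cheap key-range and Bloom-filter checks pay for an actual key lookup.

```python
# Conceptual model only; not Kudu code.
class DiskRowSetStub:
    def __init__(self, min_key, max_key, bloom, keys):
        self.min_key, self.max_key = min_key, max_key
        self.bloom = bloom   # a set stands in for the Bloom filter (may give false positives)
        self.keys = keys     # stands in for the on-disk primary key column

    def might_contain(self, key):
        # Cheap checks first: min/max key bounds, then the Bloom filter.
        return self.min_key <= key <= self.max_key and key in self.bloom

    def contains(self, key):
        # Expensive check: an actual key lookup (a seek into the key column).
        return key in self.keys

def is_duplicate_insert(key, rowsets):
    # Only rowsets that might contain the key are consulted for the real lookup.
    return any(rs.contains(key) for rs in rowsets if rs.might_contain(key))
```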
20. Read Path • Converts the key range predicate into a row offset range predicate • Performs the scan one column at a time • Seeks the target column to the correct row offset • Consults the delta stores to see whether any later updates apply
21. Delta Compaction • A background maintenance manager periodically: • scans DiskRowSets to find cases where a large number of deltas have accumulated, and • schedules a delta compaction operation that merges those deltas back into the base data columns
22. RowSet Compaction • A key-based merge of two or more DiskRowSets • The output is written back to new DiskRowSets, rolling every 32 MB • RowSet compaction has two goals: • remove deleted rows • reduce the number of DiskRowSets that overlap in key range
23. Kudu Trade-Offs • Random updates will be slower • Kudu requires a key lookup before an update and a Bloom filter lookup before an insert • Single-row seeks may be slower • The columnar design is optimized for scans • Especially slow at reading a row with many recent updates
24. Why Kudu / Cluster Architecture
25. Cluster Roles
26. The Kudu Master. Kudu’s central master process has several key responsibilities: • A catalog manager: keeping track of which tables and tablets exist, as well as their schemas, desired replication levels, and other metadata • A cluster coordinator: keeping track of which servers in the cluster are alive and coordinating redistribution of data • A tablet directory: keeping track of which tablet servers are hosting replicas of each tablet
27. Why Kudu / Cluster Architecture / Partitioning
28. Partitioning • Tables in Kudu are horizontally partitioned • Kudu, like BigTable, calls these partitions tablets • Kudu supports a flexible array of partitioning schemes
29. Partitioning: Hash. Img source: https://github.com/cloudera/kudu/blob/master/docs/images/hash-partitioning-example.png
30. Partitioning: Range. Img source: https://github.com/cloudera/kudu/blob/master/docs/images/range-partitioning-example.png
31. Partitioning: Hash plus Range. Img source: https://github.com/cloudera/kudu/blob/master/docs/images/hash-range-partitioning-example.png
32. Partitioning Recommendations • Bigger tables, such as fact tables, should be partitioned so that each tablet holds about 1 GB of data • Do not partition small tables such as dimensions • Note: Impala does not allow skipping the partitioning clause, so the single range partition must be specified explicitly (see the sketch after the next slide)
33. Dimension Table with One Partition
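The DDL shown on this slide is not in the transcript. As a hedged sketch of the same idea with the Python client (assuming Partitioning exposes set_range_partition_columns), a dimension table can be created with a single unbounded range partition (one tablet), while a fact table is hash-partitioned into many tablets; names and bucket counts are illustrative.

```python
import kudu
from kudu.client import Partitioning

client = kudu.connect(host='kudu-master.example.com', port=7051)

# Dimension table: range partitioning on the key with no splits added
# leaves a single tablet covering the whole key space.
builder = kudu.schema_builder()
builder.add_column('id').type(kudu.int64).nullable(False).primary_key()
builder.add_column('name').type(kudu.string)
dim_schema = builder.build()
dim_partitioning = Partitioning().set_range_partition_columns(['id'])
client.create_table('dim_promotion', dim_schema, dim_partitioning)

# Fact table: hash partitioning spreads rows across many tablets;
# pick the bucket count so each tablet ends up around 1 GB.
builder = kudu.schema_builder()
builder.add_column('sale_id').type(kudu.int64).nullable(False).primary_key()
builder.add_column('promotion_id').type(kudu.int64)
builder.add_column('amount').type(kudu.double)
fact_schema = builder.build()
fact_partitioning = Partitioning().add_hash_partitions(column_names=['sale_id'], num_buckets=16)
client.create_table('fact_sales', fact_schema, fact_partitioning)
```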
34. Why Kudu / Cluster Architecture / Replication
35. Replication Approach • Kudu uses leader/follower (master-slave) replication • Kudu employs the Raft consensus algorithm to replicate its tablets • If a majority of replicas accept the write and log it to their own local write-ahead logs, • the write is considered durably replicated and can thus be committed on all replicas
36. Raft: Replicated State Machine • The replicated log ensures state machines execute the same commands in the same order • The consensus module ensures proper log replication • The system makes progress as long as any majority of servers are up • Visualization: https://raft.github.io/raftscope/index.html
37. Consistency Model • Kudu gives clients the choice between two consistency modes for reads (scans): • READ_AT_SNAPSHOT • READ_LATEST
38. READ_LATEST consistency • Monotonic reads are guaranteed(?); read-your-writes is not • Corresponds to the "Read Committed" ACID isolation mode • This is the default mode
39. READ_LATEST consistency • The server always returns writes that were committed at the time the request was received • This type of read is not repeatable
40. READ_AT_SNAPSHOT Consistency • Guarantees read-your-writes consistency from a single client • Corresponds to the "Repeatable Read" ACID isolation mode
41. READ_AT_SNAPSHOT Consistency • The server attempts to perform the read at the provided timestamp • In this mode reads are repeatable, • at the expense of waiting for in-flight transactions whose timestamp is lower than the snapshot's timestamp to complete
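A hedged sketch of selecting the read mode from the Python client, assuming the scanner exposes a set_read_mode method (mode names follow the C++/Java clients); the table is the hypothetical one from the earlier sketches.

```python
import kudu

client = kudu.connect(host='kudu-master.example.com', port=7051)
table = client.table('products')

# Default READ_LATEST: returns writes committed at the time the request
# is received; not repeatable.
latest_rows = table.scanner().open().read_all_tuples()

# READ_AT_SNAPSHOT: repeatable reads at a (possibly implicit) timestamp, at the
# cost of waiting for in-flight transactions below that timestamp to complete.
snapshot_scanner = table.scanner()
snapshot_scanner.set_read_mode('snapshot')
snapshot_rows = snapshot_scanner.open().read_all_tuples()
```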
42. Write Consistency • Writes to a single tablet are always internally consistent • By default, Kudu does not provide an external consistency guarantee • However, for users who require a stronger guarantee, Kudu offers the option to manually propagate timestamps between clients
43. Replication Factor Limitation • Since Kudu 1.2.0: • the replication factor of tables is limited to a maximum of 7 • in addition, it is no longer allowed to create a table with an even replication factor
44. Kudu and the CAP Theorem • Kudu is a CP type of storage engine • Writing to a tablet will be delayed if the server that hosts that tablet’s leader replica fails • Kudu gains the following properties by using Raft consensus: • leader elections are fast • follower replicas don’t allow writes, but they do allow reads
45. Why Kudu / Kudu Applicability
46. Applications for which Kudu is a viable solution • Reporting applications where new data must be immediately available for end users • Time-series applications with • queries across large amounts of historic data • granular queries about an individual entity • Applications that use predictive models to make real-time decisions
47. Why Kudu / Streaming Analytics Case Study
48. Business Case • A leader in health care compliance consulting and technology-driven managed services • Cloud-based multi-services platform • It offers: • enhanced data security and scalability, • operational managed services, and access to business information (Img source: http://ihealthone.com/wp-content/uploads/2016/12/Healthcare_Compliance_Consultants-495x400.jpg)
49. ETL Approach. Key points: • Leverage the Confluent platform with Schema Registry • Apply a configuration-based approach: • Avro schema in Schema Registry for the input schema • Impala Kudu SQL scripts for the target schema • Stick to a Python app as the primary ETL code, but extend it: • develop new abstractions to work with mapping rules • streaming processing for both facts and dimensions. Cons: • scaling needs extra effort. (Diagram: data flow from event topics through the ETL code into the analytics DWH; configuration covers the input schema, mapping rules, target schema, and other configurations)
50. Stream ETL using Pipeline Architecture (diagram: Data Reader -> Mapper/Flattener -> Types Adjuster -> Data Enricher -> DB Sinker, plus Cache Manager and Configuration). Pipeline modules: • Data Reader: reads data from the source DB • Mapper/Flattener: flattens the JSON tree-like structure into a flat one and maps field names to the target ones • Types Adjuster: adjusts/converts data types properly • Data Enricher: enriches the data structure with new data: • generates the surrogate key • looks up data from the target DB (using a cache) • DB Sinker: writes data into the target DB. Other modules: • Cache Manager: manages the cache with dimension data
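A minimal sketch of how these pipeline stages might be composed in Python; the module names come from the slide, while the function signatures, field names, and toy data are hypothetical.

```python
# Hypothetical skeleton of the stream-ETL pipeline; each stage takes a record
# (a dict) and returns a transformed record.
def mapper_flattener(event):
    # Flatten the JSON tree and map source field names to target ones.
    return {'customer_id': event['payload']['customer']['id'],
            'event_ts': event['payload']['ts']}

def types_adjuster(record):
    # Adjust/convert data types for the target schema.
    record['customer_id'] = int(record['customer_id'])
    return record

def data_enricher(record, dim_cache):
    # Generate/look up the surrogate key from the dimension cache.
    record['customer_key'] = dim_cache.get(record['customer_id'], -1)
    return record

def db_sinker(record):
    # Write the record into the target DB (stubbed out here).
    print('sinking', record)

def run_pipeline(events, dim_cache):
    # The Data Reader would supply 'events'; the Cache Manager maintains 'dim_cache'.
    for event in events:
        db_sinker(data_enricher(types_adjuster(mapper_flattener(event)), dim_cache))

# Toy usage example.
run_pipeline(
    events=[{'payload': {'customer': {'id': '42'}, 'ts': '2017-03-01T00:00:00Z'}}],
    dim_cache={42: 1001},
)
```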
51. Why Kudu / Key Types Benchmark
52. Kudu Numeric vs String Keys • Reason: generating surrogate numeric keys adds an extra processing step and complexity to the overall ETL process • Sample schema: • Dimensions: • Promotion dimension with 1,000 unique members and 30 categories • Products dimension with 50,000 unique members and 300 categories • Facts: • a fact table referencing the two dimensions above, with 1 million rows • a fact table referencing the two dimensions above, with 100 million rows
53. Benchmark Result
54. Why Kudu / Lessons Learnt
55. Pain Points • Frequent releases with many changes • Data type limitations (especially in the Python lib and Impala) • Lack of sequences/constraints • Lack of multi-row transactions
56. Limitations • More than 50 columns is not recommended • Immutable primary keys • Primary key, partitioning, and column types cannot be altered • Partitions cannot be split after table creation
57. Modeling Recommendations: Star Schema. Dimensions: • replication factor equal to the number of nodes in the cluster • 1 tablet per dimension. Facts: • aim for as many tablets as you have cores in the cluster
58. Why Kudu / What Kudu is Not
59. What Kudu is Not • Not a SQL interface itself • it’s just the storage layer; you should use Impala or SparkSQL • Not an application that runs on HDFS • it’s an alternative, native Hadoop storage engine • Not a replacement for HDFS or HBase • select the right storage for the right use case • Cloudera will support and invest in all three
60. Why Kudu / Kudu vs MPP Data Warehouse
61. Kudu vs MPP Data Warehouses. In common: • fast analytics queries via SQL • ability to insert, update, and delete data. Differences: • (+) faster streaming inserts • (+) improved Hadoop integration • (-) slower batch inserts • (-) no transactional data loading, multi-row transactions, or indexing
62. Useful resources • Community, downloads, VM: https://kudu.apache.org • Whitepaper: http://kudu.apache.org/kudu.pdf • Slack channel: https://getkudu-slack.herokuapp.com
63. USA HQ: Toll Free 866-687-3588, Tel +1-512-516-8880 • Ukraine HQ: Tel +380-32-240-9090 • Bulgaria: Tel +359-2-902-3760 • Germany: Tel +49-69-2602-5857 • Netherlands: Tel +31-20-262-33-23 • Poland: Tel +48-71-382-2800 • UK: Tel +44-207-544-8414 • Email: info@softserveinc.com • Website: www.softserveinc.com • Questions?
