This document provides an overview of Apache Kudu, an open source storage layer for Apache Hadoop that enables fast analytics on fast data. Some key points:
- Kudu is a columnar storage engine that supports both fast analytical queries and low-latency updates to the stored data.
- It addresses gaps in the existing Hadoop storage landscape by providing efficient scans, individual row lookups, and mutable data all within the same system.
- Kudu uses a master and tablet server architecture: tables are horizontally partitioned into tablets, and each tablet is replicated for fault tolerance. It supports SQL and NoSQL interfaces.
- Integrations with Spark, Impala and MapReduce allow it to be used for both
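
The partitioning and replication idea from the key points can be illustrated with a small sketch. This is not Kudu's API or its actual placement logic. It is a toy model that assumes hash partitioning on a string primary key and a simple round-robin choice of replicas. The server names, tablet count, and replication factor are hypothetical values chosen for illustration.

```python
import zlib

# Hypothetical cluster layout, chosen only for illustration.
NUM_TABLETS = 4
REPLICATION_FACTOR = 3
TABLET_SERVERS = ["ts-0", "ts-1", "ts-2", "ts-3", "ts-4"]


def tablet_for_key(primary_key: str, num_tablets: int = NUM_TABLETS) -> int:
    """Map a row's primary key to a tablet using a stable hash (CRC32)."""
    return zlib.crc32(primary_key.encode("utf-8")) % num_tablets


def replicas_for_tablet(tablet_id: int,
                        servers=TABLET_SERVERS,
                        replication_factor: int = REPLICATION_FACTOR) -> list:
    """Pick consecutive servers, wrapping around, to hold a tablet's replicas."""
    return [servers[(tablet_id + i) % len(servers)]
            for i in range(replication_factor)]


def locate(primary_key: str):
    """Return the tablet id and the servers that would hold the row."""
    tablet_id = tablet_for_key(primary_key)
    return tablet_id, replicas_for_tablet(tablet_id)


if __name__ == "__main__":
    for key in ["user-1", "user-2", "user-3"]:
        tablet_id, replicas = locate(key)
        print(f"{key} -> tablet {tablet_id} on {replicas}")
```

Because every row with the same key maps to the same tablet, a single-row lookup touches one tablet, while a scan can run across all tablets in parallel. Losing one server in this model still leaves two copies of each tablet it held.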
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data, by Todd Lipcon, Software Engineer, Cloudera / Kudu Founder