Druid is a real-time analytics engine that allows for fast querying of large datasets. It is column-oriented, supports high concurrency, and can scale to thousands of servers and millions of messages per second. Druid uses an immutable segment-based architecture to store data efficiently and supports real-time and batch ingestion of data. It allows for complex queries through SQL and JSON and uses approximations and data sketches to improve query performance for large datasets.
2. Druid Summit: Real-Time Analytics at Scale
San Francisco Airport Marriott Waterfront
https://www.druidsummit.org/
Register using code "Webinar50" and receive 50% off!
4. Key features
● Column-oriented
● High concurrency
● Scalable to 1000s of servers, millions of messages/sec
● Continuous, real-time ingest
● Query through SQL
● Target query latency: sub-second to a few seconds
5. Open core
Imply's open engine, Druid, is becoming a standard part of modern data infrastructure.
Druid
● Next-generation analytics engine
● Widely adopted
Workflow transformation
● Sub-second speed unlocks new workflows
● Self-service exploration of data patterns
● Make data fun again
6. Where Druid fits in
[Diagram: raw data from data lakes and message buses flows through the pipeline (raw data, storage, analyze, application), with Druid in the analyze stage]
7. What is Druid?
Druid combines ideas from three kinds of systems to power a new type of analytics application:
Search platform
● Real-time ingestion
● Flexible schema
● Full text search
OLAP
● Batch ingestion
● Efficient storage
● Fast analytic queries
Timeseries database
● Optimized for time-based datasets
● Time-based functions
11. Data Server
Historical
● Stores data segments
● Handles most of the computation workload
MiddleManager
● Controls ingestion tasks (peons)
Peons
● Ingest and index streaming data
● Serve data that is still in-flight
15. Streaming Ingestion
Kafka
● Supervisor type: kafka
● How it works: Druid reads directly from Apache Kafka.
● Can ingest late data: yes
● Exactly-once guarantees: yes
Kinesis
● Supervisor type: kinesis
● How it works: Druid reads directly from Amazon Kinesis.
● Can ingest late data: yes
● Exactly-once guarantees: yes
Tranquility
● Supervisor type: N/A
● How it works: Tranquility, a library that ships separately from Druid, is used to push data into Druid.
● Can ingest late data: no (late data is dropped based on the windowPeriod config)
● Exactly-once guarantees: no
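For reference, a minimal Kafka supervisor spec might look like the sketch below. The topic name, broker address, and column choices are illustrative rather than from this deck, and exact fields vary by Druid version:
{
  "type": "kafka",
  "spec": {
    "dataSchema": {
      "dataSource": "wikipedia",
      "timestampSpec": { "column": "timestamp", "format": "iso" },
      "dimensionsSpec": { "dimensions": ["page", "city"] },
      "metricsSpec": [
        { "type": "longSum", "name": "sum_added", "fieldName": "added" }
      ],
      "granularitySpec": { "segmentGranularity": "HOUR", "queryGranularity": "MINUTE", "rollup": true }
    },
    "ioConfig": {
      "topic": "wikipedia-events",
      "inputFormat": { "type": "json" },
      "consumerProperties": { "bootstrap.servers": "kafka01:9092" }
    },
    "tuningConfig": { "type": "kafka" }
  }
}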
16. Batch Ingestion
Native batch (simple)
● Parallel: no; each task is single-threaded.
● Can append or overwrite: yes, both.
● File formats: text file formats (CSV, TSV, JSON).
● Rollup modes: perfect if forceGuaranteedRollup = true in the tuningConfig.
● Partitioning options: hash-based partitioning is supported when forceGuaranteedRollup = true in the tuningConfig.
Native batch (parallel)
● Parallel: yes, if the firehose is splittable and maxNumConcurrentSubTasks > 1 in the tuningConfig. See the firehose documentation for details.
● Can append or overwrite: yes, both.
● File formats: text file formats (CSV, TSV, JSON).
● Rollup modes: perfect if forceGuaranteedRollup = true in the tuningConfig.
● Partitioning options: hash-based partitioning (when forceGuaranteedRollup = true).
Hadoop-based
● Parallel: yes, always.
● Can append or overwrite: overwrite only.
● File formats: any Hadoop InputFormat.
● Rollup modes: always perfect.
● Partitioning options: hash-based or range-based partitioning via partitionsSpec.
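A skeletal native parallel batch task might look like the following. Paths and the sub-task count are illustrative; newer Druid versions use inputSource/inputFormat in place of the firehose mentioned above:
{
  "type": "index_parallel",
  "spec": {
    "dataSchema": {
      "dataSource": "wikipedia",
      "timestampSpec": { "column": "timestamp", "format": "iso" },
      "dimensionsSpec": { "dimensions": ["page", "city"] },
      "granularitySpec": { "segmentGranularity": "DAY", "queryGranularity": "HOUR", "rollup": true }
    },
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": { "type": "local", "baseDir": "/tmp/wikipedia", "filter": "*.json" },
      "inputFormat": { "type": "json" }
    },
    "tuningConfig": {
      "type": "index_parallel",
      "maxNumConcurrentSubTasks": 4
    }
  }
}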
18. How data is structured
● Druid stores data in immutable segments
● Column-oriented, compressed format
● Dictionary-encoded at the column level
● Bitmap index compression: Concise and Roaring
○ Roaring is typically recommended; it is faster for boolean operations such as filters
● Rollup (partial aggregation)
19. Choose column types carefully
String columns
● Indexed: filtering on them is fast
● Aggregation and grouping are slower
Numeric columns
● Not indexed: filtering is slower
● Fast aggregation and fast grouping
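Column types are declared per column in the dimensionsSpec at ingestion time. A fragment for illustration; the column names come from the examples later in this deck (note that in the rollup examples, added and deleted are ingested as metrics rather than dimensions):
"dimensionsSpec": {
  "dimensions": [
    "page",
    "city",
    { "type": "long", "name": "added" },
    { "type": "long", "name": "deleted" }
  ]
}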
22. Druid Segments
timestamp | page | city | added | deleted
Segment 2011-01-01T00/2011-01-01T01
2011-01-01T00:01:35Z | Justin Bieber | SF | 10 | 5
2011-01-01T00:03:45Z | Justin Bieber | LA | 25 | 37
2011-01-01T00:05:62Z | Justin Bieber | SF | 15 | 19
Segment 2011-01-01T01/2011-01-01T02
2011-01-01T01:06:33Z | Ke$ha | LA | 30 | 45
2011-01-01T01:08:51Z | Ke$ha | LA | 16 | 8
2011-01-01T01:09:17Z | Miley Cyrus | DC | 75 | 10
Segment 2011-01-01T02/2011-01-01T03
2011-01-01T02:23:30Z | Miley Cyrus | DC | 22 | 12
2011-01-01T02:49:33Z | Miley Cyrus | DC | 90 | 41
23. Anatomy of a Druid Segment
Physical storage format:
__time (LONG): 1293840000000, 1293840000000, 1293840000000, 1293840000000, 1293840000000, 1293840000000, 1293840000000, 1293840000000
page (STRING), dictionary-encoded (sorted) with a bitmap index (stored compressed):
● DICT: Justin = 0, Ke$ha = 1, Miley = 2
● DATA: 0, 0, 0, 1, 1, 2, 2, 2
● INDEX: Justin = rows [0,1,2] (11100000); Ke$ha = rows [3,4] (00011000); Miley = rows [5,6,7] (00000111)
city (STRING), dictionary-encoded (sorted) with a bitmap index (stored compressed):
● DICT: DC = 0, LA = 1, SF = 2
● DATA: 2, 1, 2, 1, 1, 0, 0, 0
● INDEX: SF = rows [0,2] (10100000); LA = rows [1,3,4] (01011000); DC = rows [5,6,7] (00000111)
added (LONG): 1800, 2912, 1953, 3194, 5690, 1100, 8423, 9080
removed (LONG): 25, 42, 17, 170, 112, 67, 53, 94
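To see why the bitmap index matters, consider how a filtered query against this segment could be evaluated. This is a worked illustration, not Druid output, and the datasource name is hypothetical:
-- Filter two string dimensions; each filter resolves to a bitmap
SELECT SUM(added)
FROM wikipedia
WHERE city = 'LA' AND page = 'Ke$ha'
-- city = 'LA'    -> bitmap 01011000 (rows 1, 3, 4)
-- page = 'Ke$ha' -> bitmap 00011000 (rows 3, 4)
-- bitwise AND    -> 00011000, so only rows 3 and 4 are read:
-- SUM(added) = 3194 + 5690 = 8884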
25. Optimize segment size
Ideally 300-700 MB (~5 million rows)
To control segment size:
● Alter segment granularity
● Specify a partition spec
● Use automatic compaction
26. Controlling Segment Size
● Segment granularity: increase if there is only 1 file per segment and it is < 200 MB
"segmentGranularity": "HOUR"
● Max rows per segment: increase if a single segment is < 200 MB
"maxRowsPerSegment": 5000000
27. Compaction
● Combines small segments into larger segments
● Useful for late-arriving data
● Task submitted to the Overlord
{
  "type": "compact",
  "dataSource": "wikipedia",
  "interval": "2017-01-01/2018-01-01"
}
29. Rollup
Raw data:
timestamp | page | city | added | deleted
2011-01-01T00:01:35Z | Justin Bieber | SF | 10 | 5
2011-01-01T00:03:45Z | Justin Bieber | SF | 25 | 37
2011-01-01T00:05:62Z | Justin Bieber | SF | 15 | 19
2011-01-01T00:06:33Z | Ke$ha | LA | 30 | 45
2011-01-01T00:08:51Z | Ke$ha | LA | 16 | 8
2011-01-01T00:09:17Z | Miley Cyrus | DC | 75 | 10
2011-01-01T00:11:25Z | Miley Cyrus | DC | 11 | 25
2011-01-01T00:23:30Z | Miley Cyrus | DC | 22 | 12
2011-01-01T00:49:33Z | Miley Cyrus | DC | 90 | 41
After rollup (aggregated to the hour):
timestamp | page | city | count | sum_added | sum_deleted
2011-01-01T00:00:00Z | Justin Bieber | SF | 3 | 50 | 61
2011-01-01T00:00:00Z | Ke$ha | LA | 2 | 46 | 53
2011-01-01T00:00:00Z | Miley Cyrus | DC | 4 | 198 | 88
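The rollup shown above corresponds to a granularitySpec with rollup enabled plus a metricsSpec defining the aggregations; a fragment for illustration (exact placement depends on the ingestion method):
"granularitySpec": {
  "queryGranularity": "HOUR",
  "rollup": true
},
"metricsSpec": [
  { "type": "count", "name": "count" },
  { "type": "longSum", "name": "sum_added", "fieldName": "added" },
  { "type": "longSum", "name": "sum_deleted", "fieldName": "deleted" }
]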
30. Roll-up vs no roll-up
Do roll-up
● Working under space constraints.
● No need to retain high-cardinality dimensions (like user ID or precise location information).
● Maximize price/performance.
Don't roll-up
● Need the ability to retrieve individual events.
● May need to group or filter on any column.
31. Partitioning beyond time
● Druid always partitions by time
● Beyond that, partition by a dimension you often filter on
● Improves locality, compression, storage size, and query performance
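For example, secondary partitioning on a frequently filtered dimension can be expressed in the tuningConfig; a sketch using single-dimension (range) partitioning, with the dimension name and target size illustrative and availability depending on Druid version:
"tuningConfig": {
  "type": "index_parallel",
  "partitionsSpec": {
    "type": "single_dim",
    "partitionDimension": "city",
    "targetRowsPerSegment": 5000000
  }
}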
32. Modeling data for fast search
Exact match or prefix filtering:
○ Uses binary search
○ Only the dictionary + index section of the dimension is needed
○ Example: if you frequently search SSNs by their last 4 digits, store them reversed (123-45-6789 becomes 9876-54-321) so the search becomes a prefix match
select count(*) from wikiticker where "comment" like 'A%'
select count(*) from wikiticker where "comment" like '%A%'
The first query is a prefix filter and can binary-search the sorted dictionary; the second is an infix filter and must check every dictionary entry.
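As a sketch of the reversed-SSN trick, assuming a hypothetical ssn_reversed column populated at ingestion time:
-- Find rows whose SSN ends in 6789; '9876-' is the reversed suffix,
-- so this is a fast prefix filter instead of a slow infix scan
select count(*) from wikiticker where "ssn_reversed" like '9876-%'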
33. Approx Algorithms
● Data sketches are lossy data structures
● They trade off accuracy for reduced storage and improved performance
● Summarize data at ingestion time using sketches
● Improves rollup and reduces the memory footprint
34. Summarize with data sketches
Raw data:
timestamp | page | userid | city | added | deleted
2011-01-01T00:01:35Z | Justin Bieber | user11 | SF | 10 | 5
2011-01-01T00:03:45Z | Justin Bieber | user22 | SF | 25 | 37
2011-01-01T00:05:62Z | Justin Bieber | user11 | SF | 15 | 19
2011-01-01T00:06:33Z | Ke$ha | user33 | LA | 30 | 45
2011-01-01T00:08:51Z | Ke$ha | user33 | LA | 16 | 8
2011-01-01T00:09:17Z | Miley Cyrus | user11 | DC | 75 | 10
2011-01-01T00:11:25Z | Miley Cyrus | user44 | DC | 11 | 25
2011-01-01T00:23:30Z | Miley Cyrus | user44 | DC | 22 | 12
2011-01-01T00:49:33Z | Miley Cyrus | user55 | DC | 90 | 41
After rollup with a userid sketch:
timestamp | page | city | count | sum_added | sum_deleted | userid_sketch
2011-01-01T00:00:00Z | Justin Bieber | SF | 3 | 50 | 61 | sketch_obj
2011-01-01T00:00:00Z | Ke$ha | LA | 2 | 46 | 53 | sketch_obj
2011-01-01T00:00:00Z | Miley Cyrus | DC | 4 | 198 | 88 | sketch_obj
35. When close enough is good enough
Approximate queries can provide up to 99% accuracy while greatly improving performance:
● Bloom filters
○ Self joins
● Theta sketches
○ Union / intersection / difference
● HLL sketches
○ Count distinct
● Quantile sketches
○ Median, percentiles
36. When close enough is good enough
● Hashes can be calculated at ingestion time or at query time
○ Pre-computed hashes can save up to 50% of query time
● The k value determines precision and performance
● Default values will count within 5% accuracy 99% of the time
● HLL and Theta sketches can both provide COUNT DISTINCT support, but HLL will do it faster and more accurately with a smaller data footprint
● Theta sketches are more flexible, but require more storage
38. Use Druid SQL
● Easier to learn and more familiar
● Will attempt to make intelligent query-type choices (timeseries vs. topN vs. groupBy)
● There are some limitations: for example, multi-value dimensions and certain aggregations are not fully supported
40. Explain Plan
EXPLAIN PLAN FOR
SELECT channel, SUM(added)
FROM wikipedia
WHERE commentLength >= 50
GROUP BY channel
ORDER BY SUM(added) DESC
LIMIT 3
42. Pick your query carefully
● TimeBoundary: returns the min/max timestamp for a given interval
● Timeseries: when you don't want to group by dimension
● TopN: when you want to group by a single dimension
○ Approximate if > 1000 dimension values
● GroupBy: least performant, most flexible
● Scan: for returning streaming raw data
○ Perfect ordering is not preserved
● Select: for returning paginated raw data
● Search: returns dimensions that match a text search
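A minimal native topN query for illustration; the datasource and column names follow the earlier examples, and the interval is arbitrary:
{
  "queryType": "topN",
  "dataSource": "wikipedia",
  "intervals": ["2011-01-01/2011-01-02"],
  "granularity": "all",
  "dimension": "city",
  "metric": "sum_added",
  "threshold": 3,
  "aggregations": [
    { "type": "longSum", "name": "sum_added", "fieldName": "added" }
  ]
}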
43. Datasources
● Table
○ Most basic; queries a single Druid table
● Union
○ Equivalent of UNION ALL
○ Returns the combined results of multiple queries
● Query
○ Equivalent of a sub-query
○ Results of the inner query are used as the datasource for the outer query
○ Increases load on the Broker
44. Filters
● Interval
○ Matches a time range; can be used on __time or any column with a millisecond timestamp
● Selector
○ Matches a single dimension to a value
● Column Comparison
○ Compares two columns, e.g. ColA == ColB
● Search
○ Filters on partial string matches
● In
○ Matches against a list of values
45. Filters (cont)
● Like Filter
○ Equivalent of SQL LIKE
○ Can perform better than the Search filter for prefix-only searching
A note about extraction functions:
Most filters also support extraction functions. Their performance varies greatly; where possible, applying transformations at ingestion time instead of query time improves performance.
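Filters compose; for example, an AND of a selector and an IN filter, sketched with columns from the earlier examples:
"filter": {
  "type": "and",
  "fields": [
    { "type": "selector", "dimension": "city", "value": "SF" },
    { "type": "in", "dimension": "page", "values": ["Justin Bieber", "Ke$ha"] }
  ]
}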
46. Context
● A queryId can be specified; prefixes are very useful for debugging in Clarity
● Can control timeout, cache usage, and other fine-tuning parameters
● minTopNThreshold
○ Default 1000; specifies the minimum number of records to return when merging topN results. Can be increased to improve precision
● skipEmptyBuckets
○ Stops zero-filling of timeseries queries
https://druid.apache.org/docs/latest/querying/query-context.html
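Context parameters are passed as a top-level context object on the query; a fragment with illustrative values:
"context": {
  "queryId": "dashboard-topn-42",
  "timeout": 60000,
  "useCache": true,
  "skipEmptyBuckets": true
}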
47. Using Lookups
● Lookups are key/value pairs stored on every node
○ Stored in memory
○ Alpha: stored on disk in PalDB format
● Lookups are loaded via the Coordinator API
● Can be queried with either JSON or SQL queries
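In SQL, a lookup is applied with the LOOKUP function; the lookup name and column below are illustrative:
SELECT LOOKUP("city", 'city_to_region') AS region, COUNT(*)
FROM wikiticker
GROUP BY LOOKUP("city", 'city_to_region')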
48. Virtual Columns and Expressions
● Used to manipulate columns at query time, including for lookups
{
  "type": "expression",
  "name": "outputRowName",
  "expression": "replace(inputColumn, 'foo', 'bar')",
  "outputType": "STRING"
}
49. Other Approximate SQL functions
● APPROX_COUNT_DISTINCT
○ Uses HyperUnique; can be a dimension or a cardinality column
● APPROX_COUNT_DISTINCT_DS_HLL
○ Same as above, but implemented with DataSketches (DS)
○ More ability to tune precision
● APPROX_QUANTILE
○ Calculates quantiles using the ApproxHistogram algorithm
● APPROX_QUANTILE_DS
○ Calculates quantiles using DataSketches
● APPROX_QUANTILE_FIXED_BUCKETS
○ Calculates a fixed-bucket histogram
○ Faster calculation than the other APPROX_QUANTILE functions
More details: https://druid.apache.org/docs/latest/querying/sql.html
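For example, combining the DataSketches variants in one query (requires the druid-datasketches extension; column names follow the sketch example earlier, and precision parameters are left at their defaults):
SELECT
  APPROX_COUNT_DISTINCT_DS_HLL("userid") AS unique_users,
  APPROX_QUANTILE_DS("added", 0.5) AS median_added
FROM wikipedia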