5. A series of float values following the time axis
(t1, x1), (t2, x2), …, (tn, xn) (where t is the
timestamp and x is the value at that moment)
6. What we need
- A store for time series data that is
- Extremely fast to write (hundreds of millions of
data points per minute)
- Very fast to read (a few thousand queries per
second)
- Efficient in space usage (memory/disk)
7. What we can trade off
- Data consistency (data could be duplicated)
- Immutable data (once written, it cannot be
changed)
8. We need a store for
data that looks like
message Sample {
double value = 1;
sint64 timestamp = 2;
}
// Serie is a collection of sample data with same serie_id
message Serie {
uint64 id = 1;
repeated Sample samples = 2;
repeated Label labels = 3;
}
// Series is a collection of series
message Series {
repeated Serie series = 1;
uint32 total_samples = 2;
}
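As a sketch of how these messages might live in memory, here is a toy Go store using plain structs that stand in for the generated protobuf types (the `MemStore` type and its methods are illustrative, not from the talk):

```go
package main

import "fmt"

// Sample mirrors the protobuf message: one value at one timestamp.
type Sample struct {
	Value     float64
	Timestamp int64
}

// Serie mirrors the protobuf Serie: samples sharing one series ID.
// Labels are simplified to a string map here.
type Serie struct {
	ID      uint64
	Samples []Sample
	Labels  map[string]string
}

// MemStore is a toy in-memory store keyed by series ID.
type MemStore struct {
	series map[uint64]*Serie
}

func NewMemStore() *MemStore {
	return &MemStore{series: make(map[uint64]*Serie)}
}

// Append adds a sample to the series with the given ID,
// creating the series on first write.
func (m *MemStore) Append(id uint64, ts int64, v float64) {
	s, ok := m.series[id]
	if !ok {
		s = &Serie{ID: id, Labels: map[string]string{}}
		m.series[id] = s
	}
	s.Samples = append(s.Samples, Sample{Value: v, Timestamp: ts})
}

func main() {
	st := NewMemStore()
	st.Append(1, 1000, 0.5)
	st.Append(1, 1001, 0.7)
	fmt.Println(len(st.series[1].Samples)) // 2
}
```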
10. We tried many options, but
- None of the available solutions fit: performance
problems (ClickHouse) or too expensive (InfluxDB)
- Or some fit, but are poorly maintained
(facebook/beringei, netflix/atlas)
- Or look promising, but are poorly documented and
very unstable (uber/m3)
11. We decided to
- Build our own in-memory TSDB
- For fast reads/writes
- But trade off with low retention (1 day instead
of months or years)
- But …
- We’re not database experts
13. Solutions
- Reuse as much as possible of what others have
already done well
- TANSTAAFL — “there ain’t no such thing as
a free lunch”
14. TSDB anatomy
// Series is a collection of series
message Series {
repeated Serie series = 1;
uint32 total_samples = 2;
}
- How to store Series efficiently
- Especially in space (because we’re using RAM)
15. Prometheus/tsdb package
- Which provides us with
- An implementation that stores series as chunks
- And compresses them super efficiently with a lossless
delta-of-delta encoding algorithm
(bstream.go) (the original idea is from Beringei)
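The delta-of-delta idea can be sketched in plain Go. This simplified version varint-encodes the change between successive timestamp deltas instead of bit-packing the way the real bstream.go does, but it shows why regular scrape intervals compress so well (the function name and encoding details are illustrative, not the actual Prometheus code):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeTimestamps delta-of-delta encodes a sorted timestamp slice:
// store the first timestamp and the first delta, then only the change
// in delta for each later point. Regular intervals collapse to zeros,
// which a signed varint encodes in a single byte each.
func encodeTimestamps(ts []int64) []byte {
	buf := make([]byte, 0, len(ts))
	var tmp [binary.MaxVarintLen64]byte
	put := func(v int64) {
		n := binary.PutVarint(tmp[:], v)
		buf = append(buf, tmp[:n]...)
	}
	if len(ts) == 0 {
		return buf
	}
	put(ts[0]) // first timestamp, stored as-is
	if len(ts) == 1 {
		return buf
	}
	prevDelta := ts[1] - ts[0]
	put(prevDelta) // first delta
	for i := 2; i < len(ts); i++ {
		delta := ts[i] - ts[i-1]
		put(delta - prevDelta) // delta-of-delta, usually 0
		prevDelta = delta
	}
	return buf
}

func main() {
	// 1000 points, one per second: 8000 bytes of raw int64s
	// collapse to roughly 1 KB with this byte-level scheme
	// (the real bit-level encoding does even better).
	ts := make([]int64, 1000)
	for i := range ts {
		ts[i] = 1600000000 + int64(i)
	}
	fmt.Println(len(encodeTimestamps(ts)))
}
```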
16. We need better
compression
- Saving a single byte per data point saves dozens of
gigabytes of RAM
- Further compress (freeze) old data (infrequently read)
- Lossless compression
- Brotli
- Zstd
18. We need data replication
- We cannot simply lose data on restart;
replication solves that problem
- Replication in a distributed environment is hard
20. What do we need for data
replication?
- Leader election
21. We know where to find the
best-quality distributed-system
packages
- github.com/hashicorp
22. hashicorp/raft package
- A Golang implementation of the Raft consensus
protocol (https://raft.github.io/)
- Provides us with
- Leader election
- Log replication
- EVERY communication between nodes is
stored as a replicated log (like event sourcing)
- You need to provide your own replicated log
implementation
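A minimal sketch of what “provide your own implementation” means: hashicorp/raft hands each committed log entry to a state machine you supply. The types below only mirror the shape of raft.Log and the Apply method of raft.FSM (the real interface also requires Snapshot and Restore, omitted here), and `tsdbFSM` is a toy stand-in:

```go
package main

import "fmt"

// Log mirrors the shape of hashicorp/raft's raft.Log: an entry the
// leader replicates to every follower before it is applied.
type Log struct {
	Index uint64
	Term  uint64
	Data  []byte // e.g. a serialized Series protobuf message
}

// FSM mirrors part of the interface hashicorp/raft asks you to
// implement: the library replicates entries, you decide what applying
// one means for your state machine. The real raft.FSM also requires
// Snapshot and Restore, omitted in this sketch.
type FSM interface {
	Apply(l *Log) interface{}
}

// tsdbFSM is a toy state machine that counts applied writes.
type tsdbFSM struct {
	applied int
}

func (f *tsdbFSM) Apply(l *Log) interface{} {
	// A real implementation would decode l.Data into a Series
	// message and append its samples to the in-memory store.
	f.applied++
	return nil
}

func main() {
	var fsm FSM = &tsdbFSM{}
	for i := uint64(1); i <= 3; i++ {
		fsm.Apply(&Log{Index: i, Term: 1, Data: []byte("write")})
	}
	fmt.Println(fsm.(*tsdbFSM).applied) // 3
}
```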
27. Thanks to tons of
awesome Golang OSS
- Our storage now serves on average 1M samples
written per second without any problems
- And can store a few billion samples on a single
machine
28. Building your own
database is hard, but not
impossible
- We feel it’s like playing with Lego, where the
building blocks are awesome Golang OSS packages
- Maybe that’s the reason why many awesome
databases are written in Golang
- https://github.com/gostor/awesome-go-storage
29. We still have tons of things
to share, so stay tuned!