Time series in MongoDB - Mydbops Mywebinar Edition 24. - Explore the fascinating world of time series data management in MongoDB with our insightful webinar presentation. Join us as we dive into the intricacies of leveraging MongoDB for time series use cases, discussing best practices, performance optimization techniques, and real-world examples. Discover how MongoDB can empower your applications to efficiently handle time-based data and unlock valuable insights. Don't miss out on this opportunity to enhance your knowledge and stay ahead in the evolving field of data management. Dive into our speaker deck presentation now!
Watch the webinar recording here: https://youtu.be/rwjHRLGZ7pg
Mydbops Blogs: https://www.mydbops.com/blog/
7. What and Why Time Series Collection
❖ MongoDB Time Series Collection is a specialized feature introduced in MongoDB 5.0
❖ Time Series Collection in MongoDB: A specialized collection designed for efficient
storage, management, and analysis of time series data.
9. Benefits Of Time Series
Performance
Data Management
Space Management
10. Time Series Components
★ Time is when the data point was recorded
★ Metadata, also known as a source, is a label or tag
that uniquely identifies a time series and typically
remains unchanged throughout its lifespan.
★ Measurement is a data point tracked at
increments in time
[Diagram: the three components of a time series: TimeField, MetaData, Measurement]
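As a rough illustration of the three components, the sketch below splits one measurement document into its time, metadata, and measurement parts. It is plain JavaScript rather than a MongoDB API: the helper name splitComponents is hypothetical, and the field names ts and metadata are borrowed from the windsensors examples in this deck.

```javascript
// Hypothetical helper (not a MongoDB API): separates a document into the
// three time series components, given the timeField and metaField names.
function splitComponents(doc, timeField, metaField) {
  const measurements = {};
  for (const [key, value] of Object.entries(doc)) {
    // Everything that is neither the time nor the metadata is a measurement.
    if (key !== timeField && key !== metaField) measurements[key] = value;
  }
  return { time: doc[timeField], metadata: doc[metaField], measurements };
}

const reading = {
  metadata: { place: "Hyderabad", sensorId: 52396 },
  ts: new Date("2023-06-20T10:00:02Z"),
  value: 18,
};

const parts = splitComponents(reading, "ts", "metadata");
console.log(parts);
```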
11. Creation Of Time Series Collection
Use the following command to create a time series collection:
test:PRIMARY> db.createCollection("windsensors", { timeseries: { timeField: "ts", metaField: "metadata", granularity: "seconds" } })
{
"ok" : 1,
"$clusterTime" : {
"clusterTime" : Timestamp(1682983705, 3),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
},
"operationTime" : Timestamp(1682983705, 3)
}
12. Inserting Of Documents
Inserting documents into the windsensors collection works the same way as for a normal collection.
db.windsensors.insertMany([
{"metadata":{"place":"Hyderabad", "sensorId":52396},"ts":ISODate("2023-06-20T10:00:02Z"),
"value" :18},
{"metadata":{"place":"Hyderabad", "sensorId":52396},"ts":ISODate("2023-06-20T10:02:03Z"),
"value":50}
])
13. Behavior Of Time Series : Granularity for Time Series Data
❖ "granularity" refers to the level of detail or precision at which time series data is stored and queried in a time
series collection. It determines the duration of each bucket or interval within the collection.
❖ Possible values for granularity are "seconds", "minutes", and "hours".
❖ It determines the maximum time span that a single bucket can cover.
Granularity          Covered Time Span
seconds (default)    one hour
minutes              24 hours
hours                30 days
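To make the time spans listed above concrete, here is a minimal sketch that checks whether two measurements could share a bucket. It is a simplified model and an assumption of this example, not the server's bucketing logic: real bucket boundaries also depend on the metadata and on internal limits. The function name fitsInOneBucket is hypothetical.

```javascript
// Maximum bucket span per granularity, in seconds, as listed on this slide.
const MAX_BUCKET_SPAN_SECONDS = {
  seconds: 60 * 60,           // one hour
  minutes: 24 * 60 * 60,      // 24 hours
  hours: 30 * 24 * 60 * 60,   // 30 days
};

// Simplified model (an assumption): two measurements of the same series can
// share a bucket only if their timestamps are no further apart than the span.
function fitsInOneBucket(first, second, granularity = "seconds") {
  const span = MAX_BUCKET_SPAN_SECONDS[granularity];
  if (span === undefined) throw new Error(`unknown granularity: ${granularity}`);
  const gapSeconds = Math.abs(second.getTime() - first.getTime()) / 1000;
  return gapSeconds <= span;
}

const morning = new Date("2023-06-20T10:00:02Z");
const afternoon = new Date("2023-06-20T12:30:00Z"); // about 2.5 hours later

console.log(fitsInOneBucket(morning, afternoon, "seconds"));
console.log(fitsInOneBucket(morning, afternoon, "minutes"));
```

Under this model, readings that arrive hours apart are better served by "minutes" or "hours" than by the default "seconds".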
15. Change the granularity of a Time Series Collection
To change the granularity parameter value, issue the following collMod command.
Once the granularity is set, it can only be increased, and only by one level at a time: from "seconds" to "minutes", or from "minutes" to "hours".
db.runCommand({collMod: "windsensors",timeseries: { granularity: "minutes" }})
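The one-level-at-a-time rule can be checked before the command is sent. The sketch below is a hypothetical client-side guard (buildGranularityChange is not a MongoDB function); it only builds the command document shown above and leaves running it to db.runCommand().

```javascript
// Granularity levels in increasing order.
const GRANULARITY_LEVELS = ["seconds", "minutes", "hours"];

// Builds the collMod command document, or throws if the requested change is
// not exactly one step up (the rule stated on this slide).
function buildGranularityChange(collection, current, target) {
  const from = GRANULARITY_LEVELS.indexOf(current);
  const to = GRANULARITY_LEVELS.indexOf(target);
  if (from === -1 || to === -1) throw new Error("unknown granularity");
  if (to !== from + 1) {
    throw new Error(`cannot change granularity from ${current} to ${target}`);
  }
  return { collMod: collection, timeseries: { granularity: target } };
}

const command = buildGranularityChange("windsensors", "seconds", "minutes");
console.log(JSON.stringify(command));
```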
16. Behavior Of Time Series: Materialized Views on Top of Time Series Data
MongoDB treats time series collections as writable
non-materialized views, automatically organizing the
inserted data into an optimized storage format.
We can also create a materialized view. Materialized views
on time series data are useful for:
❖ Archiving
❖ Analytics
❖ Facilitating data access for teams that cannot access
the raw data
17. Behavior Of Time Series: Materialized Views on Top of Time Series Data
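Use the following command to create an on-demand materialized view.
The $match stage filters the data so that only documents with a value greater than one are processed. The $merge stage writes the results into the windsensors_view collection, replacing documents that already exist there.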
db.windsensors.aggregate( [
{ $match: {"value":{$gt:1}}},
{ $merge: { into: "windsensors_view", whenMatched: "replace" } }
] )
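What the pipeline does on each run can be modelled in a few lines of plain JavaScript. This is an in-memory illustration only, not MongoDB code: a Map keyed by _id stands in for the windsensors_view collection, and it relies on $merge matching on _id and inserting unmatched documents, which are its defaults. The name refreshView is hypothetical.

```javascript
// In-memory model of the $match + $merge pipeline (illustration only).
// view is a Map keyed by _id, standing in for the windsensors_view collection.
function refreshView(sourceDocs, view) {
  for (const doc of sourceDocs) {
    if (doc.value > 1) {         // $match: { value: { $gt: 1 } }
      view.set(doc._id, doc);    // $merge: replace on match, insert otherwise
    }
  }
  return view;
}

// The view already holds a stale copy of document 1.
const view = new Map([[1, { _id: 1, value: 5 }]]);
const source = [
  { _id: 1, value: 18 },  // matches: replaces the stale copy
  { _id: 2, value: 0 },   // filtered out by the match condition
  { _id: 3, value: 50 },  // matches: inserted as a new document
];

refreshView(source, view);
console.log([...view.values()]);
```

Note that documents already in the view are never removed by a refresh, which is why the view is "on-demand": it reflects the source only as of the last run.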
18. Behavior Of Time Series : Clustered Index
❖ Clustered collections are collections with a clustered
index.
❖ Clustered collections store documents ordered by
clustered index key value.
❖ Clustered indexes work best for equality or range
queries on the clustered index key.
19. Clustered Collection & Index Creation
The following example creates a clustered collection. The command specifies:
"key": { _id: 1 }, which sets the clustered index key to the _id field.
"unique": true, which indicates the clustered index key value must be unique.
"name": "windsensors clustered key", which sets the clustered index name.
db.runCommand( { create: "windsensors",
clusteredIndex: { "key": { _id: 1 }, "unique": true, "name": "windsensors clustered key" }
} )
20. Limitations Of Clustered Index
● You cannot transform a non-clustered collection to a clustered collection, or the reverse.
● You cannot hide a clustered index.
● Clustered collections may not be capped collections.
21. Behavior Of Time Series : Sharding
Shard a Time Series Collection
To create a sharded time series collection, use the sh.shardCollection() method with the timeseries option.
Limitation
➔ You can't reshard a sharded time series collection. However, you can refine its shard key.
sh.shardCollection("sample.windsensors",{ "metadata.sensorId": 1 },{timeseries:
{timeField: "ts",metaField: "metadata",granularity: "seconds"}})
22. Shard an Existing Time Series Collection
To shard the collection, run the following command.
In this example, sh.shardCollection():
● Specifies the metadata.sensorId field as the shard key. sensorId is a sub-field of the collection's metaField.
● When the collection you specify to sh.shardCollection() is a time series collection, you do not need to specify
the timeseries option.
sh.shardCollection( "sample.windsensors_original", { "metadata.sensorId": 1 } )
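A shard key for a time series collection can be sanity-checked on the client before calling sh.shardCollection(). The sketch below assumes the rule, as the MongoDB sharding documentation describes it, that only the metaField, its sub-fields, and the timeField may appear in the shard key, with the timeField last if present. Treat that rule as an assumption to verify against the documentation for your server version; checkShardKey is a hypothetical helper.

```javascript
// Assumed rule: only the metaField, its sub-fields, and the timeField may
// appear in the shard key, and the timeField, if present, must come last.
function checkShardKey(shardKey, timeField, metaField) {
  const fields = Object.keys(shardKey);
  if (fields.length === 0) return false;
  return fields.every((field, index) => {
    if (field === timeField) return index === fields.length - 1;
    return field === metaField || field.startsWith(metaField + ".");
  });
}

console.log(checkShardKey({ "metadata.sensorId": 1 }, "ts", "metadata"));
console.log(checkShardKey({ value: 1 }, "ts", "metadata"));
```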
24. Managing Time Series : Indexes in Time Series Collection
❖ The internal index provides several performance
benefits including improved query efficiency and
reduced disk usage.
❖ When you create a time series collection,
MongoDB automatically creates an internal
clustered index on the time field.
❖ The internal index for a time series collection is
not displayed by listIndexes.
25. Managing Time Series :Secondary Indexes to Time Series Collections
❖ Secondary indexes provide additional flexibility in querying and retrieving data
based on specific fields or criteria. They improve query performance and enable
efficient data filtering and sorting.
❖ In MongoDB versions earlier than 6.0, only the metaField and timeField can have
secondary indexes.
❖ Starting from MongoDB 6.0, you can create a secondary index on
any field or subfield.
26. Managing Time Series : expireAfterSeconds In Time Series
❖ To delete data after a specified period, use the expireAfterSeconds field to automatically delete documents
once they reach the expiration duration.
❖ The expiration threshold is the timeField field value plus the specified number of seconds.
db.createCollection(
  "weather",
  {
    timeseries: {
      timeField: "timestamp",
      metaField: "metadata",
      granularity: "hours"
    },
    expireAfterSeconds: 86400
  }
)
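The expiration threshold described above is simple date arithmetic. A minimal sketch in plain JavaScript follows; the helper name expirationThreshold is hypothetical and the sample timestamp is made up for illustration.

```javascript
// Expiration threshold = timeField value + expireAfterSeconds.
function expirationThreshold(timeFieldValue, expireAfterSeconds) {
  return new Date(timeFieldValue.getTime() + expireAfterSeconds * 1000);
}

// A measurement recorded at 10:00:02 UTC with expireAfterSeconds: 86400
// becomes eligible for deletion 24 hours later.
const recordedAt = new Date("2023-06-20T10:00:02Z");
const expiresAt = expirationThreshold(recordedAt, 86400);
console.log(expiresAt.toISOString()); // 2023-06-21T10:00:02.000Z
```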
27. Timing of Delete Operations
❖ The deletion worker runs every 60 seconds and removes expired
buckets. The maximum bucket duration is based on the
collection's granularity.
❖ Expired buckets are removed in the next run of the background task
once all documents within them have expired.
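Because removal happens per bucket, the newest measurement in a bucket decides when the whole bucket can go. The sketch below models that statement only; it is not the server's implementation, and the function names are hypothetical.

```javascript
// A bucket becomes removable only once its newest measurement has expired.
function bucketRemovableAt(timestamps, expireAfterSeconds) {
  const newest = Math.max(...timestamps.map((t) => t.getTime()));
  return new Date(newest + expireAfterSeconds * 1000);
}

// True if a deletion run happening at `now` could remove the bucket.
function isBucketExpired(timestamps, expireAfterSeconds, now) {
  return now.getTime() >= bucketRemovableAt(timestamps, expireAfterSeconds).getTime();
}

const bucket = [
  new Date("2023-06-20T10:00:02Z"),
  new Date("2023-06-20T10:45:00Z"), // newest measurement in the bucket
];

// The oldest document expired at 10:00:02 the next day, but the bucket stays
// until the newest one expires at 10:45:00.
console.log(isBucketExpired(bucket, 86400, new Date("2023-06-21T10:30:00Z")));
console.log(isBucketExpired(bucket, 86400, new Date("2023-06-21T10:45:00Z")));
```

This is why documents can outlive their own expiration threshold: up to the bucket's time span, plus the wait for the next 60-second run.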
28. Enable Automatic Removal on an Existing Collection
To enable automatic removal of documents for an existing time series collection, issue the following collMod
command.
To retrieve the current value of expireAfterSeconds, use the listCollections command.
db.runCommand({collMod: "windsensors",expireAfterSeconds:86400})
db.runCommand( { listCollections: 1 } )
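The listCollections result contains one entry per collection, so the value still has to be picked out. The sketch below does that in plain JavaScript. The sample result is hand-written and abbreviated for illustration (a real response carries more fields), and findExpireAfterSeconds is a hypothetical helper.

```javascript
// Returns expireAfterSeconds for the named collection, or undefined if the
// collection is missing or has no expiration configured.
function findExpireAfterSeconds(listCollectionsResult, collectionName) {
  const entry = listCollectionsResult.cursor.firstBatch.find(
    (collection) => collection.name === collectionName
  );
  if (!entry || !entry.options) return undefined;
  return entry.options.expireAfterSeconds;
}

// Hand-written, abbreviated sample of a listCollections result.
const sampleResult = {
  cursor: {
    firstBatch: [
      {
        name: "windsensors",
        type: "timeseries",
        options: {
          timeseries: { timeField: "ts", metaField: "metadata", granularity: "seconds" },
          expireAfterSeconds: 86400,
        },
      },
    ],
  },
  ok: 1,
};

console.log(findExpireAfterSeconds(sampleResult, "windsensors"));
```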
29. Change the expireAfterSeconds Parameter
To change the expireAfterSeconds parameter value, issue the following collMod
command.
Here we change expireAfterSeconds from 86400 to 172800.
db.runCommand({collMod: "windsensors",expireAfterSeconds: 172800})
30. Disable Automatic Removal
To disable automatic removal, use the collMod command to set expireAfterSeconds to "off".
db.runCommand({collMod: "windsensors",expireAfterSeconds: "off"})
31. Managing Time Series : Migrate Data into a Time Series Collection
To migrate data from an existing collection into a
time series collection:
❖ Create a New Time Series Collection
❖ Dump the Data from the Existing Collection
❖ Migrate Data into a Time Series Collection
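If the existing documents do not already have the time series shape, they need reshaping somewhere between the dump and the load. The sketch below shows one way that step might look. The legacy layout (flat sensorId, place, and an ISO-string recordedAt field) is entirely hypothetical, as is the helper name; the target shape follows the windsensors examples in this deck.

```javascript
// Hypothetical legacy shape: { sensorId, place, recordedAt (ISO string), value }.
// Target shape: metadata sub-document, ts as a Date, measurement fields on top.
function toTimeSeriesDoc(legacy) {
  return {
    metadata: { place: legacy.place, sensorId: legacy.sensorId },
    ts: new Date(legacy.recordedAt),
    value: legacy.value,
  };
}

const legacyDocs = [
  { sensorId: 52396, place: "Hyderabad", recordedAt: "2023-06-20T10:00:02Z", value: 18 },
  { sensorId: 52396, place: "Hyderabad", recordedAt: "2023-06-20T10:02:03Z", value: 50 },
];

const migrated = legacyDocs.map(toTimeSeriesDoc);
console.log(migrated);
```

The resulting array could then be passed to insertMany() on the new time series collection.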
33. Time Series Compression
❖ Time series collections use zstd compression for efficient storage and retrieval of data.
❖ Starting in MongoDB 5.2, time series collections use column compression.
❖ Columnar compression is a technique used to compress data in a column-wise manner, rather than row-wise.
34. Columnar Compression
❖ Columnar compression works by storing data values from each
column together, rather than storing entire documents or rows.
❖ MongoDB supports columnar compression through the use of
the WiredTiger storage engine.
❖ WiredTiger's "prefix compression" is a separate technique: it
compresses index keys, not the columns of time series data.
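The column-wise idea can be shown with a toy example. The sketch below pivots documents into per-field arrays and then delta-encodes each numeric column. It illustrates the general principle only and is not MongoDB's on-disk format; both function names are hypothetical, and it assumes every document has the same fields.

```javascript
// Pivot row-style documents into one array per field (assumes all documents
// share the same fields, otherwise the columns would not line up).
function toColumns(docs) {
  const columns = {};
  for (const doc of docs) {
    for (const [field, value] of Object.entries(doc)) {
      if (!columns[field]) columns[field] = [];
      columns[field].push(value);
    }
  }
  return columns;
}

// Keep the first value, then store only the difference to the previous one.
function deltaEncode(values) {
  return values.map((value, i) => (i === 0 ? value : value - values[i - 1]));
}

const rows = [
  { t: 100, v: 18 },
  { t: 160, v: 18 },
  { t: 220, v: 19 },
];

const columns = toColumns(rows);
console.log(deltaEncode(columns.t)); // [ 100, 60, 60 ]
console.log(deltaEncode(columns.v)); // [ 18, 0, 1 ]
```

Regularly spaced timestamps and slowly changing values turn into small, repetitive numbers, which is what makes a column compress well.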
36. Best Practices for Time Series Collections:Optimize Inserts
To optimize insert performance for time series collections, perform the following actions.
❖ Batch documents so that each batch contains multiple measurements per series.
db.windsensors.insertMany([
{"metadata":{"place":"Hyderabad", "sensorId":52396},"ts":ISODate("2021-07-10T00:00:02Z"), "value"
:18.263742590570686},
{"metadata":{"place":"Chennai", "sensorId":31096},"ts":ISODate("2021-07-10T00:00:03Z"),
"value":32.53987084180961},
{"metadata":{"place":"Hyderabad", "sensorId":52396},"ts":ISODate("2021-07-10T00:00:03Z"),
"value":18.106480571706808},
{"metadata":{"place":"Chennai", "sensorId":31096},"ts":ISODate("2021-07-10T00:00:04Z"),
"value":0.6909954039798452}
])
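One way to prepare such a batch is to order it so that measurements from the same series sit next to each other before calling insertMany(). The sketch below is a hypothetical client-side helper, not a MongoDB API. It identifies a series by the JSON form of its metadata, which assumes the metadata fields always appear in the same order.

```javascript
// Reorders a batch so that documents with the same metadata are adjacent.
// Series keep their first-seen order; documents keep their order in a series.
function orderBatchBySeries(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const seriesKey = JSON.stringify(doc.metadata); // assumes stable field order
    if (!groups.has(seriesKey)) groups.set(seriesKey, []);
    groups.get(seriesKey).push(doc);
  }
  return [...groups.values()].flat();
}

const batch = [
  { metadata: { place: "Hyderabad", sensorId: 52396 }, ts: new Date("2021-07-10T00:00:02Z"), value: 18.26 },
  { metadata: { place: "Chennai", sensorId: 31096 }, ts: new Date("2021-07-10T00:00:03Z"), value: 32.54 },
  { metadata: { place: "Hyderabad", sensorId: 52396 }, ts: new Date("2021-07-10T00:00:03Z"), value: 18.11 },
  { metadata: { place: "Chennai", sensorId: 31096 }, ts: new Date("2021-07-10T00:00:04Z"), value: 0.69 },
];

const ordered = orderBatchBySeries(batch);
console.log(ordered.map((doc) => doc.metadata.sensorId));
```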
37. Best Practices for Time Series Collections:Optimize Compression
To optimize data compression for time series collections, perform the following actions.
➔ Omit Fields Containing Empty Objects and Arrays from Documents.
➔ Round numeric data to the precision required for your application. Rounding numeric data to fewer decimal
places improves the compression ratio.
38. Best Practices for Time Series Collections:Optimize Compression
For example, consider the following documents.
The alternation between value fields holding sensor values and an empty array results in a schema change for the
compressor. The schema change causes the second and third documents in the sequence to remain uncompressed.
db.windsensors.insertMany([
{"metadata":{"place":"Hyderabad","sensorId":52396},"ts":ISODate("2021-07-10T00:00:02Z"),
"value": [18,20]},
{"metadata":{"place":"Chennai","sensorId":31096},"ts":ISODate("2021-07-10T00:00:03Z"),
"value": []},
{"metadata":{"place":"Hyderabad","sensorId":52396},"ts":ISODate("2021-07-10T00:00:03Z"),
"value": [14,161]}
])
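Both compression tips can be applied on the client before inserting. The sketch below drops top-level fields that hold empty arrays or empty objects and rounds top-level numeric fields to a chosen number of decimals. The helper names are hypothetical, nested fields are deliberately left untouched, and the right precision depends on the application.

```javascript
// True for [] and {} (but not for Date or other non-plain values with data).
function isEmptyContainer(value) {
  if (Array.isArray(value)) return value.length === 0;
  return (
    value !== null &&
    typeof value === "object" &&
    !(value instanceof Date) &&
    Object.keys(value).length === 0
  );
}

// Omits empty top-level fields and rounds top-level numbers to `decimals`.
function prepareForCompression(doc, decimals) {
  const factor = 10 ** decimals;
  const cleaned = {};
  for (const [field, value] of Object.entries(doc)) {
    if (isEmptyContainer(value)) continue;
    cleaned[field] =
      typeof value === "number" ? Math.round(value * factor) / factor : value;
  }
  return cleaned;
}

const raw = {
  metadata: { place: "Hyderabad", sensorId: 52396 },
  ts: new Date("2021-07-10T00:00:02Z"),
  value: 18.263742590570686,
  tags: [],
};

const prepared = prepareForCompression(raw, 2);
console.log(prepared);
```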
39. Limitations Of Time Series
The following features are not supported for time series collections
➔ Change streams
➔ Client-Side Field Level Encryption
➔ Database Triggers
➔ Schema validation rules
➔ reIndex
➔ renameCollection
➔ Modification of timeField and metaField
➔ Capped Collections