3. Agenda
■ Distributed, Data-Intensive Systems
■ Spring Cloud Data Flow toolkit
■ Operational metrics and monitoring
■ Micrometer, Time Series and Dimensions
■ Architectural Patterns and Practices
■ Q+A
4. “We call an application data-intensive if data is its
primary challenge—the quantity of data, the
complexity of data, or the speed at which it is
changing.” (Martin Kleppmann, Designing Data-Intensive Applications)
5. What is Spring Cloud Data Flow
A toolkit for building data integration, real-time streaming, and batch data processing pipelines.
■ Data pipelines consist of Spring Boot apps, using Spring Cloud Stream for event streaming or Spring Cloud Task for batch processes.
■ Ready for data integration with more than 60 out-of-the-box streaming and batch apps.
■ DSL, GUI, and REST APIs to build and orchestrate data pipelines onto platforms like Kubernetes and Cloud Foundry.
■ Continuous delivery for streaming data pipelines using Spring Cloud Skipper.
■ Cron-job scheduler for batch data pipelines using Spring Cloud Scheduler.
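To make the first point above concrete: Spring Cloud Stream's functional model expresses sources, processors, and sinks as `java.util.function.Supplier`, `Function`, and `Consumer` beans, and the Data Flow DSL chains apps with a pipe, as in `time | log`. The sketch below is a plain-JDK approximation under that assumption. It uses no Spring and no broker, and the names `FunctionalPipeline`, `run`, and `runToList` are hypothetical. It only shows how the three stage types compose.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical plain-JDK model of a "source | processor | sink" stream. */
class FunctionalPipeline {

    // Pulls 'count' messages from the source, passes each through the processor,
    // and hands the result to the sink, one message at a time.
    static <I, O> void run(Supplier<I> source, Function<I, O> processor, Consumer<O> sink, int count) {
        for (int i = 0; i < count; i++) {
            sink.accept(processor.apply(source.get()));
        }
    }

    // Convenience wrapper: feeds a list through the processor and collects the sink output.
    static <I, O> List<O> runToList(List<I> input, Function<I, O> processor) {
        Iterator<I> it = input.iterator();
        List<O> out = new ArrayList<>();
        run(it::next, processor, out::add, input.size());
        return out;
    }

    public static void main(String[] args) {
        // Processor stand-in: upper-cases each payload, like a simple transform app.
        List<String> result = runToList(List.of("alpha", "beta", "gamma"), String::toUpperCase);
        System.out.println(result); // [ALPHA, BETA, GAMMA]
    }
}
```

In a real deployment each stage would be a separate Spring Boot app connected through a broker rather than direct method calls.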
11. Runtime and Message Broker Abstraction
Runtimes: Kubernetes, Cloud Foundry, Local / Dev
Message brokers: RabbitMQ, Apache Kafka, Google Cloud Pub/Sub, Amazon Kinesis, Solace
Opportunities: the same code and the same tests work with a variety of message brokers.
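The "same code, same tests" opportunity rests on application logic depending only on a broker abstraction, with a broker-specific binder plugged in at deployment time. A minimal sketch, assuming a hypothetical `MessageBinder` interface that is far simpler than Spring Cloud Stream's actual `Binder` SPI; `InMemoryBinder` stands in for a RabbitMQ or Kafka binder, and every name here is illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

class BinderDemo {
    // Wires the processor to an in-memory binder and returns what arrives on "out".
    static List<String> process(List<String> payloads) {
        MessageBinder binder = new InMemoryBinder();
        List<String> received = new ArrayList<>();
        UppercaseProcessor.wire(binder, "in", "out");
        binder.subscribe("out", received::add);
        payloads.forEach(p -> binder.send("in", p));
        return received;
    }

    public static void main(String[] args) {
        System.out.println(process(List.of("order-1", "order-2"))); // [ORDER-1, ORDER-2]
    }
}

/** Hypothetical, simplified broker abstraction (not the Spring Cloud Stream Binder SPI). */
interface MessageBinder {
    void send(String destination, String payload);

    void subscribe(String destination, Consumer<String> handler);
}

/** In-memory binder standing in for a RabbitMQ, Kafka, or other broker-specific binder. */
class InMemoryBinder implements MessageBinder {
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    @Override
    public void send(String destination, String payload) {
        // Synchronous delivery to every handler subscribed to this destination.
        for (Consumer<String> handler : subscribers.getOrDefault(destination, List.of())) {
            handler.accept(payload);
        }
    }

    @Override
    public void subscribe(String destination, Consumer<String> handler) {
        subscribers.computeIfAbsent(destination, d -> new ArrayList<>()).add(handler);
    }
}

/** Business logic that only knows about MessageBinder, never about a concrete broker. */
class UppercaseProcessor {
    static void wire(MessageBinder binder, String input, String output) {
        binder.subscribe(input, payload -> binder.send(output, payload.toUpperCase()));
    }
}
```

Swapping `InMemoryBinder` for another implementation would leave `UppercaseProcessor` and its test untouched, which is the portability property the slide describes.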
12. Common Denominator = Spring Boot => Consolidate On:
■ Development Practices
■ Test Infrastructure
■ CI/CD Tooling and Automation
■ Operational Metrics and Monitoring
14. Data Flow Lifecycle
• Data Flow accepts the data pipeline definition and delegates to Skipper for lifecycle management.
• Operational metrics and monitoring.
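Skipper's role in this lifecycle is to track each stream deployment as a versioned release so it can be upgraded or rolled back. The sketch below models that idea in plain Java. `ReleaseHistory`, `Release`, and the manifest strings are hypothetical, and recording a rollback as a new version (rather than deleting history) is an assumption of this sketch, not a description of Skipper internals.

```java
import java.util.ArrayList;
import java.util.List;

class ReleaseDemo {
    public static void main(String[] args) {
        ReleaseHistory history = new ReleaseHistory();
        history.install("log-sink:1.0");
        history.upgrade("log-sink:1.1");
        // Roll back to version 1: its manifest is redeployed as version 3.
        System.out.println(history.rollback(1)); // v3 -> log-sink:1.0
    }
}

/** Hypothetical immutable record of one deployed version of a stream. */
final class Release {
    final int version;
    final String manifest;

    Release(int version, String manifest) {
        this.version = version;
        this.manifest = manifest;
    }

    @Override
    public String toString() {
        return "v" + version + " -> " + manifest;
    }
}

/** Hypothetical append-only release history for a single stream. */
class ReleaseHistory {
    private final List<Release> releases = new ArrayList<>();

    Release install(String manifest) {
        if (!releases.isEmpty()) throw new IllegalStateException("already installed");
        return append(manifest);
    }

    Release upgrade(String manifest) {
        if (releases.isEmpty()) throw new IllegalStateException("nothing to upgrade");
        return append(manifest);
    }

    // Redeploys an earlier manifest as a new version instead of rewriting history.
    Release rollback(int targetVersion) {
        Release target = releases.stream()
                .filter(r -> r.version == targetVersion)
                .findFirst()
                .orElseThrow(() -> new IllegalArgumentException("unknown version " + targetVersion));
        return append(target.manifest);
    }

    Release current() {
        if (releases.isEmpty()) throw new IllegalStateException("nothing installed");
        return releases.get(releases.size() - 1);
    }

    private Release append(String manifest) {
        Release release = new Release(releases.size() + 1, manifest);
        releases.add(release);
        return release;
    }
}
```

Keeping history append-only means every state the stream has been in remains auditable, which suits continuous delivery.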
17. Time Series & Dimensional Data Model
● A time series is a sequence of data points indexed in time order.
● Each time series consists of timestamped values belonging to the same metric and the same set of labeled dimensions.
● Every time series is uniquely identified by its metric name and a set of key-value pairs, known as labels, tags, or dimensions.
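The identity rule in the last bullet (metric name plus labels) can be shown with a small in-memory store in which two recordings land in the same series only if both the name and the full tag set match, regardless of tag order. This is a hypothetical sketch: in Micrometer the comparable identity is a meter's name plus its tags, for example a counter built with `Counter.builder("http.requests").tag("method", "GET").register(registry)`, while `SeriesId`, `Point`, and `TimeSeriesStore` are illustrative names only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

class TimeSeriesDemo {
    public static void main(String[] args) {
        TimeSeriesStore store = new TimeSeriesStore();
        // Same name and same tags (different order): same series.
        store.record("http.requests", Map.of("method", "GET", "status", "200"), 1_000L, 1.0);
        store.record("http.requests", Map.of("status", "200", "method", "GET"), 2_000L, 1.0);
        // Same name, different tag values: a separate series.
        store.record("http.requests", Map.of("method", "POST", "status", "201"), 2_000L, 1.0);
        System.out.println(store.seriesCount()); // 2
    }
}

/** Hypothetical series identity: metric name plus a set of key-value tags. */
final class SeriesId {
    private final String name;
    private final Map<String, String> tags;

    SeriesId(String name, Map<String, String> tags) {
        this.name = name;
        // Defensive copy; TreeMap sorts keys so toString() is deterministic.
        // Map.equals() already ignores insertion order.
        this.tags = new TreeMap<>(tags);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof SeriesId)) return false;
        SeriesId other = (SeriesId) o;
        return name.equals(other.name) && tags.equals(other.tags);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, tags);
    }

    @Override
    public String toString() {
        return name + tags;
    }
}

/** One timestamped sample. */
final class Point {
    final long timestampMillis;
    final double value;

    Point(long timestampMillis, double value) {
        this.timestampMillis = timestampMillis;
        this.value = value;
    }
}

/** Hypothetical in-memory store mapping each series identity to its samples. */
class TimeSeriesStore {
    private final Map<SeriesId, List<Point>> series = new HashMap<>();

    void record(String name, Map<String, String> tags, long timestampMillis, double value) {
        series.computeIfAbsent(new SeriesId(name, tags), id -> new ArrayList<>())
              .add(new Point(timestampMillis, value));
    }

    int seriesCount() {
        return series.size();
    }

    List<Point> points(String name, Map<String, String> tags) {
        return series.getOrDefault(new SeriesId(name, tags), List.of());
    }
}
```

Every distinct combination of tag values creates a new series, which is why high-cardinality tags (such as user IDs) multiply storage cost.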
Compute an answer to any challenging question by looking through all the available data.
A Time Series Database (TSDB) is a database optimized for time-stamped or time series data. Time series data are simply measurements or events that are tracked, monitored, downsampled, and aggregated over time. This could be server metrics, application performance monitoring, network data, sensor data, events, clicks, trades in a market, and many other types of analytics data.
A Time Series Database is built specifically for handling metrics, events, or measurements that are time-stamped. A TSDB is optimized for measuring change over time. The properties that make time series data different from other data workloads are data lifecycle management, summarization, and large range scans of many records.
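Downsampling, mentioned above, trades resolution for storage by replacing raw points with one aggregate per fixed time window. A minimal sketch, assuming averaging as the aggregate and epoch-aligned windows; sum or max are equally common choices, and `DownsampleDemo.downsampleAverage` is a hypothetical name.

```java
import java.util.Map;
import java.util.TreeMap;

class DownsampleDemo {
    /**
     * Hypothetical helper: averages samples into fixed-width, epoch-aligned windows.
     * Returns window start (ms) -> mean value, ordered by time.
     */
    static Map<Long, Double> downsampleAverage(long[] timestamps, double[] values, long windowMillis) {
        if (timestamps.length != values.length) {
            throw new IllegalArgumentException("timestamps and values differ in length");
        }
        if (windowMillis <= 0) {
            throw new IllegalArgumentException("windowMillis must be positive");
        }
        // window start -> {running sum, sample count}
        Map<Long, double[]> buckets = new TreeMap<>();
        for (int i = 0; i < timestamps.length; i++) {
            // floorDiv keeps negative timestamps in the correct (earlier) window.
            long windowStart = Math.floorDiv(timestamps[i], windowMillis) * windowMillis;
            double[] acc = buckets.computeIfAbsent(windowStart, k -> new double[2]);
            acc[0] += values[i];
            acc[1] += 1;
        }
        Map<Long, Double> result = new TreeMap<>();
        buckets.forEach((start, acc) -> result.put(start, acc[0] / acc[1]));
        return result;
    }

    public static void main(String[] args) {
        long[] timestamps = {0, 15_000, 59_000, 60_000, 90_000};
        double[] values = {1, 2, 3, 10, 20};
        // One-minute windows: {0=2.0, 60000=15.0}
        System.out.println(downsampleAverage(timestamps, values, 60_000));
    }
}
```

Five raw points collapse into two, and the reduction grows with the ratio of window width to scrape interval, which is what makes long retention affordable.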