At GO-JEK, we build products that help millions of Indonesians commute, shop, eat and pay, daily. Data at GO-JEK doesn't grow linearly with the business, but exponentially, as people build new products and log new activities on top of the growth of the business. We currently see 6 billion events daily, and rising.

GO-JEK currently has 18+ products. Every team publishes events as Protobuf messages to Kafka clusters in order to have a well-defined schema and to ensure backward compatibility. This makes data available to all teams for different use-cases. To make sense of this raw data, we needed a data aggregation pipeline, and we found Flink to be a good fit.

Our first use-case for real-time aggregation was Dynamic Surge Pricing. To implement it, we needed real-time counts of bookings being created and of drivers available to accept bookings, per minute per s2Id (http://s2geometry.io/). We created two Flink jobs to achieve this.

What are Daggers?

After the successful implementation of surge pricing, we realised that real-time data aggregation could solve a lot of problems. So instead of hand-coding a separate job for each use-case, we came up with a DIY solution for creating Flink jobs. We built a generic application known as DAGGERS on top of Flink that takes parameters such as the topic the user wants to read from, along with options including watermark intervals, delays and parallelism.

What is Datlantis?

To give users a DIY interface, we created a portal called Datlantis which allows them to create and deploy massive, production-grade real-time data aggregation pipelines within minutes. Datlantis uses Flink's Monitoring REST API to communicate with the Flink cluster, monitoring current jobs and deploying new ones. Users can select Kafka topics from any of our Kafka clusters and write a simple SQL query on the UI, which spawns a new Flink job. They can also select one more Kafka data-stream in order to write JOIN queries.

The Flink job pushes data to InfluxDB, which lets the user visualise it on Grafana dashboards. Once the logic of the SQL is verified using the dashboard, the Flink job is promoted to push the data to Kafka. Users manage their Flink jobs on Datlantis: they can edit, stop or restart jobs, change job configurations, and view their Flink job logs on Datlantis itself.

We push data back to Kafka so that the aggregated data is available to all the other teams. Our application FIREHOSE consumes this data from Kafka and pushes it to different sinks: relational DB, HTTP, gRPC, Influx, Redis and so on. The data is then pushed to our cold storage, which enables historical analysis.

Data Pipeline: Producer Apps → Kafka → Deserialization → DataStream → SQL → Result → Serialize
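To make the per-s2Id keying concrete, here is a minimal sketch of mapping a pickup coordinate to an S2 cell token with the Java S2 geometry library. The cell level (13) and the Jakarta coordinates are illustrative assumptions, not GO-JEK's actual configuration.

```java
import com.google.common.geometry.S2CellId;
import com.google.common.geometry.S2LatLng;

public class S2Keys {
    // Maps a pickup point to the S2 cell it falls in, so bookings and
    // available drivers can be counted per cell per minute.
    // Level 13 (cells roughly 1 km across) is an illustrative choice.
    static String cellToken(double lat, double lng, int level) {
        S2LatLng point = S2LatLng.fromDegrees(lat, lng);
        return S2CellId.fromLatLng(point).parent(level).toToken();
    }

    public static void main(String[] args) {
        // A point in Jakarta, purely for illustration.
        System.out.println(cellToken(-6.2088, 106.8456, 13));
    }
}
```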
5. Data Aggregation
● Started with one use-case: Dynamic Surge Pricing
● Hand-coded Flink jobs in 4 weeks
● 20 other use-cases in the pipeline
● Created a DIY platform
7. Daggers
● Generic Flink job
● Reads data from Kafka
● Deserializes Protobuf messages
● Can process up to 2 streams
● Aggregated data from the stream(s) is sent to a sink (a job skeleton is sketched below)
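A minimal sketch of what such a generic job looks like on Flink's DataStream API. The `booking-log` topic and broker address are hypothetical, and `SimpleStringSchema` stands in for the Protobuf deserializer a real Dagger would build from the topic's schema.

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class DaggerSkeleton {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka:9092"); // placeholder
        props.setProperty("group.id", "dagger-demo");

        // A real Dagger deserializes Protobuf here; a string schema keeps
        // this sketch self-contained.
        DataStream<String> events = env.addSource(
                new FlinkKafkaConsumer<>("booking-log", new SimpleStringSchema(), props));

        events.print(); // a Dagger registers a table and runs SQL instead

        env.execute("dagger-sketch");
    }
}
```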
8. Dagger Insights
1. Kafka Connector: consume Protobuf-encoded data from Kafka
2. Streaming Table Source: decode the data and register a streaming table and UDFs
3. Flink SQL: apply SQL and generate the result (example below)
4. Data Stream Sink: sink the result to Influx or Kafka sinks
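A sketch of steps 2 and 3 using Flink's Table API (1.13+). The `bookings` schema is hypothetical and the JSON format stands in for Protobuf; the query counts bookings per s2Id per minute, the kind of SQL a user would submit.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class DaggerSql {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Hypothetical table; real Dagger schemas come from the topic's
        // Protobuf descriptor.
        tEnv.executeSql(
                "CREATE TABLE bookings (" +
                "  s2id STRING," +
                "  event_time TIMESTAMP(3)," +
                "  WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND" +
                ") WITH (" +
                "  'connector' = 'kafka'," +
                "  'topic' = 'booking-log'," +
                "  'properties.bootstrap.servers' = 'kafka:9092'," +
                "  'scan.startup.mode' = 'latest-offset'," +
                "  'format' = 'json'" +
                ")");

        // Bookings per s2Id per one-minute tumbling window.
        tEnv.executeSql(
                "SELECT s2id," +
                "       TUMBLE_END(event_time, INTERVAL '1' MINUTE) AS window_end," +
                "       COUNT(*) AS bookings" +
                " FROM bookings" +
                " GROUP BY s2id, TUMBLE(event_time, INTERVAL '1' MINUTE)").print();
    }
}
```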
10. Datlantis
● DIY platform
● User friendly interface to a fully automated system
● Creates and deploys DAGGERS in minutes using SQL-like syntax
11. What does it do?
● Uses Flink's Monitoring REST API (see the sketch below)
● Gets the status of currently running jobs
● Creates new Dagger jobs on the Flink cluster
● Stops any running job
● Edits any running job
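A sketch of the two most basic calls against Flink's Monitoring REST API (list jobs, cancel a job) using Java's built-in HTTP client; the JobManager host is a placeholder.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class FlinkRestDemo {
    // REST endpoint of the Flink JobManager (placeholder host).
    static final String FLINK = "http://flink-jobmanager:8081";
    static final HttpClient HTTP = HttpClient.newHttpClient();

    // List jobs and their states, as Datlantis does for running Daggers.
    static String listJobs() throws Exception {
        HttpRequest req = HttpRequest.newBuilder(
                URI.create(FLINK + "/jobs/overview")).GET().build();
        return HTTP.send(req, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Cancel a job by id, the call behind a "stop" button.
    static int cancelJob(String jobId) throws Exception {
        HttpRequest req = HttpRequest.newBuilder(
                URI.create(FLINK + "/jobs/" + jobId + "?mode=cancel"))
                .method("PATCH", HttpRequest.BodyPublishers.noBody())
                .build();
        return HTTP.send(req, HttpResponse.BodyHandlers.ofString()).statusCode();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(listJobs());
    }
}
```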
23. Time Series Sink
● Preview mode
● Default data sink
● Integrated with Grafana, used for monitoring & alerting (a minimal write sketch follows)
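A minimal sketch of writing one aggregated row to InfluxDB with the influxdb-java client; the connection details, database, measurement and field names are all illustrative.

```java
import java.util.concurrent.TimeUnit;

import org.influxdb.InfluxDB;
import org.influxdb.InfluxDBFactory;
import org.influxdb.dto.Point;

public class InfluxSinkSketch {
    public static void main(String[] args) {
        // Placeholder connection details.
        InfluxDB influx = InfluxDBFactory.connect(
                "http://influxdb:8086", "user", "password");
        influx.setDatabase("dagger_preview");

        // One aggregated row: bookings per s2Id for a one-minute window.
        Point point = Point.measurement("booking_counts")
                .time(System.currentTimeMillis(), TimeUnit.MILLISECONDS)
                .tag("s2id", "3344f09")
                .addField("bookings", 42L)
                .build();

        influx.write(point);
        influx.close();
    }
}
```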
25. Kafka Sink
● Publish to Kafka topic
● Another DIY tool, FIREHOSE, sinks data from Kafka to one of the following (sketched below):
○ Services - HTTP or GRPC
○ DB - relational OR time series
○ Analytics platforms - Clevertap or Mixpanel
○ Log - for debugging
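A sketch of the general shape of such a tool: a Kafka consumer that hands every record to a pluggable sink. The topic, broker address and `Sink` interface are hypothetical; FIREHOSE's real abstractions differ.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class FirehoseSketch {
    // A sink only needs to know how to push one record.
    interface Sink {
        void push(byte[] payload) throws Exception;
    }

    public static void run(String topic, Sink sink) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092"); // placeholder
        props.put("group.id", "firehose-demo");
        props.put("key.deserializer", ByteArrayDeserializer.class.getName());
        props.put("value.deserializer", ByteArrayDeserializer.class.getName());

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of(topic));
            while (true) {
                ConsumerRecords<byte[], byte[]> records =
                        consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<byte[], byte[]> record : records) {
                    sink.push(record.value()); // HTTP, gRPC, DB, Redis...
                }
                consumer.commitSync();
            }
        }
    }
}
```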
29. 1K+ REAL TIME DAGGERS
● Spanned over 6 Flink clusters
● Most of them created by analysts
● Actively used for monitoring
● Dashboards created are used by city heads
2 min TO PRODUCTION
● Single form to create a DAGGER
● The data can be sent to a sink
● Data ready to be consumed as soon as generated
1+ TB DATA PROCESSED EVERY DAY
● Real-time data analysis across all clusters
● Processed data is sent to one of the sinks
30. Deployment
● We have Flink clusters on Yarn and Kubernetes
● Checkpointing: HDFS and Google Cloud Storage (config sketched below)
● Dagger Kubernetes controller:
○ Makes the job JAR available on the Flink cluster
○ Scales the cluster when more slots are needed
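A sketch of the corresponding checkpoint configuration in a Flink job (1.13+ API); the interval and storage paths are illustrative.

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointConfigSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Snapshot state every minute; the interval is illustrative.
        env.enableCheckpointing(60_000);

        // Durable checkpoint storage: an HDFS path on Yarn clusters,
        // or a gs:// path on Kubernetes with the GCS filesystem plugin.
        env.getCheckpointConfig()
           .setCheckpointStorage("hdfs:///flink/checkpoints");
        // e.g. "gs://my-bucket/flink/checkpoints" on Google Cloud Storage
    }
}
```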
34. Alerting
● Automated alerts from Datlantis
● Users are provided with a Health dashboard
● Alerts are sent to specific teams via their Slack channels and PagerDuty (a webhook sketch follows)
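A sketch of the Slack side of such an alert: posting a message to an incoming webhook with Java's HTTP client. The webhook URL and alert text are made up.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SlackAlertSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical incoming-webhook URL; each team registers its own.
        String webhook = "https://hooks.slack.com/services/T000/B000/XXXX";
        String body = "{\"text\": \"Dagger bookings-per-s2id: no output for 5 min\"}";

        HttpRequest req = HttpRequest.newBuilder(URI.create(webhook))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(req, HttpResponse.BodyHandlers.ofString());
        System.out.println(resp.statusCode());
    }
}
```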
35. Impact
01. 5+ billion messages/day
● For system uptime
● Across 500 microservices
02. 44,000 geolocations
● For dynamic surge pricing
● Demand & supply
03. 25+ metrics
● For allocation metrics
● Created & maintained by analysts
04. User segmentation & real-time triggers
● For growth campaigns
● 26% better conversion
36. Let's talk!
Prakhar Mathur
Medium: @prakharmathur_345
Rohil Surana
Medium: @rohilsurana