This document summarizes a presentation about a new real-time stream processing solution implemented by GetInData for telecommunications company Kcell. The new solution processes over 160,000 events per second from 10 million subscribers using Apache Flink on Apache Hadoop for reliability and scalability. It provides real-time insights for use cases like detecting fraudulent roaming or offering targeted promotions. The implementation leverages open source technologies like Apache Kafka, HDFS, and Druid to cost-effectively meet Kcell's growing analytics needs. Future work includes adding more data sources and machine learning to further enhance the platform.
2. 2
The Speakers
Who are these guys?
Alexey Brodovshuk
@alexeybrod
Krzysztof Zarzycki
@k_zarzyk
3. 3
About Kcell
Kcell JSC is part of the largest Scandinavian telecommunications holding, Telia Company
Kcell has a strong software development team and lots of experience in building services and products
We like innovations
> 10 000 000 subscribers
Largest GSM operator in Kazakhstan
Population coverage: 4G (40%), 3G (73%), 2G (96%)
Great network coverage
Digital transformation of the company is under way
Not only telco
4. 4
Business needs
Assisting Millions of Users in Real-Time
SMS events
Voice usage events
Data usage events
Roaming events
Location events
Input → Process → Actions
5. 5
Use Cases
Use case scenarios. Just a few of many.
Case
If a subscriber tops up her balance too often in a short period of time, we can offer her a less expensive tariff or auto-payment services.
Balance Top Up Case
Trigger
UI
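The balance top-up case can be sketched as a per-subscriber sliding-window count. This is a minimal plain-Python illustration, not the Flink job from the presentation: the class name `TopUpTrigger`, the 24-hour window and the threshold of 3 top-ups are all assumptions, since the slides do not state the real values. It also assumes events arrive in timestamp order.

```python
from collections import defaultdict, deque

# Hypothetical thresholds; the slides do not state the real values.
WINDOW_SECONDS = 24 * 60 * 60
MAX_TOP_UPS = 3


class TopUpTrigger:
    """Fires when a subscriber tops up more than max_top_ups times within the window."""

    def __init__(self, window_seconds=WINDOW_SECONDS, max_top_ups=MAX_TOP_UPS):
        self.window_seconds = window_seconds
        self.max_top_ups = max_top_ups
        self._recent = defaultdict(deque)  # subscriber id -> recent top-up timestamps

    def on_top_up(self, subscriber_id, timestamp):
        """Record one top-up event; return True if the trigger should fire."""
        recent = self._recent[subscriber_id]
        recent.append(timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while recent and recent[0] <= timestamp - self.window_seconds:
            recent.popleft()
        return len(recent) > self.max_top_ups
```

A caller would feed each top-up event to `on_top_up` and forward the subscriber to the offer flow whenever it returns `True`.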
6. 6
Roaming
Fraud
Send a trigger to the marketing platform if a subscriber visited country X and/or registered in visited mobile network Y, and the device type is Z
Roaming case
Send an email to the anti-fraud unit if a subscriber registers in roaming while the balance at that moment equals 0.
This situation is impossible in the standard case.
Fraud case in roaming
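The two roaming rules above can be written as simple predicates over a roaming registration event. The sketch below is illustrative only: the `RoamingEvent` fields, the function names and the `require_both` switch (standing in for the AND/OR choice) are assumptions, not the schema or API used at Kcell.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoamingEvent:
    # Hypothetical event shape; the real schema is not shown in the slides.
    subscriber_id: str
    visited_country: str
    visited_network: str
    device_type: str
    balance: float


def marketing_trigger(event, country, network, device_type, require_both=True):
    """Roaming case: country X AND/OR network Y, and device type Z."""
    in_country = event.visited_country == country
    in_network = event.visited_network == network
    if require_both:
        location_ok = in_country and in_network
    else:
        location_ok = in_country or in_network
    return location_ok and event.device_type == device_type


def fraud_suspected(event):
    """Fraud case: the subscriber registered in roaming with a zero balance."""
    return event.balance == 0
```

In this sketch a `True` result from `marketing_trigger` would be pushed to the marketing platform, and a `True` result from `fraud_suspected` would lead to an email to the anti-fraud unit.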
7. 7
Old System
Why did we start looking for a new solution?
External Vendor Solution
1. Blackbox Solution
● Kcell developers can’t fix, tweak or optimize it
2. Scalability issues
● Limited to ~2000 events / sec
● Can’t support all needed data sources
3. Not reliable
● Multiple incidents which took too much time to resolve
14. 14
New Solution (Data Lake)
Data Lake and Sub-second OLAP Analytics
[Architecture diagram. Streaming path: sources (HTTP push/pull, FTP, NFS, MQ), ingestion, events hub, events processing (Flink), outgestion, destinations (HTTP push/pull, FTP, MQ). Data Lake: historical storage (HDFS) with batch (Spark) and SQL (Hive) to keep history, report and explore. Column-oriented data store: online OLAP (Druid).]
15. 15
Decisions made
Some decisions our team made before or during project implementation
Streaming-first approach
Apache Kafka for event hub
Apache Flink
Powerful Real-Time Analytics
16. 16
Apache Avro
Keep state local to the process
Ingest reference data for local joins and enrichment
● No need to query external systems while processing
● Data time correlation correctness
Performance
[Diagram: events and subscriber profile data (events) are combined with local state to produce transformed events. Callout: "Not at >100K events / sec".]
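The idea of ingesting reference data as events and joining against local state can be illustrated in a few lines. This is a plain-Python sketch of the pattern, not Flink code: the function name `enrich`, the tuple layout of the input stream and the profile fields are assumptions made for the example. Because profile updates and usage events are consumed in the same time-ordered stream, each event is enriched with the profile that was current at that moment.

```python
def enrich(stream):
    """Join usage events with the latest subscriber profile held in local state.

    `stream` yields ("profile", subscriber_id, profile_dict) or
    ("event", subscriber_id, event_dict) tuples in event-time order.
    """
    profiles = {}  # local state: subscriber id -> latest known profile
    for kind, subscriber_id, payload in stream:
        if kind == "profile":
            # Reference data arrives as events and only updates local state.
            profiles[subscriber_id] = payload
        else:
            # Enrich from local state only; no external lookup per event.
            yield {
                **payload,
                "subscriber_id": subscriber_id,
                "profile": profiles.get(subscriber_id),
            }
```

Events for subscribers with no profile seen so far are emitted with `profile` set to `None`; a real job would have to decide whether to drop, buffer or pass such events through.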
17. 17
NiFi for data ingestion (no coding)
● but not for CEP (complex event processing)
Web UI for configuring triggers
Ease of Use
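One way a UI-configured trigger could be represented is as plain data that the processing job evaluates against each event. The sketch below is purely hypothetical: the slides do not describe the trigger format, so the `conditions` structure, the operator set and the `matches` function are assumptions chosen for illustration.

```python
import operator

# Comparison operators a trigger definition may reference (hypothetical set).
OPERATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
}


def matches(trigger, event):
    """Return True when the event satisfies every condition of the trigger."""
    return all(
        cond["field"] in event
        and OPERATORS[cond["op"]](event[cond["field"]], cond["value"])
        for cond in trigger["conditions"]
    )


# Example definition, as a web form might save it (field names are made up).
ZERO_BALANCE_ROAMING = {
    "name": "zero-balance-roaming",
    "conditions": [
        {"field": "roaming", "op": "==", "value": True},
        {"field": "balance", "op": "==", "value": 0},
    ],
}
```

Keeping triggers as data rather than code is what would let non-developers add or change them without a redeployment.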
18. 18
Flink on YARN, with HDFS
HA for redundancy and running ~24/7
InfluxDB & Grafana for monitoring & alerting
ELK for log collection and aggregation
Reliability and battle-tested techniques
Kerberos and AD thanks to FreeIPA
Apache Ranger for authorization
Security
19. 19
One platform for the whole Enterprise
Batch (adhoc) queries too
● Spark, Hive/Presto
Online analytics
● OLAP
Extensiveness
HDP
Open-source technologies
HDP as a licence-free distribution
Just start with a bunch of servers
Cost-Efficiency
20. 20
Before You Start
Words of wisdom
1. A simple, sketchy trigger request quickly becomes a complex algorithm
2. Know your data sources
● Do NOT assume that your data sources are ready for streaming
3. DO prepare yourself to use open-source
● NiFi is a great framework, but not a comprehensive set of processors
● HDP is a great distribution, but versions in it are quickly outdated
4. Start small, start fast
21. 21
Our Collaboration
Two heads are better than one
Joint development team
Not a vendor solution
Development as one team
Code quality
Code review and automated tools for code quality control
Agile Practices
Distant geographic locations, but daily standups
Go live quickly!
<4 months to first production case running 24/7!
Deliver
DevOps/Automation
Knowledge sharing
Constant knowledge exchange in areas of expertise
Testing
Separate testing environment
Automated Unit/E2E tests
22. 22
Make it a company-wide, self-service go-to place for data analysis
Future Work
We have already done a lot. But more great things are coming.
Timeline: 2018 Q2, 2018 Q3, 2018 Q4, Bright Future
More Data Sources
More Triggers
Geolocation data
CDRs
Equipment logs
Data Lake
Machine Learning
We plan to include machine learning and other tools that would enhance our platform even more
Real-time BI
Intraday view on business and operations
Call center, clickstream, communication… all in one place ready for behavioral analysis
Customer 360 view
Monetize valuable insights from our combined rich data sources.
Data Monetization
Predictive maintenance
Network Optimization
To lower operational costs and make better investments
And many more...