Importance of ‘Centralized Event collection’
and BigData platform for Analysis !

DevOpsDays India, Bangalore - 2013

~/Piyush
Manager, Website Operations at MakeMyTrip
What to expect:
• MakeMyTrip data challenges!
• Event Data a.k.a. Logs & Log Analysis
• Why Centralized Logging …for systems and applications !
• Capturing Events: Why structured data emitted from apps for machines is a better approach!
• Data Service Platform : DSP – Why ?
• Inputs: Data for DSP
• Top Architecture Considerations
• Top level key tasks
• Tools Arsenal and API Management and Service Cloud

DevOpsDays India 2013 : ~/Piyush
MakeMyTrip data challenges …!
• Multi-DC/colocation setup
• Different types of data sources : internal / external (structured, semi-structured, unstructured)
– Online Transaction Data Store
– ERP
– CRM
  • Email Behavior / Survey results
– Web Analytics
– Logs
  • Web
  • Application
  • User Activity logs
– Social Media
– Inventory / Catalog
– Data residing in Excel files
– Monitoring Metric Data :
  • Graphite (time-series Whisper)
  • Splunk, ElasticSearch (Logstash)
– Many other different sources
• Storing and Analyzing Huge Event Data !

Some challenges …!
• Aggregate web usage data and transactional data to generate one view
• Process multiple GBs to TBs of data every day
• Serve more than a million data services API requests per day
• Ensure business continuity as reliance on MyDSP increases
• Store terabytes of historical data
• Mesh transactional (online and offline) data with consumer behavior and derive analytics
• Build a flexible data ingestion platform to manage many data feeds from multiple data sources

Flow of an Event

Event Data a.k.a. Logs
• Event Data -> a set of chronologically sequenced data records that capture information about an event !
• Virtually every form of system produces event data
– Capture it from all components, covering both client-side and server-side events!

• You may think of logs as the footprint generated by any activity within the system/app.
• Event Data has different characteristics from data stored in traditional data warehouses
– Huge Volume : Event data accumulates rapidly and often must be stored for years; many organizations are managing hundreds of terabytes and some are managing petabytes.
– Format : Because of the huge variety of sources, event data is unstructured or semi-structured.
– Velocity : New event data is constantly coming in.
– Collection : Event data is difficult to collect because of broadly dispersed systems and networks.
– Time-stamped : Event data is always inserted once with a time-stamp. It never changes.

Log Analysis
• Logs are one of the most useful things when it comes to analysis; in simple terms, log analysis is making sense out of system/app-generated log messages (or just LOGS). Through logs we get insights into what is happening in the system.
• Helps with the root cause analysis that follows any incident.
• Personalize the user experience by analyzing web usage data.
Security requirements:
• Traditionally there are some compliance requirements too : Log Management / SEM + SIM => SIEM
• For data security : have one centralized platform for collecting ALL events (logs), correlating them, and providing real-time intelligent visibility.
• Monitor not just the network, OS, devices etc. but ALL applications and business processes too.

Why Centralized Logging …for systems and applications !
• The need for centralized logging is quite important nowadays due to:
– growth in the number of applications,
– distributed architecture (Service Oriented Architecture),
– cloud-based apps,
– the number of machines and the infrastructure size increasing day by day.

• This means that centralized logging, and the ability to spot errors across distributed systems and applications, has become even more “valuable” and “needed”.
And most importantly
– be able to understand the customers and how they interact with websites;
– Understanding change : whether using A/B or multivariate experiments, or tweaking / understanding new implementations.

Capturing Events: Why structured data emitted from apps for
machines is a better approach!
• Need for standardization:
– Developers assume that the first-level consumer of a log message is a human, and that only they know what information is needed to debug an issue.
– Logs are not just for humans! The primary consumers of logs are shifting from humans to computers. This means log formats should have a well-defined structure that can be parsed easily and robustly.
– Logs change! If the logs never changed, writing a custom parser might not be too terrible. The engineer would write it once and be done. But in reality, logs change. Every time you add a feature, you start logging more data, and as you add more data, the printf-style format inevitably changes. This implies that the custom parser has to be updated constantly, consuming valuable development time.

• Suggested Approach : “Logging in JSON Format”
– To keep it simple and generic for any application, the recommended approach is a {Key: Value} JSON log format (structured / semi-structured).
– This approach makes parsing and consumption easy, irrespective of whatever technology/tools we choose to use!
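The JSON logging approach suggested above could look roughly like the following in Python. This is only a minimal sketch using the standard `logging` module; the `JsonFormatter` class, the `make_logger` helper, and the `extra={"fields": ...}` convention are illustrative assumptions, not something taken from the talk. The key names mirror the ones used in the sample message later in the deck.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        event = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "severity": record.levelname,
            "facility": record.name,
            "short_message": record.getMessage(),
        }
        # Hypothetical convention: callers pass extra key/value pairs
        # via extra={"fields": {...}} and they are merged into the event.
        event.update(getattr(record, "fields", {}))
        return json.dumps(event, sort_keys=True)


def make_logger(name, stream):
    """Build a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    return logger
```

Because every line is a self-describing JSON object, new keys can be added later without breaking whatever parses the stream, which is the point the "Logs change!" argument makes.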

Key things to keep in mind / Rules
• Use timestamps for every event
• Use unique identifiers (IDs) like Transaction ID / User ID / Session ID, or append a unique user identification (UUID) number to track unique users.
• Log in text format, i.e. avoid logging binary information!
• Log anything that can add value when aggregated, charted, or further analyzed.
• Use categories, like “severity”: “WARN”, with levels such as INFO, WARN, ERROR, and DEBUG.
• The 80/20 Rule : 80% of our goals can be achieved with 20% of the work, so don’t log too much.
• Keep the same NTP-synced date/time and timezone on every producer and collector machine (# ntpdate ntp.example.com).
• Reliability : Like video recordings … you don’t want to lose the most valuable shot, so you record every frame and later, during analysis, throw away the rest and pick your best shot / frame. Likewise, logs as events should be recorded with proper reliability so that you don’t lose any important and usable part, like the important video frame.
• Correlation rules for the various event streams, to generate and minimize alerts/events.
• Write connectors for integrations
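A few of the rules above (timestamp on every event, at least one unique identifier, a known severity category) are mechanical enough to check automatically. The sketch below is one hypothetical way to do that; `check_event` is an illustrative helper, and the field names `transactionID` and `userID` are assumptions (only `uuid`, `sessionID`, `timestamp`, and `severity` appear in the deck's own sample message).

```python
from datetime import datetime

# Severity levels listed on the slide.
SEVERITIES = {"DEBUG", "INFO", "WARN", "ERROR"}

# uuid and sessionID come from the deck's sample event;
# transactionID and userID are assumed names for illustration.
ID_FIELDS = ("uuid", "sessionID", "transactionID", "userID")


def check_event(event):
    """Return a list of rule violations for one event dict."""
    problems = []
    try:
        datetime.fromisoformat(event["timestamp"])
    except (KeyError, TypeError, ValueError):
        problems.append("missing or unparseable timestamp")
    if not any(event.get(field) for field in ID_FIELDS):
        problems.append("no unique identifier")
    if event.get("severity") not in SEVERITIES:
        problems.append("unknown severity")
    return problems
```

A collector could run such a check at ingestion time and route non-conforming events to a separate stream rather than dropping them, in keeping with the reliability rule.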
Data Service Platform : DSP
Why do we need a data services platform ?
- Integration layer to bring data from more sources in less time
- Serve various components : applications and also monitoring systems etc.

Inputs : Data – what data to include
• Clickstream / Web Usage Data
– User Activity Logs

• Transactional Data Store
• Off-line
– CRM
– Email Behavior -> Logs/ Events

Top Architecture Considerations
• Non-blocking data ingestion
• UUID-tagged events / messages
• Load-balanced data processing across data centers
• Use of memory-based data storage for real-time data systems
• Easily scalable, highly available (HA), and easy-to-maintain storage for large historical data sets
• Data caching to achieve low latency
• To ensure business continuity, parallel processing between two different data centers
• Use of a centralized service cloud for API management, security (authentication, authorization), metering and integration
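To make "non-blocking data ingestion" concrete, here is a small in-process sketch: the application hands events to a bounded buffer and returns immediately, while a background thread forwards them to a sink. The `EventBuffer` class and its `sink` callback are assumptions for illustration; in the stack described in this talk, the sink would plausibly be a producer for a messaging system such as Kafka, but that wiring is not shown here.

```python
import queue
import threading


class EventBuffer:
    """Bounded buffer that decouples event producers from a slow sink."""

    def __init__(self, sink, maxsize=1000):
        self._queue = queue.Queue(maxsize=maxsize)
        self._sink = sink
        self.dropped = 0
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def emit(self, event):
        """Never block the caller; count the event as dropped if full."""
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def _drain(self):
        # Forward events to the sink until the None sentinel arrives.
        while True:
            event = self._queue.get()
            if event is None:
                break
            self._sink(event)

    def close(self):
        """Flush remaining events and stop the worker thread."""
        self._queue.put(None)
        self._worker.join()
```

Dropping on overflow is a deliberate simplification here and trades reliability for latency; a setup that cannot afford to lose events would more likely spill to local disk or apply back-pressure instead.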

Top level key tasks for User Activity Logging & Analysis
1. Data Collection of both Client-Side and Server-Side user activity streams
  • Tag every website visitor with a UUID, similar to the system UUIDs
  • Collect the activity streams on the BigData platform for analysis through Kafka queues & NoSQL data stores

2. Near real-time Data Processing
  • Preprocessing / Aggregations
    • Filtering etc.
  • Pattern Discovery along with the already available cooked data from point 4
    • Clustering / Classification / Association discovery / Sequence Mining

3. Rule Engine / recommendation algorithms
  • Rule Engine : Building an effective business rule engine / Correlate Events
  • Content-based filtering / Collaborative Filtering

4. Batch Processing / post-processing using the Hadoop Ecosystem
  • Analysis & storing cooked data in a NoSQL data store

5. Data Services (Web-services)
  • RESTful APIs to make the data/insights consumable through various data services

6. Reporting/Search interface & Visualization for Product Development teams and other business owners.

Data System
Let’s store everything!
• every event : Data !

Query = function (data)
• Precompute View

Layered Architecture:
• Batch Layer : Hadoop M/R
• Speed Layer : Storm NRT Computation
• Serving Layer
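The "Query = function (data)" idea with precomputed views across the three layers can be sketched in a few lines of plain Python. This is a toy, single-process illustration only: in the deck the batch layer is Hadoop M/R and the speed layer is Storm, and the function names below (`batch_view`, `speed_view`, `query`) are hypothetical.

```python
from collections import Counter


def batch_view(events):
    """Batch layer: recompute page-view counts from the full event store."""
    return Counter(e["pagename"] for e in events)


def speed_view(recent_events):
    """Speed layer: counts for events the last batch run has not seen yet."""
    return Counter(e["pagename"] for e in recent_events)


def query(page, batch, realtime):
    """Serving layer: answer a query by merging both precomputed views."""
    return batch.get(page, 0) + realtime.get(page, 0)
```

The design choice this shows is that queries never scan raw events; they read two small precomputed views, and the speed layer's view can be discarded once the next batch run has absorbed the same events.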
Clickstream / User Activities Capture : Data is -> “Events”
• Tag every website visitor with a UUID using an Apache module - Done
– https://github.com/piykumar/modified_mod_cookietrack
– Cookie : UUID like 24617072-3124-674f-4b72-675746562434.1381297617597249
• JSON messages like

{
"timestamp": "2012-12-14T02:30:18",
"facility": "clientSide",
"clientip": "123.123.123.123",
"uuid": "24617072-3124-5544-2f61-695256432432.1379399183414528",
"domain": "www.example.com",
"server": "abc-123",
"request": "/page/request",
"pagename": "funnel:example com:page1",
"searchKey": "1234567890_",
"sessionID": "11111111111111",
"event1": "loading",
"event2": "interstitial display banner",
"severity": "WARN",
"short_message": "....meaning short message for aggregation...",
"full_message": "full LOG message",
"userAgent": "...blah...blah..blah...",
"RT": 2
}
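On the consuming side, the tracking cookie value can be split into its two parts. The sketch below assumes that the numeric suffix after the dot is a Unix timestamp in microseconds; that is an inference from the magnitude of the sample values, not something the slide states, and `split_tracking_cookie` is a hypothetical helper rather than part of the Apache module.

```python
from datetime import datetime, timezone


def split_tracking_cookie(value):
    """Split 'visitor-id.suffix' into (visitor_id, first_seen or None).

    Assumption: the suffix is microseconds since the Unix epoch.
    """
    visitor_id, _, suffix = value.partition(".")
    first_seen = None
    if suffix.isdigit():
        # Integer division drops the sub-second part and avoids float rounding.
        first_seen = datetime.fromtimestamp(
            int(suffix) // 1_000_000, tz=timezone.utc
        )
    return visitor_id, first_seen
```

The visitor ID is treated as an opaque string here, since the slide does not say whether it conforms to any particular UUID version.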

Tools Arsenal
• ETL : Talend
• BI : SpagoBI & QlikView
• Hadoop : Hortonworks
• NRT Computation : Twitter Storm
• Document-Oriented NoSQL DB : Couchbase
• Distributed Search : ElasticSearch
• Log Collection : Flume, Logstash, Syslog-NG
• Distributed messaging system : Kafka, RabbitMQ
• NoSQL : Cassandra, Redis, Neo4J (Graph)
• API Management : WSO2 API Manager, 3Scale / Nginx
• Programming Languages : Java, Python, R

API Management and Data Services Cloud
• 3Scale / Nginx, WSO2 API Manager etc.
– A centralized, distributed repository to serve APIs, providing throttling, metering, security features etc.

• Make building a data services layer part of the culture, and make sure that whatever components you create can be chained into the pipeline or called independently.

Thanks!
Questions – If Any !

~/Piyush
@piykumar
http://piyush.me

Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 

Recently uploaded (20)

The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 

Importance of ‘Centralized Event collection’ and BigData platform for Analysis !

  • 1. Importance of ‘Centralized Event collection’ and BigData platform for Analysis ! DevOpsDays India, Bangalore - 2013 ~/Piyush Manager, Website Operations at MakeMyTrip
  • 2. What to expect:
      • MakeMyTrip data challenges!
      • Event Data a.k.a. Logs & Log Analysis
      • Why Centralized Logging … for systems and applications!
      • Capturing Events: Why structured data emitted from apps for machines is a better approach!
      • Data Service Platform : DSP – Why?
      • Inputs: Data for DSP
      • Top Architecture Considerations
      • Top-level key tasks
      • Tools Arsenal, API Management and Service Cloud
  • 3. MakeMyTrip data challenges …!
      • Multi-DC/colocation setup
      • Different types of data sources: internal/external (structured, semi-structured, unstructured)
          – Online Transaction Data Store
          – ERP
          – CRM
              • Email Behavior / Survey results
          – Web Analytics
          – Logs
              • Web
              • Application
              • User Activity logs
          – Social Media
          – Inventory / Catalog
          – Data residing in Excel files
          – Monitoring Metric Data:
              • Graphite (time-series Whisper)
              • Splunk, ElasticSearch (Logstash)
          – Many other different sources
      • Storing and Analyzing Huge Event Data!
  • 4. Some challenges …!
      • Aggregate web usage data and transactional data to generate one view
      • Process multiple GBs to TBs of data every day
      • Serve more than a million data services API requests per day
      • Ensure business continuity as reliance on MyDSP increases
      • Store terabytes of historical data
      • Mesh transactional (online and offline) data with consumer behavior and derive analytics
      • Build a flexible data ingestion platform to manage many data feeds from multiple data sources
  • 5. Flow of an Event
  • 6. Event Data a.k.a. Logs
      • Event Data -> a set of chronologically sequenced data records that capture information about an event!
      • Virtually every form of system produces event data
          – Capture it from all components, including both client-side and server-side events!
      • You may think of logs as the footprint generated by any activity with the system/app.
      • Event Data has different characteristics from data stored in traditional data warehouses
          – Huge Volume: Event data accumulates rapidly and often must be stored for years; many organizations are managing hundreds of terabytes and some are managing petabytes.
          – Format: Because of the huge variety of sources, event data is unstructured or semi-structured.
          – Velocity: New event data is constantly coming in.
          – Collection: Event data is difficult to collect because of broadly dispersed systems and networks.
          – Time-stamped: Event data is always inserted once with a time-stamp. It never changes.
  • 7. Log Analysis
      • Logs are one of the most useful things when it comes to analysis; in simple terms, log analysis is making sense out of system/app-generated log messages (or just LOGS). Through logs we get insights into what is happening in the system.
      • Helps the root cause analysis that follows any incident.
      • Personalize the user experience by analyzing web usage data.
      “Security Req”:
      • Traditionally, some compliance requirements too: Log Management / SEM + SIM => SIEM
      • For Data Security: to have one centralized platform for collecting ALL events (logs), correlate them and have real-time intelligent visibility.
      • To monitor not just the network, OS, devices etc. but ALL applications and business processes too.
  • 8. Why Centralized Logging … for systems and applications!
      • The need for Centralized Logging is quite important nowadays due to:
          – growth in the number of applications,
          – distributed architecture (Service Oriented Architecture),
          – Cloud-based apps,
          – the number of machines and infrastructure size increasing day by day.
      • This means that centralized logging and the ability to spot errors in distributed systems & applications have become even more “valuable” & “needed”.
      • And most importantly:
          – be able to understand the customers and how they interact with websites;
          – Understanding Change: whether using A/B or Multivariate experiments, or to tweak / understand new implementations.
  • 9. Capturing Events: Why structured data emitted from apps for machines is a better approach!
      • Need for standardization:
          – Developers assume that the first-level consumer of a log message is a human, and that only they know what information is needed to debug an issue.
          – Logs are not just for humans! The primary consumers of logs are shifting from humans to computers. This means log formats should have a well-defined structure that can be parsed easily and robustly.
          – Logs change! If the logs never changed, writing a custom parser might not be too terrible. The engineer would write it once and be done. But in reality, logs change. Every time you add a feature, you start logging more data, and as you add more data, the printf-style format inevitably changes. This implies that the custom parser has to be updated constantly, consuming valuable development time.
      • Suggested Approach: “Logging in JSON Format”
          – To keep it simple and generic for any application, the recommended approach is a {Key: Value} JSON log format (structured/semi-structured).
          – This approach makes parsing and consumption easy, irrespective of whatever technology/tools we choose to use!
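As an illustration of the “Logging in JSON Format” suggestion, here is a minimal sketch using only the Python standard library. The names `JsonFormatter`, `build_logger` and the `extra={"fields": ...}` convention are assumptions made for this example and are not from the talk; the key names mirror the sample message shown later in the deck. Note that Python's `logging` reports the level as `WARNING` rather than `WARN`.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        event = {
            # Every event carries a timestamp (UTC, ISO 8601).
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "severity": record.levelname,
            "logger": record.name,
            "short_message": record.getMessage(),
        }
        # Hypothetical convention: callers pass extra key/value pairs
        # through logging's `extra={"fields": {...}}` argument.
        event.update(getattr(record, "fields", {}))
        return json.dumps(event, sort_keys=True)


def build_logger(name="app", stream=None):
    """Return a logger whose only handler emits JSON lines to `stream`."""
    handler = logging.StreamHandler(stream)  # defaults to sys.stderr
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger()
    log.warning(
        "page load slow",
        extra={"fields": {"sessionID": "11111111111111", "RT": 2}},
    )
```

Because new keys are simply added to the dictionary, adding a field does not change the line format, so a downstream consumer does not need a new parser each time the application logs more data.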
  • 10. Key things to keep in mind / Rules
      • Use timestamps for every event.
      • Use unique identifiers (IDs) like Transaction ID / User ID / Session ID, or append a unique user identification (UUID) number to track unique users.
      • Log in text format; avoid logging binary information!
      • Log anything that can add value when aggregated, charted, or further analyzed.
      • Use categories, e.g. “severity”: “WARN”, with levels such as INFO, WARN, ERROR, and DEBUG.
      • The 80/20 Rule: 80% of our goals can be achieved with 20% of the work, so don’t log too much.
      • Keep the same NTP-synced date, time and timezone on every producer and collector machine (# ntpdate ntp.example.com).
      • Reliability: Like video recordings … you don’t want to lose the most valuable shot … so you record every frame and later, during analysis, you may throw away the rest, picking your best shot / frame. Here too, logs as events should be recorded with proper reliability so that you don’t lose any important and usable part of them, like the important video frame.
      • Correlation rules for various event streams, to generate and minimize alerts/events.
      • Write connectors for integrations.
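A few of the rules above (timestamp on every event, at least one unique identifier, a known severity category, no binary values) can be checked mechanically. The following is a rough sketch only; the function name `check_event`, the list of identifier keys and the timestamp layout are assumptions chosen to match the sample message later in the deck.

```python
from datetime import datetime

# Severity categories named on the slide.
ALLOWED_SEVERITIES = {"DEBUG", "INFO", "WARN", "ERROR"}

# Hypothetical key names for the identifiers the rules mention.
ID_FIELDS = ("transactionID", "userID", "sessionID", "uuid")


def check_event(event):
    """Return a list of rule violations for one event dict (empty if none)."""
    problems = []

    # Rule: use timestamps for every event.
    try:
        datetime.strptime(event.get("timestamp"), "%Y-%m-%dT%H:%M:%S")
    except (TypeError, ValueError):
        problems.append("missing or malformed timestamp")

    # Rule: use unique identifiers so events can be correlated.
    if not any(event.get(field) for field in ID_FIELDS):
        problems.append("no unique identifier")

    # Rule: use categories such as severity.
    if event.get("severity") not in ALLOWED_SEVERITIES:
        problems.append("unknown severity")

    # Rule: log in text format, avoid binary information.
    if any(isinstance(value, (bytes, bytearray)) for value in event.values()):
        problems.append("binary value")

    return problems


if __name__ == "__main__":
    sample = {
        "timestamp": "2012-12-14T02:30:18",
        "sessionID": "11111111111111",
        "severity": "WARN",
    }
    print(check_event(sample))
```

A check like this could run in the collector so that malformed events are flagged early instead of surfacing later during analysis.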
  • 11. Data Service Platform : DSP
      Why do we need a data services platform?
      – Integration layer to bring data from more sources in less time
      – Serve various components: applications and also monitoring systems etc.
  • 12. Inputs : Data – what data to include
      • Clickstream / Web Usage Data
          – User Activity Logs
      • Transactional Data Store
      • Off-line
          – CRM
          – Email Behavior -> Logs / Events
  • 13. Top Architecture Considerations
      • Non-blocking data ingestion
      • UUID-tagged events / messages
      • Load-balanced data processing across data centers
      • Use of memory-based data storage for real-time data systems
      • Easily scalable, highly available (HA) and easy to maintain for large historical data sets
      • Data caching to achieve low latency
      • To ensure business continuity, parallel processing between two different data centers
      • Use of a centralized service cloud for API management, security (authentication, authorization), metering and integration
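The first two considerations (non-blocking ingestion, UUID-tagged events) can be sketched with a bounded in-memory queue and a background worker. This is an illustrative sketch, not the platform's implementation: the class `EventIngestor`, the `event_id` key and the drop-on-full policy are all assumptions. Dropping is only the simplest way to keep the producer from blocking; a setup that must not lose events would more likely spill to local disk or a durable message broker instead.

```python
import queue
import threading
import uuid


class EventIngestor:
    """Accept events without blocking the caller; deliver them on a worker thread."""

    def __init__(self, sink, max_pending=1000):
        self._queue = queue.Queue(maxsize=max_pending)
        self._sink = sink  # callable that ships one event downstream
        self.dropped = 0   # events rejected because the buffer was full
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def submit(self, event):
        """Tag the event with a UUID and enqueue it; never blocks the caller."""
        tagged = dict(event, event_id=str(uuid.uuid4()))
        try:
            self._queue.put_nowait(tagged)
            return True
        except queue.Full:
            # Assumed policy for this sketch: count and drop.
            self.dropped += 1
            return False

    def _drain(self):
        while True:
            event = self._queue.get()
            if event is None:  # sentinel pushed by close()
                break
            self._sink(event)

    def close(self):
        """Flush everything already queued, then stop the worker."""
        self._queue.put(None)
        self._worker.join()


if __name__ == "__main__":
    ingestor = EventIngestor(print)
    ingestor.submit({"event1": "loading", "request": "/page/request"})
    ingestor.close()
```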
  • 14. Top-level key tasks for User Activity Logging & Analysis
      1. Data collection of both client-side and server-side user activity streams
          • Tag every website visitor with a UUID, similar to the system UUIDs
          • Collect the activity streams on the BigData platform for analysis through Kafka queues & NoSQL data stores
      2. Near real-time data processing
          • Preprocessing / aggregations
          • Filtering etc.
          • Pattern discovery along with the already available cooked data from point 4
          • Clustering / classification / association discovery / sequence mining
      3. Rule engine / recommendation algorithms
          • Rule engine: building an effective business rule engine / correlating events
          • Content-based filtering / collaborative filtering
      4. Batch processing / post-processing using the Hadoop ecosystem
          • Analysis & storing cooked data in a NoSQL data store
      5. Data Services (web services)
          • RESTful APIs to make the data/insights consumable through various data services
      6. Reporting/search interface & visualization for product development teams and other business owners.
  • 15. Data System: Let’s store everything! Query = function(data)
      Layered Architecture:
      • Every event: Data!
      • Precompute View
      • Batch Layer: Hadoop M/R
      • Speed Layer: Storm NRT Computation
      • Serving Layer
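The “Query = function(data)” idea with batch, speed and serving layers can be illustrated in a few lines. This toy sketch replaces Hadoop M/R and Storm with plain in-memory counters; the names `batch_view`, `SpeedView` and `query` are assumptions made for the example, and page-view counting is just one possible query.

```python
from collections import Counter


def batch_view(master_events):
    """Batch layer: recompute page-view counts from the full, immutable event log."""
    return Counter(event["request"] for event in master_events)


class SpeedView:
    """Speed layer: incrementally count events that arrived after the last batch run."""

    def __init__(self):
        self.counts = Counter()

    def on_event(self, event):
        self.counts[event["request"]] += 1


def query(page, batch, speed):
    """Serving layer: merge the precomputed batch view with the real-time view."""
    return batch[page] + speed.counts[page]


if __name__ == "__main__":
    history = [{"request": "/a"}, {"request": "/a"}, {"request": "/b"}]
    batch = batch_view(history)       # slow, complete, recomputed periodically
    speed = SpeedView()
    speed.on_event({"request": "/a"})  # fresh event not yet in the batch view
    print(query("/a", batch, speed))
```

Because the batch view is always recomputed from the stored raw events, a bug in the speed layer is corrected the next time the batch run catches up, which is one reason for storing everything.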
  • 16. DevOpsDays India 2013 : ~/Piyush
  • 17. Clickstream / User Activities Capture : Data is -> “Events”
      • Tag every website visitor with a UUID using an Apache module - Done
          – https://github.com/piykumar/modified_mod_cookietrack
          – Cookie : UUID like 24617072-3124-674f-4b72-675746562434.1381297617597249
      • JSON messages like
          {
            "timestamp": "2012-12-14T02:30:18",
            "facility": "clientSide",
            "clientip": "123.123.123.123",
            "uuid": "24617072-3124-5544-2f61-695256432432.1379399183414528",
            "domain": "www.example.com",
            "server": "abc-123",
            "request": "/page/request",
            "pagename": "funnel:example com:page1",
            "searchKey": "1234567890_",
            "sessionID": "11111111111111",
            "event1": "loading",
            "event2": "interstitial display banner",
            "severity": "WARN",
            "short_message": "....meaning short message for aggregation...",
            "full_message": "full LOG message",
            "userAgent": "...blah...blah..blah...",
            "RT": 2
          }
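A consumer of these events usually needs to split the cookie value into its parts. The sketch below assumes, based only on the shape of the sample value, that the cookie is a UUID followed by a dot and a numeric suffix, and that the suffix is a count of microseconds since the Unix epoch. Neither assumption is confirmed by the slides or checked against the module's source, and `parse_tracking_cookie` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone


def parse_tracking_cookie(value):
    """Split '<uuid>.<digits>' into (visitor_id, issued_at)."""
    visitor_id, _, suffix = value.rpartition(".")
    if not visitor_id or not suffix.isdigit():
        raise ValueError("unexpected cookie format: %r" % value)

    # ASSUMPTION: the numeric suffix is microseconds since the Unix epoch.
    seconds, micros = divmod(int(suffix), 1_000_000)
    issued_at = datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(
        microseconds=micros
    )
    return visitor_id, issued_at


if __name__ == "__main__":
    cookie = "24617072-3124-674f-4b72-675746562434.1381297617597249"
    visitor_id, issued_at = parse_tracking_cookie(cookie)
    print(visitor_id, issued_at.isoformat())
```

Under that assumption the sample cookie decodes to a date in October 2013, which is at least consistent with when the talk was given.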
  • 18. Tools Arsenal
      • ETL: Talend
      • BI: SpagoBI & QlikView
      • Hadoop: Hortonworks
      • NRT Computation: Twitter Storm
      • Document-Oriented NoSQL DB: Couchbase
      • Distributed Search: ElasticSearch
      • Log Collection: Flume, Logstash, Syslog-NG
      • Distributed messaging system: Kafka, RabbitMQ
      • NoSQL: Cassandra, Redis, Neo4J (Graph)
      • API Management: WSO2 API Manager, 3Scale / Nginx
      • Programming Languages: Java, Python, R
  • 19. API Management and Data Services Cloud
      • 3Scale / Nginx, WSO2 API Manager etc.
          – A centralized, distributed repository to serve APIs that provides throttling, metering, security features etc.
      • Instill the building of a data services layer into the culture, and make sure that whatever components you create can be chained into the pipeline or called independently.
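Throttling and metering are provided by the API management products named above; to show what those two features amount to, here is a small token-bucket sketch. It does not reflect how 3Scale, Nginx or WSO2 implement them, and the class `TokenBucket` and its parameters are assumptions for illustration.

```python
import time


class TokenBucket:
    """Allow up to `burst` requests at once, refilled at `rate_per_sec`."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = float(rate_per_sec)
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.clock = clock   # injectable so the behaviour can be tested
        self.last = clock()
        self.served = 0      # metering: requests let through
        self.rejected = 0    # metering: requests throttled

    def allow(self):
        """Return True if the request may proceed, False if it is throttled."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            self.served += 1
            return True
        self.rejected += 1
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate_per_sec=5, burst=10)
    results = [bucket.allow() for _ in range(12)]
    print(results.count(True), "served,", results.count(False), "throttled")
```

In a real deployment there would typically be one bucket per API key, with the `served` and `rejected` counters feeding usage reports.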
  • 20. Thanks! Questions – If Any! ~/Piyush @piykumar http://piyush.me