SlideShare a Scribd company logo
1 of 26
building a system for
event-oriented data
esammer@scalingdata.com / @esammer
cto @ scalingdata
“…try to fill the spaces
in between.”[1]
[1] obligatory htda reference
what we do
we build a system for the operation of modern data
centers
triage and diagnostics, exploration, trends,
advanced analytics of complex systems
our data: logs, metrics, human activity, anything
that occurs in the data center
“enterprise software” (i.e. we build for others.)
our typical customer
>1TB/hr (100s of millions of events and up), sub-second end to
end latency, full fidelity retention, critical use cases
quality of service - “are credit card transactions happening fast
enough?”
fraud detection - “detect, investigate, prosecute, and learn from
fraud.”
forensic diagnostics - “what really caused the outage last friday?”
security - “who’s doing what, where, when, why, and how, and is
that ok?”
overview
guarantees
no single point of failure exists
all components scale horizontally[1]
data retention and latency is a function of cost, not tech[1]
every event is delivered if r - 1 kafka brokers are, or can
be made, available (unless you do silly stuff)
all operations, including upgrade, are online[2]
every event is delivered exactly once[2]
[1] we’re positive there’s a limit, but thus far it has been cost.
[2] from the user’s perspective, at a system level.
modeling our world
everything is an event
each event contains a timestamp, type, location,
host, service, body, and type-specific attributes
(k/v pairs)
build aggregates as necessary - just optimized
views of the data
event schema
{
ts: long,
event_type_id: int,
location: string,
host: string,
service: string,
body: [ null, bytes ],
attributes: map<string>
}
event types
some event types are standard
syslog, http, log4j, generic text record, …
users define custom event types
producer populates event type
event type metadata tells downstream systems how
to interpret body and attributes
ex: generic syslog event
event_type_id: 100, // rfc3164, rfc5424 (syslog)
body: … // raw syslog message bytes
attributes: {
syslog_message: “DHCPACK from 10.10.0.1 (xid=0x45b63bdc)”,
syslog_severity: “6”, // info severity
syslog_facility: “3”, // daemon facility
syslog_process: “dhclient”,
syslog_pid: “668”,
…
}
Generic fields omitted
ex: generic http event
event_type_id: 102, // generic http event
body: … // raw http log message bytes
attributes: {
http_req_method: “GET”,
http_req_vhost: “w2a-demo-02”,
http_req_url: “/api/v1/search?q=service%3Asshd&p=1&s=200”,
http_req_query: “q=service%3Asshd&p=1&s=200”,
http_resp_code: “200”,
…
}
Generic fields omitted
consumers
…do most of the work
parallelism
kafka offset management
message de-duplication
transformation (embedded library)
downstream system knowledge
consumers
…do most of the work
parallelism
kafka offset management
message de-duplication
transformation (embedded library)
downstream system knowledge
inside a consumer
aggregation
mostly for metrics, just a specialized consumer
reduce(a: A, b: A): B
all functions are commutative and associative
spark streaming :(
(dimensions) => (aggregates)
ex: service volume by min
dimensions: yyyy, mm, dd, hh, mi, host, service
aggregates: count(1)
2014, 01, 01, 00, 00, w2a-demo-1, sshd => 17
SELECT year, month, day, hour, minute,
host, service, count(1) FROM events
GROUP BY year, month, day, hour,
minute, host, service
fancy math stuff
anomalous event detection
event clustering
building the models: micro-batches in spark
classification/scoring: streaming process
application
picks the query engine based on operation
graphs and metrics = sql (impala)
event search = solr
soon: fancier realtime stuff (web sockets -> kafka)
reprocessing data
sometimes you get it wrong
include “kafka coordinates” in data
topic, partition, offset
to rebuild data: 1. pick a point in time, 2. find
coordinates at t - 1, 3. delete and replay from there.
reprocessing data
“you can’t (always) go home” - kafka retention is
limited. not a persistent store.
solution: retain a raw+ dataset from which you can
rebuild
raw+: raw event data. enrichments are allowed, but
no destructive operations (masking, filtering, etc.)
that would preclude reprocessing.
oh the inefficiency!
kafka coordinates in data
raw+ dataset
materialized aggregates
searchable version in solr
oh the… reality?
solr =~ multiple rdbms indexes
aggregates =~ materialized views
kafka coordinate storage is largely erased by smart
hdfs storage formats (i.e. parquet)
pax structured, dictionary encoding, RLE, bit
packing, generic compression, inline metadata
extension
custom producers
custom consumers
event types
parsers / transformations
cube definitions and aggregate functions
custom processing jobs on landed data
pain
lots of tradeoffs when picking a stream processing solution
samza: right features, but low level programming model, not
supported by vendors. missing security features.
storm: too rigid, too slow. not supported by all vendors.
spark streaming: where do i even start? tons of issues, but has
all the community energy. our current bet, for better or worse.
stack complexity
relative (im)maturity
if you’re going to do this
read all the literature on stream processing[1]
understand, make, and provide guarantees
find the right abstractions
never trust the hand waving or “hello worlds”
fully evaluate the projects in this space
[1] wait, like all of it? yea, like all of it.
thanks <3
questions?
esammer@scalingdata.com / @esammer
http://www.scalingdata.com
(hiring)

More Related Content

What's hot

The State of Stream Processing
The State of Stream ProcessingThe State of Stream Processing
The State of Stream Processing
confluent
 
Building Reactive Fast Data & the Data Lake with Akka, Kafka, Spark
Building Reactive Fast Data & the Data Lake with Akka, Kafka, SparkBuilding Reactive Fast Data & the Data Lake with Akka, Kafka, Spark
Building Reactive Fast Data & the Data Lake with Akka, Kafka, Spark
Todd Fritz
 

What's hot (20)

Reactive Streams 1.0.0 and Why You Should Care (webinar)
Reactive Streams 1.0.0 and Why You Should Care (webinar)Reactive Streams 1.0.0 and Why You Should Care (webinar)
Reactive Streams 1.0.0 and Why You Should Care (webinar)
 
Azure + DataStax Enterprise Powers Office 365 Per User Store
Azure + DataStax Enterprise Powers Office 365 Per User StoreAzure + DataStax Enterprise Powers Office 365 Per User Store
Azure + DataStax Enterprise Powers Office 365 Per User Store
 
The State of Stream Processing
The State of Stream ProcessingThe State of Stream Processing
The State of Stream Processing
 
Always On: Building Highly Available Applications on Cassandra
Always On: Building Highly Available Applications on CassandraAlways On: Building Highly Available Applications on Cassandra
Always On: Building Highly Available Applications on Cassandra
 
Cassandra & Spark for IoT
Cassandra & Spark for IoTCassandra & Spark for IoT
Cassandra & Spark for IoT
 
Building Reactive Fast Data & the Data Lake with Akka, Kafka, Spark
Building Reactive Fast Data & the Data Lake with Akka, Kafka, SparkBuilding Reactive Fast Data & the Data Lake with Akka, Kafka, Spark
Building Reactive Fast Data & the Data Lake with Akka, Kafka, Spark
 
Webinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in ProductionWebinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in Production
 
Proofpoint: Fraud Detection and Security on Social Media
Proofpoint: Fraud Detection and Security on Social MediaProofpoint: Fraud Detection and Security on Social Media
Proofpoint: Fraud Detection and Security on Social Media
 
Stream processing - Apache flink
Stream processing - Apache flinkStream processing - Apache flink
Stream processing - Apache flink
 
New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015
 
netflix-real-time-data-strata-talk
netflix-real-time-data-strata-talknetflix-real-time-data-strata-talk
netflix-real-time-data-strata-talk
 
British Gas Connected Homes: Data Engineering
British Gas Connected Homes: Data EngineeringBritish Gas Connected Homes: Data Engineering
British Gas Connected Homes: Data Engineering
 
Using the SDACK Architecture to Build a Big Data Product
Using the SDACK Architecture to Build a Big Data ProductUsing the SDACK Architecture to Build a Big Data Product
Using the SDACK Architecture to Build a Big Data Product
 
Using Riak for Events storage and analysis at Booking.com
Using Riak for Events storage and analysis at Booking.comUsing Riak for Events storage and analysis at Booking.com
Using Riak for Events storage and analysis at Booking.com
 
Data Streaming Ecosystem Management at Booking.com
Data Streaming Ecosystem Management at Booking.com Data Streaming Ecosystem Management at Booking.com
Data Streaming Ecosystem Management at Booking.com
 
Developing a Real-time Engine with Akka, Cassandra, and Spray
Developing a Real-time Engine with Akka, Cassandra, and SprayDeveloping a Real-time Engine with Akka, Cassandra, and Spray
Developing a Real-time Engine with Akka, Cassandra, and Spray
 
Macy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-FlightMacy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-Flight
 
Perfecting Your Streaming Skills with Spark and Real World IoT Data
Perfecting Your Streaming Skills with Spark and Real World IoT DataPerfecting Your Streaming Skills with Spark and Real World IoT Data
Perfecting Your Streaming Skills with Spark and Real World IoT Data
 
Capital One: Using Cassandra In Building A Reporting Platform
Capital One: Using Cassandra In Building A Reporting PlatformCapital One: Using Cassandra In Building A Reporting Platform
Capital One: Using Cassandra In Building A Reporting Platform
 
Event Stream Processing with Kafka and Samza
Event Stream Processing with Kafka and SamzaEvent Stream Processing with Kafka and Samza
Event Stream Processing with Kafka and Samza
 

Similar to from source to solution - building a system for event-oriented data

How We Analyzed 1000 Dumps in One Day - Dina Goldshtein, Brightsource - DevOp...
How We Analyzed 1000 Dumps in One Day - Dina Goldshtein, Brightsource - DevOp...How We Analyzed 1000 Dumps in One Day - Dina Goldshtein, Brightsource - DevOp...
How We Analyzed 1000 Dumps in One Day - Dina Goldshtein, Brightsource - DevOp...
DevOpsDays Tel Aviv
 
Building an Event-oriented Data Platform with Kafka, Eric Sammer
Building an Event-oriented Data Platform with Kafka, Eric Sammer Building an Event-oriented Data Platform with Kafka, Eric Sammer
Building an Event-oriented Data Platform with Kafka, Eric Sammer
confluent
 
Robert Pankowecki - Czy sprzedawcy SQLowych baz nas oszukali?
Robert Pankowecki - Czy sprzedawcy SQLowych baz nas oszukali?Robert Pankowecki - Czy sprzedawcy SQLowych baz nas oszukali?
Robert Pankowecki - Czy sprzedawcy SQLowych baz nas oszukali?
SegFaultConf
 
Windows Remote Management - EN
Windows Remote Management - ENWindows Remote Management - EN
Windows Remote Management - EN
Kirill Nikolaev
 
sector-sphere
sector-spheresector-sphere
sector-sphere
xlight
 

Similar to from source to solution - building a system for event-oriented data (20)

Building a system for machine and event-oriented data with Rocana
Building a system for machine and event-oriented data with RocanaBuilding a system for machine and event-oriented data with Rocana
Building a system for machine and event-oriented data with Rocana
 
Dynamic Population Discovery for Lateral Movement (Using Machine Learning)
Dynamic Population Discovery for Lateral Movement (Using Machine Learning)Dynamic Population Discovery for Lateral Movement (Using Machine Learning)
Dynamic Population Discovery for Lateral Movement (Using Machine Learning)
 
Hacktive Directory Forensics - HackCon18, Oslo
Hacktive Directory Forensics - HackCon18, OsloHacktive Directory Forensics - HackCon18, Oslo
Hacktive Directory Forensics - HackCon18, Oslo
 
Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...
Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...
Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...
 
SHOWDOWN: Threat Stack vs. Red Hat AuditD
SHOWDOWN: Threat Stack vs. Red Hat AuditDSHOWDOWN: Threat Stack vs. Red Hat AuditD
SHOWDOWN: Threat Stack vs. Red Hat AuditD
 
How We Analyzed 1000 Dumps in One Day - Dina Goldshtein, Brightsource - DevOp...
How We Analyzed 1000 Dumps in One Day - Dina Goldshtein, Brightsource - DevOp...How We Analyzed 1000 Dumps in One Day - Dina Goldshtein, Brightsource - DevOp...
How We Analyzed 1000 Dumps in One Day - Dina Goldshtein, Brightsource - DevOp...
 
Aggregated queries with Druid on terrabytes and petabytes of data
Aggregated queries with Druid on terrabytes and petabytes of dataAggregated queries with Druid on terrabytes and petabytes of data
Aggregated queries with Druid on terrabytes and petabytes of data
 
Building an Event-oriented Data Platform with Kafka, Eric Sammer
Building an Event-oriented Data Platform with Kafka, Eric Sammer Building an Event-oriented Data Platform with Kafka, Eric Sammer
Building an Event-oriented Data Platform with Kafka, Eric Sammer
 
Open source Tools and Frameworks for M2M - Sierra Wireless Developer Days
Open source Tools and Frameworks for M2M - Sierra Wireless Developer DaysOpen source Tools and Frameworks for M2M - Sierra Wireless Developer Days
Open source Tools and Frameworks for M2M - Sierra Wireless Developer Days
 
Apache Drill: An Active, Ad-hoc Query System for large-scale Data Sets
Apache Drill: An Active, Ad-hoc Query System for large-scale Data SetsApache Drill: An Active, Ad-hoc Query System for large-scale Data Sets
Apache Drill: An Active, Ad-hoc Query System for large-scale Data Sets
 
FlinkForward Asia 2019 - Evolving Keystone to an Open Collaborative Real Time...
FlinkForward Asia 2019 - Evolving Keystone to an Open Collaborative Real Time...FlinkForward Asia 2019 - Evolving Keystone to an Open Collaborative Real Time...
FlinkForward Asia 2019 - Evolving Keystone to an Open Collaborative Real Time...
 
Strata Presentation: One Billion Objects in 2GB: Big Data Analytics on Small ...
Strata Presentation: One Billion Objects in 2GB: Big Data Analytics on Small ...Strata Presentation: One Billion Objects in 2GB: Big Data Analytics on Small ...
Strata Presentation: One Billion Objects in 2GB: Big Data Analytics on Small ...
 
Robert Pankowecki - Czy sprzedawcy SQLowych baz nas oszukali?
Robert Pankowecki - Czy sprzedawcy SQLowych baz nas oszukali?Robert Pankowecki - Czy sprzedawcy SQLowych baz nas oszukali?
Robert Pankowecki - Czy sprzedawcy SQLowych baz nas oszukali?
 
Windows Remote Management - EN
Windows Remote Management - ENWindows Remote Management - EN
Windows Remote Management - EN
 
Anatomy of an action
Anatomy of an actionAnatomy of an action
Anatomy of an action
 
Remote Log Analytics Using DDS, ELK, and RxJS
Remote Log Analytics Using DDS, ELK, and RxJSRemote Log Analytics Using DDS, ELK, and RxJS
Remote Log Analytics Using DDS, ELK, and RxJS
 
Sector Sphere 2009
Sector Sphere 2009Sector Sphere 2009
Sector Sphere 2009
 
sector-sphere
sector-spheresector-sphere
sector-sphere
 
ISSA Siem Fraud
ISSA Siem FraudISSA Siem Fraud
ISSA Siem Fraud
 
Berlin Buzz Words - Apache Drill by Ted Dunning & Michael Hausenblas
Berlin Buzz Words - Apache Drill by Ted Dunning & Michael HausenblasBerlin Buzz Words - Apache Drill by Ted Dunning & Michael Hausenblas
Berlin Buzz Words - Apache Drill by Ted Dunning & Michael Hausenblas
 

Recently uploaded

Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
amitlee9823
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
amitlee9823
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
amitlee9823
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
amitlee9823
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
amitlee9823
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
amitlee9823
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
JoseMangaJr1
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 

Recently uploaded (20)

Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning Approach
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 

from source to solution - building a system for event-oriented data

  • 1. building a system for event-oriented data esammer@scalingdata.com / @esammer cto @ scalingdata
  • 2. “…try to fill the spaces in between.”[1] [1] obligatory htda reference
  • 3. what we do we build a system for the operation of modern data centers triage and diagnostics, exploration, trends, advanced analytics of complex systems our data: logs, metrics, human activity, anything that occurs in the data center “enterprise software” (i.e. we build for others.)
  • 4. our typical customer >1TB/hr (100s of millions of events and up), sub-second end to end latency, full fidelity retention, critical use cases quality of service - “are credit card transactions happening fast enough?” fraud detection - “detect, investigate, prosecute, and learn from fraud.” forensic diagnostics - “what really caused the outage last friday?” security - “who’s doing what, where, when, why, and how, and is that ok?”
  • 6. guarantees no single point of failure exists all components scale horizontally[1] data retention and latency is a function of cost, not tech[1] every event is delivered if r - 1 kafka brokers are, or can be made, available (unless you do silly stuff) all operations, including upgrade, are online[2] every event is delivered exactly once[2] [1] we’re positive there’s a limit, but thus far it has been cost. [2] from the user’s perspective, at a system level.
  • 7. modeling our world everything is an event each event contains a timestamp, type, location, host, service, body, and type-specific attributes (k/v pairs) build aggregates as necessary - just optimized views of the data
  • 8. event schema { ts: long, event_type_id: int, location: string, host: string, service: string, body: [ null, bytes ], attributes: map<string> }
  • 9. event types some event types are standard syslog, http, log4j, generic text record, … users define custom event types producer populates event type event type metadata tells downstream systems how to interpret body and attributes
  • 10. ex: generic syslog event event_type_id: 100, // rfc3164, rfc5424 (syslog) body: … // raw syslog message bytes attributes: { syslog_message: “DHCPACK from 10.10.0.1 (xid=0x45b63bdc)”, syslog_severity: “6”, // info severity syslog_facility: “3”, // daemon facility syslog_process: “dhclient”, syslog_pid: “668”, … } Generic fields omitted
  • 11. ex: generic http event event_type_id: 102, // generic http event body: … // raw http log message bytes attributes: { http_req_method: “GET”, http_req_vhost: “w2a-demo-02”, http_req_url: “/api/v1/search?q=service%3Asshd&p=1&s=200”, http_req_query: “q=service%3Asshd&p=1&s=200”, http_resp_code: “200”, … } Generic fields omitted
  • 12. consumers …do most of the work parallelism kafka offset management message de-duplication transformation (embedded library) downstream system knowledge
  • 13. consumers …do most of the work parallelism kafka offset management message de-duplication transformation (embedded library) downstream system knowledge
  • 15. aggregation mostly for metrics, just a specialized consumer reduce(a: A, b: A): B all functions are commutative and associative spark streaming :( (dimensions) => (aggregates)
  • 16. ex: service volume by min dimensions: yyyy, mm, dd, hh, mi, host, service aggregates: count(1) 2014, 01, 01, 00, 00, w2a-demo-1, sshd => 17 SELECT year, month, day, hour, minute, host, service, count(1) FROM events GROUP BY year, month, day, hour, minute, host, service
  • 17. fancy math stuff anomalous event detection event clustering building the models: micro-batches in spark classification/scoring: streaming process
  • 18. application picks the query engine based on operation graphs and metrics = sql (impala) event search = solr soon: fancier realtime stuff (web sockets -> kafka)
  • 19. reprocessing data sometimes you get it wrong include “kafka coordinates” in data topic, partition, offset to rebuild data: 1. pick a point in time, 2. find coordinates at t - 1, 3. delete and replay from there.
  • 20. reprocessing data “you can’t (always) go home” - kafka retention is limited. not a persistent store. solution: retain a raw+ dataset from which you can rebuild raw+: raw event data. enrichments are allowed, but no destructive operations (masking, filtering, etc.) that would preclude reprocessing.
  • 21. oh the inefficiency! kafka coordinates in data raw+ dataset materialized aggregates searchable version in solr
  • 22. oh the… reality? solr =~ multiple rdbms indexes aggregates =~ materialized views kafka coordinate storage is largely erased by smart hdfs storage formats (i.e. parquet) pax structured, dictionary encoding, RLE, bit packing, generic compression, inline metadata
  • 23. extension custom producers custom consumers event types parsers / transformations cube definitions and aggregate functions custom processing jobs on landed data
  • 24. pain lots of tradeoffs when picking a stream processing solution samza: right features, but low level programming model, not supported by vendors. missing security features. storm: too rigid, too slow. not supported by all vendors. spark streaming: where do i even start? tons of issues, but has all the community energy. our current bet, for better or worse. stack complexity relative (im)maturity
  • 25. if you’re going to do this read all the literature on stream processing[1] understand, make, and provide guarantees find the right abstractions never trust the hand waving or “hello worlds” fully evaluate the projects in this space [1] wait, like all of it? yea, like all of it.
  • 26. thanks <3 questions? esammer@scalingdata.com / @esammer http://www.scalingdata.com (hiring)