Log Everything!
@DC13
Stefan & Mike

Dr. Stefan Schadwinkel
Co-Founder / Analytics Engineer
stefan.schadwinkel@deck36.de

Mike Lohmann
Co-Founder / Software Engineer
mike.lohmann@deck36.de
ABOUT DECK36
Who We Are
–  DECK36 is a young spin-off from ICANS
–  Small team of 7 engineers
–  Longstanding expertise in designing, implementing and operating complex web systems
–  Developing our own data intelligence-focused tools and web services
–  Offering our expert knowledge in Automation & Operations, Architecture & Engineering, Analytics & Data Logistics
WHAT WE WILL TALK ABOUT
Topics
–  Log everything! – The Data Pipeline.
–  Tackling the Leviathan – Realtime Stream Processing with Storm.
–  JS Client DataCollector: Live Demo
–  Storm Processing with PHP: Live Demo
Log everything!
The Data Pipeline
THE DATA PIPELINE
Requirements
Background: Building and operating multiple education communities
Baseline: PokerStrategy.com KPIs
–  6M registered users, 700k posts/month, 2.8M page impressions/day, 7.6M requests/day
New products → New business models → New questions
–  Extendable generic solution
–  Storage and accessibility more important than specific, optimized applications
THE DATA PIPELINE
Requirements
Pipeline stages: Producer, Transport, Storage, Analytics, Realtime Stream Processing
Producer
–  Monolog Plugin, JS Client
Transport
–  Flume 0.9.4 m( → RabbitMQ, Erlang Consumer
–  Evaluated Apache Kafka
Storage
–  Hadoop HDFS (our very own) → Amazon S3
THE DATA PIPELINE
Logging Pipeline
Analytics
-  Hadoop MapReduce → Amazon EMR, Python, R
-  Exports to Excel (CSV), QlikView → Amazon Redshift
Realtime Stream Processing
-  Twitter Storm
THE DATA PIPELINE
Unified Message Format

-  Fixed, guaranteed envelope

-  Processing driven by message content 
-  Single message gets compressed (LZOP) to about 70% of original size (1184 B → 817 B)
-  Message bulk gets compressed to about 12-14% of original size (@ 42k & 325k messages)
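
The slides do not show the envelope fields, so the following is only a sketch of the idea behind a fixed envelope with content-driven processing. The field names (`type`, `timestamp`, `payload`), the two message types, and the handler registry are assumptions made for illustration; they are not the actual UMF definition.

```javascript
// Hypothetical envelope: the field names are assumptions, not the real UMF schema.
function wrapInEnvelope(type, payload, timestamp) {
  return { type: type, timestamp: timestamp, payload: payload };
}

// Handlers keyed by message type: the message content decides the processing.
const handlers = new Map([
  ['click', (msg) => 'counted click on ' + msg.payload.domain],
  ['pageview', (msg) => 'stored pageview of ' + msg.payload.url],
]);

function processMessage(msg) {
  const handler = handlers.get(msg.type);
  if (!handler) {
    return 'unhandled type: ' + msg.type;
  }
  return handler(msg);
}

const message = wrapInEnvelope('click', { domain: 'example.org' }, 1349049600000);
console.log(processMessage(message));
```

Because every message carries the same envelope, a consumer can route it without knowing the payload structure in advance.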
THE DATA PIPELINE
Compaction
RabbitMQ consumer (Erlang) stores data to cloud 
-  Relatively large amount of files
-  Mixed messages
We want
-  A few files
-  Messages grouped by "Event Type" and "Time Partition"
-  Data transformation
Determined by message content

s3://[BUCKET]/icanslog/[WEBSITE]/icans.content/year=2012/month=10/day=01/part-00000.lzo



Hive partitioning!
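
How a target directory can be derived from message content might be sketched as follows. This is not the Cascalog job from the talk: the message fields (`website`, `type`, `timestamp`), the bucket name, and the use of UTC for the time partition are assumptions for illustration.

```javascript
// Build a Hive-style partition directory (year=/month=/day=) from a message.
// `website`, `type` and `timestamp` are hypothetical field names.
function pad2(n) {
  return String(n).padStart(2, '0');
}

function partitionPath(base, msg) {
  const d = new Date(msg.timestamp); // assumption: partition by UTC date
  return [
    base,
    msg.website,
    msg.type,
    'year=' + d.getUTCFullYear(),
    'month=' + pad2(d.getUTCMonth() + 1),
    'day=' + pad2(d.getUTCDate()),
  ].join('/') + '/';
}

const example = {
  website: 'example-site',
  type: 'icans.content',
  timestamp: Date.UTC(2012, 9, 1, 12, 0, 0), // 1 October 2012, 12:00 UTC
};
console.log(partitionPath('s3://my-bucket/icanslog', example));
// s3://my-bucket/icanslog/example-site/icans.content/year=2012/month=10/day=01/
```

Messages that map to the same directory can then be written into a few large part files, and the `key=value` directory names allow Hive to treat them as partitions.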
THE DATA PIPELINE
Compaction
Using Cascalog
-  Based on Clojure (LISP) and Cascading
-  Provides a Datalog-like query language
-  Don't LISP? → JCascalog

Very handy features (unavailable in Hive or Pig)
-  Cascading Output Taps can be parameterized by data records
-  Trap location for corrupted records (job finishes for all the correct messages)
-  Runs within the JVM → large available codebase, arbitrary processing is simple
Cascalog Query Syntax

Cascalog is Clojure, Clojure is Lisp

(?<- (stdout) [?person] (age ?person ?age) … (< ?age 30))

-  ?<- is the query operator
-  (stdout) is the Cascading output tap
-  [?person] lists the columns of the dataset generated by the query
-  (age ?person ?age) is a "generator"
-  (< ?age 30) is a "predicate"
-  Generators and predicates: as many as you want
-  Both can be any Clojure function
-  Clojure can call anything that is available within a JVM
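
For readers who do not read Lisp, the meaning of the query (with the elided part left out) can be approximated in plain JavaScript over an in-memory list. This only illustrates the semantics of generator, predicate, and output columns; it is not how Cascalog executes a query, and the sample data is made up.

```javascript
// The `age` generator as an in-memory list of [person, age] tuples (made-up data).
const age = [
  ['alice', 28],
  ['bob', 33],
  ['chris', 40],
  ['david', 25],
];

// Rough equivalent of: (?<- (stdout) [?person] (age ?person ?age) (< ?age 30))
function personsYoungerThan(tuples, limit) {
  return tuples
    .filter(([, personAge]) => personAge < limit) // predicate: (< ?age 30)
    .map(([person]) => person);                   // output columns: [?person]
}

console.log(personsYoungerThan(age, 30)); // [ 'alice', 'david' ]
```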
Cascalog Query Syntax

Run the Cascalog processing on Amazon EMR:
./elastic-mapreduce [standard parameters omitted]
--jar s3://[BUCKET]/mapreduce/compaction/icans-cascalog.jar
--main-class icans.cascalogjobs.processing.compaction
--args "s3://[BUCKET]/incoming/*/*/*/","s3://[BUCKET]/icanslog","s3://[BUCKET]/icanslog-error
The Data Pipeline
Data Queries with Hive
Hive is table-based and provides SQL-like syntax
-  Assumes one storage location (directory) per table
-  Simple to use if you know SQL
-  Widely used, rapid development for "simple" queries
Hive @ Amazon
-  Table locations can be S3
-  "Cluster on demand" → requires rebuilding Hive metadata
-  CREATE TABLE for source and target S3 locations
-  Import Table metadata (auto-discovery for partitions)
-  INSERT OVERWRITE to query source table(s) and store to target S3 location
Hive @ Amazon (1)
Hive @ Amazon (2)

We can now simply copy the data from S3 
and import into any local analytical tool
e.g. Excel, Redshift, QlikView, R, etc.
Further Reading

-  More details in the Log Everything! ebook
-  Available at Amazon and DeveloperPress
THE DATA PIPELINE
Still: It’s Batch Processing
-  While quite efficient in flight, the logistics of getting the job started are significant.
-  Only cost-efficient for long distance travel.
THE DATA PIPELINE

Instant Insight through Stream Processing
-  Often, only updates for the recent day, week, or month are necessary
-  Time is of importance when direct feedback or user interaction is desired
More Wind In The Sails
With Storm
REALTIME STREAM PROCESSING

Instant Insight through Stream Processing
-  Distributed realtime processing framework
-  Battle-proven by Twitter
-  All *BINGO-Abilities fulfilled!
-  Hadoop = data batch processing; Storm = realtime data processing
-  More (and maybe new) *BINGO: DRPC, ETL, RTET, Spouts, Bolts, Tuple, Topology
-  Easy to use (Really!)
Realtime Stream Processing Infrastructure with Storm

[Diagram: Realtime Data Stream Analytics. Pipeline stages: Producer, Transport, Analytics, Storage. Components shown: Apps & Server, NodeJS, Queue, Storm cluster (Nimbus (Master), Zookeeper, Supervisors, Workers), S3, DB, Zabbix, Graylog.]
REALTIME STREAM PROCESSING
JS Client Features
-  Event system
-  Master/Slave Tabs
-  Local queuing of data
-  Ability to use node modules
-  Easy to extend
-  Complete development suite
-  Deliver bundles with vendors or not
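
The slides do not show how the client queues data locally. As a rough illustration of the idea only, the sketch below buffers messages while there is no connection and flushes them in order once one is available. The function names and the `send` callback are hypothetical and not part of the starlog client API.

```javascript
// Minimal local queue: buffer messages while disconnected, flush on reconnect.
// `send` is a hypothetical transport function supplied by the caller.
function createLocalQueue(send) {
  const pending = [];
  let connected = false;

  return {
    push(message) {
      if (connected) {
        send(message);
      } else {
        pending.push(message);
      }
    },
    setConnected(isConnected) {
      connected = isConnected;
      // Drain buffered messages in their original order.
      while (connected && pending.length > 0) {
        send(pending.shift());
      }
    },
    size() {
      return pending.length;
    },
  };
}

const sent = [];
const queue = createLocalQueue((m) => sent.push(m));
queue.push('a');
queue.push('b');
queue.setConnected(true);
queue.push('c');
console.log(sent); // [ 'a', 'b', 'c' ]
```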
Realtime Stream Processing - Loading the JS Client

<script .. src="https://cdn.tradimo.com/js/starlog-client.min.js?5193e1ba0325c756b78d87384d2f80e9"></script>

-  Browser requests https://../starlog-client.min.js
-  Server creates a signed cookie and delivers starlog-client.min.js with Set-Cookie: UUID
-  Browser requests /socket.io/1/websockets with Upgrade: websocket and Cookie: UUID
-  Server checks the cookie and answers HTTP 101 (Switching Protocols) with Connection: Upgrade and Upgrade: websocket
-  Connection established
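
The slides do not say how the cookie is signed or checked. One common approach, shown here purely as an assumption, is to append an HMAC of the UUID and verify it before accepting the WebSocket upgrade. The secret, the cookie layout (`uuid.signature`), and the function names are hypothetical; only the Node.js `crypto` calls are real.

```javascript
const crypto = require('crypto');

// Hypothetical secret; in practice it would come from configuration.
const SECRET = 'change-me';

function sign(value) {
  return crypto.createHmac('sha256', SECRET).update(value).digest('hex');
}

// Cookie value handed out together with the client script.
function createSignedCookie(uuid) {
  return uuid + '.' + sign(uuid);
}

// Returns the UUID if the signature matches, otherwise null.
function checkSignedCookie(cookie) {
  const idx = cookie.lastIndexOf('.');
  if (idx < 0) return null;
  const uuid = cookie.slice(0, idx);
  const given = Buffer.from(cookie.slice(idx + 1), 'utf8');
  const expected = Buffer.from(sign(uuid), 'utf8');
  // timingSafeEqual requires buffers of equal length.
  if (given.length !== expected.length) return null;
  return crypto.timingSafeEqual(given, expected) ? uuid : null;
}

const cookie = createSignedCookie(crypto.randomUUID());
console.log(checkSignedCookie(cookie) !== null);       // a valid cookie is accepted
console.log(checkSignedCookie(cookie + 'x') === null); // a tampered cookie is rejected
```

Checking the signature before the upgrade lets the server reject connections that did not load the client script from it first.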
Collecting Data

Sending data in UMF
Sending data to the client

[Diagram labels: UMF, NodeJS, Counts, Queue, Backend Magic, Queue]
Realtime Stream Processing - JS Client in action

Use case: if the number of clicks on a domain % 10 == 0, send the "Star Trek Commander" badge

[Diagram: ClickEvent collector (register onclick event) → Clicked-Data → localstorage (observe) → Clicked-Data → Clicked-Data-UMF → SocketConnect → NodeJS]
Realtime Stream Processing - JS Client in action
function ClickFetcher()
{
    // Handler for the COLLECTINGDATA event: counts clicks on the page
    // and stores each click in local storage.
    this.collectData = function (callback)
    {
        var clicked = 1; // number of the next click
        logger.debug('ClickFetcher - collectData called!');
        window.onclick = function () {
            var collectedData = {
                // key: host plus path of the current page
                key: window.location.host.toString() + window.location.pathname.toString(),
                value: {
                    payload: clicked,
                    timestamp: +new Date() // milliseconds since the epoch
                }
            };
            localstorage.set(collectedData, function (storageResult)
            {
                logger.debug("err = " + storageResult.hasError());
                logger.debug("storageResult = " + storageResult);
            }, false, true, true);
            clicked++;
        };
    };
}
var clickFetcher = new ClickFetcher();
starlogclient.on(starlogclient.COLLECTINGDATA, clickFetcher.collectData);
Client Live Demo 


https://localhost:3001/test/1-page-stub.html
REALTIME STREAM PROCESSING
Producer Libraries
-  LoggingComponent: Provides interfaces, filters and handlers
-  LoggingBundle: Glues all together for Symfony2
-  Drupal Logging Module: Using the LoggingComponent
-  JS Frontend Client: LogClient Framework for Browsers

https://github.com/ICANS/IcansLoggingComponent
https://github.com/ICANS/IcansLoggingBundle
https://github.com/ICANS/drupal-logging-module
https://github.com/DECK36/starlog-js-frontend-client
Realtime Stream Processing - PHP & Storm

Use case: if the number of clicks on a domain % 10 == 0, send the "Star Trek Commander" badge
Using PHP for that!
https://github.com/Lazyshot/storm-php/blob/master/lib/storm.php

Clicked-Data-UMF → Queue → Event: "Star Trek Commander" Badge
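
The counting rule of the use case can be sketched independently of Storm. The talk implements it in PHP; the version below uses JavaScript only to stay consistent with the client code shown earlier. The event shape (`{ domain }`) and the returned badge object are assumptions, and the sketch keeps its counts in memory inside one process.

```javascript
// Counting logic of the use case, sketched in JavaScript (the talk uses PHP).
// The event shape ({ domain }) and the badge object are assumptions.
function createClickCounter(every) {
  const counts = new Map(); // clicks seen per domain

  return function onClick(event) {
    const n = (counts.get(event.domain) || 0) + 1;
    counts.set(event.domain, n);
    // Emit a badge on every `every`-th click for this domain.
    if (n % every === 0) {
      return { badge: 'Star Trek Commander', domain: event.domain, clicks: n };
    }
    return null;
  };
}

const onClick = createClickCounter(10);
let badges = 0;
for (let i = 0; i < 25; i++) {
  if (onClick({ domain: 'example.org' }) !== null) badges++;
}
console.log(badges); // 2 (after the 10th and the 20th click)
```

In a distributed topology the counts would have to live where all clicks for one domain arrive, for example by grouping the stream on the domain field.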
Storm & PHP Live Demo
REALTIME STREAM PROCESSING
Get Inspired!
Powered-by Storm: https://github.com/nathanmarz/storm/wiki/Powered-By
-  50+ companies (Twitter, Yahoo, Groupon, Ooyala, Baidu, Wayfair, …)
-  Ads & real-time bidding, Data-centric (Economic, Environmental, Health), User interactions
Language-agnostic backend systems (Operate Storm, Develop in PHP)
Streaming "counts": Sentiment Analysis, Frequent Items, Multi-armed Bandits, …
DRPC: Custom user feeds, Complex Queries (i.e. trace graph links)
Realtime, distributed ETL
-  Buffering / Retries
-  Integrate Data: Third-party API, Machine Learning
-  Store to DBs, Search engines, etc
Questions?
Thanks a lot!
You can find us:

github.com/DECK36

info@deck36.de

deck36.de

Data governance with Unity Catalog Presentation
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 

Log everything! @DC13

  • 3. Stefan & Mike
      –  Dr. Stefan Schadwinkel, Co-Founder / Analytics Engineer, stefan.schadwinkel@deck36.de
      –  Mike Lohmann, Co-Founder / Software Engineer, mike.lohmann@deck36.de
  • 4. ABOUT DECK36: Who We Are
      –  DECK36 is a young spin-off from ICANS
      –  Small team of 7 engineers
      –  Longstanding expertise in designing, implementing and operating complex web systems
      –  Developing our own data intelligence-focused tools and web services
      –  Offering our expert knowledge in Automation & Operations, Architecture & Engineering, Analytics & Data Logistics
  • 5. WHAT WE WILL TALK ABOUT: Topics
      –  Log everything! – The Data Pipeline.
      –  Tackling the Leviathan – Realtime Stream Processing with Storm.
      –  JS Client DataCollector: Live Demo
      –  Storm Processing with PHP: Live Demo
  • 7. THE DATA PIPELINE: Requirements
      Background: Building and operating multiple education communities
      Baseline: PokerStrategy.com KPIs
      –  6M registered users, 700k posts/month, 2.8M page impressions/day, 7.6M requests/day
      New products → new business models → new questions
      –  Extensible, generic solution
      –  Storage and accessibility are more important than specific, optimized applications
  • 8. THE DATA PIPELINE: Requirements
      Pipeline stages: Producer, Transport, Storage, Analytics, Realtime Stream Processing
      Producer
      –  Monolog Plugin, JS Client
      Transport
      –  Flume 0.9.4 m( → RabbitMQ, Erlang Consumer
      –  Evaluated Apache Kafka
      Storage
      –  Hadoop HDFS (our very own) → Amazon S3
  • 9. THE DATA PIPELINE: Logging Pipeline
      Pipeline stages: Producer, Transport, Storage, Analytics, Realtime Stream Processing
      Analytics
      -  Hadoop MapReduce → Amazon EMR, Python, R
      -  Exports to Excel (CSV), QlikView → Amazon Redshift
      Realtime Stream Processing
      -  Twitter Storm
  • 10. THE DATA PIPELINE: Unified Message Format
      -  Fixed, guaranteed envelope
      -  Processing driven by message content
      -  A single message gets compressed (LZOP) to about 70% of its original size (1184 B → 817 B)
      -  Message bulk gets compressed to about 12-14% of its original size (@ 42k & 325k messages)
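
The idea of a fixed envelope with content-driven processing can be sketched as follows. This is only an illustration: the slides do not list the actual envelope fields, so `id`, `type`, `timestamp` and `payload`, as well as the `wrap` and `dispatch` helpers, are hypothetical names chosen for the example.

```javascript
// Hypothetical envelope fields; the slides only state that the envelope is
// fixed and guaranteed, not what it contains.
const ENVELOPE_FIELDS = ["id", "type", "timestamp", "payload"];

// Wrap an arbitrary payload in the (assumed) envelope.
function wrap(type, payload, now = Date.now()) {
  return { id: `${type}-${now}`, type, timestamp: now, payload };
}

// A message is accepted only if every envelope field is present.
function isValidEnvelope(message) {
  return message !== null && typeof message === "object" &&
    ENVELOPE_FIELDS.every((field) => field in message);
}

// Processing is selected from the message content (here: its type).
function dispatch(message, handlers) {
  if (!isValidEnvelope(message)) return { status: "rejected" };
  const known = Object.prototype.hasOwnProperty.call(handlers, message.type);
  if (!known) return { status: "unhandled", type: message.type };
  return { status: "handled", result: handlers[message.type](message.payload) };
}

// Example handler table with one made-up event type.
const exampleHandlers = { click: (payload) => payload.count * 2 };
console.log(dispatch(wrap("click", { count: 2 }), exampleHandlers));
```

Because every consumer can rely on the envelope, a new event type only needs a new handler entry; nothing else in the transport has to change.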
  • 12. THE DATA PIPELINE: Compaction
      RabbitMQ consumer (Erlang) stores data to the cloud
      -  Relatively large number of files
      -  Mixed messages
      We want
      -  A few files
      -  Messages grouped by "Event Type" and "Time Partition"
      -  Data transformation
      Determined by message content:
        s3://[BUCKET]/icanslog/[WEBSITE]/icans.content/year=2012/month=10/day=01/part-00000.lzo
      Hive partitioning!
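
A minimal sketch of how a target directory in the layout above could be derived from message content. The message fields (`website`, `type`, `timestamp`) and both function names are assumptions for illustration, and the sketch assumes partitions are cut by UTC date; the actual compaction job is written in Cascalog.

```javascript
// Zero-pad month and day so the path matches "month=10/day=01".
function pad2(n) {
  return String(n).padStart(2, "0");
}

// Build the Hive-style partition directory for one message.
// Assumption: the time partition is derived from the UTC date.
function partitionPath(bucket, website, eventType, timestampMs) {
  const d = new Date(timestampMs);
  return `s3://${bucket}/icanslog/${website}/${eventType}/` +
    `year=${d.getUTCFullYear()}/month=${pad2(d.getUTCMonth() + 1)}/day=${pad2(d.getUTCDate())}`;
}

// Group mixed messages by their target directory (event type + time partition).
function groupByPartition(messages, bucket) {
  const groups = new Map();
  for (const message of messages) {
    const path = partitionPath(bucket, message.website, message.type, message.timestamp);
    if (!groups.has(path)) groups.set(path, []);
    groups.get(path).push(message);
  }
  return groups;
}

console.log(partitionPath("[BUCKET]", "[WEBSITE]", "icans.content", Date.UTC(2012, 9, 1)));
```

Hive reads `key=value` directory names as partition columns, which is why the year, month and day appear in the path in this form.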
  • 13. THE DATA PIPELINE: Compaction Using Cascalog
      -  Based on Clojure (LISP) and Cascading
      -  Provides a Datalog-like query language
      -  Don't LISP? → JCascalog
      Very handy features (unavailable in Hive or Pig)
      -  Cascading Output Taps can be parameterized by data records
      -  Trap location for corrupted records (job finishes for all the correct messages)
      -  Runs within the JVM → large available codebase, arbitrary processing is simple
  • 14. Cascalog Query Syntax
      Cascalog is Clojure, Clojure is Lisp
        (?<- (stdout)
             [?person]
             (age ?person ?age) …
             (< ?age 30))
      -  ?<- : query operator
      -  (stdout) : Cascading Output Tap
      -  [?person] : columns of the dataset generated by the query
      -  (age ?person ?age) : "Generator"
      -  (< ?age 30) : "Predicate"
      Generators and predicates
      -  as many as you want
      -  both can be any Clojure function
      -  Clojure can call anything that is available within a JVM
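
For readers who do not read Lisp, the query above can be mirrored in plain JavaScript. This is an analogy only, not Cascalog: the `age` rows are made-up sample data, and `query` is a hypothetical helper that applies a predicate to a generator and projects the output column.

```javascript
// Made-up sample data standing in for the `age` generator: [person, age] tuples.
const age = [
  ["alice", 28],
  ["bob", 33],
  ["david", 25]
];

// Keep the tuples whose age satisfies the predicate, then project ?person.
function query(generator, predicate) {
  return generator
    .filter(([, personAge]) => predicate(personAge))
    .map(([person]) => person);
}

// Corresponds to: (?<- (stdout) [?person] (age ?person ?age) (< ?age 30))
const under30 = query(age, (personAge) => personAge < 30);
console.log(under30);
```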
  • 15. Cascalog Query Syntax
      Run the Cascalog processing on Amazon EMR:
        ./elastic-mapreduce [standard parameters omitted] \
          --jar s3://[BUCKET]/mapreduce/compaction/icans-cascalog.jar \
          --main-class icans.cascalogjobs.processing.compaction \
          --args "s3://[BUCKET]/incoming/*/*/*/","s3://[BUCKET]/icanslog","s3://[BUCKET]/icanslog-error"
  • 16. THE DATA PIPELINE: Data Queries with Hive
      Hive is table-based and provides SQL-like syntax
      -  Assumes one storage location (directory) per table
      -  Simple to use if you know SQL
      -  Widely used, rapid development for "simple" queries
      Hive @ Amazon
      -  Table locations can be S3
      -  "Cluster on demand" → requires rebuilding the Hive metadata
      -  CREATE TABLE for source and target S3 locations
      -  Import table metadata (auto-discovery for partitions)
      -  INSERT OVERWRITE to query source table(s) and store to target S3 location
  • 18. Hive @ Amazon (2)
      We can now simply copy the data from S3 and import it into any local analytical tool, e.g. Excel, Redshift, QlikView, R, etc.
  • 19. Further Reading
      -  More details in the Log Everything! ebook
      -  Available at Amazon and DeveloperPress
  • 20. THE DATA PIPELINE: Still: It's Batch Processing
      -  While quite efficient in flight, the logistics of getting the job started are significant.
      -  Only cost-efficient for long distance travel.
  • 21. THE DATA PIPELINE: Instant Insight through Stream Processing
      -  Often, only updates for the most recent day, week, or month are necessary
      -  Time matters when direct feedback or user interaction is desired
  • 22. More Wind In The Sails With Storm
  • 23. REALTIME STREAM PROCESSING: Instant Insight through Stream Processing
      -  Distributed realtime processing framework
      -  Battle-proven by Twitter
      -  All *BINGO-Abilities fulfilled!
      -  Hadoop = data batch processing; Storm = realtime data processing
      -  More (and maybe new) *BINGO: DRPC, ETL, RTET, Spouts, Bolts, Tuple, Topology
      -  Easy to use (Really!)
  • 24. Realtime Stream Processing: Infrastructure with Storm
      [Diagram. Pipeline stages: Producer, Transport, Analytics, Storage, Realtime Data Stream Analytics. Components: Apps & Server, NodeJS, Queue, S3, DB, Zabbix, Graylog. Storm cluster: Nimbus (Master), Zookeeper, Supervisors, Workers.]
  • 25. REALTIME STREAM PROCESSING: JS Client Features
      -  Event system
      -  Master/Slave Tabs
      -  Local queuing of data
      -  Ability to use node modules
      -  Easy to extend
      -  Complete development suite
      -  Deliver bundles with vendors or not
  • 26. Realtime Stream Processing - Loading the JS Client
      <script .. src="https://cdn.tradimo.com/js/starlog-client.min.js?5193e1ba0325c756b78d87384d2f80e9"></script>
      [Sequence diagram; labels in reading order]
      -  https://../starlog-client.min.js; Create signed cookie; starlog-client.min.js with Set-Cookie: UUID
      -  /socket.io/1/websockets with Upgrade: websockets and Cookie: UUID; Check cookie
      -  HTTP 101 – Protocol Change (Connection: Upgrade, Upgrade: websocket); Established connection
      -  Collecting data; Sending data in UMF
      -  NodeJS; Counts; Queue; Backend Magic; Queue
      -  Sending data to the client (UMF)
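
The "create signed cookie" and "check cookie" steps could look roughly like the sketch below. The slides do not say how the cookie is signed, so the HMAC-SHA256 scheme, the `uuid.signature` layout, and the function names are all assumptions; only Node's built-in `crypto` module is used.

```javascript
const crypto = require("crypto");

// Assumed scheme: HMAC-SHA256 over the UUID, hex encoded.
function sign(value, secret) {
  return crypto.createHmac("sha256", secret).update(value).digest("hex");
}

// Issued when the client script is delivered (Set-Cookie: UUID).
function createSignedCookie(secret) {
  const uuid = crypto.randomUUID();
  return `${uuid}.${sign(uuid, secret)}`;
}

// Checked during the WebSocket upgrade; returns the UUID, or null if invalid.
function verifySignedCookie(cookie, secret) {
  const separator = cookie.lastIndexOf(".");
  if (separator < 0) return null;
  const uuid = cookie.slice(0, separator);
  const given = Buffer.from(cookie.slice(separator + 1), "hex");
  const expected = Buffer.from(sign(uuid, secret), "hex");
  // timingSafeEqual requires equal lengths, so compare them first.
  if (given.length !== expected.length) return null;
  return crypto.timingSafeEqual(given, expected) ? uuid : null;
}

const exampleCookie = createSignedCookie("example-secret");
console.log(verifySignedCookie(exampleCookie, "example-secret") !== null);
```

Signing lets the server reject connections whose UUID it did not issue, without keeping a server-side list of issued cookies.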
  • 27. Realtime Stream Processing - JS Client in action
      Use case: If num of clicks on a Domain % 10 == 0, send "Star Trek Commander" Badge
      [Diagram: ClickEvent collector registers onclick event; Clicked-Data; observe localstorage; Clicked-Data; Clicked-Data-UMF; SocketConnect; NodeJS]
  • 28. Realtime Stream Processing - JS Client in action
      function ClickFetcher() {
          this.collectData = function (callback) {
              // Click counter, starting at 1 for the first click
              var clicked = 1;
              logger.debug('ClickFetcher - collectData called!');
              window.onclick = function () {
                  // Key: host + path of the current page; value: click count and time
                  var collectedData = {
                      key: window.location.host.toString() + window.location.pathname.toString(),
                      value: {
                          payload: clicked,
                          timestamp: +new Date()
                      }
                  };
                  // Store the record locally and log the storage result
                  localstorage.set(collectedData, function (storageResult) {
                      logger.debug("err = " + storageResult.hasError());
                      logger.debug("storageResult = " + storageResult);
                  }, false, true, true);
                  clicked++;
              };
          };
      }
      var clickFetcher = new ClickFetcher();
      // Run the collector when the client emits its COLLECTINGDATA event
      starlogclient.on(starlogclient.COLLECTINGDATA, clickFetcher.collectData);
  • 29. Client Live Demo
      https://localhost:3001/test/1-page-stub.html
  • 30. REALTIME STREAM PROCESSING: Producer Libraries
      -  LoggingComponent: Provides interfaces, filters and handlers
         https://github.com/ICANS/IcansLoggingComponent
      -  LoggingBundle: Glues it all together for Symfony2
         https://github.com/ICANS/IcansLoggingBundle
      -  Drupal Logging Module: Uses the LoggingComponent
         https://github.com/ICANS/drupal-logging-module
      -  JS Frontend Client: LogClient framework for browsers
         https://github.com/DECK36/starlog-js-frontend-client
  • 31. Realtime Stream Processing - PHP & Storm
      Use case: If num of clicks on a Domain % 10 == 0, send "Star Trek Commander" Badge
      Using PHP for that!
      https://github.com/Lazyshot/storm-php/blob/master/lib/storm.php
      [Diagram: Clicked-Data-UMF; Queue; Event: "Star Trek Commander" Badge]
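
The counting rule of this use case is small enough to sketch on its own. The talk implements it as a Storm bolt in PHP; the version below is a plain JavaScript illustration of the same logic, and `createBadgeCounter` as well as the shape of the returned event are hypothetical. It does not use the Storm or storm-php API.

```javascript
// Returns a function that counts clicks per domain and emits a badge event
// on every `every`-th click (10 in the use case), otherwise null.
function createBadgeCounter(every = 10) {
  const clicksPerDomain = new Map();
  return function onClick(domain) {
    const count = (clicksPerDomain.get(domain) || 0) + 1;
    clicksPerDomain.set(domain, count);
    return count % every === 0
      ? { domain, badge: "Star Trek Commander", clicks: count }
      : null;
  };
}

// Usage: feed click events for one made-up domain and collect the badge events.
const onClick = createBadgeCounter();
const badgeEvents = [];
for (let i = 0; i < 25; i++) {
  const event = onClick("example.org");
  if (event) badgeEvents.push(event);
}
console.log(badgeEvents.length);
```

In a distributed topology the per-domain counter only stays correct if all clicks for one domain reach the same worker, which is what grouping a stream by a field is meant to guarantee.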
  • 32. Storm & PHP Live Demo
  • 33. REALTIME STREAM PROCESSING: Get Inspired!
      Powered-by Storm: https://github.com/nathanmarz/storm/wiki/Powered-By
      -  50+ companies (Twitter, Yahoo, Groupon, Ooyala, Baidu, Wayfair, …)
      -  Ads & real-time bidding, Data-centric (Economic, Environmental, Health), User interactions
      Language-agnostic backend systems (Operate Storm, Develop in PHP)
      Streaming "counts": Sentiment Analysis, Frequent Items, Multi-armed Bandits, …
      DRPC: Custom user feeds, Complex Queries (e.g. trace graph links)
      Realtime, distributed ETL
      -  Buffering / Retries
      -  Integrate Data: Third-party API, Machine Learning
      -  Store to DBs, Search engines, etc.
  • 36. You can find us:
      github.com/DECK36
      info@deck36.de
      deck36.de