Perfect Norikra 2nd Season
Stream Processing Casual Talks #2
2017/07/27
Satoshi Tagomori (@tagomoris)
Satoshi "Moris" Tagomori (@tagomoris)
Fluentd, MessagePack-Ruby, Norikra, ...
Treasure Data, Inc.
http://norikra.github.io/
Streaming + SQL
Norikra: Schema-less Stream Processing using SQL
• Server software, written in JRuby, runs on JVM
• Open source software (GPLv2)
• http://norikra.github.io/
• https://github.com/norikra/norikra
Query:
SELECT user.age, COUNT(*) AS cnt
FROM events.win:time_batch(5 mins)
WHERE current="San Diego"
  AND attend.$0 AND attend.$1
GROUP BY user.age

Input event:
{"name":"tagomoris",
 "user":{"age":35, "corp":"LINE",
         "address":"Tokyo"},
 "current":"San Diego",
 "speaker":true,
 "attend":[true,true,false, ...]
}

Output events:
{"user.age":35,"cnt":5},
{"user.age":36,"cnt":8}, ...
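As a rough illustration of what the query above computes within a single 5-minute batch, here is a minimal Python sketch. It is not how Norikra or Esper evaluates queries. The helper names `get_field` and `run_batch` and the sample events are made up for this example, and skipping events that lack a required field is an assumption that follows the schema-on-read idea.

```python
from collections import Counter

def get_field(event, path):
    """Resolve a dotted path such as 'user.age' or 'attend.$0' in a nested event.

    A '$N' segment is treated as an index into a list, mirroring the
    'attend.$0' notation in the query. Returns None when the path is missing.
    """
    current = event
    for segment in path.split("."):
        if segment.startswith("$"):
            index = int(segment[1:])
            if not isinstance(current, list) or index >= len(current):
                return None
            current = current[index]
        else:
            if not isinstance(current, dict) or segment not in current:
                return None
            current = current[segment]
    return current

def run_batch(events):
    """Count events per user.age for one time batch, applying the WHERE clause."""
    counts = Counter()
    for event in events:
        age = get_field(event, "user.age")
        if age is None:
            continue  # assumption: events without the required field are skipped
        if (get_field(event, "current") == "San Diego"
                and get_field(event, "attend.$0")
                and get_field(event, "attend.$1")):
            counts[age] += 1
    return [{"user.age": age, "cnt": cnt} for age, cnt in sorted(counts.items())]

# Invented sample events for one batch.
sample = [
    {"name": "tagomoris", "user": {"age": 35, "corp": "LINE", "address": "Tokyo"},
     "current": "San Diego", "speaker": True, "attend": [True, True, False]},
    {"name": "someone", "user": {"age": 35}, "current": "Tokyo",
     "attend": [True, True]},
    {"name": "another", "user": {"age": 36}, "current": "San Diego",
     "attend": [True, False]},
]
print(run_batch(sample))
```

Only the first sample event passes every condition, so the batch yields a single row for age 35.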
How Norikra is Perfect
• Ultra fast bootstrap
• Schema on read
• Handling complex (nested) events
• Dynamic query registration/unregistration
• Simple Web UI
• Data connector: Fluentd
• Extensible: UDF/Listener plugins
• Performance: good enough for small/mid-size sites
Schema on Read
• Query first, Data next
• Query must know what it requires
• field names, types of fields, ...
• Platform can ingest any data into the processor; a query fetches the events that match its required schema.
[Diagram: events from the billing service and from the API endpoint form a schema-less (mixed) data stream; query A and query B each read their own subset of fields.]
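The schema-on-read idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the query names, field names, and types below are hypothetical, and the real type handling in Norikra is more involved than an `isinstance` check.

```python
def matches(event, required):
    """True if the event has every required field with the expected type."""
    return all(
        name in event and isinstance(event[name], expected)
        for name, expected in required.items()
    )

def project(event, required):
    """Keep only the fields that the query asked for."""
    return {name: event[name] for name in required}

# Hypothetical queries: each one declares the fields (and types) it requires.
QUERIES = {
    "query A": {"user_id": int, "amount": float},
    "query B": {"user_id": int, "path": str},
}

def route(event):
    """Return {query name: projected event} for every query the event satisfies."""
    return {
        name: project(event, required)
        for name, required in QUERIES.items()
        if matches(event, required)
    }

# Events of different shapes arrive on the same mixed stream.
billing_event = {"user_id": 1, "amount": 9.5, "currency": "USD"}
api_event = {"user_id": 1, "path": "/v1/items", "status": 200}

print(route(billing_event))
print(route(api_event))
```

The stream itself carries no schema; each query defines what it needs, and events that lack those fields are simply not delivered to it.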
Architecture
• Norikra Server (on JVM)
• Norikra Engine
• Esper Instance (Query Engine)
• Type Definition Manager
• Output Event Pool
• RPC Server: mizuno (Jetty + Rack)
• Rack RPC Handler
• Norikra Client (msgpack-rpc-over-http)
For details :)
• Norikra: Stream Processing with SQL
  http://www.slideshare.net/tagomoris/norikra-stream-processing-with-sql
• Norikra: SQL Stream Processing in Ruby
  http://www.slideshare.net/tagomoris/norikra-sql-stream-processing-in-ruby
• Norikra in Action
  http://www.slideshare.net/tagomoris/norikra-in-action-ver-2014-spring
• Landscape of Norikra Features
  http://www.slideshare.net/tagomoris/norikra-meetup-features
• Norikra Recent Updates
  http://www.slideshare.net/tagomoris/norikra-recent-updates
Recent Updates
• v1.4.0: Jul 19, 2016
  • Add support for the JVM options "-D" and "-agentlib"
  • Update msgpack version
• Previous release v1.3.1: May 7, 2015
  • Explained in the "Norikra Recent Updates" slides
User Companies
• LINE Corporation
• Kayac Inc.
• Mercari, Inc.
• (and some/many others)
https://www.slideshare.net/tagomoris/how-to-make-norikra-perfect
Perfect Norikra
• All features of Norikra
• Including "Ultra fast bootstrap"
• Compatible RPC API w/ original Norikra
• Distributed execution on any scheduler
• YARN? Mesos? or ...?
• Automatic failover & retry for failures (HA)
• Automated optimization for load balancing
• Dynamic scaling out: from 1 to 100 nodes, without any restarts/retries
MAKE Norikra PERFECT AGAIN
Features for More Perfection
• Loading operator internal states from Batch query engines
• Sharing operator internal states between queries
Stream Processing
• Monitoring, Reporting, Alerting
• Fast recommendation
• Matching behaviors
• and ...
Handling Long Term Data/History
[Timeline: website audience data]
• Jul 24, 2014: Purchase a car
• Jul 28, 2017: ....?
• Start a batch query to read 3~4 years of history
• Offer a nice bonus to a possible customer!
• Browser session already expired......
Stream Processing on Long Term Data
[Timeline: website audience data, processed continuously]
• Jul 24, 2014: Purchase a car
• Jul 28, 2017: Got a nice bonus offer!
• Jul 28, 2017: Got a wrong offer...
• Rewrite the query & start it without past data...
• 3 more years required to test it?
Resume/Restart of Queries
• Queries may be stopped/killed for many reasons
  • cluster upgrade / migration
  • troubles
• Queries must be modifiable at any time
  • wrong logic
  • data schema upgrade
  • new business requirements
What we want:
[Timeline: website audience data, processed continuously]
• Jul 24, 2014: Purchase a car
• Jul 28, 2017: Got a nice bonus offer!
• Jul 28, 2017: Got a wrong offer...
• Rewrite & start the query with the long past history
Load "Running" Queries
Load a "running" stream query from batch engines!
• Submit a stream query
• Query the history on batch engines & load the result as the intermediate state of the stream query
• Start to process realtime data
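The steps above could look roughly like the following Python sketch for a simple counting query. This is an assumption-laden illustration, not an existing Norikra feature: the class `CountByKey`, the method `load_state`, and the hard-coded `batch_result` rows (standing in for the output of a batch engine) are all hypothetical.

```python
from collections import Counter

class CountByKey:
    """A tiny streaming operator: COUNT(*) GROUP BY key."""

    def __init__(self, key):
        self.key = key
        self.counts = Counter()

    def load_state(self, batch_rows):
        """Seed the operator with (key value, count) rows computed by a batch query."""
        for value, count in batch_rows:
            self.counts[value] += count

    def process(self, event):
        """Update the running count with one realtime event and return it."""
        value = event[self.key]
        self.counts[value] += 1
        return value, self.counts[value]

# Stand-in for the batch result over the history, e.g. the rows of
# SELECT user_id, COUNT(*) FROM history GROUP BY user_id
batch_result = [("alice", 120), ("bob", 7)]

op = CountByKey("user_id")          # 1. submit the stream query
op.load_state(batch_result)         # 2. load the batch result as intermediate state
print(op.process({"user_id": "alice"}))  # 3. process realtime data
print(op.process({"user_id": "carol"}))
```

The first realtime event for "alice" continues from the historical count instead of starting at zero, which is the point of loading a "running" query.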
JOINs with Past Data
Submit a stream query w/ JOIN on past data
• Submit a query
• Query past data from batch & load it
• Start to process realtime data w/ JOIN
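A minimal Python sketch of this JOIN flow, assuming the batch engine returns the past rows as plain dictionaries. The function names `load_past_table` and `join_stream`, the `past.` prefix, and the sample rows are hypothetical and chosen only for illustration.

```python
def load_past_table(batch_rows, key):
    """Index the rows returned by the batch query by the join key."""
    table = {}
    for row in batch_rows:
        table.setdefault(row[key], []).append(row)
    return table

def join_stream(events, past_table, key):
    """Inner-join each realtime event with the matching past rows."""
    for event in events:
        for past in past_table.get(event[key], []):
            joined = {f"past.{name}": value for name, value in past.items()}
            joined.update(event)
            yield joined

# Stand-in for past data queried from the batch engine.
past_purchases = [{"user_id": "u1", "item": "car", "date": "2014-07-24"}]

# Realtime events arriving after the past data was loaded.
realtime = [
    {"user_id": "u1", "page": "/top"},
    {"user_id": "u2", "page": "/top"},
]

table = load_past_table(past_purchases, "user_id")
for row in join_stream(realtime, table, "user_id"):
    print(row)
```

Only the event from "u1" finds a matching past row, so only that event produces a joined result.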
True Lambda Architecture
• Use just one DSL for both Stream & Batch
  • SQL!
• Ingest the data stream into both Stream & Storage
• Handle time windows intelligently
  • Specify the time window outside the DSL
• Write once on batch, Run anywhere :D
Idempotent Operator State
• As a stream operator with realtime data
• As a loaded stream operator with past data
• Serializable operator internal states
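One way to read these three points: an operator should reach the same internal state whether it consumed every event as a stream, or was loaded from a serialized state built over past data and then continued on realtime data. The sketch below checks that property for a toy operator. The class `SumOperator` and its JSON serialization format are assumptions made for this example.

```python
import json

class SumOperator:
    """A toy operator whose whole internal state is (total, count)."""

    def __init__(self, total=0, count=0):
        self.total = total
        self.count = count

    def process(self, value):
        """Fold one value into the state."""
        self.total += value
        self.count += 1

    def dump(self):
        """Serialize the internal state to a JSON string."""
        return json.dumps({"total": self.total, "count": self.count}, sort_keys=True)

    @classmethod
    def load(cls, payload):
        """Rebuild an operator from a serialized state."""
        state = json.loads(payload)
        return cls(total=state["total"], count=state["count"])

history = [3, 5]       # past data, handled on the batch side
realtime = [7, 11]     # data arriving after the query starts

# Path 1: a stream operator that saw everything as realtime data.
streamed = SumOperator()
for value in history + realtime:
    streamed.process(value)

# Path 2: state computed over the history, serialized, loaded, then continued.
batch_side = SumOperator()
for value in history:
    batch_side.process(value)
loaded = SumOperator.load(batch_side.dump())
for value in realtime:
    loaded.process(value)

print(streamed.dump() == loaded.dump())
```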
Sharing Operators between Queries
[Diagram: history (stream) and history (batch: 3 - 4 years ago) feed a JOIN, marked as SHARED Operators; Query A and Query B each apply their own filter + projection to the shared result.]
Sharing Operators during Updating Query
[Diagram: history (stream) and history (batch: 3 - 4 years ago) feed a JOIN, marked as SHARED Operators; queries attach to it with their own filter + projection.]
• Query A (filter + projection) is running: Oops, I found a mistake in Query A!
• Query A and Query A' (filter + projection) run side by side: I've just added the updated query...
• Only Query A' (filter + projection) remains: It works! I can remove the older one.
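The update sequence can be sketched in Python as follows. This is a hypothetical design for illustration only: the class `SharedJoin`, its `attach`/`detach` methods, and the sample predicates are not part of Norikra. The shared operator keeps running while queries are attached and detached, so the expensive state is never rebuilt.

```python
class SharedJoin:
    """Stand-in for a shared upstream operator (for example, a JOIN over history).

    Every attached query applies its own filter + projection to the same rows.
    """

    def __init__(self):
        self.queries = {}
        self.processed = 0

    def attach(self, name, predicate, fields):
        """Register a query as a (filter, projection) pair."""
        self.queries[name] = (predicate, fields)

    def detach(self, name):
        """Remove a query without touching the shared state."""
        del self.queries[name]

    def process(self, row):
        """Handle one row once and fan the result out to all attached queries."""
        self.processed += 1
        results = {}
        for name, (predicate, fields) in self.queries.items():
            if predicate(row):
                results[name] = {field: row[field] for field in fields}
        return results

shared = SharedJoin()

# Query A has a mistake: the threshold is too high.
shared.attach("A", lambda row: row["price"] >= 200, ["user_id", "item"])

# Add the corrected Query A' next to it; the shared operator is not restarted.
shared.attach("A'", lambda row: row["price"] > 100, ["user_id", "item"])
first = shared.process({"user_id": "u1", "item": "car", "price": 150})
print(first)

# A' works, so the older query can be removed.
shared.detach("A")
second = shared.process({"user_id": "u2", "item": "bike", "price": 300})
print(second)
print(shared.processed)
```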
Perfect Stream Processing Engine
• The same SQL on both Batch and Stream
• Stream processor which can resume queries using batch query engine results
  • reduces memory usage of JOINs
  • reduces memory usage for historical data
• Stream processor which can share operators between queries
  • reduces the total amount of memory usage
  • makes it possible to restart/update queries anytime, casually
Perfect Norikra
Named
It still has 0 bytes.
Stay tuned!
We are hiring! - Treasure Data
