6. Norikra: Schema-less Stream Processing using SQL
• Server software, written in JRuby, runs on JVM
• Open source software (GPLv2)
• http://norikra.github.io/
• https://github.com/norikra/norikra
7. SELECT user.age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
WHERE current="San Diego"
AND attend.$0 AND attend.$1
GROUP BY user.age
{"name":"tagomoris",
"user":{"age":35, "corp":"LINE",
"address":"Tokyo"},
"current":"San Diego",
"speaker":true,
"attend":[true,true,false, ...]
}
{"user.age":35,"cnt":5},
{"user.age":36,"cnt":8}, ...
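To make the semantics of the query on this slide concrete, here is a minimal Python sketch of what it computes over one time batch. It is not how Norikra or Esper evaluates queries; the helper names `get_path` and `run_batch` and the extra sample events ("alice", "bob") are assumptions made for illustration, and the 5-minute windowing is left out.

```python
from collections import Counter

def get_path(event, path):
    """Resolve a dotted path such as 'user.age' or 'attend.$0' in a nested event."""
    value = event
    for part in path.split("."):
        if part.startswith("$"):
            # '$N' addresses the N-th element of an array
            value = value[int(part[1:])]
        else:
            value = value[part]
    return value

def run_batch(events):
    """Apply the WHERE / GROUP BY logic of the query to the events of one batch."""
    counts = Counter()
    for ev in events:
        if (get_path(ev, "current") == "San Diego"
                and get_path(ev, "attend.$0")
                and get_path(ev, "attend.$1")):
            counts[get_path(ev, "user.age")] += 1
    return [{"user.age": age, "cnt": cnt} for age, cnt in sorted(counts.items())]

# Hypothetical sample events; only the first one comes from the slide.
events = [
    {"name": "tagomoris", "user": {"age": 35, "corp": "LINE", "address": "Tokyo"},
     "current": "San Diego", "speaker": True, "attend": [True, True, False]},
    {"name": "alice", "user": {"age": 35}, "current": "San Diego",
     "speaker": False, "attend": [True, True, True]},
    {"name": "bob", "user": {"age": 36}, "current": "Tokyo",
     "speaker": False, "attend": [True, True, False]},
]
result = run_batch(events)
```

With these three events only the two San Diego attendees aged 35 pass the filter, so `result` holds a single row for age 35.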
8. How Norikra is Perfect
• Ultra fast bootstrap
• Schema on read
• Handling complex (nested) events
• Dynamic query registration/unregistration
• Simple Web UI
• Data connector: Fluentd
• Extensible: UDF/Listener plugins
• Performance: good enough for small to mid-sized sites
9. Schema on Read
• Query first, Data next
• A query must know what it requires
• field names, types of fields, ...
• The platform can ingest any data into the processor;
each query fetches only the events that match its required schema.
[Diagram: events from the billing service and events from the API endpoint flow into one schema-less (mixed) data stream; query A receives the fields subset for query A, and query B receives the fields subset for query B.]
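The "query first, data next" idea can be sketched in a few lines of Python. This is an illustrative model only, not Norikra's implementation: the class `SchemaOnReadRouter`, the query names, and the sample fields (`user.id`, `amount`, `path`) are all assumptions. Each query declares the fields and types it needs, and any event that carries them is delivered to that query as a field subset.

```python
def flatten(event, prefix=""):
    """Flatten nested dicts into dotted names: {'user': {'id': 1}} -> {'user.id': 1}."""
    flat = {}
    for key, value in event.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat

class SchemaOnReadRouter:
    def __init__(self):
        self.queries = {}  # query name -> {field name: required Python type}

    def register(self, name, required):
        """A query states what it requires before any data arrives."""
        self.queries[name] = required

    def route(self, event):
        """Return {query name: field subset} for each query the event satisfies."""
        flat = flatten(event)
        matched = {}
        for name, required in self.queries.items():
            if all(isinstance(flat.get(f), t) for f, t in required.items()):
                matched[name] = {f: flat[f] for f in required}
        return matched

router = SchemaOnReadRouter()
router.register("query_a", {"user.id": int, "amount": int})
router.register("query_b", {"user.id": int, "path": str})

# Hypothetical events from two different services on the same mixed stream.
billing = {"user": {"id": 7}, "amount": 1200, "currency": "JPY"}
api = {"user": {"id": 7}, "path": "/v1/items", "status": 200}
from_billing = router.route(billing)
from_api = router.route(api)
```

The billing event reaches only `query_a` and the API event only `query_b`; fields that no query asked for (`currency`, `status`) are simply ignored.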
10. Architecture
Norikra Server (on JVM)
• Norikra Engine
• Esper Instance (Query Engine)
• Type Definition Manager
• Output Event Pool
• RPC Server
• mizuno (Jetty + Rack)
• Rack RPC Handler
Norikra Client (connects via msgpack-rpc-over-http)
11. For details :)
• Norikra: Stream Processing with SQL
http://www.slideshare.net/tagomoris/norikra-stream-processing-with-sql
• Norikra: SQL Stream Processing in Ruby
http://www.slideshare.net/tagomoris/norikra-sql-stream-processing-in-ruby
• Norikra in Action
http://www.slideshare.net/tagomoris/norikra-in-action-ver-2014-spring
• Landscape of Norikra Features
http://www.slideshare.net/tagomoris/norikra-meetup-features
• Norikra Recent Updates
http://www.slideshare.net/tagomoris/norikra-recent-updates
12. Recent Updates
• v1.4.0: Jul 19, 2016
• Add support for "-D" and "-agentlib" of JVM
• Update msgpack version
• Previous release v1.3.1: May 7, 2015
• Explained in "Norikra Recent Updates" slide
13. User Companies
• LINE Corporation
• Kayac Inc.
• Mercari, Inc.
• (and some/many others)
15. Perfect Norikra
• All features of Norikra
• Including "Ultra fast bootstrap"
• Compatible RPC API w/ original Norikra
• Distributed execution on any scheduler
• YARN? Mesos? or ...?
• Automatic failover & retry for failures (HA)
• Automated optimization for load balancing
• Dynamic scaling out from 1 to 100 nodes, without any restarts/retries
19. Handling Long Term Data/History
Timeline: website audience data
• Jul 24, 2014: Purchase a car
• Jul 28, 2017: ....?
• Start a batch query to read 3~4 years of history
• Offer a nice bonus to a possible customer!
• Browser session already expired......
20. Stream Processing on Long Term Data
Timeline: website audience data, processed continuously
• Jul 24, 2014: Purchase a car
• Jul 28, 2017: Got a nice bonus offer!
• Jul 28, 2017: Got a wrong offer...
• Rewrite the query & start it without past data...
• 3 more years required for testing?
21. Resume/Restart of Queries
• Queries may be stopped/killed for many reasons
• cluster version upgrade / migration
• troubles
• Queries should be modifiable at any time
• wrong logic
• data schema upgrade
• new business requirement
22. What we want:
Timeline: website audience data, processed continuously
• Jul 24, 2014: Purchase a car
• Jul 28, 2017: Got a nice bonus offer!
• Jul 28, 2017: Got a wrong offer...
• Rewrite & start the query with the long past history
23. Load "Running" Queries
Load "running" stream query from batch engines!
• Submit a stream query
• Query the history on batch engines & load the result as the intermediate state of the stream query
• Start to process realtime data
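The three steps on this slide can be sketched as follows. This is a toy model under stated assumptions, not an existing Norikra feature: `CountByKey`, `batch_count`, and the `user_id` field are hypothetical names, and the "batch engine" is just an in-memory aggregation.

```python
from collections import Counter

class CountByKey:
    """Stream operator: running COUNT(*) grouped by user_id."""
    def __init__(self):
        self.state = Counter()

    def load(self, batch_rows):
        """Seed the intermediate state from a batch result of (key, count) rows."""
        for key, count in batch_rows:
            self.state[key] += count

    def process(self, event):
        """Handle one realtime event and return the updated count for its key."""
        self.state[event["user_id"]] += 1
        return self.state[event["user_id"]]

def batch_count(history):
    """Stand-in for the batch engine: the same aggregation over stored history."""
    return Counter(ev["user_id"] for ev in history).items()

# 1. Submit the stream query (create the operator).
op = CountByKey()
# 2. Query the history on the batch side and load it as operator state.
history = [{"user_id": "u1"}, {"user_id": "u1"}, {"user_id": "u2"}]
op.load(batch_count(history))
# 3. Start processing realtime data on top of the loaded state.
latest = op.process({"user_id": "u1"})
```

The operator answers as if it had been running for the whole history: the first realtime event for `u1` yields a count of 3, not 1.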
25. JOINs with Past Data
Submit a stream query w/ JOIN past data
• Submit a query
• Query past data from batch & load it
• Start to process realtime data w/ JOIN
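A minimal sketch of the same flow for JOINs, assuming the past data fits in an in-memory lookup table. The class `PastDataJoin`, the field names, and the sample rows are hypothetical and only illustrate the idea of loading batch results into the join side before realtime events arrive.

```python
class PastDataJoin:
    """Stream-side inner join against rows loaded from a batch query."""
    def __init__(self, key):
        self.key = key
        self.table = {}  # join key value -> list of past rows

    def load(self, past_rows):
        """Load the batch query result into the join table."""
        for row in past_rows:
            self.table.setdefault(row[self.key], []).append(row)

    def process(self, event):
        """Emit one merged row per matching past row (event fields win on conflict)."""
        return [{**past, **event} for past in self.table.get(event[self.key], [])]

join = PastDataJoin(key="user_id")
# Hypothetical result of a batch query over 3~4 years of history.
join.load([{"user_id": "u1", "purchased": "car", "date": "2014-07-24"}])

matched = join.process({"user_id": "u1", "page": "/top"})
unmatched = join.process({"user_id": "u3", "page": "/top"})
```

A realtime visit from `u1` is joined with the 2014 purchase immediately; a visitor with no history produces no output row.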
27. True Lambda Architecture
• Use just one DSL for both Stream & Batch
• SQL!
• Ingest the data stream into both Stream & Storage
• Handle time windows intelligently
• Specify the time window outside the DSL
• Write once on batch, Run anywhere :D
28. Idempotent Operator State
• As a stream operator with realtime data
• As a loaded stream operator with past data
• Serializable operator internal states
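One way to read these three points: an operator should end up in the same internal state whether it saw every event live or was restored from a serialized snapshot and then continued. The sketch below illustrates that property with a hypothetical `SumOperator` that serializes its state as JSON; the class, its methods, and the event fields are assumptions, not part of any existing engine.

```python
import json

class SumOperator:
    """Running SUM(value) grouped by key, with serializable internal state."""
    def __init__(self, totals=None):
        self.totals = dict(totals or {})

    def process(self, event):
        key = event["key"]
        self.totals[key] = self.totals.get(key, 0) + event["value"]

    def dump(self):
        """Serialize the internal state (sorted keys make snapshots comparable)."""
        return json.dumps(self.totals, sort_keys=True)

    @classmethod
    def restore(cls, blob):
        """Rebuild an operator from a serialized snapshot."""
        return cls(json.loads(blob))

events = [{"key": "a", "value": 1}, {"key": "b", "value": 2}, {"key": "a", "value": 3}]

# Operator that processes everything as realtime data.
live = SumOperator()
for ev in events:
    live.process(ev)

# Operator that is stopped after two events, restored from its snapshot,
# and then continues with the remaining event.
resumed = SumOperator()
for ev in events[:2]:
    resumed.process(ev)
resumed = SumOperator.restore(resumed.dump())
resumed.process(events[2])
```

Both operators reach the same serialized state, which is what makes restart and resume safe in this toy model.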
30. SHARED Operators
Sharing Operators between Queries
[Diagram: history (stream) and history (batch: 3 - 4 years ago) feed a single JOIN operator; Query A (filter + projection) and Query B (filter + projection) both read from that shared JOIN.]
31. Sharing Operators during Updating Query
[Diagram: history (stream) and history (batch: 3 - 4 years ago) feed a single JOIN operator; Query A (filter + projection) reads from it.]
Oops, I found a mistake in Query A!
32. SHARED Operators
Sharing Operators during Updating Query
[Diagram: history (stream) and history (batch: 3 - 4 years ago) feed a single JOIN operator; Query A (filter + projection) and Query A' (filter + projection) both read from it.]
I've just added the updated query...
33. Sharing Operators during Updating Query
[Diagram: history (stream) and history (batch: 3 - 4 years ago) feed a single JOIN operator; only Query A' (filter + projection) reads from it.]
It works!
I can remove the older one.
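The update scenario in slides 31 to 33 can be sketched as below. This is a hypothetical model, not an existing engine API: `SharedJoin` holds the loaded history once, and queries are just (filter, projection) pairs that can be added and removed while the join keeps running.

```python
class SharedJoin:
    """One JOIN operator, with its loaded history, feeding many downstream queries."""
    def __init__(self, history):
        self.history = history   # user id -> past row, loaded once
        self.subscribers = {}    # query name -> (filter predicate, projected fields)

    def add_query(self, name, predicate, fields):
        self.subscribers[name] = (predicate, fields)

    def remove_query(self, name):
        del self.subscribers[name]

    def process(self, event):
        """Join the event with history, then run every query's filter + projection."""
        past = self.history.get(event["user_id"])
        if past is None:
            return {}
        joined = {**past, **event}
        out = {}
        for name, (predicate, fields) in self.subscribers.items():
            if predicate(joined):
                out[name] = {f: joined[f] for f in fields}
        return out

join = SharedJoin({"u1": {"user_id": "u1", "purchased": "car"}})
# Query A projects the wrong field (the "mistake").
join.add_query("A", lambda row: True, ["page"])
# Add the fixed Query A' next to it; the JOIN and its history are untouched.
join.add_query("A'", lambda row: row["purchased"] == "car", ["user_id", "purchased"])
both = join.process({"user_id": "u1", "page": "/top"})
# A' works, so the older query can be removed.
join.remove_query("A")
after = join.process({"user_id": "u1", "page": "/top"})
```

Because the expensive part (the history behind the JOIN) is shared, swapping Query A for Query A' never reloads past data.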
34. Perfect Stream Processing Engine
• Exactly the same SQL on both Batch and Stream
• Stream processor which can resume queries using batch query engine results
• reduces memory usage of JOINs
• reduces memory usage for historical data
• Stream processor which can share operators between queries
• reduces the total amount of memory usage
• makes it possible to restart/update queries anytime, casually