System revolution
How we did it
Victor Perepelitsky
questions: www.meetup.com/ILTechTalks/events/226834931/
slideshare: www.slideshare.net/victorperepelitsky
email: victor.prp@gmail.com
LivePerson customer example
[Diagram: a salesman exchanges chat lines with a visitor from the UK; the system gets the visitor's session state and activity events; a sales manager invites UK visitors to chat and sees reports.]
LivePerson at a glance
● Account (brand) - a LivePerson customer
● Visitor - an individual who interacts with the business owner's brand
● Agent - an account representative who may interact with visitors (examples: technical support, sales)
● Admin - an account representative who defines the business goals and normally manages agents so that they reach those goals effectively
LivePerson at a glance
[Diagram: agents and visitors exchange chat lines (chat scale: 2K req/sec); visitor session state and activity events are collected and visitors are invited to chat (visitor scale: 100K req/sec); admins define business rules and see reports (admin scale: under 100 req/sec).]
Legacy
[Diagram: the same agent, visitor, and admin flows (chat scale: 2K req/sec; visitor scale: 100K req/sec; admin scale: under 100 req/sec), served by two legacy components: a Real Time Server and an Offline and Reporting system.]
Legacy - stateful + account sticky
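[Diagram: web servers route each session to the RT server that owns its account. A session from account A goes to the RT server holding accounts A and C, a session from account B goes to the RT server holding accounts B and D, and a third RT server holds accounts E, F, and G.]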
Legacy
● Works
● Fast
● Partially resilient
● A huge number of features
Legacy - pains
● Hard to scale
● Hard to add new features
● Poor resource utilization
● Poor manageability
● Poor QoS
● Huge friction with customers
Let's go back
[Diagram: agents and visitors exchange chat lines (chat scale: 2K req/sec); visitor session state and activity events are collected and visitors are invited to chat (visitor scale: 100K req/sec); admins define business rules and see reports (admin scale: under 100 req/sec).]
Proper system architecture
[Diagram: agents, visitors, and admins with the same flows and scales (chat: 2K req/sec; visitor: 100K req/sec; admin: under 100 req/sec), now split into separate real-time, offline, reporting, and config components.]
The new dream
[Diagram: chat lines between agents and visitors (chat scale: 2K req/sec); visitor session state and activity events (visitor scale: 100K req/sec); admin business rules and reports (admin scale: under 1K req/sec). The platform is split into chat, offline, reporting, config, and monitor-and-engage components.]
* Business App / Extension
Monitor and engage = shark
Shark manifesto
● Collects data about individuals (visitors) as they interact with the business owner's brand (account), and makes that data available
● Acts in real time to engage visitors (chat, ad, call, etc.)
● Is a platform for business logic modules (sharklets) that can be developed and deployed independently
Fundamental decisions
Requirements?
Platform requirements
● E2E latency within the DC < 30 ms
● Good resource utilization (CPU > 50%)
● Efficient - at least 500 req/sec per node
● Independent sharklet development lifecycle
● High availability
○ uptime > 99.99999%
○ data loss < 0.01%
● Resilient - no service downtime when an external resource is unavailable (minimal degradation is allowed)
● Business logic correctness - 99.9%
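To make the availability and throughput targets above concrete, here is a small back-of-the-envelope calculation. It assumes a 365-day year and combines the visitor-scale figure (100K req/sec) with the 500 req/sec per-node floor from the slides; `RequirementMath` and its methods are illustrative names only, and the node count is a lower bound that ignores headroom and redundancy.

```java
// Back-of-the-envelope checks for the platform requirements.
// Assumptions (not from the slides): a 365-day year; node counts ignore
// headroom and redundancy, so they are lower bounds.
final class RequirementMath {
    static final long SECONDS_PER_YEAR = 365L * 24 * 60 * 60; // 31,536,000

    // Maximum downtime per year allowed by an uptime target given in percent.
    static double allowedDowntimeSecondsPerYear(double uptimePercent) {
        double downFraction = (100.0 - uptimePercent) / 100.0;
        return downFraction * SECONDS_PER_YEAR;
    }

    // Minimum number of nodes needed for a total request rate when each
    // node handles perNodeReqPerSec (ceiling division).
    static long minNodes(long totalReqPerSec, long perNodeReqPerSec) {
        return (totalReqPerSec + perNodeReqPerSec - 1) / perNodeReqPerSec;
    }

    public static void main(String[] args) {
        System.out.printf("99.99999%% uptime -> %.2f s downtime/year%n",
                allowedDowntimeSecondsPerYear(99.99999));
        System.out.println("100K req/sec at 500 req/sec per node -> "
                + minNodes(100_000, 500) + " nodes");
    }
}
```

At seven nines the yearly downtime budget is only about 3 seconds, and 100K req/sec at 500 req/sec per node needs at least 200 nodes before any redundancy is added.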
Fundamental decisions
Requirements? -> defined
Stateful or stateless?
Stateful
[Diagram: sessions 1-4 are each bound to a specific server; stickiness is required.]
Stateless
[Diagram: sessions 1-4 can be served by any server; each request potentially requires access to the session data store.]
Facts that helped us decide
1. Legacy works as "stateful without HA"
2. A small data loss has a tiny customer impact (0.01% loss is good enough)
3. Stateless requires many more resources and more initial effort
4. We can add an HA store in the future
Stateful shark
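[Diagram: as in the legacy design, web servers route each session (e.g. sessions from accounts A and B) to the RT server that owns its account; accounts such as A, B, C, D, E, F, G, and N are spread across the RT servers.]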
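Stateful, account-sticky routing means every request for a given account must reach the RT server that holds that account's in-memory state. The slides do not say how the web servers choose the server, so the sketch below assumes a simple scheme: an explicit account-to-server assignment table with a hash-based default for new accounts. `StickyRouter`, `assign`, and `route` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical account-sticky router: every session of an account is sent to
// the same RT server, which keeps that account's state in memory.
final class StickyRouter {
    private final List<String> servers;
    private final Map<String, String> assignments = new ConcurrentHashMap<>();

    StickyRouter(List<String> servers) {
        if (servers.isEmpty()) throw new IllegalArgumentException("no RT servers");
        this.servers = new ArrayList<>(servers);
    }

    // Explicitly pin an account to a server (e.g. to move a heavy account).
    void assign(String accountId, String server) {
        if (!servers.contains(server)) throw new IllegalArgumentException("unknown server: " + server);
        assignments.put(accountId, server);
    }

    // Unassigned accounts get a hash-based default, which is then remembered
    // so later requests for the same account keep landing on the same server.
    String route(String accountId) {
        return assignments.computeIfAbsent(accountId,
                id -> servers.get(Math.floorMod(id.hashCode(), servers.size())));
    }
}
```

Remembering the first choice keeps an account on its server even if the server list changes later; actually moving an account with `assign` would also require moving its in-memory state, which this sketch does not cover.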
Fundamental decisions
Requirements? -> defined
Stateful or stateless?
What are the big parts?
What are the big parts?
Legacy - successful patterns
1. Requests are processed in memory
2. External resources are accessed
asynchronously to visitor requests
3. Customer rules and data (AccountConfig) are kept in memory and may be updated in the background
Legacy - pains
1. Order of calls must be managed (inside code + rules)
2. Business logic is not built from pluggable components
3. HTTP requests are tightly coupled to the logical layers (hard to move to other protocols such as WebSockets)
[Diagram: Shark architecture. Web visitor, agent, and mobile visitor adapters sit behind a facade. SYNC: fast CEP and engagements, run by the sync handlers of sharklet A. ASYNC: slow actions and external resource access, run by the async handlers of sharklet A. Shared components: Account Runtime Data, a message bus, and an external resource.]
Shark - The Big Parts
1. Facade - decouples real-world protocols from the logical layers
2. CEP - removes the need to manage call order
3. Sync - very fast in-memory processing
4. Async - allows slow actions and external resource access
5. Account Runtime Store - allows in-memory access to customer configuration
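The split between fast in-memory (sync) handlers and slow (async) handlers can be sketched as follows. This is a minimal illustration under assumptions, not Shark's actual API: `VisitorEvent`, `Sharklet`, `onSync`, and `onAsync` are hypothetical names, and a plain `ExecutorService` stands in for the async side. Sync handlers run inline on the request path; async handlers run on a separate pool, so external resource calls never block a visitor request.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical event passed to sharklet handlers.
final class VisitorEvent {
    final String type;
    final String payload;
    VisitorEvent(String type, String payload) { this.type = type; this.payload = payload; }
}

// Hypothetical sharklet: business logic registers sync and async handlers;
// the platform decides where each kind runs.
final class Sharklet {
    private final List<Consumer<VisitorEvent>> syncHandlers = new CopyOnWriteArrayList<>();
    private final List<Consumer<VisitorEvent>> asyncHandlers = new CopyOnWriteArrayList<>();
    private final ExecutorService asyncPool = Executors.newFixedThreadPool(2);

    // Sync: fast, in-memory only; runs inline on the request path.
    void onSync(Consumer<VisitorEvent> handler) { syncHandlers.add(handler); }

    // Async: slow work such as external resource calls; never blocks the request.
    void onAsync(Consumer<VisitorEvent> handler) { asyncHandlers.add(handler); }

    void handle(VisitorEvent event) {
        for (Consumer<VisitorEvent> h : syncHandlers) h.accept(event);
        for (Consumer<VisitorEvent> h : asyncHandlers) asyncPool.submit(() -> h.accept(event));
    }

    // Stops accepting async work and waits briefly for queued handlers to finish.
    void shutdown() {
        asyncPool.shutdown();
        try {
            asyncPool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In this shape a sharklet only declares what it reacts to; whether a handler runs inline or on the pool is a platform decision, which is what lets sharklets be developed independently.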
Fundamental decisions
Requirements? -> defined
Stateful or stateless? -> stateful
What are the big parts? -> we have it
Basic technology stack
Basic technology stack - ?
We were practical
CEP technology?
CEP - in a nutshell
Drools - in a nutshell
Drools - we tried to kill it
We had
● played with it - :)
● integrated it into Shark - :)
● made a POC using LivePerson logic - :)
● tested it for performance - :(
We played with more technologies
And finally chose the solution
Shark CEP - processing cycle
[Animated diagram: events (a, b) wait in an event queue and are dispatched to handlers 1-3; as the handlers run, a new event (c) joins the queue, and the cycle continues until the event queue is empty.]
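The processing cycle can be sketched as a single-threaded loop. This is a minimal illustration under assumptions, not Shark's actual CEP code: events are plain strings, each handler subscribes to one event type and returns any follow-up events, and `CepCycle`, `register`, and `process` are hypothetical names. Because handlers only react to the events they subscribe to, no one has to manage the order of calls explicitly.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical single-threaded CEP cycle: handlers subscribe to event types
// and may emit follow-up events, which are processed in the same cycle.
final class CepCycle {
    private final Map<String, List<Function<String, List<String>>>> handlers = new HashMap<>();

    void register(String eventType, Function<String, List<String>> handler) {
        handlers.computeIfAbsent(eventType, k -> new ArrayList<>()).add(handler);
    }

    // Runs one processing cycle and returns the order in which events were handled.
    List<String> process(List<String> initialEvents) {
        Deque<String> queue = new ArrayDeque<>(initialEvents);
        List<String> handled = new ArrayList<>();
        while (!queue.isEmpty()) {
            String event = queue.poll();
            handled.add(event);
            for (Function<String, List<String>> h
                    : handlers.getOrDefault(event, Collections.emptyList())) {
                queue.addAll(h.apply(event)); // new events join the back of the queue
            }
        }
        return handled;
    }
}
```

For example, if the handler for `a` emits `c`, processing `[a, b]` handles `a`, `b`, then `c`, and the cycle ends when the queue is empty. A production engine would also need a guard against handlers that keep emitting events forever, such as a cap on events per cycle.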
Sharklet handler example
Fundamental decisions
Stateful or stateless? -> stateful
What are the big parts? -> we have it
Basic technology stack -> chosen
CEP - technology choice -> DIY (in-house)
Fundamental decisions
Stateful or stateless? -> stateful
What are the big parts? -> we have it
Basic technology stack -> chosen
CEP - technology choice -> DIY (in-house)
Locking architecture
Locking - The model
[Diagram: the world contains accounts (e.g. account A); each account contains sessions (e.g. sessions 1-4).]
Locking - Legacy pains
● You must be aware of locking when writing business logic
● A write lock on an account freezes all of that account's operations
● Locking, not the CPU, became the bottleneck
● Bugs
Locking - Shark solution
● Read/write lock per session
● Write business logic only - no locking awareness
● No write lock on the account - copy on write
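One way to get these three properties is sketched below, under assumptions rather than as Shark's real implementation: each session has its own `ReentrantReadWriteLock` that the platform (not the business logic) acquires, and account data is an immutable snapshot held in an `AtomicReference`. A sync cycle reads one consistent snapshot, while async updates publish a new copy instead of taking an account-wide write lock. `AccountConfig`, `SessionState`, `AccountRuntime`, and their methods are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.BiConsumer;
import java.util.function.UnaryOperator;

// Hypothetical immutable snapshot of an account's configuration.
final class AccountConfig {
    final Map<String, String> settings;
    AccountConfig(Map<String, String> settings) {
        this.settings = Collections.unmodifiableMap(new HashMap<>(settings));
    }
    // Copy on write: returns a new snapshot, never mutates this one.
    AccountConfig with(String key, String value) {
        Map<String, String> copy = new HashMap<>(settings);
        copy.put(key, value);
        return new AccountConfig(copy);
    }
}

// Hypothetical per-session state guarded by its own read/write lock.
final class SessionState {
    final ReadWriteLock lock = new ReentrantReadWriteLock();
    final Map<String, String> attributes = new HashMap<>();
}

final class AccountRuntime {
    private final AtomicReference<AccountConfig> config;
    private final Map<String, SessionState> sessions = new ConcurrentHashMap<>();

    AccountRuntime(AccountConfig initial) { config = new AtomicReference<>(initial); }

    // Async side: publish a new snapshot. The change function must be
    // side-effect free because updateAndGet may retry it under contention.
    void updateConfig(UnaryOperator<AccountConfig> change) { config.updateAndGet(change); }

    // Sync side: the platform takes the session write lock; the business
    // logic sees one consistent config snapshot for the whole cycle.
    void processSession(String sessionId, BiConsumer<Map<String, String>, AccountConfig> logic) {
        SessionState s = sessions.computeIfAbsent(sessionId, id -> new SessionState());
        AccountConfig snapshot = config.get();
        s.lock.writeLock().lock();
        try {
            logic.accept(s.attributes, snapshot);
        } finally {
            s.lock.writeLock().unlock();
        }
    }

    // Read-only queries share the session read lock.
    String readAttribute(String sessionId, String key) {
        SessionState s = sessions.get(sessionId);
        if (s == null) return null;
        s.lock.readLock().lock();
        try {
            return s.attributes.get(key);
        } finally {
            s.lock.readLock().unlock();
        }
    }
}
```

The business logic passed to `processSession` never touches a lock, and a config update on one account never blocks sessions that are already processing with the previous snapshot.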
[Diagram: the Shark architecture (facade with web visitor, agent, and mobile visitor adapters; sharklet A sync and async handlers; Account Runtime Data; external resource). SYNC: a single processing cycle uses a consistent copy of the account data. ASYNC: updates account data using the copy-on-write pattern.]
Sharklet example (no locks)
Fundamental decisions
Stateful or stateless? -> stateful
What are the big parts? -> we have it
Basic technology stack -> chosen
CEP - technology choice -> DIY (in-house)
Locking architecture -> decided
We had a good start
But! We were alone
LiveEngage - the big decision
Dream = LiveEngage platform
[Diagram: chat lines between agents and visitors (chat scale: 2K req/sec); visitor session state and activity events (visitor scale: 100K req/sec); admin business rules and reports (admin scale: under 1K req/sec). The platform is split into chat, offline, reporting, config, and monitor-and-engage components.]
* Business App / Extension
Rules - from definition to runtime
[Diagram: visitor activity events and admin business rules (via config) flow into the monitor-and-engage component (* Business App / Extension): if the visitor meets the conditions, invite to chat.]
Rules in LiveEngage dream
What is a rules engine?
A rules engine is a pluggable software component that executes business rules.
The rules are externalized, i.e. separated from the application code.
Rules engine implementation
Boolean logic is the easy part
Rules engine implementation
Hard to detect which conditions must be evaluated when a new fact arrives
Rules engine implementation
Hard to implement a Drools-like DSL
Rules engine - how to make it happen?
● Drools - eats memory
● Legacy rules engine
○ Customer friction is too high
○ Not efficient
GRF - Generic Rules Framework
Conditions and outcomes are building blocks that can be combined to create complex rules
Hard-coded building blocks: TimeOnPage, GeoLocation, InviteToChat
rule
if (
    timeOnPage(5)
    and
    geoLocation("US")
) execute {
    inviteToChat()
}
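The GRF rule above can be mirrored with plain Java building blocks. This is a hedged sketch, not GRF's actual API: conditions are modeled as `Predicate<VisitorContext>` and outcomes as `Consumer<VisitorContext>`; `VisitorContext`, `Blocks`, and `Rule` are hypothetical names, and time on page is assumed to be measured in seconds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical per-visitor facts that conditions can inspect.
final class VisitorContext {
    final int secondsOnPage;
    final String country;
    final List<String> actions = new ArrayList<>(); // outcomes record what they did
    VisitorContext(int secondsOnPage, String country) {
        this.secondsOnPage = secondsOnPage;
        this.country = country;
    }
}

// Hard-coded building blocks: two conditions and one outcome.
final class Blocks {
    static Predicate<VisitorContext> timeOnPage(int seconds) {
        return v -> v.secondsOnPage >= seconds;
    }
    static Predicate<VisitorContext> geoLocation(String country) {
        return v -> v.country.equals(country);
    }
    static Consumer<VisitorContext> inviteToChat() {
        return v -> v.actions.add("inviteToChat");
    }
}

// A rule combines a (possibly composite) condition with an outcome.
final class Rule {
    private final Predicate<VisitorContext> condition;
    private final Consumer<VisitorContext> outcome;
    Rule(Predicate<VisitorContext> condition, Consumer<VisitorContext> outcome) {
        this.condition = condition;
        this.outcome = outcome;
    }
    // Executes the outcome only if the condition holds; reports whether it fired.
    boolean fire(VisitorContext v) {
        if (!condition.test(v)) return false;
        outcome.accept(v);
        return true;
    }
}
```

Usage: `new Rule(Blocks.timeOnPage(5).and(Blocks.geoLocation("US")), Blocks.inviteToChat())` corresponds to the slide's rule, with `Predicate.and` playing the role of the `and` keyword.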
GRF + CEP = RulesEngine
GeoLocation condition
trigger when (geo data is changed)
evaluate(geo, accountConfig) {
    if (geo == accountConfig.geo)
        TRUE
    else
        FALSE
}
The condition type implementor defines the evaluation trigger, instead of relying on automatic detection
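Declared triggers let the engine skip automatic fact-change detection: each condition type states which fact changes should cause it to be re-evaluated, and when a new fact arrives only the subscribed conditions run. The sketch below assumes facts and account config are simple string maps; `Condition`, `GeoLocationCondition`, `TriggerIndex`, and the fact name `"geo"` are hypothetical. It compares values with `equals`, which is what a Java version of the slide's `geo == accountConfig.geo` would need.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical condition type: the implementor declares its triggers.
interface Condition {
    Set<String> triggers(); // fact names whose change requires re-evaluation
    boolean evaluate(Map<String, String> facts, Map<String, String> accountConfig);
}

final class GeoLocationCondition implements Condition {
    public Set<String> triggers() { return Collections.singleton("geo"); }
    public boolean evaluate(Map<String, String> facts, Map<String, String> accountConfig) {
        String geo = facts.get("geo");
        return geo != null && geo.equals(accountConfig.get("geo"));
    }
}

// Maps each fact name to the conditions that declared it as a trigger.
final class TriggerIndex {
    private final Map<String, List<Condition>> byFact = new HashMap<>();

    void add(Condition c) {
        for (String fact : c.triggers()) {
            byFact.computeIfAbsent(fact, k -> new ArrayList<>()).add(c);
        }
    }

    // When a fact changes, re-evaluate only the conditions triggered by it.
    Map<Condition, Boolean> onFactChanged(String fact, Map<String, String> facts,
                                          Map<String, String> accountConfig) {
        Map<Condition, Boolean> results = new HashMap<>();
        for (Condition c : byFact.getOrDefault(fact, Collections.emptyList())) {
            results.put(c, c.evaluate(facts, accountConfig));
        }
        return results;
    }
}
```

The trade-off is explicit: the engine never has to analyze rule bodies to find dependencies, at the cost of each condition author declaring the triggers correctly.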
Shark Rules Engine (Condition)
GRF - giraffe
GIRAFFE
[Diagram: the Shark architecture (facade with web visitor, agent, and mobile visitor adapters; sharklet A sync and async handlers; Account Runtime Data; message bus) with Account Config and a Rules Engine added. SYNC: detects which conditions should be evaluated and triggers GRF. ASYNC: loads rules into the Shark rules engine.]
We did a little more and felt ready to go
[Diagram: sharklets A and B behind the facade and adapters (web visitor, agent, mobile visitor), with the Rules Engine, Account Config, Account Runtime Data, message bus, and an Account Config Service. SYNC: CEP, rules, and a Report-Sharklet. ASYNC: integrated with account config.]
Feel the field
[Diagram: Legacy serving agents, visitors, and admins, with visitor activities marked "Silent mode".]
The dream comes true
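[Diagram: agents and visitors exchange chat lines, visitor session state and activity events are collected, and admins define business rules and see reports; the platform is split into chat, offline, reporting, config, and monitor-and-engage components.]
* Business App / Extension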
Platform in action
[Diagram: Legacy and the new platform side by side, with labels for chat, agent, visitor, admin, activities, engagements, Account Config, and Reports.]
First small customers
Shark
We started with a small cluster
and just added servers as the business grew
We recognized major bottlenecks
And easily fixed them
Tools and techniques
● Statistics monitoring
● Testing methodology
● Java 8
● Notes about G1
Statistics monitoring - graphite
Statistics monitoring - metrics
https://github.com/dropwizard/metrics
http://metrics.dropwizard.io
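import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Timer;

import static com.codahale.metrics.MetricRegistry.name;

// Request and Response are the application's own types.
public class RequestHandler {
    // In practice a single shared MetricRegistry is usually injected.
    private final MetricRegistry metrics = new MetricRegistry();
    private final Timer responses =
            metrics.timer(name(RequestHandler.class, "responses"));

    public String handleRequest(Request request, Response response) {
        final Timer.Context context = responses.time();
        try {
            // etc.
            return "OK";
        } finally {
            context.stop(); // records the elapsed time, even if handling throws
        }
    }
}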
Testing methodology
● Unit test - use it
● Integration test - invest here
● System test - try to minimize effort
● Performance
○ Integration - worth it
○ System - choose your tests
Performance test logs
Performance test validations
Testing methodology
How did we test the platform?
We
● built the main code with tests in mind
● mocked our clients
Java 8
● We moved to Java 8 one year ago
● It was easy :)
● It pushed us toward
○ more expressive code
○ functional style
○ immutability
Search YouTube for "LivePerson Functional Java 8"
Notes about G1
● Designed for large heaps; minimizes long pauses
● Slated to become the default GC in Java 9
● We tested our system with G1 with 12 GB in use and
○ got good results (no long GC pauses)
We are happy now
● Horizontal scalability
● Independent and safe business
logic development
● Fast development cycles (platform,
sharklets, data-model)
● Efficient resource utilization
● Fewer bugs (and easier to fix)
● Better QoS
● Overall confidence
Numbers
Peak statistics        Shark      Legacy
Concurrent visitors    ~100K      ~1 million
Requests/sec           ~11K       ~110K
Machines               ~34        ~700
Cores                  ~224       ~6,300
Cost per visitor       ~0.001     ~0.006
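Dividing the peak figures gives per-machine and per-core throughput, which is one way to read the efficiency gap. The arithmetic below uses only the table's approximate numbers; `PeakStats` is an illustrative helper, not part of either system.

```java
// Illustrative helper: derives per-machine and per-core throughput
// from the approximate peak statistics in the table.
final class PeakStats {
    final String system;
    final double requestsPerSec, machines, cores;

    PeakStats(String system, double requestsPerSec, double machines, double cores) {
        this.system = system;
        this.requestsPerSec = requestsPerSec;
        this.machines = machines;
        this.cores = cores;
    }

    double perMachine() { return requestsPerSec / machines; }
    double perCore()    { return requestsPerSec / cores; }

    public static void main(String[] args) {
        PeakStats shark  = new PeakStats("Shark", 11_000, 34, 224);
        PeakStats legacy = new PeakStats("Legacy", 110_000, 700, 6_300);
        for (PeakStats s : new PeakStats[] {shark, legacy}) {
            System.out.printf("%-6s %6.1f req/sec per machine, %5.1f req/sec per core%n",
                    s.system, s.perMachine(), s.perCore());
        }
        System.out.printf("Per-core throughput ratio: %.1fx%n", shark.perCore() / legacy.perCore());
    }
}
```

With these figures Shark serves about 323.5 req/sec per machine and 49.1 per core, versus about 157.1 and 17.5 for Legacy, so roughly 2.8 times as many requests per core at peak.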
Future challenges and ideas
● Better high availability
● Deployment with no downtime
● Management tools
● 100K accounts
Tips
● Define scope and requirements
● Company commitment is a must
● Work with your clients
● Treat test code as if it runs in
production
● Automated perf tests - they help
● Sometimes DIY is the best solution
● Respect legacy - combine old ideas with new technologies
● Understand the complexity and find the simplest solution
Never stop dreaming
THANK YOU!
We are hiring
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 

System Revolution - How We Did It

  • 1.
  • 2. System revolution How we did it Victor Perepelitsky questions: www.meetup.com/ILTechTalks/events/226834931/ slideshare: www.slideshare.net/victorperepelitsky email: victor.prp@gmail.com
  • 3. LivePerson customer example (diagram): a salesman and a visitor from the UK exchange chat lines; the platform gets the visitor's session state and activity events; the sales manager sets the rule "invite UK visitors to chat" and sees reports; the visitor is invited.
  • 4. LivePerson at a glance ● Account (brand) - a LivePerson customer ● Visitor - an individual who interacts with the business owner's brand ● Agent - an account representative who may interact with visitors (examples: technical support, sales) ● Admin - an account representative who defines the business goals and normally manages agents so that they reach those goals effectively
  • 5. LivePerson at a glance (diagram): agent, visitor, admin. Chat scale (2K req/sec): agent and visitor exchange chat lines. Visitor scale (100K req/sec): get session state, activity events. Admin scale (under 100 req/sec): define business rules, see reports. Engagement: invite.
  • 6. Legacy (diagram): the same flows (chat scale 2K req/sec, visitor scale 100K req/sec, admin scale under 100 req/sec; chat lines, get session state, activity events, define business rules, see reports) are served by two systems: the Real Time Server and Offline and Reporting.
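  • 7. Legacy - stateful + account sticky (diagram): web servers route each session to the RT server that holds its account (RT server A, C; RT server B, D; RT server E, F, G), so a session from account A goes to the server holding A and a session from account B goes to the server holding B.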
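The account-sticky routing on this slide can be sketched as a lookup from an account to the RT server that owns it. This is a minimal illustration, assuming a static assignment table; the class `StickyRouter`, its methods and the server names are hypothetical, not LivePerson code:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: web servers route every session to the RT server
// that owns the session's account (static account -> server assignment).
final class StickyRouter {
    private final Map<String, String> accountToServer = new HashMap<>();

    // Assign a group of accounts to one RT server, e.g. "rt-1" owns A and C.
    void assign(String server, String... accounts) {
        for (String account : accounts) {
            accountToServer.put(account, server);
        }
    }

    // All sessions of one account land on the same server, so that server
    // can keep the account's sessions in memory without a shared store.
    String route(String accountId) {
        String server = accountToServer.get(accountId);
        if (server == null) {
            throw new IllegalArgumentException("No RT server owns account " + accountId);
        }
        return server;
    }
}
```

Because ownership is per account in this sketch, one very busy account cannot spread across several servers, and rebalancing means moving whole accounts.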
  • 8. Legacy ● Works ● Fast ● Partially resilient ● A huge number of features
  • 9. Legacy - pains ● Hard to scale ● Hard to add new features ● Poor resource utilization ● Poor manageability ● Poor QoS ● Huge friction with customers
  • 10. Let's go back (diagram): agent, visitor, admin. Chat scale (2K req/sec): chat lines. Visitor scale (100K req/sec): get session state, activity events. Admin scale (under 100 req/sec): define business rules, see reports. Engagement: invite.
  • 11. Proper system architecture (diagram): the same flows split into four areas: real time, offline, reporting and config.
  • 12. The new dream (diagram): agent and visitor exchange chat lines (chat scale, 2K req/sec); session state and activity events (visitor scale, 100K req/sec); admin business rules and reports (admin scale, under 1K req/sec). Areas: chat, offline, reporting, config, and monitor and engage*. * Business App / Extension
  • 13. Monitor and engage = shark. Shark manifesto ● Collects data about individuals (visitors) as they interact with the business owner's brand (account) and makes it available ● Acts in real time to engage visitors (chat, ad, call, etc.) ● Is a platform for business logic modules (sharklets), which may be independently developed and deployed
  • 15. Platform requirements ● E2E latency within the DC < 30 ms ● Good resource utilization (CPU > 50%) ● Efficient - at least 500 req/sec per node ● Independent sharklet development lifecycle ● High availability ○ uptime > 99.99999% ○ data loss < 0.01% ● Resilient - no service downtime when an external resource is unavailable (minimal degradation is allowed) ● Business logic correctness - 99.9%
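The availability and efficiency targets translate into concrete budgets. A small sketch of the arithmetic, assuming a 365-day year; `SlaMath` and its methods are hypothetical names:

```java
// Hypothetical helper: arithmetic behind the platform requirements.
final class SlaMath {
    // 365 * 24 * 60 * 60 = 31,536,000 seconds (assumes a 365-day year).
    static final double SECONDS_PER_YEAR = 365.0 * 24 * 60 * 60;

    // Allowed downtime per year for an uptime percentage such as 99.99999.
    static double downtimeSecondsPerYear(double uptimePercent) {
        return SECONDS_PER_YEAR * (100.0 - uptimePercent) / 100.0;
    }

    // Nodes needed when each node handles minReqPerNode requests/sec
    // (ceiling division, so a partial node counts as a whole one).
    static long nodesNeeded(long totalReqPerSec, long minReqPerNode) {
        return (totalReqPerSec + minReqPerNode - 1) / minReqPerNode;
    }
}
```

For example, 99.99999% uptime leaves about 3.15 seconds of downtime per year, and serving 100K req/sec at the minimum of 500 req/sec per node needs at most 200 nodes.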
  • 16. Fundamental decisions ● Requirements? -> defined ● Stateful or stateless?
  • 18. Stateless (diagram): sessions 1-4 all use one session data store. Each request potentially requires access to the session data store.
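The cost this slide points to is a round trip to the shared store on every request. A minimal sketch, assuming a hypothetical `SessionStore` interface and an in-memory stand-in that counts round trips (a real stateless deployment would call a remote, highly available store):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical shared session store; in production each call is a network hop.
interface SessionStore {
    Map<String, String> load(String sessionId);
    void save(String sessionId, Map<String, String> state);
}

// In-memory stand-in that counts round trips to the store.
final class CountingStore implements SessionStore {
    private final Map<String, Map<String, String>> data = new HashMap<>();
    int roundTrips = 0;

    public Map<String, String> load(String sessionId) {
        roundTrips++;
        return new HashMap<>(data.getOrDefault(sessionId, new HashMap<>()));
    }

    public void save(String sessionId, Map<String, String> state) {
        roundTrips++;
        data.put(sessionId, new HashMap<>(state));
    }
}

// Stateless handler: keeps nothing between requests, so every request
// pays one load and one save against the store.
final class StatelessHandler {
    private final SessionStore store;

    StatelessHandler(SessionStore store) {
        this.store = store;
    }

    void handle(String sessionId, String key, String value) {
        Map<String, String> state = store.load(sessionId);
        state.put(key, value);
        store.save(sessionId, state);
    }
}
```

Three requests cost six store round trips here; a stateful node that keeps sessions in memory avoids them, at the price of losing in-memory state when a node fails.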
  • 19. Facts that helped us decide 1. Legacy works as "stateful without HA" 2. A small data loss has a tiny customer impact (0.01% loss is good enough) 3. Stateless requires much more resources and initial effort 4. We can add an HA store in the future
  • 20. Stateful shark (diagram): web servers route sessions from accounts A, B and N to the RT servers holding those accounts (RT server A, C; RT server D; RT server E, F, G).
  • 21. Fundamental decisions ● Requirements? -> defined ● Stateful or stateless? ● What are the big parts?
  • 22. What are the big parts?
  • 23. Legacy - successful patterns 1. Requests are processed in memory 2. External resources are accessed asynchronously with respect to visitor requests 3. Customer rules and data (AccountConfig) are kept in memory and may be updated in the background
  • 24. Legacy - pains 1. Order of calls (inside code + rules) 2. Business logic is not built from pluggable components 3. HTTP requests are tightly coupled with the logical layers (hard to move to other protocols such as WebSockets)
  • 25.
  • 26. SYNC - fast CEP, engagements ● ASYNC - slow actions, external resource access. Diagram components: web visitor, agent and mobile visitor adapters; facade; sharklet A (sync handlers); sharklet A (async handlers); Account Runtime Data; Message BUS; external resource
  • 27. Shark - the big parts 1. Facade - decouples real-world protocols from the logical layers 2. CEP - avoids call-order management 3. Sync - very fast in-memory processing 4. Async - allows slow actions and external resource access 5. Account Runtime Store - allows in-memory access to customer configuration
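The facade, sync and async parts can be wired together in a few dozen lines. A minimal sketch, not the Shark implementation: the classes `Event`, `MessageBus` and `Facade` are hypothetical, the message bus is stood in for by a single-threaded executor, and CEP and the account runtime store are left out to keep it short:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Logical event produced by the facade; carries no protocol details.
final class Event {
    final String type;
    final Map<String, String> data;

    Event(String type, Map<String, String> data) {
        this.type = type;
        this.data = data;
    }
}

// Stand-in for the message bus: one background thread delivers events
// to async subscribers, off the request path.
final class MessageBus {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final List<Consumer<Event>> subscribers = new ArrayList<>();

    void subscribe(Consumer<Event> subscriber) {
        subscribers.add(subscriber);
    }

    // Returns immediately; each subscriber runs later on the bus thread.
    void publish(Event event) {
        for (Consumer<Event> subscriber : subscribers) {
            executor.submit(() -> subscriber.accept(event));
        }
    }

    void shutdownAndWait() {
        executor.shutdown();
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

// Facade: adapters (web visitor, agent, mobile visitor) call onRequest with a
// logical event type, so handlers never depend on HTTP or WebSocket details.
final class Facade {
    private final List<Consumer<Event>> syncHandlers = new ArrayList<>();
    private final MessageBus bus;

    Facade(MessageBus bus) {
        this.bus = bus;
    }

    void addSyncHandler(Consumer<Event> handler) {
        syncHandlers.add(handler);
    }

    void onRequest(String eventType, Map<String, String> data) {
        Event event = new Event(eventType, data);
        for (Consumer<Event> handler : syncHandlers) {
            handler.accept(event); // fast, in-memory work on the request path
        }
        bus.publish(event); // slow actions and external resources, asynchronously
    }
}
```

Sync handlers see the event before `onRequest` returns; async subscribers see it later on the bus thread, which is where slow actions and external resource calls belong.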
  • 28. Fundamental decisions ● Requirements? -> defined ● Stateful or stateless? -> stateful ● What are the big parts? -> we have it ● Basic technology stack
  • 30. We were practical ● CEP technology?
  • 31. CEP - in a nutshell
  • 32. Drools - in a nutshell
  • 33. Drools - we tried to kill it. We ● played with it - :) ● integrated it into shark - :) ● made a POC using LivePerson logic - :) ● tested it for performance - :(
  • 34. We played with more technologies
  • 35. And finally chose the solution
  • 36. Shark CEP - processing cycle (diagram): handler 1, handler 2, handler 3; Event Queue: b a
  • 37. Shark CEP - processing cycle (diagram): handler 1, handler 2, handler 3; Event Queue: a b
  • 38. Shark CEP - processing cycle (diagram): handler 1, handler 2, handler 3; Event Queue: b a a
  • 39. Shark CEP - processing cycle (diagram): handler 1, handler 2, handler 3; Event Queue: b c
  • 40. Shark CEP - processing cycle (diagram): handler 1, handler 2, handler 3; Event Queue: b c
  • 41. Shark CEP - processing cycle (diagram): handler 1, handler 2, handler 3; Event Queue: b c
  • 42. Shark CEP - processing cycle (diagram): handler 1, handler 2, handler 3; Event Queue: empty
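The animation shows handlers draining an event queue, with a new event (c) appearing along the way. One plausible reading, sketched under the assumption that handlers subscribe to event types and may emit follow-up events into the same queue; `CepCycle`, `on` and `process` are hypothetical names, not the in-house Shark CEP API:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-threaded CEP processing cycle: events are taken from a
// FIFO queue and dispatched to every handler subscribed to their type;
// handlers may emit new events, which are appended to the same queue.
// The cycle ends when the queue is empty.
final class CepCycle {
    interface Handler {
        // Returns follow-up events (possibly empty, never null).
        List<String> handle(String event);
    }

    // Safeguard (an assumption of this sketch) against handlers that never stop emitting.
    private static final int MAX_EVENTS_PER_CYCLE = 10_000;

    private final Map<String, List<Handler>> handlers = new HashMap<>();

    void on(String eventType, Handler handler) {
        handlers.computeIfAbsent(eventType, k -> new ArrayList<>()).add(handler);
    }

    // Runs one cycle for the initial events; returns the processing order.
    List<String> process(List<String> initialEvents) {
        Deque<String> queue = new ArrayDeque<>(initialEvents);
        List<String> processed = new ArrayList<>();
        while (!queue.isEmpty()) {
            String event = queue.poll();
            processed.add(event);
            if (processed.size() > MAX_EVENTS_PER_CYCLE) {
                throw new IllegalStateException("event cycle did not converge");
            }
            for (Handler h : handlers.getOrDefault(event, Collections.<Handler>emptyList())) {
                queue.addAll(h.handle(event));
            }
        }
        return processed;
    }
}
```

Since handlers only declare which events they react to, nobody has to manage the call order by hand, which addresses the "order of calls" pain listed for the legacy system.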
  • 44. Fundamental decisions ● Stateful or stateless? -> stateful ● What are the big parts? -> we have it ● Basic technology stack -> chosen ● CEP technology choice -> DIY (in-house)
  • 45. Fundamental decisions ● Stateful or stateless? -> stateful ● What are the big parts? -> we have it ● Basic technology stack -> chosen ● CEP technology choice -> DIY (in-house) ● Locking architecture
  • 46. Locking - the model (diagram): the world contains accounts (account A), and each account contains sessions (session 1 ... session 4)
  • 47. Locking - legacy pains ● You must be aware of locking when writing business logic ● A write lock on an account freezes all account operations ● Locking became the bottleneck (not the CPU) ● Bugs
  • 48. Locking - shark solution ● Read/write lock per session ● Write business logic only - no locking awareness ● No write lock on the account - copy on write
  • 49. SYNC - a single processing cycle uses a consistent copy of the account data ● ASYNC - updates the account data using the copy-on-write pattern. Diagram components: web visitor, agent and mobile visitor adapters; facade; sharklet A (sync handlers); sharklet A (async handlers); Account Runtime Data; external resource
  • 50. Sharklet example (no locks)
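The sharklet code itself is not in the transcript. The sketch below shows one way to get the properties described on the locking slides, assuming an `AtomicReference` holding an immutable account snapshot (copy on write) and a `ReentrantReadWriteLock` per session taken by the platform; `AccountRuntimeData`, `Session` and `Platform` are hypothetical names:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.BiFunction;
import java.util.function.Consumer;

// Account runtime data as an immutable snapshot. Readers never lock;
// writers copy, modify and swap in a new snapshot (copy on write).
final class AccountRuntimeData {
    private final AtomicReference<Map<String, String>> snapshot =
            new AtomicReference<>(Collections.<String, String>emptyMap());

    Map<String, String> current() {
        return snapshot.get();
    }

    // Retries if another writer swapped the snapshot in the meantime.
    void update(Consumer<Map<String, String>> change) {
        Map<String, String> before;
        Map<String, String> after;
        do {
            before = snapshot.get();
            Map<String, String> copy = new HashMap<>(before);
            change.accept(copy);
            after = Collections.unmodifiableMap(copy);
        } while (!snapshot.compareAndSet(before, after));
    }
}

// Per-session state guarded by a read/write lock owned by the platform.
final class Session {
    final ReadWriteLock lock = new ReentrantReadWriteLock();
    final Map<String, String> state = new HashMap<>();
}

final class Platform {
    // One processing cycle: the platform takes the session write lock and a
    // single consistent account snapshot, then calls lock-free business logic.
    static <R> R runCycle(Session session, AccountRuntimeData account,
                          BiFunction<Map<String, String>, Map<String, String>, R> logic) {
        Map<String, String> config = account.current();
        session.lock.writeLock().lock();
        try {
            return logic.apply(session.state, config);
        } finally {
            session.lock.writeLock().unlock();
        }
    }

    // Readers (e.g. a "get session state" request) share the read lock.
    static String readSession(Session session, String key) {
        session.lock.readLock().lock();
        try {
            return session.state.get(key);
        } finally {
            session.lock.readLock().unlock();
        }
    }
}
```

Business logic passed to `runCycle` is plain code with no locks. An async update swaps in a new account snapshot, while a cycle that already took the old snapshot keeps seeing consistent data until it finishes.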
  • 51. Fundamental decisions ● Stateful or stateless? -> stateful ● What are the big parts? -> we have it ● Basic technology stack -> chosen ● CEP technology choice -> DIY (in-house) ● Locking architecture -> decided
  • 52. We had a good start
  • 53. But! We were alone
  • 54. LiveEngage - the big decision
  • 55. Dream = LiveEngage platform (diagram): agent and visitor exchange chat lines (chat scale, 2K req/sec); session state and activity events (visitor scale, 100K req/sec); admin business rules and reports (admin scale, under 1K req/sec). Areas: chat, offline, reporting, config, and monitor and engage*. * Business App / Extension
  • 56. Rules - from definition to runtime (diagram): the admin defines business rules in config; visitor activity events reach monitor and engage*, which applies them: if the visitor meets the conditions -> invite to chat. * Business App / Extension
  • 58. What is a rules engine? A rules engine serves as a pluggable software component that executes business rules. These rules are externalized, or separated, from application code.
  • 59. Rules engine implementation: Boolean logic is the easy part
  • 60. Rules engine implementation: it is hard to detect which conditions must be evaluated when a new fact arrives
  • 61. Rules engine implementation: it is hard to implement a Drools-like DSL
  • 62. Rules engine - how to make it happen? ● Drools - eats memory ● Legacy rules engine ○ Customer friction is too high ○ Not efficient
  • 63.
  • 64.
  • 65. GRF - Generic Rules Framework. Conditions and outcomes are building blocks that can be used to create complex rules. Hard-coded building blocks: TimeOnPage, GeoLocation, InviteToChat. Rule: if (timeOnPage(5) and geoLocation("US")) execute { inviteToChat() }
  • 66. GRF + CEP = Rules Engine. GeoLocation condition: trigger when (geo data is changed); evaluate(geo, accountConfig) { if (geo == accountConfig.geo) TRUE else FALSE }. The condition type implementor defines the evaluation trigger instead of relying on automatic detection.
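The GRF + CEP idea, where each condition type declares its own trigger, can be sketched as an index from fact name to rules. A minimal illustration with hypothetical names (`Condition`, `Rule`, `RulesEngine`, `GeoLocation`, `TimeOnPage`); it assumes `timeOnPage(5)` means at least 5 seconds, and passes the account-config value to the condition's constructor instead of to `evaluate`:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Each condition type declares which facts trigger its evaluation, so the
// engine never has to detect dependencies automatically.
interface Condition {
    Set<String> triggers();                      // fact names, e.g. "geo"
    boolean evaluate(Map<String, String> facts); // pure boolean check
}

final class GeoLocation implements Condition {
    private final String configuredGeo; // taken from account config, e.g. "US"
    GeoLocation(String configuredGeo) { this.configuredGeo = configuredGeo; }
    public Set<String> triggers() { return Collections.singleton("geo"); }
    public boolean evaluate(Map<String, String> facts) {
        return configuredGeo.equals(facts.get("geo"));
    }
}

final class TimeOnPage implements Condition {
    private final int minSeconds;
    TimeOnPage(int minSeconds) { this.minSeconds = minSeconds; }
    public Set<String> triggers() { return Collections.singleton("timeOnPage"); }
    public boolean evaluate(Map<String, String> facts) {
        String seconds = facts.get("timeOnPage");
        return seconds != null && Integer.parseInt(seconds) >= minSeconds;
    }
}

final class Rule {
    final String outcome;             // e.g. "inviteToChat"
    final List<Condition> conditions; // combined with AND
    Rule(String outcome, Condition... conditions) {
        this.outcome = outcome;
        this.conditions = Arrays.asList(conditions);
    }
}

final class RulesEngine {
    private final Map<String, Set<Rule>> rulesByTrigger = new HashMap<>();
    private final Map<String, String> facts = new HashMap<>();

    void addRule(Rule rule) {
        for (Condition c : rule.conditions) {
            for (String fact : c.triggers()) {
                rulesByTrigger.computeIfAbsent(fact, k -> new LinkedHashSet<>()).add(rule);
            }
        }
    }

    // Called when a fact changes (a CEP event); returns the outcomes to execute.
    List<String> onFact(String name, String value) {
        facts.put(name, value);
        List<String> outcomes = new ArrayList<>();
        for (Rule rule : rulesByTrigger.getOrDefault(name, Collections.<Rule>emptySet())) {
            boolean allTrue = true;
            for (Condition c : rule.conditions) {
                if (!c.evaluate(facts)) { allTrue = false; break; }
            }
            if (allTrue) outcomes.add(rule.outcome);
        }
        return outcomes;
    }
}
```

A change to a fact that no condition declares as a trigger (for example `browser`) evaluates nothing, which is the saving over detecting dependencies automatically.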
  • 67. Shark Rules Engine (Condition)
  • 69. SYNC - detects which conditions should be evaluated and triggers GRF ● ASYNC - loads rules into the shark rules engine. Diagram components: web visitor, agent and mobile visitor adapters; facade; sharklet A (sync handlers); sharklet A (async handlers); Account Runtime Data; Message BUS; Account Config; Rules Engine
  • 70. We did a little more and felt ready to go
  • 71. SYNC - CEP, rules, Report-Sharklet ● ASYNC - integrated with account config. Diagram components: web visitor, agent and mobile visitor adapters; facade; sharklet A; sharklet B; Rules Engine; Account Config; Account Runtime Data; Message BUS; Account Config Service
  • 72. Feel the field (diagram): activities from the legacy agent, visitor and admin flows - silent mode
  • 73. The dream comes true (diagram): agent and visitor chat lines; session state and activity events; admin business rules and reports. Areas: chat, offline, reporting, config, and monitor and engage*. * Business App / Extension
  • 74. Platform in action (diagram): legacy chat; agent, visitor, admin; activities, engagements; Account Config; Reports. First small customers
  • 75. Shark: we started with a small cluster and just added servers as the business grew
  • 76. We recognized major bottlenecks
  • 78. Tools and techniques ● Statistics monitoring ● Testing methodology ● Java 8 ● Notes about G1
  • 79. Statistics monitoring - Graphite
  • 80. Statistics monitoring - Graphite
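  • 81. Statistics monitoring - metrics (https://github.com/dropwizard/metrics, http://metrics.dropwizard.io)

        import com.codahale.metrics.MetricRegistry;
        import com.codahale.metrics.Timer;
        import static com.codahale.metrics.MetricRegistry.name;

        // Request and Response are the application's own types.
        public class RequestHandler {
            private final MetricRegistry metrics = new MetricRegistry(); // usually one shared registry
            // Records the rate and duration of handled requests
            private final Timer responses = metrics.timer(name(RequestHandler.class, "responses"));

            public String handleRequest(Request request, Response response) {
                final Timer.Context context = responses.time();
                try {
                    // etc.
                    return "OK";
                } finally {
                    context.stop(); // always record the elapsed time, even on exceptions
                }
            }
        }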
  • 82. Testing methodology ● Unit tests - use them ● Integration tests - invest here ● System tests - try to minimize effort ● Performance tests ○ Integration - worth it ○ System - choose your tests
  • 85. Testing methodology: how did we test the platform? We ● built the main code with testing in mind ● mocked our clients
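"Mocked our clients" can be read as driving the platform with scripted stand-ins for visitors and agents instead of real browsers. A minimal sketch of that idea, assuming the platform exposes a narrow entry point; `PlatformEntryPoint` and `ScriptedVisitorClient` are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Narrow entry point the platform is built around ("with tests in mind").
interface PlatformEntryPoint {
    String handle(String visitorId, String event); // returns the platform's reply
}

// Fake visitor client: replays a scripted session and records every reply,
// so no real browser, agent console or network is needed.
final class ScriptedVisitorClient {
    private final String visitorId;
    private final List<String> replies = new ArrayList<>();

    ScriptedVisitorClient(String visitorId) {
        this.visitorId = visitorId;
    }

    List<String> replay(PlatformEntryPoint platform, String... events) {
        for (String event : events) {
            replies.add(platform.handle(visitorId, event));
        }
        return replies;
    }
}
```

In an integration test the platform argument would be the real facade and sharklets, while the scripted client stays the same.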
  • 86. Java 8 ● We moved to Java 8 one year ago ● It was easy :) ● It pushed us toward ○ more expressive code ○ a functional style ○ immutability. Search YouTube for "LivePerson Functional Java 8"
  • 87. Notes about G1 ● Designed for large heaps; minimizes long pauses ● Planned to become the default GC in Java 9 ● We tested our system with G1 with 12 GB in use and ○ received good results (no long GC pauses)
  • 88.
  • 89. We are happy now ● Horizontal scalability ● Independent and safe business logic development ● Fast development cycles (platform, sharklets, data model) ● Efficient resource utilization ● Fewer bugs (easier to fix) ● Better QoS ● Overall confidence
  • 90. Numbers (peak statistics)

        Metric              | Shark   | Legacy
        Concurrent visitors | ~100K   | ~1 Million
        Requests/sec        | ~11K    | ~110K
        Machines            | ~34     | ~700
        Cores               | ~224    | ~6300
        Cost per visitor    | ~0.001  | ~0.006
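The table invites a few derived ratios. A small sketch that computes them from the slide's approximate figures (so the results are approximate too); `PeakStats` is a hypothetical name:

```java
// Hypothetical helper: per-core and per-machine ratios from the peak statistics.
final class PeakStats {
    final String name;
    final double visitors, reqPerSec, machines, cores;

    PeakStats(String name, double visitors, double reqPerSec, double machines, double cores) {
        this.name = name;
        this.visitors = visitors;
        this.reqPerSec = reqPerSec;
        this.machines = machines;
        this.cores = cores;
    }

    double reqPerSecPerCore() { return reqPerSec / cores; }
    double reqPerSecPerMachine() { return reqPerSec / machines; }
    double visitorsPerCore() { return visitors / cores; }

    public static void main(String[] args) {
        PeakStats[] rows = {
            new PeakStats("Shark", 100_000, 11_000, 34, 224),
            new PeakStats("Legacy", 1_000_000, 110_000, 700, 6_300),
        };
        for (PeakStats s : rows) {
            System.out.printf("%-6s req/s per core: %.1f, req/s per machine: %.1f, visitors per core: %.1f%n",
                    s.name, s.reqPerSecPerCore(), s.reqPerSecPerMachine(), s.visitorsPerCore());
        }
    }
}
```

By these figures Shark handles about 49 req/sec per core against about 17.5 for legacy, roughly 2.8 times more, and about 446 concurrent visitors per core against about 159.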
  • 91. Future challenges and ideas ● Better high availability ● Deployment with no downtime ● Management tools ● 100K accounts
  • 92. Tips ● Define scope and requirements ● Company commitment is a must ● Work with your clients ● Treat test code as if it runs in production ● Automated perf tests help ● Sometimes DIY is the best solution ● Respect legacy - combine old ideas with new technologies ● Understand the complexity and find the simplest solution
  • 94. THANK YOU! We are hiring