Scalability & Big Data challenges in
Real-Time Multiplayer
games
Real-Time games in Top 100 Grossing (2017)
2014: 3
2015: 6
2016: 8
2017: 13
2018: ???
Enabling Factors
[Figures not reproduced. Sources: PC Mag, OpenSignal]
QUIZ TIME: In 2017, which of these games
has made the most revenue?
The world’s most popular
MOBA on PC
The world’s most popular
First Person Shooter
Some game by Blizzard
Some game by EA
A Chinese 5v5 mobile
game you never heard of
Some game by King
>$400M Monthly Revenue
Source: Bloomberg
>80M DAU
Source: Tencent
10-20 inputs/s, sensitive to lags (> 300ms)
unpredictable network, limited bandwidth
Decisions, decisions...
Build vs Buy?
Self-hosted vs Cloud?
Global deployment vs Centralized?
TCP vs UDP?
Server Authoritative vs Lock-Step?
Constraints/Trade-offs
Latency (RTT)
Cost
Complexity
Scalability
Operational overhead
Global Deployment
vs
Centralised
10-20 inputs/s, sensitive to lags (> 300ms)
optimize for this
Global Deployment
● Players are geo-routed to closest multiplayer server.
● Matched with other players in the same geo-region for best UX.
● No need for players to “choose server”, it should just work.
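The deck does not say how geo-routing is implemented. One common approach, sketched here as an assumption, is for the client to probe each region and connect to the one with the lowest measured round-trip time. The region names and RTT values below are hypothetical.

```python
from typing import Dict


def pick_region(rtt_ms_by_region: Dict[str, float]) -> str:
    """Return the region with the lowest measured round-trip time."""
    if not rtt_ms_by_region:
        raise ValueError("no regions measured")
    return min(rtt_ms_by_region, key=rtt_ms_by_region.get)


# Hypothetical probe results for one player, in milliseconds.
probes = {"us-east": 95.0, "eu-west": 28.0, "ap-southeast": 210.0}
closest = pick_region(probes)
```

Because the choice is made from measurements, the player never has to "choose server"; matchmaking can then restrict candidates to players who landed in the same region.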
Global Deployment
● Should leaderboards be global or regional?
● Should guilds/alliances be global or regional?
● Should chatrooms be global or regional?
● Should liveops events be global or regional?
● Should players be allowed to play with others in another region?
i.e. play with distant relatives/friends.
● Should players be allowed to switch default region?
e.g. moved to Europe after Brexit
Server Authoritative
vs
Lock-Step
Server Authoritative
● Server decides game logic.
● Client sends all inputs to server.
● Client receives game state (either full, or delta) from server.
● Client keeps internal state for game world, which mirrors server state.
● Client doesn’t modify world state directly; it only displays it, with some
prediction to mask network latency.
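A minimal sketch of the server-authoritative flow described above, assuming a toy game where each player has a grid position. The class names, the move set and the delta format are hypothetical, chosen only to illustrate "server decides, client mirrors".

```python
from typing import Dict, List, Tuple

Position = Tuple[int, int]
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}


class AuthoritativeServer:
    """Holds the only copy of the game state that is allowed to change."""

    def __init__(self, players: List[str]) -> None:
        self.state: Dict[str, Position] = {p: (0, 0) for p in players}

    def tick(self, inputs: List[Tuple[str, str]]) -> Dict[str, Position]:
        """Apply this tick's inputs and return only the entries that changed."""
        before = dict(self.state)
        for player, move in inputs:
            if player not in self.state or move not in MOVES:
                continue  # the server ignores unknown players and invalid moves
            dx, dy = MOVES[move]
            x, y = self.state[player]
            self.state[player] = (x + dx, y + dy)
        return {p: pos for p, pos in self.state.items() if pos != before[p]}


class Client:
    """Mirrors server state; never mutates the world from its own inputs."""

    def __init__(self) -> None:
        self.world: Dict[str, Position] = {}

    def on_full_state(self, state: Dict[str, Position]) -> None:
        self.world = dict(state)

    def on_delta(self, delta: Dict[str, Position]) -> None:
        self.world.update(delta)


server = AuthoritativeServer(["c1", "c2"])
client = Client()
client.on_full_state(server.state)  # full state on join
delta = server.tick([("c1", "up"), ("c2", "teleport")])  # invalid input dropped
client.on_delta(delta)  # delta afterwards
```

Sending a delta instead of the full state is one way to reduce the bandwidth cost of this model; client-side prediction is left out of the sketch.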
[Sequence diagram, built up over several slides: Client 1 and Client 2 send control inputs (C1 control 1; C2 control 1, 2, 3) to the Server, and the Server sends game state updates (game state 1 to 5) back to the clients.]
Pros
● Always in-sync.
● Hard to cheat - no memory hacks, etc.
● Easy (and quick) to join mid-match.
● Server can detect lagged/DC’d client and take over with AI.
Cons
● High server load.
● High bandwidth usage.
● Synchronization on the client is complicated.
● Little experience in the company with the server-side .NET stack
(bus factor of 1).
● .NET Core was/is still a moving target.
high server load and
bandwidth needs
client has to receive
more data
Lock-Step*
● Client sends all inputs to server.
● Server collects all inputs, and buffers them.
● Server sends all buffered inputs to all clients X times a second.
● Client executes all inputs in the same order.
● Because everyone is 'guaranteed' to have executed the same input at
the same frame in the same order, we get synchronicity.
● Use prediction to mask network latency.
* traditional RTS games tend to use peer-to-peer model
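The lock-step flow above can be sketched as follows, assuming a toy simulation whose state is a single integer. The class names and the update rule are hypothetical; the update is deliberately order-sensitive so that clients only agree if they apply the same inputs in the same order.

```python
from typing import List, Tuple

Input = Tuple[str, int]  # (player id, input value)


class LockStepServer:
    """Relays inputs; it never runs the game simulation itself."""

    def __init__(self) -> None:
        self.buffer: List[Input] = []
        self.history: List[List[Input]] = []

    def receive(self, player: str, value: int) -> None:
        self.buffer.append((player, value))

    def flush_frame(self) -> List[Input]:
        """Called X times a second: emit buffered inputs as one frame."""
        frame, self.buffer = self.buffer, []
        self.history.append(frame)
        return frame


class LockStepClient:
    def __init__(self) -> None:
        self.frame_no = 0
        self.state = 0

    def apply_frame(self, frame: List[Input]) -> None:
        for _player, value in frame:
            # Integer-only, order-sensitive update: deterministic on every client.
            self.state = (self.state * 31 + value) % 1_000_003
        self.frame_no += 1


server = LockStepServer()
c1, c2 = LockStepClient(), LockStepClient()

server.receive("c1", 5)
server.receive("c2", 7)
frame1 = server.flush_frame()
server.receive("c2", 9)
frame2 = server.flush_frame()
for client in (c1, c2):
    client.apply_frame(frame1)
    client.apply_frame(frame2)

# A late joiner has to replay every frame so far, which is why joining is slower.
late = LockStepClient()
for past_frame in server.history:
    late.apply_frame(past_frame)
```

The server only relays inputs, which is where the lighter server load comes from; the cost is that every client must run an identical, deterministic simulation.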
[Sequence diagram, built up over several slides: Client 1 and Client 2 send control inputs to the Server, and the Server sends the buffered inputs, instead of game state, back to both clients once per frame.]
RTT: time between sending an input and receiving it back from the server
RTT = latency x 2 + X
Xmin = 0, Xmax = frame time
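A worked instance of the RTT relation above. X is the time an input waits in the server's buffer before the next broadcast, so it ranges from 0 to one frame time. The latency and broadcast rate used here are hypothetical numbers, not figures from the talk.

```python
from typing import Tuple


def rtt_bounds_ms(one_way_latency_ms: float, frame_time_ms: float) -> Tuple[float, float]:
    """Best and worst case for RTT = latency x 2 + X, with 0 <= X <= frame time."""
    best = 2 * one_way_latency_ms                  # input arrives just before a broadcast
    worst = 2 * one_way_latency_ms + frame_time_ms  # input just misses a broadcast
    return best, worst


# Hypothetical: 50 ms one-way latency, server broadcasting 10 times a second.
frame_time_ms = 1000 / 10
best, worst = rtt_bounds_ms(50, frame_time_ms)
```

With these numbers the RTT lies between 100 ms and 200 ms, so raising the broadcast rate shrinks the worst case even when network latency stays the same.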
Pros
● Light server load.
● Lower bandwidth usage.
● Simpler server implementation.
Cons
● Needs deterministic game engine.
● Unity has a long-standing determinism problem with floating point.
● Hackable, requires some form of server-side validation.
● All clients must take over lagged/DC’d client with AI.
● Slower to join mid-match, need to process all inputs.
● Need to ensure all clients in a match are compatible.
fixed-point math,
server validation, ...
bandwidth
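Fixed-point math sidesteps floating-point non-determinism by doing all simulation arithmetic on integers. The sketch below assumes a Q16.16 layout (16 fractional bits); the helper names are hypothetical and the talk does not say which format was used.

```python
FRACTION_BITS = 16
ONE = 1 << FRACTION_BITS  # fixed-point representation of 1.0


def from_ratio(numerator: int, denominator: int) -> int:
    """Build a fixed-point value from an integer ratio, avoiding floats entirely."""
    return (numerator * ONE) // denominator


def fx_mul(a: int, b: int) -> int:
    """Multiply two fixed-point values (result rounds toward negative infinity)."""
    return (a * b) >> FRACTION_BITS


def fx_div(a: int, b: int) -> int:
    """Divide two fixed-point values."""
    return (a << FRACTION_BITS) // b


def to_float(a: int) -> float:
    """For display only; never feed this back into the simulation."""
    return a / ONE


speed = from_ratio(3, 2)   # 1.5 units per second
dt = from_ratio(1, 10)     # 0.1 seconds per step, truncated to 16 fractional bits
position = 0
for _ in range(10):
    position += fx_mul(speed, dt)
```

Every client that runs these integer operations gets bit-identical results. The rounding error (the position ends slightly below 1.5) is the same everywhere, which is what lock-step needs.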
Build vs Buy
Pros
● Easy to use.
● Already use it for prototype games.
● Multi-region, lobby, etc. come out-of-the-box.
● Had a long time to optimize their solution.
Cons
● Quite expensive: you pay for provisioned peak monthly CCU.
● “Can we bet the future of our company on a third party?”
● Unknown global distribution at scale.
● Accessibility of support.
● Limited extensibility.
● Runs on Windows.
So, we decided to build our
own networking stack
A model for describing computation, introduced by
Carl Hewitt and colleagues in 1973.
Later popularised by Erlang.
Actor Model
Everything is an actor.
Every actor has a mailbox.
An actor is the fundamental unit that embodies
the 3 essential things for computation:
● processing
● storage
● communications
Actor Model
Actors don’t share memory, they communicate
only via messages.
When an actor receives a message, it can:
● create new actors
● send messages to other actors
● do work
Actor Model
Not sharing memory prevents cascade failures when an actor crashes.
Ericsson AXD301
Inside an actor, messages are processed one-at-a-time, in a
single-threaded fashion.
No need for locks!
Simplifies concurrency: no locks inside the actor, so no lock-induced deadlocks or data races on its state.
Actor Model
single-threaded
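A minimal sketch of the mailbox idea, using only the Python standard library. It is not the API of any actor framework: the class and message names are hypothetical, and it dedicates one thread per actor for clarity, whereas a real actor runtime multiplexes many actors over a few threads.

```python
import queue
import threading


class CounterActor:
    """Owns its state; only the actor's own loop ever touches self._count."""

    _STOP = object()

    def __init__(self) -> None:
        self._mailbox: "queue.Queue[object]" = queue.Queue()
        self._count = 0
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def tell(self, message: object) -> None:
        """Fire-and-forget send: the only way to interact with the actor."""
        self._mailbox.put(message)

    def _run(self) -> None:
        while True:
            message = self._mailbox.get()  # one message at a time
            if message is self._STOP:
                return
            if message == "increment":
                self._count += 1  # no lock: nothing else reads or writes this
            elif isinstance(message, tuple) and message[0] == "get":
                message[1].put(self._count)  # reply through the sender's queue

    def stop(self) -> None:
        self.tell(self._STOP)
        self._thread.join()


def run_demo(senders: int = 4, per_sender: int = 1000) -> int:
    actor = CounterActor()

    def send_many() -> None:
        for _ in range(per_sender):
            actor.tell("increment")

    threads = [threading.Thread(target=send_many) for _ in range(senders)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    reply: "queue.Queue[int]" = queue.Queue()
    actor.tell(("get", reply))  # queued after every increment
    total = reply.get(timeout=5)
    actor.stop()
    return total
```

Four threads send concurrently, yet the count comes out exact without a lock in the actor, because the mailbox serialises the messages.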
Lifts concurrency management to the mailbox.
Allows you to “think globally, but act locally”.
Easier to think about a complex system in terms of states and
transitions, than to manage state mutations.
Actor Model
[Diagram, built up over several slides: MATCH 1 buffers incoming messages (C1 input, C2 input, C3 joined, C3 input) into the current frame, then broadcasts the frame and appends it to the frame history (frame 1, frame 2, frame 3, ...). Each connection goes through the stages: connection open, authenticate, send/receive. New inputs keep arriving while buffering and broadcasting are in progress (concurrency).]
[Diagram: one server hosts many MATCH instances.]
[Diagram: each connection is handled by a Socket actor and each match by a Match actor. Labels: Root Aggregate, current frame, history.]
act locally
think globally
how actors interact with each other
aka, the “protocol”
the secret to building high
performance systems is simplicity
complexity kills performance
Higher CCU per server
Fewer servers
Lower cost
Less operational overhead
Performance Matters
“We should forget about small
efficiencies, say about 97% of the
time: premature optimization is
the root of all evil. Yet we should
not pass up our opportunities in
that critical 3%.”
Donald Knuth
Performance Matters
Threads are heavy OS constructs.
Each thread is allocated 1MB stack space by default.
Context Switching is expensive at scale.
Actors are cheap.
Actor system can optimise use of threads to minimise context switching.
Actor Model
Non-blocking I/O framework for the JVM.
Highly performant.
Simplifies implementation of socket servers (TCP/UDP).
UDP support is “meh”...
Netty
Custom network protocol (bandwidth).
Buffer pooling (GC pressure).
Minimise Netty object creations (GC pressure).
Using direct buffers (GC pressure).
Disable Nagle's algorithm (latency).
Native epoll transport on Linux.
Performance Tuning
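Two of the items above can be illustrated with the Python standard library. The wire format below is hypothetical (the talk does not publish its protocol); it only shows why a fixed binary layout uses less bandwidth than a text encoding. The socket helper shows the standard TCP_NODELAY option, which is how Nagle's algorithm is disabled.

```python
import json
import socket
import struct
from typing import Tuple

# Hypothetical layout for one player input, in network byte order:
#   frame number (uint32), player id (uint8), action code (uint8), target x, y (int16 each)
INPUT_FORMAT = struct.Struct("!IBBhh")


def encode_input(frame: int, player: int, action: int, x: int, y: int) -> bytes:
    return INPUT_FORMAT.pack(frame, player, action, x, y)


def decode_input(payload: bytes) -> Tuple[int, int, int, int, int]:
    return INPUT_FORMAT.unpack(payload)


def new_low_latency_socket() -> socket.socket:
    """TCP socket with Nagle's algorithm disabled, so small writes are sent immediately."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


binary = encode_input(1200, 2, 7, -15, 340)
as_json = json.dumps(
    {"frame": 1200, "player": 2, "action": 7, "x": -15, "y": 340}
).encode("utf-8")
```

The binary message is 10 bytes; the JSON version of the same input is several times larger. At 10-20 inputs per second per player, that difference multiplies quickly.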
AWS Lambda functions to run bot clients (written with Akka):
● Cheaper
● Faster to boot up
● Easy to update
Each Lambda invocation could simulate up to 100 bots.
Automated Load Testing
[Chart: latency measured from US-EAST (Lambda) to EU-WEST (game server)]
optimize for tail latencies
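Why tail latencies rather than averages: a small sketch using the nearest-rank percentile definition. The sample values are made up for illustration and are not measurements from the talk.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical RTT samples in milliseconds: 98 fast round trips and 2 slow outliers.
samples = [100.0] * 98 + [900.0, 1500.0]
mean = sum(samples) / len(samples)
p50 = percentile(samples, 50)
p99 = percentile(samples, 99)
```

Here the mean (122 ms) and the median (100 ms) both look healthy, while the 99th percentile (900 ms) is far beyond the 300 ms lag threshold mentioned earlier. Only the tail shows what the unluckiest players experience.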
http://bit.ly/2xgGHXZ
Thank You!
QUESTIONS?

More Related Content

Similar to Scalability & Big Data challenges in real time multiplayer games

GamingAnywhere: An Open Cloud Gaming System
GamingAnywhere: An Open Cloud Gaming SystemGamingAnywhere: An Open Cloud Gaming System
GamingAnywhere: An Open Cloud Gaming System
Academia Sinica
 
Multiplayer Networking Game
Multiplayer Networking GameMultiplayer Networking Game
Multiplayer Networking Game
Tanmay Krishna
 
mloc.js 2014 - JavaScript and the browser as a platform for game development
mloc.js 2014 - JavaScript and the browser as a platform for game developmentmloc.js 2014 - JavaScript and the browser as a platform for game development
mloc.js 2014 - JavaScript and the browser as a platform for game development
David Galeano
 
Identifying Hotspots in Software Build Processes
Identifying Hotspots in Software Build ProcessesIdentifying Hotspots in Software Build Processes
Identifying Hotspots in Software Build Processes
Shane McIntosh
 

Similar to Scalability & Big Data challenges in real time multiplayer games (20)

Game Networking for Online games
Game Networking for Online gamesGame Networking for Online games
Game Networking for Online games
 
NetRacer for the Commodore 64
NetRacer for the Commodore 64NetRacer for the Commodore 64
NetRacer for the Commodore 64
 
Building Multiplayer Games (w/ Unity)
Building Multiplayer Games (w/ Unity)Building Multiplayer Games (w/ Unity)
Building Multiplayer Games (w/ Unity)
 
GamingAnywhere: An Open Cloud Gaming System
GamingAnywhere: An Open Cloud Gaming SystemGamingAnywhere: An Open Cloud Gaming System
GamingAnywhere: An Open Cloud Gaming System
 
Tech solutions and tricks in real time mobile multiplayer
Tech solutions and tricks in real time mobile multiplayerTech solutions and tricks in real time mobile multiplayer
Tech solutions and tricks in real time mobile multiplayer
 
Building fast,scalable game server in node.js
Building fast,scalable game server in node.jsBuilding fast,scalable game server in node.js
Building fast,scalable game server in node.js
 
Hadean: How We Tackled A Gaming World Record
Hadean: How We Tackled A Gaming World RecordHadean: How We Tackled A Gaming World Record
Hadean: How We Tackled A Gaming World Record
 
Multiplayer Networking Game
Multiplayer Networking GameMultiplayer Networking Game
Multiplayer Networking Game
 
Harlan Beverly Lag The Barrier to innovation gdc austin 2009
Harlan Beverly Lag The Barrier to innovation gdc austin 2009Harlan Beverly Lag The Barrier to innovation gdc austin 2009
Harlan Beverly Lag The Barrier to innovation gdc austin 2009
 
Reliving the history of multiplayer games
Reliving the history of multiplayer gamesReliving the history of multiplayer games
Reliving the history of multiplayer games
 
Architecting for the Cloud: Hoping for the best, prepared for the worst
Architecting for the Cloud: Hoping for the best, prepared for the worstArchitecting for the Cloud: Hoping for the best, prepared for the worst
Architecting for the Cloud: Hoping for the best, prepared for the worst
 
mloc.js 2014 - JavaScript and the browser as a platform for game development
mloc.js 2014 - JavaScript and the browser as a platform for game developmentmloc.js 2014 - JavaScript and the browser as a platform for game development
mloc.js 2014 - JavaScript and the browser as a platform for game development
 
Identifying Hotspots in Software Build Processes
Identifying Hotspots in Software Build ProcessesIdentifying Hotspots in Software Build Processes
Identifying Hotspots in Software Build Processes
 
111223_Ext_Cloud+Gaming+Latency_GFN_Perspective.pdf
111223_Ext_Cloud+Gaming+Latency_GFN_Perspective.pdf111223_Ext_Cloud+Gaming+Latency_GFN_Perspective.pdf
111223_Ext_Cloud+Gaming+Latency_GFN_Perspective.pdf
 
[db tech showcase Tokyo 2017] A11: SQLite - The most used yet least appreciat...
[db tech showcase Tokyo 2017] A11: SQLite - The most used yet least appreciat...[db tech showcase Tokyo 2017] A11: SQLite - The most used yet least appreciat...
[db tech showcase Tokyo 2017] A11: SQLite - The most used yet least appreciat...
 
Multicast tutorial v3
Multicast tutorial v3Multicast tutorial v3
Multicast tutorial v3
 
Photon vs UNET: Battle of the Giants
Photon vs UNET: Battle of the GiantsPhoton vs UNET: Battle of the Giants
Photon vs UNET: Battle of the Giants
 
Architecting for the Cloud: Hoping for the Best, Prepared for the Worst
Architecting for the Cloud: Hoping for the Best, Prepared for the WorstArchitecting for the Cloud: Hoping for the Best, Prepared for the Worst
Architecting for the Cloud: Hoping for the Best, Prepared for the Worst
 
GDC Next 2013 - Synching Game States Across Multiple Devices
GDC Next 2013 - Synching Game States Across Multiple DevicesGDC Next 2013 - Synching Game States Across Multiple Devices
GDC Next 2013 - Synching Game States Across Multiple Devices
 
SJNC13.pptx
SJNC13.pptxSJNC13.pptx
SJNC13.pptx
 

More from Yan Cui

How serverless changes the cost paradigm
How serverless changes the cost paradigmHow serverless changes the cost paradigm
How serverless changes the cost paradigm
Yan Cui
 

More from Yan Cui (20)

How to win the game of trade-offs
How to win the game of trade-offsHow to win the game of trade-offs
How to win the game of trade-offs
 
How to choose the right messaging service
How to choose the right messaging serviceHow to choose the right messaging service
How to choose the right messaging service
 
How to choose the right messaging service for your workload
How to choose the right messaging service for your workloadHow to choose the right messaging service for your workload
How to choose the right messaging service for your workload
 
Patterns and practices for building resilient serverless applications.pdf
Patterns and practices for building resilient serverless applications.pdfPatterns and practices for building resilient serverless applications.pdf
Patterns and practices for building resilient serverless applications.pdf
 
Lambda and DynamoDB best practices
Lambda and DynamoDB best practicesLambda and DynamoDB best practices
Lambda and DynamoDB best practices
 
Lessons from running AppSync in prod
Lessons from running AppSync in prodLessons from running AppSync in prod
Lessons from running AppSync in prod
 
Serverless observability - a hero's perspective
Serverless observability - a hero's perspectiveServerless observability - a hero's perspective
Serverless observability - a hero's perspective
 
How to ship customer value faster with step functions
How to ship customer value faster with step functionsHow to ship customer value faster with step functions
How to ship customer value faster with step functions
 
How serverless changes the cost paradigm
How serverless changes the cost paradigmHow serverless changes the cost paradigm
How serverless changes the cost paradigm
 
Why your next serverless project should use AWS AppSync
Why your next serverless project should use AWS AppSyncWhy your next serverless project should use AWS AppSync
Why your next serverless project should use AWS AppSync
 
Build social network in 4 weeks
Build social network in 4 weeksBuild social network in 4 weeks
Build social network in 4 weeks
 
Patterns and practices for building resilient serverless applications
Patterns and practices for building resilient serverless applicationsPatterns and practices for building resilient serverless applications
Patterns and practices for building resilient serverless applications
 
How to bring chaos engineering to serverless
How to bring chaos engineering to serverlessHow to bring chaos engineering to serverless
How to bring chaos engineering to serverless
 
Migrating existing monolith to serverless in 8 steps
Migrating existing monolith to serverless in 8 stepsMigrating existing monolith to serverless in 8 steps
Migrating existing monolith to serverless in 8 steps
 
Building a social network in under 4 weeks with Serverless and GraphQL
Building a social network in under 4 weeks with Serverless and GraphQLBuilding a social network in under 4 weeks with Serverless and GraphQL
Building a social network in under 4 weeks with Serverless and GraphQL
 
FinDev as a business advantage in the post covid19 economy
FinDev as a business advantage in the post covid19 economyFinDev as a business advantage in the post covid19 economy
FinDev as a business advantage in the post covid19 economy
 
How to improve lambda cold starts
How to improve lambda cold startsHow to improve lambda cold starts
How to improve lambda cold starts
 
What can you do with lambda in 2020
What can you do with lambda in 2020What can you do with lambda in 2020
What can you do with lambda in 2020
 
A chaos experiment a day, keeping the outage away
A chaos experiment a day, keeping the outage awayA chaos experiment a day, keeping the outage away
A chaos experiment a day, keeping the outage away
 
How to debug slow lambda response times
How to debug slow lambda response timesHow to debug slow lambda response times
How to debug slow lambda response times
 

Recently uploaded

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 

Recently uploaded (20)

"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by Anitaraj
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 

Scalability & Big Data challenges in real time multiplayer games

  • 1. Scalability & Big Data challenges in Real-Time Multiplayer games
  • 2. Real-Time games in Top 100 Grossing (2017) 2014 2015 2016 2017 (3) (6) (8) (13) 2018 ???
  • 6. QUIZ TIME: In 2017, which of these games has made the most revenue? The world’s most popular MOBA on PC The world’s most popular First Person Shooter Some game by Blizzard Some game by EA A Chinese 5v5 mobile game you never hear of Some game by King
  • 7. The world’s most popular MOBA on PC The world’s most popular First Person Shooter Some game by Blizzard Some game by EA A Chinese 5v5 mobile game you never heard of Some game by King QUIZ TIME: In 2017, which of these games has made the most revenue?
  • 8. >$400M Monthly Revenue Source: Bloomberg >80M DAU Source: Tencent
  • 9.
  • 10. 10-20 inputs/s, sensitive to lags (> 300ms)
  • 12.
  • 13. Decisions, decisions... Build vs Buy? Self-hosted vs Cloud? Global deployment vs Centralized? TCP vs UDP? Server Authoritative vs Lock-Step?
  • 16.
  • 17.
  • 18. 10-20 inputs/s, sensitive to lags (> 300ms)
  • 19.
  • 20.
  • 22. Global Deployment ● Players are geo-routed to closest multiplayer server. ● Matched with other players in the same geo-region for best UX. ● No need for players to “choose server”, it should just work.
  • 23. Global Deployment ● Should leaderboards be global or regional? ● Should guilds/alliances be global or regional? ● Should chatrooms be global or regional? ● Should liveops events be global or regional? ● Should players be allowed to play with others in another region? ie. play with distant relatives/friends. ● Should players be allowed to switch default region? eg. moved to Europe after Brexit
  • 25. Server Authoritative ● Server decides game logic. ● Client sends all inputs to server. ● Client receives game state (either full, or delta) from server.
  • 26. Server Authoritative ● Server decides game logic. ● Client sends all inputs to server. ● Client receives game state (either full, or delta) from server. ● Client keeps internal state for game world, which mirrors server state. ● Client doesn’t modify world state directly, only display with some prediction to mask network latency.
  • 27. Client 1 Client 2Server C1 control 1 C2 control 1 game state 1
  • 28. Client 1 Client 2Server C1 control 1 C2 control 1 C2 control 2 game state 1 game state 2
  • 29. Client 1 Client 2Server C1 control 1 C2 control 1 C2 control 2 game state 1 game state 2
  • 30. Client 1 Client 2Server C1 control 1 C2 control 1 C2 control 2 game state 1 game state 2 game state 3 C1 control 1 C2 control 1 C2 control 2 game state 3 C1 control 1 C2 control 1 C2 control 2 C2 control 3
  • 31. Client 1 Client 2Server C1 control 1 C2 control 1 C2 control 2 C2 control 3 game state 1 game state 2 game state 3 C1 control 1 C2 control 1 C2 control 2 game state 3 C1 control 1 C2 control 1 C2 control 2 game state 4
  • 32. Client 1 Client 2Server C1 control 1 C2 control 1 C2 control 2 C2 control 3 game state 1 game state 2 game state 3 C1 control 1 C2 control 1 C2 control 2 game state 3 C1 control 1 C2 control 1 C2 control 2 game state 4
  • 33. Client 1 Client 2Server C1 control 1 C2 control 1 C2 control 2 C2 control 3 game state 1 game state 2 game state 3 C1 control 1 C2 control 1 C2 control 2 game state 3 C1 control 1 C2 control 1 C2 control 2 game state 5 C2 control 3 game state 4
  • 34. Client 1 Client 2Server C1 control 1 C2 control 1 C2 control 2 C2 control 3 game state 1 game state 2 game state 3 C1 control 1 C2 control 1 C2 control 2 game state 3 C1 control 1 C2 control 1 C2 control 2 game state 5 C2 control 3 game state 4
  • 35.
  • 36. Pros ● Always in-sync. ● Hard to cheat - no memory hacks, etc. ● Easy (and quick) to join mid-match. ● Server can detect lagged/DC’d client and take over with AI.
  • 37. Cons ● High server load. ● High bandwidth usage. ● Synchronization on the client is complicated. ● Little experience in the company with server-side .Net stack. (bus factor of 1) ● .NetCore was/is still a moving target.
  • 38. high server load and bandwidth needs client has to receive more data
  • 39. Lock-Step* ● Client sends all inputs to server. ● Server collects all inputs, and buffers them. ● Server sends all buffered inputs to all clients X times a second. * traditional RTS games tend to use peer-to-peer model
  • 40. Lock-Step* ● Client sends all inputs to server. ● Server collects all inputs, and buffers them. ● Server sends all buffered inputs to all clients X times a second. ● Client executes all inputs in the same order. ● Because everyone is 'guaranteed' to have executed the same input at the same frame in the same order, we get synchronicity. ● Use prediction to mask network latency. * traditional RTS games tend to use peer-to-peer model
  • 41-45. [Sequence diagram: Client 1 and Client 2 send control inputs to the server; the server relays the buffered inputs, instead of game state, to both clients.] RTT: time between sending an input and receiving it back from the server. RTT = latency x 2 + X, where Xmin = 0 and Xmax = frame time.
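
As a worked example of the RTT formula, the sketch below computes the best and worst case. It assumes X is the time an input waits in the server buffer until the next broadcast (this X is not the "X times a second" broadcast rate from the lock-step slide), and the 50 ms latency and 10 broadcasts per second are made-up illustration values.

```python
def rtt_bounds_ms(one_way_latency_ms: float, broadcasts_per_second: float):
    """Return (best, worst) RTT in ms for RTT = latency x 2 + X, 0 <= X <= frame time."""
    frame_time_ms = 1000.0 / broadcasts_per_second
    best = 2 * one_way_latency_ms      # X = 0: input arrives just before a broadcast
    worst = best + frame_time_ms       # X = frame time: input just missed a broadcast
    return best, worst


best, worst = rtt_bounds_ms(one_way_latency_ms=50, broadcasts_per_second=10)
print(best, worst)
```

Under these assumptions, raising the broadcast rate shrinks the worst case but never the `latency x 2` floor, which is one reason to place servers close to players.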
  • 47. Pros ● Light server load. ● Lower bandwidth usage. ● Simpler server implementation.
  • 48. Cons ● Needs a deterministic game engine. ● Unity has a long-standing determinism problem with floating point. ● Hackable, requires some form of server-side validation. ● All clients must take over a lagged/DC’d client with AI. ● Slower to join mid-match: the joiner needs to process all inputs. ● Need to ensure all clients in a match are compatible.
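
To illustrate why floating point threatens determinism, here is a small sketch. It assumes two clients end up summing the same movement deltas in a different order, as can happen when platforms or compilers evaluate expressions differently. The checksum helper and the fixed-point variant are illustrative assumptions, not something the talk says it used.

```python
import hashlib
import struct


def state_checksum(position: float) -> str:
    # Hash the exact bit pattern, so even a tiny difference changes the checksum.
    return hashlib.sha256(struct.pack("<d", position)).hexdigest()[:12]


# Same three movement deltas, summed in a different order on two clients.
client_a = (0.1 + 0.2) + 0.3
client_b = 0.1 + (0.2 + 0.3)

print(client_a == client_b)
print(state_checksum(client_a) == state_checksum(client_b))

# One common mitigation (an assumption here, not the talk's stated fix):
# integer fixed-point units, where addition is exact and order-independent.
SCALE = 1000
fixed_a = (100 + 200) + 300
fixed_b = 100 + (200 + 300)
print(fixed_a == fixed_b, fixed_a / SCALE)
```

Comparing a per-frame checksum of game state across clients is one way to detect such a desync early, and it can double as input for server-side validation.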
  • 55. Pros ● Easy to use. ● Already use it for prototype games. ● Multi-region, lobby, etc. come out-of-the-box. ● Had a long time to optimize their solution.
  • 56. Cons ● Quite expensive, pay for provisioned peak monthly CCU. ● “Can we bet the future of our company on a third party?” ● Unknown global distribution at scale. ● Accessibility of support. ● Limited extensibility. ● Runs on Windows.
  • 57. So, we decided to build our own networking stack
  • 59. Actor Model: a model for describing computation, introduced by Carl Hewitt & co in 1973 and later popularised by Erlang.
  • 60. Actor Model: Everything is an actor. Every actor has a mailbox. An actor is the fundamental unit that embodies the 3 essential things for computation: ● processing ● storage ● communications
  • 61-62. Actor Model: Actors don’t share memory, they communicate only via messages. Not sharing memory prevents cascade failures when an actor crashes. When an actor receives a message, it can: ● create new actors ● send messages to other actors ● do work
  • 65-66. Actor Model: Inside an actor, messages are processed one at a time, in a single-threaded fashion. No need for locks! Simplifies concurrency: no deadlocks, race conditions, etc.
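
The mailbox idea can be sketched with the standard library alone. This is a toy under stated assumptions: `CounterActor` is a hypothetical name, and it dedicates one thread per actor for simplicity, whereas real actor systems such as Akka multiplex many actors over a small thread pool. The point is only that state touched by a single mailbox consumer needs no lock, even with several concurrent senders.

```python
import queue
import threading


class CounterActor:
    """A toy actor: private state, a mailbox, and one thread draining it."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0                      # touched only by the actor's own thread
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def tell(self, message):
        # The only way to interact with the actor: put a message in its mailbox.
        self._mailbox.put(message)

    def _run(self):
        while True:
            message = self._mailbox.get()    # one message at a time
            if message == "stop":
                break
            if message == "increment":
                self._count += 1             # no lock needed: single consumer

    def stop_and_get(self):
        self.tell("stop")
        self._thread.join()
        return self._count


def send_many(actor, n):
    for _ in range(n):
        actor.tell("increment")


actor = CounterActor()
senders = [threading.Thread(target=send_many, args=(actor, 1000)) for _ in range(4)]
for s in senders:
    s.start()
for s in senders:
    s.join()

total = actor.stop_and_get()
print(total)
```

Four threads send concurrently, yet no increment is lost, because the queue serialises the messages before they reach the actor's state.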
  • 67-68. Actor Model: Lifts concurrency management to the mailbox. Allows you to “think globally, but act locally”. It is easier to think about a complex system in terms of states and transitions than to manage state mutations.
  • 69-81. [Diagram sequence: MATCH 1 holds a current frame and a history of past frames (frame 1, frame 2, frame 3, ...). Each client connection goes through connection open, authenticate, and send/receive. Inputs from C1 and C2, plus "C3 joined" and C3's input, are buffered into the current frame; on broadcast the frame is sent to the clients and added to the history (frame 4), and new inputs start buffering in the next frame.]
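
The match diagrams above can be approximated with a small data structure. This is a sketch under assumptions, not the talk's code: the `Match` class, its method names, and the per-client `outboxes` are hypothetical, and the network is replaced by in-memory lists. It shows inputs buffering into the current frame, a broadcast moving that frame into the history, and a mid-match joiner (C3) receiving the history so it can replay it.

```python
class Match:
    """Toy match state: a buffering frame plus the history of broadcast frames."""

    def __init__(self):
        self.history = []        # frames already broadcast, oldest first
        self.current = []        # inputs buffered for the next broadcast
        self.outboxes = {}       # client id -> frames delivered to that client

    def join(self, client_id):
        # A late joiner first receives every past frame so it can replay them.
        self.outboxes[client_id] = list(self.history)
        self.current.append((client_id, "joined"))

    def on_input(self, client_id, command):
        self.current.append((client_id, command))

    def broadcast(self):
        # Close the current frame, send it to everyone, then start a new one.
        frame, self.current = self.current, []
        self.history.append(frame)
        for outbox in self.outboxes.values():
            outbox.append(frame)


match = Match()
match.join("C1")
match.join("C2")
match.on_input("C1", "move")
match.broadcast()                 # frame 1
match.on_input("C2", "attack")
match.broadcast()                 # frame 2

match.join("C3")                  # joins mid-match, receives frames 1 and 2
match.on_input("C3", "move")
match.broadcast()                 # frame 3 holds "C3 joined" and C3's input

print(match.outboxes["C3"] == match.outboxes["C1"])
```

Because C3 has to replay every past frame before it catches up, joining gets slower the longer the match has been running, which matches the lock-step con listed earlier.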
  • 82-84. [Diagram: a growing number of MATCH instances running side by side.]
  • 85-90. [Diagram: a MATCH with its current frame, frame history (frame 1, frame 2, frame 3) and buffered inputs (C1 input, C2 input, C3 joined). Labels: Socket actor, Match actor, Root Aggregate.]
  • 91. Act locally; think globally: how actors interact with each other, aka the “protocol”.
  • 94. The secret to building high-performance systems is simplicity. Complexity kills performance.
  • 95. Performance Matters: higher CCU per server, fewer servers, lower cost, less operational overhead.
  • 96-97. Performance Matters: “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.” (Donald Knuth)
  • 98. Actor Model: Threads are heavy OS constructs. Each thread is allocated 1MB of stack space by default. Context switching is expensive at scale. Actors are cheap. An actor system can optimise its use of threads to minimise context switching.
  • 99. Netty: Non-blocking I/O framework for the JVM. Highly performant. Simplifies implementation of socket servers (TCP/UDP). UDP support is “meh”...
  • 100. Performance Tuning: Custom network protocol (bandwidth). Buffer pooling (GC pressure). Minimise Netty object creations (GC pressure). Using direct buffers (GC pressure). Disable Nagle's algorithm (latency). Epoll.
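
Two of these tuning items can be shown concretely. The sketch below is not the talk's protocol or its Netty configuration: the message layout (`frame`, `player`, `action`, `x`, `y`) is a hypothetical example, and Python's standard `struct` and `socket` modules stand in for the JVM stack. It compares a fixed-width binary encoding against JSON for the same input, then disables Nagle's algorithm on a TCP socket via `TCP_NODELAY`.

```python
import json
import socket
import struct

# Hypothetical input message: frame number, player slot, action code, x, y.
INPUT_FORMAT = "<IBBhh"   # little-endian: uint32, uint8, uint8, int16, int16


def encode_input(frame, player, action, x, y) -> bytes:
    return struct.pack(INPUT_FORMAT, frame, player, action, x, y)


def decode_input(payload: bytes):
    return struct.unpack(INPUT_FORMAT, payload)


binary = encode_input(1200, 2, 7, -350, 125)
as_json = json.dumps(
    {"frame": 1200, "player": 2, "action": 7, "x": -350, "y": 125}
).encode("utf-8")
print(len(binary), len(as_json))   # the binary form is several times smaller

# Disable Nagle's algorithm so small packets are sent without coalescing delay.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
sock.close()
```

At 10-20 inputs per second per player, a few dozen bytes saved per message adds up quickly on limited mobile bandwidth.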
  • 101. Automated Load Testing: AWS Lambda functions to run bot clients (written with Akka): ● Cheaper ● Faster to boot up ● Easy to update. Each Lambda invocation could simulate up to 100 bots.
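
A rough sketch of the "many bots per invocation" idea follows. The talk's bots were written with Akka; this version swaps in Python's `asyncio` purely for illustration. The event fields (`bots`, `inputs_per_bot`), the bot behaviour, and the RTT values are all made up, and only the `handler(event, context)` shape follows the AWS Lambda Python convention.

```python
import asyncio
import random

MAX_BOTS_PER_INVOCATION = 100   # figure taken from the slide


async def run_bot(bot_id: int, inputs_to_send: int, rng: random.Random) -> list:
    """Pretend to play: 'send' inputs and record a simulated RTT for each."""
    rtts_ms = []
    for _ in range(inputs_to_send):
        await asyncio.sleep(0)                 # stand-in for real network I/O
        rtts_ms.append(rng.uniform(80, 200))   # made-up RTT, not a measurement
    return rtts_ms


async def run_bots(bot_count: int, inputs_per_bot: int) -> list:
    bot_count = min(bot_count, MAX_BOTS_PER_INVOCATION)
    rng = random.Random(42)
    per_bot = await asyncio.gather(
        *(run_bot(i, inputs_per_bot, rng) for i in range(bot_count))
    )
    return [rtt for bot_rtts in per_bot for rtt in bot_rtts]


def handler(event, context):
    """Entry point in the shape of an AWS Lambda Python handler."""
    samples = asyncio.run(run_bots(event["bots"], event["inputs_per_bot"]))
    return {
        "bots": min(event["bots"], MAX_BOTS_PER_INVOCATION),
        "samples": len(samples),
    }


result = handler({"bots": 250, "inputs_per_bot": 5}, None)
print(result)
```

With a cap per invocation, a load test scales by launching more invocations in parallel rather than by making each one heavier.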
  • 104-105. [Chart: measurements from US-EAST (Lambda) to EU-WEST (game server).] Optimize for tail latencies.
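
To show why tail latencies deserve attention, the sketch below compares the mean with high percentiles. The RTT samples are invented for illustration and are not the talk's measurements, and `percentile` is a simple nearest-rank helper written for this example.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of data at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up RTT samples in ms: most are fast, a few are very slow.
rtts_ms = [90] * 95 + [400] * 4 + [1500]

mean = sum(rtts_ms) / len(rtts_ms)
p50 = percentile(rtts_ms, 50)
p99 = percentile(rtts_ms, 99)
p100 = percentile(rtts_ms, 100)
print(mean, p50, p99, p100)
```

In this invented data set the mean and median both sit well under the 300 ms lag threshold mentioned earlier in the deck, while the 99th percentile is above it, so averages alone would hide the players who actually experience lag.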