Scalability, basics, application to systems, teams and processes



This talk describes some mental models that can help us scale software-based systems. In particular, it gives some ideas regarding design, team topology and engineering processes. Mental models: linear scalability, Amdahl’s Law, the Universal Scalability Law, queueing theory, and Little’s Law.



Hello!
I am Edu Ferro
@eferro
https://www.eferro.net

I am Fran Ortiz
@fortiz2305

What is scalability?
◉ Wikipedia’s definition
○ Scalability: capability of a system to handle a growing amount of work by adding resources to the system.
◉ Mathematical definition
○ A function, with size or load on the X axis and throughput on the Y axis.

Scalability vs. Performance
Performance measures the speed/latency of a single request.
Scalability measures the ability of a system to handle a growing amount of work.

1. Linear Scalability: ideal case

Linear Scalability
◉ Ideal case: throughput increases linearly with load.
◉ Examples:
○ 100 operations/s with 1 node -> 400 operations/s with 4 nodes.
○ Adding people to an organisation increases its capacity to do work linearly with the number of people.
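Written as a formula (notation assumed here: X(N) is the throughput with N nodes and λ the throughput of a single node), the ideal case is a straight line through the origin, and the slide’s example is that line evaluated at N = 4:

```latex
X(N) = \lambda N, \qquad \lambda = 100\ \text{operations/s} \;\Rightarrow\; X(4) = 4 \times 100 = 400\ \text{operations/s}
```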

Linear Scalability: ideal case
Throughput increases linearly with load.

2. Amdahl’s Law: contention factor

Contention factor
Contention factor measures the effect of waiting or queueing for shared resources.

Amdahl’s Law: contention factor
Contention factor: measures the effect of waiting or queueing for shared resources.
[Figure: Amdahl’s Law curves for contention factors of 0%, 1%, 2% and 5%]
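A minimal Python sketch of Amdahl’s Law written as a throughput curve, in the form that lines up with the USL later in the talk: X(N) = λN / (1 + α(N − 1)), where α is the contention factor (the serialized fraction of the work) and λ is the throughput of one worker. It is algebraically the same as the classic speedup 1 / ((1 − p) + p/N) with α = 1 − p. Function and parameter names are illustrative. As N grows, X(N) approaches λ/α, which is where the “1 / contention factor” ceiling comes from.

```python
def amdahl_throughput(n: int, alpha: float, lam: float = 1.0) -> float:
    """Amdahl's Law as a throughput model: X(N) = lam * N / (1 + alpha * (N - 1)).

    n     -- number of workers (nodes, processes, people), n >= 1
    alpha -- contention factor: fraction of the work that is serialized, 0 <= alpha <= 1
    lam   -- throughput of a single worker
    """
    return lam * n / (1 + alpha * (n - 1))


if __name__ == "__main__":
    # Contention factors of 0%, 1%, 2% and 5%, evaluated at growing worker counts.
    for alpha in (0.0, 0.01, 0.02, 0.05):
        curve = [round(amdahl_throughput(n, alpha), 1) for n in (1, 8, 32, 128, 1024)]
        print(f"alpha={alpha:.2f}: {curve}")
```

With α = 0.05 (the 95% parallelizable / 5% serial split mentioned in the speaker notes for the optimization engine) the curve flattens towards 20, consistent with the “Amdahl -> x20” remark.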

Contention factor: examples
Teams
◉ Centralized tasks/processes
Systems
◉ Monolith infrastructure
◉ Optimization engines

[Figure: Nextail Optimization engine]

3. Universal Scalability Law (USL): coherence factor

Coherence factor
Coherence factor refers to the time spent restoring a common view of the world or reaching agreement across different processors.

The Universal Scalability Law
[Figure: the USL, with its contention factor and coherence factor annotated 😭]
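A minimal sketch of the USL in the form the speaker notes refer to (α for contention, β for coherence): X(N) = λN / (1 + α(N − 1) + βN(N − 1)). Setting β = 0 gives back Amdahl’s Law. Setting dX/dN = 0 gives the concurrency at which throughput peaks, N* = √((1 − α)/β); beyond it, adding workers lowers throughput. The α and β values are made up for illustration, not measured from any system in the talk.

```python
import math


def usl_throughput(n: int, alpha: float, beta: float, lam: float = 1.0) -> float:
    """Universal Scalability Law: X(N) = lam*N / (1 + alpha*(N-1) + beta*N*(N-1)).

    alpha -- contention factor (waiting/queueing for shared resources)
    beta  -- coherence factor (cost of restoring a common view / reaching agreement)
    With beta == 0 this reduces to Amdahl's Law.
    """
    return lam * n / (1 + alpha * (n - 1) + beta * n * (n - 1))


def usl_peak(alpha: float, beta: float) -> float:
    """Concurrency where throughput peaks, from dX/dN = 0: N* = sqrt((1 - alpha) / beta)."""
    return math.sqrt((1 - alpha) / beta)


if __name__ == "__main__":
    alpha, beta = 0.05, 0.001  # illustrative values, not measurements
    print(f"peak at N* ~ {usl_peak(alpha, beta):.1f}")
    for n in (1, 10, 31, 60, 100):
        print(n, round(usl_throughput(n, alpha, beta), 2))
```

With these values throughput rises up to about 31 workers and then declines: that retrograde region is what the coherence term adds on top of Amdahl’s Law.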

Why is coherency quadratic?
N workers = N(N-1)/2 pairs, or N(N-1) one-way interactions: either way, the coordination cost grows with N².
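A quick way to see the quadratic growth is to count the possible pairs in a team or cluster. This helper is illustrative (its name is hypothetical); doubling the group size roughly quadruples the number of pairs that may need to coordinate.

```python
def pairwise_links(n: int) -> int:
    """Number of distinct pairs among n workers: n * (n - 1) / 2."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    for size in (3, 5, 10, 20, 50):
        # 3 -> 3, 5 -> 10, 10 -> 45, 20 -> 190, 50 -> 1225
        print(size, pairwise_links(size))
```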

USL: examples
Teams
◉ Very large teams
◉ Minimize team dependencies
Systems
◉ Nextail BI Subsystem

[Figures: Nextail BI Subsystem]

4. Queueing Theory

Queueing Theory
[Figure 😭 😉. Source: Queueing Theory in Practice]

Queues
◉ > 80% utilization -> rapid degradation
◉ 100% utilization -> infinite waiting time
◉ Three regimes:
○ Everything is okay
○ Oh wait
○ F**k

[Figure: high variability vs. low variability]
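To put numbers on these regimes, here is a minimal sketch using the Pollaczek-Khinchine formula for an M/G/1 queue (single server, Poisson arrivals, general service times). That model is an assumption chosen for illustration; the slides do not specify one. Waiting time is expressed in units of the mean service time: with exponential service times a job waits about one service time at 50% utilization, about four at 80% and about 99 at 99%, and the wait diverges at 100%. The `cs2` parameter shows that reducing variability also cuts waiting time: perfectly regular work waits half as long as exponentially distributed work at the same utilization.

```python
import math


def mg1_mean_wait(rho: float, service_time: float = 1.0, cs2: float = 1.0) -> float:
    """Mean time spent waiting in the queue for an M/G/1 queue (Pollaczek-Khinchine).

    rho          -- utilization = arrival rate * mean service time
    service_time -- mean service time E[S]
    cs2          -- squared coefficient of variation of the service time:
                    1.0 for exponential service (M/M/1), 0.0 for constant service (M/D/1)
    """
    if rho < 0:
        raise ValueError("utilization cannot be negative")
    if rho >= 1:
        return math.inf  # arrivals outpace service: the queue grows without bound
    return rho / (1 - rho) * (1 + cs2) / 2 * service_time


if __name__ == "__main__":
    for rho in (0.5, 0.7, 0.8, 0.9, 0.95, 0.99):
        high = mg1_mean_wait(rho, cs2=1.0)  # more variable work
        low = mg1_mean_wait(rho, cs2=0.0)   # perfectly regular work
        print(f"rho={rho:.2f}  wait(high var)={high:6.1f}  wait(low var)={low:6.1f}")
```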

Queueing: examples
Teams
◉ Operations team vs. each team has its own operations
◉ Front/Back teams vs. end-to-end teams
Systems
◉ From monoqueue to multi-queue

Little’s Law

Little’s Law: examples
Teams/Processes
◉ WIP limits in each team
◉ Flow optimization (instead of resource optimization)
◉ Self-service platform (no ops team)
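Little’s Law states L = λW: average work in progress equals throughput times average time in the system. A minimal sketch solving it for lead time; the team numbers are hypothetical. It assumes a stable system observed over a long enough period, and that limiting WIP does not reduce throughput.

```python
def average_lead_time(wip: float, throughput: float) -> float:
    """Little's Law, L = lambda * W, solved for W.

    wip        -- average number of items in progress (L)
    throughput -- average completion rate, e.g. stories per week (lambda)
    Returns the average time an item spends in the system (W), in the
    time unit of `throughput`. Assumes a stable system (arrivals == completions).
    """
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return wip / throughput


if __name__ == "__main__":
    # Hypothetical team finishing 5 stories per week.
    print(average_lead_time(wip=20, throughput=5))  # 4.0 (weeks)
    print(average_lead_time(wip=5, throughput=5))   # 1.0 (week) with a WIP limit of 5
```

This is why WIP limits shorten lead time: at the same throughput, fewer items in progress means each one finishes sooner.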

5. Conclusions

Mental models
◉ Linear scalability
◉ Amdahl’s Law
◉ Universal Scalability Law
◉ Queueing Theory
◉ Little’s Law

Think about scalability
◉ Nextail
○ Design to support x2 size / x10 clients
○ Think about x20 (size/clients)
○ Imagine/brainstorm x100 (size/clients)

References
◉ Scalability is Quantifiable: The Universal Scalability Law, Baron Schwartz
◉ Queueing Theory in Practice, Eben Freeman
◉ Coherence Penalty for Humans, Michael Nygard
◉ Applying the USL to Organizations, Adrian Colyer
◉ Applying the USL to Distributed Systems, Neil J. Gunther
◉ Applied Performance Theory, Kavya Joshi
◉ Super Sizing Your Servers and the Payback Trap, Neil J. Gunther

Any questions?
You can find us at
◉ @eferro
◉ @fortiz2305
Thanks!



Editor's Notes

Why don’t processes scale linearly? Two reasons: the contention factor and the coherence factor. Contention: we are limited by how much can be executed in parallel. Coherence: the communication that is required (synchronization).
From here on we will focus on the definition of scalability as a function: a relationship between two variables, where the output depends on the value of the input. We will see how throughput depends on the load or the size of the system. Important: although we focus on the mathematical definition, there won’t be much math, and the talk has no prerequisites. What we want to convey is a set of mental models that give us some intuition. Other possible inputs: concurrency, number of requests, arrival rate, team size.
Definition of scalability: it is a function.
◉ Load: cost, number of nodes, team size, company size
◉ Throughput: number of transactions/s, bytes/s served, number of operations/s
It is not the same as performance. Performance -> wait time (from the user’s point of view). An algorithm can be scalable and still perform poorly. This is very common: a non-parallel algorithm usually performs much better up to a point; the problem is that beyond that point it typically cannot go any faster or take more load.
Scalability vs. Performance: performance measures the speed with which a single request can be executed, while scalability measures the ability of a system to maintain its performance under increasing load. The unit of measure for performance is latency; for scalability it is throughput.
Suppose there are parts of a system that cannot be parallelized. Suppose there is a person who needs to be involved in every decision. The contention factor measures the effect of waiting or queueing for shared resources.
At the beginning of the graph it looks like linear growth, but it isn’t: our system simply doesn’t have enough load yet for the non-linearity to show. Once N grows large, there is an asymptote at 1 / contention factor (relative to the throughput of a single worker). The next slide shows different examples. There are diminishing returns.
As parallelization grows, the contention factor becomes the limiting factor. Example: map-reduce.
◉ Centralized tasks/processes
○ Tasks and specialized knowledge that depend on a small group of people: DB migrations, deployments, important technical decisions
○ Identified silos and bottlenecks / identified a hero culture
○ Automation and self-service (as a requirement for automation): DB migrations, translations; service creation can still be improved
○ Explicit work to eliminate silos: pairing, squad-level and company-wide tech concerns
◉ Monolith infrastructure
○ <10 clients, 1 operation (calculation) per day -> 1 EC2 instance -> 1 EC2 instance per client
○ Contention for integration/testing/tuning (compute)
○ k8s: compute contention eliminated (up to a point)
○ Currently the DB is the contention point/bottleneck for the next step in scalability
◉ Optimization Engines: next slide
Two main parts: IO bound / contention (DB / scenario preparation) and CPU bound (optimization). Two different problems that can be decoupled. The optimization is 95% parallelizable and 5% serial, so by Amdahl’s Law the maximum speedup is x20.
◉ Optimization Engines
○ Teams: silo eliminated
○ Much larger clients (stores x products x time)
○ General optimization (service time, in queueing-theory terms): 1 TB -> < 256 GB; overall reduction in DB load
○ Study of scalability, parallelization, etc.
○ Globally, contention has been limited by running more engines in parallel
Here the work has already started. Before, there was contention to get into the system, to start whatever the task is. This is once it has started: the workers (people) doing the work need to communicate and reach agreement.
Explain alpha and beta here. Alpha is the contention factor we discussed before. If beta = 0, we get Amdahl’s Law. What is new: not only can we stop improving, but beyond a certain point throughput gets worse (coherence).
It doesn’t matter at first, only at a certain scale; in small systems, contention is more visible. Because it is quadratic, we want to limit it as much as possible. Examples: I have 1,000 nodes and have to decide which one to send a request to; database quorum; load balancer...
◉ Very large teams
○ High coherence factor; speed is lost
○ Coordination overhead (N²)
○ Solution? Split into smaller teams with well-defined, end-to-end responsibilities. This reduces the need for continuous synchronization: people talk less, at a higher level of abstraction.
◉ Minimize team dependencies
○ Self-service platform
○ More decentralized decision-making
○ Modules/features divided by squad
○ Decouple whatever is possible
○ Acceptance criteria
○ Tech concerns
○ RFCs / RFKs
◉ Nextail BI Subsystem: next slide
◉ Nextail Job Scheduler
○ Horizontally scalable scheduler (sharding, events)
○ Manages queues, and defines those queues based on the contention point of each job
○ Reduces contention by making compute dynamic, while taking bottlenecks into account so as not to saturate any contention point
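The Job Scheduler idea (queues defined by the contention point each job hits, so compute can grow dynamically without saturating a bottleneck) can be sketched with one concurrency limit per contention point. This is a single-process asyncio sketch under stated assumptions: the point names (`aurora_db`, `cpu_pool`), the limits and the job tuples are hypothetical, and the sharded, event-driven side of the real scheduler is not modelled.

```python
import asyncio

# Hypothetical concurrency limits, one per contention point.
LIMITS = {"aurora_db": 2, "cpu_pool": 4}


async def run_jobs(jobs: list[tuple[str, str, float]], limits: dict[str, int]) -> dict[str, int]:
    """Run all jobs concurrently, capping how many hit each contention point at once.

    jobs   -- (job_id, contention_point, duration_in_seconds) tuples
    limits -- maximum simultaneous jobs per contention point
    Returns the peak number of simultaneous jobs observed on each point.
    """
    semaphores = {point: asyncio.Semaphore(n) for point, n in limits.items()}
    active = {point: 0 for point in limits}
    peak = {point: 0 for point in limits}

    async def run(job_id: str, point: str, duration: float) -> None:
        async with semaphores[point]:  # jobs queue here while the point is saturated
            active[point] += 1
            peak[point] = max(peak[point], active[point])
            await asyncio.sleep(duration)  # stand-in for the real work of job_id
            active[point] -= 1

    await asyncio.gather(*(run(*job) for job in jobs))
    return peak


if __name__ == "__main__":
    demo = [(f"db-{i}", "aurora_db", 0.01) for i in range(10)]
    demo += [(f"calc-{i}", "cpu_pool", 0.01) for i in range(10)]
    print(asyncio.run(run_jobs(demo, LIMITS)))  # {'aurora_db': 2, 'cpu_pool': 4}
```

Jobs that do not touch a saturated point keep flowing, so adding compute helps until it reaches the limit of the shared resource, instead of piling more load onto it.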
Nextail BI Subsystem
BI: a process that runs a series of aggregations/calculations over data the client sends us (sales, stock…).
○ Sequential process: one data type after another. Some of them depend on each other, others don’t.
○ Step 1: identify which data types can be aggregated/calculated in parallel. Using events, we start each calculation as soon as possible.
○ At that point we have many processes running in parallel, and the contention factor drops considerably. From there, we made decisions so that the coherence factor does not increase, i.e. so that the parallel processes don’t have to communicate with each other. One of those decisions: a calculation that depends on two of the parallel processes finishing runs each time either of them finishes, so that they don’t have to communicate, keep state, etc.
○ During this process there was also a technology change that reduced the contention factor considerably: Aurora to Redshift (much better suited to this type of query), so we no longer spend time waiting on the database.
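The event-driven BI approach can be sketched as follows. Everything here is an assumption for illustration: the calculation names, the in-memory queue standing in for a real event bus, and the requirement that every calculation be idempotent so that re-running it is harmless. As in the note, a calculation that depends on two parallel processes runs each time either one finishes, so a downstream calculation (here `sell_through`, and in turn `weekly_report`) may run more than once.

```python
from collections import deque

# Hypothetical BI calculations and the calculations whose completion triggers them.
# Names are illustrative only; they are not Nextail's actual pipeline.
TRIGGERED_BY = {
    "sales_agg": [],                             # no upstream: starts as soon as data arrives
    "stock_agg": [],                             # independent of sales_agg, so can run in parallel
    "sell_through": ["sales_agg", "stock_agg"],  # depends on both upstream aggregations
    "weekly_report": ["sell_through"],
}


def subscribers(triggered_by: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert the map: for each calculation, who listens for its 'finished' event."""
    subs: dict[str, list[str]] = {name: [] for name in triggered_by}
    for name, sources in triggered_by.items():
        for source in sources:
            subs[source].append(name)
    return subs


def run_event_driven(triggered_by: dict[str, list[str]]) -> list[str]:
    """Simulate the event flow and return the order in which calculations ran.

    Calculations without upstream dependencies start immediately (a real system
    would run them in parallel; this simulation runs them one by one). A
    calculation with several upstreams is re-run on every completion event
    instead of waiting for a join, so parallel branches never coordinate or
    share state. Assumes the dependency graph is acyclic.
    """
    subs = subscribers(triggered_by)
    to_run = deque(name for name, sources in triggered_by.items() if not sources)
    executed: list[str] = []
    while to_run:
        calc = to_run.popleft()
        executed.append(calc)      # placeholder for running the real aggregation
        to_run.extend(subs[calc])  # "calc finished" event triggers each subscriber
    return executed
```

The trade-off is some redundant, idempotent recomputation in exchange for keeping the coherence factor low: the parallel branches never need to talk to each other.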
So far we have seen HOW MUCH work our systems or teams can produce as a function of load, but we have not talked about HOW FAST they do it. Queueing theory and Little’s Law cover that part.
TODO: add reference M/M/1/s; autoscaler at 60%/70%; utilization legend.
100% team/individual utilization -> chaos -> low adaptability to change
◉ Operations team vs. each team has its own operations
○ High stress for the operations team (aka Brent)
○ High response time, stress and low quality
◉ Front/Back teams vs. end-to-end teams
○ 100% resource utilization generates queues, inventory (work in progress) and sometimes infinite waiting time
◉ From monoqueue to multi-queue
○ Detect real/domain dependencies
○ Identify contention points (Aurora DB cluster)
○ Split by similar variability
The relationship between how many things you are doing at once and how long it takes for one item to be finished.
100% team/individual utilization -> chaos -> low adaptability to change
◉ Flow-oriented: limit WIP in each team
○ Development: from skill-based teams with unlimited WIP, to WIP limited per team with all the skills in each team (pairing, single-user-story flow, trunk-based development, etc.)
○ Kaizen: limit the number of improvement experiments
◉ Self-service platform (no ops team)
○ Better lead time and higher throughput, because each team can deploy, run migrations, etc.
