IS WRITING PERFORMANT
CODE TOO EXPENSIVE?
SLIDO.COM #GEECON2019
TOMASZ KOWALCZEWSKI
WHOIS
TOMASZ KOWALCZEWSKI
▸ Codewise/Tornimo.io
▸ Twitter: @tkowalcz
▸ Failed tech lead
▸ Spent about half of his career optimising allocations :(
FAILED ASPIRATIONS IN DATABASE SYSTEMS
SPRINT
10 000$ / MONTH
20 000$ / MONTH
©DRESDEN DATABASE SYSTEMS GROUP
WHERE DID WE GO WRONG
0.1X DEVELOPER NEEDED
THE 0.1X DEVELOPER
▸ Let’s not add that feature
▸ Let’s not adopt this new technology
▸ Let’s not automate this
▸ Can we approximate the result so no one notices?
▸ Can we use a small lookup table?
▸ Benji Weber: https://benjiweber.co.uk/blog/2016/01/25/why-i-strive-to-be-a-0-1x-engineer/
▸ Mike Acton: https://twitter.com/mike_acton/status/989001065893801984
VERY LARGE DATABASES CONFERENCE 2017
source: wikipedia
VERY LARGE DATABASES CONFERENCE
[Chart: years 2015, 2016, 2017; labelled values: 20ms, 80ms, 150ms; reference line: 250ms, human perception]
EFFICIENTLY COMPILING EFFICIENT QUERY PLANS FOR MODERN HARDWARE
[Chart: query time for DBMS "X", MySQL 4.1, MonetDB and hand-coded C; labelled values: 220ms, 3s 700ms, 27s, 28s]
SCALABILITY! BUT AT WHAT COST?
NOW, WE ARE GOING TO INTRODUCE SOME
NEW CANDIDATES FOR "STATE-OF-THE-ART"
ALGORITHMS, BY WRITING SOME FOR-LOOPS
AND RUNNING THEM ON MY LAPTOP.
Frank McSherry
THERE ARE NO PROBLEMS ONLY OPPORTUNITIES
source: xkcd
PERFORMANCE?
PERFORMANCE!
‣ Intuition about what is possible
‣ Impact on real life Java code
CPU FUNCTIONAL BLOCKS
source: Martin Thompson’s blog
2 SOCKET SERVER
CPU FUNCTIONAL BLOCKS
NUMA domain 0 NUMA domain 1
AMD EPYC DIE
AMD EPYC DIAGRAM
./lstopo --physical --output-format png /tmp/lstopo.png
LATENCY EXPERIMENT
[Diagram: CORE 0, CORE 1, … CORE N; each core performs CAS on a flag held in CORE 0 “local” RAM]
CAS = UNSAFE.compareAndSwapInt
AMD EPYC CORE PING LATENCY (JAVA-JMH)
<dependency>
    <groupId>org.openjdk.jmh</groupId>
    <artifactId>jmh-core</artifactId>
    <version>1.21</version>
</dependency>
<dependency>
    <groupId>org.openjdk.jmh</groupId>
    <artifactId>jmh-generator-annprocess</artifactId>
    <version>1.21</version>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.xerial</groupId>
    <artifactId>jnuma</artifactId>
    <version>0.1.3</version>
</dependency>
AMD EPYC CORE PING LATENCY (JAVA-JMH)
@Setup
public void setUp(CoreSequence coreSequence) {
    Numa.runOnNode(node);
    ByteBuffer buffer = Numa.allocOnNode(capacity, node);
    long address = address(buffer);
}
AMD EPYC CORE PING LATENCY (JAVA-JMH)
@Benchmark
@Group("pingpong")
public void ping(Control cnt, GroupState groupState, CoreAssigner coreAssigner) {
    while (
        !cnt.stopMeasurement &&
        !UNSAFE.compareAndSwapInt(null, groupState.flagAddress, 0, 1)
    ) {
        // this body is intentionally left blank
    }
}

@Benchmark
@Group("pingpong")
public void pong(Control cnt, GroupState groupState, CoreAssigner coreAssigner) {
    while (
        !cnt.stopMeasurement &&
        !UNSAFE.compareAndSwapInt(null, groupState.flagAddress, 1, 0)
    ) {
        // this body is intentionally left blank
    }
}
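The ping/pong idea can also be tried without JMH, jnuma or Unsafe. The following is only a rough sketch under stated assumptions: it uses `AtomicInteger.compareAndSet` in place of `UNSAFE.compareAndSwapInt`, does no core pinning or NUMA-aware allocation, and the class and method names (`PingPongSketch`, `nanosPerRoundTrip`) are made up for illustration. It will not reproduce the per-core numbers from the slides; it only shows the shape of the experiment.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: two threads hand a flag back and forth with CAS.
final class PingPongSketch {

    // Returns the average time in nanoseconds for one 0 -> 1 -> 0 round trip.
    static double nanosPerRoundTrip(int rounds) {
        AtomicInteger flag = new AtomicInteger(0);

        // "pong" side: flips the flag from 1 back to 0.
        Thread pong = new Thread(() -> {
            for (int i = 0; i < rounds; i++) {
                while (!flag.compareAndSet(1, 0)) {
                    Thread.onSpinWait(); // busy-wait hint, Java 9+
                }
            }
        });
        pong.start();

        long start = System.nanoTime();
        // "ping" side: flips the flag from 0 to 1.
        for (int i = 0; i < rounds; i++) {
            while (!flag.compareAndSet(0, 1)) {
                Thread.onSpinWait();
            }
        }
        try {
            pong.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        long elapsed = System.nanoTime() - start;
        return (double) elapsed / rounds;
    }

    public static void main(String[] args) {
        System.out.println(nanosPerRoundTrip(1_000_000) + " ns per round trip");
    }
}
```

Because the threads are not pinned, the operating system decides which cores they run on, so repeated runs may report quite different latencies.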
AMD EPYC CORE PING LATENCY (JAVA-JMH)
INTEL XEON PLATINUM PING LATENCY (JAVA-JMH)
NUMA
RAM READ THROUGHPUT
A. <500 MB/s
B. <2GB/s
C. <20GB/s
D. ~100GB/s
NUMA
RAM READ THROUGHPUT - PREDICTABLE PATTERN
@Benchmark
public int readByteBufferInOrder() {
    Numa.runOnNode(threadNumaNode);
    int result = 0;
    for (int i = 0; i < byteBuffer.limit(); i += 8) {
        result += byteBuffer.getLong(i);
    }
    return result;
}
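To turn a sequential read like the one above into a throughput figure, divide the bytes read by the elapsed time. The sketch below is an illustration only: `SequentialReadSketch`, `sumLongs` and `gigabytesPerSecond` are hypothetical names, the buffer is a plain direct `ByteBuffer` rather than NUMA-local memory, and a single un-warmed pass in `main` is far less rigorous than a JMH run.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: sequential read of a buffer and a GB/s calculation.
final class SequentialReadSketch {

    // Reads the buffer 8 bytes at a time, in order, and sums the values
    // so the reads cannot be optimised away.
    static long sumLongs(ByteBuffer buffer) {
        long sum = 0;
        for (int i = 0; i + Long.BYTES <= buffer.limit(); i += Long.BYTES) {
            sum += buffer.getLong(i);
        }
        return sum;
    }

    // Bytes per nanosecond equals gigabytes (10^9 bytes) per second.
    static double gigabytesPerSecond(long bytes, long nanos) {
        return (double) bytes / nanos;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(64 * 1024 * 1024);
        long start = System.nanoTime();
        long sum = sumLongs(buffer);
        long nanos = System.nanoTime() - start;
        System.out.println("sum=" + sum + ", "
                + gigabytesPerSecond(buffer.limit(), nanos) + " GB/s");
    }
}
```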
AMD MEMORY BANDWIDTH
BRANCH PREDICTION
RAM THROUGHPUT - READ 400MB OF DATA
A. ~1s
B. 0.348s
C. 0.150s
D. It depends
MULTI BYTE PROCESSING WITH FULL WORD INSTRUCTIONS
[Chart: running time (0 to 600 ms) vs. selectivity (0 to 100 %); labelled values: 533ms, 43ms, 43ms]
MULTI BYTE PROCESSING WITH FULL WORD INSTRUCTIONS
@Benchmark
public int plainIf() {
    int result = 0;
    for (int i = 0; i < inputSize; i++) {
        if (onHeapValues.getInt(i * 4) < predicate) {
            result++;
        }
    }
    return result;
}
SIMPLE PREDICATE
LESLIE LAMPORT
MULTI BYTE PROCESSING WITH FULL WORD INSTRUCTIONS
MASK = 0111 1111
~MASK = 1000 0000
42 = 0010 1010
13 = 0000 1101
13 ^ MASK = 0111 0010
+ 42 = 1001 1100
&~MASK = 1000 0000
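The worked example above can be checked in a few lines of Java. This is a minimal sketch, assuming both operands fit in 7 bits (0 to 127); the class and method names are hypothetical. XOR with the mask turns x into 127 - x, and adding c sets the high bit exactly when c - x >= 1, that is, when x < c.

```java
// Hypothetical sketch of the single-byte comparison from the slide.
final class HighBitCompare {
    private static final int MASK = 0b0111_1111; // low 7 bits

    // Returns 1 if x < c, else 0. Assumes 0 <= x <= 127 and 0 <= c <= 127.
    static int lessThan(int x, int c) {
        int flipped = x ^ MASK;     // equals 127 - x for 7-bit x
        int sum = flipped + c;      // reaches 128 exactly when x < c
        return (sum & ~MASK) >>> 7; // move the high bit down to bit 0
    }

    public static void main(String[] args) {
        System.out.println(lessThan(13, 42)); // the slide's example: 13 < 42
        System.out.println(lessThan(42, 13));
    }
}
```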
MULTI BYTE PROCESSING WITH FULL WORD INSTRUCTIONS
@Benchmark
public int plainIf() {
    int result = 0;
    for (int i = 0; i < inputSize; i++) {
        // RETURN_1_IF_MATCH is a placeholder: 1 if value < predicate, else 0, computed without a branch
        result += RETURN_1_IF_MATCH(
            onHeapValues.getInt(i * 4),
            predicate
        );
    }
    return result;
}
BRANCHLESS (PSEUDOCODE)
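One possible way to fill in the `RETURN_1_IF_MATCH` placeholder is the sign-bit trick sketched below. This is a swapped-in illustration, not the approach the slides go on to use; the names `BranchlessMatch`, `oneIfLess` and `countLessThan` are hypothetical. The subtraction is done in `long` so it cannot overflow for any pair of `int` values.

```java
// Hypothetical sketch: a branch-free "1 if value < predicate" via the sign bit.
final class BranchlessMatch {

    // (value - predicate) is negative exactly when value < predicate;
    // shifting the 64-bit difference right by 63 leaves just the sign bit.
    static int oneIfLess(int value, int predicate) {
        return (int) (((long) value - predicate) >>> 63);
    }

    // Counts matching elements without an if statement in the loop body.
    static int countLessThan(int[] values, int predicate) {
        int result = 0;
        for (int value : values) {
            result += oneIfLess(value, predicate);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] values = {3, 50, 7, 42, 41};
        System.out.println(countLessThan(values, 42));
    }
}
```

Whether this beats a plain `if` depends on the JIT and on how predictable the data is, so it is worth measuring rather than assuming.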
MULTI BYTE PROCESSING WITH FULL WORD INSTRUCTIONS
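@Benchmark
public int plainArithmetic() {
    int result = 0;
    int mask = 0b0111_1111;
    // Correct only while values and predicate fit in 7 bits (0 to 127)
    for (int i = 0; i < inputSize; i++) {
        int x = onHeapValues.getInt(i * 4) ^ mask; // 127 - value
        int z = (predicate + x) & ~mask;           // bit 7 set when value < predicate
        result += Integer.bitCount(z);             // adds 1 for a match, 0 otherwise
    }
    return result;
}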
BRANCHLESS
MULTI BYTE PROCESSING WITH FULL WORD INSTRUCTIONS
[Chart: running time (0 to 600 ms) vs. selectivity (0 to 100 %); series: CONDITIONAL, ARITHMETIC; labelled values: 97ms, 533ms, 43ms, 43ms]
MULTI BYTE PROCESSING WITH FULL WORD INSTRUCTIONS
[Chart: same axes with a PACKED series added; labelled values: 11ms, 97ms, 533ms, 43ms, 43ms]
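The slides do not show the code behind the PACKED series. The sketch below is an assumption about what such a variant could look like: eight 7-bit values stored one per byte in a `long`, compared against the predicate with a single XOR, add and mask per word. All names (`PackedLessThan`, `pack`, `broadcast`, `countLessThan`) are hypothetical. Each byte sum is at most 127 + 127 = 254, so no carry crosses into the neighbouring byte.

```java
// Hypothetical sketch: compare eight packed 7-bit values per 64-bit word.
final class PackedLessThan {
    private static final long LOW7 = 0x7F7F7F7F7F7F7F7FL; // low 7 bits of every byte

    // Packs up to eight values (each 0 to 127) into one long, one per byte.
    // Unused bytes stay 0 and therefore count as matches for any predicate > 0.
    static long pack(int... values) {
        long word = 0;
        for (int i = 0; i < values.length && i < 8; i++) {
            word |= (values[i] & 0x7FL) << (8 * i);
        }
        return word;
    }

    // Copies a 7-bit predicate into every byte of a long.
    static long broadcast(int predicate) {
        return (predicate & 0x7FL) * 0x0101010101010101L;
    }

    // Counts packed values that are strictly less than the predicate.
    static int countLessThan(long[] words, int predicate) {
        long c = broadcast(predicate);
        int result = 0;
        for (long word : words) {
            long x = word ^ LOW7;      // each byte becomes 127 - value
            long z = (x + c) & ~LOW7;  // high bit of a byte set when value < predicate
            result += Long.bitCount(z);
        }
        return result;
    }

    public static void main(String[] args) {
        long[] words = {pack(13, 42, 0, 127, 41, 43, 100, 5)};
        System.out.println(countLessThan(words, 42));
    }
}
```

One word now answers eight comparisons, which is the kind of saving that could explain a gap like the one between the ARITHMETIC and PACKED values on the chart, though the actual benchmark layout is not given in the slides.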
BRANCHLESS DOOM
“The mov-only DOOM renders approximately
one frame every 7 hours,
so playing this version requires somewhat increased patience.”
CONCLUSION
‣ THERE ARE TWO HATS YOU CAN WEAR
‣ 0.1X CONCENTRATES ON USER VALUE
‣ 10X ON TECHNOLOGY
‣ THESE ARE NOT EXCLUSIVE
‣ LEARN ALGORITHMS AND DATA STRUCTURES (LINDY’S!)
‣ EXPERIMENT
‣ DEVELOP INTUITION OF WHAT IS POSSIBLE
PAPERS
REFERENCES
‣ Efficiently Compiling Efficient Query Plans for Modern Hardware
‣ Constant-Time Query Processing
‣ https://danluu.com/branch-prediction/
‣ StarCraft II as an AI research environment
‣ Multiple Byte Processing with Full-Word Instructions
‣ http://www.vldb.org/2017/keynotes.php
COPYRIGHTS
▸ Houston opportunities by XKCD.com
▸ A/B Testing example: https://en.wikipedia.org/wiki/A/B_testing#/media/File:A-B_testing_example.png by Maxime Lorant
▸ AMD for EPYC diagrams
▸ Martin Thompson (https://mechanical-sympathy.blogspot.com/2013/02/cpu-cache-flushing-fallacy.html) for 2 socket CPU diagram
