22. TEXT
ASYMPTOTIC ANALYSIS
▸ Big O is about asymptotic analysis
▸ In other words, it’s about how an algorithm scales when
the numbers get huge
▸ You can also describe this as “the rate of growth”
▸ How fast do the numbers become unmanageable?
26. TEXT
ASYMPTOTIC ANALYSIS
▸ Another way to think about this is:
▸ What happens when your input size is 10,000,000? Will
your program finish in a reasonable time?
▸ It’s about scalability, not necessarily speed
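To make that concrete, here is a tiny JavaScript sketch (the counts are illustrative units of work, not measurements):

// At n = 10,000,000, the rate of growth decides whether the program
// resolves at all. Rough operation counts for common growth rates:
const n = 10_000_000;
console.log(Math.log2(n));     // ~23 — logarithmic: trivial
console.log(n);                // 10^7 — linear: fine
console.log(n * Math.log2(n)); // ~2.3e8 — linearithmic: still fine
console.log(n * n);            // 10^14 — quadratic: likely hours or days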
33. TEXT
PRINCIPLES OF BIG O
▸ Big O is a kind of mathematical notation
▸ In computer science, it essentially means
“the asymptotic rate of growth”
▸ In other words, how does the running time of this function
scale with the input size when the numbers get big?
▸ Big O notation looks like this:
O(n)   O(n log n)   O(n²)
38. TEXT
PRINCIPLES OF BIG O
▸ n here refers to the input size
▸ Can be the size of an array, the length of a string, the
number of bits in a number, etc.
▸ O(n) means the algorithm scales linearly with the input
▸ Think of a straight line (y = x)
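For instance, here is a minimal sketch of an O(n) algorithm, a linear search (the function name is just illustrative):

// One pass over the input: the work grows in direct proportion to n.
function linearSearch(arr, target) {
  for (let i = 0; i < arr.length; i++) {
    if (arr[i] === target) return i; // found: return the index
  }
  return -1; // not found after n comparisons at worst
}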
41. TEXT
PRINCIPLES OF BIG O
▸ “Scaling linearly” can mean 1:1 (one iteration per extra
input), but it doesn’t have to
▸ It can simply mean k:1 where k is a constant, like 3:1 or 5:1
(i.e., a constant amount of time per extra input)
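A quick sketch of k:1 scaling (here k = 3; the helper is hypothetical):

// Three constant-time operations per element is O(3n),
// which still reduces to O(n).
function sumMinMax(arr) {
  let sum = 0, min = Infinity, max = -Infinity;
  for (const x of arr) {
    sum += x;               // op 1
    min = Math.min(min, x); // op 2
    max = Math.max(max, x); // op 3
  }
  return { sum, min, max };
}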
47. TEXT
PRINCIPLES OF BIG O
▸ In Big O, we strip out any coefficients or smaller factors.
▸ The fastest-growing factor wins. This is also known as the
dominant factor.
▸ Just think, when the numbers get huge, what dwarfs
everything else?
▸ O(5n) => O(n)
▸ O(½n - 10) also => O(n)
52. TEXT
PRINCIPLES OF BIG O
▸ O(k) where k is any constant reduces to O(1).
▸ O(200) = O(1)
▸ Where there are multiple factors of growth, the most
dominant one wins.
▸ O(n⁴ + n² + 40n) = O(n⁴)
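As a sketch of how that plays out in code (an illustrative function, not from the slides):

// O(n² + n + 1): the nested loop is the dominant factor, so O(n²).
function analyze(arr) {
  let pairs = 0, sum = 0;        // O(1): initialization
  for (const a of arr) {         // O(n²): nested loop dominates everything
    for (const b of arr) {
      if (a < b) pairs++;        // count ordered pairs
    }
  }
  for (const x of arr) sum += x; // O(n): dwarfed by the n² term
  return { pairs, sum };
}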
57. TEXT
PRINCIPLES OF BIG O
▸ If there are two inputs (say you’re trying to find all the
common substrings of two strings), then you use two
variables in your Big O notation => O(n * m)
▸ Doesn’t matter if one variable probably dwarfs the other.
You always include both.
▸ O(n + m) => this is considered linear
▸ O(2ⁿ + log(m)) => this is considered exponential
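Here is a minimal two-input sketch (a brute-force example, not the substring algorithm itself):

// Counting matching character pairs across two strings.
// n iterations of the outer loop × m of the inner loop => O(n * m).
function commonChars(s, t) {
  let count = 0;
  for (const a of s) {    // n = s.length
    for (const b of t) {  // m = t.length
      if (a === b) count++;
    }
  }
  return count;
}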
64. TEXT
COMPREHENSION TEST
Convert each of these to their appropriate Big O form!
▸ O(3n + 5)
▸ O(n + ⅕n²)
▸ O(log(n) + 5000)
▸ O(2m³ + 50 + ½n)
▸ O(n log(m) + 2m² + nm)
66–70. Let’s break it down. The algorithm: make an empty array; for each
character in the string, unshift it into the array; and then join the
array together.
▸ Initialize an empty array => O(1)
▸ Then, split the string into an array of characters => O(n)
▸ Then for each character => O(n)
▸ Unshift into an array => O(n) (we’ll see later why this is)
▸ Then join the characters into a string => O(n)
The loop and the unshift multiply. => O(n²)
71–74.
▸ O(n² + 2n) = O(n²)
▸ This algorithm is quadratic.
▸ Let’s see how badly it sucks.
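A minimal JavaScript sketch of the algorithm being analyzed, assuming the unshift-based approach the slides describe:

// Reverse a string by unshifting each character into a new array.
function reverseString(str) {
  const result = [];                  // make an empty array: O(1)
  for (const char of str.split('')) { // split: O(n); loop runs n times
    result.unshift(char);             // shifts every element over: O(n)
  }
  return result.join('');             // join: O(n)
}

reverseString('big o'); // => 'o gib' — correct, but quadratic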
86. TEXT
TIME COMPLEXITIES WAY TOO FAST
Constant      O(1)        math, pop, push, arr[i], property access,
                          conditionals, initializing a variable
Logarithmic   O(log n)    binary search
Linear        O(n)        linear search, iteration
Linearithmic  O(n log n)  sorting (merge sort, quick sort)
Quadratic     O(n²)       nested looping, bubble sort
Cubic         O(n³)       triply nested looping, matrix multiplication
Polynomial    O(nᵏ)       all “efficient” algorithms
Exponential   O(2ⁿ)       subsets, solving chess
Factorial     O(n!)       permutations
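As one concrete row from the table, here is a sketch of the O(log n) entry, binary search on a sorted array:

// Each comparison discards half of the remaining search space,
// so at most ~log2(n) iterations run.
function binarySearch(sorted, target) {
  let lo = 0, hi = sorted.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;             // integer midpoint
    if (sorted[mid] === target) return mid;
    if (sorted[mid] < target) lo = mid + 1; // discard the left half
    else hi = mid - 1;                      // discard the right half
  }
  return -1;
}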
95. BOTTLENECKS
▸ A bottleneck is the part of your code where your algorithm
spends most of its time.
▸ Asymptotically, it’s wherever the dominant factor is.
▸ If your algorithm has an O(n) part and an O(50) part,
the bottleneck is the O(n) part.
▸ As n => ∞, your algorithm will eventually spend 99%+ of
its time in the bottleneck.
99. BOTTLENECKS
▸ When trying to optimize or speed up an algorithm, focus
on the bottleneck.
▸ Optimizing code outside the bottleneck will have a
minuscule effect.
▸ Bottleneck optimizations, on the other hand, can easily
be huge!
103. BOTTLENECKS
▸ If you cut down non-bottleneck code, you might be able to
save 0.01% of your runtime.
▸ If you cut down on bottleneck code, you might be able to
save 30% of your runtime.
▸ Better yet, try to lower the time complexity altogether if
you can!
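A sketch of lowering the complexity outright (illustrative functions, assuming a plain array input):

// Finding a duplicate with a nested loop is O(n²)…
function hasDuplicateSlow(arr) {
  for (let i = 0; i < arr.length; i++) {
    for (let j = i + 1; j < arr.length; j++) {
      if (arr[i] === arr[j]) return true;
    }
  }
  return false;
}

// …but remembering seen values in a Set makes it O(n).
function hasDuplicateFast(arr) {
  const seen = new Set();
  for (const x of arr) {
    if (seen.has(x)) return true; // O(1) average lookup
    seen.add(x);
  }
  return false;
}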
109. SPACE COMPLEXITY
▸ Same thing, except now with memory instead of time.
▸ Do you take linear extra space relative to the input?
▸ Do you allocate new arrays? Do you have to make a copy
of the original input? Are you creating nested data
structures?
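A small sketch of the difference (illustrative functions):

// Doubling every element with O(n) extra space (allocates a copy)…
function doubledCopy(arr) {
  return arr.map(x => x * 2);
}

// …versus O(1) extra space (mutates the input in place).
function doubleInPlace(arr) {
  for (let i = 0; i < arr.length; i++) arr[i] *= 2;
  return arr;
}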
119–122. Data Layers
▸ Registers: immediate workspace. A CPU usually has 16 of these. 1 cycle.
▸ L1 cache: a nearby reservoir of useful data we’ve recently read. Close by. ~4 cycles.
▸ L2 cache: more nearby data, but a little farther away. ~10 cycles.
▸ RAM: getting pretty far now. It’s completely random-access, but takes a while. ~800 cycles.
▸ Disk: pretty much another country. On an SSD, you’re looking at ~5,000 cycles; on a spindle drive, it’s more like 50,000.
123. SO ALL DATA TAKES A JOURNEY
UP FROM THE HARD DISK TO
EVENTUALLY LIVE IN A REGISTER.
133–136. Assume each of these cells is 8 bytes (64 bits), and that the
cells just outside the array hold garbage. Let’s imagine they’re
addressed like so…
832968 833032 833096 833160 833224 833288 833352 833416 833480 833544
this.startAddr = 833096;
Each cell is offset by exactly 64 in the address space,
meaning you can easily derive the address of any index.
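A sketch of that derivation, using the slide’s numbers (the names are illustrative, not a real API):

// address of index i = startAddr + i * cellSize — constant-time math.
const CELL_SIZE = 64;
const startAddr = 833096;
function addressOf(index) {
  return startAddr + index * CELL_SIZE;
}
addressOf(0); // 833096
addressOf(3); // 833288 — no scanning required, which is why arr[i] is O(1)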
147. When the CPU needs data, it first looks in the cache.
Say it’s not in the cache. This is called a cache miss.
The cache then loads the data the CPU requested from RAM…
But the cache guesses that if the CPU wanted this data, it probably will also want
other nearby data eventually. It would be stupid to have to make multiple round trips.
149. In other words, the cache assumes that
related data will be stored around the same
physical area.
The cache assumes locality of data.
150. So the cache just loads a huge
contiguous chunk of data around
the address the CPU asked for.
154. SO KEEP YOUR DATA LOCAL AND
YOUR DATA STRUCTURES
CONTIGUOUS.
155. ARRAYS ARE KING, BECAUSE ALL OF
THE DATA IS LITERALLY RIGHT NEXT
TO EACH OTHER IN MEMORY!
157. An algorithm that jumps around in memory
or follows a bunch of pointers to other objects
will trigger lots of cache misses!
Think linked lists, trees, even hash maps.
158. IDEALLY, YOU WANT TO WORK
LOCALLY WITHIN ARRAYS OF
CONTIGUOUS DATA.
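A hedged sketch of the contrast (cache behavior isn’t directly observable from JavaScript, so treat this purely as an illustration; the node shape {value, next} is assumed):

// Summing an array reads contiguous memory in order…
function sumArray(arr) {
  let total = 0;
  for (let i = 0; i < arr.length; i++) total += arr[i];
  return total;
}

// …while summing a linked list chases pointers that can land
// anywhere on the heap — each hop risks a cache miss.
function sumLinkedList(head) {
  let total = 0;
  for (let node = head; node !== null; node = node.next) {
    total += node.value;
  }
  return total;
}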