Immutable, persistent data structures are at the heart of Clojure's philosophy. It is instructive to see how they are implemented, and to appreciate the trade-offs between persistence and performance. Let's explore the key ideas that led to effective, practical implementations of these data structures. There will be animations that should help clarify key concepts!
9. “… functional programming’s stricture
against destructive updates (assignments)
is a staggering handicap, tantamount to
confiscating a master chef’s knives.”
- Chris Okasaki
10. ABSTRACT DATA TYPE
NAME: QUEUE
INTERFACE
enqueue: add an element to the end
head: first element
tail: remaining elements
INVARIANTS
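The queue interface above can be sketched as a persistent structure. This is a minimal illustration, assuming the classic two-list functional queue (a front list plus a reversed back list); the names `Cons` and `PQueue` are hypothetical and do not come from the slides or from Clojure.

```java
import java.util.NoSuchElementException;

/** Minimal immutable singly linked list (hypothetical helper); null is the empty list. */
final class Cons<T> {
    final T first;
    final Cons<T> rest;

    Cons(T first, Cons<T> rest) {
        this.first = first;
        this.rest = rest;
    }

    /** Returns a reversed copy of the given list. */
    static <T> Cons<T> reverse(Cons<T> list) {
        Cons<T> out = null;
        for (Cons<T> c = list; c != null; c = c.rest) {
            out = new Cons<>(c.first, out);
        }
        return out;
    }
}

/** Persistent FIFO queue: every operation returns a new queue, the old one stays intact. */
final class PQueue<T> {
    private final Cons<T> front; // elements ready to leave, oldest first
    private final Cons<T> back;  // recently enqueued elements, newest first

    private PQueue(Cons<T> front, Cons<T> back) {
        // Invariant: front is empty only when the whole queue is empty.
        if (front == null) {
            front = Cons.reverse(back);
            back = null;
        }
        this.front = front;
        this.back = back;
    }

    static <T> PQueue<T> empty() {
        return new PQueue<T>(null, null);
    }

    boolean isEmpty() {
        return front == null;
    }

    /** enqueue: add an element to the end. */
    PQueue<T> enqueue(T x) {
        return new PQueue<T>(front, new Cons<>(x, back));
    }

    /** head: first element. */
    T head() {
        if (front == null) throw new NoSuchElementException("empty queue");
        return front.first;
    }

    /** tail: remaining elements. */
    PQueue<T> tail() {
        if (front == null) throw new NoSuchElementException("empty queue");
        return new PQueue<T>(front.rest, back);
    }
}
```

Enqueueing only conses onto the back list, and the reversal happens when the front runs out, so older versions of the queue remain valid and share structure with newer ones.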
20. CLOJURE DATA STRUCTURES
'(1 2 3) Lists: Code manipulation
[1 2 3] Vectors: All things sequential
{:a 1 :b 2} Maps: Structured Data
#{a e i o u} Sets: Ermm, Sets
22. THE MAP INTERFACE
GET: get value for given key
ASSOC: add key, value to map
DISSOC: remove key, value from map
MERGE: merge two maps together
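To make the four operations concrete, here is a deliberately naive sketch of a persistent map. It assumes the simplest possible strategy, copying the whole table on every update, which costs O(n) per change; `NaiveMap` is a hypothetical name and this is not how Clojure implements maps.

```java
import java.util.HashMap;
import java.util.Map;

/** Naive persistent map (illustration only): each update copies the entire table. */
final class NaiveMap<K, V> {
    private final Map<K, V> table; // never mutated after construction

    private NaiveMap(Map<K, V> table) {
        this.table = table;
    }

    static <K, V> NaiveMap<K, V> empty() {
        return new NaiveMap<K, V>(new HashMap<K, V>());
    }

    /** GET: value for the given key, or null when the key is absent. */
    V get(K key) {
        return table.get(key);
    }

    /** ASSOC: returns a new map with key bound to value; this map is unchanged. */
    NaiveMap<K, V> assoc(K key, V value) {
        Map<K, V> copy = new HashMap<K, V>(table);
        copy.put(key, value);
        return new NaiveMap<K, V>(copy);
    }

    /** DISSOC: returns a new map without key; this map is unchanged. */
    NaiveMap<K, V> dissoc(K key) {
        Map<K, V> copy = new HashMap<K, V>(table);
        copy.remove(key);
        return new NaiveMap<K, V>(copy);
    }

    /** MERGE: returns a new map; entries from other win when keys clash. */
    NaiveMap<K, V> merge(NaiveMap<K, V> other) {
        Map<K, V> copy = new HashMap<K, V>(table);
        copy.putAll(other.table);
        return new NaiveMap<K, V>(copy);
    }
}
```

The interface is already persistent here, but every update pays for a full copy; the tree-based structures later in the deck exist to avoid exactly that cost.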
23. WHAT MAKES A GOOD MAP?
Constant time operations, independent of number of keys
Efficient space utilization even with mutation
Objects as keys, Objects as values
38. RED BLACK TREES
Root is black
Every path from root to an empty node contains the same number of black nodes
Every node is colored red or black
No red node can have a red child
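The invariants above can be checked mechanically. The sketch below assumes a hypothetical immutable node type `RBNode`, with `null` standing for an empty node; it verifies the colouring rules only and does not check key ordering.

```java
/** Hypothetical red-black node for illustration; null stands for an empty node. */
final class RBNode {
    final boolean red; // every node is either red (true) or black (false)
    final int key;
    final RBNode left, right;

    RBNode(boolean red, RBNode left, int key, RBNode right) {
        this.red = red;
        this.left = left;
        this.key = key;
        this.right = right;
    }
}

final class RBCheck {
    static boolean isRed(RBNode n) {
        return n != null && n.red;
    }

    /**
     * Returns the number of black nodes on every path from n down to an
     * empty node, or -1 if some invariant is violated inside the subtree.
     */
    static int blackHeight(RBNode n) {
        if (n == null) return 0;
        // No red node can have a red child.
        if (n.red && (isRed(n.left) || isRed(n.right))) return -1;
        int lh = blackHeight(n.left);
        int rh = blackHeight(n.right);
        // Every path must contain the same number of black nodes.
        if (lh < 0 || rh < 0 || lh != rh) return -1;
        return lh + (n.red ? 0 : 1);
    }

    /** Root is black, and the subtree rules hold everywhere. */
    static boolean isValid(RBNode root) {
        return !isRed(root) && blackHeight(root) >= 0;
    }
}
```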
71. STEP 1
Use a good hash function to generate an integer key.
hasheq: 0010 1101 1011 1110 1100 1111 1111 1001
Ideal hash trees, Bagwell 2001
72. STEP 2
Divide the 32-bit integer into ‘symbols’, 5 bits at a time.
00101 00111 10100 10101 00011 01001 01
5 7 20 21 3
Use the ‘symbols’ to walk down an AMT
75. BIT JUGGLING!
Compute ‘symbols’ by shifting and masking
00111000110010110100101010100101
00 00000 00000 00000 00000 00000 11111
(hash >>> shift) & 0x01f
How to calculate nth digit?
Shift by 5*n and mask with 0x1f
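The shift-and-mask rule can be written out as a small helper. This is a sketch; the class name `HashBits` and method name `symbol` are made up for illustration, and symbols are counted from the least significant end, as the mask on the slide suggests.

```java
final class HashBits {
    static final int BITS = 5;    // bits per symbol
    static final int MASK = 0x1f; // binary 11111

    /**
     * Returns the nth 5-bit symbol of hash, counting from the low end.
     * Valid for n in 0..6; symbol 6 holds only the top two bits of the int.
     */
    static int symbol(int hash, int n) {
        // >>> is the unsigned shift, so negative hashes do not smear sign bits.
        return (hash >>> (BITS * n)) & MASK;
    }
}
```

For the hash on this slide, 00111000110010110100101010100101 (0x38CB4AA5), symbols 0 through 6 come out as 5, 21, 18, 22, 12, 28 and 0.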
76. BEST COMMENT EVER.
A persistent rendition of Phil Bagwell's Hash Array Mapped Trie
Uses path copying for persistence
HashCollision leaves vs. extended hashing
Node polymorphism vs. conditionals
No sub-tree pools or root-resizing
Any errors are my own
Hickey R., Grand C., Emerick C., Miller A., Fingerhut A.
PersistentHashMap.java:19
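Path copying is easiest to see on a plain binary search tree rather than on the trie itself. The sketch below is that simplified stand-in, not code from PersistentHashMap.java; `Tree` is a hypothetical, unbalanced tree used only to show which nodes get copied.

```java
/** Hypothetical immutable binary search tree; null is the empty tree. */
final class Tree {
    final int key;
    final Tree left, right;

    Tree(Tree left, int key, Tree right) {
        this.left = left;
        this.key = key;
        this.right = right;
    }

    /**
     * Returns a tree that also contains key. Only the nodes on the path
     * from the root to the insertion point are copied; every subtree off
     * that path is shared with the original tree.
     */
    static Tree insert(Tree t, int key) {
        if (t == null) return new Tree(null, key, null);
        if (key < t.key) return new Tree(insert(t.left, key), t.key, t.right);
        if (key > t.key) return new Tree(t.left, t.key, insert(t.right, key));
        return t; // key already present: reuse the whole subtree
    }

    static boolean contains(Tree t, int key) {
        while (t != null) {
            if (key == t.key) return true;
            t = key < t.key ? t.left : t.right;
        }
        return false;
    }
}
```

After an insert, the old root still describes the old contents, and the untouched branches are the very same objects in both versions.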
77. NODE POLYMORPHISM
ArrayNode - 32 wide pointers to sub-tries
BitmapIndexedNode - bitmap + dynamic array
HashCollisionNode - array for things that collide
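The "bitmap + dynamic array" idea relies on a population count: the bitmap records which of the 32 symbols are occupied, and the number of set bits below a symbol's bit gives its position in the compact array. The sketch below follows that technique from Bagwell's paper; the class name `BitmapIndex` is hypothetical.

```java
final class BitmapIndex {
    /** Single-bit mask for a 5-bit symbol in the range 0..31. */
    static int bitpos(int symbol) {
        return 1 << symbol;
    }

    /** True when the slot for this bit is occupied. */
    static boolean present(int bitmap, int bit) {
        return (bitmap & bit) != 0;
    }

    /**
     * Index into the compact array: the count of occupied slots whose
     * symbol is smaller than the one identified by bit.
     */
    static int index(int bitmap, int bit) {
        return Integer.bitCount(bitmap & (bit - 1));
    }
}
```

With symbols 3, 9 and 21 occupied, the array has three entries and symbol 9 lives at index 1. A missing symbol such as 5 would be inserted at index 1, pushing the later entries up by one.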
92. VECTOR CATENATION
Based on Bagwell and Rompf,
“RRB-Trees: Efficient Immutable Vectors”
logarithmic catenation and slicing
Michal Marczyk
core.rrb-vector
TODO: benchmarks
94. 1959 de la Briandais, Fredkin Trie
1960 Windley, Booth, Colin, Hibbard Binary Search Trees
1962 Adelson-Velsky, Landis AVL Trees
1978 Guibas, Sedgwick Red Black Trees
1985 Sleator, Tarjan Splay Trees
1996 Okasaki Purely Functional Data Structures
1998 Sedgwick Ternary Search Trees
2000 Phil Bagwell AMT
2001 Phil Bagwell HAMT
2007 Rich Hickey Clojure!
95. Reading List
Ideal Hash Trees, Bagwell, 2001
Fast and Efficient Trie Searches, Bagwell, 2000
Fast Mergeable Integer Maps, Okasaki & Gill, 1998
The World's Fastest Scrabble Program, Appel & Jacobson, 1988
File Searching Using Variable Length Keys, de la Briandais, 1959
Purely Functional Data Structures, Okasaki, 1996