We all learned data structures at school: arrays, lists, sets, stacks (LIFO), queues (FIFO), heaps, associative arrays, trees, ... and what do we mostly use in PHP? The "array"! In most cases we do anything and everything with it, and then stumble over it when profiling code. During this session we'll relearn how to use these structures appropriately, looking more closely at how to employ arrays, the SPL and other structures from PHP extensions. The impact that PHP 7 should have on data structures will be introduced as well.
2. About me
● Patrick Allaert
● Founder of Libereco and co-founder of catchy.io
● Playing with PHP/Linux for +15 years
● eZ Publish core developer
● Author of the APM PHP extension
● @patrick_allaert
● patrickallaert@php.net
● http://github.com/patrickallaert/
● http://patrickallaert.blogspot.com/
13. Array: PHP's untruthfulness
PHP “Arrays” can dynamically grow and be iterated
in both directions (reset(), next(), prev(), end()),
exclusively with O(1) operations.
Let's have a Doubly Linked List (DLL):
[Diagram: Head <-> Data <-> Data <-> Data <-> Data <-> Data <-> Tail]
Enables Queue, Stack and Deque implementations
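To make the queue/stack/deque claim concrete, here is a minimal sketch using the SPL's SplDoublyLinkedList rather than the engine's internal list; it only illustrates that insertion and removal are available at both ends.

```php
<?php
// Sketch: one doubly linked list serving as a queue, a stack and a deque.
$list = new SplDoublyLinkedList();

$list->push('b');        // insert at the tail
$list->push('c');
$list->unshift('a');     // insert at the head

$first = $list->shift(); // remove from the head (queue behaviour)
$last  = $list->pop();   // remove from the tail (stack behaviour)

echo $first, ' ', $last, ' ', count($list), PHP_EOL; // prints "a c 1"
```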
15. Array: PHP's untruthfulness
Elements of PHP “Arrays” are always accessible using a
key (index).
Let's have a Hash Table:
[Diagram: the doubly linked list of Data elements (Head to Tail), each Data held in a Bucket; a bucket pointers array with slots 0 to nTableSize - 1, each slot holding a Bucket * that points to a bucket]
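The combination of both structures is visible from userland: an element can be reached directly by key, while iteration still follows insertion order. A small sketch with arbitrary keys chosen for illustration:

```php
<?php
// Sketch: a PHP array is looked up by key yet iterated in insertion order.
$array = [];
$array['z'] = 1;
$array[42]  = 2;
$array['a'] = 3;

$hasKey = isset($array[42]);  // direct access through the hash table
$order  = array_keys($array); // traversal follows the order of insertion

var_dump($hasKey);                   // bool(true)
echo implode(', ', $order), PHP_EOL; // prints "z, 42, a"
```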
19. Array: PHP's untruthfulness
● In C: 100 000 integers (using long on 64 bits => 8
bytes) can be stored in 0.76 MiB.
● In PHP 5:
● it will take ≅ 13.97 MiB!
● A variable (containing an integer) takes 48 bytes.
● The overhead for every “array” entry is about 96 bytes.
20. Array: PHP's untruthfulness
● In PHP 7:
● it will take ≅ 4 MiB (down from 13.97 MiB in PHP 5).
● A variable (containing an integer) takes 16 bytes (down from 48).
● The overhead for every “array” entry is about 20 bytes (down from 96).
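The per-element cost can be checked on whatever PHP version is at hand. This is a rough sketch, assuming memory_get_usage() is a good enough approximation; the printed figures depend on the PHP version and platform, so no expected output is given.

```php
<?php
// Sketch: estimate the memory cost per element of a 100 000-integer array.
$count  = 100000;
$before = memory_get_usage();

$numbers = [];
for ($i = 0; $i < $count; $i++) {
    $numbers[] = $i;
}

$after = memory_get_usage();
$bytesPerElement = ($after - $before) / $count;

printf(
    "%.2f MiB in total, about %.1f bytes per element\n",
    ($after - $before) / (1024 * 1024),
    $bytesPerElement
);
```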
23. Structs (or records, tuples,...)
● A struct is a value containing other values which
are typically accessed using a name.
● Example:
Person => firstName / lastName
ComplexNumber => realPart / imaginaryPart
25. Structs – Using a class
$person = new PersonStruct(
"Patrick", "Allaert"
);
26. Structs – Using a class (Implementation)
class PersonStruct
{
    public $firstName;
    public $lastName;

    public function __construct($firstName, $lastName)
    {
        $this->firstName = $firstName;
        $this->lastName = $lastName;
    }
}
27. Structs – Using a class (Implementation)
class PersonStruct
{
    public $firstName;
    public $lastName;

    public function __construct($firstName, $lastName)
    {
        $this->firstName = $firstName;
        $this->lastName = $lastName;
    }

    public function __set($key, $value)
    {
        // a. Do nothing
        // b. trigger_error()
        // c. Throw an exception
    }
}
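Option (c) could look like the following sketch. The class name StrictPersonStruct and the choice of LogicException are assumptions made for illustration; the slides leave the body of __set() open.

```php
<?php
// Sketch of option (c): writing to an undeclared property throws.
class StrictPersonStruct
{
    public $firstName;
    public $lastName;

    public function __construct($firstName, $lastName)
    {
        $this->firstName = $firstName;
        $this->lastName  = $lastName;
    }

    public function __set($key, $value)
    {
        // Only reached for properties that are not declared (or not accessible).
        throw new LogicException("Unknown property: $key");
    }
}

$person = new StrictPersonStruct("Patrick", "Allaert");
$person->lastName = "A.";      // declared property: allowed

$rejected = false;
try {
    $person->lastNam = "typo"; // undeclared property: rejected
} catch (LogicException $e) {
    $rejected = true;
}
var_dump($rejected);           // bool(true)
```

This is what makes the structure "rigid": a typo in a property name fails loudly instead of silently creating a new property.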
29. Structs – Pros and Cons
Using a class implementation (compared to an array):
+ Type hinting possible
+ Rigid structure
+ More OO
+ Uses ~ 26% less memory
- Slower to create by ~ 50%
Starting with PHP 7:
+ Uses ~ 66% less memory
- Slower to create by a factor of 2!
37. Queues
● A queue is an ordered collection respecting First
In, First Out (FIFO) order.
● Elements are inserted at one end and removed at
the other.
[Diagram: a row of Data elements; Enqueue adds a Data element at one end, Dequeue removes one from the other end]
39. Queues – Using array
$queue = [];
$queue[] = 1; // or array_push()
$queue[] = 2; // or array_push()
$queue[] = 3; // or array_push()
array_shift($queue); // gives 1
array_shift($queue); // gives 2
array_shift($queue); // gives 3
40. Queues – Using SplQueue
$queue = new SplQueue();
$queue[] = 1; // or $queue->enqueue()
$queue[] = 2; // or $queue->enqueue()
$queue[] = 3; // or $queue->enqueue()
$queue->dequeue(); // gives 1
$queue->dequeue(); // gives 2
$queue->dequeue(); // gives 3
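A typical use is draining a queue until it is empty. The sketch below uses made-up job names. Note that with a plain array, array_shift() renumbers the remaining integer keys on each call, which is part of why a dedicated queue can be attractive.

```php
<?php
// Sketch: draining a queue in FIFO order.
$queue = new SplQueue();
foreach (['first', 'second', 'third'] as $job) {
    $queue->enqueue($job);
}

$processed = [];
while (!$queue->isEmpty()) {
    $processed[] = $queue->dequeue(); // always takes the oldest element
}

echo implode(', ', $processed), PHP_EOL; // prints "first, second, third"
```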
42. Stacks
● A stack is an ordered collection respecting Last In,
First Out (LIFO) order.
● Elements are inserted and removed on the same
end.
[Diagram: a row of Data elements; Push adds a Data element and Pop removes one at the same end]
44. Stacks – Using array
$stack = [];
$stack[] = 1; // or array_push()
$stack[] = 2; // or array_push()
$stack[] = 3; // or array_push()
array_pop($stack); // gives 3
array_pop($stack); // gives 2
array_pop($stack); // gives 1
45. Stacks – Using SplStack
$stack = new SplStack();
$stack[] = 1; // or $stack->push()
$stack[] = 2; // or $stack->push()
$stack[] = 3; // or $stack->push()
$stack->pop(); // gives 3
$stack->pop(); // gives 2
$stack->pop(); // gives 1
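As an illustration of LIFO order, here is a small sketch of an undo history; the action names are invented for the example.

```php
<?php
// Sketch: a stack hands elements back in reverse order of insertion.
$stack = new SplStack();
foreach (['open file', 'edit line', 'save file'] as $action) {
    $stack->push($action);
}

$latest = $stack->top(); // peek at the most recent element without removing it
$undone = [];
while (!$stack->isEmpty()) {
    $undone[] = $stack->pop();
}

echo $latest, PHP_EOL;                // prints "save file"
echo implode(', ', $undone), PHP_EOL; // prints "save file, edit line, open file"
```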
47. Queues/Stacks – Pros and Cons
SplQueue / SplStack
+ Uses less memory
+ Type hinting
+ More OO
- A bit more CPU intensive
Starting with PHP 7 (compared to arrays):
- Uses more memory
- Much more CPU intensive
=> They haven't received as much attention as arrays did (yet?).
49. Sets
● A set is a collection with no particular ordering
especially suited for testing the membership of a
value against a collection or to perform
union/intersection/complement operations
between them.
51. Sets – Using array
$set = [];
// Adding elements to a set
$set[] = 1;
$set[] = 2;
$set[] = 3;
// Checking presence in a set
in_array(2, $set); // true
in_array(5, $set); // false
array_merge($set1, $set2); // union
array_intersect($set1, $set2); // intersection
array_diff($set1, $set2); // complement
True performance killers!
56. Sets – Mis-usage
Testing 5 × 10^7 membership checks against a set of 3 elements
[Bar chart: Time (s) on PHP 5.6 and PHP 7 for in_array, compare, switch and the optimized way ;). Data labels in extraction order: 19.59, 3.15, 5.2, 1.97, 3.43, 2.34, 1.53, 0.75]
57. Sets – Using array (simple types)
$set = [];
// Adding elements to a set
$set[1] = true; // Any dummy value works,
$set[2] = true; // except NULL: isset() is false for NULL values
$set[3] = true;
// Checking presence in a set
isset($set[2]); // true
isset($set[5]); // false
$set1 + $set2; // union
array_intersect_key($set1, $set2); // intersection
array_diff_key($set1, $set2); // complement
● Remember that PHP Array keys can be integers or
strings only!
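The key-based approach can be sketched end to end as follows. array_fill_keys() is one assumed way to build such a set from a list of values; the slides add the keys one by one.

```php
<?php
// Sketch: build key-based sets from value lists, then use key operations.
$set1 = array_fill_keys([1, 2, 3], true);
$set2 = array_fill_keys([3, 4], true);

$hasTwo  = isset($set1[2]); // true
$hasFive = isset($set1[5]); // false

$union        = array_keys($set1 + $set2);                     // [1, 2, 3, 4]
$intersection = array_keys(array_intersect_key($set1, $set2)); // [3]
$complement   = array_keys(array_diff_key($set1, $set2));      // [1, 2]

// Keys are integers or strings only: a numeric string key becomes an integer.
$set1["7"] = true;
$keys    = array_keys($set1);
$lastKey = end($keys);
var_dump($lastKey); // int(7)
```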
59. Sets – Using array (objects)
$set = [];
// Adding elements to a set
$set[spl_object_hash($object1)] = $object1;
$set[spl_object_hash($object2)] = $object2;
$set[spl_object_hash($object3)] = $object3;
// Checking presence in a set
isset($set[spl_object_hash($object2)]); // true
isset($set[spl_object_hash($object5)]); // false
$set1 + $set2; // union
array_intersect_key($set1, $set2); // intersection
array_diff_key($set1, $set2); // complement
Store a reference to the object!
61. Sets – Using SplObjectStorage (objects)
$set = new SplObjectStorage();
// Adding elements to a set
$set->attach($object1); // or $set[$object1] = null;
$set->attach($object2); // or $set[$object2] = null;
$set->attach($object3); // or $set[$object3] = null;
// Checking presence in a set
isset($set[$object2]); // true
isset($set[$object5]); // false
$set1->addAll($set2); // union
$set1->removeAllExcept($set2); // intersection
$set1->removeAll($set2); // complement
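One point worth keeping in mind is that these three methods modify the storage they are called on. A minimal sketch, assuming the original sets must stay untouched, clones first:

```php
<?php
// Sketch: set operations on objects; clone first because they modify in place.
$a = new stdClass();
$b = new stdClass();
$c = new stdClass();

$set1 = new SplObjectStorage();
$set1[$a] = null;
$set1[$b] = null;

$set2 = new SplObjectStorage();
$set2[$b] = null;
$set2[$c] = null;

$union = clone $set1;
$union->addAll($set2);                 // a, b, c

$intersection = clone $set1;
$intersection->removeAllExcept($set2); // b

$complement = clone $set1;
$complement->removeAll($set2);         // a

echo count($union), ' ', count($intersection), ' ', count($complement), PHP_EOL;
// prints "3 1 1"
```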
62. Sets – Using QuickHash (int)
● No union/intersection/complement operations
(yet?)
● Yummy features like (loadFrom|saveTo)(String|File)
$set = new QuickHashIntSet(64, QuickHashIntSet::CHECK_FOR_DUPES);
// Adding elements to a set
$set->add(1);
$set->add(2);
$set->add(3);
// Checking presence in a set
$set->exists(2); // true
$set->exists(5); // false
isset($set[2]);
63. Sets – Using bitsets
function remove(
    $path, $files = true, $dir = true,
    $links = true, $exec = true
)
{
    if (!$files && is_file($path))
        return false;
    if (!$dir && is_dir($path))
        return false;
    if (!$links && is_link($path))
        return false;
    if (!$exec && is_executable($path))
        return false;
    // ...
}
72. Sets – Using bitsets
define("E_ERROR", 1 << 0);
define("E_WARNING", 1 << 1);
define("E_PARSE", 1 << 2);
define("E_NOTICE", 1 << 3);
// Adding elements to a set
$set = 0;
$set |= E_ERROR;
$set |= E_WARNING;
$set |= E_PARSE;
// Checking presence in a set
$set & E_ERROR; // non-zero (truthy)
$set & E_NOTICE; // 0 (falsy)
$set1 | $set2; // union
$set1 & $set2; // intersection
$set1 & ~$set2; // complement (elements of $set1 not in $set2)
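The bitwise operations can be traced with concrete values. In this sketch the PERM_* constants are hypothetical names; any distinct powers of two would do. It also contrasts the relative complement (`& ~`) with XOR, which yields the symmetric difference.

```php
<?php
// Sketch: a set of flags stored in the bits of a single integer.
// The PERM_* names are hypothetical; any distinct powers of two work.
const PERM_READ    = 1 << 0; // 1
const PERM_WRITE   = 1 << 1; // 2
const PERM_EXECUTE = 1 << 2; // 4

$set1 = PERM_READ | PERM_WRITE;    // {read, write}
$set2 = PERM_WRITE | PERM_EXECUTE; // {write, execute}

$hasRead    = ($set1 & PERM_READ) !== 0;    // true
$hasExecute = ($set1 & PERM_EXECUTE) !== 0; // false

$union        = $set1 | $set2;  // 7: {read, write, execute}
$intersection = $set1 & $set2;  // 2: {write}
$complement   = $set1 & ~$set2; // 1: {read}, elements of $set1 not in $set2
$symmetric    = $set1 ^ $set2;  // 5: {read, execute}, elements in exactly one set

printf("%d %d %d %d\n", $union, $intersection, $complement, $symmetric);
// prints "7 2 1 5"
```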
73. Sets – Using bitsets (example)
define("REMOVE_FILES", 1 << 0);
define("REMOVE_DIRS", 1 << 1);
define("REMOVE_LINKS", 1 << 2);
define("REMOVE_EXEC", 1 << 3);
define("REMOVE_ALL", ~0); // Setting all bits
function remove($path, $options = REMOVE_ALL)
{
    if (~$options & REMOVE_FILES && is_file($path))
        return false;
    if (~$options & REMOVE_DIRS && is_dir($path))
        return false;
    if (~$options & REMOVE_LINKS && is_link($path))
        return false;
    if (~$options & REMOVE_EXEC && is_executable($path))
        return false;
    // ...
}
74. Sets – Using bitsets (example)
remove("/tmp/removeMe", REMOVE_FILES | REMOVE_LINKS);
// Much better :)
76. Sets: Conclusions
● Use the key and not the value when using PHP
Arrays.
● Use QuickHash for sets of integers if possible.
● Use SplObjectStorage as soon as you are working
with objects.
● Use bitsets when working with a finite number of
elements (known in advance).
● Avoid array_unique() / in_array() at all costs!
77. Maps
● A map is a collection of key/value pairs where all
keys are unique.
78. Maps – Using array
● Don't use array_merge() on maps.
$map = [];
$map["ONE"] = 1;
$map["TWO"] = 2;
$map["THREE"] = 3;
// Merging maps:
array_merge($map1, $map2); // SLOW!
$map2 + $map1; // Fast :)
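The two merging styles give the same key/value pairs for string keys, but they behave differently once integer keys are involved. A short sketch with made-up maps:

```php
<?php
// Sketch: merging two maps with array_merge() versus the + operator.
$map1 = ["ONE" => 1, "TWO" => 2];
$map2 = ["TWO" => 22, "THREE" => 3];

$merged = array_merge($map1, $map2); // on duplicate string keys, $map2 wins
$summed = $map2 + $map1;             // the left operand wins, so $map2 wins too

var_dump($merged == $summed); // bool(true): same pairs, only the order differs

// With integer keys the two differ: array_merge() renumbers, + keeps the keys.
$byId1 = [10 => "a"];
$byId2 = [20 => "b"];
$mergedKeys = array_keys(array_merge($byId1, $byId2)); // [0, 1]
$summedKeys = array_keys($byId2 + $byId1);             // [20, 10]

echo implode(', ', $mergedKeys), ' / ', implode(', ', $summedKeys), PHP_EOL;
// prints "0, 1 / 20, 10"
```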
79. Maps – Using array
Testing 10^7 merges against 2 maps of 5 elements
[Bar chart: Time (s) on PHP 5.6 and PHP 7 for array_merge and +. Data labels in extraction order: 4.74, 2.77, 1.42, 1.09]
81. Heap
● A heap is a tree-based structure in which every
parent is ordered before its children: in a max-heap
the largest key is at the top and the smaller ones
sit towards the leaves (a min-heap reverses this).
83. Heap – Using Spl(Min|Max)Heap
$heap = new SplMinHeap;
$heap->insert(30);
$heap->insert(20);
$heap->insert(25);
var_dump($heap->top());
/* int(20) */
84. Heaps: Conclusions
● MUCH faster than having to re-sort() an array at
every insertion.
● If you don't require the collection to be sorted at
every single step and can insert all data at once
and then sort(), an array is a much better/faster
approach.
● SplPriorityQueue is very similar: consider it the
same as SplHeap, but with the ordering done on a
separate priority (the key) rather than on the value.
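Both behaviours can be seen in a few lines. The values and priorities in this sketch are arbitrary:

```php
<?php
// Sketch: a min-heap always hands back the smallest remaining value.
$heap = new SplMinHeap();
foreach ([30, 20, 25] as $value) {
    $heap->insert($value);
}

$sorted = [];
while (!$heap->isEmpty()) {
    $sorted[] = $heap->extract(); // removes the current top
}
echo implode(', ', $sorted), PHP_EOL; // prints "20, 25, 30"

// SplPriorityQueue orders by a separate priority; the highest comes out first.
$queue = new SplPriorityQueue();
$queue->insert('low', 1);
$queue->insert('high', 10);
$queue->insert('mid', 5);

$first = $queue->extract();
echo $first, PHP_EOL; // prints "high"
```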
85. Bloom filters
● A Bloom filter is a space-efficient probabilistic data
structure used to test whether an element is a
member of a set.
● False positives are possible, but false negatives are
not!
86. Bloom filters – Using bloomy
$bloom = new BloomFilter(
10000, // capacity
0.001 // (optional) error rate
// (optional) random seed
);
$bloom->add("An element");
$bloom->has("An element"); // true for sure
$bloom->has("Foo"); // false, most probably
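For readers without the bloomy extension, the principle can be sketched in plain PHP. This is not the bloomy API: TinyBloomFilter, its default sizes and the salted crc32() hashing are assumptions chosen for brevity, and a PHP array of booleans is of course not space-efficient.

```php
<?php
// Sketch of the idea behind a Bloom filter in plain PHP (not the bloomy API).
// TinyBloomFilter and its parameters are hypothetical names for illustration.
class TinyBloomFilter
{
    private $bits;      // array of booleans acting as the bit field
    private $size;      // number of bits
    private $hashCount; // number of hash positions per element

    public function __construct($size = 1024, $hashCount = 3)
    {
        $this->size      = $size;
        $this->hashCount = $hashCount;
        $this->bits      = array_fill(0, $size, false);
    }

    // Derive $hashCount positions by salting the element with an index.
    private function positions($element)
    {
        $positions = [];
        for ($i = 0; $i < $this->hashCount; $i++) {
            $positions[] = abs(crc32($i . ':' . $element)) % $this->size;
        }
        return $positions;
    }

    public function add($element)
    {
        foreach ($this->positions($element) as $position) {
            $this->bits[$position] = true;
        }
    }

    // false means "definitely absent"; true means "probably present".
    public function has($element)
    {
        foreach ($this->positions($element) as $position) {
            if (!$this->bits[$position]) {
                return false;
            }
        }
        return true;
    }
}

$bloom = new TinyBloomFilter();
$bloom->add("An element");
var_dump($bloom->has("An element")); // bool(true): no false negatives
```

An added element always finds all of its bits set, so false negatives cannot happen; a false positive occurs when other elements happen to have set the same bits.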
87. Other related projects
● SPL Types: Various types implemented as object:
SplInt, SplFloat, SplEnum, SplBool and SplString
http://pecl.php.net/package/SPL_Types
● Judy: Sparse dynamic arrays implementation
http://pecl.php.net/package/Judy
● Weakref: Weak references implementation.
Provides a gateway to an object without
preventing that object from being collected by the
garbage collector.
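The behaviour described can be tried with the WeakReference class that ships with PHP 7.4 and later. This is an assumption about the reader's PHP version; it is a different API from the PECL Weakref extension named above.

```php
<?php
// Sketch using the WeakReference class built into PHP 7.4+
// (an assumption: this is not the PECL Weakref extension).
$object = new stdClass();
$weak   = WeakReference::create($object);

$wasAlive = $weak->get() === $object;
var_dump($wasAlive);    // bool(true): the object is still alive

unset($object);         // drop the only strong reference
var_dump($weak->get()); // NULL: the weak reference did not keep it alive
```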
91. Conclusions
● Use appropriate data structure. It will keep your
code clean and fast.
● Think about the time and space complexity
involved by your algorithms.
● Name your variables accordingly: use “Map”, “Set”,
“List”, “Queue”, ... to describe them instead of using
something like: $ordersArray.