2. logistics
• assignment 1 is up
• cowbook available at learning center beta, otakaari 1 x
• if you do not have access to the lab, provide your Aalto username or email today!
• if you do not have access at mycourses, i will post material (slides and assignments) also at http://michalis.co/moderndb/
3. in this lecture...
b+ trees and hash-based indexing
external sorting
join algorithms
query optimization
5. b+ trees
leaf nodes
contain data-entries
sequentially linked
each node stored in one page
data entries can be any one of the three alternative types
type 1: data records; type 2: (k, rid); type 3: (k, rids)
at least 50% capacity - except for root!
non-leaf nodes
index entries
used to direct search
in the examples that follow...
alternative 2 is used
all nodes have between d and 2d key entries
d is the order of the tree
6. b+ trees
non-leaf nodes
index entries
used to direct search
closer look at non-leaf nodes
[figure: an index node holds m search key values and m+1 pointers, ⟨P0, K1, P1, K2, P2, …, Km, Pm⟩; the subtree under P0 holds entries with k* < K1, the subtree under Pi holds Ki ≤ k* < Ki+1, and the subtree under Pm holds k* ≥ Km]
leaf nodes
contain data-entries
sequentially linked
7. b+ trees
most widely used index
search and updates at log_F N cost (cost = page I/Os)
F = fanout (number of pointers per index node); N = number of leaf pages
efficient equality and range queries
non-leaf nodes
index entries
used to direct search
leaf nodes
contain data-entries
sequentially linked
8. example b+ tree - search
search begins at root, and key comparisons direct it to a leaf
search for 5*; search for all data entries >= 24*
[figure: B+ tree with root ⟨13, 17, 24, 30⟩ and leaf pages 2* 3* 5* 7* | 14* 16* | 19* 20* 22* | 24* 27* 29* | 33* 34* 38* 39*]
9. inserting a data entry
1. find correct leaf L
2. place data entry onto L
a. if L has enough space, done!
b. else must split L into L and L2
• redistribute entries evenly
• copy up the middle key to parent of L
• insert entry pointing to L2 to parent of L
the above happens recursively
when index nodes are split, push up middle key
splits grow the tree
root split increases height
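As a sketch of step 2b, here is a toy leaf split in Python (a leaf is just a sorted list of (key, rid) pairs here, not a full B+ tree implementation; names are illustrative):

```python
# Toy sketch of a leaf split (step 2b above); entries are (key, rid)
# pairs kept sorted, and the middle key is *copied* up, so it stays
# in the new leaf L2.

def split_leaf(entries):
    """Split an overflowing sorted leaf; return (L, L2, key_to_copy_up)."""
    mid = len(entries) // 2
    L, L2 = entries[:mid], entries[mid:]
    return L, L2, L2[0][0]   # middle key continues to appear in L2

# order d = 2: a leaf holds at most 2d = 4 entries, so inserting 8*
# into the leaf 2* 3* 5* 7* overflows it
leaf = [(2, 'r2'), (3, 'r3'), (5, 'r5'), (7, 'r7'), (8, 'r8')]
L, L2, up = split_leaf(leaf)   # L: 2* 3*, L2: 5* 7* 8*, copy up 5
```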
10. example b+ tree
insert 8*
[figure: inserting 8* overflows leaf L = 2* 3* 5* 7*; split L into L = 2* 3* and L2 = 5* 7* 8*; the middle key 5 is copied up (and continues to appear in the leaf): entries < 5 stay in L, entries ≥ 5 go to L2; the parent ⟨13, 17, 24, 30⟩ now overflows too, so split the parent node: the middle key 17 is pushed up into a new root, with children ⟨5, 13⟩ and ⟨24, 30⟩]
12. deleting a data entry
inverse of insertion
deletion: remove data entry; when nodes are less than half-full, re-distribute entries & (maybe) merge nodes
insertion: add data entry; when nodes overflow, split nodes & re-distribute entries
13. b+ trees in practice
typical order d = 100, fill-factor = 67%
average fan-out F = 133
typical capacities:
for height 4: 133^4 ≈ 312,900,700 records
for height 3: 133^3 = 2,352,637 records
can often hold top levels in main memory
level 1: 1 page = 8 KBytes
level 2: 133 pages = 1 MByte
level 3: 17,689 pages = 133 MBytes
15. hash-based index
the index supports equality queries
does not support range queries
static and dynamic variants exist
data entries organized in M buckets
bucket = a collection of pages
the data entry for a record with search key value key is assigned to bucket h(key) mod M
hash function h(key), e.g., h(key) = α·key + β
[figure: key → h → h(key) mod M → one of buckets 0, 1, 2, …, M-1]
16. static hashing
number of buckets is fixed
start with one page per bucket
allocated sequentially, never de-allocated
can use overflow pages
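A minimal sketch of static-hash bucket assignment; the values of α, β, and M below are made-up illustrations, not from the slides:

```python
# Static hashing sketch: fixed number of buckets M, data entries
# assigned by h(key) mod M. ALPHA, BETA, M are illustrative choices.
ALPHA, BETA, M = 7, 3, 4

def h(key):
    return ALPHA * key + BETA      # e.g. h(key) = α·key + β

def bucket_of(key):
    return h(key) % M              # bucket number in [0, M)

buckets = {b: [] for b in range(M)}
for key in [10, 21, 33, 47]:
    buckets[bucket_of(key)].append(key)
```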
17. static hashing
drawback
long overflow chains can degrade performance
dynamic hashing techniques
adapt index to data size
extendible and linear hashing
18. extendible hashing
problem: bucket becomes full
one solution
double the number of buckets...
...and redistribute data entries
however
reading and re-writing all buckets is expensive
better idea:
use directory of pointers to buckets
double number of ‘logical’ buckets…
but split ‘physically’ only the bucket that overflowed
directory much smaller than data entry pages - good!
no overflow pages - good!
19. example
[figure: directory with global depth 2 and entries 00, 01, 10, 11, pointing to buckets A, B, C, D (local depth 2 each); data entries: bucket A: 4* 12* 32* 16*, bucket B: 1* 5* 21* 13*, bucket C: 10*, bucket D: 15* 7* 19*]
directory is an array of size M = 4 = 2^2
to find the bucket for r, take the last 2 bits of h(r)
h(r) = key
e.g., if h(r) = 5 = binary 101, it is in the bucket pointed to by 01
global depth = min. bits enough to enumerate buckets = 2
local depth = min. bits to identify an individual bucket = 2
20. insertion
[figure: same directory and buckets as the previous example]
try to insert entry to corresponding bucket
if the bucket is full, increase its local depth by 1 and split the bucket (allocate new bucket, re-distribute)
if necessary, double the directory, i.e., when for the split bucket local depth > global depth
when the directory doubles, increase global depth by 1
21. example
insert record with h(r) = 20 = binary 10100 → bucket A
bucket A is full: split bucket A!
allocate a new page, redistribute according to modulo 2M = 8, i.e., the 3 least significant bits
we’ll have more than 4 buckets now, so double the directory!
[figure: before the split, directory 00, 01, 10, 11 with global depth 2; buckets A: 4* 12* 32* 16*, B: 1* 5* 21* 13*, C: 10*, D: 15* 7* 19*, local depth 2 each]
22. example
[figure: directory doubled to 8 entries 000–111, global depth 3; bucket A: 32* 16* and its split image A2: 4* 12* 20*, local depth 3 each; buckets B: 1* 5* 21* 13*, C: 10*, D: 15* 7* 19* keep local depth 2, each pointed to by two directory entries]
split bucket A and redistribute entries
update local depth
double the directory
update global depth
update pointers
23. notes
20 = binary 10100
last 2 bits (00) tell us r belongs in A or A2
last 3 bits needed to tell which
global depth of directory: number of bits enough to determine which bucket any entry belongs to
local depth of a bucket: number of bits enough to determine if an entry belongs to this bucket
when does a bucket split cause directory doubling?
when, before the insert, local depth of the bucket = global depth
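The two bit-level rules above can be sketched as follows (a toy illustration, not a full extendible-hashing implementation):

```python
# Directory slots are indexed by the last global_depth bits of h(r);
# a split doubles the directory exactly when the full bucket's local
# depth equals the global depth before the insert.

def slot(h_r, global_depth):
    return h_r & ((1 << global_depth) - 1)   # last global_depth bits

def split_doubles_directory(local_depth, global_depth):
    return local_depth == global_depth

# h(r) = 20 = binary 10100: the last 2 bits (00) only say "A or A2",
# the last 3 bits (100) pick the split image A2
assert slot(20, 2) == 0b00
assert slot(20, 3) == 0b100
assert slot(32, 3) == 0b000        # 32* stays in bucket A
```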
24. example
[figure: global depth 3; bucket A2: 4* 12* 20* (local depth 3), bucket A: 32* 16* (local depth 3), buckets B: 1* 5* 21* 13*, C: 10*, D: 15* 7* 19* (local depth 2)]
insert h(r) = 17
bucket B is full: split bucket B
[figure: after the split, bucket B: 1* 17* and split image B2: 5* 21* 13*, local depth 3 each; the directory stays at 8 entries (global depth 3), since the local depth of B was 2 < 3 before the split; other buckets not shown]
25. comments on extendible hashing
if directory fits in memory,
equality query answered in one disk access
answered = retrieve rid
directory grows in spurts
if hash values are skewed, it might grow large
delete: reverse algorithm
empty bucket can be merged with its ‘split image’
when can the directory be halved?
when all directory elements point to same bucket as their ‘split image’
27. create index
CREATE INDEX indexb
ON students (age, grade)
USING BTREE;
CREATE INDEX indexh
ON students (age, grade)
USING HASH;
DROP INDEX indexh
ON students;
29. the sorting problem
setting
a relation R, stored over N disk pages
3 ≤ B < N pages available in memory (buffer pages)
task
sort records of R and store result on disk
sort by a function of record field values f(r)
why
applications need records ordered
part of join implementation (soon...)
30. sorting with 3 buffer pages
2 phases
[figure: input: relation R, N pages stored on disk → buffer (memory used by the dbms) → output: sorted R by f(r), N pages stored on disk]
31. sorting with 3 buffer pages - first phase
pass 0: output N runs
run: sorted sub-file
after the first phase: one run is one page
how: load one page at a time, sort it in-memory, output to disk
only 1 buffer page needed for the first phase
[figure: input: relation R, N pages on disk → buffer → output: N runs, run #1, run #2, …, run #N]
32. sorting with 3 buffer pages - second phase
pass 1,2,...: halve the runs
how: scan pairs of runs, each in its own page, merge in-memory into a new run, output to disk
[figure: input: runs #1 … #N on disk → buffer: input page 1, input page 2, output page → output: half as many runs, run #1 … run #N/2]
33. merge?
[input page 1: 1 3 4 8 | input page 2: 2 5 6 7 | output page: empty; values shown are f(r)]
merge the two sorted input pages into the output page
maintaining sorted order
compare the next smallest value from each page
move smallest to output page
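The compare-and-move loop above can be sketched as one in-memory function (a simplification: page capacity and flushing to disk are ignored):

```python
def merge_pages(p1, p2):
    """Merge two sorted pages of f(r) values, maintaining sorted order."""
    out, i, j = [], 0, 0
    while i < len(p1) and j < len(p2):
        # compare the next smallest value from each page,
        # move the smaller one to the output
        if p1[i] <= p2[j]:
            out.append(p1[i]); i += 1
        else:
            out.append(p2[j]); j += 1
    out.extend(p1[i:])   # one input is exhausted:
    out.extend(p2[j:])   # copy the rest of the other
    return out

merge_pages([1, 3, 4, 8], [2, 5, 6, 7])   # → [1, 2, 3, 4, 5, 6, 7, 8]
```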
34. merge?
[input page 1: 3 4 8 | input page 2: 2 5 6 7 | output page: 1]
compare the next smallest value from each page, move smallest to output page
35. merge?
[input page 1: 3 4 8 | input page 2: 5 6 7 | output page: 1 2]
compare the next smallest value from each page, move smallest to output page
36. merge?
[input page 1: 4 8 | input page 2: 5 6 7 | output page: 1 2 3]
compare the next smallest value from each page, move smallest to output page
37. merge?
[input page 1: 8 | input page 2: 5 6 7 | output page: 1 2 3 4]
the output page is full, what do we do?
write it to disk!
38. merge?
[input page 1: 8 | input page 2: 5 6 7 | output page: empty (flushed)]
compare the next smallest value from each page, move smallest to output page
39. merge?
[input page 1: 8 | input page 2: 6 7 | output page: 5]
compare the next smallest value from each page, move smallest to output page
40. merge?
[input page 1: 8 | input page 2: 7 | output page: 5 6]
compare the next smallest value from each page, move smallest to output page
41. merge?
[input page 1: 8 | input page 2: empty | output page: 5 6 7]
an input page is empty! what do we do?
if the input run has more pages, load the next one
42. merge?
[input page 1: empty | input page 2: empty | output page: 5 6 7 8]
43. merge?
[input page 1: empty | input page 2: empty | output page: empty; both runs consumed, the merged run is on disk]
44. sorting with 3 buffer pages - second phase
pass 1,2,...: halve the runs
after ⌈log2 N⌉ passes... we are done!
[figure: runs merged pairwise through the buffer (input page 1, input page 2, output page) until a single sorted run remains; output: sorted R by f(r), N pages stored on disk]
45. sorting with B buffer pages
same approach
[figure: input: relation R, N pages on disk → buffer of B pages: page #1, page #2, …, page #B → output: sorted R by f(r), N pages on disk]
46. sorting with B buffer pages - first phase
pass 0: output ⌈N/B⌉ runs
how: load R into memory in chunks of B pages, sort in-memory, output to disk
[figure: input: relation R, N pages on disk → B buffer pages → output: runs #1, #2, …, #⌈N/B⌉]
47. sorting with B buffer pages - second phase
pass 1,2,...: merge runs in groups of B-1
[figure: runs #1 … #⌈N/B⌉ → buffer: input pages #1 … #(B-1), one output page → output: sorted R by f(r)]
48. sorting with B buffer pages
how many passes in total?
let N1 = ⌈N/B⌉ (the number of runs after phase 1)
total number of passes = 1 + ⌈log_{B-1}(N1)⌉
(phase 1 contributes the 1; phase 2 contributes the merge passes)
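The formula can be checked with a small pass-counting sketch: pass 0 leaves ⌈N/B⌉ runs, and each later pass is a (B-1)-way merge.

```python
def total_passes(N, B):
    """Number of passes of external merge sort over N pages with B buffers."""
    runs = -(-N // B)              # ceil(N/B): runs after pass 0
    passes = 1
    while runs > 1:
        runs = -(-runs // (B - 1)) # each pass merges runs in groups of B-1
        passes += 1
    return passes

total_passes(10**6, 100)   # → 4: pass 0 leaves 10,000 runs, then 3 merge passes
```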
49. sorting with B buffer
how many page I/Os per pass?
2N: N pages read (input) + N pages written (output)
51. joins
so far, we have seen queries that
operate on a single relation
but we can also have queries that
combine information
from two or more relations
52. joins
sid name username age
53666 Sam Jones jones 22
53688 Alice Smith smith 22
53650 Jon Edwards jon 23
students
sid points grade
53666 92 A
53650 65 C
dbcourse
SELECT *
FROM students S, dbcourse C
WHERE S.sid = C.sid
what does this compute?
S.sid S.name S.username S.age C.sid C.points C.grade
53666 Sam Jones jones 22 53666 92 A
53650 Jon Edwards jon 23 53650 65 C
53. joins
SELECT *
FROM students S, dbcourse C
WHERE S.sid = C.sid
S record #1 C record #1
S record #1 C record #2
S record #1 C record #3
... ...
S record #2 C record #1
S record #2 C record #2
S record #2 C record #3
... ...
intuitively...
take all pairs of records from S and C
(the “cross product” S x C)
keep only records that
satisfy WHERE condition
54. joins
SELECT *
FROM students S, dbcourse C
WHERE S.sid = C.sid
S record #1 C record #1
S record #1 C record #2
S record #1 C record #3
... ...
S record #2 C record #1
S record #2 C record #2
S record #2 C record #3
... ...
keep only records that
satisfy WHERE condition
intuitively...
take all pairs of records from S and C
(the “cross product” S x C)
output join result
S ⋈_{S.sid = C.sid} C
55. joins
SELECT *
FROM students S, dbcourse C
WHERE S.sid = C.sid
keep only records that
satisfy WHERE condition
intuitively...
take all pairs of records from S and C
(the “cross product” S x C) S record #1 C record #2
S record #2 C record #1
output join result
S ⋈_{S.sid = C.sid} C
expensive to
materialize!
56. in what follows...
algorithms to compute joins
without materializing cross product
SELECT *
FROM students S, dbcourse C
WHERE S.sid = C.sid
assuming WHERE condition is equality condition
as in the example
assumption is not essential, though
58. the join problem
input
relation R: M pages on disk, pR records per page
relation S: N pages on disk, pS records per page
M ≤ N
output
R ⋈_{R.a = S.b} S
59. if there is enough memory...
load both relations in memory
R
S
M pages
N pages
output pages
in-memory
for each record r in R
for each record s in S
if r.a = s.b:
store (r, s) in output pages
output (not necessarily to disk)
I/O cost (ignoring final output cost): M + N pages
we only have to scan each relation once
60. page-oriented simple nested loops join
1 input page
output page
join using 3 memory (buffer) pages
R
S
1 input page
for each page P of R
for each page Q of S
compute P join Q; store in output page
output
I/O cost (pages)
M + M * N
outer
relation
inner
relation
R is scanned once
S is scanned M times
61. block nested loops join
B - 2 input pages
output page
join using B memory (buffer) pages
R
S
1 input page
for each block P of (B-2) pages of R
for each page Q of S
compute P join Q; store in output page
output
I/O cost (pages)
M + ⌈M/(B-2)⌉ × N
outer relation R is scanned once
inner relation S is scanned ⌈M/(B-2)⌉ times
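An in-memory sketch of the loop above; pages are modelled as lists of (key, value) records. A real implementation reads pages from disk, and indexing the R block with a small hash table (as done here) is a common refinement of the plain page-by-page comparison:

```python
def block_nested_loops_join(R, S, B):
    """Join pages of R and S on equal keys, taking B-2 pages of R at a time."""
    out, blk = [], B - 2
    for i in range(0, len(R), blk):
        # load one block of B-2 pages of R and index it by join key
        lookup = {}
        for page in R[i:i + blk]:
            for k, v in page:
                lookup.setdefault(k, []).append(v)
        for page in S:               # scan all of S once per block
            for k, w in page:
                for v in lookup.get(k, []):
                    out.append((k, v, w))
    return out

R = [[(1, 'a'), (2, 'b')], [(3, 'c')]]
S = [[(2, 'x'), (3, 'y')]]
block_nested_loops_join(R, S, B=4)   # → [(2, 'b', 'x'), (3, 'c', 'y')]
```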
62. index nested loop join
relation S has an index on the join attribute
use one page to make a pass over R
use index to retrieve only matching records of S
1 input page
output page
R
S
1 input page
output
outer
relation
inner
relation
for each record r of R
for each record s of S with s.b = r.a // query index
add (r,s) to output
63. index nested loop join
cost
M: to scan R
(M·pR) × (cost of query on S): M·pR is the number of records of R; the query cost was covered in previous lectures
total: M + M·pR × (cost of query on S)
64. sort-merge join
two phases
sort and merge
sort
R and S on the join attribute
using external sort algorithm
cost O(N log N), N: number of relation pages
merge sorted R and S
66. sort-merge join: merge
[figure: current pages in memory (only join attributes are shown), R.a: 1 2 6 7 8 9, S.b: 3 4 5 6 9 10; pointers r and s scan the two columns]
main loop
repeat while r != s:
advance r until r >= s
advance s until s >= r
output (r, s)
advance r and s
assumption: b is a key for S
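The main loop can be sketched over two in-memory lists of join-attribute values; like the slide's loop, it assumes b is a key for S and advances both pointers on a match:

```python
def merge_join(Ra, Sb):
    """Merge phase of sort-merge join, on already-sorted attribute lists."""
    out, i, j = [], 0, 0
    while i < len(Ra) and j < len(Sb):
        if Ra[i] < Sb[j]:
            i += 1                      # advance r until r >= s
        elif Sb[j] < Ra[i]:
            j += 1                      # advance s until s >= r
        else:
            out.append((Ra[i], Sb[j]))  # r == s: output (r, s)
            i += 1; j += 1              # advance r and s
    return out

merge_join([1, 2, 6, 7, 8, 9], [3, 4, 5, 6, 9, 10])   # → [(6, 6), (9, 9)]
```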
(slides 67–76 step the pointers r and s through the same two columns, matching and outputting (6, 6) and then (9, 9) along the way)
77. sort-merge join: merge
new page for R!
[figure: current pages in memory, R.a: 13 14 15 17 19 23, S.b: 3 4 5 6 9 10]
(same main loop; assumption: b is a key for S)
78. sort-merge join: merge
[figure: same state as the previous slide]
cost for merge: M + N
79. sort-merge join: merge
cost for merge: M + N
what if the assumption (b is a key for S) does not hold?
81. sort-merge join: merge
[figure: the pages of sorted R and sorted S, with a region where R.a = 15 aligns with S.b = 15 and another where R.a = 20 aligns with S.b = 20]
join attribute is not a key on either relation
parts of R and S with the same join attribute value: must output the cross product of these parts
82. sort-merge join: merge
[figure: same picture of sorted R and sorted S]
join attribute is not a key on either relation
modify the algorithm to perform a join over these areas, e.g., page-oriented simple nested loops
for that algorithm, the worst case cost is M + M·N
83. hash join
two phases: partition and probe
partition: each relation into partitions, using the same hash function on the join attribute
probe: join the corresponding partitions
84. hash join - partition
[figure: scan R with 1 input page; hash h(R.a) into B-1 partition / output pages → B-1 partitions of R on disk]
B buffer pages available
scan R with 1 buffer page
hash into B-1 partitions on the join attribute
use B-1 pages to hold the partitions, flush when full or when the scan of R ends
85. hash join - partition
[figure: scan S with 1 input page; hash h(S.b) into B-1 partition / output pages → B-1 partitions of S on disk, alongside the B-1 partitions of R]
B buffer pages available
scan S with 1 buffer page
hash into B-1 partitions on the join attribute
use B-1 pages to hold the partitions, flush when full or when the scan of S ends
86. hash join - probe
[figure: k-th partition of R loaded into B-2 input pages; k-th partition of S scanned with 1 input page; 1 output page]
B buffer pages available
load the k-th partition of R into memory (assuming it fits in B-2 pages)
scan the k-th partition of S one page at a time; for each record of S, probe the partition of R for matching records; store matches in the output page; flush when full or done
the assumption holds when the size of each partition fits in B-2 pages: approximately M/(B-1) ≤ B-2, i.e., B > √M
variant: re-partition the partition of R in-memory with a second hash function h2, probe using h2
87. hash join
cost
partition phase
read and write each relation once
2M + 2N
probing phase
read each partition once
(final output cost ignored)
M + N
total
3M + 3N
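An in-memory sketch of the two phases; it is a simplification (real hash join writes the partitions to disk, which is where the 2M + 2N partitioning I/Os come from), and the build/probe uses a per-partition hash table on the R side:

```python
def hash_join(R, S, partitions=4):
    """Partition both inputs on the join key, then probe partition pairs."""
    part_R = [[] for _ in range(partitions)]
    part_S = [[] for _ in range(partitions)]
    for k, v in R:                       # partition phase:
        part_R[hash(k) % partitions].append((k, v))
    for k, w in S:                       # same hash function for both inputs
        part_S[hash(k) % partitions].append((k, w))
    out = []
    for pr, ps in zip(part_R, part_S):   # probe phase, partition by partition
        table = {}
        for k, v in pr:                  # build a hash table on the R side
            table.setdefault(k, []).append(v)
        for k, w in ps:                  # probe with each S record
            for v in table.get(k, []):
                out.append((k, v, w))
    return out

hash_join([(1, 'a'), (2, 'b')], [(2, 'x'), (3, 'y')])   # → [(2, 'b', 'x')]
```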
89. query optimization
once we submit a query
the dbms is responsible for
efficient computation
the same query can be
executed in many ways
each is an ‘execution plan’
or ‘query evaluation plan’
90. example
select *
from students
where sid = 100
execution plan: annotated relational algebra tree
[plan 1: σ_{sid = 100} (on-the-fly) on top of students (scan)]
access path: how we retrieve data from the relation: scan or index?
each operator is annotated with the algorithm it uses
another execution plan:
[plan 2: σ_{sid = 100} (query index stud_btree) on top of students (b+ tree index stud_btree on sid)]
91. example
select *
from students S, dbcourse C
where S.sid = C.sid
execution plan:
[plan 1: dbcourse C (scan) ⋈_{C.sid = S.sid} students S (scan), computed with block-nested-loops]
another execution plan:
[plan 2: dbcourse C (scan) ⋈_{C.sid = S.sid} students S (index stud_btree), computed with index-nested-loops]
92. which plan to choose?
dbms estimates cost for a number of execution plans
(not all possible plans, necessarily!)
the estimates follow the cost analysis
we presented earlier
dbms picks the execution plan with
minimum estimated cost
96. references
● “cowbook”, database management systems, by ramakrishnan and gehrke
● “elmasri”, fundamentals of database systems, elmasri and navathe
● other database textbooks
credits
some slides based on material from
database management systems, by ramakrishnan and gehrke
99. deleting a data entry
1. start at root, find leaf L of entry
2. remove the entry, if it exists
○ if L is at least half-full, done!
○ else
■ try to re-distribute, borrowing from a sibling
● adjacent node with the same parent as L
■ if that fails, merge L into the sibling
○ if a merge occurred, must delete the entry for L from the parent of L
merge could propagate to root
110. example b+ tree
[figure: root ⟨22⟩ with children ⟨5 13 17 20⟩ and ⟨30⟩; leaves 2* 3* 5* 7* 8* 14* 16* 17* 18* 20* 21* 22* 27* 29* 33* 34* 38* 39*]
during deletion of 24* -- different example
left child of root has many entries (full)
can redistribute entries of index nodes, rotating the splitting entry through the root
111. example b+ tree
[figure: after redistribution, root ⟨17⟩ with children ⟨5 13⟩ and ⟨20 22 30⟩; same leaves]
during deletion of 24* -- different example
114. linear hashing
dynamic hashing
uses overflow pages; no directory
splits a bucket in round-robin fashion
when an overflow occurs
M = 2^level: number of buckets at the beginning of the round
pointer next ∈ [0, M) points at the next bucket to split
buckets 0, …, next-1 have already been split; their ‘split-image’ buckets are appended after the original M buckets
115. linear hashing
to allocate entries, use
H0(key) = h(key) mod M, or
H1(key) = h(key) mod 2M
i.e., level or level+1 least significant bits of h(key)
to locate the bucket for a key:
first use H0(key)
if H0(key) is less than next, then it refers to a split bucket: use H1(key) to determine whether the entry is in the original bucket or its split image
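The H0 / H1 lookup rule can be sketched as (values chosen to match the example two slides below):

```python
def bucket_for(hkey, M, nxt):
    """Linear hashing lookup: M buckets at start of round, nxt = next pointer."""
    b = hkey % M            # H0(key)
    if b < nxt:             # bucket b was already split in this round:
        b = hkey % (2 * M)  # H1(key) picks the original or its split image
    return b

# M = 4, next = 1: bucket 0 has been split, its image is bucket 4 (100)
assert bucket_for(32, 4, 1) == 0   # 32 mod 8 = 0 → original bucket 000
assert bucket_for(44, 4, 1) == 4   # 44 mod 8 = 4 → split image 100
assert bucket_for(9, 4, 1) == 1    # 9 mod 4 = 1 >= next → H0 suffices
```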
116. linear hashing
in the middle of a round...
[figure: the hash file consists of, in order: buckets already split in this round (0 … next-1); the bucket to be split next; the remaining original buckets (the M buckets that existed at the beginning of this round form the range of H0); and the split image buckets created through splitting of other buckets in this round]
if H0(key) falls among the already-split buckets, we must use H1(key) to decide if the entry is in the split image bucket
is a directory necessary?
117. linear hashing inserts
insert
find bucket by applying H0 / H1 and
insert if there is space
if bucket to insert into is full:
add overflow page, insert data entry,
split next bucket and increment next
since buckets are split round-robin,
long overflow chains don’t develop!
118. example - insert h(r) = 43
on split, H1 is used to redistribute entries
[figure, before: M = 4, next = 0; primary pages: bucket 00: 32* 44* 36*, bucket 01: 9* 25* 5*, bucket 10: 14* 18* 10* 30*, bucket 11: 31* 35* 7* 11*; the paired H0 / H1 labels (00/000, 01/001, …) are for illustration only, they are not part of the actual linear hashing file]
[figure, after inserting 43* (binary 101011, H0 = 11): bucket 11 is full, so 43* goes to an overflow page; bucket next = 0 is split using H1: bucket 000 keeps 32*, split image 100 gets 44* 36*; next = 1]