2. ContentsContents
Static HashingStatic Hashing
• File OrganizationFile Organization
• Properties of the Hash FunctionProperties of the Hash Function
• Bucket OverflowBucket Overflow
• IndicesIndices
Dynamic HashingDynamic Hashing
• Underlying Data StructureUnderlying Data Structure
• Querying and UpdatingQuerying and Updating
ComparisonsComparisons
• Other types of hashingOther types of hashing
• Ordered Indexing vs. HashingOrdered Indexing vs. Hashing
3. Static HashingStatic Hashing
Hashing provides a means forHashing provides a means for
accessing data without the use of anaccessing data without the use of an
index structure.index structure.
Data is addressed on disk byData is addressed on disk by
computing a function on a searchcomputing a function on a search
key instead.key instead.
4. OrganizationOrganization
AA bucketbucket in a hash file is unit ofin a hash file is unit of
storage (typically a disk block) thatstorage (typically a disk block) that
can hold one or more records.can hold one or more records.
TheThe hash functionhash function, h, is a function, h, is a function
from the set of all search-keys, K, tofrom the set of all search-keys, K, to
the set of all bucket addresses, B.the set of all bucket addresses, B.
Insertion, deletion, and lookup areInsertion, deletion, and lookup are
done in constant time.done in constant time.
5. Querying and UpdatesQuerying and Updates
To insert a record into the structureTo insert a record into the structure
compute the hash value h(Kcompute the hash value h(Kii), and), and
place the record in the bucketplace the record in the bucket
address returned.address returned.
For lookup operations, compute theFor lookup operations, compute the
hash value as above and search eachhash value as above and search each
record in the bucket for the specificrecord in the bucket for the specific
record.record.
To delete simply lookup and remove.To delete simply lookup and remove.
6. Properties of the Hash FunctionProperties of the Hash Function
The distribution should be uniform.The distribution should be uniform.
• An ideal hash function should assign theAn ideal hash function should assign the
same number of records in each bucket.same number of records in each bucket.
The distribution should be random.The distribution should be random.
• Regardless of the actual search-keys,Regardless of the actual search-keys,
the each bucket has the same numberthe each bucket has the same number
of records on averageof records on average
• Hash values should not depend on anyHash values should not depend on any
ordering or the search-keysordering or the search-keys
7. Bucket OverflowBucket Overflow
How does bucket overflow occur?How does bucket overflow occur?
• Not enough buckets to handle dataNot enough buckets to handle data
• A few buckets have considerably moreA few buckets have considerably more
records then others. This is referred torecords then others. This is referred to
as skew.as skew.
Multiple records have the same hash valueMultiple records have the same hash value
Non-uniform hash function distribution.Non-uniform hash function distribution.
8. SolutionsSolutions
Provide more buckets then areProvide more buckets then are
needed.needed.
Overflow chainingOverflow chaining
• If a bucket is full, link another bucket toIf a bucket is full, link another bucket to
it. Repeat as necessary.it. Repeat as necessary.
• The system must then check overflowThe system must then check overflow
buckets for querying and updates. Thisbuckets for querying and updates. This
is known asis known as closed hashingclosed hashing..
9. AlternativesAlternatives
Open hashingOpen hashing
• The number of buckets is fixedThe number of buckets is fixed
• Overflow is handled by using the nextOverflow is handled by using the next
bucket in cyclic order that has space.bucket in cyclic order that has space.
This is known asThis is known as linear probinglinear probing..
Compute more hash functions.Compute more hash functions.
Note: Closed hashing is preferred inNote: Closed hashing is preferred in
database systems.database systems.
10. IndicesIndices
AA hash indexhash index organizes the searchorganizes the search
keys, with their pointers, into a hashkeys, with their pointers, into a hash
file.file.
Hash indices never primary evenHash indices never primary even
though they provide direct access.though they provide direct access.
12. Dynamic HashingDynamic Hashing
More effective then static hashingMore effective then static hashing
when the database grows or shrinkswhen the database grows or shrinks
Extendable hashingExtendable hashing splits andsplits and
coalesces buckets appropriately withcoalesces buckets appropriately with
the database size.the database size.
• i.e. buckets are added and deleted oni.e. buckets are added and deleted on
demand.demand.
13. The Hash FunctionThe Hash Function
Typically produces a large number ofTypically produces a large number of
values, uniformly and randomly.values, uniformly and randomly.
Only part of the value is usedOnly part of the value is used
depending on the size of thedepending on the size of the
database.database.
14. Data StructureData Structure
Hash indices are typically a prefix ofHash indices are typically a prefix of
the entire hash value.the entire hash value.
More then one consecutive index canMore then one consecutive index can
point to the same bucket.point to the same bucket.
• The indices have the same hash prefixThe indices have the same hash prefix
which can be shorter then the length ofwhich can be shorter then the length of
the index.the index.
16. Queries and UpdatesQueries and Updates
LookupLookup
• Take the first i bits of the hash value.Take the first i bits of the hash value.
• Following the corresponding entry in theFollowing the corresponding entry in the
bucket address table.bucket address table.
• Look in the bucket.Look in the bucket.
17. Queries and Updates (Cont’d)Queries and Updates (Cont’d)
InsertionInsertion
• Follow lookup procedureFollow lookup procedure
• If the bucket has space, add the record.If the bucket has space, add the record.
• If not…If not…
18. Insertion (Cont’d)Insertion (Cont’d)
Case 1: i = iCase 1: i = ijj
• Use an additional bit in the hash valueUse an additional bit in the hash value
This doubles the size of the bucket address table.This doubles the size of the bucket address table.
Makes two entries in the table point to the fullMakes two entries in the table point to the full
bucket.bucket.
• Allocate a new bucket, z.Allocate a new bucket, z.
Set iSet ijj and iand izz to ito i
Point the second entry to the new bucketPoint the second entry to the new bucket
Rehash the old bucketRehash the old bucket
• Repeat insertion attemptRepeat insertion attempt
19. Insertion (Cont’d)Insertion (Cont’d)
Case 2: i > iCase 2: i > ijj
• Allocate a new bucket, zAllocate a new bucket, z
• Add 1 to iAdd 1 to ijj, set, set iijj andand iizz to this new valueto this new value
• Put half of the entries in the first bucketPut half of the entries in the first bucket
and half in the otherand half in the other
• Rehash records in bucket jRehash records in bucket j
• Reattempt insertionReattempt insertion
20. Insertion (Finally)Insertion (Finally)
If all the records in the bucket haveIf all the records in the bucket have
the same search value, simply usethe same search value, simply use
overflow buckets as seen in staticoverflow buckets as seen in static
hashing.hashing.
21. Use of Extendable HashUse of Extendable Hash
Structure: ExampleStructure: Example
Initial Hash structure, bucket size = 2
22. Example (Cont.)Example (Cont.)
Hash structure after insertion ofHash structure after insertion of
one Brighton and two Downtownone Brighton and two Downtown
recordsrecords
25. Example (Cont.)Example (Cont.)
Hash structure after insertion ofHash structure after insertion of
Redwood and Round Hill recordsRedwood and Round Hill records
26. Comparison to Other HashingComparison to Other Hashing
MethodsMethods
Advantage: performance does notAdvantage: performance does not
decrease as the database sizedecrease as the database size
increasesincreases
• Space is conserved by adding andSpace is conserved by adding and
removing as necessaryremoving as necessary
Disadvantage: additional level ofDisadvantage: additional level of
indirection for operationsindirection for operations
• Complex implementationComplex implementation
27. Ordered Indexing vs. HashingOrdered Indexing vs. Hashing
Hashing is less efficient if queries toHashing is less efficient if queries to
the database include ranges asthe database include ranges as
opposed to specific values.opposed to specific values.
In cases where ranges are infrequentIn cases where ranges are infrequent
hashing provides faster insertion,hashing provides faster insertion,
deletion, and lookup then ordereddeletion, and lookup then ordered
indexing.indexing.