SlideShare a Scribd company logo
1 of 25
Dictionaries
 the dictionary ADT
 binary search
 Hashing
Dictionaries
 Dictionaries store elements so that they
  can be located quickly using keys
 A dictionary may hold bank accounts
   each  account is an object that is identified by
    an account number
   each account stores a wealth of additional
    information
      including the current balance,
      the name and address of the account holder, and
      the history of deposits and withdrawals performed
   an application wishing to operate on an
    account would have to provide the account
    number as a search key
The Dictionary ADT
 A dictionary is an abstract model of a database
    A dictionary stores key-element pairs
    The main operation supported by a dictionary
     is searching by key
 simple container methods: size(), isEmpty(),
  elements()
 query methods: findElem(k), findAllElem(k)
 update methods: insertItem(k,e), removeElem(k),
  removeAllElem(k)
 special element: NIL, returned by an
  unsuccessful search
The Dictionary ADT

 Supporting  order (methods min, max,
 successor, predecessor ) is not required,
 thus it is enough that keys are
 comparable for equality
The Dictionary ADT
 Different data structures to realize dictionaries
    arrays, linked lists (inefficient)
    Hash table (used in Java...)
    Binary trees
    Red/Black trees
    AVL trees
    B-trees
 In Java:
    java.util.Dictionary – abstract class
    java.util.Map – interface
Searching
INPUT                       OUTPUT
• sequence of numbers       • index of the found
(database)                  number or NIL
• a single number (query)

 a1, a2, a3,….,an; q            j

  2   5   4   10   7;   5      2

  2   5   4   10   7;   9      NIL
Binary Search
 Idea: Divide and conquer, a key design technique
 narrow down the search range in stages
 findElement(22)
A recursive procedure
Algorithm BinarySearch(A, k, low, high)
if low > high then return Nil
else mid       (low+high) / 2
       if k = A[mid] then return mid
       elseif k < A[mid] then
             return BinarySearch(A, k, low, mid-1)
       else return BinarySearch(A, k, mid+1, high)
     2    4   5   7   8   9   12   14    17    19   22   25    27   28   33    37

    low                            mid                                        high

     2    4   5   7   8   9   12   14    17    19   22   25    27   28   33    37

                                         low             mid                  high

     2    4   5   7   8   9   12   14    17    19 22     25    27   28   33    37

                                         low mid
An iterative procedure
INPUT: A[1..n] – a sorted (non-decreasing) array of
  integers, key – an integer.
OUTPUT: an index j such that A[j] = k.
          NIL, if j (1 j n): A[j] k
                              2    4   5   7   8   9   12   14    17    19   22   25    27   28   33    37

low 1                        low                            mid                                        high


high     n                    2    4   5   7   8   9   12   14    17    19   22   25    27   28   33    37



do                            2    4   5   7   8   9   12   14
                                                                  low

                                                                  17    19 22
                                                                                  mid

                                                                                  25    27   28   33
                                                                                                       high

                                                                                                        37


   mid     (low+high)/2                                           low mid


   if A[mid] = k then return mid
   else if A[mid] > k then high   mid-1
                      else low   mid+1
while low <= high
return NIL
Running Time of Binary Search
   The range of candidate items to be searched is
    halved after each comparison




   In the array-based implementation, access by
    rank takes O(1) time, thus binary search runs in
    O(log n) time
Searching in an unsorted array
INPUT: A[1..n] – an array of integers, q – an integer.
OUTPUT: an index j such that A[j] = q. NIL, if j (1 j n):
  A[j] q
j   1
while j n and A[j] q
   do j++
if j n then return j
        else return NIL


 Worst-case running time: O(n), average-case:
  O(n)
 We can’t do better. This is a lower bound for the
  problem of searching in an arbitrary sequence.
The Problem

T&T is a large phone company, and they
  want to provide caller ID capability:
 given a phone number, return the caller’s
  name
 phone numbers range from 0 to r = 108 -1
 There are n phone numbers, n << r.
 want to do this as efficiently as possible
Using an unordered sequence




 searching  and removing takes O(n) time
 inserting takes O(1) time
 applications to log files (frequent
  insertions, rare searches and removals)
Using an ordered sequence




 searching   takes O(log n) time (binary
  search)
 inserting and removing takes O(n) time
 application to look-up tables (frequent
  searches, rare insertions and removals)
Other Suboptimal ways
direct addressing: an array indexed by key:
 takes O(1) time,
 O(r) space where r is the range of numbers
  (108)
 huge amount of wasted space


     (null)    (null)    Ankur     (null)    (null)
   0000-0000 0000-0000 9635-8904 0000-0000 0000-0000
Another Solution
 Can do better, with a Hash table -- O(1) expected
  time, O(n+m) space, where m is table size
 Like an array, but come up with a function to map
  the large range into one which we can manage
    e.g., take the original key, modulo the (relatively
     small) size of the array, and use that as an index
    Insert (9635-8904, Ankur) into a hashed array
     with, say, five slots. 96358904 mod 5 = 4

        (null)   (null)    (null)   (null)   Ankur
          0        1         2        3        4
An Example
 Let keys be entry no’s of students in
  CSL201. eg. 2004CS10110.
 There are 100 students in the class. We
  create a hash table of size, say 100.
 Hash function is, say, last two digits.
 Then 2004CS10110 goes to location 10.
 Where does 2004CS50310 go?
 Also to location 10. We have a collision!!
Collision Resolution
 How to deal with two keys which hash to the
  same spot in the array?
 Use chaining
    Set up an array of links (a table), indexed by
     the keys, to lists of items with the same key




     Most
         efficient (time-wise) collision resolution
     scheme
Collision resolution (2)
   To find/insert/delete an element
      using h, look up its position in table T
      Search/insert/delete the element in the
       linked list of the hashed slot
Analysis of Hashing
 An  element with key k is stored in slot
  h(k) (instead of slot k without hashing)
 The hash function h maps the universe U
  of keys into the slots of hash table
  T[0...m-1]
       h :U     0,1,..., m 1
 Assume      time to compute h(k) is   (1)
Analysis of Hashing(2)
 An good hash function is one which distributes
  keys evenly amongst the slots.
 An ideal hash function would pick a
  slot, uniformly at random and hash the key to it.
 However, this is not a hash function since we
  would not know which slot to look into when
  searching for a key.
 For our analysis we will use this simple uniform
  hash function
 Given hash table T with m slots holding n
  elements, the load factor is defined as =n/m
Analysis of Hashing(3)
Unsuccessful search
 element is not in the linked list
 Simple uniform hashing yields an average
  list length = n/m
 expected number of elements to be
  examined
 search time O(1+ ) (includes computing
  the hash value)
Analysis of Hashing (4)

Successful search
 assume that a new element is inserted at the
  end of the linked list
 upon insertion of the i-th element, the expected
  length of the list is (i-1)/m
 in case of a successful search, the expected
  number of elements examined is 1 more that the
  number of elements examined when the sought-
  for element was inserted!
Analysis of Hashing (5)
   The expected number of elements examined is
    thus      1
                 1
                   n
                   i 1
                        1
                           1
                              i 1
                                       n


               n   i 1   m        nm   i 1

                                   1 n 1 n
                              1
                                  nm     2
                                  n 1
                               1
                                  2m
                                   n   1
                               1
                                  2m 2m
                                     1
                             1
                                 2 2m

   Considering the time for computing the hash
    function, we obtain
                   (2    / 2 1/ 2m)          (1   )
Analysis of Hashing (6)
Assuming the number of hash table slots is
  proportional to the number of elements in
  the table
 n=O(m)
 = n/m = O(m)/m = O(1)
 searching takes constant time on average
 insertion takes O(1) worst-case time
 deletion takes O(1) worst-case time when
  the lists are doubly-linked

More Related Content

What's hot (20)

Lec8
Lec8Lec8
Lec8
 
Lec9
Lec9Lec9
Lec9
 
Lec3
Lec3Lec3
Lec3
 
Lec5
Lec5Lec5
Lec5
 
Lec2
Lec2Lec2
Lec2
 
Lec11
Lec11Lec11
Lec11
 
Interpolation search
Interpolation searchInterpolation search
Interpolation search
 
Greedy algorithm activity selection fractional
Greedy algorithm activity selection fractionalGreedy algorithm activity selection fractional
Greedy algorithm activity selection fractional
 
Functors, Applicatives and Monads In Scala
Functors, Applicatives and Monads In ScalaFunctors, Applicatives and Monads In Scala
Functors, Applicatives and Monads In Scala
 
Pandas
PandasPandas
Pandas
 
skip list
skip listskip list
skip list
 
Stacks overview with its applications
Stacks overview with its applicationsStacks overview with its applications
Stacks overview with its applications
 
Linear and Binary search
Linear and Binary searchLinear and Binary search
Linear and Binary search
 
how to calclute time complexity of algortihm
how to calclute time complexity of algortihmhow to calclute time complexity of algortihm
how to calclute time complexity of algortihm
 
LIST IN PYTHON
LIST IN PYTHONLIST IN PYTHON
LIST IN PYTHON
 
Lec20
Lec20Lec20
Lec20
 
Managing input and output operation in c
Managing input and output operation in cManaging input and output operation in c
Managing input and output operation in c
 
4.4 hashing
4.4 hashing4.4 hashing
4.4 hashing
 
SINGLE SOURCE SHORTEST PATH.ppt
SINGLE SOURCE SHORTEST PATH.pptSINGLE SOURCE SHORTEST PATH.ppt
SINGLE SOURCE SHORTEST PATH.ppt
 
08 Hash Tables
08 Hash Tables08 Hash Tables
08 Hash Tables
 

Viewers also liked (20)

国内移动广告行业研究
国内移动广告行业研究国内移动广告行业研究
国内移动广告行业研究
 
ChapmanCreative Porfolio
ChapmanCreative PorfolioChapmanCreative Porfolio
ChapmanCreative Porfolio
 
App storeとandroid market
App storeとandroid marketApp storeとandroid market
App storeとandroid market
 
Graduacion 8vo (clase 2010)
Graduacion 8vo (clase 2010)Graduacion 8vo (clase 2010)
Graduacion 8vo (clase 2010)
 
Turnaround Highlights
Turnaround HighlightsTurnaround Highlights
Turnaround Highlights
 
Maine Tobacco Control Timeline, January 2009-July 2011 (Updated July 24, 2011)
Maine Tobacco Control Timeline, January 2009-July 2011 (Updated July 24, 2011)Maine Tobacco Control Timeline, January 2009-July 2011 (Updated July 24, 2011)
Maine Tobacco Control Timeline, January 2009-July 2011 (Updated July 24, 2011)
 
Printemps au japon
Printemps au japonPrintemps au japon
Printemps au japon
 
Costing System example מערכת תמחיר - דוגמא
Costing System example      מערכת תמחיר - דוגמאCosting System example      מערכת תמחיר - דוגמא
Costing System example מערכת תמחיר - דוגמא
 
Australia grande barriera corallina
Australia   grande barriera corallinaAustralia   grande barriera corallina
Australia grande barriera corallina
 
삼성핵심가치위반사례
삼성핵심가치위반사례삼성핵심가치위반사례
삼성핵심가치위반사례
 
Time
TimeTime
Time
 
κινα νο 3
κινα   νο 3κινα   νο 3
κινα νο 3
 
Piano
PianoPiano
Piano
 
The Google+ Project
The Google+ ProjectThe Google+ Project
The Google+ Project
 
American deserts
American desertsAmerican deserts
American deserts
 
Mass Innovation Nights
Mass Innovation NightsMass Innovation Nights
Mass Innovation Nights
 
Abctest
AbctestAbctest
Abctest
 
Pamukkale gr
Pamukkale grPamukkale gr
Pamukkale gr
 
Fund for a Healthy Maine PPT, July 2011
Fund for a Healthy Maine PPT, July 2011Fund for a Healthy Maine PPT, July 2011
Fund for a Healthy Maine PPT, July 2011
 
Měřidlo chodidel prezentace
Měřidlo chodidel   prezentaceMěřidlo chodidel   prezentace
Měřidlo chodidel prezentace
 

Similar to Lec4

chapter1.pdf ......................................
chapter1.pdf ......................................chapter1.pdf ......................................
chapter1.pdf ......................................nourhandardeer3
 
1_Asymptotic_Notation_pptx.pptx
1_Asymptotic_Notation_pptx.pptx1_Asymptotic_Notation_pptx.pptx
1_Asymptotic_Notation_pptx.pptxpallavidhade2
 
Jurnal informatika
Jurnal informatika Jurnal informatika
Jurnal informatika MamaMa28
 
Joc live session
Joc live sessionJoc live session
Joc live sessionSIT Tumkur
 
DS Unit 1.pptx
DS Unit 1.pptxDS Unit 1.pptx
DS Unit 1.pptxchin463670
 
Analysis and design of algorithms part2
Analysis and design of algorithms part2Analysis and design of algorithms part2
Analysis and design of algorithms part2Deepak John
 
pradeepbishtLecture13 div conq
pradeepbishtLecture13 div conqpradeepbishtLecture13 div conq
pradeepbishtLecture13 div conqPradeep Bisht
 
Analysis Of Algorithms - Hashing
Analysis Of Algorithms - HashingAnalysis Of Algorithms - Hashing
Analysis Of Algorithms - HashingSam Light
 
Advanced data structure
Advanced data structureAdvanced data structure
Advanced data structureShakil Ahmed
 
An Experiment to Determine and Compare Practical Efficiency of Insertion Sort...
An Experiment to Determine and Compare Practical Efficiency of Insertion Sort...An Experiment to Determine and Compare Practical Efficiency of Insertion Sort...
An Experiment to Determine and Compare Practical Efficiency of Insertion Sort...Tosin Amuda
 

Similar to Lec4 (20)

Lec4
Lec4Lec4
Lec4
 
chapter1.pdf ......................................
chapter1.pdf ......................................chapter1.pdf ......................................
chapter1.pdf ......................................
 
1_Asymptotic_Notation_pptx.pptx
1_Asymptotic_Notation_pptx.pptx1_Asymptotic_Notation_pptx.pptx
1_Asymptotic_Notation_pptx.pptx
 
Jurnal informatika
Jurnal informatika Jurnal informatika
Jurnal informatika
 
algorithm Unit 2
algorithm Unit 2 algorithm Unit 2
algorithm Unit 2
 
Unit 2 in daa
Unit 2 in daaUnit 2 in daa
Unit 2 in daa
 
Binsort
BinsortBinsort
Binsort
 
Joc live session
Joc live sessionJoc live session
Joc live session
 
Sorting2
Sorting2Sorting2
Sorting2
 
Unit 8 searching and hashing
Unit   8 searching and hashingUnit   8 searching and hashing
Unit 8 searching and hashing
 
DS Unit 1.pptx
DS Unit 1.pptxDS Unit 1.pptx
DS Unit 1.pptx
 
Analysis and design of algorithms part2
Analysis and design of algorithms part2Analysis and design of algorithms part2
Analysis and design of algorithms part2
 
Sorting
SortingSorting
Sorting
 
sorting
sortingsorting
sorting
 
pradeepbishtLecture13 div conq
pradeepbishtLecture13 div conqpradeepbishtLecture13 div conq
pradeepbishtLecture13 div conq
 
Q
QQ
Q
 
Analysis Of Algorithms - Hashing
Analysis Of Algorithms - HashingAnalysis Of Algorithms - Hashing
Analysis Of Algorithms - Hashing
 
Advanced data structure
Advanced data structureAdvanced data structure
Advanced data structure
 
Hashing .pptx
Hashing .pptxHashing .pptx
Hashing .pptx
 
An Experiment to Determine and Compare Practical Efficiency of Insertion Sort...
An Experiment to Determine and Compare Practical Efficiency of Insertion Sort...An Experiment to Determine and Compare Practical Efficiency of Insertion Sort...
An Experiment to Determine and Compare Practical Efficiency of Insertion Sort...
 

More from Anjneya Varshney (6)

Lec23
Lec23Lec23
Lec23
 
Lec22
Lec22Lec22
Lec22
 
Lec21
Lec21Lec21
Lec21
 
Lec19
Lec19Lec19
Lec19
 
Lec16
Lec16Lec16
Lec16
 
Lec13
Lec13Lec13
Lec13
 

Lec4

  • 1. Dictionaries  the dictionary ADT  binary search  Hashing
  • 2. Dictionaries  Dictionaries store elements so that they can be located quickly using keys  A dictionary may hold bank accounts  each account is an object that is identified by an account number  each account stores a wealth of additional information  including the current balance,  the name and address of the account holder, and  the history of deposits and withdrawals performed  an application wishing to operate on an account would have to provide the account number as a search key
  • 3. The Dictionary ADT  A dictionary is an abstract model of a database  A dictionary stores key-element pairs  The main operation supported by a dictionary is searching by key  simple container methods: size(), isEmpty(), elements()  query methods: findElem(k), findAllElem(k)  update methods: insertItem(k,e), removeElem(k), removeAllElem(k)  special element: NIL, returned by an unsuccessful search
  • 4. The Dictionary ADT  Supporting order (methods min, max, successor, predecessor ) is not required, thus it is enough that keys are comparable for equality
  • 5. The Dictionary ADT  Different data structures to realize dictionaries  arrays, linked lists (inefficient)  Hash table (used in Java...)  Binary trees  Red/Black trees  AVL trees  B-trees  In Java:  java.util.Dictionary – abstract class  java.util.Map – interface
  • 6. Searching INPUT OUTPUT • sequence of numbers • index of the found (database) number or NIL • a single number (query) a1, a2, a3,….,an; q j 2 5 4 10 7; 5 2 2 5 4 10 7; 9 NIL
  • 7. Binary Search  Idea: Divide and conquer, a key design technique  narrow down the search range in stages  findElement(22)
  • 8. A recursive procedure Algorithm BinarySearch(A, k, low, high) if low > high then return Nil else mid (low+high) / 2 if k = A[mid] then return mid elseif k < A[mid] then return BinarySearch(A, k, low, mid-1) else return BinarySearch(A, k, mid+1, high) 2 4 5 7 8 9 12 14 17 19 22 25 27 28 33 37 low mid high 2 4 5 7 8 9 12 14 17 19 22 25 27 28 33 37 low mid high 2 4 5 7 8 9 12 14 17 19 22 25 27 28 33 37 low mid
  • 9. An iterative procedure INPUT: A[1..n] – a sorted (non-decreasing) array of integers, key – an integer. OUTPUT: an index j such that A[j] = k. NIL, if j (1 j n): A[j] k 2 4 5 7 8 9 12 14 17 19 22 25 27 28 33 37 low 1 low mid high high n 2 4 5 7 8 9 12 14 17 19 22 25 27 28 33 37 do 2 4 5 7 8 9 12 14 low 17 19 22 mid 25 27 28 33 high 37 mid (low+high)/2 low mid if A[mid] = k then return mid else if A[mid] > k then high mid-1 else low mid+1 while low <= high return NIL
  • 10. Running Time of Binary Search  The range of candidate items to be searched is halved after each comparison  In the array-based implementation, access by rank takes O(1) time, thus binary search runs in O(log n) time
  • 11. Searching in an unsorted array INPUT: A[1..n] – an array of integers, q – an integer. OUTPUT: an index j such that A[j] = q. NIL, if j (1 j n): A[j] q j 1 while j n and A[j] q do j++ if j n then return j else return NIL  Worst-case running time: O(n), average-case: O(n)  We can’t do better. This is a lower bound for the problem of searching in an arbitrary sequence.
  • 12. The Problem T&T is a large phone company, and they want to provide caller ID capability:  given a phone number, return the caller’s name  phone numbers range from 0 to r = 108 -1  There are n phone numbers, n << r.  want to do this as efficiently as possible
  • 13. Using an unordered sequence  searching and removing takes O(n) time  inserting takes O(1) time  applications to log files (frequent insertions, rare searches and removals)
  • 14. Using an ordered sequence  searching takes O(log n) time (binary search)  inserting and removing takes O(n) time  application to look-up tables (frequent searches, rare insertions and removals)
  • 15. Other Suboptimal ways direct addressing: an array indexed by key:  takes O(1) time,  O(r) space where r is the range of numbers (108)  huge amount of wasted space (null) (null) Ankur (null) (null) 0000-0000 0000-0000 9635-8904 0000-0000 0000-0000
  • 16. Another Solution  Can do better, with a Hash table -- O(1) expected time, O(n+m) space, where m is table size  Like an array, but come up with a function to map the large range into one which we can manage  e.g., take the original key, modulo the (relatively small) size of the array, and use that as an index  Insert (9635-8904, Ankur) into a hashed array with, say, five slots. 96358904 mod 5 = 4 (null) (null) (null) (null) Ankur 0 1 2 3 4
  • 17. An Example  Let keys be entry no’s of students in CSL201. eg. 2004CS10110.  There are 100 students in the class. We create a hash table of size, say 100.  Hash function is, say, last two digits.  Then 2004CS10110 goes to location 10.  Where does 2004CS50310 go?  Also to location 10. We have a collision!!
  • 18. Collision Resolution  How to deal with two keys which hash to the same spot in the array?  Use chaining  Set up an array of links (a table), indexed by the keys, to lists of items with the same key  Most efficient (time-wise) collision resolution scheme
  • 19. Collision resolution (2)  To find/insert/delete an element  using h, look up its position in table T  Search/insert/delete the element in the linked list of the hashed slot
  • 20. Analysis of Hashing  An element with key k is stored in slot h(k) (instead of slot k without hashing)  The hash function h maps the universe U of keys into the slots of hash table T[0...m-1] h :U 0,1,..., m 1  Assume time to compute h(k) is (1)
  • 21. Analysis of Hashing(2)  An good hash function is one which distributes keys evenly amongst the slots.  An ideal hash function would pick a slot, uniformly at random and hash the key to it.  However, this is not a hash function since we would not know which slot to look into when searching for a key.  For our analysis we will use this simple uniform hash function  Given hash table T with m slots holding n elements, the load factor is defined as =n/m
  • 22. Analysis of Hashing(3) Unsuccessful search  element is not in the linked list  Simple uniform hashing yields an average list length = n/m  expected number of elements to be examined  search time O(1+ ) (includes computing the hash value)
  • 23. Analysis of Hashing (4) Successful search  assume that a new element is inserted at the end of the linked list  upon insertion of the i-th element, the expected length of the list is (i-1)/m  in case of a successful search, the expected number of elements examined is 1 more that the number of elements examined when the sought- for element was inserted!
  • 24. Analysis of Hashing (5)  The expected number of elements examined is thus 1 1 n i 1 1 1 i 1 n n i 1 m nm i 1 1 n 1 n 1 nm 2 n 1 1 2m n 1 1 2m 2m 1 1 2 2m  Considering the time for computing the hash function, we obtain (2 / 2 1/ 2m) (1 )
  • 25. Analysis of Hashing (6) Assuming the number of hash table slots is proportional to the number of elements in the table  n=O(m)  = n/m = O(m)/m = O(1)  searching takes constant time on average  insertion takes O(1) worst-case time  deletion takes O(1) worst-case time when the lists are doubly-linked