CHAPTER 4

    Algorithm Analysis


    Algorithms are designed to solve problems, but a given problem can have many
    different solutions. How then are we to determine which solution is the most
    efficient for a given problem? One approach is to measure the execution time. We
    can implement the solution by constructing a computer program, using a given
    programming language. We then execute the program and time it using a wall
    clock or the computer's internal clock.
        The execution time is dependent on several factors. First, the amount of data
    that must be processed directly affects the execution time. As the data set size
    increases, so does the execution time. Second, the execution times can vary
    depending on the type of hardware and the time of day a computer is used. If
    we use a multi-process, multi-user system to execute the program, the execution
    of other programs on the same machine can directly affect the execution time of
    our program. Finally, the choice of programming language and compiler used to
    implement an algorithm can also influence the execution time. Some compilers
    are better optimizers than others and some languages produce better optimized
    code than others. Thus, we need a method to analyze an algorithm's efficiency
    independent of the implementation details.


4.1 Complexity Analysis
    To determine the efficiency of an algorithm, we can examine the solution itself and
    measure those aspects of the algorithm that most critically affect its execution time.
    For example, we can count the number of logical comparisons, data interchanges,
    or arithmetic operations. Consider the following algorithm for computing the sum
    of each row of an n × n matrix and an overall sum of the entire matrix:
        # matrix supports 2-D subscripting (e.g., an Array2D ADT) and
        # rowSum is a one-dimensional array with n entries.
        totalSum = 0                       # Version 1
        for i in range( n ) :
          rowSum[i] = 0
          for j in range( n ) :
            rowSum[i] = rowSum[i] + matrix[i,j]
            totalSum = totalSum + matrix[i,j]


    Suppose we want to analyze the algorithm based on the number of additions
performed. In this example, there are only two addition operations, making this a
simple task. The algorithm contains two loops, one nested inside the other. The
inner loop is executed n times and since it contains the two addition operations,
there are a total of 2n additions performed by the inner loop for each iteration
of the outer loop. The outer loop is also performed n times, for a total of 2n²
additions.
              Can we improve upon this algorithm to reduce the total number of addition
          operations performed? Consider a new version of the algorithm in which the second
          addition is moved out of the inner loop and modified to sum the entries in the
          rowSum array instead of individual elements of the matrix.

              totalSum = 0                       # Version 2
              for i in range( n ) :
                rowSum[i] = 0
                for j in range( n ) :
                  rowSum[i] = rowSum[i] + matrix[i,j]
                totalSum = totalSum + rowSum[i]


    In this version, the inner loop is again executed n times, but this time, it only
contains one addition operation. That gives a total of n additions for each iteration
of the outer loop, but the outer loop now contains an addition operator of its own.
To calculate the total number of additions for this version, we take the n additions
performed by the inner loop and add one for the addition performed at the bottom
of the outer loop. This gives n + 1 additions for each iteration of the outer loop,
which is performed n times for a total of n² + n additions.
    If we compare the two results, it's obvious the number of additions in the second
version is less than the first for any n greater than 1. Thus, the second version will
execute faster than the first, but the difference in execution times will not be
significant. The reason is that both algorithms execute on the same order of magnitude,
namely n². Thus, as the size of n increases, both algorithms increase at
approximately the same rate (though one is slightly better), as illustrated numerically in
Table 4.1 and graphically in Figure 4.1.


                         n              2n²            n² + n
                        10              200               110
                       100           20,000            10,100
                      1000        2,000,000         1,001,000
                     10000      200,000,000       100,010,000
                    100000   20,000,000,000    10,000,100,000

                 Table 4.1: Growth rate comparisons for different input sizes.




               Figure 4.1: Graphical comparison of the growth rates from Table 4.1.
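
    As a quick sanity check of these counts, we can instrument both versions and
tally the additions. This is a minimal sketch; here matrix is assumed to be a
plain list of lists rather than a 2-D array ADT:

    def count_additions( version, n ):
      matrix = [[1] * n for _ in range(n)]    # stand-in data
      rowSum = [0] * n
      totalSum = 0
      adds = 0                                # additions performed
      for i in range( n ) :
        rowSum[i] = 0
        for j in range( n ) :
          rowSum[i] = rowSum[i] + matrix[i][j]
          adds += 1
          if version == 1 :
            totalSum = totalSum + matrix[i][j]
            adds += 1
        if version == 2 :
          totalSum = totalSum + rowSum[i]
          adds += 1
      return adds

    assert count_additions( 1, 50 ) == 2 * 50 ** 2     # 2n^2
    assert count_additions( 2, 50 ) == 50 ** 2 + 50    # n^2 + n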


4.1.1 Big-O Notation
      Instead of counting the precise number of operations or steps, computer scientists
      are more interested in classifying an algorithm based on the order of magnitude
      as applied to execution time or space requirements. This classification approx-
      imates the actual number of required steps for execution or the actual storage
      requirements in terms of variable-sized data sets. The term big-O, which is de-
      rived from the expression “on the order of,” is used to specify an algorithm’s
      classification.

      Defining Big-O
      Assume we have a function T(n) that represents the approximate number of steps
      required by an algorithm for an input of size n. For the second version of our
      algorithm in the previous section, this would be written as

                                        T₂(n) = n² + n

      Now, suppose there exists a function f(n), defined for the integers n ≥ 0, such that
      for some constant c, and some constant m,

                                        T(n) ≤ c f(n)

      for all sufficiently large values of n ≥ m. Then, such an algorithm is said to have a
      time-complexity of, or executes on the order of, f(n) relative to the number of
      operations it requires. In other words, there is a positive integer m and a constant
      c (constant of proportionality) such that for all n ≥ m, T(n) ≤ c f(n). The

      function f(n) indicates the rate of growth at which the run time of an algorithm
      increases as the input size, n, increases. To specify the time-complexity of an
      algorithm, which runs on the order of f(n), we use the notation

                                        O( f(n) )

         Consider the two versions of our algorithm from earlier. For version one, the
      time was computed to be T₁(n) = 2n². If we let c = 2, then

                                        2n² ≤ 2n²

      for a result of O(n²). For version two, we computed a time of T₂(n) = n² + n.
      Again, if we let c = 2, then

                                        n² + n ≤ 2n²

      for a result of O(n²). In this case, the choice of c comes from the observation that
      when n ≥ 1, we have n ≤ n² and n² + n ≤ n² + n², which satisfies the equation in
      the definition of big-O.
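
      The inequality can also be checked numerically, as in this quick sketch using
      the constants c = 2 and m = 1 from above:

          # Verify T2(n) <= c * f(n) with f(n) = n**2, c = 2, for n >= m = 1.
          def T2( n ):
            return n ** 2 + n

          assert all( T2(n) <= 2 * n ** 2 for n in range(1, 100001) )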
          The function f(n) = n² is not the only choice for satisfying the condition
      T(n) ≤ c f(n). We could have said the algorithms had a run time of O(n³) or
      O(n⁴) since 2n² ≤ n³ and 2n² ≤ n⁴ when n > 1. The objective, however, is
      to find a function f(·) that provides the tightest (lowest) upper bound or limit
      for the run time of an algorithm. The big-O notation is intended to indicate an
      algorithm's efficiency for large values of n. There is usually little difference in the
      execution times of algorithms when n is small.


           Constant of Proportionality
           The constant of proportionality is only crucial when two algorithms have the same
           f (n). It usually makes no di↵erence when comparing algorithms whose growth
           rates are of di↵erent magnitudes. Suppose we have two algorithms, L1 and L2 ,
           with run times equal to n2 and 2n respectively. L1 has a time-complexity of O(n2 )
           with c = 1 and L2 has a time of O(n) with c = 2. Even though L1 has a smaller
           constant of proportionality, L1 is still slower and, in fact an order of magnitude
           slower, for large values of n. Thus, f (n) dominates the expression cf (n) and the run
           time performance of the algorithm. The di↵erences between the run times of these
           two algorithms is shown numerically in Table 4.2 and graphically in Figure 4.2.


           Constructing T(n)
           Instead of counting the number of logical comparisons or arithmetic operations, we
           evaluate an algorithm by considering every operation. For simplicity, we assume
           that each basic operation or statement, at the abstract level, takes the same amount
           of time and, thus, each is assumed to cost constant time. The total number of


                        n                n²           2n
                       10               100           20
                      100            10,000          200
                     1000         1,000,000        2,000
                    10000       100,000,000       20,000
                   100000    10,000,000,000      200,000

             Table 4.2: Numerical comparison of two sample algorithms.




            Figure 4.2: Graphical comparison of the data from Table 4.2.



operations required by an algorithm can be computed as a sum of the times required
to perform each step:


                        T(n) = f₁(n) + f₂(n) + ··· + fₖ(n).


    The steps requiring constant time are generally omitted since they eventually
become part of the constant of proportionality. Consider Figure 4.3(a), which
shows a markup of version one of the algorithm from earlier. The basic operations
are marked with a constant time while the loops are marked with the appropriate
total number of iterations. Figure 4.3(b) shows the same algorithm but with the
constant steps omitted since these operations are independent of the data set size.

    (a) All operations marked with the appropriate time:

        totalSum = 0                              # 1
        for i in range( n ) :                     # n iterations
          rowSum[i] = 0                           # 1
          for j in range( n ) :                   # n iterations
            rowSum[i] = rowSum[i] + matrix[i,j]   # 1
            totalSum = totalSum + matrix[i,j]     # 1

    (b) Only the non-constant time steps:

        for i in range( n ) :                     # n
          ...
          for j in range( n ) :                   # n
            ...

    Figure 4.3: Markup for version one of the matrix summing algorithm: (a) shows all
    operations marked with the appropriate time and (b) shows only the non-constant time steps.


           Choosing the Function
      The function f(n) used to categorize a particular algorithm is chosen to be the
      dominant term within T(n), that is, the term that grows so large for big values of
      n that we can ignore the other terms when computing a big-O value. For example,
      in the expression

                                  n² + log₂ n + 3n

      the term n² dominates the other terms since, for n ≥ 3, we have

                          n² + log₂ n + 3n ≤ n² + n² + n²
                          n² + log₂ n + 3n ≤ 3n²

      which leads to a time-complexity of O(n²). Now, consider the function T(n) =
      2n² + 15n + 500 and assume it is the polynomial that represents the exact number
      of instructions required to execute some algorithm. For small values of n (less
      than 16), the constant value 500 dominates the function, but what happens as n
      gets larger, say 100,000? The term n² becomes the dominant term, with the other
      two becoming less significant in computing the final result.
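
      A short sketch makes this shift in dominance concrete by printing each term's
      share of the total:

          # Fraction of T(n) = 2n^2 + 15n + 500 contributed by each term.
          for n in ( 10, 1000, 100000 ):
            total = 2 * n ** 2 + 15 * n + 500
            print( n, 2 * n ** 2 / total, 15 * n / total, 500 / total )

      For n = 10 the constant term contributes the largest share, but by n = 100,000
      the n² term accounts for nearly all of the total.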

           Classes of Algorithms
           We will work with many di↵erent algorithms in this text, but most will have a
           time-complexity selected from among a common set of functions, which are listed
           in Table 4.3 and illustrated graphically in Figure 4.4.
               Algorithms can be classified based on their big-O function. The various classes
           are commonly named based upon the dominant term. A logarithmic algorithm is


                          f(·)          Common Name
                          1             constant
                          log n         logarithmic
                          n             linear
                          n log n       log linear
                          n²            quadratic
                          n³            cubic
                          aⁿ            exponential

  Table 4.3: Common big-O functions listed from smallest to largest order of magnitude.




          Figure 4.4: Growth rates of the common time-complexity functions.


A logarithmic algorithm is any algorithm whose time-complexity is O(logₐ n).
These algorithms are generally very efficient since logₐ n will increase more slowly
than n. For many problems encountered in computer science a will typically equal
2, and thus we use the notation log n to imply log₂ n. Logarithms of other bases
will be explicitly stated. Polynomial algorithms, with an efficiency expressed as a
polynomial of the form

                    aₘnᵐ + aₘ₋₁nᵐ⁻¹ + ··· + a₂n² + a₁n + a₀

              are characterized by a time-complexity of O(nᵐ) since the dominant term is the
              highest power of n. The most common polynomial algorithms are linear (m = 1),
              quadratic (m = 2), and cubic (m = 3). An algorithm whose efficiency is
              characterized by a dominant term in the form aⁿ is called exponential. Exponential
              algorithms are among the worst algorithms in terms of time-complexity.

      4.1.2   Evaluating Python Code
              As indicated earlier, when evaluating the time complexity of an algorithm or code
              segment, we assume that basic operations only require constant time. But what
              exactly is a basic operation? The basic operations include statements and func-
              tion calls whose execution time does not depend on the specific values of the data
              that is used or manipulated by the given instruction. For example, the assignment
              statement
                  x = 5


              is a basic instruction since the time required to assign a reference to the given
              variable is independent of the value or type of object specified on the righthand
              side of the = sign. The evaluation of arithmetic and logical expressions
                  y = x
                  z = x + y * 6
                  done = x > 0 and x < 100


              are basic instructions, again since they require the same number of steps to perform
              the given operations regardless of the values of their operands. The subscript
              operator, when used with Python’s sequence types (strings, tuples, and lists) is
              also a basic instruction.
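
              We can observe this constant-time behavior empirically with a rough sketch
              (absolute timings vary by machine, but the two results should be close):

                  from timeit import timeit

                  small = list( range( 10 ) )
                  large = list( range( 10000000 ) )

                  # Subscripting costs about the same per call regardless of list size.
                  print( timeit( lambda: small[5], number = 1000000 ) )
                  print( timeit( lambda: large[5000000], number = 1000000 ) )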

              Linear Time Examples
              Now, consider the following assignment statement:
                  y = ex1(n)


              An assignment statement only requires constant time, but that is the time required
              to perform the actual assignment and does not include the time required to execute
              any function calls used on the righthand side of the assignment statement.
                  To determine the run time of the previous statement, we must know the cost of
              the function call ex1(n). The time required by a function call is the time it takes
              to execute the given function. For example, consider the ex1() function, which
              computes the sum of the integer values in the range [0 . . . n):
                  def ex1( n ):
                    total = 0
                    for i in range( n ) :
                      total += i
                    return total


      NOTE: Efficiency of String Operations. Most of the string operations have
      a time-complexity that is proportional to the length of the string. For
      most problems that do not involve string processing, string operations seldom
      have an impact on the run time of an algorithm. Thus, in the text, we
      assume the string operations, including the use of the print() function, only
      require constant time, unless explicitly stated otherwise.



    The time required to execute a loop depends on the number of iterations per-
formed and the time needed to execute the loop body during each iteration. In this
case, the loop will be executed n times and the loop body only requires constant
time since it contains a single basic instruction. (Note that the underlying mech-
anism of the for loop and the range() function are both O(1).) We can compute
the time required by the loop as T(n) = n · 1, for a result of O(n).
    But what about the other statements in the function? The first line of the
function and the return statement only require constant time. Remember, it’s
common to omit the steps that only require constant time and instead focus on
the critical operations, those that contribute to the overall time. In most instances,
this means we can limit our evaluation to repetition and selection statements and
function and method calls since those have the greatest impact on the overall time
of an algorithm. Since the loop is the only non-constant step, the function ex1()
has a run time of O(n). That means the statement y = ex1(n) from earlier requires
linear time. Next, consider the following function, which includes two for loops:

       def ex2( n ):
         count = 0
         for i in range( n ) :
           count += 1
         for j in range( n ) :
           count += 1
         return count


   To evaluate the function, we have to determine the time required by each loop.
The two loops each require O(n) time as they are just like the loop in function
ex1() earlier. If we combine the times, it yields T(n) = n + n = 2n, for a result of O(n).


Quadratic Time Examples
When presented with nested loops, such as in the following, the time required by
the inner loop impacts the time of the outer loop.

       def ex3( n ):
         count = 0
         for i in range( n ) :
           for j in range( n ) :
             count += 1
         return count
106   CHAPTER 4   Algorithm Analysis


                Both loops will be executed n times, but since the inner loop is nested inside the
            outer loop, the total time required by the outer loop will be T(n) = n · n, resulting in
            a time of O(n²) for the ex3() function. Not all nested loops result in a quadratic
            time. Consider the following function:

               def ex4( n ):
                 count = 0
                 for i in range( n ) :
                   for j in range( 25 ) :
                     count += 1
                 return count

            which has a time-complexity of O(n). The function contains a nested loop, but
            the inner loop executes independently of the size variable n. Since the inner loop
            executes a constant number of times, it is a constant time operation. The outer
            loop executes n times, resulting in a linear run time. The next example presents a
            special case of nested loops:

               def ex5( n ):
                 count = 0
                 for i in range( n ) :
                   for j in range( i+1 ) :
                     count += 1
                 return count

               How many times does the inner loop execute? It depends on the current it-
           eration of the outer loop. On the first iteration of the outer loop, the inner loop
           will execute one time; on the second iteration, it executes two times; on the third
           iteration, it executes three times, and so on until the last iteration when the inner
           loop will execute n times. The time required to execute the outer loop will be the
           number of times the increment statement count += 1 is executed. Since the inner
           loop varies from 1 to n iterations by increments of 1, the total number of times
           the increment statement will be executed is equal to the sum of the first n positive
           integers:

                                  T(n) = n(n + 1)/2 = (n² + n)/2

            which results in a quadratic time of O(n²).
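
            A short check confirms that the loop count matches the closed form:

                def ex5_count( n ):
                  count = 0
                  for i in range( n ) :
                    for j in range( i + 1 ) :
                      count += 1
                  return count

                # The count equals the sum of the first n positive integers.
                assert all( ex5_count(n) == n * (n + 1) // 2 for n in range(100) )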

           Logarithmic Time Examples
           The next example contains a single loop, but notice the change to the modification
           step. Instead of incrementing (or decrementing) by one, it cuts the loop variable
           in half each time through the loop.

                def ex6( n ):
                  count = 0
                  i = n
                  while i >= 1 :
                    count += 1
                    i = i // 2
                  return count


    To determine the run time of this function, we have to determine the number of
loop iterations just like we did with the earlier examples. Since the loop variable is
cut in half each time, this will be less than n. For example, if n equals 16, variable
i will contain the following five values during subsequent iterations (16, 8, 4, 2, 1).
    Given a small number, it’s easy to determine the number of loop iterations.
But how do we compute the number of iterations for any given value of n? When
the size of the input is reduced by half in each subsequent iteration, the number
of iterations required to reach a size of one will be equal to

                                     ⌊log₂ n⌋ + 1

or the largest integer less than or equal to log₂ n, plus 1. In our example of n = 16,
there are log₂ 16 + 1, or five, iterations. The logarithm to base a of a number n,
which is normally written as y = logₐ n, is the power to which a must be raised to
equal n; that is, n = aʸ. Thus, function ex6() requires O(log n) time. Since many
problems in
computer science that repeatedly reduce the input size do so by half, it’s not un-
common to use log n to imply log2 n when specifying the run time of an algorithm.
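
    The iteration count can be checked against the formula with a quick sketch:

        from math import floor, log2

        def ex6_iterations( n ):
          count = 0
          i = n
          while i >= 1 :
            count += 1
            i = i // 2
          return count

        assert all( ex6_iterations(n) == floor(log2(n)) + 1
                    for n in range(1, 10000) )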
    Finally, consider the following definition of function ex7(), which calls ex6()
from within a loop. Since the loop is executed n times and function ex6() requires
logarithmic time, ex7() will have a run time of O(n log n).

    def ex7( n ):
      count = 0
      for i in range( n ) :
        count += ex6( n )
      return count




Different Cases
Some algorithms can have run times that are different orders of magnitude for
different sets of inputs of the same size. These algorithms can be evaluated for
their best, worst, and average cases. Algorithms that have different cases can
typically be identified by the inclusion of an event-controlled loop or a conditional
statement. Consider the following example, which traverses a list containing integer
values to find the position of the first negative value. Note that for this problem,
the input is the collection of n values contained in the list.

    def findNeg( intList ):
      n = len(intList)
      for i in range( n ) :
        if intList[i] < 0 :
          return i
      return None

               At first glance, it appears the loop will execute n times, where n is the size of
            the list. But notice the return statement inside the loop, which can cause it to
            terminate early. If the list does not contain a negative value,


                L = [ 72, 4, 90, 56, 12, 67, 43, 17, 2, 86, 33 ]
                p = findNeg( L )



            the return statement inside the loop will not be executed and the loop will
            terminate in the normal fashion after traversing all n items. In this case, the
            function requires O(n) time. This is known as the worst case since the function
            must examine every value in the list, requiring the greatest number of steps. Now
            consider the case where the list contains a negative value in the first element:
            consider the case where the list contains a negative value in the first element:


                L = [ -12, 50, 4, 67, 39, 22, 43, 2, 17, 28 ]
                p = findNeg( L )



                There will only be one iteration of the loop since the test of the condition by the
            if statement will be true the first time through and the return statement inside
            the loop will be executed. In this case, the findNeg() function only requires O(1)
            time. This is known as the best case since the function only has to examine the
            first value in the list, requiring the fewest steps.
                The average case is evaluated for an expected data set or how we expect the
            algorithm to perform on average. For the findNeg() function, we would expect the
            search to iterate halfway through the list before finding the first negative value,
            which on average requires n/2 iterations. The average case is more difficult to
            evaluate because it's not always readily apparent what constitutes the average
            case for a particular problem.
                In general, we are more interested in the worst case time-complexity of an
            algorithm as it provides an upper bound over all possible inputs. In addition, we
            can compare the worst case run times of different implementations of an algorithm
            to determine which is the most efficient for any input.
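
            To make the cases concrete, here is a small sketch that counts how many
            elements findNeg() examines for a worst-case and a best-case input:

                def findNegSteps( intList ):
                  # findNeg() instrumented to also report the number of
                  # elements examined.
                  steps = 0
                  for i in range( len(intList) ) :
                    steps += 1
                    if intList[i] < 0 :
                      return i, steps
                  return None, steps

                print( findNegSteps( [72, 4, 90, 56] ) )    # worst case: examines all 4
                print( findNegSteps( [-12, 50, 4, 67] ) )   # best case: examines only 1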



      4.2   Evaluating the Python List
            We defined several abstract data types for storing and using collections of data in
            the previous chapters. The next logical step is to analyze the operations of the
            various ADTs to determine their efficiency. The result of this analysis depends on
            the efficiency of the Python list since it was the primary data structure used to
            implement many of the earlier abstract data types.
                The implementation details of the list were discussed in Chapter 2. In this
            section, we use those details and evaluate the efficiency of some of the more common
            operations. A summary of the worst case run times is shown in Table 4.4.
CHAPTER 4

   Basic Searching Algorithms



Searching for data is a fundamental computer programming task and one
that has been studied for many years. This chapter looks at just one aspect of
the search problem—searching for a given value in a list (array).
   There are two fundamental ways to search for data in a list: the sequential
search and the binary search. Sequential search is used when the items in the
list are in random order; binary search is used when the items are sorted in
the list.


SEQUENTIAL SEARCHING

The most obvious type of search is to begin at the beginning of a set of
records and move through each record until you find the record you are
looking for or you come to the end of the records. This is called a sequential
search.
   A sequential search (also called a linear search) is very easy to implement.
Start at the beginning of the array and compare each accessed array element
to the value you’re searching for. If you find a match, the search is over. If you
get to the end of the array without generating a match, then the value is not
in the array.






  Here is a function that performs a sequential search:

  bool SeqSearch(int[] arr, int sValue) {
    // Examine every element (arr.Length, not arr.Length-1,
    // which would skip the last element).
    for (int index = 0; index < arr.Length; index++)
      if (arr[index] == sValue)
        return true;
    return false;
  }


If a match is found, the function immediately returns True and exits.
If the end of the array is reached without the function returning True,
then the value being searched for is not in the array and the function returns
False.
   Here is a program to test our implementation of a sequential search:


  using System;
  using System.IO;

  public class Chapter4 {

     static void Main() {
       int[] numbers = new int[100];
       StreamReader numFile =
           File.OpenText(@"c:\numbers.txt");
       // Fill the entire array (numbers.Length, not Length-1,
       // which would leave the last slot empty).
       for (int i = 0; i < numbers.Length; i++)
         numbers[i] =
             Convert.ToInt32(numFile.ReadLine(), 10);
       int searchNumber;
       Console.Write("Enter a number to search for: ");
       searchNumber = Convert.ToInt32(Console.ReadLine(), 10);
       bool found;
       found = SeqSearch(numbers, searchNumber);
       if (found)
         Console.WriteLine(searchNumber + " is in the array.");
       else
         Console.WriteLine(searchNumber + " is not in the array.");
     }

     static bool SeqSearch(int[] arr, int sValue) {
       for (int index = 0; index < arr.Length; index++)
         if (arr[index] == sValue)
           return true;
       return false;
     }

  }


   The program works by first reading in a set of data from a text file. The data
consists of the first 100 integers, stored in the file in a partially random order.
The program then prompts the user to enter a number to search for and calls
the SeqSearch function to perform the search.
   You can also write the sequential search function so that the function returns
the position in the array where the searched-for value is found or a −1 if the
value cannot be found. First, let’s look at the new function:

  static int SeqSearch(int[] arr, int sValue) {
    for (int index = 0; index < arr.Length; index++)
      if (arr[index] == sValue)
        return index;
    return -1;
  }


  The following program uses this function:

  using System;
  using System.IO;

  public class Chapter4 {

     static void Main() {
       int[] numbers = new int[100];
       StreamReader numFile =
           File.OpenText(@"c:\numbers.txt");
       for (int i = 0; i < numbers.Length; i++)
         numbers[i] =
             Convert.ToInt32(numFile.ReadLine(), 10);
       int searchNumber;
       Console.Write("Enter a number to search for: ");
       searchNumber = Convert.ToInt32(Console.ReadLine(), 10);
       int foundAt;
       foundAt = SeqSearch(numbers, searchNumber);
       if (foundAt >= 0)
         Console.WriteLine(searchNumber + " is in the array at position " + foundAt);
       else
         Console.WriteLine(searchNumber + " is not in the array.");
     }

     static int SeqSearch(int[] arr, int sValue) {
       for (int index = 0; index < arr.Length; index++)
         if (arr[index] == sValue)
           return index;
       return -1;
     }

  }


Searching for Minimum and Maximum Values

Computer programs are often asked to search an array (or other data structure)
for minimum and maximum values. In an ordered array, searching for these
values is a trivial task. Searching an unordered array, however, is a little more
challenging.
   Let’s start by looking at how to find the minimum value in an array. The
algorithm is:

1. Assign the first element of the array to a variable as the minimum value.
2. Begin looping through the array, comparing each successive array element
   with the minimum value variable.
3. If the currently accessed array element is less than the minimum value,
   assign this element to the minimum value variable.
4. Continue until the last array element is accessed.
5. The minimum value is stored in the variable.



  Let’s look at a function, FindMin, which implements this algorithm:

  static int FindMin(int[] arr) {
    int min = arr[0];
    // Start at position 1; arr[0] is already the initial minimum.
    for (int i = 1; i < arr.Length; i++)
      if (arr[i] < min)
        min = arr[i];
    return min;
  }

   Notice that the array search starts at position 1 and not at position 0. The
0th position is assigned as the minimum value before the loop starts, so we
can start making comparisons at position 1.
   The algorithm for finding the maximum value in an array works in the same
way. We assign the first array element to a variable that holds the maximum
amount. Next we loop through the array, comparing each array element with
the value stored in the variable, replacing the current value if the accessed
value is greater. Here’s the code:

  static int FindMax(int[] arr) {
    int max = arr[0];
    // Start at position 1; arr[0] is already the initial maximum.
    for (int i = 1; i < arr.Length; i++)
      if (arr[i] > max)
        max = arr[i];
    return max;
  }


  An alternative version of these two functions could return the position of
the maximum or minimum value in the array rather than the actual value.


Making Sequential Search Faster: Self-Organizing Data

The fastest successful sequential searches occur when the data element being
searched for is at the beginning of the data set. You can ensure that a success-
fully located data item is at the beginning of the data set by moving it there
after it has been found.
   The concept behind this strategy is that we can minimize search times
by putting frequently searched-for items at the beginning of the data set.



Eventually, all the most frequently searched-for data items will be located at
the beginning of the data set. This is an example of self-organization, in that
the data set is organized not by the programmer before the program runs, but
by the program while the program is running.
   It makes sense to allow your data to organize in this way since the data being
searched probably follows the “80–20” rule, meaning that 80% of the searches
conducted on your data set are searching for 20% of the data in the data set.
Self-organization will eventually put that 20% at the beginning of the data set,
where a sequential search will find them quickly.
   Probability distributions such as this are called Pareto distributions, named
for Vilfredo Pareto, who discovered these distributions studying the spread of
income and wealth in the late nineteenth century. See Knuth (1998, pp. 399–
401) for more on probability distributions in data sets.
   We can modify our SeqSearch method quite easily to include self-
organization. Here’s a first stab at the method:

  static bool SeqSearch(int sValue) {
    // arr is the class-level array holding the data set.
    for (int index = 0; index < arr.Length; index++)
      if (arr[index] == sValue) {
        swap(index, 0);    // move the found item to the front
        return true;
      }
    return false;
  }


  If the search is successful, the item found is switched with the element at
the first of the array using a swap function, shown as follows:

  static void swap(int item1, int item2) {
    // item1 and item2 are positions in the class-level array arr.
    int temp = arr[item1];
    arr[item1] = arr[item2];
    arr[item2] = temp;
  }


   The problem with the SeqSearch method as we’ve modified it is that fre-
quently accessed items might be moved around quite a bit during the course
of many searches. We want to keep items that are moved to the first of the
data set there and not moved farther back when a subsequent item farther
down in the set is successfully located.
   There are two ways we can achieve this goal. First, we can only swap found
items if they are located away from the beginning of the data set. We only
have to determine what is considered to be far enough back in the data set to
warrant swapping. Following the “80–20” rule again, we can make a rule that
a data item is relocated to the beginning of the data set only if its location is
outside the first 20% of the items in the data set. Here’s the code for this first
rewrite:

  static int SeqSearch(int sValue) {
    for (int index = 0; index < arr.Length; index++)
      if (arr[index] == sValue && index > (arr.Length * 0.2)) {
        // Found outside the first 20% of the data set:
        // move the item to the front.
        swap(index, 0);
        return index;
      }
      else if (arr[index] == sValue)
        return index;
    return -1;
  }

   The && condition in the if statement is short-circuited: if the current item
doesn't match the search value, there's no reason to test whether its position lies
far enough back in the data set.
   The other way we can rewrite the SeqSearch method is to swap a found item
with the element that precedes it in the data set. Using this method, which
is similar to how data is sorted using the Bubble sort, the most frequently
accessed items will eventually work their way up to the front of the data set.
This technique also guarantees that if an item is already at the beginning of
the data set, it won’t move back down.
   The code for this new version of SeqSearch is shown as follows:
  static int SeqSearch(int sValue) {
    for (int index = 0; index < arr.Length; index++)
      if (arr[index] == sValue) {
        // Swap the found item with its predecessor, unless it
        // is already at the front of the data set.
        if (index > 0)
          swap(index, index - 1);
        return index;
      }
    return -1;
  }



   Either of these solutions will help your searches when, for whatever reason,
you must keep your data set in an unordered sequence. In the next section, we
will discuss a search algorithm that is more efficient than any of the sequen-
tial algorithms mentioned, but that only works on ordered data—the binary
search.


Binary Search

When the records you are searching through are sorted into order, you can
perform a more efficient search than the sequential search to find a value. This
search is called a binary search.
   To understand how a binary search works, imagine you are trying to guess
a number between 1 and 100 chosen by a friend. For every guess you make,
the friend tells you if you guessed the correct number, or if your guess is too
high, or if your guess is too low. The best strategy then is to choose 50 as
the first guess. If that guess is too high, you should then guess 25. If 50 is to
low, you should guess 75. Each time you guess, you select a new midpoint
by adjusting the lower range or the upper range of the numbers (depending
on if your guess is too high or too low), which becomes your next guess.
As long as you follow that strategy, you will eventually guess the correct
number. Figure 4.1 demonstrates how this works if the number to be chosen
is 82.
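   Halving the range of candidates like this means the number of guesses grows
with the logarithm of the range: for 100 candidate numbers, at most
⌊log₂ 100⌋ + 1 = 6 + 1 = 7 guesses are ever needed, since each guess discards
roughly half of the remaining candidates.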
   We can implement this strategy as an algorithm, the binary search algo-
rithm. To use this algorithm, we first need our data stored in order (ascending,
preferably) in an array (though other data structures will work as well). The
first steps in the algorithm are to set the lower and upper bounds of the search.
At the beginning of the search, this means the lower and upper bounds of the
array. Then, we calculate the midpoint of the array by adding the lower and
upper bounds together and dividing by 2. The array element stored at this
position is compared to the searched-for value. If they are the same, the value
has been found and the algorithm stops. If the searched-for value is less than
the midpoint value, a new upper bound is calculated by subtracting 1 from the
midpoint. Otherwise, if the searched-for value is greater than the midpoint
value, a new lower bound is calculated by adding 1 to the midpoint. The
algorithm iterates until the lower bound becomes greater than the upper bound, which
indicates the array has been completely searched. If this occurs, a -1 is returned,
indicating that no element in the array holds the value being searched
for.
   [FIGURE 4.1. A Binary Search Analogy: guessing a secret number (82) between
   1 and 100. The successive guesses are 50 (too low), 75 (too low), 88 (too high),
   81 (too low), 84 (too high), and 82 (correct); the sixth midpoint, 82.5, rounds
   down to 82.]

  Here’s the algorithm written as a C# function:
  static int binSearch(int value) {
    // arr is a sorted, class-level array.
    int upperBound, lowerBound, mid;
    upperBound = arr.Length - 1;
    lowerBound = 0;
    while (lowerBound <= upperBound) {
      mid = (upperBound + lowerBound) / 2;
      if (arr[mid] == value)
        return mid;
      else if (value < arr[mid])
        upperBound = mid - 1;
      else
        lowerBound = mid + 1;
    }
    return -1;
  }

  Here’s a program that uses the binary search method to search an array:

  static void Main(string[] args)
  {
    Random random = new Random();
    // CArray is the book's demonstration class; the constructor
    // argument is taken here to be the array's size.
    CArray mynums = new CArray(10);
    for (int i = 0; i < 10; i++)
      mynums.Insert(random.Next(100));
    mynums.SortArr();
    mynums.showArray();
    int position = mynums.binSearch(77);
    if (position >= 0)
    {
      Console.WriteLine("found item");
      mynums.showArray();
    }
    else
      Console.WriteLine("Not in the array");
    Console.Read();
  }


A Recursive Binary Search Algorithm

Although the version of the binary search algorithm developed in the previ-
ous section is correct, it’s not really a natural solution to the problem. The
binary search algorithm is really a recursive algorithm because, by constantly
subdividing the array until we find the item we’re looking for (or run out of
room in the array), each subdivision is expressing the problem as a smaller
version of the original problem. Viewing the problem this way leads us to
discover a recursive algorithm for performing a binary search.
   In order for a recursive binary search algorithm to work, we have to make
some changes to the code. Let's take a look at the code first and then we'll
discuss the changes we've made:

  public int RbinSearch(int value, int lower, int upper) {
    if (lower > upper)
      return -1;
    else {
      int mid;
      mid = (lower + upper) / 2;
      if (value == arr[mid])
        return mid;
      else if (value < arr[mid])
        // Return the result of the recursive call; the original
        // version discarded it.
        return RbinSearch(value, lower, mid - 1);
      else
        return RbinSearch(value, mid + 1, upper);
    }
  }


   The main problem with the recursive binary search algorithm, as compared
to the iterative algorithm, is its efficiency. When a 1,000-element array is searched
using both algorithms, the recursive algorithm is consistently 10 times slower
than the iterative algorithm.




Of course, recursive algorithms are often chosen for other reasons than effi-
ciency, but you should keep in mind that anytime you implement a recursive
algorithm, you should also look for an iterative solution so that you can
compare the efficiency of the two algorithms.
   Finally, before we leave the subject of binary search, we should mention that
the Array class has a built-in binary search method. It takes two arguments,
an array name and an item to search for, and it returns the position of the item
in the array, or a negative number if the item can't be found.
   To demonstrate how the method works, we’ve written yet another binary
search method for our demonstration class. Here’s the code:

  public int Bsearch(int value) {
    return Array.BinarySearch(arr, value);
  }


   When the built-in binary search method is compared with our custom-
built method, it consistently performs 10 times faster than the custom-built
method, which should not be surprising. A built-in data structure or algorithm
should always be chosen over one that is custom-built, if the two can be used
in exactly the same ways.


SUMMARY

Searching a data set for a value is a ubiquitous computational operation. The
simplest method of searching a data set is to start at the beginning and search
for the item until either the item is found or the end of the data set is reached.
This searching method works best when the data set is relatively small and
unordered.
   If the data set is ordered, the binary search algorithm is a better choice.
Binary search works by continually subdividing the data set until the item
being searched for is found. You can write the binary search algorithm using
both iterative and recursive code. The Array class in C# includes a built-in
binary search method, which should be used whenever a binary search is
called for.


EXERCISES

1. The sequential search algorithm will always find the first occurrence of
   an item in a data set. Create a new sequential search method that takes a
   second integer argument indicating which occurrence of an item you want
   to search for.
2. Write a sequential search method that finds the last occurrence of an item.
3. Run the binary search method on a set of unordered data. What happens?
4. Using the CArray class with the SeqSearch method and the BinSearch
   method, create an array of 1,000 random integers. Add a new private Inte-
   ger data member named compCount that is initialized to 0. In each of the
   search algorithms, add a line of code right after the critical comparison
   is made that increments compCount by 1. Run both methods, searching
   for the same number, say 734, with each method. Compare the values of
   compCount after running both methods. What is the value of compCount
   for each method? Which method makes the fewest comparisons?

Data Structures- Part2 analysis tools
 
Unit i basic concepts of algorithms
Unit i basic concepts of algorithmsUnit i basic concepts of algorithms
Unit i basic concepts of algorithms
 
Aad introduction
Aad introductionAad introduction
Aad introduction
 
Asymptotic Notations
Asymptotic NotationsAsymptotic Notations
Asymptotic Notations
 
Big Oh.ppt
Big Oh.pptBig Oh.ppt
Big Oh.ppt
 
Algorithm chapter 2
Algorithm chapter 2Algorithm chapter 2
Algorithm chapter 2
 
DAA-Unit1.pptx
DAA-Unit1.pptxDAA-Unit1.pptx
DAA-Unit1.pptx
 
Introduction to data structures and complexity.pptx
Introduction to data structures and complexity.pptxIntroduction to data structures and complexity.pptx
Introduction to data structures and complexity.pptx
 
Asymptotic Analysis in Data Structure using C
Asymptotic Analysis in Data Structure using CAsymptotic Analysis in Data Structure using C
Asymptotic Analysis in Data Structure using C
 
Data Structures Notes
Data Structures NotesData Structures Notes
Data Structures Notes
 
algorithmanalysis and effciency.pptx
algorithmanalysis and effciency.pptxalgorithmanalysis and effciency.pptx
algorithmanalysis and effciency.pptx
 
DSA Complexity.pptx What is Complexity Analysis? What is the need for Compl...
DSA Complexity.pptx   What is Complexity Analysis? What is the need for Compl...DSA Complexity.pptx   What is Complexity Analysis? What is the need for Compl...
DSA Complexity.pptx What is Complexity Analysis? What is the need for Compl...
 
Unit 1.pptx
Unit 1.pptxUnit 1.pptx
Unit 1.pptx
 
ALGORITHMS - SHORT NOTES
ALGORITHMS - SHORT NOTESALGORITHMS - SHORT NOTES
ALGORITHMS - SHORT NOTES
 
Analysis Of Algorithms I
Analysis Of Algorithms IAnalysis Of Algorithms I
Analysis Of Algorithms I
 
Analysis of algorithms
Analysis of algorithmsAnalysis of algorithms
Analysis of algorithms
 
Ch-2 final exam documet compler design elements
Ch-2 final exam documet compler design elementsCh-2 final exam documet compler design elements
Ch-2 final exam documet compler design elements
 
ASYMTOTIC NOTATIONS BIG O OEMGA THETE NOTATION.pptx
ASYMTOTIC NOTATIONS BIG O OEMGA THETE NOTATION.pptxASYMTOTIC NOTATIONS BIG O OEMGA THETE NOTATION.pptx
ASYMTOTIC NOTATIONS BIG O OEMGA THETE NOTATION.pptx
 

More from Haitham El-Ghareeb (20)

مختصر وحدة التعلم الذاتي 2015
مختصر وحدة التعلم الذاتي 2015مختصر وحدة التعلم الذاتي 2015
مختصر وحدة التعلم الذاتي 2015
 
DSA - 2012 - Conclusion
DSA - 2012 - ConclusionDSA - 2012 - Conclusion
DSA - 2012 - Conclusion
 
Lecture 9 - DSA - Python Data Structures
Lecture 9 - DSA - Python Data StructuresLecture 9 - DSA - Python Data Structures
Lecture 9 - DSA - Python Data Structures
 
Lect07
Lect07Lect07
Lect07
 
Lecture 07 Data Structures - Basic Sorting
Lecture 07 Data Structures - Basic SortingLecture 07 Data Structures - Basic Sorting
Lecture 07 Data Structures - Basic Sorting
 
LectureNotes-01-DSA
LectureNotes-01-DSALectureNotes-01-DSA
LectureNotes-01-DSA
 
Lecture-05-DSA
Lecture-05-DSALecture-05-DSA
Lecture-05-DSA
 
Learn Latex
Learn LatexLearn Latex
Learn Latex
 
Research Methodologies - Lecture 02
Research Methodologies - Lecture 02Research Methodologies - Lecture 02
Research Methodologies - Lecture 02
 
DSA-Lecture-05
DSA-Lecture-05DSA-Lecture-05
DSA-Lecture-05
 
DSA - Lecture 04
DSA - Lecture 04DSA - Lecture 04
DSA - Lecture 04
 
DSA - Lecture 03
DSA - Lecture 03DSA - Lecture 03
DSA - Lecture 03
 
Ph.D. Registeration seminar
Ph.D. Registeration seminarPh.D. Registeration seminar
Ph.D. Registeration seminar
 
Lecture 01 - Research Methods
Lecture 01 - Research MethodsLecture 01 - Research Methods
Lecture 01 - Research Methods
 
Lecture 02 - DSA
Lecture 02 - DSALecture 02 - DSA
Lecture 02 - DSA
 
DSA-2012-Lect00
DSA-2012-Lect00DSA-2012-Lect00
DSA-2012-Lect00
 
Python Tutorial Part 2
Python Tutorial Part 2Python Tutorial Part 2
Python Tutorial Part 2
 
Python Tutorial Part 1
Python Tutorial Part 1Python Tutorial Part 1
Python Tutorial Part 1
 
Mozilla B2G
Mozilla B2GMozilla B2G
Mozilla B2G
 
Cisco ios-cont
Cisco ios-contCisco ios-cont
Cisco ios-cont
 

Data Structures - Lecture 8 - Study Notes

The outer loop, however, now contains an addition of its own. To calculate the total number of additions for this version, we take the n additions performed by the inner loop and add one for the addition performed at the bottom of the outer loop. This gives n + 1 additions for each iteration of the outer loop, which is performed n times, for a total of n² + n additions.

If we compare the two results, it's obvious the number of additions in the second version is less than the first for any n greater than 1. Thus, the second version will execute faster than the first, but the difference in execution times will not be significant. The reason is that both algorithms execute on the same order of magnitude, namely n². Thus, as the size of n increases, both algorithms increase at approximately the same rate (though one is slightly better), as illustrated numerically in Table 4.1 and graphically in Figure 4.1.

         n             2n²          n² + n
        10             200             110
       100          20,000          10,100
      1000       2,000,000       1,001,000
     10000     200,000,000     100,010,000
    100000  20,000,000,000  10,000,100,000

    Table 4.1: Growth rate comparisons for different input sizes.
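The entries in Table 4.1 are easy to reproduce. Here is a short Python sketch (our own, not part of the original text) that prints the two operation counts side by side:

    # Compare the addition counts of the two versions for growing n.
    for n in (10, 100, 1000, 10000, 100000):
        print(f"{n:>7} {2 * n * n:>15,} {n * n + n:>15,}")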
    Figure 4.1: Graphical comparison of the growth rates from Table 4.1.

4.1.1 Big-O Notation

Instead of counting the precise number of operations or steps, computer scientists are more interested in classifying an algorithm based on the order of magnitude as applied to execution time or space requirements. This classification approximates the actual number of required steps for execution or the actual storage requirements in terms of variable-sized data sets. The term big-O, which is derived from the expression "on the order of," is used to specify an algorithm's classification.

Defining Big-O

Assume we have a function T(n) that represents the approximate number of steps required by an algorithm for an input of size n. For the second version of our algorithm in the previous section, this would be written as

    T2(n) = n² + n

Now, suppose there exists a function f(n), defined for the integers n ≥ 0, such that for some constant c and some constant m,

    T(n) ≤ c f(n)

for all sufficiently large values of n ≥ m. Then such an algorithm is said to have a time-complexity of, or execute on the order of, f(n) relative to the number of operations it requires. In other words, there is a positive integer m and a constant c (the constant of proportionality) such that for all n ≥ m, T(n) ≤ c f(n).
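To make the definition concrete, this small Python check (our own sketch, with the constants c = 2 and m = 1 chosen as an assumption) verifies that T2(n) = n² + n stays below c·n² for a range of n:

    def T2(n):
        return n * n + n   # step count of version two

    c, m = 2, 1            # constant of proportionality and threshold
    assert all(T2(n) <= c * n * n for n in range(m, 100_001))
    print("T2(n) <= 2n^2 holds for all tested n >= 1")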
The function f(n) indicates the rate of growth at which the run time of an algorithm increases as the input size, n, increases. To specify the time-complexity of an algorithm that runs on the order of f(n), we use the notation

    O( f(n) )

Consider the two versions of our algorithm from earlier. For version one, the time was computed to be T1(n) = 2n². If we let c = 2, then 2n² ≤ 2n², for a result of O(n²). For version two, we computed a time of T2(n) = n² + n. Again, if we let c = 2, then n² + n ≤ 2n², for a result of O(n²). In this case, the choice of c comes from the observation that when n ≥ 1, we have n ≤ n² and n² + n ≤ n² + n², which satisfies the inequality in the definition of big-O.

The function f(n) = n² is not the only choice for satisfying the condition T(n) ≤ c f(n). We could have said the algorithms had a run time of O(n³) or O(n⁴) since 2n² ≤ n³ and 2n² ≤ n⁴ when n > 1. The objective, however, is to find a function f(·) that provides the tightest (lowest) upper bound or limit for the run time of an algorithm. The big-O notation is intended to indicate an algorithm's efficiency for large values of n. There is usually little difference in the execution times of algorithms when n is small.

Constant of Proportionality

The constant of proportionality is only crucial when two algorithms have the same f(n). It usually makes no difference when comparing algorithms whose growth rates are of different magnitudes. Suppose we have two algorithms, L1 and L2, with run times equal to n² and 2n, respectively. L1 has a time-complexity of O(n²) with c = 1, and L2 has a time of O(n) with c = 2. Even though L1 has a smaller constant of proportionality, L1 is still slower and, in fact, an order of magnitude slower, for large values of n. Thus, f(n) dominates the expression c f(n) and the run time performance of the algorithm. The differences between the run times of these two algorithms are shown numerically in Table 4.2 and graphically in Figure 4.2.

Constructing T(n)

Instead of counting the number of logical comparisons or arithmetic operations, we evaluate an algorithm by considering every operation. For simplicity, we assume that each basic operation or statement, at the abstract level, takes the same amount of time and, thus, each is assumed to cost constant time.
         n              n²       2n
        10             100       20
       100          10,000      200
      1000       1,000,000    2,000
     10000     100,000,000   20,000
    100000  10,000,000,000  200,000

    Table 4.2: Numerical comparison of two sample algorithms.

    Figure 4.2: Graphical comparison of the data from Table 4.2.

The total number of operations required by an algorithm can be computed as a sum of the times required to perform each step:

    T(n) = f1(n) + f2(n) + ... + fk(n)

The steps requiring constant time are generally omitted since they eventually become part of the constant of proportionality. Consider Figure 4.3(a), which shows a markup of version one of the algorithm from earlier. The basic operations are marked with a constant time while the loops are marked with the appropriate total number of iterations. Figure 4.3(b) shows the same algorithm but with the constant steps omitted since these operations are independent of the data set size.
    (a)  1   totalSum = 0
         n   for i in range( n ) :
         1     rowSum[i] = 0
         n     for j in range( n ) :
         1       rowSum[i] = rowSum[i] + matrix[i,j]
         1       totalSum = totalSum + matrix[i,j]

    (b)  n   for i in range( n ) :
         n     for j in range( n ) :
                 ...

    Figure 4.3: Markup for version one of the matrix summing algorithm: (a) shows all operations marked with the appropriate time and (b) shows only the non-constant time steps.

Choosing the Function

The function f(n) used to categorize a particular algorithm is chosen to be the dominant term within T(n), that is, the term that is so large for big values of n that we can ignore the other terms when computing a big-O value. For example, in the expression

    n² + log₂ n + 3n

the term n² dominates the other terms since, for n ≥ 3, we have

    n² + log₂ n + 3n ≤ n² + n² + n² = 3n²

which leads to a time-complexity of O(n²). Now, consider the function

    T(n) = 2n² + 15n + 500

and assume it is the polynomial that represents the exact number of instructions required to execute some algorithm. For small values of n (less than 16), the constant value 500 dominates the function, but what happens as n gets larger, say 100,000? The term n² becomes the dominant term, with the other two becoming less significant in computing the final result.
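A quick way to see this dominance numerically is to compute what fraction of T(n) = 2n² + 15n + 500 comes from the quadratic term as n grows. The sketch below is our own, not part of the original text:

    for n in (10, 100, 1000, 100000):
        total = 2 * n * n + 15 * n + 500
        print(f"n = {n:>6}: T(n) = {total:>14,}, "
              f"2n^2 share = {2 * n * n / total:.1%}")
    # At n = 10 the constant 500 still matters; by n = 100000 the
    # quadratic term accounts for essentially all of T(n).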
Classes of Algorithms

We will work with many different algorithms in this text, but most will have a time-complexity selected from among a common set of functions, which are listed in Table 4.3 and illustrated graphically in Figure 4.4.

    f(·)      Common Name
    1         constant
    log n     logarithmic
    n         linear
    n log n   log linear
    n²        quadratic
    n³        cubic
    aⁿ        exponential

    Table 4.3: Common big-O functions listed from smallest to largest order of magnitude.

    Figure 4.4: Growth rates of the common time-complexity functions.

Algorithms can be classified based on their big-O function. The various classes are commonly named based upon the dominant term. A logarithmic algorithm is any algorithm whose time-complexity is O(logₐ n). These algorithms are generally very efficient since logₐ n increases more slowly than n. For many problems encountered in computer science, a will typically equal 2, and thus we use the notation log n to imply log₂ n. Logarithms of other bases will be explicitly stated.
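The base can be left off inside big-O because logarithms of different bases differ only by a constant factor: logₐ n = log₂ n / log₂ a. A quick Python check of that identity (our own, not from the text):

    import math

    for n in (8, 1024, 10**6):
        direct = math.log(n, 10)                  # log base 10, computed directly
        converted = math.log2(n) / math.log2(10)  # via the change-of-base identity
        assert math.isclose(direct, converted)
    print("log10(n) and log2(n)/log2(10) agree for all tested n")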
Polynomial algorithms, whose efficiency can be expressed as a polynomial of the form

    aₘnᵐ + aₘ₋₁nᵐ⁻¹ + ... + a₂n² + a₁n + a₀

are characterized by a time-complexity of O(nᵐ) since the dominant term is the highest power of n. The most common polynomial algorithms are linear (m = 1), quadratic (m = 2), and cubic (m = 3). An algorithm whose efficiency is characterized by a dominant term of the form aⁿ is called exponential. Exponential algorithms are among the worst algorithms in terms of time-complexity.

4.1.2 Evaluating Python Code

As indicated earlier, when evaluating the time-complexity of an algorithm or code segment, we assume that basic operations only require constant time. But what exactly is a basic operation? The basic operations include statements and function calls whose execution time does not depend on the specific values of the data that is used or manipulated by the given instruction. For example, the assignment statement

    x = 5

is a basic instruction since the time required to assign a reference to the given variable is independent of the value or type of object specified on the righthand side of the = sign. The evaluation of arithmetic and logical expressions

    y = x
    z = x + y * 6
    done = x > 0 and x < 100

are basic instructions, again since they require the same number of steps to perform the given operations regardless of the values of their operands. The subscript operator, when used with Python's sequence types (strings, tuples, and lists), is also a basic instruction.

Linear Time Examples

Now, consider the following assignment statement:

    y = ex1(n)

An assignment statement only requires constant time, but that is the time required to perform the actual assignment; it does not include the time required to execute any function calls used on the righthand side of the statement. To determine the run time of the previous statement, we must know the cost of the function call ex1(n). The time required by a function call is the time it takes to execute the given function. For example, consider the ex1() function, which computes the sum of the integer values in the range [0 ... n):

    def ex1( n ):
      total = 0
      for i in range( n ) :
        total += i
      return total
NOTE: Efficiency of String Operations. Most of the string operations have a time-complexity that is proportional to the length of the string. For most problems that do not involve string processing, string operations seldom have an impact on the run time of an algorithm. Thus, in the text, we assume that string operations, including the use of the print() function, only require constant time unless explicitly stated otherwise.

The time required to execute a loop depends on the number of iterations performed and the time needed to execute the loop body during each iteration. In this case, the loop will be executed n times and the loop body only requires constant time since it contains a single basic instruction. (Note that the underlying mechanisms of the for loop and the range() function are both O(1).) We can compute the time required by the loop as T(n) = n * 1, for a result of O(n).

But what about the other statements in the function? The first line of the function and the return statement only require constant time. Remember, it's common to omit the steps that only require constant time and instead focus on the critical operations, those that contribute to the overall time. In most instances, this means we can limit our evaluation to repetition and selection statements and function and method calls, since those have the greatest impact on the overall time of an algorithm. Since the loop is the only non-constant step, the function ex1() has a run time of O(n). That means the statement y = ex1(n) from earlier requires linear time.

Next, consider the following function, which includes two for loops:

    def ex2( n ):
      count = 0
      for i in range( n ) :
        count += 1
      for j in range( n ) :
        count += 1
      return count

To evaluate the function, we have to determine the time required by each loop. The two loops each require O(n) time since they are just like the loop in function ex1() earlier. Combining the times yields T(n) = n + n, for a result of O(n).
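One informal way to corroborate a linear run time is a doubling experiment: when n doubles, the elapsed time of an O(n) function should roughly double. The following sketch is our own (exact timings will vary by machine):

    import time

    def ex1(n):
        total = 0
        for i in range(n):
            total += i
        return total

    for n in (1_000_000, 2_000_000, 4_000_000):
        start = time.perf_counter()
        ex1(n)
        elapsed = time.perf_counter() - start
        print(f"n = {n:>9,}: {elapsed:.4f} s")
    # Each doubling of n should roughly double the elapsed time.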
Quadratic Time Examples

When presented with nested loops, such as in the following function, the time required by the inner loop impacts the time of the outer loop.

    def ex3( n ):
      count = 0
      for i in range( n ) :
        for j in range( n ) :
          count += 1
      return count

Both loops will be executed n times, but since the inner loop is nested inside the outer loop, the total time required by the outer loop will be T(n) = n * n, resulting in a time of O(n²) for the ex3() function.

Not all nested loops result in a quadratic time. Consider the following function:

    def ex4( n ):
      count = 0
      for i in range( n ) :
        for j in range( 25 ) :
          count += 1
      return count

which has a time-complexity of O(n). The function contains a nested loop, but the inner loop executes independently of the size variable n. Since the inner loop executes a constant number of times, it is a constant time operation. The outer loop executes n times, resulting in a linear run time.

The next example presents a special case of nested loops:

    def ex5( n ):
      count = 0
      for i in range( n ) :
        for j in range( i+1 ) :
          count += 1
      return count

How many times does the inner loop execute? It depends on the current iteration of the outer loop. On the first iteration of the outer loop, the inner loop will execute one time; on the second iteration, it executes two times; on the third iteration, three times, and so on, until the last iteration, when the inner loop will execute n times. The time required to execute the outer loop will be the number of times the increment statement count += 1 is executed. Since the inner loop varies from 1 to n iterations by increments of 1, the total number of times the increment statement will be executed is equal to the sum of the first n positive integers:

    T(n) = n(n + 1) / 2 = (n² + n) / 2

which results in a quadratic time of O(n²).
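We can confirm the closed form directly by counting. This check is our own, not part of the original text:

    def ex5(n):
        count = 0
        for i in range(n):
            for j in range(i + 1):
                count += 1
        return count

    for n in (1, 2, 5, 10, 100):
        assert ex5(n) == n * (n + 1) // 2   # sum of the first n positive integers
    print("ex5(n) matches n(n+1)/2 for all tested n")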
Logarithmic Time Examples

The next example contains a single loop, but notice the change to the modification step. Instead of incrementing (or decrementing) by one, it cuts the loop variable in half each time through the loop.

    def ex6( n ):
      count = 0
      i = n
      while i >= 1 :
        count += 1
        i = i // 2
      return count

To determine the run time of this function, we have to determine the number of loop iterations, just like we did with the earlier examples. Since the loop variable is cut in half each time, this will be less than n. For example, if n equals 16, variable i will take on five values during subsequent iterations: 16, 8, 4, 2, 1. Given a small number, it's easy to determine the number of loop iterations. But how do we compute the number of iterations for an arbitrary value of n? When the size of the input is reduced by half in each subsequent iteration, the number of iterations required to reach a size of one equals ⌊log₂ n⌋ + 1, that is, the largest integer less than or equal to log₂ n, plus 1. In our example of n = 16, there are log₂ 16 + 1 = 5 iterations. The logarithm to base a of a number n, normally written as y = logₐ n, is the power to which a must be raised to equal n; that is, n = aʸ. Thus, function ex6() requires O(log n) time. Since many problems in computer science that repeatedly reduce the input size do so by half, it's not uncommon to use log n to imply log₂ n when specifying the run time of an algorithm.

Finally, consider the following definition of function ex7(), which calls ex6() from within a loop. Since the loop is executed n times and function ex6() requires logarithmic time, ex7() will have a run time of O(n log n).

    def ex7( n ):
      count = 0
      for i in range( n ) :
        count += ex6( n )
      return count
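Before moving on, the iteration count of ex6() can be verified against the formula ⌊log₂ n⌋ + 1. This check is our own sketch:

    import math

    def ex6(n):
        count = 0
        i = n
        while i >= 1:
            count += 1
            i = i // 2
        return count

    for n in (1, 16, 100, 1023, 1024):
        assert ex6(n) == math.floor(math.log2(n)) + 1
    print("ex6(n) performs floor(log2(n)) + 1 iterations for all tested n")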
Different Cases

Some algorithms can have run times that are different orders of magnitude for different sets of inputs of the same size. These algorithms can be evaluated for their best, worst, and average cases. Algorithms that have different cases can typically be identified by the inclusion of an event-controlled loop or a conditional statement. Consider the following example, which traverses a list containing integer values to find the position of the first negative value. Note that for this problem, the input is the collection of n values contained in the list.

    def findNeg( intList ):
      n = len(intList)
      for i in range( n ) :
        if intList[i] < 0 :
          return i
      return None

At first glance, it appears the loop will execute n times, where n is the size of the list. But notice the return statement inside the loop, which can cause it to terminate early. If the list does not contain a negative value,

    L = [ 72, 4, 90, 56, 12, 67, 43, 17, 2, 86, 33 ]
    p = findNeg( L )

the return statement inside the loop will not be executed and the loop will terminate in the normal fashion after having traversed all n elements. In this case, the function requires O(n) time. This is known as the worst case since the function must examine every value in the list, requiring the largest number of steps. Now consider the case where the list contains a negative value in the first element:

    L = [ -12, 50, 4, 67, 39, 22, 43, 2, 17, 28 ]
    p = findNeg( L )

There will only be one iteration of the loop since the test of the condition by the if statement will be true the first time through, and the return statement inside the loop will be executed. In this case, the findNeg() function only requires O(1) time. This is known as the best case since the function only has to examine the first value in the list, requiring the fewest steps.

The average case is evaluated for an expected data set, or how we expect the algorithm to perform on average. For the findNeg() function, we would expect the search to iterate halfway through the list before finding the first negative value, which on average requires n/2 iterations. The average case is more difficult to evaluate because it's not always readily apparent what constitutes the average case for a particular problem.

In general, we are more interested in the worst case time-complexity of an algorithm, as it provides an upper bound over all possible inputs. In addition, we can compare the worst case run times of different implementations of an algorithm to determine which is the most efficient for any input.
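To see the best and worst cases side by side, here is an instrumented variant of findNeg() (our own sketch, not from the text) that also reports how many elements it examined:

    def findNegCounted(intList):
        examined = 0
        for i in range(len(intList)):
            examined += 1                # one comparison per element visited
            if intList[i] < 0:
                return i, examined
        return None, examined

    best  = [-12, 50, 4, 67, 39, 22, 43, 2, 17, 28]   # negative value up front
    worst = [72, 4, 90, 56, 12, 67, 43, 17, 2, 86]    # no negative values

    print(findNegCounted(best))    # (0, 1): best case, one element examined
    print(findNegCounted(worst))   # (None, 10): worst case, all n examined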
4.2 Evaluating the Python List

We defined several abstract data types for storing and using collections of data in the previous chapters. The next logical step is to analyze the operations of the various ADTs to determine their efficiency. The result of this analysis depends on the efficiency of the Python list since it was the primary data structure used to implement many of the earlier abstract data types. The implementation details of the list were discussed in Chapter 2. In this section, we use those details to evaluate the efficiency of some of the more common operations. A summary of the worst case run times is shown in Table 4.4.

CHAPTER 4   Basic Searching Algorithms

Searching for data is a fundamental computer programming task and one that has been studied for many years. This chapter looks at just one aspect of the search problem: searching for a given value in a list (array). There are two fundamental ways to search for data in a list: the sequential search and the binary search. Sequential search is used when the items in the list are in random order; binary search is used when the items are sorted in the list.

SEQUENTIAL SEARCHING

The most obvious type of search is to begin at the beginning of a set of records and move through each record until you find the record you are looking for or you come to the end of the records. This is called a sequential search. A sequential search (also called a linear search) is very easy to implement. Start at the beginning of the array and compare each accessed array element to the value you're searching for. If you find a match, the search is over. If you get to the end of the array without generating a match, then the value is not in the array.
Here is a function that performs a sequential search:

    static bool SeqSearch(int[] arr, int sValue) {
      // Examine each element in turn; the last valid index is arr.Length - 1
      for (int index = 0; index < arr.Length; index++)
        if (arr[index] == sValue)
          return true;
      return false;
    }

If a match is found, the function immediately returns true and exits. If the end of the array is reached without the function returning true, then the value being searched for is not in the array and the function returns false.

Here is a program to test our implementation of a sequential search:

    using System;
    using System.IO;

    public class Chapter4 {

      static void Main() {
        int[] numbers = new int[100];
        StreamReader numFile = File.OpenText("c:numbers.txt");
        for (int i = 0; i < numbers.Length; i++)
          numbers[i] = Convert.ToInt32(numFile.ReadLine(), 10);

        int searchNumber;
        Console.Write("Enter a number to search for: ");
        searchNumber = Convert.ToInt32(Console.ReadLine(), 10);

        bool found;
        found = SeqSearch(numbers, searchNumber);
        if (found)
          Console.WriteLine(searchNumber + " is in the array.");
        else
          Console.WriteLine(searchNumber + " is not in the array.");
      }

      static bool SeqSearch(int[] arr, int sValue) {
        for (int index = 0; index < arr.Length; index++)
          if (arr[index] == sValue)
            return true;
        return false;
      }
    }

The program works by first reading in a set of data from a text file. The data consists of the first 100 integers, stored in the file in a partially random order. The program then prompts the user to enter a number to search for and calls the SeqSearch function to perform the search.

You can also write the sequential search function so that it returns the position in the array where the searched-for value is found, or -1 if the value cannot be found. First, let's look at the new function:

    static int SeqSearch(int[] arr, int sValue) {
      for (int index = 0; index < arr.Length; index++)
        if (arr[index] == sValue)
          return index;
      return -1;
    }

The following program uses this function:

    using System;
    using System.IO;

    public class Chapter4 {

      static void Main() {
        int[] numbers = new int[100];
        StreamReader numFile = File.OpenText("c:numbers.txt");
        for (int i = 0; i < numbers.Length; i++)
          numbers[i] = Convert.ToInt32(numFile.ReadLine(), 10);
        int searchNumber;
        Console.Write("Enter a number to search for: ");
        searchNumber = Convert.ToInt32(Console.ReadLine(), 10);

        int foundAt;
        foundAt = SeqSearch(numbers, searchNumber);
        if (foundAt >= 0)
          Console.WriteLine(searchNumber + " is in the array at position " + foundAt);
        else
          Console.WriteLine(searchNumber + " is not in the array.");
      }

      static int SeqSearch(int[] arr, int sValue) {
        for (int index = 0; index < arr.Length; index++)
          if (arr[index] == sValue)
            return index;
        return -1;
      }
    }

Searching for Minimum and Maximum Values

Computer programs are often asked to search an array (or other data structure) for minimum and maximum values. In an ordered array, searching for these values is a trivial task. Searching an unordered array, however, is a little more challenging. Let's start by looking at how to find the minimum value in an array. The algorithm is:

1. Assign the first element of the array to a variable as the minimum value.
2. Begin looping through the array, comparing each successive array element with the minimum value variable.
3. If the currently accessed array element is less than the minimum value, assign this element to the minimum value variable.
4. Continue until the last array element is accessed.
5. The minimum value is stored in the variable.
Let's look at a function, FindMin, which implements this algorithm:

    static int FindMin(int[] arr) {
      int min = arr[0];
      // Position 0 is the initial minimum, so comparisons start at position 1
      for (int i = 1; i < arr.Length; i++)
        if (arr[i] < min)
          min = arr[i];
      return min;
    }

Notice that the array search starts at position 1 and not at position 0. The 0th position is assigned as the minimum value before the loop starts, so we can start making comparisons at position 1.

The algorithm for finding the maximum value in an array works in the same way. We assign the first array element to a variable that holds the maximum value. Next we loop through the array, comparing each array element with the value stored in the variable, replacing the current value if the accessed value is greater. Here's the code:

    static int FindMax(int[] arr) {
      int max = arr[0];
      for (int i = 1; i < arr.Length; i++)
        if (arr[i] > max)
          max = arr[i];
      return max;
    }

An alternative version of these two functions could return the position of the maximum or minimum value in the array rather than the actual value.

Making Sequential Search Faster: Self-Organizing Data

The fastest successful sequential searches occur when the data element being searched for is at the beginning of the data set. You can ensure that a successfully located data item is at the beginning of the data set by moving it there after it has been found. The concept behind this strategy is that we can minimize search times by putting frequently searched-for items at the beginning of the data set.
Eventually, all the most frequently searched-for data items will be located at the beginning of the data set. This is an example of self-organization, in that the data set is organized not by the programmer before the program runs, but by the program while the program is running.

It makes sense to allow your data to organize in this way since the data being searched probably follows the "80-20" rule, meaning that 80% of the searches conducted on your data set are searching for 20% of the data in the data set. Self-organization will eventually put that 20% at the beginning of the data set, where a sequential search will find them quickly. Probability distributions such as this are called Pareto distributions, named for Vilfredo Pareto, who discovered these distributions while studying the spread of income and wealth in the late nineteenth century. See Knuth (1998, pp. 399-401) for more on probability distributions in data sets.

We can modify our SeqSearch method quite easily to include self-organization. Here's a first stab at the method (arr here refers to an array field of the surrounding class):

    static bool SeqSearch(int sValue) {
      for (int index = 0; index < arr.Length; index++)
        if (arr[index] == sValue) {
          swap(index, 0);   // move the found item to the front of the array
          return true;
        }
      return false;
    }

If the search is successful, the item found is switched with the element at the first of the array using a swap function, shown as follows:

    static void swap(int item1, int item2) {
      int temp = arr[item1];
      arr[item1] = arr[item2];
      arr[item2] = temp;
    }

The problem with the SeqSearch method as we've modified it is that frequently accessed items might be moved around quite a bit during the course of many searches.
We want to keep items that have been moved to the front of the data set there, not moved farther back when a subsequent item farther down in the set is successfully located. There are two ways we can achieve this goal.

First, we can swap found items only if they are located away from the beginning of the data set. We just have to determine what is considered far enough back in the data set to warrant swapping. Following the "80-20" rule again, we can make a rule that a data item is relocated to the beginning of the data set only if its location is outside the first 20% of the items in the data set. Here's the code for this first rewrite:

    static int SeqSearch(int sValue) {
      for (int index = 0; index < arr.Length; index++)
        if (arr[index] == sValue && index > (arr.Length * 0.2)) {
          swap(index, 0);   // outside the first 20%: relocate to the front
          return index;
        }
        else if (arr[index] == sValue)
          return index;
      return -1;
    }

The test is short-circuited: if the item isn't found in the data set, there's no reason to check where the index falls within the data set.

The other way we can rewrite the SeqSearch method is to swap a found item with the element that precedes it in the data set. Using this method, which is similar to how data is sorted using the Bubble sort, the most frequently accessed items will eventually work their way up to the front of the data set. This technique also guarantees that if an item is already at the beginning of the data set, it won't move back down. The code for this new version of SeqSearch is shown as follows:

    static int SeqSearch(int sValue) {
      for (int index = 0; index < arr.Length; index++)
        if (arr[index] == sValue) {
          if (index > 0)
            swap(index, index - 1);   // move the found item one position forward
          return index;
        }
      return -1;
    }
Either of these solutions will help your searches when, for whatever reason, you must keep your data set in an unordered sequence. In the next section, we will discuss a search algorithm that is more efficient than any of the sequential algorithms mentioned, but that only works on ordered data: the binary search.

Binary Search

When the records you are searching through are sorted into order, you can perform a more efficient search than the sequential search to find a value. This search is called a binary search. To understand how a binary search works, imagine you are trying to guess a number between 1 and 100 chosen by a friend. For every guess you make, the friend tells you if you guessed the correct number, or if your guess is too high, or if your guess is too low. The best strategy then is to choose 50 as the first guess. If that guess is too high, you should then guess 25. If 50 is too low, you should guess 75. Each time you guess, you select a new midpoint by adjusting the lower range or the upper range of the numbers (depending on whether your guess is too high or too low), which becomes your next guess. As long as you follow that strategy, you will eventually guess the correct number. Figure 4.1 demonstrates how this works if the number to be chosen is 82.

We can implement this strategy as an algorithm, the binary search algorithm. To use this algorithm, we first need our data stored in order (ascending, preferably) in an array (though other data structures will work as well). The first steps in the algorithm are to set the lower and upper bounds of the search. At the beginning of the search, this means the lower and upper bounds of the array. Then we calculate the midpoint of the array by adding the lower and upper bounds together and dividing by 2. The array element stored at this position is compared to the searched-for value. If they are the same, the value has been found and the algorithm stops. If the searched-for value is less than the midpoint value, a new upper bound is calculated by subtracting 1 from the midpoint. Otherwise, if the searched-for value is greater than the midpoint value, a new lower bound is calculated by adding 1 to the midpoint. The algorithm iterates until the lower bound exceeds the upper bound, which indicates the array has been completely searched. If this occurs, a -1 is returned, indicating that no element in the array holds the value being searched for.
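The guessing strategy is easy to express directly. Here is a short Python sketch of the midpoint strategy for the 1-to-100 guessing game (our own illustration, not from the text); it finds any secret number in at most seven guesses:

    def guess_number(secret, low=1, high=100):
        guesses = 0
        while low <= high:
            mid = (low + high) // 2     # the midpoint becomes the next guess
            guesses += 1
            if mid == secret:
                return guesses
            elif mid < secret:          # guess too low: raise the lower bound
                low = mid + 1
            else:                       # guess too high: lower the upper bound
                high = mid - 1

    print(guess_number(82))   # 6 guesses, matching the Figure 4.1 walkthrough
    print(max(guess_number(s) for s in range(1, 101)))   # never more than 7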
    Figure 4.1. A Binary Search Analogy. The guessing game with a secret number of 82: the first guess is 50 (too low, range becomes 51-100), the second guess is 75 (too low, range becomes 76-100), the third guess is 88 (too high, range becomes 76-87), the fourth guess is 81 (too low, range becomes 82-87), the fifth guess is 84 (too high, range becomes 82-83), and the sixth guess, the midpoint 82.5 rounded down to 82, is correct.

Here's the algorithm written as a C# function:

    static int binSearch(int value) {
      int upperBound, lowerBound, mid;
      upperBound = arr.Length - 1;
      lowerBound = 0;
      while (lowerBound <= upperBound) {
        mid = (upperBound + lowerBound) / 2;
        if (arr[mid] == value)
          return mid;
        else if (value < arr[mid])
          upperBound = mid - 1;
        else
          lowerBound = mid + 1;
      }
      return -1;
    }
Here's a program that uses the binary search method to search an array:

    static void Main(string[] args) {
      Random random = new Random();
      CArray mynums = new CArray(10);    // room for the ten values inserted below
      for (int i = 0; i <= 9; i++)
        mynums.Insert(random.Next(100));
      mynums.SortArr();
      mynums.showArray();
      int position = mynums.binSearch(77);
      if (position >= 0) {
        Console.WriteLine("found item");
        mynums.showArray();
      }
      else
        Console.WriteLine("Not in the array");
      Console.Read();
    }

A Recursive Binary Search Algorithm

Although the version of the binary search algorithm developed in the previous section is correct, it's not really a natural solution to the problem. The binary search algorithm is really a recursive algorithm because, by constantly subdividing the array until we find the item we're looking for (or run out of room in the array), each subdivision expresses the problem as a smaller version of the original problem. Viewing the problem this way leads us to discover a recursive algorithm for performing a binary search.
In order for a recursive binary search algorithm to work, we have to make some changes to the code. Let's take a look at the code first, and then we'll discuss the changes we've made:

    public int RbinSearch(int value, int lower, int upper) {
      if (lower > upper)
        return -1;
      else {
        int mid;
        mid = (upper + lower) / 2;
        if (value == arr[mid])
          return mid;
        else if (value < arr[mid])
          return RbinSearch(value, lower, mid - 1);
        else
          return RbinSearch(value, mid + 1, upper);
      }
    }

The main problem with the recursive binary search algorithm, as compared to the iterative algorithm, is its efficiency. When a 1,000-element array is searched using both algorithms, the recursive algorithm is consistently about 10 times slower than the iterative algorithm. Of course, recursive algorithms are often chosen for reasons other than efficiency, but you should keep in mind that anytime you implement a recursive algorithm, you should also look for an iterative solution so that you can compare the efficiency of the two algorithms.

Finally, before we leave the subject of binary search, we should mention that the Array class has a built-in binary search method.
It takes two arguments, an array name and an item to search for, and it returns the position of the item in the array, or a negative number if the item can't be found. To demonstrate how the method works, we've written yet another binary search method for our demonstration class. Here's the code:

    public int Bsearch(int value) {
      return Array.BinarySearch(arr, value);
    }

When the built-in binary search method is compared with our custom-built method, it consistently performs about 10 times faster than the custom-built method, which should not be surprising. A built-in data structure or algorithm should always be chosen over one that is custom-built, if the two can be used in exactly the same ways.

SUMMARY

Searching a data set for a value is a ubiquitous computational operation. The simplest method of searching a data set is to start at the beginning and search for the item until either the item is found or the end of the data set is reached. This searching method works best when the data set is relatively small and unordered. If the data set is ordered, the binary search algorithm is a better choice. Binary search works by continually subdividing the data set until the item being searched for is found. You can write the binary search algorithm using both iterative and recursive code. The Array class in C# includes a built-in binary search method, which should be used whenever a binary search is called for.

EXERCISES

1. The sequential search algorithm will always find the first occurrence of an item in a data set. Create a new sequential search method that takes a second integer argument indicating which occurrence of an item you want to search for.

2. Write a sequential search method that finds the last occurrence of an item.

3. Run the binary search method on a set of unordered data. What happens?
4. Using the CArray class with the SeqSearch method and the BinSearch method, create an array of 1,000 random integers. Add a new private integer data member named compCount that is initialized to 0. In each of the search algorithms, add a line of code right after the critical comparison is made that increments compCount by 1. Run both methods, searching for the same number, say 734, with each method. Compare the values of compCount after running both methods. What is the value of compCount for each method? Which method makes the fewest comparisons?