35A. Linear and Binary Search


Linear search

Suppose function member(x, A, n) is intended to return true if x occurs among the first n numbers in array A. A search algorithm seems to be called for.

  bool member(const int x, const int A[], const int n)
  {
    for(int i = 0; i < n; i++)
    {
      if(A[i] == x)
      {
        return true;
      }
    }
    return false;
  }

The worst case of member(x, A, n) occurs when the result is false; in that case, all n values in A need to be looked at, and the time is Θ(n). This algorithm is called linear search because its time is Θ(n), and f(n) = n is a linear function.
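
To make the worst case concrete, here is a small sketch of a variant of member that reports how many array values it looked at instead of returning true or false. The name countExamined is hypothetical; it is used here only for illustration.

```cpp
// countExamined(x, A, n) returns how many of A[0], ..., A[n-1]
// a linear search looks at before it stops.
// (Hypothetical helper, used only to illustrate the cost.)

int countExamined(const int x, const int A[], const int n)
{
  for(int i = 0; i < n; i++)
  {
    if(A[i] == x)
    {
      return i + 1;   // found x after looking at i+1 values
    }
  }
  return n;           // x is absent: every value was looked at
}
```

If A holds 4, 8, 15, 16, 23, 42, then countExamined(4, A, 6) is 1, but countExamined(99, A, 6) is 6, since an unsuccessful search must look at every value.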


Binary search

You don't look up a name in a telephone book by starting at the first name and looking at every name until you find the one you want. You can take advantage of the fact that the telephone book is in alphabetical order.

If array A is in ascending order, can we take advantage of that to search it faster? Yes. To search for x among A[0], …, A[n−1], compare x to the middle value A[mid] in A, where mid = n/2. There are three possible outcomes.

  1. If x = A[mid], then return true.

  2. If x < A[mid], then x cannot be any of the values A[mid], …, A[n−1], since A is in ascending order. Look for x among A[0], …, A[mid−1].

  3. If x > A[mid], then x cannot be any of the values A[0], …, A[mid], since A is in ascending order. Look for x among A[mid+1], …, A[n−1].

In the first case, the search is finished. In the second and third cases, no more than n/2 values remain to be searched. Applied recursively, that idea yields a very efficient algorithm: the number of values left to search is cut at least in half at each step, so after only about log2(n) steps there is at most one value left, and the search can end.
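
The claim about log2(n) can be checked with a little arithmetic. The sketch below counts how many times n can be halved, using integer division, before only one value is left. The name halvingSteps is hypothetical.

```cpp
// halvingSteps(n) returns the number of times n can be halved
// (with integer division) before the result is 1 or less.
// (Hypothetical helper, used only to illustrate log2(n).)

int halvingSteps(int n)
{
  int steps = 0;
  while(n > 1)
  {
    n = n / 2;
    steps++;
  }
  return steps;
}
```

For example, halvingSteps(8) is 3 and halvingSteps(1000000) is 19, so even a million sorted values are narrowed down to one in about 20 halvings.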

Of course, there needs to be a basis case. An easy case is to notice that, if there are no values to look at, then the answer is false.

There is a small catch. Function member(x, A, n) is designed to look for x in a prefix of A. The recursive calls don't fit that; they look at segments that do not necessarily start at index 0. That is easy to fix by providing both a first index, lo, and last index, hi, to search. The mid point is (lo + hi)/2, the (approximate) average of lo and hi. The algorithm is called binary search.

  // binarySearch(x, A, lo, hi) returns true if x occurs
  // among A[lo], ..., A[hi].  If lo > hi, then
  // binarySearch(x, A, lo, hi) returns false.

  bool binarySearch(const int x, const int A[], const int lo, const int hi)
  {
    if(lo > hi)
    {
      return false;
    }
    else {
      int mid = (lo + hi)/2;
      if(x == A[mid])
      {
        return true;
      }
      else if(x < A[mid])
      {
        return binarySearch(x, A, lo, mid-1);
      }
      else
      {
        return binarySearch(x, A, mid+1, hi);
      }
    }
  }

  // member(x, A, n) returns true if x occurs among
  // the first n values in array A.

  bool member(const int x, const int A[], const int n)
  {
    return binarySearch(x, A, 0, n-1);
  }
  

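Since there are hi − lo + 1 values in the range from lo to hi, binarySearch(x, A, lo, hi) takes time proportional to about log2(hi − lo + 1) in the worst case. Consequently, member(x, A, n) takes time Θ(log2(n)) in the worst case.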
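
One way to check that estimate is to count how many values A[mid] are compared with x. The sketch below follows the same steps as binarySearch but returns that count instead of true or false. The name countProbes is hypothetical.

```cpp
// countProbes(x, A, lo, hi) follows the same steps as binarySearch
// but returns the number of values A[mid] that are compared with x.
// (Hypothetical helper, used only to illustrate the running time.)

int countProbes(const int x, const int A[], const int lo, const int hi)
{
  if(lo > hi)
  {
    return 0;
  }
  else {
    int mid = (lo + hi)/2;
    if(x == A[mid])
    {
      return 1;
    }
    else if(x < A[mid])
    {
      return 1 + countProbes(x, A, lo, mid-1);
    }
    else
    {
      return 1 + countProbes(x, A, mid+1, hi);
    }
  }
}
```

On a sorted array of 1024 values, no search makes more than 11 probes, whereas an unsuccessful linear search would look at all 1024 values.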


Insertions and deletions?

With linear search, insertions and deletions are easy. You just leave some room in the array, and add or remove values at the end of the occupied part.
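
A minimal sketch of that idea follows. It assumes the occupied part of the array is A[0], …, A[n−1] and that the array has room for capacity values. The names appendValue and removeAt are hypothetical, and filling the hole with the last occupied value is one possible way to remove a value that is not at the end.

```cpp
// appendValue(x, A, n, capacity) stores x just after the occupied
// part A[0], ..., A[n-1] and returns true. It returns false, and
// changes nothing, if there is no room left.
// (Hypothetical helper.)

bool appendValue(const int x, int A[], int& n, const int capacity)
{
  if(n >= capacity)
  {
    return false;
  }
  A[n] = x;
  n++;
  return true;
}

// removeAt(i, A, n) removes A[i] by moving the last occupied value
// into its place. It assumes 0 <= i < n. The order of the values
// is not preserved, which linear search does not need.
// (Hypothetical helper.)

void removeAt(const int i, int A[], int& n)
{
  A[i] = A[n-1];
  n--;
}
```

Both functions do a constant amount of work, no matter how many values the array holds.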

With binary search, it is not so easy. The array needs to be kept in ascending order. Inserting a value can mean moving other values to make room for the new value, and removing a value can involve moving other values to fill in the hole. Because it is possible that all of the values need to be moved, the worst-case time to insert or remove is Θ(n).
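
The shifting can be sketched as follows, under the assumption that the occupied part A[0], …, A[n−1] is in ascending order and that the array has room for capacity values. The name insertSorted is hypothetical.

```cpp
// insertSorted(x, A, n, capacity) inserts x into A[0], ..., A[n-1],
// which must be in ascending order, and leaves the occupied part in
// ascending order. It returns false, and changes nothing, if there
// is no room left.
// (Hypothetical helper.)

bool insertSorted(const int x, int A[], int& n, const int capacity)
{
  if(n >= capacity)
  {
    return false;
  }
  int i = n;
  while(i > 0 && A[i-1] > x)
  {
    A[i] = A[i-1];   // shift one larger value to the right
    i--;
  }
  A[i] = x;
  n++;
  return true;
}
```

Inserting a value that is smaller than everything already in the array makes the loop shift all n values, which is the Θ(n) worst case.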

Our goal now is to find a way of representing a set of n values so that searching, inserting and removing all take time Θ(log2(n)) in the worst case. That takes us to binary search trees.


Exercises

  1. The definition of binarySearch is tail recursive. An optimizing compiler will typically turn it into a loop.

    Rewrite binarySearch so that it is not recursive.

    Answer
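
    One possible answer is sketched below. It keeps the signature of binarySearch and replaces each recursive call by an update of the range being searched. The local variable names low and high are choices made for this sketch.

```cpp
// binarySearch(x, A, lo, hi) returns true if x occurs
// among A[lo], ..., A[hi].  If lo > hi, then
// binarySearch(x, A, lo, hi) returns false.
// A[lo], ..., A[hi] must be in ascending order.

bool binarySearch(const int x, const int A[], const int lo, const int hi)
{
  int low  = lo;    // the values still to be searched
  int high = hi;    // are A[low], ..., A[high]

  while(low <= high)
  {
    int mid = (low + high)/2;
    if(x == A[mid])
    {
      return true;
    }
    else if(x < A[mid])
    {
      high = mid - 1;   // continue in A[low], ..., A[mid-1]
    }
    else
    {
      low = mid + 1;    // continue in A[mid+1], ..., A[high]
    }
  }
  return false;         // the range is empty, so x is absent
}
```

    The loop test low <= high plays the role of the basis case: the loop stops, and the answer is false, as soon as there are no values left to look at.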