I Human Network: 4.1 Sequential Searches

Let's examine how long it will take to find an item matching a key in the collections we have discussed so far. We're interested in:

the average time
the worst-case time and
the best possible time.

However, we will generally be most concerned with the worst-case time as calculations based on worst-case times can lead to guaranteed performance predictions. Conveniently, the worst-case times are generally easier to calculate than average times. If there are n items in our collection - whether it is stored as an array or as a linked list - then it is obvious that in the worst case, when there is no item in the collection with the desired key, then n comparisons of the key with keys of the items in the collection will have to be made.

To simplify analysis and comparison of algorithms, we look for a dominant operation and count the number of times that dominant operation has to be performed. In the case of searching, the dominant operation is the comparison, since the search requires n comparisons in the worst case, we say this is a O(n) (pronounce this "big-Oh-n" or "Oh-n") algorithm. The best case - in which the first comparison returns a match - requires a single comparison and is O(1). The average time depends on the probability that the key will be found in the collection - this is something that we would not expect to know in the majority of cases. Thus in this case, as in most others, estimation of the average time is of little utility. If the performance of the system is vital, i.e. it's part of a life-critical system, then we must use the worst case in our design calculations as it represents the best guaranteed performance.

The simplest algorithm to search a dictionary for a given key is to test successively against each element.

This works correctly regardless of the order of the elements in the list. However, in the worst case all elements might have to be tested.

Procedure Search(head:pointer, key:item):pointer;
Var
   p:pointer;
   found:boolean;
Begin
   found:=false;
   p:=head;
   While (p # NIL) AND (not found) Do
    Begin
       If (p^.info = key) then 
          found = true;
       Else
          p = p^.next;
   End;
   return p;
END;

With and Without Sentinels

A sentinel is a value placed at the end of an array to insure that the normal case of searching returns something even if the item is not found. It is a way to simplify coding by eliminating the special case.

MODULE LinearSearch EXPORTS Main;  (*1.12.94. LB*)
(* Linear search without a sentinel *)

  ...

  i:= FIRST(a);
  WHILE (i <= last) AND NOT Text.Equal(a[i], x) DO INC(i) END;

  IF i > last THEN
    SIO.PutText("NOT found");
  ELSE
    SIO.PutText("Found at position: ");
    SIO.PutInt(i)
  END; (*IF i > last*)
  SIO.Nl();
END LinearSearch.

The sentinel insures that the search will eventually succeed:

 
MODULE SentinelSearch EXPORTS Main;      (*27.10.93. LB*)
(* Linear search with sentinel. *)

...

(* Do search *)
  a[LAST(a)]:= x; (*sentinel at position N+1*)
  i:= FIRST(a);
  WHILE x # a[i] DO INC(i) END;

(* Output result *)
  IF i = LAST(a) THEN
    SIO.PutText("NOT found");
  ELSE
    SIO.PutText("Found at position: "); SIO.PutInt(i)
  END;
  SIO.Nl();
 END SentinelSearch.

Weighted Sequential Search

Sometimes sequential search is not a bad algorithm, especially when the list isn't long. After all, sequential search is easier to implement than binary search, and does not require the list to be sorted.

However, if we are going to do a sequential search, what order do we want the elements? Sorted order in a linked list doesn't really help, except maybe to help us stop early if the item isn't in the list.

Suppose you were organizing your personal phone book for sequential search. You would want your most frequently called friends to be at the front: In sequential search, you want the keys ordered by frequency of use!

Why? If

is the probability of searching for the ith key, which is a distance

from the front, the expected search time is

which is minimized by placing the list in decreasing probability of access order.

For the list (Cheryl,0.4), (Lisa,0.25), (Lori,0.2), (Lauren,0.15), the expected search time is:

If access probability had been uniform, the expected search time would have been

So I win using this order, and win even more if the access probabilities are furthered skewed.

But how do I find the probabilities?

Self-Organizing Lists

Since it is often impractical to compute usage frequencies, and because usage frequencies often change in the middle of a program (locality), we would like our data structure to automatically adjust to the distribution.

Such data structures are called self-organizing.

The idea is to use a heuristic to move an element forward in the list whenever it is accessed. There are two possibilities:

Move forward one is the ``conservative'' approach. (1,2,3,4,5) becomes (1,2,4,3,5) after a Find(4).
Move to front is the ``liberal'' approach. (1,2,3,4,5) becomes (4,1,2,3,5) after a Find(4).

Which Heuristic is Better?

Move-forward-one can get caught in traps which won't fool move-to-front:

For list (1,2,3,4,5,6,7,8), the queries Find(8), Find(7), Find(8), Find(7), ... will search the entire list every time. With move-to-front, it averages only two comparisons per query!

In fact, it can be shown that the total search time with move-to-front is never more than twice the time if you knew the actual probabilities in advance!!

We will see self-organization again later in the semester when we talk about splay trees.

Let's Play 20 Questions!

1.                       11.
2.                       12.
3.                       13.
4.                       14.
5.                       15.
6.                       16.
7.                       17.
8.                       18.
9.                       19.
10.                      20.

Example in C:

In this example, there exists an element in the index array for every 8 elements in the main array.

In real-world implementations, this number will be different. You can change it accordingly.

Snippet

#include <stdio.h>

#define MAX 24

void createIndex(int index[],int isize,int arr[],int asize)
{
    int i,j;
    for(i=0,j=0;i<asize;i+=8,j++)
    {
        index[j]= arr[i];
    }
    index[j] = arr[asize-1];
}

int indexSeqSearch(int val, int index[], int isize, int arr[], int asize)
{
    int i=0,j=0,pos=0;
    int high=0,low=0;

    if(val > index[isize-1] && val < index[0])
        return -1;

    while(i<isize)
    {
        if(val == index[i])
        {
            pos = 8 * i;
            return pos;
        }
        if(val < index[i])
        {
            low = 8 * (i-1);
            high = 8 * i;
            break;
        }
        else
        {
            low = 8 * i;
            high = 8 * (i+1);
        }
        i++;
    }
    while(low < high)
    {
        if(val == arr[low])
            return low;
        else
            low++;
    }
    return -1;
}

int main()
{
    int arr[MAX]={ 8,14,26,38,72,115,306,
        321,329,387,409,512,540,567,583,592,602,611,618,741,798,811,814,876};

    int index[(MAX/8)+1]={0};

    createIndex(&index[0],(MAX/8)+1,&arr[0],MAX);
    int opt=0,pos=0;
    while(opt < MAX)
    {
        pos = indexSeqSearch(arr[opt],&index[0],(MAX/8)+1,&arr[0],MAX);
        if( pos != -1)
        {
            printf("\n%d found at position %d",arr[opt],pos);
        }
        else
            printf("\n%d not found.",arr[opt]);

        opt++;
    }

    return 0;
}

Example in C++

#include <iostream.h>
void main()
{
int arr[25];
int n,item;
cout<<"How many element you want to enter in array:";
cin>>n;
for(int i=0;i<n;i++)
{
cout<<"enter the element "<<":";
cin>>arr[i];
}
cout<<"Enter the no. you want to search: ";
cin>>item;
for(i=0;i<n;i++)
{
if(arr[i]==item)
{
cout<<" found at position ";
break;
}
}
if(i==n)
cout<<" not found in the array";

}

Output:

How many element you want to enter in array:5
enter the element 1:10
enter the element 2:25
enter the element 3:7
enter the element 4:12
enter the element 5:100
Enter the no. you want to search:12
12 found at position 4
Press any key to continue

Home

4.1 Sequential Searches

Snippet

Cisco Q & A

TeleCom

Programming

Projects in C++

Popular Read