Sorting is one of the most important operations performed by computers. In the days of magnetic tape storage before modern data-bases, it was almost certainly the most common operation performed by computers as most "database" updating was done by sorting transactions and merging them with a master file. It's still important for presentation of data extracted from databases: most people prefer to get reports sorted into some relevant order before wading through pages of data!
Finding better algorithms to sort a given set of data is an ongoing problem in the field of computer science. Sorting is placing a given set of data in a particular order. Simple sorts place data in ascending or descending order. For discussion purposes, we will look at sorting data in ascending order. However, you may modify the code to sort the data in descending order by reversing the relational operators (i.e. change 'nums[j] < nums[j-1]' to 'nums[j] > nums[j-1]').
In this lesson we will analyze sorts of different efficiency, and discuss when and where they can be used. In order to simplify the explanation of certain algorithms, we will assume a swap() function exists that switches the values of two int variables. An example of such a function is displayed below.
void swap(int &item1, int &item2) // reference parameters point directly to the storage locations of the variables passed; no local copies are made, and the new values persist after the function returns. See 'Functions' in the preliminary lesson for further information.
{
int tmp;
tmp = item1;
item1 = item2;
item2 = tmp;
}
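The reference parameters are what make this helper work; a by-value version would silently do nothing. A minimal sketch contrasting the two (the function names here are illustrative, not part of the lesson's code):

```cpp
#include <cassert>

// By reference: item1 and item2 alias the caller's variables,
// so the exchange is visible after the call returns.
void swap_by_ref(int &item1, int &item2) {
    int tmp = item1;
    item1 = item2;
    item2 = tmp;
}

// By value: only local copies are exchanged; the caller's
// variables are unchanged when the function returns.
void swap_by_value(int item1, int item2) {
    int tmp = item1;
    item1 = item2;
    item2 = tmp;
}
```

After swap_by_ref(a, b) the two variables hold each other's old values; after swap_by_value(a, b) they are untouched.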
We will first analyze sorts that are O(N^2). These sorts are very easy to understand; however, they are very slow when there are many elements to be sorted.
The first sort we will look at is called the insertion sort. The algorithm processes each element in turn and compares it to the elements before it. The first element has no elements before it for comparison, so it is left alone. In the next iteration, the second element is evaluated. It is compared to the element directly before it, which is the first element in the structure. If the second element has a value less than the first, their positions are switched. If the second element is greater than the first, they are left as they are, and the third element is processed. The third element is then compared to the second element in the new list ('new' list here since the first two items may have been swapped). If it is less than the second, they are swapped, and it is then compared to the first element. If it is greater than the second element, it is left in place and the process continues to the next element.
In short, each element is moved to the front of the list by switching positions with the previous elements as long as it is smaller than the elements before it.
The algorithm is programmed using two nested for loops. The outer loop performs n-1 iterations, where n is the number of elements in the list. Since element[0] does not have any elements before it to compare to, we start with the second element. The nested loop starts at the element that is being processed by the outer loop and works backwards, comparing the element to the one before it. If it is smaller, a swap is made and the loop continues. If it is larger, the loop ends, and the next iteration of the outer loop begins. The code for the insertion sort algorithm is shown below. A standard array of int variables is used for simplicity. However, you can modify the code to work for any linear structure.
void InsertionSort(int *nums, int n) // array called nums with n elements to be sorted
{
for(int i=1; i<n; i++)
for(int j=i; (j>0) && (nums[j]<nums[j-1]); j--)
swap(nums[j], nums[j-1]);
}
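To see the routine in action, here is a self-contained sketch of the same algorithm together with a small usage check (swap_vals stands in for the swap() helper from earlier in the lesson):

```cpp
#include <cassert>

// Reference-parameter swap, as described earlier in the lesson.
void swap_vals(int &a, int &b) {
    int t = a;
    a = b;
    b = t;
}

// Insertion sort: element j sinks toward the front of the array
// as long as it is smaller than the element directly before it.
void InsertionSort(int *nums, int n) {
    for (int i = 1; i < n; i++)
        for (int j = i; j > 0 && nums[j] < nums[j-1]; j--)
            swap_vals(nums[j], nums[j-1]);
}
```

Calling InsertionSort on {5, 1, 12, -5, 16, 2, 12, 14} leaves the array in ascending order, with the two equal values (12) keeping their relative order, since equal elements are never swapped past each other.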
7.1 Bubble, Selection, Insertion Sorts
There are a large number of variations of one basic strategy for sorting. It's the same strategy that you use for sorting your bridge hand. You pick up a card, start at the beginning of your hand and find the place to insert the new card, insert it and move all the others up one place.
/* Insertion sort for integers */
void insertion( int a[], int n )
/* Pre-condition: a contains n items to be sorted */
{
    int i, j, v;
    /* Initially, the first item is considered 'sorted' */
    /* i divides a into a sorted region, x < i, and an unsorted one, x >= i */
    for(i=1;i<n;i++) {
        /* Select the item at the beginning of the as yet unsorted section */
        v = a[i];
        /* Work backwards through the array, finding where v should go */
        j = i;
        /* If this element is greater than v, move it up one */
        while ( a[j-1] > v ) {
            a[j] = a[j-1];
            j = j-1;
            if ( j <= 0 ) break;
        }
        /* Stopped when a[j-1] <= v, so put v at position j */
        a[j] = v;
    }
}
Bubble Sort
The next O(N^2) algorithm that we will analyze is the bubble sort. The bubble sort works from the bottom up (back to front), comparing each element to the one before it. If the element on the bottom has a smaller value than the one on top, the two are swapped; if not, they remain in their original positions. The algorithm then compares the next two elements from the bottom up, no matter what the outcome of the previous comparison was. In this fashion, the smallest value "bubbles up" to the top in each iteration. In subsequent passes, the values that were bubbled up in previous iterations are no longer compared, since they are already in place.
The code for the bubble sort is shown below, using a standard array of int variables. Two nested for loops are used. The outer loop performs n-1 iterations; in each iteration, at least one element is set into its proper sorted position. The inner for loop runs from the bottom up, comparing adjacent values, and stops at the group of values that have already been set in place; this boundary moves down one position with each iteration of the outer loop.
void BubbleSort(int *nums, int n)
{
for (int i=0; i<n-1; i++)
for (int j=n-1; j>i; j--)
if(nums[j] < nums[j-1])
swap(nums[j], nums[j-1]);
}
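As a quick check, here is a self-contained sketch of the same routine with the swap written inline, plus a small usage example:

```cpp
#include <cassert>

// Bubble sort: on each outer pass, the smallest remaining value
// bubbles up (toward index i) through adjacent swaps; positions
// below i are already sorted and are not revisited.
void BubbleSort(int *nums, int n) {
    for (int i = 0; i < n - 1; i++)
        for (int j = n - 1; j > i; j--)
            if (nums[j] < nums[j-1]) {
                int t = nums[j];       // inline swap of adjacent elements
                nums[j] = nums[j-1];
                nums[j-1] = t;
            }
}
```

Sorting {5, 1, 12, -5, 16, 2, 12, 14} with this routine produces the same ascending result as the insertion sort above; only the pattern of comparisons and swaps differs.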
Another variant of this procedure, called bubble sort, is commonly taught:
/* Bubble sort for integers */
#define SWAP(a,b)   { int t; t=a; a=b; b=t; }

void bubble( int a[], int n )
/* Pre-condition: a contains n items to be sorted */
{
    int i, j;
    /* Make n passes through the array */
    for(i=0;i<n;i++) {
        /* From the first element to the end of the unsorted section */
        for(j=1;j<(n-i);j++) {
            /* If adjacent items are out of order, swap them */
            if( a[j-1]>a[j] ) SWAP(a[j-1],a[j]);
        }
    }
}
Analysis
Each of these algorithms requires n-1 passes: each pass places one item in its correct place. (The nth item is then in the correct place also.) The ith pass makes either i or n - i comparisons and moves. So the total is:
1 + 2 + ... + (n-1) = n(n-1)/2
or O(n^2) - but we already know we can use heaps to get an O(n log n) algorithm. Thus these algorithms are only suitable for small problems where their simple code makes them faster than the more complex code of the O(n log n) algorithm. As a rule of thumb, expect to find an O(n log n) algorithm faster for n > 10 - but the exact value depends very much on individual machines!
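The n(n-1)/2 figure is easy to confirm empirically; a sketch that simply counts the inner-loop steps of the bubble sort's loop structure:

```cpp
#include <cassert>

// Counts the comparisons the bubble sort performs for n elements.
// The loop bounds match the BubbleSort/bubble code above, so the
// count is the same regardless of the data being sorted.
long comparisons(int n) {
    long count = 0;
    for (int i = 0; i < n; i++)
        for (int j = 1; j < n - i; j++)
            count++;            // one comparison per inner-loop step
    return count;
}
```

For n = 10 this yields 45 = 10*9/2, and in general (n-1) + (n-2) + ... + 1 = n(n-1)/2.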
Selection Sort
Selection sort is one of the O(n^2) sorting algorithms, which makes it quite inefficient for sorting large data volumes. Selection sort is notable for its programming simplicity, and it can outperform other sorts in certain situations (see the complexity analysis below for more details).
Algorithm
The idea of the algorithm is quite simple. The array is notionally divided into two parts - a sorted one and an unsorted one. At the beginning, the sorted part is empty, while the unsorted part contains the whole array. At every step, the algorithm finds the minimal element in the unsorted part and adds it to the end of the sorted one. When the unsorted part becomes empty, the algorithm stops.
When the algorithm sorts an array, it swaps the first element of the unsorted part with the minimal element, which is then included in the sorted part. This implementation of selection sort is not stable. If a linked list is sorted instead, and the minimal element is relinked onto the end of the sorted part rather than swapped, selection sort is stable.
Let us see an example of sorting an array to make the idea of selection sort clearer.
Example. Sort {5, 1, 12, -5, 16, 2, 12, 14} using selection sort.
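A self-contained sketch of the algorithm, with the pass-by-pass trace for this input written out in the comments (the '|' marks the boundary between the sorted and unsorted parts):

```cpp
#include <cassert>

// Selection sort: find the minimum of the unsorted part and
// swap it to the front of that part, growing the sorted prefix.
void selection_sort(int a[], int n) {
    for (int i = 0; i < n - 1; i++) {
        int minIndex = i;
        for (int j = i + 1; j < n; j++)
            if (a[j] < a[minIndex])
                minIndex = j;
        if (minIndex != i) {
            int t = a[i]; a[i] = a[minIndex]; a[minIndex] = t;
        }
    }
}

// Trace for {5, 1, 12, -5, 16, 2, 12, 14}:
//   pass 1: -5 |  1 12  5 16  2 12 14   (-5 swapped with 5)
//   pass 2: -5  1 | 12  5 16  2 12 14   (1 already minimal, no swap)
//   pass 3: -5  1  2 |  5 16 12 12 14   (2 swapped with 12)
//   pass 4: -5  1  2  5 | 16 12 12 14   (5 already minimal, no swap)
//   pass 5: -5  1  2  5 12 | 16 12 14   (12 swapped with 16)
//   pass 6: -5  1  2  5 12 12 | 16 14   (second 12 swapped with 16)
//   pass 7: -5  1  2  5 12 12 14 | 16   (14 swapped with 16)
```

Note that only three of the seven passes in this example actually perform a swap; every pass, however, scans the whole unsorted part to find its minimum.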
Complexity analysis
Selection sort stops when the unsorted part becomes empty. As we know, at every step the number of unsorted elements decreases by one. Therefore, selection sort makes n steps of the outer loop before stopping (n is the number of elements in the array). Every step of the outer loop requires finding the minimum in the unsorted part. Summing up, n + (n - 1) + (n - 2) + ... + 1 results in O(n^2) comparisons. The number of swaps may vary from zero (in the case of an already sorted array) to n - 1 (at most one swap per step of the outer loop), which results in O(n) swaps. The overall algorithm complexity is O(n^2).
The fact that selection sort requires at most n - 1 swaps makes it very efficient in situations where a write operation is significantly more expensive than a read operation.
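The swap bound is easy to check empirically; a sketch that instruments the sort to count its swaps (note that a fully reversed array needs only two swaps here, while a cyclic rotation such as {2, 3, 4, 5, 1} attains the n - 1 worst case):

```cpp
#include <cassert>

// Selection sort instrumented to count swaps: at most one swap is
// performed per outer pass, so the total never exceeds n - 1.
int selection_sort_swaps(int a[], int n) {
    int swaps = 0;
    for (int i = 0; i < n - 1; i++) {
        int minIndex = i;
        for (int j = i + 1; j < n; j++)
            if (a[j] < a[minIndex])
                minIndex = j;
        if (minIndex != i) {
            int t = a[i]; a[i] = a[minIndex]; a[minIndex] = t;
            swaps++;
        }
    }
    return swaps;
}
```

An already sorted array performs zero swaps, and no input can force more than n - 1; compare this with bubble sort, where a reversed array forces a swap on every one of its n(n-1)/2 comparisons.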
Code snippets
Java
public void selectionSort(int[] arr) {
int i, j, minIndex, tmp;
int n = arr.length;
for (i = 0; i < n - 1; i++) {
minIndex = i;
for (j = i + 1; j < n; j++)
if (arr[j] < arr[minIndex])
minIndex = j;
if (minIndex != i) {
tmp = arr[i];
arr[i] = arr[minIndex];
arr[minIndex] = tmp;
}
}
}
C++
void selectionSort(int arr[], int n) {
int i, j, minIndex, tmp;
for (i = 0; i < n - 1; i++) {
minIndex = i;
for (j = i + 1; j < n; j++)
if (arr[j] < arr[minIndex])
minIndex = j;
if (minIndex != i) {
tmp = arr[i];
arr[i] = arr[minIndex];
arr[minIndex] = tmp;
}
}
}