Sorting algorithms¶

Estimated time to read: 23 minutes

TODO: Note for my furune self: add complete example of how to use those algorithms

Sorting are algorithms that put elements of a list in a certain order. It is cruxial to understand the basics of sorting in order to start understanding more complex algorithms and why you have to pay attention to efficiency.

Before going deep, please watch this video:

SEIZURE WARNING!!

and this one:

Explore the concepts interactively at visualgo.net.

Try to answer the following questions, before continuing:

What are the slowest sorting algorithms?
What are the fastest sorting algorithms?
Con you infer the difference between a stable and unstable sorting algorithm?
What is the difference between a comparison and a non-comparison sorting algorithm?
What would be an in-place and a non-in-place sorting algorithm?
What is the difference between a recursive and a non-recursive sorting algorithm?

The basics¶

Many of the algorithms will have to swap elements from the array, vector or list. In order to do that, we will need to create a function that swaps two elements. Here is the function:

// A function to swap two elements
void swap(int *xp, int *yp) {  
    int temp = *xp;  
    *xp = *yp;  
    *yp = temp;  
}

The * operator used in the function signature means that the function will receive a pointer to an integer. So it will efectivelly change the content in another scope. The * operator is used to dereference a pointer, which means that it will return the value stored in the memory address pointed by the pointer. Given the declaration is int *xp, the *xp will return the value stored in the memory address pointed by xp.

Alternatively you could use the & operator to pass the reference to that variable in the similar fashion, but the usage wont be requiring the * before the variable name as follows:

// A function to swap two elements
void swap(int &xp, int &yp) {  
    int temp = xp;  
    xp = yp;  
    yp = temp;  
}

The result is the same, but the usage is different. The first one is more common in C++, while the second one is more common in C.

Bubble sort¶

Bubble sort is the simplest sorting algorithm that works by repeatedly swapping the adjacent elements if they are in wrong order.

// A function to implement bubble sort
void bubbleSort(int arr[], int n) {  
    // if the array has only one element, it is already sorted
    if(n<=1)
        return;

    int i, j;  
    for (i = 0; i < n-1; i++)
        // Last i elements are already in place  
        for (j = 0; j < n-i-1; j++)  
            if (arr[j] > arr[j+1])  
                swap(&arr[j], &arr[j+1]);  
}

As you can see, the algorithm is very simple, but it is not very efficient. It has a time complexity of O(n^2) and a space complexity of O(1).

One of the drawbacks of this algorithm is the sheer amount of swaps. In the worst scenario, it does n^2 swaps, which is a lot. If your machine have slow writes, it will be very slow.

Insertion sort¶

Insertion sort is a simple sorting algorithm that works the way we sort playing cards in our hands. You pick one card and insert it in the correct position in the sorted part of the list. You repeat this process until you have sorted the whole list. Here is the code:

// A function to implement insertion sort
void insertionSort(int arr[], int n) {  
    // if the array has only one element, it is already sorted
    if(n<=1)
        return;

    int i, key, j;  
    for (i = 1; i < n; i++) {  
        key = arr[i];  
        j = i - 1;  

        /* Move elements of arr[0..i-1], that are  
        greater than key, to one position ahead  
        of their current position */
        while (j >= 0 && arr[j] > key) {  
            arr[j + 1] = arr[j];  
            j = j - 1;  
        }  
        arr[j + 1] = key;  
    }  
}

It falls in the same category of algorithms that are very simple, but not very efficient. It has a time complexity of O(n^2) and a space complexity of O(1).

Although it have the same complexity as bubble sort, it is a little bit more efficient. It does less swaps than bubble sort, but it is still not very efficient. It will swap all numbers to the left of the current number, which is a lot of swaps.

Selection sort¶

Selection sort is a simple sorting algorithm. This sorting algorithm is an in-place comparison-based algorithm in which the list is divided into two parts, the sorted part at the left end and the unsorted part at the right end. Initially, the sorted part is empty and the unsorted part is the entire list. The smallest element is selected from the unsorted array and swapped with the leftmost element, and that element becomes a part of the sorted array. This process continues moving unsorted array boundary by one element to the right. Here is the code:

// A function to implement selection sort
void selectionSort(int arr[], int n) {
    // if the array has only one element, it is already sorted
    if(n<=1)
        return;

    int i, j, min_idx;  

    // One by one move boundary of unsorted subarray  
    for (i = 0; i < n-1; i++) {  
        // Find the minimum element in unsorted array  
        min_idx = i;  
        for (j = i+1; j < n; j++)  
        if (arr[j] < arr[min_idx])  
            min_idx = j;  

        // Swap the found minimum element with the first element  
        swap(&arr[min_idx], &arr[i]);  
    }  
}

It is also a simple algorithm, but it is a little bit more efficient than the previous two. It has a time complexity of O(n^2) and a space complexity of O(1).

It does less swaps than the previous two algorithms, potentially n swaps, but it is still not very efficient. It selects for the current position, the smallest number to the right of it and swaps it with the current number. It does this for every number in the list, which fatally a lot of swaps.

Merge sort¶

Merge sort is a divide and conquer algorithm. It divides input array in two halves, calls itself for the two halves and then merges the two sorted halves. Here is the code:

// recursive merge sort
void mergeSort(int arr[], int l, int r) {  
    if (l < r) {  
        // Same as (l+r)/2, but avoids overflow for  
        // large l and h  
        int m = l+(r-l)/2;  

        // Sort first and second halves  
        mergeSort(arr, l, m);  
        mergeSort(arr, m+1, r);  

        merge(arr, l, m, r);  
    }  
}  

// merge function
void merge(int arr[], int l, int m, int r) {  
    int i, j, k;  
    int n1 = m - l + 1;  
    int n2 =  r - m;  

    // allocate memory for the sub arrays
    int *L = new int[n1];
    int *R = new int[n2];

    /* Copy data to temp arrays L[] and R[] */
    for (i = 0; i < n1; i++)  
        L[i] = arr[l + i];  
    for (j = 0; j < n2; j++)  
        R[j] = arr[m + 1+ j];  

    /* Merge the temp arrays back into arr[l..r]*/
    i = 0; // Initial index of first subarray  
    j = 0; // Initial index of second subarray  
    k = l; // Initial index of merged subarray  
    while (i < n1 && j < n2) {  
        if (L[i] <= R[j]) {  
            arr[k] = L[i];  
            i++;  
        }  
        else {  
            arr[k] = R[j];  
            j++;  
        }  
        k++;  
    }  

    /* Copy the remaining elements of L[], if there are any */
    while (i < n1) {  
        arr[k] = L[i];  
        i++;  
        k++;  
    }  

    /* Copy the remaining elements of R[], if there  
    are any */
    while (j < n2) {  
        arr[k] = R[j];  
        j++;  
        k++;  
    }

    // deallocate memory
    delete[] L;
    delete[] R;
}

It is a very efficient algorithm that needs extra memory to work. It has a time complexity of O(n*log(n)) and a space complexity of O(n). It is a very efficient algorithm, but it is not very simple. It is quite more complex than the previous algorithms. It is a divide and conquer algorithm, which means that it divides the problem in smaller problems and solves them. It divides the list in two halves, sorts them and then merges them. It does this recursively until it has a list of size 1, which is sorted. Then it merges the lists and returns the sorted list.

Quick sort¶

Quick sort is a divide and conquer algorithm. It picks an element as pivot and partitions the given array around the picked pivot. Here is the code:

// recursive quick sort
void quickSort(int arr[], int low, int high) {  
    if (low < high) {  
        /* pi is partitioning index, arr[p] is now  
        at right place */
        int pi = partition(arr, low, high);  

        // Separately sort elements before  
        // partition and after partition  
        quickSort(arr, low, pi - 1);  
        quickSort(arr, pi + 1, high);  
    }  
}

// partition function
int partition (int arr[], int low, int high) {  
    int pivot = arr[high]; // pivot  
    int i = (low - 1); // Index of smaller element  

    for (int j = low; j <= high- 1; j++) {  
        // If current element is smaller than or  
        // equal to pivot  
        if (arr[j] <= pivot) {  
            i++; // increment index of smaller element  
            swap(&arr[i], &arr[j]);  
        }  
    }  
    swap(&arr[i + 1], &arr[high]);  
    return (i + 1);  
}

It is a very efficient algorithm that don't needs extra memory, which means it is in-place. In average, it can be as fast as mergesort with time complexity of O(n*log(n)), but in the worst case it can be as slow as O(n^2). But it is a better choice if you are not allowed to use extra memory. It is a divide and conquer algorithm, which means that it divides the problem in smaller problems and solves them. It selects a pivot and partitions the list around the pivot. It does this recursively until it has a list of size 1, which is sorted. Then it merges the lists and returns the sorted list.

Counting sort¶

Counting sort is a specialized algorithm for sorting numbers. It only works well if you have a small range of numbers. It counts the number of occurrences of each number and then uses the count to place the numbers in the right position. Here is the code:

// counting sort
void countingSort(int arr[], int n) {  
    // if the array has only one element, it is already sorted
    if(n<=1)
        return;

    int max=arr[0];
    int min[0];

    // find the max and min number
    for(int i=0; i<n; i++) {
        if(arr[i]>max) {
            max=arr[i];
        }
        if(arr[i]<min) {
            min=arr[i];
        }
    }

    // allocate memory for the count array
    int *count = new int[max-min+1];

    // initialize the count array
    for(int i=0; i<max-min+1; i++) {
        count[i]=0;
    }

    // count the number of occurrences of each number
    for(int i=0; i<n; i++) {
        count[arr[i]-min]++;
    }

    // place the numbers in the right position
    int j=0;
    for(int i=0; i<max-min+1; i++) {
        while(count[i]>0) {
            arr[j]=i+min;
            j++;
            count[i]--;
        }
    }

    // deallocate memory
    delete[] count;
}

Counting sort is a very efficient sorting algorithm which do not rely on comparisons. It has a time complexity of O(n+k) where k is the range of numbers. Space complexity is O(k) which means it is not an in-place sorting algorithm. It is a very efficient algorithm, but it is not very simple. It counts the number of occurrences of each number and then uses the count to place the numbers in the right position.

Radix sort¶

Radix sort is a specialized algorithm for sorting numbers. It only works well if you have a small range of numbers. It sorts the numbers by their digits. Here is the code:

// Radix sort
void radixSort(int arr[], int n) {
    // if the array has only one element, return
    if(n<=1)
        return;

    // initialize the max number as the first number. 
    int max=arr[0];

    // find the max number
    for(int i=0; i<n; i++) {
        if(arr[i]>max) {
            max=arr[i];
        }
    }

    // allocate memory for the count array
    int *count = new int[10]; // 10 digits

    // allocate memory for the output array
    int *output = new int[n];

    // do counting sort for every digit
    for(int exp=1; max/exp>0; exp*=10) {
        // initialize the count array
        for(int i=0; i<10; i++) {
            count[i]=0;
        }

        // count the number of occurrences of each number
        for(int i=0; i<n; i++) {
            count[(arr[i]/exp)%10]++;
        }

        // change count[i] so that count[i] now contains actual position of this digit in output[]
        for(int i=1; i<10; i++) {
            count[i]+=count[i-1];
        }

        // build the output array
        for(int i=n-1; i>=0; i--) {
            output[count[(arr[i]/exp)%10]-1]=arr[i];
            count[(arr[i]/exp)%10]--;
        }

        // copy the output array to the input array
        for(int i=0; i<n; i++) {
            arr[i]=output[i];
        }
    }
}

Radix sort is just a counting sort that is applied to every digit. It has a time complexity of O(n*k) where k is the number of digits.

Conclusion¶

This is the first time we will talk about efficiency, and for now on, you will start evaluating and taking care about your algorithms' efficiency. You will learn more about efficiency in the next semester and course when we cover data structures.