Make the frequent cases fast and the rare case correct

Order Statistics – General (Find the Kth smallest element from an unsorted Array in Linear Time) – Specific ( Find median of unsorted array in Linear Time) – Randomised and Deterministic Algorithms

Given an unsorted array, how quickly can one find the median element?

In a simple way we can use a sorting algorithm initially probably a O(n lg n) algorithm and then finding a median or k^{th} smallest element is a trivial matter.

Can one do it more quickly than by sorting? This was an open question for some time, solved affirmatively in 1972 by (Manuel) Blum, Floyd, Pratt, Rivest, and Tarjan. This Post implements a Linear time Randomised algorithm (expected run time is Linear) based on Divide and Conquer Technique which Quicksort makes use of and also the worst case linear time Deterministic Algorithm

The Analysis of the Randomised select is best explained in the video. Both Randomised and Deterministic(worst case Linear time algorithms are explained in the MIT lecture)

Randomised Select Algorithm:

Algorithm :
QuickSelect:
Given array A of size n and integer k ≤ n,
1. Pick a pivot element p at random from A.
2. Split A into subarrays LESS and GREATER by comparing each element to p as in Quicksort. While we are at it, count the number L of elements going in to LESS.
3. (a) If L = k − 1, then output p.
(b) If L > k − 1, output QuickSelect(LESS, k).
(c) If L < k − 1, output QuickSelect(GREATER, k − L − 1)

Thus, equipped with this O(n) algorithm below is python implementation :

#AUTHOR : Chinmay Lokesh
#Source Algorithm : MIT lecture 6 and PDF lecture in this article
import random
import time
def RandSelect(A,k,length):
#let r be chosen uniformly at random in the range 1 to length(A)
n = length-1
r = random.randint(0, length-1)
A1 = []
A2 = []
pivot = A[r]
# lesser and bigger array
for i in range ( 0 , n+1):
if A[i] < pivot :
A1.append(A[i])
if A[i] > pivot :
A2.append(A[i])
if k <= len(A1):
# search in list of small elements
return RandSelect(A1, k ,len(A1))
if k > len(A) - len(A2):
# search in the pile of big elements
return RandSelect(A2, k - (len(A) - len(A2)) , len(A2))
else :
return pivot
A = range(1,1000001)
random.shuffle(A)
length = len(A)
start = time.strftime('%s')
value = RandSelect(A,450000,length)
print value
end = time.strftime('%s')
time = int(end) - int(start)
print "Time to execute the Randomized select on the input =", time ,"Second(s)"

NOTE : In the video @ 16:20 the Prof mentions that input is an unique array as we are more interested in it. However even if we have a duplicate element array we can eliminate those by using a O(n) duplicate elimination algorithm. a simple addition to the quickselect.

OUTPUT : Input is a shuffled list of unique numbers from 1 to 1 million. Hence it is obvious to get the kth number between 1-1million at kth postion.

labuser@ubuntu:~$ python quickselect.py
450000
Time to execute the Randomized select on the input = 4 Second(s)
labuser@ubuntu:~$

Randomised algo Summary:
Expected Run time : O(n)
Worst case : O(n^{2})

Deterministic Algorithm : Worst case Linear time – Median of Medians

The algorithm in words:
1. Divide n elements into groups of 5

2. Find median of each group (How? How long?)
3. Use Select() recursively to find median x of the ?n/5? medians
4. Partition the n elements around x. Let k = rank(x)
5. if (i == k) then return x
if (i k) use Select() recursively to find (i-k)th smallest element in last partition

I am Delighted that my friend implemented this taking time from his busy full time software engineering job.

import java.util.HashSet;
import java.util.Scanner;
public class Median
{
//BSelect by Blum, Floyd, Pratt, Rivest, Tarjon (1972)
//@Author Kishore Ravindran
//@Date of Creating 21-Feb-2011
//Algorithm: DeterministicSelect
//Step 1: Break n elements into blocks of 5 each.
//Step 2: Compute the median of each 5-element block in constant time.
//Step 3: Collect together the n/5 medians and recursively compute their median.
//Step 4: Chose the resulting element as a pivot
public static void main (String[] args)
{
System.out.println("Please enter the size of the array");
Scanner myscan = new Scanner(System.in);
// Handle InputMismatchException exception here;
int number = myscan.nextInt();
Integer [] input = new Integer[number];
System.out.println("Please enter the array elements");
System.out.println("");
for(int i=0;i<number;i++)
{
input[i] = myscan.nextInt();
}
System.out.println("Enter a number less than the size of the array");
System.out.println("");
// Check and InputMismatchException exception needs to be handled here;
number = myscan.nextInt();
Median m = new Median();
System.out.println("The " + number + "th smallest element in the input is " + m.DeterministicSelect(input, number));
}
// simple module to find the median of a array
int median(Integer[] array)
{
if (array.length == 1)
{
return array[0];
}
int smallerCount = 0;
for (int i = 0 ; i < array.length ; i++)
{
for (int j = 0 ; j < array.length ; j++)
{
if (array[i] < array[j])
{
smallerCount++;
}
}
if (smallerCount == (array.length - 1)/2)
{
return array[i];
}
smallerCount = 0;
}
return -1; //should never happen
}
// finds pivot element of a given array recursively using DeterministicSelect
int Pivot(Integer[] array)
{
if (array.length == 1)
{
return array[0];
}
//Divide A into n/5 groups of 5 elements each
int numGroups = array.length / 5;
if (array.length % 5 > 0)
{
numGroups++;
}
Integer[] setOfMedians = new Integer[numGroups];
for (int i = 0 ; i < numGroups ; i++)
{
Integer[] subset;
if(array.length % 5 > 0)
{
if (i == numGroups - 1)
{
subset = new Integer[array.length % 5];
}
else
{
subset = new Integer[5];
}
}
else
{
subset = new Integer[5];
}
for (int j = 0; j < subset.length ; j++)
{
subset[j] = array[5*i+j];
}
//Find the median of each group
setOfMedians[i] = median(subset);
}
//Use DeterministicSelect to find the median, p, of the n/5 medians
int goodPivot = DeterministicSelect(setOfMedians, setOfMedians.length/2 );
return goodPivot;
}
//The algorithm in words:
//1. Divide n elements into groups of 5
//2. Find median of each group (How? How long?)
//3. Use Select() recursively to find median x of the n/5? medians
//4. Partition the n elements around x. Let k = rank(x)
//5. if (i == k) then return x else (i > k) use Select() recursively to find (i-k)th
//smallest element in last partition
//source
//Lecture PDF mentioned in the blog post
//and MIT Lecture 6 order statistics.
int DeterministicSelect(Integer[] array, int k)
{
if (array.length == 1)
{
return array[0];
}
int pivot = Pivot(array);
//set is used to ignore duplicate values
HashSet<Integer> A1 = new HashSet<Integer>();
HashSet<Integer> A2 = new HashSet<Integer>();
for (int i = 0; i < array.length ; i++)
{
if (array[i] < pivot)
{
A1.add(array[i]);
}
else if (array[i] > pivot)
{
A2.add(array[i]);
}
}
if (A1.size() >= k)
{
return DeterministicSelect(A1.toArray(new Integer[0]) ,k);
}
else if (A1.size() == k-1)
{
return pivot;
}
else
{
return DeterministicSelect(A2.toArray(new Integer[0]) , k - A1.size() - 1);
}
}
}

OUTPUT:

labuser@ubuntu:~$ java Median
Please enter the size of the array
9
Please enter the array elements

3
2
6
4
5
1
8
9
10
Enter a number less than the size of the array

4
The 4th smallest element in the input is 4
labuser@ubuntu:~$

NOTE : In the above implementation O(n^{2} sort algorithm is used for finding median in the groups. But we can use a O(nlgn) sort.The sort part is trivial and not much of concern here

The median-calculating recursive call does not exceed worst-case linear behavior because the list of medians is 20% of the size of the list, while the other recursive call recurs on at most 70% of the list, making the running time

T(n) ≤ T(n/5) + T(7n/10) + O(n)

The O(n) is for the partitioning work (we visited each element a constant number of times, in order to form them into O(n) groups and take each median in O(1) time). From this, one can then show that T(n) ≤ c*n*(1 + (9/10) + (9/10)2 + …) = O(n).
[edit] Important notes

Although this approach optimizes quite well, it is typically outperformed in practice by the expected linear algorithm with random pivot choices.

The worst-case algorithm can construct a worst-case O(n log n) quicksort algorithm, by using it to find the median at every step.

This article does some advanced analysis and shows theorems from the Art of Programming book by Dr Knuth

This post has only given the problem statement algorithm and implementation. However the most crucial part “THE ANALYSIS” is in the MIT video Lecture and I personally feel knowing the analysis is of far greater importance than the implementation.

10 responses to “Order Statistics – General (Find the Kth smallest element from an unsorted Array in Linear Time) – Specific ( Find median of unsorted array in Linear Time) – Randomised and Deterministic Algorithms”

Your absolutely right, analysis is more critical that implementing the algo, I believe Deterministic select algorithm is far more easier to analyse than the Randomised select algorithm.

Algo for Deterministic select algorithm… I have implemented it.. why dont u share it in ur blog 🙂
/*
* Procedure: Select(A,l,h,k)
A: Array
l: first element index
h: last element index
k: the position of element to be found (i.e. this procedure find kth element of array A[l…h]

1) Let q be the length of the array q = h – l;
2) if q<44, sort the array and return A[k]; //this 44 number comes from complexity analysis of this algo.
// can be ignored
3) take 5 elements of array at once and find their medians.
4) Now we have q/5 medians. Let array Q store these medians.
5) MediaOfMedians = Select(Q,1,q/5,q/10);
6) Divide Array A into three parts, A1,A2 and A3 such that
A1 = Elements of A MedianOfMedians
7) If |A1| > k return Select(A1,1,|A1|,k);
Else if |A1|+|A2| > k return MedianOfMedians
Else return Select(A3,1,|A3|,k – |A1| – |A2|);
*/

if there douplicated values it will not give right Kth ,
i sugesst the next soulation
– using three list to store the number
A1: the number less than the pivote value
A2: the number more than the pivote value
A3 : the number equal to the pivote value
– changing the condition as next
if (A1.count >= k) then DeterministicSelect(A1 ,k);
else if (A1.count == k- A3.count ) return pivote
else DeterministicSelect(A2.toArray(new Integer[0]) , k – A1.count- A3.count);

Appreciate your suggestion, but I guess we could just go ahead and use a duplicate element elimination algo and put it as input to the above algo.if you noticed the video and article is meant for an unique element array. Jump to 16:20 of the video.

What is the easiest way to get this to work for an array with duplicate elements. For example: if A={1,1,2,2,3}, median should be 2 [ (length/2)+1 th element ] and NOT 3.

Your absolutely right, analysis is more critical that implementing the algo, I believe Deterministic select algorithm is far more easier to analyse than the Randomised select algorithm.

Algo for Deterministic select algorithm… I have implemented it.. why dont u share it in ur blog 🙂

/*

* Procedure: Select(A,l,h,k)

A: Array

l: first element index

h: last element index

k: the position of element to be found (i.e. this procedure find kth element of array A[l…h]

1) Let q be the length of the array q = h – l;

2) if q<44, sort the array and return A[k]; //this 44 number comes from complexity analysis of this algo.

// can be ignored

3) take 5 elements of array at once and find their medians.

4) Now we have q/5 medians. Let array Q store these medians.

5) MediaOfMedians = Select(Q,1,q/5,q/10);

6) Divide Array A into three parts, A1,A2 and A3 such that

A1 = Elements of A MedianOfMedians

7) If |A1| > k return Select(A1,1,|A1|,k);

Else if |A1|+|A2| > k return MedianOfMedians

Else return Select(A3,1,|A3|,k – |A1| – |A2|);

*/

Thanks for the implementation . I will put this up shortly

Good explanation!

It is a great post. Thanks for the implementation.

Thnx for the implementation..:)

if there douplicated values it will not give right Kth ,

i sugesst the next soulation

– using three list to store the number

A1: the number less than the pivote value

A2: the number more than the pivote value

A3 : the number equal to the pivote value

– changing the condition as next

if (A1.count >= k) then DeterministicSelect(A1 ,k);

else if (A1.count == k- A3.count ) return pivote

else DeterministicSelect(A2.toArray(new Integer[0]) , k – A1.count- A3.count);

Appreciate your suggestion, but I guess we could just go ahead and use a duplicate element elimination algo and put it as input to the above algo.if you noticed the video and article is meant for an unique element array. Jump to 16:20 of the video.

What is the easiest way to get this to work for an array with duplicate elements. For example: if A={1,1,2,2,3}, median should be 2 [ (length/2)+1 th element ] and NOT 3.

NOTE : In the video @ 16:20 the Prof mentions that input is an unique array. You can remove duplicates and set it as input.