Regarding those quick-sort variants which do things like median-of-k pivot selection before subdividing…

A couple of years ago (before I lost access to the write-up I did at the time), it struck me that the diminishing returns of using the median of a larger sample could be amortised over multiple levels of recursion by keeping the outcome for later, because computing the median takes you a fair way towards sorting the whole sample.

So I wrote plpqsort, which might have stood for something like “pivot-list-prefixed quick sort”, and maybe it could be pronounced “plippick sort”. It’s hard to say what I was thinking back then.

But don’t worry about the name. It takes a lot more careful thought to do a better quicksort than the ones many people have already put a lot of effort into, and I only wanted to demonstrate a concept and slap a silly name on it so people could draw inspiration from its singular innovation.

How it works

The way this works is to take a random sampling of the unsorted data and move that sample to the front of the list (doing it this way isn’t efficient, but it makes for a simpler conceptual model).
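The sampling step might be sketched like so, assuming a Fisher–Yates-style selection; the helper name and signature are illustrative, not from the original implementation:

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Swap k randomly chosen elements of [first, last) to the front of the range.
// Not the efficient way to stage a sample, but it matches the simple
// conceptual model described above.
template <typename It, typename Rng>
void sample_to_front(It first, It last, std::size_t k, Rng& rng) {
    const auto n = static_cast<std::size_t>(std::distance(first, last));
    for (std::size_t i = 0; i < k && i < n; ++i) {
        // Pick a random element from the not-yet-sampled tail and swap it in.
        std::uniform_int_distribution<std::size_t> pick(i, n - 1);
        std::iter_swap(first + i, first + pick(rng));
    }
}
```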


The median of these is going to be our pivot. But rather than computing just the median, the whole sample is sorted fully. This is now our “pivot-list prefix”, and the median is picked from the middle of that prefix.
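In code that step is tiny; a sketch, with an illustrative function name:

```cpp
#include <algorithm>
#include <cstddef>

// Fully sort the k-element sample sitting at the front of the range, then
// read the median straight out of the middle of the sorted prefix.
template <typename It>
It sorted_prefix_median(It first, std::size_t k) {
    std::sort(first, first + k);  // the sample becomes the "pivot-list prefix"
    return first + k / 2;         // median of the sample, no extra work needed
}
```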

Then partition the data after that prefix in the usual way.

Then, exchange the top part of the low portion for the top part of the pivot list (again, maybe not an efficient way to go about it), and set the second partition to start at the part of the prefix that was just moved.

Now we have two partitions, one less than or equal to the pivot, and one greater than the pivot; and each partition begins with a sorted prefix. Exclude the old pivot. We’re done with that.
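The exchange step is the heart of the trick, so here is a sketch of just that move, using plain indices for clarity. All names are illustrative, and the layout assumed on entry is: `[0, m)` sorted and ≤ pivot, `a[m]` the pivot, `[m+1, k)` sorted and > pivot, `[k, mid)` the unsorted low portion, `[mid, n)` the unsorted high portion:

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>

// Exchange the > pivot half of the prefix with the top of the low portion,
// then park the old pivot just below the high partition. Returns the start
// of the second partition. Assumes the low portion holds at least
// h = k - m - 1 elements (i.e. mid - h >= k).
template <typename T>
std::size_t exchange_prefix_tops(T* a, std::size_t m, std::size_t k,
                                 std::size_t mid) {
    const std::size_t h = k - m - 1;  // size of the > pivot half of the prefix
    std::swap_ranges(a + m + 1, a + k, a + mid - h);
    std::swap(a[m], a[mid - h - 1]);  // the old pivot is done; exclude it
    return mid - h;                   // the second partition starts here
}
```

After this, each partition begins with a sorted prefix: the low one keeps the lower half of the old prefix, and the high one starts with the upper half that was just moved.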

And recurse.

Now we’re beginning to amortise. Since the inherited prefix is already sorted, we don’t need to sort it again. It is half the size, but so long as it’s not too small we can still use it. Otherwise, collect a fresh sample and sort that.
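Tying the steps together, here is an end-to-end sketch, assuming `std::partition` for the middle and `std::sort` as a small-range and awkward-case fallback. Every identifier, constant, and threshold here is illustrative; this is a model of the concept, not the original plpqsort:

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

template <typename T>
void plpqsort_sketch(std::vector<T>& a, std::size_t lo, std::size_t hi,
                     std::size_t prefix, std::mt19937& rng) {
    const std::size_t n = hi - lo;
    if (n < 32) { std::sort(a.begin() + lo, a.begin() + hi); return; }

    if (prefix < 3) {
        // Inherited prefix too small to be useful: draw a fresh sample to the
        // front (not efficient, but the simple conceptual model) and sort it.
        const std::size_t k_target = 9;
        for (std::size_t i = 0; i < k_target; ++i) {
            std::uniform_int_distribution<std::size_t> pick(lo + i, hi - 1);
            std::swap(a[lo + i], a[pick(rng)]);
        }
        std::sort(a.begin() + lo, a.begin() + lo + k_target);
        prefix = k_target;
    }

    const std::size_t k = prefix;          // sorted prefix occupies [lo, lo+k)
    const std::size_t m = lo + k / 2;      // median of the prefix is the pivot
    const T pivot = a[m];
    const std::size_t h = lo + k - m - 1;  // size of the > pivot prefix half

    // Partition everything after the prefix around the pivot.
    const std::size_t mid = static_cast<std::size_t>(
        std::partition(a.begin() + lo + k, a.begin() + hi,
                       [&](const T& x) { return x <= pivot; }) -
        a.begin());

    if (mid < lo + k + h) {
        // Low side too thin to host the exchange; just sort the lot. A real
        // implementation would do something smarter here.
        std::sort(a.begin() + lo, a.begin() + hi);
        return;
    }

    // Exchange the top of the low portion with the > pivot half of the
    // prefix, then park the old pivot just below the high partition.
    std::swap_ranges(a.begin() + m + 1, a.begin() + lo + k,
                     a.begin() + (mid - h));
    std::swap(a[m], a[mid - h - 1]);

    // Each partition now begins with a sorted prefix, half the size it was.
    plpqsort_sketch(a, lo, mid - h - 1, k / 2, rng);  // <= pivot side
    plpqsort_sketch(a, mid - h, hi, h, rng);          // > pivot side
}

template <typename T>
void plpqsort_sketch(std::vector<T>& a) {
    std::mt19937 rng(12345);  // fixed seed, purely for reproducibility
    plpqsort_sketch(a, 0, a.size(), 0, rng);
}
```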

That’s it.

The result

Does it work? Hard to say. My implementation is dumb, and my benchmark is dumber.

The only metrics I looked at were swaps and compares, and they didn’t turn out great. A serious implementation would probably break out of the STL, resort to SIMD optimisations and suchlike, and make a proportion of the operations not worth measuring; I just wasn’t going to spend the time on that, because it’s not meaningful until the rest of the algorithm has been carefully designed and tested.

The prefix list also offers insights which I didn’t try to exploit, because I didn’t have a realistic benchmark in which to validate that effort:

  • Before you sort the prefix, is it already sorted (collect samples at ascending offsets to benefit from this)? That might be worth a closer look.
  • After you sort the prefix, is some value grossly overrepresented in the sample? Or at the very least, does the pivot value also appear at some distance threshold from the middle of the prefix (it’s sorted so there’s no scanning involved to figure this out)?
  • Pack a list of samples in the range of the block being partitioned into a SIMD register, and for every value visited increment a counter for each lane where the visited value is less than that register’s value. At the end you can use this to interpolate a much more accurate pivot for the next round. Or even use it to pre-lay bins for an American-flag sort.

But it’s an idea. With a name.