
Paul McLellan
vlsi technology
quicksort
timsort
sorting

Supernaturally Fast Sorting

28 Sep 2021 • 7 minute read

When I was in my mid-teens, I was writing a program (in FORTRAN) that required me to sort an array of a couple of dozen items. I had no idea that there were lots of established algorithms (Quicksort was invented in 1959, for example). We had almost no books on programming in the school library, and certainly none that covered sorting. The definitive work on the subject, volume 3 of Knuth's The Art of Computer Programming, Sorting and Searching, would not be published until 1973.

I came up with a not-very-good algorithm: run through the array and swap each adjacent pair of elements that is out of order, then repeat. If I got through the whole array without making any swaps, the array was sorted and I was done. (This is essentially bubble sort, sketched below.) Actually, for a couple of dozen elements, an n² algorithm like this is not bad, since it has so little overhead. But if you are going to sort a thousand elements, let alone a billion, it is worth using an algorithm with more fixed overhead but better asymptotic performance. You may know that an optimal comparison-based sorting algorithm takes n·log(n) comparisons. For small n this is not much different from n², but for large n it is much smaller. For example, for one million elements, n² is a trillion, but n·log(n) is about twenty million, quite a difference. Even if the overhead difference is a factor of ten, that is still two hundred million versus a trillion, thousands of times smaller.
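For the curious, here is roughly what that teenage algorithm looks like in Python rather than FORTRAN (my reconstruction of the idea, not the original code):

    def swap_sort(a):
        # Sweep the array, swapping adjacent out-of-order pairs,
        # until a full pass makes no swaps. This is bubble sort:
        # n^2 comparisons in the worst case, but almost no overhead,
        # so it is perfectly fine for a few dozen items.
        swapped = True
        while swapped:
            swapped = False
            for i in range(len(a) - 1):
                if a[i] > a[i + 1]:
                    a[i], a[i + 1] = a[i + 1], a[i]
                    swapped = True
        return a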

When I wrote VLSI Technology's circuit extractor in the early 1980s, I had to get serious about sorting. I'm not going to go into all the details of how a circuit extractor works, but basically it runs a scanline across the design, looking at the polygons in the layout. This requires all the polygons to have already been sorted so that they show up in the order in which they are needed. There were more polygons than would fit in memory, so I had to go to Knuth's masterwork and find out how people sorted large amounts of data on magnetic tape, since that was the closest thing to what I needed to do. Sorting large amounts of data on tape required four tape drives. You would start with all the data on one tape, read as much into the computer's memory as would fit, sort it, and write it out to one of two other tape drives, alternating between them. Once that phase was over, the operator would remove the tape with the original data (no write-protect ring) and replace it with a blank tape (with the ring in place). The program would then read the sorted runs from the two tapes, merge them so each output run was sorted, and write the results out to the other two tape drives, again alternating. Since both input tapes contained runs of sorted records, this never required holding more than a couple of records in memory at once. When that was done, the program would switch over, using the previous output tapes as the new inputs and overwriting the old input tapes. The process was repeated until a single run was written, which was the original data, now all sorted. There are various optimizations that let you get away with three tape drives, but that is the basic idea.
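The same scheme survives today as external merge sort, with disk files standing in for tape drives. Here is a rough Python sketch of the idea (the chunk size, temp-file handling, and single k-way merge pass are illustrative assumptions on my part, not the extractor's actual code):

    import heapq
    import tempfile

    def external_sort(records, chunk_size=100_000):
        # Phase 1: read chunks that fit in memory, sort each one,
        # and spill it to its own temporary file (a sorted "run").
        # Each record is assumed to be a newline-terminated string.
        run_files = []
        chunk = []
        for rec in records:
            chunk.append(rec)
            if len(chunk) >= chunk_size:
                run_files.append(_spill(sorted(chunk)))
                chunk = []
        if chunk:
            run_files.append(_spill(sorted(chunk)))

        # Phase 2: merge all the runs. heapq.merge keeps only one
        # record per run in memory -- the software analogue of the
        # tape merge, done in a single pass rather than many.
        files = [open(path) for path in run_files]
        try:
            yield from heapq.merge(*files)
        finally:
            for f in files:
                f.close()

    def _spill(sorted_chunk):
        # Write one sorted run to a temp file and return its path.
        f = tempfile.NamedTemporaryFile("w", delete=False, suffix=".run")
        f.writelines(sorted_chunk)
        f.close()
        return f.name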

Most of the academic work on sorting and searching has been on algorithms that achieve n·log(n) performance. Probably the most famous sorting algorithm is Quicksort, which I already mentioned above. It was invented in 1959-60 by Tony Hoare, then in his mid-twenties and a visiting student at Moscow State University. Despite its venerable age, it is still widely used today. It has low overhead, at least when implemented in a language that allows recursion, so in practice it is faster than other n·log(n) sorting algorithms such as heapsort, although in some extreme cases (such as already-sorted input with a poor choice of pivot) it can degenerate to n² performance.
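For reference, a textbook (not tuned) recursive Quicksort looks something like this in Python:

    def quicksort(a):
        # Classic recursive Quicksort: pick a pivot, partition into
        # smaller / equal / larger, and recurse on the two halves.
        # Average n*log(n) comparisons, but it can degenerate to n^2
        # with unlucky pivot choices.
        if len(a) <= 1:
            return a
        pivot = a[len(a) // 2]
        smaller = [x for x in a if x < pivot]
        equal   = [x for x in a if x == pivot]
        larger  = [x for x in a if x > pivot]
        return quicksort(smaller) + equal + quicksort(larger)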

Timsort

A lot of sorting today is done in Python, and none of these algorithms is what it uses. Instead, it uses a sorting algorithm you have probably never heard of, called Timsort.

Timsort was developed by Tim Peters in 2002, so it is much more recent than the older algorithms invented in the 1950s and 1960s. It is also the algorithm used in some implementations of Java, Swift, Rust, and more. Timsort takes advantage of runs within the data that are already sorted (either ascending or descending), which is common in practice. The worst case of truly randomly ordered keys simply doesn't come up much (except, ironically, when developing and testing sorting algorithms). On real-world data, Timsort is "supernatural" in Tim Peters' own words:

[Here is] stable, natural merge sort, modestly called Timsort (hey, I earned it). It has supernatural performance on many kinds of partially ordered arrays (less than lg(N!) comparisons needed, and as few as N-1), yet as fast as Python’s previous highly tuned sample sort hybrid on random arrays.

The main routine marches over the array once, left to right, alternately identifying the next run, then merging it into the previous runs “intelligently”. Everything else is complication for speed, and some hard-won measure of memory efficiency.
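The real Timsort adds minimum run lengths, binary insertion sort for short runs, a carefully managed merge stack, and "galloping" merges, but the core find-a-run-then-merge loop Peters describes can be sketched in a greatly simplified form (this shows the general idea only, not CPython's implementation):

    def natural_merge_sort(a):
        # Simplified natural merge sort in the spirit of Timsort:
        # split the input into already-ascending runs, then merge
        # the runs pairwise until only one remains. On already-sorted
        # input this does just N-1 comparisons and no merging at all.
        if not a:
            return []
        runs, run = [], [a[0]]
        for x in a[1:]:
            if x >= run[-1]:
                run.append(x)
            else:
                runs.append(run)
                run = [x]
        runs.append(run)
        while len(runs) > 1:
            merged = []
            for i in range(0, len(runs) - 1, 2):
                merged.append(_merge(runs[i], runs[i + 1]))
            if len(runs) % 2:
                merged.append(runs[-1])
            runs = merged
        return runs[0]

    def _merge(left, right):
        # Standard stable merge of two sorted lists.
        out, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                out.append(left[i]); i += 1
            else:
                out.append(right[j]); j += 1
        out.extend(left[i:])
        out.extend(right[j:])
        return out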

There is a lot more detail on the algorithm on the Timsort page on GitHub, which explains how the algorithm was implemented by Tim Peters in July 2002. There is also an extensive Wikipedia entry for Timsort.

Here's a story, Beating Timsort at Merging, that shows just how good Timsort is. Adam Gordon Bell was merging already-sorted lists, which should be very fast. And it is. But Timsort is even faster. It was just not possible to beat Timsort in Python without dropping down into C (Timsort itself is implemented in C). Adam's conclusion:

The surprising thing, though, is how good Timsort still is. It wasn’t designed for merging sorted lists but for sorting real-world data. It turns out real-world data is often partially sorted, just like our use case. Timsort on partially sorted data shows us where Big O notation can misinform us. If your input always keeps you near the median or best-case performance, then the worst-case performance doesn’t matter much. It’s no wonder then that since its first creation, Timsort has spread from Python to JavaScript, Swift, and Rust. Thank you, Tim Peters!
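As a rough illustration of the kind of comparison involved (a toy benchmark I've sketched here, not Adam's actual code), you can pit a dedicated merge of two sorted lists against simply concatenating them and letting the built-in Timsort-based sort exploit the two runs; the built-in sort holds up surprisingly well:

    import heapq
    import random
    import timeit

    a = sorted(random.random() for _ in range(1_000_000))
    b = sorted(random.random() for _ in range(1_000_000))

    # A dedicated merge of two already-sorted lists.
    t_merge = timeit.timeit(lambda: list(heapq.merge(a, b)), number=5)

    # Concatenate and let sorted() -- i.e., Timsort -- find the two runs.
    t_sorted = timeit.timeit(lambda: sorted(a + b), number=5)

    print(f"heapq.merge: {t_merge:.2f}s   sorted(a+b): {t_sorted:.2f}s")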

The Zen of Python

Tim Peters also wrote The Zen of Python, a set of guiding principles for Python programming (and, really, for programming in general):

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
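The full text ships with Python itself as an easter egg; you can print it with:

    # Prints the Zen of Python to standard output.
    import this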

The Battle of Salamis

This has nothing to do with fighting with sausages, nor anything to do with sorting. But today is the 2,500th anniversary of the Battle of Salamis, the most significant battle of the era in which the outnumbered Greek city-states decisively defeated the invading forces of the Persian king Xerxes. We might not be a democracy today if they had lost, so it mattered. Different sources give different precise dates, between September 23 and 29, 480 BC. But wait, you might say, isn't 480 + 2020 = 2,500, so surely the anniversary was last year? That is not quite the right calculation, since it assumes there was a year 0. There wasn't: 1 BC was followed by 1 AD. So today really is the 2,500th anniversary. Although the true date is probably in October, since I don't expect it has been adjusted for the Julian-to-Gregorian calendar switch, which, as I wrote in my post October Revolution, is why the anniversary of the Russian October Revolution falls on November 7. In fact, since Salamis pre-dates even the Julian calendar (45 BC) by over 400 years, I've no idea how the precise date was actually derived.

 

Sign up for Sunday Brunch, the weekly Breakfast Bytes email.