17. Adaptive merge sort

“temporary” buffers in STL

When I was implementing STL I said that the OS vendor should know how much extra storage is available. So, I’m going to have this function called get_temporary_buffer which allocates as much physical memory as is available at this moment. It’s the only outside hook which STL will use. But, it is vendor specific, you cannot do it as a client. There is actually no call in UNIX which tells you how much physical memory you have, how much is used, it’s just impossible. But, I needed to ship it, and in order to do that, I couldn’t just require them to add a hook. So I wrote the following thing:
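The listing itself is not reproduced in these notes. What follows is only a minimal sketch of what the description suggests, assuming the function halves the request each time malloc fails. The names get_temporary_buffer_sketch and return_temporary_buffer_sketch are hypothetical, and this is not the code that shipped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <limits>
#include <utility>

// Hypothetical stand-in for the function described in the text.
// Returns a pointer to raw storage and the number of T objects it can hold.
template <typename T>
std::pair<T*, std::ptrdiff_t> get_temporary_buffer_sketch(std::ptrdiff_t len) {
  assert(len >= 0);
  // Clamp the request so that len * sizeof(T) cannot overflow.
  const std::ptrdiff_t max_len =
      std::numeric_limits<std::ptrdiff_t>::max() / std::ptrdiff_t(sizeof(T));
  if (len > max_len) len = max_len;
  while (len > 0) {
    void* p = std::malloc(std::size_t(len) * sizeof(T));
    if (p != nullptr)
      return std::pair<T*, std::ptrdiff_t>(static_cast<T*>(p), len);
    len /= 2;  // the request failed, so try half as much
  }
  return std::pair<T*, std::ptrdiff_t>(static_cast<T*>(nullptr), 0);
}

// Storage obtained above came from malloc, so it is released with free.
inline void return_temporary_buffer_sketch(void* p) { std::free(p); }
```

On a system with virtual memory the first malloc call almost always succeeds, so the loop body typically runs once.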

So it binary searches for a buffer small enough to fit¹. Is it a useful piece of code? No. But, I had to ship. Guess what happened after that. You might think they didn’t change it. You’re wrong, they did change it. They removed my comment.

Every implementation, UNIX, Microsoft, Apple, does the same binary search. I have been telling this story for decades, nothing. So my function always returns whatever you ask for, because it just allocates virtual memory². It used to be a problem when we had 16-bit address spaces. But, we have 64-bit address spaces.

If it was correct, I think it would be useful. You want as much physical memory as the system has. There is virtual memory but virtual memory is actually useless unless it’s backed by physical memory. It’s useful for remapping things³. But, it is a figment of imagination. It does not exist. As Seymour Cray used to say, “you can’t simulate what you do not have”⁴. If your algorithm working set doesn’t fit into physical memory, it will not just thrash, your program will not terminate, because your memory starts working at the speed of a disk. That’s not good enough.

Merge with buffer

The first thing we are going to write is merge_with_buffer. Let’s assume that this buffer is big enough. Eventually we will have to figure out how to deal with limited buffers. Right now, let’s assume whoever is going to call it is going to ensure that. What we do is copy from the first range into our buffer, then we merge back into the original range.

Even though we aren’t worried about it now, we can see the buffer will need to be big enough to copy the entire left half in, so about size n/2.

Note that the buffer doesn’t have to match the type of the container. We will probably use an array for the buffer, but it could be an iterator for a linked list. This is a general principle: relax type requirements.
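A minimal sketch of the merge just described, assuming both runs are already sorted and the buffer has room for the left run. The name merge_with_buffer_sketch and the small demo function are hypothetical. The buffer iterator type B is a separate template parameter, so the demo can merge a vector using a linked list as the buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <list>
#include <vector>

// I: ForwardIterator over the sequence, B: ForwardIterator over the buffer.
// R: strict weak ordering on the value type.
template <typename I, typename R, typename B>
void merge_with_buffer_sketch(I first, I middle, I last, R r, B buffer) {
  assert(std::is_sorted(first, middle, r) && std::is_sorted(middle, last, r));
  B b = buffer;
  B b_last = std::copy(first, middle, buffer);  // left run now lives in the buffer
  I out = first;
  // out can never overtake middle: it advances once per element consumed,
  // and only the buffer elements put distance between the two.
  while (b != b_last && middle != last) {
    if (r(*middle, *b)) { *out = *middle; ++middle; }
    else { *out = *b; ++b; }  // ties go to the left run, which keeps it stable
    ++out;
  }
  std::copy(b, b_last, out);  // leftover right-run elements are already in place
}

// Demo: the sequence is a vector, the buffer is a list.
inline std::vector<int> merge_with_list_buffer_demo() {
  std::vector<int> v = {1, 4, 7, 2, 3, 9};
  std::list<int> buffer(3);
  merge_with_buffer_sketch(v.begin(), v.begin() + 3, v.end(),
                           std::less<int>(), buffer.begin());
  return v;
}
```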

Let’s grab our in-place sort from before and modify it. It’s identical to our other sort, except it uses our new merge_with_buffer. Should this function allocate the buffer? No, because it is recursive.

Note we put the buffer argument at the end, because we are extending the interface of the previous sort.

Now to use it in our framework we need a more convenient interface. We have too many parameters, so we need to somehow get rid of all of them. We write a wrapper.
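A sketch of how the recursive sort and its wrapper might fit together, assuming a buffer of n/2 elements as discussed above. All three names are hypothetical. The recursive function takes the buffer as its last argument and never allocates. The wrapper allocates once and exposes the usual (first, last, r) interface.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <iterator>
#include <vector>

// Merge step from the text: copy the left run into the buffer, merge back.
template <typename I, typename R, typename B>
void merge_with_buffer_sketch(I first, I middle, I last, R r, B buffer) {
  B b = buffer;
  B b_last = std::copy(first, middle, buffer);
  I out = first;
  while (b != b_last && middle != last) {
    if (r(*middle, *b)) { *out = *middle; ++middle; }
    else { *out = *b; ++b; }
    ++out;
  }
  std::copy(b, b_last, out);
}

// Sorts [first, first + n) and returns first + n.
// The buffer is passed down rather than allocated here, because of the recursion.
template <typename I, typename N, typename R, typename B>
I sort_n_with_buffer_sketch(I first, N n, R r, B buffer) {
  assert(n >= 0);
  if (n == 0) return first;
  if (n == 1) return ++first;
  N half = n / 2;
  I middle = sort_n_with_buffer_sketch(first, half, r, buffer);
  I last = sort_n_with_buffer_sketch(middle, n - half, r, buffer);
  merge_with_buffer_sketch(first, middle, last, r, buffer);
  return last;
}

// Wrapper: allocates the buffer once, sized for the largest left run.
template <typename I, typename R>
void sort_with_buffer_sketch(I first, I last, R r) {
  typedef typename std::iterator_traits<I>::value_type T;
  typedef typename std::iterator_traits<I>::difference_type N;
  N n = std::distance(first, last);
  std::vector<T> buffer(n / 2);
  sort_n_with_buffer_sketch(first, n, r, buffer.begin());
}
```

Using a std::vector for the buffer is a choice made for the sketch. It requires the value type to be default constructible.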

Performance test

Look at how fast that is. It’s already within 10% of our goal. It shows us the spectrum of what’s possible.

Adaptive merge

We play the copy and paste game with our work from before. The function merge_inplace_left_subproblem and its right variant do not need to be changed, so they can be included as they are.

Now we know the drill to turn this into sort. I will just show the sort so we can see the buffer allocation:
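A self-contained sketch of the adaptive idea, with several assumptions stated up front. The merge_inplace_left_subproblem and right-variant helpers are not shown in this section, so the sketch substitutes std::rotate with std::lower_bound and std::upper_bound to split the problem. It requires random access iterators. The n/16 buffer fraction is purely illustrative and is not the size used in the measurements. All names ending in _sketch are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <iterator>
#include <vector>

// Merge step from the text: copy the left run into the buffer, merge back.
template <typename I, typename R, typename B>
void merge_with_buffer_sketch(I first, I middle, I last, R r, B buffer) {
  B b = buffer;
  B b_last = std::copy(first, middle, buffer);
  I out = first;
  while (b != b_last && middle != last) {
    if (r(*middle, *b)) { *out = *middle; ++middle; }
    else { *out = *b; ++b; }
    ++out;
  }
  std::copy(b, b_last, out);
}

// If the left run fits in the buffer, merge with the buffer. Otherwise pick a
// pivot, rotate it into its final position, and recurse on the two sides.
template <typename I, typename R, typename B, typename N>
void merge_adaptive_sketch(I first, I middle, I last, R r, B buffer, N buffer_size) {
  N n0 = middle - first;
  N n1 = last - middle;
  if (n0 == 0 || n1 == 0) return;
  if (n0 <= buffer_size) {
    merge_with_buffer_sketch(first, middle, last, r, buffer);
    return;
  }
  if (n0 < n1) {
    I cut1 = middle + n1 / 2;  // pivot taken from the right run
    I cut0 = std::upper_bound(first, middle, *cut1, r);
    I after_pivot = std::rotate(cut0, middle, cut1 + 1);
    merge_adaptive_sketch(first, cut0, after_pivot - 1, r, buffer, buffer_size);
    merge_adaptive_sketch(after_pivot, cut1 + 1, last, r, buffer, buffer_size);
  } else {
    I cut0 = first + n0 / 2;  // pivot taken from the left run
    I cut1 = std::lower_bound(middle, last, *cut0, r);
    I pivot = std::rotate(cut0, middle, cut1);
    merge_adaptive_sketch(first, cut0, pivot, r, buffer, buffer_size);
    merge_adaptive_sketch(pivot + 1, cut1, last, r, buffer, buffer_size);
  }
}

// Recursive sort: same shape as before, but the merge knows the buffer size.
template <typename I, typename N, typename R, typename B>
I sort_adaptive_n_sketch(I first, N n, R r, B buffer, N buffer_size) {
  if (n == 0) return first;
  if (n == 1) return first + 1;
  N half = n / 2;
  I middle = sort_adaptive_n_sketch(first, half, r, buffer, buffer_size);
  I last = sort_adaptive_n_sketch(middle, n - half, r, buffer, buffer_size);
  merge_adaptive_sketch(first, middle, last, r, buffer, buffer_size);
  return last;
}

// Wrapper: the only place where the (small) buffer is allocated.
template <typename I, typename R>
void sort_with_small_buffer_sketch(I first, I last, R r) {
  typedef typename std::iterator_traits<I>::value_type T;
  typedef typename std::iterator_traits<I>::difference_type N;
  N n = last - first;
  assert(n >= 0);
  std::vector<T> buffer(n / 16);  // assumed fraction, purely illustrative
  sort_adaptive_n_sketch(first, n, r, buffer.begin(), N(buffer.size()));
}
```

Because each recursive merge leaves its pivot out of both subproblems, the recursion makes progress even when the buffer is empty, which happens here for inputs shorter than 16 elements.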

Performance test

How does this one do? Worse than before, but we are also using about 10x less memory.

Code

¹ malloc returns NULL when it fails to allocate a buffer of the requested size. Alex’s get_temporary_buffer function uses that as an indicator that the requested buffer was too large and continues attempting smaller and smaller buffers.

² Virtual memory allows programs to allocate more memory than is physically available by saving and loading portions of memory to disk as needed. When memory is fully utilized the system starts working slower rather than simply crashing.

Even though the total amount of virtual memory available on a system is very large, individual memory allocations are typically limited. For example, when testing this code on Linux, the system only allows a program to allocate a buffer up to the total physical memory size.

What this means is that Alex’s implementation of get_temporary_buffer is not useful. It is equivalent to malloc(n) for anything but extremely large allocations.

Exercise: Experiment with get_temporary_buffer on your machine. How large an allocation will it give you?

³ Memory mapping files is a very useful application of virtual memory. When a program wants to interact with a file on disk it can instead request that the system map it to a range in memory. The file can then be manipulated by reading and writing through pointers as if it were a buffer instead of a file. In other words, the program can interact with the file just like other data. See mmap(2) for details.

Alex has used memory mapped files in his own code.

⁴ I cannot find a reference to this quotation.

⁵ AMD Ryzen 5 2400G (8 core, 3.6 GHz). GCC 9.3.0
