Counting Sort
This time, I was in the mood for generalizing counting sort to arbitrary (Comparable) objects.
This is not possible. Counting sort requires that the sort keys be "small" positive integers. You cannot sort arbitrary objects with counting sort, even if you restrict them to be comparable.
Note: the objects themselves don't have to be small positive integers, but the sort keys have to be, or at least have to map to a finite set of small positive integers.
Comparable
Ironically, given I wrote above that you are being too general, you are also needlessly restricting your input: there is no requirement that the elements being sorted by counting sort are Comparable. In fact, the whole point of counting sort is that it is not a comparison sort, i.e., that it is not based on comparisons. Equality is all you need. (Obviously, the sorted output induces a mathematical comparison relation over the input, but it is not an a priori requirement for the actual implementation of the algorithm that the objects implement the Comparable interface.)
The fact that counting sort is not a comparison sort and does not use comparisons means that it is not subject to the \$\mathcal{\Omega(n \log{}n)}\$ lower bound of comparison sorts. In fact, counting sort's worst-case asymptotic step complexity is \$\mathcal{O(n+k)}\$, as is its worst-case asymptotic space complexity, where \$\mathcal{k = key_{max}-key_{min}}\$.
That asymptotic bound is a key ingredient of what makes counting sort a counting sort, and would probably be one of the major reasons why someone would choose a counting sort. (For some interesting thoughts on what it means for a program to be a genuine implementation of an algorithm, I can highly recommend Melissa E. O'Neill's The Genuine Sieve of Eratosthenes.)
API documentation
There is no API documentation in your code. Regardless of whether you actually meant to implement a genuine counting sort or not, this code needs documentation:
- If this were a genuine counting sort, the requirements for the elements of the collection must be documented, since they cannot be (easily) expressed in the Java type system.
- Since this is not a genuine counting sort, it really should be documented that it is not a genuine counting sort, especially since it has counting sort in the name.
- One of the major reasons for choosing one sorting algorithm over another are the various performance and stability guarantees the algorithm gives as well as the overhead of the particular implementation. In this case, the sort is:
- stable,
- not-in-place, and
- the worst-case asymptotic step complexity looks to be \$\mathcal{O((n+c) \log{}c)}\$, where \$c\$ is
keys.size().
- Your algorithm requires the elements to be both comparable and hashable.
Comparable is called out in the type signature, but hashability is not mentioned. Of course, in Java, all objects are hashable, but a user will likely not expect a sort method to require hashing, therefore, it should be explicitly called out.
Naming
Since this is not, in fact, a counting sort, I would refrain from using "counting sort" anywhere in the name.
Benchmarking
As already mentioned in other answers, your benchmark methodology leaves something to be desired. I will not critique the benchmark in detail, but instead offer some more general advice:
Benchmarking is hard. Really hard. It requires above-average knowledge of programming language pragmatics (e.g., high-level and low-level compiler optimizations, runtime internals, memory allocation and management, etc.), computer organization and implementation (memory hierarchy, caching and cache coherency, etc.), CPUs (ISAs, microarchitectures, optimizations, bottlenecks, etc.), statistics, and probably dozens of other things I forget. Benchmarks are generally written by entire teams of people with lots of experience, for whom writing benchmarks is their full-time and only job … and they still get it wrong sometimes.
Some (in)famous benchmark mistakes that actually happened in industrial, peer-reviewed benchmarks:
- Writing the benchmark in such a way that the compiler recognized that the result of the benchmark was never used and thus optimized the entire benchmark away.
- Writing the benchmark in such a way that the compiler recognized that the benchmark result was independent of the benchmark loop and thus hoisted the benchmark outside of the loop and only executed it once.
- Writing a SQL database benchmark that was so trivial that the benchmark actually just measured how fast the memory allocator of the JVM could allocate the
java.sql.ResultSet objects.
In fact, you did not just write the benchmarks, you also wrote the benchmark harness, i.e., the code which wraps around the benchmarks, manages their lifecycle, performs the measurements, collects the results, minimizes interference, etc. That is even harder than just writing the benchmarks. I encourage everyone who writes benchmarks and/or harnesses to read (and understand!) the mailing list thread on the Mechanical Sympathy mailing list starting at https://groups.google.com/g/mechanical-sympathy/c/m4opvy4xq3U/m/7lY8x8SvHgwJ.
Just to point out one example: almost all JVMs have a profiling period before they actually perform any optimizations. For Oracle's OpenJDK in server mode and in the default configuration, it does not even compile a method until the method has been executed 20000 times. Your sort method is only executed once, so it will not be optimized, unless you really force the JVM with a barrage of command line arguments. And even then, you will need many measurements to filter out measurement noise, run-to-run variations, etc.
Another problem with your benchmarks is that they are not repeatable. I cannot run the benchmark and compare my results to yours, because I do not know the random seed you used to generate the input list.