Replacing the global memory manager
Admittedly, modern memory managers have become very good at satisfying these conflicting requirements, so do we have any chance of beating them at their own game? I'd say, probably not. But there are alternative general-purpose memory managers that tout performance advantages over the standard malloc function. Should we use them instead?
There are some alternative, high-performance memory managers that have to be taken seriously, such as Google's tcmalloc, ptmalloc2 used in glibc (the GNU C library), and jemalloc from FreeBSD. However, as one study has shown, they are all on a par and, more importantly, each of them can outperform the others under certain conditions. So, changing the memory manager isn't a silver bullet. You will have to test its impact in your specific setting.
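To make "test in your specific setting" a bit more concrete, here is a minimal sketch of an allocation micro-benchmark. The function name runAllocationWorkload, the block sizes (8 to 512 bytes), and the access pattern are assumptions chosen for illustration only; a meaningful test would replay the allocation pattern of your own application.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <random>
#include <vector>

// Hypothetical workload: repeatedly frees and reallocates small blocks of
// mixed sizes in pseudo-random order, keeping `liveBlocks` slots in use.
// Returns the elapsed wall-clock time in milliseconds.
// Precondition: liveBlocks > 0.
double runAllocationWorkload(std::size_t rounds, std::size_t liveBlocks)
{
    std::mt19937 rng(42);  // fixed seed so every run performs the same work
    std::uniform_int_distribution<std::size_t> sizeDist(8, 512);
    std::uniform_int_distribution<std::size_t> slotDist(0, liveBlocks - 1);

    std::vector<void*> blocks(liveBlocks, nullptr);

    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < rounds; ++i) {
        const std::size_t slot = slotDist(rng);
        std::free(blocks[slot]);  // free(nullptr) is a no-op
        blocks[slot] = std::malloc(sizeDist(rng));
        if (blocks[slot]) {
            // Touch the block so the compiler cannot drop the allocation.
            *static_cast<volatile char*>(blocks[slot]) = 1;
        }
    }
    for (void* p : blocks)
        std::free(p);
    const auto stop = std::chrono::steady_clock::now();

    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Called from a small main function that prints the result, the same binary can be run once with the default allocator and once with an alternative one. On Linux, this is typically done by preloading the allocator's shared library through the LD_PRELOAD environment variable, for example LD_PRELOAD=/path/to/libtcmalloc.so ./alloc_bench (both the library path and the binary name are placeholders that depend on your system). A synthetic loop like this only hints at the differences; measuring the real application remains the decisive test.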
For example, one study that tested replacing the default GCC allocator on an embedded device reports about 300 ms faster startup of a QML application when using tcmalloc. However, the overall startup time only decreased from 5,325 ms to 5,045 ms, a difference of 280 ms, so the overall improvement amounts to less than 6% (about 5.3%). In this case, it seems that startup is dominated by reading and parsing the QML files.
One counter-example is VMem, a memory allocator widely used in games, which is specifically optimized for the constrained environments of game consoles.