Beyond linkers
Incredible as it may sound, there are optimization tools that go even beyond linkers. However, they aren't widely used and sit at the advanced end of what's available, so read this section only if you are interested in cool technologies.
We already introduced profile-guided optimization (PGO) as a way to guide the compiler's optimizations by providing a real-world execution profile. This is a great technique; however, it has its limitations. When parts of the code come from assembly or third-party libraries, the compiler lacks their source code and cannot apply the profile accurately. Other limitations are the previously mentioned problem of deciding what the typical load is when conducting the measurements, and the high cost of obtaining performance data, especially for big applications.
Because of these problems, a different class of optimization tools was tried out: post-link optimizers, which take an already linked binary and optimize it without knowledge of its source code. One recent example of a post-link optimizer is Facebook's BOLT. It turned out that in this way it's possible to achieve improvements even for applications already optimized with PGO and LTO. The idea is simple: instead of obtaining profile data by instrumentation, we obtain it by sampling, which makes the whole process a lot cheaper! The other problem, namely the lack of source code, can be solved by disassembling the binary and reconstructing only the interesting aspects of the code, such as the control flow. BOLT uses the reconstructed control flow to reorganize the binary layout of the code, which improves cache utilization and takes pressure off the branch predictor. The reported improvements lie in the range from 2% up to 8%-15%, and the tool is able to group the frequently used code in a quite compact region of memory.
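To make the layout idea more tangible, here is a minimal C++ sketch of grouping hot code together based on sampled profile data. It is a toy model under stated assumptions, not BOLT's actual algorithm: the FunctionProfile type, the helper functions, and all numbers in example_profile() are hypothetical and made up for illustration, and a real post-link optimizer works on basic blocks and call graphs rather than on a flat list of functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record: one function of the binary, its code size, and the
// number of profiler samples that landed inside it.
struct FunctionProfile {
    std::string name;
    std::uint64_t size_bytes;
    std::uint64_t samples;
};

// Made-up profile in the order in which the linker placed the functions.
std::vector<FunctionProfile> example_profile() {
    return {
        {"parse_config", 4096, 10},
        {"hot_loop", 512, 9000},
        {"log_error", 2048, 5},
        {"decode", 1024, 900},
        {"shutdown", 8192, 85},
    };
}

// Returns a new layout with the most frequently sampled functions first.
// stable_sort keeps the original order for functions with equal counts.
std::vector<FunctionProfile> layout_by_hotness(std::vector<FunctionProfile> funcs) {
    std::stable_sort(funcs.begin(), funcs.end(),
                     [](const FunctionProfile& a, const FunctionProfile& b) {
                         return a.samples > b.samples;
                     });
    return funcs;
}

// Counts how many bytes, starting from the beginning of the layout, must be
// spanned until at least `percent` percent of all samples are covered.
std::uint64_t bytes_covering(const std::vector<FunctionProfile>& layout,
                             std::uint64_t percent) {
    assert(percent <= 100);
    std::uint64_t total = 0;
    for (const auto& f : layout) total += f.samples;

    std::uint64_t seen = 0;
    std::uint64_t bytes = 0;
    for (const auto& f : layout) {
        if (seen * 100 >= total * percent) break;  // integer math, no rounding
        seen += f.samples;
        bytes += f.size_bytes;
    }
    return bytes;
}
```

With the made-up profile above, covering 90% of the samples requires spanning 4,608 bytes in the original order, but only 512 bytes after calling layout_by_hotness(), because the single hot function now comes first. The fewer bytes the hot path spans, the fewer instruction cache lines and pages it touches, which is the effect a post-link optimizer is after.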
As we can see, the quest for better performance goes on!