Linking the executable
Linking is the last step in the creation of the ELF file. The cross-compiling GCC invokes the linker, which groups all the object files together and resolves the dependencies among symbols. Passing the -T filename option at the command line asks the linker to replace the default memory layout for the program with the custom script contained in filename.
The linker script is a file describing the memory sections of the target. The linker needs this information in advance to place the symbols in the correct sections in flash, and to tell the software components about special locations in the memory-mapped area that can be referenced in the code. The file is usually recognizable by its .ld extension, and it is written in a specific language. As a rule of thumb, all the symbols from every single compiled object are grouped in the sections of the final executable image.
The script can interact with the C code, exporting symbols defined within the script, and following indications provided in the code using GCC-specific attributes associated with symbols. The __attribute__ keyword is provided by GCC to be put in front of the symbol definition, to activate GCC-specific, non-standard attributes for each symbol.
Some GCC attributes can be used to communicate to the linker about:
- Weak symbols, which can be overridden by symbols with the same name
- Symbols to be stored in a specific section in the ELF file, defined in the linker script
- Implicitly used symbols, which the linker must not discard even though they are referenced nowhere in the code
The weak attribute is used to define weak symbols, which can be overridden anywhere else in the code by another definition with the same name. Consider, for example, the following definition:
void __attribute__((weak)) my_procedure(int x) { /* do nothing */ }
In this case, the procedure is defined to do nothing, but it is possible to override it anywhere else in the code base by defining it again, using the same name, but this time without the weak attribute:
void my_procedure(int x) { y = x; }
The linker step ensures that the final executable contains exactly one copy of each defined symbol, which is the one without the attribute, if available. This mechanism introduces the possibility of having several different implementations of the same functionality within the code, which can be altered by including different object files in the linking phase. This is particularly useful when writing code that is portable to different targets, while still maintaining the same abstractions.
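As a minimal sketch of how this supports portable code, the following fragment lets a generic driver query the board clock through a weak default. The names board_clock_hz and uart_divisor, and the 8 MHz fallback value, are hypothetical and chosen only for illustration. A target-specific object file could provide a non-weak board_clock_hz, which the linker would then select instead of the default.

```c
#include <assert.h>
#include <stdint.h>

/* Weak default (hypothetical name): used only if no other object file
 * in the link provides a non-weak definition of board_clock_hz(). */
uint32_t __attribute__((weak)) board_clock_hz(void)
{
    return 8000000u; /* assumed 8 MHz fallback, for illustration only */
}

/* Portable code calls the symbol without knowing which definition
 * the linker selected. */
uint32_t uart_divisor(uint32_t baud_rate)
{
    assert(baud_rate != 0u);
    return board_clock_hz() / baud_rate;
}
```

The driver code does not change between targets: only the set of object files passed to the linker decides which board_clock_hz ends up in the image.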
Besides the default sections required in the ELF description, custom sections may be added to store specific symbols, such as functions and variables, at fixed memory addresses. This is useful when storing data at the beginning of a flash page, which might be uploaded to flash at a different time than the software itself. This is sometimes the case for target-specific settings.
Using the custom GCC attribute section when defining a symbol ensures that the symbol ends up at the desired position in the final image. Sections may have custom names, as long as an entry exists in the linker script to locate them. The section attribute can be added to a symbol definition as follows:
const uint8_t
__attribute__((section(".keys")))
private_key[KEY_SIZE] = {0};
In this example, the array is placed in the .keys section, which requires its own entry in the linker script as well.
It is considered good practice to have the linker discard unused symbols from the final image, especially when using third-party libraries that are only partially used by the embedded application. This can be done with the linker garbage collector, activated via the --gc-sections linker option (passed through GCC as -Wl,--gc-sections). If this flag is provided, the sections that are unused in the code are automatically discarded, and the unused symbols are kept out of the final image. Because the collector works on whole sections, it is most effective when the code is compiled with -ffunction-sections and -fdata-sections, which place each function and variable in its own section.
To prevent the linker from discarding symbols associated with a particular section, the used attribute marks the symbol as implicitly used by the program. Multiple attributes can be listed in the same declaration, separated by commas, as follows:
const uint8_t
__attribute__((used, section(".keys")))
private_key[KEY_SIZE] = {0};
In this example, the attributes indicate both that the private_key array belongs to the .keys section, and that it must not be discarded by the linker garbage collector, because it is marked as used.
A simple linker script for an embedded target defines at least the two memory regions for RAM and FLASH, and exports some predefined symbols to instruct the assembler of the toolchain about the memory areas. A bare-metal system based on the GNU toolchain usually starts with a MEMORY block, describing the mapping of the two different areas in the system, such as:
MEMORY {
    FLASH (rx) : ORIGIN = 0x00000000, LENGTH = 256k
    RAM (rwx) : ORIGIN = 0x20000000, LENGTH = 64k
}
The preceding code snippet describes two memory areas used in the system. The first block is 256 KB mapped to FLASH, with the r and x flags indicating that the area is accessible for read and execute operations. This enforces the read-only attribute of the whole area, and ensures that no modifiable sections are placed there. RAM, on the other hand, can be accessed in write mode directly, which means that variables are going to be placed in a section within that area. In this specific example, the target maps the FLASH at the beginning of the address space, while the RAM is mapped starting at 512 MB (0x20000000). Each target has its own address space mapping and flash/RAM size, so the linker script is target-specific.
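The arithmetic behind this mapping can be checked with a short sketch. The constants below are copied from the MEMORY block, while the helper names addr_in_flash and addr_in_ram are hypothetical and not part of any toolchain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the MEMORY block of the example linker script. */
#define FLASH_ORIGIN 0x00000000u
#define FLASH_LENGTH (256u * 1024u)
#define RAM_ORIGIN   0x20000000u
#define RAM_LENGTH   (64u * 1024u)

/* 0x20000000 is 2^29 bytes, that is, 512 MB from the start of the
 * address space. Checked at compile time. */
static_assert(RAM_ORIGIN == 512u * 1024u * 1024u, "RAM is mapped at 512 MB");

/* Hypothetical helpers: true if addr falls inside the region. Unsigned
 * subtraction wraps around for addresses below the origin, so a single
 * comparison covers both the lower and the upper bound. */
bool addr_in_flash(uint32_t addr)
{
    return (addr - FLASH_ORIGIN) < FLASH_LENGTH;
}

bool addr_in_ram(uint32_t addr)
{
    return (addr - RAM_ORIGIN) < RAM_LENGTH;
}
```

With these values, FLASH covers 0x00000000 to 0x0003FFFF and RAM covers 0x20000000 to 0x2000FFFF. A different target would need different constants, which is why the linker script cannot be shared across platforms as-is.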
As mentioned earlier in this chapter, the .text and .rodata ELF sections can only be accessed for reading, so they can safely be stored in the FLASH area, since they will not be modified while the target is running. On the other hand, both .data and .bss must be mapped in RAM to ensure that they are modifiable.
Additional custom sections can be added to the script when it is necessary to store them at a specific location in memory. The linker script can also export symbols related to a specific position in memory, or to the length of dynamically sized sections in memory, which can be declared as external symbols and accessed in the C source code.
The second block of statements in the linker script is called SECTIONS, and contains the allocation of the sections in specific positions of the defined memory areas. The . symbol, when associated with a variable in the script, represents the current position in the area, which is filled progressively from the lowest available address. Each section must specify the area where it has to be mapped. The following example, though still not complete enough to produce a runnable binary, shows how the different sections can be deployed using the linker script. The .text and .rodata sections are mapped in the flash memory:
SECTIONS
{
    /* Text section (code and read-only data) */
    .text :
    {
        . = ALIGN(4);
        _start_text = .;
        *(.text*) /* code */
        . = ALIGN(4);
        _end_text = .;
        *(.rodata*) /* read only data */
        . = ALIGN(4);
        _end_rodata = .;
    } > FLASH
The modifiable sections are mapped in RAM, with two special cases to notice here.
The AT keyword is used to indicate the load address to the linker, which is the location where the original values of the variables in .data are stored, while the actual addresses used during execution are in a different memory region. More details about the load address and the virtual address for the .data section are given in Chapter 4, The Boot-Up Procedure.
The NOLOAD attribute used for the .bss section ensures that no predefined values are stored in the ELF file for this section. Uninitialized global and static variables are mapped by the linker in the RAM area:
    _stored_data = .;
    .data : AT(_stored_data)
    {
        . = ALIGN(4);
        _start_data = .;
        *(.data*)
        . = ALIGN(4);
        _end_data = .;
    } > RAM
    .bss (NOLOAD) :
    {
        . = ALIGN(4);
        _start_bss = .;
        *(.bss*)
        . = ALIGN(4);
        _end_bss = .;
    } > RAM
}
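The symbols exported by the script are typically consumed by the startup code. The following is a minimal sketch, not the procedure described in Chapter 4: it assumes that the end of .data is exported as _end_data, and the function names init_data and init_bss are hypothetical. The region boundaries are passed as parameters so that the logic can also be exercised on a host machine. Copying one 32-bit word at a time relies on the ALIGN(4) statements in the script.

```c
#include <assert.h>
#include <stdint.h>

/* On the target, the symbols exported by the linker script would be
 * declared as follows. Only their addresses are meaningful:
 *
 *   extern uint32_t _stored_data, _start_data, _end_data;
 *   extern uint32_t _start_bss, _end_bss;
 *
 *   init_data(&_stored_data, &_start_data, &_end_data);
 *   init_bss(&_start_bss, &_end_bss);
 */

/* Copy the initial values of .data from their load address (flash)
 * to their run-time address (RAM), one 32-bit word at a time. */
void init_data(const uint32_t *src, uint32_t *dst, const uint32_t *dst_end)
{
    assert(dst <= dst_end);
    while (dst < dst_end)
        *dst++ = *src++;
}

/* Fill .bss with zeros, as C requires for objects with static storage
 * duration that have no explicit initializer. */
void init_bss(uint32_t *dst, const uint32_t *dst_end)
{
    assert(dst <= dst_end);
    while (dst < dst_end)
        *dst++ = 0u;
}
```

Note that a linker symbol has no storage of its own: the C code takes its address with the & operator and never reads its value.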
An alternative way to force the linker to keep sections in the final executable, avoiding their removal by the linker garbage collector, is to mark them with the KEEP command in the linker script. Please note that this is an alternative to the __attribute__((used)) mechanism explained earlier:
.keys :
{
    . = ALIGN(4);
    KEEP(*(.keys*));
} > FLASH
It is useful, and advisable in general, to have the linker create a .map file alongside the resultant binary. This is done by appending the -Map=filename option to the link step, such as in:
$ arm-none-eabi-ld -o image.elf object1.o object2.o -T linker_script.ld -Map=map_file.map
The map file contains the location and the description of all the symbols, grouped by sections. This is useful to look for the specific location of symbols in the image, as well as for verifying that useful symbols are not accidentally discarded due to a misconfiguration.
Cross-compiling toolchains provide standard C libraries for generic functionalities, such as string manipulation or standard type declarations. These are essentially a subset of the library calls available in the application space of an operating system, including standard input/output functions. The backend implementation of these functions is often left to the application, so calling a library function that requires interaction with the hardware, such as printf, implies that a write function is implemented outside of the library, providing the final transfer to a device or peripheral.
The implementation of the backend write function determines which channel would act as the standard output for the embedded application. The linker is capable of resolving the dependencies towards standard library calls automatically, using the built-in newlib implementation. To exclude the standard C library symbols from the linking process, the -nostdlib option can be added to the options passed to GCC during the linking step.
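A backend write function of this kind might look like the following sketch. It assumes that the newlib build in use expects the conventional _write(int, char *, int) hook, which should be verified against the toolchain documentation. The uart_putc function and the uart_log buffer are hypothetical stand-ins for a real peripheral driver, used here so that the code can run without hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a UART transmit register: bytes are
 * collected in a buffer instead of being sent to a peripheral. */
static char uart_log[64];
static size_t uart_log_len;

static void uart_putc(char c)
{
    if (uart_log_len < sizeof(uart_log))
        uart_log[uart_log_len++] = c;
}

/* Backend write function (assumed prototype, see the text above).
 * It pushes every byte received from the C library to the output
 * channel and reports how many bytes it accepted. */
int _write(int file, char *ptr, int len)
{
    (void)file; /* all output streams share the same channel here */
    assert(ptr != NULL && len >= 0);
    for (int i = 0; i < len; i++)
        uart_putc(ptr[i]);
    return len;
}
```

Replacing uart_putc with a function that writes to a different peripheral is enough to redirect the standard output of the application to another channel.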