Learning standard integer types
To address the uncertainty of the default types provided by C and C++, both languages provide the standard integer types, which are accessible from the stdint.h header file. This header defines the following types:
- int8_t, uint8_t
- int16_t, uint16_t
- int32_t, uint32_t
- int64_t, uint64_t
In addition, stdint.h provides least and fast versions of the aforementioned types, as well as a max type and an integer pointer type, all of which are out of scope for this book. The previous types do exactly what you would expect; they define integer types with a specific number of bits. For example, an int8_t is a signed 8-bit integer. No matter the CPU architecture, operating system, or mode, these types are always the same size (the only property left undefined is their endianness, which usually only matters when working with networking and external devices).
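As a quick illustration, this width guarantee can be checked at compile time. The following is a minimal sketch (the byte counts assume the ubiquitous 8-bit byte, which holds on all mainstream platforms):
#include <cstdint>

// Each fixed-width type is exactly N bits wide on any platform that
// provides it, so these assertions hold wherever this file compiles
// (byte counts assume an 8-bit byte).
static_assert(sizeof(std::int8_t)  == 1, "int8_t must be 8 bits");
static_assert(sizeof(std::int16_t) == 2, "int16_t must be 16 bits");
static_assert(sizeof(std::int32_t) == 4, "int32_t must be 32 bits");
static_assert(sizeof(std::int64_t) == 8, "int64_t must be 64 bits");

int main() {}  // nothing to run; the checks happen at compile time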
In general, if the size of the data type you are working with is important, use the standard integer types instead of the default types provided by the language. Although the standard types solve many of the problems already identified, they have issues of their own. Specifically, stdint.h is a compiler-provided header file, with a different version defined for each possible combination of CPU architecture and operating system. Under the hood, the types defined in this file are typically represented using the default types. This is possible because the compiler knows whether an int32_t is an int or a long int. To demonstrate this, let's create an application capable of comparing integer types.
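Before we write it, it helps to see roughly what this looks like under the hood. The following is a simplified sketch, not the actual header contents (which vary by platform), of how a typical 64-bit Linux implementation might define the signed types:
// Simplified sketch of possible stdint.h internals on 64-bit Linux;
// the real header is more elaborate and differs across platforms.
typedef signed char  int8_t;
typedef short int    int16_t;
typedef int          int32_t;
typedef long int     int64_t;  // Windows would use long long int here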
We will start with the following headers:
#include <typeinfo>
#include <iostream>
#include <string>
#include <cstdint>
#include <cstdlib>
#include <cxxabi.h>
The typeinfo header will provide us with C++ type information, which will ultimately give us the root type for a specific integer type. The problem is that typeinfo provides the mangled version of this type information. To demangle it, we will need the cxxabi.h header, which provides access to the demangler in the compiler's C++ ABI support library (available with GCC and Clang):
template<typename T>
std::string type_name()
{
    int status;
    std::string name = typeid(T).name();

    // typeid() returns a mangled name; ask the ABI library to demangle
    // it. On failure (status != 0), fall back to the mangled name.
    auto demangled_name =
        abi::__cxa_demangle(name.c_str(), nullptr, nullptr, &status);

    if (status == 0) {
        name = demangled_name;
        std::free(demangled_name);
    }

    return name;
}
The previous function returns the root name for a provided type T. This is done by first getting the type's mangled name from C++, and then using the demangler to convert it into its human-readable form.
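As a quick sanity check, we can print a demangled name directly (the output shown assumes GCC on a 64-bit Ubuntu system):
std::cout << type_name<uint32_t>() << '\n';
// Output:
// unsigned int
Next, we will use type_name() in a function that compares two types: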
template<typename T1, typename T2>
void
are_equal()
{
    #define red "\033[1;31m"
    #define reset "\033[0m"

    std::cout << type_name<T1>() << " vs "
              << type_name<T2>() << '\n';

    if (sizeof(T1) == sizeof(T2)) {
        std::cout << " - size: both == " << sizeof(T1) << '\n';
    }
    else {
        std::cout << red " - size: "
                  << sizeof(T1)
                  << " != "
                  << sizeof(T2)
                  << reset "\n";
    }

    if (type_name<T1>() == type_name<T2>()) {
        std::cout << " - name: both == " << type_name<T1>() << '\n';
    }
    else {
        std::cout << red " - name: "
                  << type_name<T1>()
                  << " != "
                  << type_name<T2>()
                  << reset "\n";
    }
}
The previous function checks whether both the name and the size of the two types are the same, since one can match without the other (for example, the sizes could be equal while the root types differ). Note that we add ANSI escape sequences to this function's output (which goes to stdout). These sequences tell the console to print in red whenever a mismatch is found, providing a simple way to see which types are the same and which are not:
int main()
{
    are_equal<uint8_t, int8_t>();
    are_equal<uint8_t, uint32_t>();

    are_equal<signed char, int8_t>();
    are_equal<unsigned char, uint8_t>();

    are_equal<signed short int, int16_t>();
    are_equal<unsigned short int, uint16_t>();
    are_equal<signed int, int32_t>();
    are_equal<unsigned int, uint32_t>();
    are_equal<signed long int, int64_t>();
    are_equal<unsigned long int, uint64_t>();
    are_equal<signed long long int, int64_t>();
    are_equal<unsigned long long int, uint64_t>();
}
Finally, we compare each standard integer type with the expected (or, more appropriately stated, typical) default type to see whether the types are in fact the same on a given architecture. This example can be run on any architecture to reveal the differences between the default types and the standard integer types, letting us look for discrepancies when this information matters during system programming.
The results for a uint8_t are as follows (on an Intel-based 64-bit CPU running Ubuntu):
are_equal<uint8_t, int8_t>();
are_equal<uint8_t, uint32_t>();
// unsigned char vs signed char
// - size: both == 1
// - name: unsigned char != signed char
// unsigned char vs unsigned int
// - size: 1 != 4
// - name: unsigned char != unsigned int
The following shows the results for the char types:
are_equal<signed char, int8_t>();
are_equal<unsigned char, uint8_t>();
// signed char vs signed char
// - size: both == 1
// - name: both == signed char
// unsigned char vs unsigned char
// - size: both == 1
// - name: both == unsigned char
Finally, the following code shows the results for the remaining int types:
are_equal<signed short int, int16_t>();
are_equal<unsigned short int, uint16_t>();
are_equal<signed int, int32_t>();
are_equal<unsigned int, uint32_t>();
are_equal<signed long int, int64_t>();
are_equal<unsigned long int, uint64_t>();
are_equal<signed long long int, int64_t>();
are_equal<unsigned long long int, uint64_t>();
// short vs short
// - size: both == 2
// - name: both == short
// unsigned short vs unsigned short
// - size: both == 2
// - name: both == unsigned short
// int vs int
// - size: both == 4
// - name: both == int
// unsigned int vs unsigned int
// - size: both == 4
// - name: both == unsigned int
// long vs long
// - size: both == 8
// - name: both == long
// unsigned long vs unsigned long
// - size: both == 8
// - name: both == unsigned long
// long long vs long
// - size: both == 8
// - name: long long != long
// unsigned long long vs unsigned long
// - size: both == 8
// - name: unsigned long long != unsigned long
All of the types are the same, with some notable exceptions:
- The first two tests were provided specifically to ensure that an error would, in fact, be detected.
- On Ubuntu, an int64_t is implemented using a long and not a long long, which means that on Ubuntu a long and a long long have the same size, even though they remain distinct types (verified by the sketch following this list). This is not the case on Windows.
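We can verify this second point directly with a type trait. The following is a minimal sketch, assuming an LP64 Linux system; it will fail to compile on platforms where int64_t is not a long:
#include <cstdint>
#include <type_traits>

// On LP64 Linux (for example, x86_64 Ubuntu), int64_t is typically a
// typedef of long, so this assertion compiles; on Windows (LLP64),
// where int64_t is a long long, it would fail.
static_assert(std::is_same<std::int64_t, long>::value,
              "int64_t is not long on this platform");

int main() {}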
The most important thing to recognize in this demonstration is that the output doesn't include the standard integer type names, but instead contains only the default type names. This is because, as previously demonstrated, the compiler implements an int32_t on an Intel 64-bit CPU running Ubuntu using an int, and to the compiler, these types are one and the same. The difference is that on another CPU architecture and operating system, an int32_t might be implemented using a long int.
If you care about the size of an integer type, use a standard integer type and let the header file pick the default type for you. If you don't care about the size of the integer type, or an API dictates the type, use the default type instead. In the next section, we will show how even the standard integer types do not guarantee a specific size, and how the rules just described can break down when using a common system programming pattern.
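To make this guidance concrete, here is a small sketch contrasting the two cases (the function names and the 4-byte wire format are hypothetical):
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <iostream>

// Hypothetical wire format with a 4-byte length field: the size
// matters, so we use a fixed-width type. (Endianness, as noted
// earlier, is a separate concern; this reads host byte order.)
std::uint32_t read_length(const char *buf)
{
    std::uint32_t len;
    std::memcpy(&len, buf, sizeof(len));  // always exactly 4 bytes
    return len;
}

// Here the API dictates the type: std::strlen() returns std::size_t,
// so we use the type the API gives us rather than a fixed width.
std::size_t message_size(const char *msg)
{
    return std::strlen(msg);
}

int main()
{
    const char buf[4] = {0x10, 0x00, 0x00, 0x00};
    std::cout << read_length(buf) << '\n';      // 16 on little-endian systems
    std::cout << message_size("hello") << '\n'; // 5
}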