sanisizer
Sanitize sizes to avoid integer overflow
Both `new` and many constructors of STL containers accept an integer that specifies the size of the array/container. Any user-supplied value will be implicitly converted to the expected type of the argument - `std::size_t` for `new`, and some `size_type` for container constructors. In some cases, the implicit conversion could silently overflow, resulting in a smaller array/container than expected. This library provides a few methods for sanitizing size values so that any overflow results in an error.
Given an integer, we use `sanisizer::cast()` to convert it to the expected type of the size for our array/container. This will throw an error if the value of our integer would cause an overflow.
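As a rough illustration of the kind of check involved, here is a minimal standalone sketch. The names `checked_cast()` and `make_buffer()` are hypothetical and exist only for this example; this is not the library's implementation, and the actual signature of `sanisizer::cast()` should be taken from the reference documentation.

```cpp
#include <limits>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical helper for illustration only; not sanisizer's implementation.
// Converts 'x' to 'Size_', throwing if the value is not representable.
template<typename Size_, typename Input_>
Size_ checked_cast(Input_ x) {
    static_assert(std::is_integral<Size_>::value && std::is_integral<Input_>::value, "integer types only");
    if constexpr (std::is_signed<Input_>::value) {
        if (x < 0) {
            throw std::overflow_error("negative value cannot be used as a size");
        }
    }
    // Compare in an unsigned type that is wide enough for both operands.
    using Wide = std::make_unsigned_t<std::common_type_t<Size_, Input_> >;
    if (static_cast<Wide>(x) > static_cast<Wide>(std::numeric_limits<Size_>::max())) {
        throw std::overflow_error("value does not fit in the size type");
    }
    return static_cast<Size_>(x);
}

// Example: size a vector from a user-supplied 64-bit integer.
inline std::vector<double> make_buffer(long long n) {
    return std::vector<double>(checked_cast<std::vector<double>::size_type>(n));
}
```

The point is that the conversion happens explicitly, before the constructor is called, so a value that does not fit raises an error instead of silently wrapping.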
We could also use `sanisizer::create()`, which creates a new container instance with less of the type-deduction boilerplate:
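The idea can be sketched as a helper that looks up the container's own `size_type`, checks the requested size against it, and then constructs the container. The name `make_sized()` is hypothetical; this is a standalone approximation rather than the library's code, and the real `sanisizer::create()` may differ in its signature.

```cpp
#include <limits>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical helper for illustration only; not sanisizer's implementation.
// Deduces the container's size_type, checks that 'n' fits, then constructs.
template<class Container_, typename Input_>
Container_ make_sized(Input_ n) {
    using Size = typename Container_::size_type;
    using Wide = std::make_unsigned_t<std::common_type_t<Size, Input_> >;
    const bool negative = std::is_signed<Input_>::value && n < 0;
    if (negative || static_cast<Wide>(n) > static_cast<Wide>(std::numeric_limits<Size>::max())) {
        throw std::overflow_error("requested size does not fit in the container's size_type");
    }
    return Container_(static_cast<Size>(n));
}
```

A caller would then write something like `auto buffer = make_sized<std::vector<double> >(n);` without spelling out the `size_type`.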
See the reference documentation for more details.
A related problem is to cap a value at the maximum of its type, typically when defining defaults for function arguments or class members. This avoids compile-time overflow (and the associated compiler warnings) and ensures that a sane value is used. For example, for data member defaults:
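A minimal sketch of capping a data member default, assuming a hypothetical `capped()` helper written just for this example (the library's own capping function and its signature are described in the reference documentation):

```cpp
#include <limits>

// Hypothetical helper for illustration only.
// Caps 'value' at the largest value representable by 'T_', at compile time.
template<typename T_>
constexpr T_ capped(unsigned long long value) {
    return value > static_cast<unsigned long long>(std::numeric_limits<T_>::max())
        ? std::numeric_limits<T_>::max()
        : static_cast<T_>(value);
}

struct Options {
    // 'int' is only guaranteed to hold values up to 32767,
    // so a bare literal of 100000 could overflow on some platforms.
    int block_size = capped<int>(100000);

    // 1000 exceeds the range of an 8-bit 'unsigned char', so the maximum is used.
    unsigned char max_threads = capped<unsigned char>(1000);
};
```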
We can also use this inside function arguments:
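For a default function argument, the same pattern might look as follows. Both `capped()` and `num_chunks()` are hypothetical names used only for this sketch.

```cpp
#include <cstddef>
#include <limits>

// Hypothetical helper for illustration only.
// Caps 'value' at the largest value representable by 'T_', at compile time.
template<typename T_>
constexpr T_ capped(unsigned long long value) {
    return value > static_cast<unsigned long long>(std::numeric_limits<T_>::max())
        ? std::numeric_limits<T_>::max()
        : static_cast<T_>(value);
}

// Number of chunks needed to cover 'total' items.
// The default chunk size is 100000, or the maximum of 'unsigned short' if that is smaller.
inline std::size_t num_chunks(std::size_t total, unsigned short chunk_size = capped<unsigned short>(100000)) {
    return total / chunk_size + (total % chunk_size != 0);
}
```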
Sometimes, we need to perform some arithmetic to determine the size of our array/container. For example, if we are creating an array that is the concatenation of smaller arrays, we would need to add the sizes of the latter. This summation needs to be checked for overflow using the `sum()` functions.
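To show what an overflow-checked sum involves, here is a standalone sketch for unsigned size types. `checked_sum()` and `concatenate()` are hypothetical names for this example; the library's `sum()` functions may accept different arguments.

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical helper for illustration only; not sanisizer's implementation.
// Adds two sizes, throwing if the result would not fit in 'Size_'.
template<typename Size_>
Size_ checked_sum(Size_ a, Size_ b) {
    static_assert(std::is_unsigned<Size_>::value, "this sketch only handles unsigned types");
    if (a > std::numeric_limits<Size_>::max() - b) {
        throw std::overflow_error("sum of sizes overflows the size type");
    }
    return static_cast<Size_>(a + b);
}

// Example: reserve space for the concatenation of two arrays.
inline std::vector<int> concatenate(const std::vector<int>& x, const std::vector<int>& y) {
    std::vector<int> out;
    out.reserve(checked_sum(x.size(), y.size()));
    out.insert(out.end(), x.begin(), x.end());
    out.insert(out.end(), y.begin(), y.end());
    return out;
}
```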
When allocating contiguous memory for a high-dimensional array, we need to compute the product of the dimension extents. Again, this calculation can be done safely by using the `product()` functions to check for overflow.
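A comparable sketch for multiplication, again for unsigned size types only. `checked_product()` and `allocate_3d()` are hypothetical names for this example and are not the library's API.

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical helper for illustration only; not sanisizer's implementation.
// Multiplies two extents, throwing if the result would not fit in 'Size_'.
template<typename Size_>
Size_ checked_product(Size_ a, Size_ b) {
    static_assert(std::is_unsigned<Size_>::value, "this sketch only handles unsigned types");
    if (b != 0 && a > std::numeric_limits<Size_>::max() / b) {
        throw std::overflow_error("product of extents overflows the size type");
    }
    return static_cast<Size_>(a * b);
}

// Example: allocate contiguous storage for a 3-dimensional array.
inline std::vector<double> allocate_3d(std::size_t d1, std::size_t d2, std::size_t d3) {
    return std::vector<double>(checked_product(checked_product(d1, d2), d3));
}
```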
Consider an N-dimensional array of dimensions `(d1, d2, ..., dN)` that is flattened and stored contiguously in memory. Let the first dimension be the fastest changing, then the second, and so on. If we want to access element `(x1, x2, ..., xN)`, we would compute the offset:
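With the first dimension changing fastest, this is the usual nested expression for a flattened index:

$$
\text{offset} = x_1 + d_1 \left( x_2 + d_2 \left( x_3 + \dots + d_{N-1} x_N \right) \right) = x_1 + d_1 x_2 + d_1 d_2 x_3 + \dots + \left( \prod_{i=1}^{N-1} d_i \right) x_N
$$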
If `x1`, `d1`, `x2`, etc. are of a smaller type than the array/container's size, the intermediate sums and products could overflow, even if the final offset itself would be representable by the array/container size type. To avoid this, we provide the `nd_offset()` function, which casts all inputs to the size type before performing calculations.
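The following sketch shows the cast-first approach with a hypothetical variadic helper, `flat_offset()`, that takes alternating indices and extents. The argument order is an assumption made for this example; consult the reference documentation for how `nd_offset()` is actually called.

```cpp
#include <cstdint>

// Hypothetical helper for illustration only; not sanisizer's implementation.
// Base case: the index of the slowest-changing dimension.
template<typename Size_, typename Index_>
Size_ flat_offset(Index_ x_last) {
    return static_cast<Size_>(x_last);
}

// Recursive case: x + d * (offset within the remaining dimensions).
// Every input is cast to 'Size_' before any arithmetic is performed.
template<typename Size_, typename Index_, typename Extent_, typename... Rest_>
Size_ flat_offset(Index_ x, Extent_ d, Rest_... rest) {
    return static_cast<Size_>(x) + static_cast<Size_>(d) * flat_offset<Size_>(rest...);
}

// Example: 'd1 * x2' exceeds the 32-bit range of the inputs,
// but the offset is representable once everything is cast to 64 bits.
inline std::uint64_t example_offset() {
    std::uint32_t x1 = 10, d1 = 100000, x2 = 50000;
    return flat_offset<std::uint64_t>(x1, d1, x2); // 10 + 100000 * 50000 = 5000000010
}
```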
The idea is to use `nd_offset()` within tight loops - say, for accessing the 10th column of a row-major matrix:
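A sketch of such a loop, using a hypothetical two-dimensional helper `offset_2d()` in place of the library function so that the example is self-contained. Here `NR` and `NC` are the number of rows and columns, and `col` is the zero-based column of interest (so 9 for the 10th column).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a two-dimensional offset calculation;
// all inputs are cast to 'Size_' before any arithmetic.
template<typename Size_, typename Index1_, typename Extent_, typename Index2_>
Size_ offset_2d(Index1_ x1, Extent_ d1, Index2_ x2) {
    return static_cast<Size_>(x1) + static_cast<Size_>(d1) * static_cast<Size_>(x2);
}

// Extracts one column from a row-major matrix stored in a flat vector.
inline std::vector<double> extract_column(const std::vector<double>& matrix, int NR, int NC, int col) {
    std::vector<double> out;
    out.reserve(static_cast<std::size_t>(NR));
    for (int r = 0; r < NR; ++r) {
        // The column index changes fastest in a row-major layout, so it comes first.
        out.push_back(matrix[offset_2d<std::size_t>(col, NC, r)]);
    }
    return out;
}
```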
(One might think that it would be better to compute an initial offset outside of the loop and just add `NC` in each loop iteration to avoid the multiplication. However, modern compilers - well, Clang and GCC, at least - will optimize out the multiplication so there is no performance penalty in practice. The above approach is easier to reason about and is more amenable to vectorization as there are no dependencies in the loop body. Importantly, it avoids overflow from adding `NC` in the final iteration, which could be undefined behavior if the size type is signed.)
FetchContent
If you're using CMake, you just need to add something like this to your `CMakeLists.txt`:
Then you can link to sanisizer to make the headers available during compilation:
find_package()
You can install the library by cloning a suitable version of this repository and running the following commands:
Then you can use `find_package()` as usual:
If you're not using CMake, the simple approach is to just copy the files in the `include/` subdirectory - either directly or with Git submodules - and include their path during compilation with, e.g., GCC's `-I`.