subpar
Substitutable parallelization for C++ libraries
Substitutable parallelization functions.
Functions
template<bool nothrow_ = false, typename Task_, class Run_>
void parallelize_simple(Task_ num_tasks, Run_ run_task)
    Parallelize individual tasks across workers.
template<typename Task_>
int sanitize_num_workers(int num_workers, Task_ num_tasks)
    Adjust the number of workers to the number of tasks in parallelize_range().
template<bool nothrow_ = false, typename Task_, class Run_>
void parallelize_range(int num_workers, Task_ num_tasks, Run_ run_task_range)
    Parallelize a range of tasks across multiple workers.
template<bool nothrow_ = false, typename Task_, class Run_>
void subpar::parallelize_range(int num_workers, Task_ num_tasks, Run_ run_task_range)
Parallelize a range of tasks across multiple workers.
The aim is to split tasks in [0, num_tasks) into non-overlapping contiguous intervals that are executed by different workers. In the default parallelization scheme, we create num_workers evenly-sized intervals that are executed via OpenMP (if available) or <thread> (otherwise). Not all workers may be used, e.g., if num_tasks < num_workers.
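The partitioning described above can be sketched with <thread> alone. This is a minimal illustration, not the library's implementation: the name sketch_parallelize_range() and the way the remainder is spread over the first few workers are assumptions, and exception handling is left out.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for the default <thread>-based scheme.
// Assumes num_tasks is non-negative and num_workers fits in Task_.
template<typename Task_, class Run_>
void sketch_parallelize_range(int num_workers, Task_ num_tasks, Run_ run_task_range) {
    if (num_workers <= 0) {
        num_workers = 1; // zero or negative values are treated as 1
    }
    if (num_tasks == 0) {
        return;
    }

    const Task_ nw = static_cast<Task_>(num_workers);
    const Task_ base = num_tasks / nw;  // minimum interval length
    const Task_ extra = num_tasks % nw; // the first 'extra' workers get one more task

    std::vector<std::thread> threads;
    Task_ start = 0;
    for (int w = 0; w < num_workers; ++w) {
        const Task_ length = base + (static_cast<Task_>(w) < extra ? 1 : 0);
        if (length == 0) {
            break; // more workers than tasks: the remaining workers are unused
        }
        // w, start and length are copied into the thread; the callable is shared.
        threads.emplace_back([=, &run_task_range]() { run_task_range(w, start, length); });
        start += length;
    }
    for (auto& t : threads) {
        t.join();
    }
}

// Usage sketch: record which worker handled each task.
inline std::vector<int> sketch_range_demo(int num_workers, int num_tasks) {
    std::vector<int> owner(num_tasks, -1);
    sketch_parallelize_range(num_workers, num_tasks, [&](int w, int start, int length) {
        for (int i = start; i < start + length; ++i) {
            owner[i] = w; // intervals do not overlap, so no locking is needed
        }
    });
    return owner;
}
```

With 3 workers and 10 tasks, this sketch yields intervals of length 4, 3 and 3; with 8 workers and 2 tasks, only workers 0 and 1 receive any work.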
The SUBPAR_USES_OPENMP_RANGE macro will be defined as 1 if and only if OpenMP was used in the default scheme. Users can define the SUBPAR_NO_OPENMP_RANGE macro to force parallelize_range() to use <thread> even if OpenMP is available. This is occasionally useful when OpenMP cannot be used in some parts of the application, e.g., with POSIX forks.
Advanced users can substitute in their own parallelization scheme by defining SUBPAR_CUSTOM_PARALLELIZE_RANGE before including the subpar header. This should be either a function-like macro or the name of a function that accepts the same arguments as parallelize_range(). If defined, the custom scheme will be used instead of the default scheme whenever parallelize_range() is called. Macro authors should note the expectations on run_task_range().
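As an illustration of the substitution mechanism, the sketch below defines a trivial serial scheme and registers it by name. The function my_serial_range() is hypothetical, and the header path in the commented-out include is an assumption; the include is omitted so that the sketch stands alone, and the macro is invoked directly to mimic how it might be used.

```cpp
#include <cassert>

// Hypothetical custom scheme taking the same arguments as parallelize_range().
// It ignores num_workers and runs the whole range in worker 0 on the calling thread.
template<typename Task_, class Run_>
void my_serial_range(int /* num_workers */, Task_ num_tasks, Run_ run_task_range) {
    run_task_range(0, static_cast<Task_>(0), num_tasks);
}

// The macro must be defined before the subpar header is included.
#define SUBPAR_CUSTOM_PARALLELIZE_RANGE my_serial_range
// #include "subpar/subpar.hpp"  // assumed header path; left out to keep this sketch standalone

// Stand-in for a call that would normally go through parallelize_range().
inline long long custom_range_demo(int n) {
    long long total = 0;
    SUBPAR_CUSTOM_PARALLELIZE_RANGE(4, n, [&](int, int start, int length) {
        for (int i = start; i < start + length; ++i) {
            total += i;
        }
    });
    return total;
}
```

A serial scheme like this can be handy for debugging, as it removes threading from the picture while keeping the calling code unchanged.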
If nothrow_ = true, exception handling is omitted from the default parallelization scheme. This avoids some unnecessary work when the caller knows that run_task_range() will never throw. For custom schemes, if SUBPAR_CUSTOM_PARALLELIZE_RANGE_NOTHROW is defined, it will be used if nothrow_ = true; otherwise, SUBPAR_CUSTOM_PARALLELIZE_RANGE will continue to be used.
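To show the kind of work that nothrow_ = true can skip, here is one common way to carry exceptions out of worker threads: capture a std::exception_ptr per worker and rethrow after joining. This is a sketch under assumptions, not a description of how the library handles exceptions; sketch_run_workers() is a hypothetical name and num_workers is assumed to be positive.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <thread>
#include <vector>

// Hypothetical helper: runs run(w) once per worker on its own thread.
template<bool nothrow_ = false, class Run_>
void sketch_run_workers(int num_workers, Run_ run) {
    // One slot per worker for a captured exception; not needed if nothing can throw.
    std::vector<std::exception_ptr> errors(nothrow_ ? 0 : num_workers);
    std::vector<std::thread> threads;

    for (int w = 0; w < num_workers; ++w) {
        if (nothrow_) {
            // No try/catch and no bookkeeping.
            threads.emplace_back([w, &run]() { run(w); });
        } else {
            threads.emplace_back([w, &run, &errors]() {
                try {
                    run(w);
                } catch (...) {
                    errors[w] = std::current_exception(); // an exception escaping a thread would terminate
                }
            });
        }
    }

    for (auto& t : threads) {
        t.join();
    }
    for (const auto& e : errors) {
        if (e) {
            std::rethrow_exception(e); // surface the first failure to the caller
        }
    }
}

// Usage sketch: an exception thrown in worker 1 reaches the caller.
inline bool throwing_demo() {
    try {
        sketch_run_workers(3, [](int w) {
            if (w == 1) {
                throw std::runtime_error("worker 1 failed");
            }
        });
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}
```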
It is worth stressing that run_task_range() may be called multiple times in the same worker, i.e., with the same w but different start and length (not necessarily contiguous). Any use of w by run_task_range() should be robust to any number of previous calls with the same w. For example, if w is used to store thread-specific results for use outside parallelize_range(), the results should be accumulated in a manner that preserves the results of previous calls. If the code must be run exactly once per worker, consider using parallelize_simple() instead. Developers may also consider using silly_parallelize_range() to test for correct use of w.
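The accumulation advice can be illustrated with a small driver that deliberately calls the function several times with the same w. The driver call_worker_repeatedly() is hypothetical and serial; it is not the library's silly_parallelize_range(), only a stand-in that produces repeated, non-contiguous calls per worker.

```cpp
#include <cassert>
#include <vector>

// Hypothetical driver: hands out tasks one at a time, alternating between
// workers 0 and 1, so each worker sees several non-contiguous ranges.
template<class Run_>
void call_worker_repeatedly(int num_tasks, Run_ run_task_range) {
    for (int start = 0; start < num_tasks; ++start) {
        run_task_range(start % 2, start, 1);
    }
}

// Sums the task indices handled by each worker.
inline std::vector<long long> accumulate_per_worker(int num_tasks) {
    std::vector<long long> partial(2, 0);
    call_worker_repeatedly(num_tasks, [&](int w, int start, int length) {
        long long local = 0;
        for (int i = start; i < start + length; ++i) {
            local += i;
        }
        // '+=' preserves results from earlier calls with the same w;
        // plain assignment would silently discard them.
        partial[w] += local;
    });
    return partial;
}
```

Writing partial[w] = local instead would still give the right answer under a scheme that calls each worker once, which is why this class of bug tends to surface only when the scheme changes.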
nothrow_ | Whether the Run_ function cannot throw an exception. |
Task_ | Integer type for the number of tasks. |
Run_ | Function that accepts three arguments: the worker index w, and the start and length of the task range. |
num_workers | Number of workers. This should be a positive integer. Any zero or negative values are treated as 1. (See also sanitize_num_workers().) |
num_tasks | Number of tasks. This should be a non-negative integer. |
run_task_range | Function to iterate over a range of tasks within a worker. This may be called zero, one or multiple times in any particular worker. In each call, it receives the worker index w together with the start and length of the range to process. This function may throw an exception if nothrow_ = false. |
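template<bool nothrow_ = false, typename Task_, class Run_>
void subpar::parallelize_simple(Task_ num_tasks, Run_ run_task)
Parallelize individual tasks across workers.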
The aim is to parallelize the execution of tasks across workers, under the assumption that there is a 1:1 mapping between them. This is most relevant when the overall computation has already been split up and assigned to workers outside of subpar. In such cases, parallelize_simple() is more suitable than parallelize_range() as it avoids the unnecessary overhead of partitioning the task interval.
The SUBPAR_USES_OPENMP_SIMPLE macro will be defined as 1 if and only if OpenMP was used in the default scheme. Users can define the SUBPAR_NO_OPENMP_SIMPLE macro to force parallelize_simple() to use <thread> even if OpenMP is available. This is occasionally useful when OpenMP cannot be used in some parts of the application, e.g., with POSIX forks.
Advanced users can substitute in their own parallelization scheme by defining SUBPAR_CUSTOM_PARALLELIZE_SIMPLE before including the subpar header. This should be either a function-like macro or the name of a function that accepts the same arguments as parallelize_simple(). If defined, the custom scheme will be used instead of the default scheme whenever parallelize_simple() is called. Macro authors should note the expectations on run_task().
If nothrow_ = true, exception handling is omitted from the default parallelization scheme. This avoids some unnecessary work when the caller knows that run_task() will never throw. For custom schemes, if SUBPAR_CUSTOM_PARALLELIZE_SIMPLE_NOTHROW is defined, it will be used if nothrow_ = true; otherwise, SUBPAR_CUSTOM_PARALLELIZE_SIMPLE will continue to be used.
nothrow_ | Whether the Run_ function cannot throw an exception. |
Task_ | Integer type for the number of tasks. |
Run_ | Function that accepts w, the index of the task (and thus the worker), as a Task_. Any return value is ignored. |
num_tasks | Number of tasks. This is also the number of workers as we assume a 1:1 mapping between tasks and workers. |
run_task | Function to execute the task for each worker. This will be called exactly once in each worker, where w is guaranteed to be in [0, num_tasks). This function may throw an exception if nothrow_ = false. |
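The 1:1 mapping between tasks and workers can be sketched as one thread per task. As before, this is an illustration under assumptions rather than the library's code: sketch_parallelize_simple() is a hypothetical name, and exception handling is omitted (comparable to nothrow_ = true).

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in: each task index w gets its own thread,
// and run_task(w) is called exactly once per worker.
template<typename Task_, class Run_>
void sketch_parallelize_simple(Task_ num_tasks, Run_ run_task) {
    std::vector<std::thread> threads;
    for (Task_ w = 0; w < num_tasks; ++w) {
        threads.emplace_back([w, &run_task]() { run_task(w); });
    }
    for (auto& t : threads) {
        t.join();
    }
}

// Usage sketch: count how many times each worker's slot is touched.
inline std::vector<int> simple_demo(int num_tasks) {
    std::vector<int> calls(num_tasks, 0);
    sketch_parallelize_simple(num_tasks, [&](int w) {
        calls[w] += 1; // each worker writes only to its own slot
    });
    return calls;
}
```

Because there is no interval to partition, the per-worker code can assume it runs exactly once, which is the guarantee that parallelize_range() does not give.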
template<typename Task_>
int subpar::sanitize_num_workers(int num_workers, Task_ num_tasks)
Adjust the number of workers to the number of tasks in parallelize_range().
It is not strictly necessary to run sanitize_num_workers() prior to parallelize_range() as the latter will automatically behave correctly with all inputs. However, on occasion, applications need a better upper bound on the number of workers, e.g., to pre-allocate expensive per-worker data structures. In such cases, the return value of sanitize_num_workers() can be used by the application before being passed to parallelize_range().
Task_ | Integer type for the number of tasks. |
num_workers | Number of workers. This may be negative or zero. |
num_tasks | Number of tasks. This should be a non-negative integer. |
Returns the adjusted number of workers. Negative or zero values of num_workers are converted to 1 if num_tasks > 0, otherwise zero. If num_workers is greater than num_tasks, the former is set to the latter.
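The adjustment rule stated above is short enough to write out. The function below, sketch_sanitize_num_workers(), is a hypothetical re-statement of that rule for illustration and assumes num_tasks is non-negative; it is not taken from the library's source.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical re-statement of the documented rule for adjusting num_workers.
template<typename Task_>
int sketch_sanitize_num_workers(int num_workers, Task_ num_tasks) {
    if (num_workers <= 0) {
        // Zero or negative: 1 if there is any work, otherwise 0.
        return num_tasks > 0 ? 1 : 0;
    }

    // Compare as unsigned values to avoid signed/unsigned pitfalls when Task_
    // is an unsigned type; both values are non-negative at this point.
    using UWorkers = typename std::make_unsigned<int>::type;
    using UTasks = typename std::make_unsigned<Task_>::type;
    if (static_cast<UWorkers>(num_workers) > static_cast<UTasks>(num_tasks)) {
        return static_cast<int>(num_tasks); // safe: num_tasks < num_workers, so it fits in an int
    }
    return num_workers;
}
```

An application might use the returned value to size a vector of per-worker buffers before passing the same value on as num_workers.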