subpar
Substitutable parallelization for C++ libraries

Substitutable parallelization functions.
Functions

template<bool nothrow_ = false, typename Task_, class Run_>
void parallelize_simple(const Task_ num_tasks, const Run_ run_task)
    Parallelize individual tasks across workers.

template<typename Task_>
int sanitize_num_workers(const int num_workers, const Task_ num_tasks)
    Adjust the number of workers to the number of tasks in parallelize_range().

template<bool nothrow_ = false, typename Task_, class Run_>
int parallelize_range(int num_workers, const Task_ num_tasks, const Run_ run_task_range)
    Parallelize a range of tasks across multiple workers.
int subpar::parallelize_range(int num_workers, const Task_ num_tasks, const Run_ run_task_range)
Parallelize a range of tasks across multiple workers.
This function splits the integer sequence [0, num_tasks) into non-overlapping, non-empty, contiguous ranges. Each range is passed to the user-supplied run_task_range() function for parallel execution by different workers via OpenMP (if available) or <thread> (otherwise). Not all workers may be used, e.g., if num_tasks < num_workers, but each worker will process no more than one range. By default, the ranges are evenly sized for efficient load-sharing across workers.

The partitioning of the ranges is also deterministic: given the same num_workers and num_tasks, parallelize_range() will always call run_task_range() with the same combinations of arguments (w, start and length). This avoids stochasticity in downstream applications that perform, e.g., reductions of floating-point results generated in each worker.
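For illustration, the even and deterministic partitioning described above can be sketched as a stand-alone function. This is a simplified stand-in for exposition, not the library's actual implementation:

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Split [0, num_tasks) into at most num_workers evenly sized, contiguous,
// non-overlapping ranges. Deterministic: the same inputs always yield the
// same (start, length) pairs.
std::vector<std::pair<int, int>> split_ranges(int num_workers, int num_tasks) {
    num_workers = std::max(num_workers, 1); // non-positive values treated as 1
    int used = std::min(num_workers, num_tasks); // at most one range per worker
    int base = (used > 0 ? num_tasks / used : 0);
    int extra = (used > 0 ? num_tasks % used : 0); // first 'extra' ranges get one more task

    std::vector<std::pair<int, int>> ranges;
    int start = 0;
    for (int w = 0; w < used; ++w) {
        int length = base + (w < extra ? 1 : 0);
        ranges.emplace_back(start, length);
        start += length;
    }
    return ranges;
}
```

For example, split_ranges(4, 10) yields the ranges (0, 3), (3, 3), (6, 2) and (8, 2), whose union is [0, 10).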
The SUBPAR_USES_OPENMP_RANGE macro will be defined as 1 if and only if OpenMP was used in the default scheme. Users can define the SUBPAR_NO_OPENMP_RANGE macro to force parallelize_range() to use <thread> even if OpenMP is available. This is occasionally useful when OpenMP cannot be used in some parts of the application, e.g., with POSIX forks.
Advanced users can substitute in their own parallelization scheme by defining SUBPAR_CUSTOM_PARALLELIZE_RANGE before including the subpar header. For example, we might restrict the number of used workers to the number of physical cores available on the system, or we might create task ranges of different lengths for targeted execution on performance or efficiency cores. SUBPAR_CUSTOM_PARALLELIZE_RANGE should be a function-like macro or the name of a function that accepts the same arguments as parallelize_range(), partitions the tasks into num_workers or fewer ranges, calls run_task_range() on each task range, and returns the number of used workers. All expectations for the arguments and return value for parallelize_range() are still applicable here. Partitioning of task ranges should be deterministic but can vary across compute environments, e.g., with different numbers of available cores. Once the macro is defined, the custom scheme will be used instead of the default scheme whenever parallelize_range() is called.
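As a sketch, a custom scheme that simply runs all tasks sequentially in the calling thread (useful, e.g., for debugging) might look like the following. The function name is illustrative, and the commented-out include marks where the subpar header would be pulled in after the macro definition:

```cpp
// A custom scheme must partition [0, num_tasks) into num_workers or fewer
// ranges, call run_task_range() on each range, and return the number of
// workers actually used. Here we use a single range covering all tasks,
// so at most one "worker" is used.
template<typename Task_, class Run_>
int sequential_parallelize(int num_workers, Task_ num_tasks, Run_ run_task_range) {
    (void) num_workers; // unused in this sequential scheme
    if (num_tasks > 0) {
        run_task_range(0, static_cast<Task_>(0), num_tasks); // w = 0, full range
        return 1;
    }
    return 0; // no tasks, no workers used
}

// Define before including the subpar header so that parallelize_range()
// dispatches to this scheme instead of the default.
#define SUBPAR_CUSTOM_PARALLELIZE_RANGE ::sequential_parallelize
// #include "subpar/subpar.hpp"
```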
If nothrow_ = true, exception handling is omitted from the default parallelization scheme. This avoids some unnecessary work when the caller knows that run_task_range() will never throw. For custom schemes, if SUBPAR_CUSTOM_PARALLELIZE_RANGE_NOTHROW is defined, it will be used if nothrow_ = true; otherwise, SUBPAR_CUSTOM_PARALLELIZE_RANGE will continue to be used. Any definition of SUBPAR_CUSTOM_PARALLELIZE_RANGE_NOTHROW should follow the same rules described above for SUBPAR_CUSTOM_PARALLELIZE_RANGE.
A worker ID of zero may or may not indicate that execution is being performed on the main thread. This relationship is true for the default <thread>-based implementation but may not be for OpenMP. (Note that the OpenMP thread number is not the same as the worker ID.) Custom overrides may also use a non-main thread to execute run_task_range() with w = 0.
Template Parameters

| nothrow_ | Whether the Run_ function cannot throw an exception. |
| Task_ | Integer type for the number of tasks. |
| Run_ | Function that accepts three arguments: w, the worker ID; start, the first task in the range assigned to that worker; and length, the number of tasks in that range. |

Parameters

| num_workers | Number of workers. This should be a positive integer. Any zero or negative values are treated as 1. (See also sanitize_num_workers().) |
| num_tasks | Number of tasks. This should be a non-negative integer. |
| run_task_range | Function to iterate over a range of tasks within a worker. This will be called no more than once in each worker. In each call, w is the worker ID and [start, start + length) is a non-empty range of tasks in [0, num_tasks). This function may throw an exception if nothrow_ = false. |
Returns

The number of workers (K) that were actually used. This is guaranteed to be no greater than num_workers (or 1, if num_workers is not positive). It can be assumed that run_task_range() was called once for each w in [0, 1, ..., K-1], where the union of task ranges across all K workers is [0, num_tasks).

void subpar::parallelize_simple(const Task_ num_tasks, const Run_ run_task)
Parallelize individual tasks across workers.
The aim is to parallelize the execution of tasks across workers, under the assumption that there is a 1:1 mapping between them. This is most relevant when the overall computation has already been split up and assigned to workers outside of subpar. In such cases, parallelize_simple() is more suitable than parallelize_range() as it avoids the unnecessary overhead of partitioning the task interval.
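A minimal <thread>-based stand-in with the same contract illustrates the 1:1 task-to-worker mapping. This is not the library's actual implementation, which also handles exceptions and the OpenMP path:

```cpp
#include <thread>
#include <vector>

// Stand-in with the same contract as parallelize_simple(): one worker per
// task, with run_task(w) called exactly once for each w in [0, num_tasks).
template<typename Task_, class Run_>
void simple_parallelize(Task_ num_tasks, Run_ run_task) {
    std::vector<std::thread> workers;
    workers.reserve(num_tasks);
    for (Task_ w = 0; w < num_tasks; ++w) {
        workers.emplace_back([&run_task, w]() { run_task(w); });
    }
    for (auto& t : workers) {
        t.join();
    }
}
```

Each worker writes to its own slot in any shared output, so no further synchronization is needed beyond the joins.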
The SUBPAR_USES_OPENMP_SIMPLE macro will be defined as 1 if and only if OpenMP was used in the default scheme. Users can define the SUBPAR_NO_OPENMP_SIMPLE macro to force parallelize_simple() to use <thread> even if OpenMP is available. This is occasionally useful when OpenMP cannot be used in some parts of the application, e.g., with POSIX forks.
Advanced users can substitute in their own parallelization scheme by defining SUBPAR_CUSTOM_PARALLELIZE_SIMPLE before including the subpar header. This should be a function-like macro or the name of a function that accepts the same arguments as parallelize_simple(). If defined, the custom scheme will be used instead of the default scheme whenever parallelize_simple() is called. Macro authors should note the expectations on run_task().
If nothrow_ = true, exception handling is omitted from the default parallelization scheme. This avoids some unnecessary work when the caller knows that run_task() will never throw. For custom schemes, if SUBPAR_CUSTOM_PARALLELIZE_SIMPLE_NOTHROW is defined, it will be used if nothrow_ = true; otherwise, SUBPAR_CUSTOM_PARALLELIZE_SIMPLE will continue to be used.
A worker ID of zero may or may not indicate that execution is being performed on the main thread. This relationship is true for the default <thread>-based implementation but may not be for OpenMP. (Note that the OpenMP thread number is not the same as the worker ID.) Custom overrides may also use a non-main thread to execute run_task() with w = 0.
Template Parameters

| nothrow_ | Whether the Run_ function cannot throw an exception. |
| Task_ | Integer type for the number of tasks. |
| Run_ | Function that accepts w, the index of the task (and thus the worker ID) as a Task_. Any return value is ignored. |

Parameters

| num_tasks | Number of tasks. This is also the number of workers, as we assume a 1:1 mapping between tasks and workers. It should be non-negative. |
| run_task | Function to execute each task. This will be called exactly once in its corresponding worker, where w is guaranteed to be in [0, num_tasks). This function may throw an exception if nothrow_ = false. |
int subpar::sanitize_num_workers(const int num_workers, const Task_ num_tasks)
Adjust the number of workers to the number of tasks in parallelize_range().
It is not strictly necessary to run sanitize_num_workers() prior to parallelize_range(), as the latter will automatically behave correctly with all inputs. However, on occasion, applications need a better upper bound on the number of workers, e.g., to pre-allocate expensive per-worker data structures. sanitize_num_workers() provides this upper bound, refining the number of workers prior to the call to parallelize_range().
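The intent can be sketched as follows; this is an assumed stand-in for illustration, not the library's exact rule:

```cpp
#include <algorithm>

// Plausible stand-in for sanitize_num_workers(): clamp non-positive worker
// counts to 1, then cap at the number of tasks so that no worker sits idle.
template<typename Task_>
int sanitize_workers(int num_workers, Task_ num_tasks) {
    num_workers = std::max(num_workers, 1);
    if (static_cast<Task_>(num_workers) > num_tasks) {
        num_workers = static_cast<int>(num_tasks);
    }
    return num_workers;
}
```

An application could then size its per-worker buffers to sanitize_workers(num_workers, num_tasks) before handing the same num_workers and num_tasks to parallelize_range().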
Template Parameters

| Task_ | Integer type for the number of tasks. |

Parameters

| num_workers | Number of workers. This may be negative or zero. |
| num_tasks | Number of tasks. This should be a non-negative integer. |
Returns

A sanitized number of workers, which serves as an upper bound on the return value of parallelize_range() when called with the same num_workers and num_tasks.