subpar
Substitutable parallelization for C++ libraries
subpar Namespace Reference

Substitutable parallelization functions. More...

Functions

template<bool nothrow_ = false, typename Task_ , class Run_ >
void parallelize_simple (const Task_ num_tasks, const Run_ run_task)
 Parallelize individual tasks across workers.
 
template<typename Task_ >
int sanitize_num_workers (const int num_workers, const Task_ num_tasks)
 Adjust the number of workers to the number of tasks in parallelize_range().
 
template<bool nothrow_ = false, typename Task_ , class Run_ >
int parallelize_range (int num_workers, const Task_ num_tasks, const Run_ run_task_range)
 Parallelize a range of tasks across multiple workers.
 

Detailed Description

Substitutable parallelization functions.

Function Documentation

◆ parallelize_range()

template<bool nothrow_ = false, typename Task_ , class Run_ >
int subpar::parallelize_range ( int num_workers,
const Task_ num_tasks,
const Run_ run_task_range )

Parallelize a range of tasks across multiple workers.

This function splits the integer sequence [0, num_tasks) into non-overlapping, non-empty, contiguous ranges. Each range is passed to the user-supplied run_task_range() function for parallel execution by different workers via OpenMP (if available) or <thread> (otherwise). Not all workers may be used, e.g., if num_tasks < num_workers, but each worker will process no more than one range. By default, the ranges are evenly sized for efficient load-sharing across workers. The partitioning of the ranges is also deterministic: given the same num_workers and num_tasks, parallelize_range() will always call run_task_range() with the same combinations of arguments (w, start and length). This avoids stochasticity in downstream applications that perform, e.g., reductions of floating-point results generated in each worker.

The SUBPAR_USES_OPENMP_RANGE macro will be defined as 1 if and only if OpenMP was used in the default scheme. Users can define the SUBPAR_NO_OPENMP_RANGE macro to force parallelize_range() to use <thread> even if OpenMP is available. This is occasionally useful when OpenMP cannot be used in some parts of the application, e.g., with POSIX forks.

Advanced users can substitute in their own parallelization scheme by defining SUBPAR_CUSTOM_PARALLELIZE_RANGE before including the subpar header. For example, we might restrict the number of used workers to the number of physical cores available on the system, or we might create task ranges of different lengths for targeted execution on performance or efficiency cores. SUBPAR_CUSTOM_PARALLELIZE_RANGE should be a function-like macro or the name of a function that accepts the same arguments as parallelize_range(), partitions the tasks into num_workers or fewer ranges, calls run_task_range() on each task range, and returns the number of used workers. All expectations for the arguments and return value for parallelize_range() are still applicable here. Partitioning of task ranges should be deterministic but can vary across compute environments, e.g., with different numbers of available cores. Once the macro is defined, the custom scheme will be used instead of the default scheme whenever parallelize_range() is called.

If nothrow_ = true, exception handling is omitted from the default parallelization scheme. This avoids some unnecessary work when the caller knows that run_task_range() will never throw. For custom schemes, if SUBPAR_CUSTOM_PARALLELIZE_RANGE_NOTHROW is defined, it will be used if nothrow_ = true; otherwise, SUBPAR_CUSTOM_PARALLELIZE_RANGE will continue to be used. Any definition of SUBPAR_CUSTOM_PARALLELIZE_RANGE_NOTHROW should follow the same rules described above for SUBPAR_CUSTOM_PARALLELIZE_RANGE.

A worker ID of zero may or may not indicate that execution is being performed on the main thread. This relationship is true for the default <thread>-based implementation but may not be for OpenMP. (Note that the OpenMP thread number is not the same as the worker ID.) Custom overrides may also use a non-main thread to execute run_task_range() with w = 0.

Template Parameters
nothrow_: Whether the Run_ function is guaranteed not to throw an exception.
Task_: Integer type for the number of tasks.
Run_: Function that accepts three arguments:
  • w, the identity of the worker executing this task range. This will be passed as an int in [0, num_workers).
  • start, the start index of the task range. This will be passed as a Task_ in [0, num_tasks).
  • length, the number of tasks in the task range. This will be passed as a Task_ in (0, num_tasks], i.e., it is guaranteed to be positive.
Any return value is ignored.
Parameters
num_workers: Number of workers. This should be a positive integer. Any zero or negative values are treated as 1. (See also sanitize_num_workers().)
num_tasks: Number of tasks. This should be a non-negative integer.
run_task_range: Function to iterate over a range of tasks within a worker. This will be called no more than once in each worker. In each call:
  • w is guaranteed to be in [0, K) where K is the return value of parallelize_range(). K itself is guaranteed to be no greater than num_workers.
  • [start, start + length) is guaranteed to be a non-empty range of tasks that lies in [0, num_tasks). It will not overlap with the task range used in any other call to run_task_range() in the same call to parallelize_range().
This function may throw an exception if nothrow_ = false.
Returns
The number of workers (K) that were actually used. This is guaranteed to be no greater than num_workers (or 1, if num_workers is not positive). It can be assumed that run_task_range() was called once for each w in [0, 1, ..., K-1], where the union of the task ranges across all K workers is [0, num_tasks).

◆ parallelize_simple()

template<bool nothrow_ = false, typename Task_ , class Run_ >
void subpar::parallelize_simple ( const Task_ num_tasks,
const Run_ run_task )

Parallelize individual tasks across workers.

The aim is to parallelize the execution of tasks across workers, under the assumption that there is a 1:1 mapping between them. This is most relevant when the overall computation has already been split up and assigned to workers outside of subpar. In such cases, parallelize_simple() is more suitable than parallelize_range() as it avoids the unnecessary overhead of partitioning the task interval.

The SUBPAR_USES_OPENMP_SIMPLE macro will be defined as 1 if and only if OpenMP was used in the default scheme. Users can define the SUBPAR_NO_OPENMP_SIMPLE macro to force parallelize_simple() to use <thread> even if OpenMP is available. This is occasionally useful when OpenMP cannot be used in some parts of the application, e.g., with POSIX forks.

Advanced users can substitute in their own parallelization scheme by defining SUBPAR_CUSTOM_PARALLELIZE_SIMPLE before including the subpar header. This should be a function-like macro or the name of a function that accepts the same arguments as parallelize_simple(). Once the macro is defined, the custom scheme will be used instead of the default scheme whenever parallelize_simple() is called. Macro authors should note the expectations on run_task().

If nothrow_ = true, exception handling is omitted from the default parallelization scheme. This avoids some unnecessary work when the caller knows that run_task() will never throw. For custom schemes, if SUBPAR_CUSTOM_PARALLELIZE_SIMPLE_NOTHROW is defined, it will be used if nothrow_ = true; otherwise, SUBPAR_CUSTOM_PARALLELIZE_SIMPLE will continue to be used.

A worker ID of zero may or may not indicate that execution is being performed on the main thread. This relationship is true for the default <thread>-based implementation but may not be for OpenMP. (Note that the OpenMP thread number is not the same as the worker ID.) Custom overrides may also use a non-main thread to execute run_task() with w = 0.

Template Parameters
nothrow_: Whether the Run_ function is guaranteed not to throw an exception.
Task_: Integer type for the number of tasks.
Run_: Function that accepts w, the index of the task (and thus the worker ID), as a Task_. Any return value is ignored.
Parameters
num_tasks: Number of tasks. This is also the number of workers, as we assume a 1:1 mapping between tasks and workers. It should be non-negative.
run_task: Function to execute each task. This will be called exactly once in its corresponding worker, where w is guaranteed to be in [0, num_tasks). This function may throw an exception if nothrow_ = false.

◆ sanitize_num_workers()

template<typename Task_ >
int subpar::sanitize_num_workers ( const int num_workers,
const Task_ num_tasks )

Adjust the number of workers to the number of tasks in parallelize_range().

It is not strictly necessary to run sanitize_num_workers() prior to parallelize_range(), as the latter will automatically behave correctly with all inputs. However, on occasion, applications need a tighter upper bound on the number of workers, e.g., to pre-allocate expensive per-worker data structures. Calling sanitize_num_workers() provides this bound by refining the number of workers prior to calling parallelize_range().

Template Parameters
Task_: Integer type for the number of tasks.
Parameters
num_workers: Number of workers. This may be negative or zero.
num_tasks: Number of tasks. This should be a non-negative integer.
Returns
A more suitable number of workers, possibly zero. The return value of sanitize_num_workers() is an upper bound on the return value of parallelize_range() with the same num_workers and num_tasks.