subpar
Substitutable parallelization for C++ libraries
subpar Namespace Reference

Substitutable parallelization functions.

Functions

template<bool nothrow_ = false, typename Task_, class Run_>
void parallelize_simple(Task_ num_tasks, Run_ run_task)
    Parallelize individual tasks across workers.

template<typename Task_>
int sanitize_num_workers(int num_workers, Task_ num_tasks)
    Adjust the number of workers to the number of tasks in parallelize_range().

template<bool nothrow_ = false, typename Task_, class Run_>
void parallelize_range(int num_workers, Task_ num_tasks, Run_ run_task_range)
    Parallelize a range of tasks across multiple workers.

Detailed Description

Substitutable parallelization functions.

Function Documentation

◆ parallelize_range()

template<bool nothrow_ = false, typename Task_, class Run_>
void subpar::parallelize_range(int num_workers, Task_ num_tasks, Run_ run_task_range)

Parallelize a range of tasks across multiple workers.

The aim is to split tasks in [0, num_tasks) into non-overlapping contiguous intervals that are executed by different workers. In the default parallelization scheme, we create num_workers evenly-sized intervals that are executed via OpenMP (if available) or <thread> (otherwise). Some workers may go unused, e.g., if num_tasks < num_workers.
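To make the idea of evenly-sized contiguous intervals concrete, here is a minimal sketch of one way to compute them. The helper even_intervals() is hypothetical and not part of subpar; the library's actual partitioning may differ in how it distributes the remainder.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper (not part of subpar): split [0, num_tasks) into at most
// num_workers contiguous, non-overlapping (start, length) intervals whose
// lengths differ by at most one. Assumes num_workers fits in Task_.
template<typename Task_>
std::vector<std::pair<Task_, Task_> > even_intervals(int num_workers, Task_ num_tasks) {
    assert(num_workers > 0);
    std::vector<std::pair<Task_, Task_> > out;
    const Task_ nw = static_cast<Task_>(num_workers);
    const Task_ per_worker = num_tasks / nw;
    const Task_ remainder = num_tasks % nw;

    Task_ start = 0;
    // Stop early once all tasks are assigned, so no empty interval is emitted.
    for (Task_ w = 0; w < nw && start < num_tasks; ++w) {
        // The first 'remainder' workers take one extra task each.
        const Task_ length = per_worker + (w < remainder ? 1 : 0);
        out.emplace_back(start, length);
        start += length;
    }
    return out;
}
```

With 10 tasks and 4 workers, this sketch yields the intervals (0, 3), (3, 3), (6, 2) and (8, 2) as (start, length) pairs. With 3 tasks and 8 workers, only 3 intervals are produced, which mirrors the case where some workers go unused.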

The SUBPAR_USES_OPENMP_RANGE macro will be defined as 1 if and only if OpenMP was used in the default scheme. Users can define the SUBPAR_NO_OPENMP_RANGE macro to force parallelize_range() to use <thread> even if OpenMP is available. This is occasionally useful when OpenMP cannot be used in some parts of the application, e.g., with POSIX forks.

Advanced users can substitute in their own parallelization scheme by defining SUBPAR_CUSTOM_PARALLELIZE_RANGE before including the subpar header. This should be either a function-like macro or the name of a function, accepting the same arguments as parallelize_range(). If defined, the custom scheme will be used instead of the default scheme whenever parallelize_range() is called. Macro authors should note the expectations on run_task_range().
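As an illustration of what a substituted scheme could look like, the sketch below defines a trivial serial replacement. The name my_serial_range() is hypothetical, and the commented-out #define only shows where the macro would go; how subpar forwards its arguments to the macro is not shown on this page, so treat this as an assumption-laden sketch.

```cpp
// Hypothetical custom scheme (not part of subpar): run every task serially
// in worker 0, taking the same arguments as parallelize_range().
template<typename Task_, class Run_>
void my_serial_range(int num_workers, Task_ num_tasks, Run_ run_task_range) {
    (void)num_workers; // unused: everything runs in a single worker
    if (num_tasks > 0) {
        // Ranges passed to run_task_range() must be non-empty,
        // so the call is skipped entirely when there are no tasks.
        run_task_range(0, static_cast<Task_>(0), num_tasks);
    }
}

// In a real build, a definition like this would appear before the subpar
// header is included:
// #define SUBPAR_CUSTOM_PARALLELIZE_RANGE ::my_serial_range
```

A serial scheme like this can be handy for debugging, since it satisfies the documented expectations on run_task_range() (w in range, non-empty and non-overlapping task ranges) without any threading.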

If nothrow_ = true, exception handling is omitted from the default parallelization scheme. This avoids some unnecessary work when the caller knows that run_task_range() will never throw. For custom schemes, if SUBPAR_CUSTOM_PARALLELIZE_RANGE_NOTHROW is defined, it will be used if nothrow_ = true; otherwise, SUBPAR_CUSTOM_PARALLELIZE_RANGE will continue to be used.

It is worth stressing that run_task_range() may be called multiple times in the same worker, i.e., with the same w but different start and length (not necessarily contiguous). Any use of w by run_task_range() should be robust to any number of previous calls with the same w. For example, if w is used to store thread-specific results for use outside parallelize_range(), the results should be accumulated in a manner that preserves the results of previous calls. If the code must be run exactly once per worker, consider using parallelize_simple() instead. Developers may also consider using silly_parallelize_range() to test for correct use of w.
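The following sketch shows per-worker accumulation that stays correct when the same w is seen more than once. The driver mock_repeated_calls() is a hypothetical stand-in that only mimics the documented calling pattern; it is not subpar code and runs everything serially.

```cpp
#include <vector>

// Hypothetical stand-in for a scheme that reuses workers: each worker
// receives two non-contiguous task ranges.
template<class Run_>
void mock_repeated_calls(Run_ run_task_range) {
    run_task_range(0, 0, 2); // worker 0: tasks 0, 1
    run_task_range(1, 2, 2); // worker 1: tasks 2, 3
    run_task_range(0, 4, 2); // worker 0 again: tasks 4, 5
    run_task_range(1, 6, 2); // worker 1 again: tasks 6, 7
}

// Sum the task indices handled by each worker into a per-worker slot.
inline std::vector<long> per_worker_sums() {
    std::vector<long> sums(2, 0);
    mock_repeated_calls([&](int w, int start, int length) {
        long partial = 0;
        for (int t = start; t < start + length; ++t) {
            partial += t;
        }
        // Accumulate rather than assign: "sums[w] = partial" would discard
        // the result of any earlier call with the same w.
        sums[w] += partial;
    });
    return sums;
}
```

Here worker 0 ends with 0 + 1 + 4 + 5 = 10 and worker 1 with 2 + 3 + 6 + 7 = 18. Had the lambda assigned instead of accumulated, each slot would hold only the sum from its last range.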

Template Parameters
nothrow_: Whether the Run_ function cannot throw an exception.
Task_: Integer type for the number of tasks.
Run_: Function that accepts three arguments:
  • w, the identity of the worker executing this task range. This will be passed as an int.
  • start, the start index of the task range. This will be passed as a Task_.
  • length, the number of tasks in the task range. This will be passed as a Task_.
Any return value is ignored.
Parameters
num_workers: Number of workers. This should be a positive integer. Any zero or negative values are treated as 1. (See also sanitize_num_workers().)
num_tasks: Number of tasks. This should be a non-negative integer.
run_task_range: Function to iterate over a range of tasks within a worker. This may be called zero, one or multiple times in any particular worker. In each call:
  • w is guaranteed to be in [0, num_workers).
  • [start, start + length) is guaranteed to be a non-empty range of tasks that lies in [0, num_tasks). It will not overlap with any other range in any other call to run_task_range().
This function may throw an exception if nothrow_ = false.

◆ parallelize_simple()

template<bool nothrow_ = false, typename Task_, class Run_>
void subpar::parallelize_simple(Task_ num_tasks, Run_ run_task)

Parallelize individual tasks across workers.

The aim is to parallelize the execution of tasks across workers, under the assumption that there is a 1:1 mapping between them. This is most relevant when the overall computation has already been split up and assigned to workers outside of subpar. In such cases, parallelize_simple() is more suitable than parallelize_range() as it avoids the unnecessary overhead of partitioning the task interval.

The SUBPAR_USES_OPENMP_SIMPLE macro will be defined as 1 if and only if OpenMP was used in the default scheme. Users can define the SUBPAR_NO_OPENMP_SIMPLE macro to force parallelize_simple() to use <thread> even if OpenMP is available. This is occasionally useful when OpenMP cannot be used in some parts of the application, e.g., with POSIX forks.

Advanced users can substitute in their own parallelization scheme by defining SUBPAR_CUSTOM_PARALLELIZE_SIMPLE before including the subpar header. This should be either a function-like macro or the name of a function, accepting the same arguments as parallelize_simple(). If defined, the custom scheme will be used instead of the default scheme whenever parallelize_simple() is called. Macro authors should note the expectations on run_task().

If nothrow_ = true, exception handling is omitted from the default parallelization scheme. This avoids some unnecessary work when the caller knows that run_task() will never throw. For custom schemes, if SUBPAR_CUSTOM_PARALLELIZE_SIMPLE_NOTHROW is defined, it will be used if nothrow_ = true; otherwise, SUBPAR_CUSTOM_PARALLELIZE_SIMPLE will continue to be used.

Template Parameters
nothrow_: Whether the Run_ function cannot throw an exception.
Task_: Integer type for the number of tasks.
Run_: Function that accepts w, the index of the task (and thus the worker) as a Task_. Any return value is ignored.
Parameters
num_tasks: Number of tasks. This is also the number of workers as we assume a 1:1 mapping between tasks and workers.
run_task: Function to execute the task for each worker. This will be called exactly once in each worker, where w is guaranteed to be in [0, num_tasks). This function may throw an exception if nothrow_ = false.

◆ sanitize_num_workers()

template<typename Task_>
int subpar::sanitize_num_workers(int num_workers, Task_ num_tasks)

Adjust the number of workers to the number of tasks in parallelize_range().

It is not strictly necessary to run sanitize_num_workers() prior to parallelize_range() as the latter will automatically behave correctly with all inputs. However, on occasion, applications need a better upper bound on the number of workers, e.g., to pre-allocate expensive per-worker data structures. In such cases, the return value of sanitize_num_workers() can be used by the application before being passed to parallelize_range().

Template Parameters
Task_: Integer type for the number of tasks.
Parameters
num_workers: Number of workers. This may be negative or zero.
num_tasks: Number of tasks. This should be a non-negative integer.
Returns
A more suitable number of workers. A negative or zero num_workers is converted to 1 if num_tasks > 0, and to 0 otherwise. If num_workers is greater than num_tasks, it is reduced to num_tasks.
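The rules above can be restated as a short sketch, together with the pre-allocation use case. Both sanitize_sketch() and allocate_scratch() are hypothetical names introduced for illustration; this follows the documented behavior but is not taken from subpar's source.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical re-statement of the documented rules (not subpar's source).
// num_tasks is assumed to be non-negative, as the documentation requires.
template<typename Task_>
int sanitize_sketch(int num_workers, Task_ num_tasks) {
    if (num_workers <= 0) {
        return num_tasks > 0 ? 1 : 0;
    }
    // Both values are non-negative at this point, so they can be compared
    // safely in a wide unsigned type regardless of Task_'s width.
    const auto tasks = static_cast<unsigned long long>(num_tasks);
    const auto workers = static_cast<unsigned long long>(num_workers);
    return tasks < workers ? static_cast<int>(tasks) : num_workers;
}

// Pre-allocate one scratch buffer per worker that can actually receive work.
template<typename Task_>
std::vector<std::vector<double> > allocate_scratch(int num_workers, Task_ num_tasks, std::size_t buffer_size) {
    const int usable = sanitize_sketch(num_workers, num_tasks);
    return std::vector<std::vector<double> >(
        static_cast<std::size_t>(usable), std::vector<double>(buffer_size));
}
```

Requesting 8 workers for 3 tasks gives 3 buffers rather than 8, which is the saving the tighter upper bound is meant to provide when per-worker structures are expensive.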