scran
C++ library for basic single-cell RNA-seq analyses
Sanitize invalid size factors.
#include <SanitizeSizeFactors.hpp>
Classes
struct Defaults
    Default parameters.

Public Types
enum class HandlerAction : char { IGNORE, ERROR, SANITIZE }

Public Member Functions
SanitizeSizeFactors & set_handle_zero (HandlerAction h = Defaults::handle_zero)
SanitizeSizeFactors & set_handle_negative (HandlerAction h = Defaults::handle_negative)
SanitizeSizeFactors & set_handle_nan (HandlerAction h = Defaults::handle_nan)
SanitizeSizeFactors & set_handle_infinite (HandlerAction h = Defaults::handle_infinite)
SanitizeSizeFactors & set_handle_non_positive (HandlerAction h)
SanitizeSizeFactors & set_handle_non_finite (HandlerAction h)
template<typename T> void run (size_t n, T *size_factors, const SizeFactorValidity &status) const
template<typename T> void run (size_t n, T *size_factors) const
Sanitize invalid size factors.
Replace zero, negative, missing (NaN) or infinite values in the size factor array so that it can be used to compute well-defined normalized values. Such size factors can occasionally arise if, e.g., insufficient quality control was performed upstream. See the documentation for set_handle_zero(), set_handle_negative(), set_handle_nan() and set_handle_infinite() for more details. When sanitizing, zero and negative size factors are replaced with the smallest valid size factor, missing size factors with unity, and infinite size factors with the largest finite size factor.
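The replacement rules above can be sketched in a few lines of standalone C++. This is a hypothetical re-implementation for illustration only, not scran's code: `sanitize_sketch()` is an invented name, it applies SANITIZE to every category, and treating -Inf as a negative size factor is an assumption of this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical sketch of the SANITIZE rules described above (not scran's code).
template<typename T>
void sanitize_sketch(std::size_t n, T* size_factors) {
    // First pass: find the smallest and largest valid (positive, finite) values.
    T smallest = std::numeric_limits<T>::infinity();
    T largest = 0;
    for (std::size_t i = 0; i < n; ++i) {
        T sf = size_factors[i];
        if (sf > 0 && std::isfinite(sf)) {
            if (sf < smallest) smallest = sf;
            if (sf > largest) largest = sf;
        }
    }

    // No valid size factors at all: fall back to 1 for every replacement.
    if (largest == 0) {
        smallest = 1;
        largest = 1;
    }

    // Second pass: replace each invalid value.
    for (std::size_t i = 0; i < n; ++i) {
        T& sf = size_factors[i];
        if (std::isnan(sf)) {
            sf = 1;          // NaN: scaling becomes a no-op.
        } else if (sf <= 0) {
            sf = smallest;   // zero or negative (including -Inf, by assumption).
        } else if (std::isinf(sf)) {
            sf = largest;    // +Inf: largest valid size factor.
        }
    }
}
```

For example, `{0, 2, NaN, Inf, -1, 0.5}` becomes `{0.5, 2, 1, 2, 0.5, 0.5}` under these rules.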
enum class HandlerAction : char [strong]
How invalid size factors should be handled:
IGNORE: ignore invalid size factors with no error or change.
ERROR: throw an error.
SANITIZE: fix each invalid size factor.
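A minimal sketch of how a single HandlerAction might be dispatched for one invalid value. The `ActionSketch` enum and `apply_action_sketch()` helper are hypothetical stand-ins, and the exception type and message are assumptions rather than what scran actually throws.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical mirror of HandlerAction, for illustration only.
enum class ActionSketch : char { IGNORE, ERROR, SANITIZE };

// Applies one action to a single invalid value: IGNORE keeps it unchanged,
// ERROR throws, and SANITIZE substitutes the supplied replacement.
inline double apply_action_sketch(ActionSketch action, double value, double replacement) {
    switch (action) {
        case ActionSketch::IGNORE:
            return value;
        case ActionSketch::ERROR:
            throw std::runtime_error("invalid size factor detected");
        case ActionSketch::SANITIZE:
            return replacement;
    }
    return value; // not reached for the three enumerators above
}
```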
SanitizeSizeFactors & set_handle_zero (HandlerAction h = Defaults::handle_zero) [inline]
How should we handle zero size factors? If SANITIZE, they will be automatically set to the smallest valid size factor (or 1, if all size factors are invalid).
This approach is motivated by the observation that size factors of zero are typically generated from all-zero cells. By replacing the size factor with a finite value, we ensure that any all-zero cells are represented by all-zero columns in the normalized matrix, which is a reasonable outcome if those cells cannot be filtered out during upstream quality control.
We also need to handle cases where a zero size factor is generated from a cell with non-zero counts, e.g., with MedianSizeFactors, where the median can be zero even if some genes have non-zero counts. By using a "relatively small" replacement value, we ensure that the normalized values of such a cell reflect the extremity of the scaling.
Parameters
h: How to handle a size factor of zero.
Returns
Reference to this SanitizeSizeFactors object.
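The motivation for a finite replacement can be checked with a toy scaling normalization, where each count is divided by the cell's size factor. `normalize_cell()` is a hypothetical helper, and 0.25 stands in for whatever the smallest valid size factor happens to be.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: scale one cell's counts by its size factor
// (normalized value = count / size factor).
std::vector<double> normalize_cell(const std::vector<double>& counts, double size_factor) {
    std::vector<double> out;
    out.reserve(counts.size());
    for (double c : counts) {
        out.push_back(c / size_factor);
    }
    return out;
}
```

An all-zero cell stays all-zero after dividing by any finite positive replacement, while a cell with a few non-zero counts gets large normalized values (e.g., 3 / 0.25 = 12). On IEEE 754 platforms, dividing by the original zero would instead give NaN for zero counts and Inf for non-zero counts.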
SanitizeSizeFactors & set_handle_negative (HandlerAction h = Defaults::handle_negative) [inline]
How should we handle negative size factors? If SANITIZE, they will be automatically set to the smallest valid size factor (or 1, if all size factors are invalid). This approach follows the same logic as set_handle_zero(), though negative size factors are quite unusual.
Parameters
h: How to handle a negative size factor.
Returns
Reference to this SanitizeSizeFactors object.
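SanitizeSizeFactors & set_handle_nan (HandlerAction h = Defaults::handle_nan) [inline]
How should we handle NaN size factors? If SANITIZE, NaN size factors will be automatically set to 1, meaning that scaling is a no-op.
Parameters
h: How to handle NaN size factors.
Returns
Reference to this SanitizeSizeFactors object.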
SanitizeSizeFactors & set_handle_infinite (HandlerAction h = Defaults::handle_infinite) [inline]
How should we handle infinite size factors? If SANITIZE, infinite size factors will be automatically set to the largest valid size factor (or 1, if all size factors are invalid). This ensures that any normalized values will, at least, be finite; the choice of a relatively large replacement value reflects the extremity of the scaling.
Parameters
h: How to handle infinite size factors.
Returns
Reference to this SanitizeSizeFactors object.
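SanitizeSizeFactors & set_handle_non_positive (HandlerAction h) [inline]
Wrapper to call both set_handle_zero() and set_handle_negative() with the same h in one call.
Parameters
h: How to handle non-positive size factors.
Returns
Reference to this SanitizeSizeFactors object.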
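SanitizeSizeFactors & set_handle_non_finite (HandlerAction h) [inline]
Wrapper to call both set_handle_infinite() and set_handle_nan() with the same h in one call.
Parameters
h: How to handle non-finite size factors.
Returns
Reference to this SanitizeSizeFactors object.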
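The wrapper setters follow the usual chained-setter pattern, where each setter returns a reference to the object so calls can be strung together. The sketch below is a hypothetical stand-in: `SanitizerSketch`, `Action`, the getter names and the placeholder ERROR defaults are all assumptions, not scran's Defaults.

```cpp
#include <cassert>

// Hypothetical action enum for this sketch only.
enum class Action : char { IGNORE, ERROR, SANITIZE };

class SanitizerSketch {
public:
    SanitizerSketch& set_handle_zero(Action h) { zero_ = h; return *this; }
    SanitizerSketch& set_handle_negative(Action h) { negative_ = h; return *this; }
    SanitizerSketch& set_handle_nan(Action h) { nan_ = h; return *this; }
    SanitizerSketch& set_handle_infinite(Action h) { infinite_ = h; return *this; }

    // Wrappers: forward the same action to both underlying setters.
    SanitizerSketch& set_handle_non_positive(Action h) {
        return set_handle_zero(h).set_handle_negative(h);
    }
    SanitizerSketch& set_handle_non_finite(Action h) {
        return set_handle_infinite(h).set_handle_nan(h);
    }

    Action handle_zero() const { return zero_; }
    Action handle_negative() const { return negative_; }
    Action handle_nan() const { return nan_; }
    Action handle_infinite() const { return infinite_; }

private:
    // Placeholder defaults for this sketch; the real defaults live in Defaults.
    Action zero_ = Action::ERROR;
    Action negative_ = Action::ERROR;
    Action nan_ = Action::ERROR;
    Action infinite_ = Action::ERROR;
};
```

Usage: `s.set_handle_non_positive(Action::SANITIZE).set_handle_non_finite(Action::IGNORE);` configures all four categories in one expression.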
template<typename T> void run (size_t n, T *size_factors, const SizeFactorValidity &status) const [inline]
Template Parameters
T: Floating-point type for the size factors.
Parameters
n: Number of size factors.
[in,out] size_factors: Pointer to an array of size factors of length n. On output, invalid size factors are replaced.
status: A pre-computed object indicating whether invalid size factors are present in size_factors. This can be useful if this information is already provided by, e.g., CenterSizeFactors::run().
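The status argument lets the caller skip a second scan over the array when validity has already been determined. As a rough sketch, such a check might look like the following; `ValiditySketch`, its fields and `check_validity_sketch()` are hypothetical, since the members of SizeFactorValidity are not documented on this page, and counting -Inf as negative is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for SizeFactorValidity; field names are assumptions.
struct ValiditySketch {
    bool has_zero = false;
    bool has_negative = false;
    bool has_nan = false;
    bool has_infinite = false;
};

// One pass that records which kinds of invalid size factors are present.
template<typename T>
ValiditySketch check_validity_sketch(std::size_t n, const T* size_factors) {
    ValiditySketch v;
    for (std::size_t i = 0; i < n; ++i) {
        T sf = size_factors[i];
        if (std::isnan(sf)) {
            v.has_nan = true;
        } else if (sf < 0) {
            v.has_negative = true;   // -Inf lands here in this sketch
        } else if (sf == 0) {
            v.has_zero = true;
        } else if (std::isinf(sf)) {
            v.has_infinite = true;
        }
    }
    return v;
}

// True if nothing needs to be handled, so a sanitizing pass can be skipped.
inline bool all_valid(const ValiditySketch& v) {
    return !(v.has_zero || v.has_negative || v.has_nan || v.has_infinite);
}
```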
template<typename T> void run (size_t n, T *size_factors) const [inline]
Template Parameters
T: Floating-point type for the size factors.
Parameters
n: Number of size factors.
[in,out] size_factors: Pointer to an array of size factors of length n. On output, invalid size factors are replaced.