scran
C++ library for basic single-cell RNA-seq analyses
scran::CenterSizeFactors Class Reference

Center size factors prior to scaling normalization.

#include <CenterSizeFactors.hpp>

Classes

struct  Defaults
 Default parameter settings.
 

Public Types

enum  BlockMode { PER_BLOCK , LOWEST }
 

Public Member Functions

CenterSizeFactors & set_block_mode (BlockMode b=Defaults::block_mode)
 
CenterSizeFactors & set_ignore_zeros (bool i=Defaults::ignore_zeros)
 
template<typename T >
SizeFactorValidity run (size_t n, T *size_factors) const
 
template<typename T , typename B >
SizeFactorValidity run_blocked (size_t n, T *size_factors, const B *block) const
 

Detailed Description

Center size factors prior to scaling normalization.

The aim of centering is to ensure that the normalized expression values are on roughly the same scale as the original counts. This simplifies interpretation and ensures that any added pseudo-count prior to log-transformation has a predictable shrinkage effect. The functionality in this class is used automatically by LogNormCounts but can be called separately if desired.

Member Enumeration Documentation

◆ BlockMode

Strategy for handling blocks, see set_block_mode() for details.

Member Function Documentation

◆ set_block_mode()

CenterSizeFactors & scran::CenterSizeFactors::set_block_mode ( BlockMode  b = Defaults::block_mode)
inline
Parameters
b  Strategy for handling blocks in run_blocked().
Returns
A reference to this CenterSizeFactors object.

With the PER_BLOCK strategy, size factors are scaled separately for each block so that they have a mean of 1 within each block. The scaled size factors are identical to those obtained by separate invocations of CenterSizeFactors::run() on the size factors for each block. This can be desirable to ensure consistency with independent analyses of each block - otherwise, the centering would depend on the size factors across all blocks.

With the LOWEST strategy, we compute the mean size factor for each block and we divide all size factors by the minimum mean. In effect, we downscale all blocks to match the coverage of the lowest-coverage block. This is useful for datasets with highly heterogeneous coverage of different blocks as it avoids egregious upscaling of low-coverage blocks. (By contrast, downscaling is always safe as it simply discards information across all blocks by shrinking log-fold changes towards zero at low expression.)
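The difference between the two strategies can be illustrated with a minimal, self-contained sketch. This is not scran's implementation: the names `BlockModeSketch` and `center_blocked_sketch` are hypothetical, and the sketch assumes that all size factors are finite and positive and that every block contains at least one cell.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names for illustration; these are not scran's own types.
enum class BlockModeSketch { PER_BLOCK, LOWEST };

// Scales `sf` in place. Assumes every size factor is finite and positive,
// block IDs lie in [0, nblocks), and every block contains at least one cell.
inline void center_blocked_sketch(std::vector<double>& sf,
                                  const std::vector<int>& block,
                                  int nblocks,
                                  BlockModeSketch mode) {
    assert(sf.size() == block.size());
    if (sf.empty() || nblocks <= 0) return;

    // Accumulate per-block sums and counts (both vectors start at zero).
    std::vector<double> total(nblocks);
    std::vector<std::size_t> count(nblocks);
    for (std::size_t i = 0; i < sf.size(); ++i) {
        assert(block[i] >= 0 && block[i] < nblocks);
        total[block[i]] += sf[i];
        ++count[block[i]];
    }

    // Mean size factor of each block.
    std::vector<double> mean(nblocks);
    for (int b = 0; b < nblocks; ++b) {
        assert(count[b] > 0);
        mean[b] = total[b] / static_cast<double>(count[b]);
    }

    if (mode == BlockModeSketch::PER_BLOCK) {
        // Each block is divided by its own mean, so every block ends up with mean 1.
        for (std::size_t i = 0; i < sf.size(); ++i) sf[i] /= mean[block[i]];
    } else {
        // All blocks are divided by the smallest block mean.
        const double lowest = *std::min_element(mean.begin(), mean.end());
        for (std::size_t i = 0; i < sf.size(); ++i) sf[i] /= lowest;
    }
}
```

For size factors {1, 3, 10, 30} with block IDs {0, 0, 1, 1}, the block means are 2 and 20. The per-block mode yields {0.5, 1.5, 0.5, 1.5}, whereas the lowest-mean mode yields {0.5, 1.5, 5, 15}: the second block keeps its tenfold higher coverage relative to the first.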

◆ set_ignore_zeros()

CenterSizeFactors & scran::CenterSizeFactors::set_ignore_zeros ( bool  i = Defaults::ignore_zeros)
inline
Parameters
i  Whether to ignore zeros when computing the mean size factor.
Returns
A reference to this CenterSizeFactors object.

While size factors of zero are generally invalid, they may occur in datasets that have not been properly filtered to remove low-quality cells. In such cases, we may wish to ignore size factors of zero so as to avoid a spurious deflation of the mean during centering. This is useful if some filtering is to be applied after normalization - by ignoring zeros now, we ensure that we get the same result as if we had removed those zeros prior to centering.

Note that non-finite size factors (e.g., Inf, NaN) are always ignored when computing the mean.
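The effect of ignoring zeros on the mean can be sketched as follows. The helper `centering_mean_sketch` is hypothetical and only mirrors the behaviour described above; it is not taken from the library.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper, not part of scran: mean of the entries that would
// contribute to centering. Non-finite entries are always skipped; zeros are
// skipped only when `ignore_zeros` is true. Returns 0 if no entry qualifies.
inline double centering_mean_sketch(std::size_t n, const double* sf, bool ignore_zeros) {
    assert(sf != nullptr || n == 0);
    double total = 0;
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (!std::isfinite(sf[i])) continue;       // Inf and NaN never contribute
        if (ignore_zeros && sf[i] == 0) continue;  // zeros contribute only on request
        total += sf[i];
        ++count;
    }
    return count ? total / static_cast<double>(count) : 0.0;
}
```

For the size factors {0, 1, 3}, the mean is 4/3 when the zero is counted and 2 when it is ignored. The latter matches the mean of {1, 3}, i.e., the result obtained if the zero had been filtered out before centering.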

◆ run()

template<typename T >
SizeFactorValidity scran::CenterSizeFactors::run ( size_t  n,
T *  size_factors 
) const
inline
Template Parameters
T  Floating-point type for the size factors.
Parameters
n  Number of size factors.
[in,out]  size_factors  Pointer to an array of size factors of length n. On output, entries are scaled so that their mean is equal to 1. The mean is computed only across finite entries and, if zeros are ignored via set_ignore_zeros(), only across positive entries. If there are no non-zero finite size factors, no centering is performed.
Returns
Object indicating whether invalid size factors (zero or non-finite) were detected.
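The in-place scaling and the returned validity summary can be pictured with the sketch below. Everything in it is an assumption made for illustration: `ValiditySketch` and its two fields are hypothetical stand-ins (the actual members of SizeFactorValidity are not listed on this page), `center_sketch` is not the library's code, and zeros are counted towards the mean, as if set_ignore_zeros() had been set to false.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for the returned validity object.
struct ValiditySketch {
    bool has_zero = false;
    bool has_non_finite = false;
};

// Hypothetical helper, not part of scran. Scales `sf` in place so that the
// mean of its finite entries is 1, and reports any invalid entries it saw.
// Assumes size factors are non-negative; zeros are included in the mean.
template<typename T>
ValiditySketch center_sketch(std::size_t n, T* sf) {
    assert(sf != nullptr || n == 0);
    ValiditySketch validity;
    T total = 0;
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (!std::isfinite(sf[i])) {
            validity.has_non_finite = true;  // Inf or NaN: flagged and skipped
            continue;
        }
        if (sf[i] == 0) validity.has_zero = true;
        total += sf[i];
        ++count;
    }
    // Without any non-zero finite entry, the array is left untouched.
    if (count > 0 && total != 0) {
        const T mean = total / static_cast<T>(count);
        for (std::size_t i = 0; i < n; ++i) sf[i] /= mean;
    }
    return validity;
}
```

Calling this on {1, 2, 3} rescales the array to {0.5, 1, 1.5} with no flags set. Calling it on {0, 4, NaN} sets both flags and rescales the finite entries by their mean of 2, leaving the NaN in place.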

◆ run_blocked()

template<typename T , typename B >
SizeFactorValidity scran::CenterSizeFactors::run_blocked ( size_t  n,
T *  size_factors,
const B *  block 
) const
inline
Template Parameters
T  Floating-point type for the size factors.
B  An integer type, to hold the block IDs.
Parameters
n  Number of size factors.
[in,out]  size_factors  Pointer to an array of positive size factors of length n. On output, entries are scaled so that their mean is equal to 1 according to the strategy defined in set_block_mode(). The mean is computed only across finite entries and, if zeros are ignored via set_ignore_zeros(), only across positive entries within each block. If there are no non-zero finite size factors in a block, no centering is performed for that block.
[in]  block  Pointer to an array of block identifiers. If provided, the array should be of length n. Values should be integer IDs in $[0, N)$ where $N$ is the number of blocks. This can also be NULL, in which case all cells are assumed to belong to the same block.
Returns
Object indicating whether invalid size factors (zero or non-finite) were detected.

The documentation for this class was generated from the following file: