scran
C++ library for basic single-cell RNA-seq analyses
scran::GroupedSizeFactors Class Reference

Compute grouped size factors to handle composition bias.

#include <GroupedSizeFactors.hpp>

Classes

struct  Defaults
 Default parameter settings.
 
struct  Results
 Result of the size factor calculations.
 

Public Member Functions

GroupedSizeFactors & set_center (bool c=Defaults::center)
 
GroupedSizeFactors & set_prior_count (double p=MedianSizeFactors::Defaults::prior_count)
 
GroupedSizeFactors & set_handle_zeros (bool z=Defaults::handle_zeros)
 
GroupedSizeFactors & set_handle_non_finite (bool n=Defaults::handle_non_finite)
 
GroupedSizeFactors & set_num_threads (int n=Defaults::num_threads)
 
template<typename T , typename IDX , typename Group , typename Out >
void run (const tatami::Matrix< T, IDX > *mat, const Group *group, Out *output) const
 
template<typename T , typename IDX , typename Group , typename Out >
void run (const tatami::Matrix< T, IDX > *mat, const Group *group, size_t reference, Out *output) const
 
template<typename Out = double, typename T , typename IDX , typename Group >
Results< Out > run (const tatami::Matrix< T, IDX > *mat, const Group *group) const
 
template<typename Out = double, typename T , typename IDX , typename Group >
Results< Out > run (const tatami::Matrix< T, IDX > *mat, const Group *group, size_t reference) const
 

Detailed Description

Compute grouped size factors to handle composition bias.

This implements the grouping approach described in Lun et al. (2016), whereby groups/clusters of cells are used to construct pseudo-cells. These pseudo-cells are normalized against each other using median-based size factors (see MedianSizeFactors) to obtain group-specific scaling factors. Each cell is then normalized against its pseudo-cell using the library size; each cell's size factor is defined as the product of its library size-based factor and the median-based factor for its group.

This strategy leverages the reduced sparsity of the pseudo-cells to obtain sensible median-based size factors for removing composition biases, while still generating per-cell factors for computing a normalized single-cell expression matrix in LogNormCounts. The assumption is that there are no composition biases within each group; thus, the supplied groupings should correspond to subpopulations in the data, typically generated by clustering.

See also
Lun ATL, Bach K and Marioni JC (2016). Pooling across cells to normalize single-cell RNA sequencing data with many zero counts. Genome Biol. 17:75

Member Function Documentation

◆ set_center()

GroupedSizeFactors & scran::GroupedSizeFactors::set_center ( bool  c = Defaults::center)
inline
Parameters
c  Whether to center the size factors to have a mean of unity. This is usually desirable for interpretation of relative values.
Returns
A reference to this GroupedSizeFactors object.

For more control over centering, this can be set to false and the resulting size factors can be passed to CenterSizeFactors.

◆ set_prior_count()

GroupedSizeFactors & scran::GroupedSizeFactors::set_prior_count ( double  p = MedianSizeFactors::Defaults::prior_count)
inline
Parameters
p  Prior count for the library size shrinkage, see MedianSizeFactors::set_prior_count() for details.
Returns
A reference to this GroupedSizeFactors object.

◆ set_handle_zeros()

GroupedSizeFactors & scran::GroupedSizeFactors::set_handle_zeros ( bool  z = Defaults::handle_zeros)
inline

Should we gracefully handle size factors of zero for the pseudo-cells? Note that this does not sanitize the per-cell size factors - to do so, users should call SanitizeSizeFactors separately on the output of run().

Parameters
z  Whether to replace pseudo-cell size factors of zero with the smallest non-zero size factor across pseudo-cells, see SanitizeSizeFactors::set_handle_zeros() for more details.
Returns
A reference to this GroupedSizeFactors object.
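The zero replacement described for this setter can be sketched as follows. `replace_zeros_sketch` is a hypothetical helper written for illustration; it considers only positive finite values when looking for the smallest non-zero factor, which is an assumption and may differ from SanitizeSizeFactors.

```cpp
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical helper, not part of scran. Replaces size factors of zero with
// the smallest positive finite size factor, modifying the vector in place.
inline void replace_zeros_sketch(std::vector<double>& factors) {
    double smallest = std::numeric_limits<double>::infinity();
    for (double f : factors) {
        // NaN and infinite values fail this comparison and are ignored.
        if (f > 0.0 && f < smallest) {
            smallest = f;
        }
    }
    // If no positive finite value exists, there is nothing to substitute.
    if (!std::isfinite(smallest)) {
        return;
    }
    for (double& f : factors) {
        if (f == 0.0) {
            f = smallest;
        }
    }
}
```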

◆ set_handle_non_finite()

GroupedSizeFactors & scran::GroupedSizeFactors::set_handle_non_finite ( bool  n = Defaults::handle_non_finite)
inline

Should we gracefully handle non-finite size factors for the pseudo-cells? Note that this does not sanitize the per-cell size factors - to do so, users should call SanitizeSizeFactors separately on the output of run().

Parameters
n  Whether to replace non-finite pseudo-cell size factors with the largest finite size factor across pseudo-cells, see SanitizeSizeFactors::set_handle_non_finite() for more details.
Returns
A reference to this GroupedSizeFactors object.
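A minimal sketch of the non-finite replacement is shown below. `replace_non_finite_sketch` is a hypothetical helper; it treats NaN the same way as infinity, which is an assumption made for this example, so SanitizeSizeFactors should be consulted for the library's actual behavior.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper, not part of scran. Replaces non-finite size factors
// with the largest finite size factor, modifying the vector in place.
inline void replace_non_finite_sketch(std::vector<double>& factors) {
    double largest = 0.0;
    bool found = false;
    for (double f : factors) {
        if (std::isfinite(f) && (!found || f > largest)) {
            largest = f;
            found = true;
        }
    }
    // Without any finite value there is nothing to substitute.
    if (!found) {
        return;
    }
    for (double& f : factors) {
        // Assumption: NaN is handled in the same way as infinity.
        if (!std::isfinite(f)) {
            f = largest;
        }
    }
}
```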

◆ set_num_threads()

GroupedSizeFactors & scran::GroupedSizeFactors::set_num_threads ( int  n = Defaults::num_threads)
inline
Parameters
n  Number of threads to use.
Returns
A reference to this GroupedSizeFactors object.

◆ run() [1/4]

template<typename T , typename IDX , typename Group , typename Out >
void scran::GroupedSizeFactors::run ( const tatami::Matrix< T, IDX > *  mat,
const Group *  group,
Out *  output 
) const
inline

Compute per-cell size factors based on user-supplied groupings. The reference group is automatically determined from the pseudo-cell with the highest sum of root-counts, inspired by the calcNormFactors function from the edgeR R package. This approach favors higher-coverage libraries with decent transcriptomic complexity.

Template Parameters
T  Numeric data type of the input matrix.
IDX  Integer index type of the input matrix.
Group  Integer type for the groupings.
Out  Numeric data type of the output vector.
Parameters
mat  Matrix containing non-negative expression data, usually counts. Rows should be genes and columns should be cells.
[in] group  Pointer to an array of group identifiers, of length equal to the number of columns in mat. Values should be integers in $[0, N)$ where $N$ is the total number of groups.
[out] output  Pointer to an array to use to store the output size factors. This should be of length equal to the number of columns in mat.
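The automatic choice of reference described for this overload can be sketched as follows. `choose_reference_sketch` is a hypothetical helper, and the nested-vector layout (one vector of gene-level values per pseudo-cell) is an assumption made to keep the example self-contained; it is not how the library stores pseudo-cells.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of scran. pseudo_cells[g][r] holds the
// non-negative value of gene r in the pseudo-cell for group g. Returns the
// index of the pseudo-cell with the highest sum of square roots; the first
// pseudo-cell wins in the event of a tie.
inline std::size_t choose_reference_sketch(
    const std::vector<std::vector<double>>& pseudo_cells)
{
    std::size_t best = 0;
    double best_score = -1.0;
    for (std::size_t g = 0; g < pseudo_cells.size(); ++g) {
        double score = 0.0;
        for (double x : pseudo_cells[g]) {
            score += std::sqrt(x);
        }
        if (score > best_score) {
            best_score = score;
            best = g;
        }
    }
    return best;
}
```

The square root rewards counts that are spread over many genes. A pseudo-cell with values {25, 25, 25} scores 15 and is preferred over one with {100, 0, 0}, which scores 10 despite its larger total.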

◆ run() [2/4]

template<typename T , typename IDX , typename Group , typename Out >
void scran::GroupedSizeFactors::run ( const tatami::Matrix< T, IDX > *  mat,
const Group *  group,
size_t  reference,
Out *  output 
) const
inline

Compute per-cell size factors based on user-supplied groupings and a user-specified reference group.

Template Parameters
T  Numeric data type of the input matrix.
IDX  Integer index type of the input matrix.
Group  Integer type for the groupings.
Out  Numeric data type of the output vector.
Parameters
mat  Matrix containing non-negative expression data, usually counts. Rows should be genes and columns should be cells.
[in] group  Pointer to an array of group identifiers, of length equal to the number of columns in mat. Values should be integers in $[0, N)$ where $N$ is the total number of groups.
reference  Identifier of the group to use as the reference. This should be an integer in $[0, N)$.
[out] output  Pointer to an array to use to store the output size factors. This should be of length equal to the number of columns in mat.

◆ run() [3/4]

template<typename Out = double, typename T , typename IDX , typename Group >
Results< Out > scran::GroupedSizeFactors::run ( const tatami::Matrix< T, IDX > *  mat,
const Group *  group 
) const
inline

Compute per-cell size factors based on user-supplied groupings. The reference group is automatically chosen; see the run() overload that writes to an output pointer without a reference argument for details.

Template Parameters
Out  Numeric data type of the output size factors.
T  Numeric data type of the input matrix.
IDX  Integer index type of the input matrix.
Group  Integer type for the groupings.
Parameters
mat  Matrix containing non-negative expression data, usually counts. Rows should be genes and columns should be cells.
[in] group  Pointer to an array of group identifiers, of length equal to the number of columns in mat. Values should be integers in $[0, N)$ where $N$ is the total number of groups.
Returns
A Results object is returned containing the size factors.

◆ run() [4/4]

template<typename Out = double, typename T , typename IDX , typename Group >
Results< Out > scran::GroupedSizeFactors::run ( const tatami::Matrix< T, IDX > *  mat,
const Group *  group,
size_t  reference 
) const
inline

Compute per-cell size factors based on user-supplied groupings and a user-specified reference group.

Template Parameters
Out  Numeric data type of the output size factors.
T  Numeric data type of the input matrix.
IDX  Integer index type of the input matrix.
Group  Integer type for the groupings.
Parameters
mat  Matrix containing non-negative expression data, usually counts. Rows should be genes and columns should be cells.
[in] group  Pointer to an array of group identifiers, of length equal to the number of columns in mat. Values should be integers in $[0, N)$ where $N$ is the total number of groups.
reference  Identifier of the group to use as the reference. This should be an integer in $[0, N)$.
Returns
A Results object is returned containing the size factors.

The documentation for this class was generated from the following file: