scran
C++ library for basic single-cell RNA-seq analyses
Compute grouped size factors to handle composition bias.
#include <GroupedSizeFactors.hpp>
Classes
struct Defaults | Default parameter settings.
struct Results | Result of the size factor calculations.
Public Member Functions
GroupedSizeFactors & set_center (bool c = Defaults::center)
GroupedSizeFactors & set_prior_count (double p = MedianSizeFactors::Defaults::prior_count)
GroupedSizeFactors & set_handle_zeros (bool z = Defaults::handle_zeros)
GroupedSizeFactors & set_handle_non_finite (bool n = Defaults::handle_non_finite)
GroupedSizeFactors & set_num_threads (int n = Defaults::num_threads)
template<typename T, typename IDX, typename Group, typename Out>
void run (const tatami::Matrix<T, IDX>* mat, const Group* group, Out* output) const
template<typename T, typename IDX, typename Group, typename Out>
void run (const tatami::Matrix<T, IDX>* mat, const Group* group, size_t reference, Out* output) const
template<typename Out = double, typename T, typename IDX, typename Group>
Results<Out> run (const tatami::Matrix<T, IDX>* mat, const Group* group) const
template<typename Out = double, typename T, typename IDX, typename Group>
Results<Out> run (const tatami::Matrix<T, IDX>* mat, const Group* group, size_t reference) const
Compute grouped size factors to handle composition bias.
This implements the grouping approach described in Lun et al. (2016), whereby groups or clusters of cells are used to construct pseudo-cells. These pseudo-cells are normalized against each other using median-based size factors (see MedianSizeFactors) to obtain group-specific scaling factors. Each cell is then normalized against its pseudo-cell using its library size; the cell's size factor is defined as the product of its library size-based factor and the median-based factor for its group.
This strategy leverages the reduced sparsity of the pseudo-cells to obtain sensible median-based size factors for removing composition biases, while still generating per-cell factors for computing a normalized single-cell expression matrix in LogNormCounts. The assumption is that there are no composition biases within each group; thus, the supplied groupings should correspond to subpopulations in the data, typically generated by clustering.
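The three steps above (aggregate cells into pseudo-cells, compute median-based factors between pseudo-cells, scale each cell by its share of the pseudo-cell's library size) can be sketched in plain C++. This is a minimal illustration under stated assumptions, not scran's implementation: it uses a dense column-major `std::vector<double>` instead of a `tatami::Matrix`, takes the reference group from the caller, and stands in for MedianSizeFactors with a plain median of per-gene ratios against the reference pseudo-cell (skipping genes that are zero in the reference, with no prior-count shrinkage, centering or sanitization). The names `grouped_size_factors_sketch` and `median_of` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median of a vector of ratios; takes a copy because nth_element reorders it.
double median_of(std::vector<double> v) {
    if (v.empty()) {
        return 0; // no usable genes; leaves a zero factor for the caller to handle
    }
    std::size_t half = v.size() / 2;
    std::nth_element(v.begin(), v.begin() + half, v.end());
    double upper = v[half];
    if (v.size() % 2 == 1) {
        return upper;
    }
    // For even lengths, the lower middle value is the largest element before 'half'.
    double lower = *std::max_element(v.begin(), v.begin() + half);
    return (lower + upper) / 2;
}

// Hypothetical sketch. counts is dense column-major: counts[c * ngenes + g] for cell c, gene g.
std::vector<double> grouped_size_factors_sketch(const std::vector<double>& counts,
                                                std::size_t ngenes, std::size_t ncells,
                                                const std::vector<int>& group,
                                                std::size_t ngroups, std::size_t reference) {
    assert(reference < ngroups);

    // Step 1: sum the cells of each group into a pseudo-cell and record library sizes.
    std::vector<double> pseudo(ngroups * ngenes, 0.0);
    std::vector<double> libsize(ncells, 0.0), group_libsize(ngroups, 0.0);
    for (std::size_t c = 0; c < ncells; ++c) {
        std::size_t k = static_cast<std::size_t>(group[c]);
        for (std::size_t g = 0; g < ngenes; ++g) {
            double x = counts[c * ngenes + g];
            pseudo[k * ngenes + g] += x;
            libsize[c] += x;
        }
        group_libsize[k] += libsize[c];
    }

    // Step 2: median ratio of each pseudo-cell to the reference pseudo-cell,
    // skipping genes that are zero in the reference (no prior-count shrinkage).
    std::vector<double> group_factor(ngroups, 0.0);
    const double* ref = pseudo.data() + reference * ngenes;
    for (std::size_t k = 0; k < ngroups; ++k) {
        std::vector<double> ratios;
        for (std::size_t g = 0; g < ngenes; ++g) {
            if (ref[g] > 0) {
                ratios.push_back(pseudo[k * ngenes + g] / ref[g]);
            }
        }
        group_factor[k] = median_of(ratios);
    }

    // Step 3: per-cell factor = (cell library size / pseudo-cell library size) * group factor.
    // An all-zero group gives 0/0 = NaN here, which is the kind of value sanitization targets.
    std::vector<double> out(ncells);
    for (std::size_t c = 0; c < ncells; ++c) {
        std::size_t k = static_cast<std::size_t>(group[c]);
        out[c] = libsize[c] / group_libsize[k] * group_factor[k];
    }
    return out;
}
```

For example, with cells (1, 2, 3) and (1, 2, 3) in group 0, cells (2, 4, 6) and (4, 8, 12) in group 1, and group 0 as the reference, the group factors are 1 and 3 and the per-cell factors are approximately 0.5, 0.5, 1 and 2. These are proportional to the library sizes (6, 6, 12, 24), which is the expected outcome when the groups differ only in sequencing depth and not in composition.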
set_center() [inline]
c | Whether to center the size factors to have a mean of unity. This is usually desirable for interpretation of relative values.
Returns: Reference to this GroupedSizeFactors object.
For more control over centering, this can be set to false and the resulting size factors can be passed to CenterSizeFactors.
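Centering to a mean of unity amounts to dividing every size factor by the mean factor. The sketch below assumes the factors are held in a plain `std::vector<double>`; `center_to_unit_mean` is a hypothetical helper written for illustration and is not the CenterSizeFactors API, which may offer more options.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper: rescale size factors so that their mean is 1.
// Zero or non-finite factors are not treated specially here.
void center_to_unit_mean(std::vector<double>& factors) {
    assert(!factors.empty());
    double mean = std::accumulate(factors.begin(), factors.end(), 0.0) / factors.size();
    if (mean > 0) {
        for (double& f : factors) {
            f /= mean;
        }
    }
}
```

Because every factor is divided by the same constant, centering only changes the overall scale of the normalized values; the relative differences between cells are preserved.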
set_prior_count() [inline]
p | Prior count for the library size shrinkage; see MedianSizeFactors::set_prior_count() for details.
Returns: Reference to this GroupedSizeFactors object.
set_handle_zeros() [inline]
Should we gracefully handle pseudo-cell size factors of zero? Note that this does not sanitize the per-cell size factors; to do so, users should call SanitizeSizeFactors separately on the output of run().
z | Whether to replace pseudo-cell size factors of zero with the smallest non-zero size factor across pseudo-cells; see SanitizeSizeFactors::set_handle_zeros() for more details.
Returns: Reference to this GroupedSizeFactors object.
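The zero-handling rule described above can be sketched on a vector of pseudo-cell factors. This is a minimal illustration assuming a plain `std::vector<double>`; `replace_zero_factors` is a hypothetical helper, and the exact behavior of SanitizeSizeFactors::set_handle_zeros() may differ in details (for example, in how it treats negative or non-finite values).

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical helper: replace zero pseudo-cell factors with the smallest strictly
// positive finite factor. Leaves the vector unchanged if no such factor exists.
void replace_zero_factors(std::vector<double>& factors) {
    double smallest = std::numeric_limits<double>::infinity();
    for (double f : factors) {
        if (f > 0 && std::isfinite(f) && f < smallest) {
            smallest = f;
        }
    }
    if (!std::isfinite(smallest)) {
        return; // nothing usable to substitute
    }
    for (double& f : factors) {
        if (f == 0) {
            f = smallest;
        }
    }
}
```

Substituting the smallest non-zero factor keeps the affected group at the low end of the observed range instead of producing division by zero when cells are normalized.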
set_handle_non_finite() [inline]
Should we gracefully handle non-finite size factors for the pseudo-cells? Note that this does not sanitize the per-cell size factors; to do so, users should call SanitizeSizeFactors separately on the output of run().
n | Whether to replace non-finite pseudo-cell size factors with the largest finite size factor across pseudo-cells; see SanitizeSizeFactors::set_handle_non_finite() for more details.
Returns: Reference to this GroupedSizeFactors object.
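A minimal sketch of the non-finite replacement described above, assuming a plain `std::vector<double>` of pseudo-cell factors. `replace_non_finite_factors` is a hypothetical helper; it treats every non-finite value (infinity and NaN alike) the same way, as the parameter description states, whereas SanitizeSizeFactors may distinguish between them, so check its documentation for the exact rule.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical helper: replace non-finite pseudo-cell factors with the largest
// finite factor. Leaves the vector unchanged if no finite factor exists.
void replace_non_finite_factors(std::vector<double>& factors) {
    bool found = false;
    double largest = 0;
    for (double f : factors) {
        if (std::isfinite(f) && (!found || f > largest)) {
            largest = f;
            found = true;
        }
    }
    if (!found) {
        return; // nothing finite to substitute
    }
    for (double& f : factors) {
        if (!std::isfinite(f)) {
            f = largest;
        }
    }
}
```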
set_num_threads() [inline]
n | Number of threads to use.
Returns: Reference to this GroupedSizeFactors object.
void run (const tatami::Matrix<T, IDX>* mat, const Group* group, Out* output) const [inline]
Compute per-cell size factors based on user-supplied groupings. The reference group is automatically chosen as the pseudo-cell with the highest sum of square-root counts, an approach inspired by the calcNormFactors function from the edgeR R package. Taking square roots dampens the contribution of a few highly expressed genes, so this choice favors higher-coverage pseudo-cells with decent transcriptomic complexity.
Template parameters:
T | Numeric data type of the input matrix.
IDX | Integer index type of the input matrix.
Group | Integer type for the groupings.
Out | Numeric data type of the output vector.
Parameters:
mat | Matrix containing non-negative expression data, usually counts. Rows should be genes and columns should be cells.
[in] group | Pointer to an array of group identifiers, of length equal to the number of columns in mat. Values should be integers in [0, N), where N is the total number of groups.
[out] output | Pointer to an array in which to store the output size factors. This should be of length equal to the number of columns in mat.
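The automatic reference choice can be sketched as follows, assuming the pseudo-cells are already stored as a dense column-major `std::vector<double>` (one column per group). `choose_reference` is a hypothetical helper for illustration; ties go to the first group here, which may not match the library's tie-breaking.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: pseudo[k * ngenes + g] is the summed count of gene g in group k.
// Returns the group whose pseudo-cell has the largest sum of square-root counts.
std::size_t choose_reference(const std::vector<double>& pseudo, std::size_t ngenes, std::size_t ngroups) {
    assert(ngroups > 0);
    std::size_t best = 0;
    double best_score = -1;
    for (std::size_t k = 0; k < ngroups; ++k) {
        double score = 0;
        for (std::size_t g = 0; g < ngenes; ++g) {
            score += std::sqrt(pseudo[k * ngenes + g]);
        }
        if (score > best_score) { // strict comparison: the first group wins ties
            best_score = score;
            best = k;
        }
    }
    return best;
}
```

To see the effect of the square root, consider two pseudo-cells over two genes: (100, 0) has the larger total (100) but a score of 10, while (36, 36) has a total of 72 but a score of 12. The second, more evenly spread pseudo-cell is chosen as the reference.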
void run (const tatami::Matrix<T, IDX>* mat, const Group* group, size_t reference, Out* output) const [inline]
Compute per-cell size factors based on user-supplied groupings and a user-specified reference group.
Template parameters:
T | Numeric data type of the input matrix.
IDX | Integer index type of the input matrix.
Group | Integer type for the groupings.
Out | Numeric data type of the output vector.
Parameters:
mat | Matrix containing non-negative expression data, usually counts. Rows should be genes and columns should be cells.
[in] group | Pointer to an array of group identifiers, of length equal to the number of columns in mat. Values should be integers in [0, N), where N is the total number of groups.
reference | Identifier of the group to use as the reference. This should be an integer in [0, N).
[out] output | Pointer to an array in which to store the output size factors. This should be of length equal to the number of columns in mat.
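The constraints on group and reference can be checked up front. This sketch assumes, for illustration only, that N is inferred as the largest group identifier plus one; `count_groups` and `check_reference` are hypothetical helpers, not part of scran.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: infer the number of groups N as the largest identifier + 1,
// assuming identifiers are 0-based and non-negative.
std::size_t count_groups(const std::vector<int>& group) {
    if (group.empty()) {
        return 0;
    }
    if (*std::min_element(group.begin(), group.end()) < 0) {
        throw std::invalid_argument("group identifiers must be non-negative");
    }
    return static_cast<std::size_t>(*std::max_element(group.begin(), group.end())) + 1;
}

// Hypothetical helper: the reference must identify one of the N groups.
void check_reference(std::size_t reference, std::size_t ngroups) {
    if (reference >= ngroups) {
        throw std::out_of_range("reference must be an integer in [0, N)");
    }
}
```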
Results<Out> run (const tatami::Matrix<T, IDX>* mat, const Group* group) const [inline]
Compute per-cell size factors based on user-supplied groupings. The reference group is chosen automatically, in the same way as in the run() overload that fills a user-supplied output array.
Template parameters:
Out | Numeric data type of the output size factors (defaults to double).
T | Numeric data type of the input matrix.
IDX | Integer index type of the input matrix.
Group | Integer type for the groupings.
Parameters:
mat | Matrix containing non-negative expression data, usually counts. Rows should be genes and columns should be cells.
[in] group | Pointer to an array of group identifiers, of length equal to the number of columns in mat. Values should be integers in [0, N), where N is the total number of groups.
Returns: A Results object containing the size factors.
Results<Out> run (const tatami::Matrix<T, IDX>* mat, const Group* group, size_t reference) const [inline]
Compute per-cell size factors based on user-supplied groupings and a user-specified reference group.
Template parameters:
Out | Numeric data type of the output size factors (defaults to double).
T | Numeric data type of the input matrix.
IDX | Integer index type of the input matrix.
Group | Integer type for the groupings.
Parameters:
mat | Matrix containing non-negative expression data, usually counts. Rows should be genes and columns should be cells.
[in] group | Pointer to an array of group identifiers, of length equal to the number of columns in mat. Values should be integers in [0, N), where N is the total number of groups.
reference | Identifier of the group to use as the reference. This should be an integer in [0, N).
Returns: A Results object containing the size factors.