scran
C++ library for basic single-cell RNA-seq analyses
Compute median-based size factors to handle composition bias.
#include <MedianSizeFactors.hpp>
Classes
struct | Defaults | Default parameter settings.
struct | Results | Result of the size factor calculation.
Public Member Functions
MedianSizeFactors & | set_center (bool c=Defaults::center)
MedianSizeFactors & | set_prior_count (double p=Defaults::prior_count)
MedianSizeFactors & | set_num_threads (int n=Defaults::num_threads)
template<typename T, typename IDX, typename Ref, typename Out>
void | run (const tatami::Matrix<T, IDX> *mat, const Ref *ref, Out *output) const
template<typename T, typename IDX, typename Out>
void | run_with_mean (const tatami::Matrix<T, IDX> *mat, Out *output) const
template<typename Out = double, typename T, typename IDX, typename Ref>
Results<Out> | run (const tatami::Matrix<T, IDX> *mat, const Ref *ref) const
template<typename Out = double, typename T, typename IDX>
Results<Out> | run_with_mean (const tatami::Matrix<T, IDX> *mat) const
Compute median-based size factors to handle composition bias.
This is roughly equivalent to the DESeq2-based approach where the size factor for each library is defined as the median ratio against a reference profile. The aim is to account for composition biases from differential expression between libraries, which would not be handled properly by library size normalization. The main differences from DESeq2 are:
In practice, this tends to work poorly for actual single-cell data due to its sparsity. Nonetheless, we provide it here because it can be helpful for removing composition biases between clusters based on their averaged pseudo-bulk profiles.
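To make the median-ratio idea concrete, here is a minimal standalone sketch that does not use scran or tatami. The helpers `median_of` and `median_ratio` are hypothetical names introduced for illustration only, and skipping genes with a zero reference value is an assumption of this sketch rather than a statement about the library's behaviour.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of scran): median of a set of values.
// The vector is taken by value so the caller's data is not reordered.
// Returns 0 for empty input, which is an arbitrary choice for this sketch.
double median_of(std::vector<double> values) {
    if (values.empty()) {
        return 0.0;
    }
    std::size_t n = values.size();
    std::size_t half = n / 2;
    std::nth_element(values.begin(), values.begin() + half, values.end());
    double upper = values[half];
    if (n % 2 == 1) {
        return upper;
    }
    // For even lengths, average the two middle values.
    double lower = *std::max_element(values.begin(), values.begin() + half);
    return (lower + upper) / 2.0;
}

// Hypothetical helper: median of the per-gene ratios profile[g] / ref[g].
// Assumption: genes with a zero reference value are skipped.
double median_ratio(const std::vector<double>& profile, const std::vector<double>& ref) {
    assert(profile.size() == ref.size());
    std::vector<double> ratios;
    ratios.reserve(ref.size());
    for (std::size_t g = 0; g < ref.size(); ++g) {
        if (ref[g] > 0) {
            ratios.push_back(profile[g] / ref[g]);
        }
    }
    return median_of(std::move(ratios));
}
```

For a reference of `{1, 2, 4, 8}` and a profile of `{2, 4, 8, 160}`, the per-gene ratios are `{2, 2, 2, 20}` and the median is 2, whereas the ratio of library sizes (174 / 15 = 11.6) is inflated by the single strongly upregulated gene. This is the composition bias that the median is meant to resist.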
MedianSizeFactors & set_center (bool c=Defaults::center) | inline |
c | Whether to center the size factors to have a mean of unity. This is usually desirable for interpretation of relative values. |
Returns: A reference to this MedianSizeFactors object.
For more control over centering, this can be set to false and the resulting size factors can be passed to CenterSizeFactors.
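Centering to a mean of unity amounts to dividing every size factor by the mean of all factors. The sketch below shows that arithmetic in isolation; `center_to_unit_mean` is a hypothetical helper written for this example and is not the CenterSizeFactors class.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of scran): rescale size factors in place
// so that their mean is 1. Empty input or an all-zero input is left as-is.
void center_to_unit_mean(std::vector<double>& factors) {
    if (factors.empty()) {
        return;
    }
    double total = 0.0;
    for (double f : factors) {
        assert(f >= 0.0); // size factors are expected to be non-negative
        total += f;
    }
    double mean = total / static_cast<double>(factors.size());
    if (mean == 0.0) {
        return; // nothing sensible to divide by
    }
    for (double& f : factors) {
        f /= mean;
    }
}
```

Factors of `{1, 2, 3}` have a mean of 2 and become `{0.5, 1, 1.5}` after centering, so the relative differences between profiles are preserved.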
MedianSizeFactors & set_prior_count (double p=Defaults::prior_count) | inline |
p | Prior count to use for shrinking median-based size factors towards their library size-based counterparts. Larger values result in more shrinkage, while a value of zero will disable shrinkage altogether. |
Returns: A reference to this MedianSizeFactors object.
When using shrinkage, we add a scaled version of the reference profile to each expression profile before computing the ratios. The scaling of the reference profile varies for each profile and is proportional to the (relative) total count of that profile. This implicitly pushes the median-based size factor towards a value that is proportional to the library size of the profile, given that the median of the ratio of the reference against a scaled version of itself is just the scaling factor, i.e., the library size.
The amount of shrinkage depends on the magnitude of the reference scaling. The prior count should be interpreted as the number of extra reads from the reference profile that are added to each profile. For example, the default of 10 means that the equivalent of 10 reads is added to each profile, distributed according to the reference profile. Increasing the prior count increases the strength of the shrinkage, as the reference profile makes a greater contribution to the ratios.
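The shrinkage described above can be sketched as follows. This is one possible reading of the prose, not the library's actual arithmetic: the function `shrunken_median_ratio`, the `relative_total` argument (the profile's total count divided by the mean total across profiles) and the exact scaling formula are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of scran): median by sorting a copy.
double sorted_median(std::vector<double> v) {
    assert(!v.empty());
    std::sort(v.begin(), v.end());
    std::size_t n = v.size();
    return (n % 2 == 1) ? v[n / 2] : (v[n / 2 - 1] + v[n / 2]) / 2.0;
}

// Hypothetical sketch of shrinkage for a single profile.
// Assumption: the reference is scaled so that it contributes 'prior_count'
// reads to a profile of average size, and proportionally more or fewer
// reads to larger or smaller profiles via 'relative_total'.
double shrunken_median_ratio(
    const std::vector<double>& profile,
    const std::vector<double>& ref,
    double prior_count,
    double relative_total)
{
    assert(profile.size() == ref.size());
    double ref_total = std::accumulate(ref.begin(), ref.end(), 0.0);
    assert(ref_total > 0.0);
    double scale = prior_count * relative_total / ref_total;

    std::vector<double> ratios;
    for (std::size_t g = 0; g < ref.size(); ++g) {
        if (ref[g] > 0.0) {
            // Add the scaled reference to the profile before taking the ratio.
            ratios.push_back((profile[g] + scale * ref[g]) / ref[g]);
        }
    }
    return sorted_median(ratios);
}
```

Under this reading, the added term contributes exactly `scale` to every ratio, so a prior count of zero leaves the plain median ratio unchanged, while a very large prior count makes the result dominated by a term proportional to the profile's relative library size.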
MedianSizeFactors & set_num_threads (int n=Defaults::num_threads) | inline |
n | Number of threads to use. |
Returns: A reference to this MedianSizeFactors object.
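template<typename T, typename IDX, typename Ref, typename Out> void run (const tatami::Matrix<T, IDX> *mat, const Ref *ref, Out *output) const | inline |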
Compute per-column size factors against a user-supplied reference profile.
T | Numeric data type of the input matrix. |
IDX | Integer index type of the input matrix. |
Ref | Numeric data type of the reference profile. |
Out | Numeric data type of the output vector. |
mat | Matrix containing non-negative expression data, usually counts. Rows should be genes; columns may be cells, but are more typically aggregated pseudo-bulk profiles. |
[in] | ref | Pointer to an array containing the reference expression profile to normalize against. This should be of length equal to the number of rows in mat and should contain non-negative values. |
[out] | output | Pointer to an array in which to store the output size factors. This should be of length equal to the number of columns in mat. |
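template<typename T, typename IDX, typename Out> void run_with_mean (const tatami::Matrix<T, IDX> *mat, Out *output) const | inline |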
Compute per-column size factors against an average pseudo-sample constructed from the row means of the input matrix.
T | Numeric data type of the input matrix. |
IDX | Integer index type of the input matrix. |
Out | Numeric data type of the output vector. |
mat | Matrix containing non-negative expression data, usually counts. Rows should be genes; columns may be cells, but are more typically aggregated pseudo-bulk profiles. |
[out] | output | Pointer to an array in which to store the output size factors. This should be of length equal to the number of columns in mat. |
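The "average pseudo-sample" is simply the vector of per-gene means across all columns. The sketch below computes it for a small dense matrix stored as a vector of columns. The helper `row_means` and this storage layout are assumptions for illustration, independent of tatami's matrix interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of scran): per-gene means across profiles.
// Each inner vector is one column (profile); all columns must have the
// same number of genes.
std::vector<double> row_means(const std::vector<std::vector<double>>& columns) {
    if (columns.empty()) {
        return {};
    }
    std::size_t ngenes = columns.front().size();
    std::vector<double> means(ngenes, 0.0);
    for (const auto& col : columns) {
        assert(col.size() == ngenes);
        for (std::size_t g = 0; g < ngenes; ++g) {
            means[g] += col[g];
        }
    }
    for (double& m : means) {
        m /= static_cast<double>(columns.size());
    }
    return means;
}
```

For two profiles `{1, 2, 3}` and `{3, 6, 9}`, the resulting reference is `{2, 4, 6}`. A vector like this is what would be supplied as the reference profile when calling the overloads that take an explicit `ref`.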
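template<typename Out = double, typename T, typename IDX, typename Ref> Results<Out> run (const tatami::Matrix<T, IDX> *mat, const Ref *ref) const | inline |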
Compute per-column size factors against a user-supplied reference profile.
Out | Numeric type for the size factors. |
T | Numeric data type of the input matrix. |
IDX | Integer index type of the input matrix. |
Ref | Numeric data type of the reference profile. |
mat | Matrix containing non-negative expression data, usually counts. Rows should be genes; columns may be cells, but are more typically aggregated pseudo-bulk profiles. |
[in] | ref | Pointer to an array containing the reference expression profile to normalize against. This should be of length equal to the number of rows in mat and should contain non-negative values. |
Returns: A Results object containing the size factors for each column in mat.
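template<typename Out = double, typename T, typename IDX> Results<Out> run_with_mean (const tatami::Matrix<T, IDX> *mat) const | inline |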
Compute per-column size factors against an average pseudo-sample constructed from the row means of the input matrix.
Out | Numeric type for the size factors. |
T | Numeric data type of the input matrix. |
IDX | Integer index type of the input matrix. |
mat | Matrix containing non-negative expression data, usually counts. Rows should be genes; columns may be cells, but are more typically aggregated pseudo-bulk profiles. |
Returns: A Results object containing the size factors for each column in mat.