scran
C++ library for basic single-cell RNA-seq analyses
Choose a pseudo-count for log-transformation.
#include <ChoosePseudoCount.hpp>
Classes | |
struct | Defaults |
Default parameters. | |
Public Member Functions | |
ChoosePseudoCount & | set_quantile (double q=Defaults::quantile) |
ChoosePseudoCount & | set_max_bias (double b=Defaults::max_bias) |
ChoosePseudoCount & | set_min_value (double v=Defaults::min_value) |
double | run (size_t n, const double *size_factors, double *buffer) const |
double | run (size_t n, const double *size_factors) const |
This class chooses a pseudo-count for log-transformation (see LogNormCounts::set_pseudo_count()) that aims to control the log-transform bias. Specifically, the log-transform can introduce spurious differences in the expected log-normalized expression between cells with very different size factors. This bias can be mitigated by increasing the pseudo-count, which effectively shrinks all log-expression values towards the zero-expression baseline. The increased shrinkage is strongest at low counts where the log-transform bias is most pronounced, while the transformation of large counts is mostly unaffected.
In practice, the log-transformation bias is modest in datasets where there are stronger sources of variation. When observed, it manifests as a library size-dependent trend in the log-normalized expression values. This is difficult to regress out without also removing biology that is associated with, e.g., total RNA content; rather, a simpler solution is to increase the pseudo-count to suppress the bias.
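The effect described above can be illustrated numerically. The following is a minimal sketch, assuming Poisson-distributed counts (an assumption made here for illustration, not something this page states); `expected_log_norm` and `log_transform_bias` are hypothetical helpers and are not part of scran.

```cpp
#include <cassert>
#include <cmath>

// Expected value of log2(X / s + pseudo) when X ~ Poisson(mu * s).
// Hypothetical helper for illustration; not part of scran.
inline double expected_log_norm(double mu, double s, double pseudo) {
    assert(mu > 0 && s > 0 && pseudo > 0);
    const double mean = mu * s;
    double pmf = std::exp(-mean); // P(X = 0)
    double total = 0;
    // Sum far enough into the upper tail that the remaining mass is negligible.
    const int limit = static_cast<int>(mean + 20 * std::sqrt(mean) + 50);
    for (int k = 0; k <= limit; ++k) {
        total += pmf * std::log2(k / s + pseudo);
        pmf *= mean / (k + 1); // P(X = k + 1) derived from P(X = k)
    }
    return total;
}

// Difference in expected log-expression between a large cell and a small cell
// that share the same true abundance 'mu'. Ideally this would be zero.
inline double log_transform_bias(double mu, double small_sf, double large_sf, double pseudo) {
    return expected_log_norm(mu, large_sf, pseudo) - expected_log_norm(mu, small_sf, pseudo);
}
```

Calling `log_transform_bias(1.0, 0.1, 10.0, 1.0)` and then again with a pseudo-count of 5 should show the spurious difference shrinking as the pseudo-count grows.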
This class does not center the size factors. If centering is required, pass the size factors through CenterSizeFactors first.
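For readers unfamiliar with centering, the idea can be sketched as scaling the size factors so that their mean is 1. The helper below, `center_to_unit_mean`, is a hypothetical stand-in written for this illustration; CenterSizeFactors may differ in its options and details.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scale size factors so that the mean of the positive values is 1.
// Hypothetical stand-in for illustration; not the CenterSizeFactors implementation.
inline void center_to_unit_mean(std::vector<double>& size_factors) {
    double total = 0;
    std::size_t count = 0;
    for (double s : size_factors) {
        if (s > 0) { // non-positive values do not contribute to the mean
            total += s;
            ++count;
        }
    }
    if (count == 0) {
        return; // nothing to center
    }
    const double mean = total / count;
    assert(mean > 0);
    for (double& s : size_factors) {
        s /= mean;
    }
}
```

The centered values would then be the input to run().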
ChoosePseudoCount & set_quantile (double q=Defaults::quantile) | inline |
q | Quantile to use for finding the smallest/largest size factors. Setting this to zero will use the observed minimum and maximum, though this is usually too extreme in practice. The default is to take the 5th and 95th percentile, yielding a range that is still representative of most cells. |
Returns: A reference to this ChoosePseudoCount class.
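As a rough sketch of what the quantile setting controls, the code below picks the lower and upper size factors at quantiles q and 1 - q after discarding non-positive values. The linear interpolation rule and the names `interpolated_quantile`, `SizeFactorRange` and `size_factor_range` are assumptions made for this example; the library may compute quantiles differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Linearly interpolated quantile of 'values'.
// Takes a copy because sorting reorders the data.
inline double interpolated_quantile(std::vector<double> values, double prob) {
    assert(!values.empty() && prob >= 0 && prob <= 1);
    std::sort(values.begin(), values.end());
    const double position = prob * (values.size() - 1);
    const std::size_t below = static_cast<std::size_t>(std::floor(position));
    const std::size_t above = std::min(below + 1, values.size() - 1);
    const double weight = position - below;
    return values[below] * (1 - weight) + values[above] * weight;
}

struct SizeFactorRange {
    double lower;
    double upper;
};

// Smallest/largest "representative" size factors at quantiles q and 1 - q.
// Setting q = 0 yields the observed minimum and maximum.
inline SizeFactorRange size_factor_range(const std::vector<double>& size_factors, double q) {
    std::vector<double> positive;
    for (double s : size_factors) {
        if (s > 0) { // non-positive size factors are ignored
            positive.push_back(s);
        }
    }
    assert(!positive.empty());
    return { interpolated_quantile(positive, q), interpolated_quantile(positive, 1 - q) };
}
```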
ChoosePseudoCount & set_max_bias (double b=Defaults::max_bias) | inline |
b | Acceptable upper bound on the log-transformation bias. |
Returns: A reference to this ChoosePseudoCount class.
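One plausible way to relate the bias bound b to a pseudo-count is sketched below. It assumes Poisson noise and a second-order Taylor expansion of the log-transform, in natural-log units; neither assumption is stated on this page, and the symbols are introduced here purely for illustration.

```latex
% X ~ Poisson(\mu s): count for a gene with true abundance \mu in a cell with size factor s.
% p: pseudo-count; b: acceptable bias (max_bias).
% s_low, s_high: the small and large size factors selected via the quantile setting.
\begin{align*}
\mathrm{E}\!\left[\log\!\left(\tfrac{X}{s} + p\right)\right]
  &\approx \log(\mu + p) - \frac{\operatorname{Var}(X/s)}{2(\mu + p)^2}
   = \log(\mu + p) - \frac{\mu}{2s(\mu + p)^2} \\
\max_{\mu} \frac{\mu}{2s(\mu + p)^2}
  &= \frac{1}{8sp} \quad \text{(attained at } \mu = p\text{)} \\
\frac{1}{8p}\left|\frac{1}{s_{\text{low}}} - \frac{1}{s_{\text{high}}}\right| \le b
  &\iff p \ge \frac{1}{8b}\left|\frac{1}{s_{\text{low}}} - \frac{1}{s_{\text{high}}}\right|
\end{align*}
```

Under these assumptions, a smaller b or a wider spread of size factors demands a larger pseudo-count.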
ChoosePseudoCount & set_min_value (double v=Defaults::min_value) | inline |
v | Minimum value for the pseudo-count returned by run() . Defaults to 1 to stabilize near-zero normalized expression values, which would otherwise manifest as large negative values after log-transformation. |
Returns: A reference to this ChoosePseudoCount class.
double run (size_t n, const double *size_factors, double *buffer) const | inline |
n | Number of size factors. | |
[in] | size_factors | Pointer to an array of size factors of length n . Values should be positive, and all non-positive values are ignored. |
buffer | Pointer to an array of length n , to be used as a workspace. |
double run (size_t n, const double *size_factors) const | inline |
n | Number of size factors. | |
[in] | size_factors | Pointer to an array of size factors of length n . Values should be positive, and all non-positive values are ignored. |
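To show how the setters and the two run() overloads might fit together, here is a self-contained sketch. `PseudoCountSketch` is a hypothetical stand-in that mirrors the documented interface and is not the scran source. The rule `|1/lower - 1/upper| / (8 * max_bias)`, the quantile interpolation, and the default of 1 for the bias bound are all assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in mirroring the documented interface; not the scran source.
class PseudoCountSketch {
public:
    // Each setter returns a reference to this object so that calls can be chained.
    PseudoCountSketch& set_quantile(double q = 0.05) { quantile = q; return *this; }
    PseudoCountSketch& set_max_bias(double b = 1) { max_bias = b; return *this; } // default is a placeholder
    PseudoCountSketch& set_min_value(double v = 1) { min_value = v; return *this; }

    // Overload where the caller supplies a workspace of length n.
    double run(std::size_t n, const double* size_factors, double* buffer) const {
        assert(max_bias > 0);
        // Copy only the positive size factors into the workspace.
        std::size_t kept = 0;
        for (std::size_t i = 0; i < n; ++i) {
            if (size_factors[i] > 0) {
                buffer[kept++] = size_factors[i];
            }
        }
        if (kept <= 1) {
            return min_value; // no spread to correct for
        }
        std::sort(buffer, buffer + kept);
        const double lower = pick(buffer, kept, quantile);
        const double upper = pick(buffer, kept, 1 - quantile);
        // Assumed rule: smallest pseudo-count keeping the approximate bias below max_bias.
        const double pseudo = std::abs(1 / lower - 1 / upper) / (8 * max_bias);
        return std::max(min_value, pseudo);
    }

    // Convenience overload that allocates its own workspace.
    double run(std::size_t n, const double* size_factors) const {
        std::vector<double> buffer(n);
        return run(n, size_factors, buffer.data());
    }

private:
    // Linearly interpolated quantile of a sorted array with n >= 2 entries.
    static double pick(const double* sorted, std::size_t n, double prob) {
        const double position = prob * (n - 1);
        const std::size_t below = static_cast<std::size_t>(position);
        const std::size_t above = std::min(below + 1, n - 1);
        const double weight = position - below;
        return sorted[below] * (1 - weight) + sorted[above] * weight;
    }

    double quantile = 0.05;
    double max_bias = 1;
    double min_value = 1;
};
```

A call such as `chooser.set_quantile(0).set_max_bias(1).run(n, ptr)` would then use the observed minimum and maximum. The buffer overload lets callers reuse a workspace across repeated calls instead of allocating each time.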