scran
C++ library for basic single-cell RNA-seq analyses
Compute log-normalized expression values.
#include <LogNormCounts.hpp>
Classes
struct | Defaults | Default parameter settings. |
Given a count matrix and a set of size factors, compute log-transformed normalized expression values. Each cell's counts are divided by the cell's size factor, to account for differences in capture efficiency and sequencing depth across cells. The normalized values are then log-transformed so that downstream analyses focus on the relative rather than absolute differences in expression; this process also provides some measure of variance stabilization. These operations are done in a delayed manner using the DelayedIsometricOp class from the tatami library.
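The two steps described above (scaling by the size factor, then log-transforming) can be sketched with the standard library alone. This is an illustration only and not the library's implementation: `log_norm_dense` is a hypothetical helper, it works eagerly on a dense matrix rather than in a delayed manner, and the base-2 logarithm and default pseudo-count of 1 are assumptions of the sketch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of scran: eagerly log-normalizes a dense
// count matrix stored as one inner vector per cell (column).
// Assumptions: base-2 logarithm, strictly positive size factors.
inline std::vector<std::vector<double>> log_norm_dense(
    const std::vector<std::vector<double>>& counts_by_cell,
    const std::vector<double>& size_factors,
    double pseudo_count = 1.0)
{
    assert(counts_by_cell.size() == size_factors.size());
    std::vector<std::vector<double>> out(counts_by_cell.size());
    for (std::size_t c = 0; c < counts_by_cell.size(); ++c) {
        out[c].reserve(counts_by_cell[c].size());
        for (double count : counts_by_cell[c]) {
            // Step 1: divide by the cell's size factor.
            double normalized = count / size_factors[c];
            // Step 2: add the pseudo-count and log-transform.
            out[c].push_back(std::log2(normalized + pseudo_count));
        }
    }
    return out;
}
```

With a pseudo-count of 1, a zero count stays at zero after the transformation, which is why that value keeps sparse matrices sparse.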
|
inline |
Set the pseudo-count to add to the normalized expression values prior to the log-transformation. Larger pseudo-counts will shrink the log-expression values towards zero such that the dataset variance is driven more by high-abundance genes; this is occasionally useful to mitigate biases introduced by log-expression at low counts. See also set_choose_pseudo_count().
p | Pseudo-count, should be a positive number. |
Returns: a reference to this LogNormCounts object.
|
inline |
Naive addition of a non-unity pseudo-count will break sparsity. This can be avoided by instead dividing the normalized expression values by the pseudo-count and then applying the usual log1p transformation. However, the resulting values cannot be interpreted on the scale of log-counts.
a | Whether to use an effective pseudo-count that avoids breaking sparsity. |
Returns: a reference to this LogNormCounts object.
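The difference between the two approaches can be seen in a small sketch. The function names `naive_log` and `sparse_log` are hypothetical, and the natural logarithm is used purely for illustration; the library's choice of base is not stated here.

```cpp
#include <cassert>
#include <cmath>

// Naive approach: log(x + p). A zero input maps to log(p), which is
// non-zero whenever p != 1, so zeros in a sparse matrix become non-zero.
inline double naive_log(double normalized, double pseudo_count) {
    assert(pseudo_count > 0);
    return std::log(normalized + pseudo_count);
}

// Sparsity-preserving approach: log1p(x / p). A zero input maps to zero.
inline double sparse_log(double normalized, double pseudo_count) {
    assert(pseudo_count > 0);
    return std::log1p(normalized / pseudo_count);
}
```

Since log(x + p) = log(p) + log1p(x / p), the two results differ only by the constant log(p). Differences between cells are therefore preserved, but the sparsity-preserving values are shifted away from the log-count scale.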
|
inline |
Specify whether to center the size factors in run(). If true, we center the size factors across cells so that their average is equal to 1; this ensures that the normalized values can still be interpreted on the same scale as the input counts.
If false, no further centering is performed. This is more efficient when size factors are already centered; it may also be useful for re-using this class to compute other normalized values like log-CPMs.
c | Whether to center the size factors. |
Returns: a reference to this LogNormCounts object.
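Centering amounts to dividing every size factor by the mean. A minimal sketch follows, assuming all size factors are positive and finite; `center_size_factors` is a hypothetical name and not the library's CenterSizeFactors class.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper: scale the size factors in place so their mean is 1.
// Assumes every entry is positive and finite.
inline void center_size_factors(std::vector<double>& size_factors) {
    assert(!size_factors.empty());
    double mean = std::accumulate(size_factors.begin(), size_factors.end(), 0.0)
        / static_cast<double>(size_factors.size());
    assert(mean > 0);
    for (double& sf : size_factors) {
        sf /= mean;
    }
}
```

For example, size factors of 1, 2 and 3 have a mean of 2 and become 0.5, 1 and 1.5 after centering.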
|
inline |
b | Blocking mode, see CenterSizeFactors::set_block_mode() for details. |
Returns: a reference to this LogNormCounts object.
|
inline |
Specify whether to handle zero size factors. If false, size factors of zero will raise an error; otherwise, they will be automatically set to the smallest non-zero size factor after centering (or 1, if all size factors are zero). Setting this to true ensures that any all-zero cells are represented by all-zero columns in the normalized matrix, which is a reasonable outcome if those cells cannot be filtered out during upstream quality control. Note that the centering process ignores zeros, see CenterSizeFactors::set_ignore_zeros() for more details.
z | Whether to replace zero size factors with the smallest non-zero size factor. |
Returns: a reference to this LogNormCounts object.
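The replacement rule described above can be sketched as follows. `replace_zero_size_factors` is a hypothetical helper that assumes non-negative, finite inputs which have already been centered; it is not the library's code.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical helper: replace zeros with the smallest non-zero size
// factor, or with 1 if every size factor is zero.
// Assumes all entries are non-negative and finite.
inline void replace_zero_size_factors(std::vector<double>& size_factors) {
    double smallest = std::numeric_limits<double>::infinity();
    for (double sf : size_factors) {
        assert(sf >= 0 && std::isfinite(sf));
        if (sf > 0 && sf < smallest) {
            smallest = sf;
        }
    }
    if (!std::isfinite(smallest)) {
        smallest = 1.0; // no non-zero size factor was found
    }
    for (double& sf : size_factors) {
        if (sf == 0) {
            sf = smallest;
        }
    }
}
```

An all-zero cell divided by any positive replacement value remains all-zero, so the exact replacement only matters for avoiding a division by zero.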
|
inline |
Specify whether to handle non-finite size factors. If false, non-finite size factors will raise an error. Otherwise, size factors of infinity will be automatically set to the largest finite size factor after centering (or 1, if all size factors are non-finite). Missing (i.e., NaN) size factors will be automatically set to 1 so that scaling is a no-op. Note that the centering process ignores non-finite factors, see CenterSizeFactors for more details.
z | Whether to replace non-finite size factors with the largest finite size factor. |
Returns: a reference to this LogNormCounts object.
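A sketch of the rule for non-finite values is shown below. `replace_non_finite_size_factors` is a hypothetical helper; it assumes that any infinite entries are positive infinity and that centering has already been applied.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper: infinite size factors become the largest finite
// size factor (or 1 if none is finite); NaN size factors become 1.
inline void replace_non_finite_size_factors(std::vector<double>& size_factors) {
    bool found_finite = false;
    double largest = 1.0; // fallback when no finite size factor exists
    for (double sf : size_factors) {
        if (std::isfinite(sf) && (!found_finite || sf > largest)) {
            largest = sf;
            found_finite = true;
        }
    }
    for (double& sf : size_factors) {
        if (std::isnan(sf)) {
            sf = 1.0;     // scaling by 1 is a no-op
        } else if (std::isinf(sf)) {
            sf = largest;
        }
    }
}
```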
|
inline |
n | Number of threads to use. |
Returns: a reference to this LogNormCounts object.
Parallelization is only performed to compute size factors, so this method only has an effect if size_factors are not passed to run().
|
inline |
c | Whether to automatically choose an appropriate pseudo-count based on the (centered) size factors. See ChoosePseudoCount for details. |
Returns: a reference to this LogNormCounts object.
|
inline |
m | See ChoosePseudoCount::set_max_bias() for details. |
Returns: a reference to this LogNormCounts object.
|
inline |
q | See ChoosePseudoCount::set_quantile() for details. |
Returns: a reference to this LogNormCounts object.
|
inline |
m | See ChoosePseudoCount::set_min_value() for details. |
Returns: a reference to this LogNormCounts object.
|
inline |
Compute log-normalized expression values from an input matrix. To avoid copying the data, this is done in a delayed manner using the DelayedIsometricOp class from the tatami library.
MAT | A tatami matrix class, most typically a tatami::NumericMatrix. |
V | A vector class supporting size(), random access via [], begin(), end() and data(). |
mat | Pointer to an input count matrix, with features in the rows and cells in the columns. |
size_factors | A vector of positive size factors, of length equal to the number of columns in mat. |
|
inline |
Compute log-normalized expression values from an input matrix with blocking. Specifically, centering of size factors is performed within each block. This allows users to easily mimic normalization of different blocks of cells (e.g., from different samples) in the same matrix.
MAT | A tatami matrix class, most typically a tatami::NumericMatrix. |
V | A vector class supporting size(), random access via [], begin(), end() and data(). |
B | An integer type, to hold the block IDs. |
mat | Pointer to an input count matrix, with features in the rows and cells in the columns. |
size_factors | A vector of size factors, of length equal to the number of columns in mat. |
[in] block | Pointer to an array of block identifiers. If provided, the array should be of length equal to the number of columns in mat. Values should be integer IDs in [0, N), where N is the number of blocks. This can also be NULL, in which case all cells are assumed to belong to the same block. |
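To illustrate what centering within each block means, the sketch below scales the size factors so that every block has a mean of 1. This is the simplest reading of the description; the actual behavior depends on the blocking mode (see CenterSizeFactors::set_block_mode()), which is not reproduced here. `center_within_blocks` is a hypothetical helper that assumes positive, finite size factors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: center size factors separately within each block,
// so that the mean size factor of every block is 1.
// Block IDs must lie in [0, num_blocks).
inline void center_within_blocks(std::vector<double>& size_factors,
                                 const std::vector<int>& block,
                                 int num_blocks)
{
    assert(size_factors.size() == block.size());
    std::vector<double> total(num_blocks, 0.0);
    std::vector<std::size_t> count(num_blocks, 0);
    for (std::size_t i = 0; i < size_factors.size(); ++i) {
        assert(block[i] >= 0 && block[i] < num_blocks);
        total[block[i]] += size_factors[i];
        ++count[block[i]];
    }
    for (std::size_t i = 0; i < size_factors.size(); ++i) {
        // Any block seen here has count >= 1, so the division is safe.
        double mean = total[block[i]] / static_cast<double>(count[block[i]]);
        size_factors[i] /= mean;
    }
}
```

For instance, size factors of 1, 3, 10 and 30 with block IDs 0, 0, 1, 1 become 0.5, 1.5, 0.5 and 1.5: the tenfold difference in depth between the two blocks no longer affects the centered values.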
|
inline |
Compute log-normalized expression values from an input matrix. Size factors are defined as the total count for each cell, i.e., the sum of counts across all features.
MAT | A tatami matrix class, most typically a tatami::NumericMatrix. |
mat | Pointer to an input count matrix, with features in the rows and cells in the columns. |
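Deriving size factors from total counts is a per-column sum. A minimal sketch on a dense matrix stored as one vector per cell follows; `total_counts` is a hypothetical helper and does not use the tatami interface or any parallelization.

```cpp
#include <numeric>
#include <vector>

// Hypothetical helper: the size factor of each cell is the sum of its
// counts across all features (its library size).
inline std::vector<double> total_counts(
    const std::vector<std::vector<double>>& counts_by_cell)
{
    std::vector<double> totals;
    totals.reserve(counts_by_cell.size());
    for (const auto& cell : counts_by_cell) {
        totals.push_back(std::accumulate(cell.begin(), cell.end(), 0.0));
    }
    return totals;
}
```

These raw totals would typically still need centering before use, so that the normalized values stay on the scale of the input counts.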
|
inline |
Compute log-normalized expression values from an input matrix with blocking, see run_blocked() for details. Size factors are defined as the total count for each cell, i.e., the sum of counts across all features.
MAT | A tatami matrix class, most typically a tatami::NumericMatrix. |
B | An integer type, to hold the block IDs. |
mat | Pointer to an input count matrix, with features in the rows and cells in the columns. |
[in] block | Pointer to an array of block identifiers, see run_blocked() for details. |