scran
C++ library for basic single-cell RNA-seq analyses
Loading...
Searching...
No Matches
Classes | Public Member Functions | List of all members
scran::PairwiseEffects Class Reference

Compute pairwise effect size between groups of cells. More...

#include <PairwiseEffects.hpp>

Classes

struct  Defaults
 Default parameter settings. More...
 
struct  Results
 Pairwise effect size results. More...
 

Public Member Functions

PairwiseEffectsset_threshold (double t=Defaults::threshold)
 
PairwiseEffectsset_num_threads (int n=Defaults::num_threads)
 
PairwiseEffectsset_compute_cohen (bool c=Defaults::compute_cohen)
 
PairwiseEffectsset_compute_auc (bool c=Defaults::compute_auc)
 
PairwiseEffectsset_compute_lfc (bool c=Defaults::compute_lfc)
 
PairwiseEffectsset_compute_delta_detected (bool c=Defaults::compute_delta_detected)
 
PairwiseEffectsset_block_weight_policy (WeightPolicy w=Defaults::block_weight_policy)
 
PairwiseEffectsset_variable_block_weight_parameters (VariableBlockWeightParameters v=Defaults::variable_block_weight_parameters)
 
template<typename Data_ , typename Index_ , typename Group_ , typename Stat_ >
void run (const tatami::Matrix< Data_, Index_ > *p, const Group_ *group, std::vector< Stat_ * > means, std::vector< Stat_ * > detected, Stat_ *cohen, Stat_ *auc, Stat_ *lfc, Stat_ *delta_detected) const
 
template<typename Data_ , typename Index_ , typename Group_ , typename Block_ , typename Stat_ >
void run_blocked (const tatami::Matrix< Data_, Index_ > *p, const Group_ *group, const Block_ *block, std::vector< Stat_ * > means, std::vector< Stat_ * > detected, Stat_ *cohen, Stat_ *auc, Stat_ *lfc, Stat_ *delta_detected) const
 
template<typename Stat_ = double, typename Data_ = double, typename Index_ = int, typename Group_ = int>
Results< Stat_ > run (const tatami::Matrix< Data_, Index_ > *p, const Group_ *group)
 
template<typename Stat_ = double, typename Data_ = double, typename Index_ = int, typename Group_ = int, typename Block_ = int>
Results< Stat_ > run_blocked (const tatami::Matrix< Data_, Index_ > *p, const Group_ *group, const Block_ *block)
 

Detailed Description

Compute pairwise effect size between groups of cells.

This class computes the effect sizes for the pairwise comparisons used in ScoreMarkers, prior to any ranking of marker genes. It may be desirable to call this function directly if the pairwise effects themselves are of interest, rather than per-group summaries.

Choice of effect sizes

The log-fold change (LFC) is the difference in the mean log-expression between groups. This is fairly straightforward to interpret - as log-fold change of +1 corresponds to a two-fold upregulation in the first group compared to the second. For this interpretation, we assume that the input matrix contains log-transformed normalized expression values.

The delta-detected is the difference in the proportion of cells with detected expression between groups. This lies between 1 and -1, with the extremes occurring when a gene is silent in one group and detected in all cells of the other group. For this interpretation, we assume that the input matrix contains non-negative expression values, where a value of zero corresponds to lack of detectable expression.

Cohen's d is the standardized log-fold change between two groups. This is defined as the difference in the mean log-expression for each group scaled by the average standard deviation across the two groups. (Technically, we should use the pooled variance; however, this introduces some unpleasant asymmetry depending on the variance of the larger group, so we take a simple average instead.) A positive value indicates that the gene is upregulated in the first gene compared to the second. Cohen's d is analogous to the t-statistic in a two-sample t-test and avoids spuriously large effect sizes from comparisons between highly variable groups. We can also interpret Cohen's d as the number of standard deviations between the two group means.

The area under the curve (AUC) can be interpreted as the probability that a randomly chosen observation in one group is greater than a randomly chosen observation in the other group. Values greater than 0.5 indicate that a gene is upregulated in the first group. The AUC is closely related to the U-statistic used in the Wilcoxon rank sum test. The key difference between the AUC and Cohen's d is that the former is less sensitive to the variance within each group, e.g., if two distributions exhibit no overlap, the AUC is the same regardless of the variance of each distribution. This may or may not be desirable as it improves robustness to outliers but reduces the information available to obtain a highly resolved ranking.

With a log-fold change threshold

Setting a log-fold change threshold can be helpful as it prioritizes genes with large shifts in expression instead of those with low variances. Currently, only positive thresholds are supported - this focuses on genes upregulated in the first group compared to the second. The effect size definitions are generalized when testing against a non-zero log-fold change threshold.

Cohen's d is redefined as the standardized difference between the observed log-fold change and the specified threshold, analogous to the TREAT method from limma. Large positive values are only obtained when the observed log-fold change is significantly greater than the threshold. For example, if we had a threshold of 2 and we obtained a Cohen's d of 3, this means that the observed log-fold change was 3 standard deviations above 2. Importantly, a negative Cohen's d cannot be intepreted as downregulation, as the log-fold change may still be positive but less than the threshold.

The AUC generalized to the probability of obtaining a random observation in one group that is greater than a random observation plus the threshold in the other group. For example, if we had a threshold of 2 and we obtained an AUC of 0.8, this means that - 80% of the time - the random observation from the first group would be greater than a random observation from the second group by 2 or more. Again, AUCs below 0.5 cannot be interpreted as downregulation, as it may be caused by a positive log-fold change that is less than the threshold.

Blocked comparisons

In the presence of multiple batches, we can block on the batch of origin for each cell. Comparisons are only performed between the groups of cells in the same batch (also called "blocking level" below). The batch-specific effect sizes are then combined into a single aggregate value for output. This strategy avoids most problems related to batch effects as we never directly compare across different blocking levels.

Specifically, for each gene and each pair of groups, we obtain one effect size per blocking level. We consolidate these into a single statistic by computing the weighted mean across levels. The weight for each level is defined as the product of the weights of the two groups involved in the comparison, where each weight is computed from the size of the group using the logic described in variable_block_weight().

Obviously, blocking levels with no cells in either group will not contribute anything to the weighted mean. If two groups never co-occur in the same blocking level, no effect size will be computed and a NaN is reported in the output. We do not attempt to reconcile batch effects in a partially confounded scenario.

Other statistics

We report the mean log-expression of all cells in each group, as well as the proportion of cells with detectable expression in each group. These statistics are useful for quickly interpreting the differences in expression driving the effect size summaries. If blocking is involved, we compute the grand average across blocks of the mean and proportion for each group, where the weight for each block is defined from variable_block_weight() on the size of the group in that block.

Member Function Documentation

◆ set_threshold()

PairwiseEffects & scran::PairwiseEffects::set_threshold ( double  t = Defaults::threshold)
inline
Parameters
tThreshold on the log-fold change. This should be non-negative.
Returns
A reference to this PairwiseEffects object.

◆ set_num_threads()

PairwiseEffects & scran::PairwiseEffects::set_num_threads ( int  n = Defaults::num_threads)
inline
Parameters
nNumber of threads to use.
Returns
A reference to this PairwiseEffects object.

◆ set_compute_cohen()

PairwiseEffects & scran::PairwiseEffects::set_compute_cohen ( bool  c = Defaults::compute_cohen)
inline
Parameters
cWhether to compute Cohen's d.

This only has an effect for run() methods that return Results. Otherwise, we make this decision based on the validity of the input pointers.

Returns
A reference to this PairwiseEffects object.

◆ set_compute_auc()

PairwiseEffects & scran::PairwiseEffects::set_compute_auc ( bool  c = Defaults::compute_auc)
inline
Parameters
cWhether to compute the AUC.

This only has an effect for run() methods that return Results. Otherwise, we make this decision based on the validity of the input pointers.

Returns
A reference to this PairwiseEffects object.

◆ set_compute_lfc()

PairwiseEffects & scran::PairwiseEffects::set_compute_lfc ( bool  c = Defaults::compute_lfc)
inline
Parameters
cWhether to compute the log-fold change.

This only has an effect for run() methods that return Results. Otherwise, we make this decision based on the validity of the input pointers.

Returns
A reference to this PairwiseEffects object.

◆ set_compute_delta_detected()

PairwiseEffects & scran::PairwiseEffects::set_compute_delta_detected ( bool  c = Defaults::compute_delta_detected)
inline
Parameters
cWhether to compute the delta-detected.

This only has an effect for run() methods that return Results. Otherwise, we make this decision based on the validity of the input pointers.

Returns
A reference to this PairwiseEffects object.

◆ set_block_weight_policy()

PairwiseEffects & scran::PairwiseEffects::set_block_weight_policy ( WeightPolicy  w = Defaults::block_weight_policy)
inline
Parameters
wPolicy to use for weighting blocks when computing average statistics/effect sizes across blocks.
Returns
A reference to this PairwiseEffects instance.

◆ set_variable_block_weight_parameters()

PairwiseEffects & scran::PairwiseEffects::set_variable_block_weight_parameters ( VariableBlockWeightParameters  v = Defaults::variable_block_weight_parameters)
inline
Parameters
vParameters for the variable block weights, see variable_block_weight() for more details. Only used when the block weight policy is set to WeightPolicy::VARIABLE.
Returns
A reference to this PairwiseEffects instance.

◆ run() [1/2]

template<typename Data_ , typename Index_ , typename Group_ , typename Stat_ >
void scran::PairwiseEffects::run ( const tatami::Matrix< Data_, Index_ > *  p,
const Group_ *  group,
std::vector< Stat_ * >  means,
std::vector< Stat_ * >  detected,
Stat_ *  cohen,
Stat_ *  auc,
Stat_ *  lfc,
Stat_ *  delta_detected 
) const
inline

Compute effect sizes for pairwise comparisons between groups. On completion, means, detected, cohen, auc, lfc and delta_detected are filled with their corresponding statistics.

Template Parameters
Data_Matrix data type.
Index_Matrix index type.
Group_Integer type for the group assignments.
Stat_Floating-point type to store the statistics.
Parameters
pPointer to a tatami matrix instance.
[in]groupPointer to an array of length equal to the number of columns in p, containing the group assignments. Group identifiers should be 0-based and should contain all integers in $[0, N)$ where $N$ is the number of unique groups.
[out]meansVector of length equal to the number of groups. Each element corresponds to a group and is a pointer to an array of length equal to the number of rows in p. This is used to store the mean expression of each group across all genes.
[out]detectedVector of length equal to the number of groups, Each element corresponds to a group and is a pointer to an array of length equal to the number of rows in p. This is used to store the proportion of detected expression in each group.
[out]cohenPointer to an array of length equal to $GN^2$, where N is the number of groups and G is the number of genes (see Results for details). This is filled with the Cohen's d for the pairwise comparisons between groups across all genes. Ignored if set to nullptr, in which case Cohen's d is not computed.
[out]aucPointer to an array as described for cohen, but instead storing the AUC. Ignored if set to nullptr, in which case the AUC is not computed.
[out]lfcPointer to an array as described for cohen, but instead storing the log-fold change. Ignored if set to nullptr, in which case the log-fold change is not computed.
[out]delta_detectedPointer to an array as described for cohen, but instead the delta in the detected proportions. Ignored if set to nullptr, in which case the delta detected is not computed.

◆ run_blocked() [1/2]

template<typename Data_ , typename Index_ , typename Group_ , typename Block_ , typename Stat_ >
void scran::PairwiseEffects::run_blocked ( const tatami::Matrix< Data_, Index_ > *  p,
const Group_ *  group,
const Block_ *  block,
std::vector< Stat_ * >  means,
std::vector< Stat_ * >  detected,
Stat_ *  cohen,
Stat_ *  auc,
Stat_ *  lfc,
Stat_ *  delta_detected 
) const
inline

Compute effect sizes for pairwise comparisons between groups, accounting for any blocking factor in the dataset. On completion, means, detected, cohen, auc, lfc and delta_detected are filled with their corresponding statistics.

Template Parameters
Data_Matrix data type.
Index_Matrix index type.
Group_Integer type for the group assignments.
Block_Integer type for the block assignments.
Stat_Floating-point type to store the statistics.
Parameters
pPointer to a tatami matrix instance.
[in]groupPointer to an array of length equal to the number of columns in p, containing the group assignments. Group identifiers should be 0-based and should contain all integers in $[0, N)$ where $N$ is the number of unique groups.
[in]blockPointer to an array of length equal to the number of columns in p, containing the blocking factor. Block identifiers should be 0-based and should contain all integers in $[0, N)$ where $N$ is the number of unique groups. This can also be NULL in which case all cells are assumed to belong to the same block (i.e., like run()).
[out]meansVector of length equal to the number of groups. Each element corresponds to a group and is another vector of length equal to the number of genes. Each entry of the inner vector contains the grand average of the mean expression across all blocks.
[out]detectedVector of length equal to the number of groups. Each element corresponds to a group and is another vector of length equal to the number of genes. Each entry of the inner vector contains the grand average of the detected proportion across all blocks.
[out]cohenPointer to an array of length equal to $GN^2$, where N is the number of groups and G is the number of genes (see Results for details). This is filled with the Cohen's d for the pairwise comparisons between groups across all genes. Ignored if set to nullptr, in which case Cohen's d is not computed.
[out]aucPointer to an array as described for cohen, but instead storing the AUC. Ignored if set to nullptr, in which case the AUC is not computed.
[out]lfcPointer to an array as described for cohen, but instead storing the log-fold change. Ignored if set to nullptr, in which case the log-fold change is not computed.
[out]delta_detectedPointer to an array as described for cohen, but instead the delta in the detected proportions. Ignored if set to nullptr, in which case the delta detected is not computed.

◆ run() [2/2]

template<typename Stat_ = double, typename Data_ = double, typename Index_ = int, typename Group_ = int>
Results< Stat_ > scran::PairwiseEffects::run ( const tatami::Matrix< Data_, Index_ > *  p,
const Group_ *  group 
)
inline

Score potential marker genes by computing summary statistics across pairwise comparisons between groups.

Template Parameters
Stat_Floating-point type to store the statistics.
Data_Matrix data type.
Index_Matrix index type.
Group_Integer type for the group assignments.
Parameters
pPointer to a tatami matrix instance.
[in]groupPointer to an array of length equal to the number of columns in p, containing the group assignments. Group identifiers should be 0-based and should contain all integers in $[0, N)$ where $N$ is the number of unique groups.
Returns
A Results object is returned containing the pairwise effects, plus the mean expression and detected proportion in each group.

◆ run_blocked() [2/2]

template<typename Stat_ = double, typename Data_ = double, typename Index_ = int, typename Group_ = int, typename Block_ = int>
Results< Stat_ > scran::PairwiseEffects::run_blocked ( const tatami::Matrix< Data_, Index_ > *  p,
const Group_ *  group,
const Block_ *  block 
)
inline

Score potential marker genes by computing summary statistics across pairwise comparisons between groups in multiple blocks.

Template Parameters
Stat_Floating-point type to store the statistics.
Data_Matrix data type.
Index_Matrix index type.
Group_Integer type for the group assignments.
Block_Integer type for the block assignments.
Parameters
pPointer to a tatami matrix instance.
[in]groupPointer to an array of length equal to the number of columns in p, containing the group assignments. Group identifiers should be 0-based and should contain all integers in $[0, N)$ where $N$ is the number of unique groups.
[in]blockPointer to an array of length equal to the number of columns in p, containing the blocking factor. See run_blocked() for more details.
Returns
A Results object is returned containing the pairwise effects, plus the mean expression and detected proportion in each group and block.

The documentation for this class was generated from the following file: