Compute pairwise effect size between groups of cells. More...

#include <PairwiseEffects.hpp>

Classes
struct	Defaults
	Default parameter settings. More...

struct	Results
	Pairwise effect size results. More...

Public Member Functions
PairwiseEffects &	set_threshold (double t=Defaults::threshold)

PairwiseEffects &	set_num_threads (int n=Defaults::num_threads)

PairwiseEffects &	set_compute_cohen (bool c=Defaults::compute_cohen)

PairwiseEffects &	set_compute_auc (bool c=Defaults::compute_auc)

PairwiseEffects &	set_compute_lfc (bool c=Defaults::compute_lfc)

PairwiseEffects &	set_compute_delta_detected (bool c=Defaults::compute_delta_detected)

PairwiseEffects &	set_block_weight_policy (WeightPolicy w=Defaults::block_weight_policy)

PairwiseEffects &	set_variable_block_weight_parameters (VariableBlockWeightParameters v=Defaults::variable_block_weight_parameters)

template<typename Data_ , typename Index_ , typename Group_ , typename Stat_ >
void	run (const tatami::Matrix< Data_, Index_ > p, const Group_ group, std::vector< Stat_ * > means, std::vector< Stat_ * > detected, Stat_ cohen, Stat_ auc, Stat_ lfc, Stat_ delta_detected) const

template<typename Data_ , typename Index_ , typename Group_ , typename Block_ , typename Stat_ >
void	run_blocked (const tatami::Matrix< Data_, Index_ > p, const Group_ group, const Block_ block, std::vector< Stat_ > means, std::vector< Stat_ * > detected, Stat_ cohen, Stat_ auc, Stat_ lfc, Stat_ delta_detected) const

template<typename Stat_ = double, typename Data_ = double, typename Index_ = int, typename Group_ = int>
Results< Stat_ >	run (const tatami::Matrix< Data_, Index_ > p, const Group_ group)

template<typename Stat_ = double, typename Data_ = double, typename Index_ = int, typename Group_ = int, typename Block_ = int>
Results< Stat_ >	run_blocked (const tatami::Matrix< Data_, Index_ > p, const Group_ group, const Block_ *block)

Detailed Description

Compute pairwise effect size between groups of cells.

This class computes the effect sizes for the pairwise comparisons used in ScoreMarkers, prior to any ranking of marker genes. It may be desirable to call this function directly if the pairwise effects themselves are of interest, rather than per-group summaries.

Choice of effect sizes

The log-fold change (LFC) is the difference in the mean log-expression between groups. This is fairly straightforward to interpret - as log-fold change of +1 corresponds to a two-fold upregulation in the first group compared to the second. For this interpretation, we assume that the input matrix contains log-transformed normalized expression values.

The delta-detected is the difference in the proportion of cells with detected expression between groups. This lies between 1 and -1, with the extremes occurring when a gene is silent in one group and detected in all cells of the other group. For this interpretation, we assume that the input matrix contains non-negative expression values, where a value of zero corresponds to lack of detectable expression.

Cohen's d is the standardized log-fold change between two groups. This is defined as the difference in the mean log-expression for each group scaled by the average standard deviation across the two groups. (Technically, we should use the pooled variance; however, this introduces some unpleasant asymmetry depending on the variance of the larger group, so we take a simple average instead.) A positive value indicates that the gene is upregulated in the first gene compared to the second. Cohen's d is analogous to the t-statistic in a two-sample t-test and avoids spuriously large effect sizes from comparisons between highly variable groups. We can also interpret Cohen's d as the number of standard deviations between the two group means.

The area under the curve (AUC) can be interpreted as the probability that a randomly chosen observation in one group is greater than a randomly chosen observation in the other group. Values greater than 0.5 indicate that a gene is upregulated in the first group. The AUC is closely related to the U-statistic used in the Wilcoxon rank sum test. The key difference between the AUC and Cohen's d is that the former is less sensitive to the variance within each group, e.g., if two distributions exhibit no overlap, the AUC is the same regardless of the variance of each distribution. This may or may not be desirable as it improves robustness to outliers but reduces the information available to obtain a highly resolved ranking.

With a log-fold change threshold

Setting a log-fold change threshold can be helpful as it prioritizes genes with large shifts in expression instead of those with low variances. Currently, only positive thresholds are supported - this focuses on genes upregulated in the first group compared to the second. The effect size definitions are generalized when testing against a non-zero log-fold change threshold.

Cohen's d is redefined as the standardized difference between the observed log-fold change and the specified threshold, analogous to the TREAT method from limma. Large positive values are only obtained when the observed log-fold change is significantly greater than the threshold. For example, if we had a threshold of 2 and we obtained a Cohen's d of 3, this means that the observed log-fold change was 3 standard deviations above 2. Importantly, a negative Cohen's d cannot be intepreted as downregulation, as the log-fold change may still be positive but less than the threshold.

The AUC generalized to the probability of obtaining a random observation in one group that is greater than a random observation plus the threshold in the other group. For example, if we had a threshold of 2 and we obtained an AUC of 0.8, this means that - 80% of the time - the random observation from the first group would be greater than a random observation from the second group by 2 or more. Again, AUCs below 0.5 cannot be interpreted as downregulation, as it may be caused by a positive log-fold change that is less than the threshold.

Blocked comparisons

In the presence of multiple batches, we can block on the batch of origin for each cell. Comparisons are only performed between the groups of cells in the same batch (also called "blocking level" below). The batch-specific effect sizes are then combined into a single aggregate value for output. This strategy avoids most problems related to batch effects as we never directly compare across different blocking levels.

Specifically, for each gene and each pair of groups, we obtain one effect size per blocking level. We consolidate these into a single statistic by computing the weighted mean across levels. The weight for each level is defined as the product of the weights of the two groups involved in the comparison, where each weight is computed from the size of the group using the logic described in variable_block_weight().

Obviously, blocking levels with no cells in either group will not contribute anything to the weighted mean. If two groups never co-occur in the same blocking level, no effect size will be computed and a NaN is reported in the output. We do not attempt to reconcile batch effects in a partially confounded scenario.

Other statistics

We report the mean log-expression of all cells in each group, as well as the proportion of cells with detectable expression in each group. These statistics are useful for quickly interpreting the differences in expression driving the effect size summaries. If blocking is involved, we compute the grand average across blocks of the mean and proportion for each group, where the weight for each block is defined from variable_block_weight() on the size of the group in that block.

Member Function Documentation

◆ set_threshold()

PairwiseEffects & scran::PairwiseEffects::set_threshold ( double t = Defaults::threshold )

inline

Parameters

t	Threshold on the log-fold change. This should be non-negative.

Returns: A reference to this PairwiseEffects object.

◆ set_num_threads()

PairwiseEffects & scran::PairwiseEffects::set_num_threads ( int n = Defaults::num_threads )

inline

Parameters

n	Number of threads to use.

Returns: A reference to this PairwiseEffects object.

◆ set_compute_cohen()

PairwiseEffects & scran::PairwiseEffects::set_compute_cohen ( bool c = Defaults::compute_cohen )

inline

Parameters

c	Whether to compute Cohen's d.

This only has an effect for run() methods that return Results. Otherwise, we make this decision based on the validity of the input pointers.

Returns: A reference to this PairwiseEffects object.

◆ set_compute_auc()

PairwiseEffects & scran::PairwiseEffects::set_compute_auc ( bool c = Defaults::compute_auc )

inline

Parameters

c	Whether to compute the AUC.

This only has an effect for run() methods that return Results. Otherwise, we make this decision based on the validity of the input pointers.

Returns: A reference to this PairwiseEffects object.

◆ set_compute_lfc()

PairwiseEffects & scran::PairwiseEffects::set_compute_lfc ( bool c = Defaults::compute_lfc )

inline

Parameters

c	Whether to compute the log-fold change.

This only has an effect for run() methods that return Results. Otherwise, we make this decision based on the validity of the input pointers.

Returns: A reference to this PairwiseEffects object.

◆ set_compute_delta_detected()

PairwiseEffects & scran::PairwiseEffects::set_compute_delta_detected ( bool c = Defaults::compute_delta_detected )

inline

Parameters

c	Whether to compute the delta-detected.

This only has an effect for run() methods that return Results. Otherwise, we make this decision based on the validity of the input pointers.

Returns: A reference to this PairwiseEffects object.

◆ set_block_weight_policy()

PairwiseEffects & scran::PairwiseEffects::set_block_weight_policy ( WeightPolicy w = Defaults::block_weight_policy )

inline

Parameters

w	Policy to use for weighting blocks when computing average statistics/effect sizes across blocks.

Returns: A reference to this PairwiseEffects instance.

◆ set_variable_block_weight_parameters()

PairwiseEffects & scran::PairwiseEffects::set_variable_block_weight_parameters ( VariableBlockWeightParameters v = Defaults::variable_block_weight_parameters )

inline

Parameters

v	Parameters for the variable block weights, see `variable_block_weight()` for more details. Only used when the block weight policy is set to `WeightPolicy::VARIABLE`.

Returns: A reference to this PairwiseEffects instance.

◆ run() [1/2]

template<typename Data_ , typename Index_ , typename Group_ , typename Stat_ >

void scran::PairwiseEffects::run	(	const tatami::Matrix< Data_, Index_ > *	p,
		const Group_ *	group,
		std::vector< Stat_ * >	means,
		std::vector< Stat_ * >	detected,
		Stat_ *	cohen,
		Stat_ *	auc,
		Stat_ *	lfc,
		Stat_ *	delta_detected
	)		const

inline

Compute effect sizes for pairwise comparisons between groups. On completion, means, detected, cohen, auc, lfc and delta_detected are filled with their corresponding statistics.

Template Parameters

Data_	Matrix data type.
Index_	Matrix index type.
Group_	Integer type for the group assignments.
Stat_	Floating-point type to store the statistics.

Parameters

	p	Pointer to a tatami matrix instance.
[in]	group	Pointer to an array of length equal to the number of columns in `p`, containing the group assignments. Group identifiers should be 0-based and should contain all integers in $[0, N)$ where $N$ is the number of unique groups.
[out]	means	Vector of length equal to the number of groups. Each element corresponds to a group and is a pointer to an array of length equal to the number of rows in `p`. This is used to store the mean expression of each group across all genes.
[out]	detected	Vector of length equal to the number of groups, Each element corresponds to a group and is a pointer to an array of length equal to the number of rows in `p`. This is used to store the proportion of detected expression in each group.
[out]	cohen	Pointer to an array of length equal to $GN^2$, where `N` is the number of groups and `G` is the number of genes (see `Results` for details). This is filled with the Cohen's d for the pairwise comparisons between groups across all genes. Ignored if set to `nullptr`, in which case Cohen's d is not computed.
[out]	auc	Pointer to an array as described for `cohen`, but instead storing the AUC. Ignored if set to `nullptr`, in which case the AUC is not computed.
[out]	lfc	Pointer to an array as described for `cohen`, but instead storing the log-fold change. Ignored if set to `nullptr`, in which case the log-fold change is not computed.
[out]	delta_detected	Pointer to an array as described for `cohen`, but instead the delta in the detected proportions. Ignored if set to `nullptr`, in which case the delta detected is not computed.

◆ run_blocked() [1/2]

template<typename Data_ , typename Index_ , typename Group_ , typename Block_ , typename Stat_ >

void scran::PairwiseEffects::run_blocked	(	const tatami::Matrix< Data_, Index_ > *	p,
		const Group_ *	group,
		const Block_ *	block,
		std::vector< Stat_ * >	means,
		std::vector< Stat_ * >	detected,
		Stat_ *	cohen,
		Stat_ *	auc,
		Stat_ *	lfc,
		Stat_ *	delta_detected
	)		const

inline

Compute effect sizes for pairwise comparisons between groups, accounting for any blocking factor in the dataset. On completion, means, detected, cohen, auc, lfc and delta_detected are filled with their corresponding statistics.

Template Parameters

Data_	Matrix data type.
Index_	Matrix index type.
Group_	Integer type for the group assignments.
Block_	Integer type for the block assignments.
Stat_	Floating-point type to store the statistics.

Parameters

	p	Pointer to a tatami matrix instance.
[in]	group	Pointer to an array of length equal to the number of columns in `p`, containing the group assignments. Group identifiers should be 0-based and should contain all integers in $[0, N)$ where $N$ is the number of unique groups.
[in]	block	Pointer to an array of length equal to the number of columns in `p`, containing the blocking factor. Block identifiers should be 0-based and should contain all integers in $[0, N)$ where $N$ is the number of unique groups. This can also be `NULL` in which case all cells are assumed to belong to the same block (i.e., like `run()`).
[out]	means	Vector of length equal to the number of groups. Each element corresponds to a group and is another vector of length equal to the number of genes. Each entry of the inner vector contains the grand average of the mean expression across all blocks.
[out]	detected	Vector of length equal to the number of groups. Each element corresponds to a group and is another vector of length equal to the number of genes. Each entry of the inner vector contains the grand average of the detected proportion across all blocks.
[out]	cohen	Pointer to an array of length equal to $GN^2$, where `N` is the number of groups and `G` is the number of genes (see `Results` for details). This is filled with the Cohen's d for the pairwise comparisons between groups across all genes. Ignored if set to `nullptr`, in which case Cohen's d is not computed.
[out]	auc	Pointer to an array as described for `cohen`, but instead storing the AUC. Ignored if set to `nullptr`, in which case the AUC is not computed.
[out]	lfc	Pointer to an array as described for `cohen`, but instead storing the log-fold change. Ignored if set to `nullptr`, in which case the log-fold change is not computed.
[out]	delta_detected	Pointer to an array as described for `cohen`, but instead the delta in the detected proportions. Ignored if set to `nullptr`, in which case the delta detected is not computed.

◆ run() [2/2]

template<typename Stat_ = double, typename Data_ = double, typename Index_ = int, typename Group_ = int>

Results< Stat_ > scran::PairwiseEffects::run	(	const tatami::Matrix< Data_, Index_ > *	p,
		const Group_ *	group
	)

inline

Score potential marker genes by computing summary statistics across pairwise comparisons between groups.

Template Parameters

Stat_	Floating-point type to store the statistics.
Data_	Matrix data type.
Index_	Matrix index type.
Group_	Integer type for the group assignments.

Parameters

	p	Pointer to a tatami matrix instance.
[in]	group	Pointer to an array of length equal to the number of columns in `p`, containing the group assignments. Group identifiers should be 0-based and should contain all integers in $[0, N)$ where $N$ is the number of unique groups.

Returns: A Results object is returned containing the pairwise effects, plus the mean expression and detected proportion in each group.

◆ run_blocked() [2/2]

template<typename Stat_ = double, typename Data_ = double, typename Index_ = int, typename Group_ = int, typename Block_ = int>

Results< Stat_ > scran::PairwiseEffects::run_blocked	(	const tatami::Matrix< Data_, Index_ > *	p,
		const Group_ *	group,
		const Block_ *	block
	)

inline

Score potential marker genes by computing summary statistics across pairwise comparisons between groups in multiple blocks.

Template Parameters

Stat_	Floating-point type to store the statistics.
Data_	Matrix data type.
Index_	Matrix index type.
Group_	Integer type for the group assignments.
Block_	Integer type for the block assignments.

Parameters

	p	Pointer to a tatami matrix instance.
[in]	group	Pointer to an array of length equal to the number of columns in `p`, containing the group assignments. Group identifiers should be 0-based and should contain all integers in $[0, N)$ where $N$ is the number of unique groups.
[in]	block	Pointer to an array of length equal to the number of columns in `p`, containing the blocking factor. See `run_blocked()` for more details.

Returns: A Results object is returned containing the pairwise effects, plus the mean expression and detected proportion in each group and block.

The documentation for this class was generated from the following file:

scran/differential_analysis/PairwiseEffects.hpp

Classes

Public Member Functions

Detailed Description

Choice of effect sizes

With a log-fold change threshold

Blocked comparisons

Other statistics

Member Function Documentation

◆ set_threshold()

◆ set_num_threads()

◆ set_compute_cohen()

◆ set_compute_auc()

◆ set_compute_lfc()

◆ set_compute_delta_detected()

◆ set_block_weight_policy()

◆ set_variable_block_weight_parameters()

◆ run() [1/2]

◆ run_blocked() [1/2]

◆ run() [2/2]

◆ run_blocked() [2/2]