Summarize pairwise effects into summary statistics per group.

#include <SummarizeEffects.hpp>

Classes
struct	Defaults
	Default parameter settings.

Public Member Functions
SummarizeEffects &	set_num_threads (int n=Defaults::num_threads)

SummarizeEffects &	set_compute_min (bool c=Defaults::compute_min)

SummarizeEffects &	set_compute_mean (bool c=Defaults::compute_mean)

SummarizeEffects &	set_compute_median (bool c=Defaults::compute_median)

SummarizeEffects &	set_compute_max (bool c=Defaults::compute_max)

SummarizeEffects &	set_compute_min_rank (bool c=Defaults::compute_min_rank)

template<typename Stat >
void	run (size_t ngenes, size_t ngroups, const Stat *effects, std::vector< std::vector< Stat * > > summaries) const

template<typename Stat >
std::vector< std::vector< std::vector< Stat > > >	run (size_t ngenes, size_t ngroups, const Stat *effects) const

Detailed Description

Summarize pairwise effects into summary statistics per group.

This class computes the statistics that are used for marker detection in ScoreMarkers. Briefly, given n groups, each group is involved in n - 1 pairwise comparisons and thus has n - 1 effect sizes, as computed by PairwiseEffects. For each group, we compute summary statistics - e.g., the minimum, median, mean - of the effect sizes across all of that group's comparisons. Users can then sort by any of these summaries to obtain a ranking of potential marker genes for each group.

The choice of summary statistic dictates the interpretation of the ranking. Given a group X:

A large mean effect size indicates that the gene is upregulated in X compared to the average of the other groups. A small value indicates that the gene is downregulated in X instead. This is a good general-purpose summary statistic for ranking, usually by decreasing size to obtain upregulated markers in X.
A large median effect size indicates that the gene is upregulated in X compared to most (>50%) other groups. A small value indicates that the gene is downregulated in X instead. This is also a good general-purpose summary, with the advantage of being more robust to outlier effects compared to the mean. However, it also has the disadvantage of being less sensitive to strong effects in a minority of comparisons.
A large minimum effect size indicates that the gene is upregulated in X compared to all other groups. A small value indicates that the gene is downregulated in X compared to at least one other group. For upregulation, this is the most stringent summary as markers will only have extreme values if they are uniquely upregulated in X compared to every other group. However, it may not be effective if X is closely related to any of the other groups, as the effect sizes in that comparison will tend to be weak.
A large maximum effect size indicates that the gene is upregulated in X compared to at least one other group. A small value indicates that the gene is downregulated in X compared to all other groups. For downregulation, this is the most stringent summary as markers will only have extreme values if they are uniquely downregulated in X compared to every other group. However, it may not be effective if X is closely related to any of the other groups, as the effect sizes in that comparison will tend to be weak.
The "minimum rank" (a.k.a. min-rank) is defined by ranking genes based on decreasing effect size within each comparison, and then taking the smallest rank across comparisons. A minimum rank of 1 means that the gene is the most upregulated gene in at least one comparison to another group. More generally, a minimum rank of T indicates that the gene is the T-th most upregulated gene in at least one comparison, and is not ranked higher than that in any other comparison. Applying a threshold on the minimum rank is useful for obtaining a set of genes that, in combination, are guaranteed to distinguish X from every other group.
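
The min-rank definition above can be made concrete with a small standalone sketch. It does not call the scran API: `min_rank_sketch` and its nested-vector input are hypothetical names chosen for illustration, and the treatment of ties (stable order by gene index) is an assumption rather than documented behaviour.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper, not part of scran.
// effects[c][g] holds the effect size of gene g in the c-th comparison
// of group X against another group. Returns, for each gene, the smallest
// 1-based rank across comparisons, where rank 1 is the largest effect.
// NaN effects are skipped; a gene with only NaN effects gets NaN.
std::vector<double> min_rank_sketch(
    const std::vector<std::vector<double> >& effects, std::size_t ngenes)
{
    std::vector<double> best(ngenes, std::numeric_limits<double>::quiet_NaN());
    std::vector<std::size_t> order;

    for (const auto& comparison : effects) {
        assert(comparison.size() == ngenes);

        // Collect the genes with a usable effect size in this comparison.
        order.clear();
        for (std::size_t g = 0; g < ngenes; ++g) {
            if (!std::isnan(comparison[g])) {
                order.push_back(g);
            }
        }

        // Rank by decreasing effect size (assumption: ties keep gene order).
        std::stable_sort(order.begin(), order.end(),
            [&comparison](std::size_t a, std::size_t b) {
                return comparison[a] > comparison[b];
            });

        // Keep the smallest rank seen so far for each gene.
        for (std::size_t r = 0; r < order.size(); ++r) {
            const double rank = static_cast<double>(r + 1);
            double& current = best[order[r]];
            if (std::isnan(current) || rank < current) {
                current = rank;
            }
        }
    }
    return best;
}
```

Under this sketch, keeping the genes whose min-rank is at most T gives the union of the top T genes from each comparison, which is why such a set contains candidates for separating X from every other group.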

The exact definition of "large" and "small" depends on the choice of effect size from PairwiseEffects. For Cohen's d, LFC and delta-detected, the value must be positive to be considered "large", and negative to be considered "small". For the AUC, a value greater than 0.5 is considered "large" and less than 0.5 is considered "small".
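
As an illustration only, the conventions just described could be encoded as below. `EffectKind`, `EffectSize` and `classify_effect` are hypothetical names that do not exist in the library; the only facts taken from the text are the neutral values of 0 and 0.5.

```cpp
// Hypothetical types for illustration; not part of scran.
enum class EffectKind { CohensD, LogFoldChange, DeltaDetected, Auc };
enum class EffectSize { Small, Neutral, Large };

// The value separating "large" from "small" for each effect size:
// 0.5 for the AUC, zero for the others.
constexpr double neutral_value(EffectKind kind) {
    return kind == EffectKind::Auc ? 0.5 : 0.0;
}

// Classify a single effect size. A NaN compares false on both sides
// and therefore falls through to Neutral in this sketch.
constexpr EffectSize classify_effect(EffectKind kind, double effect) {
    return effect > neutral_value(kind) ? EffectSize::Large
         : effect < neutral_value(kind) ? EffectSize::Small
         : EffectSize::Neutral;
}
```

For example, `classify_effect(EffectKind::Auc, 0.3)` yields `EffectSize::Small`, whereas the same value of 0.3 for Cohen's d would be classified as `EffectSize::Large`.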

The interpretation above is also contingent on the log-fold change threshold used in PairwiseEffects. For positive thresholds, small effects cannot be unambiguously interpreted as downregulation, as the effect is already adjusted to account for the threshold. As a result, only large effects can be interpreted as evidence for upregulation.

NaN effect sizes are allowed, e.g., if two groups do not exist in the same block for a blocked analysis in PairwiseEffects. This class will ignore NaN values when computing each summary. If all effects are NaN for a particular group, the summary statistic will also be NaN.
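
The NaN handling can be sketched for a single gene in a single group as follows. This is a standalone illustration rather than the library's implementation: `EffectSummary`, `median_of_sorted` and `summarize_ignoring_nan` are hypothetical names, and averaging the two middle values for an even count is an assumption about the median.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <numeric>
#include <vector>

// Hypothetical container for four of the summaries; not part of scran.
struct EffectSummary {
    double min, mean, median, max;
};

// Median of a non-empty vector sorted in ascending order.
// Assumption: an even count averages the two middle values.
double median_of_sorted(const std::vector<double>& sorted) {
    assert(!sorted.empty());
    const std::size_t n = sorted.size();
    return (n % 2 == 1) ? sorted[n / 2]
                        : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
}

// Summarize one group's effect sizes for one gene, ignoring NaNs.
// The argument is taken by value so the caller's data is not reordered.
EffectSummary summarize_ignoring_nan(std::vector<double> effects) {
    // Drop comparisons that have no usable effect size.
    effects.erase(std::remove_if(effects.begin(), effects.end(),
                                 [](double x) { return std::isnan(x); }),
                  effects.end());

    // If nothing is left, every summary is NaN.
    const double nan = std::numeric_limits<double>::quiet_NaN();
    if (effects.empty()) {
        return EffectSummary{nan, nan, nan, nan};
    }

    std::sort(effects.begin(), effects.end());
    const double total = std::accumulate(effects.begin(), effects.end(), 0.0);
    return EffectSummary{
        effects.front(),
        total / static_cast<double>(effects.size()),
        median_of_sorted(effects),
        effects.back()
    };
}
```

With effects of 1, NaN, 3 and 2, this sketch uses only the three non-NaN values, so the mean and median are both 2.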

All choices of summary statistics are enumerated by differential_analysis::summary.

Member Function Documentation

◆ set_num_threads()

SummarizeEffects & scran::SummarizeEffects::set_num_threads ( int n = Defaults::num_threads )

inline

Parameters

n	Number of threads to use.

Returns: A reference to this SummarizeEffects object.

◆ set_compute_min()

SummarizeEffects & scran::SummarizeEffects::set_compute_min ( bool c = Defaults::compute_min )

inline

Parameters

c	Whether to report the minimum of the pairwise effects.

Returns: A reference to this SummarizeEffects object.

This has no effect on the run() overload that accepts a summaries vector. For this method, the minimum is calculated if summaries[differential_analysis::MIN] is of non-zero length.

◆ set_compute_mean()

SummarizeEffects & scran::SummarizeEffects::set_compute_mean ( bool c = Defaults::compute_mean )

inline

Parameters

c	Whether to report the mean of the pairwise effects.

Returns: A reference to this SummarizeEffects object.

This has no effect on the run() overload that accepts a summaries vector. For this method, the mean is calculated if summaries[differential_analysis::MEAN] is of non-zero length.

◆ set_compute_median()

SummarizeEffects & scran::SummarizeEffects::set_compute_median ( bool c = Defaults::compute_median )

inline

Parameters

c	Whether to report the median of the pairwise effects.

Returns: A reference to this SummarizeEffects object.

This has no effect on the run() overload that accepts a summaries vector. For this method, the median is calculated if summaries[differential_analysis::MEDIAN] is of non-zero length.

◆ set_compute_max()

SummarizeEffects & scran::SummarizeEffects::set_compute_max ( bool c = Defaults::compute_max )

inline

Parameters

c	Whether to report the maximum of the pairwise effects.

Returns: A reference to this SummarizeEffects object.

This has no effect on the run() overload that accepts a summaries vector. For this method, the maximum is calculated if summaries[differential_analysis::MAX] is of non-zero length.

◆ set_compute_min_rank()

SummarizeEffects & scran::SummarizeEffects::set_compute_min_rank ( bool c = Defaults::compute_min_rank )

inline

Parameters

c	Whether to report the min-rank of the pairwise effects.

Returns: A reference to this SummarizeEffects object.

This has no effect on the run() overload that accepts a summaries vector. For this method, the min-rank is calculated if summaries[differential_analysis::MIN_RANK] is of non-zero length.
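
Because each setter returns a reference to the object it was called on, calls can be chained. The sketch below demonstrates the pattern with a hypothetical stand-in class, `MockSummarizer`, so that it compiles without the library; its member names, defaults and sanity check are placeholders and do not reflect the values in Defaults.

```cpp
#include <cassert>

// Hypothetical stand-in that mimics the documented setter pattern.
// This is NOT scran::SummarizeEffects, and the defaults are placeholders.
struct MockSummarizer {
    int num_threads = 1;
    bool compute_min = true;
    bool compute_median = true;

    MockSummarizer& set_num_threads(int n) {
        assert(n >= 1); // sanity check for this mock only
        num_threads = n;
        return *this;   // returning *this is what enables chaining
    }

    MockSummarizer& set_compute_min(bool c) {
        compute_min = c;
        return *this;
    }

    MockSummarizer& set_compute_median(bool c) {
        compute_median = c;
        return *this;
    }
};
```

With the real class, the same pattern would presumably read `summarizer.set_num_threads(4).set_compute_median(false)`, based on the signatures documented above.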

◆ run() [1/2]

template<typename Stat >

void scran::SummarizeEffects::run	(	size_t	ngenes,
		size_t	ngroups,
		const Stat *	effects,
		std::vector< std::vector< Stat * > >	summaries
	)		const

inline

Summarize the effect sizes for the pairwise comparisons to obtain a set of summary statistics for each gene in each group.

If summaries is of length 0, no summaries are computed. If any of the inner vectors of summaries are of length 0, the corresponding summary statistic is not computed.

Template Parameters

Stat	Floating point type for the statistics.

Parameters

	ngenes	Number of genes.
	ngroups	Number of groups.
[in]	effects	Pointer to a 3-dimensional array containing the pairwise statistics, see `PairwiseEffects::Results` for details.
[out]	summaries	Vector of vectors of pointers to arrays of length equal to the number of genes. The outer vector should be of length equal to `differential_analysis::n_summaries` (see `differential_analysis::summary`). Each inner vector corresponds to a summary statistic - i.e., minimum, mean, median, maximum and min-rank - and should either be of length equal to the number of groups, or be empty if that statistic is not to be computed. Each pointer corresponds to a group, and points to an array that is used to store the associated summary statistic across all genes for that group.
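
One way to prepare the summaries argument is sketched below, under stated assumptions. `SummaryBuffers`, `make_summary_buffers` and `kNumSummaries` are hypothetical; the constant 5 simply counts the five statistics listed above and stands in for differential_analysis::n_summaries, whose actual value and ordering should be taken from the library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: five summary statistics, as listed in the text.
// Real code should use differential_analysis::n_summaries instead.
constexpr std::size_t kNumSummaries = 5;

// Hypothetical holder that owns the output arrays and exposes raw pointers.
// It is move-only, because a copy would leave `pointers` referring to the
// storage of the original object.
struct SummaryBuffers {
    std::vector<std::vector<std::vector<double> > > storage; // [summary][group][gene]
    std::vector<std::vector<double*> > pointers;             // [summary][group]

    SummaryBuffers() = default;
    SummaryBuffers(SummaryBuffers&&) = default;
    SummaryBuffers& operator=(SummaryBuffers&&) = default;
};

// wanted[s] says whether summary s should be computed. Inner vectors for
// unwanted summaries are left empty, which the text says skips them.
SummaryBuffers make_summary_buffers(
    std::size_t ngenes, std::size_t ngroups, const std::vector<bool>& wanted)
{
    assert(wanted.size() == kNumSummaries);
    SummaryBuffers out;
    out.storage.resize(kNumSummaries);
    out.pointers.resize(kNumSummaries);

    for (std::size_t s = 0; s < kNumSummaries; ++s) {
        if (!wanted[s]) {
            continue;
        }
        // One array of length ngenes per group.
        out.storage[s].assign(ngroups, std::vector<double>(ngenes));
        out.pointers[s].reserve(ngroups);
        for (auto& per_group : out.storage[s]) {
            out.pointers[s].push_back(per_group.data());
        }
    }
    return out;
}
```

The `pointers` member has the shape described for summaries (with Stat as double), and the results would then be read back from `storage`.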

◆ run() [2/2]

template<typename Stat >

std::vector< std::vector< std::vector< Stat > > > scran::SummarizeEffects::run	(	size_t	ngenes,
		size_t	ngroups,
		const Stat *	effects
	)		const

inline

Summarize the effect sizes for the pairwise comparisons to obtain a set of summary statistics for each gene in each group.

This overload takes no summaries argument. Instead, the statistics to compute are chosen with set_compute_min(), set_compute_mean(), set_compute_median(), set_compute_max() and set_compute_min_rank().

Template Parameters

Stat	Floating point type for the statistics.

Parameters

	ngenes	Number of genes.
	ngroups	Number of groups.
[in]	effects	Pointer to a 3-dimensional array containing the pairwise statistics, see `PairwiseEffects::Results` for details.

Returns: A vector of vectors of vectors containing summary effects for each gene in each group. The outer vector is of length equal to differential_analysis::n_summaries (see differential_analysis::summary). Each inner vector corresponds to a summary statistic - i.e., minimum, mean, median, maximum and min-rank - and is of length equal to the number of groups. Each innermost vector corresponds to a group, and contains the associated summary statistic across all genes for that group.
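
The returned structure is indexed as [summary][group][gene], so ranking candidate markers for a group amounts to ordering one innermost vector. A minimal sketch follows, assuming Stat is double; `order_genes_by_decreasing` and `top_genes` are hypothetical helpers, and placing NaN values last is a choice made here, not documented behaviour.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: gene indices ordered by decreasing summary value,
// with NaN entries placed after all other genes.
std::vector<std::size_t> order_genes_by_decreasing(const std::vector<double>& values) {
    std::vector<std::size_t> order(values.size());
    std::iota(order.begin(), order.end(), std::size_t(0));
    std::stable_sort(order.begin(), order.end(),
        [&values](std::size_t a, std::size_t b) {
            const bool a_nan = std::isnan(values[a]);
            const bool b_nan = std::isnan(values[b]);
            if (a_nan || b_nan) {
                return !a_nan && b_nan; // non-NaN sorts before NaN
            }
            return values[a] > values[b];
        });
    return order;
}

// Hypothetical helper: the first n genes for one summary and one group.
// The caller passes the index of the desired summary statistic; the
// assertions fail if that summary was not computed.
std::vector<std::size_t> top_genes(
    const std::vector<std::vector<std::vector<double> > >& summaries,
    std::size_t summary, std::size_t group, std::size_t n)
{
    assert(summary < summaries.size());
    assert(group < summaries[summary].size());
    std::vector<std::size_t> order = order_genes_by_decreasing(summaries[summary][group]);
    if (order.size() > n) {
        order.resize(n);
    }
    return order;
}
```

Decreasing order suits the minimum, mean, median and maximum when looking for upregulated markers. For the min-rank, smaller values indicate stronger candidates, so an increasing order would be the natural choice instead.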

The documentation for this class was generated from the following file:

scran/differential_analysis/SummarizeEffects.hpp
