Summarize pairwise effects into summary statistics per group.
This class computes the statistics that are used for marker detection in ScoreMarkers. Briefly, given n groups, each group is involved in n - 1 pairwise comparisons and thus has n - 1 effect sizes, as computed by PairwiseEffects. For each group, we compute summary statistics (e.g., the minimum, median, mean) of the effect sizes across all of that group's comparisons. Users can then sort by any of these summaries to obtain a ranking of potential marker genes for each group.
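The per-group summaries described above can be sketched as follows for a single gene in a single group. This is a minimal standalone illustration, not this class's actual implementation: the EffectSummary struct and the summarize_effects function are hypothetical names, and the input is assumed to be non-empty and free of NaN values.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical container for the summaries of one gene in one group.
struct EffectSummary {
    double min, mean, median, max;
};

// Summarize the n - 1 effect sizes of one gene in one group.
// Assumes 'effects' is non-empty and contains no NaN values.
inline EffectSummary summarize_effects(std::vector<double> effects) {
    std::sort(effects.begin(), effects.end());
    const std::size_t n = effects.size();

    EffectSummary out;
    out.min = effects.front();
    out.max = effects.back();
    out.mean = std::accumulate(effects.begin(), effects.end(), 0.0) / static_cast<double>(n);

    // Middle element for odd n, average of the two middle elements for even n.
    out.median = (n % 2 == 1)
        ? effects[n / 2]
        : (effects[n / 2 - 1] + effects[n / 2]) / 2.0;
    return out;
}
```

Applying such a function to every gene in group X and sorting the genes by one of the fields in decreasing order would give a ranking of candidate upregulated markers for X.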
The choice of summary statistic dictates the interpretation of the ranking. Given a group X:
- A large mean effect size indicates that the gene is upregulated in X compared to the average of the other groups. A small value indicates that the gene is downregulated in X instead. This is a good general-purpose summary statistic for ranking, usually by decreasing size to obtain upregulated markers in X.
- A large median effect size indicates that the gene is upregulated in X compared to most (>50%) other groups. A small value indicates that the gene is downregulated in X instead. This is also a good general-purpose summary, with the advantage of being more robust to outlier effects compared to the mean. However, it also has the disadvantage of being less sensitive to strong effects in a minority of comparisons.
- A large minimum effect size indicates that the gene is upregulated in X compared to all other groups. A small value indicates that the gene is downregulated in X compared to at least one other group. For upregulation, this is the most stringent summary as markers will only have extreme values if they are uniquely upregulated in X compared to every other group. However, it may not be effective if X is closely related to any of the other groups, as the effect sizes in that comparison will be weak for most genes.
- A large maximum effect size indicates that the gene is upregulated in X compared to at least one other group. A small value indicates that the gene is downregulated in X compared to all other groups. For downregulation, this is the most stringent summary as markers will only have extreme values if they are uniquely downregulated in X compared to every other group. However, it may not be effective if X is closely related to any of the other groups, as the effect sizes in that comparison will be weak for most genes.
- The "minimum rank" (a.k.a. min-rank) is defined by ranking genes based on decreasing effect size within each comparison, and then taking the smallest rank across comparisons. A minimum rank of 1 means that the gene is the top upregulated gene in at least one comparison to another group. More generally, a minimum rank of T indicates that the gene is the T-th most upregulated gene in at least one comparison. Applying a threshold on the minimum rank is useful for obtaining a set of genes that, in combination, are guaranteed to distinguish X from every other group. For example, keeping all genes with a minimum rank of at most T yields the union of the top T genes from each comparison.
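The minimum rank in the last item can be illustrated with a short standalone sketch. The function name compute_min_rank and the layout of the input (one vector of per-gene effect sizes per comparison) are assumptions made for this example, and the tie-breaking rule is a simplification that may differ from what this class does.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// effects[c][g] is the effect size of gene g in comparison c (X versus one other group).
// Returns, for each gene, its smallest 1-based rank across all comparisons,
// where rank 1 corresponds to the largest effect size within a comparison.
// Assumes no NaN values; ties are broken by gene index for simplicity.
inline std::vector<std::size_t> compute_min_rank(const std::vector<std::vector<double>>& effects) {
    if (effects.empty()) {
        return {};
    }
    const std::size_t ngenes = effects.front().size();
    std::vector<std::size_t> min_rank(ngenes, ngenes); // start at the worst possible rank
    std::vector<std::size_t> order(ngenes);

    for (const auto& comparison : effects) {
        // Order gene indices by decreasing effect size within this comparison.
        std::iota(order.begin(), order.end(), std::size_t(0));
        std::stable_sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
            return comparison[a] > comparison[b];
        });

        // Keep the best (smallest) rank seen so far for each gene.
        for (std::size_t r = 0; r < ngenes; ++r) {
            min_rank[order[r]] = std::min(min_rank[order[r]], r + 1);
        }
    }
    return min_rank;
}
```

With two comparisons and effect sizes {3.0, 1.0, 2.0} and {0.5, 2.5, 1.5} for three genes, the first gene tops the first comparison and the second gene tops the second, so both have a minimum rank of 1, while the third gene is second in both and has a minimum rank of 2.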
The exact definition of "large" and "small" depends on the choice of effect size from PairwiseEffects. For Cohen's d, LFC and delta-detected, the value must be positive to be considered "large", and negative to be considered "small". For the AUC, a value greater than 0.5 is considered "large" and less than 0.5 is considered "small".
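As a rough illustration of these cutoffs, the sketch below compares a summary value against the neutral point for its effect type. The EffectType enumeration and the helper functions are hypothetical names introduced for this example only, and the comparison assumes that no log-fold change threshold was applied.

```cpp
// Hypothetical labels for the effect sizes named in the text.
enum class EffectType { CohensD, Lfc, DeltaDetected, Auc };

// Value separating "large" from "small": 0.5 for the AUC, 0 for the others.
inline double neutral_value(EffectType type) {
    return type == EffectType::Auc ? 0.5 : 0.0;
}

// Both functions return false for NaN, as comparisons with NaN are always false.
inline bool is_large(double summary, EffectType type) {
    return summary > neutral_value(type);
}

inline bool is_small(double summary, EffectType type) {
    return summary < neutral_value(type);
}
```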
The interpretation above is also contingent on the log-fold change threshold used in PairwiseEffects. For positive thresholds, small effects cannot be unambiguously interpreted as downregulation, as the effect is already adjusted to account for the threshold. As a result, only large effects can be interpreted as evidence for upregulation.
NaN effect sizes are allowed, e.g., if two groups do not exist in the same block for a blocked analysis in PairwiseEffects. This class will ignore NaN values when computing each summary. If all effects are NaN for a particular group, the summary statistic will also be NaN.
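The NaN handling described above might look like the following for the mean. This is a standalone sketch rather than this class's own code, and mean_ignoring_nan is a hypothetical name; the same filtering step would apply before computing any of the other summaries.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Mean of the non-NaN effect sizes for one gene in one group.
// Returns NaN if every value is NaN or if the input is empty.
inline double mean_ignoring_nan(const std::vector<double>& effects) {
    double total = 0;
    std::size_t count = 0;
    for (double e : effects) {
        if (!std::isnan(e)) { // skip comparisons without a valid effect size
            total += e;
            ++count;
        }
    }
    if (count == 0) {
        return std::numeric_limits<double>::quiet_NaN();
    }
    return total / static_cast<double>(count);
}
```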
All choices of summary statistics are enumerated by differential_analysis::summary.