MultiAssayExperiments for Julia

Overview

The MultiAssayExperiment package provides Bioconductor's standard structure for multimodal datasets in R. This repository ports the basic MultiAssayExperiment functionality from R to Julia, allowing Julia users to manipulate analysis-ready datasets in the same way as in R/Bioconductor workflows.

The MultiAssayExperiment class is effectively a wrapper around multiple SummarizedExperiment objects, each of which usually represents a different data modality, such as gene expression or protein intensity. Its real value lies in the relationships between the columns of the various SummarizedExperiments. A "sample" may map to zero, one or many columns in any of the individual SummarizedExperiments, and many MultiAssayExperiment methods exploit these relationships to filter the dataset conveniently.
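
As a rough illustration of these relationships, the sketch below models a sample mapping in plain Julia as a list of (sample, experiment, column) records and looks up which columns of an experiment belong to a given sample. The `MappingEntry` type, the `columns_for` helper and all sample and column names are hypothetical; this is an assumption made for the example, not how the package actually stores its mapping.

```julia
# Hypothetical, package-independent sketch of a sample mapping.
# Each record links one column of one experiment to one sample.
struct MappingEntry
    sample::String      # conceptual sample, e.g. a patient
    experiment::String  # name of the SummarizedExperiment
    column::String      # column name within that experiment
end

example_mapping = [
    MappingEntry("Patient1", "foo", "foo1"),
    MappingEntry("Patient1", "bar", "bar1"),
    MappingEntry("Patient1", "bar", "bar2"),  # one sample, two columns in "bar"
    MappingEntry("Patient2", "foo", "foo2"),  # Patient2 has no "bar" columns
    MappingEntry("Patient3", "bar", "bar3"),
]

# Return the columns of `experiment` that belong to `sample` (possibly none).
function columns_for(mapping::Vector{MappingEntry}, sample::AbstractString, experiment::AbstractString)
    return [m.column for m in mapping if m.sample == sample && m.experiment == experiment]
end

columns_for(example_mapping, "Patient1", "bar")  # returns ["bar1", "bar2"] (many)
columns_for(example_mapping, "Patient2", "bar")  # returns an empty vector (zero)
columns_for(example_mapping, "Patient3", "bar")  # returns ["bar3"] (one)
```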

See Figure 1 of the MultiAssayExperiment vignette for more details, but note that this package makes a few changes from the original Bioconductor implementation.

Quick start

Install this package from its GitHub repository using the Pkg REPL (press ] at the julia> prompt to enter it):

add https://github.com/LTLA/MultiAssayExperiments.jl

Then load the package and explore the example object:

julia> using MultiAssayExperiments, SummarizedExperiments

julia> mae = MultiAssayExperiments.exampleobject()
MultiAssayExperiment object
  experiments(2): foo bar
  sampledata(2): name disease
  metadata(1): version

julia> se = experiment(mae, "bar")
50x8 SummarizedExperiments.SummarizedExperiment
  assays(3): foo bar whee
  rownames: Gene1 Gene2 ... Gene49 Gene50
  rowdata(2): name Type
  colnames: bar1 bar2 ... bar7 bar8
  coldata(3): name Treatment Response
  metadata(1): version

julia> coldata(experiment(mae, "bar"; sampledata = true))
8×4 DataFrame
 Row │ name    Treatment  Response   disease  
     │ String  String     Float64    String   
─────┼────────────────────────────────────────
   1 │ bar1    drug2      0.841273   bad
   2 │ bar2    normal     0.523172   bad
   3 │ bar3    drug1      0.253657   good
   4 │ bar4    normal     0.613006   good
   5 │ bar5    drug1      0.0986848  bad
   6 │ bar6    drug1      0.610145   bad
   7 │ bar7    normal     0.179339   very bad
   8 │ bar8    normal     0.832958   very bad

julia> sub1 = multifilter(mae; samples = ["Patient1", "Patient3"]);

julia> experiment(sub1, "bar")
50x2 SummarizedExperiments.SummarizedExperiment
  assays(3): foo bar whee
  rownames: Gene1 Gene2 ... Gene49 Gene50
  rowdata(2): name Type
  colnames: bar3 bar4
  coldata(3): name Treatment Response
  metadata(1): version

julia> sub2 = multifilter(mae; experiments = "foo")
MultiAssayExperiment object
  experiments(1): foo
  sampledata(2): name disease
  metadata(1): version

Class definition

MultiAssayExperiments.MultiAssayExperiment — Type

The MultiAssayExperiment class is a Bioconductor container for multimodal studies. It is essentially a collection of named SummarizedExperiment objects, each of which represents a particular experimental modality. A mapping table specifies the relationships between the columns of each SummarizedExperiment and a conceptual "sample", where each sample may have data for zero, one or multiple modalities. A sample can be anything from a cell line culture to an individual patient, depending on the context.

The central idea is to use the sample mapping to filter the MultiAssayExperiment by the samples of interest. For example, multifilter keeps only those columns of each SummarizedExperiment that map to the desired samples, so that subsetting is coordinated across all modalities without having to subset each experiment manually. Sample-level annotations are stored in a separate sample data DataFrame, from which they can easily be attached to the coldata of an individual SummarizedExperiment for further analyses, e.g., with experiment(mae, "bar"; sampledata = true).
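
The filtering and annotation steps described above can be sketched without the package, assuming that each experiment is reduced to its column names, the sample mapping is a vector of named tuples and one sample-level annotation is held in a plain dictionary. `filter_by_samples`, `annotate_columns` and the `toy_*` objects are hypothetical stand-ins; the package itself works on SummarizedExperiment and DataFrame objects through multifilter and experiment(...; sampledata = true), whose internals may differ.

```julia
# Hypothetical, package-independent sketch of sample-based filtering.
# Not the package's multifilter implementation: experiments are reduced
# to vectors of column names and the sample data to a plain Dict.

# Sample mapping as long-format (sample, experiment, column) records.
toy_mapping = [
    (sample = "Patient1", experiment = "foo", column = "foo1"),
    (sample = "Patient1", experiment = "bar", column = "bar1"),
    (sample = "Patient2", experiment = "bar", column = "bar2"),
    (sample = "Patient3", experiment = "foo", column = "foo2"),
    (sample = "Patient3", experiment = "bar", column = "bar3"),
]

# Column names of each experiment (stand-ins for SummarizedExperiment columns).
toy_experiments = Dict(
    "foo" => ["foo1", "foo2"],
    "bar" => ["bar1", "bar2", "bar3"],
)

# One sample-level annotation (stand-in for a column of the sample data).
toy_sampledata = Dict("Patient1" => "bad", "Patient2" => "good", "Patient3" => "very bad")

# Keep only the columns of every experiment that map to one of `samples`,
# preserving each experiment's original column order.
function filter_by_samples(experiments, mapping, samples)
    wanted = Set(samples)
    keep = Set((m.experiment, m.column) for m in mapping if m.sample in wanted)
    return Dict(name => [c for c in cols if (name, c) in keep] for (name, cols) in experiments)
end

# Attach the sample annotation to each column of one experiment;
# columns with no mapped sample (or an unannotated sample) get `missing`.
function annotate_columns(columns, experiment, mapping, sampledata)
    owner = Dict((m.experiment, m.column) => m.sample for m in mapping)
    return [haskey(owner, (experiment, c)) ? get(sampledata, owner[(experiment, c)], missing) : missing
            for c in columns]
end

sub = filter_by_samples(toy_experiments, toy_mapping, ["Patient1", "Patient3"])
sub["bar"]                                                        # ["bar1", "bar3"]
annotate_columns(sub["bar"], "bar", toy_mapping, toy_sampledata)  # ["bad", "very bad"]
```

In this sketch, filtering is driven entirely by the mapping: an experiment with no columns for the requested samples ends up with an empty column list rather than being dropped.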

This implementation makes a few changes from the original Bioconductor implementation. We do not consider the MultiAssayExperiment to contain any "columns", as this was unnecessarily confusing; the original colData field has accordingly been renamed to sampledata, reflecting the fact that we operate on samples. We are also much more relaxed about harmonization between the experiments, sample mapping and sample data; in fact, we do not harmonize them at all, which allows greater flexibility in storage and manipulation.