MultiAssayExperiments for Julia

Overview

The MultiAssayExperiment package provides Bioconductor's standard structure for multimodal datasets. This repository ports the basic MultiAssayExperiment functionality from R to Julia, allowing Julians to conveniently manipulate analysis-ready datasets in the same fashion as R/Bioconductor workflows.

The MultiAssayExperiment class is effectively a wrapper around multiple SummarizedExperiment objects, each of which usually represents a different data modality, e.g., gene expression, protein intensity. The sophistication lies in the relationships between columns of the various SummarizedExperiments. A "sample" may map to zero, one or many columns in any of the individual SummarizedExperiments, and many of the MultiAssayExperiment methods are focused on exploiting these relationships for convenient filtering of the dataset.

Check out Figure 1 of the MultiAssayExperiment vignette for more details, though note that this package does make a few changes from the original Bioconductor implementation.

Quick start

Users may install this package from the GitHub repository through the usual process on the Pkg REPL:

add https://github.com/LTLA/MultiAssayExperiments.jl

And then:

julia> using MultiAssayExperiments, SummarizedExperiments

julia> mae = MultiAssayExperiments.exampleobject()
MultiAssayExperiment object
  experiments(2): foo bar
  sampledata(2): name disease
  metadata(1): version

julia> se = experiment(mae, "bar")
50x8 SummarizedExperiments.SummarizedExperiment
  assays(3): foo bar whee
  rownames: Gene1 Gene2 ... Gene49 Gene50
  rowdata(2): name Type
  colnames: bar1 bar2 ... bar7 bar8
  coldata(3): name Treatment Response
  metadata(1): version

julia> coldata(experiment(mae, "bar"; sampledata = true))
8×4 DataFrame
 Row │ name    Treatment  Response   disease  
     │ String  String     Float64    String   
─────┼────────────────────────────────────────
   1 │ bar1    drug2      0.841273   bad
   2 │ bar2    normal     0.523172   bad
   3 │ bar3    drug1      0.253657   good
   4 │ bar4    normal     0.613006   good
   5 │ bar5    drug1      0.0986848  bad
   6 │ bar6    drug1      0.610145   bad
   7 │ bar7    normal     0.179339   very bad
   8 │ bar8    normal     0.832958   very bad

julia> sub1 = multifilter(mae; samples = ["Patient1", "Patient3"]);

julia> experiment(sub1, "bar")
50x2 SummarizedExperiments.SummarizedExperiment
  assays(3): foo bar whee
  rownames: Gene1 Gene2 ... Gene49 Gene50
  rowdata(2): name Type
  colnames: bar3 bar4
  coldata(3): name Treatment Response
  metadata(1): version

julia> sub2 = multifilter(mae; experiments = "foo")
MultiAssayExperiment object
  experiments(1): foo
  sampledata(2): name disease
  metadata(1): version

Class definition

MultiAssayExperiments.MultiAssayExperiment (Type)

The MultiAssayExperiment class is a Bioconductor container for multimodal studies. This is basically a list of SummarizedExperiment objects, each of which represents a particular experimental modality. A mapping table specifies the relationships between the columns of each SummarizedExperiment and a conceptual "sample", assuming that each sample has data for zero, one or multiple modalities. A sample can be defined as anything from a cell line culture to an individual patient, depending on the context.

The central idea is to use the sample mapping to easily filter the MultiAssayExperiment based on the samples of interest. For example, a user can call multifilter to only keep the columns of each SummarizedExperiment that correspond to desired samples via the sample mapping. This facilitates coordination across multiple modalities without needing to manually subset each experiment. We also store sample-level annotations in a sample data DataFrame, where they can be easily attached to the coldata of a SummarizedExperiment for further analyses.
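The filtering idea can be illustrated with a small, self-contained sketch in plain Julia. It does not use the package. The sample mapping is assumed to be a vector of named tuples with sample, experiment and colname fields (standing in for the DataFrame), and columns_for_samples is a hypothetical helper, not part of the MultiAssayExperiments API.

```julia
# Hypothetical stand-in for samplemap(x): one named tuple per mapping row.
toy_map = [
    (sample = "Patient1", experiment = "foo", colname = "foo1"),
    (sample = "Patient1", experiment = "foo", colname = "foo2"),
    (sample = "Patient2", experiment = "foo", colname = "foo3"),
    (sample = "Patient2", experiment = "bar", colname = "bar1"),
    (sample = "Patient3", experiment = "bar", colname = "bar2"),
]

# Hypothetical helper: for each experiment, collect the columns that map
# to one of the wanted samples.
function columns_for_samples(mapping, wanted)
    keep = Dict{String, Vector{String}}()
    for row in mapping
        if row.sample in wanted
            push!(get!(keep, row.experiment, String[]), row.colname)
        end
    end
    return keep
end

keep = columns_for_samples(toy_map, Set(["Patient1", "Patient3"]))
```

Here foo keeps foo1 and foo2, and bar keeps only bar2. An experiment with no matching columns is simply absent from this dictionary, whereas multifilter keeps such experiments (with zero columns) unless experiments is also supplied.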

This implementation makes a few changes from the original Bioconductor implementation. We do not consider the MultiAssayExperiment to contain any "columns", as this was unnecessarily confusing. The previous colData field has thus been renamed to sampledata, to reflect the fact that we are operating on samples. We are also much more relaxed about harmonization between the experiments, sample mapping and sample data; more specifically, we do not harmonize at all, which allows greater flexibility in storage and manipulation.


Constructors

MultiAssayExperiments.MultiAssayExperiment (Type)
MultiAssayExperiment(experiments, sampledata, samplemap, metadata = Dict{String,Any}())

Creates a new MultiAssayExperiment from its components.

experiments should contain ordered pairs of experiment names and SummarizedExperiment objects, e.g., an OrderedDict{String, SummarizedExperiment} as in the example below. Each SummarizedExperiment may contain any number of rows with any identities. However, its column names must not be nothing and must be unique within that object.

Each row of sampledata corresponds to a conceptual sample. The first column should be called name and contain the names of the samples in a Vector{String}. Sample names are arbitrary but should be unique. Any number and type of other columns may be provided, usually containing sample-level annotations.

The samplemap table is expected to have 3 Vector{String} columns - sample, experiment and colname - specifying the correspondence between each conceptual sample and the columns of a particular SummarizedExperiment. See setsamplemap! for more details on the expected format.

Note that values in the samplemap columns need not have a 1:1 match to their cross-referenced target; any values unique to one or the other will be ignored in methods like expandsampledata and filtersamplemap. This allows users to flexibly manipulate the object without constantly hitting validity checks.

The metadata stores other annotations unrelated to the samples.

Examples

julia> using MultiAssayExperiments

julia> using SummarizedExperiments

julia> exp = OrderedDict{String, SummarizedExperiment}();

julia> exp["foo"] = SummarizedExperiments.exampleobject(100, 2);

julia> exp["bar"] = SummarizedExperiments.exampleobject(50, 5);

julia> cd = DataFrame(
           name = ["Aaron", "Michael", "Jayaram", "Sebastien", "John"],
           disease = ["good", "bad", "good", "bad", "very bad"]
       );

julia> sm = DataFrame(
           sample = ["Aaron", "Michael", "Aaron", "Michael", "Jayaram", "Sebastien", "John"],
           experiment = ["foo", "foo", "bar", "bar", "bar", "bar", "bar"],
           colname = ["Patient1", "Patient2", "Patient1", "Patient2", "Patient3", "Patient4", "Patient5"]
       );

julia> out = MultiAssayExperiment(exp, cd, sm)
MultiAssayExperiment object
  experiments(2): foo bar
  sampledata(2): name disease
  metadata(0):

MultiAssayExperiments.MultiAssayExperiment (Method)
MultiAssayExperiment(experiments)

Creates a MultiAssayExperiment object from a set of experiments. The sample data and the sample mapping are automatically created from the union of column names across all experiments.
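A rough sketch of how such an automatic mapping could be derived, in plain Julia. Experiments are reduced to hypothetical name => column-name pairs, and we assume (without having checked the package source) that each distinct column name becomes a sample of the same name.

```julia
# Hypothetical stand-in: each experiment reduced to its column names.
experiment_colnames = ["foo" => ["A", "B", "C"], "bar" => ["B", "C", "D"]]

# Sample names: the union of all column names, in order of first appearance.
samples = unique(reduce(vcat, last.(experiment_colnames)))

# One mapping row per experiment column; assumption: sample name == column name.
mapping = NamedTuple{(:sample, :experiment, :colname), NTuple{3, String}}[]
for (name, cols) in experiment_colnames, col in cols
    push!(mapping, (sample = col, experiment = name, colname = col))
end
```

Under this assumption, a column name shared by several experiments (B and C here) links those columns to a single sample.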

Examples

julia> using MultiAssayExperiments

julia> using SummarizedExperiments

julia> exp = OrderedDict{String, SummarizedExperiment}();

julia> exp["foo"] = SummarizedExperiments.exampleobject(100, 10);

julia> exp["bar"] = SummarizedExperiments.exampleobject(50, 20);

julia> out = MultiAssayExperiment(exp)
MultiAssayExperiment object
  experiments(2): foo bar
  sampledata(1): name
  metadata(0):

MultiAssayExperiments.MultiAssayExperiment (Method)
MultiAssayExperiment()

Creates an empty MultiAssayExperiment object.

Examples

julia> using MultiAssayExperiments

julia> MultiAssayExperiment()
MultiAssayExperiment object
  experiments(0):
  sampledata(1): name
  metadata(0):

Getters

MultiAssayExperiments.experiment (Method)
experiment(x[, i]; sampledata = false)

Extract the specified SummarizedExperiment from a MultiAssayExperiment x. i may be a positive integer no greater than the number of experiments in x, or a string specifying the name of the desired experiment. If i is not specified, it defaults to the first experiment in x.

If sampledata = true, we attempt to add the sample data of x to the coldata of the returned SummarizedExperiment. This is done by subsetting sampledata(x) based on the sample mapping to the columns of the returned SummarizedExperiment; see expandsampledata for more details. If sampledata(x) and the coldata of the SummarizedExperiment contain columns with the same name but different values, the columns from sampledata(x) are omitted with a warning.

Note that, if sampledata = true, the returned SummarizedExperiment will be a copy of the relevant experiment in x. If false, the returned object will be a reference.
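The column-merging rule for sampledata = true can be sketched as follows. This is a hypothetical illustration, not the package's implementation: tables are plain Dicts of column name => vector rather than DataFrames, the expanded sample data is assumed to already be aligned to the experiment's columns (as expandsampledata would provide), the name column is left out of the toy data, and attach_sampledata is an invented helper.

```julia
# Hypothetical stand-ins: tables as Dicts of column name => vector.
coldata = Dict("Treatment" => ["drug1", "normal"], "disease" => ["good", "good"])
expanded = Dict("disease" => ["good", "bad"], "age" => [40, 51])

# Hypothetical helper: add sample data columns that do not clash with coldata.
function attach_sampledata(coldata, expanded)
    out = Dict{String, Any}(coldata)  # new Dict, so the input is left unchanged
    for (col, vals) in expanded
        if !haskey(out, col)
            out[col] = vals
        elseif out[col] != vals
            # Same name, different values: keep the coldata column and warn.
            @warn "Omitting sample data column with conflicting values" col
        end
    end
    return out
end

merged = attach_sampledata(coldata, expanded)
```

Building a new Dict mirrors the documented behavior that the result is a copy when sampledata = true.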

Examples

julia> using MultiAssayExperiments;

julia> x = MultiAssayExperiments.exampleobject();

julia> experiment(x)
100x10 SummarizedExperiment
  assays(3): foo bar whee
  rownames: Gene1 Gene2 ... Gene99 Gene100
  rowdata(2): name Type
  colnames: foo1 foo2 ... foo9 foo10
  coldata(3): name Treatment Response
  metadata(1): version

julia> experiment(x, 1); # same result

julia> experiment(x, "foo");

julia> experiment(x, "foo", sampledata = true) # add sample data
100x10 SummarizedExperiment
  assays(3): foo bar whee
  rownames: Gene1 Gene2 ... Gene99 Gene100
  rowdata(2): name Type
  colnames: foo1 foo2 ... foo9 foo10
  coldata(4): name Treatment Response disease
  metadata(1): version

MultiAssayExperiments.experiments (Method)
experiments(x)

Return an ordered dictionary containing all experiments in the MultiAssayExperiment x.

Examples

julia> using MultiAssayExperiments

julia> x = MultiAssayExperiments.exampleobject();

julia> collect(keys(experiments(x)))
2-element Vector{String}:
 "foo"
 "bar"

MultiAssayExperiments.sampledata (Method)
sampledata(x, check = true)

Return a DataFrame containing the sample data in the MultiAssayExperiment x.

The returned object should contain name as the first column, containing a vector of unique strings. If check = true, the function will check the validity of the sample data before returning it.
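The validity requirements on the sample data can be expressed as a short check. This sketch is hypothetical: it operates on an ordered vector of column name => values pairs instead of a DataFrame, and check_sampledata is not a function provided by the package.

```julia
# Hypothetical check mirroring the documented requirements for sampledata(x).
function check_sampledata(columns::AbstractVector{<:Pair})
    isempty(columns) && error("sample data must contain a 'name' column")
    first(columns).first == "name" || error("the first column must be 'name'")
    ids = first(columns).second
    all(id -> id isa AbstractString, ids) || error("sample names must be strings")
    allunique(ids) || error("sample names must be unique")
    return true
end

check_sampledata(["name" => ["Patient1", "Patient2"], "disease" => ["good", "bad"]])
```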

Examples

julia> using MultiAssayExperiments

julia> x = MultiAssayExperiments.exampleobject();

julia> names(sampledata(x))
2-element Vector{String}:
 "name"
 "disease"

MultiAssayExperiments.samplemap (Method)
samplemap(x)

Return a DataFrame containing the sample mapping from the MultiAssayExperiment x.

The returned object should contain the sample, experiment and colname columns in that order. Each column should contain a vector of strings, and rows should be unique. If check = true, the function will check the validity of the sample mapping before returning it.
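The expectations on the sample mapping can be written as a hypothetical check_samplemap function, using a vector of named tuples in place of the DataFrame. It only illustrates the stated requirements and is not the package's validation code.

```julia
# Hypothetical check mirroring the documented requirements for samplemap(x).
function check_samplemap(mapping)
    for row in mapping
        keys(row) == (:sample, :experiment, :colname) ||
            error("expected the fields sample, experiment and colname, in that order")
        all(v -> v isa AbstractString, values(row)) ||
            error("all mapping entries must be strings")
    end
    allunique(mapping) || error("rows of the sample mapping must be unique")
    return true
end

toy_map = [
    (sample = "Patient1", experiment = "foo", colname = "foo1"),
    (sample = "Patient2", experiment = "bar", colname = "bar1"),
]
check_samplemap(toy_map)
```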

Examples

julia> using MultiAssayExperiments

julia> x = MultiAssayExperiments.exampleobject();

julia> names(samplemap(x))
3-element Vector{String}:
 "sample"
 "experiment"
 "colname"

MultiAssayExperiments.metadata (Method)
metadata(x)

Return a dictionary containing the metadata from the MultiAssayExperiment x.

Examples

julia> using MultiAssayExperiments

julia> x = MultiAssayExperiments.exampleobject();

julia> collect(keys(metadata(x)))
1-element Vector{String}:
 "version"

Setters

MultiAssayExperiments.setexperiment! (Method)
setexperiment!(x[, i], value)

Set experiment i in MultiAssayExperiment x to the SummarizedExperiment value. This returns a reference to the modified x.

i may be a positive integer, in which case it should be no greater than the length of experiments(x). It may also be a string specifying a new or existing experiment in x. If omitted, we set the first experiment by default.

Examples

julia> using MultiAssayExperiments;

julia> x = MultiAssayExperiments.exampleobject();

julia> size(experiment(x, 2))
(50, 8)

julia> val = experiment(x);

julia> setexperiment!(x, 2, val);

julia> size(experiment(x, 2))
(100, 10)

MultiAssayExperiments.setexperiments! (Method)
setexperiments!(x, value)

Set the experiments in the MultiAssayExperiment x to the OrderedDict value. This returns a reference to the modified x.

Examples

julia> using MultiAssayExperiments

julia> x = MultiAssayExperiments.exampleobject();

julia> y = copy(experiments(x));

julia> delete!(y, "foo");

julia> setexperiments!(x, y);

julia> collect(keys(experiments(x)))
1-element Vector{String}:
 "bar"

MultiAssayExperiments.setsampledata! (Method)
setsampledata!(x, value)

Set the sample data in the MultiAssayExperiment x to the DataFrame value.

value should contain name as the first column, containing a vector of unique strings. If check = true, the function will check the validity of value before setting it.

Examples

julia> using MultiAssayExperiments

julia> x = MultiAssayExperiments.exampleobject();

julia> sd = copy(sampledata(x));

julia> sd[!,"stuff"] = [rand() for i in 1:size(sd)[1]];

julia> setsampledata!(x, sd);

julia> names(sampledata(x))
3-element Vector{String}:
 "name"
 "disease"
 "stuff"

MultiAssayExperiments.setsamplemap! (Method)
setsamplemap!(x, value)

Set the sample mapping in the MultiAssayExperiment x to a DataFrame value. This returns a reference to the modified x.

value should contain the sample, experiment and colname columns in that order. Each column should contain a vector of strings:

  • Values of sample may (but are not required to) correspond to the names of samples in sampledata(x).
  • Values of experiment may (but are not required to) correspond to the keys of experiments(x).
  • Values of colname should (but are not required to) correspond to the column names of the SummarizedExperiment named in the experiment column of the same row.

This correspondence is used for convenient subsetting and extraction, e.g., expandsampledata, filtersamplemap. However, values in the sample mapping columns need not have a 1:1 match to their corresponding target; any values unique to one or the other will be ignored in the relevant methods. This allows users to flexibly manipulate the object without constantly hitting validity checks.

It is legal (but highly unusual) for a given combination of experiment and colname to occur more than once. This may incur warnings in methods like expandsampledata.
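To see whether a mapping contains such repeated experiment/colname combinations, one can count them. A minimal sketch with a hypothetical duplicated_combinations helper, operating on a vector of named tuples rather than the package's DataFrame:

```julia
toy_map = [
    (sample = "Patient1", experiment = "foo", colname = "foo1"),
    (sample = "Patient2", experiment = "foo", colname = "foo1"),  # same combination, another sample
    (sample = "Patient2", experiment = "bar", colname = "bar1"),
]

# Hypothetical helper: experiment/colname combinations that occur more than once.
function duplicated_combinations(mapping)
    counts = Dict{Tuple{String, String}, Int}()
    for row in mapping
        key = (row.experiment, row.colname)
        counts[key] = get(counts, key, 0) + 1
    end
    return sort([key for (key, n) in counts if n > 1])
end

duplicated_combinations(toy_map)
```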

Examples

julia> using MultiAssayExperiments

julia> x = MultiAssayExperiments.exampleobject();

julia> y = samplemap(x)[1:10,:];

julia> setsamplemap!(x, y);

julia> size(samplemap(x))[1]
10

MultiAssayExperiments.setmetadata! (Method)
setmetadata!(x, value)

Set the metadata of a MultiAssayExperiment x to a dictionary value. This returns a reference to the modified x.

Examples

julia> using MultiAssayExperiments

julia> x = MultiAssayExperiments.exampleobject();

julia> meta = copy(metadata(x));

julia> meta["version"] = "0.2.0";

julia> setmetadata!(x, meta);

julia> metadata(x)["version"]
"0.2.0"

Filtering

MultiAssayExperiments.filtersamplemap! (Method)
filtersamplemap!(x; samples = nothing, experiments = nothing, colnames = nothing)

Filters samplemap(x) in place, using the same criteria as filtersamplemap. A reference to the modified x is returned.

Examples

julia> using MultiAssayExperiments

julia> x = MultiAssayExperiments.exampleobject();

julia> filtersamplemap!(x; samples = ["Patient1", "Patient2"]);

julia> samplemap(x)
8×3 DataFrame
 Row │ sample    experiment  colname 
     │ String    String      String  
─────┼───────────────────────────────
   1 │ Patient1  foo         foo1
   2 │ Patient1  foo         foo2
   3 │ Patient1  foo         foo3
   4 │ Patient2  foo         foo4
   5 │ Patient2  foo         foo5
   6 │ Patient2  foo         foo6
   7 │ Patient2  bar         bar1
   8 │ Patient2  bar         bar2

MultiAssayExperiments.filtersamplemap (Method)
filtersamplemap(x; samples = nothing, experiments = nothing, colnames = nothing)

Filter the sample mapping DataFrame to the requested samples, experiments and column names. x can either be a MultiAssayExperiment or its samplemap.

If samples is nothing, it is not used for any filtering. Otherwise, it may be a vector or set of strings specifying the samples to retain. A single string may also be supplied.

If experiments is nothing, it is not used for any filtering. Otherwise, it may be a vector or set of strings specifying the experiments to retain. A single string may also be supplied.

If colnames is nothing, it is not used for any filtering. Otherwise, it may be a vector or set of strings specifying the columns to retain. A single string may also be supplied.

A row of the sample mapping is only retained if it passes all supplied filters.
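The filter semantics can be sketched in plain Julia. The helpers asfilter, passes and filter_map are hypothetical and operate on a vector of named tuples rather than a DataFrame; the point is how nothing, a single string and a collection of strings are treated, and that a row must pass every supplied filter.

```julia
# Hypothetical normalization of a filter argument:
# `nothing` disables the filter; a single string becomes a one-element set.
asfilter(::Nothing) = nothing
asfilter(x::AbstractString) = Set([x])
asfilter(x) = Set(x)

# A value passes a disabled filter, or a filter that contains it.
passes(value, f) = f === nothing || value in f

# Keep only the rows that pass every supplied filter.
function filter_map(mapping; samples = nothing, experiments = nothing, colnames = nothing)
    s, e, c = asfilter(samples), asfilter(experiments), asfilter(colnames)
    keep(row) = passes(row.sample, s) && passes(row.experiment, e) && passes(row.colname, c)
    return filter(keep, mapping)
end

toy_map = [
    (sample = "Patient1", experiment = "foo", colname = "foo1"),
    (sample = "Patient1", experiment = "bar", colname = "bar1"),
    (sample = "Patient2", experiment = "foo", colname = "foo2"),
    (sample = "Patient3", experiment = "bar", colname = "bar2"),
]

filter_map(toy_map; samples = ["Patient1", "Patient2"])
filter_map(toy_map; samples = "Patient1", experiments = "bar")
```

The dedicated AbstractString method matters in this sketch: calling Set on a bare string would produce a set of characters.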

Examples

julia> using MultiAssayExperiments

julia> x = MultiAssayExperiments.exampleobject();

julia> filtersamplemap(samplemap(x); samples = ["Patient1", "Patient2"])
8×3 DataFrame
 Row │ sample    experiment  colname 
     │ String    String      String  
─────┼───────────────────────────────
   1 │ Patient1  foo         foo1
   2 │ Patient1  foo         foo2
   3 │ Patient1  foo         foo3
   4 │ Patient2  foo         foo4
   5 │ Patient2  foo         foo5
   6 │ Patient2  foo         foo6
   7 │ Patient2  bar         bar1
   8 │ Patient2  bar         bar2

julia> filtersamplemap(samplemap(x); experiments = "foo")
10×3 DataFrame
 Row │ sample    experiment  colname 
     │ String    String      String  
─────┼───────────────────────────────
   1 │ Patient1  foo         foo1
   2 │ Patient1  foo         foo2
   3 │ Patient1  foo         foo3
   4 │ Patient2  foo         foo4
   5 │ Patient2  foo         foo5
   6 │ Patient2  foo         foo6
   7 │ Patient3  foo         foo7
   8 │ Patient3  foo         foo8
   9 │ Patient3  foo         foo9
  10 │ Patient4  foo         foo10

MultiAssayExperiments.dropunused! (Method)
dropunused!(x; samples = true, experiments = true, colnames = true, mapping = true)

Drop unused samples, experiments and/or column names from the MultiAssayExperiment x. A reference to the modified x is returned.

If samples = true, sampledata(x) is filtered to only retain samples that are present in the sample mapping.

If experiments = true, experiments(x) is filtered to only retain experiments that are present in the sample mapping.

If colnames = true, each entry of experiments(x) is filtered to only retain column names that are present in the sample mapping for that experiment.

If mapping = true, the sample mapping is filtered to remove rows that contain samples, experiments or column names that do not exist in x.
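A plain-Julia sketch of what "used" means for each of these options, with hypothetical toy data: sample names as a vector, experiments reduced to name => column-name pairs in a Dict, and the mapping as a vector of named tuples. Each step is computed from the original toy data; the order in which dropunused! applies the steps is not covered here.

```julia
# Hypothetical toy components.
sample_names = ["Patient1", "Patient2", "Patient3"]
experiment_cols = Dict("foo" => ["foo1", "foo2"], "bar" => ["bar1", "bar2"], "baz" => ["baz1"])
toy_map = [
    (sample = "Patient1", experiment = "foo", colname = "foo1"),
    (sample = "Patient2", experiment = "bar", colname = "bar2"),
    (sample = "Patient9", experiment = "qux", colname = "qux1"),  # refers to nothing that exists
]

used_samples = Set(row.sample for row in toy_map)
used_experiments = Set(row.experiment for row in toy_map)
used_columns = Set((row.experiment, row.colname) for row in toy_map)

# samples = true: keep samples that appear in the mapping.
kept_samples = filter(s -> s in used_samples, sample_names)

# experiments = true: keep experiments that appear in the mapping.
kept_experiments = filter(p -> p.first in used_experiments, experiment_cols)

# colnames = true: in each experiment, keep the columns mapped for that experiment.
kept_columns = Dict(name => filter(c -> (name, c) in used_columns, cols) for (name, cols) in experiment_cols)

# mapping = true: drop rows whose sample, experiment or column does not exist.
kept_map = filter(r -> r.sample in sample_names && haskey(experiment_cols, r.experiment) &&
                       r.colname in experiment_cols[r.experiment], toy_map)
```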

Examples

julia> using MultiAssayExperiments

julia> x = MultiAssayExperiments.exampleobject();

julia> filtersamplemap!(x; experiments = "bar"); # Only keeping experiment 'bar'

julia> dropunused!(x) # We see that 'foo' is dropped
MultiAssayExperiment object
  experiments(1): bar
  sampledata(2): name disease
  metadata(1): version

MultiAssayExperiments.dropunused (Method)
dropunused(x; kwargs...)

Return a new MultiAssayExperiment where unused samples, experiments or column names are removed. This makes a copy of x and passes it (and any keyword arguments in kwargs) to dropunused!; see the latter function for more details.

Examples

julia> using MultiAssayExperiments

julia> x = MultiAssayExperiments.exampleobject();

julia> y = filtersamplemap!(copy(x); experiments = "bar"); # Only keeping experiment 'bar'

julia> dropunused(y) # We see that 'foo' is dropped
MultiAssayExperiment object
  experiments(1): bar
  sampledata(2): name disease
  metadata(1): version

MultiAssayExperiments.multifilter! (Method)
multifilter!(x; samples = nothing, experiments = nothing, colnames = nothing)

Filters the MultiAssayExperiment x in place so that it only contains the specified samples, experiments or column names. This returns a reference to the modified x.

See filtersamplemap for the accepted values of samples, experiments and colnames. The behavior of this function is equivalent to calling filtersamplemap! followed by dropunused!. The aspects of x that are dropped depend on the arguments:

  • Unused samples are dropped from the sample data iff samples != nothing.
  • Unused experiments are dropped from experiments(x) iff experiments != nothing.
    This may leave some experiments with zero columns after filtering, but that is more consistent behavior than having experiments disappear without notice.
  • Unused columns are dropped from each experiment if colnames != nothing, or if dropcolnames = true and samples != nothing.

The latter condition provides the expected default behavior, where filtering on the samples propagates to the corresponding columns of each experiment; this can be disabled by setting dropcolnames = false.
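The rules above reduce to a small decision table. The hypothetical drop_flags function returns which of the dropunused! options would be enabled for a given set of arguments; the dropcolnames = true default is inferred from the text above rather than from the listed signature.

```julia
# Hypothetical summary of which unused parts multifilter! would drop.
function drop_flags(; samples = nothing, experiments = nothing, colnames = nothing,
                    dropcolnames = true)
    drop_samples = samples !== nothing
    drop_experiments = experiments !== nothing
    # Filtering on samples also trims columns unless dropcolnames = false.
    drop_colnames = colnames !== nothing || (dropcolnames && samples !== nothing)
    return (samples = drop_samples, experiments = drop_experiments, colnames = drop_colnames)
end

drop_flags(samples = ["Patient2", "Patient3"], experiments = "foo")
# (samples = true, experiments = true, colnames = true)
```

With experiments = "foo" alone, only experiments are dropped; with samples but dropcolnames = false, the columns of each experiment are left untouched.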

Examples

julia> using MultiAssayExperiments

julia> x = MultiAssayExperiments.exampleobject();

julia> multifilter!(x; samples = ["Patient2", "Patient3"], experiments = "foo")
MultiAssayExperiment object
  experiments(1): foo
  sampledata(2): name disease
  metadata(1): version

julia> experiment(x)
100x6 SummarizedExperiment
  assays(3): foo bar whee
  rownames: Gene1 Gene2 ... Gene99 Gene100
  rowdata(2): name Type
  colnames: foo4 foo5 ... foo8 foo9
  coldata(3): name Treatment Response
  metadata(1): version

MultiAssayExperiments.multifilter (Method)
multifilter(x; samples = nothing, experiments = nothing, colnames = nothing)

Return a new MultiAssayExperiment that has been filtered to only the specified samples, experiments or column names. This makes a copy of x and passes it (and any keyword arguments) to multifilter!; see the latter function for more details.

Examples

julia> using MultiAssayExperiments

julia> x = MultiAssayExperiments.exampleobject();

julia> multifilter(x; samples = ["Patient2", "Patient3"], experiments = "foo")
MultiAssayExperiment object
  experiments(1): foo
  sampledata(2): name disease
  metadata(1): version

Miscellaneous

Base.copy (Method)
copy(x::MultiAssayExperiment)

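Return a shallow copy of x. The returned object is new, but its components (experiments, sample data, sample mapping and metadata) are the same objects as those in x. Replacing a component in the copy does not affect x, while modifying a shared component in place affects both.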

Examples

julia> using MultiAssayExperiments

julia> x = MultiAssayExperiments.exampleobject();

julia> x2 = copy(x);

julia> setsampledata!(x2, DataFrame(name=["A", "B"]));

julia> size(sampledata(x))
(5, 2)

julia> size(sampledata(x2)) # Replacing a component only affects x2.
(2, 1)

julia> stuff = experiments(x);

julia> delete!(stuff, "bar"); # The dictionary is shared, so x2 is also affected.

julia> collect(keys(experiments(x2)))
1-element Vector{String}:
 "foo"

Base.deepcopy (Method)
deepcopy(x::MultiAssayExperiment)

Return a deep copy of x and all of its components.
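The difference between copy and deepcopy can be illustrated with a hypothetical mutable Toy struct standing in for a MultiAssayExperiment, given a shallow Base.copy method that mirrors the behavior described for copy(x::MultiAssayExperiment). Base.deepcopy works on any such struct without extra definitions.

```julia
# Hypothetical stand-in for a container with replaceable components.
mutable struct Toy
    sampledata::Vector{String}
    experiments::Dict{String, Int}
end

# Shallow copy: a new container whose fields refer to the same objects.
Base.copy(x::Toy) = Toy(x.sampledata, x.experiments)

x = Toy(["A", "B"], Dict("foo" => 1, "bar" => 2))
shallow = copy(x)
deep = deepcopy(x)

shallow.sampledata = ["C"]       # replacing a field only affects `shallow`
delete!(x.experiments, "bar")    # mutating a shared field is visible through `shallow` too
```

Reassigning a field of the shallow copy leaves x untouched, whereas the in-place deletion is seen through both x and shallow; the deep copy is unaffected by either change.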

Examples

julia> using MultiAssayExperiments

julia> x = MultiAssayExperiments.exampleobject();

julia> x2 = deepcopy(x);

julia> insertcols!(sampledata(x), 2, "WHEE" => 1:5); # References now point to different objects.

julia> names(sampledata(x2))
2-element Vector{String}:
 "name"
 "disease"

Base.show (Method)
print(io::IO, x::MultiAssayExperiment)

Print a summary of x.

MultiAssayExperiments.expandsampledata (Method)
expandsampledata(x, experiment[, colnames])

Return a DataFrame containing the sample data for all or some of the column names in the chosen experiment. Columns are the same as those in sampledata(x).

If colnames is supplied, each row of the returned DataFrame corresponds to an entry of colnames and contains the data for the sample matching that column in the specified experiment.

If colnames is not supplied, each row of the returned DataFrame corresponds to a column of the specified experiment.

An error is raised if the requested columns do not have a matching sample in samplemap(x). Use dropunused to remove unused columns from each experiment prior to calling this function.

A warning is raised if sampledata(x) contains duplicate sample names. In such cases, data is taken from the first entry for each sample.

A warning is raised if samplemap(x) maps the same experiment/colname combination to more than one sample. In such cases, the first occurrence of the combination is used.
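The row-matching rules can be sketched as follows. expand is a hypothetical helper operating on vectors of named tuples instead of DataFrames; raising an error when a mapped sample is missing from the sample data is an assumption of this sketch, and the warnings for duplicates are omitted.

```julia
toy_samples = [(name = "Patient1", disease = "good"), (name = "Patient2", disease = "bad")]
toy_map = [
    (sample = "Patient1", experiment = "foo", colname = "foo1"),
    (sample = "Patient1", experiment = "foo", colname = "foo2"),
    (sample = "Patient2", experiment = "foo", colname = "foo3"),
]

# Hypothetical helper: one sample data row per requested column.
function expand(samples, mapping, experiment, colnames)
    out = eltype(samples)[]
    for col in colnames
        # The first mapping row for this experiment/column combination is used.
        i = findfirst(r -> r.experiment == experiment && r.colname == col, mapping)
        i === nothing && error("column '$col' of '$experiment' has no mapped sample")
        # The first sample data row with the mapped name is used.
        j = findfirst(s -> s.name == mapping[i].sample, samples)
        j === nothing && error("sample '$(mapping[i].sample)' is missing from the sample data")
        push!(out, samples[j])
    end
    return out
end

expand(toy_samples, toy_map, "foo", ["foo3", "foo1"])
```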

Examples

julia> using MultiAssayExperiments;

julia> x = MultiAssayExperiments.exampleobject();

julia> expandsampledata(x, "foo")
10×2 DataFrame
 Row │ name      disease 
     │ String    String  
─────┼───────────────────
   1 │ Patient1  good
   2 │ Patient1  good
   3 │ Patient1  good
   4 │ Patient2  bad
   5 │ Patient2  bad
   6 │ Patient2  bad
   7 │ Patient3  good
   8 │ Patient3  good
   9 │ Patient3  good
  10 │ Patient4  bad

julia> expandsampledata(x, "foo", ["foo2", "foo1"])
2×2 DataFrame
 Row │ name      disease 
     │ String    String  
─────┼───────────────────
   1 │ Patient1  good
   2 │ Patient1  good

SummarizedExperiments.exampleobject (Method)
MultiAssayExperiments.exampleobject()

Create an example MultiAssayExperiment object, intended to keep examples and tests succinct.

Examples

julia> using MultiAssayExperiments 

julia> x = MultiAssayExperiments.exampleobject()
MultiAssayExperiment object
  experiments(2): foo bar
  sampledata(2): name disease
  metadata(1): version

Contact

This package is maintained by Aaron Lun (@LTLA). If you have bug reports or feature requests, please post them as issues at the GitHub repository.