MultiAssayExperiments for Julia
Overview
The MultiAssayExperiment package provides Bioconductor's standard structure for multimodal datasets. This repository ports the basic MultiAssayExperiment functionality from R to Julia, allowing Julians to conveniently manipulate analysis-ready datasets in the same fashion as R/Bioconductor workflows.
The MultiAssayExperiment class is effectively a wrapper around multiple SummarizedExperiment objects, each of which usually represents a different data modality, e.g., gene expression or protein intensity. The sophistication lies in the relationships between the columns of the various SummarizedExperiments. A "sample" may map to zero, one or many columns in any of the individual SummarizedExperiments, and many of the MultiAssayExperiment methods are focused on exploiting these relationships for convenient filtering of the dataset.
Check out Figure 1 of the MultiAssayExperiment vignette for more details, though note that this package does make a few changes from the original Bioconductor implementation.
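The sketch below makes the sample-to-column relationship concrete in plain Julia. It does not use the package itself. The sample mapping is modelled as a vector of NamedTuples with sample, experiment and colname fields instead of a DataFrame, and columns_for is a hypothetical helper introduced only for illustration.

```julia
# Hypothetical stand-in for the sample mapping: one NamedTuple per
# (sample, experiment, colname) triple, rather than a DataFrame.
samplemap = [
    (sample = "Patient1", experiment = "foo", colname = "foo1"),
    (sample = "Patient1", experiment = "foo", colname = "foo2"),
    (sample = "Patient1", experiment = "bar", colname = "bar1"),
    (sample = "Patient2", experiment = "foo", colname = "foo3"),
]

# Hypothetical helper: the columns of `experiment` that belong to `sample`.
# The result may be empty, hold one column, or hold several.
columns_for(sm, sample, experiment) =
    [r.colname for r in sm if r.sample == sample && r.experiment == experiment]

columns_for(samplemap, "Patient1", "foo")  # two columns: foo1 and foo2
columns_for(samplemap, "Patient2", "bar")  # empty: Patient2 has no bar columns
```

In this toy mapping, Patient1 maps to two columns of foo and one column of bar, while Patient2 has no columns in bar at all.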
Quick start
Users may install this package from the GitHub repository through the usual process on the Pkg REPL:
add https://github.com/LTLA/MultiAssayExperiments.jl
And then:
julia> using MultiAssayExperiments, SummarizedExperiments
julia> mae = MultiAssayExperiments.exampleobject()
MultiAssayExperiment object
experiments(2): foo bar
sampledata(2): name disease
metadata(1): version
julia> se = experiment(mae, "bar")
50x8 SummarizedExperiments.SummarizedExperiment
assays(3): foo bar whee
rownames: Gene1 Gene2 ... Gene49 Gene50
rowdata(2): name Type
colnames: bar1 bar2 ... bar7 bar8
coldata(3): name Treatment Response
metadata(1): version
julia> coldata(experiment(mae, "bar"; sampledata = true))
8×4 DataFrame
Row │ name Treatment Response disease
│ String String Float64 String
─────┼────────────────────────────────────────
1 │ bar1 drug2 0.841273 bad
2 │ bar2 normal 0.523172 bad
3 │ bar3 drug1 0.253657 good
4 │ bar4 normal 0.613006 good
5 │ bar5 drug1 0.0986848 bad
6 │ bar6 drug1 0.610145 bad
7 │ bar7 normal 0.179339 very bad
8 │ bar8 normal 0.832958 very bad
julia> sub1 = multifilter(mae; samples = ["Patient1", "Patient3"]);
julia> experiment(sub1, "bar")
50x2 SummarizedExperiments.SummarizedExperiment
assays(3): foo bar whee
rownames: Gene1 Gene2 ... Gene49 Gene50
rowdata(2): name Type
colnames: bar3 bar4
coldata(3): name Treatment Response
metadata(1): version
julia> sub2 = multifilter(mae; experiments = "foo")
MultiAssayExperiment object
experiments(1): foo
sampledata(2): name disease
metadata(1): version
Class definition
MultiAssayExperiments.MultiAssayExperiment (Type)
The MultiAssayExperiment class is a Bioconductor container for multimodal studies. This is basically a list of SummarizedExperiment objects, each of which represents a particular experimental modality. A mapping table specifies the relationships between the columns of each SummarizedExperiment and a conceptual "sample", where each sample may have data for zero, one or multiple modalities. A sample can be defined as anything from a cell line culture to an individual patient, depending on the context.
The central idea is to use the sample mapping to easily filter the MultiAssayExperiment based on the samples of interest. For example, a user can call multifilter to keep only the columns of each SummarizedExperiment that correspond to the desired samples via the sample mapping. This facilitates coordination across multiple modalities without needing to manually subset each experiment. We also store sample-level annotations in a sample data DataFrame, where they can be easily attached to the coldata of a SummarizedExperiment for further analyses.
This implementation makes a few changes from the original Bioconductor implementation. We do not consider the MultiAssayExperiment to contain any "columns", as this was unnecessarily confusing. The previous colData field has thus been renamed to sampledata, to reflect the fact that we are operating on samples. We are also much more relaxed about harmonization between the experiments, sample mapping and sample data; more specifically, we do not harmonize at all, which allows greater flexibility in storage and manipulation.
Constructors
MultiAssayExperiments.MultiAssayExperiment (Type)
MultiAssayExperiment(experiments, sampledata, samplemap, metadata = Dict{String,Any}())
Creates a new MultiAssayExperiment from its components.
experiments should contain ordered pairs of experiment names and SummarizedExperiment objects. Each SummarizedExperiment may contain any number of rows with any row names. However, its column names must not be nothing and must be unique within each object.
Each row of sampledata corresponds to a conceptual sample. The first column should be called name and contain the names of the samples in a Vector{String}. Sample names are arbitrary but should be unique. Any number and type of other columns may be provided, usually containing sample-level annotations.
The samplemap table is expected to have three Vector{String} columns (sample, experiment and colname) specifying the correspondence between each conceptual sample and the columns of a particular SummarizedExperiment. See setsamplemap! for more details on the expected format.
Note that values in the samplemap columns need not have a 1:1 match to their cross-referenced targets; any values unique to one or the other will be ignored in methods like expandsampledata and filtersamplemap. This allows users to flexibly manipulate the object without constantly hitting validity checks.
The metadata stores other annotations unrelated to the samples.
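The sketch below illustrates the rule that unmatched values are ignored rather than rejected. It is written in plain Julia with NamedTuple rows and a Dict of column names standing in for the real DataFrame and SummarizedExperiment objects. matched_rows is a hypothetical helper, not part of the package. It keeps only the mapping rows whose sample, experiment and column name all exist.

```julia
# Hypothetical stand-ins: known sample names, per-experiment column names,
# and a sample mapping that contains some dangling references.
sample_names = ["Aaron", "Michael"]
experiment_cols = Dict("foo" => ["Patient1", "Patient2"], "bar" => ["Patient1"])

samplemap = [
    (sample = "Aaron",   experiment = "foo", colname = "Patient1"),
    (sample = "Michael", experiment = "foo", colname = "Patient2"),
    (sample = "Michael", experiment = "bar", colname = "Patient9"),  # unknown column
    (sample = "Jayaram", experiment = "bar", colname = "Patient1"),  # unknown sample
    (sample = "Aaron",   experiment = "baz", colname = "Patient1"),  # unknown experiment
]

# Keep only rows whose three entries all point at something that exists;
# the remaining rows are silently skipped instead of raising an error.
function matched_rows(sm, samples, cols)
    known = Set(samples)
    return [r for r in sm if r.sample in known &&
                            haskey(cols, r.experiment) &&
                            r.colname in cols[r.experiment]]
end

matched_rows(samplemap, sample_names, experiment_cols)  # first two rows only
```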
Examples
julia> using MultiAssayExperiments
julia> using SummarizedExperiments
julia> exp = OrderedDict{String, SummarizedExperiment}();
julia> exp["foo"] = SummarizedExperiments.exampleobject(100, 2);
julia> exp["bar"] = SummarizedExperiments.exampleobject(50, 5);
julia> cd = DataFrame(
name = ["Aaron", "Michael", "Jayaram", "Sebastien", "John"],
disease = ["good", "bad", "good", "bad", "very bad"]
);
julia> sm = DataFrame(
sample = ["Aaron", "Michael", "Aaron", "Michael", "Jayaram", "Sebastien", "John"],
experiment = ["foo", "foo", "bar", "bar", "bar", "bar", "bar"],
colname = ["Patient1", "Patient2", "Patient1", "Patient2", "Patient3", "Patient4", "Patient5"]
);
julia> out = MultiAssayExperiment(exp, cd, sm)
MultiAssayExperiment object
experiments(2): foo bar
sampledata(2): name disease
metadata(0):
MultiAssayExperiments.MultiAssayExperiment (Method)
MultiAssayExperiment(experiments)
Creates a MultiAssayExperiment object from a set of experiments. The sample data and sample mapping are automatically created from the union of column names across all experiments.
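The sketch below shows how a sample data table and mapping can be derived from the union of column names. It assumes that every distinct column name becomes one sample of the same name, and that each column maps back to that sample. This is an assumption suggested by the example output (sampledata(1): name), not a confirmed description of the package internals. Plain vectors of Pairs and NamedTuples stand in for the real OrderedDict and DataFrame objects.

```julia
# Hypothetical stand-in: experiment name => column names of that experiment.
experiment_cols = [
    "foo" => ["Patient1", "Patient2"],
    "bar" => ["Patient1", "Patient2", "Patient3"],
]

# Assumed rule: every distinct column name becomes a sample of the same name,
# in order of first appearance.
sample_names = unique(reduce(vcat, last.(experiment_cols)))

# One mapping row per column, pointing back at the sample with the same name.
samplemap = [(sample = c, experiment = e, colname = c)
             for (e, cols) in experiment_cols for c in cols]
```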
Examples
julia> using MultiAssayExperiments
julia> using SummarizedExperiments
julia> exp = OrderedDict{String, SummarizedExperiment}();
julia> exp["foo"] = SummarizedExperiments.exampleobject(100, 10);
julia> exp["bar"] = SummarizedExperiments.exampleobject(50, 20);
julia> out = MultiAssayExperiment(exp)
MultiAssayExperiment object
experiments(2): foo bar
sampledata(1): name
metadata(0):
MultiAssayExperiments.MultiAssayExperiment (Method)
MultiAssayExperiment()
Creates an empty MultiAssayExperiment object.
Examples
julia> using MultiAssayExperiments
julia> MultiAssayExperiment()
MultiAssayExperiment object
experiments(0):
sampledata(1): name
metadata(0):
Getters
MultiAssayExperiments.experiment (Method)
experiment(x[, i]; sampledata = false)
Extract the specified SummarizedExperiment from a MultiAssayExperiment x. i may be a positive integer no greater than the number of experiments in x, or a string specifying the name of the desired experiment. If i is not specified, it defaults to the first experiment in x.
If sampledata = true, we attempt to add the sample data of x to the coldata of the returned SummarizedExperiment. This is done by subsetting sampledata(x) based on the sample mapping to the columns of the returned SummarizedExperiment; see expandsampledata for more details. If a column with the same name but different values is present in both sampledata(x) and the coldata of the SummarizedExperiment, the column from sampledata(x) is omitted with a warning.
Note that, if sampledata = true, the returned SummarizedExperiment will be a copy of the relevant experiment in x. If false, the returned object will be a reference.
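The sketch below illustrates the merge rule for sampledata = true in plain Julia. Column-oriented tables are modelled as Dict{String, Vector} rather than DataFrames. attach_sampledata is a hypothetical helper, and the sample data is assumed to be already expanded so that row i matches column i of the experiment. Sample data columns are added when absent, and a same-named column with different values is skipped with a warning.

```julia
# Hypothetical stand-ins: column-oriented tables as Dict{String, Vector}.
coldata = Dict{String, Vector}(
    "name"      => ["foo1", "foo2"],
    "Treatment" => ["drug1", "normal"],
)
# Sample data already expanded so that row i matches column i of the experiment.
expanded = Dict{String, Vector}(
    "name"    => ["Patient1", "Patient1"],
    "disease" => ["good", "good"],
)

function attach_sampledata(coldata, expanded)
    out = copy(coldata)  # new Dict, so adding keys leaves `coldata` untouched
    for (k, v) in expanded
        if haskey(out, k)
            # Same name but different values: keep the coldata column and warn.
            out[k] == v || @warn "omitting sample data column '$k'"
        else
            out[k] = v
        end
    end
    return out
end

merged = attach_sampledata(coldata, expanded)
sort(collect(keys(merged)))  # ["Treatment", "disease", "name"]
```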
Examples
julia> using MultiAssayExperiments;
julia> x = MultiAssayExperiments.exampleobject();
julia> experiment(x)
100x10 SummarizedExperiment
assays(3): foo bar whee
rownames: Gene1 Gene2 ... Gene99 Gene100
rowdata(2): name Type
colnames: foo1 foo2 ... foo9 foo10
coldata(3): name Treatment Response
metadata(1): version
julia> experiment(x, 1); # same result
julia> experiment(x, "foo");
julia> experiment(x, "foo", sampledata = true) # add sample data
100x10 SummarizedExperiment
assays(3): foo bar whee
rownames: Gene1 Gene2 ... Gene99 Gene100
rowdata(2): name Type
colnames: foo1 foo2 ... foo9 foo10
coldata(4): name Treatment Response disease
metadata(1): version
MultiAssayExperiments.experiments (Method)
experiments(x)
Return an ordered dictionary containing all experiments in the MultiAssayExperiment x.
Examples
julia> using MultiAssayExperiments
julia> x = MultiAssayExperiments.exampleobject();
julia> collect(keys(experiments(x)))
2-element Vector{String}:
"foo"
"bar"
MultiAssayExperiments.sampledata (Method)
sampledata(x, check = true)
Return a DataFrame containing the sample data in the MultiAssayExperiment x.
The returned object should contain name as the first column, containing a vector of unique strings. If check = true, the function will check the validity of the sample data before returning it.
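The sketch below expresses the two stated requirements on the sample data as a check in plain Julia. The table is modelled as an ordered vector of column name => values Pairs instead of a DataFrame, and valid_sampledata is a hypothetical helper. The package's actual validity check may test more, or different, conditions.

```julia
# Hypothetical stand-ins: sample data as an ordered vector of
# column name => column values pairs, so column order is preserved.
sd_ok  = ["name" => ["Aaron", "Michael"], "disease" => ["good", "bad"]]
sd_bad = ["disease" => ["good", "bad"], "name" => ["Aaron", "Aaron"]]

# Checks the stated requirements: `name` comes first and holds unique strings.
function valid_sampledata(sd)
    isempty(sd) && return false
    col, vals = first(sd)
    return col == "name" && all(v -> v isa AbstractString, vals) && allunique(vals)
end
```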
Examples
julia> using MultiAssayExperiments
julia> x = MultiAssayExperiments.exampleobject();
julia> names(sampledata(x))
2-element Vector{String}:
"name"
"disease"
MultiAssayExperiments.samplemap (Method)
samplemap(x)
Return a DataFrame containing the sample mapping from the MultiAssayExperiment x.
The returned object should contain the sample, experiment and colname columns in that order. Each column should contain a vector of strings, and rows should be unique. If check = true, the function will check the validity of the sample mapping before returning it.
Examples
julia> using MultiAssayExperiments
julia> x = MultiAssayExperiments.exampleobject();
julia> names(samplemap(x))
3-element Vector{String}:
"sample"
"experiment"
"colname"
MultiAssayExperiments.metadata (Method)
metadata(x)
Return a dictionary containing the metadata from the MultiAssayExperiment x.
Examples
julia> using MultiAssayExperiments
julia> x = MultiAssayExperiments.exampleobject();
julia> collect(keys(metadata(x)))
1-element Vector{String}:
"version"
Setters
MultiAssayExperiments.setexperiment! (Method)
setexperiment!(x[, i], value)
Set experiment i in the MultiAssayExperiment x to the SummarizedExperiment value. This returns a reference to the modified x.
i may be a positive integer, in which case it should be no greater than the length of experiments(x). It may also be a string specifying a new or existing experiment in x. If i is omitted, the first experiment is set by default.
Examples
julia> using MultiAssayExperiments;
julia> x = MultiAssayExperiments.exampleobject();
julia> size(experiment(x, 2))
(50, 8)
julia> val = experiment(x);
julia> setexperiment!(x, 2, val);
julia> size(experiment(x, 2))
(100, 10)
MultiAssayExperiments.setexperiments! (Method)
setexperiments!(x, value)
Set the experiments in the MultiAssayExperiment x to the OrderedDict value. This returns a reference to the modified x.
Examples
julia> using MultiAssayExperiments
julia> x = MultiAssayExperiments.exampleobject();
julia> y = copy(experiments(x));
julia> delete!(y, "foo");
julia> setexperiments!(x, y);
julia> collect(keys(experiments(x)))
1-element Vector{String}:
"bar"
MultiAssayExperiments.setsampledata! (Method)
setsampledata!(x, value)
Set the sample data in the MultiAssayExperiment x to the DataFrame value.
value should contain name as the first column, containing a vector of unique strings. If check = true, the function will check the validity of value before setting it.
Examples
julia> using MultiAssayExperiments
julia> x = MultiAssayExperiments.exampleobject();
julia> sd = copy(sampledata(x));
julia> sd[!,"stuff"] = [rand() for i in 1:size(sd)[1]];
julia> setsampledata!(x, sd);
julia> names(sampledata(x))
3-element Vector{String}:
"name"
"disease"
"stuff"
MultiAssayExperiments.setsamplemap! (Method)
setsamplemap!(x, value)
Set the sample mapping in the MultiAssayExperiment x to a DataFrame value. This returns a reference to the modified x.
value should contain the sample, experiment and colname columns in that order. Each column should contain a vector of strings:
- Values of sample may (but are not required to) correspond to the names of samples in sampledata(x).
- Values of experiment may (but are not required to) correspond to the keys of experiments(x).
- Values of colname should (but are not required to) correspond to the columns of the SummarizedExperiment named by experiment in the same row.
This correspondence is used for convenient subsetting and extraction, e.g., in expandsampledata and filtersamplemap. However, values in the sample mapping columns need not have a 1:1 match to their corresponding targets; any values unique to one or the other will be ignored in the relevant methods. This allows users to flexibly manipulate the object without constantly hitting validity checks.
It is legal (but highly unusual) for a given combination of experiment and colname to occur more than once. This may incur warnings in methods like expandsampledata.
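The sketch below detects the "legal but highly unusual" case, where an experiment/colname combination appears in more than one row. It is written in plain Julia with NamedTuple rows standing in for the DataFrame. duplicated_columns is a hypothetical helper, not a package function.

```julia
# Hypothetical stand-in for a sample mapping with one repeated combination.
samplemap = [
    (sample = "Patient1", experiment = "foo", colname = "foo1"),
    (sample = "Patient2", experiment = "foo", colname = "foo2"),
    (sample = "Patient3", experiment = "foo", colname = "foo2"),  # legal but unusual
]

# Return the (experiment, colname) pairs that appear in more than one row.
function duplicated_columns(sm)
    seen = Set{Tuple{String, String}}()
    dups = Set{Tuple{String, String}}()
    for r in sm
        key = (r.experiment, r.colname)
        key in seen ? push!(dups, key) : push!(seen, key)
    end
    return dups
end

duplicated_columns(samplemap)  # contains only ("foo", "foo2")
```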
Examples
julia> using MultiAssayExperiments
julia> x = MultiAssayExperiments.exampleobject();
julia> y = samplemap(x)[1:10,:];
julia> setsamplemap!(x, y);
julia> size(samplemap(x))[1]
10
MultiAssayExperiments.setmetadata! (Method)
setmetadata!(x, value)
Set the metadata of a MultiAssayExperiment x to a dictionary value. This returns a reference to the modified x.
Examples
julia> using MultiAssayExperiments
julia> x = MultiAssayExperiments.exampleobject();
julia> meta = copy(metadata(x));
julia> meta["version"] = "0.2.0";
julia> setmetadata!(x, meta);
julia> metadata(x)["version"]
"0.2.0"
Filtering
MultiAssayExperiments.filtersamplemap! (Method)
filtersamplemap!(x; samples = nothing, experiments = nothing, colnames = nothing)
Filters samplemap(x) in place, using the same criteria as filtersamplemap. A reference to the modified x is returned.
Examples
julia> using MultiAssayExperiments
julia> x = MultiAssayExperiments.exampleobject();
julia> filtersamplemap!(x; samples = ["Patient1", "Patient2"]);
julia> samplemap(x)
8×3 DataFrame
Row │ sample experiment colname
│ String String String
─────┼───────────────────────────────
1 │ Patient1 foo foo1
2 │ Patient1 foo foo2
3 │ Patient1 foo foo3
4 │ Patient2 foo foo4
5 │ Patient2 foo foo5
6 │ Patient2 foo foo6
7 │ Patient2 bar bar1
8 │ Patient2 bar bar2
MultiAssayExperiments.filtersamplemap (Method)
filtersamplemap(x; samples = nothing, experiments = nothing, colnames = nothing)
Filter the sample mapping DataFrame to the requested samples, experiments and column names. x can either be a MultiAssayExperiment or its samplemap.
If samples is nothing, it is not used for any filtering. Otherwise, it may be a vector or set of strings specifying the samples to retain. A single string may also be supplied.
If experiments is nothing, it is not used for any filtering. Otherwise, it may be a vector or set of strings specifying the experiments to retain. A single string may also be supplied.
If colnames is nothing, it is not used for any filtering. Otherwise, it may be a vector or set of strings specifying the column names to retain. A single string may also be supplied.
A row of the sample mapping is only retained if it passes all supplied filters.
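The filtering rules above can be sketched in plain Julia as follows. NamedTuple rows stand in for the DataFrame, and filter_samplemap, as_filter and passes are hypothetical names, not the package's implementation. The sketch demonstrates three points: nothing disables a filter, a single string behaves like a one-element set, and a row is kept only if it passes every supplied filter.

```julia
# `nothing` disables a filter; a single string is treated as a one-element set.
as_filter(::Nothing) = nothing
as_filter(s::AbstractString) = Set([s])
as_filter(v) = Set(v)

passes(value, allowed) = allowed === nothing || value in allowed

function filter_samplemap(sm; samples = nothing, experiments = nothing, colnames = nothing)
    s, e, c = as_filter(samples), as_filter(experiments), as_filter(colnames)
    # A row survives only if it passes every supplied filter.
    return [r for r in sm if passes(r.sample, s) && passes(r.experiment, e) && passes(r.colname, c)]
end

samplemap = [
    (sample = "Patient1", experiment = "foo", colname = "foo1"),
    (sample = "Patient2", experiment = "foo", colname = "foo2"),
    (sample = "Patient2", experiment = "bar", colname = "bar1"),
    (sample = "Patient3", experiment = "bar", colname = "bar2"),
]

filter_samplemap(samplemap; samples = ["Patient1", "Patient2"])     # first three rows
filter_samplemap(samplemap; samples = "Patient2", experiments = "bar")  # third row only
```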
Examples
julia> using MultiAssayExperiments
julia> x = MultiAssayExperiments.exampleobject();
julia> filtersamplemap(samplemap(x); samples = ["Patient1", "Patient2"])
8×3 DataFrame
Row │ sample experiment colname
│ String String String
─────┼───────────────────────────────
1 │ Patient1 foo foo1
2 │ Patient1 foo foo2
3 │ Patient1 foo foo3
4 │ Patient2 foo foo4
5 │ Patient2 foo foo5
6 │ Patient2 foo foo6
7 │ Patient2 bar bar1
8 │ Patient2 bar bar2
julia> filtersamplemap(samplemap(x); experiments = "foo")
10×3 DataFrame
Row │ sample experiment colname
│ String String String
─────┼───────────────────────────────
1 │ Patient1 foo foo1
2 │ Patient1 foo foo2
3 │ Patient1 foo foo3
4 │ Patient2 foo foo4
5 │ Patient2 foo foo5
6 │ Patient2 foo foo6
7 │ Patient3 foo foo7
8 │ Patient3 foo foo8
9 │ Patient3 foo foo9
10 │ Patient4 foo foo10
MultiAssayExperiments.dropunused! (Method)
dropunused!(x; samples = true, experiments = true, colnames = true, mapping = true)
Drop unused samples, experiments and/or column names from the MultiAssayExperiment x. A reference to the modified x is returned.
If samples = true, sampledata(x) is filtered to only retain samples that are present in the sample mapping.
If experiments = true, experiments(x) is filtered to only retain experiments that are present in the sample mapping.
If colnames = true, each entry of experiments(x) is filtered to only retain column names that are present in the sample mapping for that experiment.
If mapping = true, the sample mapping is filtered to remove rows that contain samples, experiments or column names that do not exist in x.
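The four dropunused! options can be mimicked with plain Julia collections, as in the sketch below. Vectors, a Dict of column names and NamedTuple rows stand in for the real components. The order in which the steps run here is an assumption, not necessarily the order the package uses.

```julia
# Hypothetical stand-ins for the three components of the object.
sample_names = ["Patient1", "Patient2", "Patient3"]
experiment_cols = Dict("foo" => ["foo1", "foo2"], "bar" => ["bar1"])
samplemap = [
    (sample = "Patient1", experiment = "foo", colname = "foo1"),
    (sample = "Patient9", experiment = "foo", colname = "foo2"),  # sample not in sample data
]

used_samples = Set(r.sample for r in samplemap)
used_experiments = Set(r.experiment for r in samplemap)

# samples = true: keep only samples that appear in the mapping.
kept_samples = filter(in(used_samples), sample_names)

# experiments = true: keep only experiments that appear in the mapping;
# colnames = true: within each kept experiment, keep only mapped columns.
kept_cols = Dict(e => filter(c -> any(r -> r.experiment == e && r.colname == c, samplemap), cols)
                 for (e, cols) in experiment_cols if e in used_experiments)

# mapping = true: drop mapping rows that point at anything that no longer exists.
kept_map = [r for r in samplemap if r.sample in kept_samples &&
            haskey(kept_cols, r.experiment) && r.colname in kept_cols[r.experiment]]
```

In this single pass, foo2 survives the column step because it is still mapped at that point, even though its mapping row is removed afterwards.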
Examples
julia> using MultiAssayExperiments
julia> x = MultiAssayExperiments.exampleobject();
julia> filtersamplemap!(x; experiments = "bar"); # Only keeping experiment 'bar'
julia> dropunused!(x) # We see that 'foo' is dropped
MultiAssayExperiment object
experiments(1): bar
sampledata(2): name disease
metadata(1): version
MultiAssayExperiments.dropunused (Method)
dropunused(x; kwargs...)
Return a new MultiAssayExperiment where unused samples, experiments or column names are removed. This makes a copy of x and passes it (and any keyword arguments in kwargs) to dropunused!; see the latter function for more details.
Examples
julia> using MultiAssayExperiments
julia> x = MultiAssayExperiments.exampleobject();
julia> y = filtersamplemap(x; experiments = "bar"); # Only keeping experiment 'bar'
julia> dropunused(y) # We see that 'foo' is dropped
MultiAssayExperiment object
experiments(1): bar
sampledata(2): name disease
metadata(1): version
MultiAssayExperiments.multifilter! (Method)
multifilter!(x; samples = nothing, experiments = nothing, colnames = nothing)
Filters the MultiAssayExperiment x in place so that it only contains the specified samples, experiments or column names. This returns a reference to the modified x.
See filtersamplemap for the accepted values of samples, experiments and colnames. The behavior of this function is equivalent to calling filtersamplemap! followed by dropunused!. The aspects of x that are dropped depend on the arguments:
- Unused samples are dropped from the sample data if and only if samples != nothing.
- Unused experiments are dropped from experiments(x) if and only if experiments != nothing. As a result, filtering on samples alone may leave some experiments with zero columns, which is generally more predictable than experiments disappearing without notice.
- Unused columns are dropped from each experiment if colnames != nothing, or if dropcolnames = true and samples != nothing.
The latter condition provides the expected default behavior, where filtering on the samples propagates to the corresponding columns of each experiment; this can be disabled by setting dropcolnames = false.
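The three rules above can be condensed into a small decision function. drop_flags is a hypothetical helper, not a package function. It takes the same keyword arguments as the rules and assumes the default dropcolnames = true described in the text. It returns which parts of x would be pruned.

```julia
# Hypothetical helper: which parts of x would be pruned, following the
# rules listed above (dropcolnames defaults to true, as the text implies).
function drop_flags(; samples = nothing, experiments = nothing, colnames = nothing,
                      dropcolnames = true)
    return (
        samples = samples !== nothing,
        experiments = experiments !== nothing,
        colnames = colnames !== nothing || (dropcolnames && samples !== nothing),
    )
end

drop_flags(samples = ["Patient2"])
# (samples = true, experiments = false, colnames = true)
```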
Examples
julia> using MultiAssayExperiments
julia> x = MultiAssayExperiments.exampleobject();
julia> multifilter!(x; samples = ["Patient2", "Patient3"], experiments = "foo")
MultiAssayExperiment object
experiments(1): foo
sampledata(2): name disease
metadata(1): version
julia> experiment(x)
100x6 SummarizedExperiment
assays(3): foo bar whee
rownames: Gene1 Gene2 ... Gene99 Gene100
rowdata(2): name Type
colnames: foo4 foo5 ... foo8 foo9
coldata(3): name Treatment Response
metadata(1): version
MultiAssayExperiments.multifilter (Method)
multifilter(x; samples = nothing, experiments = nothing, colnames = nothing)
Return a new MultiAssayExperiment that has been filtered to only the specified samples, experiments or column names. This makes a copy of x and passes it (and any keyword arguments) to multifilter!; see the latter function for more details.
Examples
julia> using MultiAssayExperiments
julia> x = MultiAssayExperiments.exampleobject();
julia> multifilter(x; samples = ["Patient2", "Patient3"], experiments = "foo")
MultiAssayExperiment object
experiments(1): foo
sampledata(2): name disease
metadata(1): version
Miscellaneous
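Base.copy (Method)
copy(x::MultiAssayExperiment)
Return a shallow copy of x, i.e., a new MultiAssayExperiment whose components are the same objects as those in x. Replacing a component of the copy (e.g., with setsampledata!) does not affect x, whereas modifying a shared component in place affects both.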
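The same shallow-copy semantics can be reproduced with an ordinary Julia struct, as in the sketch below. Container is a hypothetical two-field stand-in for MultiAssayExperiment, and its copy method is written by hand to share fields. The sketch shows that replacing a field affects only the copy, while mutating a shared field is visible through both objects.

```julia
# Hypothetical two-field container standing in for MultiAssayExperiment.
mutable struct Container
    experiments::Dict{String, Vector{Int}}
    sampledata::Vector{String}
end

# Shallow copy: a new container whose fields are the *same* objects.
Base.copy(x::Container) = Container(x.experiments, x.sampledata)

x = Container(Dict("foo" => [1, 2], "bar" => [3]), ["A", "B", "C"])
x2 = copy(x)

x2.sampledata = ["A", "B"]     # replacing a field only affects x2
delete!(x.experiments, "bar")  # mutating a shared field is visible in both
```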
Examples
julia> using MultiAssayExperiments
julia> x = MultiAssayExperiments.exampleobject();
julia> x2 = copy(x);
julia> setsampledata!(x2, DataFrame(name=["A", "B"]));
julia> size(sampledata(x))
(5, 2)
julia> size(sampledata(x2)) # Replacing a component only affects x2.
(2, 1)
julia> stuff = experiments(x); # Same object as experiments(x2).
julia> delete!(stuff, "bar"); # In-place changes are visible through x2.
julia> collect(keys(experiments(x2)))
1-element Vector{String}:
"foo"
Base.deepcopy (Method)
deepcopy(x::MultiAssayExperiment)
Return a deep copy of x and all of its components.
Examples
julia> using MultiAssayExperiments
julia> x = MultiAssayExperiments.exampleobject();
julia> x2 = deepcopy(x);
julia> insertcols!(sampledata(x), 2, "WHEE" => 1:5); # References now point to different objects.
julia> names(sampledata(x2))
2-element Vector{String}:
"name"
"disease"
Base.show (Method)
print(io::IO, x::MultiAssayExperiment)
Print a summary of x.
MultiAssayExperiments.expandsampledata (Method)
expandsampledata(x, experiment[, colnames])
Return a DataFrame containing the sample data for all or some of the column names in the chosen experiment. Columns are the same as those in sampledata(x).
If colnames is supplied, each row of the returned DataFrame corresponds to an entry of colnames and contains the data for the sample matching that column in the specified experiment.
If colnames is not supplied, each row of the returned DataFrame corresponds to a column of the specified experiment.
An error is raised if the requested columns do not have a matching sample in samplemap(x). Use dropunused to remove unused columns from each experiment prior to calling this function.
A warning is raised if sampledata(x) contains duplicate sample names. In such cases, data is taken from the first entry for each sample.
A warning is raised if samplemap(x) contains multiple occurrences of the same experiment/colname combination with different samples. In such cases, the first occurrence of the combination is used.
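The sketch below reproduces the lookup rules above in plain Julia. NamedTuple rows stand in for the DataFrames, and expand_sampledata is a hypothetical name, not the package function. The sketch returns one row per requested column and uses first-occurrence-wins for duplicated sample names and duplicated experiment/colname combinations. A column with no mapped sample raises an error. The warnings are omitted for brevity. If a mapped sample is missing from the sample data, this sketch throws a KeyError; that behavior is an assumption, because the text does not specify it.

```julia
sampledata = [
    (name = "Patient1", disease = "good"),
    (name = "Patient2", disease = "bad"),
]
samplemap = [
    (sample = "Patient1", experiment = "foo", colname = "foo1"),
    (sample = "Patient1", experiment = "foo", colname = "foo2"),
    (sample = "Patient2", experiment = "foo", colname = "foo3"),
]

# Hypothetical re-implementation: one output row per requested column name.
function expand_sampledata(sd, sm, experiment, colnames)
    # First occurrence wins for duplicated sample names...
    by_name = Dict{String, eltype(sd)}()
    for row in sd
        get!(by_name, row.name, row)
    end
    # ...and for duplicated experiment/colname combinations.
    col_to_sample = Dict{String, String}()
    for r in sm
        r.experiment == experiment && get!(col_to_sample, r.colname, r.sample)
    end
    return map(colnames) do c
        haskey(col_to_sample, c) || error("no sample mapped to column '$c'")
        by_name[col_to_sample[c]]
    end
end

expand_sampledata(sampledata, samplemap, "foo", ["foo2", "foo1"])  # two Patient1 rows
```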
Examples
julia> using MultiAssayExperiments;
julia> x = MultiAssayExperiments.exampleobject();
julia> expandsampledata(x, "foo")
10×2 DataFrame
Row │ name disease
│ String String
─────┼───────────────────
1 │ Patient1 good
2 │ Patient1 good
3 │ Patient1 good
4 │ Patient2 bad
5 │ Patient2 bad
6 │ Patient2 bad
7 │ Patient3 good
8 │ Patient3 good
9 │ Patient3 good
10 │ Patient4 bad
julia> expandsampledata(x, "foo", ["foo2", "foo1"])
2×2 DataFrame
Row │ name disease
│ String String
─────┼───────────────────
1 │ Patient1 good
2 │ Patient1 good
SummarizedExperiments.exampleobject (Method)
MultiAssayExperiments.exampleobject()
Create an example MultiAssayExperiment object. This is intended to make examples and tests more succinct.
Examples
julia> using MultiAssayExperiments
julia> x = MultiAssayExperiments.exampleobject()
MultiAssayExperiment object
experiments(2): foo bar
sampledata(2): name disease
metadata(1): version
Contact
This package is maintained by Aaron Lun (@LTLA). If you have bug reports or feature requests, please post them as issues at the GitHub repository.