SummarizedExperiments for Julia
Overview
The SummarizedExperiment package is a staple of the Bioconductor ecosystem, providing a powerful yet user-friendly container for summarized genomics datasets. This repository ports the basic SummarizedExperiment functionality from R to Julia, allowing Julians to conveniently manipulate analysis-ready datasets in the same fashion as R/Bioconductor workflows.
The SummarizedExperiment class is centered around the idea of assay matrices for experimental data where the rows are features (most commonly genes) and the columns are samples. Typical use cases include intensities for microarrays or counts for sequencing data. We hold further annotations on the rows and columns in the rowdata and coldata respectively, both of which are synchronized to the assays during subsetting and concatenation.
Check out Figure 2 of the paper "Orchestrating high-throughput genomic analysis with Bioconductor" for more details.
Quick start
Users may install this package from the GitHub repository through the usual process on the Pkg REPL:
add https://github.com/LTLA/SummarizedExperiments.jl
And then:
julia> using SummarizedExperiments
julia> x = exampleobject(100, 10) # Mocking up an example object
100x10 SummarizedExperiment
assays(3): foo bar whee
rownames: Gene1 Gene2 ... Gene99 Gene100
rowdata(2): name Type
colnames: Patient1 Patient2 ... Patient9 Patient10
coldata(3): name Treatment Response
metadata(1): version
julia> coldata(x)
10×3 DataFrame
Row │ name Treatment Response
│ String String Float64
─────┼─────────────────────────────────
1 │ Patient1 normal 0.197936
2 │ Patient2 drug1 0.886853
3 │ Patient3 drug2 0.184345
4 │ Patient4 drug1 0.271934
5 │ Patient5 normal 0.227814
6 │ Patient6 drug1 0.357306
7 │ Patient7 drug2 0.0882962
8 │ Patient8 normal 0.306175
9 │ Patient9 normal 0.731478
10 │ Patient10 drug2 0.419693
julia> assay(x)
100×10 Matrix{Int64}:
76 77 36 26 9 10 62 88 2 31
56 28 100 68 35 19 29 35 17 70
72 82 56 72 79 0 20 52 22 24
98 59 0 17 27 90 17 22 26 85
17 9 44 73 72 52 96 90 68 29
62 56 15 24 60 38 79 67 71 90
etc. etc.
Class definition
SummarizedExperiments.SummarizedExperiment
— Type
The SummarizedExperiment class is a Bioconductor container for matrix-like data with annotations on the rows and columns. It is the data structure underlying analysis workflows for many genomics data modalities, including microarrays, bulk and single-cell RNA sequencing, ChIP-seq, epigenomics and beyond.
Any number of arrays (also known as "assays") can be stored in the container, provided they are assigned to unique names and all have the same extents for the first two dimensions. This reflects the fact that we often have multiple experimental readouts of the same shape, e.g., raw counts, normalized values, quality metrics. These assays are held as an OrderedDict so the order of their addition is respected.
The row and column annotations are stored as DataFrames, with number of rows equal to the number of assay rows and columns, respectively. Any number and type of columns may be present in each DataFrame, with the only constraint being that the first column must be a "name" column of strings containing the feature/sample names. If no names are present, the "name" column must contain nothing values.
Each instance may also contain arbitrary metadata not associated with the rows or columns.
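As a rough illustration of the shape constraint described above, the following sketch uses only base Julia. The helper same_leading_extents is hypothetical and is not part of this package; it simply compares the first two extents of each array, ignoring any further dimensions.

```julia
# Hypothetical helper (not part of SummarizedExperiments.jl): returns true if
# every array shares the same extents for its first two dimensions.
function same_leading_extents(arrays)
    isempty(arrays) && return true
    ref = (size(first(arrays), 1), size(first(arrays), 2))
    return all(a -> (size(a, 1), size(a, 2)) == ref, arrays)
end

counts = rand(1:100, 20, 10)   # 20 features x 10 samples
normalized = rand(20, 10)      # same shape, different element type
layered = rand(20, 10, 3)      # an extra third dimension is allowed
mismatched = rand(19, 10)      # wrong number of rows

ok = same_leading_extents([counts, normalized, layered])   # true
bad = same_leading_extents([counts, mismatched])           # false
```

Under these assumptions, the first three arrays could sit together as assays of one 20x10 object, while the last one could not.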
Constructors
SummarizedExperiments.SummarizedExperiment
— Type
SummarizedExperiment(assays, rowdata, coldata, metadata = Dict{String,Any}())
Create an instance of a SummarizedExperiment with the supplied assays and the row/column annotations.
All entries of assays should have the same extents for the first two dimensions. However, they can otherwise have any number of other dimensions. Each assay can be of a different type.
For rowdata, the number of rows must be equal to the extent of the first dimension for each entry in assays. Similarly, for coldata, the number of rows must be equal to the extent of the second dimension. In both cases, the first column must be called "name" and contain a Vector of Strings or Nothings (if no names are available).
assays may also be empty.
Examples
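julia> using SummarizedExperiments
julia> using DataFrames, DataStructures # for DataFrame and OrderedDict, assuming they are not re-exported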
julia> assays = OrderedDict{String, AbstractArray}(
"foobar" => [[1,2] [3,4] [5,6]],
"whee" => [[1.2,2.3] [3.4,4.5] [5.6,7.8]]);
julia> rowdata = DataFrame(
name = [ "X", "Y" ],
type = ["protein", "transcript"]);
julia> coldata = DataFrame(
name = [ "a", "b", "c" ],
treatment = ["normal", "drug1", "drug2"]);
julia> x = SummarizedExperiment(assays, rowdata, coldata)
2x3 SummarizedExperiment
assays(2): foobar whee
rownames: X Y
rowdata(2): name type
colnames: a b c
coldata(2): name treatment
metadata(0):
SummarizedExperiments.SummarizedExperiment
— Method
SummarizedExperiment(assays)
Create an instance of a SummarizedExperiment with the supplied assays.
All entries of assays should have the same extents for the first two dimensions. However, they can otherwise have any number of other dimensions. Each assay can be of a different type. assays should contain at least one assay matrix.
For the coldata and rowdata, a DataFrame is created that contains only a "name" column filled with nothing values.
Examples
julia> using SummarizedExperiments
julia> using DataStructures # for OrderedDict, assuming it is not re-exported
julia> assays = OrderedDict{String, AbstractArray}(
"foobar" => [[1,2] [3,4] [5,6]],
"whee" => [[1.2,2.3] [3.4,4.5] [5.6,7.8]]);
julia> x = SummarizedExperiment(assays)
2x3 SummarizedExperiment
assays(2): foobar whee
rownames:
rowdata(1): name
colnames:
coldata(1): name
metadata(0):
SummarizedExperiments.SummarizedExperiment
— Method
SummarizedExperiment()
Create an empty SummarizedExperiment with no assays and empty row/column annotations.
Examples
julia> using SummarizedExperiments
julia> SummarizedExperiment()
0x0 SummarizedExperiment
assays(0):
rownames:
rowdata(1): name
colnames:
coldata(1): name
metadata(0):
Getters
Base.size
— Method
size(x::SummarizedExperiment)
Return a 2-tuple containing the number of rows and columns in x.
Examples
julia> using SummarizedExperiments
julia> x = exampleobject(20, 10);
julia> size(x)
(20, 10)
SummarizedExperiments.assay
— Method
assay(x[, i]; check = true)
Return the requested assay in x. i may be an integer specifying an index or a string containing the name. If i is not supplied, the first assay is returned.
The returned assay should have the same extents as x for the first two dimensions. If check = true, this function will verify that this expectation is satisfied. Any failures will cause warnings to be emitted.
Examples
julia> using SummarizedExperiments
julia> x = exampleobject(20, 10);
# All of these give the same value.
julia> assay(x);
julia> assay(x, 1);
julia> assay(x, "foo");
SummarizedExperiments.assays
— Method
assays(x; check = true)
Return all assays from x as an OrderedDict where the keys are the assay names. Each returned assay should have the same extents as x for the first two dimensions. If check = true, this function will verify that this expectation is satisfied. Any failures will cause warnings to be emitted.
Examples
julia> using SummarizedExperiments
julia> x = exampleobject(20, 10);
julia> collect(keys(assays(x)))
3-element Vector{String}:
"foo"
"bar"
"whee"
SummarizedExperiments.rowdata
— Method
rowdata(x; check = true)
Return the row annotations as a DataFrame with number of rows equal to the number of rows in x. The first column is called "name" and contains the row names of x; this can either be an AbstractVector{AbstractString} or a Vector{Nothing} (if no row names are available).
If check = true, this function will verify that the above expectations on the returned DataFrame are satisfied. Any failures will cause warnings to be emitted.
Examples
julia> using SummarizedExperiments
julia> x = exampleobject(20, 10);
julia> names(rowdata(x))
2-element Vector{String}:
"name"
"Type"
julia> size(rowdata(x))
(20, 2)
SummarizedExperiments.coldata
— Method
coldata(x; check = true)
Return the column annotations as a DataFrame with number of rows equal to the number of columns in x. The first column is called "name" and contains the column names of x; this can either be a Vector{String} or a Vector{Nothing} (if no column names are available).
If check = true, this function will verify that the above expectations on the returned DataFrame are satisfied. Any failures will cause warnings to be emitted.
Examples
julia> using SummarizedExperiments
julia> x = exampleobject(20, 10);
julia> names(coldata(x))
3-element Vector{String}:
"name"
"Treatment"
"Response"
julia> size(coldata(x))
(10, 3)
SummarizedExperiments.metadata
— Method
metadata(x)
Return metadata from x as a Dict where the keys are the metadata field names.
Examples
julia> using SummarizedExperiments
julia> x = exampleobject(20, 10);
julia> metadata(x)
Dict{String, Any} with 1 entry:
"version" => "1.1.0"
Setters
SummarizedExperiments.setassay!
— Method
setassay!(x[, i], value)
Set the requested assay in x to any array-like value. The first two dimensions of value must have extents equal to those of x.
i may be an integer specifying an index, in which case it must be positive and no greater than length(assays(x)); or a string containing the name, in which case it may be an existing or new name. If i is not supplied, value is set as the first assay of x.
Examples
julia> using SummarizedExperiments
julia> x = exampleobject(20, 10);
julia> first_sum = sum(assay(x));
julia> second_sum = sum(assay(x, 2));
julia> setassay!(x, assay(x, 2)); # Replacing the first assay with the second.
julia> first_sum == sum(assay(x))
false
julia> second_sum == sum(assay(x))
true
julia> setassay!(x, 1, assay(x, 2)); # More explicit forms of the above.
julia> setassay!(x, "foo", assay(x, 2));
SummarizedExperiments.setassays!
— Method
setassays!(x, value)
Set assays in x to value, an OrderedDict where the keys are assay names and the values are arrays. All arrays in value should have the same extents as x for the first two dimensions.
Examples
julia> using SummarizedExperiments
julia> x = exampleobject(20, 10);
julia> length(assays(x))
3
julia> refresh = copy(assays(x));
julia> delete!(refresh, "foo");
julia> setassays!(x, refresh);
julia> length(assays(x))
2
SummarizedExperiments.setrowdata!
— Method
setrowdata!(x, value)
Set the row annotations in x to value.
If value is a DataFrame, the first column should be called "name" and contain the row names of x; this can either be an AbstractVector{AbstractString} or a Vector{Nothing} (if no row names are available).
If value is nothing, this is considered to be equivalent to a DataFrame with one "name" column containing nothing values.
The return value is a reference to the modified x.
Examples
julia> using SummarizedExperiments
julia> x = exampleobject(20, 10);
julia> # using DataFrames
julia> replacement = copy(rowdata(x));
julia> replacement[!,"foobar"] = [ rand() for i in 1:size(x)[1] ];
julia> setrowdata!(x, replacement);
julia> names(rowdata(x))
3-element Vector{String}:
"name"
"Type"
"foobar"
SummarizedExperiments.setcoldata!
— Method
setcoldata!(x, value)
Set the column annotations in x to value.
If value is a DataFrame, the first column should be called "name" and contain the column names of x; this can either be a Vector{String} or a Vector{Nothing} (if no column names are available).
If value is nothing, this is considered to be equivalent to a DataFrame with one "name" column containing nothing values.
The return value is a reference to the modified x.
Examples
julia> using SummarizedExperiments
julia> x = exampleobject(20, 10);
julia> replacement = copy(coldata(x));
julia> replacement[!,"foobar"] = [ rand() for i in 1:size(x)[2] ];
julia> setcoldata!(x, replacement);
julia> names(coldata(x))
4-element Vector{String}:
"name"
"Treatment"
"Response"
"foobar"
SummarizedExperiments.setmetadata!
— Method
setmetadata!(x, value)
Set metadata in x to value, a Dict where the keys are the metadata field names.
Examples
julia> using SummarizedExperiments
julia> x = exampleobject(20, 10);
julia> setmetadata!(x, Dict{String,Any}("foo" => 200));
julia> metadata(x)
Dict{String, Any} with 1 entry:
"foo" => 200
Subsetting
Base.getindex
— Method
getindex(x::SummarizedExperiment, i, j)
Subset x by the rows and columns based on i and j, respectively. Accepted types for i and j are similar to those for arrays:
- An integer Vector containing indices.
- An Int containing a single index.
- A boolean Vector of length equal to the relevant dimension, indicating whether each entry of that dimension should be retained.
- A : operator to retain the entirety of a dimension's extent.
Examples
julia> using SummarizedExperiments
julia> x = exampleobject(20, 10);
julia> x[1,:]
1x10 SummarizedExperiment
assays(3): foo bar whee
rownames: Gene1
rowdata(2): name Type
colnames: Patient1 Patient2 ... Patient9 Patient10
coldata(3): name Treatment Response
metadata(1): version
julia> x[:,1:5]
20x5 SummarizedExperiment
assays(3): foo bar whee
rownames: Gene1 Gene2 ... Gene19 Gene20
rowdata(2): name Type
colnames: Patient1 Patient2 ... Patient4 Patient5
coldata(3): name Treatment Response
metadata(1): version
julia> keep = [ i > 5 for i in 1:size(x)[1] ];
julia> x[keep,1:2]
15x2 SummarizedExperiment
assays(3): foo bar whee
rownames: Gene6 Gene7 ... Gene19 Gene20
rowdata(2): name Type
colnames: Patient1 Patient2
coldata(3): name Treatment Response
metadata(1): version
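The idea that annotations stay synchronized with the assays during subsetting can be sketched with base Julia alone. This is only an illustration under simple assumptions (a plain Matrix plus a separate vector of row names, both made up here), not the package's implementation: the same index is applied to the matrix rows and to the names.

```julia
mat = collect(reshape(1:12, 4, 3))       # 4 rows (features) x 3 columns (samples)
rownames = ["Gene1", "Gene2", "Gene3", "Gene4"]

keep = [false, true, true, false]        # boolean Vector, one entry per row

# Apply the same row index to the data and to its annotation.
sub = mat[keep, :]
subnames = rownames[keep]

# A plain Matrix drops a dimension when indexed with a single Int...
single = mat[1, :]
# ...so a length-1 range is needed to keep a 1-row matrix.
single_row = mat[1:1, :]
```

One difference worth noting: indexing a plain Matrix with a single Int returns a vector, whereas the x[1,:] example above still yields a 1x10 SummarizedExperiment.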
Subset assignment
Base.setindex!
— Method
setindex!(x, value, i, j)
Assign the SummarizedExperiment value to a subset of the SummarizedExperiment x, selected by the rows and columns based on i and j, respectively. Accepted types for i and j are similar to those for arrays:
- An integer Vector containing indices.
- An Int containing a single index.
- A boolean Vector of length equal to the relevant dimension, indicating whether each entry of that dimension should be replaced.
- A : operator to select the entirety of a dimension's extent.
On assignment, the assay values in the specified subset of x will be replaced by the corresponding values in value. Rows of rowdata(x) and coldata(x) will be replaced by those in value, according to i and j respectively. Metadata fields in metadata(value) will be added to or overwrite those in metadata(x).
It is assumed that x and value contain columns with the same names, order and types in their rowdata and coldata. Similarly, both objects should contain arrays with the same names, order and types in their assays.
Examples
julia> using SummarizedExperiments
julia> x = exampleobject(20, 10);
julia> x[:,1] = x[:,2];
julia> sn = coldata(x)[!,"name"];
julia> sn[1] == sn[2]
true
julia> y = assay(x);
julia> y[:,1] == y[:,2]
true
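The "added to or overwrite" rule for metadata described above resembles what merge! does for a Dict in base Julia. The sketch below assumes that analogy and uses made-up field names; it does not show how the package implements the assignment.

```julia
# Hypothetical metadata for the target object and for the assigned value.
meta_x = Dict{String,Any}("version" => "1.1.0", "author" => "unknown")
meta_value = Dict{String,Any}("version" => "2.0.0", "batch" => 3)

# Fields in meta_value are added to meta_x, overwriting on key collisions.
merge!(meta_x, meta_value)
```

After the merge, meta_x keeps its "author" field, gains "batch", and takes the "version" from meta_value.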
Concatenation
Base.hcat
— Method
hcat(A::Vararg{SummarizedExperiment})
Horizontally concatenate one or more SummarizedExperiment objects. The input objects must satisfy the following constraints:
- All objects must have the same number of rows, which are assumed to be in the same order.
- All objects must have DataFrames in their coldata with the same names and types for all columns (though they may be ordered differently).
- All objects must have the same names and types of assays; for a given assay name, the dimensions of the corresponding arrays across all A should be the same except for the second dimension.
This function returns a single SummarizedExperiment instance where the number of columns is equal to the sum of the number of columns across all objects in A. The number of rows in the output object is the same as the number of rows in any object in A. The order of columns in the output coldata is the same as that of the first object. The output rowdata is created by combining columns horizontally across the rowdata of all objects in A; if columns have duplicate names, only the first instance of each column is retained.
Examples
julia> using SummarizedExperiments
julia> x = exampleobject(20, 20);
julia> y = exampleobject(20, 30);
julia> z = hcat(x, y)
20x50 SummarizedExperiment
assays(3): foo bar whee
rownames: Gene1 Gene2 ... Gene19 Gene20
rowdata(2): name Type
colnames: Patient1 Patient2 ... Patient29 Patient30
coldata(3): name Treatment Response
metadata(1): version
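The shape arithmetic in this example can be reproduced with plain arrays in base Julia. The sketch below assumes that column names are simply appended in input order, which is consistent with the printed colnames ending in Patient30; the arrays and name vectors are made up for illustration.

```julia
a = zeros(Int, 20, 20)     # stands in for an assay of x
b = ones(Int, 20, 30)      # stands in for the same assay of y
combined = hcat(a, b)      # rows must match; columns are appended

names_a = ["Patient$i" for i in 1:20]
names_b = ["Patient$i" for i in 1:30]
combined_names = vcat(names_a, names_b)   # annotations are stacked in the same order

# Assays with extra dimensions must agree everywhere except the second one.
a3 = zeros(20, 20, 2)
b3 = zeros(20, 30, 2)
combined3 = cat(a3, b3; dims = 2)
```

Here combined has 20 rows and 50 columns, and combined_names has 50 entries with possible repeats, since both inputs use the same naming scheme.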
Base.vcat
— Method
vcat(A::Vararg{SummarizedExperiment})
Vertically concatenate one or more SummarizedExperiment objects. The input objects must satisfy the following constraints:
- All objects must have the same number of columns, which are assumed to be in the same order.
- All objects must have DataFrames in their rowdata with the same names and types for all columns (though they may be ordered differently).
- All objects must have the same names and types of assays; for a given assay name, the dimensions of the corresponding arrays across all A should be the same except for the first dimension.
This function returns a single SummarizedExperiment instance where the number of rows is equal to the sum of the number of rows across all objects in A. The number of columns in the output object is the same as the number of columns in any object in A. The order of columns in the output rowdata is the same as that of the first object. The output coldata is created by combining columns horizontally across the coldata of all objects in A; if columns have duplicate names, only the first instance of each column is retained.
Examples
julia> using SummarizedExperiments
julia> x = exampleobject(20, 10);
julia> y = exampleobject(30, 10);
julia> z = vcat(x, y)
50x10 SummarizedExperiment
assays(3): foo bar whee
rownames: Gene1 Gene2 ... Gene29 Gene30
rowdata(2): name Type
colnames: Patient1 Patient2 ... Patient9 Patient10
coldata(3): name Treatment Response
metadata(1): version
Miscellaneous
Base.copy
— Method
copy(x::SummarizedExperiment)
Return a shallow copy of x, where all components are the same objects as those in x (they are shared by reference rather than duplicated).
Examples
julia> using SummarizedExperiments
julia> x = exampleobject(20, 10);
julia> x2 = copy(x);
julia> setrowdata!(x2, nothing);
julia> size(rowdata(x)) # Change to reference is only reflected in x2.
(20, 2)
julia> size(rowdata(x2))
(20, 1)
julia> insertcols!(coldata(x), 2, "WHEE" => 1:10); # Otherwise, references point to the same object.
julia> names(coldata(x2))
4-element Vector{String}:
"name"
"WHEE"
"Treatment"
"Response"
Base.deepcopy
— Method
deepcopy(x::SummarizedExperiment)
Return a deep copy of x and all of its components.
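The difference between copy and deepcopy follows the usual base Julia semantics, which can be seen with a plain Dict holding an array. This is a generic sketch with made-up keys, not code from the package.

```julia
original = Dict("counts" => [1, 2, 3])
shallow = copy(original)       # new Dict, but the array inside is shared
deep = deepcopy(original)      # new Dict and a new array

push!(original["counts"], 4)   # mutate the shared array in place

n_shallow = length(shallow["counts"])   # 4, sees the in-place change
n_deep = length(deep["counts"])         # 3, fully independent

shallow["extra"] = [0]                  # replacing or adding an entry only affects the copy
has_extra = haskey(original, "extra")   # false
```

In-place changes to a component are visible through a shallow copy, while replacing a component only affects the object it was set on.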
Examples
julia> using SummarizedExperiments
julia> x = exampleobject(20, 10);
julia> x2 = deepcopy(x);
julia> setrowdata!(x2, nothing);
julia> size(rowdata(x)) # Change to reference is only reflected in x2.
(20, 2)
julia> size(rowdata(x2))
(20, 1)
julia> insertcols!(coldata(x), 2, "WHEE" => 1:10); # References now point to different objects.
julia> names(coldata(x2))
3-element Vector{String}:
"name"
"Treatment"
"Response"
Base.show
— Method
show(io::IO, x::SummarizedExperiment)
Show a summary of x, printing the details to the specified io device.
SummarizedExperiments.exampleobject
— Method
exampleobject(nrow, ncol)
Create an example SummarizedExperiment object with the specified number of rows and columns. This is intended to keep examples and tests succinct.
Examples
julia> using SummarizedExperiments
julia> x = exampleobject(20, 10)
20x10 SummarizedExperiment
assays(3): foo bar whee
rownames: Gene1 Gene2 ... Gene19 Gene20
rowdata(2): name Type
colnames: Patient1 Patient2 ... Patient9 Patient10
coldata(3): name Treatment Response
metadata(1): version
Contact
This package is maintained by Aaron Lun (@LTLA). If you have bug reports or feature requests, please post them as issues at the GitHub repository.