SummarizedExperiments for Julia

Overview

The SummarizedExperiment package is a staple of the Bioconductor ecosystem, providing a powerful yet user-friendly container for summarized genomics datasets. This repository ports the basic SummarizedExperiment functionality from R to Julia, allowing Julians to conveniently manipulate analysis-ready datasets in the same fashion as R/Bioconductor workflows.

The SummarizedExperiment class is centered around the idea of assay matrices for experimental data where the rows are features (most commonly genes) and the columns are samples. Typical use cases include intensities for microarrays or counts for sequencing data. We hold further annotations on the rows and columns in the rowdata and coldata respectively, both of which are synchronized to the assays during subsetting and concatenation.

Check out Figure 2 of the paper "Orchestrating high-throughput genomic analysis with Bioconductor" for more details.

Quick start

Users may install this package from the GitHub repository through the usual process on the Pkg REPL:

add https://github.com/LTLA/SummarizedExperiments.jl

And then:

julia> using SummarizedExperiments

julia> x = exampleobject(100, 10) # Mocking up an example object
100x10 SummarizedExperiment
  assays(3): foo bar whee
  rownames: Gene1 Gene2 ... Gene99 Gene100
  rowdata(2): name Type
  colnames: Patient1 Patient2 ... Patient9 Patient10
  coldata(3): name Treatment Response
  metadata(1): version

julia> coldata(x)
10×3 DataFrame
 Row │ name       Treatment  Response
     │ String     String     Float64
─────┼─────────────────────────────────
   1 │ Patient1   normal     0.197936
   2 │ Patient2   drug1      0.886853
   3 │ Patient3   drug2      0.184345
   4 │ Patient4   drug1      0.271934
   5 │ Patient5   normal     0.227814
   6 │ Patient6   drug1      0.357306
   7 │ Patient7   drug2      0.0882962
   8 │ Patient8   normal     0.306175
   9 │ Patient9   normal     0.731478
  10 │ Patient10  drug2      0.419693

julia> assay(x)
100×10 Matrix{Int64}:
 76  77   36  26    9   10  62  88   2  31
 56  28  100  68   35   19  29  35  17  70
 72  82   56  72   79    0  20  52  22  24
 98  59    0  17   27   90  17  22  26  85
 17   9   44  73   72   52  96  90  68  29
 62  56   15  24   60   38  79  67  71  90
 etc. etc.

Class definition

SummarizedExperiments.SummarizedExperiment - Type

The SummarizedExperiment class is a Bioconductor container for matrix-like data with annotations on the rows and columns. It is the data structure underlying analysis workflows for many genomics data modalities, including microarrays, bulk and single-cell RNA sequencing, ChIP-seq, epigenomics and beyond.

Any number of arrays (also known as "assays") can be stored in the container, provided they are assigned to unique names and all have the same extents for the first two dimensions. This reflects the fact that we often have multiple experimental readouts of the same shape, e.g., raw counts, normalized values, quality metrics. These assays are held as an OrderedDict so the order of their addition is respected.
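The shape constraint on assays can be illustrated without the package itself. The following is a minimal sketch in plain Julia; `same_first_two_extents` is a hypothetical helper written for this illustration and is not part of SummarizedExperiments.jl, and it assumes every array has at least two dimensions.

```julia
# Hypothetical helper (not part of SummarizedExperiments.jl): checks that
# all arrays agree on the extents of their first two dimensions.
# Assumes every array has at least two dimensions.
function same_first_two_extents(arrays)
    isempty(arrays) && return true
    ref = size(first(arrays))[1:2]
    return all(size(a)[1:2] == ref for a in arrays)
end

counts     = rand(1:100, 5, 3)   # 5 features x 3 samples
normalized = rand(5, 3)          # same shape, different element type
quality    = rand(5, 3, 2)       # extra third dimension is allowed
mismatched = rand(4, 3)          # wrong number of rows

ok  = same_first_two_extents(AbstractArray[counts, normalized, quality])
bad = same_first_two_extents(AbstractArray[counts, mismatched])

println(ok)   # true
println(bad)  # false
```

Only the first two extents are compared, so arrays with additional dimensions or different element types can sit alongside ordinary matrices.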

The row and column annotations are stored as DataFrames, with number of rows equal to the number of assay rows and columns, respectively. Any number and type of columns may be present in each DataFrame, with the only constraint being that the first column must be a "name" column of strings containing the feature/sample names. If no names are present, the "name" column must contain nothings.

Each instance may also contain arbitrary metadata not associated with the rows or columns.


Constructors

SummarizedExperiments.SummarizedExperiment - Type
SummarizedExperiment(assays, rowdata, coldata, metadata = Dict{String,Any}())

Create an instance of a SummarizedExperiment with the supplied assays and the row/column annotations.

All entries of assays should have the same extents for the first two dimensions. However, they can otherwise have any number of other dimensions. Each assay can be of a different type.

For rowdata, the number of rows must be equal to the extent of the first dimension for each entry in assays. Similarly, for coldata, the number of rows must be equal to the extent of the second dimension. In both cases, the first column must be called "name" and contain a Vector of Strings or Nothings (if no names are available).

assays may also be empty.

Examples

julia> using SummarizedExperiments

julia> using DataFrames, DataStructures # provides DataFrame and OrderedDict

julia> assays = OrderedDict{String, AbstractArray}(
          "foobar" => [[1,2] [3,4] [5,6]], 
          "whee" => [[1.2,2.3] [3.4,4.5] [5.6,7.8]]);

julia> rowdata = DataFrame(
          name = [ "X", "Y" ],
          type = ["protein", "transcript"]);

julia> coldata = DataFrame(
          name = [ "a", "b", "c" ],
          treatment = ["normal", "drug1", "drug2"]);

julia> x = SummarizedExperiment(assays, rowdata, coldata)
2x3 SummarizedExperiment
  assays(2): foobar whee
  rownames: X Y
  rowdata(2): name type
  colnames: a b c
  coldata(2): name treatment
  metadata(0):

SummarizedExperiments.SummarizedExperiment - Method
SummarizedExperiment(assays)

Create an instance of a SummarizedExperiment with the supplied assays.

All entries of assays should have the same extents for the first two dimensions. However, they can otherwise have any number of other dimensions. Each assay can be of a different type. assays should contain at least one assay matrix.

For the coldata and rowdata, an empty DataFrame is created with a "name" column containing all nothings.

Examples

julia> using SummarizedExperiments

julia> using DataStructures # provides OrderedDict

julia> assays = OrderedDict{String, AbstractArray}(
          "foobar" => [[1,2] [3,4] [5,6]], 
          "whee" => [[1.2,2.3] [3.4,4.5] [5.6,7.8]]);

julia> x = SummarizedExperiment(assays)
2x3 SummarizedExperiment
  assays(2): foobar whee
  rownames:
  rowdata(1): name
  colnames:
  coldata(1): name
  metadata(0):

SummarizedExperiments.SummarizedExperiment - Method
SummarizedExperiment()

Create an empty SummarizedExperiment with no assays and empty row/column annotations.

Examples

julia> using SummarizedExperiments

julia> SummarizedExperiment()
0x0 SummarizedExperiment
  assays(0):
  rownames:
  rowdata(1): name
  colnames:
  coldata(1): name
  metadata(0):

Getters

Base.size - Method
size(x::SummarizedExperiment)

Return a 2-tuple containing the number of rows and columns in x.

Examples

julia> using SummarizedExperiments

julia> x = exampleobject(20, 10);

julia> size(x)
(20, 10)

SummarizedExperiments.assay - Method
assay(x[, i]; check = true)

Return the requested assay in x. i may be an integer specifying an index or a string containing the name. If i is not supplied, the first assay is returned.

The returned assay should have the same extents as x for the first two dimensions. If check = true, this function will verify that this expectation is satisfied. Any failures will cause warnings to be emitted.

Examples

julia> using SummarizedExperiments

julia> x = exampleobject(20, 10);

# All of these give the same value.
julia> assay(x);

julia> assay(x, 1);

julia> assay(x, "foo");
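As a small sketch of the shape expectation described above, the following script compares the first two extents of each assay against `size(x)` by hand, using only functions documented on this page. It assumes the example object's assays are ordered foo, bar, whee, as shown elsewhere in this document.

```julia
using SummarizedExperiments

x = exampleobject(20, 10)

# Manually verify what check = true is documented to verify:
# each assay matches x on its first two dimensions.
for (name, mat) in assays(x)
    println(name, ": ", size(mat)[1:2] == size(x))
end

# Lookup by index and by name should refer to the same assay.
by_index = assay(x, 2)
by_name = assay(x, "bar")

# The check can be skipped via the documented keyword argument.
unchecked = assay(x, "bar"; check = false)
```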

SummarizedExperiments.assays - Method
assays(x; check = true)

Return all assays from x as an OrderedDict where the keys are the assay names. Each returned assay should have the same extents as x for the first two dimensions. If check = true, this function will verify that this expectation is satisfied. Any failures will cause warnings to be emitted.

Examples

julia> using SummarizedExperiments

julia> x = exampleobject(20, 10);

julia> collect(keys(assays(x)))
3-element Vector{String}:
 "foo"
 "bar"
 "whee"

SummarizedExperiments.rowdata - Method
rowdata(x; check = true)

Return the row annotations as a DataFrame with number of rows equal to the number of rows in x. The first column is called "name" and contains the row names of x; this can either be an AbstractVector{AbstractString} or a Vector{Nothing} (if no row names are available).

If check = true, this function will verify that the above expectations on the returned DataFrame are satisfied. Any failures will cause warnings to be emitted.

Examples

julia> using SummarizedExperiments

julia> x = exampleobject(20, 10);

julia> names(rowdata(x))
2-element Vector{String}:
 "name"
 "Type"

julia> size(rowdata(x))
(20, 2)

SummarizedExperiments.coldata - Method
coldata(x; check = true)

Return the column annotations as a DataFrame with number of rows equal to the number of columns in x. The first column is called "name" and contains the column names of x; this can either be a Vector{String} or a Vector{Nothing} (if no column names are available).

If check = true, this function will verify that the expectations on the returned DataFrame are satisfied. Any failures will cause warnings to be emitted.

Examples

julia> using SummarizedExperiments

julia> x = exampleobject(20, 10);

julia> names(coldata(x))
3-element Vector{String}:
 "name"
 "Treatment"
 "Response"

julia> size(coldata(x))
(10, 3)

SummarizedExperiments.metadata - Method
metadata(x)

Return metadata from x as a Dict where the keys are the metadata field names.

Examples

julia> using SummarizedExperiments

julia> x = exampleobject(20, 10);

julia> metadata(x)
Dict{String, Any} with 1 entry:
  "version" => "1.1.0"

Setters

SummarizedExperiments.setassay! - Method
setassay!(x[, i], value)

Set the requested assay in x to any array-like value. The first two dimensions of value must have extent equal to those of x.

i may be an integer specifying an index, in which case it must be positive and no greater than length(assays(x)); or a string containing the name, in which case it may be an existing or new name. If i is not supplied, value is set as the first assay of x.

Examples

julia> using SummarizedExperiments

julia> x = exampleobject(20, 10);

julia> first_sum = sum(assay(x));

julia> second_sum = sum(assay(x, 2));

julia> setassay!(x, assay(x, 2)); # Replacing the first assay with the second.

julia> first_sum == sum(assay(x))
false

julia> second_sum == sum(assay(x))
true

julia> setassay!(x, 1, assay(x, 2)); # More explicit forms of the above.

julia> setassay!(x, "foo", assay(x, 2));
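The examples above only overwrite existing assays. As a sketch of the "new name" case, the script below appends a derived assay under a name that does not yet exist. The name "logcounts" and the log1p transformation are illustrative choices rather than anything prescribed by the package, and the script assumes the example object starts with three assays as shown elsewhere in this document.

```julia
using SummarizedExperiments

x = exampleobject(20, 10)

# Derive a new matrix with the same extents as the first assay.
logged = log1p.(assay(x))

# "logcounts" is a hypothetical name; since it is not an existing
# assay name, the value is expected to be added rather than replaced.
setassay!(x, "logcounts", logged)

println(collect(keys(assays(x))))
```

Because assays are held in an OrderedDict, the new entry should appear after the existing ones.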

SummarizedExperiments.setassays! - Method
setassays!(x, value)

Set assays in x to value, an OrderedDict where the keys are assay names and the values are arrays. All arrays in value should have the same extents as x for the first two dimensions.

Examples

julia> using SummarizedExperiments

julia> x = exampleobject(20, 10);

julia> length(assays(x))
3

julia> refresh = copy(assays(x));

julia> delete!(refresh, "foo");

julia> setassays!(x, refresh);

julia> length(assays(x))
2

SummarizedExperiments.setrowdata! - Method
setrowdata!(x, value)

Set the row annotations in x to value.

If value is a DataFrame, the first column should be called "name" and contain the row names of x; this can either be an AbstractVector{AbstractString} or a Vector{Nothing} (if no row names are available).

If value is nothing, this is considered to be equivalent to a DataFrame with one "name" column containing nothings.

The return value is a reference to the modified x.

Examples

julia> using SummarizedExperiments

julia> x = exampleobject(20, 10);

julia> # using DataFrames

julia> replacement = copy(rowdata(x));

julia> replacement[!,"foobar"] = [ rand() for i in 1:size(x)[1] ];

julia> setrowdata!(x, replacement);

julia> names(rowdata(x))
3-element Vector{String}:
 "name"
 "Type"
 "foobar"

SummarizedExperiments.setcoldata! - Method
setcoldata!(x, value)

Set the column annotations in x to value.

If value is a DataFrame, the first column should be called "name" and contain the column names of x; this can either be a Vector{String} or a Vector{Nothing} (if no column names are available).

If value is nothing, this is considered to be equivalent to a DataFrame with one "name" column containing nothings.

The return value is a reference to the modified x.

Examples

julia> using SummarizedExperiments

julia> x = exampleobject(20, 10);

julia> replacement = copy(coldata(x));

julia> replacement[!,"foobar"] = [ rand() for i in 1:size(x)[2] ];

julia> setcoldata!(x, replacement);

julia> names(coldata(x))
4-element Vector{String}:
 "name"
 "Treatment"
 "Response"
 "foobar"

SummarizedExperiments.setmetadata! - Method
setmetadata!(x, value)

Set metadata in x to value, a Dict where the keys are the metadata field names.
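Since the whole Dict is replaced, adding a single field while keeping the existing ones requires copying the current metadata first. The following is a minimal sketch of that pattern; the field name "normalized" is a hypothetical example, and the script assumes the example object carries a "version" entry as shown elsewhere in this document.

```julia
using SummarizedExperiments

x = exampleobject(20, 10)

# Copy the current metadata so that existing fields are retained.
updated = copy(metadata(x))

# "normalized" is a hypothetical field name used for illustration.
updated["normalized"] = false

setmetadata!(x, updated)

println(sort(collect(keys(metadata(x)))))
```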

Examples

julia> using SummarizedExperiments

julia> x = exampleobject(20, 10);

julia> setmetadata!(x, Dict{String,Any}("foo" => 200));

julia> metadata(x)
Dict{String, Any} with 1 entry:
  "foo" => 200

Subsetting

Base.getindex - Method
getindex(x::SummarizedExperiment, i, j)

Subset x by rows and columns based on i and j, respectively. The accepted types for i and j are similar to those for arrays:

  • An integer Vector containing indices.
  • An Int containing a single index.
  • A boolean Vector of length equal to the relevant dimension, indicating whether each entry of that dimension should be retained.
  • A : operator to retain the entirety of a dimension's extent.

Examples

julia> using SummarizedExperiments

julia> x = exampleobject(20, 10);

julia> x[1,:]
1x10 SummarizedExperiment
  assays(3): foo bar whee
  rownames: Gene1
  rowdata(2): name Type
  colnames: Patient1 Patient2 ... Patient9 Patient10
  coldata(3): name Treatment Response
  metadata(1): version

julia> x[:,1:5]
20x5 SummarizedExperiment
  assays(3): foo bar whee
  rownames: Gene1 Gene2 ... Gene19 Gene20
  rowdata(2): name Type
  colnames: Patient1 Patient2 ... Patient4 Patient5
  coldata(3): name Treatment Response
  metadata(1): version

julia> keep = [ i > 5 for i in 1:size(x)[1] ];

julia> x[keep,1:2]
15x2 SummarizedExperiment
  assays(3): foo bar whee
  rownames: Gene6 Gene7 ... Gene19 Gene20
  rowdata(2): name Type
  colnames: Patient1 Patient2
  coldata(3): name Treatment Response
  metadata(1): version
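A common use of boolean subsetting is to select samples based on an annotation column. The sketch below keeps only the samples whose Treatment is "drug1" and confirms that the assays and coldata stay synchronized. It assumes the example object has a "Treatment" column of strings in its coldata, as shown in the Quick start.

```julia
using SummarizedExperiments

x = exampleobject(20, 10)

# Build a plain Vector{Bool} with one entry per column of x.
treatment = coldata(x)[!, "Treatment"]
keep = [t == "drug1" for t in treatment]

# Keep all rows, but only the selected columns.
sub = x[:, keep]

println(size(sub))
println(coldata(sub)[!, "name"])
```

The column annotations and every assay are subset together, so `coldata(sub)` has one row per retained column of `assay(sub)`.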

Subset assignment

Base.setindex! - Method
setindex!(x, value, i, j)

Assign the SummarizedExperiment value to a subset of the SummarizedExperiment x, where the subset is defined along the rows and columns by i and j, respectively. The accepted types for i and j are similar to those for arrays:

  • An integer Vector containing indices.
  • An Int containing a single index.
  • A boolean Vector of length equal to the relevant dimension, indicating whether each entry of that dimension should be assigned.
  • A : operator to select the entirety of a dimension's extent.

On assignment, the assay values in the specified subset of x will be replaced by the corresponding values in value. Rows of the rowdata(x) and coldata(x) will be replaced by those in value, according to i and j respectively. Metadata fields in metadata(value) will be added to or overwrite those in metadata(x).

It is assumed that x and value contain columns with the same names, order and types in their rowdata and coldata. Similarly, both objects should contain arrays with the same names, order and types in their assays.

Examples

julia> using SummarizedExperiments

julia> x = exampleobject(20, 10);

julia> x[:,1] = x[:,2];

julia> sn = coldata(x)[!,"name"];

julia> sn[1] == sn[2]
true

julia> y = assay(x);

julia> y[:,1] == y[:,2]
true

Concatenation

Base.hcat - Method
hcat(A::Vararg{SummarizedExperiment})

Horizontally concatenate one or more SummarizedExperiment objects. The input objects must satisfy the following constraints:

  • All objects must have the same number of rows, which are assumed to be in the same order.
  • All objects must have DataFrames in their coldata with the same type and names of all columns (though they may be ordered differently).
  • All objects must have the same names and types of assays; for a given assay name, the dimensions of the corresponding arrays across all A should be the same except for the second dimension.

This function returns a single SummarizedExperiment instance where the number of columns is equal to the sum of the number of columns across all objects in A. The number of rows in the output object is the same as the number of rows in any object in A. The order of columns in the output coldata is the same as that of the first object. The output rowdata is created by combining columns horizontally across rowdata of all objects in A; if columns have duplicate names, only the first instance of each column is retained.

Examples

julia> using SummarizedExperiments

julia> x = exampleobject(20, 20);

julia> y = exampleobject(20, 30);

julia> z = hcat(x, y)
20x50 SummarizedExperiment
  assays(3): foo bar whee
  rownames: Gene1 Gene2 ... Gene19 Gene20
  rowdata(2): name Type
  colnames: Patient1 Patient2 ... Patient29 Patient30
  coldata(3): name Treatment Response
  metadata(1): version
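To make the effect on the underlying data concrete, the sketch below checks that concatenating two objects is consistent with concatenating their assays and column names directly. It uses only functions documented on this page and assumes the example objects contain an assay named "foo", as shown above.

```julia
using SummarizedExperiments

x = exampleobject(20, 20)
y = exampleobject(20, 30)
z = hcat(x, y)

# Columns of x come first, followed by the columns of y.
expected_assay = hcat(assay(x, "foo"), assay(y, "foo"))
expected_names = vcat(coldata(x)[!, "name"], coldata(y)[!, "name"])

println(assay(z, "foo") == expected_assay)
println(coldata(z)[!, "name"] == expected_names)
```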

Base.vcat - Method
vcat(A::Vararg{SummarizedExperiment})

Vertically concatenate one or more SummarizedExperiment objects. The input objects must satisfy the following constraints:

  • All objects must have the same number of columns, which are assumed to be in the same order.
  • All objects must have DataFrames in their rowdata with the same type and names of all columns (though they may be ordered differently).
  • All objects must have the same names and types of assays; for a given assay name, the dimensions of the corresponding arrays across all A should be the same except for the first dimension.

This function returns a single SummarizedExperiment instance where the number of rows is equal to the sum of the number of rows across all objects in A. The number of columns in the output object is the same as the number of columns in any object in A. The order of columns in the output rowdata is the same as that of the first object. The output coldata is created by combining columns horizontally across coldata of all objects in A; if columns have duplicate names, only the first instance of each column is retained.

Examples

julia> using SummarizedExperiments

julia> x = exampleobject(20, 10);

julia> y = exampleobject(30, 10);

julia> z = vcat(x, y)
50x10 SummarizedExperiment
  assays(3): foo bar whee
  rownames: Gene1 Gene2 ... Gene29 Gene30
  rowdata(2): name Type
  colnames: Patient1 Patient2 ... Patient9 Patient10
  coldata(3): name Treatment Response
  metadata(1): version

Miscellaneous

Base.copy - Method
copy(x::SummarizedExperiment)

Return a copy of x, where all components are the same objects as those in x (i.e., a shallow copy).

Examples

julia> using SummarizedExperiments

julia> using DataFrames # provides insertcols!

julia> x = exampleobject(20, 10);

julia> x2 = copy(x);

julia> setrowdata!(x2, nothing);

julia> size(rowdata(x)) # Change to reference is only reflected in x2.
(20, 2)

julia> size(rowdata(x2))
(20, 1)

julia> insertcols!(coldata(x), 2, "WHEE" => 1:10); # Otherwise, references point to the same object.

julia> names(coldata(x2))
4-element Vector{String}:
 "name"
 "WHEE"
 "Treatment"
 "Response"

Base.deepcopy - Method
deepcopy(x::SummarizedExperiment)

Return a deep copy of x and all of its components.

Examples

julia> using SummarizedExperiments

julia> using DataFrames # provides insertcols!

julia> x = exampleobject(20, 10);

julia> x2 = deepcopy(x);

julia> setrowdata!(x2, nothing);

julia> size(rowdata(x)) # Change to reference is only reflected in x2.
(20, 2)

julia> size(rowdata(x2))
(20, 1)

julia> insertcols!(coldata(x), 2, "WHEE" => 1:10); # References now point to different objects.

julia> names(coldata(x2))
3-element Vector{String}:
 "name"
 "Treatment"
 "Response"

Base.show - Method
show(io::IO, x::SummarizedExperiment)

Show a summary of x, printing the details to the specified io device.


SummarizedExperiments.exampleobject - Method
exampleobject(nrow, ncol)

Create an example SummarizedExperiment object with the specified number of rows and columns. This is to be used to improve the succinctness of examples and tests.

Examples

julia> using SummarizedExperiments

julia> x = exampleobject(20, 10)
20x10 SummarizedExperiment
  assays(3): foo bar whee
  rownames: Gene1 Gene2 ... Gene19 Gene20
  rowdata(2): name Type
  colnames: Patient1 Patient2 ... Patient9 Patient10
  coldata(3): name Treatment Response
  metadata(1): version

Contact

This package is maintained by Aaron Lun (@LTLA). If you have bug reports or feature requests, please post them as issues at the GitHub repository.