
Overview
This repository contains a header-only C++ library for reading and writing RDS or RDA files without the need to link to R's libraries. This allows us to use RDS as a flexible data exchange format across frameworks with C++ bindings, e.g., Python and JavaScript (via Wasm). We currently support most user-visible data structures, such as atomic vectors, lists, environments and S4 classes.
Quick start
Each RDS file contains a single R object and is typically created by calling saveRDS() within an R session. Given a path to an RDS file, the parse_rds() function returns an rds2cpp::RdsFile whose object member is a pointer to the RObject interface:
#include "rds2cpp/rds2cpp.hpp"
rds2cpp::ParseRdsOptions options;
auto file_info = rds2cpp::parse_rds(fpath, options); // 'fpath' is the path to the RDS file
const auto& ptr = file_info.object;
The type of the underlying object can then be queried for further examination. For example, if we wanted to process integer vectors:
if (ptr->type() == rds2cpp::SEXPType::INT) {
    auto iptr = static_cast<const rds2cpp::IntegerVector*>(ptr.get());
    const std::vector<std::int32_t>& values = iptr->data;
}
See the reference documentation for a list of known representations.
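The check-then-cast pattern above depends on type() reporting the dynamic type of the object, which is what makes a static_cast safe. The sketch below reproduces that pattern using hypothetical, stripped-down stand-ins for RObject, Null and IntegerVector. These are not rds2cpp's actual class definitions, so the sketch compiles without the library:

```cpp
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical, stripped-down stand-ins for the rds2cpp classes; the real
// ones carry more members (attributes, other SEXP types, etc.).
enum class SEXPType { NIL, INT };

struct RObject {
    virtual ~RObject() = default;
    virtual SEXPType type() const = 0;
};

struct Null : public RObject {
    SEXPType type() const override { return SEXPType::NIL; }
};

struct IntegerVector : public RObject {
    std::vector<std::int32_t> data;
    SEXPType type() const override { return SEXPType::INT; }
};

// Sums an integer vector and returns 0 for any other type. The tag check
// makes the static_cast safe. R's NA_integer_ is stored as INT32_MIN and
// is not special-cased here.
std::int64_t sum_if_integer(const RObject& obj) {
    if (obj.type() != SEXPType::INT) {
        return 0;
    }
    const auto& ivec = static_cast<const IntegerVector&>(obj);
    std::int64_t total = 0;
    for (auto v : ivec.data) {
        total += v;
    }
    return total;
}
```

With the real library, the equivalent call would pass `*ptr` and cast to rds2cpp::IntegerVector instead.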
More reading examples
We can extract ordinary lists from an RDS file, examining the attributes to determine if the list is named.
if (ptr->type() == rds2cpp::SEXPType::VEC) {
    auto lptr = static_cast<const rds2cpp::GenericVector*>(ptr.get());
    const auto& elements = lptr->data;
    for (const auto& attr : lptr->attributes) {
        // Attribute names are indices into the file's symbol table.
        const auto& attr_name = file_info.symbols[attr.name.index].name;
        if (attr_name == "names") {
            if (attr.value->type() != rds2cpp::SEXPType::STR) {
                throw std::runtime_error("oops, names should be strings!");
            }
            auto nptr = static_cast<const rds2cpp::StringVector*>(attr.value.get());
            for (const auto& str : nptr->data) {
                if (!str.value.has_value()) {
                    throw std::runtime_error("oops, names should not be missing!");
                }
                const std::string& str_value = *(str.value);
                const auto& str_enc = str.encoding;
            }
        }
    }
}
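Extracting a list's names involves two indirections. Each attribute name is an index into the file-level symbols vector, and each string element may be missing. The following minimal sketch handles both. It uses hypothetical stand-in types (Symbol, SymbolIndex, String, and a simplified Attribute whose value is always a string vector) that mirror the fields used in the example above:

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins mirroring the fields used above: names live in a
// file-level symbol table, and each string element may be missing (NA).
struct Symbol { std::string name; };
struct SymbolIndex { std::size_t index; };
struct String { std::optional<std::string> value; };

// Simplified attribute whose value is always a string vector.
struct Attribute {
    SymbolIndex name;
    std::vector<String> strings;
};

// Returns the "names" attribute as plain strings, or an empty vector if the
// list is unnamed. Throws if any name is missing.
std::vector<std::string> extract_names(const std::vector<Attribute>& attributes,
                                       const std::vector<Symbol>& symbols) {
    for (const auto& attr : attributes) {
        // Resolve the attribute name through the symbol table.
        if (symbols.at(attr.name.index).name != "names") {
            continue;
        }
        std::vector<std::string> output;
        output.reserve(attr.strings.size());
        for (const auto& str : attr.strings) {
            if (!str.value.has_value()) {
                throw std::runtime_error("names should not be missing");
            }
            output.push_back(*str.value);
        }
        return output;
    }
    return {};
}
```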
Slots of S4 instances are similarly stored as attributes, except for the class name and its package, which are extracted into their own members.
if (ptr->type() == rds2cpp::SEXPType::S4) {
    auto sptr = static_cast<const rds2cpp::S4Object*>(ptr.get());
    const auto& cls_name = sptr->class_name;
    const auto& pkg_name = sptr->package_name;
    for (const auto& slot : sptr->attributes) {
        const auto& slot_name = file_info.symbols[slot.name.index].name;
        const auto& slot_val = *(slot.value);
    }
}
Advanced users can also pull out serialized environments. These should be treated as file-specific globals that may be referenced one or more times inside the R object.
if (ptr->type() == rds2cpp::SEXPType::ENV) {
    auto eptr = static_cast<const rds2cpp::EnvironmentIndex*>(ptr.get());
    const auto& env = file_info.environments[eptr->index];
    for (const auto& var : env.variables) {
        const auto& var_name = file_info.symbols[var.name.index].name;
        const auto& var_value = *(var.value);
    }
}
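Each environment is stored once per file and referenced by index, so code that walks an object tree should process each environment once rather than once per reference. The hypothetical sketch below resolves references and collects the distinct environment indices. EnvironmentRef and File are illustrative stand-ins, not rds2cpp types:

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins: each environment is stored once in a file-level
// table, and objects refer to it by position, like eptr->index above.
struct Environment {
    std::vector<std::string> variable_names;
};

struct EnvironmentRef {
    std::size_t index;
};

struct File {
    std::vector<Environment> environments;
};

// Resolves a reference; at() throws std::out_of_range for a bad index.
const Environment& resolve(const File& file, EnvironmentRef ref) {
    return file.environments.at(ref.index);
}

// Returns each referenced environment index once, in order of first use,
// so that shared environments are processed only once.
std::vector<std::size_t> distinct_environments(const std::vector<EnvironmentRef>& refs) {
    std::set<std::size_t> seen;
    std::vector<std::size_t> order;
    for (const auto& ref : refs) {
        if (seen.insert(ref.index).second) {
            order.push_back(ref.index);
        }
    }
    return order;
}
```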
NULLs are supported but not particularly interesting:
if (ptr->type() == rds2cpp::SEXPType::NIL) {
    // Nothing else to extract; this is R's NULL.
}
Writing RDS files
The write_rds() function writes an rds2cpp::RdsFile, whose object member holds the rds2cpp::RObject to be saved:
rds2cpp::RdsFile file_info;
auto vec = std::make_unique<rds2cpp::IntegerVector>();
vec->data = std::vector<std::int32_t>{ 0, 1, 2, 3, 4, 5 };
file_info.object = std::move(vec);
// 'fpath' is the output path; a path overload of write_rds() is assumed here.
rds2cpp::WriteRdsOptions options;
rds2cpp::write_rds(file_info, fpath, options);
Here's a more complicated example that saves a sparse matrix (as a dgCMatrix from the Matrix package) to file.
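rds2cpp::RdsFile file_info;
auto ptr = std::make_unique<rds2cpp::S4Object>();
auto& obj = *ptr;
obj.class_name = "dgCMatrix";
obj.package_name = "Matrix";
// Each slot is stored as an attribute, named via the file's symbol table.
auto ivec = std::make_unique<rds2cpp::IntegerVector>();
ivec->data = std::vector<std::int32_t>{ 6, 8, 0, 3, 5, 6, 0, 1, 3, 7 };
obj.attributes.emplace_back(
    rds2cpp::register_symbol("i", rds2cpp::StringEncoding::ASCII, file_info.symbols),
    std::move(ivec)
);
auto pvec = std::make_unique<rds2cpp::IntegerVector>();
pvec->data = std::vector<std::int32_t>{ 0, 0, 2, 3, 4, 5, 6, 8, 8, 8, 10 };
obj.attributes.emplace_back(
    rds2cpp::register_symbol("p", rds2cpp::StringEncoding::ASCII, file_info.symbols),
    std::move(pvec)
);
auto xvec = std::make_unique<rds2cpp::DoubleVector>();
xvec->data = std::vector<double>{ .96, -.34, .82, -2., -.72, .39, .16, .36, -1.5, -.47 };
obj.attributes.emplace_back(
    rds2cpp::register_symbol("x", rds2cpp::StringEncoding::ASCII, file_info.symbols),
    std::move(xvec)
);
auto dims = std::make_unique<rds2cpp::IntegerVector>();
dims->data = std::vector<std::int32_t>{ 10, 10 };
obj.attributes.emplace_back(
    rds2cpp::register_symbol("Dim", rds2cpp::StringEncoding::ASCII, file_info.symbols),
    std::move(dims)
);
// Dimnames: a list of two NULLs, i.e., no row or column names.
auto dimnames = std::make_unique<rds2cpp::GenericVector>();
dimnames->data.emplace_back(std::make_unique<rds2cpp::Null>());
dimnames->data.emplace_back(std::make_unique<rds2cpp::Null>());
obj.attributes.emplace_back(
    rds2cpp::register_symbol("Dimnames", rds2cpp::StringEncoding::ASCII, file_info.symbols),
    std::move(dimnames)
);
// factors: an empty list.
obj.attributes.emplace_back(
    rds2cpp::register_symbol("factors", rds2cpp::StringEncoding::ASCII, file_info.symbols),
    std::make_unique<rds2cpp::GenericVector>()
);
file_info.object = std::move(ptr);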
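The ivec, pvec and xvec arrays above hold the i, p and x slots of a dgCMatrix, which use the compressed sparse column (CSC) layout. Here p has one entry per column plus one, and the non-zeros of column c occupy positions p[c] to p[c+1] - 1 of i (0-based row indices) and x (values). The following standalone sketch does not depend on rds2cpp. It validates these arrays and expands them into (row, column, value) triplets, which can help when checking the data before writing it:

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

struct Triplet {
    std::int32_t row;
    std::int32_t col;
    double value;
};

// Expands dgCMatrix-style CSC slots into (row, column, value) triplets.
// 'p' has ncol + 1 entries; the non-zeros of column c occupy positions
// p[c] .. p[c + 1] - 1 of 'i' (0-based row indices) and 'x' (values).
std::vector<Triplet> csc_to_triplets(const std::vector<std::int32_t>& i,
                                     const std::vector<std::int32_t>& p,
                                     const std::vector<double>& x,
                                     std::int32_t nrow) {
    if (i.size() != x.size() || p.empty() || p.front() != 0 ||
        static_cast<std::size_t>(p.back()) != i.size()) {
        throw std::invalid_argument("inconsistent CSC slot lengths");
    }
    // Validate all of 'p' first so every p[c] .. p[c + 1] range is in bounds.
    for (std::size_t c = 0; c + 1 < p.size(); ++c) {
        if (p[c] > p[c + 1]) {
            throw std::invalid_argument("'p' must be non-decreasing");
        }
    }
    std::vector<Triplet> out;
    out.reserve(x.size());
    for (std::size_t c = 0; c + 1 < p.size(); ++c) {
        for (std::int32_t k = p[c]; k < p[c + 1]; ++k) {
            const std::int32_t row = i[k];
            if (row < 0 || row >= nrow) {
                throw std::invalid_argument("row index out of range");
            }
            out.push_back(Triplet{ row, static_cast<std::int32_t>(c), x[k] });
        }
    }
    return out;
}
```

For the arrays above, column 0 is empty and column 1 holds rows 6 and 8, so the first triplet is (6, 1, 0.96).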
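We can also create environments by registering each one in the file's environments vector before creating indices that refer to it.
// Register the environment first; its position in 'environments' is its index.
file_info.environments.emplace_back();
auto sptr = std::make_unique<rds2cpp::StringVector>();
sptr->data.emplace_back("bar", rds2cpp::StringEncoding::UTF8);
sptr->data.emplace_back(); // a missing string, i.e., NA
sptr->data.emplace_back("whee", rds2cpp::StringEncoding::ASCII);
auto foo = rds2cpp::register_symbol("foo", rds2cpp::StringEncoding::ASCII, file_info.symbols); // "foo" is an illustrative variable name
file_info.environments[0].variables.emplace_back(foo, std::move(sptr));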
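Symbols and environments both follow a register-then-reference pattern. An entry is appended to a file-level table, and later objects refer to it by its position. The sketch below is a hypothetical re-implementation of a symbol registry, loosely modelled on rds2cpp::register_symbol(), which takes a name, an encoding and the symbol vector and returns a SymbolIndex. It is an assumption of this sketch, not documented behavior, that a repeated name reuses the existing entry:

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; the real rds2cpp types may differ.
enum class StringEncoding { ASCII, UTF8 };
struct Symbol { std::string name; StringEncoding encoding; };
struct SymbolIndex { std::size_t index; };

// Register-then-reference: return the position of an existing entry with the
// same name and encoding, otherwise append a new entry. Reusing entries is an
// assumption of this sketch, not documented rds2cpp behavior.
SymbolIndex register_symbol_sketch(std::string name, StringEncoding encoding, std::vector<Symbol>& symbols) {
    for (std::size_t s = 0; s < symbols.size(); ++s) {
        if (symbols[s].name == name && symbols[s].encoding == encoding) {
            return SymbolIndex{ s };
        }
    }
    symbols.push_back(Symbol{ std::move(name), encoding });
    return SymbolIndex{ symbols.size() - 1 };
}
```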
Reading/writing RDA files
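Each RDA file (a.k.a. RData file) contains multiple R objects, each of which is associated with a unique name. It is typically created by calling save() within an R session. We can read all objects into memory with the parse_rda() function:
rds2cpp::ParseRdaOptions options;
// 'fpath' is the path to the RDA file; a path overload like that of parse_rds() is assumed.
auto file_info = rds2cpp::parse_rda(fpath, options);
for (const auto& obj : file_info.objects) {
    const auto& obj_name = file_info.symbols[obj.name.index].name;
    switch (obj.value->type()) {
        case rds2cpp::SEXPType::INT:
            // Handle an integer vector.
            break;
        case rds2cpp::SEXPType::STR:
            // Handle a string vector.
            break;
        default:
            // Ignore other types.
            break;
    }
}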
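Objects in an RDA file are named through the same symbol table, so looking up an object by name requires resolving each SymbolIndex first. The following sketch builds a name-to-object map without copying the objects. It uses hypothetical stand-in types, where Value stands in for an RObject:

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins: 'Value' stands in for an RObject, and each
// RDA object pairs a symbol index (its name) with an owned value.
struct Symbol { std::string name; };
struct SymbolIndex { std::size_t index; };
struct Value { int payload; };
struct RdaObject {
    SymbolIndex name;
    std::unique_ptr<Value> value;
};

// Builds a name -> object lookup that borrows, rather than copies, the values.
// If a name were repeated, the later object would win.
std::map<std::string, const Value*> index_by_name(const std::vector<RdaObject>& objects,
                                                  const std::vector<Symbol>& symbols) {
    std::map<std::string, const Value*> lookup;
    for (const auto& obj : objects) {
        lookup[symbols.at(obj.name.index).name] = obj.value.get();
    }
    return lookup;
}
```

The returned pointers remain valid only as long as the owning objects vector is alive.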
Similarly, we can write the name/object pairs into an RDA file.
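#include <numeric>
rds2cpp::RdaFile file_info;
auto ivec = std::make_unique<rds2cpp::IntegerVector>(5);
std::iota(ivec->data.begin(), ivec->data.end(), 1);
auto list = std::make_unique<rds2cpp::GenericVector>(2);
list->data[0].reset(new rds2cpp::Null);
list->data[1].reset(new rds2cpp::Null); // every element must be set; a null pointer cannot be serialized
auto svec = std::make_unique<rds2cpp::StringVector>(1);
svec->data[0].value = "FOOBAR";
// Each object is paired with a registered name (the names here are illustrative).
file_info.objects.emplace_back(
    rds2cpp::register_symbol("ivec", rds2cpp::StringEncoding::ASCII, file_info.symbols),
    std::move(ivec)
);
file_info.objects.emplace_back(
    rds2cpp::register_symbol("list", rds2cpp::StringEncoding::ASCII, file_info.symbols),
    std::move(list)
);
file_info.objects.emplace_back(
    rds2cpp::register_symbol("svec", rds2cpp::StringEncoding::ASCII, file_info.symbols),
    std::move(svec)
);
// 'fpath' is the output path; a path overload of write_rda() is assumed here.
rds2cpp::WriteRdaOptions options;
rds2cpp::write_rda(file_info, fpath, options);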
Building projects
CMake with FetchContent
If you're using CMake, you just need to add something like this to your CMakeLists.txt:
include(FetchContent)
FetchContent_Declare(
rds2cpp
GIT_REPOSITORY https://github.com/LTLA/rds2cpp
GIT_TAG master # or any version of interest
)
FetchContent_MakeAvailable(rds2cpp)
Then you can link to rds2cpp to make the headers available during compilation:
# For executables:
target_link_libraries(myexe rds2cpp)
# For libraries:
target_link_libraries(mylib INTERFACE rds2cpp)
CMake using find_package()
You can install the library by cloning a suitable version of this repository and running the following commands:
mkdir build && cd build
cmake ..
cmake --build . --target install
Then you can use find_package() as usual:
find_package(ltla_rds2cpp CONFIG REQUIRED)
target_link_libraries(mylib INTERFACE ltla::rds2cpp)
Manual
If you're not using CMake, the simple approach is to copy the files in the include/ subdirectory (either directly or with Git submodules) and add that directory to the compiler's search path with, e.g., GCC's -I flag. You'll also need to add the dependencies listed in extern/CMakeLists.txt to the search path and link to zlib.
Known limitations
This library may not support RDS files created using saveRDS() with non-default parameters.
Environments are written without a hash table, so as to avoid the need to replicate R's string hashing logic. This may result in slower retrieval of variables when those environments are loaded into an R session.
Currently, no support is provided for unserializing built-in functions or user-defined closures.
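saveRDS() compresses with gzip by default. Non-default settings such as compress = "bzip2", compress = "xz" or compress = FALSE change the leading bytes of the file. The following minimal diagnostic sketch classifies a file from its first few bytes. It uses the standard gzip, bzip2 and xz magic numbers and R's uncompressed serialization headers ("X\n" for XDR, "A\n" for ASCII, "B\n" for native binary). It does not establish which of these formats rds2cpp accepts, and for compressed files it only inspects the outer container:

```cpp
#include <cstddef>
#include <initializer_list>
#include <string>
#include <vector>

// Classifies an RDS file from its leading bytes. Compressed files are only
// identified by their outer container; the serialization header inside is not checked.
std::string detect_rds_container(const std::vector<unsigned char>& head) {
    auto starts_with = [&head](std::initializer_list<unsigned char> magic) {
        if (head.size() < magic.size()) {
            return false;
        }
        std::size_t k = 0;
        for (unsigned char byte : magic) {
            if (head[k++] != byte) {
                return false;
            }
        }
        return true;
    };
    if (starts_with({0x1f, 0x8b})) return "gzip";
    if (starts_with({'B', 'Z', 'h'})) return "bzip2";
    if (starts_with({0xfd, '7', 'z', 'X', 'Z', 0x00})) return "xz";
    if (starts_with({'X', '\n'})) return "uncompressed (XDR)";
    if (starts_with({'A', '\n'})) return "uncompressed (ASCII)";
    if (starts_with({'B', '\n'})) return "uncompressed (native binary)";
    return "unknown";
}
```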