Overview
This repository contains a header-only C++ library for reading and writing RDS files (created with saveRDS()) without the need to link to R's libraries. This lets us use RDS as a flexible data exchange format across different frameworks that have C++ bindings, e.g., Python, JavaScript (via Wasm). We currently support most user-visible data structures such as atomic vectors, lists, environments and S4 classes.
Quick start
Given a path to an RDS file, the parse_rds() function returns an RdsFile whose object member is a pointer to an RObject interface:
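// Assumption: fpath holds the path to the RDS file.
auto file_info = rds2cpp::parse_rds(fpath);
const auto& ptr = file_info.object; // std::unique_ptr<RObject>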
The type of the underlying object can then be queried for further examination. For example, if we wanted to process integer vectors:
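if (ptr->type() == rds2cpp::SEXPType::INT) {
    // Assumption: iptr is a downcast of ptr to the integer vector type.
    auto iptr = static_cast<const rds2cpp::IntegerVector*>(ptr.get());
    const auto& values = iptr->data; // std::vector<int32_t>
    const auto& attr_names = iptr->attributes.names;
}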
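The check-then-downcast pattern can be illustrated without the library. The following is a minimal sketch, assuming a hypothetical hierarchy: Kind, Node, IntNode, NilNode and sum_if_int are invented for illustration and are not part of rds2cpp. It shows why the static_cast is safe once the type tag has been checked.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-ins for a type tag and a polymorphic base; not rds2cpp types.
enum class Kind { INT, NIL };

struct Node {
    virtual ~Node() = default;
    virtual Kind type() const = 0;
};

struct IntNode : Node {
    Kind type() const override { return Kind::INT; }
    std::vector<int32_t> data;
};

struct NilNode : Node {
    Kind type() const override { return Kind::NIL; }
};

// Sums the payload if the node is an IntNode; returns 0 for any other kind.
// The static_cast is only reached after the tag confirms the dynamic type.
inline long long sum_if_int(const std::unique_ptr<Node>& ptr) {
    assert(ptr != nullptr);
    if (ptr->type() != Kind::INT) {
        return 0;
    }
    auto iptr = static_cast<const IntNode*>(ptr.get());
    long long total = 0;
    for (auto v : iptr->data) {
        total += v;
    }
    return total;
}
```

Checking the tag first avoids the cost of dynamic_cast while keeping the downcast well-defined, provided every subclass reports its own tag correctly.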
See the reference documentation for a list of known representations.
More reading examples
rds2cpp can extract ordinary lists from an RDS file. Users can inspect the attributes to determine if the list is named.
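if (ptr->type() == rds2cpp::SEXPType::VEC) {
    // Assumption: lptr is a downcast of ptr to the generic vector (list) type.
    auto lptr = static_cast<const rds2cpp::GenericVector*>(ptr.get());
    const auto& elements = lptr->data; // std::vector<std::unique_ptr<RObject>>

    const auto& attr = lptr->attributes;
    const auto& attr_names = attr.names;
    const auto& attr_values = attr.values;
    auto nIt = std::find(attr_names.begin(), attr_names.end(), std::string("names"));
    if (nIt != attr_names.end()) {
        size_t nindex = nIt - attr_names.begin();
        if (attr_values[nindex]->type() == rds2cpp::SEXPType::STR) {
            // The list is named; attr_values[nindex] holds the names as a string vector.
        }
    }
}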
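The name lookup above relies on the attribute names and values being stored as parallel vectors. As a rough sketch of that idea in isolation, the helper below (find_attribute is a hypothetical name, not an rds2cpp function) returns a pointer to the value paired with a given name, or a null pointer if the name is absent.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of rds2cpp. `names` and `values` are assumed
// to be parallel vectors, i.e., values[k] belongs to names[k].
template<typename Value_>
const Value_* find_attribute(const std::vector<std::string>& names,
                             const std::vector<Value_>& values,
                             const std::string& key) {
    assert(names.size() == values.size());
    auto it = std::find(names.begin(), names.end(), key);
    if (it == names.end()) {
        return nullptr; // no attribute with this name
    }
    return &values[static_cast<std::size_t>(it - names.begin())];
}
```

A linear scan is usually adequate here, as R objects tend to carry only a handful of attributes.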
Slots of S4 instances are similarly encoded in the attributes - except for the class name, which is extracted into its own member.
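if (ptr->type() == rds2cpp::SEXPType::S4) {
    // Assumption: sptr is a downcast of ptr to the S4 object type.
    auto sptr = static_cast<const rds2cpp::S4Object*>(ptr.get());
    const auto& class_name = sptr->class_name;
    const auto& package_name = sptr->package_name;
    const auto& slot_names = sptr->attributes.names;
    const auto& slot_values = sptr->attributes.values;
}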
Advanced users can also pull out serialized environments. These should be treated as file-specific globals that may be referenced one or more times inside the R object.
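if (ptr->type() == rds2cpp::SEXPType::ENV) {
    // Assumption: eptr is a downcast of ptr to the environment reference type.
    auto eptr = static_cast<const rds2cpp::EnvironmentIndex*>(ptr.get());
    const auto& env = file_info.environments[eptr->index];
    const auto& vnames = env.variable_names;
    const auto& vvalues = env.variable_values;
}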
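The idea of environments as file-specific globals can be sketched independently of the library. In the toy model below, Env, EnvRef, File and resolve are hypothetical names chosen for illustration and are not rds2cpp types. Each environment is stored once in a file-level table and objects hold only an index into it, so several references resolve to the same environment.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins, not rds2cpp types.
struct Env {
    std::vector<std::string> variable_names;
    std::vector<int> variable_values; // toy payload; real values would be R objects
};

struct EnvRef {
    std::size_t index; // position in the file-level table
};

struct File {
    std::vector<Env> environments; // each environment is stored exactly once
};

// Resolves a reference against the file-level table.
inline const Env& resolve(const File& file, const EnvRef& ref) {
    assert(ref.index < file.environments.size());
    return file.environments[ref.index];
}
```

Storing an index rather than a copy preserves reference semantics: two references with the same index observe the same variables.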
NULLs are supported but not particularly interesting:
if (ptr->type() == rds2cpp::SEXPType::NIL) {
}
Writing RDS files
The write_rds() function will write RDS files from an rds2cpp::RObject representation:
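rds2cpp::RdsFile file_info;
// Assumption: vec is a newly allocated integer vector owned by file_info.
auto vec = new rds2cpp::IntegerVector;
file_info.object.reset(vec);
vec->data = std::vector<int32_t>{ 0, 1, 2, 3, 4, 5 };
// Assumption: an overload of write_rds() accepts an output path.
rds2cpp::write_rds(file_info, "some_file_path.rds");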
Here's a more complicated example that saves a sparse matrix (as a dgCMatrix from the Matrix package) to file.
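// Assumption: ptr is a newly allocated S4 object owned by the RdsFile to be written.
rds2cpp::RdsFile file_info;
auto ptr = new rds2cpp::S4Object;
file_info.object.reset(ptr);
auto& obj = *ptr;
obj.class_name = "dgCMatrix";
obj.package_name = "Matrix";
// Row indices of the non-zero entries.
auto ivec = new rds2cpp::IntegerVector;
obj.attributes.add("i", ivec);
ivec->data = std::vector<int32_t>{ 6, 8, 0, 3, 5, 6, 0, 1, 3, 7 };
// Column pointers.
auto pvec = new rds2cpp::IntegerVector;
obj.attributes.add("p", pvec);
pvec->data = std::vector<int32_t>{ 0, 0, 2, 3, 4, 5, 6, 8, 8, 8, 10 };
// Non-zero values.
auto xvec = new rds2cpp::DoubleVector;
obj.attributes.add("x", xvec);
xvec->data = std::vector<double>{ 0.96, -0.34, 0.82, -2, -0.72, 0.39, 0.16, 0.36, -1.5, -0.47 };
// Matrix dimensions.
auto dims = new rds2cpp::IntegerVector;
obj.attributes.add("Dim", dims);
dims->data = std::vector<int32_t>{ 10, 10 };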
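To make the i, p and x slots easier to interpret, here is a minimal sketch that expands them into (row, column, value) triplets. It assumes the standard compressed sparse column layout used by dgCMatrix: 0-based row indices in i, column pointers in p, and values in x. Triplet and expand_csc are hypothetical names and are not part of rds2cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Triplet {
    int32_t row;
    int32_t col;
    double value;
};

// Expands compressed sparse column arrays into (row, column, value) triplets.
// i: 0-based row index of each non-zero entry.
// p: column pointers, where column c owns the entries in [p[c], p[c + 1]).
// x: the non-zero values, in the same order as i.
inline std::vector<Triplet> expand_csc(const std::vector<int32_t>& i,
                                       const std::vector<int32_t>& p,
                                       const std::vector<double>& x) {
    assert(!p.empty());
    assert(i.size() == x.size());
    assert(static_cast<std::size_t>(p.back()) == i.size());
    std::vector<Triplet> out;
    out.reserve(i.size());
    for (std::size_t c = 0; c + 1 < p.size(); ++c) {
        for (int32_t k = p[c]; k < p[c + 1]; ++k) {
            out.push_back(Triplet{ i[k], static_cast<int32_t>(c), x[k] });
        }
    }
    return out;
}
```

For the arrays in the example, column 0 is empty (p[0] and p[1] are both 0), so the first triplet is row 6, column 1, value 0.96.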
We can also create environments by registering the environment before creating indices to it.
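// Assumption: file_info is the RdsFile being written; a new environment is registered in it.
file_info.environments.emplace_back();
auto& current_env = file_info.environments.back();
// Assumption: sptr is a newly allocated string vector holding the variable's value.
auto sptr = new rds2cpp::StringVector;
current_env.add("foo", sptr);
sptr->add("bar");
sptr->add(); // no argument; assumed to append a missing value (NA)
sptr->add("whee");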
Building projects
CMake with FetchContent
If you're using CMake, you just need to add something like this to your CMakeLists.txt:
include(FetchContent)
FetchContent_Declare(
rds2cpp
GIT_REPOSITORY https://github.com/LTLA/rds2cpp
GIT_TAG master # or any version of interest
)
FetchContent_MakeAvailable(rds2cpp)
Then you can link to rds2cpp to make the headers available during compilation:
# For executables:
target_link_libraries(myexe rds2cpp)
# For libraries:
target_link_libraries(mylib INTERFACE rds2cpp)
CMake using find_package()
You can install the library by cloning a suitable version of this repository and running the following commands:
mkdir build && cd build
cmake ..
cmake --build . --target install
Then you can use find_package() as usual:
find_package(ltla_rds2cpp CONFIG REQUIRED)
target_link_libraries(mylib INTERFACE ltla::rds2cpp)
Manual
If you're not using CMake, the simple approach is to just copy the files in the include/ subdirectory, either directly or with Git submodules, and include their path during compilation with, e.g., GCC's -I.
You'll also need to add the byteme header-only library to the compiler's search path. When using CMake, byteme is linked to Zlib automatically; in a manual build, you will need to link to Zlib yourself.
Known limitations
This library may not support RDS files created using saveRDS() with non-default parameters.
Environments are written without a hash table, so as to avoid the need to replicate R's string hashing logic. This may result in slower retrieval of variables when those environments are loaded into an R session.
Currently, no support is provided for unserializing built-in functions or user-defined closures.