byteme
Read/write bytes from various sources
Gimme some bytes


Overview

This library implements a few classes to read buffered inputs from (and write buffered outputs to) uncompressed or Gzip-compressed files or buffers. Classes can be exchanged at compile time or run time, making it easy to reuse the same code across different input sources. The aim is to consolidate some common boilerplate across several projects, e.g., tatami, singlepp. Interfacing with Zlib is particularly fiddly, and I don't want to have to remember how to do it in each project.

Usage

To read bytes, create an instance of the desired Reader class and loop until no bytes remain in the source.

#include "byteme/byteme.hpp"
#include <vector>

const char* filepath = "input.gz";
byteme::GzipFileReader reader(filepath, {});
std::vector<unsigned char> buffer(20);
while (true) {
    // read() returns the number of bytes that were actually read into the buffer.
    auto num_read = reader.read(buffer.data(), buffer.size());

    /* Do something with the first 'num_read' bytes in the buffer */

    if (num_read < buffer.size()) {
        // If fewer bytes are read than requested, the input is finished.
        break;
    }
}
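To make the read() contract concrete without depending on byteme or a real file, here is a minimal sketch. ToyReader and count_bytes are hypothetical names, not part of byteme; ToyReader simply serves bytes from a std::string while following the same rules as the loop above: it returns the number of bytes actually copied, and a short read means the input is exhausted.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical reader (not part of byteme) that follows the same read() contract:
// copy up to 'n' bytes into 'buffer' and return how many were actually copied.
class ToyReader {
public:
    explicit ToyReader(std::string contents) : my_contents(std::move(contents)) {}

    std::size_t read(unsigned char* buffer, std::size_t n) {
        std::size_t to_copy = std::min(n, my_contents.size() - my_position);
        std::copy_n(my_contents.data() + my_position, to_copy, buffer);
        my_position += to_copy;
        return to_copy;
    }

private:
    std::string my_contents;
    std::size_t my_position = 0;
};

// Same loop structure as above: stop as soon as read() returns fewer bytes than requested.
std::size_t count_bytes(ToyReader& reader) {
    std::vector<unsigned char> buffer(20);
    std::size_t total = 0;
    while (true) {
        auto num_read = reader.read(buffer.data(), buffer.size());
        total += num_read; // only the first 'num_read' bytes of 'buffer' are valid.
        if (num_read < buffer.size()) {
            break;
        }
    }
    return total;
}
```

With 45 bytes of input, the loop performs three reads of 20, 20 and 5 bytes. If the input size is an exact multiple of the buffer size, the final read() returns zero, which also ends the loop.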

To write bytes, create an instance of the desired Writer class, pass it arrays of bytes, and call finish() once all bytes have been written.

#include "byteme/byteme.hpp"
#include <string>
#include <vector>

std::vector<std::string> lyrics {
    "Kimi dake o kimi dake o",
    "Suki de ita yo",
    "Kaze de me ga nijinde",
    "Tooku naru yo"
};
byteme::GzipFileWriter writer("something.gz", {});
const char newline = '\n';
for (const auto& line : lyrics) {
    writer.write(reinterpret_cast<const unsigned char*>(line.c_str()), line.size());
    writer.write(reinterpret_cast<const unsigned char*>(&newline), 1);
}
writer.finish(); // signal that all bytes have been written.
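As a self-contained illustration of the write()/finish() calling pattern, the sketch below swaps the Gzip file for a hypothetical in-memory ToyWriter (not part of byteme) and wraps the loop above in a hypothetical helper, write_lines(), that appends a newline after each line.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory writer (not part of byteme) with the same shape as
// the example above: write() accepts arrays of bytes, finish() marks completion.
class ToyWriter {
public:
    void write(const unsigned char* buffer, std::size_t n) {
        my_output.append(reinterpret_cast<const char*>(buffer), n);
    }
    void finish() { my_finished = true; }

    const std::string& output() const { return my_output; }
    bool finished() const { return my_finished; }

private:
    std::string my_output;
    bool my_finished = false;
};

// Writes each line followed by a newline, then finishes the writer.
void write_lines(ToyWriter& writer, const std::vector<std::string>& lines) {
    const char newline = '\n';
    for (const auto& line : lines) {
        writer.write(reinterpret_cast<const unsigned char*>(line.c_str()), line.size());
        writer.write(reinterpret_cast<const unsigned char*>(&newline), 1);
    }
    writer.finish();
}
```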

More details can be found in the reference documentation.

Supported classes

For the readers:

Class               Description
RawBufferReader     Read from an uncompressed buffer
RawFileReader       Read from an uncompressed file
ZlibBufferReader    Read from a Zlib-compressed buffer
GzipFileReader      Read from a Gzip-compressed file
IstreamReader       Read from a std::istream

For the writers:

Class               Description
RawBufferWriter     Write to an uncompressed buffer
RawFileWriter       Write to an uncompressed file
ZlibBufferWriter    Write to a Zlib-compressed buffer
GzipFileWriter      Write to a Gzip-compressed file
OstreamWriter       Write to a std::ostream

The different subclasses can be switched at compile time via templating, or at run time by exploiting the class hierarchy, i.e., through a pointer to the common Reader or Writer base class:

#include "byteme/byteme.hpp"
#include <memory>
#include <vector>

std::vector<unsigned char> input_buffer; // e.g., filled with raw or Zlib-compressed bytes.
auto buffer = input_buffer.data();
std::size_t length = input_buffer.size();

std::unique_ptr<byteme::Reader> ptr;
if (some_condition) { // e.g., whether the input is compressed.
    ptr.reset(new byteme::ZlibBufferReader(buffer, length, {}));
} else {
    ptr.reset(new byteme::RawBufferReader(buffer, length));
}

// Read bytes from the abstract input source into a separate output buffer.
std::vector<unsigned char> output(123);
auto available = ptr->read(output.data(), output.size());
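The run-time example above does not show the compile-time alternative. The sketch below illustrates both approaches with hypothetical stand-ins (AbstractReader, StringReader, RepeatReader, drain and choose_reader are illustrative names, not byteme's API): drain() is a function template that accepts any type with a compatible read() method, while choose_reader() picks a concrete class at run time behind a base-class pointer.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical abstract base, standing in for byteme::Reader.
class AbstractReader {
public:
    virtual ~AbstractReader() = default;
    virtual std::size_t read(unsigned char* buffer, std::size_t n) = 0;
};

// Serves bytes from a string.
class StringReader : public AbstractReader {
public:
    explicit StringReader(std::string s) : my_data(std::move(s)) {}
    std::size_t read(unsigned char* buffer, std::size_t n) override {
        std::size_t k = std::min(n, my_data.size() - my_pos);
        std::copy_n(my_data.data() + my_pos, k, buffer);
        my_pos += k;
        return k;
    }
private:
    std::string my_data;
    std::size_t my_pos = 0;
};

// Serves 'count' copies of a single byte.
class RepeatReader : public AbstractReader {
public:
    RepeatReader(unsigned char value, std::size_t count) : my_value(value), my_left(count) {}
    std::size_t read(unsigned char* buffer, std::size_t n) override {
        std::size_t k = std::min(n, my_left);
        std::fill_n(buffer, k, my_value);
        my_left -= k;
        return k;
    }
private:
    unsigned char my_value;
    std::size_t my_left;
};

// Compile-time switching: instantiated for whatever reader type is passed in.
template<class Reader_>
std::string drain(Reader_& reader) {
    std::vector<unsigned char> buffer(8);
    std::string out;
    while (true) {
        auto num_read = reader.read(buffer.data(), buffer.size());
        out.append(buffer.begin(), buffer.begin() + num_read);
        if (num_read < buffer.size()) {
            break;
        }
    }
    return out;
}

// Run-time switching: choose the concrete class behind a base-class pointer.
std::unique_ptr<AbstractReader> choose_reader(bool repeat) {
    if (repeat) {
        return std::make_unique<RepeatReader>('z', 10);
    }
    return std::make_unique<StringReader>("hello world");
}
```

Calling drain() on a concrete StringReader lets the compiler resolve read() statically, while calling it on the AbstractReader returned by choose_reader() goes through virtual dispatch; the parsing code is identical in both cases.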

Most of the Reader and Writer constructors will also accept a matching Options instance to fine-tune their behavior.

// For readers.
byteme::ZlibBufferReaderOptions zopt;
zopt.buffer_size = 8096;
zopt.mode = byteme::ZlibCompressionMode::GZIP;
byteme::ZlibBufferReader zreader(buffer, length, zopt);

// For writers.
byteme::ZlibBufferWriterOptions zwopt;
zwopt.buffer_size = 8096;
zwopt.mode = byteme::ZlibCompressionMode::DEFLATE;
byteme::ZlibBufferWriter zwriter(zwopt);
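The {} passed as the last constructor argument in the earlier examples value-initializes the Options object, so every field keeps its default. The sketch below shows this pattern with a hypothetical ToyCompressorOptions struct and ToyCompressor class (their fields and default values are illustrative, not byteme's). It also shows one practical caveat: std::make_unique cannot deduce a type for a bare {}, so the options type must be named explicitly when constructing through it.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical options struct mirroring the Options pattern; defaults are illustrative.
struct ToyCompressorOptions {
    std::size_t buffer_size = 65536;
    int compression_level = 6;
};

class ToyCompressor {
public:
    explicit ToyCompressor(const ToyCompressorOptions& options) :
        my_buffer_size(options.buffer_size), my_level(options.compression_level) {}

    std::size_t buffer_size() const { return my_buffer_size; }
    int compression_level() const { return my_level; }

private:
    std::size_t my_buffer_size;
    int my_level;
};

// Passing '{}' value-initializes the options, so every field keeps its default.
inline ToyCompressor make_default_compressor() {
    return ToyCompressor({});
}

// Fine-tune a single field, leaving the rest at their defaults.
inline ToyCompressor make_small_compressor() {
    ToyCompressorOptions opt;
    opt.buffer_size = 8096;
    return ToyCompressor(opt);
}

// std::make_unique<ToyCompressor>({}) would not compile, so spell out the options type.
inline std::unique_ptr<ToyCompressor> make_heap_compressor() {
    return std::make_unique<ToyCompressor>(ToyCompressorOptions());
}
```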

Buffered reading and writing

Some applications need to access small chunks or individual bytes from the input stream. Calling Reader::read() for each request could be too expensive, e.g., if each call accesses a storage device. In such cases, users can wrap the Reader in a BufferedReader, which reads a large chunk into an internal buffer from which smaller chunks or individual bytes can be extracted.

// A bare '{}' cannot be forwarded through std::make_unique, so construct with 'new' instead.
std::unique_ptr<byteme::Reader> reader(new byteme::GzipFileReader(filepath, {}));
byteme::SerialBufferedReader<char> pb(std::move(reader), /* buffer_size = */ 65536);
auto valid = pb.valid();
while (valid) {
    char x = pb.get();
    // Do something with 'x'.
    valid = pb.advance();
}
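To illustrate what the buffering buys, here is a minimal sketch of serial buffering. ToySerialBufferedReader and CountingSource are hypothetical classes, not byteme's implementation; the sketch only assumes the valid()/get()/advance() usage pattern shown above. Each refill issues one large read() on the source, and get() then serves individual bytes from memory.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical byte source with a read() contract like the Readers above.
// It counts how many times read() is called.
class CountingSource {
public:
    explicit CountingSource(std::string data) : my_data(std::move(data)) {}
    std::size_t read(unsigned char* buffer, std::size_t n) {
        ++calls;
        std::size_t k = std::min(n, my_data.size() - my_pos);
        std::copy_n(my_data.data() + my_pos, k, buffer);
        my_pos += k;
        return k;
    }
    std::size_t calls = 0;
private:
    std::string my_data;
    std::size_t my_pos = 0;
};

// Hypothetical serial buffered reader: one large read() per chunk,
// then cheap byte-by-byte access from memory.
class ToySerialBufferedReader {
public:
    ToySerialBufferedReader(CountingSource& source, std::size_t buffer_size) :
        my_source(source), my_buffer(buffer_size) {
        refill();
    }
    bool valid() const { return my_current < my_available; }
    char get() const { return static_cast<char>(my_buffer[my_current]); }
    bool advance() {
        ++my_current;
        if (my_current < my_available) {
            return true;
        }
        // Only refill if the last read() filled the buffer, i.e., more input may remain.
        if (my_available == my_buffer.size()) {
            refill();
        }
        return valid();
    }
private:
    void refill() {
        my_available = my_source.read(my_buffer.data(), my_buffer.size());
        my_current = 0;
    }
    CountingSource& my_source;
    std::vector<unsigned char> my_buffer;
    std::size_t my_available = 0;
    std::size_t my_current = 0;
};

// Collects every byte using the same valid()/get()/advance() loop as above.
std::string read_all(CountingSource& source, std::size_t buffer_size) {
    ToySerialBufferedReader pb(source, buffer_size);
    std::string out;
    auto valid = pb.valid();
    while (valid) {
        out.push_back(pb.get());
        valid = pb.advance();
    }
    return out;
}
```

Reading 10 bytes through a 4-byte buffer takes three calls to the underlying read() instead of one per byte.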

We can also extract a range of bytes:

#include <cstdint>

std::unique_ptr<byteme::Reader> reader(new byteme::GzipFileReader(filepath, {}));
byteme::SerialBufferedReader<unsigned char> pb(std::move(reader), /* buffer_size = */ 65536);
auto valid = pb.valid();
while (valid) {
    std::int32_t value;
    auto outcome = pb.extract(reinterpret_cast<unsigned char*>(&value), sizeof(std::int32_t));
    if (outcome.first != sizeof(std::int32_t)) {
        // Uh oh, not enough bytes remaining in the input.
    } else {
        // Do something with the extracted integer.
    }
    valid = outcome.second;
}

We can even perform the reading in a separate thread via the ParallelBufferedReader class. This allows the (possibly expensive) disk I/O operations to be performed in parallel with the user-level parsing.

std::unique_ptr<byteme::Reader> reader(new byteme::GzipFileReader(filepath, {}));
byteme::ParallelBufferedReader<char> pb(std::move(reader), /* buffer_size = */ 65536);
auto valid = pb.valid();
while (valid) {
    char x = pb.get();
    // Do something with 'x'.
    valid = pb.advance();
}
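One common way to overlap reading with parsing is double buffering: a worker thread fetches the next chunk while the caller processes the current one. The sketch below implements this with std::async; it illustrates the general idea and is not a description of how ParallelBufferedReader is implemented. ChunkSource and parallel_read are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Hypothetical source that hands out chunks of at most 'n' bytes (not part of byteme).
class ChunkSource {
public:
    explicit ChunkSource(std::string data) : my_data(std::move(data)) {}
    std::vector<unsigned char> next_chunk(std::size_t n) {
        std::size_t k = std::min(n, my_data.size() - my_pos);
        std::vector<unsigned char> out(my_data.begin() + my_pos, my_data.begin() + my_pos + k);
        my_pos += k;
        return out;
    }
private:
    std::string my_data;
    std::size_t my_pos = 0;
};

// Double buffering: while 'parse' consumes the current chunk,
// the next chunk is fetched on a worker thread via std::async.
// At most one fetch is in flight, so the source is never accessed concurrently.
template<class Parse_>
void parallel_read(ChunkSource& source, std::size_t chunk_size, Parse_ parse) {
    auto fetch = [&source, chunk_size]() { return source.next_chunk(chunk_size); };
    auto pending = std::async(std::launch::async, fetch);
    while (true) {
        std::vector<unsigned char> current = pending.get();
        // A full chunk means more input may remain; a short chunk ends the stream.
        bool more = chunk_size > 0 && current.size() == chunk_size;
        if (more) {
            pending = std::async(std::launch::async, fetch); // prefetch the next chunk.
        }
        for (auto byte : current) {
            parse(static_cast<char>(byte));
        }
        if (!more) {
            break;
        }
    }
}
```

Because only one fetch is outstanding at any time, the source needs no locking; the future returned by std::async provides the synchronization between the two threads.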

Similarly, BufferedWriter will cache all write requests into a large buffer, intermittently calling Writer::write() to push the buffered bytes to the underlying storage.

std::unique_ptr<byteme::Writer> writer(new byteme::GzipFileWriter(filepath, {}));
byteme::SerialBufferedWriter<char> pb(std::move(writer), /* buffer_size = */ 65536);
std::string input("foobarwhee");
for (auto i : input) { // write individual bytes.
    pb.write(i);
}
pb.write(input.c_str(), input.size()); // or write an array.
pb.finish(); // flush everything to file.
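A minimal sketch of the same buffering idea on the writing side, with a hypothetical RecordingSink standing in for the underlying Writer so that the number of write() calls is visible. ToyBufferedWriter is not byteme's SerialBufferedWriter; it just accumulates bytes and forwards them to the sink when the buffer is full or when finish() is called.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sink standing in for a Writer; records each write() call.
struct RecordingSink {
    std::vector<std::string> calls;
    void write(const unsigned char* buffer, std::size_t n) {
        calls.emplace_back(reinterpret_cast<const char*>(buffer), n);
    }
};

// Hypothetical serial buffered writer: small writes accumulate in memory and
// are forwarded to the sink only when the buffer fills up or on finish().
class ToyBufferedWriter {
public:
    ToyBufferedWriter(RecordingSink& sink, std::size_t buffer_size) :
        my_sink(sink), my_capacity(buffer_size) {
        my_buffer.reserve(buffer_size);
    }
    void write(char byte) {
        my_buffer.push_back(static_cast<unsigned char>(byte));
        if (my_buffer.size() >= my_capacity) {
            flush();
        }
    }
    void write(const char* bytes, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            write(bytes[i]);
        }
    }
    void finish() { flush(); }
private:
    void flush() {
        if (!my_buffer.empty()) {
            my_sink.write(my_buffer.data(), my_buffer.size());
            my_buffer.clear();
        }
    }
    RecordingSink& my_sink;
    std::size_t my_capacity;
    std::vector<unsigned char> my_buffer;
};
```

Writing the 20 bytes from the example above through an 8-byte buffer results in three calls to the sink: two full 8-byte flushes and a final 4-byte flush from finish().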

Building projects

CMake using FetchContent

If you're using CMake, you just need to add something like this to your CMakeLists.txt:

include(FetchContent)
FetchContent_Declare(
byteme
GIT_REPOSITORY https://github.com/LTLA/byteme
GIT_TAG master # or any version of interest
)
FetchContent_MakeAvailable(byteme)

Then you can link to byteme to make the headers available during compilation:

# For executables:
target_link_libraries(myexe byteme)
# For libraries
target_link_libraries(mylib INTERFACE byteme)

CMake using find_package()

You can install the library by cloning a suitable version of this repository and running the following commands:

mkdir build && cd build
cmake .. -DBYTEME_TESTS=OFF
cmake --build . --target install

Then you can use find_package() as usual:

find_package(ltla_byteme CONFIG REQUIRED)
target_link_libraries(mylib INTERFACE ltla::byteme)

Manual

If you're not using CMake, the simple approach is to just copy the files in the include/ subdirectory (either directly or with Git submodules) and include their path during compilation with, e.g., GCC's -I.

Adding Zlib support

To support Gzip-compressed files, we also need to link to Zlib. When using CMake, byteme will automatically attempt to find the system Zlib with find_package(). If Zlib is not found, this step is skipped and no Gzip functionality is provided by the library. Users can also set the BYTEME_FIND_ZLIB option to OFF to skip this search and provide their own Zlib.

Further comments

I thought about using C++ streams, much like how the zstr library handles Gzip (de)compression. However, I'm not very knowledgeable about the std::istream interface, so I decided to go with something simpler. Just in case, I did add a byteme::IstreamReader class so that byteme clients can easily leverage custom streams.