//! Rust bindings to the [mtbl](https://github.com/farsightsec/mtbl) library for
//! dealing with SSTables (immutable sorted map files).
//!
//! SSTables (Sorted String Tables) are essentially constant on-disk maps from
//! `[u8]` to `[u8]`, like those used by
//! [CDB](http://www.corpit.ru/mjt/tinycdb.html) (which also has [Rust
//! bindings](https://github.com/andrew-d/tinycdb-rs)), except that they use
//! sorted maps instead of hash maps. SSTables are well suited to storing the
//! output of MapReduce jobs or other batch results for easy lookup later.
//!
//! Version 0.2.0 of this library is a rather literal translation of the mtbl C
//! API. Later versions may change the API to be friendlier and more idiomatic
//! Rust.
//!
//! # Usage
//!
//! ## Creating a database
//!
//! ```
//! // Create a database, using a Sorter instead of a Writer so we can add
//! // keys in arbitrary (non-sorted) order.
//! {
//!     use mtbl::{Sorter, Write};
//!     let mut writer = mtbl::Sorter::create("data.mtbl");
//!     writer.add("key", "value");
//!     // Data is flushed to the file when the writer/sorter is dropped.
//! }
//! ```
//!
//! ## Reading from a database
//!
//! ```
//! use mtbl::{Read, Reader};
//! let reader = mtbl::Reader::open("data.mtbl");
//! // Get one element.
//! let val: Option<Vec<u8>> = reader.get("key");
//! assert_eq!(val, Some(b"value".to_vec()));
//! // Or iterate over all entries; each item is a (Vec<u8>, Vec<u8>) pair.
//! for (key, value) in &reader {
//!     println!("{:?} => {:?}", key, value);
//! }
//! ```
//!
//! # More details about MTBL
//!
//! Quoting from the MTBL documentation:
//!
//! > mtbl is not a database library. It does not provide an updateable
//! > key-value data store, but rather exposes primitives for creating,
//! > searching and merging SSTable files. Unlike databases which use the
//! > SSTable data structure internally as part of their data store, management
//! > of SSTable files -- creation, merging, deletion, combining of search
//! > results from multiple SSTables -- is left to the discretion of the mtbl
//! > library user.
//!
//! > mtbl SSTable files consist of a sequence of data blocks containing sorted
//! > key-value pairs, where keys and values are arbitrary byte arrays. Data
//! > blocks are optionally compressed using zlib or the Snappy library. The
//! > data blocks are followed by an index block, allowing for fast searches
//! > over the keyspace.
//!
//! > The basic mtbl interface is the writer, which receives a sequence of
//! > key-value pairs in sorted order with no duplicate keys, and writes them
//! > to data blocks in the SSTable output file. An index containing offsets to
//! > data blocks and the last key in each data block is buffered in memory
//! > until the writer object is closed, at which point the index is written to
//! > the end of the SSTable file. This allows SSTable files to be written in a
//! > single pass with sequential I/O operations only.
//!
//! > Once written, SSTable files can be searched using the mtbl reader
//! > interface. Searches can retrieve key-value pairs based on an exact key
//! > match, a key prefix match, or a key range. Results are retrieved using a
//! > simple iterator interface.
//!
//! > The mtbl library also provides two utility interfaces which facilitate a
//! > sort-and-merge workflow for bulk data loading. The sorter interface
//! > receives arbitrarily ordered key-value pairs and provides them in sorted
//! > order, buffering to disk as needed. The merger interface reads from
//! > multiple SSTables simultaneously and provides the key-value pairs from
//! > the combined inputs in sorted order. Since mtbl does not allow duplicate
//! > keys in an SSTable file, both the sorter and merger interfaces require a
//! > caller-provided merge function which will be called to merge multiple
//! > values for the same key. These interfaces also make use of sequential I/O
//! > operations only.
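//!
//! The role of the merge function described above can be illustrated with a
//! minimal sketch in plain Rust. It does not call the mtbl API: `Entry` and
//! `merge_sorted` are hypothetical names used only for illustration, and the
//! concatenating closure is just one possible merge policy. It assumes both
//! inputs are already sorted by key and free of duplicate keys.
//!
//! ```rust
//! use std::cmp::Ordering;
//!
//! // Hypothetical type for illustration: one (key, value) pair of byte strings.
//! type Entry = (Vec<u8>, Vec<u8>);
//!
//! // Merge two key-sorted, duplicate-free inputs into one sorted output,
//! // calling `merge(key, value_a, value_b)` when both inputs share a key.
//! fn merge_sorted<F>(a: &[Entry], b: &[Entry], merge: F) -> Vec<Entry>
//! where
//!     F: Fn(&[u8], &[u8], &[u8]) -> Vec<u8>,
//! {
//!     let mut out = Vec::with_capacity(a.len() + b.len());
//!     let (mut i, mut j) = (0, 0);
//!     while i < a.len() && j < b.len() {
//!         match a[i].0.cmp(&b[j].0) {
//!             Ordering::Less => {
//!                 out.push(a[i].clone());
//!                 i += 1;
//!             }
//!             Ordering::Greater => {
//!                 out.push(b[j].clone());
//!                 j += 1;
//!             }
//!             Ordering::Equal => {
//!                 // Same key in both inputs: keep one entry with a merged value.
//!                 let merged = merge(&a[i].0, &a[i].1, &b[j].1);
//!                 out.push((a[i].0.clone(), merged));
//!                 i += 1;
//!                 j += 1;
//!             }
//!         }
//!     }
//!     // At most one of these tails is non-empty.
//!     out.extend_from_slice(&a[i..]);
//!     out.extend_from_slice(&b[j..]);
//!     out
//! }
//!
//! let a = vec![(b"a".to_vec(), b"1".to_vec()), (b"c".to_vec(), b"3".to_vec())];
//! let b = vec![(b"b".to_vec(), b"2".to_vec()), (b"c".to_vec(), b"4".to_vec())];
//! // Example policy: concatenate the two values for a duplicated key.
//! let merged = merge_sorted(&a, &b, |_key, v0, v1| [v0, v1].concat());
//! assert_eq!(merged.len(), 3);
//! assert_eq!(merged[2], (b"c".to_vec(), b"34".to_vec()));
//! ```
//!
//! Because both inputs are sorted, the merge is a single sequential pass, and
//! the policy for duplicate keys stays entirely in the caller's hands.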
//!
//! # Why prefer MTBL over CDB or other constant databases?
//!
//! * Storing data in sorted order makes merging files easy.
//! * Compression is built-in (options: [zlib](http://www.zlib.net/) and
//!   [snappy](https://github.com/google/snappy)).
//! * The library code is a little more modern and uses mmapped files to have
//!   a properly immutable (and therefore thread-safe) representation -- it
//!   doesn't go mucking about with file pointers.

#![crate_name = "mtbl"]
#![crate_type = "lib"]
#![warn(missing_docs)]
#![warn(non_upper_case_globals)]
#![warn(unused_qualifications)]

extern crate libc;
extern crate mtbl_sys;

mod fileset;
mod merger;
mod reader;
mod sorter;
mod writer;

pub use fileset::Fileset;
pub use fileset::FilesetOptions;
pub use merger::MergeFn;
pub use merger::Merger;
pub use reader::ReaderOptions;
pub use reader::Read;
pub use reader::Reader;
pub use sorter::SorterOptions;
pub use sorter::Sorter;
pub use writer::WriterOptions;
pub use writer::CompressionType;
pub use writer::Write;
pub use writer::Writer;