MetaKit is an efficient, small-footprint, robust, and platform-independent database library. The name "MetaKit" is used as name for the file format, as well as for the implementation (coded in C++, with interfaces to Python and Tcl). MetaKit 2.0 is freely available as open source, with optional commercial support through my company, Equi4 Software.
This document describes some of the main elements of the design of the file format used in release 2.0 of MetaKit - which reads all prior formats back to the 1.0 release of March 1996.
For general information about MetaKit, you are kindly referred to information on the world wide web - the MetaKit home page is at http://www.equi4.com/metakit/. For the purpose of this document, here is a summary with the capabilities of MetaKit which most affect the design of the file format.
BASIC DATA STRUCTURE
To describe the format used for the datafiles, it is necessary to introduce some terminology and to describe how data is organized. What MetaKit does, is present a more or less relational perspective on data storage.
But the fundamental concept of MetaKit, is that all data is stored column-wise. A view (table) with 10,000 rows (records), consisting of 25 properties (attributes), is stored on file as a data structure pointing to 25 columns of bytes. Each of these columns may well have a different format - depending on their datatype, and each will contain 10,000 entries. Each column is stored as a single contiguous area on file, which can be reallocated (i.e. moved) to accommodate size changes later on.
This column-based data organization has far-reaching consequences for
the implementation. For one, the implementation fully hides
the column-wise structure and present a normal container-like row-wise
API, which matches the conceptual / relational model. Furthermore,
a range of techniques has been implemented to offer high performance.
FILE FORMAT OVERVIEW
Once you switch your mind to this column-wise approach, the format of MetaKit datafiles becomes quite simple to describe:
Note that the amount of information needed to define all views and columns can remain very small. The number of rows in each view affects only the size of each column, not the number of them. This effect is used while opening a datafile: MetaKit quickly reads the header, description, and all column position/size information. From then on, access to data is basically instant - with seeks doing the rest (actually, most of it is memory-mapped file access). To use the previous example: a view with 10,000 rows of 25 properties ends up being managed as 25 columns on file - which is a very simple task.
The row counts and the <offset,size> tuples defining all columns
are stored in an tagged-byte format, with as many bytes as need to represent
each integer value. This format is space efficient for small
files, and is able to adapt to file sizes (and offsets) which are more
than 32 bits. All values are stored consecutively and are loaded
very quickly (this includes the more complex case when subviews are part
of the data structure).
The description stored in each datafile is a string which precisely describes the name of all views and subviews, and the name and type of all properties in these subviews. An example of such a definition string (as stored on file) will make it clear how the information is represented:
Datatypes are currently defined as one of the following:
S - string (null-byte terminated)The above example describes a datafiles with information about companies, each having a variable number of offices, which in turn have a varying number of employees. Other information can be added in the same file, it is not limited to a single view (companies in this case).
I - integer (1..32 bits, adapting at run time)
F - 32-bit floating point
D - 64-bit floating point
B - untyped binary data
M - "memo" binary data (see below)
This string, as well as all strings stored as properties of rows, is
stored in UTF-8 format, which allow storing Unicode strings while maintaining
compatibility of datafiles across all platforms.
STORING BINARY DATA
The file format described so far is useful for small to medium-sized
datasets (see the section on "performance" below), but as column sizes
grow, performance tends to suffer in a number of ways. To meet requests
to store larger amounts of data efficiently, "memo" properties were added
in March 1998. These add a single level of indirection, and allow
storing binary data as individually allocated objects in the datafile.
This mechanism works well when rows hold data items which are each of up
to 1 Mb or so. The current limitation is that each such memo-item
is loaded and saved in one step. A future release will allow partial
reading and writing of such items, and will in fact allow very efficient insertion
and deletion at any position within such items. This will allow
access, and storage of concurrent independently emitted data streams (it's
mostly a matter of designing a proper API to get at the functionality -
which is already present).
FILE FORMAT DETAILS
With all this conceptual information out of the way, it is now possible to describe the file format is somewhat more detail.
FREE SPACE MANAGEMENT AND RECOVERY
The datafile format of MetaKit contains no other information than what has been described so far. Specifically, there is no "free space" pool. This is possible, because free space can be deduced from what is present in the file - it consists of those areas of the file which are not occupied by the information described so far (header, structure def, column defs, columns).
This lack of redundancy improves robustness (the potential for inconsistencies is reduced), is more efficient (less information to carry around), and reduces file size.
Recovery is based on "stable storage" - a very effective technique which makes the datafile resilient against catastrophic failure. It all works by never writing changes to a datafile over current data. All changes are allocated and written to currently unused areas of the file (extending it if necessary). Only changed data is rewritten. On commit, all changes are flushed, and when this has been completed, a single offset in the file header is overwritten (this is atomic) - pointing to the new state. Once the commit is complete, free space in the file is redefined to include old and now obsolete areas of the file, and to exclude the newly allocated data.
To date, with thousands of users known to be using software based on MetaKit in production-grade commercial products, no case has ever been reported of a corrupted data file. MetaKit does not contain any recovery/repair code - there is no need.
The trade-off is that datafiles which get modified can grow to roughly twice their optimum size. Despite this, MetaKit datafile sizes are often smaller than with other approaches. The reasons for this are that MetaKit supports variable-sized data with no overhead, that integers tend to be packed very densely, that there is no overhead per row, and that techniques such as B-trees with partially-filled nodes are not used.
It goes without saying that all unused data areas get re-used when possible
- file size usually increases during the first few commits, and remains more or less
stable after that. For those cases where slack is unacceptable, a
simple full re-save can be used to produce a fresh, optimally-packed datafile.
The column-wise datafile format makes it possible to restructure datafiles - i.e. add or delete properties - very efficiently. All it takes, is reading the structural data (which happens during open anyway), creating or deleting one or more columns, and storing the adjusted structural data. Since no rows are accessed at all, this is virtually instant.
This capability has recently been used to implement a generic database server - it starts out empty, with no idea what data it will store, and it adapts whenever clients connect. Although this is in itself an interesting technical feature, the value lies in long-lived data, since it is now possible to store information in whatever format seems appropriate right now, and to write software which evolves to work with more complex structures overtime. Whenever a datafile is opened, its format can be inspected, applying the transformations which are required to bring it up to date - instantly, during open. Even if the data is read-only, such on-the-fly adaptations can be used for the duration of the run (simply by never committing).
The restructuring capabiltity, and the fact that it can be done almost
instantly, is very useful for long-lived data. It
simplifies the task of writing software which can handle all file formats used
in previous releases (backward compatibility).
Adding or dropping information as new releases of
a product are issued no longer leads to compatibility problems or lengthy
conversions. Furthermore, techniques are being developed to let older
software deal with files created by newer releases (forward compatibility).
a common problem in today's software, which forces users to continuously
upgrade their software to be able to easily exchange files with others.
The performance of this data format is extremely difficult to assess. During the evolution from version 1.0 to the current 2.0 release, improvements have been made of some two orders of magnitude. This is probably not very informative, since it might simply reflect how bad the original design was.
But there are several factors which can help illustrate why this is probably not the case:
Evidently, at some point, it will be better to index by key values than to scan millions of strings for a match. This is where subviews in MetaKit come in handy. By structuring data in a "slightly hierarchical" manner, searches again often end up being over just thousands or tens of thousands of entries. Also, keeping data sorted (in some sense) and then doing binary search can often be an effective option (this is part of the Metakit API).
Sequential searching is well suited for keyword searches and for regex pattern-matching, and by its very nature well suited for ad-hoc queries: all properties are searchable without the need to decide what key indexes to define up front.
Some further explanations why performance is high in MetaKit:
There are some limitations in the current MetaKit implementation, these are currently being addressed:
In the end, the proof is in the pudding. As it turns out, three years of MetaKit use and evolution have shown that a lot of performance can be achieved as a result of reducing file format and implementation complexity. Robustness is achieved through minimizing redundancy, while fail-safety is achieved through a form of controlled duplication.
Both of these have led to a format which is very simple, flexible enough to evolve (half of the datatypes were added after the fact), and which has the capability of being instantly restructured to adapt to new data structures. This latter property of the MetaKit datafile format is what makes it future-proof for application developers - in a very practical sense.
There are some drawbacks to using column-wise storage, but the current
implementation shows that despite this, a number of technical "tricks"
can lead to an implementation (consisting of 16,000 lines of C++) which
shows excellent performance in databases scaling to 10,000
views/subviews or 100,000 rows. Future versions will aim to make
10,000 views/subviews times 100,000 rows efficient.