metakit-fileformat(n) 1.0 "Metakit Database System"

NAME

metakit-fileformat - Metakit File Format

DESCRIPTION

This document specifies the file format used by the metakit database library for persistent storage of its databases. The same format is also used for serialization and subsequent transfer of a database over some communication system, like pipes and sockets.

How the metakit library uses files in the specified format is outside of the scope of this document, although in some sections, hints for specific uses might be given.

To ensure an unambiguous use of all terms inside of this document and when discussing its contents a glossary was created. See section GLOSSARY at the end.

It was decided to specify the format in the form of a grammar in Extended Backus-Naur Form (EBNF) augmented by free-form text to capture the context-sensitive parts of the language. It is assumed that the reader of this document is familiar with EBNF.

BACKGROUND

The background for this specification is metakit (https://www.equi4.com/metakit), a flexible database system developed by Coen Siegerink (mailto:[email protected]).

In contrast to most other systems which handle their data row-wise it manages the data in a column-oriented way, i.e. all data for a single column is handled together. This characteristic is reflected in the file format too.

What would be called tables in other relational database systems are known as views in Metakit. Views consist of columns, which store specific pieces of data in cells. The cells at the same row-index in all columns of a view are called a row.

Another difference is metakit's ability to define subview columns, which are columns where the data in each cell is a complete view in its own right, although they are sharing the same structural definition.

TYPE DEFINITIONS

Metakit supports the following six types for its columns. The key in the list below is the character used by metakit as indicator for that type. See section STRUCTURE DEFINITION for the place in which these characters are used.
S
All entries in a column of this type contain strings.

I
All entries in a column of this type contain integer numbers requiring at most 32 bits of storage space. All entries will use the same number of bits to store their data.

F
All entries in a column of this type contain single precision floating point numbers, each taking up 4 bytes (32 bits) of space.

D
All entries in a column of this type contain double precision floating point numbers, each taking up 8 bytes (64 bits) of space.

B
All entries in a column of this type contain arbitrary binary data of arbitrary length. There is no bit-packing, the data is measured in bytes.

L
All entries in a column of this type contain large integer numbers, each taking up 8 bytes (64 bits) of space.

COLUMN MAPPING

When metakit stores column data into a file or serialization it places them into one or more itemvectors, the physical containers for the data. How many itemvectors are required is dependent on the type of the column, and on the data contained in it.

This section describes only the basic mapping required to create the table of contents (See TableOfContents), and none of the secondary itemvectors indirectly reachable through the primary itemvectors of a column listed in the table of contents.

I, L, F, D
A single primary itemvector is used to store all column data.

S, B
Depending on the size of the string/binary data stored in the entries of a column either two or three primary itemvectors are used to store the column data. In addition secondary itemvectors may be reached through these, holding the actual string/binary data.
The exact contents of each itemvector are described in the upcoming grammar. See IVecData and its variants.

VARIABLE SIZED DATA

One of the consequences of using a column-wise representation for views is that for any insertion, deletion, or change of a row the system has to relocate and copy all itemvectors for all the columns in the view. This is not so big a problem for data of a fixed size, like for the types I, F, D, and L. For them this operation is only invoked when inserting or deleting row. Changing the value of a cell invokes only the relocation and copying of the itemvectors for one column, and they tend to be relatively small.

This situation changes when data of varying and arbitrary length is involved, be it strings or just binary data (types S and B). For them the simple method of storing all the data in one itemvector and the sizes of the items in a second scales badly as even minuscule changes cause the copying of large amount of data.

To evade this trap the file format uses a slightly more complex method. Instead of only two itemvectors it employs three. The first two are the same ones as for the simple method, with a small change. While the first itemvector contains the sizes for all items, the second itemvector contains only the data for the items with a size > 0. Items whose size is recored as zero are not stored in the second itemvector, but are indirectly reachable through the third itemvector, a catalog. Each of the third itemvector's items records the location of another itemvector in the file on the one hand, and information determining to which row in the column the item belongs to. In other words, how to interleave the items reachable through the catalog with the items in the first two itemvectors to reconstruct their proper order at the logical level of the column.

With the above structure in place any writer of a database is now free in his decision where to actually place the variable sized data of a cell when writing to the file. Namely either directly into the second itemvector, or into a block of its own with the location of that block recorded in the catalog vector. By storing smaller data directly and larger data indirectly the performance impact of the large data is reduced considerably, because now only the itemvector containing the catalog has to be copied for changes, whereas the large data blocks often can be left in the location initially given to them.

The relevant symbols of the grammar are IVecCatalogData and VariableMapping. See section FORMAT GRAMMAR for their definition.

 
  For the example let us assume that we have items 0 and 3 and 6 all
  having small amounts of data, items 1 and 4 are empty, and 2 is a
  larger memo item. Then the situation would be:
  Column 0, the data: concatenated contents of items 0, 3, and 6.
  Column 1, the sizes: sizes for entries 0, 3, and 6, the rest zeroes.
  Column 2, the memos:
    2 as byte-packed integer, meaning skip 2 rows
    the size of the data in row 2, as byte-packed integer
    the pointer to the data in row 2, as byte-packed integer

LEXICAL UNITS

The lexical units of the grammar used here are are the fundamental pieces making up a metakit file or serialization. This unit is the byte, containing 8 bits.

FORMAT GRAMMAR

The grammar is written in a bottom up format. This means that the more basic elements are specified first, and the specification of the complete database is the last element.
Word
::= byte byte

A 16-bit word consists of two bytes. The endianess of words is variable. When reading, the metakit library determines the actual endianess from the marker in the Header. When writing the metakit library uses the native endianess of the host on which it is running.

Long
::= byte byte byte byte

A 32-bit long word consists of four bytes (or two words). The endianess of long words is variable. When reading, the metakit library determines the actual endianess from the marker in the Header. When writing the metakit library uses the native endianess of the host on which it is running.

bpInt
::= [ bpiSignByte ] { bpiDataByte } bpiStopByte

The name is a shortcut for byte-packed integer. It is a notation for storing arbitrarily large integer numbers in a very compact way. Note that any number is always stored in the most compact way possible. This means that leading zeros are always stripped down as much as possible. In other words, no instance of bpInt will contain a bpiDataByte of value 0x00 at its beginning. This definition is necessary to allow unambiguous separation of bpiSignByte and bpiDataByte. bpInt's are always stored in big-endian format. In other words, the MSB is stored first, followed by the lesser bits.

bpiSignByte
::= byte /= 0x00

The sign byte is an optional part of instances of bpInt. Its presence indicates that the following bpiDataByte and bpiStopByte contain a negative number and that the actual value in 2-complement representation is obtained by negating all bits in the expanded number (1-complement).

bpiDataByte
::= byte /in 0x00 .. 0x7f

The meat of any instance of bpInt. Bit 7 is set to 0, indicating that more bytes follows. Bit 6 to 0 are the payload, a fragment of the 2-complement representation of the number stored in the bpInt.

bpiStopByte
::= byte /in 0x80 .. 0xff

The last (and possibly the only) byte in any instance of bpInt. Bit 7 is set to 1, as indicator that this is the last byte in bpInt. Bit 6 to 0 are the payload, the last fragment of the 2-complement representation of the number stored in the instance of bpInt.


Examples:

 
        0x80 : First is last byte,
               payload = 00000000/2 = 0/10
        0x00 0x80 : Negative number
                    payload      = 0000000/2
                    1-complement = 1111111/0 = -1/10
        0x94 : First is last byte.
               payload 0011000/2 = 20/10
        0x00 0x94 : Negative number
                    payload      = 0011000/2
                    1-complement = 1100111/2 = -21/10



Char
::= { byte }{1,4}

A character in UTF-8 encoding, taking up one to four bytes.

pString
::= bpInt { Char }

A string of characters, with prefixed length information. This is the Pascal notation, hence the p. Note that the length is specified as the number of bytes in the string, and not the number of characters. However as the contents have to be a sequence of valid UTF-8 characters this symbol was used in the definition instead of the more generic byte.

cString
::= { Char } '\0'

A string of characters, terminated by a null-byte. This is C notation.

UnknownLong
::= Long

Usage in a definition implies that the purpose of the element is not known.

Header
::= Magic Valid HeaderType FooterLocation

Magic
::= byte byte /= (0x4a 0x4c = "JL" | 0x4c 0x4a = "LJ")

These two magic bytes signal the beginning of a metakit database, and additionally the endianess of Word and Long values. The first form, "JL", indicates a little endian format, the other one indicates a big endian format.

Valid
::= byte /= 0x1a Indicates that the header is valid. No other value is known.

HeaderType
::= byte /in { 0x00, 0x80 }

The value 0x00 indicates a new-style header whereas 0x80 signals usage of an old-style header. The format of old-style headers is not known, therefore the presence of 0x80 can be treated as an error.

FooterLocation
::= Long

A pointer to the Footer of the database. Through that the TableOfContents can be reached and from there all of the data stored in the file or serialization.
Footer
::= UnknownLong HeaderLocation UnknownLong TOCLocation

HeaderLocation
::= Long

Offset to the beginning of the metakit Header in the file. The offset is counted backwards from the beginning of the footer. As this is 16 bytes from the end of the Serialization, the actual location of the beginning of the header is "HeaderLocation + 16" bytes backward from the end of the serialization.

TOCLocation
::= Long Pointer to the beginning of the TableOfContents in the metakit file. Usable only after HeaderLocation was decoded into the absolute location of the beginning of the metakit header in the file. It is expected that the end of the TableOfContents coincides with the beginning of the Footer.


Because of the two offsets in the footer it is possible to append a metakit database to any file and still be able to access all of its contents. It is this property which makes the usage of metakit databases as a file system for starkits and starpacks possible.

IVecData
::= IVecIntData | IVecFloatData | IVecDoubleData | IVecLongData | IVecBinaryData | IVecCatalogData | IVecSubviewData

IVecIntData
::= { byte }

The itemvector is used to compactly store integer numbers in 2-complement representation. Depending on the maximal absolute value over all numbers in the itemvector the metakit library will allocate 1, 2, 4, 8, 16, or 32 bits per item, for all items. The latter means that all items in this itemvector will have the same size.

How many bits per item were allocated can be computed from the number of rows in the view containing the associated column and the size of the itemvector in bytes.

The data of all items is stored back to back as usual, with a little twist. Multiple items are packed into a single byte if the number of bits per item is less than eight.

This format is called adaptive integers. It should not be confused with octet-packed integers (bpInt).

IVecIntData itemvectors are used to store the data of integer columns (type I), and also, in conjunction with IVecCatalogData and IVecBinaryData, to store the data of binary and string columns (types S and B). See section
VARIABLE SIZED DATA.

IVecFloatData
::= { Long }

Each item is a single precision floating point number stored in 4 bytes. The items are stored back to back. The vectors are used to store the data for columns of type F.

IVecDoubleData
::= { {Long}{2} }

Each item is a double precision floating point number stored in 8 bytes. The items are stored back to back. The vectors are used to store the data for columns of type D.

IVecLongData
::= { {Long}{2} }

Each item is a 64-bit integer number in 2-complement representation stored in 8 bytes. The item are stored back to back. The vectors are used to store the data for columns of type L.

IVecBinaryData
::= { byte } | cString

Each item is a variable-sized block of bytes. The length of each item is stored in a separate itemvector, an instance of IVecIntData. The connection between the two itemvectors is made through an instance of VariableMap.

The items are always stored back to back. However in the case of strings each item is actually a cString.

IVecCatalogData
::= { SkipCount OutlineRef } { IVecBinaryData }

See also section VARIABLE SIZED DATA.

The part containing additional binary data is present only if one or more of the indirect data-blocks listed in the catalog coming before it are marked as having location 0. Note that the size of this binary section is never counted in the size of the itemvector itself. In that respect the binary data is not part of the itemvector, although when reading them it is easier to pretend that they are part of the itemvector. This makes it possible to determine the border between the catalog items and the associated binary data without complicated calculations.

Each such itemvector is associated with two other itemvectors holding the information for the directly stored data of their column (size and the data itself)). This association is made through an instance of VariableMap.

SkipCount
::= bpInt

Declares how many items to skip in the associated itemvector for the directly stored data to reach this item.

OutlineRef
::= IVecRef

This refers to the block of bytes actually holding the data for the described cell. This block is an instance of IVecBinaryData. The referred itemvector contains only the data for this item and nothing else. If the location is 0, then the data is not stored in a separate itemvector, but inside of the IVecBinaryData section of the catalog itemvector itself.

IVecSubviewData
::= { SubviewMap }

Each item in the vector represents a whole subview and does so by containing a mapping from the columns the subview consists of to the itemvectors storing the subview data for these columns. See Mapping for the definition of symbol SubviewMap.

IVecRef
::= IVecSize [ IVecLocation ]

IVecSize
::= bpInt

Size of the referenced itemvector, in bytes. If the value is zero a location is irrelevant and not stored.

IVecLocation
::= bpInt

Pointer to the beginning of the itemvector. Not present if the itemvector is empty (size == 0). See IVecData for the legal internal structures of itemvectors.

ColumnMap
::= FixedMap | VariableMap

See also section COLUMN MAPPING.

FixedMap
::= IVecRef

A single itemvector is used to store the column data for the types I, L, F, and D. There are no secondary itemvectors. The IVecRef instance will point to instances of IVecIntData, IVecFloatData, IVecDoubleData, or IVecLongData, as directed by the concrete type.

VariableMap
::= IVecRef IVecRef IVecRef

Two or three primary itemvectors are required when storing string or binary data (Types S and B).

The first two itemvectors contain all the data which is considered small enough to be stored together. The third itemvector is present only if the column contains rather large data in one or more cells. See also section VARIABLE SIZED DATA.

The first reference is to an instance of IVecDinaryData. It contains the directly stored data of the column. The sizes of the items in this itemvector are given through the second itemvector for the column. It is always present.

The second reference is to an instance of IVecIntData. It contains the item sizes for all the directly data of the column. This reference is absent if the itemvector for directly stored data (see above) is empty (size == 0).

The third reference is to an instance of IVecCatalogData. It contains the information to be able to reach all data of the column which is not stored directly in the first itemvector. This data is called indirect because a catalog, this itemvector, is required to find its actual locations. The reference is always present, even if the column does not possess indirectly stored data.

SubviewMap
::= SubMarker NumberOfRows [ Mapping ]

The Mapping is similar to the one used for the main views in the database. Per column in the subview we have one mapping, as declared by the selected structure of the subview. The whole mapping is not present if the subview is empty (number of rows == 0).

SubMarker
::= bpInt /= 0

Purpose unknown.

NumberOfRows
::= bpInt

The number of rows in the subview.

TableOfContents
::= TocMarker StructureDefinition ViewSizes Mapping

TocMarker
::= bpInt /= 0

Because of the constant value for the marker the exact byte representation is also known:

byte /= 0x80

StructureDefinition
::= pString

This string contains the structural definitions for all views in the database. See section STRUCTURE DEFINITION for the syntax of its contents.

ViewSizes
::= { bpInt }

This section contains one octet-packed integer per view in the database (subviews not counted), in the same order as the views were defined in StructureDefinition. The stored value declares the number of rows contained in the view. This number is the same across all columns in the view.

Mapping
::= { ColumnMap }

This section contains one mapping of a column to its primary itemvectors per non-empty view (number of rows greater than zero) in the database (subviews not counted) and column in such a view.

DataArea
::= { IVecData | Hole }

The main part of any metakit file contains the data for the various itemvectors required to store column data, and possible holes, i.e unused, free space.

Hole
::= { byte }

Holes are not explicitly listed anywhere, but implicitly determined as the areas of the file not reached through any of the itemvector references in the table of contents. Because of that their size cannot be declared in the grammar.

Serialization
::= Header DataArea TableOfContents Footer

STRUCTURE DEFINITION

This section describes the syntax of the string containing the structural definition of all views in the database. See StructureDefinition in section
FORMAT GRAMMAR for its place in the file. Like for the whole format itself we use a grammar in EBNF format to specify the syntax. Note however that the symbols used here are distinct from the symbols used for the file format, even if they have the same name.

In contrast to the overall file format the lexical unit here is a single UTF-8 character (Char).

Database
::= View { ',' View }

View
::= ViewName '[' Columns ']'

ViewName
::= { Char\'[' }{1,}

Columns
::= Column { ',' Column }

Column
::= (ColumnName ':' ColumnType) | View

ColumnName
::= { Char\':' }{1,}

ColumnType
::= 'S' | 'I' | 'F' | 'D' | 'B' | 'L'

Note that the characters used here correspond to the type indicator characters introduced in section TYPE DEFINITIONS

GLOSSARY

database
A metakit database, consisting of a set of views.

view
In standard database speak a table, holding data belonging together. In abstract terms a matrix of cells, sliced into vertical columns and horizontal rows. All cells in a single column have the same type, whereas the cells in a row can be of different types, and are also the fundamental unit of data belonging together.

column
A vertical section of the data in a table. Its characteristic attribute is the type, shared by all cells in a column.

row
A horizontal section of the data in a table. The cells in a row belong together in some way.

cell
The fundamental container for holding data in a table. Arranged in a grid of rows and columns.

subview
A special type of column where the data in each cell is a view in its own right.

type
An attribute of columns, it defines the structure of the data stored in the cells of a column. In ambiguous situations this term can be qualified as column type. Examples of types are

  • floating point number
  • integer number
  • string
  • subview
  • ...
property
Properties to columns are like classes to instances. They exist independent of columns, and each column belongs to a property. The only information a property currently holds is its name. This name is inherited by all columns belonging to the property, and actually defines the relationship. All columns with name FOO belong to the property with the same name. The type of a column is not an attribute of the property, but of the column itself.

itemvector
Physical container used to store column data. Depending on the type and data contained in each column in a view metakit will use one or more itemvectors to save all the relevant information. Each itemvector is made up of items stored back to back.

item
Fundamental unit for storing data. A contiguous group of items is called itemvector. The data of a cell in a column is stored in a single item if and only if the column is represented by a one itemvector. Else the data of a cell will be spread over multiple items. The items for a cell will be associated through the row index in their respective containers. In other words, the items of a cell all have the same row index.

indirect
See section VARIABLE SIZED DATA.

offset
An offset is a reference to another location in the metakit file or serialization. It is counted either forward or backward from a specific anchor location in the metakit file or serialization. Both direction and anchor location are part of the specification of an offset value.

pointer
A pointer is an offset counted forward from the beginning of the metakit file or serialization. The beginning itself is defined to be at pointer offset 0.

KEYWORDS

metakit, database, vector storage, column-oriented

COPYRIGHT

Copyright (c) 1996-2003 Jean Claude Wippler <[email protected]>
Copyright (c) 2003 Andreas Kupries <[email protected]>