Buzzwords - Metakit is an embeddable database which runs on Unix, Windows, Macintosh, and other platforms. It lets you build applications which store their data efficiently, in a portable way, and which will not need a complex runtime installation. In terms of the data model, Metakit takes the middle ground between RDBMS, OODBMS, and flat-file databases - yet it is quite different from each of them.
Technology - Everything is stored variable-sized yet with efficient positional row access. Changing an existing datafile structure is as simple as re-opening it with that new structure. All changes are transacted. You can mix and match software written in C++, Python, and Tcl. Things can't get much more flexible...
Python - The extension for Python is called "Mk4py". It provides a lower-level API for the Metakit C++ core extension than an earlier version of this interface did, and uses SCXX by Gordon McMillan as the C++ glue interface.
Mk4py 2.4.9.6 - is a final/production release. The homepage points to a download area with pre-compiled shared libraries for Unix, Windows, and Macintosh. The Metakit source distribution includes this documentation, the Mk4py C++ source code, a "MkMemoIO.py" class which provides efficient and fail-safe I/O (therefore also pickling) using Metakit memo fields, and a few more goodies.
License and support - Metakit 2 and up are distributed under the liberal X/MIT-style open source license. Commercial support is available through an Enterprise License. See the license page for details.
Credits - Are due to Gordon McMillan for not stopping at the original Mk4py and coming up with a more Pythonic interface, and to Christian Tismer for pushing Mk4py way beyond its design goals. Also to GvR and the Python community for taking scripting to such fascinating heights...
Updates - The latest version of this document is at
https://www.equi4.com/metakit/python.html
The terms adopted by Metakit can be summarized as follows:
A few more comments about the semantics of Metakit:
Create a database:
    import metakit
    db = metakit.storage("datafile.mk",1)
Create a view (this is the Metakit term for "table"):
    vw = db.getas("people[first:S,last:S,shoesize:I]")
Add two rows (this is the Metakit term for "record"):
    vw.append(first='John',last='Lennon',shoesize=44)
    vw.append(first='Flash',last='Gordon',shoesize=42)
Commit the changes to file:
    db.commit()
Show a list of all people:
    for r in vw: print r.first, r.last, r.shoesize
Show a list of all people, sorted by last name:
    for r in vw.sort(vw.last): print r.first, r.last, r.shoesize
Show a list of all people with first name 'John':
    for r in vw.select(first='John'): print r.first, r.last, r.shoesize
Check which version of Metakit is loaded:
    import metakit
    print metakit.version
SYNOPSIS
ADDITIONAL DETAILS
- db = metakit.storage()
- Create an in-memory database (can't use commit/rollback)
- db = metakit.storage(file)
- Use a specified file object to build the storage on
- db = metakit.storage(name, mode)
- Open the named file. Mode 0 opens it read-only, mode 1 opens it read/write (the file cannot be shared), and mode 2 opens it in commit-extend mode. In modes 1 and 2 the file is created if it does not exist.
- vw = metakit.view()
- Create a standalone view; not in any storage object
- pr = metakit.property(type, name)
- Create a property (a column, when associated to a view)
- vw = metakit.wrap(sequence, proplist, byPos=0)
- Wraps a Python sequence as a view
storage - When given a single argument, the file object must be a real stdio file, not a class implementing the file r/w protocol. When the storage object is destroyed (such as with 'db = None'), the associated datafile will be closed. Be sure to keep a reference to it around as long as you use it.
wrap - This call can be used to wrap any Python sequence. It assumes that each item is either a dictionary or an object with attribute names corresponding to the property names. Alternatively, if byPos is nonzero, each item can be a list or tuple, and its fields will then be accessed by position instead. Views created in this way can be used in joins and any other view operations.
ADDITIONAL DETAILS
- vw = storage.getas(description)
- Locate, define, or re-define a view stored in a storage object
- vw = storage.view(viewname)
- The normal way to retrieve an existing view
- storage.rollback(full=0)
- Revert data and structure to the state last committed to disk. In commit-aside mode, a "full" rollback reverts to the state of the original file and forgets about the aside file.
After a rollback, your view objects are invalid (use the view or getas methods on your storage object to get them back). Furthermore, after a full rollback, the aside storage is detached from the main storage. Use the aside method on your main storage object to reattach it. If you do not reattach it, further commits will (try to) write to the main storage.
- storage.commit(full=0)
- Permanently commit data and structure changes to disk. In commit-aside mode, a "full" commit saves the latest state in the original file and clears the aside datafile.
- ds = storage.description(viewname='')
- The description string is described under getas
- vw = storage.contents()
- Returns the View which holds the meta data for the Storage.
- storage.autocommit()
- Commit changes automatically when the storage object goes away
- storage.load(fileobj)
- Replace storage contents with data from file (or any other object supporting read)
- storage.save(fileobj)
- Serialize storage contents to file (or any other object supporting write)
description - A description of the entire storage is returned if no viewname is specified, otherwise just the specified top-level view.
getas - Side-effects: the structure of the view is changed.
Notes: Normally used to create a new View, or alter the structure of an existing one.
A description string looks like:
    "people[name:S,addr:S,city:S,state:S,zip:S]"
That is:
    "<viewname>[<propertyname>:<propertytype>...]"
Where the property type is one of:
    I   adaptive integer (becomes Python int)
    L   64-bit integer (becomes Python long)
    F   C float (becomes Python float)
    D   C double (is a Python float)
    S   C null terminated string (becomes Python string)
    B   C array of bytes (becomes Python string)
Careful: do not include white space in the description string.
In the Python binding, the difference between S and B types is not as important as in C/C++, where S is used for zero-terminated text strings. In Python, the main distinctions are that B properties must be used if the data can contain zero bytes, and that the sort order of S (stricmp) and B (memcmp) differ. At some point, Unicode/UTF-8 will also play a role for S properties, so it's best to use S for text.
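The description format above can be illustrated with a small parser. This is only a sketch in plain Python: `parse_description` and `PROPERTY_TYPES` are hypothetical names, not part of Mk4py, and the sketch assumes a flat description in which every property carries an explicit type code.

```python
import re

# Type codes from the list above.
PROPERTY_TYPES = set("ILFDSB")

def parse_description(description):
    """Split '<viewname>[<prop>:<type>,...]' into (viewname, [(prop, type), ...])."""
    if any(ch.isspace() for ch in description):
        raise ValueError("description strings must not contain white space")
    match = re.match(r"(\w+)\[(.*)\]$", description)
    if match is None:
        raise ValueError("expected '<viewname>[<prop>:<type>,...]'")
    viewname, body = match.groups()
    props = []
    for field in body.split(","):
        propname, sep, proptype = field.partition(":")
        # This sketch insists on an explicit, known type code.
        if not sep or proptype not in PROPERTY_TYPES:
            raise ValueError("bad property definition: %r" % field)
        props.append((propname, proptype))
    return viewname, props

name, props = parse_description("people[name:S,addr:S,city:S,state:S,zip:S]")
```

Here `name` is the view name and `props` lists each property with its type code, in column order.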
    r = view[0]
    r.name = 'Julius Caesar'
    view[0].name # will yield 'Julius Caesar'
A slice returns a modifiable view which is tied to the underlying view. As a special case, however, you can create a fresh empty view with the same structure as another view with:
    v2 = v[0:0]
Setting a slice changes the view:
    v[:] = [] # empties the view
View supports getattr, which returns a Property (e.g. view.shoesize can be used to refer to the shoesize column). Views can be obtained from Storage objects:
    view = db.view('inventory')
or from other views (see select, sort, flatten, join, project...), or empty, columnless views can be created:
    vw = metakit.view()
SYNOPSIS
ADDITIONAL DETAILS
- view.insert(index, obj)
- Coerce object to a Row and insert at index in View
- ix = view.append(obj)
- Object is coerced to Row and added to end of View
- view.delete(index)
- Row at index removed from View
- lp = view.structure()
- Return a list of property objects
- cn = view.addproperty(prop)
- Define a new property, return its column position
- str = view.access(byteprop, rownum, offset, length=0)
- Get (partial) byte property contents
- view.modify(byteprop, rownum, string, offset, diff=0)
- Store (partial) byte property contents. A non-zero value of diff removes (<0) or inserts (>0) bytes.
- n = view.itemsize(prop, rownum=0)
- Return size of item (rownum only needed for S/B types). With integer fields, a result of -1/-2/-4 means 1/2/4 bits per value, respectively.
- view.map(func, subset=None)
- Apply func to each row of view, or (if subset is specified) to each row in view that is also in subset. Func must have the signature "func(row)", and may mutate row. Subset must be a subset of view: e.g. "customers.map(func, customers.select(...))".
- rview = view.filter(func)
- Return a view containing the indices of those rows satisfying func. Func must have signature "func(row)" and must return a false value to omit the row.
- obj = view.reduce(func, start=0)
- Return the result of applying func(row, lastresult) to each row in view.
- view.remove(indices)
- Remove from view all rows whose indices are listed in indices. Not the same as minus, because unique is not required, and view is not reordered.
- rview = view.indices(subset)
- Returns a view containing the indices in view of the rows in subset.
- rview = view.copy()
- Returns a copy of the view.
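The calling conventions of filter, reduce, and remove listed above can be modelled on ordinary Python lists. This is a sketch of the described semantics only, not Mk4py code: the `model_*` functions are hypothetical, rows are plain dictionaries, and index results are plain lists where Mk4py returns views.

```python
rows = [
    {"first": "John", "last": "Lennon", "shoesize": 44},
    {"first": "Flash", "last": "Gordon", "shoesize": 42},
    {"first": "Julius", "last": "Caesar", "shoesize": 40},
]

def model_filter(view, func):
    """Indices of the rows for which func(row) returns a true value."""
    return [i for i, row in enumerate(view) if func(row)]

def model_reduce(view, func, start=0):
    """Fold func(row, lastresult) over all rows, beginning with start."""
    result = start
    for row in view:
        result = func(row, result)
    return result

def model_remove(view, indices):
    """Delete the rows at the given indices in place; the rest keep their order."""
    doomed = set(indices)
    view[:] = [row for i, row in enumerate(view) if i not in doomed]

big = model_filter(rows, lambda row: row["shoesize"] > 41)
total = model_reduce(rows, lambda row, last: last + row["shoesize"])
model_remove(rows, big)
```

Note the argument order of the reduce callback: the row comes first and the running result second, which is the reverse of Python's built-in reduce.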
addproperty - This adds properties which do not persist when committed. To make them persist, you should use storage.getas(...) when defining (or restructuring) the view.
append - Also supports keyword args (colname=value...).
insert - Coercion to a Row is driven by the View's columns, and works for:
    dictionaries (column name -> key)
    instances (column name -> attribute name)
    lists (column number -> list index) - watch out!
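The three coercion rules can be sketched in plain Python. `coerce_to_row`, `COLUMNS`, and `Person` are hypothetical names used for illustration; the sketch represents a row as a dictionary and simply omits fields the object does not supply, which is an assumption rather than documented Mk4py behaviour.

```python
COLUMNS = ["first", "last", "shoesize"]  # the view's columns, in definition order

class Person(object):
    def __init__(self, first, last, shoesize):
        self.first = first
        self.last = last
        self.shoesize = shoesize

def coerce_to_row(obj, columns=COLUMNS):
    """Build a row (modelled as a dict) from a dict, a sequence, or an instance."""
    if isinstance(obj, dict):
        # column name -> key
        return dict((name, obj[name]) for name in columns if name in obj)
    if isinstance(obj, (list, tuple)):
        # column number -> list index
        return dict(zip(columns, obj))
    # column name -> attribute name
    return dict((name, getattr(obj, name)) for name in columns if hasattr(obj, name))

from_dict = coerce_to_row({"first": "John", "last": "Lennon", "shoesize": 44})
from_instance = coerce_to_row(Person("John", "Lennon", 44))
from_list = coerce_to_row(["John", "Lennon", 44])
swapped = coerce_to_row(["Lennon", "John", 44])  # wrong order, no error raised
```

The last call shows why lists deserve the "watch out": values are bound purely by position, so a list in the wrong order silently fills the wrong columns.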
ADDITIONAL DETAILS
- vw = view.select(criteria...)
- Return a view of the rows whose fields match the given criteria
- vw = view.select(low, high)
- Return a view with rows in the specified range (inclusive)
- vw = view.sort()
- Sort view in "native" order, i.e. the definition order of its keys
- vw = view.sort(property...)
- Sort view in the specified order
- vw = view.sortrev((propall...), (proprev...))
- Sort view in specified order, with optionally some properties in reverse
- vw = view.project(property...)
- Returns a derived view with only the named columns
select - Example selections, returning the corresponding subsets:
    result = inventory.select(shoesize=44)
    result = inventory.select({'shoesize':40},{'shoesize':43})
    result = inventory.select({},{'shoesize':43})
The derived view is "connected" to the base view. Modifications of rows in the derived view are reflected in the base view.
sort - Example, returning the sorted permutation:
    result = inventory.sort(inventory.shoesize)
See the notes for select concerning changes to the sorted view.
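The two forms of select, exact match and inclusive range, can be modelled on a list of dictionaries. This is a sketch, not Mk4py itself: `model_select` is a hypothetical helper, and it assumes that an empty dictionary means "no bound on that side", which is how the third select example appears to be intended.

```python
inventory = [
    {"item": "boot", "shoesize": 40},
    {"item": "clog", "shoesize": 42},
    {"item": "pump", "shoesize": 44},
]

def model_select(rows, low=None, high=None, **criteria):
    """Exact match via keyword criteria, or inclusive range via two dicts."""
    if criteria:
        # An exact match is a range whose two ends coincide.
        low = high = criteria
    low = low or {}
    high = high or {}

    def matches(row):
        return (all(row[name] >= value for name, value in low.items()) and
                all(row[name] <= value for name, value in high.items()))

    return [row for row in rows if matches(row)]

exact = model_select(inventory, shoesize=44)
ranged = model_select(inventory, {"shoesize": 40}, {"shoesize": 43})
upto = model_select(inventory, {}, {"shoesize": 43})
```

Unlike a real derived view, the lists returned here are independent of the base list; only the row dictionaries themselves are shared.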
ADDITIONAL DETAILS
- vw = view.flatten(subprop, outer=0)
- Produces one 'flat' view from a nested view
- vw = view.join(view, property...,outer=0)
- Both views must have a property (column) of that name and type
- ix = view.find(criteria..., start=0)
- Returns the index of the found row, or -1
- ix = view.search(criteria...)
- Binary search (native view order), returns match or insertion point
- ix, cnt = view.locate(criteria...)
- Binary search, returns position and count as tuple (count can be zero)
- vw = view.unique()
- Returns a new view without duplicate rows (a set)
- vw = view.union(view2)
- Returns a new view which is the set union of view and view2
- vw = view.intersect(view2)
- Returns a new view which is the set intersection of view and view2
- vw = view.different(view2)
- Returns a new view which is the set XOR of view and view2
- vw = view.minus(view2)
- Returns a new view which is (in set terms) view - view.intersect(view2)
- vw = view.remapwith(view2)
- Remap rows according to the first (int) property in view2
- vw = view.pair(view2)
- Concatenate rows pairwise, side by side
- vw = view.rename('oldname', 'newname')
- Returns a derived view with one property renamed
- vw = view.product(view)
- Returns the cartesian product of both views
- vw = view.groupby(property..., 'subname')
- Groups on specified properties, with subviews to hold groups
- vw = view.counts(property..., 'name')
- Groups on specified properties, replacing rest with a count field
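The set operators in the list above relate to each other as ordinary set algebra. The following sketch models them on lists of tuples, assuming neither input contains duplicate rows; the function names mirror the Mk4py methods but are plain Python stand-ins, and the row order of real Metakit results may differ.

```python
def union(a, b):
    """Rows of a, followed by the rows of b that are not in a."""
    in_a = set(a)
    return list(a) + [row for row in b if row not in in_a]

def intersect(a, b):
    """Rows that occur in both a and b."""
    in_b = set(b)
    return [row for row in a if row in in_b]

def minus(a, b):
    """Rows of a that do not occur in b."""
    in_b = set(b)
    return [row for row in a if row not in in_b]

def different(a, b):
    """Rows that occur in exactly one of a and b (set XOR)."""
    return minus(a, b) + minus(b, a)

a = [("John",), ("Flash",), ("Julius",)]
b = [("Flash",), ("Ringo",)]

both = intersect(a, b)
only_a = minus(a, b)
either = different(a, b)
everything = union(a, b)
```

In this model, `minus(a, b)` gives the same rows as removing `intersect(a, b)` from `a`, matching the description of minus above.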
find - view[view.find(firstname='Joe')] is the same as view.select(firstname='Joe')[0], but much faster. Subsequent finds use the "start" keyword:
    view.find(firstname='Joe', start=3)
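One way to use the "start" keyword is to collect every matching index by restarting the search just past the previous hit. The sketch below models this in plain Python; `model_find` and `find_all` are hypothetical helpers that imitate the documented behaviour (return the index of the first match at or after start, or -1) on a list of dictionaries.

```python
def model_find(rows, start=0, **criteria):
    """Index of the first row at or after start matching all criteria, else -1."""
    for ix in range(start, len(rows)):
        if all(rows[ix][name] == value for name, value in criteria.items()):
            return ix
    return -1

def find_all(rows, **criteria):
    """Repeat the find, each time starting one past the previous match."""
    hits = []
    ix = model_find(rows, **criteria)
    while ix != -1:
        hits.append(ix)
        ix = model_find(rows, start=ix + 1, **criteria)
    return hits

people = [
    {"firstname": "Joe", "lastname": "Smith"},
    {"firstname": "Ann", "lastname": "Jones"},
    {"firstname": "Joe", "lastname": "Brown"},
]

first_joe = model_find(people, firstname="Joe")
all_joes = find_all(people, firstname="Joe")
past_end = model_find(people, firstname="Joe", start=3)
```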
ADDITIONAL DETAILS
- vw = view.hash(mapview, numkeys=1)
- Construct a hash mapping based on the first N fields.
- vw = view.blocked(blockview)
- Construct a "blocked" view, which acts as if all segments together form a single large view.
- vw = view.ordered(numkeys=1)
- Define a view which assumes and maintains sort order, based on the first N fields. When layered on top of a blocked view, this implements a 2-level btree.
blocked - This view acts like a large flat view, even though the actual rows are stored in blocks, which are rebalanced automatically to maintain a good trade-off between block size and number of blocks.
The underlying view must be defined with a single view property, with the structure of the subview being as needed.
hash - This view creates and manages a special hash map view, to implement a fast find on the key. The key is defined to consist of the first numkeys properties of the underlying view.
The mapview must be empty the first time this hash view is used, so that Metakit can fill it based on whatever rows are already present in the underlying view. After that, neither the underlying view nor the map view may be modified other than through this hash mapping layer. The defined structure of the map view must be "_H:I,_R:I".
This view is modifiable. Insertions and changes to key field properties can cause rows to be repositioned to maintain hash uniqueness. Careful: when a row is changed in such a way that its key is the same as in another row, that other row will be deleted from the view.
ordered - This is an identity view whose only purpose is to inform Metakit that the underlying view can be considered to be sorted on its first numkeys properties. The effect is that view.find() will try to use binary search when the search includes key properties (results will be identical to unordered views; the find will just be more efficient).
This view is modifiable. Insertions and changes to key field properties can cause rows to be repositioned to maintain the sort order. Careful: when a row is changed in such a way that its key is the same as in another row, that other row will be deleted from the view.
This view can be combined with view.blocked(), to create a 2-level btree structure.
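The behaviour described for ordered views (rows kept sorted on their key fields, binary search for lookups, one row per key) can be modelled in a few lines of plain Python. `OrderedModel` is a hypothetical class, not part of Mk4py; rows are tuples whose first numkeys fields form the key, and treating an insert with an existing key as a replacement is this sketch's reading of the uniqueness rule, not a documented guarantee.

```python
from bisect import bisect_left

class OrderedModel(object):
    """Toy model of a view kept sorted on its first numkeys fields."""

    def __init__(self, numkeys=1):
        self.numkeys = numkeys
        self.rows = []

    def _position(self, key):
        # Rebuilding the key list is O(n); done here only for brevity.
        keys = [row[:self.numkeys] for row in self.rows]
        return bisect_left(keys, key)

    def _has_key_at(self, ix, key):
        return ix < len(self.rows) and self.rows[ix][:self.numkeys] == key

    def insert(self, row):
        """Place row at its sorted position; a row with the same key is replaced."""
        key = row[:self.numkeys]
        ix = self._position(key)
        if self._has_key_at(ix, key):
            self.rows[ix] = row
        else:
            self.rows.insert(ix, row)
        return ix

    def find(self, key):
        """Binary search for key (a tuple); returns its index or -1."""
        ix = self._position(key)
        return ix if self._has_key_at(ix, key) else -1

view = OrderedModel(numkeys=1)
view.insert(("Lennon", "John"))
view.insert(("Caesar", "Julius"))
view.insert(("Gordon", "Flash"))
view.insert(("Gordon", "Commissioner"))  # same key as an existing row
```

After these calls the model holds three rows in key order, and the second "Gordon" row has taken the place of the first.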