Introducing Metakit

adapted from the 1.6 documentation

This document introduces the main features and requirements of the Metakit library software product, shows you how it can be used with some coding examples, and tells you how to find your way in the different classes and files which are present in this package.

A portable class library to manage structured data?

There are many ways to develop software, there are many languages, and nowadays there are just as many object-oriented class libraries. Well... here's an unusual one. In C++.

The Metakit class library does only one thing. It takes care of the data you give it: manages it, stores it on file, serializes the data for stream I/O, and it loads it back in on request. Persistent data, data manipulation, that's the general idea here. It can instantly restructure data files, allowing you to adjust your design as often as you like.

Are you looking for an SQL database interface? Sorry, wrong number. Do you wish to store data in DBF or WKS format? Sorry again. Would you like to store everything with ODBC, BDE, JET? Nope, this library is not what you want.

Do you want to create self-contained applications which need to store all sorts of data?
Are those INI files and cleverly designed ASCII files starting to become far too complex?
Do you hate to have to constantly go through lengthy file format conversions?
Do those complex serializable objects seem to take forever to load on startup?
Are you worried about losing data if the system fails at the wrong time (it will, eventually)?
Do you wonder why the database you're using has tripled the size of your executable file?
Are you worried about the support needed for all those DLLs or database drivers?
Do you want to create document-centric software, using a single file to hold all data?
How about creating a datafile which can be used on any platform without conversion?
Would you like to stream complex data structures across a network or even internet?
Are you lost in all the header files and documentation sets that came with your database?
Do you want a small database package? With source code freely available?

Yes? Welcome! Please step right this way... meet the Metakit library!

New in 1.2: How about changing data structures on-the-fly - instantly, even on disk?
New in 1.3: Would you like to store all data using your own encryption scheme?
New in 1.4: Do you want nearly automatic persistence with data of any complexity?
New in 1.5: Looking for good performance for 1 to 100,000 records - and more?
New in 1.6: Store icons and other binary objects, thread-safe for use in OCX/ActiveX
New in 1.7: Adds memory mapped files: faster, and supports much larger data files
New in 1.8: Customizable views, 15 new view operators: Join, GroupBy, Intersect, ...

[ Requirements ] [ Coding examples ] [ Classes and files ]

REQUIREMENTS

|| C++ development || The learning curve ||

The Metakit library is a tool in the form of a high-level software (class) library. If it does what you expect, if it conforms to your development choices, if it's easy to work with, and if it can be incorporated in your final product, this might be just what you need. You are the only person who can answer the first question, but here are some notes on the other three issues.


C++ development

The first versions of Metakit were built using Microsoft Visual C++, and required the Microsoft Foundation Class library (MFC) for containers, strings, and file I/O. This is no longer a requirement, although MFC can still optionally be used on those platforms that support it.

The current version of Metakit is available for a wide range of development environments, including MS-DOS, Windows (all versions, several compilers), Macintosh (both 68k and PowerPC), and UNIX (gcc). More ports will probably be available when you read this.

A standard (and portable) test set has been developed to ease porting and testing across the wide range of platforms on which Metakit can now be used.


The learning curve

Like any other C++ class library, the Metakit library is not for the novice C++ programmer. It takes some effort to understand the intended use of classes and objects and to become familiar with a certain way of doing things the "nice" way. It takes determination to dig into the class details when things are not working quite as you expected. Failure to invest time to understand how all classes interact and where specific functionality is meant to be added or altered can lead to disappointing results. You should plan on joining forces with the class designers, not fight them. You even have to be willing to synchronize your programming style with others to benefit from their work. It is the class designer's task to reduce this effort as much as possible, but there is no way around understanding each other's needs and intentions.

Having said that, you may be surprised by the simplicity of Metakit's API. The main header files are small. There is no deeply nested class hierarchy you need to study and understand. There is no base class that you need to be aware of for your own objects, and you will only rarely derive from classes in this package. In practice, most objects in Metakit are either used as is or added to your own classes as member objects. This does not imply that Metakit is a trivial piece of software (it isn't), but merely that a lot of effort has gone into the encapsulation of the underlying complexity.

As a consequence, the Metakit classes tend to fit in nicely with a range of other class libraries (user interfaces, data communication, networking, even other database packages). You can start using this library for evaluation in your existing projects without disrupting existing structures and then choose to adopt more of it if it suits your needs. The design is highly modular (and will remain so as much as possible), allowing you to take what you like, without pulling in code which you do not wish or need to use.


CODING EXAMPLES

|| Basics || Operators || Persistence || Sorting || Nested views || Performance || Structure changes ||

The Metakit library manages structured information for you, but it does this using a new approach. The best way to introduce many of the ideas is with sample code (if you're not a C++ programmer: shut down now, reboot, and go have a nice cup of coffee).


Basics

Let's create a trivial address book with a single entry in it:

    c4_StringProp pName ("name");            
    c4_StringProp pCountry ("country");           
    c4_View vAddress;           
    c4_Row row; 
    pName (row) = "John Williams";            
    pCountry (row) = "UK";
    vAddress.Add(row);

In this code, we define two fields (called "properties"), a collection (called a "view"), and a temporary buffer (called a "row"), and then we set the properties (fields) to some sample values and add the row (entry) to our view (collection). Note that a property must be given a name when defined.

The most striking difference from the C++ code you've probably grown used to is that the data structure has been defined entirely in terms of existing classes. What Metakit does is to introduce "run-time data structures", i.e. structures which can be set up and manipulated entirely at run time. Big deal? Well... you're right, this doesn't look like much. In fact, the more trivial it looks, the better: using run-time data structures in your C++ sources should be (almost) as easy as the classical structure/class & member approach. Let's add a second entry:

    pName (row) = "Paco Pena";           
    pCountry (row) = "Spain";         
    vAddress.Add(row);
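
If the idea of a "run-time data structure" still feels abstract, it may help to see it rebuilt from standard containers. The following is only a rough sketch and does not use Metakit at all: SketchRow, SketchView and makeAddressBook are made-up names, and a real view is far more compact than a vector of maps. It merely shows that the set of fields is decided while the program runs, not by a struct declaration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins, NOT Metakit classes: a row maps property
// names to string values, and a view is an ordered collection of rows.
using SketchRow = std::map<std::string, std::string>;
using SketchView = std::vector<SketchRow>;

// Build the two-entry address book from the text, choosing the
// "fields" at run time instead of declaring a struct for them.
inline SketchView makeAddressBook()
{
    SketchView view;
    SketchRow row;                   // temporary buffer, reused per entry

    row["name"] = "John Williams";
    row["country"] = "UK";
    view.push_back(row);             // the vector stores a copy of the row

    row["name"] = "Paco Pena";
    row["country"] = "Spain";
    view.push_back(row);

    return view;
}
```

Calling makeAddressBook() yields a view of two rows, and adding a third "field" later is just a matter of assigning to a new key.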


Operators

Now let's add a third one using some shorthand operators:

    vAddress.Add(pName ["Julien Coco"] + pCountry ["Netherlands"]);

Even if this looks a bit confusing at first, don't worry. There are only a couple of overloaded operators, and this is as concise as it gets. Metakit is not a wild collection of non-intuitive operator notations, although the use of the array operator in the example above is definitely a bit unconventional.

The five operator notations you need to be aware of in Metakit are:

view [ index ] - is a - row: (this makes views similar to ordinary arrays)
property ( row ) - is a - value: (which can be used on either side of an assignment statement)
property [ value ] - is a - row: (a single row with a single property, sort of a constant, or "scalar")
row + row - is a - row: (the concatenation of the property values in both rows)
property, property, ..., property - is a - view: (an empty one with the specified properties)

Now that we've appended three entries, let's take some information out of the view again:

    CString s1 = pName (vAddress[1]);          
    CString s2 = pCountry (vAddress[1]);            
    printf("The country of %s is: %s\n",
                (const char*) s1, (const char*) s2);
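
To take some of the mystery out of the notations listed above, here is a miniature model of three of them, written with nothing but the standard library. It is a sketch under stated assumptions, not the way Metakit implements its operators: MiniRow and MiniProp are hypothetical names, values are plain strings, and the rule that the right-hand row wins when both rows carry the same property is a choice made for this sketch only.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical miniature model, NOT the Metakit implementation.
struct MiniRow {
    std::map<std::string, std::string> values;
};

// row + row: concatenate the property values of both rows.
inline MiniRow operator+(MiniRow lhs, const MiniRow& rhs)
{
    for (const auto& kv : rhs.values)
        lhs.values[kv.first] = kv.second;   // sketch rule: right-hand value wins
    return lhs;
}

class MiniProp {
public:
    explicit MiniProp(std::string name_) : _name(std::move(name_)) {}

    // property[value]: a single row holding a single property.
    MiniRow operator[](const std::string& value_) const
    {
        MiniRow row;
        row.values[_name] = value_;
        return row;
    }

    // property(row): a reference usable on either side of an assignment.
    std::string& operator()(MiniRow& row_) const
    {
        return row_.values[_name];
    }

private:
    std::string _name;
};
```

With these definitions, an expression such as pName["Julien Coco"] + pCountry["Netherlands"] builds a two-property row, and pCountry(row) = "NL" overwrites one of its values.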


Persistence

Ok, but what about storing this information on file? Ah, well, that's simple with these run-time data structures:

    c4_Storage storage ("myfile.dat", true);        
    storage.Store("address", vAddress);       
    storage.Commit();

Loading this data from file again is just as simple. You only need to make sure that properties are named and typed according to the stored structural information. Here is the full code to access a previously stored datafile:

    c4_StringProp pName ("name"), pCountry ("country");        
    c4_Storage storage ("myfile.dat");     
    c4_View vAddress = storage.View("address");       
    for (int i = 0; i < vAddress.GetSize(); ++i)
    {
        CString name = pName (vAddress[i]);
        CString country = pCountry (vAddress[i]);
        printf("%s lives in %s\n",
                    (const char*) name, (const char*) country);
    }

You can have on-demand loading: data will only be read from file when actually needed. To do this, the storage object must not be destroyed: in MFC for example, making the storage object a member of your derived document class will do the trick. If a storage object is destroyed while there are still views referring to its contents, all relevant data will be loaded into memory. This may require a lot of time and memory space - so be sure to destroy or clear all views if this is not what you want.


Sorting

Now for some data manipulation. Let's create a sorted version of this view:

    c4_View vSorted = vAddress.SortOn(pName);

The result is a new derived view with all rows sorted by name. Note that vSorted is not a copy, but that the rows in this view share their contents with vAddress. This is only another way to look at this information.

Using a nasty overload of the comma operator, we can define composite keys, and thus sort on more than one property (the extra parentheses in the following line are essential):

    vSorted = vAddress.SortOn((pCountry, pName));      
    ASSERT(vSorted.GetSize() == vAddress.GetSize());

There are several more functions (descending sorts, searches, selection on values and ranges), which will all work with any structure you care to build using the basic mechanisms just described. These manipulation functions can hide quite a bit of complexity. Here is a more advanced example:

    c4_View vSome = vAddress.Select(pCountry ["UK"]).SortOn(pName);
    for (int i = 0; i < vSome.GetSize(); ++i)
    {
        CString name = pName (vSome[i]);
        printf("'%s' lives in the UK\n", (const char*) name);
        printf("Entry 'vAddress[%d]' is that same person\n",
                            vAddress.GetIndexOf(vSome[i]));
    }

This will show a list of record indexes of all people living in the UK, sorted by name. The effect of the GetIndexOf() member is to "unmap" all selections and sorts back to the underlying view.
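
One way to picture a derived view that shares its rows with the original is as a list of row numbers. The sketch below assumes exactly that and nothing more: it uses only the standard library, Person and selectCountrySortedByName are hypothetical names, and it makes no claim about how Metakit represents selections or sorts internally. What it does show is why "unmapping" back to the underlying view is cheap: each entry of the derived list already is the index of the underlying row.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record type for the sketch; NOT a Metakit class.
struct Person {
    std::string name;
    std::string country;
};

// A derived "view" here is just a list of indexes into the base vector,
// so element i of the result also answers "which base row is this?".
inline std::vector<std::size_t> selectCountrySortedByName(
    const std::vector<Person>& base, const std::string& country)
{
    std::vector<std::size_t> order;
    for (std::size_t i = 0; i < base.size(); ++i)
        if (base[i].country == country)
            order.push_back(i);          // selection: keep matching rows only

    std::sort(order.begin(), order.end(),
              [&base](std::size_t a, std::size_t b) {
                  return base[a].name < base[b].name;   // sort by name
              });
    return order;
}
```

No row is copied along the way, so a change made to a base row is visible through every index list that refers to it.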

That's it. This is how Metakit is used. Isn't it great? End of story.


Wait... is that all?

Not quite. There is more. One of the nice features of Metakit is that the views you just saw can also be stored inside other views. In that respect, a view is just another kind of value, much like the integers and strings used up to this point.

Let's first create a list of John's telephone numbers:

    c4_StringProp pType ("type"), pNum ("num");        
    c4_View view;       
    view.Add(pType ["work"] + pNum ["+44 (1) 123 4567"]);       
    view.Add(pType ["home"] + pNum ["+44 (1) 123 6789"]);

Now, we can add this list to the original view by defining an appropriately typed property:

    c4_ViewProp pPhone ("phone");        
    pPhone (vAddress[0]) = view;

This creates a nested view structure (or "repeating field", if you like). Our simplistic address book now holds three addresses, of which the first contains a list of two telephone numbers. Views can be as complex as you like, and can accommodate a wide range of application storage structures.
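
For comparison, this is roughly the shape the address book would have if it were written the classical way, with the structure fixed at compile time. The types and the function name are hypothetical and have nothing to do with Metakit; the sketch is only meant to make the "repeating field" visible as a container inside each entry.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical types for illustration; NOT Metakit classes.
struct PhoneEntry {
    std::string type;
    std::string num;
};

struct AddressEntry {
    std::string name;
    std::string country;
    std::vector<PhoneEntry> phone;   // the nested "view": zero or more rows
};

inline std::vector<AddressEntry> makeNestedAddressBook()
{
    std::vector<AddressEntry> book = {
        {"John Williams", "UK", {}},
        {"Paco Pena", "Spain", {}},
        {"Julien Coco", "Netherlands", {}},
    };
    // Only the first entry gets telephone numbers, as in the text.
    book[0].phone.push_back({"work", "+44 (1) 123 4567"});
    book[0].phone.push_back({"home", "+44 (1) 123 6789"});
    return book;
}
```

The difference is that here the phone list had to be foreseen when AddressEntry was declared, whereas the view property in the text was attached to existing data after the fact.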

Storing such a compound data structure is just as simple as before:

    c4_Storage storage ("myfile.dat", true);        
    storage.Store("address", vAddress);       
    storage.Commit();

In day-to-day use, you will create a number of properties with appropriate names and datatypes and make them available via a header file. Although not required, it is far simpler if these properties are defined as global objects and exist as long as your application runs. Note that properties (objects of class c4_Property or classes derived from it) only act as (lightweight) placeholders, and that they are independent of actual data stored in views (in the same way that member fields are defined as part of a class in C++, long before objects of that class are created).


Changing data structures on-the-fly

So what's the big deal, you ask? Structured data is a very common way of organizing information. Well, one of the unique features of Metakit is that it can change its data structures on-the-fly. In Metakit, you can not only quickly add/delete rows - as in any other database package - but also properties (i.e. columns) and views, all with minimal effort.

To continue with our example, suppose you wish to add a "city" field to every address. There are two ways to do this. One is to redefine the exact structure of the address view while the storage object is present:

    #define FORMAT "address[name:S,city:S,country:S,phone[type:S,num:S]]"
    vAddress = storage.GetAs(FORMAT);

Or you can simply let the conversion take place automatically the next time you save the data:

    c4_Storage storage ("myfile.dat", true);
    c4_View vAddress = storage.View("address");
    c4_StringProp pCity ("city");
    pCity (vAddress[0]) = "Paris";
    storage.Store("address", vAddress);       
    storage.Commit();

Properties which are no longer defined will be deleted (along with all the associated data). The following definition will remove all phone number information:

    #define FORMAT "address[name:S,city:S,country:S]"
    vAddress = storage.GetAs(FORMAT);
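
The structure strings passed to GetAs are compact enough to read by hand: a view name, then a bracketed list of "name:type" properties, where a property may itself be a bracketed list. The following sketch parses that notation into a small tree. It is inferred from the two strings shown here and nothing else; FieldSpec, parseField and parseStructure are hypothetical names, the 'V' marker for nested views is an invention of this sketch, and Metakit's own parser may well accept more (or less) than this one does.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical parse tree for the sketch; NOT Metakit's internal format.
struct FieldSpec {
    std::string name;
    char type = 'V';                    // 'V' marks a nested view here
    std::vector<FieldSpec> children;    // filled only for nested views
};

// Parses one field starting at pos: either "name:T" or "name[field,...]".
inline FieldSpec parseField(const std::string& text, std::size_t& pos)
{
    FieldSpec field;
    while (pos < text.size() && text[pos] != ':' && text[pos] != '[' &&
           text[pos] != ',' && text[pos] != ']')
        field.name += text[pos++];
    if (field.name.empty() || pos >= text.size())
        throw std::invalid_argument("malformed structure description");

    if (text[pos] == ':') {             // simple property, e.g. "name:S"
        if (pos + 1 >= text.size())
            throw std::invalid_argument("missing type letter");
        field.type = text[pos + 1];
        pos += 2;
    } else if (text[pos] == '[') {      // nested view, e.g. "phone[...]"
        ++pos;
        while (pos < text.size() && text[pos] != ']') {
            field.children.push_back(parseField(text, pos));
            if (pos < text.size() && text[pos] == ',')
                ++pos;
        }
        if (pos >= text.size())
            throw std::invalid_argument("missing closing bracket");
        ++pos;                          // skip ']'
    } else {
        throw std::invalid_argument("expected ':' or '['");
    }
    return field;
}

// Parses a complete description and rejects anything left over.
inline FieldSpec parseStructure(const std::string& text)
{
    std::size_t pos = 0;
    FieldSpec root = parseField(text, pos);
    if (pos != text.size())
        throw std::invalid_argument("trailing characters");
    return root;
}
```

Comparing the trees of an old and a new description, property by property, is one way to see which columns a restructuring step would add and which it would drop.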

As with all modifications to the data file, changes only take place during the Commit call. If you don't call it (or if the application is aborted) the original version will be available again on the next open. In fact, it is possible to alter a data structure even if it is stored on read-only media - but evidently these changes cannot be saved in that case.

On-the-fly restructuring allows you to issue new releases of your software which can maintain backward compatibility with previous data file formats. People using your software need not notice such a conversion at all, nor do you have to add a lot of code to deal with older formats. With Metakit, your datafiles can continue to evolve to match your needs!


Performance

Metakit was designed with efficiency in mind (in time and in space). When used appropriately, this version can achieve very high performance (storing 100,000 attributes in a few seconds on a P5/166...). But the current version is not quite there yet - in some situations, performance will drop back dramatically below that level. The catch is how to figure out what is and what isn't effective.

Due to on-demand loading, opening a file is instant regardless of file size. Performance only becomes an issue once you start accessing or altering information.

The CatFish disk catalog browser utility built with Metakit demonstrates that very high performance can be achieved if the data structure is designed to take advantage of the unique way in which Metakit manages its data.

Performance is likely to increase further in coming releases. This is based on results obtained with the predecessor of Metakit which has shown stunning performance (on millions of objects) in a commercial application, but which used a less general class library interface. The current product is a major rewrite of that software to build a more general foundation, with emphasis on functionality first, speed second.

Another - perhaps surprising - fact is that structural changes to data files are virtually instant. Adding or deleting properties and/or views is very quick. In Metakit, only the manipulation of data values takes time.


CLASSES AND FILES

|| Class hierarchy || Naming conventions || Header files ||

This section presents a general perspective on the Metakit classes, header files, and library files.


Class hierarchy

The outside view of the class hierarchy is almost flat: the public classes don't share a common base class, nor do they require complex derived classes or large numbers of virtual functions. As you might expect, there is essentially one class for every type of object:

c4_View: For views, which can hold zero or more rows of data.
c4_Row: For rows, consisting of zero or more properties, each with an associated value.
c4_Property: The base class for properties, with a few basic derived classes.
c4_Cursor: This is a generic iterator for views (or a "pointer to a row" if you prefer).
c4_RowRef: A reference to a row, either a c4_Row, or one of the entries of a c4_View.
c4_Storage: Objects of this type manage persistence and on-demand loading.

That's basically it. If you study the definition of the above six classes you will be ready to use Metakit. The c4_View class might look quite familiar: it has the same set of member functions as the "C...Array" classes in MFC. The c4_Cursor class has all the operator overloads (++, --, *) you would expect for iterators.


Naming conventions

All class names start with the prefix "c4_". Until C++ namespaces become more widely implemented, this class library needs to use yet another naming scheme which must not conflict with whatever other software and class libraries you may be using. The decision was made to insert the digit four in every globally accessible identifier of this class library. In addition, the first letter of each identifier indicates what kind of identifier it is. It may not win a prize for esthetics, but it seems unlikely that anyone else uses this convention.

The most common prefixes are:

c4_: Classes
d4_: Preprocessor defines
f4_: Global functions
t4_: Typedefs

Two other conventions are used throughout this class library: function arguments ("formal parameters") always have a trailing underscore, and private member fields always start with a leading underscore. These choices do not affect your own programming style, but it helps to be aware of them while going through the headers and sources.
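
Seen together in a few lines of code, the conventions look like this. Every identifier below is invented for the occasion (none of them exists in Metakit); the sketch only shows the prefixes and underscores in their natural habitat.

```cpp
#include <cassert>

// Hypothetical names throughout; none of these exist in Metakit.
typedef long t4_SketchCount;                 // t4_: typedef
#define d4_SKETCH_LIMIT 100                  // d4_: preprocessor define

class c4_SketchCounter                       // c4_: class
{
public:
    c4_SketchCounter() : _total(0) {}

    // Formal parameters carry a trailing underscore.
    void Add(t4_SketchCount amount_)
    {
        _total += amount_;
        if (_total > d4_SKETCH_LIMIT)        // clamp at the defined limit
            _total = d4_SKETCH_LIMIT;
    }

    t4_SketchCount Total() const { return _total; }

private:
    t4_SketchCount _total;                   // private member: leading underscore
};

// f4_: global function
inline t4_SketchCount f4_SketchDouble(t4_SketchCount value_)
{
    return value_ * 2;
}
```

Once you know the pattern, a glance at an unfamiliar name in the headers tells you whether you are looking at a class, a macro, a function or a typedef.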


Header files

There are three categories of header files:

m4kit.h: The top level header file you need to include in your sources.
k4*.h: Contains all public class definitions of Metakit.
k4*.inl: Inline definitions included by the "k4*.h" files.

The two central header files of the Metakit library are "k4view.h" where all the core classes are defined, and "k4conf.h" (a simplified version of the headers used to build the library itself) with the definitions to deal with hardware / operating system / compiler / framework dependencies (x86 / Win+Dos / MSVC / MFC in the standard release).

As you can see, all public files contain the digit four to avoid name conflicts with any other files you may be using.
