MK and object-oriented databases (OODB)

From: <[email protected]> - 17 Mar 1999

I am thinking of building an object DB on top of Metakit and I'm wondering what the best way to go would be. One thought is to have each row be a class and put the data for an object as a byte stream in one column. The objects would restore themselves from the byte stream. That way, all Foos, for example, would be in a particular row and each Foo would have its own column in that row. As has been stated before, this approach would be platform-specific.

Another approach would be for each row to be an object and for the columns of the row to be the fields of the object. Under this approach the objects could save their fields as typed data, so that integers are integers and strings are strings. This, it seems to me, would be a cross-platform solution (except for any binary data an object might want to save).
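To make the two layouts concrete, here is a minimal sketch in plain Python. It does not use the Metakit API: the class Foo, its two fields, and the helper names are all made up for illustration, and a dict stands in for a row with typed columns. The byte-stream version deliberately uses the native struct format to show where the platform dependence comes from.

```python
import struct

class Foo:
    """Hypothetical object with one integer field and one string field."""
    def __init__(self, count, name):
        self.count = count
        self.name = name

# Approach 1: one opaque byte stream per object.
# "@" selects native byte order, size and alignment, so the bytes are only
# guaranteed to be readable on the same kind of platform that wrote them.
def to_blob(foo):
    name = foo.name.encode("utf-8")
    return struct.pack("@iI", foo.count, len(name)) + name

def from_blob(blob):
    count, length = struct.unpack_from("@iI", blob, 0)
    offset = struct.calcsize("@iI")
    return Foo(count, blob[offset:offset + length].decode("utf-8"))

# Approach 2: one row per object, one typed column per field.
# The storage layer sees an integer and a string, not raw bytes.
def to_row(foo):
    return {"count": foo.count, "name": foo.name}

def from_row(row):
    return Foo(row["count"], row["name"])

original = Foo(42, "widget")
blob_copy = from_blob(to_blob(original))
row_copy = from_row(to_row(original))
```

Both copies carry the same field values as the original; the difference is that only the second layout lets the storage layer see, and therefore search on, the individual fields.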

I wonder if there are performance considerations that would make one of these approaches better than the other.

Reply:

The issues you raise and the approaches you describe are both extremely interesting. They are also quite complex: this is clearly the sort of stuff OODBMS and RDBMS people are struggling with.

The question of "opaque" versus "type-exposed" storage relates to the way you expect to navigate in the data. In my experience, Metakit gets a lot of its flexibility from the fact that it does not need indexing decisions to be able to search and find rows. Things like regular-expression searches rely on the fact that Metakit just races through column-wise data structures to deliver results. Numeric searches are very efficient as well (much more so than string searches, in fact).
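As a rough illustration of what racing through column-wise data means, the sketch below scans one column at a time with no index. It is plain Python rather than Metakit code: the columns dict, the property names, and both select functions are assumptions made up for this example.

```python
import re

# Hypothetical column-wise layout: one list per property, all the same length.
columns = {
    "name": ["alpha", "beta", "alphabet", "gamma"],
    "size": [10, 250, 30, 400],
}

def select_regex(columns, prop, pattern):
    """Return the row indices whose string column matches the pattern."""
    rx = re.compile(pattern)
    return [i for i, value in enumerate(columns[prop]) if rx.search(value)]

def select_range(columns, prop, low, high):
    """Return the row indices whose numeric column lies in [low, high]."""
    return [i for i, value in enumerate(columns[prop]) if low <= value <= high]

alpha_rows = select_regex(columns, "name", r"^alpha")
big_rows = select_range(columns, "size", 100, 1000)
```

Each query reads only the one column it needs, which is why exposing a field as its own typed column makes it searchable without any up-front indexing decision.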

But the more you expose the structure to Metakit, the harder it gets to tie Metakit "rows" to OO "instances" :(

It's not that hard if object types are simple, but once you start to get into deeply derived hierarchies of classes, restoring objects becomes quite tricky (and probably messy). Also, the issue of object identity needs to be addressed, requiring some form of "Object ID" (OID) to work. And finally, there is the issue of resident objects versus dormant (on-disk) objects and their interdependencies.
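One common way to handle identity and the resident/dormant split is an identity map keyed by OID. The sketch below is a toy version under stated assumptions: Store, Node, and every method name are hypothetical, a dict stands in for on-disk rows, and references between objects are stored as OIDs rather than pointers. Nothing here is Metakit API.

```python
class Node:
    """Hypothetical persistent class; refers to another object by OID."""
    def __init__(self, name, next_oid=None):
        self.name = name
        self.next_oid = next_oid
        self.oid = None

    def fields(self):
        return {"name": self.name, "next_oid": self.next_oid}

class Store:
    """Toy object store: a dict of rows stands in for the on-disk data."""
    def __init__(self):
        self._rows = {}       # dormant form: oid -> class name and fields
        self._resident = {}   # identity map: oid -> live object
        self._next_oid = 1

    def add(self, obj):
        oid = self._next_oid
        self._next_oid += 1
        obj.oid = oid
        self._rows[oid] = {"cls": type(obj).__name__, "fields": dict(obj.fields())}
        self._resident[oid] = obj
        return oid

    def evict(self, oid):
        """Drop the live object; the dormant row stays."""
        self._resident.pop(oid, None)

    def load(self, oid, classes):
        """Return the resident object, restoring it from its row if needed."""
        if oid in self._resident:
            return self._resident[oid]
        row = self._rows[oid]
        obj = classes[row["cls"]](**row["fields"])
        obj.oid = oid
        self._resident[oid] = obj
        return obj

classes = {"Node": Node}
store = Store()
tail_oid = store.add(Node("tail"))
head_oid = store.add(Node("head", next_oid=tail_oid))
store.evict(head_oid)
store.evict(tail_oid)

head = store.load(head_oid, classes)       # restored from its dormant row
tail = store.load(head.next_oid, classes)  # reference followed via OID
```

Loading the same OID twice returns the same live object, which is the identity guarantee. The stored class name is what a real implementation would have to extend to cope with deep class hierarchies.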

In terms of performance, the key to speed in Metakit is the realization that performance only matters inside loops, and loops often iterate with access to just a few fields of an object. Depending on your application, you could even decide to use a hybrid form: store some fields as Metakit properties (i.e. separately), and "serialize" everything else into an opaque set of bytes (using the c4_BytesProp datatype, or strings).
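The hybrid form can be sketched as follows, again in plain Python and not with the Metakit API. HybridTable, the choice of price as the one hot field, and the use of JSON as the serialization format are all assumptions for illustration; the decoded counter exists only to show how many blobs get deserialized.

```python
import json

class HybridTable:
    """Hypothetical table: one typed column plus one opaque bytes column."""
    def __init__(self):
        self.price = []   # hot field, kept as a typed column
        self.rest = []    # all other fields, serialized to bytes per row
        self.decoded = 0  # how many blobs have been deserialized so far

    def append(self, price, **other_fields):
        self.price.append(price)
        blob = json.dumps(other_fields, sort_keys=True).encode("utf-8")
        self.rest.append(blob)

    def restore(self, index):
        """Rebuild the full set of fields for one row."""
        self.decoded += 1
        fields = json.loads(self.rest[index].decode("utf-8"))
        fields["price"] = self.price[index]
        return fields

table = HybridTable()
table.append(5, name="pencil", tags=["office"])
table.append(120, name="chair", tags=["office", "furniture"])
table.append(900, name="laptop", tags=["electronics"])

# The loop touches only the typed column; no blob is decoded here.
expensive = [i for i, p in enumerate(table.price) if p >= 100]

# Full objects are rebuilt only for the rows that matched.
matches = [table.restore(i) for i in expensive]
```

The design choice is which fields to expose: anything a loop filters or sorts on belongs in its own column, and the rest can stay opaque until an object is actually needed.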

Another thing to keep in mind for performance is that Metakit in a way acts like a collection of rows, with all the usual performance tradeoffs of inserting/deleting entries in the middle of large arrays.

If you have more details, I'd be happy to discuss these issues by email. Being able to better support an OODBMS model is a very important issue for me for a future release of Metakit.

-- JC