What is the maximum file size

From: Masashi Kanamori - 14 Nov 1998

 > I am considering handling huge binary data (over 1 GB to 5 GB) with Metakit.

There have been requests for such large databases in Metakit before. Let me add that: 1) this is two orders of magnitude beyond what Metakit is really meant for today, and 2) I intend to adjust internal data structures and algorithms in the future to deal better with such databases, and with even larger ones.

 > What is the maximum file size that Metakit can handle on an NTFS
 > file system?

Metakit currently uses 32-bit file pointers, so the absolute maximum file size it can handle today is 2 Gb (2^31-1 bytes, the largest signed long; it might be 4 Gb, i.e. 2^32-1 bytes, if unsigned offsets turn out to work, but I haven't researched whether signed longs behave correctly). The internal data format will handle larger files (except for a single 32-bit pointer which needs to be altered in a next release).

 > Is the number of data rows that can be handled
 > restricted to an int (32-bit)?

No, by using nested subviews, you can easily go beyond this.
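
For example, here is a minimal C++ sketch of that partitioning idea, written against the c4_Storage/GetAs API; the file name, property names, and chunk size are only illustrations, not part of any existing schema:

    #include "mk4.h"

    c4_IntProp  pValue("value");
    c4_ViewProp pChunk("chunk");

    int main()
    {
        c4_Storage storage("big.mk", true);
        c4_View chunks = storage.GetAs("data[chunk[value:I]]");

        const int kChunkSize = 100000;  // per-subview row limit, see below

        // Append 250,000 values, opening a fresh subview whenever the
        // current one fills up; no single view ever grows very large.
        for (int i = 0; i < 250000; ++i) {
            int last = chunks.GetSize() - 1;
            if (last < 0 || pChunk(chunks[last]).GetSize() >= kChunkSize) {
                chunks.Add(c4_Row());   // start a new, empty chunk
                ++last;
            }
            c4_View sub = pChunk(chunks[last]);
            sub.Add(pValue[i]);
        }

        storage.Commit();
        return 0;
    }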

 > And, please let me know up to what file size Metakit
 > can still be used at a practical speed.

In the current version of Metakit, it will be *very* tough to deal with, say, 1 Gb of data without running into huge memory usage problems. Things to watch out for are:

   1)  do not use more than perhaps 100,000 rows in any view/subview
   2)  keep this limit even lower if you are using large strings
   3)  try to keep the total number of views/subviews low as well
   4)  memory use grows during changes until commit, so commit often
   5)  commits are slow, proportional to the number of views/subviews

Requirements 1) and 3) are conflicting: don't make views too large, nor too small (implying lots of subviews). Requirements 4) and 5) also conflict; you will need to look for a balance to get workable results, as in the sketch below.
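
To illustrate points 4) and 5), here is a minimal C++ sketch of a bulk load which commits at a fixed interval; the interval of 10,000 and the names are again only illustrations, a starting point you would have to tune:

    #include "mk4.h"

    c4_IntProp pKey("key");

    int main()
    {
        c4_Storage storage("bulk.mk", true);
        c4_View rows = storage.GetAs("items[key:I]");

        const int kCommitEvery = 10000; // tune: larger means fewer (slow)
                                        // commits but more memory growth

        for (int i = 0; i < 100000; ++i) {
            rows.Add(pKey[i]);
            if ((i + 1) % kCommitEvery == 0)
                storage.Commit();       // flush buffered changes to disk
        }

        storage.Commit();               // commit whatever remains
        return 0;
    }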

Another approach may help here: if you are storing some huge objects, then you could consider storing those in a separate file and using Metakit for the more compact remaining data structures and to *manage* that secondary file by storing <position,size> information instead of the huge data objects themselves. Though this means more work (explicitly coding file free-space management, for one), it might allow you to reduce Metakit file sizes drastically, so that you don't have to fight its current performance/size tradeoffs so much.
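
Here is a minimal C++ sketch of that idea, with illustrative names and an append-only secondary file (so no free-space management at all):

    #include <stdio.h>
    #include <string.h>
    #include "mk4.h"

    c4_StringProp pName("name");
    c4_IntProp    pPos("pos");
    c4_IntProp    pSize("size");

    int main()
    {
        c4_Storage storage("catalog.mk", true);
        c4_View objects = storage.GetAs("objects[name:S,pos:I,size:I]");

        // The huge object itself goes into the secondary file; this
        // sketch only ever appends, a real version would also track
        // and reuse freed space.
        const char* data = "...huge binary data...";
        long size = (long) strlen(data);

        FILE* fp = fopen("blobs.bin", "ab");
        if (fp == 0)
            return 1;
        fseek(fp, 0, SEEK_END);
        long pos = ftell(fp);
        fwrite(data, 1, (size_t) size, fp);
        fclose(fp);

        // Metakit stores only the <position,size> bookkeeping.
        objects.Add(pName["object-1"] + pPos[pos] + pSize[size]);
        storage.Commit();
        return 0;
    }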

As I said, I intend to gradually introduce far more powerful and scalable approaches so that Metakit *can* one day be used efficiently for multi-Gb data storage.

Please feel free to email me with further questions and suggestions, as I would very much like to understand what you need and to help look for an approach which might be used with the current version of Metakit.

-- JC


August 1999, good news: a new design has been prepared, and recently even tried as a prototype (thanks to Christian Tismer), which inserted 100,000,000 rows of 10 small values into a 2 Gb database. The results are extremely promising (and yes, 2 Gb is a limit which causes MK 1.8.6 to fail), and will lead to a few changes in MK before the end of the year.