Converting from 1.8.6 datafiles

From the CHANGES file:

2000-07-30 : Major auto-convert 1.8.6 file bug

     Bug in on-the-fly conversion of bytes properties ('B')
     in pre-2.0 datafiles (i.e. 1.8.6 and earlier) resolved.
     Unfortunately, this bug can not be 100% unambiguously
     fixed.  The new code *will* properly detect most cases,
     and convert both 1.8.6 and 2.0 datafiles on the fly, but
     especially for views with only a few rows and at most a
     few bytes of data per row - the conversion *might* fail.
     In this case, MK will have to be compiled with a define
     to force it to either assume all old datafiles are 1.8.6
     (-Dq4_OLD_IS_PRE_V2), or to assume that they are always
     2.0 (-Dq4_OLD_IS_ALWAYS_V2).  If you are currently using
     MK 1.8.6, then you should *skip* the update to 2.01, and
     consider updating to 2.3.x.  This way you never have any
     2.0 files around, and can force all your code to handle
     1.8.6 files properly (by using "-Dq4_OLD_IS_PRE_V2").
     See src/format.cpp, c4_FormatB::OldDefine for details.
     This bug *only* applies to bytes properties in pre-2.0
     data files.  Conversion of 2.0x files is unaffected.

See below for suggestions on how to best deal with this problem.

Background information:

MK 2.0 turns out to have *accidentally* introduced a file format change for 'B' (bytes) data. The problem is not only that this change was unintentional, but that the header was *NOT* changed along with it. That means there is *no* unambiguous way to detect whether a file is 1.8.6 or 2.0, even though the file format differs (for B-type data only).

I have uploaded a new release to:

    https://www.equi4.com/previews/mk2-20000729.tar.gz

This is the 2.3.2 beta and, unfortunately, because of this bug I would urge everyone to update to 2.3.2 beta instead of 2.01 (see below). Although the fix could be back-ported to 2.01, that would introduce a huge new problem: if 2.01 auto-converted 1.8.6 data, it would rewrite the file in the 2.01 format, with no way of telling on the next open whether the file is 1.8.6 or not. Keep in mind that 2.0 introduced a data format change without changing the header.

The fix I introduced tries to convert any datafile by examining the actual data stored. The distinction between 1.8.6 and 2.0 can *usually* be made this way. When in doubt, the new 2.3.2 beta assumes the datafile is in 2.0 format.

What does this mean? Well: it is possible to have a 1.8.6 datafile which 2.3.2 beta fails to recognize as 1.8.6, and hence fails to convert properly. I expect this never to happen in practical cases, but the probability is non-zero. The conversion can no longer crash, but the data values will be wrong. The chances of such a bad conversion are extremely slim, and decrease as the number of rows and the amount of data in the view increase. If entries are usually over 4 bytes long, then the chance of a bad conversion is just about zero. The whole issue is caused by a reversed interpretation of a data column and its int-size description vector.
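To make the ambiguity concrete, here is a toy model of the kind of check involved. It is only a sketch and not the code in c4_FormatB::OldDefine: the names `OldFormat`, `Column`, `Plausible` and `GuessOldFormat` are invented for illustration, sizes are modelled as one byte each, and the assumption that 2.0 stores (data, sizes) while pre-2.0 stores (sizes, data) is arbitrary and may be the other way around in the real files.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Toy model only: all names here are hypothetical, not Metakit's API.
enum class OldFormat { Pre20, V20 };

using Column = std::vector<std::uint8_t>;

// A (sizes, data) pairing is plausible when there is one size per row
// and the sizes add up to exactly the number of data bytes.
bool Plausible(const Column& sizes, const Column& data, std::size_t rows) {
    if (sizes.size() != rows) return false;
    std::size_t total =
        std::accumulate(sizes.begin(), sizes.end(), std::size_t{0});
    return total == data.size();
}

// ASSUMPTION: 2.0 stores (data, sizes) and pre-2.0 stores (sizes, data).
// The real order may be the other way around.
OldFormat GuessOldFormat(const Column& first, const Column& second,
                         std::size_t rows) {
    assert(rows > 0);
    // Default to 2.0; fall back to pre-2.0 only when 2.0 cannot be valid.
    return Plausible(second, first, rows) ? OldFormat::V20 : OldFormat::Pre20;
}
```

In this model a two-row view holding one byte per row, with columns {1,1} and {1,1}, is plausible under both readings, so a pre-2.0 file with that content would be taken for 2.0. With rows of 5 and 6 bytes the second column has 11 entries for 2 rows, the 2.0 reading is ruled out, and the guess falls back to pre-2.0.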

There is precisely one way to avoid this issue, if *all* your datafiles are 1.8.6 format (or new ones, i.e. the 2.3.2 beta format): build the new 2.3.2 beta to *always* assume that old datafiles are pre-2.0. This is done by compiling MK 2.3.2 beta with the "-Dq4_OLD_IS_PRE_V2" define enabled. By doing this, your code will fail to work with 2.0x datafiles, but will *always* work properly with 1.8.6 files (auto-converting them on the fly during open and saving the new format during commit, as always).
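As a rough illustration of what such a define does, the sketch below shows how a compile-time flag can pin the policy for old datafiles. Only the macro names q4_OLD_IS_PRE_V2 and q4_OLD_IS_ALWAYS_V2 come from the CHANGES entry; the `OldFilePolicy` enum, the `SelectedPolicy` function and the `#error` guard are assumptions made for this example and do not reflect how src/format.cpp is actually written.

```cpp
// Hypothetical sketch; the real logic lives in src/format.cpp.
enum class OldFilePolicy { AlwaysPre20, AlwaysV20, Detect };

// Picks the policy for old datafiles at compile time, based on which
// define (if any) was passed on the compiler command line.
constexpr OldFilePolicy SelectedPolicy() {
#if defined(q4_OLD_IS_PRE_V2) && defined(q4_OLD_IS_ALWAYS_V2)
#error "Define at most one of q4_OLD_IS_PRE_V2 and q4_OLD_IS_ALWAYS_V2"
#elif defined(q4_OLD_IS_PRE_V2)
    return OldFilePolicy::AlwaysPre20;   // treat every old file as 1.8.6
#elif defined(q4_OLD_IS_ALWAYS_V2)
    return OldFilePolicy::AlwaysV20;     // treat every old file as 2.0
#else
    return OldFilePolicy::Detect;        // examine the data and guess
#endif
}
```

Built without either define, this sketch selects `Detect`; built with "-Dq4_OLD_IS_PRE_V2" it selects `AlwaysPre20`, which corresponds to the behaviour described above.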

So which version should I use?

You do not have to update immediately - this merely describes the suggested approach once you decide to upgrade (you will, at some point, no doubt).

Note A: you can update to 2.3.2 beta or later as is or, if you want to be absolutely certain that every possible datafile conversion works, compile it with the flag "-Dq4_OLD_IS_PRE_V2". If you do, keep in mind that the resulting code will not be able to deal with any 2.0x datafile that has B-type data in it. The benefit is a 100% correct conversion of all 1.8.6 datafiles.

Note B: in this case update to 2.3.2 beta or later as is. As of 29-7-2000, that release will properly convert all 2.0x datafiles, and will succeed in all but the rarest cases when converting 1.8.6 datafiles. The conversion code assumes by default that the datafile is in 2.0x format, and decides to use 1.8.6 only if that assumption cannot possibly be valid. The conversion will never crash (unlike 2.0x, which can choke on 1.8.6 datafiles with B-type properties), nor will it corrupt datafiles. The worst outcome is incorrect data (all data in a B-type property will be incorrect). This is of course a very serious problem (which now turns out to have been in MK since 2.0). To put things into perspective: I have not yet found an example of a datafile which passes the new tests, yet converts incorrectly. The probability of a missed conversion really is nearly zero.