[Metakit] newbie question - writing derived view back to db
Wolfgang Lipp
paragate at gmx.net
Thu Jan 20 21:26:41 CET 2005
> BK> will get to in a moment. You make a point about the speed of mk
> BK> versus the speed of raw python dictionary,
> I don't think it could ever get as good as operations on disk-based
> data structures being comparable in speed to memory-based data
> structures. It's unrealistic to expect that I think, unless you're
> some sort of Donald Knuth on steroids.
agreed. i was writing under the impression that all of
an mk storage always gets fully loaded into memory on
opening. i am happy to hear this is not the case. my
caching example was only possible in such a simple way
because i know my tables are not too big for memory. i
am thinking about functionality somewhere in my wrapping
class that manages such caching on demand.
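a minimal sketch of what such on-demand caching in a wrapping class might look like. everything here is hypothetical: `CachedTable`, `find` and the `key` parameter are names made up for illustration, not metakit API, and a plain list of dictionaries stands in for the view. the point is only that the lookup index is built on first use rather than on opening:

```python
class CachedTable:
    """Hypothetical wrapper that builds a key -> row-index cache lazily."""

    def __init__(self, rows, key="name"):
        self._rows = rows      # stand-in for a view; here a list of dicts
        self._key = key
        self._index = None     # built on the first lookup, not on opening

    def _ensure_index(self):
        if self._index is None:
            # If a key occurs more than once, the last row with it wins.
            self._index = {row[self._key]: i
                           for i, row in enumerate(self._rows)}

    def find(self, value):
        """Return the cached row whose key equals `value`, or None."""
        self._ensure_index()
        i = self._index.get(value)
        return None if i is None else self._rows[i]

    def append(self, **fields):
        """Append a row and keep an already-built cache in step."""
        self._rows.append(fields)
        if self._index is not None:
            self._index[fields[self._key]] = len(self._rows) - 1


table = CachedTable([{"name": "*0*", "comment": "QfoY", "termnr": 88}])
row = table.find("*0*")   # the cache is built here, on demand
```

for tables too big for memory the dictionary would have to be bounded or dropped again, which this sketch does not attempt.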
> view = storage.getas("test[_B[a:s,b:s,c:s]]").blocked()
> vw.append(('1','2','3'))
> 600000, time: 21.30, delta: 2.78
> Values written, now syncing, time: 22.83
> After syncing: 23.08
> end.
i tried this myself. the results so far are slightly
puzzling to me. here is a short test report:
============================================================
table creation strings: for blocked and unblocked views:
node[_B[name:S,comment:S,termnr:I]]
node[name:S,comment:S,termnr:I]
data: several thousands of rows with nearly identical
content. all data was produced prior to each testrun and
kept in memory throughout. for mode t1, a list of
tuples, for mode t0, a list of dictionaries was
produced::
[
('*0*', 'QfoY', 88),
('*1*', 'dn', 430),
('*2*', 'CJnnTZTLD', 502),
... ]
[
{'termnr': 88, 'comment': 'QfoY', 'name': '*0*'},
{'termnr': 430, 'comment': 'dn', 'name': '*1*'},
{'termnr': 502, 'comment': 'CJnnTZTLD', 'name': '*2*'},
... ]
core code::
stopwatch.start( _inter( '$ax $bx $ex $tx' ) )
if useAppend:
    if useTuples:
        for entry in ENTRYTUPLES:
            targetTable.append( entry )
    else:
        for entry in ENTRYDICTS:
            targetTable.append( **entry )
else:
    if useTuples:
        targetTable[ 0 : ROWCOUNT ] = ENTRYTUPLES
    else:
        targetTable[ 0 : ROWCOUNT ] = ENTRYDICTS
stopwatch.stop()
the test runs created 16 data storages with identical sizes of about 2MB
each.
results::
test run 2, 100'000 rows
TOTAL : 749.9780
a0 b0 x0 t0: 10.1340 ****
a1 b1 x1 t1: 10.3540 *****
a0 b0 x0 t1: 11.0860 *****
a1 b0 x0 t0: 11.7670 *****
a1 b1 x0 t1: 14.6310 ******
a0 b0 x1 t1: 22.5230 **********
a0 b0 x1 t0: 27.2790 ************
a1 b1 x1 t0: 28.9120 *************
a1 b0 x1 t0: 41.5600 ******************
a0 b1 x1 t1: 60.7580 ***************************
a1 b0 x0 t1: 63.1210 ****************************
a0 b1 x1 t0: 65.4740 *****************************
a1 b0 x1 t1: 67.9580 ******************************
a1 b1 x0 t0: 78.3630 **********************************
a0 b1 x0 t1: 109.0270 ************************************************
a0 b1 x0 t0: 114.1740 **************************************************
test run 2, 50'000 rows
TOTAL : 187.3690
a1 b1 x1 t1: 3.7960 ********
a0 b0 x0 t1: 4.0160 ********
a1 b1 x0 t1: 4.4560 *********
a0 b0 x0 t0: 4.9470 **********
a1 b0 x0 t0: 4.9770 **********
a0 b0 x1 t1: 5.3680 ***********
a0 b0 x1 t0: 6.6990 **************
a1 b1 x1 t0: 9.2140 *******************
a1 b0 x1 t0: 11.0960 ***********************
a1 b0 x0 t1: 13.3400 ****************************
a1 b1 x0 t0: 14.8110 *******************************
a1 b0 x1 t1: 16.5630 ***********************************
a0 b1 x1 t1: 16.6240 ***********************************
a0 b1 x1 t0: 17.5650 *************************************
a0 b1 x0 t1: 23.6340 *************************************************
a0 b1 x0 t0: 23.9150 **************************************************
a0 -- use slice assignment (see code)
a1 -- use append with loop (see code)
b0 -- do not use blocked view
b1 -- use blocked view
x0 -- use normal commit mode
x1 -- use extend commit mode
t1 -- use tuples (see code)
t0 -- use dictionaries (see code)
============================================================
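the 16 storages correspond to every combination of the four on/off options in the legend. a small sketch of how such labels can be enumerated; `FACTORS` and `all_configurations` are names made up for illustration and are not part of the test script used for the report:

```python
from itertools import product

# One letter per option, in the order used by the labels in the report:
# a = append loop, b = blocked view, x = extend commit mode, t = tuples.
FACTORS = ("a", "b", "x", "t")


def all_configurations():
    """Yield (label, flags) for each of the 2**4 = 16 option combinations."""
    for bits in product((0, 1), repeat=len(FACTORS)):
        label = " ".join(f"{name}{bit}" for name, bit in zip(FACTORS, bits))
        flags = dict(zip(FACTORS, (bool(bit) for bit in bits)))
        yield label, flags


configs = list(all_configurations())
for label, flags in configs:
    print(label, flags)
```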
there are huge differences in the timings, but i find
myself unable to distill any kind of clear policy for
using metakit from them -- all of the 0s and 1s seem to
be scattered all over the plot for all four options. i
would have expected slice assignment from a list of
tuples on a blocked view in a storage opened with
extend-commit to be fastest, but even if we concede that
the top-runners in both cases somehow corroborate that
expectation, the rest of the rankings do not. furthermore,
the results seem not to allow the interpretation that
these factors act together in a synergetic way. even if
we say that factor b (blocked views) does not kick in
here because even 100'000 rows are not enough, then
still these other factors do not appear to act together.
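one way to look at each option in isolation is to average the timings over the eight runs at each of its two levels. the sketch below does that for the 100'000-row figures copied from the report; `marginal_means` is a name made up for illustration. since every configuration was timed only once, such averages can hint at a trend but cannot settle anything:

```python
from statistics import mean

# Timings in seconds, copied from the 100'000-row run in the report.
TIMINGS = {
    "a0 b0 x0 t0": 10.1340, "a1 b1 x1 t1": 10.3540,
    "a0 b0 x0 t1": 11.0860, "a1 b0 x0 t0": 11.7670,
    "a1 b1 x0 t1": 14.6310, "a0 b0 x1 t1": 22.5230,
    "a0 b0 x1 t0": 27.2790, "a1 b1 x1 t0": 28.9120,
    "a1 b0 x1 t0": 41.5600, "a0 b1 x1 t1": 60.7580,
    "a1 b0 x0 t1": 63.1210, "a0 b1 x1 t0": 65.4740,
    "a1 b0 x1 t1": 67.9580, "a1 b1 x0 t0": 78.3630,
    "a0 b1 x0 t1": 109.0270, "a0 b1 x0 t0": 114.1740,
}


def marginal_means(timings):
    """Return {factor letter: (mean at level 0, mean at level 1)}."""
    result = {}
    for position, factor in enumerate("abxt"):
        levels = ([], [])
        for label, seconds in timings.items():
            # A label looks like "a0 b1 x0 t1"; pick this factor's digit.
            level = int(label.split()[position][1])
            levels[level].append(seconds)
        result[factor] = (mean(levels[0]), mean(levels[1]))
    return result


for factor, (off, on) in marginal_means(TIMINGS).items():
    print(f"{factor}: level 0 mean {off:8.3f}   level 1 mean {on:8.3f}")
```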
the only three interpretations i have to offer right now are:
1) the testing code contains some grave blunder that mars
the results;
2) what is missing here is many test runs in randomly
shuffled order -- perhaps the order in which the
storages were produced is important (i cannot see
how, but i'll try);
3) the results are correct and metakit's behavior *is*
not very predictable.
perhaps someone would be eager to falsify at least the
last hypothesis.
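a minimal sketch of how hypothesis 2 could be checked: run every configuration several times, in a freshly shuffled order each round, and compare means and spreads. `run_shuffled`, `summarize` and the `run_one` callback are hypothetical names; `run_one(label)` is assumed to create the storage and write the rows for one configuration, which is not shown here:

```python
import random
import time
from collections import defaultdict
from statistics import mean, stdev


def run_shuffled(configs, run_one, repeats=5, seed=None):
    """Time each configuration `repeats` times, reshuffling every round.

    `configs` is a list of labels and `run_one(label)` performs one test
    run. Returns {label: [seconds, ...]}.
    """
    rng = random.Random(seed)
    samples = defaultdict(list)
    for _ in range(repeats):
        order = list(configs)
        rng.shuffle(order)   # a different execution order in each round
        for label in order:
            started = time.perf_counter()
            run_one(label)
            samples[label].append(time.perf_counter() - started)
    return dict(samples)


def summarize(samples):
    """Return [(label, mean, standard deviation)], fastest mean first."""
    rows = [(label, mean(ts), stdev(ts) if len(ts) > 1 else 0.0)
            for label, ts in samples.items()]
    return sorted(rows, key=lambda row: row[1])


# Usage with a dummy workload standing in for the real storage test.
samples = run_shuffled(["a0 b0 x0 t0", "a1 b1 x1 t1"],
                       lambda label: sum(range(10000)), repeats=3, seed=1)
for label, avg, spread in summarize(samples):
    print(f"{label}: {avg:.6f} +/- {spread:.6f}")
```

if the ranking still changes from round to round with a large spread, the order of execution or some other outside influence is a likely suspect.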
_wolf