Can I use hashing to speed up searches

Yes, a normal Find works in O(N) time, a Search works in O(log N), and Hashing works in O(1), so there are definitely cases where hashing will provide the best performance by far.

For an example in Python, see "examples/find.py" in the source distribution.

For an example in Tcl, see "examples/mapped.tcl".

For C++, I could not find an example, but here is some (untested) code:

  - create your view as usual, with the keys as ''first'' property(/ies)
      c4_View data = storage.GetAs("data[key:S,val1:S,val2:I]")
  - create a second view to hold the hash mapping:
      c4_View data_h = storage.GetAs("data_H[_H:I,_R:I]")
  - construct a mapping view which implements hashing:
      c4_View hashed = data.Hash(data_h,1)
    (the "1" arg says that the key is only the first property)

Now, the key is to do all accesses through the "hashed" view, as well as all modifications. In other words: do not use data or data_h in your code, other than for setting up the above mapping view. The mapping view will intercept all modifications and apply them to data itself, as well asl adjusting data_h to maintain the proper hashing information (for insertions, modifications, and deletions).

With the above structure, you can use Find to locate rows in O(1) time, on condition that the Find is used with just the key value filled in. In all other cases, Find wil revert to linear scanning. (see also Performance of the 2.4.x mapping views)


TFW - Jan 19, 2003 Just started using Metakit, actually Mk4tcl, and after reading the excellent Roseman paper I was left with the impression that indexed access to a metakit was impossible unless you rolled your own. Not true, as mentioned above. I then dug through the mapped.tcl example, and pieced together a simple example that allows me to have a view of file names, added in any order, and directly access a specific row O(1) using a hash map. The example below should be a lot simpler than the mapped.tcl file for the simplistic case of a simple hash.

e.g.

   mk::file open db $dbLocation -nocommit
   mk::view layout db.catalog {
      File
      Folder
      ...
   }
   mk::view open db.catalog db::_data
   ;# Create a hash map view for direct access on find
   mk::view layout db.catalog_map {_H:I _R:I}
   mk::view open db.catalog_map map
   set dbObj [db::_data view hash map 1]

I can then use the $dbObj object to perform the normal set/get/insert end and other commands while the hash magically works in the background to ensure that a $dbObj find File $someFile access the approprote row in O(1) rather than O(log N) or even worse O(N).

--- Oct 10, 2003

A bonus gained from using Hash is that you don't have to worry about duplicates, since any inserts will automatically replace the previous entry with the same key(s).