[Metakit] Python, Metakit, Sorting tracking data

Joel Lawhead jlawhead at nvisionsolutions.com
Fri Jun 18 15:39:17 CEST 2004


Hi,

I've seen Metakit mentioned in various places over the last couple of
years, and I finally started using it with Python. It's great! It
definitely fills the large application persistence void between object
serialization or Gadfly and relational database servers that I was
straddling up until a few days ago.

I've been working with it for a few days and have gone through much of
the list archives and the available MK4Py docs, but I still don't quite
have my brain wrapped around it, so I thought I would ask for some
suggestions.

Here's what I'm doing:

On a remote server, tracking data is entered into a database by a 
tracking device about every 3 seconds. My Python program then requests 
data for a block of time in that remote database through a Java servlet 
that returns XML.

This request, usually for about 1 minute of tracking data, happens every 
10 seconds or so.

In each XML document I download, there are tracking "targets", each with
an id, a timestamp, and a location coordinate. I parse the XML and store 
each target and its attributes as one row in a Metakit view.

Next I remove duplicates from the view because there is often overlap in
the incoming data sets.
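
To make the duplicate step concrete, here is a minimal plain-Python sketch of it. This is not Metakit code: the function name drop_duplicates and the tuple row layout (id, date, latitude, longitude, epoch) are assumptions for illustration, and a "duplicate" is assumed to be a row whose fields all match an earlier row.

```python
def drop_duplicates(rows):
    """Return rows with exact repeats removed, keeping first-seen order.

    Each row is assumed to be a hashable tuple:
    (id, date, latitude, longitude, epoch).
    """
    # dict keys are unique and keep insertion order (Python 3.7+)
    return list(dict.fromkeys(rows))


# Two overlapping downloads that share one row
batch_a = [(2873, "2004-04-15 20:00:31", 38.275, -75.1579, 100.0),
           (2873, "2004-04-15 20:00:34", 38.275, -75.1580, 103.0)]
batch_b = [(2873, "2004-04-15 20:00:34", 38.275, -75.1580, 103.0),
           (2878, "2004-04-15 20:00:34", 38.2285, -75.1420, 103.0)]
merged = drop_duplicates(batch_a + batch_b)
```

The epoch values in the sample rows are made-up small numbers, chosen only to keep the example readable.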

Within the view, a single target id may have several earlier recorded 
tracks with the same id but different timestamps and locations. We call 
all the data for a target prior to its latest timestamp the target "trail".

I then sort the view by the timestamps to get the latest recorded time.

Here's where I'm starting to get lost. Using that latest timestamp, I 
have to get rid of all targets (and their trails) whose latest record is 
more than 30 seconds before the latest timestamp. Any target that hasn't 
been active in that window is considered to have dropped out of the 
tracking system and should be removed.
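
In plain Python (not Metakit calls), the rule I'm after looks roughly like the sketch below. The function name drop_stale_targets, the dict-per-row layout, and the max_age parameter are assumptions for illustration only.

```python
def drop_stale_targets(rows, max_age=30.0):
    """Drop every row of any target whose newest record is more than
    max_age seconds older than the newest record in the whole data set.

    rows: list of dicts with at least "id" and "epoch" keys.
    """
    if not rows:
        return []
    # Newest timestamp seen for each target id
    latest_per_id = {}
    for row in rows:
        tid = row["id"]
        if tid not in latest_per_id or row["epoch"] > latest_per_id[tid]:
            latest_per_id[tid] = row["epoch"]
    # Newest timestamp overall, and the cutoff derived from it
    cutoff = max(latest_per_id.values()) - max_age
    # Keep a row only if its target is still active; this keeps or
    # drops a target's head and trail together
    return [row for row in rows if latest_per_id[row["id"]] >= cutoff]


sample = [
    {"id": 1, "epoch": 100.0},   # trail point of an active target
    {"id": 1, "epoch": 160.0},   # head of an active target
    {"id": 2, "epoch": 120.0},   # target 2 last seen 40 s before newest
    {"id": 3, "epoch": 130.0},   # exactly 30 s old, so still kept
]
active = drop_stale_targets(sample)
```

Note that the old trail point of target 1 survives because the test is on each target's newest record, not on each row.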

I created a filtered subview which contains the indices of the targets 
whose latest recorded date is less than thirty seconds old compared to 
the latest timestamp.

So I'm not sure where to go from here because I haven't quite figured 
out all of the MK4Py methods yet.

So far we've had roughly 1,000 targets at any given time, each with a 
varying number of trail points. But I don't foresee the database ever 
getting bigger than 200 megs.

So each time I add data to metakit I want to:

1. Throw out duplicates (done)
2. Get the latest timestamp (done)
3. Remove targets and their trails 30+ seconds older than the latest 
timestamp. (not sure)
4. Group the targets by id (no problem)
5. Access each target id by group and sort it by head (the latest point) 
and trail (all the other track points). (not sure)
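
Steps 4 and 5 can be sketched in plain Python as follows. Again this is not Metakit code; the function name split_heads_and_trails and the dict-per-row layout are assumptions for illustration, and the trail is assumed to be wanted newest-first.

```python
from collections import defaultdict


def split_heads_and_trails(rows):
    """Group rows by target id, then split each group into its head
    (newest point) and trail (all older points, newest first).

    rows: list of dicts with at least "id" and "epoch" keys.
    Returns {id: {"head": row, "trail": [row, ...]}}.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row["id"]].append(row)

    result = {}
    for tid, points in groups.items():
        # Sort a copy so the caller's row order is left alone
        ordered = sorted(points, key=lambda r: r["epoch"], reverse=True)
        result[tid] = {"head": ordered[0], "trail": ordered[1:]}
    return result


sample = [
    {"id": 2878, "epoch": 31.0},
    {"id": 2906, "epoch": 37.0},
    {"id": 2878, "epoch": 40.0},
    {"id": 2878, "epoch": 34.0},
]
tracks = split_heads_and_trails(sample)
```

A target with a single record ends up with a head and an empty trail.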

Here's pretty much what the view I have looks like:

    id     date                 latitude       longitude       epoch
  -----  ------------------- -------------  --------------  ------------
   2873  2004-04-15 20:00:31 38.2749996185  -75.1578979492  1087520384.0
   2873  2004-04-15 20:00:34 38.2749996185  -75.1579971313  1087520384.0
   2878  2004-04-15 20:00:31 38.2283992767  -75.1417999268  1087520384.0
   2878  2004-04-15 20:00:34 38.2285003662  -75.141998291   1087520384.0
   2878  2004-04-15 20:00:37 38.2285003662  -75.141998291   1087520384.0
   2878  2004-04-15 20:00:40 38.2285003662  -75.141998291   1087520384.0
   2906  2004-04-15 20:00:37 38.3486003876  -75.0955963135  1087520384.0
   2906  2004-04-15 20:00:40 38.3486003876  -75.0955963135  1087520384.0
   2909  2004-04-15 20:00:31 38.3435001373  -75.1462020874  1087520384.0
   2909  2004-04-15 20:00:34 38.3435001373  -75.1462020874  1087520384.0

I add the "epoch" column for time sorting so I don't run into any 
problems sorting the date column, which is a string.
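
One way to derive such a column from the date string is sketched below, using only the Python standard library. The function name to_epoch is hypothetical, and the sketch assumes the dates are in UTC and always in the "YYYY-MM-DD HH:MM:SS" form shown in the table.

```python
import calendar
import time


def to_epoch(date_string):
    """Convert 'YYYY-MM-DD HH:MM:SS' (assumed UTC) to whole seconds
    since 1970-01-01 00:00:00 UTC."""
    parsed = time.strptime(date_string, "%Y-%m-%d %H:%M:%S")
    # timegm treats the parsed struct_time as UTC and returns an int
    return calendar.timegm(parsed)


first = to_epoch("2004-04-15 20:00:31")
second = to_epoch("2004-04-15 20:00:34")
gap_seconds = second - first  # difference between the two records
```

Keeping the result as an integer number of seconds means two records 3 seconds apart always compare as different values when sorted.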

I'm trying to use metakit for as much of the sorting as possible because 
it's so darn fast.

I haven't quite grasped several of the view operators such as 
"remapwith", "reduce", etc. and haven't found good examples.

Any suggestions on which metakit methods can or can't be used to do the 
five data processing steps above?

Thanks,
Joel





