[Metakit] Python, Metakit, Sorting tracking data
Joel Lawhead
jlawhead at nvisionsolutions.com
Fri Jun 18 15:39:17 CEST 2004
Hi,
I've seen mention of metakit in various places over the last couple of
years and I finally started using it with Python. It's great! It
definitely fills the large application persistence void between object
serialization or Gadfly and relational database servers that I was
straddling up until a few days ago.
Even though I've been working with it for a few days and have gone
through much of the list archives and the available MK4Py docs, I still
don't quite have my brain wrapped around it and thought I would ask for
some suggestions.
Here's what I'm doing:
On a remote server, tracking data is entered into a database by a
tracking device about every 3 seconds. My Python program then requests
data for a block of time in that remote database through a Java servlet
that returns xml.
This request, usually for about 1 minute of tracking data, happens every
10 seconds or so.
In each xml document I download, there are tracking "targets", each with
an id, a timestamp, and a location coordinate. I parse the xml and store
each target and its attributes as one row in a single metakit view.
Next I remove duplicates from the view because there is often overlap in
the incoming data sets.
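In plain Python terms (not metakit calls), the duplicate removal amounts
to something like the sketch below. The dict-per-row layout and the
function name are only for illustration; they are not part of MK4Py.

```python
def drop_duplicates(rows):
    """Return rows with exact repeats removed, keeping first-seen order.

    Each row is assumed to be a dict with the keys id, epoch, latitude
    and longitude (an illustrative layout, not a metakit structure).
    """
    seen = set()
    unique = []
    for row in rows:
        # Two rows are duplicates when target, time and position all match.
        key = (row["id"], row["epoch"], row["latitude"], row["longitude"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```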
Within the view, a single target id may have several earlier recorded
tracks with the same id but different timestamps and locations. We call
all the data for a target prior to its latest timestamp the target "trail".
I then sort the view by the timestamps to get the latest recorded time.
Here's where I'm starting to get lost. Using that latest timestamp I
have to get rid of all targets (and their trails) whose latest record is
more than 30 seconds before the latest timestamp. Any target that hasn't
been active is considered to have dropped out of the tracking system
and should be removed.
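Spelled out in plain Python rather than MK4Py, the rule I'm after looks
roughly like this. It is only a sketch: the row layout (dicts with "id"
and "epoch" keys), the function name and the max_age parameter are all
made up for illustration.

```python
def drop_stale_targets(rows, max_age=30.0):
    """Keep only rows of targets seen within max_age seconds of the newest row."""
    if not rows:
        return []
    # Latest timestamp across the whole data set.
    newest = max(row["epoch"] for row in rows)
    # Latest timestamp recorded for each individual target id.
    last_seen = {}
    for row in rows:
        target_id = row["id"]
        if row["epoch"] > last_seen.get(target_id, float("-inf")):
            last_seen[target_id] = row["epoch"]
    # A target stays active if its own latest record is recent enough.
    active = {tid for tid, seen in last_seen.items() if newest - seen <= max_age}
    # Keep every row (head and trail) belonging to an active target.
    return [row for row in rows if row["id"] in active]
```

Note that an active target keeps its whole trail, including points much
older than 30 seconds; only targets whose newest point is too old are
dropped.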
I created a filtered subview which contains the indices of the targets
whose latest recorded date is less than thirty seconds old compared to
the latest timestamp.
So I'm not sure where to go from here because I haven't quite figured
out all of the MK4Py methods yet.
So far we've had roughly 1,000 targets at any given time, with trails of
widely varying length. But I don't foresee the database ever getting
bigger than 200 megs.
So each time I add data to metakit I want to:
1. Throw out duplicates (done)
2. Get the latest timestamp (done)
3. Remove targets and their trails 30+ seconds older than the latest
timestamp. (not sure)
4. Group the targets by id (no problem)
5. Access each target id by group and split it into head (the latest
point) and trail (all the other track points). (not sure)
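For steps 4 and 5, what I'm after is roughly the following, written as a
plain-Python sketch rather than with MK4Py view operators. The function
name and the dict layout are just for illustration.

```python
def heads_and_trails(rows):
    """Group rows by target id and split each group into head and trail.

    Returns a dict mapping id -> {"head": newest row, "trail": older rows},
    with the trail ordered from newest to oldest.
    """
    groups = {}
    for row in rows:
        groups.setdefault(row["id"], []).append(row)
    result = {}
    for target_id, points in groups.items():
        # Newest point first, so the head is always element 0.
        points.sort(key=lambda r: r["epoch"], reverse=True)
        result[target_id] = {"head": points[0], "trail": points[1:]}
    return result
```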
Here's pretty much what the view I have looks like:
id date latitude longitude epoch
----- ------------------- ------------- -------------- ------------
2873 2004-04-15 20:00:31 38.2749996185 -75.1578979492 1087520384.0
2873 2004-04-15 20:00:34 38.2749996185 -75.1579971313 1087520384.0
2878 2004-04-15 20:00:31 38.2283992767 -75.1417999268 1087520384.0
2878 2004-04-15 20:00:34 38.2285003662 -75.141998291 1087520384.0
2878 2004-04-15 20:00:37 38.2285003662 -75.141998291 1087520384.0
2878 2004-04-15 20:00:40 38.2285003662 -75.141998291 1087520384.0
2906 2004-04-15 20:00:37 38.3486003876 -75.0955963135 1087520384.0
2906 2004-04-15 20:00:40 38.3486003876 -75.0955963135 1087520384.0
2909 2004-04-15 20:00:31 38.3435001373 -75.1462020874 1087520384.0
2909 2004-04-15 20:00:34 38.3435001373 -75.1462020874 1087520384.0
I add the "epoch" column for time sorting so I don't run into any
problems sorting the date column which is a string.
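An epoch value can be derived from the date string along these lines.
This sketch uses only the standard library and assumes the timestamps
are in UTC, which may not match the tracking server; the function name
is just illustrative.

```python
import calendar
import time


def date_to_epoch(date_string):
    """Convert 'YYYY-MM-DD HH:MM:SS' (assumed UTC) to seconds since 1970."""
    parsed = time.strptime(date_string, "%Y-%m-%d %H:%M:%S")
    # timegm interprets the struct_time as UTC, unlike time.mktime.
    return float(calendar.timegm(parsed))
```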
I'm trying to use metakit for as much of the sorting as possible because
it's so darn fast.
I haven't quite grasped several of the view operators such as
"remapwith", "reduce", etc. and haven't found good examples.
Any suggestions on which metakit methods can or can't be used to do the
five data processing steps above?
Thanks,
Joel