an introduction to ratcl
Tcl is a scripting language with a lot of built-in functionality for general-purpose programming, string handling, networking, and - with Tk - graphical user interfaces. Tcl supports two key data structures: lists and arrays. Tcl 8.5 will add dictionaries, which could be described as "arrays-by-value".
But there is little beyond that, other than the comfort that "everything is a string". To locate items, to reorganize data structures, and to apply transformations over entire datasets, one has to look beyond Tcl itself, using packages such as Tcllib's record and matrix, and database bindings such as Oratcl, SQLite, TclODBC, and Metakit. These are either limited in scope or throw a whole world of databases into the mix. But what if you just want to explore and manipulate some data?
Ratcl (pronounced "radical") is a package for managing, traversing, and combining structured data collections in Tcl.
It provides relational algebra operators such as select/join/project, as well as set operators, and allows you to manipulate and transform arbitrary collections of data in a holistic set-wise manner, instead of having to iterate and loop to find, extend, and alter individual data items.
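To make the set-wise idea concrete, here is a minimal sketch in plain Tcl, without Ratcl. The procs "selectRows" and "projectCols" are hypothetical helpers written for this illustration, not Ratcl commands, and the sample data is made up. They only show what "select" (keep matching rows) and "project" (keep chosen columns) mean when applied to a whole collection at once.

```tcl
# Plain-Tcl illustration; selectRows and projectCols are hypothetical
# helpers, not part of Ratcl.

# select: keep only the rows whose column $col equals $value.
proc selectRows {cols rows col value} {
    set i [lsearch -exact $cols $col]
    set out {}
    foreach row $rows {
        if {[lindex $row $i] eq $value} { lappend out $row }
    }
    return $out
}

# project: keep only the named columns, in the order given.
proc projectCols {cols rows keep} {
    set idx {}
    foreach c $keep { lappend idx [lsearch -exact $cols $c] }
    set out {}
    foreach row $rows {
        set r {}
        foreach i $idx { lappend r [lindex $row $i] }
        lappend out $r
    }
    return $out
}

# Made-up sample data: column names plus a list of rows.
set cols {name city age}
set rows {{ann paris 31} {bob rome 45} {cat paris 27}}

set parisians [selectRows $cols $rows city paris]
set names     [projectCols $cols $parisians {name}]
puts $names   ;# prints: ann cat
```

The calling code never loops over individual items; the loops live inside the helpers. Ratcl aims to provide operators of this kind ready-made, so that the looping disappears from your scripts altogether.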
This is a brief introduction with examples to illustrate the concepts of Ratcl. It will also show you how to try out Ratcl yourself, and use it with your own data.
Although Ratcl is in its infancy, relational algebra definitely is not: it has been the foundation underneath relational databases, SQL, and more for many decades. Ratcl is not tied to any particular storage choice, so it can be combined with whatever storage you already use.
Structured data in Ratcl is maintained in data collections called "views".
Views are shaped as rectangular matrices: N rows and M columns, containing NxM items. Views are homogeneous in the (vertical) column-wise direction: all entries in the same column, across all rows, must have the same type (integer, string, etc). The (horizontal) row-wise direction can be heterogeneous: each column can have a different type - or to put it differently: the items in a row need not all be of the same type.
Rows have a zero-based position. Columns have unique names. These are used as symbolic tags, though it is also possible to identify columns by their position.
Views are very much like "tables", "arrays of structs", "lists of lists", and such - the terminology depends on what language one uses as reference point. For Ratcl, the central terms are simply: views, rows, columns, and (data) items.
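As a rough mental model, the "lists of lists" analogy can be written out in plain Tcl. This sketch does not use Ratcl: "viewItem" is a hypothetical helper and the values are made up. It only illustrates the shape described above, with zero-based row positions and columns addressed by name or by position.

```tcl
# Plain-Tcl model of a view: a list of column names plus a list of rows.
# viewItem is a hypothetical helper, not part of Ratcl.
set columns {A B C}
set rows {
    {1 2 3}
    {4 5 6}
    {7 8 9}
}

# Fetch one item by zero-based row position and by column name or position.
proc viewItem {columns rows rowPos col} {
    if {![string is integer -strict $col]} {
        # Not a number, so treat it as a column name and look up its position.
        set col [lsearch -exact $columns $col]
    }
    return [lindex $rows $rowPos $col]
}

puts "[llength $rows] rows x [llength $columns] columns"
puts [viewItem $columns $rows 0 A]   ;# item in row 0, column "A": prints 1
puts [viewItem $columns $rows 2 1]   ;# item in row 2, column position 1: prints 8
```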
Ratcl is a single-file binary extension. On Windows, it is called "ratcl-0.9.dll"; on Mac OS X and Linux, the file uses the extension ".dylib" and ".so", respectively.
It's available freely and may be used freely. The homepage has download details.
The starkit version of Ratcl runs on several platforms and includes a demo. You need the Tclkit standalone runtime (any starkit-aware installation of Tcl will do) to try it. Running the starkit will give details on how to view and run the demo.
To use Ratcl as a package in your own scripts, source the starkit and package require ratcl (or use a shared lib, e.g. "load ratcl-0.9.dll Thrive").
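A defensive way to do this in a script might look as follows. This is only a sketch: the file name "ratcl.kit" is an assumption, so substitute the path of the starkit you actually downloaded. The "catch" wrapper lets the script report a missing file instead of aborting.

```tcl
# Sketch only: "ratcl.kit" is an assumed file name for the starkit;
# replace it with the path of the file you actually downloaded.
set kit ratcl.kit

# catch returns 0 on success, so $failed is 1 when Ratcl could not be loaded.
# On success, $msg holds the result of the last command (the package version).
set failed [catch {
    source $kit
    package require ratcl
} msg]

if {$failed} {
    puts "Ratcl not loaded: $msg"
} else {
    puts "Ratcl loaded, version $msg"
}
```

For the shared-library route, the two commands inside "catch" would be replaced by the "load" command quoted in the text above.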
a first example
This running example illustrates what would happen in an interactive Tcl session. The input is shown in red; the output is shown in bold.
This is the idiom to define a view called "R" and fill it with some data:
The reason for this will become clear later on.
There are 3 columns, unimaginatively called "A", "B", and "C":
Types were not specified, so Ratcl used strings.
It's time to make some changes (see example on the right):
These changes alter the specified view but have no effect on anything done previously with that view; in other words, Ratcl maintains copy-on-write semantics.
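Ordinary Tcl values behave in a comparable way, which may serve as an analogy. The sketch below is plain Tcl with made-up data, not Ratcl's view commands: a value saved earlier is unaffected by later changes to the variable it was copied from.

```tcl
# Plain-Tcl analogy for copy-on-write; this is not Ratcl code.
set R {{1 2 3} {4 5 6}}

set snapshot $R        ;# both variables share one value; nothing is copied yet

lset R 0 1 changed     ;# writing to R makes it diverge from the snapshot
lappend R {7 8 9}

puts $R                ;# prints: {1 changed 3} {4 5 6} {7 8 9}
puts $snapshot         ;# prints: {1 2 3} {4 5 6}
```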
One last step is tidying up:
The above will also clean up all internal data structures when no longer needed. Ratcl uses a fully automatic garbage collection mechanism behind the scenes.
Ratcl operations returning a view can be extended by nesting calls. The example shown here just prints the result; more deeply nested combinations will be shown on the following pages.
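The nesting style itself is standard Tcl command substitution. As a plain-Tcl analogy, using only built-in list commands and made-up data rather than Ratcl operations, each bracketed call returns a list that feeds the next call outward:

```tcl
# Plain-Tcl analogy for nested calls; built-in list commands only.
set rows {{ann 31} {bob 45} {cat 27} {dan 38}}

# The innermost call runs first: sort the rows by age (element 1 of each
# row), then keep the first two rows of the sorted result.
set youngest [lrange [lsort -integer -index 1 $rows] 0 1]

puts $youngest   ;# prints: {cat 27} {ann 31}
```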
As you can see, simple operations are simple to express in Tcl.
With that out of the way, it's time to have a look at manipulating data with some more capable operations: relational algebra and sets.