1 · Introduction
LuaVlerq is a Lua extension for managing and storing structured datasets.
This document is preliminary; some parts are incomplete or inaccurate.
2 · Getting started
Get the latest source code from the Subversion repository:
svn co svn://svn.equi4.com/vlerq/branches/v7 vlerq
Edit vlerq/src/config as needed, then proceed as follows:
    cd vlerq/src
    make
    make test      # optional
    make install   # optional
On to the first example:
    $ lua
    Lua 5.1.2  Copyright (C) 1994-2007 Lua.org, PUC-Rio
    > require 'vq'
    > vq{1,2,3}:p()
      ?
      -
      1
      2
      3
    >
Explanation: vq is the name of the LuaVlerq module and also the interface through which all objects are constructed. This example converts a Lua table into a LuaVlerq object and prints it using its "p" operator.
Here is a slightly more elaborate example:
    > (vq{meta='A';1,2,3}..vq{meta='B';4,5,6}):reverse():p()
      A  B
      -  -
      3  6
      2  5
      1  4
    >
Explanation: two tables are used, each with explicit column names. Once converted to LuaVlerq objects, they are "paired" (i.e. rows concatenated in the horizontal direction). The result is a new object, of which the rows are printed in reverse order.
Note: this example defines A and B as string columns, which is not quite the same as the first example. To force the column type to integer, use meta='A:I' and meta='B:I'.
These examples only scratch the surface of LuaVlerq, illustrating how operations are applied to combine and re-organize data structures in different ways. The p() method is mostly a convenience for debugging; in normal use the resulting objects would be saved in variables for further use.
3 · Concepts
LuaVlerq introduces a bit of terminology, all of which is essential to understand how LuaVlerq works with its data and what its operators do. The key terms and concepts are described in this section.
3.1 · Views
The "view" is the central data structure in LuaVlerq. It is a rectangular collection of cells, with rows and columns. A cell can store at most one value. Rows are indexed by their ordinal position in the view. Columns are identified either by their ordinal position or by name.
Row and column positions are 1-based in LuaVlerq (but 0-based internally), i.e. rows count from 1 through <numrows> and columns count from 1 through <numcols>. Negative values are treated as relative from the end, i.e. row -1 is the last row.
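The end-relative rule can be illustrated in plain Lua. The helper below is hypothetical and not part of LuaVlerq; it only sketches how a negative position might be translated into a 1-based one, assuming n is the number of rows (or columns).

```lua
-- Hypothetical helper, not a LuaVlerq function: map a 1-based or
-- end-relative position onto the range 1..n.
local function normalize(pos, n)
  if pos < 0 then
    return n + 1 + pos   -- -1 becomes n, -2 becomes n-1, and so on
  end
  return pos
end

print(normalize(1, 5))    -- first row
print(normalize(-1, 5))   -- last row, i.e. position 5
```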
Rows are uniquely identified by their position. When rows are inserted or deleted in the middle of a view, all following row positions change accordingly.
Column names may be empty and need not be unique. In the case of duplicate column names, column access by name is not well-defined.
Columns are typed: all values in a single column must be of the same data type. The types supported are: I = integer (up to 32b), P = 1-based positions (up to 32b), L = longs (64b), F = floats (32b), D = doubles (64b), S = strings (0-terminated), B = bytes (arbitrary byte sequences), and V = (sub)views.
3.2 · Rows
In addition to view objects, LuaVlerq supports lightweight row objects. These are (view,index) pairs which identify a row in a view. Being objects, rows can be passed around as needed. A row in a view can be treated as a collection of values in named fields, and as such very much acts like an object itself, whose state happens to be stored in a view.
3.3 · Empty views
There are two edge cases for views: without rows and without columns. Both are allowed (and so is their combination: the 0-row/0-column view). A 0-row view is simply a structured collection with no data (yet). A 0-column view is slightly unusual in that it cannot contain any values. This case is less common, but it turns out that such 0-column views are in fact quite useful. A 0-column view with N rows is similar to the integer value N. For example: the relational product of two 0-column views with N and M rows produces a 0-column view with N*M rows.
Conversely, in most places where an integer argument is expected in LuaVlerq, a view can be given instead. This is equivalent to passing the number of rows of that view as an integer.
3.4 · Subviews
A subview is a view stored as a value in another view. Subviews are values, not references: circular/recursive subview chains are not possible.
3.5 · Metaviews
Each view in LuaVlerq has a structure which is also described using a view, called the "metaview". For each column in view v there is a row in its metaview which describes that column: its name, type, and subview structure.
Given that metaviews are views, they too have metaviews (the "metametaview"). But since all metaviews have the same structure, there is exactly one such metametaview. Though rarely needed, it exists so that the entire view hierarchy is well-defined.
3.6 · Operators
As objects, views have access to a wide range of view operators or "vops". The built-in ones are described later in this document, but vops can also be defined by the application. Once defined, such custom vops become indistinguishable from the built-in ones. Vops can be defined in Lua or C.
3.7 · Storage
LuaVlerq supports disk-backed storage and network-based data-exchange. Any view structure can be stored on file or sent across a communications channel in a compact format. Views can be loaded on-demand, whereby data only gets read off the disk when needed. There is a separate Storage section describing these capabilities.
4 · View operators
There are many pre-defined view operators and additional custom ones can be defined at any time. The normal way to access these operators is via Lua's ":" method-call convention, but V:someop(...) can also be written as "vq.someop(V,...)".
The view operators described in this section are grouped in several categories to place related ones near each other.
4.1 · View setup and info
Several important functions defy categorization - these are described first:
vq(T[,M])
Convert table T into a view. Only the list portion is used if M is absent, i.e. the entries stored at T[1] through T[#T]. If T.meta does not exist then T is converted to a view with a single integer column, otherwise T.meta is used as metaview. If T.meta defines two or more columns, all values will be taken from T in row-wise order. If metaview M is present, it defines the 1- or 2-column view structure (any T.meta will then be considered a regular entry). With 1 column only the key part of T is used, else all key/value pairs of T will be used, each in their own column.
vq(N[,M])
Create a new view with N rows. If M is a string or a view, it is used as metaview describing the view structure, otherwise return an empty 0-column/N-row view.
vq(S)
Convert a description string S into a metaview.
vq.s2t(S)
Convert a type string to the integer code as stored in metaviews.
vq.t2s(T)
Convert an integer type code to a string. Currently, the code is always 0..8 and converts to one of "N" (nil), "I" (integer), "P" (position), "L" (long), "F" (float), "D" (double), "S" (string), "B" (bytes), or "V" (view).
tostring(V)
Return a short string with the type, size, and structure of view V.
V:calc(M,F)
Return a "calculated" view, with the same number of rows as V and the structure specified by metaview M. For each item access, function F is called with the row and column index as arguments. The result must have the type specified in M.
V:describe()
Return a description of the structure of V, which can be either a view or a metaview.
V:dump()
Return a nicely-formatted tabular version of the contents of view V as a string. Subviews are shown as a row count.
V:html()
Return an HTML rendering of the contents of view V as a string. Unlike the dump() operator, this recursively renders the contents of subviews as well.
V:meta()
Return the metaview of V, describing its structure.
V:p()
Display view V on stdout. Implemented as: "print(V:dump())".
V:rename(X)
Return a view with some columns of V renamed according to X, which can be a table or a string (the latter is shorthand for {[-1]=X}). Keys in X must be column numbers or names, values must be valid column names. Can also be written as: V%X.
V:table()
Return a table with the rows of V. If V has one column, return its values as index entries 1..#V. With two columns, use the first as keys and the second as values.
V:width()
Return the number of columns in V. Implemented as: "#V:meta()".
4.2 · Lua operators
Some common operations are defined as Lua operators:
#R
Given a row object R, return its index.
R()
Given a row object R, return its view.
#V
Return the size of view V, i.e. the number of rows it contains.
V[N]
Return row N of view V as a new row object. Valid row numbers are from 1 through #V, with negative values interpreted as being end-relative.
V[W]
Return a view with rows from view V "mapped" by the indices in the first column of view W. The number of rows in the resulting view is the number of rows in W.
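Row mapping underlies many of the operators described later, so a small model may help. The sketch below does not call LuaVlerq: a plain Lua array stands in for a view, a second array of 1-based row numbers stands in for the map W, and the function name remap is made up for illustration.

```lua
-- Model only: "rows" is a Lua array standing in for a view,
-- "map" is an array of 1-based row numbers standing in for W.
local function remap(rows, map)
  local out = {}
  for i, pos in ipairs(map) do
    out[i] = rows[pos]   -- row i of the result is row map[i] of the input
  end
  return out
end

local v = { "a", "b", "c" }
local r = remap(v, { 3, 1, 1 })
print(table.concat(r, " "))   -- c a a
```

As in the description above, the result has as many rows as the map, and a row may be picked more than once.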
V..W
Return a new view with all columns of V on the left and all the columns of W on the right. Implemented as: V:pair(W).
V+W
Return a view with all rows of V followed by all rows of W. V and W must be compatible. Implemented as: V:plus(W).
V/X
Return a view with columns from view V "mapped" according to the number, string, or view X. Implemented as: V:colmap(X).
V%X
Return a view with some columns of V renamed, as specified by entries of X, which can be a table or a string. Implemented as: V:rename(X).
V(...)
Return a subset of V matching some criteria, implemented as: V:select(...).
4.3 · Get and set values
The contents of rows and views are accessed and modified with the following operators:
R.name
Access the "name" field of row R. Used to get or set the value of the designated field. Setting a field to nil does not delete the entry but sets it to a default value (this is likely to change in a future release). Use the replace() operator to delete rows.
R[N]
Access column N of row R. Used to get or set the value of the designated column. Do not confuse with V[N] view indexing, which returns a row.
V:append(...)
Append one or more rows to V. The number of arguments must be a multiple of the width of V, and the type of each one must match the corresponding column type in V.
V:replace(I[,N[,W]])
Replace N rows starting at index I in view V with all rows in view W. W must be compatible with V. When N is zero, this inserts rows. When W is nil or absent, this deletes rows (N defaults to 1). Index I can be #V+1 to append at the end.
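The three uses of replace() (replace, insert, delete) can be modelled on a plain Lua array. This is only a sketch of the rules stated above, not LuaVlerq code; unlike the real operator it returns a new array instead of changing a view, and the function is a stand-in written for this example.

```lua
-- Model only: replace n rows of "rows" starting at position i by the rows of w.
local function replace(rows, i, n, w)
  n = n or 1          -- N defaults to 1
  w = w or {}         -- no W means: delete
  local out = {}
  for k = 1, i - 1 do out[#out + 1] = rows[k] end       -- rows before i
  for k = 1, #w do out[#out + 1] = w[k] end             -- replacement rows
  for k = i + n, #rows do out[#out + 1] = rows[k] end   -- rows after the gap
  return out
end

local v = { 1, 2, 3, 4 }
print(table.concat(replace(v, 2, 2, { 9 }), " "))      -- 1 9 4
print(table.concat(replace(v, 2, 0, { 7, 8 }), " "))   -- 1 7 8 2 3 4
print(table.concat(replace(v, 2), " "))                -- 1 3 4
print(table.concat(replace(v, #v + 1, 0, { 5 }), " ")) -- 1 2 3 4 5
```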
4.4 · Vector operators
One way to deal with views is to treat them as vectors of rows or columns. The following operators provide some primitives and wrappers:
V:blocked()
This operator allows access to Metakit's "blocked" views.
V:colmap(X)
Return a view with columns from V as specified in X, which can be a column number or name, or a 1-column view of column indices. Can also be written as: V/X.
V:find(C,...)
Return the row of V for which condition C is satisfied. Uses select() and returns nil unless exactly one row is found.
V:first(N)
Return the first N rows of V. Implemented as: V:pair(N).
V:last(N)
Return the last N rows of V. Implemented as: V:reverse():pair(N):reverse().
V:omit(W)
Return a view containing all rows of V except those listed by index in map W. Implemented as: V[V:omitmap(W)].
V:omitcol(X)
Return a view containing all columns except X. Uses colmap() and omitmap().
V:omitmap(W)
Return a 1-column view containing all indices except those listed by index in the first column of W. The result has at least #V-#W rows (more if W contains duplicates).
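The row-count remark for omitmap() is easiest to see with a small model. The sketch below uses plain Lua arrays and assumes 1-based indices; it is not the library's implementation, and it takes the row count n directly instead of a view.

```lua
-- Model only: return all positions 1..n that are not listed in w.
local function omitmap(n, w)
  local skip = {}
  for _, pos in ipairs(w) do skip[pos] = true end
  local out = {}
  for pos = 1, n do
    if not skip[pos] then out[#out + 1] = pos end
  end
  return out
end

-- 5 rows, 3 entries in w, but position 4 is listed twice:
local m = omitmap(5, { 2, 4, 4 })
print(table.concat(m, " "))   -- 1 3 5
```

Here the result has 3 rows, one more than 5-3, because the duplicate entry omits nothing new.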
V:pair(...)
Return a view containing all the columns of view V, followed by all the columns of each of the views passed as argument to pair(). The result has as many rows as the smallest view. V:pair(W) can also be written as: V..W.
V:plus(...)
Return a view containing all the rows of view V, followed by all the rows of each of the views passed as argument to plus(). All views must have a compatible structure. V:plus(W) can also be written as: V+W.
V:reverse()
Return a view with all rows from view V in reversed order.
V:rows()
Return a generator for looping over all the rows in V.
V:spread(N)
Return a view with each row in view V repeated N times.
V:step([O[,S[,R]]])
Return a 1-column view with the same number of rows as view V. The integer in row position I is "O+(I/R)*S", where offset O defaults to 0, and both step S and rate R default to 1. Integer division is used (R must be at least 1).
V:times(N)
Return a view with N times the contents of V, e.g. V:times(2) is like V:plus(V).
4.5 · Set operators
With unique rows, these set operators return the usual results. With duplicate rows, each of these operators is also unambiguously defined:
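The map-based definitions used by the operators in this section can be modelled with plain Lua arrays. The sketch below is not LuaVlerq code: each array element stands in for a one-column row, the helper names lookup, matchmap, and remap are made up, and only inputs with unique rows are shown because the treatment of duplicates is not spelled out here.

```lua
-- Model only: each "view" is a Lua array of single-value rows.
local function lookup(w)
  local present = {}
  for _, x in ipairs(w) do present[x] = true end
  return present
end

-- Row numbers of v whose value is (keep=true) or is not (keep=false) in w.
local function matchmap(v, w, keep)
  local present, map = lookup(w), {}
  for i, x in ipairs(v) do
    if (present[x] == true) == keep then map[#map + 1] = i end
  end
  return map
end

local function remap(v, map)
  local out = {}
  for i, pos in ipairs(map) do out[i] = v[pos] end
  return out
end

local function except(v, w)    return remap(v, matchmap(v, w, false)) end
local function intersect(v, w) return remap(v, matchmap(v, w, true)) end

local function union(v, w)     -- modelled as V + W:except(V)
  local out = {}
  for _, x in ipairs(v) do out[#out + 1] = x end
  for _, x in ipairs(except(w, v)) do out[#out + 1] = x end
  return out
end

local v, w = { 1, 2, 3, 4 }, { 3, 4, 5 }
print(table.concat(except(v, w), " "))      -- 1 2
print(table.concat(intersect(v, w), " "))   -- 3 4
print(table.concat(union(v, w), " "))       -- 1 2 3 4 5
```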
V:except(W)
Return the rows of view V which are not present in view W. Implemented as: "V[V:exceptmap(W)]".
V:exceptmap(W)
Return a 1-column map of row numbers in view V, for those rows which are not present in view W.
V:intersect(W)
Return the rows of view V which are also present in view W. Implemented as: "V[V:isectmap(W)]".
V:isectmap(W)
Return a 1-column map of row numbers in view V, for those rows which are also present in view W.
V:union(W)
Return all rows of view V, followed by those rows in view W which are not present in view V. Implemented as: "V+W:except(V)".
4.6 · Relational algebra
The Relational Algebra operators make it possible to express various queries:
V:product(W)
Return the cross product of V and W. Implemented as: V:spread(W)..W:times(V).
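The way spread(), times(), and pairing combine into a cross product can be traced with a small model. The code below is a sketch using nested Lua arrays in place of views (an array of rows, each row an array of values); the functions are stand-ins written for this example, and the row counts #w and #v play the role of the views passed where an integer is expected.

```lua
-- Model only: a "view" is an array of rows, a row is an array of values.
local function spread(v, n)   -- each row repeated n times, back to back
  local out = {}
  for _, row in ipairs(v) do
    for _ = 1, n do out[#out + 1] = row end
  end
  return out
end

local function times(v, n)    -- the whole view repeated n times
  local out = {}
  for _ = 1, n do
    for _, row in ipairs(v) do out[#out + 1] = row end
  end
  return out
end

local function pair(v, w)     -- columns side by side, rows of the smaller view
  local out = {}
  for i = 1, math.min(#v, #w) do
    local row = {}
    for _, x in ipairs(v[i]) do row[#row + 1] = x end
    for _, x in ipairs(w[i]) do row[#row + 1] = x end
    out[i] = row
  end
  return out
end

local function product(v, w)
  return pair(spread(v, #w), times(w, #v))
end

local p = product({ { "a" }, { "b" } }, { { 1 }, { 2 }, { 3 } })
for _, row in ipairs(p) do print(row[1], row[2]) end   -- 6 rows: a 1 .. b 3
```

With rows that have no columns at all, the same model yields #v * #w empty rows, matching the earlier remark about 0-column views.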
V:project(W)
Return the relational projection of V for the columns listed in map W. Implemented as: (V/W):unique().
V:select(C,...)
Return the subset of V for which condition C is satisfied. Can also be written as: V(C,...). Implemented as: V[V:selectmap(C,...)].
V:selectmap(C,...)
Return a 1-column view with indices of the rows in V for which condition C is satisfied. If C is a table, it is used to match on equality of the specified keys. Otherwise, C and all further arguments are matched on equality against the first column(s) of V.
V:where(F)
Return the subset of V for which F returns true. Implemented as: V[V:wheremap(F)].
V:wheremap(F)
Return a 1-column view with indices of those rows in V for which function F returns true. F is called with a row object as argument.
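The split between where() and wheremap() follows the same "build a map, then index with it" pattern used elsewhere. The sketch below models it with plain Lua tables: rows are tables with named fields, and the sample data and field names are invented for the example. It does not call LuaVlerq.

```lua
-- Model only: a "view" is an array of rows, each row a table with named fields.
local function wheremap(v, f)
  local map = {}
  for i, row in ipairs(v) do
    if f(row) then map[#map + 1] = i end   -- remember the row number
  end
  return map
end

local function where(v, f)
  local out = {}
  for _, pos in ipairs(wheremap(v, f)) do out[#out + 1] = v[pos] end
  return out
end

-- Invented sample data
local people = {
  { name = "Ann", age = 31 },
  { name = "Bob", age = 17 },
  { name = "Cid", age = 45 },
}

local adults = where(people, function(r) return r.age >= 18 end)
for _, r in ipairs(adults) do print(r.name) end   -- Ann, then Cid
```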
4.7 · Groups and joins
Grouping re-shapes collections so that common parts are collected into subviews, whereas ungrouping does the inverse. Joins are like relational joins:
V:group(W)
Return a view which groups the columns of V as indexed by W. The result has a new subview column appended, with rows corresponding to all matching groups.
V:ijoin(W)
Return the inner join of V and W. Implemented as: V:join(W):ungroup(-1).
V:join(W)
Return the natural join of V and W. Matches are collected in a new subview column appended to the result. The join is performed on all columns with matching names.
V:ungroup(N)
Ungroup column N in view V, which must contain subviews. The result "flattens" the subviews and appends their columns to V. Empty subviews cause their parent row to be omitted from the result.
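The flattening done by ungroup() can be pictured with nested Lua arrays. The sketch below is a model under stated assumptions, not the library's code: a view is an array of rows, a subview is simply a nested array of rows stored in column n, and a negative n counts from the last column.

```lua
-- Model only: flatten the subviews stored in column n of every row.
local function ungroup(v, n)
  local out = {}
  for _, row in ipairs(v) do
    local col = n < 0 and (#row + 1 + n) or n
    for _, sub in ipairs(row[col]) do      -- one output row per subview row
      local flat = {}
      for c, x in ipairs(row) do
        if c ~= col then flat[#flat + 1] = x end   -- parent columns
      end
      for _, x in ipairs(sub) do flat[#flat + 1] = x end   -- subview columns
      out[#out + 1] = flat
    end
  end
  return out
end

local grouped = {
  { "x", { { 1 }, { 2 } } },
  { "y", {} },               -- empty subview: this parent row disappears
  { "z", { { 3 } } },
}

for _, row in ipairs(ungroup(grouped, -1)) do
  print(row[1], row[2])      -- x 1, x 2, z 3
end
```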
4.8 · Sort and compare
These operators provide basic sorting and comparison functionality:
V:sort()
Return a sorted version of view V. Implemented as: "V[V:sortmap()]".
V:sortmap()
Return a 1-column map of row numbers in V, such that it would sort view V if the map were used to reorder V. Sorting is stable, i.e. equal rows retain their relative order.
V:unique()
Return a view with all duplicate rows in V omitted. The order of the rows is not affected. Implemented as: "V[V:uniquemap()]".
V:uniquemap()
Return a 1-column map of row numbers in V, for those rows which are not duplicates of any preceding rows.
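Both maps can be modelled on a plain Lua array of single-value rows. This is a sketch, not the library's algorithm. Lua's table.sort is not stable, so the model breaks ties on the original position to get the stable order described above.

```lua
-- Model only: "v" is a Lua array of comparable single-value rows.
local function sortmap(v)
  local map = {}
  for i = 1, #v do map[i] = i end
  table.sort(map, function(a, b)
    if v[a] ~= v[b] then return v[a] < v[b] end
    return a < b             -- equal rows keep their relative order
  end)
  return map
end

local function uniquemap(v)
  local seen, map = {}, {}
  for i, x in ipairs(v) do
    if not seen[x] then      -- first occurrence only
      seen[x] = true
      map[#map + 1] = i
    end
  end
  return map
end

local v = { "b", "a", "b", "c", "a" }
print(table.concat(sortmap(v), " "))     -- 2 5 1 3 4
print(table.concat(uniquemap(v), " "))   -- 1 2 4
```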
4.9 · Custom operators
View operators can be defined by adding them to the global "vops" table:
function vops.myvop (...) ... end
This defines a myvop view operator as a function which gets a view as first argument when called as v:myvop(...).
To cast arguments to a specific type, such as a table to a view or a view to an integer (its row count), append a type annotation to the operator name when defined. E.g. by defining vops.myvop_VVI, a wrapper will be set up around the given function object which casts the first two arguments to views, and the third arg to an integer.
Note: a view operator defined as myvop_VVI ends up being called myvop; this mechanism does not allow defining view operators with underscores in them.
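To make the naming rule concrete, here is a plain-Lua sketch of how such a wrapper might work. None of this is LuaVlerq's actual code: split_vop_name, cast, and wrap are hypothetical helpers, a Lua array stands in for a view, only the "I" cast (view to row count) is modelled, and firstn is an invented operator.

```lua
local unpack = table.unpack or unpack   -- Lua 5.2+ or Lua 5.1

-- Hypothetical: split "myvop_VVI" into the operator name and its type letters.
local function split_vop_name(key)
  local name, types = key:match("^([^_]+)_(%u+)$")
  if name then return name, types end
  return key, ""
end

-- Hypothetical cast: an "I" argument given as a view becomes its row count.
local function cast(arg, t)
  if t == "I" and type(arg) == "table" then return #arg end
  return arg
end

-- Wrap f so that its leading arguments are cast as the annotation says.
local function wrap(key, f)
  local name, types = split_vop_name(key)
  return name, function(...)
    local n = select("#", ...)
    local args = { ... }
    for i = 1, #types do
      args[i] = cast(args[i], types:sub(i, i))
    end
    return f(unpack(args, 1, n))
  end
end

-- Invented operator: the first n rows of v.
local name, firstn = wrap("firstn_VI", function(v, n)
  local out = {}
  for i = 1, math.min(n, #v) do out[i] = v[i] end
  return out
end)

local r = firstn({ 10, 20, 30, 40 }, { "x", "y" })   -- 2-row "view" used as 2
print(name, #r)
```

The last call shows the cast at work: the second argument is a two-row array, and the wrapped function receives the integer 2.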
5 · Storage
These operators support saving and loading views in a compact and portable format, similar to the Metakit database library (under certain conditions compatible with it).
5.1 · Saving
Views can be saved as a (binary) string or to a file:
V:emit()
Return a binary string representing view V, i.e. its structure and contents (including any nested subviews).
V:save(S)
Save view V to a file named S. Return the number of bytes written.
5.2 · Loading
Views can be loaded from a string or from a file:
vq.load(S)
Reconstruct a complete view from a (binary) string, as previously created via emit() or as read from a file created by save().
vq.open(S)
Map a file named S into memory in read-only mode and return a view with the same structure and contents as the original view.