If you've written more than 10,000 lines of code, it's time to switch languages...

Yes, really. To explain why, let's examine briefly how software development has been evolving.

The first generation

In the beginning (whatever that means), there was assembly code. To get software of any meaningful complexity done, we invented the "macro-assembler": an elaborate tool which allowed you not just to churn out code instruction by instruction, but to define clever macros to handle much of the recurring work: argument passing, data initialization, stack cleanup, and more. I'll call this the first generation of software development. One line per machine code instruction (fewer, once sophisticated vendor-supplied macro libraries were introduced).

The second generation

Then came compiled languages, such as FORTRAN, ALGOL, and the revolution of C. To get even more complex software done, we invented "object code libraries": self-contained collections of functionality, ready to be tapped by including headers and linking in the code. This mechanism is still widely used today, with the ability to load libraries dynamically as a later refinement.

But code libraries are not always flexible enough. If the designer of a library did not build in ways to extend it, you were stuck: the only way out was to code around it (or just live with it), or to get hold of a copy of the source code, make some alterations, and use the modified copy instead.

The third generation

Time for a new approach. Welcome to object-oriented application frameworks: an elaborate body of code which is "almost there", designed with lots of ways to plug in the specific functionality you need. Long live derived classes and member function overrides. This approach is still used, and is probably the predominant mode of developing new applications today. It's the "modern" way...
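
To make that plug-in mechanism concrete, here is a minimal sketch - in a scripting language rather than C++, and with made-up class and hook names - of how a framework inverts control: the framework owns the main loop, and the application only fills in the hooks it chooses to override.

    class Framework:
        """The 'almost there' code: it owns the control flow."""

        def run(self, items):
            self.on_start()
            for item in items:
                self.on_item(item)      # hook, overridden by the application
            self.on_finish()

        # Default (empty) implementations of the hooks.
        def on_start(self): pass
        def on_item(self, item): pass
        def on_finish(self): pass

    class MyApp(Framework):
        """Application-specific code: nothing but the overridden hooks."""

        def on_start(self):
            self.total = 0

        def on_item(self, item):
            self.total += item

        def on_finish(self):
            print("total:", self.total)

    MyApp().run([1, 2, 3])              # prints: total: 6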

The problem with frameworks is the learning curve. Though frameworks promise to solve all the nitty-gritty details for you, the reality is that you need to know exactly how they do that to be able to plug in your own code. Many people who use frameworks without investing in this deeper knowledge end up fighting the very framework which was supposed to take care of things: "impedance mismatches" everywhere, and workarounds for "flaws" (which are often simply different assumptions about how to code things). There are usually many ways to achieve the same result... the trouble is that with frameworks, you had better make sure you understand how the framework designer intended things to be done.

The fourth generation

Time for a new approach. More abstraction. Welcome to scripting. Instead of choosing a single conceptual level, and the corresponding language/tool, you start at the level which is most productive: Python, Tcl, Perl, ... each of these has a large following of people who can easily point out how tremendously productive they are. Unlike frameworks, these languages are simple to learn and to use. Exceptions, good error tracebacks, interactive tinkering, introspection - these are some of the ingredients which get you going fast, and keep you running ever after. If you need to drop down to the bare metal - to obtain more performance, or to connect to services which are only available in C, for example - then you can. For some, that is an absolute necessity; for many others it almost never happens. Scripting, with all of today's libraries and well-developed "gluing" capabilities, often gets the work done. An example: take an efficient database or a GUI, "glue" it into a scripting language (others will probably already have done it for you), and you'll probably end up having plenty of speed.
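
As one small illustration of such gluing (using Python's sqlite3 binding to a database engine written in C purely as an example, not as anything this argument depends on): the handful of lines below is all the application-level code needed to store and query some records.

    import sqlite3

    # The database engine itself is efficient C code; the script only "glues"
    # it in and states what should happen, not how.
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE people (name TEXT, age INTEGER)")
    con.executemany("INSERT INTO people VALUES (?, ?)",
                    [("alice", 31), ("bob", 27), ("carol", 45)])

    for name, age in con.execute("SELECT name, age FROM people WHERE age > 30"):
        print(name, age)                # alice 31, carol 45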

Why scripting is different

How can it be that scripting works so well? Isn't it just another way of ending up with complex software? How does it differ from modern frameworks?

The key concept which explains this is "abstraction". Scripting is not a horizontal extension to C, which is almost always the implementation language of the script compiler or interpreter. You don't end up with more C, you end up with a context where you can forget about C! While the script implementation is C, and deals with pointers, memory allocation, and error conditions - it does this so you never have to again. Strings just work. Exceptions are embedded into the language in a natural (and robust) way. Stray pointers can no longer mess up what you have coded. You have left the efficient machine-oriented realm of C, and entered a world where crashes become virtually non-existent.
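
A few lines of script make the contrast tangible; every step below would mean explicit allocation, length bookkeeping, and error-code checking in C:

    # Each of these steps would involve manual memory management and explicit
    # error codes in C; here the language's own abstractions do it all.
    words = "the quick brown fox".split()
    print(", ".join(w.upper() for w in words))   # THE, QUICK, BROWN, FOX

    try:
        value = int("not a number")     # a bad conversion...
    except ValueError as err:
        print("recovered from:", err)   # ...raises a clean exception instead
                                        # of corrupting memory or crashing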

And scripting goes further. Instead of learning how to code I/O one way in one machine environment, and differently in another - you get a platform-independent environment where OS differences largely disappear. Even GUI differences tend to disappear below the surface - an application scripted using Tk works on Unix, Windows, and Macintosh - requiring hardly more attention than tweaking for some font and screen size differences. Stop filling your mind with platform-specific details... it's a waste of your gray cells, and worse: such details tend to become obsolete the minute you learn about them.
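
A complete, if trivial, cross-platform GUI script looks roughly like this (shown with Python's Tk binding; the Tcl version would be just as short):

    import tkinter as tk

    # The same few lines produce a window on Unix, Windows, and Macintosh;
    # the platform-specific plumbing stays below the surface.
    root = tk.Tk()
    root.title("Hello")
    tk.Label(root, text="The same script runs on every platform.").pack(padx=20, pady=10)
    tk.Button(root, text="Quit", command=root.destroy).pack(pady=10)
    root.mainloop()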

What about a fifth generation?

The fundamental technique of using a language to build the perfect environment in which to write an application can be repeated. Scripting languages illustrate how far one can go in defining general-purpose languages. It is only logical to expect this to be repeated for application- and domain-specific issues. If scripting is so useful because one can leave C behind and get more real work done, then why stop there? Scripting can be used to define more powerful tools for domains such as software development and networking, but also banking, accounting, telephony, collaboration, and a host of other industry-specific areas. Yes, there will be a fifth generation - but with a difference: there will be several fifth-generation environments, all built with a mix of scripting and interfaces which "drop down" to domain-specific "legacy" code.
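
To hint at what such a domain-specific layer could look like, here is a deliberately tiny sketch: a made-up "banking" mini-language, interpreted by a dozen lines of script. Every command and name in it is invented for illustration - the point is only that the domain-level text stays small and readable, while the scripting layer (and the C underneath it) does the actual work.

    # A made-up "banking" mini-language: the domain-level program text is
    # tiny and readable, the scripting layer below does the real work.
    program = """
    open alice 100
    open bob 50
    transfer alice bob 30
    report
    """

    accounts = {}

    def run(source):
        for line in source.strip().splitlines():
            cmd, *args = line.split()
            if cmd == "open":                   # open NAME AMOUNT
                accounts[args[0]] = int(args[1])
            elif cmd == "transfer":             # transfer FROM TO AMOUNT
                frm, to, amount = args[0], args[1], int(args[2])
                accounts[frm] -= amount
                accounts[to] += amount
            elif cmd == "report":               # print every balance
                for name, balance in accounts.items():
                    print(name, balance)

    run(program)                                # alice 70, bob 80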

This form of layering takes place in more ways than just "up", however. In the early days of personal computers - in fact before the term PC was even coined - there were several small-scale environments, the best known being BASIC and FORTH. Interestingly enough, both use a form of layering very much like the one just described. There have been several "Tiny Basic" implementations which used two levels of interpretation - fitting entirely in just a few kilobytes of "code" (or more accurately: interpreter tokens / threaded-code instructions).
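
That same two-level trick is easy to demonstrate with a scripting language standing in for Tiny Basic (the token set below is invented purely for illustration):

    # Two levels of interpretation: this script is itself interpreted
    # (level one), and in turn interprets a tiny token program (level two).
    tokens = ["PUSH", 2, "PUSH", 3, "ADD", "PRINT"]

    stack = []
    i = 0
    while i < len(tokens):
        op = tokens[i]
        if op == "PUSH":
            stack.append(tokens[i + 1]); i += 2
        elif op == "ADD":
            b, a = stack.pop(), stack.pop(); stack.append(a + b); i += 1
        elif op == "PRINT":
            print(stack.pop()); i += 1          # prints 5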

10,000 lines of code

This brings me back to the statement that one should not write more than 10,000 lines of code (LOC) in any language. Large software projects are still very commonplace (and the graveyard of failed projects is vast...). When Mozilla (the Netscape browser source code) is described as a multi-million-LOC project, it illustrates how far we have come.

And how far we have strayed... Ideally, a million lines of code ought to be replaced by two thousand lines of code: one thousand to define a project-specific "tiny" language, and one thousand lines written IN that language to implement the rest of the functionality. This is totally unrealistic, of course. A better scale would be to state that the original one million lines of code could be re-coded as 20,000 lines of code: 10,000 for a script language implementation (special purpose, perhaps), and the other 10,000 lines in that language to implement the core of the application.

You should try it! Next time you are involved in a large software project, try to rethink it as a core engine dedicated to making most of the application-specific work trivial to write... (and maintain, and extend, and explain, and document).

Hundreds of thousands of programmers are doing just that, and proving it works. And most of them don't even need to write the engine: Python, Tcl, and Perl are there to make it happen with a fraction of the effort usually associated with large projects.

No one in their right mind should ever have to deal with more than 30,000 lines of code in a single project. Actually, I'll rephrase that: no mortal being can deal with more complexity. I am absolutely convinced that we all have a 15-bit program counter in our heads - at best!

© April 1999