Language Design

Unless you are a programmer that only follows specifications written by others when you program, you have designed several languages already, even if you didn't think of it that way. "But I haven't written any programming languages," you say. "I don't even know how an interpreter or compiler works." But that is not what I mean. Let me explain.

Whenever you work with your computer, you are using one or more languages. Languages define the way you as a user interact with components in the computer, and how components interact with each other. "I have actually implemented a couple of user interfaces. Is that what you mean?" Yes. "And one of my programs could be configured with a property file. Do you mean the property file is a language too?" Almost. I would say the property file is written in a language of your design, but the property file itself is a component in the software architecture, not a language. The property language is also implemented by the application reading the file. "But I was only using the standard Java property file format. I didn't design anything." Yes you did. You choose the parameters and what impact they will have on the application. Your property language is, as you correctly point out, an extension of the standard Java property files, but also a unique language. It would be impossible to interpret the meaning of your properties with only the Java property file definition as a reference.

When designing a language, I start to think about what I want to be able to do with the language. I write some statements a potential user would write, or think through a couple of use cases. At the same time, I start to define the nouns and verbs in the language with a model of the terminology used. The language then evolves to handle all requirements from the initial user. I always have an initial user, or I would not design the language at all. The hard part is then to make the language useful for others. If I only have one user, I will have to guess where to generalize and when to specialize. I definitely want to be as specific as possible, or the users will find the language useless for its purpose.

Then, if not earlier, comes the question if the language should be a standalone language with its own syntax and semantics or if it should extend some existing language. In the latter case it could be a programming language API, an XML schema, a GUI, a BEEP protocol, and so on. Another decision is if the language should be defined using a formal language like IDL or ASN.1, or if the language should have an informal description only. In the extreme case, a new formal language description language is developed in parallel. Another popular way to describe the language is to write an interpreter for it using itself, but this only work for Turing complete languages, and probably needs an informal description as well, for anyone to understand it.

The conclusion is that whenever you write a program on your computer, you implement some language at the same time. It doesn't have to be a state of the art generic programming language with the latest type theory, even a simple graphical user interface counts. If you always know which language you are implementing, you have come a long way towards being a successful programmer.

And now something to think about. If the property file is a component, which language does it implement? "Sigh, I don't know. Wait! Don't tell me. It is extending one of my other languages, right?" Right.



This is software. Where is the interpreter?


Components, Software Architecture and Product Structure

I shall begin with some basic concepts that are useful when discussing software architecture and the software development process.

Sequences of binary data are the foundation of software systems. These sequences form the natural components in the software architecture. The binary data are structured according to some language definition and, when interpreted, will give rise to a specific behavior. We say the component uses a given language to implement a new language, which is then available for some other component to use. Components are usually stored in files, but can also be volatile streams.

Note that a component never can do anything by itself. It must always be interpreted by another component which implements the language the former uses. It may seem we are building a castle in the air, or have turtles all the way down, but this time we are lucky. At the bottom is the microprocessor, a component which connects the logical structure of the software architecture to reality, which is what we care about for all practical purposes.

I must now clarify that I do not use the term software component as described in this Wikipedia article, or any other framework for connecting modules together. I mean that any meaningful binary sequence is a software component. It can be a text file, an executable, a network packet, an office document, a source file, a script, an mp3 file, a divx movie, user interaction, and so on.

The languages that connect components are what computing is all about. They are purely abstract constructs that alone define what can be done to reality through the microprocessor. A language can be a communication protocol, a programming language, a video or audio format, an instruction set to a microprocessor, an office document file format, and so on. It is common for languages to be extensions of other languages, for example programming language APIs (AWT/Java), XML schemas (RDF/XML) or protocol layers (TCP/IP).

I don't think I will ever catch up with the state of the art in language theory as it is frequently discussed on Lambda the Ultimate. But I will try, and the obvious read for the beginner is of course Structure and Interpretation of Computer Programs. Language theory is the key to computing.

The software architecture can now be defined as all the components and languages that form the software system, and how they relate to each other. In practice, large parts of the entire structure is omitted when discussing the architecture of a software system. At the bottom is some well known language or runtime environment which is not further explored. It is also common to collect natural components in new abstract components called modules, libraries or packages. This will reduce the software architecture to something that is quite manageable, but still gives us a good overview of the software system.

A software product is then defined as an abstract component, which contain one or more components from the architecture. The reasons to define products are many, but the ultimate purpose is to specify what is delivered to the customer or user. The product can be split into subproducts for reuse or to break dependencies, but this must also be reflected in the architecture, which dictates where product boundaries can be drawn.

Software products are individually tracked with configuration management. They have a version and can be branched. Their quality is tracked with test reports and issues.

The product structure is very similar to the software architecture, but there will be more components in the architecture than products in the product structure. It can be trivial with only one product, or very complicated with hundreds of products and configurations which fulfill every imaginable need. It is up to you to decide where to draw the line.

Mission Impossible: A Wifi Photo Frame that doesn't Suck

In my ongoing project to build a wifi photo frame that doesn't suck I got a new idea. Why not combine an ordinary memory card photo frame and a memory card with built in wifi? Then I could automatically upload photos to the card which would be displayed in the frame. I started to seach the web, but it was not easy to get relevant hits for all the wifi adapters with Compact Flash formfactor. Finally I found Eye-Fi.

But it turns out this product sucks too. Why does everyone keep bundling nice hardware with proprietary services? Just give me the hardware! This is what's wrong with all the wifi photo frames on the market too, not to mention the crap that is Windows Media Connect. We are living on the Internet now and don't want to be locked into a last century desktop system.

Why doesn't anyone understand that an open wifi photo frame would sell in bundles?

What can I contribute myself you ask? My current plan is to connect an eBox-2300 equipped with a CF card to a LCD monitor, install Linux and write a couple of scripts that display a slideshow in the framebuffer. A web frontend could be used to specify RSS feeds or just plain web folders to fetch images from. Why complicate things more?


DVD Backup

This is the script I use to backup my DVDs.
mkdir dvd
dvdbackup -M -i/dev/dvd -odvd
genisoimage -dvd-video -o $1.iso dvd/*
rm -r dvd
The result is an ISO file which is named after the first argument to the script.

Install the required programs with the following command.
apt-get install dvdbackup genisoimage


SPARQL Literal Matching

I use the Jena implementation of SPARQL for a personal photoalbum. I have encountered a use case where SPARQL just can't help me. It is impossible to split a literal into parts. Take this RDF example.
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
_:a foaf:name "Johnny Lee Outlaw" .
If I want to use SPARQL to convert this information into a different schema where first, middle and last name are separate properties, it can't be done.
@prefix ns2: <http://some.other/namespace/> .
_:a ns2:firstname "Johnny" .
_:a ns2:middlename "Lee" .
_:a ns2:lastname "Outlaw" .
This could be solved by allowing variables to bind to regexp groups. Something like this.
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX ns2: <http://some.other/namespace/>
?s ns2:firstname ?first .
?s ns2:middlename ?middle .
?s ns2:lastname ?last .
?s foaf:name ?fullname .
FILTER match(?fullname, "([^ ]*) ([^ ]*) ([^ ]*)", "?first ?middle ?last") .
I could of course have missed something that allows me to do what I want. You are welcome to correct me in the comments.

In this case, not being Turing-complete is a major drawback for SPARQL. If it was Turing-complete, the problem would be solvable in some way or another. The Principle of Least Power is very useful for data definition languages, but I doubt a SPARQL query is useful for an interpreter not being a query engine.

The easiest way to create a Turing-complete language is to embed it in a Turing-complete host language. Then it is always possible to go beyond the embedded language and use features from the host language when necessary.