Design principles for programming languages, Part 2a: Readability, Expressibility, Concision, and Regularity

In part 1 of this article, I explained how to design programming languages to help us feeble humans manage complexity. This isn’t a new thought:

Controlling complexity is the essence of computer programming. (Brian Kernighan)

The art of programming is the art of organizing complexity, of mastering multitude and avoiding its bastard chaos as effectively as possible. (Edsger Dijkstra)

Programs must be written for people to read, and only incidentally for machines to execute. (Hal Abelson)

…so I thought I could just quote a bunch of inspirational sayings and be done. No, wait. Some of you asked for more details on the specific criteria I outlined.

So today, we got details. We got half the details; the other half will appear in an upcoming post. Clap for this one if you want to encourage me to finish it faster.

Details can denote craftsmanship, as in this ceiling from the Cuarto Dorado, Alhambra. (source)

Note: As with many things, it’s easier to demonstrate principles by picking on languages that don’t, shall we say, prioritize them, along with some that actively do.

I’m going to use some “silly” languages here (or, more formally, esoteric languages) because I want to explain things in cartoonishly clear terms. Some of these languages are highly optimized for a single criterion, sometimes one that isn’t on my list at all: obfuscation, power, concision, or sheer perversity.

Also, as some of these examples are drawn from MATLAB, let me reiterate that the opinions I present here on Medium are my own, not those of any past or present employer.

Every programming language possesses all of these qualities, in some measure.

Ready? Here goes.

1. Readability

Any fool can write code that a computer can understand. Good programmers write code that humans can understand. (Kent Beck)

Readability measures how easy it is to, well, read a bit of code and figure out what it is doing. Ideally, a programmer can understand what code does simply by reading it; in practice, it’s often necessary to run it and watch, stepping through it in a debugger or sprinkling in print statements, to really understand its behavior. The language itself can help or hurt here, as can the programmer. Obvious wins include clear keywords and conventions that encourage understandable variable names.
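As a small illustration of the naming point, here is a minimal Java sketch (the class and method names are hypothetical). The two methods compute the same average, but only the second tells the reader what it’s for without requiring them to reverse-engineer the loop.

```java
public class Naming {
    // Cryptic: correct, but the reader has to work out the intent from the body.
    static double f(double[] a) {
        double s = 0;
        for (double v : a) s += v;
        return a.length == 0 ? 0 : s / a.length;
    }

    // Same logic; the names say what is computed and from what.
    static double averageTemperature(double[] dailyTemperatures) {
        if (dailyTemperatures.length == 0) {
            return 0; // assumption: an empty series averages to zero
        }
        double total = 0;
        for (double temperature : dailyTemperatures) {
            total += temperature;
        }
        return total / dailyTemperatures.length;
    }

    public static void main(String[] args) {
        double[] week = {20.5, 22.0, 19.5, 21.0, 23.0, 18.0, 20.0};
        System.out.println(f(week) == averageTemperature(week)); // true
    }
}
```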

Unambiguous syntax helps, but is not essential. In MATLAB, for example, there’s usually no way to tell whether fft(x,y) is an indexing expression, a method call, or a function call until the moment the line is executed.

This kind of ambiguity can be useful for hiding implementation details from callers of a function. Most people don’t need to know which operation is happening behind the scenes, and in fact, because of MATLAB’s dispatching rules, the same line might be a function call at one time and a method call at another, depending on the types of x and y. It improves readability by letting the reader operate at the level they care about: an FFT is being computed on this data.

In contrast, perl requires you to mark every indexing expression with a sigil for the type of its result. An element of the array @a is a scalar, so it is written $a[$index].

While $a[$index] is, technically, a scalar value, many programmers get confused because they think of @a as the name of the variable, and in a sense it is: a separate scalar named $a can exist alongside @a as an entirely different variable. This impairs the readability of perl code; the reader must pick out $a[$index] (or an even more complex indexing operation) and mentally connect it with previous uses of @a.

I love perl, but readability is not one of its strong suits; it’s even been called a “write-only language”.

It’s tempting to make things as simple as possible, in the hope that this will make them more readable. The MATLAB example above is a case in point.

However, this can very easily be taken too far. With a little effort, it can be taken WAY too far. Malbolge is… well, it was created to flout this principle. Witness the reference implementation of “HELLO WORLD!”:

You got that, right?

Alright, here’s a slightly less ridiculous example:

Despite what it looks like, this really is a programming language — v7 of the Inform language, built for writing text-based games. This snippet is an example of room creation, complete with a custom rule (“instead of…”) which prevents the player from taking anything in the room.

But you got all that, because Inform excels at readability.

2. Expressibility

I had a running compiler and nobody would touch it. … they carefully told me, computers could only do arithmetic; they could not do programs. (Grace Hopper)

But, Grace, then anyone will be able to write programs! (reported reaction to the release of COBOL)

Expressibility is the converse[1] of readability. It measures how easy it is to figure out what to type when you’ve got an idea in your head.

Inform 7, despite its excellent readability, has some issues here. A programmer might be tempted to write the above snippet like this:

This is valid English, but it isn’t valid Inform 7 code. Worse, neither the language nor the IDE gives any clear indication of the correct way to express what you meant.

I find Java to have good expressibility: while there are usually a handful of ways to say something in Java, it has a high degree of structure and predictability, which means you can almost tab-complete a full program if you’re using a smart editor like IntelliJ. Say I want to build that same room:
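Here is a minimal Java sketch of that room, assuming a tiny hypothetical Room class (not part of any real library or framework). Room name, description, and rule are illustrative. The point is the shape: a constructor, a typed rule, and method names that an editor can offer as you type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class RoomExample {
    // Hypothetical game model for illustration only.
    static class Room {
        final String name;
        final String description;
        // Each rule decides whether the player may take a given item.
        private final List<Predicate<String>> takeRules = new ArrayList<>();

        Room(String name, String description) {
            this.name = name;
            this.description = description;
        }

        void addTakeRule(Predicate<String> allowsTaking) {
            takeRules.add(allowsTaking);
        }

        // Taking succeeds only if every registered rule allows it.
        boolean canTake(String item) {
            for (Predicate<String> rule : takeRules) {
                if (!rule.test(item)) {
                    return false;
                }
            }
            return true;
        }
    }

    public static void main(String[] args) {
        Room vault = new Room("Vault", "A bare stone room.");
        // Rough counterpart of an Inform "Instead of taking..." rule:
        // refuse every item in this room.
        vault.addTakeRule(item -> false);
        System.out.println(vault.canTake("lamp")); // false
    }
}
```

It’s wordier than the Inform version, but almost every token after `vault.` is something the editor can suggest.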

Languages with good expressibility guide the programmer toward a solution that works. If it feels awkward, it’s often because you’re doing things The Wrong Way. This was my experience when learning MATLAB; my process looked like this for an embarrassingly long time:

  1. I want to read numbers from this file and compute the Fourier transform.
  2. Okay, I wrote a loop that reads in the file, and figured out all the sin and cos calls. But I had to convert from strings to numbers, and it all looks kinda dorky.
  3. Oh, wait, there’s a parameter which reads in the file as numbers.
  4. Oh, there’s an fft function, and it takes arrays as input. Duh.
  5. Oh, hey, if I use that, I can stick it on the GPU with one extra line of code.

And what had been fifteen lines of code is suddenly one or two. That’s powerful, but power often fights with Expressibility: when there is so much you can say with a language, it gets harder to find the one thing you should say.
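For a sense of what step 2 looked like, here is a hedged sketch in Java (not the original MATLAB) of a hand-rolled discrete Fourier transform built from sin and cos calls. It is the direct O(n²) definition, X[k] = Σ x[t]·e^(−2πikt/n), rather than a fast Fourier transform. The file reading and string-to-number conversion are left out.

```java
public class NaiveDft {
    // Direct DFT: returns {real parts, imaginary parts}.
    static double[][] dft(double[] x) {
        int n = x.length;
        double[] re = new double[n];
        double[] im = new double[n];
        for (int k = 0; k < n; k++) {
            for (int t = 0; t < n; t++) {
                double angle = -2 * Math.PI * k * t / n;
                re[k] += x[t] * Math.cos(angle);
                im[k] += x[t] * Math.sin(angle);
            }
        }
        return new double[][] {re, im};
    }

    public static void main(String[] args) {
        double[][] result = dft(new double[] {1, 2, 3, 4});
        for (int k = 0; k < result[0].length; k++) {
            System.out.printf("X[%d] = %.3f %+.3fi%n", k, result[0][k], result[1][k]);
        }
    }
}
```

A library FFT replaces all of this with a single call on the whole array. That is how fifteen lines collapse into one or two.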

Simplicity is alluring when it comes to Expressibility: a simpler syntax seems like it should make expression easier. But the winner for simplest syntax is probably LISP; there are only two and a half rules (some folks would omit the third rule):

And indeed, given these, a programmer can, at some level, understand precisely how to generate a new expression. In practice, however, this leads to confusion at the next level: you know how an expression will look, but are at a loss as to which expression to write. It puts the burden of creative expression entirely on the programmer.
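To see how little syntax there is to learn, here is a minimal s-expression reader in Java. It is a sketch that assumes only two forms, atoms and parenthesized lists of expressions, and leaves out quoting, strings, and comments. It can parse anything you write, but it says nothing about *which* expression you should write.

```java
import java.util.ArrayList;
import java.util.List;

public class SexprReader {
    private final String src;
    private int pos = 0;

    SexprReader(String src) {
        this.src = src;
    }

    // An expression is an atom (returned as a String)
    // or "(" expression* ")" (returned as a List<Object>).
    Object read() {
        skipSpace();
        if (pos >= src.length()) throw new IllegalArgumentException("unexpected end of input");
        char c = src.charAt(pos);
        if (c == '(') {
            pos++;
            List<Object> items = new ArrayList<>();
            while (true) {
                skipSpace();
                if (pos >= src.length()) throw new IllegalArgumentException("missing )");
                if (src.charAt(pos) == ')') {
                    pos++;
                    return items;
                }
                items.add(read());
            }
        }
        if (c == ')') throw new IllegalArgumentException("unexpected )");
        // Atom: everything up to whitespace or a parenthesis.
        int start = pos;
        while (pos < src.length() && !Character.isWhitespace(src.charAt(pos))
                && src.charAt(pos) != '(' && src.charAt(pos) != ')') {
            pos++;
        }
        return src.substring(start, pos);
    }

    private void skipSpace() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    public static void main(String[] args) {
        System.out.println(new SexprReader("(+ 1 (* 2 3))").read()); // [+, 1, [*, 2, 3]]
    }
}
```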

Libraries are a great help here, letting you pull in larger and larger patterns and stitch them together. Using LISP in a library-free environment is a fun exercise (LISP can be written in LISP, after all), but it scores poorly on Summarizability and is a lot of work.

3. Concision

Concision measures how little you have to write to say what you mean. It’s simple to explain, but devilish to accomplish. Ook! is a canonical bad example:

That’s “Hello World” again. Yyyyeah. I don’t want to have to write Ook! code.

Less ridiculously, assembler is perhaps equally verbose, without being perversely so: it’s simply written at the most detailed level.

In contrast, GolfScript is written for concision at all costs. With it, you can win at Code Golf: for example, finding all the primes below a given number in only twenty characters:

(source)
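For contrast, here is a sketch of the same task in Java using a Sieve of Eratosthenes. The GolfScript program may well use a different method. The Java version is nowhere near twenty characters, but each line is easy to read, which is the trade being made.

```java
import java.util.ArrayList;
import java.util.List;

public class PrimesBelow {
    // Sieve of Eratosthenes: returns every prime p with 2 <= p < limit.
    static List<Integer> primesBelow(int limit) {
        List<Integer> primes = new ArrayList<>();
        if (limit < 3) return primes; // no primes below 2
        boolean[] composite = new boolean[limit];
        for (int i = 2; i < limit; i++) {
            if (composite[i]) continue;
            primes.add(i);
            // Mark multiples of i, starting at i*i; smaller multiples are already marked.
            for (long j = (long) i * i; j < limit; j += i) {
                composite[(int) j] = true;
            }
        }
        return primes;
    }

    public static void main(String[] args) {
        System.out.println(primesBelow(30)); // [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
    }
}
```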

Concision is usually in tension with both Readability and Expressibility.

4. Regularity

Although the perl slogan is There’s More Than One Way to Do It, I hesitate to make ten ways to do something. (Larry Wall)

There should be one — and preferably only one — obvious way to do it. (The Zen of Python)

Regularity! It’s not just from fiber any more!

Regularity means that parts of the language recur in roughly the same form, over and over, so they become uninteresting. This is a good thing. Regularity lets the mind separate what’s important from the frame around it.

One way to achieve this is to ensure that parts of the language that look the same also work the same. The inverse is also important: things that look different should work differently. These are the two halves of an important maxim in user experience.

JavaScript, frustratingly and famously, violates this with its confusing equality operators:

Here, == looks like ordinary equality but works quite differently[2] depending on the types of its operands. As a result, the practice of the language has evolved so that === is recommended or required instead, because it works the way most people expect equality to work.

This is because something that should be Regular, and therefore ignorable, became a “gotcha” that you have to think about every time you encounter it. When a construct behaves confusingly differently depending on context, the programmer has to treat each use as a separate case and reason about it individually. In contrast, if you can trust that construct to Do The Right Thing, it fades into the background.

A counter-example would be Java, or other similarly strongly typed languages. This is a mistake that won’t ship in Java code:

because x = 1 is an assignment expression whose value is an int, and an if condition must be a boolean. Your editor will flag it immediately; at worst, you’ll get an error at compile time. So when you look at an error-free condition, you can be confident that you’re looking at the right operator.
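A minimal Java sketch of the point (class and method names are hypothetical): the commented-out condition is the one the compiler rejects. One caveat is that the protection comes from the type system, not from a ban on assignment in conditions. If x were itself a boolean, `if (x = true)` would still compile.

```java
public class ConditionCheck {
    static String describe(int x) {
        // if (x = 1) { ... }  // rejected by javac: the int value of x = 1 is not a boolean
        if (x == 1) {          // the comparison that was actually meant
            return "one";
        }
        return "not one";
    }

    public static void main(String[] args) {
        System.out.println(describe(1)); // one
        System.out.println(describe(2)); // not one
    }
}
```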

(This is also a subset of the UX principle “make errors visible as early as possible”; errors are irregularities, and as such need to look different from error-free code.)

Regularity enhances both Readability and Expressibility, sometimes at the cost of Concision; but the nice thing about verbose regularity is that it’s often easy to teach a computer to fill it in for you. Good Regularity also improves Predictability, because it provides context that both the programmer and the computer can agree on.

Stay tuned!

And that’s it for this installment. Hope you learned something, or at least gave yourself a neck cramp nodding in agreement.

If you enjoyed this, please follow me for more programming insights, random topics, and/or just to artificially inflate my follower numbers.

[1] I didn’t call it “writability”, because that doesn’t really communicate the full challenge of crossing the Gulf of Execution. It’s more than just the writing itself; it’s figuring out how to express what you want to write. Some languages make it easier to think of solutions. More on this in a future post on the intersection of cognitive artifacts and programming languages.

[2] Damningly, it behaves very much like what it looks like, but sometimes means something different in a confusing way. We expect equality to work differently under the hood when comparing numbers and arrays, but in the end, both mean “equal”… except in JavaScript, where == means “if the operands have different types, first coerce them according to a table of rules (for example, strings and booleans to numbers, objects to primitives), and only then compare”.

Obligate infovore. Antiviral blogger. All posts made with 100% recycled electrons, sustainably crafted by artisanal artisans. He/him/his.