ed(1) is Turing-Complete

Can a mere text editor be powerful enough to calculate anything that a computer can? Sure there’s Emacs, but it’s really just a Lisp interpreter, so it hardly counts. It turns out that even the humble line editor ed(1) is theoretically capable of such computation.

ed(1), or simply ed (the “(1)” comes from the man page section), is the standard text editor. It is the original Unix text editor and is the ancestor, either directly or indirectly, to many modern utilities, including sed, grep and vi. Unlike most editors today, ed is a line editor: instead of presenting the user with a screen full of text as a visual editor would, it uses commands to act on lines in a buffer.

A system that is Turing-complete is a system that can simulate any Turing machine, and a Turing machine is an abstract machine that can compute anything that can be computed with an algorithm (most programming languages are Turing-complete). To show that a system is Turing-complete, one only needs to show that it can simulate another Turing-complete system. Note that Turing completeness implies nothing about the efficiency of the system.

Rule 110 is an example of a Turing-complete system. It is an elementary cellular automaton (similar to Conway’s Game of Life, but in one dimension only) that was shown to be Turing-complete by Matthew Cook around 2000. As an elementary cellular automaton, the Rule defines the life of a cell in the next generation based on its current value and the values of the cells adjacent to it.

I had an idea one day: if you could call ed recursively, you’d be able to implement Rule 110 and prove that ed is Turing-complete.

So I got to work on it. It turns out, yes, it is possible to simulate Rule 110 in ed, showing that ed is Turing-complete.

Here it is, the code in full (as of writing), licensed under the MIT. Instructions on how to run the simulation can be found in the GitHub repository. An overview and a command-by-command walkthrough follows.

$a

.
-t.
.g/^\(.\).*\(.\)$/s//\2&\1/
.g/^.$/s//&&&/
s/./&\
/g
d
?^$?+,$-2g/^/+,+2t.\
-2,.j\
s/111/0/\
s/110/1/\
s/101/1/\
s/100/0/\
s/011/1/\
s/010/1/\
s/001/1/\
s/000/0/
$-,$d
?^$?,$jp
w
!exec ed '%' < '%'
q

# DATA
00000000000000000000000000000100

But before that, I’d like to thank Tom Ryder for sparking my interest in ed, and @ed1conf for reviewing the script and keeping my interest in ed high.


Table of Contents

  1. Overview
  2. Walkthrough
    1. Set up the scratch space
    2. Set up the edges
    3. Split the input into individual parts
    4. Perform one step of Rule 110
    5. Combine the result
    6. Recurse
  3. Next Steps
  4. Links

Overview

To prove Turing completeness, we must show that ed can simulate a Turing-complete system. Rule 110 is probably the easiest one to simulate since each step is just a series of substitutions.

In this simulation, the script acts on its own source code. The file is split into two parts: the code part and the data part. The code part is executed by ed, and the last line of the data part serves as input to the script.

The script is saved in a file called rule110.ed (the name does not really matter) and is run like so:

$ ed rule110.ed < rule110.ed

Each step of the simulation is done through a substitution (search and replace). Recursion is achieved by invoking another instance of ed on the script in a similar manner to the original invocation, and the result of each iteration is saved between recursive calls by writing the result to the end of the file before the call is made. The script does not terminate by itself.

Walkthrough

If you are unfamiliar with ed, consider reading Tom Ryder’s post on ed to get a feel for how ed commands work. In general, though, commands are single characters that can optionally be preceded by a range on which to act, and they may be followed by arguments.

In this walkthrough, we’ll follow the script as it executes on an example. We will assume that the data section is initialized like so:

# DATA
01001111

We’ll see how the data section changes throughout the script. The lines that were changed will be highlighted, and the current address will be indicted by the .=n (where n is a line number) just before the code block.

Set up the scratch space

We’ll start by copying the input to a scratch space at the end of the file where we’ll do our work.

$a

.
-t.

The a command “appends” the lines that follow after the given address, then sets the current address to the last line of the appended text. In this case, we give it the address $, which stands for the last line of the file.

ed will keep appending lines until it sees a line with a dot (.) by itself. So as a whole, this command inserts a blank line at the end of the file and sets the current address to that line. We will use this blank line to separate our scratch space from the rest of the file.

$a

.
-t.

The t command “transfers” (read “copies”) the given lines after the given address. Recall that our current address is the new blank line at the end of the file, and the input to the script is the line immediately before. This command copies the previous line (-) after the current line (.), thereby copying the input to our scratch space.

The current address is set to the copy.

After these two commands, our data section looks like this:

# DATA
01001111

01001111

Set up the edges

Theoretically, the cellular automaton should execute on an infinite field, but since we are dealing with a finite representation, we must account for the edges. There are two common approaches to deal with the cells past the edges, with the second being the more common approach: keep those cells constant, or simulate periodic tiling where the edges “wrap around” (i.e. the left edge is adjacent to the right edge and vice versa). The above script uses the second approach, but there is a version of the script that uses the first approach.

Constant edges

s/.*/0&0/

The code for the non-periodic version is the simpler of the two approaches. It simply adds a 0 to the start and end of the line through a substitution on the whole line.

Periodic tiling

To simulate periodic tiling, the cell past the left edge should be the same as the cell on the right edge, and the cell past the right edge should be the same as the left edge.

.g/^\(.\).*\(.\)$/s//\2&\1/
.g/^.$/s//&&&/

The g command will run a command for each line in the range that matches the given regex. It is not obvious, but g can be used as a conditional, as is the case here. We give the current line (.) as the range, so if it has two or more characters (regex: ^\(.\).*\(.\)$), then run the command s//\2&\1/.

Interestingly, ed considers it an error if an s command fails to match a line and stops the script, except when it’s guarded by a g command because g masks any errors. The use of g as a conditional guard is mentioned in the FreeBSD man page.

The s command “substitutes” the text matched by the regex with the given replacement pattern. An empty search pattern stands for the last search pattern. Here, we search the line for all characters, capturing the first and last characters, and replace it with itself, prepending the last character and appending the first character. Using our example, “01001111” would become “1010011110” after the substitution.

ed uses the POSIX Basic Regular Expression (BRE) syntax. Unlike the potentially more familiar PCRE syntax, the backslashes (\) in front of the parentheses do not indicate literal parentheses, but instead mark the parentheses as metacharacters so they can function as capture groups, which can later be accessed by \1 and \2. The ampersand (&) in the replacement is replaced with the matched text.

This regex assumes that the line has at least two characters and will fail when the line has only one, so we need to add another case.

.g/^\(.\).*\(.\)$/s//\2&\1/
.g/^.$/s//&&&/

This g command will act on the current line if it contains only a single character (ignoring the newline), and the s command will result in the character being repeated three times.

The result of periodic tiling on our example:

# DATA
01001111

1010011110

Split the input into individual parts

To prepare for our substitutions later, we must first break the input into its parts. Each part will go on its own line. It is easiest to define a part to be a single cell.

s/./&\
/g
d

This substitute command will match any character (.) and add a newline after it (&\↵), and it will do this for all characters on the current line (the g flag; g for “global”).

Inserting newlines like this is the usual way to split a line in ed. Interestingly, ed uses a literal newline character escaped by a backslash to denote a newline instead of an escape sequence like \n like in many other systems.

The original line ended with a newline, so we’re left with an extra blank line at the end after that substitution; we need to remove it.

s/./&\
/g
d

The d command “deletes” the given range, and it defaults to deleting the current line if no range is given. The current address is then normally set to the next line, but since we’re at the end of the file, it sets it to the last line of the file instead.

After splitting and deleting the blank line, our example becomes:

# DATA
01001111

1
0
1
0
0
1
1
1
1
0

Perform one step of Rule 110

This, perhaps the most complicated part of the script, is where the work is done.

Rule 110 works by determining the value of a cell based on its current value and the values of the cells immediately adjacent to it. In the usual visualization where each iteration is a row, this means that the value of a cell is based on the three cells that it touches above.

This is the approach the script takes: for every line in the scratch space not including the last two, copy the characters of the next two lines to form a line with three characters; then, perform the Rule 110 substitution given in the table below.

Current New
111 0
110 1
101 1
100 0
011 1
010 1
001 1
000 0

All of that is expressed in one command over ten lines.

?^$?+,$-2g/^/+,+2t.\
-2,.j\
s/111/0/\
s/110/1/\
s/101/1/\
s/100/0/\
s/011/1/\
s/010/1/\
s/001/1/\
s/000/0/
$-,$d

There’s a lot of information packed into this command, so let’s break it up into three parts: the range, the command, and the subcommands.

The range

?^$?+,$-2g/^/+,+2t.\
-2,.j\
s/111/0/\
s/110/1/\
s/101/1/\
s/100/0/\
s/011/1/\
s/010/1/\
s/001/1/\
s/000/0/
$-,$d
?^$?+,$-2

Recall that we separated our scratch space from the rest of the file with a blank line. To get the start of the first line with content, we just need to go one line after the blank line. The regex ?^$? will search backwards for the first blank line (notice how ? is Shift+/ on the QWERTY keyboard; / searches forward when used as part of a range). Adding a + tells ed to go to the next line. This is the first address of our range.

For the second address of our range, we want the third-last line of the file. Since we’re copying the next two lines for each line, we have to make sure that there are actually two lines available to copy, even for the last line. $ is the address of the last line, so $-2 is two lines before it; that is, the third-last line. This is the second address of our range.

In our example, the range includes lines 29 through 36:

# DATA
01001111

1
0
1
0
0
1
1
1
1
0

The command

?^$?+,$-2g/^/+,+2t.\
-2,.j\
s/111/0/\
s/110/1/\
s/101/1/\
s/100/0/\
s/011/1/\
s/010/1/\
s/001/1/\
s/000/0/
$-,$d
g/^/

The g command is the obvious choice for the “foreach” loop that we need. The regex /^/ will match the start of a line, so it will match all lines. Thus, the subcommand will be run on all lines in the range.

The subcommand: copy the next two lines

?^$?+,$-2g/^/+,+2t.\
-2,.j\
s/111/0/\
s/110/1/\
s/101/1/\
s/100/0/\
s/011/1/\
s/010/1/\
s/001/1/\
s/000/0/
$-,$d
+,+2t.

The subcommand runs relative to the line that the g command runs it on; that is, the current line (.) is set to the line that g is currently working on.

Once you know what the current address is, this subcommand is fairly straightforward: it copies the next line (+) and the line after (+2) after the current line.

The result of this subcommand on the first line is shown below.

# DATA
01001111

1
0
1
0
1
0
0
1
1
1
1
0

The subcommand: join the copied lines

?^$?+,$-2g/^/+,+2t.\
-2,.j\
s/111/0/\
s/110/1/\
s/101/1/\
s/100/0/\
s/011/1/\
s/010/1/\
s/001/1/\
s/000/0/
$-,$d

The subcommand can actually be a list of commands, with each subcommand separated by a backslash-escaped newline. The current address is not reset for each subcommand, but instead follows from the previous subcommand. In this case, the t command set the current address to the second copied line (two lines after the original).

The j command joins the lines in the given range, or in other words, removes the newlines. We want to join three lines: the original and the two copies that follow. Our current address is at the second copy, so the start of the range should be two lines before (-2) and the end should be the current line (.): -2,.. The result is a line with three characters: the cell to the left, the current cell, and the cell to the right.

The current address is set to the joined line.

The result of the join on the first line of the example:

# DATA
01001111

101
0
1
0
0
1
1
1
1
0

The subcommand: substitute

?^$?+,$-2g/^/+,+2t.\
-2,.j\
s/111/0/\
s/110/1/\
s/101/1/\
s/100/0/\
s/011/1/\
s/010/1/\
s/001/1/\
s/000/0/
$-,$d

These lines may look complex, but these are just substitutions that implement the translation table. Only one of these will match the line. Since g masks errors, if the line doesn’t match the pattern, ed will simply continue to the next one.

Note that the last substitution does not end in a backslash, so the command list ends there.

If the script were optimized for a version of ed that supports regex alternation, like GNU ed with glibc (the version found on Linux), then these eight substitutions can be collapsed into two:

s/\(111\)\|\(100\)\|\(000\)/0/\
s/\(110\)\|\(101\)\|\(011\)\|\(010\)\|\(001\)/1/

In our example, “101” is replaced with “1”.

# DATA
01001111

1
0
1
0
0
1
1
1
1
0

These subcommands are repeated for the rest of the lines in the range. This is the result of the whole g command:

# DATA
01001111

1
1
0
1
1
0
0
1
1
0

Cleanup

?^$?+,$-2g/^/+,+2t.\
-2,.j\
s/111/0/\
s/110/1/\
s/101/1/\
s/100/0/\
s/011/1/\
s/010/1/\
s/001/1/\
s/000/0/
$-,$d

The last two lines of the file were needed for the copy, but we no longer have any use for them. We can delete them by using a d command, giving the line before the last ($-) and the last line ($) as the range.

Once again, since we deleted the last line of the file, the current address is set to the new last line.

Our scratch space now contains the result of one iteration of Rule 110, spread across multiple lines (one cell per line).

# DATA
01001111

1
1
0
1
1
0
0
1

Combine the result

?^$?,$jp

All that’s left for this iteration is to join the lines into one. The blank line we set up earlier is the perfect spot for it, so all we have to do is join (j) the lines from the blank one (?^$?) until the last one ($).

Most commands can also take a print suffix to print the last line affected by the command. The simplest print suffix is p, which simply prints the line. The print suffix is not necessary for the simulation, but it’s nice to see the script actually doing something.

This is the data section after the join. Notice that the result is now the last line of the file.

# DATA
01001111
11011001

At this point, the script is ready to recurse.

Recurse

To better explain this section, I’ll walk through it out of order, starting with the ! line.

w
!exec ed '%' < '%'
q

This command is what makes the whole thing work, and the reason why some might not accept this script as proof of Turing-completeness.

The ! command executes the given arguments using sh(1) (i.e. /bin/sh, the Bourne shell), replacing any unescaped instances of % with the default filename. If ed is editing a file on the disk, which is the case here, the default filename is set to the name of the file.

We invoke ed the same way as we invoke the script itself: by giving ed the script as an argument for editing and feeding the script through stdin to execute. exec is a sh built-in that replaces the sh process with the given command. Using exec will result in only one process spawning instead of two for every recursive call, drastically reducing the number of PIDs used.

The alternative to self-modification is to have the script and the data in separate files. However, doing so would require hard-coding the filenames for at least one of them. (Exercise 3: Why?)

I expect that relying on an external utility like this will be a reason why some would dismiss this as proof. I argue that the use of the ! command in this way is acceptable because ! is a valid command, and although it does invoke sh, I’m only using it to invoke ed.

Since the next iteration of the simulation won’t be run in the same process, we must save our progress so that the next process can pick up where the current one leaves off.

w
!exec ed '%' < '%'
q

The w command with no argument writes the contents of the buffer to the default filename.

Unfortunately, to my knowledge, there’s no way to replace the main ed process with the child, so there’s no way to do any tail call elimination. When the innermost process is terminated (externally, since there’s no base case to stop it), execution will return to the parent. There’s nothing left to do after it returns, so we should quit so that it doesn’t try to execute the data part.

w
!exec ed '%' < '%'
q

The q command quits ed. The command part of the script ends here since ed won’t execute anything past this point.


Next Steps

What does the Turing completeness of ed imply? Theoretically, it means that we could potentially use ed to calculate anything, but practically, not much because it’s highly impractical to do so.

Perhaps the biggest takeaway from this script is not that ed is Turing-complete, but that conditionals and recursion are possible. Recursion that terminates is possible by using a conditional to implement a base case and guard the recursive call.

What other novel applications might we see using these techniques? I have some ideas:

  • A calculator. I suggested that it might be possible to implement arithmetic by invoking ed on a file of known length and reading its output.
  • A Turing machine. Rule 110 was the simplest, but it might not be that hard to design a Turing machine. I imagine the implementation being a series of g commands building other commands in another file to execute at the end.
  • A Brainfuck interpreter. I can see input being difficult to implement, but once a technique for arithmetic is found, I imagine that this is simply a matter of parsing the program and writing to a separate script file to execute at the end. The tape would be in a separate file, where each cell is on its own line.
  • Tetris, because even sed has Tetris.

I’m sure there are others. Please leave a comment if you manage to implement any of these or any other creative ed programs.


See also: Other notes, Answers to the exercises


TL;DR Rule 110 is Turing-complete, and ed(1) is Turing-complete because it can simulate Rule 110. The Rule is implemented using substitutions, and recursion is achieved by invoking another instance of ed. The code is up on GitHub.

Advertisements

6 thoughts on “ed(1) is Turing-Complete

  1. Thank you a lot for this I never knew I needed!

    The LUT (Lines 12-19) can be simplified though: First, replace the three cases that result in a 0, then the rest of the lines are 1s (and the only ones with more than one char on the line)

    s/000/0/\
    s/111/0/\
    s/100/0/\
    s/…/1/

    This can then be again shortened by saying “when the second and third digit are the same as the first, we want a 0”. This only leaves us with one special case, ‘100’.

    s/\(.\)\1\1/0/\
    s/100/0/\
    s/…/1/

    • Ah, I was looking for a good way to condense that. I gave up and went for simplicity instead after I saw that alternation wasn’t portable. Nice one!

$ cat your_comment

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.