Glorious Pipes! More Ode to the Unix Command Line

That’s right; more command line posts. I was just filled with wonder at the glories of the Unix command line, filled with the conviction that this is how computers are supposed to work, and decided to blog for the first time in months as a result.

The Unix Way is built upon having many, many small tools which do one thing and do it well; these tools are joined together in appropriate ways such that any one part can be swapped out without affecting the rest of the toolchain. It does this in a number of ways; the most important can be summed up as follows:

  1. A program should do one thing and it should do one thing well. Programs that try to be all things to all men—Microsoft Office comes immediately to mind—don’t play well with others, and bloat beyond all usefulness.
  2. A program should, as far as possible, work with simple text streams. Text is a universal interface that everybody can deal with. Sometimes binary data formats are unavoidable (digital media files are about the only example I can think of), but for the most part data is suitably stored in simple text. This allows data to be used by many different programs for many different purposes, rather than (like a PowerPoint file) used by only one program in the universe, and only for the purposes that that one program deems worthwhile. This also means that a program doesn’t care where its data comes from or where it’s going; it’s just dealing with text streams. One can then replace the source of its data with anything, replace its destination with anything, and the program will continue working as before, without change.

The most frequent use for this boils down to redirection and pipes.

Redirection is simply telling a program to take its input from something other than normal, or to send its output somewhere other than normal. Well-behaved Unix programs take their input from “stdin”, a standard file descriptor which means “standard input.” Normally this is simply the keyboard; redirection, though, means that it can come from anywhere.

Take wc as an example. In its simplest form, wc does a simple word count. It will take its input from stdin and tell you how many lines, words, and characters are in it:

wc
Type: Now is the time for all good men to come to the aid of their party. (then press Enter, and Ctrl-D to end the input)
Output: 1 16 68

It’s that simple. But let’s redirect its input; we don’t want to just type everything in, we want the information from a file:

wc < file
Output: 52 1069 12280

And so we know the number of lines, words, and characters in the file “file”. That “less than” sign redirected wc’s standard input so that it comes from a file instead of from the keyboard. Add a “greater than” sign and standard output, rather than going to your screen, will go to a file, too:

wc > filename < file

No output this time, because it went to a file. But if you open the file “filename”, you’ll find the same output there that went to your screen before we redirected it.
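
For anyone who wants to try this at home, here is a minimal sketch of both redirections. It assumes a POSIX-style shell with mktemp available; demo_dir is a name I made up for a scratch directory, and the sample sentence stands in for whatever your own “file” contains.

```shell
# Scratch directory so the sketch never overwrites an existing file.
# (demo_dir is a placeholder name, not anything standard.)
demo_dir=$(mktemp -d)

# Create a small input file: one line, 16 words, 68 characters.
printf 'Now is the time for all good men to come to the aid of their party.\n' > "$demo_dir/file"

# Standard input comes from the file; the counts appear on the screen.
wc < "$demo_dir/file"

# Input from one file, output to another; nothing appears on the screen.
wc < "$demo_dir/file" > "$demo_dir/filename"

# The counts are now sitting in "filename".
cat "$demo_dir/filename"
```

The order of the two redirections on the command line doesn’t matter; wc never sees either of them, only its standard input and standard output.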

Which leads us to pipes. Pipes take the standard output of a program and send it directly to the standard input of another program. A trivial example (which is also the original “Useless Use of cat” award winner, but which serves well as a basic pipe example) is as follows:

cat file | wc
Output: 52 1069 12280

This accomplishes little, of course, but it’s only an example. cat prints the contents of one or more files to standard output; in this case, though, that means to the pipe, which sends that standard output to wc’s standard input, which then outputs as normal. This example is trivial, and even useless (wc file would have given us the same data), but it shows well what pipes are and gives an inkling of their true power.
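
To give a bit more than an inkling, here is a sketch of a longer pipeline that finds the most common word in a piece of text. The sample sentence and the variable names text and top_word are my own placeholders; the tools themselves (printf, tr, sort, uniq, head) are standard on any Unix.

```shell
# Sample text supplied inline so the sketch is self-contained.
text='the quick brown fox jumps over the lazy dog and the fox naps'

# Each stage reads the previous stage's standard output:
#   printf   emits the text
#   tr       puts each word on its own line
#   sort     brings identical words together
#   uniq -c  collapses each group and prefixes it with a count
#   sort -rn orders the counts numerically, largest first
#   head     keeps only the first line
top_word=$(printf '%s\n' "$text" | tr ' ' '\n' | sort | uniq -c | sort -rn | head -n 1)

# Prints the count followed by the word.
printf '%s\n' "$top_word"
```

None of these six programs knows anything about the others; swap head -n 1 for head -n 5 and you have a top-five list, without touching anything upstream.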

As an example, the other day I was reading with my son, who is obsessed with airplanes, about the fastest airplane in history: the North American X-15, which achieved the incredible speed of…7,274 km/hr. Now, I’m an American, and so was the plane; a speed in miles per hour would seem only polite. But the source I was reading didn’t give miles per hour, and rather than just searching the web, I decided this problem would produce a good example of the power of pipes.

I’ve written the dozenal program suite, which includes a converter for units from the metric, customary, and imperial systems to TGM (a dozenal unit system) and vice versa. As a side effect, this program (tgmconv) also serves as a converter for metric to customary or imperial and vice versa. However, it only accepts dozenal numbers as input, for obvious reasons, and outputs them in dozenal, while I had a decimal number to input and wanted a decimal number at the end to explain to my little boy. (Who’s only just mastering basic math; no need to confuse him with radices.) Fortunately, the dozenal suite includes a decimal-to-dozenal converter and a dozenal-to-decimal converter, as well. So here’s my basic need:

  1. Take 7,274 and convert it into dozenal.
  2. Convert the result of step 1 in km/hr into mi/hr.
  3. Take the result of step 2 and convert it to decimal.

In less elegant systems than Unix, I’d be forced to open some huge, bulky program for each of these tasks; or, if I’m lucky, one gigantic one, which would require repeated manipulations of several layers of menus to accomplish each step, and which would probably result in me forgetting the result of one step before I’d gotten to needing it for the next. Instead, I used pipes:

doz 7274 | tgmconv -i km/hr -o mi/hr | dec

doz 7274 converts 7,274 (decimal) into dozenal, and prints it to standard output. The standard output is grabbed by the pipe and sent to the standard input of tgmconv, which has two flags; the -i km/hr indicating that the input unit (which gives a unit to the standard input it was receiving) is kilometers per hour, and the -o mi/hr, which indicates that the output unit (which applies to what it would emit on standard output) is miles per hour. tgmconv obediently converted the unit and emitted the result on its standard output. That standard output was grabbed by the next pipe for the standard input of the next program, dec, whose mission in life is to convert dozenal numbers to decimal. It did so, and finally emitted the result: 4,519.8540 miles per hour.

Go ahead; search the Internet for it. That’s right; it’s precisely correct.
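
If you don’t have the dozenal suite installed, the figure can be sanity-checked with a rough stand-in that stays in decimal the whole way. To be clear about what is assumed: kmh_to_mph is a hypothetical helper I’m defining here, not part of the dozenal suite, and it uses awk with the exact definition of the international mile (1.609344 km) rather than anything tgmconv does internally.

```shell
# Hypothetical stand-in for the km/hr -> mi/hr step, working in decimal.
# It reads a number on standard input and writes the converted number
# on standard output, so it can sit in a pipeline like any other filter.
kmh_to_mph() {
    # One international mile is exactly 1.609344 km.
    # LC_ALL=C keeps the decimal point a "." regardless of locale.
    LC_ALL=C awk '{ printf "%.2f\n", $1 / 1.609344 }'
}

# Same shape as the real pipeline: a number goes in one end,
# a converted number comes out the other.
echo 7274 | kmh_to_mph
```

Rounded to two places this gives 4519.85, which agrees with the result above.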

This example is pretty trivial, too; and the great Unix shell is made powerful not only by pipes and redirection, but also by its ability to be scripted. But it provides an example, even if a simple one, of the great power of the pipes. Oh, glorious pipes! Great Unix! Live forever on my box!

Praise be to Christ the King!

Published on 8 January 2011 at 4:31 am

2 Comments

  1. Redirection and pipes are definitely cool and fun to use.

    That said, I don’t see many instances in my day-to-day activities that would benefit from them. Do you have examples of using pipes for time-savings in a more laborious matter than unit conversion?

  2. +AMDG

    Pipes and redirection are hugely useful in day-to-day activities whenever you’re working on the command line. E.g., when I typeset the IIa-IIae (~1200 pages long) in LaTeX, I often have overfull or underfull lines in the document; but LaTeX’s output is verbose, and I just want to know about the overfull boxes. So I run pdflatex on the document like this:

    pdflatex file.tex | grep Overfull

    This limits my output to the overfull boxes, which tells me what lines of my source file are overfull (if any), so that I can get them.

    It’s also very useful in other contexts; e.g., I get an email from somebody that’s full of a bunch of strange, non-standard HTML markup and images, and I want to read the actual text. I have it scripted, but what mutt does here is basically thus:

    w3m -dump message.html | less

    It’s even more useful in scripting; e.g., to prevent race conditions I want to create a temporary file with an unpredictable name, without spaces in it to prevent filename manipulation insecurities:

    FILE_NAME="tmp_`od -An -N4 -tuL /dev/random | tr -d '\ '`.pdf"

    (This is an actual example of something I did in a script which imposes multiple pdf pages onto a single physical sheet for folding and binding.) Or I need to do multiple conversions on something; e.g., I’ve got a Word to rtf converter, and I’ve got an rtf to LaTeX converter, but I don’t have a Word to LaTeX converter (which is almost impossible to write for a variety of reasons, mostly relating to Word’s supreme suckage, that we don’t need to get into here):

    word2rtf file.doc | rtftolatex > file.tex

    It enables programs to work together by requiring them all to use the same formats (character streams) for input and output.

    Basically, pipes and redirection are excellent whenever you’re using many small tools to accomplish individual tasks, rather than monolithic monsters which attempt to do everything.

    Praise be to Christ the King!

