Outline Proposal for Strongly Type-Checked Make and Shell

Daniel S. Wilkerson; 2 May 2006 and 4 October 2006

Note: this was originally written as a lettter to someone who had asked for my advice on this question. It is really just an outline of ideas written for people with experiance with programming languages static analysis who can fill in the blanks.

Shell-scripts and Makefiles are brittle

Corner conditions: This makefile has a mistake:

foo:
    for A in `ls *.foo`; do ls -l $$A; done
    do_something_else

The mistake is that if there are no .foo files in the directory, the for loop will fail, rather than doing what most people expect and executing the inner loop 0 times and then going on to the next line.

Everything is a string: Shell and make have multiple layers of evaluation, which can lead to quoting and tokenizing hell. For example, if I had typed $A above instead of $$A, it would be wrong, but it would still do something, just not what you want (repeatedly list the whole directory in this case).

Non-functional makefiles: It is easy to write a makefile that will build everything from scratch, but much harder to write one that will incrementally update the project if only some of the files have changed. Make is actually intended to be a pure functional language and you could check that it is used that way; if it is, then make -j will speed up your makefiles; if not, make -j is dangerous; in fact, all order-dependencies in make are dangerous.

One really difficult is to write (and analyze) makefiles for is functions that map multiple inputs to multiple outputs. This happens by necessity in real makefiles though.

Result: brittleness: These kind of mistakes make shell and make scripts/makefiles very brittle. This brittle property of shell and make is completely unnecessary but people have simply gotten used to it. And don't tell me about automake or autoconf; they seem to cause more problems than the solve.

Partial correctness solution: APIs for Unix Utilities

It would be a big help just to provide some partial correctness guarantees for make and shell, so that any failures are at an algorithmic level rather than due to some nasty gotcha. That is, the C typechecker us that does not guarantee us that our C program is correct, but it can tell me that struct Bar does not have a foo field and that I am attempting to call a pointer to an array instead of a pointer to a function. Such checking would help considerably in shell and make.

We could basically partially-evaluate make and shell; that is, run the makefile or shell script "in the abstract". Partial evaluation done right can be though of as just another form of typechecking. If makefiles and shell scripts are programs, then their functions are the Unix command line utilies. What is missing is that we don't have APIs for these utilites. We would need the following.

A "signature" or API for the standard Unix command line utilities, such as 'ls', 'ar', and 'gcc'. This signature would say how to parse its flags and "what they mean" at some abstraction. For example we know that gcc -c -o b.o a.c is supposed to be a function where a.c is the input and b.o is the output.

A typesystem for files: we know that a .c file has some meaning different from a .o file; you shouldn't run a .c file for example.

A model of the file system: we need to have some idea of what state the file system is in when the script or makefile starts and then track changes as it runs.

Examples of how it might work

Running programs in the abstract means the abstact interpreter finds corner coditions because it considers all possibilities. Consider the quoting mistake pointed out above where in a makefile the programmer says '$A' instead of '$$A'; if there is no variable 'A' in the makefile then it must come from the environment; this requirement can be reported to the user who will be surprised if he isn't expecting it.

Tracking the use of a string through a script or makefile gives it a "type" and mis-uses can be caught. Consider the for A in `ls *.foo` example above; the filesystem model would notice that having no '*.foo' files is a possibility (since it hasn't been ruled out) and in that case the 'for' loop gets something that is not of the type "list of files in the directory ending in .foo". Instead it gets the string '*.foo' which we have no reason to belive has the property of being the name of a file in the directory.

If one make target writes to a file 'bar' and another one reads it, but this dependency is not mentioned explicitly in the makefile, then this is a possible concurency bug and make -j is unsafe.