7C. Avoiding the Swamp: Function Contracts


Importance of documentation

Internal and external documentation

Internal documentation is written in a program as comments, and is intended to help a software developer. External documentation is written in a place where people who need to use the software can read about it, such as in a book.


Importance of documentation for the developer

Each function in a piece of software solves a specific problem. Before you try to solve any problem, you should have a good understanding of exactly what the problem is. It makes no sense just to start writing and then, afterwards, look at what you have come up with to see whether it solves any useful problem!

Inexperienced computer programmers imagine that they can keep all problem descriptions in their heads. Experience has shown that they can't. Nobody can. Three issues come up.

  1. When writing a function definition without written documentation, you only have a rough idea of what the function is supposed to do. While you write, the idea morphs in your head. A simple interruption can cause the idea to lose what focus it has. You end up in a swamp of your own making, thrashing about trying to understand what you are doing.

  2. Suppose that you test the function and find that it does not work. So you need to fix it. But during the process of fixing it, you have nothing but your memory telling you what the function is supposed to do. It is difficult to keep that in your head along with the details of how the function is supposed to work and how the program is being tested. Because of that, the process of fixing a function definition can take the function further away from its original intent, not closer, and that takes you into the swamp.

  3. Later, when you need to use a function, you have forgotten just what it does. Everybody forgets. You can either reverse-engineer the function from its body or you can throw away the function or even the entire piece of software and start again. You don't want that.

    The situation is actually much worse than you might imagine. Suppose that you are working on a piece of software that is 10,000 lines long, and suppose you set out to reverse-engineer one function definition. But that function's body calls other functions that are written in this software. You don't know what they do until you reverse-engineer them too. But they, in turn, call more functions that are written here. The task of reverse-engineering one function definition can be prohibitively expensive. Now suppose that the software is 100,000 lines long.


Self-documenting software

Software maintenance is the job of keeping software up to date. It can involve adding new features, making the software work on a new operating system, changing the format of input or output, etc. Software engineers have found that, for a piece of production software, far more time is spent doing maintenance than was spent doing the original development.

You might have heard of self-documenting code. The idea is for functions to be written in a readable form so that, to find out what a function does, you just read the function's body. That is a good thing to do and, for very small pieces of software, you don't really need any more documentation than that. But imagine a larger piece of software, say with about 1000 functions. Such software is built up function upon function; one function typically uses a few others that are defined in the same collection of 1000 functions, with the exception of the bottom-level functions that only use the library.

Suppose that the software has no internal documentation, and relies on self-documenting code. Now you want to understand what a particular function does. So you read its body. But it uses 3 other undocumented functions, so you need to understand what they do first. Each of those uses 2 undocumented functions, so you must read their bodies too. It goes on and on. You find yourself reading thousands of lines of code to understand a single function whose body is only ten lines long.

The only way that anyone can maintain an undocumented large piece of software is to reverse-engineer the whole thing and add documentation that should have been written by the developer. As you can imagine, no maintainer is happy about that. Most of the time, it is too difficult. Undocumented software is often just thrown away as unmaintainable.



Function contracts

A function contract is a comment that tells precisely what the function accomplishes and how its parameters affect what it accomplishes.

Someone who reads a contract is assumed to be able to see the function heading, but not the function body. Therefore:

A contract cannot refer to a function's body. It cannot refer to local variables in the body. It cannot refer to loops, if-statements, or anything else in the function's body.
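To make this concrete, here is a minimal sketch of a contract written above a function definition. Function largest and its parameters A and n are hypothetical, chosen only for illustration. Notice that the contract mentions only the function and its parameters. It says nothing about the local variable or the loop in the body.

```cpp
// largest(A, n) returns the largest of the values
// A[0], ..., A[n-1].  It is required that n > 0.

int largest(const int A[], int n)
{
  int result = A[0];
  for (int k = 1; k < n; k++)
  {
    if (A[k] > result)
    {
      result = A[k];
    }
  }
  return result;
}
```

A reader who sees only the comment and the heading knows enough to call largest correctly.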

Contract bonus

The standards for this course require you to write a contract for every function in your program. If you do not write contracts, or if your contracts are shoddy, you can expect to lose at least a letter grade.

That is the stick. Here is the carrot. If you write good documentation throughout your program, you will receive a bonus of 10 points (out of 100). That boosts your score a letter grade. If your documentation is mostly good, you might receive fewer than 10 bonus points, but even that helps a lot.

Putting the stick and the carrot together, the swing from good documentation to poor documentation can be two letter grades.

You can only receive the bonus if your program is mostly correct. Documentation by itself will not do. If you only write half of the program, and that part is well written and well documented, you will receive half of the bonus. Pervasive misspelling and poor grammar can disqualify the program from receiving the bonus.



Do's and Don'ts for function contracts

Do provide information that is useful to the reader and that is not obvious from the function heading.

Later, we will see linked lists. If parameter L is a linked list, say so in the contract. If an integer parameter must be positive, say so in the contract. Help the reader out.

Do make sure to discuss the role of each parameter.

Don't just say that the parameter exists. Don't tell its type. That information is obvious from the heading.

Say how each parameter affects what the function accomplishes. If a parameter is an out-parameter, make that clear; say what gets stored in it.
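The following sketch suggests how a contract might discuss the role of every parameter, including out-parameters. Function divide and its parameter names are hypothetical and are used only for illustration.

```cpp
// divide(n, d, q, r) stores the integer quotient of n divided
// by d into out-parameter q, and stores the remainder of that
// division into out-parameter r.  It is required that n >= 0
// and d > 0.

void divide(int n, int d, int& q, int& r)
{
  q = n / d;
  r = n % d;
}
```

The contract does not say that n is an integer or that q is a reference. The heading already shows that. It does say what each of n, d, q and r contributes.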


Do refer to parameters by their names.

Do not use pronouns or vague phrases such as "the number" or "the given graph" to refer to parameters. Do not refer to a parameter by its type. Always refer to a parameter by its name.

I recommend that you write parameter names in single-quote marks, such as 'size'. That makes it clear that you are referring to a parameter. (For single-letter parameters, you can do that, but it is not usually needed. A parameter called 'a' probably needs quotes to avoid confusion with the word.)

In common writing, authors avoid using a name over and over. A sports writer might refer to a team as "The Bulldogs," "the dogs", "they" or "State." Technical writing is different. Refer to each parameter by its name, even if you use the name 50 times.


Do describe what the function returns.

It is not enough to say that a function returns "an integer". That is obvious from the heading. Say what the returned value means to the function's caller and how the parameters affect it. For functions with a void return type, nothing is returned, so skip this part.
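As an illustration, here is a sketch of a contract that says what the returned value means, not just that it is an integer. Function position and its parameters are hypothetical. Using -1 to mean "not found" is an assumption made for this example.

```cpp
// position(x, A, n) returns the smallest index i, where
// 0 <= i < n, such that A[i] == x.  If x does not occur among
// A[0], ..., A[n-1], then position(x, A, n) returns -1.

int position(int x, const int A[], int n)
{
  for (int k = 0; k < n; k++)
  {
    if (A[k] == x)
    {
      return k;
    }
  }
  return -1;
}
```

The caller learns what each possible result signifies, including the special result -1.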

Do discuss requirements on the parameters.

If the function has requirements, be sure to mention them. For example, if parameter x must be a positive number, say that x > 0 is required. The caller needs to know that in order to use the function.

Make it clear what you are saying. Don't just write x > 0. Write "It is required that x > 0."
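A small sketch of a contract that states a requirement in a full sentence follows. Function average and its parameter names 'total' and 'count' are hypothetical.

```cpp
// average('total', 'count') returns 'total' divided by 'count'.
// It is required that 'count' > 0.

double average(double total, int count)
{
  return total / count;
}
```

Without the second sentence of the contract, a caller would have no warning that average(10.0, 0) is not a sensible call.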


Do mention side-effects.

If the function has a visible effect (which does not include what it does with its own local variables) be sure to mention that effect. For example, if a function writes something to the standard output, then the caller needs to know that. It is not necessary to mention side-effects that are only done when tracing is turned on.

Do provide examples.

Showing some examples of what the function does can be very helpful to a reader. If a function reads a file in a particular format, show what a sample input file looks like. If it takes a number and yields a number, show some examples. If it takes a list and returns another list, show what it returns on sample lists.
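Examples can go directly in the contract. The sketch below shows one way to lay that out. Function reversed is hypothetical, and the layout of the examples is only a suggestion.

```cpp
#include <string>

// reversed(s) returns the string obtained by writing the
// characters of s in the opposite order.
//
// Examples:
//   reversed("abc") = "cba"
//   reversed("x")   = "x"
//   reversed("")    = ""

std::string reversed(const std::string& s)
{
  return std::string(s.rbegin(), s.rend());
}
```

The examples cover a typical string and the boundary cases, which are the cases a reader is most likely to wonder about.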

Do write a contract before you write the function definition.

Before you try to solve a problem, find out what the problem is. A contract specifies just what problem the function solves. Get it down in writing before you write the function definition. If you do not understand what the problem is well enough to describe it, then you certainly do not understand the problem well enough to write a computer program to solve it.

It might help to copy information from the assignment. But modify it so that it makes sense. A contract should not say "Write a function that …".

For some reason, writing contracts before code is one of the most difficult things for students to do. It might be that they simply have so little experience writing that they cannot imagine what to say. Whatever the reason, almost every student who asks me about his or her software assumes that it is obvious that contracts will be added after everything is working.

The programming assignments tell you what each function is supposed to do. I give you permission to copy such information into your program. There is no excuse for putting off documentation. You plan the building before you build it, not after it has been built.

Expert developers keep their software in clear, readable form during development because they know that keeping the software easy to understand eases development. That includes using sensible names, keeping the software well indented, and including function contracts and other useful comments.

Novice software developers tend to work on software in a form that is difficult to read and hence difficult to understand. Once they get it working, they indent it sensibly, rename things and add documentation. They do it the hard way but then make it look as though they did it the easy way. You be the judge of whether that is smart.


Do modify the contract if you modify what the function does.

A common mistake is to write a contract and then ignore it during the process of making changes to the program. Keep contracts up to date.

Do use correct spelling and grammar.

Use complete sentences. Start a sentence with a capitalized word, and end it with a period.

A program with spelling or grammatical errors in contracts looks very unprofessional. Be sure your contract does not look like someone just threw a bucket of words at the screen.


Don't say how the function works, unless there is a good reason for that.

A contract is intended for someone who only wants to use the function. Do not burden that person with information about how the function accomplishes what it does. If you only want to drive a car, you do not need to know how the engine works.

In some situations, providing a little bit of "how" information can be useful in explaining what the function does. But use that as the exception, not as the rule.
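To illustrate the difference between "what" and "how", consider the sketch below. Function sortAscending is hypothetical. A contract such as "sortAscending calls the library sort function on the array" describes how the job is done and tells the caller nothing useful. The contract shown here describes only what the caller can rely on afterwards.

```cpp
#include <algorithm>

// sortAscending(A, n) rearranges the values in A[0], ..., A[n-1]
// so that they are in ascending order.  It is required that
// n >= 0.

void sortAscending(int A[], int n)
{
  std::sort(A, A + n);
}
```

If the body is later rewritten to use a different sorting method, the contract does not need to change, because it never mentioned the method.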


Don't say which other functions this one uses.

A contract does not tell how a function works. Saying that it uses f, g and h is irrelevant.

Don't say which other functions use this function, or where the parameters come from.

It is tempting to try to describe just where a function fits into the whole program. For example, if you know that a particular parameter will be a number that was read from the user, then it is tempting to say so in the contract. Don't.

Functions are tools that are part of a toolkit for a particular piece of software. You might have a wrench in your toolbox that you acquired in order to turn a particular nut. But don't limit it to turning that nut. It will probably come in handy for some other nuts later. Similarly, a function that you use in one place can often come in handy later at a different place. The function's contract should not limit the function's usefulness.

If a parameter is an array that was filled in by reading a file, do not call the array a file. It is an array. Say what the function does with the array.


Don't describe the types of the parameters.

Type information is clear from the heading. Assume that the reader can see the heading. If there is any information that a person needs to know that is not present in the types, then describe that.

Do not write a full function heading, with types, in the contract. That is not useful, and only clutters the contract. But it is often helpful to show the function with its parameters, by name. For example, you might say that "factorial(n) returns n!". Notice that you don't say "factorial(int n) returns n!".

When you refer to a parameter, do not write its type. Use the parameter's name.
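Here is a sketch of the factorial contract in place. The return type long long and the upper limit of 20 on n are assumptions made for this example: 20! fits in a 64-bit integer but 21! does not.

```cpp
// factorial(n) returns n!.  It is required that 0 <= n <= 20.

long long factorial(int n)
{
  long long product = 1;
  for (int k = 2; k <= n; k++)
  {
    product = product * k;
  }
  return product;
}
```

The contract gives information that the types cannot express, namely the allowed range of n, and leaves the types themselves to the heading.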



Some words and phrases to avoid

"the given value"

Refer to parameters by name.

"based off"

Don't say "This function returns a value based off parameter q." Of course the function uses its parameter. The issue is, how does the value of q influence the value that is returned?

"reads out"

Use the word read only for getting information from the user or from a file; that is input, not output. You print or write output.

"implements f"

If a function calls f, and for some reason you need to say that, say that it calls f, not that it implements f.

"vertice", "verticies"

Some of your assignments involve graphs. A vertex is part of a graph. The plural of vertex is either vertexes or vertices. I am tired of reading the non-words vertice and verticies.

"computes and returns", "finds and returns", "finds"

A function cannot return something that it does not know. Just say that it returns the result. Adding that the function computes the result is redundant. If your function returns something, use return or returns.


Use sensible function names

The programming standards require you to choose a name for each function that suggests what the function does, and to avoid confusing names.

For example, a function that inserts something into an array could be called insert. A function that makes a copy of a string would reasonably be called stringCopy. Common-sense function names are an important part of documentation.

The extent to which this simple rule has been violated by students in the past boggles the mind. Here are some examples taken from student programming assignment submissions.

  • A function called insertCell that does not insert anything into anything.

  • A function called compare that does not compare things.

  • A function called printNode that does not print a node.

  • A function called WriteGraph that does not write a graph; it reads a graph.

  • A function called ReadGraph that does not read a graph; it writes a graph.

  • A function called ProcessEvent that does not process an event.

  • A function called InsertMountains that inserts one mountain.

There are many more. Don't let one of your function names end up in this list.


Summary

Contracts are essential to the readability of software. A contract makes it unnecessary to reverse-engineer a function definition to understand what it does. Having contracts for all functions makes it possible to understand the body of one function without the need to reverse-engineer the functions that it calls.

Here is a brief synopsis of the most important rules for writing contracts.


Exercises

  1. What is the difference between internal and external documentation?

  2. Why isn't the idea of self-documenting software viable for large pieces of software?

  3. What happens to undocumented software?

  4. Is the following a sensible contract and heading for function distance?

      // distance(x,y,u,v) returns the distance between
      // points (x,y) and (u,v) in the plane.
    
      double distance(double x, double y, double u, double v)
    

  5. Is the following a sensible contract and heading for function distance?

      // distance returns the distance between
      // points ('x','y') and ('u','v') in the plane.
    
      double distance(double x, double y, double u, double v)
    

  6. Is the following a sensible contract and heading for function distance?

      // distance(x,y) returns the distance between
      // points (x,y) and (u,v) in the plane.
    
      double distance(double x, double y)
    

  7. Is the following a sensible contract and heading for function distance?

      // distance(x,y,u,v) returns the distance between
      // the first two doubles and the second two doubles.
    
      double distance(double x, double y, double u, double v)
    

  8. Is the following a sensible contract and heading for function distance?

      // distance(x,y,u,v) returns the distance between
      // two points in the plane.
    
      double distance(double x, double y, double u, double v)
    

  9. Is the following a sensible contract and heading for function f?

      // f(x,y) takes two integers x and y and returns
      // an integer.
    
      int f(int x, int y)
    

  10. (This question involves call by reference.) Is the following a sensible contract and definition for function foo?

      // foo returns the sum of 'x' and 'y'.
    
      int foo(int x, int& y)
      {
        y = y + 1;
        return x + y;
      }
    