On Programming Style

Notes on Pet Peeves, from a Guy with a Red Pen

Acknowledgments: To teachers and writers of good code for the lessons they have imparted and their examples of elegance and clarity; and to writers of bad code for their rich legacy of atrocities.

Introduction

This page offers advice on some of the do's and don'ts of good programming style. It is motivated by the conviction that getting the program to work correctly should not be a programmer's only goal in a software development project. Don't get me wrong: the program should work correctly. But there are very practical benefits to developing good programming style (for my students, one is keeping my blood pressure down while I grade your program). A program written in good style is usually easier to understand, debug, and update than the same program written in bad style. In professional software development, programs are constantly being modified to correct errors, expand functionality, meet changing specifications, improve efficiency, etc. For these modifications to be made correctly and efficiently, the source code must be easy to read and understand.


C vs. C++ features

The C++ programming language is built on a foundation of the C programming language, with features added on for the purpose of building a better programming language.  Many tools found in C++ are meant to replace analogs in C; in most cases, the C++ tools are stylistically (and, in some cases, semantically) superior.  Examples, in which the C++ tool is generally easier to write and understand than its C analog, include the following:

Purpose               C                                            C++
Input                 scanf, sscanf, and other input functions     cin >>
Output                printf, sprintf, and other output functions  cout <<
Constant declaration  #define                                      const
Record definition     struct                                       class
Pointer allocation    malloc, calloc, and related functions        new
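
As a rough illustration of a few rows of the table, here is a minimal sketch that uses the C++ tools. The names TAX_PERCENT, LineItem, and describe are hypothetical, invented for this example; the comments note the C habit each line replaces.

```cpp
#include <sstream>
#include <string>

// A typed constant (the C habit would be: #define TAX_PERCENT 8).
const int TAX_PERCENT = 8;

// A record type declared with class (the C habit would be a struct).
class LineItem {
public:
    std::string name;
    int priceCents;
};

// Stream insertion (<<) chooses the formatting from each operand's type;
// printf would need a hand-written format string such as "%s: %d cents".
std::string describe(const LineItem & item) {
    std::ostringstream out;
    out << item.name << ": " << item.priceCents << " cents (tax "
        << TAX_PERCENT << "%)";
    return out.str();
}
```

The same insertion operator works with cout; a string stream is used here only so that the result can be inspected.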

Often, students use the inferior C tools.  This can create several problems, including:


Documentation

It has been said that programmers who do not document their code are indispensable, and unpromotable. They are indispensable because without proper documentation, it's likely that nobody else will understand their code. They are unpromotable for the same reason: you can't become a project leader if you don't yourself practice a fundamental of teamwork, namely, making your code understandable to your project teammates.

Proper documentation in a program should include the following elements:

There are other issues of good documentation besides the use of comment/remark statements. For example, the programmer's choices of identifiers can do a lot to improve or harm understanding of code. Variables, function names, etc., should be chosen to suggest what they represent. Compare, for example, the following snippets of code:

First version:

float s, t, t1;
   .
   .
   .
t1 = s + t;

Second version:

float subtotal, tax, total;
   .
   .
   .
total = subtotal + tax;

Logically, these two code fragments are identical. However, the second one (the one that uses subtotal, tax, and total) is written in superior style, because its identifiers have been chosen to document their respective purposes within the code.
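
To show comments and identifiers working together, here is a minimal sketch of a documented function. The function computeTotal, its parameters, and the layout of its header comment are assumptions made for illustration, not a prescribed format.

```cpp
// computeTotal
// Purpose: returns the amount due for a purchase, tax included.
// Input:   subtotal - the pre-tax price
//          taxRate  - the tax as a fraction of the subtotal (0.25 means 25%)
// Output:  the return value is the subtotal plus the tax owed on it.
// Assumes: subtotal >= 0 and taxRate >= 0.
double computeTotal(double subtotal, double taxRate) {
    double tax = subtotal * taxRate;   // tax owed on this purchase
    double total = subtotal + tax;
    return total;
}
```

A call such as computeTotal(100.0, 0.25) reads almost like a sentence, which is the point of choosing the identifiers carefully.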

See the related issue of Magic Numbers.


Indentation

The indentation at the left margin of a line of code should indicate the block level of the statement. Nested statements should be indented more than higher-level (unnested) statements. Further, a continuation line should be indented more than the line that starts the statement. These practices help clarify a program's structure. Consider the fragments below, in which the second version is better than the first:

First version:

   for (i = 0; i < size; i++) {
entry[i] = blah(i);
   cout << endl << i << ") "
   << entry[i] << endl;
       } // end for

Second version:

   for (i = 0; i < size; i++) {
      entry[i] = blah(i);
      cout << endl << i << ") "
           << entry[i] << endl;
   } // end for

The superiority of the style used in the second version is observed in the following:


Length of a Subprogram

KISS - Keep it short, smarty! (You thought I would use "stupid"? Stupid people don't write software.)

In general, it's wise to keep every subprogram (in C++, every function, including main) short. A reader typically tries to understand one function at a time, so keeping functions short makes them more "digestible." A classical guideline is a maximum of 25 lines of code, which used to be the most a typical text editor could show on a computer screen. Today's screens often show more than 25 lines, and I won't send a student to the guillotine for a 26th line, but if your function runs past 30 lines, there's probably a natural way to shorten it, perhaps by extracting one or more blocks of its code as separate functions.


Logical Structure of a Subprogram

A subprogram (in C++, a function) that performs a small number of simple actions can have a short listing consisting of simple action statements. But what about a function that is responsible for a complex segment of the program's actions?

I encourage using a "top-down" approach. A subprogram (particularly, the main function in a C++ program) that directs complex action should have a listing that outlines the totality of the actions managed by the subprogram. Combine this outlook with the advice given elsewhere in this document to keep the listing of each subprogram short, and it follows that the lower-level details of each major subtask should be extracted into a subprogram of their own, to be called by the subprogram that manages the subtask. This approach also facilitates the view of a subprogram as doing "one job," even if that "one job" involves managing several other "jobs."

For example, here's a situation that often arises in student programs: a main function calls upon a menu function that both presents a menu to the user and acts upon the user's choice. As a result, a reader of the main function doesn't see how the program acts on the user's choice. A better structure is to have the main function call upon a menu function that returns the user's choice to the main function (preferably via a reference parameter, not via a return statement, as the latter method likely would represent a side effect), and to have the main function contain the code that shows how the program responds to the user's choice. That code could be a call to a separate function with, say, a switch structure to select the appropriate action.


Loop Style

Most programming languages have a variety of loop structures. Often, more than one loop structure can be used to code a given piece of logic. However, there is generally a preferred loop structure for a given situation.

First of all, although most programming languages have a goto statement, it should be avoided when possible (and in modern programming languages, including even modern dialects of BASIC, it's almost always possible). The use of a label (a target for a goto) greatly increases the difficulty of understanding the flow of control in a program, because when a label is present, control can reach the labeled statement from many more places in the listing than otherwise. See the classic article [Dijkstra] for more on this point. In particular, then, loops should not be controlled by goto statements.

A reader of this document probably knows that a modern language may have a loop structure that tests a condition before performing the loop body, thus allowing the possibility that the loop body is not performed at all (in C++, a while loop), and a loop structure that tests a condition only after performing the loop body, thus requiring the loop body to be performed at least once (in C++, the do ... while loop). Thus, the most prominent criterion for choosing between a while loop and a do ... while loop is whether or not to allow the possibility of reaching the loop and performing the loop body zero times.
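
The zero-times criterion can be seen in a pair of small functions. This is a sketch only; countDigits and countHalvings are hypothetical names chosen for illustration.

```cpp
// Counts the decimal digits of a nonnegative integer. Even 0 has one
// digit, so the body must be performed at least once: do ... while.
int countDigits(int n) {
    int digits = 0;
    do {
        digits++;
        n /= 10;
    } while (n > 0); // end do
    return digits;
}

// Counts how many times n can be halved (integer division) before it
// reaches 1 or less. For n <= 1 the body must not be performed at all,
// so the condition is tested first: while.
int countHalvings(int n) {
    int halvings = 0;
    while (n > 1) {
        n /= 2;
        halvings++;
    } // end while
    return halvings;
}
```

Writing countDigits with a while loop would require a special case for 0, and writing countHalvings with a do ... while loop would require one for n <= 1.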

What about a for loop? Should you use a for loop where you might use a do ... while loop or a while loop? Often, this is possible. In general, the use of a for loop is interpreted to mean that the author has some knowledge of the number of times the loop body is to be performed. For example, if a loop is to have one performance for each count of a control variable from some minimum value up to some maximum value (or from some maximum value down to some minimum value), a for loop is the preferred loop structure. These extreme values need not be constants in the program. The increment or step value of the control variable need not be 1 (or -1); for example, it is appropriate to use a for loop to print the multiples of 5 and their squares:

 
   int x, y;
   .
   .
   .
   for (x = 5; x <= TOPVALUE; x += 5) {
      y = x * x;
      cout << x << '\t' << y << endl;
   } // end for

By contrast, even when it is possible to use a for loop, a do ... while or while loop is preferred when the number of performances of the loop body is not known in advance. For example, to print the name stored in each entry of a linked list, it is better to traverse the list with a while loop, since in general we don't know how many nodes are in the list. Thus, in the following, the second version is preferable.

First version:

   node * head;
   node * at;
   .
   .
   .
   for (at = head; at /* != NULL */ ; at = at->next)
      cout << at->name << endl;

Second version:

   node * head;
   node * at;
   .
   .
   .
   at = head;
   while (at)   // while (at != NULL)
   {  cout << at->name << endl;
      at = at->next;
   } // end while


Magic Numbers

Magic numbers are (numeric) constant values that are unexplained. The use of magic numbers is easily avoided by assigning an appropriate symbolic identifier (in C++, typically via a const declaration) to the value in question. This practice has several advantages, as illustrated in the following examples.
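
Here is one minimal sketch of the practice. The constants QUIZ_COUNT and PASSING_PERCENT and the function isPassing are hypothetical, invented for this example.

```cpp
const int QUIZ_COUNT = 5;         // number of quizzes in the term
const int PASSING_PERCENT = 60;   // minimum average score needed to pass

// With magic numbers, the last line would read
//    return sum >= 60 * 5;
// which leaves the reader guessing what 60 and 5 stand for.
bool isPassing(const int scores[QUIZ_COUNT]) {
    int sum = 0;
    for (int i = 0; i < QUIZ_COUNT; i++)
        sum += scores[i];
    // sum >= average * count avoids the rounding of an integer division
    return sum >= PASSING_PERCENT * QUIZ_COUNT;
}
```

If the number of quizzes changes, only the declaration of QUIZ_COUNT needs editing; the loop bound and the comparison follow automatically.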


Recursion - when to use it, when not to use it

Recursion occurs when a subprogram calls upon itself, either directly or circularly. Circular recursion takes place when there is a list of subprograms $s_0, s_1, \ldots, s_{n-1}$ such that for each index $i$, $s_i$ calls upon $s_{(i+1) \bmod n}$; that is, $s_0$ calls $s_1$, $s_1$ calls $s_2$, ..., $s_{n-2}$ calls $s_{n-1}$, and $s_{n-1}$ calls $s_0$.

When used properly, recursion is a powerful programming technique. Often, an algorithm expressed recursively requires much less code than would the same algorithm expressed nonrecursively.

Typical good uses of recursion are in "Divide and Conquer" algorithms.  These are algorithms for which a large problem is divided up into smaller problems of the same type; each of the smaller problems is solved; and the partial solutions are "stitched together" to obtain a solution to the original, large problem.  (In the following, you need to know that merging two sorted lists means combining the lists into a single sorted list.)  For example, the Merge Sort algorithm for sorting data may be expressed as follows:

Merge Sort - given an unsorted list L of data, sort the data, as follows:
If the list L has at least 2 items (and therefore has the possibility of data out of order), then
  1. Divide the list L into two smaller lists L1 and L2 of approximately equal size (thus, each of L1 and L2 has about 1/2 of the data of the original list L).
  2. Recursively, apply the Merge Sort algorithm to L1, so that at the end of this step, L1 is sorted.
  3. Recursively, apply the Merge Sort algorithm to L2, so that at the end of this step, L2 is sorted.
  4. Merge the sorted lists L1 and L2 to obtain the sorted list L.
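
The four steps above might be sketched in C++ as follows. This is one possible rendering, assuming the list is a std::vector<int>; the names mergeSorted and mergeSort are chosen for this example, and the sketch favors clarity over efficiency (it copies each half).

```cpp
#include <cstddef>
#include <vector>

// Step 4: merges two sorted lists into a single sorted list.
std::vector<int> mergeSorted(const std::vector<int> & left,
                             const std::vector<int> & right) {
    std::vector<int> merged;
    merged.reserve(left.size() + right.size());
    std::size_t i = 0, j = 0;
    while (i < left.size() && j < right.size()) {
        if (left[i] <= right[j])
            merged.push_back(left[i++]);
        else
            merged.push_back(right[j++]);
    } // end while
    while (i < left.size())     // copy whatever remains of left
        merged.push_back(left[i++]);
    while (j < right.size())    // copy whatever remains of right
        merged.push_back(right[j++]);
    return merged;
}

// Sorts items into nondecreasing order.
void mergeSort(std::vector<int> & items) {
    if (items.size() >= 2) {
        std::size_t middle = items.size() / 2;
        // Step 1: divide into two lists of about equal size.
        std::vector<int> firstHalf(items.begin(), items.begin() + middle);
        std::vector<int> secondHalf(items.begin() + middle, items.end());
        mergeSort(firstHalf);                          // step 2
        mergeSort(secondHalf);                         // step 3
        items = mergeSorted(firstHalf, secondHalf);    // step 4
    }
}
```

Note how little of mergeSort is bookkeeping: the recursive calls carry the whole "sort the smaller problems" idea.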

Notice that recursion is a form of looping.  For example, in the discussion above, when we apply the algorithm recursively to L1, if the list L1 has at least 2 items, then it is divided into smaller lists, say, L11 and L12; if L11 has at least 2 items, it is subdivided into, say, L111 and L112; etc.

Because direct recursion (a subprogram calling itself directly) is a form of looping, it can often be avoided when its only motivation is looping.  Languages like C++, BASIC, COBOL, and Visual Basic provide other loop forms that will often be clearer than recursion, so that, unless there is a motivation other than looping (such as divide-and-conquer), it may be preferable to avoid direct recursion.  (There are programming languages such as LISP and PROLOG in which recursion is the primary form of looping; in such languages, it may be impossible or undesirable to avoid direct recursion even when looping is the only motivation.)
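
As a small illustration of recursion whose only motivation is looping, compare these two hypothetical functions, each of which adds the integers from 1 to n:

```cpp
// Direct recursion used only to repeat an addition.
int sumToRecursive(int n) {
    int sum = 0;
    if (n > 0)
        sum = n + sumToRecursive(n - 1);
    return sum;
}

// The same computation with an ordinary loop.
int sumToLoop(int n) {
    int sum = 0;
    for (int i = 1; i <= n; i++)
        sum += i;
    return sum;
}
```

Both return the same value for every n, but the loop version states the repetition outright, and the number of repetitions is known in advance, which is the usual signal for a for loop.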

Circular recursion is rarely necessary, and often is difficult to understand.  Unfortunately, many beginners use circular recursion where a more conventional form of looping would be much easier to understand.  Consider the following forms, in which the second example is preferable:

First version (circular recursion):

   .
   .
   .
   void againQuery();
   int main() {
   .
   .
   .
   againQuery();
   return 0; 
   } // end main
   .
   .
   .
   // - - - - - - -
   void againQuery()
   {  bool repeating;
      char response;
      .
      .
      .
      cout << "Another round? (y/n): ";
      cin >> response;
      .
      .
      .
      repeating = (response == 'Y') || (response == 'y');
      if (repeating)
         main();  // here's circular recursion
   } // end againQuery
   
Second version (explicit loop):

   .
   .
   .
   void againQuery(bool &);
   int main() {
   bool repeating;
   .
   .
   .
   do {
      .
      .
      .
      againQuery(repeating);
   } while (repeating); // end do
   return 0; 
   } // end main
   .
   .
   .
   // - - - - - - -
   void againQuery(bool & repeating)
   {  char response;
      .
      .
      .
      cout << "Another round? (y/n): ";
      cin >> response;
      .
      .
      .
      repeating = (response == 'Y') || (response == 'y');
   } // end againQuery
   

Logically, these code fragments are identical. However, the second version is easier to understand, because its loop structure is explicit in the main function.  By contrast, the first version hides its looping: you must read main and againQuery together in order to understand that the actions of main repeat.  It's easier to read one function at a time than two (or more) functions together.  (In addition, standard C++ does not permit a program to call main, so the first version is not even guaranteed to compile.)  Therefore, circular recursion should be avoided unless there is some powerful reason other than looping (which, for most programmers, will be a rare event) for using it.


Case Selection / Selection of Actions

Many programming languages have a statement that is designed to control the selection of actions among a known list of possibilities. In C++, this is the switch structure. An alternative to the use of switch is the use of a series of if statements.

When we are selecting among more than two mutually exclusive possibilities, the switch statement is preferred over the use of a series of if statements. This clarifies the mutually exclusive nature of the list of cases among which we're choosing, while a series of if statements does not. Consider the following pseudo-code examples:

First version (series of if statements):

    if (choice == 1)
        action1;
    if (choice == 2)
        action2;
    if (choice == 3)
        action3;
    if (choice == 4)
        action4;

Second version (switch):

    switch (choice) {
      case 1: {
        action1;
        break; } // end case 1
      case 2: {
        action2;
        break; } // end case 2
      case 3: {
        action3;
        break; } // end case 3
      case 4: {
        action4;
        break; } // end case 4
    } // end switch

Are these seemingly equivalent versions truly equivalent? Not necessarily. Suppose control reaches the first if statement of the first version with choice equal to 1, and suppose action1 changes the value of choice to 4. Then the first version executes both action1 and action4. By contrast, if control reaches the switch of the second version with choice equal to 1, action1 is executed but action4 is not, even though action1 changes the value of choice to 4.

Even when the two versions execute alike, the switch version makes clear that (under these circumstances) the cases are mutually exclusive, while the series of if statements does not.
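
The difference in behavior can be demonstrated with a runnable sketch. The functions runIfSeries and runSwitch are hypothetical; each "action" merely appends its number to a log string, and action 1 also changes choice to 4, as in the scenario described above.

```cpp
#include <string>

// A series of independent if statements: every test is evaluated.
std::string runIfSeries(int choice) {
    std::string log;
    if (choice == 1) {
        log += '1';
        choice = 4;        // action 1 changes choice
    }
    if (choice == 2)
        log += '2';
    if (choice == 3)
        log += '3';
    if (choice == 4)
        log += '4';
    return log;
}

// A switch: choice is examined once, and exactly one case is selected.
std::string runSwitch(int choice) {
    std::string log;
    switch (choice) {
        case 1: {
            log += '1';
            choice = 4;    // action 1 changes choice
            break; } // end case 1
        case 2: {
            log += '2';
            break; } // end case 2
        case 3: {
            log += '3';
            break; } // end case 3
        case 4: {
            log += '4';
            break; } // end case 4
    } // end switch
    return log;
}
```

Starting with choice equal to 1, runIfSeries logs both action 1 and action 4, while runSwitch logs action 1 only. For a starting value of 2, 3, or 4, the two functions agree.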


Side Effects

A side effect introduces the potential for a nasty surprise in the behavior of a program. See this essay for more information on this topic.


Single Entry Point, Single Exit Point Style

A piece of code is likely to be more easily understood if it has only one entry point (at the top of its listing) and only one exit point at or near the bottom of its listing (for example, a return statement just before the closing "}" of a non-void C++ function). In C++, this philosophy has the following implications:

Some argue for exceptions to the rules stated above in order to simplify the flow of control amidst complex code. For example, some argue that if main calls on blah, which calls on ..., which calls on gallumph, and an unusual error is detected by the code of the latter function that makes further processing within the program pointless, it is easier for the programmer to use an exit statement to end the run of the program than to provide appropriate if statements, one in each of the chain of functions alluded to above, to provide "normal" exit from the main function's end-of-listing or return 0; statement. Since student exercises rarely get complex enough to make a strong argument along these lines, I prefer my students to follow the guidelines above.
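
To illustrate the single-exit guideline itself, here is a minimal sketch of a search function, assuming the guideline is read as "no early return and no break out of the loop." The names findFirstNegative and NOT_FOUND are hypothetical.

```cpp
#include <cstddef>
#include <vector>

const int NOT_FOUND = -1;

// Returns the index of the first negative entry of values, or NOT_FOUND
// if there is none. The loop condition carries all of the exit logic, so
// the loop has no break and the function has one return, at the bottom.
int findFirstNegative(const std::vector<int> & values) {
    int position = NOT_FOUND;
    std::size_t i = 0;
    while (i < values.size() && position == NOT_FOUND) {
        if (values[i] < 0)
            position = static_cast<int>(i);
        i++;
    } // end while
    return position;
}
```

A reader who wants to know when the loop stops needs to look in only one place: the while condition.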


Space

Proper use of space between distinct symbols, even when not required by your compiler, can make a great deal of difference in the ease of reading your code. Consider the code fragments below, in which it should be clear that the second version is the superior one:

First version:

for(i=0;i<LISTSIZE;i++){
   a[i]=b[i]+nu(i,j,k);
   cout<<a[i]<<endl;
}//end for

Second version:

for (i = 0; i < LISTSIZE; i++) {
   a[i] = b[i] + nu(i, j, k);
   cout << a[i] << endl;
} // end for

You might use written English as your model. In written English, we separate distinct words by spaces. We could learn to read English written without spaces between words, but the convention of placing spaces between words makes reading easier on our eyes. Similarly, you should separate "words" of your code by spaces.


Wraparound

Lines of code that "wrap around" the screen tend to print with apparent line breaks in undesirable locations. Programmers should learn to use their Enter keys during program editing. It should be clear that anyone reading your code will prefer the second version below to the first:

First version:

  cout << "Today's winner is " << player[i
ndex].firstName << ' ' << player[index].la
stName << ".\n";

Second version:

  cout << "Today's winner is " <<
       player[index].firstName << ' ' <<
       player[index].lastName << ".\n";


Reference

[Dijkstra] E. W. Dijkstra, "Go To statement considered harmful," Communications of the Association for Computing Machinery 11, 3 (Mar. 1968), pp. 147-148.

          Online at http://www.cs.utexas.edu/users/EWD/ewd02xx/EWD215.PDF

Dijkstra's letter was reprinted with some paraphrasing as

"(A Look Back at) Go To Statement Considered Harmful," Communications of the Association for Computing Machinery 51, 1 (Jan. 2008), pp. 7-9.

          Online at http://mags.acm.org/communications/200801/

 
