| Acknowledgments: To teachers and writers of good code for the lessons they have imparted and their examples of elegance and clarity; and to writers of bad code for their rich legacy of atrocities. |
This page offers advice on some of the do's and don'ts of good programming style. It is motivated by the conviction that a correctly working program should not be the only goal of a software development project. Don't get me wrong: the program should work correctly. But there are very practical benefits to developing good programming style (for my students, one is keeping my blood pressure down while I grade your program). A program written in good style is usually easier to understand, debug, and update than the same program would be if written in bad style. In professional software development, programs are constantly being modified to correct errors, expand functionality, meet changing specifications, improve efficiency, etc. For these changes to be made correctly and efficiently, it's important that the source code be easily read and understood.
The C++ programming language is built on a foundation of the C programming language, with features added on for the purpose of building a better programming language. Many tools found in C++ are meant to replace analogs in C; in most cases, the C++ tools are stylistically (and, in some cases, semantically) superior. Examples, in which the C++ tool is generally easier to write and understand than its C analog, include the following:
| Purpose | C | C++ |
| Input | scanf, sscanf, and other input functions | cin >> |
| Output | printf, sprintf, and other output functions | cout << |
| Constant declaration | #define | const |
| Record-defining | struct | class |
| Pointer allocation | malloc, calloc, and related tools | new |
Often, students use the inferior C tools. This can create several problems, including:
It has been said that programmers who do not document their code are indispensable, and unpromotable. They are indispensable because without proper documentation, it's likely that nobody else will understand their code. They are unpromotable for the same reason: you can't become a project leader if you don't yourself practice a fundamental of teamwork, namely, making your code understandable to your project teammates.
Proper documentation in a program should include the following elements:
There are other issues of good documentation besides the use of comment/remark statements. For example, the programmer's choices of identifiers can do a lot to improve or harm understanding of code. Variables, function names, etc., should be chosen to suggest what they represent. Compare, for example, the following snippets of code:
float s, t, t1;
.
.
.
t1 = s + t;
|
float subtotal, tax, total;
.
.
.
total = subtotal + tax;
|
Logically, these two code fragments are identical. However, the one on the right is written in superior style, because its identifiers have been chosen to document their respective purposes within the code.
See the related issue of Magic Numbers.
The indentation at the left margin of a line of code should indicate the block level of the statement. Nested statements should be indented more than higher-level (unnested) statements. Further, a continuation line should be indented more than the line that starts the statement. These practices help clarify a program's structure. Consider the fragments below, in which the version on the right is better than the version on the left:
for (i = 0; i < size; i++) {
entry[i] = blah(i);
cout << endl << i << ") "
<< entry[i] << endl;
} // end for
|
for (i = 0; i < size; i++) {
    entry[i] = blah(i);
    cout << endl << i << ") "
         << entry[i] << endl;
} // end for
|
The superiority of the style used on the right is observed in the following:
KISS - Keep it short, smarty! (You thought I would use "stupid"? Stupid people don't write software.)
In general, it's wise to keep every subprogram (in C++, every function, including main) short. A reader typically tries to understand one function at a time, so keeping functions short makes them more "digestible." A classical guideline is a maximum of 25 lines of code, which used to be the most that could be viewed at once on a computer screen in a typical text editor. Today's screens often show more than 25 lines, and I won't send a student to the guillotine for a 26th line, but if you're in excess of 30, there's probably a natural way to shorten your function, perhaps by extracting one or more blocks of its code as separate functions.
A subprogram (in C++, a function) that performs a small number of simple actions can have a short listing consisting of simple action statements. But what about a function that is responsible for a complex segment of the program's actions?
I encourage using a "top-down" approach. A subprogram (particularly, the main function in a C++ program) that directs complex action should have a listing that outlines the totality of the actions managed by the subprogram. When we combine this outlook with the advice given elsewhere in this document to keep the listing of each subprogram short, we can deduce that it's a good idea to extract lower-level details of a major subtask managed by a function so that such a subtask has its own subprogram that can be called upon by the current subprogram. This approach also facilitates the view of a subprogram as doing "one job," even if that "one job" involves managing several other "jobs."
For example: Here's a situation that often arises in student programs. A main function calls upon a menu function that both presents a menu to the user and acts upon the user's choice. As a result, a reader of the main function doesn't see how the program acts on the user's choice. A better structure: have the main function call upon a menu function that returns (preferably via a reference parameter, not via a return statement, as the latter method likely would represent a side effect) the user's choice to the main function, and have the main function contain the code (which could be a call to a separate function with, say, a switch structure to select the appropriate action) that shows how the program responds to the user's choice.
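A minimal sketch of the structure just described, assuming a hypothetical three-item menu. The names getMenuChoice, respond, and runMenu, and the menu text itself, are illustrative only. The streams are passed as parameters simply so that the sketch can be exercised without a keyboard; runMenu plays the role that main would play in a real program.

```cpp
#include <iostream>
#include <sstream>   // std::istringstream is handy for exercising the sketch
#include <string>

// Hypothetical menu: presents the options and reports the user's choice
// through the reference parameter. It does NOT act on the choice.
void getMenuChoice(std::istream & in, std::ostream & out, int & choice) {
    out << "1) Add  2) Delete  3) Quit\nYour choice: ";
    if (!(in >> choice))
        choice = 0;   // unreadable input is treated as an invalid choice
} // end getMenuChoice

// Selects the response to the choice; every possible response is visible here.
std::string respond(int choice) {
    std::string result;
    switch (choice) {
        case 1:  result = "adding";   break;
        case 2:  result = "deleting"; break;
        case 3:  result = "quitting"; break;
        default: result = "invalid";  break;
    } // end switch
    return result;
} // end respond

// Stands in for main: the reader sees both the menu call and the response.
std::string runMenu(std::istream & in, std::ostream & out) {
    int choice;
    getMenuChoice(in, out, choice);
    return respond(choice);
} // end runMenu
```

In an actual program, main would call getMenuChoice(std::cin, std::cout, choice) and then act on choice, so that a reader of main alone can follow what the program does with the user's selection.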
Most programming languages have a variety of loop structures. Often, more than one loop structure can be used to code a given piece of logic. However, there is generally a preferred loop structure for a given situation.
First of all, although most programming languages have a goto statement, it should be avoided when possible (and in modern programming languages, including even modern dialects of BASIC, it's almost always possible). The use of a label (a target for a goto) greatly increases the difficulty of understanding the flow of control in a program, because when a label is present, control can reach the labeled statement from many more places in the listing than otherwise. See the classic article [Dijkstra] for more on this point. In particular, then, loops should not be controlled by goto statements.
A reader of this document probably knows that a modern language may have a loop structure that tests a condition before performing the loop body, thus allowing the possibility that the loop body is not performed at all (in C++, a while loop), and a loop structure that tests a condition only after performing the loop body, thus requiring the loop body to be performed at least once (in C++, the do ... while loop). Thus, the most prominent criterion for choosing between a while loop and a do ... while loop is whether or not to allow the possibility of reaching the loop and performing the loop body zero times.
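The criterion above can be illustrated with a small sketch. The two functions and their names are hypothetical examples, not part of any assignment: the first may legitimately perform its loop body zero times, so it uses while; the second must perform its body at least once, so it uses do ... while.

```cpp
#include <cstddef>
#include <vector>

// Pre-test loop: the body may run zero times
// (an empty list, or a list that starts with a negative value, sums to 0).
int sumUntilNegative(const std::vector<int> & values) {
    int sum = 0;
    std::size_t i = 0;
    while (i < values.size() && values[i] >= 0) {
        sum += values[i];
        i++;
    } // end while
    return sum;
} // end sumUntilNegative

// Post-test loop: the body must run at least once,
// because even the number 0 has one digit.
int countDigits(int n) {
    int digits = 0;
    do {
        digits++;
        n /= 10;
    } while (n != 0); // end do
    return digits;
} // end countDigits
```

Writing countDigits with a while loop would require a special case for 0, which is a hint that the post-test loop is the more natural fit there.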
What about a for loop? Should you use a for loop where you might use a do ... while loop or a while loop? Often, this is possible. In general, the use of a for loop is interpreted to mean that the author has some knowledge of the number of times the loop body is to be performed. For example, if a loop is to have one performance for each count of a control variable from some minimum value up to some maximum value (or from some maximum value down to some minimum value), a for is the preferred loop structure; these extreme values need not be constants in the program. The increment or step value of the control variable need not be 1 (or -1); for example, it is appropriate to use a for loop to print the multiples of 5 and their squares:
int x, y;
.
.
.
for (x = 5; x <= TOPVALUE; x += 5) {
    y = x * x;
    cout << x << '\t' << y << endl;
} // end for
|
By contrast, even when possible to use a for loop, it's preferred to use a do or while when there is no knowledge of the number of performances of the loop body before exit from the loop. For example, to print the name stored in each entry of a linked list, it is better to traverse the list with a while loop, since in general we don't know how many nodes are in the list. Thus, in the following, the version on the right is preferable.
node * head;
node * at;
.
.
.
for (at = head; at /* != NULL */ ; at = at->next)
    cout << at->name << endl;
|
node * head;
node * at;
.
.
.
at = head;
while (at) // while (at != NULL)
{   cout << at->name << endl;
    at = at->next;
} // end while
|
Magic numbers are (numeric) constant values that are unexplained. The use of magic numbers is easily avoided by assigning an appropriate symbolic identifier (in C++, typically via a const declaration) to the value in question. This practice has several advantages, as illustrated in the following examples.
Imagine a program that requires time calculations. Such a program might well use the value 60 for multiple purposes, including the number of seconds per minute, the number of minutes per hour, and the number of paper clips in a small box (has nothing to do with time, which is part of the point). Compare the following code fragments, in which, to emphasize our points, we will combine the issue of magic numbers with the issue of choosing identifiers mnemonically:
.
.
.
m = 60 * h + em;
s = 60 * m + es;
tc = 60 * b + lc;
|
const short MINUTESPERHOUR = 60;
const short SECONDSPERMINUTE = 60;
const short CLIPSPERBOX = 60;
.
.
.
minutes = MINUTESPERHOUR * hours + extraMinutes;
seconds = SECONDSPERMINUTE * minutes + extraSeconds;
totalClips = CLIPSPERBOX * boxes + looseClips;
|
Logically, these code fragments are identical. However, the one on the right has the advantage that each use of the constant value 60 is explained by its const identifier. Indeed, the right fragment clarifies, and the left fragment does not, that the three uses of the value 60 in the assignments to (nonconstant) variables are unrelated to each other. (The right fragment also has the advantage that its other identifiers are mnemonically chosen.)
.
.
.
tax = 0.07 * subtotal;
|
const float TAXRATE = 0.07;
.
.
.
tax = TAXRATE * subtotal;
|
These fragments have identical logic. However, the left version does not clarify nearly as well as the right version that the tax rate is 7%.
Further, imagine a program that has 20 such computations. Now suppose the county legislature changes the sales tax rate. In the version of the program written in the style of the left fragment above, 20 changes must be made, and because there are so many, there is a danger that not every place in the program requiring a change is found and corrected; but in the version of the program written in the style of the right fragment, only the const declaration need be changed.
Recursion is when a subprogram calls upon itself, either directly or circularly. Circular recursion takes place when there is a list of subprograms s_0, s_1, ..., s_(n-1) such that for each index i, s_i calls upon s_((i+1) mod n); that is, s_0 calls s_1, s_1 calls s_2, and so on, until s_(n-1) calls s_0.
Typical good uses of recursion are in "Divide and Conquer" algorithms. These are algorithms for which a large problem is divided up into smaller problems of the same type; each of the smaller problems is solved; and the partial solutions are "stitched together" to obtain a solution to the original, large problem. (In the following, you need to know that merging two sorted lists means combining the lists into a single sorted list.) For example, the Merge Sort algorithm for sorting data may be expressed as follows:
Notice that recursion is a form of looping. For example, in the discussion above, when we apply the algorithm recursively to L1, if the list L1 has at least 2 items, then it is divided into smaller lists, say, L11 and L12; if L11 has at least 2 items, it is subdivided into, say, L111 and L112; etc.
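For concreteness, here is a minimal C++ sketch of Merge Sort in the divide-and-conquer form discussed above. It assumes the list is a std::vector<int> that is split at its midpoint into two halves named l1 and l2; the function names mergeSorted and mergeSort are hypothetical, and this is one common way to write the algorithm rather than the only one.

```cpp
#include <cstddef>
#include <vector>

// Merge two sorted lists into a single sorted list.
std::vector<int> mergeSorted(const std::vector<int> & a,
                             const std::vector<int> & b) {
    std::vector<int> result;
    result.reserve(a.size() + b.size());
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i] <= b[j])
            result.push_back(a[i++]);
        else
            result.push_back(b[j++]);
    } // end while
    while (i < a.size())          // copy whatever remains of a
        result.push_back(a[i++]);
    while (j < b.size())          // copy whatever remains of b
        result.push_back(b[j++]);
    return result;
} // end mergeSorted

// Divide and conquer: split, sort each half recursively, stitch together.
std::vector<int> mergeSort(const std::vector<int> & list) {
    std::vector<int> sorted;
    if (list.size() < 2)
        sorted = list;            // 0 or 1 items: already sorted
    else {
        std::size_t mid = list.size() / 2;
        std::vector<int> l1(list.begin(), list.begin() + mid);
        std::vector<int> l2(list.begin() + mid, list.end());
        sorted = mergeSorted(mergeSort(l1), mergeSort(l2));
    } // end else
    return sorted;
} // end mergeSort
```

Here the recursion is motivated by the shape of the problem, not merely by a need to repeat something: each call works on a smaller list of the same kind.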
Because direct recursion (a subprogram calling itself directly) is a form of looping, it can often be avoided when its only motivation is looping. Languages like C++, BASIC, COBOL, and Visual Basic provide other loop forms for programmers to use that will often be more clear than recursion, so that, unless there is a motivation other than looping (such as divide-and-conquer), it may be preferable to avoid direct recursion. (There are programming languages such as LISP and PROLOG in which recursion is the primary form of looping - in such languages, it may be impossible or undesirable to avoid direct recursion even when looping is the only motivation.)
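As a small illustration of this point, here is a sketch that computes a factorial both ways. The function names are hypothetical. In the first version, recursion serves only as a loop; the second states the same repetition with an explicit for loop.

```cpp
// Direct recursion used only as a form of looping.
unsigned long long factorialRecursive(unsigned int n) {
    unsigned long long result;
    if (n <= 1)
        result = 1;
    else
        result = n * factorialRecursive(n - 1);
    return result;
} // end factorialRecursive

// The same computation with an explicit loop.
unsigned long long factorialLoop(unsigned int n) {
    unsigned long long result = 1;
    for (unsigned int i = 2; i <= n; i++)
        result *= i;
    return result;
} // end factorialLoop
```

Both versions return the same values for inputs small enough that the result fits in an unsigned long long; the loop version makes the repetition visible at a glance.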
Circular recursion is rarely necessary, and often is difficult to understand. Unfortunately, many beginners use circular recursion where a more conventional form of looping would be much easier to understand. Consider the following forms, in which the example on the right is preferable:
.
.
.
void againQuery();
int main() {
    .
    .
    .
    againQuery();
    return 0;
} // end main
.
.
.
// - - - - - - -
void againQuery()
{   bool repeating;
    char response;
    .
    .
    .
    cout << "Another round? (y/n): ";
    cin >> response;
    .
    .
    .
    repeating = (response == 'Y') || (response == 'y');
    if (repeating)
        main(); // here's circular recursion
} // end againQuery
|
.
.
.
void againQuery(bool &);
int main() {
    bool repeating;
    .
    .
    .
    do {
        .
        .
        .
        againQuery(repeating);
    } while (repeating); // end do
    return 0;
} // end main
.
.
.
// - - - - - - -
void againQuery(bool & repeating)
{   char response;
    .
    .
    .
    cout << "Another round? (y/n): ";
    cin >> response;
    .
    .
    .
    repeating = (response == 'Y') || (response == 'y');
} // end againQuery
|
Logically, these code fragments are identical. However, the version on the right is easier to understand, because its loop structure is explicit in the main function. By contrast, the version on the left hides its looping - you must read main and againQuery together in order to understand that the actions of main repeat. It's easier to read one function at a time than two (or more) functions together. Therefore, circular recursion should be avoided unless there is some powerful reason other than looping (which, for most programmers, will be a rare event) for using it.
Many programming languages have a statement that is designed to control the selection of actions among a known list of possibilities. In C++, this is the switch structure. An alternative to the use of switch is the use of a series of if statements.
When we are selecting among more than two mutually exclusive possibilities, the switch statement is preferred over the use of a series of if statements. This clarifies the mutually exclusive nature of the list of cases among which we're choosing, while a series of if statements does not. Consider the following pseudo-code examples:
if (choice == 1)
    action1;
if (choice == 2)
    action2;
if (choice == 3)
    action3;
if (choice == 4)
    action4;
|
switch (choice) {
    case 1: {
        action1;
        break; } // end case 1
    case 2: {
        action2;
        break; } // end case 2
    case 3: {
        action3;
        break; } // end case 3
    case 4: {
        action4;
        break; } // end case 4
} // end switch
|
Are these seemingly equivalent versions even truly equivalent? Not necessarily. For example, in the version on the left, suppose control reaches the first if statement with choice equal to 1. Suppose action1 causes the value of choice to be changed to 4. Then the version on the left executes both action1 and action4. By contrast, if the version on the right has control reach the switch with choice equal to 1, action1 is executed, but action4 is not despite the fact that action1 changes the value of choice to 4.
Even when these two versions execute alike, the version on the right clarifies that (under these circumstances) the cases are mutually exclusive, while the version on the left does not.
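The difference described above can be checked with a runnable sketch. It assumes, purely for illustration, that each action appends its name to a log string and that action1 also changes choice to 4; the function names withIfSeries and withSwitch are hypothetical.

```cpp
#include <string>

// Series of independent if statements: every condition is tested in turn.
std::string withIfSeries(int choice) {
    std::string log;
    if (choice == 1) {
        log += "action1 ";
        choice = 4;            // action1 changes choice
    }
    if (choice == 2)
        log += "action2 ";
    if (choice == 3)
        log += "action3 ";
    if (choice == 4)
        log += "action4 ";
    return log;
} // end withIfSeries

// switch: choice is examined once, so exactly one case is selected.
std::string withSwitch(int choice) {
    std::string log;
    switch (choice) {
        case 1: {
            log += "action1 ";
            choice = 4;        // same change, but no other case is entered
            break; } // end case 1
        case 2: {
            log += "action2 ";
            break; } // end case 2
        case 3: {
            log += "action3 ";
            break; } // end case 3
        case 4: {
            log += "action4 ";
            break; } // end case 4
    } // end switch
    return log;
} // end withSwitch
```

Starting with choice equal to 1, withIfSeries logs both action1 and action4, while withSwitch logs only action1. For any starting value whose action leaves choice alone, the two functions agree.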
A side effect introduces the potential for a nasty surprise in the behavior of a program. See this essay for more information on this topic.
A piece of code is likely to be more easily understood if it has only one entry point (at the top of its listing) and only one exit point at (or near, e.g., a return statement just before the closing "}" of a non-void C++ function) the bottom of its listing. In C++, this philosophy has the following implications:
Some argue for exceptions to the rules stated above in order to simplify the flow of control amidst complex code. For example, some argue that if main calls on blah, which calls on ..., which calls on gallumph, and an unusual error is detected by the code of the latter function that makes further processing within the program pointless, it is easier for the programmer to use an exit statement to end the run of the program than to provide appropriate if statements, one in each of the chain of functions alluded to above, to provide "normal" exit from the main function's end-of-listing or return 0; statement. Since student exercises rarely get complex enough to make a strong argument along these lines, I prefer my students to follow the guidelines above.
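Here is a minimal sketch of the "normal exit" alternative mentioned above, in which the error is passed back up the chain of calls rather than ending the run with an exit statement. The functions blah and gallumph are named as in the discussion; their parameters, the negative-input error condition, the trace string, and the function run (standing in for main) are all assumptions made for illustration.

```cpp
#include <string>

// Deepest function in the chain: detects the unusual error and reports it
// through a reference parameter instead of calling exit.
void gallumph(int value, bool & ok) {
    ok = (value >= 0);   // hypothetical error condition: negative input
} // end gallumph

// Intermediate function: continues its own work only if no error occurred.
void blah(int value, std::string & trace, bool & ok) {
    gallumph(value, ok);
    if (ok)
        trace += "blah finished; ";
} // end blah

// Stands in for main: one exit point, at the bottom of the listing.
int run(int value, std::string & trace) {
    bool ok;
    int status;
    blah(value, trace, ok);
    if (ok) {
        trace += "run finished";
        status = 0;
    }
    else {
        trace += "run stopped on error";
        status = 1;
    }
    return status;
} // end run
```

Each function in the chain needs one if statement, which is the cost the argument above refers to; in exchange, every function still leaves through the bottom of its listing.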
Proper use of space between distinct symbols, even when not required by one's language software, can make a great deal of difference in the ease of reading your code. Consider the code fragments below, in which it should be clear that the version on the right is the superior version:
for(i=0;i<LISTSIZE;i++){
a[i]=b[i]+nu(i,j,k);
cout<<a[i]<<endl;
}//end for
|
for (i = 0; i < LISTSIZE; i++) {
    a[i] = b[i] + nu(i, j, k);
    cout << a[i] << endl;
} // end for
|
You might use written English as your model. In written English, we separate distinct words by spaces. We could learn to read English written without spaces between words, but the convention of placing spaces between words makes reading easier on our eyes. Similarly, you should separate "words" of your code by spaces.
Lines of code that "wrap around" the screen tend to print with apparent line breaks in undesirable locations. Programmers should learn to use their Enter keys during program editing. It should be clear that anyone reading your code will prefer the version on the right to the version on the left, below:
cout << "Today's winner is " << player[i ndex].firstName << ' ' << player[index].la stName << ".\n"; |
cout << "Today's winner is " <<
     player[index].firstName << ' ' <<
     player[index].lastName << ".\n";
|
[Dijkstra] E. W. Dijkstra, "Go To statement considered harmful," Communications of the Association for Computing Machinery 11, 3 (Mar., 1968), pp. 147-148;
Online at http://www.cs.utexas.edu/users/EWD/ewd02xx/EWD215.PDF
Dijkstra's letter was reprinted with some paraphrasing as
"(A Look Back at) Go To Statement Considered Harmful," Communications of the Association for Computing Machinery 51, 1 (Jan., 2008), pp. 7-9.
Online at http://mags.acm.org/communications/200801/