Software engineering, at its core, is about writing programs in teams. Writing programs in teams is in many ways a fundamentally different activity from writing programs by yourself or in a small team of three or four people. This document outlines development practices that we have found to be broadly useful and that all projects could benefit from. Most of these practices have been suggested in the past, and most come from our experiences in developing and releasing the ACS 4.0 development platform.
You will find that most of the suggestions in this document are concerned with communication between developers and development teams. Our experience has been that this is the hardest problem to manage in larger projects, so it's important to have effective channels in place to manage the issue.
The first thing to do before you write code is to write a specification of what the code will do. We call this a requirements document, but specification is a slightly more informal and less intimidating word. You probably won't write this document on your own. It should be a collaboration between you and the eventual client or user of the code. This might be an actual ArsDigita client, or it might be the ArsDigita marketing department, or it might be me.
Why write a specification? A good spec is like a contract between you and the client. It spells out as exactly as possible what will be implemented, in a way that is understandable both to you as the implementer and to the client as the user. This is very important because, even though you have to spend some time doing this before writing code, in the end it makes development go faster. Joel Spolsky wrote four articles on why this is so. I'll summarize the main points here:
Having requirements facilitates communication. The document is the ultimate arbiter in disputes over what should or should not be in the system. This means you spend less time reimplementing things that you got wrong because you misunderstood the client.
Having requirements facilitates scheduling. Unless you know the scope of the project, there is no way to estimate how long it will take for you to do the implementation. Good requirements also help you stick to schedules, because the scope of the project is less likely to drift. This is especially true if you have the discipline to never implement anything that isn't explicitly in the requirements.
Having requirements facilitates testing. In fact, having requirements means that you can start developing tests before and during the time you are coding. This is good.
While we would all like to have a full set of unchanging requirements to work from once a project gets started, this is in fact an unlikely occurrence. A reality of the software world is that requirements change. A good requirements document, together with good requirements management, is the key to managing this change, because it provides a central point around which to collaborate. Changes to the system require that you change the requirements document, which in turn requires good communication and collaboration between the client and the development team.
The next thing you do before writing code is to think about the design of the system and write down what seems relevant. In the context of ACS, the data model and the rationale behind the data model are often the most important things to generate documentation for before starting development.
The role of the design document is similar to the role of the requirements document, except that it is primarily used by the development team. The design document should be the central repository for a high level description of how the system works and any programming interfaces that the system implements. The design document should track both the requirements document for the system and the implementation of the system. I think a good way to look at the design document is as a more specific set of technical requirements that developers can use to plan their work.
Like the requirements document, the design document will be a collaborative effort and will probably be developed in an iterative fashion. Don't be obsessed with having the whole thing done before development starts; that isn't realistic. In fact, early-stage prototyping of a design can be very helpful for trying out speculative ideas to see if they will work out without actually building the whole system. The goal is not to follow a rigid set of rules before you are allowed to begin coding. The goal is just to spend some time thinking about the system before diving into code, because thinking about higher-level design issues is easier without the day-to-day details of the actual coding getting in the way.
The ACS 4.0 package development guidelines talk about requirements and design some more.
The final thing to do before writing code is to set up a CVS repository for the project. CVS is the main tool that we have for facilitating collaboration around the code itself. Therefore, the first rule is: every file that you touch on a project should be in CVS. Nothing that you work on, no matter how temporary, should be a plain file just sitting somewhere that is not version controlled in some way. When used correctly, CVS will allow you to effectively keep track of what is happening to the code in the project, and to maintain stable checkpoints of the project at various milestones. The ACS 4.0 build management page talks about how the toolkit team manages configurations. This scheme works pretty well once the code hits a fairly stable state.
Every file should have a CVS Id tag, like this:
$Id: foo.java,v 1.6 2000/11/20 18:36:00 psu Exp $
This is the most compact and most convenient way for people to know who last changed the file and when they did it. Standards for how to get this string into a source file depend on what kind of file it is. See the specific standards docs for Tcl, HTML, SQL and Java code for more details.
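As a rough illustration of how much information the Id string carries, here is a minimal Python sketch that pulls the fields out of an expanded keyword. The function name parse_cvs_id and the regular expression are our own assumptions for this example, not part of CVS, and the pattern only covers the slash-separated date layout shown above.

```python
import re
from datetime import datetime

# Layout of the expanded keyword as shown in the example above:
#   $Id: <file>,v <revision> <date> <time> <author> <state> $
# The pattern is an assumption based on that one example.
ID_PATTERN = re.compile(
    r"\$Id: (?P<file>\S+),v (?P<revision>[\d.]+) "
    r"(?P<date>\d{4}/\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<author>\S+) (?P<state>\S+) \$"
)


def parse_cvs_id(text):
    """Return the fields of the first expanded $Id$ keyword in text, or None."""
    match = ID_PATTERN.search(text)
    if match is None:
        return None
    fields = match.groupdict()
    # Combine the date and time fields into a single datetime value.
    fields["changed_at"] = datetime.strptime(
        fields.pop("date") + " " + fields.pop("time"), "%Y/%m/%d %H:%M:%S"
    )
    return fields


info = parse_cvs_id("// $Id: foo.java,v 1.6 2000/11/20 18:36:00 psu Exp $")
print(info["author"], info["revision"], info["changed_at"])
# prints: psu 1.6 2000-11-20 18:36:00
```

A script like this could scan a source tree and report who last touched each file, without asking the repository at all.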
Never use ftp, copy, cp, tar, unzip or scp to copy file trees to a CVS work area. Always make your own private work area using cvs checkout and put the files there, then add them, then commit and update.
Try to leave behind nice commit log messages so people know what you are doing. Commit log messages like this are pretty useless:
fixed bugs
Commit log messages like this are better:
Fixed bug #5067 in the SDM. Had to change how the package logs user activity with some extra fields in the data model to store more information that we didn't realize we needed.
Making CVS send mail to you whenever people check in is insanely useful. Here are some things it does for you:
The incoming mail stream gives you an instant snapshot of who is changing what in the code. It's hard to get this from cvs log because it only provides you with a file-by-file view. All ACS 4.0 related checkins are logged to a bboard. Take a look at it to get an idea of what you can learn.
If you make a lot of branches for releases or daily builds, the mail alert will tell you instantly when people are committing code to the wrong place. This saves a lot of integration time later.
When people check in code that they weren't supposed to touch, or commit changes to major subsystems without notifying people or getting appropriate reviews, you find out and can yell at them.
If you archive the e-mail, this is as close as you will come to a real transaction log on the CVS repository. Changes that are all part of a single commit tend to be grouped together and have similar log messages. This is handy when doing release integration.
If you do a checkout and suddenly everything is broken, the checkin log can give you clues as to who the culprit is. You can use cvs log for this too, but it's hard to get good information on a large tree.
Writing good code is an art and a skill. Although we like to think we can, very few people can sit down and write perfect, elegant, and fast code off the top of their head. In large teams, a certain amount of consistency in style is also important. So, when you are writing code, keep coding standards in mind. But, in addition to those mechanical standards, keep these rules of thumb in mind as well:
Code review is a great thing. The best way to find bugs in your code is to show it to someone else. The package development guidelines set several large scale review milestones for every project, but I think small scale reviews are even more important. Have people review your code before checking it in. The easiest way to know if you are about to break the semantics of some API call that you just changed is to ask a user of that API. One simple message exchange or phone call before the checkin can save hours of debugging work that might happen after you commit the code and break your client's system.
Abstraction is good. Abstraction is what programming is all about. Abstraction encapsulates data structures or algorithms that you use a lot and makes them easy to use over and over again. Cut-and-paste and duplicated code defeat abstraction. If you find yourself copying code to many places in the system, try to figure out how to abstract the code into a procedure.
For example, in ACS 3.x, there were many instances of page files being duplicated in different parts of the file tree so that they would be available from multiple URLs with different semantics. In ACS 4.0, we put mechanisms into the system to allow the same code to be served from several different URLs. This put all the logic for this functionality in one place (easy to fix) and reduced the amount of duplicated code in the system.
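To make the idea concrete, here is a small Python sketch of one handler served from several URLs. It is not the ACS 4.0 mechanism. The names bboard_page, MOUNTS and serve, and the URLs themselves, are hypothetical and exist only to illustrate keeping the logic in one place.

```python
# Hypothetical example: one handler, mounted at several URLs.
# This is not the ACS request processor, only an illustration of the idea.


def bboard_page(context):
    """The single copy of the page logic; behavior varies only via context."""
    return f"{context['title']}: showing {context['forum']} messages"


# Each mount point supplies its own parameters instead of its own copy of the code.
MOUNTS = {
    "/bboard": {"title": "Public forum", "forum": "general"},
    "/intranet/bboard": {"title": "Staff forum", "forum": "staff"},
}


def serve(url):
    """Look up the mount point for url and run the shared handler."""
    context = MOUNTS.get(url)
    if context is None:
        return "404 Not Found"
    return bboard_page(context)


print(serve("/bboard"))           # Public forum: showing general messages
print(serve("/intranet/bboard"))  # Staff forum: showing staff messages
```

A bug fix in bboard_page now reaches every URL at once, which is the whole point of removing the copies.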
Try to limit the use of global variables and language mechanisms that cause non-local side effects. For example, in Tcl, eval and uplevel should be avoided unless you know for a fact that there is no other way to achieve the effect you are after. In Java, object fields should never be public, so that only the methods defined in the given class can touch them.
Non-local effects are the most frequent cause of bugs that are hard to reproduce and fix.
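The same problem shows up in any language. Here is a small Python sketch, with hypothetical function names, contrasting a function whose result depends on hidden global state with one that takes everything it needs as arguments.

```python
# Hypothetical example contrasting a hidden side effect with an explicit one.

discount = 0.0  # module-level state that any function may change


def apply_promo_global(price):
    """Changes the global as a side effect; later callers are silently affected."""
    global discount
    discount = 0.5
    return price * (1 - discount)


def total_global(price):
    """Result depends on whether apply_promo_global ran earlier."""
    return price * (1 - discount)


def total_explicit(price, discount_rate):
    """Everything the result depends on is visible in the argument list."""
    return price * (1 - discount_rate)


before = total_global(100.0)
apply_promo_global(10.0)
after = total_global(100.0)  # same call as before, different answer
print(before, after, total_explicit(100.0, 0.0))
# prints: 100.0 50.0 100.0
```

The bug in total_global only appears if some other code happened to run first, which is exactly what makes this kind of failure hard to reproduce.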
Everything else being equal, code that is easier to read is always better. Avoid using constructs that obscure what your code is doing. For example, in Perl, the following two code fragments do the same thing:
print "Starting analysis\n" if $verbose;
$verbose && print "Starting analysis\n";
Even if you don't know Perl, the first line of code is pretty easy to follow, although you might be surprised that the language allows the trailing if. Even if you do know Perl, the second line of code is kind of obscure. Of course, a more traditional construct is:
if ($verbose) {
    print "Starting analysis\n";
}
But in Perl this is really no more or less readable.
You must write tests while you code. The goal for any development project should be to move the code from one well-known stable state to the next well-known stable state at a predictable rate. This is the goal behind the ACS 4 build process.
CVS is the first critical tool that is needed to achieve this. Good tests are the second. You can't keep the tree stable without good tests. The ACS 4.0 team found this out the hard way. Hours before the first alpha release tarball was to be cut, it seemed like every time someone checked code into the CVS repository the entire system broke. The reason for this was that we didn't have an effective way to automatically test the system for a minimal amount of correctness. If we had had such tests, developers would have been able to use them to test their changes before committing them to the repository, and we would have been able to keep the repository stable.
We should have the following goals with respect to testing:
Write tests while you write code. There are several good reasons for this. First, it's tedious to write tests, so writing all the tests at once at the end of the process is nearly impossible to do without going insane. Second, writing tests early may uncover design problems that you did not predict. Third, writing tests early and often will allow you to get the code to a stable state more quickly, and keep it stable more easily. You can't keep code stable without having tests around so that you know when changes break things.
Every system component should have a complete set of regression tests that test both basic functionality and all the strange and wonderful boundary cases.
It should be possible to run the complete test suite or any subset of the test suite automatically and without human intervention.
Testing the code should go hand in hand with keeping the CVS repository stable. There should be a periodic and automatic process for checking the system out of CVS, building it, and testing it to make sure that no integration related problems have appeared.
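As a sketch of what these goals can look like in practice, here is a small Python example using the standard unittest module. The function under test (paginate) and the helper run_suite are hypothetical. The point is only to show basic and boundary-case tests that can be run in full or in part without anyone watching.

```python
import unittest


def paginate(items, page_size):
    """Split items into consecutive pages of at most page_size elements."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    return [items[i:i + page_size] for i in range(0, len(items), page_size)]


class PaginateBasicTests(unittest.TestCase):
    def test_even_split(self):
        self.assertEqual(paginate([1, 2, 3, 4], 2), [[1, 2], [3, 4]])


class PaginateBoundaryTests(unittest.TestCase):
    def test_empty_input(self):
        self.assertEqual(paginate([], 3), [])

    def test_last_page_is_short(self):
        self.assertEqual(paginate([1, 2, 3], 2), [[1, 2], [3]])

    def test_zero_page_size_rejected(self):
        with self.assertRaises(ValueError):
            paginate([1], 0)


def run_suite(*test_cases):
    """Run the given TestCase classes unattended; return True if all passed."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite(loader.loadTestsFromTestCase(tc) for tc in test_cases)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()


# Whole suite first, then only the boundary-case subset.
all_ok = run_suite(PaginateBasicTests, PaginateBoundaryTests)
subset_ok = run_suite(PaginateBoundaryTests)
print("all passed:", all_ok, "| boundary subset passed:", subset_ok)
```

A periodic build job could call run_suite after each fresh checkout and report or mail the result, so that integration problems surface within hours instead of at release time.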
In an ideal world, we would have pairs of developers working on code: one writing code and the other doing reviews, writing tests, checking for user interface foibles, and writing prototypes against APIs to make sure that they are usable.
In the ACS world, we are still lacking a totally automatic testing framework, but things are getting better. Developers should be using a combination of UtPL/SQL, JUnit, and E-test for testing data models, Java code and page code. E-test is at the current time the hardest of these tools to automate. With luck we might have something in place soon.
Pete Su Last Modified: practices.html,v 1.1 2001/01/21 01:49:08 bquinn Exp