by Ron Henderson (ron@arsdigita.com)
Submitted on: 2000-06-30
Running a Web development project is a complex process. As a
project leader, you will need to coordinate the efforts of a
programming team, one or more graphic artists, content managers, and
testers, all while dealing with site objectives that necessarily
evolve over time. If you want to sleep at night, every piece of
static content required to make your Web service operate should be
under version control. This includes all server-side scripts,
static html pages, images, template specifications, data models, and
anything else your site requires to operate.
A good version control system serves two related functions: record
keeping and collaboration. Record keeping is the more obvious. What
state was the code in yesterday? What changes have taken place since
the last feature launch? Who made change X that broke page Y?
Typical version control systems achieve this by recording each change
to a file as a separate numbered revision, along with a message
about what the change was and who made it. This fosters collaboration
by making it easier for one developer to understand what the others
have done (they can read the log messages, and look at the isolated
changes). Good systems provide mechanisms for merging changes from
multiple developers so that work is never clobbered accidentally.
At ArsDigita we have selected the free, open-source Concurrent
Versions System (CVS) as our standard version control system. CVS is
based on the copy-modify-merge model, and works as follows: A
master copy of the project is stored in a special area called the
repository. One or more working copies of the project
can be checked out of the repository. Each contributor to the
project makes changes in one of these working copies and then
commits those changes, along with a log message about what they
did, to be recorded in the repository. At any time the working copy
can be compared to the master copy and, if necessary, updated
by merging in the more recent changes.
Anyone can pick up the basics of CVS in a few minutes by reading
the documentation, available online at
http://www.loria.fr/~molli/cvs/doc/cvs_toc.html.
You really only need a handful of commands: add, remove, commit, and
update. This article is geared toward more advanced features of CVS
that come into play during the overall management of a Web development
project.
Last updated: 2000-08-06
Setting up a new project under CVS
A typical Web development project involves three servers, which may
be on one or more physical hosts (see http://arsdigita.com/asj/cvs/).
We need a production server for the live site, a staging
server to test changes before they go live, and a development
server for more long-term or unstable development. In CVS terms, each
one is served from a different working copy of the project. We
also need at least two Oracle users/tablespaces to separate
production/staging from development. Here we concentrate on
initializing the project's code repository.
Putting a new project into a CVS repository is called
importing. For the sake of example, assume that you're setting
up a new project called "myservice" based on the ArsDigita Community
System (ACS) version 3.1.5.
The examples below use the repository /cvsweb/. If it already exists,
you don't have to create one. If /cvsweb/ doesn't
exist you will need to use cvs init:
% cvs -d /cvsweb init
This creates a new repository in /cvsweb/ and populates the
administrative subdirectory /cvsweb/CVSROOT/.
Unpack the ACS distribution in ~/develop:
% cd ~/develop/
% tar zxvf acs-3.1.5.tar.gz
Import the unpacked sources into the repository:
% cd ~/develop/acs/
% cvs -d /cvsweb import -m "importing ACS 3.1.5" myservice ArsDigita acs-3-1-5
Note that this command is executed from inside the
top-level ACS subdirectory, not above it.
Remove the unpacked sources and the distribution file:
% cd ~/develop/
% rm -rf acs acs-3.1.5.tar.gz
Check out a working copy for the development server:
% cd /web/
% cvs -d /cvsweb checkout -d myservice-dev myservice
Check out working copies for the production and staging servers:
% ssh myservice.arsdigita.com
% cd /web
% cvs -d myservice-dev.arsdigita.com:/cvsweb checkout myservice
% cvs -d myservice-dev.arsdigita.com:/cvsweb checkout -d myservice-staging myservice
The example above will create a new project repository rooted at
/cvsweb/myservice/. It will be important later to use a
consistent set of vendor and release tags so that your
project can easily upgrade to newer versions of the ACS as they come
along. In the example above, vendor=ArsDigita
(this will always be the case) and release=acs-3-1-5.
Note that the repository does not have to reside on the same host
as the working copies. In fact, CVS is network transparent:
the repository can be accessed from anywhere on the internet using one of
four connection methods. At ArsDigita we only use the ext
(external transport agent) method with Secure Shell as the actual
transport agent. For this to work the environment variable
CVS_RSH=ssh must be set, although it is part of the
standard environment on all ArsDigita machines.
You now have a fresh installation of ACS to start from. Use the
regular cvs commands to add new files
and directories to your repository (see http://www.loria.fr/~molli/cvs/doc/cvs_7.html#SEC60), and remind your developers to
get in the habit of committing their changes regularly.
Tracking changes
Often you want to get a quick summary of the status of each file in
your project. You can use cvs update to do this. This command
compares the state of each file in your working directory with the
corresponding file in the repository, merging changes from the
repository if newer revisions have been committed. You can use update
in preview mode to check the state of everything:
% cd /web/arsdigita
% cvs -q -n update
? bin/make-acs-dist.pl
M loader/acs-loader.tcl
C tcl/address-book-defs.tcl
M tcl/download-defs.tcl
M tcl/ecommerce-scheduled-procs.tcl
M tcl/education-portal.tcl
M tcl/ischecker-notifier.tcl
M tcl/news-defs.tcl
U tcl/webmail-defs.tcl
? tcl/extensions-defs.tcl
? tcl/SQL
We get a quick view of the state of the project: M is a
locally modified file with uncommitted changes, U indicates
there is a newer revision in the repository (committed from some other
working copy), C indicates a file with unresolved conflicts,
and ? is a file that CVS has no version control information
for.
When multiple files are committed together they receive the
same log message, making it possible to recombine a batch of changes.
You can use a simple third-party tool called cvs2cl.pl to
convert the raw CVS log entries into a GNU-style ChangeLog. This is
extremely useful for recording the change history of a project.
To generate the ACS ChangeLog for the last week of April:
% cvs2cl.pl -l "'-d 2000-04-21 < 2000-04-29'"
% cat ChangeLog
2000-04-28 21:28 teadams
* www/intranet/employees/index.tcl: making team and office views
2000-04-28 21:23 teadams
* www/intranet/employees/admin/index.tcl: separating into offices
and teams
2000-04-28 20:49 ron
* www/doc/release-numbering.html: standard document regarding ACS
version numbers
2000-04-28 18:40 jsalz
* tcl/: ad-abstract-url.tcl, ad-admin.tcl, ad-security.tcl,
ad-table-display.tcl, ad-user-groups.tcl, ad-widgets.tcl: moved to
packages
2000-04-28 17:34 ron
* www/register/index.tcl: removed reference to ad_style_bodynote
2000-04-28 17:32 jsalz
* www/admin/apm/version-install.tcl: initial checkin
...
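The one-letter codes printed by cvs -q -n update lend themselves to a quick tally. The following is a minimal sketch, not part of the ArsDigita tooling: the function name summarize_cvs_status is hypothetical, and it counts only the four codes described in this article (M, U, C and ?), ignoring any other lines.

```shell
# Tally the status codes printed by `cvs -q -n update`.
# Reads the cvs output on stdin and prints "<code> <count>" per code.
# summarize_cvs_status is a hypothetical helper name, not a CVS command.
summarize_cvs_status() {
    awk '
        # A status line is one code, a space, then a path, e.g. "M tcl/news-defs.tcl".
        /^[MUC?] / { count[$1]++ }
        END { for (code in count) print code, count[code] }
    ' | LC_ALL=C sort
}
```

A typical use would be cvs -q -n update | summarize_cvs_status from the top of a working copy. Run against the /web/arsdigita listing shown in this article, it would report ? 3, C 1, M 6 and U 1.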
% cvs tag acs-3-2-2-R20000412
It might help to visualize a tag as a straight line through a particular set of revisions of each file of the project:
file1    file2    file3    file4
         1.3               1.1
1.17  -- 1.4   -- 1.9   -- 1.2  -- acs-3-2-2-R20000412
1.18     1.5      1.10     1.3
1.19              1.11
When you check out a copy of a project, by default you receive the latest revisions on the main line of development, also called the trunk. But you can also use tags to request a particular named snapshot of the project, either by updating an existing working copy or by checking out a fresh one:
% cvs update -r acs-3-2-2-R20000412
% cvs checkout -r acs-3-2-2-R20000412 acs
Note that for tags to be useful, somebody (the project leader) needs to plan a system for creating them at critical times in the project's history. For the ACS we record each toolkit release using a special tag of the form:
acs-major-minor-release-date
This intentionally mimics our release version numbering (see http://arsdigita.com/doc/release-numbering) so that the exact code shipped out in a particular release can be recalled from CVS.
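A naming scheme like this is easy to get wrong by hand, so it can help to generate the tag. The sketch below is an illustration only: the function name acs_release_tag is hypothetical, and the R prefix before the date is inferred from example tags such as acs-3-2-2-R20000412 rather than from a stated rule.

```shell
# Build a release tag of the form acs-<major>-<minor>-<release>-R<yyyymmdd>.
# acs_release_tag is a hypothetical helper; the "R" prefix is an assumption
# taken from the example tag acs-3-2-2-R20000412.
acs_release_tag() {
    major=$1 minor=$2 release=$3 date=$4
    # Accept only a date written as exactly eight digits.
    case $date in
        [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
        *) echo "date must be YYYYMMDD" >&2; return 1 ;;
    esac
    printf 'acs-%s-%s-%s-R%s\n' "$major" "$minor" "$release" "$date"
}
```

For example, cvs tag "$(acs_release_tag 3 2 2 20000412)" would apply the tag acs-3-2-2-R20000412 to the current working copy.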
At certain times it's important to split the project into separate, parallel lines of development. For example, prior to a release we split the main project into a release branch and a development branch. On the release branch we freeze the code and fix bugs in preparation for the release, while on the development branch we continue development for the future.
The idea is simple:
trunk
|
| acs-3-1 (branch 1.35.2)
| /
1.35 ------------- 1.35.2.1
| |
| |
1.36 1.35.2.2
| |
| |
1.37 1.35.2.3
| |
| |
1.38 1.35.2.4 acs-3-1-0-R20000204 (tag)
The branch has a label (acs-3-1), and revisions along the branch have four-part numbers (1.35.2.1) instead of two-part numbers (1.35).
The command to create a branch is:
% cvs tag -b branchtag
Once created you have to explicitly move your working copy onto the branch with a cvs checkout or update. Example:
% cd /web/arsdigita
% cvs tag -b acs-3-1
% cd /web/
% cvs checkout -r acs-3-1 -d acs-staging acs
You know that a particular file is on a branch because cvs status tells you:
% cvs -d /usr/local/cvsroot checkout -r acs-3-2 acs
% cvs status acs/readme.txt
===================================================================
File: readme.txt        Status: Up-to-date

   Working revision:    3.1     Mon Feb 21 07:02:06 2000
   Repository revision: 3.1     /usr/local/cvsroot/acs/readme.txt,v
   Sticky Tag:          acs-3-2 (branch: 3.1.4)
   Sticky Date:         (none)
   Sticky Options:      (none)
Branches are only used in the ACS repository to isolate code prior to a release. CVS supports branching to an arbitrary depth, but it's hard to imagine needing more than two parallel lines of development.
Note that branches created this way are always even-numbered (3.1.4.1). As described below, the import command also uses branches to separate your changes from the code imported from an external vendor. These so-called vendor branches are always odd-numbered.
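The even/odd rule can be checked mechanically from a revision number. The following is a small sketch under that rule only: classify_revision is a hypothetical name, and it treats the second-to-last field of the revision number as the branch number.

```shell
# Classify a CVS revision number as trunk, regular branch, or vendor branch,
# using the even/odd branch-number rule described in the text.
# classify_revision is a hypothetical helper name.
classify_revision() {
    rev=$1
    # Count the dot-separated fields: 1.35 has 2, 1.35.2.4 has 4.
    fields=$(printf '%s\n' "$rev" | awk -F. '{ print NF }')
    if [ "$fields" -le 2 ]; then
        echo trunk
        return 0
    fi
    # The branch number is the second-to-last field (the 2 in 1.35.2.4).
    branch=$(printf '%s\n' "$rev" | awk -F. '{ print $(NF-1) }')
    if [ $((branch % 2)) -eq 0 ]; then
        echo branch
    else
        echo vendor-branch
    fi
}
```

With this sketch, classify_revision 1.35.2.4 prints branch, classify_revision 1.1.1.2 prints vendor-branch, and classify_revision 1.38 prints trunk.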
We just spent a week working on the new release, fixing bugs and typos. All of those changes were committed to the release branch so that work in the development area could continue in isolation. Now that the release is over, we want to merge all of those fixes back into the trunk so they don't have to be fixed again for the next release.
In CVS terms, we need to merge the release branch and the development trunk.
trunk
|
| acs-3-1 (branch 1.35.2)
| /
1.35 ------------- 1.35.2.1
| |
| |
1.36 1.35.2.2
| |
| |
1.37 1.35.2.3
| |
| |
1.39 1.35.2.4 acs-3-1-0-R20000204 (tag)
. .
. .
1.40 . . . . . changes merged to create 1.40
Another way to think about it: we want to create a patch consisting of all the changes that took place along the acs-3-1 branch and apply that to our development copy. This is another job for cvs update, using the -j (join) option to merge in a selected set of changes.
% cd ~/develop
% cvs checkout acs
% cd ~/develop/acs/
% cvs update -kk -j acs-3-1
% cvs commit -m "merged changes from acs-3-1 branch"
The arguments to update are -j acs-3-1 to merge in all changes that took place along the acs-3-1 branch, and -kk to kill keyword expansion so we don't get spurious conflicts from CVS keywords in the source files.
What if you want to go in the other direction, merging changes on the trunk with a release branch?
% cd ~/develop
% cvs checkout -r acs-3-1 acs
% cd ~/develop/acs
% cvs update -kk -j HEAD
% cvs commit -m "merged changes from development"
You can find a complete description of branches and merging in the context of toolkit releases in the document ACS Release Management (http://www.arsdigita.com/doc/runbook/acs-release-management).
As ArsDigita programmers we always want to develop using the latest version of the ACS. Why is this important?
Several weeks have gone by since you started developing the site, and now a new version of ACS has been released that contains features you need or bug fixes to important modules. In the meantime, you've customized parts of the basic toolkit software necessary for the project. How can you upgrade your ACS installation without losing the customization?
The solution is to merge the new release with your existing code base using vendor/release tags to keep track of where the code came from. This is essentially the same principle that applies to merging branches in CVS. You supply a symbolic name for the vendor and software release of the new code you're importing. CVS then checks each file against the code in the repository, merges all the changes, and reports any conflicts so you can resolve them.
For the most part this process should simply upgrade the existing files in your repository so they reflect the latest release. Conflicts should only arise if you and ArsDigita (whom we treat here as an outside software vendor) made overlapping changes.
The basic steps are simple. Here is the upgrade method based on
cvs import using vendor/release tags. Working in ~/develop/, unpack the
distribution, import it, delete the unpacked copy, and then update your working copy:
cd ~/develop/
tar zxvf acs-3.1.5.tar.gz
cd ~/develop/acs
cvs import -m "upgrade to ACS 3.1.5" myservice ArsDigita acs-3-1-5
cd ..
rm -rf acs
cd /web/myservice/
cvs update -d
Note that the -d option to cvs update will create any new directories that were added to ACS. Without this option the update command will only operate on directories that already exist in your working copy.
A CVS import into an active project rarely goes that smoothly. Before we go into the mechanics of finishing the upgrade, it's important to understand how CVS stores each import within your project repository using vendor branches.
Let's say that over time we've upgraded the project three times. In the simplest case of a file that we never touched, the revision tree looks like this:
% cvs -d /cvsweb import myservice ArsDigita acs-3-1-0
% cvs -d /cvsweb import myservice ArsDigita acs-3-1-1
% cvs -d /cvsweb import myservice ArsDigita acs-3-1-2
file
|
| ArsDigita (branch 1.1.1)
| /
1.1 ------------- 1.1.1.1 acs-3-1-0
|
|
1.1.1.2 acs-3-1-1
|
|
1.1.1.3 acs-3-1-2
Each import adds a new revision to the vendor branch, and a subsequent update or checkout will pull that latest revision out to your working copy.
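Because every import is labelled with a release tag such as acs-3-1-2, it helps to derive that tag the same way each time. The sketch below assumes distribution files are named like acs-3.1.5.tar.gz, as in the examples in this article; the function name release_tag_from_tarball is hypothetical.

```shell
# Derive a CVS release tag from a distribution file name,
# e.g. acs-3.1.5.tar.gz -> acs-3-1-5.
# release_tag_from_tarball is a hypothetical helper name.
release_tag_from_tarball() {
    # Strip any leading directory and the .tar.gz suffix.
    name=$(basename "$1" .tar.gz)
    # CVS tag names may not contain dots, so replace each one with a dash.
    printf '%s\n' "$name" | sed 's/\./-/g'
}
```

For example, cvs -d /cvsweb import myservice ArsDigita "$(release_tag_from_tarball acs-3.1.5.tar.gz)" would import with the release tag acs-3-1-5.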
What if you and the vendor both changed the file?
% cvs -d /cvsweb import myservice ArsDigita acs-3-1-1
...
2 conflicts created by this import.
Use the following command to help the merge:
cvs checkout -jArsDigita:yesterday -jArsDigita myservice
file
|
| ArsDigita (branch 1.1.1)
| /
1.1 ------------- 1.1.1.1 acs-3-1-0
| |
| |
1.2 1.1.1.2 acs-3-1-1
Because you and the vendor both changed the file, we have an import conflict, which really just means that we need to take an extra step to merge the vendor's changes (1.1.1.2) with ours (1.2). CVS tells you exactly what to do:
% cd ~/develop
% cvs checkout -j ArsDigita:yesterday -j ArsDigita myservice
file
|
| ArsDigita (branch 1.1.1)
| /
1.1 ------------- 1.1.1.1 acs-3-1-0
| |
| |
1.2 1.1.1.2 acs-3-1-1
. .
. .
1.3 . . . . . merge (and resolve conflicts)
Note what's happening here. We are checking out a fresh copy of the project and doing a simultaneous merge. The patch to be applied is computed by selecting only the changes that have occurred along the ArsDigita branch in the past 24 hours, which will of course include only the code that we just imported.
Now you have to go through the normal process of conflict resolution if you had overlapping changes. Otherwise CVS will combine your changes with the vendor's changes to produce the fully updated file. Don't forget to commit the results of the merge and update your development area.
% cd ~/develop/myservice
(resolve conflicts)
% cvs commit -m "merged with acs-3-1-1, conflicts resolved"
% cd /web/myservice-dev
% cvs update
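Before committing a merge it is worth confirming that no conflict markers were left behind. CVS marks each conflicting region with a line beginning "<<<<<<< ", so a simple search will find unresolved files. This is a minimal sketch; find_conflicts is a hypothetical helper name, not a CVS command.

```shell
# List files under a directory that still contain CVS conflict markers.
# Prints one path per line; returns 0 if any were found, 1 otherwise.
# find_conflicts is a hypothetical helper name.
find_conflicts() {
    dir=${1:-.}
    # Skip the CVS administrative directories and look for lines that
    # start a conflict block ("<<<<<<< " at the beginning of a line).
    found=$(find "$dir" -name CVS -prune -o -type f -exec grep -l '^<<<<<<< ' {} +)
    [ -n "$found" ] && printf '%s\n' "$found"
}
```

Running find_conflicts ~/develop/myservice after the merge should print nothing once every conflict has been resolved.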
Our final topic concerns managing the code between your three servers (development, staging and production). Say you are running a client project on which the site has launched but your programming team is still developing new features. For example, every week you release a new set of features onto the production site, plus you fix some bugs, plus you start up a development project that won't be released for some time. How do you keep track of what set of changes need to migrate from one working copy to another?
This is a problem with many possible solutions but not one "best" solution. You'll have to make the appropriate choice based on the complexity of the site and the amount of time you're willing to invest in software management. Here are three choices for solving the problem, differentiated by the number of code branches used.
The basic idea is that we have only one branch of development and we use CVS tags to mark the revision that should migrate to the production server. The development tree looks like this:
trunk
|
|
1.35
|
|
1.36 ... myservice-production
|
|
1.37
|
|
1.38
We use a special tag (myservice-production) to mark files for migrating to the production server. Note that all work is done on the development site. Any programmer can release a file for production using:
cvs tag -F myservice-production [files]
or remove a file from production using:
cvs tag -d myservice-production [files]
After that development continues but files would only be re-tagged when a new set of features is ready for release to the production server. The only command ever executed on the production server is cvs update:
cd /web/myservice
cvs update -d -r myservice-production
The arguments to the update are -d to create new directories if necessary and -r to update all of the local copies to the revision tagged as myservice-production, and remove files that no longer have that tag.
Note that files on the production server will not be editable. When you ask CVS to update a file using a regular tag like myservice-production, it sets a sticky tag on the file. This prevents you from committing changes to any file that is not based on the latest revision. The command update -A will reset all sticky tags and replace your local copy with the latest trunk revision.
The practical implication of sticky tags is the following. If a problem is discovered it must be fixed on the development server, re-tagged, and then the production server must be updated again to bring over the changes. However, this process is easily automated. You can have a cron job on the production box that updates the site every 15 minutes, or every hour, or every day - whatever schedule fits your management goals.
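A scheduled update of this kind can be wrapped in a small function. The following is a sketch under stated assumptions, not the script used on ArsDigita servers: the function name update_production, the overridable CVS variable, and the script path in the crontab comment are all hypothetical.

```shell
# Update a production working copy to whatever is tagged for production.
# update_production is a hypothetical helper. Setting CVS to another command
# (for example CVS=echo) previews the cvs invocation without running it.
update_production() {
    site_dir=$1
    tag=${2:-myservice-production}
    # Run in a subshell so the caller's working directory is unchanged.
    (
        cd "$site_dir" || exit 1
        # -q keeps the output short; -d picks up new directories;
        # -r selects the tagged revisions.
        ${CVS:-cvs} -q update -d -r "$tag"
    )
}

# Hypothetical crontab entry for a 15-minute schedule, assuming the function
# above is saved in a script that calls update_production /web/myservice:
# 0,15,30,45 * * * * /usr/local/bin/update-production.sh
```

The subshell keeps the directory change local, so the function can be called repeatedly from one script for several sites.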
For better record keeping, or in case you need to suddenly revert to a previous working image of the production server, you can record particular snapshots of the tree using additional tags and the cvs rtag command:
cd /web/myservice
cvs -d /cvsweb rtag -r myservice-production myservice-1-0-5 myservice
This applies the tag myservice-1-0-5 to every revision that is currently tagged myservice-production. Note that rtag does not require a working copy: it operates directly on the repository.
The production tag will change over time, but the release tags will record fixed snapshots of your project. Eventually your development tree will look like this:
trunk
|
|
1.35
|
|
1.36 ... myservice-1-0-3
|
|
1.37 ... myservice-1-0-4
|
|
1.38 ... myservice-1-0-5, myservice-production
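The snapshot tags in this diagram simply count upward, so the next tag name can be computed from the previous one. This is an illustrative sketch: next_snapshot_tag is a hypothetical helper, and it assumes the final component of the tag is a plain decimal number without leading zeros.

```shell
# Given the latest snapshot tag (e.g. myservice-1-0-5), print the next one
# by incrementing the final numeric component.
# next_snapshot_tag is a hypothetical helper name.
next_snapshot_tag() {
    tag=$1
    prefix=${tag%-*}      # everything before the last dash: myservice-1-0
    last=${tag##*-}       # the final component: 5
    printf '%s-%s\n' "$prefix" "$((last + 1))"
}
```

For example, cvs -d /cvsweb rtag -r myservice-production "$(next_snapshot_tag myservice-1-0-5)" myservice would record the current production revisions as myservice-1-0-6.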
In case you need to suddenly revert the production site to a previous snapshot, it's a simple matter of executing:
cvs update -r myservice-1-0-3
If you set things up like this you will not need a staging server,
although you could easily introduce one, e.g. for testing prior to
updating the production server. This is essentially the way we run www.arsdigita.com, with all changes
taking place on the development server, the tagging performed by a
shell script called arsdigita-publish.sh, and a cron job
updating the production site every 15 minutes.
To summarize the one-branch scenario:
If you want a little more flexibility and better separation between your development work and your production site, you can introduce a separate code branch for staging/production. You're shooting for something like this:
/web/myservice-dev/        Development server (working copy of your project trunk)
/web/myservice-staging/    Staging (working copy of the production branch)
/web/myservice/            Production (read-only copy of the production branch)
You want to keep active development on your development server, testing and bug fixes on your staging server, and no edits (cvs update only) on your production server. Conceptually the development tree looks like this:
file
|
| myservice-production (branch 1.1.2)
| /
1.1 -------------- 1.1.2.1
| |
| |
1.2 1.1.2.2 ... myservice-1-0-1
| |
| |
1.3 1.1.2.3
| |
| |
1.4 1.1.2.4 ... myservice-1-0-2
Unlike the one-branch scenario, the production branch now allows normal development, e.g. you can edit and commit changes from the staging server just like any other working copy. These changes will be recorded on the myservice-production branch. This also enables content management tools that have been integrated with CVS to record changes in the production branch. And you can still record particular release snapshots (myservice-1-0-2) on the production branch.
The only additional complexity is that you will need to merge changes between your production and development branches. The command to add a file to the production branch for the first time is the same as above, with the addition of a -b branch option:
cvs tag -b myservice-production [files]
The production and staging servers are updated as before:
cd /web/myservice-staging
cvs update -d -r myservice-production
cd /web/myservice
cvs update -d -r myservice-production
with the subtle distinction that myservice-production is now a branch tag (refers to a family of revisions rooted to a common ancestor on the trunk) rather than a regular tag (refers to a single revision in the repository).
To migrate changes from your development site to the production site, you will need to explicitly merge them. For example, let's say you've been working on custom-module and now want to migrate the changes to your production branch. The commands are:
% cd /web/myservice-staging/www/custom-module/
% cvs update -kk -j HEAD
% cvs commit -m "merged with dev"
trunk myservice-production (branch 1.35.2)
| |
| |
| |
1.35 1.35.2.1
| |
| |
1.36 1.35.2.2
. |
. |
. . . . . . 1.35.2.3
This uses the reserved tag HEAD, which always refers to the
latest revision on the trunk, to merge all files in the directory
/www/custom-module/. Once everything is tested on
the staging server, a single cvs update on the production
server will activate the changes.
One advantage to this approach is that you can quickly fix problems on the staging server and commit the changes to the production branch. If this happens you will want to merge the changes into your development copy so that you don't have to duplicate the fixes (and generate spurious conflicts later). This is also done with a merge, but in the opposite direction. As a general policy you will want to tag your production branch and do this at regular intervals, each time merging the changes between previous production "releases".
% cd /web/myservice-dev/www/custom-module/
% cvs update -kk -j myservice-1-0-2 -j myservice-1-0-3
% cvs commit -m "merged with myservice-1-0-3"
trunk myservice-production (branch 1.35.2)
| |
| |
| |
1.35 1.35.2.1 ... myservice-1-0-2
| |
| |
1.36 1.35.2.2 ... myservice-1-0-3
| .
| .
1.37 . . . . .
To summarize the two-branch scenario:
Our final option is to use a separate branch for every production release of the site. The complication over using two branches is that you will create another release branch each time you want to launch a new set of features. In this case the release cycle involves merging the current release branch with your development copy, creating a new release branch, moving your staging server onto the new branch for testing, and finally updating the production server when testing is complete.
Over time your revision history tree will have multiple branches coming out of the trunk and then merging back in as you go through the process of successive releases:
file
|
| myservice-1-0 (branch 1.35.2)
| /
1.35 ------------- 1.35.2.1
| |
| |
1.36 1.35.2.2
. .
. .
1.37 . . . . .
|
|
1.38
| myservice-1-1 (branch 1.39.2)
| /
1.39 ------------- 1.39.2.1
| |
| |
1.40 1.39.2.2
. .
. .
1.41 . . . . .
Although the basic steps are described above, it might be helpful to list them explicitly in the context of a full release cycle. They are:
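Check the state of the development working copy:
% cd /web/myservice-dev
% cvs -n update
Merge the current release branch with your development copy:
% cd /web/myservice-dev/
% cvs update -j myservice-1-0
% cvs commit -m "merged with myservice-1-0"
Create a new release branch:
% cd /web/myservice-dev
% cvs tag -b myservice-1-1
Move your staging server onto the new branch:
% cd /web/myservice-staging
% cvs update -d -r myservice-1-1
Run any necessary datamodel upgrades, test the new code on the staging server, fix any problems that arise, and commit all changes.
Update the production server when testing is complete:
% cd /web/myservice
% cvs update -d -r myservice-1-1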
This is essentially the way we handle new releases of the ArsDigita Community System, except for the final steps of packaging up the distribution.
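The merge, branch, and staging steps of this cycle can be gathered into one function. The following is a sketch under stated assumptions rather than the procedure actually scripted at ArsDigita: release_cycle and the DEV_DIR, STAGING_DIR, and CVS variables are hypothetical names, and the directory defaults are the ones used in this article. The production update is left out on purpose, since it should happen only after testing on staging.

```shell
# Run the first part of a release cycle: merge the old release branch into
# the trunk, cut a new release branch, and move staging onto it.
# release_cycle, DEV_DIR, STAGING_DIR and CVS are hypothetical names;
# setting CVS=echo gives a dry run that only prints the cvs arguments.
release_cycle() {
    old_branch=$1
    new_branch=$2
    dev_dir=${DEV_DIR:-/web/myservice-dev}
    staging_dir=${STAGING_DIR:-/web/myservice-staging}
    cvs=${CVS:-cvs}

    (
        cd "$dev_dir" || exit 1
        # Fold fixes made on the old release branch back into the trunk.
        $cvs update -j "$old_branch" || exit 1
        $cvs commit -m "merged with $old_branch" || exit 1
        # Cut the new release branch from the current trunk.
        $cvs tag -b "$new_branch" || exit 1
    ) || return 1

    (
        cd "$staging_dir" || exit 1
        # Move the staging working copy onto the new branch for testing.
        $cvs update -d -r "$new_branch"
    )
}
```

A call such as release_cycle myservice-1-0 myservice-1-1 corresponds to the commands listed above. Any merge conflicts would stop the function at the commit step only if cvs itself reports failure, so the working copy should still be reviewed by hand.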
To summarize the multiple-branches scenario:
Version control is a critical aspect of software development, and doing it correctly will not only save you from potential coding disasters but also help reinforce good software engineering. It encourages developers to document what they do, fosters communication among your team, and helps you (as a project leader) track the activity taking place from day to day. At ArsDigita we've standardized on CVS as our version-control system. Although CVS may seem complex at first, the process of managing a Web development project is straightforward once you learn a few basic principles. The benefits far outweigh the small amount of overhead necessary to use CVS effectively.
Here are some general guidelines to avoid trouble: