by Philip Greenspun (philg@mit.edu)
What's wrong with the two-server plan? Nothing if you are running
photo.net circa 1997. The development team consisted of me and Jin.
The testing team... me and Jin! Note that there was no possibility of
simultaneous development and testing. ArsDigita.com customers, however, usually have
enough budget to pay for four or five programmers plus 20 or 30 internal
staffers who may be updating content, testing changes, and sometimes
contributing code. For a complex site, the publisher may wish to spend
a week testing before launching a revision. It isn't acceptable to idle
authors and developers while a handful of testers bangs away at the
development server. The solution? A staging server, rooted at
/web/foobar-staging/ (Server 3).
Here's how the three are used:
So it would seem that we'll need at least one new Oracle playground.
Here are the steps:
The bottom line is that it takes work to keep three Oracle users'
objects in sync. It is half as much work to sync two and almost as
useful. How to deploy these two Oracle users? Park one behind the
production server. Use the other one behind the dev and staging
servers.
CVS does all of this via its repository or "CVS root". This is a
directory, typically /usr/local/cvsroot/. Most Unix machines don't have
enough space in the /usr partition to store all Web content. Remember
that the CVS root will be at least as large as all of the files under
source control. Thus we will use /cvsweb as our CVS root and, if need
be, migrate it to a separate disk subsystem.
Create a project from your development Web sources (from
/web/foobar-dev/) so that they will end up at /cvsweb/foobar/.
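A minimal sketch of that project creation, assuming a stock cvs client is on the PATH. The paths come from the text above; the vendor tag `arsdigita`, the release tag `start`, and the function names are placeholders invented for illustration (cvs import requires both tags but the article does not name them).

```python
import shlex
import subprocess
from typing import List

CVSROOT = "/cvsweb"            # CVS root chosen in the article
DEV_DIR = "/web/foobar-dev"    # development server root from the article
MODULE = "foobar"              # the project should end up at /cvsweb/foobar/


def import_commands(vendor_tag: str = "arsdigita",
                    release_tag: str = "start") -> List[List[str]]:
    """Return the cvs commands that create the repository and import the project.

    vendor_tag and release_tag are hypothetical placeholder values.
    """
    return [
        # Create the repository's administrative files under /cvsweb.
        ["cvs", "-d", CVSROOT, "init"],
        # Import the current directory as module "foobar".
        ["cvs", "-d", CVSROOT, "import", "-m", "initial import from dev server",
         MODULE, vendor_tag, release_tag],
    ]


def run_import(dry_run: bool = True) -> None:
    """Print each command; run it from inside DEV_DIR only when dry_run is False."""
    for cmd in import_commands():
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, cwd=DEV_DIR, check=True)


if __name__ == "__main__":
    run_import()  # dry run: only prints the commands
```

Note that an import does not turn /web/foobar-dev/ into a working copy; a checkout into that directory is still needed before check-ins can happen from there.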
Who is really using CVS then? A cron job. Every day just before
midnight the cron job should check in all changes from the dev server to
the main branch, with the change comment 'nightly check-in YYYY-MM-DD'.
The cron job should notify the Release Master if any files that are in
the repository have been deleted so that he or she can decide whether
the removal was a mistake or if typing cvs remove is warranted (the
files don't really go away; they go into an "attic").
One person is designated the Release Master. Normally this person does
nothing. When the publisher is happy with the behavior of the
development server, the Release Master creates a CVS branch named
"199909Launch" or whatever. The Release Master updates the staging
server from CVS with this branch. Development proceeds with checkins to
the main CVS branch. These won't affect updates from the 199909Launch
branch.
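The Release Master's branch step can be sketched as a pair of commands. This assumes that both /web/foobar-dev/ and /web/foobar-staging/ are already CVS working copies of the same module; the function name is made up for illustration.

```python
import shlex
from typing import List, Tuple

DEV_DIR = "/web/foobar-dev"          # development server root from the article
STAGING_DIR = "/web/foobar-staging"  # staging server root from the article


def branch_commands(branch: str) -> List[Tuple[str, List[str]]]:
    """Return (working_directory, command) pairs for cutting a release branch."""
    return [
        # Create the branch from the revisions checked out on dev
        # (edits that have not been checked in are not included).
        (DEV_DIR, ["cvs", "tag", "-b", branch]),
        # Switch staging to the branch; -d brings in new directories,
        # -P prunes empty ones.
        (STAGING_DIR, ["cvs", "update", "-d", "-P", "-r", branch]),
    ]


if __name__ == "__main__":
    for workdir, cmd in branch_commands("199909Launch"):
        print("cd " + workdir + " && " + shlex.join(cmd))
```

Because the update uses -r, the staging directory stays "stuck" to the branch, which is why later check-ins on the main branch don't show up there.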
Once the staging server has been thoroughly tested, the Release Master
checks in any changes that have been made. The check-in happens twice,
once to the 199909Launch branch (there won't be any conflicts since
nobody has been touching this) and once to the main branch (conflicts
may need to be resolved).
When the publisher decides to go live, the Release Master takes the
following steps:
If the Release Master is doing all of this hard work, why do we need to
train anyone else in CVS? A Web service is 24x7 but one person can't
work 24x7. So we need a Release Apprentice for each Web service who
knows everything that there is to know about this system.
The ArsDigita Community System generally contains the following under
/web/foobar:
If you're worried about your developers being sloppy and editing files
in /web/foobar/ when they thought they were in /web/foobar-dev/ remember
that you can always use cvs update to revert the production site to the
most recent approved version.
Suppose that you've ample money for server hardware, co-location fees,
and sysadmin resources. You probably want to split the production
machine out and only give the Release Master and Release Apprentice
access to that box. Let the developers and staging/testing folks fight
it out on a development server.
Compare this to the world of db-backed Web servers. If you want to
check out a copy of the tree and play with it, you have to create an
Oracle user and tablespace, import a recent Oracle export.dmp file to
populate your tablespace with what was on the production site, find a
free IP address or port and set up a Web server, and then keep your
Oracle table definitions in sync with any alterations other developers
may be making.
In the C world, developers live to satisfy themselves. More than
likely, not another soul on the planet will ever run the code that they
are authoring. So it is fine for them to work alone. In the Web world,
developers always work with the publisher and users. Those
collaborators will need to be alerted to this new server so that they
can offer criticism and advice. They might need special passwords or
firewall access since most publishers don't like to let the public see
their unfinished development efforts.
In the C world, you've got the luxury of one or two years between
product releases. All the work is done by people with at least four
years of training. In the Web world, a significant new release may need
to be produced in four weeks. Much of the work may be done by people
with no formal training of any kind, e.g., designers and content authors
editing templates or static .html pages. Given the chronic shortage of
personnel in this industry, do you want to limit yourself to being able
to hire only those who've been through a CVS training course? To those
who are formally minded enough to read the CVS man pages? Remember that
most of the contributors on your site will not be programmers.
The bottom line? It is just too much work to set up each contributor
with his or her own little server.
If you are setting up a new CVS server, spend a few extra minutes to configure CVS in client-server ("pserver") mode instead of the older file system mode. This will save you pain later and may keep you out of hot water. Pain, because moving the repository (your old one dies, your company IPOs and your boss wants to buy a big fancy server farm, you want to hide the repository behind a firewall) is a matter of changing an environment variable. You get immediate access control (developers can be protected from updating the production environment). CVS in file system mode can "hang" because it leaves a lock file around for each file and directory. Then you need a CVS guru to dive in and fix it. One note: you can't live in a mixed environment. It is either one mode or the other.
An expert tip on using client-server mode: CVS uses gzip for compressing data across the network. The default setting is -z3, which is a pitiful waste of time. Recompile CVS to use -z9 by default (the network is the bottleneck, not CPU resources), or add it to everyone's .cvsrc configuration file (it lives in each user's home directory).
I've had some extremely painful experiences with CVS and large binary files. (Large is 32 MB or more.) When CVS checks a file out of the repository, even if it is doing nothing more than a straight copy (no diff'ing, merging, etc.), the program brings the whole file into contiguous memory. This bloats the CVS process resident set size to at least the size of the file, plus 6 MB for the program, give or take. The process is inefficient, so subsequent large files don't reuse the space well. CVS bloats even more. Make sure that your server is configured with a lot of swap space (it should have a lot of memory anyway). Even so, performance will drag down into the ground until CVS is finished (could be 30 minutes for a large working set), then things will "mysteriously" return to normal.
In more mundane companies, however, you usually have at least one mid-level manager who will see the amount of code checked in every day as a measurement of individual employee efficiency, and wreak all sorts of havoc with this misguided "knowledge".
Apart from this, your proposed method sounds remarkably similar to what I have been doing for various db-backed websites over the last few years. It has proven itself to me to be a great time saver and I don't even want to calculate how many near disasters with their associated all-night fix-up sessions it has saved me or my co-workers from.
The pserver is surely the only way to share CVS among a group of people without running into all sorts of non-interesting problems with nfs etc. You can also tunnel it through ssh for secure over-the-net operations.
As for the managers, they usually don't care. Obviously some misguided soul is going to use this tool to gather information on who worked on what and for how long, but around the office 99% of the people are interested in it because it saves us from many headaches. I don't think I ever want to work on a project without some kind of version control.
When dealing with large teams of developers, using CVS can be a real headache. One alternative would be BitKeeper, which solves most (if not all) of CVS's problems. It was written by the guys who did Sun's TeamWare source management system.
The Mozilla project is doing well with CVS and quite a large number of developers. Also, CVS pserver mode is rather unsafe, but it works really well over SSH. I made a small Perl script that serves as the login shell for accounts that should only do CVS: it checks whether the user is going to run CVS and shows a message if not. Also, there is no need to create another instance of Oracle for this purpose. The same development instance can be used for all developers. Database changes happen less frequently than code changes do. Where I work, the process of creating a separate development space has been packaged into an RPM, or you can use whatever packaging system you prefer. There is no need for a different IP address; just use a different port. All development areas can reside on the same machine.
If you have a very clear publishing objective, specs that never change,
and one very smart developer, you don't need version control. If you
have evolving objectives, changing specifications, and multiple
contributors, you need version control.
The Solution
Let's go through these item by item.
Item 1: Three Web Servers
Suppose that your overall objective is to serve a Web service accessible
at "foobar.com". You need a production server, rooted at /web/foobar
(Server 1). You don't want your programmers making changes on
the live production site. That's sort of the whole point of this
document. So you need a development server, rooted at /web/foobar-dev/
(Server 2). You might think that this is enough. When everyone
is happy with the dev server, have a code freeze, test a bit, then copy
the dev code over to the production directory and restart.
Item 2: two Oracle users/tablespaces
Suppose that you have a working production site. You could connect your
/web/foobar-dev/ to the production Oracle user. After all, Oracle's
raison d'être is concurrency control. It will be happy to run
eight simultaneous connections to your production site plus two or three
to the development server. The fly in this ointment is that one of your
developers might get a little sloppy and write a program that sends
drop table users rather than drop table
users_experimental_extra_table to the database.
Shouldn't we have three Oracle users? One for dev, one for staging, one
for production? No. It usually isn't worth it. Adding a column to a
relational database table seldom breaks queries. Until Oracle 8.1.5,
you weren't able to drop a column. And anyway the radical data model
changes tend to take place when a site has yet to be launched.
Item 3: one Concurrent Versions System (CVS) root
The Concurrent Versions System (CVS) is a powerful file system-based tool that
can do the following things:
CVS is free and open-source.
Item 4: Two Trained CVS Users
Don't plan to teach all of your contributors the arcana of CVS. The
ones who use GNU Emacs will need to learn to type C-x C-q and C-c C-c to
contribute change comments. But the contributors who use primitive
tools (FTP, HTTP PUT, vi) can remain blissfully unaware of the fact that
CVS is in use.
cvs remove is
warranted (the files don't really go away; they go into an "attic").
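The nightly cron job described in this article could be sketched along the following lines. The deleted-file check reads the CVS/Entries bookkeeping files that every working copy contains and looks for entries whose file is gone from disk. The function names are invented for illustration, and how the Release Master gets notified (email, pager) is left out.

```python
import datetime
import os
from typing import List


def missing_tracked_files(root: str) -> List[str]:
    """List files recorded in CVS/Entries under root that no longer exist on disk."""
    missing = []
    for dirpath, dirnames, _ in os.walk(root):
        entries = os.path.join(dirpath, "CVS", "Entries")
        if "CVS" in dirnames:
            dirnames.remove("CVS")  # do not descend into CVS bookkeeping directories
        if not os.path.isfile(entries):
            continue
        with open(entries) as fh:
            for line in fh:
                fields = line.rstrip("\n").split("/")
                # File entries look like /name/revision/timestamp/options/tag;
                # directory entries start with "D" and are skipped here.
                if len(fields) < 3 or fields[0] != "":
                    continue
                name, revision = fields[1], fields[2]
                if revision.startswith("-"):
                    continue  # already scheduled for removal with cvs remove
                path = os.path.join(dirpath, name)
                if not os.path.exists(path):
                    missing.append(path)
    return sorted(missing)


def nightly_commit_command(today: datetime.date) -> List[str]:
    """Build the check-in command using the article's change-comment format."""
    return ["cvs", "commit", "-m", "nightly check-in " + today.isoformat()]


if __name__ == "__main__":
    for path in missing_tracked_files("/web/foobar-dev"):
        print("deleted from dev server, tell the Release Master:", path)
    print(nightly_commit_command(datetime.date.today()))
```

One limitation of this sketch: cvs commit only checks in files that CVS already knows about, so brand-new files would need a cvs add before the commit picks them up.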
If there are significant data model changes, do this in the middle of
the night and consider bringing up a "comebacklater" server for a few
minutes!
Exactly which directories do we control?
A programmer's intuitions about which directories to control will
generally be 180-degrees off. For example, a programmer might think
that it isn't worth controlling graphics files. After all, CVS can't
really do much with these besides compare them byte by byte and tag them
with dates.
The bottom line is that it would be nice to just say "all of
/web/foobar-dev" but we can't do this unless we're careful with the
auxconfigdir (/parameters) and make sure to keep user-uploaded files out
of the /web/foobar/ directory.
Do you need a farm of big fancy servers to implement this?
How big and how many computers do you need to adopt the procedures
described in this document? Three Web servers, two Oracle users, the
CVS package, ... Sounds complicated. Actually you can run it all on a
$2000 Linux box.
cvs update to revert the production
site to the most recent approved version.
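As a rough sketch of that safety net, the command below throws away stray edits in the production directory and returns it to the tip of the launch branch. It assumes a CVS release that supports the -C option of cvs update and that "most recent approved version" means the tip of the release branch; the function name is hypothetical.

```python
import shlex
from typing import List

PRODUCTION_DIR = "/web/foobar"  # production server root from the article


def revert_command(branch: str) -> List[str]:
    """Build a cvs update that discards local edits and returns to the branch tip.

    -C overwrites locally modified files with clean copies from the repository;
    -d and -P bring in new directories and prune empty ones.
    """
    return ["cvs", "update", "-C", "-d", "-P", "-r", branch]


if __name__ == "__main__":
    print("cd " + PRODUCTION_DIR + " && " + shlex.join(revert_command("199909Launch")))
```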
Why not one development area per developer?
Classically, CVS is used by C developers and each C programmer works
from his or her own directory. This makes sense because there is no
persistence in the C world. You compile your code, run a binary that
builds data structures in RAM and when the program terminates it doesn't
leave anything behind (except maybe a core file). Checking out a CVS
tree and working on it isn't a big deal.
Good Things About This System
To end this article on a positive note, let's summarize the good things
about this system:
asj-editors@arsdigita.com
Reader's Comments
-- Ken Mayer, July 23, 1999
Your proposed once-per-day automatic check-in of everything is a nice idea for a group such as your ArsDigita company with its fairly non-standard
mission statement.
I'm sure some of you have experienced mid-level managers who were too dumb to even figure out how to do this, but I have never been that unlucky ;-)
-- Kristian Sørensen, July 24, 1999
Regarding putting the stuff in /parameters - the .ini files - under CVS, and requiring different .ini files for your three servers: this is a darn good reason to use Tcl configuration files in AOLserver 3.0 instead of .ini files. Then the config file can use Tcl to determine whether it's a production, dev, or staging server (based on an environment variable, or the server home, etc.), and use the appropriate config values.
-- Rob Mayoff, February 26, 2000
Although my company does not use CVS, we have used Microsoft's Visual SourceSafe and Intersolv's PVCS Version Manager. Both were a pain to set up and to get people to use. All complaints usually go away after the first time that version control saves your day after some screwup.
-- Pedro Vera-Perez, March 14, 2000
-- Petru Paler, April 18, 2000
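#! /usr/bin/perl -Tw
# Restricted login shell: the account may run "cvs server" and nothing else.
use strict;
# Taint mode insists on a clean environment before exec.
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};
$ENV{PATH}='/usr/bin:/bin:/usr/sbin:/sbin';
# sshd invokes the login shell as: <shell> -c 'cvs server'
if(!defined $ARGV[1] || $ARGV[1] ne 'cvs server') {
    print STDERR "This account can only be used for CVS access\n";
    exit(1);
}
exec("cvs server") or die "cannot exec cvs: $!\n";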
-- Pierre Phaneuf, November 6, 2000
The whole purpose of having a source control system is so
multiple developers can work on the same set of source files.
Suggesting only 2 persons need to know CVS while others still
check stuff in with FTP shows that the author does not appreciate
the full potential of CVS or a similar source control system.
-- jay teo, March 9, 2001
I have to disagree with your comments on having one development area per developer. I think it's mandatory, and doing without it would defeat the whole purpose of CVS. Why would you have two developers working in the same directory at the same time? The file system doesn't allow for two users to edit the same file at the same time.
-- Thai Nguyen, February 14, 2002