How to Use the CVS Repository on the cafed Server

Authors: Giulio Bottazzi
Contact: <bottazzi@sssup.it>
Date: 12 July 2013
Revision: 0.8.3
Copyright: GPL

Introduction

Over the past years I have repeatedly found myself explaining how to use the CVS repository on our server. Since this information has, with the passing of time, taken on the character of an oral tradition, I thought it worthwhile to remove it from the domain of tacit knowledge and start, at least to some extent, to codify it. The instructions that follow are the result of this effort. Notice that they are both a very short introduction to the CVS suite of programs and a description of the "standard" followed by our group in structuring and managing collaborative projects. These instructions, however, are only incomplete guidelines. In any case, you are strongly advised both to consult a CVS guide on the Net and to discuss the details of the implementation of a new project with your coauthors.

What does a project contain

The essential idea guiding the management of a project via CVS (or any other collaboration system) is that the project contains all and only the SOURCE FILES which are needed to derive the final objects. This essentially means two things:

For instance, for a standard paper you typically

The whole point of this approach can be summarized in one idea: every participant in the project MUST be able to reproduce and modify any part of the project at any level, without needing help from the original author of that part. In other terms, at any moment, one MUST be able to restart the project from the very beginning.

Starting a new project

First of all, in order to operate on a remote CVS repository you have to specify in your commands which repository to use. You can do it in two ways. You can set the 'CVSROOT' environment variable using:

export CVSROOT=user@machinename:repopath

where 'user' is a user with access to the repository, 'machinename' is the name of the server and 'repopath' is the position of the repository in the directory tree. In this way any following 'cvs' command will use that repository. Alternatively, you can avoid setting a variable by adding the option:

-d user@machinename:repopath

at each invocation of the cvs command.
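
As a minimal sketch, the variable could be set once in your shell start-up file. The user name, host name and path below are placeholders, not the actual values for our server. 'CVS_RSH' is the standard CVS variable that selects the remote shell; setting it to 'ssh' is an assumption about how the server is reached.

```shell
# Placeholder values: replace them with your own account, server and path.
CVS_USER="alice"
CVS_HOST="server.example.org"
CVS_REPO="/path/to/repository"

# Repository location used by every following cvs command.
export CVSROOT="${CVS_USER}@${CVS_HOST}:${CVS_REPO}"

# Assumption: the repository is reached through ssh.
export CVS_RSH=ssh

echo "CVSROOT is set to: ${CVSROOT}"
```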

At this point you need to create a new directory:

mkdir newdir

move all the necessary files into that directory and, if needed, create and fill subdirectories. Then move inside the root directory of the project:

cd newdir

and issue the initial importing command:

cvs import -m "Imported sources" newdir AAA aaa

or if the 'CVSROOT' variable was not set:

cvs -d user@machinename:repopath import -m "Imported sources" newdir AAA aaa

where 'AAA' is a vendor tag, and 'aaa' is a release tag. In this way the name of the project will be 'newdir'. The vendor and release tags are labels used to mark the progress of the project. Initially you can set the vendor tag to some name related to the project and the release tag to 'initial' or just 'v1'. Note that CVS requires tag names to start with a letter, so a bare number like '1' is not accepted.
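
If you want to practice before touching the shared repository, the import can be tried on a throw-away local repository created with 'cvs init'. The sketch below is only an illustration: the scratch directory, the vendor tag 'myproject' and the release tag 'initial' are made-up names, and the CVS steps are skipped if the 'cvs' program is not installed.

```shell
# Practice run on a throw-away LOCAL repository (an assumption made for
# illustration; the real repository lives on the server).
SCRATCH=$(mktemp -d)
REPO="$SCRATCH/repo"

# A tiny project with the two files every project should contain.
mkdir "$SCRATCH/newdir"
echo "Short description of the project" > "$SCRATCH/newdir/README"
echo "Initial import" > "$SCRATCH/newdir/ChangeLog"

if command -v cvs >/dev/null 2>&1; then
    # Create the empty scratch repository.
    mkdir "$REPO"
    cvs -d "$REPO" init

    # Import the content of newdir as project "newdir".
    ( cd "$SCRATCH/newdir" &&
      cvs -d "$REPO" import -m "Imported sources" newdir myproject initial )

    # Check the project out somewhere else to verify the import.
    mkdir "$SCRATCH/work"
    ( cd "$SCRATCH/work" && cvs -d "$REPO" checkout newdir )
else
    echo "cvs not found: skipping the import and checkout steps"
fi
```

When you are done, the whole scratch directory can simply be deleted.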

Working on a project

In order to retrieve the latest version of a project use:

cvs checkout projname

or if you did not specify a CVSROOT variable:

cvs -d user@machinename:repopath checkout projname

In this way a new directory named 'projname' will be created and filled with all the necessary files. You can then start editing or modifying the files. If you need to add a new file, first create it locally and then use:

cvs add filename

The add command can also be used to add sub-directories to the project.

To remove a file, first remove the local copy and then use:

cvs delete filename

None of the modifications, additions or removals of files will be recorded in the repository until you issue a 'commit' command like this:

cvs commit -m "short description string"

Always add a short description with the option '-m' or the cvs program will ask for one, automatically starting an editor for you.

Removing a directory is slightly different. First, it is necessary to remove all the files it contains, so that it is left empty. Then use:

cvs update -P

The option -P automatically removes empty directories.

Before committing your modifications, always remember to check that they are consistent. If you are modifying programs, check that they can be compiled and run as expected. If you are modifying a document, run the spell checker and remove typos. In general there is no need to commit changes in the middle of a modification. Just do it at the end of your work. Also, before committing, remember to write down your modifications in the 'ChangeLog' file (with a date and the name of the author) and modify, if needed, the README file (see next section).

Later, if you need to work on the project again, you do not need to check it out entirely. Just move into the project's directory and do:

cvs update

Before starting to work on a project it is always a good idea to run an update command, just in case somebody else has modified something.
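
If you want to see what an update would do without touching any file, the global options '-n' (do not change anything) and '-q' (be quiet) can be combined. Each output line starts with a status letter, such as 'M' for locally modified, 'A' for added, 'R' for removed, 'C' for conflict and '?' for files unknown to CVS. The helper below, 'summarize_update', is a hypothetical name of my own; it is a small sketch that counts the files in each state, demonstrated here on a hand-written sample rather than on real output.

```shell
# Count the files in each state, reading "cvs -nq update" output on stdin.
summarize_update() {
    awk '
        $1 == "M" { modified++ }
        $1 == "A" { added++ }
        $1 == "R" { removed++ }
        $1 == "C" { conflict++ }
        $1 == "?" { unknown++ }
        END {
            printf "modified=%d added=%d removed=%d conflicts=%d unknown=%d\n",
                   modified, added, removed, conflict, unknown
        }'
}

# Typical use inside a working copy:
#   cvs -nq update | summarize_update

# Demonstration on a hand-written sample of such output.
SUMMARY=$(printf '%s\n' 'M paper.tex' 'A figures/plot.fig' \
                        '? paper.aux' '? paper.log' | summarize_update)
echo "$SUMMARY"
```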

The 'cvs' command has a lot of different options, so I suggest checking its man page:

man cvs

One thing which can be useful is the possibility of inspecting the differences between the local version and the version in the repository. The command:

cvs diff

compares all the files. If you want to restrict the comparison to some specific file just add the name after 'diff':

cvs diff name_of_file

The structure of the project

In general, given the difficulties in managing subdirectories in CVS, it is always better to keep their number to a minimum.

A simple project, be it a document or a program, should be contained in one single root directory. Two files must always be present, namely 'README' and 'ChangeLog'.

The file 'README' contains a description of the project. It can be as simple as a list of commands to issue in order to have the project properly set up, or as complicated as a reference manual. In any case this description should be clear and kept up to date with the project itself.

The file 'ChangeLog' keeps track of all modifications. Each time a file of the project is modified, the name of the file, the date of modification and the name of the person who modified it should be recorded. If you use (X)emacs, you can simply record modifications in the change log with the command 'Alt-X add-change-log-entry'.
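
For those who do not use (X)emacs, an entry can also be added from the shell. The sketch below puts a new entry, in the usual GNU ChangeLog layout, at the top of the file. The function name 'add_changelog_entry', the author, the e-mail address and the file name are all placeholders chosen for illustration.

```shell
# Placeholder identity: replace with your own name and address.
AUTHOR="Jane Doe"
EMAIL="jane@example.org"

# Put a new entry at the top of the ChangeLog in the current directory.
add_changelog_entry() {
    # $1 = name of the modified file, $2 = short description
    tmp=$(mktemp)
    {
        printf '%s  %s  <%s>\n\n' "$(date +%Y-%m-%d)" "$AUTHOR" "$EMAIL"
        printf '\t* %s: %s\n\n' "$1" "$2"
        if [ -f ChangeLog ]; then cat ChangeLog; fi
    } > "$tmp"
    # Copy back instead of moving, so the file keeps its permissions.
    cat "$tmp" > ChangeLog
    rm -f "$tmp"
}

# Demonstration in a scratch directory.
DEMO_DIR=$(mktemp -d)
( cd "$DEMO_DIR" &&
  add_changelog_entry paper.tex "Fixed typos in the introduction" &&
  cat ChangeLog )
```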

If the project needs figures and plots, it is a good idea to store them in a specific sub-directory, appropriately named. When possible, it is better if these figures are generated from scratch using some scripts. If the program is 'gnuplot', then put all the necessary commands in the file 'plots.gp' so that the figures are generated with:

gnuplot plots.gp

For more complex projects, the directory structure depends on the project itself. In the case of documents, for instance LaTeX documents, the root or main directory could contain two subdirectories, one named 'data' and one named 'figures':

                   contains:

     /--->figures  figures, fig files
    /
main               TeX files, scripts
    \
     \--->data     empirical data, simulations results

The source code of the document, for instance the LaTeX '.tex' and '.bib' files, together with all the scripts and necessary programs, resides in the main directory. The directory 'data' contains the empirical data or the results of simulations. The directory 'figures' contains all the figures, both the ones generated by scripts, like 'plots.gp', and the ones obtained from other sources. In general, for document preparation, adding more directories is not required. Conversely, for a complex software program or a collection of programs, multiple directories are often necessary.

Project status and file information

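The current version of a specific file can be found using:

cvs status -v filename

where the verbose option -v, placed after the 'status' command, is needed to retrieve information on available tags. (Placed before the command, as in 'cvs -v', it would instead print the version of the CVS program itself.)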

Historical information on all the modifications a given file undertook can be obtained with the command:

cvs log filename

This command lists the revisions of the file, together with the date on which each revision was committed, the author of the revision and the message string accompanying the commit.

Working with revisions

Each time a modified version of a file is committed, a new revision is created. As said above, revisions can be listed with the log command. If a specific revision of a file is required, simply use the -r option. For instance, to compare the current local version of the file filename with its revision 1.5 use:

cvs diff -r 1.5 filename

To retrieve the same revision and put it in your local directory use:

cvs update -r 1.5 filename

In this way the version of the file in your local directory is identical to revision 1.5 of the file. By using this simple option it is possible to navigate along the entire file history, moving forward or backward. However, when a file is updated to a specific revision, a sticky tag is created which forbids a normal update or commit of the same file. You can check the existence of a sticky tag for a file using the command:

cvs status filename

If the file's Sticky Tag is set to (none), the file is free from sticky tags and can be updated normally (and its modifications committed).

In order to remove the sticky tag created by the use of the -r option, use the option -A:
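
To pick out just that piece of information, the output of the status command can be filtered. The helper 'sticky_tag' below is a hypothetical name, and the status text used in the demonstration is a hand-written sample with placeholder paths, not real output from our server.

```shell
# Print the value of the "Sticky Tag:" field from "cvs status" output
# read on standard input.
sticky_tag() {
    awk '/Sticky Tag:/ { print $3 }'
}

# Typical use inside a working copy:
#   cvs status filename | sticky_tag

# Demonstration on a hand-written sample.
SAMPLE_STATUS='File: paper.tex          Status: Up-to-date

   Working revision:    1.5
   Repository revision: 1.7     /path/to/repository/projname/paper.tex,v
   Sticky Tag:          1.5
   Sticky Date:         (none)
   Sticky Options:      (none)'

STICKY=$(printf '%s\n' "$SAMPLE_STATUS" | sticky_tag)
echo "sticky tag: $STICKY"
```

A result different from (none) means that the file has to be released with the -A option before it can be updated or committed normally.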

cvs update -A filename

In order to revert to an old version, you have to retrieve the old file, move it to a temporary file, remove the sticky tag, move the file back and commit the new version. Assume you want to move the file filename back to version 1.5. Do the following:

cvs update -r 1.5 filename
mv filename filename_old
cvs update -A filename
mv filename_old filename
cvs commit -m "reverting to version 1.5" filename
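
As an alternative to the five steps above, the '-p' option of the update command prints the requested revision on standard output and does not set a sticky tag. The function below is a sketch built on that option, not the procedure described above; the name 'revert_to_revision' and the temporary file name are my own choices.

```shell
# Revert a file to an old revision and commit the result.
revert_to_revision() {
    # $1 = revision number (e.g. 1.5), $2 = file name
    rev=$1
    file=$2
    tmp="${file}.rev${rev}"

    # -p sends the old revision to standard output, so the working
    # file gets no sticky tag. Stop if the revision cannot be fetched.
    cvs update -p -r "$rev" "$file" > "$tmp" || { rm -f "$tmp"; return 1; }

    # Replace the working file with the old content and commit it.
    mv "$tmp" "$file"
    cvs commit -m "reverting to version $rev" "$file"
}

# Typical use inside a working copy:
#   revert_to_revision 1.5 filename
```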

Tagging not copying

A project that involves distributed efforts from different people is likely to evolve a lot during its history. For example, a paper can be sent to different journals, a presentation prepared for different audiences and a software program can undergo several releases. The point is that as long as these different versions can be considered different steps in the evolution of the same project, there is no reason to multiply the number of source files. In order to keep track of which precise version was sent to a given journal, or was included in a given release, the use of "tags" turns out to be very effective. When a project was created (see Starting a new project) we assigned to it an initial vendor tag "AAA" and release tag "aaa". Now at a given point in time, for instance before sending the paper to a journal, you can "tag" all the files in the project with a symbolic label using the command:

cvs tag tagname .

where the dot stands for all the files in the present directory and its sub-directories. Later, if the need arises to recover exactly that version, you can obtain it from the repository using the option '-r' with the checkout command, as in:

cvs checkout -r tagname projectname

You may of course need to tag a single file. In this case, just use its name instead of the dot, as in:

cvs tag tagname filename

Notice that the list of available tags for a specific file can be retrieved using:

cvs log filename

and looking at the list of 'symbolic names' at the beginning of the output. If the need arises to remove a tag, you can simply remove it with:

cvs tag -d tagname filename

and you can add a tag to a specific revision number using:

cvs tag -r revnum tagname filename

where 'revnum' is a revision number, like for instance '1.25'.
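
When choosing a tag name, keep in mind that CVS only accepts names that start with a letter and contain letters, digits, '-' and '_' (so no dots or spaces). The sketch below builds a tag name from a label and a date and checks it against this rule. The 'submitted-' prefix, the label 'JournalX' and the two function names are hypothetical conventions, not something our group has agreed upon.

```shell
# Build a tag name such as "submitted-JournalX-2013-07-12".
make_tag() {
    # $1 = label (e.g. a journal acronym), $2 = date, defaults to today
    printf 'submitted-%s-%s\n' "$1" "${2:-$(date +%Y-%m-%d)}"
}

# Succeed only if $1 starts with a letter and contains nothing but
# letters, digits, "-" and "_".
is_valid_tag() {
    case $1 in
        [A-Za-z]*) ;;
        *) return 1 ;;
    esac
    case $1 in
        *[!A-Za-z0-9_-]*) return 1 ;;
    esac
    return 0
}

TAG=$(make_tag JournalX 2013-07-12)
if is_valid_tag "$TAG"; then
    # Print the command instead of running it.
    echo "cvs tag $TAG ."
else
    echo "invalid tag name: $TAG"
fi
```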

Removing cruft

After a while, your local copy of the project can contain several unneeded files, for instance auxiliary files created by the (La)TeX programs, or backup copies of files generated by the editor in use. It is possible to remove all the files not in the project with a simple shell command:

rm `cvs update | gawk '{if($1=="?") print $2 }' `

Before doing it, please remember to add (with 'cvs add') and commit all new files and changes. Any file that CVS does not know about is listed with a '?' and will be deleted!
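
A more cautious variant is sketched below. It uses 'cvs -nq update', so that nothing in the working copy is changed while listing, it copes with file names containing spaces, and it only prints what it would delete unless told otherwise. The function names 'list_unknown' and 'remove_cruft' and the 'delete' argument are my own inventions, and the demonstration uses a hand-written sample.

```shell
# List files unknown to CVS, reading "cvs -nq update" output on stdin.
# Everything after the leading "? " is kept, so names with spaces survive.
list_unknown() {
    sed -n 's/^? //p'
}

# Dry run by default; pass "delete" as first argument to really remove.
remove_cruft() {
    cvs -nq update 2>/dev/null | list_unknown | while IFS= read -r f; do
        if [ "$1" = "delete" ]; then
            rm -f -- "$f"
        else
            echo "would remove: $f"
        fi
    done
}

# Demonstration of the filter on a hand-written sample.
UNKNOWN=$(printf '%s\n' 'M paper.tex' '? paper.aux' '? old draft.bak' |
          list_unknown)
echo "$UNKNOWN"
```

Inside a working copy, run 'remove_cruft' first to inspect the list and 'remove_cruft delete' only when you are satisfied with it.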

Handling large data files

Projects sometimes contain large database files. These files are usually generated once and kept unchanged during the various revisions of the project. Moreover, it is likely that they are maintained and prepared independently from the actual project and stored somewhere else. On the one hand, their size makes them unsuitable to be inserted directly in the CVS repository. On the other hand, since one wants to be sure to work on the latest version of the data files, some mechanism should be put in place to keep track of the database version. These two requirements can be fulfilled in the following way:

  • do not put the files in the repository. Instead describe their original source (where they come from) and how to generate them in the appropriate README file. Please remember that the generation procedure should also provide the names for the final files.

  • once the files have been generated, compute their checksum signature and put it in a file:

    md5sum namefiles >> data.checksum
    
  • copy the file data.checksum in the data sub-directory of the CVS project. Add it to the repository:

    cvs add data/data.checksum
    

Now any user who is able to obtain the data files can copy them into the data folder, and check if the obtained files are correct using the command:

md5sum -c data.checksum

in the data directory.

Of course, if for some reason the data files have to be modified, the information in data.checksum should be changed accordingly. In particular, remember that decompressing a file and compressing it again with tools like gzip or bzip2 does, in general, change the checksum of the compressed file. In this case it is probably better to store the checksum of the uncompressed file in data.checksum and perform the check on it.
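
The check on the uncompressed content can be done without unpacking the file to disk. The following is a minimal sketch that runs in a scratch directory on a small made-up data file; 'sample.csv' and its content are invented for the demonstration.

```shell
# Scratch directory with a small made-up data file.
DATA_DIR=$(mktemp -d)
printf 'year,value\n2012,1.5\n2013,1.7\n' > "$DATA_DIR/sample.csv"

# Record the checksum of the UNCOMPRESSED file.
( cd "$DATA_DIR" && md5sum sample.csv > data.checksum )

# Keep only the compressed copy on disk.
gzip "$DATA_DIR/sample.csv"

# Compare the recorded checksum with the checksum of the content
# obtained by decompressing to standard output.
expected=$(cut -d' ' -f1 "$DATA_DIR/data.checksum")
actual=$(gunzip -c "$DATA_DIR/sample.csv.gz" | md5sum | cut -d' ' -f1)

if [ "$expected" = "$actual" ]; then
    echo "sample.csv: OK"
else
    echo "sample.csv: FAILED"
fi
```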

To avoid the removal of data files by the cruft removal script in Removing cruft, you have to tell CVS that these files should be ignored. Simply add their names to the .cvsignore file of the directory that contains them:

echo "namefiles" >> data/.cvsignore

and add .cvsignore to the repository.

Conclusions

I know that young researchers are typically impatient and feel that the time spent organizing the sources, and describing the process that transforms these sources into the final objects, is wasted. I might reply that I can't think of a time better spent. This is a Universal Truth, but it is possible that many will disagree with me. In any case, on our server we will implement a very effective method to protect you from yourself: when a project is found that contains unnecessary files, or lacks necessary files, it will simply be removed from the repository and the last contributor will be asked to fix the issue.