.. -*- Mode: rst -*-

=========================================
How to Use CVS Repository on cafed server
=========================================

:Authors: Giulio Bottazzi
:Contact:
:Date: 16 May 2014
:Revision: 0.8.4
:Copyright: GPL

.. contents::

..
    1 Introduction
    2 What does a project contain
    3 Starting a new project
    4 Working on a project
      4.1 The structure of the project
      4.2 Tagging not copying
      4.3 Project status and file information
      4.4 Removing cruft
      4.5 Handling large data files
    5 Conclusions

Introduction
============

Over the past years I have repeatedly found myself needing to explain
how to use the CVS repository on our server. Since this kind of
information has taken, with the passing of time, the character of an
oral tradition, I thought it worthwhile to remove it from the domain of
tacit knowledge and start, at least to some extent, to codify it. The
instructions that follow are the result of this effort. Notice that
they are both a very short introduction to the CVS suite of programs
and a description of the "standard" followed by our group in
structuring and managing collaborative projects.

These instructions, however, only represent vague and incomplete
guidelines. In any case, you are strongly advised both to consult some
CVS guide on the Net and to discuss the details of the implementation
of a new project with your coauthors.

What does a project contain
===========================

The essential idea guiding the management of a project via CVS (or any
other collaboration system) is that the project contains all and only
the SOURCE FILES which are needed to derive the final objects.
This essentially means two things:

- any file that can be derived from some other file should NOT be
  INCLUDED in the repository

- any file which is needed to derive the final components of the
  project should be INCLUDED in the repository

For instance, for a standard paper you typically

- include the LaTeX file 'project.tex' and possibly a 'project.bib',
  but omit any .dvi, .ps or .pdf file

- if pictures are included in the document, and are for instance
  generated by gnuplot, include a gnuplot script file 'plots.gp' from
  which the plots can be generated using the command::

      gnuplot plots.gp

  but omit any .eps, .pdf or .gif file that derives from the execution
  of this script

- if data are necessary to produce pictures or tables, include them in
  the repository. In this case it is useful to insert a README file
  that describes the source of these data. Also document the
  manipulations and transformations the data went through in order to
  reach their present status

- if programs or scripts are necessary to manipulate data in order to
  obtain plots or tables, insert the source code of these
  programs/scripts and detail their use and purpose in the README file

- if the same data are required in different formats, maybe because
  different software packages are used, include if possible the data
  in a single format together with the scripts necessary to obtain the
  other formats. Explain how to use these scripts in the README file

- it is a good idea to have a ChangeLog file in which major
  modifications are recorded. This serves as the "memory" of the
  project and tracks the inclusion (or exclusion) of files in the
  repository. Note that in (X)Emacs a ChangeLog file is easily created
  and updated using the command ``Alt-X add-change-log-entry``

The whole point of this approach can be summarized in this idea: every
participant in the project MUST be able to reproduce and modify any
part of the project at any level.
Help from the original author of a given part should not be necessary.
In other words, at any moment, one MUST be able to restart the project
from the very beginning.

Starting a new project
======================

First of all, in order to operate on a remote CVS repository you have
to specify in your commands which repository to use. You can do it in
two ways. You can set the 'CVSROOT' environment variable using::

    export CVSROOT=user@machinename:repopath

where 'user' is a user with access to the repository, 'machinename' is
the name of the server and 'repopath' is the path of the repository in
the directory tree. In this way any following 'cvs' command will use
that repository. Alternatively, you can avoid setting a variable by
adding the option::

    -d user@machinename:repopath

at each invocation of the cvs command.

At this point you need to create a new directory::

    mkdir newdir

move all the necessary files into that directory and, if needed,
create and fill subdirectories. Then move inside the root directory of
the project::

    cd newdir

and issue the initial import command::

    cvs import -m "Imported sources" newdir AAA aaa

or, if the 'CVSROOT' variable was not set::

    cvs -d user@machinename:repopath import -m "Imported sources" newdir AAA aaa

where 'AAA' is a vendor tag and 'aaa' is a release tag. In this way the
name of the project will be 'newdir'. The vendor and release tags are
labels used to mark the progress of the project. Initially you can set
the vendor tag to some name related to the project and the release tag
to 'initial' or just '1'.

Working on a project
====================

In order to retrieve the latest version of a project use::

    cvs checkout projname

or, if you did not set the 'CVSROOT' variable::

    cvs -d user@machinename:repopath checkout projname

In this way a new directory named 'projname' will be created and
filled with all the necessary files. You can then start editing or
modifying the files.
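
The import and checkout steps described above can be rehearsed with a
small dry-run script. This is only a sketch under stated assumptions:
the repository address and the project name are the placeholders used
in the text, the ``run`` helper and the ``DRY_RUN`` variable are
hypothetical names introduced here, and the README and ChangeLog
contents are purely illustrative. With the default ``DRY_RUN=1`` the
cvs commands are printed, not executed.

```shell
#!/bin/sh
# Dry-run sketch of the import/checkout sequence described above.
# REPO and PROJECT are the placeholders used in the text, not real values.
set -eu

REPO="user@machinename:repopath"
PROJECT="newdir"
DRY_RUN="${DRY_RUN:-1}"   # 1: only print the cvs commands, 0: execute them

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Work in a scratch area so that no existing file is touched.
SCRATCH="$(mktemp -d)"
mkdir "$SCRATCH/$PROJECT" "$SCRATCH/checkout"

# Collect the source files in the new project directory.
cd "$SCRATCH/$PROJECT"
echo "Describe here how to rebuild the project." > README
: > ChangeLog

# Import from inside the project root, then check out a fresh working
# copy in a different directory.
run cvs -d "$REPO" import -m "Imported sources" "$PROJECT" AAA initial
cd "$SCRATCH/checkout"
run cvs -d "$REPO" checkout "$PROJECT"
```

Checking out into a separate directory avoids mixing the freshly
imported, unversioned files with the working copy managed by CVS. To
execute the commands for real, replace the placeholders with the values
valid for your repository and run the script with ``DRY_RUN=0``.
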
If you need to add a new file, first create it locally and then use::

    cvs add filename

The `add` command can also be used to add sub-directories to the
project. To remove a file, first remove the local copy and then use::

    cvs delete filename

None of the modifications, additions or removals of files will take
effect in the repository until you issue a 'commit' command like
this::

    cvs commit -m "short description string"

Always add a short description with the option '-m', or the cvs
program will ask for one, automatically starting an editor for you.

Removing directories is slightly different. First, it is necessary to
remove all the files in the directory, as described above, and leave it
empty. Then use::

    cvs update -P

The option -P automatically removes empty directories from your local
copy.

Before committing your modifications, always remember to check that
they are consistent. If you are modifying programs, check that they
can be compiled and run as expected. If you are modifying a document,
run the spell checker and remove typos. In general there is no need to
commit changes in the middle of a modification. Just do it at the end
of your work. Also, before committing, remember to write down your
modification in the 'ChangeLog' file (with a date and the name of the
author) and modify, if needed, the README file (see next section).

Later, if you need to work again on the project, you do not need to
check it out entirely. Just move into the project's directory and do::

    cvs update

Before starting to work on a project it is always a good idea to give
an update command, just in case somebody else modified something.

The `cvs` command has a lot of different options, so I suggest checking
its man page::

    man cvs

One thing which can be useful is the possibility of inspecting the
differences between the local version and the version in the
repository. The command::

    cvs diff

compares all the files.
If you want to restrict the comparison to some specific file, just add
its name after 'diff'::

    cvs diff name_of_file

The structure of the project
----------------------------

In general, given the difficulties in managing subdirectories in CVS,
it is always better to keep their number to a minimum. A simple
project, be it a document or a program, should be contained in one
single root directory. Two files must always be present, namely
'README' and 'ChangeLog'.

The file 'README' contains a description of the project. It can be as
simple as a list of commands to issue in order to have the project
properly set up, or as complicated as a reference manual. In any case
this description should be clear and kept up to date with the project
itself.

The file 'ChangeLog' keeps track of all modifications. Each time a
file of the project is modified, the name of the file, the date of
modification and the name of the person who modified it should be
recorded. If you use (X)Emacs, you can simply record modifications in
the change log with the command 'Alt-X add-change-log-entry'.

If the project needs figures and plots, it is a good idea to store
them in a specific sub-directory, appropriately named. When possible,
it is better if these figures are generated from scratch using some
scripts. If the program is 'gnuplot', then put all the necessary
commands in the file 'plots.gp' so that the figures are generated
with::

    gnuplot plots.gp

For more complex projects, the directory structure depends on the
project itself. In the case of documents, for instance LaTeX
documents, the root or main directory could contain two
subdirectories: one named 'data' and one named 'figures'::

                          contains:

           /--->figures   figures, fig files
          /
      main                TeX files, scripts
          \
           \--->data      empirical data, simulations results

The source code of the document, for instance the LaTeX '.tex' and
'.bib' files, together with all the scripts and necessary programs,
resides in the main directory.
The directory 'data' contains the empirical data or the results of
simulations. The directory 'figures' contains all the figures, both
the ones generated by scripts, like 'plots.gp', and the ones obtained
from other sources.

In general, for the preparation of documents, adding more directories
is not required. Conversely, for complex software programs or
collections of programs, multiple directories are often necessary.

Project status and file information
-----------------------------------

The current version of a specific file can be found using::

    cvs status -v filename

where the verbose option `-v`, placed after the `status` command, is
needed to retrieve information on available tags. Historical
information on all the modifications a given file underwent can be
obtained with the command::

    cvs log filename

This command lists all the revisions of the file, together with the
date at which each revision was committed, the author of the revision
and the message string accompanying the commit.

Working with revisions
----------------------

Each time a modified version of a file is committed, a new revision is
created. As said above, revisions can be listed with the ``log``
command. If a specific revision of a file is required, simply use the
``-r`` option. For instance, to compare the current version of the file
``filename`` with its fifth revision use::

    cvs diff -r 1.5 filename

To retrieve the same revision and put it in your local directory use::

    cvs update -r 1.5 filename

In this way the version of the file in your local directory is
identical to the fifth revision of the file. By using this simple
option it is possible to navigate along the entire file history,
moving forward or backward. However, when a file is updated to a
specific revision, a *sticky tag* is created which prevents a normal
update or commit of the same file.
You can check the existence of a sticky tag for a file using the
command::

    cvs status filename

If the ``Sticky Tag`` field is set to ``(none)``, the file is free
from sticky tags and can be normally updated (and its modifications
committed). In order to release the lock created by the use of the
``-r`` option, use the option ``-A``::

    cvs update -A filename

In order to revert to an old version, you have to retrieve the old
file, copy it to a new temporary file, remove the lock, copy the file
back and commit the new version. Assume you want to move the file
``filename`` back to revision 1.5. Do the following::

    cvs update -r 1.5 filename
    mv filename filename_old
    cvs update -A filename
    mv filename_old filename
    cvs commit -m "reverting to version 1.5" filename

Tagging not copying
-------------------

A project that involves distributed efforts from different people is
likely to evolve a lot during its history. For example a paper can be
sent to different journals, a presentation prepared for different
audiences and a software program can undergo several releases. The
point is that as long as these different versions can be considered
different steps in the evolution of the same project, there is no
reason to multiply the number of source files.

In order to keep track of which precise version was sent to a given
journal, or was included in a given release, the use of "tags" turns
out to be very effective. When a project was created (see `Starting a
new project`_) we assigned to it an initial vendor tag "AAA" and
release tag "aaa". Now at a given point in time, for instance before
sending the paper to a journal, you can "tag" all the files in the
project with a symbolic label using the command::

    cvs tag tagname .

where the dot stands for all the files in the present directory and
its sub-directories.
Later, if the need arises to recover exactly that version, you can
obtain it from the repository using the option '-r' with the checkout
command, as in::

    cvs checkout -r tagname projectname

Of course, you may need to tag a single file. In this case, just use
its name instead of the dot `.` as in::

    cvs tag tagname filename

Notice that the list of available tags for a specific file can be
retrieved using::

    cvs log filename

and looking at the list of 'symbolic names' at the beginning of the
output. If the need arises to remove a tag from 'filename', you can
simply remove it with::

    cvs tag -d tagname filename

and you can add a tag to a specific revision number using::

    cvs tag -r revnum tagname filename

where 'revnum' is a revision number, like for instance '1.25'.

Removing cruft
--------------

After a while, your local copy of the project can contain several
unneeded files, for instance auxiliary files created by the (La)TeX
programs, or backup copies of files generated by the editor in use. It
is possible to remove all the files not in the project with a simple
shell command::

    rm `cvs update | gawk '{if($1=="?") print $2 }' `

This deletes every file that CVS reports as unknown (marked with
``?``). Before doing it, please remember to add and commit all your
new files, or files not yet added to the project will be lost!

Handling large data files
-------------------------

Projects sometimes contain large database files. These files are
usually generated once and kept unchanged during the various revisions
of the project. Moreover, it is likely that they are maintained and
prepared independently from the actual project and stored somewhere
else. On the one hand, their size makes them unsuitable for direct
insertion in the CVS repository. On the other hand, since one wants to
be sure to work on the latest version of the data files, some
mechanism should be put in place to keep track of the database
version.
These two requirements can be fulfilled in the following way:

- do not put the files in the repository. Instead describe their
  original source (where they come from) and how to generate them in
  the appropriate README file. Please remember that the generation
  procedure should also provide the names for the final files.

- once the files have been generated, compute their checksum
  signature and put it in a file::

      md5sum namefiles >> data.checksum

- copy the file `data.checksum` in the `data` sub-directory of the CVS
  project. Add it to the repository::

      cvs add data/data.checksum

Now any user who is able to obtain the data files can copy them in the
`data` folder, and check whether the obtained files are correct using
the command::

    md5sum -c data.checksum

in the `data` directory. Of course, if for some reason the data files
have to be modified, the information in `data.checksum` should be
changed accordingly. In particular, remember that decompressing a file
and compressing it again with tools like `gzip` or `bzip2` does in
general change its checksum. In this case it is probably better to
store the checksum of the uncompressed file in `data.checksum` and
perform the check on it.

To avoid the removal of data files by the cruft removal command in
`Removing cruft`_, you have to tell CVS that these files should be
ignored. Simply add their names to the `.cvsignore` file of the
directory that contains them::

    echo "namefiles" >> data/.cvsignore

and add `.cvsignore` to the repository.

Conclusions
===========

I know that young researchers are typically impatient and feel that
the time spent organizing the sources, and describing the process that
transforms these sources into the final objects, is wasted. I might
reply that I can't think of a time better spent. This is a Universal
Truth, but it is possible that many will disagree with me.
In any case, on our server we will implement a very effective method
to protect you from yourself: when a project is found that contains
unnecessary files, or lacks necessary files, it will simply be removed
from the repository and the last contributor will be asked to fix the
issue.