Importing files from a CMS library into Git.

1. Some background

CMS (Code Management System) is a non-distributed Version Control System (VCS) developed and maintained at DEC/Compaq/HP/VSI as part of the DECset collection of tools. It runs on OpenVMS VAX, Alpha, I64 and x86. A VCS is also known as a revision control system or source code management system. To name a few well-known ones: CVS (Concurrent Versions System), SVN (Subversion), Mercurial and Git.

The base objects in a CMS library are elements. An element consists of all versions of a (source) file. Because OpenVMS already has file versions, the versions of an element are called generations. A generation reflects a state of development of the (source) file. On the main line of development, generations are numbered, starting with 1 and increasing. A generation on a side line is called a variant. A variant generation is identified by the main-line generation it is derived from, a single letter chosen at reservation time, and a number, again starting with 1, that is assigned automatically at check-in (replacement) time. For example, an element FOO.C in a library may have generations 1, 2 and 3. If generation 3 exists and generation 2 needs a change, one creates a variant of generation 2, which opens a side line. It may, for example, be reserved as a variant with the letter T. When the modified file is put back into CMS with the REPLACE command, CMS creates generation 2T1 of FOO.C.
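The generation naming scheme described above can be illustrated with a small parser. This is a sketch, assuming identifiers of exactly the form shown here: a main-line number, optionally followed by one uppercase variant letter and a variant number. Variants of variants are not handled, and the name parse_generation is hypothetical, not part of CMS or Git.

```python
import re

# Assumed form: main-line number, optionally followed by one variant letter
# and a variant number, e.g. "3" or "2T1". Nested variants are not handled.
GENERATION = re.compile(r"(\d+)(?:([A-Z])(\d+))?")

def parse_generation(text):
    """Split a CMS generation such as '2T1' into (2, 'T', 1)."""
    match = GENERATION.fullmatch(text)
    if match is None:
        raise ValueError(f"not a simple CMS generation: {text!r}")
    main, letter, number = match.groups()
    # Main-line generations have no letter and no variant number.
    return int(main), letter, int(number) if number else None
```

For FOO.C this yields (2, 'T', 1) for the variant 2T1 and (3, None, None) for main-line generation 3.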

Since variant letters only range from 'A' to 'Z', it is very likely that in big projects the same letters are reused. That is, variant A of one element may have no relationship to variant A of another element. Also, judging from existing CMS libraries, some projects seem to have had no guidelines on how to use variant letters: the same letter may have been used for different purposes.

The other important object in a CMS library is a class. It describes a development or project state of the software. A class is defined as a set of particular generations of elements. Not all elements of the library need to be in a class, and a class contains at most one generation of any element; that generation can be any generation, including a variant.

In Git, the base objects are files as well. However, the other important "object" is the current collection of files known to Git, which describes a development or project state of the software. This "object" essentially is a snapshot of all files known to Git. Files do not have a version or generation identifier attached; a version of a file is defined by the snapshot to which it belongs. So the main development line is a sequence of snapshots. A side line is a branch, which can be created from any snapshot and can have a user-defined, descriptive name. A branch again is a sequence of snapshots; in CMS terms, it very likely contains at least one variant. Git snapshots can be tagged, which simply means they can be given names. The tagged snapshots are a subset of all snapshots, and a tag, that is a named snapshot, can be compared to a CMS class. The big difference is that a CMS class can be defined at any time, independently of replacing CMS elements (files), whereas a replacement in CMS corresponds to a commit in Git, which creates a snapshot. As defining the members of a class is not a single CMS command, and as the generation of an element in a class can be changed at any time as well, mapping a CMS class to a Git tag/snapshot is not straightforward.
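The difficulty of mapping a class to a snapshot can be shown with a toy model. It assumes, for illustration only, that the main line is a list of (element, generation) pairs in commit order, where each commit adds one new generation, and that a class is a dictionary from element to generation. The hypothetical function find_snapshot looks for a main-line snapshot whose files match a class exactly. A class that contains a variant, or that combines generations which never coexisted on the main line, has no such snapshot.

```python
# Toy model: the main line is a list of (element, generation) pairs in commit
# order, and every commit adds exactly one new generation of one element.
EXAMPLE_MAIN_LINE = [
    ("BAR.C", "1"), ("FOO.C", "1"), ("MAIN.C", "1"),
    ("BAR.C", "2"), ("MAIN.C", "2"), ("BAR.C", "3"),
    ("BAR.C", "4"), ("BAR.C", "5"), ("FOO.C", "2"),
]

def find_snapshot(main_line, cms_class):
    """Return the index of the first main-line commit whose snapshot matches
    the class (a dict element -> generation) exactly, or None."""
    snapshot = {}
    for index, (element, generation) in enumerate(main_line):
        snapshot[element] = generation   # state of all files after this commit
        if snapshot == cms_class:
            return index
    return None
```

With the example library used in the next section, the classes V1.0, V2.0 and V2.1 correspond to main-line snapshots, while ECO 1.1, which contains BAR.C(1A1), does not.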

2. How imports can be done

A possible import scheme, shown by example. The CMS library view:

BAR.C(1)     FOO.C(1)     MAIN.C(1)
 |            |            |
-&------------&------------&--------- Class V1.0
 |\           |            |
BAR.C(2)      |           MAIN.C(2)
 |  |         |            |
BAR.C(3)      |            |
 |  |         |            |
-&------------&------------&--------- Class V2.0
 |  |         |            |
 | BAR.C(1A1) |            |
 |  |         |            |
----&---------&------------&--------- Class ECO 1.1
 |  |         |            |
 | /          |            |
BAR.C(4)      |            |
 |            |            |
-&------------&------------&--------- Class V2.1
 |            |            |
BAR.C(5)     FOO.C(2)      |
 |            |            |

In this example, the variant BAR.C(1A1), derived from BAR.C(1), was created after BAR.C(2) and was later merged into BAR.C(4). The diagram indicates that variant 1A1 was created after Class V2.0 was created and its members were defined.

Ideally, classes should be converted to tags, variants should be in branches and classes with variants should be tagged in branches as well.

That is, the result of an ideal import would look like this (the CMS generations are added to identify the elements):

master
 |
BAR.C(1) FOO.C(1) MAIN.C(1)
 |
 +<--- tag V1.0
 |`---------------------------+<--- branch ECO 1.1
 |                            |
BAR.C(2) MAIN.C(2) FOO.C(1)   |
 |                           BAR.C(1A1) FOO.C(1) MAIN.C(1) 
BAR.C(3) MAIN.C(2) FOO.C(1)   |
 |                            +<--- tag ECO 1.1
 +<--- tag V2.0
 |
BAR.C(4) MAIN.C(2) FOO.C(1)
 |
 +<--- tag V2.1
 |
BAR.C(5) MAIN.C(2) FOO.C(2)
 |



However, this is impossible due to the design of CMS. An INSERT GENERATION <element> <class> command can insert any generation of an element into a class. There is no guarantee that all the elements of a class were checked in with the same command, at the same time or with the same remark, and therefore these elements cannot be grouped into one Git commit. In this small, artificial example they happen to fit, but that is not the case in general.

Here, to keep it simple, each file generation in the main development line of a CMS library is added to the main development line of a Git repository, the "master". CMS variants are not part of the main development line, so they are not included in the "master"; they can only appear in Git branches. For each CMS class, a Git branch is created at the root of the "master", and each element of the class is added to this branch. A class contains only one generation of a file, which can be a variant. Unfortunately, this duplicates files in the repository. Also, if a file has more than one variant generation, only the last one ends up in the Git repository. In the example above, BAR.C(1A1) becomes BAR.C in the branch ECO 1.1, which also contains generation 1 of FOO.C and MAIN.C.
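The import scheme can be summarized in a few lines. The following is a sketch of the mapping only, not the code of the perl script mentioned below; the record layouts and the name plan_import are assumptions. Main-line generations are recognized here as purely numeric generation identifiers.

```python
def plan_import(history, classes):
    """Sketch of the import scheme described above.

    history: chronological main-line records (element, generation, remark)
    classes: class name -> {element: generation}
    Returns the commits for master and, per class, the files committed on a
    branch that starts at the root of master.
    """
    # Every main-line generation becomes its own commit on master; variant
    # generations (containing a letter, e.g. "1A1") are skipped.
    master = [(element, generation, remark)
              for element, generation, remark in history
              if generation.isdigit()]
    # Each class becomes one branch holding exactly one generation per
    # member element, which may be a variant.
    branches = {name: sorted(members.items())
                for name, members in classes.items()}
    return master, branches
```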

Tagging the current state of each branch with the associated CMS class name should also be done; this is not yet implemented in the perl script mentioned below.

With this scheme, the example CMS library is imported into Git as follows (the CMS generations are added to identify the elements):

master
 |`-----------&------------&------------&------------&
 |           V1.0         V2.0       ECO 1.1        V2.1
BAR.C(1)      |            |            |            |
FOO.C(1)      |            |            |            |
MAIN.C(1)     |            |            |            |
 |           BAR.C(1)      |            |            |
 |           FOO.C(1)      |            |            |
 |           MAIN.C(1)     |            |            |
BAR.C(2)                   |            |            |
MAIN.C(2)                  |            |            |
BAR.C(3)                   |            |            |
 |                        BAR.C(3)      |            |
 |                        FOO.C(1)      |            |
 |                        MAIN.C(2)     |            |
 |                                     BAR.C(1A1)    |
 |                                     FOO.C(1)      |
 |                                     MAIN.C(1)     |
BAR.C(4)                                             |
 |                                                  BAR.C(4)
 |                                                  FOO.C(1)
 |                                                  MAIN.C(2)
BAR.C(5)
FOO.C(2)

3. What's available for demonstration?

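There are tools to import from a CMS library into a Git repository. An import can be done for all elements from
  - the history of the CMS library (the full main line),
  - the latest generations in the CMS library,
  - a CMS class.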

The third import type can be combined with one of the first two types. That is, one can import a class into an existing, imported Git repository.

The first import type fails if CMS elements were renamed or deleted: elements named in the CMS history cannot be found in the current CMS library.

For demonstration, two tools are available to do the import. One tool, a perl script, runs on the system with the Git repository. The other tool is an HTTP server running on the VMS system with the CMS library. The perl script sends CMS commands to the HTTP server. The server executes the CMS commands in its environment and sends the resulting files/content back to the perl script. The perl script writes the received content into files and adds them to the Git repository.
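The client side of this exchange might be sketched as follows. The actual request format between git-cmsimport.pl and the HTTP server is not described here, so the fetch argument is a hypothetical stand-in for "ask the server to fetch this generation and return its content", and import_generation is an illustrative name. The lowercase flag mirrors the -l option; only the git add invocation is a real Git command.

```python
import subprocess
from pathlib import Path
from typing import Callable

# Hypothetical stand-in for the HTTP request to the VMS server; it takes an
# element name and a generation and returns the fetched file content.
Fetch = Callable[[str, str], bytes]

def import_generation(fetch: Fetch, repo: Path, element: str, generation: str,
                      lowercase: bool = False, git_add: bool = True) -> Path:
    """Fetch one generation via the server and stage it in the Git repo."""
    content = fetch(element, generation)     # server runs the CMS FETCH
    name = element.lower() if lowercase else element
    path = repo / name
    path.write_bytes(content)                # write the received content
    if git_add:
        subprocess.run(["git", "add", "--", name], cwd=repo, check=True)
    return path
```

A commit with the CMS remark as message would follow each call; passing git_add=False allows testing the file handling without a Git repository.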

USAGE: git-cmsimport.pl [OPTION]... LIBRARY URL
Import the latest generation of all elements from the CMS LIBRARY
(in VMS syntax) located by the URL. Creates a git repository and
adds the retrieved CMS elements to the master.

IMPORT MODE OPTIONS
  -c CLASS   Import all the elements of the CMS class CLASS, which are added
             to a git branch CLASS starting at the root of the git master.
             If a git repository for this CMS library already exists and the CLASS
             is not already in the repository, it is added.
  -h         Import the full main line of the CMS LIBRARY according to its history.
             Creates a repository with the elements in the master
OPTIONS
  -f FILE    Use the retrieval information in FILE to retrieve the CMS elements;
             without an IMPORT MODE OPTION or with -c CLASS a list of elements
             is expected, with the IMPORT MODE OPTION -h a history is expected
  -F         Do not import, only save the retrieval information into a file;
             without an IMPORT MODE OPTION or with -c CLASS a list of elements
             is saved into ./cms-elements.txt, with the IMPORT MODE OPTION -h
             the history is saved into ./cms-history.txt.
             The history is filtered for CREATE and REPLACE commands.
  -k         Keep the files on the server side: the client does not send a delete request
             and the server does not delete any fetched file. This can speed up importing
             the files into git but requires manual cleanup at the server side.
  -l         Locally lowercase all VMS names: library, user and elements
  -r REPOSITORY
             Name of the repository to be created; default is the last subdirectory
             in the specified LIBRARY argument.
  -T         Tag all commits with the CMS file and generation.
             This can be useful to map CMS objects to git objects and vice versa.
             In case of importing a CLASS, its name is prepended to the tag.
  -t OFFSET  4 digit time zone offset from UTC (rfc2822)
  -v LEVEL   Verbose, log CMS FETCH commands, ...

Examples:
  ./git-cmsimport.pl -F -h [.cmsdemo] http://eisner.encompasserve.org:8080
  ./git-cmsimport.pl -f cms-history.txt -h -l -t -0600 [.cmsdemo] http://eisner.encompasserve.org:8080
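The -t option presumably supplies the time zone that CMS timestamps lack when they are used as Git commit dates. Assuming CMS reports times in the usual VMS absolute format, e.g. 14-MAR-2012 10:22:33.45, a conversion into an RFC 2822 date, which Git accepts for instance in the GIT_AUTHOR_DATE and GIT_COMMITTER_DATE environment variables, could look like this (the function names are hypothetical):

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

MONTHS = {m: i for i, m in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}

def parse_offset(offset):
    """Turn a 4-digit offset such as '-0600' or '+0100' into a timezone."""
    if len(offset) != 5 or offset[0] not in "+-" or not offset[1:].isdigit():
        raise ValueError(f"bad offset: {offset!r}")
    delta = timedelta(hours=int(offset[1:3]), minutes=int(offset[3:]))
    return timezone(-delta if offset[0] == "-" else delta)

def vms_to_rfc2822(vms_time, offset):
    """Convert e.g. '14-MAR-2012 10:22:33.45' to an RFC 2822 date string."""
    date_part, time_part = vms_time.split()
    day, month, year = date_part.split("-")
    hms = time_part.split(".")[0]             # drop hundredths of a second
    hour, minute, second = (int(x) for x in hms.split(":"))
    dt = datetime(int(year), MONTHS[month.upper()], int(day),
                  hour, minute, second, tzinfo=parse_offset(offset))
    return format_datetime(dt)
```

The result can be placed in GIT_AUTHOR_DATE and GIT_COMMITTER_DATE before running git commit, so that the commit carries the CMS time rather than the import time.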


For the example above, the main line and all classes are imported with:

$ git-cmsimport.pl -h [.cmsdemo] http://eisner.encompasserve.org:8080
$ git-cmsimport.pl -c 'V1.0' [.cmsdemo] http://eisner.encompasserve.org:8080
$ git-cmsimport.pl -c 'V2.0' [.cmsdemo] http://eisner.encompasserve.org:8080
$ git-cmsimport.pl -c 'ECO 1.1' [.cmsdemo] http://eisner.encompasserve.org:8080
$ git-cmsimport.pl -c 'V2.1' [.cmsdemo] http://eisner.encompasserve.org:8080
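Git reference names may not contain spaces and certain other characters (see git check-ref-format), so a class name such as ECO 1.1 cannot be used verbatim as a branch or tag name. How git-cmsimport.pl handles this is not described here. The following is a hypothetical mapping that covers the common cases; the name to_ref_name and the choice of "_" as replacement are assumptions.

```python
import re

# Characters Git forbids in ref names (control characters, space, ~ ^ : ? * [ \),
# plus "/" so that the result is always a single path component.
FORBIDDEN = re.compile(r"[\x00-\x20\x7f~^:?*\[\\/]+")

def to_ref_name(name):
    """Hypothetical mapping of a CMS class name to a valid Git ref name."""
    ref = FORBIDDEN.sub("_", name)
    ref = re.sub(r"\.{2,}", ".", ref)       # ".." is not allowed
    ref = ref.replace("@{", "@_")          # "@{" is not allowed
    ref = ref.strip(".")                   # no leading or trailing dot
    if ref.endswith(".lock"):              # ".lock" suffix is reserved
        ref += "_"
    if ref in ("", "@"):                   # empty name and lone "@" are invalid
        ref = "_"
    return ref
```

With this mapping, ECO 1.1 becomes ECO_1.1, while V1.0, V2.0 and V2.1 stay unchanged.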


The HTTP server should be started from an empty directory. To fetch CMS elements from the CMS library, the server creates files in its default directory. After the content has been sent to the perl script, the file is deleted (unless -k is used). Starting from an empty directory makes it easier to clean up in case of errors.

CMS remarks are used as Git commit messages, exactly as they were formatted by CMS. That is, a commit message may consist of several lines.

As already indicated, the CMS commands (as shown in the history)

  DELETE ELEMENT name
  MODIFY ELEMENT oldname newname

create some problems.

DELETE removes the whole element from the CMS library. This means there was a "CREATE ELEMENT name" entry in the history, possibly followed by some "REPLACE name" entries, and finally this "DELETE ELEMENT name". With the element gone and all traces removed, it cannot be found and cannot be imported at the time the perl script sees and processes the "CREATE ELEMENT name" entry (or any of the subsequent REPLACE entries) in the CMS history. The perl script doesn't look ahead, so it does not know about the deletion. The CMS command is sent, but the script will very likely abort with an error message.

Worse, if the deleted element was later re-created with the very same name, the script may appear to succeed but fetch generations of the new element instead. This is bad, very bad! But there is not much the script can do to avoid this.

The same is true for a "MODIFY ELEMENT oldname newname", which essentially renames an element. Again, any entry in the history referencing "oldname" cannot be imported by this script. And again, if "oldname" is reused, the trouble doubles or worse.

What can be done here is to save the history with -F, check the history entries for such commands and preprocess the history. That is, remove all entries of a deleted element, but only those prior to the actual deletion, and similarly rename all entries of a renamed element, again only those prior to the actual rename. This is not easy to do: for performance reasons, the perl script asks only for specific records of the history, and the DELETE records are not retrieved. So this requires some manual work on both sides. After such preprocessing, an import with -f should work.
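If the DELETE and MODIFY entries have been obtained manually and merged into the history, the preprocessing described above could be sketched as follows. The record layout, tuples of (command, element, ...) with the new name as the third field of a MODIFY record, is an assumption for illustration and does not reflect the format of cms-history.txt; preprocess_history is a hypothetical name. The history is walked from newest to oldest, so that at each entry it is already known whether the element is deleted or renamed later on.

```python
def preprocess_history(records):
    """Drop entries of deleted elements and rename entries of renamed
    elements, in both cases only for entries before the DELETE or MODIFY.

    records: chronological tuples (command, element, *extra), where command
    is CREATE, REPLACE, DELETE or MODIFY; for MODIFY, extra[0] is the new
    name. This record layout is an assumption.
    """
    deleted = set()   # names whose older entries belong to a deleted element
    aliases = {}      # old name -> name the element carries later on
    kept = []
    for record in reversed(records):          # walk from newest to oldest
        command, element, *extra = record
        if command == "DELETE":
            deleted.add(element)
        elif command == "MODIFY":
            new = extra[0]
            target = aliases.pop(new, new)    # final name of the element
            if new in deleted:                # renamed, then deleted later
                deleted.discard(new)
                deleted.add(element)
            else:
                aliases[element] = target
        elif element in deleted:
            if command == "CREATE":           # start of the deleted element
                deleted.discard(element)
        else:
            kept.append((command, aliases.get(element, element), *extra))
            if command == "CREATE":           # older entries: another element
                aliases.pop(element, None)
    kept.reverse()
    return kept
```

DELETE and MODIFY records themselves are dropped, so the result contains only CREATE and REPLACE entries, like the history saved with -F.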