Posted on Tue, Apr 12, 2011
by Kendrick Burson
In my last post in this series on Source Code Management, I gave you a brief overview of some tools. This is my final article on the topic of SCM, and I’d like to delve more deeply into Subversion (SVN) and some other solutions.
As I mentioned previously, one major drawback to most of the systems discussed so far is the bandwidth issue. When you log in to an AccuRev client, or StarTeam, or others, the start-up time is very long. The client needs to pass entire file versions back and forth across the wire to synchronize your working copy. This is the runtime strategy of the open source CVS (Concurrent Versions System), and one of the main reasons for the creation of SVN. CVS was developed in 1986. It was built on the Unix tool RCS. CVS became ubiquitous in the development community at large and was the main SCM for the public SourceForge site for many years, until Subversion came along.
Another failing of the CVS system was that it did not track changes to the file tree structure, such as renaming or moving a file or folder. In CVS, these actions created entirely new copies of the file with new histories. SVN aimed at tracking every change to your local workspace: renames, moves, additions, deletions, modifications, even changes to metadata like properties of files or folders.
One of the groundbreaking features of SVN was the atomic commit. The atomic commit assures that either all the changes you are attempting to commit are accepted and stored successfully on the central server, or none of them.
On other systems you are able to commit a collection of files; some will get stored and others will be kicked back with merge collisions. AccuRev does this with its auto-merge and ‘trickle down’ of merges when promoting upstream. The problem with this is that the public repository is left in an inconsistent state. Not all the changes required to build your version of the code are stored together, so the project will likely fail to compile until you fix the merge collisions and complete the check-in as intended. I have seen this cause teams to lose several days while waiting for merges to be corrected. Often the changes are rolled back and you are left trying to navigate the meandering path of version diffs. Painful, to say the least.
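To make the all-or-nothing idea concrete, here is a minimal Python sketch of an atomic commit against a toy in-memory repository. The class and method names (ToyRepository, commit, ConflictError) are my own inventions for illustration, not SVN's API, and a real server has far more to worry about (locking, durable storage, network failures). The point is only the two-phase shape: check every change first, then publish them all under one new revision.

```python
class ConflictError(Exception):
    """Raised when a change is based on an out-of-date revision."""


class ToyRepository:
    """In-memory stand-in for a central server (hypothetical, not SVN's API)."""

    def __init__(self):
        self.files = {}      # path -> (content, revision of last change)
        self.revision = 0    # repository-wide revision counter

    def commit(self, changes):
        """Apply every change or none.

        `changes` maps path -> (base_revision, new_content), where
        base_revision is the revision the client last saw for that path
        (0 for a brand-new file).
        """
        # Phase 1: validate everything before touching stored state.
        for path, (base_revision, _content) in changes.items():
            _old, last_changed = self.files.get(path, (None, 0))
            if last_changed != base_revision:
                raise ConflictError(f"{path} is out of date")

        # Phase 2: all checks passed, so publish the whole set at once.
        new_revision = self.revision + 1
        for path, (_base, content) in changes.items():
            self.files[path] = (content, new_revision)
        self.revision = new_revision
        return new_revision


repo = ToyRepository()
first = repo.commit({"a.txt": (0, "one"), "b.txt": (0, "two")})

# A second commit where b.txt is stale: nothing should be stored,
# not even the perfectly valid change to a.txt.
try:
    repo.commit({"a.txt": (1, "one v2"), "b.txt": (0, "stale edit")})
    rejected = False
except ConflictError:
    rejected = True
```

Because validation happens before any write, the rejected commit leaves a.txt untouched, which is exactly what keeps the shared repository buildable.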
A big feature that I liked when SVN first came out was the binary deltas. When using CVS, if you checked in a Word document, an image, or any other file format that was not strictly text, it was marked as binary and a complete copy of the file was stored, not the deltas. With this, you could not use the diff tools to view what changed from one check-in to the next. SVN uses a binary differencing algorithm that works the same way on text and binary files, so both are stored and transmitted as deltas.
Finally, SVN created the idea of the global revision ID. In an SVN repository, there is one auto-incrementing revision ID, the global ID. On every change committed, this ID is incremented by 1. Each file and folder is marked with its own last-changed revision, which is the most recent global revision ID in which a change to that file or folder was committed, as well as the current global ID. This global ID becomes the uniqueness stamp for the entire repository. With this you can ask to see every file within a folder as of a specific global ID. When you create tags (labels), they mark a specific global ID as having special importance (a release, for example). When you create branches, you can branch off any global ID and receive the precise version of every file within that folder that existed at the time that global ID was created.
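The global revision ID is easier to see in a small model. The Python sketch below is a toy of my own (RevisionedTree and its methods are hypothetical names, not anything from SVN), and it treats a tag as nothing more than a name for a revision number. Real SVN implements tags and branches as cheap copies inside the repository, so take this as an illustration of the numbering scheme only.

```python
class RevisionedTree:
    """Toy model of a repository with one global revision counter."""

    def __init__(self):
        self.head = 0
        self.history = {}   # path -> list of (revision, content); None means deleted

    def commit(self, changes):
        """Record a set of changes under a single new global revision."""
        self.head += 1
        for path, content in changes.items():
            self.history.setdefault(path, []).append((self.head, content))
        return self.head

    def last_changed(self, path):
        """Revision in which `path` was most recently changed."""
        return self.history[path][-1][0]

    def snapshot(self, revision):
        """Every file as it existed at the given global revision."""
        tree = {}
        for path, versions in self.history.items():
            current = None
            for rev, content in versions:
                if rev <= revision:
                    current = content
            if current is not None:
                tree[path] = current
        return tree


tree = RevisionedTree()
tree.commit({"src/main.c": "v1", "README": "hello"})   # revision 1
tree.commit({"src/main.c": "v2"})                       # revision 2
tree.commit({"README": "hello world"})                  # revision 3

# In this toy, a tag is just a label pointing at a global revision.
tags = {"release-1.0": 2}
release = tree.snapshot(tags["release-1.0"])
```

Here the snapshot for release-1.0 contains the second version of src/main.c together with the original README, even though the README was changed again in revision 3.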
By now you have guessed that I am a fan of SVN. Subversion was first created in 2000 by the talented developers and architects at CollabNet. Since then it has quietly taken over the open source community and much of the commercial community as well. Not only is this tool free, but it does not constrain your workflow style; it does exactly what you want it to do, and does it very well. CollabNet continues to support SVN but has since handed stewardship over to the Apache Software Foundation, another credit to both SVN and CollabNet.
SVN has tool integration into all the main platform IDEs, including Xcode for Mac. On Windows, TortoiseSVN is the de facto example of simplicity of form and function for working with an SCM. If you are using Visual Studio, Eclipse, IntelliJ, or another tool, the integrations make SVN translucent (you see it, but you don’t).
In the past 10 years there has been an ever-increasing push for distributed development, and with that the inception of Distributed Version Control. We discussed this a bit earlier. Two of the most popular solutions are Mercurial and Git. Unfortunately I do not have enough experience with either to speak with any depth about their relative strengths or weaknesses. What my research has shown is that both are solid solutions with a good working model for the future, but the client tools are still in their infancy and therefore their adoption is limited. I would wait until they offer more IDE integration and better client tools, especially for those who do not like the command line.
There is one more topic I would like to discuss here, and that is the idea of offsite hosting. Many software shops have a high paranoia around their IP. They hide everything behind thick firewalls and prohibit any form of data transfer that takes the source code outside of their direct control. These companies always opt for in-house hosting of SCM systems.
Today’s security protocols are effective enough that hosting your SCM offsite has become a viable and economically rewarding solution. There are many hosting sites that will host various SCM repositories: CVS, SVN, Git … For around $10 per month, you can host several gigabytes worth of file histories with unlimited users. These users can access the source using SSL and password-protected accounts from anywhere in the world. This is the true meaning of distributed development. These hosting playgrounds are responsible for the uptime of the servers as well as data integrity and regular maintenance backups. This frees your team, and your company, to focus on doing what you really want to do: make great software.
If you want more information about these SCM solutions, or you want information about an SCM tool I have not discussed here, please check the available information on Wikipedia on its List of revision control software and Comparison of revision control software.
Hopefully now you have a better understanding of the many aspects of SCM systems and how those features affect your team’s performance and ability to develop code efficiently and reliably. You should also have some idea of what to look for when choosing an SCM solution for your Release Management Stack. Now we can build a suite of Build Automation tools on top of this technology to make our release process more repeatable and reliable. We will save that for my next series of posts!
‘Til then — Keep it simple, keep it clean, make it great!
Posted on Thu, Apr 07, 2011
by Kendrick Burson
In my last post in this series, I gave an overview of managing your history and metadata. Now let’s look at some tools.
ClearCase has been around since the 90s. It is one of the most expensive options I have seen, and the most complicated to set up and maintain. There are few, if any, IDE integrations for ClearCase, unless you are an all-IBM shop (sound familiar?). ClearCase offers excellent branching and merging utilities, possibly best in class, but with a price tag upwards of $3,000 per seat and the expertise required for care and feeding of this system, few but the largest companies choose this option.
AccuRev is not quite as expensive ($1000+/user) and less popular. There are fewer integration solutions for AccuRev due to its smaller user base. One of the largest drawbacks to AccuRev, which they probably think is their selling point, is that they try to reinvent the SCM solution space, creating their own metaphors and terminology. This is a company that is trying to reinvent the universe in their own image.
AccuRev tries to do too much, combining issue tracking and branch automation into the standard SCM. I have seen the AccuRev ‘Streams’ model abused heavily where several days are lost every month in automated merge hell — you really don’t want to go there. AccuRev can be a good solution if you have a full time configuration management team that maintains a strict policy on branching and merging procedures.
One nice feature of AccuRev is the centrally-stored workspace. When a user creates a workspace from a stream (branch), they create both a centrally-stored copy and a local working copy of the files they wish to work on. AccuRev maintains a synchronization between the central and local copies. When the user wants to promote their changes, both the central and local working copies are synchronized with the public branch. If at any time their hard drive crashes, or they lose their computer, all work in progress is saved in the central copy of their workspace. Unlike the distributed model of SCM systems, you cannot work with the SCM system without being logged into the central server.
Another all-in-one solution is StarTeam from Borland (remember them?). This is another high-end (pricey) package that tries to coerce your workflow into their established patterns. I do not have experience with the care and feeding of StarTeam systems, but I have used it as a client. I like the integration of issue tracking with the revision history. It is an adjustment, and maybe does not fit the style of your team.
One of the more popular commercial offerings available is Perforce. Perforce has a lower cost of entry (about $700/user), solid client interface, integrations into all the main development IDEs, and some unique features with proxy servers and the like.
One major drawback to most of the systems discussed so far is the bandwidth issue. When you log in to an AccuRev client, or StarTeam, or others, the start-up time is very long. The client needs to pass entire file versions back and forth across the wire to synchronize your working copy. This is the runtime strategy of the open source CVS (Concurrent Versions System), and one of the main reasons for the creation of SVN.
CVS was developed in 1986. It was built on the Unix tool RCS. CVS became ubiquitous in the development community at large and was the main SCM for the public SourceForge site for many years, until Subversion (SVN) came along.
SVN (Subversion) had one goal in mind: Take what CVS offered and fix what the community learned to be bad practices and shortcomings. One tenet it espoused was that bandwidth is expensive while storage is cheap. To this end, SVN stores two complete copies of the files you check out to your workspace. One is a reference copy of the actual version you last synchronized to; this is stored in a hidden folder named “.svn” within each folder of your workspace. The second copy is the actual working copy in your workspace. The local SVN client tools can run diffs against the local reference copy without talking to the central server, thus saving bandwidth. On a commit, the client can send only the deltas for each file, thus limiting bandwidth usage. The caveat is that you store two copies of every file on your local file system. With the price of terabyte storage devices under $100, this is not even an issue. The gains you get in speed and responsiveness of the system are priceless.
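Here is a rough Python sketch of why that second copy pays off. It is not how the SVN client is implemented; local_diff is a hypothetical helper of mine, and the two strings stand in for the reference copy under “.svn” and the file you are editing. The diff itself comes from Python's standard difflib module, and nothing in it needs a server.

```python
import difflib


def local_diff(pristine_text, working_text, path):
    """Diff the working file against its local reference copy.

    Both inputs are already on the local disk in this model, so no
    round-trip to the central server is needed.
    """
    return list(
        difflib.unified_diff(
            pristine_text.splitlines(keepends=True),
            working_text.splitlines(keepends=True),
            fromfile=f"{path} (reference copy)",
            tofile=f"{path} (working copy)",
        )
    )


# Stand-ins for the two local copies of one file.
pristine = "line one\nline two\n"
working = "line one\nline 2\nline three\n"

diff_lines = local_diff(pristine, working, "notes.txt")
print("".join(diff_lines))
```

The same comparison also tells the client which files changed at all, so on a commit it only has to send deltas for those files rather than whole copies.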
Next, I’ll delve more deeply into CVS, SVN, and two tools for distributed version control.