For one and a half years, I have been keeping my eyes on the git community in hopes of making the switch away from SVN. One particular issue holding me back is the inability to lock binary files. Throughout the past year I have yet to see developments on this issue. I understand that locking files goes against the fundamental principles of distributed source control, but I don't see how a web development company can take advantage of git to track source code and image file changes when there is the potential for binary file conflicts.

To achieve the effects of locking, a "central" repository must be identified. Regardless of the distributed nature of git, most companies will have a "central" repository for a software project. We should be able to mark a file as requiring a lock from the governing git repository at a specified address. Perhaps this is made difficult because git tracks file contents rather than files?

Do any of you have experience in dealing with git and binary files that should be locked before modification?

NOTE: It looks like Source Gear's new open source distributed version control project, Veracity, has locking as one of its goals.

A: 

I would not expect file locking to ever make it into git as a feature. What kind of binary files are you primarily interested in? Are you actually interested in locking the files, or just in preventing conflicts caused by not being able to merge them?

I seem to remember someone talking about (or even implementing) support for merging OpenOffice documents in git.

JesperE
+3  A: 

It's worth examining your current workflow to see if locking images is really necessary. It's relatively unusual for two people to independently edit an image, and a bit of communication can go a long way.

Khoth
+5  A: 

I agree that locking binary files is a necessary feature for some environments. I just had a thought about how to implement this, though:

  • Have a way of marking a file as "needs-lock" (like the "svn:needs-lock" property).
  • On checkout, git would mark such a file as read-only.
  • A new command git-lock would contact a central lock server running somewhere to ask permission to lock.
  • If the lock server grants permission, mark the file read-write.
  • git-add would inform the lock server of the content hash of the locked file.
  • The lock server would watch for that content hash to appear in a commit on the master repository.
  • When the hash appears, release the lock.

This is very much a half-baked idea and there are potential holes everywhere. It also goes against the spirit of git, yet it can certainly be useful in some contexts.

Within a particular organisation, this sort of thing could perhaps be built using a suitable combination of script wrappers and commit hooks.
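
As a rough illustration of the script-wrapper side of this, here is a minimal sketch of an advisory lock table in plain shell. It is not an existing git feature. The names acquire_lock, release_lock, lock_owner and the LOCK_TABLE file are all hypothetical, and a real setup would keep the table on a central server rather than on local disk.

```sh
#!/bin/sh
# Advisory lock table sketch. LOCK_TABLE is a hypothetical plain-text file
# holding one "path<TAB>owner" entry per line.
LOCK_TABLE="${LOCK_TABLE:-./locks.tsv}"

# lock_owner PATH: print the owner of the lock on PATH, or nothing if unlocked.
lock_owner() {
    [ -f "$LOCK_TABLE" ] || return 0
    awk -F '\t' -v p="$1" '$1 == p { print $2; exit }' "$LOCK_TABLE"
}

# acquire_lock USER PATH: record the lock, then make the file writable.
acquire_lock() {
    owner=$(lock_owner "$2")
    if [ -n "$owner" ] && [ "$owner" != "$1" ]; then
        echo "$2 is locked by $owner" >&2
        return 1
    fi
    # Only append a new entry if USER does not already hold the lock.
    [ -n "$owner" ] || printf '%s\t%s\n' "$2" "$1" >> "$LOCK_TABLE"
    chmod u+w "$2"
}

# release_lock USER PATH: drop the entry, then make the file read-only again.
release_lock() {
    [ "$(lock_owner "$2")" = "$1" ] || { echo "$1 does not hold $2" >&2; return 1; }
    awk -F '\t' -v p="$2" '$1 != p' "$LOCK_TABLE" > "$LOCK_TABLE.tmp" &&
        mv "$LOCK_TABLE.tmp" "$LOCK_TABLE"
    chmod a-w "$2"
}
```

For example, "acquire_lock alice logo.png" succeeds on an unlocked file, after which "acquire_lock bob logo.png" fails until Alice runs "release_lock alice logo.png". The check-then-append step is not atomic, so two people asking at the same moment could both get the lock; that is one of the holes a real lock server would have to close.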

Greg Hewgill
The biggest problem I see is that git is wholly intended to work offline. Although, as you say, you can build custom scripts to implement this. Beyond that, I'd be tempted to have a 'lock' branch which gets pushed to and pulled from a remote. All it has is the lock table, replacing the lock server.
Michael Johnson
+1  A: 

Here are a few examples of binary files that may be updated by more than one person. Keep in mind that real-time communication is difficult for remote teams:

  • ERWin data models
  • OneNote
  • Visio
  • PowerPoint
  • Miscellaneous site images

In particular, I think the developers find comfort in knowing their hard work won't go to waste because someone else made a quick change and check-in.

Mario
But that is the whole point of version control. If someone overwrites a binary file when they shouldn't have, a simple "git log <path>" will show who made the most recent change. And their hard work can't be "wasted" because you can always revert a commit. No information is ever lost. If people are overwriting files they shouldn't be, that is a social problem, not a technological one.
haxney
That social problem derives from the absence of a good tool for communicating about locks. Svn provides that tool, git does not. Email, chat and phone are terrible communication tools compared to svn or database record locking.
alpav
+5  A: 
Jörg W Mittag
+3  A: 

We've just recently started using Git (used Subversion previously) and I have found a change to workflow that might help with your problem, without the need for locks. It takes advantage of how git is designed and how easy branches are.

Basically, it boils down to pushing to a non-master branch, doing a review of that branch, and then merging into the master branch (or whichever the target branch is).

The way git is "intended" to be used, each developer publishes their own public repository, which they request others to pull from. I've found that Subversion users have trouble with that. So, instead, we push to branch trees in the central repository, with each user having their own branch tree. For instance, a hierarchy like this might work:

users/a/feature1
users/a/feature2
users/b/feature3
teams/d/featurey

Feel free to use your own structure. Note I'm also showing topic branches, another common git idiom.

Then in a local repo for user a:

feature1
feature2

And to get it to central server (origin):

git push origin feature1:users/a/feature1

(this can probably be simplified with configuration changes)

Anyway, once feature1 is reviewed, whoever is responsible does the following (in our case it's the developer of the feature, but you could have a single user responsible for merges to master):

git checkout master
git pull
git merge origin/users/a/feature1
git push

The pull does a fetch (pulling any new master changes and the feature branch) and then updates master to what the central repository has. If user a did their job and tracked master properly, there should be no problems with the merge.

All this means that, even if a user or remote team makes a change to a binary resource, it gets reviewed before it gets incorporated into the master branch. And there is a clear delineation (based on process) as to when something goes into the master branch.

You can also programmatically enforce aspects of this using git hooks, but again, I've not worked with these yet, so can't speak on them.
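
For what it's worth, here is an untested sketch of what such a check might look like for the branch layout above. The function name ref_allowed and the MAINTAINER variable are made up for illustration, and restricting master to a single maintainer is just one possible policy, not what we do.

```sh
#!/bin/sh
# MAINTAINER is a hypothetical name for whoever merges reviewed work to master.
MAINTAINER="${MAINTAINER:-maintainer}"

# ref_allowed USER REFNAME: succeed only if USER may update REFNAME.
ref_allowed() {
    case "$2" in
        refs/heads/users/"$1"/*) return 0 ;;                   # the user's own branch tree
        refs/heads/teams/*)      return 0 ;;                   # shared team branches
        refs/heads/master)       [ "$1" = "$MAINTAINER" ] ;;   # reviewed merges only
        *)                       return 1 ;;                   # anything else is refused
    esac
}

# In hooks/update on the central repository, git passes the ref name, the old
# revision and the new revision as "$1" "$2" "$3". How the pusher is identified
# depends on the transport, so PUSHER below is only a placeholder:
# ref_allowed "$PUSHER" "$1" || { echo "push to $1 rejected" >&2; exit 1; }
```

With this, user a can push to users/a/feature1 but not to users/b/feature3, and only the maintainer can update master.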

Michael Johnson
+2  A: 

Michael,

Your work-flow describes a way of catching conflicts before they are merged into the master repository. I was hoping to prevent developers from wasting time modifying a binary file that is already in the process of changing. It seems this is quite difficult to do using distributed version control.

Mario
+1  A: 

I have discussed this issue on git discussion groups and have concluded that at this time, there is no agreed upon method of centralized file locking for git.

Mario
+11  A: 

This is in response to Mario's additional concern about changes to the binaries happening in multiple places. The scenario is that Alice and Bob are both making changes to the same binary resource at the same time. They each have their own local repo, cloned from one central remote.

This is indeed a potential problem. So Alice finishes first and pushes to the central alice/update branch. Normally when this happens, Alice would make an announcement that it should be reviewed. Bob sees that and reviews it. He can either (1) incorporate those changes himself into his version (branching from alice/update and making his changes to that) or (2) publish his own changes to bob/update. Again, he makes an announcement.

Now, if Alice pushes to master instead, Bob has a dilemma when he pulls master and tries to merge into his local branch: his changes conflict with Alice's. But again, the same procedure can apply, just on different branches. And even if Bob ignores all the warnings and commits over Alice's changes, it's always possible to pull out Alice's commit to fix things. This becomes simply a communication issue.

Since (AFAIK) the Subversion locks are just advisory, an e-mail or instant message could serve the same purpose. But even if you don't do that, Git lets you fix it.

No, there's no locking mechanism per se. But a locking mechanism tends to just be a substitute for good communication. I believe that's why the Git developers haven't added a locking mechanism.

Michael Johnson
Any source control system is a better way to communicate between developers, because it's structured. Email, chat or phone is worse because it's not structured. So when people say that they will resort to communicating by email, chat or phone instead of using the SCM, that is wrong. Keeping source code and organizing communication between developers are two parts of any SCM, and git solves only one part whereas svn solves both.
alpav
The important point in my mind is that a locked file is read-only on disk, and an unlocked file is RW. This means when someone tries to edit a locked file, their editor will at least warn them the file is RO. At this point they are prompted to communicate with whoever has locked the file, to find out if their changes are redundant, complementary, or incompatible. Without the VCS changing file permissions, the user isn't automatically prompted to communicate, and it's left up to their fallible memory and procedures.
KeyserSoze
@keysersoze: But the point of a DVCS (such as git) is that everyone is free to make changes as desired. So any sort of lock would have to be advisory only. What you're looking for is a centrally managed VCS (Perforce, TFS). I don't think DVCSes are designed to do what you're looking for.
Michael Johnson
+9  A: 

Subversion has locks, and they aren't just advisory. They can be enforced using the svn:needs-lock attribute (but can also be deliberately broken if necessary). It's the right solution for managing non-mergeable files. The company I work for stores just about everything in Subversion, and uses svn:needs-lock for all non-mergeable files.

I disagree with "locks are just a communication method". They are a much more effective method than push-notifications such as phone or e-mail. Subversion locks are self-documenting (who has the lock). On the other hand, if you have to communicate by other traditional push-notification channels, such as e-mail, who do you send the notification to? You don't know in advance who might want to edit the file, especially on open-source projects, unless you have a complete list of your entire development team. So those traditional communication methods aren't nearly as effective.

A central lock server, while against the principles of DVCS, is the only feasible method for non-mergeable files. As long as DVCS don't have a central lock feature, I think it will keep the company I work for using Subversion.

The better solution would be to make a merge tool for all your binary file formats, but that's a longer-term and ongoing goal that will never be "finished".

Here's an interesting read on the topic.

Craig McQueen
Exactly right. A DVCS isn't designed to be centrally controlled. However, it might be feasible to build a centrally controlled system on top of a DVCS, which gets you the power that most DVCSes can provide along with the central control needed in some situations.
Michael Johnson
+1  A: 

What about CAD files? If the files aren't locked, and kept read-only as well, most CAD programs will just open them and change arbitrary bits, which any VCS sees as a new file. So in my view, locking is an ideal means of communicating your intent to change some particular file. It also prevents software from gaining write access in the first place. This allows the local files to be updated without the need to close the software, or at least all of its open files, entirely.

A: 

I'm not suggesting that we use git at my company, for the same reason. We use EA for all our designs and Microsoft Word for documentation, and we don't know in advance who may edit a particular file, so exclusive locking is our only option.

Hernan Rajchert
+1  A: 

Git will work very well in a non-team environment where each developer is solely responsible for a piece of code or a file, because in that case communication about locks is not needed.

If your organization requires a team environment (usually to strip developers of job security), then use svn; git is not for you. Svn provides both: source control and communication between developers about locks.

alpav