
I have two related repositories, a master, which contains a number of sensitive files which must not be leaked, and a 'public' version, created with hg convert with --filemap to exclude the sensitive files and directories.

I would like further updates to the master that don't affect the sensitive files to be pushable to the slave, and updates to the slave to be pullable by the master. Right now this doesn't happen, as they are considered 'unrelated' repositories.

If this is possible with Git, but not with Mercurial, migration is a possibility, though it will be a nuisance since some development happens on Windows machines. The slave is not yet seeing active outside use, so it's possible to nuke it and recreate it another way if necessary. It is even possible, if absolutely necessary, to dump the master entirely and re-clone from the slave, and then leave all of the sensitive portions completely unversioned, but I would greatly prefer not to have to do this, since some of those files are changing, and I'd like to keep track of those changes.

Does anyone have any good ideas?

Update: I've been poking at the documentation on Git -- can a "push all files except these" command be easily implemented using the Git staging area?

Update 2: This doesn't help me, but it might help someone with a similar issue: you can use hg convert --filemap repeatedly and it will only track the updates to the master, but this only works if the destination repository is written through the local filesystem, and won't work over the wire. It also doesn't help in the opposite direction, of course.

+1  A: 

The easiest solution would probably be to simply put a clone of 'slave' inside 'master'. The inner clone will be ignored by the outer clone -- so when you commit changes to the secret files you won't risk mixing them with the public files. Pushing and pulling will work like normal, though you will have to do it twice, once for the inner and once for the outer.

Alternatively, you can forcibly pull 'slave' into 'master'. This will create a repository with two root revisions, but that is not a problem. After a merge you will have a mixed repository. New changesets can then be pulled in from 'slave' with no further warnings since the two repositories have become related.

If you change a public file in the mixed repository, you must first ensure that you have updated to a changeset that came from the 'slave' repository. Otherwise you won't be able to push the new changeset to 'slave' without also pushing the secret changesets it descends from. So if you are at [s3]:

... [s1] --- [s2] ---[s3]
            /
... [p1] --/

and you want to update a public file, you need to update to [p1] first:

% hg update p1

and then edit and commit

% hg commit -m 'Public change'
(created new head)

Your history graph will now look like this:

... [s1] --- [s2] ---[s3]
            /
... [p1] --/-[p2]

and you will want to push [p2] to 'slave':

% hg push -r p2 ../slave

and merge it with [s3]:

% hg merge
% hg commit -m 'Merge public change'

after which you will get

... [s1] --- [s2] ---[s3] --- [s4]
            /                /
... [p1] --/-[p2] ----------/

Other solutions can be found at the bottom of the NestedRepositories page on the Mercurial wiki. I believe the subrepo extension is being worked on for Mercurial 1.3, but we'll have to see (release date is July 1st).

Martin Geisler
Hm. I'm not liking either of these solutions. If I'm keeping a copy of 'slave' inside of 'master', then effectively I have to make all changes twice, manually, which kind of defeats the point of having the repositories linked. The second solution seems very fragile, i.e. it will break if someone is just working on the master without specifically updating to an older revision. I guess what I want is a "push all files except these" command. Nothing on the NestedRepositories page seemed applicable. These aren't nested projects -- they're mostly the same project.
Zed
With the first suggestion you won't have to make changes twice: a change made to `/slave/a.txt` will be committed to the slave repository and will be pushable from there to other clones of the slave repository. You will simply be working with two distinct repositories, where one just happens to be cloned inside the other.
Martin Geisler
You do have to make changes twice, once to a.txt and once to slave/a.txt. Any change made to a slave will definitely have to be made to the master. It's just that some of the files on the master should never, ever be exported to a slave, while all of the rest must be.
Zed
+1  A: 

I think I need more space than what a comment box will give me... :-)

Let me try to explain what I mean by an inner repository. We have two normal, separate repositories: 'slave' and 'master'. You put all your public files in 'slave' and put the secret files in 'master':

master/
  secret-file.txt
  another-secret.txt
  more-secrets.jpeg

slave/
  public.png
  more-public.txt

You then combine them by making new clones:

% hg clone master master-clone
% cd master-clone
% hg clone ../slave slave-clone

You now have a clone of 'slave' inside a clone of 'master':

master-clone/
  secret-file.txt
  another-secret.txt
  slave-clone/
    public.png
    more-public.txt
  more-secrets.jpeg

In this repository you will see that 'slave-clone' is ignored by all Mercurial commands, except when you change directory to it. So you can do

% echo 'evil plans' > plans.txt
% hg add plans.txt
% hg commit -m 'Secret stuff'

and you will thus have made a commit on the outer repository. To edit some public stuff you enter 'slave-clone' and commit there, just like normal.

You can pull/push between 'slave' clones and 'master' clones like normal, since they are, well..., just normal clones. You said that the secret and public files should not be seen as nested projects, but I'm suggesting that you make this distinction in order to split them into two directories.

Note that I'm assuming that it's okay for you to confine all public files to a subdirectory of the secret files. If you rename 'slave-clone' to 'public' or 'announcements' or similar, I don't think it sounds too far-fetched :-)

Otherwise you might be able to do something with a bunch of symlinks so that you "join" the repositories by symlinking the files into a single directory.
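The symlink idea can be sketched as a small shell loop, assuming both repositories are checked out side by side and that only their top-level entries need to appear in the combined directory. All names are hypothetical, and this will not behave the same way on Windows, where symlinks are awkward.

```shell
#!/bin/sh
# Sketch: build one directory the application can run from by
# symlinking the top-level entries of two checkouts into it.
# Directory and file names are hypothetical.
set -e
cd "$(mktemp -d)"
mkdir master slave combined
echo 'evil plans' > master/plans.txt
echo 'hello' > slave/public.txt

for repo in master slave; do
  # "*" does not match dot entries, so .hg/ and .hgignore are skipped.
  for f in "$repo"/*; do
    # Relative target, resolved from inside combined/.
    ln -s "../$f" "combined/$(basename "$f")"
  done
done
```

Relative link targets keep the combined directory valid if the whole tree is moved; entries with the same name in both checkouts would make `ln -s` fail, which is arguably the safer outcome.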

Martin Geisler
Unfortunately, due to application requirements and hard-coded paths, I actually can't confine things this way. Nonetheless, it turned out that it was possible to symlink *almost* everything, and the few components that couldn't be symlinked for various reasons (Woohoo, let's hear it for the utterly broken symlink system in Windows) turned out to be easy to hand manage or automatically regenerate. I'll be writing my own solution shortly, but I'm at least upvoting this since the basic principle is sound.
Zed
Okay. I'm glad you managed to solve it another way.
Martin Geisler
+1  A: 

Okay, it seems like the real answer is, "Mercurial can't do what you want" (which is to have synchronization between a repository and a branch which is a strict subset of that repository) short of using some hideously inconvenient system like the patch queues. Git might have been able to, but since its Windows port isn't yet ready for production use, I didn't look that hard into it.

However, it did turn out to be possible to reorganize the project in such a way that we could split up the public and private portions into separate repositories with only relatively minor loss of history. (Actually, it got split three ways while we were at it -- the public section, a public-but-machine-specific section that we put in subdirectory local/ (so it could be cloned to the bulk of the machines with nearly identical specs without having to maintain and merge extra branches on the few weird ones), and a private section that we put in subdirectory private/.) The trick turned out to be not to put a cleaned slave inside the master, but to split out the private/local parts from the master and put them as (pseudo-)sub-repositories inside the slave repository on the master machine.

Stage 1 was to move files into the private/ and local/ directories, remove them from the master repository, and add them to the .hgignore as needed when the replacement symlinks would have been problematic on other machines. Thankfully, about 95% of the stuff we had to move this way wasn't location-sensitive, and the rest we could manage by hand. Those two directories then became repositories of their own.

Stage 2 we had basically already done: a hg convert using --filemap to create a public version of the repository where all traces of private data had been removed. (This actually required a little bit of filemap tuning: you have to exclude not only all the current private data filenames, but also any filenames they might have had in the past. Mercurial's ability to track file moves/renames seems to be not entirely robust.)

At this point, the .hg/ directory on the master was moved to a backup location, and a fresh pull was made from the cleaned, public repository. We then ran our tests to make sure everything was working, started cloning the new base repository and selected local repositories off to slave machines, and tested those. Everything seems to be okay, though we had to tweak a few places where symlinks stopped working. That was mostly on Windows boxes, though in one spot, curiously enough, it was on the Linux side. I'm going to have to dig into why that symlink wasn't followed when I get some spare time, but we solved the immediate problem when it turned out that that particular file could, and really should, be automatically regenerated.

Zed
"Synchronization between a repository and a branch which is a strict subset of that repository" is referred to as "narrow cloning" and is a subject of lots of discussion and experiment in the DVCS world. As of this moment neither Mercurial nor Git really has it figured out. It will be *very* nice when they do.
quark