I have a closed source project that is built on my open source framework. I want to know how I should structure my workflow. Below is my best guess using git with submodules.

  1. I create a public framework repo on github with submodules that are separate git repos.
  2. I purchase a "micro" account on github ($7) so I can have a private repo.
  3. I create a private repo and clone the public framework repo.

From here I can make changes to:

  1. My private code and push to my private repo on github
  2. The public framework code, push it to my private github repo, and then send a pull request from the public framework? Or how would this work?

How do I handle a repo that contains private and public code and submodules? Right now it seems like I just have to maintain two separate codebases to achieve this.

I'm looking for the best answer that can help someone fairly new to git streamline the process of working on a codebase that is half open source and half private. One good thing about it is that each folder is either private or public so there is no worry about having private and public files together somewhere - yet some of the private folders might be in public ones!

Another example I could give would be using zendframework to build your private company site while still being able to do pulls each day (and maybe patch pushes) to the zend repo. And also pulls and pushes of your private site inside the zendframework.

For example, imagine a directory structure like this:

/private_folder
/public
        /public_folder
        /public_folder2
        /private_folder

Perhaps I'm asking too much to handle them all in one joined repo directory. Maybe there is no easy way to do this and I should separate them and do all the public patches in one and then just pull into my private repo. Of course, this means that if I am in the middle of working on some private code, I'll have to leave that repo, open up the public one and make the patched code change, then go back to the private one, merge, and then continue working on the private code.

+2  A: 

You can have a 'public' and 'private' branch in your local repository. When you push, each branch gets pushed to a separate remote repository (look up the 'git push' syntax). Then, you can freely merge from public to private.

I'm sure there's a way you could merge selected changes from private to public, too, though I'd have to look it up.
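
A minimal sketch of the two-branch idea, not a definitive setup. It assumes two local bare repositories standing in for the public and private GitHub repos; the remote names `public-remote` and `private-remote` and all file names are hypothetical.

```shell
#!/bin/sh
set -e

# Identity for the throwaway commits in this sketch.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Scratch area; two bare repos stand in for the public and private GitHub repos.
work=$(mktemp -d)
git init -q --bare "$work/public.git"
git init -q --bare "$work/private.git"

git init -q "$work/project"
cd "$work/project"
git remote add public-remote "$work/public.git"
git remote add private-remote "$work/private.git"

# Framework work happens on the 'public' branch.
git checkout -q -b public
echo "framework code" > framework.txt
git add framework.txt
git commit -q -m "Add framework file"

# Private work happens on the 'private' branch, which starts from 'public'.
git checkout -q -b private
echo "secret site code" > site.txt
git add site.txt
git commit -q -m "Add private site file"

# A framework fix is made on 'public' ...
git checkout -q public
echo "framework fix" >> framework.txt
git commit -q -am "Fix framework"

# ... and merged into 'private', never the other way round.
git checkout -q private
git merge -q --no-edit public

# Each branch is pushed only to its own remote.
git push -q public-remote public
git push -q private-remote private
```

For moving a selected change from private to public, `git cherry-pick <commit>` run on the public branch is one option, provided the commit touches only public files.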

Dietrich Epp
A: 

Make the public repo a submodule inside the private one. When pushing, remember you have to push them both. Also remember to check in the submodule itself in the private repo, so it tracks what revisions of the submodule it is using.
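
Roughly, that could look like the sketch below. It assumes a local repository standing in for the public framework (normally this would be a GitHub URL); the directory names `framework` and `site` are hypothetical.

```shell
#!/bin/sh
set -e

# Identity for the throwaway commits in this sketch.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)

# Stand-in for the public framework repo.
git init -q "$work/framework"
cd "$work/framework"
echo "framework code" > framework.txt
git add framework.txt
git commit -q -m "Initial framework commit"

# The private project embeds the framework as a submodule.
git init -q "$work/site"
cd "$work/site"
# protocol.file.allow is only needed because this sketch clones from a local path.
git -c protocol.file.allow=always submodule --quiet add "$work/framework" framework
git commit -q -m "Add framework as a submodule"

# A framework change made from inside the private project
# is committed in the submodule first ...
cd framework
echo "framework fix" >> framework.txt
git commit -q -am "Fix framework"
cd ..

# ... and then the private repo records the new submodule revision.
git add framework
git commit -q -m "Use the newer framework revision"

# The private repo stores one exact framework commit (mode 160000).
git ls-tree HEAD framework
```

With real remotes, push the submodule before the outer repo; `git push --recurse-submodules=check` makes the outer push fail if the recorded submodule commit has not been pushed yet.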

Andrew McGregor
+1  A: 

Git submodules allow you to define a configuration (see this question), that is, a reference to one commit of another component (in another repo).

You can develop both codebases (yours and the submodules') within the same repo, but since you are talking about multiple private directories within your public code, that calls for a subtree merge strategy.
It will allow you to consider your directories (the private and public ones) as one natural working tree.

And to better manage the push and pull of parts of your global repo to a private one, I would recommend the git subtree script tool.
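
As a rough illustration of the subtree merge mentioned above, here is a sketch that uses only core git commands rather than the git subtree script. It assumes a local repository standing in for the public framework; the branch name `stable` and the `public/` prefix are hypothetical choices for this example.

```shell
#!/bin/sh
set -e

# Identity for the throwaway commits in this sketch.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)

# Stand-in for the public framework repo, with a branch named 'stable'.
git init -q "$work/framework"
cd "$work/framework"
git checkout -q -b stable
echo "framework code" > framework.txt
git add framework.txt
git commit -q -m "Initial framework commit"

# The private project has its own, unrelated history.
git init -q "$work/site"
cd "$work/site"
echo "secret site code" > site.txt
git add site.txt
git commit -q -m "Initial private commit"

# Graft the framework history in under public/.
git remote add framework "$work/framework"
git fetch -q framework
git merge -q -s ours --no-commit --allow-unrelated-histories framework/stable
git read-tree --prefix=public/ -u framework/stable
git commit -q -m "Merge framework into public/"

# Later, upstream framework changes ...
cd "$work/framework"
echo "framework fix" >> framework.txt
git commit -q -am "Fix framework"

# ... are pulled into public/ with the subtree merge option.
cd "$work/site"
git fetch -q framework
git merge -q --no-edit -X subtree=public framework/stable
```

The private and public directories then live in one working tree, while the framework history stays mergeable from its own repository.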

VonC
+1  A: 

To summarize, I recommend this workflow:

  1. keep it simple; have one working copy for each repository (don't use git submodules)
  2. use your language's tools to package up your framework
  3. set up scripts or light tooling to make context switching fast or automatic

I've used git submodules in the past. I don't think they are a good fit for your use case. The big downsides that jump out at me are:

  • It helps to eat your own dog food when you build (or extract) a framework. Do you expect your framework users to also set up git submodules when they use your framework? I'm guessing not.
  • There is some risk of accidentally publishing your private source code into your open source framework.
  • Git submodules have improved quite a bit in the last year or so, but they still are relatively less well understood. Even competent gitters may struggle with submodules.

Here is one sub-question that I will admit is not so clear cut: "Which workflow makes it easier to bounce back and forth between the OSS framework and the private project?"

  • There is a certain allure to using submodules and having both projects in one tree. This will speed you up perhaps in your text editing, but probably will slow you down (or cause more mistakes than usual) when it comes to committing and pushing.

  • There is a certain allure to having the projects separated. The context switch (from one text editor window to another) may help remind you that the OSS project is for public consumption. For example, it may help discipline you not to break backwards compatibility and to keep a good changelog. Committing and pushing will be easy relative to the submodule alternative.

Once you have decided on your working copies, you'll want to figure out your day to day workflow. It will depend on your language of course. (In Ruby, for example, you might package up your OSS framework as a gem, build it, then have your private code depend on it.) Whatever you pick, set up some scripts (or editor shortcuts perhaps) to help you build your libraries (or packages) quickly, perhaps even automatically when files change, so that you can bounce between your framework and project effortlessly.

David James
I've been reading a lot in the progit.com book lately and I think that you are right - it would be best to keep everything separate, if only for the simple reason that I might accidentally push private code into the public repo. The other reason is that I will have to adopt this method anyway for additional projects that depend on the framework since I'm not going to keep adding more and more sites to one combined repo.
Xeoncross
+2  A: 

I recommend not using git submodules, but rather two separate repositories that are not connected on github.

You could build the relationship between them using symlinks on the checked out copies, which is basic and simple. The symlinks only have to be created once per location (production, development, coworkers).

The advantage is that nobody has to make the extra effort to learn and maintain git submodules, and you avoid the risk and complexity they bring.

It could be done by keeping a working copy of the open source (OS) repo and of the private git repo somewhere on your local machine:

/repos/myproject-os
/repos/myproject-priv

Then you could create your directory structure where the project will actually live and be worked on somewhere else on this machine (not inside the /repos/ tree) and create symlinks for the subdirectories you use:

ln -s /repos/myproject-os/dir1 /wrk/myproject/base/dir1
ln -s /repos/myproject-os/dir2 /wrk/myproject/base/dir2
ln -s /repos/myproject-priv/dir1 /wrk/myproject/base/dir3
ln -s /repos/myproject-priv/dir2 /wrk/myproject/base/someother/dir4
mkdir /wrk/myproject/base/config
mkdir /wrk/myproject/base/tmp

That way the repository structure always stays clean, you can mix and arrange the directories from both repositories the way you want them, and you also have a space for local configs or temp files that do not go into the repositories.

You would do the git commits and everything else from the /repos/ tree, while your project would run, and you would edit the files, from the /wrk/ tree. Please note that the .git directory where the git data lives would not be available from the /wrk/ tree, because you only link to subdirectories (or possibly single files from the root directory).
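
The link layout above could be scripted so it is reproducible on each machine. This is only a sketch: the variables `repo_path` and `work_path`, the `REPO_PATH`/`WORK_PATH` overrides, and the helper `link_dir` are hypothetical, and it defaults to temporary directories instead of /repos and /wrk so it can be tried safely.

```shell
#!/bin/sh
set -e

# Both roots are assumptions of this sketch; override them per machine,
# e.g. REPO_PATH=/repos WORK_PATH=/wrk.
repo_path=${REPO_PATH:-$(mktemp -d)}
work_path=${WORK_PATH:-$(mktemp -d)}

# Stand-ins for the two checked-out repositories (skip if they already exist).
mkdir -p "$repo_path/myproject-os/dir1" "$repo_path/myproject-os/dir2" \
         "$repo_path/myproject-priv/dir1" "$repo_path/myproject-priv/dir2"

# link_dir SOURCE TARGET: create the parent directory, then (re)create the link.
# -n keeps a rerun from placing a new link inside an existing linked directory.
link_dir() {
    mkdir -p "$(dirname "$2")"
    ln -sfn "$1" "$2"
}

base="$work_path/myproject/base"
link_dir "$repo_path/myproject-os/dir1"   "$base/dir1"
link_dir "$repo_path/myproject-os/dir2"   "$base/dir2"
link_dir "$repo_path/myproject-priv/dir1" "$base/dir3"
link_dir "$repo_path/myproject-priv/dir2" "$base/someother/dir4"

# Local-only directories that belong to neither repository.
mkdir -p "$base/config" "$base/tmp"
```

Because the links are recreated on every run, the same script can be rerun after directories are added to either repository.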

Part 2: You say you want to make sure that you do not accidentally push private code into the public repository. You could set up an additional git repository between your working OS repository and the github repository. Let's say you put it into /repos/gatekeeper; then your tree looks like this:

/repos/gatekeeper/myproject-os
/repos/myproject-os
/repos/myproject-priv

Every time you push from /repos/myproject-os it goes to /repos/gatekeeper/myproject-os. But from /repos/myproject-priv you push directly to your private github repo.

That way you have the same workflow in both /repos/myproject-os and /repos/myproject-priv and you don't need to worry so much. From time to time when you want to push your changes to the real OS codebase, you go to /repos/gatekeeper/myproject-os and push from there to github.

You could do additional code review before that and look at the diffs so you are sure that only what you really want goes public.

If you want additional security, /repos/gatekeeper/myproject-os could also be on a different machine or even in a different location.
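
A sketch of how the gatekeeper step might be wired up. It assumes the gatekeeper is a bare repository (a non-bare one would refuse pushes to its checked-out branch by default) and uses a local bare repo as a stand-in for github; the remote name `github` and the branch name `public` are hypothetical.

```shell
#!/bin/sh
set -e

# Identity for the throwaway commits in this sketch.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)
gate="$work/gatekeeper/myproject-os.git"

# Stand-in for the public github repo.
git init -q --bare "$work/github-os.git"

# The gatekeeper is the only repo that knows about github.
mkdir -p "$work/gatekeeper"
git init -q --bare "$gate"
git --git-dir="$gate" remote add github "$work/github-os.git"

# The working copy only knows about the gatekeeper.
git init -q "$work/myproject-os"
cd "$work/myproject-os"
git checkout -q -b public
git remote add origin "$gate"
echo "framework code" > framework.txt
git add framework.txt
git commit -q -m "Initial framework commit"
git push -q origin public

# First publication from the gatekeeper to github.
git --git-dir="$gate" push -q github public

# Day to day: framework work is pushed to the gatekeeper only.
echo "framework fix" >> framework.txt
git commit -q -am "Fix framework"
git push -q origin public

# Review in the gatekeeper before anything goes public.
cd "$gate"
pending=$(git rev-list --count github/public..public)
echo "commits waiting for review: $pending"
git diff --stat github/public public

# Only after the review does the change reach github.
git push -q github public
```

Since the working copy has no remote pointing at github, an accidental `git push` from it can reach the gatekeeper at most.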

Sven Larson
Nice idea. I'm guessing that for a lot of folders (I have hundreds) you could create a shell script that would create all the folders for you on each machine taking a simple `repo_path` variable into account.
Xeoncross
I'm going to award you the bounty because what you are saying would work - but I don't think I'll actually do this. It is much too hard a setup for several people in a team with multiple projects like this.
Xeoncross
A: 

There are two approaches here:

  1. You could use branches of the same git repo. In your private repo, create a branch that tracks your public repo and handle both that way.

  2. If the components used in your private project are subprojects of your public code, then you should use submodules. Submodule handling is still at a fairly early stage as of git version 1.6.6, but it could be useful since you are using subprojects.

It seems to me that the one thing you can't lose track of is which code contributes to which project; if you have that clear, then whatever you choose will work. Besides, git is easy.

erick2red