views:

478

answers:

3

Is it possible to have shallow submodules? I have a superproject with several submodules, each with a long history, so it gets unnecessarily big dragging all that history.

All I have found is this unanswered thread.

Should I just hack git-submodule to implement this?

+2  A: 

Reading through the git-submodule "source", it looks like git submodule add can handle submodules that already have their repositories present. In that case...

$ git clone $remote1 $repo
$ cd $repo
$ git clone --depth 5 $remotesub1 $sub1
$ git submodule add $remotesub1 $sub1
#repeat as necessary...

You'll want to make sure the required commit is in the submodule repo, so make sure you set an appropriate --depth.

Edit: You may be able to get away with multiple manual submodule clones followed by a single update:

$ git clone $remote1 $repo
$ cd $repo
$ git clone --depth 5 $remotesub1 $sub1
#repeat as necessary...
$ git submodule update
Ryan Graham
+1  A: 

Are the canonical locations for your submodules remote? If so, are you OK with cloning them once? In other words, do you want the shallow clones just because you are suffering the wasted bandwidth of frequent submodule (re)clones?

If you want shallow clones to save local diskspace, then Ryan Graham's answer seems like a good way to go. Manually clone the repositories so that they are shallow. If you think it would be useful, adapt git submodule to support it. Send an email to the list asking about it (advice for implementing it, suggestions on the interface, etc.). In my opinion, the folks there are quite supportive of potential contributors that earnestly want to enhance Git in constructive ways.

If you are OK with doing one full clone of each submodule (plus later fetches to keep them up to date), you might try using the --reference option of git submodule update (it is in Git 1.6.4 and later) to refer to local object stores (e.g. make --mirror clones of the canonical submodule repositories, then use --reference in your submodules to point to these local clones). Just be sure to read about git clone --reference/git clone --shared before using --reference. The only likely problem with referencing mirrors would be if they ever end up fetching non-fast-forward updates (though you could enable reflogs and expand their expiration windows to help retain any abandoned commits that might cause a problem). You should not have any problems as long as

  • you do not make any local submodule commits, or
  • any commits that are left dangling by non-fast-forwards that the canonical repositories might publish are not ancestors to your local submodule commits, or
  • you are diligent about keeping your local submodule commits rebased on top of whatever non-fast-forwards might be published in the canonical submodule repositories.

If you go with something like this and there is any chance that you might carry local submodule commits in your working trees, it would probably be a good idea to create an automated system that makes sure critical objects referenced by the checked-out submodules are not left dangling in the mirror repositories (and if any are found, copies them to the repositories that need them).

And, like the git clone manpage says, do not use --reference if you do not understand these implications.

# Full clone (mirror), done once.
git clone --mirror $sub1_url $path_to_mirrors/$sub1_name.git
git clone --mirror $sub2_url $path_to_mirrors/$sub2_name.git

# Reference the full clones any time you initialize a submodule
git clone $super_url super
cd super
git submodule update --init --reference $path_to_mirrors/$sub1_name.git $sub1_path_in_super
git submodule update --init --reference $path_to_mirrors/$sub2_name.git $sub2_path_in_super

# To avoid extra packs in each of the superprojects' submodules,
#   update the mirror clones before any pull/merge in super-projects.
for p in $path_to_mirrors/*.git; do GIT_DIR="$p" git fetch; done

cd super
git pull             # merges in new versions of submodules
git submodule update # update sub refs, checkout new versions,
                     #   but no download since they reference the updated mirrors

Alternatively, instead of --reference, you could use the mirror clones in combination with the default hardlinking functionality of git clone by using local mirrors as the source for your submodules. In new super-project clones, do git submodule init, edit the submodule URLs in .git/config to point to the local mirrors, then do git submodule update. You would need to reclone any existing checked-out submodules to get the hardlinks. You would save bandwidth by only downloading once into the mirrors, then fetching locally from those into your checked-out submodules. The hard linking would save disk space (although fetches would tend to accumulate and be duplicated across multiple instances of the checked-out submodules' object stores; you could periodically reclone the checked-out submodules from the mirrors to regain the disk space saving provided by hardlinking).

Chris Johnsen
+2  A: 

Following Ryan's answer I was able to come up with this simple script which iterates through all submodules and shallow clones them:

#!/bin/bash
git submodule init
for i in $(git submodule | sed -e 's/.* //'); do
    spath=$(git config -f .gitmodules --get submodule.$i.path)
    surl=$(git config -f .gitmodules --get submodule.$i.url)
    git clone --depth 1 $surl $spath
done
git submodule update
Mauricio Scheffer