Are the canonical locations for your submodules remote? If so, are you OK with cloning them once? In other words, do you want the shallow clones just because you are suffering the wasted bandwidth of frequent submodule (re)clones?
If you want shallow clones to save local disk space, then Ryan Graham's answer seems like a good way to go. Manually clone the repositories so that they are shallow. If you think it would be useful, adapt `git submodule` to support it. Send an email to the list asking about it (advice for implementing it, suggestions on the interface, etc.). In my opinion, the folks there are quite supportive of potential contributors that earnestly want to enhance Git in constructive ways.
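For concreteness, here is a minimal sketch of that manual approach, using placeholder names ($super_url, $sub1_url, $sub1_path_in_super) matching the example further below; it works cleanly only when the commit recorded in the superproject is reachable in the shallow history:

# Shallow-clone a submodule by hand, then let the superproject adopt it.
git clone $super_url super
cd super
git submodule init
git clone --depth 1 $sub1_url $sub1_path_in_super
git submodule update    # checks out the recorded commit in the existing clone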
If you are OK with doing one full clone of each submodule (plus later fetches to keep them up to date), you might try using the `--reference` option of `git submodule update` (it is in Git 1.6.4 and later) to refer to local object stores (e.g. make `--mirror` clones of the canonical submodule repositories, then use `--reference` in your submodules to point to these local clones). Just be sure to read about `git clone --reference`/`git clone --shared` before using `--reference`. The only likely problem with referencing mirrors would be if they ever end up fetching non-fast-forward updates (though you could enable reflogs and extend their expiration windows to help retain any abandoned commits that might cause a problem; one way to do that is sketched after the list below). You should not have any problems as long as
- you do not make any local submodule commits, or
- any commits that are left dangling by non-fast-forwards that the canonical repositories might publish are not ancestors of your local submodule commits, or
- you are diligent about keeping your local submodule commits rebased on top of whatever non-fast-forwards might be published in the canonical submodule repositories.
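As mentioned above, here is a hedged sketch of that reflog safety net, assuming bare mirrors laid out as in the example below ($path_to_mirrors is the same placeholder):

# Enable reflogs in each bare mirror and keep their entries indefinitely,
# so commits abandoned by upstream non-fast-forwards remain reachable locally.
for p in $path_to_mirrors/*.git; do
    GIT_DIR="$p" git config core.logAllRefUpdates true
    GIT_DIR="$p" git config gc.reflogExpire never
    GIT_DIR="$p" git config gc.reflogExpireUnreachable never
done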
If you go with something like this and there is any chance that you might carry local submodule commits in your working trees, it would probably be a good idea to create an automated system that makes sure critical objects referenced by the checked-out submodules are not left dangling in the mirror repositories (and if any are found, copies them to the repositories that need them).
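Such a check could start as simple as the following sketch, which only verifies the commits currently referenced by the superproject (the $path_to_mirrors/<name>.git naming is an assumption carried over from the example below):

# For each checked-out submodule, verify that the commit the superproject
# references still exists in the matching mirror; report any that are missing.
cd super
git submodule foreach --quiet 'echo $name $sha1' |
while read name sha1; do
    if ! GIT_DIR="$path_to_mirrors/$name.git" git cat-file -e "$sha1^{commit}" 2>/dev/null
    then
        echo "missing in mirror: $name $sha1" >&2
        # (a repair step would copy the commit from the checked-out submodule
        # back into the mirror, e.g. with git push from the submodule)
    fi
done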
And, like the `git clone` manpage says, do not use `--reference` if you do not understand these implications.
# Full clone (mirror), done once.
git clone --mirror $sub1_url $path_to_mirrors/$sub1_name.git
git clone --mirror $sub2_url $path_to_mirrors/$sub2_name.git
# Reference the full clones any time you initialize a submodule
git clone $super_url super
cd super
git submodule update --init --reference $path_to_mirrors/$sub1_name.git $sub1_path_in_super
git submodule update --init --reference $path_to_mirrors/$sub2_name.git $sub2_path_in_super
# To avoid extra packs in each of the superprojects' submodules,
# update the mirror clones before any pull/merge in super-projects.
for p in $path_to_mirrors/*.git; do GIT_DIR="$p" git fetch; done
cd super
git pull              # merges in new versions of submodules
git submodule update  # update sub refs, checkout new versions,
                      # but no download since they reference the updated mirrors
Alternatively, instead of `--reference`, you could use the mirror clones in combination with the default hardlinking functionality of `git clone` by using the local mirrors as the source for your submodules. In new super-project clones, do `git submodule init`, edit the submodule URLs in `.git/config` to point to the local mirrors, then do `git submodule update`. You would need to reclone any existing checked-out submodules to get the hardlinks. You would save bandwidth by only downloading once into the mirrors, then fetching locally from those into your checked-out submodules. The hard linking would save disk space (although fetches would tend to accumulate and be duplicated across multiple instances of the checked-out submodules' object stores; you could periodically reclone the checked-out submodules from the mirrors to regain the disk space savings provided by hardlinking).
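A hedged sketch of that variant, again reusing the placeholder names from the example above:

# Fresh superproject clone whose submodules hardlink against the local mirrors.
git clone $super_url super2
cd super2
git submodule init
# Edit the submodule URLs in .git/config to point at the local mirrors.
git config submodule.$sub1_name.url $path_to_mirrors/$sub1_name.git
git config submodule.$sub2_name.url $path_to_mirrors/$sub2_name.git
git submodule update    # clones from local paths, so git can hardlink objects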