This is probably obvious and has been asked many times in different ways before, but I have not been able to find the answer after searching for some time.

Assume the following:

  • I have, say, a 500GB disk at the local end;
  • I have a 100 terabyte remote repository, so cloning the entire repository is simply not feasible;
  • the working directory used to create the remote repository was composed of 1000 top-level directories DIR001, DIR002, ... DIR00N, each containing multiple subdirectories, with files only in the leaf subdirectories (e.g. DIR001/subdir1/fileA1 ... DIR001/subdir1/fileAN and DIR001/subdir2/fileB1 ... DIR001/subdir2/fileBN, ...);
  • I did NOT explicitly tag or branch directories DIR001, DIR002, ... DIR00N, or anything else for that matter;
  • I initialize a brand new local git repository.

How do I efficiently pull or fetch the last committed versions of, say, DIR001/subdir2/fileB1 ... DIR001/subdir2/fileBN from the remote repository and nothing else?

AND

How do I fetch just the last committed version of a single file among DIR001/subdir2/fileB1 ... DIR001/subdir2/fileBN, and nothing else?

AND

How do I efficiently pull or fetch a previously committed version of a subset of said files and nothing else?

Maybe fetch/pull is not the correct command for this.

A:

The answer to "Partial cloning" can help you start experimenting with shallow clones.
But it will be limited:

  • to a certain depth, and/or to certain branches,
  • but not to certain files or directories (you can check out only a file or directory through sparse checkout, but you still have to fetch the full repo first!)
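As a rough sketch of combining the two limits above for the question's scenario, the helper below lays out the classic recipe: git init, enable core.sparseCheckout, list the wanted paths in .git/info/sparse-checkout, then git pull --depth=1. The function names shallow_sparse_plan and execute are hypothetical, and the URL, destination and branch are placeholders. It assumes git is on PATH when the plan is actually run. Note that it only trims history and the working tree: the depth-1 pull still downloads every object in the tip commit's tree, which for a 100 TB repository may itself be too much.

```python
import subprocess
from pathlib import Path
from typing import List, Tuple, Union

# A step is either ("run", argv) or ("write", path relative to dest, text).
Step = Union[Tuple[str, List[str]], Tuple[str, str, str]]


def shallow_sparse_plan(url: str, dest: str, branch: str, paths: List[str]) -> List[Step]:
    """Plan a depth-1 pull of `branch` whose working tree holds only `paths`.

    Directory patterns should end with "/" (e.g. "DIR001/subdir2/");
    a single file is listed by its full path (e.g. "DIR001/subdir2/fileB1").
    """
    patterns = "".join(p + "\n" for p in paths)
    return [
        ("run", ["git", "init", dest]),
        ("run", ["git", "-C", dest, "remote", "add", "origin", url]),
        # Classic sparse checkout: only paths matching the pattern file
        # are written to the working directory.
        ("run", ["git", "-C", dest, "config", "core.sparseCheckout", "true"]),
        ("write", ".git/info/sparse-checkout", patterns),
        # --depth=1 limits history, but every object in the tip commit's
        # tree is still downloaded; sparse checkout does not reduce that.
        ("run", ["git", "-C", dest, "pull", "--depth=1", "origin", branch]),
    ]


def execute(plan: List[Step], dest: str) -> None:
    """Run a plan produced by shallow_sparse_plan (requires git on PATH)."""
    for step in plan:
        if step[0] == "run":
            subprocess.run(step[1], check=True)
        else:
            target = Path(dest) / step[1]
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(step[2])
```

For example, execute(shallow_sparse_plan("https://example.com/huge.git", "work", "master", ["DIR001/subdir2/"]), "work") would leave only DIR001/subdir2/ in the working directory. Returning the steps as data instead of running them immediately makes it easy to print and review the exact commands before touching a very large remote.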

The real solution, though, would be to split the huge remote repo into submodules.
See "What are Git limits" or "Git style backup of binary files" for illustrations of this kind of situation.
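A minimal sketch of what fetching a subset could look like after such a split, assuming each top-level DIRxxx has become its own repository registered as a submodule of a hypothetical superproject: a plain git clone of the superproject leaves submodules uninitialized, and git submodule update --init with explicit paths then clones only those. The names submodule_plan and execute, the URL and the paths are placeholders, and git must be on PATH to run the plan.

```python
import subprocess
from typing import List


def submodule_plan(super_url: str, dest: str, wanted: List[str]) -> List[List[str]]:
    """Plan commands that fetch only the listed submodules of a superproject.

    A plain `git clone` records submodules in .gitmodules but leaves their
    directories empty; `submodule update --init -- <path>...` then clones
    just the named ones.
    """
    return [
        ["git", "clone", super_url, dest],
        ["git", "-C", dest, "submodule", "update", "--init", "--", *wanted],
    ]


def execute(plan: List[List[str]]) -> None:
    """Run each command in order (requires git on PATH)."""
    for argv in plan:
        subprocess.run(argv, check=True)
```

With this layout, execute(submodule_plan("https://example.com/super.git", "work", ["DIR001"])) would transfer the superproject plus DIR001's repository only, instead of all 1000 directories.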

VonC