OK, based on the answer to "Fetch/Pull Part of Very Large Repository?", let's change the organization of the data to try to accomplish my goal: an efficient partial checkout, plus the ability to commit changes made within that partial checkout.

Assume the following:

  • I have directories DIR001, DIR002, ... DIR00N spread across 100 terabytes of data;

  • each DIR00N contains paths of the form DIR00N/SUBDIR_A00M/SUBDIR_B00P/. That is, each top-level directory has a fixed hierarchy of child directories, with one or more files at the leaves of the directory tree and only at the leaves (in my first question I simplified things a bit; there is actually a deeper hierarchy of directories leading to the leaf files);

  • I want to control the fetching of a single leaf directory so as to minimize the space used at the local end AND to minimize the time needed to fetch the contents of this leaf directory when starting with a brand new repository at the local end.

So would the following strategy work on the server to create a remote repository:

  • create an independent repository for each leaf directory SUBDIR_B00P;

  • create an independent repository for each top level directory DIR00N;

  • within each DIR00N repository, run git submodule add for each SUBDIR_B00P repository, placing it at its DIR00N/SUBDIR_A00M/SUBDIR_B00P working directory path;

  • locally clone the remote DIR00N repository. My assumption is that I will now have references to all of the SUBDIR_B00P repositories within the cloned DIR00N repository. I don't really want all of these references, just the one reference to the SUBDIR_B00P I'm interested in at that particular moment, but I figure this might be pretty fast, so I might be able to live with it (ultimately there might be around 50,000 to 300,000 SUBDIR_B00P repositories);

  • locally run git submodule update to get the contents of the desired submodule from the DIR00N repository, thus treating the DIR00N repository as a sort of proxy for grabbing the submodule from the SUBDIR_B00P repository?
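
To make the steps above concrete, here is a rough sketch of the git commands I have in mind, written as a small Python dry run that only builds and prints the command lines (it does not execute git). The URLs, the DIR001/SUBDIR_A001/SUBDIR_B001 names, and the helper function names are all hypothetical placeholders; I am also assuming the submodule path is recorded relative to the root of the DIR00N repository.

```python
import shlex


def server_setup_commands(leaf_url, leaf_path):
    """Commands to run inside the DIR00N repository on the server
    to register one leaf repository as a submodule."""
    return [
        ["git", "submodule", "add", leaf_url, leaf_path],
        ["git", "commit", "-m", f"Add submodule {leaf_path}"],
    ]


def partial_checkout_commands(top_url, top_dir, leaf_path):
    """Commands to run locally: clone DIR00N without its submodules,
    then initialize and fetch only the one leaf submodule."""
    return [
        # A plain clone records the submodule references but does not
        # fetch any submodule contents.
        ["git", "clone", top_url, top_dir],
        # Restricting the update to one path fetches only that submodule.
        ["git", "-C", top_dir, "submodule", "update", "--init", "--", leaf_path],
    ]


def render(commands):
    """Format argv lists as shell-quoted command lines, one per line."""
    return "\n".join(shlex.join(cmd) for cmd in commands)


# Hypothetical example values, for illustration only.
leaf = "SUBDIR_A001/SUBDIR_B001"
print(render(server_setup_commands("ssh://server/repos/SUBDIR_B001.git", leaf)))
print(render(partial_checkout_commands("ssh://server/repos/DIR001.git", "DIR001", leaf)))
```

If this is right, the local clone would carry only the list of submodule references for DIR001, and the single submodule update would pull just the one leaf repository.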