views:

72

answers:

4

Hi,

I've been trying to workout a good way of doing this fast but I'm not sure what will be the most optimal, I'm hoping some of you more experienced developers can assist via your Data Structures knowledge :-)

Essentially I have a list of paths (Eg. C:\inetpub\wwwroot\, C:\www\websites\vhosts\somesite.com\, D:\www-mirror\websites\vhosts\somesite.co.uk), I have to check that the current file I'm working on (say C:\inetpub\wwwroot\styles\style.css) exists in the preconfigured list of paths.

So what I initially thought was to interate my list of items and do a CurrentFilename.StartsWith(PreconfigureListOfPathsPathName). But I'm iterating through the list quite regularly and it slows down as the list can contain sometimes 10, othertimes 1000 (clients on a server) paths.

What would you suggest as a fast solution for this problem? I'm writing in C# 3.5, this is only a small (but a critical) section of the project.

I thought about binary search trees, breaking down the paths and then doing a treemap and iterating through each path. But I'm not sure if its correct as we could have lots of nodes.

D:\www-mirror\websites\vhosts\somesite.co.uk\
D:\www-mirror\websites\vhosts\somesite.com\
D:\www-mirror\websites\vhosts\somesite.org\
D:\www-mirror\websites\vhosts\somesite.pl\

Tree map:

www-mirror->websites->vhosts->somesite* (has 4 nodes)
www-mirror->blah->woah->okay

But it looks a bit wonky.

A: 

I doubt that iterating through a list of 1000 items is actually your performance bottle neck here. I suspect actually hitting the disk or network share is what is eating up the time. If you're doing disk or network I\O, you need to do it on a worker thread. You don't need a complex structure for just walking 1000 items. You should do some timing to see where your perf issues actually lie...

If you were to post the code you're currently using to do the iteration, that may help get better answers as well.

jeffamaphone
Agree in principle, but if you put the I/O on a worker thread, don't you still have to wait for the reads to finish before you can iterate your items?
Robert Harvey
A: 

Your best bet is to model the allows paths with a tree and treat the examined path as a tree traversal. So you build a structure like:

root
+- C:
|  +- inetpub
|     +- wwwroot
|  +- www
|     +- websites
+- D:
   +- www-mirror

and so on

Alternatively you can simply have a sorted list of paths and do a bisection search on them to find the closest match (that is equal to or less in string comparison terms). If your string starts with that closest match, it's in an allowed directory.

You would have to normalize the inputs in this case (eg all lowercase, make sure all path separators are consistent, etc).

cletus
A: 

I would say trie is the best data structure possible for this scenario.I think, you can find trie implementation online. If not, its easy to write by following wikipedia.

For the trie, / will be the default node breaker. So, every node contains some path name and you transfer the trie based on the data. This solution might involve comparisons of maximum number of nodes originating from a particular path. Worst case will occur in the below scenario where you are having n length path and last node contains m files. In that case, you are effectively making n traversals+m comparisions, so its O(N+M). If the directories contains files which are uniformly distributed, then the time will be O(length of the path to be searched).

Another improvement will be to cache the recent answers and then check against them before proceeding in the trie.

Algorist
+1  A: 

Initialize a HashSet with the preconfigured paths. Then for each file to test, cut down the path from the end and probe the HashSet at each iteration:

class PreconfiguredPaths {
  private readonly HashSet<string> known = new HashSet<string>();

  public PreconfiguredPaths(params string[] paths) {
    foreach (var p in paths)
      known.Add(Normalize(p));
  }

  public string Parent(string path) {
    path = Normalize(path);

    while (path.Length > 0) {
      if (known.Contains(path))
        return path;
      else if (!path.Contains("\\"))
        break;

      path = Regex.Replace(path, @"\\[^\\]+$", "");
    }

    return null;
  }

  private string Normalize(string path) {
    return Regex.Replace(path, "\\\\+", "\\").TrimEnd('\\').ToLower();
  }
}

For example:

var paths = new PreconfiguredPaths(
  @"C:\inetpub\wwwroot\",
  @"C:\www\websites\vhosts\somesite.com\",
  @"D:\www-mirror\websites\vhosts\somesite.co.uk"
);

string[] files = {
  @"C:\inetpub\wwwroot\styles\style.css",
  @"F:\foo\bar\baz",
  @"D:\",
};

foreach (var f in files)
  Console.WriteLine("{0} => {1}", f, paths.Parent(f));

Output:

C:\inetpub\wwwroot\styles\style.css => c:\inetpub\wwwroot
F:\foo\bar\baz =>
D:\ =>
Greg Bacon
Thanks this seems doable!
Nicole Lee
You're welcome! I'm glad it helps.
Greg Bacon