I am looking for a time efficient method to parse a list of files into a tree. There can be hundreds of millions of file paths.
The brute force solution would be to split each path on occurrence of a directory separator, and traverse the tree adding in directory and file entries by doing string comparisons but this would be exceptionally slow.
The input data is usually sorted alphabetically, so the list would be something like:
C:\Users\Aaron\AppData\Amarok\Afile
C:\Users\Aaron\AppData\Amarok\Afile2
C:\Users\Aaron\AppData\Amarok\Afile3
C:\Users\Aaron\AppData\Blender\alibrary.dll
C:\Users\Aaron\AppData\Blender\and_so_on.txt
From this ordering my natural reaction is to partition the directory listings into groups... somehow... before doing the slow string comparisons. I'm really not sure. I would appreciate any ideas.
Edit: It would be better if this tree were lazy loaded from the top down if possible.