views:

76

answers:

1

The current issue im facing is comes from the following scenario. I have a script that runs a commandline program to find all files of a certain extension within an specific folder, lets call these files File A. Another section of the script runs a grep command through each file for filenames within File A. What would be the best method to store what filenames are in File A and only File A, and how could I achieve it? Thanks

+3  A: 

EDIT: I see you were the one who asked the previous question! Why open a new one?


There was a recent question on this exact problem -- the structure you are modelling is a directed graph. See my answer to that question, using Python's networkx package. Using this package is a good idea if you are going to do some post-processing of the data. However, for simple situations, you could make your own data structure. Here is a sample using an adjacency list representation of a graph; it is not difficult to use an adjacency matrix instead.

from collections import defaultdict
adj_list = defaultdict( set )

for filename in os.listdir( <dir> ):
    with open( filename ) as theFile:
        for line in theFile:
            # parse line into filename, say 'target'
            adj_list[ filename ].add( target )

This will give you a dictionary of filename -> files linked by that file.

katrielalex
you seem to be quite the speedy/frequent replier katrielalex, thanks for that. Is there a particular reason for using a list? Now that I think about it could one code a class structure in a similar manner?
well the answer is different for one, and your reply to this question happened to have solved a few problems.
@user428370: The overall structure here is a directed graph. To store that graph you need a data structure for it. (Just as e.g. a vector is a mathematical concept but it is stored in e.g. a list or a tuple.) There are several data structures commonly used for graphs; the two most common are adjacency lists and adjacency matrices (see Wikipedia for more info). In the above case I have implemented an adjacency list structure using a Python dictionary of sets.
katrielalex
I don't understand what you mean by "could one code a class structure in a similar manner" -- are you asking whether it would be possible to define a class to represent a graph? The answer is yes: `networkx.DiGraph` is one such class. If you're asking whether you can represent the inheritance structure of Python classes using a graph, the answer is also yes: it's called a dependency graph. http://en.wikipedia.org/wiki/Dependency_graph
katrielalex
ok that explains everything, thank you for taking the time to assist me.
You are most welcome!
katrielalex