The current issue im facing is comes from the following scenario. I have a script that runs a commandline program to find all files of a certain extension within an specific folder, lets call these files File A. Another section of the script runs a grep command through each file for filenames within File A. What would be the best method to store what filenames are in File A and only File A, and how could I achieve it? Thanks
+3
A:
EDIT: I see you were the one who asked the previous question! Why open a new one?
There was a recent question on this exact problem -- the structure you are modelling is a directed graph. See my answer to that question, using Python's networkx
package. Using this package is a good idea if you are going to do some post-processing of the data. However, for simple situations, you could make your own data structure. Here is a sample using an adjacency list representation of a graph; it is not difficult to use an adjacency matrix instead.
from collections import defaultdict
adj_list = defaultdict( set )
for filename in os.listdir( <dir> ):
with open( filename ) as theFile:
for line in theFile:
# parse line into filename, say 'target'
adj_list[ filename ].add( target )
This will give you a dictionary of filename -> files linked by that file.
katrielalex
2010-08-25 12:25:00
you seem to be quite the speedy/frequent replier katrielalex, thanks for that. Is there a particular reason for using a list? Now that I think about it could one code a class structure in a similar manner?
2010-08-25 12:40:58
well the answer is different for one, and your reply to this question happened to have solved a few problems.
2010-08-25 12:43:42
@user428370: The overall structure here is a directed graph. To store that graph you need a data structure for it. (Just as e.g. a vector is a mathematical concept but it is stored in e.g. a list or a tuple.) There are several data structures commonly used for graphs; the two most common are adjacency lists and adjacency matrices (see Wikipedia for more info). In the above case I have implemented an adjacency list structure using a Python dictionary of sets.
katrielalex
2010-08-25 13:02:06
I don't understand what you mean by "could one code a class structure in a similar manner" -- are you asking whether it would be possible to define a class to represent a graph? The answer is yes: `networkx.DiGraph` is one such class. If you're asking whether you can represent the inheritance structure of Python classes using a graph, the answer is also yes: it's called a dependency graph. http://en.wikipedia.org/wiki/Dependency_graph
katrielalex
2010-08-25 13:04:06
ok that explains everything, thank you for taking the time to assist me.
2010-08-25 13:23:47
You are most welcome!
katrielalex
2010-08-25 13:42:00