views:

77

answers:

2

I'm trying to clean up a database by first finding unreferenced objects. I have extracted all the database objects into a list, and all the ddl code into files, I also have all the Java source code for the project.

Basically what I want to do (preferably in Perl as it's the scripting language that I'm most familiar with) is to somehow index the contents of all the extracted database ddl and Java files (to speed up the search), step through the database object list and then search through all the files (using the index) to see if those objects are referenced anywhere and create a report.

If you could point me in the right direction to find something that indexes all those files in a way that I can search them (preferably in Perl) I would greatly appreciate it. The key here is to be able to do this programatically, not manually (using something like Google desktop search).

+2  A: 

Break the task down into its steps and start at the beginning. First, what does a record look like, and what information in it connects it to another record? Parse that record, store its unique identifier and a list of the things it references.

Once you have that list, invert it. For each reference, create a list of the objects referenced. Count them by their identifier. You should be able to get the ones whose count is zero.

That's a very general answer, but you asked a very general question. If you are having trouble, break it down into just one of those steps and ask a more specific question, supplying sample data and the code you've tried so far.

Good luck,

brian d foy
A: 

An interesting module you might use to do what you want is KinoSearch, it provides you the kind of indexing you said to be looking for. Then you can go through the object identifiers and check if there are references to it.

Daniel Ruoso