views:

63

answers:

3

We're trying to identify the locations of certain information stored across our enterprise in order to bring it into compliance with our data policies. On the file end, we're using Nessus to search through differing files, but I'm wondering about on the database end.

Using Nessus would seem largely pointless because it would output the raw data and wouldn't tell us what table or row it was in, or give us much useful information, especially considering these databases are quite large (hundreds of gigabytes).

Also worth noting, this system needs to be able to do pattern-based matching (such as using regular expressions). Not just a "dumb search" engine.

I've investigated the use of Data Mining and Data Warehousing in order to find this data but it seems like they're more for analysis of data than actually just finding data.

Is there a better method of searching through large amounts of data in a database to try and find this information? We're using both Oracle 11g and SQL Server 2008 and need to perform the searches on both, so I'd like to stay away from server-specific paradigms (although if I have to rewrite some code to translate from T-SQL to PL/SQL, and vice versa, I don't mind)

+2  A: 

On SQL Server for searching through large amounts of text, you can look into Full Text Search.

Read more here http://msdn.microsoft.com/en-us/library/ms142559.aspx

But if I am reading right, you want to spider your database in a similar fashion to how a web search engine spiders web sites and web pages.

You could use a set of full text queries that bring back the results spanning multiple tables.

Raj More
Does SQL Server's Full Text Search capabilities also offer pattern-based matching, such as regular expressions?
tearman
On the Oracle side the feature is called "Oracle Text": http://download.oracle.com/docs/cd/B28359_01/server.111/b28318/content.htm#CNCPT1537
Juergen Hartelt
I believe Full Text Search does support Regex
Raj More
No, SQL Server doesn't support full RegEx; just end-of-string wildcards, booleans, proximity and various word-forms.
RickNZ
You can do CLR stored procedures that do regex searches with SQL Server 2005 onwards. Read here http://msdn.microsoft.com/en-us/magazine/cc163473.aspx
Raj More
Using CLR seems like a great option for SQL Server. Hopefully Oracle will have something similar (probably in the Java environment though I can imagine).
tearman
A: 

Oracle Text is suited for searching for words/phrases in larg(ish) bits of text (eg PDFs, HTMLs, TXT or DOCs) held in the database. There is some limited fuzziness searching, but not regular expressions per se.

You don't really go into what sort of data you are looking for or what you have in your databases. Nessus indicates you are looking for security issues, but the title of "Data Correlation" suggests something completely different.

Really the data structures should provide the information about what to look for and where. That's what databases are about - structuring data for accessibility. A database backing a CMS, forum software or similar would be a different kettle of fish.

Gary
I looked into Oracle Text a little bit and it is a nice option but like I mentioned, I'm attempting to do pattern-based matching and such. And the challenge is I'm specifically looking for exception data that doesn't meet our standard data storage policies.
tearman
It looks like Oracle has a PL/SQL directive called "regexp_like()" which could accomplish the desired functionality on a core level.
tearman
+1  A: 

Oracle supports regular expression with the RegExp_Like() function and it ought to be fairly straightforward to automate the generation of the code you need based on system metadate (to find all text columns over a certain length, for example, and include them in a predicate againt that table to find the rows and values that match your regexp). Doesn't sound too challenging really. In theory you could check constrain columns to prevent the insertion of values that match a regexp but that might be overkill.

David Aldridge