Okay, here's the situation: we have a table of about 50 columns (created by joining database tables) and several thousand rows. We need to identify a pattern in several known faulty records in that data. Here's a really boiled-down example. Given a table:
------------------------
| id | title | date    |
------------------------
| 01 | c     | 2009-01 |
| 02 | a     | 2009-02 |
| 03 | a     | 2009-02 |
| 04 | b     | 2009-03 |
| 05 | b     | 2009-03 |
| 06 | a     | 2009-04 |
------------------------
Suppose I ask the library how rows 1, 4 and 5 are related, or how they differ from all the other rows. The library would say:
- All selected rows have an odd month number
- None of the selected rows has title = 'a'
Perhaps the library iterates through a series of pivot-table groupings, as in Excel. Whenever it finds a combination of groupings and calculations that is interesting, it tells you.
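To make the request concrete, here is a rough Python sketch of the kind of search I have in mind, not an existing library. The function names and the derived "month_parity" feature are made up for illustration; a real tool would have to generate candidate features itself.

```python
# Boiled-down example table from above.
rows = [
    {"id": 1, "title": "c", "date": "2009-01"},
    {"id": 2, "title": "a", "date": "2009-02"},
    {"id": 3, "title": "a", "date": "2009-02"},
    {"id": 4, "title": "b", "date": "2009-03"},
    {"id": 5, "title": "b", "date": "2009-03"},
    {"id": 6, "title": "a", "date": "2009-04"},
]


def derived_features(row):
    """Hypothetical feature set: raw columns plus the parity of the month."""
    month = int(row["date"].split("-")[1])
    return {
        "title": row["title"],
        "date": row["date"],
        "month_parity": "odd" if month % 2 else "even",
    }


def distinguishing_patterns(rows, selected_ids):
    """Report features that cleanly separate the selected rows from the rest."""
    selected = [derived_features(r) for r in rows if r["id"] in selected_ids]
    others = [derived_features(r) for r in rows if r["id"] not in selected_ids]
    if not selected or not others:
        return []
    findings = []
    for feature in selected[0]:
        sel_values = {r[feature] for r in selected}
        other_values = {r[feature] for r in others}
        if sel_values & other_values:
            continue  # the two groups share a value, so no clean split
        if len(sel_values) == 1:
            value = next(iter(sel_values))
            findings.append(f"all selected rows have {feature} = {value!r}")
        if len(other_values) == 1:
            value = next(iter(other_values))
            findings.append(f"no selected row has {feature} = {value!r}")
    return findings


for finding in distinguishing_patterns(rows, {1, 4, 5}):
    print(finding)
```

This only tests single features for a perfect split; the real data would need combinations of columns and tolerance for a few exceptions, which is exactly the part I'd like a library to handle.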
The actual situation (for the curious only): we found out that changes to the data have been 'undone' somehow. Instead of just 'redoing' the changes and hoping they stick, we're trying to figure out why the reversions occurred, so that the changes don't come unstuck again. Here are some of the real columns and possible data patterns:
--------------------------------------------------------
| id | user   | created_on | facility  | review_status |
--------------------------------------------------------
| 01 | tom    | 2009-01    | Bay       | Locked        |
| 02 | berry  | 2009-02    | Inner     |               |
| 03 | jan    | 2009-02    | Hamming   | Submitted     |
| 04 | bernie | 2009-03    | Youth     | Accepted      |
| 05 | jack   | 2009-03    | Johnson   | Locked        |
| 06 | frank  | 2009-04    | Baber St. |               |
--------------------------------------------------------
Our problem is that all of the review statuses (column 5) should have been marked as 'Locked', but not all of them were.
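The pivot-table idea applied to this table might look roughly like the sketch below: for each column, group by value and count how many rows in each group are not 'Locked'. The helper names are hypothetical, and storing a blank status as an empty string is an assumption about how the data would be loaded.

```python
from collections import defaultdict

# Sample rows from the table above; blank statuses are assumed to be "".
records = [
    {"id": 1, "user": "tom", "created_on": "2009-01", "facility": "Bay", "review_status": "Locked"},
    {"id": 2, "user": "berry", "created_on": "2009-02", "facility": "Inner", "review_status": ""},
    {"id": 3, "user": "jan", "created_on": "2009-02", "facility": "Hamming", "review_status": "Submitted"},
    {"id": 4, "user": "bernie", "created_on": "2009-03", "facility": "Youth", "review_status": "Accepted"},
    {"id": 5, "user": "jack", "created_on": "2009-03", "facility": "Johnson", "review_status": "Locked"},
    {"id": 6, "user": "frank", "created_on": "2009-04", "facility": "Baber St.", "review_status": ""},
]


def not_locked(record):
    """A row counts as faulty when its review status is anything but 'Locked'."""
    return record["review_status"] != "Locked"


def fault_counts_by_value(records, column, is_faulty):
    """Return {value: (faulty_count, total_count)} for one column."""
    counts = defaultdict(lambda: [0, 0])
    for record in records:
        bucket = counts[record[column]]
        bucket[0] += int(is_faulty(record))
        bucket[1] += 1
    return {value: tuple(pair) for value, pair in counts.items()}


for column in ("user", "created_on", "facility"):
    print(column, fault_counts_by_value(records, column, not_locked))
```

A group where every row is faulty (or none is) would be worth a closer look. With 50 columns and thousands of rows, doing this by hand for every column and every combination is what we'd like to avoid.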
Anyone know of a pattern-finding library for this kind of stuff? The long answer below hit the nail on the head: data mining software seems to be right on the money, but the solution must be open source or "free as in beer". Thanks, everyone!
P.S. Petitio principii answers, or answers that make no attempt to answer the initial question will not be considered (actually, they're considered, just not in the way one would expect).