I have a bunch of regression test data. Each test is just a list of messages (associative arrays), mapping message field names to values. There's a lot of repetition within this data.

For example

   test1 = [
      { sender => 'client',  msg => '123',  arg => '900', foo => 'bar', ... },
      { sender => 'server',  msg => '456',  arg => '800', foo => 'bar', ... },
      { sender => 'client',  msg => '789',  arg => '900', foo => 'bar', ... },
   ]

I would like to represent the field data (as a minimal-depth decision tree?) so that each message can be programmatically regenerated using a minimal number of parameters. For example, in the data above

  • foo is always 'bar', so I don't need to mention it
  • sender and arg are correlated, so I only need to mention one or the other
  • and msg is different each time

So I would like to be able to regenerate these messages with a program along the lines of

write_msg( 'client', '123' )
write_msg( 'server', '456' )
write_msg( 'client', '789' )

where the write_msg function would be composed of nested if statements or subfunction calls using the parameters.
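
For instance, a minimal sketch of what such a function could look like (Python here, purely illustrative; the field values are taken from the sample data above):

    # Purely illustrative sketch: write_msg supplies the constant and
    # correlated fields itself, so only the varying parameters are passed in.
    def write_msg(sender, msg):
        # In the sample data, 'arg' is determined by 'sender' and 'foo' is
        # constant, so neither needs to be an argument.
        arg = '900' if sender == 'client' else '800'
        return {'sender': sender, 'msg': msg, 'arg': arg, 'foo': 'bar'}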

Based on my original data, how can I determine the 'most important' set of parameters, i.e. the ones that will let me recreate my data set using the smallest number of arguments?

+1  A: 

This looks very similar to Database Normalization.

You have a relation (your test data set) and some known functional dependencies ({sender} => arg, {} => foo, and possibly {msg} => sender; if the order of tests is important, add {testNr} => msg), and you want to eliminate the redundancy.

Treat your test set as a database table, apply the normalization rules and create equivalent functions (getArgFromSender(sender) etc.) for each join.
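
A rough sketch of that idea (Python; the dictionary-based lookups are my own illustration, using the sample data from the question):

    # Each functional dependency becomes a lookup built once from the
    # original rows, then exposed as a small accessor function.
    test1 = [
        {'sender': 'client', 'msg': '123', 'arg': '900', 'foo': 'bar'},
        {'sender': 'server', 'msg': '456', 'arg': '800', 'foo': 'bar'},
        {'sender': 'client', 'msg': '789', 'arg': '900', 'foo': 'bar'},
    ]

    # {sender} => arg
    arg_from_sender = {row['sender']: row['arg'] for row in test1}

    def getArgFromSender(sender):
        return arg_from_sender[sender]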

finnw
+1  A: 

If the number of fields and records is small:

Brute force it by looping through every combination of fields, and for each combination detect if there are multiple items in the list which map to the same value.
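
For example, a brute-force sketch along those lines (Python; treating a field combination as usable when any two rows that agree on it also agree on every other field is my own reading of the above):

    from itertools import combinations

    def smallest_determining_fields(rows):
        # Return the smallest field combination that determines all other fields.
        fields = sorted(rows[0].keys())
        for size in range(1, len(fields) + 1):
            for combo in combinations(fields, size):
                seen = {}
                ok = True
                for row in rows:
                    key = tuple(row[f] for f in combo)
                    if key in seen and seen[key] != row:
                        ok = False   # same key, different full row
                        break
                    seen[key] = row
                if ok:
                    return combo
        return tuple(fields)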

If you can live with a fairly good choice of fields:

Start off assuming you need all fields. Then, select a field at random and see if it can be eliminated; if it can, cross it off the set of fields. Otherwise, choose another field at random and try again. If you find no fields can be eliminated, then you've found a reasonable set of fields. Had you chosen other fields first, you might have found a better solution, so you can repeat the whole procedure a few times and pick the best result. This kind of approach is called hill climbing.
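
A minimal hill-climbing sketch of that procedure (Python; the helper names and the redundancy test are assumptions on my part):

    import random

    def determines_all(rows, fields):
        # The field set is sufficient if rows agreeing on it agree on everything.
        seen = {}
        for row in rows:
            key = tuple(row[f] for f in fields)
            if key in seen and seen[key] != row:
                return False
            seen[key] = row
        return True

    def hill_climb(rows):
        fields = list(rows[0].keys())
        order = fields[:]
        random.shuffle(order)
        for field in order:
            trimmed = [f for f in fields if f != field]
            if trimmed and determines_all(rows, trimmed):
                fields = trimmed   # the field was redundant; drop it
        return fields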

(I suspect that this problem is NP-complete, i.e. we probably don't know of an efficient exact solution, so it is not worth losing sleep over trying to dream up a perfect one.)

Dickon Reed
+3  A: 

The following papers describe algorithms for discovering functional dependencies:

Y. Huhtala, J. Kärkkäinen, P. Porkka, and H. Toivonen. TANE: An efficient algorithm for discovering functional and approximate dependencies. The Computer Journal, 42(2):100–111, 1999. doi:10.1093/comjnl/42.2.100.

I. Savnik and P. A. Flach. Bottom-up induction of functional dependencies from relations. In Proc. AAAI-93 Workshop: Knowledge Discovery in Databases, pages 174–185, Washington, DC, USA, 1993.

C. Wyss, C. Giannella, and E. Robertson. FastFDs: A heuristic-driven, depth-first algorithm for mining functional dependencies from relation instances. In Proc. Data Warehousing and Knowledge Discovery, pages 101–110, Munich, Germany, 2001. doi:10.1007/3-540-44801-2.

H. Yao and H. J. Hamilton. Mining functional dependencies from data. Data Mining and Knowledge Discovery, 2008. doi:10.1007/s10618-007-0083-9.

There has also been some work on discovering multivalued dependencies:

I. Savnik and P. A. Flach. Discovery of multivalued dependencies from relations. Intelligent Data Analysis, 4(3):195–211, IOS Press, 2000.

Vebjorn Ljosa
Hmm. I'm glad to see this problem really is hard, and I haven't been flailing around for no reason. Thanks for collecting those references.
Eric