I need a problem for my algorithm. I need like search engine input data


Hello!

Imagine you have some products, items, or anything else that you want to see in order of importance, the way a search engine ranks websites. You don't know how to sort them, but you have some criteria that give you a clue. You have a bag of criteria, and each one gives you a sorting, but you cannot aggregate them into one preference list.

Well, I can. It's part of my thesis and I'd like to show the practical usefulness. I would appreciate suggestions on what to sort here and which criteria to use.

I thought about things like: A DVD store sorting DVDs according to: quality of the medium, match with the query string, user votes.

So, I would like a real-world problem including data, where users can tell me whether they like my sorting and where I can see whether the resulting sorting is useful. That's kind of the point: is this better than the standard algorithms?

cheers,

niko

+3 A:

You could sort programming questions by date, preferred and disliked tags, number of answers, votes,... to find the most interesting ones :)

sth 2009-01-18 11:52:48

The perfect answer!

Paul Dixon 2009-01-18 12:00:18

Don't know. Imagine I do this and then I look at the results: is it a good or a bad result? The point of this experiment would be to see that the obtained aggregation is useful. More useful than that of standard algorithms.

nes1983 2009-01-18 12:07:19

I mean, in your search it's difficult to argue whether the aggregation was good or bad. I could do it and it might help the users, but I'd rather have something where the usefulness is easier to see. In the sense of: yeah, that's what I was looking for!

nes1983 2009-01-18 12:14:08

Also, I cannot really handle ties yet. So handling the tags might be difficult.

nes1983 2009-01-18 12:24:21

You probably could create some ratings like newer is better, more votes is better, more answers is better. Then there probably is some standard method to construct some value of an ordering from that? Like # of items ordered "wrong" according to any of the ratings? Is that how it's supposed to work?

sth 2009-01-18 13:01:41
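To make the "# of items ordered wrong" idea from the comment above concrete, here is a minimal sketch. It assumes each rating is a plain score per item where higher means better; the question ids and numbers are made up for illustration. Tied scores never count as wrong in this version.

```python
from itertools import combinations

def disagreements(candidate, rating):
    """Count item pairs that `candidate` orders against `rating`.

    `candidate` is a list of items, best first.
    `rating` maps item -> score, higher is better (an assumption).
    """
    wrong = 0
    # combinations() keeps list order, so `a` is always placed before `b`.
    for a, b in combinations(candidate, 2):
        if rating[b] > rating[a]:  # b is strictly better rated but placed later
            wrong += 1
    return wrong

def total_disagreements(candidate, ratings):
    """Sum the wrongly ordered pairs over all ratings."""
    return sum(disagreements(candidate, r) for r in ratings)

# Hypothetical data: three questions, two criteria.
votes = {"q1": 10, "q2": 3, "q3": 7}
answers = {"q1": 1, "q2": 5, "q3": 2}
candidate = ["q1", "q3", "q2"]

print(total_disagreements(candidate, [votes, answers]))
```

A lower total means the candidate ordering contradicts the individual ratings less often, which gives one possible way to compare two orderings without assigning points to the items.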

Well, my algorithm is similar to the Kemeny voting mechanism. And, similarly, I do not give points to the items and then sort by points. So this is fundamentally different from what is usually done in real-life applications. I want to compare the quality of my mechanism with ordinary methods.

nes1983 2009-01-18 13:32:29
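For readers who don't know Kemeny aggregation, a brute-force sketch may help. This is not the thesis algorithm, only the textbook idea under simple assumptions: every criterion gives a complete ranking without ties, and the result is the ordering with the fewest pairwise disagreements summed over all rankings. The DVD labels and the three rankings are invented.

```python
from itertools import combinations, permutations

def kendall_tau(order_a, order_b):
    """Number of item pairs that the two orderings rank in opposite order."""
    pos_a = {x: i for i, x in enumerate(order_a)}
    pos_b = {x: i for i, x in enumerate(order_b)}
    return sum(
        1
        for x, y in combinations(order_a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )

def kemeny_aggregate(rankings):
    """Try every permutation and keep the one closest to all rankings.

    Runs in factorial time, so it is only usable for a handful of items.
    """
    items = rankings[0]
    return min(
        permutations(items),
        key=lambda cand: sum(kendall_tau(cand, r) for r in rankings),
    )

# Hypothetical rankings of four DVDs, best first, one per criterion.
rankings = [
    ["A", "B", "C", "D"],  # quality of the medium
    ["A", "C", "B", "D"],  # match with the query string
    ["B", "A", "C", "D"],  # user votes
]

best = kemeny_aggregate(rankings)
print(list(best))
```

Note that no item ever receives a score here; only whole orderings are compared, which is the difference from the usual "sum the points and sort" approach.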

sth: What I'm trying to say is: of course I could do this and bring the posts on this site into some order. But is it a GOOD order? Nobody knows what good means here. I want an example where it makes a difference!

nes1983 2009-01-18 17:20:16

I'm just not sure how exactly you want to measure the quality of the ordering without somehow putting values on the elements. How is the quality of an ordering defined if not by some comparison of its elements? How do you say what's a "good" order, if not by somehow comparing its elements?

sth 2009-01-19 01:13:23

+2 A:

The Netflix Prize? See here: http://www.netflixprize.com/ Maybe it is more about clustering than sorting.

tuinstoel 2009-01-18 12:26:21

I'm downloading their 600 MB dataset! That will certainly overwhelm my algorithm. The aggregation method I use is NP-hard. I can handle it within certain bounds. This will exceed those bounds.

nes1983 2009-01-18 12:50:56

Apart from being too large, this would need a two-step process. First, I would have to do some regression to see which user properties translate to which movie properties. Then I could aggregate over these properties. This is, indeed, interesting, but I would first have to find another parameterization.

nes1983 2009-01-18 12:57:23

And I would have to do some serious data analysis before I can even start. Maybe this goes beyond the scope of my thesis.

nes1983 2009-01-18 12:58:08

OK, I read up on this. It's much more about statistics than about preference lists. Not for me!

nes1983 2009-01-18 17:19:28
