Array comparisons and relevancy scores

Use an inverted index (Hash Table) to get it down to O(n). Put all the items in the first list in one hash table. Then iterate through all the items in the second list, looking up each item in the hash table.

What I don't know is how you are defining similar. If similarity is simply that the items in the two lists are equal, then this will work. However if similarity is more complex, then you may need to build multiple hash tables for each type of similarity possible. For example, you could have one hash table that keys off of the phonetic spelling of a word, and one that keys off of the exact string of the word.

If you have one list that is large like a Job Openings list, and you want to query the list for candidate skills, you should really use a search engine. A search engine is just a set of hash tables keyed off of keywords. There is no sense rebuilding a search engine when you can use one that has already been built. First you index all the job openings, then you query the search engine using words from a candidate's resume. A popular open source search engine that you may want to look into is Solr.

+1 for "it easier to answer when we know what we are talking about." .. -1 for answering when not knowing "what we are talking about." ;)

belisarius 2010-10-26 21:01:44

@belisarius : Well he has given some insight in the comment to his question, so I know it is job titles/job openings. More details can help but the answer is still valid, trying to group things together is always good to reduce complexity. + this was a little to long for a comment, easier to read this way

Loïc Février 2010-10-26 21:04:53

Was kidding. +1 for the categorisation. As I said in a previous comment <i>"If no other property is known, comparing n to m requires O(n m). If you need to do better than that you need something else. Example: hierarchical arrangements of keywords."</i> OR categories

belisarius 2010-10-26 21:08:47

I apologize for the lack of details. I'm calculating how relevant it is by matching keywords and keyphrases from multiple people's past job titles against job opening titles. Keywords are weighted by how often they appear throughout each person's work history, and keyphrases are matched by matching two keywords with possible wildcard words in between. If two past job titles were "Sales Representative" and "Director of Sales", it's organized in an array as (keywords => ("sales" => 2, "director" => 1, "representative" => 1)). I then go through each job opening and match keywords in the titles.

Nick 2010-10-27 17:04:14

The big issue that I'm wondering I can avoid is the process of matching jobs though. If I'm matching 10 people to 100 jobs, I go to the first person, then go through all 100 jobs and assign a relevance score, sort by which one is the most relevant, and then pick the top 4. Then repeating the process for the other 9 people. I'm wondering if the "selecting the top 4" part will prevent me from ever being better than O(n^2).

Nick 2010-10-27 17:17:09

Please edit your question to add these informations and then you can delete these 2 comments. Can you give your exact formula ? Or a readl example with, let's say 3 peoples and 5 jobs opening or something like that.

Loïc Février 2010-10-27 17:45:01

ansaurus

tags:

views:

answers:

Array comparisons and relevancy scores

related questions