views:

294

answers:

5

We have an auto-complete list that's populated when an you send an email to someone, which is all well and good until the list gets really big you need to type more and more of an address to get to the one you want, which goes against the purpose of auto-complete

I was thinking that some logic should be added so that the auto-complete results should be sorted by some function of most recently contacted or most often contacted rather than just alphabetical order.

What I want to know is if there's any known good algorithms for this kind of search, or if anyone has any suggestions.

I was thinking just a point system thing, with something like same day is 5 points, last three days is 4 points, last week is 3 points, last month is 2 points and last 6 months is 1 point. Then for most often, 25+ is 5 points, 15+ is 4, 10+ is 3, 5+ is 2, 2+ is 1. No real logic other than those numbers "feel" about right.

Other than just arbitrarily picked numbers does anyone have any input? Other numbers also welcome if you can give a reason why you think they're better than mine

Edit: This would be primarily in a business environment where recentness (yay for making up words) is often just as important as frequency. Also, past a certain point there really isn't much difference between say someone you talked to 80 times vs say 30 times.

+1  A: 

Maybe count the number of emails sent to each address. Then:

ORDER BY EmailCount DESC, LastName, FirstName

That way, your most-often-used addresses come first, even if they haven't been used in a few days.

Corbin March
Yeah, but in a business environment (which I guess I should've specified) one might be in contact with a customer/client for a few days/weeks either solving a problem or settling on a deal/agreement and in that case most recent is more relevant than most often.
Davy8
Absolutely, there are all sorts of potential users - I might email my remote boss once every two weeks forever, I might have a sales account active for a month, I might support a customer that needs extra help after new builds. Maybe a combination of frequency and time immediacy?
Corbin March
"Maybe a combination of frequency and time immediacy?" Yes, that's more of what I was looking for, kinda the specifics of how to balance the two.
Davy8
+2  A: 

Take a look at Self organizing lists.

A quick and dirty look:

Move to Front Heuristic: A linked list, Such that whenever a node is selected, it is moved to the front of the list.

Frequency Heuristic: A linked list, such that whenever a node is selected, its frequency count is incremented, and then the node is bubbled towards the front of the list, so that the most frequently accessed is at the head of the list.

It looks like the move to front implementation would best suit your needs.

EDIT: When an address is selected, add one to its frequency, and move to the front of the group of nodes with the same weight (or (weight div x) for courser groupings). I see aging as a real problem with your proposed implementation, in that it requires calculating a weight on each and every item. A self organizing list is a good way to go, but the algorithm needs a bit of tweaking to do what you want.

Further Edit: Aging refers to the fact that weights decrease over time, which means you need to know each and every time an address was used. Which means, that you have to have the entire email history available to you when you construct your list.

The issue is that we want to perform calculations (other than search) on a node only when it is actually accessed -- This gives us our statistical good performance.

chris
Interesting implementation, however this would still result in either most recent in front or most frequent in front, not a combination of the two.
Davy8
I'm not completely sure what you mean by aging. My issue with this solution is that it puts more emphasis on the frequency in rather than (more or less) equal on both.
Davy8
An example of where that might be an issue is say you talk with a customer back and forth a lot (there was a really big issue to settle) but now that it's done, you don't need to contact them anymore for a long time. They're still in front because of all the frequency and will be for a long time.
Davy8
+1 for self-organizing lists though
Davy8
+2  A: 

This kind of thing seems similar to what is done by firefox when hinting what is the site you are typing for.

Unfortunately I don't know exactly how firefox does it, point system seems good as well, maybe you'll need to balance your points :)

I'd go for something similar to:

NoM = Number of Mail

(NoM sent to X today) + 1/2 * (NoM sent to X during the last week)/7 + 1/3 * (NoM sent to X during the last month)/30

Contacts you did not write during the last month (it could be changed) will have 0 points. You could start sorting them for NoM sent in total (since it is on the contact list :). These will be showed after contacts with points > 0

It's just an idea, anyway it is to give different importance to the most and just mailed contacts.

Andrea Ambu
Heh, guess this is a slightly more calculated version of the answer I just posted
Davy8
It's just a sum of derivatives, each one with its own weight (the coefficients 1, 1/2, 1/3 :)
Andrea Ambu
Deleted mine since you basically said the same thing and then some
Davy8
You still have the aging problem.
chris
Contacts who you not wrote during the last month gain 0 points. They will be listed after contacts with points > 0, sorted by "total number of mail sent to that contact", is it not clear on my answer? Do I need rewording it?
Andrea Ambu
No - you misunderstood what I meant by aging. With this implementation, you need to calculate age on every instance of the address in order for the formula to work.
chris
I see it as a must if you want to combine "recentness" and frequency. Anyway it could be cached every time you send an email :)
Andrea Ambu
So... O(n)=? lol, it's only been 2-3 years and I don't remember how to calculate Big-O of all but the most trivial things.You're right though since the case for this is lots of entries, performance could come into play here.
Davy8
+2  A: 

If you want to get crazy, mark the most 'active' emails in one of several ways:

  • Last access
  • Frequency of use
  • Contacts with pending sales
  • Direct bosses
  • Etc

Then, present the active emails at the top of the list. Pay attention to which "group" your user uses most. Switch to that sorting strategy exclusively after enough data is collected.

It's a lot of work but kind of fun...

Corbin March
Heh, more complicated than I'm probably looking for, but could be useful to someone else, so +1
Davy8
+1  A: 

I like the idea of a point-based system, with points for recent use, frequency of use, and potentially other factors (prefer contacts in the local domain?).

I've worked on a few systems like this, and neither "most recently used" nor "most commonly used" work very well. The "most recent" can be a real pain if you accidentally mis-type something once. Alternatively, "most used" doesn't evolve much over time, if you had a lot of contact with somebody last year, but now your job has changed, for example.

Once you have the set of measurements you want to use, you could create an interactive apoplication to test out different weights, and see which ones give you the best results for some sample data.

Mark Bessey