tags:

views:

276

answers:

9

Have/Want List Matching Algorithm

I am implementing an item trading system on a high-traffic site. I have a large number of users that each maintain a HAVE list and a WANT list for a number of specific items. I am looking for an algorithm that will allow me to efficiently suggest trading partners based on your HAVEs and WANTs matched with theirs. Ideally I want to find partners with the highest mutual trading potential (i.e. I have a ton of things you want, you have a ton of things I want). I don't need to find the global highest-potential pair (which sounds hard), just find the highest-potential pairs for a given user (or even just some high-potential pairs, not the global max).

Example:

User 1 HAS A,C WANTS B,D

User 2 HAS D WANTS A

User 3 HAS A,B,D WANTS C

User 1 goes to the site and clicks a button that says 
  "Find Trading Partners" and the top-ranked result is
   User 3, followed by User 2.

An additional source of complexity is that the items have different values, and I want to match on the highest valued trade possible, rather than on the most number of matches between two traders. So in the example above, if all items are worth 1, but A and D are both worth 10, User 1 now gets matched with User 2 above User 3.

A naive way to do this would to compute the max trade value between the user looking for partners vs. all other users in the database. I'm thinking with some lookup tables on the right things I might be able to do better. I've tried googling around, since this seems like a classical problem, but I don't know the name for it.

Can anyone recommend a good approach to solving this problem? I've seen sites like the Magic Online Trading League that seem to solve it in realtime.

A: 

http://www.twitlonger.com/show/d828a3ae69cca8a9201d98720e9d97bd

TaslemGuy
I don't think you understood the question.
John Shedletsky
Yes, I do. It's just a breadth-first search.... It starts from any user, and must end there again. It must pass through and satisfy all or the path is not complete, and it can't pass the same way twice. When its done, it's path shows what to swap, in a term of pathfinding.
TaslemGuy
It's not a breadth-first search. The naive algorithm, as I explained, iterates through a list of N users.
John Shedletsky
A: 

Of course you could always seperate the system into three categories; "Wants," "Haves," and "Open Offers." So lets say User1 has Item A, User2 has Item B & C and is trading those for item A, but User1 still wants Item D, and User2 wants Item E. So User1 (assuming he's the trade "owner") puts a request, or want for Item D and Item E, thus the offer stands, and goes on the "Open Offers" list. If it isn't accepted or edited within two or so days, it's automatically cancelled. So User3 is looking for Item F and Item G, and searches on the "Have list" for Items F & G, which are split between User1 & User2. He realizes that User1 and User2's open offer includes requests for Items D & E, which he has. So he chooses to "join" the operation, and it's accepted on their terms, trading and swaping they items among them.

Lets say User1 now wants Item H. He simply searches on the "Have" list for the item, and among the results, he finds that User4 will trade Item H for Item I, which User1 happens to have. They trade, all is well.

KnexerKid
I don't think you understood the question.
John Shedletsky
+2  A: 

You could do this in O(n*k^2) (n is the number of people, k is the average number of items they have/want) by keeping hash tables (or, in a database, indexes) of all the people who have and want given items, then giving scores for all the people who have items the current user wants, and want items the current user has. Display the top 10 or 20 scores.


[Edit] Example of how this would be implemented in SQL:

-- Get score for @userid wants
SELECT UserHas.UserID, SUM(Items.Weight) AS Score
FROM UserWants
INNER JOIN UserHas ON UserWants.ItemID = UserHas.ItemID
INNER JOIN Items ON Items.ItemID = UserWants.ItemID
WHERE UserWants.UserID = @userid
GROUP BY UserWants.UserID, UserHas.UserID

This gives you a list of other users and their score, based on what items they have that the current user wants. Do the same for items the current user has the others want, then combine them somehow (add the scores or whatever you want) and grab the top 10.

BlueRaja - Danny Pflughoeft
O(n*k^2) might be the best you can do if you care about the optimal answer. If you could cleverly pick the people that *likely* were going to match up, maybe you could get results that were almost as good, but in realtime?
John Shedletsky
@John: For a single user, this is only going to take O(k^2) - unless every user has 10,000 haves/wants, this will be realtime. Assuming you're using a database, having tables of haves, wants, and doing an inner join on tables of weights and users should be fast enough; or, like Netflix, you could have a table for the best matchings and update it every day or two.
BlueRaja - Danny Pflughoeft
The SQL is helpful - isn't it O(k lg k) in the number of items? You have to traverse every single one of the Wants, but for the Haves, you can find a given ID in lg k, if there's an index on items.
John Shedletsky
@John: There are approx. k items this user wants; for every one of those items, there are approx. k users who have it. I initially stated using an O(1) hash-table, which is where I get O(k^2) from.
BlueRaja - Danny Pflughoeft
A: 

Just make it BC only. That solves all problems.

Luigi7723
I don't think you understood the question.
John Shedletsky
A: 

You could maintain a per-item list (as a complement to per-user list). Item search is then spot on. Now you can allow your self brute force search for most valuable pair by checking most valuable items first. If you want more complex (arguably faster) search you could introduce set of items that often come together as meta-items, and look for them first.

Dialecticus
A: 

I mark item by letter and user by number.

  • m - number of items in all have/want lists (have or want, not have and want)
  • x - number of users.

For each user you have list of his wants and haves. Left line is want list, right is have list (both will be sorted so we can use binary search).

1 - ABBCDE FFFGH
2 - CFGGH BE
3 - AEEGH BBDF

For each pair of users you generate two values and store them somewhere, you'd only generate it once and than actualize. Sorting first table and generating second, is O(m*x*log(m/x)) + O(log(m)) and will require O(x^2) extra memory. These values are: how many would first user get and how many another (if you want you can modify these values by multiplying them by value of particular item).

1-2 : 1 - 3 (user 1 gets 1) - (user 2 gets 3)
1-3 : 3 - 2
2-3 : 1 - 1 

You also compute and store best trader for each user. After you've generated this helpful data you can quickly query.

  • Adding/Removing item - O(m*log(m/x)) (You loop through user's have/want list and do binary search on have/want list of every other user and actualize data)
  • Finding best connection - O(1) or O(x) (Depends on whether result stored in cache is correct or needs to be updated. You loop through user's pairs and do whatever you want with data to return to user the best connection)

By m/x I estimate number of items in single user's want/have list.

In this algorithm I'm assuming that all data isn't stored in Database (I don't know if binary search is possible with Databases) and that inserting/removing item into list is O(1).

PS. Sorry for bad english and I hope I've computed it all correctly and that it is working because I also need it.

azram19
A: 

Okay, what about this:

There are basically giant "Pools"

Each "pool" contains "sections." Each "Pool" is dedicated to people who own a specific item. Each section is for people who own that item, and want another.

What I mean:

Pool A (For those requesting A)

--Section B (For those requesting A that have B)

--Section C (For those requesting A that have C, even if they also have B)

Pool B

--Section A

--Section B

Pool C

--Section A

--Section C

Each section is filled with people. "Deals" would consist of one "Requested" item, and a "Pack," you're willing to give any or all of the items up to get the item you requested.

Every "Deal" is calculated per-pool.... if you want a given item, you go to the pools of the items you'd be willing to give, and it find the Section which belongs to the item you are requesting.

Likewise, your deal is placed in the pools. So you can immediately find all of the applicable people, because you know EXACTLY which pools, and EXACTLY which sections to search in, no sorting necessary once they've entered the system.

And, then, age would have priority, older deals would be picked, rather than new ones.

TaslemGuy
Yikes. The space required here is O(I^2) where I is the number of items? Plus, I think doesn't solve the problem as stated. It solves a related problem.
cape1232
+1  A: 

This problem looks pretty similar to stable roomamates problem. I don't see any thing wrong with the SQL implementation that got highest votes but as some else suggested this is like a dating/match making problem similar to the lines of stable marriage problem but here all the participants are in one pool. The second wikipedia entry also has a link to a practical solution in javascript which could be useful

satyajit
Thanks for the tips - I'm going to check those out.
John Shedletsky
A: 

Let's assume you can hash your items, or at least sort them. Assume your goal is to find the best result for a given user, on request, as in your original example. (Optimizing trading partners to maximize overall trade value is a different question.)

This would be fast. O(log n) for each insertion operation. Worst case O(n) for suggesting trading partners, but you bound this by processing time.

  1. You're already maintaining a list of items per user.
  2. Give each user a score equal to the sum of the values of the items they have.
  3. Maintain a list of user-HAVES and user-WANTS per item (@Dialecticus), sorted by user score. (You can sort on demand, or keep the lists sorted dynamically every time a user changes their HAVE list.)
  4. When a user user1 requests suggested trade partners
    • Iterate over their items item in order by value.
    • Iterate over the user-HAVES user2 for each item, in order by user score.
    • Compute trade value for user1 trades-with user2.
    • Remember best trade so far.
    • Keep hash of users processed so far to avoid recomputing value for a user multiple times.
    • Terminate when you run out of processing time (your real-time guarantee).

Sorting by item value and user score is the approximation that makes this fast. I'm not sure how sub-optimal it would be, though. There are certainly easy examples where this would fail to find the best trade if you don't run it to completion. In practice, it seems like it might be good enough. In the limit, you can make it optimal by letting it run until it exhausts the lists in step 4.1 and 4.2. There's extra memory cost associated with the inverted lists, but you didn't say you were memory constrained. And generally, if you want speed, it's not uncommon to trade-off space to get it.

cape1232
I don't think 4 is O(m). If I understood it properly, you iterate over every user's Have list and for each user you compute trade value. It'like O(m) + O(x*z). Where z depends on metod you use to compute trade value.
azram19