views:

195

answers:

2
+4  Q: 

Circular Match

I have a database with three tables: userid_tbl, need_tbl, have_tbl

create table userid_tbl
(user_id varchar2(15) not null primary key);

create table need_tbl
(user_id varchar2(15) not null,
have_item varchar2(100) not null,
foreign key (user_id) references userid_tbl (user_id)
);

create table have_tbl
(user_id varchar2(15) not null,
have_item varchar2(100) not null,
foreign key (user_id) references userid_tbl (user_id)
);

Each user can create an unlimited number of records of needs or services they can provide to others in the database. The items are from a predetermined list.

After doing a join on the need and have table, I have matched up the needs and the wants, and discarded the requests that cannot be fulfilled by any user in a view that looks like this:

Item:         Need:            Have:
1             Bob              George
2             George           Herbie
3             Herbie           Bob
4             Larry            Wally
5             Wally            Larry
6             John             George

I am now trying to write a query where I can calculate the number of permutations of each user having their requested needs fulfilled. For instance, in the example above, Bob, George, and Herbie can together fulfill each other's needs, as can Larry and Wally, but John cannot, so the total count would be 2.

I first started out doing this with a LOOP, but there has to be a simpler, less resource intensive way to do this with a single query.

+9  A: 

See the article in my blog for detailed explanations:

If your users may be found more than once in the needs column of the table, it's an NP complex task.

If not, issue this query on your view:

SELECT  COUNT(*)
FROM    v_needs
CONNECT BY NOCYCLE
        need = PRIOR have
WHERE   CONNECT_BY_ISCYCLE = 1

CONNECT BY NOCYCLE allows loops within hierarchical queries (NOCYCLE just stops a branch when it finds a loops, no NOCYCLE returns an error).

CONNECT_BY_ISCYCLE returns true whenever it finds a loop (it returns 1 for the record which would yield the root of the current branch on the next iteration).

Note that this query will yield you all users within the loops without grouping them.

To group the users, issue this query:

SELECT  n.*, CONNECT_BY_ROOT(have), level
FROM    v_needs n
START WITH
        have IN
        (
        SELECT  MIN(have)
        FROM    (
                SELECT  have, CONNECT_BY_ROOT(have) AS root
                FROM    t_needs
                START WITH
                        have IN
                        (
                        SELECT  have
                        FROM    t_needs n
                        WHERE   CONNECT_BY_ISCYCLE = 1
                        CONNECT BY NOCYCLE
                                needs = PRIOR have
                        )
                CONNECT BY NOCYCLE
                        have = PRIOR needs
                )
        GROUP BY
                root
        )
CONNECT BY NOCYCLE
        have = PRIOR needs

Update:

From your post I can see that you have a limitation on the maximum loop length.

This makes this problem much easier to solve.

Just add the loop limiting condition into each of the the queries:

SELECT  n.*, CONNECT_BY_ROOT(have), level
FROM    v_needs n
START WITH
        have IN
        (
        SELECT  MIN(have)
        FROM    (
                SELECT  have, CONNECT_BY_ROOT(have) AS root
                FROM    t_needs
                START WITH
                        have IN
                        (
                        SELECT  have
                        FROM    t_needs n
                        WHERE   CONNECT_BY_ISCYCLE = 1
                        CONNECT BY NOCYCLE
                                needs = PRIOR have
                                AND level <= 5
                        )
                CONNECT BY NOCYCLE
                        have = PRIOR needs
                        AND level <= 5
                )
        GROUP BY
                root
        )
CONNECT BY NOCYCLE
        have = PRIOR needs
        AND level <= 5

This will stop traversing the hierarchy tree on the 5th level.

If you have 1,000,000 users in your database and 5 matches per user in average, the query will need to examine 1,000,000 * 5^5 = 3,125,000,000 rows, which is an observable number of rows.

The query will be greatly improved if you MATERIALIZE your view and create indexes on 'need' and 'have'.

In this case the query completion will be a matter of minutes.

Quassnoi
+1 It would be great if you could explain the NOCYCLE and CONNECT_BY_ISCYLE parts!
Andomar
A: 

This is a great start for me. I had been beating my head against the wall for days. Let me give a few more details to make this more clear. Unfortunately, the problem is a clique problem, and is NP-complete because all three columns are not unique.

Here's the real world application: I'm doing a research project on the efficiency of negative reciprocity, i.e. barter, in lieu of a monetary exchange where it is inappropriate, or illegal.

The specific example is organ transplants. Let's say Bob needs a kidney, and his wife is willing to donate a kidney, there is no guarantee that she is a match. However, she may be a match for another user in the database, who in return may be able to provide Bob with a matching kidney.

If I have a database with thousands of recipients, and tens of thousands of potential donors, I can potentially broker a multi-way transaction to maximize the benefit for as many recipients as possible.

There obviously have to be limitations as to the maximum number of levels required to close the loop. A 5-way transaction is possible, but a 100-way transaction is just not logistically feasible.

I had never even heard of Oracle Spatial until today. It looks like that might be exactly the product I need to make this easier.