I'm working on a web service that fetches data from an Oracle data source in chunks and passes it back to an indexing/search tool in XML format. I'm the C#/.NET guy, and I'm kind of fuzzy on parts of Oracle.

Our Oracle team gave us the following script to run, and it works well:

SELECT ROWID, [columns]
FROM [table]
WHERE ROWID IN (
    SELECT ROWID
    FROM (
        SELECT ROWID
        FROM [table]
        WHERE ROWID > '[previous_batch_last_rowid]'
        ORDER BY ROWID
    )
    WHERE ROWNUM <= 10000
)
ORDER BY ROWID

10,000 rows is an arbitrary but reasonable chunk size and ROWID is sufficiently unique for our purposes to use as a UID since each indexing run hits only one table at a time. Bracketed values are filled in programmatically by the web service.
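For illustration only, the paging contract described above (the consumer hands back the last key it saw and the service returns the next chunk) might be sketched like this. This is not the actual service code: it uses Python's sqlite3 module and SQLite's LIMIT as stand-ins for the Oracle connection and the ROWNUM wrapper, and demo_table, row_key and fetch_chunk are hypothetical names.

```python
import sqlite3

def fetch_chunk(conn, last_key, chunk_size):
    """Return the next chunk of rows whose key sorts after last_key."""
    cur = conn.execute(
        "SELECT row_key, payload FROM demo_table "
        "WHERE row_key > ? ORDER BY row_key LIMIT ?",
        (last_key, chunk_size),
    )
    return cur.fetchall()

# Stand-in data: 25 rows with sortable string keys (hypothetical table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_table (row_key TEXT PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO demo_table VALUES (?, ?)",
    [(f"r{i:04d}", f"row {i}") for i in range(25)],
)

# Each call is independent: the only thing carried between requests
# is the last key of the previous chunk, supplied by the consumer.
page_sizes = []
last_key = ""  # empty string sorts before every key in this demo
while True:
    chunk = fetch_chunk(conn, last_key, 10)
    if not chunk:
        break
    page_sizes.append(len(chunk))
    last_key = chunk[-1][0]
```

The service itself stays stateless here because the position in the table travels with the request rather than being stored between calls.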

Now we're going to start adding views to the indexing, each of which will union a few separate tables. Since ROWID would no longer function as a unique identifier, they added a column to the views (VIEW_UNIQUE_ID) that concatenates the ROWIDs from the component tables to construct a UID for each union.

But this script does not work, even though it follows the same form as the previous one:

SELECT VIEW_UNIQUE_ID, [columns]
FROM [view]
WHERE VIEW_UNIQUE_ID IN (
    SELECT VIEW_UNIQUE_ID
    FROM (
        SELECT VIEW_UNIQUE_ID
        FROM [view]
        WHERE VIEW_UNIQUE_ID > '[previous_batch_last_view_unique_id]'
        ORDER BY VIEW_UNIQUE_ID
    )
    WHERE ROWNUM <= 10000
)
ORDER BY VIEW_UNIQUE_ID

It hangs indefinitely with no response from the Oracle server. I've waited 20+ minutes and the SQLTools dialog box indicating a running query remains the same, with no progress or updates.

I've tested each subquery independently and each works fine and takes a very short amount of time (<= 1 second), so the view itself is sound. But as soon as the inner two SELECT queries are added with "WHERE VIEW_UNIQUE_ID IN...", it hangs.

Why doesn't this query work for views? In what important way are they not interchangeable here?

Updated: the architecture of the solution stipulates that it is to be stateless, so I shouldn't try to make the web service preserve any index state information between requests from consumers.

+3  A: 

they added a column to the views (VIEW_UNIQUE_ID) that concatenates the ROWIDs from the component tables to construct a UID for each union.

God, that is the most obscene idea I've seen in a long time. Let's say the view is a simple one like

SELECT C.CUST_ID, C.CUST_NAME, O.ORDER_ID, C.ROWID||':'||O.ROWID VIEW_UNIQUE_ID
FROM CUSTOMER C JOIN ORDERS O ON C.CUST_ID = O.CUST_ID

Every time you want to do the

SELECT VIEW_UNIQUE_ID
FROM [view]
WHERE VIEW_UNIQUE_ID > '[previous_batch_last_view_unique_id]'
ORDER BY VIEW_UNIQUE_ID

it has to build that entire result set, apply the filter, and order it before it can return a single row. For anything other than trivially sized tables, that will be a nightmare.

Stop using the database to paginate/chunk the data here and do that in the client. Open the database connection, execute the query, fetch the first ten thousand rows from the query, index them, then fetch the next ten thousand. Don't close and reopen the query for each chunk; close it only after you've processed every row. You'll be able to forget about ordering.
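A minimal sketch of that client-side chunking, assuming a DB-API style driver. Python's sqlite3 stands in for the Oracle connection here, and demo_view, index_in_chunks and the index_batch callback are made-up names for illustration.

```python
import sqlite3

def index_in_chunks(conn, query, chunk_size, index_batch):
    """Run the query once and hand its rows to index_batch chunk by chunk."""
    cur = conn.cursor()
    cur.execute(query)  # executed exactly once, with no ORDER BY
    total = 0
    while True:
        rows = cur.fetchmany(chunk_size)  # next chunk from the same open cursor
        if not rows:
            break
        index_batch(rows)
        total += len(rows)
    cur.close()  # closed only after every row has been processed
    return total

# Stand-in data: an in-memory table playing the role of the view.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_view (view_unique_id TEXT, payload TEXT)")
conn.executemany(
    "INSERT INTO demo_view VALUES (?, ?)",
    [(f"id{i:05d}", f"row {i}") for i in range(25)],
)

batch_sizes = []
total_rows = index_in_chunks(
    conn,
    "SELECT view_unique_id, payload FROM demo_view",
    10,
    lambda rows: batch_sizes.append(len(rows)),
)
```

The cursor keeps its place between fetches, so no "last seen ID" filter or sort is needed. The trade-off is that the connection has to stay open for the whole run.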

Gary
Unfortunately, the architecture of the system specifically stipulates that it's supposed to be stateless. (Edited post to say so explicitly.)
Calvin Fisher
A: 

For stateless, you need to re-architect. The whole thing with concatenated ROWIDs will not fly.

Start by putting the records to be processed into a fresh table, then you can flag them/process them/delete them in chunks.

INSERT INTO pending_table
SELECT 'N' state_flag, v.* FROM view v;

<start looping here>

UPDATE pending_table
SET state_flag = 'P'
WHERE ROWNUM <= 10000;

COMMIT;

SELECT * FROM pending_table
WHERE state_flag = 'P';

<client processing>

DELETE FROM pending_table
WHERE state_flag = 'P';

<go back to start of loop, and keep going until pending_table is empty>
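The loop above could be driven from the client roughly as follows. This is only a sketch: sqlite3 stands in for Oracle, so the ROWNUM filter is replaced by a LIMIT subquery, and source_view, drain_pending and the process callback are hypothetical names.

```python
import sqlite3

def drain_pending(conn, chunk_size, process):
    """Flag, read, process, and delete rows in chunks until none remain."""
    while True:
        # Flag the next chunk (the LIMIT subquery stands in for the ROWNUM filter).
        conn.execute(
            "UPDATE pending_table SET state_flag = 'P' "
            "WHERE rowid IN (SELECT rowid FROM pending_table "
            "WHERE state_flag = 'N' LIMIT ?)",
            (chunk_size,),
        )
        conn.commit()
        rows = conn.execute(
            "SELECT id, payload FROM pending_table WHERE state_flag = 'P'"
        ).fetchall()
        if not rows:
            break  # nothing left to flag, so pending_table is empty
        process(rows)
        # Remove the chunk only after the client has processed it.
        conn.execute("DELETE FROM pending_table WHERE state_flag = 'P'")
        conn.commit()

# Stand-in source: a plain table playing the role of the view.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_view (id INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO source_view VALUES (?, ?)",
    [(i, f"row {i}") for i in range(25)],
)
conn.execute("CREATE TABLE pending_table (state_flag TEXT, id INTEGER, payload TEXT)")
conn.execute("INSERT INTO pending_table SELECT 'N', v.id, v.payload FROM source_view v")
conn.commit()

chunk_sizes = []
drain_pending(conn, 10, lambda rows: chunk_sizes.append(len(rows)))
remaining = conn.execute("SELECT COUNT(*) FROM pending_table").fetchone()[0]
```

All progress lives in pending_table, so each pass through the loop could just as well be a separate request to the web service.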
Gary
Actually, it's working fine now. Apparently, they made some mistake in the view command and that was causing the whole thing. Performance isn't even noticeably different from querying a real table. Thanks for your help though!
Calvin Fisher
Their description sounds like what you said: "It was logical view of the data... converted it to a materialized view..."
Calvin Fisher