views:

273

answers:

4

OK, so say I have a table with 500K rows, then I ad-hoc query with unsupported indexing which requires a full table scan. I would like to immediately view the first rows returned while the full table scan continues. Then I want to scroll thru the next results. In the meantime, I would like to display the progress of the table scan, example: "SEARCHING.. FOUND 23 OF 500,000 ROWS SO FAR". If I scroll too far ahead, I want to display a message like: "REACHED LAST ROW IN LOOK-AHEAD BUFFER.. QUERY HAS NOT COMPLETED".

In 1990, I saw an app writen in C with INFORMIX ESQL/C calls which started displaying rows even though the query had not completed.

Can a feature like this be transparently available with Oracle, Informix, MySQL, or other RDBMS?

Maybe using: spawn/exec, declare scroll cursor, open, fetch, etc.?

My current platform is IBM Informix-4GL 7.5 and IBM Informix Dynamic Server (IDS) 9.5 (non-existent version; options are 9.30, 9.40, 10.00, 11.10, 11.50) on Linux, but I would like to know how this feature could be implemented in Oracle, SQL Server, etc. The whole idea is to have the ability to start viewing rows before a long query completes, while displaying a counter of how many rows are available for immediate viewing, and how many rows must be scanned.


See also continuation question SO 2577034.

+1  A: 

It can be done, with an analytic function, but Oracle has to full scan the table to determine the count no matter what you do if there's no index. An analytic could simplify your query:

SELECT x,y,z, count(*) over () the_count
  FROM your_table
 WHERE ...

Each row returned will have the total count of rows returned by the query in the_count. As I said, however, Oracle will have to finish the query to determine the count before anything is returned.

Depending on how you're processing the query (e.g., a PL/SQL block in a form), you could use the above query to open a cursor, then loop through the cursor and display sets of records and give the user the chance to cancel.

DCookie
Would changing optimizer goal to `FIRST_ROWS` help ?
Sathya
Not if you want the count as well, because oracle, well, has to count all the rows ;-)
DCookie
+1  A: 

I'm not sure how you would accomplish this, since the query has to complete prior to the results being known. No RDBMS (that I know of) offers any means of determining how many results to a query have been found prior to the query completing.

I can't speak factually for how expensive such a feature would be in Oracle because I have never seen the source code. From the outside in, however, I think it would be rather costly and could double (if not more) the length of time a query took to complete. It would mean updating an atomic counter after each result, which isn't cheap when you're talking millions of possible rows.

Tim Post
+3  A: 

Paraphrasing:

I have a table with 500K rows. An ad-hoc query without a good index to support it requires a full table scan. I would like to immediately view the first rows returned while the full table scan continues. Then I want to scroll through the next results.

It seems that what you would like is some sort of system where there can be two (or more) threads at work. One thread would be busy synchronously fetching the data from the database, and reporting its progress to the rest of the program. The other thread would be dealing with the display.

In the meantime, I would like to display the progress of the table scan, example: "Searching...found 23 of 500,000 rows so far".

It isn't clear that your query will return 500,000 rows (indeed, let us hope it does not), though it may have to scan all 500,000 rows (and may well have only found 23 rows that match so far). Determining the number of rows to be returned is hard; determining the number of rows to be scanned is easier; determining the number of rows already scanned is very difficult.

If I scroll too far ahead, I want to display a message like: "Reached last row in look-ahead buffer...query has not completed yet".

So, the user has scrolled past the 23rd row, but the query is not yet completed.

Can this be done? Maybe like: spawn/exec, declare scroll cursor, open, fetch, etc.?

There are a couple of issues here. The DBMS (true of most databases, and certainly of IDS) remains tied up as far as the current connection on processing the one statement. Obtaining feedback on how a query has progressed is difficult. You could look at the estimated rows returned when the query was started (information in the SQLCA structure), but those values are apt to be wrong. You'd have to decide what to do when you reach row 200 of 23, or you only get to row 23 of 5,697. It is better than nothing, but it is not reliable. Determining how far a query has progressed is very difficult. And some queries require an actual sort operation, which means that it is very hard to predict how long it will take because no data is available until the sort is done (and once the sort is done, there is only the time taken to communicate between the DBMS and the application to hold up the delivery of the data).

Informix 4GL has many virtues, but thread support is not one of them. The language was not designed with thread safety in mind, and there is no easy way to retrofit it into the product.

I do think that what you are seeking would be most easily supported by two threads. In a single-threaded program like an I4GL program, there isn't an easy way to go off and fetch rows while waiting for the user to type some more input (such as 'scroll down the next page full of data').

The FIRST ROWS optimization is a hint to the DBMS; it may or may not give a significant benefit to the perceived performance. Overall, it typically means that the query is processed less optimally from the DBMS perspective, but getting results to the user quickly can be more important than the workload on the DBMS.


Somewhere down below in a much down-voted answer, Frank shouted (but please don't SHOUT):

That's exactly what I want to do, spawn a new process to begin displaying first_rows and scroll through them even though the query has not completed.

OK. The difficulty here is organizing the IPC between the two client-side processes. If both are connected to the DBMS, they have separate connections, and therefore the temporary tables and cursors of one session are not available to the other.

When a query is executed, a temporary table is created to hold the query results for the current list. Does the IDS engine place an exclusive lock on this temp table until the query completes?

Not all queries result in a temporary table, though the result set for a scroll cursor usually does have something approximately equivalent to a temporary table. IDS does not need to place a lock on the temporary table backing a scroll cursor because only IDS can access the table. If it was a regular temp table, there'd still not be a need to lock it because it cannot be accessed except by the session that created it.

What I meant with the 500k rows, is nrows in the queried table, not how many expected results will be returned.

Maybe a more accurate status message would be:

Searching 500,000 rows...found 23 matching rows so far

I understand that an accurate count of nrows can be obtained in sysmaster:sysactptnhdr.nrows?

Probably; you can also get a fast and accurate count with 'SELECT COUNT(*) FROM TheTable'; this does not scan anything but simply accesses the control data - probably effectively the same data as in the nrows column of the SMI table sysmaster:sysactptnhdr.

So, spawning a new process is not clearly a recipe for success; you have to transfer the query results from the spawned process to the original process. As I stated, a multithreaded solution with separate display and database access threads would work after a fashion, but there are issues with doing this using I4GL because it is not thread-aware. You'd still have to decide how the client-side code is going store the information for display.

Jonathan Leffler
Perhaps, if the progress of the query could be periodically displayed (every 100 rows)like "SEARCHING 500,000 rows, AT ROW n00, FOUND n, Continue (Y/N)?" while giving the user the option to cancel, it would be a nice feature to have since sometimes users don't realize the scope of their queries, which can tie-up valuable resources.
Frank Computer
Frank Computer
A: 

is it possible to create an SPL to select the first row from a table, increment a counter, read the next row and continue in this loop until EOF is reached?

Frank Computer
@Frank: Please take the time to read the [FAQ](http://stackoverflow.com/faq). You'll learn that this is not a discussion forum. "Replying to the thread" is not done here. Your last three answers should have been edits to your original question, and not three new answers.
John Saunders