tags:

views:

209

answers:

9

Hi, I have table with "varchar2" as primary key.
It has about 1 000 000 Transactions per day.

My app wakes up every 5 minute to generate text file by querying only new record.
It will remember last point and process only new records.

  1. Do you have idea how to query with good performance?
    I am able to add new column if necessary.

  2. What do you think this process should do by?

    • plsql?
    • java?
+2  A: 

Well, if you can add a new column, you could create a Processed column, which will indicate processed records, and create an index on this column for performance.

Then the query should only be for those rows that have been newly added, and not processed.

This should be easily done using sql queries.

astander
What is query which updated immediately when I select it . If you have it, this solution should acceptable.
Hlex
@Hlex you could use a trigger or stored procedure to accomplish that.
XpiritO
If you use PL/SQL, then better to do it in PL/SQL then doing it in TRIGGER.
Guru
Trigger to write file may have problem when try to write file concurrently. I think it costly to open-close file every new record coming.
Hlex
A: 

In MS SQL Server world where I work, we have a 'version' column of type 'timestamp' on our tables.

So, to answer #1, I would add a new column.

To answer #2, I would do it in plsql for performance.

Mark

MStodd
I see your solution. MS SQL have timestamp which change number when we modify DB. But there are no that feature in oracle.If I add sequence to new indexed int column. 1 year will running to 365 000 000 which I worry that 1 day it cycle to 0 and program will error.
Hlex
A: 

"astander" pretty much did the work for you. You need to ALTER your table to add one more column (lets say PROCESSED)..

You can also consider creating an INDEX on the PROCESSED ( a bitmap index may be of some advantage, as the possible value can be only 'y' and 'n', but test it out ) so that when you query it will use INDEX.

Also if sure, you query only for every 5 mins, check whether you can add another column with TIMESTAMP type and partition the table with it. ( not sure, check out again ).

I would also think about writing job or some thing and write using UTL_FILE and show it front end if it can be.

Guru
+2  A: 

Ah, I really hate to add another answer when the others have come so close to nailing it. But

As Ponies points out, Oracle does have a hidden column (ORA_ROWSCN - System Change Number) that can pinpoint when each row was modified. Unfortunately, the default is that it gets the information from the block instead of storing it with each row and changing that behavior will require you to rebuild a really large table. So while this answer is good for quieting the SQL Server fella, I'd not recommend it.

Astander is right there but needs a few caveats. Add a new column needs_processed CHAR(1) DEFAULT 'Y' and add a BITMAP index. For low cardinality columns ('Y'/'N') the bitmap index will be faster. Once you have the rest is pretty easy. But you've got to be careful not select the new rows, process them and mark them as processed in one step. Otherwise, rows could be inserted while you are processing that will get marked processed even though they have not been.

The easiest way would be to use pl/sql to open a cursor that selects unprocessed rows, processes them and then updates the row as processed. If you have an aversion to walking cursors, you could collect the pk's or rowids into a nested table, process them and then update using the nested table.

Scott Bailey
+3  A: 

Everyone here is really really close. However:

Scott Bailey's wrong about using a bitmap index if the table's under any sort of continuous DML load. That's exactly the wrong time to use a bitmap index.

Everyone else's answer about the PROCESSED CHAR(1) check in ('Y','N')column is right, but missing how to index it; you should use a function-based index like this:

CREATE INDEX MY_UNPROCESSED_ROWS_IDX ON MY_TABLE
  (CASE WHEN PROCESSED_FLAG = 'N' THEN 'N' ELSE NULL END);

You'd then query it using the same expression:

SELECT * FROM MY_TABLE
 WHERE (CASE WHEN PROCESSED_FLAG = 'N' THEN 'N' ELSE NULL END) = 'N';

The reason to use the function-based index is that Oracle doesn't write index entries for entirely NULL values being indexed, so the function-based index above will only contain the rows with PROCESSED_FLAG = 'N'. As you update your rows to PROCESSED_FLAG = 'Y', they'll "fall out" of the index.

Adam Musch
Sorry, terrible head cold today and I'm not completely "here" today. Adam is absolutely right that the best solution is to use a standard B-tree index and null values. Although I would do it w/o the functional index part. Call your column something like needs_processed with a default of 'Y' will set that value to 'Y' on all new columns and then after you have processed them, set the value to null which means that they will no longer be indexed. That way your index stays small and your queries remain simple. Of course taken with the caveat that I'm still heavily medicated :)
Scott Bailey
I think in your sample query you mean the right-side expression to be `'N'`.
Dave Costa
@Dave Costa: You are correct. Corrected answer.
Adam Musch
if I query by where PROCESSED_FLAG='N' , is it ok?
Hlex
@Hlex: No. You need to query by the expression being indexed.
Adam Musch
Ok, make it simple, after I update to null after process, only N value will be indexed. Collect?
Hlex
If you update a record's PROCESSED_FLAG to NULL or to any value other than 'N', that record will not be present in the index.
Adam Musch
A: 

If performance is really a problem and you want to create your file asynchronously, you might want to use Oracle Streams, which will actually get modification data from your redo log withou affecting performance of the main database. You may not even need a separate job, as you can configure Oracle Streams to do Asynchronous replication of the changes, through which you can trigger the file creation.

VikrantY
A: 

Why not create an extra table that holds two columns. The ID column and a processed flag column. Have an insert trigger on the original table place it's ID in this new table. Your logging process can than select records from this new table and mark them as processed. Finally delete the processed records from this table.

Rene
A: 

I'm pretty much in agreement with Adam's answer. But I'd want to do some serious testing compared to an alternative.

The issue I see is that you need to not only select the rows, but also do an update of those rows. While that should be pretty fast, I'd like to avoid the update. And avoid having any large transactions hanging around (see below).

The alternative would be to add CREATE_DATE date default sysdate. Index that. And then select records where create_date >= (start date/time of your previous select).

But I don't have enough data on the relative costs of setting a sysdate as default vs. setting a value of Y, updating the function based vs. date index, and doing a range select on the date vs. a specific select on a single value for the Y. You'll probably want to preserve stats or hint the query to use the index on the Y/N column, and definitely want to use a hint on a date column -- the stats on the date column will almost certainly be old.

If data are also being added to the table continuously, including during the period when your query is running, you need to watch out for transaction control. After all, you don't want to read 100,000 records that have the flag = Y, then do your update on 120,000, including the 20,000 that arrived when you query was running.

In the flag case, there are two easy ways: SET TRANSACTION before your select and commit after your update, or start by doing an update from Y to Q, then do your select for those that are Q, and then update to N. Oracle's read consistency is wonderful but needs to be handled with care.

For the date column version, if you don't mind a risk of processing a few rows more than once, just update your table that has the last processed date/time immediately before you do your select.

If there's not much information in the table, consider making it Index Organized.

Jim Hudson
A: 

What about using Materialized view logs? You have a lot of options to play with:

SQL> create table test (id_test number primary key, dummy varchar2(1000));

Table created

SQL> create materialized view log on test;

Materialized view log created

SQL> insert into test values (1, 'hello');

1 row inserted

SQL> insert into test values (2, 'bye');

1 row inserted

SQL> select * from mlog$_test;

   ID_TEST SNAPTIME$$  DMLTYPE$$ OLD_NEW$$ CHANGE_VECTOR$$
---------- ----------- --------- --------- ---------------------
         1 01/01/4000  I         N         FE
         2 01/01/4000  I         N         FE

SQL> delete from mlog$_test where id_test in (1,2);

2 rows deleted

SQL> insert into test values (3, 'hello');

1 row inserted

SQL> insert into test values (4, 'bye');

1 row inserted

SQL> select * from mlog$_test;

   ID_TEST SNAPTIME$$  DMLTYPE$$ OLD_NEW$$ CHANGE_VECTOR$$
---------- ----------- --------- --------- ---------------
         3 01/01/4000  I         N         FE
         4 01/01/4000  I         N         FE
FerranB