views:

71

answers:

4

It's my understanding that the quickest way to access a particular row is by its ROWID. In INFORMIX-SE 7.3, when I do a 'SELECT ROWID FROM table;' I notice that its values are type SERIAL[INT]. In oracle, they are SERIAL[HEX]. Has anyone ever used ROWID for any practical use?.. If I wanted to locate the most recent row added to a table, would 'SELECT MAX(ROWID) FROM table' be quicker and reliable than say 'SELECT MAX(pk_id) FROM table', where pk_id is a user-created SERIAL column?.. What other practical use have you ever put ROWID to work for you?

+1  A: 

Your understanding is not necessarily correct. The ROWID property in SQL Server is primarily intended for replication as a way to guarantee that the table has a single-field unique index value. This way the replication system does not have to account for any specific primary key semantics that your design might employ, while still being able to identify every row by a single column. No table is required to have a ROWID column unless it is part of a merge replication publication, so it's not something that every table has, unlike Oracle. It also doesn't serve the same purpose (they're Guid's--or uniqueidentifier in T-SQL parlance--on SQL Server and are random, not sequential integers like they are on Oracle).

The quickest way to retrieve a row from a table is by accessing the row via the clustered index. A table can only have one clustered index, as it's what determines the physical ordering of the rows on the disk. Furthermore, if the table has a primary key, the primary key is the clustered index. While it's possible to declare a table without a primary key and assign the clustered index to something else, I can't (off the top of my head) fathom a reason why you'd want to do this (or, for practical purposes, how you can justify having a table without a primary key).

In short, that means that the quickest way to retireve a row is by using the primary key of the table. Unless the ROWID column is the primary key (which is certainly possible to do), then it isn't the fastest way.

Adam Robinson
+1 for very concise answer. I just want to chip in and say that, while SQL primary keys are clustered by default, they don't *have* to be (as you pointed out). As to why this would be, your primary key could be a GUID, and the clustered index could be on a sequential int field, for example. I've made this change in some product databases that have clustered the GUID and have FKs on that field. It's a way to improve INSERT performance on an existing DB without changing the design.
Randolph Potter
@Adam- 1)Perhaps I didn't make my question clear, what I meant by "...by an SQL server?" was not the SQL Server product from Microsoft, I meant any SQL engine. 2)Your comment doesn't relate to the questions I posed, "Has anyone ever used ROWID for any practical use?.." or locating the most recently added row. 3) I'm very familiar with clustering, see "customer.id[serial] joining transaction.id[integer] vs. ..." question I previously posted, and 4)@Randolph- SQL primary keys are not clustered by default, unless you explicitly create a clustered index for them.
Frank Computer
@Frank: 1) If you're referring to *any* SQL engine, then the `ROWID` concept goes out the window, as it is not a standard SQL component. Not all engines have it. 2) The only practical purpose it can serve would depend entirely on the RDBMS. There's no appropriate answer if you want it to be general. 3) Not sure what you're getting at. If you're familiar with clustering, then you should know that it's the fastest way to retrieve or join a row. 4) That's simply incorrect. Primary keys *are* clustered by default in almost every RDBMS.
Adam Robinson
+1  A: 

Perhaps the term "RDBMS" rather than "an SQL server"?

Attaching any purpose to a ROWID is a bad idea. Particularly if you're in the habit of dropping and recreating tables. If your table needs a SERIAL PK, then that's what it should have. No good can come of using ROWIDs within your application.

RET
@RET- My only purpose for considering the use of ROWID is for obtaining faster performance in locating the most recently added row to a table. 'SELECT MAX(ROWID) FROM transactions' vs. 'SELECT MAX(pk_serial) FROM transactions'. Since my app was written with INFORMIX-SQL and using the Standadrd Engine (SE), ROWID's are type SERIAL[INT]. Newly added rows are placed at the end of the datafile. Rows are not deleted by users, but historic rows are not re-loaded in my reorg proc. Even if rows were to be deleted, they are flagged but not physically removed, thus their ROWID is not re-assigned.
Frank Computer
+2  A: 

Well, I can only really tell how it works in Oracle, using it for 19+ years :-)

Put simply, ROWID is an internel identification, that acts like an physical address. It can be split into database file no, block no, and row no. So obtaining the ROWID makes the db engine able to look the data up in a single direct IO.

In an index the B* tree will have ROWIDs on the leaf nodes pointing directly the location of the data, e.g. in a primary index.

Being an physical address it is submit to change on relocation on disk, which can happen after restoring a backup, rebuilding a table, or export/import of data.

The db engine can do some tricks, e.g. when moving a pluggable tablespace from one instance to another to avoid rebuilding indexes, however this is strickly db engine internals.

So to keep out of trouble leave the ROWID for internal use for the db engine. Storing the ROWID for your own usage will eventually lead to inconsistency.

Niels Wind
+1  A: 

In Informix-SE, the ROWID is basically the record number within the C-ISAM file that is used to hold the table. SE only deals in fixed size records, of course (no VARCHAR data).

In Informix Dynamic Server, the ROWID is (a) more complex (page number plus slot number) and (b) not always present (fragmented tables do not expose the ROWID, unless the table was created WITH ROWIDS, in which case the ROWID is a physical column that is indexed after all) - be aware!

If no data is ever deleted and you are using SE, then selecting the row with the maximum ROWID will be the most recently added row. If a row is deleted, then that space will eventually be reused, and then the most recently added row ceases to be the one with the maximum ROWID. (IDS does not make that promise for a variety of complex reasons.)

The SE implementation of ROWID does not store the ROWID in the table, and does not create an index on it, but it does not need an index because it knows the formula for where to go to find the data (offset in data file = ROWID * RowSize), give or take a plus one on RowSize or a minus one ROWID or both.

As to practical use for ROWID, the style that was used before fragmentation was added to IDS was to select a list of ROWID values for the records of interest in the table, maintaining that list in memory:

SELECT ROWID
  FROM InterestingTable
 WHERE SomeColumn = xxx
   AND AnotherColumn < yyy;

Then, the program could present these rows one at time, fetching the current data via the stored ROWID. The ROWID for a record would not change while a program was running. This ensured that the current data - whether edits from the current user or someone else - was shown when the record was displayed.

There's a program you're familiar with, ISQL Perform, that behaves like this. And it does not work with fragmented tables (necessarily in IDS; SE does not support fragmented tables) unless they are created with a physical ROWID column with the WITH ROWIDS clause.

Jonathan Leffler
@Jonathan- My only purpose for considering the use of ROWID is for obatining faster performance in locating the most recently added row to a table. 'SELECT MAX(ROWID) FROM transactions' vs. 'SELECT MAX(pk_serial) FROM transactions'. Since my app was written with INFORMIX-SQL and using the Standadrd Engine (SE), ROWID's are a SERIAL[INT] and newly added rows are placed at the end of the datafile. Rows are not deleted by users, but historic rows are not re-loaded on the daily reorg. I think, if rows were deleted, they are flagged but not physically removed, thus their ROWID is not re-assigned.
Frank Computer
@Frank: the key point is not to try storing ROWID values in (other) tables. Although ROWID is normally stable while a program is running (unless someone manages to alter a table, for example), you should not store ROWID values in the DB. Rows are flagged as deleted by changing the end-of-record marker from newline '\n' to NUL '\0'. However, the space *is* made available for reuse.
Jonathan Leffler
@Jonathan- I don't intend to use ROWID for storing it, just to use it in a query to locate the most recently added row, like in 'SELECT MAX(ROWID) FROM table', plus the system is single-user and the user is not allowed to delete any rows.
Frank Computer
@Jonathan- Could I use ROWID for selecting a certain percentage of rows from a table using something like 'SELECT * FROM table WHERE ROWID=(trunc(rowid/x)*x)'.. x being the 1/x fraction of rows I want returned?
Frank Computer
@Frank: I suppose you could use ROWID to select a fraction of rows (in SE; you could not easily do that in IDS). But you would be attuning your system to a quirk of the DBMS you are using.
Jonathan Leffler