views:

133

answers:

3

I have an as400 table containing roughly 1 million rows of full names / company names which I would like to convert to use another datastore while still matching the speed of the original.

Currently, a user enters the search and almost instantaneously gets the alphabetical position of the search term in the table and and a page of matches. The user can then paginate either up or down through the records very quickly.

There is almost no updating of the data and approximately 50 inserts per week. I'm thinking that any database can maintain an alphabetical index of the names, but I'm unsure of how to quickly find the position of the search within the dataset. Any suggestions are greatly appreciated.

+2  A: 

This sounds just like a regular pagination of results, except that instead of going to a specific page based on a page number or offset being requested, it goes to a specific page based on where the user's search fits in the results alphabetically.

Let's say you want to fetch 10 rows after this position, and 10 rows before.

If the user searches for 'Smith', you could do two selects such that:

SELECT
  name
FROM
  companies
WHERE
  name < 'Smith'
ORDER BY
  name DESC
LIMIT 10

and then

SELECT
  name
FROM
  companies
WHERE
  name >= 'Smith'
ORDER BY
  name
LIMIT 10

You could do a UNION to fetch that in one query, the above is just simplified.

The term the user searched for would fit half way through these results. If there are any exact matches, then the first exact match will be positioned such that it is eleventh.

Note that if the user searches for 'aaaaaaaa' then they'll probably just get the 10 first results with nothing before it, and for 'zzzzzzzz' they may get just the 10 last results.

I'm assuming that the SQL engine in question allows >= and < comparisons between strings (and can optimise that in indexes), but I haven't tested this, maybe you can't do this. If, like MySQL, it supports internationalized collations then you could even have the ordering done correctly for non-ascii characters.

thomasrutter
+1  A: 

If by "the position of the search" you mean the number of the record if they were enumerated alphabetically, you may want to try something like:

select count(*) from companies where name < 'Smith'

Most databases ought to optimize that reasonably well (but try it--theories you read on the web don't trump empirical data).

MarkusQ
+1  A: 

Just to add to the ordering suggestions:

  • Add an index to the name if this is your standard means of data retrieval.
  • You can paginate efficiently by combining LIMIT and OFFSET.
HUAGHAGUAH