This question is largely driven by curiosity, as I do have a working query (it just takes a little longer than I would like).
I have a table with 4 million rows. The only index on this table is an auto-increment BigInt ID. The query is looking for distinct values in one of the columns, but only going back 1 day. Unfortunately, the ReportDate column that is evaluated is not of the DateTime type, or even a BigInt, but is char(8) in the format of YYYYMMDD. So the query is a bit slow.
SELECT Category
FROM Reports
where ReportDate = CONVERT(VARCHAR(8), GETDATE(), 112)
GROUP BY Category
Note that the date converstion in the above statement is simply converting it to a YYYYMMDD format for comparison.
I was wondering if there was a way to optimize this query based on the fact that I know that the only data I am interested in is at the "bottom" of the table. I was thinking of some sort of recursive SELECT function which gradually grew a temporary table that could be used for the final query.
For example, in psuedo-sql:
N = 128
TemporaryTable = SELECT TOP {N} *
FROM Reports
ORDER BY ID DESC
/* Once we hit a date < Today, we can stop */
if(TemporaryTable does not contain ReportDate < Today)
N = N**2
Repeat Select
/* We now have a smallish table to do our query */
SELECT Category
FROM TemproaryTable
where ReportDate = CONVERT(VARCHAR(8), GETDATE(), 112)
GROUP BY Category
Does that make sense? Is something like that possible?
This is on MS SQL Server 2008.