I have a table with a "Date" column. Each Date may appear multiple times. How do I select only the dates that appear fewer than k times?
SELECT * FROM [MyTable] WHERE [Date] IN
(
SELECT [Date]
FROM [MyTable]
GROUP BY [Date]
HAVING COUNT(*) < @Max
)
See also @[SQLMenace]'s response. It is very similar to this one, but depending on your database the JOIN may run faster, unless the optimizer makes the difference moot.
Use the COUNT aggregate:
SELECT Date
FROM SomeTable
GROUP BY Date
HAVING COUNT(*) < @k
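To make the behaviour concrete, here is a small sketch on made-up data. The table name SampleDates, the column EventDate and the threshold of 3 are all hypothetical, and the plain string date literals are an assumption that holds on SQL Server, PostgreSQL, MySQL and SQLite (Oracle would want DATE '2024-01-01' instead).

```sql
-- Hypothetical sample table; names and values are illustrative only.
CREATE TABLE SampleDates (EventDate DATE);

INSERT INTO SampleDates (EventDate) VALUES ('2024-01-01');
INSERT INTO SampleDates (EventDate) VALUES ('2024-01-01');
INSERT INTO SampleDates (EventDate) VALUES ('2024-01-01');
INSERT INTO SampleDates (EventDate) VALUES ('2024-01-02');
INSERT INTO SampleDates (EventDate) VALUES ('2024-01-03');
INSERT INTO SampleDates (EventDate) VALUES ('2024-01-03');

-- Dates that appear fewer than 3 times (k = 3 here).
SELECT EventDate, COUNT(*) AS Occurrences
FROM SampleDates
GROUP BY EventDate
HAVING COUNT(*) < 3
ORDER BY EventDate;

-- Returns two rows: 2024-01-02 (1 occurrence) and 2024-01-03 (2 occurrences).
-- 2024-01-01 is excluded because it appears exactly 3 times.
```

Note that HAVING filters after grouping, so the comparison is applied to each date's total count, not to individual rows.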
For "appears x times" queries, the HAVING clause is the right tool. In your case, the query could look like this:
SELECT Date FROM table GROUP BY Date HAVING COUNT(*) < k
or, if you need to select columns other than Date:
SELECT * FROM table WHERE Date IN (
SELECT Date FROM table GROUP BY Date HAVING COUNT(*) < k)
You can also rewrite the IN as an INNER JOIN, but this is unlikely to improve performance, because the query optimizer in most RDBMSs will do that rewrite for you. An index on Date will generally improve performance for this query.
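As a rough sketch of the index mentioned above: the index name idx_sometable_date and the table name SomeTable are hypothetical, and the exact syntax and quoting rules vary by database (on Oracle, for instance, a column literally named Date would need to be quoted or renamed).

```sql
-- Hypothetical names; adjust to your own table and column.
-- A plain single-column index lets the engine group and count
-- on Date without scanning the full rows.
CREATE INDEX idx_sometable_date ON SomeTable (Date);
```

Whether the optimizer actually uses the index depends on your data and engine, so it is worth checking the execution plan before and after.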
SELECT date, COUNT(date)
FROM table
GROUP BY date
HAVING COUNT(date) < k
And then to get the original data back:
SELECT table.*
FROM table
INNER JOIN (
SELECT date
FROM table
GROUP BY date
HAVING COUNT(date) < k) dates ON table.date = dates.date
Assuming you are using Oracle and k = 5:
select date_col,count(*)
from your_table
group by date_col
having count(*) < 5;
If your date column also has a time component and you want to ignore it, modify the query as follows:
select trunc(date_col) as date_col,count(*)
from your_table
group by trunc(date_col)
having count(*) < 5;
select dates
from table t
group by dates having count(dates) < k ;
Hopefully this works in Oracle. HTH
Example:
DECLARE @Max int
SELECT @Max = 5
SELECT t1.*
FROM [MyTable] t1
JOIN(
SELECT [Date]
FROM [MyTable]
GROUP BY [Date]
HAVING COUNT(*) < @Max
) t2 on t1.[Date] = t2.[Date]
You may not be able to count directly on the date field if your dates include times. You may need to convert each value to just its year/month/day part first and then count on that.
Otherwise your counts will be off, since there are usually very few records with exactly the same time.
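A minimal sketch of that conversion, assuming SQL Server 2008 or later (where the date type exists) and the same [MyTable] and [Date] names used above. The aliases [Day] and Occurrences are made up for illustration.

```sql
DECLARE @Max int;
SET @Max = 5;

-- CAST(... AS date) drops the time portion, so all rows from the
-- same calendar day fall into one group.
SELECT CAST([Date] AS date) AS [Day], COUNT(*) AS Occurrences
FROM [MyTable]
GROUP BY CAST([Date] AS date)
HAVING COUNT(*) < @Max;
```

On older SQL Server versions without the date type, grouping on CONVERT(char(10), [Date], 120) (which yields yyyy-mm-dd text) is one possible substitute. On Oracle, TRUNC(date_col) plays the same role.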