views:

138

answers:

7
+6  A: 

This is one way to do it:

select * from table_a, table_b
where table_a.sku = table_b.sku
and abs(datediff(second,table_a.datetime,table_b.datetime))<=3

Be careful: with big tables, this kind of join can be very slow.

tekBlues
Must? Not really, there are many different ways of doing it...
Guffa
Sorry, I'm not a native English speaker and picked the wrong word. Fixed.
tekBlues
A: 
SELECT tbl1.*, tbl2.*
FROM tbl1, tbl2
WHERE ABS(DATEDIFF(second, tbl1.date, tbl2.date)) <= 3
Justin Balvanz
I am sure this debate exists somewhere else on the site, but aren't we encouraging the use of INNER JOIN and LEFT JOIN syntax over putting the join clause in the WHERE statement now?
Bill
I think it's a personal choice. With a complicated JOIN you could potentially be risking readability in the code. We all know how hard it is to do proper documentation, so the more you can make the code look like what you intend to do, the better. That's how I like to see things. However, if I ran this and found it to be tremendously lagging, I would adjust accordingly. Write good, readable code first, then improve performance later...
Justin Balvanz
This is not about performance. The "old style" joins are simply inferior to the JOIN syntax, even optically. And... well... they are old. As in "deprecated over 15 years ago". Which is eons ago. ;-)
Tomalak
+5  A: 
SELECT
  t1.id,
  t2.id
FROM
  t1
  INNER JOIN t2 ON ABS(DATEDIFF(ms, t1.datefield, t2.datefield)) <= 3000
WHERE
  ...

Beware, this will be slow. And probably not always right (as in: it will not always join records that should be joined because they belong together; it will of course always be technically correct).

EDIT:

Changed from ABS(DATEDIFF(ss, t1.datefield, t2.datefield)) <= 3 to the above because of @richardtallent's excellent observation in the comments.

Tomalak
I agree with this answer and up-voted it, but the OP may want to use "ms" instead of "ss" and test for 3000 instead of 3. Otherwise, deltas of less than 3.5 seconds will be rounded down to 3 seconds and included in the result, which may not be desired.
richardtallent
Absolutely correct. Thanks for that.
Tomalak
It's actually worse than that: deltas of 3.999 seconds could be included while deltas of 2.999 could be excluded. DATEDIFF counts boundary crossings, so 0 to 2.999 would be 2 s (2999 ms) while 0 to 3.999 would be 3 s (3999 ms). The updated version is better.
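A couple of sample values make this concrete (a sketch against SQL Server; the literals are illustrative):

```sql
-- DATEDIFF counts boundary crossings, not elapsed time:
SELECT DATEDIFF(second, '2009-07-01 00:00:00.000', '2009-07-01 00:00:02.500');  -- 2
SELECT DATEDIFF(second, '2009-07-01 00:00:00.500', '2009-07-01 00:00:03.000');  -- 3, though only 2.5 s elapsed
SELECT DATEDIFF(ms,     '2009-07-01 00:00:00.500', '2009-07-01 00:00:03.000');  -- 2500
```

So the seconds-based `<= 3` test can admit deltas of nearly 4 seconds, while the milliseconds test measures the actual gap.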
Michael Haren
+2  A: 

Two ways spring to mind:

... on other.date between dateadd(second,-3,log.date) and dateadd(second,3,log.date)

and:

... on abs(datediff(millisecond, other.date, log.date)) <= 3000

Check which one gives you the best performance.

(Edit: Changed seconds to milliseconds in the second alternative, as richardtallent pointed out in a comment to a different answer.)

Guffa
My guess is that they perform roughly equally. Both are scalar operations on a numerical value, and neither can use an index. The time needed to create two DATEADD() results or one DATEDIFF() result per record is negligible, so the performance hit comes from the table scan and the unavailability of an index.
Tomalak
Good answer. It's critical to use field values rather than expressions.
le dorfier
+2  A: 

It's challenging to do it efficiently, because you end up querying using an expression rather than field values directly, and such queries don't use indexes well.

In particular, don't use DATEDIFF.

You might have a chance with

WHERE DATEADD(ms, -3000, date1) < date2 AND DATEADD(ms, 3000, date1) > date2

or

WHERE date2 BETWEEN DATEADD(ms, -3000, date1) AND DATEADD(ms, 3000, date1)

Note: It's usually possible to refactor the insertions to have exactly the same timestamp values if they derive from the same event or transaction (but then they would have some other common key and you wouldn't need to do this).

You could also store the original value at a lower resolution - say full second, or even minute - depending on your requirements, but then you'd still have the boundary problems.
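One common T-SQL idiom for that kind of truncation, sketched with illustrative column/table names:

```sql
-- Truncate a datetime to the minute by counting whole minutes since the
-- epoch (0 = '1900-01-01') and adding them back:
SELECT DATEADD(minute, DATEDIFF(minute, 0, log_date), 0) AS log_minute
FROM log_table;
```

As noted, two rows 3 seconds apart can still fall on opposite sides of a minute boundary, so this doesn't remove the boundary problem.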

le dorfier
IMHO, this will result in a table scan just like the other scalar expressions, so probably it won't perform any better. +1 for providing alternatives in any case.
Tomalak
Thanks, I have a single-digit confidence level myself. But I think it's worth pointing out specifically what characteristics of a SQL statement make it more or less likely to be optimizable.
le dorfier
It was just a thought on my part, no criticism. I share your POV. ;-)
Tomalak
A: 

Not tested, but this should work...

SELECT *
FROM tbl1
  INNER JOIN tbl2 ON tbl1.id = tbl2.id
    AND DATEDIFF(second, tbl1.dttime, tbl2.dttime) BETWEEN -3 AND 3
Bill
+1  A: 

Can you depend on the order of the datetime values for rows that should match? For example, does the log entry always precede the entry into the other table? If so, modify one of the answers above to remove the ABS() function (taking special care of the order of the DATEDIFF parameters). This will help to prevent false matches.
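For example, taking the DATEDIFF-based answer above and assuming the log row (t1) always carries the earlier timestamp (a sketch, not tested):

```sql
SELECT t1.id, t2.id
FROM t1
  INNER JOIN t2 ON DATEDIFF(ms, t1.datefield, t2.datefield) BETWEEN 0 AND 3000
```

The one-sided range halves the window and avoids matching rows where t2 precedes t1.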

Jesse
+1 I like your POV. I will definitely take a look at it.
THEn