views:

138

answers:

7
+6  A: 

This is one way to do it:

select * from table_a, table_b
where table_a.sku = table_b.sku
and abs(datediff(second,table_a.datetime,table_b.datetime))<=3

Be careful: with big tables, this kind of join can be very slow.

tekBlues
Must? Not really, there are many different ways of doing it...
Guffa
Sorry, I'm not a native English speaker and picked the wrong word. Fixed.
tekBlues
A: 
SELECT tbl1.*, tbl2.*
FROM tbl1, tbl2
WHERE ABS(DATEDIFF(second, tbl1.date, tbl2.date)) <= 3
Justin Balvanz
I am sure this debate exists somewhere else on the site, but aren't we encouraging the use of INNER JOIN and LEFT JOIN syntax over putting the join clause in the WHERE statement now?
Bill
I think it's a personal choice. With a complicated JOIN you could potentially be risking readability in the code. We all know how hard it is to do proper documentation, so the more you can make the code look like what you intend to do, the better. That's how I like to see things. However, if I ran this and found it to be tremendously lagging, I would adjust accordingly. Write good, readable code first, then improve performance later...
Justin Balvanz
This is not about performance. The "old style" joins are simply inferior to the JOIN syntax, even optically. And... well... they are old. As in "deprecated over 15 years ago". Which is eons ago. ;-)
Tomalak
+5  A: 
SELECT
  t1.id,
  t2.id
FROM
  t1
  INNER JOIN t2 ON ABS(DATEDIFF(ms, t1.datefield, t2.datefield)) <= 3000
WHERE
  ...

Beware, this will be slow. And probably not always right (as in: it will not always join records that should be joined because they belong together; it will of course always be technically correct).

EDIT:

Changed from ABS(DATEDIFF(ss, t1.datefield, t2.datefield)) <= 3 to the above because of @richardtallent's excellent observation in the comments.

Tomalak
I agree with this answer and up-voted it, but the OP may want to use "ms" instead of "ss" and test for 3000 instead of 3. Otherwise, deltas of less than 3.5 seconds will be rounded down to 3 seconds and included in the result, which may not be desired.
richardtallent
Absolutely correct. Thanks for that.
Tomalak
It's actually worse than that: deltas of 3.999 seconds could be included while deltas of 2.999 could be excluded. DATEDIFF counts boundary crossings, so 0 to 2.999 would be 2 s (2999 ms) while 0 to 3.999 would be 3 s (3999 ms). The updated version is better.
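A couple of sample values make this concrete (a sketch against SQL Server; the literals are illustrative):

```sql
-- DATEDIFF counts boundary crossings, not elapsed time:
SELECT DATEDIFF(second, '2009-07-01 00:00:00.000', '2009-07-01 00:00:02.500');  -- 2
SELECT DATEDIFF(second, '2009-07-01 00:00:00.500', '2009-07-01 00:00:03.000');  -- 3, though only 2.5 s elapsed
SELECT DATEDIFF(ms,     '2009-07-01 00:00:00.500', '2009-07-01 00:00:03.000');  -- 2500
```

So the seconds-based `<= 3` test can admit deltas of nearly 4 seconds, while the milliseconds test measures the actual gap.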
Michael Haren
+2  A: 

Two ways spring to mind:

... on other.date between dateadd(second,-3,log.date) and dateadd(second,3,log.date)

and:

... on abs(datediff(millisecond, other.date, log.date)) <= 3000

Check which one gives you the best performance.

(Edit: Changed seconds to milliseconds in the second alternative, as richardtallent pointed out in a comment to a different answer.)

Guffa
My guess is that they perform roughly equally. Both are scalar operations on a numerical value, and neither can use an index. The time needed to create two DATEADD() results or one DATEDIFF() result per record is negligible, so the performance hit comes from the table scan and the unavailability of an index.
Tomalak
Good answer. It's critical to use field values rather than expressions.
le dorfier
+2  A: 

It's challenging to do it efficiently, because you end up querying using an expression rather than field values directly, and such queries don't use indexes well.

In particular, don't use DATEDIFF.

You might have a chance with

WHERE DATEADD(ms, -3000, date1) < date2 AND DATEADD(ms, 3000, date1) > date2

or

WHERE date2 BETWEEN DATEADD(ms, -3000, date1) AND DATEADD(ms, 3000, date1)

Note: It's usually possible to refactor the insertions to have exactly the same timestamp values if they derive from the same event or transaction (but then they would have some other common key and you wouldn't need to do this).

You could also store the original value at a lower resolution - say full second, or even minute - depending on your requirements, but then you'd still have the boundary problems.
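One common T-SQL idiom for that kind of truncation, sketched with illustrative column/table names:

```sql
-- Truncate a datetime to the minute by counting whole minutes since the
-- epoch (0 = '1900-01-01') and adding them back:
SELECT DATEADD(minute, DATEDIFF(minute, 0, log_date), 0) AS log_minute
FROM log_table;
```

As noted, two rows 3 seconds apart can still fall on opposite sides of a minute boundary, so this doesn't remove the boundary problem.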

le dorfier
IMHO, this will result in a table scan just like the other scalar expressions, so probably it won't perform any better. +1 for providing alternatives in any case.
Tomalak
Thanks, I have a single-digit confidence level myself. But I think it's worth pointing out specifically what characteristics of a SQL statement make it more or less likely to be optimizable.
le dorfier
It was just a thought on my part, no criticism. I share your POV. ;-)
Tomalak
A: 

Not tested, but this should work...

SELECT *
FROM tbl1
  INNER JOIN tbl2 ON tbl1.id = tbl2.id
    AND DATEDIFF(second, tbl1.dttime, tbl2.dttime) BETWEEN -3 AND 3
Bill
+1  A: 

Can you depend on the order of the datetime values for rows that should match? For example, does the log entry always precede the entry into the other table? If so, modify one of the answers above to remove the ABS() function (taking special care of the order of the DATEDIFF parameters). This will help to prevent false matches.
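For example, taking the DATEDIFF-based answer above and assuming the log row (t1) always carries the earlier timestamp (a sketch, not tested):

```sql
SELECT t1.id, t2.id
FROM t1
  INNER JOIN t2 ON DATEDIFF(ms, t1.datefield, t2.datefield) BETWEEN 0 AND 3000
```

The one-sided range halves the window and avoids matching rows where t2 precedes t1.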

Jesse
+1 I like your POV. I will definitely take a look at it.
THEn