I have lots of data with start and stop times for a given ID and I need to flatten all intersecting and adjacent timespans into one combined timespan. The sample data posted below is all for the same ID so I didn't list it.
To make things a bit clearer, take a look at the sample data for 03.06.2009:
The following timespans are overlapping or contiunous and need to merge into one timespan
- 05:54:48 - 10:00:13
- 09:26:45 - 09:59:40
The resulting timespan would be from 05:54:48 to 10:00:13. Since there's a gap between 10:00:13 and 10:12:50 we also have the following timespans:
- 10:12:50 - 10:27:25
- 10:13:12 - 11:14:56
- 10:27:25 - 10:27:31
- 10:27:39 - 13:53:38
- 11:14:56 - 11:15:03
- 11:15:30 - 14:02:14
- 13:53:38 - 13:53:43
- 14:02:14 - 14:02:31
which result in one merged timespan from 10:12:50 to 14:02:31, since they're overlapping or adjacent.
Below you will find the sample data and the flattened data as I would need it. The duration column is just informative.
Any solution - be it SQL or not - is appreciated.
EDIT: Since there are lots of different and interesting solutions I'm refining my original question by adding constraints to see the "best" (if there is one) solution bubble up:
- I'm getting the data via ODBC from another system. There's no way to change the table layout for me or adding indexes
- The data is indexed only by the date column (the time part isn't)
- There are about 2.5k rows for every day
- The estimated usage pattern of the data is roughly as follows:
- Most of the time (lets say 90%) the user will query just one or two days (2.5k - 5k rows)
- Sometimes (9%) the range will be up to a month (~75k rows)
- Rarely (1%) the range will be up to a year (~900k rows)
- The query should be fast for the typical case and not "last forever" for the rare case.
- Querying a year worth of data takes about 5 minutes (plain select without joins)
Within these constraints, what would be the best solution? I'm afraid that most of the solutions will be horribly slow since they join on the combination of date and time, which is not an index field in my case.
Would you do all the merging on the client or the server side? Would you first create an optimized temp table and use one of the proposed solutions with that table? I didn't have the time to test the solutions until now but I will keep you informed what works best for me.
Sample data:
Date | Start | Stop
-----------+----------+---------
02.06.2009 | 05:55:28 | 09:58:27
02.06.2009 | 10:15:19 | 13:58:24
02.06.2009 | 13:58:24 | 13:58:43
03.06.2009 | 05:54:48 | 10:00:13
03.06.2009 | 09:26:45 | 09:59:40
03.06.2009 | 10:12:50 | 10:27:25
03.06.2009 | 10:13:12 | 11:14:56
03.06.2009 | 10:27:25 | 10:27:31
03.06.2009 | 10:27:39 | 13:53:38
03.06.2009 | 11:14:56 | 11:15:03
03.06.2009 | 11:15:30 | 14:02:14
03.06.2009 | 13:53:38 | 13:53:43
03.06.2009 | 14:02:14 | 14:02:31
04.06.2009 | 05:48:27 | 09:58:59
04.06.2009 | 06:00:00 | 09:59:07
04.06.2009 | 10:15:52 | 13:54:52
04.06.2009 | 10:16:01 | 13:24:20
04.06.2009 | 13:24:20 | 13:24:24
04.06.2009 | 13:24:32 | 14:00:39
04.06.2009 | 13:54:52 | 13:54:58
04.06.2009 | 14:00:39 | 14:00:49
05.06.2009 | 05:53:58 | 09:59:12
05.06.2009 | 10:16:05 | 13:59:08
05.06.2009 | 13:59:08 | 13:59:16
06.06.2009 | 06:04:00 | 10:00:00
06.06.2009 | 10:16:54 | 10:18:40
06.06.2009 | 10:18:40 | 10:18:45
06.06.2009 | 10:23:00 | 13:57:00
06.06.2009 | 10:23:48 | 13:57:54
06.06.2009 | 13:57:21 | 13:57:38
06.06.2009 | 13:57:54 | 13:57:58
07.06.2009 | 21:59:30 | 01:58:49
07.06.2009 | 22:12:16 | 01:58:39
07.06.2009 | 22:12:25 | 01:58:28
08.06.2009 | 02:10:33 | 05:56:11
08.06.2009 | 02:10:43 | 05:56:23
08.06.2009 | 02:10:49 | 05:55:59
08.06.2009 | 05:55:59 | 05:56:01
08.06.2009 | 05:56:11 | 05:56:14
08.06.2009 | 05:56:23 | 05:56:27
Flattened result:
Date | Start | Stop | Duration
-----------+----------+----------+---------
02.06.2009 | 05:55:28 | 09:58:27 | 04:02:59
02.06.2009 | 10:15:19 | 13:58:43 | 03:43:24
03.06.2009 | 05:54:48 | 10:00:13 | 04:05:25
03.06.2009 | 10:12:50 | 14:02:31 | 03:49:41
04.06.2009 | 05:48:27 | 09:59:07 | 04:10:40
04.06.2009 | 10:15:52 | 14:00:49 | 03:44:58
05.06.2009 | 05:53:58 | 09:59:12 | 04:05:14
05.06.2009 | 10:16:05 | 13:59:16 | 03:43:11
06.06.2009 | 06:04:00 | 10:00:00 | 03:56:00
06.06.2009 | 10:16:54 | 10:18:45 | 00:01:51
06.06.2009 | 10:23:00 | 13:57:58 | 03:34:58
07.06.2009 | 21:59:30 | 01:58:49 | 03:59:19
08.06.2009 | 02:10:33 | 05:56:27 | 03:45:54