views:

2263

answers:

5

I have a spreadsheet with a dataset of a number of transactions, each of which is composed of substeps, each of which has the time that it occurred. There can be a variable number and order of steps.

I'd like to find the duration of each transaction. If I can do this in Excel then great, as it's already in that format. If there isn't a straight-forward way to do this in Excel, I'll load it into a database and do the analysis with SQL. If there is an Excel way round this it'll save a few hours setup though :)

A simplified example of my data is as follows:

TransID, Substep, Time
1, step A, 15:00:00
1, step B, 15:01:00
1, step C, 15:02:00
2, step B, 15:03:00
2, step C, 15:04:00
2, step E, 15:05:00
2, step F, 15:06:00
3, step C, 15:07:00
3, step D, 15:08:00
etc.

I'd like to produce a result set as follows:

TransID, Duration
1, 00:02:00
2, 00:03:00
3, 00:01:00
etc.

My initial try was with an extra column with a formula subtracting end time from start time, but without a repeating number of steps, or the same start and end steps I'm having difficulty seeing how this formula would work.

I've also tried creating a pivot table based on this data with ID as the rows and Time as the data. I can change the field settings on the time data to return grouped values such as count or max, but am struggling to see how this can be setup to show max(time) - min(time) for each ID, hence why I'm thinking about heading to SQL. If anyone can point out anything obvious I'm missing though, I'd be very grateful.

As suggested by Hobbo, I've now used a pivot table with TransID as the rows and twice added Time as the data. After setting the field settings on the Time to Max on the first and Min on the second, a formula can be added just outside the pivot table to calculate the differences. One thing I'd been overlooking here is that the same value can be added to the data section more than once!

A follow-on problem was that the formula I add is of the form =GETPIVOTDATA("Max of Time",$A$4,"ID",1)-GETPIVOTDATA("Min of Time",$A$4,"ID",1), whici doesn't then increment when copying and pasting. Solutions to this are to either use the pivot table toolbar to turn off GETPIVOTDATA formulae, or rather than clicking on the pivot table when selecting cells in the formula, type the cell references instead (e.g. =H4-G4)

+1  A: 

Maybe something as simple as a query like this.

SELECT TransID, DateDiff(mi, Min(Time),Max(Time)) AS Duration
FROM MyTable
GROUP BY TrandID
Kibbee
I could do that if I load it into a DB for sure. I was wondering if I can do it in Excel and was simply overlooking something. If not then knocking up a small database and import is tomorrow's task :)
Kris C
@kibbee, I think you may mean 'n', not mi.
Remou
+1  A: 

To add to Kibbee's post, in reference to the comment, you can use ADO with Excel:

'From: http://support.microsoft.com/kb/246335 '

strFile = Workbooks(1).FullName
strCon = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & strFile _
    & ";Extended Properties=""Excel 8.0;HDR=Yes;IMEX=1"";"

Set cn = CreateObject("ADODB.Connection")
Set rs = CreateObject("ADODB.Recordset")

cn.Open strCon

strSQL = "SELECT TransID, DateDiff('n', Min([MyTime]),Max([MyTime])) AS Duration " _
         & "FROM [Sheet1$] GROUP BY TransID"

rs.Open strSQL, cn

'Write out to another sheet '
Worksheets(2).Cells(2, 1).CopyFromRecordset rs

EDIT: I have corrected some errors in the original post and changed the name of the time column to MyTime. Time is a reserved word in SQL and causes difficulties in queries. This now works on a very simple test.

Remou
Nice idea, I like it. It works for a simple query - e.g "SELECT TransID FROM [Sheet1$] GROUP BY TransID", but not for the query in question: "SELECT TransID, DateDiff(mi, Min(Time),Max(Time)) AS Duration FROM [Sheet1$] GROUP BY TransID". I assume to do with time formats, will investigate more...
Kris C
Thanks for your help - this would have got my answers with a little work I think, although the pivot table avoids any coding and is up and running so I'll use that this time. Handy to know about ADO with Excel - think I may find uses for that in the future!
Kris C
A: 

Sometimes it is possible to do something once in Excel far more easily than it is to do something repeatably.

Assuming you are just trying to get the answer once or twice, and then throw away the spreadsheet (as opposed to run it every night, or give it to someone else to run), here's how I would do it.

I assume your raw data is in columns A, B and C, with headings in row 1, and data starting in row 2.

Sort the table by TransId as your primary key, and Time as your secondary, both ascending. (The following won't work if this isn't done.)

Add a new column, D, titled Duration with a formula that like this (Excel formulae haven't formatting or comments; I have added those to help explain, but they need to be stripped out):

=IF(B2=B3,           // if this row's TransId is the same as the next one
    "",              // leave this field blank
    C3-              // else find the difference between the last timestamp and...
  VLOOKUP(   // look for the first value
  A2,          // matching this TransId
  A:C,         // within the entire table,
  3)           // Return the value in the third column - i.e. timestamp
 )

Now the data you want is in column D, but not in the format you want.

Select Columns A-D and copy them. Use Paste Special to copy the values only into a new worksheet.

Delete column B and column C in the new worksheet, so all is left is TransID and Duration.

Sort by Duration, to bring all the rows with values next to each other.

Sort only the rows with values by TransId.

Voila, and there is your solution! Hope you don't need to repeat this!

p.s. This is untested

Oddthinking
This is the kind of solution I thought Excel would do - I'll have a play with this against my real data tomorrow; if I can't get the ADO querying thing going, this looks like it should do what's needed. I need to do it up to about 10 times, so could get a little painful tho :S Oh excel...
Kris C
+1  A: 
Hobbo
Looks good, the only problem I have is that the formula I add is of the the form =GETPIVOTDATA("Max of Time, $A$4, "ID", 1) - GETPIVOTDATA("Max of Time, $A$4, "ID", 1). When I copy that to the cells below, the 1 doesn't update to 2, 3 etc so they all show the same time.
Kris C
Hmmm - my Excel (2003) doesn't seem to have that button. I'll have a look and see if I can find a similar setting elsewhere
Kris C
Ignore that, figured it out :) Type the cell references (e.g. =H4-G4) rather than clicking on the cells to select them in the formula.
Kris C
Thanks for your help - looks like that should scale up nicely to the actual data I'm dealing with :)
Kris C
Ah.. that button isn't visible by default. Click on the small drop-down arrow at the right hand side of the toolbar, 'Add and Remove buttons'->'Pivot toolbar'->'Generate GetPivotData'
Hobbo
+1  A: 

In excel:

  A     B        C
1 1, step A, 15:00:00
2 1, step B, 15:01:00
3 1, step C, 15:02:00
4 2, step B, 15:03:00
5 2, step C, 15:04:00
6 2, step E, 15:05:00
7 2, step F, 15:06:00
8 3, step C, 15:07:00
9 3, step D, 15:08:00

11 1, =max(if($A$1:$A$9=$A11,$C$1:$C$9,"")-min(if($A$1:$A$9=$A11,$C$1:$C$9,"")
12 2, =max(if($A$1:$A$9=$A12,$C$1:$C$9,"")-min(if($A$1:$A$9=$A12,$C$1:$C$9,"")

note: formulas are array functions so press ctrl-shift-enter after editing them.

wakingrufus