I have a factored time series that looks like this:

df <- data.frame(a=c("11-JUL-2004", "11-JUL-2005", "11-JUL-2006", 
                   "11-JUL-2007", "11-JUL-2008"),
                 b=c("11-JUN-1999", "11-JUN-2000", "11-JUN-2001", 
                     "11-JUN-2002", "11-JUN-2003"))

First, I would like to convert this to a format native to R. Second, I would like to calculate the number of months between the two columns.

Update:

Essentially I'm trying to recreate what I do in SPSS, in R.

In SPSS I would:

  1. Convert the strings to date format DD-MMM-YYYY
  2. COMPUTE. RND((a-b)/60/60/24/30.416)

30.416 is short for 365/12. I don't care much about month edge cases, hence the rounding operation.

+1  A: 
> Data <- data.frame(
+ V1=c("11-JUL-2004","11-JUL-2005","11-JUL-2006","11-JUL-2007","11-JUL-2008"),
+ V2=c("11-JUN-1999","11-JUN-2000","11-JUN-2001","11-JUN-2002","11-JUN-2003"))
> Data[,1] <- as.Date(Data[,1],"%d-%b-%Y")
> Data[,2] <- as.Date(Data[,2],"%d-%b-%Y")
> # Assuming 30 days per month
> (Data[,1]-Data[,2])/30
Time differences in days
[1] 61.90000 61.86667 61.86667 61.86667 61.90000
> # Assuming 30.416 days per month
> (Data[,1]-Data[,2])/30.416
Time differences in days
[1] 61.05339 61.02052 61.02052 61.02052 61.05339
> # Counting calendar-month boundaries crossed
> require(zoo)
> Data[,1] <- as.yearmon(Data[,1])
> Data[,2] <- as.yearmon(Data[,2])
> (Data[,1]-Data[,2])*12
[1] 61 61 61 61 61
Joshua Ulrich
zoo looks like cleaner output AND input. I'll have to check that one out.
Brandon Bertelsen
@Brandon: yes, zoo's `yearmon` class is very handy if you're just dealing with monthly data. Note that you don't need to convert to `Date` first to use `yearmon` (e.g. on your initial data.frame: `Data[,1] <- as.yearmon(Data[,1],"%d-%b-%Y")`).
Joshua Ulrich
Yes, `zoo` is wonderful. But rest assured that under the hood it is using basic R types for the *ordered index*. It all comes back to understanding `POSIXct` et al -- unless you switch to something like lubridate.
Dirk Eddelbuettel
Any opinion on which one is better to put one's time into first?
Brandon Bertelsen
I use zoo (and xts) all the time for time series data. But you still need to grok how to work with POSIXct, strptime, ... Also, zoo and xts only wrap around matrices, not data.frames.
Dirk Eddelbuettel
@Dirk: I agree lubridate is very powerful. But there's nothing wrong with the basic types and a handy wrapper around them if that's all you need. And in the end, lubridate becomes machine code too before it hits the cpu...
Joris Meys
+2  A: 

Josh is spot-on with respect to the difficulty of what a month could mean. The lubridate package has some answers on that.

In terms of base R, we can answer it for weeks though:

> df[,"pa"] <- as.POSIXct(strptime(as.character(df$a),
+                         format="%d-%B-%Y", tz="GMT"))
> df[,"pb"] <- as.POSIXct(strptime(as.character(df$b),
+                         format="%d-%B-%Y",tz="GMT"))
> df[,"weeks"] <- difftime(df$pa, df$pb, units="weeks")
> df[,"months"] <- difftime(df$pa, df$pb, units="days")/30.416
> df
            a           b         pa         pb        weeks      months
1 11-JUL-2004 11-JUN-1999 2004-07-11 1999-06-11 265.29 weeks 61.053 days
2 11-JUL-2005 11-JUN-2000 2005-07-11 2000-06-11 265.14 weeks 61.021 days
3 11-JUL-2006 11-JUN-2001 2006-07-11 2001-06-11 265.14 weeks 61.021 days
4 11-JUL-2007 11-JUN-2002 2007-07-11 2002-06-11 265.14 weeks 61.021 days
5 11-JUL-2008 11-JUN-2003 2008-07-11 2003-06-11 265.29 weeks 61.053 days
> 

This uses the altered data.frame as per my edit so that we have proper column names. Note that the months column is still labelled "days" because dividing a difftime keeps its units. If you wrap difftime() in as.numeric(), you get plain numbers instead.
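
A minimal sketch of that as.numeric() wrapper, assuming the question's data with columns `a` and `b` and an English locale for the month abbreviations. The names `days_apart` and `months_approx` are made up for illustration. It mirrors the SPSS formula from the question (days divided by 365/12, then rounded):

```r
# Question's data; parsing the month abbreviations assumes an English locale
df <- data.frame(a = c("11-JUL-2004", "11-JUL-2005", "11-JUL-2006",
                       "11-JUL-2007", "11-JUL-2008"),
                 b = c("11-JUN-1999", "11-JUN-2000", "11-JUN-2001",
                       "11-JUN-2002", "11-JUN-2003"))

# Parse to Date; as.character() guards against factor columns
pa <- as.Date(as.character(df$a), format = "%d-%b-%Y")
pb <- as.Date(as.character(df$b), format = "%d-%b-%Y")

# as.numeric() drops the difftime class, leaving plain day counts
days_apart <- as.numeric(difftime(pa, pb, units = "days"))

# Approximate months: days / (365/12), rounded like the SPSS RND() call
months_approx <- round(days_apart / (365 / 12))
months_approx
```

Because the result is an ordinary numeric vector, it can be stored in a data.frame column without the misleading "days" label.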

Dirk Eddelbuettel
+3  A: 
df <- data.frame(c("11-JUL-2004","11-JUL-2005","11-JUL-2006","11-JUL-2007","11-JUL-2008"),
                 c("11-JUN-1999","11-JUN-2000","11-JUN-2001","11-JUN-2002","11-JUN-2003"))
names(df) <- c("X1","X2")
df <- within(df, X1 <- as.Date(X1, format = "%d-%b-%Y"))
df <- within(df, X2 <- as.Date(X2, format = "%d-%b-%Y"))

Then difftime() will give the difference in weeks:

> with(df, difftime(X1, X2, units = "weeks"))
Time differences in weeks
[1] 265.2857 265.1429 265.1429 265.1429 265.2857

Or if we use Brandon's approximation:

> with(df, difftime(X1, X2) / 30.416)
Time differences in days
[1] 61.05339 61.02052 61.02052 61.02052 61.05339

Closest I could get with lubridate (as highlighted by Dirk) is (using the above df)

> m <- with(df, as.period(subtract_dates(X1, X2)))
> m
[1] 5 years and 1 month   5 years and 1 month   5 years and 1 month   5 years and 1 month   5 years and 1 month
> str(m)
Classes ‘period’ and 'data.frame':  5 obs. of  6 variables:
 $ year  : int  5 5 5 5 5
 $ month : int  1 1 1 1 1
 $ day   : num  0 0 0 0 0
 $ hour  : int  0 0 0 0 0
 $ minute: int  0 0 0 0 0
 $ second: num  0 0 0 0 0
Gavin Simpson
OK, random drive-by down votes by people without leaving comments are beginning to p*** me off with this site. At least have the decency to say what is wrong with an answer so we have a chance to learn.
Gavin Simpson
I for one see nothing wrong with this answer. You could add that the number of months can be calculated as m$year*12+m$month. ;-)
Joris Meys
+1 From me, definitely no downvote.
Brandon Bertelsen
+2  A: 

Number 1 below seems closest to what you are asking for but 2 and 3 are alternatives you might also want to consider depending on your purpose. Also numbers 1 and 3 can be tried without rounding if you want to consider a fractional number of months.

# first convert columns of df to "Date" class
df[] <- lapply(df, as.Date, "%d-%b-%Y")

# 1. difference in days divided by 365.25/12
with(df, round((as.numeric(a) - as.numeric(b)) / (365.25/12)))

# 2. convert to 1st of month & then take diff in mos
library(zoo)
with(df, 12 * (as.yearmon(a) - as.yearmon(b)))

# 3. business style difference in months. See: ?"mondate-class"
library(mondate)
with(df, round(as.numeric(mondate(a) - mondate(b))))
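
If installing zoo is not an option, approach 2 can be approximated in base R. This is only a sketch under that assumption, not part of the packages above: it uses the `year` and `mon` fields of `POSIXlt`, ignores the day of the month entirely, and the name `months_between` is illustrative.

```r
# Question's data; parsing the month abbreviations assumes an English locale
df <- data.frame(a = c("11-JUL-2004", "11-JUL-2005", "11-JUL-2006",
                       "11-JUL-2007", "11-JUL-2008"),
                 b = c("11-JUN-1999", "11-JUN-2000", "11-JUN-2001",
                       "11-JUN-2002", "11-JUN-2003"))

# POSIXlt exposes calendar fields: $year (years since 1900), $mon (0-11)
a <- as.POSIXlt(as.Date(as.character(df$a), format = "%d-%b-%Y"))
b <- as.POSIXlt(as.Date(as.character(df$b), format = "%d-%b-%Y"))

# Calendar months between the two dates, day of month ignored
months_between <- 12 * (a$year - b$year) + (a$mon - b$mon)
months_between
```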
G. Grothendieck
+1  A: 

Brandon,

You could do this with the lubridate package.

> library(lubridate)

Tell R that these are dates. Use the dmy() parser function because the dates are written Day, Month, Year (i.e., dmy).

> df <- transform(df, a = dmy(a), b = dmy(b))

Calculate the difference as a period. This will give you the number of whole years, months, days, etc.

> diff <- as.period(df$a - df$b)

Use math to convert the results to just months.

> 12* diff$year + diff$month

These were all 61 months apart. This counts whole months only and drops any leftover days. If you want to round based on the number of days, you could do something like

> 12* diff$year + diff$month + round(diff$day/30)
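
For readers on a more recent lubridate: as far as I know, as.period() applied to a plain date difference no longer splits it into years and months, so going through an interval may be needed. A hedged sketch, assuming lubridate is installed and an English locale; `span` and `months_apart` are illustrative names:

```r
library(lubridate)

# Question's dates, parsed with base R to keep the example explicit
a <- as.Date(c("11-JUL-2004", "11-JUL-2005", "11-JUL-2006",
               "11-JUL-2007", "11-JUL-2008"), format = "%d-%b-%Y")
b <- as.Date(c("11-JUN-1999", "11-JUN-2000", "11-JUN-2001",
               "11-JUN-2002", "11-JUN-2003"), format = "%d-%b-%Y")

# An interval keeps both endpoints, so calendar months can be counted
span <- interval(b, a)

# Integer division by a one-month period gives whole months elapsed
months_apart <- span %/% months(1)
months_apart
```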

I'm working on making these steps easier/more intuitive in the next version of lubridate.

Garrett