Dear all,

I am trying to convert an uncommon date format into a standard date. Basically, I have a dataset that contains a period with semiannual frequency, formatted like this: 206 denotes the second half of 2006, 106 denotes the first half, and so forth. In order to rearrange these to 2006-06-01 and 2006-01-01 respectively, I have written a small function:

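period2date = function(period)
{
    # split the period into single characters, e.g. "2" "0" "6"
    check = strsplit(as.character(period), split = "")
    # the first digit is the half-year: 1 = first half, 2 = second half
    x = as.numeric(check[[1]][1])
    p = ifelse(x >= 2, 6, 1)
    x = 2

    # assemble a string such as "2006-6-1" and convert it to a Date
    out = paste(x, "0", check[[1]][2], check[[1]][3], "-", p, "-1", sep = "")
    out = as.Date(out)

    return(out)
}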

You may laugh now :). Anyway, that function works, and here comes the problem: I want to apply this function to the time column of a data.frame. I tried the following:

as.data.frame(lapply(mydf$period,period2date)) 

which returned the result closest to what I want:

  structure.13665..class....Date..
1                       2006-06-01

and so forth. Obviously I'd love to keep the name of my column, or even better, just add the newly formatted date to my original df. I also tried:

sapply(mydf$period,period2date) # with results equal to the line below
unlist(lapply(mydf$period,period2date))

[1] 13300 13514 13665

All I want to do is change the uncommon 206 etc. format to 2006-06-01 (which works) and add the result as a column to mydf (which does not work).

Thanks in advance for any suggestions!

A: 

This is strange...:

as.Date(sapply(mydf$period,period2date))

returns "2006-06-01" "2006-01-01" etc. I am stunned, because the period2date function already contains as.Date(). This solves my problem, but I don't understand it completely...
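
A minimal sketch of what is probably going on here, using two made-up example dates rather than the real mydf: a Date is just a day count with a class attribute, and sapply() drops that attribute when it simplifies its result. The origin argument is passed explicitly because older R versions require it when converting numbers back to dates.

# A Date is a number of days since 1970-01-01 plus a class attribute.
d <- as.Date("2006-06-01")
unclass(d)                        # 13300

# sapply() simplifies the list of results to a plain vector,
# which drops the Date class and leaves only the day counts.
dates <- list(as.Date("2006-06-01"), as.Date("2007-01-01"))
plain <- sapply(dates, function(x) x)
class(plain)                      # "numeric"

# Restore the class; the origin is given explicitly for older R versions.
restored <- as.Date(plain, origin = "1970-01-01")

# Combining the list elements with c() keeps the Date class instead.
kept <- do.call(c, dates)

So the as.Date() inside period2date does its job for each element; the class is lost afterwards, when the individual results are collapsed into one vector.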

ran2
+2  A: 

R stores dates as numbers (days since 1970-01-01), so I think you're getting some wacky behavior because you're operating on the date output: sapply() and unlist() put the dates back into a plain vector or matrix, which drops the Date class and makes them appear as the numbers they really are. Instead, you should explicitly build the result with data.frame(). You may also save some time if you use vectorized operations (I think the apply family still uses loops):

period2date <- function(period) {
    period <- as.character(period)
    half <- substr(period, 1, 1)
    year <- substr(period, 2, 3)
    dates <- as.Date(ifelse(half=="1", paste(year, "0101", sep=""), paste(year, "0701", sep="")), format="%y%m%d")
    return(dates)
}

mydf <- data.frame(mydf, date = period2date(mydf$period))

You can also make this cleaner by replacing the period column with the dates rather than appending a new column.
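
A small sketch of both options, assuming a data frame called mydf with a numeric period column as in the question (the three sample values are made up for illustration). Note that this version of period2date maps the second half-year to July 1, not June 1 as in the question's function.

# Vectorized conversion, repeated here so the example is self-contained.
period2date <- function(period) {
    period <- as.character(period)
    half <- substr(period, 1, 1)   # "1" = first half, otherwise second half
    year <- substr(period, 2, 3)   # two-digit year, e.g. "06"
    as.Date(ifelse(half == "1",
                   paste(year, "0101", sep = ""),
                   paste(year, "0701", sep = "")),
            format = "%y%m%d")
}

# Hypothetical sample data standing in for the real mydf.
mydf <- data.frame(period = c(206, 106, 207))

# Option 1: append the dates as a new, named column.
mydf$date <- period2date(mydf$period)

# Option 2: replace the period column in a copy of the data.
mydf2 <- data.frame(period = c(206, 106, 207))
mydf2$period <- period2date(mydf2$period)

Assigning with $ keeps the Date class, so no conversion back from numbers should be needed afterwards.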

richardh