Hello everyone. I have two data frames in R. One frame has a person's year of birth:

YEAR
/1931
/1924

and another column shows a more recent date:

RECENT
09/08/2005
11/08/2005

What I want to do is subtract the years to calculate each person's age in years, but I am not sure how to approach this. Any help, please?

+1  A: 

You can do some formatting:

as.numeric(format(as.Date("01/01/2010", format="%m/%d/%Y"), format="%Y")) - 1930

With your data:

> yr <- c(1931, 1924)
> recent <- c("09/08/2005", "11/08/2005")
> as.numeric(format(as.Date(recent, format="%m/%d/%Y"), format="%Y")) - yr
[1] 74 81

Since you have your data in a data.frame (I'll assume that it's called df), it will be more like this:

as.numeric(format(as.Date(df$recent, format="%m/%d/%Y"), format="%Y")) - df$year
Shane
Works for the data I've posted here, but my data set actually has many more rows. Is there a way I could accomplish this by calling on the data frames themselves?
Brian
In the same way. Just replace recent and yr with your df columns.
Shane
A: 

Based on the previous answer, convert your columns to date objects and subtract. Some conversion of types between character and numeric is necessary:

> foo=data.frame(RECENT=c("09/08/2005","11/08/2005"),YEAR=c("/1931","/1924"))
> foo
      RECENT  YEAR
1 09/08/2005 /1931
2 11/08/2005 /1924
> foo$RECENTd = as.Date(foo$RECENT, format="%m/%d/%Y")
> foo$YEARn = as.numeric(substr(foo$YEAR,2,999))
> foo$AGE = as.numeric(format(foo$RECENTd,"%Y")) - foo$YEARn
> foo
      RECENT  YEAR    RECENTd YEARn AGE
1 09/08/2005 /1931 2005-09-08  1931  74
2 11/08/2005 /1924 2005-11-08  1924  81

Note I've assumed you have that slash in your year column.

Also, a tip for asking questions about dates: include a day past the twelfth so we know whether you are a month/day/year person or a day/month/year person.
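
To illustrate why a day past the twelfth matters, here is a minimal base R sketch (the variable names are made up for this example). The sample date "09/08/2005" parses without complaint under both conventions, while a date with a day greater than 12 is only valid under one of them:

```r
# The same string parsed under two different conventions.
x <- "09/08/2005"
as_mdy <- as.Date(x, format = "%m/%d/%Y")  # read as 8 September 2005
as_dmy <- as.Date(x, format = "%d/%m/%Y")  # read as 9 August 2005
format(as_mdy, "%Y-%m-%d")  # "2005-09-08"
format(as_dmy, "%Y-%m-%d")  # "2005-08-09"

# A day greater than 12 removes the ambiguity.
y <- "09/28/2005"
y_mdy <- as.Date(y, format = "%m/%d/%Y")  # parses as 28 September 2005
y_dmy <- as.Date(y, format = "%d/%m/%Y")  # NA, because there is no month 28
is.na(y_dmy)  # TRUE
```

Since only the year is used for the age here, both readings give the same answer for this data, but that stops being true once months and days enter the calculation.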

Spacedman
Use classes! `as.Date()` does the work for you practically.
Vince
A: 

Given the data in your example:

> m <- data.frame(YEAR=c("/1931", "/1924"),RECENT=c("09/08/2005","11/08/2005"))
> m
   YEAR     RECENT
1 /1931 09/08/2005
2 /1924 11/08/2005

Extract year with the strptime function:

> strptime(m[,2], format = "%m/%d/%Y")$year - strptime(m[,1], format = "/%Y")$year
[1] 74 81
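
As a side note on why this subtraction works: strptime() returns a POSIXlt object, and its $year component counts years since 1900 rather than the calendar year. The offset cancels when two $year values are subtracted, but it has to be added back when mixing with a plain numeric year. A small sketch, where the birth years are assumed to be already numeric and tz = "UTC" is set only to keep the example independent of the local time zone:

```r
# Parse the recent dates into POSIXlt.
recent <- strptime(c("09/08/2005", "11/08/2005"),
                   format = "%m/%d/%Y", tz = "UTC")
class(recent)        # "POSIXlt" "POSIXt"
recent$year          # 105 105, i.e. years since 1900
recent$year + 1900   # 2005 2005, the calendar years

# Birth years held as plain numbers (an assumption for this sketch).
birth_year <- c(1931, 1924)
age <- (recent$year + 1900) - birth_year
age                  # 74 81
```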
eyjo
Why? The beauty of object-oriented programming is having methods that recognize date objects so you don't have to do this.
Vince
Why not? This solves the problem with just one conversion.
eyjo
+2  A: 

You can solve this with the lubridate package.

> library(lubridate)

I don't think /1931 is a common date format, so I'll assume all the entries are character strings.

> RECENT <- data.frame(recent = c("09/08/2005", "11/08/2005"))
> YEAR <- data.frame(year = c("/1931", "/1924"))

First, let's notify R that the recent dates are dates. I'll assume the dates are in month/day/year order, so I use mdy(). If they're in day/month/year order just use dmy().

> RECENT$recent <- mdy(RECENT$recent)
> RECENT
      recent
1 2005-09-08
2 2005-11-08

Now, let's turn the years into numbers so we can do some math with them.

> YEAR$year <- as.numeric(substr(YEAR$year, 2, 5))

Now just do the math. year() extracts the year value of the RECENT dates.

> year(RECENT$recent) - YEAR
  year
1   74
2   81

p.s. if your year entries are actually full dates, you can get the difference in years with

> YEAR1 <- data.frame(year = mdy("01/08/1931","01/08/1924"))
> as.period(RECENT$recent - YEAR1$year, units = "year")
[1] 74 years and 8 months   81 years and 10 months
Garrett
A: 

You can even use simple string manipulation:

m <- data.frame(YEAR=c("/1931", "/1924"),
  RECENT=c("09/08/2005","11/08/2005"),
  stringsAsFactors=FALSE)

m$YEAR <- as.numeric(gsub("/","",m$YEAR))
m$recentyear <- as.numeric(substr(m$RECENT,7,10))

m$age <-  m$recentyear - m$YEAR

If you have full birth dates, you can take them into account. This code gives the age in whole years, give or take a day or two around the birthday, because dividing by 365.25 only approximates the length of a year.

require(chron)
m <- data.frame(YEAR=c("11/2/1931", "17/11/1924"),
  RECENT=c("09/08/2005","11/08/2005"),
  stringsAsFactors=FALSE)

m$YEAR <- chron(m$YEAR,format="d/m/y")
m$RECENT <- chron(m$RECENT,format="d/m/y")

m$age <-  floor((m$RECENT - m$YEAR)/365.25)
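
If the day or two of slack matters, one alternative is to compare calendar fields directly instead of dividing a day count. The following is only a sketch in base R, not part of the chron approach above; the helper name age_completed_years is made up for this example, and the dates are assumed to be in day/month/year order as in the chron code:

```r
# Hypothetical helper: age in completed years from two Date vectors.
age_completed_years <- function(birth, recent) {
  b <- as.POSIXlt(birth)
  r <- as.POSIXlt(recent)
  # Difference in calendar years (the 1900 offset in $year cancels out).
  age <- r$year - b$year
  # TRUE where the birthday has not yet occurred in the recent year.
  not_yet <- (r$mon < b$mon) | (r$mon == b$mon & r$mday < b$mday)
  age - as.integer(not_yet)
}

birth  <- as.Date(c("11/2/1931", "17/11/1924"), format = "%d/%m/%Y")
recent <- as.Date(c("09/08/2005", "11/08/2005"), format = "%d/%m/%Y")
ages <- age_completed_years(birth, recent)
ages  # 74 80
```

The second person was born on 17 November, so on 11 August 2005 they had not yet turned 81, which is why the result is 80 rather than the plain year difference.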

There are multiple ways of dealing with dates. I personally like the chron package best, but that's a matter of taste. All the other options mentioned here are equally valid.

Joris Meys