I have a bunch of Stata .dta files that I would like to use in R.

My problem is that the variable names are not helpful to me as they are like "q0100," "q0565," "q0500," and "q0202." However, they are labelled like "psu," "number of pregnant," "head of household," and "waypoint."

I would like to be able to grab the labels ("psu," "waypoint," etc.) and use them as my variable/column names, as those will be easier for me to work with.

Is there a way to do this, either preferably in R, or through Stata itself? I know of read.dta in library(foreign) but don't know if it can convert the labels into variable names.

A: 

Not at the computer now, but I think Hmisc has a function to import labels from SPSS. Might work with Stata too?

Andreas
+2  A: 

R does not have a built-in way to handle variable labels. Personally, I think this is a disadvantage that should be fixed. Hmisc does provide some facilities for handling variable labels, but the labels are only recognized by functions in that package. read.dta creates a data.frame with an attribute "var.labels", which contains the labeling information. You can then create a data dictionary from that.

> library(foreign)
> data(swiss)
> write.dta(swiss,swissfile <- tempfile())
> a <- read.dta(swissfile)
> 
> var.labels <- attr(a,"var.labels")
> 
> data.key <- data.frame(var.name=names(a),var.labels)
> data.key
          var.name       var.labels
1        Fertility        Fertility
2      Agriculture      Agriculture
3      Examination      Examination
4        Education        Education
5         Catholic         Catholic
6 Infant_Mortality Infant.Mortality

Of course this .dta file doesn't have very interesting labels, but yours should be more meaningful.
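
If you want to go one step further and use the labels as column names, something along these lines might work. This is only a sketch: `labels_to_names` is a made-up helper (not part of foreign), the data frame is a hand-built stand-in for what read.dta would return, and the `q0999` column is invented to show the fallback for an empty label. It relies on base R's `make.names` to turn labels into syntactically valid, unique names.

```r
# Stand-in for the result of read.dta(): cryptic names plus a "var.labels" attribute
df <- data.frame(q0100 = c(1, 2), q0565 = c(0, 1), q0500 = c(1, 0),
                 q0202 = c(10, 11), q0999 = c(5, 6))
attr(df, "var.labels") <- c("psu", "number of pregnant",
                            "head of household", "waypoint", "")

# Hypothetical helper: replace column names with cleaned-up variable labels
labels_to_names <- function(d) {
  labs <- attr(d, "var.labels")
  if (is.null(labs)) return(d)            # no labels stored: leave names alone
  # Keep the original name wherever the label is missing or empty
  new <- ifelse(is.na(labs) | labs == "", names(d), labs)
  # make.names() swaps invalid characters for "." and de-duplicates
  names(d) <- make.names(tolower(new), unique = TRUE)
  d
}

renamed <- labels_to_names(df)
names(renamed)
# [1] "psu"  "number.of.pregnant"  "head.of.household"  "waypoint"  "q0999"
```

Very long labels will still give very long names, so you may want to shorten them by hand afterwards.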

Ian Fellows
Thanks, I had just sort of stumbled upon that at http://stat.ethz.ch/R-manual/R-patched/library/foreign/html/read.dta.html, but I used attributes(a)$var.labels instead. Now I can use the data.key idea you had and build a function that renames the variables accordingly. Thanks again.
Jared
sure, but variable labels can be quite verbose and contain characters that are not advisable to use for variable names.
Ian Fellows
A: 

Jared:

You can convert the variable labels to variable names from within Stata before exporting the data to an R or text file.
As Ian mentions, variable labels usually do not make good variable names, but if you convert spaces and other characters to underscores and your variable labels aren't too long, you can rename your variables with their labels quite easily.

Below is an example using the built-in Stata dataset "cancer.dta" to replace all variable names with their variable labels. Importantly, this code will not try to rename variables that have no variable label. Note that I also picked a dataset with lots of characters that aren't useful in a variable name (e.g., =, 1, ', ., (), etc.). You can add any characters that might be lurking in your variable labels to the list in the 5th line (local chars "...") and it will make the changes for you:

****************! BEGIN EXAMPLE
//copy and paste this code into a Stata do-file and click "do"//
sysuse  cancer, clear
desc
**
local chars "" " "(" ")" "." "1" "=" `"'"' "___" "__" "
ds, not(varlab "")    // <-- This will only select those vars with varlabs //
foreach v in `r(varlist)' {
    local `v'l "`:var lab `v''"
    **variable names cannot have spaces or other symbols, so::
    foreach s in `chars' {
        local `v'l: subinstr local `v'l "`s'" "_", all
    }
    rename `v' ``v'l'
    **make the variable names all lower case**
    cap rename ``v'l' `=lower("``v'l'")'
}
desc
****************! END EXAMPLE

You might also consider taking a look at Stat/Transfer and its capabilities for converting Stata data files to R.

~ Eric


http://www.eric-a-booth.com

eric.a.booth
Thanks for the help Eric. I don't know Stata nearly as well as R (as in don't know it at all) so I had already gone with the solution above.
Jared