Which conventions for naming variables and functions do you favor in R code?

As far as I can tell, there are several different conventions, all of which coexist in cacophonous harmony:

1. Use of period separator, e.g.

  stock.prices <- c(12.01, 10.12)
  col.names    <- c('symbol','price')

Pros: Has historical precedent in the R community, is prevalent throughout the R core, and is recommended by Google's R Style Guide.

Cons: Rife with object-oriented connotations, and confusing to R newbies.

2. Use of underscores

  stock_prices <- c(12.01, 10.12)
  col_names    <- c('symbol','price')

Pros: A common convention in many programming languages; favored by Hadley Wickham's Style Guide, and used in the ggplot2 and plyr packages.

Cons: Not historically used by R programmers; annoyingly mapped to the '<-' operator in Emacs-Speaks-Statistics (alterable with 'ess-toggle-underscore').

3. Use of mixed capitalization (camelCase)

  stockPrices <- c(12.01, 10.12)
  colNames    <- c('symbol','price')

Pros: Appears to have wide adoption in several language communities.

Cons: Has recent precedent, but not historically used (in either R base or its documentation).

Finally, as if it weren't confusing enough, I ought to point out that the Google Style Guide argues for dot notation for variables, but mixed capitalization for functions.

The lack of consistent style across R packages is problematic on several levels. From a developer standpoint, it makes maintaining and extending others' code difficult (especially where its style is inconsistent with your own). From an R user standpoint, the inconsistent syntax steepens R's learning curve by multiplying the ways a concept might be expressed (e.g. is that date casting function asDate(), as.date(), or as_date()? No, it's as.Date()).
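To make the date-casting example concrete, here is a small sketch that checks which spellings actually resolve in a session. Only as.Date() is assumed to exist in a plain R session; the other three are the guesses from the paragraph above, and one of them may well resolve if an add-on package that defines it happens to be attached.

```r
# Spellings a newcomer might try for the date-casting function.
candidates <- c("asDate", "as.date", "as_date", "as.Date")

# exists() reports whether each name resolves to a function on the search path.
found <- vapply(candidates, exists, logical(1), mode = "function")
print(found)

# The spelling that base R actually provides.
parsedDate <- as.Date("2009-12-01")
class(parsedDate)
```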

+1  A: 

This comes down to personal preference, but I follow the Google Style Guide because it's consistent with the style of the core team. I have yet to see an underscore in a variable in base R.

Shane
A: 

As I point out here:

http://stackoverflow.com/questions/1232074/how-does-the-verbosity-of-identifiers-affect-the-performance-of-a-programmer/1239385#1239385

it's worth bearing in mind how understandable your variable names are to your co-workers/users if they are non-native speakers...

For that reason I'd say underscores and periods are better than capitalisation, but as you point out consistency is essential within your script.

David Lawrence Miller
+10  A: 

Good previous answers so just a little to add here:

  • underscores are really annoying for ESS users; given that ESS is pretty widely used, you won't see many underscores in code authored by ESS users (and that set includes a bunch of R Core as well as CRAN authors, exceptions like Hadley notwithstanding);

  • dots are evil too because they can get mixed up in simple method dispatch; I believe I once read comments to this effect on one of the R lists: dots are a historical artifact and no longer encouraged;

  • so we have a clear winner still standing in the last round: camelCase. I am also not sure if I really agree with the assertion of 'lacking precedent in the R community'.

And yes: pragmatism and consistency trump dogma. So whatever works and is used by colleagues and co-authors. After all, we still have white-space and braces to argue about :)

Dirk Eddelbuettel
+1 Well said! [If only the core team would put out a definitive style guide; I feel like that would give more credence to their already implied usage.]
Shane
I could just be misremembering based on my own bias towards mixed case but I believe that's what RG always used when I was working for him. I figure what's good for RG is good for me!
geoffjentry
Geoff: Not a bad rule to go by :)
Dirk Eddelbuettel
Dirk - I'm giving your answer the thumbs up here, but it would be truly wonderful if this style preference were reified in a document somewhere at r-project.org. At present, it's floating in the un-Google-able collective consciousness of the R Core Team :).
dataspora
Thanks for the thumbs-up. As for the 'canonical style document': wishing alone doesn't make it so, or I'd be riding pink ponies. Maybe you can start by authoring something, which you could stick onto the R Wiki and we all edit, adopt and adhere to it. Hope springs eternal, as they say...
Dirk Eddelbuettel
I have no problems with camelCase, though I prefer underscores and don't use ESS. I will say that it would be nice to have multiple naming conventions for different situations, as the Google guide aims for with camelCase for functions. It dramatically increases comprehension. Since underscores are used in a number of languages, it would be ideal to have them for one thing, be it variables, functions, et al.
Dan
A: 

I have a preference for mixedCapitals.

But I often use periods to indicate what the variable type is:

mixedCapitals.mat is a matrix. mixedCapitals.lm is a linear model. mixedCapitals.lst is a list object.

and so on.
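A minimal sketch of that suffix convention, using made-up numbers and a hypothetical stockPrices name purely for illustration:

```r
# Made-up values for illustration only.
price  <- c(12.01, 10.12, 11.50, 11.87, 12.40)
volume <- c(100, 150, 120, 130, 90)

stockPrices.mat <- cbind(price, volume)     # .mat: a matrix
stockPrices.lm  <- lm(price ~ volume)       # .lm: a fitted linear model
stockPrices.lst <- list(data = stockPrices.mat,
                        fit  = stockPrices.lm)  # .lst: a list object

# The suffix tells the reader what to expect before calling class().
class(stockPrices.lm)
```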

Jesse
+1  A: 

Underscores all the way! Contrary to popular opinion, there are a number of functions in base R that use underscores. Run grep("^[^\\.]*$", apropos("_"), value = TRUE) to see them all.
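Spelled out step by step, that one-liner might look like the sketch below. The variable names are invented here, and the number of matches depends on the R version and on which packages are attached, so no count is shown.

```r
# Every object on the search path whose name contains an underscore ...
with_underscore <- apropos("_")

# ... keeping only the names that contain no dot at all.
underscore_only <- grep("^[^\\.]*$", with_underscore, value = TRUE)

length(underscore_only)
head(underscore_only)
```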

I use the official Hadley style of coding ;)

hadley
That's neat! I wasn't aware of the *apropos* function before. This returns 10 functions for me in R 2.9.0; I'd hardly say that's a compelling case. What's your rationale for underscores when they're clearly in a minority for R?
Shane
Well it's 16 in R 2.10.0, so that's a 60% increase per version ;) I mainly like them because they remind me of Ruby; camelCase reminds me of Java.
hadley
Hadley, my heart says to support your underscore insurgency, but my head says to respect the community standard, and say yes to camelCase. :( But perhaps self-consistency is all that matters.
dataspora
+2  A: 

As others have mentioned, underscores will screw up a lot of folks. No, it's not verboten but it isn't particularly common either.

Using dots as a separator gets a little hairy with S3 classes and the like.
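A small sketch of how that can go wrong, assuming a hypothetical generic called describe (not a base R function). S3 dispatch looks for functions named generic.class, so a dot-separated helper name can be picked up as a method without anyone intending it:

```r
# A hypothetical S3 generic with a default method.
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "default method"

# Intended as an ordinary helper that "describes a data frame",
# but its name matches the pattern generic.class.
describe.data.frame <- function(x, ...) "helper picked up as a method"

describe(1:3)                 # no describe.integer, falls back to default
describe(data.frame(a = 1))   # dispatches to the dot-named helper
```

R's own t.test() shows the same ambiguity from the other side: it reads like a t() method for a class named "test", but it is a standalone function.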

In my experience, it seems like a lot of the high muckity mucks of R prefer the use of camelCase, with some dot usage and a smattering of underscores.

geoffjentry
A: 

Usually I name my variables using a mix of underscores and mixed capitalization (camelCase). Simple variables are named using underscores, for example:

PSOE_votes -> number of votes for the PSOE (a political party in Spain).

PSOE_states -> Categorical, indicates the state where PSOE wins {Aragon, Andalucia, ...}

PSOE_political_force -> Categorical, indicates the position of PSOE among the political groups {first, second, third}

PSOE_07 -> Union of PSOE_votes + PSOE_states + PSOE_political_force in 2007 (header -> votes, states, position)

If my variable is the result of applying a function to one or two variables, I use mixed capitalization.

Example:

positionXstates <- xtabs(~states+position, PSOE_07)
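To run that line on its own, PSOE_07 has to exist first. The sketch below fills it with a few invented rows (not real election results) using the header described above:

```r
# Invented rows for illustration only; these are not real results.
PSOE_07 <- data.frame(
  votes    = c(1000, 800, 1200),
  states   = c("Aragon", "Andalucia", "Aragon"),
  position = c("first", "second", "first")
)

# Derived from two variables, so it gets a mixed-capitalization name.
positionXstates <- xtabs(~ states + position, data = PSOE_07)
print(positionXstates)
```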

calejero
+1  A: 

I like camelCase when the camel actually provides something meaningful -- like the datatype.

dfProfitLoss, where df = dataframe

or

vdfMergedFiles(), where the function takes in a vector and spits out a dataframe
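As a rough sketch of how those prefixes read in practice (the values and the vdfLabelValues() helper are invented stand-ins, not the original vdfMergedFiles()):

```r
# df prefix: the object is a data frame.
dfProfitLoss <- data.frame(quarter = c("Q1", "Q2"),
                           profit  = c(120, -35))

# vdf prefix: takes a vector (v) and returns a data frame (df).
# Hypothetical stand-in for a function like vdfMergedFiles().
vdfLabelValues <- function(values) {
  data.frame(index = seq_along(values), value = values)
}

dfLabeled <- vdfLabelValues(c(12.01, 10.12))
str(dfLabeled)
```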

While I think _ really adds to the readability, there just seem to be too many issues with using ., -, _ or other characters in names, especially if you work across several languages.

Robert