In the "What is the most useful R trick?" question, I read that using environments gives "pass-by-reference capabilities". Are there any limits and/or gotchas with this approach?

Also, in general what are the pros and cons of using created environments? This is something I've been confused about for quite some time, so any clarity or reference would be very helpful to me.

Thank you in advance.

+4  A: 

Well, if you don't understand them, and the people who might someday have to read your code (including your future self) don't understand environments, then you shouldn't use them! They were designed to be used to encapsulate namespaces in packages and such. The fact that you can use them for pass-by-reference and hash tables doesn't necessarily mean you should. It's a trick. Generally, use of deep magic is not really advisable, even if it makes your code a little faster.
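For readers who haven't seen the trick in question, here is a minimal sketch in base R of the two uses mentioned above. The function and variable names (add_to_list, add_to_env, h) are made up for illustration and are not from the original thread.

```r
# Hypothetical helpers: each tries to increment a counter held by its argument.
add_to_list <- function(lst) { lst$count <- lst$count + 1; invisible(lst) }
add_to_env  <- function(env) { env$count <- env$count + 1; invisible(env) }

lst <- list(count = 0)
env <- new.env()
env$count <- 0

add_to_list(lst)  # changes a local copy only; the result is discarded
add_to_env(env)   # changes the caller's environment in place

lst$count  # still 0
env$count  # now 1

# An environment used as a string-keyed hash table.
h <- new.env(hash = TRUE)
assign("apple", 3, envir = h)
h[["pear"]] <- 5
keys <- sort(ls(h))                                      # "apple" "pear"
has_plum <- exists("plum", envir = h, inherits = FALSE)  # FALSE
```

The in-place update in add_to_env is exactly the side effect being warned about: nothing at the call site shows that env has changed.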

Harlan
@Harlan - So whenever I come across a new trick, I should avoid it because I don't understand it? Often I perform operations on large covariance matrices by passing them from function to function. Would using environments improve performance in this situation enough to warrant using them?
John A. Ramey
I'm not entirely sure of the implementation details, but I believe that if you don't modify the large matrices within the functions, they're not actually copied. As to your larger question, I'd advise that if you need the speed, it may be worth learning the wizardry; just keep in mind that it's a (mild) abuse of the language's semantics to do so, and that you may regret it later. Or, you may not regret it!
Harlan
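A rough way to check the copying claim yourself, assuming your R build has memory profiling enabled (tracemem() is only useful in that case, hence the capabilities() guard). The function names read_only and modifies are invented for this sketch.

```r
m <- matrix(0, 1000, 1000)

read_only <- function(x) sum(x)                # never writes to x
modifies  <- function(x) { x[1, 1] <- 1; x }   # writes to x, which forces a copy

traced <- capabilities("profmem")
if (traced) invisible(tracemem(m))  # report any duplication of m from here on

s  <- read_only(m)  # no copy should be reported for this call
m2 <- modifies(m)   # tracemem should report a copy inside modifies()

if (traced) untracemem(m)

m[1, 1]   # 0: the caller's matrix is untouched
m2[1, 1]  # 1: only the returned copy changed
```

If no copy is reported for read_only(), then passing large matrices to functions that only read them costs nothing extra, and an environment would not buy any speed there.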
+1 To touch on Harlan's concerns: yes, this is a dangerous usage because it introduces "side-effects". Whenever you allow a function to alter the outside world, you are opening yourself up to unexpected behavior. http://en.wikipedia.org/wiki/Side_effect_(computer_science)
Shane
+1 to Shane's comment. In this increasingly parallelized world it is good practice to start cutting back on uses of side effects.
Sharpie
+7  A: 

While I agree with Harlan's overall advice (i.e. don't use something unless you understand it), I would add:

Environments are a fundamental concept in R, and in my view, extremely useful (in other words: they're worth understanding!). Environments are very important to understand issues related to scope. Some basic things that you should understand in this context:

  1. search(): will show you the search path; environments are listed in the order in which R searches them. The main environment is .GlobalEnv, and can always be referenced as such.
  2. ls(): will show you what's contained in an environment
  3. attach/detach: attach adds an object (e.g. a list or data frame) to the search path as a new environment; detach removes it again
  4. get, assign, <<-, and <-: you should know the difference between these functions
  5. with: one method for working with an environment without attaching it.
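To make the list above concrete, here is a small base R sketch of how get, assign, <- and <<- differ, plus with(). The names e, make_counter and f are illustrative only.

```r
e <- new.env()
assign("x", 10, envir = e)    # create x inside e, not in the calling environment
x_val <- get("x", envir = e)  # look x up by name in e: 10
e_names <- ls(e)              # "x"

# <<- assigns in the enclosing environment, so n survives between calls.
make_counter <- function() {
  n <- 0
  function() {
    n <<- n + 1
    n
  }
}
counter <- make_counter()
first  <- counter()  # 1
second <- counter()  # 2

# <- inside a function creates a local variable; the caller's y is untouched.
y <- 1
f <- function() { y <- 99; invisible(NULL) }
f()
y  # still 1

# with() evaluates an expression using the contents of a list (or environment)
# without attaching it to the search path.
total <- with(list(a = 2, b = 3), a + b)  # 5
doubled <- with(e, x * 2)                 # 20
```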

Another pointer: have a look at the proto package (used in ggplot), which uses environments to provide controlled inheritance.

Lastly, I would point out that environments are very similar to lists: they can both store any kind of object within them (see this question). But depending on your use case (e.g. do you want to deal with inheritance and priority), a list can be easier to work with. And you can always attach a list as an environment.
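A quick sketch of the main practical difference, using only base R (cfg and the other names are invented for illustration): assigning a list to a new name gives you an independent copy as soon as you modify it, while assigning an environment gives you a second name for the same object. list2env() and as.list() convert between the two.

```r
cfg <- list(alpha = 0.05, n = 100)

# Lists have copy semantics.
cfg_copy <- cfg
cfg_copy$alpha <- 0.5
cfg$alpha  # 0.05: the original list is unchanged

# list2env() copies the list's elements into a new environment.
cfg_env <- list2env(cfg)
cfg_env$n <- 200
cfg$n      # 100: the list is unaffected by changes to the environment

# Environments have reference semantics: alias and cfg_env are the same object.
alias <- cfg_env
alias$alpha <- 0.01
cfg_env$alpha  # 0.01

# Convert back to a list (element order is not guaranteed).
back <- as.list(cfg_env)
```

The aliasing in the last part is the gotcha to watch for: there is no cheap "copy" of an environment via plain assignment.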

Edit: If you want to see an example of proto at work in ggplot, have a look at the structure of a ggplot object, which is essentially a list composed partially of environments:

> p <- qplot(1:10, 1:10)
> str(p)
List of 8
 $ data       :'data.frame':    0 obs. of  0 variables
 $ layers     :List of 1
  ..$ :proto object 
 .. .. $ legend     : logi NA 
 .. .. $ inherit.aes: logi TRUE 
...
> class(p$layers[[1]])
[1] "proto"       "environment"
> is.environment(p$layers[[1]])
[1] TRUE

Notice that it's constructed using proto and therefore contains many environments. You can also plot the relationships in these objects using graph.proto.

Shane
I'm hoping to understand them so that I can potentially use them. I'm somewhat familiar with the scoping rules in R and with most of the functions that you have listed, but I will explore their details in more depth. Thanks for the info.
John A. Ramey
Completely agree, Shane! It's important to understand environments and scoping in R if you're building any significant amount of code! But that doesn't necessarily imply you should use environments as data structures.
Harlan
@Harlan: I completely agree. Maybe I should be more forceful on that front. @John: Don't use environments unless you (1) understand them and (2) have a good reason to do so. A list is generally a better option. IMO, it's a best practice to avoid side-effects unless you absolutely can't!
Shane