So... I'm new to scheme r6rs, and am learning macros. Can somebody explain to me what is meant by 'hygiene'?
Thanks in advance.
So... I'm new to scheme r6rs, and am learning macros. Can somebody explain to me what is meant by 'hygiene'?
Thanks in advance.
If you imagine that a macro is simply expanded into the place where it is used, then you can also imagine that if you use a variable a
in your macro, there might already be a variable a
defined at the place where that macro is used.
This is not the a
that you want!
A macro system in which something like this cannot happen, is called hygienic.
There are several ways to deal with this problem. One way is simply to use very long, very cryptic, very unpredictable variable names in your macros.
A slightly more refined version of this is the gensym
approach used by some other macro systems: instead of you, the programmer coming up with a very long, very cryptic, very unpredictable variable name, you can call the gensym
function which generates a very long, very cryptic, very unpredictable and unique variable name for you.
And like I said, in a hygienic macro system, such collisions cannot happen in the first place. How to make a macro system hygienic is an interesting question in itself, and the Scheme community has spent several decades on this question, and they keep coming up with better and better ways to do it.
Here's what I found. Explaining what it means is another matter altogether!
http://www.r6rs.org/final/html/r6rs-lib/r6rs-lib-Z-H-1.html#node_toc_node_sec_12.1
I'm so glad to know that this language is still being used! Hygienic code is code that when injected (via a macro) does not cause conflicts with existing variables.
There is lots of good information on Wikipedia about this: http://en.wikipedia.org/wiki/Hygienic_macro
Hygiene is often used in the context of macros. A hygienic macro doesn't use variable names that can risk interfering with the code under expansion. Here is an example. Let's say we want to define the or
special form with a macro. Intuitively,
(or a b c ... d)
would expand to something like (let ((tmp a)) (if tmp a (or b c ... d)))
. (I am omitting the empty (or)
case for simplicity.)
Now, if the name tmp
was actually added in the code like in the above sketched expansion, it would be not hygienic, and bad because it might interfere with another variable with the same name. Say, we wanted to evaluate
(let ((tmp 1)) (or #f tmp))
Using our intuitive expansion, this would become
(let ((tmp 1)) (let ((tmp #f)) (if tmp (or tmp)))
The tmp
from the macro shadows the outer-most tmp
, and so the result is #f
instead of 1
.
Now, if the macro was hygienic (and in Scheme, it's automatically the case when using syntax-rules
), then instead of using the name tmp
for the expansion, you would use a symbol that is guaranteed not to appear anywhere else in the code. You can use gensym
in Common Lisp.
Paul Graham's On Lisp has advanced material on macros.
Macros transform code: they take one bit of code and transform it into something else. As part of that transformation, they may surround that code with more code. If the original code references a variable a
, and the code that's added around it defines a new version of a
, then the original code won't work as expected because it will be accessing the wrong a
: if
(myfunc a)
is the original code, which expects a
to be an integer, and the macro takes X
and transforms it to
(let ((a nil)) X)
Then the macro will work fine for
(myfunc b)
but (myfunc a)
will get transformed to
(let ((a nil)) (myfunc a))
which won't work because myfunc
will be applied to nil
rather than the integer it is expecting.
A hygienic macro avoids this problem of the wrong variable getting accessed (and a similar problem the other way round), by ensuring that the names used are unique.
Wikipedia has a good explanation of hygienic macros.
Apart from all the things mentioned, there is one important other thing to Scheme's hygienic macros, which follow from the lexical scope.
Say we have:
(syntax-rules () ((_ a b) (+ a b)))
As part of a macro, surely it will insert the +, it will also insert it when there's a + already there, but then another symbol which has the same meaning as +
. It binds symbols to the value they had in the lexical environment in which the syntax-rules
lies, not where it is applied, we are lexically scoped after all. It will most likely insert a completely new symbol there, but one which is globally bound to the same meaning as +
is at the place the macro is defined. This is most handy when we use a construct like:
(let ((+ *))
; piece of code that is transformed
)
The writer, or user of the macro thus needn't be occupied with ensuring its use goes well.