Hi,
Can someone suggest articles that explain the concept of homoiconicity, especially using Clojure? Why is it that Clojure is homoiconic, but it's hard to do that in other languages such as Java?
It almost seems too obvious, but a first set of sources might be:
http://en.wikipedia.org/wiki/Homoiconicity
http://c2.com/cgi/wiki?DefinitionOfHomoiconic
Homoiconicity is explained there in general terms, and you can also find the original sources. Since it is explained using the example of Lisp, it is not that far from Clojure.
When I was learning Lisp, the idea of homoiconicity made sense once I learned that Lisp is "compiled" in two phases, reading and compiling, and that the code is represented with the same data structure in both of these.

So it's pretty much s-expressions all the way from your brain to the .class file. You even write s-expressions that write s-expressions. So you can say that "the code is the data", or "code is data", because that just sounds better.
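To make that concrete, here is a minimal REPL sketch (my own illustration, not part of the original answer): read-string turns text into an ordinary Clojure list, and eval turns that list into a result.

;; reading: text -> data (an ordinary list you can inspect like any other)
(def form (read-string "(+ 1 2)"))
(class form)   ;; => clojure.lang.PersistentList
(first form)   ;; => +  (just a symbol sitting in a list)
;; evaluating/compiling: data -> result
(eval form)    ;; => 3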
Before I proceed with some things I wanted to add another answer for, here's one more reference -- the part related to homoiconicity is fairly short, but it is Rich Hickey doing the explaining! Channel 9 has this nice video with Rich Hickey and Brian Beckman talking about Clojure. Concurrency is, understandably, the major focus, but homoiconicity does get its own (short) moment of screen time during which Rich nicely explains the interplay between read (the function which converts concrete syntax as written down by the programmer to the internal representation built out of lists etc.) and eval. He has this nice diagram showing how eval never even knows that the code it evaluates comes from read operating on a text file... Arthur has already explained the gist behind that, but hey, watch it anyway, it's a very nice video!
A disclaimer: I'll be mentioning Java and Python as examples below the next horizontal bar. I want to make clear that the following is just a rough sketch of why I think it might be difficult to make a homoiconic, Lisp-style-macro-enabled Java or Python; it's just an academic exercise, though, and I don't want to consider the question of whether there's any reason to try in the first place. Also, I don't want to imply that the syntax of a language with Lisp style macros must contain explicit delimiters for tree structures; Dylan (the paren-less Lisp?) apparently provides a counterexample. Finally, I use the expression Lisp style macros because I'm only examining Lisp style macros. The language Forth, for example, has a different macro facility which I don't really understand except that I know it to enable wicked cool looking code. Apparently syntax extensions can be implemented in a number of ways. With this out of the way...
I'd like to address the second part of your question -- how is it that most programming languages are considered not to be homoiconic? I'll have to touch upon the semantics of Lisp in the process, but since Nils has already provided links to good sources of information on the term "homoiconic" itself and Arthur has described the read -> macro expand -> compile cycle as found in Clojure, I'll be building on that in what follows. To start things off, let me quote a passage from Alan Kay (extracted from the Wikipedia article which also links to the original source):
[...] Interactive LISP [...] and TRAC [...] both are "homoiconic" in that their internal and external representations are essentially the same.
(Those [...] bits hide a lot of text, but the gist is unchanged.)
Now, let's ask ourselves the question: what is Java's internal representation of Java? ... Well, this doesn't even make sense. The Java compiler does have a certain internal representation of Java, namely an abstract syntax tree; to construct a "homoiconic Java", we'd have to make that AST representation a first-class object in Java and devise a syntax which would allow us to write ASTs directly. That could prove to be rather hard.
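By contrast, here is a rough Clojure sketch (my own, purely for illustration) of what having the AST as a first-class object looks like: a form is just a list, so you can build it with ordinary functions and hand it to eval.

;; Building a piece of "AST" with plain list operations and evaluating it.
(def ast (list '* (list '+ 1 2) 4))   ;; the data structure is the code: (* (+ 1 2) 4)
(eval ast)                            ;; => 12
;; Java offers no comparable move; its AST lives only inside the compiler.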
Python provides an example of a non-homoiconic language which is interesting in that it currently ships with an AST-manipulation toolkit in the form of the ast module. The docs for that module explicitly state that Python ASTs may change between releases, which may or may not be discouraging; still, I suppose an industrious programmer could take the ast module, devise a syntax (maybe S-expression based, maybe XML-based) for describing Python ASTs directly and construct a parser for that syntax in regular Python using ast, thus taking a solid first step towards creating a homoiconic language with Python semantics. (I believe I came across a dialect of Lisp compiling to Python bytecode some time ago... I wonder if it might be doing something like that at some level?)
Even then the problem remains of extracting concrete benefits from that kind of homoiconicity. It's viewed as a beneficial property of members of the Lisp family of languages because it allows us to write programmes which write further programmes, among which macros are the most notable. Now, while macros are enabled in one way by the fact that it is so easy to manipulate the internal representation of Lisp code in Lisp, they are also enabled in an equally important way by the Lisp execution model: a Lisp programme is just a collection of Lisp forms; these are processed by the Lisp function eval, which is responsible for determining the values of the expressions and causing the appropriate side-effects at the correct time; the semantics of Lisp are exactly the semantics of eval. The question of how things work internally to preserve this semantic illusion while being reasonably fast is an implementation detail; a Lisp system has an obligation to expose a function eval to the programmer and to act as if Lisp programmes were being processed by that function.
In modern Lisp systems, it is a part of eval's contract that it performs an additional preprocessing phase during which macros are expanded prior to evaluating the code (or compiling and running it, as the case may be). That particular facility is not a necessary part of a Lisp system, but it is just so easy to plug it into this execution model! Also, I wonder if this isn't the only execution model which makes the Lisp kind of macro transformations manageable, which would mean that any language seeking to incorporate Lisp-style macros would have to adopt a similar execution model. My intuition tells me that this is indeed the case.
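To make that expansion phase concrete, here is a small Clojure sketch with a hypothetical unless macro (the name and example are mine, not from the answer above); macroexpand-1 shows the rewrite that happens before anything is evaluated.

;; A macro is a function from code-as-data to code-as-data.
(defmacro unless [test then else]
  (list 'if test else then))

(macroexpand-1 '(unless false :a :b))   ;; => (if false :b :a)
(unless false :a :b)                    ;; => :a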
Of course once a language is written down in notation directly paralleling its ASTs and uses a Lisp-like execution model with an evaluator function / object, one must wonder if it isn't by any chance another dialect of Lisp... even if its AST-paralleling syntax happens to be XML-based. shudder
The whole idea of 'homoiconicity' is slightly confused and does not fit well into Lisp. Internal and external representations are not the same in Lisp. External representation is based on characters in files. The internal representation is based on Lisp data (numbers, strings, lists, arrays, ...) and is non-textual. How is that the same as characters?
The main difference between Lisp and many other programming languages is that Lisp has a simple data representation for source code, one which is not based on strings. Obviously code can be represented as strings in text-based programming languages, but in Lisp the source can be represented in terms of primitive Lisp data structures. The external representation is based on s-expressions, which are a simple model for representing hierarchical data as text. The internal representation is based on lists, symbols, numbers, and so on.
The basic model: the external, textual representation is turned into Lisp data by READ; EVAL (or the compiler) operates on that Lisp data; PRINT turns Lisp data back into text.
Note that READ and PRINT work for arbitrary Lisp data, and not only for Lisp forms.
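As a rough illustration (my own sketch), in Clojure read-string plays the role of READ and pr-str the role of PRINT, and both handle data that is not code at all:

;; READ: external text -> internal data (arbitrary data, not just forms)
(read-string "{:a 1, :b (2 3)}")   ;; => {:a 1, :b (2 3)}  -- a map holding a list
;; PRINT: internal data -> external text
(pr-str (list '+ 'x 1))            ;; => "(+ x 1)"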
Here's a short program to do symbolic differentiation. This is an example of LISP manipulating its own code. Try translating it to another language to see why LISP is good for this sort of thing.
;; The simplest possible symbolic differentiator
;; Functions to create and unpack additions like (+ 1 2)
(defn make-add [a b] (list '+ a b))
(defn addition? [x] (and (= (count x) 3) (= (first x) '+)))
(defn add1 [x] (second x))
(defn add2 [x] (second (rest x)))
;; Similar for multiplications (* 1 2)
(defn make-mul [a b] (list '* a b))
(defn multiplication? [x] (and (= (count x) 3) (= (first x) '*)))
(defn mul1 [x] (second x))
(defn mul2 [x] (second (rest x)))
;; Differentiation.
(defn deriv [exp var]
  (cond (number? exp) 0                          ;; d/dx c -> 0
        (symbol? exp) (if (= exp var) 1 0)       ;; d/dx x -> 1, d/dx y -> 0
        (addition? exp) (make-add (deriv (add1 exp) var)
                                  (deriv (add2 exp) var))      ;; d/dx (a + b) -> d/dx a + d/dx b
        (multiplication? exp) (make-add (make-mul (deriv (mul1 exp) var) (mul2 exp))
                                        (make-mul (mul1 exp) (deriv (mul2 exp) var)))   ;; d/dx (a * b) -> (d/dx a) * b + a * (d/dx b)
        :else :error))
;;an example of use: create the function x -> x^3 + 2x^2 + 1 and its derivative
(def poly '(+ (+ (* x (* x x)) (* 2 (* x x))) 1))
(defn poly->fnform [poly] (list 'fn '[x] poly))
(def polyfn (eval (poly->fnform poly)))
(def dpolyfn (eval (poly->fnform (deriv poly 'x))))
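For what it's worth, a REPL session with the code above would look roughly like this (results shown as comments; the numbers match the analytic derivative 3x^2 + 4x of x^3 + 2x^2 + 1):

(deriv '(* x x) 'x)   ;; => (+ (* 1 x) (* x 1))  -- unsimplified, but correct
(polyfn 2)            ;; => 17   (8 + 8 + 1)
(dpolyfn 2)           ;; => 20   (12 + 8)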