tags:
views: 516
answers: 5

Hi,

Can someone suggest articles that explain the concept of homoiconicity, especially using Clojure? Why is it that Clojure is homoiconic, but it's hard to achieve the same in other languages such as Java?

+1  A: 

It almost seems too obvious, but the first sources might be:

http://en.wikipedia.org/wiki/Homoiconicity

http://c2.com/cgi/wiki?DefinitionOfHomoiconic

Homoiconicity is explained there in general, and you can also find the original sources. Since it is explained using the example of Lisp, it is not that far from Clojure.

Nils Schmidt
So if I try the Wikipedia example in Java: 1. A Java program has a class with a String field that is itself a Java program (properly escaped). 2. Manipulate it using a regex to replace some substring, such as COS with SIN. 3. Call an external program (a Java compiler) to compile and execute the modified String after writing it to a file. 4. The output would be the execution of the program that was a String in the parent class. 5. You could just as well take the String field as an input to the main method of the parent class. So does that imply that Java is homoiconic as well?
Icarus
Surely the bytecode representation of class files in Java is not the same as the syntax for Java code...
Icarus
To add to the first case, I would have a ClassLoader load the class files generated by compiling the above program, and use reflection to execute methods on the newly loaded classes.
Icarus
Icarus: You're using one tool (regex) to parse your Java program as text. You're using a second tool (compiler) to rebuild the changed program. Nowhere did you suggest how Java programs are primarily represented in some Java-primitive data structure, as required by the first sentence of the WP article. What primitive data structure holds sin(x)?
Ken
Ken: a string object.
Rainer Joswig
+3  A: 

When I was learning Lisp, the idea of homoiconicity made sense once I learned that Lisp is "compiled" in two phases, reading and compiling, and that the code is represented with the same data structure in both:

  • first you think of an s-expression in your head
  • then you type the s-expression as characters in a file
  • then the reader translates the characters in the file into s-expressions. It's not compiling the program, just building data structures from characters; this is the reading phase.
  • then, at macro expansion time, each expression is examined to decide whether it is a macro call, and if so the macro is run to produce another s-expression. So at this point we have gone from s-expressions to characters to s-expressions, then from s-expressions to different s-expressions.
  • these s-expressions are then compiled into .class files that can be run by the JVM; this is the second, "compiling" phase.

So it's pretty much s-expressions all the way from your brain to the .class file. You even write s-expressions that write s-expressions, so you can say that "the code is the data", or "code is data", because that just sounds better.
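A hypothetical REPL session can make these phases concrete. Nothing here is special machinery: read-string, macroexpand, and eval are all ordinary Clojure functions:

```clojure
;; Reading phase: characters -> data structures.
(def form (read-string "(+ 1 (* 2 3))"))
(list? form)    ; => true -- an ordinary Clojure list
(first form)    ; => + (a symbol, not yet a function)

;; Macro expansion: one s-expression transformed into another,
;; before any compilation happens.
(macroexpand '(when true :a :b))
; => (if true (do :a :b))

;; Evaluation: the data structure is handed to the compiler and run.
(eval form)     ; => 7
```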

Arthur Ulfeldt
Actually it's misleading to imply that macro expansion happens inside the reader. Note how `(eval '(when true (println :foo) (println :bar)))` works as expected, even though the reader clearly may not mangle the quoted list structure `'(when true (println :foo) (println :bar))`. The standard nomenclature distinguishes between read time (character stream -> data structures conversion) and macro expansion time (data structure transformation) too. The fact remains that macro expansion happens before compilation, though, which is the key point to understanding macros.
Michał Marczyk
thanks for pointing that out. I removed the comment about that being part of the reader.
Arthur Ulfeldt
The reader does not translate characters into s-expressions. The reader translates s-expressions into data.
Rainer Joswig
The reader also does not care about macros and does not run macros to produce other s-expressions. Macros and the reader are not related in any way.
Rainer Joswig
+12  A: 

Before I proceed with some things I wanted to add another answer for, here's one more reference -- the part related to homoiconicity is fairly short, but it is Rich Hickey doing the explaining! Channel 9 has this nice video with Rich Hickey and Brian Beckman talking about Clojure. Concurrency is, understandably, the major focus, but homoiconicity does get its own (short) moment of screen time during which Rich nicely explains the interplay between read (the function which converts concrete syntax as written down by the programmer to the internal representation built out from lists etc.) and eval. He has this nice diagram showing how eval never even knows that the code it evaluates comes from read operating on a text file... Arthur has already explained the gist behind that, but hey, watch it anyway, it's a very nice video!


A disclaimer: I'll be mentioning Java and Python as examples below the next horizontal bar. I want to make clear that the following is just a rough sketch of why I think it might be difficult to make a homoiconic, Lisp-style-macro-enabled Java or Python; it's just an academic exercise, though, and I don't want to consider the question of whether there's any reason to try in the first place. Also, I don't want to imply that the syntax of a language with Lisp style macros must contain explicit delimiters for tree structures; Dylan (the paren-less Lisp?) apparently provides a counterexample. Finally, I use the expression Lisp style macros because I'm only examining Lisp style macros. The language Forth, for example, has a different macro facility which I don't really understand except that I know it to enable wicked cool looking code. Apparently syntax extensions can be implemented in a number of ways. With this out of the way...


I'd like to address the second part of your question -- how is it that most programming languages are considered not to be homoiconic? I'll have to touch upon the semantics of Lisp in the process, but since Nils has already provided links to good sources of information on the term "homoiconic" itself and Arthur has described the read -> macro expand -> compile cycle as found in Clojure, I'll be building on that in what follows. To start things off, let me quote a passage from Alan Kay (extracted from the Wikipedia article which also links to the original source):

[...] Interactive LISP [...] and TRAC [...] both are "homoiconic" in that their internal and external representations are essentially the same.

(Those [...] bits hide a lot of text, but the gist is unchanged.)

Now, let's ask ourselves the question: what is Java's internal representation of Java? ... Well, this doesn't even make sense. The Java compiler does have a certain internal representation of Java, namely an abstract syntax tree; to construct a "homoiconic Java", we'd have to make that AST representation a first-class object in Java and devise a syntax which would allow us to write ASTs directly. That could prove to be rather hard.

Python provides an example of a non-homoiconic language which is interesting in that it currently ships with an AST-manipulation toolkit in the form of the ast module. The docs for that module explicitly state that Python ASTs may change between releases, which may or may not be discouraging; still, I suppose an industrious programmer could take the ast module, devise a syntax (maybe S-expression based, maybe XML-based) for describing Python ASTs directly and construct a parser for that syntax in regular Python using ast, thus taking a solid first step towards creating a homoiconic language with Python semantics. (I believe I came across a dialect of Lisp compiling to Python bytecode some time ago... I wonder if it might be doing something like that at some level?)

Even then the problem remains of extracting concrete benefits from that kind of homoiconicity. It's viewed as a beneficial property of members of the Lisp family of languages because it allows us to write programmes which write further programmes, among which macros are the most notable. Now, while macros are enabled in one way by the fact that it is so easy to manipulate the internal representation of Lisp code in Lisp, they are also enabled in an equally important way by the Lisp execution model: a Lisp programme is just a collection of Lisp forms; these are processed by the Lisp function eval which is responsible for determining the values of the expressions and causing the appropriate side-effects at the correct time; the semantics of Lisp are exactly the semantics of eval. The question of how things work internally to preserve this semantic illusion while being reasonably fast is an implementation detail; a Lisp system has an obligation to expose a function eval to the programmer and to act as if Lisp programmes were being processed by that function.

In modern Lisp systems, it is a part of eval's contract that it performs an additional preprocessing phase during which macros are expanded prior to evaluating the code (or compiling and running, as the case may be). That particular facility is not a necessary part of a Lisp system, but it is just so easy to plug it into this execution model! Also, I wonder if this isn't the only execution model which makes the Lisp kind of macro transformations manageable, which would mean that any language seeking to incorporate Lisp-style macros would have to adopt a similar execution model. My intuition tells me that this is indeed the case.
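To make the preprocessing phase concrete: a macro is, at bottom, a function from s-expression to s-expression, run before evaluation. The `unless` macro below is a made-up example (Clojure ships `if-not` instead), but it shows the data-in, data-out nature of expansion:

```clojure
;; A macro is a function from s-expression to s-expression.
(defmacro unless [test then else]
  (list 'if test else then))

;; Expansion time: data structure in, data structure out.
(macroexpand '(unless false :yes :no))
; => (if false :no :yes)

;; Evaluation only ever sees the expanded form:
(eval '(unless false :yes :no))
; => :yes
```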

Of course once a language is written down in notation directly paralleling its ASTs and uses a Lisp-like execution model with an evaluator function / object, one must wonder if it isn't by any chance another dialect of Lisp... even if its AST-paralleling syntax happens to be XML-based. shudder

Michał Marczyk
+2  A: 

The whole idea of 'homoiconicity' is slightly confused and does not fit well into Lisp. Internal and external representations are not the same in Lisp. External representation is based on characters in files. The internal representation is based on Lisp data (numbers, strings, lists, arrays, ...) and is non-textual. How is that the same as characters?

The main difference between Lisp and many other programming languages is that Lisp has a simple data representation for source code, one which is not based on strings. Obviously code can be represented as strings in text-based programming languages, but in Lisp the source can be represented in terms of primitive Lisp data structures. The external representation is based on s-expressions, which are a simple model for representing hierarchical data as text. The internal representation is based on lists, etc.

The basic model:

  • READ translates external s-expressions into data
  • EVAL takes Lisp forms in the form of Lisp data and evaluates them
  • PRINT translates Lisp data into external s-expressions

Note that READ and PRINT work for arbitrary Lisp data, and not only for Lisp forms.
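In Clojure terms, using read-string and pr-str as stand-ins for READ and PRINT, the model looks like this:

```clojure
;; READ: external s-expression (text) -> internal data
(read-string "(+ 1 2)")           ; => the list (+ 1 2), not the number 3

;; EVAL: Lisp data, treated as a form -> value
(eval (read-string "(+ 1 2)"))    ; => 3

;; PRINT: internal data -> external s-expression (text)
(pr-str (read-string "(+ 1 2)"))  ; => "(+ 1 2)"

;; READ and PRINT also work on data that is not code at all:
(read-string "[1 {:a 2}]")        ; => [1 {:a 2}]
```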

Rainer Joswig
A: 

Here's a short program to do symbolic differentiation. This is an example of LISP manipulating its own code. Try translating it to another language to see why LISP is good for this sort of thing.

;; The simplest possible symbolic differentiator

;; Functions to create and unpack additions like (+ 1 2)
(defn make-add [a b] (list '+ a b))
(defn addition? [x] (and (= (count x) 3) (= (first x) '+)))
(defn add1   [x] (second x))
(defn add2   [x] (second (rest x)))

;; Similar for multiplications (* 1 2)
(defn make-mul [a b] (list '* a b))
(defn multiplication? [x] (and (= (count x) 3) (= (first x) '*)))
(defn mul1   [x] (second x))
(defn mul2   [x] (second (rest x)))

;; Differentiation. 
(defn deriv [exp var]
  (cond (number? exp) 0                                                              ;; d/dx c -> 0
        (symbol? exp) (if (= exp var) 1 0)                                           ;; d/dx x -> 1, d/dx y -> 0
        (addition? exp) (make-add (deriv (add1 exp) var) (deriv (add2 exp) var))     ;; d/dx a+b -> d/dx a + d/dx b
        (multiplication? exp) (make-add (make-mul (deriv (mul1 exp) var) (mul2 exp)) ;; d/dx a*b -> d/dx a * b + a * d/dx b
                                        (make-mul (mul1 exp) (deriv (mul2 exp) var)))
        :else :error))

;; An example of use: create the function x -> x^3 + 2x^2 + 1 and its derivative
(def poly '(+ (+ (* x (* x x)) (* 2 (* x x))) 1))

(defn poly->fnform [poly] (list 'fn '[x] poly))

(def polyfn  (eval (poly->fnform poly)))
(def dpolyfn (eval (poly->fnform (deriv poly 'x))))
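For the curious, here is what a REPL session with the definitions above might look like (the derivative comes back unsimplified, since nothing in the program simplifies the resulting s-expressions):

```clojure
(deriv '(+ x 3) 'x)   ; => (+ 1 0)
(deriv '(* x x) 'x)   ; => (+ (* 1 x) (* x 1))

(polyfn 2)            ; x^3 + 2x^2 + 1 at x = 2  => 17
(dpolyfn 2)           ; 3x^2 + 4x     at x = 2  => 20
```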
John Lawrence Aspden