views:

659

answers:

6

I'm working on a macro system for Python (as discussed here) and one of the things I've been considering are units of measure. Although units of measure could be implemented without macros or via static macros (e.g. defining all your units ahead of time), I'm toying around with the idea of allowing syntax to be extended dynamically at runtime.

To do this, I'm considering using a sort of partial evaluation on the code at compile-time. If parsing fails for a given expression, due to a macro for its syntax not being available, the compiler halts evaluation of the function/block and generates the code it already has with a stub where the unknown expression is. When this stub is hit at runtime, the function is recompiled against the current macro set. If this compilation fails, a parse error would be thrown because execution can't continue. If the compilation succeeds, the new function replaces the old one and execution continues.

The biggest issue I see is that you can't find parse errors until the affected code is run. However, this wouldn't affect many cases, e.g. group operators like [], {}, (), and `` still need to be paired (requirement of my tokenizer/list parser), and top-level syntax like classes and functions wouldn't be affected since their "runtime" is really load time, where the syntax is evaluated and their objects are generated.

Aside from the implementation difficulty and the problem I described above, what problems are there with this idea?

+1  A: 

Another thing I've considered is making this the default behavior across the board, but allow languages (meaning a set of macros to parse a given language) to throw a parse error at compile-time. Python 2.5 in my system, for example, would do this.

Instead of the stub idea, simply recompile functions that couldn't be handled completely at compile-time when they're executed. This will also make self-modifying code easier, as you can modify the code and recompile it at runtime.

Cody Brocious
+3  A: 

Here are a few possible problems:

  • You may find it difficult to provide the user with helpful error messages in case of a problem. This seems likely, as any compilation-time syntax error could be just a syntax extension.
  • Performance hit.

I was trying to find some discussion of the pluses, minuses, and/or implementation of dynamic parsing in Perl 6, but I couldn't find anything appropriate. However, you may find this quote from Nicklaus Wirth (designer of Pascal and other languages) interesting:

The phantasies of computer scientists in the 1960s knew no bounds. Spurned by the success of automatic syntax analysis and parser generation, some proposed the idea of the flexible, or at least extensible language. The notion was that a program would be preceded by syntactic rules which would then guide the general parser while parsing the subsequent program. A step further: The syntax rules would not only precede the program, but they could be interspersed anywhere throughout the text. For example, if someone wished to use a particularly fancy private form of for statement, he could do so elegantly, even specifying different variants for the same concept in different sections of the same program. The concept that languages serve to communicate between humans had been completely blended out, as apparently everyone could now define his own language on the fly. The high hopes, however, were soon damped by the difficulties encountered when trying to specify, what these private constructions should mean. As a consequence, the intreaguing idea of extensible languages faded away rather quickly.

Edit: Here's Perl 6's Synopsis 6: Subroutines, unfortunately in markup form because I couldn't find an updated, formatted version; search within for "macro". Unfortunately, it's not too interesting, but you may find some things relevant, like Perl 6's one-pass parsing rule, or its syntax for abstract syntax trees. The approach Perl 6 takes is that a macro is a function that executes immediately after its arguments are parsed and returns either an AST or a string; Perl 6 continues parsing as if the source actually contained the return value. There is mention of generation of error messages, but they make it seem like if macros return ASTs, you can do alright.

A. Rex
Thanks for your input. Bounties may be the best thing that's happened to SO. Going to give it a bit more time to see what else comes in, but I'll mark an accepted answer well before the bounty period is up. Thanks again :)
Cody Brocious
I wish I could help more. I don't feel this deserves your bounty =) but the bounty obviously brought my attention to your post.
A. Rex
The whole point of this question is less to get a direct answer on why this is a good/bad idea, but rather to get my mind in the right place to consider the ramifications of my idea. For that reason, this is quite a helpful response.
Cody Brocious
Just looked over the Perl link. I had forgotten all about them doing macros in perl6 -- definitely something I need to look into more. However, there doesn't seem to be a way for you to compile code that can't actually be completely compiled until later in runtime. Am I wrong in this?
Cody Brocious
I've actually never used Perl 6, but I read through the Synopses back when they came out. As far as I know, there's no straightforward way, but you could probably put something in a macro which is a placeholder for code that you compile at the very end of compilation (using an INIT/CHECK block?).
A. Rex
People who are downvoting this: can you at least let me know why? Thanks.
A. Rex
Accepted the answer so you'd get the bounty. Although this wasn't a direct answer, it's by far the most helpful :)
Cody Brocious
A: 

You'll probably need to delimit the bits of input text with unknown syntax, so that the rest of the syntax tree can be resolved, apart from some character sequences nodes which will be expanded later. Depending on your top level syntax, that may be fine.

You may find that the parsing algorithm and the lexer and the interface between them all need updating, which might rule out most compiler creation tools.

(The more usual approach is to use string constants for this purpose, which can be parsed to a little interpreter at run time).

Dickon Reed
Thanks for the input. This actually requires no special casing or changes in my parser as it is, as I parse things into an S-expression type tree and macros work on that. The parsing failure is really just not having a macro for an S-exp.
Cody Brocious
+2  A: 

Pushing this one step further, you could do "lazy" parsing and always only parse enough to evaluate the next statement. Like some kind of just-in-time parser. Then syntax errors could become normal runtime errors that just raise a normal Exception that could be handled by surrounding code:

def fun():
   not implemented yet

try:
  fun()
except:
  pass

That would be an interesting effect, but if it's useful or desirable is a different question. Generally it's good to know about errors even if you don't call the code at the moment.

Macros would not be evaluated until control reaches them and naturally the parser would already know all previous definitions. Also the macro definition could maybe even use variables and data that the program has calculated so far (like adding some syntax for all elements in a previously calculated list). But this is probably a bad idea to start writing self-modifying programs for things that could usually be done as well directly in the language. This could get confusing...

In any case you should make sure to parse code only once, and if it is executed a second time use the already parsed expression, so that it doesn't lead to performance problems.

sth
A: 

I don't think your approach would work very well. Let's take a simple example written in pseudo-code:

define some syntax M1 with definition D1
if _whatever_:
    define M1 to do D2
else:
    define M1 to do D3
code that uses M1

So there is one example where, if you allow syntax redefinition at runtime, you have a problem (since by your approach the code that uses M1 would be compiled by definition D1). Note that verifying if syntax redefinition occurs is undecidable. An over-approximation could be computed by some kind of typing system or some other kind of static analysis, but Python is not well known for this :D.

Another thing that bothers me is that your solution does not 'feel' right. I find it evil to store source code you can't parse just because you may be able to parse it at runtime.

Another example that jumps to mind is this:

...function definition fun1 that calls fun2...
define M1 (at runtime)
use M1
...function definition for fun2

Technically, when you use M1, you cannot parse it, so you need to keep the rest of the program (including the function definition of fun2) in source code. When you run the entire program, you'll see a call to fun2 that you cannot call, even if it's defined.

+1  A: 

Here are some ideas from my master's thesis, which may or may not be helpful. The thesis was about robust parsing of natural language. The main idea: given a context-free grammar for a language, try to parse a given text (or, in your case, a python program). If parsing failed, you will have a partially generated parse tree. Use the tree structure to suggest new grammar rules that will better cover the parsed text. I could send you my thesis, but unless you read Hebrew this will probably not be useful.

In a nutshell: I used a bottom-up chart parser. This type of parser generates edges for productions from the grammar. Each edge is marked with the part of the tree that was consumed. Each edge gets a score according to how close it was to full coverage, for example:

S -> NP . VP

Has a score of one half (We succeeded in covering the NP but not the VP). The highest-scored edges suggest a new rule (such as X->NP). In general, a chart parser is less efficient than a common LALR or LL parser (the types usually used for programming languages) - O(n^3) instead of O(n) complexity, but then again you are trying something more complicated than just parsing an existing language. If you can do something with the idea, I can send you further details. I believe looking at natural language parsers may give you some other ideas.

Yuval F