I'm working on a macro system for Python (as discussed here) and one of the things I've been considering are units of measure. Although units of measure could be implemented without macros or via static macros (e.g. defining all your units ahead of time), I'm toying around with the idea of allowing syntax to be extended dynamically at runtime.
To do this, I'm considering using a sort of partial evaluation on the code at compile-time. If parsing fails for a given expression, due to a macro for its syntax not being available, the compiler halts evaluation of the function/block and generates the code it already has with a stub where the unknown expression is. When this stub is hit at runtime, the function is recompiled against the current macro set. If this compilation fails, a parse error would be thrown because execution can't continue. If the compilation succeeds, the new function replaces the old one and execution continues.
The biggest issue I see is that you can't find parse errors until the affected code is run. However, this wouldn't affect many cases, e.g. group operators like [], {}, (), and `` still need to be paired (requirement of my tokenizer/list parser), and top-level syntax like classes and functions wouldn't be affected since their "runtime" is really load time, where the syntax is evaluated and their objects are generated.
Aside from the implementation difficulty and the problem I described above, what problems are there with this idea?