Often, programmers write code that generates other code.

(The technical term is metaprogramming, and it is far more common than just cross-compilers: think of every PHP web page that generates HTML, or every XSLT file.)

One area I find challenging is coming up with techniques to ensure that both the hand-written source file and the computer-generated output file are clearly indented to aid debugging. The two goals often seem to compete.

I find this particularly challenging in the PHP/HTML combination. I think that is because:

  • there is sometimes more of the HTML code in the source file than the generating PHP
  • HTML files tend to be longer than, say, SQL statements, and need better indenting
  • HTML has space-sensitive features (e.g. between tags)
  • generated HTML is more publicly visible than generated SQL statements, so there is more pressure to do a reasonable job.

What techniques do you use to address this?


Edit: I accept that there are at least three arguments to not bothering to generate pretty HTML code:

  • Complexity of generating code is increased.
  • Makes no difference to rendering by browser; developers can use Firebug or similar to view it nicely.
  • Minor performance hit - increased download time for whitespace characters.

I have certainly sometimes generated code without thought to the indenting (especially SQL).

However, there are a few arguments pushing the other way:

  • I find, in practice, that I do frequently read generated code - having extra steps to access it is inconvenient.
  • HTML has some space-sensitivity issues that bite occasionally.

For example, consider the code:

<div class="foo">
    <?php
        $fooHeader();
        $fooBody();
        $fooFooter();
    ?>
</div>

It is clearer than the following code:

<div class="foo"><?php
        $fooHeader();
        $fooBody();
        $fooFooter();
?></div>

However, it also renders differently, because of the extra whitespace it puts into the HTML.

+2  A: 

A technique that I use when the generating code dominates over the generated code is to pass an indent parameter around.

e.g., in Python, generating more Python.

def generateWhileLoop(condition, block, indentPrefix=""):
    print(indentPrefix + "while " + condition + ":")
    # generateBlock (defined elsewhere) emits each statement of block with the given prefix
    generateBlock(block, indentPrefix + "    ")

Alternatively, depending on my mood:

spacesPerIndent = 4  # assumed value; matches the 4-space prefix in the first version

def generateWhileLoop(condition, block, indentLevel=0):
    print(" " * (indentLevel * spacesPerIndent) + "while " + condition + ":")
    generateBlock(block, indentLevel + 1)

Note the assumption that condition is a short piece of text that fits on the same line, while block goes on separate, indented lines. If the generator can't be sure whether the sub-items need to be indented, this method starts to break down.

Also, this technique isn't nearly as useful for sprinkling relatively small amounts of PHP into HTML.
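For reference, here is a self-contained sketch of the indent-level approach, written as snake_case versions of `generateWhileLoop` and `generateBlock`. The original leaves `generateBlock` undefined, so the block encoding here (plain strings for statements, hypothetical `("while", condition, sub_block)` tuples for nested loops) is an assumption, and the functions append to a list rather than printing so the output can be inspected.

```python
SPACES_PER_INDENT = 4


def indent(level):
    """Leading whitespace for the given nesting level."""
    return " " * (level * SPACES_PER_INDENT)


def generate_block(block, indent_level, out):
    """Emit each statement of block at indent_level, recursing into nested loops."""
    for stmt in block:
        if isinstance(stmt, tuple):
            # Assumed encoding for a nested loop: ("while", condition, sub_block)
            _, condition, sub_block = stmt
            generate_while_loop(condition, sub_block, indent_level, out)
        else:
            out.append(indent(indent_level) + stmt)


def generate_while_loop(condition, block, indent_level, out):
    """The loop header sits at the caller's level; its body is one level deeper."""
    out.append(indent(indent_level) + "while " + condition + ":")
    generate_block(block, indent_level + 1, out)
```

Because each call only knows its own level and passes `indent_level + 1` downward, nested loops come out correctly indented without any global state.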

[Edit to clarify: I wrote the question and also this answer. I wanted to seed the answers with one technique that I do use and is sometimes useful, but this technique fails me for typical PHP coding, so I am looking for other ideas like it.]

Oddthinking
This solution doesn't work very well if you are assembling blocks of text rather than just one-liners.
Jack
In the example code, I show how single liners work (e.g. `condition`) and how blocks work (e.g. `block`). It doesn't work well when you don't know what you are expecting.
Oddthinking
I've done something similar, except that I cleaned it up by wrapping it in a class. I'd have something like an IndentedOutput class with an indentation counter. Then my calls for above would look like: out.line("while " + condition + ":"), out.indent(), generateBlock(...), out.unindent()
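Boojum's class-based variant might look something like this Python sketch. Only the names `IndentedOutput`, `line`, `indent` and `unindent` come from the comment. The constructor argument, `getvalue`, the error on an unbalanced `unindent`, and the `indented()` context manager are assumptions added for illustration.

```python
from contextlib import contextmanager


class IndentedOutput:
    """Collects generated lines, prefixing each with the current indentation."""

    def __init__(self, unit="    "):
        self.unit = unit
        self.level = 0
        self.lines = []

    def line(self, text=""):
        # Blank lines get no trailing whitespace.
        self.lines.append(self.unit * self.level + text if text else "")

    def indent(self):
        self.level += 1

    def unindent(self):
        if self.level == 0:
            raise ValueError("unindent() called more times than indent()")
        self.level -= 1

    @contextmanager
    def indented(self):
        # Hypothetical convenience: pairs indent()/unindent() even if the body raises.
        self.indent()
        try:
            yield self
        finally:
            self.unindent()

    def getvalue(self):
        return "\n".join(self.lines) + "\n"
```

Usage mirrors the comment: `out.line("while " + condition + ":")`, then the block is generated inside `with out.indented():`. Keeping the counter in one object means generator functions no longer need an extra indentation parameter.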
Boojum
+3  A: 

In the more general case, I have written XSLT code that generates C++ database interface code. Although at first I tried to output correctly indented code from the XSLT, this quickly became untenable. My solution was to completely ignore formatting in the XSLT output, and then run the resulting very long line of code through GNU indent. This produced a reasonably formatted C++ source file suitable for debugging.

I can imagine the problem gets a lot more prickly when dealing with combined source such as HTML and PHP.

Greg Hewgill
This addresses C++, but the author of the post specifically mentions PHP and HTML. As a common PHP programmer, I find this problem vexing, especially when formatting arrays (Ruby also has this problem). What's the answer in this specific case?
American Yak
@American Yak: I don't know. Does PHP have an equivalent to `indent`?
Greg Hewgill
+2  A: 

Generate an AST, then traverse it in order and emit properly formatted source code.
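Applied to HTML, the AST idea could look like the following minimal Python sketch: a hypothetical `Element` node type and a recursive `render` that emits each node at its depth. It assumes text should be HTML-escaped, and it does not special-case whitespace-sensitive content. Putting every text node on its own line is exactly the kind of rendering change the question warns about.

```python
import html
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Element:
    """Hypothetical minimal HTML node: a tag name plus child elements or text."""
    tag: str
    children: List[Union["Element", str]] = field(default_factory=list)


def render(node, depth=0, unit="  "):
    """Walk the tree depth-first, returning one output line per tag or text node."""
    pad = unit * depth
    if isinstance(node, str):
        # Text nodes are escaped and placed on their own line at the current depth.
        return [pad + html.escape(node)]
    lines = [f"{pad}<{node.tag}>"]
    for child in node.children:
        lines.extend(render(child, depth + 1, unit))
    lines.append(f"{pad}</{node.tag}>")
    return lines
```

Formatting decisions live only in `render`, so the code that builds the tree never has to think about indentation. For generating Python itself, the standard library's `ast.unparse` (Python 3.9+) performs this kind of traversal over a real Python AST.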

Watson Ladd
+1  A: 

I agree with oddthinking's answer.

Sometimes it's best to solve the problem by inverting it. If you find yourself generating a whole lot of text, consider whether it's easier to write the text as a template with small bits of intelligent generation code. Alternatively, see whether you can break the problem down into a series of small templates that you assemble, indenting each template as a whole.
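Here is a small Python sketch of assembling small templates and indenting each one as a whole. It uses the standard library's `string.Template` and `textwrap.indent`. The `ROW` and `TABLE` templates and `render_table` are hypothetical. Each template is written with its own local indentation, and the caller shifts the whole fragment to wherever it is nested.

```python
import html
import textwrap
from string import Template

# Each template is indented relative to itself only.
ROW = Template("<tr>\n  <td>$name</td>\n  <td>$qty</td>\n</tr>")
TABLE = Template("<table>\n$rows\n</table>")


def render_table(items):
    """Fill one ROW per (name, qty) pair, then indent the rows as a block inside TABLE."""
    rows = "\n".join(
        ROW.substitute(name=html.escape(name), qty=qty) for name, qty in items
    )
    return TABLE.substitute(rows=textwrap.indent(rows, "  "))
```

The templates never need to know how deeply they will end up nested. Only the assembling code decides the extra indentation.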

Schwern
+1  A: 

When making websites in PHP, I find mixing HTML with function-specific PHP problematic: it makes the code harder to take in at a glance and makes debugging harder. One way to avoid the mixing is to use template-driven content; see Smarty, for example. Besides better indentation, templating content is useful for other things, such as faster patching. If a customer requires a change in the layout, that particular layout issue can be found and fixed quickly without touching the functional PHP code that generates the data (and vice versa).

CooPs
A: 

Specifically on HTML generation - why does it matter?

You're spending a heck of a lot of time passing around indenting parameters and trying to figure out how deeply nested you are, etc. Aside from being a general waste of time (since there is no difference in the final rendered output), how do you maintain all this stuff as you add other HTML markup and wrap pages in a div, etc.?

Anyway, install Firebug (and IE developer toolbar for testing IE afterwards) and they both show you the HTML in the nested format, AND you can just click on the page element to directly view the markup - WAY more efficient than looking at raw source HTML output.

gregmac
+2  A: 

I have found that ignoring indenting during generation is best. I have written a generic 'code formatting' engine that post-processes all generated code. This way, I can define indenting rules and code syntax rules separately from the generator. There are clear benefits to this separation.
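Here is a rough Python sketch of such a post-processor for HTML, built on the standard library's `html.parser.HTMLParser`. The generator emits markup with no concern for layout, and `Reindenter` (a hypothetical name) re-indents it by tag depth. Several assumptions apply: the set of void elements is hard-coded, comments are dropped, and text is trimmed and placed on its own line. The sketch therefore changes rendering in the whitespace-sensitive cases the question mentions, and it is not safe for `<pre>`, `<script>`, or inline markup.

```python
import html
from html.parser import HTMLParser

# Assumed set of HTML void elements (no closing tag, so no extra depth).
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "wbr"}


class Reindenter(HTMLParser):
    """Re-indents HTML by nesting depth; layout is decided here, not in the generator."""

    def __init__(self, unit="  "):
        super().__init__()  # convert_charrefs=True: text arrives unescaped
        self.unit = unit
        self.depth = 0
        self.lines = []

    def _emit(self, text):
        self.lines.append(self.unit * self.depth + text)

    def handle_decl(self, decl):
        self._emit(f"<!{decl}>")

    def handle_starttag(self, tag, attrs):
        # Reuse the original tag text so attributes keep their quoting.
        self._emit(self.get_starttag_text())
        if tag not in VOID_ELEMENTS:
            self.depth += 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> never change the depth.
        self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.depth = max(0, self.depth - 1)
        self._emit(f"</{tag}>")

    def handle_data(self, data):
        text = data.strip()
        if text:
            # Re-escape, since the parser unescaped entities such as &amp;.
            self._emit(html.escape(text, quote=False))


def reindent(markup, unit="  "):
    parser = Reindenter(unit)
    parser.feed(markup)
    parser.close()
    return "\n".join(parser.lines)
```

Because the formatter runs after generation, the indenting rules can change without touching any generating code.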

Jack
A: 

In the PHP/HTML situation, I try to keep each code fragment consistently indented in its source code. This keeps the code readable where it really matters and usually has the side effect of producing readable HTML output. As others have said, Firebug takes care of the rest.

Colonel Sponsz