Overview

Around the end of 2009, I wrote a simple templating system for PHP/HTML to be used in-house by our designers for brochure-ware type websites. The goal of the system is to allow templating in otherwise pure HTML via custom tags that are processed by PHP. For example, a templated page might look like this:

<tt:Page template="templates/main.html">
  <tt:Content name="leftColumn">
    <p> blah blah </p>
    ...
  </tt:Content>
  <tt:Content name="rightColumn">
    <p> blah blah </p>
    ...
  </tt:Content>
</tt:Page>

The template itself might look something like this:

<html>
  <head>...</head>
  <body>
    <div style="float:left; width:45%">
      <tt:Container name="leftColumn" />
    </div>
    <div style="width:45%">
      <tt:Container name="rightColumn" />
    </div>
  </body>
</html>

Besides the Page and Content/Container tags, there are a few other tags included in the core for stuff like flow control, iterating over a collection, outputting dynamic values, etc. The framework is designed so it's very easy to add your own set of tags registered under another prefix and namespace.

Custom Tags to PHP

How do we parse these custom tags? Since there's no guarantee that the HTML file is well-formed XML, XML-based solutions like XSLT/XPath won't be reliable. Instead, we use a regex to look for tags with registered prefixes and replace them with PHP code. The PHP code uses a stack-based design: upon encountering an opening tag, an object representing the tag is created and pushed onto the stack, and its "initialization function" (if any) runs. Whenever a registered closing tag is encountered, the most recent object is popped off the stack and its "rendering function" runs.
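A minimal sketch of what such a regex pass could look like. The details are assumptions, not taken from the original framework: attribute values are always double-quoted, `compileTemplate` is a hypothetical name, and the generated calls mirror the `$tags->push()`/`$tags->pop()` shape of the example output that follows (using `var_export` for the attribute array, so the formatting differs slightly). It assumes PHP 7.4 or later.

```php
<?php
// Hypothetical tag-to-PHP pass. Assumes tags of the form <prefix:Name attr="value">,
// self-closing <prefix:Name ... /> and closing </prefix:Name>, with double-quoted
// attribute values only.
function compileTemplate(string $html, array $prefixes): string
{
    $prefixAlt = implode('|', array_map(fn($p) => preg_quote($p, '/'), $prefixes));

    // Groups: 1 = closing slash, 2 = prefix, 3 = tag name, 4 = attributes, 5 = self-closing slash.
    $pattern = '/<(\/?)(' . $prefixAlt . '):(\w+)((?:\s+[\w-]+="[^"]*")*)\s*(\/?)>/';

    return preg_replace_callback($pattern, function (array $m): string {
        [, $closing, $prefix, $name, $attrText, $selfClosing] = $m;

        if ($closing === '/') {
            return '<?php $tags->pop(); ?>';
        }

        // Collect name="value" pairs into an associative array.
        preg_match_all('/([\w-]+)="([^"]*)"/', $attrText, $pairs, PREG_SET_ORDER);
        $attrs = [];
        foreach ($pairs as [, $key, $value]) {
            $attrs[$key] = $value;
        }

        // var_export emits valid PHP literals, so quotes in values are escaped safely.
        $code = sprintf(
            '$tags->push(%s, %s, %s);',
            var_export($prefix, true),
            var_export($name, true),
            var_export($attrs, true)
        );
        if ($selfClosing === '/') {
            $code .= ' $tags->pop();'; // a self-closing tag opens and closes at once
        }
        return '<?php ' . $code . ' ?>';
    }, $html);
}
```

Everything that is not a registered tag (including any hand-written PHP) passes through untouched, which is what lets the generated code be executed as a normal PHP template afterwards.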

So, after the framework replaces the templating tags with PHP, our example page might look something like this (in reality it's a bit uglier):

<?php $tags->push('tt', 'Page', array('template'=>'templates/main.html')); ?>
  <?php $tags->push('tt', 'Content', array('name'=>'leftColumn')); ?>
    <p> blah blah </p>
    ...
  <?php $tags->pop(); ?>
  <?php $tags->push('tt', 'Content', array('name'=>'rightColumn')); ?>
    <p> blah blah </p>
    ...
  <?php $tags->pop(); ?>
<?php $tags->pop(); ?>
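The generated code only assumes some `$tags` object with `push()` and `pop()`. A hypothetical sketch of that stack, using output buffering so each tag's "rendering function" receives everything output between its opening and closing tags. The `Tag` interface, `init()`/`render()` and `register()` names are illustrative assumptions, and a real `Content` tag would presumably hand its body to the enclosing `Page` rather than echo it. Assumes PHP 7.4+.

```php
<?php
// Hypothetical runtime for the generated code. The Tag interface, init()/render()
// and register() are illustrative names, not the framework's actual API.
interface Tag
{
    public function init(array $attrs): void;     // runs when the opening tag is reached
    public function render(string $body): string; // runs when the closing tag is reached
}

class TagStack
{
    private array $registry = []; // "prefix:Name" => class name
    private array $stack = [];    // currently open tag objects

    public function register(string $prefix, string $name, string $class): void
    {
        $this->registry[$prefix . ':' . $name] = $class;
    }

    public function push(string $prefix, string $name, array $attrs): void
    {
        $class = $this->registry[$prefix . ':' . $name];
        $tag = new $class();
        $tag->init($attrs);
        $this->stack[] = $tag;
        ob_start(); // buffer the tag's contents until the matching pop()
    }

    public function pop(): void
    {
        $body = ob_get_clean();   // everything output since the matching push()
        $tag = array_pop($this->stack);
        echo $tag->render($body); // goes to the enclosing buffer, if any
    }
}

// Example tag: wraps its contents in a <div> with the given class.
class BoxTag implements Tag
{
    private string $class = '';

    public function init(array $attrs): void
    {
        $this->class = $attrs['class'] ?? '';
    }

    public function render(string $body): string
    {
        return '<div class="' . htmlspecialchars($this->class) . '">' . $body . '</div>';
    }
}

$tags = new TagStack();
$tags->register('tt', 'Box', BoxTag::class);

ob_start();
$tags->push('tt', 'Box', ['class' => 'left']);
echo '<p>hi</p>';
$tags->pop();
$html = ob_get_clean(); // '<div class="left"><p>hi</p></div>'
```

Nesting falls out of the design: each `push()` opens a new buffer, and each `pop()` closes the innermost one and writes the rendered result into its parent.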

The good, the bad, and eval

Now, how to execute our newly-generated PHP code? I can think of a few options here. The easiest is to simply eval the string, and that works well enough. However, any programmer will tell you "eval is evil, don't use it..." so the question is, is there anything more appropriate than eval that we can use here?

I've considered using a temporary or cached file, using php:// output streams, etc, but as far as I can see these don't offer any real advantage over eval. Caching could speed things up, but in practice all the sites we have on this thing are already blazingly fast, so I see no need to make speed optimizations at this point.
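For comparison, a sketch of the two options side by side, both capturing output with output buffering. The function names are hypothetical. Both run exactly the same code, so the temporary file is no safer than `eval`; the practical difference is that an included file can be cached by an opcode cache such as OPcache, whereas eval'd code is compiled on every call.

```php
<?php
// Two ways to run the generated template code and capture its output as a string.
// $code is assumed to be HTML with embedded PHP blocks, as produced by the tag pass.

function renderWithEval(string $code, array $vars = []): string
{
    extract($vars, EXTR_SKIP); // expose e.g. $tags to the template without clobbering locals
    ob_start();
    eval('?>' . $code);        // the leading close tag makes eval start in HTML mode
    return ob_get_clean();
}

function renderWithTempFile(string $code, array $vars = []): string
{
    $file = tempnam(sys_get_temp_dir(), 'tpl');
    file_put_contents($file, $code);
    try {
        extract($vars, EXTR_SKIP);
        ob_start();
        include $file;         // same effect as eval, but via the filesystem
        return ob_get_clean();
    } finally {
        unlink($file);         // remove the temporary file either way
    }
}
```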

Questions

For each of the things on this list: is it a good idea? Can you think of a better alternative?

  • the whole idea in general (custom tags for html / php)
  • converting tags to php code instead of processing directly
  • the stack-based approach
  • the use of eval (or similar)

Thanks for reading and TIA for any advice. :)

A: 

Yes, instead of eval, you can do what Zend and other major frameworks do and use output buffering:

ob_start();
include($template_file); //has some HTML and output generating PHP
$result = ob_get_contents();
ob_end_clean();
Mike Sherov
This would mean saving the intermediate PHP code (which is generated from parsing the markup) to a file and then including it. This involves a lot of disk access that simply using 'eval' does not. What's the advantage to doing it this way?
no
I see what you mean. This answer is not wrong, but not realistic here then. I'll post another, more sane answer
Mike Sherov
@no, my new answer has been posted.
Mike Sherov
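On the disk-access concern: with the include approach, the compiled PHP only has to be written when the template changes, so most requests just stat two files and include a cached one. A minimal sketch, assuming a writable cache directory and a `$compile` callable standing in for the tag-to-PHP pass (both hypothetical, as is `renderCached`):

```php
<?php
// Compile-once caching for the include-based approach. The cache layout and the
// $compile callable (source string in, PHP string out) are assumptions.
function renderCached(string $templateFile, string $cacheDir, callable $compile, array $vars = []): string
{
    $cacheFile = $cacheDir . '/' . md5(realpath($templateFile)) . '.php';

    // Recompile only when the cached copy is missing or older than the source.
    if (!is_file($cacheFile) || filemtime($cacheFile) < filemtime($templateFile)) {
        $tmp = $cacheFile . '.' . uniqid('', true);
        file_put_contents($tmp, $compile(file_get_contents($templateFile)));
        rename($tmp, $cacheFile); // swap in the finished file so readers never see a partial one
    }

    extract($vars, EXTR_SKIP); // expose template variables without clobbering locals
    ob_start();
    include $cacheFile;
    return ob_get_clean();
}
```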
A:

I posted another answer because it's radically different from the first, which might also be valuable.

Essentially, this question is asking how to execute PHP code with regex. It may not seem obvious, but that is what the eval is meant to accomplish.

With that said, instead of doing a pass of preg_replace and then an eval, you could use PHP's preg_replace_callback function to execute a piece of code each time a tag is matched.

See here for how the function works: http://us.php.net/manual/en/function.preg-replace-callback.php
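A small illustration of the idea for a tag that does not enclose other content. The `tt:Value` tag and the `$data` array are hypothetical examples. The callback runs once per match and its return value replaces the tag, so nothing is generated or eval'd. Tags that wrap content (like `Page`/`Content`) still need some way to track nesting, which a single callback on its own does not provide.

```php
<?php
// Handle a self-closing tag directly in a preg_replace_callback callback
// instead of generating PHP for it. tt:Value and $data are hypothetical.
function renderValues(string $html, array $data): string
{
    return preg_replace_callback(
        '/<tt:Value\s+name="([^"]*)"\s*\/>/',
        function (array $m) use ($data): string {
            // Runs once per match; the returned string replaces the whole tag.
            return htmlspecialchars((string) ($data[$m[1]] ?? ''));
        },
        $html
    );
}

$out = renderValues('<h1><tt:Value name="title" /></h1>', ['title' => 'Q & A']);
// $out is '<h1>Q &amp; A</h1>'
```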

Mike Sherov
Mike, see my response to John (you guys are saying pretty much the same thing I think).
no
A:

Let me advocate a different approach. Instead of generating PHP code dynamically and then trying to figure out how to execute it safely, execute it directly as you encounter the tags. You can process the entire block of HTML in one pass and handle each tag as you encounter it immediately.

Write a loop that looks for tags. Its basic structure will look like this:

  1. Look for a custom tag, which you find at position n.
  2. Everything before position n must be simple HTML, so either save it off for processing or output it immediately (if you have no tags on your $tags stack you probably don't need to save it anywhere).
  3. Execute the appropriate code for the tag. Instead of generating code that calls $tags->push, just call $tags->push directly.
  4. Go back to step 1.
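The loop above could be sketched roughly as follows, assuming a tag stack object with `push()`/`pop()` like the one in the question and double-quoted attributes. `renderDirect` and the `BracketTags` stand-in are hypothetical, and this version does not handle PHP mixed into the HTML. Assumes PHP 7.4+.

```php
<?php
// Single pass: find the next custom tag, output the plain HTML before it,
// act on the tag immediately, repeat. No PHP code is generated.
function renderDirect(string $html, object $tags, string $prefix = 'tt'): void
{
    // Groups: 1 = closing slash, 2 = tag name, 3 = attributes, 4 = self-closing slash.
    $pattern = '/<(\/?)' . preg_quote($prefix, '/') . ':(\w+)((?:\s+[\w-]+="[^"]*")*)\s*(\/?)>/';
    $offset = 0;

    // Step 1: look for the next custom tag, starting where the last one ended.
    while (preg_match($pattern, $html, $m, PREG_OFFSET_CAPTURE, $offset)) {
        $tagStart = $m[0][1];

        // Step 2: everything before the tag is plain HTML.
        echo substr($html, $offset, $tagStart - $offset);

        // Step 3: call the stack directly instead of generating code that calls it.
        if ($m[1][0] === '/') {
            $tags->pop();
        } else {
            preg_match_all('/([\w-]+)="([^"]*)"/', $m[3][0], $pairs, PREG_SET_ORDER);
            $attrs = array_column($pairs, 2, 1); // attribute name => value
            $tags->push($prefix, $m[2][0], $attrs);
            if ($m[4][0] === '/') {
                $tags->pop();
            }
        }

        // Step 4: continue scanning after this tag.
        $offset = $tagStart + strlen($m[0][0]);
    }

    echo substr($html, $offset); // trailing HTML after the last tag
}

// Stand-in stack for demonstration only: wraps each tag's contents in [Name]...[/Name].
class BracketTags
{
    private array $names = [];

    public function push(string $prefix, string $name, array $attrs): void
    {
        $this->names[] = $name;
        ob_start();
    }

    public function pop(): void
    {
        $body = ob_get_clean();
        $name = array_pop($this->names);
        echo '[' . $name . ']' . $body . '[/' . $name . ']';
    }
}

ob_start();
renderDirect('<p><tt:Box>hi</tt:Box></p>', new BracketTags());
$out = ob_get_clean(); // '<p>[Box]hi[/Box]</p>'
```

The string is scanned once from left to right, which is also the source of the efficiency argument that follows.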

With this approach you only call PHP functions directly, you never build PHP code on the fly and then execute it later. The need for eval is gone.

You'll basically have two cases for step #3. When you encounter an opening tag you will do an immediate push. Then later when you hit the closing tag you can do a pop and then handle the tag in the appropriate manner, now that you've processed the entire contents of the custom element.

It is also more efficient to process the HTML this way. Doing multiple search-and-replace passes over a long HTML string is inefficient, since each search and each replacement is O(n) in the length of the string. You end up scanning the string over and over, and each replacement has to build a whole new string of similar length. If you have 20KB of HTML, each replacement involves searching through that 20KB and then creating a new 20KB string.

John Kugelman
John, this sounds like the proper way to do it for strictly pure HTML (no PHP), and I'll take a shot at it. However when I designed the templating system I thought it would be nice if you could mix PHP in, and I think some of the sites now have it mixed in. Maybe that was a bad design idea, but I think it now necessitates the use of `eval` or similar. Also, at design time I thought caching the generated PHP would be a good idea. Do you think that might be comparable to or faster than your approach here?
no
Caching would mitigate the performance concerns, so you'd then only have to worry about the security aspect.
John Kugelman
If you allow users to mix in PHP code then that's a whole 'nother ball of wax. Once you're executing arbitrary code then worrying about a call to eval is pretty pointless. Your security concerns would then change to sandboxing the PHP code, using PHP's safe_mode, locking down user permissions so they can't do random system() calls, etc.
John Kugelman
John: good points, thanks. The system was designed for in-house use, so originally I wasn't concerned about the designers doing anything malicious with it. Things change, though, and having a safe mode for this thing will probably be necessary relatively soon. Since I didn't mention the mixed-in PHP in my OP, I'll mark this as correct.
no
Ah, well then if it's for in-house developers, `eval` away! :-)
John Kugelman