I need to read a string, detect a {VAR}, and then do a file_get_contents('VAR.php') in place of {VAR}. The "VAR" can be named anything, like TEST, or CONTACT-FORM, etc. I don't want to hard-code a condition for each possible VAR; I just want to spot an uppercase alphanumeric tag surrounded by curly braces and call file_get_contents() to load it.

I know I need to use preg_match and preg_replace, but I'm stumbling through the RegExps on this.

How is this useful? It's useful in hooking WordPress.

+1  A: 

You'll need to do a number of things. I'm assuming you can do the legwork to get the page data you want to preprocess into a string.

  1. First, you'll need the regular expression to match correctly. That should be fairly easy with something like /{\w+}/.

  2. Next you'll need to pass the PREG_OFFSET_CAPTURE flag to preg_match to get the offset of the match in the page data. This offset lets you divide the string into the part before the match, the match itself, and the part after it.

  3. Once you have the 3 parts, you'll need to run your include, and stick them back together.

  4. Lather, rinse, repeat.

  5. Stop when you find no more variables.
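
A minimal sketch of the steps above, under some assumptions: expand_by_offset is a hypothetical name, the pattern is narrowed to uppercase letters, digits and hyphens as the question describes, and $load is an assumed callable that turns a tag name into its replacement text (in real use it might wrap file_get_contents).

```php
<?php
// Hypothetical helper: expands {TAG} markers one at a time using match offsets.
// $load is an assumed callable mapping a tag name to its replacement text,
// e.g. function ($name) { return file_get_contents($name . '.php'); }
function expand_by_offset($page, callable $load)
{
    $offset = 0;
    // PREG_OFFSET_CAPTURE turns each match into an array of [text, byte offset].
    while (preg_match('/\{([A-Z0-9-]+)\}/', $page, $m, PREG_OFFSET_CAPTURE, $offset)) {
        $tag    = $m[0][0];                      // whole match, e.g. "{TEST}"
        $start  = $m[0][1];                      // where the match begins
        $before = substr($page, 0, $start);      // text before the match
        $after  = substr($page, $start + strlen($tag)); // text after the match
        $insert = $load($m[1][0]);               // $m[1][0] is the name without braces

        // Stick the three parts back together.
        $page   = $before . $insert . $after;
        // Resume after the inserted text so loaded content is not rescanned.
        $offset = strlen($before) + strlen($insert);
    }
    return $page;
}
```

Resuming the search after the inserted text is a design choice in this sketch: it keeps a loaded file that itself contains {TAG} markers from triggering further loads.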

This isn't terribly efficient, and there are probably better ways. You may wish to consider doing a preg_split instead, splitting on /[{}]/. No matter how you slice it you're assuming that you can trust your incoming data, and this will simplify the whole process a lot. To do this, I'd lay out the code like so:

  1. Take your content and split it like so: $parts = preg_split('/[{}]/', $page_string);

  2. Write a recursive function over the parts with the following criteria:

    • Halt when the length of $arg is < 3
    • Else, return a new array composed of
    • $arg[0] . load_data($arg[1]) . $arg[2]
    • plus whatever is left in $arg[3...]
  3. Run your function over $parts.
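
The preg_split layout above could be sketched roughly as follows. splice_parts is a hypothetical name, $load_data is an assumed callable standing in for whatever loads a tag's content, and the sketch assumes the braces in the input are balanced and never nested.

```php
<?php
// Hypothetical recursive helper over the array produced by preg_split.
// $load_data is an assumed callable mapping a tag name to its content.
function splice_parts(array $parts, callable $load_data)
{
    // Fewer than three parts left: nothing more to substitute.
    if (count($parts) < 3) {
        return implode('', $parts);
    }
    // Merge text-before, loaded content, and text-after into one piece.
    $merged = $parts[0] . $load_data($parts[1]) . $parts[2];
    // New array: the merged head plus whatever is left in $parts[3...].
    $rest = array_merge(array($merged), array_slice($parts, 3));
    return splice_parts($rest, $load_data);
}

$page_string = 'Hello {NAME}, see {PAGE}.';
$parts  = preg_split('/[{}]/', $page_string);
// $parts is now: 'Hello ', 'NAME', ', see ', 'PAGE', '.'
$result = splice_parts($parts, function ($name) { return strtolower($name); });
// $result is 'Hello name, see page.'
```

Because the split alternates between plain text and tag names, the merged head always lands back in position 0, so each recursive call consumes exactly one tag.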

Benson
{} should be \-escaped in a regex.
bobince
preg_split, eh? I will give that a look.
A: 

You can do it without regexes (god forbid), something like:

// Return true if $str ends with $sub
function endsWith($str, $sub) {
    return ( substr( $str, strlen( $str ) - strlen( $sub ) ) === $sub );
}

$theStringWithVars = "blah.php cool.php awesome.php";
$sub = '.php';
// split() was removed in PHP 7; explode() splits on a literal string
$splitStr = explode(" ", $theStringWithVars);
foreach ($splitStr as $piece) {
    if (endsWith(trim($piece), $sub)) {
        //file_get_contents(trim($piece)) etc...
    }
}
karim79
Which do you think is faster? preg_replace_callback() or your technique with split/strlen/substr/trim?
+1  A: 

Off the top of my head, you want this:

// load the "template" file
$input = file_get_contents($template_file_name);

// define a callback. Each time the regex matches something, it will call this function.
// whatever this function returns will be inserted as the replacement
function replaceCallback($matches){
  // match zero will be the entire match - eg {FOO}. 
  // match 1 will be just the bits inside the curly braces because of the grouping parens in the regex - eg FOO
  // convert it to lowercase and append ".html", so you're loading foo.html

  // then return the contents of that file.
  // BEWARE. GIANT MASSIVE SECURITY HOLES ABOUND. DO NOT DO THIS
  return file_get_contents( strtolower($matches[1]) . ".html" );
}
// run the actual replace method giving it our pattern, the callback, and the input file contents
// the pattern needs delimiters (the slashes), and the callback is passed by name as a string
$output = preg_replace_callback('/\{([-A-Z]+)\}/', 'replaceCallback', $input);

// todo: print the output

Now I'll explain the regex

 \{([-A-Z]+)\}
  • The \{ and \} just tell it to match the curly braces. You need the backslashes, as { and } are special characters, so they need escaping.
  • The ( and ) create a grouping. Basically this lets you extract particular parts of the match. I use it in the function above to just match the things inside the braces, without matching the braces themselves. If I didn't do this, then I'd need to strip the { and } out of the match, which would be annoying.
  • The [-A-Z] says "match any uppercase letter, or a -".
  • The + after the [-A-Z] means we need to have at least 1 character, but we can have up to any number.
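
To make the explanation concrete, here is a small sketch of what the pattern does and does not match. The sample string is made up for illustration, and this only demonstrates the regex; it is not a full security review of the approach.

```php
<?php
$pattern = '/\{([-A-Z]+)\}/';

// Made-up sample: two valid tags, a lowercase tag, a path-like tag, an empty tag.
$sample = 'a {FOO} b {CONTACT-FORM} c {foo} d {../../etc/shadow} e {}';

preg_match_all($pattern, $sample, $matches);

// $matches[0] holds the full matches (with braces),
// $matches[1] holds what the grouping parens captured (without braces).
$full  = $matches[0];
$names = $matches[1];
// $full  is array('{FOO}', '{CONTACT-FORM}')
// $names is array('FOO', 'CONTACT-FORM')
```

The lowercase, path-like and empty tags are skipped because ".", "/" and lowercase letters are not in the character class, and + requires at least one character.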
Orion Edwards
function replaceCallback($asMatches) { return file_get_contents(TEMPLATEPATH . '/hook-' . $asMatches[1] . '.php'); } $content = preg_replace_callback('/\{([A-Z0-9]+)\}/', 'replaceCallback', $content);
That's slick, I like it. I'm still fairly uncomfortable with the initial concept though: beware of {../../../../../etc/shadow} and friends.
Benson
Yeah, I'm hoping that when I say A-Z0-9, that someone can't trip up the parser from a C or assembler level, and get it to start accepting ../.. and so on.
A: 

Orion above has the right solution, but a callback function isn't really necessary in your simple case.

Assuming that the filenames are A-Z plus hyphens, you can do it in one line using PHP's /e modifier in the regex (note that /e was deprecated in PHP 5.5 and removed in PHP 7.0, so this only works on older versions):

$str = preg_replace('/{([-A-Z]+)}/e', 'file_get_contents(\'$1.html\')', $str);

This'll replace any instance of {VAR} with the contents of VAR.html. You could prefix a path into the second term if you need to specify a particular directory.
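
For PHP 7 and later, where the /e modifier no longer exists, roughly the same substitution can be written with preg_replace_callback and an anonymous function. This is a sketch under assumptions: expand_tags and its $dir parameter are hypothetical names, and leaving a tag untouched when its file is missing is a choice made here, not something the one-liner does.

```php
<?php
// Hypothetical helper: replaces {VAR} with the contents of $dir/VAR.html.
function expand_tags($str, $dir)
{
    return preg_replace_callback('/\{([-A-Z]+)\}/', function ($m) use ($dir) {
        // $m[1] is the tag name without braces, limited to A-Z and hyphens.
        $file = $dir . '/' . $m[1] . '.html';
        // Assumption: leave the tag as-is when no matching file exists.
        return is_file($file) ? file_get_contents($file) : $m[0];
    }, $str);
}
```

A call might look like expand_tags($str, __DIR__ . '/snippets'), where the snippets directory is also just an example.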

There are the same vague security worries as outlined above, but I can't think of anything specific.

Ciaran McNulty
My adjustment: $content = preg_replace('/\{([-A-Z0-9]+)\}/e', 'file_get_contents(TEMPLATEPATH . \'/hook-$1.php\')', $content, 1); This is because in my case there will only be one instance of a given {VAR} in $content, and the "1" on the end makes it run faster.
A: 

Comparatively speaking, regular expressions are expensive. While you may need them to figure out which files to load, you certainly don't need them for doing the replace, and probably shouldn't use them there. After all, you know exactly what you are replacing, so why do you need a fuzzy search?

Use an associative array and str_replace to do your replacements. str_replace supports arrays for doing multiple substitutions at once. One line substitution, no loops.

For example:

$substitutions = array('{VAR}'=>file_get_contents('VAR.php'),
'{TEST}'=>file_get_contents('TEST.php'),
...
);

$outputContents = str_replace( array_keys($substitutions), $substitutions, $outputContents);
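
When the tag names are not known in advance, one way to combine the two ideas is a single regex pass to discover the tags, followed by str_replace for the substitution. This is only a sketch: substitute_tags is a hypothetical name and $load is an assumed callable (in practice it might wrap file_get_contents).

```php
<?php
// Hypothetical helper: find the tags with one regex pass, then let
// str_replace do the actual substitution from an associative array.
// $load is an assumed callable mapping a tag name to its content.
function substitute_tags($content, callable $load)
{
    // Collect the names of every {TAG} present in the content.
    preg_match_all('/\{([A-Z0-9-]+)\}/', $content, $m);

    $substitutions = array();
    foreach (array_unique($m[1]) as $name) {
        // Each distinct tag is loaded only once, however often it appears.
        $substitutions['{' . $name . '}'] = $load($name);
    }

    return str_replace(
        array_keys($substitutions),
        array_values($substitutions),
        $content
    );
}
```

One caveat with this layout: str_replace applies array replacements in order, so loaded content that itself contains a later tag would be substituted as well.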
Brent Baisley