I'm using PHP 5's preg functions, if it makes a difference.

Consider the regular language matched by the following regular expression.

([^{}] | {[0-9a-zA-Z_]+})*

The language consists of strings of any number of characters, with special embedded tags marked off by left and right curly brackets, which contain a string of one or more alphanumeric or underscore characters. For example, the following is a valid string in the language:

asdfasdf 1243#$*#{A_123}asdf?{432U}

However, while validating a string with this regex, I would like to get a list of these curly-bracket-delimited tags and their positions in the string. Considering the previous example string, I'd like to have an array that tells me:

A_123: 20; 432U: 32

Is this possible with regular expressions? Or should I just write a function "by hand" without regexp that goes through every character of the string and parses out the data I need?

Forgive me if this is an elementary question; I'm just learning!

+2  A: 

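To capture the offsets, pass the PREG_OFFSET_CAPTURE flag. Each match is then returned as an array holding the matched string and its byte offset in the subject. See http://php.net/manual/en/function.preg-match.php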

preg_match ($regex, $subject, $matches, PREG_OFFSET_CAPTURE);

You can run the following script yourself and see the results:

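// Match one tag at a time; group 1 holds the tag name without the braces.
$regex = '~\{(\w+)\}~';
$str = 'asdfasdf 1243#$*#{A_123}asdf?{432U}';

preg_match_all($regex, $str, $m, PREG_OFFSET_CAPTURE);
$tags = $m[1]; // each entry is array(tag name, byte offset)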

echo '<pre>';
print_r($tags); // prints tags and their offsets
echo '</pre>';

On the pattern:

  • \w is an escape sequence equivalent to the character class [a-zA-Z0-9_]
  • The round brackets (...) group part of the pattern and capture what it matched, so the text can be read back from the match array
  • The + is a quantifier that means "one or more" of the preceding element
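
If you also want to validate the whole string against your grammar before pulling the tags out, something along these lines might work. This is only a sketch: extract_tags is a made-up helper name, the anchored validation pattern is my reading of the regex in the question, and the possessive ++ is just there to keep the validation from backtracking on long inputs.

```php
<?php
// Hypothetical helper (the name and return shape are assumptions, not part of PHP).
// Returns false if $str is not in the language, otherwise a list of tags.
function extract_tags($str)
{
    // Anchored form of the question's grammar: runs of non-brace characters
    // or {tag} groups, and nothing else. D makes $ match only at the very end.
    if (!preg_match('~^(?:[^{}]++|\{\w+\})*$~D', $str)) {
        return false; // stray or malformed braces
    }

    // Group 1 captures the tag name; PREG_OFFSET_CAPTURE adds its byte offset.
    preg_match_all('~\{(\w+)\}~', $str, $m, PREG_OFFSET_CAPTURE);

    $tags = array();
    foreach ($m[1] as $pair) {
        // $pair[0] is the tag name, $pair[1] its 0-based offset in $str
        $tags[] = array('name' => $pair[0], 'offset' => $pair[1]);
    }
    return $tags;
}

print_r(extract_tags('asdfasdf 1243#$*#{A_123}asdf?{432U}'));
```

The offsets are 0-based and point at the first character inside the braces, so for the example string they come out as 18 for A_123 and 30 for 432U. Subtract 1 if you want the position of the opening brace instead.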

A good resource on regex: http://www.regular-expressions.info

NullUserException
Actually the offsets are the least important part. I just need to get the set of curly-bracket-delimited tags. Maybe I should rephrase my question thus: I know how to use regular expressions to verify whether a given string belongs to a given regular language, but I don't know how to use them for anything else (e.g. to extract some set of substrings from the string, which is what I need to do here).
Brennan Vincent
You just want a list of whatever is in the curly braces `{}`? See my updated answer
NullUserException
Thanks! 7654321
Brennan Vincent