How can I create a regular expression that will grab delimited text from a string? For example, given a string like

text ###token1### text text ###token2### text text

I want a regex that will pull out ###token1###. Yes, I do want the delimiter as well. By adding another group, I can get both: (###(.+?)###)

+4  A: 

/###(.+?)###/

If you want the ### delimiters included in the capture, then you need

/(###.+?###)/

The ? makes the quantifier non-greedy. Without the ?, the match would grab too much: all of '###token1### text text ###token2###' would be captured as one match.
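To see the difference concretely, here is a minimal sketch in Python (my choice of language for illustration; the answer itself is language-neutral). The variable names are made up for the example.

```python
import re

text = "text ###token1### text text ###token2### text text"

# Non-greedy: .+? stops at the first closing ### it can reach.
lazy = re.search(r"###.+?###", text).group(0)

# Greedy: .+ runs on to the last closing ### in the string.
greedy = re.search(r"###.+###", text).group(0)

print(lazy)    # ###token1###
print(greedy)  # ###token1### text text ###token2###
```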

My initial answer had a * instead of a +. * means 0 or more; + means 1 or more. * was wrong because it would accept ###### (an empty token) as a valid match.
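A quick way to check the * versus + point, again sketched in Python with an invented sample string:

```python
import re

sample = "text ###### text"

# With *, zero characters between the delimiters is allowed,
# so the six hashes match as an empty token.
star = re.search(r"###.*?###", sample)

# With +, at least one character is required between the delimiters,
# and there are not enough characters left for a closing ###.
plus = re.search(r"###.+?###", sample)

print(star.group(0))  # ######
print(plus)           # None
```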

For playing around with regular expressions, I highly recommend http://www.weitz.de/regex-coach/ for Windows. You can type in the string you want to test and your regular expression, and see what it's actually doing.

Your selected text will be stored in \1 or $1 depending on where you are using your regular expression.
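As one illustration of how the captured text is retrieved from a program, here is a small Python sketch using the two-group pattern from the question. In Python the groups are read with group(n) on the match object; other languages and tools use their own syntax.

```python
import re

text = "text ###token1### text text ###token2### text text"

# Outer group keeps the delimiters, inner group is just the token.
match = re.search(r"(###(.+?)###)", text)

with_delims = match.group(1)
inner = match.group(2)

print(with_delims)  # ###token1###
print(inner)        # token1
```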

David Beleznay
Just a note that in Java, this would be in group(1) from the matcher instance after the last call to find()
cynicalman
A: 

Assuming you want to match ###token2### as well...

/###.+###/
Jonathan
A: 

Check out RegexBuddy. Jeff has recommended it several times: http://www.codinghorror.com/blog/archives/000027.html

Unkwntech
A: 

Here is a good site as well. You can work through all the tutorials and get yourself well versed in regexes.

http://www.regular-expressions.info/

Quintin Robinson
A: 

Use () and \x (where x is the group number). A naive example that assumes the text within the tokens is always delimited by #:

text (#+.+#+) text text (#+.+#+) text text

The stuff in the () can then be grabbed by using \1 and \2 in the replacement expression (\1 for the first set, \2 for the second), assuming you're doing a search/replace in an editor. For example, the replacement expression could be:

token1: \1, token2: \2

For the above example, that should produce:

token1: ###token1###, token2: ###token2###

If you're using a regexp library in a program, you'd presumably call a function to get at the contents of the first and second tokens, which you've indicated with the ()s around them.
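The same search/replace can be tried outside an editor. This is a rough Python sketch using re.sub, which accepts \1 and \2 in the replacement string; the pattern and replacement are copied from above, and the variable names are mine.

```python
import re

text = "text ###token1### text text ###token2### text text"

# Pattern from the answer: two capture groups around the #-delimited tokens.
pattern = r"text (#+.+#+) text text (#+.+#+) text text"

# \1 and \2 refer to the first and second capture groups.
result = re.sub(pattern, r"token1: \1, token2: \2", text)

print(result)  # token1: ###token1###, token2: ###token2###
```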

Colen
A: 

When you use delimiters like this, the basic approach is to match the opening delimiter, then anything that does not match the closing delimiter, followed by the closing delimiter. One caution: in cases like the example above, [^#] alone would not work as the check that the closing delimiter is absent, since a single # inside the token would cause the regex to fail (e.g. "###foo#bar###"). For the case above, the regex would be the following, assuming empty tokens are allowed (if not, change * to +):

###([^#]|#[^#]|##[^#])*###
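A small Python check of this pattern against tokens that contain stray # characters. The sample text is invented for the test; finditer with group(0) is used so that the whole match, delimiters included, is returned.

```python
import re

# Pattern from the answer: a token body is any run of non-# characters,
# or one or two #'s followed by a non-# character.
pattern = re.compile(r"###([^#]|#[^#]|##[^#])*###")

text = "a ###foo#bar### b ###x##y### c"
tokens = [m.group(0) for m in pattern.finditer(text)]

# For contrast, the simple negated class fails on an embedded #.
naive = re.search(r"###[^#]+###", "###foo#bar###")

print(tokens)  # ['###foo#bar###', '###x##y###']
print(naive)   # None
```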

+1  A: 

In Perl, you actually want something like this:

$text = 'text ###token1### text text ###token2### text text';

while($text =~ m/###(.+?)###/g) {
  print $1, "\n";
}

This gives you each token in turn within the while loop. The (.+?) ensures that you get the shortest bit between the delimiters, preventing it from thinking the token is 'token1### text text ###token2'.

Or, if you just want to save them, not loop immediately:

@tokens = $text =~ m/###(.+?)###/g;
Michael Cramer