ansaurus

Question

How to cycle through delimited tokens with a Regular Expression?

Answer 1

+4 A:

/###(.+?)###/

If you want the ###'s included in the capture, then you need

/(###.+?###)/

The ? makes the quantifier non-greedy. Without the ?, the match would grab too much: for example, all of '###token1### text text ###token2###' would get grabbed as a single match.
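To make the greedy versus non-greedy difference concrete, here is a minimal Python sketch. The choice of Python and the variable names are assumptions for illustration only; the pattern is the one above.

```python
import re

text = "###token1### text text ###token2###"

# Greedy: .+ runs as far as it can, so the match ends at the LAST ### in the string.
greedy = re.findall(r"###(.+)###", text)

# Non-greedy: .+? stops at the FIRST closing ###, giving one match per token.
lazy = re.findall(r"###(.+?)###", text)

print(greedy)  # ['token1### text text ###token2']
print(lazy)    # ['token1', 'token2']
```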

My initial answer had a * instead of a +. * means 0 or more; + means 1 or more. * was wrong because it would allow ###### (an empty token) as a valid match.
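A quick way to check the * versus + point is to run both variants against a bare ######. This is a sketch in Python (an assumed language, not part of the original answer):

```python
import re

empty = "######"

# With *, zero characters between the delimiters is allowed, so this matches.
star = re.search(r"###(.*?)###", empty)

# With +, at least one character is required, and six #'s cannot supply
# an opening ###, one character, and a closing ###, so there is no match.
plus = re.search(r"###(.+?)###", empty)

print(repr(star.group(1)))  # ''
print(plus)                 # None
```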

For playing around with regular expressions, I highly recommend http://www.weitz.de/regex-coach/ for Windows. You can type in the string you want and your regular expression and see what the expression is actually doing.

The captured text will be stored in \1 or $1, depending on where you are using your regular expression.
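As one concrete case, Python exposes the capture as group(1) on the match object and as \1 inside a replacement string. The sample text and the bracketed replacement below are made up for illustration:

```python
import re

text = "text ###token1### text"

m = re.search(r"###(.+?)###", text)
whole = m.group(0)  # everything the pattern matched, delimiters included
inner = m.group(1)  # only what the parentheses captured

# In a replacement string, \1 refers to the same captured group.
replaced = re.sub(r"###(.+?)###", r"[\1]", text)

print(whole)     # ###token1###
print(inner)     # token1
print(replaced)  # text [token1] text
```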

David Beleznay 2008-09-16 20:13:33

Just a note that in Java, this would be in group(1) on the Matcher instance after the most recent call to find().

cynicalman 2008-09-16 20:20:04

Answer 2

A:

Assuming you want to match ###token2### as well...

/###.+###/

Jonathan 2008-09-16 20:14:55

Answer 3

A:

Check out RegexBuddy. Jeff has recommended it several times: http://www.codinghorror.com/blog/archives/000027.html

Unkwntech 2008-09-16 20:17:46

Answer 4

A:

Here is a good site as well. You can work through all the tutorials and get yourself well versed in regexes.

http://www.regular-expressions.info/

Quintin Robinson 2008-09-16 20:18:45

Answer 5

A:

Use () and \x, where x is the number of the group. A naive example that assumes the tokens are always delimited by #:

text (#+.+#+) text text (#+.+#+) text text

The stuff in the () can then be grabbed by using \1 and \2 in the replacement expression (\1 for the first set, \2 for the second), assuming you're doing a search/replace in an editor. For example, the replacement expression could be:

token1: \1, token2: \2

For the above example, that should produce:

token1: ###token1###, token2: ###token2###
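The same search/replace can be tried outside an editor. Here is a rough Python sketch using re.sub with the pattern and replacement from above; the input string is the sample sentence from elsewhere in this thread, assumed here for illustration:

```python
import re

text = "text ###token1### text text ###token2### text text"

# The literal words around each group pin the two groups to the two tokens.
pattern = r"text (#+.+#+) text text (#+.+#+) text text"

# \1 and \2 in the replacement refer to the first and second () groups.
result = re.sub(pattern, r"token1: \1, token2: \2", text)

print(result)  # token1: ###token1###, token2: ###token2###
```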

If you're using a regexp library in a program, you'd presumably call a function to get at the contents of the first and second tokens, which you've indicated with the ()s around them.

Colen 2008-09-16 20:19:07

Answer 6

A:

When you are using delimiters like this, you basically grab the opening delimiter, then anything that does not match the ending delimiter, followed by the ending delimiter. One caution: in cases like the example above, [^#] alone would not work as the check that the end delimiter is not there, since a single # inside a token would cause the regex to fail (e.g. "###foo#bar###"). In the case above, the regex to parse it would be the following, assuming empty tokens are allowed (if not, change * to +):

###([^#]|#[^#]|##[^#])*###
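Here is a small Python sketch of how this behaves on tokens that contain stray #'s. One change is an assumption made for illustration: the alternation is made non-capturing and wrapped in a single outer group, so that group 1 holds the whole token rather than only the last repetition.

```python
import re

# Each repetition consumes a non-# character, or one or two #'s followed by
# a non-# character, so a run of three #'s can only be the closing delimiter.
pattern = re.compile(r"###((?:[^#]|#[^#]|##[^#])*)###")

text = "a ###foo#bar### b ###x##y### c"
tokens = pattern.findall(text)

print(tokens)  # ['foo#bar', 'x##y']
```

Because the repetition uses *, a bare ###### also matches as an empty token, as noted above.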

2008-09-16 20:20:10

Answer 7

+1 A:

In Perl, you actually want something like this:

$text = 'text ###token1### text text ###token2### text text';

while($text =~ m/###(.+?)###/g) {
  print $1, "\n";
}

This will give you each token in turn within the while loop. The (.+?) ensures that you get the shortest bit between the delimiters, preventing it from thinking the token is 'token1### text text ###token2'.

Or, if you just want to save them, not loop immediately:

@tokens = $text =~ m/###(.+?)###/g;
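For comparison, roughly the same two idioms in Python look like the sketch below. Python is an assumption here, not something the question specified; re.finditer plays the role of the while loop with /g, and re.findall plays the role of the list assignment.

```python
import re

text = "text ###token1### text text ###token2### text text"
pattern = re.compile(r"###(.+?)###")

# Cycle through the tokens one at a time.
seen = []
for match in pattern.finditer(text):
    seen.append(match.group(1))
    print(match.group(1))

# Or collect them all at once without looping.
tokens = pattern.findall(text)
print(tokens)  # ['token1', 'token2']
```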

Michael Cramer 2008-09-16 20:21:02
