ansaurus

Question

Help needed in writing regular expression -- TCL

Answer 1

A:

My trial: [regexp {^X([Y|Z|V|W]*)Z$}]

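That would match the strings given, but because you are using the * multiplier it would also match strings like "XZ", "XYYYYYYYYYYYYYYYYZ" and "XYZYVWZWWWZVYYWZ". To match the middle character exactly once, don't use a multiplier. Also note that | has no special meaning inside [], so the set as written matches a literal | as well; leave it out:

^X([YZVW])Z$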
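As a rough check of the difference the multiplier makes, the sketch below runs a starred and an unstarred pattern over a few sample strings. The variable names and the sample strings are only illustrative, and the set is written without the | characters.

```tcl
# Pattern with the * multiplier: zero or more middle characters.
set starred {^X([YZVW]*)Z$}
# Pattern without a multiplier: exactly one middle character.
set single  {^X([YZVW])Z$}

foreach s {XYZ XZZ XZ XYYYYZ} {
    puts "$s: starred=[regexp $starred $s] single=[regexp $single $s]"
}
# XYZ and XZZ match both patterns.
# XZ and XYYYYZ match only the starred pattern.
```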

My trial: [regexp {^X([Y]*)Z$}]

The same applies here: it will also match strings like "XZ", "XYYZ" and "XYYYYYYYYYYYYYYYYZ". Don't put a multiplier after the set:

^X([Y])Z$

or simply regexp {^XYZ$}

That won't capture anything. To make it do the same as the other pattern (capture the Y character), you need the parentheses:

^X(Y)Z$

Guffa 2010-05-02 09:36:34

Answer 2

+3 A:

i) (XYZ XZZ XVZ XWZ)

Clue: the starting string X and the ending string Z are the same for all of them. Only the middle string differs: Y, Z, V, W.

My trial: [regexp {^X([Y|Z|V|W]*)Z$}]

Assuming you're not after literal parentheses around the whole lot, you match that using this:

regexp {X([YZVW])Z} $string -> matchedSubstr

That's because the interior strings are all single characters. (It also stores the matched substring in the variable matchedSubstr; choose any variable name there that you want.) You should not use | inside a [] in a regular expression, as it has no special meaning there. (You might need to add ^$ anchors round the outside.)
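A minimal sketch of the anchored form, assuming the whole input should be exactly three characters. The sample value "XVZ" and the variable pipeMatches are made up for illustration. The last lines show why | does not belong inside []: there it is just another literal character in the set.

```tcl
set string "XVZ"

# Anchored match; the captured middle character lands in matchedSubstr.
if {[regexp {^X([YZVW])Z$} $string -> matchedSubstr]} {
    puts "middle character: $matchedSubstr"
}

# Inside a bracket expression the pipe is literal, so "X|Z" matches too.
set pipeMatches [regexp {^X([Y|Z|V|W])Z$} "X|Z"]
puts "X|Z matches the bracketed-pipe pattern: $pipeMatches"
```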

On the other hand, if you want to match multiple character sequences (which the Y etc. are just stand-ins for) then you use this:

regexp {X(Y|Z|V|W)Z} $string -> matchedSubstr

Notice that | is being used here, but [] is not.
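To illustrate the multi-character case, here is a small sketch in which foo, bar and baz are hypothetical stand-ins for the real middle strings (they are not from the question), with anchors added on the assumption that the whole string must match.

```tcl
# Alternation in parentheses: each branch may be any length.
set re {^X(foo|bar|baz)Z$}

foreach s {XfooZ XbarZ XquxZ} {
    if {[regexp $re $s -> matchedSubstr]} {
        puts "$s: matched $matchedSubstr"
    } else {
        puts "$s: no match"
    }
}
```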

If your real string has many of these strings (whichever pattern you're using to match them) then the easiest way to extract them all is with the -all -inline options to regexp, typically used in a foreach like this:

foreach {wholeStr matchedSubstr} [regexp -all -inline {X([YZVW])Z} $string] {
    puts "Hey! I found a $matchedSubstr in there!"
}

Mix and match to taste.

My trial: [regexp {^X([Y]*)Z$}] or simply regexp {^XYZ$}

Just want to make sure it's a correct approach. Is there any other way available to optimize the regexp? :)

That's optimal for an exact comparison. And in fact Tcl will optimize that internally to a straight string equality test if that's literal.
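For the exact-comparison case, a quick sketch comparing the anchored literal pattern with a plain string equality test. This only shows that the two give the same answer on a couple of sample inputs (the helper name sameResult is made up here); it says nothing about what Tcl does internally.

```tcl
# Hypothetical helper: do the regexp and string equal agree for this input?
proc sameResult {s} {
    expr {[regexp {^XYZ$} $s] == [string equal $s "XYZ"]}
}

foreach s {XYZ XYYZ XZ} {
    puts "$s: regexp=[regexp {^XYZ$} $s] equal=[string equal $s XYZ]"
}
```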

Donal Fellows 2010-05-02 10:51:09

well, it doesn't get more authoritative than an answer from a member of the Tcl core team. Cheers. :)

Jeff Atwood 2010-05-02 11:08:10

The only tricky bit with this question was working out exactly what was asked. (As normal. Those who are skilled at question asking usually don't need to ask in the first place. So I prefer to try to help people ask better questions. :-))

Donal Fellows 2010-05-02 11:51:42

user330727 2010-05-03 01:24:30

The `-inline` option makes `regexp` return a list containing the match and each of the captured substrings. With `-all` as well, it returns the concatenation of those lists for every found match in the string, which is great for use with `foreach`.

Donal Fellows 2010-05-03 06:32:39
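A short sketch of what the -inline and -all -inline forms described in the comment above return, using a made-up input string:

```tcl
set string "XYZ--XWZ--XAZ"

# -inline: one list holding the whole match followed by each capture.
set first [regexp -inline {X([YZVW])Z} $string]
puts $first    ;# prints: XYZ Y

# -all -inline: those lists concatenated for every match in the string.
set all [regexp -all -inline {X([YZVW])Z} $string]
puts $all      ;# prints: XYZ Y XWZ W
```

The flat list is why foreach takes two variables at a time here: one for the whole match and one for the single captured substring.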

user330727 2010-05-03 08:14:51

To be crystal clear: the number of variables to use in the `foreach` depends entirely on the number of capturing parentheses in the regular expression; there should be one more variable than the number of parens. If you don't match them up, things won't work. You can use `regexp -about the_RE_to_examine` to find out how many parentheses there are; it's the first number in the result list there. (Note that `regexp -about` doesn't do matching; it compiles and returns metadata.)

Donal Fellows 2010-05-03 09:42:10
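The variable-count rule from the comment above can be sketched as follows. The names nCaptures and nVars and the sample input are only illustrative.

```tcl
set re {X([YZVW])Z}

# The first element of the -about result is the number of capturing groups.
set nCaptures [lindex [regexp -about $re] 0]
set nVars [expr {$nCaptures + 1}]
puts "capturing groups: $nCaptures, foreach variables: $nVars"

# One group, so two loop variables: the whole match plus one capture.
foreach {wholeStr matchedSubstr} [regexp -all -inline $re "XYZ and XWZ"] {
    puts "$wholeStr captured $matchedSubstr"
}
```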

Answer 3

A:

You can use the Visual Regexp tool to help; it provides feedback as you construct your regular expression.

Trey Jackson 2010-05-02 14:33:07
