Hello Everyone,

I'm looking for help writing a regular expression in Tcl that matches all of the following strings.

i) ( XYZ XZZ XVZ XWZ )

Clue: the starting string X and the ending string Z are the same for all of the strings. Only the middle string differs: Y, Z, V, W.

My trial: [regexp {^X([Y|Z|V|W]*)Z$}]

I also want to write another regexp that matches only the following string, wherever it occurs:

ii) (XYZ)

My trial: [regexp {^X([Y]*)Z$}] or simply regexp {^XYZ$}

I just want to make sure this is the correct approach. Is there any other way to optimize the regexp? :)

i) First question, tested:

set to_Match_Str "XYZ XZZ XVZ XWZ"
foreach {wholeStr to_Match_Str} [regexp -all -inline {X[YZVW]Z} $to_Match_Str] {
    puts "MATCH $to_Match_Str in the list"
}

It prints only XZZ and XWZ from the list; it leaves out XYZ and XVZ. When I include the parentheses, [regexp -all -inline {X([YZVW])Z} $to_Match_Str], it prints all the middle characters correctly: Y Z V W.

A: 

My trial: [regexp {^X([Y|Z|V|W]*)Z$}]

That would match the strings given, but because you are using the * quantifier it would also match strings like "XZ", "XYYYYYYYYYYYYYYYYZ" and "XYZYVWZWWWZVYYWZ". To match the middle character exactly once, don't use a quantifier. (The | characters are taken literally inside a bracket set, so they can be dropped as well.)

^X([YZVW])Z$
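As a rough illustration of the difference the quantifier makes, the sketch below runs a starred and an unstarred pattern over a few sample strings. The variable names and the sample strings are made up for the example, and the bracket set is written as YZVW.

```tcl
# Compare a starred bracket set with a single-character bracket set.
set starred {^X([YZVW]*)Z$}
set single  {^X([YZVW])Z$}

foreach candidate {XYZ XZ XYYYZ XVZ} {
    # regexp returns 1 on a match and 0 otherwise.
    puts [format "%-6s starred=%d single=%d" $candidate \
        [regexp $starred $candidate] [regexp $single $candidate]]
}
```

Only the starred pattern accepts XZ and XYYYZ; both patterns accept XYZ and XVZ.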

My trial: [regexp {^X([Y]*)Z$}]

The same applies here: it will also match strings like "XZ", "XYYZ" and "XYYYYYYYYYYYYYYYYZ". Don't put a quantifier after the set:

^X([Y])Z$

or simply regexp {^XYZ$}

That matches the same string but won't capture anything. To make it behave like the other pattern (capturing the Y character), you need the parentheses:

^X(Y)Z$
Guffa
+3  A: 

i) (XYZ XZZ XVZ XWZ)

Clue: the starting string X and the ending string Z are the same for all of the strings. Only the middle string differs: Y, Z, V, W.

My trial: [regexp {^X([Y|Z|V|W]*)Z$}]

Assuming you're not after literal parentheses around the whole lot, you match that using this:

regexp {X([YZVW])Z} $string -> matchedSubstr

That's because the interior strings are all single characters. (It also stores the matched substring in the variable matchedSubstr; choose any variable name there that you want.) You should not use | inside a [] in a regular expression, as it has no special meaning there. (You might need to add ^$ anchors round the outside.)
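A quick way to see that | is an ordinary character inside [] is to test a string that contains a literal bar. This is only a small sketch; the variable names are invented for the example.

```tcl
set withBar {^X[Y|Z|V|W]Z$}
set plain   {^X[YZVW]Z$}

puts [regexp $withBar "X|Z"]   ;# 1: the | inside the brackets is matched literally
puts [regexp $plain   "X|Z"]   ;# 0: no | in the set, so no match
puts [regexp $plain   "XVZ"]   ;# 1: the intended strings still match
```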

On the other hand, if you want to match multiple character sequences (which the Y etc. are just stand-ins for) then you use this:

regexp {X(Y|Z|V|W)Z} $string -> matchedSubstr

Notice that | is being used here, but [] is not.
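For instance, with multi-character middles the alternation form might look like the sketch below. The middles foo, bar and baz and the sample strings are hypothetical stand-ins, not anything from the question.

```tcl
# Hypothetical multi-character middles standing in for Y, Z, V, W.
set re {^X(foo|bar|baz)Z$}

foreach candidate {XfooZ XbarZ XquxZ} {
    # "->" is just a throwaway variable name for the whole match.
    if {[regexp $re $candidate -> middle]} {
        puts "$candidate: middle is $middle"
    } else {
        puts "$candidate: no match"
    }
}
```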

If your real string has many of these strings (whichever pattern you're using to match them) then the easiest way to extract them all is with the -all -inline options to regexp, typically used in a foreach like this:

foreach {wholeStr matchedSubstr} [regexp -all -inline {X([YZVW])Z} $string] {
    puts "Hey! I found a $matchedSubstr in there!"
}

Mix and match to taste.
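To make the shape of the -all -inline result concrete, here is a small sketch using the sample string from the question (the variable names are made up). Without capturing parentheses the list holds one element per match, so a two-variable foreach would consume two matches per iteration; with one capture group each match contributes two elements.

```tcl
set text "XYZ XZZ XVZ XWZ"

# No capture group: one element per match.
set flat [regexp -all -inline {X[YZVW]Z} $text]
puts $flat

# One capture group: two elements per match (whole match, then the group).
set paired [regexp -all -inline {X([YZVW])Z} $text]
puts $paired

# Collect just the captured middle characters.
set middles {}
foreach {whole middle} $paired {
    lappend middles $middle
}
puts $middles
```

This prints `XYZ XZZ XVZ XWZ`, then `XYZ Y XZZ Z XVZ V XWZ W`, then `Y Z V W`.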

My trial: [regexp {^X([Y]*)Z$}] or simply regexp {^XYZ$}

I just want to make sure this is the correct approach. Is there any other way to optimize the regexp? :)

That's optimal for an exact comparison. And in fact Tcl will optimize that internally to a straight string equality test if that's literal.

Donal Fellows
well, it doesn't get more authoritative than an answer from a member of the Tcl core team. Cheers. :)
Jeff Atwood
The only tricky bit with this question was working out exactly what was asked. (As normal. Those who are skilled at question asking usually don't need to ask in the first place. So I prefer to try to help people ask better questions. :-))
Donal Fellows
@user330727: The `-inline` option makes `regexp` return a list containing the match and each of the captured substrings. With `-all` as well, it returns the concatenation of those lists for every found match in the string, which is great for use with `foreach`.
Donal Fellows
user330727
To be crystal clear: the number of variables to use in the `foreach` depends entirely on the number of capturing parentheses in the regular expression; there should be one more variable than the number of parens. If you don't match them up, things won't work. You can use `regexp -about the_RE_to_examine` to find out how many parentheses there are; it's the first number in the result list there. (Note that `regexp -about` doesn't do matching; it compiles and returns metadata.)
Donal Fellows
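As a hedged sketch of that check, the loop below asks `regexp -about` for the group count of the two patterns from this thread and derives how many `foreach` variables each would need. The variable names are invented for the example; only the first element of the `-about` result is used.

```tcl
foreach re {{X[YZVW]Z} {X([YZVW])Z}} {
    # First element of the -about result is the number of capturing groups.
    set groups [lindex [regexp -about $re] 0]
    puts "$re has $groups capturing group(s); use [expr {$groups + 1}] foreach variable(s)"
}
```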
A: 

You can use the Visual Regexp tool to help; it provides feedback as you construct your regular expression.

Trey Jackson