I am trying to find the variables in a string, e.g.

"%0" can not be found. %1 Please try again %2

I need to know how each variable ends (space, period, end of line) because I will check for the existence of the same variable in the translated version of this string. The text comes from a CSV, and the strings do not end with a line break.

I am able to capture them all except the ones at the end of a string with:

reg = /[%@!][^\s]+[\s\.\z$]+/

I thought either $ or \z should match end of line but that does not seem to work. How can I capture %2 in the above scenario? (again, there is no line break at the end)

+1  A: 

$ matches end-of-line, but not when used inside brackets like that. Writing [$] is how you would look for the normal dollar-sign character '$'.
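To illustrate the difference, here is a minimal sketch. The constant names and the "Total: $5" sample are made up for this example; only the "Please try again %2" fragment comes from the question.

```ruby
# Inside brackets, `$` is a literal dollar sign, not an anchor.
LITERAL_DOLLAR = /[$]/
# Outside brackets, `$` anchors the match to the end of a line.
DIGIT_AT_END = /\d$/

plain = "Please try again %2"   # fragment from the question, contains no "$"
price = "Total: $5"             # made-up sample that does contain "$"

puts LITERAL_DOLLAR.match?(plain)   # false: there is no dollar sign to find
puts LITERAL_DOLLAR.match?(price)   # true: matches the "$" character itself
puts DIGIT_AT_END.match?(plain)     # true: the digit sits at the end of the line
```

`String#match?` assumes Ruby 2.4 or later; on older versions `=~` gives the same answer as an index or `nil`.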

If the string you are searching is the exact string you listed above, try

reg = /^"(.*)" can not be found[.] (.*) Please try again (.*)$/
error_string =~ reg

Your three matching results will be stored in the special variables $1, $2, and $3.

bta
Thanks bta, that is just an example string. I have thousands of those to check, so I cannot really use it in the form that you sent. What about \z? Why doesn't it work in the form that I wrote?
eakkas
Inside a character class, almost everything loses its special meaning. `\s` still matches all ASCII whitespace characters, but `$` just matches `$`, `\z` matches `z` (the `\` is ignored), and `.` matches `.` (whether you escape it or not).
Alan Moore
@eakkas: Yeah, I figured the string you gave was just an example. I was mostly just illustrating the way to use `$` and `()`. As far as I can tell, `$` and `\z` behave the same on a single-line string with no trailing line break (`$` matches at the end of any line, `\z` only at the very end of the string), and both lose their special meaning when placed inside `[]`.
bta
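On how `$` and `\z` relate in Ruby: a small sketch, with made-up sample strings and constant names, suggests they only differ once the string contains a line break.

```ruby
END_OF_LINE   = /%2$/    # `$` matches at the end of any line
END_OF_STRING = /%2\z/   # `\z` matches only at the very end of the string

single_line = "Please try again %2"               # no line break, as in the question
multi_line  = "Please try again %2\nsecond line"  # made-up two-line variant

puts END_OF_LINE.match?(single_line)     # true
puts END_OF_STRING.match?(single_line)   # true
puts END_OF_LINE.match?(multi_line)      # true: %2 ends the first line
puts END_OF_STRING.match?(multi_line)    # false: %2 is not at the end of the string
```

For the CSV strings described in the question (no line breaks), either anchor should do, as long as it sits outside a character class.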
A: 

Okay, I solved it with a different approach. A positive lookahead works here because it does not need a character class, so \z keeps its meaning as the end-of-string anchor:

/[%@!][\w]+(?=\s|\z|\.|\W)/

For the example string, this returns:

%0

%1

%2
eakkas
Now try it without the lookahead: `/[%@!]\w+/`. `\w+` consumes as many word characters as it can, so the next thing *must* be either a non-word character (`\W`, which includes `\s` and `\.`) or the end of the string. The lookahead isn't doing anything for you.
Alan Moore
What command/operator are you using to do the matching? Using the expression you posted, I am only able to get a single matching element at a time. Using `/([%@!]\w+)[^%@!]*([%@!]\w+)[^%@!]*([%@!]\w+)/` will give me all three.
bta
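One way to collect every match at once is `String#scan`. The sketch below is only an illustration: the `placeholders` helper and the translated string are hypothetical, and the pattern is the lookahead-free one suggested in the comments.

```ruby
# A sigil (%, @ or !) followed by one or more word characters.
PLACEHOLDER = /[%@!]\w+/

# Hypothetical helper: returns every placeholder in the text, in order.
def placeholders(text)
  text.scan(PLACEHOLDER)
end

source = '"%0" can not be found. %1 Please try again %2'
# Made-up translation for illustration only; it deliberately omits %1.
translated = '"%0" no se encuentra. Intente de nuevo %2'

found   = placeholders(source)
missing = found - placeholders(translated)

puts found.inspect     # ["%0", "%1", "%2"]
puts missing.inspect   # ["%1"]
```

Because `scan` returns plain strings when the pattern has no capture groups, the array difference is enough to spot a variable that got lost in translation.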