I have a space-delimited list of file names, where spaces in the file names are prefixed by '\'
e.g. "first\ file second\ file"
How can I get my regex to match each file name?
(\\ |[^ ])+
This matches everything except spaces, unless the space is escaped. It should work; sorry for misunderstanding your question initially.
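To see how this pattern behaves on the example from the question, here is a minimal sketch in Python (the choice of language and the helper name `split_file_names` are assumptions for illustration; the question does not say which regex flavor is in use):

```python
import re

# Pattern from the answer above: an escaped space, or any non-space
# character, repeated one or more times.
FILE_NAME = re.compile(r'(\\ |[^ ])+')

def split_file_names(text):
    """Return every file name found in a space-delimited list (hypothetical helper)."""
    # group(0) is the whole match, i.e. one complete file name.
    return [m.group(0) for m in FILE_NAME.finditer(text)]

names = split_file_names(r'first\ file second\ file')
for name in names:
    print(name)
```

On the sample input this should yield two names, `first\ file` and `second\ file`, with the backslashes still in place.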
(\S|(?<=\\) )+
Explanation:
You are looking for either non-whitespace characters (\S) or a space preceded by a backslash, multiple times.
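Apply the pattern globally to get all matches in the string. Each overall match is one file name; because the group is repeated, match group 1 only holds the last character it matched, not the whole name.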
EDIT
Thinking about it, you would not even need capturing to a sub-group. The match alone will be enough, so this could be a tiny bit more efficient (the ?: switches to a non-capturing group):
(?:\S|(?<=\\) )+
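A short sketch of both variants, assuming Python's `re` module (which supports fixed-width lookbehind); the names `names_via_lookbehind`, `CAPTURING` and `NON_CAPTURING` are made up for this example. It also shows why the whole match, not group 1, is the thing to read:

```python
import re

CAPTURING = re.compile(r'(\S|(?<=\\) )+')
NON_CAPTURING = re.compile(r'(?:\S|(?<=\\) )+')

def names_via_lookbehind(text):
    """Collect whole matches; each one is a complete file name."""
    return [m.group(0) for m in NON_CAPTURING.finditer(text)]

sample = r'first\ file second\ file'
print(names_via_lookbehind(sample))

# With the capturing variant, a repeated group keeps only its last
# iteration, so group(1) is a single character rather than the name.
first = CAPTURING.search(sample)
print(first.group(0))  # the whole first file name
print(first.group(1))  # only the final character of that name
```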
I would do it like this:
/[^ \\]*(?:\\ [^\\ ]*)*/
This is Friedl's "unrolled loop" idiom. There will probably be very few escaped spaces in the target string relative to the other characters, so you gobble up as many of the other characters as you can each time you get a chance. This is much more efficient than an alternation matching one character at a time.
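As a rough illustration, the unrolled pattern can be applied like this in Python (an assumed language; `names_via_unrolled_loop` is a hypothetical helper). Because every part of the pattern is optional, it can also match the empty string, so this sketch simply discards empty matches rather than changing the regex:

```python
import re

# Unrolled loop: a run of ordinary characters, then any number of
# (escaped space + another run of ordinary characters).
UNROLLED = re.compile(r'[^ \\]*(?:\\ [^\\ ]*)*')

def names_via_unrolled_loop(text):
    """Return non-empty matches of the unrolled-loop pattern."""
    # The pattern matches the empty string at separators, so filter those out.
    return [m.group(0) for m in UNROLLED.finditer(text) if m.group(0)]

print(names_via_unrolled_loop(r'first\ file second\ file'))
```

Filtering after the fact keeps the regex identical to the one shown above; whether that is preferable to tightening the pattern itself depends on the caller.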
Edit: (Tomalak) I put slashes around the regex because the syntax highlighter seems to recognize them and paints the whole regex in one color. Without them, it can pick up on other characters, like quotation marks, and incorrectly (and confusingly) paint parts of the regex in different colors.
(Brad) The OP only mentioned spaces, so I only allowed for quoting them, but you're right. The original unrolled-loop example in the book was for double-quoted strings, which may contain any of several escape sequences, one of which is an escaped quotation mark. Here's the regex:
/"[^\\"]*(?:\\.[^\\"]*)*"/
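A small sketch of the double-quoted-string version, again assuming Python; the sample text and the name `find_quoted_strings` are invented for the example:

```python
import re

# Opening quote, a run of characters that are neither backslash nor quote,
# then any number of (escape sequence + another such run), closing quote.
QUOTED = re.compile(r'"[^\\"]*(?:\\.[^\\"]*)*"')

def find_quoted_strings(text):
    """Return each double-quoted string, including its surrounding quotes."""
    return [m.group(0) for m in QUOTED.finditer(text)]

sample_text = r'say "hello \"world\"" and "bye"'
for quoted in find_quoted_strings(sample_text):
    print(quoted)
```

Here the escaped quotation marks inside the first string do not end the match, because `\\.` consumes the backslash together with the character that follows it.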
(Tomalak) I don't know what you mean when you say that it doesn't match "the file name at the start of the string." It seems to match both of the file names in the OP's example. However, it also matches an empty string, which isn't good. That can be fixed, but unless efficiency is proved to be a problem, it isn't worth the effort. Stefan's solution works fine.