I'm trying to parse (in Ruby) what's effectively the UNIX passwd file-format: comma delimiters, with an escape character \
such that anything escaped should be considered literally. I'm trying to use a regular expression for this, but I'm coming up short — even when using Oniguruma for lookahead/lookbehind assertions.
Essentially, all of the following should work:
a,b,c # => ["a", "b", "c"]
\a,b\,c # => ["a", "b,c"]
a,b,c\
d # => ["a", "b", "c\nd"]
a,b\\\,c # => ["a", "b\,c"]
Any ideas?
The first response looks pretty good. With a file containing
\a,,b\\\,c\,d,e\\f,\\,\
g
it gives:
[["\\a,"], [","], ["b\\\\\\,c\\,d,"], ["e\\\\f,"], ["\\\\,"], ["\\\ng\n"], [""]]
which is pretty close. I don't need the unescaping done on this first pass, as long as everything splits correctly on the commas. I tried Oniguruma and ended up with (the much longer):
Oniguruma::ORegexp.new(%{
(?: # - begins with (but doesn't capture)
(?<=\A) # - start of line
| # - (or)
(?<=,) # - a comma
)
(?: # - contains (but doesn't capture)
.*? # - any set of characters
[^\\\\]? # - not ending in a slash
(\\\\\\\\)* # - followed by an even number of slashes
)*?
(?: # - ends with (but doesn't capture)
(?=\Z) # - end of line
| # - (or)
(?=,)) # - a comma
},
'mx'
).scan(s)