I'm thinking we can look for an even number of quotes to the left, and to the right of the comma... but I'm not quite sure how to write it. Anyone know?

Actually, you'd only have to check one side (left or right).

I want to split on this, so it has to match only the comma.

Example:

one, "two, three"

Should be split into two strings:

['one', ' "two, three"']
+1  A: 

Regex alone is not very good at handling nested conditions. Brace matching, quote matching and the like are not really what it is built for. You can use a regex in combination with a loop to parse things, but it may be simpler to parse the string yourself.
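As a rough illustration of the "parse it yourself" route, here is a minimal sketch of a loop that splits on commas outside double quotes. The names `QuoteAwareSplitter` and `SplitUnquoted` are made up for this example, and treating a backslash as escaping the next character is an assumption; the question does not say which escape rules apply.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class QuoteAwareSplitter
{
    // Splits on commas that are not inside double quotes.
    // Assumption: a backslash escapes the next character, so \" does not toggle quoting.
    public static List<string> SplitUnquoted(string input)
    {
        var parts = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (c == '\\' && i + 1 < input.Length)
            {
                // Copy the escape sequence verbatim and skip the escaped character.
                current.Append(c).Append(input[++i]);
            }
            else if (c == '"')
            {
                // Entering or leaving a quoted section; the quote itself is kept.
                inQuotes = !inQuotes;
                current.Append(c);
            }
            else if (c == ',' && !inQuotes)
            {
                // An unquoted comma ends the current piece.
                parts.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        // Whatever is left after the last comma is the final piece.
        parts.Add(current.ToString());
        return parts;
    }
}
```

Calling `QuoteAwareSplitter.SplitUnquoted("one, \"two, three\"")` should give the two pieces from the question, with the quotes and the leading space preserved. The explicit `inQuotes` flag is the only state needed, which also makes it easy to change the escape rule later.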

Maybe you could provide a few example strings to clarify what you need to match so I can answer better.

*edit: Looking at your proposed solution, does it work with \\" where the \ is escaped, but not the "?

I suspect you'll find deficiencies in your regex if you're working with real-world strings or complicated escape sequences. That will probably not be the common case, but it is important to understand that a regex is probably not what you actually want here. Regex has no concept of nested state, and even for simple quotations, escape sequences are hard to deal with correctly.

M2tM
Quotes can't be nested, so I don't think this is asking too much from a regex. If you can suggest a better way to split a string by unquoted commas, I'm all ears.
Mark
"I'm thinking we can look for an even number of quotes to the left, and to the right of the comma... but I'm not quite sure how to write it."
M2tM
You don't need to match the quotes, nor count the quotes. You just need to determine if there's an even or odd number, which is perfectly doable. See my answer (I figured it out).
Mark
The way your question was originally worded (minus the example, which was added after my comment), you made it sound like you needed to detect an even number of quotes on either side of a comma. That is a balanced-matching scheme, and none of the regex implementations I am aware of really support that kind of logic.
M2tM
Finally, alternating ', `, and " nesting is common in HTML/JavaScript. So yes, those symbols can be nested, and in your example you are nesting them as well.
M2tM
' "two, three"' == nested comment symbols.
M2tM
@M2tM: I'm sorry if I gave that impression; I didn't mean that I needed to detect an even number of quotes, I meant that that could be a possible approach. And the different kinds of quotes were meant to demonstrate the output, not the input, so no, there wasn't any nesting in my example. However, you do raise a point -- I don't see why my users couldn't use different kinds of quotes.
Mark
I have re-read your post and I see I misread the second bit. I thought they were two examples of different representations that you needed to handle (when of course the second is the stored representation).
M2tM
Please read my edit on this post. Dealing with escape sequences (if this is important) could be a difficulty with the approach you've chosen. I strongly recommend writing a simple parser if you're planning on allowing people other than yourself to interact with this system. I've dealt with a lot of import scripts and user data while working on websites and CMS software, and I guarantee you the first thing someone is going to do is try to store a " right in the middle of some place you don't account for. Using an existing CSV parser is also a great option.
M2tM
Fair enough. You earned your +1. I'm not one that believes regexes are a one-size-fits-all solution anyway, but I thought this was simple enough that a regex could handle it. Isn't too much work to loop over the string and do some counting anyway.
Mark
A: 

Why not use the Split method of the string class?

String[] s = someString.Split(",");
// s[0] would contain the portion to the left of the comma
// s[1] would contain the portion to the right of the comma
tommieb75
because (a) this doesn't do what I asked, (b) this doesn't compile (in C#)
Mark
A: 

Never mind... I think I got it:

new Regex("(?<=^[^\"]*(?:\"[^\"]*\"[^\"]*)*),");
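For anyone who wants to try the even-number-of-quotes idea, here is a minimal sketch using `Regex.Split`. The class name `UnquotedCommaRegex` is made up for the example, and the pattern assumes the lookbehind is anchored at the start of the string and that quotes are never escaped.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnquotedCommaRegex
{
    // Lookbehind: starting at the beginning of the string, a run of non-quote
    // characters followed by zero or more complete "..." pairs. That is the same
    // as saying an even number of quotes sits to the left of the comma.
    // Assumption: quotes are never escaped inside the input.
    private static readonly Regex Splitter =
        new Regex("(?<=^[^\"]*(?:\"[^\"]*\"[^\"]*)*),");

    public static string[] Split(string input)
    {
        // The pattern has no capturing groups, so only the pieces are returned.
        return Splitter.Split(input);
    }
}
```

`UnquotedCommaRegex.Split("one, \"two, three\"")` should return the two strings from the question. Note that an escaped quote such as `a\"b, c` leaves an odd number of quotes to the left of the comma, so that comma is not treated as a separator. The lookbehind also rescans from the start of the string for each comma, which may matter on long inputs.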
Mark
+2  A: 

Are you parsing CSV? Regex is a pretty bad way to do it. Once you have read the CSV definition (easily googlable), you can write an automaton to do it. Or... you can just steal one of the many ready-made solutions already out on the web.
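One ready-made option in .NET is `TextFieldParser` from the `Microsoft.VisualBasic.FileIO` namespace, which can be used from C#. The following is only a sketch: the wrapper class `CsvLineSplitter` and its `SplitLine` method are hypothetical names, and it assumes the Microsoft.VisualBasic assembly is available to the project.

```csharp
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class CsvLineSplitter
{
    // Splits one line of comma-delimited text, honouring double-quoted fields.
    public static string[] SplitLine(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;

            // ReadFields returns null when there is nothing left to read.
            return parser.ReadFields() ?? new string[0];
        }
    }
}
```

Be aware that a CSV parser returns the field values, so the enclosing quotes are stripped rather than kept as in the output shown in the question. If the quotes must survive, a hand-written loop is probably the better fit.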

liho1eye
No... it's not a CSV, but a CSV parser *would* do what I asked...
Mark