I need to split a string into a list of parts in Ruby, but I need to ignore stuff inside parentheses. For example:

A +4, B +6, C (hello, goodbye) +5, D +3

I'd like the resulting list to be:

[0]A +4
[1]B +6
[2]C (hello, goodbye) +5
[3]D +3

But I can't simply split on commas, because that would split the contents of the parentheses. Is there a way to split this without first converting the commas inside the parentheses into something else?

Thanks.

A: 

The following answer is wrong:

#!/usr/bin/ruby1.8

s = 'A +4, B +6, C (hello, goodbye) +5, D +3'
p s.gsub(/ *\(.*?\)/, '').split(/, */)
# => ["A +4", "B +6", "C +5", "D +3"]
Wayne Conrad
This eliminates comments... he needs them retained.
Myrddin Emrys
Err, you're removing the `(hello, goodbye)` part entirely. In the desired output Colen left it in there.
Bart Kiers
So much for my reading comprehension
Wayne Conrad
What? I can't vote down my own answer?
Wayne Conrad
;), no you can't vote (up or down) your own answers.
Bart Kiers
Thanks to whoever did the deed for me. It needs at least one more down vote, m'kay? No way this should be positive.
Wayne Conrad
+5  A: 

Try this:

s = 'A +4, B +6, C (hello, goodbye) +5, D +3'
tokens = s.scan(/(?:\(.*?\)|[^,])+/)
tokens.each {|t| puts t.strip}

Output:

A +4
B +6
C (hello, goodbye) +5
D +3

A short explanation:

(?:        # open non-capturing group 1
  \(       #   match '('
  .*?      #   reluctantly match zero or more characters other than line breaks
  \)       #   match ')'
  |        #   OR
  [^,]     #   match something other than a comma
)+         # close non-capturing group 1 and repeat it one or more times
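
The annotated pattern above can be run as written if the regex is put in Ruby's extended mode (the `x` flag), which ignores whitespace and `#` comments inside the pattern. This is only a sketch of the same `scan` approach; the variable name `token_pattern` is mine, not part of the original answer.

```ruby
s = 'A +4, B +6, C (hello, goodbye) +5, D +3'

# Same pattern as in the answer, written in extended mode so the
# explanation lives inside the regex itself.
token_pattern = /
  (?:        # open a non-capturing group
    \(       #   match an opening parenthesis
    .*?      #   reluctantly match any characters except line breaks
    \)       #   match a closing parenthesis
    |        #   OR
    [^,]     #   match any single character other than a comma
  )+         # close the group and repeat it one or more times
/x

# scan returns the whole matches because the group is non-capturing;
# strip removes the space left over after each comma.
tokens = s.scan(token_pattern).map(&:strip)
p tokens
```

Because extended mode ignores literal whitespace, a pattern that needs to match a space has to spell it as `[ ]` or `\s`; this one does not, so it carries over unchanged.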

Another option is to split on a comma followed by optional whitespace, but only when the first parenthesis seen when looking ahead is an opening parenthesis (or there is no parenthesis at all, i.e. the end of the string is reached first):

s = 'A +4, B +6, C (hello, goodbye) +5, D +3'
tokens = s.split(/,\s*(?=[^()]*(?:\(|$))/)
tokens.each {|t| puts t}

will produce the same output, but I find the scan method cleaner.
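
As a quick sanity check of that claim, the two approaches can be compared directly on the sample string. This is just a sketch; the variable names `scanned` and `split_up` are mine.

```ruby
s = 'A +4, B +6, C (hello, goodbye) +5, D +3'

# scan keeps the space that follows each comma, so the tokens need a strip.
scanned = s.scan(/(?:\(.*?\)|[^,])+/).map(&:strip)

# split consumes the comma and the whitespace after it (the ,\s* part),
# so no strip is needed here.
split_up = s.split(/,\s*(?=[^()]*(?:\(|$))/)

p scanned
p split_up
p scanned == split_up
```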

Bart Kiers
`# => ["A +4", " B +6", " C (hello, goodbye) +5", " D +3"]` Looks perfect to me. Might want to `#strip` it to remove surrounding whitespace.
Myrddin Emrys
:) already saw the spaces and added `strip`
Bart Kiers
Great answer, thanks :)
Colen
You're welcome Colen.
Bart Kiers
+3  A: 
string = "A +4, B +6, C (hello, goodbye) +5, D +3"
string.split(/ *, *(?=[^\)]*?(?:\(|$))/)
# => ["A +4", "B +6", "C (hello, goodbye) +5", "D +3"]

How this regex works:

/
   *, *        # find comma, ignoring leading and trailing spaces.
  (?=          # (Pattern in here is matched against but is not returned as part of the match.)
    [^\)]*?    #   optionally, find a sequence of zero or more characters that are not ')'
    (?:        #   <non-capturing parentheses group>
      \(       #     left paren '('
      |        #     - OR -
      $        #     (end of string)
    )
  )
/
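
To watch the lookahead decide, here is a small sketch that visits each comma in the sample string and tests the same lookahead against whatever follows it. The `\A` anchor and the names `lookahead` and `verdicts` are my additions for illustration; `split` itself does this check internally.

```ruby
string = "A +4, B +6, C (hello, goodbye) +5, D +3"

# Same lookahead as in the split regex, anchored with \A so it is tested
# against the text that directly follows one particular comma.
lookahead = /\A *(?=[^\)]*?(?:\(|$))/

verdicts = []
string.each_char.with_index do |char, index|
  next unless char == ","
  rest = string[(index + 1)..-1]   # everything after this comma
  verdicts << [index, !(rest =~ lookahead).nil?]
end

verdicts.each do |index, splits|
  note = splits ? "split here" : "inside parentheses, left alone"
  puts "comma at index #{index}: #{note}"
end
```

Only the comma between `hello` and `goodbye` should be reported as left alone: looking ahead from there, the first parenthesis reached is the closing `)`, so neither `\(` nor the end of the string can be matched.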
gabriel
That may be a bit cryptic without an explanation for the faint hearted regex-enthusiast the OP probably is! :). But a good solution nevertheless.
Bart Kiers
How does this work? I couldn't find any good documentation about how regex worked with split - like Bart K. says I'm not that great with regexes
Colen
@Colen, I posted a very similar regex as a second solution including an explanation.
Bart Kiers
I've updated my answer to explain the regex.
gabriel