
Hi, I want to use a regex to match the first item in the input: a number, a word, a string in quotation marks, or a parenthesized list. For example, it should match the first item in each of these:

"hello world" gdfigjfoj sogjds

-14.5 fdhdfdfi dfjgdlf

test14 hfghdf hjgfjd

(a (c b 7)) (3 4) "hi"

Any ideas for a regex, or how I could start?

Thank you.

+2  A: 

Any ideas to a regex or how can I start?

You can start with any tutorial on basic regex, such as this.


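[Edit] I missed that you wanted to match balanced parentheses. That cannot be done with a true regular expression: checking that parentheses balance requires counting, and nothing that involves counting can be expressed that way (except through non-standard extensions that some engines add).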

BlueRaja - Danny Pflughoeft
+2  A: 

If you want to match balanced parentheses, regex is not the right tool for the job. Some regex implementations do support recursive pattern matching (PHP and Perl, that I know of), but AFAIK, C# cannot do that (EDIT: see Steve's comment below: .NET can do this as well, after all).

You can match up to a fixed nesting depth using regex, but the pattern very quickly blows up in your face. For example, this:

\(([^()]|\([^()]*\))*\)

meaning

\(                        # match the character '('
(                         # start capture group 1
  [^()]                   #   match any single character except '(' and ')'
  |                       #   OR
  \(                      #   match the character '('
  [^()]*                  #   match zero or more characters other than '(' and ')'
  \)                      #   match the character ')'
)*                        # end capture group 1 and repeat it zero or more times
\)                        # match the character ')'

will match parentheses nested one level deep, like (a (c b 7)) and (a (x) b (y) c (z) d), but will fail to match the whole of (a(b(c))).

Bart Kiers
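To see the depth limit described in this answer in action, here is a small Python check. Python's `re` treats this pattern the same way .NET does, since it uses only character classes, groups and repetition. The helper `nested_parens_pattern` is hypothetical, written only for illustration: it shows that every extra level of nesting means wrapping the inner pattern in one more layer, which is why this approach gets unwieldy fast.

```python
import re

# The depth-limited pattern from this answer: at most one level of
# nesting inside the outer parentheses.
ONE_LEVEL = re.compile(r'\(([^()]|\([^()]*\))*\)')


def nested_parens_pattern(depth):
    """Build a pattern for parentheses nested at most `depth` levels deep.

    Each extra level wraps the previous inner pattern in another
    (?:[^()]|\( ... \))* layer, so the pattern keeps growing.
    """
    inner = r'[^()]*'  # innermost level: no parentheses at all
    for _ in range(depth):
        inner = r'(?:[^()]|\(' + inner + r'\))*'
    return r'\(' + inner + r'\)'


for s in ["(a (c b 7))", "(a (x) b (y) c (z) d)", "(a(b(c)))"]:
    print(s, bool(ONE_LEVEL.fullmatch(s)))  # prints True, True, then False

# One more level of nesting needs a bigger pattern.
print(bool(re.fullmatch(nested_parens_pattern(2), "(a(b(c)))")))  # True
```

Whatever depth you pick, input nested one level deeper than that will fail to match.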
Actually, .NET does support balanced matching: http://blogs.msdn.com/bclteam/archive/2005/03/15/396452.aspx
Steve Wortham
Ah, it does! I rarely do any .NET work (but I believe this isn't the first time someone has corrected me about this... :)). Thanks for the info, Steve.
Bart Kiers
No problem. I've only needed to write regular expressions with balanced matching a few times. They can quickly become a confusing mess, to say the least. Balanced matching does add a substantial amount of power to the regex language, though. Ryan Byington includes an example for matching opening and closing parens in the link above, so it should be useful to the OP.
Steve Wortham
I agree: they can become rather confusing! IMO, whenever you need to process something that is expressed in a recursive manner, it is time to write a little parser. Or write a little grammar and let a parser generator like Yacc or ANTLR do the dirty work for you. Nevertheless, good to know .NET has them too. I hope I don't forget (again).
Bart Kiers
Thank you. I'm using .NET, so I went with this (written as a C# string literal, hence the doubled backslashes): \\([^\\(\\)]*(((?<Open>\\()[^\\(\\)]*)+((?<Close-Open>\\))[^\\(\\)]*)+)*(?(Open)(?!))\\)
TTT
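Following the "write a little parser" advice in the comments above, here is a minimal recursive-descent sketch in Python. It assumes a simplified token set (parentheses, double-quoted strings without escape sequences, and bare atoms separated by whitespace); the names `tokenize`, `parse` and `first_expression` are hypothetical. Each nested list is handled by a recursive call, so there is no fixed depth limit, only Python's recursion limit.

```python
import re

# Assumed token set: '(' , ')', "quoted strings" (no escapes), bare atoms.
TOKEN = re.compile(r'\s*(\(|\)|"[^"]*"|[^\s()"]+)')


def tokenize(text):
    """Split text into parentheses, quoted strings and bare atoms."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            # Only trailing whitespace is acceptable at this point.
            if text[pos:].strip():
                raise ValueError(f"unexpected input at {pos}: {text[pos:]!r}")
            break
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse(tokens, i=0):
    """Parse one expression starting at tokens[i]; return (value, next_index)."""
    if i >= len(tokens):
        raise ValueError("unexpected end of input")
    tok = tokens[i]
    if tok == "(":
        items = []
        i += 1
        while i < len(tokens) and tokens[i] != ")":
            item, i = parse(tokens, i)  # recurse into nested lists
            items.append(item)
        if i >= len(tokens):
            raise ValueError("missing ')'")
        return items, i + 1  # skip the closing ')'
    if tok == ")":
        raise ValueError("unexpected ')'")
    return tok, i + 1  # atom or quoted string


def first_expression(text):
    """Return the first expression in text as a string or nested list."""
    value, _ = parse(tokenize(text))
    return value


print(first_expression('(a (c b 7)) (3 4) "hi"'))  # ['a', ['c', 'b', '7']]
```

Because the parser returns a tree rather than a substring, it also gives you the structure of the list for free, which a regex match would not.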
A: 

For the first three cases, you could use:

^("[^"]*"|[+-]?(?:\d+(?:\.\d+)?|\.\d+)|\w+)

For the last one, I'm not sure it's possible to match up to that final closing parenthesis with a regex.

EDIT: using the balanced matching suggested above for the last one:

^\([^()]*(((?<Open>\()[^()]*)+((?<Close-Open>\))[^()]*)+)*(?(Open)(?!))\)
Rubens Farias
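Putting the two halves of this answer together, here is a minimal Python sketch of a complete first-item matcher. Python's built-in `re` has no balancing groups, so it assumes a plain depth counter in place of the `Open` capture stack; the names `ATOM`, `leading_list` and `first_token` are hypothetical. The number alternative requires at least one digit: with `\d*` the number branch could match the empty string, so `test14` would produce an empty match before `\w+` is ever tried.

```python
import re

# Quoted string, number (at least one digit), or word, tried at the
# start of the input; the same idea as the three-case pattern above.
ATOM = re.compile(r'"[^"]*"|[+-]?(?:\d+(?:\.\d+)?|\.\d+)|\w+')


def leading_list(text):
    """Return the balanced (...) group at the start of text, or None.

    The depth counter plays the role of the Open stack in the .NET
    balancing-group pattern: '(' pushes, ')' pops, and the list ends
    when the stack is empty again. Like that pattern, it does not treat
    parentheses inside quotes specially.
    """
    if not text.startswith("("):
        return None
    depth = 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return text[: i + 1]
    return None  # ran out of input before the list was closed


def first_token(text):
    """Return the first quoted string, number, word or list in text."""
    if text.startswith("("):
        return leading_list(text)
    m = ATOM.match(text)  # match() only tries position 0
    return m.group(0) if m else None


for line in ['"hello world" gdfigjfoj sogjds',
             '-14.5 fdhdfdfi dfjgdlf',
             'test14 hfghdf hjgfjd',
             '(a (c b 7)) (3 4) "hi"']:
    print(first_token(line))
# "hello world"
# -14.5
# test14
# (a (c b 7))
```

Unbalanced input such as `(a (b` returns `None` rather than a partial match, mirroring the `(?(Open)(?!))` check in the .NET pattern.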
The last one is actually my problem; I've already managed to implement the first three cases.
TTT