I have a string that contains some words with parentheses in them. I need to remove each of those words entirely from the string.

For example, for the input "car wheels_(four) klaxon" the result should be "car klaxon".

Can someone give me an example that would accomplish this?

+7  A: 

You can do this with regular expressions. The regular expression you need is:

    \s?\S*[()]\S*\s?

This matches any word containing ( or ) (or both), together with at most one whitespace character on each side of it. Replacing each match with a single space removes the word and collapses the surrounding whitespace.

In C# the regular expression could be used like this:

    using System.Text.RegularExpressions;

    string s = "car wheels_(four) klaxon";
    s = Regex.Replace(s, @"\s?\S*[()]\S*\s?", " ");  // s is now "car klaxon"

I'm not entirely sure of the VB translation for this, but hopefully you can figure it out.
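A self-contained version of the snippet above, as a sketch using C# 9 top-level statements. The helper name `RemoveParenWords` is hypothetical, and the extra cleanup step is an assumption about what you want: without it, a matching word at the very start of the string leaves a leading space, and two adjacent matching words leave a double space, because the first match consumes the whitespace the second one would have used.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: removes every word that contains '(' or ')'.
static string RemoveParenWords(string input)
{
    // The regex from the answer: a word containing a parenthesis plus at
    // most one whitespace character on each side, replaced by one space.
    string result = Regex.Replace(input, @"\s?\S*[()]\S*\s?", " ");

    // Assumed cleanup: collapse the double space left by two adjacent
    // matches, and trim the space left when the first or last word matched.
    return Regex.Replace(result, @"\s{2,}", " ").Trim();
}

Console.WriteLine(RemoveParenWords("car wheels_(four) klaxon")); // car klaxon
Console.WriteLine(RemoveParenWords("(wheel) hey"));              // hey
Console.WriteLine(RemoveParenWords("a (b) (c) d"));              // a d
```

If leading or trailing spaces don't matter for your use, the single `Regex.Replace` call from the answer is enough.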

Mark Byers
Your regex just blew my mind
JohnIdol
A potential gotcha: this will match any word with a parenthesis even if the parenthesis is not balanced. That may or may not be desired.
Welbog
Thanks Mark, it works fine, but if the string has a word like "(wheel)" it doesn't replace the word :P
Sein Kraft
Welbog: yes, you are right. This can be fixed easily for a single level of parentheses but if there can be nested parentheses, regex should not be used. To simplify it, I assumed the poster wishes to remove all words containing any number of parentheses, matching or not.
Mark Byers
I thought of something like `\w*\(\w*\)\w*` (with somewhat proper parentheses), but yours is nicer. Nice trick with the spaces. +1.
Kobi
@Mark: Sorry about the out-of-order comments. I wanted to make what I was saying clearer so I deleted the old one and replaced it. You're absolutely right about not using regex when there is possible nesting. I just wanted to make sure it's clear what the limits of your expression are.
Welbog
Sein Kraft: I have updated the regex, but I'm not sure exactly what you want. Now it will remove a lone '(' too. Is this right? Or should there be at least one alphabetical character in the word?
Mark Byers
I want to remove any word which contains a parenthesis. The parenthesis could be in the middle of the word, at the end or at the beginning. I mean, if a word has a parenthesis, just remove it. (Sorry, my English is very basic.)
Sein Kraft
Another issue is that it will also delete punctuation attached to the word, so "Hello (dear). World!" will change to "Hello World!", whereas "Hello. World!" might be desired. Do you need to handle this case too?
Mark Byers
It works perfectly! Thanks so much!
Sein Kraft
No, because the string is a series of random words, so it doesn't contain periods or any other punctuation.
Sein Kraft
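If only words with a single balanced pair of parentheses should be removed (the gotcha Welbog raises, and the direction of Kobi's `\w*\(\w*\)\w*`), the character classes can exclude parentheses so that the matched word contains one "(" followed by one ")" and no other parentheses. This is a sketch with a hypothetical helper name; as Mark notes, nested or repeated parentheses are not handled reliably by a regex like this.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical variant: removes only words with one balanced "(...)" pair,
// such as "wheels_(four)" or "(wheel)"; a lone "(" or ")" is left alone.
// [^\s()]* means "any run of characters other than whitespace and parentheses".
static string RemoveBalancedParenWords(string input) =>
    Regex.Replace(input, @"\s?[^\s()]*\([^\s()]*\)[^\s()]*\s?", " ");

Console.WriteLine(RemoveBalancedParenWords("car wheels_(four) klaxon")); // car klaxon
Console.WriteLine(RemoveBalancedParenWords("foo ba(r baz qu)x quux."));  // unchanged
```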
+1  A: 

A slightly different approach, using sed:

    sed "s/\s\+\S*(.\+)\S*\s\+/ /g" yourfile

For example, if yourfile contains:

    car wheels_(four) klaxon
    ciao (wheel) hey
    foo bar (baz) qux
    stack overflow_(rulez)_the world

the output is:

    car klaxon
    ciao hey
    foo bar qux
    stack world
Thrawn
Whoa. Regexes rule the world once more. http://xkcd.com/208/
Tonio
Doesn't quite work. On "foo bar (baz) qux" you get "foo qux" instead of "foo bar qux". You need to match non-whitespace to avoid matching across multiple words.
Mark Byers
Corrected for that :-)
Thrawn
Here's another test case: 'foo ba(r baz qu)x quux.' The poster didn't specify what he wanted in this case, so I don't know what the correct result should be, but we differ in our outputs for this case.
Mark Byers
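The difference Mark describes can be checked directly. The sketch below (C# top-level statements) runs the first answer's regex and a hand translation of the sed regex on his test case and on one more made-up input; the translation to .NET syntax is an assumption, not part of either answer, and it relies on GNU sed, where `\s`, `\S` and `\+` are extensions. Because `.+` in the sed pattern is greedy and can cross spaces, it also deletes ordinary words that sit between two parenthesised words.

```csharp
using System;
using System.Text.RegularExpressions;

// Regex from the first answer: a single word containing '(' or ')'.
const string WordPattern = @"\s?\S*[()]\S*\s?";

// The sed regex in .NET syntax: GNU sed's \s\+ is \s+, and the unescaped
// ( ) in sed's basic regex syntax are literal, written \( \) here.
// The greedy ".+" runs to the last ')' it can reach, even across spaces.
const string SedPattern = @"\s+\S*\(.+\)\S*\s+";

string unbalanced = "foo ba(r baz qu)x quux.";
Console.WriteLine(Regex.Replace(unbalanced, WordPattern, " ")); // foo baz quux.
Console.WriteLine(Regex.Replace(unbalanced, SedPattern, " "));  // foo quux.

string separated = "a (b) c (d) e";
Console.WriteLine(Regex.Replace(separated, WordPattern, " "));  // a c e
Console.WriteLine(Regex.Replace(separated, SedPattern, " "));   // a e
```

Note also that the sed pattern requires whitespace on both sides, so a matching word at the start or end of a line is left in place.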
A: 

If speed isn't an issue and you want to avoid overcomplicated regular expressions, you can use String.Split on " " to create an array of "words", keep only the words for which String.Contains returns false for both "(" and ")", and then use String.Join with a separator of " " to put the remaining words back together.

Sorry, can't send the codez; I don't have a VB.NET compiler on hand.
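A minimal sketch of those steps, in C# rather than VB.NET and using C# 9 top-level statements; the helper name is hypothetical. It assumes words are separated by spaces, as in the question, and uses `StringSplitOptions.RemoveEmptyEntries` so that repeated spaces don't produce empty "words".

```csharp
using System;
using System.Linq;

// Hypothetical helper: split on spaces, keep the words without '(' or ')',
// and join the survivors back together with single spaces.
static string RemoveParenWordsBySplit(string input)
{
    string[] words = input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    var kept = words.Where(w => !w.Contains("(") && !w.Contains(")"));
    return string.Join(" ", kept);
}

Console.WriteLine(RemoveParenWordsBySplit("car wheels_(four) klaxon")); // car klaxon
Console.WriteLine(RemoveParenWordsBySplit("foo ba(r baz qu)x quux."));  // foo baz quux.
```

Filtering the words out, instead of replacing them with empty strings, is what keeps the result free of doubled spaces.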

Phil