I'm currently trying to split a string in C# (.NET 3.5, Visual Studio 2008) to retrieve everything that's inside square brackets and discard the remaining text.

E.g.: "H1-receptor antagonist [HSA:3269] [PATH:hsa04080(3269)]"

In this case, I'm interested in getting "HSA:3269" and "PATH:hsa04080(3269)" into an array of strings.

How can this be achieved?

A:

String.Split won't help you here; you need a regular expression:

using System;
using System.Text.RegularExpressions;

// pattern = any number of arbitrary characters between square brackets;
// the lazy *? makes each match stop at the first closing bracket.
var pattern = @"\[(.*?)\]";
var query = "H1-receptor antagonist [HSA:3269] [PATH:hsa04080(3269)]";
var matches = Regex.Matches(query, pattern);

foreach (Match m in matches) {
    // Groups[1] holds the text captured between the brackets.
    Console.WriteLine(m.Groups[1].Value);
}

This prints HSA:3269 and PATH:hsa04080(3269), one per line.
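The question asks for the results in an array of strings rather than printed to the console. A minimal sketch that reuses the same pattern and collects group 1 of each match into a string[]; the class name BracketExtractor and the method Extract are hypothetical names chosen for illustration, and the code sticks to features available in .NET 3.5:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of any library.
public static class BracketExtractor
{
    // Same pattern as the answer: lazy .*? so each match stops at the
    // first closing bracket; group 1 holds the text between the brackets.
    private static readonly Regex BracketPattern = new Regex(@"\[(.*?)\]");

    // Returns the contents of every [...] section, in order of appearance.
    public static string[] Extract(string input)
    {
        var results = new List<string>();
        foreach (Match m in BracketPattern.Matches(input))
        {
            results.Add(m.Groups[1].Value);
        }
        return results.ToArray();
    }
}
```

For the example string, BracketExtractor.Extract returns an array containing "HSA:3269" and "PATH:hsa04080(3269)"; a string with no brackets yields an empty array.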

Konrad Rudolph
Do you find it awkward that in 3.5 the MatchCollection enumerator still returns each Match as an Object?
chakrit
Anyway... a better regex might be \[([^\]]*)\], just to be on the safe side :-)
chakrit
@chakrit: 1. Yes, but this cannot be changed for backwards compatibility reasons. Really a shame though. Microsoft should have the balls to do like Python 3: throw everything pre-2.0 out for good and introduce a breaking change. But this won't happen …
Konrad Rudolph
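Regarding the untyped enumerator: one common workaround in .NET 3.5 is LINQ's Enumerable.Cast<Match>(), which wraps the non-generic MatchCollection as an IEnumerable<Match> so that Select and ToArray can be used. A sketch under that assumption; TypedMatches and BracketContents are hypothetical names:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of any library.
public static class TypedMatches
{
    // MatchCollection only implements the non-generic IEnumerable in
    // .NET 3.5, so Cast<Match>() is used to obtain an IEnumerable<Match>
    // that LINQ operators such as Select can work with.
    public static string[] BracketContents(string input)
    {
        return Regex.Matches(input, @"\[(.*?)\]")
                    .Cast<Match>()
                    .Select(m => m.Groups[1].Value)
                    .ToArray();
    }
}
```

The cast happens lazily, element by element, so no intermediate list is built before ToArray.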
Perfect! Thanks man, really appreciate it :)
Hal
@chakrit: 2. This was indeed my first version (I usually use explicit character groups), but I reconsidered because that's a wordier way to express exactly the same pattern (for all practical purposes). There's really no risk here in using the more implicit character class (.) together with a non-greedy quantifier.
Konrad Rudolph
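To make the comparison in these comments concrete, here is a sketch that runs both patterns side by side; PatternComparison and its members are hypothetical names. On the question's input they return the same two strings. One difference worth knowing: by default . does not match a newline (unless RegexOptions.Singleline is set), while [^\]] does, so the two patterns only diverge when a bracketed section spans a line break.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of any library.
public static class PatternComparison
{
    // Konrad's pattern: lazy "any character except newline" inside the brackets.
    public const string LazyDot = @"\[(.*?)\]";

    // chakrit's pattern: negated character group, "anything but ]".
    public const string NegatedClass = @"\[([^\]]*)\]";

    // Returns group 1 of every match of the given pattern.
    public static string[] Contents(string input, string pattern)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Groups[1].Value)
                    .ToArray();
    }
}
```

For example, Contents("[a\nb]", LazyDot) finds no match, whereas Contents("[a\nb]", NegatedClass) returns the single string "a\nb".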
A: 

Err, how about Regex.Split then?! Untested:

string input = "H1-receptor antagonist [HSA:3269] [PATH:hsa04080(3269)]";   
string pattern = @"([)|(])";

foreach (string result in Regex.Split(input, pattern)) 
{
   Console.WriteLine("'{0}'", result);
}
Daz
You should have tested it. "([)|(])" matches ')', '|', or '('. You probably meant "(\[|\])", but that's wrong too; if you use capturing groups in the regex, the captured text is returned along with the other tokens, for a total of eight tokens. Try it here: http://www.myregextester.com/in
Alan Moore
Since the question actually asked about using split, I thought I'd demonstrate a better approach with a link and a quick, untested sample from which the user could take the initiative and solve the problem!
Daz
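Following Alan Moore's point about capturing groups, a Regex.Split version can work if the delimiter pattern has no capturing group: splitting on [\[\]] (either bracket) leaves the bracketed contents at the odd indices of the result. This is a sketch that assumes brackets are balanced and never nested; SplitExtractor is a hypothetical name:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of any library.
public static class SplitExtractor
{
    // Splits on either bracket. The pattern has no capturing group, so
    // Regex.Split does not return the brackets themselves as tokens.
    // For "text [A] [B]" the pieces are: "text ", "A", " ", "B", "".
    // Assuming balanced, non-nested brackets, every odd-indexed piece
    // is the content of one [...] section.
    public static string[] Extract(string input)
    {
        string[] pieces = Regex.Split(input, @"[\[\]]");
        var results = new List<string>();
        for (int i = 1; i < pieces.Length; i += 2)
        {
            results.Add(pieces[i]);
        }
        return results.ToArray();
    }
}
```

The match-based approach in the accepted answer is more robust, since unbalanced brackets would shift which pieces land at odd indices here.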