The input string looks something like this: OU=TEST:This001. We need to extract "This001", preferably in C#.

+7  A: 

What about:

/OU=.*?:(.*)/

Here is how it works:

OU=  // Must contain OU=
.    // Any character
*    // Repeated but not mandatory
?    // Ungreedy (lazy) (Don't try to match everything)
:    // Match the colon
(    // Start to capture a group
  .    // Any character
  *    // Repeated but not mandatory
)    // End of the group

The / characters are delimiters that mark where the regex starts and where it ends (and where options can be added).

The captured group will contain This001.
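As a minimal sketch of how the pattern above could be used from C#, assuming the Perl-style / delimiters are dropped (the class and method names here are hypothetical, chosen only for illustration):

```csharp
using System.Text.RegularExpressions;

static class OuRegexExample
{
    // Returns the text captured after the first colon that follows "OU=",
    // or null when the input does not match the pattern at all.
    public static string ExtractAfterOu(string input)
    {
        // Same pattern as above, written without the / delimiters.
        Match match = Regex.Match(input, "OU=.*?:(.*)");
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

Calling `OuRegexExample.ExtractAfterOu("OU=TEST:This001")` should give back the captured group, while an input without `OU=` yields null.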

But it would be faster with a simple Substring().

yourString.Substring(yourString.IndexOf(":")+1);
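One thing to keep in mind with the one-liner: when the string has no colon, IndexOf returns -1, so Substring(0) hands back the whole input. A small guarded variant might look like this (a sketch only; the helper name is made up for this example):

```csharp
using System;

static class OuSubstringExample
{
    // Returns everything after the first ':' in the input,
    // or null when the input contains no colon.
    public static string AfterFirstColon(string input)
    {
        int colon = input.IndexOf(':');
        return colon >= 0 ? input.Substring(colon + 1) : null;
    }
}
```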


Colin Hebert
I have never used that syntax, interesting, something to look at today :)
leppie
I can't find any documentation on your regex construct, can you please provide a reference?
leppie
@leppie, updated with more information, but you should have a look at [regular-expressions.info](http://www.regular-expressions.info/); there is a lot of information on regexes there.
Colin Hebert
The leading and trailing `/` do not seem to be valid for .NET. IIRC, this is a JavaScript thing, isn't it?
leppie
@leppie, it's Perl syntax, but it's also used with other regex flavours. You're right, in .NET there is no need for delimiters, but I still use them to have a "standard" regex that anyone can understand.
Colin Hebert
Thanks, I thought I was a bit confused :)
leppie
One issue with this is that it will over-match, e.g. in the case of CN=TEST:This001. We are extracting from a digital certificate. Only the string after "OU=" needs to be matched.
Icerman
Just add `OU=.*?` at the start of your regex.
Colin Hebert
+4  A: 

"OU=" smells like you're doing an Active Directory or LDAP search and responding to the results. While regex is a brilliant tool, I just wanted to make sure that you're also aware of the excellent System.DirectoryServices.Protocols classes that were made for parsing, filtering and manipulating just this sort of data.

The SearchResult, SearchResultEntry and DirectoryAttribute in particular would be the friends you might be looking for. I don't doubt that you can regex or substring as cleverly as the next guy but it's also nice to have another good tool in the toolbox.

Have you tried these classes?

Sir Wobin
This is from a certificate. "OU=" is part of subject name from System.Security.Cryptography.X509Certificates.
Icerman
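Since the value comes from a certificate subject name, one possible approach is to isolate the OU= component first and only then take the text after its colon. The sketch below assumes a subject string with simple comma-separated components; the sample layout and the helper name are hypothetical, and quoted or escaped commas inside values are not handled.

```csharp
using System;

static class SubjectOuExample
{
    // Finds the first "OU=" component in a comma-separated subject string
    // and returns the text after its first colon, or null if none is found.
    // Assumption: components are separated by plain commas (no quoting/escaping).
    public static string ExtractFromSubject(string subject)
    {
        foreach (string rawPart in subject.Split(','))
        {
            string part = rawPart.Trim();
            if (!part.StartsWith("OU=", StringComparison.Ordinal))
                continue; // skip CN=, O=, and other components

            int colon = part.IndexOf(':');
            if (colon >= 0)
                return part.Substring(colon + 1);
        }
        return null;
    }
}
```

With this shape, a component such as `CN=TEST:Other` is skipped, which avoids the over-matching concern raised in the comments.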
A: 

If OU=TEST: must come before the string you want to match, use this regex:

(?<=OU\s*=\s*TEST\s*:\s*).*

That regex matches everything after the colon; the text before the colon is only required to be present and is not part of the match.

You can replace TEST with [A-Za-z]+ to accept any alphabetic text instead of the literal TEST, or with [\w]+ to accept any run of letters, digits and underscores.

\s* means there may be any amount of whitespace, or none, at that position; remove it if you don't need such a check.
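For illustration, here is a rough sketch of the lookbehind pattern in C#, using the [A-Za-z]+ variant in place of the literal TEST. .NET allows quantifiers such as \s* and + inside a lookbehind, which not every regex flavour does. The class and method names are placeholders.

```csharp
using System.Text.RegularExpressions;

static class OuLookbehindExample
{
    // Lookbehind pattern from above, with [A-Za-z]+ in place of TEST.
    private static readonly Regex Pattern =
        new Regex(@"(?<=OU\s*=\s*[A-Za-z]+\s*:\s*).*");

    // Returns the whole match (the text after the colon), or null if
    // no position in the input is preceded by "OU=<letters>:".
    public static string Extract(string input)
    {
        Match match = Pattern.Match(input);
        return match.Success ? match.Value : null;
    }
}
```

Because the prefix sits inside a lookbehind, the result is the match itself (`match.Value`); no capture group is needed.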

Vantomex
+2  A: 

A solution without regex:

var str = "OU=TEST:This00:1";
var result = str.Split(new char[] { ':' }, 2)[1];

// result == This00:1
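Note that indexing `[1]` directly throws an IndexOutOfRangeException when the input has no colon, because Split then returns a single-element array. A defensive variant could look like the following sketch (the helper name is hypothetical):

```csharp
static class OuSplitExample
{
    // Splits on the first ':' only (at most 2 pieces) and returns the
    // second piece, or null when the input contains no colon.
    public static string AfterFirstColon(string input)
    {
        string[] parts = input.Split(new[] { ':' }, 2);
        return parts.Length == 2 ? parts[1] : null;
    }
}
```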

Regex vs Split vs IndexOf

Split

var str = "OU=TEST:This00:1";

var sw = new Stopwatch();

sw.Start();
var result = str.Split(new char[] { ':' }, 2)[1];
sw.Stop();

// sw.ElapsedTicks == 15

Regex

var str = "OU=TEST:This00:1";

var sw = new Stopwatch();

sw.Start();
var result = (new Regex(":(.*)", RegexOptions.Compiled)).Match(str).Groups[1];
sw.Stop();

// sw.ElapsedTicks == 7000 (Compiled)

IndexOf

var str = "OU=TEST:This00:1";

var sw = new Stopwatch();

sw.Start();
var result = str.Substring(str.IndexOf(":") + 1);
sw.Stop();

// sw.ElapsedTicks == 40

Winner: Split
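A caveat on the numbers above: each one comes from a single timed call, so it can include JIT compilation, and the Regex figure also includes constructing and compiling the regex. A slightly more robust comparison might warm up each approach and time many iterations, as in this sketch. The iteration count and helper names are arbitrary choices for this example, and no particular outcome is claimed.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

static class ExtractBenchmark
{
    // Calls the function once as a warm-up, then times `iterations` calls
    // and returns the elapsed Stopwatch ticks.
    public static long TimeTicks(Func<string> extract, int iterations)
    {
        extract(); // warm-up, so first-call JIT cost is not measured
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            extract();
        sw.Stop();
        return sw.ElapsedTicks;
    }

    public static void Run()
    {
        const string str = "OU=TEST:This00:1";
        const int n = 100000; // arbitrary iteration count

        // Built once, outside the timed loop.
        Regex regex = new Regex(":(.*)", RegexOptions.Compiled);

        Console.WriteLine("Split:   " + TimeTicks(() => str.Split(new[] { ':' }, 2)[1], n));
        Console.WriteLine("Regex:   " + TimeTicks(() => regex.Match(str).Groups[1].Value, n));
        Console.WriteLine("IndexOf: " + TimeTicks(() => str.Substring(str.IndexOf(':') + 1), n));
    }
}
```

Calling `ExtractBenchmark.Run()` prints one tick count per approach; results will vary by machine and runtime.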


BrunoLM
Good idea, but only "OU=" and ":" have fixed value/length. Everything else is variable.
Icerman
@Icerman: The length doesn't matter. As long as the string has at least one `:`, it will get everything after the first one.
BrunoLM
In `str.Split(new char[] { ':' }, 2)`, the second parameter is the maximum number of pieces to return. Asking for 2 means that `OU=jjj:kkkkkk:aaaaa:ssssss:xxxx` is split into 2 pieces: `OU=jjj` and `kkkkkk:aaaaa:ssssss:xxxx`.
BrunoLM