I have a bunch of files in a directory, mostly labeled something like...

PO1000000100.doc (or .pdf, or .txt). Some of them are named like PurchaseOrderPO1000000109.pdf.

What I need to do is extract the PO1000000109 part of the name, so basically "PO" followed by 10 digits. How can I do this with a regex?

(What I'll do is run a foreach loop over the files in the directory, get each filename, and run it through the regex to get the PO number.)

I'm using C#, in case that's relevant.

A: 

This regex will pick up all numbers from a string: \d*

As described here.

Oded
... and empty strings.
Bart Kiers
Was assuming correct input ;)
Oded
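To illustrate Bart Kiers's point: Regex.Match returns the leftmost match, and because \d* is allowed to match zero characters, it succeeds immediately at index 0 of "PO1000000100.doc" with an empty string. A minimal sketch, assuming .NET with top-level statements, comparing it with \d+ (one or more digits):

```csharp
using System;
using System.Text.RegularExpressions;

string fileName = "PO1000000100.doc";

// \d* may match zero digits, so the leftmost match is the empty string at index 0
// ('P' is not a digit, but an empty match still counts as a successful match).
Match star = Regex.Match(fileName, @"\d*");
Console.WriteLine($"\\d* -> \"{star.Value}\" at index {star.Index}");

// \d+ requires at least one digit, so the engine moves on to the run of digits.
Match plus = Regex.Match(fileName, @"\d+");
Console.WriteLine($"\\d+ -> \"{plus.Value}\" at index {plus.Index}");
```

The first line prints an empty value at index 0; the second prints "1000000100" at index 2.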
+2  A: 

Try this

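using System.Text.RegularExpressions;

// Regex.Match takes the input string first and the pattern second.
string data =
  Regex.Match("PurchaseOrderPO1000000109.pdf", @"PO\d{10}",
    RegexOptions.IgnoreCase).Value;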

You could also add a Regex.IsMatch check with the same arguments first, of course :)

Don
Using `<pre><code>` ... `</code></pre>` tags, could you perhaps consider splitting the expression across lines to make it slightly more readable? :)
PP
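Combining this with the foreach loop described in the question, a minimal sketch might look like the following. It assumes each filename contains at most one PO number; ExtractPoNumber and CollectPoNumbers are hypothetical helper names, and the directory path is whatever you pass in. Path.GetFileName strips the folder part so digits in directory names can't interfere, and checking Match.Success avoids treating a failed match as an empty PO number.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the PO number (e.g. "PO1000000109") found in a
// single filename, or null when the name contains no "PO" + 10 digits.
static string? ExtractPoNumber(string fileName)
{
    Match m = Regex.Match(fileName, @"PO\d{10}", RegexOptions.IgnoreCase);
    return m.Success ? m.Value : null;
}

// Hypothetical helper: walks every file in the directory and collects the PO numbers.
static List<string> CollectPoNumbers(string directory)
{
    var result = new List<string>();
    foreach (string path in Directory.GetFiles(directory))
    {
        // Match against the bare filename, not the full path.
        string? po = ExtractPoNumber(Path.GetFileName(path));
        if (po != null)
        {
            result.Add(po);
        }
    }
    return result;
}

Console.WriteLine(ExtractPoNumber("PurchaseOrderPO1000000109.pdf")); // PO1000000109
Console.WriteLine(ExtractPoNumber("PO1000000100.doc"));              // PO1000000100
Console.WriteLine(ExtractPoNumber("invoice.txt") ?? "(no PO number)");
```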
A: 

A possible regexp could be:

^.*(\d{10})\.\D{3}$
DerKlops
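A sketch of using that pattern from C#, assuming the extension is always exactly three non-digit characters (as with .doc, .pdf and .txt). Only the ten digits are captured, in group 1, so "PO" has to be prepended by hand; PoFromName is a hypothetical helper name. Because \D{3} demands exactly three characters, a name such as PO1000000100.docx does not match.

```csharp
using System;
using System.Text.RegularExpressions;

var pattern = new Regex(@"^.*(\d{10})\.\D{3}$");

// Hypothetical helper: returns "PO" + the captured digits, or null if the name doesn't fit.
static string? PoFromName(Regex regex, string fileName)
{
    Match m = regex.Match(fileName);
    // Groups[1] holds just the ten digits captured by (\d{10}).
    return m.Success ? "PO" + m.Groups[1].Value : null;
}

Console.WriteLine(PoFromName(pattern, "PurchaseOrderPO1000000109.pdf")); // PO1000000109
Console.WriteLine(PoFromName(pattern, "PO1000000100.doc"));              // PO1000000100
Console.WriteLine(PoFromName(pattern, "PO1000000100.docx") ?? "(no match)");
```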
+2  A: 

If the PO part is always the same, you can just get the number without needing to use a regex:

using System.Linq; // needed for Where and ToArray
string digits = new string(theString.Where(c => char.IsDigit(c)).ToArray());

Later you can prepend the PO part manually.

NOTE: I'm assuming that your strings contain only a single run of digits. If you have, for example, "abc12345def678", you will get "12345678", which may not be what you want.

Konamiman
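A sketch of the digit-filtering approach with the "PO" prefix added back, plus a length check as a cheap guard against the caveat above. It assumes a valid PO number always has exactly 10 digits; PoFromDigits is a hypothetical helper name.

```csharp
using System;
using System.Linq;

// Hypothetical helper: keeps only the digits, then re-adds the "PO" prefix.
// Returns null when the digit count isn't 10, e.g. when the name has a second run of digits.
static string? PoFromDigits(string fileName)
{
    string digits = new string(fileName.Where(c => char.IsDigit(c)).ToArray());
    return digits.Length == 10 ? "PO" + digits : null;
}

Console.WriteLine(PoFromDigits("PurchaseOrderPO1000000109.pdf")); // PO1000000109
Console.WriteLine(PoFromDigits("PO1000000100.doc"));              // PO1000000100
// "2009" adds four extra digits, so the guard rejects the name instead of returning a wrong number.
Console.WriteLine(PoFromDigits("PO1000000100_2009.doc") ?? "(rejected)");
```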
+1  A: 
StuffHappens
+1  A: 
using System;
using System.Text.RegularExpressions;

string data = "PurchaseOrderPO1000000109.pdf\nPO1000000100.doc";
MatchCollection matches = Regex.Matches(data, @"PO[0-9]{10}");
foreach (Match m in matches)
{
    Console.WriteLine(m.Value);
}

Results

PO1000000109
PO1000000100
S.Mark
A: 
var re = new System.Text.RegularExpressions.Regex("(?<=^PurchaseOrder)PO\\d{10}(?=\\.pdf$)");
Assert.IsTrue(re.IsMatch("PurchaseOrderPO1234567890.pdf"));
Assert.IsFalse(re.IsMatch("some PurchaseOrderPO1234567890.pdf"));
Assert.IsFalse(re.IsMatch("OrderPO1234567890.pdf"));
Assert.IsFalse(re.IsMatch("PurchaseOrderPO1234567890.pdf2"));
erikkallen
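Because the lookbehind (?<=...) and lookahead (?=...) are zero-width, the match's Value is just the PO number, with no capture group needed. The pattern above only accepts the PurchaseOrder prefix and the .pdf extension, though. A sketch of a loosened variant (an assumption, not part of the original answer) that also accepts bare names like PO1000000100.doc and the .doc/.txt extensions from the question; it relies on .NET supporting variable-length lookbehind:

```csharp
using System;
using System.Text.RegularExpressions;

// Assumed variant: optional "PurchaseOrder" prefix, and any of the three extensions from the question.
var re = new Regex(@"(?<=^(?:PurchaseOrder)?)PO\d{10}(?=\.(?:pdf|doc|txt)$)");

foreach (string name in new[] { "PurchaseOrderPO1000000109.pdf", "PO1000000100.doc", "OrderPO1234567890.pdf" })
{
    Match m = re.Match(name);
    // Lookarounds consume nothing, so m.Value is just "PO" plus the ten digits.
    Console.WriteLine($"{name} -> {(m.Success ? m.Value : "(no match)")}");
}
```

As in the asserts above, "OrderPO1234567890.pdf" is still rejected, because the lookbehind requires either the start of the string or the full "PurchaseOrder" prefix directly before "PO".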