I'd like to trim these purchase order file names (a few examples below) so that everything after the first "_" is omitted.

INCOLOR_fc06_NEW.pdf Keep: INCOLOR (write this to db as the VendorID) Remove: _fc06_NEW.pdf

NORTHSTAR_sc09.xls Keep: NORTHSTAR (write this to db as the VendorID) Remove: _sc09.xls

Our scenario: The managers are uploading these files to our Intranet web server to make them available to download/view, etc. I'm using Brettle's NeatUpload, and for each file uploaded I'm writing the file's attributes into the PO table (SQL Server 2000). The first part of the file name will be written to the DB as a VendorID.

The naming convention for these files is consistent: the first part of the file name is always the vendor name (or Vendor ID), followed by an "_", then other unpredictable chars used to identify the type of Purchase Order, then the file extension, which is consistently either .xls, .XLS, .PDF, or .pdf.

I tried TrimEnd - but the array of chars that you have to provide ends up being long and can conflict with the part of the file name I want to keep. I have a feeling I'm not using TrimEnd properly.

What is the best way to use string.TrimEnd (or any other string manipulation in C#) that will strip off all chars after the first "_" ?

+5  A: 
String s = "INCOLOR_fc06_NEW.pdf";
int index = s.IndexOf("_");

return index >= 0 ? s.Substring(0,index) : s;
Babak Naffas
It's good that you check index before using Substring(), but you're returning the part he wants trimmed off, not the initial part. Better make that `return index >= 0 ? s.Substring(0, index-1) : s;`
Sean Nyman
This returns the portion the OP wants removed. You should use s.Substring(0, index) instead.
Ahmad Mageed
(Or possibly (0, index), I can't remember if Substring arguments are inclusive or exclusive.)
Sean Nyman
Thanks, I rushed too much (yes, yes....that's what she said)
Babak Naffas
A: 
public string StripOffStuff(string sInput)
{
  int iIndex = sInput.IndexOf("_");

  // >= 0 so that a leading "_" (index 0) is also treated as a match
  return (iIndex >= 0) ? sInput.Substring(0, iIndex) : sInput;
}

// Call it like:
string sNewString = StripOffStuff("INCOLOR_fc06_NEW.pdf");
Kelsey
Uhh, if there's no _ that will return an empty string. -1 until you get it right ;)
Sean Nyman
Could make it return whatever... would assume that an empty string would mean it was not possible, but I guess I could return the original. The empty string was to signify that the strip was not possible.
Kelsey
I would suspect that we'd want to return the original, since it's the part before the "_", but there doesn't happen to be one.
Steven Sudit
That doesn't seem like a very intuitive behaviour to me, especially given the context of the problem. If you're trying to trim off everything past the first instance of a delimiter, it makes more sense to return the original if there is no instance of said delimiter. Returning a particular string as an "error code", especially one that would be returned as an expected result of "valid" input (e.g. "_underscore" would be trimmed to "") will quickly lead to confusing and hard to maintain code.
Sean Nyman
Changed it to return the original. I have no preference either way.
Kelsey
I've looked at all the PO files (there are hundreds): every single one has an "_" after the vendor part. It's consistent enough that one could consider this in the solution.
Doug
A: 

TrimEnd removes trailing occurrences of the characters you pass in (or trailing white space if you pass none) and stops at the first character that isn't in that set, so it won't help you here. Read more about TrimEnd here: http://msdn.microsoft.com/en-us/library/system.string.trimend.aspx

Bnaffas's code (with a small tweak):
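
To see why TrimEnd is the wrong tool, here is a minimal sketch. The class and method names are made up for illustration, and "data_pdf.pdf" is a hypothetical file name, not one of the real PO files. TrimEnd treats its arguments as a set of characters, not as a suffix, so it keeps eating characters from the end until it hits one that is not in the set.

```csharp
using System;

public static class TrimEndDemo
{
    // Hypothetical helper: tries to remove ".pdf" by trimming its characters.
    // TrimEnd strips ANY trailing run of '.', 'p', 'd', 'f', in any order.
    public static string StripPdfChars(string fileName)
    {
        return fileName.TrimEnd('.', 'p', 'd', 'f');
    }

    // StripPdfChars("INCOLOR_fc06_NEW.pdf") gives "INCOLOR_fc06_NEW"
    //   (stops at 'W', so the "_fc06_NEW" part is still there)
    // StripPdfChars("data_pdf.pdf") gives "data_"
    //   (the "pdf" before the dot is eaten too, because its chars are in the set)
}
```

Either way the result depends on which characters happen to sit at the end of the name, not on where the first "_" is.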

String fileName = "INCOLOR_fc06_NEW.pdf";
int index = fileName.IndexOf("_");

return index >= 0 ? fileName.Substring(0, index) : fileName;

If you want to do something with the other parts, you could use a Split

string fileName = "INCOLOR_fc06_NEW.pdf";
string[] parts = fileName.Split('_');
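
If only the vendor part and "everything else" are of interest, one possible variation (a sketch, with made-up class and method names) is the Split overload that takes a maximum count. With a count of 2 the string is split at the first "_" only, and the remainder stays in one piece.

```csharp
using System;

public static class PoFileName
{
    // Hypothetical helper: splits at the first underscore only.
    // parts[0] is the vendor part; parts[1], if present, is the rest.
    public static string[] SplitVendor(string fileName)
    {
        return fileName.Split(new char[] { '_' }, 2);
    }

    // SplitVendor("INCOLOR_fc06_NEW.pdf") gives { "INCOLOR", "fc06_NEW.pdf" }
    // SplitVendor("INCOLOR.pdf") gives { "INCOLOR.pdf" } (no underscore, one element)
}
```

Check the length of the returned array before reading the second element, since a name without an underscore yields a single-element array.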
Zyphrax
+2  A: 

I'll probably offend the anti-regex lobby, but here I go (ducking):

string stripped = Regex.Replace(filename, @"(?<=[^_]*)_.*",String.Empty);

This code will strip all extra characters after the first '_', unless there is no '_' in the string (then it will just return the original string).

It's one line of code. It's slower than the more elaborate IndexOf() algorithm, but when used in a non-performance-sensitive part of the code, it's a good solution.

Get your flame-throwers out...

Philippe Leybaert
A working solution is a working solution. It's good to keep an open mind for different solutions to the same problem.
Babak Naffas
What about the pro-regex-but-only-if-they're-used-sensibly lobby? ;) That lookbehind will always succeed because it can match zero characters. You could fix that by anchoring the lookbehind to the beginning of the string, but why bother? @"_.*" works fine all by itself.
Alan Moore
OK, I like your suggestion, activa. No flame-throwers from me, unless it turns out to be too slow in a foreach loop. I'm using NeatUpload's multiFile object, which allows our managers to select multiple PO files at a time to upload in a batch.
Doug
When using Regex, do I have to include a using System."something" directive for Regex.Replace to run properly in my .cs file? Gulp: I've never used Regular Expressions so far. Thanks all...
Doug
Ahh just found it: using System.Text.RegularExpressions. I've tested activa's solution, and it works just great. Solution found.
Doug
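
Pulling the comments above together, a complete version of the regex approach might look like the sketch below. It uses the simpler "_.*" pattern from Alan Moore's comment and the using directive Doug found; the class and method names are made up for illustration.

```csharp
using System.Text.RegularExpressions;

public static class VendorIdRegex
{
    // Hypothetical helper: removes the first underscore and everything after it.
    // "_.*" matches from the first '_' to the end of the line; if there is
    // no '_' nothing matches and the input comes back unchanged.
    public static string Strip(string fileName)
    {
        return Regex.Replace(fileName, "_.*", string.Empty);
    }

    // Strip("INCOLOR_fc06_NEW.pdf") gives "INCOLOR"
    // Strip("NORTHSTAR_sc09.xls") gives "NORTHSTAR"
    // Strip("INCOLOR.pdf") gives "INCOLOR.pdf"
}
```

Calling this once per file inside a foreach over an upload batch should be fine for a handful of files; if it ever did show up in profiling, the IndexOf/Substring version avoids the regex engine entirely.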
A: 

I would go with the Substring approach, but to round out the available solutions here's a LINQ approach just for fun (it needs using System.Linq):

string filename = "INCOLOR_fc06_NEW.pdf";
string result = new string(filename.TakeWhile(c => c != '_').ToArray());

It'll return the original string if no underscore is found.

Ahmad Mageed
A: 

To go with all the "alternative" solutions, here's the second one that I thought of (after substring):

string filename = "INCOLOR_fc06_NEW.pdf";
string stripped = filename.Split('_')[0];
Sean Nyman
Very similar, but I would go with this just to make sure something like "INCOLOR.pdf" can be handled too :) string filename = "INCOLOR_fc06_NEW.pdf"; string stripped = filename.Split('_', '.')[0];
Chansik Im