views:

167

answers:

4

Hi,

Does anyone know how to find all characters before the last underscore in a filename?

IABU_Real_Egypt_AUS09_012.indd

The result I need is IABU_Real_Egypt_AUS09

Thanks in advance

+2  A: 

How about:

^(.*?)_[^_]*$

Then the result you want is in group 1. (You haven't specified a language, so that's as far as I can go.)

There's more than one way to do this; I'm sure you could use lookahead or lookbehind. What I've done is:

  1. Match as few characters as possible (non-greedily), extending only as far as the rest of the pattern requires. Save them in a group.
  2. Match an underscore.
  3. Match any number of characters, as long as they aren't underscores.

This will involve some backtracking, so if this is a performance-critical piece of code, you might need to optimize it more than I have done.
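The three steps above can be sketched in Java roughly as follows. This is only an illustration: the class and method names are made up, the pattern is assumed to be anchored to the whole string (so the lazy group is forced to extend to the last underscore), and returning the input unchanged when there is no underscore is an assumed fallback, not something the question specifies.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is for illustration only.
final class LazyPrefix {
    // Lazy group, one underscore, then a run of non-underscores up to the end.
    private static final Pattern P = Pattern.compile("^(.*?)_[^_]*$");

    // Returns the text before the last underscore, or the input unchanged
    // when it contains no underscore (an assumed fallback).
    static String beforeLastUnderscore(String name) {
        Matcher m = P.matcher(name);
        return m.matches() ? m.group(1) : name;
    }
}
```

Because the tail `[^_]*$` cannot cross an underscore, the only way the whole pattern can match is with the single `_` sitting on the last underscore in the string.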

A better solution would be to start at the end of the string and count backwards until you reach an underscore, then take the substring from 0 to that index. This would likely be much faster and much clearer than using a regex. For example, in Java:

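public static String getUpToUnderscore(String str) {
    int idx = str.lastIndexOf('_');
    // lastIndexOf returns -1 when there is no underscore; return the input unchanged in that case.
    return idx < 0 ? str : str.substring(0, idx);
}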
Michael Myers
A: 

Assuming you have at least one underscore, you could do something like this and take capture group 1:

/(.*)_[^_]+/
+6  A: 

/(.*)_/ and take the value of the capture. Regex quantifiers are greedy by default, so .* runs to the last underscore automatically (you don't need the negative character class).

irb(main):007:0> "IABU_Real_Egypt_AUS09_012.indd".match(/(.*)_/)[1]

=> "IABU_Real_Egypt_AUS09"
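For comparison, here is a rough Java sketch of the same greedy idea next to a lazy variant. The class, field, and method names are hypothetical and only there to make the difference between `(.*)_` and `(.*?)_` visible when the pattern is not anchored.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are for illustration only.
final class GreedyPrefix {
    // Greedy: .* grabs everything, then backs off to the last underscore.
    static final Pattern GREEDY = Pattern.compile("(.*)_");
    // Lazy: .*? stops at the first underscore it can.
    static final Pattern LAZY = Pattern.compile("(.*?)_");

    // Returns group 1 of the first match, or null when nothing matches.
    static String firstGroup(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }
}
```

On the sample filename, the greedy pattern captures IABU_Real_Egypt_AUS09, while the unanchored lazy one captures only IABU.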

Ben Hughes
Hmm - finally a case where greediness is useful. I'm always trying to get regex engines to be non-greedy.
Triptych
+1  A: 

A non-regex example in C#:

s.Substring(0, s.LastIndexOf('_'))
Patrick McDonald
And lastIndexOf has equivalents in several languages.
streetpc
Why -1? Regular expressions are not always the best solution. I was merely showing an alternative, simpler solution in C#; similar solutions are available in most programming languages.
Patrick McDonald
+1 that was my first thought as well
David Zaslavsky