I am in need of a regular expression that can remove the extension of a filename, returning only the name of the file.

Here are some examples of inputs and outputs:

myfile.png     -> myfile
myfile.png.jpg -> myfile.png

I can obviously do this manually (i.e., by removing everything from the last dot onward), but I'm sure there is a regular expression that can do this by itself.

Just for the record, I am doing this in JavaScript.

+8  A: 
/(.*)\.[^.]+$/

The result will be in the first capture group. However, it's probably more efficient to just find the position of the rightmost period and take everything before it, without using a regex.
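
As a rough sketch of how the capture group could be read in JavaScript (the function name `stripExtension` is made up for illustration, and falling back to the original string when nothing matches is an assumption, not part of the answer):

```javascript
// Hypothetical helper: returns the first capture group of the pattern above,
// or the input unchanged when the pattern does not match (e.g. no dot at all).
function stripExtension(filename) {
  const match = /(.*)\.[^.]+$/.exec(filename);
  return match ? match[1] : filename;
}

console.log(stripExtension("myfile.png"));     // myfile
console.log(stripExtension("myfile.png.jpg")); // myfile.png
console.log(stripExtension("file"));           // file
```

Because `(.*)` is greedy, it extends to the last period that still leaves at least one non-period character after it.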

Amber
+1 Thanks, works great!
Andreas Grech
A: 

The regular expression to match the pattern is:

\.[^.]*$

It finds a period character (\.), followed by 0 or more characters that are not periods ([^.]*), followed by the end of the string ($).

Igor Oks
/.\w*$/.exec("myfile.png") => [".png"]
Andreas Grech
That regex is returning the extension, whereas I need to remove the extension
Andreas Grech
A: 

In JavaScript you can call the replace() method, which replaces text matched by a regular expression.

This regular expression matches the whole line from beginning to end. Replacing the match with the first capture group removes the last period and everything after it.

^(.*)\..*$

How to implement the replacement is covered in this Stack Overflow question:

Javascript regex question
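
For reference, a minimal sketch of what such a replacement might look like with the greedy pattern shown above. The name `removeExtension` is hypothetical; `"$1"` in the replacement string refers to the first capture group.

```javascript
// Greedy (.*) runs up to the LAST period, so only the final extension is dropped.
const wholeLinePattern = /^(.*)\..*$/;

// Hypothetical helper: replaces the whole match with the captured prefix.
// Strings without a period do not match and come back unchanged.
function removeExtension(filename) {
  return filename.replace(wholeLinePattern, "$1");
}

console.log(removeExtension("myfile.png"));     // myfile
console.log(removeExtension("myfile.png.jpg")); // myfile.png
console.log(removeExtension("file"));           // file
```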

Tinidian
Actually, as you currently have it written, it will remove anything after and including the *first* period, since you have your capture group set to be non-greedy, and your latter `.*` can match anything, including periods.
Amber
Yeah I realized that after I initially posted and then updated it. Thanks
Tinidian
A: 

I suggest \.[^.]*$ (or, as a JavaScript regex object, /\.[^.]*$/). This matches everything from the last dot until the end of the string, so just replace that with nothing.
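
"Replace that with nothing" could be sketched like this (the helper name `dropLastExtension` is an assumption for illustration):

```javascript
// Hypothetical helper: deletes the last dot and everything after it.
// If the string contains no dot, replace() finds no match and returns it as-is.
function dropLastExtension(filename) {
  return filename.replace(/\.[^.]*$/, "");
}

console.log(dropLastExtension("myfile.png"));     // myfile
console.log(dropLastExtension("myfile.png.jpg")); // myfile.png
console.log(dropLastExtension("file"));           // file
```

Note that with this pattern a name such as ".htaccess" becomes an empty string, since the whole name counts as the extension.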

Tim Pietzcker
+4  A: 

Just for completeness: How could this be achieved without Regular Expressions?

var input = 'myfile.png';
var output = input.substr(0, input.lastIndexOf('.')) || input;

The || input handles the case where lastIndexOf() returns -1 and the substring comes back empty. As you can see, it's still a one-liner.
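
The same idea can be spelled out with an explicit check instead of the `||` fallback. This is only a sketch; the function name `baseName` is made up, and treating a leading dot as "no extension" mirrors what the one-liner happens to do rather than anything the answer states.

```javascript
// Hypothetical helper: cut at the last dot, without any regular expression.
function baseName(filename) {
  const dot = filename.lastIndexOf(".");
  // dot === -1: no dot at all; dot === 0: the only dot is the first character.
  // In both cases the name is returned unchanged.
  return dot > 0 ? filename.slice(0, dot) : filename;
}

console.log(baseName("myfile.png"));     // myfile
console.log(baseName("myfile.png.jpg")); // myfile.png
console.log(baseName("file"));           // file
console.log(baseName(".htaccess"));      // .htaccess
```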

Boldewyn
+2  A: 
/^(.+?)(\.[^ .]+)?$/

Test cases where this works and others fail:

  • ".htaccess" (leading period)
  • "file" (no file extension)
  • "send to mrs." (no extension, but ends in abbr.)
  • "version 1.2 of project" (no extension, yet still contains a period)

The common thread above is, of course, "malformed" file extensions. But you always have to think about those corner cases. :P

Test cases where this fails:

  • "version 1.2" (no file extension, but "appears" to have one)
  • "name.tar.gz" (if you view this as a "compound extension" and want it split into "name" and ".tar.gz")

How to handle these is problematic and best decided on a project-specific basis.
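
To see how the listed cases behave, here is a small sketch that runs them through the pattern. It assumes a lazy first group, `(.+?)`, so that the optional extension group gets a chance to match; with a greedy `(.+)` the first group would always swallow the whole string. The helper name `stem` is hypothetical.

```javascript
// Lazy first group; the second group is an optional extension
// made of characters that are neither spaces nor periods.
const splitName = /^(.+?)(\.[^ .]+)?$/;

// Hypothetical helper: returns the part before the extension, if any.
function stem(filename) {
  const m = splitName.exec(filename);
  return m ? m[1] : filename;
}

const cases = [
  ".htaccess",
  "file",
  "send to mrs.",
  "version 1.2 of project",
  "version 1.2",
  "name.tar.gz",
];
for (const name of cases) {
  console.log(JSON.stringify(name), "->", JSON.stringify(stem(name)));
}
```

The first four names come back unchanged, while "version 1.2" becomes "version 1" and "name.tar.gz" becomes "name.tar", matching the two lists above.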

Roger Pate
A: 

This will do it as well :)

'myfile.png.jpg'.split('.').reverse().slice(1).reverse().join('.');

I'd stick to the regexp though... =P

Marcus Westin