Hello,

I have a system that only accepts regular expressions for matching. Since I've never written one before, I went online to look up some tutorials but got really confused, so I'm asking here.

The regular expression needs to match the following:

File.f
File-1.f

In both cases it has to return what comes before the . (or before the - in the second case), i.e. File.

I appreciate the help.

Thanks

+5  A: 

This should do:

^[^\.\-]+

(In English: Match has to start at the beginning of the string and consists of everything until either a . or a - is found.)
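
As a minimal sketch of how this pattern behaves, assuming Python's `re` module (the question does not say which engine the system uses), the whole match is the part before the first . or -. The helper name `prefix` is made up for illustration.

```python
import re

# Pattern from the answer above: from the start of the string, one or more
# characters that are neither "." nor "-".
PATTERN = re.compile(r"^[^\.\-]+")

def prefix(filename):
    """Return the leading run of characters that are neither '.' nor '-'.

    Returns None when the string is empty or starts with '.' or '-'.
    """
    m = PATTERN.match(filename)
    return m.group(0) if m else None

for name in ["File.f", "File-1.f"]:
    print(prefix(name))  # prints File both times
```

Here the result is the entire match (`group(0)`), so no capture group is needed.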

Lucero
Thank you for the quick answer.
Acanthus
`^[^.-]+` would also do it.
Alix Axel
you don't need to escape dot and hyphen in the character class
SilentGhost
@Axel and SilentGhost: note that both `.` and `-` have a special meaning, so it is advisable to escape them. Since the regex engine used has not been specified, it is safer to escape them; the hyphen is used in a character class to specify ranges, and some engines also accept the dot for any character, AFAIR.
Lucero
You don't need to escape them when they are members of a character class but it certainly doesn't hurt to be explicit when it comes to metacharacters.
Andrew Hare
@Lucero not in a character class.
Skilldrick
@Lucero: not in character classes.
fireeyedboy
This answer does not "return anything" - I think the OP is looking for a match group here.
Kimvais
@Alix - cutest regex ever [^.-]
Stuart Branham
@Stuart: Lol. @Kimvais: Returns on $matches[0] I guess.
Alix Axel
+1  A: 
^([^.-]+).*\.f$

First ^ means beginning of a line

() means a group - this is the part that is captured and returned as the first group (depending on your language it is $1, \1, groups()[0] or group(1)).

[] means one from this set of characters

[^ means a set not containing these characters, i.e. it is "all characters but not the ones I list" opposed to [], which means "no characters but only the ones I list"

+ means that the previous item can be repeated from 1 to infinity times.

. is 'any' single character

* means the previous item repeats from 0 to infinity times.

\. means the character . (because . is special)

f is just the letter f (or word f, actually)

$ is the end of line.
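
A minimal sketch of the pieces above put together, assuming Python's `re` module (where the first group is `group(1)` or `groups()[0]`). The function name `capture_prefix` is hypothetical.

```python
import re

# Pattern from this answer: capture the prefix in group 1, then require
# the name to end in ".f".
PATTERN = re.compile(r"^([^.-]+).*\.f$")

def capture_prefix(filename):
    """Return group 1 (text before the first '.' or '-'), or None if no match."""
    m = PATTERN.match(filename)
    return m.group(1) if m else None

print(capture_prefix("File.f"))    # File
print(capture_prefix("File-1.f"))  # File
print(capture_prefix("File.txt"))  # None, because the name does not end in .f
```

Unlike a bare `^[^.-]+`, this version rejects names that do not end in `.f`.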

Kimvais
+1  A: 

I don't know what language you're using, but they all work mostly the same. In C# we would do something like the following:

List<string> files = new List<string>() {
    "File.f",
    "File-1.f"
};
Regex r = new Regex(@"^(?<name>[^\.\-]+)");
foreach(string file in files) {
    Match m = r.Match(file);
    Console.WriteLine(m.Groups["name"]);
}

The named group allows you to easily extract the prefix that you are seeking. The above prints

File
File

on the console.
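
For comparison, here is a rough equivalent of the snippet above, assuming Python rather than C#. Python's `re` module spells a named group `(?P<name>...)` instead of `(?<name>...)`; the rest of the pattern is unchanged.

```python
import re

# Same pattern as the C# example, with Python's named-group syntax.
PATTERN = re.compile(r"^(?P<name>[^\.\-]+)")

for file in ["File.f", "File-1.f"]:
    m = PATTERN.match(file)
    if m:
        # Look the group up by name, like m.Groups["name"] in C#.
        print(m.group("name"))
```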

I strongly encourage you to pick up the book Mastering Regular Expressions. Every programmer should be comfortable with regular expressions, and Friedl's book is by far the best on the subject. It has material pertinent to Perl, Java, .NET and PHP, depending on your language choice.

Jason
Buy the book. A programmer without Regex is like MacGyver without a Swiss Army knife.
Even Mien
A: 

I agree that Kimvais has a really solid answer (I can't vote, so sorry).

I wrote it up in Perl before I read their answer, and I came up with this:

$string1 = "John.f";
$string2 = "Eric-1.f";

$string1 =~ m/^([0-9a-zA-Z]+)[.-]/i;
print $1 . "\n\n\n";

$string2 =~ m/^([0-9a-zA-Z]+)[.-]/i;
print $1 . "\n\n\n";

Basically it's along the same lines as Kimvais's, except that theirs will accept any characters before the . or -, which I'm not sure you want. Mine will only accept numbers or letters followed by a . or a -.
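
To make that difference concrete, here is a small sketch, assuming Python's `re` module instead of Perl. The function names and the test name `my_file.f` are made up for illustration.

```python
import re

# Kimvais's pattern: any characters except "." and "-" before the separator.
ANY_CHARS = re.compile(r"^([^.-]+).*\.f$")
# The pattern from this answer: only letters and digits, then "." or "-".
ALNUM_ONLY = re.compile(r"^([0-9a-zA-Z]+)[.-]")

def any_chars(filename):
    m = ANY_CHARS.match(filename)
    return m.group(1) if m else None

def alnum_only(filename):
    m = ALNUM_ONLY.match(filename)
    return m.group(1) if m else None

print(any_chars("Eric-1.f"), alnum_only("Eric-1.f"))    # Eric Eric
print(any_chars("my_file.f"), alnum_only("my_file.f"))  # my_file None
```

The underscore is neither a letter, a digit, . nor -, so the stricter pattern does not match `my_file.f` at all.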

Good luck

onaclov2000