I have a text input field for titles of various things. To help minimize false negatives in search results (our internal search is not the best), I need a regex pattern that looks at the first four characters of the input string and removes the word "the" (and the space after it), but only if it appears at the very beginning.

For example, if we are talking about the names of bands and someone enters "The Rolling Stones", what I need is for the entry to say only "Rolling Stones".

Can a regex be used to automatically strip these four characters?

+1  A: 

You can use the ^ anchor to match a pattern at the beginning of a line; however, for what you are doing, a regex can be considered overkill.

Many languages support string manipulation directly, which is a more suitable choice here. Here is an example in Python:

>>> def func(n):
...     # Drop a leading "The " (case-sensitive); otherwise return n unchanged.
...     return n[4:] if n.startswith("The ") else n
...

>>> func("The Rolling Stones")
'Rolling Stones'
>>> func("They Might Be Giants")
'They Might Be Giants'
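
The function above only matches "The " with exactly that capitalization. Since the input comes from a free-text field, a case-insensitive variant may be closer to what the question asks for. The following is a minimal sketch of that idea, still without a regex; the name strip_leading_the is hypothetical and not part of the original answer.

```python
def strip_leading_the(title: str) -> str:
    """Return title without a leading "the " in any letter case."""
    prefix = "the "
    # Compare only the first four characters, lowercased, so that
    # "The ", "THE " and "the " are all treated as the article.
    if title[:len(prefix)].lower() == prefix:
        return title[len(prefix):]
    return title


print(strip_leading_the("the rolling stones"))    # rolling stones
print(strip_leading_the("They Might Be Giants"))  # They Might Be Giants
```

Because the comparison includes the trailing space, names that merely begin with the letters "the" are left alone.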
Anthony Forloney
+1  A: 

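Applying the regex

^(?:\s*the\s+)?(.*)$

will match any string and capture it in backreference no. 1, unless the string starts with the word the (optionally preceded by whitespace and followed by at least one whitespace character), in which case backreference no. 1 will contain whatever follows. Requiring whitespace after the keeps a title such as They Might Be Giants intact.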

You need to set the case-insensitive option in your regex engine for this to work.
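
As a rough illustration, here is how such a pattern could be applied in Python with the case-insensitive flag set. The function name strip_the is hypothetical, and the pattern used here requires at least one whitespace character after "the" (\s+), an assumption made so that titles like "They Might Be Giants" are not truncated.

```python
import re

# re.IGNORECASE is the "case-insensitive option" for Python's regex engine.
LEADING_THE = re.compile(r"^(?:\s*the\s+)?(.*)$", re.IGNORECASE)


def strip_the(title: str) -> str:
    """Return capture group 1: the title minus an optional leading "the"."""
    match = LEADING_THE.match(title)
    # The pattern fails only on input with embedded newlines; fall back to the input.
    return match.group(1) if match else title


print(strip_the("The Rolling Stones"))    # Rolling Stones
print(strip_the("They Might Be Giants"))  # They Might Be Giants
```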

Tim Pietzcker
+1  A: 

Since you don't say which language you are using, here is a solution in Perl:

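use feature 'say';  # enables say (Perl 5.10+)

my $str = "The Rolling Stones";

# Remove a leading "the " (case-insensitive) from the start of the string.
$str =~ s/^the //i;

say $str; # Rolling Stones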
bourbaki