Hi,
url = 'abcdc.com'
print url.strip('.com')
Expect: abcdc
Resut: abcd
Now I do url.rsplit('.com', 1)
Is there a better way..
Hi,
url = 'abcdc.com'
print url.strip('.com')
Expect: abcdc
Resut: abcd
Now I do url.rsplit('.com', 1)
Is there a better way..
You could do this:
url = 'abcdc.com'
if url.endswith('.com'):
url = url[:-4]
Or using regular expressions:
import re
url = 'abcdc.com'
url = re.sub('\.com$', '', url)
strip strips the characters given from both ends of the string, in your case it strips ".", "c", "o" and "m".
This is a perfect use for regular expressions:
>>> import re
>>> re.match(r"(.*)\.com", "hello.com").group(1)
'hello'
If you know it's an extension, then
url = 'abcdc.com' ... url.split('.')[0]
This works equally well with abcdc.com as abcdc.[anything] and is more extensible.
I don't see anything wrong with the way you're doing it with rsplit, it does exactly what you want. It all depends on how generic you want the solution to be. Do you always want to remove .com, or will it sometimes be .org? If that is the case, use one of the other solutions, otherwise, stick with rsplit()
The reason that strip() does not work the way you expect is that it works on each character individually. It will scan through your string and remove all occurrences of the characters from the end AND the front. So if your string started with 'c', that would also be gone. You would use rstrip to only strip from the back.
Depends on what you know about your url and exactly what you're tryinh to do. If you know that it will always end in '.com' (or '.net' or '.org') then
url=url[:-4]
is the quickest solution. If it's a more general URLs then you're probably better of looking into the urlparse library that comes with python.
If you on the other hand you simply want to remove everything after the final '.' in a string then
url.rsplit('.',1)[0]
will work. Or if you want just want everything up to the first '.' then try
url.split('.',1)[0]
def strip_end(text, suffix):
if not text.endswith(suffix):
return text
return text[:-len(suffix)]
Actually the simplest way would be to use 'replace':
url = 'abcdc.com'
print url.replace('.com','')