I want to replace whitespace with underscore in a string to create nice URLs. So that for example:
"This should be connected" becomes "This_should_be_connected"
I am using Python with Django. Can this be solved using regular expressions?
I want to replace whitespace with underscore in a string to create nice URLs. So that for example:
"This should be connected" becomes "This_should_be_connected"
I am using Python with Django. Can this be solved using regular expressions?
You don't need regular expressions, Python has a string method that does what you need:
mystring.replace (" ", "_")
Using the re
module:
import re
re.sub('\s+', '_', "This should be connected") # This_should_be_connected
re.sub('\s+', '_', 'And so\tshould this') # And_so_should_this
Unless you have multiple spaces or other whitespace possibilities as above, you may just wish to use string.replace
as others have suggested.
use string's replace method:
"this should be connected".replace(" ", "_")
"this_should_be_disconnected".replace("_", " ")
Replacing spaces is fine, but I might suggest going a little further to handle other URL-hostile characters like question marks, apostrophes, exclamation points, etc.
Also note that the general consensus among SEO experts is that dashes are preferred to underscores in URLs.
def urlify(s):
# Remove all non-word characters (everything except numbers and letters)
s = re.sub(r"\W", '', s)
# Replace all runs of whitespace with a single dash
s = re.sub(r"\s+", '-', s)
return s
# Prints: I-cant-get-no-satisfaction"
print urlify("I can't get no satisfaction!")
Django has a 'slugify' function which does this, as well as other URL-friendly optimisations. It's hidden away in the defaultfilters module.
>>> from django.template.defaultfilters import slugify
>>> slugify("This should be connected")
this-should-be-connected
This isn't exactly the output you asked for, but IMO it's better for use in URLs.
I'm using the following piece of code for my friendly urls:
from unicodedata import normalize
from re import sub
def slugify(title):
name = normalize('NFKD', title).encode('ascii', 'ignore').replace(' ', '-').lower()
#remove `other` characters
name = sub('[^a-zA-Z0-9_-]', '', name)
#nomalize dashes
name = sub('-+', '-', name)
return name
It works fine with unicode characters as well.
Python has a built in method on strings called replace which is used as so:
string.replace(old, new)
So you would use:
string.replace(" ", "_")
I had this problem a while ago and I wrote code to replace characters in a string. I have to start remembering to check the python documentation because they've got built in functions for everything.
string.replace(' ', '_'); //only replaces the first instance of ' ' with '_'. this is wrong.
string.replace(/\s/g, '_'); //this is correct, replaces all instances of ' ' with '_'.