def symbolsReplaceDashes(text):
I want to replace all spaces and symbols with hyphens. Because I want to use this with URL.
def symbolsReplaceDashes(text):
I want to replace all spaces and symbols with hyphens. Because I want to use this with URL.
import re
text = "this isn't alphanumeric"
result = re.sub(r'\W','-',text) # result will be "this-isn-t-alphanumeric"
The \W
class is the inverse of the \w
class, which consists of alphanumeric characters and underscores ([a-zA-Z0-9_]
). Thus, replacing any character that doesn't match \W
with a dash will leave you with a string that consists of only alphanumerics, underscores, and dashes, suitable for a URL.
Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.
This response doesn't use regular expressions, but should also work, with greater control over the types of symbols to filter. It uses the unicodedata module to remove all symbols by checking the categories of the characters.
import unicodedata
# See http://www.dpawson.co.uk/xsl/rev2/UnicodeCategories.html for character categories
replace = ('Sc', 'Sk', 'Sm', 'So', 'Zs')
def symbolsReplaceDashes(text):
L = []
for char in text:
if unicodedata.category(unicode(char)) in replace:
L.append('-')
else: L.append(char)
return ''.join(L)
You may need to use something like urllib.quote(output.encode('utf-8'))
to encode characters if ranges are beyond basic ASCII alphanumeric characters.
Instead of regex, if you want to escape a string to be used for an url, use urllib.quote()
or urllib.quote_plus()
. For more complex queries, you might want to build the url using urllib.urlencode()
. You can reverse the quotation with urllib.unquote()
and urllib.unquote_plus()
.