tags:

views:

86

answers:

4
def symbolsReplaceDashes(text):

I want to replace all spaces and symbols with hyphens. Because I want to use this with URL.

+5  A: 
import re
text = "this isn't alphanumeric"
result = re.sub(r'\W','-',text) # result will be "this-isn-t-alphanumeric"

The \W class is the inverse of the \w class, which consists of alphanumeric characters and underscores ([a-zA-Z0-9_]). Thus, replacing any character that doesn't match \W with a dash will leave you with a string that consists of only alphanumerics, underscores, and dashes, suitable for a URL.

Amber
He may want underscores replaced by dashes as well.
jemfinch
It's possible. If that's the case, `r'[\W-]'` as the pattern will accomplish that easily.
Amber
A: 

Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.

SamB
-1: Just because regex isn't the solution for everything doesn't mean it's not the solution for some things.
Amber
Regular expressions are very much the correct solution to this problem, pithy quotations notwithstanding.
jemfinch
Some people, when confronted with a problem, think "I know, I'll quote Jamie Zawinski." Now they still have their original problem.
Paul McGuire
+1  A: 

This response doesn't use regular expressions, but should also work, with greater control over the types of symbols to filter. It uses the unicodedata module to remove all symbols by checking the categories of the characters.

import unicodedata

# See http://www.dpawson.co.uk/xsl/rev2/UnicodeCategories.html for character categories
replace = ('Sc', 'Sk', 'Sm', 'So', 'Zs')
def symbolsReplaceDashes(text):
    L = []
    for char in text:
        if unicodedata.category(unicode(char)) in replace:
            L.append('-')
        else: L.append(char)
    return ''.join(L)

You may need to use something like urllib.quote(output.encode('utf-8')) to encode characters if ranges are beyond basic ASCII alphanumeric characters.

David Morrissey
+1  A: 

Instead of regex, if you want to escape a string to be used for an url, use urllib.quote() or urllib.quote_plus(). For more complex queries, you might want to build the url using urllib.urlencode(). You can reverse the quotation with urllib.unquote() and urllib.unquote_plus().

Lie Ryan
Since the OP is asking for something which is a lossy transformation, my guess is less that they want to escape the string, and more that they want to generate "nice" URLs from things like post titles, et cetera.
Amber