I want to replace whitespace with underscore in a string to create nice URLs. So that for example:

"This should be connected" becomes "This_should_be_connected"

I am using Python with Django. Can this be solved using regular expressions?

+17  A: 

You don't need regular expressions. Python has a string method that does what you need:

mystring.replace(" ", "_")
rogeriopvl
This doesn't work with other whitespace characters, such as \t or a non-breaking space.
Roberto Bonvallet
Yes, you are correct, but for the purpose of the question asked, it doesn't seem necessary to take those other whitespace characters into account.
rogeriopvl
+2  A: 

Using the re module:

import re
re.sub(r'\s+', '_', "This should be connected") # This_should_be_connected
re.sub(r'\s+', '_', 'And     so\tshould this')  # And_so_should_this

Unless you have multiple spaces or other whitespace possibilities as above, you may just wish to use string.replace as others have suggested.
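
To illustrate the difference, here is a minimal sketch for Python 3. The helper name `underscore_whitespace` and the sample string are made up for this example. It assumes you want every run of whitespace, including tabs and non-breaking spaces, collapsed into a single underscore:

```python
import re

def underscore_whitespace(text):
    # Hypothetical helper: collapse each run of whitespace
    # (spaces, tabs, non-breaking spaces, ...) into one underscore.
    return re.sub(r"\s+", "_", text)

sample = "And     so\tshould\u00a0this"

# str.replace only touches the literal space character, one at a time,
# so the tab and the non-breaking space are left in place.
print(repr(sample.replace(" ", "_")))

# The regex version treats any whitespace run as one separator.
print(underscore_whitespace(sample))  # And_so_should_this
```

If the input only ever contains single ordinary spaces, both approaches give the same result.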

Jarret Hardie
Thank you, this was exactly what I was asking for. But I agree, the "string.replace" seems more suitable for my task.
Lucas
+7  A: 

Use the string's replace method:

"this should be connected".replace(" ", "_")

"this_should_be_disconnected".replace("_", " ")

mdirolf
+9  A: 

Replacing spaces is fine, but I might suggest going a little further to handle other URL-hostile characters like question marks, apostrophes, exclamation points, etc.

Also note that the general consensus among SEO experts is that dashes are preferred to underscores in URLs.

import re

def urlify(s):

    # Remove everything except word characters (letters, digits, underscore) and whitespace
    s = re.sub(r"[^\w\s]", '', s)

    # Replace all runs of whitespace with a single dash
    s = re.sub(r"\s+", '-', s)

    return s

# Prints: I-cant-get-no-satisfaction
print(urlify("I can't get no satisfaction!"))
Triptych
This is interesting. I will definitely use this advice.
Lucas
Remember to urllib.quote() the output of your urlify() - what if s contains something non-ascii?
zgoda
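
In Python 3, `urllib.quote()` has moved to `urllib.parse.quote()`. A minimal sketch of the quoting step suggested above, assuming a slug that still contains non-ASCII characters (the helper name `quote_slug` is made up for this example):

```python
from urllib.parse import quote

def quote_slug(slug):
    # Hypothetical helper: percent-encode anything that is not URL-safe.
    # quote() encodes non-ASCII text as UTF-8 by default.
    return quote(slug)

print(quote_slug("caf\u00e9-au-lait"))  # caf%C3%A9-au-lait
```

Plain ASCII letters, digits and dashes pass through unchanged, so quoting an already clean slug does no harm.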
+11  A: 

Django has a 'slugify' function which does this, as well as other URL-friendly optimisations. It's hidden away in the defaultfilters module.

>>> from django.template.defaultfilters import slugify
>>> slugify("This should be connected")
'this-should-be-connected'

This isn't exactly the output you asked for, but IMO it's better for use in URLs.

Daniel Roseman
That is an interesting option, but is this a matter of taste, or are there benefits to using hyphens instead of underscores? I just noticed that Stack Overflow uses hyphens as you suggest, but digg.com, for example, uses underscores.
Lucas
This happens to be the preferred option (AFAIK). Take your string, slugify it, store it in a SlugField, and make use of it in your model's get_absolute_url(). You can find examples on the net easily.
shanyu
@Lulu people use dashes because, for a long time, search engines treated dashes as word separators, so you'd have an easier time coming up in multi-word searches.
James Bennett
+2  A: 

I'm using the following piece of code for my friendly URLs:

from unicodedata import normalize
from re import sub

def slugify(title):
    # decompose accented characters, drop anything non-ASCII, then decode back to str
    name = normalize('NFKD', title).encode('ascii', 'ignore').decode('ascii')
    name = name.replace(' ', '-').lower()
    # remove other characters
    name = sub('[^a-zA-Z0-9_-]', '', name)
    # normalize dashes
    name = sub('-+', '-', name)

    return name

It works fine with unicode characters as well.

Armandas
Could you explain where this differs from the built-in Django slugify function?
andybak
A: 

Python has a built-in string method called replace, which is used like so:

string.replace(old, new)

So you would use:

string.replace(" ", "_")

I had this problem a while ago and wrote my own code to replace characters in a string. I have to start remembering to check the Python documentation, because there are built-in functions for almost everything.
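
A short sketch of two details that are easy to miss, using a made-up `title` variable: `replace` returns a new string instead of modifying the original, and it accepts an optional third argument that limits how many occurrences are replaced.

```python
title = "This should be connected"

# Strings are immutable: replace() returns a new string,
# so the result has to be assigned to something.
slug = title.replace(" ", "_")
print(title)  # This should be connected
print(slug)   # This_should_be_connected

# The optional count argument limits the number of replacements.
first_only = title.replace(" ", "_", 1)
print(first_only)  # This_should be connected
```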

A: 
perl -e 'map { $on=$_; s/ /_/g; rename($on, $_) or warn $!; } <*>;'

This Perl one-liner replaces spaces with underscores in the names of all files in the current directory.

A: 
string.replace(' ', '_'); // JavaScript: wrong, replaces only the first ' ' with '_'
string.replace(/\s/g, '_'); // JavaScript: correct, replaces every whitespace character with '_'
twmulloy