I have a textbox where a user puts a string like this:

"hello world! I think that __i__ am awesome (yes I am!)"

I need to turn it into a URL-friendly string (a slug) like this:

hello-world-i-think-that-i-am-awesome-yes-i-am

How can it be done using regular expressions?

Also, is it possible to do it with Greek (for example)?

"Γεια σου κόσμε"

turns to

geia-sou-kosme

In other programming languages (Python/Ruby) I am using a translation array. Should I do the same here?

+1  A: 

A simple approach is to match all "non-word" (here: non-letter) characters and replace each run of them with a -. Convert the string to lowercase before applying this regex. This alone is not foolproof, though, since a dash may be left at the beginning or end.

[^a-z]+

So, after the replacement, you can trim the dashes from the front and the back using this regex:

^-+|-+$
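Put together, the two steps above might look like this in JavaScript (the language used in the other answers). This is a minimal sketch; slugifyAscii is a hypothetical name. Both regexes need the g flag, because replace otherwise stops after the first match.

```javascript
// Hypothetical helper: lowercase, turn every run of non-letters into a
// single dash, then trim any leading/trailing dashes.
function slugifyAscii(text) {
  return text
    .toLowerCase()
    .replace(/[^a-z]+/g, '-')  // each run of non-letters becomes one "-"
    .replace(/^-+|-+$/g, ''); // strip dashes at either end
}

console.log(slugifyAscii('hello world! I think that __i__ am awesome (yes I am!)'));
// hello-world-i-think-that-i-am-awesome-yes-i-am
```

With [^a-z]+ digits are treated as separators too; widening the class to [^a-z0-9]+ keeps them.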

You'd have to create the Greek-to-Latin glyph translation yourself; regex can't help you there. Using a translation array is a good idea.
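A minimal sketch of the translation-array idea, assuming a simplified letter-by-letter Greek-to-Latin table. The table is hypothetical, chosen so the question's example comes out as geia-sou-kosme; it is not an official transliteration standard. Accents such as the tonos on ό are removed first by decomposing with normalize('NFD') and deleting the combining marks, so the table only needs base letters.

```javascript
// Hypothetical, simplified table: one Latin string per Greek base letter.
const GREEK_TO_LATIN = {
  'α': 'a', 'β': 'v', 'γ': 'g', 'δ': 'd', 'ε': 'e', 'ζ': 'z', 'η': 'i',
  'θ': 'th', 'ι': 'i', 'κ': 'k', 'λ': 'l', 'μ': 'm', 'ν': 'n', 'ξ': 'x',
  'ο': 'o', 'π': 'p', 'ρ': 'r', 'σ': 's', 'ς': 's', 'τ': 't', 'υ': 'u',
  'φ': 'f', 'χ': 'ch', 'ψ': 'ps', 'ω': 'o',
};

function slugifyGreek(text) {
  const latin = Array.from(
    text
      .toLowerCase()
      .normalize('NFD')                // split e.g. "ό" into "ο" + combining accent
      .replace(/[\u0300-\u036f]/g, '') // drop the combining accents
  )
    .map((ch) => GREEK_TO_LATIN[ch] ?? ch) // translate Greek, keep everything else
    .join('');
  // Same dash handling as for plain Latin text.
  return latin.replace(/[^a-z0-9]+/g, '-').replace(/^-+|-+$/g, '');
}

console.log(slugifyGreek('Γεια σου κόσμε')); // geia-sou-kosme
```

Keeping the table as data mirrors the translation arrays you use in Python/Ruby; multi-letter outputs such as th and ps are one reason a single character-range regex can't do this step on its own.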

Pindatjuh
+1  A: 

I can't really say for Greek characters, but for the first example, a simple:

/[^a-zA-Z]+/g

will do the trick when you use it as your pattern and replace the matches with a "-". Lowercase the result afterwards, since this pattern keeps uppercase letters.
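In JavaScript the g (global) flag matters here: without it, String.prototype.replace only replaces the first match. A small sketch comparing both; the lowercasing and dash trimming are additions, since this pattern keeps uppercase letters and can leave a trailing dash.

```javascript
const input = 'hello world! I think that __i__ am awesome (yes I am!)';

// Without "g", only the first run of non-letters (the first space) is replaced.
const firstOnly = input.replace(/[^a-zA-Z]+/, '-');

// With "g", every run is replaced; then lowercase and trim the dash at the end.
const slug = input
  .replace(/[^a-zA-Z]+/g, '-')
  .toLowerCase()
  .replace(/^-+|-+$/g, '');

console.log(firstOnly); // hello-world! I think that __i__ am awesome (yes I am!)
console.log(slug);      // hello-world-i-think-that-i-am-awesome-yes-i-am
```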

As for the Greek characters, I'd suggest using an array with all the "character translations" and then adding its values to the regular expression.
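One way to read "adding its values to the regular expression": build a character class from the table's keys and let a replacement callback look each match up. A sketch, assuming a hypothetical table abbreviated to the letters of the question's example (a real one would cover the whole alphabet, including other accented vowels). Here the accented ό gets its own entry, written as an escape, and the input is normalized to NFC so composed and decomposed forms both match.

```javascript
// Hypothetical, abbreviated table: only the letters needed for "Γεια σου κόσμε".
const table = {
  'γ': 'g', 'ε': 'e', 'ι': 'i', 'α': 'a', 'σ': 's', 'ο': 'o',
  'υ': 'u', 'κ': 'k', 'μ': 'm',
  '\u03CC': 'o', // "ό" (omicron with tonos)
};

// Build a character class like /[γειασουκμό]/g from the single-character keys.
// None of these keys are special inside [...], so no escaping is needed here.
const greekRe = new RegExp('[' + Object.keys(table).join('') + ']', 'g');

function slugifyWithTable(text) {
  return text
    .toLowerCase()
    .normalize('NFC')                    // compose "ο" + accent into "ό"
    .replace(greekRe, (ch) => table[ch]) // translate each Greek letter
    .replace(/[^a-z0-9]+/g, '-')         // runs of anything else become "-"
    .replace(/^-+|-+$/g, '');            // trim leading/trailing dashes
}

console.log(slugifyWithTable('Γεια σου κόσμε')); // geia-sou-kosme
```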

Marcos Placona
+1  A: 

To roughly build the URL, you would need something like this:

var textbox = "hello world! I think that __i__ am awesome (yes I am!)";
var url = textbox.toLowerCase().replace(/[^a-z\s]/g, '').replace(/\s+/g, ' ').trim().replace(/\s/g, '-');

It removes every character that is neither a letter nor whitespace, collapses runs of whitespace into a single space, trims the ends, and then replaces each remaining space with a dash. Each regex needs the g flag; without it, replace only changes the first match.

You could use another regular expression to replace the Greek characters with Latin (English) characters.

Ben Rowe
+4  A: 

Try this:

function doDashes(str) {
    var re = /[^a-z0-9]+/gi; // global, case-insensitive match of runs of non-alphanumeric characters
    var re2 = /^-*|-*$/g;     // get rid of any leading/trailing dashes
    str = str.replace(re, '-');  // perform the 1st regexp
    return str.replace(re2, '').toLowerCase(); // ..aaand the second + return lowercased result
}
console.log(doDashes("hello world! I think that __i__ am awesome (yes I am!)"));
// => hello-world-i-think-that-i-am-awesome-yes-i-am

As for the Greek characters, yeah, I can't think of anything other than some sort of lookup table used by another regexp.

Edit: here's the one-liner version (later edits added toLowerCase() and an embarrassing fix to the trailing-dash regexp):

function doDashes2(str) {
    return str.replace(/[^a-z0-9]+/gi, '-').replace(/^-*|-*$/g, '').toLowerCase();
}
npup
Thanks for implementing!
Pindatjuh
Added lowercasing and a fix to the regexp that handles the leading/trailing garbage that can occur.
npup