Hey,

I need a regular expression to solve the following problem (links to similar problems, related tutorials, etc. are also appreciated):

"__some_words_a_b___" => "__some words a b___"
"____" => "____"
"some___words" => "some   words"

So I want underscores between words to be replaced with spaces, while leading and trailing underscores are kept. I found this:

^[ \t]+|[ \t]+$

and I guess it must be something like that. I will use it in jQuery, Java (standard library) and maybe XSLT.

Addition: The sentences do not necessarily start or end with underscores, and a sentence may contain no underscores at all. Multiple underscores between words should become the same number of spaces.

Best regards Lasse Espeholt
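The requirements above can be expressed as a single substitution. Below is a minimal Python sketch, assuming that "between words" means between any two non-underscore characters; the name `underscores_to_spaces` is hypothetical. A run of underscores is replaced only when a lookbehind sees a non-underscore before it and a lookahead sees one after it, and a callable replacement emits one space per underscore.

```python
import re

# A run of underscores with a non-underscore character on both sides.
# The lookarounds check the neighbours without consuming them.
_INNER_RUN = re.compile(r'(?<=[^_])_+(?=[^_])')

def underscores_to_spaces(s):
    """Hypothetical helper: inner underscores become spaces, edge runs stay."""
    # One space per underscore, so "___" between words becomes "   ".
    return _INNER_RUN.sub(lambda m: ' ' * len(m.group()), s)

for text in ["__some_words_a_b___", "____", "some___words", "no underscores"]:
    print(repr(text), '=>', repr(underscores_to_spaces(text)))
```

Java's `Pattern` supports the same fixed-width lookbehind; in JavaScript, lookbehind needs an ES2018-capable engine.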

A: 

Maybe this is what you want (JavaScript):

var newString = oldString.replace(/(\w)_(\w)/g, "$1 $2");

If there can be many underscores between words, then:

var newString = oldString.replace(/(\w)_+(\w)/g, "$1 $2");

If you want to keep the same number of spaces as underscores:

var newString = oldString.replace(/(\w)(_+)(\w)/g, function(match, l1, u, l2) {
  // new Array(n + 1).join(' ') produces n spaces, one per underscore
  return l1 + new Array(u.length + 1).join(' ') + l2;
});
Pointy
Thanks for the contribution :) but "__hej_med_dig__" renders to "_ hej med dig _"
lasseespeholt
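The result reported in the comment above comes from `\w` itself: it includes the underscore (in JavaScript it is `[A-Za-z0-9_]`), so the leading `__h` counts as a match. The pattern also consumes the character on the right, so in `a_b_c` the `b` cannot start the next match. A Python check of both effects, next to a variant that uses `[^_]` and a lookahead (both function names are hypothetical):

```python
import re

def word_char_version(s):
    # Python port of (\w)_(\w); note that \w also matches "_".
    return re.sub(r'(\w)_(\w)', r'\1 \2', s)

def non_underscore_version(s):
    # [^_] refuses "_" on the left; the lookahead checks the right-hand
    # character without consuming it, so "a_b_c" is fully converted.
    return re.sub(r'([^_])_(?=[^_])', r'\1 ', s)

print(word_char_version("__hej_med_dig__"))       # _ hej med dig _
print(non_underscore_version("__hej_med_dig__"))  # __hej med dig__
```

The variant still handles single underscores only; a run such as `some___words` is left unchanged.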
A: 

I think this would be simpler using both a regex and string substitution. Here's an answer in Python, because I'm not familiar enough with jQuery, Java, or XSLT:

import re

def mangle_string(string):
    """
    Replace underscores between letters with spaces, leave leading and
    trailing underscores alone.
    """
    # Match a string that starts with zero or more underscores, followed by a
    # non-underscore, followed by zero or more of any characters, followed by
    # another non-underscore, followed by zero or more underscores, then the
    # end of the string.  If the string doesn't match that pattern, then return
    # it unmodified.
    m = re.search(r'^(_*)([^_]+.*[^_]+)(_*)$', string)
    if not m:
        return string
    # Return the concatenation of the first group (the leading underscores), then
    # the middle group (everything else) with any internal underscores
    # replaced with spaces, then the last group (the trailing underscores).
    return m.group(1) + m.group(2).replace('_', ' ') + m.group(3)
Zach Hirsch
The idea is that I want consistency in my methods. Is it possible to write a replacement string that does what you do in the return statement?
lasseespeholt
Probably, but I don't think it'll be as straightforward (and it most likely won't perform as well as string substitution).
Zach Hirsch
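In Python, one way to keep this in a single call is to pass a function as the replacement: `re.sub` calls it with the match object, so the same group-based concatenation as in `mangle_string` can live inside the substitution. A sketch (the name `mangle_string_sub` is hypothetical):

```python
import re

# \A and \Z anchor to the whole string; DOTALL lets the middle group span newlines.
# Group 1: leading underscores, group 2: the rest (lazy), group 3: trailing underscores.
_PARTS = re.compile(r'\A(_*)(.*?)(_*)\Z', re.DOTALL)

def mangle_string_sub(s):
    return _PARTS.sub(
        lambda m: m.group(1) + m.group(2).replace('_', ' ') + m.group(3), s)

print(mangle_string_sub("__some_words_a_b___"))  # __some words a b___
```

Because group 1 is greedy it takes the whole leading run, and because group 2 is lazy, group 3 takes the whole trailing run; an all-underscore string ends up entirely in group 1 and comes back unchanged.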
A: 

This should work in JavaScript:

var newString = oldString.replace(/([^_].*?)_(?=[^_\s])/g,"$1 ");

Edit: if the string already contains whitespace, you might need something like this:

var newString = oldString.replace(/([^_\s].*?)_(?=[^_\s])/g,"$1 ");

Any other edge cases I forgot? :) Oh yeah, one more: keep a trailing underscore if it is followed by whitespace (a space, a newline, etc.).
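A quick Python port of the first pattern in this answer (with the lookahead class written as `[^_\s]`) shows two of its properties: in a run of several underscores only the last one is replaced, because the lookahead needs a non-underscore right after the underscore being replaced, and an underscore followed by whitespace is kept. The function name is hypothetical.

```python
import re

def lazy_port(s):
    # Lazily grow from a non-underscore up to the first "_" that is directly
    # followed by a character that is neither "_" nor whitespace.
    return re.sub(r'([^_].*?)_(?=[^_\s])', r'\1 ', s)

print(lazy_port("__some_words_a_b___"))  # __some words a b___
print(lazy_port("some___words"))         # some__ words
print(lazy_port("end_ next"))            # end_ next
```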

Edit: an alternate solution for when there can be more than one underscore between words:

// m[1] = leading underscores, m[2] = middle ([\s\S] also matches newlines), m[3] = trailing underscores
var m = oldString.match(/^(_*)([\s\S]*?)(_*)$/);
var newString = m[1] + m[2].replace(/_/g, " ") + m[3];
ghoppe
The last one almost works :) However, if the sentence is __test__test__, one underscore remains. I tried this: ([^_|\s].*?)_+(?=[^_]) but it replaced both underscores with one space
lasseespeholt
If that's the case, you'll need two replace methods.
ghoppe
fair enough :) ([^_|\s].*?)_+(?=[^_]) will do, thanks for your time :)
lasseespeholt
A: 

I wouldn't use a regex for this. I would count the leading and trailing underscores and then join the leading substring (if any), middle.replace('_',' ') and the trailing substring (if any). If the leading underscores run to the end of the string, just return the original string immediately.

Software Monkey
It is probably a lot faster, but in these languages it will take several lines to do. And in XSLT I would prefer a regexp. Performance is not an issue in my case :)
lasseespeholt
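A minimal Python sketch of the counting approach from the last answer, without a regex (the function name is hypothetical). `str.lstrip('_')` and `str.rstrip('_')` give the lengths of the edge runs, and `str.replace` without a count replaces every occurrence. In JavaScript, by contrast, `replace('_', ' ')` with a string pattern replaces only the first occurrence, whereas Java's `String.replace(char, char)` replaces all of them.

```python
def mangle_by_counting(s):
    """Hypothetical helper: keep edge underscore runs, replace inner ones."""
    # Length of the leading run of underscores.
    start = len(s) - len(s.lstrip('_'))
    # The leading run reaches the end: the string is all underscores (or empty).
    if start == len(s):
        return s
    # Index just past the last non-underscore character.
    end = len(s.rstrip('_'))
    # str.replace without a count argument replaces every occurrence.
    return s[:start] + s[start:end].replace('_', ' ') + s[end:]

for text in ["__some_words_a_b___", "____", "some___words"]:
    print(repr(mangle_by_counting(text)))
```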