ansaurus

Question

Answer 1

A:

Assuming id identifies the element, read the value of rel, replace the whitespace, and write it back:

var el = document.getElementById(id);
var value = el.getAttribute("rel");
if (value !== null) {
    // Replace every whitespace character with an underscore.
    el.setAttribute("rel", value.replace(/\s/g, "_"));
}

Artem Barger 2009-05-14 09:19:29

He is using TextMate, which is an editor.

Norbert Hartl 2009-05-14 09:23:30

Oops, somehow I missed that. o_O

Artem Barger 2009-05-14 10:11:17

Answer 2

A:

I don't think you can do this properly, though I wonder why you need to do it in one go.

I can think of a really poor way of doing it. I don't recommend it, but here goes:

You could sort of do it with the regex below. However, you would have to repeat the capture group, and the matching \N_ in the replacement, once for every space that could appear in the rel value. I bet that requirement alone rules this solution out.

Search:

{\<a *href\=\"[^\"]*" *rel\=\"}{([^ ]*|[^\"]*)}( |\")*{([^ ]*|[^\"]*)}( |\")*{([^ ]*|[^\"]*)}( |\")*{([^ ]*|[^\"]*)}( |\")*{([^ ]*|[^\"]*)}( |\")*{([^ ]*|[^\"]*)}( |\")*{([^ ]*|[^\"]*)}( |\")*{([^ ]*|[^\"]*)}( |\")*

Replace:

\1\2_\3_\4_\5_\6_\7_\8_

This approach has two downsides. First, TextMate may limit the number of captures you can have. Second, you'll end up with a large number of _'s at the end of each line.

With your current test and the regex above, you would end up with:

<a href="#" rel="this_is_a_test">____

PS: This regex is in the format used by the Visual Studio search/replace box. You'll probably need to change some characters to make it work in TextMate.

{} => capturing group
() => grouping
[^A] => anything but A
( |\")* => zero or more spaces or "
\1 => the first capture

Rune Sundling 2009-05-14 10:12:03

Hey, thanks! You gave me something to think about. You're absolutely right: I don't need to do it in one go. I found a way to match the first space, although it looks a bit like a joke:

(?<=rel="[\w+][\w+][\w+][\w+])\s+

(-: Anyway, then I get:

<a href="#" rel="this_is a test">

I'm thinking I should be able to run the search/replace a few times until it stops getting matches, basically replacing the spaces one at a time:

<a href="#" rel="this_is a test">
<a href="#" rel="this_is_a test">
<a href="#" rel="this_is_a_test">

Questions: How do I avoid the repeated [\w+]? Will it match the _'s?

2009-05-14 11:06:56

Wow, the comment ate my newlines ...Hope it's still readable!

2009-05-14 11:07:41

In Visual Studio syntax, this would work as you describe. Search: {\<a *href\=\"[^\"]*" *rel\=\"([^ ]*|[^\"]*)} Replace: \1_ (Note that there is a space after the last visible character in the search regex, to match a space.)

Rune Sundling 2009-05-14 11:40:42

But yes, \w will match _.

Rune Sundling 2009-05-14 11:49:48

Thanks! But I get this result for the first run: rel="this _is a test". The space character is matched and inserted in the replacement string. It should be easy to remove the spaces afterwards, but the problem means I keep targeting the same location for insertion: rel="this _________is a test"

2009-05-14 11:54:05

Sounds like you put the space inside the capturing group rather than outside it. It should be }space and not space}.

Rune Sundling 2009-05-14 12:29:05
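The "one space at a time, repeat until nothing matches" idea from the comments above can be sketched outside the editor as well. This is only an illustration in Python, not TextMate or Visual Studio syntax; the pattern and the function name underscore_rel_iteratively are made up for this example. Each pass captures the part of the rel value that contains no space yet, then swaps the single space that follows it for an underscore.

```python
import re

# Hypothetical pattern for this sketch: the start of a rel value that
# contains no spaces yet (group 1), followed by exactly one space.
FIRST_SPACE = re.compile(r'(rel="[^" ]*) ')


def underscore_rel_iteratively(text):
    """Replace one space per rel attribute per pass until nothing matches.

    Returns the rewritten text and the number of passes that changed it.
    """
    passes = 0
    while True:
        # The space sits outside the capture group, so it is dropped
        # and replaced by the underscore in the replacement string.
        text, count = FIRST_SPACE.subn(r'\1_', text)
        if count == 0:
            return text, passes
        passes += 1


if __name__ == '__main__':
    result, passes = underscore_rel_iteratively('<a href="#" rel="this is a test">')
    print(result)  # <a href="#" rel="this_is_a_test">
    print(passes)  # 3
```

Because the space is matched outside the capture group, each pass moves on to the next space instead of inserting underscores at the same position again.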

Answer 3

A:

Regexes are fundamentally bad at parsing HTML (see "Can you provide some examples of why it is hard to parse XML and HTML with a regex?" for why). What you need is an HTML parser. See "Can you provide an example of parsing HTML with your favorite parser?" for examples using a variety of parsers.

Chas. Owens 2009-05-14 14:29:32
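As a rough illustration of the parser route, here is a minimal sketch using html.parser from the Python standard library. The class name RelUnderscorer and the helper underscore_rel are hypothetical names for this example. It rebuilds each start tag from the parsed attributes, so it normalizes the markup (lowercased names, double-quoted values) and it ignores processing instructions; treat it as a starting point under those assumptions, not a faithful round-trip.

```python
from html import escape
from html.parser import HTMLParser


class RelUnderscorer(HTMLParser):
    """Re-emit the document, replacing spaces in rel values with underscores."""

    def __init__(self):
        # Keep character references as written so they can be passed through.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _emit_start(self, tag, attrs, closer):
        parts = [tag]
        for name, value in attrs:
            if value is None:
                # Attribute without a value, e.g. <input disabled>.
                parts.append(name)
                continue
            if name == 'rel':
                value = value.replace(' ', '_')
            # Attribute values arrive unescaped, so escape them again.
            parts.append('%s="%s"' % (name, escape(value, quote=True)))
        self.out.append('<' + ' '.join(parts) + closer)

    def handle_starttag(self, tag, attrs):
        self._emit_start(tag, attrs, '>')

    def handle_startendtag(self, tag, attrs):
        self._emit_start(tag, attrs, ' />')

    def handle_endtag(self, tag):
        self.out.append('</%s>' % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append('&%s;' % name)

    def handle_charref(self, name):
        self.out.append('&#%s;' % name)

    def handle_comment(self, data):
        self.out.append('<!--%s-->' % data)

    def handle_decl(self, decl):
        self.out.append('<!%s>' % decl)


def underscore_rel(html_text):
    """Return html_text with spaces in every rel attribute replaced by '_'."""
    parser = RelUnderscorer()
    parser.feed(html_text)
    parser.close()
    return ''.join(parser.out)


if __name__ == '__main__':
    print(underscore_rel('<a href="#" rel="this is a test">link</a>'))
```

Only attributes named rel are touched, so spaces in other attribute values and in the text content are left alone.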

Answer 4

A:

I have to get on board the "you're using the wrong tool for the job" train here. You have TextMate, so that means OS X, which means you have sed, awk, ruby and perl, all of which can do this much better and more easily.

Learning how to use one of these tools to do text manipulation will give you uncountable benefits in the future. Here is a URL that will ease you into sed: http://www.grymoire.com/Unix/Sed.html

Adam Luter 2009-05-14 14:35:57

Answer 5

A:

If you're using TextMate, then you're on a Mac, and therefore have Python.

Try this:

#!/usr/bin/env python3

import re

# Match only the rel attribute (quotes included), not the whole line,
# so spaces elsewhere on the line are left untouched.
p_rel = re.compile(r'rel="[^"]+"')

with open('test.html', 'r') as infile:
    for line in infile:
        # Replace the spaces inside each rel attribute with underscores.
        line = p_rel.sub(lambda m: m.group(0).replace(' ', '_'), line)
        print(line, end='')

Sample output:

 $ cat test.html
testing, testing, 1, 2, 3
<a href="#" rel="this is a test">
<unrelated line>
Stuff
<a href="#" rel="this is not a test">
<a href="#" rel="this is not a test" rel="this is invalid syntax (two rels)">
aoseuaoeua

 $ ./test.py
testing, testing, 1, 2, 3
<a href="#" rel="this_is_a_test">
<unrelated line>
Stuff
<a href="#" rel="this_is_not_a_test">
<a href="#" rel="this_is_not_a_test" rel="this_is_invalid_syntax_(two_rels)">
aoseuaoeua

ShawnMilo 2009-05-14 14:53:55


Regex match spaces in html attribute
