Is it possible to capture the text inside one node and use it to replace the contents of another node, using only a single regular expression? I should add that the data is in a file.

Given:

<x>This text</x>
<!-- Unknown number of nodes between <x> and <y> -->
<y>Junk</y>

Change to:

<x>This text</x>
<!-- Unknown number of nodes between <x> and <y> -->
<y>This text</y>

Normally, I would do a regular expression to find the contents of x and store it in a variable. Then, I would run a second regular expression to find the contents of y and replace it with the variable's data. Just wondering if there is a "1-step" solution... Thanks.
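For reference, the two-step approach described above might look roughly like this. It is only a sketch in JavaScript: the variable names are made up for illustration, and it assumes there is exactly one <x> and one <y> element and that <y> holds plain text only.

```javascript
// Sample input taken from the question.
const input = [
  "<x>This text</x>",
  "<!-- Unknown number of nodes between <x> and <y> -->",
  "<y>Junk</y>",
].join("\n");

// Step 1: capture the contents of <x> and store them in a variable.
const found = input.match(/<x>(.*?)<\/x>/s);
const xContents = found ? found[1] : null;

// Step 2: replace the contents of <y> with the stored text.
// [^<]* assumes <y> contains plain text only (no child elements).
// A replacer function is used so that "$" in xContents is inserted literally.
const twoStep = xContents === null
  ? input
  : input.replace(/<y>[^<]*<\/y>/, () => `<y>${xContents}</y>`);

console.log(twoStep);
```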

+3  A: 

If you use jQuery, you could simply do this:

$('y').html($('x').html());

Otherwise, with standard JavaScript:

document.getElementsByTagName('y')[0].innerHTML = document.getElementsByTagName('x')[0].innerHTML;
js1568
Oops, this is in an XML file, so I can't use jQuery. I'll update the tag to "Perl".
Stephen
A: 

If you have a string that you want to replace, this seems to work.

<script>
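    // Sample input: <y> immediately follows <x> on the next line.
    var text = "<x>This text</x>\n<y>Junk</y>";
    // Capture the contents of <x> in $1 and write them into both elements.
    var replaced = text.replace(/<x>(.*)<\/x>\n<y>.*<\/y>/s, "<x>$1</x>\n<y>$1</y>");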
    alert(replaced);
</script>
Paulo Manuel Santos
The dot loses its special meaning in character classes, so `[.\n]` matches a dot or a newline; you probably want another `.*` there. And you should use the `s` modifier, not `m`. Then the regex should work, but with all those greedy `.*`s, you might find it unacceptably slow. (That's assuming `<x>` and `<y>` are unique, but the OP says they are.)
Alan Moore
@Alan Thanks for the correction. I edited the code, but I guess the question is not about javascript anymore..
Paulo Manuel Santos
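Following up on the greedy `.*` remark: a variant of the same idea with lazy quantifiers, which also tolerates other nodes between <x> and <y>, could look like the sketch below. The <a> and <b> nodes are invented sample data, and the pattern still assumes <x> and <y> each occur exactly once.

```javascript
// Hypothetical input with two unrelated nodes between <x> and <y>.
const doc = "<x>This text</x>\n<a>one</a>\n<b>two</b>\n<y>Junk</y>";

// $1 = contents of <x>, $2 = everything between </x> and <y>.
// The lazy .*? stops at the first possible match instead of backtracking
// from the end of the input; the s flag lets . match newlines.
const oneStep = doc.replace(
  /<x>(.*?)<\/x>(.*?)<y>.*?<\/y>/s,
  "<x>$1</x>$2<y>$1</y>"
);

console.log(oneStep);
```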
A: 

I was able to contact a friend about this.

The regex, assuming only one x node and one y node, would look like this.

s/X([^X]*)X([^Y]*)Y([^Y]*)Y/X\1X\2Y\1Y/

where X is <x> and Y is <y>

Stephen
you can't use strings in character classes that way (re: `[^<x>]`)
ysth
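To illustrate the point about character classes, here is a small JavaScript sketch (variable names are illustrative). `[^<x>]` excludes the three single characters `<`, `x` and `>`, not the string `<x>`, so it stops at the first "x" in the text. The negative-lookahead form shown second is one common alternative, not necessarily what the answer intended.

```javascript
// A negated character class matches ONE character that is not "<", "x" or ">".
const notTagChars = /^[^<x>]*/;
const prefix = "This text</x>".match(notTagChars)[0];
console.log(prefix); // stops before the "x" in "text"

// One alternative: consume characters only while "</x>" does not start here.
const untilCloseTag = /^(?:(?!<\/x>).)*/s;
const contents = "This text</x>".match(untilCloseTag)[0];
console.log(contents);
```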
+2  A: 
$filecontents =~ s!(<x>(?>(.*?)</x>)(?>.*?<y>))(?>.*?(</y>))!$1$2$3!s;

But you are better off using an XML parser (assuming this is XML). For instance, the above won't work with your sample text, because it will think the <y> in the comment is the beginning of the y tag.

ysth
+1. Good point, and nice use of atomic groups!
Alan Moore
that's what they're for. \K would help too, but I didn't want to assume 5.10+
ysth
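The caveat about the sample text can be reproduced with a rough JavaScript port of the substitution above. This is only an approximation: JavaScript has no atomic groups, so plain lazy groups are used instead, which does not change the outcome for this input.

```javascript
// Sample text from the question, including the comment that mentions <x> and <y>.
const sample = [
  "<x>This text</x>",
  "<!-- Unknown number of nodes between <x> and <y> -->",
  "<y>Junk</y>",
].join("\n");

// $1 = everything from <x> through the first "<y>", $2 = contents of <x>, $3 = "</y>".
const ported = /(<x>(.*?)<\/x>.*?<y>).*?(<\/y>)/s;
const result = sample.replace(ported, "$1$2$3");

// The first "<y>" found is the one inside the comment, so the end of the
// comment and the real <y> opening tag are overwritten.
console.log(result);
```

The output ends with `and <y>This text</y>`, and the closing `-->` of the comment is gone, which is why a parser is the safer choice for input like this.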