I don't know if I can use regex for this, but I want to replace something in this XML:

<custom-attribute name="Attribute_1" dt:dt="string">Danny Boyle</custom-attribute>
<custom-attribute name="DVD-releasedatum" dt:dt="string">06/10/1999</custom-attribute>

should become

<Attribute_1>Danny Boyle</Attribute_1>
<DVD-releasedatum>06/10/1999</DVD-releasedatum>

Stripping the extra text from the opening tag isn't hard, but how can I make the closing tag match my newly formed tag?

+1  A: 

If you only need to do this once, a regex replace can be an option. Otherwise, there are better ways to transform XML, such as XSLT.

With regex, you could replace

<custom-attribute.*?name="([^"]+)".*?>(.*?)</custom-attribute>

with

<$1>$2</$1>

Replace $1 and $2 with whatever syntax your program uses for backreferences. Save a backup first, though =)
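
As a rough illustration of the same replacement, here is a minimal Python sketch. The function name `rename_custom_attributes` is made up for this example, and the name is captured with `[^"]+` on the assumption that attribute names may contain hyphens, as `DVD-releasedatum` does. Python writes the backreferences as `\1` and `\2`.

```python
import re

# Assumption: each custom-attribute element carries a name="..." attribute
# and contains only text (no nested elements).
PATTERN = re.compile(
    r'<custom-attribute\b[^>]*?\bname="([^"]+)"[^>]*>(.*?)</custom-attribute>'
)


def rename_custom_attributes(xml_text: str) -> str:
    """Turn <custom-attribute name="X" ...>v</custom-attribute> into <X>v</X>."""
    # \1 is the captured name, \2 is the element text.
    return PATTERN.sub(r"<\1>\2</\1>", xml_text)


sample = (
    '<custom-attribute name="Attribute_1" dt:dt="string">Danny Boyle</custom-attribute>\n'
    '<custom-attribute name="DVD-releasedatum" dt:dt="string">06/10/1999</custom-attribute>'
)

result = rename_custom_attributes(sample)
print(result)
```

This only holds up for flat input like the sample; nested elements or attribute values containing `>` would need a real XML tool.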

Jens
Why do you add an ending `.` to the replacement text...
KennyTM
Well, it's the end of the sentence starting with "For using...". But I agree, it's misleading here. I'll get rid of it. =)
Jens
+1  A: 

This works for C# (not sure what language you're using):

string input = "<custom-attribute name=\"Attribute_1\" dt:dt=\"string\">Danny Boyle</custom-attribute>\r\n<custom-attribute name=\"DVD-releasedatum\" dt:dt=\"string\">06/10/1999</custom-attribute>";

string output = Regex.Replace(input, "<custom-attribute name=\"(.*?)\".*?>(.*?)</custom-attribute>", "<$1>$2</$1>");

output:

<Attribute_1>Danny Boyle</Attribute_1>
<DVD-releasedatum>06/10/1999</DVD-releasedatum>
Andy Shellam
You should make your .*s lazy.
Jens
@Jens - done, thanks.
Andy Shellam
+1  A: 

Using e.g. gvim, this will do it:

:%s/.*name="\([^"]*\)"[^>]*>\([^<]*\)<.*/<\1>\2<\/\1>/cg

This is the matching part:

.*name="\([^"]*\)"[^>]*>\([^<]*\)<.*

This is the replace part:

<\1>\2<\/\1>
Tomislav Nakic-Alfirevic
Downloading gvim for this was the fastest solution. XSLT looks interesting, but I should learn more about it. How can I try this regex in sed, though?
skerit
Although I'm not sure about sed's handling of capture groups, I believe it should accept the same pattern: `sed 's/.*name="\([^"]*\)"[^>]*>\([^<]*\)<.*/<\1>\2<\/\1>/g' filename.xml`
Tomislav Nakic-Alfirevic
+1  A: 
while (<DATA>)
{
  # $1 is the value of the name attribute, $2 is the element text up to the closing tag.
  if (s/^<[^>]*\bname="([^"]*)"[^>]*>([^<]*)<.*/<$1>$2<\/$1>/)
  {
      print;
  }
}
__DATA__
<custom-attribute name="Attribute_1" dt:dt="string">Danny Boyle</custom-attribute>
<custom-attribute name="DVD-releasedatum" dt:dt="string">06/10/1999</custom-attribute>
muruga
+2  A: 

This looks like a job for XSLT:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="custom-attribute">
        <xsl:element name="{@name}">
            <xsl:apply-templates/>
        </xsl:element>
    </xsl:template>
</xsl:stylesheet>

This gives you the desired output, and is very flexible for future modification and extension.
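
If no XSLT processor is at hand, the same renaming can be sketched with a parser instead. The example below is not XSLT; it swaps in Python's standard-library `xml.etree.ElementTree`. The `attributes` wrapper element, the `urn:example:datatypes` namespace URI, and the function name `rename_elements` are all assumptions made for illustration, since the original fragment shows neither a root element nor a declaration for the `dt` prefix.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the real document must declare the dt prefix somewhere.
DT_NS = "urn:example:datatypes"

doc = (
    f'<attributes xmlns:dt="{DT_NS}">'
    '<custom-attribute name="Attribute_1" dt:dt="string">Danny Boyle</custom-attribute>'
    '<custom-attribute name="DVD-releasedatum" dt:dt="string">06/10/1999</custom-attribute>'
    "</attributes>"
)


def rename_elements(xml_text: str) -> str:
    """Rename each custom-attribute element after its name attribute."""
    root = ET.fromstring(xml_text)
    # Collect matches first so renaming does not interfere with the traversal.
    for elem in list(root.iter("custom-attribute")):
        elem.tag = elem.attrib["name"]  # the new element name comes from @name
        elem.attrib.clear()             # drop name and dt:dt, as the stylesheet does
    return ET.tostring(root, encoding="unicode")


converted = rename_elements(doc)
print(converted)
```

Like the stylesheet, this keeps the text content and discards the original attributes.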

Jakob