I'd like to convert all self-closed elements to the long syntax (because my web browser is tripping on them).

Example

<iframe src="http://example.com/thing"/>

becomes

<iframe src="http://example.com/thing"></iframe>

I'm using Python's flavor of regex.

A: 

In Perl,

s:(<(\w+)[^>]*?)/>:$1></$2>:g

will do it.

Kinopiko
I tested it and it trips up on <div><iframe src="http://example.com/thing"/></div> for me. Does it properly handle that case for you?
Michael La Voie
Does it work now?
Kinopiko
+1  A: 

Use this Python regex:

(<(\w+)[^<]*?)/>

It differs from @Kinopiko's in that it will handle nested elements.

Explanation of Regex

  1. Find the opening bracket: <
  2. Capture the tag name that follows: (\w+)
  3. Lazily match any characters between the tag name and the closing slash, except another opening bracket, so the match cannot run into a following tag: [^<]*?
  4. Find the self-closing terminator: />

Then just replace with this statement:

\1></\2>
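
As a minimal sketch of how the pattern and replacement above might be applied in Python (the helper name expand_self_closed is made up for illustration, and only the two inputs from this thread are exercised):

```python
import re

# Pattern and replacement string copied from the answer above.
SELF_CLOSED = re.compile(r'(<(\w+)[^<]*?)/>')

def expand_self_closed(markup):
    """Rewrite <tag .../> as <tag ...></tag> (hypothetical helper)."""
    # \1 is everything up to the slash, \2 is the tag name.
    return SELF_CLOSED.sub(r'\1></\2>', markup)

# The nested case raised in the comments on the first answer.
print(expand_self_closed('<div><iframe src="http://example.com/thing"/></div>'))
```

Note that re.sub replaces every match by default, so no extra flag is needed to handle all elements in the input.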
Michael La Voie
shouldn't the second < be a > ?
Paul Tarjan
If you just want to make sure it doesn't swallow another tag then either one is ok.
Kinopiko
+3  A: 

None of those solutions will accommodate attributes like foo="/>". Try:

s{<([\w\-_]+)((?:[^'">]|'[^']*'|"[^"]*")*)/\s*>}{<$1$2></$1>}g

Exploded to show detail:

<
    ([\w\-_]+)        # tag name
    (                 # attributes
        (?:
            [^'">]  | # "normal" character, or
            '[^']*' | # single-quoted string, or
            "[^"]*"   # double-quoted string
        )*
    )
    /\s*              # self-closing slash
>

This should always work provided that the markup is valid. (You could rearrange this using lazy quantifiers if you so chose; e.g. '[^']*' => '.*?'.)
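
Since the question asks about Python, here is a rough translation of the quote-aware pattern using re.VERBOSE. It is a sketch, not a tested production routine: the function name expand_self_closed_quoted is invented for illustration, and it assumes quotes in the markup are balanced.

```python
import re

# Quote-aware pattern from the answer above, laid out with re.VERBOSE.
SELF_CLOSED_QUOTED = re.compile(r"""
    <
    ([\w-]+)              # group 1: tag name
    (                     # group 2: everything between the name and the slash
        (?:
            [^'">]        #   an ordinary character, or
          | '[^']*'       #   a single-quoted string, or
          | "[^"]*"       #   a double-quoted string
        )*
    )
    /\s*                  # self-closing slash, optional whitespace
    >
""", re.VERBOSE)

def expand_self_closed_quoted(markup):
    """Rewrite <tag .../> as <tag ...></tag>, skipping over quoted values."""
    return SELF_CLOSED_QUOTED.sub(r'<\1\2></\1>', markup)

# A "/>" inside a quoted attribute value no longer ends the match early.
print(expand_self_closed_quoted('<div class="/>"/>'))
```

An element that is not self-closed but carries such an attribute, for example <a foo="/>">text</a>, is left untouched, because the quoted value is consumed as a unit.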

Thom Smith
`<div class="/>">` isn't valid HTML, is it?
nickf
You're looking at attribute *names* in HTML *5*. '/' is permitted between quotes, and '>', while prohibited, is (sadly) commonly seen in production code. Conservative in what you emit, liberal in what you accept.
Thom Smith