I have a bunch of XML that has lines that look like this

<_char font_name="/ITC Stone Serif Std Bold" italic="true" />

but sometimes look like this

<_char font_size="88175" italic="true" font_name="/ITC Stone Serif Std Bold" />

Here's what I need to do

  • Replace italic="true" with italic="false" on every line that contains ITC Stone Serif Std Bold, regardless of whether the font name comes before OR after the italic part.

Can this be done with a single regex?

I'm not looking for a real-time solution. I just have a ton of XML files that have this "mistake" in them and I'm trying to do a global search-and-replace with PowerGrep which would require a single regex. If scripting's the only way to do it, then so be it.

A: 

I'm not 100% sure, but I feel like something like XSLT might be better suited to this task, and depending upon what Regex engine you're using, I'm not sure that you can do this with a single regular expression.
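
For comparison, here is a rough sketch of the parser-based route. It uses Python's standard `xml.etree.ElementTree` rather than XSLT, so it is a swapped-in technique and not what this answer literally proposes. The function name `fix_italic_xml` and the `<doc>` wrapper element in the usage note are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

FONT = "ITC Stone Serif Std Bold"

def fix_italic_xml(xml_text: str) -> str:
    """Set italic="false" on every _char element that uses FONT."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree; attributes are looked up by name,
    # so their order inside the tag does not matter.
    for elem in root.iter("_char"):
        if FONT in elem.get("font_name", "") and elem.get("italic") == "true":
            elem.set("italic", "false")
    return ET.tostring(root, encoding="unicode")
```

Calling `fix_italic_xml('<doc><_char italic="true" font_name="/ITC Stone Serif Std Bold" /></doc>')` returns the same document with the attribute flipped. Note that re-serializing may alter whitespace or drop the XML declaration, which may or may not matter for auto-generated files.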

theraccoonbear
Clarification added to the OP.
Mark Biek
+1  A: 

Well, in general, using RE for XML parsing isn't a great idea. But if you really wanted, the easiest way would be to just do it in two lines:

if (/ITC Stone Serif Std Bold/) {
   s/italic="true"/italic="false"/g;
}
zigdon
3 lines? And that could match other font changes (something my solution also suffers from).
Jonathan Leffler
+3  A: 

Would simply using the '|' operator satisfy you?

font_name="/ITC Stone Serif Std Bold"[^>]*italic="(true)"|italic="(true)"[^>]*font_name="/ITC Stone Serif Std Bold"

That should detect any line where the font_name attribute comes before or after an italic attribute with the value true.
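
A minimal Python sketch for trying the alternation idea on the two sample lines from the question before running it across real files. It assumes both branches name the Serif font and that `[^>]*` is used so any number of attributes may sit between the two. The function name `fix_italic` and the extra capture groups around the text to keep are illustrative additions, not part of the pattern above.

```python
import re

FONT = r'font_name="/ITC Stone Serif Std Bold"'

# Branch 1: font_name first, italic later inside the same tag.
# Branch 2: italic first, font_name later inside the same tag.
# [^>]* keeps the match from running past the end of the tag.
PATTERN = re.compile(
    r'(' + FONT + r'[^>]*italic=")true(")'
    r'|(italic=")true("[^>]*' + FONT + r')'
)

def fix_italic(line: str) -> str:
    def repl(m: re.Match) -> str:
        if m.group(1) is not None:          # branch 1 matched
            return m.group(1) + "false" + m.group(2)
        return m.group(3) + "false" + m.group(4)  # branch 2 matched
    return PATTERN.sub(repl, line)

if __name__ == "__main__":
    print(fix_italic('<_char font_name="/ITC Stone Serif Std Bold" italic="true" />'))
    print(fix_italic('<_char font_size="88175" italic="true" font_name="/ITC Stone Serif Std Bold" />'))
```

A tool that only accepts a single search pattern and a single replacement string would need the groups arranged differently, since each branch keeps different text around the word true.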

VonC
That's probably the easiest way to do it. I was mostly wondering if there was some magical regex operator I didn't know about.
Mark Biek
Not to my knowledge... but maybe a regexp guru can post a better solution.
VonC
This seems like the best approach for my purposes.
Mark Biek
A: 

In Perl - untested:

while (<>)
{
    s/italic="true"/italic="false"/ if m%font_name="/ITC Stone Serif Std Bold" italic="true"|italic="true" font_name="/ITC Stone Serif Std Bold"%;
    print;
}

Very simple-minded: it only matches when the two attributes are adjacent, it might need a global qualifier, and it might need a more complex substitute if other parts of the same line could contain italic options.

Also - a thought - should you take this opportunity to make the notation uniform, so always put italic in front of (or behind) the font name?

Jonathan Leffler
These are, unfortunately, auto-generated files. Otherwise I'd be all in favor of standardizing. Luckily I only have to do this once.
Mark Biek
A: 
Pattern: /(<_char(?=(?:\s+\w+="[^"]*")*?\s+font_name="[^"]*?ITC Stone Serif Std Bold[^"]*")(?:\s+\w+="[^"]*")*?\s+italic=")true(?=")/
Replacement: '$1false'
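
To see how the lookahead version behaves, here is a small Python sketch that applies the same pattern with `re.sub`. The lookahead right after `<_char` checks that the tag has a matching font_name somewhere among its attributes without consuming anything, so the main match can then walk forward to italic="true" wherever it sits. The function name `fix_italic_lookahead` is hypothetical, and Python writes the replacement as `\g<1>false` where the answer uses `$1false`.

```python
import re

PATTERN = re.compile(
    r'(<_char'
    # Lookahead: some attribute in this tag is font_name="...ITC Stone Serif Std Bold..."
    r'(?=(?:\s+\w+="[^"]*")*?\s+font_name="[^"]*?ITC Stone Serif Std Bold[^"]*")'
    # Skip over any attributes that come before italic, then stop at the value.
    r'(?:\s+\w+="[^"]*")*?\s+italic=")true(?=")'
)

def fix_italic_lookahead(text: str) -> str:
    # Group 1 holds everything up to the opening quote of the italic value.
    return PATTERN.sub(r'\g<1>false', text)

if __name__ == "__main__":
    print(fix_italic_lookahead('<_char font_name="/ITC Stone Serif Std Bold" italic="true" />'))
    print(fix_italic_lookahead('<_char font_size="88175" italic="true" font_name="/ITC Stone Serif Std Bold" />'))
```

Because the replacement is a single string with one backreference, this form should fit a search-and-replace tool that accepts only one regex, provided its engine supports lookahead.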
MizardX
A: 

Perl 5.10

Using new features of Perl 5.10.

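s(
 <_char \s* [^>]*? \K (?: (?<f>(?&font)) \s+ (?&italic) | (?&italic) \s+ (?<f>(?&font)) )
 (?(DEFINE)
  (?<font>font_name="/ITC[ ]Stone[ ]Serif[ ]Std[ ]Bold")
  (?<italic>italic="true")
 )
){
 # Captures made inside a (?&name) call are discarded when the call returns,
 # so the font is captured again in the extra group "f".
 $+{f} . ' italic="false"'
}xge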

Warning: not tested.

Brad Gilbert