I'd like to grep the following in BBEdit.

Find:

<dc:subject>Knowledge, Mashups, Politics, Reviews, Ratings, Ranking, Statistics</dc:subject>

Replace with:

<dc:subject>Knowledge</dc:subject>
<dc:subject>Mashups</dc:subject>
<dc:subject>Politics</dc:subject>
<dc:subject>Reviews</dc:subject>
<dc:subject>Ratings</dc:subject>
<dc:subject>Ranking</dc:subject>
<dc:subject>Statistics</dc:subject>

OR

Find:

<dc:subject>Social web, Email, Twitter</dc:subject>

Replace with:

<dc:subject>Social web</dc:subject>
<dc:subject>Email</dc:subject>
<dc:subject>Twitter</dc:subject>

Basically, when there's more than one category, I need to find each comma and space, add a line break, and wrap the opening and closing tags around each category.

Any thoughts?

A: 

Find:

(.+?),\s?

Replace:

\1\r

I'm not sure what you meant by “wrap the open/close around the category”, but if you mean that you want to wrap it in some sort of tag or link, just add it to the replace pattern.

Replace:

<a href="http://example.com/">\1</a>\r

Would give you

<a href="http://example.com/">Social web</a>
<a href="http://example.com/">Email</a>
<a href="http://example.com/">Twitter</a>

Or get fancier with Replace:

<a href="http://example.com/tag/\1/">\1</a>\r

Would give you

<a href="http://example.com/tag/Social web/">Social web</a>
<a href="http://example.com/tag/Email/">Email</a>
<a href="http://example.com/tag/Twitter/">Twitter</a>

In that last example you may have a problem with the “Social web” URL having a space in it. I wouldn't recommend that, but I wanted to show you that you could use the \1 backreference more than once.
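
If the space in the URL is a concern, one option outside BBEdit is to percent-encode the captured text before it goes into the link. The following is only a rough Python sketch of that idea, not something BBEdit's replace field does on its own. The function name `link_items` is made up for illustration, and it assumes every item is followed by a comma, as the pattern above requires.

```python
import re
from urllib.parse import quote


def link_items(text: str) -> str:
    """Turn each comma-terminated item into a tag link with an encoded URL."""

    def build_link(match: re.Match) -> str:
        label = match.group(1)
        # quote() percent-encodes the space: "Social web" becomes "Social%20web".
        # The label is reused twice, like the \1 backreference in the replace.
        return f'<a href="http://example.com/tag/{quote(label)}/">{label}</a>\n'

    # Same pattern as the Find field above.
    return re.sub(r"(.+?),\s?", build_link, text)


if __name__ == "__main__":
    print(link_items("Social web, Email, "))
```

The link text keeps the readable label while the URL gets the encoded form.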

The Grep reference in the BBEdit Manual is fantastic. Go to Help->User Manual and then Chapter 8. Learning how to use RegEx well will change your life.

UPDATE: Weird, when I first looked at this it didn't show me your full example. Based on what I see now, you should use:

Find:

(.+?),\s?

Replace:

<dc:subject>\1</dc:subject>\r
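
To try the pattern outside BBEdit, here is a rough Python sketch of the same find and replace. It assumes Python's `re` module treats this pattern the way BBEdit's grep does, and it uses `\n` where BBEdit uses `\r`. The function name `wrap_items` is just for illustration. One thing it makes visible: the pattern only matches an item that is followed by a comma, so the last item in a list comes through unwrapped.

```python
import re

# Same pattern as the Find field; \n stands in for BBEdit's \r line break.
PATTERN = re.compile(r"(.+?),\s?")
REPLACEMENT = r"<dc:subject>\1</dc:subject>" + "\n"


def wrap_items(text: str) -> str:
    """Apply the find/replace to every comma-terminated item in text."""
    return PATTERN.sub(REPLACEMENT, text)


if __name__ == "__main__":
    # "Twitter" has no trailing comma, so it is left as plain text.
    print(wrap_items("Social web, Email, Twitter"))
```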
crazyj
A: 

Thank you crazyj! Works great on a single line, but I'm trying to clean up an RSS file, so the lines are part of a long document and I need to find them there too.

I can't seem to initiate a find on:

<dc:subject>(.+?),\s?</dc:subject>

To replace it with:

<dc:subject>\1</dc:subject>\r

Do you know how I would do that?

EC
A: 

I don't use BBEdit, but in Vim you can do this:

%s/\(\_[^<]\+\)<\/dc:subject>/\=substitute(submatch(0), ",[ \t]*", "<\/dc:subject>\r<dc:subject>", "g")/g

It handles tags whose content spans line breaks. It also handles lines with multiple <dc:subject> elements, but won't always put a newline between one element's closing tag and the next one's opening tag.
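
For anyone without Vim, roughly the same idea (match the whole element, then rewrite the commas inside the match) can be sketched in Python. This is a regex-based sketch rather than a real XML parser, and it assumes the <dc:subject> elements carry no attributes or CDATA sections. The names `ELEMENT` and `split_subjects` are made up for illustration.

```python
import re

# DOTALL lets "." cross line breaks, so an element split over lines still matches.
ELEMENT = re.compile(r"<dc:subject>(.*?)</dc:subject>", re.DOTALL)


def split_subjects(xml_text: str) -> str:
    """Expand each comma-separated <dc:subject> into one element per category."""

    def expand(match: re.Match) -> str:
        # Split the element's content on commas and trim surrounding whitespace.
        items = [item.strip() for item in match.group(1).split(",")]
        return "\n".join(
            f"<dc:subject>{item}</dc:subject>" for item in items if item
        )

    # Only text inside <dc:subject> elements is touched; the rest passes through.
    return ELEMENT.sub(expand, xml_text)


if __name__ == "__main__":
    sample = "<dc:subject>Social web,\n Email, Twitter</dc:subject>\n"
    print(split_subjects(sample), end="")
```

Because the commas are only rewritten inside a matched element, commas elsewhere in the feed (titles, descriptions) are left alone.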

If you post this to the Google group vim_use and ask for a Vim solution and the corresponding Perl version of it, you would probably get a bunch of suggestions, including something that works in BBEdit and also outside any editor in Perl.

Don

Thanks for your help, Don. I'm not familiar with Vim but will look into it. I have a giant XML file on my desktop and am trying to just convert the above so I can import it into a new database.
EC
A: 

You can also use sed for this. In theory you just need to replace ", " with the closing and opening <dc:subject> tags with a newline character in between, and write the output to a new file. But sed doesn't seem to like the HTML angle brackets: I tried escaping them but still get error messages any time they're included. This is all I had time for so far, so if I get a chance to come back to it I will. Maybe someone else can solve the angle bracket issue:

sed s/, /</dc:subject>\n<dc:subject>/g file.txt > G:\newfile.txt

Ok I think I figured it out. Basically had to put the replacement text containing angle brackets in double quotes and change the separator character sed uses to something other than forward slash, as this is in the replacement text and sed didn't like it. I don't know much about grep but read that grep just matches things whereas sed will replace, so is better for this type of thing:

sed s%", "%"</dc:subject>\n<dc:subject>"%g file.txt > newfile.txt
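
One caveat worth checking before running this on a whole feed: the substitution applies to every ", " in the file, not only the ones inside <dc:subject> elements. The Python sketch below mimics the same global replacement on a made-up two-line sample (the <title> line is hypothetical) to show the effect.

```python
def replace_all_commas(text: str) -> str:
    """Mimic the sed command: replace every ", " wherever it appears."""
    return text.replace(", ", "</dc:subject>\n<dc:subject>")


# Hypothetical sample: one element we want to split and one we do not.
SAMPLE = (
    "<title>Hello, world</title>\n"
    "<dc:subject>Social web, Email</dc:subject>\n"
)

if __name__ == "__main__":
    # The comma inside <title> is rewritten too, which breaks that element.
    print(replace_all_commas(SAMPLE), end="")
```

If the file has commas in titles or descriptions, the replacement needs to be limited to lines that contain <dc:subject>.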
Ciaran Bruen
Thanks so much. I'm not familiar with sed at all, but will see if I can figure it out.
EC
A: 

You can't do this via normal grep, but you can add a "Unix Filter" to BBEdit that does the work for you:

#!/usr/bin/perl
use strict;
use warnings;

while (my $line = <>) {
    # Expand every <dc:subject> element on this line into one element per
    # comma-separated category; all other text is printed unchanged.
    $line =~ s{<dc:subject>\s*(.+?)\s*</dc:subject>}{
        join "\n", map { "<dc:subject>$_</dc:subject>" } split /\s*,\s*/, $1
    }ge;
    print $line;
}

To add this Unix filter to BBEdit, see the "Installation" section of this page: http://blog.elitecoderz.net/windows-zeichen-fur-mac-konvertieren-und-umgekehrt-filter-fur-bbeditconverting-windows-characters-to-mac-and-vice-versa-filter-for-bbedit/2009/01/

Erik