I'm looking for a way to do a find and replace across a large number of text files. For example, I want to replace:

<li><a href="">Istanbul, TR POS </a></li>
<li><a href="">Ankara, TR POS </a></li>
<li><a href="">Izmir, TR POS </a></li>

with:

<li><a href="pos-istanbul-tr.php">Istanbul, TR POS </a></li>
<li><a href="pos-ankara-tr.php">Ankara, TR POS </a></li>
<li><a href="pos-izmir-tr.php">Izmir, TR POS </a></li>

Notice that the city name from the label is lowercased and added as part of the link. This has to be done for a large number of text files, so I'm looking for the most efficient way, whether via regex or any software you think might help.

A: 

I think you're going to need programming to do this, as you want to manipulate the matching text.

Sounds very doable with awk, if you're on a platform that has it. Or you could whip something up in Python, Perl, or whatever you prefer. There will very likely be other answers with actual code.

unwind
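Following the suggestion to whip something up in Python, here is a minimal sketch. It assumes the files are UTF-8, that a city name never contains a comma, and that every label ends in `, TR POS `; the names `fill_href`, `process_text` and `process_file` are illustrative only.

```python
import re
import sys
from pathlib import Path

# Matches the empty-href list items from the question, e.g.
#   <li><a href="">Istanbul, TR POS </a></li>
# Assumption: the city name has no comma or "<" and the suffix is "TR POS".
PATTERN = re.compile(r'<li><a href="">([^,<]+), TR POS </a></li>')


def fill_href(match: re.Match) -> str:
    city = match.group(1)
    # Lowercase only the copy that goes into the link; the label keeps its case.
    return f'<li><a href="pos-{city.lower()}-tr.php">{city}, TR POS </a></li>'


def process_text(text: str) -> str:
    return PATTERN.sub(fill_href, text)


def process_file(path: Path) -> None:
    original = path.read_text(encoding="utf-8")
    updated = process_text(original)
    if updated != original:
        # Keep a copy of the original, similar to perl -i.bak.
        path.with_name(path.name + ".bak").write_text(original, encoding="utf-8")
        path.write_text(updated, encoding="utf-8")


if __name__ == "__main__":
    for name in sys.argv[1:]:
        process_file(Path(name))
```

Saved as, say, `fill_href.py`, it would be run as `python fill_href.py *.html`; a `.bak` copy is written only for files that actually change.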
A: 

If you're on Linux you might find this thread helpful:

You could use a command line tool like sed, a scripting language like Python/Perl, or any number of other solutions to do this. If you can give more details as to what you're looking for and what OS it needs to run on that would help in providing a more specific answer.

Jay
A: 

Use a text editor that supports regexps and "search in files", e.g. EditPlus.

Then replace

<li><a href="">([A-Za-z]+), TR POS <\/a><\/li>

with

<li><a href="pos-\1-tr.php">\1, TR POS </a></li>

(Might need some more escaping, i.e. backslashes...)

Stroboskop
You'd have to explain what the <\/a><\/li> commands do. As it stands, it looks dubious and unhelpful.
Jonathan Leffler
The OP asks that `\1` be lowercased in "pos-\1-tr.php".
J.F. Sebastian
@JL: most text editor implementations of regexp react badly to unescaped special characters, especially '/'. But I've seen so many different implementations that I've given up on finding a generic solution. You'll have to trust me that my version is a good first shot.
Stroboskop
@JFS: yeah, you're right, lowercasing is probably the part that text editor regexps fail on.
Stroboskop
... except for TextMate. Those Apple guys have everything.
Stroboskop
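To make the lowercasing problem from this thread concrete, here is a small Python check using the same find pattern. A plain back-reference copies the capture verbatim, and Python's `re` has no `\L...\E` case modifiers in replacement strings, so a replacement function is used instead (a sketch, not tied to any particular editor):

```python
import re

line = '<li><a href="">Istanbul, TR POS </a></li>'
pattern = r'<li><a href="">([A-Za-z]+), TR POS </a></li>'

# A plain back-reference copies the capture as-is, so "Istanbul" keeps its capital.
plain = re.sub(pattern, r'<li><a href="pos-\1-tr.php">\1, TR POS </a></li>', line)

# A function replacement can lowercase the link part and leave the label alone.
lowered = re.sub(
    pattern,
    lambda m: f'<li><a href="pos-{m.group(1).lower()}-tr.php">{m.group(1)}, TR POS </a></li>',
    line,
)

print(plain)    # <li><a href="pos-Istanbul-tr.php">Istanbul, TR POS </a></li>
print(lowered)  # <li><a href="pos-istanbul-tr.php">Istanbul, TR POS </a></li>
```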
A:
$ perl -i.bak \
>  -pe 's/href="">([^,]+)/q(href="pos-) . lc($1) . q(-tr.php">) . $1/eg' \
>  *.html

A cross-platform variant (building on @Jonathan Leffler's answer):

Save it to fill-href.pl:

#!/usr/bin/perl -w -pi.bak
s/href="">([^,]+)/href="pos-\L$1\E-tr.php">$1/g

Run:

perl fill-href.pl test1.html test2.html
J.F. Sebastian
You might want to match the whole block including closing tags, in case there are similar, yet slightly different lines not to be transformed.
Stroboskop
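Following the suggestion to match the whole item including the closing tags, here is a sketch of the same in-place edit in Python using the standard `fileinput` module, whose `inplace=True, backup=".bak"` mode behaves much like `perl -pi.bak`. It assumes one `<li>` item per line; `fix_line` is a hypothetical helper name.

```python
import fileinput
import re
import sys

# Match the complete list item, closing tags included, so that similar but
# different lines are left untouched. Assumption: one <li> item per line.
ITEM = re.compile(r'^(\s*)<li><a href="">([^,<]+), TR POS </a></li>$')


def fix_line(line: str) -> str:
    body = line.rstrip("\r\n")
    newline = line[len(body):]  # preserve the original line ending
    m = ITEM.match(body)
    if not m:
        return line
    indent, city = m.groups()
    return f'{indent}<li><a href="pos-{city.lower()}-tr.php">{city}, TR POS </a></li>{newline}'


def main(paths):
    # inplace=True redirects sys.stdout into the file being read;
    # backup=".bak" keeps a copy of each original file.
    with fileinput.input(files=paths, inplace=True, backup=".bak") as f:
        for line in f:
            sys.stdout.write(fix_line(line))


if __name__ == "__main__":
    # Guard: with an empty file list, fileinput would fall back to stdin.
    if len(sys.argv) > 1:
        main(sys.argv[1:])
```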
A:
perl -pi.bak -e 's%<li><a href="">(\w+), TR POS </a></li>%<li><a href="pos-\L$1\E-tr.php">$1, TR POS </a></li>%g;' file1 file2 ...

Untested, and probably oversimplified, but it should work on the sample data. The '-p' option makes Perl print each line after running the substitution on it; '-i.bak' overwrites the original file and keeps a backup with a '.bak' extension.

Jonathan Leffler
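In Perl and Python alike, `\w` matches a word character and `\W` matches a non-word character, so the capture group for the city name needs `\w+`. A quick Python check on the sample line:

```python
import re

line = '<li><a href="">Izmir, TR POS </a></li>'

# \W matches NON-word characters, so (\W+) cannot capture "Izmir".
non_word = re.search(r'<a href="">(\W+), TR POS', line)

# \w matches word characters (letters, digits, underscore).
word = re.search(r'<a href="">(\w+), TR POS', line)

print(non_word)       # None
print(word.group(1))  # Izmir
```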
A: 

If you happen to have access to a Perl-compatible regex (PCRE) engine, for example PHP's preg_replace(), or even Perl if you must ;-), you can replace this regex:

<a href="">([^,]+),\s+(\w+)\s+(\w+)

with this:

<a href="\L$3-$1-$2\E.php">$1, $2 $3

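In Perl, the \L and \E modifiers do the lower-casing for you. PHP's preg_replace() does not support them in the replacement string, so in PHP use preg_replace_callback() with strtolower() instead.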

$i = '<li><a href="">Izmir, TR POS </a></li>';
$r = '/<a href="">([^,]+),\\s+(\\w+)\\s+(\\w+)/';
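$o = preg_replace_callback($r, function ($m) {
    // preg_replace() has no \L...\E, so build and lowercase the href here.
    return '<a href="' . strtolower("$m[3]-$m[1]-$m[2]") . '.php">' . "$m[1], $m[2] $m[3]";
}, $i);
echo $o; // <li><a href="pos-izmir-tr.php">Izmir, TR POS </a></li>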

vi and Vim have a similar mechanism for changing the case of back-references in the replacement text.

Tomalak
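Python's `re` module has no `\L...\E` case modifiers in replacement strings either, but the same three-group reordering works with a replacement function. A minimal sketch using the regex from this answer; `reorder` is a hypothetical helper name:

```python
import re

PATTERN = re.compile(r'<a href="">([^,]+),\s+(\w+)\s+(\w+)')


def reorder(m: re.Match) -> str:
    city, country, kind = m.groups()
    # Same shape as \L$3-$1-$2\E: "POS-Izmir-TR" lowercased to "pos-izmir-tr".
    slug = f"{kind}-{city}-{country}".lower()
    return f'<a href="{slug}.php">{city}, {country} {kind}'


line = '<li><a href="">Izmir, TR POS </a></li>'
print(PATTERN.sub(reorder, line))
# <li><a href="pos-izmir-tr.php">Izmir, TR POS </a></li>
```

The trailing ` </a></li>` is not part of the match, so it passes through unchanged.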
A:

Using TextMate's regex engine, this is what you need:

find: <li><a href="">([A-Za-z]+), TR POS <\/a><\/li>

replace: <li><a href="pos-\L$1\E-tr.php">$1, TR POS </a></li>

Note that the first $1 is preceded by the \L modifier, which lowercases it; the second $1 has no modifier. I tested it and it does exactly what you need.

JorgeO