I have a relatively large CSV/text data file (33 MB) in which I need to globally search for and replace the delimiting character. (The reason is that there doesn't seem to be a way to get SQL Server to escape/handle double quotes in the data during a table export, but that's another story...)

I successfully accomplished a Textmate search and replace on a smaller file, but it's choking on this larger file.

It seems like command-line grep may be the answer, but I can't quite grasp the syntax, à la:

grep -rl OLDSTRING . | xargs perl -pi~ -e 's/OLDSTRING/NEWSTRING/'

So in my case I'm searching for the '^' (caret) character and replacing with '"' (double-quote).

grep -rl " grep_test.txt | xargs perl -pi~ -e 's/"/^'

That doesn't work, and I'm assuming it has to do with the escaping of the double quote or something, but I'm pretty lost. Can anyone help?

(I suppose if anyone knows how to get SQL Server 2005 to handle double quotes in a text column during export to CSV, that would really solve the core issue.)

+3  A: 

Your Perl substitution is wrong: it is missing the closing delimiter, and the search and replacement characters are swapped. Try:

grep -rl '\^' . | xargs perl -pi~ -e 's/\^/"/g'

Explanation:

grep : command to find matches
-r : search recursively
-l : print only the names of files where a match is found
'\^' : the caret to search for; it is escaped because it's a regex metacharacter (the start anchor) and single-quoted so the shell leaves the backslash alone
. : search in the current working directory
perl : used here to do the in-place replacement
-i~ : do the replacement in place and create a backup file with the extension ~
-p : print each line after the replacement
-e : one-line program
\^ : the caret is escaped in the substitution too, for the same reason
codaddict
That both worked and helped explain it clearly. Thank you very much!
Robert Pierce
@codaddict. Oh, ok I didn't have enough 'points' to do that before. Thanks.
Robert Pierce
+1  A: 
sed -i.bak 's/\^/"/g' mylargefile.csv

Update: you can also use Perl as rein has suggested

perl -i.bak -pe 's/\^/"/g' mylargefile.csv
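
Before trusting an in-place edit on a large export, it may be worth counting the carets before and after. A minimal sketch, assuming a hypothetical file named sample.csv standing in for mylargefile.csv:

```shell
# Hypothetical sample standing in for the real export.
tmp=$(mktemp -d)
printf 'id^name^city\n1^Ann^Oslo\n' > "$tmp/sample.csv"

# Count caret characters: delete everything except carets, then count bytes.
before=$(tr -cd '^' < "$tmp/sample.csv" | wc -c | tr -d ' ')

# In-place replacement; the original is kept as sample.csv.bak.
sed -i.bak 's/\^/"/g' "$tmp/sample.csv"

after=$(tr -cd '^' < "$tmp/sample.csv" | wc -c | tr -d ' ')
echo "carets before: $before, after: $after"
```

If the count after the edit is not zero, or the .bak file is missing, something went wrong and the backup is the place to start over from.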

On big files, though, sed may run noticeably faster than Perl. In my test on a file of about 6 million lines, sed took roughly 14 seconds and Perl roughly 24:

$ tail -4 file
this is a line with ^
this is a line with ^
this is a line with ^

$ wc -l<file
6136650

$ time sed 's/\^/"/g' file  >/dev/null

real    0m14.210s
user    0m12.986s
sys     0m0.323s
$ time perl  -pe 's/\^/"/g' file >/dev/null

real    0m23.993s
user    0m22.608s
sys     0m0.630s
$ time sed 's/\^/"/g' file  >/dev/null

real    0m13.598s
user    0m12.680s
sys     0m0.362s

$ time perl  -pe 's/\^/"/g' file >/dev/null

real    0m23.690s
user    0m22.502s
sys     0m0.393s
ghostdog74
Thanks for the help. I've never used sed, but if it's that concise it must be worth looking at. :)
Robert Pierce
perl -i.bak -pe 's/\^/"/g' mylargefile.csv isn't all that much longer ...
reinierpost