I have a set of 10 CSV files, which normally have entries of this kind:

a,b,c,d
d,e,f,g

Now, due to some error, entries in these files have become like this:

a,b,c,d
d,e,f,g
,,,
h,i,j,k

Now I want to remove the line with only commas in all the files. These files are on a Linux filesystem.

Is there a command you would recommend that can replace the erroneous lines in all the files?

+2  A: 
sed 's/,,,/replacement/' < old-file.csv > new-file.csv

optionally followed by mv new-file.csv old-file.csv

wnoise
O.M.G! Kickin' it old skool! It makes me feel ooolllldddd. :-)
Peter Rowell
doesn't remove the line... see David's for better use of sed
orip
It asked for replacement, not removal when I answered.
wnoise
@Peter, it's not old school, it's classic. @orip, it's a fine use of sed.
jskulski
In wnoise's defense, it did say replace at first; see the edit history. And this sed usage is portable across platforms; notations using '-i' are specific to GNU sed (and hence valid for the question which is about files on Linux).
Jonathan Leffler
+1  A: 

Replace or remove, your post is not clear... For replacement see wnoise's answer. For removing, you could use

awk '$0 !~ /^,,,$/ {print}' <old-file.csv > new-file.csv
Keltia
+5  A: 

It depends on what you mean by replace. If you mean 'remove', then a trivial variant on @wnoise's solution is:

grep -v '^,,,$' old-file.csv > new-file.csv

Note that this deletes just those lines with exactly three commas. If you want to delete malformed lines with any number of commas (including zero, which also removes blank lines) and no other characters on the line, then:

grep -v '^,*$' ...

There are endless other variations on the regex that would deal with other scenarios. Dealing with full CSV data with commas inside quotes starts to need something other than a regex machine. It can be done, within broad limits, especially in more complex regex systems such as PCRE or Perl. But it requires more work.

Check out Mastering Regular Expressions.
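
Since the question asks about all the files, here is a minimal sketch of wrapping the grep approach in a loop. The function name clean_csv and the temporary-file naming are assumptions for illustration, not part of the answer above. It uses the '^,*$' pattern, so blank lines are dropped too.

```shell
#!/bin/sh
# Hypothetical helper: strip comma-only (and empty) lines from each file
# named on the command line, rewriting the file in place via a temp file.
clean_csv() {
    for f in "$@"; do
        tmp="$f.tmp.$$"
        # grep exits with status 1 when it selects no lines; that is not
        # an error here, so only a status of 2 or more counts as failure.
        if grep -v '^,*$' "$f" > "$tmp" || [ $? -eq 1 ]; then
            mv "$tmp" "$f"
        else
            rm -f "$tmp"
            echo "clean_csv: failed on $f" >&2
        fi
    done
}
```

It would be called as clean_csv *.csv from the directory holding the files. Trying it on copies first is a sensible precaution, since the originals are overwritten.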

Jonathan Leffler
+1  A: 

Do you want to replace them with something, or delete them entirely? Either way, it can be done with sed. To delete:

sed -i -e '/^,\+$/ D' yourfile1.csv yourfile2.csv ...

To replace: well, see wnoise's answer, or if you don't want to create new files with the output,

sed -i -e '/^,\+$/ s//replacement/' yourfile1.csv yourfile2.csv ...

or

sed -i -e '/^,\+$/ c\
replacement' yourfile1.csv yourfile2.csv ...

(that should be entered exactly as is, including the line break). Of course, you can also do this with awk or perl or, if you're only deleting lines, even grep:

egrep -v '^,+$' < oldfile.csv > newfile.csv

I tested these to make sure they work, but I'd advise you to do the same before using them (just in case). You can omit the -i option from sed, in which case it'll print out the results (rather than writing them back to the file), or omit the output redirection >newfile.csv from grep.

EDIT: It was pointed out in a comment that some features of these sed commands only work on GNU sed. As far as I can tell, these are the -i option (which can be replaced with shell redirection, sed ... <infile >outfile ) and the \+ modifier (which can be replaced with \{1,\} ).
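
Putting the two portable substitutions together, a sketch of the delete case that avoids GNU extensions might look like this. The function name strip_comma_lines and the temporary-file naming are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical helper: delete comma-only lines using only POSIX sed features.
strip_comma_lines() {
    for f in "$@"; do
        tmp="$f.tmp.$$"
        # ',\{1,\}' is the POSIX basic-regex spelling of GNU's ',\+';
        # shell redirection plus mv stands in for the GNU-only -i option.
        if sed -e '/^,\{1,\}$/d' < "$f" > "$tmp"; then
            mv "$tmp" "$f"
        else
            rm -f "$tmp"
            echo "strip_comma_lines: failed on $f" >&2
        fi
    done
}
```

Usage would be strip_comma_lines yourfile1.csv yourfile2.csv. Note that mv replaces the original file with a new one, so ownership, permissions, and hard links may differ from the original.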

David Zaslavsky
nice - 'sed -i' rocks
orip
Some of your 'sed' options are not portable (GNU sed specific). Not a major problem as long as you're aware of that.
Jonathan Leffler
@Johnathan: true, I only ever use GNU sed and I tend to forget about its extensions unless I'm actually staring at the info page. Thanks.
David Zaslavsky
@Dahvid :D The question did say "Linux file system" - your answer is valid given that constraint.
Jonathan Leffler
+1  A: 

What about keeping only the lines that match the desired format, instead of handling one exception?

If the provided input is what you really want to match:

grep -E '^[a-z],[a-z],[a-z],[a-z]$' < oldfile.csv > newfile.csv

If the input is different, post it; the regular expression should not be too hard to write.
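
As one possible generalization of the keep-only-valid-lines idea, the sketch below accepts any line made of exactly four non-empty fields. It assumes fields never contain commas or quotes, which may not hold for real CSV data. The function name keep_four_fields is made up for the example.

```shell
#!/bin/sh
# Hypothetical helper: read lines on standard input and print only those
# that consist of exactly four non-empty, comma-separated fields.
keep_four_fields() {
    # One non-empty field, then exactly three more ",field" groups,
    # anchored at both ends so partial matches are rejected.
    grep -E '^[^,]+(,[^,]+){3}$'
}
```

It would be used as keep_four_fields < oldfile.csv > newfile.csv. Unlike the blacklist approaches, this also drops lines with too few fields or with an empty field in the middle, so it is worth checking what gets discarded.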

MatthieuP
+1  A: 

Most simply:

$   grep -v ,,,, oldfile > newfile   
$   mv newfile oldfile
Brendan Dowling
Only 3 commas in pattern to be removed. :D
Jonathan Leffler
A: 

Yes, awk or grep are very good options if you are working on a Linux platform. However, you can use Perl regexes on other platforms, using the join and split functions.

Why split and join? Yes, you can certainly use perl. But the basic loop would be using a regex to match or not match the lines to be printed - I don't see the join/split operation. Even a replacement instead of a delete probably wouldn't use join or split.
Jonathan Leffler
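
For reference, the basic Perl loop described in that comment can be sketched as a one-liner with no split or join involved. The wrapper name drop_comma_lines is an assumption for the example, and it presumes perl is installed.

```shell
#!/bin/sh
# Hypothetical helper: copy standard input to standard output,
# skipping lines that contain nothing but one or more commas.
drop_comma_lines() {
    # -n wraps the code in a loop over input lines; -e supplies the code.
    perl -ne 'print unless /^,+$/'
}
```

Usage would be drop_comma_lines < oldfile.csv > newfile.csv.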