views:

383

answers:

8

I have csv files with the following format:

CSV FILE
"a"             , "b"     , "c" , "d"
hello, world    , 1       , 2   , 3
1,2,3,4,5,6,7   , 2       , 456 , 87
h,1231232,3     , 3       , 45  , 44

The problem is that the first field has commas "," in it. I have no control over file generation, as that's the format I receive them in. Is there a way to read a CSV file backwards, from the end of line to the beginning?

I don't mind writing a little python script to do so, if I’m guided in the right direction.

+1  A: 

You could always do something with regex's, like (perl regex)

#!/usr/bin/perl

use IO::File;

if (my $file = new IO::File("test.csv"))
{
    foreach my $line (<$file>) {
    $line =~ m/^(.*),(.*?),(.*?),(.*?)$/;
    print "[$1][$2][$3][$4]\n";
    }
} else {
    print "Unable to open test.csv\n";
}

(The first is a greedy search, the last 3 are not) Edit: posted full code instead of just the regex

cyberconte
http://xkcd.com/208/
Tom Ritter
+1  A: 

Reverse the string first and then process it.

tmp = tmp[::-1]

klabranche
+13  A: 

The rsplit string method splits a string starting from the right instead of the left, and so it's probably what you're looking for (it takes an argument specifying the max number of times to split):

line = "hello, world    , 1       , 2   , 3"
parts = line.rsplit(",", 3)
print parts  # prints ['hello, world    ', ' 1       ', ' 2   ', ' 3']

If you want to strip the whitespace from the beginning and end of each item in your splitted list, then you can just use the strip method with a list comprehension

parts = [s.strip() for s in parts]
print parts  # prints ['hello, world', '1', '2', '3']
Eli Courtwright
that worked great, thanks a bundle
dassouki
+1  A: 

From the sample You have provided, it looks like "columns" are fixed size. First (the one with commas) is 16 characters long, so why don't You try reading the file line by line, then for each line reading the first 16 characters (as a value of first column), and the rest accordingly? After You have each value, You can go and parse it further (trim whitespaces, and so on...).

mkolodziejski
I just formatted it, for your viewing pleasure
dassouki
+1  A: 

That's not then a CSV file, comma separated means just that.

How can you be certain that is not:

CSV FILE
"a"             , "b"     , "c" , "d"
hello           , world   , 1   , 2   , 3
1               , 2       , 3   , 4   , 5,6,7,2,456,87
h               , 1231232 , 3   , 3   , 45,44

If the file is as you indicate then the first group should be surrounded by quotes, looks as though the field names are so odd that fields containing commas are not.

I'm not a fan of fixing errors away from their source, I'd push back to the data generator to deliver proper CSV if that's what they are claiming it is.

Lazarus
Good point, but i'm sure about the format as i'm 100% as the last three columns are type int. We recieve the files as is. We have no control over the generation. Hell, I'm not even sure what they use to generate them
dassouki
One of my favourite quotes is "I love IT standards... there are so many to choose from!" What's missing is that even after one of the many standards is chosen there are a million ways to cock it up. Glad to see you found a solution, I quite liked the Regex solution too but it would be expensive processor-wise (old Perl programmer with a soft spot for regex). Good luck with your processing!
Lazarus
A: 

If you always expect the same number of columns, and only the first column can contain commas, just read anything and concatenate excess columns at the beginning.

The problem is that the interface is ambiguous, and you can try to circumvent this, but the better solution is to try to get the interface fixed (which is often harder than creating several patches...).

Gamecat
A: 

I agree with mr beer. That is a badly formed csv file. Your best bet is to find other delimiters or stop overloading the commas or quote/escape the non field separating commas

Tim
+3  A: 

I don't fully understand why you want to read each line in reverse, but you could do this:

import csv
file = open("mycsvfile.csv")
reversedLines = [line[::-1] for line in file]
file.close()
reader = csv.reader(reversedLines)
for backwardRow in reader:
    lastField = backwardRow[0][::-1]
    secondField = backwardRow[1][::-1]
Greg
That worked great :D thanks
dassouki