ansaurus

Question

Answer 1

+1 A:

You could always do something with regexes, for example (Perl regex):

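#!/usr/bin/perl

use strict;
use warnings;
use IO::File;

if (my $file = IO::File->new("test.csv", "r"))
{
    while (my $line = <$file>) {
        # Skip lines that do not match so stale captures are never printed.
        next unless $line =~ m/^(.*),(.*?),(.*?),(.*?)$/;
        print "[$1][$2][$3][$4]\n";
    }
} else {
    print "Unable to open test.csv\n";
}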

(The first group is greedy, so it absorbs any extra commas; the last three are not.) Edit: posted the full code instead of just the regex.
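Since the accepted answers in this thread are in Python, here is a rough Python equivalent of the same pattern, as a sketch only. The function name `split_with_regex` is made up for illustration, and it assumes every valid line has at least three commas.

```python
import re

# Same pattern as the Perl version: greedy first group, three lazy groups.
PATTERN = re.compile(r"^(.*),(.*?),(.*?),(.*?)$")

def split_with_regex(line):
    """Return the four fields of a line, or None if the line does not match."""
    match = PATTERN.match(line.rstrip("\n"))
    if match is None:
        return None
    # Whitespace trimming is an addition; the Perl version prints fields as-is.
    return [group.strip() for group in match.groups()]

print(split_with_regex("hello, world    , 1       , 2   , 3"))
```

Because the first group is greedy, the comma inside "hello, world" stays in the first field and only the last three commas act as separators.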

cyberconte 2009-08-13 14:30:56

http://xkcd.com/208/

Tom Ritter 2009-08-13 14:32:25

Answer 2

+1 A:

Reverse the string first and then process it.

tmp = tmp[::-1]
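One way the "process it" step might look, as a sketch: split the reversed string a limited number of times, then undo the reversal on both the list and each piece. The helper name `split_from_right` and the default of three trailing fields are assumptions, not part of the answer above.

```python
def split_from_right(line, count=3):
    """Split off the last `count` comma-separated fields by working on the reversed line."""
    # After reversing, the trailing fields come first, so a limited split isolates them.
    reversed_parts = line[::-1].split(",", count)
    # The pieces are in reverse order and each piece is itself reversed; undo both.
    return [part[::-1].strip() for part in reversed(reversed_parts)]

print(split_from_right("hello, world    , 1       , 2   , 3"))
```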

klabranche 2009-08-13 14:31:05

Answer 3

+13 A:

The rsplit string method splits a string starting from the right instead of the left, and so it's probably what you're looking for (it takes an argument specifying the max number of times to split):

line = "hello, world    , 1       , 2   , 3"
parts = line.rsplit(",", 3)
print(parts)  # prints ['hello, world    ', ' 1       ', ' 2   ', ' 3']

If you want to strip the whitespace from the beginning and end of each item in the resulting list, you can use the strip method in a list comprehension:

parts = [s.strip() for s in parts]
print(parts)  # prints ['hello, world', '1', '2', '3']
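Applied to a whole file, the same idea might look like the sketch below. The function name `parse_rows` is hypothetical, and converting the trailing columns with `int` assumes those columns are always integers; drop that conversion if they are not.

```python
def parse_rows(lines, trailing=3):
    """Parse lines whose first field may contain commas, followed by `trailing` integer fields."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        parts = [s.strip() for s in line.rsplit(",", trailing)]
        # Assumption: every trailing column holds an integer.
        rows.append([parts[0]] + [int(p) for p in parts[1:]])
    return rows

sample = [
    "hello, world    , 1       , 2   , 3\n",
    "h               , 3       , 3   , 45\n",
]
print(parse_rows(sample))
```

Any iterable of strings works as input, so an open file object can be passed in place of `sample`.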

Eli Courtwright 2009-08-13 14:32:41

that worked great, thanks a bundle

dassouki 2009-08-13 16:23:31

Answer 4

+1 A:

From the sample you have provided, it looks like the "columns" are fixed size. The first one (the one with commas) is 16 characters long, so why don't you try reading the file line by line, then for each line reading the first 16 characters (as the value of the first column), and the rest accordingly? Once you have each value, you can parse it further (trim whitespace, and so on...).
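A minimal sketch of that approach, valid only if the first column really is 16 characters wide in the raw file. The width and the helper name `parse_fixed` are assumptions for illustration.

```python
FIRST_WIDTH = 16  # assumed width of the first column, taken from the sample

def parse_fixed(line, first_width=FIRST_WIDTH):
    """Take the first column by position, then split the remainder on commas."""
    first = line[:first_width].strip()
    rest = line[first_width:].strip()
    # The remainder begins with the comma that separated it from the first column.
    if rest.startswith(","):
        rest = rest[1:]
    return [first] + [field.strip() for field in rest.split(",")]

print(parse_fixed("hello, world    , 1       , 2   , 3"))
```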

mkolodziejski 2009-08-13 14:33:15

I just formatted it, for your viewing pleasure

dassouki 2009-08-13 14:35:28

Answer 5

+1 A:

Then that's not a CSV file; comma-separated means just that.

How can you be certain that it is not:

CSV FILE
"a"             , "b"     , "c" , "d"
hello           , world   , 1   , 2   , 3
1               , 2       , 3   , 4   , 5,6,7,2,456,87
h               , 1231232 , 3   , 3   , 45,44

If the file is as you indicate, then the first field should be surrounded by quotes. It looks as though the field names are quoted, so it is odd that fields containing commas are not.

I'm not a fan of fixing errors away from their source. I'd push back to the data generator to deliver proper CSV, if that's what they are claiming it is.
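For comparison, if the producer did quote the first field, Python's standard `csv` module would read the line without any special handling. The quoted sample line below is invented to illustrate the point; it is not from the actual file.

```python
import csv

# Hypothetical "proper CSV" version of the sample line, with the first field quoted.
quoted_line = '"hello, world", 1, 2, 3'

# skipinitialspace=True drops the spaces that follow each delimiter.
row = next(csv.reader([quoted_line], skipinitialspace=True))
print(row)
```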

Lazarus 2009-08-13 14:35:21

Good point, but I'm 100% sure about the format, as the last three columns are of type int. We receive the files as is. We have no control over the generation. Hell, I'm not even sure what they use to generate them.

dassouki 2009-08-13 14:38:46

One of my favourite quotes is "I love IT standards... there are so many to choose from!" What's missing is that even after one of the many standards is chosen, there are a million ways to cock it up. Glad to see you found a solution. I quite liked the regex solution too, but it would be expensive processor-wise (old Perl programmer with a soft spot for regex). Good luck with your processing!

Lazarus 2009-08-14 08:43:29

Answer 6

A:

If you always expect the same number of columns, and only the first column can contain commas, just split on every comma and join the excess columns at the beginning back together.

The problem is that the interface is ambiguous, and you can try to circumvent this, but the better solution is to try to get the interface fixed (which is often harder than creating several patches...).
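The workaround of concatenating excess columns could be sketched like this. The function name `merge_leading_fields` and the default of four expected columns are assumptions for illustration.

```python
def merge_leading_fields(line, expected=4):
    """Split on every comma, then rejoin the surplus leading pieces into the first column."""
    fields = line.split(",")
    extra = len(fields) - expected
    if extra < 0:
        raise ValueError("line has fewer than %d columns: %r" % (expected, line))
    # The first `extra + 1` pieces all belong to the first column.
    first = ",".join(fields[:extra + 1])
    return [first.strip()] + [f.strip() for f in fields[extra + 1:]]

print(merge_leading_fields("hello, world    , 1       , 2   , 3"))
```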

Gamecat 2009-08-13 14:36:01

Answer 7

A:

I agree with mr beer. That is a badly formed CSV file. Your best bet is to find other delimiters, stop overloading the commas, or quote/escape the commas that are not field separators.

Tim 2009-08-13 14:36:03

Answer 8

+3 A:

I don't fully understand why you want to read each line in reverse, but you could do this:

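import csv

# Strip the newline before reversing; otherwise it ends up at the start of the line.
with open("mycsvfile.csv") as f:
    reversedLines = [line.rstrip("\n")[::-1] for line in f]

reader = csv.reader(reversedLines)
for backwardRow in reader:
    # Fields arrive in reverse order, and the text of each field is reversed too.
    lastField = backwardRow[0][::-1]
    secondField = backwardRow[1][::-1]  # second field counting from the end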

Greg 2009-08-13 14:36:26

That worked great :D thanks

dassouki 2009-08-13 16:24:03


parsing CSV files backwards
