I'm working on a Python script that will upload a bunch of files to an FTP site. To check whether a file has changed, I'm comparing file sizes. The problem is that the files I'm uploading have \r\n line endings, but transferring via FTP (ASCII mode to a Linux box) converts them to \n line endings. Each line loses one byte in the process, so the local and remote sizes no longer match and I can't compare them any more.

I'm not sure of the best way to proceed here. Convert from \r\n to \n on the fly when checking file sizes? Upload everything in binary mode? Stop comparing file sizes?
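For the first option, here is a minimal sketch of computing the size a file would have after the conversion, without rewriting it. It assumes the ASCII-mode transfer turns every \r\n pair into a single \n and changes nothing else; the function name `size_after_ascii_transfer` is made up for illustration.

```python
def size_after_ascii_transfer(path):
    """Return the size in bytes the file would have once each CRLF becomes LF.

    Assumption: the transfer replaces every b"\r\n" with b"\n" and leaves
    all other bytes untouched.
    """
    with open(path, "rb") as f:
        data = f.read()
    # Each CRLF pair shrinks by exactly one byte.
    return len(data) - data.count(b"\r\n")
```

The result could then be compared against the size the server reports, though that still would not detect an edit that leaves the size unchanged.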

+1  A: 

I would not use file size to decide whether the file has changed. Since it is ASCII text, the file could have changed and still have exactly the same number of bytes.
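A small illustration of that point, using two made-up config lines that differ in content but not in length:

```python
import hashlib

# Hypothetical before/after contents of the same file.
old = b"timeout = 30\r\n"
new = b"timeout = 45\r\n"

# The sizes are identical, so a size check would miss the edit.
same_size = len(old) == len(new)

# A content hash does tell the two versions apart.
same_content = hashlib.sha256(old).digest() == hashlib.sha256(new).digest()
```

Here `same_size` is true while `same_content` is false, which is exactly the case a size comparison cannot catch.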

Jansen Price
+1  A: 

Using file sizes is a bad idea unless the files can only grow when they change (typically not the case, unless they are log files or something similar).

One option is to keep track of a checksum for each file (md5sum is typically what is used) in a master checksum file, which could be uploaded to the FTP server as well. If a file's checksum matches the entry in the master checksum file, nothing has changed. Otherwise, upload the changed file and update its entry.
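A rough sketch of that approach in Python, not a definitive implementation. It assumes the master checksum file is a local JSON file mapping path to hex digest; the names `file_digest` and `changed_files` and the JSON layout are made up for illustration. The algorithm is a parameter, so "md5" can be passed instead of the "sha256" default used here.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, read in binary mode chunk by chunk."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def changed_files(paths, manifest_path, algorithm="sha256"):
    """Return the paths whose digest differs from the recorded one.

    The manifest (hypothetical JSON layout: {path: hex digest}) is
    rewritten with the new digests before returning.
    """
    manifest_file = Path(manifest_path)
    manifest = json.loads(manifest_file.read_text()) if manifest_file.exists() else {}
    changed = []
    for path in paths:
        key = str(path)
        digest = file_digest(path, algorithm)
        if manifest.get(key) != digest:
            changed.append(key)
            manifest[key] = digest
    manifest_file.write_text(json.dumps(manifest, indent=2))
    return changed
```

Because the digests are taken from the local files only, the \r\n to \n conversion on the server never enters the comparison. In a real script you would probably record a file's new digest only after its upload succeeds, so that a failed transfer is retried on the next run.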

jstedfast
+1 You have the right idea, though these days, using MD5 is definitely not recommended (nor SHA-1 anymore, for that matter). Use at least SHA-256.
Chris Jester-Young
I'd argue that it depends. As a simple checksum, CRC is debatably sufficient or too weak, while MD5 and SHA-1 are definitely good enough. For a cryptographically secure hash, then yes, be prepared to use at least a SHA-2 family hash now and upgrade it as time goes on.
ephemient