I'm trying to understand how to read files in Python. This is what I've done, and it isn't working properly:

import os.path

filename = "A 180 mb large file.data"
size = os.path.getsize(filename)

f = open(filename, "r")
contents = f.read()
f.close()

print "The real filesize is", size
print "The read filesize is", len(contents)

f = open(filename, "r")

size = 0

while True:
    contents = f.read(4)
    if not contents: break
    size += len(contents)

f.close()

print "this time it's", size

Outputs:

The real filesize is 183574528
The read filesize is 10322
this time it's 13440

Does anybody know what's going on here? :)

A:

If your file's contents confuse the C libraries, these results are expected.

The OS reports the file as 180 MB.

However, there are probably null bytes scattered through the file, which can confuse the C stdio library.

Try opening the file with "rb" and see if you get different results.
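A minimal sketch of that check, written for Python 3 (the question's code is Python 2). It compares the size the OS reports with the number of bytes actually read in binary mode. The file contents are made up for the demo, and `compare_sizes` is a hypothetical helper name.

```python
import os
import tempfile

def compare_sizes(path):
    """Return (size reported by the OS, number of bytes read in binary mode)."""
    on_disk = os.path.getsize(path)
    # "rb": no newline translation and no decoding, so read() returns raw bytes.
    with open(path, "rb") as f:
        data = f.read()
    return on_disk, len(data)

# Hypothetical stand-in for the question's data file: null bytes,
# CR/LF pairs and Ctrl-Z (0x1A) bytes, plus every byte value once.
sample = b"\x00\x01\r\n\x1a" * 1000 + bytes(range(256))
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

result = compare_sizes(path)
os.remove(path)
print(result)  # → (5256, 5256)
```

In binary mode the two numbers should agree, because nothing is translated or dropped on the way in.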

S.Lott
A:

The first number is the file size in bytes; the other two times, you read the file as text and count characters. Change every open(filename, "r") to open(filename, "rb") and it works.
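Here is the question's chunked loop redone in binary mode as a Python 3 sketch. `count_bytes` is a hypothetical helper, and an in-memory `io.BytesIO` with made-up contents stands in for `open(filename, "rb")`.

```python
import io

def count_bytes(f, chunk_size=4096):
    """Count the bytes in a binary file object by reading fixed-size chunks."""
    total = 0
    while True:
        chunk = f.read(chunk_size)
        if not chunk:  # read() returns b"" only at end of file
            break
        total += len(chunk)  # len() of a bytes object is its size in bytes
    return total

# In-memory stand-in for open(filename, "rb"); the contents are made up.
data = b"\x00" * 10000 + b"\r\n\x1a"
print(count_bytes(io.BytesIO(data)))                # → 10003
print(count_bytes(io.BytesIO(data), chunk_size=4))  # → 10003
```

With a real file, `with open(filename, "rb") as f: count_bytes(f)` should match `os.path.getsize(filename)`, whatever chunk size you pick.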

THC4k
So how do you check the size of a string in bytes? That is what f.read returns, isn't it, a string? Say I'd like to send a large file from one computer to another, so it has to be sent piece by piece. First, the sending computer sends the file size, so the receiver knows what to expect; then it starts sending the file. The receiver has to count how many bytes it has received, so it knows when the whole file has arrived. How would you check that?
quano
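One way to approach the transfer question, sketched in Python 3: a file opened with "rb" yields `bytes`, and `len()` of a `bytes` object is its size in bytes (in Python 2, the `str` returned from an "rb" file is likewise a byte string). The framing here, an 8-byte big-endian length header followed by the data, is an assumption for illustration rather than any standard protocol, and `send_file`, `read_exactly` and `recv_file` are hypothetical names. In-memory streams stand in for the network.

```python
import io
import struct

# Assumed framing: an 8-byte unsigned size in network byte order, then the data.
HEADER = struct.Struct("!Q")

def send_file(src, dst, chunk_size=65536):
    """Send a size header followed by the contents of the binary stream src."""
    size = src.seek(0, io.SEEK_END)  # seek() returns the new position = total size
    src.seek(0)
    dst.write(HEADER.pack(size))
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
    dst.flush()
    return size

def read_exactly(src, n):
    """Read exactly n bytes, raising EOFError if the stream ends first."""
    buf = bytearray()
    while len(buf) < n:
        chunk = src.read(n - len(buf))
        if not chunk:
            raise EOFError("got %d of %d bytes" % (len(buf), n))
        buf += chunk
    return bytes(buf)

def recv_file(src, dst, chunk_size=65536):
    """Read the size header, then copy exactly that many bytes to dst."""
    (size,) = HEADER.unpack(read_exactly(src, HEADER.size))
    received = 0
    while received < size:
        chunk = src.read(min(chunk_size, size - received))
        if not chunk:
            raise EOFError("got %d of %d bytes" % (received, size))
        dst.write(chunk)
        received += len(chunk)  # count bytes, not characters
    return received

# Demo over an in-memory "wire"; the payload is made up.
payload = bytes(range(256)) * 100
wire = io.BytesIO()
send_file(io.BytesIO(payload), wire)
wire.seek(0)
out = io.BytesIO()
print(recv_file(wire, out), out.getvalue() == payload)  # → 25600 True
```

Over a real connection, the `src`/`dst` streams could come from `sock.makefile("rb")` and `sock.makefile("wb")`; the receiver stops once it has counted as many bytes as the header announced.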
A: 

This is not about strings: Python is perfectly happy with null bytes in strings.

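This happens because you are on Windows and you open the file in text mode. When reading in text mode, the C runtime converts every "\r\n" into "\n" and treats a Ctrl-Z byte (0x1A) as end of file, which mangles your binary data and can end the read early.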

Open the file in binary mode instead: open(filename, "rb").
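A small Python 3 sketch of the difference between the two modes, using made-up file contents. Note that Python 3 does its own newline translation in its io layer on every platform (with the default `newline=None`), so this reproduces the "\r\n" to "\n" part of the problem but not the Windows C runtime's Ctrl-Z handling that the Python 2 file object inherits. The `latin-1` encoding is chosen only so that every byte decodes to exactly one character.

```python
import os
import tempfile

# Made-up binary content with CR/LF pairs, a null byte and a Ctrl-Z (0x1A) byte.
raw = b"line one\r\nline two\r\n\x00\x1a tail"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

with open(path, "rb") as f:
    as_bytes = f.read()  # exact bytes, no translation
# latin-1 maps each byte to one character, so only newline handling differs;
# newline=None (the default) turns each "\r\n" into "\n" on every platform.
with open(path, "r", encoding="latin-1") as f:
    as_text = f.read()
os.remove(path)

print(len(raw), len(as_bytes), len(as_text))  # → 27 27 25
```

The binary read matches the file byte for byte, while the text read is two characters shorter because each "\r\n" pair became a single "\n".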

peufeu