Hi all,

I am having a problem with files that contain a large amount of data, and I need to skip some processing for those files. I read the contents of each file into a variable. Now I need to get the size of that variable in bytes and, if it is greater than 102400, print a message.

Update: I cannot open the files directly, since they are inside a tar file. The contents are already being copied into a variable called 'data', and I am able to print the contents of that variable. I just need to check whether it holds more than 102400 bytes.

Thanks

A: 

This answer seems irrelevant, since I seem to have misunderstood the question, which has now been clarified. However, should someone find this question, while searching with pretty much the same terms, this answer may still be relevant:

Just open the file in binary mode

f = open(filename, 'rb')

then read or skip ahead to the position you want and print the next byte(s). I once used the same method to 'fix' the n-th byte in a zillion images.
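A minimal sketch of that approach, written for Python 3. The helper name read_bytes_at and the throwaway demo file are made up for illustration; only open(), seek() and read() come from the standard library.

```python
import os
import tempfile

def read_bytes_at(path, offset, count=1):
    """Return up to `count` bytes starting at byte `offset` of the file at `path`.

    Hypothetical helper, not part of any library.
    """
    with open(path, 'rb') as f:
        f.seek(offset)        # jump ahead without reading the preceding bytes
        return f.read(count)  # may return fewer bytes near the end of the file

# Demonstration on a throwaway file holding ten known bytes.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'abcdefghij')
    demo_path = tmp.name

third_byte = read_bytes_at(demo_path, 2)  # offsets are zero-based
print(third_byte)
os.remove(demo_path)
```

Because seek() moves the file position directly, the bytes before the offset are never loaded into memory.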

Confusion
+6  A: 
import os
length_in_bytes = os.stat('file.txt').st_size
if length_in_bytes > 102400:
    print("It's a big file!")

Update to work on files in a tarfile

import tarfile
tf = tarfile.open('foo.tar')
# getmembers() is a method of the opened archive, not of the tarfile module
for member in tf.getmembers():
    if member.size > 102400:
        print("It's a big file in a tarfile - the file is called %s!" % member.name)
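Going one step further, here is a rough sketch (Python 3) of reading only the members that are small enough, so that large members are never loaded into a variable at all. The function read_small_members, the MAX_BYTES constant and the in-memory demo archive are assumptions for illustration; the tarfile calls themselves (open, getmembers, extractfile, TarInfo, addfile) are standard library.

```python
import io
import tarfile

MAX_BYTES = 102400  # threshold taken from the question

def read_small_members(tar, limit=MAX_BYTES):
    """Return {name: data} for regular files of at most `limit` bytes.

    Hypothetical helper: larger members are reported and skipped unread.
    """
    result = {}
    for member in tar.getmembers():
        if not member.isfile():
            continue  # ignore directories, links and other special entries
        if member.size > limit:
            print("Skipping %s (%d bytes)" % (member.name, member.size))
            continue
        result[member.name] = tar.extractfile(member).read()
    return result

def build_demo_tar():
    """Build a small tar archive in memory so the example needs no real file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode='w') as tar:
        for name, payload in (('small.txt', b'x' * 10), ('big.bin', b'y' * 200000)):
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf

with tarfile.open(fileobj=build_demo_tar(), mode='r') as tar:
    contents = read_small_members(tar)
```

The size comes from the tar header, so the check costs nothing even for very large members.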
Robert Christie
I just updated the question. Thanks a ton.
randeepsp
@randeepsp I've updated the answer to show an example that works with tarfiles.
Robert Christie
This is better than checking len(data) because it entirely skips reading the data when it's big.
Beni Cherniavsky-Paskin
+2  A: 

If I'm understanding the question correctly, you want to skip certain input files if they're too large. For that, you can use os.path.getsize():

import os.path
if os.path.getsize('f') <= 102400:
    doit()
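As a sketch of how this could be applied to several files at once (Python 3): the helper small_enough, the LIMIT constant and the temporary demo files are invented for the example; os.path.getsize() is the only call doing the real work.

```python
import os
import shutil
import tempfile

LIMIT = 102400  # threshold taken from the question

def small_enough(path, limit=LIMIT):
    """Hypothetical helper: True if the file at `path` is at most `limit` bytes."""
    return os.path.getsize(path) <= limit

# Create one small and one oversized file in a temporary directory.
tmp_dir = tempfile.mkdtemp()
small_path = os.path.join(tmp_dir, 'small.dat')
big_path = os.path.join(tmp_dir, 'big.dat')
with open(small_path, 'wb') as f:
    f.write(b'a' * 100)
with open(big_path, 'wb') as f:
    f.write(b'b' * (LIMIT + 1))

# Keep only the paths that pass the size check.
to_process = [p for p in (small_path, big_path) if small_enough(p)]
print(to_process)

shutil.rmtree(tmp_dir)  # clean up the demo files
```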
mrkj
+2  A: 

Just check the length of the string, then:

if len(data) > 102400:
    print("Skipping file which is too large, at %d bytes" % len(data))
else:
    process(data)  # The normal processing
unwind
+1  A: 

len(data) gives you the size in bytes if data is a byte string (binary data). For a text (Unicode) string, len() counts characters instead, and the number of bytes depends on the encoding used.
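A small illustration of the difference; the sample text is arbitrary and chosen only because it contains two non-ASCII characters.

```python
text = u'na\u00efve caf\u00e9'  # 10 characters, two of them non-ASCII

utf8_bytes = text.encode('utf-8')       # non-ASCII characters take 2 bytes each here
utf16_bytes = text.encode('utf-16-le')  # every character here takes 2 bytes

print(len(text))         # 10 (characters)
print(len(utf8_bytes))   # 12 (bytes)
print(len(utf16_bytes))  # 20 (bytes)
```

So if data is text rather than raw bytes, encode it first (or measure it before decoding) when the limit is meant in bytes.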

THC4k
I'm a novice, so I got confused by all the code. Thanks.
randeepsp