I am using the output streams from the io module and writing to files. I want to be able to detect when I have written 1G of data to a file and then start writing to a second file. I can't seem to figure out how to determine how much data I have written to the file.

Is there something easy built in to io? Or might I have to count the bytes before each write manually?

+3  A: 

See the tell() method on the stream object.
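
For illustration, here is a minimal sketch of how tell() could drive the rotation. It assumes Python 3 and files opened in binary mode, so that tell() reports a plain byte offset. The function name `write_with_rotation`, the `name_pattern` parameter and the numbering scheme are made up for this example; they are not part of io.

```python
import io
import os
import tempfile


def write_with_rotation(chunks, name_pattern, max_bytes):
    """Write byte chunks to numbered files, starting a new file when
    tell() shows the next chunk would push the current file past max_bytes.

    name_pattern is a hypothetical format string such as 'out.{0}.bin'.
    Returns the list of file names that were created.
    """
    index = 1
    names = [name_pattern.format(index)]
    f = io.open(names[-1], 'wb')
    try:
        for chunk in chunks:
            # In binary mode tell() is the byte offset, i.e. bytes written so far.
            if f.tell() > 0 and f.tell() + len(chunk) > max_bytes:
                f.close()
                index += 1
                names.append(name_pattern.format(index))
                f = io.open(names[-1], 'wb')
            f.write(chunk)
    finally:
        f.close()
    return names


if __name__ == '__main__':
    pattern = os.path.join(tempfile.mkdtemp(), 'out.{0}.bin')
    created = write_with_rotation([b'a' * 40] * 5, pattern, max_bytes=100)
    print([os.path.getsize(n) for n in created])
```

For the 1G case from the question, max_bytes would be something like 10**9. A single chunk larger than max_bytes still ends up in one oversized file, since this sketch never splits a chunk.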

janneb
A: 

I recommend counting. There's no internal language counter that I'm aware of. Somebody else mentioned using tell(), but an internal counter will take roughly the same amount of work and eliminate the constant OS calls.

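# sketch in Python; out, data and rotate_file() are placeholders
if written + len(data) > 10**9:   # 1G limit
    rotate_file()                 # close the current file, open the next one
    written = 0
out.write(data)
written += len(data)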
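
A fuller sketch of the counting approach, assuming Python 3 and byte strings (so that len(data) really is the number of bytes written). The class name `CountingRotator` and the 'out.{0}.bin' naming scheme are hypothetical, chosen only for this example.

```python
import os
import tempfile


class CountingRotator(object):
    """Hypothetical helper: counts bytes itself instead of asking the OS."""

    def __init__(self, name_pattern, max_bytes):
        self.name_pattern = name_pattern  # e.g. 'out.{0}.bin' (assumed scheme)
        self.max_bytes = max_bytes
        self.index = 1
        self.written = 0                  # bytes written to the current file
        self.names = [name_pattern.format(self.index)]
        self._file = open(self.names[-1], 'wb')

    def write(self, data):
        # Rotate first if this write would push the file past the limit.
        if self.written > 0 and self.written + len(data) > self.max_bytes:
            self._rotate()
        self._file.write(data)
        self.written += len(data)

    def _rotate(self):
        self._file.close()
        self.index += 1
        self.written = 0
        self.names.append(self.name_pattern.format(self.index))
        self._file = open(self.names[-1], 'wb')

    def close(self):
        self._file.close()


if __name__ == '__main__':
    pattern = os.path.join(tempfile.mkdtemp(), 'out.{0}.bin')
    out = CountingRotator(pattern, max_bytes=100)
    for _ in range(5):
        out.write(b'x' * 30)
    out.close()
    print([os.path.getsize(n) for n in out.names])
```

Because the check happens before the write, no file goes over the limit unless a single write is itself larger than max_bytes.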
Autocracy
Except that if you're judicious with `tell()` and allow for some margin of error, it's a lot less overhead than counting.
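
One possible reading of this suggestion, sketched under assumptions: call tell() only once every so many writes and accept that a file may overshoot the limit a little. The function name and the `check_every` parameter are invented for the example (Python 3, binary mode).

```python
import os
import tempfile


def write_checking_periodically(chunks, name_pattern, max_bytes, check_every=100):
    """Write byte chunks, calling tell() only once every check_every writes.

    A file can grow past max_bytes by up to roughly check_every chunks before
    the next check notices; that is the margin of error being traded for
    fewer tell() calls. If the limit is hit on the very last chunk, an empty
    trailing file is left behind.
    """
    index = 1
    names = [name_pattern.format(index)]
    f = open(names[-1], 'wb')
    try:
        for count, chunk in enumerate(chunks, 1):
            f.write(chunk)
            if count % check_every == 0 and f.tell() >= max_bytes:
                f.close()
                index += 1
                names.append(name_pattern.format(index))
                f = open(names[-1], 'wb')
    finally:
        f.close()
    return names


if __name__ == '__main__':
    pattern = os.path.join(tempfile.mkdtemp(), 'out.{0}.bin')
    created = write_checking_periodically([b'a' * 10] * 25, pattern,
                                          max_bytes=95, check_every=5)
    print([os.path.getsize(n) for n in created])
```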
Nick Bastin
I can't imagine how that could be less overhead. Adding to an integer and comparing it to a maximum value are both single-instruction operations, or close to it if your type is a bit large. tell() is walking down a system call tree. Also, whenever you tell(), you're checking something that has already been written. You can code for that without too much trouble, but...
Autocracy
+6  A: 

See the Python documentation for File Objects, specifically tell().

Example:

>>> f=open('test.txt','w')
>>> f.write(10*'a')
>>> f.tell()
10L
>>> f.write(100*'a')
>>> f.tell()
110L
Mark Tolonen
+1  A: 

If you are using this file for logging, I suggest using the RotatingFileHandler in the logging module, like this:

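import logging
import logging.handlers  # the handlers submodule is not loaded by "import logging" alone

file_name = 'test.log'

test_logger = logging.getLogger('Test')
# backupCount must be non-zero, otherwise rollover never occurs; 5 is an arbitrary choice
handler = logging.handlers.RotatingFileHandler(file_name, maxBytes=10**9, backupCount=5)
test_logger.addHandler(handler)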

N.B.: you can also use this method for output that isn't logging, if you like doing hacks :)
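
A small runnable sketch of that idea, assuming Python 3 and that a message-only formatter is acceptable, so each call writes just the text plus the newline the handler appends. The helper name `make_rotating_writer`, the logger name and the tiny size limit are illustrative only.

```python
import logging
import logging.handlers
import os
import tempfile


def make_rotating_writer(path, max_bytes, backups):
    """Return (logger, handler); each logger.info(msg) appends msg plus a newline."""
    logger = logging.getLogger('rotating-writer-demo')  # arbitrary name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records out of the root logger's handlers
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backups)
    handler.setFormatter(logging.Formatter('%(message)s'))  # message only, no prefix
    logger.addHandler(handler)
    return logger, handler


if __name__ == '__main__':
    path = os.path.join(tempfile.mkdtemp(), 'data.log')
    logger, handler = make_rotating_writer(path, max_bytes=100, backups=3)
    for i in range(20):
        logger.info('record %02d: %s', i, 'x' * 10)
    logger.removeHandler(handler)
    handler.close()
    print(sorted(os.listdir(os.path.dirname(path))))
```

Note that the handler keeps at most backupCount old files (data.log.1, data.log.2, ...) and deletes anything older, which may or may not be what you want for data files.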

singularity
A: 

One fairly straightforward approach is to subclass the builtin file class and have it keep track of the amount of output written to the file. Below is some sample code showing how that might be done, and it appears to mostly work.

I say mostly because, while testing it, the files produced were sometimes slightly over the maximum size. That's because the test file was opened in text mode, and on Windows every '\n' linefeed character is converted into a '\r\n' (carriage-return, linefeed) pair, which throws the size accumulator off. Also, as currently written, the bufsize argument that the standard file() and open() functions accept is not supported, so the system's default size and mode get used.

Depending on exactly what you're doing, the size issue may or may not be a big problem -- for large maximum sizes the count might be thrown off significantly. If anyone has a good platform-independent fix for this, by all means let it be known.

import os.path
verbose = False

class LtdSizeFile(file):
    ''' A file subclass which limits size of file written to approximately "maxsize" bytes '''
    def __init__(self, filename, mode='wt', maxsize=None):
        self.root, self.ext = os.path.splitext(filename)
        self.num = 1
        self.size = 0
        if maxsize is not None and maxsize < 1:
            raise ValueError('"maxsize" argument should be a positive number')
        self.maxsize = maxsize
        file.__init__(self, self._getfilename(), mode)
        if verbose: print 'file "%s" opened' % self._getfilename()

    def close(self):
        file.close(self)
        self.size = 0
        if verbose: print 'file "%s" closed' % self._getfilename()

    def write(self, text):
        lentext = len(text)
        if self.maxsize is None or self.size + lentext <= self.maxsize:
            file.write(self, text)
            self.size += lentext
        else:
            # limit would be exceeded: close the current file and open the next one
            self.close()   # also resets self.size to 0
            self.num += 1  # advance exactly once so file numbers stay consecutive
            file.__init__(self, self._getfilename(), self.mode)
            if verbose: print 'file "%s" opened' % self._getfilename()
            file.write(self, text)
            self.size += lentext

    def writelines(self, lines):
        for line in lines:
            self.write(line)

    def _getfilename(self):
        return '{0}{1}{2}'.format(self.root, self.num if self.num > 1 else '', self.ext)

if __name__=='__main__':
    import random
    import string

    def randomword():
        letters = []
        for i in range(random.randrange(2,7)):
            letters.append(random.choice(string.lowercase))
        return ''.join(letters)

    def randomsentence():
        words = []
        for i in range(random.randrange(2,10)):
            words.append(randomword())
        words[0] = words[0].capitalize()
        words[-1] = ''.join([words[-1], '.\n'])
        return ' '.join(words)

    lsfile = LtdSizeFile('LtdSizeTest.txt', 'wt', 100)
    for i in range(100):
        sentence = randomsentence()
        if verbose: print '  writing: {!r}'.format(sentence)
        lsfile.write(sentence)

    lsfile.close()
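
One way the text-mode size drift described above might be avoided, sketched for Python 3 (where the builtin file class no longer exists, so this wraps a file object instead of subclassing): encode the text yourself and write in binary mode, so the counted bytes are exactly the bytes that reach the disk. The class name `ByteCountingFile` and the `encoding` parameter are assumptions made for this example.

```python
import os
import tempfile


class ByteCountingFile(object):
    """Hypothetical Python 3 variant: writes encoded bytes in binary mode so
    the counted size matches what lands on disk on every platform."""

    def __init__(self, filename, maxsize, encoding='utf-8'):
        self.root, self.ext = os.path.splitext(filename)
        self.maxsize = maxsize
        self.encoding = encoding
        self.num = 1
        self.size = 0
        self._file = open(self._getfilename(), 'wb')

    def _getfilename(self):
        # Same naming scheme as above: name.ext, name2.ext, name3.ext, ...
        return '{0}{1}{2}'.format(self.root, self.num if self.num > 1 else '', self.ext)

    def write(self, text):
        data = text.encode(self.encoding)  # 'wb' does no '\n' -> '\r\n' translation
        if self.size > 0 and self.size + len(data) > self.maxsize:
            self._file.close()
            self.num += 1
            self.size = 0
            self._file = open(self._getfilename(), 'wb')
        self._file.write(data)
        self.size += len(data)

    def close(self):
        self._file.close()


if __name__ == '__main__':
    lsfile = ByteCountingFile(os.path.join(tempfile.mkdtemp(), 'demo.txt'), 50)
    for i in range(12):
        lsfile.write('abcdefghi\n')
    lsfile.close()
```

The trade-off is that the files get bare '\n' line endings on Windows too, which may or may not matter for your use.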
martineau