Hi!

I have Python code that reads many files, but some of the files are extremely large, which causes errors in other parts of my code. I want a way to check the character count of each file so that I can avoid reading the extremely large ones. Thanks.

+4  A: 
os.path.getsize(path) 

Return the size, in bytes, of path. Raise os.error if the file does not exist or is inaccessible.
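A minimal sketch of how that can be used to skip oversized files before reading them. The 10 MB limit and the function name read_if_small are hypothetical choices for illustration, not part of the os module.

```python
import os

MAX_BYTES = 10 * 1024 * 1024  # hypothetical limit: skip anything over 10 MB


def read_if_small(path, max_bytes=MAX_BYTES):
    """Return the file's bytes, or None if it is too large or cannot be stat'ed."""
    try:
        size = os.path.getsize(path)  # only a stat() call, the file is not opened
    except OSError:  # os.error is an alias of OSError
        return None
    if size > max_bytes:
        return None
    with open(path, 'rb') as f:
        return f.read()
```

In a loop over many paths, a result of None simply means "skip this one" and move on to the next file.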

S.Mark
+4  A: 
os.stat(filepath).st_size

Assuming by ‘characters’ you mean bytes. ETA:

I need the total character count, just like what the command 'wc filename' gives me on Unix

In which mode? wc on its own will give you a line, word and byte count (the byte count is the same as stat's), not a count of Unicode characters.
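For comparison, a rough Python sketch of what plain wc reports (newlines, whitespace-separated words, bytes), reading in binary mode so no decoding happens. The function name wc_counts is hypothetical, and the word rule is an approximation of wc's, not a byte-exact reimplementation.

```python
def wc_counts(path):
    """Return (lines, words, bytes), roughly as `wc path` would report them."""
    lines = words = nbytes = 0
    with open(path, 'rb') as f:
        # iterate line by line instead of calling f.read() on the whole file
        for line in f:
            lines += line.count(b'\n')   # wc counts newline characters
            words += len(line.split())   # bytes.split() splits on runs of ASCII whitespace
            nbytes += len(line)
    return lines, words, nbytes
```

The byte total here matches os.path.getsize for the same file; only the line and word counts need the file to be read.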

There is a switch -m which will use the locale's current encoding to convert bytes to Unicode and then count code-points: is that really what you want? It doesn't make any sense to decode into Unicode if all you are looking for is too-long files. If you really must:

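import sys, codecs

def getUnicodeFileLength(filepath, charset=None):
    # Default to the filesystem encoding, an approximation of the locale encoding wc -m uses
    if charset is None:
        charset = sys.getfilesystemencoding()
    readerclass = codecs.getreader(charset)
    nchar = 0
    # Decode in chunks; undecodable bytes become U+FFFD instead of raising an error
    with readerclass(open(filepath, 'rb'), 'replace') as reader:
        while True:
            chars = reader.read(1024*32)  # arbitrary chunk size
            if not chars:  # empty string means end of file
                break
            nchar += len(chars)
    return nchar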

sys.getfilesystemencoding() usually reflects the locale encoding on Unix, which approximates what wc -m does (locale.getpreferredencoding() is a closer match). If you know the encoding yourself (e.g. 'utf-8'), pass that in instead.

I don't think you want to do this.

bobince
hi bob, I need the total character count, just like what the command 'wc filename' gives me on Unix
randeepsp
@randeepsp: Update your question with additional information. Do not add this kind of important information in comments.
S.Lott
+4  A: 

Try

import os
os.path.getsize(filePath)

to get the size of your file, in bytes.

Sapph
+8  A: 

If you want the Unicode character count for a text file in a specific encoding, you will have to read the entire file to get it.

However, if you want the byte count for a given file, you want os.path.getsize(), which should only need to do a stat on the file as long as your OS has stat() or an equivalent call (all Unixes and Windows do).
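To make the difference concrete, a small sketch that writes a UTF-8 file and compares the two numbers: the byte count comes from a stat call alone, while the character count requires reading and decoding the whole file. The helper name compare_counts and the 'utf-8' encoding are assumptions for illustration.

```python
import os
import tempfile


def compare_counts(path, encoding='utf-8'):
    """Return (bytes, characters) for a text file in the given encoding."""
    nbytes = os.path.getsize(path)  # metadata only, nothing is read
    with open(path, encoding=encoding) as f:
        nchars = len(f.read())  # every byte must be decoded to count characters
    return nbytes, nchars


if __name__ == '__main__':
    with tempfile.NamedTemporaryFile('w', encoding='utf-8', suffix='.txt', delete=False) as tmp:
        tmp.write('na\u00efve caf\u00e9')  # 10 characters, two of them non-ASCII
    print(compare_counts(tmp.name))  # (12, 10)
    os.remove(tmp.name)
```

Note that f.read() pulls the whole file into memory, which is exactly what you want to avoid for very large files; the chunked reader in the earlier answer avoids that.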

Mike
Because of variable-length encodings such as UTF-8, characters can take a varying number of bytes.
S.Lott
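A quick illustration of the point in that comment: in UTF-8 a single character takes anywhere from one to four bytes, so a file's byte size is only an upper bound on its character count.

```python
# Sample characters, written as escapes: 'a', e-acute, euro sign, grinning-face emoji
samples = ['a', '\u00e9', '\u20ac', '\U0001F600']

# Bytes each character needs when encoded as UTF-8
widths = [len(ch.encode('utf-8')) for ch in samples]
print(widths)  # [1, 2, 3, 4]

text = ''.join(samples)
print(len(text), len(text.encode('utf-8')))  # 4 10
```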
+1  A: 

An alternative way, using the size of an already-open file:

import os

with open("file") as f:
    size = os.fstat(f.fileno()).st_size  # size in bytes of the open file
ghostdog74