os.stat(filepath).st_size
Assuming by ‘characters’ you mean bytes. ETA:
i need total character count just like what the command 'wc filename' gives me unix
In which mode? wc on its own will give you a line, word and byte count (the byte count being the same as stat), not Unicode characters.
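To make that concrete, here is a minimal sketch of roughly what plain wc reports, written in Python. The name wc_counts is made up for illustration, and splitting words on ASCII whitespace only approximates wc's locale-aware behaviour. The point is that the third number is a byte count and agrees with st_size.

```python
import os

def wc_counts(filepath):
    """Return (lines, words, bytes), roughly as plain `wc` would report them."""
    lines = words = nbytes = 0
    with open(filepath, 'rb') as f:   # binary mode: nothing is decoded
        for line in f:
            lines += line.count(b'\n')  # wc counts newline bytes, not "rows"
            words += len(line.split())  # runs of non-whitespace bytes
            nbytes += len(line)
    return lines, words, nbytes
```

For any file, wc_counts(filepath)[2] should equal os.stat(filepath).st_size, so if the byte count is all you need, stat is the cheaper way to get it.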
There is a switch, -m, which will use the locale's current encoding to convert bytes to Unicode and then count code points. Is that really what you want? It doesn't make any sense to decode into Unicode if all you are looking for is too-long files. If you really must:
import sys, codecs

def getUnicodeFileLength(filepath, charset=None):
    # Default to the encoding Python derives from the environment
    if charset is None:
        charset = sys.getfilesystemencoding()
    readerclass = codecs.getreader(charset)
    # 'replace' turns undecodable bytes into U+FFFD instead of raising
    reader = readerclass(open(filepath, 'rb'), 'replace')
    nchar = 0
    while True:
        chars = reader.read(1024*32)  # arbitrary chunk size
        if chars == '':
            break
        nchar += len(chars)
    reader.close()
    return nchar
On Unix, sys.getfilesystemencoding() normally follows the locale encoding, which approximates what wc -m does. If you know the encoding yourself (e.g. 'utf-8') then pass that in instead.
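As a rough illustration of why the two numbers differ, the sketch below writes a short UTF-8 file and compares its byte size with its code-point count. It uses the built-in open() with an explicit encoding rather than the codecs reader above. The helper name count_codepoints and the sample text are made up for the example.

```python
import os
import tempfile

def count_codepoints(filepath, encoding='utf-8'):
    """Count decoded code points, reading in chunks to keep memory use flat."""
    nchar = 0
    # newline='' disables newline translation, so '\r\n' still counts as two
    with open(filepath, 'r', encoding=encoding, errors='replace', newline='') as f:
        while True:
            chunk = f.read(32 * 1024)  # arbitrary chunk size
            if not chunk:
                break
            nchar += len(chunk)
    return nchar

# Hypothetical sample: 'é' takes two bytes in UTF-8 but is one code point.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write('café\n'.encode('utf-8'))
    sample_path = tmp.name

byte_count = os.stat(sample_path).st_size    # 6
char_count = count_codepoints(sample_path)   # 5
os.remove(sample_path)
```

The byte count comes from file metadata alone, while the code-point count needs the whole file read and decoded, which is the extra cost being warned about here.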
I don't think you want to do this.