ansaurus

Question

How to check the character count of a file in Python

Answer 1

+4 A:

os.path.getsize(path)

Return the size, in bytes, of path. Raise os.error if the file does not exist or is inaccessible.
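As a minimal sketch of how this call might be wrapped, assuming Python 3 (where os.error is an alias of OSError); the helper name file_size_in_bytes is hypothetical and not part of the os module:

```python
import os

def file_size_in_bytes(path):
    """Return the size of `path` in bytes, or None if it cannot be stat'ed.

    Hypothetical helper for illustration only.
    """
    try:
        return os.path.getsize(path)
    except OSError:  # os.error is an alias of OSError
        return None
```

Returning None on failure is only one possible choice; letting the exception propagate is equally reasonable if the caller wants to know why the lookup failed.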

S.Mark 2010-01-06 05:03:18

Answer 2

+4 A:

os.stat(filepath).st_size

Assuming by ‘characters’ you mean bytes. ETA:

i need total character count just like what the command 'wc filename' gives me on unix

In which mode? wc on its own will give you a line, word and byte count (same as stat), not Unicode characters.
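For illustration, the three numbers that plain wc prints can be approximated as below. This is a rough sketch, not a reimplementation of wc: the function name wc_counts is hypothetical, and words are split on ASCII whitespace only, whereas the real wc follows the locale.

```python
def wc_counts(path):
    """Return (lines, words, bytes), roughly as plain `wc` reports them.

    Hypothetical helper for illustration only.
    """
    lines = words = nbytes = 0
    with open(path, 'rb') as f:
        for line in f:                    # binary mode splits on b'\n'
            lines += line.count(b'\n')    # wc counts newline characters
            words += len(line.split())    # split on ASCII whitespace
            nbytes += len(line)
    return lines, words, nbytes
```

The byte count here should agree with os.stat(filepath).st_size for a regular file.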

There is a switch -m which will use the locale's current encoding to convert bytes to Unicode and then count code-points: is that really what you want? It doesn't make any sense to decode into Unicode if all you are looking for is too-long files. If you really must:

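import sys, codecs

def getUnicodeFileLength(filepath, charset=None):
    # Fall back to the filesystem encoding when no charset is given.
    if charset is None:
        charset = sys.getfilesystemencoding()
    readerclass = codecs.getreader(charset)  # the function name is all lowercase
    # 'replace' substitutes U+FFFD for undecodable bytes instead of raising.
    reader = readerclass(open(filepath, 'rb'), 'replace')
    nchar = 0
    while True:
        chars = reader.read(1024*32)  # arbitrary chunk size
        if chars == '':  # empty string means end of file
            break
        nchar += len(chars)
    reader.close()  # also closes the underlying file
    return nchar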

On most Unix systems, sys.getfilesystemencoding() matches the locale encoding, approximating what wc -m does. If you know the encoding yourself (e.g. 'utf-8'), pass that in instead.
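On Python 3 the same idea can be sketched with the built-in open(), which decodes for you. The function name count_characters and its parameters are hypothetical; locale.getpreferredencoding(False) is used here as an assumption about what "the locale's encoding" should mean, and the result may differ from wc -m on files containing invalid byte sequences.

```python
import locale

def count_characters(path, encoding=None, chunk_size=32 * 1024):
    """Count decoded code points in a file, reading it in chunks.

    Hypothetical helper for illustration only.
    """
    if encoding is None:
        # Assumption: use the locale's preferred encoding as the default.
        encoding = locale.getpreferredencoding(False)
    total = 0
    # newline='' turns off newline translation, so '\r\n' counts as two.
    with open(path, 'r', encoding=encoding, errors='replace', newline='') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty string means end of file
                break
            total += len(chunk)
    return total
```

Note that the whole file still has to be read and decoded, so this is far slower than a stat call on large files.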

I don't think you want to do this.

bobince 2010-01-06 05:03:19

Hi bob, I need the total character count, just like what the command 'wc filename' gives me on Unix.

randeepsp 2010-01-06 08:33:15

@randeepsp: Update your question with additional information. Do not add this kind of important information in comments.

S.Lott 2010-01-06 11:14:33

Answer 3

+4 A:

Try

import os
os.path.getsize(filePath)

to get the size of your file, in bytes.

Sapph 2010-01-06 05:05:04

Answer 4

+8 A:

If you want the Unicode character count for a text file in a specific encoding, you will have to read in the entire file to do that.

However, if you want the byte count for a given file, you want os.path.getsize(), which should only need to do a stat on the file as long as your OS has stat() or an equivalent call (all Unixes and Windows do).

Mike 2010-01-06 05:05:18

Because of the UTF encoding schemes, a single character can take a varying number of bytes.
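A small illustration of that point, assuming UTF-8 specifically (the helper name utf8_width is hypothetical):

```python
def utf8_width(ch):
    """Return how many bytes one character occupies when encoded as UTF-8."""
    return len(ch.encode('utf-8'))

# Characters from different ranges need different numbers of bytes.
for ch in ['a', '\u00e9', '\u20ac', '\U0001d11e']:
    print(repr(ch), utf8_width(ch))
```

This is why a byte count from stat can only match the character count when every character in the file happens to encode to a single byte.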

S.Lott 2010-01-06 11:15:20

Answer 5

+1 A:

An alternative way:

import os

f = open("file")
size = os.fstat(f.fileno()).st_size  # size in bytes
f.close()

ghostdog74 2010-01-06 05:33:00
