Hello, I am building an inverted index using Hadoop and Python. How can I include the byte offset of a line or word in Python? I need output like this:

hello hello.txt@1124

I need the locations for making a full inverted index. Please help.

+2  A: 

Like this?

file.tell()

Return the file’s current position, like stdio's ftell().

http://docs.python.org/library/stdtypes.html#file-objects

Unfortunately, tell() does not work here because the OP is reading from stdin rather than a regular file. It is not hard, though, to build a wrapper around the stream that tracks the position for you.

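class file_with_pos(object):
    """Wrap a file-like object and count how much data has been read."""
    def __init__(self, fp):
        self.fp = fp
        self.pos = 0  # amount of data read so far
    def read(self, *args):
        data = self.fp.read(*args)
        self.pos += len(data)
        return data
    def tell(self):
        # Works even when the underlying stream (e.g. a pipe) cannot seek.
        return self.pos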

Then you can use this instead:

import sys

fp = file_with_pos(sys.stdin)
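
For line-oriented input, such as a streaming mapper reading stdin, the same idea can be extended with a readline() method. What follows is a minimal sketch and not part of the original answer. It assumes Python 3 and a binary stream (for example sys.stdin.buffer), so that len() counts bytes rather than decoded characters. The names PositionTrackingReader and line_offsets are made up for illustration.

```python
import io


class PositionTrackingReader:
    """Wrap a binary stream and count the bytes consumed so far."""

    def __init__(self, fp):
        self.fp = fp
        self.pos = 0  # bytes read so far

    def readline(self):
        line = self.fp.readline()
        self.pos += len(line)  # includes the trailing newline, if any
        return line

    def tell(self):
        return self.pos


def line_offsets(stream):
    """Return a list of (start_offset, line_bytes), one entry per line."""
    reader = PositionTrackingReader(stream)
    result = []
    while True:
        start = reader.tell()  # offset of the line about to be read
        line = reader.readline()
        if not line:  # empty bytes object means end of input
            break
        result.append((start, line))
    return result


# io.BytesIO stands in for sys.stdin.buffer in this demonstration.
demo = io.BytesIO(b"hello world\nhello again\n")
offsets = line_offsets(demo)
```

Calling tell() before each readline() gives the offset at which that line starts, which is the value an index entry would need.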
Wai Yip Tung
I am reading input from sys.stdin, and file.tell() doesn't seem to work with it.
Siddharth Sharma
Added wrapper class in the answer.
Wai Yip Tung
Thank you for your response; I will try it out. For now, however, I have implemented a counter variable to keep track of the position. It works well, since I only need the relative location within a file.
Siddharth Sharma
@Siddharth: the code suggested by Wai seems to do exactly what your “present” code does. Unless you post your own code and mark it as *the* answer, please mark Wai's answer as the chosen answer.
ΤΖΩΤΖΙΟΥ
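
The counter-variable approach mentioned in the comments was never posted, so the following is only a guess at what it might look like. It is a sketch assuming Python 3 and a binary input stream, and it treats any run of non-whitespace bytes as a word. The function name postings and the filename argument are hypothetical; how the mapper learns the file name is left out. The offsets are relative to the start of the data the function reads.

```python
import io
import re

# A "word" here is any run of non-whitespace bytes (an assumption).
WORD = re.compile(rb"\S+")


def postings(stream, filename):
    """Yield 'word<TAB>filename@offset' for each word in a binary stream."""
    offset = 0  # running counter: bytes consumed before the current line
    for line in stream:
        for match in WORD.finditer(line):
            word = match.group().decode("utf-8", errors="replace")
            # Byte offset of the word = start of line + position in line.
            yield "%s\t%s@%d" % (word, filename, offset + match.start())
        offset += len(line)


# io.BytesIO stands in for sys.stdin.buffer in this demonstration.
demo = io.BytesIO(b"hello world\nhello again\n")
result = list(postings(demo, "hello.txt"))
```

Each yielded string has the word as the key and a file@offset location as the value, matching the format asked for in the question.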