Hi!

Is there a pythonic way of reading - say - mixed integer and char input without reading the whole input at once and without worrying about linebreaks? For example I have a file with whitespace-separated data of which I only know that there are x integers, then y chars and then z more integers. I don't want to assume anything about linebreaks.

I mean something as mindless as the following in C++:

...

int i, buf;
char cbuf;
vector<int> X, Z;
vector<char> Y;

for (i = 0; i < x; i++) {
    cin >> buf;
    X.push_back(buf);
}

for (i = 0; i < y; i++) {
    cin >> cbuf;
    Y.push_back(cbuf);
}

for (i = 0; i < z; i++) {
    cin >> buf;
    Z.push_back(buf);
}

EDIT: I forgot to say that I'd like it to behave well under live input from the console as well, i.e. there should be no need to press Ctrl+D before getting tokens, and the function should be able to return them as soon as a line has been entered. :)

Best regards, Artur Gajowy

+2  A: 

Like this?

>>> data = "1 2 3 4 5 6 abcdefg 9 8 7 6 5 4 3"

For example, we might get this with data= someFile.read()

>>> fields = data.split()
>>> x = list(map(int, fields[:6]))
>>> y = fields[6]
>>> z = list(map(int, fields[7:]))

Results

>>> x
[1, 2, 3, 4, 5, 6]
>>> y
'abcdefg'
>>> z
[9, 8, 7, 6, 5, 4, 3]
S.Lott
Wow, using map(int, aList) is something I wouldn't think of. Nice :-)
Abgan
Doesn't data = someFile.read() read the entire file (which is what the question tries to avoid)?
Orion Edwards
Yes, but... The question was mostly about parsing incrementally and ignoring line breaks. The "without reading the whole input" part is sometimes specious (unless your file is several GB).
S.Lott
+3  A: 

If you don't want to read in a whole line at a time, you might want to try something like this:

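def read_tokens(file):
    # Yield whitespace-delimited tokens, reading one character at a time.
    token = []
    while True:
        c = file.read(1)
        if c and not c.isspace():
            token.append(c)
            continue
        if token:  # skip runs of whitespace instead of yielding empty tokens
            yield ''.join(token)
            token = []
        if not c:  # read(1) returns '' at end of file
            return  # a bare return ends the generator cleanly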

That should generate each whitespace-delimited token in the file, reading one character at a time. From there you should be able to convert them to whatever type they should be. The whitespace can probably be handled better, too.
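
The conversion step could look roughly like the sketch below. The helper name take and the sample token list are made up for illustration; any iterator of string tokens should work in place of the stand-in stream.

```python
from itertools import islice

def take(tokens, n, convert=str):
    """Pull the next n tokens from the iterator and convert each one."""
    items = [convert(tok) for tok in islice(tokens, n)]
    if len(items) < n:
        # The stream ended early; fail loudly instead of returning a short list.
        raise ValueError("expected %d tokens, got %d" % (n, len(items)))
    return items

# Stand-in token stream (hypothetical data): 2 ints, 3 chars, 1 int.
tokens = iter(["10", "20", "a", "b", "c", "7"])
X = take(tokens, 2, int)
Y = take(tokens, 3)
Z = take(tokens, 1, int)
```

Because take consumes the iterator in place, each call picks up where the previous one stopped, much like successive cin >> reads.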

Autoplectic
+4  A: 

How about a small generator function that returns a stream of tokens and behaves like cin:

def read_tokens(f):
   for line in f:
       for token in line.split():
           yield token

x = y = z = 5  # for simplicity: 5 ints, 5 char tokens, 5 ints
f = open('data.txt', 'r')
tokens = read_tokens(f)
X = []
for i in range(x):
    X.append(int(next(tokens)))  # next(tokens) works on Python 2.6+ and 3
Y = []
for i in range(y):
    Y.append(next(tokens))
Z = []
for i in range(z):
    Z.append(int(next(tokens)))
f.close()
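
For the live-console case raised in the question's edit, one possible variation is sketched here. On Python 2, iterating over a file object with a for loop goes through a hidden read-ahead buffer, so tokens typed at a terminal may not show up as soon as a line is entered; calling readline() explicitly avoids that. The name read_tokens_live is made up for this sketch, and io.StringIO stands in for sys.stdin so the example runs unattended.

```python
import io

def read_tokens_live(f):
    # iter(callable, sentinel) keeps calling f.readline() until it
    # returns '' (end of input), so each line is handled as it arrives.
    for line in iter(f.readline, ''):
        for token in line.split():
            yield token

# Stand-in for sys.stdin (assumption: 3 ints, 2 chars, 1 int).
demo = io.StringIO(u"1 2\n3 a\nb 4\n")
tokens = read_tokens_live(demo)
ints = [int(next(tokens)) for _ in range(3)]
chars = [next(tokens) for _ in range(2)]
last = int(next(tokens))
```

Passing sys.stdin instead of demo should let tokens come back after each Enter, without waiting for Ctrl+D.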
unbeknown
A: 

How's this? Building on heikogerlach's excellent read_tokens.

def read_tokens(f):
   for line in f:
       for token in line.split():
           yield token

We can do things like the following to pick up 6 numbers, 7 characters and 6 numbers.

fi = read_tokens(data)
x = [int(next(fi)) for i in range(6)]
y = [next(fi) for i in range(7)]
z = [int(next(fi)) for i in range(6)]
S.Lott