ansaurus

Question

Process two files at the same time in Python

Answer 1

A:

for line1, line2 in zip(file(filename1), file(filename2)):
    # do your thing

or similar

Corey Porter 2009-11-13 18:51:05

what does "zip" do here?

Werner 2009-11-13 18:56:02

It interleaves the elements from a list of iterables. You could use itertools.izip as an alternative.

RaphaelSP 2009-11-13 19:03:24

In this specific case, it returns a list of tuples, each tuple being (line x from file1, line x from file2)
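A minimal sketch of that pairing, assuming Python 3 (where the built-in zip is lazy, much like itertools.izip in Python 2). The io.StringIO objects and their contents are made-up stand-ins for the two files:

```python
import io

# Hypothetical stand-ins for two open files with the same number of lines.
file1 = io.StringIO("a1\na2\na3\n")
file2 = io.StringIO("b1\nb2\nb3\n")

# zip pairs line x of file1 with line x of file2 and stops at the shorter input.
pairs = [(line1.rstrip("\n"), line2.rstrip("\n"))
         for line1, line2 in zip(file1, file2)]
print(pairs)
```

Because zip stops at the shorter input, it only fits files that line up one to one.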

RaphaelSP 2009-11-13 19:03:59

But how does this help **at all** with the OP's problem of processing 40 lines from file 2 for each line from file 1?!

Alex Martelli 2009-11-13 19:07:48

Downvoted because this won't work for the original question. This solution assumes each file has the same number of lines, but the original question was clear that file 2 had 40 lines for every one in file 1.
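One way to keep zip and still handle the mismatch is to group the longer file into fixed-size blocks first. The following is only a sketch under assumptions: it is written for Python 3, the helper name `blocks` is hypothetical, the sample data is invented, and the block size is 3 here where the question would need 40:

```python
import io
from itertools import islice

def blocks(lines, size):
    """Yield successive lists of up to `size` lines from an iterable of lines."""
    lines = iter(lines)
    while True:
        block = list(islice(lines, size))
        if not block:
            return
        yield block

# Hypothetical stand-ins for the names file and the descriptions file.
names = io.StringIO("Alpha\nBeta\n")
descriptions = io.StringIO("".join("line %d\n" % i for i in range(6)))

BLOCK_SIZE = 3  # would be 40 for the files described in the question

# Each name is paired with the next BLOCK_SIZE description lines.
paired = [(name.strip(), [d.strip() for d in block])
          for name, block in zip(names, blocks(descriptions, BLOCK_SIZE))]
print(paired)
```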

Bryan Oakley 2009-11-13 20:43:34

Answer 2

A:

12340 is not much data (in the sense that there are far bigger data sets out there to process).

An even better approach would be to use the built-in sqlite module. Failing that, use a simple format such as CSV, which at least gives the data an organized structure. Failing that, use threads, so that you can process the two files simultaneously.

bua 2009-11-13 18:54:05

can that sqlite module be used from python? how?

Werner 2009-11-13 18:56:46

import sqlite3 http://docs.python.org/library/sqlite3.html
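A minimal sketch of what using sqlite3 from Python could look like, assuming Python 3. The table name `cars`, its columns, and the sample rows are invented for illustration, and an in-memory database is used so nothing is written to disk:

```python
import sqlite3

# ":memory:" creates a throwaway database; pass a filename to persist it.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per car.
conn.execute(
    "CREATE TABLE cars (id INTEGER PRIMARY KEY, name TEXT, description TEXT)"
)
conn.executemany(
    "INSERT INTO cars (name, description) VALUES (?, ?)",
    [("Alpha", "a small car"), ("Beta", "a big car")],
)
conn.commit()

rows = conn.execute(
    "SELECT name, description FROM cars ORDER BY id"
).fetchall()
print(rows)
conn.close()
```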

bua 2009-11-13 19:23:46

Answer 3

+8 A:

I'm not sure I completely understand what you're trying to do. Is it something like this?

f1 = open('car_names.txt')
f2 = open('car_descriptions.txt')
for car_name in f1:
    for i in range(6):    # echo the first 6 lines
        print f2.readline(),   # trailing comma: the line already ends in a newline
    # skip the 7th line, but assert that it is @CAR_NAME (ignoring the newline)
    assert f2.readline().strip() == '@CAR_NAME'
    print car_name,       # print the real car name
    for i in range(33):   # print the remaining 33 of the original 40
        print f2.readline(),

eduffy 2009-11-13 19:02:30

yes, i guess so! I will check it now! thanks

Werner 2009-11-13 19:05:07

Answer 4

+4 A:

Reading car_names.txt line by line will save you a piddling amount of memory (really really tiny by today's standards;-) but it absolutely won't be any faster than slurping it down at one gulp (best case it will be exactly the same speed, probably even a little bit slower unless your underlying operating system and storage system do a great job at read-lookahead caching / buffering). So I suggest:

import fileinput

carnames = open('car_names.txt').readlines()
carnamit = iter(carnames)

skip = False
for line in fileinput.input(['car_descriptions.txt'], True, '.bak'):
  if not skip:
    print line,
  if '@CAR_NAME' in line:
    print next(carnamit),
    skip = True
  else:
    skip = False

So measure the speed of this, and an alternative that does

carnamit = open('car_names.txt')

at the start instead of reading all lines into a list like my first version -- I bet that the first version (in as much as there's any measurable and repeatable difference) will prove to be faster.

BTW, the fileinput module of the standard library is documented in the Python library reference, and it's truly a convenient way to perform "virtual rewriting in-place" of text files (typically keeping the old version as a backup, just in case -- but even if the machine should crash in the middle of the operation the old version of the data will still be there, so in a sense the "rewriting" operates atomically with respect to machine crashes, a nice little touch;-).
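For readers on Python 3, here is a sketch of the same in-place rewrite with fileinput. It is an illustration under assumptions, not the answer's exact code: the file lives in a temporary directory, and its contents and the single car name are made up.

```python
import fileinput
import os
import tempfile

# Build a small, hypothetical descriptions file in a temporary directory.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "car_descriptions.txt")
original = "colour: red\n@CAR_NAME\nplaceholder\ndoors: 4\n"
with open(path, "w") as f:
    f.write(original)

names = iter(["Alpha\n"])  # hypothetical contents of car_names.txt
skip = False

# With inplace=True, print() writes into the file being rewritten;
# backup=".bak" keeps the untouched original next to it.
with fileinput.input([path], inplace=True, backup=".bak") as lines:
    for line in lines:
        if not skip:
            print(line, end="")
        if "@CAR_NAME" in line:
            print(next(names), end="")
            skip = True   # drop the line that follows the marker
        else:
            skip = False

with open(path) as f:
    rewritten = f.read()
with open(path + ".bak") as f:
    backup = f.read()
```

After the loop, `rewritten` holds the merged text and `backup` holds the file as it was before.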

Alex Martelli 2009-11-13 19:04:22

hi, excuse me, i understand your approach, but i get the error: NameError: name 'Next' is not defined. am i missing some other library?

Werner 2009-11-13 19:18:54

I believe next is new in Python 2.6. Are you running an earlier version?

Brent Newey 2009-11-13 19:31:18

In 2.5 or earlier, you need `carnamit.next()` instead of the nicer `next(carnamit)` that works in 2.6 and later.
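A short sketch of the built-in form, assuming Python 2.6 or later (it also runs on Python 3, where the method is spelled `__next__` and `.next()` no longer exists). The list of names is invented:

```python
# Hypothetical car names, as they would come out of readlines().
carnamit = iter(["Alpha\n", "Beta\n"])

first = next(carnamit)          # same as carnamit.next() in Python 2.5
second = next(carnamit)
missing = next(carnamit, None)  # a default avoids StopIteration at the end
print(first, second, missing)
```

Note that the name is lowercase; `Next` with a capital N is a different, undefined name in any version.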

Alex Martelli 2009-11-13 22:59:14

Answer 5

+8 A:

First, make a generator that retrieves the car name from a sequence. You could yield every 7th line; I've made mine yield whatever line follows the line that starts with @CAR_NAME:

def car_names(seq):
    yieldnext=False
    for line in seq:
        if yieldnext: yield line
        yieldnext = line.startswith('@CAR_NAME')

Now you can use itertools.izip to go through both sequences in parallel:

from itertools import izip
with open(r'c:\temp\cars.txt') as f1:
    with open(r'c:\temp\car_names.txt') as f2:
        for (c1, c2) in izip(f1, car_names(f2)):
            print c1, c2
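A Python 3 sketch of the same idea, where the built-in zip takes the place of itertools.izip. The in-memory files and their contents are assumptions made up for the example, and the generator is applied to the description text because that is where the @CAR_NAME markers are:

```python
import io

def car_names(seq):
    """Yield each line that directly follows a line starting with @CAR_NAME."""
    yield_next = False
    for line in seq:
        if yield_next:
            yield line
        yield_next = line.startswith("@CAR_NAME")

# Hypothetical stand-ins for the two files.
descriptions = io.StringIO(
    "colour: red\n@CAR_NAME\nold name 1\ndoors: 4\n@CAR_NAME\nold name 2\n"
)
new_names = io.StringIO("Alpha\nBeta\n")

# Walk both sequences in parallel: one new name per marked line.
pairs = [(new.strip(), old.strip())
         for new, old in zip(new_names, car_names(descriptions))]
print(pairs)
```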

Robert Rossney 2009-11-13 20:00:12

Who told you that this was windows?

Davide 2009-11-23 02:48:08

I test the code that I post. If the fact that you can infer that my machine runs Windows troubles you, I suggest lying in a quiet, darkened room with a cool, damp washcloth over your eyes until the feeling passes.

Robert Rossney 2009-11-23 17:27:04

Wow, what a heated reply to my question! It looks like you need the cool washcloth more than I do! Anyway, my comment was only meant as a sad note: Windows users often tend to think everybody is a Windows user. You could have tested your script with a file in the same directory (like this other guy http://stackoverflow.com/questions/1731102/process-two-files-at-the-same-time-in-python/1731180#1731180 ) or you could just have abstracted the path with `filename`. Improve your answer instead of ranting!

Davide 2009-11-28 05:50:42

You were, after all, moved to comment on a matter of perfect irrelevance. What next, complaining when someone uses "fizz" and "buzz" as temporary variable names because "foo" and "bar" are standard?

Robert Rossney 2009-11-28 17:34:01

Answer 6

A:

I think this fits the question:

it reads the description file one line at a time
when it sees @CAR_NAME, it still emits it, but replaces the next line in the description file with the next line from the names file


def merge_car_descriptions(namefile, descrfile):
    names = open(namefile,'r')
    descr = open(descrfile,'r')
    for d in descr:
        if '@CAR_NAME' in d:
            yield d + names.readline()
            descr.next()
        else:
            yield d

if __name__=='__main__':
    import sys
    if len(sys.argv) != 3:
        sys.exit("Syntax: %s car_names.txt car_descriptions.txt" % sys.argv[0])
    for l in merge_car_descriptions(sys.argv[1], sys.argv[2]):
        print l,
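For Python 3, a sketch of the same generator might look like the following. It takes iterables of lines instead of filenames (an assumption made so it can be shown with in-memory data), and it checks for a missing name explicitly, because a StopIteration escaping from inside a generator is turned into a RuntimeError in Python 3.7 and later:

```python
import io

def merge_car_descriptions(names, descr):
    """Emit description lines, replacing the line after @CAR_NAME with a name."""
    names = iter(names)
    descr = iter(descr)
    for d in descr:
        yield d
        if "@CAR_NAME" in d:
            name = next(names, None)
            if name is None:
                raise ValueError("ran out of car names")
            yield name
            next(descr, None)  # discard the line being replaced

# Hypothetical contents of the two files.
names = io.StringIO("Alpha\nBeta\n")
descr = io.StringIO(
    "colour: red\n@CAR_NAME\nplaceholder\ndoors: 4\n@CAR_NAME\nplaceholder\n"
)
merged = "".join(merge_car_descriptions(names, descr))
print(merged, end="")
```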

Useless 2009-11-26 14:21:40
