
I have 3 text files:

  1. many lines of value1<tab>value2 (maybe 600)
  2. many more lines of value2<tab>value3 (maybe 1000)
  3. many more lines of value2<tab>value4 (maybe 2000)

Not all lines match; some will have one or more values missing. I want to take file 1, read down it, look up the corresponding values in files 2 and 3, and write the output like this, for example:

value1<tab>value2<tab>value3<tab>value4
value1<tab>value2<tab>blank <tab>value4

That is, indicate that a value is missing by printing a placeholder text.

In awk I can read the files into arrays up front in a BEGIN block, then step through them in the END block. But I want to use Python 3 for portability. At the moment I do this on a PC using MS Access and linked tables, but there is a time penalty every time I use that method.
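For readers coming from awk: an awk associative array filled with a[$1] = $2 corresponds directly to a Python dict, and awk's (k in a) test corresponds to dict.get with a default. A minimal sketch, assuming tab-separated input; the sample keys and values are hypothetical, and io.StringIO stands in for a real file:

```python
import io

# Hypothetical sample of file 2 (value2<tab>value3); StringIO stands in for open("file2").
# The last line deliberately has its value missing.
file2 = io.StringIO("k1\tv3a\nk2\tv3b\nk3\n")

# awk:    { a[$1] = $2 }
# Python: build a dict keyed on the first column
a = {}
for line in file2:
    fields = line.rstrip("\n").split("\t")
    if len(fields) >= 2:  # skip lines whose value is missing
        a[fields[0]] = fields[1]

# awk:    print ((k in a) ? a[k] : "blank")
# Python: dict.get returns the default when the key is absent
print(a.get("k1", "blank"))  # v3a
print(a.get("k3", "blank"))  # blank
```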

All my efforts to do this with dictionaries or lists have confused me. I now seem to own every Python book!

Many thanks to anyone who can offer advice. (If you're interested, it's ARP, MAC and vendor codes.)

+3  A: 

Start with this.

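def loadDictionaryFromAFile( aFile ):
    # Map the first column of each line to the full list of fields
    dictionary = {}
    for line in aFile:
        fields = line.rstrip('\n').split('\t')
        dictionary[fields[0]] = fields
    return dictionary

with open("file2", "r") as f2:
    dict2 = loadDictionaryFromAFile( f2 )
with open("file3", "r") as f3:
    dict3 = loadDictionaryFromAFile( f3 )

with open("file1", "r") as f1:
    for line in f1:
        fields = line.rstrip('\n').split("\t")
        # file1 is value1<tab>value2; files 2 and 3 are keyed on value2
        key = fields[1] if len(fields) > 1 else None
        d2 = dict2.get( key, None )
        d3 = dict3.get( key, None )
        print( fields, d2, d3 )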

You may want to customize it to change the formatting of the output.
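One possible customization, sketched under assumptions: store only the second column as the value, and print each result as a single tab-separated line with "blank" as the placeholder, matching the output format in the question. The helper name load and the sample data are hypothetical; io.StringIO stands in for the real files.

```python
import io

# Hypothetical in-memory stand-ins for file1, file2 and file3.
file1 = io.StringIO("v1a\tk1\nv1b\tk2\n")
file2 = io.StringIO("k1\tv3a\nk2\tv3b\n")
file3 = io.StringIO("k1\tv4a\n")

def load(f):
    """Map column 0 -> column 1, skipping lines with a missing value."""
    d = {}
    for line in f:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            d[fields[0]] = fields[1]
    return d

dict2 = load(file2)
dict3 = load(file3)

out = []
for line in file1:
    fields = line.rstrip("\n").split("\t")
    key = fields[1] if len(fields) > 1 else ""
    # value1, value2, then value3/value4 or the placeholder
    row = [fields[0], key, dict2.get(key, "blank"), dict3.get(key, "blank")]
    out.append("\t".join(row))

print("\n".join(out))
```

Here the second output line ends in "blank" because k2 has no entry in the file 3 stand-in.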

S.Lott
+5  A: 

Untested:

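with open("file1.txt") as f1, open("file2.txt") as f2, open("file3.txt") as f3:
    v1 = [line.split() for line in f1]
    # dict comprehensions follow; these need Python 3 (or 2.7+)
    # the inner generator yields the split fields of each line; lines missing a value are skipped
    v2 = {vals[0]: vals[1] for vals in (line.split() for line in f2) if len(vals) >= 2}
    v3 = {vals[0]: vals[1] for vals in (line.split() for line in f3) if len(vals) >= 2}

for v in v1:
    if len(v) < 2:
        continue  # skip lines in file1 that have no value2 to look up
    print( v[0] + "\t" + v[1] + "\t" + v2.get(v[1], "blank ") + "\t" + v3.get(v[1], "blank ") )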
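A variant of the same idea, offered as a sketch rather than a drop-in replacement: the standard csv module with delimiter="\t" reads and writes the tab-separated lines, and wrapping the join in a function that takes file-like objects makes it easy to try on sample data. The function names load_lookup and join_files are hypothetical, and skipping file1 lines without a value2 is an assumption.

```python
import csv
import io
import sys

def load_lookup(f):
    # column 0 -> column 1; rows with a missing value are skipped
    return {row[0]: row[1] for row in csv.reader(f, delimiter="\t") if len(row) >= 2}

def join_files(f1, f2, f3, out, missing="blank"):
    v2 = load_lookup(f2)
    v3 = load_lookup(f3)
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in csv.reader(f1, delimiter="\t"):
        if len(row) < 2:
            continue  # assumption: skip lines in file1 that lack value2
        key = row[1]
        writer.writerow([row[0], key, v2.get(key, missing), v3.get(key, missing)])

if __name__ == "__main__":
    # Hypothetical sample data; with real files, pass open("file1.txt", newline="") etc.
    f1 = io.StringIO("v1a\tk1\nv1b\tk2\n")
    f2 = io.StringIO("k1\tv3a\n")
    f3 = io.StringIO("k2\tv4b\n")
    join_files(f1, f2, f3, sys.stdout)
```

Note that csv treats double quotes specially by default; if the data may contain them, the quoting options of csv.reader and csv.writer are worth checking.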
balpha
+1 Pythonic solutions look so pretty
Kai
Dammit. Those dict comprehensions are so awesome. Once CherryPy, Mako, Django, BeautifulSoup, Twisted, NumPy, NLTK, et al. standardize on Python 3, I am SO there.
Triptych