I have a file in the following format:

 [s1,s2,s3,s4,...] SOME_TEXT
 (per line)

For example:

 [dog,cat,monkey] 1,2,3
 [a,b,c,d,e,f] 13,4,6

The brackets are included.

Let's say I have another file like this, which contains two lines:

 [banana,cat2,monkey2] 1,2,3
 [a2,b2,c2,d,e,f] 13,4,6

I want to take two files of this form and align them the following way:

 [dog^banana,cat^cat2,monkey^monkey2] 1,2,3
 [a^a2,b^b2,c^c2,d^d,e^e,f^f] 13,4,6

while making sure that SOME_TEXT in corresponding lines (such as 1,2,3 and 13,4,6) is the same, and that the number of elements in the brackets is the same in each pair of corresponding lines. What would be a quick, compact way to do it?

Thanks.

A: 

I'd use a regex to chop off everything after the first ] (and hang on to it), then another regex to split the bracketed part into an array. After that, merge the arrays from the different files however you need to; piecing it all back together shouldn't be too hard. I'll leave the regexes as an exercise for the reader :-)
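
For what it's worth, here is a rough sketch of what those two regexes could look like. The names LINE_RE and parse_line are made up for illustration, and the pattern assumes each line has exactly the "[a,b,c] TEXT" shape from the question, with no ] inside the brackets.

```python
import re

# Hypothetical pattern for "[a,b,c] TEXT": group 1 is the bracket content,
# group 2 is everything after the closing bracket.
LINE_RE = re.compile(r"^\s*\[([^\]]*)\]\s*(.*)$")


def parse_line(line):
    """Return (elements, trailing_text) for a line like '[a,b,c] 1,2,3'."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError("unexpected line format: %r" % line)
    # Second regex: split the bracket content on commas, ignoring stray spaces.
    elements = re.split(r"\s*,\s*", match.group(1))
    return elements, match.group(2)


print(parse_line("[dog,cat,monkey] 1,2,3"))
```

Merging is then a matter of calling parse_line on the corresponding lines of both files and comparing the two results.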

fredley
A: 
for l, m in zip(f1, f2):
    # Drop the newline and the leading "[", then split at the first "]".
    l_head, l_tail = l.strip().lstrip("[").split("]", 1)
    m_head, m_tail = m.strip().lstrip("[").split("]", 1)

    l_head = l_head.split(",")
    m_head = m_head.split(",")
    assert len(l_head) == len(m_head)

    # The text after the brackets must be the same in both files.
    l_tail = l_tail.strip()
    m_tail = m_tail.strip()
    assert l_tail == m_tail

    ...

I haven't given your variables good names because I don't know what they are. I would name them something more useful.

I also haven't written the code for reassembling the lines. It shouldn't be too hard...
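
One possible way to do the reassembly, written as a standalone function so it can be tried on its own. This is only a sketch: merge_lines and its sep parameter are names I made up, and it raises ValueError instead of asserting so the caller can decide what to do with a mismatch.

```python
def merge_lines(line1, line2, sep="^"):
    """Merge two lines of the form '[a,b,c] TEXT' element by element."""
    # Same parsing as in the loop: strip, drop "[", split at the first "]".
    head1, tail1 = line1.strip().lstrip("[").split("]", 1)
    head2, tail2 = line2.strip().lstrip("[").split("]", 1)

    items1 = head1.split(",")
    items2 = head2.split(",")
    if len(items1) != len(items2):
        raise ValueError("element counts differ")
    if tail1.strip() != tail2.strip():
        raise ValueError("trailing text differs")

    # Pair up the elements and put the line back together.
    pairs = [a + sep + b for a, b in zip(items1, items2)]
    return "[{0}] {1}".format(",".join(pairs), tail1.strip())


print(merge_lines("[dog,cat,monkey] 1,2,3", "[banana,cat2,monkey2] 1,2,3"))
# prints: [dog^banana,cat^cat2,monkey^monkey2] 1,2,3
```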

katrielalex
+3  A: 
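def read_file(fp, hash):
    for l in fp:
        l = l.strip()            # drop the newline and any indentation
        if not l:
            continue             # skip blank lines
        p = l.find(']')
        k = l[p+1:].strip()      # the text after the brackets is the key
        v = l[1:p].split(",")    # the elements inside the brackets
        if k not in hash:
            hash[k] = v
        else:
            # assumes every key shows up exactly once in each file
            hash[k] = zip(hash[k], v)

hash = {}

for fname in ('f1.txt', 'f2.txt'):
    with open(fname) as fp:
        read_file(fp, hash)

for k, v in hash.items():
    print("[{0}] {1}".format(",".join("^".join(vv) for vv in v), k))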

This is a basic way to do it. If you need the output lines in the order they were read from the files, you'll have to do a bit more work.

Here's the output I get:

[a^a2,b^b2,c^c2,d^d,e^e,f^f] 13,4,6
[dog^banana,cat^cat2,monkey^monkey2] 1,2,3

Edit:

This also assumes that each key (e.g. 13,4,6) appears once in a file. If it can appear multiple times, you'll have to change the hash[k] = zip(hash[k], v) to something more elaborate, such as

if k not in hash:
    hash[k] = [[vv] for vv in v]
else:
    for i,vv in enumerate(v):
        hash[k][i].append(vv)
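
Putting the two notes above together (keeping the order in which keys were read, and allowing a key to collect more than two values), a sketch could look like the following. The function name merge_sources and the sample lists f1 and f2 are made up for illustration. It takes iterables of lines rather than file names, so open file objects can be passed in directly.

```python
from collections import OrderedDict


def merge_sources(sources, sep="^"):
    """Merge '[a,b,c] KEY' lines from several iterables of lines.

    Keys keep the order in which they were first seen.
    """
    merged = OrderedDict()  # KEY -> one list of values per bracket position
    for lines in sources:
        for line in lines:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            head, key = line.lstrip("[").split("]", 1)
            key = key.strip()
            values = head.split(",")
            if key not in merged:
                merged[key] = [[v] for v in values]
            elif len(values) != len(merged[key]):
                raise ValueError("element count differs for key %r" % key)
            else:
                for slot, v in zip(merged[key], values):
                    slot.append(v)
    return ["[{0}] {1}".format(",".join(sep.join(slot) for slot in slots), key)
            for key, slots in merged.items()]


# Sample data from the question, standing in for the two files.
f1 = ["[dog,cat,monkey] 1,2,3\n", "[a,b,c,d,e,f] 13,4,6\n"]
f2 = ["[banana,cat2,monkey2] 1,2,3\n", "[a2,b2,c2,d,e,f] 13,4,6\n"]
for merged_line in merge_sources([f1, f2]):
    print(merged_line)
```

With the sample data this prints the 1,2,3 line first and the 13,4,6 line second, matching the input order.
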
GWW
This is how I looked at it too. Alternatively, I wonder if there's merit to skipping the split(",") and just storing the value as the raw string from the file. hash[k] = hash[k] + "," + v
gbc
If you skip splitting the value, it's messier to merge with other values later on. However, the key doesn't have to be split.
GWW
Indeed, I see what you mean. I skimmed over the important bit of joining with "^"!
gbc