tags:

views:

55

answers:

2

I have a dbf table like below which is the result of one to many join from two tables. I want to have unique zone values from one Taxlot id field.

table name: input table
tid ----- zone
1 ------ A

1 ------ A

1 ------ B

1 ------ C

2 ------ D

2 ------ E

3 ------ C



Desirable output table table name: input table
tid ----- zone
1 ------ A, B, C

2 ------ D, E

3 ------ C

I got some help but couldn't make it to work.

inputTbl = r"C:\temp\input.dbf"
taxIdZoningDict = {}
searchRows = gp.searchcursor(inputTbl)
searchRow = searchRows.next()
while searchRow:
   if searchRow.TID in taxIdZoningDict:
      taxIdZoningDict[searchRow.TID].add(searchRow.ZONE)
   else:
      taxIdZoningDict[searchRow.TID] = set() #a set prevents dulpicates!
      taxIdZoningDict[searchRow.TID].add(searchRow.ZONE)
   searchRow = searchRows.next()

outputTbl = r"C:\temp\output.dbf"
gp.CreateTable_management(r"C:\temp", "output.dbf")
gp.AddField_management(outputTbl, "TID", "LONG")
gp.AddField_management(outputTbl, "ZONES", "TEXT", "", "", "20")
tidList = taxIdZoningDict.keys()
tidList.sort() #sorts in ascending order
insertRows = gp.insertcursor(outputTbl)
for tid in tidList:
   concatString = ""
   for zone in taxIdZoningDict[tid]
      concatString = concatString + zone + ","
   insertRow = insertRows.newrow()
   insertRow.TID = tid
   insertRow.ZONES = concatString[:-1]
   insertRows.insertrow(insertRow)
del insertRow
del insertRows
A: 

Hi mikehjun,

Here is a quickly made code in Python that may suit your needs, with minimal fidgeting.

import collections

d = collections.defaultdict(list)

with open("input_file.txt") as f:   
    for line in f:
        parsed = line.strip().split()
        print parsed
        k = parsed[0]
        v = parsed[2]
        d[k].append(v)

for k, v in sorted(d.iteritems()):
    s = " ----- "
    v = list(set(v)) # Must be a library function to do this
    v.sort()
    print k, s,
    for j in v:
        print j,
    print

Hope this helps

Morlock
Use `collections.defaultdict` and save a bit of mess. `d=defaultdict(list)` and the last three lines of the loop simplify to `d[k].append( v )`
S.Lott
Also, use `for k in d: v= items[k]` in the second loop to make it a hair simpler and more pythonic.
S.Lott
@S.Lott Thank you. I'll implement this in my code! Cheers
Morlock
Note, when working with files: 1) **Always** use a context manager, that is to say `with open("input_file.txt") as f:` to ensure that the file gets closed no matter what and 2) Never use `for line in f.readlines():`, which is wasteful. Use `for line in f:`. (Similarly, you can loop over a dict's items with `for k, v in d.iteritems()`, using `iteritems` to avoid making a needless list.) Also, `some_string.split(" ")` is typically spelled `some_string.split()`.
Mike Graham
@Mike Graham Thank you. This is great input. I'll be sure to implement this starting right now in the Python code I am doing this afternoon. Cheers
Morlock
@S.Lott I'm currently working. If you have time, be our guest :) I'll also get a chance to see how exactly you use 'collection.defaultdict'. Cheers
Morlock
I edited in these things and additionally change the loop to be over `sorted(d.iteritems())` so that the order of the results would be the expected order. Note that this makes the solution O(n log n) and that there are plenty of O(n) solutions available.
Mike Graham
A: 

I think Morlock's answer does not satisfy the requirement of dropping duplicates. I would use a defaultdict(set), which will automatically omit dups, instead of defaultdict(list), and thus .add() instead of .append().

Vicki Laidler
It did :) But it got lost in the edits. I added a line 'v = list(set(v)) to fix that. Cheers
Morlock