views:

90

answers:

3

So I have already imported one XML-ish file with 3000 elements and parsed them into a CSV for output. But I also need to import a second CSV file with 'keyword','latitude','longitude' as columns and use it to add the GPS coordinates to additional columns on the first file.

Reading the python tutorial, it seems like {dictionary} is what I need, although I've read on here that tuples might be better. I don't know.

But either way - I start with:

    floc = open('c:\python\kenya_location_lookup.csv','r')
    l = csv.DictReader(floc)
    for row in l: print row.keys()

The output look like:

{'LATITUDE': '-1.311467078', 'LONGITUDE': '36.77352011', 'KEYWORD': 'Kianda'} {'LATITUDE': '-1.315288401', 'LONGITUDE': '36.77614331', 'KEYWORD': 'Soweto'} {'LATITUDE': '-1.315446430425027', 'LONGITUDE': '36.78170621395111', 'KEYWORD': 'Gatwekera'} {'LATITUDE': '-1.3136151425171327', 'LONGITUDE': '36.785863637924194', 'KEYWORD': 'Kisumu Ndogo'}

I'm a newbie (and not a programmer). Question is how do I use the keys to pluck out the corresponding row data and match it against words in the body of the element in the other set?

+1  A: 

Reading the python tutorial, it seems like {dictionary} is what I need, although I've read on here that tuples might be better. I don't know.

They're both fine choices for this task.

print row.keys() The output look like:

{'LATITUDE': '-1.311467078',

No it doesn't! This is the output from print row, most definitely NOT print row.keys(). Please don't supply disinformation in your questions, it makes them really hard to answer effectively (being a newbie makes no difference: surely you can check that the output you provide actually comes from the code you also provide!).

I'm a newbie (and not a programmer). Question is how do I use the keys to pluck out the corresponding row data and match it against words in the body of the element in the other set?

Since you give us absolutely zero information on the structure of "the other set", you make it of course impossible to answer this question. Guessing wildly, if for example the entries in "the other set" are also dicts each with a key of KEYWORD, you want to build an auxiliary dict first, then merge (some of) its entries in the "other set":

l = csv.DictReader(floc)
dloc = dict((d['KEYWORD'], d) for d in l)
for d in otherset:
  d.update(dloc.get(d['KEYWORD'], ()))

This will leave the location missing from the other set when not present in a corresponding keyword entry in the CSV -- if that's a problem you may want to use a "fake location" dictionary as the default for missing entries instead of that () in the last statement I've shown. But, this is all wild speculation anyway, due to the dearth of info in your Q.

Alex Martelli
My bad. The command I entered wasfor row in l: print rowwithout the .keys(). I'm sorry. I'm debugging and fiddled with various types of output to understand how the data is getting stored.
A: 

If you dump the DictReader into a list (data = [row for row in csv.DictReader(file)]), and you have unique keywords for each row, convert that list of dictionaries into a dictionary of dictionaries, using that keyword as the key.

>>> data = [row for row in csv.DictReader(open('C:\\my.csv'),
...                                       ('num','time','time2'))]
>>> len(data)  # lots of old data :P
1410
>>> data[1].keys()
['time2', 'num', 'time']
>>> keyeddata = {}
>>> for row in data[2:]:  # I have some junk rows
...     keyeddata[row['num']] = row
...
>>> keyeddata['32']
{'num': '32', 'time2': '8', 'time': '13269'}

Once you have the keyword pulled out, you can iterate through your other list, grab the keyword from it, and use it as the index for the lat/long list. Pull out the lat/long from that index and add it to the other list.

Nick T
Thanks! I will test this out later today - unfortunately stuck in meetings for a while first.
Thanks - I have your code working, but I'm still don't know syntax for refering to the 'location' part of the dictionary that matches keyword 'kibera' for example.In your example, Keyeddata['32'] returns the {dict} for 'num' = '32'. How would you assign x = the 'time' cooresponding to ('num'=32) ?
A: 

Thanks -

Alex: My code for the other set is working, and the only relevant part is that I have a string that may or may not contain the 'keyword' that is in this dictionary.

Structurally, this is how I organized it:

def main():
    f = open('c:\python\ggce.sms', 'r')
    sensetree = etree.parse(f)
    senses = sensetree.getiterator('SenseMakingItem')
    bodies = sensetree.getiterator('Body')       
    stories = []
    for body in bodies:
            fix_body(body)
            storybyte = unicode(body.text)
            storybit = storybyte.encode('ascii','ignore')
            stories.append(storybit)
    rows = [ids,titles,locations,stories]
    out = map(None, *rows)
    print out[120:121]
    write_data(out,'c:\python\output_test.csv')

(I omitted the code for getting its, titles, locations because they work and will not be used to get the real locations from the data within stories)

Hope this helps.

the code didn't seem to format correctly
Just indent it by four spaces (or mark it and press Ctrl-K).
Tim Pietzcker