Hello, I have a CSV file containing address data, mostly in Finnish. I need to read that file and fetch geocode information for each address. But it doesn't work for the Finnish alphabet: it says it can't read those characters! Can anybody please help me out with this?

import urllib,urllib2,time

addr_file = 'address.csv'
out_file = 'addresses_geocoded.csv'
out_file_failed = 'failed.csv'
sleep_time = 2
root_url = "http://maps.google.com/maps/geo?"
gkey = "asfasdfasdfasdf"       # not an actual value
return_codes = {'200':'SUCCESS',
                         '400':'BAD REQUEST',
                         '500':'SERVER ERROR',
                         '601':'MISSING QUERY',
                         '602':'UNKNOWN ADDRESS',
                         '603':'UNAVAILABLE ADDRESS',
                         '604':'UNKNOWN DIRECTIONS',
                         '610':'BAD KEY',
                         '620':'TOO MANY QUERIES'
                         }
def geocode_for_musiquitous(addr_file,out_fmt='csv'):
        #encode our dictionary of url parameters
        values = {'q' : addr_file, 'output':out_fmt, 'key':gkey}
        data = urllib.urlencode(values)
        #set up our request
        url = root_url+data
        req = urllib2.Request(url)
        #make request and read response
        response = urllib2.urlopen(req)
        geodat = response.read().split(',')
        response.close()

        # this section just handles the data returned from google
        code = return_codes[geodat[0]]
        if code == 'SUCCESS':
                code,precision,lat,lng = geodat
                return {'code':code,'precision':precision,'lat':lat,'lng':lng}
        else:
                return {'code':code}

def main():
        # open i/o files
        outf = open(out_file,'w')
        outf_failed = open(out_file_failed,'w')
        inf = open(addr_file,'r')
        for line in inf:
                # drop the trailing newline before geocoding
                address = line.strip()
                # get latitude and longitude of address
                data = geocode_for_musiquitous(address)

                # output results and log to file
                if len(data)>1:
                        print "Latitude and Longitude of "+address+":"
                        print "\tLatitude:",data['lat']
                        print "\tLongitude:",data['lng']
                        outf.write(address+','+data['lat']+','+data['lng']+'\n')
                        outf.flush()
                else:
                        print "Geocoding of '"+address+"' failed with error code "+data['code']
                        outf_failed.write(address+'\n')
                        outf_failed.flush()

                time.sleep(sleep_time)

        # clean up
        inf.close()
        outf.close()
        outf_failed.close()

if __name__ == "__main__":
        main()
A: 

I don't know Python, but I'm pretty sure this is an encoding issue.

Make sure your address file is UTF-8 encoded and that the urlencode() function you use can deal with UTF-8 characters (the latter shouldn't be a problem though, as Python can handle UTF-8 natively as far as I know).
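
One way to check the first point is to try decoding the file's raw bytes as UTF-8. This is only a rough sketch: the helper name is_utf8 is made up for illustration and is not part of any library, and the sample street name is an assumption. It should run on both Python 2 and Python 3.

```python
def is_utf8(raw_bytes):
    """Return True if raw_bytes is a valid UTF-8 byte sequence."""
    try:
        raw_bytes.decode('utf-8')
        return True
    except UnicodeDecodeError:
        return False

# The same Finnish street name encoded two ways (u'\xe4' is the letter a-umlaut).
street = u'H\xe4meentie'
as_utf8 = street.encode('utf-8')
as_latin1 = street.encode('iso-8859-1')
```

To test a real file, read it in binary mode and pass the bytes in, for example is_utf8(open('address.csv', 'rb').read()). A False result suggests the file was saved in another encoding such as ISO-8859-1.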

Pekka
A: 

Use the codecs module.

codecs.open():

codecs.open(filename, mode[, encoding[, errors[, buffering]]])

Open an encoded file using the given mode and return a wrapped version providing transparent encoding/decoding. The default file mode is 'r' meaning to open the file in read mode.

You can use wrapped file objects to read encoded files into unicode strings.

gimel
+1  A: 

The value you pass to urllib.urlencode should be UTF-8 encoded beforehand:

addr_file = addr_file.encode("utf-8")
values = {'q' : addr_file, 'output':out_fmt, 'key':gkey}
data = urllib.urlencode(values)

And make sure you open the CSV file with the correct encoding (might be "windows-1252" or "iso-8859-1"):

inf = codecs.open(addr_file, 'r', 'iso-8859-1')
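
Putting the encoding step together, here is a rough sketch of how a decoded address ends up in the query string. The function name build_query and the placeholder key are made up for illustration, and the try/except import is only there so the snippet also runs on Python 3, where urlencode lives in urllib.parse.

```python
try:
    from urllib import urlencode          # Python 2
except ImportError:
    from urllib.parse import urlencode    # Python 3

def build_query(address, out_fmt='csv', key='not-a-real-key'):
    """Encode a unicode address as UTF-8 bytes, then URL-encode it."""
    values = {'q': address.encode('utf-8'), 'output': out_fmt, 'key': key}
    return urlencode(values)

# u'\xe4' is the letter a-umlaut; it becomes the two bytes C3 A4 in UTF-8.
query = build_query(u'H\xe4meentie 1')
```

The non-ASCII letter shows up in the query as the percent-escaped UTF-8 bytes %C3%A4, which is plain ASCII and safe to append to root_url.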
AndiDog
Thanks a lot ! It works...
rahman.bd
A: 

You need to open the file with the correct encoding, using the codecs module. The correct encoding for Finnish is probably ISO-8859-1:

inf = codecs.open(addr_file,'r', 'iso-8859-1')

If this is not the correct encoding for your file, you need to find out what the correct encoding is and then check whether a codec for it is available, like below:

import codecs
codec = codecs.lookup("iso-8859-1")
print codec.name

If codecs.lookup() returns a codec object for the encoding you are looking for, then it is available and can be used in codecs.open(). If the encoding is not known, codecs.lookup() raises a LookupError instead.
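
That check can be wrapped in a small helper. This is just a sketch: the name codec_available is made up for illustration and is not part of the codecs module. It should run on both Python 2 and Python 3.

```python
import codecs

def codec_available(encoding):
    """Return True if Python has a codec registered under this name."""
    try:
        codecs.lookup(encoding)
        return True
    except LookupError:
        return False
```

For example, codec_available('iso-8859-1') tells you whether that name can be passed to codecs.open(). Note it only says the codec exists, not that your file was actually saved in that encoding.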

Tendayi Mawushe
Thanks..a lot to you!...
rahman.bd