I have a data frame with Address, City, State and Zip columns. From there, I'm trying to use the Yahoo API to geocode each address.

I'm basing this on the code in O'Reilly's Data Mashups using R tutorial. The original example takes a vector of street addresses and uses a hard-coded city. I'm trying to make a dynamic version that supports multiple cities.

The abbreviated version of the code is:

    library(XML)  # provides xmlTreeParse() and xmlValue()

    geocodeAddresses <- function(myStreets) {
      appid <- '<put your appid here>'
      baseURL <- "http://local.yahooapis.com/MapsService/V1/geocode?appid="
      myGeoTable <- data.frame(address=character(), lat=numeric(), long=numeric(), EID=numeric())
      for (myStreet in myStreets) {
        requestUrl <- paste(baseURL, appid, "&street=", URLencode(myStreet$address),
                            "&city=", URLencode(myStreet$city),
                            "&state=", URLencode(myStreet$state), sep="")
        xmlResult <- xmlTreeParse(requestUrl, isURL=TRUE, addAttributeNamespaces=TRUE)
        geoResult <- xmlResult$doc$children$ResultSet$children$Result
        lat <- xmlValue(geoResult[['Latitude']])
        long <- xmlValue(geoResult[['Longitude']])
        myGeoTable <- rbind(myGeoTable, data.frame(address=myStreet, Y=lat, X=long, EID=NA))
      }
      myGeoTable
    }

When I try to reference myStreet$city and myStreet$address, I receive the error:

    $ operator is invalid for atomic vectors

Other than looping through the data frame myStreets, I don't know how to call the Yahoo API exactly once per row and store both the latitude and longitude for each address.
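
One way to make exactly one request per row without writing an explicit for loop is to lapply() over the row indices and bind the per-row results together. This is a minimal sketch under stated assumptions: the sample data is made up, and geocodeRow() is a hypothetical offline stub (not part of any package) that returns dummy numbers where a real Yahoo lookup would go.

```r
# Hypothetical sample data; the column names follow the question's code.
myStreets <- data.frame(
  address = c("100 Main St", "200 Oak Ave", "300 Elm Rd"),
  city    = c("Springfield", "Portland", "Austin"),
  state   = c("IL", "OR", "TX"),
  stringsAsFactors = FALSE
)

calls <- 0
# Hypothetical offline stand-in for one geocoding request. A real version would
# build the request URL from this row, query the service and parse lat/long;
# this stub only counts calls and returns dummy numbers.
geocodeRow <- function(row) {
  calls <<- calls + 1
  c(lat = nchar(row$address), long = -nchar(row$city))
}

# Exactly one call per row; each myStreets[i, ] is a one-row data frame.
coords <- do.call(rbind, lapply(seq_len(nrow(myStreets)),
                                function(i) geocodeRow(myStreets[i, ])))

# coords is a matrix with columns lat and long; data.frame() splits it up.
myGeoTable <- data.frame(address = myStreets$address, coords,
                         stringsAsFactors = FALSE)
print(myGeoTable)
```

Because each call returns a named c(lat = ..., long = ...) vector, do.call(rbind, ...) yields a matrix whose column names carry straight into the result table.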

+3  A: 

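If myStreets is a data.frame, then the for loop iterates over its columns, not its rows. So the first iteration assigns the address column (a plain character vector) to myStreet, and myStreet$city doesn't make sense on a vector.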
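
The behaviour described above can be checked with a small made-up data frame (the column names are assumptions mirroring the question): the loop variable is each column in turn, and using $ on that atomic vector raises the same error as in the question.

```r
# Hypothetical two-column example.
myStreets <- data.frame(
  address = c("100 Main St", "200 Oak Ave"),
  city    = c("Springfield", "Portland"),
  stringsAsFactors = FALSE
)

# for() over a data frame visits its columns, each a plain atomic vector.
visited <- character(0)
for (myStreet in myStreets) {
  visited <- c(visited, class(myStreet))
}
print(visited)  # one entry per column, not per row

# Applying $ to such a vector reproduces the error from the question.
firstColumn <- myStreets[[1]]
err <- tryCatch(firstColumn$city, error = function(e) conditionMessage(e))
print(err)
```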

You could change the for loop to iterate over rows instead:

    for (i in seq_len(nrow(myStreets))) {
      myStreet <- myStreets[i, ]  # a one-row data frame, so myStreet$city works
      # rest is the same
    }
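
Here is a self-contained sketch of that row loop, assuming made-up sample data and reusing the baseURL and appid placeholders from the question. It only builds the request URLs and does not contact the endpoint, so it runs offline.

```r
# Hypothetical sample data; baseURL and appid are the placeholders from the question.
myStreets <- data.frame(
  address = c("100 Main St", "200 Oak Ave"),
  city    = c("Springfield", "Portland"),
  state   = c("IL", "OR"),
  stringsAsFactors = FALSE
)
baseURL <- "http://local.yahooapis.com/MapsService/V1/geocode?appid="
appid   <- "<put your appid here>"

requestUrls <- character(nrow(myStreets))
# seq_len() yields an empty sequence for a zero-row frame, unlike 1:nrow().
for (i in seq_len(nrow(myStreets))) {
  myStreet <- myStreets[i, ]  # one-row data frame, so $ works here
  requestUrls[i] <- paste(baseURL, appid,
                          "&street=", URLencode(myStreet$address),
                          "&city=",   URLencode(myStreet$city),
                          "&state=",  URLencode(myStreet$state),
                          sep = "")
}
print(requestUrls)
```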

To optimize your code, you can also preallocate the result table and fill it in row by row instead of growing it with rbind:

    myGeoTable <- data.frame(address=myStreets$address, lat=NA_real_, long=NA_real_, EID=NA_real_)
    for (i in seq_len(nrow(myStreets))) {
      myStreet <- myStreets[i, ]
      requestUrl <- ...
      ...
      myGeoTable[i, c("lat", "long")] <- as.numeric(c(lat, long))  # xmlValue() returns text
    }

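
A runnable sketch of the preallocate-and-fill pattern follows, with made-up sample data and hypothetical dummy strings standing in for the parsed latitude and longitude. It also shows why the values are converted first: xmlValue() returns character strings, so c(lat, long, NA) would be a character vector, and assigning it would turn the numeric columns into character columns.

```r
# Hypothetical sample data.
myStreets <- data.frame(
  address = c("100 Main St", "200 Oak Ave"),
  city    = c("Springfield", "Portland"),
  stringsAsFactors = FALSE
)

# Allocate the whole result table once; numeric columns start as NA.
myGeoTable <- data.frame(address = myStreets$address,
                         lat = NA_real_, long = NA_real_, EID = NA_real_,
                         stringsAsFactors = FALSE)

for (i in seq_len(nrow(myStreets))) {
  myStreet <- myStreets[i, ]  # a real version would build and send the request here
  # Hypothetical dummy values: xmlValue() returns text, so the real
  # latitude/longitude would also arrive as character strings.
  lat  <- as.character(10 * i)
  long <- as.character(-10 * i)
  # Convert before storing so lat/long stay numeric columns.
  myGeoTable[i, c("lat", "long")] <- as.numeric(c(lat, long))
}
print(myGeoTable)
```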
Marek
+2  A: 

If you're going to do this, I wouldn't talk about it in public: it's against Yahoo's terms of service. I'd suggest using USC webgis instead. A couple of months ago I geocoded around half a million records without too many problems.

hadley
I'll have to read the TOS more carefully. I thought that since O'Reilly uses it in a tutorial, and we only get 5,000 requests per 24-hour period, it was fair game. Thanks for the heads-up. I'll be sure to look at USC webgis.
Neil Kodner