Wednesday, October 2, 2013

Getting Geocodes through R and Google's Web Service

Part of my new job as a Data Integration Analyst is learning how to study and manipulate data.  So far, I've really enjoyed this new challenge and I love having the opportunity to learn something new.  

I learned quickly that R is a popular programming language in the world of data and analytics.  By itself, R can perform some complex data analysis. However, packages written by other R enthusiasts can be loaded into R to make it more powerful.  I've spent the last few months getting familiar with the language and its add-on packages, and learning to appreciate it.  Although I still have a lot to learn, I can already see that R can do a lot of really cool stuff.

One aspect of analytics that I've been particularly fascinated with involves analyzing data through geography.  R has a lot of packages that make this pretty straightforward.  The ones I've seen so far are great, but, in order to map a specific place, you need geocoordinates (latitude and longitude points).  Providing just an address to R and one of these mapping packages won't do.

I really want to map some data about voters in my home county, Wake County, North Carolina.  I think I figured out how to do it.

Google provides a free web service that allows you to collect geocoordinates for any address. All you have to do is provide Google with a residential address through a URL.

R has a function that allows you to collect data through the web.  It's as easy as this:

getweb <- url('http://maps.googleapis.com/maps/api/geocode/xml?address=1600+Pennsylvania+Avenue,+20500&sensor=true')
getaddress <- readLines(getweb)
close(getweb)

I've just requested the geocoordinates of the White House, placed the results in another object, then closed the connection with Google.
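The URL above was typed out by hand, with plus signs standing in for spaces. For a whole list of addresses, a small helper could build the URL instead. This is only a sketch: build_geocode_url is a made-up name, and it assumes the same base URL and sensor parameter as the request above.

```r
# Hypothetical helper: build a geocoding request URL from a plain address.
# The base URL and the sensor parameter mirror the hand-written request above.
build_geocode_url <- function(address) {
  base <- 'http://maps.googleapis.com/maps/api/geocode/xml'
  # URLencode (from the utils package) escapes spaces, commas, and so on
  query <- utils::URLencode(address, reserved = TRUE)
  paste0(base, '?address=', query, '&sensor=true')
}

build_geocode_url('1600 Pennsylvania Avenue, 20500')
```

The result can be passed to url() and readLines() exactly like the hand-written string.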

Google returns the data as XML, which readLines stores line by line in my 'getaddress' object.  Google can also return JSON, but R's XML package can interpret XML for you. Once you install the package, you can collect the coordinates from the XML like so:

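library(XML)  # provides xmlParse, getNodeSet and xmlValue
doc <- xmlParse(paste(getaddress, collapse='\n'), asText=TRUE)  # parse the response once
lng <- as.numeric(xmlValue(getNodeSet(doc,'//result//geometry//location//lng')[[1]]))
lat <- as.numeric(xmlValue(getNodeSet(doc,'//result//geometry//location//lat')[[1]]))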
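Not every request succeeds, and the response includes a status element that reads OK when a match was found. Here is a minimal sketch of checking it before pulling out the coordinates. The helper name extract_coords and the trimmed-down sample response are made up for illustration (the coordinates in it are approximate), so the sketch runs without calling Google.

```r
library(XML)

# Hypothetical helper: return lat/lng from a geocoding response,
# or NA for both when the status is missing or anything other than "OK".
extract_coords <- function(xml_lines) {
  doc <- xmlParse(paste(xml_lines, collapse = '\n'), asText = TRUE)
  status_nodes <- getNodeSet(doc, '//status')
  if (length(status_nodes) == 0 || xmlValue(status_nodes[[1]]) != 'OK') {
    return(c(lat = NA_real_, lng = NA_real_))
  }
  lat <- xmlValue(getNodeSet(doc, '//result//geometry//location//lat')[[1]])
  lng <- xmlValue(getNodeSet(doc, '//result//geometry//location//lng')[[1]])
  c(lat = as.numeric(lat), lng = as.numeric(lng))
}

# Hand-written sample, much shorter than a real response.
sample_response <- c(
  '<GeocodeResponse>',
  '  <status>OK</status>',
  '  <result><geometry><location>',
  '    <lat>38.8976</lat><lng>-77.0365</lng>',
  '  </location></geometry></result>',
  '</GeocodeResponse>'
)
extract_coords(sample_response)
```

Returning NA instead of stopping means one bad address won't halt a long run over many addresses.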

You now have coordinates!  Using one of R's available mapping packages, you can plot them on a map.

The maps package is one of the easier mapping packages in R, and it only takes a couple of lines to draw a simple map. Here's the process:

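library(maps)  # provides map() and the 'usa' outline
map('usa',bg='lightblue',col='tan',fill=TRUE)
# as.numeric() guards against coordinates that are still stored as text
points(as.numeric(lng),as.numeric(lat),pch='*',cex=10,col='red')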

This is really just a glimpse into the world of mapping through R.  There are tons of resources out there that allow you to map all sorts of regions, locations, boundaries, and landmarks. 

The possibilities are endless.

NOTE:  Google is very generous to provide geocoordinates for free.  However, they do limit each person to 2,500 queries per day.  
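With a limit like that, a long list of addresses needs some care. Below is a rough sketch of one way to handle it: refuse to start a job that is bigger than the daily limit, and pause briefly between requests. The names geocode_batch and fake_geocoder are made up, the pause length is a guess rather than anything Google specifies, and the stand-in geocoder just returns NA so the sketch runs offline.

```r
# Hypothetical wrapper: run a geocoding function over many addresses,
# pausing between requests and refusing to exceed a daily cap.
geocode_batch <- function(addresses, geocode_one, daily_limit = 2500, pause = 0.2) {
  if (length(addresses) > daily_limit) {
    stop('More addresses than the daily limit allows; split the job across days.')
  }
  results <- vector('list', length(addresses))
  for (i in seq_along(addresses)) {
    results[[i]] <- geocode_one(addresses[i])
    Sys.sleep(pause)  # space the requests out
  }
  do.call(rbind, results)  # one row per address
}

# Stand-in geocoder; a real one would call the web service.
fake_geocoder <- function(address) c(lat = NA_real_, lng = NA_real_)
geocode_batch(c('address one', 'address two'), fake_geocoder, pause = 0)
```

The result is a matrix with one row per address and lat and lng columns, ready to hand to points().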
