Monday, October 28, 2013

Mapping Shapefiles from the State of North Carolina

I've been searching forever for a way to make maps from shapefiles originating in North Carolina.  Normally, when I plot them with the default parameters through R's rgdal package, the axes show huge projected coordinates instead of sensible latitudes and longitudes.

Tonight, I stumbled upon this StackOverflow post, which shows that TWO functions are needed in order to fully utilize NC shapefiles: readOGR and spTransform.

I've always just used readOGR, which works fine for shapefiles originating from other places, like the US Census Bureau.  I've always had problems with files from my state, however, until I came across the aforementioned StackOverflow post.

Here's what I put into R to make a simple map of all roads in Wake County:

library(rgdal)
# p4s takes a PROJ.4 string describing the file's native projection (units are US survey feet)
roads <- readOGR("c:\\data\\poly\\wake_streets\\streets.shp", "streets", p4s = "+proj=lcc +lat_1=34.33333333333334 +lat_2=36.16666666666666 +lat_0=33.75 +lon_0=-79 +x_0=609601.2199999997 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=us-ft +no_defs")
# Reproject to plain longitude/latitude before plotting
roadmap <- spTransform(roads, CRS("+proj=longlat +datum=WGS84"))
plot(roadmap, axes = TRUE)
title("Roads in Wake County, North Carolina")
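
A quick way to convince yourself that the reprojection step does what it should is to push a single known point through the same transformation. This is only a sketch: the test point and the names nc_proj, origin_ft, and origin_ll are made up for illustration, while the projection string is the one used for the roads file.

```r
library(rgdal)  # also attaches sp, which provides SpatialPoints() and CRS()

# Same projection string as the roads example, split up for readability
nc_proj <- paste("+proj=lcc +lat_1=34.33333333333334 +lat_2=36.16666666666666",
                 "+lat_0=33.75 +lon_0=-79 +x_0=609601.2199999997 +y_0=0",
                 "+ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=us-ft +no_defs")

# Hypothetical test point: the projection's false origin, given in US survey feet
# (x_0 of 609601.22 meters is almost exactly 2,000,000 US survey feet)
origin_ft <- SpatialPoints(cbind(x = 2000000, y = 0), proj4string = CRS(nc_proj))

# Convert the point to longitude/latitude, just like the roads layer
origin_ll <- spTransform(origin_ft, CRS("+proj=longlat +datum=WGS84"))
coordinates(origin_ll)
```

The result should come out very close to longitude -79 and latitude 33.75, which are the +lon_0 and +lat_0 values in the projection string. If you see numbers in the millions instead, the transformation was not applied.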

This'll produce a map like so:



If you want to try this at home, you can get a bunch of shapefiles from North Carolina's State Board of Elections FTP site that seem to work with this code.

Now that this mystery has finally been uncovered, my mapping options are much improved!

Wednesday, October 2, 2013

Getting Geocodes through R and Google's Web Service

Part of my new job as a Data Integration Analyst is learning how to study and manipulate data.  So far, I've really enjoyed this new challenge and I love having the opportunity to learn something new.  

I learned quickly that R is a popular programming language within the realm of data and analytics.  By itself, R can perform some complex data analysis. However, packages contributed by other R enthusiasts can be loaded into R to make it more powerful.  I've spent the last few months getting more familiar with the language and its additional packages, and learning to appreciate it.  Although I still have a lot to learn, I can already see that R can do a lot of really cool stuff.

One aspect of analytics that I've been particularly fascinated with involves analyzing data through geography.  R has a lot of packages that make this pretty straightforward.  The ones I've seen so far are great, but, in order to map a specific place, you need geocoordinates (latitude and longitude points).  Providing just an address to R and one of these mapping packages won't do.

I really want to map some data regarding voters in my home county, Wake County, North Carolina.  I think I figured out how to do it.

Google provides a free web service that allows you to collect geocoordinates for any address. All you have to do is provide Google with a residential address through a URL.

R has a function that allows you to collect data through the web.  It's as easy as this:

getweb <- url('http://maps.googleapis.com/maps/api/geocode/xml?address=1600+Pennsylvania+Avenue,+20500&sensor=true')
getaddress <- readLines(getweb)
close(getweb)

I've just requested the geocoordinates of the White House, placed the results in another object, then closed the connection with Google.
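
If you want to look up addresses other than the White House, the URL has to be built from the address text. Here is a minimal sketch of how that could look; geocode_url is a hypothetical helper name of my own, and it uses base R's URLencode, which writes spaces as %20 rather than the + signs used in the URL above.

```r
# Hypothetical helper: build a geocoding request URL for any address
geocode_url <- function(address) {
  paste0("http://maps.googleapis.com/maps/api/geocode/xml?address=",
         URLencode(address, reserved = TRUE),  # escape spaces, commas, etc.
         "&sensor=true")
}

geocode_url("1600 Pennsylvania Avenue, 20500")
```

The returned string can be handed to url() and readLines() in the same way as the hard-coded address.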

Google returns the data as XML, which is now stored line by line in my 'getaddress' object.  Google can also return JSON, but R has a package, called XML, that can interpret XML for you. Once you install and load the package, you can collect the coordinates from the XML like so (the values come back as text, so convert them to numbers):

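library(XML)
doc <- xmlParse(paste(getaddress, collapse = "\n"), asText = TRUE)  # parse the downloaded lines once
lng <- as.numeric(xmlValue(getNodeSet(doc, '//result//geometry//location//lng')[[1]]))
lat <- as.numeric(xmlValue(getNodeSet(doc, '//result//geometry//location//lat')[[1]]))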
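
To try the parsing step without calling Google at all, you can feed the XML package a small hand-written document. The sketch below assumes a heavily trimmed response with the same nesting as the XPath expressions above; the sample_xml text and its coordinate values are made up for illustration, not a real reply from the service.

```r
library(XML)

# Made-up, trimmed-down response with the same nesting as the XPath above
sample_xml <- paste0(
  "<GeocodeResponse><status>OK</status><result><geometry><location>",
  "<lat>38.8976</lat><lng>-77.0365</lng>",
  "</location></geometry></result></GeocodeResponse>")

doc <- xmlParse(sample_xml, asText = TRUE)

# Check the status before trusting the coordinates
status <- xmlValue(getNodeSet(doc, "//status")[[1]])

# xmlValue() returns text, so convert to numbers before plotting
sample_lat <- as.numeric(xmlValue(getNodeSet(doc, "//result//geometry//location//lat")[[1]]))
sample_lng <- as.numeric(xmlValue(getNodeSet(doc, "//result//geometry//location//lng")[[1]]))
```

Checking the status first is a small safeguard: when an address cannot be found there is no result node, and indexing with [[1]] would fail.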

You now have coordinates!  Using one of R's available mapping packages, you can plot it like so.  




This simple map was created using maps, one of the easier R mapping packages. Here's the process:

library(maps)
map('usa', bg = 'lightblue', col = 'tan', fill = TRUE)
points(lng, lat, pch = '*', cex = 10, col = 'red')  # lng and lat must be numeric

This is really just a glimpse into the world of mapping through R.  There's a ton of resources out there that allow you to map all sorts of regions, locations, boundaries, and landmarks. 

The possibilities are endless.

NOTE:  Google is very generous to provide geocoordinates for free.  However, they do limit the number of daily queries for each person to 2500.
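
One way to stay under that limit with a long list of addresses is to split the list into batches and run one batch per day. This is just a sketch: daily_batches is a hypothetical helper, and the example addresses are invented.

```r
# Hypothetical helper: split addresses into batches no larger than the daily limit
daily_batches <- function(addresses, daily_limit = 2500) {
  batch_id <- ceiling(seq_along(addresses) / daily_limit)
  unname(split(addresses, batch_id))
}

addresses <- paste(1:6000, "Example Street")  # made-up addresses
batches <- daily_batches(addresses)
sapply(batches, length)  # 2500 2500 1000
```

Each element of batches can then be geocoded on a separate day.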