Monday, August 21, 2017

Voting Districts Day 13: Rework and Shapely Saves the Day

I reached an impasse a while back after realizing that I needed a means of determining whether voting districts shared boundaries.  I built the algorithm to cluster the voting districts together, but the clustering fell apart when my initial points were out in the wilderness, or towards the end when the process had slim pickings.  Here's an example of my starting points (green Xs) and the final central points (pink diamonds) for each clustered group.  You can see it completely fell apart in the western part of the state.

Enter the Python module Shapely, which has a great touches function where you can see if two voting districts share a boundary, which I can make a requirement before associating a voting district with other districts.
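To make "share a boundary" concrete, here's a rough standard-library sketch of the idea.  This is not how Shapely does it; it's a simplified stand-in that only catches neighbors whose shared edge uses the exact same two vertices, and the names edges, share_edge, left, right, and far are made up for the illustration.

```python
def edges(ring):
    """Return the set of undirected edges of a ring of (x, y) vertices."""
    pts = list(ring)
    if pts[0] != pts[-1]:
        pts.append(pts[0])  # close the ring if it is not closed already
    # frozenset makes (a, b) and (b, a) count as the same edge
    return {frozenset((a, b)) for a, b in zip(pts, pts[1:]) if a != b}

def share_edge(ring_a, ring_b):
    """True when the two rings have at least one identical edge."""
    return not edges(ring_a).isdisjoint(edges(ring_b))

# Three toy squares: left and right are neighbors, far is off on its own.
left = [(0, 0), (1, 0), (1, 1), (0, 1)]
right = [(1, 0), (2, 0), (2, 1), (1, 1)]
far = [(5, 5), (6, 5), (6, 6), (5, 6)]

print(share_edge(left, right))
print(share_edge(left, far))
```

Shapely's touches is more general than this (it also covers shapes that meet along only part of an edge or at a single point), which is why I'd rather lean on the library than on something like the above.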

Additionally, Shapely does a lot of other great things that make this process much easier.  One of the big ones is that it allows you to create polygon objects, which have lots of helpful methods and properties, such as finding a centroid, getting the bounds and area, etc.
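For a sense of what those polygon helpers compute, here's a plain-Python sketch using the shoelace formula.  It's only an illustration under my own assumptions (a simple, non-self-intersecting polygon with non-zero area), and polygon_summary and rect are hypothetical names, not part of Shapely.

```python
def polygon_summary(ring):
    """Return (area, centroid, bounds) for a simple polygon of (x, y) vertices."""
    pts = list(ring)
    if pts[0] != pts[-1]:
        pts.append(pts[0])  # close the ring
    twice_area = 0.0
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        cross = x0 * y1 - x1 * y0      # shoelace term for this edge
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    area = twice_area / 2.0            # signed area (sign depends on winding)
    centroid = (cx / (6.0 * area), cy / (6.0 * area))
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    bounds = (min(xs), min(ys), max(xs), max(ys))  # minx, miny, maxx, maxy
    return abs(area), centroid, bounds

rect = [(0, 0), (4, 0), (4, 2), (0, 2)]
area, centroid, bounds = polygon_summary(rect)
print(area, centroid, bounds)
```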

So, I'm working on reworking pretty much everything to use the Shapely module.  Stay tuned.

Monday, April 24, 2017

Voting Districts Day 12: Coloring our Shapes

Last time, we were able to create a map through Python and Matplotlib with our shapefile consisting of all of the NC voting districts. This time, we want to map it, but give each shape a different color.  

This can be accomplished by passing an array as the facecolor argument when we add our voting district patch collection to the map, instead of the 'green' that we provided last time.  

Eventually, we want to specify a color by the chosen voting district.  That'll be for next time.  This time, we'll just create an array of random colors.

I do know that we'll have thirteen congressional districts.  I'm going to create a dictionary consisting of each district number and a corresponding color:

color_switch = {0: 'teal', 1:'red',2:'blue',3:'green',4:'purple',5:'brown',6:'orange',7:'white',8:'black',9:'tan',10:'lightblue',11:'pink',12:'yellow'}

From there, build a list by looping through the shapes, selecting one of those colors for each, and appending it to the list we'll eventually use to color the shapes.  To select something at random, we'll use the aptly named random module.

import random
color_choice = [] #list for our colors

#for each shape in our shapefile, pick a random color and append to our color choice list.
for shape in shapes:
    color_choice.append(random.choice(list(color_switch.values()))) #pick from the color names (the dictionary's values).

And then apply the list as an array (thanks to numpy) when we add the collection to the map.  

import numpy as np
ax.add_collection(PatchCollection(patches, facecolor= np.array(color_choice), edgecolor='k',  linewidths=0.2, zorder=2))

And you get a map like so!
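Looking ahead to coloring by the chosen district instead of at random, the lookup could be as small as the sketch below.  The assignments list is hypothetical (one district number per shape); the real values would come from the clustering step.

```python
# Same district-to-color dictionary as above.
color_switch = {0: 'teal', 1: 'red', 2: 'blue', 3: 'green', 4: 'purple',
                5: 'brown', 6: 'orange', 7: 'white', 8: 'black', 9: 'tan',
                10: 'lightblue', 11: 'pink', 12: 'yellow'}

# Hypothetical clustering result: the district number assigned to each shape.
assignments = [0, 3, 3, 12, 7]

# One color per shape, in the same order as the shapes/patches.
color_choice = [color_switch[district] for district in assignments]
print(color_choice)
```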



Here's all the code to create the map from the beginning:

import matplotlib.pyplot as plt #what I need to plot stuff to my map.
from mpl_toolkits.basemap import Basemap #what I need to create my basemap.
import shapefile #what I need to read the shapefile from the NC SBE.
from pyproj import Proj #module used to change our projection from the nc to the traditional

from matplotlib.patches import Polygon #used to convert our newly reprojected coordinates to a polygon/patch shape that matplotlib can plot.  
from matplotlib.collections import PatchCollection #we'll be adding all of our voting districts to a patch collection and then plotting that collection.  

import numpy as np #allows us to better interact with arrays, which is the structure used with polygons/ patches.
import random #use this to select a random color from our voting district color dictionary.

conv_coords = list() #hold our converted coordinates here as we apply each set of converted coordinates that make up our shape.  
patches = [] #list/collection we'll be sticking our patches/polygons/voting districts into.  This'll pass as the points within our point collection.  

#projection type for the voting districts.
nc = Proj("+proj=lcc +lat_1=34.33333333333334 +lat_2=36.16666666666666 +lat_0=33.75 +lon_0=-79 +x_0=609601.2192024384 +y_0=0 +datum=NAD83 +to_meter=0.3048006096012192 +no_defs ", preserve_units=True)

vote = shapefile.Reader('ncsbe\\Precincts.shp') #creates an instance that has the lists of data we want.
shapes = vote.shapes() #lists of coordinates making up the shape for each voting district.

for x in range(0,len(shapes)): #for each voting district...
    conv_coords.append([]) #apply a new list to our main list consisting of all the shapes.
    for y in range(0,len(shapes[x].points)): #for each set of coords in the shape...
        lon, lat = nc(shapes[x].points[y][0], shapes[x].points[y][1], inverse=True) #convert the shape file points into traditional lat/long coords.
        conv_coords[x].append([lon,lat]) #write the converted coordinates to the new sublist. 
    patches.append(Polygon(np.array(conv_coords[x]), closed=True)) #sublist consisting of all the shapes coordinates is complete.  append to patchcollection list.  


#dictionary that associates a voting district to a color.
color_switch = {0: 'teal', 1:'red',2:'blue',3:'green',4:'purple',5:'brown',6:'orange',7:'white',8:'black',9:'tan',10:'lightblue',11:'pink',12:'yellow'}
color_choice = [] #list to hold our color selections.

#for each shape in our voting district shapefile.
for shape in shapes:
    color_choice.append(random.choice(list(color_switch.values()))) #pick from the color names (the dictionary's values).

#create our basemap.
m = Basemap(projection= 'cyl', lon_0 = -80, lat_0 = 35, llcrnrlon=-84.9,llcrnrlat=33.5,urcrnrlon=-75.,urcrnrlat=36.6, resolution='i')

#create a figure/subplot that'll be our canvas for the voting districts.  plt is the main plot object.  
fig     = plt.figure(figsize=(20,6)) #provide a parameter for figsize to make a wide canvas (NC's long shape).
ax      = fig.add_subplot(111)

#add the general stuff we want on our map that comes with basemap.
m.drawcountries(linewidth=0.5)
m.drawcoastlines(linewidth=0.5)
m.drawstates(linewidth=0.5)

#and with that subplot, apply our patch collection (all of our voting district shapes).
ax.add_collection(PatchCollection(patches, facecolor= np.array(color_choice), edgecolor='k',  linewidths=0.2, zorder=2))

#create a file with our basemap and plot overlay.
plt.savefig('vote_map_test_colors.png',dpi=600)
#and complete.
plt.close()

Saturday, April 15, 2017

Voting Districts Day 11: Mapping Voting District Shapefiles with Matplotlib

Now that we can create the most basic of maps, let's see if we can build on top of that and apply our layer of voting districts.

Because the voting district shapefile uses a not-so-traditional projection, with coordinates that aren't the latitude and longitude values you'd expect, we can't simply use the straightforward method in matplotlib to apply the shapefile layer to the map.  

Instead, we have to convert the unorthodox coordinates to the traditional projection to match our map's base and then apply that conversion.  That conversion must be a set of polygons (or patches in a patch collection) that we can plot on top of our basemap of the state of NC.  

import matplotlib.pyplot as plt #what I need to plot stuff to my map.
from mpl_toolkits.basemap import Basemap #what I need to create my basemap.
import shapefile #what I need to read the shapefile from the NC SBE.
from pyproj import Proj #module used to change our projection from the nc to the traditional

from matplotlib.patches import Polygon #used to convert our newly reprojected coordinates to a polygon/patch shape that matplotlib can plot.  
from matplotlib.collections import PatchCollection #we'll be adding all of our voting districts to a patch collection and then plotting that collection.  

import numpy as np #allows us to better interact with arrays, which is the structure used with polygons/ patches.

conv_coords = list() #hold our converted coordinates here as we apply each set of converted coordinates that make up our shape.  
patches = [] #list/collection we'll be sticking our patches/polygons/voting districts into.  This'll pass as the points within our point collection.  

#projection type for the voting districts.
nc = Proj("+proj=lcc +lat_1=34.33333333333334 +lat_2=36.16666666666666 +lat_0=33.75 +lon_0=-79 +x_0=609601.2192024384 +y_0=0 +datum=NAD83 +to_meter=0.3048006096012192 +no_defs ", preserve_units=True)

vote = shapefile.Reader('ncsbe\\Precincts.shp') #creates an instance that has the lists of data we want.
shapes = vote.shapes() #lists of coordinates making up the shape for each voting district.

for x in range(0,len(shapes)): #for each voting district...
    conv_coords.append([]) #apply a new list to our main list consisting of all the shapes.
    for y in range(0,len(shapes[x].points)): #for each set of coords in the shape...
        lon, lat = nc(shapes[x].points[y][0], shapes[x].points[y][1], inverse=True) #convert the shape file points into traditional lat/long coords.
        conv_coords[x].append([lon,lat]) #write the converted coordinates to the new sublist. 
    patches.append(Polygon(np.array(conv_coords[x]), closed=True)) #sublist consisting of all the shapes coordinates is complete.  append to patchcollection list.  

#create our basemap.
m = Basemap(projection= 'cyl', lon_0 = -80, lat_0 = 35, llcrnrlon=-84.9,llcrnrlat=33.5,urcrnrlon=-75.,urcrnrlat=36.6, resolution='i')



#create a figure/subplot that'll be our canvas for the voting districts.  plt is the main plot object.  
fig     = plt.figure()
ax      = fig.add_subplot(111)

#add the general stuff we want on our map that comes with basemap.
m.drawcountries(linewidth=0.5)
m.drawcoastlines(linewidth=0.5)
m.drawstates(linewidth=0.5)

#and with that subplot, apply our patch collection (all of our voting district shapes).
ax.add_collection(PatchCollection(patches, facecolor= 'green', edgecolor='k',  linewidths=0.2, zorder=2))

#create a file with our basemap and plot overlay.
plt.savefig('vote_map_test.png',dpi=600)
#and complete.
plt.close()

And with that, we get a map with the voting districts.  Now we have to figure out how to apply colors to them so we can represent the different districts...


Monday, April 10, 2017

Voting Districts Day 10: Creating a basic map with Matplotlib

Last time, we converted our coordinates from the weird NC format to the standard lat/long format in anticipation of matplotlib needing a better format. 

This time, we are just trying to create something very basic.  I would love to just see a picture with North Carolina on it, for instance.  

So, the package that we need to do this from matplotlib is: 
from mpl_toolkits.basemap import Basemap

To create a basemap, use the following command:
m = Basemap(projection= 'cyl', lon_0 = -80, lat_0 = 35, llcrnrlon=-84.9,llcrnrlat=33.5,urcrnrlon=-75.,urcrnrlat=36.6, resolution='i')

There are a lot of properties in here.  lon_0 and lat_0 set the center point of the map.  llcrnrlon (lower left corner longitude), llcrnrlat (lower left corner latitude), urcrnrlon (upper right corner longitude), and urcrnrlat (upper right corner latitude) set the corner points of the map.  resolution controls the level of detail in the boundaries.  h is high resolution, but i (intermediate) seems to be adequate for me.  Finally, projection is the type of map projection.  cyl is the default value and, according to the matplotlib documentation, it seems to be the easiest to get along with.  I'm probably going to need that to deal with the custom North Carolina projections.  All the properties can be found here.
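As a quick sanity check on those numbers, here's a little arithmetic sketch (the variable names just mirror the Basemap arguments).  It shows that the corner points put the middle of the box close to the lon_0/lat_0 values, and that the box is a bit more than three times as wide as it is tall, which is worth remembering when picking a figure size for a state shaped like NC.

```python
# Corner values copied from the Basemap call above.
llcrnrlon, llcrnrlat = -84.9, 33.5
urcrnrlon, urcrnrlat = -75.0, 36.6

# Middle of the bounding box, to compare against lon_0=-80 and lat_0=35.
center_lon = (llcrnrlon + urcrnrlon) / 2
center_lat = (llcrnrlat + urcrnrlat) / 2

# Width and height of the box in degrees, and their ratio.
width_deg = urcrnrlon - llcrnrlon
height_deg = urcrnrlat - llcrnrlat
aspect = width_deg / height_deg

print(round(center_lon, 2), round(center_lat, 2), round(aspect, 2))
```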

You can also add standard map stuff to your basemap, like coastline boundaries, state/ county/ country boundaries.  I want the coastlines and an outline of the state.

To plot my results, I'm going to need matplotlib pyplot package.  

Putting it all together...
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
m = Basemap(projection= 'cyl', lon_0 = -80, lat_0 = 35, llcrnrlon=-84.9,llcrnrlat=33.5,urcrnrlon=-75.,urcrnrlat=36.6, resolution='i')
m.drawstates(linewidth=0.5)
m.drawcoastlines(linewidth=0.5)

#save a picture of our map.
plt.savefig('basic_map.png',dpi=300)

And here's what we get!  We still need a little tweaking, but not too terrible.



Wednesday, April 5, 2017

Voting Districts Day 9: Back to Python for Mapping

Because of the large size of the resulting JSON file, I don't think I'm going to be able to use the leafletjs javascript module to create a map.  I think that Python is my best bet.  

Mapping in Python might not be so bad anyway.  Matplotlib has a bunch of tools available, particularly modules Basemap and pyplot.  

This page has a lot of great examples that I can use to better understand what's going on.  The trick, though, is getting my shapefile read into it...

Looking at the documentation, I have to figure out some way of dealing with projections again.  My voter district shapefile isn't the standard lat/long values. 
Pyproj has a great method where you can set the inverse property and spit out lat long coordinates from the weird NC coordinates.  So, I'll create a big list of all the coordinates of all the shapes after running them through this method.

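import shapefile
from pyproj import Proj

#projection type for the voting districts.
nc = Proj("+proj=lcc +lat_1=34.33333333333334 +lat_2=36.16666666666666 +lat_0=33.75 +lon_0=-79 +x_0=609601.2192024384 +y_0=0 +datum=NAD83 +to_meter=0.3048006096012192 +no_defs ", preserve_units=True)

vote = shapefile.Reader('ncsbe\\Precincts.shp') #creates an instance that has the lists of data we want.

shapes = vote.shapes() #create lists of coordinates making up the shape for each voting district.
conv_coords = list() #the big list of converted coordinates, one sublist per shape.
#for each shape in the shapefile...
for x in range(0,len(shapes)):
    conv_coords.append([]) #start a new sublist for this shape.
    #for each point in the shape...
    for y in range(0,len(shapes[x].points)):
        lon, lat = nc(shapes[x].points[y][0], shapes[x].points[y][1], inverse=True) #convert the shape file points into traditional lat/long coords.
        conv_coords[x].append([lon,lat]) #keep the converted pair.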

And that's going to give me some data that I can plot according to the documentation in matplotlib (I think).

Wednesday, March 8, 2017

Voting Districts Day 8: Changing Projections

Well, I can't map my json file because the coordinates are not the traditional latitude/longitude values, but numbers in the 800Ks.  Here's an example of a coordinate:  [1884558.6496061385, 851226.325625971]

Why's this?  The shapefile I downloaded is using a different, less popular, projection to describe where the shapes go in relation to the earth.  So, I need to change the projection of the voting district shapefile to have traditional latitude longitude coordinates.

There's a python package called pyproj that'll hopefully help me.  I was able to install the binary version with pip.  

What projection do I use?  Well, to figure this out, I looked at the prj file that came with the voting district's shapefile and it said:  NAD_1983_StatePlane_North_Carolina_FIPS_3200_Feet.

Thanks to spatialreference.org, I was able to get the projection that pyproj needed here (clicked the Proj4 link).  I plugged the value from the Proj4 link into pyproj's Proj class:

from pyproj import Proj
myProj = Proj("+proj=lcc +lat_1=34.33333333333334 +lat_2=36.16666666666666 +lat_0=33.75 +lon_0=-79 +x_0=609601.2199999999 +y_0=0 +ellps=GRS80 +datum=NAD83 +to_meter=0.3048006096012192 +no_defs")

I then tested it out on a coordinate from the voting district shapefile by calling the myProj object I created with inverse=True.

lat = my_json['features'][1000]['geometry']['coordinates'][0][0][1]
lon = my_json['features'][1000]['geometry']['coordinates'][0][0][0]
longi, latti = myProj(lon, lat, inverse=True)

From that, the long and lat values looked to be in the right ballpark (NOT!).  

It turns out that the projection requires another parameter according to this forum post due to the measurements being different (feet vs meters).  I need to set preserve_units to True.    

nc = Proj("+proj=lcc +lat_1=34.33333333333334 +lat_2=36.16666666666666 +lat_0=33.75 +lon_0=-79 +x_0=609601.2192024384 +y_0=0 +datum=NAD83 +to_meter=0.3048006096012192 +no_defs ", preserve_units=True)

Now when I use Proj with the new parameters, I get something much more realistic:  (-81.23183299105735, 36.13090600029268).
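To see the feet-versus-meters issue in plain numbers, here's a small arithmetic sketch.  It only uses values already in the projection string and the sample coordinate from earlier in this post; the variable names are mine.  The +x_0 value works out to 2,000,000 feet, and the sample coordinate shrinks a lot once it's expressed in meters, so treating the raw numbers as meters puts the point far from where it belongs.

```python
US_SURVEY_FOOT_IN_METERS = 0.3048006096012192   # the +to_meter value
false_easting_m = 609601.2192024384             # the +x_0 value, in meters

# Sample coordinate from the shapefile (in feet).
sample_x_ft, sample_y_ft = 1884558.6496061385, 851226.325625971

# The false easting expressed in feet.
false_easting_ft = false_easting_m / US_SURVEY_FOOT_IN_METERS

# The same sample coordinate expressed in meters.
sample_x_m = sample_x_ft * US_SURVEY_FOOT_IN_METERS
sample_y_m = sample_y_ft * US_SURVEY_FOOT_IN_METERS

print(round(false_easting_ft), round(sample_x_m), round(sample_y_m))
```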

So, now I've got to plug the use of pyproj into my overall script, figure out how to convert all of my shapes' coordinates so I can finally create a map!



Tuesday, February 21, 2017

Voting Districts Day 7: Post a VERY basic map

So, I'm starting to figure out how to create a map with a leaflet package in javascript.


Leafletjs' tutorial has been very helpful so far in figuring out how to make a map appear and adding a layer to it. I can then add a layer to the map by pasting the coordinates via a json file. I was able to find a shape representing NC on Github from a gentleman named Johan.

I couldn't get it to work without just cutting and pasting the JSON text into the javascript portion of the file, but I'll work on that later. However, I was able to get a shape of NC to appear down below.



You need to reference the leaflet js in the head of the html.

Note: I stuck some hyphens at the beginning of the tag so it would show on the page below.
<--link href="docs/images/favicon.ico" rel="shortcut icon" type="image/x-icon">
<--link href="https://unpkg.com/leaflet@1.0.3/dist/leaflet.css" rel="stylesheet">
<--script src="https://unpkg.com/leaflet@1.0.3/dist/leaflet.js"><--/script>

You can then add the map and then the layer once you create it as a variable.

<--div id="mapid" style="height: 400px; width: 800px;">
<--/div>
<--script>
***Create Map***
var mymap = L.map('mapid').setView([35.505, -80.09], 7);
***Create layer variable***
var nc = ***JSON goes here***
***Add layer to map***
L.geoJSON(nc).addTo(mymap);
<--/script>

NOTE: To get the map to appear in Blogger, you have to use the HTML button, which is next to the compose button. I couldn't figure out how to have my cake and eat it too. So, I typed up EVERYTHING manually, including the breaks and such, in html. Thank heavens for the preview button!

And that was the simplest way I could find to do this. This javascript is very new and foreign to me.

Next up, we are going to figure out how to create a json with our voting data (back to Python!) and shapefiles and how to read that in javascript as an external file.
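As a rough idea of what the Python side of that might look like, here's a sketch that wraps rings of coordinates as GeoJSON using only the json module.  The ring and the "district" property are made-up sample data, and to_feature is a hypothetical helper, not something from a library.  One thing to watch: GeoJSON positions are [longitude, latitude], while the setView call above takes [latitude, longitude].

```python
import json

def to_feature(ring, properties):
    """Wrap one ring of [lon, lat] pairs as a GeoJSON Polygon feature."""
    closed = list(ring)
    if closed[0] != closed[-1]:
        closed.append(closed[0])  # GeoJSON rings repeat the first position at the end
    return {
        "type": "Feature",
        "properties": properties,
        "geometry": {"type": "Polygon", "coordinates": [closed]},
    }

# Made-up sample ring somewhere in the middle of NC.
rings = [[[-80.0, 35.0], [-79.9, 35.0], [-79.9, 35.1], [-80.0, 35.1]]]

collection = {
    "type": "FeatureCollection",
    "features": [to_feature(r, {"district": i}) for i, r in enumerate(rings)],
}

geojson_text = json.dumps(collection)  # text that could be written to a .json file
print(geojson_text[:60])
```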

Monday, February 6, 2017

Voting Districts Day 6: Voting District Draft

Next up, we are going to go through our starting points where each will take turns choosing a voting district that is closest to them. This draft of sorts is intended to cluster the voting districts together to build our congressional districts.

To do this, we are going to continually pick the closest voting district for each point until there are no more voting districts left to pick. As voting districts are picked, we will associate each voting district with the point by appending the point and its distance to our big list of voting districts and their center points (the value we used to measure the distance to the starting point). We'll also take the voting district away as an option by replacing its distance value in the matrix with 'xxxxx'. We'll know we are done when every value in one of the matrix rows is 'xxxxx'.

Here's the code I wrote with the usual verbose comments:

#Our list to iterate through the points, giving each a turn to get its closest voting district.
point_list = list()

#Next, until all the voting districts are picked, round robin through the points, designating the closest available voting
#district and removing that voting district as an option for the next pick.  For the ones already found, mark out with xxxxx, which will also eliminate it from being
#found as the minimum for the next point check.  Do this until all the voting districts equal xxxxx, which means they've all been designated to a point.  

#we need a variable that'll be used to hold how many voting districts haven't been chosen.  
choices_left = 1

#while there are still voting districts unspoken for...
while (choices_left > 0):
    #I want to pull an item from a list of the points using pop, but make the pop or the list random...
    #can do this with random.shuffle, then pop the last one out!
    
    #when the list is empty, fill it back up with the number of points and shuffle that list.
    if (len(point_list) == 0):
        for e in range(0,len(start_points)): point_list.append(e)
        random.shuffle(point_list)
    point_choice = point_list.pop() #pull a point.
    min_distance = min(d for d in matrix[point_choice] if d != 'xxxxx') #get the least distance found between a still-available voting district and that point.
    dist_loc = matrix[point_choice].index(min_distance) #figure out which voting district has that minimum distance.
    coords[dist_loc].append([point_choice, min_distance]) #and apply the point and distance to that district as the points choice.
    #then mark that voting district as chosen by changing the distance values corresponding to that district to xxxxx.
    for f in range(0, len(matrix)):
        matrix[f][dist_loc] = 'xxxxx'

    #figure out how many voting districts have yet to be spoken for by counting how many in one line of the matrix do not equal xxxxx.   
    choices_left = len([g for g in matrix[0] if g != 'xxxxx'])
    #print(str(point_choice) + ' ' + str(min_distance) + '  ' + str(dist_loc) + '  ' + str(choices_left))
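Here's a condensed, self-contained sketch of the same draft idea on toy data, so it's easy to run and poke at.  It's a variation rather than the code above: I'm assuming math.inf as the "already taken" marker instead of 'xxxxx' (so min never has to compare numbers with strings), and run_draft, toy, and owner are hypothetical names.

```python
import math
import random

def run_draft(matrix, rng):
    """Round-robin draft: each point, in shuffled order, takes its closest free district.

    matrix[p][d] is the distance from point p to district d.
    Returns a list giving the point index that picked each district.
    """
    remaining = [row[:] for row in matrix]    # work on a copy of the distances
    owner = [None] * len(matrix[0])
    order = []
    while None in owner:                      # while districts are unspoken for...
        if not order:                         # refill and reshuffle the turn order
            order = list(range(len(matrix)))
            rng.shuffle(order)
        point = order.pop()
        district = min(range(len(owner)), key=lambda d: remaining[point][d])
        owner[district] = point
        for row in remaining:
            row[district] = math.inf          # taken: never the minimum again
    return owner

# Two points, four districts: point 0 is near districts 0 and 1, point 1 near 2 and 3.
toy = [[1.0, 2.0, 9.0, 8.0],
       [9.0, 8.0, 1.0, 2.0]]
owner = run_draft(toy, random.Random(0))
print(owner)
```

In this toy case the two points never want the same district, so the result is the same no matter how the turns get shuffled.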

Next, we've got to figure out what to do with this data. It would be great to build a map with it to see what it looks like, but I'm not sure exactly how to pull that off.

It looks like javascript and leaflet.js may be a nice option, but there's definitely a learning curve there for me. I'll research and see what I can do.

Thursday, February 2, 2017

Voting Districts Day 5: The district/distance matrix

Next, I'm going to build a matrix showing the distance between each voting district and the start points.  

To figure out the distance between the two points, I'm going to use the Pythagorean theorem (a² + b² = c²) where a equals the difference between the x coordinates and b equals the difference between the y coordinates.  This'll be a measurement "as the crow flies." 

Our point shared by the x and y axis will have the same latitude as one point and the same longitude as the other point to make a right triangle.  To calculate the distance for the x and y lines, we'll subtract the latitudes and longitudes of the points between the centroid and the starting point.  

(voting_district_x - starting_point_x)² + (voting_district_y - starting_point_y)² = (the_distance between the points)²

OR

math.sqrt(((start_points[a][0] - coords[b][2]) ** 2) + ((start_points[a][1] - coords[b][5]) ** 2))

Now I just have to do a double loop to build the matrix showing the distance between each voting district and starting point like so:

import math

matrix = list()

for a in range(0, len(start_points)):
    point_line = list()
    for b in range(0, len(coords)):
        point_line.append(math.sqrt(((start_points[a][0] - coords[b][2]) ** 2) + ((start_points[a][1] - coords[b][5]) ** 2)))
    matrix.append(point_line)
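For a version that can be checked by hand, here's a small sketch of the same matrix built with math.hypot on toy data.  To keep it short I'm assuming the centers are plain (x, y) pairs rather than the six-item rows in coords, and distance_matrix, toy_starts, toy_centers, and toy_matrix are names I made up for the example.

```python
import math

def distance_matrix(start_points, centers):
    """Rows are start points, columns are district centers, values are straight-line distances."""
    return [[math.hypot(sx - cx, sy - cy) for cx, cy in centers]
            for sx, sy in start_points]

# Toy data built around a 3-4-5 right triangle.
toy_starts = [(0.0, 0.0), (3.0, 4.0)]
toy_centers = [(3.0, 4.0), (0.0, 4.0)]

toy_matrix = distance_matrix(toy_starts, toy_centers)
print(toy_matrix)
```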

And that should give me a nice matrix that I can run a draft through to assign voting districts to starting points!  That's what we're going to work on next.  

Wednesday, February 1, 2017

Voting Districts Day 4: Starting with Random Points

Now that we've got the center points for our voting district shapes, we need to figure out how to create central points where we can start our clustering.  

There is a random library included with Python that I hope can do the job.  It has a uniform function that will give me a random float between two numbers that I designate.  I'll just loop through to generate these random coordinates based on how many clusters I want to build.

First, I want to make sure these starting coordinates fall somewhere approximately within the state of NC.  So, I'm going to get the total min and max x and y axis values by writing ALL of the x and y points to two lists and pull a min and max from those lists to create a range for the random numbers.

import random
...
xmids = list()
ymids = list()

for x in range(0,len(shapes)):

    for y in range(0,len(shapes[x].points)):

        xmids.append(shapes[x].points[y][0]) 
        ymids.append(shapes[x].points[y][1])


start_points = list()
xmax_o = max(xmids)
xmin_o = min(xmids)
ymax_o = max(ymids)
ymin_o = min(ymids)

From there, we'll use the random.uniform method through a loop to create whatever number of points we want to start with.
start_point_count = 12
for z in range(0,start_point_count):
    start_points.append([random.uniform(xmin_o,xmax_o),random.uniform(ymin_o, ymax_o)])

And we now have our random starting points!
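If repeatable runs are useful while testing, one option is to seed the generator.  Here's a sketch under that assumption; random_start_points and the toy bounds are hypothetical, not values from the shapefile.  Keep in mind that a point inside the bounding box isn't guaranteed to be inside the state itself.

```python
import random

def random_start_points(count, xmin, xmax, ymin, ymax, seed=None):
    """Return count random [x, y] points inside the given bounding box."""
    rng = random.Random(seed)   # a fixed seed gives the same points every run
    return [[rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)]
            for _ in range(count)]

# Toy bounding box standing in for the min/max values computed above.
points = random_start_points(12, 0.0, 100.0, 0.0, 50.0, seed=42)
print(len(points))
```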

Next up, we are going to build ourselves a matrix consisting of the distances between the centroids of our voting district shapes and our starting points.  From there, we will then figure out which voting district should be associated with each point by going through the voting districts draft style where each point will take a turn picking its closest voting district.

Sunday, January 22, 2017

Voting Districts Day 3: Yet Another Package Change to Pyshp

Still having trouble with Fiona.  So, I'm trying another package for reading shapefiles:  pyshp.

Pyshp installs through pip without issue (as long as I do it as an admin).  Hooray!  

pyshp comes with the shapefile library, which reads a shapefile into a structure of lists and dictionaries.  

What should I read?  The NC Board of Elections has a shapefile that has  all of the voting districts available on their FTP site.

And after downloading, we can read it like so:
import shapefile
vote = shapefile.Reader('ncsbe\\Precincts.shp') #creates an instance that has the lists of data we want.
shapes = vote.shapes() #lists of coordinates making up the shape for each voting district.  

To figure out the center of each shape (I hope this isn't too simple), I'll get the min/max for both the x and y, then average them.  I'll put it all in a list.

coords = list()

for x in range(0,len(shapes)):
    xmin = 10000000
    xmax = -10000000
    ymin = 10000000
    ymax = -10000000
    for y in range(0,len(shapes[x].points)):
        xmin = min(xmin, shapes[x].points[y][0])
        xmax = max(xmax, shapes[x].points[y][0])
        ymin = min(ymin, shapes[x].points[y][1])
        ymax = max(ymax, shapes[x].points[y][1])
    coords.append([xmin, xmax,(xmin + xmax)/2, ymin,ymax, (ymin + ymax)/2])
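To get a feel for when this might be too simple, here's a toy sketch of the same min/max averaging on an L-shaped polygon.  bbox_center and l_shape are made-up names for the example.  The computed center lands at (2, 2), which sits outside the L itself, so oddly shaped districts could end up with a center that isn't actually inside them.

```python
def bbox_center(points):
    """Return [xmin, xmax, xmid, ymin, ymax, ymid] for a list of (x, y) points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return [min(xs), max(xs), (min(xs) + max(xs)) / 2,
            min(ys), max(ys), (min(ys) + max(ys)) / 2]

# An L-shaped polygon: a 4x1 bar along the bottom plus a 1x4 bar up the left side.
l_shape = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]

row = bbox_center(l_shape)
print(row)
```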

If I want the metadata for each shape through the records method, this is how to do that.  

recs = vote.records()

For now, I just care about the shapes and their distance relative to one another.  

Sunday, January 1, 2017

Voting Districts Day 2: Wrong Python Packages?

I think that I'm going to switch up and use some different packages for reading this spatial data.  In order to use the SciPy stuff, I still need more packages and those packages require some unorthodox installation methods.  It includes a ton of stuff I'm not sure I need.  So, I'm going to do this one package at a time.  Well, two in this case.  

It looks like Shapely and Fiona can do what I want initially, which is to read shapefiles and plot them.  

When I try to use pip for the install ("pip install Fiona"), it doesn't work.  It says I need Microsoft Visual C++.  

So, after searching the internet, I discovered that I have to install the packages via a wheel.  A plethora of these wheels can be found at this University of California - Irvine website here.  

You also use pip to install these wheels; you just have to download them into the same folder first.  Also, I have to run the command prompt as an administrator via a right click when starting the application.  

Shapely seemed to work like a charm when attempting the old "import shapely" line in Python.  Fiona had other ideas.

Looking at Fiona's documentation, she requires a GDAL package, which was also available on the same UC Irvine page that had the other wheels.  I installed that one and was finally able to run "import fiona" successfully (the module name is lowercase).  

So, next time, we'll see about reading the shapefiles.