Create a Density Map from Excel Data

How do we know if our law libraries, in the early 21st century, are best positioned to serve our modern clientele?  When you consider that many law libraries found their homes in courthouses in the early 20th century, and the changes in the profession since then, it makes you wonder if we have the right locations.

Location, Location, oh Hell with it

If we were fast-food restaurants, we would know.  In fact, not only would we know whether our current location was right, we would move if it wasn’t.  But how do largely non-profit or government law libraries make this type of decision?  Is there data we can use?

I think there is and I found or created a number of Microsoft Excel spreadsheets that included data based on zip codes (post codes).  My goal was to create a map that (a) showed the density of lawyers overlayed with (b) locations of law libraries.  Excel has a Power Map tool available for download but I was not able to get what I wanted from it.

Here’s what I tried next.

Density Map Tools

The goal was to create a heat map.  But using a real map, not just a colored table.  The term of art is choropleth all though it took me an age to figure that out.  I’m not sure in the end a choropleth is exactly what I was looking for but it certainly helped to hone my searching.

OpenHeatMap

OpenHeatMap is a free online tool that I really liked.  The developer has pretty clear documentation, you label your columns in Excel to match the specific information his tool accepts, and you upload your file.  You can then see a Flash-based map based on OpenStreetMap data.  Unfortunately, while the tool worked, my file was so large that it kept crashing my Flash plugin and so was unwieldy.

Google Fusion Tables

I tried Google Fusion Tables next.  This is a beta project from Google and is free.  It has one significant upside, which is that you can geo-reference your data file using Google Maps API.  If you only have a list of postal or zip codes, this can be a huge time saver.  It will look at your zip code, and return a latitude and longitude for your location.

google-maps-fusion-tables-geocode

There is a cap to how many references Google Maps API will geo-reference, which seems to be 2500 in 24 hours unless you have a premium account.  Using Google Fusion Tables seems to bypass this, as I was able to get 10,000 rows geo-coded without having any type of special account.

Unfortunately, I didn’t get the results I expected.  I take complete responsibility for this myself but all of my 20,000+ locations ended up in one of two places in the Atlantic ocean.  When I ran the data through a second time, I ended up with better plotting but still in the wrong place.

This data point should be in south eastern Ontario, not Chicago, Illinois.

This data point should be in south eastern Ontario, not south western Michigan.

I decided I needed a tool that was clearer to me.  I moved on to Tableau.

Intermission:  I Need Latitude and Longitude

Google’s Map API provides this information within Google Fusion Tables but I was working with solely post codes when I started over.  The challenge was how to get the latitude and longitude for a long list of zip codes.  The answer was surprisingly simple.

First, I grabbed the Canadian post codes data from Geocoder.ca.  This data file is a list of post codes, latitude, longitude, municipality, and province.  I opened it up in Microsoft Excel and cut out all of the non-Ontario data.  Easy enough.  Now to filter out the postal codes that did not occur in my OWN list.

Enter two functions in Microsoft Excel:  INDEX and MATCH.   This guy had already figured out the answer so, with a bit of experimentation I went from:

Geocoder data in Microsoft Excel lined up with my 2 columns

Geocoder data in Microsoft Excel lined up with my 2 columns

to this:

Using INDEX and MATCH to pull data from a third column

Using INDEX and MATCH to pull data from a third column

This means that if postal code K0A 1A0 (cell F2) occurs in any cell in column A (that’s the A:A in the MATCH statement), insert the Latitude (from B:B) for that row.  You can continue it across for the other cells, changing B:B to C:C for Longitude (use =INDEX(C:C,MATCH(F2,A:A,0)) ), and D:D (Municipality).  The MATCH statement stays the same, because you’re testing to see if the post code in column F exists in column A.

Once you’ve set up the formulas in the first row, you can drag all three down your entire worksheet.  Use the fill handle and it will update each row properly.  You will get a load of N/A errors if your post code isn’t a match.  This helped me since I didn’t have an Ontario-limited list, and it got rid of post codes that might have been from different provinces or countries.

A tip:  WAIT.  Once you have dragged your fill handle down over your sheet, in my case over 20,000 rows, Excel gets to work.  It can take a long time (10s of minutes) for it to complete its work.  Leave it alone.  I learned this when I saw one city name populated for over half of my rows, clearly a mistake.  But Excel just hadn’t powered through all the rows yet.  When I gave it time, it completed successfully.  I could then cut and paste my columns (F, G, H, I, J) into a new worksheet to use with the mapping tools.

Tableau Public (Community)

Tableau was one of a number of programs that I came across with fee-based applications.  At this point, I was still looking for a free option for what was likely to be a one-off project.  Tableau has a free desktop application called Tableau Public and a community to go along with it.  This sold me, particularly when I got into the tool and saw how powerful it could be.

The first challenge I had was that it didn’t recognize any of my locations.  The zip code/postal code function defaults to the US.  Once I flipped this dropdown to Canada, it started to map properly.

Tableau Public assumes you mean US zip codes.  Change it to Canada if that's where your data belongs.

Tableau Public assumes you mean US zip codes. Change it to Canada if that’s where your data belongs.

But it still didn’t get all of my locations.  In fact, I went from 22,000 errors to more than 6,000 errors.  In this case, it could not find the municipalities that I was referencing.  But I know that these exist.  As you can see below, Alliston is a town that is near Barrie but there is no corresponding match in the pre-populated list from Tableau.

A list of Canadian cities in Tableau is not comprehensive.

A list of Canadian cities in Tableau is not comprehensive.

The mapping in Tableau is great and, I think, it is possibly the best tool that I used while trying to create this heat map.  In the end, though, I was only able to map most of my points and not all of them.

CartoDB

CartoDB is a web-based mapping tool.  It is the easiest tool that I found although the free version requires you to make your maps publicly accessible.  It’s a pretty steep monthly subscription to be able to lock them down.  So while I ended up using it to create my map, the pricing for my one-off project means I’ll be heading back to Tableau to spend more time understanding whether I can go around its limitations.

There really isn’t much to say about CartoDB, which says enough.  The interface is easy to use and I quickly uploaded my initial dataset (which would be the density file) plus two additional files of geo-located sites:  law libraries and courthouses.  It was a matter of minutes to figure out how to overlay these three files and to give them different looks.  The buildings received a single icon or dot, while the density map had a color range.

Locations of buildings and density of populations in relation to them.

Locations of buildings and density of populations in relation to them.

Using maps can make data more interesting where it relates to a place.  It takes the numbers out of a spreadsheet column and gives you some perspective.  CartoDB’s ability to easily layer this data means you can contrast a variety of datasets.

Now that I have created the visual, I am not sure what I will do with it next.  The project was initiated based on a question, not only of what the density and proximity was, but whether I could do it.  I need to now spend some time looking at the visualization of the datasets – which are already generating more questions – and see if I can come up with any possible answers.

 

5 thoughts on “Create a Density Map from Excel Data

  1. Just FYI – for your Tableau example, Alliston may not be recognized as it is not a municipality rather it is a community in the Town of New Tecumseth.

  2. Hi David,

    Give SimpleHeatmap a try: http://simpleheatmap.com
    It’s free while in beta, supports geocoding and gives you more interactivity than Fusion Tables. I’d be curious to know how it supports larger datasets. If you try it, drop me a line and let me know.

    • That’s interesting. It had the same defect for me that Fusion tables did – I dumped about a dozen lat/lon into your interface and ended up in the middle of the Pacific. Probably needs a bit more interface or guide to how to input the data list. Personally, I’d avoid any ad-powered site, considering the alternatives like CartoDB. Thanks for sharing the link though.

  3. I didn’t see your reply until just now, and I wish I’d seen it earlier. The issue with getting dumped in the middle of the Pacific is a bug that we only recently discovered and will be looking into soon. We’ve also had a chance since my original post to load test this with about 1.5M data points, and admittedly to my surprise it was able to handle it and perform fairly snappily after the initial map generation. Ads are a necessary evil until we’re ready to launch our subscriptions. Would you mind sharing with me what we could do to make the interface more user friendly and easier to use?

    • And it’s not just you, I expect. Since posting this, I’ve seen a number of articles (like this and this) about data that is understood resulting in being mapped to a default location.

      Suggestions for Simpleheatmap: I dumped a bunch of data in and, although it accurately mapped to Ontario, Canada, I was left staring at the Pacific Ocean. It’d be nice if you relocated to a data point. Also, the intensity/radius features don’t appear to work (they do, you just can’t tell) at higher altitude. There was no change until I zoomed in and clicked the options. The color scheme meant that my data points weren’t terribly distinct from the background (See what I mean here). You might look at the default color schemes in Excel for ranges – blue to red, orange to red – or more distinct greens/yellows to standout against the map background. I realize at some point, it’s not “simple”heatmap if you add a bunch of features! But I think these would help with folks who are giving it a whirl.

Comments are closed.