When you create a data visualization, it can make it easier for you and other people to think about the data in different ways. I’ve been toying with heat maps and other visualizations. Now I have data that I wanted to compare on a timeline, similar to the heat maps in Excel. I found a useful tutorial on Google’s Chart API and it’s timeline visualization.
Format the Data
This was the most complicated part of the process. I wanted to understand more about who was participating in a crowdsourcing activity based on the posts that they were making. I scraped the site after filtering the activity on the site to grab just what I wanted. In my case, scraped makes it sound more interesting than it was. I loaded all of the results, saved as HTML, and then used Microsoft Wordpad to strip out everything but the repeatable data.
One way to do this is to replace chunks of HTML with a comma, slowly converting the web page into a comma-separated values file. That way, I can import the file into Microsoft Excel to play around with it further. I can do more rapid text or formatting clean up. In my case, I wanted to combine data elements into month day, year which is easy to do with the concatenate command.
In the case of Google’s Chart API, though, I needed to format the dates specially. As you can see in the example, it requires the new Date(year, month, day). I selected the column containing the dates in Excel, right-clicked and selected Format cells…. You want to make a custom format of yyyy, m, d.
The example uses four columns and, initially, so did I. But in the end, so long as the main column is repeatable – recurring account or database ID numbers, the same person’s name (formatted consistently) – you may want to use fewer. In my case, I was focusing on author name so I only needed that plus the start and end dates.
I already had the start date and, since it was a posted article, the end date was the same. I copied and pasted the date column. And then I copied and pasted the values into a new workbook.
Prepare for JSON
I saved the file out as a comma-separated values file (CSV) and then opened it in Microsoft Word. I like Word because I can quickly build a macro to do some of the prep work. For example, this one will insert the square brackets at the beginning and end of each line:
Sub JsonReady() ' ' JsonReady Macro ' ' For i = 1 To 5 Selection.TypeText Text:="[" Selection.EndKey Unit:=wdLine Selection.TypeText Text:="]" Selection.MoveRight Unit:=wdCharacter, Count:=1 Next i End Sub
Using the Timeline Visualization
The end result is something like a Gantt chart but without necessarily having relationships between the data lines. The timeline examples are helpful but incomplete, in some cases. I ended up having to hunt around the web before I realized that they weren’t loading Jquery. The head of my file is:
Until I added in the Jquery line, the result was blank. And it threw no errors so I wasn’t sure what was going on. I added the additional google.load lines after some more research but am not sure if they’re necessary or not. I stopped investigating when the chart showed up.
The general timeline output isn’t great. I would like to try:
- compressing the height of each line to see more rows at a time;
- being able to sort based on the number of posts, so that there is a bell curve or trend line created, visually, by the posting rates. I expect that I need to do this in Excel before I export it the first time;
- consider shifting from a day-by-day breakdown to somehow focusing on larger periods of time – like showing everyone who posted in May, rather than each day of May separately.
This visualization has a lot of possibilities, though. I can see using it in a law library to show database transactions over time (which ones were used more or less and when) or which user or location was experiencing given activity over time. In some cases a table can handle that kind of data but once you get a year’s worth of detail, sometimes a visualization can help tell the story better.