Plot Activities or People Over Time

When you create a data visualization, it can make it easier for you and other people to think about the data in different ways.  I’ve been toying with heat maps and other visualizations.  Now I have data that I wanted to compare on a timeline, similar to the heat maps in Excel.  I found a useful tutorial on Google’s Chart API and it’s timeline visualization.

Format the Data

This was the most complicated part of the process.  I wanted to understand more about who was participating in a crowdsourcing activity based on the posts that they were making.  I scraped the site after filtering the activity on the site to grab just what I wanted.  In my case, scraped makes it sound more interesting than it was.  I loaded all of the results, saved as HTML, and then used Microsoft Wordpad to strip out everything but the repeatable data.

One way to do this is to replace chunks of HTML with a comma, slowly converting the web page into a comma-separated values file.  That way, I can import the file into Microsoft Excel to play around with it further.  I can do more rapid text or formatting clean up.  In my case, I wanted to combine data elements into month day, year which is easy to do with the concatenate command.

In the case of Google’s Chart API, though, I needed to format the dates specially.  As you can see in the example, it requires the new Date(year, month, day).  I selected the column containing the dates in Excel, right-clicked and selected Format cells….  You want to make a custom format of yyyy, m, d.

Change Excel column of dates for Google Chart API

The example uses four columns and, initially, so did I.  But in the end, so long as the main column is repeatable – recurring account or database ID numbers, the same person’s name (formatted consistently) – you may want to use fewer.  In my case, I was focusing on author name so I only needed that plus the start and end dates.

I already had the start date and, since it was a posted article, the end date was the same.  I copied and pasted the date column.  And then I copied and pasted the values into a new workbook.

Prepare for JSON

I saved the file out as a comma-separated values file (CSV) and then opened it in Microsoft Word.  I like Word because I can quickly build a macro to do some of the prep work.  For example, this one will insert the square brackets at the beginning and end of each line:

Sub JsonReady()
'
' JsonReady Macro
'
'
For i = 1 To 5
    Selection.TypeText Text:="["
    Selection.EndKey Unit:=wdLine
    Selection.TypeText Text:="]"
    Selection.MoveRight Unit:=wdCharacter, Count:=1
Next i
End Sub

The downside to using Microsoft Word is that you can end up with weird things like smart quotes.  When exported to text, these get ASCII characters that won’t work in javascript and need further cleanup.  I think next time, I’ll do only the bare minimum in Word and then flip over to my HTML editor (Bluefish or Microsoft Expression Web 4) to do the rest of the cleanup.  The benefit is that an HTML editor will generally color code content, so you can see when something isn’t quite right.

Using the Timeline Visualization

The end result is something like a Gantt chart but without necessarily having relationships between the data lines.  The timeline examples are helpful but incomplete, in some cases.  I ended up having to hunt around the web before I realized that they weren’t loading Jquery.  The head of my file is:

<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" /><html>
<title>Crowdsourcing Activity Over Time</title>

<script type="text/javascript" src="https://www.google.com/jsapi">
    </script>
<script type="text/javascript" src="https://ajax.googleapis.com/
    ajax/libs/jquery/1.10.2/jquery.min.js"></script>

<script type="text/javascript">

google.load("search", "1");
google.load("jquery", "1.4.2");
google.load("jqueryui", "1.7.2");
google.load("visualization", "1", {packages:["timeline"]});

google.setOnLoadCallback(drawChart);

function drawChart() {
  var container = document.getElementById('timeline');
  var chart = new google.visualization.Timeline(container);
  var dataTable = new google.visualization.DataTable();

Until I added in the Jquery line, the result was blank.  And it threw no errors so I wasn’t sure what was going on.  I added the additional google.load lines after some more research but am not sure if they’re necessary or not.  I stopped investigating when the chart showed up.

Each row represents a different contributor or author, with the timeline at the bottom.

Each row represents a different contributor or author and their activity by day, with the timeline at the bottom.

The general timeline output isn’t great.  I would like to try:

  • compressing the height of each line to see more rows at a time;
  • being able to sort based on the number of posts, so that there is a bell curve or trend line created, visually, by the posting rates.  I expect that I need to do this in Excel before I export it the first time;
  • consider shifting from a day-by-day breakdown to somehow focusing on larger periods of time – like showing everyone who posted in May, rather than each day of May separately.

This visualization has a lot of possibilities, though.  I can see using it in a law library to show database transactions over time (which ones were used more or less and when) or which user or location was experiencing given activity over time.  In some cases a table can handle that kind of data but once you get a year’s worth of detail, sometimes a visualization can help tell the story better.