Stock charts for everything else: Google Public Data

Google rolled out a simple little feature today: enter “unemployment rate wayne county” and they’ll offer you a chart. Click it, and you’ll see the unemployment rate since 1990, and be able to add other counties to compare. It ain’t much, but it’s neat.

Now, unemployment data *is* take-my-shirt-off-WOO-HOO-high-five thrilling, but this’ll get much more interesting if Google follows through (from the Official Google Blog):

The data we’re including in this first launch represents just a small fraction of all the interesting public data available on the web. There are statistics for prices of cookies, CO2 emissions, asthma frequency, high school graduation rates, bakers’ salaries, number of wildfires, and the list goes on. … we have been working on creating a new service that make lots of data instantly available for intuitive, visual exploration. Today’s launch is a first step in that direction.

Tidy snippets of civic information, linkable and comparable, from all aspects of public data — that’s one damn cool almanac! More like Everyblock than Wikipedia. Data, but easier. Fucking linkable!

Who’s gonna step up?

From this day forward, any news story about unemployment must link to the chart, just like business stories link to stock charts. Anything less is a disservice to readers. It’s zero-effort, free, informative, and damn neat. Why the hell not?

The future

The sci-fi geek in me sees this as just one more step towards Google’s lofty mission: “to organize the world’s information and make it universally accessible and useful.” It’s coming: All the data, one gesture away, on my cornea-screen. Oh, hell yes.

From concept to sketch to software: Building a new way to visualize votes… mmm, environminty!

Ryan Mark and I built enviroVOTE to help people visualize the environmental impact of the 2008 elections. We designed it in two evenings and made it real in a three-and-a-half-day long bender of data crunching and code.

This is the story of that time.

Sketch of enviroVOTE
+ coffee = enviroVOTE is real, live software

Sunday evening, 26 October: the concept

The idea struck us when Ryan and I discovered we had a common problem: homework. Ryan was on the hook to produce a story about the environment for News 21’s election night coverage, and I needed to build an example presenting news data in some interesting way using charts and graphs. So we decided to combine our efforts and make something that would visualize environmental information about the election.

We searched for data to present, and found that it came in many shapes, like a candidate’s track record of support on environmental issues, or statistics on national parks, nuclear power and everything in between. But the most compelling data set we found was not stats- or issue-based: endorsements made by environmental groups.

Statistics were cut because they’re only peripherally related to the races being run. It’s not particularly interesting to say something like “in states with more than five hydroelectric power sources, the Democratic candidate prevailed 18% of the time.”

Only sportscasters can get away with that crap.

Why not issues, then? They’re hard to quantify. Candidate websites are frequently slippery, ambiguous things, and we found that few politicians responded to efforts that would make their positions crystal clear like Project Vote Smart’s Political Courage Test and Candid Answers’ Voters Guide to the Environment. The best data we could find were candidates’ voting records, but without understanding the nuance of each piece of legislation, it’s nearly impossible to determine if a vote was for or against the goodness of the earth. (Also, only incumbents have voting records.)

An endorsement is a true-false, unambiguous, easy to count thing. Environmental groups like the Sierra Club and the League of Conservation Voters publish their support for candidates online. Even better, the aforementioned Project Vote Smart — a volunteer group dedicated to strengthening our democracy through access to information — aggregates endorsements, and makes them readily available for current and historic races. And Vote Smart makes them available via an API, so others can mash up their data, just like we were itching to do.

Wednesday evening, 29 October: the design

A second fury of inspiration led to the design of the site. Marcel Pacatte, my instructor and head of Medill’s Chicago newsroom, was our source of journalistic wisdom. He and I identified our audience and discussed the angles and presentation methods that would best serve them. Obvious ideas like red/blue states and a map of the nation’s greenness were tossed — maps aren’t all that good at showing off numbers. (Notable exceptions include cartograms and the famous diagram of Napoleon’s march to Moscow, neither of which seemed sensible metaphors to adopt.)

Working out the enviroVOTE concepts on a whiteboard
Scope creep, be damned!

We decided to not make a voter’s guide, since there was little time before the election for folks to find the site, and to instead make something that’s interesting the day of the elections, and useful in the days following. So we looked for numbers to support that mission.

Counting environmentally-friendly victories would be both timely on election night, and purposeful later. We could calculate a win for the earth by counting endorsements: if the winning candidate had more endorsements, it was a green race. This was easy to aggregate nationally as well as by state.
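Here’s a minimal sketch of that counting rule in Python. It is not the site’s actual code: the record layout, field names, and sample races are made up for illustration, and treating a tie in endorsements as "not green" is my assumption.

```python
from collections import defaultdict

# Hypothetical records, one per decided race. The field names are
# illustrative and do not reflect the site's real schema.
races = [
    {"state": "IL", "winner": "A", "endorsements": {"A": 3, "B": 0}},
    {"state": "IL", "winner": "C", "endorsements": {"C": 0, "D": 2}},
    {"state": "RI", "winner": "E", "endorsements": {"E": 1, "F": 1}},
]


def is_green(race):
    """A race is green if the winner holds more endorsements than any rival.

    Assumption: ties do not count as green, and an unopposed winner
    needs at least one endorsement.
    """
    counts = race["endorsements"]
    winner_count = counts[race["winner"]]
    rivals = [n for name, n in counts.items() if name != race["winner"]]
    return winner_count > max(rivals, default=0)


def green_share(races):
    """Fraction of races that were green, or None if no results are in."""
    if not races:
        return None
    return sum(is_green(r) for r in races) / len(races)


def green_share_by_state(races):
    """Group races by state and compute the green share for each."""
    by_state = defaultdict(list)
    for race in races:
        by_state[race["state"]].append(race)
    return {state: green_share(rs) for state, rs in by_state.items()}
```

The same `green_share` function serves both the national number and each state’s bar, which is why the figure was easy to aggregate at either level.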

And by running the same numbers on the previous races (two years ago for the House, six for the Senate, etc.) we could calculate the change in the environmental-friendliness of the nation’s elected officials, a figure that became known as “environmintiness.”
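As a rough sketch, the percent change could be computed like this. I’m assuming "change" means relative change in the share of green races between the two cycles; the function name and arguments are hypothetical, not taken from the site’s code.

```python
def environmintiness(green_now, total_now, green_before, total_before):
    """Percent change in the share of green races versus the previous cycle.

    Returns None when the figure is undefined: no races decided yet, or
    no green races last time (a percent change from zero has no meaning).
    """
    if total_now == 0 or green_before == 0:
        return None
    # (green_now / total_now) relative to (green_before / total_before),
    # rearranged so everything stays an integer until the final division.
    change = green_now * total_before - green_before * total_now
    return 100 * change / (green_before * total_now)
```

For example, if 50 of 100 seats went green last time and 60 of 100 did this time, the share rose from 50% to 60%, which this formula reports as a 20 percent increase.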

In addition, some races potentially held more impact for the environment than others — because of their location or the candidates running — so we decided it was necessary to highlight these key races alongside the numbers.

The sketch that served as the primary design document for enviroVOTE

In a whirlwind sketch-a-thon, the design for the site flew together. We would show off the two big numbers in the simplest possible way. No maps, pies or (praise the lord!) Flash necessary. They’re both just percentages. To set off one from the other, we decided on a percentage for the percent change, and a one-bar chart for the victory counts, in aggregate and for individual states.

Users would be interested in seeing results from their home state, so we made the states our primary navigation, and listed them, along with their bar chart, down the left side of the page. (We explicitly decided to not use a map for navigation, like most sites do. If I lived in Rhode Island, I’d effing hate those sites.)

Putting the big numbers front and center and listing the incoming race results down the right gave users an up-to-the-minute snapshot of the evening. The writeups about key races, though important, were our least timely information, so we made them big and bold, but placed them mostly below the fold.

We produced a simple design, just three pages — home, a state and a race — each presenting more detail as you drilled down.

Saturday and Sunday morning, 1-2 November: the development

Development began Saturday morning. We decided to build the site on Django, the free and open source web development framework that we were concurrently using to build News Mixer, the big final project of our master’s degree program. (If you’re interested in our reasons why, and how it all works, check out my post that explains the same stuff re: News Mixer.)

We brainstormed names for our new baby, and immediately checked to see if the URLs were available. envirovote.us was the first one we really liked, so we bought it and started running. Ryan designed a logo and whipped up a color scheme, and thus a brand was born.

Improvising the details, we built the site very close to how it was designed. (The initial sketches were mine, but Ryan gets the props for making it look so damn sexy.) Coding the site took about a day and a half, minus time for Ryan to go home and sleep, and for me to cook soup.

We used the awesome, free tools at Google Code to list tasks and ideas, manage our source code, and track defects. The simple concept and excellent tools helped make this a relatively issue-free development cycle. Django, FTW!

Sunday afternoon and Monday, 2-3 November: the gathering, massaging, and jamming in of data

Pretty much finished with the code, minus subsequent bug fixes and tweaks, we started on the data.

Ryan used the Project Vote Smart API to gather information on current and historical races: the states, districts, and candidates that form the backbone of our system. He wrote Python scripts to repeatedly call the API, munge the response, and aggregate all of the races, candidates, wacky political parties, and the rest into files we could then pump into the database.

I attacked from the other side and scoured environmental groups’ websites, as well as the endorsement data provided by Project Vote Smart, to collect the endorsements we use to calculate the big numbers.

Once all the data was collected into text files, we then wrote more scripts to read those files, scrub the data of inconsistencies, poor spelling, and other weirdness, and finally fill the database.
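A scrubbing pass of that kind might look something like the sketch below. Everything specific here is an assumption for illustration: the tab-separated layout, the three columns, and the alias table are invented, not the format we actually used.

```python
import re

# Hypothetical alias table mapping sloppy party labels to one spelling.
PARTY_ALIASES = {
    "d": "Democratic", "dem": "Democratic", "democrat": "Democratic",
    "r": "Republican", "rep": "Republican", "gop": "Republican",
}


def clean_name(raw):
    """Collapse runs of whitespace and trim both ends."""
    return re.sub(r"\s+", " ", raw).strip()


def clean_party(raw):
    """Map known aliases to a canonical party name; pass others through."""
    key = clean_name(raw).lower().rstrip(".")
    return PARTY_ALIASES.get(key, clean_name(raw))


def parse_line(line):
    """Turn one tab-separated line into a (state, candidate, party) tuple.

    Assumes exactly three columns; a malformed line raises ValueError,
    which is a useful signal that the file needs a human look.
    """
    state, candidate, party = line.rstrip("\n").split("\t")
    return clean_name(state).upper(), clean_name(candidate), clean_party(party)
```

Unknown parties pass through untouched, so the wackier ones survive the trip into the database.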

All of this took a day and a half, far longer than we had hoped, and as much time as was necessary to build the website. I did not cook soup. We ordered in.

enviroVOTE is real, live software
Coffee, nerd sweat… smells like software. Yet, curiously minty-fresh.

Tuesday, 4 November

After attending class all day in Evanston, Ryan and I headed downtown for an evening of data input and cursing at screens.

Julia Dilday and Alexander Reed watched the AP wire all night, tracking races and gathering results and entering them into the system. I cannot express how much more difficult this was than we anticipated. Julia and Alex: thank you thank you thank you thank you.

Ryan kept the system humming through the night. He tamed the beast: keeping the site online, fixing bugs, and updating the administrative interface in an effort to improve the poor working conditions of Julia and Alex.

I ran the public relations effort: taking interviews, helping input incoming races, and getting the word out about our little project. I also gave enviroVOTE a voice. We set up a Twitter account to tell the nation about environmintiness as the results came in. (For a time, the site automatically twittered with each race result, until we realized that it was sending far more tweets than anyone would ever want to read, and turned it off.)

The aftermath

I’m told the presidential race was noteworthy, though I can’t recall who won — it was just one of nearly 500 races we recorded that night, and we weren’t watching the TV.

Since the 2nd, we’ve fixed a few bugs and we’ve slowly added the final race results as they’ve trickled in. The site is not nearly as dynamic as it was on election night, but maybe we’ll have another few days free next year.

enviroVOTE: Tune in tonight to track the environmintiness of the elections

This morning, Ryan Mark and I launched enviroVOTE!

Conceived last Monday, and built in a three-day coding sprint that ended in the wee hours this morning, the site tracks the environmental impact of the elections by comparing winning candidates with environmentally-friendly endorsements.

enviroVOTE

The numbers

Amy Gahran got the scoop with her E-Media Tidbits post:

The site’s home page features a meter bar currently set to zero. That will change as election results come in tonight. You can also view races by state, with links to specific eco-group endorsements given to specific candidates. …

But the analysis goes deeper than that. Below the meter bar is a percentage figure. That’s where Envirovote gauges the level of enviromintiness of the 2008 elections. Boyer defines enviromintiness as “The freshness of the breath of the nation. Technically, this is the percent change in the eco-friendliness of this year’s elections compared to the last applicable elections for the same seats.”

We calculate the eco-friendliness of a candidate based on how many environmental endorsements they’ve received compared to their race-mates.  Most of the endorsement data, as well as candidate and race information was lovingly sucked through the tubes from Project Vote Smart.  Other data was pulled from Wikipedia and the environmental groups’ websites.

The awesomeness to come

The enviro-meter hasn’t moved yet, but very soon it’ll show the environmental impact of today’s election.  We’ll post the results as they come in tonight, and if America made environminty choices, those bars are gonna start turning green!

So, what are you waiting for?

Check out enviroVOTE tonight, as the polls come in!  And for the play-by-play, follow us on Twitter!

NYT’s new Visualization Lab: They bring the data, you mix the charts

As announced on their excellent Open blog, the Times rolled out a neat tool yesterday:

The New York Times Visualization Lab… allows readers to create compelling interactive charts, graphs, maps and other types of graphical presentations from data made available by Times editors. NYTimes.com readers can comment on the visualizations, share them with others in the form of widgets and images, and create topic hubs where people can collect visualizations and discuss specific subjects.

It’s based on the technology developed by the folks at Many Eyes (about which I’ve blogged before). In this implementation you can’t upload your own data. Instead, the data you’re able to visualize is provided by the Times editors.

Still learning a bit

The interface is pretty kludgy, and the initial data sets don’t quite work with the canned visualizations (NYT folks: if you’re watching, see below for my bug report), but they should be able to work that stuff out.

England and Wales

My other complaint is that the data is more like what I’d look for in an atlas than what I’d expect from a newspaper. Party Affiliation By Religious Tradition, National League HR per AB Leaders 2006-2008, and Sarah Palin’s Speech at the RNC are fun as a start, but don’t realize the potential of this system.

I sure hope data sets discovered while researching New York Times stories get uploaded to the lab. They’ve got to have some FOIAed federal data on their desktops. That kind of stuff is begging for citizen journalism.

Or, do it yourself

If you love this, you’ll want to take a swing at making your own charts over at the full-featured Many Eyes site. I’ve been playing with the Illinois State Board of Education’s schools report card data:

(The Times did make one huge improvement… their embedded charts have a *way* better color scheme.)

Nathan at FlowingData weighed in on the Lab last night:

I said the API was a good step forward. The Visualization Lab is more than a step. … I’m looking forward to seeing how well Times readers take to this new way of interacting.

Agreed. I’m really excited about this. It ain’t perfect, but it’s an exciting development for online news, especially if they start uploading lots of source materials and make it a bit easier to use. The big question is: Will people use it?

NYT to release open-source “document viewer” for investigative journalism

To help create their fantastic piece about Hillary Clinton’s White House schedules, the NYT developed a tool to aid them in analysis of the enormous amount of information that the schedules contained.

Today at the Online News Association conference, Aron Pilhofer, editor of interactive news tech at the NYT, told a session audience that they are planning to release this tool as an open-source project!

(He said it’ll be on Amazon EC2, though I’m not sure exactly what that’ll amount to.)

Details are slim, but this seems like a pretty cool thing. Pilhofer didn’t give a timeline on this project, or on their previously-announced news API, but both are on the way.

I’m guessing it’ll be after the election. They’re probably pretty busy creating all those kick-ass visualizations.

UPDATE: Be sure to check Aron’s comment below. It will be open source, but they’ll also deploy it to EC2 for folks to use instantly.

Thinking about data visualization for journalists

I posted the other day about data visualization tools, but even the best tools can’t save you if you’re clueless about visualization techniques. Most of this stuff isn’t web-specific, but I rant so frequently about this stuff to my classmates that I thought it’d be worthy of a post.

Charts!

Flowing Data recently challenged their readers to improve this chart:

A bad chart

What was the graph trying to show? It was trying to show party registration in California over the past five presidential elections. Did it succeed? No. It failed miserably; however, you did much better. Here are all the reworks.

My favorite rework tells the story far better:

A good chart

More charts

The Gettysburg Powerpoint Presentation is absolutely priceless (quote from Norvig’s “making of” page):

I imagined what Abe Lincoln might have done if he had used PowerPoint rather than the power of oratory at Gettysburg. (I chose the Gettysburg speech because it was shorter than, say, the Martin Luther King “I have a dream” speech, and because I had an idea for turning “four score and seven years” into a gratuitous graph.)

Organizational overview from Gettysburg Address

Cartograms!

Le monde dans les yeux d’un rédac chef (The world in the eyes of an editor in chief) illustrates how news organizations cover the world disproportionately using one of my favorite visualization techniques, cartograms.

The cartograms below show the world through the eyes of editors-in-chief, in 2007. Countries swell as they receive more media attention; others shrink as we forget them.

Cartogram of the Economist’s news coverage

Check out Worldmapper for lots more killer cartograms like this one:

Territory size shows the proportion of the world’s adherents to Islam living there.

Cartogram of national proportions of Muslims worldwide

And no cartogram rant would be complete without the fantastic 2004 election race map:

The (contiguous 48) states of the country are colored red or blue to indicate whether a majority of their voters voted for the Republican candidate (George W. Bush) or the Democratic candidate (John F. Kerry) respectively. The map gives the superficial impression that the “red states” dominate the country, since they cover far more area than the blue ones.

Red and Blue states

In this map, it appears that only a rather small area is taken up by true red counties, the rest being mostly shades of purple with patches of blue in the urban areas.

Purple counties

Further reading

If you’re digging this, and you’re not yet familiar with Edward Tufte’s work… now’s when your mind gets blown. His books, including the classic The Visual Display of Quantitative Information, are absolutely brilliant. I took one of his courses several years ago — it was mind elevating.

One example that Tufte uses has become, as far as I can tell, *the* visual representation of successful data visualization: Charles Minard’s graphic of Napoleon’s March. From the Wikipedia:

Charles Minard’s graphic of Napoleon’s March

The graph displays several variables in a single two-dimensional image:

  • the army’s location and direction, showing where units split off and rejoined
  • the declining size of the army (note e.g. the crossing of the Berezina river on the retreat)
  • the low temperatures during the retreat.

Brilliant.

Hacker journalism: Version control for campaign promises

The always outstanding Threat Level sez:

John McCain’s campaign published a side-by-side comparison of Barack Obama’s Iraq War policy web pages on Tuesday using a new automated online tracking service called Versionista.

Obama statements compared at Versionista

The Friday, July 11 version of the page says: “at great cost our troops have helped reduce violence in some areas of Iraq, but even those reductions do not get us below the unsustainable levels of violence of mid-2006.”

The Monday, July 14 version spidered by Versionista says: “Our troops have heroically helped reduce civilian casualties in Iraq to early 2006 levels. This is a testament to our military’s hard work, improved counterinsurgency tactics, and enormous sacrifice by our troops and military families.”

We (software dorks) have been doing this for years.  It’s how we tell who broke something:

Trac project, Changeset 7273 for trunk

Revision: 380, SoC

Version control is an enormously powerful tool. If you’re making software without it, you’re nuts. (It’s also the primary reason I don’t use word processors to write – there’s no good way to get a diff between two copies of a Word doc. Well, that… and Word sucks.)
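You don’t even need a version control system to get the basic idea. Here’s a small sketch using `difflib` from Python’s standard library to diff the two statements quoted above, word by word. This is only an illustration of the technique, not how Versionista works; the `july-11` and `july-14` labels are names I made up.

```python
import difflib

friday = (
    "at great cost our troops have helped reduce violence in some areas "
    "of Iraq, but even those reductions do not get us below the "
    "unsustainable levels of violence of mid-2006."
)
monday = (
    "Our troops have heroically helped reduce civilian casualties in Iraq "
    "to early 2006 levels. This is a testament to our military’s hard "
    "work, improved counterinsurgency tactics, and enormous sacrifice by "
    "our troops and military families."
)

# Split into words so the diff runs word by word instead of treating
# each statement as one giant changed line.
diff = list(
    difflib.unified_diff(
        friday.split(),
        monday.split(),
        fromfile="july-11",
        tofile="july-14",
        lineterm="",
    )
)

# Lines starting with "-" were removed, lines starting with "+" were added.
print("\n".join(diff))
```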

I just wish a journalist had done this, instead of a campaign worker.

Ugh. Next time.