Nature Conservancy: An Infovis Review

Posted on Tumblr 9/20/2013 to infovis658:

The Nature Conservancy features a Carbon Footprint Calculator that asks the user a series of questions about their energy use with regard to home, travel, diet and waste. Each answer translates to an estimated amount of carbon use; these estimates are added up and compared to the U.S. average.

As I answered the questions, I looked critically at the figures assigned to each potential answer. In particular, I was interested in the choice of pie and bar charts to compare personal energy consumption to a national average.

The Results tab displays a pie chart of my household carbon use next to the U.S. average. According to the calculator, my carbon footprint, at 22 tons of carbon per year, is much lower than the U.S. average and in fact matches the world average for a household of four. This might be expected, since I live in an apartment in a city with a functioning and expansive commuter transit system. But I wasn’t sure it was telling the full story.

The charts break down carbon use into categories for home, travel, diet and waste, indicating that the greatest usage in my household (52.1%) is for Home energy, followed by Diet (22.7%), Travel (12.9%) and Waste (12.3%). This makes sense, as I use public transportation, eat a mostly vegetarian diet, recycle a lot and have limited control over the energy use in my building (I have no real way to adjust the heat).

Putting my usage chart next to another pie chart showing the U.S. average can be somewhat misleading: at first glance it appears that I use more energy than the U.S. average in certain categories, such as Home and Diet, when I really use a greater proportion of a smaller total. The bar chart at right shows the relationship a bit more clearly, since it plots total usage. One complaint is that the bar chart seems to use a different color for Recycling & Waste than the pie chart. It is also difficult to tell whether the light blue area for the World Average represents a specific category or total carbon usage.
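To make the proportion-versus-total distinction concrete, here is a quick sketch that converts each pie’s percentage shares back into absolute tons. My shares are the ones from the chart; the U.S.-average total and shares are hypothetical placeholders, since I did not record the exact figures.

```python
# Convert percentage shares into absolute tons of CO2 to show that a
# bigger pie slice can still mean less total carbon than the average.

MY_TOTAL = 22.0   # tons CO2/year, from the calculator
US_TOTAL = 53.0   # hypothetical U.S.-average total for a household of four

my_shares = {"Home": 0.521, "Diet": 0.227, "Travel": 0.129, "Waste": 0.123}
us_shares = {"Home": 0.40, "Diet": 0.15, "Travel": 0.35, "Waste": 0.10}  # placeholders

for category in my_shares:
    mine = MY_TOTAL * my_shares[category]
    avg = US_TOTAL * us_shares[category]
    print(f"{category:6s} mine: {mine:5.1f} t   U.S. avg: {avg:5.1f} t")
```

Even with made-up averages, the point holds: my Home slice is a larger percentage than the average household’s, yet the absolute tonnage behind it is smaller.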

The questions don’t reflect much granularity in energy use; each one assigns a predetermined amount of carbon to each answer about a suggested usage modification. These amounts include an estimated decrease from the U.S. average if you are doing something to reduce carbon use in that area; zero change if you note that you are doing only a little, or rarely; or an increase in carbon if you are not implementing the suggested change. (A sketch of this scoring follows the example below.)

Example: “We’ve taken steps to heat and cool our home efficiently.”

Impact in Tons of CO2:

Wherever possible: -1.5

In some areas: 0.0

Very little: 1.3
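Judging from the question format, the scoring appears to be a simple lookup: each answer maps to a fixed delta that is summed onto a baseline. Here is a minimal sketch of that idea; the impact values come from the example above, while the baseline figure, question keys and overall structure are my own assumptions.

```python
# Minimal sketch of the calculator's apparent scoring: each answer maps
# to a fixed CO2 delta (in tons) that is added to a baseline average.

US_AVERAGE = 53.0  # hypothetical baseline, tons CO2/year per household

IMPACTS = {
    "heat_cool_efficiently": {   # impact values from the example above
        "wherever_possible": -1.5,
        "in_some_areas": 0.0,
        "very_little": 1.3,
    },
}

def estimate_footprint(answers: dict[str, str]) -> float:
    """Sum each answer's delta onto the baseline average."""
    return US_AVERAGE + sum(IMPACTS[q][a] for q, a in answers.items())

print(estimate_footprint({"heat_cool_efficiently": "wherever_possible"}))
```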

For example, my apartment building is 100 years old and uses oil heat. The co-op board’s plan has been to convert gradually to oil with a lower carbon concentration and eventually to switch to natural gas, so the higher carbon concentration is currently unavoidable. I may be doing all I can within my own apartment to reduce the carbon footprint of my living space, but it may still be higher than that of someone in a more efficient building who is doing less.

The chart provides a few helpful calls to action next to the results. One is a rather clever way to use the results to help users offset their guilt, or rather to “Offset Your Carbon Footprint Now.” This may ultimately be the goal of the visualization, but it appears flawed from a non-profit development viewpoint. Each metric ton of carbon use per year is multiplied by a $15 donation to indicate how much your household should contribute to the Nature Conservancy. It says I should give $300. But doesn’t this mean that the more someone cares about the issue, and the more they try to reduce their carbon footprint, the lower the donation estimate?

I would take it a step further. Non-profit solicitations usually offer a range of suggested donations. Why not use that figure as a baseline for a minimum donation? Then compare it to the U.S. average again and ask users whether they would like to increase their donation to offset the additional carbon use.
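As a sketch of that suggestion: charge $15 per ton on the user’s own footprint as the minimum, then offer the gap to the U.S. average as an optional top-up. The rate is from the calculator; the U.S.-average figure is again a placeholder.

```python
# Suggested-donation scheme: $15/ton on your own footprint as a minimum,
# plus an optional top-up covering the gap to the U.S. average.

RATE = 15.0        # dollars per metric ton CO2, per the calculator
US_AVERAGE = 53.0  # hypothetical U.S.-average footprint, tons/year

def suggested_donation(my_footprint: float) -> tuple[float, float]:
    minimum = RATE * my_footprint
    top_up = RATE * max(US_AVERAGE - my_footprint, 0.0)
    return minimum, top_up

minimum, top_up = suggested_donation(22.0)
print(f"minimum: ${minimum:.0f}, optional offset top-up: ${top_up:.0f}")
```

Under this scheme the donation ask no longer shrinks as someone reduces their footprint; the reduction simply moves dollars from the mandatory minimum into the optional offset.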

U.S. Historical Voting Patterns: An Infovis Review

Posted to Tumblr on 10/4/2013 to infovis658:

This series of geographic representations of voting patterns in the U.S., by Elizabeth Anderson and Jeffrey Jones of the University of Michigan, focuses on changes over time in the Southern states, particularly the early strength of the Democratic Party in the South and the events that precipitated the dissipation of its hold there. It spans the years from the post-Civil War election of 1868 to the 1984 presidential election. I was interested in exploring this topic after getting into an online debate with my cousin, who insisted that Southern Democrats remain responsible for inequalities among minority voter populations. My contention was that the Southern Democratic power base began to lose strength after the Civil War and had dissolved by the time of the Reagan administration.

In a series of twelve slides, this presentation tells the story of the transformation of the Southern voter from a solidly Democratic bloc to a more diverse political region by the 1980s. It takes the viewer through the history of the Civil War and Reconstruction and the KKK’s terrorist campaign to eliminate black representation, then compares the results of the presidential and congressional campaigns of 1900, 1922 and 1948 with those of the 1964 and 1968 campaigns, which showed a decline in Democratic support in the region. I was very interested to learn that Nixon had initially rejected running an overtly racist campaign but then pushed ahead with the so-called “Southern Strategy,” which indeed broke up what had been the Democratic Party’s “Solid South.” The series ends with a statement that racism in Republican campaigns remained covert and coded into the 1984 campaign.

While the data presented did support my argument, there were some problems with the presentation that caused initial confusion when reviewing the slides. First, though, I was pleased to see that the data was represented on a scale from 0 to 100% throughout the presentation, which is generally good practice. I would have preferred colors ranging from white to dark rather than spanning the full spectrum. Also, because the maps generally compare the Democratic and Republican parties, and the differences are stark on most of the images, the juxtaposition of strongly saturated blue and red hues violates MacDonald’s color selection guideline for preventing depth-perception problems (chromostereopsis) and confusing afterimages. The presentation does include a color key, as MacDonald recommends, and the researchers limited the scale to five colors, which is also recommended.

One confusing aspect was that the quantity being measured often flipped from one slide to the next, showing the opposite of what one would expect. For example, the data for the 1922 congressional election shows the percent vote for Republican candidates, with a fairly solid blue (0%) bloc in the Southern states. The following slide, showing the percent vote for Democratic congressional candidates in 1948, flips the colors: most of the South is red (100%) except for Florida and the area around Atlanta and northern Georgia. This required a shift in thinking about the colors, since a blue bloc turning almost entirely red in the next slide seems to tell a drastically different story when it was intended to show a similar idea.

Source: Anderson, E., & Jones, J. (2002, September). Race, voting rights and segregation: Rise and fall of the black voter. Retrieved from http://www.umich.edu/~lawrace/votetour1.htm

Reference: MacDonald, L. W. (1999). Using color effectively in computer graphics. IEEE Computer Graphics and Applications, 19(4), 20–35.

pLogo DNA Sequencing: An Infovis Review

Posted on Tumblr today to infovis658:

pLogo is a visualization method developed at the University of Connecticut and Harvard Medical School to study DNA and protein sequences. The team published an interactive version at http://plogo.uconn.edu so that scientists can analyze their own data. The pLogo methodology was published in Nature Methods on October 6, 2013, and reported in Medical Xpress:

Medical Xpress:

http://medicalxpress.com/news/2013-10-tool-visualizing-dna-protein-sequences.html#inlRlv

Nature Methods:

http://www.nature.com/nmeth/journal/vaop/ncurrent/full/nmeth.2646.html

pLogo takes information from a DNA or protein sequence and maps it on a chart showing the log-odds of the binomial probability of individual letters, which represent the biological residues that make up each molecule. The size of each letter indicates its level of statistical significance, and the color represents its physicochemical properties.
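As I understand the statistic (and this is my reading, not the paper’s exact formula), a residue appearing k times at a position across n foreground sequences, with background frequency p, gets a height proportional to the log-odds of that count under a binomial model, with overrepresented residues scoring positive. A rough sketch:

```python
# Rough sketch of a log-odds-of-binomial-probability statistic, my
# reading of how pLogo scales letter heights; the exact formula and
# sign conventions in the published method may differ.
from math import log10
from scipy.stats import binom

def log_odds(k: int, n: int, p: float) -> float:
    """Log-odds of seeing k or more occurrences in n sequences given
    background frequency p; large positive = strongly overrepresented."""
    over = binom.sf(k - 1, n, p)    # P(X >= k)
    under = binom.cdf(k - 1, n, p)  # P(X < k)
    return log10(under / over)

# e.g. 40 occurrences at one position across 100 sequences, background 20%
print(log_odds(40, 100, 0.20))
```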

A help guide on the pLogo website was useful in interpreting the data, particularly since I have no training in DNA sequencing. Rolling your mouse over various components of the chart provides a popup explanation of each feature.

http://plogo.uconn.edu/help/plogomap

The pLogo map is useful for someone who is new to the content of the visualization. I hadn’t noticed the red horizontal lines that represent “…the p = 0.05 statistical significant threshold following Bonferroni correction.” The lines help the user focus on the area of the chart that contains significant information. These lines could be thicker, or red could be removed from the letter colors, to help the threshold lines stand out better.
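For what it’s worth, the Bonferroni correction itself is just the significance level divided by the number of simultaneous tests; where the red lines then land on the height scale would follow from feeding the corrected alpha back through the height statistic. A tiny sketch, with the test count assumed:

```python
# Bonferroni correction: divide alpha by the number of simultaneous tests.
alpha = 0.05
n_tests = 20 * 14   # assumed: 20 amino acids tested at each of 14 positions
corrected = alpha / n_tests
print(f"corrected threshold: p < {corrected:.2e}")  # p < 1.79e-04
```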

I found the “column numbers” label running through the center of the chart at the zero axis strange, since there is no indication whether this area contains no data or is simply an inserted label. Having the zero tick mark in the center of the label seems to indicate that there were no values near zero, but it could be that we are supposed to read the lines above and below it as zero. It is hard to tell. I also felt that the letters would be easier to read if the minimum size were larger; some of the least significant letters are difficult to make out.

What I liked about the program is that it includes an algorithm that analyzes and autocorrects input errors, similar to Tableau’s feature that automatically selects a chart format and flags duplicate data with an asterisk. If this could be coupled with a Google Refine-like editing feature, it would be very powerful.

According to the FAQ:

“Foregrounds are preprocessed and filtered before being used for pLogo generation. Sequences with invalid characters or widths that do not match the majority will be discarded by this preprocessing step. The foreground preprocessing will also remove duplicate sequences in the foreground (retaining only 1 instance of the duplicated sequence). To see which sequences were removed by the foreground preprocessing stage, click the “foreground preprocessing” tab below the foreground input box. Numbers in the right hand column of this window can be clicked to view the sequences that were removed for a given reason.”

I couldn’t tell whether it allows you to edit or correct the removed sequences, but it does allow you to export the data, so presumably you can make edits and import it again. The pLogo team also provides a FAQ and videos explaining how to use the interactive features of the website. I don’t have a DNA dataset and couldn’t test it myself, so the videos were helpful.
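Based on the FAQ’s description, the preprocessing presumably amounts to something like the following sketch; the valid alphabet and the details of the checks are my own guesses.

```python
# Sketch of the foreground preprocessing the FAQ describes: discard
# sequences with invalid characters or non-majority widths, then dedupe.
from collections import Counter

VALID = set("ACDEFGHIKLMNPQRSTVWY")  # assumed: the 20 amino-acid letters

def preprocess(foreground: list[str]) -> list[str]:
    widths = Counter(len(s) for s in foreground)
    majority_width = widths.most_common(1)[0][0]
    seen, kept = set(), []
    for seq in foreground:
        if len(seq) != majority_width:  # width mismatch: discard
            continue
        if not set(seq) <= VALID:       # invalid characters: discard
            continue
        if seq in seen:                 # duplicate: keep first instance
            continue
        seen.add(seq)
        kept.append(seq)
    return kept

print(preprocess(["ACDEF", "ACDEF", "ACDXF", "ACD"]))  # -> ['ACDEF']
```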

References:

O’Shea, J. P., et al. (2013). pLogo: A probabilistic approach to visualizing sequence motifs. Nature Methods. http://dx.doi.org/10.1038/nmeth.2646