The Occasional Mentor: On Data Science in UX, Content Strategy vs UX Writing and the Durability of Digital Humanities

THE OCCASIONAL MENTOR: A monthly(ish) column based on questions I’ve answered on Quora, questions I’ve heard in Slack groups, and other career advice I’ve given over the prior month. Hope you like it, but feel free to challenge me in the comments if you have a different experience. Below are questions I answered in June. How are … Read more

Driving the Dome

For eight weeks every Wednesday, Brett and I snuck away under darkness and rain to the American Museum of Natural History’s Hayden Planetarium to learn how to make a star show. We learned the ins and outs of the planetarium’s computer controls and presented shows for our family and friends on March 30th in the dome. … Read more

Infovis Review: Fast Company World Leaders

I posted this today on Tumblr at infovis658:

This infographic, from Fast Company’s fastcocreate.com website, sets out to describe what each country on the planet leads the world in. At first glance the labels resemble a tag cloud, with their informal font style, curved placement and color scheme. The dangling participle in the title also lends informality and conversational whimsy to the graphic, leading one to expect humor rather than a serious analysis.

Antarctica may in fact lead the world in emperor penguins, but it is clearly not meant to be a statement of economic or political leadership. Also, with no underlying data presented, we are left to take the infovizzer’s word for the results displayed. While some of the statements are interesting, such as Australia leading the world in melanoma, which could be related to its proximity to the widening ozone hole, there doesn’t appear to be any connection between each country and the item or items in which it leads. Or is there? The US leads in Nobel laureates and lawnmower deaths. Could it be a statement on the distribution of lauded intellectuals versus Darwin awardees? But then, over in Russia, what do raspberries and nuclear warheads have in common? Perhaps a thumbing of the nose at external powers? Clearly, most of the items are not meant to be taken seriously. The graphic is humorous and fun to explore, and I am sure it sparks a lot of interesting conversations.

http://www.fastcocreate.com/3020280/creativity-by-the-numbers/this-map-shows-what-every-country-leads-the-world-in-and-its-not-e

New York Times Paywall: an Infovis Review

My Tumblr post today to infovis658:

This past week, Ryan Chittum of the Columbia Journalism Review published a series of articles examining the effect of The New York Times’s paywall on revenue.

Accompanying the pieces are a series of graphs breaking out various categories of revenue. The first article reports that the paywall has overtaken digital advertising as a source of revenue for the Times. Chittum follows up with two additional articles exploring, with graphs, how digital revenue figures compare to revenue from print advertising and subscriptions, and whether having a paywall at all is justified. The analysis seems to indicate that a paywall is not only justified, but necessary.

There are no graphs in the first article, only a series of paragraphs citing revenue data and subscriber statistics. Without a graphic, one simply understands that revenue is up, that digital subscriptions have surpassed digital advertising, and that the paywall has had some significant, positive effect on print subscriptions. (Chittum links to an earlier article on this topic.)

What is interesting about the series of articles is how it has become a conversation of sorts between two colleagues at the Review. Chittum notes that his colleague, Felix Salmon, had a negative interpretation of the digital revenue figures, suggesting that the only reason digital subscriptions surpassed digital advertising is that ad revenue is falling. In response, Chittum published a number of graphs addressing his colleague’s concerns in the November 5 article, and he continues the conversation with additional analyses on November 6.

Most of the graphs are stacked column charts representing the total revenue for various income categories. In the November 5 article, Chittum places a stacked column chart of digital advertising versus digital subscription revenue next to a standard column chart to highlight the difference between the two categories. The stacked column charts showing revenue with and without the paywall show a similar trend (shape) in overall revenue, albeit with a much higher total when the paywall is included.

There is a clear drop in overall revenue between 3Q 2008 and 3Q 2009, primarily due to a drop in print advertising, despite what appears to be slight growth in print circulation during that period. Chittum notes that the paywall appears to be the reason revenues have not fallen over the past few years, and that while it is currently “peanuts” compared to print revenue, it will play an important and necessary role in the future, especially as the Times moves toward an all-digital medium within the next ten years.

References

Chittum, R. (November 1, 2013). “The NYT’s paywall overtakes digital ads: Meantime, the Globe’s drag on the Times, quantified.” Columbia Journalism Review. Web. Retrieved from http://www.cjr.org/the_audit/the_stand-alone_new_york_times.php

Chittum, R. (November 5, 2013). “The NYT paywall don’t get no respect: despite saving the paper’s bacon.” Columbia Journalism Review. Web. Retrieved from http://www.cjr.org/the_audit/the_nyt_paywall_shores_up_the.php

Chittum, R. (November 6, 2013). “The NYT paywall plugs the hole: charting the state of The New York Times.” Columbia Journalism Review. Web. Retrieved from http://www.cjr.org/the_audit/the_paywalls_piece_of_the_pie.php

Nature Conservancy: An Infovis Review

Posted on Tumblr 9/20/2013 to infovis658:

The Nature Conservancy features a Carbon Footprint Calculator that asks the user a series of questions about their energy use with regard to home, travel, diet and waste. Each answer translates to an estimated amount of carbon, and the estimates are added up and compared to the U.S. average.

As I answered the questions, I looked critically at the figures assigned to each possible answer. In particular, I was interested in the choice of pie and bar charts to compare personal energy consumption to a national average.

The Results tab displays a pie chart of my household carbon use next to the U.S. average. According to the calculator, my carbon footprint of 22 tons of carbon per year is much lower than the U.S. average and in fact matches the world average for a household of four people. This might be expected, since I live in an apartment in a city with a functioning and expansive commuter transit system. But I wasn’t sure it was telling the full story.

The charts break down carbon use by categories for home, travel, diet and waste, indicating that the greatest usage (52.1%) in my household is for Home energy, followed by Diet (22.7%), Travel (12.9%) and Waste (12.3%). This makes sense, as I use public transportation, eat a mostly vegetarian diet, recycle a lot and have limited control over the energy use in my building (I have no real way to adjust the heat). Putting my usage chart next to another pie chart indicating the U.S. average can be somewhat misleading: at first glance it appears that I am using a larger amount of energy than the U.S. average in certain categories such as Home and Diet, when it is really a greater proportion. The bar chart at right shows the relationship a bit more clearly, since it shows total usage. One complaint is that the bar chart seems to use a different color for Recycling & Waste than the pie chart does. Also, it is difficult to tell whether the light blue area for the World Average represents a specific category or total carbon usage.
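To make the slice-versus-total point concrete, here is a tiny arithmetic sketch. The 22 tons and 52.1% figures are from my results above; the U.S.-average numbers below are made-up placeholders, not the calculator’s figures.

```python
# Slice vs. total: a bigger share of a smaller pie can still be a smaller amount.
# (22 tons and 52.1% are my results; the U.S. figures are placeholders.)
my_total, my_home_share = 22.0, 0.521
us_total, us_home_share = 48.0, 0.30   # placeholders, not the calculator's data

print(f"My home energy:   {my_total * my_home_share:.1f} tons")   # ~11.5 tons
print(f"U.S. home energy: {us_total * us_home_share:.1f} tons")   # ~14.4 tons
```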

The questions do not reflect much granularity in energy use; each one assigns a predetermined amount of carbon depending on how fully you have adopted a suggested usage modification: an estimated decrease from the U.S. average if you are doing something to reduce carbon use in that area, zero change if you note that you are doing it only a little or rarely, or an increase in carbon if you are not implementing the suggested change.

Example: “We’ve taken steps to heat and cool our home efficiently.”

Impact in tons of CO2:

Wherever possible: -1.5

In some areas: 0.0

Very little: 1.3
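
Sketched in code, the scoring appears to work roughly like this. The only real numbers are the -1.5 / 0.0 / +1.3 impacts quoted above; the U.S.-average baseline and the question structure are my own placeholders.

```python
# A minimal sketch of how the calculator's scoring appears to work.
# Only the -1.5 / 0.0 / +1.3 impacts come from the example above; the
# baseline and any other questions are placeholders.

US_AVERAGE_TONS = 27.0  # placeholder; substitute the calculator's actual figure

# Each question maps an answer to a delta (in tons of CO2) from the U.S. average.
QUESTIONS = {
    "We've taken steps to heat and cool our home efficiently.": {
        "Wherever possible": -1.5,
        "In some areas": 0.0,
        "Very little": 1.3,
    },
    # ...additional home, travel, diet and waste questions would follow
}

def estimate_footprint(answers: dict) -> float:
    """Add the per-answer deltas to the U.S.-average baseline."""
    return US_AVERAGE_TONS + sum(QUESTIONS[q][a] for q, a in answers.items())

my_answers = {"We've taken steps to heat and cool our home efficiently.": "Wherever possible"}
print(f"Estimated footprint: {estimate_footprint(my_answers):.1f} tons of CO2 per year")
```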

For example, my apartment building is 100 years old and uses oil heat. The co-op board’s plan has been to convert slowly to oil with a lower carbon concentration and eventually switch to natural gas, so the higher carbon concentration is currently unavoidable. I may be doing all I can within my own apartment to reduce the carbon footprint of my living space, but it may still be higher than that of someone in a more efficient building who is doing less.

The chart provides a few helpful calls to action next to the results. One is a rather clever way to use the results to help users offset their guilt, or rather to “Offset Your Carbon Footprint Now.” This may ultimately be the goal of the visualization, but it appears flawed from a non-profit development viewpoint. Each metric ton of carbon use per year is multiplied by a $15 donation to indicate how much your household should contribute to the Nature Conservancy. It says I should give $300. But doesn’t this mean that the more someone cares about the issue, and the more they try to reduce their carbon footprint, the lower the donation estimate will be?

I would take it a step further. Non-profit solicitations usually offer a range of suggested donations. Why not use that figure as a baseline for a minimum donation? Then, compare it to the US average again and ask the user if they would like to up their donation to offset additional carbon use.
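In code, the current ask and the suggested alternative might look like the sketch below. The $15-per-ton multiplier comes from the calculator; the function names and the optional-offset flag are my own placeholders.

```python
DOLLARS_PER_TON = 15.0  # the calculator's multiplier

def current_ask(footprint_tons: float) -> float:
    """What the calculator does now: the smaller the footprint, the smaller the ask."""
    return footprint_tons * DOLLARS_PER_TON

def suggested_ask(footprint_tons: float, us_average_tons: float,
                  add_offset: bool = False) -> float:
    """The alternative floated above: treat the footprint-based figure as a
    minimum, then optionally add the cost of offsetting the gap up to the
    U.S. average as well."""
    minimum = footprint_tons * DOLLARS_PER_TON
    extra = max(us_average_tons - footprint_tons, 0.0) * DOLLARS_PER_TON
    return minimum + (extra if add_offset else 0.0)
```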

U.S. Historical Voting Patterns: An Infovis Review

Posted to Tumblr on 10/4/2013 to infovis658:

This series of geographic representations of voting patterns in the U.S. by Elizabeth Anderson and Jeffrey Jones of the University of Michigan focuses on changes over time in the Southern states, particularly the early strength of the Democratic Party in the South and the events that precipitated the dissipation of the Democratic hold on the region. It spans the years from post-Civil War 1868 to the 1984 Presidential election. I was interested in exploring this topic after getting into an online debate with my cousin, who was insisting that Southern Democrats remain responsible for inequalities in minority voter populations. My contention was that the Southern Democratic power base began to lose strength after the Civil War and had dissolved by the time of the Reagan administration.

In a series of twelve slides, this presentation tells a story of the transformation of the Southern voter from a solidly Democratic block to a more diverse political region in the 1980s. It takes the viewer through the history of the Civil War and Reconstruction to the KKK’s terrorist campaign to eliminate black representation, and then compares the election results of the Presidential and Congressional campaigns of 1900, 1922, and 1948 and the results of the 1964 and 1968 campaigns, which showed a decline in Democratic support in the region. I was very interested to learn that Nixon had initially rejected running an overtly racist campaign, but then pushed ahead with the so-called “Southern Strategy,” which indeed broke up what had been the Democratic party’s “Solid South.” The series ends with a statement that racism in Republican campaigns remained covert and coded into the 1984 campaign.

While the data presented did support my argument, there were some problems with the presentation that caused some initial confusion when reviewing the slides. First, though, I do want to point out that I was pleased to see the data represented on a scale from 0 to 100% throughout the presentation, which is generally good practice. I would have preferred that the colors ranged from white to dark rather than spanning the full spectrum. Also, because the maps generally compare the Democratic and Republican parties, and the differences are stark in most of the images, the juxtaposition of strong blue and red hues violates MacDonald’s color selection guideline for preventing depth perception problems and confusing afterimages. The presentation does include a color key, as recommended by MacDonald, and the researchers limited the scale to five colors, which is also recommended.

One thing that was confusing was that the quantity being measured often flipped from one slide to the next. For example, the data for the 1922 Congressional election shows the percent vote for Republican candidates, with a fairly solid blue (0%) bloc in the Southern states. The following slide, showing the percent vote for Democratic Congressional candidates in 1948, flips the colors: most of the South is red (100%) except for Florida and the area around Atlanta and northern Georgia. This required a shift in thinking about the colors, since a blue bloc turning almost entirely red in the next slide seems to indicate a drastically different story, when it was intended to show a similar idea.

Source: Anderson, E. and J. Jones. (September 2002). Race, voting rights and segregation: rise and fall of the black voter. Retrieved from http://www.umich.edu/~lawrace/votetour1.htm

Reference: MacDonald, Lindsay W. (1999). “Using Color Effectively in Computer Graphics.” IEEE Computer Graphics and Applications 19(4): 20–35.

pLogo DNA Sequencing: An Infovis Review

Posted on Tumblr today on infovis658:

pLogo is a visualization method developed at the University of Connecticut and Harvard University Medical School to study DNA and protein sequences. The team published an interactive version at http://plogo.uconn.edu so scientists can analyze their own data. The pLogo methodology was published in Nature Methods on October 6 and reported in Medical Xpress:

Medical Xpress:

http://medicalxpress.com/news/2013-10-tool-visualizing-dna-protein-sequences.html#inlRlv

Nature Methods:

http://www.nature.com/nmeth/journal/vaop/ncurrent/full/nmeth.2646.html

pLogo takes information from a DNA or protein sequence and maps it on a chart showing the log-odds of the binomial probability of individual letters, which represent the biological residues that make up each molecule. The size of each letter indicates its level of statistical significance, and the color represents its physicochemical properties.
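To make the statistic concrete, here is a small sketch of a binomial tail probability and its log-odds for a single residue position. This is my own illustration with made-up counts and a made-up background frequency, not the published pLogo algorithm.

```python
import math

def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the chance of seeing a residue at
    least k times in n sequences if it occurred at background rate p."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def log_odds(prob: float) -> float:
    """log10 odds of a probability, the kind of quantity pLogo plots."""
    prob = min(max(prob, 1e-300), 1 - 1e-12)  # guard against log(0) and division by zero
    return math.log10(prob / (1 - prob))

# Made-up example: a residue seen 40 times at one position across 100
# foreground sequences, against a 5% background frequency.
p_over = binomial_tail(40, 100, 0.05)
print(f"overrepresentation log-odds: {log_odds(p_over):.1f}")

# The red lines the site describes mark p = 0.05 after Bonferroni correction;
# with, say, 20 residues tested per position, that threshold would be 0.05 / 20.
print(f"Bonferroni-corrected threshold (assumed 20 tests): {0.05 / 20:.4f}")
```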

A help guide on the pLogo website was useful in interpreting the data, particularly since I have no training in DNA sequencing. Rolling your mouse over various components of the chart provides a popup explanation of each feature.

http://plogo.uconn.edu/help/plogomap

The pLogo map is useful for someone who is new to this kind of visualization. I hadn’t noticed the red horizontal lines that represent “…the p = 0.05 statistical significant threshold following Bonferroni correction.” The lines help the user focus on the area of the chart that contains significant information. These lines could be thicker, or red could be removed from the letter colors, to help them stand out better.

I felt the “column numbers” label running through the center of the chart at the zero axis was strange, since there is no indication whether this area contains no data or is simply an inserted label. Having the zero tick mark in the center of the column numbers label seems to indicate that there were no values near zero, but it could be that we are supposed to read the lines above and below it as zero. It is hard to tell. I also felt that the letters would be easier to read if the minimum size were larger; some of the least significant letters are difficult to read.

What I liked about the program is that it includes an algorithm that analyzes and autocorrects input errors, similar to Tableau’s function that automatically selects a chart format and indicates any duplicate data with an asterisk. If this could be coupled with a Google Refine-like editing feature it would be very powerful.

According to the FAQ:

“Foregrounds are preprocessed and filtered before being used for pLogo generation. Sequences with invalid characters or widths that do not match the majority will be discarded by this preprocessing step. The foreground preprocessing will also remove duplicate sequences in the foreground (retaining only 1 instance of the duplicated sequence). To see which sequences were removed by the foreground preprocessing stage, click the “foreground preprocessing” tab below the foreground input box. Numbers in the right hand column of this window can be clicked to view the sequences that were removed for a given reason.”
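A rough sketch of that kind of preprocessing might look like the following. This is my approximation of the behavior the FAQ describes, not pLogo’s actual code; the amino-acid alphabet and the report format are assumptions.

```python
from collections import Counter

VALID_CHARS = set("ACDEFGHIKLMNPQRSTVWY")  # assumed amino-acid alphabet

def preprocess_foreground(sequences):
    """Drop sequences with invalid characters or off-majority widths, and
    remove duplicates (keeping one instance), reporting what was removed."""
    removed = {"invalid characters": [], "mismatched width": [], "duplicate": []}
    widths = Counter(len(s) for s in sequences)
    majority_width = widths.most_common(1)[0][0] if widths else 0

    kept, seen = [], set()
    for seq in sequences:
        if not set(seq) <= VALID_CHARS:
            removed["invalid characters"].append(seq)
        elif len(seq) != majority_width:
            removed["mismatched width"].append(seq)
        elif seq in seen:
            removed["duplicate"].append(seq)
        else:
            seen.add(seq)
            kept.append(seq)
    return kept, removed

kept, removed = preprocess_foreground(["ACDK", "ACDK", "ACX!", "ACD"])
print(kept)                                   # ['ACDK']
print({reason: len(seqs) for reason, seqs in removed.items()})
```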

I couldn’t tell whether it lets you edit or correct the discarded sequences, but it does allow you to export the data so that, presumably, you can make edits and import it again. The pLogo team also provides a FAQ and videos explaining how to use the interactive features of the website. I don’t have a DNA dataset and couldn’t test it myself, so the videos were helpful.

References:

O’Shea, J. P. et al. (October 6, 2013). “pLogo: a probabilistic approach to visualizing sequence motifs.” Nature Methods. Nature Publishing Group. Web. http://dx.doi.org/10.1038/nmeth.2646

Towards a 9/11 GeoArchive

Imagine if the most graphic and expressive artifacts from one of the most historic events in New York City lay rolled in tubes in a dusty corner. What if millions of bytes of geographic data, produced through an unprecedented community collaboration, were dispersed, disconnected and hidden from public view? If you had the opportunity to … Read more

NY Times: Japan Interactive Earthquake Map

The New York Times’ Interactive Map of the Damage from the Earthquake in Japan: http://www.nytimes.com/packages/flash/newsgraphics/2011/0311-japan-earthquake-map/index.html?hp I was able to locate the town where my friend Pia’s brother is teaching English (center of quake zone but far from the nuclear plants, very little structural damage, no casualties) and where my daughter’s camp friend’s family lives (quite … Read more