Posted on Tumblr today on infovis658:
pLogo is a visualization method developed at the University of Connecticut and Harvard Medical School to study DNA and protein sequences. The team published an interactive version at http://plogo.uconn.edu so scientists can analyze their own data. The pLogo methodology was published in Nature Methods on October 6, 2013, and reported in Medical Xpress:
Medical Xpress:
http://medicalxpress.com/news/2013-10-tool-visualizing-dna-protein-sequences.html#inlRlv
Nature Methods:
http://www.nature.com/nmeth/journal/vaop/ncurrent/full/nmeth.2646.html
pLogo takes information from a DNA or protein sequence and maps it on a chart showing the log-odds of the binomial probability of individual letters, which represent the biological residues that make up each molecule. The size of each letter indicates its level of statistical significance. The color represents its physicochemical properties.
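To make the "log-odds of the binomial probability" idea concrete for myself, I put together a minimal sketch. It assumes the letter height is the negative log10 ratio of two binomial tail probabilities: the chance of seeing at least K copies of a residue at a position versus the chance of seeing at most K, given N foreground sequences and a background frequency p. That reading and the function names are my own assumptions, not code from the pLogo site, and it is only meant for small toy inputs.

```python
from math import comb, log10

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def log_odds(K, N, p):
    """Assumed pLogo-style height for a residue seen K times in N sequences.

    Positive values suggest over-representation relative to the background
    frequency p; negative values suggest under-representation.
    """
    over = sum(binom_pmf(k, N, p) for k in range(K, N + 1))   # Pr(k >= K)
    under = sum(binom_pmf(k, N, p) for k in range(0, K + 1))  # Pr(k <= K)
    return -log10(over / under)

# Toy example: a residue with 10% background frequency that appears
# in 9 of 10 foreground sequences at one position.
height = log_odds(9, 10, 0.1)
```

Under this assumption, a residue that shows up far more often than the background predicts gets a large positive height, and one that shows up far less often gets a large negative one, which matches how letters sit above or below the axis on the chart.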
A help guide on the pLogo website was useful in interpreting the data, particularly since I have no training in DNA sequencing. Rolling your mouse over various components of the chart provides a popup explanation of each feature.
http://plogo.uconn.edu/help/plogomap
The pLogo map is useful for someone who is new to the content of the visualization. I hadn’t noticed the red, horizontal lines that represent “…the p = 0.05 statistical significant threshold following Bonferroni correction.” The lines help the user focus on the area of the chart that represents significant information. These lines could be thicker, or red could be eliminated from the letter colors, to help them stand out better.
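Since I had to look up what a Bonferroni correction does, here is a rough sketch of how I understand the red lines. The correction divides the 0.05 threshold by the number of tests. I am assuming one test per residue per column, and I am assuming the line is drawn at roughly the log-odds height that corresponds to the corrected threshold. Both assumptions and the function names are mine and may not match what pLogo actually does.

```python
from math import log10

def bonferroni_alpha(alpha, n_positions, n_residues):
    """Divide the significance threshold by the assumed number of tests."""
    return alpha / (n_positions * n_residues)

def approx_threshold_height(alpha, n_positions, n_residues):
    """Approximate log-odds height where a tail probability equals the
    corrected alpha, treating the opposite tail as (1 - corrected alpha)."""
    corrected = bonferroni_alpha(alpha, n_positions, n_residues)
    return log10((1 - corrected) / corrected)

# Hypothetical protein motif: 15 columns, 20 possible amino acids per column.
corrected = bonferroni_alpha(0.05, 15, 20)
line_height = approx_threshold_height(0.05, 15, 20)
```

With these made-up numbers there are 300 tests, so a letter has to clear a much stricter bar than p = 0.05 on its own before it crosses the red line.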
I felt the “column numbers” running through the center of the chart at the zero axis were strange, since there is no indication whether this area contains no data or is simply an inserted label. Having the zero tick mark in the center of the column numbers label seems to indicate that there were no values near zero, but it could be that we are supposed to read the lines above and below the label as zero. It is hard to tell. I also felt that the letters would be easier to read if the minimum letter size were larger. Some of the least significant letters are difficult to read.
What I liked about the program is that it includes an algorithm that analyzes and autocorrects input errors, similar to Tableau’s function that automatically selects a chart format and indicates any duplicate data with an asterisk. If this could be coupled with a Google Refine-like editing feature it would be very powerful.
According to the FAQ:
“Foregrounds are preprocessed and filtered before being used for pLogo generation. Sequences with invalid characters or widths that do not match the majority will be discarded by this preprocessing step. The foreground preprocessing will also remove duplicate sequences in the foreground (retaining only 1 instance of the duplicated sequence). To see which sequences were removed by the foreground preprocessing stage, click the ‘foreground preprocessing’ tab below the foreground input box. Numbers in the right hand column of this window can be clicked to view the sequences that were removed for a given reason.”
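The three filtering rules in that FAQ passage can be sketched in a few lines. This is my own guess at the logic, not the pLogo team's code: the function name, the 20-letter amino acid alphabet, and the order of the checks (invalid characters first, then the majority width, then duplicates) are all assumptions.

```python
from collections import Counter

# Assumed alphabet: the 20 standard amino acid letters.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

def preprocess_foreground(sequences, alphabet=AMINO_ACIDS):
    """Return (kept, removed), where removed groups sequences by reason."""
    removed = {"invalid_characters": [], "wrong_width": [], "duplicate": []}

    # Rule 1: discard empty sequences or ones with characters outside the alphabet.
    valid = []
    for seq in sequences:
        seq = seq.strip().upper()
        if not seq or set(seq) - alphabet:
            removed["invalid_characters"].append(seq)
        else:
            valid.append(seq)
    if not valid:
        return [], removed

    # Rule 2: find the most common width among the remaining sequences.
    majority_width = Counter(len(seq) for seq in valid).most_common(1)[0][0]

    # Rules 2 and 3: drop off-width sequences, keep one copy of each duplicate.
    kept, seen = [], set()
    for seq in valid:
        if len(seq) != majority_width:
            removed["wrong_width"].append(seq)
        elif seq in seen:
            removed["duplicate"].append(seq)
        else:
            seen.add(seq)
            kept.append(seq)
    return kept, removed

kept, removed = preprocess_foreground(["ACDEF", "ACDEF", "ACXEF", "ACD", "GHIKL"])
```

Grouping the discarded sequences by reason mirrors the “foreground preprocessing” tab the FAQ describes, where each reason has its own count.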
I couldn’t tell whether it lets you edit or reverse any of these changes, but it does allow you to export the data, so presumably you can make edits and import it again. The pLogo team also provides a FAQ and videos explaining how to use the interactive features of the website. I don’t have a DNA dataset and couldn’t test it myself, so the videos were helpful.
References:
O’Shea, J. P. et al. (2013/10/06). “pLogo: a probabilistic approach to visualizing sequence motifs.” Nature Methods. Nature Publishing Group. Web. http://dx.doi.org/10.1038/nmeth.2646