pLogo DNA Sequencing: An Infovis Review

Posted on Tumblr today on infovis658:

pLogo is a visualization method developed at the University of Connecticut and Harvard Medical School to study DNA and protein sequences. The team published an interactive version at http://plogo.uconn.edu so scientists can analyze their own data. The pLogo methodology was published in Nature Methods on October 6, 2013 and reported in Medical Xpress:

Medical Xpress:

http://medicalxpress.com/news/2013-10-tool-visualizing-dna-protein-sequences.html#inlRlv

Nature Methods:

http://www.nature.com/nmeth/journal/vaop/ncurrent/full/nmeth.2646.html

pLogo takes information from a DNA or protein sequence and maps it onto a chart showing the log-odds of the binomial probability of individual letters, which represent the biological residues that make up each molecule. The size of each letter indicates its level of statistical significance. The color represents its physicochemical properties.
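To make "log-odds of the binomial probability" more concrete, here is a minimal Python sketch of one plausible reading of that phrase. It is not taken from the pLogo source: the function names, the choice of a base-10 logarithm, and the over- versus under-representation ratio are all my assumptions. The idea is that a letter seen far more often than its background frequency predicts gets a large positive value, and a letter seen far less often gets a large negative one.

```python
from math import comb, log10


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def log_odds(k, n, p):
    """Hypothetical letter height (assumed formula, not confirmed by the post).

    k: times the letter appears at one position in the foreground
    n: number of foreground sequences
    p: background frequency of the letter at that position (0 < p < 1)
    """
    # Chance of seeing k or more occurrences (over-representation).
    p_over = sum(binomial_pmf(i, n, p) for i in range(k, n + 1))
    # Chance of seeing k or fewer occurrences (under-representation).
    p_under = sum(binomial_pmf(i, n, p) for i in range(0, k + 1))
    return -log10(p_over / p_under)


# A letter expected 10% of the time but seen in 8 of 10 sequences
# scores well above zero; one seen exactly as often as expected
# in a symmetric case scores about zero.
print(log_odds(8, 10, 0.1))
print(log_odds(5, 10, 0.5))
```

Under this reading, the tallest letters above the axis are the most surprisingly common ones and the tallest letters below it are the most surprisingly rare.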

A help guide on the pLogo website was useful in interpreting the data, particularly since I have no training in DNA sequencing. Rolling your mouse over various components of the chart provides a popup explanation of each feature.

http://plogo.uconn.edu/help/plogomap

The pLogo map is useful for someone who is new to the content of the visualization. I hadn’t noticed the red, horizontal lines that represent “…the p = 0.05 statistical significant threshold following Bonferroni correction.” The lines help the user focus on the area of the chart that represents significant information. These lines could be thicker, or the red could be eliminated from the letter colors, to help them stand out better.
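For readers who, like me, are new to the term: a Bonferroni correction divides the significance level by the number of tests being run, so that running many tests at once does not produce significant-looking results by chance. A minimal sketch follows. The count of tests here (20 amino acids across 15 columns) is a made-up example, not a figure from the pLogo documentation.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical example: 20 possible letters at each of 15 positions.
n_tests = 20 * 15
print(bonferroni_threshold(0.05, n_tests))
```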

I felt the “column numbers” running through the center of the chart at the zero axis were strange, since there is no indication whether this area contains no data or is simply an inserted label. Having the zero tick mark in the center of the column numbers label seems to indicate that there were no values near zero, but it could be that we are supposed to read the lines above and below as zero. It is hard to tell. I also felt that the letters would be easier to read if the minimum size were larger. Some of the least significant letters are difficult to read.

What I liked about the program is that it includes an algorithm that analyzes and autocorrects input errors, similar to Tableau’s function that automatically selects a chart format and indicates any duplicate data with an asterisk. If this could be coupled with a Google Refine-like editing feature it would be very powerful.

According to the FAQ:

“Foregrounds are preprocessed and filtered before being used for pLogo generation. Sequences with invalid characters or widths that do not match the majority will be discarded by this preprocessing step. The foreground preprocessing will also remove duplicate sequences in the foreground (retaining only 1 instance of the duplicated sequence). To see which sequences were removed by the foreground preprocessing stage, click the “foreground preprocessing” tab below the foreground input box. Numbers in the right hand column of this window can be clicked to view the sequences that were removed for a given reason.”
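The three filters described in the FAQ can be sketched in a few lines of Python. This is my own illustration, not pLogo's code: the function name, the protein alphabet, the order of the steps, and the choice to compute the majority width after dropping invalid sequences are all assumptions.

```python
from collections import Counter

# Assumption: the 20 standard amino acid letters count as valid characters.
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY")


def preprocess_foreground(sequences, alphabet=PROTEIN_ALPHABET):
    """Return (kept, removed), where removed groups sequences by reason."""
    removed = {"invalid_characters": [], "wrong_width": [], "duplicate": []}

    # Step 1: drop sequences containing characters outside the alphabet.
    valid = []
    for seq in sequences:
        seq = seq.strip().upper()
        if not seq:
            continue  # skip blank lines
        if set(seq) - alphabet:
            removed["invalid_characters"].append(seq)
        else:
            valid.append(seq)

    # Step 2: drop sequences whose width differs from the most common width.
    widths = Counter(len(seq) for seq in valid).most_common(1)
    majority_width = widths[0][0] if widths else 0
    same_width = []
    for seq in valid:
        if len(seq) == majority_width:
            same_width.append(seq)
        else:
            removed["wrong_width"].append(seq)

    # Step 3: keep only the first instance of each duplicated sequence.
    kept, seen = [], set()
    for seq in same_width:
        if seq in seen:
            removed["duplicate"].append(seq)
        else:
            seen.add(seq)
            kept.append(seq)

    return kept, removed


kept, removed = preprocess_foreground(["ACDEF", "ACDEF", "ACDXF", "ACD", "GHIKL"])
print(kept)
print(removed)
```

Returning the removed sequences grouped by reason mirrors the “foreground preprocessing” tab the FAQ describes, where each count can be clicked to see what was discarded.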

I couldn’t tell if it allows you to edit or correct any changes, but it does allow you to export the data, so that, presumably, you can make edits and import it again. The pLogo team also provides a FAQ and videos explaining how to use the interactive features of the website. I don’t have a DNA dataset, and couldn’t test it myself, so the videos were helpful.

References:

O’Shea, J. P. et al. (2013/10/06). “pLogo: a probabilistic approach to visualizing sequence motifs.” Nature Methods. Nature Publishing Group. Web. http://dx.doi.org/10.1038/nmeth.2646
