Gephi is free, open source visualization software that lets users display data as network graphs. I had some prior experience with Gephi visualizations as a viewer rather than as a user. The technical infrastructure team at the Information Architecture Institute was working with Gephi to create website maps and concept diagrams of the IA Library. I admired the simplicity of the node-and-edge display and was happy to get hands-on training in the tool.
Gephi is a powerful tool that lets people explore and display relationships and connections. For this project, I selected the Marvel Universe Social Graph from Infochimps:
This dataset, constructed by Cesc Rosselló, Ricardo Alberich, and Joe Miro of the University of the Balearic Islands, contains Marvel characters and the comic book issues in which they appeared. It is a large dataset of 99,662 records.
Once the data is imported, you can begin formatting the visualization.
The resulting visualization is called a force-directed graph because each node's position is determined by simulated forces: connected nodes attract one another, while all nodes repel one another. Isolates (nodes with no connections) get pushed to the outside and will keep drifting away indefinitely; you can stop this by setting Gravity to a higher value. Depending on the number of nodes, rendering the graph can take some time.
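The push and pull described above can be illustrated with a toy simulation. This is a minimal sketch in the spirit of Fruchterman-Reingold, not Gephi's ForceAtlas2 code: the function name `force_layout` and the constants `repulsion`, `attraction`, `gravity`, `step`, and `max_move` are hypothetical choices for a tiny graph. It shows the effect the Gravity setting has on an isolate.

```python
import math
import random


def force_layout(nodes, edges, iterations=300, repulsion=1.0, attraction=0.05,
                 gravity=0.0, step=0.1, max_move=0.5, seed=1):
    """Toy force-directed layout (hypothetical parameters, not ForceAtlas2).

    Every pair of nodes repels, every edge pulls its endpoints together,
    and `gravity` pulls each node toward the origin. `nodes` must be a list.
    """
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}
    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        # Repulsion: magnitude repulsion / distance, pushing each pair apart.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                f = repulsion / (dx * dx + dy * dy + 1e-9)
                force[a][0] += f * dx
                force[a][1] += f * dy
                force[b][0] -= f * dx
                force[b][1] -= f * dy
        # Attraction: a spring along each edge, proportional to its length.
        for a, b in edges:
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            force[a][0] -= attraction * dx
            force[a][1] -= attraction * dy
            force[b][0] += attraction * dx
            force[b][1] += attraction * dy
        # Gravity: pull every node back toward the origin.
        for n in nodes:
            force[n][0] -= gravity * pos[n][0]
            force[n][1] -= gravity * pos[n][1]
        # Move each node, capping the step so very close pairs cannot explode.
        for n in nodes:
            mx, my = step * force[n][0], step * force[n][1]
            length = math.hypot(mx, my)
            if length > max_move:
                mx, my = mx * max_move / length, my * max_move / length
            pos[n][0] += mx
            pos[n][1] += my
    return pos
```

Running it on one connected pair plus an isolate with `gravity=0.0` lets the isolate drift steadily away from the pair, while `gravity=1.0` keeps all three nodes close to the center, which mirrors what raising Gravity does in Gephi's layout panel.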
At the right edge of the color bar is a small box for selecting color palettes. Click this box, then click Default to open a range of color schemes. Select a color scheme that you like or that emphasizes the data appropriately.
Figure 1 The Marvel Comics Universe
The Marvel Universe, in this example, is a dense mass of nodes with several groups loosely connected to the center and a number of unconnected nodes floating toward the edges. A close view of the central nodes reveals groupings of related nodes within the dense structure.
Figure 2 Detail of the Marvel Comics Universe
Gephi has a number of additional tools for running statistical analyses and applying filters to the data. Rather than walk through the steps for each tool, I highlight a few here.
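The Modularity report measures how strongly the network divides into communities (groups of nodes more densely connected to each other than to the rest of the graph) and charts the size of each community. The sample file indicates a modularity of 0.683 and 56 distinct communities of Marvel characters.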
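Modularity compares the share of edges that fall inside communities with the share expected if edges were placed at random. For an undirected graph with m edges, one standard form is Q = sum over communities c of [L_c / m - (d_c / 2m)^2], where L_c is the number of edges inside community c and d_c is the total degree of its nodes. The sketch below only scores a partition you supply; Gephi finds the communities itself using the Louvain method. The function name `modularity` and the toy graph are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of an undirected, unweighted graph.

    edges: list of (u, v) pairs; community: dict mapping node -> community label.
    Q = sum over communities c of L_c / m - (d_c / (2m))^2.
    """
    m = len(edges)
    inside = defaultdict(int)  # L_c: edges with both endpoints in community c
    degree = defaultdict(int)  # d_c: total degree of the nodes in community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            inside[cu] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(round(modularity(edges, community), 3))  # 0.357
```

Here m = 7, each triangle holds 3 internal edges and has total degree 7, so Q = 2 × (3/7 - 1/4) = 5/14. Putting every node in one community gives Q = 0, which is why values well above zero, like the Marvel graph's, indicate clear community structure.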
The Graph Distance report shows the network diameter (the longest shortest path between any two nodes) and the average length of the shortest paths between nodes. The sample Marvel data indicates a graph diameter of 11 and an average path length of 4.45.
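For an unweighted graph, both numbers come from shortest-path distances, which a breadth-first search from every node can compute. This is a minimal sketch: the function names and the adjacency-dict input are hypothetical, and it skips unreachable pairs, which is one common convention for disconnected graphs (Gephi's own handling may differ). Running a BFS from every node also helps explain why these statistics are slow on large datasets.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop counts from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def distance_stats(adj):
    """Return (diameter, average path length) over reachable ordered pairs."""
    diameter, total, pairs = 0, 0, 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                diameter = max(diameter, d)
                total += d
                pairs += 1
    return diameter, (total / pairs if pairs else 0.0)


# A four-node path a - b - c - d.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(distance_stats(adj))
```

For the path a-b-c-d, the diameter is 3 (from a to d) and the average path length is 20/12, about 1.67, over the 12 ordered pairs.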
Filters let you display subsets of the graph by selecting for node and edge attributes or for topological properties such as degree.
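To illustrate what a topology filter does, the following sketch keeps only the nodes whose degree falls within a range, similar in spirit to Gephi's Degree Range filter. The function name `degree_range_filter` and its parameters are hypothetical, and degree is measured on the full graph before anything is removed.

```python
from collections import Counter


def degree_range_filter(nodes, edges, min_degree, max_degree=float("inf")):
    """Keep nodes whose degree lies in [min_degree, max_degree] and the edges
    whose endpoints both survive. Isolated nodes have degree 0."""
    degree = Counter({n: 0 for n in nodes})  # start every node at degree 0
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    kept = {n for n, d in degree.items() if min_degree <= d <= max_degree}
    kept_edges = [(u, v) for u, v in edges if u in kept and v in kept]
    return kept, kept_edges


# A hub "h" with three leaves, a separate pair d-e, and an isolate "z".
nodes = ["h", "a", "b", "c", "d", "e", "z"]
edges = [("h", "a"), ("h", "b"), ("h", "c"), ("d", "e")]
print(degree_range_filter(nodes, edges, 2))
```

A minimum degree of 1 is a quick way to hide isolates like the ones floating at the edges of Figure 1.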
My completed Gephi visualization can be downloaded at:
Gephi is clearly a powerful analytical tool, but it takes some trial and error to use its functions properly. It is particularly important to format the data properly before import and to understand the difference between a node table and an edge table. I had to re-import my data several times before I got it right.
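One way to keep the two tables straight is to generate them from the raw records. Gephi's spreadsheet (CSV) import generally looks for an `Id` column (plus an optional `Label`) in the node table and `Source` and `Target` columns (plus optional `Type` and `Weight`) in the edge table. The sketch below assumes a hypothetical input of (character, issue) pairs and makes both characters and issues into nodes, giving a bipartite graph; the function name, the generated `n0`-style Ids, and the extra `Kind` attribute column are all illustrative choices.

```python
import csv
import io


def write_gephi_tables(records, nodes_out, edges_out):
    """Write separate node and edge tables for Gephi's spreadsheet import.

    records: iterable of (character, issue) pairs (a hypothetical input format).
    Characters and issues both become nodes, so the result is a bipartite graph.
    """
    ids = {}    # (label, kind) -> generated node Id such as "n0"
    edges = {}  # (source Id, target Id) -> None; a dict keeps order and drops repeats
    for character, issue in records:
        for key in ((character, "character"), (issue, "issue")):
            if key not in ids:
                ids[key] = f"n{len(ids)}"
        edges[(ids[(character, "character")], ids[(issue, "issue")])] = None

    node_writer = csv.writer(nodes_out)
    node_writer.writerow(["Id", "Label", "Kind"])
    for (label, kind), node_id in ids.items():
        node_writer.writerow([node_id, label, kind])

    edge_writer = csv.writer(edges_out)
    edge_writer.writerow(["Source", "Target", "Type"])
    for source, target in edges:
        edge_writer.writerow([source, target, "Undirected"])


records = [("Character A", "Issue 1"), ("Character B", "Issue 1")]
nodes_buf, edges_buf = io.StringIO(), io.StringIO()
write_gephi_tables(records, nodes_buf, edges_buf)
print(nodes_buf.getvalue())
print(edges_buf.getvalue())
```

To write real files, pass file objects opened with `open("nodes.csv", "w", newline="")` and `open("edges.csv", "w", newline="")` (hypothetical file names), then import the node table first and the edge table second.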
Different kinds of data may also call for different layouts. Once the data is imported, it becomes a playground of possibilities: any number of displays, analyses, and filters can be applied to the resulting visualization.
I also noticed that a large dataset can significantly slow the tool down, especially when generating the initial layout and changing zoom levels.
As for the Marvel data, I was not surprised to find such a dense, interconnected graph. Although I didn't attempt to show node labels, the patterns of grouping and the depth of the hues made it clear that some groups were more connected to one another than others.
The dataset I analyzed shows characters from the Marvel Universe and the series in which they appeared. It does not indicate how many issues of each series a character appeared in, or whether the character was a major or minor player in the series. One interesting direction for the Marvel Universe dataset would be to apply weights to the relationships: for example, measuring how many times a character appeared within a series, or how often one character interacted with another, and applying those weights to the data in Gephi. The current visualization shows only that characters appear together. It would be interesting to find out whether Gephi could measure the strength of the connections between characters this way, but it would be quite an involved project!
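A first approximation of such weights is to count how many issues each pair of characters shares and write that count to a `Weight` column in the edge table, which Gephi's spreadsheet import can read as edge weight. This is a sketch under assumptions: the (character, issue) input format and the function names are hypothetical, and shared appearances are only a proxy for interaction, not a measure of it.

```python
import csv
import io
from collections import Counter, defaultdict
from itertools import combinations


def co_appearance_weights(records):
    """Count how many issues each pair of characters shares.

    records: iterable of (character, issue) pairs (a hypothetical input format).
    Returns a Counter mapping (character_a, character_b) -> shared issues, a < b.
    """
    cast = defaultdict(set)  # issue -> set of characters appearing in it
    for character, issue in records:
        cast[issue].add(character)
    weights = Counter()
    for characters in cast.values():
        # sorted() gives each pair one canonical order, so (A, B) and (B, A) merge.
        for a, b in combinations(sorted(characters), 2):
            weights[(a, b)] += 1
    return weights


def write_weighted_edges(weights, out):
    """Write an undirected edge table with a Weight column."""
    writer = csv.writer(out)
    writer.writerow(["Source", "Target", "Type", "Weight"])
    for (a, b), w in sorted(weights.items()):
        writer.writerow([a, b, "Undirected", w])


records = [("Character A", "Issue 1"), ("Character B", "Issue 1"),
           ("Character A", "Issue 2"), ("Character B", "Issue 2")]
buf = io.StringIO()
write_weighted_edges(co_appearance_weights(records), buf)
print(buf.getvalue())
```

Because every pair in an issue gets a count, a crowded team-up issue adds many weighted edges at once; whether to normalize for that is a design decision for the analysis.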