Histograms and scatterplots are great methods of visualizing data and the relationship between variables, but recently I have been wondering about what visualization techniques I am missing. What do you think is the most underused type of plot?

Answers should:

  1. Not be very commonly used in practice.
  2. Be understandable without a great deal of background discussion.
  3. Be applicable in many common situations.
  4. Include reproducible code to create an example (preferably in R). A linked image would be nice.
+3  A: 

Check out Edward Tufte's work and especially this book

You can also try to catch his travelling presentation. It's quite good and includes a bundle of four of his books. (I swear I don't own his publisher's stock!)

By the way, I like his sparkline data visualization technique. Surprise! Google's already written it and put it out on Google Code.

Paul Sasik
+5  A: 

Regarding sparklines and other Tufte ideas, the YaleToolkit package on CRAN provides the functions sparkline and sparklines.
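
For anyone who just wants to see the idea, here is a rough sketch of sparklines in plain base graphics. It deliberately does not use the YaleToolkit functions (I don't want to guess at their arguments); the layout choices and the variable names series and old.par are my own assumptions, and the data is the built-in EuStockMarkets set.

```r
# Four built-in daily stock index series (DAX, SMI, CAC, FTSE)
series <- EuStockMarkets

# One small, axis-free panel per series, stacked vertically
old.par <- par(mfrow = c(ncol(series), 1), mar = c(0.5, 5, 0.5, 1))
for (name in colnames(series)) {
  x <- as.numeric(series[, name])
  plot(x, type = "l", axes = FALSE, ann = FALSE)          # bare line, no axes or titles
  points(length(x), x[length(x)], pch = 19, col = "red")  # mark the latest value
  mtext(name, side = 2, las = 1, line = 1)                # series name in the left margin
}
par(old.par)  # restore the previous graphics settings
```

Stripping the axes and annotation is what makes these read as sparklines rather than ordinary line charts.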

Another package that is useful for larger datasets is hexbin as it cleverly 'bins' data into buckets to deal with datasets that may be too large for naive scatterplots.
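
A minimal sketch of hexagonal binning, assuming the hexbin package is installed. The data is simulated, and xbins = 40 is an arbitrary choice for illustration.

```r
library(hexbin)

set.seed(1)        # simulated data, so make it reproducible
n <- 50000
x <- rnorm(n)
y <- x + rnorm(n)  # correlated with x

# Count the points falling into each hexagonal cell
bin <- hexbin(x, y, xbins = 40)

# Cells are shaded by count instead of overplotting 50,000 points
plot(bin, main = "50,000 points binned into hexagons")
```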

Dirk Eddelbuettel
+1 to the sparklines. I'm currently working on a package that is focused on sparkline creation in R-- they make great additions to tables in Sweave reports.
Sharpie
Cool! I am not too happy with what Jay has in YaleToolkit and would love to have sparklines in tables!
Dirk Eddelbuettel
+15  A: 

I really agree with the other posters: Tufte's books are fantastic and well worth reading.

First, I would point you to a very nice tutorial on ggplot2 and ggobi from "Looking at Data" earlier this year. Beyond that I would just highlight one visualization from R, and two graphics packages (which are not as widely used as base graphics, lattice, or ggplot):

Heat Maps

I really like visualizations that can handle multivariate data, especially time series data. Heat maps can be useful for this. One really neat one was featured by David Smith on the Revolutions blog. Here is the ggplot code courtesy of Hadley:

# Build the download URL for daily MSFT quotes from Yahoo Finance (CSV).
# Note: this endpoint dates from when the answer was written and may no longer respond.
stock <- "MSFT"
start.date <- "2006-01-12"
end.date <- Sys.Date()
quote <- paste("http://ichart.finance.yahoo.com/table.csv?s=",
                stock, "&a=", substr(start.date,6,7),
                "&b=", substr(start.date, 9, 10),
                "&c=", substr(start.date, 1,4), 
                "&d=", substr(end.date,6,7),
                "&e=", substr(end.date, 9, 10),
                "&f=", substr(end.date, 1,4),
                "&g=d&ignore=.csv", sep="")    
stock.data <- read.csv(quote, as.is=TRUE)

# Derive week of year, day of week and year for each trading day
stock.data <- transform(stock.data,
  week = as.POSIXlt(Date)$yday %/% 7 + 1,
  wday = as.POSIXlt(Date)$wday,
  year = as.POSIXlt(Date)$year + 1900)

library(ggplot2)
# One tile per trading day, coloured by adjusted close, one panel per year
ggplot(stock.data, aes(week, wday, fill = Adj.Close)) + 
  geom_tile(colour = "white") + 
  scale_fill_gradientn(colours = c("#D61818","#FFAE63","#FFFFBD","#B5E384")) + 
  facet_wrap(~ year, ncol = 1)

Which ends up looking somewhat like this:

alt text
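
Because that snippet depends on a live download, here is a self-contained sketch of the same calendar heat map idea that runs offline. The data is a simulated random walk, not real prices, and the names sim.dates, sim.data and value are made up for illustration.

```r
library(ggplot2)

set.seed(42)  # simulated data, so make it reproducible

# Hypothetical stand-in for the downloaded quotes: two years of weekdays
sim.dates <- seq(as.Date("2008-01-01"), as.Date("2009-12-31"), by = "day")
sim.dates <- sim.dates[!(as.POSIXlt(sim.dates)$wday %in% c(0, 6))]  # drop Sun/Sat

sim.data <- data.frame(
  date  = sim.dates,
  value = 30 + cumsum(rnorm(length(sim.dates)))  # random walk, not real prices
)

# Same calendar coordinates as in the code above
sim.data <- transform(sim.data,
  week = as.POSIXlt(date)$yday %/% 7 + 1,
  wday = as.POSIXlt(date)$wday,
  year = as.POSIXlt(date)$year + 1900)

# One tile per day, coloured by value, one panel per year
p <- ggplot(sim.data, aes(week, wday, fill = value)) +
  geom_tile(colour = "white") +
  scale_fill_gradientn(colours = c("#D61818", "#FFAE63", "#FFFFBD", "#B5E384")) +
  facet_wrap(~ year, ncol = 1)
print(p)
```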

RGL: Interactive 3D Graphics

Another package that is well worth the effort to learn is RGL, which easily provides the ability to create interactive 3D graphics. There are many examples online for this (including in the rgl documentation).

The R-Wiki has a nice example of how to plot 3D scatter plots using rgl.
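
As a rough idea of what that looks like (this is my own minimal sketch, not the R-Wiki example), assuming rgl is installed and a display is available, using the built-in iris data:

```r
library(rgl)

# Colour each flower by species (built-in iris data)
species.col <- c("red", "green3", "blue")[as.integer(iris$Species)]

# Opens an interactive window: drag to rotate, scroll to zoom
plot3d(iris$Sepal.Length, iris$Sepal.Width, iris$Petal.Length,
       col = species.col, size = 5,
       xlab = "Sepal length", ylab = "Sepal width", zlab = "Petal length")
```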

GGobi

Another package that is worth knowing is rggobi. There is a Springer book on the subject, and lots of great documentation/examples online, including at the "Looking at Data" course.

Shane
nice. Thanks for including the code/image.
Ian Fellows
what is indicated by the vertical position of the 'Z' or bend in each solid black vertical line?
doug
Those are month boundaries (months don't end on the same day).
Shane
A: 

3D link graph with node interphase (overlapping nodes do not necessarily connect).

This is what I use in my head and I cannot even draw it.

Joshua
+1  A: 

Mosaic plots seem to me to meet all four criteria mentioned. There are examples in R, under mosaicplot.
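
A minimal sketch with the base mosaicplot function and the built-in Titanic table. Collapsing the table to two variables is my own choice, just to keep the picture readable.

```r
# Built-in Titanic table, collapsed to Class x Survived
tab <- margin.table(Titanic, c(1, 4))

# Tile widths show the size of each class, heights the share who survived
mosaicplot(tab, main = "Titanic: survival by class", color = TRUE)
```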

Peter Flom
A better implementation of mosaic plots is in the vcd library (function name 'mosaic'). It has a much more flexible method signature and it is implemented in grid (rather than the 'base' graphics system).
doug
A: 

In addition to Tufte's excellent work, I recommend the books by William S. Cleveland: Visualizing Data and The Elements of Graphing Data. Not only are they excellent, but they were all done in R, and I believe the code is publicly available.

Peter Flom
+10  A: 

Plots using polar coordinates are certainly underused--some would say with good reason. I think the situations which justify their use are not common; I also think that when those situations arise, polar plots can reveal patterns in data that linear plots cannot.

I think that's because sometimes your data is polar (cyclical) rather than linear.

Here's an example. This plot shows a website's mean traffic volume by hour. Notice the two spikes at 10 pm and at 1 am. For the site's network engineers, those are significant; it's also significant that they occur near each other (just three hours apart). But if you plot the same data on a traditional coordinate system, this pattern is completely concealed: plotted linearly, these two spikes are 21 hours apart, which they are, though they are also just three hours apart on consecutive days. The polar chart below shows this in a parsimonious and intuitive way (a legend isn't necessary).

alt text

There are two ways (that I'm aware of) to create plots like this using R (I created the plot above with R). One is to code your own function in either the base or grid graphics systems. The other way, which is easier, is to use the circular package. The function you would use is 'rose.diag':

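library(circular)      # rose.diag, circular
library(RColorBrewer)  # brewer.pal
# mean traffic per hour, for hours 0 to 23
data = c(35, 78, 34, 25, 21, 17, 22, 19, 25, 18, 25, 21, 16, 20, 26, 
                 19, 24, 18, 23, 25, 24, 25, 71, 27)
three_palettes = c(brewer.pal(12, "Set3"), brewer.pal(8, "Accent"), 
                   brewer.pal(9, "Set1"))
# rose.diag bins raw observations, so expand the counts into one hour value per visit
hours = circular(rep(0:23, times=data), units="hours", template="clock24")
rose.diag(hours, bins=24, main="Daily Site Traffic by Hour", col=three_palettes)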
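
For the first route mentioned above (coding it yourself in base graphics), here is a rough sketch of how that might look. The clock-style orientation, the square-root radius scaling and all the variable names are my own assumptions, not how the original image was made.

```r
# Hourly traffic counts from the example above, for hours 0 to 23
counts <- c(35, 78, 34, 25, 21, 17, 22, 19, 25, 18, 25, 21, 16, 20, 26,
            19, 24, 18, 23, 25, 24, 25, 71, 27)
n <- length(counts)

# Wedge edges as angles: hour 0 starts at the top, hours increase clockwise
edges <- pi / 2 - 2 * pi * (0:n) / n
radius <- sqrt(counts / max(counts))  # sqrt so wedge area tracks the count

plot(NA, xlim = c(-1.2, 1.2), ylim = c(-1.2, 1.2), asp = 1,
     axes = FALSE, xlab = "", ylab = "", main = "Daily Site Traffic by Hour")
for (i in seq_len(n)) {
  theta <- seq(edges[i], edges[i + 1], length.out = 20)
  polygon(c(0, radius[i] * cos(theta)), c(0, radius[i] * sin(theta)),
          col = "steelblue", border = "white")
}

# Label every third hour just outside the unit circle, centred on its wedge
lab <- seq(0, 21, by = 3)
lab.angle <- pi / 2 - 2 * pi * (lab + 0.5) / n
text(1.1 * cos(lab.angle), 1.1 * sin(lab.angle), labels = lab)

# Distance between the two busiest hours, linearly and around the clock
peaks <- order(counts, decreasing = TRUE)[1:2] - 1
linear.gap <- abs(diff(peaks))
circular.gap <- min(linear.gap, n - linear.gap)
```

The last three lines put numbers on the argument: linear.gap is the distance between the two spikes on an ordinary axis, and circular.gap is the distance once the day wraps around.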
doug
+3  A: 

Horizon graphs (pdf), for visualising many time series at once.

Parallel coordinates plots (pdf), for multivariate analysis.

Association and mosaic plots, for visualising contingency tables (see the vcd package)
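
As a quick illustration of the parallel coordinates idea, here is a minimal sketch using parcoord from the MASS package (which ships with R) on the built-in iris data. The colour choice is arbitrary.

```r
library(MASS)  # provides parcoord()

# The four numeric measurements become four parallel vertical axes
measurements <- iris[, 1:4]
species.col <- c("red", "green3", "blue")[as.integer(iris$Species)]

# One line per flower, coloured by species; var.label prints each axis range
parcoord(measurements, col = species.col, var.label = TRUE)
```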

Richie Cotton
+1  A: 

Another nice time series visualization that I was just reviewing is the "bump chart" (as featured in this post on the "Learning R" blog). This is very useful for visualizing changes in position over time.

You can read about how to create it on http://learnr.wordpress.com/, but this is what it ends up looking like:

alt text
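
If you just want the gist without following the link, here is a rough base-graphics sketch of a bump chart. This is not the Learning R code; it ranks the regions in the built-in WorldPhones data by number of telephones in each year and plots the ranks over time.

```r
# Built-in WorldPhones matrix: rows are years, columns are regions
ranks <- t(apply(WorldPhones, 1, function(x) rank(-x)))  # 1 = most telephones

years <- as.numeric(rownames(WorldPhones))
cols <- seq_len(ncol(ranks))

# One line per region; the y axis is reversed so rank 1 sits at the top
matplot(years, ranks, type = "b", pch = 19, lty = 1, col = cols,
        xlim = c(min(years), max(years) + 2),
        ylim = rev(range(ranks)),
        xlab = "Year", ylab = "Rank",
        main = "Regions ranked by number of telephones")

# Label each line at its final position
text(max(years), ranks[nrow(ranks), ], labels = colnames(ranks),
     pos = 4, col = cols)
```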

Shane
I do like the bump chart for this particular data, but have a hard time thinking of more general situations where it would be of use. That said, I think we can all agree that the Learning R blog rocks the socks.
Ian Fellows
A bump chart is a parallel coordinate plot of ranked data.
hadley
+2  A: 

Boxplots! Example from the R help:

boxplot(count ~ spray, data = InsectSprays, col = "lightgray")

In my opinion it is the most handy way to take a quick look at the data or to compare distributions. For more complex distributions there is an extension called vioplot.
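
A sketch of the violin version of the same plot, assuming the vioplot package is installed. Passing the groups through do.call is my own workaround so that each spray goes in as a separate vector; there may well be a neater way.

```r
library(vioplot)

# One vector of counts per spray, from the built-in InsectSprays data
by.spray <- split(InsectSprays$count, InsectSprays$spray)

# vioplot() takes one vector per group, so pass them as separate arguments
do.call(vioplot, c(unname(by.spray),
                   list(names = names(by.spray), col = "lightgray")))
title(xlab = "spray", ylab = "count")
```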

mbq
I loooove violin plots.
Matt Parker