Row-wise summary curves in faceted ggplot2 figures

I really enjoy reading the Junk Charts blog.  A recent post made me wonder how easy it would be to add summary curves for small-multiple type plots, assuming the “small multiples” to summarize were the X component of a ggplot2::facet_grid(Y ~ X) layer.  In other words, how could I plot the same summary curve across each row of the faceted plot?

First we need some data.  I have been working on a spectrum estimation tool with Robert Parker, and ran some benchmark tests of the core function against a function with similar functionality, namely spec.mtm in the multitaper package.

The benchmarking was done using the rbenchmark package.  In short, I generate an auto-regressive simulation using arima.sim(list(order = c(1,1,0), ar = 0.9), n), and then benchmark the functions for incremental increases in n (the length of the simulated set);  here is the resulting information as an R-data file. (I’m not showing the code used to produce the data, but if you’re curious I’ll happily provide it.)

With a bit of thought (and trial-and-effort for me), I found Hadley’s reshape2 and plyr packages made it straightforward to calculate the group statistics (note some prior steps are skipped for brevity, but the full code is linked at the end):

## reduce data.frame with melt
allbench.df.mlt <- reshape2::melt(allbench.df.drp, id.vars=c("test","num_terms"))

## calculate the summary information to be plotted:
## 'value' can be anything, but here we use meadian values from Hmisc::smean.cl.normal, which calculates confidence limits using a t-test
## 'summary' is not important for plotting -- it's just a name
tmpd <- plyr::ddply(allbench.df.mlt, .(variable, num_terms), summarise, summary="medians", value=mean_cl_normal(value)[1,1])

## create copies of 'tmpd' for each test, and map them to one data.frame
tests <- unique(allbench.df$test)
allmeds <- plyr::ldply(lapply(X=tests, FUN=function(x,df=tmpd){df$test <- x; return(df)}))

Here’s the final result, after adding a ggplot2::geom_line layer with the allmeds data frame to the faceted plot:

rlpspec_allbench

This type of visualization helps visually identify differences among subsets of data.  Here, the lines help distinguish the benchmark information by method (facet columns).  Of course the stability of benchmark data depends on the number replications, but here we can see the general shape of the user.self and elapsed times are consistent across the three methods, and that the rlpSpec methods consume less sys.self time with increasing series length.  Most surprising to me is the convergence of relative times with increasing series length.  When the number of terms is more than approximately 5000, the methods have roughly equal performance; below this threshold the spec.mtm method can be upwards of 2-3 times faster, which should not be too surprising given that it calls Fortran source code.

I assume there is a slick way to do this with ggplot2::stat_summary, but I was scratching my head trying to figure it out.  Any insight into a better or easier way to do this is especially welcome!

Here is the code to produce the figure, as a gist.  If you have any troubles accessing the data, please let me know.

About these ads

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.
The Esquire Theme.

Follow

Get every new post delivered to your Inbox.

Join 36 other followers

%d bloggers like this: