How-To: Scatterplot Matrix

You’ve probably seen the simple scatterplot example already. We will now extend it to do something a bit more interesting: a scatterplot matrix! That is, a grid of small scatterplots, each displaying the relationship between a pair of dimensions i and j.

For example, given tabular data on Iris flowers, how might we explore the relationships between sepal and petal dimensions and species?

var flowers = [
  { sepalLength: 5.1, sepalWidth: 3.5, petalLength: 1.4, petalWidth: 0.2, species: "setosa" },
  { sepalLength: 4.9, sepalWidth: 3.0, petalLength: 1.4, petalWidth: 0.2, species: "setosa" },
  { sepalLength: 4.7, sepalWidth: 3.2, petalLength: 1.3, petalWidth: 0.2, species: "setosa" },
  ...
];

Here’s a rudimentary scatterplot matrix:

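var s = 100,                     // size of each cell, in pixels
    p = 5,                       // padding on each side of a cell
    keys = pv.keys(flowers[0]);  // dimension names: "sepalLength", "sepalWidth", etc.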

var vis = new pv.Panel()
    .width(keys.length * (s + 2 * p))
    .height(keys.length * (s + 2 * p));

// One column panel per dimension, each holding one row panel per dimension.
var cell = vis.add(pv.Panel)
    .data(keys)
    .width(s)
    .left(function() this.index * (s + 2 * p) + p)
  .add(pv.Panel)
    .data(keys)
    .height(s)
    .top(function() this.index * (s + 2 * p) + p)
    .strokeStyle("#ccc");

// In each cell, k0 is the column's dimension (horizontal) and k1 the row's (vertical).
cell.add(pv.Dot)
    .def("x", function(k0, k1) pv.Scale.linear(flowers, function(d) d[k0]).range(0, s))
    .def("y", function(k0, k1) pv.Scale.linear(flowers, function(d) d[k1]).range(0, s))
    .data(flowers)
    .left(function(d, k0, k1) this.x()(d[k0]))
    .bottom(function(d, k0, k1) this.y()(d[k1]));

vis.render();

Breaking it down:

1. A root panel, vis, to contain the visualization. Its width and height are keys.length * (s + 2 * p): each cell is s pixels square, with p pixels of padding on every side.

2. Another panel, cell, to contain each scatterplot. Note that cell is not a direct child of the root: we use one panel to make columns, and a nested panel to make rows. Both panels share the same data, keys: the array of dimension names (“sepalLength”, “sepalWidth”, etc.). The light gray stroke outlines each cell.

3. A dot, to produce the scatterplot. We use def to declare two local variables, x and y, holding linear scales for the cell’s two dimensions: k0, the column’s dimension (horizontal), and k1, the row’s dimension (vertical). Note: it would be more efficient to cache one scale per dimension, rather than constructing new scales on the fly for every cell. (Exercise: Try it!)
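The layout arithmetic in steps 1 and 2 can be checked without Protovis. The plain-JavaScript sketch below assumes the same s = 100 and p = 5, uses Object.keys as a stand-in for pv.keys, and computes the root panel size plus each cell’s left and top offsets. The helpers cellOffset and layoutCells are hypothetical names, not part of Protovis.

```javascript
// Sketch only: reproduces the grid arithmetic of the example in plain JS.
// cellOffset and layoutCells are hypothetical helpers, not Protovis API.
const s = 100; // cell size in pixels
const p = 5;   // padding on each side of a cell

// The three sample rows shown above.
const flowers = [
  { sepalLength: 5.1, sepalWidth: 3.5, petalLength: 1.4, petalWidth: 0.2, species: "setosa" },
  { sepalLength: 4.9, sepalWidth: 3.0, petalLength: 1.4, petalWidth: 0.2, species: "setosa" },
  { sepalLength: 4.7, sepalWidth: 3.2, petalLength: 1.3, petalWidth: 0.2, species: "setosa" }
];

// Stand-in for pv.keys(flowers[0]): the dimension names, in property order.
const keys = Object.keys(flowers[0]);

// Each cell occupies s pixels plus p pixels of padding on both sides.
function cellOffset(index) {
  return index * (s + 2 * p) + p;
}

// One entry per (column key, row key) pair, mirroring the nested panels:
// the outer panel chooses the column (left), the inner panel the row (top).
function layoutCells(keys) {
  const cells = [];
  keys.forEach((k0, i) => {
    keys.forEach((k1, j) => {
      cells.push({ x: k0, y: k1, left: cellOffset(i), top: cellOffset(j) });
    });
  });
  return cells;
}

const size = keys.length * (s + 2 * p); // root panel width and height
const cells = layoutCells(keys);
console.log(size, cells.length);
```

With five dimensions, every cell spans s + 2p = 110 pixels, so the last cell starts at 4 × 110 + 5 = 445 and its right padding ends exactly at the 550-pixel panel edge.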

The result is a good start:

Of course, one problem with the rudimentary approach is that a linear scale isn’t appropriate for every dimension: the species name is ordinal (a category), not quantitative. It would be nice if Protovis picked a suitable default scale for you, but it’s easy enough to write your own scale factory:

function scale(t) {
  // Numeric dimensions get a linear scale; others (e.g., species) get an ordinal one.
  return (typeof flowers[0][t] == "number")
      ? pv.Scale.linear(flowers, function(d) d[t]).range(0, s)
      : pv.Scale.ordinal(flowers, function(d) d[t]).split(0, s);
}

Then, replace the defs for x and y to use the new scale factory:

    .def("x", function(k0, k1) scale(k0))
    .def("y", function(k0, k1) scale(k1))
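Because the defs build their scales per cell, scale(t) still constructs each dimension’s scale many times over. The caching exercise from step 3 might be approached as below. This is a plain-JavaScript sketch that runs without Protovis: linearScale and ordinalSplitScale are hypothetical stand-ins that approximate pv.Scale.linear(...).range(0, s) and pv.Scale.ordinal(...).split(0, s), and makeScaleFactory is a hypothetical wrapper that memoizes one scale per dimension name. The sample rows are three Iris records, one per species.

```javascript
// Sketch only: plain-JS stand-ins for the two Protovis scales used in scale(t),
// plus a cache so each dimension's scale is built once. linearScale,
// ordinalSplitScale and makeScaleFactory are hypothetical, not Protovis API.

// Maps the data's [min, max] linearly onto [lo, hi].
function linearScale(values, lo, hi) {
  const min = Math.min(...values);
  const max = Math.max(...values);
  const span = max - min || 1; // avoid dividing by zero for a constant dimension
  return (v) => lo + ((v - min) / span) * (hi - lo);
}

// Splits [lo, hi] into one equal band per distinct value and places each
// value at the middle of its band.
function ordinalSplitScale(values, lo, hi) {
  const domain = [...new Set(values)]; // distinct values, in order of first appearance
  const step = (hi - lo) / domain.length;
  return (v) => lo + step * (domain.indexOf(v) + 0.5);
}

// Returns a scale(key) function that builds each dimension's scale at most once.
function makeScaleFactory(data, size) {
  const cache = {};
  return function scale(key) {
    if (!(key in cache)) {
      const values = data.map((d) => d[key]);
      cache[key] = typeof values[0] === "number"
        ? linearScale(values, 0, size)
        : ordinalSplitScale(values, 0, size);
    }
    return cache[key];
  };
}

const flowers = [
  { sepalLength: 5.1, sepalWidth: 3.5, petalLength: 1.4, petalWidth: 0.2, species: "setosa" },
  { sepalLength: 7.0, sepalWidth: 3.2, petalLength: 4.7, petalWidth: 1.4, species: "versicolor" },
  { sepalLength: 6.3, sepalWidth: 3.3, petalLength: 6.0, petalWidth: 2.5, species: "virginica" }
];

const scale = makeScaleFactory(flowers, 100);
console.log(scale("sepalLength")(7.0), scale("species")("versicolor"));
```

The same memoizing wrapper should work around the Protovis version of scale(t) unchanged, since it only stores whatever the factory returns the first time a key is requested.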

You could employ another ordinal scale to color the dots by species. This is left as an exercise for the reader.
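Coloring by species uses the same ordinal idea, mapping each category to a color instead of a position. The plain-JavaScript sketch below illustrates it under stated assumptions: colorBy and the three-color palette are hypothetical, and the sample rows are three Iris records, one per species. In the Protovis code, a function like color could be called from a property function on the dot, such as strokeStyle, with the datum d.

```javascript
// Sketch only: an ordinal color mapping keyed by species.
// colorBy and palette are hypothetical names; any list of CSS colors would do.
const palette = ["#1f77b4", "#ff7f0e", "#2ca02c"];

// Returns a function mapping a datum to a color, assigning palette entries
// to the distinct values of `key` in order of first appearance.
function colorBy(data, key, colors) {
  const domain = [...new Set(data.map((d) => d[key]))];
  // Colors repeat if there are more categories than palette entries;
  // values absent from the data map to undefined.
  return (d) => colors[domain.indexOf(d[key]) % colors.length];
}

const flowers = [
  { sepalLength: 5.1, sepalWidth: 3.5, petalLength: 1.4, petalWidth: 0.2, species: "setosa" },
  { sepalLength: 7.0, sepalWidth: 3.2, petalLength: 4.7, petalWidth: 1.4, species: "versicolor" },
  { sepalLength: 6.3, sepalWidth: 3.3, petalLength: 6.0, petalWidth: 2.5, species: "virginica" }
];

const color = colorBy(flowers, "species", palette);
console.log(flowers.map((d) => color(d)));
```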

Lastly, you may want to add appropriate labels, ticks, or even a legend!
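Protovis’s linear scales can suggest tick values themselves (via ticks()), so you may not need to compute them by hand. To show the kind of heuristic involved, here is a standalone sketch of one common approach, not necessarily the one Protovis uses: round the ideal tick spacing to 1, 2 or 5 times a power of ten. niceTicks is a hypothetical helper.

```javascript
// Sketch only: picks "nice" tick values (steps of 1, 2 or 5 times a power of ten)
// for a numeric axis. niceTicks is a hypothetical helper, not Protovis API.
function niceTicks(min, max, count) {
  if (!(max > min) || count < 1) return [min]; // degenerate domain
  const rough = (max - min) / count;           // ideal spacing for `count` intervals
  const exp = Math.floor(Math.log10(rough));
  const power = exp >= 0 ? 10 ** exp : 1 / 10 ** -exp;
  const fraction = rough / power;              // roughly in [1, 10)
  const nice = fraction < 1.5 ? 1 : fraction < 3.5 ? 2 : fraction < 7.5 ? 5 : 10;
  const step = nice * power;
  const limit = max + step * 1e-9;             // tolerate floating-point error at the top
  const ticks = [];
  for (let i = Math.ceil(min / step); i * step <= limit; i++) {
    ticks.push(Number((i * step).toPrecision(12))); // trim floating-point noise
  }
  return ticks;
}

console.log(niceTicks(0, 100, 5), niceTicks(4.3, 7.9, 5));
```

Asking for about five ticks may return a few more or fewer, because the spacing is rounded to a nice value rather than forced to divide the domain exactly.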
