Displaying multivariate data

. provides two very useful functions for representing multivariate data. If X is a numeric matrix or data frame, the command

pairs(X)

produces a pairwise scatterplot matrix of the variables defined by the columns of X, that is, every column of X is plotted against every other column of X and the resulting n(n-1) plots are arranged in a matrix with plot scales constant over the rows and columns of the matrix.

When three or four variables are involved a coplot may be more enlightening. If a and b are numeric vectors and c is a numeric vector or factor object (all of the same length), then the command

coplot(a b | c)

produces a number of scatterplots of a against b for given values of c. If c is a factor, this simply means that a is plotted against b for every level of c. When c is numeric, it is divided into a number of conditioning intervals and for each interval a is plotted against b for values of c within the interval. The number and position of intervals can be controlled with given.values= argument to coplot() -- the function co.intervals() is useful for selecting intervals. You can also use two given variables with a command like

coplot(a b | c + d)

which produces scatterplots of a against b for every joint conditioning interval of c and d.

The coplot() and pairs() function both take an argument panel= which can be used to customise the type of plot which appears in each panel. The default is points() to produce a scatterplot but by supplying some other low-level graphics function of two vectors x and y as the value of panel= you can produce any type of plot you wish. An example panel function useful for coplots is panel.smooth().



Jeff Banfield
2/13/1998