## Revisiting Stone and Roberts’ classic checkerboard score

I was looking over the classic Stone and Roberts (1990) paper in Oecologia that presented a simple way to check for nestedness in community occurrence data. I’ve been playing with different metrics of nestedness for a community data set. However, I think I found an error in the paper. They define the number of checkerboard units as:

Cij=(ri-Sij)(rj-Sij) (1)

Where ri is the total number of sites at which species i occurs, rj is the total number of sites at which species j occurs, and Sij is the number of sites at which they co-occur. So (ri-Sij) is the number of times that species i occurs without species j, and (rj-Sij) is the number of times species j occurs without species i. Cij counts the checkerboard units for one species pair, and the total across your matrix will be highly dependent on your sample size. They suggest standardizing by the total number of possible species pairs (P), which for M species is
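To make equation (1) concrete, here is a minimal Python sketch that counts checkerboard units for a single species pair. It assumes each species is stored as a list of 0/1 presences across sites; the function name `checkerboard_units` and the toy data are hypothetical, made up for illustration and not taken from the paper.

```python
def checkerboard_units(row_i, row_j):
    """Equation (1): C_ij = (r_i - S_ij) * (r_j - S_ij)."""
    r_i = sum(row_i)                                  # sites where species i occurs
    r_j = sum(row_j)                                  # sites where species j occurs
    s_ij = sum(a * b for a, b in zip(row_i, row_j))   # sites where both occur
    return (r_i - s_ij) * (r_j - s_ij)

# Hypothetical toy data: 5 sites, the two species share only the last site.
species_a = [1, 1, 0, 0, 1]
species_b = [0, 0, 1, 1, 1]
units = checkerboard_units(species_a, species_b)
print(units)  # (3 - 1) * (3 - 1) = 4
```

Two species that always occur together give 0, and the count grows as each species occupies more sites that the other lacks.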

P=M(M-1)/2 (2)

So then the C-score, which represents the number of checkerboard units per species pair, would be:

C-score=Cij/P. (3)

Stone and Roberts (1990) are quite clear that equation (3) defines the C-score: “We define the checkerboard score for a particular colonisation pattern (matrix) as the mean number of checkerboard units *per species-pair* of the community” (italics are mine).
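Here is a rough Python sketch of the two quantities I compare in the rest of this post: the mean of Cij over all species pairs, and that mean divided by P, which is how I apply equation (3) below. The matrix layout (one row per species, one column per site), the function names, and the toy matrix are all my own assumptions for illustration; none of them come from the paper.

```python
from itertools import combinations

def checkerboard_units(row_i, row_j):
    """Equation (1) for one species pair."""
    s_ij = sum(a * b for a, b in zip(row_i, row_j))
    return (sum(row_i) - s_ij) * (sum(row_j) - s_ij)

def n_pairs(matrix):
    """Equation (2): P = M(M-1)/2 for M species (rows)."""
    m = len(matrix)
    return m * (m - 1) // 2

def mean_cij(matrix):
    """Mean of C_ij over all species pairs."""
    total = sum(checkerboard_units(a, b) for a, b in combinations(matrix, 2))
    return total / n_pairs(matrix)

def mean_cij_over_p(matrix):
    """Mean C_ij divided by P: equation (3) as it is applied in this post."""
    return mean_cij(matrix) / n_pairs(matrix)

# Hypothetical toy matrix: 4 species (rows) by 4 sites (columns).
toy = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]
print(n_pairs(toy), mean_cij(toy), mean_cij_over_p(toy))
```

For this toy matrix the six pairs contribute 4, 1, 1, 1, 1 and 4 checkerboard units, so the mean is 2.0 and dividing by P = 6 gives about 0.33.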

Luckily, they include some of their example matrices right in the paper. They define two matrices, U and V, which have the same row and column sums, but matrix U has a lot of checkerboards relative to matrix V.

They calculate C(U) as 52.6 and C(V) as 30.3. But these values represent the mean of Cij over the species pairs in each matrix (equation 1), not the C-score calculated from equation 3. If we use equation 3, we get a C-score of 0.28 for matrix U and 0.16 for matrix V. Indeed, even though they define the C-score as equation 3, every time they present a C-score it is actually the mean of equation 1. For example, on page 76, they discuss matrices B, D, E and F and present C(B)=C(E)=C(F)=2.67 and C(D)=2. Again, these are the mean of Cij. If I calculate the C-score as they define it using equation 3, I get 0.45 and 0.33 respectively. Later in the paper they analyze two data sets. I don’t have access to those data sets, but given the scale of the C-scores they present (i.e., they are much greater than 1), I strongly suspect they were also calculated as the mean of Cij, not as the C-score. What does this mean for their conclusions?
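The arithmetic behind those numbers is easy to check. The sketch below divides each reported value by P. Note the big assumption: the species counts (20 for U and V, 4 for B, D, E and F) are inferred from the ratios of the numbers quoted above, not read off the paper, so treat them as hypothetical.

```python
def n_pairs(n_species):
    """Equation (2): P = M(M-1)/2."""
    return n_species * (n_species - 1) // 2

# Values reported in the paper, as quoted in this post.
reported_mean_cij = {"U": 52.6, "V": 30.3, "B/E/F": 2.67, "D": 2.0}

# ASSUMPTION: species counts inferred from the ratios, not taken from the paper.
assumed_species = {"U": 20, "V": 20, "B/E/F": 4, "D": 4}

results = {}
for name, value in reported_mean_cij.items():
    p = n_pairs(assumed_species[name])
    results[name] = value / p
    print(f"{name}: P = {p}, mean Cij / P = {results[name]:.3f}")
```

Under those assumed sizes, P is 190 for U and V and 6 for the small matrices, and the printed ratios agree with the values I quote above to within rounding.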

Well, I did some playing of my own, generating random matrices and calculating the mean of Cij like Stone and Roberts actually did, and calculating the C-score like Stone and Roberts say they did. The two are perfectly linearly correlated; the mean of Cij is just roughly three orders of magnitude larger than the C-score (1467 is the slope of the linear regression).
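A comparison along these lines can be sketched in plain Python. This is not the code I originally ran: the matrix dimensions, fill probability, number of replicates, and seed are all arbitrary assumptions, so the slope it produces will not be 1467. In this setup every matrix has the same number of species, so the slope of mean Cij against mean Cij / P is simply P by construction.

```python
import random
from itertools import combinations

def mean_cij(matrix):
    """Mean of C_ij = (r_i - S_ij)(r_j - S_ij) over all species pairs."""
    units = []
    for a, b in combinations(matrix, 2):
        s_ij = sum(x * y for x, y in zip(a, b))
        units.append((sum(a) - s_ij) * (sum(b) - s_ij))
    return sum(units) / len(units)

def ols_slope(x, y):
    """Slope of the least-squares regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

random.seed(1)                       # arbitrary seed, for repeatability only
n_species, n_sites = 10, 15          # ASSUMED dimensions, not from the post
p = n_species * (n_species - 1) // 2 # P = 45 for 10 species

means, scaled = [], []
for _ in range(200):
    m = [[random.randint(0, 1) for _ in range(n_sites)] for _ in range(n_species)]
    value = mean_cij(m)
    means.append(value)              # what the paper's numbers look like
    scaled.append(value / p)         # equation (3) as applied in this post

slope = ols_slope(scaled, means)
print(p, slope)
```

With a fixed number of species the two measures differ only by the constant factor P, which is why the relationship is perfectly linear.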

So what does all of this mean? At the end of the day, I don’t think this changes anything but the scale of their analyses. The numbers they present are just larger than they should be by a scale factor (~3 orders of magnitude in my random matrices), but that probably doesn’t change their conclusions. So if it doesn’t matter, why did I just waste your time on this silly blog? Well, I spent the afternoon trying to figure out why I was getting the wrong answers from matrices U and V, so I thought I would share this with the internets.