Cancer Clusters: How To Be Sure You Have One. K. Hwang, S. Ferson, and R. Grimson, Applied Biomathematics, Setauket, NY and SUNY Stony Brook, NY
How much should we worry about ten more cases of childhood cancer in a city than would be expected from its population size and the background incidence rate? Until now, it has been difficult or impossible to compute P-values for putative disease clusters when the data set is small (as it usually is for cancer). Traditional statistical tests for clustering assume asymptotically large sample sizes and are therefore not strictly applicable when data are sparse. Numerical studies show, in fact, that widely used tests such as the chi-square test routinely and strongly overestimate the evidence for clustering. Thus, they can cause more alarm than is warranted. We describe several new statistical methods, implemented in a convenient software package, that can be used to compute exact P-values for clustering. These new methods can be used whatever the size of the data set, and are especially useful when data sets are extremely small. They provide public health researchers and epidemiologists with tools that, for the first time, have wide applicability for detecting clustering and other epidemiologic patterns in data sets of the size usually encountered in practice. We describe the relative statistical power of these tests under different kinds of clustering mechanisms and data set configurations.
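To illustrate the gap between an asymptotic and an exact P-value on sparse data, the following is a minimal sketch and not the methods or software described in this abstract. It assumes a simple Poisson model for the case count in a single city and uses hypothetical numbers (5 expected cases, 15 observed, i.e. ten more than expected). The function names are illustrative only.

```python
import math


def poisson_upper_tail(observed, expected):
    """Exact P(X >= observed) for X ~ Poisson(expected)."""
    term = math.exp(-expected)  # P(X = 0)
    cdf = 0.0
    for k in range(observed):   # accumulate P(X = 0) .. P(X = observed - 1)
        cdf += term
        term *= expected / (k + 1)
    return max(0.0, 1.0 - cdf)


def chi_square_1df_p_value(observed, expected):
    """Asymptotic P-value from (O - E)^2 / E referred to chi-square with 1 df."""
    statistic = (observed - expected) ** 2 / expected
    # Survival function of chi-square(1 df): P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(statistic / 2.0))


# Hypothetical inputs, chosen only for illustration.
expected_cases = 5.0
observed_cases = 15

exact_p = poisson_upper_tail(observed_cases, expected_cases)
approx_p = chi_square_1df_p_value(observed_cases, expected_cases)

print(f"exact Poisson P-value:       {exact_p:.3g}")
print(f"chi-square (1 df) P-value:   {approx_p:.3g}")
```

Under these assumed inputs the chi-square approximation yields a smaller P-value than the exact Poisson tail, which is the direction of error the abstract warns about: the asymptotic test overstates the evidence when counts are small.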