Abstract of Meeting Paper

Society for Risk Analysis 1995 Annual Meeting

Defining the Statistical Moments of Small Data Sets: Indoor Activity Patterns Case Study. T. E. McKone, K. L. Kiefer, R. C. Currie, S. Geng, and D. P. H. Hsieh, Risk Sciences Program, Environmental Toxicology Department, University of California, Davis, CA, 95616

In analyzing parameter uncertainty and variability for human exposure, one issue that must be confronted is how to rank individual inputs according to their contribution to the overall variance in the predicted health risk. An important first step in this process is to define the range and statistical moments (mean, variance, etc.) of the input data. The purpose of this paper is to demonstrate how small data sets, in combination with prior information, can be used to maximize the precision and accuracy of the moment estimates for these data sets. We demonstrate the process using data sets of reported shower duration as a case study. Shower duration is an indoor activity related to dermal and inhalation exposures to contaminated ground water. We consider three data sets in our analysis: (1) a group of 2000 individuals in Perth, Australia, who recorded and reported their shower durations; (2) two groups of U.S. university students, consisting of 27 and 25 individuals respectively, who first estimated their shower durations and then measured and reported them; and (3) a group of 16 environmental professionals who likewise first estimated and then measured and reported their shower durations. We fit the moments of these data sets to distributions using three methods: (1) graphical goodness-of-fit methods (graphical method), (2) constrained maximum entropy (subjective method), and (3) resampling (bootstrap method). Using these data sets and methods, we examined the extent to which the large data set from Australia provided an appropriate proxy distribution of shower duration relative to the smaller data sets for U.S. students and U.S. environmental professionals. This comparison is based on the reliability of the moment estimates for the various data sets.
We then assessed the reliability with which graphical, subjective, and resampling methods can be used to characterize the likely range of the "true" moments of any distribution when only a small subset of the full set of samples is available.
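The resampling approach can be illustrated with a minimal percentile-bootstrap sketch in Python. This is not the authors' implementation: the function name `bootstrap_moments`, the 2000 resamples, the 95% percentile interval, and the shower durations below (made-up values in minutes, not data from any of the three study groups) are all assumptions chosen for illustration. Each resample draws n observations with replacement from the small sample, and the spread of the resampled means and variances indicates the likely range of the underlying moments.

```python
import random
import statistics


def bootstrap_moments(sample, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap intervals for the mean and variance of a small sample.

    Illustrative sketch only (hypothetical helper, not the paper's code).
    Returns ((mean_lo, mean_hi), (var_lo, var_hi)).
    """
    if len(sample) < 2:
        raise ValueError("need at least two observations to estimate a variance")
    rng = random.Random(seed)  # fixed seed so results are reproducible
    n = len(sample)
    means, variances = [], []
    for _ in range(n_resamples):
        # Draw n observations with replacement from the original sample.
        resample = [rng.choice(sample) for _ in range(n)]
        means.append(statistics.fmean(resample))
        variances.append(statistics.variance(resample))  # sample (n - 1) variance
    means.sort()
    variances.sort()
    # Indices of the alpha/2 and 1 - alpha/2 percentiles of the bootstrap distribution.
    lo = int((alpha / 2) * n_resamples)
    hi = int((1 - alpha / 2) * n_resamples) - 1
    return (means[lo], means[hi]), (variances[lo], variances[hi])


if __name__ == "__main__":
    # Hypothetical shower durations in minutes (made up, NOT study data).
    durations = [5, 7, 8, 10, 6, 12, 9, 15, 4, 8, 11, 7, 6, 10, 9, 13]
    (m_lo, m_hi), (v_lo, v_hi) = bootstrap_moments(durations)
    print(f"sample mean {statistics.fmean(durations):.2f}, 95% interval [{m_lo:.2f}, {m_hi:.2f}]")
    print(f"sample variance {statistics.variance(durations):.2f}, 95% interval [{v_lo:.2f}, {v_hi:.2f}]")
```

The percentile interval is the simplest bootstrap interval; with samples as small as 16 to 27 observations it tends to be somewhat too narrow, particularly for the variance, so its coverage is worth checking against the graphical and maximum-entropy estimates.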

This work was performed in part under the auspices of the U.S. Department of Energy (DOE) through Lawrence Livermore National Laboratory under Contract W-7405-Eng-48 and in part at the University of California, Davis Risk Sciences Program with funding provided by the California Office of Environmental Health Hazard Assessment.