Reimagining the Science Behind Sample Size

Wildlife of the same species regularly gather in groups on the landscape.

Some groups represent families. Other groups arise when animals congregate to, say, sleep in a den for the night, visit a reliable water hole during the dry season, or make their annual migration.

Grouping behaviors can influence disease dynamics. Similar to how the flu works its way through every member of a single household, other contagious diseases can spread among animals that interact closely.

Biologists who study wildlife diseases often want to know if a specific disease is present or absent in a population of wildlife. Statisticians regularly help these biologists by calculating the number of animals that must be tested for scientists to be sufficiently confident that the overall population does not harbor the disease. But traditional sample size computations often assume that each animal is ‘its own island’. That is, they assume the disease status of any two individuals is independent (uncorrelated), an assumption that likely fails when individuals exhibit grouping behaviors.
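As a rough illustration of the traditional ‘island’ calculation, the sketch below computes how many all-negative tests are needed to be confident that a disease is not present at or above a chosen prevalence. It is a standard textbook approximation, not the team's modified equations. It assumes independent animals, a perfect diagnostic test, and a very large population. The function name and the example prevalence values are hypothetical choices made for this illustration.

```python
import math


def independent_sample_size(design_prevalence, confidence=0.95):
    """Smallest n such that n negative tests support 'disease absent'.

    Assumptions (for illustration only): every animal's disease status is
    independent, the test is perfect, and the population is very large.
    If true prevalence were design_prevalence, the chance that n randomly
    sampled animals all test negative is (1 - design_prevalence) ** n.
    We pick the smallest n that pushes that chance to 1 - confidence or less.
    """
    if not 0 < design_prevalence < 1:
        raise ValueError("design_prevalence must be between 0 and 1")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - design_prevalence))


# Hypothetical design prevalences: 1% and 5%, both at 95% confidence.
print(independent_sample_size(0.01))
print(independent_sample_size(0.05))
```

Under these assumptions, the rarer the disease one wants to rule out, the more animals must be tested, which is why any defensible reduction in sample size is valuable.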

A group of white-tailed deer of varied ages.

We asked ourselves, ‘Can we leverage the fact that transmission of some diseases occurs more easily among groups when collecting samples to study disease in overall populations?’ Led by statistician Dr. Booth and wildlife disease ecologist Dr. Schuler, our interdisciplinary team dove into the foundational math equations underlying sample size and modified them to account for the fact that animals that hang out together frequently share their diseases.

We found that in the study of certain contagious diseases, it is possible to make use of animal clustering to obtain the same amount of population-scale information by sampling FEWER individuals than traditional sample size calculations prescribe! Clustering allows for sampling savings!

To understand how, suppose you are interested in how many beavers have contagious disease X in New York State (NYS). Beavers naturally live in family groups, where members congregate in their lodge. If scientists test Papa beaver for contagious disease X, and Papa beaver regularly interacts with Mama beaver and Baby beaver, then the one diagnostic result from Papa beaver can suggest the diagnostic results of the entire family! Now scale that same concept up to the set of ponds representing the entire beaver population in NYS. If you randomly visit a set of ponds, where each pond contains only one family of beavers, and take diagnostic samples from only the Papa beavers (n beavers), then you can infer disease information for many more than n beavers because the disease status of each beaver family is correlated with that of its Papa!

Beaver family swimming near their lodge. Beavers stay in family groups, which include an adult male and female, young kits, and slightly older offspring.

But CAUTION! This ‘sampling savings’ is not applicable in every epidemiological or sampling situation!

Disease type matters when trying to save on samples! For example, say Papa beaver has a non-contagious disease like Y. Testing Papa beaver for Y won’t tell us anything about the status of Y in Mama and Baby beaver, so these novel equations make us no better off than before.

The distribution of samples across the landscape also affects sampling savings! For example, say you have money to pay for 100 diagnostic tests. It turns out, it is better to sample a small number of beavers from many families than a large number of beavers from few families. After all, if you test a lot of beavers within the same family, each test provides diminishing returns of new disease information, meaning that testing becomes increasingly wasteful. If scientists fail to consider the grouping behavior of the beavers, they may test far too many beavers from the same family and learn much less about the overall beaver population of NYS than they could had they spread their sampling out across many beaver families.
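The effect of spreading tests across families can be sketched with a simple beta-binomial model, a common way to represent correlated disease status within clusters. This is an illustration under stated assumptions, not the model from the full article. The function name, the 5% prevalence, and the within-family correlation of 0.5 are all hypothetical values chosen for the example.

```python
def prob_all_negative(n_clusters, per_cluster, prevalence, icc):
    """Chance that every test comes back negative even though disease is present.

    Illustrative assumptions: clusters (families) are independent of one
    another, and disease status within a cluster follows a beta-binomial
    distribution with mean `prevalence` and intra-cluster correlation `icc`
    (strictly between 0 and 1).
    """
    # Beta parameters implied by the chosen mean and correlation.
    a = prevalence * (1 - icc) / icc
    b = (1 - prevalence) * (1 - icc) / icc

    # Probability that all sampled animals in one cluster are negative.
    one_cluster = 1.0
    for j in range(per_cluster):
        one_cluster *= (b + j) / (a + b + j)

    # Independent clusters multiply.
    return one_cluster ** n_clusters


# Same hypothetical budget of 100 tests, allocated two ways.
spread = prob_all_negative(n_clusters=100, per_cluster=1, prevalence=0.05, icc=0.5)
clumped = prob_all_negative(n_clusters=10, per_cluster=10, prevalence=0.05, icc=0.5)
print(f"Chance of missing the disease, 1 beaver from each of 100 families: {spread:.4f}")
print(f"Chance of missing the disease, 10 beavers from each of 10 families: {clumped:.4f}")
```

In this sketch, the clumped design is far more likely to miss the disease than the spread-out design, because each additional test within an already-sampled family adds little new information.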

In summary, if (1) the disease of interest is contagious, (2) the wildlife species of interest tends to cluster in time or space in some predictable manner, and (3) we can sample a few animals from each of as many different clusters as possible, then we can obtain scientifically valid sample sizes to be confident that a specific disease is absent from a wildlife population while sampling fewer individuals compared to using traditional sample size equations.

See the exciting full article, "Sample Size for Estimating Disease Prevalence in Free-Ranging Wildlife Populations: A Bayesian Modeling Approach," published in the Journal of Agricultural, Biological, and Environmental Statistics.
