Presence-only datasets represent an important source of information on species' distributions. Collections of presence-only data, however, are often spatially biased, particularly along roads and near urban populations. These biases can lead to inaccurate inferences and predicted distributions. We demonstrate a new approach to accounting for effort bias in presence-only data by explicitly incorporating sampling bias in species distribution modelling.
First, we used logistic regression to model the sampling effort behind records of rare vascular plants, bryophytes and butterflies in Alberta. Second, we simulated presence/absence data for nine ‘virtual’ species based on three relative occurrence thresholds – common, rare and very rare – for each taxonomic group. We sampled these virtual species using our bias model to represent the sampling effort typical of presence-only datasets. We then modelled the distributions of these virtual species using logistic regression and attempted to recover their original simulated distributions using a sample weighting term (prior weight) estimated as the inverse of the probability of sampling. Bias-adjusted model estimates were compared to those obtained from random samples and from biased samples without adjustment. We also compared prior-weight adjustment to the bias-file and target-group background approaches in Maxent.
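The weighting step can be illustrated with a minimal sketch in Python. It is not the analysis code used in this study. The covariate, the coefficient values, the presence-versus-background design, the use of a known (rather than estimated) sampling probability, and the rescaling of weights to a mean of one are all assumptions made for illustration. The helper names `fit_logistic`, `sampling_prob` and `presence_background_slope` are hypothetical.

```python
import math
import random


def expit(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(x, y, w=None, iters=50):
    """Weighted logistic regression (intercept + one covariate) by Newton-Raphson."""
    if w is None:
        w = [1.0] * len(x)
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, wi in zip(x, y, w):
            p = expit(b0 + b1 * xi)
            r = wi * (yi - p)          # weighted residual (gradient term)
            v = wi * p * (1.0 - p)     # weighted variance (Hessian term)
            g0 += r
            g1 += r * xi
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    return b0, b1


def sampling_prob(x):
    # Assumed bias model: sampling effort increases along the covariate.
    # In practice this probability would be estimated, not known.
    return expit(1.0 * x)


def presence_background_slope(presences, background, weights=None):
    """Slope of a logistic regression of presences (1) against background points (0)."""
    x = presences + background
    y = [1] * len(presences) + [0] * len(background)
    w = None if weights is None else weights + [1.0] * len(background)
    return fit_logistic(x, y, w)[1]


rng = random.Random(42)

# Hypothetical landscape: one standardized covariate per cell.
cells = [rng.uniform(-2.0, 2.0) for _ in range(20000)]

# Virtual species: cells where the simulated species is present.
present = [x for x in cells if rng.random() < expit(-2.0 + 1.0 * x)]

# Biased presence-only sample, and a random sample of equal size for reference.
biased = [x for x in present if rng.random() < sampling_prob(x)]
unbiased = rng.sample(present, len(biased))
background = rng.sample(cells, 5000)

# Prior weights: inverse probability of sampling, rescaled to mean 1 (an assumption).
raw = [1.0 / sampling_prob(x) for x in biased]
mean_raw = sum(raw) / len(raw)
weights = [r / mean_raw for r in raw]

slope_reference = presence_background_slope(unbiased, background)
slope_biased = presence_background_slope(biased, background)
slope_weighted = presence_background_slope(biased, background, weights)

print("random sample slope:   ", round(slope_reference, 3))
print("biased, unweighted:    ", round(slope_biased, 3))
print("biased, prior-weighted:", round(slope_weighted, 3))
```

In this toy setting the unweighted fit absorbs the sampling gradient into the habitat coefficient, whereas the weighted fit should lie much closer to the estimate from the random sample. Rescaling the weights keeps the effective ratio of presences to background points comparable between the weighted and reference fits.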
Sample weighting recovered regression coefficients and mapped predictions estimated from unbiased presence-only data and improved model predictive accuracy as evaluated by regression and correlation coefficients, sensitivity and specificity. Similar model improvements were achieved using the Maxent bias-file method, but results were inconsistent for the target-group background approach.