Selection of survey sites maximizing uniformity in geography

Selection of sites to be sampled in a survey, with the goal of maximizing uniformity of points in geographic space.

Usage

uniformG_selection(master, expected_points, guess_distances = TRUE,
                   initial_distance = NULL, increase = NULL,
                   max_n_samplings = 1, replicates = 10,
                   use_preselected_sites = TRUE,
                   median_distance_filter = NULL, set_seed = 1,
                   verbose = TRUE, force = FALSE)

Arguments

master: master_matrix object derived from function prepare_master_matrix or master_selection object derived from functions random_selection, uniformE_selection, or EG_selection.
expected_points: (numeric) total number of survey points (sites) to be selected.
guess_distances: (logical) whether to use an internal algorithm to automatically select initial_distance and increase. Default = TRUE. If FALSE, initial_distance and increase must be defined.
initial_distance: (numeric) distance in km to be used for a first process of thinning and detection of remaining points. Default = NULL.
increase: (numeric) initial value to be added to or subtracted from initial_distance until reaching the number of expected_points. Default = NULL.
max_n_samplings: (numeric) maximum number of samples to be chosen after performing all thinning replicates. Default = 1.
replicates: (numeric) number of thinning replicates. Default = 10.
use_preselected_sites: (logical) whether to use sites that were defined as part of the selected sites prior to any selection. The object in master must contain the preselected site(s) in an element named "preselected_sites" for this argument to be effective. Default = TRUE. See details for more information on the approach used.
median_distance_filter: (character) optional argument defining a filter, based on the median distance among sites, that is used to select sets of sampling sites. The default, NULL, does not apply such a filter. Options are: "max" and "min".
set_seed: (numeric) integer value to specify an initial seed. Default = 1.
verbose: (logical) whether or not to print messages about the process. Default = TRUE.
force: (logical) whether to replace an existing set of sites selected with this method in master. Default = FALSE.

Value

A master_selection object (S3) with an element called selected_sites_G containing one or more sets of selected sites.

Details

Survey sites are selected by searching for maximum geographic distances among all sites. This approach helps in selecting points that cover most of the geographic extent of the region of interest. This type of selection could be appropriate when the region of interest has a complex geographic pattern (e.g., an archipelago). It does not consider environmental conditions in the region of interest, so important environmental combinations may not be represented in the final selection of sites.
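The idea of thinning points by distance until the expected number of sites remains can be sketched in base R. This is a minimal illustration under stated assumptions, not the algorithm implemented in uniformG_selection: the helper names haversine_km and thin_points, the simulated coordinates, and the starting distance and increment are all hypothetical, and the loop simply stops once the number of points is at or below the target instead of refining the distance further.

```r
# Hypothetical helper: great-circle (haversine) distance in km between one
# point (lon1, lat1) and one or more points (lon2, lat2).
haversine_km <- function(lon1, lat1, lon2, lat2, radius = 6371) {
  to_rad <- pi / 180
  dlon <- (lon2 - lon1) * to_rad
  dlat <- (lat2 - lat1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Hypothetical helper: greedy thinning. Points are visited in random order and
# a point is kept only if it is at least `distance` km from all kept points.
thin_points <- function(xy, distance) {
  kept <- integer(0)
  for (i in sample(nrow(xy))) {
    d <- haversine_km(xy[i, 1], xy[i, 2], xy[kept, 1], xy[kept, 2])
    if (all(d >= distance)) kept <- c(kept, i)
  }
  xy[kept, , drop = FALSE]
}

# Simulated candidate sites (longitude, latitude); assumed data for the sketch
set.seed(1)
xy <- cbind(lon = runif(300, -105, -95), lat = runif(300, 15, 25))

expected_points <- 40
distance <- 50   # assumed starting distance in km
increase <- 10   # assumed increment in km

# Increase the thinning distance until no more than expected_points remain
repeat {
  thinned <- thin_points(xy, distance)
  if (nrow(thinned) <= expected_points) break
  distance <- distance + increase
}

nrow(thinned)
distance
```

Because larger thinning distances leave fewer points, the loop mirrors the pattern of messages printed in the example of this page, where each distance tested is reported with the number of points it produced.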

Exploring the geographic and environmental spaces of the region of interest would be a crucial first step before selecting survey sites. Such explorations can be done using the function explore_data_EG.

If use_preselected_sites = TRUE and such sites are included as an element in the object in master, the approach for selecting uniform sites in geography differs from the one described above. User-preselected sites will always be part of the sites selected. Other points are selected using an algorithm that searches for sites that are uniformly distributed in geographic space but at a distance from preselected sites that helps maintain uniformity. Note that preselected sites will not be processed; therefore, uniformity of such points cannot be guaranteed.

As multiple sets could result from selection when use_preselected_sites is set to FALSE, the argument median_distance_filter can be used to select the set of sites with the maximum ("max") or minimum ("min") median distance among selected sites. The option "max" will increase the geographic distance among sampling sites, which could be desirable if the goal is to cover the region of interest more broadly. The other option, "min", could be used when the goal is to reduce the resources and time needed to sample such sites.
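The effect of a median distance filter can be illustrated with a short base R sketch. It is only an approximation of the concept, assuming candidate sets stored as longitude and latitude matrices; the helpers pairwise_km and pick_by_median and the simulated sets are hypothetical and are not part of biosurvey.

```r
# Hypothetical helper: all pairwise great-circle distances (km) in a set of
# sites given as a two-column matrix (longitude, latitude).
pairwise_km <- function(xy, radius = 6371) {
  lon <- xy[, 1] * pi / 180
  lat <- xy[, 2] * pi / 180
  a <- sin(outer(lat, lat, "-") / 2)^2 +
    outer(cos(lat), cos(lat)) * sin(outer(lon, lon, "-") / 2)^2
  d <- 2 * radius * asin(pmin(1, sqrt(a)))
  d[upper.tri(d)]  # each pair counted once, diagonal excluded
}

# Hypothetical helper: return the set with the largest ("max") or smallest
# ("min") median pairwise distance.
pick_by_median <- function(sets, filter = c("max", "min")) {
  filter <- match.arg(filter)
  med <- vapply(sets, function(s) median(pairwise_km(s)), numeric(1))
  sets[[if (filter == "max") which.max(med) else which.min(med)]]
}

# Simulated candidate sets with different geographic spread (assumed data)
set.seed(1)
make_set <- function(spread) {
  cbind(lon = runif(10, -100 - spread, -100 + spread),
        lat = runif(10, 20 - spread, 20 + spread))
}
candidate_sets <- list(tight = make_set(0.5), medium = make_set(2),
                       wide = make_set(5))

broad <- pick_by_median(candidate_sets, "max")    # broader coverage
compact <- pick_by_median(candidate_sets, "min")  # shorter travel distances

median(pairwise_km(broad))
median(pairwise_km(compact))
```

Using the median instead of the mean keeps the comparison among sets less sensitive to a few very distant pairs of sites.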

Examples

# Data
m_matrix <- read_master(system.file("extdata/m_matrix.rds",
                                    package = "biosurvey"))

# Selecting sites uniformly in G space
selectionG <- uniformG_selection(m_matrix, expected_points = 40,
                                 max_n_samplings = 1, replicates = 5)
#> Element 'preselected_sites' in 'master' is NULL, setting
#> 'use_preselected_sites' = FALSE
#> Running algorithm for selecting sites, please wait...
#>     Distance  93.995  resulted in  99  points
#>     Distance  103.394  resulted in  89  points
#>     Distance  112.794  resulted in  74  points
#>     Distance  122.193  resulted in  63  points
#>     Distance  131.593  resulted in  54  points
#>     Distance  140.992  resulted in  50  points
#>     Distance  150.391  resulted in  44  points
#>     Distance  159.791  resulted in  40  points
#> Total number of sites selected: 40