Selection of survey sites maximizing uniformity in geography
Source:R/uniformG_selection.R
Selection of sites to be sampled in a survey, with the goal of maximizing uniformity of points in geographic space.
Usage
uniformG_selection(master, expected_points, guess_distances = TRUE,
                   initial_distance = NULL, increase = NULL,
                   max_n_samplings = 1, replicates = 10,
                   use_preselected_sites = TRUE,
                   median_distance_filter = NULL, set_seed = 1,
                   verbose = TRUE, force = FALSE)
Arguments
- master
master_matrix object derived from function prepare_master_matrix, or master_selection object derived from functions random_selection, uniformE_selection, or EG_selection.
- expected_points
(numeric) total number of survey points (sites) to be selected.
- guess_distances
(logical) whether or not to use an internal algorithm to automatically select initial_distance and increase. Default = TRUE. If FALSE, initial_distance and increase must be defined.
- initial_distance
(numeric) distance in km to be used for a first process of thinning and detection of remaining points. Default = NULL.
- increase
(numeric) initial value to be added to or subtracted from initial_distance until reaching the number of expected_points. Default = NULL.
- max_n_samplings
(numeric) maximum number of samples to be chosen after performing all thinning replicates. Default = 1.
- replicates
(numeric) number of thinning replicates. Default = 10.
- use_preselected_sites
(logical) whether to use sites that were defined as part of the selected sites before any selection is run. The object in master must contain the preselected site(s) in an element named "preselected_sites" for this argument to take effect. Default = TRUE. See Details for more information on the approach used.
- median_distance_filter
(character) optional argument to define a median distance-based filter based on which sets of sampling sites will be selected. The default, NULL, does not apply such a filter. Options are: "max" and "min".
- set_seed
(numeric) integer value to specify an initial seed. Default = 1.
- verbose
(logical) whether or not to print messages about the process. Default = TRUE.
- force
(logical) whether to replace an existing set of sites selected with this method in master. Default = FALSE.
Value
A master_selection object (S3) with an element called selected_sites_G containing one or more sets of selected sites.
Details
Survey sites are selected by searching for the maximum geographic distance among all sites. This approach helps select points that cover most of the geographic extent of the region of interest. It could be appropriate when the region of interest has a complex geographic pattern (e.g., an archipelago). Because this type of selection does not consider environmental conditions in the region of interest, important environmental combinations may not be represented in the final selection of sites.
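The idea of thinning points by distance and adjusting that distance until the expected number of sites remains can be illustrated with a minimal base-R sketch. This is not the code used inside uniformG_selection; the helper names (haversine_km, thin_points, search_distance), the greedy thinning rule, and the rule of halving the increment after an overshoot are all assumptions made for illustration.

```r
# Great-circle (haversine) distance in km from one point to many points.
# Coordinates are decimal degrees; the 6371 km Earth radius is approximate.
haversine_km <- function(lon1, lat1, lon2, lat2, radius = 6371) {
  to_rad <- pi / 180
  dlon <- (lon2 - lon1) * to_rad
  dlat <- (lat2 - lat1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Greedy thinning (assumed rule): visit points in random order and keep a
# point only if it is at least `distance` km from every point already kept.
thin_points <- function(xy, distance) {
  kept <- integer(0)
  for (i in sample(nrow(xy))) {
    if (length(kept) == 0 ||
        all(haversine_km(xy[i, 1], xy[i, 2],
                         xy[kept, 1], xy[kept, 2]) >= distance)) {
      kept <- c(kept, i)
    }
  }
  xy[kept, , drop = FALSE]
}

# Distance search (assumed rule): add `increase` while too many points
# remain; when too few remain, halve `increase` and step back by it.
# Stops at an exact match or after `max_iter` attempts.
search_distance <- function(xy, expected_points, initial_distance, increase,
                            max_iter = 100) {
  distance <- initial_distance
  for (iter in seq_len(max_iter)) {
    thinned <- thin_points(xy, distance)
    n <- nrow(thinned)
    if (n == expected_points) break
    if (n > expected_points) {
      distance <- distance + increase
    } else {
      increase <- increase / 2
      distance <- distance - increase
    }
  }
  list(distance = distance, sites = thinned)
}

# Hypothetical data: 200 random points in a 10 x 10 degree window
set.seed(1)
xy <- cbind(lon = runif(200, -100, -90), lat = runif(200, 15, 25))
res <- search_distance(xy, expected_points = 40,
                       initial_distance = 50, increase = 10)
```

Because the thinning order is random, repeated runs with the same distance can retain different numbers of points, which is one reason to run several replicates.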
Exploring the geographic and environmental spaces of the region of interest is a crucial first step before selecting survey sites. Such explorations can be done using the function explore_data_EG.
If use_preselected_sites = TRUE and such sites are included as an element in the object in master, the approach for selecting uniform sites in geography differs from the one described above. User-preselected sites will always be part of the sites selected. Other points are selected with an algorithm that searches for sites that are uniformly distributed in geographic space but at a distance from preselected sites that helps maintain uniformity. Note that preselected sites are not processed; therefore, uniformity among those points cannot be guaranteed.
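A minimal sketch of how preselected sites can constrain the remaining selection is shown below. It is an illustration only, not the package's algorithm: the function select_with_preselected is hypothetical, coordinates are assumed to be planar and expressed in km, and candidates are added greedily in random order.

```r
# Keep every preselected site, then add candidates that are at least
# `distance` km from all sites already in the selection (assumed rule).
select_with_preselected <- function(candidates, preselected, distance) {
  selected <- preselected
  for (i in sample(nrow(candidates))) {
    d <- sqrt((selected[, 1] - candidates[i, 1])^2 +
              (selected[, 2] - candidates[i, 2])^2)
    if (all(d >= distance)) {
      selected <- rbind(selected, candidates[i, , drop = FALSE])
    }
  }
  selected
}

# Hypothetical data: 300 candidates in a 1000 x 1000 km square
set.seed(1)
candidates <- cbind(x = runif(300, 0, 1000), y = runif(300, 0, 1000))

# Two preselected sites only 20 km apart: both are kept even though they
# violate the 100 km spacing enforced on the sites that are added.
preselected <- cbind(x = c(500, 520), y = c(500, 500))

sites <- select_with_preselected(candidates, preselected, distance = 100)
```

The two preselected sites stay closer together than the chosen spacing, which illustrates why uniformity cannot be ensured among points that are fixed in advance.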
Because multiple sets could result from the selection when use_preselected_sites is set to FALSE, the argument median_distance_filter can be used to keep the set of sites with the maximum ("max") or minimum ("min") median distance among selected sites. The option "max" increases the geographic distance among sampling sites, which could be desirable if the goal is to cover the region of interest more broadly. The other option, "min", could be used when the goal is to reduce the resources and time needed to sample such sites.
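The effect of a median distance-based filter can be sketched as follows. This is an illustrative assumption about how such a filter might work, not the package's code: the helpers median_pairwise and filter_by_median are hypothetical, and planar coordinates are used for simplicity.

```r
# Median of all pairwise distances within one set of sites
median_pairwise <- function(xy) median(as.vector(dist(xy)))

# Return the set with the largest ("max") or smallest ("min") median distance
filter_by_median <- function(site_sets, option = c("max", "min")) {
  option <- match.arg(option)
  med <- vapply(site_sets, median_pairwise, numeric(1))
  pick <- if (option == "max") which.max(med) else which.min(med)
  site_sets[[pick]]
}

# Three hypothetical candidate sets of four sites each
spread  <- cbind(x = c(0, 100, 0, 100), y = c(0, 0, 100, 100))
compact <- spread / 10
mid     <- spread / 2
sets <- list(spread, compact, mid)

widest   <- filter_by_median(sets, "max")  # broadest coverage
tightest <- filter_by_median(sets, "min")  # shortest travel among sites
```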
Examples
# Data
m_matrix <- read_master(system.file("extdata/m_matrix.rds",
                                    package = "biosurvey"))

# Selecting sites uniformly in G space
selectionG <- uniformG_selection(m_matrix, expected_points = 40,
                                 max_n_samplings = 1, replicates = 5)
#> Element 'preselected_sites' in 'master' is NULL, setting
#> 'use_preselected_sites' = FALSE
#> Running algorithm for selecting sites, please wait...
#> Distance 93.995 resulted in 99 points
#> Distance 103.394 resulted in 89 points
#> Distance 112.794 resulted in 74 points
#> Distance 122.193 resulted in 63 points
#> Distance 131.593 resulted in 54 points
#> Distance 140.992 resulted in 50 points
#> Distance 150.391 resulted in 44 points
#> Distance 159.791 resulted in 40 points
#> Total number of sites selected: 40