Selection of survey sites maximizing uniformity in environmental space considering geographic structure
Source: R/EG_selection.R
Selection of sites to be sampled in a survey, with the goal of maximizing uniformity of points in environmental space while considering geographic patterns of the data. Sets of points that are environmentally similar but have a disjoint pattern in geography receive two survey sites, placed so that they represent the two largest geographic clusters.
Usage
EG_selection(master, n_blocks, guess_distances = TRUE, initial_distance = NULL,
increase = NULL, max_n_samplings = 1, replicates = 10,
use_preselected_sites = TRUE, select_point = "E_centroid",
cluster_method = "hierarchical", median_distance_filter = NULL,
sample_for_distance = 250, set_seed = 1,
verbose = TRUE, force = FALSE)
Arguments
- master
master_matrix object derived from the function prepare_master_matrix, or master_selection object derived from the functions random_selection, uniformG_selection, or uniformE_selection.
- n_blocks
(numeric) number of blocks to be selected to be used as the base for further explorations. If preselected sites are used, this number must be larger than the number of unique blocks already represented by such sites.
- guess_distances
(logical) whether or not to use an internal algorithm to automatically select initial_distance and increase. Default = TRUE. If FALSE, initial_distance and increase must be defined.
- initial_distance
(numeric) Euclidean distance to be used for a first process of thinning and detection of remaining blocks. See details in point_thinning. Default = NULL.
- increase
(numeric) initial value to be added to or subtracted from initial_distance until reaching the number of expected_points. Default = NULL.
- max_n_samplings
(numeric) maximum number of samples to be chosen after performing all thinning replicates. Default = 1.
- replicates
(numeric) number of thinning replicates performed to select blocks uniformly. Default = 10.
- use_preselected_sites
(logical) whether to use sites that have been defined as part of the selected sites prior to any selection. The object in master must contain the preselected site(s) in the slot named "preselected_sites" for this argument to be effective. Default = TRUE. See details for more information on the approach used.
- select_point
(character) how the point(s) for each block or cluster will be selected. Three options are available: "random", "E_centroid", and "G_centroid". "E_centroid" and "G_centroid" indicate that the point(s) closest to the respective centroid (in environmental or geographic space) will be selected. Default = "E_centroid".
- cluster_method
(character) name of the method to be used for detecting geographic clusters of points inside each block. Options are "hierarchical" and "k-means"; default = "hierarchical". See details in find_clusters.
- median_distance_filter
(character) optional argument defining a filter, based on the median distance among selected sites, used to choose among sets of sampling sites. The default, NULL, does not apply such a filter. Options are "max" and "min". See details.
- sample_for_distance
(numeric) number of points to be sampled when measuring the geographic distances among points in blocks created in environmental space. These distances are then used to test whether points are distributed uniformly in geography. Default = 250.
- set_seed
(numeric) integer value to specify an initial seed. Default = 1.
- verbose
(logical) whether or not to print messages about the process. Default = TRUE.
- force
(logical) whether to replace an existing set of sites selected with this method in master. Default = FALSE.
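To make the "E_centroid" option more concrete, the base R sketch below picks, for each block, the point closest to that block's centroid in environmental space. It is only an illustration under assumptions: the toy data frame, its columns (PC1, PC2, Block), and the helper closest_to_centroid() are hypothetical and do not reflect how biosurvey implements select_point.

```r
# Hypothetical sketch (not biosurvey code): for each block, keep the point
# closest (Euclidean distance) to the block's centroid in the chosen variables.
closest_to_centroid <- function(pts, vars = c("PC1", "PC2"), block = "Block") {
  picked <- lapply(split(pts, pts[[block]]), function(b) {
    centroid <- colMeans(b[, vars, drop = FALSE])
    # distance of every point in the block to the block centroid
    d <- sqrt(rowSums(sweep(as.matrix(b[, vars]), 2, centroid)^2))
    b[which.min(d), , drop = FALSE]
  })
  do.call(rbind, picked)
}

# toy data: two blocks with three points each
pts <- data.frame(
  PC1 = c(0, 1, 2, 10, 11, 15),
  PC2 = c(0, 1, 2, 10, 11, 15),
  Block = c(1, 1, 1, 2, 2, 2)
)
sel <- closest_to_centroid(pts)
sel
```

Passing geographic columns instead (e.g., vars = c("Longitude", "Latitude")) gives a rough analogue of "G_centroid" that ignores map projection.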
Value
A master_selection object (S3) with a special element called selected_sites_EG, containing one or more sets of selected sites depending on max_n_samplings and median_distance_filter.
Details
Two important steps are needed before using this function: 1) exploring data
in environmental and geographic spaces, and 2) performing a regionalization
of the environmental space. Exploring the data can be done using the function
explore_data_EG. This step is optional but strongly recommended, as important
decisions that need to be taken depend on the configuration of the data in
the two spaces. A regionalization of the environmental space of the region of
interest helps in defining important parts of your region that should be
considered to select sites. This can be done using the function make_blocks.
Later, the regions created in environmental space will be used for selecting
one or more sampling sites per block, depending on the geographic pattern of
such environmental combinations.
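A regionalization of environmental space can be pictured as a grid laid over two variables. The sketch below assumes a simple equal-width grid with n_cols by n_rows cells; the helper grid_blocks() and the simulated data are hypothetical, and make_blocks() (including its block_type = "equal_area" option) may define blocks differently.

```r
# Hypothetical sketch (not make_blocks): split the ranges of two variables
# into equal-width intervals and number the resulting grid cells ("blocks").
grid_blocks <- function(x, y, n_cols, n_rows) {
  col_id <- cut(x, breaks = n_cols, labels = FALSE, include.lowest = TRUE)
  row_id <- cut(y, breaks = n_rows, labels = FALSE, include.lowest = TRUE)
  (row_id - 1) * n_cols + col_id  # block number, 1 .. n_cols * n_rows
}

# simulated principal-component scores for 200 points
set.seed(1)
env <- data.frame(PC1 = runif(200, -3, 6), PC2 = runif(200, -3, 5))
env$Block <- grid_blocks(env$PC1, env$PC2, n_cols = 10, n_rows = 10)
head(env)
```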
The process of survey-site selection with this function is the most complex among all functions in this package. This complexity derives from the aim of the function: to select sites that appropriately sample environmental combinations in the region of interest (environmental space), while considering the geographic patterns of such environmental regions (geographic space).
In this approach, the first step is to select candidate blocks (from the
ones obtained with make_blocks) that are uniformly distributed in
environmental space. The geographic configuration of points in such blocks
is explored to detect whether they are clustered (i.e., similar
environmental conditions are present in distant places in the region of
interest). For blocks whose points form one cluster in geography, only one
survey site is selected; for those with multiple clusters in geographic
space, two survey sites are selected, considering the two largest clusters.
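The one-site versus two-site decision can be sketched with base R hierarchical clustering on geographic coordinates. The helper sites_for_block(), the cut_height threshold used to decide how many clusters exist, and the random subsampling of points are assumptions made for illustration; they are not the uniformity test or the find_clusters() procedure used by EG_selection.

```r
# Hypothetical sketch: cluster a block's points in geography with
# stats::hclust, cut the tree at a user-chosen height (an assumption), and
# return one site per cluster for at most the two largest clusters.
sites_for_block <- function(lon, lat, cut_height, sample_for_distance = 250) {
  xy <- cbind(lon, lat)
  # subsample to keep dist() affordable (plays the role of sample_for_distance)
  if (nrow(xy) > sample_for_distance) {
    xy <- xy[sample(nrow(xy), sample_for_distance), , drop = FALSE]
  }
  if (nrow(xy) < 2) return(xy)
  groups <- cutree(hclust(dist(xy), method = "complete"), h = cut_height)
  sizes <- sort(table(groups), decreasing = TRUE)
  keep <- as.integer(names(sizes))[seq_len(min(2, length(sizes)))]
  # one representative per retained cluster: the point nearest its centroid
  t(sapply(keep, function(g) {
    pts <- xy[groups == g, , drop = FALSE]
    d <- rowSums(sweep(pts, 2, colMeans(pts))^2)
    pts[which.min(d), ]
  }))
}

# two distant groups of points sharing similar environments
set.seed(2)
lon <- c(rnorm(30, -110, 0.5), rnorm(20, -95, 0.5))
lat <- c(rnorm(30, 30, 0.5), rnorm(20, 18, 0.5))
sites <- sites_for_block(lon, lat, cut_height = 5)
sites
```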
If use_preselected_sites is TRUE and such sites are included as an element
in the object in master, the approach for selecting sites in environmental
space considering geographic patterns is slightly different. User-preselected
sites will always be part of the selected sites. Other points are selected
based on an algorithm that searches for sites that are uniformly distributed
in environmental space, but at a distance from preselected sites that helps
maintain uniformity among the environmental blocks selected. Note that
preselected sites will not be processed; therefore, uniformity of the blocks
representing such points cannot be guaranteed.
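One way to picture selection around preselected sites is a greedy thinning in environmental space seeded with those sites: they are always kept, and a candidate is added only if it lies far enough from every point already kept. The helper thin_with_preselected() and the fixed min_dist are hypothetical; this is a sketch of the general idea, not the algorithm EG_selection uses.

```r
# Hypothetical sketch: greedy thinning seeded with preselected sites.
# Preselected points are always kept; each candidate is added only if its
# Euclidean distance to every kept point is at least min_dist.
thin_with_preselected <- function(candidates, preselected, min_dist) {
  kept <- as.matrix(preselected)
  cand <- as.matrix(candidates)
  chosen <- integer(0)
  for (i in seq_len(nrow(cand))) {
    d <- sqrt(colSums((t(kept) - cand[i, ])^2))
    if (all(d >= min_dist)) {
      kept <- rbind(kept, cand[i, ])
      chosen <- c(chosen, i)
    }
  }
  chosen  # row indices of candidates added to the preselected sites
}

pre <- data.frame(PC1 = 0, PC2 = 0)
cand <- data.frame(PC1 = c(0.5, 2, 2.3, 4), PC2 = c(0, 0, 0, 0))
res <- thin_with_preselected(cand, pre, min_dist = 1.5)
res
```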
As multiple sets could result from the selection, the argument
median_distance_filter can be used to select the set of sites with the
maximum ("max") or minimum ("min") median distance among selected sites.
Option "max" will increase the geographic distance among sampling sites,
which could be desirable if the goal is to cover the region of interest more
broadly. The other option, "min", could be used when the goal is to reduce
the resources and time needed to sample such sites.
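The "max"/"min" filter amounts to comparing a summary statistic across candidate sets. The sketch below computes, for each set, the median of all pairwise Euclidean distances among its sites and returns the index of the set with the largest or smallest value. The helper pick_by_median_distance() is hypothetical, and real longitude/latitude coordinates would usually call for great-circle rather than Euclidean distances.

```r
# Hypothetical sketch of a median-distance filter over candidate site sets.
pick_by_median_distance <- function(site_sets, filter = c("max", "min")) {
  filter <- match.arg(filter)
  # median of all pairwise distances within each set
  med <- vapply(site_sets, function(s) median(as.vector(dist(s))), numeric(1))
  idx <- if (filter == "max") which.max(med) else which.min(med)
  list(index = unname(idx), medians = med)
}

set_a <- cbind(x = c(0, 1, 2), y = c(0, 0, 0))   # compact set
set_b <- cbind(x = c(0, 5, 10), y = c(0, 0, 0))  # spread-out set
res_max <- pick_by_median_distance(list(set_a, set_b), "max")
res_min <- pick_by_median_distance(list(set_a, set_b), "min")
res_max$index  # 2
res_min$index  # 1
```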
Examples
# \donttest{
# Data
m_matrix <- read_master(system.file("extdata/m_matrix.rds",
                                    package = "biosurvey"))
# Making blocks for analysis
m_blocks <- make_blocks(m_matrix, variable_1 = "PC1", variable_2 = "PC2",
                        n_cols = 10, n_rows = 10, block_type = "equal_area")
# Selecting sites considering E and G spaces
EG_sel <- EG_selection(master = m_blocks, n_blocks = 10,
                       initial_distance = 1.5, increase = 0.1,
                       replicates = 1, max_n_samplings = 1,
                       select_point = "E_centroid",
                       cluster_method = "hierarchical",
                       sample_for_distance = 100)
#> Element 'preselected_sites' in 'master' is NULL, setting
#> 'use_preselected_sites' = FALSE
#> Preparing data for analysis
#> Selecting relevant environmental blocks, please wait...
#> Running algorithm for selecting sites, please wait...
#> Process 1 of 1
#> Total number of sites selected: 13
head(EG_sel$selected_sites_EG[[1]])
#> Longitude Latitude Mean_temperature Max_temperature Min_temperature
#> 14566 -97.41667 19.91667 180 271 84
#> 17425 -95.91667 17.41667 215 305 128
#> 589 -115.25000 32.08333 215 396 49
#> 15327 -97.25000 19.25000 110 208 12
#> 4615 -109.25000 28.58333 197 344 47
#> 17631 -93.25000 17.25000 223 306 138
#> Annual_precipitation Prec_wettest_month Prec_driest_month PC1
#> 14566 2595 508 73 4.4103564
#> 17425 3059 614 61 5.6781935
#> 589 76 12 0 -1.4351612
#> 15327 1147 219 22 -0.8983117
#> 4615 782 208 8 -0.1402882
#> 17631 2491 366 79 4.8642518
#> PC2 Block
#> 14566 4.32123454 75
#> 17425 3.32007840 96
#> 589 -2.23368181 1
#> 15327 4.46512880 31
#> 4615 0.03781652 37
#> 17631 2.52913164 84
dim(EG_sel$selected_sites_EG[[1]])
#> [1] 13 11
# }