
Selection of sites to be sampled in a survey, with the goal of maximizing uniformity of points in environmental space while considering geographic patterns in the data. Sets of points that are environmentally similar but geographically disjoint are sampled twice (two survey sites are placed so that they cover the two largest geographic clusters).

Usage

EG_selection(master, n_blocks, guess_distances = TRUE, initial_distance = NULL,
             increase = NULL, max_n_samplings = 1, replicates = 10,
             use_preselected_sites = TRUE, select_point = "E_centroid",
             cluster_method = "hierarchical", median_distance_filter = NULL,
             sample_for_distance = 250, set_seed = 1,
             verbose = TRUE, force = FALSE)

Arguments

master

a master_matrix object derived from the function prepare_master_matrix, or a master_selection object derived from the functions random_selection, uniformG_selection, or uniformE_selection.

n_blocks

(numeric) number of blocks to be selected and used as the basis for further exploration. If preselected sites are used, this number must be larger than the number of unique blocks already represented by those sites.

guess_distances

(logical) whether or not to use an internal algorithm to automatically select initial_distance and increase. Default = TRUE. If FALSE, initial_distance and increase must be defined.

initial_distance

(numeric) Euclidean distance used in a first round of thinning and detection of the remaining blocks. See details in point_thinning. Default = NULL.

increase

(numeric) initial value to be added to or subtracted from initial_distance until reaching the number of expected_points. Default = NULL.

max_n_samplings

(numeric) maximum number of samplings (sets of selected sites) to be chosen after performing all thinning replicates. Default = 1.

replicates

(numeric) number of thinning replicates performed to select blocks uniformly. Default = 10.

use_preselected_sites

(logical) whether to use sites that have been defined as part of the selected sites prior to any selection. The object in master must contain the preselected site(s) in the slot named "preselected_sites" for this argument to be effective. Default = TRUE. See details for more information on the approach used.

select_point

(character) how or which point will be selected for each block or cluster. Three options are available: "random", "E_centroid", and "G_centroid". "E_centroid" and "G_centroid" indicate that the point(s) closest to the respective centroid (in environmental or geographic space) will be selected. Default = "E_centroid".

cluster_method

(character) name of the method to be used for detecting geographic clusters of points inside each block. Options are "hierarchical" and "k-means"; default = "hierarchical". See details in find_clusters.

median_distance_filter

(character) optional filter, based on the median distance among selected sites, used to choose which set(s) of sampling sites are returned. The default, NULL, does not apply such a filter. Options are "max" and "min". See details.

sample_for_distance

(numeric) size of the sample of points considered when measuring geographic distances among points in blocks created in environmental space. These distances are then used to test whether points are distributed uniformly in geography. Default = 250.

set_seed

(numeric) integer value to specify an initial seed. Default = 1.

verbose

(logical) whether or not to print messages about the process. Default = TRUE.

force

(logical) whether to replace an existing set of sites selected with this method in master. Default = FALSE.
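The interplay between initial_distance and increase can be illustrated with a small base-R sketch. It is a hypothetical stand-in, not the code behind EG_selection or point_thinning: thin_points() visits points in random order and keeps a point only if it is at least a given Euclidean distance from every point kept so far, and thin_to_n() enlarges the distance when too many points remain and shrinks it when too few remain. Halving the step after a change of direction is an assumption added here so that the search settles.

```r
# Hypothetical helpers, not part of biosurvey.
# Greedy thinning: keep a point only if it is at least min_dist (Euclidean)
# away from every point kept so far; points are visited in random order.
thin_points <- function(x, min_dist) {
  keep <- integer(0)
  for (i in sample(nrow(x))) {
    if (length(keep) == 0) {
      keep <- i
    } else {
      d <- sqrt(colSums((t(x[keep, , drop = FALSE]) - x[i, ])^2))
      if (all(d >= min_dist)) keep <- c(keep, i)
    }
  }
  keep
}

# Adjust the thinning distance until n_expected points remain (or max_iter
# is reached): too many points -> larger distance, too few -> smaller one.
thin_to_n <- function(x, n_expected, initial_distance, increase,
                      max_iter = 50) {
  d <- initial_distance
  last_dir <- 0
  for (iter in seq_len(max_iter)) {
    res <- list(keep = thin_points(x, d), distance = d)
    n_kept <- length(res$keep)
    if (n_kept == n_expected) break
    dir <- if (n_kept > n_expected) 1 else -1
    # assumption: halve the step whenever the search changes direction
    if (last_dir != 0 && dir != last_dir) increase <- increase / 2
    last_dir <- dir
    if (d + dir * increase <= 0) break
    d <- d + dir * increase
  }
  res
}

set.seed(1)
env <- matrix(runif(400), ncol = 2)  # 200 random points in a 2-D space
res <- thin_to_n(env, n_expected = 10, initial_distance = 0.1, increase = 0.05)
length(res$keep)
res$distance
```

Because thinning is random, repeating it (as the replicates argument suggests) can yield different sets of points for the same distance.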

Value

A master_selection object (S3) with a special element called selected_sites_EG containing one or more sets of selected sites depending on max_n_samplings and median_distance_filter.

Details

Two important steps are needed before using this function: 1) exploring data in environmental and geographic spaces, and 2) performing a regionalization of the environmental space. Exploring the data can be done using the function explore_data_EG. This step is optional but strongly recommended, as important decisions depend on the configuration of the data in the two spaces. A regionalization of the environmental space of the region of interest helps in defining important parts of your region that should be considered when selecting sites. This can be done using the function make_blocks. Later, the regions created in environmental space will be used for selecting one or more sampling sites per block, depending on the geographic pattern of such environmental combinations.
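To illustrate what a regionalization of the environmental space means, the sketch below assigns every point to a cell of a regular grid laid over two variables (for example, PC1 and PC2). assign_blocks() is a hypothetical helper built on base R's cut() with equal-width intervals; it is not make_blocks(), whose block types may define cells differently.

```r
# Hypothetical helper, not biosurvey::make_blocks(): split two environmental
# variables into equal-width intervals and number the resulting grid cells.
assign_blocks <- function(v1, v2, n_cols = 10, n_rows = 10) {
  col_id <- cut(v1, breaks = n_cols, labels = FALSE, include.lowest = TRUE)
  row_id <- cut(v2, breaks = n_rows, labels = FALSE, include.lowest = TRUE)
  (row_id - 1) * n_cols + col_id  # block IDs from 1 to n_cols * n_rows
}

set.seed(2)
pc1 <- rnorm(500)
pc2 <- rnorm(500)
blocks <- assign_blocks(pc1, pc2, n_cols = 10, n_rows = 10)
length(unique(blocks))  # number of non-empty blocks
```

Blocks in sparse corners of the environmental space end up with few or no points, which is one reason why exploring the data first is useful.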

The process of survey-site selection with this function is the most complex among all functions in this package. The complexity derives from the aim of the function: to select sites that appropriately sample the environmental combinations in the region of interest (environmental space), while considering the geographic patterns of such environmental regions (geographic space).

In this approach, the first step is to select candidate blocks (from those obtained with make_blocks) that are uniformly distributed in environmental space. The geographic configuration of points in such blocks is then explored to detect whether they are clustered (i.e., whether similar environmental conditions are present in distant places in the region of interest). For blocks whose points form a single geographic cluster, only one survey site is selected; for blocks with multiple geographic clusters, two survey sites are selected, considering the two largest clusters.
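A minimal base-R sketch of the within-block step, assuming each block is a data frame with Longitude, Latitude, PC1, and PC2 columns as in the example output below. select_in_block() and its cut_height threshold are hypothetical: points are grouped with complete-linkage hclust() on their coordinates, the tree is cut at a fixed height, and for the one or two largest clusters the point closest to that cluster's environmental centroid is kept (mirroring select_point = "E_centroid"). How find_clusters() actually decides the number of clusters is not reproduced here.

```r
# Hypothetical helper, not biosurvey::find_clusters(). 'block' is a data frame
# with Longitude, Latitude, PC1 and PC2 columns for the points of one block.
select_in_block <- function(block, cut_height) {
  if (nrow(block) < 2) return(block)  # hclust() needs at least two points
  xy <- as.matrix(block[, c("Longitude", "Latitude")])
  # geographic clusters: complete-linkage tree cut at a fixed height
  groups <- cutree(hclust(dist(xy), method = "complete"), h = cut_height)
  sizes <- sort(table(groups), decreasing = TRUE)
  top <- as.integer(names(sizes))[seq_len(min(2, length(sizes)))]
  # in each of the (at most) two largest clusters, keep the point closest to
  # the cluster's centroid in environmental space ("E_centroid")
  picks <- vapply(top, function(g) {
    idx <- which(groups == g)
    env <- as.matrix(block[idx, c("PC1", "PC2")])
    idx[which.min(colSums((t(env) - colMeans(env))^2))]
  }, integer(1))
  block[picks, , drop = FALSE]
}

blk <- data.frame(Longitude = c(-100, -100.1, -99.9, -90, -90.2, -89.8),
                  Latitude  = c(20, 20.1, 19.9, 30, 30.1, 29.9),
                  PC1 = c(1, 1.2, 0.8, 2, 2.1, 1.9),
                  PC2 = c(0, 0.1, -0.1, 0, 0.1, -0.1))
select_in_block(blk, cut_height = 2)    # two geographic clusters: two sites
select_in_block(blk, cut_height = 100)  # a single cluster: one site
```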

If use_preselected_sites is TRUE and such sites are included as an element of the object in master, the approach for selecting sites in environmental space considering geographic patterns is slightly different. User-preselected sites will always be part of the selected sites. Other points are selected based on an algorithm that searches for sites that are uniformly distributed in environmental space, but at a distance from preselected sites that helps maintain uniformity among the environmental blocks selected. Note that preselected sites are not processed; therefore, uniformity of the blocks represented by such points cannot be guaranteed.
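The search used with preselected sites is not spelled out here. One common way to keep new selections spread out in environmental space while always retaining given points is greedy max-min (farthest-point) selection, sketched below purely as a swapped-in illustration. grow_from_preselected() is hypothetical; it works on a matrix of block centroids in environmental space and assumes at least one preselected block index.

```r
# Hypothetical helper illustrating greedy max-min (farthest-point) selection;
# it is not the algorithm used by EG_selection(). 'centroids' is a matrix of
# block centroids in environmental space, one row per block, and
# 'preselected' holds the row indices of blocks with preselected sites.
grow_from_preselected <- function(centroids, preselected, n_total) {
  stopifnot(length(preselected) >= 1)
  chosen <- preselected
  while (length(chosen) < n_total) {
    candidates <- setdiff(seq_len(nrow(centroids)), chosen)
    if (length(candidates) == 0) break
    # distance from each candidate to its nearest already-chosen block
    nearest <- vapply(candidates, function(i) {
      min(sqrt(colSums((t(centroids[chosen, , drop = FALSE]) -
                          centroids[i, ])^2)))
    }, numeric(1))
    # add the candidate that is farthest from everything chosen so far
    chosen <- c(chosen, candidates[which.max(nearest)])
  }
  chosen
}

cent <- cbind(PC1 = c(0, 1, 2, 3, 4), PC2 = 0)
grow_from_preselected(cent, preselected = 1, n_total = 3)
```

Preselected blocks are never dropped, so, as noted above, the overall uniformity depends on where those fixed points fall.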

As selection can result in multiple sets of sites, the argument median_distance_filter can be used to select the set with the maximum ("max") or minimum ("min") median distance among selected sites. Option "max" increases the geographic distance among sampling sites, which may be desirable if the goal is to cover the region of interest more broadly. Option "min" can be used when the goal is to reduce the resources and time needed to sample such sites.
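The logic of median_distance_filter can be sketched as follows. pick_by_median_distance() is a hypothetical helper: for each candidate set it computes the median of all pairwise distances between sites and returns the set with the largest or smallest value, wrapped in a list like the elements of selected_sites_EG. Plain Euclidean distance on longitude and latitude is used as a simplification; the package may measure geographic distance differently.

```r
# Hypothetical helper, not part of biosurvey: keep the candidate set of sites
# whose median pairwise distance is largest ("max") or smallest ("min").
pick_by_median_distance <- function(site_sets, filter = c("max", "min")) {
  filter <- match.arg(filter)
  med <- vapply(site_sets, function(s) {
    # simplification: Euclidean distance on longitude/latitude degrees
    median(as.vector(dist(as.matrix(s[, c("Longitude", "Latitude")]))))
  }, numeric(1))
  i <- if (filter == "max") which.max(med) else which.min(med)
  site_sets[i]
}

a <- data.frame(Longitude = c(0, 1, 2), Latitude = c(0, 0, 0))
b <- data.frame(Longitude = c(0, 5, 10), Latitude = c(0, 0, 0))
pick_by_median_distance(list(a, b), "max")  # the more spread-out set, b
```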

Examples

# Data
m_matrix <- read_master(system.file("extdata/m_matrix.rds",
                                    package = "biosurvey"))

# Making blocks for analysis
m_blocks <- make_blocks(m_matrix, variable_1 = "PC1", variable_2 = "PC2",
                        n_cols = 10, n_rows = 10, block_type = "equal_area")

# Selecting sites considering E and G spaces
EG_sel <- EG_selection(master = m_blocks, n_blocks = 10,
                       initial_distance = 1.5, increase = 0.1,
                       replicates = 1, max_n_samplings = 1,
                       select_point = "E_centroid",
                       cluster_method = "hierarchical",
                       sample_for_distance = 100)
#> Element 'preselected_sites' in 'master' is NULL, setting
#> 'use_preselected_sites' = FALSE
#> Preparing data for analysis
#> Selecting relevant environmental blocks, please wait...
#> Running algorithm for selecting sites, please wait...
#>     Process 1 of 1
#> Total number of sites selected: 13

head(EG_sel$selected_sites_EG[[1]])
#>        Longitude Latitude Mean_temperature Max_temperature Min_temperature
#> 14566  -97.41667 19.91667              180             271              84
#> 17425  -95.91667 17.41667              215             305             128
#> 589   -115.25000 32.08333              215             396              49
#> 15327  -97.25000 19.25000              110             208              12
#> 4615  -109.25000 28.58333              197             344              47
#> 17631  -93.25000 17.25000              223             306             138
#>       Annual_precipitation Prec_wettest_month Prec_driest_month        PC1
#> 14566                 2595                508                73  4.4103564
#> 17425                 3059                614                61  5.6781935
#> 589                     76                 12                 0 -1.4351612
#> 15327                 1147                219                22 -0.8983117
#> 4615                   782                208                 8 -0.1402882
#> 17631                 2491                366                79  4.8642518
#>               PC2 Block
#> 14566  4.32123454    75
#> 17425  3.32007840    96
#> 589   -2.23368181     1
#> 15327  4.46512880    31
#> 4615   0.03781652    37
#> 17631  2.52913164    84
dim(EG_sel$selected_sites_EG[[1]])
#> [1] 13 11