Detection of clusters in 2D spaces - find_clusters

Finds clusters in two-dimensional data using one of two methods: hierarchical clustering or k-means.

Usage

find_clusters(data, x_column, y_column, space,
              cluster_method = "hierarchical", n_k_means = NULL,
              split_distance = NULL)

Arguments

data: matrix or data.frame that contains at least two columns.
x_column: (character) name of the column in data that contains the x-axis values.
y_column: (character) name of the column in data that contains the y-axis values.
space: (character) space in which clusters will be detected. Options are "G" for geographic space and "E" for environmental space.
cluster_method: (character) name of the method to be used for detecting clusters. Options are "hierarchical" and "k-means"; default = "hierarchical".
n_k_means: (numeric) number of clusters to be identified when cluster_method = "k-means"; default = NULL.
split_distance: (numeric) distance threshold used to identify clusters when cluster_method = "hierarchical", in meters if space = "G" or as a Euclidean distance if space = "E"; default = NULL.
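
To illustrate how a distance threshold can define clusters, the sketch below uses base R only. It assumes complete-linkage hierarchical clustering cut at a height equal to the threshold, and a haversine great-circle distance for geographic coordinates. Both are assumptions made for illustration and are not necessarily what find_clusters does internally. The helper haversine_m and the toy points are hypothetical.

```r
# Toy environmental points (hypothetical): two tight groups far apart
env <- data.frame(PC1 = c(0, 0.5, 1, 10, 10.5, 11),
                  PC2 = c(0, 0.5, 0, 10, 10.5, 10))

# Environmental space: Euclidean distances, tree cut at height 4
tree_e <- hclust(dist(env), method = "complete")
cl_e <- cutree(tree_e, h = 4)

# Geographic space: pairwise great-circle distances in meters.
# haversine_m is a hypothetical helper that assumes a spherical Earth
# with radius 6371 km.
haversine_m <- function(lon, lat, r = 6371000) {
  rad <- pi / 180
  n <- length(lon)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    dlat <- (lat - lat[i]) * rad
    dlon <- (lon - lon[i]) * rad
    a <- sin(dlat / 2)^2 +
      cos(lat[i] * rad) * cos(lat * rad) * sin(dlon / 2)^2
    d[i, ] <- 2 * r * asin(pmin(1, sqrt(a)))
  }
  d
}

# Toy geographic points (hypothetical): two groups thousands of km apart
geo <- data.frame(Longitude = c(-117.0, -116.9, -116.8, -100.0, -99.9),
                  Latitude = c(32.5, 32.6, 32.5, 20.0, 20.1))
d_geo <- as.dist(haversine_m(geo$Longitude, geo$Latitude))
tree_g <- hclust(d_geo, method = "complete")
cl_g <- cutree(tree_g, h = 100000)  # threshold of 100 km, given in meters
```

In this sketch, lowering the threshold can only split clusters further; it never merges them.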

Value

A data frame containing data and an additional column defining clusters.

Details

The two clustering methods make distinct assumptions, and either one may perform better than the other depending on the pattern of the data. A method that works well on some data sets may fail on others.

The k-means method tends to perform better when clusters are roughly spherical and of similar size. The hierarchical clustering algorithm usually takes more time than the k-means method.
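
The contrast between the two methods can be sketched with base R alone. The example below uses simulated data and calls stats::kmeans and stats::hclust directly rather than find_clusters. It shows both approaches recovering two compact, similarly sized groups. The simulated points, the seed, and the choice of complete linkage are assumptions made for illustration.

```r
set.seed(1)

# Simulated 2D data (hypothetical): two compact groups of equal size
n <- 50
pts <- rbind(
  cbind(rnorm(n, mean = 0, sd = 0.5), rnorm(n, mean = 0, sd = 0.5)),
  cbind(rnorm(n, mean = 10, sd = 0.5), rnorm(n, mean = 10, sd = 0.5))
)
colnames(pts) <- c("x", "y")
truth <- rep(1:2, each = n)

# k-means: the number of clusters is fixed in advance
km <- kmeans(pts, centers = 2, nstart = 10)

# Hierarchical: build the full tree from all pairwise distances,
# then cut it into two groups
hc <- cutree(hclust(dist(pts), method = "complete"), k = 2)

# Cross-tabulate assignments against the simulated groups
table(truth, kmeans = km$cluster)
table(truth, hierarchical = hc)
```

On well-separated data like these, either method works. The hierarchical approach needs every pairwise distance, so its cost grows roughly with the square of the number of rows, which is one reason it tends to be slower on large data sets.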

Examples

# Data
m_matrix <- read_master(system.file("extdata/m_matrix.rds",
                                    package = "biosurvey"))

# Cluster detection
clusters <- find_clusters(m_matrix$data_matrix, x_column = "PC1",
                          y_column = "PC2", space = "E",
                          cluster_method = "hierarchical", n_k_means = NULL,
                          split_distance = 4)
head(clusters)
#>    Longitude Latitude Mean_temperature Max_temperature Min_temperature
#> 8  -117.0833 32.58333              170             266              71
#> 9  -116.9167 32.58333              169             293              59
#> 10 -116.7500 32.58333              165             318              44
#> 11 -116.5833 32.58333              153             326              23
#> 12 -116.4167 32.58333              146             329               9
#> 13 -116.2500 32.58333              148             333               7
#>    Annual_precipitation Prec_wettest_month Prec_driest_month       PC1
#> 8                   234                 46                 1 -1.752933
#> 9                   277                 52                 1 -1.728394
#> 10                  336                 64                 2 -1.696348
#> 11                  409                 78                 2 -1.845195
#> 12                  400                 76                 1 -2.063613
#> 13                  320                 60                 1 -2.181253
#>          PC2 clusters
#> 8  1.0409917        1
#> 9  0.5996405        1
#> 10 0.2890784        1
#> 11 0.4419228        1
#> 12 0.5029680        1
#> 13 0.3238374        1