Finds clusters in two-dimensional data using one of two methods: hierarchical clustering or k-means.
Usage
find_clusters(data, x_column, y_column, space,
cluster_method = "hierarchical", n_k_means = NULL,
split_distance = NULL)
Arguments
- data
matrix or data.frame that contains at least two columns.
- x_column
(character) name of the column to be used as the x-axis.
- y_column
(character) name of the column to be used as the y-axis.
- space
(character) space in which clustering will be performed. Two options are available: "G" for geographic space and "E" for environmental space.
- cluster_method
(character) name of the method to be used for detecting clusters. Options are "hierarchical" and "k-means"; default = "hierarchical".
- n_k_means
(numeric) number of clusters to identify when cluster_method = "k-means". Default = NULL.
- split_distance
(numeric) distance used to separate clusters when cluster_method = "hierarchical": in meters if space = "G", or Euclidean distance if space = "E". Default = NULL.
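As an illustration of what a cut distance does in hierarchical clustering, here is a minimal base R sketch. It assumes that hierarchical grouping amounts to cutting an agglomerative tree at split_distance in "E" space. The use of hclust() with its default complete linkage is an assumption made for illustration, not necessarily what find_clusters() does internally, and the toy points are made up.

```r
# Hypothetical toy data: two tight groups of points in a 2-D space,
# standing in for the PC1/PC2 columns used in the example below
pts <- data.frame(PC1 = c(0, 0.5, 1, 10, 10.5, 11),
                  PC2 = c(0, 0.5, 0, 10, 10.5, 10))

# Agglomerative clustering on Euclidean distances; hclust() uses complete
# linkage by default (assumed here, not taken from find_clusters())
tree <- hclust(dist(pts[, c("PC1", "PC2")]))

# Cut the tree at a height playing the role of split_distance. With complete
# linkage, every pair of points in a resulting cluster is at most this far apart
split_distance <- 4
pts$clusters <- cutree(tree, h = split_distance)

print(pts)  # clusters column: 1 1 1 2 2 2
```

A smaller cut height splits the data into more, tighter groups; a larger one merges groups that are farther apart.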
Details
Clustering methods rest on different assumptions, so either one may perform better than the other depending on the pattern of the data; each may work well on some data sets and fail on others.
The k-means method tends to perform better when clusters are compact, roughly spherical, and of similar size. The hierarchical clustering algorithm usually takes more time than k-means, because it typically works from the full matrix of pairwise distances, whose size grows quadratically with the number of points.
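The trade-off described above can be sketched with base R's kmeans() and hclust() on simulated data containing two compact, similar-sized groups, the situation in which k-means is expected to do well. The simulated data, the seed, and the cut height of 3 are illustrative assumptions; this is not the internal code of find_clusters(). The sketch also shows the difference in inputs: k-means needs the number of clusters (as n_k_means does), while the hierarchical cut needs a distance (as split_distance does).

```r
set.seed(1)

# Simulated data (illustrative assumption): two compact, roughly spherical
# groups of equal size, the setting in which k-means tends to do well
g1 <- cbind(PC1 = rnorm(50, mean = 0, sd = 0.2), PC2 = rnorm(50, mean = 0, sd = 0.2))
g2 <- cbind(PC1 = rnorm(50, mean = 5, sd = 0.2), PC2 = rnorm(50, mean = 5, sd = 0.2))
xy <- rbind(g1, g2)

# k-means: the number of clusters is fixed in advance (compare n_k_means)
km <- kmeans(xy, centers = 2, nstart = 10)

# Hierarchical: the tree is cut at a distance instead (compare split_distance)
hc <- cutree(hclust(dist(xy)), h = 3)

# Cross-tabulate the two partitions; cluster labels may be numbered differently
print(table(kmeans = km$cluster, hierarchical = hc))
```

With groups this well separated, both methods should place the same points together, although k-means may number the clusters in a different order. On elongated or unevenly sized groups the two partitions can diverge.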
Examples
# Data
m_matrix <- read_master(system.file("extdata/m_matrix.rds",
                                    package = "biosurvey"))
# Cluster detection
clusters <- find_clusters(m_matrix$data_matrix, x_column = "PC1",
                          y_column = "PC2", space = "E",
                          cluster_method = "hierarchical", n_k_means = NULL,
                          split_distance = 4)
head(clusters)
#> Longitude Latitude Mean_temperature Max_temperature Min_temperature
#> 8 -117.0833 32.58333 170 266 71
#> 9 -116.9167 32.58333 169 293 59
#> 10 -116.7500 32.58333 165 318 44
#> 11 -116.5833 32.58333 153 326 23
#> 12 -116.4167 32.58333 146 329 9
#> 13 -116.2500 32.58333 148 333 7
#> Annual_precipitation Prec_wettest_month Prec_driest_month PC1
#> 8 234 46 1 -1.752933
#> 9 277 52 1 -1.728394
#> 10 336 64 2 -1.696348
#> 11 409 78 2 -1.845195
#> 12 400 76 1 -2.063613
#> 13 320 60 1 -2.181253
#> PC2 clusters
#> 8 1.0409917 1
#> 9 0.5996405 1
#> 10 0.2890784 1
#> 11 0.4419228 1
#> 12 0.5029680 1
#> 13 0.3238374 1