This is a cluster analysis of 29 shape feature parameters derived from several
image sets, which contain binary images of reference shapes, segmented and processed
with the R package SAFARI. The goal of this project is to evaluate the performance of
different clustering methods based on their similarity to the ground truth and to devise a
novel model-based clustering method.
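
As an illustration of scoring a clustering against the ground truth, here is a minimal base R sketch of the Adjusted Rand Index computed from a contingency table. The helper name `adjusted_rand_index` and the choice of k-means and Ward hierarchical clustering on the built-in `iris` data are assumptions made for this example; the scripts in this repository may compare other methods or settings.

```r
# Hypothetical helper (not part of this repository): Adjusted Rand Index
# between two label vectors, computed from their contingency table.
adjusted_rand_index <- function(truth, predicted) {
  stopifnot(length(truth) == length(predicted))
  tab <- table(truth, predicted)
  pairs <- function(x) sum(choose(x, 2))   # pairs of items sharing a group
  sum_cells <- pairs(tab)
  sum_rows  <- pairs(rowSums(tab))
  sum_cols  <- pairs(colSums(tab))
  expected  <- sum_rows * sum_cols / choose(length(truth), 2)
  maximum   <- (sum_rows + sum_cols) / 2
  (sum_cells - expected) / (maximum - expected)
}

# Assumed setup: standardized iris measurements, 3 clusters per method.
set.seed(1)
features <- scale(iris[, 1:4])
km <- kmeans(features, centers = 3, nstart = 25)
hc <- cutree(hclust(dist(features), method = "ward.D2"), k = 3)

ari_scores <- c(
  kmeans = adjusted_rand_index(iris$Species, km$cluster),
  ward   = adjusted_rand_index(iris$Species, hc)
)
print(round(ari_scores, 3))
```

A score of 1 means the clustering matches the reference labels up to relabeling, and values near 0 indicate agreement no better than chance. The sketch does not handle the degenerate case where both labelings consist of a single group.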

- MPEG-7 - 70 classes (shapes) of 20 images each, with minor differences between the images in a class.
- ETH-80 - Binary version of the famous dataset. 8 classes (objects) with 10 subclasses each and 41 images in each subclass.
- Iris - Famous dataset from 1936. 150 observations with 4 variables.
- maps - Raw maps data of the countries of the world, the 48 contiguous US states, and all 254 counties in Texas. Includes to-scale binary shape outlines of 49 European countries (excluding Russia), the above-mentioned US states, and the 13 Texas counties comprising the DFW metroplex.

- data/ - Datasets
- plots/ - Generated plots comparing clustering method accuracy
- code/mpeg_7.R - Cluster analysis of MPEG-7 features
- code/eth_80.R - Cluster analysis of ETH-80 features
- code/iris.R - Cluster analysis of the Iris dataset
- code/maps.R - Cluster analysis of the maps dataset
- code/image_thresholding.R - Thresholds grayscale images to binary
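
To make the thresholding step concrete, the following is a minimal sketch in base R that turns a grayscale intensity matrix into a binary one using a fixed cutoff. The helper `threshold_image`, the cutoff of 0.5, and the toy matrix are assumptions for illustration only; the actual script relies on EBImage and may select its threshold differently.

```r
# Hypothetical helper (not the repository's implementation): binarize a
# grayscale matrix whose intensities are assumed to lie in [0, 1].
threshold_image <- function(gray, level = 0.5) {
  stopifnot(is.matrix(gray), is.numeric(gray))
  binary <- gray > level              # TRUE where the pixel is brighter than the cutoff
  storage.mode(binary) <- "integer"   # 1 = foreground, 0 = background
  binary
}

# Toy 2 x 3 "image" standing in for a real grayscale file.
gray <- matrix(c(0.1, 0.8, 0.4,
                 0.9, 0.2, 0.7), nrow = 2, byrow = TRUE)
binary <- threshold_image(gray)
print(binary)
```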

- BiocManager - SAFARI dependency
- EBImage - SAFARI dependency and image thresholding
- remotes - For installing SAFARI
- SAFARI - For segmenting shapes from binary images and extracting shape features
- tidyverse - Data manipulation with dplyr and graphics with ggplot2
- mclust - Gaussian mixture model clustering and Adjusted Rand Index
- factoextra - Beautiful cluster visualizations
- parallel - Parallel computation in R
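
As a rough sketch of how mclust covers both roles listed above, the example below fits a Gaussian mixture model to the built-in `iris` measurements and scores the result against the species labels with the Adjusted Rand Index. Fixing `G = 3` and using the raw, unscaled measurements are assumptions made for illustration; the analysis scripts in this repository may use different settings.

```r
library(mclust)

# Assumed setup: the four numeric iris measurements, with the number of
# mixture components fixed at 3 to match the number of species.
features <- iris[, 1:4]
gmm <- Mclust(features, G = 3, verbose = FALSE)

# Compare the fitted cluster labels with the reference species labels.
ari_gmm <- adjustedRandIndex(gmm$classification, iris$Species)

print(gmm$modelName)      # covariance structure selected by BIC
print(round(ari_gmm, 3))
```

Leaving `G` unset lets `Mclust` choose the number of components by BIC as well, which does not necessarily match the number of reference classes.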