This repository is for a reproducibility project from the EECS553 (Machine Learning) course. We verified the results of the paper "A Tighter Analysis of Spectral Clustering, and Beyond", published in ICML 2022.
- Less-separated Synthetic Dataset:
  - run python experiments.py complete
  - Change r value at
- Plot Results: run python plot_results.py in the folder https://github.com/dom-lee/EECS553-Reproducibility-Spectral-Clustering/tree/reproducibility/results/sbm
  - Change
- Test on BSDS dataset with different parameters:
  - run python experiments.py bsds
  - We set a break condition to cluster only 25 images
  - Change variance at
- Test on MNIST dataset with different parameters:
  - run python experiments.py mnist
  - Change parameter k to construct a different k-NN graph
  - Test different numbers of eigenvectors
- Check the performance of Spectral Clustering with fewer eigenvectors after reducing dimensionality through Sketching
  - check sketch.py for the sketch algorithm
  - MNIST
    - Uncomment
    - run python experiments.py mnist
  - USPS
    - Uncomment
    - run python experiments.py usps
- Visualize Graph (Synthetic Dataset, MNIST, USPS)
  - run python plot_graph.py complete
  - run python plot_graph.py mnist
  - run python plot_graph.py usps
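
To make the role of the parameter k in the MNIST test more concrete, here is a minimal sketch of a symmetric k-NN graph built with the standard library only. It is an illustration, not the code in this repository: the function names `knn_graph` and `edge_count`, the Euclidean metric, and the toy points are all assumptions.

```python
import math


def knn_graph(points, k):
    """Return a symmetric k-NN graph as {node: set(neighbours)}.

    An edge (i, j) is kept if j is among the k nearest neighbours of i
    or i is among the k nearest neighbours of j.
    """
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        # Sort every other point by Euclidean distance to point i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        for j in others[:k]:
            adjacency[i].add(j)
            adjacency[j].add(i)  # symmetrise so the graph is undirected
    return adjacency


def edge_count(adjacency):
    # Each undirected edge is stored twice, once per endpoint.
    return sum(len(neighbours) for neighbours in adjacency.values()) // 2


# Toy data (hypothetical): two groups of three points on a line.
toy_points = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
for k in (1, 2, 3):
    print(k, edge_count(knn_graph(toy_points, k)))
```

On this toy data, k = 2 keeps the two groups disconnected, while k = 3 starts adding edges between them, which is the kind of change in cluster separation that varying k is meant to probe.
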
This directory contains the code to reproduce the results in the paper "A Tighter Analysis of Spectral Clustering, and Beyond", published in ICML 2022.
Our code is primarily written in Python 3. There is also a MATLAB script for analysing the results of the BSDS experiment.
We recommend running the Python code inside a virtual environment.
To install the dependencies of the project, run
pip install -r requirements.txt
If you would like to run the experiments on the BSDS dataset, you should untar the data file in the data/bsds directory. On Linux, this is done with the following commands.
cd data/bsds
tar -xvf BSR_bsds500.tgz
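
If `tar` is not available (for example on Windows), the same extraction can likely be done from Python with the standard `tarfile` module. This is a minimal sketch; the helper name `extract_archive` is hypothetical and not part of this repository.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, target_dir=None):
    """Extract a tar archive (including .tgz) into target_dir.

    If target_dir is omitted, the archive's own directory is used,
    which mirrors running `tar -xvf` from inside that directory.
    """
    archive_path = Path(archive_path)
    target_dir = Path(target_dir) if target_dir else archive_path.parent
    # Mode "r:*" lets tarfile detect the compression automatically.
    with tarfile.open(archive_path, "r:*") as archive:
        archive.extractall(target_dir)
    return target_dir


# Example call, commented out so that importing this sketch has no side effects:
# extract_archive("data/bsds/BSR_bsds500.tgz")
```
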
To run one of the experiments described in the paper, run
python experiment.py {experiment_name}
where {experiment_name} is one of cycle, grid, mnist, usps, or bsds.
The MNIST and USPS experiments will run easily on a laptop or desktop. The cycle and grid experiments will also run on a personal computer but could take a few minutes since they must run multiple trials for each number of eigenvectors.
Please note that the BSDS experiment is quite resource-intensive, and we recommend running on a compute server.
You can instead choose to run the BSDS experiment on only one of the images from the dataset using the following command.
python experiment.py bsds {bsds_image_id}
For example:
python experiment.py bsds 176039
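
To run several experiments in a row, the commands above could be wrapped in a small driver script. The following is a sketch under stated assumptions: `build_command` and `run_experiments` are hypothetical helpers, and the default script name `experiment.py` is taken from the command shown above (pass `script="experiments.py"` if that is the file name in your checkout).

```python
import subprocess
import sys

# Experiment names listed in this README.
EXPERIMENTS = ("cycle", "grid", "mnist", "usps", "bsds")


def build_command(name, bsds_image_id=None, script="experiment.py"):
    """Build the argument list for one experiment run."""
    if name not in EXPERIMENTS:
        raise ValueError(f"unknown experiment: {name!r}")
    if bsds_image_id is not None and name != "bsds":
        raise ValueError("an image ID only applies to the bsds experiment")
    # sys.executable reuses the interpreter of the active virtual environment.
    command = [sys.executable, script, name]
    if bsds_image_id is not None:
        command.append(str(bsds_image_id))
    return command


def run_experiments(names, script="experiment.py"):
    """Run experiments one after another, stopping at the first failure."""
    for name in names:
        subprocess.run(build_command(name, script=script), check=True)


# Example calls, commented out so that importing this sketch runs nothing:
# run_experiments(["mnist", "usps"])
# subprocess.run(build_command("bsds", 176039), check=True)
```

Passing the command as a list rather than a single string avoids shell quoting issues, and `check=True` makes a failed experiment raise an error instead of being silently skipped.
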
The output from the experiments will be in the results directory, under the appropriate experiment name.
The BSDS results can be analysed using the MATLAB script analyseBsdsResults.m, which will call the BSDS benchmarking code to evaluate the image segmentation output.
While the analyseBsdsResults script will evaluate the BSDS segmentations, if you would like to view the segmented images, you can use the provided MATLAB function compareSegmentations. This is the method used to generate Figure 1 in the paper. For example:
compareSegmentations("176039")
Note that the experiment must have been run for the image ID 176039 before running the MATLAB visualisation script.
If you are primarily interested in the application of spectral clustering to image segmentation, you could take a look at this GitHub repository which includes only the image segmentation code from our project and provides a straightforward interface to segment any image file.
@InProceedings{pmlr-v162-macgregor22a,
title = {A Tighter Analysis of Spectral Clustering, and Beyond},
author = {Macgregor, Peter and Sun, He},
booktitle = {Proceedings of the 39th International Conference on Machine Learning},
pages = {14717--14742},
year = {2022},
volume = {162},
publisher = {PMLR},
}