The XAMI dataset contains 1000 annotated images of observations from diverse sky regions of the XMM-Newton Optical Monitor (XMM-OM) image catalog. An additional 50 images without annotations are included to help reduce the number of false positives and false negatives that complex objects (e.g., large galaxies, clusters, nebulae) may cause.
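Because the annotations are stored in COCO format, the unannotated images can be found as the image entries that no annotation refers to. The sketch below assumes a standard COCO dictionary with `images`, `annotations` and `categories` keys; the function `summarise_coco` is hypothetical and is not part of the XAMI package.

```python
from collections import Counter


def summarise_coco(coco: dict) -> tuple[list[str], Counter]:
    """Return (file names of images without annotations, instance count per category name).

    Assumes a standard COCO dict with "images", "annotations" and "categories" keys.
    """
    annotated_ids = {ann["image_id"] for ann in coco["annotations"]}
    # Images that no annotation points to (e.g. the extra negative examples).
    empty = [img["file_name"] for img in coco["images"] if img["id"] not in annotated_ids]
    # Map category ids to names and count the annotated instances of each category.
    id_to_name = {cat["id"]: cat["name"] for cat in coco["categories"]}
    per_class = Counter(id_to_name[ann["category_id"]] for ann in coco["annotations"])
    return empty, per_class
```

Passing the result of `json.load` on a split's annotation file should list that split's unannotated images together with its per-class instance counts; the exact annotation file names depend on the dataset layout and are not assumed here.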
The dataset is hosted on Hugging Face in the dataset repository `iulia-elisa/XAMI-dataset`.
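To use the dataset, first clone the repository and install the package:

```bash
git clone https://github.com/ESA-Datalabs/XAMI-dataset.git
cd XAMI-dataset
# Create an environment with the package requirements
conda env create -f environment.yaml
# Activate it before installing (its name is defined in environment.yaml)
# Install the package in editable mode
pip install -e .
```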
Then download the dataset in one of two ways:
- using Python (see `load_and_visualise_dataset.ipynb`):

```python
from xami_dataset import XAMIDataset

# Download the dataset
xami_dataset = XAMIDataset(
    repo_id="iulia-elisa/XAMI-dataset",
    dataset_name="xami_dataset",
    dest_dir="./data",
)
```
- or download only the dataset archive and extract it from the command line:

```bash
DEST_DIR='/path/to/local/dest'
huggingface-cli download iulia-elisa/XAMI-dataset xami_dataset.zip --repo-type dataset --local-dir "$DEST_DIR" \
  && unzip "$DEST_DIR/xami_dataset.zip" -d "$DEST_DIR" \
  && rm "$DEST_DIR/xami_dataset.zip"
```
The dataset is split into training and validation sets and contains artefact annotations in COCO format for instance segmentation. We use multilabel stratified K-fold (k = 4) to balance class distributions across splits. We work with a single split version (one of the 4) but also provide the means to work with all 4 versions.
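To illustrate what multilabel stratification does, here is a minimal sketch of simplified iterative stratification (Sechidis et al., 2011). Each image is treated as the set of artefact classes it contains, and the rarest remaining class is always distributed first. This is an assumption-laden illustration, not necessarily the exact procedure or library used to build the XAMI splits; `iterative_stratification` is a hypothetical name.

```python
import random


def iterative_stratification(labels: list[set[int]], k: int = 4, seed: int = 0) -> list[int]:
    """Assign every sample to one of k folds so that each label is spread evenly.

    labels[i] is the set of class ids present in sample i (e.g. the artefact
    classes annotated in image i). Returns the fold index of each sample.
    Simplified iterative stratification with equal-size folds.
    """
    rng = random.Random(seed)
    n = len(labels)
    all_labels = {c for ls in labels for c in ls}
    # How many more samples each fold still "wants", overall and per label.
    want = [n / k] * k
    want_label = {c: [sum(c in ls for ls in labels) / k] * k for c in all_labels}
    folds = [-1] * n
    remaining = set(range(n))

    while remaining:
        # Remaining samples carrying each label; exhausted labels are dropped.
        by_label = {c: sorted(i for i in remaining if c in labels[i]) for c in all_labels}
        by_label = {c: idx for c, idx in by_label.items() if idx}
        if not by_label:
            # Only unlabelled samples (e.g. empty images) are left:
            # place each one in the fold that still wants the most samples.
            for i in sorted(remaining):
                f = max(range(k), key=lambda j: (want[j], rng.random()))
                folds[i] = f
                want[f] -= 1
            break

        # Distribute the rarest remaining label first, since it is hardest to balance.
        c = min(by_label, key=lambda lbl: (len(by_label[lbl]), lbl))
        for i in by_label[c]:
            # Prefer the fold that most wants this label, then the one wanting most samples;
            # remaining ties are broken randomly.
            f = max(range(k), key=lambda j: (want_label[c][j], want[j], rng.random()))
            folds[i] = f
            want[f] -= 1
            for lbl in labels[i]:
                want_label[lbl][f] -= 1
            remaining.discard(i)

    return folds
```

With k = 4, using fold `v` as the validation set and the other three as the training set, for `v = 0..3`, yields four train/validation split versions.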
See Dataset Structure for a more detailed description of the dataset layout in the COCO-IS and YOLOv8-Seg formats.
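The two formats describe the same masks differently. A COCO polygon is a flat list `[x1, y1, x2, y2, ...]` in pixels, while a YOLOv8-Seg label line is `<class> x1 y1 x2 y2 ...` with coordinates normalised by the image width and height. The minimal sketch below converts one polygon. The helper names are hypothetical, and the sketch assumes polygon (not RLE) segmentations.

```python
def category_index_map(categories: list[dict]) -> dict[int, int]:
    """Map COCO category ids to contiguous 0-based class indices, as YOLO expects."""
    ordered = sorted(categories, key=lambda cat: cat["id"])
    return {cat["id"]: idx for idx, cat in enumerate(ordered)}


def coco_polygon_to_yolo_seg(polygon: list[float], class_index: int, width: int, height: int) -> str:
    """Convert one COCO polygon [x1, y1, x2, y2, ...] in pixels to a YOLOv8-Seg label line."""
    if len(polygon) < 6 or len(polygon) % 2:
        raise ValueError("a polygon needs at least 3 (x, y) points")
    coords = []
    for x, y in zip(polygon[0::2], polygon[1::2]):
        # Normalise to [0, 1] and clamp vertices that lie slightly outside the image.
        coords.append(min(max(x / width, 0.0), 1.0))
        coords.append(min(max(y / height, 0.0), 1.0))
    return " ".join([str(class_index)] + [f"{v:.6f}" for v in coords])
```

Because the repository already provides a YOLOv8-Seg version, this sketch only shows how the two formats relate. It is not meant to replace the provided files.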
One particularity of XAMI compared to everyday image datasets is where in the images the artefacts usually appear.
Here are some examples of common artefacts in the dataset:
The images were annotated using the following projects:
- a Zooniverse project, whose resulting annotations are not externally visible.
- a Roboflow project, which allows for more interactive and visual annotation.