ESA-Datalabs/XAMI-dataset

XAMI: XMM-Newton optical Artefact Mapping for astronomical Instance segmentation

The Dataset

The XAMI dataset contains 1000 annotated images of observations from diverse sky regions in the XMM-Newton Optical Monitor (XMM-OM) image catalog. An additional 50 images without annotations are included to help reduce the number of false positives and false negatives that complex objects (e.g., large galaxies, clusters, nebulae) may cause.

The dataset is also hosted on HuggingFace under the repository ID iulia-elisa/XAMI-dataset.

Downloading the dataset

Cloning the repository

git clone https://github.com/ESA-Datalabs/XAMI-dataset.git
cd XAMI-dataset

# create an environment with the package requirements
conda env create -f environment.yaml 

# Install the package in editable mode
pip install -e .

Then download the dataset from HuggingFace using the package:

from xami_dataset import XAMIDataset

# Download the dataset
xami_dataset = XAMIDataset(
    repo_id="iulia-elisa/XAMI-dataset", 
    dataset_name="xami_dataset", 
    dest_dir='./data')
Alternatively, download only the dataset archive and extract it from the command line:
DEST_DIR='/path/to/local/dest'

huggingface-cli download iulia-elisa/XAMI-dataset xami_dataset.zip --repo-type dataset --local-dir "$DEST_DIR" && unzip "$DEST_DIR/xami_dataset.zip" -d "$DEST_DIR" && rm "$DEST_DIR/xami_dataset.zip"
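The CLI steps above (download, unzip, delete the archive) can also be scripted in Python. This is a minimal sketch, assuming the `huggingface_hub` package is installed. The helper names `extract_archive` and `fetch_xami` are hypothetical and are not part of the `xami_dataset` package.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir, remove_zip=True):
    """Unzip zip_path into dest_dir and optionally delete the archive."""
    zip_path, dest_dir = Path(zip_path), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    if remove_zip:
        zip_path.unlink()
    return dest_dir


def fetch_xami(dest_dir="./data"):
    """Download xami_dataset.zip from the HuggingFace Hub and extract it.

    Hypothetical helper: mirrors the huggingface-cli command shown above.
    """
    # Imported lazily so extract_archive works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    zip_path = hf_hub_download(
        repo_id="iulia-elisa/XAMI-dataset",
        filename="xami_dataset.zip",
        repo_type="dataset",
        local_dir=dest_dir,
    )
    return extract_archive(zip_path, dest_dir)
```

Calling `fetch_xami("/path/to/local/dest")` should leave the extracted dataset in that directory, with the zip file removed.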

About

The dataset is split into train and validation sets and contains annotated artefacts in COCO format for instance segmentation. We use multilabel stratified K-fold (k=4) to balance class distributions across splits. We work with a single version of the splits (out of 4) but also provide the means to work with all 4 versions.
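One way to check how well classes are balanced between splits is to count annotated instances per category in each COCO annotation file. The sketch below relies only on the standard COCO keys (`categories`, `annotations`, `category_id`). The category names in the toy example are made up for illustration, and the annotation file paths in the extracted dataset are not assumed here.

```python
import json
from collections import Counter


def load_coco(path):
    """Load a COCO annotation file into a dict."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def instances_per_class(coco):
    """Count annotated instances per category name in a COCO-style dict."""
    id_to_name = {cat["id"]: cat["name"] for cat in coco["categories"]}
    counts = Counter(id_to_name[ann["category_id"]] for ann in coco["annotations"])
    return dict(counts)


# Tiny in-memory example; the category names are illustrative placeholders.
toy = {
    "categories": [
        {"id": 1, "name": "artefact_a"},
        {"id": 2, "name": "artefact_b"},
    ],
    "annotations": [
        {"id": 10, "image_id": 0, "category_id": 1},
        {"id": 11, "image_id": 0, "category_id": 1},
        {"id": 12, "image_id": 1, "category_id": 2},
    ],
}
print(instances_per_class(toy))  # {'artefact_a': 2, 'artefact_b': 1}
```

Running `instances_per_class(load_coco(path))` on the train and validation annotation files and comparing the two results gives a quick view of the class balance.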

See Dataset Structure for a detailed description of the dataset layout in COCO-IS and YOLOv8-Seg formats.
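To illustrate how the two formats relate: a COCO polygon is a flat list of pixel coordinates, while a YOLOv8-Seg label line is a class index followed by the same points normalized by the image width and height. The sketch below shows that conversion for a single polygon. The function name is hypothetical, and the source does not state whether the YOLOv8-Seg files in this dataset were produced this way or how COCO category IDs map to YOLO class indices.

```python
def coco_polygon_to_yolo_seg(polygon, class_index, img_width, img_height):
    """Convert one COCO polygon [x1, y1, x2, y2, ...] (pixels) to a
    YOLOv8-Seg label line: 'class x1 y1 x2 y2 ...' with coordinates in [0, 1]."""
    if len(polygon) < 6 or len(polygon) % 2 != 0:
        raise ValueError("polygon needs at least 3 (x, y) points")
    coords = []
    for x, y in zip(polygon[0::2], polygon[1::2]):
        coords.append(f"{x / img_width:.6f}")
        coords.append(f"{y / img_height:.6f}")
    return " ".join([str(class_index)] + coords)


# A triangle in a 100x200 (width x height) image, assigned class index 0.
line = coco_polygon_to_yolo_seg([0, 0, 50, 0, 50, 100], 0, 100, 200)
print(line)  # 0 0.000000 0.000000 0.500000 0.000000 0.500000 0.500000
```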

Artefacts

A particularity of the XAMI dataset, compared to datasets of everyday images, is where the artefacts usually appear.

Example of an image with multiple artefacts.

Here are some examples of common artefacts in the dataset:

Examples of common artefacts in the OM observations.

Annotation platforms

The images have been annotated using the following projects:

© Licence

CC BY-NC 3.0 IGO.
