This project, developed by Ufuk Çakır, introduces the code to generate the GAMMA (Galactic Attributes of Mass, Metallicity, and Age) dataset, a comprehensive collection of galaxy data tailored for Machine Learning applications. The dataset offers detailed 2D maps and 3D cubes of 11 727 galaxies, capturing essential attributes: stellar age, metallicity, and mass. Ideal for feature extraction, clustering, and regression tasks, GAMMA offers a unique lens for exploring galactic structures through computational methods and is a bridge between astrophysical simulations and the field of scientific machine learning (ML)
The full paper is available on arXiv!
Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University, 09/2023
⚠️ Warning: If you want to use this code or dataset in your research, please cite the following paper:@misc{çakır2023, title={GAMMA: Galactic Attributes of Mass, Metallicity, and Age Dataset},
author={Ufuk Çakır and Tobias Buck},
year={2023},
eprint={2312.06016},
archivePrefix={arXiv},
primaryClass={astro-ph.GA},
url={https://arxiv.org/abs/2312.06016},
}
Figure 1: Sample Galaxies in 2D (top) and 3D (bottom)
The GAMMA dataset can be downloaded on Zenodo.
The code loads galaxies from the IllustrisTNG suite. For that, the respective python package should be installed:
$ cd ~
$ git clone https://github.com/illustristng/illustris_python.git
$ pip install illustris_python/
Check the Starting Guide on the TNG webpage for more information.
For installation of the code, run
source setup.sh
The config.json file contains all the settings nedded to run the data generation. All the configuration should be made there. The required fields are:
- simulation: The simluation from which the data should be generated. Currently only "IllustrisTNG" is supported.
- "particle_types": The particle type to calculate the images
- "galaxy_parameters": Additional parameters to be saved for each galaxy.
- "img_res": Image resolution in each dimension
- "path": Output Path of the Created HDF5 File
- "halo_ids": If none, it will do automatic selection of galaxies.
- "dim": dimension of image , either (2 and/or 3) dimensional
- "log_M_min": lower Mass cut in log10(M_sun/h)
- "log_M_max": upper Mass cut in log10(M_sun/h)
- "fields": Fields to calculate the images. For each field the attributes "mass_weighted" and "normed" define wheter or not to calculate a mass weighted image and to norm or not.
- "GalaxyArgs": Arguments specified to load galaxy defined in the Galaxy Class
To generate the dataset run
source generate_data.sh
This will select galaxies using the select_galaxies function and save the halo ids in a numpy array. Finaly the code runs the generate.py code to generate the dataset from the selected galaxies.
The data will be stored in a HDF5 File in the following way:
You can use the Gamma
class defined in load.py:
>>> from gamma.load import Gamma
>>> path = "GAMMA.hdf5"
data = Gamma(path)
To get specific data from the Attributes group you can simply call
>>> data = Gamma("data.hdf5")
>>> data.get_attribute("mass") # Get the mass of all galaxies in the dataset
>>> data.get_attribute("mass", 10) # Get the mass of the 10th galaxy in the dataset
To get the images you can use:
>>> image = data.get_image("stars", "Masses", 10) # Get the stars masses image of the 10th galaxy in the dataset
>>> all_images = data.get_image("stars", "Masses") # Get all stars masses images in the dataset
>>> from gamma.load import Gamma
>>> path = "GAMMA.hdf5"
>>> data = Gamma(path)
>>> from gamma.model import mPCA
>>> model = mPCA(data, dim=2) # Initialize PCA model for two dimensional data
Creating datamatrix with the following fields:
===============================================
Particle type: stars
Fields: ['GFM_Metallicity', 'GFM_StellarFormationTime', 'Masses']
Dimension: dim2
Default arguments are used for the fields that are not specified in the norm_function_kwargs
===============================================
Created datamatrix with shape: (11727, 12288)
>>> model.fit(n_components = 60, show_results = True)
For more results, check the benchmark page.