Jeffrey Ede edited this page Sep 18, 2020 · 33 revisions

Overview

Electron microscopy datasets are available from a combination of Zenodo and Google Drive storage (mirror 1). They're also available from a publicly accessible University of Warwick dataserver (mirror 2). TEM and STEM Images/Crops datasets were collected by dozens of Warwick scientists working on hundreds of projects and therefore have a diverse constitution. Wavefunctions are for atom columns.

A preprint and paper provide dataset details and visualizations. Datasets are in the public domain and can be used without restriction. Most datasets are large (100+ GB), so downloads may take a couple of hours or more depending on your internet connection. In addition, if many users have recently downloaded a dataset from mirror 1, you might get an error saying "download quota exceeded for this file so you can't download at this time". To avoid this, either sign in to Google Drive or use mirror 2.

Exit Wavefunctions

Multiple datasets containing 98340 wavefunctions simulated with clTEM. In addition, there are 1000 experimental focal series. Wavefunctions are in 64-bit complex (320, 320) numpy array files (.npy) that can be opened with np.load(). Focal series images are in TIFF format. Featured in this preprint.
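
As a minimal sketch of working with one of these files, the snippet below splits a complex wavefunction into amplitude and phase. It assumes that "64-bit complex" corresponds to numpy's complex64 dtype. The helper name amplitude_and_phase is hypothetical, and a synthetic array stands in for a downloaded file so the example runs without the dataset.

```python
import numpy as np


def amplitude_and_phase(wavefunction):
    """Split a complex wavefunction into amplitude and phase (in radians)."""
    wavefunction = np.asarray(wavefunction)
    return np.abs(wavefunction), np.angle(wavefunction)


# Synthetic stand-in with the documented shape; assumed dtype is complex64.
# For a downloaded file, replace this with np.load() on its path.
rng = np.random.default_rng(0)
demo = (
    rng.standard_normal((320, 320)) + 1j * rng.standard_normal((320, 320))
).astype(np.complex64)

amplitude, phase = amplitude_and_phase(demo)
print(amplitude.shape, phase.shape)
```

The phase returned by np.angle() is wrapped to the interval (-pi, pi], so phase unwrapping may be needed before comparing wavefunctions of thick specimens.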

Datasets include:

  • Wavefunctions (wavefunctions_partitioned_multiple_hq): n=3, multiple materials - 27.8 GB.
  • Wavefunctions Unseen Training (wavefunctions_multiple_unseen_train_hq): n=3, multiple materials, materials in training set - 1.2 GB.
  • Wavefunctions Single (wavefunctions_single_hq): n=3, single material - 3.7 GB.
  • Wavefunctions Restricted (wavefunctions_multiple_forth_hq): n=3, multiple materials, simulation hyperparameter ranges reduced by a factor close to 1/4 - 9.1 GB.
  • Wavefunctions n=1 (wavefunctions): n=1, multiple materials. See dataset_info.txt for partitioning into training, validation and test sets. - 28.6 GB.
  • Wavefunctions n=1 Unseen Training (unseen_train): n=1, multiple materials, materials in training set - 1.1 GB.
  • Wavefunctions n=1 Single (wavefunctions_single): n=1, single material - 3.7 GB.
  • Experimental Focal Series (experimental_focal_series): 1000 experimental focal series. Series have a quadratically increasing defocus sequence; however, they are at different spatial scales - 13.7 GB.
  • CIFs (cifs): Downloaded from the COD and used for clTEM simulations - 203.9 MB.
  • URLs (url_lists): COD URLs cifs were downloaded from.

Download mirror 1
Download mirror 2 (Password: W4rw1ck3m!)

Exit Wavefunctions 96x96

Wavefunctions downsampled to 96x96. Real and imaginary parts are stored as separate 32-bit channels in (dataset_size, 96, 96, 2) numpy array files (.npy) that can be opened with np.load(). Python index [...,0] is the real part, and [...,1] is the imaginary part. Training, validation, and test sets are concatenated along the batch axis (training data at low indices).

  • Wavefunctions 96x96 (wavefunctions_n=3): Bilinearly downsampled from wavefunctions_multiple_hq with antialiasing. 36324 wavefunctions: 24530 training, 3399 validation, and 8395 test. - 2.62 GB.
  • Wavefunctions 96x96 Restricted (wavefunctions_restricted_n=3): Bilinearly downsampled from wavefunctions_multiple_forth_hq with antialiasing. 11870 wavefunctions: 8002 training, 1105 validation, and 2763 test. - 855 MB.
  • Wavefunctions 96x96 Single (wavefunctions_single_n=3): Bilinearly downsampled from wavefunctions_single_hq with antialiasing. 4825 wavefunctions: 3861 training and 964 validation. - 347 MB.
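
The two-channel layout and the concatenated partitions described above can be unpacked roughly as follows. This is a sketch under stated assumptions: the helper names to_complex and split_sets are hypothetical, and the partitions are assumed to be ordered training, then validation, then test (the text above only states that training data is at low indices). A small synthetic array stands in for a downloaded file.

```python
import numpy as np


def to_complex(channels):
    """Combine a trailing (real, imaginary) axis into one complex array."""
    channels = np.asarray(channels)
    if channels.shape[-1] != 2:
        raise ValueError("expected the last axis to hold real and imaginary parts")
    return channels[..., 0] + 1j * channels[..., 1]


def split_sets(data, n_train, n_val):
    """Slice along the batch axis; assumes training, validation, test order."""
    train = data[:n_train]
    val = data[n_train:n_train + n_val]
    test = data[n_train + n_val:]
    return train, val, test


# Synthetic stand-in: 10 wavefunctions with real part 1 and imaginary part 2.
demo = np.zeros((10, 96, 96, 2), dtype=np.float32)
demo[..., 0] = 1.0
demo[..., 1] = 2.0

waves = to_complex(demo)
# For wavefunctions_n=3, the listed counts would be n_train=24530, n_val=3399.
train, val, test = split_sets(waves, n_train=6, n_val=1)
print(train.shape, val.shape, test.shape)
```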

Download mirror 1

Electron Micrographs 96x96

Size 96x96 images intended for rapid development. Images are in numpy array files (.npy) that can be opened with np.load().

  • Full TEM images downsampled to 96x96 with antialiasing. Images are in a (17266, 96, 96, 1) numpy array file (.npy). - 607 MB.
  • Full STEM images downsampled to 96x96 with antialiasing. Images are in a (19769, 96, 96, 1) numpy array file (.npy). - 695 MB.
  • 96x96 crops from full STEM images. Images are in a (19769, 96, 96, 1) numpy array file (.npy). - 695 MB.
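
Because each of these files holds every image in a single array, it can be convenient to memory-map the file and copy out only the images needed. The sketch below uses the mmap_mode argument of np.load() for this. The helper name load_batch and the file name demo_images.npy are hypothetical, and a small temporary file stands in for a downloaded dataset.

```python
import os
import tempfile

import numpy as np


def load_batch(path, start, stop):
    """Read images[start:stop] from a (N, 96, 96, 1) .npy file without loading it all."""
    images = np.load(path, mmap_mode="r")  # memory-mapped, read-only
    batch = np.array(images[start:stop], dtype=np.float32)  # copies only this slice
    return batch[..., 0]  # drop the trailing channel axis


# Small temporary file standing in for one of the downloaded datasets.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_images.npy")
    np.save(path, np.arange(8 * 96 * 96, dtype=np.float32).reshape(8, 96, 96, 1))
    batch = load_batch(path, start=2, stop=5)

print(batch.shape)
```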

Download mirror 1

STEM Full Images

Full STEM images in a variety of shapes. Featured in this paper.

Info: 159.4 GB. 16227 images.

Download mirror 1
Download mirror 2 (Password: W4rw1ck3m!)

STEM Crops

Non-overlapping 512x512 crops from images in the STEM full images dataset. Featured in this paper.

Info: 157.3 GB. 110933 training, 21259 validation and 28877 test set crops, totalling 161069 crops.
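
For illustration, non-overlapping crops of a 2D image can be extracted with a reshape, as sketched below. This is not necessarily how the dataset was produced: the helper name non_overlapping_crops is hypothetical, and discarding edge regions that do not fill a whole crop is an assumption, since the handling of remainders is not described here.

```python
import numpy as np


def non_overlapping_crops(image, size=512):
    """Tile a 2D image into size x size crops, discarding incomplete edge tiles."""
    image = np.asarray(image)
    rows, cols = image.shape[0] // size, image.shape[1] // size
    trimmed = image[:rows * size, :cols * size]
    # (rows, size, cols, size) -> (rows, cols, size, size) -> (rows * cols, size, size)
    tiles = trimmed.reshape(rows, size, cols, size).swapaxes(1, 2)
    return tiles.reshape(rows * cols, size, size)


# Synthetic 1024 x 1600 image: 2 rows x 3 columns of full crops fit.
demo = np.arange(1024 * 1600, dtype=np.float32).reshape(1024, 1600)
crops = non_overlapping_crops(demo)
print(crops.shape)
```

Crops are returned in row-major order, so crop index r * cols + c corresponds to tile row r and tile column c of the full image.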

Download mirror 1
Download mirror 2 (Password: W4rw1ck3m!)

TEM Full Images

Full TEM images. Featured in this paper.

Info: 269.8 GB. 11350 training, 2431 validation and 3486 test images, totalling 17267 images.

Download mirror 1
Download mirror 2 (Password: W4rw1ck3m!)

Contact

Jeffrey Ede: [email protected]