diff --git a/examples/00_example_sig53_dataset.ipynb b/examples/00_example_sig53_dataset.ipynb new file mode 100644 index 0000000..5647098 --- /dev/null +++ b/examples/00_example_sig53_dataset.ipynb @@ -0,0 +1,303 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Example 00 - The Official Sig53 Dataset\n", + "This notebook walks through an example of how the official Sig53 dataset can be instantiated and analyzed.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Import Libraries\n", + "First, import all the necessary public libraries as well as a few classes from the `torchsig` toolkit." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from torchsig.utils.writer import DatasetCreator\n", + "from torchsig.utils.visualize import IQVisualizer, SpectrogramVisualizer\n", + "from torchsig.datasets.modulations import ModulationsDataset\n", + "from torchsig.datasets.sig53 import Sig53\n", + "from torchsig.utils.dataset import SignalDataset\n", + "from torchsig.datasets import conf\n", + "from torch.utils.data import DataLoader\n", + "from matplotlib import pyplot as plt\n", + "from tqdm import tqdm\n", + "import numpy as np\n", + "import os" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Instantiate Sig53 Dataset\n", + "To instantiate the Sig53 dataset, several parameters are given to the imported `Sig53` class. 
These parameters are:\n", + "- `root` ~ A string specifying the root directory in which to create and/or read an existing Sig53 dataset\n", + "- `train` ~ A boolean specifying whether the Sig53 dataset should be the training (True) or validation (False) set\n", + "- `impaired` ~ A boolean specifying whether the Sig53 dataset should be the impaired (True) or clean (False) version\n", + "- `transform` ~ Optionally, any data transforms to apply if the dataset will be used in an ML training pipeline\n", + "- `target_transform` ~ Optionally, any target transforms to apply if the dataset will be used in an ML training pipeline\n", + "\n", + "Together, the `train` and `impaired` booleans determine which of the four (4) distinct Sig53 datasets is instantiated:\n", + "- `train=True` & `impaired=False` = Clean training set of 1.06M examples\n", + "- `train=True` & `impaired=True` = Impaired training set of 5.3M examples\n", + "- `train=False` & `impaired=False` = Clean validation set of 106k examples\n", + "- `train=False` & `impaired=True` = Impaired validation set of 106k examples\n", + "\n", + "The last option, the impaired validation set, is the dataset to use when reporting any results on the official Sig53 dataset.\n", + "\n", + "Additional optional parameters of potential interest are:\n", + "- `regenerate` ~ A boolean specifying whether the dataset should be regenerated even if an existing dataset is detected (Default: False)\n", + "- `eb_no` ~ A boolean specifying whether SNR is defined as Eb/No if True (making higher-order modulations more powerful) or as Es/No if False (Default: False)\n", + "- `use_signal_data` ~ A boolean specifying whether the data and target information should be converted to `SignalData` objects as they are read in (Default: False)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Specify Script Options" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, +
"outputs": [], + "source": [ + "figure_dir = \"figures\"\n", + "if not os.path.isdir(figure_dir):\n", + " os.mkdir(figure_dir)\n", + "\n", + "cfg = conf.Sig53CleanTrainQAConfig\n", + "# cfg = conf.Sig53CleanTrainConfig # uncomment to run for real\n", + "\n", + "ds = ModulationsDataset(\n", + " level=cfg.level,\n", + " num_samples=cfg.num_samples,\n", + " num_iq_samples=cfg.num_iq_samples,\n", + " use_class_idx=cfg.use_class_idx,\n", + " include_snr=cfg.include_snr,\n", + " eb_no=cfg.eb_no,\n", + ")\n", + "\n", + "creator = DatasetCreator(ds, seed=12345678, path=\"sig53/sig53_clean_train\")\n", + "creator.create()\n", + "sig53 = Sig53(\"sig53\", train=True, impaired=False)\n", + "\n", + "# Retrieve a sample and print out information\n", + "idx = np.random.randint(len(sig53))\n", + "data, (label, snr) = sig53[idx]\n", + "print(\"Dataset length: {}\".format(len(sig53)))\n", + "print(\"Data shape: {}\".format(data.shape))\n", + "print(\"Label Index: {}\".format(label))\n", + "print(\"Label Class: {}\".format(Sig53.convert_idx_to_name(label)))\n", + "print(\"SNR: {}\".format(snr))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Plot Subset to Verify\n", + "The `IQVisualizer` and the `SpectrogramVisualizer` accept a `DataLoader` and plot visualizations of the dataset. The `batch_size` of the `DataLoader` determines how many examples are plotted on each iteration over the visualizer. Alternatively, the dataset itself can be indexed and plotted sequentially with any familiar Python plotting tool instead of the `torchsig` visualizers used below."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For plotting, wrap the dataset so that it omits the SNR values and returns only `(data, label)` pairs" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "class DataWrapper(SignalDataset):\n", + " def __init__(self, dataset):\n", + " self.dataset = dataset\n", + " super().__init__(dataset)\n", + "\n", + " def __getitem__(self, idx):\n", + " x, (y, _snr) = self.dataset[idx]\n", + " return x, y\n", + "\n", + " def __len__(self) -> int:\n", + " return len(self.dataset)\n", + "\n", + "\n", + "plot_dataset = DataWrapper(sig53)\n", + "\n", + "data_loader = DataLoader(dataset=plot_dataset, batch_size=16, shuffle=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Convert the plot titles from class indices to class names" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def target_idx_to_name(tensor: np.ndarray) -> list:\n", + " batch_size = tensor.shape[0]\n", + " label = []\n", + " for idx in range(batch_size):\n", + " label.append(Sig53.convert_idx_to_name(int(tensor[idx])))\n", + " return label\n", + "\n", + "\n", + "visualizer = IQVisualizer(\n", + " data_loader=data_loader,\n", + " visualize_transform=None,\n", + " visualize_target_transform=target_idx_to_name,\n", + ")\n", + "\n", + "for figure in iter(visualizer):\n", + " figure.set_size_inches(14, 9)\n", + " plt.savefig(\"figures/00_iq_data.png\")\n", + " break" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Repeat, but plot spectrograms for a new random sample of the data\n", + "visualizer = SpectrogramVisualizer(\n", + " data_loader=data_loader,\n", + " nfft=1024,\n", + " visualize_transform=None,\n", + " visualize_target_transform=target_idx_to_name,\n", + ")\n", + "\n", + "for figure in iter(visualizer):\n", + " figure.set_size_inches(14, 9)\n", + " 
plt.savefig(\"figures/00_spectrogram.png\")\n", + " break" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Analyze Dataset\n", + "\n", + "Loop through the dataset, recording the class and SNR of every example, to inspect the distribution of each" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "class_counter_dict = {\n", + " class_name: 0 for class_name in list(Sig53._idx_to_name_dict.values())\n", + "}\n", + "all_snrs = []\n", + "\n", + "for idx in tqdm(range(len(sig53))):\n", + " data, (modulation, snr) = sig53[idx]\n", + " class_counter_dict[Sig53.convert_idx_to_name(modulation)] += 1\n", + " all_snrs.append(snr)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Plot the distribution of classes" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "class_names = list(class_counter_dict.keys())\n", + "class_counts = list(class_counter_dict.values())\n", + "\n", + "plt.figure(figsize=(9, 9))\n", + "plt.pie(class_counts, labels=class_names)\n", + "plt.title(\"Class Distribution Pie Chart\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "plt.figure(figsize=(11, 4))\n", + "plt.bar(class_names, class_counts)\n", + "plt.xticks(rotation=90)\n", + "plt.title(\"Class Distribution Bar Chart\")\n", + "plt.xlabel(\"Modulation Class Name\")\n", + "plt.ylabel(\"Counts\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Plot the distribution of SNR values" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "plt.figure(figsize=(11, 4))\n", + "plt.hist(x=all_snrs, bins=100)\n", + "plt.title(\"SNR Distribution\")\n", + "plt.xlabel(\"SNR Bins (dB)\")\n", + "plt.ylabel(\"Counts\")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" +
}, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.10" + }, + "orig_nbformat": 4 + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/examples/00_example_sig53_dataset.py b/examples/00_example_sig53_dataset.py deleted file mode 100644 index 39f71d2..0000000 --- a/examples/00_example_sig53_dataset.py +++ /dev/null @@ -1,172 +0,0 @@ -# Example 00 - The Official Sig53 Dataset -# This notebook walks through an example of how the official Sig53 dataset can be instantiated and analyzed. - -# ---- -# ### Import Libraries -# First, import all the necessary public libraries as well as a few classes from the `torchsig` toolkit. - -from torchsig.utils.writer import DatasetCreator -from torchsig.utils.visualize import IQVisualizer, SpectrogramVisualizer -from torchsig.datasets.modulations import ModulationsDataset -from torchsig.datasets.sig53 import Sig53 -from torchsig.utils.dataset import SignalDataset -from torchsig.datasets import conf -from torch.utils.data import DataLoader -from matplotlib import pyplot as plt -from tqdm import tqdm -import numpy as np -import os - -# ---- -# ### Instantiate Sig53 Dataset -# To instantiate the Sig53 dataset, several parameters are given to the imported `Sig53` class. 
These paramters are: -# - `root` ~ A string to specify the root directory of where to instantiate and/or read an existing Sig53 dataset -# - `train` ~ A boolean to specify if the Sig53 dataset should be the training (True) or validation (False) sets -# - `impaired` ~ A boolean to specify if the Sig53 dataset should be the clean version or the impaired version -# - `transform` ~ Optionally, pass in any data transforms here if the dataset will be used in an ML training pipeline -# - `target_transform` ~ Optionally, pass in any target transforms here if the dataset will be used in an ML training pipeline -# -# A combination of the `train` and the `impaired` booleans determines which of the four (4) distinct Sig53 datasets will be instantiated: -# - `train=True` & `impaired=False` = Clean training set of 1.06M examples -# - `train=True` & `impaired=True` = Impaired training set of 5.3M examples -# - `train=False` & `impaired=False` = Clean validation set of 106k examples -# - `train=False` & `impaired=True` = Impaired validation set of 106k examples -# -# The final option of the impaired validation set is the dataset to be used when reporting any results with the official Sig53 dataset. 
-# -# Additional optional parameters of potential interest are: -# - `regenerate` ~ A boolean specifying if the dataset should be regenerated even if an existing dataset is detected (Default: False) -# - `eb_no` ~ A boolean specifying if the SNR should be defined as Eb/No if True (making higher order modulations more powerful) or as Es/No if False (Defualt: False) -# - `use_signal_data` ~ A boolean specifying if the data and target information should be converted to `SignalData` objects as they are read in (Default: False) - -# Specify script options -figure_dir = "examples/figures" -if not os.path.isdir(figure_dir): - os.mkdir(figure_dir) - -cfg = conf.Sig53CleanTrainQAConfig -# cfg = conf.Sig53CleanTrainConfig # uncomment to run for real - -ds = ModulationsDataset( - level=cfg.level, - num_samples=cfg.num_samples, - num_iq_samples=cfg.num_iq_samples, - use_class_idx=cfg.use_class_idx, - include_snr=cfg.include_snr, - eb_no=cfg.eb_no, -) - -creator = DatasetCreator(ds, seed=12345678, path="examples/sig53/sig53_clean_train") -creator.create() -sig53 = Sig53("examples/sig53", train=True, impaired=False) - -# Retrieve a sample and print out information -idx = np.random.randint(len(sig53)) -data, (label, snr) = sig53[idx] -print("Dataset length: {}".format(len(sig53))) -print("Data shape: {}".format(data.shape)) -print("Label Index: {}".format(label)) -print("Label Class: {}".format(Sig53.convert_idx_to_name(label))) -print("SNR: {}".format(snr)) - - -# ---- -# ### Plot Subset to Verify -# The `IQVisualizer` and the `SpectrogramVisualizer` can be passed a `Dataloader` and plot visualizations of the dataset. The `batch_size` of the `DataLoader` determines how many examples to plot for each iteration over the visualizer. Note that the dataset itself can be indexed and plotted sequentially using any familiar python plotting tools as an alternative plotting method to using the `torchsig` `Visualizer` as shown below. 
- - -# For plotting, omit the SNR values -class DataWrapper(SignalDataset): - def __init__(self, dataset): - self.dataset = dataset - super().__init__(dataset) - - def __getitem__(self, idx): - x, (y, z) = self.dataset[idx] - return x, y - - def __len__(self) -> int: - return len(self.dataset) - - -plot_dataset = DataWrapper(sig53) - -data_loader = DataLoader(dataset=plot_dataset, batch_size=16, shuffle=True) - - -# Transform the plotting titles from the class index to the name -def target_idx_to_name(tensor: np.ndarray) -> list: - batch_size = tensor.shape[0] - label = [] - for idx in range(batch_size): - label.append(Sig53.convert_idx_to_name(int(tensor[idx]))) - return label - - -visualizer = IQVisualizer( - data_loader=data_loader, - visualize_transform=None, - visualize_target_transform=target_idx_to_name, -) - -for figure in iter(visualizer): - figure.set_size_inches(14, 9) - plt.savefig("examples/figures/00_iq_data.png") - break - - -# Repeat but plot the spectrograms for a new random sampling of the data -visualizer = SpectrogramVisualizer( - data_loader=data_loader, - nfft=1024, - visualize_transform=None, - visualize_target_transform=target_idx_to_name, -) - -for figure in iter(visualizer): - figure.set_size_inches(14, 9) - plt.savefig("examples/figures/00_spectrogram.png") - break - - -# ---- -# ### Analyze Dataset -# The dataset can also be analyzed at the macro level for details such as the distribution of classes and SNR values. This exercise is performed below to show the nearly uniform distribution across each. 
- -# Loop through the dataset recording classes and SNRs -class_counter_dict = { - class_name: 0 for class_name in list(Sig53._idx_to_name_dict.values()) -} -all_snrs = [] - -for idx in tqdm(range(len(sig53))): - data, (modulation, snr) = sig53[idx] - class_counter_dict[Sig53.convert_idx_to_name(modulation)] += 1 - all_snrs.append(snr) - - -# Plot the distribution of classes -class_names = list(class_counter_dict.keys()) -num_classes = list(class_counter_dict.values()) - -plt.figure(figsize=(9, 9)) -plt.pie(num_classes, labels=class_names) -plt.title("Class Distribution Pie Chart") -plt.savefig("examples/figures/00_class_distribution_pie.png") - -plt.figure(figsize=(11, 4)) -plt.bar(class_names, num_classes) -plt.xticks(rotation=90) -plt.title("Class Distribution Bar Chart") -plt.xlabel("Modulation Class Name") -plt.ylabel("Counts") -plt.savefig("examples/figures/00_class_distribution_bar.png") - - -# Plot the distribution of SNR values -plt.figure(figsize=(11, 4)) -plt.hist(x=all_snrs, bins=100) -plt.title("SNR Distribution") -plt.xlabel("SNR Bins (dB)") -plt.ylabel("Counts") -plt.savefig("examples/figures/00_snr_distribution_hist.png") diff --git a/torchsig/__init__.py b/torchsig/__init__.py index 6a9beea..3d18726 100644 --- a/torchsig/__init__.py +++ b/torchsig/__init__.py @@ -1 +1 @@ -__version__ = "0.4.0" +__version__ = "0.5.0"
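As an illustrative alternative (not something the notebook itself uses), the class and SNR tally in the notebook's "Analyze Dataset" cell could be written with the standard library's `collections.Counter`. This is a minimal sketch under stated assumptions: the helper name `tally_classes_and_snrs` is hypothetical, and each example is assumed to unpack as `(data, (label_idx, snr))`, as `sig53[idx]` does in the notebook. Seeding the counter with every class name keeps zero-count classes (in their original order) in the pie and bar charts, matching the notebook's pre-initialized `class_counter_dict`. A toy in-memory dataset with placeholder class names stands in for Sig53 so the sketch runs without `torchsig`.

```python
from collections import Counter
from typing import Callable, Iterable, List, Sequence, Tuple

# Assumed example layout: (data, (label_idx, snr)), as unpacked from sig53[idx]
# in the notebook. The `data` element is not inspected here.
Example = Tuple[object, Tuple[int, float]]


def tally_classes_and_snrs(
    dataset: Sequence[Example],
    class_names: Iterable[str],
    idx_to_name: Callable[[int], str],
) -> Tuple[Counter, List[float]]:
    """Hypothetical helper: count examples per class name and collect all SNRs."""
    # Seed every class with 0 so classes with no examples still appear,
    # in the same order as `class_names`.
    class_counts: Counter = Counter({name: 0 for name in class_names})
    all_snrs: List[float] = []
    for idx in range(len(dataset)):
        _data, (label, snr) = dataset[idx]
        class_counts[idx_to_name(int(label))] += 1
        all_snrs.append(float(snr))
    return class_counts, all_snrs


# Toy stand-in for the dataset; these class names are placeholders,
# not the real Sig53 class list.
toy_names = {0: "class_a", 1: "class_b", 2: "class_c"}
toy_dataset = [
    (None, (0, 10.0)),
    (None, (1, 2.5)),
    (None, (0, -4.0)),
]

counts, snrs = tally_classes_and_snrs(
    toy_dataset, toy_names.values(), lambda i: toy_names[i]
)
print(dict(counts))  # {'class_a': 2, 'class_b': 1, 'class_c': 0}
print(snrs)          # [10.0, 2.5, -4.0]
```

With the real dataset, the call would presumably be `tally_classes_and_snrs(sig53, Sig53._idx_to_name_dict.values(), Sig53.convert_idx_to_name)`; `list(counts.keys())` and `list(counts.values())` could then feed `plt.pie` and `plt.bar` exactly as the notebook's lists do, and `snrs` could feed `plt.hist`.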