Support Additional Default Datasets for Enhanced Testing and Training
Your Name
Surya Subramanian
Description
To enhance the versatility and applicability of the Deep Learning Playground (DLP), we propose adding support for more default datasets. These datasets will give users a wider range of options for testing and training their machine learning models. Each dataset brings unique characteristics and challenges, making it well suited to different research and application purposes.
Here are some proposed datasets you can try to integrate (a torchvision loading sketch follows this list):
CIFAR100: Offers 100 classes, each with 600 images (500 for training and 100 for testing). A more complex version of CIFAR-10.
SVHN (Street View House Numbers): A real-world image dataset for developing machine learning and object recognition algorithms, requiring minimal data preprocessing.
ImageNet: A large and complex visual database designed for visual object recognition software research.
CelebA (Celebrity Faces Attributes): A large-scale face attributes dataset with over 200,000 celebrity images, each with 40 attribute annotations.
COIL100 (Columbia Object Image Library 100): Consists of 7200 images of 100 objects, each photographed from various angles.
Omniglot: A dataset designed for one-shot learning, containing 1623 different handwritten characters from 50 different alphabets.
STL10: Inspired by CIFAR-10, this dataset is meant for developing unsupervised feature learning, deep learning, and self-taught learning algorithms.
EMNIST (Extended MNIST): Expands the original MNIST dataset to include handwritten letters.
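Several of these are already packaged in torchvision, which could simplify a first integration pass. The sketch below shows one hedged way to fetch them on the backend; the get_default_dataset helper and the ./data root are illustrative assumptions, not existing DLP code. COIL100 is not in torchvision and ImageNet requires a manual download, so both are omitted here.

```python
# Illustrative helper showing how the proposed image datasets could be fetched
# via torchvision; the get_default_dataset name and ./data root are assumptions.
from torchvision import datasets, transforms

_TO_TENSOR = transforms.ToTensor()

def get_default_dataset(name: str, train: bool = True):
    root = "./data"
    if name == "CIFAR100":
        return datasets.CIFAR100(root, train=train, download=True, transform=_TO_TENSOR)
    if name == "SVHN":
        # SVHN uses a 'split' argument instead of 'train'
        return datasets.SVHN(root, split="train" if train else "test",
                             download=True, transform=_TO_TENSOR)
    if name == "STL10":
        return datasets.STL10(root, split="train" if train else "test",
                              download=True, transform=_TO_TENSOR)
    if name == "EMNIST":
        # EMNIST requires a split; 'letters' is the handwritten-letters variant
        return datasets.EMNIST(root, split="letters", train=train,
                               download=True, transform=_TO_TENSOR)
    if name == "Omniglot":
        return datasets.Omniglot(root, background=train, download=True,
                                 transform=_TO_TENSOR)
    if name == "CelebA":
        # Note: CelebA downloads from Google Drive and can require a manual fetch
        return datasets.CelebA(root, split="train" if train else "test",
                               download=True, transform=_TO_TENSOR)
    raise ValueError(f"Unsupported dataset: {name}")
```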
Task Breakdown
Integration of Datasets: Integrate these datasets into the DLP system, ensuring they are easily accessible and usable for users.
Architecture Optimization: For each dataset, research and determine the most effective neural network architectures for testing; this requires understanding the specific characteristics and challenges posed by each dataset (a baseline CNN sketch appears after this list).
Documentation and Examples: Provide detailed documentation and example use cases for each dataset, guiding users on how to leverage these datasets effectively.
Testing and Validation: Conduct thorough testing (via Postman against each default dataset) to ensure the seamless integration of these datasets into the DLP, and validate the performance of the suggested architectures for each dataset (a scripted smoke-test sketch appears after this list). More information on how to do this is in Notion.
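For the architecture-optimization item, a small convolutional baseline gives a reference point before trying larger architectures on CIFAR-100. This is a generic PyTorch sketch with illustrative layer sizes, not a tuned or DLP-specific model.

```python
# Minimal PyTorch CNN baseline for 32x32 RGB inputs (e.g. CIFAR-100);
# layer sizes are illustrative starting points, not tuned values.
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes: int = 100):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # 32x32 -> 16x16
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # 16x16 -> 8x8
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, 256), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(256, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

if __name__ == "__main__":
    # Quick shape check with a fake CIFAR-100-sized batch
    logits = SmallCNN()(torch.randn(4, 3, 32, 32))
    print(logits.shape)  # torch.Size([4, 100])
```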
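For the testing item, Postman covers manual checks; the snippet below sketches an equivalent scripted smoke test. The endpoint URL, port, and payload keys are placeholders, not DLP's documented API, and should be replaced with the actual training endpoint described in Notion.

```python
# Hypothetical smoke test for a default-dataset training run; the endpoint,
# payload keys, and response fields are placeholders, not DLP's real API.
import requests

payload = {
    "default_dataset": "CIFAR100",  # placeholder dataset key
    "epochs": 1,
    "batch_size": 32,
}

resp = requests.post("http://localhost:8000/api/train", json=payload, timeout=600)
resp.raise_for_status()
print(resp.json())  # inspect reported loss/accuracy to sanity-check the run
```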
Found a small bug in tabularConstants.ts: a typo in the California Housing dataset name led to an incorrect reference. Adding an underscore between "california" and "housing" fixed the problem.
Another small bug: the DIGITS dataset did not work when selected because it had never been loaded from scikit-learn. Loading it and adding it to dataset.py fixed the problem.
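For context on the DIGITS fix, loading the dataset from scikit-learn is straightforward; the sketch below shows roughly what such an entry in dataset.py might look like, with a hypothetical function name and return format rather than the actual DLP code.

```python
# Illustrative sketch of loading the DIGITS default from scikit-learn; the
# get_digits_dataset name and DataFrame return format are assumptions.
import pandas as pd
from sklearn.datasets import load_digits

def get_digits_dataset() -> pd.DataFrame:
    """Return the scikit-learn DIGITS dataset as a single labeled DataFrame."""
    digits = load_digits(as_frame=True)
    return digits.frame.copy()  # 64 pixel columns plus a 'target' column
```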