[FEATURE]: Expand DLP to Support Additional Default Datasets for Enhanced Testing and Training #1058

Open
codingwithsurya opened this issue Nov 20, 2023 · 4 comments
Labels: enhancement (New feature or request), good first issue (Good for newcomers)

Comments

@codingwithsurya
Contributor

Feature Name

Support Additional Default Datasets for Enhanced Testing and Training

Your Name

Surya Subramanian

Description

To enhance the versatility and applicability of the Deep Learning Playground (DLP), we propose adding support for more default datasets. These datasets will give users a wider range of options for testing and training their machine learning models. Each dataset has its own characteristics and challenges, so together they cover a variety of research and application purposes.

Here are some proposed datasets you can try to integrate:
CIFAR100: Offers 100 classes, each with 600 images (500 for training and 100 for testing). A more complex version of CIFAR10.

SVHN (Street View House Numbers): A real-world image dataset for developing machine learning and object recognition algorithms, requiring minimal data preprocessing.

ImageNet: A large and complex visual database designed for visual object recognition software research.

CelebA (Celebrity Faces Attributes): A large-scale face attributes dataset with over 200,000 celebrity images, each with 40 attribute annotations.

COIL100 (Columbia Object Image Library 100): Consists of 7200 images of 100 objects, each photographed from various angles.

Omniglot: A dataset designed for one-shot learning, containing 1623 different handwritten characters from 50 different alphabets.

STL10: Inspired by CIFAR-10, this dataset is meant for developing unsupervised feature learning, deep learning, and self-taught learning algorithms.

EMNIST (Extended MNIST): Expands the original MNIST dataset to include handwritten letters.
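
As a rough starting point for the list above, here is a minimal sketch of how these names could map onto `torchvision.datasets` classes. It assumes the backend can use torchvision, which this issue does not confirm; `DatasetSpec`, `DATASET_SPECS`, and `load_training_set` are hypothetical names, not existing DLP code. COIL100 is left out because torchvision has no built-in loader for it, and ImageNet is marked as non-downloadable because its archives have to be obtained manually.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class DatasetSpec:
    """Hypothetical description of how to build one torchvision dataset."""
    torchvision_class: str          # class name inside torchvision.datasets
    train_kwargs: Dict[str, Any]    # arguments that select the training split
    can_download: bool = True       # False when files must be fetched manually


# Assumption: keys are the names a user would pick in the UI.
DATASET_SPECS = {
    "CIFAR100": DatasetSpec("CIFAR100", {"train": True}),
    "SVHN": DatasetSpec("SVHN", {"split": "train"}),
    "IMAGENET": DatasetSpec("ImageNet", {"split": "train"}, can_download=False),
    "CELEBA": DatasetSpec("CelebA", {"split": "train"}),
    "OMNIGLOT": DatasetSpec("Omniglot", {"background": True}),
    "STL10": DatasetSpec("STL10", {"split": "train"}),
    # "letters" is one of several EMNIST splits; chosen here as an example.
    "EMNIST": DatasetSpec("EMNIST", {"split": "letters", "train": True}),
}


def load_training_set(name: str, root: str = "./data"):
    """Instantiate the training split of a registered dataset."""
    spec = DATASET_SPECS[name.upper()]
    # Imported lazily so the registry can be inspected without torchvision.
    import torchvision.datasets as tv_datasets

    dataset_cls = getattr(tv_datasets, spec.torchvision_class)
    kwargs = dict(spec.train_kwargs)
    if spec.can_download:
        kwargs["download"] = True
    return dataset_cls(root=root, **kwargs)
```

Calling `load_training_set("cifar100")` would download the data into `./data` on first use. Keeping the split arguments in the spec matters because these classes do not share one constructor signature: some take `train`, some take `split`, and Omniglot takes `background`.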

Task Breakdown
Integration of Datasets: Implement the integration of these datasets into the DLP system, ensuring they are easily accessible and usable for users.

Architecture Optimization: For each dataset, research and determine the most effective neural network architectures that are suitable for testing. This involves understanding the specific characteristics and challenges posed by each dataset.

Documentation and Examples: Provide detailed documentation and example use cases for each dataset, guiding users on how to leverage these datasets effectively.

Testing and Validation: Conduct thorough testing (using Postman with each default dataset) to ensure these datasets integrate seamlessly into DLP. Validate the performance of the suggested architectures for each dataset. More info on how to do this is in Notion.
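
For the testing step, one cheap check before any training run is that each split has the expected number of samples. The sketch below uses only the CIFAR100 figures quoted in this issue (100 classes, 500 training and 100 test images per class); `EXPECTED_SPLIT_SIZES` and `check_split_sizes` are hypothetical names, and other datasets would need their own entries.

```python
from typing import Dict, List, Mapping, Sized

# Per-class counts for CIFAR100 as stated in the issue description.
CIFAR100_CLASSES = 100
CIFAR100_TRAIN_PER_CLASS = 500
CIFAR100_TEST_PER_CLASS = 100

EXPECTED_SPLIT_SIZES: Dict[str, Dict[str, int]] = {
    "CIFAR100": {
        "train": CIFAR100_CLASSES * CIFAR100_TRAIN_PER_CLASS,  # 50,000
        "test": CIFAR100_CLASSES * CIFAR100_TEST_PER_CLASS,    # 10,000
    },
}


def check_split_sizes(name: str, splits: Mapping[str, Sized]) -> List[str]:
    """Return human-readable mismatches; an empty list means all sizes match."""
    problems = []
    for split, expected in EXPECTED_SPLIT_SIZES[name].items():
        if split not in splits:
            problems.append(f"{name}: missing split '{split}'")
            continue
        actual = len(splits[split])
        if actual != expected:
            problems.append(
                f"{name}/{split}: expected {expected} samples, got {actual}"
            )
    return problems
```

Anything that supports `len()` can be passed in, so the same helper works with dataset objects or with simple stand-ins in a unit test, for example `check_split_sizes("CIFAR100", {"train": range(50_000), "test": range(10_000)})`.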

Contributor

Hello @codingwithsurya! Thank you for submitting the Feature Request Form. We appreciate your contribution. 👋

We will look into it and provide a response as soon as possible.

To work on this feature request, you can follow these branch setup instructions:

  1. Check out the base branch (`nextjs`):
```
git checkout nextjs
```
  2. Pull the latest changes from the remote `nextjs` branch:
```
git pull origin nextjs
```
  3. Create a new branch specific to this feature request using the issue number:
```
git checkout -b feature-1058
```

Feel free to make the necessary changes in this branch and submit a pull request when you're ready.

Best regards,
Deep Learning Playground (DLP) Team

@karkir0003 karkir0003 moved this from Backlog to Todo in DLP Project Board Nov 20, 2023
@karkir0003 karkir0003 added the good first issue Good for newcomers label Nov 20, 2023
@karkir0003 karkir0003 moved this from Todo to In Progress in DLP Project Board Dec 29, 2023
@LuHG18
Collaborator

LuHG18 commented Feb 7, 2024

Found a small bug in tabularConstants.ts: a typo in the California housing dataset name meant the dataset was referenced incorrectly. Adding an underscore between "california" and "housing" fixed the problem.
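
A mismatch like this could be caught automatically by comparing the names the frontend sends against the keys the backend knows. The following is only a sketch: the name lists and their upper-case spelling are made up for illustration, and `find_mismatched_names` is a hypothetical helper, not part of DLP.

```python
import difflib
from typing import Dict, Iterable, Optional


def find_mismatched_names(
    frontend_names: Iterable[str], backend_keys: Iterable[str]
) -> Dict[str, Optional[str]]:
    """Map each frontend name the backend does not know to its closest key.

    The value is None when no backend key is similar enough to suggest.
    """
    known = list(backend_keys)
    mismatches: Dict[str, Optional[str]] = {}
    for name in frontend_names:
        if name in known:
            continue
        close = difflib.get_close_matches(name, known, n=1, cutoff=0.8)
        mismatches[name] = close[0] if close else None
    return mismatches


# Hypothetical values; the real constants live in tabularConstants.ts
# and in the backend dataset code.
backend_keys = ["IRIS", "CALIFORNIA_HOUSING", "DIGITS"]
frontend_names = ["IRIS", "CALIFORNIAHOUSING", "DIGITS"]

print(find_mismatched_names(frontend_names, backend_keys))
```

With these example lists the helper reports `CALIFORNIAHOUSING` and suggests `CALIFORNIA_HOUSING`. Running a check of this kind in CI would flag a dropped underscore before a user selects the dataset.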

@LuHG18
Collaborator

LuHG18 commented Feb 7, 2024

Another small bug: the DIGITS dataset did not work when selected because it was never loaded from scikit-learn. I loaded it and added it to dataset.py, and that seems to have fixed the problem.
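
For anyone hitting the same problem, the general shape of such a fix might look like the sketch below. It is not the actual change to dataset.py: `DEFAULT_LOADERS`, `load_digits_frame`, and `get_default_dataset` are hypothetical names, and returning a DataFrame via `as_frame=True` assumes pandas is installed.

```python
from typing import Callable, Dict


def load_digits_frame():
    """Load the scikit-learn digits dataset as a pandas DataFrame."""
    # Imported lazily so the registry below can be inspected
    # even where scikit-learn is not installed.
    from sklearn.datasets import load_digits

    return load_digits(as_frame=True).frame


# Hypothetical registry: every name selectable in the UI needs an entry here.
DEFAULT_LOADERS: Dict[str, Callable[[], object]] = {
    "DIGITS": load_digits_frame,
}


def get_default_dataset(name: str):
    """Return the dataset registered under `name`, or fail with a clear error."""
    try:
        loader = DEFAULT_LOADERS[name.upper()]
    except KeyError:
        available = sorted(DEFAULT_LOADERS)
        raise ValueError(
            f"Unknown default dataset {name!r}; available: {available}"
        ) from None
    return loader()
```

Raising an explicit error for unregistered names makes a missing loader show up as a clear message instead of a dataset that silently does nothing when selected.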

@karkir0003
Member

Hey @LuHG18, ETA on the PR?

Projects
Status: In Progress
Development

No branches or pull requests

3 participants