Welcome to Curated AI Resources! 🚀
Here you can access a curated list of helpful resources:
Datasets are essential for training AI models, and high-quality data is always in demand. Below are public datasets and dataset search engines. Several private datasets also exist, but they come at a cost.
- Google Dataset Search
- Kaggle Dataset Search
- NASA Earth Dataset (largest collection of geo-related datasets about the earth, climate and water bodies)
- AWS Opendata
- Azure Opendata
- Data.world
- UCI ML Dataset
- Datahub.io
- GitHub/awesome-public-dataset
- Govdata.de
- Destatis.de
- Data.gov
- CMU library Dataset
- University of Hamburg Dataset TUHH
- PaperswithCode/Dataset
- Computer Vision datasets:
- Visualdata.io
- xView
- ImageNet
- Google open images
- IMDB-wiki (annotated face images)
- Dog-breed dataset
- TUM
- Kinetics 700-2020 (human action clips from YouTube videos)
- Colors with RGB values
- Cityscapes (semantic segmentation)
- NLP datasets:
- QuantumStat
- QA
- Amazon reviews
- Rotten Tomatoes reviews
- Sentiment analysis: IMDB reviews, Stanford sentiment, Twitter US airlines
- Mobility datasets (self-driving car):
- Waymo
- Berkeley DeepDrive
- WPI dataset (traffic light, pedestrian, and lane detection)
- Bosch Small Traffic Lights
- Comma.ai (car’s speed, acceleration, steering angle, and GPS coordinates)
- MIT DriveSeg
- UCSD-LISA
- Geo & Satellite datasets:
ML services and open-source code help speed up ML project planning, data pipelines, and model development, enabling a quick release of an AI feature. The following resources are recommended:
- Hugging Face: Build, train, and deploy state-of-the-art models powered by the reference open source in machine learning. Examples of common models:
- Natural Language Processing (Transformers): masked word completion with BERT, Named Entity Recognition with ELECTRA, text generation with GPT-2 and GPT-J, Q&A with DistilBERT and RoBERTa, summarization with BART, and translation with T5.
- Computer Vision: Image classification with ViT, Object Detection with DETR, Semantic Segmentation with SegFormer, Panoptic Segmentation with DETR.
- Audio: Automatic Speech Recognition with Wav2Vec2, Keyword Spotting with Wav2Vec2.
- Multimodal tasks: Visual Question Answering with ViLT.
- Others: Knock Knock, a library that sends a notification when your training completes or crashes, using two additional lines of code.
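The notification idea described for Knock Knock can be illustrated with a plain Python decorator. The sketch below is not the Knock Knock API: `notify_on_finish` and the `send` callback are hypothetical names chosen for illustration, and a real setup would replace `send` with an email, chat, or similar sender.

```python
import functools
import traceback


def notify_on_finish(send):
    """Wrap a training function so `send` is called when it finishes or crashes.

    `send` is a hypothetical callback that takes one message string.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                result = func(*args, **kwargs)
            except Exception:
                # Report the crash, then re-raise so the failure is not hidden.
                send(f"{func.__name__} crashed:\n{traceback.format_exc()}")
                raise
            send(f"{func.__name__} finished with result: {result!r}")
            return result
        return wrapper
    return decorator


messages = []  # stand-in for a real notification channel


@notify_on_finish(messages.append)
def train(epochs):
    # Placeholder for a real training loop.
    if epochs <= 0:
        raise ValueError("epochs must be positive")
    return {"epochs": epochs, "loss": 0.1}


train(3)
```

Wrapping the function rather than editing the training loop is what keeps the integration down to a couple of lines: one import and one decorator.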
- Google Vertex AI: Build, deploy, and scale ML models faster, with pre-trained and custom tooling within a unified artificial intelligence platform. Google provides many AI products to speed up prototyping and production, such as AutoML and Dialogflow. Here is the complete list.
- ZenML: Extensible, open-source MLOps framework for creating production-ready machine learning pipelines. Built for data scientists, it has a simple, flexible syntax, is cloud- and tool-agnostic, and offers interfaces and abstractions tailored to ML workflows.
- Weights & Biases: A great MLOps platform to build models faster with experiment tracking, dataset versioning, and model management.
- Amazon SageMaker: Build, train, and deploy machine learning (ML) models for any use case with fully managed infrastructure, tools, and workflows.
- IBM Watson Studio: Build and scale trusted AI on any cloud. Automate the AI lifecycle for MLOps.
- Neptune.ai: Log, organize, compare, register, and share all your ML model metadata in a single place. Automate and standardize as your modelling team grows.
- Papers with Code: Categorized list of state-of-the-art machine learning research along with open-source code (where the authors have published it on GitHub).
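The experiment tracking mentioned for Weights & Biases and Neptune.ai involves, at its simplest, recording parameters and metrics per run and comparing runs afterwards. Here is a minimal in-memory sketch of that idea. The names `Run` and `best_run` are hypothetical and belong to neither product's API, and the loss values are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """One training run: its hyperparameters and logged metric history."""
    name: str
    params: dict
    metrics: dict = field(default_factory=dict)

    def log(self, key, value):
        # Append so the full history of each metric is kept, not just the last value.
        self.metrics.setdefault(key, []).append(value)


def best_run(runs, metric, lower_is_better=True):
    """Return the run whose final logged value of `metric` is best."""
    candidates = [r for r in runs if r.metrics.get(metric)]
    pick = min if lower_is_better else max
    return pick(candidates, key=lambda r: r.metrics[metric][-1])


runs = []
for lr in (0.1, 0.01):
    run = Run(name=f"lr={lr}", params={"lr": lr})
    for step in range(3):
        # Made-up loss values for illustration only.
        run.log("loss", lr * (3 - step))
    runs.append(run)

winner = best_run(runs, "loss")
print(winner.name, winner.metrics["loss"][-1])
```

Hosted trackers add persistence, dashboards, and team sharing on top of this basic record-and-compare loop.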