Industrial Strength NLP

  • Blackstone - Blackstone is a spaCy model and library for processing long-form, unstructured legal text. Blackstone is an experimental research project from the Incorporated Council of Law Reporting for England and Wales' research lab, ICLR&D.
  • CTRL - A Conditional Transformer Language Model for Controllable Generation, released by Salesforce.
  • Facebook's XLM - Original PyTorch implementation of Cross-lingual Language Model Pretraining, which includes BERT, XLM, NMT, XNLI, PKM, etc.
  • Flair - A simple framework for state-of-the-art NLP, developed by Zalando, that builds directly on PyTorch.
  • GitHub's Semantic - GitHub's text library for parsing, analyzing, and comparing source code across many languages.
  • GluonNLP - GluonNLP is a toolkit that enables easy text preprocessing, dataset loading, and neural model building to help you speed up your Natural Language Processing (NLP) research.
  • GNES - Generic Neural Elastic Search is a cloud-native semantic search system based on deep neural networks.
  • Grover - Grover is a model for neural fake news, covering both generation and detection. It can probably also be used for other generation tasks.
  • Kashgari - Kashgari is a simple and powerful NLP transfer learning framework for building a state-of-the-art model in 5 minutes for named entity recognition (NER), part-of-speech tagging (PoS), and text classification tasks.
  • OpenAI GPT-2 - OpenAI's code from their paper "Language Models are Unsupervised Multitask Learners".
  • sense2vec - A Python library from Explosion for training and using sense2vec models, which take the same approach as word2vec but also use part-of-speech attributes for each token, making the vectors "meaning-aware".
  • Snorkel - Snorkel is a system for quickly generating training data with weak supervision (https://snorkel.org).
  • spaCy - Industrial-strength natural language processing library built with Python and Cython by the Explosion team (explosion.ai).
  • Stable Baselines - A fork of OpenAI Baselines that provides implementations of reinforcement learning algorithms (http://stable-baselines.readthedocs.io/).
  • TensorFlow Lingvo - A framework for building neural networks in TensorFlow, particularly sequence models (Lingvo: A TensorFlow Framework for Sequence Modeling).
  • TensorFlow Text - TensorFlow Text provides a collection of text-related classes and ops ready to use with TensorFlow 2.0.
  • Wav2Letter++ - A speech-to-text system developed by Facebook's FAIR team.
  • YouTokenToMe - YouTokenToMe is an unsupervised text tokenizer focused on computational efficiency. It currently implements fast Byte Pair Encoding (BPE) [Sennrich et al.].
  • 🤗 Transformers - Hugging Face's library of state-of-the-art pretrained models for Natural Language Processing (NLP).
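
Several of the tools above, YouTokenToMe in particular, build on Byte Pair Encoding (BPE). As a rough illustration of the idea only, here is a minimal pure-Python sketch of the merge-learning loop described by Sennrich et al. It is not YouTokenToMe's implementation or API. The function names, the `</w>` end-of-word marker, and the tie-breaking rule are all assumptions made for this example.

```python
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs across all words, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Return a new vocab in which every occurrence of `pair` is one merged symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` BPE merges from a {word: frequency} mapping.

    Each word starts as a tuple of characters plus an end-of-word marker
    ("</w>" is an assumed convention for this sketch).
    """
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        # Pick the most frequent pair; ties are broken by the lexicographically
        # largest pair so the result is deterministic (an arbitrary choice).
        best = max(pairs, key=lambda p: (pairs[p], p))
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab


if __name__ == "__main__":
    toy_corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
    learned_merges, final_vocab = learn_bpe(toy_corpus, num_merges=3)
    print(learned_merges)
    print(final_vocab)
```

On this toy corpus the first three merges build up the shared suffix `est</w>`, which shows how BPE discovers frequent subword units. Production tokenizers use far more efficient data structures than this quadratic loop.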