DocTopic

What is DocTopic?

A topic modelling and similarity retrieval interface that helps you manage your documents. DocTopic uses Gensim, a popular Python library designed for implementing key NLP algorithms at scale.

Excellent tutorials are available on the Gensim website. The Gensim team also offers support and professional services.
An interactive introduction to similarity search can be found here.

Features

Create a searchable corpus from your multilingual documents in two clicks.
Use unsupervised training algorithms such as Latent Semantic Analysis and Latent Dirichlet Allocation for topic modelling purposes.
Query your corpus to retrieve documents that are structurally similar or belong to a similar domain.
Update your search indices with new files so that they can be retrieved later.
Use the Jupyter notebook implementation to run the app on a remote server.
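
To illustrate what querying a corpus for similar documents involves, here is a minimal sketch in plain Python. It is not DocTopic's code and does not use Gensim: it swaps in raw bag-of-words vectors and cosine similarity in place of the LSA/LDA topic vectors mentioned above. The `TinyIndex` class, the file names and the sample texts are all hypothetical and exist only for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of letters (Unicode-aware)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class TinyIndex:
    """Hypothetical in-memory index: one bag-of-words vector per document."""

    def __init__(self):
        self.vectors = {}

    def add(self, name, text):
        """Add a document, or replace it if the name already exists."""
        self.vectors[name] = Counter(tokenize(text))

    def query(self, text, top_n=3):
        """Return the top_n (name, score) pairs, most similar first."""
        q = Counter(tokenize(text))
        scored = [(name, cosine(q, vec)) for name, vec in self.vectors.items()]
        return sorted(scored, key=lambda item: item[1], reverse=True)[:top_n]


index = TinyIndex()
index.add("contract.txt", "The supplier shall deliver the goods under this contract")
index.add("manual.txt", "Press the power button to switch on the device")
index.add("patent.txt", "The device comprises a housing and a power supply unit")

# A file added later can be retrieved as soon as it is in the index.
index.add("terms.txt", "This contract governs delivery of goods by the supplier")

results = index.query("contract for the delivery of goods")
for name, score in results:
    print(f"{name}: {score:.3f}")
```

A topic model would replace the raw word counts with lower-dimensional topic vectors, so that documents can match even when they share few exact words.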

Use cases for translation service providers

Identify relevant resources from historical project data such as:

previous translations to be used as templates
translation vendors who are experts in their field
project parameters such as turn-around times, pre-processing steps, etc.

Quickly assess the similarity of files within a project to help with:

staggered/cascading deliveries
assigning files to multiple vendors

Classify documents automatically and create topic clusters to better understand:

the translation needs of your customer segments
your level of specialization and how you can use it to build your brand

Installation

DocTopic was created with Python 3.7. It requires Gensim in addition to NumPy, SciPy and PyQt5/qtpy. You will probably want to use a virtual environment manager such as conda. The Anaconda distribution comes with the latter packages (NumPy, SciPy and PyQt5/qtpy) already installed, so only Gensim remains to be installed:

pip install -U gensim
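
To confirm that the environment is complete before starting the app, a short check along these lines may help. It is only a sketch: the `missing_packages` helper is hypothetical and not part of DocTopic, and the list assumes the usual import names of the packages named above (`gensim`, `numpy`, `scipy`, `qtpy`).

```python
import importlib.util

# Import names assumed for the dependencies listed in this section.
REQUIRED = ["gensim", "numpy", "scipy", "qtpy"]


def missing_packages(names):
    """Return the subset of top-level package names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```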

Questions

If you found any of the content from this repo helpful, confusing or missing, I would like to hear from you.

SeeligA/doctopic
