Make Pre-processing options work for PreTrainedVectorizer #307

dafajon · 2021-12-30T07:38:15Z

Currently, get_pretrained_embeddings and get_bert_embeddings operate on the raw form of the document. As a result, preprocessing settings do not apply to the text that goes into the transformer-based vectorizers.

Add an ignore_preprocess option to the vectorizer so that it can still use the raw text.
Build the input string from the filtered Token objects before passing it to the SentenceTransformer.encode method.
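
A rough sketch of the two tasks above, to make the intended behaviour concrete. Everything here is an assumption for illustration: the `Token` dataclass and its `keep` flag stand in for the library's real Token objects, `build_encoder_input` and `embed` are hypothetical helper names, and `encode` is any callable that takes a list of strings (in practice this would be `SentenceTransformer.encode`).

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Token:
    """Hypothetical stand-in for the library's Token object."""
    text: str
    keep: bool = True  # assumed flag: False if preprocessing filtered this token out


def build_encoder_input(raw_text: str,
                        tokens: Sequence[Token],
                        ignore_preprocess: bool = False) -> str:
    """Return the string that should be handed to the encoder."""
    if ignore_preprocess:
        # Opt-out: behave as before and use the raw document text.
        return raw_text
    # Otherwise rebuild the input from the tokens that survived preprocessing.
    return " ".join(t.text for t in tokens if t.keep)


def embed(docs: Sequence[Tuple[str, Sequence[Token]]],
          encode: Callable[[List[str]], object],
          ignore_preprocess: bool = False):
    """Build one input string per document and pass the batch to `encode`."""
    inputs = [build_encoder_input(raw, toks, ignore_preprocess)
              for raw, toks in docs]
    return encode(inputs)
```

With a loaded model this could be called as `embed(docs, model.encode)`; passing `ignore_preprocess=True` would reproduce the current raw-text behaviour.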


dafajon assigned dafajon and ertugrul-dmr Dec 30, 2021

ertugrul-dmr linked a pull request Jan 5, 2022 that will close this issue

Pre-processing options work for PreTrainedVectorizers [resolves #307] #308

Open
