Dealing with multiple sentences #1

Open
LivC193 opened this issue Feb 27, 2020 · 2 comments
Comments


LivC193 commented Feb 27, 2020

Hi, sorry to bother you, but I have one question.

Documents have multiple sentences, so how do you deal with that? Do you split the text into sentences and then concatenate the final embeddings of each sentence, or do you remove all punctuation marks so that the text won't have any [SEP] tokens?

Owner

xuyige commented Mar 10, 2020

Thank you for your issue.
For document classification, we do not split the text into sentences (except in the Hierarchical methods).
We do not remove punctuation marks. We treat the whole document as one long sentence.
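A minimal sketch of what "treating the whole document as one long sentence" can look like at the input level. This is not the repository's preprocessing code: `toy_tokenize` is a hypothetical stand-in for a real WordPiece tokenizer, and keeping only the head of over-long documents is an assumed truncation strategy. The point is that punctuation stays in the token stream and only one `[SEP]` appears, at the very end, within BERT-base's 512-token limit.

```python
import re

CLS, SEP = "[CLS]", "[SEP]"

def toy_tokenize(text):
    # Hypothetical stand-in for a real WordPiece tokenizer: splits words and
    # keeps each punctuation mark as its own token (punctuation is NOT removed).
    return re.findall(r"\w+|[^\w\s]", text)

def encode_document(text, max_len=512):
    """Encode a whole document as ONE sequence: [CLS] tokens... [SEP].

    Sentence boundaries are not marked with extra [SEP] tokens. Documents
    longer than max_len are truncated, keeping the head (an assumed strategy).
    """
    tokens = toy_tokenize(text)
    budget = max_len - 2  # reserve room for [CLS] and the final [SEP]
    return [CLS] + tokens[:budget] + [SEP]

doc = "The plot was thin. The acting, however, was superb!"
print(encode_document(doc))
```

The two sentences end up in a single sequence, with the `.` between them kept as an ordinary token rather than replaced by a separator.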


AnastasiaMaugham commented Dec 1, 2020

Hi, could you tell me how to handle documents with different numbers of sentences in the hierarchical methods (i.e., variable-length inputs)?
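One common way to batch documents with different numbers of sentences is to pad every document to the longest one in the batch and carry a mask, so the document-level layer ignores the padded slots. The sketch below is a generic illustration, not the repository's hierarchical implementation: it assumes each sentence has already been encoded into a fixed-size vector (e.g. its `[CLS]` embedding), and it uses masked mean pooling as a stand-in for whatever sentence-to-document aggregator the model actually uses.

```python
from typing import List

Vector = List[float]

def pad_documents(docs: List[List[Vector]], dim: int):
    """Pad a batch of documents to the same number of sentences.

    Each document is a list of sentence vectors. Returns the padded documents
    and a mask where 1 marks a real sentence and 0 marks padding.
    """
    max_sents = max(len(d) for d in docs)
    padded, mask = [], []
    for d in docs:
        n_pad = max_sents - len(d)
        padded.append(d + [[0.0] * dim for _ in range(n_pad)])
        mask.append([1] * len(d) + [0] * n_pad)
    return padded, mask

def masked_mean(doc: List[Vector], doc_mask: List[int], dim: int) -> Vector:
    """Average only the real sentence vectors, ignoring padded slots."""
    total = [0.0] * dim
    for vec, m in zip(doc, doc_mask):
        if m:
            total = [t + v for t, v in zip(total, vec)]
    count = max(sum(doc_mask), 1)  # guard against a document with no sentences
    return [t / count for t in total]

batch = [
    [[1.0, 2.0], [3.0, 4.0]],              # document with 2 sentences
    [[5.0, 6.0]],                          # document with 1 sentence
    [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]],  # document with 3 sentences
]
padded, mask = pad_documents(batch, dim=2)
doc_vecs = [masked_mean(d, m, dim=2) for d, m in zip(padded, mask)]
print(mask)      # [[1, 1, 0], [1, 0, 0], [1, 1, 1]]
print(doc_vecs)  # [[2.0, 3.0], [5.0, 6.0], [2.0, 3.0]]
```

If the document-level layer is an LSTM rather than a pooling step, the same per-document sentence counts can be passed as lengths, for example to PyTorch's `torch.nn.utils.rnn.pack_padded_sequence`, so the recurrence skips the padded positions.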
