Bad segmentation in Arabic project #27
Comments
Hi @uhallac! Your document contains no punctuation, so the segmenter has no hints about sentence structure. Moreover, all the text is in one single paragraph, so morphologically it is correct not to split it into more segments. Can you please explain in more detail the segmentation you were expecting? |
Hi @giusilvano, where are the tags coming from? I don't see any special characters between the words, only spaces. |
I checked the source file of your project, and each word seems to carry an ID related to past revision-check work. The filters produce tags to preserve these IDs in the target file. We have to discuss internally whether this is useful or not. Can you confirm that Word's revisions feature was used on this text? |
The file was created from a single paragraph of a larger client document that has the same issue. I'm not sure whether the revisions feature was used on it; I couldn't detect any revisions in the LibreOffice editor. As far as I know, Matecat doesn't let such documents be analyzed at all, am I wrong? This restriction, by the way, is a huge obstacle when using the Matecat API to create projects automatically. In my opinion, the latest revision of a document should be used in such cases. Thank you. |
You are right, MateCat does not accept files with revisions. Our reasoning is that a file with revisions contains many comments and suggestions that a human must accept or reject to bring the document into a consistent state. Moreover, implementing automatic acceptance of revisions is really hard! In any case, this issue requires a fix in the underlying framework, Okapi. I will report the problem to them, but I can't estimate how long it will take to be addressed. |
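For anyone creating projects automatically, a local pre-check of the .docx before upload may help spot this kind of markup. The following is only a minimal sketch, not part of Matecat or Okapi: the function name `revision_markup_summary` is hypothetical, and it assumes that the per-word IDs mentioned above correspond to the `w:rsid*` attributes on runs in `word/document.xml` and that tracked changes appear as `w:ins` / `w:del` elements, which may not match exactly what the filters react to.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used in word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def revision_markup_summary(docx):
    """Count tracked-change elements and rsid-tagged runs in a .docx.

    `docx` is a file path or a binary file-like object.
    Hypothetical helper for illustration only.
    """
    with zipfile.ZipFile(docx) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    def count(tag):
        # Number of elements with the given local name in the w: namespace
        return sum(1 for _ in root.iter(f"{{{W_NS}}}{tag}"))

    runs = list(root.iter(f"{{{W_NS}}}r"))
    # Runs carrying any w:rsid* attribute (assumed to be the per-word IDs)
    rsid_runs = sum(
        1
        for run in runs
        if any(name.startswith(f"{{{W_NS}}}rsid") for name in run.attrib)
    )
    return {
        "insertions": count("ins"),
        "deletions": count("del"),
        "runs": len(runs),
        "runs_with_rsid": rsid_runs,
    }
```

If `insertions` or `deletions` is non-zero, the file probably still contains unresolved tracked changes. A high share of `runs_with_rsid` relative to `runs` would suggest the text is fragmented word by word, which could explain the extra tags.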
Thank you for the information. |
Can you please check the following Arabic project?
https://www.matecat.com/translate/33409docx/ar-SA-en-GB/1116612-3ffd0e90c8f0
The segmentation seems to have failed. Do you think this is a Matecat-Filters issue?
Thank you.