Fix extraction of Message-IDs and fix function that sets the Thread-ID wrongly #1

Open. bockthom wants to merge 6 commits into base: master

bockthom commented May 7, 2021

In this PR, we fix two bugs that caused wrong data when identifying mail threads:

  1. In some cases, the header field for message ids is not spelled as expected. Such mails just got a numeric message id, which means that they could not be mapped to any references/reply-to header in other messages and were, thus, ignored in the thread computation. For instance, in qemu we identified roughly 28,500 mails for which the message id had not been extracted, which is a substantial number of e-mails. Including two further spelling variants of Message-ID, namely Message-Id and Message-id, decreased the number of such e-mails from 28,500 to 10. We provide a fix that covers these additional spelling variants if no header field spelled Message-ID is available.
  2. Even with the "correct" Message-IDs extracted, the thread computation contained two additional bugs:
  • When there are multiple "In-Reply-To" fields, we iterate over them until we find a referenced message id in our hashtable. However, we need to break as soon as we have found one instead of calling next, which could result in a reference that is not available.
  • If there are multiple "In-Reply-To" fields, unrelated mails can end up with the same thread id (e.g., 0), because the thread id of the wrong e-mail is updated: the inner loop uses the same loop variable, i, as the outer loop. That is, when returning from the inner loop, the outer loop accesses i again but gets the last value of the inner loop instead of its own current value and, thus, updates the wrong array element. We renamed the inner loop variable to j to avoid this problem.

[As Codeface installs tm-plugin-mail from your repository, this PR also acts as a direct bugfix to Codeface.]

Usually, there has to be a "Message-ID" header in every e-mail.
Unfortunately, sometimes, this field is also spelled "Message-Id" instead.
So, adjust the readMail function such that it greps for the spelling variant "Message-Id" if no "Message-ID" was found.

Signed-off-by: Thomas Bock <[email protected]>
Usually, there has to be a "Message-ID" header in every e-mail.
Unfortunately, sometimes, this field is also spelled "Message-Id" instead, and sometimes even "Message-id".
So, adjust the readMail function such that it greps for the spelling variant "Message-id" if neither "Message-ID" nor "Message-Id" was found.

Signed-off-by: Thomas Bock <[email protected]>
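
The fallback order described in the two commits above can be sketched as follows. This is a minimal illustration in Python rather than the plugin's R code; `extract_message_id` and `MESSAGE_ID_KEYS` are hypothetical names, not the actual readMail implementation.

```python
# Hypothetical helper, not the plugin's readMail function: try each known
# spelling of the header key in order and return the first value found.
MESSAGE_ID_KEYS = ("Message-ID:", "Message-Id:", "Message-id:")


def extract_message_id(header_lines):
    """Return the message id from the first matching spelling, or None."""
    for key in MESSAGE_ID_KEYS:          # preferred spelling first
        for line in header_lines:
            if line.startswith(key):     # case-sensitive match on purpose
                return line[len(key):].strip()
    return None                          # no known spelling present


header = [
    "From: alice@example.org",
    "Message-id: <abc123@example.org>",
    "Subject: [PATCH] fix",
]
message_id = extract_message_id(header)
```

Trying the variants one after another mirrors the commits, which only fall back to a further spelling when the previous ones were not found.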
When iterating over multiple parentIDs, just break as soon as a thread exists for one of the parentIDs.
To achieve this, a `break` is needed instead of `next` within the `for` loop.

Signed-off-by: Thomas Bock <[email protected]>
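
To illustrate why skipping ahead is not enough, here is a small Python sketch (Python's `continue` plays the role of R's `next`). The function names and the dictionary `thread_of` are assumptions made for the example, and the way the lookup is overwritten is only one plausible way the failure can occur, not a copy of the plugin's code.

```python
def find_thread_with_continue(parent_ids, thread_of):
    """Buggy variant: keeps iterating after a hit."""
    thread_id = None
    for parent in parent_ids:
        thread_id = thread_of.get(parent)
        if thread_id is not None:
            continue  # a later, unknown parent overwrites the hit with None
    return thread_id


def find_thread_with_break(parent_ids, thread_of):
    """Fixed variant: stops at the first parent that has a thread."""
    thread_id = None
    for parent in parent_ids:
        thread_id = thread_of.get(parent)
        if thread_id is not None:
            break
    return thread_id


thread_of = {"<a@example.org>": 7}                  # known message id -> thread id
parents = ["<a@example.org>", "<unknown@example.org>"]

lost = find_thread_with_continue(parents, thread_of)   # None
kept = find_thread_with_break(parents, thread_of)      # 7
```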
The inner loop in the threads function used the same loop variable ("i") as the outer loop. As a consequence, at the end of the inner loop, i contains the last value of the inner loop iteration instead of the current outer loop i, and thus the thread id of a completely wrong message got updated. (Fortunately, this did not happen very often, as the inner loop is executed only very rarely.)
To fix this problem, rename the loop variable of the inner loop to "j" instead.

Signed-off-by: Thomas Bock <[email protected]>
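
The effect of the shared loop variable can be reproduced in a few lines. The sketch below is in Python, where an inner `for i` clobbers the outer `i` in the same way; the function names and the simplified data layout (one list of parent ids per mail, `None` meaning that no parent thread was found) are assumptions for the example.

```python
def assign_threads_shared_i(parent_lists, known):
    """Buggy variant: the inner loop reuses the outer loop variable i."""
    thread_ids = [None] * len(parent_lists)
    for i in range(len(parent_lists)):
        parents = parent_lists[i]
        found = None
        for i in range(len(parents)):      # overwrites the outer i
            if parents[i] in known:
                found = known[parents[i]]
                break
        thread_ids[i] = found              # i may now be an index into parents
    return thread_ids


def assign_threads_separate_j(parent_lists, known):
    """Fixed variant: the inner loop uses its own variable j."""
    thread_ids = [None] * len(parent_lists)
    for i in range(len(parent_lists)):
        parents = parent_lists[i]
        found = None
        for j in range(len(parents)):
            if parents[j] in known:
                found = known[parents[j]]
                break
        thread_ids[i] = found
    return thread_ids


known = {"<p@example.org>": 5}
parent_lists = [[], [], ["<p@example.org>"]]   # only the third mail has a parent

wrong = assign_threads_shared_i(parent_lists, known)    # [5, None, None]
right = assign_threads_separate_j(parent_lists, known)  # [None, None, 5]
```

In the buggy variant the thread id found for the third mail is written to the first mail's slot, which matches the description of a completely unrelated message being updated.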
Sometimes the Message-ID is not in the same line as the key "Message-ID:" but in the subsequent line.
This has not been considered yet and leads to an error as the Message-ID is set to an empty string.
To fix this, check if the Message-ID is an empty string, and if so, extract the subsequent line from the header.
As the subsequent line is indented in most cases, remove the white space in front of the actual Message-ID.

Signed-off-by: Thomas Bock <[email protected]>
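
The handling of a Message-ID that sits on the line after its key could look roughly like this. Again, this is a Python sketch under assumptions, not the plugin's R code: `extract_folded_message_id` is a hypothetical name, and only the "Message-ID:" spelling is handled to keep the example short.

```python
def extract_folded_message_id(header_lines, key="Message-ID:"):
    """Return the message id, looking at the next line if the value is empty."""
    for index, line in enumerate(header_lines):
        if line.startswith(key):
            value = line[len(key):].strip()
            if value == "" and index + 1 < len(header_lines):
                # The id is on the subsequent, usually indented, line:
                # drop the indentation (and any trailing white space).
                value = header_lines[index + 1].strip()
            return value
    return None


folded_header = [
    "From: alice@example.org",
    "Message-ID:",
    "  <a-very-long-id@example.org>",
    "Subject: [PATCH] fix",
]
folded_id = extract_folded_message_id(folded_header)
```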