The cluster generator uses fields of the msg assuming that they will be strings.
However, that is not the case if non-ASCII characters have been used.
In such cases, code such as msg.get('subject') will return an email.header.Header object.
This causes code such as bytes(subject, encoding = 'ascii') to fail with
TypeError: encoding without a string argument
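A minimal sketch that should reproduce the failure outside the archiver. It assumes the message is parsed with Python's default (compat32) policy and that the header contains raw, unencoded 8-bit bytes; the sample message is made up for illustration.

```python
import email
from email.header import Header

# Made-up message whose Subject holds raw UTF-8 bytes (not RFC 2047 encoded).
raw = b"Subject: Gr\xc3\xbc\xc3\x9fe\n\nbody\n"
msg = email.message_from_bytes(raw)

subject = msg.get('subject')
# With the default policy, a header containing non-ASCII bytes comes back
# as an email.header.Header object rather than a str.
is_header = isinstance(subject, Header)

try:
    bytes(subject, encoding='ascii')
    error = None
except TypeError as exc:
    # bytes() only accepts an encoding when its first argument is a str.
    error = str(exc)

print(is_header, error)
```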
In turn, this causes the archiver to revert to a very basic fallback mid:
mid = hashlib.sha224(str("%s-%s" % (lid, msg_metadata['archived-at'])).encode('utf-8')).hexdigest() + "@" + (lid if lid else "none")
Unless archived-at is defined, this will be constant for a given list id.
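To illustrate the collision, here is a small sketch that wraps the fallback expression in a helper. The helper name fallback_mid, the list id, and the assumption that msg_metadata holds 'archived-at': None when the header is absent are mine, not taken from the archiver.

```python
import hashlib

def fallback_mid(lid, msg_metadata):
    # Same expression as the fallback mid quoted above.
    return hashlib.sha224(str("%s-%s" % (lid, msg_metadata['archived-at'])).encode('utf-8')).hexdigest() + "@" + (lid if lid else "none")

lid = "<dev.example.org>"  # made-up list id

# Two different messages, neither with an archived-at value (assumed None).
first = fallback_mid(lid, {'archived-at': None, 'subject': 'first message'})
second = fallback_mid(lid, {'archived-at': None, 'subject': 'a different message'})

# Only lid and archived-at feed the hash, so both messages get the same mid.
print(first == second)
```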
This is relatively easy to fix; the generator should use the msg_metadata dict, which the archiver has already set up.
HOWEVER, to ensure that it's possible to regenerate the same Permalinks, any fix MUST be implemented as a new generator type, with a new syntax (i.e. change the 'r' prefix).
There are probably some other changes that need to be made to the cluster generator.
For example, Message-Id should be canonicalised.
Note that the fallback mid cannot be changed, as that would affect all the generators.
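As a rough sketch only, a new generator type could read its inputs from msg_metadata and carry its own prefix, leaving the existing generator and the fallback untouched. Everything here is hypothetical: the function name, the choice of fields, the hashing layout, and the placeholder prefix 'x' are assumptions for illustration, not the project's actual generator.

```python
import hashlib

NEW_PREFIX = 'x'  # placeholder; the real prefix would need to be agreed on

def hypothetical_generate(msg_metadata, body, lid):
    """Hypothetical generator that hashes string fields from msg_metadata."""
    digest = hashlib.sha224()
    digest.update((lid or '').encode('utf-8'))
    # Assumed field names; absent or None values are treated as ''.
    for key in ('message-id', 'subject', 'date'):
        value = msg_metadata.get(key) or ''
        digest.update(b'\n')
        digest.update(str(value).encode('utf-8'))
    digest.update(b'\n')
    digest.update(body)
    return NEW_PREFIX + digest.hexdigest() + '@' + (lid if lid else 'none')

lid = '<dev.example.org>'  # made-up list id
mid_a = hypothetical_generate({'subject': 'Grüße'}, b'body', lid)
mid_b = hypothetical_generate({'subject': 'Hello'}, b'body', lid)
print(mid_a != mid_b)
```

Because the fields are already plain strings in msg_metadata, non-ASCII subjects are hashed like any other value, and ids produced under the old prefix stay reproducible with the old generator.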
Let's bump the 'r' prefix then. I'd suggest moving to next char, 'v' :)
So, am I understanding this right: the generators should be passed msg_metadata instead of msg in the first variable slot, and that would address the main issue? If so, we'd probably be best off accessing the fields with msg.get('subject', ''), as they may not have been present in the email, and this would avoid errors when a field is missing.
What do you mean by message-id being canonicalized?