Adding list of actual "registered" domains in result data #75

Open, wants to merge 1 commit into base: master

TytoCapensis

Currently, when eml_parser analyzes an email, the domains found in the body of the email are returned in the output, like this:

"domain": [
	"b2b.parallels.com",
	"click.parallels.com",
	"coronavirus.data.gov.uk"
],

However, the full domains are listed, subdomains included. When the domain list contains many subdomains, identifying the entities and actors behind them becomes harder.

This commit uses publicsuffixlist (already used in eml_parser) to add a list named domain_registered to the data returned by an eml_parser analysis.

The domains in domain_registered are the actual registered domains, i.e. the public suffix plus the one label directly to its left. Thanks to publicsuffixlist, multi-label public suffixes like co.uk or co.jp are handled correctly, so coronavirus.data.gov.uk reduces to data.gov.uk rather than gov.uk.

Now, the output looks like this:

"domain": [
	"b2b.parallels.com",
	"click.parallels.com",
	"coronavirus.data.gov.uk"
],
"domain_registered": [
	"parallels.com",
	"data.gov.uk"
],
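
The mapping from domain to domain_registered can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `DEMO_SUFFIXES`, `registered_domain` and `registered_domains` are hypothetical names, and the tiny hard-coded suffix set only stands in for the real Public Suffix List (it also ignores the wildcard and exception rules that publicsuffixlist handles). In eml_parser itself the lookup would presumably go through publicsuffixlist, for example its `privatesuffix()` method.

```python
# Stand-in suffix set for illustration only (assumption);
# the real Public Suffix List is far larger.
DEMO_SUFFIXES = {"com", "uk", "co.uk", "gov.uk", "jp", "co.jp"}


def registered_domain(domain, suffixes=DEMO_SUFFIXES):
    """Return the longest matching public suffix plus one label, or None."""
    labels = domain.lower().rstrip(".").split(".")
    # Scan candidate suffixes from the longest to the shortest.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in suffixes:
            # i == 0 means the whole name is itself a public suffix.
            return ".".join(labels[i - 1:]) if i > 0 else None
    return None


def registered_domains(domains):
    """Deduplicated, sorted registered domains for a list of hostnames."""
    found = {registered_domain(d) for d in domains}
    found.discard(None)
    return sorted(found)


domains = ["b2b.parallels.com", "click.parallels.com", "coronavirus.data.gov.uk"]
print(registered_domains(domains))
```

Both parallels.com subdomains collapse into a single entry, which is the deduplication visible in the sample output. The sketch sorts the result for determinism; the ordering used by the commit may differ.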

Do not hesitate to suggest improvements, especially regarding the name of the new list.

TytoCapensis changed the title from 'Added domain_registered table in result data' to 'Adding list of actual "registered" domains in result data' on Aug 10, 2022