Integration of bot data #37

Closed
bockthom opened this issue Jun 8, 2021 · 0 comments · Fixed by #38
bockthom commented Jun 8, 2021

As we aim to integrate bot data so that we can filter out bots, @hechtlC and I discussed how to achieve this. We agreed on the following adaptation of our toolchain:

From an external tool, we get a prediction of which GitHub users should be treated as bots in a given project. The output of this tool contains one row per user, where the first column is the GitHub username and the last column is the predicted user type (Bot, Human, or Unknown).
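A minimal sketch of reading such output, relying only on what is stated above (first column is the username, last column is the user type). The comma delimiter, the intermediate column in the sample, and the function name `read_bot_predictions` are assumptions for illustration; the external tool's actual file format may differ.

```python
import csv
import io

# The three user types named in the issue text.
VALID_TYPES = {"Bot", "Human", "Unknown"}


def read_bot_predictions(lines, delimiter=","):
    """Return a dict mapping username -> predicted user type.

    Only the first and the last column of each row are used; any columns
    in between are ignored. The delimiter is an assumption.
    """
    predictions = {}
    for row in csv.reader(lines, delimiter=delimiter):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        username, user_type = row[0].strip(), row[-1].strip()
        if user_type not in VALID_TYPES:
            continue  # assumption: rows such as a header line are skipped
        predictions[username] = user_type
    return predictions


# Hypothetical sample data; the middle column stands in for whatever
# additional columns the tool may emit.
sample = io.StringIO("dependabot[bot],0.98,Bot\nalice,0.03,Human\n")
print(read_bot_predictions(sample))
```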

As we only have a username, but neither a name nor an e-mail address, we cannot simply pass the username to the id service. However, as all these users should already appear in the GitHub issue data, we just need a mapping from username to name and e-mail address, which we can obtain from the GitHub issue processing. So, the bot processing will look as follows:

(0) Within GitHub issue processing: Dump the final user table as a CSV file, so that usernames can be mapped to names and e-mail addresses.

Bot processing:
(1) Read bot data (created from external tool).
(2) Read user table (dumped from GitHub issue processing).
(3) Get name and e-mail from user table for each username appearing in the bot data.
(4) Dump final bots.list file, consisting of three columns: "name", "e-mail", "user type"

(5) Within author post-processing: Also update the names and e-mail addresses in the bot data during post-processing.
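Steps (2) to (4) could be sketched roughly as follows. This is not the project's implementation: the function names, the user table layout of (username, name, e-mail), the semicolon delimiter, and the choice to skip usernames that are missing from the user table are all assumptions made for illustration.

```python
import csv
import io


def build_bots_list(predictions, user_table_rows):
    """Join bot predictions with the dumped user table.

    predictions: dict mapping username -> user type (result of step 1).
    user_table_rows: iterable of (username, name, email) tuples
        (assumed layout of the dumped user table, step 2).
    Returns a list of (name, email, user_type) rows (step 3).
    """
    users = {username: (name, email) for username, name, email in user_table_rows}
    rows = []
    for username, user_type in predictions.items():
        if username not in users:
            continue  # assumption: users without a mapping are skipped
        name, email = users[username]
        rows.append((name, email, user_type))
    return rows


def write_bots_list(rows, out, delimiter=";"):
    """Write the three columns name, e-mail, user type (step 4)."""
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)


# Hypothetical sample data.
predictions = {"dependabot[bot]": "Bot", "alice": "Human", "ghost": "Unknown"}
user_table = [
    ("dependabot[bot]", "dependabot[bot]", "dependabot@example.org"),
    ("alice", "Alice Example", "alice@example.org"),
]
out = io.StringIO()
write_bots_list(build_bots_list(predictions, user_table), out)
print(out.getvalue())
```

Whether unmapped usernames should be skipped, reported, or written with empty fields is a design decision that the steps above leave open.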

This implementation will be the baseline for se-sic/coronet#202.


As we dump the user table after the GitHub issue processing, this might also be useful when dealing with issue #24 at some point.
