As we aim to integrate bot data in order to filter bots, @hechtlC and I discussed how we can achieve this. We agreed on the following adaptation of our toolchain:
From an external tool, we get a prediction of which GitHub users should be treated as bots in a certain project. The output of this tool contains one row per user, where the first column is the GitHub username and the last column is the predicted user type (Bot, Human, Unknown).
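To make the expected input concrete, here is a minimal sketch of reading such output. It relies only on the two facts stated above (first column is the username, last column is the user type). The comma delimiter, the middle column in the sample, and the name `parse_predictions` are assumptions made for illustration, not properties of the actual tool.

```python
import csv
import io

# The three user types named in the tool description.
VALID_TYPES = {"Bot", "Human", "Unknown"}


def parse_predictions(text, delimiter=","):
    """Return {username: user_type} from the prediction rows.

    Only the first and the last column are used; any columns in
    between are ignored. The delimiter is an assumption.
    """
    predictions = {}
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        username, user_type = row[0].strip(), row[-1].strip()
        if user_type not in VALID_TYPES:
            continue  # skip a header line or malformed rows
        predictions[username] = user_type
    return predictions


# Hypothetical sample; the middle column only stands in for "other columns".
sample = "ci-bot,0.98,Bot\nalice,0.02,Human\ncarol,0.50,Unknown\n"
predictions = parse_predictions(sample)
```

Skipping rows whose last column is not one of the three known types is a design choice of this sketch; it lets a header line pass through without special handling.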
As we only have a username, but neither a name nor an e-mail address, we cannot simply pass the username to the id service. However, as all users should already appear in the GitHub issue data, we only need a mapping from username to name and e-mail address, which we can get from the GitHub issue processing. So, the bot processing will look as follows:
(0) Within GitHub issue processing: Dump the final user table as a CSV file, to be able to map usernames to names and e-mail addresses.
Bot processing:
(1) Read the bot data (created by the external tool).
(2) Read the user table (dumped from the GitHub issue processing).
(3) Get the name and e-mail address from the user table for each username appearing in the bot data.
(4) Dump the final bots.list file, consisting of three columns: "name", "e-mail", "user type".
(5) Within author post-processing: Also update the name and e-mail address of the bot data during post-processing.
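Steps (1) to (4) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the planned implementation: the function names, the layout of the user table rows as (username, name, e-mail), the ";" delimiter of bots.list, and the decision to skip usernames missing from the user table are all hypothetical.

```python
import csv
import io


def build_bots_list(predictions, user_table_rows):
    """Map each predicted username to (name, email, user_type).

    predictions: {username: user_type}, as read from the external tool.
    user_table_rows: iterable of (username, name, email) tuples, as
    dumped by the GitHub issue processing (layout is an assumption).
    """
    users = {username: (name, email) for username, name, email in user_table_rows}
    bots = []
    for username, user_type in predictions.items():
        if username not in users:
            continue  # assumption: users unknown to the issue data are skipped
        name, email = users[username]
        bots.append((name, email, user_type))
    return bots


def write_bots_list(bots, out_file, delimiter=";"):
    """Write the three columns "name", "e-mail", "user type" to out_file."""
    csv.writer(out_file, delimiter=delimiter).writerows(bots)


# Hypothetical example data.
predictions = {"alice": "Human", "ci-bot": "Bot", "ghost": "Unknown"}
user_table = [
    ("alice", "Alice A", "alice@example.org"),
    ("ci-bot", "CI Bot", "ci@example.org"),
]
bots = build_bots_list(predictions, user_table)

output = io.StringIO()  # stands in for the bots.list file
write_bots_list(bots, output)
```

Step (5) is not covered here, since it depends on how the author post-processing rewrites names and e-mail addresses.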
This implementation will be the baseline for se-sic/coronet#202.
As we dump the user table after the GitHub issue processing, this might also be useful when dealing with issue #24 (at some point).