Issues reproducing for Bios #26
Hi Antoine, Thanks for reaching out! Regarding the Bios dataset, the augmented Bios dataset with economy labels was recently released, and I will revise the preprocessing script to add it soon. For the Bios experiments, I noticed that the batch size is set to 16 ("batch_size": 16), which might be too small given the default learning rate ("lr": 0.003). Could you please test with larger batch sizes or smaller learning rates? Hopefully this will help. Otherwise, feel free to share your code; I am more than happy to help! Best,
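The batch-size/learning-rate advice above can be read as keeping their ratio roughly constant. A minimal sketch of that linear scaling rule (the reference batch size of 128 is an assumption for illustration, not a value stated in the thread):

```python
def scale_lr(base_lr, base_batch, new_batch):
    # Linear scaling rule: keep lr / batch_size roughly constant so the
    # gradient-noise level stays comparable across batch sizes.
    return base_lr * new_batch / base_batch

# If lr = 0.003 were tuned for a batch of 128 (hypothetical), a batch of 16
# would suggest a proportionally smaller learning rate:
small_batch_lr = scale_lr(0.003, 128, 16)  # 0.000375
```

Conversely, keeping lr = 0.003 and raising the batch size, as suggested, moves the ratio in the same direction.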
I reduced the lr and increased the batch size, and it seems to work much better, thank you very much. Many thanks for this work and for your kind answer, Antoine
Dear Xudong, Thanks again for your kind and prompt answer :-) . We still cannot reach the results you report in your articles (we get close, but not exactly there). For example, for the CE baseline, we reach 79.05 as the maximum accuracy on the BiasInBios dataset. Could you possibly share the parameters you used for BiasInBios and Moji (the optimal ones leading to the results in your papers)? Similarly for the other methods? Best regards, Antoine and Thibaud (@LetenoThibaud)
Hi Antoine and Thibaud, Once again, thanks for reaching out! Please be aware that we used fixed encoder models (e.g. BERT) in our previous experiments, and only trained an MLP to make predictions. In our recent experiments, we tried to fine-tune the whole model and further improve the results. To fine-tune the whole BERT model, could you please:
In terms of the hyperparameters of each debiasing method, we used the same batch size and learning rate as the vanilla method, and only searched for the best trade-off hyperparameters of each debiasing method. The corresponding results can be downloaded. I have attached a Jupyter notebook to demonstrate the process, which can be run in Google Colab. Please have a look and feel free to message me for any further information. Best,
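The selection procedure described above (vanilla batch size and learning rate fixed, only the fairness trade-off searched) can be sketched as picking the candidate closest to the utopia point (1, 1) in the accuracy-fairness plane. The helper below is a generic illustration of that idea, not fairlib's actual API; the candidate values are made up:

```python
import math

def distance_to_optimum(accuracy, fairness):
    # Euclidean distance from (accuracy, fairness) to the utopia point (1, 1);
    # smaller is better.
    return math.hypot(1.0 - accuracy, 1.0 - fairness)

def select_tradeoff(results):
    # results: {tradeoff_value: (accuracy, fairness)}; every candidate is
    # assumed to have been trained with the vanilla batch size and lr.
    return min(results, key=lambda k: distance_to_optimum(*results[k]))

candidates = {0.1: (0.82, 0.88), 1.0: (0.80, 0.95), 10.0: (0.74, 0.97)}
best = select_tradeoff(candidates)  # 1.0
```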
Hi Xudong, Thank you for your quick answer. Based on your code and using the data downloaded from the notebook you sent, we managed to reproduce your vanilla results. This will be very helpful for our work, thanks again. Best regards, Thibaud
Dear Xudong,
First, a great thanks for your work, this is of high value for people working in fair classification. Kudos !
Second, I have some issue reproducing the results for the Bios dataset.
I used your code to download and preprocess the data: datasets.prepare_dataset("bios", "data/bios")
After that, in src/dataloaders/loaders/Bios.py, I had to comment:
if self.args.protected_task in ["economy", "both"] and self.args.full_label:
#if self.args.protected_task in ["gender", "economy", "both", "intersection"] and self.args.full_label:
Otherwise it couldn't build the dataloader (because the data built with prepare_dataset does not contain economy_label).
Finally, I ran this code:
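A less invasive workaround than commenting the check out could be to gate on whether the economy labels actually exist in the preprocessed data. This is a sketch with a hypothetical helper, not fairlib code; the argument names mirror the options in the snippet above:

```python
def use_full_labels(protected_task, full_label, data_has_economy):
    # Take the full-label branch only when the task needs economy labels
    # AND the preprocessed data actually contains them; otherwise fall
    # back to the gender-only path so the dataloader can still be built.
    needs_economy = protected_task in ("economy", "both", "intersection")
    if needs_economy and not data_has_economy:
        return False
    return full_label
```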
##############
args = {
    "dataset": "Bios_gender",
    "emb_size": 768,
    "num_classes": 28,
    "batch_size": 16,
    "data_dir": "data/bios",
    "device_id": 0,
    "exp_id": "fcl",
}
debias_options = fairlib.BaseOptions()
debias_state = debias_options.get_state(args=args, silence=True)
fairlib.utils.seed_everything(2022)
debias_model = fairlib.networks.get_main_model(debias_state)
debias_model.train_self()
##############
Everything runs well, except that the model gets random results and the loss does not improve over the epochs. Do you have a clue about what is happening?
For Moji, it works perfectly.
Best regards, and thank you again for your work,
Antoine