This dataset includes a list of accidentally duplicate pull requests collected from GitHub, which can be seen in
We would appreciate it if you could open an issue or pull request to
- add new duplicates you have found
- point out the errors in the dataset
Attention: please do not submit duplicate issues or pull requests :)
title={A dataset of duplicate pull-requests in github},
author={Yu, Yue and Li, Zhixing and Yin, Gang and Wang, Tao and Wang, Huaimin},
booktitle={Proceedings of the 15th International Conference on Mining Software Repositories},
Li, Z., Yu, Y., Zhou, M., Wang, T., Yin, G., Lan, L., & Wang, H. (2020). Redundancy, Context, and Preference: An Empirical Study of Duplicate Pull Requests in OSS Projects. IEEE Transactions on Software Engineering (TSE).
Wang, Q., Xu, B., Xia, X., Wang, T., & Li, S. (2019, October). Duplicate Pull Request Detection: When Time Matters. In Proceedings of the 11th Asia-Pacific Symposium on Internetware (pp. 1-10).
Zhou, S., Vasilescu, B., & Kästner, C. (2019, August). What the fork: a study of inefficient and efficient forking practices in social coding. In Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE) (pp. 350-361).
Ren, L., Zhou, S., Kästner, C., & Wąsowski, A. (2019, February). Identifying redundancies in fork-based development. In Proceedings 2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER) (pp. 230-241). IEEE.
Li, Z., Yu, Y., Wang, T., Yin, G., Mao, X., & Wang, H. (2019). Detecting Duplicate Contributions in Pull-based Model Combining Textual and Change Similarities. Journal of Computer Science and Technology.
Li, Z., Yin, G., Yu, Y., Wang, T., & Wang, H. (2017, September). Detecting duplicate pull-requests in GitHub. In Proceedings of the 9th Asia-Pacific Symposium on Internetware (pp. 1-6).