The first attempt to apply large multimodal models in paleography and archaeology
2School of Humanities, Shanghai Jiao Tong University
*Corresponding authors
Chinese version: Zhihu
Overview of the OBI-Bench: OBI-Bench presents five in-process tasks: 1) recognition: locating dense oracle bone characters on original oracle bones or rubbings; 2) rejoining: reassembling fragments into coherent texts; 3) classification: categorizing individual characters into their respective meanings; 4) retrieval: returning relevant results for a given query OBI image; 5) deciphering: interpreting the OBI for historical and cultural investigation.
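To make the retrieval task concrete, here is a minimal sketch that ranks gallery images against a query by cosine similarity and scores the ranking with Recall@k. The toy feature vectors, the labels, and the function names (`cosine`, `rank_gallery`, `recall_at_k`) are hypothetical illustrations rather than part of the OBI-Bench code, and the benchmark's actual retrieval protocol and metrics may differ.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_gallery(query, gallery):
    """Return gallery indices sorted from most to least similar to the query."""
    return sorted(range(len(gallery)),
                  key=lambda i: cosine(query, gallery[i]),
                  reverse=True)

def recall_at_k(ranked, gallery_labels, query_label, k):
    """1.0 if any of the top-k ranked items shares the query's label, else 0.0."""
    return float(any(gallery_labels[i] == query_label for i in ranked[:k]))

# Made-up 2-D features standing in for image embeddings (assumption);
# a real setup would take them from an image encoder or an LMM.
gallery = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
gallery_labels = ["sun", "moon", "sun"]  # hypothetical character labels
query, query_label = [1.0, 0.05], "sun"

ranked = rank_gallery(query, gallery)
print(ranked, recall_at_k(ranked, gallery_labels, query_label, k=1))
```

Averaging `recall_at_k` over all queries gives a single score per model, which is one common way to compare retrieval quality.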
- [2024/12/2] 🔥 The GitHub repo for OBI-Bench is online.
We collect 5,523 OBI images from 11 distinct sources. Due to the lack of publicly available OBI recognition datasets on real oracle bones and OBI rejoining datasets, we propose the original oracle bone recognition (O2BR) dataset and OBI-rejoin dataset.
We select 23 up-to-date, widely used LMMs for evaluation, including 6 proprietary LMMs and 17 open-source LMMs.
Results on the classification tasks
- Effects of the number of character categories on classification accuracy:
- To be released
- To be released
Please contact the first author of this paper for queries.
- Zijian Chen,
[email protected]
If you find our work interesting, please feel free to cite our paper:
@article{chen2024obi,
title={OBI-Bench: Can LMMs Aid in Study of Ancient Script on Oracle Bones?},
author={Chen, Zijian and Chen, Tingzhu and Zhang, Wenjun and Zhai, Guangtao},
journal={arXiv preprint arXiv:2412.01175},
year={2024}
}
We extend our deepest gratitude to the frontline OBI researchers and scholars involved in the meticulous collation and proofreading of the oracle bone inscriptions. It is your persistent manual efforts that have provided a valuable data foundation for the development of artificial intelligence models.