OBI-Future/OBI-Bench

OBI-Bench: Can LMMs Aid in Study of Ancient Script on Oracle Bones? 🔍

The first attempt to apply large multimodal models (LMMs) to paleography and archaeology

¹ Institute of Image Communication and Information Processing, Shanghai Jiao Tong University

² School of Humanities, Shanghai Jiao Tong University

*Corresponding authors

Chinese-language overview: Zhihu (知乎)

Overview of OBI-Bench: OBI-Bench covers five tasks drawn from the oracle bone inscription (OBI) research process: 1) recognition: locating densely distributed oracle bone characters in images of original oracle bones or rubbings; 2) rejoining: reassembling fragmented pieces into coherent texts; 3) classification: assigning individual characters to the categories that correspond to their meanings; 4) retrieval: returning relevant results for a given query OBI image; 5) deciphering: interpreting OBI to support historical and cultural research.

Release

General Principles

Focusing on OBI Task-oriented Abilities of LMMs & Covering Multi-stage Font Appearances

Image Sources

We collect 5,523 OBI images from 11 distinct sources. Because no public datasets exist for OBI recognition on real oracle bones or for OBI rejoining, we introduce two new datasets: the Original Oracle Bone Recognition (O2BR) dataset and the OBI-rejoin dataset.

Benchmark Candidates

We evaluate 23 recent, widely used LMMs: 6 proprietary and 17 open-source.

Performance Benchmark on Five OBI Tasks

Results on the recognition tasks
Results on the rejoining tasks
Results on the classification tasks
  • Effects of the number of character categories on classification accuracy:
Results on the retrieval tasks
Results on the deciphering tasks
  • Comparison between GPT-4o and Qwen-VL-Max:
More deciphering results

Original Oracle Bone Recognition (O2BR) Dataset 📦

  • To be released

OBI-rejoin Dataset 📦

  • To be released

Contact 📧

Please contact the first author of this paper for queries.

Citation 📎

If you find our work interesting, please feel free to cite our paper:

@article{chen2024obi,
  title={OBI-Bench: Can LMMs Aid in Study of Ancient Script on Oracle Bones?},
  author={Chen, Zijian and Chen, Tingzhu and Zhang, Wenjun and Zhai, Guangtao},
  journal={arXiv preprint arXiv:2412.01175},
  year={2024}
}

Acknowledgements 💡

We extend our deepest gratitude to the frontline OBI researchers and scholars who meticulously collated and proofread the oracle bone inscriptions. Your persistent manual efforts have provided a valuable data foundation for the development of artificial intelligence models.