The Wildenstein Plattner Institut owns a vast collection of art-historical photographs. To make them more accessible, especially for visually impaired people, captions are required. Since creating captions manually would be too time-consuming, an automatic solution is needed. We present an approach that is specifically adapted to art images, building on the language-image pre-training framework BLIP. For that, we leverage the LAVIS library, which provides an interface for applying BLIP. We finetune BLIP's pre-trained base model on the art image dataset Artpedia and investigate methods to improve the training dataset and the caption generation. In all approaches, the finetuned models show better results than the pre-trained ones: the captions become more detailed and sometimes contain art-specific content such as painting styles.
If you are interested in the details of our project, please have a look at our report. There, you will find the motivation behind the project and a detailed explanation of the methods used, in particular the BLIP model and the evaluation metrics. All of our experiments and results are presented and discussed in depth, including a qualitative and quantitative analysis. We also outline possible future work in case you'd like to build on our results.
Folder / File | Description |
---|---|
LAVIS | Forked LAVIS submodule |
artists | Evaluation of the artist experiment & querying of artists |
artpedia | Artpedia dataset, formatting, analysis, training & evaluation |
grayscaled_artpedia | Converting Artpedia images to grayscale, training & evaluation |
inference | Constrained caption generation |
plot | Script for plotting the evaluation results |
results | Evaluation results (finetuned vs. base model) |
wpi | WPI dataset, analysis & evaluation |
lavis_example.ipynb | Google Colab Jupyter notebook with examples for using LAVIS to generate captions & attention maps |
report | Our extensive report of the project |
Python and a package manager like pip and/or conda need to be installed on your system.
- Open your terminal and clone the repository, including the submodules.
git clone --recurse-submodules https://github.com/valeriatisch/captioning-art-photographs-blip.git
If you cloned the repo without the submodules, you can run:
git submodule update --init
To update the repo with the submodules, you can run:
git pull --recurse-submodules
- We recommend creating a virtual environment. You can create and activate a new environment with conda or another package manager of your preference.
conda create -n captioning_art pip python=3.8
conda activate captioning_art
To deactivate the environment, run:
conda deactivate
To remove the environment, run:
conda remove -n captioning_art --all
- To set up the LAVIS submodule, please follow this guide. You can also use our installs.sh script for the installation or take a look at it in case you run into installation problems. You might need to install a different PyTorch and CUDA version to match your system's requirements.
chmod +x installs.sh
./installs.sh
- Install the remaining requirements. Some of them may already be satisfied by the previous installation, but that shouldn't be a problem.
pip install -r requirements.txt
- To run the code on your local machine, you can simply do as follows:
python path_to_file/file.py <args>
You will also find many shell scripts in the repo, named after the corresponding Python scripts.
They are used to submit batch jobs on an HPC cluster with the Slurm workload manager.
Each script sets various Slurm directives, such as the job name, email notifications, the partition to run the job on, the GPUs needed, memory, and a time limit.
If you want to use the shell scripts, you need to adjust these settings inside them.
To run a job, execute:
sbatch path_to_file/file.sh
To train a model, run:
cd LAVIS
python train.py --cfg-path lavis/projects/blip/train/<CONFIG>.yaml
To generate captions, run:
cd LAVIS
python predict.py --image_path=<PATH_TO_IMAGE>
Please take a look at the script for more options.
To evaluate a model, run:
cd LAVIS
python evaluate.py --cfg-path lavis/projects/blip/eval/<CONFIG>.yaml
Please note that you need to adjust the YAML configs and fill in the right paths, or create your own configs. You also need to adjust the paths in LAVIS/lavis/tasks/captioning.py.
Please refer to the LAVIS readme and the LAVIS documentation for more information and advanced usage.
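If you just want to try out captioning interactively, the LAVIS Python API can also be used directly. Below is a minimal sketch with the pre-trained BLIP base model; the image path is a placeholder, and loading one of our finetuned checkpoints instead would require pointing the model config to the downloaded weights.

```python
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the pre-trained BLIP captioning base model and its image preprocessor.
model, vis_processors, _ = load_model_and_preprocess(
    name="blip_caption", model_type="base_coco", is_eval=True, device=device
)

# Placeholder path: use any art image you like.
raw_image = Image.open("path/to/painting.jpg").convert("RGB")
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)

# Beam search decoding; see LAVIS for nucleus sampling and other options.
captions = model.generate({"image": image}, num_beams=3, max_length=40)
print(captions)
```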
How to get the datasets used for our experiments is described in the Datasets section.
How to run the individual experiments is described in more detail in the Experiments section.
- To run the Jupyter notebooks, please make sure Jupyter is installed on your system. You can check this with:
jupyter --version
If it is missing, install it with:
pip install jupyter
Navigate to the directory containing the notebook you'd like to open and launch it:
cd path_to_directory_with_notebook
jupyter notebook the_notebook.ipynb
The notebook will open in your default browser.
There is one notebook that needs to be run in a Google Colab environment. If you are not familiar with Google Colab, please look up this guide.
The Artpedia Dataset and the corresponding paper can be found here.
To download the images, execute the first cells of this Jupyter notebook.
In the same notebook, we also provide an analysis of the Artpedia dataset.
All plots regarding the Artpedia dataset are saved in artpedia/plots.
We enhanced the Artpedia dataset so that each image has the following attributes:
Attribute | Description | Source |
---|---|---|
`title` | the title of the painting | original |
`img_url` | the Wikimedia URL from which to download the image | original |
`year` | the year the painting was created | original |
`visual_sentences` | a list of visual sentences describing the painting | original |
`contextual_sentences` | a list of contextual sentences describing the painting | original |
`split` | training (`train`), validation (`val`), or test (`test`) split | original |
`got_img` | `yes` or `no`, depending on whether the image could be downloaded | new |
`matching_scores` | a list of matching scores between the image and each visual sentence, in the same order as the sentences in `visual_sentences` | new |
`cosine_similarities` | a list of cosine similarities between the image and each visual sentence, in the same order as the sentences in `visual_sentences` | new |
`artist` | the artist of the painting | new |
The enhanced dataset can be found here.
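As a rough sketch of how the enhanced annotations can be used, the snippet below loads them and filters the training split the way our 80% threshold experiment does. It assumes the file is a JSON object mapping image ids to the attributes listed above; the file name `artpedia_enhanced.json` is a placeholder for wherever you store the downloaded annotations.

```python
import json

# Placeholder file name: point this at the enhanced annotation file you downloaded.
with open("artpedia/artpedia_enhanced.json") as f:
    artpedia = json.load(f)

# Keep training images that were downloaded successfully and have at least
# one visual sentence with a matching score of 80% or more.
filtered = {
    img_id: entry
    for img_id, entry in artpedia.items()
    if entry["split"] == "train"
    and entry["got_img"] == "yes"
    and any(score >= 0.8 for score in entry["matching_scores"])
}
print(f"{len(filtered)} of {len(artpedia)} entries kept")
```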
The annotations of the WPI dataset can be found in this JSON file.
To download the images, execute the first cells of this Jupyter notebook.
In the same notebook, we also provide an analysis of the WPI dataset.
All plots regarding the WPI dataset are saved in wpi/plots.
The WPI dataset contains the following attributes for an image:
Attribute | Description | Present |
---|---|---|
`img_urls` | the URL from which to download the image | always |
`Title` | the title of the image | always |
`img_path` | the path to the downloaded image | always |
`Genres` | a list of genres the image belongs to | sometimes |
`Topics` | a list of topics the image contains | sometimes |
`Names` | a list of names, such as the publisher or author of the image | sometimes |
`Places` | a list of places the image displays | sometimes |
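Since the last four attributes are only sometimes present, a quick way to get a feeling for the dataset is to count how often they occur. The sketch below assumes the annotations are stored as a list of per-image records; the path is a placeholder for the JSON file linked above.

```python
import json
from collections import Counter

# Placeholder path: point this at the WPI annotation JSON file.
with open("wpi/annotations.json") as f:
    wpi = json.load(f)

# Count how often the optional attributes are present,
# assuming a list of per-image records.
optional = ["Genres", "Topics", "Names", "Places"]
counts = Counter()
for record in wpi:
    for attr in optional:
        if record.get(attr):
            counts[attr] += 1

for attr in optional:
    print(f"{attr}: {counts[attr]} of {len(wpi)} images")
```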
You can find the models we finetuned under this link.
The password is `blip_models`.
Do you want to recreate our experiments or use our scripts on other images?
Here, we try to guide you through the process of preparing the datasets, finetuning, and evaluating the BLIP model as best we can.
In general, you just need to run the corresponding scripts as described in the Setup section.
We finetune the BLIP base model on the Artpedia dataset. We use the training, validation, and test split provided by Artpedia.
First, adjust the config file lavis/projects/blip/train/artpedia.yaml.
To train the model, adjust the shell script and then run:
sbatch artpedia/train_artpedia.sh
Or run the Python script directly from the LAVIS directory, specifying the path to the right config file:
cd LAVIS
python train.py --cfg-path lavis/projects/blip/train/artpedia.yaml
Again, adjust lavis/projects/blip/eval/caption_artpedia_eval.yaml first.
To evaluate the model, adjust the shell script too, and run:
sbatch artpedia/eval_artpedia.sh
Or:
cd LAVIS
python evaluate.py --cfg-path lavis/projects/blip/eval/caption_artpedia_eval.yaml
We transform the images of the Artpedia dataset to grayscale and finetune the BLIP base model on them. Again, we use the training, validation, and test split provided by Artpedia.
To transform the images to grayscale, run:
sbatch grayscaled_artpedia/convert_bw.sh
Or run the Python script directly, specifying the input and output directories:
python grayscaled_artpedia/convert_bw.py artpedia/images/ artpedia/images/bw/
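The conversion itself is a simple grayscale transform. A minimal sketch of the idea with Pillow is shown below; our convert_bw.py may differ in details such as the handled file extensions.

```python
from pathlib import Path
from PIL import Image

def convert_to_grayscale(input_dir: str, output_dir: str) -> None:
    """Convert every JPEG in input_dir to grayscale and save it to output_dir."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    for path in Path(input_dir).glob("*.jpg"):
        # "L" drops the color information; converting back to "RGB" keeps the
        # three-channel input format that BLIP's image processor expects.
        Image.open(path).convert("L").convert("RGB").save(out / path.name)

convert_to_grayscale("artpedia/images/", "artpedia/images/bw/")
```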
We apply BLIP's filter to calculate matching scores between the images and visual sentences of the Artpedia dataset and filter out the image-text pairs with a score below a threshold of 80%.
To generate matching scores for the dataset, run:
cd LAVIS
sbatch match.sh
Or run the Python script directly, specifying the arguments:
cd LAVIS
python caption_matching.py ../artpedia/imgs ../artpedia/artpedia_res.json ../artpedia/artpedia_scored.json
To train with a filtered version of the Artpedia dataset, use LAVIS/lavis/projects/blip/train/artpedia_filtered.yaml. You can also define your own threshold in this config.
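For a single image-sentence pair, the matching score and cosine similarity can also be computed interactively with LAVIS's BLIP image-text matching model; caption_matching.py essentially runs this over the whole dataset. A minimal sketch (the image path and example sentence are placeholders):

```python
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# BLIP image-text matching model from LAVIS.
model, vis_processors, text_processors = load_model_and_preprocess(
    name="blip_image_text_matching", model_type="base", is_eval=True, device=device
)

raw_image = Image.open("artpedia/imgs/example.jpg").convert("RGB")
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
text = text_processors["eval"]("a painting of the madonna and child")

# ITM head: probability that the sentence matches the image.
itm_logits = model({"image": image, "text_input": text}, match_head="itm")
itm_score = torch.nn.functional.softmax(itm_logits, dim=1)[:, 1].item()

# ITC head: cosine similarity between image and text embeddings.
itc_score = model({"image": image, "text_input": text}, match_head="itc")

print(f"matching score: {itm_score:.3f}, cosine similarity: {itc_score.item():.3f}")
```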
We explore the potential of constrained caption generation by forcing tags or other words to be included in generated captions.
To generate captions with a constraint, run LAVIS/predict.py and pass the words to include with --forcewords.
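predict.py takes care of this for you. For illustration only, the sketch below shows one general way to set up constrained decoding with the Hugging Face BLIP implementation and its force_words_ids argument; this is not necessarily how predict.py implements it, and the image path and forced words are made-up examples.

```python
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

# Placeholder image path.
image = Image.open("path/to/painting.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Words that must appear in the caption, encoded as token-id sequences.
force_words = ["renaissance", "portrait"]
force_words_ids = processor.tokenizer(force_words, add_special_tokens=False).input_ids

# Constrained decoding requires beam search (num_beams > 1).
out = model.generate(
    **inputs, num_beams=5, force_words_ids=force_words_ids, max_length=40
)
print(processor.decode(out[0], skip_special_tokens=True))
```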
We investigate the model's ability to recognize the artists of given artworks.
Unfortunately, the original Artpedia dataset does not provide the artists.
You can run artists/query_artpedia_artists.py or artists/query_artpedia_artists.sh to get the artists.
Don't forget to specify the input and output paths.
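As a purely hypothetical sketch, the artist of a painting could, for example, be looked up via the Wikimedia Commons extmetadata API, assuming the img_url is a direct upload.wikimedia.org URL whose last path segment is the file name; our script may work differently.

```python
import urllib.parse
import requests

COMMONS_API = "https://commons.wikimedia.org/w/api.php"

def query_artist(img_url):
    """Hypothetical lookup of the 'Artist' metadata of a Commons file."""
    file_name = urllib.parse.unquote(img_url.rsplit("/", 1)[-1])
    params = {
        "action": "query",
        "titles": f"File:{file_name}",
        "prop": "imageinfo",
        "iiprop": "extmetadata",
        "format": "json",
    }
    pages = requests.get(COMMONS_API, params=params, timeout=30).json()["query"]["pages"]
    for page in pages.values():
        meta = page.get("imageinfo", [{}])[0].get("extmetadata", {})
        if "Artist" in meta:
            # The value may contain HTML markup that still needs to be stripped.
            return meta["Artist"]["value"]
    return None
```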
For evaluation, use lavis/projects/blip/eval/caption_artists_eval.yaml.
You are more than welcome to contribute to this project.
Please feel free to open an issue, create a pull request, or just share an idea.
This project makes use of the following two libraries and a dataset:
- Artpedia - "A New Visual-Semantic Dataset with Visual and Contextual Sentences" by Matteo Stefanini, Marcella Cornia, Lorenzo Baraldi, Massimiliano Corsini, Rita Cucchiara
- BLIP - "Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation" by Junnan Li, Dongxu Li, Caiming Xiong, Steven Hoi
- LAVIS - "A Library for Language-Vision Intelligence" by Dongxu Li, Junnan Li, Hung Le, Guangsen Wang, Silvio Savarese, Steven C. H. Hoi
We want to say a special thank you to the developers for their hard work and for making their code publicly available for others to use.