Merge pull request #6 from nlpaueb/vkougia_overall_upgrading
Update
vasilikikou authored Oct 11, 2019
2 parents 72ecff3 + b082a19 commit 6a90ed4
Showing 11 changed files with 387 additions and 527 deletions.
36 changes: 23 additions & 13 deletions SiVL19/README.md
@@ -1,7 +1,20 @@
A Survey on Biomedical Image Captioning
=================

Implementation of the baseline and evaluation methods described in the [paper](https://arxiv.org/abs/1905.13302).
Code to download and preprocess the datasets, run the baselines and evaluate
the results as described in the paper
[A Survey on Biomedical Image Captioning](https://www.aclweb.org/anthology/W19-1803).

> V. Kougia, J. Pavlopoulos and I. Androutsopoulos, "A Survey on Biomedical Image Captioning".
> Proceedings of the Workshop on Shortcomings in Vision and Language of the Annual Conference
> of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2019), Minneapolis, USA, 2019.

## Dependencies ##
To use this code you will need to install Python 3.6 and the packages listed in the requirements.txt file. To install them, run:
```shell
pip install -r requirements.txt
```
To use the MS COCO evaluation script (*coco_evaluation.py*), follow the instructions described [here](https://github.com/salaniz/pycocoevalcap) to install the library.
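If the linked instructions still apply, installation may be as simple as the command below; this is an assumption based on the library's packaging, so defer to the linked repository if it differs:
```shell
pip install pycocoevalcap
```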

## Datasets ##

@@ -10,7 +23,7 @@
available, so to download it you need to follow the instructions described [here]
in the Participant registration section. Then, you can run the corresponding script that uses the downloaded *csv*
file.
For each dataset, a folder is created that contains the images and the data *tsv* files with
the following format: *image_name <\t> caption*. All data files, as well as the result files, should follow this format.
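For illustration, such a file can be loaded with pandas exactly as the evaluation script below does. The folder and file names here are hypothetical (*train_images.tsv* is the name used by the vocabulary script further down):

```python
import pandas as pd

# Each row of the tsv file is: image_name <tab> caption (no header row).
data = pd.read_csv("iu_xray/train_images.tsv", sep="\t", header=None,
                   names=["image_ids", "captions"])
# Map each image name to its caption.
captions = dict(zip(data.image_ids, data.captions))
```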


```shell
...
```

@@ -30,16 +43,13 @@
the demo script *sivl_run_me.ipynb*.

## Evaluation ##

The evaluation with the WMS can be performed as shown in *sivl_run_me.ipynb*.
To evaluate with the BLEU 1-4, METEOR and ROUGE measures, we used the [MS COCO caption evaluation code](https://github.com/tylin/coco-caption).
After you clone that code and install its requirements, run the following commands to move our
two scripts into the coco-caption folder and perform the evaluation.
To run the main script *mscoco_main_eval.py*, give as arguments the path to the dataset folder that contains
the *json* files and the dataset name.
The evaluation with the WMD and the MS COCO captioning measures can be performed as shown in *sivl_run_me.ipynb*.
You can either use the *compute_wmd* and *compute_scores* methods for the WMD and MS COCO evaluations respectively (as shown in *sivl_run_me.ipynb*),
or run the main methods of the scripts, providing the necessary arguments as shown below:
```shell
git clone https://github.com/tylin/coco-caption.git
mv bio_image_caption/SiVL19/mscoco_main_eval.py coco-caption/
mv bio_image_caption/SiVL19/bio_eval.py coco-caption/
python coco-caption/mscoco_main_eval.py /dataset_folder dataset_name
```
```shell
# For the WMD evaluation:
python wmd_evaluation.py path_to_gold_captions/gold.tsv path_to_results/results.tsv path_to_embeddings/emb.bin

# For the MSCOCO evaluation:
python coco_evaluation.py path_to_gold_captions/gold.tsv path_to_results/results.tsv
```
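The *wmd_evaluation.py* script itself is not shown in this commit. Purely as an illustration of the measure, a Word Mover's Distance between a gold and a generated caption can be computed with gensim roughly as follows; this is a sketch with a hypothetical embeddings path (mirroring the *emb.bin* argument above), not the repository's code:

```python
from gensim.models import KeyedVectors

# Load word embeddings in word2vec binary format (hypothetical path).
# gensim's wmdistance additionally requires the pyemd package.
emb = KeyedVectors.load_word2vec_format("path_to_embeddings/emb.bin", binary=True)

gold = "normal chest x-ray no acute findings".split()
generated = "normal chest x-ray".split()

# Lower distance means the generated caption is closer to the gold one.
print(emb.wmdistance(gold, generated))
```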
89 changes: 0 additions & 89 deletions SiVL19/bio_eval.py

This file was deleted.

79 changes: 79 additions & 0 deletions SiVL19/coco_evaluation.py
@@ -0,0 +1,79 @@
import re
import argparse
import pandas as pd
from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.meteor.meteor import Meteor
from pycocoevalcap.rouge.rouge import Rouge

parser = argparse.ArgumentParser(description="Takes as arguments a file with the gold captions and "
                                             "a file with the generated ones and computes "
                                             "BLEU 1-4, METEOR and ROUGE-L measures")
parser.add_argument("gold", help="Path to tsv file with gold captions")
parser.add_argument("generated", help="Path to tsv file with generated captions")


def preprocess_captions(images_captions):
    """
    :param images_captions: Dictionary with image ids as keys and captions as values
    :return: Dictionary with the processed captions as values
    """

    # Clean for BioASQ
    bioclean = lambda t: re.sub('[.,?;*!%^&_+():-\[\]{}]', '',
                                t.replace('"', '').replace('/', '').replace('\\', '')
                                .replace("'", '').strip().lower())
    pr_captions = {}
    # Apply bioclean to the data
    for image in images_captions:
        # Save caption into a list to match the MSCOCO format
        pr_captions[image] = [bioclean(images_captions[image])]

    return pr_captions


def compute_scores(gts, res):
    """
    Performs the MS COCO evaluation using the Python 3 implementation (https://github.com/salaniz/pycocoevalcap)
    :param gts: Dictionary with the image ids and their gold captions
    :param res: Dictionary with the image ids and their generated captions
    :print: Evaluation score (the mean of the scores of all the instances) for each measure
    """

    # Preprocess captions
    gts = preprocess_captions(gts)
    res = preprocess_captions(res)

    # Set up scorers
    scorers = [
        (Bleu(4), ["Bleu_1", "Bleu_2", "Bleu_3", "Bleu_4"]),
        (Meteor(), "METEOR"),
        (Rouge(), "ROUGE_L")
    ]

    # Compute score for each metric
    for scorer, method in scorers:
        print("Computing", scorer.method(), "...")
        score, scores = scorer.compute_score(gts, res)
        if isinstance(method, list):
            for sc, m in zip(score, method):
                print("%s : %0.3f" % (m, sc))
        else:
            print("%s : %0.3f" % (method, score))


if __name__ == "__main__":

    args = parser.parse_args()
    gold_path = args.gold
    results_path = args.generated

    # Load data
    gts_data = pd.read_csv(gold_path, sep="\t", header=None, names=["image_ids", "captions"])
    gts = dict(zip(gts_data.image_ids, gts_data.captions))

    res_data = pd.read_csv(results_path, sep="\t", header=None, names=["image_ids", "captions"])
    res = dict(zip(res_data.image_ids, res_data.captions))

    compute_scores(gts, res)
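A quick way to sanity-check the script from Python is sketched below. The image name and captions are hypothetical; the printed preprocessing result follows from the bioclean rules above:

```python
from coco_evaluation import preprocess_captions, compute_scores

gts = {"img1.png": "Normal chest X-ray, no acute findings."}
res = {"img1.png": "normal chest x-ray"}

# bioclean lowercases, strips punctuation and wraps each caption in a list.
print(preprocess_captions(gts))  # {'img1.png': ['normal chest x-ray no acute findings']}

# Prints BLEU 1-4, METEOR and ROUGE_L (METEOR additionally requires Java).
compute_scores(gts, res)
```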
86 changes: 0 additions & 86 deletions SiVL19/create_json_files.py

This file was deleted.

68 changes: 40 additions & 28 deletions SiVL19/create_vocabulary.py
@@ -2,44 +2,56 @@
import os


def create_vocabulary(filepath, results_path):
    """
    Creates vocabulary of unique words and computes statistics for the train captions
    :param filepath: The path to the train data tsv file with the form: "image \t caption"
    :param results_path: The folder in which to save the vocabulary file
    :return: The average caption length
    """

    total_words = []
    pr_captions = []
    # Clean for BioASQ
    bioclean = lambda t: re.sub('[.,?;*!%^&_+():-\[\]{}]', '',
                                t.replace('"', '').replace('/', '').replace('\\', '')
                                .replace("'", '').strip().lower()).split()

    # Read data
    with open(filepath, "r") as file:

        for line in file:
            line = line.replace("\n", "").split("\t")

            # Apply bioclean to the caption
            tokens = bioclean(line[1])
            for token in tokens:
                total_words.append(token)
            caption = " ".join(tokens)
            pr_captions.append(caption)

    print("Total number of captions is", len(pr_captions))

    # Find the unique captions in the train data
    unique_captions = set(pr_captions)
    print("Total number of unique captions is", len(unique_captions))

    # Compute the mean caption length
    mean_length = len(total_words)/len(pr_captions)
    print("The average caption length is", mean_length, "words")

    # Create vocabulary of unique words
    vocabulary = set(total_words)
    print("Unique words are", len(vocabulary))
    # Save vocabulary file to dataset folder
    with open(os.path.join(results_path, "vocabulary.txt"), 'w') as output_file:
        for word in vocabulary:
            output_file.write(word)
            output_file.write("\n")

    return mean_length
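For example, after downloading a dataset, the vocabulary could be built along these lines. The folder name is hypothetical (*train_images.tsv* is the file name the pre-update version of this script looked for), and this assumes you run from the folder containing the script:

```python
from create_vocabulary import create_vocabulary

# Writes vocabulary.txt into the dataset folder and returns the mean caption length.
avg_len = create_vocabulary("iu_xray/train_images.tsv", "iu_xray")
print("Average caption length:", avg_len)
```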
