The Judge is a Sentence-BERT model fine-tuned to choose the sentence from an input text that best responds to a query. At inference time, the model is bound to C++ code for the embedding similarity calculations. In addition, a Flask API exposes the model over HTTP so inferences can be requested remotely.
The project can be built using Docker or by setting up the local environment manually. Using a Docker container is highly recommended to avoid dependency errors.
To build the Docker image, run from the root directory:
docker build -t the-judge .
After the image is created (this step can be slow), start the container in the background and expose port 5000:
docker run -d -p 5000:5000 the-judge
To configure your local environment, first install the system dependencies:
apt-get update && apt-get install -y \
build-essential \
python3 \
python3-pip \
python3-dev \
make \
g++ \
curl \
&& apt-get clean
After installing the system dependencies, install the Python dependencies:
pip install -r requirements.txt
Then compile the C++ code:
make
To run the Flask server, execute from the root directory:
python server/app.py
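For orientation, here is a minimal sketch of what server/app.py plausibly looks like, assuming a single JSON endpoint; the /predict route name and the essay/queries field names are assumptions, not the project's actual API:

# Hypothetical sketch of the Flask server; route and field names are assumptions.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    data = request.get_json()
    essay = data["essay"]        # hypothetical field name
    queries = data["queries"]    # hypothetical field name
    # In the real project this would call InferenceModel (Model.py).
    answers = [{"query": q, "answer": None} for q in queries]  # placeholder
    return jsonify(answers)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)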
In both scenarios (Docker or manual setup), you can make a default test query to the model by running from the root directory:
sh test.sh
Check the input_test.json file for the query details.
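The script presumably just POSTs input_test.json to the running server; a hypothetical Python equivalent would look like the following (the /predict endpoint is an assumption, check test.sh for the actual request):

# Hypothetical equivalent of test.sh; the endpoint path is an assumption.
import json
import requests

with open("input_test.json") as f:
    payload = json.load(f)

# The server (Docker or local) listens on port 5000.
response = requests.post("http://localhost:5000/predict", json=payload)
print(response.json())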
The model was trained using the Stanford Question Answering Dataset (SQuAD). To perform the training, the TrainingModel class was defined (in the Model.py file). The model was fine-tuned in a Google Colab environment using the train_model.ipynb notebook. After fine-tuning, the model achieved an accuracy of 83% on the test set (10% higher than the base model).
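As a rough illustration only, a fine-tuning setup of this kind can be sketched with the sentence-transformers library; the base checkpoint, the way SQuAD pairs are built, and the hyperparameters below are assumptions, and the actual procedure lives in TrainingModel and train_model.ipynb:

# Hypothetical sketch of the fine-tuning step; see TrainingModel for the real code.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Base Sentence-BERT checkpoint (the actual base model is an assumption).
model = SentenceTransformer("all-MiniLM-L6-v2")

# Each SQuAD question is paired with the sentence containing its answer (label 1.0)
# and with other sentences from the same paragraph (label 0.0).
train_examples = [
    InputExample(texts=["When was the tower built?",
                        "The tower was completed in 1889."], label=1.0),
    InputExample(texts=["When was the tower built?",
                        "It attracts millions of visitors every year."], label=0.0),
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.CosineSimilarityLoss(model)

# Fine-tune so that question / answer-sentence pairs get a high cosine similarity.
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=3, warmup_steps=100)
model.save("the-judge-model")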
As with training, an InferenceModel class was defined for inference. Additionally, to improve the model's performance on large essays, inference works as follows (a sketch follows the list):
- The system receives an essay and a list of queries.
- The essay is split into smaller contexts.
- For each query, the contexts are ranked by embedding similarity.
- For each candidate context, the best answer is computed. The first context that returns a valid response (i.e. one with a significant similarity value) is taken as correct and the query is marked as answered.
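A hedged sketch of that ranking logic is shown below; the real implementation lives in InferenceModel and the C++ extension, and the model path, similarity threshold, and splitting heuristics here are assumptions:

# Hypothetical sketch of the inference pipeline; see InferenceModel for the real code.
from typing import Optional
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("the-judge-model")  # model path is an assumption
SIMILARITY_THRESHOLD = 0.5                      # threshold value is an assumption

def answer_query(essay: str, query: str) -> Optional[str]:
    # Parse the essay into smaller contexts (naive paragraph split).
    contexts = [c.strip() for c in essay.split("\n\n") if c.strip()]

    # Rank the contexts by embedding similarity to the query.
    query_emb = model.encode(query, convert_to_tensor=True)
    context_embs = model.encode(contexts, convert_to_tensor=True)
    scores = util.cos_sim(query_emb, context_embs)[0]
    ranked = sorted(zip(contexts, scores.tolist()), key=lambda x: x[1], reverse=True)

    # In each context, pick the best-matching sentence; accept the first one with a
    # significant similarity, otherwise report the query as unanswered.
    for context, _ in ranked:
        sentences = [s.strip() for s in context.split(".") if s.strip()]
        sent_embs = model.encode(sentences, convert_to_tensor=True)
        sent_scores = util.cos_sim(query_emb, sent_embs)[0]
        best_idx = int(sent_scores.argmax())
        if float(sent_scores[best_idx]) >= SIMILARITY_THRESHOLD:
            return sentences[best_idx]
    return None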
For more details, check the Model.py and app.py files.
The model had good results, showing improved performance on the SQuAD dataset compared to the base model. However, on a generic input (input_test.json) the model did not perform as well, answering some questions incorrectly and failing to identify queries without answers. I believe the model's performance could increase significantly by training on more ambiguous input texts and on larger contexts than those provided by SQuAD.