Mathsage (or Math-sage) is a semester-long project for Natural Language Processing (CS5624).
MathQA Dataset GitHub repo
Our final code is in the "final" directory, which contains the following files:
- MathQA_Preprocessing: Preprocesses the MathQA dataset as outlined in our paper
- MathQA_Preprocessing2: A continuation of the previous file
- MathQA_Const_Classifier: Creates the multi-label constant classifier portion of our model
- MathQA_Op_Classifier: Creates the multi-label operator classifier portion of our model
- MathQA_Masked_Language_Modeling: Fine-tunes a pretrained encoder model on MathQA
- MathQA_Independent_Subexpressions: Contains the code for the outlined independent subexpression predictor
- MathQA_Final_Training: Used for training the seq2seq model for flan-t5-base and flan-t5-large
- MathQA_Flan_T5_Base_Evaluation: Evaluates the performance of the base model
- MathQA_Flan_T5_Large_Evaluation: Evaluates the performance of the large model
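
For readers unfamiliar with the term, the sketch below illustrates what "multi-label" means for the constant and operator classifiers listed above: each label gets its own independent sigmoid score, so a single problem can be tagged with several operators at once. This is not the code from the notebooks. The `predict_labels` helper, the `OPERATORS` list (a small illustrative subset), the example logits, and the 0.5 threshold are all assumptions made for illustration.

```python
import math

# Hypothetical subset of operator labels, for illustration only.
OPERATORS = ["add", "subtract", "multiply", "divide"]


def sigmoid(x: float) -> float:
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits, labels, threshold=0.5):
    """Return every label whose independent sigmoid probability meets the threshold.

    Unlike softmax classification, the scores are not normalized against each
    other, so zero, one, or many labels can be selected for the same input.
    """
    return [label for label, z in zip(labels, logits) if sigmoid(z) >= threshold]


if __name__ == "__main__":
    # Made-up logits standing in for a classifier's output on one problem.
    example_logits = [2.0, -1.5, 0.3, -0.2]
    print(predict_labels(example_logits, OPERATORS))
```

With the made-up logits shown, the helper selects both "add" and "multiply", since each of those scores clears the threshold on its own.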