Official Implementation of Execution-based Code Generation using Deep Reinforcement Learning
Programming language (PL) models pretrained on large-scale code corpora have demonstrated considerable potential for automating software engineering processes and streamlining code generation tasks such as code completion, code translation, and program synthesis. However, current approaches mainly rely on supervised fine-tuning objectives borrowed from text generation, neglecting sequence-level features of code such as compilability and syntactic and functional correctness. To address this limitation, we propose PPOCoder, a new framework for code generation that combines pretrained PL models with Proximal Policy Optimization (PPO) deep reinforcement learning and incorporates execution feedback as an external source of knowledge in model optimization. PPOCoder is transferable across different code generation tasks and PLs.
Overview of PPOCoder with the actor and critic models: the action is sampled from the policy based on the given source data.
To run the code, install the dependencies listed in `requirements.txt`:
pip install -r requirements.txt
We finetune/evaluate models on the following major dataset benchmarks for different code generation tasks:
- CodeSearchNet (CSN) is available here
- XLCoST is available here
- APPS is available here
- MBPP is available here
We preprocess the data and construct input/output sequences in the same manner as outlined in the original benchmark papers. Unzip and place all benchmarks in the `data` folder.
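For example, assuming the benchmarks were downloaded as zip archives (the archive names below are placeholders, not the actual download names), the setup could look like:

```bash
# Place all benchmarks under the data/ folder.
# Archive names are placeholders -- use the files you actually downloaded.
mkdir -p data
unzip CodeSearchNet.zip -d data/
unzip XLCoST.zip -d data/
unzip APPS.zip -d data/
unzip MBPP.zip -d data/
```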
We have created a `run.sh` script to execute PPO-based fine-tuning of the PL model using the compiler signal. To run the script for different code generation tasks, configure the following parameters (an example invocation is shown after the table):
Parameters | Description | Example Values |
---|---|---|
`l1` | Source Language | java |
`l2` | Target Language | cpp |
`asp` | Action Space Size | 5 |
`ns` | Number of Synthetic Samples | 10 |
`data_path` | Path to the original data samples | data/xlcost/java-cpp/ |
`output_path` | Path to save generations and outputs | saved_results/java-cpp/ |
`baseline_output_dir` | Path to the outputs of the base finetuned CodeT5 (before RL) | baselines/saved_models/java-cpp/ |
`load_model_path` | Path to the base finetuned CodeT5 model (before RL) for each downstream task | baselines/saved_models/java-cpp/pytorch_model.bin |
`max_source_length` | Maximum Source Length | 400 |
`max_target_length` | Maximum Target Length | 400 |
`train_batch_size` | Training Batch Size | 32 |
`test_batch_size` | Testing Batch Size | 48 |
`lr` | Learning Rate | 1e-6 |
`kl_coef` | Initial coefficient of the KL divergence penalty in the reward | 0.1 |
`kl_target` | Target KL value that adaptively controls the KL coefficient | 1 |
`vf_coef` | Coefficient of the value function error in the PPO loss | 1e-3 |
`run` | Index of the run | 1 |
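For example, a Java-to-C++ translation run with the example values above might look like the sketch below. This is only an illustration: the flag names and the way `run.sh` forwards them to the training code are assumptions, so check `run.sh` for the exact interface.

```bash
# Hypothetical invocation of run.sh for Java-to-C++ translation.
# Flag names mirror the parameter table above but are assumptions;
# consult run.sh for the actual argument handling.
bash run.sh \
  --l1 java --l2 cpp \
  --asp 5 --ns 10 \
  --data_path data/xlcost/java-cpp/ \
  --output_path saved_results/java-cpp/ \
  --baseline_output_dir baselines/saved_models/java-cpp/ \
  --load_model_path baselines/saved_models/java-cpp/pytorch_model.bin \
  --max_source_length 400 --max_target_length 400 \
  --train_batch_size 32 --test_batch_size 48 \
  --lr 1e-6 --kl_coef 0.1 --kl_target 1 --vf_coef 1e-3 \
  --run 1
```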
Running `run.sh` saves the generated programs in a `.txt` file and the model weights at the end of each epoch.
If you find the paper or the repo useful, please cite it with:

@article{shojaee2023ppocoder,
  title={Execution-based code generation using deep reinforcement learning},
  author={Shojaee, Parshin and Jain, Aneesh and Tipirneni, Sindhu and Reddy, Chandan K},
  journal={arXiv preprint arXiv:2301.13816},
  year={2023}
}