Formally verified code generation using Large Language Models (LLMs)
A project that uses LLMs to generate formally verified code. The input are formal specifications, it then uses LLMs to generate the code that will be used. The code is then verified using formal methods to ensure that it meets the specifications. The project is still in its early stages and is not yet functional.
- Python 3.6 or higher
- Python dependencies listed in requirements.txt (python3 -m pip install -r requirements.txt)
- Frama-c (https://frama-c.com/)
- Why3 (https://why3.lri.fr/)
- Provers in the Why3 platform, i.e. Alt-Ergo, CVC4, and Z3
- .env file in the root directory with a API keys for the used services of OPENAI and GROQ, OPENAI_API_KEY = "..." and GROQ_API_KEY = "..."
- alt-ergo 2.4.3 (https://alt-ergo.lri.fr/)
- CVC4 1.7 (https://cvc4.github.io/)
- Z3 4.8.6
- Clone the repository
- Install the required python packages using the following command:
pip install -r requirements.txt
- Install Frama-c and Why3 using the links provided above.
- Run the following command to generate the code:
python main.py function
- 'verify' : Verifies the given C file (-c) using a formal specification (-fsf) file.
- 'list_dir' : Lists the files given in a directory, where the directory is described by the -d flag
- 'improve_code_step': Only executes one code improvement step on the given formal specification and program candidate
- 'generate_prompt' : Generates a prompt for the user to given an .h file
- 'generate_code' : Generates the code for the given .h file using Large Language models. Requires the API key to be set.
- 'generate_code_folder: Recursively generate code for each subfolder in a given directory
- Description: Specifies the function to execute (e.g.,
list_files
,verify
). - Type:
str
- Default: Required
- Description: Enables debug mode, providing verbose logging.
- Type:
bool
- Default:
False
- Description: Clears debug files in the specified output path.
- Type:
bool
- Default:
False
- Description: Path to the directory for file operations.
- Type:
str
- Default:
None
- Description: Path to the C file to be verified or improved.
- Type:
str
- Default:
None
- Description: Path to the formal specification file for verification.
- Type:
str
- Default:
None
- Description: Path to save output files, such as generated code or verification results.
- Type:
str
- Default:
None
- Description: Name of the output file for generated code.
- Type:
str
- Default:
None
- Description: Natural language description of the task for code generation.
- Type:
str
- Default:
None
- Description: Function signature for the code to be generated.
- Type:
str
- Default:
None
- Description: Specifies the type of specification for code generation (
natural
,formal
, orboth
). - Type:
str
- Default:
both
- Description: Name of the LLM model to use for code generation (e.g.,
gpt-3.5-turbo
). - Type:
str
- Default:
gpt-3.5-turbo
- Description: Sampling temperature for the LLM; higher values generate more diverse outputs.
- Type:
float
- Default:
1.0
- Description: Maximum number of tokens to generate in the output.
- Type:
int
- Default:
4096
- Description: Enables or disables loops in the generated code.
- Type:
bool
- Default:
False
- Description: Number of iterations to use for code generation or improvement.
- Type:
int
- Default:
10
python3 main.py generate_code_folder -d ../paper_problems/ -ieg 10 -iter 10 -temp 1 -wpt 5 -o ../output/llama3.1-8b-10-10-1-one-shot-both -output-file generated_code.c -fsf formal-specification.h -nl natural-language-specification.h -sig function-signature.h -pt one-shot -spectype both
This generates code for all problems in a folders