Writing an application writer using almost only prompts and an LLM, learning about coding with LLMs along the way.
Before we start, a quick disclaimer: The following will indiscriminately execute code generated by GPT models, on your machine, without your review, and with potentially undesired consequences.
Install the openai package:
pip install openai
Put your OpenAI API key in a file called key.txt and restrict its permissions:
chmod 600 key.txt
Boot the application builder (takes a minute):
python -m boot
Run the builder:
python -m builder write a script in factorial.py that prints factorial of the provided argument
python -m builder in the file factorial.py, update the usage text to "usage: factorial int"
Run your application:
python -m factorial 10
Doesn't work? See Troubleshooting below.
This tool writes software for writing software using natural language, not code. There are plenty of GPT-based application-writing tools out there, but they are built with human-written code. If writing applications in natural language is the future, then let's start now. We use a minimal amount of code to get started and build from there using natural language prompts only.
boot.py is the only Python source file in this repository. It forms the start of a bootstrap sequence that sets up an increasingly capable programming toolchain. It simply reads the prompt in boot.txt and feeds it to the chat completions API.
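To make the mechanics concrete, here is a minimal sketch of what that amounts to, assuming the openai>=1.0 Python client; the model name and the final exec call are assumptions, and the real boot.py may differ in details:

from openai import OpenAI

# Read the API key and the boot prompt, ask for a completion, and run it.
client = OpenAI(api_key=open("key.txt").read().strip())
prompt = open("boot.txt").read()
response = client.chat.completions.create(
    model="gpt-4",  # model name is an assumption
    messages=[{"role": "user", "content": prompt}],
)
exec(response.choices[0].message.content)  # execute the generated code, unreviewed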
As a first step of the boot sequence, boot.txt recreates the functionality of boot.py. After all, what is an application that writes applications if it cannot write itself? This also encourages us to start small and shows that our premise has some validity.
After this reboot step, we need to branch out, so as not to get stuck in a chain of prompts that can only run prompts to run prompts. Reboot allows us to run multiple prompts in sequence.
The first of these is a run source file. This frees us from having to provide detailed instructions on how to call the OpenAI API every time we want to run a prompt. For every source file, we still need two prompts, because it is harder to instruct the language model in a single prompt to write Python that writes Python.
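As an illustration, the generated run helper might look roughly like the sketch below; the function name, the model, and the exec-based execution are assumptions, since the actual source is produced by the model at boot time:

from openai import OpenAI

client = OpenAI(api_key=open("key.txt").read().strip())

def run(prompt_file):
    # Load a prompt from disk, ask the model for Python code, and execute it
    # in the global namespace so later prompts can build on the result.
    prompt = open(prompt_file).read()
    response = client.chat.completions.create(
        model="gpt-4",  # model name is an assumption
        messages=[{"role": "user", "content": prompt}],
    )
    exec(response.choices[0].message.content, globals())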
The second is run_cached, which adds basic caching in order to speed up repeated boot runs. This simply appends to the existing source file, requiring a reload of the module. The third is run_prompt, which simplifies calling the API with a single prompt.
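The append-and-reload detail can be sketched in a few lines; append_and_reload is a hypothetical name, since the real mechanics live in generated code:

import importlib

def append_and_reload(module, new_code):
    # run_cached is appended to the already-generated source file, so the
    # module has to be reloaded before the new definitions become visible.
    with open(module.__file__, "a") as f:
        f.write("\n" + new_code)
    return importlib.reload(module)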
Lastly, reboot runs the main prompt, which is the main sequence of bootstrapping prompts. By splitting this out of reboot, we get access to the functionality we defined in the prompts above, like prompt comments.
To iterate fast, we cache all API calls early in the bootstrap process, using a basic cache that writes to the cache folder, with the md5sum of the messages as the filename and a .py extension.
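The caching scheme is simple enough to sketch; run_cached_prompt is a hypothetical name and the model is an assumption, but the md5-based layout matches the description above:

import hashlib
import json
import os
from openai import OpenAI

client = OpenAI(api_key=open("key.txt").read().strip())

def run_cached_prompt(messages):
    # Cache key: the md5sum of the serialized messages; entries are stored
    # in the cache folder with a .py extension.
    digest = hashlib.md5(json.dumps(messages).encode()).hexdigest()
    path = os.path.join("cache", digest + ".py")
    if os.path.exists(path):
        return open(path).read()  # cache hit: reuse the earlier completion
    response = client.chat.completions.create(model="gpt-4", messages=messages)
    code = response.choices[0].message.content
    os.makedirs("cache", exist_ok=True)
    with open(path, "w") as f:
        f.write(code)
    return code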
The exceptions are the early boot prompts, up to the main prompt. After the first boot run, you can use main.py for all subsequent boots, unless you modify any of the prompts that run before main:
python -m main
The cache is never invalidated automatically. To purge it, run:
rm -rf cache
Troubleshooting

There is a good chance that your boot run throws an error or otherwise doesn't work as expected. For the same input, the GPT model does not always write the same code. Moreover, OpenAI updates its models regularly, resulting in different completions.
If the boot run fails, you probably get a stack trace with very little information in it. This is because we use the exec function to run code, so there are no Python sources to point to. To see the code that is executed, modify the suspected prompt by inserting a line that says something like "Print it", right before "Run using exec".
The prompt runner looks for the --verbose flag (-v) to print the output of the model. This can be helpful for getting context for an error message:
python -m boot --verbose
python -m main --verbose
If the builder run fails, there is probably a bug in the generated source code. Old-fashioned debugging is possible at this point; see the sources in src/.
To generate better sources, prompts will need to be engineered. Consider posting
an issue on GitHub and include the generated sources.
Still stuck? Cheat a little and use patch-openai to log and cache all API calls.
Why?
Application-building tools like gpt-engineer are showing promise, but still get stuck on relatively simple applications. To understand what is really needed to write software using a GPT, we'll start from scratch and work through the fundamental challenges that come with AI-generated code.
Why not just write one big prompt that builds the whole application in one go?
That would be great. The primary reason is that the OpenAI API returns text (and calls functions) but does not write complex applications, so we still need code to transform model output into our application and to guide it along the way. The point of this project is not to write code but to use prompts, so we're left with bootstrapping from shorter, instructive prompts.
Secondly, we haven't managed to get GPT to write large amounts of functioning code from a single prompt. It will likely get there, and as it gets closer, we can incrementally simplify the boot prompts.
Isn't this just a painful way to write mediocre code?
It certainly feels that way.
License

MIT