AI-generated written text content #800
base: master
Conversation
docs/gpt.rst
Outdated
@@ -0,0 +1,67 @@
gpt
=======
=======
===
docs/gpt.rst
Outdated
=======

.. dfhack-tool::
   :summary: AI-generated written content!
I can understand the exclamation point, but summaries have to end in a period to match the parser : P
How about "Generate written content with AI."
OK will fix thanks.
docs/gpt.rst
Outdated
``enable gpt``
=======
Enables the plugin. The overlay will be shown when a knowledge item or unit view sheet is open.

``disable gpt``
=======
Disables the plugin.
this actually isn't necessary. players can turn the overlay on and off in gui/control-panel.
docs/gpt.rst
Outdated
Setup:

1. Register for an OpenAI API account. It must be a paid or active trial account.
a url would be useful here
fixed
docs/gpt.rst
Outdated
1. Register for an OpenAI API account. It must be a paid or active trial account.
2. Generate an API token for your account.
3. Save your OpenAI API token to a file at the root of your DF directory, `oaak.txt`.
we try not to put any user data in the DF root directory. I suggest putting it in dfhack-config/.
OK, makes sense
docs/gpt.rst
Outdated
3. Save your OpenAI API token to a file at the root of your DF directory, `oaak.txt`.
4. Install python. We used version 3.11 installed from the Microsoft Store.
5. Install python dependencies Flask and OpenAI: `pip install Flask` and `pip install OpenAI`.
6. Start the local helper python app: cd into the dfhack/scripts directory & run `python gptserver.py`.
I see two options here. if we deploy this with DFHack, then this python script should probably go in the DFHack/dfhack repo and get installed somewhere more appropriate (like maybe the DF root dir)
if this script gets distributed as a mod (see https://docs.dfhack.org/en/stable/docs/guides/modding-guide.html#the-structure-of-a-mod), then the python code can get included in the mod files
5. Install python dependencies Flask and OpenAI: `pip install Flask` and `pip install OpenAI`.
6. Start the local helper python app: cd into the dfhack/scripts directory & run `python gptserver.py`.

Once the python helper is running, you may now enable and use the gpt plugin.
I'd suggest having the overlay display a message like "cannot contact server -- do you have the Python helper running?" instead of requiring the player to remember to start the server first
The overlay doesn't try to contact the server. It waits until you press the button and then, when there is something to submit, it tries submitting. At that point, if it can't reach the server, the UI displays a message that says what you suggest (more or less).
The UI becomes fairly unresponsive though, since if the server can't be reached, the existing luasocket library throws a C++ exception, and Lua's lack of modern error handling means we have to resort to pcall, which seems to carry a nasty performance penalty, perhaps exacerbated by how often updateLayout is being called. Still looking into this one, but it seems to have a workaround at least, and isn't expected to happen very often.
pcall shouldn't really be introducing much of a penalty, although errors themselves might... how often are you calling it?
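For illustration, that pattern might look something like this sketch, where send_request stands in for the existing luasocket call (a hypothetical wrapper, not an existing function):
-- sketch: run the protected call once per submission and cache the outcome,
-- so the render path only reads a plain string and never touches pcall
local last_status_text

local function submit(payload)
    local ok, response = pcall(send_request, payload)  -- hypothetical wrapper
    if ok then
        last_status_text = response
    else
        last_status_text = 'Cannot contact server -- is the python helper running?'
    end
end
That way the cost of a failed connection is paid once per button press rather than once per frame.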
docs/gpt.rst
Outdated
Tweaking additional OpenAI API parameters will require modifying `gptserver.py` to suit
your particular desires, until such time as someone may have added additional
configuration options in a future update to DFHack :D
it would be fairly trivial to have a configuration screen that can open from the overlay and send options data to the python server
gpt.lua
Outdated
--@ module = true

local json = require('json')
local dfhack = require('dfhack')
this should not be necessary -- dfhack is already in the default environment
local function string_from_Status(status)
    if status == Status.start then return "start" end
    if status == Status.waiting then return "waiting" end
    if status == Status.receiving then return "receiving" end
    if status == Status.done then return "done" end
end
consider utils.invert(), e.g.:
local Status = {
...
}
local Status_rev = utils.invert(Status)
local function string_from_Status(status)
return Status_rev[status]
end
OK will update it tomorrow. BTW this is one thing that Swift really got perfect, how they handle enums with associated values is pure genius. I've been experimenting with Swift for Windows. They recently added some very powerful C++ interoperability features where you can basically just start writing Swift and it compiles and runs as if you wrote C++, or something. Curious.
--

-- Whether or not to print debug output to the console.
local is_debug_output_enabled = false
this comes up so often... we really need a centralized debugfilter-integrated Lua logging system. I'm not saying you have to write it. I'm just kvetching.
amen 🗡️
local is_debug_output_enabled = false

-- Port on which to communicate with the python helper.
local port = 5001
should probably get all this from the config file -- dfhack-config/gpt.json -- so players have a way to make changes (preferably through an in-game GUI)
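For illustration, reading that config could be as simple as this sketch (the file name and port field are suggestions; json.decode_file is from the stock json module):
-- sketch: load settings from dfhack-config/gpt.json, falling back to defaults
local json = require('json')

local CONFIG_PATH = 'dfhack-config/gpt.json'  -- suggested location

local ok, config = pcall(json.decode_file, CONFIG_PATH)
if not ok or type(config) ~= 'table' then
    config = {}  -- missing or malformed file: fall back to defaults
end
local port = config.port or 5001  -- keep the current value as the default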
This is a constant (not meant to be changed).
local timeout = client_timeout_secs + client_timeout_msecs/1000

-- Number of onRenderFrame events to wait before polling again.
local polling_interval = 10
the time between graphical frames can vary significantly. polling_interval_ms would be better; we have a timer API to get ms -- example: https://github.com/DFHack/dfhack/blob/develop/plugins/lua/overlay.lua#L432-L444
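i.e., something along the lines of this sketch (poll_server is a placeholder for the existing polling logic; the interval value is illustrative):
-- sketch: throttle polling by wall-clock milliseconds instead of frame count
local POLL_INTERVAL_MS = 500  -- illustrative value
local last_poll_ms = 0

local function maybe_poll()
    local now_ms = dfhack.getTickCount()
    if now_ms - last_poll_ms >= POLL_INTERVAL_MS then
        last_poll_ms = now_ms
        poll_server()  -- placeholder for the existing polling logic
    end
end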
OK will look into it thanks
    prompt_value = 'Return a response stating simply, "There has been an error."'
else
    debug_log('Creating prompt for non-poem/non-star-chart/non-unsupported content_type: ' .. content_type)
    prompt_value = 'In between the four carrots is a description of a written ' .. content_type .. ': ^^^^' .. knowledge_description .. '^^^^. \n\n' .. excerpts_prompt
carrots -> carets
if keys._MOUSE_R_DOWN or keys.LEAVESCREEN then
    if view then
        view:dismiss()
    end
end
do we actually need to close the window if the knowledge description screen is closed? why not just leave it open so the player can appreciate it? they can always close it with a right click on the window at any time.
Thing is, right click usually doesn't close the knowledge description window, it just backs you out to the list of known written content items, at which point we want the window closed so the user can open another knowledge item without necessarily generating it as soon as it's opened (which is what would happen if the window had remained open). If someone is trying to economize which things they generate, that would be annoying.
The good news is that if you click back into the same knowledge item you just left, the text is all still there, and you can always alt-tab to the python cmd console to see the output there also in case you forgot to screenshot a good one.
I would suggest running the plugin and playing around with it, and then if you think I've misread the situation please let me know and I'll make the adjustment, or maybe we think of a third solution that's better than either of these options.
GPTBannerOverlay.ATTRS{
    default_pos={x=-35,y=-2},
    default_enabled=true,
    viewscreens={'dwarfmode/ViewSheets/UNIT','dwarfmode/ViewSheets/ITEM'},
this seems too broad -- do we need more fine-grained focus strings here? it's not hard to add more context in modules/Gui.cpp
Just using what was available. If you give me one for the UNIT/Skills/Knowledge I'll gladly use it
yeah, let me see if I can give you something better here.
ref: DFHack/dfhack#3675
    end
end

self.subviews.label:updateLayout()
this can get expensive. better to only update the layout when the text actually changes
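e.g., something like this sketch (field names are illustrative):
-- sketch: relayout only when the label text actually changes
local function set_label_text(self, text)
    if text == self.last_label_text then return end
    self.last_label_text = text
    self.subviews.label:setText(text)
    self.subviews.label:updateLayout()
end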
Maybe this is what's making the pcall() lock up the UI updating? I'll play around with it tomorrow.
    if status == Status.done then return "done" end
end

local Content_Type = {
this feels... odd, but I need to read through the code a few more times before I feel comfortable suggesting changes
would it make sense to make the generated text persistent per world? then the same knowledge item would always get the same generated text per world (and it would cut down on API costs if the same item were viewed multiple times)
we have facilities for mocking out pretty much anything. You can even mock out part of the DF game state so the script can read test data. Another good unit test example: https://github.com/DFHack/scripts/blob/master/test/prioritize.lua
Co-authored-by: Myk <[email protected]>
responding to the comments from myk002
Right now it's a bit of a compromise. It caches the last-received item for the session in memory, so that if you were to re-open the same one again many times in a row, it will show you the cached response. However, as soon as you look at something else, that occupies the cache, so if you go back to the first one, it will regenerate fresh. This can be nice, since it saves you from regenerating an entry if you just accidentally closed it and then reopened it. But it also lets you regenerate it purposefully in case the first version was lame, by looking at something else and then coming back to the first thing.

That could be a good thing if you're using the default cheap/fast engine, which is more of a quantity-over-quality compromise; in that case you'll encounter more times where you want to regenerate the same prompt. However, if you're running GPT4, which costs about 5x more but is also likely to produce a great result the first time every time, then you'll probably not want to regenerate the same one twice very often. In the current situation, the onus is on the user to simply not click on something again that they don't want to regenerate; if they forgot it was generated before, that's on them. Personally (and I'm not young) I have found it pretty easy to remember which ones I already generated, and I just generate one when I see something interesting that I'm fairly certain I haven't looked at before. If I'm confused, I can refer to my screenshots.

That being said, I can imagine every user will have a different set of expectations and preferences for how that ought to work, so in an ideal world it's configurable. So, if we were to consider adding support for caching every response and persisting the responses to disk, what kind of database store do we have in the project's dependencies, capable of handling this in a memory-efficient manner?
how much text are we talking here? The excerpts I've seen are pretty small, and putting a few MB in JSON isn't out of the question. At runtime you'd cache in a table in Lua in memory. What are the upper bounds of the memory complexity requirements? Note that we do already link to zlib, so compression is an option if that becomes necessary.
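For illustration, a per-world JSON cache could be as small as this sketch (the file naming and key scheme are just suggestions; json.encode_file/json.decode_file are from the stock json module):
-- sketch: persist generated text per world in a small JSON file
local json = require('json')

local function cache_path()
    -- one cache file per world, named after the save directory
    return ('dfhack-config/gpt-cache-%s.json'):format(
        df.global.world.cur_savegame.save_dir)
end

local function load_cache()
    local ok, data = pcall(json.decode_file, cache_path())
    return ok and data or {}
end

local function save_cache(cache)
    json.encode_file(cache, cache_path())
end
At runtime the loaded table would be the in-memory cache, written back whenever a new response arrives.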
Description

This PR adds a module `gpt` with an overlay and resizable window, enabling AI-generated written content based on in-game descriptions of written content. `gpt.lua` sends written descriptions to a python helper script that runs in the background, `gptserver.py`, which in turn handles secure communication with OpenAI's servers.

Security
While the `gpt.lua` dfhack script communicates with the helper script using an unencrypted localhost TCP connection, no sensitive information is ever passed between dfhack and the python helper, which will ignore any connections from hosts other than localhost (127.0.0.1).

All communications between `gptserver.py` and OpenAI's cloud are performed via OpenAI's official python module, which in turn handles all the nitty-gritty of requests. Their module employs up-to-date HTTPS security.

Your personal OpenAI key will be stored on disk in a file named `oaak.txt`. This file is not encrypted, but it's only read by the python script (not by DFHack). Use common-sense permissions for this file, like you would for an SSH private key.

OpenAI also allows you to set a hard limit on monthly costs, so that if this script goes into an infinite loop or your API key somehow leaks out, you won't be on the hook for thousands of dollars. Also, you can regenerate the API key at any time on OpenAI's website, which is recommended to be done quite frequently regardless of what security you're using.
Should these measures be seen as insufficient, we could investigate platform-specific system keychain storage for the API key; however, this might make the python script less portable across platforms. Please comment below with your thoughts on this.
Costs
Use of this script requires an OpenAI API account with a positive balance. You can get $5 free API credit that expires in 90 days for signing up, which should be good for 50 to 500 requests depending on the model used. They charge by the token (appx. 3/4 of a word), and the cost varies widely depending on which model is used.
The default model selection is the fast, cheap one. You may elect to go with GPT-3.5 or 4.0 by launching `gptserver.py` with the command-line options `-gpt3` or `-gpt4`, but be ready for slower response times and higher per-token fees. Overall, for typical use by a DF player, one would not expect the costs to exceed $5-10/mo. even for heavy use, and as we move into the future, these kinds of services will likely gain a free tier for non-commercial light use (similar to the public ChatGPT service).

Remaining work to be done:
Tests
At this point the PR contains no automated tests, which is why it's being posted as a draft PR.
I would like to ask the preliminary reviewers for advice on testing strategy as I have not yet looked into how testing of these scripts is done.
Integration testing
My initial thoughts would be to use the test harness (?) to boot gptserver.py in a "mock server" mode where it will give canned responses for canned requests. We would want to add a Pipfile.lock (if there's not already one) to ensure the sands of python dependencies don't shift out from under the test runners (might also come in handy for end users so we don't have to list the dependencies in the docs; that's not very maintainable and I felt gross doing it).
This setup would allow us to run tests that validate that the integration between the lua and python scripts hasn't been broken and that error states in the python script get handled properly (right now the UI doesn't reflect a death of the python script appropriately, which is a TODO before this merges).
End-to-end testing
If we know the knowledge descriptions of some Urist McTestbuddy in the test world, and we can macro the right screens being opened, then we could set up something like an end-to-end test with the only mocked thing being OpenAI's service.
Unit testing
Since the script does sanitization on response strings to strip characters that DF can't render properly, and since it does heuristics to determine which type of written content a given description is of, it would be good to have some unit tests around these functions. I've tried to make those as functional as possible to facilitate testing.
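For reference, a unit test in the scripts test harness might look like this sketch (it assumes gpt.lua exports a sanitize function when loaded as a module; the input/expected strings are made up):
-- test/gpt.lua -- sketch only
config.target = 'gpt'

local gpt = reqscript('gpt')

function test.sanitize_leaves_plain_text_untouched()
    expect.eq(gpt.sanitize('plain text'), 'plain text')
end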
Since the script is stateful in its present form, I would also like to add tests to validate whether the state transitions are handled properly and in a robust manner. There should be no combination of states that the script cannot recover gracefully from. Ideally it could be made completely stateless except for a single state store, but I haven't gotten that far yet. (Redux FTW)
Future directions
I'm a bit leery of exposing implementation details specific to OpenAI within the DFHack UI, because I am guessing that a decent number of folks will want to rewrite the python script to talk to a different LLM, which would likely have its own, totally different API. Having to then also update a bunch of UI code could be annoying.