Commit

Rework prompt
evgenius1424 committed Oct 28, 2024
1 parent 9a78f78 commit b948169
1 changed file with 45 additions and 22 deletions:
apps/learnbefore-bff/src/get-words.ts (67 changes)
@@ -5,18 +5,19 @@ import { Word, wordSchema } from "../types"
 export async function* getWords(
   openAI: OpenAI,
   text: string,
+  translationLanguage: string = "Russian",
 ): AsyncGenerator<Word> {
   let data = ""
   for await (const part of await openAI.chat.completions.create({
-    model: "gpt-3.5-turbo",
+    model: "gpt-4o",
     stream: true,
     max_tokens: 4096,
     response_format: { type: "json_object" },
     messages: [
-      { role: "system", content: systemPrompt },
+      { role: "system", content: getPrompt(translationLanguage) },
       {
         role: "user",
-        content: getUserPrompt(text),
+        content: text,
       },
     ],
   })) {
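`getWords` declares `let data = ""` and returns `AsyncGenerator<Word>`, but the body that consumes the stream lies outside the displayed hunks. One way to yield words before the stream finishes is to scan the incoming text for complete objects inside the `words` array. The sketch below assumes the `{"words": [...]}` shape requested by the new prompt and takes plain text chunks (with the OpenAI Node SDK these would typically be `part.choices[0]?.delta?.content`); `WordEntry`, `WordObjectScanner`, `isWordEntry` and `parseWordStream` are hypothetical names, not the project's code.

```typescript
// Hypothetical sketch, not the project's code. Assumes the model streams a
// single JSON document shaped like {"words":[{...},{...}]} as in the prompt.

interface WordEntry {
  word: string
  meaning: string
  translation: string
  languageCode: string
}

// Incrementally finds inner objects (brace depth 2) in streamed JSON text,
// ignoring braces that appear inside string literals.
class WordObjectScanner {
  private depth = 0
  private inString = false
  private escaped = false
  private current = ""

  // Feeds one chunk and returns the JSON text of every object it completed.
  push(chunk: string): string[] {
    const completed: string[] = []
    for (const ch of chunk) {
      // Record characters only while inside an inner object.
      if (this.depth >= 2) this.current += ch
      if (this.inString) {
        if (this.escaped) this.escaped = false
        else if (ch === "\\") this.escaped = true
        else if (ch === '"') this.inString = false
        continue
      }
      if (ch === '"') {
        this.inString = true
      } else if (ch === "{") {
        this.depth++
        if (this.depth === 2) this.current = "{" // start of a word object
      } else if (ch === "}") {
        this.depth--
        if (this.depth === 1) {
          completed.push(this.current) // closing brace was already recorded
          this.current = ""
        }
      }
    }
    return completed
  }
}

// Runtime check for the four fields the prompt's JSON template asks for.
function isWordEntry(value: unknown): value is WordEntry {
  if (typeof value !== "object" || value === null) return false
  const record = value as Record<string, unknown>
  return ["word", "meaning", "translation", "languageCode"].every(
    (key) => typeof record[key] === "string",
  )
}

// Yields each well-formed entry as soon as its object is complete.
async function* parseWordStream(
  chunks: AsyncIterable<string>,
): AsyncGenerator<WordEntry> {
  const scanner = new WordObjectScanner()
  for await (const chunk of chunks) {
    for (const json of scanner.push(chunk)) {
      let parsed: unknown
      try {
        parsed = JSON.parse(json)
      } catch {
        continue // skip a malformed object rather than abort the stream
      }
      if (isWordEntry(parsed)) yield parsed
    }
  }
}
```

Buffering the whole response into `data` and calling `JSON.parse` once at the end is simpler; the scanner only pays off when words should appear in the UI while the model is still generating.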
@@ -45,26 +46,48 @@ export async function* getWords(
   }
 }
 
-const systemPrompt =
-  "Use only RFC8259 compliant compact JSON and help to extract big list of words from the text that the language learner is unlikely to know or that are crucial to the understanding of the text. Words should be converted to dictionary form. Duplicates, names of characters, persons or toponyms are not allowed." +
-  "Words that do not exist in the text are not allowed. Returns an empty response if the text contains no words."
+function getPrompt(translationLanguage: string) {
+  return `Please process the input text as follows:
-function getUserPrompt(text: string, translationLanguage = "Russian") {
-  return `
-You must extract 40 words from the text below which language learner likely do not know or need to know in order to understand the text.
-Please ensure the extracted words are diverse and relevant to the context of the text.
-Translation language is ${translationLanguage}.
-Example of list of words in JSON:
+1. First detect the source language of the text and remove:
+- Most frequently used words in that language (approximately top 5000)
+- Basic vocabulary (A1/A2 level) including:
+* Common verbs (equivalents of be, do, go, etc.)
+* Basic adjectives (equivalents of good, bad, big, small)
+* Time expressions
+* Basic numbers and quantities
+* Family terms
+* Elementary nouns
+* Question words
+* Pronouns
+* Articles (if language has them)
+* Prepositions
+* Conjunctions
+* Basic adverbs
+* Auxiliary/modal verbs
+* Common greetings
+* Basic location words
+* Everyday action words
+2. For each remaining word:
+- Convert to dictionary form
+- Remove duplicates
+- Keep order
+- Create entry with:
+* Original word
+* Definition
+* Russian translation
+* Detected language code (ISO 639-1)
+3. Format each entry as JSON:
 {
+"words": [
+{
-words: [
-"word": "Hello", // The word itself.
-"meaning": "A greeting or expression of goodwill.", // The definition or meaning of the word.
-"translation": "Здравствуйте", // Translation of the word.
-"languageCode": "en", // ISO 639 Language code indicating the language of the word (e.g., "en" for English).
-]
+"word": "[Original word]",
+"meaning": "[Definition in source language]",
+"translation": "[${translationLanguage} translation]",
+"languageCode": "[ISO 639-1 code]"
 }
-Text: ${text}`.trim()
+]
+}`
 }
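In the new prompt, step 2 asks for a "Russian translation" while the `translationLanguage` parameter and the JSON template use `${translationLanguage}`, so a caller passing another language gets contradictory instructions. A minimal sketch of one way to keep them in sync, assuming it is acceptable to rebuild the same wording from data: `BASIC_VOCABULARY` and `buildPrompt` are hypothetical names, and the indentation of the nested bullets is a choice of this sketch rather than something the diff shows.

```typescript
// Hypothetical sketch, not the committed getPrompt. The wording follows the
// diff; the category list is data and the language is interpolated everywhere.
const BASIC_VOCABULARY: readonly string[] = [
  "Common verbs (equivalents of be, do, go, etc.)",
  "Basic adjectives (equivalents of good, bad, big, small)",
  "Time expressions",
  "Basic numbers and quantities",
  "Family terms",
  "Elementary nouns",
  "Question words",
  "Pronouns",
  "Articles (if language has them)",
  "Prepositions",
  "Conjunctions",
  "Basic adverbs",
  "Auxiliary/modal verbs",
  "Common greetings",
  "Basic location words",
  "Everyday action words",
]

function buildPrompt(translationLanguage: string): string {
  // Example output shape; JSON.stringify guarantees the template is valid JSON.
  const template = {
    words: [
      {
        word: "[Original word]",
        meaning: "[Definition in source language]",
        translation: `[${translationLanguage} translation]`,
        languageCode: "[ISO 639-1 code]",
      },
    ],
  }
  return [
    "Please process the input text as follows:",
    "",
    "1. First detect the source language of the text and remove:",
    "   - Most frequently used words in that language (approximately top 5000)",
    "   - Basic vocabulary (A1/A2 level) including:",
    ...BASIC_VOCABULARY.map((item) => `     * ${item}`),
    "2. For each remaining word:",
    "   - Convert to dictionary form",
    "   - Remove duplicates",
    "   - Keep order",
    "   - Create entry with:",
    "     * Original word",
    "     * Definition",
    `     * ${translationLanguage} translation`, // not hard-coded to Russian
    "     * Detected language code (ISO 639-1)",
    "3. Format each entry as JSON:",
    JSON.stringify(template, null, 2),
  ].join("\n")
}
```

Called as `buildPrompt("German")`, both the step 2 bullet and the `translation` placeholder name German, and the JSON example cannot drift out of valid syntax.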

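Step 2 of the new prompt asks the model to remove duplicates, keep order and report an ISO 639-1 code, but nothing in the request enforces this. A small defensive pass over the parsed entries could apply the same rules on the client. This is a sketch assuming entries carry the four fields from the JSON template; `WordEntry` and `normalizeWords` are hypothetical names, and treating words as duplicates case-insensitively within one language is a design assumption.

```typescript
// Hypothetical post-processing step, not part of the commit.
interface WordEntry {
  word: string
  meaning: string
  translation: string
  languageCode: string
}

// ISO 639-1 codes are exactly two letters; lower case is the usual spelling.
const ISO_639_1 = /^[a-z]{2}$/

// Drops malformed entries and later duplicates, preserving the model's order.
function normalizeWords(entries: readonly WordEntry[]): WordEntry[] {
  const seen = new Set<string>()
  const result: WordEntry[] = []
  for (const entry of entries) {
    const word = entry.word.trim()
    const code = entry.languageCode.trim().toLowerCase()
    if (word === "" || !ISO_639_1.test(code)) continue // malformed entry
    const key = `${code}:${word.toLocaleLowerCase()}`
    if (seen.has(key)) continue // keep only the first occurrence
    seen.add(key)
    result.push({ ...entry, word, languageCode: code })
  }
  return result
}
```

The first occurrence wins, so the ordering the prompt requests ("Keep order") survives the filtering.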