Commit
Merge pull request #50 from swarnat/master
feat(*): option to add language for base_prompt
B-urb authored Jun 18, 2024
2 parents 431b237 + 806e37d commit c83bdbb
Showing 2 changed files with 33 additions and 15 deletions.
5 changes: 2 additions & 3 deletions README.md
@@ -45,11 +45,13 @@ With these prerequisites met, you are now ready to proceed with the installation

The application requires setting environment variables for its configuration. Below is a table describing each environment variable, indicating whether it is required or optional, its default value (if any), and a brief description:


| Environment Variable | Required | Default Value | Description |
|--------------------------|---------|----------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `PAPERLESS_TOKEN` | Yes | None | The authentication token for accessing the Paperless API. |
| `PAPERLESS_BASE_URL` | Yes | None | The base URL for the Paperless API. |
| `PAPERLESS_FILTER` | No | "NOT tagged=true" | Filter string that selects which documents are fetched from Paperless. |
| `LANGUAGE` | No | "EN" | Language of the built-in base prompt. Supported values: EN, DE. Any other value falls back to EN. |
| `OLLAMA_HOST` | No | "localhost" | The hostname where the Ollama service is running. |
| `OLLAMA_PORT` | No | "11434" | The port on which the Ollama service is accessible. |
| `OLLAMA_SECURE_ENDPOINT` | No | "false" | Whether to use HTTPS (`true`) or HTTP (`false`) for Ollama. |
@@ -60,9 +62,6 @@ The application requires setting environment variables for its configuration. Be






Make sure to set the required environment variables (`PAPERLESS_TOKEN` and `PAPERLESS_BASE_URL`) before running the application. Optional variables fall back to their default values when not set.
For development, define these variables in a `.env` file at the root of your project directory.
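
The required/optional split in the table can be sketched in Rust as follows. This is a minimal illustration, not the application's actual code: the helper names `or_default`, `required`, `ollama_url`, and `ollama_url_from_env` are hypothetical, and the way the three Ollama settings are combined into a URL is an assumption based only on the descriptions in the table.

```rust
use std::env;

/// Hypothetical helper: use the value if set, otherwise the documented default.
fn or_default(value: Option<String>, default: &str) -> String {
    value.unwrap_or_else(|| default.to_string())
}

/// Hypothetical helper: required variables have no default, so a missing one is an error.
fn required(name: &str, value: Option<String>) -> Result<String, String> {
    value.ok_or_else(|| format!("missing required environment variable {}", name))
}

/// Assumed combination of the three optional Ollama settings into one endpoint URL.
fn ollama_url(host: Option<String>, port: Option<String>, secure: Option<String>) -> String {
    let scheme = if or_default(secure, "false") == "true" { "https" } else { "http" };
    format!(
        "{}://{}:{}",
        scheme,
        or_default(host, "localhost"),
        or_default(port, "11434")
    )
}

/// Thin wrapper that reads the real process environment.
fn ollama_url_from_env() -> String {
    ollama_url(
        env::var("OLLAMA_HOST").ok(),
        env::var("OLLAMA_PORT").ok(),
        env::var("OLLAMA_SECURE_ENDPOINT").ok(),
    )
}
```

Keeping the environment access in a thin wrapper leaves the default-handling logic easy to test without touching process state.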

43 changes: 31 additions & 12 deletions src/main.rs
@@ -101,18 +101,37 @@ fn init_ollama_client(host: &str, port: u16, secure_endpoint: bool) -> Ollama {

// Refactor the main process into a function for better readability
async fn process_documents(client: &Client, ollama: &Ollama, model: &str, base_url: &str, filter: &str) -> Result<(), Box<dyn std::error::Error>> {
// Before this change: a single English default prompt, overridable via BASE_PROMPT.
let prompt_base = env::var("BASE_PROMPT").unwrap_or_else(|_| "Please extract metadata \
from the provided document and return it in JSON format. \
The fields I need are: \
title,topic,sender,recipient,urgency(with value either n/a or low or medium or high),\
date_received,category. \
Analyze the document to find the values for these fields and format the response as a \
JSON object. Use the most likely answer for each field. \
The response should contain only JSON data where the key and values are all in simple string \
format(no nested object) for direct parsing by another program. So no additional text or \
explanation, no introtext, the answer should start and end with curly brackets \
delimiting the json object ".to_string()
);

// Select the built-in prompt by LANGUAGE (case-insensitive, defaults to "EN").
let language = env::var("LANGUAGE").unwrap_or_else(|_| "EN".to_string()).to_uppercase();

let base_prompt = match language.as_str() {
    // German prompt. A trailing `\` drops the newline and the next line's leading
    // whitespace, so each line must end with an explicit space before it.
    "DE" => "Bitte ziehe die Metadaten aus dem bereitgestellten Dokument \
        und antworte im JSON-Format. \
        Die Felder, welche ich brauche, sind: \
        title,topic,sender,recipient,urgency(mit Werten entweder n/a oder low oder medium oder high),\
        date_received(in maschinenlesbarem Format),category. \
        Analysiere das Dokument, um die Werte für diese Felder zu finden und forme die Antwort als JSON-Objekt. \
        Verwende die wahrscheinlichste Antwort für jedes Feld in der gleichen Sprache wie das Dokument. \
        Die Antwort sollte nur JSON-Daten enthalten, bei denen die Schlüssel und Werte alle in einfacher Textform \
        (keine verschachtelten Objekte) vorliegen, um von einem anderen Programm direkt analysiert werden zu können. \
        Also keine zusätzlichen Texte oder Erklärungen, der Antworttext sollte mit geschweiften Klammern beginnen und enden, \
        die das JSON-Objekt umfassen ",
    // Any other value falls back to the English prompt.
    _ => "Please extract metadata \
        from the provided document and return it in JSON format. \
        The fields I need are: \
        title,topic,sender,recipient,urgency(with value either n/a or low or medium or high),\
        date_received(in machine-readable format),category. \
        Analyze the document to find the values for these fields and format the response as a \
        JSON object. Use the most likely answer for each field. \
        The response should contain only JSON data where the key and values are all in simple string \
        format(no nested object) for direct parsing by another program. So no additional text or \
        explanation, no introtext, the answer should start and end with curly brackets \
        delimiting the json object ",
};

// An explicit BASE_PROMPT overrides the language-specific default.
let prompt_base = env::var("BASE_PROMPT").unwrap_or_else(|_| base_prompt.to_string());

let mode_env = env::var("MODE").unwrap_or_else(|_| "0".to_string());
let mode_int = mode_env.parse::<i32>().unwrap_or(0);
let mode = Mode::from_int(mode_int);
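
The precedence this change introduces (an explicit `BASE_PROMPT` wins, otherwise `LANGUAGE` picks a built-in prompt, and unknown languages fall back to English) can be isolated in a small pure function. The following is a sketch under stated assumptions, not code from the commit: `default_prompt`, `resolve_prompt`, and `resolve_prompt_from_env` are hypothetical names, and the two constants are shortened stand-ins for the full prompt texts in the diff.

```rust
use std::env;

// Shortened stand-ins for the full prompt texts (assumption: the real strings are much longer).
const PROMPT_EN: &str = "Please extract metadata (English prompt stand-in)";
const PROMPT_DE: &str = "Bitte ziehe die Metadaten (German prompt stand-in)";

/// Picks the built-in prompt for a language code, case-insensitively.
/// Anything other than "DE" falls back to the English prompt.
fn default_prompt(language: &str) -> &'static str {
    match language.to_uppercase().as_str() {
        "DE" => PROMPT_DE,
        _ => PROMPT_EN,
    }
}

/// An explicit base prompt wins; otherwise the language (default "EN") selects the built-in one.
fn resolve_prompt(base_prompt: Option<String>, language: Option<String>) -> String {
    let language = language.unwrap_or_else(|| "EN".to_string());
    base_prompt.unwrap_or_else(|| default_prompt(&language).to_string())
}

/// Thin wrapper that reads BASE_PROMPT and LANGUAGE from the process environment.
fn resolve_prompt_from_env() -> String {
    resolve_prompt(env::var("BASE_PROMPT").ok(), env::var("LANGUAGE").ok())
}
```

Passing the two values in as `Option<String>` keeps the selection logic testable without setting environment variables, and returning `&'static str` from the match avoids allocating a `String` that is only copied again afterwards.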