Evaluate the options for implementing safety guardrails for locally run LLMs.
- Does not involve fine tuning of models.
- Must be run locally and offline.
- Low latency
- Less than 0.8 seconds?
- Accuracy
- False Positives can lead to the app's unusability and missing information.
- False Negatives can lead to vulnerabilities.
- If the ideal balance cannot be achieved, the FP-FN tradeoff can be decided based on the LLM use-case.
- Low memory footprint
- Less that 800 MB?
- Profanity and toxic content
- Prompt Injection
- PII
- Should certain PII's be allowed? E.g. In a RAG application, extracting email ids of the authors from a whitepaper.
- Use non-LLM, lightweight models from Guardrails-AI to validate the user input and LLM output.
- Use system prompt engineering similar to Safety-System-Prompts to instruct the LLM itself to behave in a safe manner.