AI Safety for local LLMs

Evaluate the options for implementing safety guardrails for locally run LLMs.

Guardrail requirements

Does not involve fine tuning of models.
Must be run locally and offline.
Low latency
- Less than 0.8 seconds?
Accuracy
- False Positives can lead to the app's unusability and missing information.
- False Negatives can lead to vulnerabilities.
- If the ideal balance cannot be achieved, the FP-FN tradeoff can be decided based on the LLM use-case.
Low memory footprint
- Less that 800 MB?

Profanity and toxic content
Prompt Injection
PII
- Should certain PII's be allowed? E.g. In a RAG application, extracting email ids of the authors from a whitepaper.

Use non-LLM, lightweight models from Guardrails-AI to validate the user input and LLM output.
Use system prompt engineering similar to Safety-System-Prompts to instruct the LLM itself to behave in a safe manner.

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
guardrails-ai		guardrails-ai
model-safety		model-safety
nemo-guardrails		nemo-guardrails
.gitignore		.gitignore
README.md		README.md
guardrails-design.png		guardrails-design.png