This repository has been archived by the owner on Jun 26, 2024. It is now read-only.
# v0.0.2

## What's Changed
- Load finetuned weights by @aniketmaurya in #2
- Refactor serve by @aniketmaurya in #3
### For inference

```python
from llama_inference import LLaMAInference
import os

WEIGHTS_PATH = os.environ["WEIGHTS"]
checkpoint_path = f"{WEIGHTS_PATH}/lit-llama/7B/state_dict.pth"
tokenizer_path = f"{WEIGHTS_PATH}/lit-llama/tokenizer.model"

model = LLaMAInference(checkpoint_path=checkpoint_path, tokenizer_path=tokenizer_path, dtype="bfloat16")

print(model("New York is located in"))
```
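The snippet above raises a `KeyError` if the `WEIGHTS` environment variable is unset. A minimal sketch of a friendlier variant using `os.environ.get` with a fallback directory (the `./weights` fallback path is an illustrative assumption, not part of the release):

```python
import os

# Fall back to a local ./weights directory when WEIGHTS is unset
# (the fallback location is an assumption for illustration).
WEIGHTS_PATH = os.environ.get("WEIGHTS", "./weights")
checkpoint_path = f"{WEIGHTS_PATH}/lit-llama/7B/state_dict.pth"
tokenizer_path = f"{WEIGHTS_PATH}/lit-llama/tokenizer.model"

print(checkpoint_path)
```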
### For serving a REST API

```python
# app.py
from llama_inference.serve import ServeLLaMA, Response, PromptRequest

import lightning as L

component = ServeLLaMA(input_type=PromptRequest, output_type=Response)
app = L.LightningApp(component)
```
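Once the app is running (for example via `lightning run app app.py`), a client can POST a prompt to the served endpoint. The endpoint URL and the `prompt` field name below are assumptions for illustration, not confirmed by these notes:

```python
import json
import urllib.request

# Hypothetical endpoint and payload shape -- adjust to the app's actual route.
URL = "http://127.0.0.1:7501/predict"
payload = json.dumps({"prompt": "New York is located in"}).encode()

request = urllib.request.Request(
    URL,
    data=payload,
    headers={"Content-Type": "application/json"},
)
# Uncomment to send the request against a running app:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode())

print(request.get_method())  # → POST
```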
**Full Changelog**: v0.0.1...v0.0.2