This repository has been archived by the owner on Jun 26, 2024. It is now read-only.
# v0.0.2

## What's Changed
- Load finetuned weights by @aniketmaurya in #2
- Refactor serve by @aniketmaurya in #3
### For inference

```python
from llama_inference import LLaMAInference
import os

WEIGHTS_PATH = os.environ["WEIGHTS"]
checkpoint_path = f"{WEIGHTS_PATH}/lit-llama/7B/state_dict.pth"
tokenizer_path = f"{WEIGHTS_PATH}/lit-llama/tokenizer.model"

model = LLaMAInference(checkpoint_path=checkpoint_path, tokenizer_path=tokenizer_path, dtype="bfloat16")

print(model("New York is located in"))
```
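The snippet above raises a `KeyError` if the `WEIGHTS` environment variable is unset. A minimal sketch of a friendlier variant using `os.environ.get` with a fallback directory (the `./weights` fallback path is an illustrative assumption, not part of the release):

```python
import os

# Fall back to a local ./weights directory when WEIGHTS is unset
# (the fallback location is an assumption for illustration).
WEIGHTS_PATH = os.environ.get("WEIGHTS", "./weights")
checkpoint_path = f"{WEIGHTS_PATH}/lit-llama/7B/state_dict.pth"
tokenizer_path = f"{WEIGHTS_PATH}/lit-llama/tokenizer.model"

print(checkpoint_path)
```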
### For serving a REST API

```python
# app.py
from llama_inference.serve import ServeLLaMA, Response, PromptRequest

import lightning as L

component = ServeLLaMA(input_type=PromptRequest, output_type=Response)
app = L.LightningApp(component)
```
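Once the app is running (for example via `lightning run app app.py`), a client can POST a prompt to the served endpoint. The endpoint URL and the `prompt` field name below are assumptions for illustration, not confirmed by these notes:

```python
import json
import urllib.request

# Hypothetical endpoint and payload shape -- adjust to the app's actual route.
URL = "http://127.0.0.1:7501/predict"
payload = json.dumps({"prompt": "New York is located in"}).encode()

request = urllib.request.Request(
    URL,
    data=payload,
    headers={"Content-Type": "application/json"},
)
# Uncomment to send the request against a running app:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode())

print(request.get_method())  # → POST
```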
**Full Changelog**: v0.0.1...v0.0.2