Skip to content

Latest commit

 

History

History
66 lines (46 loc) · 1.84 KB

README.md

File metadata and controls

66 lines (46 loc) · 1.84 KB

pygguf

GGUF parser in Python with NumPy-vectorized dequantization of GGML tensors.

DISCLAIMER

  • This code has only been tested for the TinyLlama model. It might (or might not) work for other models. If any issues arise, a probable source might be the weird transposition of the key and query weights.
  • If you want maximum performance, you should probably use a C implementation instead.

Prerequisites

Install NumPy:

pip install numpy

Download the Q4_K_M model file from https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/tree/main

mkdir -p 'data/TinyLlama-1.1B-Chat-v1.0-GGUF'
wget 'https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf?download=true' -O 'data/TinyLlama-1.1B-Chat-v1.0-GGUF/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf'

Install pygguf:

git clone https://github.com/99991/pygguf.git
cd pygguf
pip install -e .

Example

import gguf

filename = "data/TinyLlama-1.1B-Chat-v1.0-GGUF/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"

with open(filename, "rb") as f:
    # Load metadata
    info, tensorinfo = gguf.load_gguf(f)

    # Print metadata
    for key, value in info.items():
        print(f"{key:30} {repr(value)[:100]}")

    # Load tensors
    for name in tensorinfo:
        weights = gguf.load_gguf_tensor(f, tensorinfo, name)

        print(name, type(weights), weights.shape)

Testing

For testing, follow these steps:

  1. Install required libraries (only required for testing)
    • pip install tqdm requests safetensors
  2. Run