Replication and efficient inference for the PaliGemma model, a state-of-the-art vision-language model combining a SigLIP vision encoder with a Gemma language decoder. Optimized for both CPU and Apple Silicon (MPS) devices.
## Features

- Efficient inference with automatic device selection (MPS/CPU)
- Advanced optimizations for both CPU and MPS (see the sketch after this list):
  - **MPS (Apple Silicon) Optimizations:**
    - Automatic mixed precision (float16)
    - Metal-specific memory layout optimizations
    - Optimized memory format for Metal Performance Shaders
    - Automatic fallback to CPU if MPS is unavailable
  - **CPU Optimizations:**
    - Dynamic quantization of linear layers (int8)
    - Optimized memory layout
    - CPU-specific automatic mixed precision
    - Inference mode optimizations
- Support for image and text inputs
- Customizable inference parameters
- Efficient memory management with proper context handling
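The following is a minimal sketch of how these device-specific optimizations can be wired together in PyTorch. The function name and structure are illustrative assumptions, not this repository's actual API:

```python
import torch

def run_optimized(model: torch.nn.Module, inputs: dict, device: torch.device):
    """Apply the device-specific optimizations listed above, then run one
    forward pass. Assumes the tensors in `inputs` are already on `device`."""
    model.eval()
    # channels_last improves memory locality for the convolutional patch
    # embedding in the vision tower (a no-op for non-4D weights).
    model = model.to(memory_format=torch.channels_last)

    if device.type == "mps":
        # Mixed precision on Apple Silicon: run the weights in float16.
        model = model.to(device=device, dtype=torch.float16)
    else:
        # CPU path: dynamically quantize linear layers to int8.
        model = torch.ao.quantization.quantize_dynamic(
            model, {torch.nn.Linear}, dtype=torch.qint8
        )

    # inference_mode disables autograd bookkeeping for maximum throughput.
    with torch.inference_mode():
        if device.type == "cpu":
            # CPU automatic mixed precision uses bfloat16.
            with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
                return model(**inputs)
        return model(**inputs)
```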
## Installation

- Clone the repository:

  ```bash
  git clone https://github.com/codingwithsurya/PaliGemma-Inference-Pipeline.git
  cd PaliGemma-Inference-Pipeline
  ```

- Install the required packages:

  ```bash
  pip install -r requirements.txt
  ```

- Set up your Hugging Face token (see the snippet after this list for how it can be loaded at runtime):
  - Copy `.env.template` to `.env`: `cp .env.template .env`
  - Edit `.env` and replace `your_token_here` with your Hugging Face token. You can generate a token from your Hugging Face settings page.
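At runtime the token can be read from `.env` with `python-dotenv`. A minimal sketch, assuming the variable in `.env.template` is named `HF_TOKEN` (check the template for the actual key):

```python
import os

from dotenv import load_dotenv        # pip install python-dotenv
from huggingface_hub import login

load_dotenv()                         # load key=value pairs from .env
# "HF_TOKEN" is an assumed key name; use whatever .env.template defines.
login(token=os.environ["HF_TOKEN"])   # authenticate for gated model downloads
```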
## Usage

Run inference with the following commands:

```bash
# For MPS (Apple Silicon GPU) - recommended for Mac users
python inference.py --prompt "Describe this image" --image_file_path "path/to/your/image.jpg"

# For CPU-only inference
python inference.py --prompt "Describe this image" --image_file_path "path/to/your/image.jpg" --only_cpu

# Using the sample dog image
python inference.py --prompt "Describe this image in detail" --image_file_path dog.jpg --max_tokens_to_generate 300
```

- `--prompt`: The text prompt for the model
- `--image_file_path`: Path to the input image
- `--only_cpu`: Force CPU-only inference (default: False; MPS is used if available)
- `--max_tokens_to_generate`: Maximum number of tokens to generate (default: 300)
- `--temperature`: Sampling temperature (default: 0.7)
- `--top_p`: Top-p sampling parameter (default: 0.9)
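For reference, these flags map onto a standard `argparse` parser roughly as sketched below; the script's actual definitions may differ in help text and validation:

```python
import argparse

parser = argparse.ArgumentParser(description="PaliGemma inference")
parser.add_argument("--prompt", type=str, required=True)
parser.add_argument("--image_file_path", type=str, required=True)
parser.add_argument("--only_cpu", action="store_true")  # default: False
parser.add_argument("--max_tokens_to_generate", type=int, default=300)
parser.add_argument("--temperature", type=float, default=0.7)
parser.add_argument("--top_p", type=float, default=0.9)
args = parser.parse_args()
```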
## Technical Details

This project leverages several advanced deep learning concepts and optimizations:

- **Architecture:**
  - Vision Transformer (ViT) for image processing
  - Transformer decoder with multi-head attention
  - Rotary positional embeddings (sketched after this list)
  - Grouped-query attention (sketched after this list)
- **Optimizations:**
  - Device-specific optimizations (MPS/CPU)
  - Automatic mixed precision inference
  - Dynamic quantization
  - Optimized memory layouts
  - Inference mode optimizations
  - Proper context management for optimal performance
- **Memory Management:**
  - Efficient tensor operations
  - Automatic device selection and fallback
  - Optimized memory formats for each device
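To make the architecture bullets concrete, here is a minimal PyTorch sketch of rotary positional embeddings. It uses the interleaved-pair convention for clarity; Gemma's actual implementation details (rotation convention, base frequency) may differ:

```python
import torch

def rotary_embed(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate pairs of feature dims by position-dependent angles.

    x: (batch, seq_len, n_heads, head_dim); head_dim must be even.
    """
    _, seq_len, _, head_dim = x.shape
    # One frequency per feature pair: base^(-2i/d).
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    angles = torch.outer(torch.arange(seq_len).float(), inv_freq)  # (seq, d/2)
    cos = angles.cos()[None, :, None, :]   # broadcast over batch and heads
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., 0::2], x[..., 1::2]    # even/odd feature pairs
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin   # standard 2-D rotation per pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```

And grouped-query attention, where several query heads share one key/value head, can be sketched as:

```python
import torch.nn.functional as F

def grouped_query_attention(q, k, v, n_kv_heads: int):
    """q: (batch, n_q_heads, seq, head_dim); k, v: (batch, n_kv_heads, seq, head_dim)."""
    group_size = q.shape[1] // n_kv_heads
    # Each key/value head serves group_size query heads.
    k = k.repeat_interleave(group_size, dim=1)
    v = v.repeat_interleave(group_size, dim=1)
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)
```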
## Device Selection

The implementation automatically selects the best available device:

- On Apple Silicon Macs: uses MPS (Metal Performance Shaders) for GPU acceleration
- On other systems: falls back to optimized CPU inference with dynamic quantization
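A minimal sketch of this selection logic, with an assumed helper name (`pick_device`) rather than the repository's actual function:

```python
import torch

def pick_device(only_cpu: bool = False) -> torch.device:
    # Prefer Apple's Metal backend when present; otherwise (or when the
    # --only_cpu flag is set) fall back to the CPU path.
    if not only_cpu and torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")
```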
## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.