Demo video: CLIP-Finder.mp4
CLIP-Finder is an iOS application that leverages advanced AI models to perform image similarity searches. It uses two CoreML models optimized for the Apple Neural Engine, ensuring efficient on-device processing. The app lets users search their photo gallery using natural language descriptions or live camera input. All searches run entirely offline, and the app provides a user-friendly interface for searching and profiling while taking full advantage of Apple's on-device AI capabilities.
This project is based on Apple's MobileCLIP architecture; details of the architecture can be found in the MobileCLIP paper. The selected subarchitecture is MobileCLIP-S0, whose image/text encoder latencies are consistent with those reported by the authors. The image encoder is based on FastViT with some minor modifications. CLIP-Finder implements two search approaches on top of this architecture: text-to-image search and camera-based image-to-image search.
- Text-based image search
- Image-based search using iPhone's front or rear camera
- Two CoreML models optimized for the Apple Neural Engine:
  - CLIP Image Model
  - CLIP Text Model
- GPU-accelerated image preprocessing using MPSGraph
- Similarity calculation using dot product in MPSGraph
- Model profiling for performance analysis across different compute units
- Cache management for optimized performance
- Tokenizer: a Swift implementation based on the Python tokenizer from open_clip
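To make the flow concrete, here is a minimal, hypothetical sketch of the text-search path: the query is tokenized with the Swift tokenizer, encoded by the CLIP text model into an embedding, and that embedding is later compared against the cached image embeddings. The protocol name `CLIPTokenizing`, the feature names `input_text`/`embedding`, and the 77-token context length are illustrative assumptions, not the repository's actual API.

```swift
import CoreML

/// Minimal interface assumed for the Swift tokenizer port (hypothetical).
protocol CLIPTokenizing {
    /// Returns exactly `contextLength` token IDs (padded/truncated as needed).
    func encode(_ text: String, contextLength: Int) -> [Int32]
}

/// Hypothetical sketch of the text-search path. Feature names and shapes are assumptions.
final class TextSearch {
    private let textModel: MLModel
    private let tokenizer: CLIPTokenizing

    init(textModel: MLModel, tokenizer: CLIPTokenizing) {
        self.textModel = textModel
        self.tokenizer = tokenizer
    }

    /// Encodes a natural-language query into a CLIP text embedding.
    func embedding(for query: String) throws -> MLMultiArray {
        // 1. Tokenize to a fixed-length sequence of token IDs.
        let tokenIDs = tokenizer.encode(query, contextLength: 77)

        // 2. Pack the IDs into the MLMultiArray shape the CoreML text model expects.
        let input = try MLMultiArray(shape: [1, 77], dataType: .int32)
        for (i, id) in tokenIDs.enumerated() {
            input[i] = NSNumber(value: id)
        }

        // 3. Run the CLIP text model; CoreML dispatches to the Neural Engine when available.
        let features = try MLDictionaryFeatureProvider(dictionary: ["input_text": input])
        let output = try textModel.prediction(from: features)

        // 4. The embedding is later scored against the cached image embeddings
        //    via a dot product (see the MPSGraph sketch further below).
        guard let embedding = output.featureValue(for: "embedding")?.multiArrayValue else {
            throw NSError(domain: "TextSearch", code: 1, userInfo: nil)
        }
        return embedding
    }
}
```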
To experience these new features firsthand, join the TestFlight program.
The main components of CLIP-Finder are:

- AI Models:
  - CLIP Image Model (CoreML, optimized for the Apple Neural Engine)
  - CLIP Text Model (CoreML, optimized for the Apple Neural Engine)
- Image Processing:
  - Preprocessing: utilizes MPSGraph for efficient GPU-based image preparation
  - Postprocessing: employs MPSGraph for similarity calculations using dot product and selects the photos with the highest similarity scores (see the sketch after this list)
- User Interface:
  - Main view for search operations
  - Settings view for cache management and model profiling
- Data Management:
  - Core Data for efficient storage and retrieval of processed image features
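Assuming the image and text embeddings are L2-normalized (as is typical for CLIP-style models), cosine similarity reduces to a dot product, which MPSGraph can evaluate on the GPU as a single matrix multiplication against all cached image embeddings at once. The function below is a minimal sketch of that postprocessing idea, not the app's actual implementation; embedding dimensionality and data layout are assumptions.

```swift
import Metal
import MetalPerformanceShadersGraph

/// Minimal sketch: scores one text embedding against N cached image embeddings
/// with a single GPU matrix multiplication.
func similarityScores(textEmbedding: [Float],        // [dim]
                      imageEmbeddings: [Float],      // [dim * N], column per image
                      imageCount: Int) -> [Float] {
    let dim = textEmbedding.count
    let graph = MPSGraph()

    // Placeholders: a [1, dim] query and a [dim, N] matrix of image embeddings,
    // so the matmul produces one similarity score per image.
    let query = graph.placeholder(shape: [1, NSNumber(value: dim)],
                                  dataType: .float32, name: "query")
    let gallery = graph.placeholder(shape: [NSNumber(value: dim), NSNumber(value: imageCount)],
                                    dataType: .float32, name: "gallery")
    let scores = graph.matrixMultiplication(primary: query, secondary: gallery, name: "scores")

    // Wrap the Swift arrays as MPSGraphTensorData and run the graph on the GPU.
    let device = MPSGraphDevice(mtlDevice: MTLCreateSystemDefaultDevice()!)
    let queryData = MPSGraphTensorData(
        device: device,
        data: textEmbedding.withUnsafeBytes { Data(bytes: $0.baseAddress!, count: $0.count) },
        shape: [1, NSNumber(value: dim)], dataType: .float32)
    let galleryData = MPSGraphTensorData(
        device: device,
        data: imageEmbeddings.withUnsafeBytes { Data(bytes: $0.baseAddress!, count: $0.count) },
        shape: [NSNumber(value: dim), NSNumber(value: imageCount)], dataType: .float32)

    let results = graph.run(feeds: [query: queryData, gallery: galleryData],
                            targetTensors: [scores], targetOperations: nil)

    // Copy the [1, N] result back into a Swift array; the caller keeps the top-K indices.
    var output = [Float](repeating: 0, count: imageCount)
    results[scores]!.mpsndarray().readBytes(&output, strideBytes: nil)
    return output
}
```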
Asynchronous Image Prediction (Turbo Mode)
CLIP-Finder now includes an experimental asynchronous image prediction feature, also known as "Turbo Mode". This feature can be activated through a button in the camera interface.
- Faster image processing: Turbo Mode enables asynchronous camera prediction, potentially speeding up the image search process.
- Activation: To activate, tap the "Turbo" button in the lower right corner of the camera interface.
- For more information on asynchronous prediction in Core ML, refer to this WWDC 2023 session: Improve Core ML integration with async prediction
⚠️ WARNING: Turbo Mode is faster but may cause the app to freeze momentarily. Use with caution.
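For reference, the async prediction API that Turbo Mode relies on boils down to awaiting `MLModel`'s asynchronous `prediction(from:options:)` (iOS 17+), so camera frames can be scored without blocking the capture pipeline. The sketch below is a simplified illustration under assumed feature names (`input_image`, `embedding`); the actual task handling in CLIP-Finder may differ.

```swift
import CoreML
import CoreVideo

/// Simplified sketch of asynchronous camera prediction ("Turbo Mode"):
/// frames are scored concurrently instead of serially blocking the camera queue.
final class AsyncImagePredictor {
    private let imageModel: MLModel

    init(imageModel: MLModel) {
        self.imageModel = imageModel
    }

    /// Runs the CLIP image model without blocking the caller (iOS 17+ async API).
    /// "input_image" / "embedding" are assumed feature names.
    func embedding(for pixelBuffer: CVPixelBuffer) async throws -> MLMultiArray? {
        let input = try MLDictionaryFeatureProvider(
            dictionary: ["input_image": MLFeatureValue(pixelBuffer: pixelBuffer)])
        // Core ML manages the concurrency of overlapping in-flight predictions.
        let output = try await imageModel.prediction(from: input, options: MLPredictionOptions())
        return output.featureValue(for: "embedding")?.multiArrayValue
    }
}
```

If the momentary freezes become a problem, a common mitigation when experimenting with async prediction is to cap the number of in-flight requests (for example with a semaphore or a serial task queue) rather than submitting every camera frame.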
This section describes the available variations of the Core ML packages. These packages provide different levels of performance and accuracy, suitable for a variety of applications.
The Core ML packages are available at: 🤗 MobileCLIP on HuggingFace.
This section provides details on the scripts used to convert the models to the CoreML format. The scripts are available as Jupyter Notebooks in the repository:
- One notebook demonstrates the process of converting the CLIP image model to CoreML format.
- A second notebook demonstrates the process of converting the CLIP text model to CoreML format.
- Text Search: Enter descriptive text to find matching images in your gallery
- Image Search: Use either the front or rear camera of your iPhone to capture an image and find similar photos
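For the camera-based path, one possible wiring taps frames from an `AVCaptureVideoDataOutput` and hands each pixel buffer to a predictor (such as the asynchronous sketch shown earlier); the resulting embedding is then scored against the gallery exactly like a text query. The class and method names below are illustrative assumptions, not the app's actual capture code.

```swift
import AVFoundation

/// Illustrative capture delegate for camera-based search: every frame's pixel
/// buffer is forwarded to a predictor that produces a CLIP image embedding.
final class CameraFrameTap: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    /// Called with each captured frame; hook this up to the image model.
    var onFrame: ((CVPixelBuffer) -> Void)?

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
        onFrame?(pixelBuffer)
    }

    /// Minimal session wiring; `position` selects the front or rear camera.
    static func makeSession(position: AVCaptureDevice.Position,
                            delegate: CameraFrameTap,
                            queue: DispatchQueue) throws -> AVCaptureSession {
        let session = AVCaptureSession()
        guard let camera = AVCaptureDevice.default(.builtInWideAngleCamera,
                                                   for: .video, position: position) else {
            throw NSError(domain: "CameraFrameTap", code: 1, userInfo: nil)
        }
        let input = try AVCaptureDeviceInput(device: camera)
        let output = AVCaptureVideoDataOutput()
        output.setSampleBufferDelegate(delegate, queue: queue)
        guard session.canAddInput(input), session.canAddOutput(output) else {
            throw NSError(domain: "CameraFrameTap", code: 2, userInfo: nil)
        }
        session.addInput(input)
        session.addOutput(output)
        return session
    }
}
```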
- iOS 17 or later
- iPhone with A12 Bionic chip or later (for optimal AI model performance on the Neural Engine)
- It is recommended to turn off Low Power Mode for optimal performance, especially to fully utilize the Apple Neural Engine.
- The app requires permission to access your photo gallery and camera.
CLIP-Finder implements efficient batch processing to handle large photo galleries:
- On app launch, the entire photo gallery is preprocessed using the Neural Engine with a batch size of 512 photos. This approach significantly speeds up the initial processing time.
- When new photos are added to the device's gallery, CLIP-Finder detects and processes only the new images upon the next app launch. This incremental processing ensures that the app stays up-to-date with the latest additions to your photo library without redundant calculations.
- Similarly, if photos are deleted from the device, CLIP-Finder updates its database accordingly during the next app launch. This cleanup process maintains the accuracy of the search results and optimizes storage usage.
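One way to implement this bookkeeping is to diff the gallery's `PHAsset` local identifiers against the identifiers whose embeddings are already cached (for example in Core Data): new identifiers get preprocessed, stale ones are purged. The sketch below assumes a simple string-set cache and does not show the app's actual Core Data entities.

```swift
import Photos

/// Sketch of the launch-time sync: find which photos are new and which were deleted,
/// given the set of asset identifiers whose embeddings are already cached.
struct GalleryDiff {
    let newAssets: [PHAsset]         // need preprocessing (in batches)
    let deletedIdentifiers: [String] // cached embeddings to purge
}

func diffGallery(cachedIdentifiers: Set<String>) -> GalleryDiff {
    let options = PHFetchOptions()
    options.sortDescriptors = [NSSortDescriptor(key: "creationDate", ascending: true)]
    let fetch = PHAsset.fetchAssets(with: .image, options: options)

    var currentIdentifiers = Set<String>()
    var newAssets: [PHAsset] = []
    fetch.enumerateObjects { asset, _, _ in
        currentIdentifiers.insert(asset.localIdentifier)
        if !cachedIdentifiers.contains(asset.localIdentifier) {
            newAssets.append(asset)   // not yet embedded: schedule for preprocessing
        }
    }

    // Anything cached but no longer in the library was deleted on the device.
    let deleted = cachedIdentifiers.subtracting(currentIdentifiers)
    return GalleryDiff(newAssets: newAssets, deletedIdentifiers: Array(deleted))
}
```

The new assets can then be embedded in chunks (for example 512 at a time, matching the launch-time batch size), and the deleted identifiers removed from the Core Data store.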
Refer to this blog post for more details: 🤗 Hugging Face Blog
The Settings view provides two main functions:
- Clear Cache: Removes all preprocessed image data to free up storage space
- Model Profiler: Runs a performance analysis on both CoreML models across different computational units (CPU, GPU, Neural Engine, and combinations), allowing you to see the performance benefits of the Apple Neural Engine
This is the view of the app's built-in CoreML profiler.
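Under the hood, this kind of profiling can be implemented by reloading a model with each `MLModelConfiguration.computeUnits` setting and timing repeated predictions. The helper below is a hedged sketch of that idea, not the profiler's actual implementation; `modelURL` is assumed to point at a compiled `.mlmodelc`.

```swift
import CoreML
import Foundation

/// Rough sketch: measure average prediction latency (in ms) for each compute-unit option.
func profile(modelURL: URL, input: MLFeatureProvider, iterations: Int = 20) throws -> [String: Double] {
    let configurations: [(String, MLComputeUnits)] = [
        ("CPU only", .cpuOnly),
        ("CPU + GPU", .cpuAndGPU),
        ("CPU + Neural Engine", .cpuAndNeuralEngine),
        ("All", .all),
    ]

    var results: [String: Double] = [:]
    for (label, units) in configurations {
        let config = MLModelConfiguration()
        config.computeUnits = units
        let model = try MLModel(contentsOf: modelURL, configuration: config)

        _ = try model.prediction(from: input)        // warm-up: first call loads/compiles
        let start = CFAbsoluteTimeGetCurrent()
        for _ in 0..<iterations {
            _ = try model.prediction(from: input)
        }
        let elapsed = CFAbsoluteTimeGetCurrent() - start
        results[label] = elapsed / Double(iterations) * 1000.0
    }
    return results
}
```

Comparing the resulting timings across compute units is what makes the Neural Engine's advantage visible in the profiler view.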
This project is based on the MobileCLIP architecture, which is licensed under the MIT License. Acknowledgment is extended to the authors of the paper, Pavan Kumar Anasosalu Vasu, Hadi Pouransari, Fartash Faghri, Raviteja Vemulapalli, and Oncel Tuzel, for their valuable contributions.
This project is licensed under the MIT License. See the LICENSE file for more details.