Cloud-Native Neural Search Framework for Any Kind of Data
Jina is a neural search framework that empowers anyone to build SOTA and scalable deep learning search applications in minutes.
⏱️ Save time - The design pattern of neural search systems: build a solution in just minutes.
🌌 All data types - Processing, indexing, querying, and understanding of video, image, long/short text, music, source code, PDF, etc.
🌩️ Local & cloud friendly - Distributed architecture, scalable and cloud-native from day one. Same developer experience locally and in the cloud.
🍱 Own your stack - Keep end-to-end ownership of your solution's stack. Avoid the integration pitfalls of fragmented, multi-vendor, generic legacy tools.
pip install -U jina
More install options, including Conda, Docker, and Windows, are covered in the docs.
We promise you can build a scalable ResNet-powered image search service in 20 minutes or less, from scratch. If not, you can forget about Jina.
Document, Executor, and Flow are three fundamental concepts in Jina.
- Document is the basic data type in Jina;
- Executor is how Jina processes Documents;
- Flow is how Jina streamlines and distributes Executors.
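To make the division of labour concrete, here is a toy sketch in plain Python. It does not use Jina and is not how Jina implements these concepts: `ToyDocument`, `ToyExecutor`, `ToyFlow`, `tag_extension`, and `fake_embed` are hypothetical names invented for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ToyDocument:
    """Hypothetical stand-in for a Document: a piece of data plus derived fields."""
    uri: str
    embedding: Optional[List[float]] = None
    tags: dict = field(default_factory=dict)


class ToyExecutor:
    """Hypothetical stand-in for an Executor: one processing step over Documents."""

    def __init__(self, fn: Callable[[List[ToyDocument]], None]):
        self.fn = fn

    def process(self, docs: List[ToyDocument]) -> None:
        self.fn(docs)  # the step mutates the documents in place


class ToyFlow:
    """Hypothetical stand-in for a Flow: runs Executors in the order they were added."""

    def __init__(self) -> None:
        self.executors: List[ToyExecutor] = []

    def add(self, executor: ToyExecutor) -> "ToyFlow":
        self.executors.append(executor)
        return self  # returning self allows chained .add() calls

    def post(self, docs: List[ToyDocument]) -> List[ToyDocument]:
        for executor in self.executors:
            executor.process(docs)
        return docs


def tag_extension(docs: List[ToyDocument]) -> None:
    """Record each document's file extension as a tag."""
    for d in docs:
        d.tags["ext"] = d.uri.rsplit(".", 1)[-1]


def fake_embed(docs: List[ToyDocument]) -> None:
    """Placeholder 'embedding': the URI length, not a real model output."""
    for d in docs:
        d.embedding = [float(len(d.uri))]


flow = ToyFlow().add(ToyExecutor(tag_extension)).add(ToyExecutor(fake_embed))
result = flow.post([ToyDocument(uri="img/00021.jpg")])
print(result[0].tags, result[0].embedding)
```

The point of the sketch is only the shape: data travels as Documents, each Executor owns one step, and the Flow decides the order in which the steps run.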
Leveraging these three components, let's build an app that finds similar images using ResNet50.
💡 Preliminaries: download the dataset, install PyTorch & Torchvision
from jina import DocumentArray, Document
docs = DocumentArray.from_files('img/*.jpg') # load all image filenames into a DocumentArray
for d in docs:  # preprocess them
    (d.load_uri_to_image_blob()  # load
      .set_image_blob_normalization()  # normalize color
      .set_image_blob_channel_axis(-1, 0))  # switch color axis
import torchvision
model = torchvision.models.resnet50(pretrained=True) # load ResNet50
docs.embed(model, device='cuda')  # embed on the GPU to speed things up
q = (Document(uri='img/00021.jpg')  # build query image & preprocess
     .load_uri_to_image_blob()
     .set_image_blob_normalization()
     .set_image_blob_channel_axis(-1, 0))
q.embed(model) # embed
q.match(docs) # find top-20 nearest neighbours, done!
Done! Now print q.matches and you will see the URIs of the most similar images.
Add 3 lines of code to visualize them:
for m in q.matches:
    m.set_image_blob_channel_axis(0, -1).set_image_blob_inv_normalization()
q.matches.plot_image_sprites()
Sweet! FYI, one can use Keras or PaddlePaddle for the embedding model. Jina supports them well.
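For intuition about what the matching step in the script above does, here is a minimal plain-Python sketch of nearest-neighbour ranking. It assumes cosine distance as the metric (the client example later reads a "cosine" score) and non-zero embedding vectors; `cosine_distance` and `top_k_matches` are hypothetical helpers, not Jina functions, and the two-dimensional embeddings are made up.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k_matches(
    query: Sequence[float],
    index: Dict[str, List[float]],
    k: int = 20,
) -> List[Tuple[str, float]]:
    """Return the k indexed URIs closest to the query, nearest first."""
    scored = [(uri, cosine_distance(query, emb)) for uri, emb in index.items()]
    scored.sort(key=lambda pair: pair[1])  # smaller distance = more similar
    return scored[:k]


# Made-up 2-D embeddings standing in for ResNet50 feature vectors.
index = {
    "img/a.jpg": [1.0, 0.0],
    "img/b.jpg": [0.0, 1.0],
    "img/c.jpg": [1.0, 1.0],
}
matches = top_k_matches([1.0, 0.1], index, k=2)
for uri, dist in matches:
    print(f"{dist:.3f} {uri}")
```

This brute-force scan compares the query against every indexed vector, which is fine for a small image folder.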
With a trivial refactoring and 10 extra lines of code, you can turn the local script into a ready-to-serve service:
- Import what we need.

    from jina import DocumentArray, Executor, Flow, requests
- Copy-paste the preprocessing step and wrap it via Executor:

    class PreprocImg(Executor):
        @requests
        def foo(self, docs: DocumentArray, **kwargs):
            for d in docs:
                (d.load_uri_to_image_blob()
                  .set_image_blob_normalization()
                  .set_image_blob_channel_axis(-1, 0))
- Copy-paste the embedding step and wrap it via Executor:

    class EmbedImg(Executor):
        @requests
        def foo(self, docs: DocumentArray, **kwargs):
            import torchvision
            model = torchvision.models.resnet50(pretrained=True)
            docs.embed(model)
- Wrap the matching step into Executor:

    class MatchImg(Executor):
        _da = DocumentArray()

        @requests(on='/index')
        def index(self, docs: DocumentArray, **kwargs):
            self._da.extend(docs)

        @requests(on='/search')
        def foo(self, docs: DocumentArray, **kwargs):
            docs.match(self._da)
            for d in docs.traverse_flat('r,m'):  # only required for visualization
                d.convert_uri_to_datauri()  # convert to datauri
                d.pop('embedding', 'blob')  # remove unnecessary fields to save bandwidth
- Connect all Executors in a Flow, scale embedding to 3:

    f = (Flow(port_expose=12345, protocol='http')
         .add(uses=PreprocImg)
         .add(uses=EmbedImg, replicas=3)
         .add(uses=MatchImg))
- Index the image data and serve REST queries publicly:

    with f:
        f.post('/index', DocumentArray.from_files('img/*.jpg'), show_progress=True, request_size=8)
        f.block()
Done! Now query it via curl and you get the most similar images:
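An equivalent HTTP request can be sketched with Python's standard library. The JSON body shape (a `data` list holding one Document with a `uri` field) is an assumption rather than something stated in this README, so verify it against the Swagger UI served by the Flow before relying on it. `build_search_request` is a hypothetical helper.

```python
import json
from urllib import request


def build_search_request(
    image_uri: str, host: str = "127.0.0.1", port: int = 12345
) -> request.Request:
    """Build (but do not send) a POST request to the Flow's /search endpoint."""
    payload = {"data": [{"uri": image_uri}]}  # assumed payload shape
    return request.Request(
        url=f"http://{host}:{port}/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_search_request("img/00021.jpg")
print(req.get_method(), req.full_url)

# To actually send it while the Flow is running:
# with request.urlopen(req) as resp:
#     print(json.load(resp))
```

Building the request separately from sending it makes the payload easy to inspect before the service is up.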
Or go to http://0.0.0.0:12345/docs and test requests via Swagger UI:
Or use a Python client to access the service:
from jina import Client, Document
from jina.types.request import Response
def print_matches(resp: Response):  # the callback function invoked when the task is done
    for idx, d in enumerate(resp.docs[0].matches[:3]):  # print top-3 matches
        print(f'[{idx}]{d.scores["cosine"].value:2f}: "{d.uri}"')
c = Client(protocol='http', port=12345) # connect to localhost:12345
c.post('/search', Document(uri='img/00021.jpg'), on_done=print_matches)
At this point, you have probably spent 15 minutes, but here we are: an image search service with rich features:
✅ Solution as a service | ✅ Scale in/out any component | ✅ Query via HTTP/WebSocket/gRPC/Client |
✅ Distribute/Dockerize components | ✅ Async/non-blocking I/O | ✅ Extendable REST interface |
Have another 7 minutes? We can show you how to bring your service to the next level by deploying it to Kubernetes on Google Cloud Platform.
- Create a Kubernetes cluster and get credentials:

    gcloud container clusters create test --machine-type e2-highmem-2 --num-nodes 1 --zone europe-west3-a
    gcloud container clusters get-credentials test --zone europe-west3-a --project jina-showcase
- Move each Executor class to a separate folder with one Python file:
  - PreprocImg -> 📁preproc_img/exec.py
  - EmbedImg -> 📁embed_img/exec.py
  - MatchImg -> 📁match_img/exec.py
- Push all Executors to Jina Hub:

    jina hub push preproc_img
    jina hub push embed_img
    jina hub push match_img

  You will get three Hub Executors that can be used via Docker containers.
- Adjust the Flow a bit and open it:

    f = (Flow(name='readme-flow', port_expose=12345, infrastructure='k8s')
         .add(uses='jinahub+docker://PreprocImg')
         .add(uses='jinahub+docker://EmbedImg', replicas=3)
         .add(uses='jinahub+docker://MatchImg'))
    with f:
        f.block()
Intrigued? Then find out more about Jina in our docs.
- 👗 Fashion image search: jina hello fashion
- 🤖 QA chatbot: pip install "jina[demo]" && jina hello chatbot
- 📰 Multimodal search: pip install "jina[demo]" && jina hello multimodal
- 🍴 Fork the source of a demo to your folder: jina hello fork fashion ../my-proj/
- Join our Slack community to chat with our engineers about your use cases, questions, and support queries.
- Join our Engineering All Hands meet-up to discuss your use case and learn about Jina's new features.
- When? The second Tuesday of every month
- Where? Zoom (see our public calendar/.ical/Meetup group) and live stream on YouTube
- Subscribe to the latest video tutorials on our YouTube channel
Jina is backed by Jina AI and licensed under Apache-2.0. We are actively hiring AI engineers and solution engineers to build the next neural search ecosystem in open source.
We welcome all kinds of contributions from the open-source community, individuals and partners. We owe our success to your active involvement.