Skip to content
/ jina Public
forked from jina-ai/serve

Cloud-native neural search framework for ๐™–๐™ฃ๐™ฎ kind of data

License

Notifications You must be signed in to change notification settings

jonochang/jina

ย 
ย 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

Jina logo: Jina is a cloud-native neural search framework

Cloud-Native Neural Search? Framework for Any Kind of Data

Python 3.7 3.8 3.9 PyPI Docker Image Version (latest semver) codecov

Jina is a neural search framework that empowers anyone to build SOTA and scalable deep learning search applications in minutes.

โฑ๏ธ Save time - The design pattern of neural search systems, building a solution in just minutes.

๐ŸŒŒ All data types - Processing, indexing, querying, understanding of video, image, long/short text, music, source code, PDF, etc.

๐ŸŒฉ๏ธ Local & cloud friendly - Distributed architecture, scalable & cloud-native from day one. Same developer experience on both local and cloud.

๐Ÿฑ Own your stack - Keep end-to-end stack ownership of your solution. Avoid integration pitfalls you get with fragmented, multi-vendor, generic legacy tools.

Install

pip install -U jina

More install options including Conda, Docker, on Windows can be found here.

Get Started

Get started with Jina to build production-ready neural search solution via ResNet in less than 20 minutes

We promise you to build a scalable ResNet-powered image search service in 20 minutes or less, from scratch. If not, you can forget about Jina.

Basic Concepts

Document, Executor, and Flow are three fundamental concepts in Jina.

  • Document is the basic data type in Jina;
  • Executor is how Jina processes Documents;
  • Flow is how Jina streamlines and distributes Executors.

Leveraging these three components, let's build an app that find similar images using ResNet50.

ResNet50 Image Search in 20 Lines

๐Ÿ’ก Preliminaries: download dataset, install PyTorch & Torchvision

from jina import DocumentArray, Document

docs = DocumentArray.from_files('img/*.jpg')  # load all image filenames into a DocumentArray
for d in docs:  # preprocess them
    (d.load_uri_to_image_blob()  # load
     .set_image_blob_normalization()  # normalize color 
     .set_image_blob_channel_axis(-1, 0))  # switch color axis

import torchvision
model = torchvision.models.resnet50(pretrained=True)  # load ResNet50
docs.embed(model, device='cuda')  # embed via GPU to speedup

q = (Document(uri='img/00021.jpg')  # build query image & preprocess
     .load_uri_to_image_blob()
     .set_image_blob_normalization()
     .set_image_blob_channel_axis(-1, 0))
q.embed(model)  # embed
q.match(docs)  # find top-20 nearest neighbours, done!

Done! Now print q.matches and you will see most-similar images URIs.

Print q.matches to get visual similar images in Jina using ResNet50

Add 3 lines of code to visualize them:

for m in q.matches:
    m.set_image_blob_channel_axis(0, -1).set_image_blob_inv_normalization()
q.matches.plot_image_sprites()

Visualize visual similar images in Jina using ResNet50

Sweet! FYI, one can use Keras or PaddlePaddle for the embedding model. Jina supports them well.

Scale and Serve in 10 Extra Lines

With an extremely trivial refactoring and 10 extra lines of code, you can make the local script as a ready-to-serve service:

  1. Import what we need.

    from jina import DocumentArray, Executor, Flow, requests
  2. Copy-paste the preprocessing step and wrap it via Executor:

    class PreprocImg(Executor):
        @requests
        def foo(self, docs: DocumentArray, **kwargs):
            for d in docs:
                (d.load_uri_to_image_blob()
                 .set_image_blob_normalization()
                 .set_image_blob_channel_axis(-1, 0))
  3. Copy-paste the embedding step and wrap it via Executor:

    class EmbedImg(Executor):
        @requests
        def foo(self, docs: DocumentArray, **kwargs):
            import torchvision
            model = torchvision.models.resnet50(pretrained=True)
            docs.embed(model)
  4. Wrap the matching step into Executor:

    class MatchImg(Executor):
        _da = DocumentArray()
    
        @requests(on='/index')
        def index(self, docs: DocumentArray, **kwargs):
            self._da.extend(docs)
    
        @requests(on='/search')
        def foo(self, docs: DocumentArray, **kwargs):
            docs.match(self._da)
            for d in docs.traverse_flat('r,m'):  # only require for visualization
                d.convert_uri_to_datauri()  # convert to datauri
                d.pop('embedding', 'blob')  # remove unnecessary fields for save bandwidth
  5. Connect all Executors in a Flow, scale embedding to 3:

    f = Flow(port_expose=12345, protocol='http').add(uses=PreprocImg).add(uses=EmbedImg, replicas=3).add(uses=MatchImg)

    Plot it via f.plot('flow.svg') and you get:

  6. Index image data and serve REST query from public:

    with f:
        f.post('/index', DocumentArray.from_files('img/*.jpg'), show_progress=True, request_size=8)
        f.block()

Done! Now query it via curl you can get most-similar images:

Use curl to query image search service built by Jina & ResNet50

Or go to http://0.0.0.0:12345/docs and test requests via Swagger UI:

Visualize visual similar images in Jina using ResNet50

Or use a Python client to access the service:

from jina import Client, Document
from jina.types.request import Response

def print_matches(resp: Response):  # the callback function invoked when task is done
    for idx, d in enumerate(resp.docs[0].matches):  # print top-3 matches
        print(f'[{idx}]{d.scores["cosine"].value:2f}: "{d.uri}"')

c = Client(protocol='http', port=12345)  # connect to localhost:12345
c.post('/search', Document(uri='img/00021.jpg'), on_done=print_matches)

At this point, you probably have taken 15 minutes but here we are: an image search service with rich features:

โœ… Solution as a service โœ… Scale in/out any component โœ… Query via HTTP/WebSocket/gRPC/Client
โœ… Distribute/Dockerize components โœ… Async/non-blocking I/O โœ… Extendable REST interface

Deploy to Kubernetes on GCP in 7 Minutes

Have another 7 minutes? We can show you how to bring your service to the next level by deploying it to Kubernetes on Google Cloud Platform.

  1. Create a Kubernetes cluster and get credentials:
    gcloud container clusters create test --machine-type e2-highmem-2  --num-nodes 1 --zone europe-west3-a
    gcloud container clusters get-credentials test --zone europe-west3-a --project jina-showcase
  2. Move each Executor class to a separate folder with one Python file:
    • PreprocImg -> ๐Ÿ“ preproc_img/exec.py
    • EmbedImg -> ๐Ÿ“ embed_img/exec.py
    • MatchImg -> ๐Ÿ“ match_img/exec.py
  3. Push all Executors to Jina Hub:
    jina hub push preproc_img
    jina hub push embed_img
    jina hub push embed_img
    You will get three Hub Executors that can be used via Docker container.
  4. Adjust Flow a bit and open it:
    f = Flow(name='readme-flow', port_expose=12345, infrastructure='k8s').add(uses='jinahub+docker://PreprocImg').add(uses='jinahub+docker://EmbedImg', replicas=3).add(uses='jinahub+docker://MatchImg')
    with f:
        f.block()

Intrigued? Then find more about Jina from our docs.

Run Quick Demo

Support

Join Us

Jina is backed by Jina AI and licensed under Apache-2.0. We are actively hiring AI engineers, solution engineers to build the next neural search ecosystem in open source.

Contributing

We welcome all kinds of contributions from the open-source community, individuals and partners. We owe our success to your active involvement.

All Contributors

About

Cloud-native neural search framework for ๐™–๐™ฃ๐™ฎ kind of data

Resources

License

Code of conduct

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Python 97.4%
  • HTML 0.8%
  • Shell 0.6%
  • CSS 0.4%
  • Dockerfile 0.4%
  • EJS 0.2%
  • JavaScript 0.2%