
More documentation please #49

Open
jusjosgra opened this issue Feb 25, 2020 · 6 comments
@jusjosgra

This library is excellent and I would really like to use it, so first of all thank you for making it available!

Unfortunately, there isn't enough documentation to make it particularly usable beyond a default deployment. For example, it is not clear how I could:

  1. send a query directly to Elasticsearch through the nboost proxy without reranking (this is useful for evaluating baseline performance)
  2. use my own model; the CLI lets me choose from a list of hosted models, but I would like to load a custom model

I am digging through the codebase to solve these issues, but it would be great if you could document the API, as I am sure these things are straightforward.

Thanks again for the great library.

@Sharathmk99

@jusjosgra were you able to train a custom model and use it with nboost? I'd be interested to know how you did it.

@AAbbhishekk

@jusjosgra Do you have the required data for benchmarking?

@pertschuk
Contributor

pertschuk commented Mar 1, 2020

@jusjosgra Thank you for your positive feedback - I will try to improve the documentation.

Essentially, the way it works is that there is a set of config variables (which can be viewed by running nboost --help). These config variables are set in the following hierarchy:

args = {**cli_args, **json_args, **query_args}

i.e. JSON args (in the body of the request) overwrite the CLI config args, and query args (set as query params) overwrite both.
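
To make the precedence concrete, here is a minimal sketch of the merge; the values are placeholders, not real defaults:

# Later dicts win in a merge, so query params override JSON-body args,
# which in turn override CLI args. Values here are placeholders.
cli_args   = {"no_rerank": False}   # set when the proxy is deployed
json_args  = {}                     # taken from the request body's "nboost" key
query_args = {"no_rerank": True}    # taken from the URL query params

args = {**cli_args, **json_args, **query_args}
assert args["no_rerank"] is True    # the query param wins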

As for these two specific questions:
1) As per above, you can add { "nboost": { "no_rerank": true } } to the JSON body (or add it as a query param).
2) You can point --model_dir at the relative directory of a PyTorch transformers model (or a TensorFlow .ckpt model). If loading a custom model, you also need to specify a --model_cls with which to load it, which in the case of PyTorch transformers would be PtBertRerankModelPlugin.

Nboost expects a binary classifier model trained with 0 = not relevant to the search query and 1 = relevant.
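
If it helps, here is a rough sketch of exporting such a model with Hugging Face transformers; the directory layout is an assumption (standard save_pretrained output), so check the PtBertRerankModelPlugin source for what it actually expects:

# Rough sketch: fine-tune and export a binary relevance classifier to
# point --model_dir at. Assumes the plugin loads a standard transformers
# checkpoint directory; verify against the plugin source.
from transformers import BertForSequenceClassification, BertTokenizer

model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=2,  # label 0 = not relevant, label 1 = relevant
)
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# ... fine-tune on (query, passage) pairs labeled 0/1 here ...

model.save_pretrained("./my_reranker")   # directory to pass as --model_dir
tokenizer.save_pretrained("./my_reranker")

The proxy would then be started with something like nboost --model_dir ./my_reranker --model_cls PtBertRerankModelPlugin, per the flags described above.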

@jusjosgra
Author

Thanks so much for your thorough response!

For no_rerank, it appears that this is specified at proxy deploy time; is there a way to pass it at query time to skip the reranking step dynamically?

@pertschuk
Contributor

pertschuk commented Mar 3, 2020

@jusjosgra all of these flags work both at deploy time and dynamically, as you can include them in the JSON body or query params of the request to override them.

For example, if the request is a POST with the following JSON body:

{
   "query" : "This is a search",
   "nboost" : { 
        "no_rerank" : true
   }
}

or, if it is a GET request, simply append it to the URL: http://nboost_host:8000?q=query&no_rerank=True
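
For reference, here is a hedged example of both forms using Python's requests library; the host, port, and index path are placeholders for your own setup:

import requests

proxy = "http://localhost:8000"  # placeholder: wherever the nboost proxy listens

# POST: override no_rerank inside the request's JSON body
resp = requests.post(
    proxy + "/my_index/_search",   # placeholder index path
    json={
        "query": "This is a search",
        "nboost": {"no_rerank": True},
    },
)

# GET: override no_rerank as a URL query parameter
resp = requests.get(proxy, params={"q": "query", "no_rerank": "True"})
print(resp.status_code)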

@pidugusundeep

pidugusundeep commented Mar 5, 2020

Hey @pertschuk, yes, I was able to do that, but it still returns the same results both with no_rerank=True and without. I am also curious why it returns the nboost scores if no_rerank is set to true, such as below:

nboost": {
        "scores":{
               "........................."
        }
}
