SOLR-16667 #4
base: main
…(not in the right order with respect to FeaturesInfo)
…n_for_feature_vector_cache
…Id in feature vector cache
…hich are empty if only logging is required)
…ingQuery before using it
Some minor comments, but the overall pull request looks OK!
public abstract String makeFeatureVector(LTRScoringQuery.FeatureInfo[] featuresInfo);
private static int fvCacheKey(LTRScoringQuery scoringQuery, int docid) {
The generation of the cache key has been moved out of this class. What is the motivation?
This method is no longer invoked inside the Logger (see row 81 below), so I moved it to where it is called (in LTRScoringQuery).
Also, the query part of the key is now computed when creating the LTRScoringQuery and stored as a private variable in the LTRScoringQuery object.
https://issues.apache.org/jira/browse/SOLR-16667
Description
Currently, the feature vector cache is used only for logging purposes in the learning-to-rank model.
It would be useful to integrate the cache usage also in the reranking phase to speed up the process.
Solution
A new learning-to-rank feature vector cache has been added to speed up the reranking process. Before this contribution, there was a single LTR cache, used only for logging. That cache has been removed, and a single new cache now serves both logging and reranking.
Currently, each entry in the new cache is keyed by the feature store name, the feature definitions (in the feature store), and the document id.
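As a rough illustration of how such a key could be built, here is a minimal sketch. It assumes the query-dependent part (feature store name and feature definitions) is hashed once and then combined with each document id, as the review discussion above suggests. The class `FvCacheKeySketch`, its constructor parameters, and `keyFor` are hypothetical names, not the code in this pull request.

```java
import java.util.Objects;

// Hypothetical sketch: names and hashing scheme are assumptions, not the PR's code.
final class FvCacheKeySketch {
  // Query-dependent part of the key, computed once per scoring query.
  private final int queryPartOfKey;

  FvCacheKeySketch(String featureStoreName, String featuresDefinition) {
    this.queryPartOfKey = Objects.hash(featureStoreName, featuresDefinition);
  }

  // Combines the precomputed query part with the document id.
  int keyFor(int docId) {
    return 31 * queryPartOfKey + docId;
  }

  public static void main(String[] args) {
    FvCacheKeySketch key = new FvCacheKeySketch("myStore", "featureA,featureB");
    System.out.println(key.keyFor(7) == key.keyFor(7)); // same inputs, same key
    System.out.println(key.keyFor(7) != key.keyFor(8)); // different documents, different keys
  }
}
```

Precomputing the query part avoids rehashing the feature definitions for every document. Note that an `int` key can collide across different queries, so a real implementation has to decide whether that risk is acceptable.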
The cache is defined in the Solr config as:
Then, the cache is used in org.apache.solr.ltr.LTRScoringQuery.ModelWeight.ModelScorer.FeatureTraversalScorer#fillFeaturesInfo
If no hit happens in the cache, the old behavior is maintained and the feature vector is calculated from scratch.
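The hit-or-compute flow described here can be sketched as follows. This is a simplified illustration that uses a plain `HashMap` in place of the Solr cache and assumes the freshly computed vector is stored on a miss. `FvCacheLookupSketch`, `featureVector`, and `computeFromScratch` are hypothetical names and do not reflect the actual `fillFeaturesInfo` implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical sketch of a cache lookup with fallback to full computation.
final class FvCacheLookupSketch {
  private final Map<Integer, float[]> cache = new HashMap<>();
  int computations = 0; // counts how often the fallback path runs

  float[] featureVector(int cacheKey, int docId, IntFunction<float[]> computeFromScratch) {
    float[] cached = cache.get(cacheKey);
    if (cached != null) {
      return cached; // cache hit: skip feature extraction
    }
    // Cache miss: old behavior, compute the feature vector from scratch.
    computations++;
    float[] computed = computeFromScratch.apply(docId);
    cache.put(cacheKey, computed); // assumption: the result is stored for later hits
    return computed;
  }

  public static void main(String[] args) {
    FvCacheLookupSketch sketch = new FvCacheLookupSketch();
    IntFunction<float[]> extractor = docId -> new float[] {docId, 1.0f};
    sketch.featureVector(100, 7, extractor); // miss, computes
    sketch.featureVector(100, 7, extractor); // hit, reuses
    System.out.println(sketch.computations);
  }
}
```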
A change has also been made in org.apache.solr.ltr.model.LinearModel#score and org.apache.solr.ltr.model.NeuralNetworkModel.DefaultLayer#calculateOutput so that they can handle NaN values.
When asking for a sparse/dense feature vector format, we would like to:
To apply this behavior we need to differentiate between a "default" value and a computed value that is equal to the default.
Suppose we have a boolean feature. If the feature value is not defined, the default value is assigned (zero, with no computation done), but zero is also the value produced when the feature is false (in which case the computation is done).
How can the two cases be differentiated?
The user can differentiate the two cases by defining NaN as the default value of that feature. In this way they will see:
Hence the need to handle these NaN values in the linear model and in the neural network model (their behavior has not been changed).
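To make the idea concrete, here is a minimal sketch of a linear score that tolerates NaN feature values. It assumes that a NaN value contributes nothing to the score, which is one way to keep the model output unchanged. The description does not spell out the exact handling, so treat this as an assumption. `NaNAwareLinearScoreSketch`, `score`, and `wasComputed` are hypothetical names, not the code in `LinearModel#score`.

```java
// Hypothetical sketch: NaN handling shown here is an assumption, not the PR's code.
final class NaNAwareLinearScoreSketch {

  // Weighted sum of feature values, skipping features whose value is NaN.
  static float score(float[] weights, float[] featureValues) {
    float score = 0f;
    for (int i = 0; i < weights.length; i++) {
      float value = featureValues[i];
      if (!Float.isNaN(value)) {
        score += weights[i] * value;
      }
    }
    return score;
  }

  // With NaN as the default, a non-NaN value means the feature was actually computed.
  static boolean wasComputed(float value) {
    return !Float.isNaN(value);
  }

  public static void main(String[] args) {
    float[] weights = {1f, 2f, 3f};
    float[] values = {1f, Float.NaN, 2f}; // second feature was never computed
    System.out.println(score(weights, values));
    System.out.println(wasComputed(0f));        // computed and false
    System.out.println(wasComputed(Float.NaN)); // not computed
  }
}
```

Without the `Float.isNaN` check, a single NaN would propagate through the sum and turn the whole score into NaN.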
Tests
A test has been added to check that the feature vectors of the results returned after a cache hit are the same as those computed from scratch: org/apache/solr/ltr/TestFeatureVectorCache.java
A test has been added to check the new sparse/dense format behavior: org.apache.solr.ltr.feature.TestFeatureLogging#testDefaultNaNFeatureExtraction
Some tests have been changed to correctly match the default format (sparse or dense) chosen when starting up the test: org/apache/solr/ltr/feature/TestFieldValueFeature.java