Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FSTORE-980] Helper, primary key and event time columns with feature view #1111

Merged
merged 44 commits into from
Nov 21, 2023

Conversation

davitbzh
Copy link
Contributor

@davitbzh davitbzh commented Sep 7, 2023

This PR adds/fixes/changes...

  • please summarize your changes to the code
  • and make sure to include all changes to user-facing APIs

JIRA Issue: -

Priority for Review: -

Related PRs: -

How Has This Been Tested?

  • Unit Tests
  • Integration Tests
  • Manual Tests on VM

Checklist For The Assigned Reviewer:

- [ ] Checked if merge conflicts with master exist
- [ ] Checked if stylechecks for Java and Python pass
- [ ] Checked if all docstrings were added and/or updated appropriately
- [ ] Ran spellcheck on docstring
- [ ] Checked if guides & concepts need to be updated
- [ ] Checked if naming conventions for parameters and variables were followed
- [ ] Checked if private methods are properly declared and used
- [ ] Checked if hard-to-understand areas of code are commented
- [ ] Checked if tests are effective
- [ ] Built and deployed changes on dev VM and tested manually
- [x] (Checked if all type annotations were added and/or updated appropriately)

@davitbzh davitbzh added the WIP This issue or pull request is a work in progress label Sep 7, 2023
@davitbzh davitbzh requested a review from kennethmhc September 7, 2023 19:32
@davitbzh davitbzh changed the title [FSTORE-980] Primary key and event time columns with feature view [FSTORE-980] Helper, primary key and event time columns with feature view Oct 17, 2023
@davitbzh davitbzh requested a review from SirOibaf October 18, 2023 08:11
@davitbzh davitbzh removed the WIP This issue or pull request is a work in progress label Oct 18, 2023
"""
if (
self._batch_vectors_server is None
or self._single_vector_server._helper_column_prepared_statements is None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe initialise _helper_column_prepared_statements by default? My concern is when calling init_serving again, it may interrupt the concurrent call to get_feature_vector.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

self, entries, return_type=None, passed_features=[], allow_missing=False
):
"""Assembles serving vector from online feature store."""
return result_dict, serving_vector
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

result_dict only stores the result of the last prepared statement. same for the batch case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understood. I just put this in a function to not duplicate the code. It must do the same as here https://github.com/logicalclocks/feature-store-api/blob/master/python/hsfs/core/vector_server.py#L248

Am I missing something?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here
result_dict store the value of transformed serving_vector
result_dict = self._apply_transformation(serving_vector)

but in the code above

for row in result_proxy:
                    result_dict = self.deserialize_complex_features(
                        self._complex_features, row._asdict()
                    )
                    serving_vector.update(result_dict)

the last assignment to result_dict is inside a for loop.

Copy link
Contributor Author

@davitbzh davitbzh Oct 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes but _vector_result is just a copy of from here to here. That was the whole idea. I am still unable to spot the difference

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is exactly the same code here

                for row in result_proxy:
                    result_dict = self.deserialize_complex_features(
                        self._complex_features, row._asdict()
                    )
                    serving_vector.update(result_dict)

entries, self._helper_column_prepared_statements
)

# drop serving key names
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why drop serving key here even though they are requested by users?

@davitbzh davitbzh requested a review from kennethmhc October 25, 2023 22:07
@@ -315,14 +319,14 @@ def get_feature_vectors(
def get_inference_helper(self, entry, return_type):
"""Assembles serving vector from online feature store."""

result_dict, _ = self._vector_result(
serving_vector = self._vector_result(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to filter out features which are not helper columns?

Copy link
Contributor

@kennethmhc kennethmhc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@SirOibaf SirOibaf enabled auto-merge (squash) November 21, 2023 23:23
@SirOibaf SirOibaf merged commit 6a6ab76 into logicalclocks:master Nov 21, 2023
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants