Feature store marketplace operator #441
Merged
125 commits
1109802
bug fix
najiyacl 3fbcda9
init operator
harsh97 22553e3
scaffolding changes
harsh97 f322656
changes
harsh97 51969cf
Complete scaffolding
harsh97 dd143ea
Implement ads init and ads run for marketplace operator
amitkrprajapati ee6c6a7
Updating schema for helidon
harsh97 ddcff7c
Review changes
harsh97 10a57fa
Updating metastore id
harsh97 a53d7d5
Added a way to export helm chart from marketplace
amitkrprajapati 3807662
License checker
harsh97 d6ddf7c
adeded release notes
KshitizLohia 53bb9c3
adeded release notes
KshitizLohia 0cb3b29
Fs bug fix (#440)
KshitizLohia f8a664e
temp changes
harsh97 ca4c1df
updating schema yaml for marketplace
harsh97 c9ac25f
Revert to old api server for marketplace e2e. Fix security token issu…
harsh97 fa61db9
Upgrade helm if chart it already exist.
amitkrprajapati ce76a1e
Added changes to operator (#442)
KshitizLohia 1d2e353
FIX MERGE CONFLICT
amitkrprajapati 617eea8
Merge remote-tracking branch 'origin/feature/feature-store-marketplac…
amitkrprajapati b6e5fa5
Major refactoring and addition of helmers
harsh97 2659178
changes to schema yaml
harsh97 93ef950
Add unit tests
amitkrprajapati c75382a
Add unit tests - local marketplace and utils
amitkrprajapati 93df978
addressing review comments
harsh97 f9e1baf
Merge remote-tracking branch 'origin/feature/feature-store-marketplac…
harsh97 02f33ad
hardcode ocir
harsh97 7ef8513
disabling debug by default
harsh97 6f6e25b
Addressing reviwe comments
harsh97 b0330a7
update doc
harsh97 239454b
Better logging and exception handling
harsh97 bd9480b
Add unit tests - prerequisite checker
amitkrprajapati c3d3e2b
Merge remote-tracking branch 'origin/feature/feature-store-marketplac…
amitkrprajapati cb71ef6
Use helm wait instead
harsh97 a465fe9
Merge remote-tracking branch 'origin/feature/feature-store-marketplac…
harsh97 079b74f
fix failing test cases
amitkrprajapati f5c9648
fix merge conflict and fix test case
amitkrprajapati eddfa24
added documentation for dataset and entity
KshitizLohia dc19372
added documentation for dataset and entity
KshitizLohia 6d10feb
added documentation for dataset and entity
KshitizLohia 655d9d6
added documentation for dataset and entity
KshitizLohia f293e0e
added documentation for dataset and entity
KshitizLohia e8072fd
added documentation for dataset and entity
KshitizLohia 22db6e5
code changes for name instead of display_name
yogesh266 3b6ccc1
added documentation for dataset and entity
KshitizLohia eeff5b7
added documentation for transformation
KshitizLohia 216ae42
changes in the doc
yogesh266 687ac10
Merge remote-tracking branch 'origin/main' into feature/feature-store…
harsh97 0bc9e3b
updating operator to work with latest listing
harsh97 82024b5
updated documentation
KshitizLohia c81a7b4
Merge branch 'feature/feature-store-marketplace-operator' of github.c…
KshitizLohia 85c2de9
updated documentation
KshitizLohia c1890af
Feature store display name to name changes (#494)
KshitizLohia 4802652
Adding api gateway support
harsh97 09cdd19
Merge remote-tracking branch 'origin/feature/feature-store-marketplac…
harsh97 28d1688
fixed restore
yogesh266 54dc2ac
fixed restore (#503)
KshitizLohia dc6a791
documentation for operator
harsh97 e823dcb
Merge remote-tracking branch 'origin/feature/feature-store-marketplac…
harsh97 9300e91
Merge branch 'main' of github.com:oracle/accelerated-data-science int…
KshitizLohia fd93a87
backmerged and document fixes
KshitizLohia 75eb770
removed ocids and fixed issues related to environment variables
KshitizLohia 22baa87
added dataset and feature group documents
KshitizLohia 04d652b
added feature store details
KshitizLohia e714740
name fix
KshitizLohia 0709fe6
name fix
KshitizLohia 0f4cbcb
changed image
KshitizLohia 9b1fb7a
added cicd examples
KshitizLohia c07e730
added cicd examples
KshitizLohia d2e3ce3
Merge branch 'main' of github.com:oracle/accelerated-data-science int…
KshitizLohia 4b48635
fixing uts
harsh97 bb2325d
Merge remote-tracking branch 'origin/feature/feature-store-marketplac…
harsh97 73495c5
making kubernetes imports local
harsh97 db7959e
add kubernetes import in try catch block
harsh97 14afce4
Updating unit tests and adding feature store marketplace requirements
harsh97 20dcfa1
uncommenting prerequisite check
harsh97 7e022b9
adding missing import
harsh97 e6768ad
bug fixes
harsh97 ceb9155
bugfixes
harsh97 2b3ff5e
bugfixes
harsh97 bc50029
fix unit tests
harsh97 fb9791d
fixing uts
harsh97 12e7638
updating requirements
harsh97 e1b8a11
documentation fix
KshitizLohia c5dc098
fixing typo and specifying exact range of compatible langchain versions
harsh97 7c7ae05
bugfix
harsh97 b6dfdd2
updating langchain version
harsh97 96646c4
documentation fix
KshitizLohia eb6eb76
Merge branch 'feature/feature-store-marketplace-operator' of github.c…
KshitizLohia 8b5eae3
Adding bugfixes and unit tests
harsh97 3f2b878
Merge remote-tracking branch 'origin/feature/feature-store-marketplac…
harsh97 4a6bca9
Merge remote-tracking branch 'origin/main' into feature/feature-store…
harsh97 a2f072f
name fix
KshitizLohia f8c09fd
add more unit tests
harsh97 f4b4929
Merge remote-tracking branch 'origin/feature/feature-store-marketplac…
harsh97 2230534
code fixes for transfrmation mode check
yogesh266 9374a42
documentation fix
KshitizLohia c89aa0e
Merge branch 'feature/feature-store-marketplace-operator' of github.c…
KshitizLohia 054e6e2
documentation fix
KshitizLohia 72e753c
code fixes for transfrmation mode check (#516)
KshitizLohia f624981
disable showing choices
harsh97 c22d7a6
Merge remote-tracking branch 'origin/feature/feature-store-marketplac…
harsh97 2c7a0d2
Disable apigateway stack deployment
harsh97 e032cd1
Revert "Disable apigateway stack deployment"
harsh97 80793ec
Adding feature store docs
harsh97 16bdd26
document fix
KshitizLohia 93ecfab
Documentation and version fixes
harsh97 2855f71
Revert "Revert "Disable apigateway stack deployment""
harsh97 aac9e1c
Merge remote-tracking branch 'origin/feature/feature-store-marketplac…
harsh97 cbea610
policy fix
harsh97 70c1fd5
Merge remote-tracking branch 'origin/main' into feature/feature-store…
harsh97 4ec4e8f
formatting fixes
harsh97 a52d7ee
removing file with ocids
harsh97 82bbf69
adding copyright text
harsh97 d20cdca
Add missing import
harsh97 2c4fc53
Adding demos
harsh97 3f63194
revert uneeded modifications
harsh97 b69dde3
remove uneeded file
harsh97 6d4632a
Merge remote-tracking branch 'origin/main' into feature/feature-store…
harsh97 8f87171
addressed comments
KshitizLohia 50425af
Addressing review comments
harsh97 ecb4de3
Merge remote-tracking branch 'origin/feature/feature-store-marketplac…
harsh97 4de3355
Fixing requirements.txt
harsh97 24e2c1c
Addressing cli client bug
harsh97
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,65 @@ | ||
# Oracle Feature Store (ADS) | ||
|
||
[![Python](https://img.shields.io/badge/python-3.8-blue?style=for-the-badge&logo=pypi&logoColor=white)](https://pypi.org/project/oracle-ads/) [![PysparkConda](https://img.shields.io/badge/fspyspark32_p38_cpu_v2-1.0-blue?style=for-the-badge&logo=pypi&logoColor=white)](https://docs.oracle.com/en-us/iaas/data-science/using/conda-pyspark-fam.htm) [![Notebook Examples](https://img.shields.io/badge/docs-notebook--examples-blue?style=for-the-badge&logo=pypi&logoColor=white)](https://github.com/oracle-samples/oci-data-science-ai-samples/tree/master/notebook_examples) [![Delta](https://img.shields.io/badge/delta-2.0.1-blue?style=for-the-badge&logo=pypi&logoColor=white)](https://delta.io/) [![PySpark](https://img.shields.io/badge/pyspark-3.2.1-blue?style=for-the-badge&logo=pypi&logoColor=white)](https://spark.apache.org/docs/3.2.1/api/python/index.html) [![Great Expectations](https://img.shields.io/badge/greatexpectations-0.17.19-blue?style=for-the-badge&logo=pypi&logoColor=white)](https://greatexpectations.io/) [![Pandas](https://img.shields.io/badge/pandas-1.5.3-blue?style=for-the-badge&logo=pypi&logoColor=white)](https://pandas.pydata.org/) [![PyArrow](https://img.shields.io/badge/pyarrow-11.0.0-blue?style=for-the-badge&logo=pypi&logoColor=white)](https://arrow.apache.org/docs/python/index.html) | ||
|
||
Managing many datasets, data sources, and transformations for machine learning is complex and costly. Poorly cleaned data, data issues, bugs in transformations, data drift, and training-serving skew all lead to increased model development time and poor model performance. A feature store solves many of these problems because it is a centralized way to transform and access data at both training and serving time. Feature stores also help define a standardised pipeline for ingesting and querying data. | ||
|
||
ADS feature store is a stack-based solution that is deployed in your tenancy using OCI Resource Manager. | ||
|
||
The following are brief descriptions of the key concepts and main components of ADS feature store. | ||
|
||
- ``Feature Vector``: A set of feature values for any one primary or identifier key. For example, all or a subset of the features of customer ID 2536 can be called one feature vector. | ||
- ``Feature``: A feature is an individual measurable property or characteristic of an event being observed. | ||
- ``Entity``: An entity is a group of semantically related features. The first step a consumer of features typically takes when accessing the feature store service is to list the entities and the features associated with each entity. Another way to look at it is that an entity is an object or concept that is described by its features. Examples of entities are customer, product, transaction, review, image, document, and so on. | ||
- ``Feature Group``: A feature group in a feature store is a collection of related features that are often used together in ML models. It serves as an organizational unit within the feature store for users to manage, version, and share features across different ML projects. By organizing features into groups, data scientists and ML engineers can efficiently discover, reuse, and collaborate on features, reducing redundant work and ensuring consistency in feature engineering. | ||
- ``Feature Group Job``: A feature group job is the processing instance of a feature group. Each feature group job includes validation results and statistics results. | ||
- ``Dataset``: A dataset is a collection of features that are used together to either train a model or perform model inference. | ||
- ``Dataset Job``: A dataset job is the processing instance of a dataset. Each dataset job includes validation results and statistics results. | ||
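
To make the relationships among these concepts concrete, here is a minimal sketch in plain Python. It does not use the ADS SDK: the classes `FeatureGroup` and `Entity` and the method `feature_vector` are hypothetical names chosen for illustration only, and the sample values are made up.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class FeatureGroup:
    """Hypothetical stand-in: related features stored per primary key."""
    name: str
    primary_key: str
    features: List[str]
    rows: Dict[Any, Dict[str, Any]] = field(default_factory=dict)

    def ingest(self, record: Dict[str, Any]) -> None:
        # Keep only the declared features, indexed by the primary key value.
        key = record[self.primary_key]
        self.rows[key] = {name: record[name] for name in self.features}

    def feature_vector(self, key: Any, subset: Optional[List[str]] = None) -> Dict[str, Any]:
        # A feature vector is all, or a subset, of the feature values for one key.
        row = self.rows[key]
        names = subset if subset is not None else self.features
        return {name: row[name] for name in names}


@dataclass
class Entity:
    """Hypothetical stand-in: an object or concept described by its features."""
    name: str
    feature_groups: List[FeatureGroup] = field(default_factory=list)


customer = Entity(name="customer")
profile = FeatureGroup("customer_profile", "customer_id", ["age", "country"])
customer.feature_groups.append(profile)

profile.ingest({"customer_id": 2536, "age": 41, "country": "IN"})
vector = profile.feature_vector(2536)
partial = profile.feature_vector(2536, subset=["age"])
```

Here `vector` holds every feature value for customer ID 2536, while `partial` holds only the requested subset; both count as feature vectors under the definition above.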
|
||
## Documentation | ||
|
||
- [Oracle Feature Store SDK (ADS) Documentation](https://feature-store-accelerated-data-science.readthedocs.io/en/latest/) | ||
- [OCI Data Science and AI services Examples](https://github.com/oracle/oci-data-science-ai-samples) | ||
- [Oracle AI & Data Science Blog](https://blogs.oracle.com/ai-and-datascience/) | ||
- [OCI Documentation](https://docs.oracle.com/en-us/iaas/data-science/using/data-science.htm) | ||
|
||
## Examples | ||
|
||
### Quick start examples | ||
|
||
| Jupyter Notebook | Description | | ||
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | ||
| [Feature store querying](https://github.com/oracle-samples/oci-data-science-ai-samples/blob/main/notebook_examples/feature_store_querying.ipynb) | - Ingestion, querying and exploration of data. | | ||
| [Feature store quickstart](https://github.com/oracle-samples/oci-data-science-ai-samples/blob/main/notebook_examples/feature_store_quickstart.ipynb) | - Ingestion, querying and exploration of data. | | ||
| [Schema enforcement and schema evolution](https://github.com/oracle-samples/oci-data-science-ai-samples/blob/main/notebook_examples/feature_store_schema_evolution.ipynb) | - `Schema evolution` allows you to easily change a table's current schema to accommodate data that is changing over time. `Schema enforcement`, also known as schema validation, is a safeguard in Delta Lake that ensures data quality by rejecting writes to a table that don't match the table's schema. | | ||
| [Storage of medical records in feature store](https://github.com/oracle-samples/oci-data-science-ai-samples/blob/main/notebook_examples/feature_store_ehr_data.ipynb) | - Example demonstrating storage of medical records in feature store. | | ||
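
The difference between schema enforcement and schema evolution described in the table above can be sketched in a few lines of plain Python. This is an illustration of the idea only, not the Delta Lake or ADS API: the `Table` class, its `evolve` flag, and `SchemaMismatchError` are hypothetical names.

```python
from typing import Dict, List


class SchemaMismatchError(ValueError):
    """Raised when a write does not match the table's schema."""


class Table:
    """Hypothetical in-memory table that validates rows against a schema."""

    def __init__(self, schema: Dict[str, type]) -> None:
        self.schema = dict(schema)
        self.rows: List[dict] = []

    def write(self, row: dict, evolve: bool = False) -> None:
        unknown = [column for column in row if column not in self.schema]
        # Enforcement: reject columns the schema does not know about.
        if unknown and not evolve:
            raise SchemaMismatchError(f"unexpected columns: {unknown}")
        # Enforcement: known columns must carry the declared type.
        for column, value in row.items():
            if column in self.schema and not isinstance(value, self.schema[column]):
                raise SchemaMismatchError(f"wrong type for column {column!r}")
        # Evolution: extend the schema with the new columns.
        for column in unknown:
            self.schema[column] = type(row[column])
        self.rows.append(row)


table = Table({"patient_id": int, "age": int})
table.write({"patient_id": 1, "age": 30})

try:
    table.write({"patient_id": 2, "age": 40, "bmi": 22.5})
    rejected = False
except SchemaMismatchError:
    rejected = True

table.write({"patient_id": 2, "age": 40, "bmi": 22.5}, evolve=True)
```

The second write is rejected because `bmi` is not in the schema; the same row succeeds once evolution is explicitly requested, and the schema gains the new column.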
|
||
### Big data operations using OCI DataFlow | ||
|
||
| Jupyter Notebook | Description | | ||
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------| | ||
| [Big data operations with feature store](https://github.com/oracle-samples/oci-data-science-ai-samples/blob/main/notebook_examples/feature_store_spark_magic.ipynb) | - Ingestion of data using Spark Magic, querying and exploration of data using Spark Magic. | | ||
|
||
### LLM Use cases | ||
|
||
| Jupyter Notebook | Description | | ||
|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | ||
| [Embeddings in Feature Store](https://github.com/oracle-samples/oci-data-science-ai-samples/blob/main/notebook_examples/feature_store_embeddings.ipynb) | - `Embedding feature stores` are optimized for fast and efficient retrieval of embeddings. This is important because embeddings can be high-dimensional and computationally expensive to calculate. By storing them in a dedicated store, you can avoid the need to recalculate embeddings for the same data repeatedly. | | ||
| [Synthetic data generation in feature store using OpenAI and FewShotPromptTemplate](https://github.com/oracle-samples/oci-data-science-ai-samples/blob/main/notebook_examples/feature_store_medical_synthetic_data_openai.ipynb) | - `Synthetic data` is artificially generated data, rather than data collected from real-world events. It's used to simulate real data without compromising privacy or encountering real-world limitations. | | ||
| [PII data redaction, summarising content and translating content using Doctran and OpenAI](https://github.com/oracle-samples/oci-data-science-ai-samples/blob/main/notebook_examples/feature_store_pii_redaction_and_transformation.ipynb) | - One way to think of Doctran is as an LLM-powered black box where messy strings go in and nice, clean, labelled strings come out. Another way to think about it is as a modular, declarative wrapper over OpenAI's function calling feature that significantly improves the developer experience. | | ||
| [OpenAI embeddings in feature store](https://github.com/oracle-samples/oci-data-science-ai-samples/blob/main/notebook_examples/feature_store_embeddings_openai.ipynb) | - `Embedding feature stores` are optimized for fast and efficient retrieval of embeddings. This is important because embeddings can be high-dimensional and computationally expensive to calculate. By storing them in a dedicated store, you can avoid the need to recalculate embeddings for the same data repeatedly. | | ||
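
The benefit of storing embeddings rather than recalculating them can be illustrated with a small cache keyed by content hash. This is a sketch under stated assumptions, not how ADS feature store persists embeddings: `EmbeddingStore` and `toy_embed` are hypothetical, and `toy_embed` stands in for a real, expensive embedding model.

```python
import hashlib
from typing import Callable, Dict, List


class EmbeddingStore:
    """Hypothetical store that computes each embedding at most once."""

    def __init__(self, embed_fn: Callable[[str], List[float]]) -> None:
        self._embed_fn = embed_fn
        self._vectors: Dict[str, List[float]] = {}

    def get(self, text: str) -> List[float]:
        # Key by a hash of the content so identical text reuses one vector.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._vectors:
            self._vectors[key] = self._embed_fn(text)
        return self._vectors[key]


call_log: List[str] = []


def toy_embed(text: str) -> List[float]:
    # Placeholder for an expensive model call; records each invocation.
    call_log.append(text)
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


store = EmbeddingStore(toy_embed)
first = store.get("chest pain")
second = store.get("chest pain")
```

The second lookup returns the stored vector without calling `toy_embed` again, so `call_log` contains a single entry.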
|
||
|
||
## Contributing | ||
|
||
This project welcomes contributions from the community. Before submitting a pull request, please [review our contribution guide](./../../CONTRIBUTING.md). | ||
|
||
Find Getting Started instructions for developers in [README-development.md](https://github.com/oracle/accelerated-data-science/blob/main/README-development.md). | ||
|
||
## Security | ||
|
||
Consult the security guide [SECURITY.md](https://github.com/oracle/accelerated-data-science/blob/main/SECURITY.md) for our responsible security vulnerability disclosure process. | ||
|
||
## License | ||
|
||
Copyright (c) 2020, 2022 Oracle and/or its affiliates. Licensed under the [Universal Permissive License v1.0](https://oss.oracle.com/licenses/upl/) |
Why do we need this line?
This is needed for generation of class files for feature store for class level documentation