Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Partial PEP updates #59

Open
nleroy917 opened this issue Dec 16, 2022 · 6 comments
Open

Partial PEP updates #59

nleroy917 opened this issue Dec 16, 2022 · 6 comments
Labels
enhancement New feature or request wontfix This will not be worked on

Comments

@nleroy917
Copy link
Member

To support the PATCH requests and partial updating of PEPs on pephub, it would be convenient to have a .update_pep() function. I can also just execute raw SQL if needed.

@nleroy917 nleroy917 added the enhancement New feature or request label Dec 16, 2022
@khoroshevskyi
Copy link
Member

Ideally, we should have partial updates of all the value in the column. So for the structure that we have right now, we can't change columns e.g. annotation.

Some useful information:
"Ideally, you don't use JSON documents for structured, regular data that you want to manipulate inside a relational database. Use a normalized relational design instead."
https://dba.stackexchange.com/questions/115825/jsonb-with-indexing-vs-hstore/115849#115849

In this case my proposition is to move keys: last_update, number_of_samples to a new columns. Otherwise we shouldn't change this values separately.

About changing the PEPs, I think we should do it using complete peppy object

p.s. partial pep updates is ready, but I think it is really a bad idea to change json values separately.

@nsheff
Copy link
Contributor

nsheff commented Dec 21, 2022

but I think it is really a bad idea to change json values separately.

I don't think there's any problem to update JSON values separately. Can you explain why you are reluctant?

@nleroy917
Copy link
Member Author

nleroy917 commented Dec 21, 2022

Would it be easier to supply and API that is more something like this:

values_to_update = {
  "description": "This is a new PEP.",
  "is_private": False,
  "sample_table": "sample_name,file\nfrog1,frog1.txt\nfrog2,frog2.txt"
 }

for key, value in values_to_update.items():
    pepdbagent.update_project("nleroy917/basic:default", key, value)

There needs to be some logic as to whether or not the key-value pairs belong to the project or the annotation... that can be tricky I understand its not just a simple update statement.

@khoroshevskyi
Copy link
Member

About project dict, it's fine that all the values (config, sample, subsample) are in one place, because it's one project. But if we are talking about annotation, e.g. last_update, It's better to create new column.

I found information that: "From a performance perspective, it hardly matters whether you change a single piece of data inside a JSON object or all of it: a new version of the row has to be written."

Additionally, if we want to update json, the SQL is more complicated, especially if we want to update few columns, or few values from one column.
Sample sql code:

UPDATE projects 
    SET digest = %s, private = %s, anno_info = jsonb_set(anno_info, '{description}', '%s', '{number_of_samples}', '%s')
    WHERE namespace = %s and name = %s and tag = %s;

Ok, we also can send separate requests to update every key individually. But in that case, it will be inefficient.


p.s. I already wrote function to update values partially, but there is still some errors...:

def update_item(self,
                update_dict: dict,
                namespace: str,
                name: str,
                tag: str) -> UploadResponse

update_dict – dict with update key->values. Dict structure: { name: Optional[str] namespace: Optional[str] tag: Optional[str] digest: Optional[str] project_value: dict private: Optional[bool] anno_info: dict }
namespace – project namespace
name – project name
tag – project tag

def update_item(
self,
update_dict: dict,
namespace: str,
name: str,
tag: str,
) -> UploadResponse:
"""
Update partial parts of the project record
:param update_dict: dict with update key->values. Dict structure:
{
name: Optional[str]
namespace: Optional[str]
tag: Optional[str]
digest: Optional[str]
project_value: dict
private: Optional[bool]
anno_info: dict
}
:param namespace: project namespace
:param name: project name
:param tag: project tag
:return: ResponseModel with information if project was updated
"""
cursor = self.pg_connection.cursor()
if self.project_exists(namespace=namespace, name=name, tag=tag):
try:
set_sql, set_values = self.__create_update_set(UpdateModel(**update_dict))
sql = f"""UPDATE {DB_TABLE_NAME}
{set_sql}
WHERE {NAMESPACE_COL} = %s and {NAME_COL} = %s and {TAG_COL} = %s;"""
_LOGGER.debug("Updating items...")
cursor.execute(
sql,
(*set_values, namespace, name, tag),
)
_LOGGER.info(f"Record '{namespace}/{name}:{tag}' was successfully updated!")
self._commit_to_database()
except Exception as err:
_LOGGER.error(f"Error while updating project! Error: {err}")
return UploadResponse(
registry_path=f"{namespace}/{name}:{tag}",
log_stage="update_item",
status="failure",
info=f"Error in executing SQL. {err}!",
)
else:
_LOGGER.error("Project does not exist! No project will be updated!")
return UploadResponse(
registry_path=f"{namespace}/{name}:{tag}",
log_stage="update_item",
status="failure",
info="Project does not exist!",
)
return UploadResponse(
registry_path=f"{namespace}/{name}:{tag}",
log_stage="update_item",
status="success",
info="Record was successfully updated!",
)
@staticmethod
def __create_update_set(update_info: UpdateModel) -> Tuple[str, tuple]:
"""
Create sql SET string by passing UpdateModel that later is converted to dict
:param update_info: UpdateModel (similar to database model)
:return: {sql_string (contains db keys) and updating values}
"""
_LOGGER.debug("Creating SET SQL string to update project")
sql_string = f"""SET """
sql_values = []
first = True
for key, val in update_info.dict(exclude_none=True).items():
if first:
sql_string = ''.join([sql_string, f"{key} = %s"])
first = False
else:
sql_string = ', '.join([sql_string, f"{key} = %s"])
if isinstance(val, dict):
input_val = json.dumps(val)
else:
input_val = val
sql_values.append(input_val)
# if not isinstance(val, dict):
# else:
# keys_str = ""
# for key1, val1 in val.items():
# keys_str = ", ".join([keys_str, f""" '{{{str(key1)}}}', %s"""])
# sql_values.append(f'"{val1}"')
# sql_string = ', '.join([sql_string, f"""{key} = jsonb_set({key} {keys_str})"""])
return sql_string, tuple(sql_values)

@nleroy917
Copy link
Member Author

@khoroshevskyi this is great, and I can integrate this with pephub as it is. Wondering if there is another method for updating things like description or the project config file or the sample tables?

@khoroshevskyi khoroshevskyi added the wontfix This will not be worked on label Jan 4, 2023
@khoroshevskyi khoroshevskyi mentioned this issue Jul 7, 2023
@khoroshevskyi
Copy link
Member

In v0.7.0 I added sample update/add/delete functionality. So this is partial solution for this issue

@nsheff nsheff added this to PEP Jun 25, 2024
@nsheff nsheff added this to the v0.9.0 milestone Jun 25, 2024
@khoroshevskyi khoroshevskyi removed their assignment Jun 25, 2024
@khoroshevskyi khoroshevskyi removed this from the v0.9.0 milestone Jul 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request wontfix This will not be worked on
Projects
None yet
Development

No branches or pull requests

3 participants