Partial context updates #93

Open
wants to merge 340 commits into
base: dev

pseusys
Collaborator

@pseusys pseusys commented Mar 13, 2023

Description

Context storages now update contexts partially instead of reading and writing the whole data at once.

Checklist

  • I have covered the code with tests
  • I have added comments to my code to help others understand it
  • I have updated the documentation to reflect the changes
  • I have performed a self-review of the changes
  • Consider extending UpdateScheme from BaseModel
  • Decide how we want to use clear method.

@pseusys pseusys self-assigned this Mar 13, 2023
@pseusys pseusys requested review from kudep and RLKRo April 7, 2023 01:43
@pseusys pseusys added the enhancement New feature or request label Apr 7, 2023
@pseusys pseusys marked this pull request as ready for review April 7, 2023 01:43
@kudep kudep marked this pull request as draft April 24, 2023 16:41
chatsky/context_storages/database.py
_responses_field_name: Literal["responses"] = "responses"
_default_subscript_value: int = 3

def __init__(
Member

I wouldn't call it lazy.
The entry point to a chatsky program is usually pipeline.run, where the interface is connected.
My proposal is to connect the db at the same time (in the entry point of the program).

The lazy part of this implementation is connecting the db inside methods if it is not already connected, which covers the cases where pipeline.run is, for some reason, not used.
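
A minimal sketch of the proposal above, using hypothetical `Storage` and `Pipeline` classes rather than chatsky's actual API. It assumes the entry point connects the storage up front and that storage methods fall back to connecting on first use:

```python
import asyncio


class Storage:
    """Hypothetical context storage; stands in for a real DB backend."""

    def __init__(self) -> None:
        self.connected = False
        self._data: dict = {}

    async def connect(self) -> None:
        # A real backend would open a connection or create tables here.
        self.connected = True

    async def set_item(self, key: str, value: object) -> None:
        # Lazy fallback: only does work if the entry point did not connect.
        if not self.connected:
            await self.connect()
        self._data[key] = value

    async def get_item(self, key: str) -> object:
        if not self.connected:
            await self.connect()
        return self._data[key]


class Pipeline:
    """Hypothetical entry point that owns the storage."""

    def __init__(self, storage: Storage) -> None:
        self.storage = storage

    async def run(self) -> None:
        # Eager path: connect the storage at program start,
        # before any request is handled.
        await self.storage.connect()


# Eager: the storage is connected by the entry point.
eager = Storage()
asyncio.run(Pipeline(eager).run())

# Lazy: no entry point was used, so the first method call connects.
lazy = Storage()
asyncio.run(lazy.set_item("ctx", {"turn": 1}))
```

With this shape the lazy branch is only a safety net; in the usual flow the storage is already connected when the first request arrives.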

chatsky/context_storages/database.py
return sha256(string).digest()


class ContextDict(BaseModel, Generic[K, V]):
Member

Add separate classes for context dicts of different types for easier value_type management.

class ContextDict(BaseModel, Generic[K, V]):
    ...

class LabelDict(ContextDict[int, Label]):
    _value_type = Label

class MessageDict(ContextDict[int, Message]):
    _value_type = Message

Collaborator Author

Member

Make ContextDict abstract.
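
One way the two suggestions in this thread could fit together, as a hedged sketch only. It uses `abc.ABC` in place of the pydantic `BaseModel` base that the real `ContextDict` has, and `Label` and `Message` are placeholder classes, not the library's types:

```python
from abc import ABC, abstractmethod
from typing import Dict, Generic, Type, TypeVar

K = TypeVar("K")
V = TypeVar("V")


class Label(str):
    """Placeholder for the real Label type."""


class Message(str):
    """Placeholder for the real Message type."""


class ContextDict(ABC, Generic[K, V]):
    """Abstract base: subclasses must say which value type they hold."""

    def __init__(self) -> None:
        self._items: Dict[K, V] = {}

    @property
    @abstractmethod
    def _value_type(self) -> Type[V]:
        """Concrete type used to validate stored values."""

    def __setitem__(self, key: K, value: V) -> None:
        if not isinstance(value, self._value_type):
            raise TypeError(f"expected {self._value_type.__name__}")
        self._items[key] = value

    def __getitem__(self, key: K) -> V:
        return self._items[key]


class LabelDict(ContextDict[int, Label]):
    # A plain class attribute overrides the abstract property.
    _value_type = Label


class MessageDict(ContextDict[int, Message]):
    _value_type = Message


messages = MessageDict()
messages[0] = Message("hi")
```

In this sketch `ContextDict()` raises `TypeError` because `_value_type` is still abstract, while the typed subclasses can be instantiated directly.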

docs/source/user_guides/context_guide.rst
self.main_table = Table(
f"{table_name_prefix}_{self._main_table_name}",
metadata,
Column(self._id_column_name, String(self._UUID_LENGTH), index=True, unique=True, nullable=False),
Member

Yes, I think it should be named ID_LENGTH in that case.
I want to be safe with the id length (wouldn't want a context not saved because the id is too long).
Maybe make it 255 characters?
Also, SQLAlchemy docs say that this number can be interpreted as either bytes or characters depending on the db used.
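
To illustrate why the bytes-versus-characters point matters, here is a small sketch. The `fits` helper is hypothetical, and the 255 limit is just the value floated above. Which count a given database applies is not something this sketch decides; it only shows that the two counts can differ for the same id:

```python
ID_LENGTH = 255  # limit proposed in the comment above


def fits(context_id: str, limit: int = ID_LENGTH) -> dict:
    """Report whether an id fits when the limit counts characters or UTF-8 bytes."""
    return {
        "chars": len(context_id) <= limit,
        "bytes": len(context_id.encode("utf-8")) <= limit,
    }


ascii_id = "a" * 255     # 255 characters, 255 bytes in UTF-8
cyrillic_id = "я" * 255  # 255 characters, 510 bytes in UTF-8

print(fits(ascii_id))
print(fits(cyrillic_id))
```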

chatsky/context_storages/sql.py
Member

@RLKRo RLKRo left a comment

Comment on lines 154 to 160
@overload
async def get(self, key: K) -> V: ... # noqa: E704

@overload
async def get(self, key: Iterable[K]) -> List[V]: ... # noqa: E704

async def get(self, key, default=None) -> V:
Member

Pretty sure overloaded methods need to have the same arguments.
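
For reference, a hedged sketch of one way to make the signatures line up. Type checkers resolve calls against the `@overload` signatures only, so a `default` argument that appears solely on the implementation is not visible to callers. The `TurnStore` class below is hypothetical and assumes integer keys, as in the `ContextDict[int, ...]` examples elsewhere in this review:

```python
import asyncio
from typing import Dict, Generic, Iterable, List, Optional, TypeVar, Union, overload

V = TypeVar("V")


class TurnStore(Generic[V]):
    """Hypothetical stand-in for the dict-like class under review."""

    def __init__(self, items: Dict[int, V]) -> None:
        self._items = dict(items)

    @overload
    async def get(self, key: int, default: Optional[V] = None) -> Optional[V]: ...

    @overload
    async def get(self, key: Iterable[int], default: Optional[V] = None) -> List[Optional[V]]: ...

    async def get(self, key: Union[int, Iterable[int]], default: Optional[V] = None):
        # A single int key returns one value; any other iterable returns a list.
        if isinstance(key, int):
            return self._items.get(key, default)
        return [self._items.get(k, default) for k in key]


store = TurnStore({0: "a", 1: "b"})
single = asyncio.run(store.get(0))
many = asyncio.run(store.get([0, 1, 2], default="?"))
```

Each overload now declares `default`, so both the single-key and multi-key calls type-check with or without it.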

chatsky/core/context.py
_responses_field_name: Literal["responses"] = "responses"
_default_subscript_value: int = 3

def __init__(
Member

I meant in the pipeline.run method (which currently calls messenger_interface.connect and is an entry point for every chatsky bot), not pipeline._run_pipeline.

Which is what I meant when I said that it isn't that lazy:
DB will be initialized before the first request is received.

return sha256(string).digest()


class ContextDict(BaseModel, Generic[K, V]):
Member

Make ContextDict abstract.

"""
return asyncio.run(self.set_item_async(key, value))
if not self.connected:
logger.debug(f"Connecting to context storage {type(self).__name__} ...")
Member

I think we should log at info level in the connect method that the connection is being made, and log a warning here that the db was not initialized via pipeline.run.

See my reply to the connection thread for more context.
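
A rough sketch of the log levels suggested here, with a hypothetical `LoggedStorage` class and an illustrative logger name and message wording (none of these are taken from the PR):

```python
import asyncio
import logging

logger = logging.getLogger("context_storage_sketch")


class LoggedStorage:
    """Hypothetical storage showing only where each log call would go."""

    def __init__(self) -> None:
        self.connected = False
        self._data: dict = {}

    async def connect(self) -> None:
        # Info level: a connection is always worth recording.
        logger.info("Connecting to context storage %s ...", type(self).__name__)
        self.connected = True

    async def _ensure_connected(self) -> None:
        if not self.connected:
            # Warning level: this path means the entry point did not connect.
            logger.warning(
                "%s was not connected at startup; connecting on first use.",
                type(self).__name__,
            )
            await self.connect()

    async def set_item(self, key: str, value: object) -> None:
        await self._ensure_connected()
        self._data[key] = value
```

Calling `set_item` on a fresh instance emits the warning followed by the info message; calling `connect` first emits only the info message.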

chatsky/context_storages/sql.py
chatsky/context_storages/file.py
self.main_table = Table(
f"{table_name_prefix}_{self._main_table_name}",
metadata,
Column(self._id_column_name, String(self._UUID_LENGTH), index=True, unique=True, nullable=False),
Member

I think we should allow setting ID_LENGTH in init.
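
A minimal sketch of what a configurable id length could look like. `SQLStorageSketch`, its `id_length` parameter, and `check_id` are hypothetical names for illustration; in the real class the value would presumably be passed to the `String(...)` column type shown above:

```python
class SQLStorageSketch:
    """Hypothetical storage showing only the id-length handling."""

    DEFAULT_ID_LENGTH = 255  # value floated earlier in this review

    def __init__(self, id_length: int = DEFAULT_ID_LENGTH) -> None:
        if id_length <= 0:
            raise ValueError("id_length must be positive")
        # Assumption: the real class would use this for the id column length.
        self.id_length = id_length

    def check_id(self, context_id: str) -> str:
        # Fail loudly instead of letting the database reject or truncate the id.
        if len(context_id) > self.id_length:
            raise ValueError(
                f"context id is {len(context_id)} characters long; "
                f"the limit is {self.id_length}"
            )
        return context_id


short_ids = SQLStorageSketch(id_length=8)
```

Checking the length before writing addresses the earlier concern about a context silently not being saved because its id is too long.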

Labels
enhancement New feature or request
4 participants