Partial context updates #93
base: dev
_responses_field_name: Literal["responses"] = "responses"
_default_subscript_value: int = 3

def __init__(
I wouldn't call it lazy.
The entry point to a chatsky program is usually `pipeline.run`, where the interface is connected.
My proposal is to connect the db at the same time (in the entry point of the program).
The lazy part of this implementation is connecting the db inside methods if it is not already connected, which covers the cases where `pipeline.run` is, for some reason, not used.
chatsky/core/ctx_dict.py
Outdated
return sha256(string).digest()


class ContextDict(BaseModel, Generic[K, V]):
Add separate classes for context dicts of different types for easier `value_type` management:
class ContextDict: ...
class LabelDict(ContextDict[int, Label]):
    _value_type = Label
class MessageDict(ContextDict[int, Message]):
    _value_type = Message
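A minimal sketch of how the suggestion above could look. The `Label` and `Message` classes here are hypothetical stand-ins, and the base class is a plain generic rather than the pydantic-based `ContextDict` in the PR, so treat this as an illustration of the idea, not the actual implementation.

```python
from typing import Dict, Generic, Type, TypeVar

K = TypeVar("K")
V = TypeVar("V")


class Label:
    """Hypothetical stand-in for chatsky's Label."""

    def __init__(self, name: str) -> None:
        self.name = name


class Message:
    """Hypothetical stand-in for chatsky's Message."""

    def __init__(self, text: str) -> None:
        self.text = text


class ContextDict(Generic[K, V]):
    # Each subclass pins the concrete value type here.
    _value_type: Type

    def __init__(self) -> None:
        self._items: Dict[K, V] = {}

    def __setitem__(self, key: K, value: V) -> None:
        # The per-class _value_type makes the check trivial.
        if not isinstance(value, self._value_type):
            raise TypeError(
                f"expected {self._value_type.__name__}, got {type(value).__name__}"
            )
        self._items[key] = value

    def __getitem__(self, key: K) -> V:
        return self._items[key]


class LabelDict(ContextDict[int, Label]):
    _value_type = Label


class MessageDict(ContextDict[int, Message]):
    _value_type = Message
```

With this layout the value type never has to be passed around at construction time; it is fixed by the subclass that is instantiated.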
Make `ContextDict` abstract.
chatsky/context_storages/sql.py
Outdated
self.main_table = Table(
    f"{table_name_prefix}_{self._main_table_name}",
    metadata,
    Column(self._id_column_name, String(self._UUID_LENGTH), index=True, unique=True, nullable=False),
Yes, I think it should be named `ID_LENGTH` in that case.
I want to be safe with the id length (I wouldn't want a context to go unsaved because its id is too long).
Maybe make it 255 characters?
Also, the SQLAlchemy docs say that this number can be interpreted as either bytes or characters depending on the db used.
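To illustrate the bytes-versus-characters concern, here is a small sketch of a conservative length check. The helper name `fits_id_column` and the limit of 255 are assumptions taken from the discussion above, not part of the PR.

```python
ID_LENGTH = 255  # hypothetical limit, the value floated in the comment above


def fits_id_column(ctx_id: str, limit: int = ID_LENGTH) -> bool:
    """Return True if the id fits whether the limit counts characters or UTF-8 bytes."""
    n_chars = len(ctx_id)
    n_bytes = len(ctx_id.encode("utf-8"))
    # The UTF-8 byte count is never smaller than the character count,
    # so taking the maximum is the conservative check.
    return max(n_chars, n_bytes) <= limit
```

An ASCII id of 255 characters passes, while 200 Cyrillic characters (400 bytes in UTF-8) do not, even though the character count is under the limit.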
I closed all resolved comments.
There are still two comments left unanswered:
https://github.com/deeppavlov/chatsky/pull/93/files#r1842234145
https://github.com/deeppavlov/chatsky/pull/93/files#r1850984921
chatsky/core/ctx_dict.py
Outdated
@overload
async def get(self, key: K) -> V: ...  # noqa: E704

@overload
async def get(self, key: Iterable[K]) -> List[V]: ...  # noqa: E704

async def get(self, key, default=None) -> V:
Pretty sure overloaded methods need to have the same arguments.
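One way to address this, sketched under assumptions: declare `default` in every overload so that the overload signatures and the implementation agree. Without it, a call that passes `default` matches none of the declared overloads. `AsyncStore` is a hypothetical minimal class, not the `ContextDict` from the PR.

```python
import asyncio
from typing import Dict, Iterable, List, Optional, Union, overload


class AsyncStore:
    """Hypothetical minimal store used only to show the overload shape."""

    def __init__(self, data: Dict[int, str]) -> None:
        self._data = dict(data)

    @overload
    async def get(self, key: int, default: Optional[str] = None) -> Optional[str]: ...

    @overload
    async def get(
        self, key: Iterable[int], default: Optional[str] = None
    ) -> List[Optional[str]]: ...

    async def get(self, key: Union[int, Iterable[int]], default: Optional[str] = None):
        # A single key returns one value; an iterable of keys returns a list.
        if isinstance(key, int):
            return self._data.get(key, default)
        return [self._data.get(k, default) for k in key]
```

Usage: `asyncio.run(AsyncStore({1: "a"}).get([1, 2], "x"))` returns a list with the stored value for key 1 and the default for key 2.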
_responses_field_name: Literal["responses"] = "responses"
_default_subscript_value: int = 3

def __init__(
I meant in the `pipeline.run` method (which currently calls `messenger_interface.connect` and is the entry point for every chatsky bot), not `pipeline._run_pipeline`.
That is what I meant when I said it isn't that lazy:
the DB will be initialized before the first request is received.
chatsky/context_storages/database.py
Outdated
"""
return asyncio.run(self.set_item_async(key, value))
if not self.connected:
    logger.debug(f"Connecting to context storage {type(self).__name__} ...")
I think we should log at info level in the `connect` method that the connection is being made, and log a warning here that the db was not initialized via `pipeline.run`.
See my reply to the connection thread for more context.
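A rough sketch of the proposed behaviour, assuming a storage with a `connected` flag as in the diff above. The class body, the `_ensure_connected` helper, and the `run` function (a stand-in for an entry point such as `pipeline.run`) are hypothetical and only model the logging and connection order.

```python
import asyncio
import logging

logger = logging.getLogger("context_storage_sketch")


class DBContextStorage:
    """Hypothetical storage; only the connection bookkeeping is modelled."""

    def __init__(self) -> None:
        self.connected = False
        self._data = {}

    async def connect(self) -> None:
        # Info-level log: this is the expected path.
        logger.info("Connecting to context storage %s ...", type(self).__name__)
        self.connected = True

    async def _ensure_connected(self) -> None:
        # Lazy fallback for callers that skipped the entry point.
        if not self.connected:
            logger.warning(
                "%s was not connected at the entry point; connecting lazily.",
                type(self).__name__,
            )
            await self.connect()

    async def set_item(self, key, value) -> None:
        await self._ensure_connected()
        self._data[key] = value


async def run(storage: DBContextStorage) -> None:
    """Stand-in for the entry point: connect before serving any request."""
    await storage.connect()
    await storage.set_item("ctx", "value")
```

When `run` is used, only the info message is emitted. Calling `set_item` on a fresh storage emits the warning first and then the info message from `connect`.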
chatsky/context_storages/sql.py
Outdated
self.main_table = Table(
    f"{table_name_prefix}_{self._main_table_name}",
    metadata,
    Column(self._id_column_name, String(self._UUID_LENGTH), index=True, unique=True, nullable=False),
I think we should allow setting `ID_LENGTH` in init.
Description
Context storages are updated partially now instead of reading and writing whole data at once.
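As an illustration of what a partial update can mean, here is a sketch that remembers a hash of each loaded value and writes back only the keys whose hash has changed. This is an assumption based on the `sha256` helper visible in the `ctx_dict.py` excerpt; the `PartialDict` class, its method names, and the JSON serialization are hypothetical and not taken from the PR.

```python
import json
from hashlib import sha256
from typing import Any, Dict, Set


def get_hash(value: Any) -> bytes:
    """Hash the serialized form of a value."""
    return sha256(json.dumps(value, sort_keys=True).encode("utf-8")).digest()


class PartialDict:
    """Hypothetical dict wrapper that writes back only changed keys."""

    def __init__(self, backend: Dict[int, str]) -> None:
        self._backend = backend  # stands in for a database table
        self._items: Dict[int, Any] = {}
        self._hashes: Dict[int, bytes] = {}

    def load(self, key: int) -> Any:
        # Read a single key and remember its hash at load time.
        value = json.loads(self._backend[key])
        self._items[key] = value
        self._hashes[key] = get_hash(value)
        return value

    def set(self, key: int, value: Any) -> None:
        self._items[key] = value

    def store(self) -> Set[int]:
        # Only keys that are new or whose hash differs are written.
        changed = {
            k for k, v in self._items.items() if self._hashes.get(k) != get_hash(v)
        }
        for k in changed:
            self._backend[k] = json.dumps(self._items[k], sort_keys=True)
            self._hashes[k] = get_hash(self._items[k])
        return changed
```

Keys that were loaded but never modified are skipped by `store`, so neither reads nor writes touch the whole context.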
Checklist
`UpdateScheme` from `BaseModel`
`clear` method.