[RFC] Split Large Records Support #293
Comments
I need to read your
@nunomaduro I saw that docs page. My proposal is similar but more flexible, and it allows for a clear separation of concerns to avoid doing nasty Additionally, giving access to the context allows writing a universal splitter for e.g. articles and comments where the only difference is split length (say 1000 words vs. 100 words). Additionally, I deliberately omitted support for magic
@nunomaduro Any word on this? I can take a stab at implementing this, but it doesn't make sense if it will be left rotting due to architectural doubts.
@kiler129 Your idea seems fine, but I don't want to rush things. Let's hold this for a few weeks.
Background
Algolia recommends splitting large records (e.g. blog posts) into smaller chunks for better search relevance. There seems to be no support for this in the Symfony bundle.
Suggestion
I think this functionality should be implemented in a flexible way, allowing anyone to define decoupled business logic around it. To achieve that, from my perspective a couple of basic principles have to be met:
Architecture
So far this is more of a rough idea than a plan. First of all, I believe the chunk splitting should happen after all normalizers have been executed. The bundle should not interfere with normalization, especially since AFAIK you cannot emit multiple objects for a single normalized object.
I suggest that configuration gets a format similar to:
Additionally two new interfaces should be introduced:
ChunkTransformerInterface
IdTransformerInterface
Responsible for generating a new object ID based on a chunk. Called once for each chunk, after chunk transformation.
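To make the intended flow more concrete, here is a rough sketch of how the two interfaces could fit together after normalization. The RFC only names the interfaces, so the method names (`transform`, `transformId`), their signatures, the free-form `$context` array, and the `splitRecord` helper are all assumptions made for illustration, not part of the bundle.

```php
<?php

declare(strict_types=1);

// Hypothetical signature: the RFC names this interface but does not define its methods.
interface ChunkTransformerInterface
{
    /**
     * @param array $normalized record as produced by the normalizers
     * @param array $context    free-form options, e.g. ['maxWords' => 100]
     *
     * @return array[] list of chunk records
     */
    public function transform(array $normalized, array $context): array;
}

// Hypothetical signature as well; called once per chunk.
interface IdTransformerInterface
{
    public function transformId(string $originalId, array $chunk, int $index): string;
}

/**
 * Hypothetical glue code: takes an already normalized record, splits it into
 * chunks, then asks the ID transformer for an object ID for every chunk.
 */
function splitRecord(
    string $objectId,
    array $normalized,
    ChunkTransformerInterface $chunker,
    IdTransformerInterface $idTransformer,
    array $context = []
): array {
    $records = [];

    // array_values() guarantees sequential indexes even if the chunker returns custom keys.
    foreach (array_values($chunker->transform($normalized, $context)) as $index => $chunk) {
        // "objectID" is the attribute Algolia uses to identify a record.
        $chunk['objectID'] = $idTransformer->transformId($objectId, $chunk, $index);
        $records[] = $chunk;
    }

    return $records;
}
```

Keeping ID generation separate from chunking means the same chunker can be reused for entities that need different ID schemes.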
RFC
My suggestion allows for any business scenario for chunking and a variety of implementations. One can simply decide to implement a method on the entity itself and implement ChunkTransformerInterface with a very simplistic implementation like `return explode('.', $normalized['body']);`, while others may use advanced NL-aware strategies (which is actually closer to what we need).

I would like to hear from you guys (@nunomaduro @alcaeus ? ;)) what you think about the whole proposition as well as the suggested implementation. I will probably be able to offer a PR for this if the change is desired.
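As a middle ground between a bare `explode` and a truly NL-aware strategy, a splitter could group whole sentences under a word budget. The sketch below is only an illustration: the function name `chunkBySentences`, its parameters, and the regex-based sentence detection are my assumptions, and a real implementation would need smarter sentence boundary handling (abbreviations, decimals, and so on).

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: splits $normalized[$field] into chunks made of whole
 * sentences, each holding at most $maxWords words. A single sentence longer
 * than $maxWords is not cut and becomes a chunk on its own.
 * All other attributes of the record are copied into every chunk.
 */
function chunkBySentences(array $normalized, string $field, int $maxWords): array
{
    // Naive sentence detection: split on whitespace that follows ".", "!" or "?".
    $sentences = preg_split('/(?<=[.!?])\s+/', trim($normalized[$field]), -1, PREG_SPLIT_NO_EMPTY) ?: [];

    $chunks = [];
    $current = [];
    $wordCount = 0;

    foreach ($sentences as $sentence) {
        // Count whitespace-separated tokens as words.
        $words = count(preg_split('/\s+/', $sentence, -1, PREG_SPLIT_NO_EMPTY) ?: []);

        // Flush the current chunk if adding this sentence would exceed the budget.
        if ($current !== [] && $wordCount + $words > $maxWords) {
            $chunks[] = array_replace($normalized, [$field => implode(' ', $current)]);
            $current = [];
            $wordCount = 0;
        }

        $current[] = $sentence;
        $wordCount += $words;
    }

    // Flush whatever is left after the last sentence.
    if ($current !== []) {
        $chunks[] = array_replace($normalized, [$field => implode(' ', $current)]);
    }

    return $chunks;
}
```

With the budget passed in as a parameter, the same function could serve both articles and comments, for example with 1000 words for the former and 100 for the latter.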