-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reducing size #8
Conversation
I fixed CI |
Lint Error is not my fault, The golang-ci version is too old, not compatible with go 1.18.1. please upgrade Github action. i run golang-ci in my local is ok. |
Don't worry, we are aware of the issues, have them on the main Bluge project as well: blugelabs/bluge#112 Hopefully I can get that sorted out, and apply it here as well. I plan to start reviewing this weekend. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Is this going to be backwards compatible with existing indexes? or would users need to rebuild their index?
Yes, the MR not compatible old version. |
OK, this is going to take longer than I originally thought, but I did an overview today and it all seems reasonable. A few initial questions/concerns, meanwhile I'll start trying to go feature by feature and be more thorough.
Thanks again for the contribution, I will dig deeper into the individual features next. |
|
Ah, it is chunk, thanks, that makes sense. |
looks good to me, thanks for the great work @hengfeiyang ! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great. I have done as much review as possible, thank you @voldyman for assisting. I think there is more value in getting this merged, so more people can use it, than there is value in waiting longer for me to review completely.
I will starting fixing the CI so we can proceed.
read.go
Outdated
|
||
metaLenData, err := s.data.Read(int(storedOffset), int(storedOffset+binary.MaxVarintLen64)) | ||
// document chunk coder | ||
thunkI := docNum / uint64(defaultDocumentChunkSize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we rename to chunkI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed
OK, so my plan is to do the following tomorrow (please comment if you think there are other steps):
|
Alright I don't have perms to fix the lint issue on this branch, so I'll clean it up after merging |
i merged ci udpate from master |
@hengfeiyang hello, seems like this PR introduced several critical issues in Bluge, see discussion in blugelabs/bluge#123, so the latest version is broken. |
@FZambia i create a new pr sync our version in zinc. |
Thanks! Do you mean that in zinc you already have a version of Bluge which already solves those panics/errors? |
I'm not sure fixed your case, we just fixed some panic. Also i can do more test with your given test code. blugelabs/bluge#123 (comment) |
After that,
we can reduce index size from 28 MB -> 9 MB with the same test data.
on a bigger data set,
we can reduce index size from 900 MB -> 160 MB with the same test data.
test data:
https://github.com/zinclabs/zinc/releases/download/v0.1.1/olympics.ndjson.gz