Improving archiver space efficiency #3
Mainstream video codecs (like H.265) can reduce storage usage a lot by encoding only the differences between frames. So I think this goal could be met by the future video-based archiver, which is also on the ROADMAP. What do you think about it?
But video encoding has a lot of tuning options, so the early implementation probably won't be the best one (most likely)... 🫣
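If the video route is taken, a first experiment could be as simple as feeding the screenshot sequence to ffmpeg. A minimal sketch, assuming a numbered `frames/%06d.png` sequence and that ffmpeg with libx265 is installed (the paths, frame rate, and CRF value are all assumptions, not part of the project):

```python
# Sketch: turn a directory of periodic screenshots into an H.265 video,
# letting the codec store only the inter-frame differences.
import subprocess

def build_encode_cmd(pattern="frames/%06d.png", out="archive.mkv",
                     fps=0.5, crf=28):
    """Build an ffmpeg command; higher crf = smaller file, lower quality."""
    return [
        "ffmpeg",
        "-framerate", str(fps),   # capture rate, e.g. 1 frame per 2 s
        "-i", pattern,            # numbered screenshot sequence
        "-c:v", "libx265",        # H.265 encoder
        "-crf", str(crf),         # constant-quality mode
        out,
    ]

cmd = build_encode_cmd()
# subprocess.run(cmd, check=True)  # uncomment to actually encode
```

Tuning `-crf` (and `-preset`) is exactly where the "lots of tuning options" problem shows up.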
I just tried it on my Linux machine:
With the default H.265 encoding, the video takes about 4 MiB for every 10 minutes.
So that's roughly 200 MiB for 8 hours of daily usage, and about 1.4 GiB per week.
The file size of the video always depends on the content, so it will grow with more complex content, but I don't think it would exceed 10x that. I think it's good enough to use for now. 🤩 What do you think about it? @AsterNighT
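A quick back-of-envelope check of the figures quoted above (the only input is the measured 4 MiB per 10 minutes):

```python
# Back-of-envelope check of the numbers above: 4 MiB per 10 minutes.
MIB_PER_10_MIN = 4

per_hour = MIB_PER_10_MIN * 6     # 24 MiB per hour
per_day = per_hour * 8            # 8 h of daily usage -> 192 MiB
per_week = per_day * 7            # 1344 MiB ~= 1.31 GiB

print(per_day, round(per_week / 1024, 2))  # 192 1.31
```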
That makes sense. Actually, I hadn't heard of rewind before.
I tried a few seemingly viable ways of doing screen capturing and video encoding.
I dove into the details of what rewind does.
It really does the same thing: when there aren't many changes in the content, it drops some frames and falls back to about one image per 20 seconds. When there are lots of changes on the screen, it uses 0.5 fps.
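The adaptive rate described above can be sketched in a few lines. The threshold and the two intervals are assumptions taken from the numbers in this thread, not anything rewind documents:

```python
# Hypothetical sketch of the adaptive capture rate described above:
# fall back to ~1 frame per 20 s when the screen is mostly static,
# speed up to 0.5 fps (1 frame per 2 s) when it is changing a lot.
IDLE_INTERVAL_S = 20.0    # assumed idle rate (~1 image per 20 s)
BUSY_INTERVAL_S = 2.0     # 0.5 fps
CHANGE_THRESHOLD = 0.05   # assumed fraction of changed pixels

def next_capture_interval(changed_fraction: float) -> float:
    """Pick the delay before taking the next screenshot."""
    if changed_fraction < CHANGE_THRESHOLD:
        return IDLE_INTERVAL_S
    return BUSY_INTERVAL_S
```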
There are lots of algorithms for image similarity detection... I've been a bit lost in them. Maybe I'll make a simple one (histogram comparison) and a heavy one (OpenCV), and make it extensible.
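For reference, the "simple" path could look like this: compare normalized grayscale histograms with histogram intersection. Frames are plain lists of 0-255 pixel values here for illustration; a real implementation would decode the screenshots and likely use NumPy or OpenCV instead:

```python
# Minimal sketch of histogram-comparison similarity (an assumption,
# not the project's actual filter).
def histogram(pixels, bins=16):
    """Normalized grayscale histogram of 0-255 pixel values."""
    hist = [0] * bins
    for p in pixels:
        hist[p * bins // 256] += 1
    total = len(pixels)
    return [h / total for h in hist]  # normalize so any frame sizes compare

def similarity(pixels_a, pixels_b, bins=16):
    """Histogram intersection: 1.0 = identical distributions, 0.0 = disjoint."""
    ha, hb = histogram(pixels_a, bins), histogram(pixels_b, bins)
    return sum(min(a, b) for a, b in zip(ha, hb))
```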
I'm not sure, but would manually detecting image similarity outperform just encoding with a video codec? It would be more tunable, indeed. But by analogy: if you wanted to compress text, wouldn't it be more accessible to use a compression algorithm than to manually detect text similarity and deduplicate it?
I don't think simple algorithms like histogram comparison would be very effective. Consider this: you are reading a very long markdown article on, say, GitHub. There will be loads of text, and clearly you would like that text to be recorded. But the histogram of the article will stay almost the same (after all, it's only text; in that sense the frames are "similar").

Or rather, maybe the filtering should be done after tesseract, not before, since it is the text that gets searched, not the image itself. I'm thinking of something like "retain the word set of the most recently captured X screenshots and calculate the similarity between the current pic and that set". I've never done such a thing before, so I'm not sure it works.
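The word-set idea above could be sketched like this: keep the union of words OCR'd from the last X screenshots and compare each new frame's words against it. Jaccard similarity and the window/threshold values are assumptions for illustration:

```python
# Sketch of the "word set of the last X screenshots" filter proposed
# above. Jaccard similarity over word sets is an assumed metric, not
# the project's actual design.
from collections import deque

class WordSetFilter:
    def __init__(self, window=5, threshold=0.9):
        self.recent = deque(maxlen=window)  # word sets of last X frames
        self.threshold = threshold

    def should_archive(self, ocr_text: str) -> bool:
        words = set(ocr_text.lower().split())
        seen = set().union(*self.recent) if self.recent else set()
        union = words | seen
        overlap = len(words & seen) / len(union) if union else 1.0
        self.recent.append(words)
        return overlap < self.threshold  # archive only frames with novel text
```

A frame whose OCR text is almost entirely covered by recent frames (e.g. slow scrolling through the same article) would be skipped, while a page change would pass.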
The idea of the project is quite interesting. I haven't really tested it for long, but it seems promising. The greatest drawback right now seems to be the archiver's storage consumption. It takes only about 3 minutes to produce 100 MB of screenshots.
The screenshots seem to contain lots of duplicates. It would be good to have a filter before an image is ever archived. A first idea is to check for duplicated images with hashes. It is also possible to "rank" the images based on the text extracted from them, but this would require careful research and design.
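As a concrete starting point for the hash idea: exact hashes (e.g. SHA-256 of the file) only catch byte-identical frames, so a perceptual hash is the usual choice. A minimal average-hash (aHash) sketch, operating on an already-downscaled 8x8 grayscale grid (the downscaling step and the distance threshold are assumptions):

```python
# Sketch of a hash-based duplicate filter: average hash (aHash).
# grid: 8x8 list of 0-255 grayscale values, assumed already downscaled
# from the screenshot; real code would decode and resize the PNG first.
def average_hash(grid):
    """Set one bit per pixel above the mean -> 64-bit fingerprint."""
    flat = [p for row in grid for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming(h1, h2):
    """Number of differing bits between two hashes."""
    return bin(h1 ^ h2).count("1")

def is_duplicate(h1, h2, max_distance=5):
    """Near-identical frames differ in only a few hash bits."""
    return hamming(h1, h2) <= max_distance
```

Unlike an exact hash, this also catches frames that differ only by a clock tick or a cursor blink, which is probably what most of the duplication looks like.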