Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft: V6 with rmscene #59

Draft
wants to merge 59 commits into
base: master
Choose a base branch
from
Draft

Conversation

Azeirah
Copy link
Contributor

@Azeirah Azeirah commented Mar 6, 2023

This pull request should not be merged.

I post this here as a pull request to use Github's diff interface.

I think it's best to merge this step-by-step. My proposal is to create a branch along with a respective issue for each (top-level) point below. Some points are good to merge immediately, so they will take only minutes, others will benefit strongly from further discussion, cleanup and testing so they might take a bit longer.

  • Depend on rmscene
  • Poetry.toml should be merged by hand. I believe the only change I made was to add rmscene as a dependency, my merge uses more outdated packages compared to upstream master (lucasrla/remarks)
  • The commit in remarks/conversion/drawing.py should definitely be turned into a pull request. Sometimes a line has only one point, whereas a segment for PyMuPDF requires at least two points. I currently ignore it, but it might be a better alternative to render a point?
    • Exact same scenario occurs in get_ann_max_bound in parsing.py
  • Parsing and rendering .rm v6 documents in parsing.py, remarks.py and utils.py. This is very much a work in progress. It renders lines correctly in a couple of scenarios, and close to correctly in some others. Very poorly in others though.
  • Tests are probably not a good idea to directly copy, because they tend to contain private or copyrighted material.
  • There are a couple of tiny refactors included in the code such as the changes to the process_ocr function in remarks.py, it had an unused parameter. I cleaned this up without affecting any behavior whatsoever.
  • I added an @cache annotation to read_meta_file in utils.py. This reduces filesystem reads for the .metadata file to 1. If you use remarks in bulk this does save some time. I like how Python allows you to optimize unoptimally written code this simply :)

If you agree with the approach @lucasrla, I think it's best if I make the issues and branches myself since I know what code belongs together.

@Azeirah Azeirah marked this pull request as draft March 6, 2023 20:38
@lucasrla
Copy link
Owner

lucasrla commented Mar 8, 2023

Thanks, Laura!

Let me divide your commits into four buckets:

  1. Immediate merges
  • If you submit "Add: Reduce number of file reads" (2583378) as a PR, I'll merge it right away
  1. More info and testing needed
  • Are you sure "Refactor: Remove unnecessary arg from ocr" (bd94728) doesn't break anything? I think the reference had to be updated (but my memory could be wrong):

remarks/remarks/remarks.py

Lines 472 to 473 in 12e1bde

# Update ann_page reference to the OCRed page
ann_page = work_doc[0]

  1. Not necessary anymore
  • "Fix error where NoneType has no len()" (61d5fa1) has already been merged upstream via f3550d2
  1. Work towards supporting v6 .rm files (reMarkable >= 3.0)
  • I'll first update one of my devices to v3.2. Then I'll have a look at your commits and start a separate branch to receive PRs related to v6 .rm files.

Sounds good?

Thanks again!

@lucasrla
Copy link
Owner

lucasrla commented Mar 8, 2023

@Azeirah
Copy link
Contributor Author

Azeirah commented Mar 8, 2023

If you submit "Add: Reduce number of file reads" (2583378) as a PR, I'll merge it right away

I made a PR for the cache annotation.

Are you sure "Refactor: Remove unnecessary arg from ocr" (bd94728) doesn't break anything? I think the reference had to be updated (but my memory could be wrong):

I'm not 100% sure, no. I relied on PyCharm's inspections.

afbeelding

Although I do think it's correct in this case. ann_page gets passed to process_ocr. It is not used in the body. Then, ann_page is redefined to be work_doc[0] and lastly gets returned. In the caller-site, the original ann_page reference is overwritten by this line:

work_doc, ann_page = process_ocr(work_doc, ann_page)

There are no other callers.

@wittmeis
Copy link

Hi Laura, thanks for your prompt reply.

Frankly, I am not using the type folio but it could be that I once hit the type button in the menu. Maybe this is enough to change the coordinate system?

I will check myself first with a simple and clean test notebook. In case it does not work, I can also provide the test as a PR.

@wittmeis
Copy link

wittmeis commented Jul 31, 2023

Sorry but I have another question and I do not know where to post this....

Have you considered to use the USB web API for converting annotated PDFs and notebooks to PDFs?

I have found this library here but have not tried it yet. The advantage would be that RM2 format changes are not important and I would assume that the USB web API is more stable. Moreover, in case of the notebooks the used template would be also part of the PDF. Of course it requires the device to be turned on to do the conversion on the device.

For my personal use-case a possible setup could be:

  • rsync for creating a backup of the device
  • a Python tooling that checks for updated files and triggers the re-conversion of these files using the USB API

@Azeirah
Copy link
Contributor Author

Azeirah commented Jul 31, 2023

Sorry but I have another question and I do not know where to post this....

Have you considered to use the USB web API for converting annotated PDFs and notebooks to PDFs?

I have found this library here but have not tried it yet. The advantage would be that RM2 format changes are not important and I would assume that the USB web API is more stable. Moreover, in case of the notebooks the used template would be also part of the PDF. Of course it requires the device to be turned on to do the conversion on the device.

For my personal use-case a possible setup could be:

* rsync for creating a backup of the device

* a Python tooling that checks for updated files and triggers the re-conversion of these files using the USB API

If it fits your use-case better, then I suppose the web interface is the way to go. It's not the use-case I'm looking for though. I need something that works with the API.

@torbenkeller
Copy link

Hey @Azeirah, awesome work! I really apreciate that! How is the state of this branch? Could I use it without fear?

@Azeirah
Copy link
Contributor Author

Azeirah commented Oct 15, 2023

Hey @Azeirah, awesome work! I really apreciate that! How is the state of this branch? Could I use it without fear?

It mostly works pretty well, these two are the largest limitations:

  • Does not output OCRed text very well or in some cases at all (working on this)
  • Annotated pages are a lot larger than PDF pages so the file looks inconsistent

There might also still be undiscovered bugs.

Overall it's pretty stable, over 80 users of https://scrybble.ink are using this branch to sync their documents to Obsidian.md

@benneti benneti mentioned this pull request Jan 24, 2024
13 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants