Extracting comments, highlights according to the colors #820
-
I had raised issue #819 and would like to continue the discussion here, as @JorjMcKie suggested :) The idea is to extract the comments, highlights, or any other annotations (or the text below them) from a document. I am thinking of creating atomized notes directly out of a PDF.

I am aiming to create a system for myself where I can read a PDF, annotate it in the PDF itself, take short notes or maybe questions (for Active Recall), and then, when I am done reading, extract them all into a file. But since I am interested in atomic notes, I need to have them in separate files, all linked from a single file. The reasons for having atomized notes are twofold: I get to use the same fact/point in multiple places (I am using Obsidian), and I can have them inserted directly into Anki.

Currently, with the help of @JorjMcKie, I am able to understand a few things about comments, which I had planned to use as the main tool for capturing my notes (irrespective of the device I am on). But I think this can be expanded to produce more formatted notes using coloured annotations and even the highlights (the text under them, as I came to understand :) ). So, here are the next questions I have:
Let's have a discussion on how I can proceed, or even whether it's a good idea to go for this kind of system.
Replies: 9 comments 28 replies
-
To continue on the text extraction topic: we had been touching on Popup annotations.
-
So, @dummifiedme - you can check the popup existence for every annotation type (via
-
Let me help out:
-
You could have used The first is the default. To set the second, use
-
Best remember the coordinates where you found each piece of text and store them together with the text in an intermediate list. When through with the page, sort that list to your liking and write the text to its destination.
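The collect-then-sort idea could look roughly like this. It is only a minimal sketch in plain Python: the `entries` list and the `sort_reading_order` helper are made-up names for illustration, and the coordinate tuples stand in for what you would take from each annotation's `rect` (x0, y0, x1, y1) while looping over the page.

```python
from typing import List, Tuple

# (x0, y0, x1, y1) of the annotation rectangle, plus the text found there.
Entry = Tuple[Tuple[float, float, float, float], str]


def sort_reading_order(entries: List[Entry]) -> List[str]:
    """Return the texts sorted top-to-bottom, then left-to-right."""
    # PyMuPDF page coordinates grow downwards, so a smaller y0 is higher up.
    ordered = sorted(entries, key=lambda e: (e[0][1], e[0][0]))
    return [text for _, text in ordered]


# Hypothetical data, in whatever order the annotations happened to be stored.
entries = [
    ((72, 400, 300, 415), "third note"),
    ((72, 100, 300, 115), "first note"),
    ((320, 100, 500, 115), "second note"),
]

for line in sort_reading_order(entries):
    print(line)
```

Any other sort key works the same way, for example sorting by x0 first if the page has columns.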
-
As for your regex point: my personal position towards regex is more of the type: (1) avoid using it, (2) if you think you absolutely need it, think again. Or, as the Python documentation words it:
-
With help from @JorjMcKie and #318, I have worked out a way to extract comments (text annotations) and highlighted text from a PDF. I would still like to implement two more things:

As for point 1, as @JorjMcKie explained, I can see the color property of an annotation (it gives both the stroke and the fill colour), but I don't yet know how to classify the colours into categories (such as red, orange, yellow, blue, etc.). I am not sure, but maybe I can define the colours using a dict? Still, I would like to have ranges, so that light red to dark red would be "red" and, similarly, violet to dark blue would be "blue". How can I do that?

For point 2, I can see the type of every annotation, but I don't yet know how to capture the image under the rect that bounds it. Extracting the text under a box is understood (#318), but what about images? What if I just want to capture a screenshot of anything that is inside a "Square" or any other shape? Also, can we capture "Ink" type annotations in image form? If point 2 is satisfied, we could even draw a square around our ink annots and get them inserted into the note! Seems awesome to me :p
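For the colour ranges in point 1, one possible approach (just a sketch, not something confirmed in this thread) is to convert the RGB value to HSV with the standard library's `colorsys` and bucket by hue, so that light and dark shades of one hue get the same name. The `HUE_BUCKETS` boundaries and the `colour_name` helper below are my own illustrative choices, and the input is assumed to be an (r, g, b) tuple of floats in 0..1, which is the form PyMuPDF reports for an annotation's stroke colour.

```python
import colorsys

# Upper hue bound in degrees -> name. Purely illustrative; adjust to taste.
HUE_BUCKETS = [
    (15, "red"),
    (45, "orange"),
    (70, "yellow"),
    (170, "green"),
    (290, "blue"),      # deliberately wide so that violet counts as blue
    (345, "magenta"),
    (360, "red"),       # hue wraps around, so the top end is red again
]


def colour_name(rgb):
    """Map an (r, g, b) tuple of floats in 0..1 to a coarse colour name."""
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    if v < 0.2:                      # very dark: hue is not meaningful
        return "black"
    if s < 0.15:                     # almost no saturation: a shade of grey
        return "white" if v > 0.85 else "grey"
    hue = h * 360
    for upper, name in HUE_BUCKETS:
        if hue < upper:
            return name
    return "red"


for sample in [(1, 0.6, 0.6), (0.5, 0, 0), (0.5, 0, 1), (0, 0, 0.5), (1, 1, 0)]:
    print(sample, colour_name(sample))
```

The resulting name could then serve as a dict key for grouping the notes, for example one output section per colour.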
One quick question.
I had added some images inside the "TEXT" type annotations, are they accessible by any chance? The code breaks whenever it hits an image, I think. How can I include those images?