Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ex/citar-search-pdf-contents to wiki #477

Closed
bdarcus opened this issue Dec 9, 2021 · 35 comments
Closed

Add ex/citar-search-pdf-contents to wiki #477

bdarcus opened this issue Dec 9, 2021 · 35 comments

Comments

@bdarcus
Copy link
Contributor

bdarcus commented Dec 9, 2021

I don't think we necessarily want to add this to citar per se (it's another dependency, and specific to PDF), but this could be a good candidate for the wiki.


Adding this in case others are interested; an idea from a private email.

Is there some way to elegantly:

  • search within some subset of (pdf) files (by, for example, author)
  • browse through the results, perhaps narrow them, and then to open the selected result's pdf at its respective page number

It seems to me a solution that hooks up citar and pdf-tools would be the way to go, and specifically to exploit pdf-occur.

image

It seems like pdf-occur-dired-do-search should be what we need. To use it, use dired to mark the files you want to search, and then run that command.

Just need to figure out which files to mark (using citar), and how much of this can be handled programmatically.

Originally posted by @bdarcus in #476

Turns out the solution is pdf-occur-search.

(defun ex/citar-search-pdf-contents ()
  ;; from localauthor
  "Search pdfs."
  (interactive)
  (let* ((refs (citar-select-refs))         
         (files (citar-file--files-for-multiple-entries refs citar-library-paths '("pdf")))
         (string (read-string "Search string: ")))
    (pdf-occur-search files string t)))
@roshanshariff
Copy link
Collaborator

roshanshariff commented Dec 9, 2021

If I understand correctly, the key step is to get a dired buffer showing all the files for selected entries?

Another way to accomplish this is to run citar-open-library-files, select the entries you want, then when offered the choice of files, run embark-export (which is available through embark-act). This creates a dired buffer with all the associated files, including those listed in the file field of the entry.

At that point you have to mark all the PDFs and you can run pdf-occur-dired-do-search. This last step is a little clunky, but it could be easier if pdf-occur-dired-minor-mode had a command to mark all PDFs (or search all PDFs without marking them).

@bdarcus
Copy link
Contributor Author

bdarcus commented Dec 9, 2021

If I understand correctly, the key step is to get a dired buffer showing all the files for selected entries?

That's right.

So you don't know a way to do this?

@roshanshariff
Copy link
Collaborator

roshanshariff commented Dec 9, 2021

The procedure I gave does this, using citar-open-library-files and embark-export. Do you mean a way to do it without Embark?

The code to look at is in embark-export-dired, and it basically comes down to calling dired with (DIR . FILES), where DIR is the directory ("/" works for this) and FILES is a list of files, which you can get from citar-file--files-for-multiple-entries.

The advantage of doing it this way is that it includes files given in the "file" field, rather than just files in a particular directory.

@bdarcus
Copy link
Contributor Author

bdarcus commented Dec 9, 2021

I mean using dired directly.

I'm not sure that's the best approach, but I wanted to experiment with it.

So, for example:

At that point you have to mark all the PDFs and you can run pdf-occur-dired-do-search.

Yeah, I want to avoid that, given we know upfront which files they are.

This seems related to previous discussions with @oantolin about "mark-all"-like behavior for running actions.

The advantage of doing it this way is that it includes files given in the "file" field, rather than just files in a particular directory.

That's true.

But I don't like the idea of requiring people to mark the files for this case, since the idea is we want to search the pdf contents for all of them.

Any ideas for solving that?

@bdarcus
Copy link
Contributor Author

bdarcus commented Dec 9, 2021

Hmm ... I'll experiment more, but the exported buffer doesn't include all the files; it includes the candidates against which one can run the citar-open-library command. I'm not seeing how to then search the contents of the list of PDFs associated with them.

@bdarcus
Copy link
Contributor Author

bdarcus commented Dec 9, 2021

OK, I see what you're saying. But it's definitely not elegant, with the bigger problem being the fact that on my vertico setup (Doom), once I select one candidate, the prompt is erased, and I have to duplicate the search again. That's not practical if I want files related to, say, 20 candidates.

But I agree the embark-export approach is in theory a good approach.

Do you happen to know how to get vertico to retain the search string?

@localauthor
Copy link
Contributor

localauthor commented Dec 9, 2021

Does this do what you're looking for?

(defun ex/citar-search-pdf-contents ()
  "Search pdfs."
  (interactive)
  (let* ((refs (citar-select-refs))
         (dir
          (if (equal (length citar-library-paths) 1)
              (car citar-library-paths)
            (completing-read "Directory to search: " citar-library-paths)))
         (files (citar-file--files-for-multiple-entries refs (list dir) '("pdf")))
         (string (read-string "String: ")))
    (pdf-occur-search files string t)))

The function pdf-occur-search bypasses dired and the marking file step. Is that the desired behavior?

@bdarcus
Copy link
Contributor Author

bdarcus commented Dec 9, 2021

Is that the desired behavior?

Yes, I believe it is! Here's the result, after entering the search string:

image

Or at least, this is what I was aiming for.

It does have the advantage that it includes the context for the search.

Now that I think about it, it also still seems to have some of the limitations in terms of multiple-candidates that I note above with embark.

We really need a way to run actions on "all" filtered candidates. But I don't believe that's currently possible.

@localauthor
Copy link
Contributor

localauthor commented Dec 9, 2021

What's the problem with running this on all the directories listed in citar-library-paths? Like this:

(defun ex/citar-search-pdf-contents ()
  "Search pdfs."
  (interactive)
  (let* ((refs (citar-select-refs))         
         (files (citar-file--files-for-multiple-entries refs citar-library-paths '("pdf")))
         (string (read-string "String: ")))
    (pdf-occur-search files string t)))

@bdarcus
Copy link
Contributor Author

bdarcus commented Dec 9, 2021

What's the problem with running this on all the directories listed in citar-library-paths?

There's no problem; I just had missed pdf-occur-search, and so was assuming a single directory for use with dired.

I updated the main post with this example code.

@localauthor
Copy link
Contributor

I've been holding out on using emacs for PDFs, but this function might be a tipping point. Great idea.

@bdarcus
Copy link
Contributor Author

bdarcus commented Dec 9, 2021

I added this to the wiki for now. I'm thinking we can use this to develop ideas for new functions.

https://github.com/bdarcus/citar/wiki/Example-functions#pdf-tools-integration

@roshanshariff
Copy link
Collaborator

I've also wanted to run actions on all filtered candidates, but I'm not sure where this functionality should be added. Perhaps consult-completing-read-multiple could have a "select all" command? Embark has embark-minibuffer-candidates and embark-vertico-candidates to collect the candidates, but as you can see this is somewhat specific to the completion system.

@bdarcus
Copy link
Contributor Author

bdarcus commented Dec 9, 2021

I've also wanted to run actions on all filtered candidates, but I'm not sure where this functionality should be added. Perhaps consult-completing-read-multiple could have a "select all" command? Embark has embark-minibuffer-candidates and embark-vertico-candidates to collect the candidates, but as you can see this is somewhat specific to the completion system.

I think @minad and @oantolin had concluded it probably belongs in embark for this reason. See:

oantolin/embark#253

... and in particular oantolin/embark#253 (comment).

@minad
Copy link
Contributor

minad commented Dec 9, 2021

I think @minad and @oantolin had concluded it probably belongs in embark fro this reason. See:

Yes, but it's not yet entirely clear how the exact solution for multi actions could look like. A rough plan:

  1. embark-act-all: A command which acts on all candidates in the current context retrieved from embark-candidate-collectors.
  2. Refine some candidate collectors to support marking. Some collectors like dired already do. For dired if files are marked only the marked files are returned, otherwise all files are returned.
  3. In the context of Vertico add a vertico-mark command and modify the embark--vertico-candidates collector to respect marking. Unfortunately this step is entirely UI dependent. (I've considered another alternative where an embark-select/collect-single command is added to Embark only. This command takes the current target/candidate and adds it to the last Embark collect buffer. After collecting all the candidates the user can run embark-act-all in the collect buffer. But this approach does not sound user friendly.)

Step 2 and 3 are optional, the only moderate addition needed to act on all candidates is embark-act-all. However without 3 I suspect that acting on multiple candidates is not convenient since then the user has to ensure that only relevant candidates remain after filtering.

@bdarcus
Copy link
Contributor Author

bdarcus commented Dec 9, 2021

Step 2 and 3 are optional, the only moderate addition needed to act on all candidates is embark-act-all. However without 3 I suspect that acting on multiple candidates is not convenient since then the user has to ensure that only relevant candidates remain after filtering.

True, but 1 might be a reasonable first step?

At least for this use case (user wants to search the content of PDFs of associated with the selected candidates), it would be, since there's no harm in some non-relevant candidates.

@minad
Copy link
Contributor

minad commented Dec 9, 2021

True, but 1 might be a reasonable first step?

Sure, but it is up to @oantolin if he would accept such a PR. Adding such a command should be a small addition. Maybe a bit of refactoring of embark-act is needed such that the acting logic can be reused for single and multiple candidates. In any case embark-act-all should probably ask the user for confirmation to avoid bad surprises.

If I recall correctly we also discussed another possible solution for multiple candidates, where target finders are modified such that they don't return a single candidate but a list of multiple candidates. I am not sure if that would be better. It seems better to me to reuse the embark-candidate-collectors for the act multi functionality.

@oantolin
Copy link
Contributor

oantolin commented Dec 9, 2021

Reusing embark-candidate-collectors for an embark-act-all sounds good to me. And, as @minad says, it's a relatively small addition that may require refactoring embark-act a bit.

@bdarcus
Copy link
Contributor Author

bdarcus commented Dec 10, 2021

On this @roshanshariff ...

The advantage of doing it this way is that it includes files given in the "file" field, rather than just files in a particular directory.

Am I right we should add something like this first function?

(defun citar-file--files-from-field-multiple (keys-entries)
  "Extract files from field for KEYS-ENTRIES."
  (seq-map
   (lambda (ke)
     (citar-file--parse-file-field (cdr ke) citar-file-variable))
   keys-entries))

(defun ex/citar-search-pdf-contents ()
  "Search pdfs."
  (interactive)
  (let* ((refs (citar-select-refs))
         (files
          (citar-file--files-for-multiple-entries
           refs citar-library-paths '("pdf")))
         (parsed-files
          (citar-file--files-from-field-multiple refs))
         (string (read-string "Search string: ")))
    (pdf-occur-search (concat files parsed-files) string t)))

@oantolin
Copy link
Contributor

oantolin commented Dec 10, 2021

I wrote an embark-act-all command (see oantolin/embark@91e6db4) and bound it to A in the general map (of course, you can also bind it outside the embark keymaps if you wish, like in the minibuffer map for your completion framework). Please test!

@roshanshariff
Copy link
Collaborator

roshanshariff commented Dec 10, 2021

Am I right we should add something like this first function?

citar-file--files-for-multiple-entries already includes files from both sources, the file field and the library directories. So the function that @localauthor wrote above should already do the right thing.

EDIT: I forgot to update its documentation to say that's what it does. I'll open a small PR to fix that.

@roshanshariff
Copy link
Collaborator

roshanshariff commented Dec 10, 2021

@oantolin, do you think it makes sense to be able to mark a command to run once with all the candidates, rather than repeatedly for each candidate? That would be useful here, for running a search over the PDF files of all entries.

Of course, Embark would need to know how to coalesce all the candidates into one minibuffer input, maybe assuming the command uses completing-read-multiple?

EDIT: This is discussed in oantolin/embark#166.

@minad
Copy link
Contributor

minad commented Dec 10, 2021

@roshanshariff Yes, I believe this is something we also considered at some point in the old discussion. For crm commands embark-act-all should join the candidate and pass them at once. This should fit cleanly into the current picture. However it may be a bit problematic to auto detect crm in practice. Furthermore there is the problem that crm string joining is not well defined since we don't know the crm separator. So maybe we need an embark-crm-action variable?

For my taste the current embark-act and embark-act-all code is a bit messy (no offense @oantolin :-P) and it will get worse if we add the crm target injection code path. So I guess we should massage all this a little bit. See also oantolin/embark#419, there are still a few minor issues with the current embark-act-all.

@minad
Copy link
Contributor

minad commented Dec 10, 2021

This reminds me of another point - do we also need a CRM candidate collector which only collects the selected candidates instead of all candidates? At this point the design clashes a bit with a potential vertico-mark command, see #477 (comment). vertico-mark would be usuable in the context of non-crm commands, but it wouldn't make so much sense in the crm context.

Maybe I should deprecate and remove consult-completing-read-multiple and replace it with a proper vertico-crm implementation. In the context of vertico-crm, vertico-mark would play a dual role as candidate selector for crm and as candidate subset selector for the embark candidate collector. In the context of non-crm vertico-mark would only be relevant for the candidate collector. This approach could lead to a uniform crm and completing-read UI.

@roshanshariff
Copy link
Collaborator

@minad, I for one really like the idea of a vertico CRM implementation 👍

The vertico-mark approach opens the door to some really nice UI improvements that seem beyond the reach of consult-crm. Since many Citar commands use CRM, their workflows would be quite elegant. And, of course, you could easily address the issue we have here by having a "mark all candidates" command.

@oantolin
Copy link
Contributor

@minad said:

For my taste the current embark-act and embark-act-all code is a bit messy (no offense @oantolin :-P)

None taken! It's messy for my taste too. In the case of embark-act-all it's because I wanted to get a decent prototype out the door in the time I had yesterday and would clean up later. In the case of embark-act, I don't really have an excuse. 😛

@roshanshariff Yes, we should definitely think of adding a way to pass all targets at once to an action. One issue is that crm-separator is not something you can use to join candidates but a regexp that recognizes such strings. I guess a simple table mapping common values for crm-separator to a string you can use for joining might be good enough. I believe that's what Selectrum uses for example.

Also, I was thinking that maybe the vertico-mark functionality can be added in a completion UI independent way with the old embark-select idea. That idea was to have a command to add a single target to a special Embark collect buffer, and then from that buffer one could run an action on all candidates. To me the only thing that sounds awkward about that is switching to the special collect buffer to run embark-act-all and then probably killing the buffer. This could be remedied by making the buffer ephemeral (= killed automatically when when you quit the minibuffer) like the Embark collect completions buffer, and adding a candidate collector with higher priority that, if the special collect buffer exists, returns those candidates.

@minad
Copy link
Contributor

minad commented Dec 10, 2021

@oantolin yes, this is the embark-select idea I had in mind and we discussed earlier. It does not sound that bad in the way you describe it 😄

I suspect that the UI would be slightly worse than a specifically tailored vertico UI. But with the big advantage of being completion UI independent. And the other big advantage that I don't have to change anything in vertico 😛

@minad
Copy link
Contributor

minad commented Dec 10, 2021

I think we should at least give the UI agnostic approach in embark a try as a prototype. If it turns out that it is not user friendly enough we can still opt for UI specific solutions.

In the case of embark collect, the UI specific solution could rely on marking the candidates directly in the collect buffer. You've mentioned some addition to the tabulated list mode a while ago which allows marking. Maybe this already works in the candidate collector? Does it also work in grid mode?

@bdarcus
Copy link
Contributor Author

bdarcus commented Dec 10, 2021

@oantolin

I wrote an embark-act-all command ...

Thanks! I tested it a bit, and seems to work as expected.

@minad

For crm commands embark-act-all should join the candidate and pass them at once. This should fit cleanly into the current picture.

This indeed was my wonder in terms of next steps.

The obvious cases here where it's important to do this are:

  1. this one; searching many PDFs at once
  2. citar-copy-reference, which adds formatted bibliographic entries to the kill ring; obviously if I do this with 10 of them and yank, I get only one reference (plus we could sort them with a list)

Maybe I should deprecate and remove consult-completing-read-multiple and replace it with a proper vertico-crm implementation. In the context of vertico-crm, vertico-mark would play a dual role as candidate selector for crm and as candidate subset selector for the embark candidate collector. In the context of non-crm vertico-mark would only be relevant for the candidate collector. This approach could lead to a uniform crm and completing-read UI.

How might such a mark UI work vis-a-vis standard selection?

I wonder if, from a user POV, the only difference is with the latter the prompt is reset upon selection, while in the latter it would stick?

We earlier discussed on the selectrum UI the idea to allow something like shift-TAB to preserve the input.

radian-software/selectrum#489 (comment)

Does that have promise?

@minad
Copy link
Contributor

minad commented Dec 10, 2021

@bdarcus

How might such a mark UI work vis-a-vis standard selection?

For crm vertico-mark would be the same as standard selection. For normal cr commands vertico-mark would not be relevant for the return value but only for multi actions via embark-act-all. However I think we should first try to implement the embark-only solution since it will integrate automatically with all different frontends. Only if the result is not satisfactory we should consider UI-specific solutions.

As you probably remember before I made consult-crm I had a vertico-crm prototype. But the prototype was insufficient and didn't work in all scenarios. Therefore I discarded that approach and went with consult-crm. It is a lot less obvious how the perfect vertico-crm UI looks like if you are really implementing it in Vertico. At this point you have a lot of flexibility but I also don't want to blow up the scope of Vertico infinitely. The consult-crm UI already goes a long way with limited means and the advantage that it can be used with different UIs.

We earlier discussed on the selectrum UI the idea to allow something like shift-TAB to preserve the input.

A shift-TAB input preserving submit command can be easily implemented for consult-crm in the user config or like in doom. Similarly one could implement it as part of vertico-crm if such a thing materializes.

@bdarcus
Copy link
Contributor Author

bdarcus commented Dec 10, 2021

I think we should first try to implement the embark-only solution since it will integrate automatically with all different frontends.

Makes sense.

And the idea here is one would mark in a collect buffer? So a workflow something like ...

  1. filter candidates
  2. embark-collect-snapshot
  3. mark candidates somehow (similar to dired, helm, tablist)
  4. run command

...?

A shift-TAB input preserving submit command can be easily implemented for consult-crm in the user config ...

Can you please give me an example?

@minad
Copy link
Contributor

minad commented Dec 10, 2021

Can you please give me an example?

I don't have an example at hand. But the +vertico commands of Doom which are used for consult-completing-read-multiple can store the (minibuffer-contents) and inject the contents again in the next iteration.

@bdarcus
Copy link
Contributor Author

bdarcus commented Dec 10, 2021

WDYM by "next iteration"?

@minad
Copy link
Contributor

minad commented Dec 10, 2021

consult-crm behaves like multiple calls to completing-read in a loop. After each selection a "new iteration" is started. However technically it works differently internally, so iteration is not really the right word. What I should say, after submitting the selection the original input string should be inserted again.

@bdarcus
Copy link
Contributor Author

bdarcus commented Dec 11, 2021

I'm going to close this, since it's now on the wiki, and we've done what we can with it.

But I think this is a really cool little command that demonstrates the value of the clean design of citar more generally. For both reasons, we may want to include it in citar proper at some point.

Separately, I did suggest to iyefrat to include shift-TAB in the doom vertico module, and he liked the idea.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants