Consider programmed completion instead of cache #681
Comments
I thought this was an interesting idea, so I whipped up a prototype here: https://github.com/aikrahguzar/citar/blob/main/citar-no-cache.el It works well enough; not as snappy as using the cache, but it is responsive even for the 6000-entry bibliography. It has its oddities due to searching before parsing. For example, if you search for

I am not going to pursue it much further than this, but I will leave it here in case someone wants to use it as a starting point.
From playing with other implementations, that seems the unavoidable downside of this approach. We've previously discussed possible customization options relating to performance; perhaps this ends up being one of them.
Thanks. If you prefer, I can also open a branch here to put it. I'm agnostic. Hey - can you review #683 if you have a bit of time?
It won't be snappy unless you make use of QUERY during search.
Up to you, I don't have any git preferences usually except not wanting to interact too much with it.
I will take a look a little later.
That is pretty difficult to do well because of the completion styles. They don't give us a regex to run, and any assumptions about the completion style can lead to bad results. For example, running a regexp assuming basic when the actual style is orderless will lead to most results not appearing. A split query like consult uses for async completions can work well, i.e. there is an initial segment which is used to generate matches using some simple regexp, and another segment that is filtered using the full completion styles. Edit: Actually the situation is pretty bad even now. I don't really know how to handle a query like
If you allow arbitrary completion styles then you indeed cannot do much about search optimizations. The best is probably
Sure. I do not think that using the default completion style makes much sense here. The completion style generally works with the completion table entries; their format is probably arbitrary and optimized for display, not for completion. I'd just go with a custom completion style here, with a separate customization to change it.
I tried something like this and now it is faster. Basically, the first word of the query is used to find matches, and I put checks in place to make sure the keys like "author" don't generate matches, only the values. Completion styles are used for the whole query string to further narrow the matches. This works well enough, even though fields that aren't going to be parsed can still generate matches; that isn't as big a problem, I think.
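A rough sketch of that two-stage idea, using hypothetical names and a placeholder .bib path (this is not the code in citar-no-cache.el): the first word of the input drives a cheap regexp scan of the file, a crude check skips matches on field names such as "author", and the active completion style (this works best with something like orderless) then filters the collected candidates against the full query.

```elisp
(defun my/bib-candidates (word)
  "Return lines of the bibliography whose values contain WORD."
  (with-current-buffer (find-file-noselect "~/references.bib") ; placeholder path
    (save-excursion
      (goto-char (point-min))
      (let (cands)
        (while (re-search-forward (regexp-quote word) nil t)
          ;; Crude key check: if the match is immediately followed by
          ;; "=", it hit a field name such as "author", not a value.
          (unless (looking-at-p "[[:space:]]*=")
            (push (buffer-substring-no-properties
                   (line-beginning-position) (line-end-position))
                  cands))
          (forward-line 1))
        (delete-dups (nreverse cands))))))

(defun my/bib-table (string pred action)
  "Programmed completion table that prescans on STRING's first word."
  (let ((word (car (split-string string))))
    (complete-with-action
     action
     (when (and word (>= (length word) 3))
       (my/bib-candidates word))
     string pred)))

;; (completing-read "Reference: " #'my/bib-table)
```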
This is more a polish issue, not important ATM, but it might be nice if it loaded, say, the last 100 selected references from history initially. I find it a little disconcerting that running

@jdtsmith - you might want to play with this from @aikrahguzar? https://github.com/aikrahguzar/citar/blob/main/citar-no-cache.el Seems there's an issue with Unicode?
To me it shows the first entries it finds. I did push a change that caused an infinite loop, but that is resolved now. Maybe try again and see if you see some entries initially. The problem with getting entries from history would be that there won't be an actual entry attached to them.
Very strange, since the actual parsing of the entry is still done by parsebib. Edit: I have now exactly matched the arguments to
Took a quick peek: this seems harder to implement and more error-prone for org-bibtex, since that can have "arbitrary data" outside the property drawer.

I guess you could also describe using
You can limit regexp matching to node properties. This will cut off most of the matches inside headline contents.
Good point. The other problem I see with this approach is that it "bakes in" the completion style into the type of regexp in-buffer search being used to find candidates. All the completion styles rely on having the results of

Here's an approach that seems sensible for org-bibtex files:
One thing that isn't clear to me: how does citar currently handle merging all the information for a given bib record, like title, keywords, author, etc., so that all that info can be searched together as if it was just a single candidate?
See Lines 108 to 113 in 731c0ae
It's one of those details I'm unclear on how it would be handled in this approach.
Not following this part. We don't use annotation; we use affixation, and only for the "has" prefix. And that has to be real-time. Everything else related to candidate formatting is cached in the

But it's why the first load is slower.
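For reference, a hedged sketch of what an affixation function of that shape can look like (hypothetical names; this is not citar's implementation): only the small prefix is computed in real time, while the candidate strings themselves are assumed to be precomputed elsewhere.

```elisp
(defun my/has-files-p (_candidate)
  "Stand-in for a real-time check, e.g. whether a PDF is attached."
  nil)

(defun my/affixation (candidates)
  "Return (CANDIDATE PREFIX SUFFIX) triples; only PREFIX is computed live."
  (mapcar (lambda (cand)
            (list cand (if (my/has-files-p cand) "* " "  ") ""))
          candidates))

(let ((completion-extra-properties '(:affixation-function my/affixation)))
  (completing-read "Reference: " '("Smith (2020)  An Example Title")))
```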
Sorry, now I'm not following, and figure I'd better before I answer. In a completing-read situation with some references showing like this:

is it the case that the "candidate strings" passed to completing-read include all of the formatted author + year + title + bibcode + type + keywords text? And only the flags on the LHS (none here) are added as a prefix? If that's the case, how do you then recover the bib record from the strings that completing-read returns, which might look like:
By parsing them for their bibcodes?

Update: by tracing completing-read, I see your trick: you place the bibcode at the beginning of the string and then make it invisible.
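A minimal sketch of that trick, with hypothetical names rather than citar's exact code: the key is prepended to the formatted candidate and hidden with the `invisible` text property, so it remains part of the string (and therefore matchable and recoverable) without being displayed.

```elisp
(defun my/format-candidate (key formatted)
  "Return FORMATTED prefixed with an invisible KEY."
  (concat (propertize key 'invisible t) " " formatted))

;; (my/format-candidate "smith2020" "Smith (2020)  An Example Title")
;; The minibuffer displays only "Smith (2020)  An Example Title", but
;; the underlying string still begins with "smith2020".
```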
Yes, along with some hidden text.
Correct.
By looking up the candidate string in the hash table returned by the code at line 550 in 731c0ae.
Does that answer your question?
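In other words, something along these lines (a hedged sketch with hypothetical names): the formatted candidate string is itself the key into a hash table that maps candidates to parsed entries.

```elisp
(defvar my/candidates (make-hash-table :test #'equal)
  "Map of formatted candidate string to parsed entry.")

(defun my/select-entry ()
  "Prompt for a reference and return the parsed entry behind it."
  (gethash (completing-read "Reference: " my/candidates) my/candidates))
```

Since `equal` ignores text properties on strings, the lookup still works even though the stored keys carry an invisible prefix.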
Thanks, I understand better now. I guess the tricky bit here is that, unlike say dabbrev, or file completion, or command completion, "what the completion text should be" is a bit harder to pin down for a bib record. Just the bibcode? Author(s) + title? Author + title + year? Keywords? So I do understand this choice.

In terms of caching annotations, that would be relevant if most of the text in the minibuffer were annotation data, not candidate strings. For example, the candidate string could be (hidden bibcode +) author + title, and then everything else is just nicely formatted window dressing that can be computed just-in-time for the visible entries and temporarily cached for good speed when scrolling through the list. This is how most consult-* + marginalia stuff works. This would save you the time of pre-computing all the formatting, but at the cost of less "match surface". Quite distinct from the current citar approach though.
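A hedged sketch of that marginalia-style alternative (hypothetical names, and explicitly not what citar currently does): keep the candidate string short and compute everything else just in time with an annotation function, which is only called for the candidates being displayed.

```elisp
(defvar my/entries (make-hash-table :test #'equal)
  "Map of short candidate string (hidden key + author + title) to entry.")

(defun my/annotate (candidate)
  "Return non-matchable extra info for CANDIDATE, computed on demand.
Assumes each entry is an alist of field name and value strings."
  (let ((entry (gethash candidate my/entries)))
    (when entry
      (format "  %s  %s"
              (or (cdr (assoc "year" entry)) "")
              (or (cdr (assoc "keywords" entry)) "")))))

(defun my/select ()
  "Read a reference with just-in-time annotations."
  (let ((completion-extra-properties '(:annotation-function my/annotate)))
    (completing-read "Reference: " my/entries)))
```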
Yes, but the problem with doing that is users then reasonably complain they can't see what data is matching. Ivy and helm have display transforms (look at what the candidate strings

Annotations are purely window dressing; their content can't be completed against.
Yes, that's definitely a limitation of the window-dressing approach. One major advantage (in my view) of org-bibtex is that you have a much richer searching apparatus than just minibuffer completion at your fingertips, e.g. the org-agenda-style matching language (stuff like
Basic idea

Ihor (see below) suggested using `completing-read` programmed completion instead of a cache, to avoid performance problems with very large reference libraries. I think the idea would be to parse and format completions incrementally.

This would obviously be a major change, but I believe it might be as simple, to start with, as allowing the `citar-select-ref` completion table ("collection" in the `completing-read` API) to be configurable, though I don't know ATM what other implications that may have, and I can't figure out how best to modify the current code.

Requirements
From a user POV, things shouldn't really change; performance of the UI should just get better and, in particular, more scalable.
Specifically, we need to maintain the current ability to:
.... but to add:
Example code
Minimal demos
An example of a very simple programmed completion table that is not actually dynamic.
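A hedged sketch of such a table: a function following the standard (STRING PRED ACTION) calling convention that simply defers to a fixed list, i.e. programmed completion with no dynamic behaviour.

```elisp
(defun my/static-table (string pred action)
  "Programmed completion table over a fixed list of candidates."
  (complete-with-action action '("alpha" "beta" "gamma") string pred))

;; (completing-read "Pick: " #'my/static-table)
```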
And from the docs for `completion-table-dynamic`:
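A hedged usage sketch: `completion-table-dynamic` wraps a function that receives the current minibuffer input and returns the collection to complete against, so candidates can be computed lazily on each call. The candidate list below is a stand-in for a real bibliography scan.

```elisp
(completing-read
 "Reference: "
 (completion-table-dynamic
  (lambda (input)
    ;; Stand-in for scanning the bibliography buffer for INPUT.
    (seq-filter (lambda (cand)
                  (string-match-p (regexp-quote input) cand))
                '("smith2020  Smith (2020)  An Example Title"
                  "doe2019  Doe (2019)  Another Example Title")))))
```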
See also #159
Full-featured, highly performant example
See `org-ql-completing-read` and `org-ql-find`.

From Ihor
This (caching) is not good enough for really large Org files. Caching everything will be slow no matter what you do, given a sufficiently large file.
A much faster approach is to build search matches dynamically: (1) use `re-search-forward` on the search term to collect the potentially matching headlines; (2) limit the number of matches to a few hundred. This is from my experience adapting org-ql to give real-time search on a 20 MB Org file. Note that helm-org, which uses the cache-everything approach, is unusable in such scenarios.
Originally posted by @yantar92 in #397 (comment)
That one is just 7.5k BibTeX entries. The regexp search + limit matches approach I suggested gives real-time responsiveness on 31k Org headings in my personal notes. So you can, in fact, get snappy performance (even with extremely large files).
Originally posted by @yantar92 in #397 (comment)
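A hedged sketch of the approach Ihor describes above, using a hypothetical helper name and an arbitrary default limit: scan the Org buffer with `re-search-forward` for the raw search term and stop after a few hundred hits, instead of caching every heading up front.

```elisp
(require 'org)

(defun my/org-matching-headlines (term &optional limit)
  "Return up to LIMIT headline strings whose entry mentions TERM."
  (let ((limit (or limit 200))
        (case-fold-search t)
        matches)
    (org-with-wide-buffer
     (goto-char (point-min))
     (while (and (< (length matches) limit)
                 (re-search-forward (regexp-quote term) nil t))
       (unless (org-before-first-heading-p)
         (save-excursion
           (org-back-to-heading t)
           (push (org-get-heading t t t t) matches)))
       ;; Skip the rest of the current entry so one heading is not
       ;; collected once per match.
       (outline-next-heading)))
    (delete-dups (nreverse matches))))
```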