Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split data and completion caches, change APIs to use citekeys #627

Closed
bdarcus opened this issue Jun 6, 2022 · 19 comments · Fixed by #634
Closed

Split data and completion caches, change APIs to use citekeys #627

bdarcus opened this issue Jun 6, 2022 · 19 comments · Fixed by #634
Labels
enhancement New feature or request refactor request for feedback For in-progress work where we need input
Milestone

Comments

@bdarcus
Copy link
Contributor

bdarcus commented Jun 6, 2022

I went down a rabbit hole with some code experiments in #625 and #626, but maybe it's better to just talk about it generally, to decide if there's any value in the idea, or to reject it.

What I'm proposing here, I realized after opening this, is exactly the same implementation as in org-mode oc-basic. See below: #627 (comment).

There's some demo code at the end of this.

Basic Idea

The (candidate . (key . entry)) candidate structure is complex, and difficult to work with. There were good reasons for it at the time, but perhaps that was only necessary because we used the same cache for both candidates and entry data?

Aside from any performance considerations (which I actually don't think are significant; see below), am I missing something obvious, or would it not result in cleaner, easier to understand and debug code, if we got rid of the current candidate structure, and only use citekeys and hash tables to access everything?

The other huge advantage of this, I learned subsequently, is customizable hash table equality testing; code like this to invalidate the cache:

(unless (equal hash-1 hash-2))

A simple diagram of the hash structures:

citar-hashes

The Problem

This would break code throughout. A quick ripgrep shows 61 instances of "key-entry" in the repo, for example. And I haven't found it easy to adjust that code; which maybe supports my point about it being too complex (mostly because of the candidate structure)?

So it might be possible to make the code simpler, cleaner and easier to maintain in the end, with a more intuitive API, but to get there may be a PITA.

On Performance

I started out playing with this because I thought this could improve performance, and potentially solve a relatively minor issue in #623.

I no longer think this would improve performance overall, for two reasons:

  1. The current candidate structure was actually mostly a performance optimization, since it means you don't need to look up the entry.
  2. The parsebib hash values are, in fact, alists. So once you grab the entry, you're still working with small alists.

Rather, I think the improved and scalable entry access speed of the hashes makes it possible for us to simplify the code and API without losing performance.

Benchmark examples with ~1000 entries:

ELISP> (benchmark-run-compiled 1000 (citar-get-entry "2005")) ; hash version
(0.001436192 0 0.0)

ELISP> (benchmark-run-compiled 1000 (citar--get-entry "2005")) ; list version
(0.26441816199999996 1 0.10594141199999996)

ELISP> (benchmark-run-compiled 1000 (citar-get-value "title" "2005")) ; hash version
(0.002393128 0 0.0)

ELISP> (benchmark-run-compiled 1000 (citar--get-value "title" (citar--get-entry "2005"))) ; list version
(0.939745279 2 0.17277752700000004)

Finally, to confirm @aikrahguzar's profiling, parsebib is reasonably fast; what's slow is creating the candidate strings (and per my point above, there's no real difference between the hash and list versions there).

ELISP> (benchmark-run-compiled 100 (parsebib-parse citar-bibliography))
(18.502350075 96 8.255232515000003)

ELISP> (benchmark-run-compiled 100 (citar--format-candidates)) ; hash version
(67.580158682 153 20.433643597)

ELISP> (benchmark-run-compiled 100 (citar--format-candidates citar-bibliography)) ; list version
(65.927820852 153 17.207595579)

Demo Code

Here is demo code you can eval-buffer on:

(require 'parsebib)
(require 'citar)

;; Internal variables

;; This design is adapted from org-mode 'oc-basic'.

(defvar citarn--bibliography-cache nil
  "Cache for parsed bibliography files.
This is an association list following the pattern:
  (FILE-ID . ENTRIES)
FILE-ID is a cons cell (FILE . HASH), with FILE being the absolute file name of
the bibliography file, and HASH a hash of its contents.
ENTRIES is a hash table with citation references as keys and fields alist as
values.")

(defvar citarn--completion-cache nil
  "Hash with key as completion string, value as citekey.")

;; Internal functions

(defun citarn--create-data-cache (files)
  "Create data cache for FILES."
  (seq-map
   (lambda (file)
     (let* ((path (file-truename file))
            (checksum
             (split-string
              (call-process "md5sum" nil t nil path) " "))) ; TODO doesn't work correctly
       (cons
        (cons file checksum) ; get the checksum here
        (parsebib-parse path))))
   files))

(defun citarn-ref-read-make-candidates (hashes)
  "Return a hash with completion candidates from HASHES."
  (let ((chash (make-hash-table :test #'equal)))
    (dolist (hash hashes)
      (maphash
       (lambda (citekey _entry)
         (puthash (citarn-get-value citekey "title") citekey chash))
       hash))
    ;; return new hash
    chash))

(defun citarn--get-candidates ()
  "Return completion hash."
  (citarn-refresh)
  citarn--completion-cache)

(defun citarn--get-data ()
  "Return hashes."
  (remq 'nil
        (list
         citarn--bibliography-cache)))

(defun citarn-refresh ()
  "Load caches."
  (interactive)
  (unless citarn--bibliography-cache
    (setq citarn--bibliography-cache ; load this first
          (parsebib-parse citar-bibliography)))
  (unless citarn--completion-cache
    (setq citarn--completion-cache
          (citarn-ref-read-make-candidates
           (citarn--get-data)))))

(defun citarn-select-ref ()
  "Select reference, return citekey."
  (let* ((candidates (citarn--get-candidates))
         (choice (completing-read "Ref: " candidates)))
    (gethash choice candidates)))

(defun citarn-get-entry (key)
  "Return entry alist for KEY."
   (seq-some
   (lambda (ht)
     (gethash key ht))
   (citarn--get-data)))

(defun citarn-get-value (key-or-entry field)
  "Retern FIELD value for KEY-OR-ENTRY."
  (let ((entry (if (stringp key-or-entry)
                   (citarn-get-entry key-or-entry)
                 key-or-entry)))
    (cdr (assoc-string field entry))))

;; Interactive commands

(defun citarn-example ()
  "Return title as message."
  (interactive)
  (let* ((choice (citarn-select-ref))
         (title (citarn-get-value choice "title")))
  (message title)))

(citarn-example)

</details>
@bdarcus
Copy link
Contributor Author

bdarcus commented Jun 6, 2022

OK @aikrahguzar and @roshanshariff - after experimenting with this over the weekend, I've closed the two related draft PRs, and opened this issue, where I summarize what I learned. If any comments, let's put them here.

cc @localauthor

@bdarcus bdarcus pinned this issue Jun 6, 2022
@bdarcus bdarcus changed the title refactor: split data and completion caches, change APIs to use citekeys [RFC] refactor: split data and completion caches, change APIs to use citekeys Jun 6, 2022
@aikrahguzar
Copy link
Contributor

My general opinion is that the best cache is not cache at all and so caching needs to be justified by performance and it is justified here. But even once we have caching it should not be used pervasively and only very few functions should directly access the cache. I think we would have had bugs due to functions that were needed to build cache were circularly trying to access cache.

I agree with (candidate . (key . entry)) not being the greatest thing. But I think the correct thing to pass around is not the citekey but the entry. If were not for embark support, we could have local and global caches which are hash-tables with keys the formatted string and values entries. I think citekey is part of the entry alist that can be looked up like any other field. But with embark at point support, this makes looking up citekeys annoying so the two hash tables you have might be a good choice after all but some thought should be given to which functions need the cache to exist: for me there is only one citar-select-ref. With embark another function might be needed but most of the functions shouldn't be poking in the cache and instead operate on the entries returned by these two functions. In my opnion get-value should take an entry as the first argument and not the key.

@bdarcus
Copy link
Contributor Author

bdarcus commented Jun 6, 2022

... once we have caching it should not be used pervasively and only very few functions should directly access the cache.

What do you mean by "directly access"?

In the above strawman code, citar-get-value is indirectly accessing it. Do you see that as a potential problem?

I think we would have had bugs due to functions that were needed to build cache were circularly trying to access cache.

Yes, I've run into that a few times, including when initially adding the cache. It also bit me experimenting on this branch. It's annoying.

In my opnion get-value should take an entry as the first argument and not the key.

That's as currently, but it requires (maybe aesthetically?) awkward code like:

(citar--get-value (citar--get-entry "test") "title")

Users and developers just naturally think in terms of citekeys.

But that's a secondary consideration to other more substantive issues certainly. But ....

We could have local and global caches which are hash-tables with keys the formatted string and values entries.

This indeed is another option. In effect, that would replace the parebib keys with the completion strings.

... some thought should be given to which functions need the cache to exist: for me there is only one citar-select-ref.

That's definitely the thing most performance-sensitive and data-dependent.

I was going to say also the Embark stuff, but that's not true.

BTW, the issue with the "Embark stuff" is broader than Embark; references in document buffers are in fact citekeys. So the same issue would apply with hydra or transient menus that are operating at point, or other functions which work with buffer citations.

And, of course, the bibliographic formats we support (including the CSL JSON) all identify references with such keys. Which is why the parsebib hashes use them as the hash table keys.

It's just that the completion case is the reverse: we don't necessarily remember the citekey, so we use the entry data to create a string to allow us to quickly find it.

@aikrahguzar
Copy link
Contributor

aikrahguzar commented Jun 6, 2022

... once we have caching it should not be used pervasively and only very few functions should directly access the cache.

What do you mean by "directly access"?

In the above strawman code, citar-get-value is indirectly accessing it. Do you see that as a potential problem?

Yes, by direct access I mean that access to cache should not be in their code path and rather these functions should be supplied arguments that come from cache even if is more verbose. But maybe that concern can be mitigated just by renaming it to citar-get-value-from-cache which should make it harder to accidentally use it during building cache since the cache-dependence is explicit. But this works well only if we mark every function that has cache access in code path with such a label.

Users and developers just naturally think in terms of citekeys.

I actually think of authors, and titles mostly, and not having to deal with citekeys is why I use citar. For me citekeys are an annoying detail to how latex deals with references.

BTW, the issue with the "Embark stuff" is broader than Embark; references in document buffers are in fact citekeys. So the same issue would apply with hydra or transient menus that are operating at point, or other functions which work with buffer citations.

And, of course, the bibliographic formats we support (including the CSL JSON) all identify references with such keys. Which is why the parsebib hashes use them as the keys.

This ties back to using citar to avoid dealing with citekeys but for that it needs to be able to translate to (for inserting keys) and from (to tell me what the key actually represents). Selection needs the former while at point actions need the latter but for me there should be just these two functions which make use of the cache and then used as arguments of other functions. But doing this cleanly is difficult because there is no analogue of interactive spec for at point functions.

By the way, I don't think looking up entries is performance critical at all since I don't see any scenario where more than say 10 entries need to be looked up simultaneously, but it should still be fast if that is possible with a small effort.

@bdarcus
Copy link
Contributor Author

bdarcus commented Jun 6, 2022

By the way, I don't think looking up entries is performance critical at all since I don't see any scenario where more than say 10 entries need to be looked up simultaneously, but it should still be fast if that is possible with a small effort.

It's not in the current code, where as you say we mostly pass the entries as arguments through so lookup is not needed.

It would be in the strawman though, since we'd need to get the citekey to get the entry, including for formatting the candidates?

@bdarcus
Copy link
Contributor Author

bdarcus commented Jun 6, 2022

I don't know if this demonstrates anything; I don't know much about caching. But if I eval this in a buffer, it works as expected.

Under what condition(s) would this fail (other than an empty data cache)?

(require 'parsebib)
(require 'citar)

;; Internal variables

;; This design is adapted from org-mode 'oc-basic'.

(defvar citarn--data-cache nil
  "Cache for parsed bibliography files.
This is an association list following the pattern:
  (FILE-ID . ENTRIES)
FILE-ID is a cons cell (FILE . HASH), with FILE being the absolute file name of
the bibliography file, and HASH a hash of its contents.
ENTRIES is a hash table with citation references as keys and fields alist as
values.")

(defvar citarn--completion-cache nil
  "Hash with key as completion string, value as citekey.")

;; Internal functions

(defun citarn--create-data-cache (files)
  "Create data cache for FILES."
  (seq-map
   (lambda (file)
     (let* ((path (file-truename file))
            (checksum
             (split-string
              (call-process "md5sum" nil t nil path) " "))) ; TODO doesn't work correctly
       (cons
        (cons file checksum) ; get the checksum here
        (parsebib-parse path))))
   files))

(defun citarn-ref-read-make-candidates (hashes)
  "Return a hash with completion candidates from HASHES."
  (let ((chash (make-hash-table :test #'equal)))
    (dolist (hash hashes)
      (maphash
       (lambda (citekey _entry)
         (puthash (citarn-get-value citekey "title") citekey chash))
       hash))
    ;; return new hash
    chash))

(defun citarn--get-candidates ()
  "Return completion hash."
  (citarn-refresh)
  citarn--completion-cache)

(defun citarn--get-data ()
  "Return hashes."
  (remq 'nil
        (list
         citarn--data-cache)))

(defun citarn-refresh ()
  "Load caches."
  (interactive)
  (unless citarn--data-cache
    (setq citarn--data-cache ; load this first
          (parsebib-parse citar-bibliography)))
  (unless citarn--completion-cache
    (setq citarn--completion-cache
          (citarn-ref-read-make-candidates
           (citarn--get-data)))))

(defun citarn-select-ref ()
  "Select reference, return citekey."
  (let* ((candidates (citarn--get-candidates))
         (choice (completing-read "Ref: " candidates)))
    (gethash choice candidates)))

(defun citarn-get-entry (key)
  "Return entry alist for KEY."
   (seq-some
   (lambda (ht)
     (gethash key ht))
   (citarn--get-data)))

(defun citarn-get-value (key-or-entry field)
  "Retern FIELD value for KEY-OR-ENTRY."
  (let ((entry (if (stringp key-or-entry)
                   (citarn-get-entry key-or-entry)
                 key-or-entry)))
    (cdr (assoc-string field entry))))

;; Interactive commands

(defun citarn-example ()
  "Return title as message."
  (interactive)
  (let* ((choice (citarn-select-ref))
         (title (citarn-get-value choice "title")))
  (message title)))

(citarn-example)

</details>

@bdarcus
Copy link
Contributor Author

bdarcus commented Jun 7, 2022

A few observations about the org-cite oc-basic code, most notably that it uses the exactly same idea I'm proposing here of two caches: one to grab keys from completion strings, and the other entry alists from keys.

It's well worth looking at that code for ideas.

Details:

First, I was just reminded of this explanation from the org-cite author on how he handled cache invalidation in oc-basic.

The cache key contains a hash of the contents of the bib file. So whenever the bib file is modified, the cache is invalidated, and oc-basic parses again the file.

It looks like this:

((("/home/bruce/test.bib" . "afc432cb9103e4319e1402c2beff5838")
  . #<hash-table equal 778/1095 0x160dc63>))

So he has two hash caches; one for bibliography files, and the other for completion, each with the exact same structure I am proposing here.

Another interesting wrinkle is there's a cache for the bib files: org-cite-basic--file-id-cache, with key the file path, and value the checksum.

(defvar org-cite-basic--file-id-cache nil
  "Hash table linking files to their hash.")
ELISP> (gethash "~/org/bib/academic.bib" org-cite-basic--file-id-cache)
"0f1fa900095a0a6b581197510122cafdfc14a8a4"

In the function that creates the completion table (similar to citar--format-candidates) you can see he first populates the bibliography cache, and something similar in the equivalent to get-entry.

So here you see he's iterating through the cache alist and checking the hashes for the key.

(defun org-cite-basic--get-entry (key &optional info)
  "Return BibTeX entry for KEY, as an association list.
When non-nil, INFO is the export state, as a property list."
  (catch :found
    (pcase-dolist (`(,_ . ,entries) (org-cite-basic--parse-bibliography info))
      (let ((entry (gethash key entries)))
        (when entry (throw :found entry))))
    nil))

Aside: while org-cite supports local and global bib files, it doesn't have separate caches for them.

Finally, it's equivalent of get-value accepts key-or-entry as arg, which is easy enough to do here.

@bdarcus
Copy link
Contributor Author

bdarcus commented Jun 7, 2022

See #628.

Sorry for the frantic activity. I have some free time, and really wanted to figure this out!

@roshanshariff
Copy link
Collaborator

roshanshariff commented Jun 7, 2022

For a long time I've been thinking there shouldn't just be local and global caches, but instead there should be one cache per bib file, stored globally. This way, if multiple buffers share the same local bibliography, it is only parsed once and stored in one place. The cache invalidation can be done either using fsnotify or the hashing scheme you mentioned (though there are some downsides to the latter in the context of Citar that don't apply to org-export).

I had some ideas on how to implement this cleanly (including evicting a bib file from the cache when it is no longer referenced by any open buffer or the global bibliography list). I can probably code it up in the next few days and submit a PR to the new branch.

I also like your proposal of having two hash tables, one keyed by citekeys and the other keyed by "display strings". At this point, we're basically thinking of the cache as an in-memory database (with the bib entries as records) and we want to query the database by citekey (the primary identifier) as well as the completion string. So it makes sense to create those two "indices" into the database.

EDIT: I was also thinking the bib file parsing (and possibly the candidate string generation) could be done in a background emacs process using emacs-async so that the UI doesn't hang for several seconds every time the bib file needs to be re-parsed. This should be fairly easy to implement with the per-file cache I'm proposing.

@bdarcus
Copy link
Contributor Author

bdarcus commented Jun 7, 2022

For a long time I've been thinking there shouldn't just be local and global caches, but instead there should be one cache per bib file, stored globally.

Indeed; I was just thinking about this, and included a note about it on the PR.

I think we should seriously consider that possibility, mostly because org-cite already supports both seamlessly, without having separate caches.

But I don't use local bibs, so @aikrahguzar will have to test it ;-)

EDIT: I was also thinking the bib file parsing (and possibly the candidate string generation) could be done in a background emacs process using emacs-async so that the UI doesn't hang for several seconds every time the bib file needs to be re-parsed. This should be fairly easy to implement with the per-file cache I'm proposing.

That would be cool!

@aikrahguzar
Copy link
Contributor

See #628.

Sorry for the frantic activity. I have some free time, and really wanted to figure this out!

I on the other hand I am about to get busy so I will mostly disappear for a while after this comment. I think the general structure should work but I think from performance point of view both the current approach and new one are not fundamentally different from each other so probably only profiling can tell us which is better.

From the perspective of writing code, I think there are two big points:

  1. Does the refactor reduce the number of lines of code? If it does while keeping the functionality then I think it is worth doing, if it increases then probably not.
  2. New api should makes it hard to use the cache wrongly. Thinking more about it one of the problems is our cache represents more than just the contents of bib files. I think one of the more annoying bugs to track down was one which caused an infinite loop by circularly trying to get associated files out of cache while the the cache was being built and there were a few step in between. Separating auxiliary sources from the cache was one of the things that started this cycle of activity and I think that would be a good thing to come out of it.

@bdarcus
Copy link
Contributor Author

bdarcus commented Jun 7, 2022

Great points @aikrahguzar; I agree on both.

I have a feeling we could reduce LOC a reasonable amount, but that will be one good benchmark.

@roshanshariff
Copy link
Collaborator

I must say I agree with @aikrahguzar on this. I'm on board with restructuring the structure of the cache as you've suggested, @bdarcus, but I don't like the idea of functions throughout citar implicitly accessing the cache whenever they need to. Instead, the cache should be exposed through a small number of functions, and everything else should be supplied arguments with all the information needed to do the job (i.e. the citekey and/or entry).

This approach follows the principle of minimizing global state and cross-coupling of different components. On the surface it might be a little bit more verbose, but I think on the whole it's better to call a function that fetches data from the cache and then pass that data along to other functions. Moreover, you'll often be accessing many pieces of data about an entry:

(citar-get-value "field1" "citekey")
(citar-get-value "field2" "citekey")

Instead of writing code in this style, which has the overhead of repeatedly accessing the cache, it's not much worse to do this

(let ((entry (citar-get-entry "citekey")))
    (citar-get-value "field1" entry)
    (citar-get-value "field2" entry))

If you're coming from an object-oriented language, think of a method like citar.getEntry(citeKey) which returns an entry object from the cache, which has a method entry.getField(fieldName). As opposed to a method like Citar.getEntryField(citekey, fieldName). I would say the first one is more composable and clean.

@bdarcus
Copy link
Contributor Author

bdarcus commented Jun 7, 2022

The initial PR code allows both, following oc-basic.

I think we can decide later if we continue to allow that and document that issue, or we restrict it again to only accept the entry as arg. Even there, however, we'd want to document. This isn't any better, and is arguably worse.

(citar-get-value "field1" (citar-get-entry "citekey"))
(citar-get-value "field2" (citar-get-entry "citekey"))

@bdarcus
Copy link
Contributor Author

bdarcus commented Jun 7, 2022

@aikrahguzar before you get real busy elsewhere, anything in particular to look out for on the embark and multi-select end?

@bdarcus
Copy link
Contributor Author

bdarcus commented Jun 7, 2022

PS - it may be that oc-basic doesn't need a local and global cache because it has the cache for the files?

So can reparse depending?

EDIT: actually, this is what citarn--parse-bibliography returns:

ELISP> (citarn--parse-bibliography)
(("~/org/bib/academic.bib" . #<hash-table equal 778/1095 0x822025>))

So that should be helpful.

@aikrahguzar
Copy link
Contributor

@aikrahguzar before you get real busy elsewhere, anything in particular to look out for on the embark and multi-select end?

I don't think so, they should work as long as single selection is working. Multiple selection use the citar--completion-table which is one one higher layer and it doesn't matter if the internal workings of citar--get-candidates change.

@bdarcus
Copy link
Contributor Author

bdarcus commented Jun 7, 2022

I don't think so, they should work as long as single selection is working. Multiple selection use the citar--completion-table which is one one higher layer and it doesn't matter if the internal workings of citar--get-candidates change.

I got it working, and returning a list of keys regardless of multiple or not.

ELISP> (citar-select-refs)
("naim2005" "huck1977")

ELISP> (citar-select-ref)
("naim2005")

That's what we want; right?

It wouldn't make sense to have a string or a list?

The primary issue I ran into was (length candidates) froze Emacs when it didn't get a list!

bdarcus added a commit that referenced this issue Jun 10, 2022
Sorry for this breaking change, but I wanted to get the foundations
right before tagging 1.0.

This completely restructures the core of citar to borrow some code and
ideas from the org-mode oc-basic package.

In particular, it changes to using two primary caches:

- bibliography
- completion

Both of these now use hash tables, rather than lists.

Caching functionality is also changed, and the API now focuses on
citekeys as arguments for key functions.

Finally, citar--parse-bibliography should re-parse bibliography files
upon change.

Fix #623 Close #627
bdarcus added a commit that referenced this issue Jun 10, 2022
Sorry for this breaking change, but I wanted to get the foundations
right before tagging 1.0.

This completely restructures the core of citar to borrow some code and
ideas from the org-mode oc-basic package.

In particular, it changes to using two primary caches:

- bibliography
- completion

Both of these now use hash tables, rather than lists.

Caching functionality is also changed, and the API now focuses on
citekeys as arguments for key functions.

Finally, citar--parse-bibliography should re-parse bibliography files
upon change.

Fix #623 Close #627
@bdarcus bdarcus added enhancement New feature or request refactor request for feedback For in-progress work where we need input labels Jun 10, 2022
@bdarcus bdarcus changed the title [RFC] refactor: split data and completion caches, change APIs to use citekeys Split data and completion caches, change APIs to use citekeys Jun 10, 2022
@bdarcus bdarcus added this to the v1.0 milestone Jun 11, 2022
bdarcus added a commit that referenced this issue Jun 13, 2022
Sorry for this breaking change, but I wanted to get the foundations
right before tagging 1.0.

This completely restructures the core of citar to borrow some code and
ideas from the org-mode oc-basic package.

In particular, it changes to using two primary caches:

- bibliography
- completion

Both of these now use hash tables, rather than lists.

Caching functionality is also changed, and the API now focuses on
citekeys as arguments for key functions.

Finally, citar--parse-bibliography should re-parse bibliography files
upon change.

Fix #623 Close #627
roshanshariff pushed a commit to roshanshariff/citar that referenced this issue Jun 19, 2022
Sorry for this breaking change, but I wanted to get the foundations
right before tagging 1.0.

This completely restructures the core of citar to borrow some code and
ideas from the org-mode oc-basic package.

In particular, it changes to using two primary caches:

- bibliography
- completion

Both of these now use hash tables, rather than lists.

Caching functionality is also changed, and the API now focuses on
citekeys as arguments for key functions.

Finally, citar--parse-bibliography should re-parse bibliography files
upon change.

Fix emacs-citar#623 Close emacs-citar#627
roshanshariff pushed a commit to roshanshariff/citar that referenced this issue Jun 24, 2022
Sorry for this breaking change, but I wanted to get the foundations
right before tagging 1.0.

This completely restructures the core of citar to borrow some code and
ideas from the org-mode oc-basic package.

In particular, it changes to using two primary caches:

- bibliography
- completion

Both of these now use hash tables, rather than lists.

Caching functionality is also changed, and the API now focuses on
citekeys as arguments for key functions.

Finally, citar--parse-bibliography should re-parse bibliography files
upon change.

Fix emacs-citar#623 Close emacs-citar#627
@bdarcus bdarcus linked a pull request Jun 26, 2022 that will close this issue
6 tasks
@bdarcus bdarcus unpinned this issue Jul 3, 2022
@bdarcus
Copy link
Contributor Author

bdarcus commented Jul 4, 2022

Now that I merged the changes, to go back to this, @aikrahguzar:

Does the refactor reduce the number of lines of code? If it does while keeping the functionality then I think it is worth doing, if it increases then probably not.

There is a small net increase in lines of code, but functionality is increased; I'd say significantly.

That, plus a cleaner and more flexible API, made this worth the breaking change in the end.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request refactor request for feedback For in-progress work where we need input
Projects
None yet
3 participants