Atomic publishing of dataset records #85

LucaCinquini · 2016-04-27T13:01:10Z

Who: Alyn

It seems that if a call to the publication service is interrupted then
it leaves a record in Solr with an incomplete list of files - this can
be seen in CoG as a discrepancy between the "number of files" in the
summary metadata and the number of files in the actual file list - see
attached image with relevant part of a screenshot. Indeed, when
publishing a large dataset to the index (large enough that the
publication time is long relative to the master-slave sync interval in
Solr), I can watch the number of files grow in CoG.

This has left some items in the CEDA index in an inconsistent state,
because I was assuming that if a record had appeared then publication
to the index had succeeded. (Thanks to Katharina for noticing this.)
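
A minimal sketch of how such inconsistent records might be detected, assuming each dataset record carries a declared file count and that the number of file records actually indexed per dataset has already been collected (for example with a facet query). The field names `id` and `number_of_files` and the function name are illustrative assumptions, not the index's actual schema or API.

```python
def find_incomplete_datasets(datasets, indexed_file_counts):
    """Return ids of datasets whose declared file count differs from the
    number of file records actually found in the index.

    datasets: iterable of dicts with hypothetical keys "id" and
        "number_of_files" (the count shown in the summary metadata).
    indexed_file_counts: dict mapping dataset id to the number of file
        records found in the index for that dataset.
    """
    incomplete = []
    for dataset in datasets:
        declared = dataset["number_of_files"]
        # A dataset with no file records at all counts as zero indexed files.
        indexed = indexed_file_counts.get(dataset["id"], 0)
        if declared != indexed:
            incomplete.append(dataset["id"])
    return incomplete
```

A check like this only finds records that are already inconsistent; it does not prevent a partially published record from being visible in the first place.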

Is it possible to make the publication service atomic? It probably
would not be a big problem if Solr cannot make the write fully atomic
over the short time it actually takes to write the Solr document
(although I'd be surprised if it could not), but could the service at
least gather all the necessary information and then write it in a
single call to Solr?
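
The "gather first, then write once" idea could look roughly like the sketch below. It is not the publication service's actual code. It assumes the dataset record and its file records can be sent to one Solr core in one JSON update request, and that `post` is a hypothetical caller-supplied function performing the HTTP POST. The field names `id` and `number_of_files` are also assumptions.

```python
import json


def build_update_payload(dataset_doc, file_docs):
    """Build one JSON update body holding the dataset and all its files.

    Refuses to build a payload when the declared file count does not match
    the gathered file records, so an incomplete record is never sent.
    """
    if dataset_doc.get("number_of_files") != len(file_docs):
        raise ValueError("declared file count does not match gathered files")
    # Solr's JSON update handler accepts a JSON array of documents.
    return json.dumps([dataset_doc] + list(file_docs)).encode("utf-8")


def publish(dataset_doc, file_docs, post):
    """Send every record in a single update request with one commit.

    post: hypothetical callable (path, body) that issues the HTTP POST to
        the target Solr core with Content-Type application/json.
    """
    payload = build_update_payload(dataset_doc, file_docs)
    # Nothing is sent until all records are gathered and validated.
    post("/update?commit=true", payload)
```

An interruption while gathering then leaves nothing in the index, because the only write happens at the end. If dataset and file records live in separate Solr cores, one request cannot cover both. A possible alternative in that case is to write the file records first and the dataset record last, so that the dataset only appears once its files are in place.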

LucaCinquini self-assigned this Apr 27, 2016

LucaCinquini added this to the Release 4.9.0 milestone Apr 27, 2016

LucaCinquini modified the milestones: Release 4.10, Release 4.9.0 May 23, 2016

LucaCinquini modified the milestones: Release 4.13, Release 4.12 Mar 2, 2017

LucaCinquini modified the milestones: Release X.Y.Z, Release 4.13 May 16, 2017
