Books: clean 035 field #167
Comments
I have cleaned the obviously wrong occurrences (typos, wrong values, numbers...). Indeed, only CERCER should disappear (once the linking with old Aleph numbers is solved). There is also 'http://inspirehep.net/oai2d'; I am not sure this information should be stored here. It seems to me this is more "technical information" about the harvest of the record, but I leave it to you to decide. For Inspire, we can link in 2 ways:
Hope this helps, Anne |
Great, thank you @agentilb ! I have recomputed the list.
There was another typo for These are the things left to decide or to fix in order to correctly migrate
|
I have recomputed the list of values (based on all records that we need to migrate). Based on the comments above, these are the things that need to be done: List of tags to keep: List of tags to fix:
To ignore: |
Strange: all the typos were corrected on 19/10, and the corrections were simply removed during the following night. Do you understand why? |
It looks like it was a huge multi-edit (it updated ~6000 records). It was submitted during the day, at 11:14, but was executed in the evening due to the large number of records. This multi-edit touched only the 035. Do you remember if you or one of your colleagues might have run it? Unfortunately, multi-edit is not "smart" enough to detect whether the records it touches have been updated between the time of the edit and the time of its execution. |
I did several multi-edits that day to clean records. I don't remember having done one that touched so many records, but it was probably me. What I never realised is that multi-edit modifications touch all records of the search results, even if the modification doesn't apply to them. I guess that's the cause of this. |
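The overwrite described above is a classic stale-write problem. As a minimal sketch of the kind of check that would avoid it (this is not how the CDS multi-edit tool works; the `revision` key and the function name are hypothetical), a queued change could be applied only when the record has not been modified since the edit was submitted:

```python
def apply_if_unchanged(record, queued_revision, change):
    """Apply `change` only if the record was not modified after queuing.

    `record` is a plain dict with a hypothetical integer 'revision' key.
    Returns True when the change was applied, False when it was skipped.
    """
    if record["revision"] != queued_revision:
        # The record was edited after the multi-edit was queued: skip it
        # rather than overwrite the newer corrections.
        return False
    change(record)
    record["revision"] += 1
    return True
```

With such a check, a multi-edit queued at 11:14 would skip any record that was corrected by hand before the evening run instead of reverting it.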
'cern annual report' can also be ignored. |
Regarding the Our colleagues from Indico gave us the corresponding new IDs, however, for past videos I think we can not replace them yet, because some recordings might be stored in folders named based on the old ID. This will be sorted out when we migrate lectures. I think we need to see what we should migrate as part of books, and then do changes only on these records.
Once you decide which of these records should be kept, I can replace the previous |
For: 26 records pointing to old events in indico (agendamaker): https://cds.cern.ch/search?ln=en&cc=Books+%26+Proceedings&sc=1&p=035__9%3A%22AgendaMaker%22 Can you let me know the new Indico ids of those events, so I can check what content is on Indico, and decide what are real proceedings or not? Indeed, I guess most of them will need to be filtered out... |
From the list above, we keep for sure: For: The others are events where there are only transparencies and/or videos, and in many cases, the Indico page has restricted access. They shouldn't be in the proceedings collection, I believe. |
* Adds new lists for external identifiers in order to determine which values should be allowed and which to be ignored. (closes CERNDocumentServer#167) Signed-off-by: Ludmila Marian <[email protected]>
@agentilb what do you think we should do with these records? Should we update the 4 that should remain in the proceedings and move the rest to the lectures collection, or do you have another suggestion? |
* Adds new lists for external identifiers in order to determine which values should be allowed and which to be ignored. (closes #167) Signed-off-by: Ludmila Marian <[email protected]>
Hi, I have moved all records that were not proceedings in Conference for the time being. There is only one record left in the proceedings with 340:'Streaming video'. It links to the Indico where the videos are. |
For the 5 remaining records: https://cds.cern.ch/search?ln=en&cc=Books+%26+Proceedings&sc=1&p=035__9%3A%22AgendaMaker%22 :
Let me know if you want me to help with any of the above actions. |
I've indeed removed the AgendaMaker id and added the link from the video to the proceedings for the 2 records: https://cds.cern.ch/record/821452 and https://cds.cern.ch/record/933801 For: https://cds.cern.ch/record/741349?ln=en if I'm not mistaken, there is no video on CDS and the link from Indico seems broken. Do you think something is retrievable from AgendaMaker? Should we keep it? For the DVD holdings, I don't think we decided to delete them; we would need to do some cleaning beforehand, and this will be time-consuming. Is it a problem to migrate them? |
For this one: https://cds.cern.ch/record/537931?ln=en I've also linked the CDS videos to the record. |
Hi @agentilb From my side, I think it is ok to close this ticket. I have updated 2 or 3 records from AgendaMaker to new Indico IDs. I think the DVDs issue is handled in another ticket. Let me know if there is still something to be addressed here. |
Hi @ludmilamarian I noticed one more thing for the 035 field. Otherwise, I think we can close the ticket. |
@agentilb do you want to ignore completely any 035 that has |
@ludmilamarian If you do not need it on your side, I think we can completely ignore 035 when $$9:arxiv, since the arXiv number is normally stored in 037.
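The "ignore" rule agreed above could look roughly like the following. This is only a sketch under assumptions: it is not the code from the referenced commit, the dict layout with `9` and `a` keys is a simplified stand-in for a MARC 035 field, and the ignore list contains only the two values named as ignorable in this thread.

```python
# Hypothetical ignore list, compared case-insensitively. Only 'arxiv' and
# 'cern annual report' are named as ignorable in this thread.
IGNORED_SOURCES = {"arxiv", "cern annual report"}


def keep_external_ids(fields):
    """Drop 035 fields whose $$9 subfield is on the ignore list."""
    kept = []
    for field in fields:
        source = field.get("9", "").strip().lower()
        if source in IGNORED_SOURCES:
            continue  # e.g. arXiv numbers are already stored in 037
        kept.append(field)
    return kept
```

Lower-casing before the lookup means 'arXiv' and 'CERN annual report' from the list of 035__9 values are both caught without listing every capitalisation.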
parent of #169
blocked by CERNDocumentServer/cds-migrator-kit#15
Currently, these are the possible values in 035__9:
set(['CNUM-INSPIRE', 'IEEECONF', 'inspire', 'INSPIRE-CNUM', 'SCEM', 'arXiv', 'LHCLHC', 'SPIRES', 'DOE', '926408', 'ADMADM', 'Iinspire-CNUM', 'Inspire-CNUM', 'CERCER', 'INDICO.CERN.CH', 'SLACCONF', 'INIS', 'INPIRE-CNUM', 'DLC', 'HAL', 'DESY', 'FIZ', 'WAI01', 'http://inspirehep.net/oai2d', 'Isnpire-CNUM', 'INSPRIE-CNUM', 'SAFARI', 'Isnpire', 'SLAC', 'INSPIRE', 'AgendaMaker', 'KEK', 'Inspire', 'EBL', '290555', 'INSPEC', 'CERN annual report', 'CERN', '273873', 'inspire-CNUM']) but it looks like only the inspire-cnum/inspirecnum are being treated in a slightly different way.
There are several things here that I think @agentilb could help with:
i) the normalization of these values; some of them are obviously typos (Isnpire).
ii) it looks like inspire appears in various forms; do they all relate to the same inspire_cnum? If yes, they should be migrated to the same field in the new data model.
iii) the cleaning of these values, as there are also numbers there that might be a mistake and need to be in another subfield.
iv) only CERCER should potentially disappear (but only after doing CERNDocumentServer/cds-migrator-kit#15), or are there others that could be ignored?
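To make points i) to iii) concrete, here is a minimal normalization sketch. The variant spellings are taken from the list of 035__9 values above; collapsing them into the two canonical labels `INSPIRE-CNUM` and `INSPIRE` is an assumption (it is exactly the open question in ii), and `normalize_source` is a hypothetical helper, not part of cds-migrator-kit.

```python
# Variant spellings seen in the 035__9 list above (lower-cased).
# Mapping them to two canonical labels is an assumption, not a decision
# taken in this issue.
CNUM_VARIANTS = {
    "cnum-inspire", "inspire-cnum", "iinspire-cnum",
    "inpire-cnum", "isnpire-cnum", "insprie-cnum",
}
INSPIRE_VARIANTS = {"inspire", "isnpire"}


def normalize_source(value):
    """Return a cleaned 035__9 value, or None for purely numeric values."""
    stripped = value.strip()
    key = stripped.lower()
    if key in CNUM_VARIANTS:
        return "INSPIRE-CNUM"
    if key in INSPIRE_VARIANTS:
        return "INSPIRE"
    if key.isdigit():
        # Values such as '926408' look like identifiers that ended up in
        # the wrong subfield; flag them for manual review.
        return None
    return stripped
```

An explicit lookup is used here rather than fuzzy matching so that every accepted misspelling is visible and reviewable; all other values pass through unchanged.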