CMOR_MAX_STRING #530

wachsylon · 2019-08-09T09:00:22Z

Hi,
is there a reason why CMOR_MAX_STRING is 1024 or could we increase it?

Line 11 in 829a78a

#define CMOR_MAX_STRING 1024

Best regards,
Fabi

taylor13 · 2019-08-09T15:12:22Z

Originally, we had different string lengths for different attributes, but that got simplified, as I recall, so that all have the same length of 1024 (internally in CMOR). I think there was some issue with these strings occupying lots of storage, so we didn't make it huge. I don't think there is a fundamental reason for limiting the attributes to 1024 characters, but it would be better not to have a bunch of unused space reserved for such things as source_id and experiment_id. Which attribute needs more space?

wachsylon · 2019-08-12T08:13:07Z

We discussed specifying a paper in the references attribute.

taylor13 · 2019-08-12T15:15:02Z

Would it be difficult to double the length of the references attribute, but not the others? Would this cause any problems? Would doubling be sufficient? Note that if any single attribute is very long it will make the ncdump -h result less easy to digest.

mauzey1 · 2019-09-16T19:34:47Z

There have been changes made in CMIP6_CV.json in cmip6-cmor-tables that add strings with lengths that exceed CMOR's current max length of 1024. This has caused tests in that repo to fail.

I think we should consider increasing CMOR_MAX_STRING to 2048 or 4096 to allow longer strings for attributes. I'm not sure how much that would impact file size.

@durack1 @doutriaux1 Do you know why 1024 was selected for the maximum string length? Do you think it could be increased now?

taylor13 · 2019-09-16T19:38:41Z

The trouble is that CMOR_MAX_STRING is used for almost all strings created by CMOR, and the internal table is already huge with wasted empty space. I don't think we want to double it. A better solution would be to define different groups of string variables with different maximum lengths.
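To make the wasted-space concern concrete, here is a rough back-of-the-envelope sketch in Python. The sample attribute values and the `reserved_vs_used` helper are hypothetical and are not taken from CMOR's internal tables; the only number that comes from this thread is the 1024-character limit. It assumes one fixed-size buffer per value and one byte for a C NUL terminator.

```python
CMOR_MAX_STRING = 1024  # fixed buffer size discussed in this issue


def reserved_vs_used(values, buffer_len=CMOR_MAX_STRING):
    """Return (bytes reserved, bytes used) when each value gets a fixed buffer.

    Assumption: one buffer per value, plus 1 byte per value for the NUL terminator.
    """
    reserved = len(values) * buffer_len
    used = sum(len(v.encode("utf-8")) + 1 for v in values)
    return reserved, used


# Hypothetical short attribute values, for illustration only.
sample = ["historical", "CMIP", "gn", "Amon"]
reserved, used = reserved_vs_used(sample)
print(f"reserved={reserved} used={used} wasted={reserved - used}")

# Doubling the buffer doubles the reserved space but not the used space.
print(reserved_vs_used(sample, buffer_len=2048))
```

Under these assumptions, almost all of the reserved space is padding for short values, which is the argument for sizing buffers per group rather than raising a single global limit.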

durack1 · 2019-09-16T20:46:10Z

@mauzey1 I tend to agree with @taylor13 that we should think a little about memory usage when the source attribute is the only string that needs changing. Other fields (such as activity_participation) are controlled, are well under the 1024 limit, and will not change. Only the institution_id and source entries in the source_id are likely to need expansion.

doutriaux1 · 2019-09-16T20:50:11Z

@taylor13 is right: the table size is now huge. We should create groups:

CMOR_TINY_LENGTH=16
CMOR_SHORT_LENGTH=32
CMOR_MESSAGE_LENGTH=128
CMOR_PARAGRAPH_LENGTH=1024
CMOR_TEXT_LENGTH=2048
CMOR_BLOG_LENGTH=4096

Or something like that.
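As a minimal sketch of how such groups could be used, the Python snippet below picks the smallest group that can hold a given string. The tier names and sizes follow the list above (with the SHORT name spelled out in full); they are a proposal from this thread, not existing CMOR macros, and `smallest_tier` is a hypothetical helper that assumes one extra byte for a C NUL terminator.

```python
# Proposed groups from this thread, smallest first. Not existing CMOR macros.
LENGTH_TIERS = [
    ("CMOR_TINY_LENGTH", 16),
    ("CMOR_SHORT_LENGTH", 32),
    ("CMOR_MESSAGE_LENGTH", 128),
    ("CMOR_PARAGRAPH_LENGTH", 1024),
    ("CMOR_TEXT_LENGTH", 2048),
    ("CMOR_BLOG_LENGTH", 4096),
]


def smallest_tier(value):
    """Return (name, size) of the smallest tier whose buffer can hold `value`.

    Assumption: the buffer must also hold a 1-byte NUL terminator.
    """
    needed = len(value.encode("utf-8")) + 1
    for name, size in LENGTH_TIERS:
        if needed <= size:
            return name, size
    raise ValueError(f"{needed} bytes exceeds the largest tier")


print(smallest_tier("gn"))        # a short identifier
print(smallest_tier("x" * 1500))  # a long free-text attribute
```

Running a helper like this over the existing table and CV strings would give a quick picture of which attributes belong in which group before any C code is changed.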

doutriaux1 · 2019-09-16T20:50:44Z

Most can probably fit in the first three, which would save memory and speed things up.

durack1 · 2019-09-19T00:09:10Z

@mauzey1 as part of the fix for this, we'll need to implement a test so that the nightly changes to CMIP6_CV.json are tested and not merged in the case that such an error occurs
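A test along these lines could be as simple as walking the parsed JSON and listing every string value that would not fit. The sketch below is one possible shape, not the test that was actually implemented: `iter_strings` and `find_overlong` are hypothetical helpers, the inline document is made up, and the check assumes a 1024-byte C buffer that must also hold a NUL terminator.

```python
import json

MAX_LEN = 1024  # CMOR_MAX_STRING at the time of this issue


def iter_strings(node, path=""):
    """Yield (path, string) for every string value in parsed JSON."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from iter_strings(child, f"{path}/{key}")
    elif isinstance(node, list):
        for i, child in enumerate(node):
            yield from iter_strings(child, f"{path}[{i}]")
    elif isinstance(node, str):
        yield path, node


def find_overlong(cv, max_len=MAX_LEN):
    """Return [(path, byte_length)] for strings that do not fit in max_len bytes.

    Assumption: one byte of the buffer is reserved for the NUL terminator.
    """
    hits = []
    for path, text in iter_strings(cv):
        n = len(text.encode("utf-8"))
        if n >= max_len:
            hits.append((path, n))
    return hits


# Made-up document; a real check would load CMIP6_CV.json with json.load().
doc = json.loads('{"CV": {"source_id": {"MODEL-X": {"source": "short text"}}}}')
doc["CV"]["source_id"]["MODEL-X"]["source"] = "x" * 2000
print(find_overlong(doc))
```

A nightly job could fail (and block the merge) whenever `find_overlong` returns a non-empty list.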

taylor13 · 2024-05-06T22:11:26Z

I support the @doutriaux1 suggestion above. As I recall, the original FORTRAN version was implemented that way. We would need to document the max length of each string under the control of users (or in the CVs read by CMOR).

durack1 · 2024-05-07T13:51:37Z

@taylor13 with a pure Python implementation, string lengths would not be fixed at compilation time, so the technical challenge would be removed (Python can dynamically size any string as required). We'd still need to implement some guidance so that ridiculous entries are flagged with a warning at a minimum, and potentially raise an error in particularly egregious cases.
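A minimal sketch of what such guidance might look like in Python, assuming a soft limit that triggers a warning and a hard limit that triggers an error. Both thresholds and the `check_attribute_length` name are hypothetical choices for illustration, not agreed values or an existing CMOR function.

```python
import warnings

WARN_LEN = 1024   # assumed soft limit, matching the old CMOR_MAX_STRING
ERROR_LEN = 4096  # assumed hard limit, chosen only for illustration


def check_attribute_length(name, value, warn_len=WARN_LEN, error_len=ERROR_LEN):
    """Warn on long attribute values and raise on egregiously long ones."""
    n = len(value)
    if n > error_len:
        raise ValueError(f"attribute '{name}' has {n} characters (limit {error_len})")
    if n > warn_len:
        warnings.warn(
            f"attribute '{name}' has {n} characters (recommended maximum {warn_len})",
            UserWarning,
            stacklevel=2,
        )
    return value


check_attribute_length("experiment_id", "historical")  # passes silently
check_attribute_length("source", "x" * 2000)           # emits a UserWarning
```

Keeping the two limits as parameters would let each attribute group use its own thresholds.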

taylor13 · 2024-05-07T14:19:05Z

Good point. I understood that we were dropping the FORTRAN option, but didn't realize that nothing would be written in C. Will pure python be fast enough? Can it be parallelized, or aren't we going to pursue that?

durack1 · 2024-05-07T16:03:17Z

@taylor13, @mauzey1 had planned to test the performance by leveraging the latest libraries associated with xarray, such as Dask. From my understanding, the current performance bottlenecks relate to the netCDF library, but this will be important to benchmark as we progress. I know that @matthew-mizielinski has been monitoring the CMOR performance on their new systems, so it will be important to keep track of this as we go.

mauzey1 mentioned this issue Sep 16, 2019

Entries in CMIP6_CV are exceeding CMOR's max string length. PCMDI/cmip6-cmor-tables#259

Closed

durack1 mentioned this issue Sep 19, 2019

HARD CRASH when loading latest version of the cmip6 tables #543

Closed

mauzey1 added this to the 4.0/Future milestone Mar 14, 2020

mauzey1 mentioned this issue Nov 12, 2021

Illegal instruction error when region attribute is not found in the CV #638

Closed

durack1 mentioned this issue Jul 20, 2022

E3SM-2-0's source string is too long for CMOR PCMDI/cmip6-cmor-tables#377

Closed
