Validating data types #194

Open
olejandro opened this issue Feb 22, 2024 · 7 comments

Comments

@olejandro
Member

Currently, the tool may produce invalid input data for GAMS because it doesn't verify data types. Should we introduce such a check before exporting to GAMS?

@olejandro
Member Author

@siddharth-krishna what do you think?

@siddharth-krishna
Collaborator

Sure, perhaps add a field to times-info.json that specifies the data type of each parameter/set?

Ideally, we should use the correct dtypes for pandas columns from the moment we read in the Excel tables: #47
But this sounds like a good start towards that.

@olejandro
Member Author

Parameters (value columns) are always numeric, e.g. double. :-) The types of their indices can vary, but string generally works. Sets are similar to the indices of parameters.
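A minimal sketch of that rule as a pandas check: index columns must hold string labels, and a parameter's value column must be numeric. It assumes the index column names would be taken from each entry's index list in times-info.json; `check_table` is a hypothetical helper, not an existing function in the tool.

```python
from typing import List, Optional

import pandas as pd
from pandas.api.types import is_numeric_dtype


def check_table(
    df: pd.DataFrame, index_cols: List[str], value_col: Optional[str] = None
) -> List[str]:
    """Return a list of type problems; an empty list means the table passed.

    Sets are checked by passing value_col=None (index columns only).
    """
    problems = []
    for col in index_cols:
        # Every index entry must already be a string label (e.g. "2020", not 2020.0).
        is_str = df[col].map(lambda v: isinstance(v, str)).astype(bool)
        bad = df[col][~is_str]
        if not bad.empty:
            problems.append(f"{col}: non-string labels {bad.tolist()}")
    if value_col is not None and not is_numeric_dtype(df[value_col]):
        problems.append(f"{value_col}: expected numeric, got {df[value_col].dtype}")
    return problems


bad = pd.DataFrame({"REG": ["R1"], "YEAR": [2020.0], "VALUE": ["x"]})
print(check_table(bad, ["REG", "YEAR"], "VALUE"))
```

Checking rather than silently casting is deliberate here: casting a float year straight to `str` would turn 2020.0 into the label "2020.0".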

@siddharth-krishna
Collaborator

Does it not matter if a year is written out as 2020.0 when in a parameter value column, or a set index column?

@Antti-L

Antti-L commented Feb 24, 2024

@siddharth-krishna In the decimal system, appending zeros to the right of the decimal point does not change a numeric value: the values are by definition identical. GAMS follows the decimal system when reading parameter values from text files. GAMS GDX files may use a binary representation.

However, indexes, i.e. the members of sets, are identified by labels, which are basically case-insensitive strings. Year indexes are therefore also identified by set element labels, and the string "2020.0" is different from the string "2020".
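A small illustration of where "2020.0" labels can come from on the pandas side: a single blank cell in a year column makes pandas store NaN, which forces the whole column to float64. The column name and `same_label` helper are hypothetical, and `casefold` is only an approximation of GAMS's case-insensitive label matching.

```python
import pandas as pd

# A year column with one blank cell: NaN forces the column to float64.
df = pd.DataFrame({"year": [2020, None, 2030]})
print(df["year"].dtype)                # float64
print(df["year"].map(str).tolist())    # ['2020.0', 'nan', '2030.0']

# Dropping blanks and going through int first gives the labels GAMS expects.
labels = df["year"].dropna().astype(int).map(str).tolist()
print(labels)                          # ['2020', '2030']


def same_label(a: str, b: str) -> bool:
    """Compare set element labels as text, ignoring case."""
    return a.casefold() == b.casefold()


print(same_label("ELC", "elc"))        # True
print(same_label("2020.0", "2020"))    # False: numerically equal, different labels
```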

Ahh..., I now realize you probably mean Excel and not GAMS? If a year is represented as a numeric cell in Excel, I think you can just remove any zero decimal fraction (I tested with VEDA2 v3.0: it accepts any number displayed with zero decimal fractions for years, but raises an error for non-zero decimals). But if it is a text value, you should keep it as is, and reject e.g. "2020.0". You can easily test a cell's data type in Excel. Text values in year columns may also include year ranges (e.g. 2020-2050) and comma-separated lists of years.
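A minimal sketch of those rules, assuming cell values arrive as Python `int`, `float`, or `str` (as an Excel reader typically returns numeric and text cells) and assuming years are written as digits only. `normalize_year_cell` and the exact text grammar (single year, range, comma-separated list, optional spaces) are hypothetical.

```python
import re

# Assumed grammar for text year cells: a year ("2020"), a range ("2020-2050"),
# or a comma-separated list of those ("2020,2030-2050").
_YEAR_TEXT = re.compile(r"^\s*\d+(\s*-\s*\d+)?(\s*,\s*\d+(\s*-\s*\d+)?)*\s*$")


def normalize_year_cell(value):
    """Return the year label for an Excel cell value, or raise ValueError/TypeError."""
    if isinstance(value, bool):  # bool is a subclass of int; never a valid year
        raise TypeError(f"boolean in year column: {value!r}")
    if isinstance(value, int):
        return str(value)
    if isinstance(value, float):
        # Numeric cell: drop a zero fractional part, reject anything else
        # (is_integer() is also False for NaN and infinity).
        if not value.is_integer():
            raise ValueError(f"non-integer year: {value!r}")
        return str(int(value))
    if isinstance(value, str):
        # Text cell: keep it as is, but reject forms such as "2020.0".
        if not _YEAR_TEXT.match(value):
            raise ValueError(f"invalid year text: {value!r}")
        return value
    raise TypeError(f"unsupported year cell type: {type(value).__name__}")


print(normalize_year_cell(2020.0))       # 2020
print(normalize_year_cell("2020-2050"))  # 2020-2050
```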

@SamRWest
Collaborator

If it helps, I've been using pandera for dataframe type (and range, etc.) validation, and it's pretty good. Might be worth a look for this.

@olejandro
Member Author

Looks like it may be what we need. Should we apply it to inputs (i.e. EmbeddedXlTable.dataframe), outputs (i.e. TimesModel), or both?


4 participants