Is your feature request related to a problem? Please describe.
When using ThermoML DOIs as input data in evaluator for filtering, some data points have no value for pressure or temperature. Because evaluator expects these thermodynamic properties, loading and/or filtering the data raises an error. The error arises because every value of pressure (for example) in every row is converted into a physical property object, and if a value is missing, the code breaks.
Describe the solution you'd like
It would be better if evaluator automatically removed data points without complete thermodynamic data before the code breaks, or accepted them with a warning.
Describe alternatives you've considered
I manually removed the data points without complete thermodynamic data by using dropna().
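As a rough sketch of that workaround combined with the warning requested above: the helper below drops rows with a missing temperature or pressure and reports how many were removed. The function name drop_incomplete_states is hypothetical and not part of openff-evaluator; the column names are the ones used in the commented-out dropna() lines of the reproduction script.

```python
import warnings

import pandas as pd

# Column names as they appear in the data frame produced by to_pandas()
# in the reproduction script (assumed from the commented-out workaround).
STATE_COLUMNS = ["Temperature (K)", "Pressure (kPa)"]


def drop_incomplete_states(frame, columns=STATE_COLUMNS):
    """Return `frame` without rows that lack a value in any of `columns`.

    Hypothetical helper for illustration; not an openff-evaluator API.
    """
    cleaned = frame.dropna(subset=list(columns))
    n_dropped = len(frame) - len(cleaned)
    if n_dropped:
        # Warn instead of failing so the user knows data points were removed.
        warnings.warn(
            f"Dropped {n_dropped} data point(s) with missing values in {list(columns)}."
        )
    return cleaned


# Small made-up frame: only the first row has both temperature and pressure.
example = pd.DataFrame(
    {
        "Temperature (K)": [298.15, None, 310.0],
        "Pressure (kPa)": [101.325, 101.325, None],
    }
)
cleaned = drop_incomplete_states(example)
```

In the reproduction script, this would be applied to prop_df before it is passed to ThermoMLDataSet.from_pandas().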
Additional context
I have attached an input JSON file (sorted_dois.json) to this issue.
Here is example Python code that reproduces the error:
import json
from pathlib import Path

import pandas as pd
from openff.evaluator import properties
from openff.evaluator.datasets import PhysicalProperty, PropertyPhase
from openff.evaluator.datasets.thermoml import ThermoMLDataSet, thermoml_property
from openff.units import unit


@thermoml_property("Osmotic coefficient", supported_phases=PropertyPhase.Liquid | PropertyPhase.Gas)
class OsmoticCoefficient(PhysicalProperty):
    """A class representation of an osmotic coefficient property"""

    @classmethod
    def default_unit(cls):
        return unit.dimensionless


# Register the custom property so evaluator can look it up by name
setattr(properties, OsmoticCoefficient.__name__, OsmoticCoefficient)

CACHED_PROP_PATH = Path('osmotic_data.csv')

if CACHED_PROP_PATH.exists():
    prop_df = pd.read_csv(CACHED_PROP_PATH, index_col=0)
    ## delete rows with undefined thermodynamic parameters to avoid indexing errors
    # prop_df = prop_df.dropna(subset=['Temperature (K)'])
    # prop_df = prop_df.dropna(subset=['Pressure (kPa)'])
    data_set = ThermoMLDataSet.from_pandas(prop_df)
else:
    with open('sorted_dois.json') as f:
        doi_dat = json.load(f)
    data_set = ThermoMLDataSet.from_doi(*doi_dat['working'])
    prop_df = data_set.to_pandas()
    # Cache the data frame so later runs can skip the download
    prop_df.to_csv(CACHED_PROP_PATH)