Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Getting rid of data point if thermophysical data is not included #578

Open
barmoral opened this issue Oct 1, 2024 · 0 comments
Open

Getting rid of data point if thermophysical data is not included #578

barmoral opened this issue Oct 1, 2024 · 0 comments

Comments

@barmoral
Copy link
Collaborator

barmoral commented Oct 1, 2024

Is your feature request related to a problem? Please describe.
When using ThermoML dois as input data in evaluator for filtering, sometimes there are no values for pressure or temperature. Because evaluator expects this thermodynamic properties, loading and/or filtering data will rise an error. The error basically arises from the fact that every value of pressure (for example) in every row is getting turned into a physical property object, and if there are no values there, then the code breaks.

Describe the solution you'd like
It would be better that evaluator removes these data points without complete thermodynamic data automatically before the code breaks, or make evaluator accept these with a warning.

Describe alternatives you've considered
I manually removed the data points without complete thermodynamic data by using dropna().

Additional context
I attach to this issue an input json file (sorted_dois.json)

Here is the example python code to replicate the error:

import pandas as pd
import json
from pathlib import Path
from openff.evaluator.datasets import PhysicalProperty, PropertyPhase
from openff.evaluator.datasets.thermoml import thermoml_property
from openff.evaluator import properties
from openff.units import unit
from openff.evaluator.datasets.thermoml import ThermoMLDataSet

@thermoml_property("Osmotic coefficient", supported_phases=PropertyPhase.Liquid | PropertyPhase.Gas)
class OsmoticCoefficient(PhysicalProperty):
    """A class representation of a osmotic coeff property"""

    @classmethod
    def default_unit(cls):
        return unit.dimensionless
setattr(properties, OsmoticCoefficient.__name__, OsmoticCoefficient)

from openff.evaluator.datasets.thermoml import ThermoMLDataSet

CACHED_PROP_PATH = Path('osmotic_data.csv')

if CACHED_PROP_PATH.exists():
    prop_df = pd.read_csv(CACHED_PROP_PATH, index_col=0)
    ## delete rows with undefined thermodynamic parameters to avoid indexing errors
    # prop_df = prop_df.dropna(subset=['Temperature (K)'])
    # prop_df = prop_df.dropna(subset=['Pressure (kPa)'])
    data_set = ThermoMLDataSet.from_pandas(prop_df)
else:
    with open('sorted_dois.json') as f:
        doi_dat = json.load(f)
        data_set = ThermoMLDataSet.from_doi(*doi_dat['working'])

    prop_df = data_set.to_pandas()
    with CACHED_PROP_PATH.open('w') as file:
        prop_df.to_csv(CACHED_PROP_PATH)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant