FabGuard

FabGuard is a tool that helps verify input files by specifying constraints on input data. This is a first iteration in which we are collecting the types of constraints found in various simulation input files and deriving the tool requirements. As a first exercise, we are testing pandera, a library for data validation of pandas DataFrames.

Installation

To install FabGuard, follow these steps:

1. Clone the FabGuard repository.
2. Install the required dependencies:

pip install pandera

Test examples

Run the examples in the test_pandera.py file to familiarise yourself with the capabilities of the library. test_pandera.py demonstrates how to test three types of constraints:

Simple constraints on columns. The function below checks the following simple constraints:

population > 0
location_type should have one of the following values: "conflict_zone", "town", "camp", "forwarding_hub"

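import pandera as pa
from pandera import Check, Column


def validate_simple_constraints():
    schema = pa.DataFrameSchema(
        {
            # population > 0; missing values are allowed
            "population": Column(float, Check.greater_than(0), nullable=True),
            # location_type must be one of the listed values
            "location_type": Column(str, Check.isin(["conflict_zone", "town", "camp", "forwarding_hub"])),
        }
    )

    return schema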

Constraints spanning multiple columns from the same file. The function below checks the following constraint using a lambda function:

if location_type == "conflict_zone" then population > 0

import pandera as pa


def validate_two_dependent_columns():
    schema = pa.DataFrameSchema(
        {
            "population": pa.Column(float, [
                # groupby hands the check a dict that maps each location_type
                # group to its population values
                pa.Check(
                    lambda g: g["conflict_zone"] > 0,
                    groupby=["location_type"])], nullable=True, coerce=True),
            "location_type": pa.Column(str, pa.Check.isin(["conflict_zone", "town", "camp", "forwarding_hub"])),
        }
    )

    return schema

Constraints spanning multiple files.

See test_pandera.py for other examples.

How-to: Test on your own dataset:

1. Define a validation function that returns a pandera schema. The functions given in the Test examples section, such as validate_simple_constraints, are examples of such functions.

2. Call the validator.validate function with the above function and the dataframe to be verified:

dfs = util.load_files(["test_data/locations.csv", "test_data/closures.csv"])
validator.validate(validate_simple_constraints, dfs["closures"], "verify_multi.yaml")

where

util.load_files reads the list of files and returns a dictionary of dataframes
validator.validate takes a validation function, a dataframe, and a YAML output file
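
To make the shape of the returned dictionary concrete, here is a minimal stand-in for a loader of this kind. It is a sketch under assumptions, not the actual util.load_files: the name load_files_sketch is hypothetical, and keying each dataframe by the file name without its extension is inferred from the dfs["closures"] lookup above.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_files_sketch(paths):
    """Read each CSV file and key the dataframe by its file stem.

    Hypothetical stand-in: the real util.load_files may behave differently.
    """
    return {Path(p).stem: pd.read_csv(p) for p in paths}


# Write two tiny CSV files to a temporary directory and load them back.
with tempfile.TemporaryDirectory() as tmp:
    locations_path = Path(tmp) / "locations.csv"
    closures_path = Path(tmp) / "closures.csv"
    locations_path.write_text("name,population\nA,100\nB,200\n")
    closures_path.write_text("name1,name2\nA,B\n")

    dfs = load_files_sketch([locations_path, closures_path])

print(sorted(dfs))
```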

List of requirements

metrics across different runs
count function on columns: the value of one field should equal the size of a column in a file
- conflict_period.length = size(closures.day)
✓ All cities in location.csv should have routes in routes.csv (location.name should be in routes.name1 or routes.name2)
- If locations.location_type == camp then location.name in routes.name1 or location.name in routes.name2
✓ If the number of records in location_type is X, then the data_layout file contains a linked record:
- if location.location_type == camp then
  - location.name in data_layout.total and data_layout.name is nonempty.
All columns but one should satisfy the same constraint
✓ All data exists, min-max, all regions have positive values
Check schema (so that the YAML does not break w.r.t. indentation)
