basic Annex A pipeline #68

Open: wants to merge 66 commits into base: main

Commits
4d07d59
update schemas, add time to Column class
May 7, 2024
faf2fca
set type on time columns
May 7, 2024
e0cb559
set type to float
May 7, 2024
3da2734
coerce blank time errors
May 8, 2024
3444b18
update .yml and unit tests
May 15, 2024
48896a4
run python black
May 15, 2024
81ce969
update schemas
May 16, 2024
cab8390
update schema
May 16, 2024
ae2b83c
add schema debugging
May 24, 2024
4d6a651
remove schema debugging
May 28, 2024
6331e11
update 2017 schema
May 30, 2024
c7e5713
add csv reformatter
Jun 4, 2024
b29b315
add csv reformatter
Jun 4, 2024
8af7385
Merge branch 'school_census_pipeline' of https://github.com/SocialFin…
Jun 4, 2024
64b4519
update read_csv in csv reformatter
Jun 4, 2024
877871c
update reformat csv
Jun 5, 2024
2b0e3c4
update reformat csv
Jun 5, 2024
9c23e56
add get_headers function
Jun 6, 2024
c1652a9
add debug print statements
Jun 10, 2024
d4e059c
update 2021 schema, remove debug prints
Jun 11, 2024
85bcf2e
add debug print statements
Jun 11, 2024
e32fb9b
update 2021 schema
Jun 12, 2024
791069e
store headers in error_summary for debugging
Jun 12, 2024
d24ba74
update school 2022 schema
Jun 14, 2024
4e2639a
add CIN csv pipeline
Jun 14, 2024
9243b49
fix unit tests
Jun 14, 2024
a6fff80
update school census schema 23 and 24
Jun 18, 2024
4cf24fa
update school schemas
Jun 21, 2024
381ff91
update cin csv schemas
Jun 27, 2024
30a36e2
update cin csv schemas
Jun 27, 2024
449919b
fix tablib error
Jun 27, 2024
7ec88e1
replace np.nan with blank
Jun 28, 2024
eb23e5a
merge annex_a_pipeline
Jul 17, 2024
d7780ce
committing changes to abort merge
Jul 26, 2024
16cfe98
merge with cin-dagster
patrick-troy Jul 26, 2024
ea3f1d5
Merge branch 'annex-a-dagster' of https://github.com/SocialFinanceDig…
patrick-troy Jul 26, 2024
2713423
importing annex_a to perform load_pipeline_config, task_clean and Loa…
Jul 27, 2024
5d671fe
pushing test files
Jul 29, 2024
da74ddb
pushing test files
Jul 29, 2024
f7bc6b3
added test for check_year_within_range
Jul 30, 2024
4979435
added tests for move_current_view and concatenated_view
Aug 5, 2024
122e565
add missing list 9 columns
patrick-troy Oct 18, 2024
a6cdb1b
basic schema build
amynickolls Nov 26, 2024
991a74b
build_schema finished
amynickolls Dec 6, 2024
b777af3
build_schema finished
amynickolls Dec 6, 2024
498cb48
Merge branch '61-annex-a-schema-build' of https://github.com/SocialFi…
amynickolls Dec 6, 2024
b70f6c7
Merge branch '61-annex-a-schema-build' of https://github.com/SocialFi…
amynickolls Dec 6, 2024
ebaadcd
Merge branch '61-annex-a-schema-build' of https://github.com/SocialFi…
amynickolls Dec 6, 2024
cf7cdfe
remove print statements
amynickolls Dec 6, 2024
ff65158
comments
amynickolls Dec 9, 2024
a3f65eb
schema update
amynickolls Dec 9, 2024
41a4af8
test checkpoint
amynickolls Dec 9, 2024
4aa056c
run test
amynickolls Dec 12, 2024
292c050
Merge branch 'main' into 62-pipeline-development
amynickolls Dec 13, 2024
c8d50a3
Merge branch 'allow_list_codes_yml' into 62-pipeline-development
amynickolls Dec 13, 2024
8b1838a
pipeline.json added
amynickolls Dec 16, 2024
0372cad
schema update
amynickolls Dec 18, 2024
f908a2f
json edit
amynickolls Dec 18, 2024
739a841
remove year input from schema load
amynickolls Dec 18, 2024
0d51ddf
delete build schema and associated tests
amynickolls Dec 18, 2024
5c90b39
update to fix annex_a pipeline
patrick-troy Dec 20, 2024
1b13d3b
do not allow "current" and "aggregated" datasets to flow into final o…
amynickolls Jan 2, 2025
853257b
schema update
amynickolls Jan 6, 2025
cddb10f
switch schema load to ruamel
amynickolls Jan 7, 2025
00dee16
remove whitespace from schema
amynickolls Jan 8, 2025
eb41037
removing school census code
amynickolls Jan 9, 2025
1 change: 1 addition & 0 deletions .gitignore
@@ -1,5 +1,6 @@
# Project specific
.idea
.nux/

# Byte-compiled / optimized / DLL files
__pycache__/
4 changes: 2 additions & 2 deletions liiatools/__main__.py
@@ -1,16 +1,16 @@
import click as click

from liiatools.ssda903_pipeline.cli import s903
from liiatools.cin_census_pipeline.cli import cin_census
from liiatools.ssda903_pipeline.cli import s903


@click.group()
def cli():
    pass


cli.add_command(s903)
cli.add_command(cin_census)
cli.add_command(s903)

if __name__ == "__main__":
    cli()
Empty file.
10,572 changes: 10,572 additions & 0 deletions liiatools/annex_a_pipeline/spec/Annex_A_schema.yml

Large diffs are not rendered by default.

47 changes: 47 additions & 0 deletions liiatools/annex_a_pipeline/spec/__init__.py
@@ -0,0 +1,47 @@
import logging
from functools import lru_cache
from pathlib import Path

from pydantic_yaml import parse_yaml_file_as
from ruamel.yaml import YAML

yaml = YAML()
yaml.preserve_quotes = True

from liiatools.common.data import PipelineConfig
from liiatools.common.spec.__data_schema import DataSchema

__ALL__ = ["load_schema", "DataSchema", "Category", "Column"]

logger = logging.getLogger(__name__)

SCHEMA_DIR = Path(__file__).parent


@lru_cache
def load_pipeline_config():
    """
    Load the pipeline config file
    :return: Parsed pipeline config file
    """
    with open(SCHEMA_DIR / "pipeline.json", "rt") as FILE:

Minor nit-pick, but all-caps variables are typically constants that are defined elsewhere (like you did with SCHEMA_DIR). Seeing FILE in all caps makes me think it's one of them, but it isn't. Ideally this variable should be lower case to match its function. See here for more info: https://peps.python.org/pep-0008/#constants
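
For illustration, a minimal sketch of the naming convention described in this comment. It is not code from the PR: `CONFIG_NAME` and `read_config_text` are hypothetical names, and the helper only reads the file's text rather than parsing it into a `PipelineConfig`.

```python
from pathlib import Path

# Module-level constant: all caps, as PEP 8 recommends for constants.
CONFIG_NAME = "pipeline.json"


def read_config_text(config_dir: Path) -> str:
    """Hypothetical helper (not part of liiatools): return the raw config text."""
    # The file handle is an ordinary local variable that is rebound on every
    # call, so it gets a lower-case name.
    with open(config_dir / CONFIG_NAME, "rt") as file:
        return file.read()
```

Applied to the line under review, the same idea would presumably mean renaming `FILE` to something like `file` or `config_file` in both the `with` statement and the `return` that follows it.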

        return parse_yaml_file_as(PipelineConfig, FILE)


@lru_cache
def load_schema() -> DataSchema:
    """
    Load the data schema file
    :return: The data schema in a DataSchema class
    """
    schema_path = Path(SCHEMA_DIR, "Annex_A_schema.yml")

    # If we have no schema files, raise an error
    if not schema_path:
        raise ValueError(f"No schema files found")

    with open(schema_path, "r", encoding="utf-8") as file:
        full_schema = yaml.load(file)

    # Now we can parse the full schema into a DataSchema object from the dict
    return DataSchema(**full_schema)
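
A note on the "no schema files" check in `load_schema`: a `pathlib.Path` object is always truthy, so `if not schema_path` can never trigger, and a missing file would only surface later when `open` fails. The sketch below assumes the intent is to fail early when the schema file is absent. `require_schema_file` is a hypothetical helper, not part of the PR, and the choice of `FileNotFoundError` over the PR's `ValueError` is also an assumption.

```python
from pathlib import Path


def require_schema_file(schema_path: Path) -> Path:
    """Hypothetical helper: return schema_path, or raise if it is not a file."""
    # `not schema_path` would never be True here, so ask the filesystem instead.
    if not schema_path.is_file():
        raise FileNotFoundError(f"No schema file found at {schema_path}")
    return schema_path
```

In `load_schema` this could be used as `schema_path = require_schema_file(Path(SCHEMA_DIR, "Annex_A_schema.yml"))` before opening the file.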