forked from SocialFinanceDigitalLabs/liia-tools
basic Annex A pipeline #68
Open: amynickolls wants to merge 66 commits into main from 62-pipeline-development
66 commits
4d07d59 update schemas, add time to Column class
faf2fca set type on time columns
e0cb559 set type to float
3da2734 coerce blank time errors
3444b18 update .yml and unit tests
48896a4 run python black
81ce969 update schemas
cab8390 update schema
ae2b83c add schema debugging
4d6a651 remove schema debugging
6331e11 update 2017 schema
c7e5713 add csv reformatter
b29b315 add csv reformatter
8af7385 Merge branch 'school_census_pipeline' of https://github.com/SocialFin…
64b4519 update read_csv in csv reformatter
877871c update reformat csv
2b0e3c4 update reformat csv
9c23e56 add get_headers function
c1652a9 add debug print statements
d4e059c update 2021 schema, remove debug prints
85bcf2e add debug print statements
e32fb9b update 2021 schema
791069e store headers in error_summary for debugging
d24ba74 update school 2022 schema
4e2639a add CIN csv pipeline
9243b49 fix unit tests
a6fff80 update school census schema 23 and 24
4cf24fa update school schemas
381ff91 update cin csv schemas
30a36e2 update cin csv schemas
449919b fix tablib error
7ec88e1 replace np.nan with blank
eb23e5a merge annex_a_pipeline
d7780ce committing changes to abort merge
16cfe98 merge with cin-dagster
patrick-troy ea3f1d5 Merge branch 'annex-a-dagster' of https://github.com/SocialFinanceDig…
patrick-troy 2713423 importing annex_a to perform load_pipeline_config, task_clean and Loa…
5d671fe pushing test files
da74ddb pushing test files
f7bc6b3 added test for check_year_within_range
4979435 added tests for move_current_view and concatenated_view
122e565 add missing list 9 columns
patrick-troy a6cdb1b basic schema build
amynickolls 991a74b build_schema finished
amynickolls b777af3 build_schema finished
amynickolls 498cb48 Merge branch '61-annex-a-schema-build' of https://github.com/SocialFi…
amynickolls b70f6c7 Merge branch '61-annex-a-schema-build' of https://github.com/SocialFi…
amynickolls ebaadcd Merge branch '61-annex-a-schema-build' of https://github.com/SocialFi…
amynickolls cf7cdfe remove print statements
amynickolls ff65158 comments
amynickolls a3f65eb schema update
amynickolls 41a4af8 test checkpoint
amynickolls 4aa056c run test
amynickolls 292c050 Merge branch 'main' into 62-pipeline-development
amynickolls c8d50a3 Merge branch 'allow_list_codes_yml' into 62-pipeline-development
amynickolls 8b1838a pipeline.json added
amynickolls 0372cad schema update
amynickolls f908a2f json edit
amynickolls 739a841 remove year input from schema load
amynickolls 0d51ddf delete build schema and associated tests
amynickolls 5c90b39 update to fix annex_a pipeline
patrick-troy 1b13d3b do not allow "current" and "aggregated" datasets to flow into final o…
amynickolls 853257b schema update
amynickolls cddb10f switch schema load to ruamel
amynickolls 00dee16 remove whitespace from schema
amynickolls eb41037 removing school census code
amynickolls
@@ -1,5 +1,6 @@
```
# Project specific
.idea
.nux/

# Byte-compiled / optimized / DLL files
__pycache__/
```
@@ -1,16 +1,16 @@
```python
import click as click

from liiatools.ssda903_pipeline.cli import s903
from liiatools.cin_census_pipeline.cli import cin_census
from liiatools.ssda903_pipeline.cli import s903


@click.group()
def cli():
    pass


cli.add_command(s903)
cli.add_command(cin_census)
cli.add_command(s903)

if __name__ == "__main__":
    cli()
```
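This hunk appears to only move `s903` relative to `cin_census` (same line counts before and after, with each `s903` line shown twice). With click's default `Group`, that order should not change what users see: `Group.list_commands` returns the registered names sorted, so `--help` lists subcommands alphabetically. The sketch below checks this, assuming click is installed; `s903` and `cin_census` here are hypothetical placeholder groups, not the liiatools ones, and `build_cli` is a hypothetical helper.

```python
import click
from click.testing import CliRunner


# Hypothetical placeholder groups standing in for the real s903 and
# cin_census groups imported from liiatools.
@click.group()
def s903():
    """SSDA903 commands (placeholder)."""


@click.group()
def cin_census():
    """CIN census commands (placeholder)."""


def build_cli(commands):
    """Build a top-level group, registering subcommands in the given order."""

    @click.group()
    def cli():
        pass

    for command in commands:
        cli.add_command(command)
    return cli


if __name__ == "__main__":
    runner = CliRunner()
    help_a = runner.invoke(build_cli([s903, cin_census]), ["--help"]).output
    help_b = runner.invoke(build_cli([cin_census, s903]), ["--help"]).output
    # The default Group.list_commands sorts names, so registration order
    # does not affect the help listing.
    print(help_a == help_b)
```

Registration order would only matter for a custom `Group` subclass that overrides `list_commands`.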
Empty file.
liiatools/annex_a_pipeline/spec/Annex_A_schema.yml: 10,572 additions, 0 deletions. Large diffs are not rendered by default.
@@ -0,0 +1,47 @@
```python
import logging
from functools import lru_cache
from pathlib import Path

from pydantic_yaml import parse_yaml_file_as
from ruamel.yaml import YAML

yaml = YAML()
yaml.preserve_quotes = True

from liiatools.common.data import PipelineConfig
from liiatools.common.spec.__data_schema import DataSchema

__ALL__ = ["load_schema", "DataSchema", "Category", "Column"]

logger = logging.getLogger(__name__)

SCHEMA_DIR = Path(__file__).parent


@lru_cache
def load_pipeline_config():
    """
    Load the pipeline config file
    :return: Parsed pipeline config file
    """
    with open(SCHEMA_DIR / "pipeline.json", "rt") as FILE:
        return parse_yaml_file_as(PipelineConfig, FILE)


@lru_cache
def load_schema() -> DataSchema:
    """
    Load the data schema file
    :return: The data schema in a DataSchema class
    """
    schema_path = Path(SCHEMA_DIR, "Annex_A_schema.yml")

    # If we have no schema files, raise an error
    if not schema_path:
        raise ValueError(f"No schema files found")

    with open(schema_path, "r", encoding="utf-8") as file:
        full_schema = yaml.load(file)

    # Now we can parse the full schema into a DataSchema object from the dict
    return DataSchema(**full_schema)
```
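The module declares `__ALL__`, but Python only honours the lower-case `__all__` when resolving `from module import *`; an upper-case `__ALL__` is an ordinary variable and, because it starts with an underscore, is not exported either. The stdlib-only sketch below registers throwaway modules (the names `demo_upper`, `demo_lower` and the helper `star_import_names` are hypothetical) to show which names a star import binds in each case.

```python
import sys
import types


def star_import_names(module_name: str, source: str) -> set:
    """Register `source` as module `module_name`; return names `import *` binds."""
    module = types.ModuleType(module_name)
    exec(source, module.__dict__)
    sys.modules[module_name] = module
    namespace: dict = {}
    exec(f"from {module_name} import *", namespace)
    namespace.pop("__builtins__", None)  # added by exec, not by the import
    return set(namespace)


BODY = "def load_schema(): pass\ndef helper(): pass\n"

if __name__ == "__main__":
    # Upper-case __ALL__ is ignored: every public name is exported.
    print(sorted(star_import_names("demo_upper", '__ALL__ = ["load_schema"]\n' + BODY)))
    # → ['helper', 'load_schema']
    # Lower-case __all__ restricts the export list.
    print(sorted(star_import_names("demo_lower", '__all__ = ["load_schema"]\n' + BODY)))
    # → ['load_schema']
```

Renaming to `__all__` with the current contents would make `from ... import *` raise `AttributeError`, because `Category` and `Column` are neither defined nor imported in this module.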
Minor nit-pick, but all-caps variables are typically constants defined elsewhere (like you did with SCHEMA_DIR). Seeing FILE in all caps makes me think it's one of them, but it isn't. Ideally this variable should be lower case to match its function. See here for more info: https://peps.python.org/pep-0008/#constants
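Following the suggestion, the handle bound by `with open(...)` would be lower case. The sketch below applies that in a stdlib-only stand-in for `load_schema`, under these assumptions: the name `load_schema_dict` is hypothetical, `json` replaces ruamel.yaml so it runs without the project's dependencies, the directory is passed in rather than derived from `__file__`, and raising `FileNotFoundError` instead of `ValueError` is a choice made here. It also replaces `if not schema_path:` with `schema_path.is_file()`, because a `Path` object is always truthy, so the original check cannot detect a missing file.

```python
import json
import tempfile
from functools import lru_cache
from pathlib import Path


@lru_cache
def load_schema_dict(schema_dir: Path, name: str = "schema.json") -> dict:
    """Hypothetical stand-in for load_schema: parse a schema file once."""
    schema_path = schema_dir / name
    # Path objects are always truthy, so ask the filesystem instead.
    if not schema_path.is_file():
        raise FileNotFoundError(f"No schema file found at {schema_path}")
    # Lower-case `file`: a local variable, not a module-level constant.
    with open(schema_path, "r", encoding="utf-8") as file:
        return json.load(file)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        schema_dir = Path(tmp)
        (schema_dir / "schema.json").write_text('{"column_map": {}}', encoding="utf-8")
        first = load_schema_dict(schema_dir)
        second = load_schema_dict(schema_dir)
        print(first is second)  # True: the second call is served from the cache
```

As with the original `@lru_cache` loaders, repeated calls with the same arguments return the same cached object, so callers should treat the result as read-only.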