Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ingest Migration - FacDB script sources #1290

Open
7 of 9 tasks
fvankrieken opened this issue Nov 27, 2024 · 5 comments · May be fixed by #1313
Open
7 of 9 tasks

Ingest Migration - FacDB script sources #1290

fvankrieken opened this issue Nov 27, 2024 · 5 comments · May be fixed by #1313
Assignees

Comments

@fvankrieken
Copy link
Contributor

fvankrieken commented Nov 27, 2024

A satisfying group to migrate, because we get to delete the scripts, meaning higher test coverage of dcpy!! Reduce that denominator

  • dcas_ipis (holdover in this issue from original scope, which was just grab bag of script sources)
  • bpl_libraries
  • dcp_facilities_with_unmapped
  • dcp_colp (not really script source, but there is one that was temporarily used for formatting issues)
  • dcp_sfpsd
  • hra_centers (remove)
  • nypl_libraries
  • nysed_nonpublicenrollment
  • usdot_airports
@fvankrieken
Copy link
Contributor Author

dcas_ipis

Seems like we have one positive difference - these fields were always str, but now they're being properly cast to str without decimal place

________________________________________________________________________________
Tables
    Left: dcas_ipis_library
    Right: dcas_ipis_ingest
________________________________________________________________________________
Row count
    Left: 17229
    Right: 17229
________________________________________________________________________________
Column comparison
    Both
        agency
        alternate_address
        bbl
        block
        boro
        borough
        cd
        data_library_version
        house_number
        lot
        ogc_fid
        owned/leased
        owner
        parcel_name
        primary_use_code
        primary_use_text
        street_name
        u_a_use_code
        u_a_use_text
        u_f_use_code
        u_f_use_text
        unnamed:_22
        use_status
        x_coord
        y_coord
    Left only: None
    Right only: None
    Type differences
        Borough
            Left: character varying
            Right: text
        Y coord
            Left: character varying
            Right: text
        House number
            Left: character varying
            Right: text
        Parcel name
            Left: character varying
            Right: text
        X coord
            Left: character varying
            Right: text
        Street name
            Left: character varying
            Right: text
        Lot
            Left: character varying
            Right: text
        Unnamed: 22
            Left: character varying
            Right: double precision
        Owned/leased
            Left: character varying
            Right: text
        U f use code
            Left: character varying
            Right: text
        Primary use text
            Left: character varying
            Right: text
        Cd
            Left: character varying
            Right: text
        Block
            Left: character varying
            Right: text
        U a use code
            Left: character varying
            Right: text
        Boro
            Left: character varying
            Right: text
        Agency
            Left: character varying
            Right: text
        Owner
            Left: character varying
            Right: text
        U f use text
            Left: character varying
            Right: text
        Alternate address
            Left: character varying
            Right: text
        U a use text
            Left: character varying
            Right: text
        Bbl
            Left: character varying
            Right: text
        Primary use code
            Left: character varying
            Right: text
        Use status
            Left: character varying
            Right: text
________________________________________________________________________________
Data comparison
    Key columns
        bbl
        agency
        primary_use_code
        owned/leased
    Left only: None
    Right only: None
    Columns with diffs
        u_a_use_code
        u_f_use_code
    Differences by column
        U a use code
            792 rows. First 20 shown
                                                               left right
            bbl        agency primary_use_code owned/leased              
            1000020002 DSBS   1900             O             1910.0  1910
            1000020023 EDC    1900             O             1910.0  1910
            1000160001 EDC    1900             O             1910.0  1910
            1000160012 EDC    1900             O             1920.0  1920
            1000360018 EDC    1900             O             1920.0  1920
            1000730002 EDC    1900             O             1910.0  1910
            1000730008 EDC    1900             O             1910.0  1910
            1000730010 EDC    1342             O             1920.0  1920
            1000730014 EDC    1900             O             1910.0  1910
            1000740001 EDC    1900             O             1910.0  1910
            1000740020 EDC    1900             O             1910.0  1910
            1000950101 DSBS   1900             O             1910.0  1910
                       EDC    412              O             1910.0  1910
            1000960001 EDC    1900             O             1910.0  1910
            1000960005 EDC    1900             O             1910.0  1910
            1000960008 EDC    1900             O             1910.0  1910
            1000960012 EDC    1900             O             1910.0  1910
            1000960013 EDC    1900             O             1910.0  1910
            1000970007 EDC    1900             O             1910.0  1910
            1001070060 EDC    1342             O             1920.0  1920
        U f use code
            1561 rows. First 20 shown
                                                               left right
            bbl        agency primary_use_code owned/leased              
            1000020023 DSBS   1900             O             1610.0  1610
                       EDC    1900             O             1610.0  1610
            1000160001 EDC    1900             O             1610.0  1610
            1000360012 DSBS   1530             O             1610.0  1610
                       EDC    1500             O             1610.0  1610
            1000360018 DSBS   1900             O             1610.0  1610
                       EDC    1900             O             1610.0  1610
            1000360025 DSBS   1500             O             1610.0  1610
                       EDC    1500             O             1610.0  1610
            1000360030 DSBS   1500             O             1610.0  1610
                       EDC    1500             O             1610.0  1610
            1000730002 DSBS   1900             O             1610.0  1610
                       EDC    1900             O             1610.0  1610
            1000730008 EDC    1900             O             1610.0  1610
            1000730011 DSBS   1510             O             1610.0  1610
                       EDC    1510             O             1610.0  1610
            1000730014 DSBS   1900             O             1610.0  1610
                       EDC    1900             O             1610.0  1610
            1000730017 DSBS   1530             O             1610.0  1610
            1000730028 DSBS   1530             O             1610.0  1610

@fvankrieken
Copy link
Contributor Author

fvankrieken commented Dec 4, 2024

bpl_libraries

________________________________________________________________________________
Tables
    Left: bpl_libraries_library
    Right: bpl_libraries_ingest
________________________________________________________________________________
Row count
    Left: 62
    Right: 62
________________________________________________________________________________
Column comparison
    Both
        access
        address
        branchid
        branchlist
        bus
        closingmsg
        data_library_version
        friday
        hours
        monday
        ogc_fid
        path
        phone
        pickuploc
        saturday
        subway
        sunday
        tags
        thursday
        title
        tuesday
        wednesday
        wkb_geometry
    Left only
        latitude
        longitude
        position
    Right only: None
    Type differences
        Access
            Left: character varying
            Right: text
        Monday
            Left: character varying
            Right: text
        Wednesday
            Left: character varying
            Right: text
        Subway
            Left: character varying
            Right: text
        Pickuploc
            Left: character varying
            Right: text
        Closingmsg
            Left: character varying
            Right: text
        Tags
            Left: character varying
            Right: text
        Branchid
            Left: character varying
            Right: text
        Thursday
            Left: character varying
            Right: text
        Path
            Left: character varying
            Right: text
        Tuesday
            Left: character varying
            Right: text
        Branchlist
            Left: character varying
            Right: text
        Phone
            Left: character varying
            Right: text
        Friday
            Left: character varying
            Right: text
        Bus
            Left: character varying
            Right: text
        Hours
            Left: character varying
            Right: text
        Saturday
            Left: character varying
            Right: text
        Sunday
            Left: character varying
            Right: text
        Address
            Left: character varying
            Right: text
        Title
            Left: character varying
            Right: text
________________________________________________________________________________
Data comparison
    Compared columns
        access
        address
        branchid
        branchlist
        bus
        closingmsg
        friday
        hours
        monday
        path
        phone
        pickuploc
        saturday
        subway
        sunday
        tags
        thursday
        title
        tuesday
        wednesday
        wkb_geometry
    Left only
        Empty DataFrame
        Columns: [access, monday, wednesday, subway, pickuploc, wkb_geometry, closingmsg, tags, branchid, thursday, path, tuesday, branchlist, phone, friday, bus, hours, saturday, sunday, address, title]
        Index: []
    Right only
        Empty DataFrame
        Columns: [access, monday, wednesday, subway, pickuploc, wkb_geometry, closingmsg, tags, branchid, thursday, path, tuesday, branchlist, phone, friday, bus, hours, saturday, sunday, address, title]
        Index: []

3 missing columns. "position" is the original geom column, which is preserved in library (along with "wkb_geometry") but not in ingest. Lat and long columns are created as intermediates by the script en route to creating "wkb_geometry" column

rerurn with csv instead of library pgdump

________________________________________________________________________________
Tables
    Left: bpl_libraries_library
    Right: bpl_libraries_ingest
________________________________________________________________________________
Row count
    Left: 63
    Right: 63
________________________________________________________________________________
Column comparison
    Both
        access
        address
        branchid
        branchlist
        bus
        closingmsg
        data_library_version
        friday
        hours
        monday
        ogc_fid
        path
        phone
        pickuploc
        saturday
        subway
        sunday
        tags
        thursday
        title
        tuesday
        wednesday
    Left only
        WKT
        latitude
        longitude
        position
    Right only
        wkb_geometry
    Type differences: None
________________________________________________________________________________
Data comparison
    Compared columns
        access
        address
        branchid
        branchlist
        bus
        closingmsg
        friday
        hours
        monday
        path
        phone
        pickuploc
        saturday
        subway
        sunday
        tags
        thursday
        title
        tuesday
        wednesday
    Ignored columns
        ogc_fid
        data_library_version
    Columns coerced to numeric: None
    Left only
        Empty DataFrame
        Columns: [tags, address, saturday, branchlist, access, wednesday, bus, friday, branchid, monday, subway, hours, path, thursday, title, pickuploc, phone, sunday, closingmsg, tuesday]
        Index: []
    Right only
        Empty DataFrame
        Columns: [tags, address, saturday, branchlist, access, wednesday, bus, friday, branchid, monday, subway, hours, path, thursday, title, pickuploc, phone, sunday, closingmsg, tuesday]
        Index: []

@fvankrieken
Copy link
Contributor Author

fvankrieken commented Dec 5, 2024

dcp_faciliities_with_unmapped. Used as pg_dump in facdb for qa comparison

________________________________________________________________________________
Tables
    Left: dcp_facilities_with_unmapped_library
    Right: dcp_facilities_with_unmapped_ingest
________________________________________________________________________________
Row count
    Left: 35387
    Right: 35387
________________________________________________________________________________
Column comparison
    Both
        address
        addressnum
        bbl
        bin
        boro
        borocode
        capacity
        captype
        cd
        city
        council
        ct2010
        ct2020
        data_library_version
        datasource
        facdomain
        facgroup
        facname
        facsubgrp
        factype
        latitude
        longitude
        nta2010
        nta2020
        ogc_fid
        opabbrev
        opname
        optype
        overabbrev
        overagency
        overlevel
        policeprct
        schooldist
        servarea
        streetname
        uid
        wkb_geometry
        xcoord
        ycoord
        zipcode
    Left only: None
    Right only: None
    Type differences
        Latitude
            Left: character varying
            Right: text
        Datasource
            Left: character varying
            Right: text
        Streetname
            Left: character varying
            Right: text
        Address
            Left: character varying
            Right: text
        Facname
            Left: character varying
            Right: text
        Overagency
            Left: character varying
            Right: text
        Ct2010
            Left: character varying
            Right: text
        Optype
            Left: character varying
            Right: text
        Cd
            Left: character varying
            Right: text
        Ct2020
            Left: character varying
            Right: text
        Captype
            Left: character varying
            Right: text
        Servarea
            Left: character varying
            Right: text
        Facdomain
            Left: character varying
            Right: text
        Xcoord
            Left: character varying
            Right: text
        Boro
            Left: character varying
            Right: text
        Facsubgrp
            Left: character varying
            Right: text
        Bin
            Left: character varying
            Right: text
        Addressnum
            Left: character varying
            Right: text
        Uid
            Left: character varying
            Right: text
        Capacity
            Left: character varying
            Right: text
        Nta2010
            Left: character varying
            Right: text
        Council
            Left: character varying
            Right: text
        Ycoord
            Left: character varying
            Right: text
        Factype
            Left: character varying
            Right: text
        Overabbrev
            Left: character varying
            Right: text
        Borocode
            Left: character varying
            Right: text
        Schooldist
            Left: character varying
            Right: text
        Bbl
            Left: character varying
            Right: text
        Opname
            Left: character varying
            Right: text
        City
            Left: character varying
            Right: text
        Policeprct
            Left: character varying
            Right: text
        Zipcode
            Left: character varying
            Right: text
        Facgroup
            Left: character varying
            Right: text
        Overlevel
            Left: character varying
            Right: text
        Opabbrev
            Left: character varying
            Right: text
        Longitude
            Left: character varying
            Right: text
        Nta2020
            Left: character varying
            Right: text
________________________________________________________________________________
Data comparison
    Key columns
        uid
    Left only: None
    Right only: None
    Columns with diffs
        bbl
        bin
        borocode
        capacity
        cd
        council
        ct2010
        ct2020
        facname
        opname
        policeprct
        schooldist
        wkb_geometry
        zipcode
    Differences by column
        Facname
            37 rows. First 20 shown
                                                                                                          left                                                           right
            uid                                                                                                                                                               
            019b633dbb2251d015c833be9b6c10db    CHILDRENâS IN VICTORY DAY CARE CORP AT 644 WEST 204TH STREET    CHILDREN’S IN VICTORY DAY CARE CORP AT 644 WEST 204TH STREET
            046735d9445866063503af537510e6e7                            ST. STEPHENâS MISSION CHURCH COMPLEX                            ST. STEPHEN’S MISSION CHURCH COMPLEX
            052f4a21504e3be0850db5a242360474                     YMCA COMMUNITY MARKET â NORTH BROOKLYN YMCA                     YMCA COMMUNITY MARKET – NORTH BROOKLYN YMCA
            0ee24e7c0ba031653bd608d4956fa6a8  MUSEUM OF JEWISH HERITAGE â A LIVING MEMORIAL TO THE HOLOCAUST  MUSEUM OF JEWISH HERITAGE – A LIVING MEMORIAL TO THE HOLOCAUST
            1c60d069a64606126b4a8f607b02184b                    FAMILY SERVICES CALL CENTER, QUEENSÂ SATELLITE                   FAMILY SERVICES CALL CENTER, QUEENS SATELLITE
            2d38a00870f4fa612abe1644cad9e73a                              ZETA CHARTER SCHOOL â TREMONT PARK                              ZETA CHARTER SCHOOL – TREMONT PARK
            39f14a64840ea4c630fa29ef4546e0f1                                                  YELED Vâ YALDA                                                  YELED V’ YALDA
            58899dfec8d3c1622f39b10573eb0524                                  P.S. 3 Ãâ¬â CHARRETTE SCHOOL                                     P.S. 3 €“ CHARRETTE SCHOOL
            5a5a3d7efec160dc3cdac85257ca8661             NW COÂ  9TH AVE & WEST 59TH STREET FOOD DROP-OFF SITE            NW CO  9TH AVE & WEST 59TH STREET FOOD DROP-OFF SITE
            6240fd0c66fc1b583854f7699028bec5            THE YOUNG WOMENâS LEADERSHIP SCHOOL OF STATEN ISLAND            THE YOUNG WOMEN’S LEADERSHIP SCHOOL OF STATEN ISLAND
            625796cefc3294daf85e4e09abbd58e0                                           RAYMOND OâCONNOR PARK                                           RAYMOND O’CONNOR PARK
            64282e66ed3a213dc570493b97031509                                      GOD̉‰
                                                                                         ¢S LOVE WE DELIVER                                         GOD€™S LOVE WE DELIVER
            6451290002962a0585e613bf88ef2587                       YMCA COMMUNITY MARKET â CONEY ISLAND YMCA                       YMCA COMMUNITY MARKET – CONEY ISLAND YMCA
            64e5cd3622fb6f2ab0f95ab3b01ae86c                  SUCCESS ACADEMY CHARTER SCHOOL â BRONX 5 UPPER                  SUCCESS ACADEMY CHARTER SCHOOL – BRONX 5 UPPER
            64fad3a444968263ae4138a189bdc152                                ZETA CHARTER SCHOOL â MOUNT EDEN                                ZETA CHARTER SCHOOL – MOUNT EDEN
            721984dfdfcca5497082ef58dc7260bc                                              YOLANDA GARCÃA PARK                                             YOLANDA GARCÍA PARK
            749b7d21562393e0a8cfa77f0b3aa72e                                    OUR SAVIOUR̉‰
                                                                                               ¢S ATONEMENT                                       OUR SAVIOUR€™S ATONEMENT
            776f6ec88e2fa1fa58bfc7b79f11f3a0                BROOKLYN COMMUNITY PRIDE CY Ãâ¬â CROWN HEIGHTS                   BROOKLYN COMMUNITY PRIDE CY €“ CROWN HEIGHTS
            79e027d580588308d38946c142047d7a                                    EL BATEY DE DOÃA PROVI GARDEN                                   EL BATEY DE DOÑA PROVI GARDEN
            7e9f7f96a960d6c9cd7c681e511e574b                                     FAR ROCKAWAY NO. 2, FIELD\n\n                                 FAR ROCKAWAY NO. 2, FIELD\r\r\n
        Ct2010
            34493 rows. First 20 shown
                                                 left  right
            uid                                             
            000081aff7e44860160371b7a6b79e9a  17010.0  17010
            000105fd986377e37a8f95abfdbda92d  43000.0  43000
            000186821f899fef28e05c9ca2182621   6500.0   6500
            0002d3b5bb64af5fbc4d28e67dd33e1a   5000.0   5000
            00056520e4c5483e13933cda8c8a736d  27000.0  27000
            0005cf2135820de8d20395875ca0e506   5400.0   5400
            0006e49b4ecf531c88f5dec8bd7d0364  58900.0  58900
            000898872e62bfd79bb2e5fb4f6fbb38  62000.0  62000
            00093367a9d66ff799c9cc47b368dd16   1800.0   1800
            00098ca3934642d37267e38f73e5818a  13300.0  13300
            000997dc329383998a57f53346b0612d   4401.0   4401
            000a2f27e18e1ee6b1d8ba78872169f2   5200.0   5200
            000b602b4850d4acb4905b495308b051  70600.0  70600
            000d687da861e4489d08e446a99b5e62  24302.0  24302
            000dbf675f6b389e6d83b001d43391e0   1402.0   1402
            000f2b19260e07073e2f0e7ec07944e7   9300.0   9300
            000ff3bb08f400c998fee0bad9b600cf  10900.0  10900
            001009f03f06f8f5bb27a99657f8c4d6  48500.0  48500
            00116519f4ca9c6bff07af18349e4a8e  22600.0  22600
            001425fed2ed0005ffb556c8d448fd8c   9601.0   9601
        Cd
            34493 rows. First 20 shown
                                               left right
            uid                                          
            000081aff7e44860160371b7a6b79e9a  503.0   503
            000105fd986377e37a8f95abfdbda92d  311.0   311
            000186821f899fef28e05c9ca2182621  201.0   201
            0002d3b5bb64af5fbc4d28e67dd33e1a  106.0   106
            00056520e4c5483e13933cda8c8a736d  311.0   311
            0005cf2135820de8d20395875ca0e506  105.0   105
            0006e49b4ecf531c88f5dec8bd7d0364  301.0   301
            000898872e62bfd79bb2e5fb4f6fbb38  315.0   315
            00093367a9d66ff799c9cc47b368dd16  409.0   409
            00098ca3934642d37267e38f73e5818a  104.0   104
            000997dc329383998a57f53346b0612d  409.0   409
            000a2f27e18e1ee6b1d8ba78872169f2  105.0   105
            000b602b4850d4acb4905b495308b051  318.0   318
            000d687da861e4489d08e446a99b5e62  110.0   110
            000dbf675f6b389e6d83b001d43391e0  103.0   103
            000f2b19260e07073e2f0e7ec07944e7  202.0   202
            000ff3bb08f400c998fee0bad9b600cf  105.0   105
            001009f03f06f8f5bb27a99657f8c4d6  301.0   301
            00116519f4ca9c6bff07af18349e4a8e  503.0   503
            001425fed2ed0005ffb556c8d448fd8c  502.0   502
        Ct2020
            34493 rows. First 20 shown
                                                 left  right
            uid                                             
            000081aff7e44860160371b7a6b79e9a  17014.0  17014
            000105fd986377e37a8f95abfdbda92d  43000.0  43000
            000186821f899fef28e05c9ca2182621   6500.0   6500
            0002d3b5bb64af5fbc4d28e67dd33e1a   5000.0   5000
            00056520e4c5483e13933cda8c8a736d  27000.0  27000
            0005cf2135820de8d20395875ca0e506   5400.0   5400
            0006e49b4ecf531c88f5dec8bd7d0364  58902.0  58902
            000898872e62bfd79bb2e5fb4f6fbb38  62000.0  62000
            00093367a9d66ff799c9cc47b368dd16   1800.0   1800
            00098ca3934642d37267e38f73e5818a  13300.0  13300
            000997dc329383998a57f53346b0612d   4401.0   4401
            000a2f27e18e1ee6b1d8ba78872169f2   5200.0   5200
            000b602b4850d4acb4905b495308b051  70601.0  70601
            000d687da861e4489d08e446a99b5e62  24302.0  24302
            000dbf675f6b389e6d83b001d43391e0   1402.0   1402
            000f2b19260e07073e2f0e7ec07944e7   9301.0   9301
            000ff3bb08f400c998fee0bad9b600cf  10900.0  10900
            001009f03f06f8f5bb27a99657f8c4d6  48500.0  48500
            00116519f4ca9c6bff07af18349e4a8e  22602.0  22602
            001425fed2ed0005ffb556c8d448fd8c   9601.0   9601
        Bin
            29864 rows. First 20 shown
                                                   left    right
            uid                                                 
            000081aff7e44860160371b7a6b79e9a  5079651.0  5079651
            000186821f899fef28e05c9ca2182621  2000933.0  2000933
            0002d3b5bb64af5fbc4d28e67dd33e1a  1017828.0  1017828
            00056520e4c5483e13933cda8c8a736d  3162352.0  3162352
            0005cf2135820de8d20395875ca0e506  1080631.0  1080631
            0006e49b4ecf531c88f5dec8bd7d0364  3398537.0  3398537
            000898872e62bfd79bb2e5fb4f6fbb38  3246519.0  3246519
            00098ca3934642d37267e38f73e5818a  1090939.0  1090939
            000a2f27e18e1ee6b1d8ba78872169f2  1016184.0  1016184
            000b602b4850d4acb4905b495308b051  3238055.0  3238055
            000d687da861e4489d08e446a99b5e62  1088672.0  1088672
            000dbf675f6b389e6d83b001d43391e0  1089451.0  1089451
            000f2b19260e07073e2f0e7ec07944e7  2006692.0  2006692
            000ff3bb08f400c998fee0bad9b600cf  1014454.0  1014454
            001009f03f06f8f5bb27a99657f8c4d6  3400489.0  3400489
            00116519f4ca9c6bff07af18349e4a8e  5086699.0  5086699
            001425fed2ed0005ffb556c8d448fd8c  5107505.0  5107505
            0017eef2e888306cb01c2f7770ef4cca  4216197.0  4216197
            00193d4c8867ba07d718b49b5ace6b20  3002801.0  3002801
            001c4ba76ffa6f279bc6aa53f2ca8c98  1084132.0  1084132
        Capacity
            1922 rows. First 20 shown
                                                left right
            uid                                           
            000081aff7e44860160371b7a6b79e9a  1579.0  1579
            0006e49b4ecf531c88f5dec8bd7d0364   271.0   271
            000898872e62bfd79bb2e5fb4f6fbb38   333.0   333
            000b602b4850d4acb4905b495308b051   132.0   132
            000f2b19260e07073e2f0e7ec07944e7   309.0   309
            00116519f4ca9c6bff07af18349e4a8e   672.0   672
            001cff28e6888be61db7b0f789b4f831   377.0   377
            0022398677714f48146da0b4b5fd6e48   442.0   442
            002b07719e22476023ebd9035eed81b8   134.0   134
            002b7e6c8b7355d93de5729ee065bd87   241.0   241
            003d88803dd0803d6544b786183e43c0   118.0   118
            0046d4d36ee8fa16aadbde802c291367    42.0    42
            004c784636b86b2a0ea2791c7a5dd156   138.0   138
            004e8d22e0663c2e48257e0ee8c73cee   531.0   531
            0051cdabfd117e137fd752cdb685211f   592.0   592
            0052ddb7b36320d5b081389d1e4feb0c    54.0    54
            006e96fba058a92aa95a7d872724475f   230.0   230
            0079ef4c55b2d8c2d1bd3b917710da1a   123.0   123
            007d1e4602a59970cdfdaf7ad3aa2ad4   108.0   108
            0082cd7068bba4504c416117dfb485a6   617.0   617
        Council
            34485 rows. First 20 shown
                                              left right
            uid                                         
            000081aff7e44860160371b7a6b79e9a  51.0    51
            000105fd986377e37a8f95abfdbda92d  44.0    44
            000186821f899fef28e05c9ca2182621  17.0    17
            0002d3b5bb64af5fbc4d28e67dd33e1a   2.0     2
            00056520e4c5483e13933cda8c8a736d  44.0    44
            0005cf2135820de8d20395875ca0e506   3.0     3
            0006e49b4ecf531c88f5dec8bd7d0364  33.0    33
            000898872e62bfd79bb2e5fb4f6fbb38  48.0    48
            00093367a9d66ff799c9cc47b368dd16  32.0    32
            00098ca3934642d37267e38f73e5818a   3.0     3
            000997dc329383998a57f53346b0612d  32.0    32
            000a2f27e18e1ee6b1d8ba78872169f2   2.0     2
            000b602b4850d4acb4905b495308b051  46.0    46
            000d687da861e4489d08e446a99b5e62   9.0     9
            000dbf675f6b389e6d83b001d43391e0   1.0     1
            000f2b19260e07073e2f0e7ec07944e7  17.0    17
            000ff3bb08f400c998fee0bad9b600cf   3.0     3
            001009f03f06f8f5bb27a99657f8c4d6  34.0    34
            00116519f4ca9c6bff07af18349e4a8e  51.0    51
            001425fed2ed0005ffb556c8d448fd8c  50.0    50
        Borocode
            35222 rows. First 20 shown
                                             left right
            uid                                        
            000081aff7e44860160371b7a6b79e9a  5.0     5
            000105fd986377e37a8f95abfdbda92d  3.0     3
            000186821f899fef28e05c9ca2182621  2.0     2
            0002d3b5bb64af5fbc4d28e67dd33e1a  1.0     1
            00056520e4c5483e13933cda8c8a736d  3.0     3
            0005cf2135820de8d20395875ca0e506  1.0     1
            0006e49b4ecf531c88f5dec8bd7d0364  3.0     3
            000898872e62bfd79bb2e5fb4f6fbb38  3.0     3
            00093367a9d66ff799c9cc47b368dd16  4.0     4
            00098ca3934642d37267e38f73e5818a  1.0     1
            000997dc329383998a57f53346b0612d  4.0     4
            000a2f27e18e1ee6b1d8ba78872169f2  1.0     1
            000b602b4850d4acb4905b495308b051  3.0     3
            000d687da861e4489d08e446a99b5e62  1.0     1
            000dbf675f6b389e6d83b001d43391e0  1.0     1
            000f2b19260e07073e2f0e7ec07944e7  2.0     2
            000ff3bb08f400c998fee0bad9b600cf  1.0     1
            001009f03f06f8f5bb27a99657f8c4d6  3.0     3
            00116519f4ca9c6bff07af18349e4a8e  5.0     5
            001425fed2ed0005ffb556c8d448fd8c  5.0     5
        Schooldist
            34491 rows. First 20 shown
                                              left right
            uid                                         
            000081aff7e44860160371b7a6b79e9a  31.0    31
            000105fd986377e37a8f95abfdbda92d  21.0    21
            000186821f899fef28e05c9ca2182621   7.0     7
            0002d3b5bb64af5fbc4d28e67dd33e1a   2.0     2
            00056520e4c5483e13933cda8c8a736d  20.0    20
            0005cf2135820de8d20395875ca0e506   2.0     2
            0006e49b4ecf531c88f5dec8bd7d0364  14.0    14
            000898872e62bfd79bb2e5fb4f6fbb38  22.0    22
            00093367a9d66ff799c9cc47b368dd16  27.0    27
            00098ca3934642d37267e38f73e5818a   2.0     2
            000997dc329383998a57f53346b0612d  27.0    27
            000a2f27e18e1ee6b1d8ba78872169f2   2.0     2
            000b602b4850d4acb4905b495308b051  22.0    22
            000d687da861e4489d08e446a99b5e62   5.0     5
            000dbf675f6b389e6d83b001d43391e0   1.0     1
            000f2b19260e07073e2f0e7ec07944e7   8.0     8
            000ff3bb08f400c998fee0bad9b600cf   2.0     2
            001009f03f06f8f5bb27a99657f8c4d6  14.0    14
            00116519f4ca9c6bff07af18349e4a8e  31.0    31
            001425fed2ed0005ffb556c8d448fd8c  31.0    31
        Bbl
            33297 rows. First 20 shown
                                                      left       right
            uid                                                       
            000081aff7e44860160371b7a6b79e9a  5060500002.0  5060500002
            000105fd986377e37a8f95abfdbda92d  3065950033.0  3065950033
            000186821f899fef28e05c9ca2182621  2023317501.0  2023317501
            0002d3b5bb64af5fbc4d28e67dd33e1a  1008720057.0  1008720057
            00056520e4c5483e13933cda8c8a736d  3062640030.0  3062640030
            0005cf2135820de8d20395875ca0e506  1008180027.0  1008180027
            0006e49b4ecf531c88f5dec8bd7d0364  3026120133.0  3026120133
            000898872e62bfd79bb2e5fb4f6fbb38  3087520035.0  3087520035
            00093367a9d66ff799c9cc47b368dd16  4089290005.0  4089290005
            00098ca3934642d37267e38f73e5818a  1010420054.0  1010420054
            000a2f27e18e1ee6b1d8ba78872169f2  1008480061.0  1008480061
            000b602b4850d4acb4905b495308b051  3084460019.0  3084460019
            000d687da861e4489d08e446a99b5e62  1021060003.0  1021060003
            000dbf675f6b389e6d83b001d43391e0  1003477501.0  1003477501
            000f2b19260e07073e2f0e7ec07944e7  2027660031.0  2027660031
            000ff3bb08f400c998fee0bad9b600cf  1007870011.0  1007870011
            001009f03f06f8f5bb27a99657f8c4d6  3030920022.0  3030920022
            00116519f4ca9c6bff07af18349e4a8e  5071200197.0  5071200197
            001425fed2ed0005ffb556c8d448fd8c  5032150001.0  5032150001
            0017eef2e888306cb01c2f7770ef4cca  4101510007.0  4101510007
        Wkb geometry
            35387 rows. First 20 shown
                                             ordering_equal spatially_equal left_geom_type right_geom_type
            uid                                                                                           
            000081aff7e44860160371b7a6b79e9a           None            None           None        ST_Point
            000105fd986377e37a8f95abfdbda92d           None            None           None        ST_Point
            000186821f899fef28e05c9ca2182621           None            None           None        ST_Point
            0002d3b5bb64af5fbc4d28e67dd33e1a           None            None           None        ST_Point
            00056520e4c5483e13933cda8c8a736d           None            None           None        ST_Point
            0005cf2135820de8d20395875ca0e506           None            None           None        ST_Point
            0006e49b4ecf531c88f5dec8bd7d0364           None            None           None        ST_Point
            000898872e62bfd79bb2e5fb4f6fbb38           None            None           None        ST_Point
            00093367a9d66ff799c9cc47b368dd16           None            None           None        ST_Point
            00098ca3934642d37267e38f73e5818a           None            None           None        ST_Point
            000997dc329383998a57f53346b0612d           None            None           None        ST_Point
            000a2f27e18e1ee6b1d8ba78872169f2           None            None           None        ST_Point
            000b602b4850d4acb4905b495308b051           None            None           None        ST_Point
            000d687da861e4489d08e446a99b5e62           None            None           None        ST_Point
            000dbf675f6b389e6d83b001d43391e0           None            None           None        ST_Point
            002a7df949e57fb493e9fd54efcb62a1           None            None           None        ST_Point
            002b07719e22476023ebd9035eed81b8           None            None           None        ST_Point
            002b7e6c8b7355d93de5729ee065bd87           None            None           None        ST_Point
            002d78d14b022647cfd6d0182f58cd32           None            None           None        ST_Point
            002dbff28913a333eccec060f70b8b72           None            None           None        ST_Point
        Opname
                                                                                                          left                                                           right
            uid                                                                                                                                                               
            019b633dbb2251d015c833be9b6c10db    CHILDRENâS IN VICTORY DAY CARE CORP AT 644 WEST 204TH STREET    CHILDREN’S IN VICTORY DAY CARE CORP AT 644 WEST 204TH STREET
            052f4a21504e3be0850db5a242360474                     YMCA Community Market â North Brooklyn YMCA                     YMCA Community Market – North Brooklyn YMCA
            0ee24e7c0ba031653bd608d4956fa6a8  Museum of Jewish Heritage â A Living Memorial to the Holocaust  Museum of Jewish Heritage – A Living Memorial to the Holocaust
            2d38a00870f4fa612abe1644cad9e73a                              Zeta Charter School â Tremont Park                              Zeta Charter School – Tremont Park
            39f14a64840ea4c630fa29ef4546e0f1                                                  YELED Vâ YALDA                                                  YELED V’ YALDA
            6451290002962a0585e613bf88ef2587                       YMCA Community Market â Coney Island YMCA                       YMCA Community Market – Coney Island YMCA
            64e5cd3622fb6f2ab0f95ab3b01ae86c                  Success Academy Charter School â Bronx 5 Upper                  Success Academy Charter School – Bronx 5 Upper
            64fad3a444968263ae4138a189bdc152                                Zeta Charter School â Mount Eden                                Zeta Charter School – Mount Eden
            9cd7c22e84d181fcf1ecf2d7917b3f13                  Success Academy Charter School â Bronx 5 Lower                  Success Academy Charter School – Bronx 5 Lower
            f9555dad4badbe18194abc58163eefc0                                   Los Colibríes Community Garden                                  Los Colibríes Community Garden
        Policeprct
            34493 rows. First 20 shown
                                               left right
            uid                                          
            000081aff7e44860160371b7a6b79e9a  123.0   123
            000105fd986377e37a8f95abfdbda92d   62.0    62
            000186821f899fef28e05c9ca2182621   40.0    40
            0002d3b5bb64af5fbc4d28e67dd33e1a   13.0    13
            00056520e4c5483e13933cda8c8a736d   62.0    62
            0005cf2135820de8d20395875ca0e506   13.0    13
            0006e49b4ecf531c88f5dec8bd7d0364   94.0    94
            000898872e62bfd79bb2e5fb4f6fbb38   61.0    61
            00093367a9d66ff799c9cc47b368dd16  102.0   102
            00098ca3934642d37267e38f73e5818a   18.0    18
            000997dc329383998a57f53346b0612d  102.0   102
            000a2f27e18e1ee6b1d8ba78872169f2   13.0    13
            000b602b4850d4acb4905b495308b051   63.0    63
            000d687da861e4489d08e446a99b5e62   32.0    32
            000dbf675f6b389e6d83b001d43391e0    7.0     7
            000f2b19260e07073e2f0e7ec07944e7   41.0    41
            000ff3bb08f400c998fee0bad9b600cf   14.0    14
            001009f03f06f8f5bb27a99657f8c4d6   90.0    90
            00116519f4ca9c6bff07af18349e4a8e  123.0   123
            001425fed2ed0005ffb556c8d448fd8c  122.0   122
        Zipcode
            35036 rows. First 20 shown
                                                 left  right
            uid                                             
            000081aff7e44860160371b7a6b79e9a  10312.0  10312
            000105fd986377e37a8f95abfdbda92d  11204.0  11204
            000186821f899fef28e05c9ca2182621  10451.0  10451
            0002d3b5bb64af5fbc4d28e67dd33e1a  10003.0  10003
            00056520e4c5483e13933cda8c8a736d  11214.0  11214
            0005cf2135820de8d20395875ca0e506  10011.0  10011
            0006e49b4ecf531c88f5dec8bd7d0364  11222.0  11222
            000898872e62bfd79bb2e5fb4f6fbb38  11235.0  11235
            00093367a9d66ff799c9cc47b368dd16  11421.0  11421
            00098ca3934642d37267e38f73e5818a  10019.0  10019
            000997dc329383998a57f53346b0612d  11416.0  11416
            000a2f27e18e1ee6b1d8ba78872169f2  10003.0  10003
            000b602b4850d4acb4905b495308b051  11234.0  11234
            000d687da861e4489d08e446a99b5e62  10039.0  10039
            000dbf675f6b389e6d83b001d43391e0  10002.0  10002
            000f2b19260e07073e2f0e7ec07944e7  10474.0  10474
            000ff3bb08f400c998fee0bad9b600cf  10018.0  10018
            001009f03f06f8f5bb27a99657f8c4d6  11206.0  11206
            00116519f4ca9c6bff07af18349e4a8e  10309.0  10309
            001425fed2ed0005ffb556c8d448fd8c  10305.0  10305

This dataset is ONLY used to QA facdb build

Who would have thought that our own would be a funny one

  • geoms in library are all null
  • all of these str/int fields come though library as strings with decimal places
  • our script read it in as iso-8859-1, though it seems that utf-8 is actually correct. Still a couple odd characters, but less than in library

I'll keep types consistent but leave the changes as they are, with one exception - capacity is coerced to int, while longitude and latitude are coerced to double within facdb qaqc. I'm going to move that to ingest and correct facdb code accordingly.

@fvankrieken fvankrieken linked a pull request Dec 10, 2024 that will close this issue
@fvankrieken fvankrieken changed the title Ingest Migration - Grab Bag of script sources Ingest Migration - DevDB script sources Dec 18, 2024
@fvankrieken fvankrieken changed the title Ingest Migration - DevDB script sources Ingest Migration - FacDB script sources Dec 18, 2024
@fvankrieken fvankrieken self-assigned this Dec 18, 2024
@fvankrieken
Copy link
Contributor Author

fvankrieken commented Dec 18, 2024

nysed_nonpublicenrollment

________________________________________________________________________________
Tables
    Left: nysed_nonpublicenrollment_library
    Right: nysed_nonpublicenrollment_ingest
________________________________________________________________________________
Row count
    Left: 1822
    Right: 1822
________________________________________________________________________________
Column comparison
    Both
        affliation
        beds_code
        county
        data_library_version
        fullk
        gr1
        gr10
        gr11
        gr12
        gr2
        gr3
        gr4
        gr5
        gr6
        gr7
        gr8
        gr9
        halfk
        institution_id
        ogc_fid
        prek
        school_name
        school_year
        uge
        ugs
    Left only: None
    Right only: None
    Type differences
        Halfk
            Left: character varying
            Right: bigint
        Institution id
            Left: character varying
            Right: bigint
        Gr11
            Left: character varying
            Right: bigint
        Beds code
            Left: character varying
            Right: bigint
        Gr8
            Left: character varying
            Right: bigint
        Gr6
            Left: character varying
            Right: bigint
        Prek
            Left: character varying
            Right: bigint
        Uge
            Left: character varying
            Right: bigint
        Gr7
            Left: character varying
            Right: bigint
        Gr2
            Left: character varying
            Right: bigint
        Gr9
            Left: character varying
            Right: bigint
        Gr10
            Left: character varying
            Right: bigint
        Gr1
            Left: character varying
            Right: bigint
        Gr3
            Left: character varying
            Right: bigint
        Affliation
            Left: character varying
            Right: text
        Gr12
            Left: character varying
            Right: bigint
        Ugs
            Left: character varying
            Right: bigint
        Gr5
            Left: character varying
            Right: bigint
        School name
            Left: character varying
            Right: text
        Gr4
            Left: character varying
            Right: bigint
        School year
            Left: character varying
            Right: text
        County
            Left: character varying
            Right: text
        Fullk
            Left: character varying
            Right: bigint
________________________________________________________________________________
Data comparison
    Compared columns
        affliation
        beds_code
        county
        fullk
        gr1
        gr10
        gr11
        gr12
        gr2
        gr3
        gr4
        gr5
        gr6
        gr7
        gr8
        gr9
        halfk
        institution_id
        prek
        school_name
        school_year
        uge
        ugs
    Ignored columns
        ogc_fid
        data_library_version
    Columns coerced to numeric
        prek
        halfk
        fullk
        gr1
        gr2
        gr3
        gr4
        gr5
        gr6
        gr7
        gr8
        gr9
        gr10
        gr11
        gr12
        institution_id
        beds_code
        ugs
        uge
    Left only
        Empty DataFrame
        Columns: [halfk, institution_id, gr11, beds_code, gr8, gr6, prek, uge, gr7, gr2, gr9, gr10, gr1, gr3, affliation, gr12, ugs, gr5, school_name, gr4, school_year, county, fullk]
        Index: []
    Right only
        Empty DataFrame
        Columns: [halfk, institution_id, gr11, beds_code, gr8, gr6, prek, uge, gr7, gr2, gr9, gr10, gr1, gr3, affliation, gr12, ugs, gr5, school_name, gr4, school_year, county, fullk]
        Index: []

@fvankrieken
Copy link
Contributor Author

fvankrieken commented Dec 18, 2024

dcp_sfpsd. library format csv

run with docker exec de python3 -m dcpy.cli lifecycle scripts validate_ingest compare dcp_sfpsd -k uid --c2n xcoord --c2n ycoord --c2n latitude --c2n longitude

________________________________________________________________________________
Tables
    Left: dcp_sfpsd_library
    Right: dcp_sfpsd_ingest
________________________________________________________________________________
Row count
    Left: 173
    Right: 173
________________________________________________________________________________
Column comparison
    Both
        address
        addressnum
        agencyjuris
        area
        areatype
        bbl
        bin
        boro
        borocode
        capacity
        captype
        censtract
        city
        commboard
        council
        data_library_version
        datadate
        dataname
        datasource
        dataurl
        facdomain
        facgroup
        facname
        facsubgrp
        factype
        idagency
        idold
        latitude
        longitude
        nta
        ogc_fid
        opabbrev
        opname
        optype
        overabbrev
        overagency
        overlevel
        pgtable
        proptype
        streetname
        the_geom_webmercator
        uid
        uid_merged
        util
        utilrate
        xcoord
        ycoord
        zipcode
    Left only
        WKT
        the_geom
    Right only
        wkb_geometry
    Type differences
        Longitude
            Left: text
            Right: double precision
        Latitude
            Left: text
            Right: double precision
        Ycoord
            Left: text
            Right: double precision
        Xcoord
            Left: text
            Right: double precision
________________________________________________________________________________
Data comparison
    Key columns
        uid
    Left only: None
    Right only: None
    Ignored columns
        ogc_fid
        data_library_version
    Columns coerced to numeric
        xcoord
        ycoord
        latitude
        longitude
    Columns with diffs
        borocode
        commboard
        idold
        zipcode
    Differences by column
        Idold
            149 rows. First 20 shown
                               left          right
            uid                                   
            22831         5402095.0        5402095
            125893     5303202001.0     5303202001
            127785        5402064.0        5402064
            23250         5405004.0        5405004
            57224         5403001.0        5403001
            182118     5302481001.0     5302481001
            119124        5401025.0        5401025
            54734      5302412002.0     5302412002
            97078      5302405001.0     5302405001
            24068         5408001.0        5408001
            116204        5402092.0        5402092
            115763  1500000000000.0  1500000000000
            129490        5405007.0        5405007
            110857     5303209002.0     5303209002
            162758        5402085.0        5402085
            167211        5306003.0        5306003
            33496         5402063.0        5402063
            182848        5402082.0        5402082
            29601         5402098.0        5402098
            119171        5402067.0        5402067
        Borocode
            169 rows. First 20 shown
                   left right
            uid              
            22831   3.0     3
            125893  2.0     2
            127785  2.0     2
            116546  1.0     1
            23250   4.0     4
            57224   2.0     2
            182118  4.0     4
            119124  3.0     3
            116544  1.0     1
            54734   4.0     4
            97078   4.0     4
            24068   4.0     4
            116204  4.0     4
            115763  1.0     1
            129490  4.0     4
            110857  2.0     2
            162758  3.0     3
            162572  1.0     1
            167211  5.0     5
            33496   3.0     3
        Zipcode
            173 rows. First 20 shown
                       left  right
            uid                   
            22831   11223.0  11223
            125893  10474.0  10474
            127785  10474.0  10474
            116546  10004.0  10004
            23250   11433.0  11433
            57224   10451.0  10451
            182118  11368.0  11368
            119124  11212.0  11212
            116544  10004.0  10004
            54734   11433.0  11433
            97078   11385.0  11385
            24068   11101.0  11101
            116204  11377.0  11377
            115763  10036.0  10036
            129490  11101.0  11101
            110857  10473.0  10473
            162758  11236.0  11236
            162572  10044.0  10044
            167211  10309.0  10309
            33496   11201.0  11201
        Commboard
            173 rows. First 20 shown
                     left right
            uid                
            22831   313.0   313
            125893  202.0   202
            127785  202.0   202
            116546  101.0   101
            23250   412.0   412
            57224   204.0   204
            182118  481.0   481
            119124  316.0   316
            116544  101.0   101
            54734   412.0   412
            97078   405.0   405
            24068   402.0   402
            116204  402.0   402
            115763  104.0   104
            129490  402.0   402
            110857  209.0   209
            162758  316.0   316
            162572  108.0   108
            167211  503.0   503
            33496   302.0   302

Running coercing these "strint" columns to numeric at comparison time, data comparison looks like

________________________________________________________________________________
Data comparison
    Key columns
        uid
    Left only: None
    Right only: None
    Ignored columns
        ogc_fid
        data_library_version
    Columns coerced to numeric
        xcoord
        ycoord
        latitude
        longitude
        idold
        borocode
        zipcode
        commboard
    Columns with diffs: None
    Differences by column: None

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: New
1 participant