mariohmol edited this page Oct 29, 2014 · 25 revisions

1: Data Site x Raw Data

We check the raw data against the RAIS website, using the site's aggregates by month and year.

To get the data from the site, do:

2: Raw Data x Data Sent

3: Data Sent x Dataviva

4: Dataviva x Dataviva

Aggs

  • bra_id: lengths 2, 4, 6, 7, 8. Length 2 cannot be checked against the others; only check 4 and 6 against 8. Comparing 7 with 4 and 8 requires the table attrs_bra_pr
  • cnae_id: 1,6 ( isic was 1,3,5 )
  • cbo_id: 1,4

Script:

Example: python -m emprego.step_2_aggs -m all > ALL.log
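The length-based comparison in the Aggs list can be sketched as follows. This is only an illustration, not the code of step_2_aggs: it assumes that a shorter id is a plain prefix of the longer one (which the list suggests for bra_id lengths 4, 6 and 8, but not for length 7), and the function names and sample ids are made up.

```python
from collections import defaultdict


def rollup(deep_rows, prefix_len):
    """Sum the values of the deeper ids, grouped by their first prefix_len characters."""
    totals = defaultdict(int)
    for deep_id, value in deep_rows.items():
        totals[deep_id[:prefix_len]] += value
    return dict(totals)


def find_agg_mismatches(shallow_rows, deep_rows, prefix_len):
    """Return {id: (stored, recomputed)} for every shallow id whose stored
    value differs from the sum of its deeper children."""
    rolled = rollup(deep_rows, prefix_len)
    mismatches = {}
    for key in sorted(set(shallow_rows) | set(rolled)):
        stored = shallow_rows.get(key, 0)
        recomputed = rolled.get(key, 0)
        if stored != recomputed:
            mismatches[key] = (stored, recomputed)
    return mismatches


# Hypothetical ids: length-8 rows rolled up to length 4.
deep = {"ab010001": 5, "ab010002": 7, "ab020001": 3}
shallow = {"ab01": 12, "ab02": 4}
print(find_agg_mismatches(shallow, deep, 4))
```

An empty result corresponds to the "All OK!" case in the report; ids that need a lookup table (such as bra_id length 7 via attrs_bra_pr) would have to be mapped before calling the helper.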

Report

DV1: All OK!

DV2: All OK!

Indicators

This analysis checks which indicators are required, how they aggregate and what their value domains are, across the different aggregations such as bra_id, isic_id and cbo_id.

Script:

Example:

  • python -m emprego.check.step_3_indicators -m all > ALlSecex.log

For Industries:

  • "Occupation Diversity" and "Effective Occupation Diversity" for all CNAE, regions and all years

    • range: >0
    • required when the number of employees is > 0 for each industry
  • "Location Diversity" and "Effective Location Diversity" for all CNAE, regions and all years

    • range: >0
    • required when the number of employees is > 0 for each location
  • "Domestic RCA" for all CNAE, regions and all years

    • range: >0
    • required when the number of employees is > 0 for each industry; otherwise Domestic RCA must be null
  • "Distance" for all ISIC, all locations and all years

    • range: (0;1)
    • always required when the ISIC length is 4; otherwise it must be null
  • "Opportunity" for all ISIC, all locations and all years

    • range: (-...;+...); in general, (-3;+3)
    • always required when the ISIC length is 4; otherwise it must be null
  • "Growth" for number of employees and wage (TODO DV2)

    • range: (-...,+...)
    • Annual: null for 2000; growth_val cannot be null; growth_pct can be null if the number of employees or the wage is 0 in the previous year or the current year
    • 5-year: null for 2000/2003; growth_val cannot be null; growth_pct can be null if the number of employees or the wage from 5 years before is 0 and there is no wage/employees in the current year
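Some of the industry rules above can be written as small predicates. The sketch below is not taken from step_3_indicators; the function names, argument names and sample ids are assumptions, the ranges are read as open intervals, and only the annual growth rule is covered.

```python
FIRST_YEAR = 2000  # the rules above say annual growth is null for 2000


def check_domestic_rca(num_emp, rca):
    """RCA must be > 0 when there are employees, and null otherwise."""
    if num_emp > 0:
        return rca is not None and rca > 0
    return rca is None


def check_distance(isic_id, distance):
    """Distance must lie in (0;1) for ISIC ids of length 4, and be null otherwise."""
    if len(isic_id) == 4:
        return distance is not None and 0 < distance < 1
    return distance is None


def check_annual_growth(year, prev_value, cur_value, growth_val, growth_pct):
    """Annual growth rule; prev_value and cur_value are the employee count
    (or wage) in the previous and current year, with 0 meaning no data."""
    if year == FIRST_YEAR:
        return growth_val is None and growth_pct is None
    if growth_val is None:
        return False
    if growth_pct is None:
        # A null percentage is only acceptable when one side is zero.
        return prev_value == 0 or cur_value == 0
    return True


# Hypothetical rows.
print(check_domestic_rca(10, None))                 # employees but no RCA
print(check_distance("1410", 0.42))                 # 4-character id, value in range
print(check_annual_growth(2005, 10, 12, 2, None))   # pct missing without a zero side
```

Each function returns True when the row satisfies the rule, so a checker could count the rows for which any predicate returns False.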

For Occupations:

  • "Industry Diversity" and "Effective Industry Diversity" for all ISIC, regions and all years

    • range: >0
    • required when the number of employees is > 0 for each occupation
  • "Location Diversity" and "Effective Location Diversity" for all ISIC, regions and all years

    • range: >0
    • required when the number of employees is > 0 for each location
  • "Growth" for number of employees and wage (TODO DV2)

    • range: (-...,+...)
    • Annual: null for 2000; growth_val cannot be null; growth_pct can be null if the number of employees or the wage is 0 in the previous year or the current year
    • 5-year: null for 2000/2003; growth_val cannot be null; growth_pct can be null if the number of employees or the wage from 5 years before is 0 and there is no wage/employees in the current year
  • "Importance" for all occupations

    • range: (0;1)
    • required when the number of employees is > 0 for each industry, the ISIC length is 5 and the CBO length is 4
  • "Required" for all occupations

    • range: >0
    • required when the number of employees is > 0 for each industry and "Importance" >= 0.20
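The last two occupation rules depend on each other, which a short sketch may make clearer. The names below are hypothetical, the (0;1) range is read as an open interval, and rows for which the page states no rule are simply accepted.

```python
IMPORTANCE_THRESHOLD = 0.20  # from the rule for "Required" above


def check_importance(num_emp, isic_id, cbo_id, importance):
    """Importance must lie in (0;1) when the row has employees,
    a 5-character ISIC id and a 4-character CBO id."""
    applies = num_emp > 0 and len(isic_id) == 5 and len(cbo_id) == 4
    if not applies:
        return True  # no rule is stated for this case, so accept the row
    return importance is not None and 0 < importance < 1


def check_required(num_emp, importance, required):
    """Required must be > 0 when the row has employees and the
    importance reaches the threshold."""
    applies = (
        num_emp > 0
        and importance is not None
        and importance >= IMPORTANCE_THRESHOLD
    )
    if not applies:
        return True  # no rule is stated for this case, so accept the row
    return required is not None and required > 0


# Hypothetical rows.
print(check_importance(30, "a0111", "2111", 0.35))
print(check_required(30, 0.35, None))
```

Rows with an importance below 0.20 pass check_required regardless of the stored value; a stricter reading that forces null in that case is possible but is not stated in the rules above.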

Report

Entering in checkImportance

  • checkImportance : rais_yio - OK

Entering in checkDistance

  • checkDistance : rais_ybi - OK

Entering in checkOpportunity

  • checkOpportunity : rais_ybi: - OK

Entering in checkRequired

  • checkRequired : rais_ybio - OK

Errors DV1

checkGrowth

  • Found 638 Errors in checkGrowth for rais_yo:1:
  • Found 432 Errors in checkGrowth for rais_yi:1:
  • Found 880 Errors in checkGrowth for rais_ybo:1:

checkGrowthAnual

  • Found 17199 Errors in checkGrowth_val for rais_yb
  • Found 602211 Errors in checkGrowth_val for rais_ybi
  • Found 820749 Errors in checkGrowth_val for rais_ybio
  • Found 1142556 Errors in checkGrowth_val for rais_ybo
  • Found 529 Errors in checkGrowth_val for rais_yi
  • Found 760 Errors in checkGrowth_val for rais_yo

Entering in checkWage

  • checkWage : rais_yb, rais_ybio, rais_ybo, rais_yi, rais_yo - OK
  • checkWage : rais_ybi: Found 20240252 Errors in checkWage for rais_ybi
    • These only need to be filled with zero

Entering in checkDiversity:

  • rais_yb,rais_yo,rais_yi - OK
  • checkDiversity : rais_yi: Found 418 Errors in checkDiversity for rais_yi
  • checkDiversity : rais_yb: Found 1319 Errors in checkDiversity for rais_yb
  • checkDiversity : rais_yo: Found 615 Errors in checkDiversity for rais_yo
    • Errors found only for 2004 (checked)

Entering in checkRCA

  • checkRCA : rais_ybi: Found 2354352 Errors in checkRCA for rais_ybi

Report DV 2

  • Entering in checkImportance
  • Entering in checkDiversity

Errors DV 2

  • Found 2810572 Errors in checkWage for rais_ybi
  • Cannot check checkGrowthAnual and checkGrowth: only the year 2002 is available
  • Found 3233050 Errors in checkOpportunity for rais_ybi
  • Found 3233050 Errors in checkDistance for rais_ybi
  • Found 77103 Errors in checkRCA for rais_ybi
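To compare runs such as DV1 and DV2, the "Found N Errors in ... for ..." lines can be tallied from a log file. This is a small helper sketch rather than part of the ETL scripts; the function name is hypothetical and the pattern is inferred from the log lines quoted on this page.

```python
import re
from collections import defaultdict

# Matches lines such as "Found 77103 Errors in checkRCA for rais_ybi".
FOUND_RE = re.compile(r"Found (\d+) Errors in (\w+) for (\w+)")


def summarize_errors(log_lines):
    """Return {check_name: {table_name: error_count}} for all matching lines."""
    totals = defaultdict(dict)
    for line in log_lines:
        match = FOUND_RE.search(line)
        if match:
            count, check, table = match.groups()
            totals[check][table] = int(count)
    return {check: dict(tables) for check, tables in totals.items()}


sample = [
    "Entering in checkImportance",
    "Found 2810572 Errors in checkWage for rais_ybi",
    "Found 77103 Errors in checkRCA for rais_ybi",
]
print(summarize_errors(sample))
```

Lines without an error count, such as the "Entering in ..." markers, are ignored.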

Check with Data Sent

1- Export value for all ISIC ( rais_yi )

2- Export value for all municipalities (rais_yb)

3- Export value for all CBO ( rais_yo )

Script: https://github.com/DataViva/datavivaetl/blob/master/emprego/check/step_4_sent.py

Example:

  • python -m emprego.check.step_4_sent -m all -y all > step_4_sent.log
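The comparison between the data sent and DataViva boils down to matching totals per id. The sketch below is an assumption about how such a check could look, not the code of step_4_sent.py: the function names are hypothetical, the message format is simplified, and ids read from a CSV as floats (for example 2011.0) are normalized before matching.

```python
def normalize_id(raw):
    """Turn ids such as 2011.0 or ' 2011 ' into the string '2011'."""
    text = str(raw).strip()
    if text.endswith(".0"):
        text = text[:-2]
    return text


def compare_totals(csv_totals, dv_totals, year, label="CBO"):
    """Return one message per id whose CSV total differs from the DV total."""
    csv_norm = {normalize_id(k): v for k, v in csv_totals.items()}
    dv_norm = {normalize_id(k): v for k, v in dv_totals.items()}
    errors = []
    for key in sorted(set(csv_norm) | set(dv_norm)):
        csv_value = csv_norm.get(key, 0)
        dv_value = dv_norm.get(key, 0)
        if csv_value != dv_value:
            errors.append(
                "ERROR in %s (%s): %s - Value in CSV %d <> Value in DV %d"
                " - Difference: %d"
                % (label, year, key, csv_value, dv_value, abs(csv_value - dv_value))
            )
    return errors


# Totals taken from the 2012 CBO line in the report on this page.
for message in compare_totals({2011.0: 16575691}, {"2011": 16560639}, 2012):
    print(message)
```

An id that is missing on one side is treated as a total of 0, so it is reported with the full value as the difference.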

Report

2010

DV1:

All OK!

A problem with this test script still needs to be fixed:

Entering in checkCBO:

  • ERROR in groupId (2010): 2011.0 / 2011 - Value in CSV 53310974 <> Value in DV 13793875 - Difference: 39517099
  • ERROR in groupId (2010): 2030.0 / 2030 - Value in CSV 61529787 <> Value in DV 20795570 - Difference: 40734217
  • ERROR in groupId (2010): 2031.0 / 2031 - Value in CSV 24955651 <> Value in DV 6910177 - Difference: 18045474
  • ERROR in groupId (2010): 2111.0 / 2111 - Value in CSV 182614470 <> Value in DV 11351355 - Difference: 171263115
  • ERROR in groupId (2010): 3011.0 / 3011 - Value in CSV 87016223 <> Value in DV 80542506 - Difference: 6473717
  • ERROR in groupId (2010): 3111.0 / 3111 - Value in CSV 83832988 <> Value in DV 65289347 - Difference: 18543641
  • ERROR in groupId (2010): 3121.0 / 3121 - Value in CSV 137066880 <> Value in DV 127607810 - Difference: 9459070

Entering in checkCBO

  • ERROR in groupId (2011): 2011.0 / 2011 - Value in CSV 18370256 <> Value in DV 14451981 - Difference: 3918275
  • ERROR in groupId (2011): 2030.0 / 2030 - Value in CSV 30105407 <> Value in DV 22183153 - Difference: 7922254
  • ERROR in groupId (2011): 2031.0 / 2031 - Value in CSV 11279653 <> Value in DV 8088330 - Difference: 3191323
  • ERROR in groupId (2011): 2111.0 / 2111 - Value in CSV 73832054 <> Value in DV 12970592 - Difference: 60861462
  • ERROR in groupId (2011): 3011.0 / 3011 - Value in CSV 92527002 <> Value in DV 91867639 - Difference: 659363
  • ERROR in groupId (2011): 3111.0 / 3111 - Value in CSV 81041358 <> Value in DV 71941507 - Difference: 9099851
  • ERROR in groupId (2011): 3121.0 / 3121 - Value in CSV 156013535 <> Value in DV 151890436 - Difference: 4123099
  • ERROR in CBO (2012): 2011.0 / 2011 - Value in CSV 16575691 <> Value in DV 16560639 - Difference: 15052
  • ERROR in CBO (2012): 2030.0 / 2030 - Value in CSV 25705426 <> Value in DV 25118529 - Difference: 586897
  • ERROR in CBO (2012): 2111.0 / 2111 - Value in CSV 14745183 <> Value in DV 13848551 - Difference: 896632
  • ERROR in CBO (2012): 3121.0 / 3121 - Value in CSV 177678826 <> Value in DV 177675357 - Difference: 3469