Emprego Check
We check the data against the RAIS website, using the aggregates by month and year.
To get the data from the site:
- http://bi.mte.gov.br/bgcaged/login.php
- User: basico
- Password: 12345678
- bra_id lengths: 2, 4, 6, 7, 8. Length 2 cannot be checked against the others; check only 4 and 6 against 8. Comparing 7 with 4 or 8 requires the table attrs_bra_pr.
- cnae_id lengths: 1, 6 (isic was 1, 3, 5)
- cbo_id lengths: 1, 4
Script:
Example: python -m emprego.step_2_aggs -m all > ALL.log
DV1: All OK!
DV2: All OK!
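
The aggregation check described above (compare the length-4 and length-6 aggregates against length 8) can be sketched as follows. This is a minimal illustration, not the code of step_2_aggs: it assumes that a shorter ID is a plain prefix of the longer one, which the notes above say does not hold for lengths 2 and 7. The function names and the sample IDs are hypothetical.

```python
from collections import defaultdict


def rollup(deep_values, length):
    """Sum the values of the deepest IDs by their first `length` characters."""
    totals = defaultdict(int)
    for deep_id, value in deep_values.items():
        totals[deep_id[:length]] += value
    return dict(totals)


def compare_levels(deep_values, shallow_values, length):
    """Return {id: (stored, rolled_up)} for every ID whose stored aggregate
    differs from the sum of its deeper IDs (None means the ID is missing)."""
    rolled = rollup(deep_values, length)
    mismatches = {}
    for key in sorted(set(rolled) | set(shallow_values)):
        stored = shallow_values.get(key)
        expected = rolled.get(key)
        if stored != expected:
            mismatches[key] = (stored, expected)
    return mismatches


# Hypothetical length-8 values and their stored length-4 aggregates.
deep = {"ab010203": 10, "ab010204": 5, "ab020101": 7}
shallow = {"ab01": 15, "ab02": 8}
print(compare_levels(deep, shallow, 4))  # only "ab02" differs
```

An empty result would correspond to the "All OK!" outcome reported above.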
This is an analysis of the required information, aggregations and value domains for the different aggregations: bra_id, isic_id and cbo_id.
Script:
Example:
- python -m emprego.check.step_3_indicators -m all > ALlSecex.log
- "Occupation Diversity" and "Effective Occupation Diversity" for all CNAE, regions and all years
  - range: >0
  - required when value employees > 0 for each industry
- "Location Diversity" and "Effective Location Diversity" for all CNAE, regions and all years
  - range: >0
  - required when value employees > 0 for each location
- "Domestic RCA" for all CNAE, regions and all years
  - range: >0
  - required when value employees > 0 for each industry, else domestic RCA must be null
- "Distance" for all ISIC, all locations and all years
  - range: (0;1)
  - always required when the ISIC length = 4, else must be null
- "Opportunity" for all ISIC, all locations and all years
  - range: (-...;+...); in general, (-3;+3)
  - always required when the ISIC length = 4, else must be null
- "Growth" for number of employees and wage (TODO DV2)
  - range: (-...,+...)
  - Annual: 2000 is null; growth_val cannot be null; growth_pct can be null if the number of employees or the wage is 0 in the previous or the current year
  - 5-year: 2000/2003 is null; growth_val cannot be null; growth_pct can be null if the number of employees or the wage was 0 five years earlier and there is no wage/employees in the current year
- "Industry Diversity" and "Effective Industry Diversity" for all ISIC, regions and all years
  - range: >0
  - required when value employees > 0 for each occupation
- "Location Diversity" and "Effective Location Diversity" for all ISIC, regions and all years
  - range: >0
  - required when value employees > 0 for each location
- "Growth" for number of employees and wage (TODO DV2)
  - range: (-...,+...)
  - Annual: 2000 is null; growth_val cannot be null; growth_pct can be null if the number of employees or the wage is 0 in the previous or the current year
  - 5-year: 2000/2003 is null; growth_val cannot be null; growth_pct can be null if the number of employees or the wage was 0 five years earlier and there is no wage/employees in the current year
- "Importance" for all occupations
  - range: (0;1)
  - required when value employees > 0 for each industry, the ISIC length is 5 and the CBO length is 4
- "Required" for all occupations
  - range: >0
  - required when value employees > 0 for each industry and "Importance" >= 0.20
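
The rules in the list above share two shapes: a value that is required (and range-bound) when there are employees, and a growth pair that is null only in specific cases. The sketch below is one possible reading of those rules, not the code of step_3_indicators. The function names, argument names and error strings are hypothetical, and null is represented by `None`.

```python
def check_indicator(value, employees, low=None, high=None, null_otherwise=False):
    """Return the list of rule violations for one indicator value.

    value          -- the indicator, or None when the column is null
    employees      -- number of employees on the same row
    low, high      -- open-interval bounds; None means unbounded
    null_otherwise -- if True, the value must be null when employees is 0
    """
    errors = []
    if employees > 0 and value is None:
        errors.append("missing value")
    if null_otherwise and employees <= 0 and value is not None:
        errors.append("value must be null")
    if value is not None:
        if low is not None and value <= low:
            errors.append("below range")
        if high is not None and value >= high:
            errors.append("above range")
    return errors


def check_growth(year, current, previous, growth_val, growth_pct, first_year=2000):
    """Return the list of rule violations for one annual growth row.

    Assumed reading: in the first year both columns are null; afterwards
    growth_val is always set, and growth_pct may be null only when the
    previous or the current value is 0 (or absent).
    """
    errors = []
    if year == first_year:
        if growth_val is not None or growth_pct is not None:
            errors.append("growth must be null in first year")
        return errors
    if growth_val is None:
        errors.append("growth_val is null")
    if growth_pct is None and current and previous:
        errors.append("growth_pct is null")
    return errors


# Diversity: range > 0, required when employees > 0.
print(check_indicator(None, 120, low=0))
# Domestic RCA: must be null when there are no employees.
print(check_indicator(0.4, 0, low=0, null_otherwise=True))
# growth_pct may be null because the previous value is 0.
print(check_growth(2005, 10, 0, 10, None))
```

For "Distance" and "Importance" the same function would be called with `low=0, high=1`; the length conditions on ISIC and CBO would be applied by the caller before deciding whether the value is required.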
Entering in checkImportance
- checkImportance : rais_yio - OK
Entering in checkDistance
- checkDistance : rais_ybi - OK
Entering in checkOpportunity
- checkOpportunity : rais_ybi: - OK
Entering in checkRequired
- checkRequired : rais_ybio - OK
checkGrowth
- Found 638 Errors in checkGrowth for rais_yo:1:
- Found 432 Errors in checkGrowth for rais_yi:1:
- Found 880 Errors in checkGrowth for rais_ybo:1:
checkGrowthAnual
- Found 17199 Errors in checkGrowth_val for rais_yb
- Found 602211 Errors in checkGrowth_val for rais_ybi
- Found 820749 Errors in checkGrowth_val for rais_ybio
- Found 1142556 Errors in checkGrowth_val for rais_ybo
- Found 529 Errors in checkGrowth_val for rais_yi
- Found 760 Errors in checkGrowth_val for rais_yo
Entering in checkWage
- checkWage : rais_yb, rais_ybio, rais_ybo, rais_yi, rais_yo - OK
- checkWage : rais_ybi: Found 20240252 Errors in checkWage for rais_ybi
- It is only necessary to fill these with zero
Entering in checkDiversity:
- rais_yb,rais_yo,rais_yi - OK
- checkDiversity : rais_yi: Found 418 Errors in checkDiversity for rais_yi
- checkDiversity : rais_yb: Found 1319 Errors in checkDiversity for rais_yb
- checkDiversity : rais_yo: Found 615 Errors in checkDiversity for rais_yo
- Errors found only for 2004 (checked)
Entering in checkRCA
- checkRCA : rais_ybi: Found 2354352 Errors in checkRCA for rais_ybi
- Entering in checkImportance
- Entering in checkDiversity
- Found 2810572 Errors in checkWage for rais_ybi
- Cannot run checkGrowthAnual or checkGrowth: only the year 2002 is available
- Found 3233050 Errors in checkOpportunity for rais_ybi
- Found 3233050 Errors in checkDistance for rais_ybi
- Found 77103 Errors in checkRCA for rais_ybi
1- Export value for all ISIC (rais_yi)
2- Export value for all municipalities (rais_yb)
3- Export value for all CBO (rais_yo)
Script: https://github.com/DataViva/datavivaetl/blob/master/emprego/check/step_4_sent.py
Example:
- python -m emprego.check.step_4_sent -m all -y all > step_4_sent.log
DV1:
All OK!
A problem with this test script still needs to be fixed:
Entering in checkCBO:
- ERROR in groupId (2010): 2011.0 / 2011 - Value in CSV 53310974 <> Value in DV 13793875 - Difference: 39517099
- ERROR in groupId (2010): 2030.0 / 2030 - Value in CSV 61529787 <> Value in DV 20795570 - Difference: 40734217
- ERROR in groupId (2010): 2031.0 / 2031 - Value in CSV 24955651 <> Value in DV 6910177 - Difference: 18045474
- ERROR in groupId (2010): 2111.0 / 2111 - Value in CSV 182614470 <> Value in DV 11351355 - Difference: 171263115
- ERROR in groupId (2010): 3011.0 / 3011 - Value in CSV 87016223 <> Value in DV 80542506 - Difference: 6473717
- ERROR in groupId (2010): 3111.0 / 3111 - Value in CSV 83832988 <> Value in DV 65289347 - Difference: 18543641
- ERROR in groupId (2010): 3121.0 / 3121 - Value in CSV 137066880 <> Value in DV 127607810 - Difference: 9459070
Entering in checkCBO
- ERROR in groupId (2011): 2011.0 / 2011 - Value in CSV 18370256 <> Value in DV 14451981 - Difference: 3918275
- ERROR in groupId (2011): 2030.0 / 2030 - Value in CSV 30105407 <> Value in DV 22183153 - Difference: 7922254
- ERROR in groupId (2011): 2031.0 / 2031 - Value in CSV 11279653 <> Value in DV 8088330 - Difference: 3191323
- ERROR in groupId (2011): 2111.0 / 2111 - Value in CSV 73832054 <> Value in DV 12970592 - Difference: 60861462
- ERROR in groupId (2011): 3011.0 / 3011 - Value in CSV 92527002 <> Value in DV 91867639 - Difference: 659363
- ERROR in groupId (2011): 3111.0 / 3111 - Value in CSV 81041358 <> Value in DV 71941507 - Difference: 9099851
- ERROR in groupId (2011): 3121.0 / 3121 - Value in CSV 156013535 <> Value in DV 151890436 - Difference: 4123099
- ERROR in CBO (2012): 2011.0 / 2011 - Value in CSV 16575691 <> Value in DV 16560639 - Difference: 15052
- ERROR in CBO (2012): 2030.0 / 2030 - Value in CSV 25705426 <> Value in DV 25118529 - Difference: 586897
- ERROR in CBO (2012): 2111.0 / 2111 - Value in CSV 14745183 <> Value in DV 13848551 - Difference: 896632
- ERROR in CBO (2012): 3121.0 / 3121 - Value in CSV 177678826 <> Value in DV 177675357 - Difference: 3469
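
The comparison behind these error lines can be sketched as below. This is a minimal illustration, not the code of step_4_sent.py: the function names are hypothetical, and the ID normalization is an assumption based only on the log printing each ID in two forms (such as 2011.0 and 2011), which might mean that one side was read as a float. The sample totals are taken from the 2012 lines above.

```python
def normalize_id(raw_id):
    """Map "2011.0", 2011.0 and "2011" to the same key, "2011" (assumed format)."""
    return str(int(float(raw_id)))


def compare_totals(csv_totals, dv_totals):
    """Return (id, csv_value, dv_value, difference) for every ID whose
    totals differ; an ID missing on the DV side counts as 0."""
    csv_by_id = {normalize_id(k): v for k, v in csv_totals.items()}
    dv_by_id = {normalize_id(k): v for k, v in dv_totals.items()}
    report = []
    for group_id in sorted(csv_by_id):
        csv_value = csv_by_id[group_id]
        dv_value = dv_by_id.get(group_id, 0)
        if csv_value != dv_value:
            report.append((group_id, csv_value, dv_value, csv_value - dv_value))
    return report


csv_totals = {"2011.0": 16575691, "3121.0": 177678826}
dv_totals = {"2011": 16560639, "3121": 177675357}
for group_id, csv_value, dv_value, diff in compare_totals(csv_totals, dv_totals):
    print("ERROR in CBO: %s - Value in CSV %d <> Value in DV %d - Difference: %d"
          % (group_id, csv_value, dv_value, diff))
```

Normalizing the keys first rules out a mismatch caused only by ID formatting, so any remaining difference comes from the values themselves.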