Skip to content

Database

jspeis edited this page Apr 30, 2015 · 18 revisions

The DataViva database uses a series of relational disaggregate MySQL tables to reduce redundancy and improve data query time.

## Naming Convention To simplify naming the various disaggregate data tables (and also some attribute tables), we adhere to the following abbreviations:

Category Table Name Abbreviation Key
Year y year
Brazilian Location b bra_id
Industry i isic_id
Occupation o cbo_id
Product p hs_id
Trade Partner w wld_id
School s school_id
University u university_id
Course (Higher Education) c course_hedu_id
Course (School Census) c course_sc_id

## Attribute Tables Each data point often has multiple indexes that may correspond to locations, occupations, and so on. The data tables for those data points store small VARCHAR values as indexes that are Foreign Keys to an appropriate attribute table.

### Brazilian Locations There are 4 tables that store data regarding Brazilian Locations.

###General Attributes

MySQL Table: attrs_bra

DataViva uses a custom nested ID structure for Brazilian Locations. Out of 9 digits, each 2 digits represent a different level of nesting: Region, State, Mesoregion, Microregion, Municipality.

Key Description
id The main ID used site-wide, matching "bra_id" in the various data tables.
id_ibge IBGE ID used for matching RAIS data.
id_mdic MDIC ID used for matching SECEX data.
name_en The English name for a location.
name_pt The Portuguese name for a location.
gender_pt The Portuguese gender for a location.
plural_pt Whether a location name is plural in Portuguese.
article_pt Whether the Portuguese name requires an article when conjugated in a sentence.
color The DataViva generated color to represent this location (unique by states).

###Municipality Distance

MySQL Table: attrs_bb

A table of distances (in kilometers) between each municipality.

Key Description
bra_id_origin Municipality ID that corresponds to the IDs in the General Attribute table.
bra_id_dest Municipality ID that corresponds to the IDs in the General Attribute table.
distance The geographical distance, in kilometers, between "bra_id_origin" and "bra_id_dest".

###Population

MySQL Table: attrs_yb

A lookup table showing municipality population by year.

Key Description
year Self Explanatory
bra_id Municipality ID that corresponds to the IDs in the General Attribute table.
population Population of the given municipality, in the given year.

### Industries

MySQL Table: attrs_isic

Industries in DataViva are categorized using the International Standard Industrial Classification, or "ISIC".

Key Description
id The main ID used site-wide, matching "isic_id" in the various data tables.
name_en The shortened English name for an Industry.
desc_en The longer English name for an Industry, directly corresponding to their ISIC name.
keywords_en Comma separated English keywords that help searching.
name_pt The shortened Portuguese name for an Industry.
desc_pt The longer Portuguese name for an Industry.
keywords_en Comma separated Portuguese keywords that help searching.
gender_pt The Portuguese gender for an Industry.
plural_pt Whether an Industry name is plural in Portuguese.
article_pt Whether the Portuguese name requires an article when conjugated in a sentence.
color The DataViva generated color to represent this Industry.

### Occupations

MySQL Table: attrs_cbo

Occupations in DataViva are categorized using the Classificação Brasileira de Ocupações, or "CBO".

Key Description
id The main ID used site-wide, matching "cbo_id" in the various data tables.
name_en The shortened English name for an Occupation.
desc_en The longer English name for an Occupation.
keywords_en Comma separated English keywords that help searching.
name_pt The shortened Portuguese name for an Occupation.
desc_pt The longer Portuguese name for an Occupation, directly corresponding to their CBO name.
keywords_en Comma separated Portuguese keywords that help searching.
gender_pt The Portuguese gender for an Occupation.
plural_pt Whether an Occupation name is plural in Portuguese.
article_pt Whether the Portuguese name requires an article when conjugated in a sentence.
color The DataViva generated color to represent this Occupation.

### Products

MySQL Table: attrs_hs

Products in DataViva are categorized using the Harmonized System, or "HS".

Key Description
id The main ID used site-wide, matching "hs_id" in the various data tables.
name_en The shortened English name for a Product.
desc_en The longer English name for a Product.
keywords_en Comma separated English keywords that help searching.
name_pt The shortened Portuguese name for a Product.
desc_pt The longer Portuguese name for a Product, directly corresponding to their CBO name.
keywords_en Comma separated Portuguese keywords that help searching.
gender_pt The Portuguese gender for a Product.
plural_pt Whether a Product name is plural in Portuguese.
article_pt Whether the Portuguese name requires an article when conjugated in a sentence.
color The DataViva generated color to represent this Product.

#### Trade Partners

MySQL Table: attrs_wld

In DataViva, the first 2 digits of a Trade Partner's ID represent it's continent, while the last 3 digits correspond to their 3-digit ISO code.

Key Description
id The main ID used site-wide, matching "wld_id" in the various data tables.
id_2char ISO 2-digit code
id_3char ISO 3-digit code
id_num ISO numeric code
id_mdic MDIC ID used for matching SECEX data.
name_en The English name for a Trade Partner.
name_pt The Portuguese name for a Trade Partner.
gender_pt The Portuguese gender for a Trade Partner.
plural_pt Whether a Trade Partner name is plural in Portuguese.
article_pt Whether the Portuguese name requires an article when conjugated in a sentence.
color The DataViva generated color to represent this Trade Partner.

#### Attribute Search

MySQL Table: attrs_search

The attribute search table is a table meant to facilitate rapid querying of DataViva's core attribute types (locations, occupations, industries, higher education courses, school census courses, products, universities, and trade partners).

Key Description
id The main ID used site-wide in the various data tables.
weight Standard score of this id relative to its nesting level (based on a metric such as exports or enrollment).
kind The type of attribute.
name_en The English name for the attribute.
name_pt The Portuguese name for the attribute.
color The DataViva generated color to represent the attribute.

### Universities

MySQL Table: attrs_university

Key Description
id The main ID used site-wide, matching "university_id" in the various data tables.
name_en The shortened English name for the University.
desc_en The longer English name for the University.
keywords_en Comma separated English keywords that help searching.
name_pt The shortened Portuguese name for the University.
desc_pt The longer Portuguese name for the University.
keywords_en Comma separated Portuguese keywords that help searching.
gender_pt The Portuguese gender for the University.
plural_pt Whether the University name is plural in Portuguese.
article_pt Whether the Portuguese name requires an article when conjugated in a sentence.
color The DataViva generated color to represent this University.
school_type_id Represents the type of funding model for the University.
school_type_en Represents the English name for the type of funding model for the University.
school_type_pt Represents the Portuguese name for the type of funding model for the University.

### Schools

MySQL Table: attrs_school

Key Description
id The main ID used site-wide, matching "school_id" in the various data tables.
name_en The shortened English name for the School.
desc_en The longer English name for the School.
keywords_en Comma separated English keywords that help searching.
name_pt The shortened Portuguese name for the School.
desc_pt The longer Portuguese name for the School.
keywords_en Comma separated Portuguese keywords that help searching.
gender_pt The Portuguese gender for the School.
plural_pt Whether the School name is plural in Portuguese.
article_pt Whether the Portuguese name requires an article when conjugated in a sentence.
color The DataViva generated color to represent this School.
is_vocational Flag indicating whether the given school offers vocational courses.
school_type_id Represents the type of funding model for the School.
school_type_en Represents the English name for the type of funding model for the School.
school_type_pt Represents the Portuguese name for the type of funding model for the School.

## Data Tables Each disaggregate data table has different combinations of keys, calculations, and aggregations. These following charts show which variables are present in each table.

### RAIS

###Base Values The following variables are available in every RAIS table.

Key Description
wage Total Monthly Wages, in Brazilian Reals
num_emp Total Number of Employees
wage_avg Average Monthly Wage per Employee (wage/num_emp)
num_est Total Number of Establishments
num_emp_est Average Number of Employees per Establishment (num_emp/num_est)
wage_growth_val 1-Year Wage Growth Value
wage_growth_val_5 5-Year Wage Growth Value
wage_growth_pct 1-Year Wage Growth Percentage
wage_growth_pct_5 5-Year Wage Growth Percentage
num_emp_growth_val 1-Year Employee Growth Value
num_emp_growth_val_5 5-Year Employee Growth Value
num_emp_growth_pct 1-Year Employee Growth Percentage
num_emp_growth_pct_5 5-Year Employee Growth Percentage

###Disaggregate Keys

table year bra_id isic_id cbo_id
rais_ybio X X X X
rais_ybi X X X
rais_ybo X X X
rais_yio X X X
rais_yb X X
rais_yi X X
rais_yo X X

###Calculations

table rca distance opp_gain required importance isic_diversity isic_diversity_eff cbo_diversity cbo_diversity_eff bra_diversity bra_diversity_eff
rais_ybio X
rais_ybi X X X
rais_ybo
rais_yio X
rais_yb X X X X
rais_yi X X X X
rais_yo X X X X

### SECEX

###Base Values The following variables are available in every SECEX table.

Key Description
export_val Total Export Value, in US Dollars
import_val Total Import Value, in US Dollars
export_val_growth 1-Year Export Growth Percentage
export_val_growth_5 5-Year Export Growth Percentage
import_val_growth 1-Year Import Growth Percentage
import_val_growth_5 5-Year Import Growth Percentage

###Disaggregate Keys

table year month bra_id hs_id wld_id
secex_ymbpw X X X X X
secex_ymbp X X X X
secex_ymbw X X X X
secex_ympw X X X X
secex_ymb X X X
secex_ymp X X X
secex_ymw X X X

###Calculations

table rca rca_wld distance distance_wld opp_gain opp_gain_wld eci pci hs_diversity hs_diversity_eff wld_diversity wld_diversity_eff bra_diversity bra_diversity_eff
secex_ybpw
secex_ymbp X X X X X X
secex_ymbw
secex_ympw
secex_ymb X X X X X
secex_ymp X X X X X X
secex_ymw X X X X X

### Higher Education (HEDU)

###Base Values The following variables are available in every HEDU table.

Key Description
enrolled Number of enrolled students
age Average age of students
graduates Number of graduating students
entrants Number of entering students
morning Number of morning students
afternoon Number of afternoon students
night Number of night students
full_time Number of full-time students

###Disaggregate Keys |table|year|bra_id|university_id|course_hedu_id| |---|:-:|:-:|:-:|:-:|:-:| |hedu_ybuc|X|X|X|X| |hedu_ybc|X|X||X| |hedu_ybu|X|X|X|| |hedu_yuc|X||X|X| |hedu_yb|X|X||| |hedu_yc|X|||X| |hedu_yu|X||X||

### School Census (SC)

###Base Values The following variables are available in every SC table.

Key Description
classes Number of classes
age Average age of students
enrolled Number of enrolled students

###Disaggregate Keys |table|year|bra_id|school_id|course_sc_id| |---|:-:|:-:|:-:|:-:|:-:| |sc_ybsc|X|X|X|X| |sc_ybc|X|X||X| |sc_ybs|X|X|X|| |sc_ysc|X||X|X| |sc_yb|X|X||| |sc_yc|X|||X| |sc_ys|X||X||

Clone this wiki locally