📊 Update faostat data #2416
… and fbs, and replace assertions with log errors
PR looks good. If it works without memory optimisations then go ahead and merge it. We'll see what it does in production.
EDIT: I checked `tb`, and there are only two object columns with sizes around 2 GB. That's not that much, but perhaps converting them to categories wouldn't cause problems with metadata?
ipdb> p tb.fao_unit_short_name.memory_usage(deep=True) / 2**20
1631.690894126892
ipdb> p tb.fao_element.memory_usage(deep=True) / 2**20
2025.320728302002
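A minimal sketch of the category conversion suggested above, using plain pandas rather than the `Table` class from `owid.catalog`. Whether `Table` metadata survives `astype("category")` is exactly the open question in the comment, so this only illustrates the memory side. The helper name `to_category_if_smaller` and the sample data are made up for illustration; only the column names come from the output above.

```python
import pandas as pd


def to_category_if_smaller(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Return a copy of df where each listed column is converted to the
    category dtype, but only when that reduces its deep memory usage."""
    out = df.copy()
    for col in columns:
        as_category = out[col].astype("category")
        if as_category.memory_usage(deep=True) < out[col].memory_usage(deep=True):
            out[col] = as_category
    return out


# Made-up data: many rows, few distinct strings per column.
df = pd.DataFrame(
    {
        "fao_element": ["Production", "Yield"] * 50_000,
        "fao_unit_short_name": ["t", "hg/ha"] * 50_000,
        "value": range(100_000),
    }
)

before_mib = df.memory_usage(deep=True).sum() / 2**20
compact = to_category_if_smaller(df, ["fao_element", "fao_unit_short_name"])
after_mib = compact.memory_usage(deep=True).sum() / 2**20
print(f"{before_mib:.1f} MiB -> {after_mib:.1f} MiB")
```

The saving comes from storing each distinct string once and keeping small integer codes per row, so it is largest for columns with few unique values relative to their length.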
Hey Pablo! This looks great - what an absolute monster of a dataset, it never ceases to amaze me! I'll write some thoughts on the charts down here, but generally they look great:
Main changes
- … `multi_merge` function to `owid.catalog` (that properly works with `Table` object).
- … `wcad`) and "Energy use" (`gn`).
- … `ef`, `el`, and `ep`. I have checked that all variables used in charts from these datasets can (in principle) be replaced by the analogous ones from `rf`, `rl`, and `rfn`, respectively (but will need to confirm once grapher datasets exist).
- … `qcl` (and therefore in the food explorer) I renamed `Flax fibre` -> `Flax, raw or retted`. This follows the change that FAOSTAT did too, by which item 773 (`Flax fibre`) disappeared from `qcl`, being replaced by 771 (`Flax, raw or retted`).
  - … `qi`, and its definition is "Flax, processed but not spun", with description: "Broken, scutched, hackled etc. but not spun. Traditionally, FAO has used this commodity to identify production in its raw state; in reality, the primary agricultural product is the commodity 01929.01 (Flax, raw or retted) which can either be used for the production of fibre or for other purposes (Unofficial definition)".
  - … `qcl` is defined as "Flax, raw or retted", with description: "Flax Straw, spp. Linum usitatissimum. Flax is cultivated for seed as well as for fibre. The fibre is obtained from the stem of the plant. Data are reported in terms of straw. (Unofficial definition)".

TO-DO for Pablo R (in separate PRs):
For reviewers
There is no need to thoroughly review this monster PR. I'd suggest:
- … `owid.catalog`? Also, note that I temporarily removed some of the fixes that you made to `etl/steps/data/garden/faostat/2024-03-14/faostat_fbsc.py` to control for memory use. I did that because the `concatenate` function was not propagating metadata. ETL did not complain in staging, but we may need to adapt this function to work with tables (with metadata propagation). If you think we should do that before merging, let me know and I'll look into it.

Thanks a lot!!
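For context on the metadata concern, here is a rough sketch of what a metadata-propagating concatenation could look like. It is not the `concatenate` function referred to above and does not use `owid.catalog`: pandas' `DataFrame.attrs` stands in for table metadata, and the helper name `concat_keeping_attrs`, the "first value wins" rule, and the sample data are all assumptions made for illustration.

```python
import pandas as pd


def concat_keeping_attrs(frames: list[pd.DataFrame]) -> pd.DataFrame:
    """Concatenate frames row-wise and explicitly carry over their attrs.

    When several frames define the same key, the value from the first
    frame that has it is kept (an arbitrary choice for this sketch).
    """
    combined = pd.concat(frames, ignore_index=True)
    merged = {}
    for frame in frames:
        for key, value in frame.attrs.items():
            merged.setdefault(key, value)
    # Set attrs explicitly instead of relying on what pd.concat propagates.
    combined.attrs = merged
    return combined


# Made-up inputs standing in for two tables with different metadata.
a = pd.DataFrame({"country": ["Spain"], "value": [1.0]})
a.attrs = {"title": "Food balances (old methodology)"}
b = pd.DataFrame({"country": ["France"], "value": [2.0]})
b.attrs = {"title": "Food balances (new methodology)", "unit": "tonnes"}

combined = concat_keeping_attrs([a, b])
print(combined.attrs)
```

A real version for tables would also need to decide how to reconcile per-column metadata when the inputs disagree, which this sketch does not attempt.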