Configuration for credentials, proxies, and progress #162
- credentials are stored in the dict
- proxies are also taken from there

Removed all writing to environment except for the token. Cleaned up bare excepts and imports.
About Q1: I'd say either here or there? Maybe we should also document the fact that tokens are stored in a CSV file on the current machine if they are not provided through environment variables (I seem to remember I moved to environment variables for this very reason). About Q2: I like the idea about the
Codecov Report
Coverage diff, master vs. #162:
           master   #162     +/-
Coverage   68.38%   67.43%   -0.95%
Files      117      115      -2
Lines      3957     3894     -63
Hits       2706     2626     -80
Misses     1251     1268     +17

    "!!! Token is missing, please check insee_key and insee_secret are correct !!!"
)
else:
    logger.info("Token has been created")
Shouldn't we keep a trace of this in the log? (I have a project which was repeating those calls, and this was helpful.) It would be better inside _get_token_from_insee instead, though.
Ok, so the exhaustive list of values stored in environ by pynsee is:
EDIT: just took care of the last one
Thanks a lot for the comprehensive list of env variables. They are used mainly as workarounds; I do agree it should be cleaned up, so thanks for raising this topic. Below is my take on each case:
Sure, we could set a variable for that, but in that case it would probably make sense to retain the config over multiple runs, and that would require a config file. Though it's probably a good idea, maybe this could be done in a later PR. In the meantime, we could probably write the credentials in a file with restricted access using
@hadrilec: noted, I'll do that during the 2nd week of August, when I'm back from holiday.
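As an illustration of the "file with restricted access" idea above, here is a minimal sketch using only the standard library. The helper name `write_private_file`, the file name, and the CSV layout are assumptions made for the example; this is not necessarily the tool the comment had in mind, and it is not pynsee's implementation. The owner-only mode is meaningful on POSIX systems; Windows handles permissions differently.

```python
import os
import stat
import tempfile


def write_private_file(path, content):
    """Write `content` to `path`, readable and writable by the owner only."""
    # Create the file if needed and overwrite any previous content.
    flags = os.O_CREAT | os.O_WRONLY | os.O_TRUNC
    # Mode 0o600 is applied at creation time, so a new file is never
    # world-readable, even briefly.
    fd = os.open(path, flags, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(content)
    # If the file already existed with looser permissions, tighten them too.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


# Demo in a throwaway directory (hypothetical file name and columns).
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "credentials.csv")
write_private_file(demo_path, "insee_key,insee_secret\nmy-key,my-secret\n")
demo_mode = stat.S_IMODE(os.stat(demo_path).st_mode)
```

Passing the mode to `os.open` rather than calling `chmod` after a plain `open` avoids a window during which the file exists with default permissions.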
@hadrilec I'll try to think of something with tests to reduce the need for About
@tgrandje ok, I am fine with having the queries printed in the log, but they should go under DEBUG rather than INFO; otherwise the user will be overwhelmed. Consequently,
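The DEBUG-versus-INFO distinction discussed here can be sketched with the standard `logging` module. The logger name, the `log_query` helper, and the placeholder URL are assumptions made for this example and are not pynsee code. Two handlers with different levels show what a user would see at each setting.

```python
import io
import logging

logger = logging.getLogger("pynsee_demo")  # hypothetical logger name
logger.setLevel(logging.DEBUG)  # let handlers decide what to show
logger.propagate = False  # keep the demo output out of the root logger


def make_handler(level):
    """Return a handler writing to an in-memory stream, plus that stream."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setLevel(level)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    return handler, stream


def log_query(url):
    """Record each query at DEBUG so that INFO users are not flooded."""
    logger.debug("Query: %s", url)


info_handler, info_stream = make_handler(logging.INFO)
debug_handler, debug_stream = make_handler(logging.DEBUG)
logger.addHandler(info_handler)
logger.addHandler(debug_handler)

logger.info("Token has been created")
log_query("https://api.example.invalid/series")  # placeholder URL

info_output = info_stream.getvalue()
debug_output = debug_stream.getvalue()
```

With this arrangement, the INFO stream contains only the token message, while the DEBUG stream also contains the query line.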
I'm wondering about the
I'm not sure this is really slowing things down (when not working from cache), though I'm saying this on a hunch and haven't had time to check. In my experience, the major impact on speed is the API's rate limit (30 requests/minute, if I'm not mistaken), and there is not much to do about it... But maybe we could store a session in the config dict altogether?
Storing a session in the dict is definitely a great idea, will do! Regarding the token, besides the time it might take, it's also more of a "why is this needed?" question: the API is going to reply with a 401 Unauthorized code if we provide incorrect headers (token), so I don't see the point of checking the token.
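The argument above (let the API answer 401 instead of validating the token before each call) could be sketched as follows. Everything here is hypothetical and stands in for real HTTP traffic: `FakeResponse`, `TokenStore`, `send_with_retry`, and `fake_send` are illustration-only names, not pynsee or requests APIs.

```python
from dataclasses import dataclass


@dataclass
class FakeResponse:
    """Minimal stand-in for an HTTP response object."""
    status_code: int
    text: str = ""


class TokenStore:
    """Caches a token and renews it only when asked to."""

    def __init__(self, fetch_token):
        self._fetch_token = fetch_token
        self._token = None
        self.fetch_count = 0

    def get(self, force_refresh=False):
        if self._token is None or force_refresh:
            self._token = self._fetch_token()
            self.fetch_count += 1
        return self._token


def send_with_retry(send, store):
    """Send once; on a 401 reply, renew the token and retry a single time."""
    response = send(store.get())
    if response.status_code == 401:
        response = send(store.get(force_refresh=True))
    return response


# Demo: the first token is rejected, the renewed one is accepted.
tokens = iter(["expired-token", "fresh-token"])
store = TokenStore(lambda: next(tokens))


def fake_send(token):
    if token == "fresh-token":
        return FakeResponse(200, "ok")
    return FakeResponse(401, "unauthorized")


result = send_with_retry(fake_send, store)
```

The token is fetched only when missing or rejected, so no extra round trip is spent checking it ahead of time.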
@hadrilec just to clarify something: I still plan to allow some configs to be provided via environment variables but in the list that you provided, I'm not sure whether I should support all of them or not. Could you tell me if you want to keep the possibility for people to provide these via env variables?
@hadrilec can you also confirm that I should remove these lines (as you said
@tfardet, indeed the while loop should be kept but the usage of the variable
I would say that the user should be able to modify the value of
@tfardet I've started building the test refactoring from your branch. It was easier, given that you simplified the configuration storage. It also makes sense, as you said you'll try to store the session object and I'm basically patching session with a Are you ok with me requesting a merge directly into your branch? (In which case, it should speed up the tests on your side; that's precisely how I found the error I mentioned today.) I'll wait for advice regarding the storage volume here beforehand, of course.
@tgrandje sorry, I did not have time to work on this lately, other stuff came up, and I probably won't have time to touch it before next week... I'm not sure what you mean by "requesting merge": do you mean that you would send specific patches for the config, or do you want to merge your whole CI caching PR? If it's the former, it's completely fine; if it's the latter, unless you manage to make your PR really small, I'm not sure it's a good idea.
Co-authored-by: tfardet <[email protected]>
Yes, I agree, I should have made the changes elsewhere; let's see if I have time later on to decouple the changes. I deleted the function. Regarding the issue on "metadonnees", I understand that the connection is lost at some point. I am going to add a default user agent to the header of the requests to see if it solves the problem, as that is the advice described here:
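As an illustration only (this is not the advice linked above, and not pynsee's code), adding a default user agent to outgoing requests could look like the sketch below. It uses the standard library's `urllib.request` rather than the session object discussed in this thread, so that the example stays self-contained. The user agent string, the `build_request` helper, and the URL are assumptions; no network call is made.

```python
import urllib.request

DEFAULT_USER_AGENT = "pynsee-demo/0.0"  # hypothetical value


def build_request(url, headers=None):
    """Build a request that always carries a User-Agent header.

    Caller-supplied headers take precedence over the default.
    """
    merged = {"User-Agent": DEFAULT_USER_AGENT}
    merged.update(headers or {})
    return urllib.request.Request(url, headers=merged)


# Only constructs the request objects; nothing is sent.
request = build_request("https://api.example.invalid/metadonnees")
custom = build_request(
    "https://api.example.invalid/metadonnees",
    headers={"User-Agent": "my-tool/1.0"},
)

# urllib normalises header names with str.capitalize(), hence "User-agent".
user_agent = request.get_header("User-agent")
custom_user_agent = custom.get_header("User-agent")
```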
Hi @tfardet, I executed the code below and whenever Could you please make sure the function
import logging
import sys
logging.basicConfig(stream=sys.stdout,
                    level=logging.DEBUG,
                    format="%(message)s")
from pynsee import *
@hadrilec ah, sorry, maybe I missed something, will check it out! EDIT: could you post what you see? I don't get anything when using a registered token, only when using env variables, and only the necessary part:
@hadrilec it runs just fine locally, and the code seems OK... can you post what seems wrong on your side?
Well, I see the same, but I think this should not be the intended behaviour. From a user's point of view, I think it is better to test the connection to the API through the init_conn function rather than at package import. In addition, since many functions seem to rely on set_config, I suspect that the function testing the connection to all APIs might be triggered over and over again. Is that right, or is there something I misunderstand?
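The concern raised here (connection tests firing at import time or on every config access) can be illustrated with a small sketch in which the check runs only on an explicit call and its result is remembered. All names below (`init_conn_sketch`, `read_config`, `_test_connection`) are hypothetical and do not reflect pynsee's actual functions or signatures.

```python
# Module-level state; importing this module performs no connection test.
_state = {"checked": False, "ok": None, "check_count": 0}


def _test_connection():
    """Stand-in for a real round trip to the APIs (hypothetical)."""
    _state["check_count"] += 1
    return True


def read_config():
    """Read-only config access: never triggers a connection test."""
    return {"hide_progress": False}


def init_conn_sketch(force=False):
    """Test the connection once, on explicit request, and remember the result."""
    if force or not _state["checked"]:
        _state["ok"] = _test_connection()
        _state["checked"] = True
    return _state["ok"]


# Demo: many config reads and repeated init calls cause a single check.
for _ in range(3):
    read_config()
first = init_conn_sketch()
second = init_conn_sketch()
checks_after_two_calls = _state["check_count"]

# An explicit force re-runs the check.
init_conn_sketch(force=True)
checks_after_force = _state["check_count"]
```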
That being said, tests are now taking 4h to run, so there's definitely something wrong. I just don't see what it is; I'll investigate. EDIT: I haven't been able to find out what went wrong yet, but the last run that took less than 1h is 49bd3a9, while 2b9b436 takes more than 4h. @hadrilec you changed a lot of files between these two commits; any idea whether you could have changed something relevant? (The fact that entire files appear changed makes it very hard to parse... I might actually have to start a new PR from master to work around that issue.) Last 1h run: https://github.com/InseeFrLab/pynsee/actions/runs/5655467051/usage (49bd3a9)
@hadrilec honestly, this PR has become unmanageable IMO ^^" Once we merge that (hopefully the tests will be back to less than 1h), I'll redo a PR from that one.
Thanks a lot @tfardet for the second PR!
@hadrilec given the amount of changes to the code since I started, I need to start this again from scratch.
Use configuration dictionary:
- hide_progress option to disable progress bars from tqdm
Question 1: where should I document that?
Removed all writing to environment except for the token.
Question 2: should I also move it to the config dict to avoid writing to env altogether?
Cleaned up some bare excepts and imports.
Fixes #152
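To make the configuration-dictionary approach described above more concrete, here is a minimal sketch. The option names, the `set_option`/`get_option`/`get_proxies` helpers, and the environment variable names are assumptions for illustration and are not taken from the PR's code. The sketch only reads from the environment as a fallback and never writes to it.

```python
import os

# Hypothetical option names; None means "not set explicitly".
_DEFAULTS = {
    "insee_key": None,
    "insee_secret": None,
    "http_proxy": None,
    "https_proxy": None,
    "hide_progress": False,
}
_config = dict(_DEFAULTS)


def set_option(**options):
    """Store options in the in-memory dict; nothing is written to os.environ."""
    unknown = sorted(set(options) - set(_DEFAULTS))
    if unknown:
        raise KeyError(f"Unknown option(s): {', '.join(unknown)}")
    _config.update(options)


def get_option(name, environ=None):
    """Return the stored value, falling back to a read-only environment lookup."""
    environ = os.environ if environ is None else environ
    value = _config[name]
    if value is None:
        value = environ.get(name)
    return value


def get_proxies(environ=None):
    """Build a {scheme: url} mapping from whichever proxy options are set."""
    candidates = {
        scheme: get_option(f"{scheme}_proxy", environ)
        for scheme in ("http", "https")
    }
    return {scheme: url for scheme, url in candidates.items() if url}


# Demo with a fake environment so the example does not depend on the machine.
fake_environ = {
    "insee_key": "key-from-env",
    "https_proxy": "http://proxy.example.invalid:3128",
}
set_option(insee_secret="secret-from-dict", hide_progress=True)
```

A flag such as `hide_progress` could then be forwarded to tqdm's `disable` argument wherever a progress bar is created.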