Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix tests - steamline testing + multiprocessing backup #179

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

hadrilec
Copy link
Contributor

No description provided.

@InseeFrLab InseeFrLab deleted a comment from codecov-commenter Dec 27, 2023
Copy link
Collaborator

@tfardet tfardet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a bug in the multiprocessing code (and I don't have time to debug it), please just wrap the original code in the try/except rather than the new one with the func_settings and func, especially as this will eventually go away with #182 anyway.

Otherwise we're back at 3-hour tests.

Comment on lines 172 to 201
length = len(list_bbox)
irange = range(length)

data_all = pd.concat(list_data).reset_index(drop=True)
func_settings = _set_global_var
func = _get_data_with_bbox2

try:
with multiprocessing.Pool(
initializer=func_settings, initargs=(args,), processes=Nprocesses
) as pool:
list_output = list(
tqdm.tqdm(
pool.imap(func, irange),
total=length
)
)
except Exception:
func_settings(args)
list_output = []

for p in tqdm.trange(length):
list_output.append(func(p))

msg = """
Multiprocessing failed in the geodata collection,
a traditional loop was used instead
"""
logger.warning(msg)

data_all = pd.concat(list_output).reset_index(drop=True)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do not include these changes, there is a bug somewhere which leads to the 4h-long tests

Comment on lines +242 to +246
# df = _build_series_list()
# test = isinstance(df, pd.DataFrame)
# os.environ['pynsee_use_sdmx'] = "False"
# self.assertTrue(test)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uncomment or remove

Comment on lines +251 to +252
#python test_pynsee_macrodata.py TestFunction.test_get_column_title_1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

@InseeFrLab InseeFrLab deleted a comment from codecov-commenter Dec 27, 2023
@hadrilec hadrilec added invalid This doesn't seem right dont merge and removed invalid This doesn't seem right labels Dec 27, 2023
@InseeFrLab InseeFrLab deleted a comment from codecov-commenter Dec 28, 2023
@hadrilec
Copy link
Contributor Author

@tfardet, shall I merge?

@tfardet
Copy link
Collaborator

tfardet commented Dec 30, 2023

@hadrilec yes. The tests still took 3h, for reasons that I don't understand... but it seems to be already the case for the last 3 commits on master and checking the coverage showed that the multiprocessing code was used, not the except block.

Could you first merge master into it and fix the conflicts? (since you are using a branch on the pynsee account, I can't do it myself)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants